Fixpoint is all you need

Fixpoint is all you need

Tags
Last Updated
Revised2492 Nov 23, 2024
Published
Jul 11, 2024
Author
Evan Gao
Text
🎶NOW LISTENING🎶
music
It Was a Good Day・Ice Cube
An instance of bondage. Courtesy of Angle Dust.
An instance of bondage. Courtesy of Angle Dust.

Origin: the duplicating name

若无说明,我们使用 CBN,而非 CBV. 即我们总是惰性求值的。
在 Lambda 演算中,经常干的一件事就是变量替换(Substitution)。「替换」指的是将某个表达式中的特定自由变量全部替换为另一个表达式。 化简就是用变量替换来定义的。
设想你是 Church,正在研究 Lambda Calculus,现在要来给出「替换」的一个十分 Primitive 的定义:
(约定俗成地,式子中的 是变量; 则是 lambda term)
在 Lambda 演算中,我们一般不关心约束变量(bounded variable)的名称,但是我们仍需要名称,这样我们才能在表达式中引用(reference)这些变量。
🤓
有效的表达式称为 lambda 项
  • 这就产生了问题:如果约束变量和自由变量重名会怎么样。

Bound Substitution

考虑一个情况:
根据定义 * 式有 . 显然这是错误的,因为我们只能替换自由变量。只有 beta 化简才能把 lambda 去掉(将 y 去约束):
那么只能加强 * 式,依赖于 等价,我们要求所有的抽象(Abstraction)不得与替换的变量名相同,即
现在上述情况就不合法了,必须要写成等价的 . (将 y 替换为 x 的这一步用到了 alpha 变换,经过这一变换得到的式子之间称 (alpha) 等价)

Accidental Capturing

不幸的是 * 式仍有问题。考虑
很显然这也是错的,x 本是一自由变量,经过替换却因为重名被捕捉(Capture),成为了约束变量。显然,还要规定 不是 中的自由变量:

insights

上述两种情况就是约束变量和自由变量重名所带来的问题。本质上来说都是命名问题,都可以通过 alpha 变换来解决。
然而这在变量较多的情况且有大量重名的时候还是令人头疼,而且上面也说过约束变量的名字并不重要,重要的是 scope 和 binder indices,所以出现了 de Brujin 记法。
考虑
可以看到第一个 lambda 的 scope 是第一对括号;第二个是第二对。内层可以看见外层。
在同一个作用域里讨论,根据 lambda 的序数就能确定其绑定的变量。上面的式子可写成
这东西其实挺抽象的,所以没啥人用。也不是本文的重点,不赘述。
而除此之外的另一种解决方法就是组合子(Combinators)。

Combinator Defined

Preliminary

🤓
Definition. 称一个不含自由变量的 lambda term 为闭合(closed)的,亦称为组合子
这个「组合子」就是 combinatory logic 中的 combinator. 组合子逻辑的提出就是为了消灭量词(quantifier)。在这点上谓词逻辑和 lambda calculus有相似之处:一阶逻辑量化变元,二阶逻辑量化谓词,量词除了量化,还有一关键作用即绑定(bind)变量,所以也是一个 binder,这和 lambda 有异曲同工之妙:后者也是 binder. 且只要涉及绑定,替换问题就是不可避免的。
  • 所以说组合子通过消灭变量的约束与自由之区别解决了重名问题——大家(指变量)的地位都一样。
它在 CS 中也是一种简化的计算模型,和 lambda 演算一样。(所以它也满足 Godel 不完备定理,这是可以预想的)在计算能力上,图灵机、lambda 演算和组合子逻辑三者等价,但组合子逻辑没有抽象也就是说它不像 lambda 演算那样需要注意语义(semantics),以免出现上述的替换问题。替换是建立在抽象上的。组合子逻辑根本没有替换这一说。
实际上,我们只需要两个组合子和 application 就能够编码(encode, 注意和 notation 的区别:前者更像一种 implementation,而后者则是换一种记法) lambda 演算。

吉祥三宝

最原始的三个组合子也叫做 Primitive Functions. 不妨用闭合的 lambda term 来定义它:
有人做了一个 Unlambda 语言,只有 S 和 K 两个 language primitives.即组合子演算 . 个人认为 esoteric programming language 之王非其莫属——连变量都没有(但组合子逻辑有)的语言多难写啊,相比之下 brainfuck 都简单许多。当然这没有什么实际用途,只是用于研究而已。
(当然咯,纯 lambda calculus 也只有函数,也是另一个极端。此处不赘述 Church Encoding)
上面这三个组合子,光看定义不太好表示意思,不妨用人话叙述一遍:
  • I 即 Identity(应该吧?Schönfinkel 是德国人,不好说),返回自身。
  • K 即 Konstant(应该吧?),输入一个 x 则得到一个返回 x 的常函数,接着再输入一个 y 就能得到 x。(lambda calculus 中只有一元函数,相当于 currying),也可以看成 K 返回一个2-元组的第一个元素。
  • S 即 Substitute(应该吧?),即 apply x to y. 但这个 Application 是 generalized 的:x, y 是在 z 下(所以 x, y 首先应用于 z)进行运算的。可以把 z 看成一个环境(environment)。例如 .
根据定义,I 即 Identity,这个组合子可以由 K、S 导出。想法是什么呢?
—— S 可以看成把 z 注入(插入到)x y 之后,搭配上 K,就是插入到 K 的第一个参数上,而该参数正好是 K 返回的函数会返回的值。因为我们希望实现 I,那么最终应该返回 x,不妨搭配 S K 和 x 来算一算:
正好,这正是 I 所作的事情。这一方面可以从上面的叙述看出来,还可以直接替换成 lambda 演算的定义具体地查看运算过程。
这三个组合子构成的演算系统称为 SKI-Calculus. 我们之后会互换着用组合子与lambda 演算,可以把前者视作后者的 extension.
  • 显然,combinator 本质上是 higher-order function. This is what combinator logic is all about.

Duplicator

现在我们来考虑一个组合子 M(或者 U),这个组合子需要一个参数,并返回这个参数作用于自身。即
所以叫它 duplicator. (M as in Mockingbird. Not a reference to Eminem’s album of the same name.) 这个东西看起来十分 Trivial,但他其实有个很有趣的性质:任何与之结合的 combinatory terms 都不能化简(reduce),即 . 把上述化成 lambda 项可以有助于理解:
eval 一下,发现 . 即 M 化简后仍是其本身,说明它是不可化简(irreducible)的。这个特殊情况叫做 term,它没有值。(, Paradoxical Combinator)这个性质很好的地方在于,每次求值都能生成一个函数调用(invocation),可以用作 invocation generator.
此外还有一个性质是 当 M 被用作参数的时候表达式是否可解与求值顺序有关,这个是显而易见的。
考虑 .
  • 如果是 call-by-value/eager eval,会首先去求解 ,显然 M 永远求不出值,程序也就不能终止。
  • 如果是 call-by-name,会首先将 M 代入式子中,最后计算,显然值为 1.
这和我们 PL 的 evaluation order产生了联系:大部分 fp 语言都是 CBV,但是像 haskell 使用 lazy eval,就是一种 call-by-need 求值方式,类似 call-by-name,比其更高效。
上面给出的是其直接定义,我们当然也能用 S I 编码 M——想法也很简单,因为 S 将最后的参数插入到前两个参数之后,I 能返回本身,猜想能够用 S I 实现 M,算一算:
这正好就是 M. 就是一种简单的实现。
除此之外还有一个组合子 W(或者 H),
先记住这些组合子,之后可以用到。关于对组合子逻辑更加全面的叙述,参阅 SEP 同名条目:

Recursion Implemented

Factorial

考虑老生特别常谈的 (Lean & Lisp):
def fact: ℕ -> ℕ | 0 => 1 | n + 1 => (n + 1) * fact n
(defun fact (n) (if (zerop n) 1 (* n (fact (1- n)))))
先不管是不是 tail-recursive,那不重要
现在假设我们要在 lambda 演算中实现这个递归定义,无非就是:
这里我们预设 if zerop Pred * 都已经编码(怎么编码参见 Church Encoding)好了,常数加顶表示 Church 数。
很可惜这是一个 invalid lambda expression——因为 lambda 演算只有匿名函数,没有递归定义(当然任何有 this 的语言都不在乎这个)——所以这只能是一个递归的等式。 或者说一个方程,我们就是要求解 fact .
  • 因此我们实际上是要构造一个 lambda 项 fact 使得其满足上面这一等式。
这听上去像什么——联系一下不动点的定义:,把它扩展到高阶函数的范围。
我们把等号左边的 fact 改写成 ,构造一个映射
可以发现这个映射长得很像上面的 fact ,那么它什么时候能成为 fact
——当 fact 成为 的不动点的时候(这就是我们的目标的等价叙述)。即
事实上任何一个 的不动点都能构造出 fact ——任意一个不动点都是 式的解,并且只要这个不动点接收到一个 的输入,就一定会返回 .
  • 换句话说,递归方程的解是该方程对应的函数的一个不动点。
  • 长得像 fact 但不是 fact ——根据其作用,不妨称其为 cons-fact 吧,即 fact 构造子。
那么当前的首要任务就是如何通过 构造 fact —— 构造 的不动点。

Attracting Fixed Points

可以把 想成是一个解开(unwind)递归定义的工具,fact 的一个逼近。每将 应用到 上一次, 就越接近 fact 一点,即如果 上与 fact 相等,那么 则是在 上和 fact 相等,还可以归纳出 一定在 上和 fact 相等。 在这里做的事就是用上一个近似计算下一个近似。可以标注一下其类型:
——如果注意力够集中,马上就能注意到这就是吸引不动点。Recall the definition:
🤓
Definition. 吸引不动点 的一个不动点 s.t. 对关于 的足够小的邻域 内的任何 而言,序列 收敛于 .
根据定义可知,随着迭代的进行,这个序列收敛于不动点 fact 。重点是,如何使邻域足够小?一个很直觉的判断是,只要让 长得像 fact 即可——显然目前我们手上和 fact 长得最像的就是函数 . 所以我们不妨试试让 apply to itself. 即 . (A Reference to the famous song TT by TWICE)
  • 但这也不够好——我们想从头构造出 fact 来。而程序需要终止——对于收敛于不动点的无穷序列来说,程序终止的那一刻我们只能获得一个近似,而非 fact 本身。
  • 那么有没有办法构造出不动点 fact 本身呢?也许我们需要改写 了。

Recursion by Self-Application

上面说过,要将 应用于自身。回想其定义,有一个细节即等号右边只有一个 . 因为现在要应用于自身(self-application), 的第一个参数应该是自身,在 body 中我们也要这样做,这样才能在递归过程中保持这个性质—— 的参数总是它自己。(有时也称 是一个自指(self-referencing)参数,为了函数体内对 的下一次 invocation 能够引用自身,所以要把自身作为参数传入)这样我们称改写后的
🤓
注意力好的同学会发现,.
然后我们来算 . Recall the substitution definition, we have
真是条条大路通罗马呀(万常选等. 数据库系统原理与实践
太妙了!这是什么?
这不就是我们心心念念的 的不动点吗——结束了。
为了验证一下,可以算算
这可能有点不好看,换成 fact
很明显, 在 lambda calculus 下递归定义了 fact .
这给我们的 insight 就是:在没有递归的计算系统内实现递归,self-application 是关键的。

Y

回想一下我们前面所作的事:
  1. 构造了一个函数 ,描述了某个递归定义,接着我们要寻找它的不动点
  1. 于是我们构造了
  1. 接着我们求一步
这就是一个 的不动点,同时可以发现上面的描述是一般化的,和 具体是什么没有关系。所以不妨把 提出来,构造
则有 ,Y 输入一个函数,返回该函数的不动点,称这样的组合子为不动点组合子。
显然,不动点组合子有一个不证自明的性质(即定义):
有意思的一点是, 返回的不动点正是上边组合子逻辑所述的 ——一个无限的调用生成器。 再比对一下上面 M、W 组合子的形式,很容易得出
我们把这样构造的不动点组合子称为 Y combinator.
我们完成了把 Y 用 lambda abstraction 定义的壮举。Remarkable.

Alternate Y

不动点组合子有无数种构造方式。上述是 Curry 的构造。Klop, 2007 给出了一种很抽象(adj. 难以理解的,cf. lambda abstraction)的构造:
where
不过这个是整活,带进去算一算就能验证了。 此外还有一个有名的 fixpoint combinator 是 Turing 发现的,使用上述方法也能 derive 出来,此处略。

Implementation Restrictions

Curry 的构造可以直接翻译成 unityped lazy Racket:
#lang lazy (define Y (λ (f) ((λ (x) (f (x x))) (λ (x) (f (x x)))))) (define fact (Y (λ (fact) (λ (n) (if (zero? n) 1 (* n (fact (- n 1))))))))
只能说没有辜负 PLT Scheme 的大名
许多语言不能做到这一点的原因是
  1. 静态类型语言不能很好地处理递归类型( 可能没有 finite type)(我们上面讨论的是动态类型的 lambda calculus/ Unityped LC)严格来说不算问题。
  1. CBV (计算 时先求 ,而不是先代入,不能终止)——会爆栈

Z

arguments to functions are evaluated before the function is applied.
先来看问题2.(问题 1看心情)回顾一下什么是 CBV:函数只会被应用于值上(换句话说,函数的参数必须被化简到不能化简为止)。什么是值——任何一个 lambda abstraction (i.e. a function definition) 都是值。
不妨手算一下 CBV 的求值过程:
essentially, it’s .
  • compare to that of CBN:
正是因为 CBN 不在乎函数的参数是否为值,只是把它看成一个整体代入,所以 不会无限求值。 本身是一个高阶函数,接收到 后返回的仍然是函数,即 abstraction. 所以终止求值。
但为什么 CBV 无法终止呢?这个问题很明显出现在 式的 中的 self-application 部分:CBV 不停地求出 而不是先 apply to ,导致爆栈。
怎样防止 在 CBV 下求值?上面说过,一个 abstraction 就是值。因此,我们可以把 包装在一个 lambda abstraction 中,即
这样, 只有在接受到参数(例如 fact n,某种意义上这也是一个 counter. 保证了递归能够终止,达到了迭代的效果。参见 Structural/Well-founded Recursion.)的时候才会求值。那么我们有
在 CBV 下手算可以加深我们的理解:
可见, 有效地阻挡了一步(lacking one) 式,避免了递归展开。
——那么为什么使用 而不是 ?明明后者也有同样的效果。
  • 这个问题是我初学时遇到的,却没见什么人讲。
在我们计算之前,可能不好看出;现在根据上面的计算过程,可以发现 的作用是承担了递归函数的参数位置,例如 fact nn. 如果没有它,相当于我们在递归的过程中丢失了 n 使得递归规模没有办法减小(不是 Structural Recursion),无法终止了。
这样,我们由 构造出了一个新的组合子,一般称它为 Z combinator.
  • 理论上,现在我们也能在 CBV 中使用递归了—— 直接把定义翻译成代码试试。
(defun Y (f) ; actually it's `Z` ((lambda (x) (funcall f (lambda (y) (funcall (funcall x x) y)))) (lambda (x) (funcall f (lambda (y) (funcall (funcall x x) y)))))) (defun cons-fact (f) ; T (lambda (n) (if (zerop n) 1 (* n (funcall f (1- n)))))) (defun cons-ackermann (f) (lambda (m n) (cond ((zerop m) (1+ n)) ((and (> m 0) (zerop n)) (funcall f (1- m) 1)) ((and (> m 0) (zerop n)) (funcall f (1- m) (funcall f m (1- n))))))) (funcall (Y #'cons-fact) 4) ; => 24 (mapcar (Y #'cons-fact) '(0 1 2 3 4 5 6 7 8 9 10)) ; => (1 1 2 6 24 120 720 5040 40320 362880 3628800) (defun fact (n) (if (zerop n) 1 (* n (fact (1- n))))) (defun tfact (n &optional (acc 1)) ; tail-recursive (if (zerop n) acc (tfact (1- n) (* n acc)))) (defun ffact (n) (reduce #'* (iota n))) ; any sane person would do
funcall 很丑,谁叫 CL 是 Lisp-2 呢
上面给出几种实现,我们不妨 profile 一下看看如何:
(mapcar (lambda (f) (time (funcall f 20000))) `(,#'fact ,#'tfact ,#'ffact ,(Y #'cons-fact)))
;; Order: fact -> tfact -> ffact -> (Y #'cons-fact) Evaluation took: 0.057 seconds of real time 0.104721 seconds of total run time (0.092086 user, 0.012635 system) [ Real times consist of 0.003 seconds GC time, and 0.054 seconds non-GC time. ] [ Run times consist of 0.003 seconds GC time, and 0.102 seconds non-GC time. ] 184.21% CPU 332,241,824 bytes consed Evaluation took: 0.064 seconds of real time 0.062912 seconds of total run time (0.053324 user, 0.009588 system) [ Real times consist of 0.015 seconds GC time, and 0.049 seconds non-GC time. ] [ Run times consist of 0.015 seconds GC time, and 0.048 seconds non-GC time. ] 98.44% CPU 339,602,176 bytes consed Evaluation took: 0.060 seconds of real time 0.059242 seconds of total run time (0.038523 user, 0.020719 system) 98.33% CPU 303,779,168 bytes consed Evaluation took: 0.061 seconds of real time 0.060913 seconds of total run time (0.041709 user, 0.019204 system) [ Real times consist of 0.002 seconds GC time, and 0.059 seconds non-GC time. ] [ Run times consist of 0.001 seconds GC time, and 0.060 seconds non-GC time. ] 100.00% CPU 304,801,856 bytes consed
SBCL 下,可以发现 1e5 左右的数据还是没问题的,大家都差不多。内存上肯定是 ffact 最小,不需要 GC. 考虑这一点和 GC 时间的话,肯定是它最高效
  • Y combinator 的效率是很低效的。不如说整个 Church Encoding 都是。
另外其实有一点,注意 Y 的代码,其中的两个 lambda 代码重复了,浪费:
(defun Y (f) ((lambda (x) ; W (funcall f (lambda (y) (funcall (funcall x x) y)))) (lambda (x) (funcall f (lambda (y) ; W (funcall (funcall x x) y))))))
Ad nauseam. 还记得 M W 组合子吗?我们再来改进一下上面的代码。
(defun Y (f) ((lambda (x) ;; M (funcall x x)) (lambda (x) (funcall f ;; W (lambda (y) (funcall (funcall x x) y))))))
这是什么?这就是 . 一切都融会贯通了。我更喜欢后面这个构造。
如果允许递归实现(= 脱裤子放屁)的话,那不用脑:
(defun Y (f) (lambda (x) (funcall (funcall f (Y f)) x))) ; Y f = f (Y f)
  • 对任何的 unityped/untyped(in the sense of not statically typed) language 来说, 都是可以轻松实现的。
用 J 还能写 Tacit 版本的,但是真的太抽象
The Y combinator can be implemented indirectly using, for example, the linear representations of verbs (Y becomes a wrapper which takes an ad hoc verb as an argument and serializes it; the underlying self-referring system interprets the serialized representation of a verb as the corresponding verb):
Y=. ((((&>)/)((((^:_1)b.)(`(<'0';_1)))(`:6)))(&([ 128!:2 ,&<)))

Addendum: Typed Fixpoint

(Requires basic prior knowledge of type theory, my dumb ass forgot about problem 1 after solving problem 2)

the Rectype Problem

In many forms of (multi-) typed - calculus (and more general type theory), a fixed-point combinator cannot be constructed, because there is no type whose terms can be applied to themselves. This is usually intentional, because it avoids the nontermination inherent in the existence of a fixed-point combinator.
Consider the combinator mentioned above — M can be defined (hypothetically) as . Which, for to be valid, must have the type for some type . Yet has type , so we have which does not hold (can’t unify both , since no type expression can be a subexpression of itself) under the assumption of STLC. Thus neither. — , , or generally, self-application plays a crucial part of the combinator. If M is ill-typed, so does Y.
  • Therefore it’s impossible to construct a in STLC.
In fact, making the fixed-point combinator inexpressible was the reason for introducing types into lambda calculus in the first place. — To prevent from using untyped lambda calculus as a logic as one can indeed perform non-terminating (i.e. ill-founded recursion) calculation using self-application. (that is, the whole point of this post, fixpoint combinator of course!)
— Any non-terminating program cannot be implmented in , without breaking its strong normalization property of course.
  • That’s why we have to write structural recursion or prove recursion can terminate in theorem prover. e.g. the gcd fib in lean:
See
Blame Prism.js for not supporting Lean. (jk)
def gcd (a b: ℕ): ℕ := match b with | 0 => a | b + 1 => gcd (b + 1) (a % (b + 1)) termination_by b -- structural recursion can't be used. decreasing_by -- prove that the recursion ends. i.e. a % (b + 1) < b + 1 simp_wf; apply mod_lt _ (zero_lt_of_ne_zero _); apply add_one_ne_zero -- `unsafe` works as expected but can't be used in proof unsafe def gcd_unsafe: ℕ -> ℕ -> ℕ | a, 0 => a | a, b + 1 => gcd_unsafe (b + 1) (a % (b + 1)) def fib(n: ℕ) := match n with | 0 => 0 | 1 => 1 | k + 2 => fib (k + 1) + fib k -- in pattern matching one must use structural recursion -- i.e. k + 2 (= (succ (succ 2)) ) instead `k => fib (k - 1) + fib (k - 2)` -- as it overlaps with case 0 and 1 because ∀k(k: ℕ -> k >=0). -- though lean performs top-down matching thus the recursion does terminate, -- it's nonterminate *definitionally*.
but this is off topic.

Fixing STLC

STLC isn’t turing complete yet, we can fix (cringe 🤓-tier joke yikes) that by adding unbounded recursion.
Recall : . We already know is a higher-order function which takes i.e. the actual definition of a recursive function. Assume has type . yields an fixpoint of which is — of course . Thus the body of may be written as where performs self-application within (thatis, the actual definition. are in the scope inside ). In short, and are of type .
Should we not care about strong normalization, we may add a primitive operator as
This extends the definiton of expression.
Literally this means is the fixpoint of .
Consider the good ol’ fact. In this case, cons-fact is and fact is . We can define fact as
So what does do with it? Recall . This also serves as a operational semantics, we just rewrite with the definition above:
(Operational semantics really are just reduction rules.)
Typing rule of this is also straightforward:
  • Combined, we get
There’s also denotational semantics for it, but spare me. Reference can be used to implement recursion as well, but spare me that too.
Now we can write non-terminate programs in , making it turing complete.
  • However, we did not solve problem 1. As most language don’t use simply typed lambda calculus — Even if they do, it’s pretty much impossible to add a primitive.

Polymorphism

In polymorphism (using System F as an example) lambda calculus, the combinator is typable and universally quantified over one type.
Only instantiate the first . We can prove this well-typed by
The problem is that we instantiate that universal quantifier with itself, which is self-referential (a infinite rectype of course). This inevitably leads to paradox (called impredicativity I think, hard word), thus some type systems reject it.
  • Haskell does not allow rectypes by default even though it utilize System F. But can be broken with newtype .
  • But is still ill-typed.
There’s also let-polymorphism, but spare me.

Rectype for Real

We can define a polymorphic Binary Tree with recursive type in most languages:
inductive BTree (α : Type) : Type where | Nil | Node : α -> BTree α -> BTree α -> BTree α
Or in java
public class BTree<T> { T data; BTree<T> l, r; public String toString() { return this.l.l.l.l.l.r.r.r.r.r.r.r.data.toString(); } }
Basically, we would like the type (nullable) to satisfy
Polymorhpism isn’t the actual problem here so we’ll just instantiate with . Thus the solution of
is . Doesn’t it sound familiar ? — it is a fixpoint for the RHS.
We’ve seen the fixpoint combinator , now we can define the same thing, but for types.
introducing the fixpoint type constructor. We use to denote the solution of . i.e.
Intuitively, we repeat this subtitution process, by replacing with . If we are devoted enough (that is, non-terminate), eventually the RHS of this equation will contain no more ’ s, thus we obtain the solution . Such process is called unwinding/unfolding.
  • Notice the use of in above expression. This means we are treating and it’s substitution (expansion) as the same type. This approach is called equirecursive.
However, we can take another approach called isorecursive which deems the types not interchangable (but they are isomorphic).
This isomorphic property is maintained by two operations, unfolding and folding. Folding is just the reverse of unfolding:
There are also typing rules but surely do not need further explanation. The differences between these two approaches really aren’t something big (one does these operation implicitly and the other doesn’t). If one implements some recursive procedures using isomorphic types then the implementation is still valid with the other.
Now we are ready to solve problem 1.

Implementation

We shall take the isomorphic approach as it works with both.
To maintain logical consistencies, Lean and other theorem provers that use dependent type are very strict about well-foundedness of recursion and non-termination won’t pass their type checkers. We prepend definition w/ unsafe to bypass that.

Mu

As is treated as a independent type, we define a type which binds the type expression and a as its constructor:
unsafe inductive μ (α : Type u) : Type u | fold : (μ α -> α) -> μ α
We use μ α to express , α to express the type after unfolding, which means we assume μ α is referenced within α. the operation means if we have a expansion μ α -> α going on, we can fold it to be μ α .
We also define a unfold function to destructure the rectype constructed by fold :
unsafe def unfold : μ α -> (μ α -> α) | .fold f => f
unfold can be seen as a destructor, contrary to fold . But given μ being a rectype, unfold can’t really destruct the type to be free of μ by definition.
Now recall . We can now give it rectypes. To type , we just need to find the solution of . Previously this wasn’t possible with STLC. But rectype seems to be it.
So is of type :
  • let’s try unfolding it and see what happens.
notice , therefore is valid. Now we can type as
consequently, we have , Now we may easily code it in lean:
unsafe def M (x : μ α) : α := unfold x x

Typed Y

Finally, recall the definition of :
For we have , suppose we have bound beforehand, which is of type ; is just like so , now we can deduce the type of :
And we can implement as
unsafe def fix (t : (α -> α) -> α -> α) : α -> α := let Wt : μ (α -> α) := .fold (λ f => t (unfold f f)) M Wt
The whole implementation is
unsafe inductive μ (α : Type u) : Type u | fold : (μ α -> α) -> μ α unsafe def unfold : μ α -> (μ α -> α) | .fold f => f unsafe def M (x : μ α) : α := unfold x x unsafe def fix (t : (α -> α) -> α -> α) : α -> α := let Wt : μ (α -> α) := .fold (λ f => t (unfold f f)) M Wt unsafe def gen_factorial (f : ℕ -> ℕ) (n : ℕ) : ℕ := if n = 0 then 1 else n * f (n - 1) #eval fix gen_factorial 25 -- 15511210043330985984000000 -- recursive version unsafe def fix_rec (f : (α -> α) -> α -> α) (x) := f (fix_rec f) x
dig deeper about gen_factorial
(You may notice the gen_factorial looked a bit like the CPS-style of fact .)
(** pseudocode, does not actually work *) let fact_cps = let rec fix x = x (fix x) in fix (fun f n k -> if n = 0 then k 1 else f (n - 1) (fun acc -> k (n * acc)))
Caveats
The strictness of Lean is weird. According to De Moura Lean is a strict language. But we may observe that the implementation above is the actual Y combinator, not Z.
  • Try changing the type of t in fix to α -> α (And subsequent any other type in Wt): Now it stackoverflows. I did not came up with a reasonable explanation for this evaluation semantics.

I asked. See:
  • Eta-{reduction, expansion}, suppose is a term:
e.g. consider , forall , holds. By functional extensionality we may say . eta-{reduction, expansion}’s construction of guarantees this extensionality under the assumption of .
Intuitively this wraps the term in a abstraction to create a thunk which delays evaluation, like what we did with . Though Lean is a strict language, it performs this expansion thus it’s not as strict as OCaml or some other languages.
See also
Phew, wasn’t so hard right?
— but it takes some time to fully understand this < 10 lines of code, surely.
There’s also a java version as well:
import java.util.function.Function; public interface YCombinator { interface RecursiveFunction<F> extends Function<RecursiveFunction<F>, F> { } public static <A,B> Function<A,B> Y(Function<Function<A,B>, Function<A,B>> f) { RecursiveFunction<Function<A,B>> r = w -> f.apply(x -> w.apply(w).apply(x)); return r.apply(r); } public static void main(String... arguments) { Function<Integer,Integer> fac = Y(f -> n -> (n <= 1) ? 1 : (n * f.apply(n - 1)) ); System.out.println("fac(10) = " + fac.apply(10)); } }
but syntax-wise, I prefer the ML family one.
 
Or in MoonBit, a Rust-like language me and my colleagues are working on:
(Unlike Lean, MoonBit is CBV. We use Z combinator)
enum Mu[T] { // uppercase for type constructor Fold((Mu[T]) -> T) } fn unfold[T](folded : Mu[T]) -> (Mu[T]) -> T { match folded { Fold(f) => f } } fn m[T](x : Mu[T]) -> T { // lowercase for func name unfold(x)(x) } fn fix[T](f : ((T) -> T) -> (T) -> T) -> (T) -> T { let wt : Mu[(T) -> T] = Fold(fn(x) { f(fn(t : T) { unfold(x)(x)(t) }) }) m(wt) } fn make_factorial(f : (Int) -> Int) -> (Int) -> Int { fn(n : Int) { if n == 0 { 1 } else { n * f(n - 1) } } } fix(make_factorial)(5) // => 120

后记

本来想着随便写点的,没想到最后是又臭又长。但我觉得是很充实了。
本文的写作时间在晚上 9 点到凌晨 3 点之间,持续了将近一周,实在是教人心力交瘁。(虽然一个月前就定了题)
为什么要写这篇文章呢,主要是面试被问到了,紧张之中没有写出来,虽然面试结果令人满意,但是这仍然是个小遗憾。因此写作本文复盘。

七,一一