Two-layer "Y-style" combinator. Is this common? Does this have an official name? - scheme

I've been looking into how languages that forbid use-before-def and don't have mutable cells (no set! or setq) can nonetheless provide recursion. I of course ran across the (famous? infamous?) Y combinator and friends, e.g.:
http://www.ece.uc.edu/~franco/C511/html/Scheme/ycomb.html
http://okmij.org/ftp/Computation/fixed-point-combinators.html
http://www.angelfire.com/tx4/cus/combinator/birds.html
http://en.wikipedia.org/wiki/Fixed-point_combinator
When I went to implement "letrec" semantics in this style (that is, allow a local variable to be defined such that it can be a recursive function, where under the covers it doesn't ever refer to its own name), the combinator I ended up writing looks like this:
Y_letrec = λf . (λx.x x) (λs . (λa . (f ((λx.x x) s)) a))
Or, factoring out the U combinator:
U = λx.x x
Y_letrec = λf . U (λs . (λa . (f (U s)) a))
Read this as: Y_letrec is a function which takes a to-be-recursed function f.
f must be a single-argument function which accepts s, where s is the function
that f can call to achieve self-recursion. f is expected to define and return
an "inner" function which does the "real" operation. That inner function accepts
argument a (or in the general case an argument list, but that can't be expressed
in the traditional notation). The result of calling Y_letrec is a result of calling
f, and it is presumed to be an "inner" function, ready to be called.
The reason I set things up this way is so that I could use the parse tree form of the
to-be-recursed function directly, without modification, merely wrapping an additional
function layer around it during transformation when handling letrec. E.g., if the
original code is:
(letrec ((foo (lambda (a) (foo (cdr a))))))
then the transformed form would be along the lines of:
(define foo (Y_letrec (lambda (foo) (lambda (a) (foo (cdr a))))))
Note that the inner function body is identical between the two.
My questions are:
Is my Y_letrec function commonly used?
Does it have a well-established name?
Note: The first link above refers to a similar function (in "step 5") as the "applicative-order Y combinator", though I'm having trouble finding an authoritative source for that naming.
UPDATE 28-apr-2013:
I realized that Y_letrec as defined above is very close to but not identical to the Z combinator as defined in Wikipedia. Per Wikipedia, the Z combinator and "call-by-value Y combinator" are the same thing, and it looks like that is indeed the thing that may be more commonly called the "applicative-order Y combinator."
So, what I have above is not the same as the applicative-order Y combinator as usually written, but there is almost certainly a sense in which they're related. Here's how I did the comparison:
Starting with:
Y_letrec = λf . (λx.x x) (λs . (λa . (f ((λx.x x) s)) a))
Apply the inner U:
Y_letrec = λf . (λx.x x) (λs . (λa . (f (s s)) a))
Apply the outer U:
Y_letrec = λf . (λs . (λa . (f (s s)) a)) (λs . (λa . (f (s s)) a))
Rename to match Wikipedia's definition of the Z combinator:
Y_letrec = λf . (λx . (λv . (f (x x)) v)) (λx . (λv . (f (x x)) v))
Compare this to Wikipedia's Z combinator:
Z = λf . (λx . f (λv . ((x x) v))) (λx . f (λv . ((x x) v)))
The salient difference is where the function f is being applied. Does it matter? Are these two functions equivalent despite this difference?

Yes, it is an applicative-order Y combinator. Using U inside it is perfectly OK, I did it too (cf. fixed point combinator in lisp). Whether the usage of U to shorten code has a name or not, I don't think so. It's just an application of a lambda-term, and yes, it makes it clearer IMO too.
What does have a name, is eta-conversion, used in your code to delay evaluation under applicative order, where arguments' values must be known before functional application.
With U applied through and through and eta-reduction performed on your code ( (λa.(f (s s)) a) ==> f (s s) ), it becomes the familiar normal-order Y combinator - i.e. such that works under normal-order evaluation, where arguments' values aren't demanded before functional application, which might end up not needing them (or some of them) after all:
Y = λf . (λs.f (s s)) (λs.f (s s))
BTW the delaying can be applied in slightly different way,
Y_ = λf . (λx.x x) (λs.f (λa.(s s) a))
which also works under applicative-order evaluation rules.
What is the difference? let's compare the reduction sequences. Your version,
Y_ = λf . (λx . (λv . (f (x x)) v)) (λx . (λv . (f (x x)) v))
((Y_ f) a) =
= ((λx . (λv . (f (x x)) v)) (λx . (λv . (f (x x)) v))) a
= (λv . (f (x x)) v) a { x := (λx . (λv . (f (x x)) v)) }
= (f (x x)) a
= | ; here (f (x x)) application must be evaluated, so
| ; the value of (x x) is first determined
| (x x)
| = ((λx . (λv . (f (x x)) v)) (λx . (λv . (f (x x)) v)))
| = (λv . (f (x x)) v) { x := (λx . (λv . (f (x x)) v)) }
and here f is entered. So here too, the well-behaved function f receives its first argument and it's supposed not to do anything with it. So maybe the two are exactly equivalent after all.
But really, the minutia of lambda-expressions definitions do not matter when it comes to the real implementation, because real implementation language will have pointers and we'll just manipulate them to point properly to the containing expression body, and not to its copy. Lambda calculus is done with pencil and paper after all, as textual copying and replacement. Y combinator in lambda calculus only emulates recursion. True recursion is true self-reference; not receiving copies equal to self, through self-application (however smart that is).
TL;DR: though language being defined can be devoid of such fun stuff as assignment and pointer equality, the language in which we define it will most certainly have those, because we need them for efficiency. At the very least, its implementation will have them, under the hood.
see also: fixed point combinator in lisp , esp. In Scheme, how do you use lambda to create a recursive function?.

Related

Understanding church numerals

I'm working my way through SICP, and it gives the following definition for zero for Church Numerals:
(define zero (lambda (f) (lambda (x) x)))
I have a few questions about that:
Why the complicated syntax? It seems to be quite readable by just having the following instead:
(define (zero f)
(lambda (x) x))
where we can see it's a function called zero that takes one (unused) argument f and returns a function-of-one-parameter that will return its parameter. It almost seems like the definition is just intended to be as non-straightforward as possible.
What is the x there for? For example doing something like:
((zero square) 100)
returns 100. Is x just the default value returned?
There is no x in (lambda (x) x). None.
The x in (lambda (x) x) is bound. It could be named by any name whatever. We can not talk about x in (lambda (x) x) any more than we could talk about y in (lambda (y) y).
There is no y in (lambda (y) y) to speak of. It is just a placeholder, an arbitrary name whose sole purpose in the body is to be the same as in the binder. Same, without regard for which specific name is used there as long as it is used twice -- first time in the binder, and the other time in the body.
And in fact there is this whole 'nother notation for lambda terms, called De Bruijn notation, where the same whole thing is written (lambda 1). With 1 meaning, "I refer to the argument which the binder 1 step above me receives".
So x is unimportant. What's important is (lambda (x) x) which denotes a function which returns its argument as is. The so called "identity" function.
But even this is not important here. The Church encoding of a number is really a binary function, a function expecting two arguments -- the f and the z. The "successor step" unary function f and the "zero" "value" z, whatever that might be, as long as the two go together. Make sense together. Work together.
So how come we see two unary functions there when it is really one binary function in play?
That is the important bit. It is known as currying.
In lambda calculus all functions are unary. And to represent a binary function an unary function is used, such that when given its (first) argument it returns another unary function, which, when given its (now, second) argument, performs whatever thing our intended binary function ought to perform, using those two arguments, the first and the second.
This is all very very simple if we just write it in combinatory (equational) notation instead of the lambda notation:
zero f z = z
one f z = f z
two f z = f (f z) = f (one f z) = succ one f z
succ one f z = f (one f z)
where every juxtaposition denotes an application, and all applications associate on the left, so we imagine the above being a shortcut notation for
zero f = lambda z. z
zero = lambda f. (lambda z. z)
......
......
succ = lambda one. (lambda f. (lambda z. f (one f z) ))
;; such that
succ one f z = (((succ one) f) z)
= ((((lambda one. (lambda f. (lambda z. f (one f z) ))) one) f) z)
= ....
= (f ((one f) z))
= f (one f z)
but it's the same thing. The differences in notation are not important.
And of course there is no one in lambda one. (lambda f. (lambda z. f (one f z) )). It is bound. It could just be named, I dunno, number:
succ number f z = f (number f z) = f ((number f) z)
meaning, (succ number) is such a number, which, given the f and the z, does with them one more f step compared to what number would do.
And so, ((zero square) 100) means, use the number zero with the successor step square and the zero value of 100, and have zero perform its number of successor steps for us -- that is to say, 0 steps -- starting from the zero value. Thus returning it unchanged.
Another possible use is ((zero (lambda (x) 0)) 1), or in general
((lambda (n) ((n (lambda (x) 0)) 1)) zero)
;; or even more generally, abstracting away the 0 and the 1,
((((lambda (n) (lambda (t) (lambda (f) ((n (lambda (x) f)) t)))) zero) 1) 0)
which is just another way of writing
zero (lambda x. 0) 1 ;; or
foo n t f = n (lambda x. f) t ;; and calling
foo zero 1 0
Hopefully you can see what foo is, easily. And also how to read aloud this t and this f. (Probably the original f would be better named s, for "successor", or something like that).

Understanding extra arguments in the Y Combinator in Scheme

According to RosettaCode, the Y Combinator in Scheme is implemented as
(define Y
(λ (h)
((λ (x) (x x))
(λ (g)
(h (λ args (apply (g g) args)))))))
Of course, the traditional Y Combinator is λf.(λx. f(x x))(λx. f(x x))
My question, then, is about h and args, which don't appear in the mathematical definition, and about apply, which seems like it should either be in both halves of the Combinator or in neither.
Can someone help me understand what is going on here?
Lets start off with the lambda calculus version traslated to Scheme:
(λ (f)
((λ (x) (f (x x)))
(λ (x) (f (x x)))))
I'd like to simplify this since I see (λ (x) (f x x)) is repeated twice. You can substitute the beginning there to this:
(λ (f)
((λ (b) (b b))
(λ (x) (f (x x)))))
Scheme is an eager language so it will go into an infinite loop. In order to avoid that we make a proxy.. Imagine you have + that takes two numbers, you can substitute it with (λ (a b) (+ a b)) without the result being changed. Lets do that with the code:
(λ (f)
((λ (b) (b b))
(λ (x) (f (λ (p) ((x x) p))))))
Actully this has its own name. It's called the Z combinator. (x x) is not done when f is applied only when the supplied proxy is applied. Delayed one step. It might look strange but I know (x x) becomes a function so this is exactly the same as my + substitution above.
In Lambda calculus all functions takes one argument. If you see f x y it's actually the same as ((f x) y) in Scheme. If you want it to work with functions of all arities your substitution needs to reflect that. In Scheme we have rest arguments and apply to do this.
(λ (f)
((λ (b) (b b))
(λ (x) (f (λ p (apply (x x) p))))))
This isn't neede if you only are going to use one arity functions as in lambda calculus.
Notice that in your code you use h instead of f. It doesn't really matter what you call the variables. This is the same code with different names. So this is the same:
(λ (rec-fun)
((λ (yfun) (yfun yfun))
(λ (self) (rec-fun (λ args (apply (self self) args))))))
Needless to say (yfun yfun) and (self self) does the same thing.

Composing two functions in Scheme?

I'm trying to make a Scheme function which takes the functions f(x) and g(x) and combines them into f(g(x))
Here's the attempt to my solution.
define(combine R T)(lambda(x) (R(T(x)))
What am I doing wrong?
define(combine R T)(lambda(x) (R(T(x)))
What am I doing wrong?
You have ( in the wrong place
(define (compose f g)
(λ (x) (f (g x))))

How do I make the substitution ? Scheme

How do I make the substitution? I tried to trace but I don't really get what is going on...
the code:
(define (repeated f n)
(if (zero? n)
identity
(lambda (x) ((repeated f (- n 1)) (f x)))))
f is a function and n is an integer that gives the number of times we should apply f.
....can someone help me to interpret it. I know it returns several procedures and i want to believe that it goes f(f(f(x)))
okey i will re-ask this question but in different manner, because i didn't really get an answer last time. consider this code
(define (repeated f n)
(if (zero? n)
identity
(lambda (x) ((repeated f (- n 1)) (f x)))))
where n is a positive integer and f is an arbitrary function: how does scheme operate on this code lets say we give (repeated f 2). what will happen? this is what think:
(f 2)
(lambda (x) ((repeated f (- 2 1)) (f x))))
(f 1)
(lambda (x) ((lambda (x) ((repeated f (- 1 1)) (f x)))) (f x))))
(f 0)
(lambda (x) ((lambda (x) (identity (f x)))) (f x))))
> (lambda (x) ((lambda (x) (identity (f x)))) (f x))))
> (lambda (x) ((lambda (x) ((f x)))) (f x))))
here is were i get stuck first i want it to go (f(f(x)) but now i will get (lambda x ((f x) (f x)) , the parentheses is certaintly wrong , but i think you understand what i mean. What is wrong with my arguments on how the interpreter works
Your implementation actually delays the further recursion and return a procedure whose body will create copies of itself to fulfill the task at runtime.
Eg. (repeated double 4) ==> (lambda (x) ((repeated double (- 4 1)) (double x)))
So when calling it ((repeated double 4) 2) it runs ((repeated double (- 4 1)) (double 2)))
where the operand part evaluates to (lambda (x) ((repeated double (- 3 1)) (double x))) and so on making the closures at run time so the evaluation becomes equal to this, but in stages during runtime..
((lambda (x) ((lambda (x) ((lambda (x) ((lambda (x) ((lambda (x) (identity x)) (double x))) (double x))) (double x))) (double x))) 2)
A different way of writing the same functionality would be like this:
(define (repeat fun n)
(lambda (x)
(let repeat-loop ((n n)
(x x))
(if (<= n 0)
x
(repeat-loop (- n 1) (fun x))))))
(define (double x) (+ x x))
((repeat double 4) 2) ; ==> 32
You've got a function that takes a function f and an non-negative integer n and returns the function fn, i.e., f(f(f(…f(n)…). Depending on how you think of your recursion, this could be implemented straightforwardly in either of two ways. In both cases, if n is 0, then you just need a function that returns its argument, and that function is the identity function. (This is sort of by convention, in the same way that x0 = 1. It does make sense when it's considered in more depth, but that's probably out of scope for this question.)
How you handle the recursive case is where you have some options. The first option is to think of fn(x) as f(fn-1(x)), where you call f with the result of calling fn-1 with x:
(define (repeated f n)
(if (zero? n)
identity
(lambda (x)
(f ((repeated f (- n 1)) x)))))
The other option is to think of fn(x) as fn-1(f(x)) where _fn-1 gets called with the result of f(x).
(define (repeated f n)
(if (zero? n)
identity
(lambda (x)
((repeated f (- n 1)) (f x)))))
In either case, the important thing to note here is that in Scheme, a form like
(function-form arg-form-1 arg-form-2 ...)
is evaluated by evaluating function-form to produce a value function-value (which should be a function) and evaluating each arg-form-i to produce values arg-value-i, and then calling _function-value_ with the arg-values. Since (repeated ...) produces a function, it's suitable as a function-form:
(f ((repeated f (- n 1)) x))
; |--- f^{n-1} ------|
; |---- f^{n-1}(x) ------|
;|------f(f^{n-1}(x)) ------|
((repeated f (- n 1)) (f x))
; |--- f^{n-1} ------|
;|---- f^{n-1}(f(x))--------|
Based on Will Ness's comment, it's worth pointing out that while these are somewhat natural ways to decompose this problem (i.e., based on the equalities fn(x) = fn-1(f(x)) = f(fn-1(x))), it's not necessarily the most efficient. These solutions both require computing some intermediate function objects to represent fn-1 that require a fair amount of storage, and then some computation on top of that. Computing fn(x) directly is pretty straightforward and efficient with, e.g., repeat:
(define (repeat f n x)
(let rep ((n n) (x x))
(if (<= n 0)
x
(rep (- n 1) (f x)))))
A more efficient version of repeated, then, simply curries the x argument of repeat:
(define (repeated f n)
(lambda (x)
(repeat f n x)))
This should have better run time performance than either of the other implementations.
Danny. I think that if we work repeated with small values of n (0, 1 and 2) will be able to see how the function translates to f(f(f(...(x))). I assume that identity's implementation is (define (identity x) x) (i.e. returns its only parameter as is), and that the "then" part of the if should be (identity f).
(repeated f 0) ;should apply f only once, no repetition
-> (identity f)
-> f
(repeated f 1) ;expected result is f(f(x))
-> (lambda (x) ((repeated f 0) (f x)))
-> (lambda (x) (f (f x))) ;we already know that (repeated f 0) is f
(repeated f 2) ;expected result is f(f(f(x)))
-> (lambda (x) ((repeated f 1) (f x)))
-> (lambda (x) (f (f (f x)))) ; we already know that (repeated f 1) if f(f(x))
... and so on.
Equational reasoning would be very helpful here. Imagine lambda calculus-based language with Haskell-like syntax, practically a combinatory calculus.
Here, parentheses are used just for grouping of expressions (not for function calls, which have no syntax at all – just juxtaposition): f a b c is the same as ((f a) b) c, the same as Scheme's (((f a) b) c). Definitions like f a b = ... are equivalent to (define f (lambda (a) (lambda (b) ...))) (and shortcut for (lambda (a) ...) is (\a-> ...).
Scheme's syntax just obscures the picture here. I don't mean parentheses, but being forced to explicit lambdas instead of just equations and freely shifting the arguments around:
f a b = \c -> .... === f a b c = .... ; `\ ->` is for 'lambda'
Your code is then nearly equivalent to
repeated f n x ; (define (repeated f n)
| n <= 0 = x ; (if (zero? n) identity
| otherwise = repeated f (n-1) (f x) ; (lambda (x)
; ((repeated f (- n 1)) (f x)))))
(read | as "when"). So
repeated f 2 x = ; ((repeated f 2) x) = ((\x-> ((repeated f 1) (f x))) x)
= repeated f 1 (f x) ; = ((repeated f 1) (f x))
= repeated f 0 (f (f x)) ; = ((\y->((repeated f 0) (f y))) (f x))
= f (f x) ; = ((\z-> z) (f (f x)))
; = (f (f x))
The above reduction sequence leaves out the particulars of environment frames creation and chaining in Scheme, but it all works out pretty much intuitively. f is the same f, n-1 where n=2 is 1 no matter when we perform the subtraction, etc..

Lambda calculus predecessor function reduction steps

I am getting stuck with the Wikipedia description of the predecessor function in lambda calculus.
What Wikipedia says is the following:
PRED := λn.λf.λx. n (λg.λh. h (g f)) (λu.x) (λu.u)
Can someone explain reduction processes step-by-step?
Thanks.
Ok, so the idea of Church numerals is to encode "data" using functions, right? The way that works is by representing a value by some generic operation you'd perform with it. We can therefore go in the other direction as well, which can sometimes make things clearer.
Church numerals are a unary representation of the natural numbers. So, let's use Z to mean zero and Sn to represent the successor of n. Now we can count like this: Z, SZ, SSZ, SSSZ... The equivalent Church numeral takes two arguments--the first corresponding to S, and second to Z--then uses them to construct the above pattern. So given arguments f and x, we can count like this: x, f x, f (f x), f (f (f x))...
Let's look at what PRED does.
First, it creates a lambda taking three arguments--n is the Church numeral whose predecessor we want, of course, which means that f and x are the arguments to the resulting numeral, which thus means that the body of that lambda will be f applied to x one time fewer than n would.
Next, it applies n to three arguments. This is the tricky part.
The second argument, that corresponds to Z from earlier, is λu.x--a constant function that ignores one argument and returns x.
The first argument, that corresponds to S from earlier, is λgh.h (g f). We can rewrite this as λg. (λh.h (g f)) to reflect the fact that only the outermost lambda is being applied n times. What this function does is take the accumulated result so far as g and return a new function taking one argument, which applies that argument to g applied to f. Which is absolutely baffling, of course.
So... what's going on here? Consider the direct substitution with S and Z. In a non-zero number Sn, the n corresponds to the argument bound to g. So, remembering that f and x are bound in an outside scope, we can count like this: λu.x, λh. h ((λu.x) f), λh'. h' ((λh. h ((λu.x) f)) f) ... Performing the obvious reductions, we get this: λu.x, λh. h x, λh'. h' (f x) ... The pattern here is that a function is being passed "inward" one layer, at which point an S will apply it, while a Z will ignore it. So we get one application of f for each S except the outermost.
The third argument is simply the identity function, which is dutifully applied by the outermost S, returning the final result--f applied one fewer times than the number of S layers n corresponds to.
McCann's answer explains it pretty well. Let's take a concrete example for Pred 3 = 2:
Consider expression: n (λgh.h (g f)) (λu.x). Let K = (λgh.h (g f))
For n = 0, we encode 0 = λfx.x, so when we apply the beta reduction for (λfx.x)(λgh.h(gf)) means (λgh.h(gf)) is replaced 0 times. After further beta-reduction we get:
λfx.(λu.x)(λu.u)
reduces to
λfx.x
where λfx.x = 0, as expected.
For n = 1, we apply K for 1 times:
(λgh.h (g f)) (λu.x)
=> λh. h((λu.x) f)
=> λh. h x
For n = 2, we apply K for 2 times:
(λgh.h (g f)) (λh. h x)
=> λh. h ((λh. h x) f)
=> λh. h (f x)
For n = 3, we apply K for 3 times:
(λgh.h (g f)) (λh. h (f x))
=> λh.h ((λh. h (f x)) f)
=> λh.h (f (f x))
Finally, we take this result and apply an id function to it, we got
λh.h (f (f x)) (λu.u)
=> (λu.u)(f (f x))
=> f (f x)
This is the definition of number 2.
The list based implementation might be easier to understand, but it takes many intermediate steps. So it is not as nice as the Church's original implementation IMO.
After Reading the previous answers (good ones), I’d like to give my own vision of the matter in hope it helps someone (corrections are welcomed). I’ll use an example.
First off, I’d like to add some parenthesis to the definition that made everything clearer to me. Let’s redifine the given formula to:
PRED := λn λf λx.(n (λgλh.h (g f)) (λu.x)) (λu.u)
Let’s also define three Church numerals that will help with the example:
Zero := λfλx.x
One := λfλx. f (Zero f x)
Two := λfλx. f (One f x)
Three := λfλx. f (Two f x)
In order to understand how this works, let's focus first on this part of the formula:
n (λgλh.h (g f)) (λu.x)
From here, we can extract this conclusions:
n is a Church numeral, the function to be applied is λgλh.h (g f) and the starting data is λu.x
With this in mind, let's try an example:
PRED Three := λf λx.(Three (λgλh.h (g f)) (λu.x)) (λu.u)
Let's focus first on the reduction of the numeral (the part we explained before):
Three (λgλh.h (g f)) (λu.x)
Which reduces to:
(λgλh.h (g f)) (Two (λgλh.h (g f)) (λu.x))
(λgλh.h (g f)) ((λgλh.h (g f)) (One (λgλh.h (g f)) (λu.x)))
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) (Zero (λgλh.h (g f)) (λu.x))))
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) ((λfλx.x) (λgλh.h (g f)) (λu.x)))) -- Here we lose one application of f
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) (λu.x)))
(λgλh.h (g f)) ((λgλh.h (g f)) (λh.h ((λu.x) f)))
(λgλh.h (g f)) ((λgλh.h (g f)) (λh.h x))
(λgλh.h (g f)) (λh.h ((λh.h x) f))
(λgλh.h (g f)) (λh.h (f x))
(λh.h ((λh.h (f x) f)))
Ending up with:
λh.h f (f x)
So, we have:
PRED Three := λf λx.(λh.h (f (f x))) (λu.u)
Reducing again:
PRED Three := λf λx.((λu.u) (f (f x)))
PRED Three := λf λx.f (f x)
As you can see in the reductions, we end up applying the function one time less thanks to a clever way of using functions.
Using add1 as f and 0 as x, we get:
PRED Three add1 0 := add1 (add1 0) = 2
Hope this helps.
You can try to understand this definition of the predecessor function (not my favourite one) in terms of continuations.
To simplify the matter a bit, let us consider the following variant
PRED := λn.n (λgh.h (g S)) (λu.0) (λu.u)
then, you can replace S with f, and 0 with x.
The body of the function iterates n times a transformation M over an argument N. The argument N is a function of type (nat -> nat) -> nat that expects a continuation for nat and returns a nat. Initially, N = λu.0, that is it ignores the continuation and just returns 0.
Let us call N the current computation.
The function M: (nat -> nat) -> nat) -> (nat -> nat) -> nat modifies the computation g: (nat -> nat)->nat as follows.
It takes in input a continuation h, and applies it to the
result of continuing the current computation g with S.
Since the initial computation ignored the continuation, after one application of M we get the computation (λh.h 0), then (λh.h (S 0)), and so on.
At the end, we apply the computation to the identity continuation
to extract the result.
I'll add my explanation to the above good ones, mostly for the sake of my own understanding. Here's the definition of PRED again:
PRED := λnfx. (n (λg (λh.h (g f))) ) λu.x λu.u
The stuff on the right side of the first dot is supposed to be the (n-1) fold composition of f applied to x: f^(n-1)(x).
Let's see why this is the case by incrementally grokking the expression.
λu.x is the constant function valued at x. Let's just denote it const_x.
λu.u is the identity function. Let's call it id.
λg (λh.h (g f)) is a weird function that we need to understand. Let's call it F.
Ok, so PRED tells us to evaluate the n-fold composition of F on the constant function and then to evaluate the result on the identity function.
PRED := λnfx. F^n const_x id
Let's take a closer look at F:
F:= λg (λh.h (g f))
F sends g to evaluation at g(f).
Let's denote evaluation at value y by ev_y.
That is, ev_y := λh.h y
So
F = λg. ev_{g(f)}
Now we figure out what F^n const_x is.
F const_x = ev_{const_x(f)} = ev_x
and
F^2 const_x = F ev_x = ev_{ev_x(f)} = ev_{f(x)}
Similarly,
F^3 const_x = F ev_{f(x)} = ev_{f^2(x)}
and so on:
F^n const_x = ev_{f^(n-1)(x)}
Now,
PRED = λnfx. F^n const_x id
= λnfx. ev_{f^(n-1)(x)} id
= λnfx. id(f^(n-1)(x))
= λnfx. f^(n-1)(x)
which is what we wanted.
Super goofy. The idea is to turn doing something n times into doing f n-1 times. The solution is to apply F n times to const_x to obtain
ev_{f^(n-1)(x)} and then to extract f^(n-1)(x) by evaluating at the identity function.
Split this definition
PRED := λn.λf.λx.n (λg.λh.h (g f)) (λu.x) (λu.u)
into 4 parts:
PRED := λn.λf.λx. | n | (λg.λh.h (g f)) | (λu.x) | (λu.u)
- --------------- ------ ------
A B C D
For now, ignore D. By definition of Church numerals, A B C is B^n C: Apply n folds of B to C.
Now treat B like a machine that turns one input into one output. Its input g has form λh.h *, when appended by f, becomes (λh.h *) f = f *. This adds one more application of f to *. The result f * is then prepended by λh.h to become λh.h (f *).
You see the pattern: Each application of B turns λh.h * into λh.h (f *). If we had λh.h x as the begin term, we would have λh.h (f^n x) as the end term (after n applications of B).
However, the begin term is C = (λu.x), when appended by f, becomes (λu.x) f = x, then prepended by λh.h to become λh.h x. So we had λh.h x after, not before, the first application of B. This is why we have λh.h (f^(n-1) x) as the end term: The first application of f was ignored.
Finally, apply λh.h (f^(n-1) x) to D = (λu.u), which is identity, to get f^(n-1) x. That is:
PRED := λn.λf.λx.f^(n-1) x

Resources