Understanding church numerals - scheme

I'm working my way through SICP, and it gives the following definition for zero for Church Numerals:
(define zero (lambda (f) (lambda (x) x)))
I have a few questions about that:
Why the complicated syntax? It seems to be quite readable by just having the following instead:
(define (zero f)
(lambda (x) x))
where we can see it's a function called zero that takes one (unused) argument f and returns a function-of-one-parameter that will return its parameter. It almost seems like the definition is just intended to be as non-straightforward as possible.
What is the x there for? For example doing something like:
((zero square) 100)
returns 100. Is x just the default value returned?

There is no x in (lambda (x) x). None.
The x in (lambda (x) x) is bound. It could be named by any name whatever. We can not talk about x in (lambda (x) x) any more than we could talk about y in (lambda (y) y).
There is no y in (lambda (y) y) to speak of. It is just a placeholder, an arbitrary name whose sole purpose in the body is to be the same as in the binder. Same, without regard for which specific name is used there as long as it is used twice -- first time in the binder, and the other time in the body.
And in fact there is this whole 'nother notation for lambda terms, called De Bruijn notation, where the same whole thing is written (lambda 1). With 1 meaning, "I refer to the argument which the binder 1 step above me receives".
So x is unimportant. What's important is (lambda (x) x) which denotes a function which returns its argument as is. The so called "identity" function.
But even this is not important here. The Church encoding of a number is really a binary function, a function expecting two arguments -- the f and the z. The "successor step" unary function f and the "zero" "value" z, whatever that might be, as long as the two go together. Make sense together. Work together.
So how come we see two unary functions there when it is really one binary function in play?
That is the important bit. It is known as currying.
In lambda calculus all functions are unary. And to represent a binary function an unary function is used, such that when given its (first) argument it returns another unary function, which, when given its (now, second) argument, performs whatever thing our intended binary function ought to perform, using those two arguments, the first and the second.
This is all very very simple if we just write it in combinatory (equational) notation instead of the lambda notation:
zero f z = z
one f z = f z
two f z = f (f z) = f (one f z) = succ one f z
succ one f z = f (one f z)
where every juxtaposition denotes an application, and all applications associate on the left, so we imagine the above being a shortcut notation for
zero f = lambda z. z
zero = lambda f. (lambda z. z)
......
......
succ = lambda one. (lambda f. (lambda z. f (one f z) ))
;; such that
succ one f z = (((succ one) f) z)
= ((((lambda one. (lambda f. (lambda z. f (one f z) ))) one) f) z)
= ....
= (f ((one f) z))
= f (one f z)
but it's the same thing. The differences in notation are not important.
And of course there is no one in lambda one. (lambda f. (lambda z. f (one f z) )). It is bound. It could just be named, I dunno, number:
succ number f z = f (number f z) = f ((number f) z)
meaning, (succ number) is such a number, which, given the f and the z, does with them one more f step compared to what number would do.
And so, ((zero square) 100) means, use the number zero with the successor step square and the zero value of 100, and have zero perform its number of successor steps for us -- that is to say, 0 steps -- starting from the zero value. Thus returning it unchanged.
Another possible use is ((zero (lambda (x) 0)) 1), or in general
((lambda (n) ((n (lambda (x) 0)) 1)) zero)
;; or even more generally, abstracting away the 0 and the 1,
((((lambda (n) (lambda (t) (lambda (f) ((n (lambda (x) f)) t)))) zero) 1) 0)
which is just another way of writing
zero (lambda x. 0) 1 ;; or
foo n t f = n (lambda x. f) t ;; and calling
foo zero 1 0
Hopefully you can see what foo is, easily. And also how to read aloud this t and this f. (Probably the original f would be better named s, for "successor", or something like that).

Related

Scheme High Order Functions [duplicate]

I was just beginning to feel I had a vague understanding of the use of lambda in racket and scheme when I came across the following 'alternate' definitions for cons and car in SICP
(define (cons x y)
(lambda (m) (m x y)))
(define (car z)
(z (lambda (p q) p)))
(define (cdr z)
(z (lambda (p q) q)))
For the life of me I just cannot parse them.
Can anybody explain how to parse or expand these in a way that makes sense for total neophytes?
This is an interesting way to represent data: as functions. Notice that this
definition of cons returns a lambda which closes over the parameters x
and y, capturing their values inside. Also notice that the returned lambda
receives a function m as a parameter:
;creates a closure that "remembers' 2 values
(define (cons x y) (lambda (m) (m x y)))
;recieves a cons holding 2 values, returning the 0th value
(define (car z) (z (lambda (p q) p)))
;recieves a cons holding 2 values, returning the 1st value
(define (cdr z) (z (lambda (p q) q)))
In the above code z is a closure, the same that was created by cons, and in
the body of the procedure we're passing it another lambda as parameter,
remember m? it's just that! the function that it was expecting.
Understanding the above, it's easy to see how car and cdr work; let's
dissect how car, cdr is evaluated by the interpreter one step at a time:
; lets say we started with a closure `cons`, passed in to `car`
(car (cons 1 2))
; the definition of `cons` is substituted in to `(cons 1 2)` resulting in:
(car (lambda (m) (m 1 2)))
; substitute `car` with its definition
((lambda (m) (m 1 2)) (lambda (p q) p))
; replace `m` with the passed parameter
((lambda (p q) p) 1 2)
; bind 1 to `p` and 2 to `q`, return p
1
To summarize: cons creates a closure that "remembers' two values, car
receives that closure and passes it along a function that acts as a selector for
the zeroth value, and cdr acts as a selector for the 1st value. The key
point to understand here is that lambda acts as a
closure.
How cool is this? we only need functions to store and retrieve arbitrary data!
Nested Compositions of car & cdr are defined up to 4 deep in most LISPs. example:
(define caddr (lambda (x) (car (cdr (cdr x)))))
In my view, the definitive trick is reading the definitions from the end to the beginning, because in all three of them the free variables are always those that can be found in the lambda within the body (m, p and q). Here is an attempt to translate the code to English, from the end (bottom-right) to the beginning (top-left):
(define (cons x y)
(lambda (m) (m x y))
Whatever m is, and we suspect it is a function because it appears right next to a (, it must be applied over both x and y: this is the definition of consing x and y.
(define (car z)
(z (lambda (p q) q)))
Whatever p and q are, when something called z is applied, and z is something that accepts functions as its input, then the first one of p and q is selected: this is the definition of car.
For an example of "something that accepts functions as its input", we just need to look back to the definition of cons. So, this means car accepts cons as its input.
(car (cons 1 2)) ; looks indeed familiar and reassuring
(car (cons 1 (cons 2 '()))) ; is equivalent
(car '(1 2)) ; is also equivalent
(car z)
; if the previous two are equivalent, then z := '(1 2)
The last line means: a list is "something that accepts a function as its input".
Don't let your head spin at that moment! The list will only accept functions that can work on list elements, anyway. And this is the case precisely because we have re-defined cons the way that we have.
I think the main point from this exercise is "computation is bringing operations and data together, and it doesn't matter in which order you bring them together".
This should be easy to understand with the combinatory notation (implicitly translated to Scheme as currying functions, f x y = z ==> (define f (λ (x) (λ (y) z)))):
cons x y m = m x y
car z = z _K ; _K p q = p
cdr z = z (_K _I) ; _I x = x _K _I p q = _I q = q
so we get
car (cons x y) = cons x y _K = _K x y = x
cdr (cons x y) = cons x y (_K _I) = _K _I x y = _I y = y
so the definitions do what we expect. Easy.
In English, the cons x y value is a function that says "if you'll give me a function of two arguments I'll call it with the two arguments I hold. Let it decide what to do with them, then!".
In other words, it expects a "continuation" function, and calls it with the two arguments used in its (the "pair") creation.

Using 'define' in Scheme

I'm new to Scheme and was just curious about 'define'. I've seen things like:
(define (square x) (* x x))
which makes sense [Function name 'square' input parameter 'x']. However, I found some example code from the 90's and am trying to make sense of:
(define (play-loop-iter strat0 strat1 count history0 history1 limit) (~Code for function~)
Except for the function name, are all of those input parameters?
Short answer - yes, all the symbols after the first one are parameters for the procedure (the first one being the procedures's name). Also it's good to point out that this:
(define (f x y)
(+ x y))
Is just syntactic sugar for this, and both forms are equivalent:
(define f
(lambda (x y)
(+ x y)))
In general - you use the special form define for binding a name to a value, that value can be any data type available, including in particular functions (lambdas).
A bit more about parameters and procedure definitions - it's good to know that the . notation can be used for defining procedures with a variable number of arguments, for example:
(define (f . x) ; here `x` is a list with all the parameters
(apply + x))
(f 1 2 3 4 5) ; 0 or more parameters can be passed
=> 15
And one final trick with define (not available in all interpreters, but works in Racket). A quick shortcut for defining procedures that return procedures, like this one:
(define (f x)
(lambda (y)
(+ x y)))
... Which is equivalent to this, shorter syntax:
(define ((f x) y)
(+ x y))
((f 1) 2)
=> 3
Yes, strat0 through limit are the parameters of the play-loop-iter function.
The general form for define is:
(define (desired-name-of-procedure item-1 item-2 item-3 ... item-n)
(; what to do with the items))
Another way to explain the behaviour of define, is in terms of "means of combination", and "means of abstraction".
[A] The means of combination in simple terms:
The syntax (item-1 item-2 item-3 ... ... item-n) is the fundamental means of combination provided by Scheme (and Lisp in general.)
All code is a list represented using the above pattern
The very first (leftmost) item is always treated as an operator
Parentheses enforce the application of the operator... The leftmost item is required to accept all the items that follow, as arguments
[B] means of abstraction is simply; a way to name things.
An example will demonstrate how this all folds into the idea of the define primitive...
Example--Arriving at define in a bottom-up way
Consider this expression:
(lambda (x y) (* x y))
In plain English, the above expression translates to "Create a nameless procedure that accepts two arguments, and returns the value of the their product". Note that this generates a nameless procedure.
More accurately, in terms of means of combination, Scheme provides us the keyword lambda as a primitive operator that creates user-defined procedures.
The leftmost item--lambda--is passed items (x y) and (* x y) as arguments, and the operator-application rule forces lambda to do something with the items.
The way lambda is defined internally causes it to parse the list (x y), and treat x and y as arguments to pass to the list (* x y), which lambda assumes is the user's definition of what to do when arguments x and y are encountered. Any value assigned to x and y will be processed in accordance with the rule (* x y).
Enter, means of abstraction...
Suppose I wanted to refer to this type of multiplication at several places in my program, I might tweak the above lambda expression like this:
(define mul-two-things (lambda (x y) (* x y)))
define takes mul-two-things and the lambda expression as arguments, and "binds" them together. Now Scheme knows that mul-two-things should be associated with a procedure to take two arguments and return their product.
As it happens, the requirement of naming procedures is so very common and provides so much power of expression, that Scheme provides a cleaner-looking shortcut to do it.
Like #oscar-lopez says, define is the "special form" Scheme provides, to name things. And as far as Scheme's Interpreter is concerned, both the following definitions are identical:
(define (mul-two-things x y) (* x y))
(define mul-two-things (lambda (x y) (* x y))

How do you convert to lambda syntax?

Part of a question I'm trying to understand involves this:
twice (twice) f x , where twice == lambda f x . f (f x)
I'm trying to understand how to make that substitution, and what it means.
My understanding is that (lambda x y . x + y) 2 3 == 2 + 3 == 5. I don't understand what twice (twice) means, or f ( f x ).
Two ways of looking at this.
Mechanical application of beta-reduction
You can solve this mechanically just by expanding any subterm of the form twice F X - with this term you will eventually eliminate all the occurences of twice, although you need to take care that you really understand the syntax tree of the lambda calculus to avoid mistakes.
twice takes two arguments, so your expression twice (twice) f x is the redex twice (twice) f applied to x. (A redex is a subterm that you can reduce independently of the rest of the term).
Expand the definition of twice in the redex: twice (twice) f x -> twice (twice f).
Substitute this into the original term to get twice (twice f) x, which is another redex we can expand twice in to get twice f (twice f x) (take care with the brackets in this step).
We have two twice redexes we can expand here, expanding the one inside the brackets is slightly simpler, giving twice f (f (f x)), which can again be expanded to give f (f (f (f x))).
Semantics of twice via abstraction
You can see what's going on at a more intuitive level by appealing to a higher-order combinator, the "○" infix combinator for function composition:
f ○ g = lambda x. f (g x)
It's easy to verify that twice f x and (f ○ f) x both expand to the same normal form, i.e., f (f x), so by extensionality, we have
twice f = f ○ f
Using this, we can expand very straightforwardly, first eliminating twice in favour of the composition combinator:
twice (twice) f x
= (twice ○ twice) f x
= (twice (twice f)) x /* expand out '○' */
= (twice (f ○ f)) x
= ((f ○ f) ○ (f ○ f)) x
and then expanding out '○':
= (f ○ f) ((f ○ f) x)
= (f ○ f) (f (f x))
= (f (f (f (f x))))
That's more expansion steps, because we first expand to terms containing the '○' operator, and then expand these operators out, but the steps are simpler, more intuitive ones, where you are less likely to misunderstand what you are doing. The '○' is widely used, standard operator in Haskell and is well worth getting used to.

Scheme-- how do procedures take other procedures as arguments?

Write a Scheme procedure named 'proc4' which takes 2 procedures as arguments (w,x) [note that w and x can be expected to work correctly when given two numbers as arguments]
and returns a procedure which takes 2 numbers (y,z) as arguments and returns the
procedure (w or x) which results in the greatest number when applied to y and z
(i.e. in C++ pseudocode if ((y w z) > (y x z)) {return w; } else {return x;} )
So I started
(define proc4(lamdda ( w x) (lambda y z)...
Then I wanted to do the if part. Something like
(if (> (apply w ( y z)) (apply x( w z))) but I keep getting errors.
I've been trying to find help on internet but everything I've seen so far does not make sense to me.
You can invoke function objects directly, without using apply:
(define (proc4 f g)
(lambda (x y)
(if (> (f x y) (g x y))
f
g)))
A bit of syntactic sugar for #ChrisJester-Young's answer - you can declare a procedure that returns another procedure like this:
(define ((proc4 f g) x y)
(if (> (f x y) (g x y))
f
g))
In the above code, the first procedure receives as parameters the procedures f and g, and in turn returns a procedure that receives as parameters x and y. We know that f and g are procedures because the way they're used inside the body of the definition, but they can have any name you want. Of course you can call the procedure in the usual way:
((proc4 + *) 10 20)
=> #<procedure:*>
The point of interest in this example is that procedures can also be passed as parameters (and returned as values), you don't need to apply them, just invoke the procedures received as parameters as you would with any other procedure. Also notice that all the answers to this question are equivalent, but the short-hand syntax that I'm using might not be available in all interpreters.
I cannot make much sense of this (obviously homework) question but I'd go for this:
(define proc4
(lambda (w x)
(lambda (y z)
(if (> (w y z) (x y z))
w
x))))

Lambda calculus predecessor function reduction steps

I am getting stuck with the Wikipedia description of the predecessor function in lambda calculus.
What Wikipedia says is the following:
PRED := λn.λf.λx. n (λg.λh. h (g f)) (λu.x) (λu.u)
Can someone explain reduction processes step-by-step?
Thanks.
Ok, so the idea of Church numerals is to encode "data" using functions, right? The way that works is by representing a value by some generic operation you'd perform with it. We can therefore go in the other direction as well, which can sometimes make things clearer.
Church numerals are a unary representation of the natural numbers. So, let's use Z to mean zero and Sn to represent the successor of n. Now we can count like this: Z, SZ, SSZ, SSSZ... The equivalent Church numeral takes two arguments--the first corresponding to S, and second to Z--then uses them to construct the above pattern. So given arguments f and x, we can count like this: x, f x, f (f x), f (f (f x))...
Let's look at what PRED does.
First, it creates a lambda taking three arguments--n is the Church numeral whose predecessor we want, of course, which means that f and x are the arguments to the resulting numeral, which thus means that the body of that lambda will be f applied to x one time fewer than n would.
Next, it applies n to three arguments. This is the tricky part.
The second argument, that corresponds to Z from earlier, is λu.x--a constant function that ignores one argument and returns x.
The first argument, that corresponds to S from earlier, is λgh.h (g f). We can rewrite this as λg. (λh.h (g f)) to reflect the fact that only the outermost lambda is being applied n times. What this function does is take the accumulated result so far as g and return a new function taking one argument, which applies that argument to g applied to f. Which is absolutely baffling, of course.
So... what's going on here? Consider the direct substitution with S and Z. In a non-zero number Sn, the n corresponds to the argument bound to g. So, remembering that f and x are bound in an outside scope, we can count like this: λu.x, λh. h ((λu.x) f), λh'. h' ((λh. h ((λu.x) f)) f) ... Performing the obvious reductions, we get this: λu.x, λh. h x, λh'. h' (f x) ... The pattern here is that a function is being passed "inward" one layer, at which point an S will apply it, while a Z will ignore it. So we get one application of f for each S except the outermost.
The third argument is simply the identity function, which is dutifully applied by the outermost S, returning the final result--f applied one fewer times than the number of S layers n corresponds to.
McCann's answer explains it pretty well. Let's take a concrete example for Pred 3 = 2:
Consider expression: n (λgh.h (g f)) (λu.x). Let K = (λgh.h (g f))
For n = 0, we encode 0 = λfx.x, so when we apply the beta reduction for (λfx.x)(λgh.h(gf)) means (λgh.h(gf)) is replaced 0 times. After further beta-reduction we get:
λfx.(λu.x)(λu.u)
reduces to
λfx.x
where λfx.x = 0, as expected.
For n = 1, we apply K for 1 times:
(λgh.h (g f)) (λu.x)
=> λh. h((λu.x) f)
=> λh. h x
For n = 2, we apply K for 2 times:
(λgh.h (g f)) (λh. h x)
=> λh. h ((λh. h x) f)
=> λh. h (f x)
For n = 3, we apply K for 3 times:
(λgh.h (g f)) (λh. h (f x))
=> λh.h ((λh. h (f x)) f)
=> λh.h (f (f x))
Finally, we take this result and apply an id function to it, we got
λh.h (f (f x)) (λu.u)
=> (λu.u)(f (f x))
=> f (f x)
This is the definition of number 2.
The list based implementation might be easier to understand, but it takes many intermediate steps. So it is not as nice as the Church's original implementation IMO.
After Reading the previous answers (good ones), I’d like to give my own vision of the matter in hope it helps someone (corrections are welcomed). I’ll use an example.
First off, I’d like to add some parenthesis to the definition that made everything clearer to me. Let’s redifine the given formula to:
PRED := λn λf λx.(n (λgλh.h (g f)) (λu.x)) (λu.u)
Let’s also define three Church numerals that will help with the example:
Zero := λfλx.x
One := λfλx. f (Zero f x)
Two := λfλx. f (One f x)
Three := λfλx. f (Two f x)
In order to understand how this works, let's focus first on this part of the formula:
n (λgλh.h (g f)) (λu.x)
From here, we can extract this conclusions:
n is a Church numeral, the function to be applied is λgλh.h (g f) and the starting data is λu.x
With this in mind, let's try an example:
PRED Three := λf λx.(Three (λgλh.h (g f)) (λu.x)) (λu.u)
Let's focus first on the reduction of the numeral (the part we explained before):
Three (λgλh.h (g f)) (λu.x)
Which reduces to:
(λgλh.h (g f)) (Two (λgλh.h (g f)) (λu.x))
(λgλh.h (g f)) ((λgλh.h (g f)) (One (λgλh.h (g f)) (λu.x)))
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) (Zero (λgλh.h (g f)) (λu.x))))
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) ((λfλx.x) (λgλh.h (g f)) (λu.x)))) -- Here we lose one application of f
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) (λu.x)))
(λgλh.h (g f)) ((λgλh.h (g f)) (λh.h ((λu.x) f)))
(λgλh.h (g f)) ((λgλh.h (g f)) (λh.h x))
(λgλh.h (g f)) (λh.h ((λh.h x) f))
(λgλh.h (g f)) (λh.h (f x))
(λh.h ((λh.h (f x) f)))
Ending up with:
λh.h f (f x)
So, we have:
PRED Three := λf λx.(λh.h (f (f x))) (λu.u)
Reducing again:
PRED Three := λf λx.((λu.u) (f (f x)))
PRED Three := λf λx.f (f x)
As you can see in the reductions, we end up applying the function one time less thanks to a clever way of using functions.
Using add1 as f and 0 as x, we get:
PRED Three add1 0 := add1 (add1 0) = 2
Hope this helps.
You can try to understand this definition of the predecessor function (not my favourite one) in terms of continuations.
To simplify the matter a bit, let us consider the following variant
PRED := λn.n (λgh.h (g S)) (λu.0) (λu.u)
then, you can replace S with f, and 0 with x.
The body of the function iterates n times a transformation M over an argument N. The argument N is a function of type (nat -> nat) -> nat that expects a continuation for nat and returns a nat. Initially, N = λu.0, that is it ignores the continuation and just returns 0.
Let us call N the current computation.
The function M: (nat -> nat) -> nat) -> (nat -> nat) -> nat modifies the computation g: (nat -> nat)->nat as follows.
It takes in input a continuation h, and applies it to the
result of continuing the current computation g with S.
Since the initial computation ignored the continuation, after one application of M we get the computation (λh.h 0), then (λh.h (S 0)), and so on.
At the end, we apply the computation to the identity continuation
to extract the result.
I'll add my explanation to the above good ones, mostly for the sake of my own understanding. Here's the definition of PRED again:
PRED := λnfx. (n (λg (λh.h (g f))) ) λu.x λu.u
The stuff on the right side of the first dot is supposed to be the (n-1) fold composition of f applied to x: f^(n-1)(x).
Let's see why this is the case by incrementally grokking the expression.
λu.x is the constant function valued at x. Let's just denote it const_x.
λu.u is the identity function. Let's call it id.
λg (λh.h (g f)) is a weird function that we need to understand. Let's call it F.
Ok, so PRED tells us to evaluate the n-fold composition of F on the constant function and then to evaluate the result on the identity function.
PRED := λnfx. F^n const_x id
Let's take a closer look at F:
F:= λg (λh.h (g f))
F sends g to evaluation at g(f).
Let's denote evaluation at value y by ev_y.
That is, ev_y := λh.h y
So
F = λg. ev_{g(f)}
Now we figure out what F^n const_x is.
F const_x = ev_{const_x(f)} = ev_x
and
F^2 const_x = F ev_x = ev_{ev_x(f)} = ev_{f(x)}
Similarly,
F^3 const_x = F ev_{f(x)} = ev_{f^2(x)}
and so on:
F^n const_x = ev_{f^(n-1)(x)}
Now,
PRED = λnfx. F^n const_x id
= λnfx. ev_{f^(n-1)(x)} id
= λnfx. id(f^(n-1)(x))
= λnfx. f^(n-1)(x)
which is what we wanted.
Super goofy. The idea is to turn doing something n times into doing f n-1 times. The solution is to apply F n times to const_x to obtain
ev_{f^(n-1)(x)} and then to extract f^(n-1)(x) by evaluating at the identity function.
Split this definition
PRED := λn.λf.λx.n (λg.λh.h (g f)) (λu.x) (λu.u)
into 4 parts:
PRED := λn.λf.λx. | n | (λg.λh.h (g f)) | (λu.x) | (λu.u)
- --------------- ------ ------
A B C D
For now, ignore D. By definition of Church numerals, A B C is B^n C: Apply n folds of B to C.
Now treat B like a machine that turns one input into one output. Its input g has form λh.h *, when appended by f, becomes (λh.h *) f = f *. This adds one more application of f to *. The result f * is then prepended by λh.h to become λh.h (f *).
You see the pattern: Each application of B turns λh.h * into λh.h (f *). If we had λh.h x as the begin term, we would have λh.h (f^n x) as the end term (after n applications of B).
However, the begin term is C = (λu.x), when appended by f, becomes (λu.x) f = x, then prepended by λh.h to become λh.h x. So we had λh.h x after, not before, the first application of B. This is why we have λh.h (f^(n-1) x) as the end term: The first application of f was ignored.
Finally, apply λh.h (f^(n-1) x) to D = (λu.u), which is identity, to get f^(n-1) x. That is:
PRED := λn.λf.λx.f^(n-1) x

Resources