How do you convert to lambda syntax? - lambda-calculus

Part of a question I'm trying to understand involves this:
twice (twice) f x , where twice == lambda f x . f (f x)
I'm trying to understand how to make that substitution, and what it means.
My understanding is that (lambda x y . x + y) 2 3 == 2 + 3 == 5. I don't understand what twice (twice) means, or f ( f x ).

Two ways of looking at this.
Mechanical application of beta-reduction
You can solve this mechanically by expanding any subterm of the form twice F X. Doing so will eventually eliminate all the occurrences of twice, although you need to understand the syntax tree of the lambda calculus term to avoid mistakes.
twice takes two arguments, so your expression twice (twice) f x is the redex twice (twice) f applied to x. (A redex is a subterm that you can reduce independently of the rest of the term).
Expand the definition of twice in the redex: twice (twice) f -> twice (twice f).
Substitute this into the original term to get twice (twice f) x, which is another redex we can expand twice in to get twice f (twice f x) (take care with the brackets in this step).
We have two twice redexes we can expand here, expanding the one inside the brackets is slightly simpler, giving twice f (f (f x)), which can again be expanded to give f (f (f (f x))).
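The reduction above can be checked with a small Python sketch. Here twice is written as a curried lambda; add1, show and the starting values are illustrative assumptions, not part of the original question.

```python
# twice == lambda f x . f (f x), written in curried form
twice = lambda f: lambda x: f(f(x))

# Illustrative unary function to iterate (an assumption; any unary function works)
add1 = lambda n: n + 1

# twice (twice) f x parses as ((twice twice) f) x
result = twice(twice)(add1)(0)
print(result)  # 4, i.e. add1 applied four times

# A string-building f makes the shape of the normal form visible
show = lambda s: "f(" + s + ")"
print(twice(twice)(show)("x"))  # f(f(f(f(x))))
```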
Semantics of twice via abstraction
You can see what's going on at a more intuitive level by appealing to a higher-order combinator, the "○" infix combinator for function composition:
f ○ g = lambda x. f (g x)
It's easy to verify that twice f x and (f ○ f) x both expand to the same normal form, i.e., f (f x), so by extensionality, we have
twice f = f ○ f
Using this, we can expand very straightforwardly, first eliminating twice in favour of the composition combinator:
twice (twice) f x
= (twice ○ twice) f x
= (twice (twice f)) x /* expand out '○' */
= (twice (f ○ f)) x
= ((f ○ f) ○ (f ○ f)) x
and then expanding out '○':
= (f ○ f) ((f ○ f) x)
= (f ○ f) (f (f x))
= (f (f (f (f x))))
That's more expansion steps, because we first expand to terms containing the '○' operator and then expand these operators out, but the steps are simpler and more intuitive, so you are less likely to misunderstand what you are doing. Function composition is a widely used, standard operator in Haskell (where it is written as a dot, .) and is well worth getting used to.
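As a rough check of the identity twice f = f ○ f, here is a Python sketch. The helper names compose and show are assumptions made for illustration; show just builds a string so the resulting term can be read off.

```python
def compose(f, g):
    """Return f ○ g, i.e. lambda x. f (g x)."""
    return lambda x: f(g(x))

twice = lambda f: lambda x: f(f(x))

# Hypothetical stand-in for f: wraps its argument so applications are visible
show = lambda s: "f(" + s + ")"

lhs = twice(show)("x")           # twice f x
rhs = compose(show, show)("x")   # (f ○ f) x
print(lhs, rhs)                  # f(f(x)) f(f(x))

# (twice ○ twice) f x and ((f ○ f) ○ (f ○ f)) x agree as well
via_twice = compose(twice, twice)(show)("x")
via_compose = compose(compose(show, show), compose(show, show))("x")
print(via_twice, via_compose)    # f(f(f(f(x)))) f(f(f(f(x))))
```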

Related

Understanding church numerals

I'm working my way through SICP, and it gives the following definition for zero for Church Numerals:
(define zero (lambda (f) (lambda (x) x)))
I have a few questions about that:
Why the complicated syntax? It seems to be quite readable by just having the following instead:
(define (zero f)
(lambda (x) x))
where we can see it's a function called zero that takes one (unused) argument f and returns a function-of-one-parameter that will return its parameter. It almost seems like the definition is just intended to be as non-straightforward as possible.
What is the x there for? For example doing something like:
((zero square) 100)
returns 100. Is x just the default value returned?
There is no x in (lambda (x) x). None.
The x in (lambda (x) x) is bound. It could be named by any name whatever. We can not talk about x in (lambda (x) x) any more than we could talk about y in (lambda (y) y).
There is no y in (lambda (y) y) to speak of. It is just a placeholder, an arbitrary name whose sole purpose in the body is to be the same as in the binder. Same, without regard for which specific name is used there as long as it is used twice -- first time in the binder, and the other time in the body.
And in fact there is this whole 'nother notation for lambda terms, called De Bruijn notation, where the same whole thing is written (lambda 1). With 1 meaning, "I refer to the argument which the binder 1 step above me receives".
So x is unimportant. What's important is (lambda (x) x) which denotes a function which returns its argument as is. The so called "identity" function.
But even this is not important here. The Church encoding of a number is really a binary function, a function expecting two arguments -- the f and the z. The "successor step" unary function f and the "zero" "value" z, whatever that might be, as long as the two go together. Make sense together. Work together.
So how come we see two unary functions there when it is really one binary function in play?
That is the important bit. It is known as currying.
In lambda calculus all functions are unary. To represent a binary function, a unary function is used such that, when given its (first) argument, it returns another unary function which, when given its (now second) argument, performs whatever our intended binary function ought to perform, using those two arguments, the first and the second.
This is all very very simple if we just write it in combinatory (equational) notation instead of the lambda notation:
zero f z = z
one f z = f z
two f z = f (f z) = f (one f z) = succ one f z
succ one f z = f (one f z)
where every juxtaposition denotes an application, and all applications associate on the left, so we imagine the above being a shortcut notation for
zero f = lambda z. z
zero = lambda f. (lambda z. z)
......
......
succ = lambda one. (lambda f. (lambda z. f (one f z) ))
;; such that
succ one f z = (((succ one) f) z)
= ((((lambda one. (lambda f. (lambda z. f (one f z) ))) one) f) z)
= ....
= (f ((one f) z))
= f (one f z)
but it's the same thing. The differences in notation are not important.
And of course there is no one in lambda one. (lambda f. (lambda z. f (one f z) )). It is bound. It could just be named, I dunno, number:
succ number f z = f (number f z) = f ((number f) z)
meaning, (succ number) is such a number, which, given the f and the z, does with them one more f step compared to what number would do.
And so, ((zero square) 100) means, use the number zero with the successor step square and the zero value of 100, and have zero perform its number of successor steps for us -- that is to say, 0 steps -- starting from the zero value. Thus returning it unchanged.
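The same reading can be tried out in Python, as a sketch only. The names below mirror the equational definitions above; square and to_int are illustrative helpers, and to_int is an assumption that does not appear in the original.

```python
# Curried Church numerals: each takes the step f, then the start value z
zero = lambda f: lambda z: z
succ = lambda n: lambda f: lambda z: f(n(f)(z))
one = succ(zero)
two = succ(one)

square = lambda v: v * v

# Hypothetical helper: count the steps by using "+1" as f and 0 as z
to_int = lambda n: n(lambda k: k + 1)(0)

print(zero(square)(100))  # 100: zero steps, start value returned unchanged
print(two(square)(3))     # 81: square applied twice to 3
print(to_int(two))        # 2
```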
Another possible use is ((zero (lambda (x) 0)) 1), or in general
((lambda (n) ((n (lambda (x) 0)) 1)) zero)
;; or even more generally, abstracting away the 0 and the 1,
((((lambda (n) (lambda (t) (lambda (f) ((n (lambda (x) f)) t)))) zero) 1) 0)
which is just another way of writing
zero (lambda x. 0) 1 ;; or
foo n t f = n (lambda x. f) t ;; and calling
foo zero 1 0
Hopefully you can see what foo is, easily. And also how to read aloud this t and this f. (Probably the original f would be better named s, for "successor", or something like that).
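One way to get a feel for foo is simply to run it. This Python sketch transcribes foo n t f = n (lambda x. f) t directly; the numerals are redefined so the block is self-contained, and the test inputs are assumptions chosen for illustration.

```python
zero = lambda f: lambda z: z
succ = lambda n: lambda f: lambda z: f(n(f)(z))
one = succ(zero)
two = succ(one)

# foo n t f = n (lambda x. f) t
foo = lambda n: lambda t: lambda f: n(lambda x: f)(t)

print(foo(zero)(1)(0))  # 1
print(foo(one)(1)(0))   # 0
print(foo(two)(1)(0))   # 0
```

The output suggests that foo picks its t argument for zero and its f argument for any other numeral.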

How to use quote and unquote to more faithfully translate The Reasoned Schemer into Racket?

(Details of my miniKanren in Racket setup appear at the bottom[1].)
The way quotes and unquotes work in The Reasoned Schemer appears not to match the way they work in Racket. For instance, verse 2 of chapter 2 suggests[2] the following function definition:
(run #f
(r )
(fresh (y x )
(== '(,x ,y) r )))
If I evaluate that, I get '((,x ,y)). If instead I rewrite it as this:
(run #f
(r )
(fresh (y x )
(== (list x y) r)))
I get the expected result, '((_.0 _.1)).
This might seem like a minor problem, but in many cases the required translation is extremely verbose. For instance, in exercise 45 of chapter 3 (page 34), the book provides, roughly[3] the following definition:
(run 5 (r)
(fresh (w x y z)
(loto (('g 'g) ('e w) (x y) . z))
(== (w (x y) z) r)))
In order to get the results they get, I had to rewrite it like this:
(run 5 (r)
(fresh (w x y z)
(loto (cons '(g g)
(cons (list 'e w)
(cons (list x y)
z))))
(== (list w (list x y) z)
r)))
[1] As described here, I ran raco pkg install minikanren and then defined a few missing pieces.
[2] Actually, they don't write precisely that, but if you heed the advice in the footnotes to that verse and an earlier verse, it's what you get.
[3] Modulo some implicit quoting and unquoting that I cannot deduce.
Use the backquote ` instead of the simple quote ' you have been using.

Two-layer "Y-style" combinator. Is this common? Does this have an official name?

I've been looking into how languages that forbid use-before-def and don't have mutable cells (no set! or setq) can nonetheless provide recursion. I of course ran across the (famous? infamous?) Y combinator and friends, e.g.:
http://www.ece.uc.edu/~franco/C511/html/Scheme/ycomb.html
http://okmij.org/ftp/Computation/fixed-point-combinators.html
http://www.angelfire.com/tx4/cus/combinator/birds.html
http://en.wikipedia.org/wiki/Fixed-point_combinator
When I went to implement "letrec" semantics in this style (that is, allow a local variable to be defined such that it can be a recursive function, where under the covers it doesn't ever refer to its own name), the combinator I ended up writing looks like this:
Y_letrec = λf . (λx.x x) (λs . (λa . (f ((λx.x x) s)) a))
Or, factoring out the U combinator:
U = λx.x x
Y_letrec = λf . U (λs . (λa . (f (U s)) a))
Read this as: Y_letrec is a function which takes a to-be-recursed function f.
f must be a single-argument function which accepts s, where s is the function
that f can call to achieve self-recursion. f is expected to define and return
an "inner" function which does the "real" operation. That inner function accepts
argument a (or in the general case an argument list, but that can't be expressed
in the traditional notation). The result of calling Y_letrec is a result of calling
f, and it is presumed to be an "inner" function, ready to be called.
The reason I set things up this way is so that I could use the parse tree form of the
to-be-recursed function directly, without modification, merely wrapping an additional
function layer around it during transformation when handling letrec. E.g., if the
original code is:
(letrec ((foo (lambda (a) (foo (cdr a))))))
then the transformed form would be along the lines of:
(define foo (Y_letrec (lambda (foo) (lambda (a) (foo (cdr a))))))
Note that the inner function body is identical between the two.
My questions are:
Is my Y_letrec function commonly used?
Does it have a well-established name?
Note: The first link above refers to a similar function (in "step 5") as the "applicative-order Y combinator", though I'm having trouble finding an authoritative source for that naming.
UPDATE 28-apr-2013:
I realized that Y_letrec as defined above is very close to but not identical to the Z combinator as defined in Wikipedia. Per Wikipedia, the Z combinator and "call-by-value Y combinator" are the same thing, and it looks like that is indeed the thing that may be more commonly called the "applicative-order Y combinator."
So, what I have above is not the same as the applicative-order Y combinator as usually written, but there is almost certainly a sense in which they're related. Here's how I did the comparison:
Starting with:
Y_letrec = λf . (λx.x x) (λs . (λa . (f ((λx.x x) s)) a))
Apply the inner U:
Y_letrec = λf . (λx.x x) (λs . (λa . (f (s s)) a))
Apply the outer U:
Y_letrec = λf . (λs . (λa . (f (s s)) a)) (λs . (λa . (f (s s)) a))
Rename to match Wikipedia's definition of the Z combinator:
Y_letrec = λf . (λx . (λv . (f (x x)) v)) (λx . (λv . (f (x x)) v))
Compare this to Wikipedia's Z combinator:
Z = λf . (λx . f (λv . ((x x) v))) (λx . f (λv . ((x x) v)))
The salient difference is where the function f is being applied. Does it matter? Are these two functions equivalent despite this difference?
Yes, it is an applicative-order Y combinator. Using U inside it is perfectly OK, I did it too (cf. fixed point combinator in lisp). Whether the usage of U to shorten code has a name or not, I don't think so. It's just an application of a lambda-term, and yes, it makes it clearer IMO too.
What does have a name, is eta-conversion, used in your code to delay evaluation under applicative order, where arguments' values must be known before functional application.
With U applied through and through and eta-reduction performed on your code ( (λa.(f (s s)) a) ==> f (s s) ), it becomes the familiar normal-order Y combinator - i.e. such that works under normal-order evaluation, where arguments' values aren't demanded before functional application, which might end up not needing them (or some of them) after all:
Y = λf . (λs.f (s s)) (λs.f (s s))
BTW the delaying can be applied in slightly different way,
Y_ = λf . (λx.x x) (λs.f (λa.(s s) a))
which also works under applicative-order evaluation rules.
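Python evaluates arguments before calling a function, so it can serve as a quick test bed for the applicative-order claims here. The following is a sketch, not a definitive implementation: fact_step is a hypothetical example function, and Y_alt is just a label chosen here for the Y_ variant above.

```python
U = lambda x: x(x)

# The asker's combinator: λf . U (λs . (λa . (f (U s)) a))
Y_letrec = lambda f: U(lambda s: lambda a: f(U(s))(a))

# The variant with the delay placed inside: λf . U (λs . f (λa . (s s) a))
Y_alt = lambda f: U(lambda s: f(lambda a: s(s)(a)))

# Hypothetical to-be-recursed function: receives "itself" as the argument rec
fact_step = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)

print(Y_letrec(fact_step)(5))  # 120
print(Y_alt(fact_step)(5))     # 120

# The normal-order Y has no delay, so under applicative order (s s) is
# evaluated over and over before f is ever entered.
Y = lambda f: (lambda s: f(s(s)))(lambda s: f(s(s)))
try:
    Y(fact_step)
    diverged = False
except RecursionError:
    diverged = True
print(diverged)  # True
```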
What is the difference? let's compare the reduction sequences. Your version,
Y_ = λf . (λx . (λv . (f (x x)) v)) (λx . (λv . (f (x x)) v))
((Y_ f) a) =
= ((λx . (λv . (f (x x)) v)) (λx . (λv . (f (x x)) v))) a
= (λv . (f (x x)) v) a { x := (λx . (λv . (f (x x)) v)) }
= (f (x x)) a
= | ; here (f (x x)) application must be evaluated, so
| ; the value of (x x) is first determined
| (x x)
| = ((λx . (λv . (f (x x)) v)) (λx . (λv . (f (x x)) v)))
| = (λv . (f (x x)) v) { x := (λx . (λv . (f (x x)) v)) }
and here f is entered. So here too, the well-behaved function f receives its first argument and it's supposed not to do anything with it. So maybe the two are exactly equivalent after all.
But really, the minutiae of lambda-expression definitions do not matter when it comes to a real implementation, because the real implementation language will have pointers and we'll just manipulate them to point properly to the containing expression body, and not to its copy. Lambda calculus is done with pencil and paper after all, as textual copying and replacement. The Y combinator in lambda calculus only emulates recursion. True recursion is true self-reference; not receiving copies equal to self, through self-application (however smart that is).
TL;DR: though language being defined can be devoid of such fun stuff as assignment and pointer equality, the language in which we define it will most certainly have those, because we need them for efficiency. At the very least, its implementation will have them, under the hood.
see also: fixed point combinator in lisp , esp. In Scheme, how do you use lambda to create a recursive function?.

Scheme-- how do procedures take other procedures as arguments?

Write a Scheme procedure named 'proc4' which takes 2 procedures as arguments (w,x) [note that w and x can be expected to work correctly when given two numbers as arguments]
and returns a procedure which takes 2 numbers (y,z) as arguments and returns the
procedure (w or x) which results in the greatest number when applied to y and z
(i.e. in C++ pseudocode if ((y w z) > (y x z)) {return w; } else {return x;} )
So I started
(define proc4(lamdda ( w x) (lambda y z)...
Then I wanted to do the if part. Something like
(if (> (apply w ( y z)) (apply x( w z))) but I keep getting errors.
I've been trying to find help on internet but everything I've seen so far does not make sense to me.
You can invoke function objects directly, without using apply:
(define (proc4 f g)
(lambda (x y)
(if (> (f x y) (g x y))
f
g)))
A bit of syntactic sugar for @ChrisJester-Young's answer - you can declare a procedure that returns another procedure like this:
(define ((proc4 f g) x y)
(if (> (f x y) (g x y))
f
g))
In the above code, the first procedure receives as parameters the procedures f and g, and in turn returns a procedure that receives as parameters x and y. We know that f and g are procedures because of the way they're used inside the body of the definition, but they can have any name you want. Of course you can call the procedure in the usual way:
((proc4 + *) 10 20)
=> #<procedure:*>
The point of interest in this example is that procedures can also be passed as parameters (and returned as values). You don't need to apply them; just invoke the procedures received as parameters as you would any other procedure. Also notice that all the answers to this question are equivalent, but the shorthand syntax that I'm using might not be available in all interpreters.
I cannot make much sense of this (obviously homework) question but I'd go for this:
(define proc4
(lambda (w x)
(lambda (y z)
(if (> (w y z) (x y z))
w
x))))
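For comparison, the same shape can be sketched in Python, where functions are also ordinary values. This is an analogue of the Scheme answers rather than a translation the answerers gave; the inner name choose is an assumption, and operator.add and operator.mul stand in for + and *.

```python
import operator

def proc4(w, x):
    """Take two two-argument functions and return a chooser function."""
    def choose(y, z):
        # Call w and x directly; return whichever function gave the larger result
        return w if w(y, z) > x(y, z) else x
    return choose

winner = proc4(operator.add, operator.mul)(10, 20)
print(winner is operator.mul)  # True, since 10 * 20 > 10 + 20
```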

Lambda calculus predecessor function reduction steps

I am getting stuck with the Wikipedia description of the predecessor function in lambda calculus.
What Wikipedia says is the following:
PRED := λn.λf.λx. n (λg.λh. h (g f)) (λu.x) (λu.u)
Can someone explain reduction processes step-by-step?
Thanks.
Ok, so the idea of Church numerals is to encode "data" using functions, right? The way that works is by representing a value by some generic operation you'd perform with it. We can therefore go in the other direction as well, which can sometimes make things clearer.
Church numerals are a unary representation of the natural numbers. So, let's use Z to mean zero and Sn to represent the successor of n. Now we can count like this: Z, SZ, SSZ, SSSZ... The equivalent Church numeral takes two arguments--the first corresponding to S, and second to Z--then uses them to construct the above pattern. So given arguments f and x, we can count like this: x, f x, f (f x), f (f (f x))...
Let's look at what PRED does.
First, it creates a lambda taking three arguments--n is the Church numeral whose predecessor we want, of course, which means that f and x are the arguments to the resulting numeral, which thus means that the body of that lambda will be f applied to x one time fewer than n would.
Next, it applies n to three arguments. This is the tricky part.
The second argument, that corresponds to Z from earlier, is λu.x--a constant function that ignores one argument and returns x.
The first argument, that corresponds to S from earlier, is λgh.h (g f). We can rewrite this as λg. (λh.h (g f)) to reflect the fact that only the outermost lambda is being applied n times. What this function does is take the accumulated result so far as g and return a new function taking one argument, which applies that argument to g applied to f. Which is absolutely baffling, of course.
So... what's going on here? Consider the direct substitution with S and Z. In a non-zero number Sn, the n corresponds to the argument bound to g. So, remembering that f and x are bound in an outside scope, we can count like this: λu.x, λh. h ((λu.x) f), λh'. h' ((λh. h ((λu.x) f)) f) ... Performing the obvious reductions, we get this: λu.x, λh. h x, λh'. h' (f x) ... The pattern here is that a function is being passed "inward" one layer, at which point an S will apply it, while a Z will ignore it. So we get one application of f for each S except the outermost.
The third argument is simply the identity function, which is dutifully applied by the outermost S, returning the final result--f applied one fewer times than the number of S layers n corresponds to.
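The definition can be transcribed into Python lambdas to watch it work. This is a minimal sketch; zero, succ and to_int are helper names assumed here for building and reading numerals, not part of the Wikipedia definition.

```python
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# PRED := λn.λf.λx. n (λg.λh. h (g f)) (λu.x) (λu.u)
pred = lambda n: lambda f: lambda x: (
    n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)
)

# Hypothetical helper: convert a Church numeral to a Python int
to_int = lambda n: n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
print(to_int(pred(three)))  # 2
print(to_int(pred(zero)))   # 0, the predecessor of zero is zero here
```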
McCann's answer explains it pretty well. Let's take a concrete example for Pred 3 = 2:
Consider expression: n (λgh.h (g f)) (λu.x). Let K = (λgh.h (g f))
For n = 0, we encode 0 = λfx.x, so in the beta reduction of (λfx.x)(λgh.h(gf)) the term (λgh.h(gf)) is applied 0 times. After further beta-reduction we get:
λfx.(λu.x)(λu.u)
reduces to
λfx.x
where λfx.x = 0, as expected.
For n = 1, we apply K once:
(λgh.h (g f)) (λu.x)
=> λh. h((λu.x) f)
=> λh. h x
For n = 2, we apply K twice:
(λgh.h (g f)) (λh. h x)
=> λh. h ((λh. h x) f)
=> λh. h (f x)
For n = 3, we apply K three times:
(λgh.h (g f)) (λh. h (f x))
=> λh.h ((λh. h (f x)) f)
=> λh.h (f (f x))
Finally, we take this result and apply it to the identity function, which gives
(λh.h (f (f x))) (λu.u)
=> (λu.u)(f (f x))
=> f (f x)
This is the definition of number 2.
The list based implementation might be easier to understand, but it takes many intermediate steps. So it is not as nice as the Church's original implementation IMO.
After reading the previous answers (good ones), I’d like to give my own view of the matter in the hope it helps someone (corrections are welcome). I’ll use an example.
First off, I’d like to add some parentheses to the definition, which made everything clearer to me. Let’s redefine the given formula as:
PRED := λn λf λx.(n (λgλh.h (g f)) (λu.x)) (λu.u)
Let’s also define three Church numerals that will help with the example:
Zero := λfλx.x
One := λfλx. f (Zero f x)
Two := λfλx. f (One f x)
Three := λfλx. f (Two f x)
In order to understand how this works, let's focus first on this part of the formula:
n (λgλh.h (g f)) (λu.x)
From here, we can draw these conclusions:
n is a Church numeral, the function to be applied is λgλh.h (g f) and the starting data is λu.x
With this in mind, let's try an example:
PRED Three := λf λx.(Three (λgλh.h (g f)) (λu.x)) (λu.u)
Let's focus first on the reduction of the numeral (the part we explained before):
Three (λgλh.h (g f)) (λu.x)
Which reduces to:
(λgλh.h (g f)) (Two (λgλh.h (g f)) (λu.x))
(λgλh.h (g f)) ((λgλh.h (g f)) (One (λgλh.h (g f)) (λu.x)))
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) (Zero (λgλh.h (g f)) (λu.x))))
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) ((λfλx.x) (λgλh.h (g f)) (λu.x)))) -- Here we lose one application of f
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) (λu.x)))
(λgλh.h (g f)) ((λgλh.h (g f)) (λh.h ((λu.x) f)))
(λgλh.h (g f)) ((λgλh.h (g f)) (λh.h x))
(λgλh.h (g f)) (λh.h ((λh.h x) f))
(λgλh.h (g f)) (λh.h (f x))
(λh.h ((λh.h (f x)) f))
Ending up with:
λh.h (f (f x))
So, we have:
PRED Three := λf λx.(λh.h (f (f x))) (λu.u)
Reducing again:
PRED Three := λf λx.((λu.u) (f (f x)))
PRED Three := λf λx.f (f x)
As you can see in the reductions, we end up applying the function one fewer time thanks to a clever use of functions.
Using add1 as f and 0 as x, we get:
PRED Three add1 0 := add1 (add1 0) = 2
Hope this helps.
You can try to understand this definition of the predecessor function (not my favourite one) in terms of continuations.
To simplify the matter a bit, let us consider the following variant
PRED := λn.n (λgh.h (g S)) (λu.0) (λu.u)
then, you can replace S with f, and 0 with x.
The body of the function iterates n times a transformation M over an argument N. The argument N is a function of type (nat -> nat) -> nat that expects a continuation for nat and returns a nat. Initially, N = λu.0, that is it ignores the continuation and just returns 0.
Let us call N the current computation.
The function M: ((nat -> nat) -> nat) -> (nat -> nat) -> nat modifies the computation g: (nat -> nat) -> nat as follows.
It takes as input a continuation h and applies it to the result of continuing the current computation g with S.
Since the initial computation ignored the continuation, after one application of M we get the computation (λh.h 0), then (λh.h (S 0)), and so on.
At the end, we apply the computation to the identity continuation to extract the result.
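The continuation reading can be sketched in Python with ordinary integers. As a simplifying assumption, the iteration count n is a plain Python int driving a loop rather than a Church numeral, and pred_int is a hypothetical name; M, N and S follow the text.

```python
S = lambda k: k + 1              # successor on plain ints

N = lambda u: 0                  # initial computation: ignores its continuation

# M takes the current computation g and returns a new computation that
# passes to its continuation h the result of continuing g with S
M = lambda g: lambda h: h(g(S))

def pred_int(n):
    """Iterate M over N n times, then run with the identity continuation."""
    computation = N
    for _ in range(n):
        computation = M(computation)
    return computation(lambda u: u)

print([pred_int(n) for n in range(5)])  # [0, 0, 1, 2, 3]
```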
I'll add my explanation to the above good ones, mostly for the sake of my own understanding. Here's the definition of PRED again:
PRED := λnfx. (n (λg.(λh.h (g f)))) (λu.x) (λu.u)
The stuff on the right side of the first dot is supposed to be the (n-1) fold composition of f applied to x: f^(n-1)(x).
Let's see why this is the case by incrementally grokking the expression.
λu.x is the constant function valued at x. Let's just denote it const_x.
λu.u is the identity function. Let's call it id.
λg.(λh.h (g f)) is a weird function that we need to understand. Let's call it F.
Ok, so PRED tells us to evaluate the n-fold composition of F on the constant function and then to evaluate the result on the identity function.
PRED := λnfx. F^n const_x id
Let's take a closer look at F:
F := λg.(λh.h (g f))
F sends g to evaluation at g(f).
Let's denote evaluation at value y by ev_y.
That is, ev_y := λh.h y
So
F = λg. ev_{g(f)}
Now we figure out what F^n const_x is.
F const_x = ev_{const_x(f)} = ev_x
and
F^2 const_x = F ev_x = ev_{ev_x(f)} = ev_{f(x)}
Similarly,
F^3 const_x = F ev_{f(x)} = ev_{f^2(x)}
and so on:
F^n const_x = ev_{f^(n-1)(x)}
Now,
PRED = λnfx. F^n const_x id
= λnfx. ev_{f^(n-1)(x)} id
= λnfx. id(f^(n-1)(x))
= λnfx. f^(n-1)(x)
which is what we wanted.
Super goofy. The idea is to turn doing something n times into doing f n-1 times. The solution is to apply F n times to const_x to obtain
ev_{f^(n-1)(x)} and then to extract f^(n-1)(x) by evaluating at the identity function.
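The "and so on" step can be spelled out by induction on n. The following LaTeX sketch uses the names ev, const_x and F from above; the induction itself is an addition for clarity and relies only on ev_y(f) = f(y).

```latex
\begin{align*}
F\,\mathrm{const}_x
  &= \mathrm{ev}_{\mathrm{const}_x(f)} = \mathrm{ev}_{x} = \mathrm{ev}_{f^{0}(x)}
  && \text{(base case, } n = 1\text{)} \\
F^{k+1}\,\mathrm{const}_x
  &= F\bigl(F^{k}\,\mathrm{const}_x\bigr)
   = F\,\mathrm{ev}_{f^{k-1}(x)}
  && \text{(induction hypothesis, } k \ge 1\text{)} \\
  &= \mathrm{ev}_{\mathrm{ev}_{f^{k-1}(x)}(f)}
   = \mathrm{ev}_{f(f^{k-1}(x))}
   = \mathrm{ev}_{f^{k}(x)}
\end{align*}
```

For n = 0 no F is applied at all, so the expression is const_x id = x, which is why the predecessor of zero comes out as zero.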
Split this definition
PRED := λn.λf.λx.n (λg.λh.h (g f)) (λu.x) (λu.u)
into 4 parts:
PRED := λn.λf.λx. | n | (λg.λh.h (g f)) | (λu.x) | (λu.u)
A = n
B = (λg.λh.h (g f))
C = (λu.x)
D = (λu.u)
For now, ignore D. By definition of Church numerals, A B C is B^n C: Apply n folds of B to C.
Now treat B like a machine that turns one input into one output. Its input g has form λh.h *, when appended by f, becomes (λh.h *) f = f *. This adds one more application of f to *. The result f * is then prepended by λh.h to become λh.h (f *).
You see the pattern: Each application of B turns λh.h * into λh.h (f *). If we had λh.h x as the begin term, we would have λh.h (f^n x) as the end term (after n applications of B).
However, the begin term is C = (λu.x), when appended by f, becomes (λu.x) f = x, then prepended by λh.h to become λh.h x. So we had λh.h x after, not before, the first application of B. This is why we have λh.h (f^(n-1) x) as the end term: The first application of f was ignored.
Finally, apply λh.h (f^(n-1) x) to D = (λu.u), which is identity, to get f^(n-1) x. That is:
PRED := λn.λf.λx.f^(n-1) x
