Why the definition of Church's Numerals - lambda-calculus

I want to understand why Church defined the numerals like this:
0 = λ f . λ x . x
1 = λ f . λ x . f x
2 = λ f . λ x . f (f x)
3 = λ f . λ x . f (f (f x))
4 = λ f . λ x . f (f (f (f x)))
What is the logic behind it?
Why is 0 represented like this:
0 = λ f . λ x . x

Church wasn't trying to be practical. He was trying to prove results about the expressive power of lambda calculus — that in principle any possible computation can be done in lambda calculus, hence lambda calculus can serve as a theoretical foundation for the study of computability. In order to do so, it was necessary to encode numbers as lambda expressions, in such a way that things like the successor function are easily definable. This was a key step in showing the equivalence of lambda calculus and Gödel's recursive function theory (which was about computable functions on the natural numbers). Church numerals are basically a convenient albeit not very readable encoding of numbers. In some sense, there isn't any very deep logic to it. The claim isn't that 1 in its essence is λ f . λ x . f x, but that the latter is a serviceable encoding of the former.
This doesn't mean that it is an arbitrary encoding. There is a definite logic to it. The most natural way to encode a number n is by something which involves n. Church numerals use n function applications. The natural number n is represented by the higher order function which applies a function n times to an input. 1 is encoded by a function applied once, 2 by a function applied twice and so on. It is a very natural encoding, especially in the context of lambda calculus. Furthermore, the fact that it is easy to define arithmetic on them streamlines the proof that lambda calculus is equivalent to recursive functions.
To see this in practice, you can run the following Python3 script:
#some Church numerals:
ZERO = lambda f: lambda x: x
ONE = lambda f: lambda x: f(x)
TWO = lambda f: lambda x: f(f(x))
THREE = lambda f: lambda x: f(f(f(x)))
#function to apply these numerals to:
def square(x): return x**2
#so ZERO(square), ONE(square), etc. are functions
#apply these to 2 and print the results:
print(ZERO(square)(2), ONE(square)(2), TWO(square)(2),THREE(square)(2))
Output:
2 4 16 256
Note that these numbers have been obtained by squaring the number two 0 times, 1 time, 2 times, and 3 times respectively.

According to the Peano axioms, a natural number is either 0 or S(n) for another natural number n:
0 = 0
1 = S(0)
2 = S(S(0))
...
You can see Church numerals as a generalization of Peano numbers, where you provide your own 0 and S:
0 = λs.λz. z
1 = λs.λz. s(z)
2 = λs.λz. s(s(z))
...
Since this is a programming forum, let's create some Church numerals in ECMAScript 6:
const ZERO = s => z => z;
const ONE = s => z => s(z);
const TWO = s => z => s(s(z));
...
You can convert these Church numerals to JavaScript numbers by providing the appropriate zero and successor:
function toInt(n) {
return n(i => i + 1)(0);
}
And then:
> toInt(TWO)
2
You could use Church numerals to do some practical things:
function shout(text) {
return text + "!";
}
> shout("hi")
"hi!"
> NINE(shout)("hi")
"hi!!!!!!!!!"
You can try it here: https://es6console.com/iyoim5y8/
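NINE is used above without being defined in the snippets shown. As a rough sketch (in Python rather than JavaScript, and with a helper name, church, that I made up for illustration), a numeral for any non-negative n could be built like this:

```python
def church(n):
    """Hypothetical helper: build the Church numeral for a non-negative int n."""
    def numeral(s):
        def apply_n_times(z):
            result = z
            for _ in range(n):      # apply the supplied successor s exactly n times
                result = s(result)
            return result
        return apply_n_times
    return numeral


def to_int(numeral):
    # Supply the ordinary integer successor and zero, as toInt does above.
    return numeral(lambda i: i + 1)(0)


def shout(text):
    return text + "!"


NINE = church(9)
print(to_int(NINE))
print(NINE(shout)("hi"))
```

The loop is just a convenience for building the numeral; once built, it behaves like the hand-written ZERO, ONE and TWO.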

The following paper by Robert (Corky) Cartwright broke it down for me very well.
Essential points to grasp, for the very beginning:
all Church numerals are functions with two parameters;
in the Church representation of any number, it is implied that:
f is the 'successor' function (i.e. a function which accepts a Church numeral and returns the Church numeral that follows it; it's basically an increment);
x — is a (Church numeral) value representing 'zero' (the count starting point).
Keeping that in mind:
λf . λx . x
will be equal to zero if we pass the appropriate f ('successor', the increment function) and x ('zero', the count starting point). In this particular case it doesn't matter what function is passed as f, since it is never applied:
λf . λx . ZERO
this:
λf . λx . f x
will be evaluated to 1:
λf . λx . INCREMENT ZERO
and the following:
λf . λx . f (f x)
will be equal to 2:
λf . λx . INCREMENT(INCREMENT ZERO)
and so on, for all the successive numbers.
Bonus (addition, multiplication and exponentiation of Church numerals):
Here is a Python code snippet to illustrate (and expand on) what was said above:
ZERO = lambda f: lambda x: x
ONE = lambda f: lambda x: f(x)
TWO = lambda f: lambda x: f(f(x))
THREE = lambda f: lambda x: f(f(f(x)))
SUCC = lambda x: x + 1
ADD = lambda f: lambda x: lambda n: lambda m: n(f)(m(f)(x))
MULT = lambda f: lambda x: lambda n: lambda m: n(m(f))(x)
EXPON = lambda m: lambda n: n(m)
ADD exploits the fact that any Church numeral accepts a 'zero' count starting point as its argument: it just counts to n starting with m. So ADD(SUCC)(0)(THREE)(TWO) will just count to 3, but starting with 2, thus giving us 2 + 1 + 1 + 1 = 5.
MULT utilizes the fact that the 'successor' function is just an argument of a Church numeral, and thus can be replaced. So MULT(SUCC)(0)(TWO)(THREE) will apply "add 3" twice, which is the same as 2 * 3 = 6.
EXPON is a bit tricky (it was for me, at least); the key point is to keep track of what is returned by what. Basically, it uses the intrinsic mechanism of the Church numeral representation (the repeated application of f, in particular). Here are some examples to illustrate:
3^0
EXPON(THREE)(ZERO)(SUCC)(0)
↓
(lambda n: n(THREE))(ZERO)(SUCC)(0)
↓
ZERO(THREE)(SUCC)(0)
↓
(lambda x: x)(SUCC)(0)
↓
SUCC(0)
↓
1
3^1
EXPON(THREE)(ONE)(SUCC)(0)
↓
(lambda n: n(THREE))(ONE)(SUCC)(0)
↓
ONE(THREE)(SUCC)(0)
↓
(lambda x: THREE(x))(SUCC)(0)
↓
THREE(SUCC)(0)
↓
3
1^3
EXPON(ONE)(THREE)(SUCC)(0)
↓
(lambda n: n(ONE))(THREE)(SUCC)(0)
↓
THREE(ONE)(SUCC)(0)
↓
(lambda x: ONE(ONE(ONE(x))))(SUCC)(0)
↓
ONE(ONE(ONE(SUCC)))(0)
↓
ONE(ONE(lambda x: SUCC(x)))(0)
↓
(lambda x: (lambda x: (lambda x: SUCC(x))(x))(x))(0)
↓
SUCC(0)
↓
1
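The ADD and MULT above take the successor and zero up front and so return plain Python integers. A variant that returns Church numerals instead, so that results can be fed back into further arithmetic, might look like the sketch below. The names ADD_N, MULT_N, EXP_N and to_int are my own labels for this sketch, not anything from the paper.

```python
ZERO = lambda f: lambda x: x
ONE = lambda f: lambda x: f(x)
TWO = lambda f: lambda x: f(f(x))
THREE = lambda f: lambda x: f(f(f(x)))

# Each operation takes numerals and returns a new numeral (illustrative names).
ADD_N = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))  # apply f n times, then m more times
MULT_N = lambda m: lambda n: lambda f: m(n(f))                 # repeat "apply f n times" m times
EXP_N = lambda m: lambda n: n(m)                               # m to the power n, same trick as EXPON

# Convert a numeral to a Python int by supplying integer successor and zero.
to_int = lambda numeral: numeral(lambda k: k + 1)(0)

print(to_int(ADD_N(TWO)(THREE)))                # 5
print(to_int(MULT_N(TWO)(THREE)))               # 6
print(to_int(EXP_N(TWO)(THREE)))                # 8
print(to_int(MULT_N(ADD_N(ONE)(TWO))(THREE)))   # 9
```

Because the results are numerals again, expressions such as MULT_N(ADD_N(ONE)(TWO))(THREE) compose without converting to integers in between.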


Allow outer variable inside Julia's sort comparison

The issue:
I want to be able to sort with a custom function that depends on an outer defined variable, for example:
k = 2
sort([1,2,3], lt=(x,y) -> x + k > y)
This works all dandy and such because k is defined in the global scope.
That's where my issue lies, as I want to do something akin to:
function main()
k = 2
comp = (x,y) -> x + k > y
sort([1,3,3], lt=comp)
end
Which works, but feels like a hack, because my comparison function is way bigger and it feels really off to have to have it defined there in the body of the function.
For instance, this wouldn't work:
comp = (x,y) -> x + k > y # or function comp(x,y) ... end
function main()
k = 2
sort([1,3,3], lt=comp)
end
So I'm just wondering if there's any way to capture variables such as k like you'd be able to do with lambda functions in C++.
Is this what you want?
julia> comp(k) = (x,y) -> x + k > y
comp (generic function with 1 method)
julia> sort([1,3,2,3,2,2,2,3], lt=comp(2))
8-element Vector{Int64}:
3
2
2
2
3
2
3
1

Defining a mathematical language in prolog

So I have this mathematical language, it goes like this:
E -> number
[+,E,E,E] //e.g. [+,1,2,3] is 1+2+3 %we can put 2 to infinite Es here.
[-,E,E,E] //e.g. [-,1,2,3] is 1-2-3 %we can put 2 to infinite Es here.
[*,E,E,E] //e.g. [*,1,2,3] is 1*2*3 %we can put 2 to infinite Es here.
[^,E,E] //e.g. [^,2,3] is 2^3
[sin,E] //e.g. [sin,0] is sin 0
[cos,E] //e.g. [cos,0] is cos 0
and I want to write the set of rules that finds the numeric value of a mathematical expression written by this language in prolog.
I first wrote a function called "check", it checks to see if the list is written in a right way according to the language we have :
check1([]).
check1([L|Ls]):- number(L),check1(Ls).
check([L|Ls]):-atom(L),check1(Ls).
now I need to write the function "evaluate" that takes a list that is an expression written by this language, and a variable that is the numeric value corresponding to this language.
example:
?-evaluate([*,1,[^,2,2],[*,2,[+,[sin,0],5]]],N) -> N = 40
so I wrote this:
sum([],0).
sum([L|Ls],N):- not(is_list(L)),sum(Ls,No),N is No + L.
min([],0).
min([L|Ls],N):-not(is_list(L)), min(Ls,No),N is No - L.
pro([],0).
pro([X],[X]).
pro([L|Ls],N):-not(is_list(L)), pro(Ls,No), N is No * L.
pow([L|Ls],N):-not(is_list(L)), N is L ^ Ls.
sin_(L,N):-not(is_list(L)), N is sin(L).
cos_(L,N):-not(is_list(L)), N is cos(L).
d([],0).
d([L|Ls],N):- L == '+' ,sum(Ls,N);
L == '-',min(Ls,N);
L == '*',pro(Ls,N);
L == '^',pow(Ls,N);
L == 'sin',sin_(Ls,N);
L == 'cos',cos_(Ls,N).
evaluate([],0).
evaluate([L|Ls],N):-
is_list(L) , check(L) , d(L,N),L is N,evaluate(Ls,N);
is_list(L), not(check(L)) , evaluate(Ls,N);
not(is_list(L)),not(is_list(Ls)),check([L|Ls]),d([L|Ls],N),
L is N,evaluate(Ls,N);
is_list(Ls),evaluate(Ls,N).
and it's working for just a list and returning the right answer , but not for multiple lists inside the main list, how should my code be?
The specification you work with looks like a production rule that describes that E (presumably short for Expression) might be a number or one of the 6 specified operations. That is the empty list [] is not an expression. So the fact
evaluate([],0).
should not be in your code. Your predicate sum/2 almost works the way you wrote it, except for the empty list and a list with a single element, that are not valid inputs according to your specification. But the predicates min/2 and pro/2 are not correct. Consider the following examples:
?- sum([1,2,3],X).
X = 6 % <- correct
?- sum([1],X).
X = 1 % <- incorrect
?- sum([],X).
X = 0 % <- incorrect
?- min([1,2,3],X).
X = -6 % <- incorrect
?- pro([1,2,3],X).
X = 6 ? ; % <- correct
X = 0 % <- incorrect
Mathematically speaking, addition and multiplication are associative but subtraction is not. In programming languages all three of these operations are usually left associative (see e.g. Operator associativity) to yield the mathematically correct result. That is, the sequence of subtractions in the above query would be calculated:
1-2-3 = (1-2)-3 = -4
The way you define a sequence of these operations resembles the following calculation:
[A,B,C]: ((0 op C) op B) op A
That works out fine for addition:
[1,2,3]: ((0 + 3) + 2) + 1 = 6
But it doesn't for subtraction:
[1,2,3]: ((0 - 3) - 2) - 1 = -6
And it is responsible for the second, incorrect solution when multiplying:
[1,2,3]: ((0 * 3) * 2) * 1 = 0
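To make the difference between the two evaluation orders concrete, here is a small Python sketch (not Prolog, and the function names right_nested and left_assoc are made up for this illustration). The first function mimics the recursion in the question, the second evaluates left to right starting from the first element:

```python
from functools import reduce
import operator


def right_nested(op, xs, start=0):
    # Mirrors the question's recursion: ((start op C) op B) op A for [A, B, C].
    acc = start
    for x in reversed(xs):
        acc = op(acc, x)
    return acc


def left_assoc(op, xs):
    # Left-associative evaluation seeded with the first element: (A op B) op C.
    return reduce(op, xs)


print(right_nested(operator.add, [1, 2, 3]))  # 6
print(right_nested(operator.sub, [1, 2, 3]))  # -6
print(right_nested(operator.mul, [1, 2, 3]))  # 0
print(left_assoc(operator.sub, [1, 2, 3]))    # -4
print(left_assoc(operator.mul, [1, 2, 3]))    # 6
```

Seeding the accumulator with the first element rather than 0 is what fixes both the subtraction and the multiplication case.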
There are also some other issues with your code (see e.g. @lurker's comments); however, I won't go into further detail on that. Instead, I suggest a predicate that adheres closely to the specifying production rule. Since the grammar is describing expressions and you want to know the corresponding values, let's call it expr_val/2. Now let's describe top-down what an expression can be: It can be a number:
expr_val(X,X) :-
number(X).
It can be an arbitrarily long sequence of additions or subtractions or multiplications respectively. For the reasons above all three sequences should be evaluated in a left associative way. So it's tempting to use one rule for all of them:
expr_val([Op|Es],V) :-
sequenceoperator(Op), % Op is one of the 3 operations
exprseq_op_val(Es,Op,V). % V is the result of a sequence of Ops
The power function is given as a list with three elements, the first being ^ and the others being expressions. So that rule is pretty straightforward:
expr_val([^,E1,E2],V) :-
expr_val(E1,V1),
expr_val(E2,V2),
V is V1^V2.
The expressions for sine and cosine are both lists with two elements, the first being sin or cos and the second being an expression. Note that the argument of sin and cos is the angle in radians. If the second element of the list yields the angle in radians, you can use sin/1 and cos/1 as you did in your code. However, if you get the angle in degrees, you need to convert it to radians first. I include the latter case as an example; use the one that fits your application.
expr_val([sin,E],V) :-
expr_val(E,V1),
V is sin(V1*pi/180). % radians = degrees*pi/180
expr_val([cos,E],V) :-
expr_val(E,V1),
V is cos(V1*pi/180). % radians = degrees*pi/180
For the second rule of expr_val/2 you need to define the three possible sequence operators:
sequenceoperator(+).
sequenceoperator(-).
sequenceoperator(*).
And subsequently the predicate exprseq_op_val/3. As the leading operator has already been removed from the list in expr_val/2, the list has to have at least two elements according to your specification. In order to evaluate the sequence in a left associative way the value of the head of the list is passed as an accumulator to another predicate exprseq_op_val_/4
exprseq_op_val([E1,E2|Es],Op,V) :-
expr_val(E1,V1),
exprseq_op_val_([E2|Es],Op,V,V1).
that describes the actual evaluation. There are basically two cases: If the list is empty then, regardless of the operator, the accumulator holds the result. Otherwise the list has at least one element. In that case another predicate, op_val_args/4, delivers the result of the respective operation (Acc1), which is then recursively passed as an accumulator to exprseq_op_val_/4 along with the tail of the list (Es):
exprseq_op_val_([],_Op,V,V).
exprseq_op_val_([E1|Es],Op,V,Acc0) :-
expr_val(E1,V1),
op_val_args(Op,Acc1,Acc0,V1),
exprseq_op_val_(Es,Op,V,Acc1).
At last you have to define op_val_args/4, that is again pretty straightforward:
op_val_args(+,V,V1,V2) :-
V is V1+V2.
op_val_args(-,V,V1,V2) :-
V is V1-V2.
op_val_args(*,V,V1,V2) :-
V is V1*V2.
Now let's see how this works. First your example query:
?- expr_val([*,1,[^,2,2],[*,2,[+,[sin,0],5]]],V).
V = 40.0 ? ;
no
The simplest expression according to your specification is a number:
?- expr_val(-3.14,V).
V = -3.14 ? ;
no
The empty list is not an expression:
?- expr_val([],V).
no
The operators +, - and * need at least 2 arguments:
?- expr_val([-],V).
no
?- expr_val([+,1],V).
no
?- expr_val([*,1,2],V).
V = 2 ? ;
no
?- expr_val([-,1,2,3],V).
V = -4 ? ;
no
The power function has exactly two arguments:
?- expr_val([^,1,2,3],V).
no
?- expr_val([^,2,3],V).
V = 8 ? ;
no
?- expr_val([^,2],V).
no
?- expr_val([^],V).
no
And so on...

Church numerals: How should I interpret the numbers from expressions?

Can someone explain to me using substitutions how we get a number "zero" or the rest of natural numbers?
For example the value: "zero"
λf.λx.x
if I apply this expression to another expression:
"(λf.(λx.x)) a"
then using substitution:
:=[a/f](λx.x)
:=(λx.x)
what am I missing? How should I interpret these number expressions?
The Church numeral n is a function that takes another function f and returns a function that applies f to its argument n times. So 0 a (where 0 is, as you said, λf.λx.x) returns λx.x because that applies a to x 0 times.
1 a gives you λx. a x, 2 a gives you λx. a (a x) and so on.
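The same substitutions can be watched in Python. This is only a sketch: the function a below is a stand-in I made up that builds a string recording where it was applied, so the output mirrors terms like λx. a (a x).

```python
ZERO = lambda f: lambda x: x
ONE = lambda f: lambda x: f(x)
TWO = lambda f: lambda x: f(f(x))

# Hypothetical stand-in for the expression 'a': it wraps its argument in "a(...)".
a = lambda term: "a(" + term + ")"

print(ZERO(a)("x"))  # x
print(ONE(a)("x"))   # a(x)
print(TWO(a)("x"))   # a(a(x))
```

Applying a numeral to a only fixes which function gets repeated; the count of applications is what identifies the number.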
Below is an explanation based on the paper by Erhan Bagdemir mentioned in the comment on the answer by sepp2k.
Essential points to grasp:
all Church numerals are functions of two parameters;
for Church numerals, it is implied that:
f is the 'successor' function (i.e. a function which accepts a Church numeral and returns the next Church numeral; it's basically an increment);
x — is a (Church numeral) value representing 'zero' (the count starting point).
Keeping that in mind:
λf . λx . x
will be equal to zero if we pass the appropriate f (in this particular case it doesn't matter what function is passed as f, since it is never applied) and x:
λf . λx . ZERO
following:
λf . λx . f x
will be evaluated to 1:
λf . λx . INCREMENT ZERO
and this:
λf . λx . f (f x)
will be equal to 2:
λf . λx . INCREMENT(INCREMENT ZERO)
and so on, for all the successive numbers.
See my (broader) answer to another (but closely related) question.
A Church numeral n (say, 2) represents the "action" of applying any given function n times (here, two times) to any given parameter.
A church numeral, by definition, is a function that takes two parameters, namely
1) a function
2) a parameter or expression or value on which the supplied function is applied.
When the supplied function is the successor function and the supplied second parameter is zero, you get the number itself (2, in the above example).
Church numeral 2 is by definition,
λf . λx . f (f x)
which is obviously a function that takes two parameters.
On passing the successor function, i.e. f(x) = x + 1, as the first parameter and zero as the second parameter to the function, we get:
f(f(0))
=f(1)
=2
This explanation is somewhat simplified, as the successor function and zero are not defined this way in lambda calculus.
Reference: http://www.cse.unt.edu/~tarau/teaching/GPL/docs/Church%20encoding.pdf
(an excellent explanation of Church encodings)
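For what it's worth, here is a small Python sketch of how the successor can be defined inside the encoding itself, instead of borrowing x + 1 and 0 from ordinary arithmetic. SUCC = λn.λf.λx. f (n f x) is the usual definition; the names SUCC and to_int are just labels chosen for this sketch.

```python
ZERO = lambda f: lambda x: x

# Successor on Church numerals: apply f one more time than n does.
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))

ONE = SUCC(ZERO)
TWO = SUCC(ONE)
THREE = SUCC(TWO)

# Only for inspection: convert back to a Python int with the ordinary +1 and 0.
to_int = lambda numeral: numeral(lambda k: k + 1)(0)

print(to_int(ZERO), to_int(ONE), to_int(TWO), to_int(THREE))  # 0 1 2 3
```

Here the ordinary x + 1 appears only in to_int, as a way of reading the result; the numerals themselves are built purely from lambdas.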

What's the formal term for a function that can be written in terms of `fold`?

I use the LINQ Aggregate operator quite often. Essentially, it lets you "accumulate" a function over a sequence by repeatedly applying the function on the last computed value of the function and the next element of the sequence.
For example:
int[] numbers = ...
int sum = numbers.Aggregate(0, (result, next) => result + next * next);
will compute the sum of the squares of the elements of an array.
After some googling, I discovered that the general term for this in functional programming is "fold". This got me curious about functions that could be written as folds. In other words, the f in f = fold op.
I think that a function that can be computed with this operator only needs to satisfy (please correct me if I am wrong):
f(x1, x2, ..., xn) = f(f(x1, x2, ..., xn-1), xn)
This property seems common enough to deserve a special name. Is there one?
An Iterated binary operation may be what you are looking for.
You would also need to add some stopping conditions like
f(x) = something
f(x1,x2) = something2
They define a binary operation f and another function F in the link I provided to handle what happens when you get down to f(x1,x2).
To clarify the question: 'sum of squares' is a special function because it has the property that it can be expressed in terms of the fold functional plus a lambda, ie
sumSq = fold ((result, next) => result + next * next) 0
Which functions f have this property, where dom f = { A tuples }, ran f :: B?
Clearly, due to the mechanics of fold, the statement that f is foldable is the assertion that there exists an h :: A * B -> B such that for any n > 0, x1, ..., xn in A, f ((x1,...xn)) = h (xn, f ((x1,...,xn-1))).
The assertion that the h exists says almost the same thing as your condition that
f((x1, x2, ..., xn)) = f((f((x1, x2, ..., xn-1)), xn)) (*)
so you were very nearly correct; the difference is that you are requiring A=B which is a bit more restrictive than being a general fold-expressible function. More problematically though, fold in general also takes a starting value a, which is set to a = f nil. The main reason your formulation (*) is wrong is that it assumes that h is whatever f does on pair lists, but that is only true when h(x, a) = a. That is, in your example of sum of squares, the starting value you gave to Aggregate was 0, which does nothing when you add it, but there are fold-expressible functions where the starting value does something, in which case we have a fold-expressible function which does not satisfy (*).
For example, take this fold-expressible function lengthPlusOne:
lengthPlusOne = fold ((result, next) => result + 1) 1
f (1) = 2, but f(f(), 1) = f(1, 1) = 3.
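The counterexample is easy to check by hand or in code. A minimal Python sketch using functools.reduce as the fold (the name length_plus_one is just a transliteration of lengthPlusOne):

```python
from functools import reduce


def length_plus_one(xs):
    # fold with step (result, next) => result + 1 and starting value 1
    return reduce(lambda result, _next: result + 1, xs, 1)


lhs = length_plus_one((1,))          # f(1)
inner = length_plus_one(())          # f() is the starting value, 1
rhs = length_plus_one((inner, 1))    # f(f(), 1) = f(1, 1)

print(lhs, rhs)  # 2 3
```

The two sides differ precisely because the starting value 1 contributes to the result instead of acting as a neutral element.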
Finally, let's give an example of a function on lists that is not expressible in terms of fold. Suppose we had a black box function and tested it on these inputs:
f (1) = 1
f (1, 1) = 1 (1)
f (2, 1) = 1
f (1, 2, 1) = 2 (2)
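Such a function on tuples (= finite lists) obviously exists (we can just define it to have the outputs above and be zero on all other lists). Yet it is not foldable. Reading the fold as peeling off the first element, (1) gives f((1,1)) = h(1, f((1))) = h(1,1) = 1, while (2) gives f((1,2,1)) = h(1, f((2,1))) = h(1,1) = 2, a contradiction.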
I don't know if there is other terminology than just saying 'a function expressible as a fold'. Perhaps a (left/right) context-free list function would be a good way of describing it?
In functional programming, fold is used to aggregate results on collections like list, array, sequence... Your formulation of fold is incorrect, which leads to confusion. A correct formulation could be:
fold f e [x1, x2, x3,..., xn] = f((...f(f(f(e, x1),x2),x3)...), xn)
The requirement for f is actually very loose. Let's say the type of the elements is T and the type of e is U. Then function f takes two arguments, the first of type U and the second of type T, and returns a value of type U (because this value will be supplied as the first argument of f again). In short, we have an "accumulate" function with the signature f: U * T -> U. For this reason, I don't think there is a formal term for these kinds of functions.
In your example, e = 0, T = int, U = int and your lambda function (result, next) => result + next * next has the signature f: int * int -> int, which satisfies the condition of "foldable" functions.
In case you want to know, another variant of fold is foldBack, which accumulates results with the reverse order from xn to x1:
foldBack f [x1, x2,..., xn] e = f(x1,f(x2,...,f(xn,e)...))
There are interesting cases with commutative functions, which satisfy f(x, y) = f(y, x): when f is also associative, fold and foldBack return the same result. About fold itself, it is a specific instance of a catamorphism in category theory. You can read more about catamorphisms here.
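As a rough illustration of the two directions, here is a Python sketch. The functions fold and fold_back are hand-written for this example (only functools.reduce is a library function), following the two formulas above:

```python
from functools import reduce


def fold(f, e, xs):
    # f(...f(f(e, x1), x2)..., xn): accumulate from the left
    return reduce(f, xs, e)


def fold_back(f, xs, e):
    # f(x1, f(x2, ... f(xn, e)...)): accumulate from the right
    acc = e
    for x in reversed(xs):
        acc = f(x, acc)
    return acc


add = lambda a, b: a + b
sub = lambda a, b: a - b

print(fold(add, 0, [1, 2, 3]), fold_back(add, [1, 2, 3], 0))  # 6 6
print(fold(sub, 0, [1, 2, 3]), fold_back(sub, [1, 2, 3], 0))  # -6 2
```

Addition gives the same answer both ways, while subtraction does not, since it is neither commutative nor associative.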

ML Expression, help line by line

val y=2;
fun f(x) = x*y;
fun g(h) = let val y=5 in 3+h(y) end;
let val y=3 in g(f) end;
I'm looking for a line by line explanation. I'm new to ML and trying to decipher some online code. Also, a description of the "let/in" commands would be very helpful.
I'm more familiar with OCaml but it all looks the same to me.
val y=2;
fun f(x) = x*y;
The first two lines bind variables y and f. y to an integer 2 and f to a function which takes an integer x and multiplies it by what's bound to y, 2. So you can think of the function f takes some integer and multiplies it by 2. (f(x) = x*2)
fun g(h) = let val y=5
in
3+h(y)
end;
The next line defines a function g which takes some h (which turns out to be a function which takes an integer and returns an integer) and does the following:
Binds the integer 5 to a temporary variable y.
You can think of the let/in/end syntax as a way to declare a temporary variable which can be used in the expression following in. end just ends the expression. (This is in contrast to OCaml, where end is omitted.)
Returns the sum of 3 and the result of applying the function h to y, i.e. to 5.
At a high level, the function g takes some function, applies that function to 5 and adds 3 to the result. (g(h) = 3+h(5))
At this point, three variables are bound in the environment: y = 2, f = function and g = function.
let val y=3
in
g(f)
end;
Now 3 is bound to a temporary variable y, and function g is called with the function f as the argument. You need to remember that when a function is defined, it keeps its environment along with it, so the temporary binding of y here has no effect on the functions g and f. Their behavior does not change.
g (g(h) = 3+h(5)), is called with argument f (f(x) = x*2). Performing the substitutions for parameter h, g becomes 3+((5)*2) which evaluates to 13.
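If it helps to compare with another language, here is a rough Python transliteration of the four lines. It is only a sketch (the wrapper name main is mine, standing in for the final let expression), but it shows the same lexical scoping: each y lives in its own scope, and f only ever sees the outermost one.

```python
y = 2                 # val y = 2;


def f(x):             # fun f(x) = x*y;
    return x * y      # y here is the module-level y, i.e. 2


def g(h):             # fun g(h) = let val y=5 in 3+h(y) end;
    y = 5             # local to g; passed to h, but invisible inside f
    return 3 + h(y)


def main():           # let val y=3 in g(f) end;
    y = 3             # local to main; used by neither f nor g
    return g(f)


print(main())  # 13
```

One caveat: Python looks up the module-level y when f is called rather than when it is defined, so rebinding the global y later would change f, whereas in ML a later val y = ... creates a new binding and leaves f untouched.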
I hope this is clear to you.
