An example of where normal order has less steps than applicative order? - lambda-calculus

I can't seem to come up with an example of this and wondering if there is such a case? I know if I have an expression where applicative order doesn't terminate that normal order may still terminate. I'm wondering though if there is an example where both orders terminate but normal order has fewer steps.

(λ p. λ q. q) ((λ x. λ y. λ z. ((x y) z)) (λ w. λ v. w))
With some whitespace:
(λ p.
λ q.
q
)
(
(λ x.
λ y.
λ z.
((x y) z)
)
(λ w.
λ v.
w
)
)
In normal order, the outermost reduction can be performed first, reducing directly to the identity combinator in one step. Applicative order will get there too, but it takes much longer since the x-y-z-w-v expression needs to be evaluated first.
Note that the x-y-z-w-v expression isn't even used. You can think of normal order as a sort of lazy evaluation: expressions are only evaluated or reduced when they are used. So you just build a formula that doesn't use one of its arguments and you immediately have an example of this kind of efficiency win.

In a lambda expression, any variable bound within an abstraction can be used zero or more times in the body of the abstraction.
Normal order evaluates the argument n times, where n is the number of times it is used in the body.
Applicative order evaluates the argument exactly once, irrespective of the number of times it is used in the body.
Comparison
If argument is used exactly once, both normal order and applicative order will have same performance.
If the argument is used more than once, applicative order will be faster.
If the argument is used zero times, normal order would be faster.

Related

Does SKS equal SKK?

Context
I started teaching myself lambda calculus last night and I am trying to determine if what I understand so far is correct.
Understanding
SKK is equivalent to the Identity combinator, I.
Where L stands for lambda:
S = LxLyLz((xz)(yz))
K = LxLy(x)
K essentially takes the next 2 (lambda) terms and gives back the first of those. S seems a little more complicated in the untyped lambda calculus.
My Interpretation
SK(any-lambda-term) is also equivalent to I.
I.e. the application of the application of S to K to Any-lambda-term is equivalent to the Identity combinator:
((S K)(Any)) = I = S K K = ((S K)(K))
I am using the convention of “left-association” in my above notation, if that helps (And I tried to make that clear in the 4th term above with parentheses. Everything I have read so far seems to use this convention).
Reasoning
S K = LyLz((K z)(y z))
The next lambda term will be substituted for y, let the term be Y.
S K Y = Lz((K z)(Y z))
(Y z) is the application of Y to z, also a lambda term.
(K z)returns the constant function that returns z, given another term input: (Y z).
Is my interpretation true? If not, can you provide an explanation? I would greatly appreciate it. Particularly if a sort of order of operations can be explained—I regularly find myself confused when considering when to evaluate. Perhaps that will be refined with practice.
Your intuition is correct, but an intuition proves nothing (alas...)
So, how can we prove your statement? Simply by showing that SKK and SKS have the same behaviour. "Behaviour" is an informal notion, which is formally capture by "semantics": if SKK and SKS are equals, then they should always reduce to the same term, according to the SKI-calculus semantics.
Now, there is a deep question, which is: what are the SKI-calculus? Actually, there is not a single way to answer that. What you implicitly do in your question is that you express SKI in terms of λ terms and you rely on the semantics of the λ calculus. This is absolutly correct. An other way to do it could have been to define directly SKI semantics. For instance, if you look at the wikipedia page, you can see that the semantics are not defined with lambda terms (and the fact that it correspond to lambda term is a (nice and expected) side effect). In the rest of this answer, I'll take the same approach as you do, and convert SKI terms in λ terms. A good exercise for you is to redo the proof, using the proper SKI semantics.
So, let formalize your question: your question is whether, for any SKI term t, SKKt = SKSt? Well... Let's see.
SKKt is encoded as (λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.x)t in the λ-calculus. We now just have to reduce it to a normal form (I detail every step, each time I reduce the leftmost λ, even tho it is not the fastest strategy):
(λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.x)t
= (λy.λz.((λx.λy.x)z)(yz))(λx.λy.x)t
= (λz.((λx.λy.x)z)((λx.λy.x)z))t
= ((λx.λy.x)t)((λx.λy.x)t)
= (λy.t)((λx.λy.x)t)
= t
So, the encoding of SKKt in the λ calculus reduces to t (as a sidenote, we just proved that SKK is equivalent to I here). To conclude our proof, we have to reduce SKSt and see whether it also reduces to t.
SKSt is encoded as (λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.λz.(xz)(yz))t. Let reduce it. (I don't detail as much this time)
(λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.λz.(xz)(yz))t
= ((λx.λy.x) t)((λx.λy.λz.(xz)(yz)) t))
= (λy.t)((λx.λy.λz.(xz)(yz)) t))
= t
Hurrah! It also reduce to t, so indeed, SKS and SKK are equivalent. It seems that the third combinator is not important: that as soon as you have SK?, it is equivalent to I. As an exercise, you can easily prove it (same strategy, if it is the case, then for any terms t and s, SKts = s). As mentionned above, an other good exercise is to redo the proof without using the λ semantics, but the proper SKI semantics.
Finally, my answer should raise a new question to you: we have two semantics, one that encodes SKI terms into λ terms, and one that does not. The question you may have is: are the two semantics equivalent? What does it mean for two semantics to be equivalent? If you are only starting to teach yourself λ calculus, it may be a bit early to try to answer those questions right now, but you can keep it in a corner of your head for when you'll get more familiar with formal languages.

Converting first-order logic to CNF without exponential blowup

When attempting to solve logic problems on a computer, it is usual to first convert them to CNF, because the best solving algorithms expect CNF as input.
For propositional logic, the textbook rules for this conversion are simple, but if you apply them as is, the result is one of the very rare cases where a program encounters double exponential resource consumption without being specifically constructed to do so:
a <=> (b <=> (c <=> ...))
with N variables, generates 2^2^N clauses, one exponential blowup in the conversion of equivalence to AND/OR, and another in the distribution of OR into AND.
The solution to this is to rename subterms. If we rewrite the above as something like
r <=> (c <=> ...)
a <=> (b <=> r)
where r is a fresh symbol that is being defined to be equal to a subterm - in general, we may need O(N) such symbols - the exponential blowups can be avoided.
Unfortunately, this runs into a problem when we try to extend it to first-order logic. Using TPTP notation where ? means 'there exists' and variables begin with capital letters, consider
a <=> ?[X]:p(X)
Admittedly this case is simple enough that there is no actual need to rename the subterm, but it's necessary to use a simple case for illustration, so suppose we are using an algorithm that just automatically renames arguments of the equivalence operator; the point generalizes to more complex cases.
If we try the above trick and rename the ? subterm, we get
r <=> ?[X]:p(X)
Existential variables are converted to Skolem symbols, so that ends up as
r <=> p(s)
The original formula then expands to
(~a | r) & (a | ~r)
Which is by construction equivalent to
(~a | p(s)) & (a | ~p(s))
But this is not correct! Suppose we had not done the renaming, but just expanded the original formula as it was, we would get
(~a | ?[X]:p(X)) & (a | ~?[X]:p(X))
(~a | ?[X]:p(X)) & (a | ![X]:~p(X))
(~a | p(s)) & (a | ~p(X))
which is critically different from the version we got with the renaming.
The problem is that equivalence needs both the positive and negative versions of each argument, but applying negation to terms that contain universal or existential quantifiers, structurally changes those terms; you cannot just encapsulate them in a definition, then apply the negation to the defined symbol.
The upshot of this is that when you have equivalence and the arguments may contain such quantifiers, you actually need to recur through each argument twice, once for the positive version, once for the negative. This suffices to bring back the existential blowup we hoped to avoid by doing the renaming. As far as I can see, this problem is not caused by the way a particular algorithm works, but by the nature of the task.
So my question:
Given an input formula that may contain arbitrary nesting of equivalence and quantifiers, is there any algorithm that will correctly turn this to CNF with a polynomial rather than exponential number of clauses?
As you observed, an existential such as ∃X.p(X) is not in fact equivalent to a Skolemized expression p(S). Its negation ¬∃X.p(X) is not equivalent to ¬p(S), but to ∀Y.¬p(Y).
Possible approaches that avoid the exponential blow-up:
Convert existentials such as ∃X.p(X) to universals such as ¬∀Y.p(Y), or vice versa, so you have a canonical form. Skolemize at a later step.
Remember when you convert that your p(S) is a Skolemized existential, and that its negation is ∀Y.¬p(Y).
Define terms equivalent to universals and existentials, such that a represents ∀Y.p(Y) and ¬a then represents ¬∀Y.p(Y), or equivalently, ∃X.¬p(X).
Use the symmetry of Boolean duals, so that the same transformations apply with AND and OR swapped, De Morgan’s Laws, and the equivalence between existentials and negated universals, to restore the symmetry between the expansions of r and ~r. The negations in the conversion between universals and existentials and in De Morgan's Laws cancel each other out, and the duality of switching AND and OR means you can re-use the result on the left to generate the one on the right mechanically again?
Given that you need to support ALL and NOT ALL statements anyway, this should not create any new problems. Just canonicalize and use the same approach you would for a universal.
If you’re solving by converting to SAT, your terms can represent universals, too. So, in your example, you’re trying to replace a with r, but you can still use ~a, equivalent to the negative universal.
In your expressions. you’d still use (~a | r) & (a | ~r), but expand ~r to its correct rather than the incorrect value. That example is trivial, since that’s just ~a, but you’d normally define r as equivalent to a more complex transformation, and in that case you need to remember what both r and ~r represent. It is not really a simple mechanical transformation of the Skolemized expression.
In this example, I’m not sure why it’s a problem that (~a | r) & (a | ~r) is equivalent to (~a | r) & (a | ~a), which simplifies to (~a | r). That’s not going to give you exponential blow-up? When you translate back to first-order predicate logic, make the correct translation.
Update
Thanks for clarifying what the problem was in chat. As I currently think I understand it, what you have is an equivalence with a left and a right side, which contains other nested equivalences, and you want to expand both the equivalence and its negation. The problem is that, because the negation does not have symmetrical form, you need to recurse twice for each nested equivalence in the tree, once when expanding the equivalence and once when expanding its negation?
You should define a transformation that generates the negative expansion from the positive expansion in linear time, and divide-and-conquer the expressions containing nested equivalences using that. This seems to be what you were after with the ~p(S) transformation.
To do this, you recall that ¬∃X.p(X) is equivalent to ∀X.¬p(X), and vice versa. Then if you’ve expanded p(x) into normal form as conjunctions and disjunctions, De Morgan’s Laws lets you turn an expression like ¬(a ∨ ¬b) into ¬a ∧ b. The inner ¬ on the quantifier transformation and the outer ¬ on the De Morgan transformation cancel each other out. Finally, the dual of any Boolean equivalence remains valid when you replace each ∨ and ∧ with the other and any atom a or ¬a with its inverse.
So, while I might be making an error, especially at 1 AM, it looks to me like what you want is the dual transformation that substitutes:
An outer ∃ for ∀ and vice versa
∧ for ∨ and vice versa
Each term t with ¬t and vice versa
Apply this to the expansion of the positive equivalence to generate the negative dual in time proportional to its length, without further recursion.

MicroKanren - what are the terms?

Having a little trouble understanding the core terms of the MicroKanren DSL. Section 4 says:
Terms of the language are defined by the unify operator. Here, terms of the language consist of variables, objects deemed identical under eqv?, and pairs of the foregoing.
But they never describe what the "pairs" actually mean. Are the pairs supposed to represent equality of two subterms, like so:
type 'a ukanren = KVar of int | KVal of 'a | KEq of 'a kanren * 'a kanren
So a term like:
(call/fresh (λ (a) (≡ a 7)))
Generates a pair for (≡ a 7)?
Edit: upon further thought, I don't think this is it. The mentions of "pair" in the paper seem to come much later, with extensions and refinements to the basic system, which would mean the pairs have no meaning in the terms for the basic intro. Is this correct?
In this context, "pair" just means a cons pair, such as (5 . 6) or (foo . #t). An example unification between two pairs is:
(call/fresh
(λ (a)
(call/fresh
(λ (b)
(≡ (cons a b) (cons 5 6))))))
which associates a with 5 and b with 6.
Sorry for the confusion and difficulty!! And thank you for the question!
You can think of the (typical) Kanren term language as having a single binary functor tag cons/2, and an infinite quantity of constants (the exact makeup changes from embedding to embedding).
Under the assumption that cons/2 is the only (n < 0)-ary functor tag, every compound term will have been built with it. If you look up standard presentations of unification (e.g. Martelli-Montanari) you'll usually see a step f(t0,...,tn) g(s0,...,sm) => fail if f =/= g or n =/= m. We handle clash-failure in unification when comparing ground atomic terms; for us, compound terms must necessarily be of the same arity and tag.
pair? recognizes conses. In Racket, in fact, they provide a cons? alias, to make this clearer!
Once again, thank you!

SICP Exercise 1.5

Exercise 1.5. Ben Bitdiddle has invented a test to determine whether the interpreter he is faced with is using applicative-order
evaluation or normal-order evaluation. He defines the following two
procedures:
(define (p) (p))
(define (test x y) (if (= x 0)
0
y))
Then he evaluates the expression
(test 0 (p))
What behavior will Ben observe with an interpreter that uses
applicative-order evaluation? What behavior will he observe with an
interpreter that uses normal-order evaluation?
I understand the answer to the exercise; my question lies in how (p) is interpreted versus p. For example, (test 0 (p)) causes the interpreter to hang (which is expected), but (test 0 p) with the above definition immediately evaluates to 0. Why?
Moreover, suppose we changed the definition to (define (p) p). With the given definition, (test 0 (p)) and (test 0 p) both evaluate to 0. Why does this occur? Why doesn't the interpreter hang? I am using Dr. Racket with the SICP package.
p is a function. (p) is a call to a function.
In your interpreter evaluate p.
p <Return>
==> P : #function
Now evaluate (p). Make sure you know how to kill your interpreter! (Probably there is a “Stop” button in Dr. Racket.)
(p)
Note that nothing happens. Or, at least, nothing visible. The interpreter is spinning away, eliminating tail calls (so, using near 0 memory), calling p.
As p and (p) evaluate to different things, you should expect different behaviour.
As to your second question : You are defining p to be a function that returns itself. Again, try evaluating p and (p) with your (define (p) p) and see what you get. My guess (I am using a computer on which I cannot install anything and which has no scheme) is that they will evaluate to the same thing. (I might even bet that (eq? p (p)) will evaluate to #t.)

Is this implementation tail-recursive

I read in an algorithmic book that the Ackermann function cannot be made tail-recursive (what they say is "it can't be transformed into an iteration"). I'm pretty perplex about this, so I tried and come up with this:
let Ackb m n =
let rec rAck cont m n =
match (m, n) with
| 0, n -> cont (n+1)
| m, 0 -> rAck cont (m-1) 1
| m, n -> rAck (fun x -> rAck cont (m-1) x) m (n-1)
in rAck (fun x -> x) m n
;;
(it's OCaml / F# code).
My problem is, I'm not sure that this is actually tail recursive. Could you confirm that it is? If not, why? And eventually, what does it mean when people say that the Ackermann function is not primitive recursive?
Thanks!
Yes, it is tail-recursive. Every function can be made tail-rec by an explicit transformation to Continuation Passing Style.
This does not mean that the function will execute in constant memory : you build stacks of continuations that must be allocated. It may be more efficient to defunctionalize the continuations to represent that data as a simple algebraic datatype.
Being primitive recursive is a very different notion, related to expressiveness of a certain form of recursive definition that is used in mathematical theory, but probably not very much relevant to computer science as you know it: they are of very reduced expressiveness, and systems with function composition (starting with Gödel's System T), such as all current programming languages, are much more powerful.
In term of computer languages, primtive recursive functions roughly correspond to programs without general recursion where all loop/iterations are statically bounded (the number of possible repetitions is known).
Yes.
By definition, any recursive function can be transformed into an iteration as long as it has access to an unbounded stack-like construct. The interesting question is whether it can be done without a stack or any other unbounded data storage.
A tail-recursive function can be turned into such an iteration only if the size of its arguments is bounded. In your example (and almost any recursive function that uses continuations), the cont parameter is for all means and purposes a stack that can grow to any size. Indeed, the entire point of continuation-passing style is to store data usually present on the call stack ("what to do after I return?") in a continuation parameter instead.

Resources