Is this implementation tail-recursive?

I read in an algorithms book that the Ackermann function cannot be made tail-recursive (what they say is "it can't be transformed into an iteration"). I'm pretty perplexed about this, so I tried and came up with this:
let ackb m n =
  let rec rAck cont m n =
    match (m, n) with
    | 0, n -> cont (n+1)
    | m, 0 -> rAck cont (m-1) 1
    | m, n -> rAck (fun x -> rAck cont (m-1) x) m (n-1)
  in rAck (fun x -> x) m n
;;
(it's OCaml / F# code).
My problem is, I'm not sure that this is actually tail-recursive. Could you confirm that it is? If not, why? And also, what does it mean when people say that the Ackermann function is not primitive recursive?
Thanks!

Yes, it is tail-recursive. Every function can be made tail-recursive by an explicit transformation to continuation-passing style (CPS).
This does not mean that the function will execute in constant memory: you build up stacks of continuations that must be allocated. It may be more efficient to defunctionalize the continuations and represent that data as a simple algebraic datatype.
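For instance, a possible defunctionalization looks like this in OCaml. This is just a sketch: the kont type, its constructors, and the apply function are names invented for this answer, not anything from a library.

type kont =
  | Done                  (* the initial continuation, fun x -> x *)
  | Outer of int * kont   (* stands for fun x -> rAck cont (m-1) x *)

let rec ack k m n =
  match m, n with
  | 0, n -> apply k (n + 1)
  | m, 0 -> ack k (m - 1) 1
  | m, n -> ack (Outer (m, k)) m (n - 1)

and apply k x =
  match k with
  | Done -> x
  | Outer (m, k) -> ack k (m - 1) x

let ackermann m n = ack Done m n

The chain of Outer constructors is exactly the stack of continuations just mentioned: the memory behaviour is the same as with closures, but the data is now a plain first-order structure you can inspect.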
Being primitive recursive is a very different notion, related to the expressiveness of a certain restricted form of recursive definition used in mathematical theory, but probably not very relevant to computer science as you know it: primitive recursive definitions are of very limited expressiveness, and systems with function composition (starting with Gödel's System T), such as all current programming languages, are much more powerful.
In terms of computer languages, primitive recursive functions roughly correspond to programs without general recursion in which all loops/iterations are statically bounded (the number of repetitions is known in advance).
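As a rough illustration (not a formal definition), here is what that style looks like in OCaml; the only loop is bounded by a value known before the loop starts:

(* factorial in a "bounded loop" style: the trip count n is fixed up front *)
let fact n =
  let acc = ref 1 in
  for i = 1 to n do
    acc := !acc * i
  done;
  !acc

The Ackermann function cannot be written in this style: its amount of work cannot be bounded by any primitive recursive function of the input, which is (informally) the sense in which it is not primitive recursive.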

Yes.
By definition, any recursive function can be transformed into an iteration as long as it has access to an unbounded stack-like construct. The interesting question is whether it can be done without a stack or any other unbounded data storage.
A tail-recursive function can be turned into such an iteration only if the size of its arguments is bounded. In your example (and in almost any recursive function that uses continuations), the cont parameter is for all intents and purposes a stack that can grow to any size. Indeed, the entire point of continuation-passing style is to store data usually present on the call stack ("what to do after I return?") in a continuation parameter instead.

Related

Coq `simpl` reduces `S n + m` to `S(n + m)` for free?

I'm just beginning to learn Coq via software foundations. One of the homework Theorems (with my successful proof elided) in Induction.v is:
Theorem plus_n_Sm : forall n m : nat,
S (n + m) = n + (S m).
Proof.
(* elided per request of authors *)
Qed.
Later, I noticed that the following similar "leftward" statement comes for free with the built-in tactic simpl:
Example left_extract : forall n m : nat, S n + m = S (n + m).
Proof.
intros. simpl. reflexivity.
Qed.
I've perused the documentation and haven't been able to figure out why simpl gives us one direction "for free" while the other direction requires a user-supplied proof. The documentation is over my head at this very early point in my learning.
I guess it has something to do with left-ness being built in and right-ness not being, but the propositions seem to my childlike eyes to be of equal complexity and subtlety. Would someone be so kind as to explain why, and perhaps give me some guidance about what is going on with simpl?
Why should I NOT be surprised by my finding?
What other good things can I expect from simpl, so it surprises me less and so I can eventually predict what it's going to do and rely on it?
What's the best way to wade through the theory -- unfolding of iota reductions and what not -- to focus on the relevant bits for this phenomenon? Or do I have to learn all the theory before I can understand this one bit?
I believe your surprise stems from the fact that you are accustomed to thinking of addition as a primitive concept. It is not: it is a function that has been defined, and other, more primitive concepts are used to explain its behavior.
The addition function is defined by a function whose name is written with letters (not the + symbol), in a way that looks like this:
Fixpoint add (n m : nat) : nat :=
  match n with
  | 0 => m
  | S p => S (add p m)
  end.
You can find this information by typing
Locate "_ + _".
The + notation is used for two functions; only one of them can be applied to numbers.
Coming back to the add function, its very description explains that add 0 m computes to m and add (S n) m computes to S (add n m), but it does not say anything about the case where the second argument has the form S m; that is just not written anywhere. Still, the theory of Coq makes it possible to prove the fact.
So the equality that is given for free comes directly from the definition of addition. There are a few other equality statements that look just as natural to the human eye but are not given for free; they can be proved by induction.
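If it helps, here is a loose OCaml analogue of the same phenomenon (only an analogy, since OCaml has no notion of proof): because the match is on the first argument, only the "leftward" shape steps by computation.

type nat = O | S of nat

let rec add n m =
  match n with
  | O -> m
  | S p -> S (add p m)

(* add (S n) m unfolds in one step to S (add n m), because the match sees the
   S constructor. Nothing in the definition ever inspects m, so there is no
   computation step for add n (S m); that equality needs an induction. *)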
Would it be possible to design the Coq system in such a way that both equality statements would be given for free? The answer is probably yes, but the theory must be designed carefully for that.

Does SKS equal SKK?

Context
I started teaching myself lambda calculus last night and I am trying to determine if what I understand so far is correct.
Understanding
SKK is equivalent to the Identity combinator, I.
Where L stands for lambda:
S = LxLyLz((xz)(yz))
K = LxLy(x)
K essentially takes the next 2 (lambda) terms and gives back the first of those. S seems a little more complicated in the untyped lambda calculus.
My Interpretation
SK(any-lambda-term) is also equivalent to I.
I.e. the application of the application of S to K to Any-lambda-term is equivalent to the Identity combinator:
((S K)(Any)) = I = S K K = ((S K)(K))
I am using the convention of “left-association” in my above notation, if that helps (And I tried to make that clear in the 4th term above with parentheses. Everything I have read so far seems to use this convention).
Reasoning
S K = LyLz((K z)(y z))
The next lambda term will be substituted for y, let the term be Y.
S K Y = Lz((K z)(Y z))
(Y z) is the application of Y to z, also a lambda term.
(K z) returns the constant function that returns z when given another term as input; here that input is (Y z).
Is my interpretation true? If not, can you provide an explanation? I would greatly appreciate it. Particularly if a sort of order of operations can be explained—I regularly find myself confused when considering when to evaluate. Perhaps that will be refined with practice.
Your intuition is correct, but an intuition proves nothing (alas...)
So, how can we prove your statement? Simply by showing that SKK and SKS have the same behaviour. "Behaviour" is an informal notion, which is formally captured by "semantics": if SKK and SKS are equal, then they should always reduce to the same term, according to the SKI-calculus semantics.
Now, there is a deep question, which is: what is the semantics of the SKI-calculus? Actually, there is not a single way to answer that. What you implicitly do in your question is express SKI in terms of λ terms and rely on the semantics of the λ-calculus. This is absolutely correct. Another way to do it would have been to define an SKI semantics directly. For instance, if you look at the Wikipedia page, you can see that the semantics there is not defined with lambda terms (and the fact that it corresponds to lambda terms is a (nice and expected) side effect). In the rest of this answer, I'll take the same approach as you do and convert SKI terms into λ terms. A good exercise for you is to redo the proof using the proper SKI semantics.
So, let's formalize your question: it is whether, for any SKI term t, SKKt = SKSt. Well... Let's see.
SKKt is encoded as (λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.x)t in the λ-calculus. We now just have to reduce it to a normal form (I detail every step, each time reducing the leftmost λ, even though it is not the fastest strategy):
(λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.x)t
= (λy.λz.((λx.λy.x)z)(yz))(λx.λy.x)t
= (λz.((λx.λy.x)z)((λx.λy.x)z))t
= ((λx.λy.x)t)((λx.λy.x)t)
= (λy.t)((λx.λy.x)t)
= t
So, the encoding of SKKt in the λ calculus reduces to t (as a sidenote, we just proved that SKK is equivalent to I here). To conclude our proof, we have to reduce SKSt and see whether it also reduces to t.
SKSt is encoded as (λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.λz.(xz)(yz))t. Let's reduce it (I don't detail as much this time):
(λx.λy.λz.(xz)(yz))(λx.λy.x)(λx.λy.λz.(xz)(yz))t
= ((λx.λy.x) t)((λx.λy.λz.(xz)(yz)) t)
= (λy.t)((λx.λy.λz.(xz)(yz)) t)
= t
Hurrah! It also reduces to t, so indeed SKS and SKK are equivalent. It seems that the third combinator is not important: as soon as you have SK?, it is equivalent to I. As an exercise, you can easily prove it (same strategy: show that, for any terms t and s, SKts = s). As mentioned above, another good exercise is to redo the proof without using the λ semantics, but the proper SKI semantics.
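If you want to try that last exercise with some executable support, here is a small OCaml sketch of a direct SKI semantics. The term type and the reduce function are made up for this answer; the rewrite rules Kxy -> x and Sxyz -> xz(yz) (plus Ix -> x) are the standard ones:

type term = S | K | I | Var of string | App of term * term

(* Reduce to a normal form; may loop forever on terms that have none. *)
let rec reduce t =
  match t with
  | App (App (K, x), _) -> reduce x
  | App (App (App (S, x), y), z) -> reduce (App (App (x, z), App (y, z)))
  | App (I, x) -> reduce x
  | App (f, x) ->
      let f' = reduce f in
      if f' <> f then reduce (App (f', x)) else App (f', reduce x)
  | _ -> t

let skkt = reduce (App (App (App (S, K), K), Var "t"))  (* Var "t" *)
let skst = reduce (App (App (App (S, K), S), Var "t"))  (* Var "t" *)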
Finally, my answer should raise a new question for you: we have two semantics, one that encodes SKI terms into λ terms, and one that does not. The question you may have is: are the two semantics equivalent? What does it mean for two semantics to be equivalent? If you are only starting to teach yourself the λ calculus, it may be a bit early to try to answer those questions right now, but you can keep them in a corner of your head for when you get more familiar with formal languages.

Is there a way to receive information on execution order (specifically for the sieve of Eratosthenes)?

I am attempting to understand one of the prime number algorithms enumerated here: https://wiki.haskell.org/index.php?title=Prime_numbers&oldid=36858#Postponed_Filters_Sieve, specifically:
primes :: [Integer]
primes = 2: 3: sieve (tail primes) [5,7..]
  where
    sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
                      -- or: filter ((/=0).(`rem`p)) t
      where (h,~(_:t)) = span (< p*p) xs
So conceptually I understand how this algorithm works (sieve of Erastothenes), start with 2,3, and a list of numbers, then eliminate any that are greater than the previous square and divisible by any below it.
But I'm having a hard time following along with the nested recursive step (primes calls sieve on primes, which calls sieve on primes, which...).
I understand that this works due to lazy evaluation, and it demonstrably produces the right result, but I am incapable of following it.
So for example if I were to run take 5 primes what would actually happen:
e.g (I will refer to the result of the take operation as t for ease of reading/reasoning):
Step 1)
primes returns a list [2,3, xs]
so t is [2,3, take 3 xs]
where xs is sieve (tail primes) [5,7..]
Step 2)
tail primes is 3:xs
where xs is sieve (tail primes) [5,7..]
etc
so t should now be [2,3,3,3,3,3...]
I have little trouble following sieve itself...
So I guess I have two questions.
1) How exactly does this algorithm actually work, and where/why is my trace wrong
2) Is there a way, generally, in Haskell to figure out what order things are running in? Maybe print a recursion tree? Or at the very least drop in a debugger halt?
I took the liberty of de-optimizing and clarifying the algorithm a little bit:
primes :: [Integer]
primes = 2 : sieve primes [3 ..]

sieve :: [Integer] -> [Integer] -> [Integer]
sieve [] xs = xs -- degenerate case for testing
sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
  where (h, t) = span (< p*p) xs
This is the same base logic, but it does a lot more redundant work (a constant factor per output value, though) than the version you provided. I think that's a simpler starting point, and once you understand how this version works, it's easy to see what the optimizations do. I also pulled sieve into its own definition. It didn't use anything from its enclosing scope, and the ability to test it standalone might help with understanding what's going on.
If you'd like to peek into how evaluation proceeds, you can use the Debug.Trace module. The two functions I use most from it are trace and traceShow, depending on the value I want to see.
So, let's get a bit of tracing info from sieve:
import Debug.Trace

primes :: [Integer]
primes = 2 : sieve primes [3 ..]

sieve :: [Integer] -> [Integer] -> [Integer]
sieve [] xs = trace "degenerate case for testing" xs
sieve (p:ps) xs = traceShow (p, h) $ h ++ sieve ps [x | x <- t, x `rem` p /= 0]
  where (h, t) = span (< p*p) xs
And to test it out:
ghci> take 10 primes
[2(2,[3])
,3(3,[5,7])
,5,7(5,[11,13,17,19,23])
,11,13,17,19,23(7,[29,31,37,41,43,47])
,29]
Well, that's a lot less clear than hoped. When ghci prints out a result, it uses the Show instance for the result's type. And the Show instance for [Integer] is lazy itself, so the printing of the list is getting interleaved with the tracing. To do better, let's have ghci produce a value that won't be output until after the tracing is complete. The sum should do:
ghci> sum $ take 10 primes
129
That was... less than useful. Where'd the tracing go? Well, remember that the tracing functions are very impure. Their explicit goal is to produce side effects. But GHC doesn't respect side effects. It assumes that all functions are pure. One result of that assumption is that it can store the result of evaluating expressions. (Whether it does so or not depends on whether there is a shared reference or CSE optimizations kick in. In this case, primes itself is a shared reference.)
Maybe if we ask it to evaluate further than it has so far?
ghci> sum $ take 20 primes
(11,[53,59,61,67,71,73,79,83,89,97,101,103,107,109,113])
639
Ok, the tracing is separate from ghci's output as desired. But it's not really very informative at that point. To get a better picture, it needs to start back at the beginning. To do that, we need to get ghci to unload the definition of primes so that it will re-evaluate it from scratch. There are a bunch of ways to do this, but I'll demonstrate a method that has some additional ways to be useful.
ghci> :load *sieve.hs
[1 of 1] Compiling Main ( sieve.hs, interpreted )
Ok, modules loaded: Main.
By putting the * in front of the file name in the :load command, I instructed ghci to interpret the source from scratch, regardless of its current state. This works in this case because it forces a re-interpretation even though the source hasn't changed. It also is useful when you want to use :load on a source that has compiled output in the current directory, and have it interpret the whole module, not just load the compiled code.
ghci> sum $ take 10 primes
(2,[3])
(3,[5,7])
(5,[11,13,17,19,23])
(7,[29,31,37,41,43,47])
129
Now, let's get into how the algorithm actually works. The first thing to look into is what the components of the tracing output are. The first element is the prime whose multiples are being sieved out of the potential outputs. The second element is the list of values being accepted as primes because they're less than p*p, and all non-primes less than that have already been removed from the candidate list. The mechanics of that should be familiar from any study of the sieve of Eratosthenes.
The calls to sieve start with sieve primes [3..]. The first place laziness critically comes into play is the pattern match on the first argument. The (:) constructor is already known, so the pattern matches p to the literal 2, and ps to an unevaluated expression. It's very important that it's unevaluated, because this call to sieve is what calculates the value. If it forced it to be evaluated to proceed, it would introduce a circular data dependency, which results in an infinite loop.
As the tracing indicates, the prime being used to remove elements from the candidates is 2. The call to span splits the input [3..] into ([3], [4..]). h is [3], as demonstrated by the tracing output. So the result of the call to sieve is [3] ++ <recursive call to sieve>. This is the second place laziness critically comes into play in the algorithm. The implementation of (++) doesn't do anything at all with its second argument until it has already produced the prefix of the list. This means that before the recursive call to sieve is evaluated, it's known that ps refers to a thunk that evaluates to [3] ++ <recursive call>.
That's enough information to handle the recursive call to sieve. Now, p is matched to 3, ps is matched to a thunk, and the logic continues. The tracing should illustrate what's going on at this point.
Now, the version you started with does a few things to optimize. First, it observes that the first element of t is always going to equal p*p, and it uses pattern matching to eliminate that element without doing any remainder calculation on it. This is a small saving per prime examined, but it is a clear saving.
Second, it skips filtering out the multiples of two, and just doesn't generate them in the first place. This reduces the amount of elements generated to be filtered later by a factor of two, and it reduces the number of filters being applied to each odd element by one.
As an aside, note that the stacking filter behavior is actually algorithmically significant, and not faithful to the sieve of Eratosthenes as described in literature. For further discussion of this, see The Genuine Sieve of Eratosthenes by Melissa O'Neill.

(++) operator and (:) operator and lazy evaluation

Chapter 8 of Real World Haskell:
globToRegex' (c:cs) = escape c ++ globToRegex' cs
This function is not tail recursive, and the book says that the answer relies on Haskell's non-strict (lazy) evaluation strategy. The (++) operator's simple definition is the following, and it's not tail recursive either.
(++) :: [a] -> [a] -> [a]
(x:xs) ++ ys = x : (xs ++ ys)
[] ++ ys = ys
In a strict language, if we evaluate "foo" ++ "bar", the entire list is constructed, then returned. Non-strict evaluation defers much of the work until it is needed.
If we demand an element of the expression "foo" ++ "bar", the first pattern of the function's definition matches, and we return the expression x : (xs ++ ys). Because the (:) constructor is non-strict, the evaluation of xs ++ ys can be deferred: we generate more elements of the result at whatever rate they are demanded. When we generate more of the result, we will no longer be using x, so the garbage collector can reclaim it. Since we generate elements of the result on demand, and do not hold onto parts that we are done with, the compiler can evaluate our code in constant space.
(Emphasis added.)
The explanation above (the emphasized part) is something essential to Haskell, but:
How can we comprehend that?
What is happening underneath?
"x : (xs ++ ys) will evaluate in constant space" -- how? That sounds like what tail recursion does!
Remember that "foo" is just syntactic sugar for 'f':'o':'o':[].
That is, String is just an alias for [Char] which is just a linked list of characters.
When client code is consuming the linked list, it decomposes it back into a head and tail (e.g. x:xs), does something with the head (if desired), and then recurses for the tail.
When your code is constructing the linked list, because of lazy evaluation, all it needs to do is return a thunk or promise that it will return a linked list when asked for. When the head is dereferenced, it is supplied on demand, and the tail is left as a promise for the rest of the list.
It should be easy to see that as long as the list is not copied or otherwise stored, each thunk will get used once and then discarded, so that the overall storage space is constant.
Many strict languages expose a mechanism (often called a generator) to accomplish the same kind of lazy list generation, but with lazy evaluation such features come "for free" as part of the language -- in essence, all Haskell lists are generators!
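For example, in OCaml (a strict language) you would reach for an explicit lazy-sequence type to get the same on-demand behaviour that Haskell lists give you by default; a small sketch using the standard Seq module:

let foo = List.to_seq ['f'; 'o'; 'o']
let bar = List.to_seq ['b'; 'a'; 'r']

(* Seq.append builds nothing up front: each element of the result is
   produced only when the consumer demands it. *)
let foobar = Seq.append foo bar

let () =
  Seq.iter print_char foobar;   (* prints "foobar", one character at a time *)
  print_newline ()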
Relying on lazy evaluation rather than tail recursion is a characteristic of Haskell in comparison to other FP languages. The two play related roles in terms of limiting memory usage; which one is the appropriate mechanism depends on the data being produced.
If your output may be incrementally consumed, then you should prefer to take advantage of lazy evaluation, as output will only be generated as it is required, thus limiting heap consumption. If you eagerly construct the output, then you are resigning yourself to using heap, but can at least conserve stack by being tail recursive.
If your output cannot be incrementally consumed -- perhaps you are computing an Int -- then laziness can leave you with an unwanted pile of thunks whose evaluation will blow your stack. In this case, a strict accumulator and tail recursion are called for.
So, if you are eager you might waste heap building a big data structure. If you are lazy, you might defer simplification (e.g. reducing 1 + 1 to 2) to the heap, only to eventually sully your stack when paying the piper.
To play with both sides of the coin, ponder foldl' and foldr.
Tail recursion would keep the stack constant, but in a strict language the heap would grow as x : (xs ++ ys) was computed. In Haskell, because it is non-strict, x would be freed before the next value was computed (unless the caller held a reference to x unnecessarily), so the heap would also be constant.

OCaml implementation advice for an algorithm on sets

I am having problems converting the following algorithm into OCaml. To implement it I used the Set.Make(String) functor; in and out are actually two functions. Can anyone give me precise OCaml code help for the following?
This is the algorithm for Live Variables [PDF]. Help would be greatly appreciated.
for all n, in[n] = out[n] = Ø
w = { set of all nodes }
repeat until w is empty:
    n = w.pop()
    out[n] = ∪ { in[n'] | n' ∈ succ[n] }
    in[n] = use[n] ∪ (out[n] - def[n])
    if in[n] changed:
        for all predecessors m of n, w.push(m)
end
It's hard for me to tell exactly what is going on here. I think there are some alignment issues with your text -- repeat until w empty should be repeating the next five lines, right? And how are in and out functions? They look like arrays to me. Those deficiencies aside, I'll go over some general rules I have followed.
I've had to translate a number of numerical methods from C and Fortran to functional languages, and I have some suggestions for you.
0) Define the datatypes being used. This will really help you with the next step (spoiler: looking for functional constructs). Once you know the datatypes you can more accurately define the functional constructs you will eventually apply.
1) Look for functional constructs: folds, recursion, and maps, and when to use them. For example, the for all predecessors m step is a fold (I'm unsure whether it would fold over a tree or a list, but nonetheless a fold); see the worklist sketch after this list. While loops are a good place for recursion -- but don't worry about making it a tail call; you can modify the parameters later to meet that requirement. Don't worry about being 100% pure. Remove enough impure constructs (references or arrays) to get a feel for the algorithm in a functional way.
2) Write any part of the algorithm that you can. Leave functions blank, add dummy values, and just implement what you know -- then you can ask better, more informed questions on SO.
3) Re-write it. There is a good chance you missed some functional constructs, or used an array or a reference where you now realize you can use a list, a set, or an accumulator you pass along. You may have defined a list, but later realize you cannot randomly access it (well, doing so would be pretty detrimental), or that it needs to be traversed forward and back (a good place for a zipper). Either way, when you finally get it you'll know, and you should have a huge ear-to-ear grin on your face.
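Here is the worklist sketch promised above: a rough OCaml rendering of the algorithm over Set.Make(String) sets. The succ, pred, use_ and def_ arguments are placeholders you would supply from your own control-flow graph (they are not library functions), and the hash tables could be replaced by a Map threaded through an accumulator if you want to stay pure.

module SS = Set.Make (String)

(* Assumed shapes: nodes are strings; succ, pred : string -> string list;
   use_, def_ : string -> SS.t. All of these come from your CFG. *)
let live_variables ~succ ~pred ~use_ ~def_ nodes =
  let in_ = Hashtbl.create 16 and out_ = Hashtbl.create 16 in
  let get tbl n = try Hashtbl.find tbl n with Not_found -> SS.empty in
  let rec loop = function
    | [] -> ()
    | n :: w ->
        (* out[n] = union of in[n'] over every successor n' of n *)
        let out_n =
          List.fold_left (fun acc n' -> SS.union acc (get in_ n')) SS.empty (succ n)
        in
        (* in[n] = use[n] ∪ (out[n] - def[n]) *)
        let in_n = SS.union (use_ n) (SS.diff out_n (def_ n)) in
        Hashtbl.replace out_ n out_n;
        let changed = not (SS.equal in_n (get in_ n)) in
        Hashtbl.replace in_ n in_n;
        (* if in[n] changed, push every predecessor of n back onto the worklist *)
        loop (if changed then pred n @ w else w)
  in
  loop nodes;
  (get in_, get out_)

Note that the worklist here is a plain list, so a node can appear in it more than once; that only costs extra iterations, not correctness.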
