Chapter 8 of Real World Haskell has this clause:
globToRegex' (c:cs) = escape c ++ globToRegex' cs
This function is not tail recursive, and the book says that the answer relies on Haskell's non-strict (lazy) evaluation strategy. The (++) operator has the following simple definition, and it is not tail recursive either:
(++) :: [a] -> [a] -> [a]
(x:xs) ++ ys = x : (xs ++ ys)
[] ++ ys = ys
In a strict language, if we evaluate "foo" ++ "bar", the entire list is constructed, then returned. Non-strict evaluation defers much of the work until it is needed.
If we demand an element of the expression "foo" ++ "bar", the first pattern of the function's definition matches, and we return the expression x : (xs ++ ys). Because the (:) constructor is non-strict, the evaluation of xs ++ ys can be deferred: we generate more elements of the result at whatever rate they are demanded. When we generate more of the result, we will no longer be using x, so the garbage collector can reclaim it. Since we generate elements of the result on demand, and do not hold onto parts that we are done with, the compiler can evaluate our code in constant space.
(Emphasis added.)
The explanation in bold above is something essential to Haskell, but how can we make sense of it?
What is happening underneath?
How can x : (xs ++ ys) evaluate in constant space? That sounds like what tail recursion accomplishes!
Remember that "foo" is just syntactic sugar for 'f':'o':'o':[].
That is, String is just an alias for [Char] which is just a linked list of characters.
When client code is consuming the linked list, it decomposes it back into a head and tail (e.g. x:xs), does something with the head (if desired), and then recurses for the tail.
When your code is constructing the linked list, because of lazy evaluation, all it needs to do is return a thunk or promise that it will return a linked list when asked for. When the head is dereferenced, it is supplied on demand, and the tail is left as a promise for the rest of the list.
It should be easy to see that as long as the list is not copied or otherwise stored, each thunk will get used once and then discarded, so that the overall storage space is constant.
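To make this concrete, here is a small sketch of my own (not from the book, and using nothing beyond the Prelude) that spells out the reduction one demanded element at a time, with a local copy of (++) so we are free to annotate it:

app :: [a] -> [a] -> [a]
app (x:xs) ys = x : app xs ys
app []     ys = ys

-- Demanding app "fo" "bar" one element at a time:
--   app "fo" "bar"
--     = 'f' : app "o" "bar"        -- 'f' goes to the consumer, then is dropped
--     = 'f' : ('o' : app "" "bar") -- 'o' goes to the consumer, then is dropped
--     = 'f' : ('o' : "bar")        -- the second argument is returned as-is

-- length consumes its argument incrementally and never retains the spine,
-- so even a very long left operand is traversed in constant space:
main :: IO ()
main = print (length (app (replicate 1000000 'x') "bar"))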
Many strict languages expose a mechanism (often called a generator) to accomplish the same kind of lazy list generation, but with lazy evaluation such features come "for free" as part of the language -- in essence, all Haskell lists are generators!
Relying on lazy evaluation rather than tail recursion is a characteristic of Haskell in comparison to other FP languages. The two play related roles in terms of limiting memory usage; which one is the appropriate mechanism depends on the data being produced.
If your output may be incrementally consumed, then you should prefer to take advantage of lazy evaluation, as output will only be generated as it is required, thus limiting heap consumption. If you eagerly construct the output, then you are resigning yourself to using heap, but can at least conserve stack by being tail recursive.
If your output cannot be incrementally consumed -- perhaps you are computing an Int -- then laziness can leave you with an unwanted pile of thunks whose evaluation will blow your stack. In this case, a strict accumulator and tail recursion are called for.
So, if you are eager you might waste heap building a big data structure. If you are lazy, you might defer simplification (e.g. reducing 1 + 1 to 2) to the heap, only to eventually sully your stack when paying the piper.
To play with both sides of the coin, ponder foldl' and foldr.
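To make the two sides concrete, here is a small sketch (my own example, using Data.List.foldl' from the standard library):

import Data.List (foldl')

-- Output that cannot be consumed incrementally (a single Int): use a strict
-- accumulator and tail recursion, which is exactly what foldl' provides.
total :: Int
total = foldl' (+) 0 [1 .. 1000000]

-- Output that can be consumed incrementally (a list): foldr with the lazy
-- (:) constructor produces elements on demand, so take only forces a few,
-- even over an infinite input.
firstSquares :: [Int]
firstSquares = take 5 (foldr (\x acc -> x * x : acc) [] [1 ..])

main :: IO ()
main = print total >> print firstSquares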
Tail recursion would keep the stack constant, but in a strict language the heap would grow as x : (xs ++ ys) was computed. In Haskell, because it is non-strict, x would be freed before the next value was computed (unless the caller held a reference to x unnecessarily), so the heap would also be constant.
I am attempting to understand one of the prime number algorithms enumerated here: https://wiki.haskell.org/index.php?title=Prime_numbers&oldid=36858#Postponed_Filters_Sieve, specifically:
primes :: [Integer]
primes = 2: 3: sieve (tail primes) [5,7..]
  where
    sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
                      -- or: filter ((/=0).(`rem`p)) t
      where (h, ~(_:t)) = span (< p*p) xs
So conceptually I understand how this algorithm works (sieve of Eratosthenes): start with 2, 3, and a list of numbers, then eliminate any that are greater than the previous square and divisible by any below it.
But I'm having a hard time following the nested recursive step (primes calls sieve on primes, which calls sieve on primes, which...).
I understand that this works due to lazy evaluation, and it demonstrably produces the right result, but I am incapable of following it.
So, for example, if I were to run take 5 primes, what would actually happen?
(I will refer to the result of the take operation as t for ease of reading/reasoning.)
Step 1)
primes returns a list [2,3, xs]
so t is [2,3, take 3 xs]
where xs is sieve (tail primes) [5,7..]
Step 2)
tail primes is 3:xs
where xs is sieve (tail primes) [5,7..]
etc
so t should now be [2,3,3,3,3,3...]
I have little trouble following sieve itself...
So I guess I have two questions.
1) How exactly does this algorithm actually work, and where/why is my trace wrong?
2) Is there a way, generally, in Haskell to figure out what order things are running in? Maybe print a recursion tree? Or at the very least drop in a debugger halt?
I took the liberty of de-optimizing and clarifying the algorithm a little bit:
primes :: [Integer]
primes = 2 : sieve primes [3 ..]

sieve :: [Integer] -> [Integer] -> [Integer]
sieve [] xs = xs -- degenerate case for testing
sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
  where (h, t) = span (< p*p) xs
This is the same base logic, but it does a lot more redundant work (a constant factor per output value, though) than the version you provided. I think that's a simpler starting point, and once you understand how this version works, it's easy to see what the optimizations do. I also pulled sieve into its own definition. It didn't use anything from its enclosing scope, and the ability to test it standalone might help with understanding what's going on.
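For instance, with just those definitions loaded, a quick ghci session might look like this (my own session; the degenerate clause makes sieve easy to poke at standalone):

ghci> take 10 primes
[2,3,5,7,11,13,17,19,23,29]
ghci> take 5 (sieve [] [9 ..])   -- degenerate case: nothing gets sieved out
[9,10,11,12,13]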
If you'd like to peek into how evaluation proceeds, you can use the Debug.Trace module. The two functions I use most from it are trace and traceShow, depending on the value I want to see.
So, let's get a bit of tracing info from sieve:
import Debug.Trace

primes :: [Integer]
primes = 2 : sieve primes [3 ..]

sieve :: [Integer] -> [Integer] -> [Integer]
sieve [] xs = trace "degenerate case for testing" xs
sieve (p:ps) xs = traceShow (p, h) $ h ++ sieve ps [x | x <- t, x `rem` p /= 0]
  where (h, t) = span (< p*p) xs
And to test it out:
ghci> take 10 primes
[2(2,[3])
,3(3,[5,7])
,5,7(5,[11,13,17,19,23])
,11,13,17,19,23(7,[29,31,37,41,43,47])
,29]
Well, that's a lot less clear than hoped. When ghci prints out a result, it uses the Show instance for the result's type. And the Show instance for [Integer] is lazy itself, so the printing of the list is getting interleaved with the tracing. To do better, let's have ghci produce a value that won't be output until after the tracing is complete. The sum should do:
ghci> sum $ take 10 primes
129
That was... less than useful. Where'd the tracing go? Well, remember that the tracing functions are very impure. Their explicit goal is to produce side effects. But GHC doesn't respect side effects; it assumes that all functions are pure. One result of that assumption is that it can store and reuse the result of evaluating an expression. (Whether it does so or not depends on whether there is a shared reference or CSE optimizations kick in. In this case, primes itself is a shared reference.)
Maybe if we ask it to evaluate further than it has so far?
ghci> sum $ take 20 primes
(11,[53,59,61,67,71,73,79,83,89,97,101,103,107,109,113])
639
Ok, the tracing is separate from ghci's output as desired. But it's not really very informative at that point. To get a better picture, it needs to start back at the beginning. To do that, we need to get ghci to unload the definition of primes so that it will re-evaluate it from scratch. There are a bunch of ways to do this, but I'll demonstrate a method that has some additional ways to be useful.
ghci> :load *sieve.hs
[1 of 1] Compiling Main ( sieve.hs, interpreted )
Ok, modules loaded: Main.
By putting the * in front of the file name in the :load command, I instructed ghci to interpret the source from scratch, regardless of its current state. This works in this case because it forces a re-interpretation even though the source hasn't changed. It also is useful when you want to use :load on a source that has compiled output in the current directory, and have it interpret the whole module, not just load the compiled code.
ghci> sum $ take 10 primes
(2,[3])
(3,[5,7])
(5,[11,13,17,19,23])
(7,[29,31,37,41,43,47])
129
Now, let's get into how the algorithm actually works. The first thing to look into is what the components of the tracing output are. The first element is the prime whose multiples are being sieved out of the potential outputs. The second element is the list of values being accepted as primes because they're less than p*p, and all non-primes less than that have already been removed from the candidate list. The mechanics of that should be familiar from any study of the sieve of Eratosthenes.
The calls to sieve start with sieve primes [3..]. The first place laziness critically comes into play is the pattern match on the first argument. The (:) constructor is already known, so the pattern matches p to the literal 2, and ps to an unevaluated expression. It's very important that ps stays unevaluated, because this very call to sieve is what calculates its value. If sieve forced ps to be evaluated before proceeding, it would introduce a circular data dependency, which results in an infinite loop.
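If it helps, the same benign self-reference appears in the classic lazy Fibonacci definition (my own example, not part of the question): each element only depends on elements the list has already produced, so the knot never has to be forced all at once.

-- fibs is defined in terms of itself, just as primes is; zipWith only ever
-- demands elements of fibs that have already been produced.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- take 8 fibs == [0,1,1,2,3,5,8,13]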
As the tracing indicates, the prime being used to remove elements from the candidates is 2. The call to span splits the input [3..] into ([3], [4..]). h is [3], as demonstrated by the tracing output. So the result of the call to sieve is [3] ++ <recursive call to sieve>. This is the second place laziness critically comes into play in the algorithm. The implementation of (++) doesn't do anything at all with its second argument until it has already produced the prefix of the list. This means that before the recursive call to sieve is evaluated, it's known that ps refers to a thunk that evaluates to [3] ++ <recursive call>.
That's enough information to handle the recursive call to sieve. Now, p is matched to 3, ps is matched to a thunk, and the logic continues. The tracing should illustrate what's going on at this point.
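If it helps to see it on paper, here is a rough hand evaluation of the first two unfoldings (a sketch of my own; it elides the exact order in which thunks are forced):

sieve (2 : <rest of primes>) [3 ..]
  = [3] ++ sieve <rest of primes> [x | x <- [4 ..], x `rem` 2 /= 0]
      -- because span (< 4) [3 ..] gives ([3], [4 ..])
  = [3] ++ sieve <rest of primes> (5 : 7 : 9 : 11 : ...)

-- By the time that recursive call needs its first argument, the "[3] ++"
-- prefix has already established that <rest of primes> begins with 3, so:

sieve (3 : <rest'>) (5 : 7 : 9 : 11 : ...)
  = [5,7] ++ sieve <rest'> [x | x <- 9 : 11 : 13 : ..., x `rem` 3 /= 0]
      -- because span (< 9) gives ([5,7], 9 : 11 : 13 : ...)
  = [5,7] ++ sieve <rest'> (11 : 13 : 17 : 19 : 23 : 25 : ...)

-- and so on: each prime is emitted before sieve ever needs to inspect it as
-- an element of its own first argument.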
Now, the version you started with does a few things to optimize. First, it observes that the first element of t is always going to equal p*p, and it uses pattern matching to eliminate that element without doing any remainder calculation on it. This is a small saving per prime examined, but it is a clear saving.
Second, it skips filtering out the multiples of two, and just doesn't generate them in the first place. This reduces the number of elements generated (and later filtered) by a factor of two, and it reduces the number of filters applied to each odd element by one.
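Folding those two observations back into the simplified version gives essentially the formulation you started with; here it is again with the two optimizations called out in comments:

primes :: [Integer]
primes = 2 : 3 : sieve (tail primes) [5,7..]   -- evens are never generated at all
  where
    sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
      -- span (< p*p) leaves p*p itself at the head of the remainder,
      -- so the lazy pattern ~(_:t) discards it without a rem test.
      where (h, ~(_:t)) = span (< p*p) xs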
As an aside, note that the stacking filter behavior is actually algorithmically significant, and not faithful to the sieve of Eratosthenes as described in literature. For further discussion of this, see The Genuine Sieve of Eratosthenes by Melissa O'Neill.
While writing down this question about an empty list as a difference list, I wanted to test what I knew about those structures. However, when I tried something as simple as comparing different notations, it seemed that I was wrong and that I did not understand what is actually going on with difference lists.
?- L = [a,b,c|[d,e]]-[d,e], L = [a,b,c].
false % expected true
I tested this on SWI-Prolog as well as SICStus. I verified the notation as this is how it is written in Bratko's Prolog Programming for AI, page 210, but apparently unification is not possible. Why is that? Don't these notations have the same declarative meaning?
I think you have the idea that the Prolog interpreter treats difference lists as something special. That is not the case: Prolog is not aware of the concept of a difference list (nor of nearly any concept beyond some syntactic sugar). It only sees:
L = -(|(a, |(b, |(c, |(d, |(e, []))))), |(d, |(e, [])))
where -/2 and |/2 are functors, and a, b, c, d, e and [] are constants.
Difference lists are simply a programming technique (much as, for instance, dynamic programming is a technique: the compiler can neither detect dynamic programming programs nor treat them differently). The technique is used to efficiently unify a (partially) uninstantiated part deep in an expression.
Say you want to append/3 two lists. You can do this as follows:
% append(A, B, C).
append([], L, L).
append([H|T], L, [H|B]) :-
    append(T, L, B).
But this runs in O(n): you first need to iterate through the entire first list. If that list contains thousands of elements, it will take a lot of time.
Now you can define for yourself a contract that you will feed append_diff/3 not simply the lists, but pairs -(List,Tail) where List is a reference to the beginning of the list, and Tail is a reference to the unbound tail at its end. Examples of structures that fulfill this requirement are Tail-Tail, [a|Tail]-Tail, and [1,4,2,5|Tail]-Tail.
Now you can effectively append_diff/3 in O(1) with:
append_diff(H1-T1,T1-T2,H1-T2).
Why? Because you unify the unbound tail of the first list with the second list. Now the unbound tail of the second list becomes the tail of the final list. So take for instance:
append_diff([a|T1]-T1,[1,4,2,5|T2]-T2,L).
If you call the predicate, as you see above, T1 will unify with [1,4,2,5|T2], so the first list collapses to [a|[1,4,2,5|T2]], or shorter [a,1,4,2,5|T2]. Since we also have a reference to T2, we can "return" (in Prolog nothing is really returned) [a,1,4,2,5|T2]-T2: a new difference list with an open tail T2. But this works only because you give - a special meaning yourself: for Prolog, - is simply -; it is not minus, it does not calculate a difference, etc. Prolog does not attach semantics to functors. If you had used + instead of -, it would not have made the slightest difference.
So to return to your question: you simply state to Prolog that L = -([a,b,c,d,e],[d,e]) and later state that L = [a,b,c]. It is clear that those two terms cannot be unified. So Prolog answers false.
I am new to Prolog and I am studying it for a university exam; we use SWI-Prolog.
I have some trouble understanding how this simple program works: it says TRUE if a list S is a sublist of a list L, and otherwise says that the predicate is FALSE.
I have the following solution, but I have some trouble understanding its declarative meaning.
Reading the book I think I have some idea, but I am not sure about it...
This is the solution that uses concatenation:
sublist(S, L) :-
    conc(L1, L2, L),
    conc(S, L3, L2).

conc([], L, L).
conc([X|L1], L2, [X|L3]) :-
    conc(L1, L2, L3).
This solution uses another little program that responds TRUE if the third list is the concatenation of the first and the second list.
For S to be a sublist of L, the following two conditions have to be TRUE:
L has to be a list that is the concatenation of L1 and L2
L2 has to be a list that is the concatenation of S (my sublist, if it exists in the list L) and another list L3
This is the book's explanation, but it is just a little obscure to me...
I have tried to reason about it and to understand what it really, deeply means...
So I think that, in some way, it is like searching whether an element is a member of a list using this other program:
member2(X, [X|_]).
member2(X,[_|T]):- member2(X,T).
In this program I simply say that if X is the element at the top of the list (its head), then X is in the list and the program responds true. Otherwise, if the element X is not at the top of the list (or it is not my solution), I try to search for it in the TAIL T of the list.
Back to the sublist program, I think the reasoning is similar:
First I decompose the list L into two lists L1 and L2 (using the conc program).
Then I check whether it is true that the concatenation of S and L3 is the list L2.
If both these conditions are true, then S is a sublist of L.
I think that the list L1 has a role similar to the element X that I extract from the list in the member program.
Since the sublist S can start at the beginning of the list L, L1 can be [], so I can decompose L into the concatenation of L1 = [] and L2, and then I can try to decompose L2 into S and L3.
If I can do this last decomposition, then the program ends and I can say that it is true that S is a sublist of the original list L.
If it is not true that conc(S, L3, L2), then I backtrack and take another branch of the computation.
Is my declarative interpretation right?
I am finding this example very difficult. I have also tried to find a procedural explanation (using trace in the Prolog shell), but I have a big problem because the computation is so large even for a short list...
The book explanation is more declarative, because it doesn't invoke Prolog's search mechanism. I would probably write this with more underscores:
sublist(S, L) :- append(_, Suffix, L), append(S, _, Suffix).
This at least makes the relationship between S and L2 (renamed Suffix) a little more clear. What we're trying to say, and this is hard to express clearly in declarative English, is that S is a sublist of L if there is a suffix of L called Suffix and S is a prefix of Suffix. Naming the other constituents only adds confusion. Prolog will internally name these variables and unify something with them as it attempts to unify everything else, but it won't share that information with the caller. Though these variables need to exist in some sense, they aren't germane to your formula or they would not be singletons. Whenever you get a singleton variable warning, replace the variable with the underscore. It will add clarity.
It happens that since the prefixes and suffixes involved can be empty lists, S can be a proper prefix of L or a proper suffix of L and everything will work out.
The declarative reading of member/2, for reference, is X is a member of a list if X is the head of the list or if X is a member of the tail of the list. Note carefully what is absent: mention of checking, success or failure, or, really, any order of operations. It is equally declarative to say X is a member of a list if it is a member of the tail or if it is the head. It is just an unavoidable fact of life that to make a computer perform a calculation it must be done in a certain order, so you have to tell Prolog things in the right order or it will enter infinite loops, but this is not an aspect of logic, just Prolog.
As we've gone over several other times, when you invoke the machinery of Prolog, you are no longer in a declarative reading. So when you say, for instance "First I decompose..." you've already left the declarative world and entered the procedural world. The declarative world doesn't have steps, even though Prolog must do things in a certain order to perform a computation on a real-life computer. Likewise, in a declarative reading you do not check things, they simply are or are not. The word backtrack also cannot appear as part of a declarative reading. The only "verb" you should be using in a declarative reading is the verb of being, "is."
That said, your Prolog/procedural readings are perfectly correct.
For example, in OCaml, when you are appending an item to a list of length n:
x @ [mylist]
Yes, the runtime of @ in OCaml is O(n) (where n is the length of the left operand).
Generally appending to the end of an immutable singly linked list (or an immutable doubly linked list for that matter) will always be O(n).
Your code snippet doesn't match your question, which suggests you're confused about what the operator does or which operator to use.
The @ operator (List.append) concatenates 2 lists, and list1 @ list2 takes O(length(list1)) time and is not tail-recursive. List.rev_append is tail-recursive but still O(n) in time. The usual way to add an item to a list, however, is with the :: constructor, and item :: mylist takes O(1) time.
Yes, as mentioned, there are two reasons why it must be O(n):
You must iterate to the end of the singly-linked list, which takes O(n)
Since pairs are immutable, you must copy all the pairs in the first list in order to append, which also takes O(n)
A related interesting topic is tail-recursive vs non-tail recursive ways to append
In summary, yes.
To illustrate, a simple (not tail-recursive) append function could be written as follows:
let rec (@) xs ys =
  match xs with
  | [] -> ys
  | x :: xs' -> x :: (xs' @ ys)
So internally append (@) breaks down the first list (xs) and uses the cons (::) operator to build the resulting list. It's easy to see that there are n steps of prepending (::), where n is the length of the first list.
I read in an algorithms book that the Ackermann function cannot be made tail-recursive (what they say is "it can't be transformed into an iteration"). I'm pretty perplexed about this, so I tried and came up with this:
let ackb m n =
  let rec rAck cont m n =
    match (m, n) with
    | 0, n -> cont (n+1)
    | m, 0 -> rAck cont (m-1) 1
    | m, n -> rAck (fun x -> rAck cont (m-1) x) m (n-1)
  in rAck (fun x -> x) m n
;;
(it's OCaml / F# code).
My problem is, I'm not sure that this is actually tail-recursive. Could you confirm that it is? If not, why? And, finally, what does it mean when people say that the Ackermann function is not primitive recursive?
Thanks!
Yes, it is tail-recursive. Every function can be made tail-rec by an explicit transformation to Continuation Passing Style.
This does not mean that the function will execute in constant memory: you build up stacks of continuations that must be allocated. It may be more efficient to defunctionalize the continuations, representing that data as a simple algebraic datatype.
Being primitive recursive is a very different notion, related to the expressiveness of a certain form of recursive definition used in mathematical theory, but probably not very relevant to computer science as you know it: primitive recursive definitions are of very reduced expressiveness, whereas systems with higher-order functions (starting with Gödel's System T), such as all current programming languages, are much more powerful.
In terms of computer languages, primitive recursive functions roughly correspond to programs without general recursion, where all loops/iterations are statically bounded (the number of possible repetitions is known in advance).
Yes.
By definition, any recursive function can be transformed into an iteration as long as it has access to an unbounded stack-like construct. The interesting question is whether it can be done without a stack or any other unbounded data storage.
A tail-recursive function can be turned into such an iteration only if the size of its arguments is bounded. In your example (and in almost any recursive function that uses continuations), the cont parameter is for all intents and purposes a stack that can grow to any size. Indeed, the entire point of continuation-passing style is to store data usually present on the call stack ("what to do after I return?") in a continuation parameter instead.