Nested recursive calls - is this tail recursion? - algorithm

I think that I understand the textbook definition of a tail recursive function: a function that doesn't perform any calculations after the recursive call. I also get that, as a consequence, a tail recursive function will be more memory efficient, because it needs only one activation record in total, instead of keeping one record for each call (as in normal recursion).
What is less clear to me is how this definition applies to nested calls. I'll provide an example:
func foo91(x int):
    if x > 100:
        return x - 10
    else:
        return foo91(foo91(x + 11))
The answer I originally came up with was that it is not tail recursive by definition: the outer call is performed only after the inner one has been evaluated, so other calculations are done after the first call, and therefore functions with nested recursive calls are generally not tail recursive. On the other hand, it seems equivalent in practice, as it appears to share the benefits of a tail recursive function: it seems to me that a single activation record is needed for the whole function. Is that true?
Are nested recursive function calls generally considered tail recursive?

Your understanding is only partially correct.
You defined tail recursion correctly. But whether it is actually efficient is implementation dependent. In some languages, such as Scheme, it is. But most languages are like JavaScript, where it is not. One of the key tradeoffs is that making tail recursion efficient means you lose stack backtraces.
Now to your actual question: the inner call is not a tail call, because the function still has to take its result and do something with it (make the outer call). But the outer call is a tail call.
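To see the distinction concretely, here is a rough Java sketch (mine, not part of the original answer) of what a compiler that eliminates tail calls could do with foo91: the outer call becomes a loop that reuses the current frame, while the inner call still needs its own stack frame because its result is used afterwards. (This is the McCarthy 91 function, so foo91(1) evaluates to 91.)

public class Foo91 {
    static int foo91(int x) {
        while (true) {
            if (x > 100) {
                return x - 10;              // base case: no call follows
            }
            // Inner call: NOT a tail call; its result is still needed below.
            int inner = foo91(x + 11);
            // Outer call: a tail call, so instead of "return foo91(inner)"
            // we can reuse this frame by looping with the new argument.
            x = inner;
        }
    }

    public static void main(String[] args) {
        System.out.println(foo91(1));       // prints 91
    }
}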

Related

Is having only one-argument functions efficient? (Haskell)

I have started learning Haskell, and I have read that every function in Haskell takes only one argument. I can't understand what magic happens under the hood of Haskell that makes this possible, and I am wondering whether it is efficient.
Example
>:t (+)
(+) :: Num a => a -> a -> a
The signature above means that the (+) function takes one Num and returns another function, which takes one Num and returns a Num.
Example 1 is relatively easy but I have started wondering what happens when functions are a little more complex.
My Questions
For the sake of the example, I have written a zipWith function and executed it in two ways: once passing one argument at a time and once passing all arguments at once.
zipwithCustom f (x:xs) (y:ys) = f x y : zipwithCustom f xs ys
zipwithCustom _ _ _ = []
zipWithAdd = zipwithCustom (+)
zipWithAddTo123 = zipWithAdd [1,2,3]
test1 = zipWithAddTo123 [1,1,1]
test2 = zipwithCustom (+) [1,2,3] [1,1,1]
>test1
[2,3,4]
>test2
[2,3,4]
1. Is passing one argument at a time (scenario_1) as efficient as passing all arguments at once (scenario_2)?
2. Are those scenarios any different in terms of what Haskell is actually doing to compute test1 and test2 (except that scenario_1 probably takes more memory, as it needs to save zipWithAdd and zipWithAddTo123)?
3. Is this correct, and why? In scenario_1 I iterate over [1,2,3] and then over [1,1,1].
4. Is this correct, and why? In scenario_1 and scenario_2 I iterate over both lists at the same time.
I realise that I have asked a lot of questions in one post but I believe those are connected and will help me (and other people who are new to Haskell) to better understand what actually is happening in Haskell that makes both scenarios possible.
You ask about "Haskell", but Haskell the language specification doesn't care about these details. It is up to implementations to choose how evaluation happens -- the only thing the spec says is what the result of the evaluation should be, and carefully avoids giving an algorithm that must be used for computing that result. So in this answer I will talk about GHC, which, practically speaking, is the only extant implementation.
For (3) and (4) the answer is simple: the iteration pattern is exactly the same whether you apply zipWithCustom to arguments one at a time or all at once. (And that iteration pattern is to iterate over both lists at once.)
Unfortunately, the answer for (1) and (2) is complicated.
The starting point is the following simple algorithm:
When you apply a function to an argument, a closure is created (allocated and initialized). A closure is a data structure in memory, containing a pointer to the function and a pointer to the argument. When the function body is executed, any time its argument is mentioned, the value of that argument is looked up in the closure.
That's it.
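If it helps to picture that data structure, here is a loose analogue in Java (my sketch; it is not GHC's actual representation, just the shape of the idea): applying a two-argument function to its first argument allocates an object holding a pointer to the function and a pointer to that argument, and the argument is looked up in it when the body finally runs.

import java.util.function.BiFunction;
import java.util.function.Function;

// Loose analogue of "apply one argument, get a closure back": the object
// stores a pointer to the function plus the captured argument.
class Closure<A, B, R> implements Function<B, R> {
    private final BiFunction<A, B, R> fn;   // "pointer to the function"
    private final A capturedArg;            // "pointer to the argument"

    Closure(BiFunction<A, B, R> fn, A capturedArg) {
        this.fn = fn;
        this.capturedArg = capturedArg;
    }

    @Override
    public R apply(B b) {
        return fn.apply(capturedArg, b);    // argument looked up in the closure
    }
}

class ClosureDemo {
    public static void main(String[] args) {
        BiFunction<Integer, Integer, Integer> plus = Integer::sum;
        Function<Integer, Integer> plusFive = new Closure<>(plus, 5);  // one allocation
        System.out.println(plusFive.apply(3));                         // prints 8
    }
}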
However, this algorithm kind of sucks. It means that if you have a 7-argument function, you allocate 7 data structures, and when you use an argument, you may have to follow a 7-long chain of pointers to find it. Gross. So GHC does something slightly smarter. It uses the syntax of your program in a special way: if you apply a function to multiple arguments, it generates just one closure for that application, with as many fields as there are arguments.
(Well... that might not be quite true. Actually, it tracks the arity of every function -- defined again in a syntactic way as the number of arguments used to the left of the = sign when that function was defined. If you apply a function to more arguments than its arity, you might get multiple closures or something, I'm not sure.)
So that's pretty nice, and from that you might think that your test1 would then allocate one extra closure compared to test2. And you'd be right... when the optimizer isn't on.
But GHC also does lots of optimization stuff, and one of those is to notice "small" definitions and inline them. Almost certainly with optimizations turned on, your zipWithAdd and zipWithAddTo123 would both be inlined anywhere they were used, and we'd be back to the situation where just one closure gets allocated.
Hopefully this explanation gets you to where you can answer questions (1) and (2) yourself, but just in case it doesn't, here's explicit answers to those:
Is passing one argument at the time as efficient as passing all arguments at once?
Maybe. It's possible that passing arguments one at a time will be converted via inlining to passing all arguments at once, and then of course they will be identical. In the absence of this optimization, passing one argument at a time has a (very slight) performance penalty compared to passing all arguments at once.
Are those scenarios any different in terms of what Haskell is actually doing to compute test1 and test2?
test1 and test2 will almost certainly be compiled to the same code -- possibly even to the point that only one of them is compiled and the other is an alias for it.
If you want to read more about the ideas in the implementation, the Spineless Tagless G-machine paper is much more approachable than its title suggests, and only a little bit out of date.

Is this a correct way to think about recursivity in programming? (example)

I've been trying to learn what recursion in programming is, and I need someone to confirm whether I have truly understood what it is.
The way I'm trying to think about it is through collision-detection between objects.
Let's say we have a function. The function is called when it's certain that a collision has occurred, and it's called with a list of objects to determine which object collided and which object it collided with. It does this by first checking whether the first object in the list collided with any of the other objects. If so, the function returns the objects in the list that collided. If not, the function calls itself with a shortened list that excludes the first object, and then repeats the process to determine whether it was the next object in the list that collided.
This is a finite recursive function because, if the desired conditions aren't met, it calls itself with a shorter and shorter list until it deductively meets the desired conditions. This is in contrast to a potentially infinite recursive function where, for example, the list it calls itself with is not shortened, but the order of the list is randomized.
So... is this correct? Or is this just another example of iteration?
Thanks!
Edit: I was fortunate enough to get three VERY good answers by #rici, #Evan and #Jack. They all gave me valuable insight on this, in both technical and practical terms from different perspectives. Thank you!
Any iteration can be expressed recursively. (And, with auxiliary data structures, vice versa, but not so easily.)
I would say that you are thinking iteratively. That's not a bad thing; I don't say it to criticise. Simply, your explanation is of the form "Do this and then do that and continue until you reach the end".
Recursion is a slightly different way of thinking. I have some problem, and it's not obvious how to solve it. But I observe that if I knew the answer to a simpler problem, I could easily solve the problem at hand. And, moreover, there are some very simple problems which I can solve directly.
The recursive solution is based on using a simpler (smaller, fewer, whatever) problem to solve the problem at hand. How do I find out which pairs of objects in a set of objects collide?
If the set has fewer than 2 elements, there are no pairs. That's the simplest problem, and it has an obvious solution: the empty set.
Otherwise, I select some object. All colliding pairs either include this object, or they don't. So that gives me two subproblems.
The set of collisions which don't involve the selected object is obviously the same problem which I started with, but with a smaller set. So I've replaced this part of the problem with a smaller problem. That's one recursion.
But I also need the set of objects which the selected object collides with (which might be an empty set). That's a simpler problem, because now one element of each pair is known. I can solve that problem recursively as well:
I need the set of pairs which include the object X and a set S of objects.
If the set is empty, there are no pairs. Simple.
Otherwise, I choose some element from the set. Then I find all the collisions between X and the rest of the set (a simpler but otherwise identical problem).
If there is a collision between X and the selected element, I add that to the set I just found.
Then I return the set.
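Here is that decomposition spelled out as code, in a rough Java sketch of mine (the question itself is language-agnostic, and the collides() test below is a made-up placeholder for whatever real collision check you have):

import java.util.ArrayList;
import java.util.List;

class CollisionPairs {
    // Placeholder collision test between two objects; stands in for the
    // real geometry check the program would actually use.
    static boolean collides(Object a, Object b) {
        return a.toString().length() == b.toString().length();
    }

    // All pairs (x, s) where s is drawn from the set and collides with x.
    // Base case: empty set -> no pairs. Otherwise solve for the rest of
    // the set recursively, then add (x, chosen) if they collide.
    static List<Object[]> pairsWith(Object x, List<Object> set) {
        if (set.isEmpty()) {
            return new ArrayList<>();
        }
        Object chosen = set.get(0);
        List<Object[]> rest = pairsWith(x, set.subList(1, set.size()));
        if (collides(x, chosen)) {
            rest.add(new Object[] { x, chosen });
        }
        return rest;
    }

    // All colliding pairs in the set. Base case: fewer than 2 elements -> none.
    // Otherwise: pairs involving the selected object, plus (recursively) the
    // colliding pairs of the smaller set that excludes it.
    static List<Object[]> collidingPairs(List<Object> objects) {
        if (objects.size() < 2) {
            return new ArrayList<>();
        }
        Object selected = objects.get(0);
        List<Object> rest = objects.subList(1, objects.size());
        List<Object[]> result = pairsWith(selected, rest);
        result.addAll(collidingPairs(rest));
        return result;
    }
}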
Technically speaking, you have the right mindset of how recursion works.
Practically speaking, you would not want to use recursion for an instance such as the one you described above. The reason is that every recursive call adds to the stack (which is finite in size), and recursive calls are expensive on the processor; with enough objects you are going to run into some serious bottlenecking in a large application. With enough recursive calls, you would end up with a stack overflow, which is exactly what you would get from "infinite recursion". You never want something to recurse infinitely; it goes against the fundamental principle of recursion.
Recursion works on two defining characteristics:
A base case can be defined: it is possible to eventually reach a trivially solvable input (such as 0 or 1, depending on what you need).
A general case can be defined: the general case is called repeatedly, reducing the problem until the base case is reached.
Once you have defined both cases, you can define a recursive solution.
The point of recursion is to take a very large and difficult-to-solve problem and continually break it down until it's easy to work with.
Once our base case is reached, the calls "recurse out". This means they bounce backwards, each returning into the function that called it, bringing along the data from the calls below it!
It is at this point that our operations actually occur.
Once the original function is reached, we have our final result.
For example, let's say you want the summation of the first 3 integers. The first recursive call is passed the number 3.
public static int summation(int num) {
    // Base case
    if (num == 1) {
        return 1;
    }
    // General case
    return num + summation(num - 1);
}
Walking through the function calls:
summation(3)              // Initial function call
= 3 + summation(2)
= 3 + (2 + summation(1))
= 3 + (2 + 1)
= 6
This gives us a result of 6.
Your scenario seems to me like iterative programming, but your function is simply calling itself as a way of continuing its comparisons. That is simply re-tasking your function to be able to call itself with a smaller list.
In my experience, a recursive function has more potential to branch out into multiple 'threads' (so to speak), and is used to process information the same way the hierarchy in a company works for delegation: the boss hands a contract down to the managers, who divide up the work and hand it to their respective staff, the staff get it done and hand it back to the managers, who report back to the boss.
The best example of a recursive function is one that iterates through all files on a file system. (I will do this in pseudocode because it works in all languages.)
function find_all_files(directory_name)
{
    - Check the given directory name for sub-directories within it
    - For each sub-directory:
          find_all_files(directory_name + subdirectory_name)
    - Check the given directory for files
    - Do your processing of the filename; it is located at directory_name + filename
}
You use the function by calling it with a directory path as the parameter. For each subdirectory it finds, it builds the actual path to that subdirectory and calls find_all_files() with it. As long as there are sub-directories in the given directory, it will keep calling itself.
Now, when the function reaches a directory that contains only files, it is allowed to proceed to the part where it processes the files. Once that is done, it exits and returns to the previous instance of itself, which is still iterating through directories.
It continues to process directories and files until it has completed all iterations and returns to the main program flow where you called the original instance of find_all_files in the first place.
One additional note: Sometimes global variables can be handy with recursive functions. If your function is merely searching for the first occurrence of something, you can set an "exit" variable as a flag to "stop what you are doing now!". You simply add checks for the flag's status during any iterations you have going on inside the function (as in the example, the iteration through all the sub-directories). Then, when the flag is set, your function just exits. Since the flag is global, all generations of the function will exit and return to the main flow.
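As a rough illustration of that early-exit idea, here is a Java sketch of mine (using a static field rather than a true global; the findFirstFile name and the file being searched for are made up for the example):

import java.io.File;

class FindFirstFile {
    // Acts as the "exit" flag: once set, every pending recursive call bails out.
    static boolean found = false;
    static File result = null;

    // Recursively search for the first file whose name matches 'target'.
    static void findFirstFile(File directory, String target) {
        if (found) {
            return;                       // another branch already found it
        }
        File[] entries = directory.listFiles();
        if (entries == null) {
            return;                       // unreadable directory
        }
        for (File entry : entries) {
            if (found) {
                return;                   // check the flag inside the loop as well
            }
            if (entry.isDirectory()) {
                findFirstFile(entry, target);   // recurse into subdirectory
            } else if (entry.getName().equals(target)) {
                found = true;             // flag set: all callers will now unwind
                result = entry;
                return;
            }
        }
    }

    public static void main(String[] args) {
        findFirstFile(new File("."), "README.txt");
        System.out.println(found ? "Found: " + result : "Not found");
    }
}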

Recursive function call hanging, Erlang

I am currently teaching myself Erlang. Everything was going well until I found a problem with this function.
-module(chapter).
-compile(export_all).
list_length([]) -> 0;
list_length([_|Xs]) -> 1+list_length([Xs]).
This was taken out of a textbook. When I run this code using OTP 17, it just hangs, meaning it just sits there as shown below.
1> c(chapter).
{ok,chapter}
2> chapter:list_length([]).
0
3> chapter:list_length([1,2]).
Looking in the task manager, the Erlang OTP process is using 200 MB to 330 MB of memory. What causes this?
It is not terminating because you are creating a new non-empty list in every case: [Anything] is always a non-empty list, even if that list contains an empty list as its only member ([[]] is a non-empty list of one member).
A proper list terminates like this: [ Something | [] ].
So with that in mind...
list_length([]) -> 0;
list_length([_|Xs]) -> 1 + list_length(Xs).
In most functional languages "proper lists" are cons-style lists. Check out the Wikipedia entry on "cons" and the Erlang documentation about lists, and then meditate on examples of list operations you see written in example code for a bit.
NOTES
Putting whitespace around operators is a Good Thing; it will prevent you from doing confused stuff with arrows and binary syntax operators next to each other, along with avoiding a few other ambiguities (and it's easier to read anyway).
As Steve points out, the memory explosion you noticed is because while your function is recursive, it is not tail recursive -- that is, 1 + list_length(Xs) leaves pending work to be done, which must leave a reference on the stack. To have anything to add 1 to, it must complete execution of list_length/1 and return a value, and it has to remember that pending addition as many times as there are members in the list. Read Steve's answer for an example of how tail recursive functions can be written using an accumulator value.
Since the OP is learning Erlang, note also that the list_length/1 function isn't amenable to tail call optimization because of its addition operation, which requires the runtime to call the function recursively, take its return value, add 1 to it, and return the result. This requires stack space, which means that if the list is long enough, you can run out of stack.
Consider this approach instead:
list_length(L) -> list_length(L, 0).
list_length([], Acc) -> Acc;
list_length([_|Xs], Acc) -> list_length(Xs, Acc+1).
This approach, which is very common in Erlang code, creates an accumulator in list_length/1 to hold the length value, initializing it to 0 and passing it to list_length/2, which does the recursion. Each call of list_length/2 then increments the accumulator, and when the list is empty, the first clause of list_length/2 returns the accumulator as the result. But note that the addition operation here occurs before the recursive call takes place, which means the calls are true tail calls and thus do not require extra stack space.
For non-beginner Erlang programmers, it can be instructive to compile both the original and modified versions of this module with erlc -S and examine the generated Erlang assembler. With the original version, the assembler contains allocate calls for stack space and uses call for recursive invocation, where call is the instruction for normal function calls. But for this modified version, no allocate calls are generated, and instead of using call it performs recursion using call_only, which is optimized for tail calls.

Tail recursion in gcc/g++

I've tried to search, but was unable to find: what are the prerequisites for a function so that gcc will optimize its tail recursion? Is there any reference or list that covers the most important cases? Since this is version specific, my interest is in version 4.6.3 or higher (the newer the better). However, in fact, any concrete reference would be greatly appreciated.
Thanks in advance!
With -O2 enabled, gcc will perform tail call optimization, if possible. Now, the question is: When is it possible?
It is possible to eliminate a single recursive call whenever the recursive call happens either immediately before or inside the (single) return statement, so there are no observable side effects afterwards other than possibly returning a different value. This includes returning expressions involving the ternary operator.
Note that even if there are several recursive calls in the return expression (such as in the well-known fibonacci example: return (n==1) ? 1 : fib(n-1)+fib(n-2);), tail call recursion is still applied, but only ever to the last recursive call.
As an obvious special case, if the compiler can prove that the recursive function (and its arguments) qualifies as constant expression, it can (up to a configurable maximum recursion depth, by default 500) eliminate not only the tail call, but the entire execution of the function. This is something GCC does routinely and quite reliably at -O2, which even includes calls to certain library functions such as strlen on literals.

Should I avoid tail recursion in Prolog and in general?

I'm working through "Learn Prolog now" online book for fun.
I'm trying to write a predicate that goes through each member of a list and adds one to it, using accumulators. I have already done it easily without tail recursion.
addone([],[]).
addone([X|Xs],[Y|Ys]) :- Y is X+1, addone(Xs,Ys).
But I have read that it is better to avoid this type of recursion for performance reasons. Is this true? Is it considered 'good practice' to use tail recursion always? Will it be worth the effort to use accumulators to get into a good habit?
I have tried to change this example into using accumulators, but it reverses the list. How can I avoid this?
accAddOne([X|Xs],Acc,Result) :- Xnew is X+1, accAddOne(Xs,[Xnew|Acc],Result).
accAddOne([],A,A).
addone(List,Result) :- accAddOne(List,[],Result).
Short answer: Tail recursion is desirable, but don't over-emphasize it.
Your original program is as tail recursive as you can get in Prolog. But there are more important issues: Correctness and termination.
In fact, many implementations are more than willing to sacrifice tail-recursiveness for other properties they consider more important. For example steadfastness.
But your attempted optimization has some point. At least from a historical perspective.
Back in the 1970s, the major AI language was LISP. And the corresponding definition would have been
(defun addone (xs)
  (cond ((null xs) nil)
        (t (cons (+ 1 (car xs))
                 (addone (cdr xs))))))
which is not directly tail-recursive. The reason is the cons: in implementations of that time, its arguments were evaluated first; only then could the cons be executed. So rewriting this as you have indicated (and reversing the resulting list) was a possible optimization technique.
In Prolog, however, you can create the cons prior to knowing the actual values, thanks to logic variables. So many programs that were not tail-recursive in LISP, translated to tail-recursive programs in Prolog.
The repercussions of this can still be found in many Prolog textbooks.
Your addone procedure already is tail recursive.
There are no choice points between the head and the last recursive call, because is/2 is deterministic.
Accumulators are sometimes added to allow tail recursion; the simplest example I can think of is reverse/2. Here is a naive reverse (nreverse/2), which is not tail recursive:
nreverse([], []).
nreverse([X|Xs], R) :- nreverse(Xs, Rs), append(Rs, [X], R).
if we add an accumulator
reverse(L, R) :- reverse(L, [], R).
reverse([], R, R).
reverse([X|Xs], A, R) :- reverse(Xs, [X|A], R).
now reverse/3 is tail recursive: the recursive call is the last one, and no choice point is left.
O.P. said:
But I have read that it is better to avoid [non-tail] recursion for performance reasons.
Is this true? Is it considered 'good practice' to use tail recursion always? Will it
be worth the effort to use accumulators to get into a good habit?
It is a fairly straightforward optimization to convert a tail-recursive construct into iteration (a loop). Since the tail (recursive) call is the last thing done, the stack frame can be reused for the recursive call, making the recursion, for all intents and purposes, a loop: execution simply jumps back to the beginning of the predicate/function/method/subroutine. Thus, a tail recursive predicate will not overflow the stack. Tail-recursive constructs, with the optimization applied, have the following benefits (see the sketch after this list):
Slightly faster execution as new stack frames don't need to be allocated/freed; further, you get better locality of reference, so arguably less paging.
No upper bound on the depth of recursion.
No stack overflows.
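To make the frame-reuse idea concrete, here is a small Java sketch of mine (not part of the original answer): a tail-recursive accumulator function and the loop that a tail-call-optimizing compiler could effectively turn it into.

class TailToLoop {
    // Tail-recursive: the recursive call is the last thing done, so the
    // current frame could in principle be reused...
    static long sumTo(long n, long acc) {
        if (n == 0) {
            return acc;
        }
        return sumTo(n - 1, acc + n);   // tail call
    }

    // ...which is exactly this loop: "jump back to the start" with new
    // argument values instead of pushing a new frame.
    static long sumToLoop(long n, long acc) {
        while (n != 0) {
            acc = acc + n;
            n = n - 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sumToLoop(1_000_000, 0));  // fine: 500000500000
        // sumTo(1_000_000, 0) would overflow the stack on the JVM,
        // since Java does not perform tail call optimization.
        System.out.println(sumTo(5_000, 0));          // shallow enough: 12502500
    }
}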
The possible downsides?
loss of useful stack trace. Not an issue if TRO is only applied in a release/optimized build and not in a debug build, but...
developers will write code that depends on TRO, which means that code that runs fine with TRO applied will fail without TRO being applied. This means that in the above case (TRO only in release/optimized builds), a functional difference exists between release and debug builds, essentially meaning that one's choice of compiler options generates two different programs from identical source code.
This is not, of course, an issue, when the language standard demands tail recursion optimization.
To quote Wikipedia:
Tail calls are significant because they can be implemented without adding
a new stack frame to the call stack. Most of the frame of the current procedure
is not needed any more, and it can be replaced by the frame of the tail call,
modified as appropriate (similar to overlay for processes, but for function
calls). The program can then jump to the called subroutine. Producing such code
instead of a standard call sequence is called tail call elimination, or tail
call optimization.
See also:
What Is Tail Call Optimization?
The Portland Pattern Repository on the matter
and even Microsoft's MSDN
I've never understood why more languages don't implement tail recursion optimization
I don't think that the first version of addone should lead to less efficient code. It is also a lot more readable, so I see no reason why it should be good practice to avoid it.
In more complex examples, the compiler might not be able to transfer the code automatically to tail recursion. Then it may be reasonable to rewrite it as an optimization, but only if it is really necessary.
So, how can you implement a working tail recursive version of addone? It may be cheating, but assuming that reverse is implemented with tail recursion (e.g., see here), it can be used to fix your problem:
accAddOne([X|Xs],Acc,Result) :- Xnew is X+1, accAddOne(Xs,[Xnew|Acc],Result).
accAddOne([],Acc,Result) :- reverse(Acc, Result).
addone(List,Result) :- accAddOne(List,[],Result).
It is extremely clumsy, though. :-)
By the way, I cannot find a simpler solution. It may be for the same reason that foldr in Haskell is normally not defined with tail recursion.
In contrast to some other programming languages, certain Prolog implementations are well suited for tail recursive programs. Tail recursion can be handled as a special case of last call optimization (LCO). For example, this here in Java doesn't work:
public static boolean count(int n) {
    if (n == 0) {
        return true;
    } else {
        return count(n-1);
    }
}

public static void main(String[] args) {
    System.out.println("count(1000)="+count(1000));
    System.out.println("count(1000000)="+count(1000000));
}
The result will be:
count(1000)=true
Exception in thread "main" java.lang.StackOverflowError
at protect.Count.count(Count.java:9)
at protect.Count.count(Count.java:9)
On the other hand major Prolog implementations don't have any problem with it:
?- [user].
count(0) :- !.
count(N) :- M is N-1, count(M).
^D
The result will be:
?- count(1000).
true.
?- count(1000000).
true.
The reason Prolog systems can do that is that their execution is most often trampoline style anyway, and last call optimization is then a matter of choice point elimination and environment trimming. Environment trimming was already documented in the early WAM.
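For the curious, here is a toy trampoline in Java (a sketch of mine; it is not how any particular Prolog system is actually implemented) that lets the count example from above reach 1000000 without blowing the stack: each step returns either a finished result or a thunk describing the next call, and a plain loop keeps bouncing on it.

import java.util.function.Supplier;

class Trampoline {
    // A step is either a finished result or a thunk that produces the next step.
    interface Step<T> {
        boolean isDone();
        T result();
        Step<T> next();
    }

    static <T> Step<T> done(T value) {
        return new Step<T>() {
            public boolean isDone() { return true; }
            public T result() { return value; }
            public Step<T> next() { throw new IllegalStateException(); }
        };
    }

    static <T> Step<T> more(Supplier<Step<T>> thunk) {
        return new Step<T>() {
            public boolean isDone() { return false; }
            public T result() { throw new IllegalStateException(); }
            public Step<T> next() { return thunk.get(); }
        };
    }

    // The plain loop that "bounces": constant stack depth regardless of n.
    static <T> T run(Step<T> step) {
        while (!step.isDone()) {
            step = step.next();
        }
        return step.result();
    }

    // count from the Java example above, written as trampoline steps.
    static Step<Boolean> count(int n) {
        if (n == 0) {
            return done(true);
        }
        return more(() -> count(n - 1));
    }

    public static void main(String[] args) {
        System.out.println("count(1000000)=" + run(count(1000000)));  // no StackOverflowError
    }
}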
But yes, debugging might be a problem.
