Iteration vs Recursion efficiency - performance

I got a basic idea of how recursion works - but I've always programmed iteratively.
When we look at the keywords CPU/stack/calls and space, how is recursion different from iterations?
It needs more memory because of running many "stacks(?)" which each (most likely) stores a value. It therefore takes up much more space, than an iterative solution to the same problem. This is generally speaking. There are some cases where Recursion would be better, such as programming Towers of Hanoi and such.
Am I all wrong? I've got an exam soon and I have to prepare a lot of subjects. Recursion is not my strong suit, so I would appreciate some help on this matter :)

This really depends on the nature of the language and compiler/interpreter.
Some functional languages implement tail recursion, for example, to recognize specific cases where the stack frame can be destroyed/freed prior to recursing into the next call. In those special cases among the languages/compilers/interpreters that support it, you can actually have infinite recursion without overflowing the stack.
If you're working with languages that use the hardware stack and don't implement tail recursion, then typically you have to push arguments to the stack prior to branching into a function and pop them off along with return values, so there's somewhat of an implicit data structure there under the hood (if we can call it that). There's all kinds of additional things that can happen here as well, like register shadowing to optimize it.
The hardware stack is usually very efficient, typically just incrementing and decrementing a stack pointer register to push and pop, but it does involve a bit more state and instructions than branching with a loop counter or a condition. Perhaps more importantly, it tends to involve more distant branching to jump into another function's code as opposed to looping within the same body of code which could involve more instruction cache and page misses.
In these types of languages/compilers/interpreters that use the hardware stack and will always overflow it with enough recursion, the loopy routes often provide a performance advantage (but can be more tedious to code).
As a twist, you also have aggressive optimizers sometimes which do all kinds of magic with your code in the process of translating it to machine instructions and linking it like inlining functions and unrolling loops, and when you take all these factors into account, it's often better to just code things a bit more naturally and then measure with a profiler if it could use some tweaking. Of course you shouldn't use recursion in cases that can overflow, but I generally wouldn't worry about the overhead most of the time.

Related

Performance of recursive function in register based compiler

I have a question of whether there will be a performance hit when we write recursive functions in Register based compilers like DVM. I'm aware that recursion isn't recommended in compilers with limited depth like compilers for python.
Being register-based does not help for recursive functions, they still have the same problem: conceptually, every call creates a new stack frame. If that is implemented literally, then a recursive call is inherently a little slower than looping, and perhaps more importantly, uses up a finite resource so the recursion depth is limited. A register-based code representation does not have the concept of an operand stack, but that concept is mostly disjoint from the concept of a call stack, which is still necessary just to have general subroutines. Subroutines can be implemented without a call stack if recursion is banned, in which case they need not be re-entrant so the local variables and the variable that holds the return address can be statically allocated.
Going through a trampoline works around the stack growth by quickly returning to a special caller that then calls the continuation, that way recursion doesn't grow the stack at all since the old frame gets deallocated before a new one is created, but it adds even more run-time overhead. Tail call elimination by rewriting the call into a jump achieves a similar effect but by reusing the same frame, with less associated overhead, this requires explicit support from the VM.
Both of those techniques apply equally to stack based and register based representations of the code, which incidentally is primarily a difference in the format in which the code is stored, and need not reflect a difference in the way the code is actually executed: a JIT compiler can turn both of them into whatever form the machine requires.

Will there be functions if there were no stacks?

I know that a stack data structure is used to store the local variables among many other things of a function that is being run.
I also understand how stack can be used to eleganlty manage recursion.
Suppose there was a machine that did not provide a stack area in memory, I don't think there will be programming languages for the machine that will support recursion. I am also wondering if programming languages for the machine would support functions without recursion.
Please, someone shread some sight on this for me.
A bit of theoretical framework is needed to understand that recursion is indeed not tied to functions at all, rather it is tied to expressiveness.
I won't dig into that, leaving Google fill any gaps.
Yes, we can have functions without a stack.
We don't need even the call/ret machinery for functions, we can just have the compiler inline every function call.
So there is no need for a stack at all.
This considers only functions in the programming sense, not mathematical sense.
A better name would be routines.
Anyway that is a simply proof of concepts that functions, intended as reusable code, don't need a stack.
However not all functions, in the mathematical sense, can implemented this way.
This is analogous to say: "We can have dogs on the bed but not all dogs can be on the bed".
You are in the right track by citing recursion, however when it comes to recursion, we need to be a lot more formal as there are various forms of recursion.
For example in-lining every function call may loop the compiler if the function being inlined is not constrained somehow.
Without digging into the theory, in order to be always sure that our compiler won't loop we can only allow primitive (bounded) recursion.
What you probably means by "recursion" is general recursion, that cannot be achieved by in-lining, we can show that we need an infinite amount of memory for GR and that is the demarcation between PR and GR, not having a stack.
So we can have function without a stack, even recursive (for some form of recursion) functions.
If your question was more practical then just consider MIPS.
There is no stack instructions or stack pointer register in the MIPS ISA, everything related to stack is just convention.
The compiler could use any memory area and treat it like a stack.

what is the advantage of recursion algorithm over iteration algorithm?

I am having problem in understanding one thing that when recursion involves so much space as well as the time complexity of both the iterative algos and recursive algos are same unless I apply Dynamic programming to it , then why should we use recursion ,Is it mere for reducing the lines of code that we use this , since even for implementing recursion , a whole PCB has to be saved during passing of control of function from one call to another ?
Although I have seen many posts related to it but still it's not clear to me that what is the major advantage of implementing recursion over iteration ?
what is the major advantage of implementing recursion over iteration ?
Readability - don't neglect it. If the code is readable and simple - it will take less time to code it (which is very important in real life), and a simpler code is also easier to maintain (since in future updates, it will be easy to understand what's going on).
Now, I am not saying "Make everything recursive!", but if something is significantly more readable in a recursive solution - unless in production it makes the code suffer in performance in noticeable way - keep it!.
Performance. Yes, you heard me. Sometimes, to transform a recursive solution to an iterative one, you need more than a simple loop - you need a loop + a stack.
However, many times, your stack DS is not as efficient as the machine one, which is optimized exactly for this purpose by hundreds of employees in Intel/AMD.
This thread discusses a specific case, where a trivial conversion of an algorithm from recursive to iterative yields worse results, and in order to outperform the machine stack - you will need to invest lots of hours in optimizing your stack, and your time is a scarce resource.
Again, I am not saying - "recursion is always faster!". But in some situations, it could be.

Is classical (recursion based) depth first search more memory efficient than stack based DFS?

I was looking at the responses to this question by #AndreyT and I had a question regarding the memory efficiency of classical DFS vs. stack based DFS. The argument is that the classical backtracking DFS cannot be created from BFS by a simple stack-to-queue replacement. In doing the BFS to DFS by stack-to-queue replacement, you lose the space efficiency of classical DFS. Not being a search algorithm expert (though I am reading up on it) I'm going to assume this is just "true" and go with it.
However, my question is really about overall memory efficiency. While a recursive solution does have a certain code efficiency (I can do a lot more with a few lines of recursive search code) and elegance, doesn't it have a memory (and possibly performance) "hit" because of the fact it is recursive?
Every time you recurse into the function, it pushes local data onto the stack, the return address of the function, and whatever else the compiler thought was necessary to maintain state on return, etc. This can add up quickly. It also has to make a function call for each recursion, which eats up some ops as well (maybe minor...or maybe it breaks branching predictability forcing a flush of the pipeline...not an expert here...feel free to chime in).
I think I want to stick to simple recursion for now and not get into "alternative forms" like tail-recursion for the answer to this question. At least for now.
Whereas you can probably do better than a compiler by explicitly managing a stack, the reward often isn't worth the extra work. Compilers are a lot smarter these days than a lot of people think. They're also not quite as good as others think sometimes--you take the good with the bad.
That said, most compilers can detect and remove tail recursion. Some can convert simple recursive functions to iterative solutions, and some really good compilers can figure out that local variables can be re-used.
Recursive solutions often result in smaller code, which means it's more likely that the code will fit into the CPU cache. A non-recursive solution that requires an explicitly managed stack can result in larger code, more cache misses, and slower performance than the "slow" recursive solution.
Finally, many solutions are most intuitively implemented recursively, and those solutions tend to be short and easy to understand. Converting a recursive solution to an iterative solution that involves explicitly managing a stack results in more lines of code that is often obscure, fragile, and difficult to prove correct.
I've found that an easy to code and understand recursive solution is usually plenty fast enough and doesn't use too much memory. In the rare case that profiling reveals my recursive implementation is the bottleneck (usually it's a comparison function or something similar that the recursive function calls), I'll consider a non-recursive solution. I don't very often realize a significant gain by making the change.

Reasoning about performance in Haskell

The following two Haskell programs for computing the n'th term of the Fibonacci sequence have greatly different performance characteristics:
fib1 n =
case n of
0 -> 1
1 -> 1
x -> (fib1 (x-1)) + (fib1 (x-2))
fib2 n = fibArr !! n where
fibArr = 1:1:[a + b | (a, b) <- zip fibArr (tail fibArr)]
They are very close to mathematically identical, but fib2 uses the list notation to memoize its intermediate results, while fib1 has explicit recursion. Despite the potential for the intermediate results to be cached in fib1, the execution time gets to be a problem even for fib1 25, suggesting that the recursive steps are always evaluated. Does referential transparency contribute anything to Haskell's performance? How can I know ahead of time if it will or won't?
This is just an example of the sort of thing I'm worried about. I'd like to hear any thoughts about overcoming the difficulty inherent in reasoning about the performance of a lazily-executed, functional programming language.
Summary: I'm accepting 3lectrologos's answer, because the point that you don't reason so much about the language's performance, as about your compiler's optimization, seems to be extremely important in Haskell - more so than in any other language I'm familiar with. I'm inclined to say that the importance of the compiler is the factor that differentiates reasoning about performance in lazy, functional langauges, from reasoning about the performance of any other type.
Addendum: Anyone happening on this question may want to look at the slides from Johan Tibell's talk about high performance Haskell.
In your particular Fibonacci example, it's not very hard to see why the second one should run faster (although you haven't specified what f2 is).
It's mainly an algorithmic issue:
fib1 implements the purely recursive algorithm and (as far as I know) Haskell has no mechanism for "implicit memoization".
fib2 uses explicit memoization (using the fibArr list to store previously computed values.
In general, it's much harder to make performance assumptions for a lazy language like Haskell, than for an eager one. Nevertheless, if you understand the underlying mechanisms (especially for laziness) and gather some experience, you will be able to make some "predictions" about performance.
Referential transparency increases (potentially) performance in (at least) two ways:
First, you (as a programmer) can be sure that two calls to the same function will always return the same, so you can exploit this in various cases to benefit in performance.
Second (and more important), the Haskell compiler can be sure for the above fact and this may enable many optimizations that can't be enabled in impure languages (if you've ever written a compiler or have any experience in compiler optimizations you are probably aware of the importance of this).
If you want to read more about the reasoning behind the design choices (laziness, pureness) of Haskell, I'd suggest reading this.
Reasoning about performance is generally hard in Haskell and lazy languages in general, although not impossible. Some techniques are covered in Chris Okasaki's Purely Function Data Structures (also available online in a previous version).
Another way to ensure performance is to fix the evaluation order, either using annotations or continuation passing style. That way you get to control when things are evaluated.
In your example you might calculate the numbers "bottom up" and pass the previous two numbers along to each iteration:
fib n = fib_iter(1,1,n)
where
fib_iter(a,b,0) = a
fib_iter(a,b,1) = a
fib_iter(a,b,n) = fib_iter(a+b,a,n-1)
This results in a linear time algorithm.
Whenever you have a dynamic programming algorithm where each result relies on the N previous results, you can use this technique. Otherwise you might have to use an array or something completely different.
Your implementation of fib2 uses memoization but each time you call fib2 it rebuild the "whole" result. Turn on ghci time and size profiling:
Prelude> :set +s
If it was doing memoisation "between" calls the subsequent calls would be faster and use no memory. Call fib2 20000 twice and see for yourself.
By comparison a more idiomatic version where you define the exact mathematical identity:
-- the infinite list of all fibs numbers.
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
memoFib n = fibs !! n
actually do use memoisation, explicit as you see. If you run memoFib 20000 twice you'll see the time and space taken the first time then the second call is instantaneous and take no memory. No magic and no implicit memoization like a comment might have hinted at.
Now about your original question: optimizing and reasoning about performance in Haskell...
I wouldn't call myself an expert in Haskell, I have only been using it for 3 years, 2 of which at my workplace but I did have to optimize and get to understand how to reason somewhat about its performance.
As mentionned in other post laziness is your friend and can help you gain performance however YOU have to be in control of what is lazily evaluated and what is strictly evaluated.
Check this comparison of foldl vs foldr
foldl actually stores "how" to compute the value i.e. it is lazy. In some case you saves time and space beeing lazy, like the "infinite" fibs. The infinite "fibs" doesn't generate all of them but knows how. When you know you will need the value you might as well just get it "strictly" speaking... That's where strictness annotation are usefull, to give you back control.
I recall reading many times that in lisp you have to "minimize" consing.
Understanding what is stricly evaluated and how to force it is important but so is understanding how much "trashing" you do to the memory. Remember Haskell is immutable, that means that updating a "variable" is actually creating a copy with the modification. Prepending with (:) is vastly more efficient than appending with (++) because (:) does not copy memory contrarily to (++). Whenever a big atomic block is updated (even for a single char) the whole block needs to be copied to represent the "updated" version. The way you structure data and update it can have a big impact on performance. The ghc profiler is your friend and will help you spot these. Sure the garbage collector is fast but not having it do anything is faster!
Cheers
Aside from the memoization issue, fib1 also uses non-tailcall recursion. Tailcall recursion can be re-factored automatically into a simple goto and perform very well, but the recursion in fib1 cannot be optimized in this way, because you need the stack frame from each instance of fib1 in order to calculate the result. If you rewrote fib1 to pass a running total as an argument, thus allowing a tail call instead of needing to keep the stack frame for the final addition, the performance would improve immensely. But not as much as the memoized example, of course :)
Since allocation is a major cost in any functional language, an important part of understanding performance is to understand when objects are allocated, how long they live, when they die, and when they are reclaimed. To get this information you need a heap profiler. It's an essential tool, and luckily GHC ships with a good one.
For more information, read Colin Runciman's papers.

Resources