Why should recursion be preferred over iteration? - performance

Iteration is more performant than recursion, right? Then why do some people opine that recursion is better (more elegant, in their words) than iteration? I really don't see why some languages like Haskell do not allow iteration and encourage recursion? Isn't that absurd to encourage something that has bad performance (and that too when more performant option i.e. recursion is available) ? Please shed some light on this. Thanks.

Iteration is more performant than
recursion, right?
Not necessarily.
This conception comes from many C-like languages, where calling a function, recursive or not, had a large overhead and created a new stackframe for every call.
For many languages this is not the case, and recursion is equally or more performant than an iterative version. These days, even some C compilers rewrite some recursive constructs to an iterative version, or reuse the stack frame for a tail recursive call.

Try implementing depth-first search recursively and iteratively and tell me which one gave you an easier time of it. Or merge sort. For a lot of problems it comes down to explicitly maintaining your own stack vs. leaving your data on the function stack.
I can't speak to Haskell as I've never used it, but this is to address the more general part of the question posed in your title.

Haskell do not allow iteration because iteration involves mutable state (the index).

As others have stated, there's nothing intrinsically less performant about recursion. There are some languages where it will be slower, but it's not a universal rule.
That being said, to me recursion is a tool, to be used when it makes sense. There are some algorithms that are better represented as recursion (just as some are better via iteration).
Case in point:
fib 0 = 0
fib 1 = 1
fib n = fib(n-1) + fib(n-2)
I can't imagine an iterative solution that could possibly make the intent clearer than that.

Here is some information on the pros & cons of recursion & iteration in c:
http://www.stanford.edu/~blp/writings/clc/recursion-vs-iteration.html
Mostly for me Recursion is sometimes easier to understand than iteration.

Iteration is just a special form of recursion.

Recursion is one of those things that seem elegant or efficient in theory but in practice is generally less efficient (unless the compiler or dynamic recompiler) is changing what the code does. In general anything that causes unnecessary subroutine calls is going to be slower, especially when more than 1 argument is being pushed/popped. Anything you can do to remove processor cycles i.e. instructions the processor has to chew on is fair game. Compilers can do a pretty good job of this these days in general but it's always good to know how to write efficient code by hand.

Several things:
Iteration is not necessarily faster
Root of all evil: encouraging something just because it might be moderately faster is premature; there are other considerations.
Recursion often much more succinctly and clearly communicates your intent
By eschewing mutable state generally, functional programming languages are easier to reason about and debug, and recursion is an example of this.
Recursion takes more memory than iteration.

I don't think there's anything intrinsically less performant about recursion - at least in the abstract. Recursion is a special form of iteration. If a language is designed to support recursion well, it's possible it could perform just as well as iteration.
In general, recursion makes one be explicit about the state you're bringing forward in the next iteration (it's the parameters). This can make it easier for language processors to parallelize execution. At least that's a direction that language designers are trying to exploit.

As a low level ITERATION deals with the CX registry to count loops, and of course data registries.
RECURSION not only deals with that it also adds references to the stack pointer to keep references of the previous calls and then how to go back.-
My University teacher told me that whatever you do with recursion can be done with Iterations and viceversa, however sometimes is simpler to do it by recursion than Iteration (more elegant) but at a performance level is better to use Iterations.-

In Java, recursive solutions generally outperform non-recursive ones. In C it tends to be the other way around. I think this holds in general for adaptively compiled languages vs. ahead-of-time compiled languages.
Edit:
By "generally" I mean something like a 60/40 split. It is very dependent on how efficiently the language handles method calls. I think JIT compilation favors recursion because it can choose how to handle inlining and use runtime data in optimization. It's very dependent on the algorithm and compiler in question though. Java in particular continues to get smarter about handling recursion.
Quantitative study results with Java (PDF link). Note that these are mostly sorting algorithms, and are using an older Java Virtual Machine (1.5.x if I read right). They sometimes get a 2:1 or 4:1 performance improvement by using the recursive implementation, and rarely is recursion significantly slower. In my personal experience, the difference isn't often that pronounced, but a 50% improvement is common when I use recursion sensibly.

I find it hard to reason that one is better than the other all the time.
Im working on a mobile app that needs to do background work on user file system. One of the background threads needs to sweep the whole file system from time to time, to maintain updated data to the user. So in fear of Stack Overflow , i had written an iterative algorithm. Today i wrote a recursive one, for the same job. To my surprise, the iterative algorithm is faster: recursive -> 37s, iterative -> 34s (working over the exact same file structure).
Recursive:
private long recursive(File rootFile, long counter) {
long duration = 0;
sendScanUpdateSignal(rootFile.getAbsolutePath());
if(rootFile.isDirectory()) {
File[] files = getChildren(rootFile, MUSIC_FILE_FILTER);
for(int i = 0; i < files.length; i++) {
duration += recursive(files[i], counter);
}
if(duration != 0) {
dhm.put(rootFile.getAbsolutePath(), duration);
updateDurationInUI(rootFile.getAbsolutePath(), duration);
}
}
else if(!rootFile.isDirectory() && checkExtension(rootFile.getAbsolutePath())) {
duration = getDuration(rootFile);
dhm.put(rootFile.getAbsolutePath(), getDuration(rootFile));
updateDurationInUI(rootFile.getAbsolutePath(), duration);
}
return counter + duration;
}
Iterative: - iterative depth-first search , with recursive backtracking
private void traversal(File file) {
int pointer = 0;
File[] files;
boolean hadMusic = false;
long parentTimeCounter = 0;
while(file != null) {
sendScanUpdateSignal(file.getAbsolutePath());
try {
Thread.sleep(Constants.THREADS_SLEEP_CONSTANTS.TRAVERSAL);
} catch (InterruptedException e) {
e.printStackTrace();
}
files = getChildren(file, MUSIC_FILE_FILTER);
if(!file.isDirectory() && checkExtension(file.getAbsolutePath())) {
hadMusic = true;
long duration = getDuration(file);
parentTimeCounter = parentTimeCounter + duration;
dhm.put(file.getAbsolutePath(), duration);
updateDurationInUI(file.getAbsolutePath(), duration);
}
if(files != null && pointer < files.length) {
file = getChildren(file,MUSIC_FILE_FILTER)[pointer];
}
else if(files != null && pointer+1 < files.length) {
file = files[pointer+1];
pointer++;
}
else {
pointer=0;
file = getNextSybling(file, hadMusic, parentTimeCounter);
hadMusic = false;
parentTimeCounter = 0;
}
}
}
private File getNextSybling(File file, boolean hadMusic, long timeCounter) {
File result= null;
//se o file é /mnt, para
if(file.getAbsolutePath().compareTo(userSDBasePointer.getParentFile().getAbsolutePath()) == 0) {
return result;
}
File parent = file.getParentFile();
long parentDuration = 0;
if(hadMusic) {
if(dhm.containsKey(parent.getAbsolutePath())) {
long savedValue = dhm.get(parent.getAbsolutePath());
parentDuration = savedValue + timeCounter;
}
else {
parentDuration = timeCounter;
}
dhm.put(parent.getAbsolutePath(), parentDuration);
updateDurationInUI(parent.getAbsolutePath(), parentDuration);
}
//procura irmao seguinte
File[] syblings = getChildren(parent,MUSIC_FILE_FILTER);
for(int i = 0; i < syblings.length; i++) {
if(syblings[i].getAbsolutePath().compareTo(file.getAbsolutePath())==0) {
if(i+1 < syblings.length) {
result = syblings[i+1];
}
break;
}
}
//backtracking - adiciona pai, se tiver filhos musica
if(result == null) {
result = getNextSybling(parent, hadMusic, parentDuration);
}
return result;
}
Sure the iterative isn't elegant, but alhtough its currently implemented on an ineficient way, it is still faster than the recursive one. And i have better control over it, as i dont want it running at full speed, and will let the garbage collector do its work more frequently.
Anyway, i wont take for granted that one method is better than the other, and will review other algorithms that are currently recursive. But at least from the 2 algorithms above, the iterative one will be the one in the final product.

I think it would help to get some understanding of what performance is really about. This link shows how a perfectly reasonably-coded app actually has a lot of room for optimization - namely a factor of 43! None of this had anything to do with iteration vs. recursion.
When an app has been tuned that far, it gets to the point where the cycles saved by iteration as against recursion might actually make a difference.

Recursion is the typical implementation of iteration. It's just a lower level of abstraction (at least in Python):
class iterator(object):
def __init__(self, max):
self.count = 0
self.max = max
def __iter__(self):
return self
# I believe this changes to __next__ in Python 3000
def next(self):
if self.count == self.max:
raise StopIteration
else:
self.count += 1
return self.count - 1
# At this level, iteration is the name of the game, but
# in the implementation, recursion is clearly what's happening.
for i in iterator(50):
print(i)

I would compare recursion with an explosive: you can reach big result in no time. But if you use it without cautions the result could be disastrous.
I was impressed very much by proving of complexity for the recursion that calculates Fibonacci numbers here. Recursion in that case has complexity O((3/2)^n) while iteration just O(n). Calculation of n=46 with recursion written on c# takes half minute! Wow...
IMHO recursion should be used only if nature of entities suited for recursion well (trees, syntax parsing, ...) and never because of aesthetic. Performance and resources consumption of any "divine" recursive code need to be scrutinized.

Iteration is more performant than recursion, right?
Yes.
However, when you have a problem which maps perfectly to a Recursive Data Structure, the better solution is always recursive.
If you pretend to solve the problem with iterations you'll end up reinventing the stack and creating a messier and ugly code, compared to the elegant recursive version of the code.
That said, Iteration will always be faster than Recursion. (in a Von Neumann Architecture), so if you use recursion always, even where a loop will suffice, you'll pay a performance penalty.
Is recursion ever faster than looping?

"Iteration is more performant than recursion" is really language- and/or compiler-specific. The case that comes to mind is when the compiler does loop-unrolling. If you've implemented a recursive solution in this case, it's going to be quite a bit slower.
This is where it pays to be a scientist (testing hypotheses) and to know your tools...

on ntfs UNC max path as is 32K
C:\A\B\X\C.... more than 16K folders can be created...
But you can not even count the number of folders with any recursive method, sooner or later all will give stack overflow.
Only a Good lightweight iterative code should be used to scan folders professionally.
Believe or not, most top antivirus cannot scan maximum depth of UNC folders.

Related

Factorial using recursion(backtracking vs a new approach)

I am trying to learn recursion. for the starting problem of it , which is calculating factorial of a number I have accomplished it using two methods.
the first one being the normal usual approach.
the second one i have tried to do something different.
in the second one i return the value of n at the end rather than getting the starting value as in the first one which uses backtracking.
my question is that does my approach has any advantages over backtracking?
if asked to chose which one would be a better solution?
//first one is ,
ll factorial(int n)
{
if(n==1)
return 1 ;
return n*factorial(n-1) ;
}
int main()
{
factorial(25) ;
return 0 ;
}
// second one is ,
ll fact(ll i,ll n)
{
if(i==0)return n ;
n=(n*i)
i--;
n=fact(i,n);
}
int main()
{
int n ;
cin>>n ;
cout<<fact(n,1) ;
return 0 ;
}
// ll is long long int
First of all I want to point out that premature optimization at the expense of readibility is almost always a mistake, especially when the need for optimization comes from intuition and not from measurements.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%" - Donald Knuth
But let's say we care about the 3% in this case, because all our program ever does is compute lots of factorials. To keep it short: You will never be smarter than the compiler.
If this seems slightly crazy then this definitely applies to you and you should stop thinking about 'micromanaging/optimizing your code'. If you are a very skilled C++ programmer this will still apply to you in most cases, but you will recognize the opportunities to help your compiler out.
To back this up with some fact, we can compile the code (with automatic optimization) and (roughly) compare the assembly output. I will use the wonderful website godbolt.org
Don't be discouraged by the crazy assembler code, we don't need to understand it. But we can see that both methods
are basically the same length when compiled as assembler code
contain almost the same instructions
So to recap, readability should be your number one priority. In case a speed measurement shows that this one part of your code really is a big performance problem, really think about if you can make a change that structurally improves the algorithm (i.e. by decreasing the complexity). Otherwise your compiler will take care of it for you.

Compiling container operations to efficient incremental code

Most programming languages nowadays provide containers like list, set, multiset, map. Operations on all elements of a container e.g. copy_if or transform normally take O(n) time. Lazy evaluation can make this sublinear if you only need the first few elements of the result, but it's back to linear if you need the full result.
Consider for example the following implementation of the DPLL algorithm for propositional satisfiability. It's essentially a translation of the pseudocode to C++, thus optimized for readability. But each step takes time linear in the number of variables, so even if it could guess all the assignments correctly the first time, total time and memory consumption would be quadratic in the number of variables, making the implementation too slow to use in practice. This is true even if we apply well-known optimizations like passing containers by constant reference and using lazy evaluation anywhere it might avoid unnecessary work.
Efficient implementations of DPLL use incremental techniques, where instead of scanning and copying entire containers, only the consequences of making a small change like assigning a single variable are computed, using O(1) time and memory per step.
A human can translate pseudocode or unoptimized reference implementation to an efficient incremental implementation; this is what we do when we write practical SAT solvers, theorem provers etc. The price we pay is to thenceforth work with a large, complex body of optimized code, which is much more difficult than working with a concise description of the mathematical logic.
What we ideally want is a higher-level compiler, that can compile the unoptimized reference implementation into efficient incremental code.
My question is: has any work yet been done on such a thing? Are there any existing implementations, partial implementations or discussions to look at? Higher-level optimization of container operations is not entirely unknown in principle, e.g. SQL query optimizers, Haskell loop fusion, but I'm not aware of anything that tries to go as far as I'm looking for here.
Unoptimized reference implementation (simple translation of pseudocode) of DPLL:
bool dpll(map<Var, bool> m, set<set<Literal>> clauses) {
clauses = eval(m, clauses);
// Solved
if (isFalse(clauses))
return false;
if (isTrue(clauses))
return true;
// Unit clause
auto unitClauses =
copy_if(clauses, [](set<Literal> clause) { return clause.size() == 1; });
if (unitClauses.size()) {
auto x = front(front(unitClauses));
return dpll(m + makeMap(x.var, x.pol), clauses);
}
// Pure literal
auto pureVars =
copy_if(vars(clauses), [=](Var x) { return pure(x, clauses); });
if (pureVars.size()) {
auto x = front(pureVars);
return dpll(m + makeMap(x, pol(x, clauses)), clauses);
}
// Choice
auto x = choose(vars(clauses));
return dpll(m + makeMap(x, false), clauses) ||
dpll(m + makeMap(x, true), clauses);
}
Similar reference implementation of helper functions: https://github.com/russellw/ayane/blob/master/logic/dpll.cpp

How to assess maximum number of recursive calls before stack overflows

Lets take a recursive function, for example factorial. Lets also assume that we have a stack of 1 MB size. Using a pen and paper, how can I estimate the number of recursive calls to the function before the stack overflows? I'm not interested in any particular language but rather in an abstract approach.
There are questions on SO that look similar but most of them are concerned with a specific language, or extending stack size, or estimating it by running specific function, or preventing overflow. I would like to find a mathematical way to estimate it.
I found similar question in an algorithmic challenge but couldn't come up with any reasonable solution.
Any suggestion highly appreciated.
EDIT
In response to provided replays if the language truly cannot be taken out of the equation let's assume it's C#. Also, since we are passing simple int or long to the function it's not passed by reference but as a copy. Also, assume a naive implementation, without hashing, without multi-threading, an implementation that as much as possible resembles a mathematical representation of the function:
private static long Factorial(long n)
{
if (n < 0)
{
throw new ArgumentException("Negative numbers not supported");
}
switch (n)
{
case 0:
return 1;
case 1:
return 1;
default:
return n * Factorial(n - 1);
}
}
It highly depends on the implementation of the function. How much memory does the function use, before calling itself again. When it recurses 100 times, you will also have 100 function scopes in memory, including the function arguments and variables. It also reserves 100 places on the stack to store the return values.
I don't think the language can easily be taken out of the equation, because you need to know exactly how the stack is used. For examples are objects passed by reference? Or are the objects copy as a new instance on the stack?

About reducing the branch miss prediciton

I saw a sentence in a paper "Transforming branches into data dependencies to avoid mispredicted branches." (Page 6)
I wonder how to change the code from branches into data dependencies.
This is the paper: http://www.adms-conf.org/p1-SCHLEGEL.pdf
Update: How to transform branches in the binary search?
The basic idea (I would presume) would be to change something like:
if (a>b)
return "A is greater than B";
else
return "A is less than or equal to B";
into:
static char const *strings[] = {
"A is less than or equal to B",
"A is greater than B"
};
return strings[a>b];
For branches in a binary search, let's consider the basic idea of the "normal" binary search, which typically looks (at least vaguely) like this:
while (left < right) {
middle = (left + right)/2;
if (search_for < array[middle])
right = middle;
else
left = middle;
}
We could get rid of most of the branching here in pretty much the same way:
size_t positions[] = {left, right};
while (left < right) {
size_t middle = (left + right)/2;
positions[search_for >= array[middle]] = middle;
}
[For general purpose code use left + (right-left)/2 instead of (left+right)/2.]
We do still have the branching for the loop itself, of course, but we're generally a lot less concerned about that--that branch is extremely amenable to prediction, so even if we did eliminate it, doing so would gain little as a rule.
Removing branches is not always optimal, even (especially) with simple binary conditions like this. I have looked into removing branches in a similar manner in various circumstances before. After running into an instance where code with a conditional branch in a loop ran faster than equivalent, branch-less code, I did some research on processor execution strategies.
I knew that the ARM instruction set had conditional instructions, which could make a conditional branch faster than the kind of branch-less code in the paper, but I was working on an intel (and the branch was not one that cmove could take care of). It turns out that modern CPUs, including intel, will sometimes turn an ordinary instruction into a conditional one: they use eager execution if the two end points of the branch are sufficiently short. That is, the CPU will put both possible paths in the pipeline and execute them both and only keep the correct result once the condition is known. This avoids the possibility of a mis-prediction without having to index an array. See https://en.wikipedia.org/wiki/Speculative_execution#Variants for more information.
It takes a lot of detailed knowledge of a processor to write optimal code for it, even in assembly. This makes it possible for "naive" implementations to sometimes be faster than hand-optimized ones that overlook certain CPU characteristics. The same thing can happen with optimizing compilers: sometimes writing more complicated "optimized" code breaks some of the compiler's optimizations and results in a slower executable than a naive implementation that the compiler can optimize fully.
When in doubt and performance is critical and there is time, it is usually best to try it both ways and see which is faster!

Algorithm Efficiency - Is partially unrolling a loop effective if it requires more comparisons?

How to judge if putting two extra assignments in an iteration is expensive or setting a if condition to test another thing? here I elaborate. question is to generate and PRINT the first n terms of the Fibonacci sequence where n>=1. my implement in C was:
#include<stdio.h>
void main()
{
int x=0,y=1,output=0,l,n;
printf("Enter the number of terms you need of Fibonacci Sequence ? ");
scanf("%d",&n);
printf("\n");
for (l=1;l<=n;l++)
{
output=output+x;
x=y;
y=output;
printf("%d ",output);
}
}
but the author of the book "how to solve it by computer" says it is inefficient since it uses two extra assignments for a single fibonacci number generated. he suggested:
a=0
b=1
loop:
print a,b
a=a+b
b=a+b
I agree this is more efficient since it keeps a and b relevant all the time and one assignment generates one number. BUT it is printing or supplying two fibonacci numbers at a time. suppose question is to generate an odd number of terms, what would we do? author suggested put a test condition to check if n is an odd number. wouldn't we be losing the gains of reducing number of assignments by adding an if test in every iteration?
I consider it very bad advice from the author to even bring this up in a book targeted at beginning programmers. (Edit: In all fairness, the book was originally published in 1982, a time when programming was generally much more low-level than it is now.)
99.9% of code does not need to be optimized. Especially in code like this that mixes extremely cheap operations (arithmetic on integers) with very expensive operations (I/O), it's a complete waste of time to optimize the cheap part.
Micro-optimizations like this should only be considered in time-critical code when it is necessary to squeeze every bit of performance out of your hardware.
When you do need it, the only way to know which of several options performs best is to measure. Even then, the results may change with different processors, platforms, memory configurations...
Without commenting on your actual code: As you are learning to program, keep in mind that minor efficiency improvements that make code harder to read are not worth it. At least, they aren't until profiling of a production application reveals that it would be worth it.
Write code that can be read by humans; it will make your life much easier and keep maintenance programmers from cursing the name of you and your offspring.
My first advice echoes the others: Strive first for clean, clear code, then optimize where you know there is a performance issue. (It's hard to imagine a time-critical fibonacci sequencer...)
However, speaking as someone who does work on systems where microseconds matter, there is a simple solution to the question you ask: Do the "if odd" test only once, not inside the loop.
The general pattern for loop unrolling is
create X repetitions of the loop logic.
divide N by X.
execute the loop N/X times.
handle the N%X remaining items.
For your specific case:
a=0;
b=1;
nLoops = n/2;
while (nloops-- > 0) {
print a,b;
a=a+b;
b=a+b;
}
if (isOdd(n)) {
print a;
}
(Note also that N/2 and isOdd are trivially implemented and extremely fast on a binary computer.)

Resources