I'm a student, new to scheme language.
I'm trying to write efficient functions.
I already know how to calculate the execution time of a function, but what I'd like to know is how to calculate the stack(or memory) utilization of this function.
Because as I know, the less number of instructions waiting on the stack during the execution the better efficiency gets.
so is there a way to count the number of instructions waiting on the stack?
No there is not a 'way to count the number of instructions waiting on the stack' in standard Scheme.
If you are learning Scheme, focusing on efficiency, execution time of a function and stack/memory utilization are utterly the wrong focus. "Premature optimization is the root of all evil (or at least most of it) in programming." D. Knuth - way too much focus on optimization at the level you are questioning.
You should be thinking at the level of algorithmic computational complexity and if you start expressing algorithms recursively, as Scheme encourages, then start learning about tail recursion (because Scheme guarantees that they are optimized to iteration).
Related
Are not algorithms supposed to have the same time complexity across any programming language, then why do we consider such programming language differences in time complexity such as pass by value or pass by reference in the arguments when finding the total time complexity of the algorithm? Or am I wrong in saying we should not consider such programming language differences in implementation when finding the time complexity of an algorithm? I cannot think of any other instance as of now, but for example, if there was an algorithm in C++ that either passed its arguments by reference or by value, would there be difference in time complexity? If so, why? CLRS, the famous book about algorithms never discusses this point, so I am unsure if I should consider this point when finding time complexity.
I think what you are refer to is not language but compiler differencies... This is how I see it (however take in mind I am no expert in the matter)
Yes complexity of some source code ported to different langauges should be the same (if language capabilities allows it). The problem is compiler implementation and conversion to binary (or asm if you want). Each language usually adds its engine to the program that is handling the stuff like variables, memory allocation, heap/stack ,... Its similar to OS. This stuff is hidden but it have its own complexities lets call them hidden therms of complexity
For example using reference instead of direct operand in function have big impact on speed. Because while we are coding we often consider only complexity of the code and forgetting about the heap/stack. This however does not change complexity, its just affect n of the hidden therms of complexity (but heavily).
The same goes for compilers specially designed for recursion (functional programing) like LISP. In those the iteration is much slower or even not possible but recursion is way faster than on standard compilers... These I think change complexity but only of the hidden parts (related to language engine)
Some languages use internally bignums like Python. These affect complexity directly as arithmetics on CPU native datatypes is often considered O(1) however on Bignums its no longer true and can be O(1),O(log(n)),O(n),O(n.log),O(n^2) and worse depending on the operation. However while comparing to different language on the same small range the bignums are also considered O(1) however usually much slower than CPU native datatype.
Interpreted and virtual machine languages like Python,JAVA,LUA are adding much bigger overhead as they hidden therms usually not only contain langue engine but also the interpreter which must decode code, then interpret it or emulate or what ever which changes the hidden therms of complexity greatly. Not precompiled interpreters are even worse as you first need to parse text which is way much slower...
If I put all together its a matter of perspective if you consider hidden therms of complexity as constant time or complexity. In most cases the average speed difference between language/compilers/interpreters is constant (so you can consider it as constant time so no complexity change) but not always. To my knowledge the only exception is functional programming where the difference is almost never constant ...
I am having problem in understanding one thing that when recursion involves so much space as well as the time complexity of both the iterative algos and recursive algos are same unless I apply Dynamic programming to it , then why should we use recursion ,Is it mere for reducing the lines of code that we use this , since even for implementing recursion , a whole PCB has to be saved during passing of control of function from one call to another ?
Although I have seen many posts related to it but still it's not clear to me that what is the major advantage of implementing recursion over iteration ?
what is the major advantage of implementing recursion over iteration ?
Readability - don't neglect it. If the code is readable and simple - it will take less time to code it (which is very important in real life), and a simpler code is also easier to maintain (since in future updates, it will be easy to understand what's going on).
Now, I am not saying "Make everything recursive!", but if something is significantly more readable in a recursive solution - unless in production it makes the code suffer in performance in noticeable way - keep it!.
Performance. Yes, you heard me. Sometimes, to transform a recursive solution to an iterative one, you need more than a simple loop - you need a loop + a stack.
However, many times, your stack DS is not as efficient as the machine one, which is optimized exactly for this purpose by hundreds of employees in Intel/AMD.
This thread discusses a specific case, where a trivial conversion of an algorithm from recursive to iterative yields worse results, and in order to outperform the machine stack - you will need to invest lots of hours in optimizing your stack, and your time is a scarce resource.
Again, I am not saying - "recursion is always faster!". But in some situations, it could be.
I was looking at the responses to this question by #AndreyT and I had a question regarding the memory efficiency of classical DFS vs. stack based DFS. The argument is that the classical backtracking DFS cannot be created from BFS by a simple stack-to-queue replacement. In doing the BFS to DFS by stack-to-queue replacement, you lose the space efficiency of classical DFS. Not being a search algorithm expert (though I am reading up on it) I'm going to assume this is just "true" and go with it.
However, my question is really about overall memory efficiency. While a recursive solution does have a certain code efficiency (I can do a lot more with a few lines of recursive search code) and elegance, doesn't it have a memory (and possibly performance) "hit" because of the fact it is recursive?
Every time you recurse into the function, it pushes local data onto the stack, the return address of the function, and whatever else the compiler thought was necessary to maintain state on return, etc. This can add up quickly. It also has to make a function call for each recursion, which eats up some ops as well (maybe minor...or maybe it breaks branching predictability forcing a flush of the pipeline...not an expert here...feel free to chime in).
I think I want to stick to simple recursion for now and not get into "alternative forms" like tail-recursion for the answer to this question. At least for now.
Whereas you can probably do better than a compiler by explicitly managing a stack, the reward often isn't worth the extra work. Compilers are a lot smarter these days than a lot of people think. They're also not quite as good as others think sometimes--you take the good with the bad.
That said, most compilers can detect and remove tail recursion. Some can convert simple recursive functions to iterative solutions, and some really good compilers can figure out that local variables can be re-used.
Recursive solutions often result in smaller code, which means it's more likely that the code will fit into the CPU cache. A non-recursive solution that requires an explicitly managed stack can result in larger code, more cache misses, and slower performance than the "slow" recursive solution.
Finally, many solutions are most intuitively implemented recursively, and those solutions tend to be short and easy to understand. Converting a recursive solution to an iterative solution that involves explicitly managing a stack results in more lines of code that is often obscure, fragile, and difficult to prove correct.
I've found that an easy to code and understand recursive solution is usually plenty fast enough and doesn't use too much memory. In the rare case that profiling reveals my recursive implementation is the bottleneck (usually it's a comparison function or something similar that the recursive function calls), I'll consider a non-recursive solution. I don't very often realize a significant gain by making the change.
Here's a claim made in one of the answers at: https://softwareengineering.stackexchange.com/a/136146/18748
Try your hand at a Functional language or two. Try implementing
factorial in Erlang using recursion and watch your jaw hit the floor
when 20000! returns in 5 seconds (no stack overflow in site)
Why would it be faster/more efficient than using recursion/iteration in Java/C/C++/Python (any)? What is the underlying mathematical/theoretical concept that makes this happen? Unfortunately, I wasn't ever exposed to functional programming in my undergrad (started with 'C') so I may just lack knowing the 'why'.
A recursive solution would probably be slower and clunkier in Java or C++, but we don't do things that way in Java and C++, so. :)
As for why, functional languages invariably make very heavy usage of recursion (as by default they either have to jump through hoops, or simply aren't allowed, to modify variables, which on its own makes most forms of iteration ineffective or outright impossible). So they effectively have to optimize the hell out of it in order to be competitive.
Almost all of them implement an optimization called "tail call elimination", which uses the current call's stack space for the next call and turns a "call" into a "jump". That little change basically turns recursion into iteration, but don't remind the FPers of that. When it comes to iteration in a procedural language and recursion in a functional one, the two would prove about equal performancewise. (If either one's still faster, though, it'd be iteration.)
The rest is libraries and number types and such. Decent math code ==> decent math performance. There's nothing keeping procedural or OO languages from optimizing similarly, other than that most people don't care that much. Functional programmers, on the other hand, love to navel gaze about how easily they can calculate Fibonacci sequences and 20000! and other numbers that most of us will never use anyway.
Essentially, functional languages make recursion cheap.
Particularly via the tail call optimization.
Tail-call optimization
Lazy evaluation: "When using delayed evaluation, an expression is not evaluated as soon as it gets bound to a variable, but when the evaluator is forced to produce the expression's value."
Both concepts are illustrated in this question about various ways of implementing factorial in Clojure. You can also find a detailed discussion of lazy sequences and TCO in Stuart Halloway and Aaron Bedra's book, Programming Clojure.
The following function, adopted from Programming Clojure, creates a lazy sequence with the first 1M members of the Fibonacci sequence and realizes the one-hundred-thousandth member:
user=> (time (nth (take 1000000 (map first (iterate (fn [[a b]] [b (+ a b)]) [0N 1N]))) 100000))
"Elapsed time: 252.045 msecs"
25974069347221724166155034....
(20900 total digits)
(512MB heap, Intel Core i7, 2.5GHz)
It's not faster (than what?), and it has nothing to do with tail call optimization (it's not enough to throw a buzzword in here, one should also explain why tail call optimization should be faster than looping? it's simply not the case!)
Mind you that I am not a functional programming hater, to the contrary! But spreading myths does not serve the case for functional programming.
BTW, has any one here actually tried how long it takes to compute (and print, which should consume at least 50% of the CPU cycles needed) 20000!?
I did:
main _ = println (product [2n..20000n])
This is a JVM language compiled to Java, and it uses Java big integers (known to be slow). It also suffers of JVM startup costs. And it is not the fastest way to do it (explicit recursion could save list construction, for example).
The result is:
181920632023034513482764175686645876607160990147875264
...
... many many lines with digits
...
000000000000000000000000000000000000000000000000000000000000000
real 0m3.330s
user 0m4.472s
sys 0m0.212s
(On a Intel® Core™ i3 CPU M 350 # 2.27GHz × 4)
We can safely assume that the same in C with GMP wouldn't even use 50% of that time.
Hence, functional is faster is a myth, as well as functional is slower. It's not even a myth, it's simply nonsense as long as one does not say compared to what it is faster/slower.
The following two Haskell programs for computing the n'th term of the Fibonacci sequence have greatly different performance characteristics:
fib1 n =
case n of
0 -> 1
1 -> 1
x -> (fib1 (x-1)) + (fib1 (x-2))
fib2 n = fibArr !! n where
fibArr = 1:1:[a + b | (a, b) <- zip fibArr (tail fibArr)]
They are very close to mathematically identical, but fib2 uses the list notation to memoize its intermediate results, while fib1 has explicit recursion. Despite the potential for the intermediate results to be cached in fib1, the execution time gets to be a problem even for fib1 25, suggesting that the recursive steps are always evaluated. Does referential transparency contribute anything to Haskell's performance? How can I know ahead of time if it will or won't?
This is just an example of the sort of thing I'm worried about. I'd like to hear any thoughts about overcoming the difficulty inherent in reasoning about the performance of a lazily-executed, functional programming language.
Summary: I'm accepting 3lectrologos's answer, because the point that you don't reason so much about the language's performance, as about your compiler's optimization, seems to be extremely important in Haskell - more so than in any other language I'm familiar with. I'm inclined to say that the importance of the compiler is the factor that differentiates reasoning about performance in lazy, functional langauges, from reasoning about the performance of any other type.
Addendum: Anyone happening on this question may want to look at the slides from Johan Tibell's talk about high performance Haskell.
In your particular Fibonacci example, it's not very hard to see why the second one should run faster (although you haven't specified what f2 is).
It's mainly an algorithmic issue:
fib1 implements the purely recursive algorithm and (as far as I know) Haskell has no mechanism for "implicit memoization".
fib2 uses explicit memoization (using the fibArr list to store previously computed values.
In general, it's much harder to make performance assumptions for a lazy language like Haskell, than for an eager one. Nevertheless, if you understand the underlying mechanisms (especially for laziness) and gather some experience, you will be able to make some "predictions" about performance.
Referential transparency increases (potentially) performance in (at least) two ways:
First, you (as a programmer) can be sure that two calls to the same function will always return the same, so you can exploit this in various cases to benefit in performance.
Second (and more important), the Haskell compiler can be sure for the above fact and this may enable many optimizations that can't be enabled in impure languages (if you've ever written a compiler or have any experience in compiler optimizations you are probably aware of the importance of this).
If you want to read more about the reasoning behind the design choices (laziness, pureness) of Haskell, I'd suggest reading this.
Reasoning about performance is generally hard in Haskell and lazy languages in general, although not impossible. Some techniques are covered in Chris Okasaki's Purely Function Data Structures (also available online in a previous version).
Another way to ensure performance is to fix the evaluation order, either using annotations or continuation passing style. That way you get to control when things are evaluated.
In your example you might calculate the numbers "bottom up" and pass the previous two numbers along to each iteration:
fib n = fib_iter(1,1,n)
where
fib_iter(a,b,0) = a
fib_iter(a,b,1) = a
fib_iter(a,b,n) = fib_iter(a+b,a,n-1)
This results in a linear time algorithm.
Whenever you have a dynamic programming algorithm where each result relies on the N previous results, you can use this technique. Otherwise you might have to use an array or something completely different.
Your implementation of fib2 uses memoization but each time you call fib2 it rebuild the "whole" result. Turn on ghci time and size profiling:
Prelude> :set +s
If it was doing memoisation "between" calls the subsequent calls would be faster and use no memory. Call fib2 20000 twice and see for yourself.
By comparison a more idiomatic version where you define the exact mathematical identity:
-- the infinite list of all fibs numbers.
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
memoFib n = fibs !! n
actually do use memoisation, explicit as you see. If you run memoFib 20000 twice you'll see the time and space taken the first time then the second call is instantaneous and take no memory. No magic and no implicit memoization like a comment might have hinted at.
Now about your original question: optimizing and reasoning about performance in Haskell...
I wouldn't call myself an expert in Haskell, I have only been using it for 3 years, 2 of which at my workplace but I did have to optimize and get to understand how to reason somewhat about its performance.
As mentionned in other post laziness is your friend and can help you gain performance however YOU have to be in control of what is lazily evaluated and what is strictly evaluated.
Check this comparison of foldl vs foldr
foldl actually stores "how" to compute the value i.e. it is lazy. In some case you saves time and space beeing lazy, like the "infinite" fibs. The infinite "fibs" doesn't generate all of them but knows how. When you know you will need the value you might as well just get it "strictly" speaking... That's where strictness annotation are usefull, to give you back control.
I recall reading many times that in lisp you have to "minimize" consing.
Understanding what is stricly evaluated and how to force it is important but so is understanding how much "trashing" you do to the memory. Remember Haskell is immutable, that means that updating a "variable" is actually creating a copy with the modification. Prepending with (:) is vastly more efficient than appending with (++) because (:) does not copy memory contrarily to (++). Whenever a big atomic block is updated (even for a single char) the whole block needs to be copied to represent the "updated" version. The way you structure data and update it can have a big impact on performance. The ghc profiler is your friend and will help you spot these. Sure the garbage collector is fast but not having it do anything is faster!
Cheers
Aside from the memoization issue, fib1 also uses non-tailcall recursion. Tailcall recursion can be re-factored automatically into a simple goto and perform very well, but the recursion in fib1 cannot be optimized in this way, because you need the stack frame from each instance of fib1 in order to calculate the result. If you rewrote fib1 to pass a running total as an argument, thus allowing a tail call instead of needing to keep the stack frame for the final addition, the performance would improve immensely. But not as much as the memoized example, of course :)
Since allocation is a major cost in any functional language, an important part of understanding performance is to understand when objects are allocated, how long they live, when they die, and when they are reclaimed. To get this information you need a heap profiler. It's an essential tool, and luckily GHC ships with a good one.
For more information, read Colin Runciman's papers.