Performance Implications of Point-Free Style

I'm taking my first baby steps in learning functional programming using F#, and I've just come across the Forward Pipe (|>) and Forward Composition (>>) operators. At first I thought they were just sugar rather than having an effect on the final running code (though I know piping helps with type inference).
However, I came across this SO question:
What are advantages and disadvantages of “point free” style in functional programming?
It has two interesting and informative answers (which, instead of simplifying things for me, opened a whole can of worms surrounding "point-free" or "pointless" style). My take-home from these (and other reading around) is that point-free style is a debated area. Like lambdas, point-free style can make code easier to understand or much harder, depending on use. It can help in naming things meaningfully.
But my question concerns a comment on the first answer:
AshleyF muses in the answer:
“It seems to me that composition may reduce GC pressure by making it more obvious to the compiler that there is no need to produce intermediate values as in pipelining; helping make the so-called "deforestation" problem more tractable.”
Gasche replies:
“The part about improved compilation is not true at all. In most languages, point-free style will actually decrease performances. Haskell relies heavily on optimizations precisely because it's the only way to make the cost of these things bearable. At best, those combinators are inlined away and you get an equivalent pointful version”
Can anyone expand on the performance implications? (In general and specifically for F#.) I had just assumed it was a writing-style thing and that the compiler would reduce both idioms to equivalent code.

This answer is going to be F#-specific. I don't know how the internals of other functional languages work, and the fact that they don't compile to CIL could make a big difference.
I can see three questions here:
What are the performance implications of using |>?
What are the performance implications of using >>?
What is the performance difference between declaring a function with its arguments and without them?
The answers (using examples from the question you linked to):
Is there any difference between x |> sqr |> sum and sum (sqr x)?
No, there isn't. The compiled CIL is exactly the same (here represented in C#):
sum.Invoke(sqr.Invoke(x))
(Invoke() is used, because sqr and sum are not CIL methods, they are FSharpFunc, but that's not relevant here.)
Is there any difference between (sqr >> sum) x and sum (sqr x)?
No, both samples compile to the same CIL as above.
Is there any difference between let sumsqr = sqr >> sum and let sumsqr x = (sqr >> sum) x?
Yes, the compiled code is different. If you specify the argument, sumsqr is compiled into a normal CLI method. But if you don't specify it, it's compiled as a property of type FSharpFunc with a backing field, whose Invoke() method contains the code.
But the effect of all this is that invoking the point-free version means loading one extra field (the FSharpFunc), which is not done if you specify the argument. I think that shouldn't measurably affect performance, except in the most extreme circumstances.
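For intuition on the Haskell side of gasche's comment, here is a small sketch (my own example, not from the thread) of a pointful and a point-free definition of the same pipeline. With optimization, GHC normally inlines the (.) combinator in simple cases like this, giving back essentially the pointful version; without optimization, the point-free binding goes through an extra closure, roughly analogous to the FSharpFunc field described above.

-- Pointful: the argument is named explicitly.
sumsqrPointful :: [Int] -> Int
sumsqrPointful xs = sum (map (^ 2) xs)

-- Point-free: the same pipeline written as a composition.
sumsqrPointFree :: [Int] -> Int
sumsqrPointFree = sum . map (^ 2)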

Related

If you write a compiler in pure Prolog, will it work as a decompiler also?

If you write a compiler in pure Prolog (no extra-logical bits), will it work as a decompiler also?
(A book I was reading opined on this, but I wonder if anyone has actually tried it)
I once wrote the equivalent of cdecl.org as a reversible program. It was a bit tricky, but I demonstrated that it could be done. (Somewhere in a pile of papers is the source code; one of these days, I hope to publish it on github.) The code was 2 or 3 times as compact as some existing code that used tools such as yacc/lex (bison/flex).
For something like cdecl -- where you're translating between char ** const * const x and declare x as const pointer to const pointer to pointer to char, compiling/decompiling makes sense. But what does it mean to translate from arbitrary machine code to source code? Even translating between some IR and source code doesn't seem to make a lot of sense.
This question needs to be much more precise, as we don't know what a "compiler" is (an extraneous-information-dumping transformation from a graph - the program in language 1 - to another graph - the algorithmically equivalent graph in language 2, I suppose). It is also not clear what "no extra-logical bits" implies. If you get rid of these, what kind of compilers can you still build?
Seen this way, compilation looks like pure deduction (Prolog running forward, or CHR), while decompilation looks like possibly very hard search (you will get a program among the gazillion possible ones, but it won't be pleasant to look at and will in no way resemble the one you had earlier). Someone who has a toolbox of theorems fresh in his mind can certainly say more.
But I would say not automagically, no. For one, there will be no guarantee that an infinite "recursion on the left" loop won't appear when "decompiling".

Simple tips for Haskell performance increases (on ProjectEuler problems)?

I'm new to programming and learning Haskell by reading and working through Project Euler problems. Of course, the most important thing one can do to improve performance on these problems is to use a better algorithm. However, it is clear to me that there are other simple and easy to implement ways to improve performance. A cursory search brought up this question, and this question, which give the following tips:
Use the ghc flags -O2 and -fllvm.
Use the type Int, instead of Integer, because it is unboxed (or even Integer instead of Int64). This requires giving the functions type signatures, rather than letting the compiler decide on the fly.
Use rem, not mod, for division testing.
Use Schwartzian transformations when appropriate.
Using an accumulator in recursive functions (a tail-recursion optimization, I believe).
Memoization (?)
(One answer also mentions worker/wrapper transformation, but that seems fairly advanced.)
Question: What other simple optimizations can one make in Haskell to improve performance on Project Euler-style problems? Are there any other Haskell-specific (or functional programming specific?) ideas or features that could be used to help speed up solutions to Project Euler problems? Conversely, what should one watch out for? What are some common yet inefficient things to be avoided?
Here are some good slides by Johan Tibell that I frequently refer to:
Haskell Performance Patterns
One easy suggestion is to use hlint, a program that checks your source code and makes syntax-level suggestions for improvement. This might not increase speed, because the compiler or lazy evaluation most likely already takes care of it, but it might help the compiler in some cases. Furthermore, it will make you a better Haskell programmer, since you will learn better ways to do things, and it may make your program easier to understand and analyze.
Examples taken from http://community.haskell.org/~ndm/darcs/hlint/hlint.htm, such as:
darcs-2.1.2\src\CommandLine.lhs:94:1: Error: Use concatMap
Found:
  concat $ map escapeC s
Why not:
  concatMap escapeC s
and
darcs-2.1.2\src\Darcs\Patch\Test.lhs:306:1: Error: Use a more efficient monadic variant
Found:
  mapM (delete_line (fn2fp f) line) old
Why not:
  mapM_ (delete_line (fn2fp f) line) old
I think the largest gains on Project Euler problems come from understanding the problem and removing unnecessary computation. Even if you don't understand everything, you can make small fixes that will make your program run twice as fast. Say you are looking for primes up to 1,000,000; the naive approach is filter isPrime [1..1000000]. But if you think a bit, you realize that no even number above 2 is prime, so you have removed (about) half the work by writing 2 : filter isPrime [3,5..999999] instead.
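A minimal sketch of that idea (isPrime here is a hypothetical trial-division helper, not taken from the answer; it uses rem rather than mod, in line with the tips above):

-- Trial division against odd candidates up to the square root.
isPrime :: Int -> Bool
isPrime n
  | n < 2     = False
  | n == 2    = True
  | even n    = False
  | otherwise = all (\d -> n `rem` d /= 0) [3, 5 .. isqrt n]
  where
    isqrt m = floor (sqrt (fromIntegral m :: Double))

-- Skip the even numbers entirely instead of filtering them out.
primesUpTo :: Int -> [Int]
primesUpTo limit = 2 : filter isPrime [3, 5 .. limit]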
There is a fairly large section of the Haskell wiki about performance.
One fairly common problem is too little (or too much) strictness (this is covered by the sections listed in the General techniques section of the performance page above). Too much laziness causes a large number of thunks to accumulate; too much strictness can cause too much to be evaluated.
These considerations are especially important when writing tail-recursive functions (i.e. those with an accumulator). On that note, depending on how the function is used, a tail-recursive function is sometimes less efficient in Haskell than the equivalent non-tail-recursive function, even with optimal strictness annotations.
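As a small illustration of the strictness point (my own sketch, using GHC's BangPatterns extension), compare a lazy accumulator with a strict one:

{-# LANGUAGE BangPatterns #-}

-- Lazy accumulator: go builds a chain of unevaluated (+) thunks,
-- which can exhaust memory on a long list.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- Strict accumulator: the bang forces acc at each step,
-- so the loop runs in constant space.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs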
Also, as demonstrated by this recent question, sharing can make a huge difference to performance (in many cases, this can be considered a form of memoisation).
Project Euler is mostly about finding clever algorithmic solutions to the problems. Once you have the right algorithm, micro-optimization is rarely an issue, since even a straightforward or interpreted (e.g. Python or Ruby) implementation should run well within the speed constraints. The main technique you need is understanding lazy evaluation so you can avoid thunk buildups.

Better to use "and" or "in" when chaining "let" statements?

I realize this is probably a silly question, but...
If I'm chaining a bunch of let statements which do not need to know each other's values, is it better to use and or in?
For example, which of these is preferable, if any:
let a = "foo"
and b = "bar"
and c = "baz"
in
(* etc. *)
or
let a = "foo" in
let b = "bar" in
let c = "baz"
in
(* etc. *)
My intuition tells me the former ought to be "better" (by a very petty definition of "better") because it creates the minimum number of scopes necessary, whereas the latter is a scope-within-a-scope-within-a-scope which the compiler/interpreter takes care to note but is ultimately unimportant and needlessly deep.
My opinion is that in is better. The use of and implies that the definitions are mutually dependent on each other. I think it is better to be clear that this is not the case. On the other hand, some OCaml programmers do prefer and for very short definitions, where the slightly more compact notation can appear cleaner. This is especially true when you can fit the definitions on a single line:
let a = "foo" and b = "bar" in
The answer to the question "which is better?" only really makes sense when the interpretations do not differ. The ML family of languages (at least SML and OCaml) have both a parallel initialization form (and) and a sequential, essentially nested-scope form (in) because they are both useful in certain situations.
In your case the semantics are the same, so you are left to answer the question "what reads best to you?" and maybe (if this is not premature) "which might be executed more efficiently?" In your case the alternatives are:
For the and case: evaluate a bunch of strings and do a parallel binding to three identifiers.
For the in case: bind foo to a, then bind bar to b, then bind baz to c.
Which reads better? It is a toss up in this case because the outcome does not matter. You can poll some English speakers but I doubt there will be much preference. Perhaps a majority will like and as it separates bindings leaving the sole in before the expression.
As to what executes more efficiently, I think a good compiler will likely produce the same code because it can detect the order of binding will not matter. But perhaps you have a compiler that generates code for a multicore processor that does better with and. Or maybe a naive compiler which writes all the RHSs into temporary storage and then binds them, making the and case slower.
These are likely to be non-essential optimization concerns, especially since they involve bindings and are probably not in tight loops.
Bottom line: it's a true toss-up in this case; just make sure to use parallel vs. sequential binding correctly when the outcome does matter.
I would say in is better because it reduces scope and better expresses intent. If I saw all these definitions chained together in a shared scope, I would assume it was done for a reason and would be looking for how they affect each other.
It's the same as the difference between let and let* in Lisp, I believe.
let* (similar in functionality to the in..in.. structure - the second structure in your question) is sometimes used to hide imperative programming since it guarantees sequential execution of expressions (see what Paul Graham had to say about let* in On Lisp).
So, I'd say the former construct is better. But the truth is, I think the latter is more common in the OCaml programs I have seen. It is probably just easier to use, since it lets one build on previously named expressions.

Pseudocode interpreter?

Like lots of you guys on SO, I often write in several languages. And when it comes to planning stuff, (or even answering some SO questions), I actually think and write in some unspecified hybrid language. Although I used to be taught to do this using flow diagrams or UML-like diagrams, in retrospect, I find "my" pseudocode language has components of C, Python, Java, bash, Matlab, perl, Basic. I seem to unconsciously select the idiom best suited to expressing the concept/algorithm.
Common idioms might include Java-like braces for scope, Pythonic list comprehensions or indentation, C++-like inheritance, C#-style lambdas, and Matlab-like slices and matrix operations.
I noticed that it's actually quite easy for people to recognise exactly what I'm trying to do, and quite easy for people to intelligently translate into other languages. Of course, that step involves considering the corner cases, and the moments where each language behaves idiosyncratically.
But in reality, most of these languages share a subset of keywords and library functions which generally behave identically - maths functions, type names, while/for/if etc. Clearly I'd have to exclude many 'odd' languages like lisp, APL derivatives, but...
So my questions are,
Does code already exist that recognises the programming language of a text file? (Surely this must be a less complicated task than Eclipse's syntax trees or than Google Translate's language-guessing feature, right?) In fact, does the SO syntax highlighter do anything like this?
Is it theoretically possible to create a single interpreter or compiler that recognises what language idiom you're using at any moment and (maybe "intelligently") executes or translates to a runnable form. And flags the corner cases where my syntax is ambiguous with regards to behaviour. Immediate difficulties I see include: knowing when to switch between indentation-dependent and brace-dependent modes, recognising funny operators (like *pointer vs *kwargs) and knowing when to use list vs array-like representations.
Is there any language or interpreter in existence, that can manage this kind of flexible interpreting?
Have I missed an obvious obstacle to this being possible?
edit
Thanks all for your answers and ideas. I am planning to write a constraint-based heuristic translator that could, potentially, "solve" code for the intended meaning and translate it into real Python code. It will notice keywords from many common languages, and will use syntactic clues to disambiguate the human's intentions - like spacing, brackets, optional helper words like let or then, the context of how variables were previously used, etc., plus knowledge of common conventions (like capitalized names, i for iteration, and some simplistic, limited understanding of naming of variables/methods, e.g. containing the word get, asynchronous, count, last, previous, my, etc.). In real pseudocode, variable naming is as informative as the operations themselves!
Using these clues it will create assumptions as to the implementation of each operation (like 0/1 based indexing, when should exceptions be caught or ignored, what variables ought to be const/global/local, where to start and end execution, and what bits should be in separate threads, notice when numerical units match / need converting). Each assumption will have a given certainty - and the program will list the assumptions on each statement, as it coaxes what you write into something executable!
For each assumption, you can 'clarify' your code if you don't like the initial interpretation. The libraries issue is very interesting. My translator, like some IDEs, will read all definitions available from all modules, use some statistics about which classes/methods are used most frequently and in what contexts, and just guess! (adding a note to the program to say why it guessed as such...) I guess it should attempt to execute everything, and warn you about what it doesn't like. It should allow anything, but let you know what the several alternative interpretations are, if you're being ambiguous.
It will certainly be some time before it can manage such unusual examples as @Albin Sunnanbo's ImportantCustomer example. But I'll let you know how I get on!
I think that is quite useless for everything but toy examples and strict mathematical algorithms. For everything else the language is not just the language. There are lots of standard libraries and whole environments around the languages. I think I write almost as many lines of library calls as I write "actual code".
In C# you have .NET Framework, in C++ you have STL, in Java you have some Java libraries, etc.
The difference between those libraries are too big to be just syntactic nuances.
<subjective>
There have been attempts at unifying the language constructs of different languages into a "unified syntax". That was called 4GL and it never really took off.
</subjective>
As a side note, I have seen a code example about a page long that was valid as C#, Java, and JavaScript code. That can serve as an example of where it is impossible to determine the actual language used.
Edit:
Besides, the whole purpose of pseudocode is that it does not need to compile in any way. The reason you write pseudocode is to create a "sketch", however sloppy you like.
foreach c in ImportantCustomers{== OrderValue >=$1M}
SendMailInviteToSpecialEvent(c)
Now tell me what language it is and write an interpreter for that.
To detect what programming language is used: Detecting programming language from a snippet
I think it should be possible. The approach in 1. could be leveraged to do this, I think. I would try to do it iteratively: detect the syntax used in the first line/clause of code, "compile" it to intermediate form based on that detection, along with any important syntax (e.g. begin/end wrappers). Then the next line/clause etc. Basically write a parser that attempts to recognize each "chunk". Ambiguity could be flagged by the same algorithm.
I doubt that this has been done ... it seems like the cognitive load of learning to write, e.g., Python-compatible pseudocode would be much lower than that of debugging the cases where your interpreter fails.
a. I think the biggest problem is that most pseudocode is invalid in any language. For example, I might completely skip object initialization in a block of pseudocode because for a human reader it is almost always straightforward to infer. But for your case it might be completely invalid in the language syntax of choice, and it might be impossible to automatically determine e.g. the class of the object (it might not even exist). Etc.
b. I think the best you can hope for is an interpreter that "works" (subject to 4a) for your pseudocode only, no-one else's.
Note that I don't think that 4a,4b are necessarily obstacles to it being possible. I just think it won't be useful for any practical purpose.
Recognizing what language a program is in is really not that big a deal. Recognizing the language of a snippet is more difficult, and recognizing snippets that aren't clearly delimited (what do you do if four lines are Python and the next one is C or Java?) is going to be really difficult.
Assuming you got the lines assigned to the right language, doing any sort of compilation would require specialized compilers for all languages that would cooperate. This is a tremendous job in itself.
Moreover, when you write pseudo-code you aren't worrying about the syntax. (If you are, you're doing it wrong.) You'll wind up with code that simply can't be compiled because it's incomplete or even contradictory.
And, assuming you overcame all these obstacles, how certain would you be that the pseudo-code was being interpreted the way you were thinking?
What you would have would be a new computer language, that you would have to write correct programs in. It would be a sprawling and ambiguous language, very difficult to work with properly. It would require great care in its use. It would be almost exactly what you don't want in pseudo-code. The value of pseudo-code is that you can quickly sketch out your algorithms, without worrying about the details. That would be completely lost.
If you want an easy-to-write language, learn one. Python is a good choice. Use pseudo-code for sketching out how processing is supposed to occur, not as a compilable language.
An interesting approach would be a "type-as-you-go" pseudocode interpreter. That is, you would set the language to be used up front, and then it would attempt to convert the pseudo code to real code, in real time, as you typed. An interactive facility could be used to clarify ambiguous stuff and allow corrections. Part of the mechanism could be a library of code which the converter tried to match. Over time, it could learn and adapt its translation based on the habits of a particular user.
People who program all the time will probably prefer to just use the language in most cases. However, I could see the above being a great boon to learners, "non-programmer programmers" such as scientists, and for use in brainstorming sessions with programmers of various languages and skill levels.
-Neil
Programs interpreting human input need to be given the option of saying "I don't know." The language PL/I is a famous example of a system designed to find a reasonable interpretation of anything resembling a computer program that could cause havoc when it guessed wrong: see http://horningtales.blogspot.com/2006/10/my-first-pli-program.html
Note that in the later language C++, when it resolves possible ambiguities it limits the scope of the type coercions it tries, and that it will flag an error if there is not a unique best interpretation.
I have a feeling that the answer to 2. is NO. All I need to prove it false is a code snippet that can be interpreted in more than one way by a competent programmer.
Does code already exist that recognises the programming language of a text file?
Yes, the Unix file command.
(Surely this must be a less complicated task than Eclipse's syntax trees or than Google Translate's language-guessing feature, right?) In fact, does the SO syntax highlighter do anything like this?
As far as I can tell, SO has a one-size-fits-all syntax highlighter that tries to combine the keywords and comment syntax of every major language. Sometimes it gets it wrong:
def median(seq):
    """Returns the median of a list."""
    seq_sorted = sorted(seq)
    if len(seq) & 1:
        # For an odd-length list, return the middle item
        return seq_sorted[len(seq) // 2]
    else:
        # For an even-length list, return the mean of the 2 middle items
        return (seq_sorted[len(seq) // 2 - 1] + seq_sorted[len(seq) // 2]) / 2
Note that SO's highlighter assumes that // starts a C++-style comment, but in Python it's the integer division operator.
This is going to be a major problem if you try to combine multiple languages into one. What do you do if the same token has different meanings in different languages? Similar situations are:
Is ^ exponentiation like in BASIC, or bitwise XOR like in C?
Is || logical OR like in C, or string concatenation like in SQL?
What is 1 + "2"? Is the number converted to a string (giving "12"), or is the string converted to a number (giving 3)?
Is there any language or interpreter
in existence, that can manage this
kind of flexible interpreting?
On another forum, I heard a story of a compiler (IIRC, for FORTRAN) that would compile any program regardless of syntax errors. If you had the line
= Y + Z
The compiler would recognize that a variable was missing and automatically convert the statement to X = Y + Z, regardless of whether you had an X in your program or not.
This programmer had a convention of starting comment blocks with a line of hyphens, like this:
C ----------------------------------------
But one day, they forgot the leading C, and the compiler choked trying to add dozens of variables between what it thought were subtraction operators.
"Flexible parsing" is not always a good thing.
To create a "pseudocode interpreter," it might be necessary to design a programming language that allows user-defined extensions to its syntax. There already are several programming languages with this feature, such as Coq, Seed7, Agda, and Lever. A particularly interesting example is the Inform programming language, since its syntax is essentially "structured English."
The Coq programming language allows "syntax extensions", so the language can be extended to parse new operators:
Notation "A /\ B" := (and A B).
Similarly, the Seed7 programming language can be extended to parse "pseudocode" using "structured syntax definitions." The while loop in Seed7 is defined in this way:
syntax expr: .while.().do.().end.while is -> 25;
Alternatively, it might be possible to "train" a statistical machine translation system to translate pseudocode into a real programming language, though this would require a large corpus of parallel texts.

Reasoning about performance in Haskell

The following two Haskell programs for computing the n'th term of the Fibonacci sequence have greatly different performance characteristics:
fib1 n =
  case n of
    0 -> 1
    1 -> 1
    x -> (fib1 (x-1)) + (fib1 (x-2))

fib2 n = fibArr !! n where
  fibArr = 1:1:[a + b | (a, b) <- zip fibArr (tail fibArr)]
They are very close to mathematically identical, but fib2 uses the list notation to memoize its intermediate results, while fib1 has explicit recursion. Despite the potential for the intermediate results to be cached in fib1, the execution time gets to be a problem even for fib1 25, suggesting that the recursive steps are always evaluated. Does referential transparency contribute anything to Haskell's performance? How can I know ahead of time if it will or won't?
This is just an example of the sort of thing I'm worried about. I'd like to hear any thoughts about overcoming the difficulty inherent in reasoning about the performance of a lazily-executed, functional programming language.
Summary: I'm accepting 3lectrologos's answer, because the point that you don't reason so much about the language's performance as about your compiler's optimization seems to be extremely important in Haskell - more so than in any other language I'm familiar with. I'm inclined to say that the importance of the compiler is the factor that differentiates reasoning about performance in lazy, functional languages from reasoning about the performance of any other type.
Addendum: Anyone happening on this question may want to look at the slides from Johan Tibell's talk about high performance Haskell.
In your particular Fibonacci example, it's not very hard to see why the second one should run faster (although you haven't specified what f2 is).
It's mainly an algorithmic issue:
fib1 implements the purely recursive algorithm and (as far as I know) Haskell has no mechanism for "implicit memoization".
fib2 uses explicit memoization (using the fibArr list to store previously computed values).
In general, it's much harder to make performance assumptions for a lazy language like Haskell, than for an eager one. Nevertheless, if you understand the underlying mechanisms (especially for laziness) and gather some experience, you will be able to make some "predictions" about performance.
Referential transparency increases (potentially) performance in (at least) two ways:
First, you (as a programmer) can be sure that two calls to the same function will always return the same, so you can exploit this in various cases to benefit in performance.
Second (and more important), the Haskell compiler can be sure of the above fact, and this may enable many optimizations that can't be enabled in impure languages (if you've ever written a compiler or have any experience in compiler optimizations, you are probably aware of the importance of this).
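As a tiny illustration (my own sketch, not from the answer): because a pure function always returns the same result for the same argument, evaluating it once and sharing the result cannot change the program's meaning, so either you or the compiler can make this rewrite safely.

-- expensive is a hypothetical pure function standing in for real work.
expensive :: Int -> Int
expensive x = x * x + 1

-- Doing the work twice...
callTwice :: Int -> Int
callTwice x = expensive x + expensive x

-- ...can always be replaced by doing it once and sharing the result,
-- precisely because expensive has no side effects.
callOnce :: Int -> Int
callOnce x = let y = expensive x in y + y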
If you want to read more about the reasoning behind the design choices (laziness, pureness) of Haskell, I'd suggest reading this.
Reasoning about performance is generally hard in Haskell and lazy languages in general, although not impossible. Some techniques are covered in Chris Okasaki's Purely Functional Data Structures (also available online in a previous version).
Another way to ensure performance is to fix the evaluation order, either using annotations or continuation passing style. That way you get to control when things are evaluated.
In your example you might calculate the numbers "bottom up" and pass the previous two numbers along to each iteration:
fib n = fib_iter(1,1,n)
  where
    fib_iter(a,b,0) = a
    fib_iter(a,b,1) = a
    fib_iter(a,b,n) = fib_iter(a+b,a,n-1)
This results in a linear time algorithm.
Whenever you have a dynamic programming algorithm where each result relies on the N previous results, you can use this technique. Otherwise you might have to use an array or something completely different.
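For the "use an array" case, one hedged sketch using Data.Array: a lazily filled immutable array lets each entry refer to earlier entries, which works when a result depends on arbitrary previous results rather than just the last N.

import Data.Array

-- Each cell of the array is a thunk that may index earlier cells;
-- demanding table ! n fills in exactly the entries that are needed.
fibMemo :: Int -> Integer
fibMemo n = table ! n
  where
    table = listArray (0, n) (map go [0 .. n])
    go 0 = 1
    go 1 = 1
    go k = table ! (k - 1) + table ! (k - 2)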
Your implementation of fib2 uses memoization, but each time you call fib2 it rebuilds the "whole" result. Turn on ghci time and size profiling:
Prelude> :set +s
If it were doing memoisation "between" calls, the subsequent calls would be faster and use no memory. Call fib2 20000 twice and see for yourself.
By comparison a more idiomatic version where you define the exact mathematical identity:
-- the infinite list of all fibs numbers.
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
memoFib n = fibs !! n
actually does use memoisation, explicitly as you can see. If you run memoFib 20000 twice, you'll see the time and space taken the first time; the second call is instantaneous and takes no memory. No magic and no implicit memoisation, as a comment might have hinted at.
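Assuming the two definitions above are loaded into ghci, the experiment looks roughly like this (output omitted; timings will vary by machine):

Prelude> :set +s
Prelude> memoFib 20000   -- first call: forces and retains the shared fibs list
Prelude> memoFib 20000   -- second call: just a list index, effectively free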
Now about your original question: optimizing and reasoning about performance in Haskell...
I wouldn't call myself an expert in Haskell; I have only been using it for 3 years, 2 of which at my workplace, but I did have to optimize it and learn to reason somewhat about its performance.
As mentioned in another post, laziness is your friend and can help you gain performance; however, YOU have to be in control of what is lazily evaluated and what is strictly evaluated.
Check this comparison of foldl vs foldr
foldl actually stores "how" to compute the value, i.e. it is lazy. In some cases you save time and space by being lazy, as with the "infinite" fibs: the infinite fibs list doesn't generate all of them, but knows how. When you know you will need the value, you might as well get it "strictly" speaking... That's where strictness annotations are useful, to give you back control.
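A sketch of that distinction (my own example; foldl' is the strict variant from Data.List):

import Data.List (foldl')

-- foldl "stores how": it builds (((0+1)+2)+3)+... as thunks before
-- anything is added, which can blow up on a long list.
sumWithFoldl :: [Int] -> Int
sumWithFoldl = foldl (+) 0

-- foldl' forces the accumulator at each step and runs in constant space.
sumWithFoldl' :: [Int] -> Int
sumWithFoldl' = foldl' (+) 0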
I recall reading many times that in Lisp you have to "minimize" consing.
Understanding what is strictly evaluated and how to force it is important, but so is understanding how much "trashing" of memory you do. Remember that Haskell is immutable, which means that updating a "variable" actually creates a copy with the modification. Prepending with (:) is vastly more efficient than appending with (++), because (:) does not copy memory, contrary to (++). Whenever a big atomic block is updated (even for a single char) the whole block needs to be copied to represent the "updated" version. The way you structure your data and update it can have a big impact on performance. The GHC profiler is your friend and will help you spot these. Sure, the garbage collector is fast, but not having it do anything at all is faster!
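A hedged sketch of the (:) vs (++) point (my own example): building a list by repeatedly appending copies the accumulated prefix on every step, while prepending and reversing once at the end does not.

-- Quadratic: each (++) copies everything accumulated so far.
enumAppend :: Int -> [Int]
enumAppend n = foldl (\acc x -> acc ++ [x]) [] [1 .. n]

-- Linear: (:) shares the existing list; reverse is a single final pass.
enumPrepend :: Int -> [Int]
enumPrepend n = reverse (foldl (\acc x -> x : acc) [] [1 .. n])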
Cheers
Aside from the memoization issue, fib1 also uses non-tail-call recursion. Tail-call recursion can be refactored automatically into a simple goto and perform very well, but the recursion in fib1 cannot be optimized in this way, because you need the stack frame from each instance of fib1 in order to calculate the result. If you rewrote fib1 to pass a running total as an argument, thus allowing a tail call instead of needing to keep the stack frame for the final addition, the performance would improve immensely. But not as much as the memoized example, of course :)
Since allocation is a major cost in any functional language, an important part of understanding performance is to understand when objects are allocated, how long they live, when they die, and when they are reclaimed. To get this information you need a heap profiler. It's an essential tool, and luckily GHC ships with a good one.
For more information, read Colin Runciman's papers.

Resources