Efficiency of purely functional programming - algorithm

Does anyone know what is the worst possible asymptotic slowdown that can happen when programming purely functionally as opposed to imperatively (i.e. allowing side-effects)?
Clarification from comment by itowlson: is there any problem for which the best known non-destructive algorithm is asymptotically worse than the best known destructive algorithm, and if so by how much?

According to Pippenger [1996], when comparing a Lisp system that is purely functional (and has strict evaluation semantics, not lazy) to one that can mutate data, an algorithm written for the impure Lisp that runs in O(n) can be translated to an algorithm in the pure Lisp that runs in O(n log n) time (based on work by Ben-Amram and Galil [1992] about simulating random access memory using only pointers). Pippenger also establishes that there are algorithms for which that is the best you can do; there are problems which are O(n) in the impure system which are Ω(n log n) in the pure system.
There are a few caveats to be made about this paper. The most significant is that it does not address lazy functional languages, such as Haskell. Bird, Jones and De Moor [1997] demonstrate that the problem constructed by Pippenger can be solved in a lazy functional language in O(n) time, but they do not establish (and as far as I know, no one has) whether or not a lazy functional language can solve all problems in the same asymptotic running time as a language with mutation.
The problem constructed by Pippenger requires Ω(n log n) is specifically constructed to achieve this result, and is not necessarily representative of practical, real-world problems. There are a few restrictions on the problem that are a bit unexpected, but necessary for the proof to work; in particular, the problem requires that results are computed on-line, without being able to access future input, and that the input consists of a sequence of atoms from an unbounded set of possible atoms, rather than a fixed size set. And the paper only establishes (lower bound) results for an impure algorithm of linear running time; for problems that require a greater running time, it is possible that the extra O(log n) factor seen in the linear problem may be able to be "absorbed" in the process of extra operations necessary for algorithms with greater running times. These clarifications and open questions are explored briefly by Ben-Amram [1996].
In practice, many algorithms can be implemented in a pure functional language at the same efficiency as in a language with mutable data structures. For a good reference on techniques to use for implementing purely functional data structures efficiently, see Chris Okasaki's "Purely Functional Data Structures" [Okasaki 1998] (which is an expanded version of his thesis [Okasaki 1996]).
Anyone who needs to implement algorithms on purely-functional data structures should read Okasaki. You can always get at worst an O(log n) slowdown per operation by simulating mutable memory with a balanced binary tree, but in many cases you can do considerably better than that, and Okasaki describes many useful techniques, from amortized techniques to real-time ones that do the amortized work incrementally. Purely functional data structures can be a bit difficult to work with and analyze, but they provide many benefits like referential transparency that are helpful in compiler optimization, in parallel and distributed computing, and in implementation of features like versioning, undo, and rollback.
Note also that all of this discusses only asymptotic running times. Many techniques for implementing purely functional data structures give you a certain amount of constant factor slowdown, due to extra bookkeeping necessary for them to work, and implementation details of the language in question. The benefits of purely functional data structures may outweigh these constant factor slowdowns, so you will generally need to make trade-offs based on the problem in question.
References
Ben-Amram, Amir and Galil, Zvi 1992. "On Pointers versus Addresses" Journal of the ACM, 39(3), pp. 617-648, July 1992
Ben-Amram, Amir 1996. "Notes on Pippenger's Comparison of Pure and Impure Lisp" Unpublished manuscript, DIKU, University of Copenhagen, Denmark
Bird, Richard, Jones, Geraint, and De Moor, Oege 1997. "More haste, less speed: lazy versus eager evaluation" Journal of Functional Programming 7, 5 pp. 541–547, September 1997
Okasaki, Chris 1996. "Purely Functional Data Structures" PhD Thesis, Carnegie Mellon University
Okasaki, Chris 1998. "Purely Functional Data Structures" Cambridge University Press, Cambridge, UK
Pippenger, Nicholas 1996. "Pure Versus Impure Lisp" ACM Symposium on Principles of Programming Languages, pages 104–109, January 1996

There are indeed several algorithms and data structures for which no asymptotically efficient purely functional solution (t.i. one implementable in pure lambda calculus) is known, even with laziness.
The aforementioned union-find
Hash tables
Arrays
Some graph algorithms
...
However, we assume that in "imperative" languages access to memory is O(1) whereas in theory that can't be so asymptotically (i.e. for unbounded problem sizes) and access to memory within a huge dataset is always O(log n), which can be emulated in a functional language.
Also, we must remember that actually all modern functional languages provide mutable data, and Haskell even provides it without sacrificing purity (the ST monad).

This article claims that the known purely functional implementations of the union-find algorithm all have worse asymptotic complexity than the one they publish, which has a purely functional interface but uses mutable data internally.
The fact that other answers claim that there can never be any difference and that for instance, the only "drawback" of purely functional code is that it can be parallelized gives you an idea of the informedness/objectivity of the functional programming community on these matters.
EDIT:
Comments below point out that a biased discussion of the pros and cons of pure functional programming may not come from the “functional programming community”. Good point. Perhaps the advocates I see are just, to cite a comment, “illiterate”.
For instance, I think that this blog post is written by someone who could be said to be representative of the functional programming community, and since it's a list of “points for lazy evaluation”, it would be a good place to mention any drawback that lazy and purely functional programming might have. A good place would have been in place of the following (technically true, but biased to the point of not being funny) dismissal:
If strict a function has O(f(n)) complexity in a strict language then it has complexity O(f(n)) in a lazy language as well. Why worry? :)

With a fixed upper bound on memory usage, there should be no difference.
Proof sketch:
Given a fixed upper bound on memory usage, one should be able to write a virtual machine which executes an imperative instruction set with the same asymptotic complexity as if you were actually executing on that machine. This is so because you can manage the mutable memory as a persistent data structure, giving O(log(n)) read and writes, but with a fixed upper bound on memory usage, you can have a fixed amount of memory, causing these to decay to O(1). Thus the functional implementation can be the imperative version running in the functional implementation of the VM, and so they should both have the same asymptotic complexity.

Related

Does time complexity differ of the implementation of algorithms in different programming languages?

Are not algorithms supposed to have the same time complexity across any programming language, then why do we consider such programming language differences in time complexity such as pass by value or pass by reference in the arguments when finding the total time complexity of the algorithm? Or am I wrong in saying we should not consider such programming language differences in implementation when finding the time complexity of an algorithm? I cannot think of any other instance as of now, but for example, if there was an algorithm in C++ that either passed its arguments by reference or by value, would there be difference in time complexity? If so, why? CLRS, the famous book about algorithms never discusses this point, so I am unsure if I should consider this point when finding time complexity.
I think what you are refer to is not language but compiler differencies... This is how I see it (however take in mind I am no expert in the matter)
Yes complexity of some source code ported to different langauges should be the same (if language capabilities allows it). The problem is compiler implementation and conversion to binary (or asm if you want). Each language usually adds its engine to the program that is handling the stuff like variables, memory allocation, heap/stack ,... Its similar to OS. This stuff is hidden but it have its own complexities lets call them hidden therms of complexity
For example using reference instead of direct operand in function have big impact on speed. Because while we are coding we often consider only complexity of the code and forgetting about the heap/stack. This however does not change complexity, its just affect n of the hidden therms of complexity (but heavily).
The same goes for compilers specially designed for recursion (functional programing) like LISP. In those the iteration is much slower or even not possible but recursion is way faster than on standard compilers... These I think change complexity but only of the hidden parts (related to language engine)
Some languages use internally bignums like Python. These affect complexity directly as arithmetics on CPU native datatypes is often considered O(1) however on Bignums its no longer true and can be O(1),O(log(n)),O(n),O(n.log),O(n^2) and worse depending on the operation. However while comparing to different language on the same small range the bignums are also considered O(1) however usually much slower than CPU native datatype.
Interpreted and virtual machine languages like Python,JAVA,LUA are adding much bigger overhead as they hidden therms usually not only contain langue engine but also the interpreter which must decode code, then interpret it or emulate or what ever which changes the hidden therms of complexity greatly. Not precompiled interpreters are even worse as you first need to parse text which is way much slower...
If I put all together its a matter of perspective if you consider hidden therms of complexity as constant time or complexity. In most cases the average speed difference between language/compilers/interpreters is constant (so you can consider it as constant time so no complexity change) but not always. To my knowledge the only exception is functional programming where the difference is almost never constant ...

Does the choice of compiler and language affect Time Complexity?

When people speak about the time complexity involved with solving mathematical problems using computer science strategies, does the choice of compiler, computer, or language have an affect on the times a single equation might produce? would running an algorithm in assembly on an x86 machine produce a faster result than creating the same formula in java on a x64 machine?
Edit: this question sort of branches off from the original, if the choice of compiler and language don't matter, then is an algorithm itself the single determining factor of its time complexity?
The whole point of asymptotic analysis is so that we don't have to consider the nuances introduced by varying compilers, architectures, implementations, etc. As long as the algorithm is implemented a way that follows the assumptions used in the time analysis, other factors can be safely ignored.
I'm primarily talking about non-trivial algorithms though. For example, I might have the code:
for(int i=0; i<N; i++){}
Standard analysis would say that this snippet of code has a runtime of O(N). Yet a good compiler would realize that this is merely a nop and optimize it away, leaving you with an O(1). However, compilers are not smart enough (yet) to make any asymptotically significant optimizations for anything non-trivial.
An example where certain analysis assumptions do not hold is when you don't have random access memory. So you have to be sure that your programming platform meets all these assumptions. Otherwise, a different analysis will have to performed.
Except if you use some exotic language such as brainfuck which does not feature built-ins for common operations (+,-,/,*,…), time complexity does not depend on the language.
But space complexity depends on the compiler in case you use tail-rec functions.

What makes people think that NNs have more computational power than existing models?

I've read in Wikipedia that neural-network functions defined on a field of arbitrary real/rational numbers (along with algorithmic schemas, and the speculative `transrecursive' models) have more computational power than the computers we use today. Of course it was a page of russian wikipedia (ru.wikipedia.org) and that may be not properly proven, but that's not the only source of such.. rumors
Now, the thing that I really do not understand is: How can a string-rewriting machine (NNs are exactly string-rewriting machines just as Turing machines are; only programming language is different) be more powerful than a universally capable U-machine?
Yes, the descriptive instrument is really different, but the fact is that any function of such class can be (easily or not) turned to be a legal Turing-machine. Am I wrong? Do I miss something important?
What is the cause of people saying that? I do know that the fenomenum of undecidability is widely accepted today (though not consistently proven according to what I've read), but I do not really see a smallest chance of NNs being able to solve that particular problem.
Add-in: Not consistently proven according to what I've read - I meant that you might want to take a look at A. Zenkin's (russian mathematician) papers after mid-90-s where he persuasively postulates the wrongness of G. Cantor's concepts, including transfinite sets, uncountable sets, diagonalization method (method used in the proof of undecidability by Turing) and maybe others. Even Goedel's incompletness theorems were proven in right way in only 21-st century.. That's all just to plug Zenkin's work to the post cause I don't know how widespread that knowledge is in CS community so forgive me if that did look stupid.
Thank you!
From what little research I've done, most of these claims of trans-Turing systems, or of the incorrectness of Cantor's diagonalization proof, etc. are, shall we say, "controversial" in legitimate mathematical circles. Words like "crank" get thrown around frequently.
Obviously, the strong Church-Turing thesis remains unproven, but as you pointed out there's really no good reason to believe that artificial neural networks constitute computational capabilities beyond general recursion/UTMs/lambda calculus/etc.
From a theoretical viewpoint, I think you're absolutely correct -- neural networks provide very little that's new or different.
From a practical viewpoint, neural networks are simply a way of casting solutions into a form where parallel execution is natural and easy, whereas Turing machines are sequential in nature, and executing their sequences in parallel is relatively difficult. In fact, most of what's been done in CPU development over the last few decades has basically been figuring out ways to execute code in parallel while maintaining the illusion that it's executing in sequence. A lot of the hardware in a modern CPU is devoted to maintaining that illusion, and the degree to which parallel execution has become explicit is mostly an admission that maintaining the illusion has become prohibitively expensive.
Anyone who "proves" that Cantor's diagonal method doesn't work proves only their own incompetence. Cf. Wilfred Hodges' An editor recalls some hopeless papers for a surprisingly sympathetic explanation of what kind of thing is going wrong with these attempts.
You can provide speculative descriptions of hyper-Turing neural nets, just as you can provide speculative descriptions of other kinds of hyper-Turing computers: there's nothing incoherent in the idea that hypercomputation is possible, and speculative descriptions of mechanical hypercomputers have been made where the hypercomputer is stipulated to have infinitely fine engravings that encode an oracle for the Halting machine: the existence of such a machine is consistent with Newtonian mechanics, though not quantum mechanics. Rather, the Church-Turing thesis says that they can't be constructed, and there are two reasons to believe the Church-Turing thesis is correct:
No such machines have ever been constructed; and
There's work been done connecting models of physics to models of computation, going back to work in the early 1970s by Robin Gandy, with recent work by people such as David Deutsch (e.g., Machines, Logic and Quantum Physics and John Tucker (e.g., Computations via experiments with kinematic systems) which argues that physics doesn't support hypercomputation.
The main point is that the truth of the Church-Turing thesis is an empirical fact, and not a mathematical fact. It's one that we can have confidence is true, but not certainty.
From a layman's perspective, I see that
NNs can be more effective at solving some types problems than a turing machine, but they are not compuationally more powerful.
Even if NNs were provably more powerful than TMs, execution on current hardware would render them less powerful, since current hardware is only an apporximation of a TM and can only execute problems computable by a bounded TM.
You may be interested in S. Franklin and M. Garzon, Neural computability. There is a preview on Google. It discusses the computational power of neural nets and also states that it is rumored that neural nets are strictly more powerful than Turing machines.

Do you use Big-O complexity evaluation in the 'real world'?

Recently in an interview I was asked several questions related to the Big-O of various algorithms that came up in the course of the technical questions. I don't think I did very well on this... In the ten years since I took programming courses where we were asked to calculate the Big-O of algorithms I have not have one discussion about the 'Big-O' of anything I have worked on or designed. I have been involved in many discussions with other team members and with the architects I have worked with about the complexity and speed of code, but I have never been part of a team that actually used Big-O calculations on a real project. The discussions are always "is there a better or more efficient way to do this given our understanding of out data?" Never "what is the complexity of this algorithm"?
I was wondering if people actually have discussions about the "Big-O" of their code in the real word?
It's not so much using it, it's more that you understand the implications.
There are programmers who do not realise the consequence of using an O(N^2) sorting algorithm.
I doubt many apart from those working in academia would use Big-O Complexity Analysis in anger day-to-day.
No needless n-squared
In my experience you don't have many discussions about it, because it doesn't need discussing. In practice, in my experience, all that ever happens is you discover something is slow and see that it's O(n^2) when in fact it could be O(n log n) or O(n), and then you go and change it. There's no discussion other than "that's n-squared, go fix it".
So yes, in my experience you do use it pretty commonly, but only in the basest sense of "decrease the order of the polynomial", and not in some highly tuned analysis of "yes, but if we switch to this crazy algorithm, we'll increase from logN down to the inverse of Ackerman's function" or some such nonsense. Anything less than a polynomial, and the theory goes out the window and you switch to profiling (e.g. even to decide between O(n) and O(n log n), measure real data).
Big-O notation is rather theoretical, while in practice, you are more interested in actual profiling results which give you a hard number as to how your performance is.
You might have two sorting algorithms which by the book have O(n^2) and O(nlogn) upper bounds, but profiling results might show that the more efficient one might have some overhead (which is not reflected in the theoretical bound you found for it) and for the specific problem set you are dealing with, you might choose the theoretically-less-efficient sorting algorithm.
Bottom line: in real life, profiling results usually take precedence over theoretical runtime bounds.
I do, all the time. When you have to deal with "large" numbers, typically in my case: users, rows in database, promotion codes, etc., you have to know and take into account the Big-O of your algorithms.
For example, an algorithm that generates random promotion codes for distribution could be used to generate billions of codes... Using a O(N^2) algorithm to generate unique codes means weeks of CPU time, whereas a O(N) means hours.
Another typical example is queries in code (bad!). People look up a table then perform a query for each row... this brings up the order to N^2. You can usually change the code to use SQL properly and get orders of N or NlogN.
So, in my experience, profiling is useful ONLY AFTER the correct class of algorithms is used. I use profiling to catch bad behaviours like understanding why a "small" number bound application under-performs.
The answer from my personal experience is - No. Probably the reason is that I use only simple, well understood algorithms and data structures. Their complexity analysis is already done and published, decades ago. Why we should avoid fancy algorithms is better explained by Rob Pike here. In short, a practitioner almost never have to invent new algorithms and as a consequence almost never have to use Big-O.
Well that doesn't mean that you should not be proficient in Big-O. A project might demand the design and analysis of an altogether new algorithm. For some real-world examples, please read the "war stories" in Skiena's The Algorithm Design Manual.
To the extent that I know that three nested for-loops are probably worse than one nested for-loop. In other words, I use it as a reference gut feeling.
I have never calculated an algorithm's Big-O outside of academia. If I have two ways to approach a certain problem, if my gut feeling says that one will have a lower Big-O than the other one, I'll probably instinctively take the smaller one, without further analysis.
On the other hand, if I know for certain the size of n that comes into my algorithm, and I know for certain it to be relatively small (say, under 100 elements), I might take the most legible one (I like to know what my code does even one month after it has been written). After all, the difference between 100^2 and 100^3 executions is hardly noticeable by the user with today's computers (until proven otherwise).
But, as others have pointed out, the profiler has the last and definite word: If the code I write executes slowly, I trust the profiler more than any theoretical rule, and fix accordingly.
I try to hold off optimizations until profiling data proves they are needed. Unless, of course, it is blatently obvious at design time that one algorithm will be more efficient than the other options (without adding too much complexity to the project).
Yes, I use it. And no, it's not often "discussed", just like we don't often discuss whether "orderCount" or "xyz" is a better variable name.
Usually, you don't sit down and analyze it, but you develop a gut feeling based on what you know, and can pretty much estimate the O-complexity on the fly in most cases.
I typically give it a moment's thought when I have to perform a lot of list operations. Am I doing any needless O(n^2) complexity stuff, that could have been done in linear time? How many passes am I making over the list? It's not something you need to make a formal analysis of, but without knowledge of big-O notation, it becomes a lot harder to do accurately.
If you want your software to perform acceptably on larger input sizes, then you need to consider the big-O complexity of your algorithms, formally or informally. Profiling is great for telling you how the program performs now, but if you're using a O(2^n) algorithm, your profiler will tell you that everything is just fine as long as your input size is tiny. And then your input size grows, and runtime explodes.
People often dismiss big-O notation as "theoretical", or "useless", or "less important than profiling". Which just indicates that they don't understand what big-O complexity is for. It solves a different problem than a profiler does. Both are essential in writing software with good performance. But profiling is ultimately a reactive tool. It tells you where your problem is, once the problem exists.
Big-O complexity proactively tells you which parts of your code are going to blow up if you run it on larger inputs. A profiler can not tell you that.
No. I don't use Big-O complexity in 'real world' situations.
My view on the whole issue is this - (maybe wrong.. but its just my take.)
The Big-O complexity stuff is ultimately to understand how efficient an algorithm is. If from experience or by other means, you understand the algorithms you are dealing with, and are able to use the right algo in the right place, thats all that matters.
If you know this Big-O stuff and are able to use it properly, well and good.
If you don't know to talk about algos and their efficiencies in the mathematical way - Big-O stuff, but you know what really matters - the best algo to use in a situation - thats perfectly fine.
If you don't know either, its bad.
Although you rarely need to do deep big-o analysis of a piece of code, it's important to know what it means and to be able to quickly evaluate the complexity of the code you're writing and the consequences it might have.
At development time you often feel like it's "good enough". Eh, no-one will ever put more than 100 elements in this array right ? Then, one day, someone will put 1000 elements in the array (trust users on that: if the code allows it, one of them will do it). And that n^2 algorithm that was good enough now is a big performance problem.
It's sometimes usefull the other way around: if you know that you functionaly have to make n^2 operations and the complexity of your algorithm happens to be n^3, there might be something you can do about it to make it n^2. Once it's n^2, you'll have to work on smaller optimizations.
In the contrary, if you just wrote a sorting algorithm and find out it has a linear complexity, you can be sure that there's a problem with it. (Of course, in real life, occasions were you have to write your own sorting algorithm are rare, but I once saw someone in an interview who was plainly satisfied with his one single for loop sorting algorithm).
Yes, for server-side code, one bottle-neck can mean you can't scale, because you get diminishing returns no matter how much hardware you throw at a problem.
That being said, there are often other reasons for scalability problems, such as blocking on file- and network-access, which are much slower than any internal computation you'll see, which is why profiling is more important than BigO.

Can any algorithmic puzzle be implemented in a purely functional way?

I've been contemplating programming language designs, and from the definition of Declarative Programming on Wikipedia:
This is in contrast from imperative programming, which requires a detailed description of the algorithm to be run.
and further down:
... Any style of programming that is not imperative. ...
It then goes on to express that functional languages, because they are not imperative, are declarative by their very nature.
However, this makes me wonder, are purely functional programming languages able to solve any algorithmic problem, or are the constraints based upon what functions are available in that language?
I'm mostly interested in general thoughts on the subject, although if specific examples can illustrate the point, I certainly welcome them.
According to the Church-Turing Thesis ,
the three computational processes (recursion, λ-calculus, and Turing machine) were shown to be equivalent"
where Turing machine can be read as "procedural" and lambda calculus as "functional".
Yes, Haskell, Erlang, etc. are Turing complete languages. In principle, you don't need mutable state to solve a problem, since you can always create a new object instead of mutating the old one. Of course, Brainfuck is also Turing complete. In other words, just because an algorithm can be expressed in a functional language doesn't mean it's not horribly awkward.
OK, so Church and Turing provied it is possible, but how do we actually do something?
Rewriting imperative code in pure functional style is an exercise I frequently assign to undergraduate students:
Each mutable variable becomes a function parameter
Loops are rewritten using recursion
Each goto is expressed as a function call with arguments
Sometimes what comes out is a mess, but often the results are surprisingly elegant. The only real trick is not to pass arguments that never change, but instead to let-bind them in the outer environment.
The big difference with functional style programming is that it avoids mutable state. Where imperative programming will typically update variables, functional programming will define new, read-only values.
The main place where this will hit performance is with algorithms that use updatable arrays. An imperative implementation can update an array element in O(1) time, while the best a purely functional style of implementation can achieve is O(log N) (using a sorted tree).
Note that functional languages generally have some way to use updateable arrays with O(1) access time (e.g., Haskell provides this with its state transformer monad). However, this is arguably an imperative programming method... nothing wrong with that; you want to use the best tools for a particular job, after all.
The functional style of O(log N) incremental array update is not all bad, though, as functional style algorithms seem to lend themselves well to parallellization.
Too long to be posted as a comment on #SteveB's answer.
Functional programming and imperative programming have equal capability: whatever one can do, the other can do. They are said to be Turing complete. The functions that a Turing machine can compute are exactly the ones that recursive function theory and λ-calculus express.
But the Church-Turing Thesis, as such, is irrelevant. It asserts that any computation can be carried out by a Turing machine. This relates an informal idea - computation - to a formal one - the Turing machine. Nobody has yet found anything we would recognise as computation that a Turing machine can't do. Will someone find such a thing in future? Who can tell.
Using state monads you can program in an imperative style in Haskell.
So the assertion that Haskell is declarative by its very nature needs to be taken with a grain of salt. On the positive side it then is equivalent to imperative programming languages, also in a practical sense which doesn't completely ignore efficiency.
While I completely agree with the answer that invokes Church-Turing thesis, this begs an interesting question actually. If I have a parallel computation problem (which is not algorithmic in a strict mathematical sense), such as multiple producer/consumer queue or some network protocol between several machines, can this be adequately modeled by Turing machine? It can be simulated, but if we simulate it, we lose the purpose why we have the parallelism in the problem (because then we can find simpler algorithm on the Turing machine). So what if we were not to lose parallelism inherent to the problem (and thus the reason why are we interested in it), we couldn't remove the notion of state?
I remember reading somewhere that there are problems which are provably harder when solved in a purely functional manner, but I can't seem to find the reference.
As noted above, the primary problem is array updates. While the compiler may use a mutable array under the hood in some conditions, it must be guaranteed that only one reference to the array exists in the entire program.
Not only is this a hard mathematical fact, it is also a problem in practice, if you don't use impure constructs.
On a more subjective note, stating that all Turing complete languages are equivalent is only true in a narrow mathematical sense. Paul Graham explores the issue in Beating the Averages in the section "The Blub Paradox."
Formal results such as Turing-completeness may be provably correct, but they are not necessarily useful. The travelling salesman problem may be NP-complete, and yet salesman travel all the time. It seems they don't feel the need to follow an "optimal" path, so the theorem is irrelevant.
NOTE: I am not trying to bash functional programming, since I really like it. It is just important to remember that it is not a panacea.

Resources