I've heard many anecdotes that a large problem with dynamically typed languages is that type checking is very slow. Why is it slow, though? What is the computer science rationale for why runtime-assigned types that may change cause large slowdowns in computational efficiency?
Dynamically typed languages must perform type-checking while code is running. Although they can sometimes be compiled, they need to cut many corners for reasonable performance. One big drawback of checking at runtime is that if a type fails to be valid, the interpreter can only throw exceptions or stop execution.
So they often try to coerce types to prevent exceptions, even when it may be undesirable. In Python, it isn't uncommon to discover that a simple division of whole integers means that my user output is suddenly full of '2.0' because I didn't explicitly cast back into int.
The computer science rationale is that type-checking is an extremely heavy algorithm. For every function you call, all the types involved must be validated (or coerced, which may be another function call), and type information must be updated afterwards. At runtime you can only afford to have a simple type system and very little optimization. A compiler by comparison can exploit even a weak type system to optimize your inefficient algorithms away.
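The '2.0' surprise can be reproduced in a couple of lines. This is a minimal sketch assuming Python 3 semantics; the variable names are only illustrative.

```python
# In Python 3, true division of two ints always produces a float,
# even when the mathematical result is a whole number.
quotient = 4 / 2

# Floor division keeps the result an int when both operands are ints.
floor_quotient = 4 // 2

print(quotient)        # 2.0
print(floor_quotient)  # 2
```

Using `//`, or wrapping the result in `int(...)`, is the explicit cast back that the paragraph above refers to.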
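To make the "validate, coerce, update" cycle concrete, here is a rough sketch of what a dynamic runtime conceptually does for a single addition. The `Tagged` class and `dynamic_add` function are hypothetical names invented for illustration; no real interpreter is implemented exactly this way.

```python
class Tagged:
    """A value boxed together with its type tag (hypothetical representation)."""

    def __init__(self, tag, payload):
        self.tag = tag
        self.payload = payload


def dynamic_add(a, b):
    """Add two tagged values, checking and coercing types at runtime."""
    # Step 1: validate the operand types on every single call.
    if a.tag == "int" and b.tag == "int":
        # Step 3: the result must carry its own freshly written tag.
        return Tagged("int", a.payload + b.payload)
    numeric = ("int", "float")
    if a.tag in numeric and b.tag in numeric:
        # Step 2: coerce the mixed operands to a common type.
        return Tagged("float", float(a.payload) + float(b.payload))
    # A failed check can only surface as a runtime error.
    raise TypeError(f"cannot add {a.tag} and {b.tag}")


total = dynamic_add(Tagged("int", 2), Tagged("float", 0.5))
print(total.tag, total.payload)  # float 2.5
```

A static compiler that already knows both operands are ints can emit a single machine add and skip all three steps.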
It's very common for statically-typed languages to be compiled, and dynamically-typed languages to be interpreted. This is because if a language is being designed for a compiler, it's a no-brainer to give the responsibility of type-checking to the compiler so that your code will be more optimal and won't need to manage typing at runtime. The less you need to carry at runtime, the faster code will execute.
Ultimately, this means languages designed for interpreters can't afford the level of typing a compiler can. In addition to having less freedom to exploit type information to optimize - strike 1 to performance - they must carry and modify type information at runtime - strike 2. The weaker type system also introduces many type safety bugs.
Naturally, there are also numerous cases where weak typing is desirable. Dynamic languages often take the role of scripting; they're quick to code, easy to interpret, and can be ported to new platforms faster than a compiler! This makes them invaluable for gluing very different systems together. One script can interact with the operating system and many programs on it to schedule a daily download of all the latest cat videos from your favourite website.
As always, I highly recommend that you have a dynamic language and a static language in your repertoire. It's invaluable to have access to the guarantees of strong typing and access to the ease of weak typing. Be a code omnivore :)
Is it possible to design something like Ruby or Clojure without the significant performance loss in many situations compared with C/Java? Does hardware design play a role?
Edit: By significant I mean orders of magnitude, not just ten percent.
Edit: I suspect that delnan is correct that I meant dynamic languages, so I changed the title.
Performance depends on many things. Of course the semantics of the language have to be preserved even if we are compiling it - you can't remove dynamic dispatch from Ruby; it would speed things up dramatically but it would totally break 95% of all the Ruby code in the world. But still, much of the performance depends on how smart the implementation is.
I assume, by "high-level", you mean "dynamic"? Haskell and OCaml are extremely high-level, yet are compiled natively and can outperform C# or Java, and even C and C++ in some corner cases - especially if parallelism comes into play. And they certainly weren't designed with performance as the #1 goal. But compiler writers, especially those focused on functional languages, are very clever folk. If you or I started a high-level language, even if we used e.g. LLVM as backend for native compilation, we wouldn't get anywhere near this performance.
Making dynamic languages run fast is harder - they delay many decisions (types, members of a class/an object, ...) to runtime instead of compile time, and while static code analysis can sometimes prove it's not possible in lines n and m, you still have to carry an advanced runtime around and do quite a few things a static language's compiler can do at compile time. Even dynamic dispatch can be optimized with a smarter VM (inline caching, anyone?), but it's a lot of work. More than a small newfangled language could do, that is.
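For readers who haven't met inline caching, the idea can be sketched in a few lines. This is a toy monomorphic cache, not how any particular VM implements it; `CallSite`, `Circle` and `Square` are hypothetical names made up for the example.

```python
class CallSite:
    """One call site that remembers the last receiver class it saw."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_class = None
        self.cached_method = None
        self.hits = 0
        self.misses = 0

    def invoke(self, receiver, *args):
        cls = type(receiver)
        if cls is self.cached_class:
            # Fast path: same class as last time, skip the lookup.
            self.hits += 1
        else:
            # Slow path: full lookup by name, then refill the cache.
            self.misses += 1
            self.cached_method = getattr(cls, self.method_name)
            self.cached_class = cls
        return self.cached_method(receiver, *args)


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3 * self.radius * self.radius  # crude pi, keeps results integral


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


site = CallSite("area")
areas = [site.invoke(shape) for shape in (Circle(1), Circle(2), Square(3))]
print(areas, site.hits, site.misses)  # [3, 12, 9] 1 2
```

As long as a call site keeps seeing the same class, only the first call pays for the name lookup. A real VM does the class check in generated machine code, which is where the "lot of work" comes in.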
Also see Steve Yegge's Dynamic Languages Strike Back.
And of course, what is a significant performance loss? 100 times slower than C reads like a lot, but as we all know, 80% of execution time is spent in 20% of the code, so 80% of the code won't have a notable impact on the perceived performance of the whole program. For the remaining 20%, you can always rewrite it in C or C++ and call it from the dynamic language. For many applications, this suffices (for some, you don't even need to optimize). For the rest... well, if performance is that critical, you should probably write it in a language designed for performance.
Don't confuse the language design with the platform that it runs on.
For instance, Java is a high-level language. It runs on the JVM (as does Clojure - identified above, and JRuby - a Java version of Ruby). The JVM will perform byte-code analysis and optimise how the code runs (making use of escape analysis, just-in-time compilation etc.). So the platform has an effect on the performance that is largely independent of the language itself (see here for more info on Java performance and comparisons to C/C++)
Loss compared to what? If you need a garbage collector or closures then you need them, and you're going to pay the price regardless. If a language makes them easy for you to get at, that doesn't mean you have to use them when you don't need them.
If a language is interpreted instead of compiled, that's going to introduce an order of magnitude slowdown. But such a language may have compensating advantages, like ease of use, platform independence, and not having to compile. And, the programs you write in them may not run long enough for speed to be an issue.
There may be language implementations that introduce slowness for no good reason, but those don't have to be used.
You might want to look at what the DARPA HPCS initiative has come up with. There were 3 programming languages proposed: Sun's Fortress, IBM's X10 and Cray's Chapel. The latter two are still under development. Whether any of these meet your definition of high-level I don't know.
And yes, hardware design certainly does play a part. All 3 of these languages are targeted at supercomputers with very many processors and exhibit features appropriate to that domain.
It's certainly possible. For example, Objective-C is a dynamically-typed language that has performance comparable to C++ (although a wee bit slower, generally speaking, but still roughly equivalent).
Isn't every language compiled into low-level computer language?
If so, shouldn't all languages have the same performance?
Just wondering...
As pointed out by others, not every language is translated into machine language; some are translated into some form (bytecode, reverse Polish, AST) that is interpreted.
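As a small illustration of "translated into some form that is interpreted", here is a sketch of an evaluator for reverse Polish notation. The function name `eval_rpn` and the four-operator instruction set are assumptions made for the example.

```python
import operator

# The "instruction set" of this toy interpreter (an assumption for the example).
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(tokens):
    """Evaluate a list of reverse Polish tokens using an operand stack."""
    stack = []
    for token in tokens:
        if token in OPERATORS:
            # Every operator is decoded and dispatched again on every run.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    return stack.pop()


print(eval_rpn("3 4 + 2 *".split()))  # 14.0
```

The dictionary lookup and stack traffic around each arithmetic operation is the interpretive overhead that machine code produced ahead of time does not pay.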
But even among languages that are translated to machine code,
Some translators are better than others
Some language features are easier to translate to high-performance code than others
An example of a translator that is better than some others is the GCC C compiler. It has had many years' work invested in producing good code, and its translations outperform those of the simpler compilers lcc and tcc, for example.
An example of a feature that is hard to translate to high-performance code is C's ability to do pointer arithmetic and to dereference pointers: when a program stores through a pointer, it is very difficult for the compiler to know what memory locations are affected. Similarly, when an unknown function is called, the compiler must make very pessimistic assumptions about what might happen to the contents of objects allocated on the heap. In a language like Java, the compiler can do a better job translating because the type system enforces greater separation between pointers of different types. In a language like ML or Haskell, the compiler can do better still, because in these languages, most data allocated in memory cannot be changed by a function call. But of course object-oriented languages and functional languages present their own translation challenges.
Finally, translation of a Turing-complete language is itself a hard problem: in general, finding the best translation of a program is an NP-hard problem, which means that the only solutions known potentially take time exponential in the size of the program. This would be unacceptable in a compiler (can't wait forever to compile a mere few thousand lines), and so compilers use heuristics. There is always room for improvement in these heuristics.
It is easier and more efficient to map some languages into machine language than others. There is no easy analogy that I can think of for this. The closest I can come to is translating Italian to Spanish vs. translating a Khoisan language into Hawaiian.
Another analogy is saying "Well, the laws of physics are what govern how every animal moves, so why do some animals move so much faster than others? Shouldn't they all just move at the same speed?".
No, some languages are simply interpreted. They never actually get turned into machine code. So those languages will generally run slower than low-level languages like C.
Even for the languages which are compiled into machine code, sometimes what comes out of the compiler is not the most efficient possible way to write that given program. So it's often possible to write programs in, say, assembly language that run faster than their C equivalents, and C programs that run faster than their JIT-compiled Java equivalents, etc. (Modern compilers are pretty good, though, so that's not so much of an issue these days)
Yes, all programs get eventually translated into machine code. BUT:
Some programs get translated during compilation, while others are translated on-the-fly by an interpreter (e.g. Perl) or a virtual machine (e.g. original Java)
Obviously, the latter is MUCH slower as you spend time on translation during running.
Different languages can be translated into DIFFERENT machine code. Even when the same programming task is done. So that machine code might be faster or slower depending on the language.
You should understand the difference between compiling (which is translating) and interpreting (which is simulating). You should also understand the concept of a universal basis for computation.
A language or instruction set is universal if it can be used to write an interpreter (or simulator) for any other language or instruction set. Most computers are electronic, but they can be made in many other ways, such as by fluidics, or mechanical parts, or even by people following directions. A good teaching exercise is to write a small program in BASIC and then have a classroom of students "execute" the program by following its steps. Since BASIC is universal (to a first approximation) you can use it to write a program that simulates the instruction set for any other computer.
So you could take a program in your favorite language, compile (translate) it into machine language for your favorite machine, have an interpreter for that machine written in BASIC, and then (in principle) have a class full of students "execute" it. In this way, it is first being reduced to an instruction set for a "fast" machine, and then being executed by a very very very slow "computer". It will still get the same answer, only about a trillion times slower.
Point being, the concept of universality makes all computers equivalent to each other, even though some are very fast and others are very slow.
No, some languages are run by a 'software interpreter' as byte code.
Also, it depends on what the language does in the background as well, so 2 identically functioning programs in different languages may have different mechanics behind the scenes and hence be actually running different instructions resulting in differing performance.
Are dynamic languages slower than static languages because, for example, the run-time has to check the type consistently?
No.
Dynamic languages are not slower than static languages. In fact, it is impossible for any language, dynamic or not, to be slower than another language (or faster, for that matter), simply because a language is just a bunch of abstract mathematical rules. You cannot execute a bunch of abstract mathematical rules, therefore they cannot ever be slow(er) or fast(er).
The statement that "dynamic languages are slower than static languages" is not only wrong, it doesn't even make sense. If English were a typed language, that statement wouldn't even typecheck.
In order for a language to even be able to run, it has to be implemented first. Now you can measure performance, but you aren't measuring the performance of the language, you are measuring the performance of the execution engine. Most languages have many different execution engines, with very different performance characteristics. For C, for example, the difference between the fastest and slowest implementations is a factor of 100000 or so!
Also, you cannot really measure the performance of an execution engine, either: you have to write some code to run on that execution engine first. But now you aren't measuring the performance of the execution engine, you are measuring the performance of the benchmark code. Which has very little to do with the performance of the execution engine and certainly nothing to do with the performance of the language.
In general, running well-designed code on well-designed high-performance execution engines will yield about the same performance, independent of whether the language is static or dynamic, procedural, object-oriented or functional, imperative or declarative, lazy or strict, pure or impure.
In fact, I would propose that the performance of a system is solely dependent on the amount of money that was spent making it fast, and completely independent of any particular typing discipline, programming paradigm or language.
Take for example Smalltalk, Lisp, Java and C++. All of them are, or have at one point been, the language of choice for high-performance code. All of them have huge amounts of engineering and research man-centuries expended on them to make them fast. All of them have highly-tuned proprietary commercial high-performance execution engines available. Given roughly the same problem, implemented by roughly comparable developers, they all perform roughly the same.
Two of those languages are dynamic, two are static. Java is interesting, because although it is a static language, most modern high-performance implementations are actually dynamic implementations. (In fact, several modern high-performance JVMs are actually either Smalltalk VMs in disguise, derived from Smalltalk VMs or written by Smalltalk VM companies.) Lisp is also interesting, because although it is a dynamic language, there are some (although not many) static high-performance implementations.
And we haven't even begun talking about the rest of the execution environment: modern mainstream operating systems, mainstream CPUs and mainstream hardware architectures are heavily biased towards static languages, to the point of being actively hostile to dynamic languages. Given that modern mainstream execution environments are pretty much a worst-case scenario for dynamic languages, it is quite astonishing how well they actually perform, and one can only imagine what the performance in a less hostile environment would look like.
All other things being equal, usually, yes.
First you must clarify whether you are comparing
dynamic typing vs. static typing, or
statically compiled languages vs. interpreted languages vs. bytecode JIT.
Usually we mean
dynamic language = dynamic typing + interpreted at run-time, and
static language = static typing + statically compiled,
but that is not necessarily the case.
Type information can help the VM dispatch the message faster than without type information, but the difference tends to disappear with optimizations in the VM which detect monomorphic call sites. See the paragraph "performance consideration" in this post about dynamic invocation.
The debate between compiled vs. interpreted vs. bytecode JIT is still open. Some argue that bytecode JIT results in faster execution than regular compilation because the compilation is more accurate due to the presence of more information collected at run-time. Read the Wikipedia entry about JIT for more insight. Interpreted languages are indeed slower than either of the two forms of compilation.
I will not argue further and start a heated discussion; I just wanted to point out that the gap between the two tends to get smaller and smaller. Chances are that any performance problem you face will be related not to the language and VM but to your design.
EDIT
If you want numbers, I suggest you look at The Computer Language Benchmarks. I found it insightful.
At the instruction level current implementations of dynamically typed languages are typically slower than current implementations of statically typed languages.
However, that does not necessarily mean that the implementation of a program will be slower in dynamic languages - there are lots of documented cases of the same program being implemented in both a static and a dynamic language where the dynamic implementation has turned out to be faster. For example, this study (PDF) gave the same problem to programmers in a variety of languages and compared the results. The mean runtimes for the Python and Perl implementations were lower than the mean runtimes for the C++ and Java implementations.
There are several reasons for this:
1) the code can be implemented more quickly in a dynamic language, leaving more time for optimisation.
2) high level data structures (maps, sets etc) are a core part of most dynamic languages and so are more likely to be used. Since they are core to the language they tend to be highly optimised.
3) programmer skill is more important than language speed - an inexperienced programmer can write slow code in any language. In the study mentioned above there were several orders of magnitude difference between the fastest and slowest implementation in each of the languages.
4) in many problem domains execution speed is dominated by I/O or some other factor external to the language.
5) Algorithm choice can dwarf language choice. In the book "More Programming Pearls" Jon Bentley implemented two algorithms for a problem - one was O(N^3) and implemented in optimised Fortran on a Cray-1. The other was O(N) and implemented in BASIC on a TRS-80 home micro (this was in the 1980s). The TRS-80 outperformed the Cray-1 for N > 5000.
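Point 5 is easy to demonstrate. The sketch below assumes the problem is the maximum-sum contiguous subarray, which is an illustrative choice and not something stated above; the function names are made up. Both functions return the same answer, but one does O(N^3) work and the other O(N).

```python
def max_subarray_cubic(values):
    """Try every (start, end) pair and re-sum the slice each time: O(N^3)."""
    best = 0  # the empty subarray counts, with sum 0
    n = len(values)
    for start in range(n):
        for end in range(start, n):
            total = 0
            for k in range(start, end + 1):
                total += values[k]
            best = max(best, total)
    return best


def max_subarray_linear(values):
    """Single pass keeping the best sum that ends at the current item: O(N)."""
    best = 0
    ending_here = 0
    for value in values:
        ending_here = max(0, ending_here + value)
        best = max(best, ending_here)
    return best


data = [31, -41, 59, 26, -53, 58, 97, -93, -23, 84]
print(max_subarray_cubic(data), max_subarray_linear(data))  # 187 187
```

At N = 5000 the cubic version performs on the order of 10^10 additions while the linear one performs about 5000, a gap that no difference between machines or languages is likely to make up.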
Dynamic language run-times only need to check the type occasionally.
But it is still, typically, slower.
There are people making good claims that such performance gaps are attackable, however; e.g. http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html
The most important factor to consider is the method dispatch algorithm. With static languages each method is typically allocated an index. The names we see in source are not actually used at runtime and are in source for readability purposes. Naturally languages like Java keep them and make them available in reflection, but they are not used when one invokes a method. I will leave reflection and binding out of this discussion. This means when a method is invoked the runtime simply uses the offset to look up a table and call. A dynamic language on the other hand uses the name of the function to look up a map and then calls said function. A hashmap is always going to be slower than using an index lookup into an array.
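The two dispatch strategies can be sketched side by side. All names here (`VTABLE`, `AREA_SLOT`, `METHODS_BY_NAME` and so on) are hypothetical, and because Python itself is interpreted, timing this sketch would not reflect the real cost difference; it only shows the shape of each lookup.

```python
def rect_area(width, height):
    return width * height


def rect_perimeter(width, height):
    return 2 * (width + height)


# Static style: the compiler has already turned each method name into a slot.
VTABLE = [rect_area, rect_perimeter]
AREA_SLOT = 0
PERIMETER_SLOT = 1

# Dynamic style: the method name survives to runtime and is hashed on each call.
METHODS_BY_NAME = {"area": rect_area, "perimeter": rect_perimeter}


def call_by_index(slot, *args):
    # One array index, no hashing and no string comparison.
    return VTABLE[slot](*args)


def call_by_name(name, *args):
    # Hash the name, probe the table, compare keys, then call.
    return METHODS_BY_NAME[name](*args)


print(call_by_index(AREA_SLOT, 3, 4), call_by_name("area", 3, 4))  # 12 12
```

Both calls reach the same function. The difference is only in how much work happens before the call.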
No, dynamic languages are not necessarily slower than static languages.
The PyPy and Psyco projects have been making a lot of progress on building JIT compilers for Python that have data-driven compilation; in other words, they will automatically compile versions of frequently called functions specialised for particular common values of arguments. Not just by type, like a C++ template, but actual argument values; say an argument is usually zero, or None, then there will be a specifically compiled version of the function for that value.
This can lead to compiled code that is faster than you'd get out of a C++ compiler, and since it is doing this at runtime, it can discover optimisations specifically for the actual input data for this particular instance of the program.
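The idea of specialising on argument values can be imitated by hand. The sketch below is only an analogy and says nothing about how PyPy or Psyco work internally; `power`, `specialise_on_exponent` and `SpecialisingCaller` are hypothetical names.

```python
def power(base, exponent):
    """Generic version: loops for any non-negative integer exponent."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result


def specialise_on_exponent(exponent):
    """Build a version of power() with the exponent value baked in."""
    if exponent == 0:
        return lambda base: 1
    if exponent == 2:
        return lambda base: base * base
    return lambda base: power(base, exponent)


class SpecialisingCaller:
    """Counts calls per exponent and switches to a specialised version."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = {}
        self.specialised = {}

    def call(self, base, exponent):
        fast = self.specialised.get(exponent)
        if fast is not None:
            return fast(base)
        # Still "cold": run the generic code and record the observed value.
        self.counts[exponent] = self.counts.get(exponent, 0) + 1
        if self.counts[exponent] >= self.threshold:
            self.specialised[exponent] = specialise_on_exponent(exponent)
        return power(base, exponent)


caller = SpecialisingCaller()
squares = [caller.call(n, 2) for n in range(5)]
print(squares, sorted(caller.specialised))  # [0, 1, 4, 9, 16] [2]
```

An ahead-of-time compiler has to emit the generic loop because it cannot know that the exponent is nearly always 2 in this particular run.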
It is reasonable to assume so, as more things need to be computed at runtime.
Actually, it's difficult to say because many of the benchmarks used are not that representative. And with more sophisticated execution environments, like the HotSpot JVM, differences are getting less and less relevant. Take a look at the following article:
Java theory and practice: Dynamic compilation and performance measurement
Why are functional languages always tailing behind C in benchmarks? If you have a statically typed functional language, it seems to me it could be compiled to the same code as C, or to even more optimized code since more semantics are available to the compiler. Why does it seem like all functional languages are slower than C, and why do they always need garbage collection and excessive use of the heap?
Does anyone know of a functional language appropriate for embedded / real-time applications, where memory allocation is kept to a minimum and the produced machine code is lean and fast?
Are functional languages inherently slow?
In some sense, yes. They require infrastructure that inevitably adds overheads over what can theoretically be attained using assembler by hand. In particular, first-class lexical closures only work well with garbage collection because they allow values to be carried out of scope.
Why are functional languages always tailing behind C in benchmarks?
Firstly, beware of selection bias. C acts as a lowest common denominator in benchmark suites, limiting what can be accomplished. If you have a benchmark comparing C with a functional language then it is almost certainly an extremely simple program. Arguably so simple that it is of little practical relevance today. It is not practically feasible to solve more complicated problems using C for a mere benchmark.
The most obvious example of this is parallelism. Today, we all have multicores. Even my phone is a multicore. Multicore parallelism is notoriously difficult in C but can be easy in functional languages (I like F#). Other examples include anything that benefits from persistent data structures, e.g. undo buffers are trivial with purely functional data structures but can be a huge amount of work in imperative languages like C.
Why does it seem like all functional languages are slower than C, and why do they always need garbage collection and excessive use of the heap?
Functional languages will seem slower because you'll only ever see benchmarks comparing code that is easy enough to write well in C and you'll never see benchmarks comparing meatier tasks where functional languages start to excel.
However, you've correctly identified what is probably the single biggest bottleneck in functional languages today: their excessive allocation rates. Nice work!
The reasons why functional languages allocate so heavily can be split into historical and inherent reasons.
Historically, Lisp implementations have been doing a lot of boxing for 50 years now. This characteristic spread to many other languages which use Lisp-like intermediate representations. Over the years, language implementers have continually resorted to boxing as a quick fix for complications in language implementation. In object oriented languages, the default has been to always heap allocate every object even when it can obviously be stack allocated. The burden of efficiency was then pushed onto the garbage collector and a huge amount of effort has been put into building garbage collectors that can attain performance close to that of stack allocation, typically by using a bump-allocating nursery generation. I think that a lot more effort should be put into researching functional language designs that minimize boxing and garbage collector designs that are optimized for different requirements.
Generational garbage collectors are great for languages that heap allocate a lot because they can be almost as fast as stack allocation. But they add substantial overheads elsewhere. Today's programs are increasingly using data structures like queues (e.g. for concurrent programming) and these give pathological behaviour for generational garbage collectors. If the items in the queue outlive the first generation then they all get marked, then they all get copied ("evacuated"), then all of the references to their old locations get updated and then they become eligible for collection. This is about 3× slower than it needs to be (e.g. compared to C). Mark region collectors like Beltway (2002) and Immix (2008) have the potential to solve this problem because the nursery is replaced with a region that can either be collected as if it were a nursery or, if it contains mostly reachable values, it can be replaced with another region and left to age until it contains mostly unreachable values.
Despite the pre-existence of C++, the creators of Java made the mistake of adopting type erasure for generics, leading to unnecessary boxing. For example, I benchmarked a simple hash table running 17× faster on .NET than the JVM partly because .NET did not make this mistake (it uses reified generics) and also because .NET has value types. I actually blame Lisp for making Java slow.
All modern functional language implementations continue to box excessively. JVM-based languages like Clojure and Scala have little choice because the VM they target cannot even express value types. OCaml sheds type information early in its compilation process and resorts to tagged integers and boxing at run-time to handle polymorphism. Consequently, OCaml will often box individual floating point numbers and always boxes tuples. For example, a triple of bytes in OCaml is represented by a pointer (with an implicit 1-bit tag embedded in it that gets checked repeatedly at run-time) to a heap-allocated block with a 64 bit header and 192 bit body containing three tagged 63-bit integers (where the 3 tags are, again, repeatedly examined at run time!). This is clearly insane.
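The tagged-integer scheme and the size of that boxed triple can be worked through in a short sketch. It assumes a 64-bit word with the low bit set to 1 for immediate integers, as described above; the helper names are hypothetical.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def tag_int(n):
    """Encode an integer as a machine word: shift left, set the low tag bit."""
    return ((n << 1) | 1) & WORD_MASK


def is_immediate(word):
    """Low bit 1 means "this word is an integer, not a pointer"."""
    return word & 1 == 1


def untag_int(word):
    """Decode: reinterpret the word as signed, then shift the tag away."""
    if word >= 1 << (WORD_BITS - 1):
        word -= 1 << WORD_BITS
    return word >> 1


# Space used by a boxed triple of bytes: one header word plus three fields.
HEADER_BITS = 64
FIELD_BITS = 64
triple_block_bits = HEADER_BITS + 3 * FIELD_BITS
payload_bits = 3 * 8

print(tag_int(5), untag_int(tag_int(-3)))  # 11 -3
print(triple_block_bits, payload_bits)     # 256 24
```

That is 256 bits of heap block, plus the pointer to it, to hold 24 bits of actual data.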
Some work has been done on unboxing optimizations in functional languages but it never really gained traction. For example, the MLton compiler for Standard ML was a whole-program optimizing compiler that did sophisticated unboxing optimizations. Sadly, it was before its time and the "long" compilation times (probably under 1s on a modern machine!) deterred people from using it.
The only major platform to have broken this trend is .NET but, amazingly, it appears to have been an accident. Despite having a Dictionary implementation very heavily optimized for keys and values that are of value types (because they are unboxed) Microsoft employees like Eric Lippert continue to claim that the important thing about value types is their pass-by-value semantics and not the performance characteristics that stem from their unboxed internal representation. Eric seems to have been proven wrong: more .NET developers seem to care more about unboxing than pass-by-value. Indeed, most structs are immutable and, therefore, referentially transparent so there is no semantic difference between pass-by-value and pass-by-reference. Performance is visible and structs can offer massive performance improvements. The performance of structs even saved Stack Overflow and structs are used to avoid GC latency in commercial software like Rapid Addition's!
The other reason for heavy allocation by functional languages is inherent. Imperative data structures like hash tables use huge monolithic arrays internally. If these were persistent then the huge internal arrays would need to be copied every time an update was made. So purely functional data structures like balanced binary trees are fragmented into many little heap-allocated blocks in order to facilitate reuse from one version of the collection to the next.
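The "many little heap-allocated blocks" and the reuse between versions can be seen in a small sketch. It uses an unbalanced binary search tree for brevity (the text above mentions balanced trees, which add rebalancing on top of the same idea); `Node` and `insert` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """An immutable tree node; each one is a small separate allocation."""
    key: int
    left: Optional["Node"]
    right: Optional["Node"]


def insert(node, key):
    """Return a new tree containing key, copying only the path to it."""
    if node is None:
        return Node(key, None, None)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))
    return node  # key already present, so the old tree is reused as-is


version1 = None
for k in (5, 3, 8):
    version1 = insert(version1, k)

version2 = insert(version1, 9)

# The old version is untouched, and the untouched subtree is shared.
print(version1.right.right is None)    # True
print(version2.right.right.key)        # 9
print(version2.left is version1.left)  # True
```

Only the nodes on the path from the root to the new key are reallocated. A persistent array-backed hash table would have to copy its whole backing array to get the same guarantee.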
Clojure uses a neat trick to alleviate this problem when collections like dictionaries are only written to during initialization and are then read from a lot. In this case, the initialization can use mutation to build the structure "behind the scenes". However, this does not help with incremental updates and the resulting collections are still substantially slower to read than their imperative equivalents. On the up-side, purely functional data structures offer persistence whereas imperative ones do not. However, few practical applications benefit from persistence in practice so this is often not advantageous. Hence the desire for impure functional languages where you can drop to imperative style effortlessly and reap the benefits.
Does anyone know of a functional language appropriate for embedded / real-time applications, where memory allocation is kept to a minimum and the produced machine code is lean and fast?
Take a look at Erlang and OCaml if you haven't already. Both are reasonable for memory-constrained systems but neither generates particularly great machine code.
Nothing is inherently anything. Here is an example where interpreted OCaml runs faster than equivalent C code, because the OCaml optimizer has different information available to it, due to differences in the language. Of course, it would be foolish to make a general claim that OCaml is categorically faster than C. The point is, it depends upon what you're doing, and how you do it.
That said, OCaml is an example of a (mostly) functional language which is actually designed for performance, in contrast to purity.
Functional languages require the elimination of mutable state that is visible at the level of the language abstraction. Therefore, data that would be mutated in place by an imperative language needs to be copied instead, with the mutation taking place on the copy. For a simple example, see a quick sort in Haskell vs. C.
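The contrast can be sketched in a single language. Below, one quicksort builds new lists at every level in the copying style, and the other mutates its input in place the way a typical C version would; both function names are made up for the example.

```python
def quicksort_copying(items):
    """Functional style: never mutates its input, allocates new lists."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort_copying(smaller) + [pivot] + quicksort_copying(larger)


def quicksort_in_place(items, low=0, high=None):
    """Imperative style: swaps elements inside the one existing list."""
    if high is None:
        high = len(items) - 1
    if low >= high:
        return
    pivot = items[high]
    boundary = low
    for i in range(low, high):
        if items[i] < pivot:
            items[i], items[boundary] = items[boundary], items[i]
            boundary += 1
    items[boundary], items[high] = items[high], items[boundary]
    quicksort_in_place(items, low, boundary - 1)
    quicksort_in_place(items, boundary + 1, high)


original = [5, 2, 9, 1, 5, 6]
sorted_copy = quicksort_copying(original)
mutable = list(original)
quicksort_in_place(mutable)
print(sorted_copy, mutable, original)
```

The copying version leaves `original` untouched but creates several temporary lists per recursion level, while the in-place version allocates nothing beyond its call stack.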
Furthermore, garbage collection is required because free() is not a pure function, as it has side effects. Therefore, the only way to free memory that does not involve side effects at the level of the language abstraction is with garbage collection.
Of course, in principle, a sufficiently smart compiler could optimize out much of this copying. This is already done to some degree, but making the compiler sufficiently smart to understand the semantics of your code at that level is just plain hard.
The short answer: because C is fast. As in, blazingly ridiculously crazy fast. A language simply doesn't have to be 'slow' to get its rear handed to it by C.
The reason why C is fast is that it was created by really great coders, and gcc has been optimized over a couple more decades, and by dozens more brilliant coders, than 99% of the languages out there.
In short, you're not going to beat C except for specialized tasks that require very specific functional programming constructs.
The control flow of procedural languages much better matches the actual processing patterns of modern computers.
C maps very closely onto the assembly code its compilation produces, hence the nickname "cross-platform assembly". Computer manufacturers have spent a few decades making assembly code run as fast as possible, so C inherits all of this raw speed.
In comparison, the side-effect-free, inherently parallel style of functional languages does not map at all well onto a single processor. The arbitrary order in which functions can be invoked needs to be serialised down to the CPU bottleneck: without extremely clever compilation, you're going to be context switching all the time, none of the pre-fetching will work because you're constantly jumping all over the place, ... Basically, all the optimisation work that computer manufacturers have done for nice, predictable procedural languages is pretty much useless.
However! With the move towards lots of less powerful cores (rather than one or two turbo-charged cores), functional languages should begin to close the gap, as they naturally scale horizontally.
C is fast because it's basically a set of macros for assembler :) There is no "behind the scenes" when you are writing a program in C. You allocate memory when you decide it's time to do that and you free it in the same fashion. This is a huge advantage when you are writing a real-time application, where predictability is important (more than anything else, actually).
Also, C compilers are generally extremely fast because the language itself is simple. It does very little type checking :) This also means that it is easier to make hard-to-find errors.
An advantage of this minimal type checking is that a function can be exported simply under its own name, for example, which makes C code easy to link with code written in other languages.
Well Haskell is only 1.8 times slower than GCC's C++, which is faster than GCC's C implementation for typical benchmark tasks.
That makes Haskell very fast, even faster than C#(Mono that is).
relative speed   Language
1.0              C++ GNU g++
1.1              C GNU gcc
1.2              ATS
1.5              Java 6 -server
1.5              Clean
1.6              Pascal Free Pascal
1.6              Fortran Intel
1.8              Haskell GHC
2.0              C# Mono
2.1              Scala
2.2              Ada 2005 GNAT
2.4              Lisp SBCL
3.9              Lua LuaJIT
source
For the record I use Lua for Games on the iPhone, thus you could easily use Haskell or Lisp if you prefer, since they are faster.
As for now, functional languages aren't used heavily for industry projects, so not enough serious work goes into optimizers. Also, optimizing imperative code for an imperative target is probably way easier.
Functional languages have one feat that will let them outdo imperative languages really soon now: trivial parallelization.
Trivial not in the sense that it is easy, but that it can be built into the language environment, without the developer needing to think about it.
The cost of robust multithreading in a thread-agnostic language like C is prohibitive for many projects.
I disagree with tuinstoel. The important question is whether the functional language provides a faster development time and results in faster code when it is used for what functional languages were meant to be used for. See the efficiency issues section on Wikipedia for a glimpse of what I mean.
One more reason for bigger executable size could be lazy evaluation and non-strictness. The compiler can't figure out at compile-time when certain expressions get evaluated, so some runtime gets stuffed into the executable to handle this (to call upon the evaluation of the so-called thunks). As for performance, laziness can be both good and bad. On one hand it allows for additional potential optimization, on the other hand the code size can be larger and programmers are more likely to make bad decisions, e.g. see Haskell's foldl vs. foldr vs. foldl' vs. foldr'.
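What a thunk is, and why it needs runtime support, can be sketched with a small class. This is a simplified model of call-by-need and not how GHC actually represents thunks; `Thunk` and `expensive` are hypothetical names.

```python
class Thunk:
    """A suspended computation that runs at most once, on first demand."""

    def __init__(self, compute):
        self._compute = compute
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._compute()
            self._compute = None  # drop the closure so it can be collected
            self._forced = True
        return self._value


evaluations = []


def expensive():
    evaluations.append("ran")
    return 6 * 7


unused = Thunk(expensive)   # never forced, so never evaluated
needed = Thunk(expensive)
first = needed.force()
second = needed.force()     # reuses the stored result

print(first, second, len(evaluations))  # 42 42 1
```

The saved work for `unused` is the upside of laziness. The extra object, the flag check on every `force`, and the retained closure are the costs, and long chains of unforced thunks are one common explanation for the foldl pitfall mentioned above.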