Is it possible to design a dynamic language without significant performance loss?

Is it possible to design a dynamic language without significant performance loss? - performance

Is it possible to design something like Ruby or Clojure without the significant performance loss in many situations compared with C/Java? Does hardware design play a role?
Edit: With significant I mean in an order of magnitudes, not just ten procent
Edit: I suspect that delnan is correct with me meaning dynamic languages so I changed the title

Performance depends on many things. Of course the semantics of the language have to be preserved even if we are compiling it - you can't remove dynamic dispatch from Ruby, it would speed things up drmatically but it would totally break 95% of the all Ruby code in the world. But still, much of the performance depends on how smart the implementation is.
I assume, by "high-level", you mean "dynamic"? Haskell and OCaml are extremely high-level, yet are is compiled natively and can outperform C# or Java, even C and C++ in some corner cases - especially if parallelism comes into play. And they certainly weren't designed with performance as #1 goal. But compiler writers, especially those focused onfunctional languages, are a very clever folk. If you or I started a high-level language, even if we used e.g. LLVM as backend for native compilation, we wouldn't get anywhere near this performance.
Making dynamic languages run fast is harder - they delay many decisions (types, members of a class/an object, ...) to runtime instead of compiletime, and while static code analysis can sometimes prove it's not possible in lines n and m, you still have to carry an advanced runtime around and do quite a few things a static language's compiler can do at compiletime. Even dynamic dispatch can be optimized with a smarter VM (Inline Cache anyone?), but it's a lot of work. More than a small new-fangeled language could do, that is.
Also see Steve Yegge's Dynamic Languages Strike Back.
And of course, what is a significant peformance loss? 100 times slower than C reads like a lot, but as we all know, 80% of execution time is spent in 20% of the code = 80% of the code won't have notable impact on the percieved performance of the whole program. For the remaining 20%, you can always rewrite it in C or C++ and call it from the dynamic language. For many applications, this suffices (for some, you don't even need to optimize). For the rest... well, if performance is that critical, you should propably write it in a language designed for performance.

Don't confuse the language design with the platform that it runs on.
For instance, Java is a high-level language. It runs on the JVM (as does Clojure - identified above, and JRuby - a Java version of Ruby). The JVM will perform byte-code analysis and optimise how the code runs (making use of escape analysis, just-in-time compilation etc.). So the platform has an effect on the performance that is largely independent of the language itself (see here for more info on Java performance and comparisons to C/C++)

Loss compared to what? If you need a garbage collector or closures then you need them, and you're going to pay the price regardless. If a language makes them easy for you to get at, that doesn't mean you have to use them when you don't need them.
If a language is interpreted instead of compiled, that's going to introduce an order of magnitude slowdown. But such a language may have compensating advantages, like ease of use, platform independence, and not having to compile. And, the programs you write in them may not run long enough for speed to be an issue.
There may be language implementations that introduce slowness for no good reason, but those don't have to be used.

You might want to look at what the DARPA HPCS initiative has come up with. There were 3 programming languages proposed: Sun's Fortress, IBM's X10 and Cray's Chapel. The latter two are still under development. Whether any of these meet your definition of high-level I don't know.
And yes, hardware design certainly does play a part. All 3 of these languages are targeted at supercomputers with very many processors and exhibit features appropriate to that domain.

It's certainly possible. For example, Objective-C is a dynamically-typed language that has performance comparable to C++ (although a wee bit slower, generally speaking, but still roughly equivalent).

Related

Does the speed of programming language depend on compiler?

I am trying to learn how computer programs work and have this question. I often read articles like "C/C++ are faster than java" or "Java and C#: speed comparison". In all cases programs written in any language is translated to assembly language. So, what is the reason of speed differences in those languages. Does that mean, the compiler of one language generates better and faster assembly code ?

Sort of.
There are several reasons why speeds will differ between compilers/interpreters/programming languages, some of it having to do with the compiler, and some of it having to do with the language itself.
Some programming languages require more overhead.
If your language is very high level, it'll have more overhead compared to C, which is very low level. (garbage collection is a good example of this). It becomes a tradeoff. Do I want blazing fast binaries, or do I want to be able to write programs easily?
Languages are designed to do different things.
For example, PHP is designed to be used on web servers, and nobody in their right mind would try and use it to create a top-tier fps game. Different languages are better suited for different tasks, and will be faster in some areas then in others.
Not all languages compile to assembly.
While C/C++ may compile to assembly, languages like Java instead compile to bytecode and is run against the java virtual machine, for interoperability reasons. Once again, this is a tradeoff -- you gain portability at the expense of overhead.
Furthermore, C/C++ doesn't even have to compile to assembly. For example, enscriptem ultimately will compile C/C++ to Javascript so it can run on web browsers.
Compilers are not magic.
They're programs, and like all programs, have bugs and will improve (or degrade) over time. I could try writing a C compiler over the weekend, and I'd bet a million dollars that it would perform several orders of magnitude worse then a compiler/interpreter for the slowest language you can think of.
Compiler/interpreter optimization is an ongoing field of research and study.
Every year, researchers are writing and publishing papers on a new way to compile and make programs run faster. If a language is newer, it may not yet have had the time to fully apply every optimization available. (see above). Some optimizations may apply to only one kind of compiler/interpreter.
So, to summarize, the speed of a language is a mixture of the intrinsic features of the language itself, along with the maturity of the compiler/interpreter/platform used.
Compilers and interpreters are not some monolithic magical process that's constant between all programming languages -- they're all different, have different benefits and disadvantages, and are constantly in a state of flux.

Performance of Google's Go?

So has anyone used Google's Go? I was wondering how the mathematical performance (e.g. flops) is compared to other languages with a garbage collector... like Java or .NET?
Has anyone investigated this?

Theoretical performance: The theoretical performance of pure Go programs is somewhere between C/C++ and Java. This assumes an advanced optimizing compiler and it also assumes the programmer takes advantage of all features of the language (be it C, C++, Java or Go) and refactors the code to fit the programming language.
Practical performance (as of July 2011): The standard Go compiler (5g/6g/8g) is currently unable to generate efficient instruction streams for high-performance numerical codes, so the performance will be lower than C/C++ or Java. There are multiple reasons for this: each function call has an overhead of a couple of additional instructions (compared to C/C++ or Java), no function inlining, average-quality register allocation, average-quality garbage collector, limited ability to erase bound checks, no access to vector instructions from Go, compiler has no support for SSE2 on 32-bit x86 CPUs, etc.
Bottom line: As a rule of thumb, expect the performance of numerical codes implemented in pure Go, compiled by 5g/6g/8g, to be 2 times lower than C/C++ or Java. Expect the performance to get better in the future.
Practical performance (September 2013): Compared to older Go from July 2011, Go 1.1.2 is capable of generating more efficient numerical codes but they remain to run slightly slower than C/C++ and Java. The compiler utilizes SSE2 instructions even on 32-bit x86 CPUs which causes 32-bit numerical codes to run much faster, most likely thanks to better register allocation. The compiler now implements function inlining and escape analysis. The garbage collector has also been improved but it remains to be less advanced than Java's garbage collector. There is still no support for accessing vector instructions from Go.
Bottom line: The performance gap seems sufficiently small for Go to be an alternative to C/C++ and Java in numerical computing, unless the competing implementation is using vector instructions.

The Go math package is largely written in assembler for performance.
Benchmarks are often unreliable and are subject to interpretation. For example, Robert Hundt's paper Loop Recognition in C++/Java/Go/Scala looks flawed. The Go blog post on Profiling Go Programs dissects Hundt's claims.

You're actually asking several different questions. First of all, Go's math performance is going to be about as fast as anything else. Any language that compiles down to native code (which arguably includes even JIT languages like .NET) is going to perform extremely well at raw math -- as fast as the machine can go. Simple math operations are very easy to compile into a zero-overhead form. This is the area where compiled (including JIT) languages have a advantage over interpreted ones.
The other question you asked was about garbage collection. This is, to a certain extent, a bit of a side issue if you're talking about heavy math. That's not to say that GC doesn't impact performance -- actually it impacts quite a bit. But the common solution for tight loops is to avoid or minimize GC sweeps. This is often quite simple if you're doing a tight loop -- you just re-use your old variables instead of constantly allocating and discarding them. This can speed your code by several orders of magnitude.
As for the GC implementations themselves -- Go and .NET both use mark-and-sweep garbage collection. Microsoft has put a lot of focus and engineering into their GC engine, and I'm obliged to think that it's quite good all things considered. Go's GC engine is a work in progress, and while it doesn't feel any slower than .NET's architecture, the Golang folks insist that it needs some work. The fact that Go's specification disallows destructors goes a long way in speeding things up, which may be why it doesn't seem that slow.
Finally, in my own anecdotal experience, I've found Go to be extremely fast. I've written very simple and easy programs that have stood up in my own benchmarks against highly-optimized C code from some long-standing and well-respected open source projects that pride themselves on performance.
The catch is that not all Go code is going to be efficient, just like not all C code is efficient. You've got to build it correctly, which often means doing things differently than what you're used to from other languages. The profiling blog post mentioned here several times is a good example of that.

Google did a study comparing Go to some other popular languages (C++, Java, Scala). They concluded it was not as strong performance-wise:
https://days2011.scala-lang.org/sites/days2011/files/ws3-1-Hundt.pdf
Quote from the Conclusion, about Go:
Go offers interesting language features, which also allow for a concise and standardized notation. The compilers for this language are still immature, which reﬂects in both performance and binary sizes.

most suitable language for computationally and memory expensive algorithms

Let's say you have to implement a tool to efficiently solve an NP-hard problem, with unavoidable possible explosion of memory usage (the output size in some cases exponential to the input size) and you are particularly concerned about the performances of this tool at running time. The source code has also to be readable and understandable once the underlying theory is known, and this requirement is as important as the efficiency of the tool itself.
I personally think that 3 languages could be suitable for these three requirements: c++, scala, java.
They all provide the right abstraction on data types that makes it possible to compare different structures or apply the same algorithms (which is also important) to different data types.
C++ has the advantage of being statically compiled and optimized, and with function inlining (if the data structures and algorithms are designed carefully) and other optimisation techniques it's possible to achieve a performance close to that of pure C while maintaining a fairly good readability.
If you also put a lot of care in data representation you can optimise the cache performance, which can gain orders of magnitude in speed when the cache miss rate is low.
Java is instead JIT compiled, which allows to apply optimisations during runtime, and in this category of algorithms that could have different behaviours between different runs, that may be a plus. I fear instead that such an approach could suffer from garbage collector, however in the case of this algorithm it's common to continuously allocate memory and java heap performance is notoriously better than C/C++ and if you implement your own memory manager inside the language you could even achieve good efficiency.
This approach instead is not able to inline method invocation (which induces a huge performance penalty) and doesn't give you control over the cache performance. Among the pros there's a better and cleaner syntax than C++.
My concerns about scala are more or less the same as Java, plus the fact that I can't control how the language is optimised unless I have a deep knowledge on the compiler and the standard library. But well: I get a very clean syntax :)
What's your take on the subject? Have you had to deal with this already? Would you implement an algorithm with such properties and requirements in any of these languages or would you suggest something else? How would you compare them?

Usually I’d say “C++” in a heartbeat. The secret being that C++ simply produces less (memory) garbage that needs managing.
On the other hand, your observation that
however in the case of this algorithm it's common to continuously allocate memory
is a hint that Java / Scala may actually be more suited. But then you could use a small object heap in C++ as well. Boost has one that uses the standard allocator interface, if memory serves.
Another advantage of C++ is obviously the use of abstraction without penalty through templates – i.e. that you can easily create generic algorithmic components that can interact without incurring a runtime overhead due to abstraction. In fact, you noted that
it's possible to achieve a performance close to that of pure C while maintaining a fairly good readability
– this is looking at things the wrong way: Templates allow C++ to achieve performance superior to that of C while still maintaining high abstraction.

D might be worth a look, seeing as how it tries to be a better C++.
From a superficial glance, it has better source code readability than C++ does, so that's one of your points covered.
It also has memory management, which makes playing with algorithms a bit easier.
And templates
Here is a stackoverflow discussion comparing the performance of C++ and D

The languages you noticed were my first guesses as well.
Each language has a different take on how to handle specific issues like compilation, memory management and source code, but in theory, any of them should be fitting to your problem.
It is impossible to tell which is best, and there is likely no major difference if you are familiar enough with all of them to work around their respective quirks.
And obviously, if you actually find the need to optimize (I'm not sure if that's a given), that's possible in each language. Lower level languages obviously offer more options, but are also (far) more complex to actually improve.
A single note about C++ vs Java: This is really a holy war, and if you've followed the recent development you'll probably have your own opinion. I, for one, think Java offers enough good aspects to make up for its flaws, usually.
And a final note on C++ vs C: According to my knowledge, the difference usually amounts to a sufficiently low percentage to ignore this. It it doesn't make a difference for the source code, it's fine to go with C, if C++ could make for easier-to-read source code, go with C++. In any case, the choice is kind of negligible.
In the end, remember that money spent on a few hours of programming/optimizing this could as well go into slightly superior hardware to make up for missed tiny details.
It all boils down to: Any of your options is fine as long as you do it right (domain knowledge).

I would use a language which makes it very easy to work on the algorithm. Get the algorithm right and it could very easily outweigh any advantage from fine-tuning the wrong algorithm. Don't be scared to play around in a language normally thought of as slow in execution speed if that language makes it easier to express algorithmic ideas. It is usually much easier to transcribe the right algorithm into another language than it is to eek-out the last dregs of speed from the wrong algorithm in the fastest executing language.
So do it in a language you are comfortable with and which is expressive. You might surprise yourself and find that what is produced is fast enough!

Are dynamic languages slower than static languages? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Are dynamic languages slower than static languages because, for example, the run-time has to check the type consistently?

No.
Dynamic languages are not slower than static languages. In fact, it is impossible for any language, dynamic or not, to be slower than another language (or faster, for that matter), simply because a language is just a bunch of abstract mathematical rules. You cannot execute a bunch of abstract mathematical rules, therefore they cannot ever be slow(er) or fast(er).
The statement that "dynamic languages are slower than static languages" is not only wrong, it doesn't even make sense. If English were a typed language, that statement wouldn't even typecheck.
In order for a language to even be able to run, it has to be implemented first. Now you can measure performance, but you aren't measuring the performance of the language, you are measuring the performance of the execution engine. Most languages have many different execution engines, with very different performance characteristics. For C, for example, the difference between the fastest and slowest implementations is a factor of 100000 or so!
Also, you cannot really measure the performance of an execution engine, either: you have to write some code to run on that exection engine first. But now you aren't measuring the performance of the execution engine, you are measuring the performance of the benchmark code. Which has very little to do with the performance of the execution engine and certainly nothing to do with the performance of the language.
In general, running well-designed code on well-designed high-performance execution engines will yield about the same performance, independent of whether the language is static or dynamic, procedural, object-oriented or functional, imperative or declarative, lazy or strict, pure or impure.
In fact, I would propose that the performance of a system is solely dependent on the amount of money that was spent making it fast, and completely independent of any particular typing discipline, programming paradigm or language.
Take for example Smalltalk, Lisp, Java and C++. All of them are, or have at one point been, the language of choice for high-performance code. All of them have huge amounts of engineering and research man-centuries expended on them to make them fast. All of them have highly-tuned proprietary commercial high-performance execution engines available. Given roughly the same problem, implemented by roughly comparable developers, they all perform roughly the same.
Two of those languages are dynamic, two are static. Java is interesting, because although it is a static language, most modern high-performance implementations are actually dynamic implementations. (In fact, several modern high-performance JVMs are actually either Smalltalk VMs in disguise, derived from Smalltalk VMs or written by Smalltalk VM companies.) Lisp is also interesting, because although it is a dynamic language, there are some (although not many) static high-performance implementations.
And we haven't even begun talking about the rest of the execution environment: modern mainstream operating systems, mainstream CPUs and mainstream hardware architectures are heavily biased towards static languages, to the point of being actively hostile for dynamic languages. Given that modern mainstream execution environments are pretty much of a worst-case scenario for dynamic languages, it is quite astonishing how well they actually perform and one can only imagine what the performance in a less hostile environment would look like.

All other things being equal, usually, yes.

First you must clarify whether you consider
dynamic typing vs. static typing or
statically compiled languaged vs. interpreted languages vs. bytecode JIT.
Usually we mean
dynamc language = dynamic typing + interpreted at run-time and
static languages = static typing + statically compiled
, but it's not necessary the case.
Type information can help the VM dispatch the message faster than witout type information, but the difference tend to disappear with optimization in the VM which detect monomorphic call sites. See the paragraph "performance consideration" in this post about dynamic invokation.
The debates between compiled vs. interpreted vs. byte-code JIT is still open. Some argue that bytecode JIT results in faster execution than regular compilation because the compilation is more accurate due to the presence of more information collected at run-time. Read the wikipedia entry about JIT for more insight. Interpreted language are indeed slower than any of the two forms or compilation.
I will not argue further, and start a heated discussion, I just wanted to point out that the gap between both tend to get smaller and smaller. Chances are that the performance problem that you might face will not be related to the language and VM but because of your design.
EDIT
If you want numbers, I suggest you look at the The Computer Language Benchmarks. I found it insightful.

At the instruction level current implementations of dynamically typed languages are typically slower than current implementations of statically typed languages.
However that does not necessarily mean that the implementation of a program will be slower in dynamic languages - there are lots of documented cases of the same program being implemented in both a static and dynamic language and the dynamic implementation has turned out to be faster. For example this study (PDF) gave the same problem to programmers in a variety of languages and compared the result. The mean runtime for the Python and Perl implementations were faster than the mean runtime for the C++ and Java implementations.
There are several reasons for this:
1) the code can be implemented more quickly in a dynamic language, leaving more time for optimisation.
2) high level data structures (maps, sets etc) are a core part of most dynamic languages and so are more likely to be used. Since they are core to the language they tend to be highly optimised.
3) programmer skill is more important than language speed - an inexperienced programmer can write slow code in any language. In the study mentioned above there were several orders of magnitude difference between the fastest and slowest implementation in each of the languages.
4) in many problem domains execution speed it dominated by I/O or some other factor external to the language.
5) Algorithm choice can dwarf language choice. In the book "More Programming Pearls" Jon Bentley implemented two algorithms for a problem - one was O(N^3) and implemented in optimised fortran on a Cray1. The other was O(N) and implemented in BASIC on a TRS80 home micro (this was in the 1980s). The TRS80 outperformed the Cray 1 for N > 5000.

Dynamic language run-times only need to check the type occasionally.
But it is still, typically, slower.
There are people making good claims that such performance gaps are attackable, however; e.g. http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html

Themost important factor is to consider the method dispatch algorithm. With static languages each method is typically allocated an index. THe names we see in source are not actually used at runtime and are in source for readaility purposes. Naturally languages like java keep them and make them available in reflection but in terms of when one invokes a method they are not used. I will leave reflection and binding out of this discussion. This means when a method is invoked the runtmne simply uses the offset to lookup a table and call. A dynamic language on the other hand uses the name of the function to lookup a map and then calls said function. A hashmap is always going to be slower than using an index lookup into an array.

No, dynamic languages are not necessarily slower than static languages.
The pypy and psyco projects have been making a lot of progress on building JIT compilers for python that have data-driven compilation; in other words, they will automatically compile versions of frequently called functions specialised for particular common values of arguments. Not just by type, like a C++ template, but actual argument values; say an argument is usually zero, or None, then there will be a specifically compiled version of the function for that value.
This can lead to compiled code that is faster than you'd get out of a C++ compiler, and since it is doing this at runtime, it can discover optimisations specifically for the actual input data for this particular instance of the program.

Reasonable to assume as more things need to be computed in runtime.

Actually, it's difficult to say because many of the benchmarks used are not that representative. And with more sophisticated execution environments, like HotSpot JVM, differences are getting less and less relevant. Take a look at following article:
Java theory and practice: Dynamic compilation and performance measurement

Are functional languages inherently slow? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Why are functional languages always tailing behind C in benchmarks? If you have a statically typed functional language, it seems to me it could be compiled to the same code as C, or to even more optimized code since more semantics are available to the compiler. Why does it seem like all functional languages are slower than C, and why do they always need garbage collection and excessive use of the heap?
Does anyone know of a functional language appropriate for embedded / real-time applications, where memory allocation is kept to a minimum and the produced machine code is lean and fast?

Are functional languages inherently slow?
In some sense, yes. They require infrastructure that inevitably adds overheads over what can theoretically be attained using assembler by hand. In particular, first-class lexical closures only work well with garbage collection because they allow values to be carried out of scope.
Why are functional languages always tailing behind C in benchmarks?
Firstly, beware of selection bias. C acts as a lowest common denominator in benchmark suites, limiting what can be accomplished. If you have a benchmark comparing C with a functional language then it is almost certainly an extremely simple program. Arguably so simple that it is of little practical relevance today. It is not practically feasible to solve more complicated problems using C for a mere benchmark.
The most obvious example of this is parallelism. Today, we all have multicores. Even my phone is a multicore. Multicore parallelism is notoriously difficult in C but can be easy in functional languages (I like F#). Other examples include anything that benefits from persistent data structures, e.g. undo buffers are trivial with purely functional data structures but can be a huge amount of work in imperative languages like C.
Why does it seem like all functional languages are slower than C, and why do they always need garbage collection and excessive use of the heap?
Functional languages will seem slower because you'll only ever see benchmarks comparing code that is easy enough to write well in C and you'll never see benchmarks comparing meatier tasks where functional languages start to excel.
However, you've correctly identified what is probably the single biggest bottleneck in functional languages today: their excessive allocation rates. Nice work!
The reasons why functional languages allocate so heavily can be split into historical and inherent reasons.
Historically, Lisp implementations have been doing a lot of boxing for 50 years now. This characteristic spread to many other languages which use Lisp-like intermediate representations. Over the years, language implementers have continually resorted to boxing as a quick fix for complications in language implementation. In object oriented languages, the default has been to always heap allocate every object even when it can obviously be stack allocated. The burden of efficiency was then pushed onto the garbage collector and a huge amount of effort has been put into building garbage collectors that can attain performance close to that of stack allocation, typically by using a bump-allocating nursery generation. I think that a lot more effort should be put into researching functional language designs that minimize boxing and garbage collector designs that are optimized for different requirements.
Generational garbage collectors are great for languages that heap allocate a lot because they can be almost as fast as stack allocation. But they add substantial overheads elsewhere. Today's programs are increasingly using data structures like queues (e.g. for concurrent programming) and these give pathological behaviour for generational garbage collectors. If the items in the queue outlive the first generation then they all get marked, then they all get copied ("evacuated"), then all of the references to their old locations get updated and then they become eligible for collection. This is about 3× slower than it needs to be (e.g. compared to C). Mark region collectors like Beltway (2002) and Immix (2008) have the potential to solve this problem because the nursery is replaced with a region that can either be collected as if it were a nursery or, if it contains mostly reachable values, it can be replaced with another region and left to age until it contains mostly unreachable values.
Despite the pre-existence of C++, the creators of Java made the mistake of adopting type erasure for generics, leading to unnecessary boxing. For example, I benchmarked a simple hash table running 17× faster on .NET than the JVM partly because .NET did not make this mistake (it uses reified generics) and also because .NET has value types. I actually blame Lisp for making Java slow.
All modern functional language implementations continue to box excessively. JVM-based languages like Clojure and Scala have little choice because the VM they target cannot even express value types. OCaml sheds type information early in its compilation process and resorts to tagged integers and boxing at run-time to handle polymorphism. Consequently, OCaml will often box individual floating point numbers and always boxes tuples. For example, a triple of bytes in OCaml is represented by a pointer (with an implicit 1-bit tag embedded in it that gets checked repeatedly at run-time) to a heap-allocated block with a 64 bit header and 192 bit body containing three tagged 63-bit integers (where the 3 tags are, again, repeatedly examined at run time!). This is clearly insane.
Some work has been done on unboxing optimizations in functional languages but it never really gained traction. For example, the MLton compiler for Standard ML was a whole-program optimizing compiler that did sophisticated unboxing optimizations. Sadly, it was before its time and the "long" compilation times (probably under 1s on a modern machine!) deterred people from using it.
The only major platform to have broken this trend is .NET but, amazingly, it appears to have been an accident. Despite having a Dictionary implementation very heavily optimized for keys and values that are of value types (because they are unboxed) Microsoft employees like Eric Lippert continue to claim that the important thing about value types is their pass-by-value semantics and not the performance characteristics that stem from their unboxed internal representation. Eric seems to have been proven wrong: more .NET developers seem to care more about unboxing than pass-by-value. Indeed, most structs are immutable and, therefore, referentially transparent so there is no semantic difference between pass-by-value and pass-by-reference. Performance is visible and structs can offer massive performance improvements. The performance of structs even saved Stack Overflow and structs are used to avoid GC latency in commercial software like Rapid Addition's!
The other reason for heavy allocation by functional languages is inherent. Imperative data structures like hash tables use huge monolithic arrays internally. If these were persistent then the huge internal arrays would need to be copied every time an update was made. So purely functional data structures like balanced binary trees are fragmented into many little heap-allocated blocks in order to facilitate reuse from one version of the collection to the next.
Clojure uses a neat trick to alleviate this problem when collections like dictionaries are only written to during initialization and are then read from a lot. In this case, the initialization can use mutation to build the structure "behind the scenes". However, this does not help with incremental updates and the resulting collections are still substantially slower to read than their imperative equivalents. On the up-side, purely functional data structures offer persistence whereas imperative ones do not. However, few practical applications benefit from persistence in practice so this is often not advantageous. Hence the desire for impure functional languages where you can drop to imperative style effortlessly and reap the benefits.
Does anyone know of a functional language appropriate for embedded / real-time applications, where memory allocation is kept to a minimum and the produced machine code is lean and fast?
Take a look at Erlang and OCaml if you haven't already. Both are reasonable for memory constrained systems but neither generate particularly great machine code.

Nothing is inherently anything. Here is an example where interpreted OCaml runs faster than equivalent C code, because the OCaml optimizer has different information available to it, due to differences in the language. Of course, it would be foolish to make a general claim that OCaml is categorically faster than C. The point is, it depends upon what you're doing, and how you do it.
That said, OCaml is an example of a (mostly) functional language which is actually designed for performance, in contrast to purity.

Functional languages require the elimination of mutable state that is visible at the level of the language abstraction. Therefore, data that would be mutated in place by an imperative language needs to be copied instead, with the mutation taking place on the copy. For a simple example, see a quick sort in Haskell vs. C.
Furthermore, garbage collection is required because free() is not a pure function, as it has side effects. Therefore, the only way to free memory that does not involve side effects at the level of the language abstraction is with garbage collection.
Of course, in principle, a sufficiently smart compiler could optimize out much of this copying. This is already done to some degree, but making the compiler sufficiently smart to understand the semantics of your code at that level is just plain hard.

The short answer: because C is fast. As in, blazingly ridiculously crazy fast. A language simply doesn't have to be 'slow' to get its rear handed to it by C.
The reason why C is fast is that it was created by really great coders, and gcc has been optimized over the course of a couple more decades and by dozens more brilliant coders than 99% of languages out there.
In short, you're not going to beat C except for specialized tasks that require very specific functional programming constructs.

The control flow of proceedural languages much better matches the actual processing patterns of modern computers.
C maps very closely onto the assembly code its compilation produces, hence the nickname "cross-platform assembly". Computer manufacturers have spent a few decades making assembly code run as fast as possible, so C inherits all of this raw speed.
In comparison, the no side-effects, inherent parallelism of functional languages does not map onto a single processor at all well. The arbitrary order in which functions can be invoked needs to be serialised down to the CPU bottleneck: without extremely clever compilation, you're going to be context switching all the time, none of the pre-fetching will work because you're constantly jumping all over the place, ... Basically, all the optimisation work that computer manufacturers have done for nice, predictable proceedural languages is pretty much useless.
However! With the move towards lots of less powerful cores (rather than one or two turbo-charged cores), functional languages should begin to close the gap, as they naturally scale horizontally.

C is fast because it's basically a set of macros for assembler :) There is no "behind the scene" when you are writing a program in C. You alloc memory when you decide it's time to do that and you free in the same fashion. This is a huge advantage when you are writing a real time application, where predictabily is important (more than anything else, actually).
Also, C compilers are generally extremly fast because language itself is simple. It even doesn't make any type checkings :) This also means that is easier to make hard to find errors.
Ad advantage with the lack of type checking is that a function name can just be exported with its name for example and this makes C code easy to link with other language's code

Well Haskell is only 1.8 times slower than GCC's C++, which is faster than GCC's C implementation for typical benchmark tasks.
That makes Haskell very fast, even faster than C#(Mono that is).
relative Language
speed
1.0 C++ GNU g++
1.1 C GNU gcc
1.2 ATS
1.5 Java 6 -server
1.5 Clean
1.6 Pascal Free Pascal
1.6 Fortran Intel
1.8 Haskell GHC
2.0 C# Mono
2.1 Scala
2.2 Ada 2005 GNAT
2.4 Lisp SBCL
3.9 Lua LuaJIT
source
For the record I use Lua for Games on the iPhone, thus you could easily use Haskell or Lisp if you prefer, since they are faster.

As for now, functional languages aren't used heavily for industry projects, so not enough serious work goes into optimizers. Also, optimizing imperative code for an imperative target is probably way easier.
Functional languages have one feat that will let them outdo imperative languages really soon now: trivial parallelization.
Trivial not in the sense that it is easy, but that it can be built into the language environment, without the developer needing to think about it.
The cost of robust multithreading in a thread-agnostic language like C is prohibitive for many projects.

I disagree with tuinstoel. The important question is whether the functional language provides a faster development time and results in faster code when it is used to what functional languages were meant to be used. See the efficiency issues section on Wikipedia for a glimpse of what I mean.

One more reason for bigger executable size could be lazy evaluation and non-strictness. The compiler can't figure out at compile-time when certain expressions get evaluated, so some runtime gets stuffed into the executable to handle this (to call upon the evaluation of the so-called thunks). As for performance, laziness can be both good and bad. On one hand it allows for additional potential optimization, on the other hand the code size can be larger and programmers are more likely to make bad decisions, e.g. see Haskell's foldl vs. foldr vs. foldl' vs. foldr'.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio