Why is type checking expensive? - performance

I've heard many anecdotes that a large problem with dynamically typed languages is that type checking is very slow. Why is it slow, though? What is the computer science rationale for why using runtime-assigned types that may change causes large slowdowns in computational efficiency?

Dynamically typed languages must perform type-checking while code is running. Although they can sometimes be compiled, they need to cut many corners for reasonable performance. One big drawback of checking at runtime is that if a value turns out to have the wrong type, the interpreter can only throw an exception or stop execution.
So they often try to coerce types to prevent exceptions, even when it may be undesirable. In Python, it isn't uncommon to discover that a simple division of whole integers means that my user output is suddenly full of '2.0' because I didn't explicitly cast back to int.
The computer science rationale is that runtime type-checking is extremely heavy. For every function you call, all the types involved must be validated (or coerced, which may be another function call), and type information must be updated afterwards. At runtime you can only afford a simple type system and very little optimization. A compiler, by comparison, can exploit even a weak type system to optimize your inefficient algorithms away.
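As a toy illustration of that per-operation bookkeeping (a sketch, not what CPython literally does), here is roughly what a dynamic language has to repeat on every single addition; the dynamic_add helper is made up for the example:

    def dynamic_add(a, b):
        # 1. Inspect the runtime types of both operands on every call.
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            # 2. Coerce: if either operand is a float, promote both and return a float.
            if isinstance(a, float) or isinstance(b, float):
                return float(a) + float(b)
            return a + b
        if isinstance(a, str) and isinstance(b, str):
            return a + b
        # 3. If no rule matches, all the interpreter can do is raise at runtime.
        raise TypeError(f"unsupported operand types: {type(a).__name__} + {type(b).__name__}")

    print(dynamic_add(2, 3))    # 5
    print(dynamic_add(2, 0.5))  # 2.5  -- silent coercion to float
    print(4 / 2)                # 2.0  -- Python 3's / on two ints already returns a float
    print(4 // 2)               # 2    -- the explicit way to stay in the integers

A static compiler does all of this once, at compile time, and emits a single machine addition instead.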
It's very common for statically-typed languages to be compiled, and dynamically-typed languages to be interpreted. This is because if a language is being designed for a compiler, it's a no-brainer to give the responsibility of type-checking to the compiler so that your code will be more optimal and won't need to manage typing at runtime. The less you need to carry at runtime, the faster code will execute.
Ultimately, this means languages designed for interpreters can't afford the level of typing a compiler can. In addition to having less freedom to exploit type information to optimize - strike 1 to performance - they must carry and modify type information at runtime - strike 2. The weaker type system also introduces many type safety bugs.
Naturally, there are also numerous cases where weak typing is desirable. Dynamic languages often take the role of scripting; they're quick to code, easy to interpret, and can be ported to new platforms faster than a compiler! This makes them invaluable for gluing very different systems together. One script can interact with the operating system and many programs on it to schedule a daily download of all the latest cat videos from your favourite website.
As always, I highly recommend that you have a dynamic language and a static language in your repertoire. It's invaluable to have access to the guarantees of strong typing and access to the ease of weak typing. Be a code omnivore :)

Related

Will using Sorbet's ruby type checker have an impact on a ruby app's performance?

Maybe a newb's question, but if you never ask you'll never know.
Will using Stripe's Sorbet (https://sorbet.org/) on a RoR app potentially improve the app's performance?
(Performance meaning response times, not robustness / runtime error rate.)
I did some reading on dynamically typed languages (particularly JavaScript in this case) and found out that if we keep sending some function (foo, for example) the same type of objects, the engine does some optimising work on that function, so that when it is invoked again with the same types, the interpreting work will be quicker.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type-checking may increase interpreting speed.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type-checking may increase interpreting speed.
It doesn't yet, but one could potentially build this one day.
The goal of Sorbet was to build a type system for people, as opposed to a type system for computers (the compiler). It can introduce some performance overhead, but since Stripe runs it in production, we keep it in check. Internally, we get paged if the overhead is >7% of CPU time.
I did some reading on dynamically typed languages (particularly JavaScript in this case) and found out that if we keep sending some function (foo, for example) the same type of objects, the engine does some optimising work on that function, so that when it is invoked again with the same types, the interpreting work will be quicker.
Yes, this can be done. What you're describing is a common optimization in Just-In-Time (JIT) compilers. The technique you seem to be referring to uses runtime profiling, and it is a common alternative technique that achieves this result in the absence of a type system. It's also worth noting that well-built JITs can do it more frequently than a type system can, as a type system encodes what could happen, while profiling & JITs can optimize for what actually happens in practice.
That said, building a JIT is frequently much more work than building an online compiler; thus, depending on the amount of investment one wants to put into speeding up Ruby, either building a JIT or using types can prove better under different real-world constraints.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type-checking may increase interpreting speed.
Summarizing the previous paragraph: the Sorbet type system does not currently speed up Ruby, but it doesn't slow it down much either.
Type systems could indeed be used to speed up languages, but they aren't your only tool; profiling & JIT compilation are the major competitor.
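To make the profiling/JIT idea above concrete, here is a minimal sketch in Python of a monomorphic inline cache, the classic JIT technique being described; the CallSite class and the area method are invented for the example:

    class CallSite:
        """Remember the receiver type seen last time and reuse that lookup."""
        def __init__(self):
            self.cached_type = None
            self.cached_method = None

        def invoke(self, obj):
            if type(obj) is self.cached_type:
                # Fast path: the runtime type matches the cached guess,
                # so skip the generic attribute lookup entirely.
                return self.cached_method(obj)
            # Slow path: do the full dynamic lookup, then cache it.
            self.cached_type = type(obj)
            self.cached_method = type(obj).area
            return self.cached_method(obj)

    class Square:
        def __init__(self, side):
            self.side = side
        def area(self):
            return self.side * self.side

    site = CallSite()
    print([site.invoke(s) for s in [Square(2), Square(3), Square(4)]])  # [4, 9, 16]

As long as the call site keeps seeing Square (it stays monomorphic), every call after the first takes the fast path; real JITs compile that fast path to machine code and put a cheap type guard in front of it.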
The optimizations you are talking about apply more to the JIT that is being worked on for the Ruby runtime.
In general, Sorbet aims at type safety by introducing type interfaces or method signatures. They enable static type checks that are applied before deploying the application in order to get rid of "type errors".
Sorbet comes with a runtime component that can enforce type checks at runtime in your runnable application, but those are going to decrease the application's performance as they wrap method calls in order to check for correct types: https://sorbet.org/docs/runtime#runtime-checked-sig-s
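Sorbet's runtime sig wrappers are Ruby, but the shape of the overhead is easy to see in any language. Here is a hedged Python analogue (the sig decorator below is invented for illustration, not Sorbet's API) showing how wrapping a method in argument/return checks adds work to every call:

    import functools

    def sig(arg_types, return_type):
        """Hypothetical decorator: check argument and return types on every call."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args):
                for value, expected in zip(args, arg_types):
                    if not isinstance(value, expected):
                        raise TypeError(f"{fn.__name__}: expected {expected.__name__}, "
                                        f"got {type(value).__name__}")
                result = fn(*args)
                if not isinstance(result, return_type):
                    raise TypeError(f"{fn.__name__}: bad return type {type(result).__name__}")
                return result
            return wrapper
        return decorator

    @sig((int, int), int)
    def add(a, b):
        return a + b

    print(add(2, 3))      # 5
    # add(2, "3")         # TypeError from the wrapper, before `+` ever runs

Every call now pays for one extra function call plus a handful of isinstance checks, which is exactly the kind of overhead the runtime component trades for earlier, clearer failures.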

Is Swift the only (mainstream) language with overflow checking arithmetic?

I started studying the Swift language today. I've learned up to basic and advanced operators.
To me, the fact that all the default arithmetic operations in Swift are checked against overflow/underflow is a little surprising.
Is there any other mainstream language with this feature?
Could Swift's runtime arithmetic be sub-optimal (performance-wise) because of this?
Why did they include this feature in the language, and if it's good, why don't others already use it?
Performance
There's necessarily a cost to that kind of verification. At the lowest level, the operation which causes the overflow takes a single CPU instruction. Checking whether or not an overflow occurred during that operation requires at least one more (e.g. using jump instructions (branching) which consult the overflow flag).
Note that, depending on the actual code, there are probably a good number of optimizations which can be done during compilation to avoid making the time consuming check on every operation (simple example: 32-bit 0 + any short cannot overflow).
So, yes, it's sub-optimal to include the check.
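To see where the cost comes from, here is the checked operation written out by hand in Python (a sketch: real implementations use the CPU's overflow flag rather than comparisons, but the extra check and branch per operation are the same idea):

    INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

    def checked_add32(a, b):
        result = a + b                                  # the arithmetic itself
        if result > INT32_MAX or result < INT32_MIN:    # the extra check and branch
            raise OverflowError(f"{a} + {b} does not fit in a signed 32-bit integer")
        return result

    print(checked_add32(2_000_000_000, 100_000_000))    # 2100000000
    # checked_add32(2_000_000_000, 2_000_000_000)       # raises OverflowError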
Why
Some languages include this kind of check to remove the burden from the programmer. If the programmer doesn't have to check himself, he can't make the mistake of not checking. If programmers in a given language can't make a certain class of mistakes, the language's reputation goes up a little bit regarding its reliability.
As to why some other languages don't include the checks... Different language designers have different philosophies. For some it's important to allow programmers to create programs which are as fast as they theoretically can be. For some others, it's more important to help programmers code correct and robust programs. For example, C/C++ have the reputation of rendering super fast programs, while Ada has the reputation of not even compiling until your program is most probably correct.
Sometimes it's a matter of philosophy, sometimes it's a matter of needs. On small micro-controllers (little memory, slow clock), you don't want to have too many automatic checks because it may significantly slow down the program's execution.
Other languages
Here's a small table about the different ways languages deal (or not) with integer overflow. Some languages leave it to the programmer. Some raise an exception. Some will change the data type to accept the result without error (e.g.: int to long, long to BigInteger).
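For what it's worth, Python is one of the languages in the "grow the type" camp: its integers are arbitrary precision, so the result simply gets wider instead of overflowing or raising:

    x = 2**62
    print(x * 4)    # 18446744073709551616 -- wider than 64 bits, no error, no wraparound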

every language eventually compiled into low-level computer language?

Isn't every language compiled into low-level computer language?
If so, shouldn't all languages have the same performance?
Just wondering...
As pointed out by others, not every language is translated into machine language; some are translated into some form (bytecode, reverse Polish, AST) that is interpreted.
But even among languages that are translated to machine code,
Some translators are better than others
Some language features are easier to translate to high-performance code than others
An example of a translator that is better than some others is the GCC C compiler. It has had many years' work invested in producing good code, and its translations outperform those of the simpler compilers lcc and tcc, for example.
An example of a feature that is hard to translate to high-performance code is C's ability to do pointer arithmetic and to dereference pointers: when a program stores through a pointer, it is very difficult for the compiler to know what memory locations are affected. Similarly, when an unknown function is called, the compiler must make very pessimistic assumptions about what might happen to the contents of objects allocated on the heap. In a language like Java, the compiler can do a better job translating because the type system enforces greater separation between pointers of different types. In a language like ML or Haskell, the compiler can do better still, because in these languages, most data allocated in memory cannot be changed by a function call. But of course object-oriented languages and functional languages present their own translation challenges.
Finally, translation of a Turing-complete language is itself a hard problem: in general, finding the best translation of a program is an NP-hard problem, which means that the only solutions known potentially take time exponential in the size of the program. This would be unacceptable in a compiler (can't wait forever to compile a mere few thousand lines), and so compilers use heuristics. There is always room for improvement in these heuristics.
It is easier and more efficient to map some languages into machine language than others. There is no easy analogy that I can think of for this. The closest I can come to is translating Italian to Spanish vs. translating a Khoisan language into Hawaiian.
Another analogy is saying "Well, the laws of physics are what govern how every animal moves, so why do some animals move so much faster than others? Shouldn't they all just move at the same speed?".
No, some languages are simply interpreted. They never actually get turned into machine code. So those languages will generally run slower than low-level languages like C.
Even for the languages which are compiled into machine code, sometimes what comes out of the compiler is not the most efficient possible way to write that given program. So it's often possible to write programs in, say, assembly language that run faster than their C equivalents, and C programs that run faster than their JIT-compiled Java equivalents, etc. (Modern compilers are pretty good, though, so that's not so much of an issue these days)
Yes, all programs get eventually translated into machine code. BUT:
Some programs get translated during compilation, while others are translated on-the-fly by an interpreter (e.g. Perl) or a virtual machine (e.g. original Java)
Obviously, the latter is MUCH slower as you spend time on translation during running.
Different languages can be translated into DIFFERENT machine code. Even when the same programming task is done. So that machine code might be faster or slower depending on the language.
You should understand the difference between compiling (which is translating) and interpreting (which is simulating). You should also understand the concept of a universal basis for computation.
A language or instruction set is universal if it can be used to write an interpreter (or simulator) for any other language or instruction set. Most computers are electronic, but they can be made in many other ways, such as by fluidics, or mechanical parts, or even by people following directions. A good teaching exercise is to write a small program in BASIC and then have a classroom of students "execute" the program by following its steps. Since BASIC is universal (to a first approximation) you can use it to write a program that simulates the instruction set for any other computer.
So you could take a program in your favorite language, compile (translate) it into machine language for your favorite machine, have an interpreter for that machine written in BASIC, and then (in principle) have a class full of students "execute" it. In this way, it is first being reduced to an instruction set for a "fast" machine, and then being executed by a very very very slow "computer". It will still get the same answer, only about a trillion times slower.
Point being, the concept of universality makes all computers equivalent to each other, even though some are very fast and others are very slow.
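If you want to play with the idea, the "classroom of students" can be a few lines of Python interpreting a made-up three-instruction machine (the instruction set here is invented purely for illustration):

    def run(program):
        acc, pc = 0, 0                      # accumulator and program counter
        while pc < len(program):
            op, arg = program[pc]
            if op == "ADD":
                acc += arg
            elif op == "PRINT":
                print(acc)
            elif op == "JNZ" and acc != 0:  # jump to arg if the accumulator is non-zero
                pc = arg
                continue
            pc += 1

    # Prints 3, 2, 1: the interpreted program computes the right answer,
    # just far more slowly than native code for the same task would.
    run([("ADD", 3), ("PRINT", None), ("ADD", -1), ("JNZ", 1)])

The same program could equally be "executed" by students with pencil and paper; universality is about what answer you get, not how fast you get it.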
No, some languages are run by a 'software interpreter' as byte code.
Also, it depends on what the language does in the background as well, so 2 identically functioning programs in different languages may have different mechanics behind the scenes and hence be actually running different instructions resulting in differing performance.

Static/strong typing and refactoring

It seems to me that the most invaluable thing about a static/strongly-typed programming language is that it helps refactoring: if/when you change any API, then the compiler will tell you what that change has broken.
I can imagine writing code in a runtime/weakly-typed language ... but I can't imagine refactoring without the compiler's help, and I can't imagine writing tens of thousands of lines of code without refactoring.
Is this true?
I think you're conflating when types are checked with how they're checked. Runtime typing isn't necessarily weak.
The main advantage of static types is exactly what you say: they're exhaustive. You can be confident all call sites conform to the type just by letting the compiler do its thing.
The main limitation of static types is that they're limited in the constraints they can express. This varies by language, with most languages having relatively simple type systems (C, Java), and others with extremely powerful type systems (Haskell, Cayenne).
Because of this limitation, types on their own are not sufficient. For example, in Java types are more or less restricted to checking that type names match. This means the meaning of any constraint you want checked has to be encoded into a naming scheme of some sort, hence the plethora of indirections and boilerplate common to Java code. C++ is a little better in that templates allow a bit more expressiveness, but they don't come close to what you can do with dependent types. I'm not sure what the downsides to the more powerful type systems are, though clearly there must be some or more people would be using them in industry.
Even if you're using static typing, chances are it's not expressive enough to check everything you care about, so you'll need to write tests too. Whether static typing saves you more effort than it requires in boilerplate is a debate that's raged for ages and that I don't think has a simple answer for all situations.
As to your second question:
How can we re-factor safely in a runtime typed language?
The answer is tests. Your tests have to cover all the cases that matter. Tools can help you gauge how exhaustive your tests are. Coverage-checking tools let you know whether lines of code are covered by the tests or not. Test mutation tools (Jester, Heckle) can let you know if your tests are logically incomplete. Acceptance tests let you know that what you've written matches requirements, and lastly, regression and performance tests ensure that each new version of the product maintains the quality of the last.
One of the great things about having proper testing in place vs relying on elaborate type indirections is that debugging becomes much simpler. When running the tests you get specific failed assertions within tests that clearly express what they're doing, rather than obtuse compiler error statements (think c++ template errors).
No matter what tools you use: writing code you're confident in will require effort. It most likely will require writing a lot of tests. If the penalty for bugs is very high, such as aerospace or medical control software, you may need to use formal mathematical methods to prove the behavior of your software, which makes such development extremely expensive.
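A minimal sketch of that safety net in Python (the parse_price function and its behaviour are made up for the example): the assertions pin down behaviour, so a refactoring that changes the return type or the rounding fails a test instead of slipping through:

    def parse_price(text):
        """Turn a string like '$12.50' into an integer number of cents."""
        return int(round(float(text.lstrip("$")) * 100))

    def test_parse_price():
        assert parse_price("$12.50") == 1250
        assert parse_price("0.99") == 99
        assert isinstance(parse_price("$3"), int)

    test_parse_price()
    print("refactoring-safety tests passed")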
I totally agree with your sentiment. The very flexibility that dynamically typed languages are supposed to be good at is actually what makes the code very hard to maintain. Really, is there such a thing as a program that continues to work if the data types are changed in a non-trivial way without actually changing the code?
In the meantime, you could check the type of the variable being passed, and somehow fail if it's not the expected type. You'd still have to run your code to root out those cases, but at least something would tell you.
I think Google's internal tools actually do a compilation and probably type checking to their Javascript. I wish I had those tools.
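That "check the type and somehow fail" suggestion looks something like this in Python (names invented for the example):

    def send_receipt(email, amount_cents):
        if not isinstance(amount_cents, int):
            raise TypeError(f"amount_cents must be int, got {type(amount_cents).__name__}")
        return f"receipt for {amount_cents} cents sent to {email}"

    print(send_receipt("a@example.com", 1250))   # fine
    # send_receipt("a@example.com", "12.50")     # would fail loudly at the boundary,
    #                                            # instead of corrupting data later

It isn't static checking, and you still have to execute the code path to hit it, but at least the failure happens where the bad value enters rather than three layers deeper.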
To start, I'm a native Perl programmer, so on the one hand I've never programmed with the net of static types. OTOH I've never programmed with them, so I can't speak to their benefits. What I can speak to is what it's like to refactor.
I don't find the lack of static types to be a problem wrt refactoring. What I find a problem is the lack of a refactoring browser. Dynamic languages have the problem that you don't really know what the code is really going to do until you actually run it. Perl has this more than most. Perl has the additional problem of having a very complicated, almost unparsable, syntax. Result: no refactoring tools (though they're working very rapidly on that). The end result is I have to refactor by hand. And that is what introduces bugs.
I have tests to catch them... usually. I do find myself often in front of a steaming pile of untested and nigh untestable code with the chicken/egg problem of having to refactor the code in order to test it, but having to test it in order to refactor it. Ick. At this point I have to write some very dumb, high level "does the program output the same thing it did before" sort of tests just to make sure I didn't break something.
Static types, as envisioned in Java or C++ or C#, really only solve a small class of programming problems. They guarantee your interfaces are passed bits of data with the right label. But just because you get a Collection doesn't mean that Collection contains the data you think it does. Because you get an integer doesn't mean you got the right integer. Your method takes a User object, but is that User logged in?
Classic example: public static double sqrt(double a) is the signature for the Java square root function. Square root doesn't work on negative numbers. Where does it say that in the signature? It doesn't. Even worse, where does it say what that function even does? The signature only says what types it takes and what it returns. It says nothing about what happens in between and that's where the interesting code lives. Some people have tried to capture the full API by using design by contract, which can broadly be described as embedding run-time tests of your function's inputs, outputs and side effects (or lack thereof)... but that's another show.
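As a rough sketch of that contract idea in Python (illustrative only, not any particular library's API), the precondition the signature cannot express gets checked at run time instead:

    import math

    def checked_sqrt(a: float) -> float:
        assert a >= 0, "precondition violated: sqrt is undefined for negative inputs"
        result = math.sqrt(a)
        assert abs(result * result - a) <= 1e-9 * max(a, 1.0), "postcondition violated"
        return result

    print(checked_sqrt(2.0))    # 1.4142135623730951
    # checked_sqrt(-1.0)        # AssertionError: precondition violated ...

The type signature still says only float -> float; the part that actually matters lives in the checks (or, too often, only in the documentation).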
An API is far more than just function signatures (if it weren't, you wouldn't need all that descriptive prose in the Javadocs), and refactoring is far more than just changing the API.
The biggest refactoring advantage a statically typed, statically compiled, non-dynamic language gives you is the ability to write refactoring tools to do quite complex refactorings for you because it knows where all the calls to your methods are. I'm pretty envious of IntelliJ IDEA.
I would say refactoring goes beyond what the compiler can check, even in statically-typed languages. Refactoring is just changing a program's internal structure without affecting the external behavior. Even in dynamic languages, there are still things that you can expect to happen and test for; you just lose a little bit of assistance from the compiler.
One of the benefits of using var in C# 3.0 is that you can often change the type without breaking any code. The type needs to still look the same - properties with the same names must exist, methods with the same or similar signature must still exist. But you can really change to a very different type, even without using something like ReSharper.

Are functional languages inherently slow? [closed]

Why are functional languages always trailing behind C in benchmarks? If you have a statically typed functional language, it seems to me it could be compiled to the same code as C, or to even more optimized code, since more semantics are available to the compiler. Why does it seem like all functional languages are slower than C, and why do they always need garbage collection and excessive use of the heap?
Does anyone know of a functional language appropriate for embedded / real-time applications, where memory allocation is kept to a minimum and the produced machine code is lean and fast?
Are functional languages inherently slow?
In some sense, yes. They require infrastructure that inevitably adds overheads over what can theoretically be attained using assembler by hand. In particular, first-class lexical closures only work well with garbage collection because they allow values to be carried out of scope.
Why are functional languages always trailing behind C in benchmarks?
Firstly, beware of selection bias. C acts as a lowest common denominator in benchmark suites, limiting what can be accomplished. If you have a benchmark comparing C with a functional language then it is almost certainly an extremely simple program. Arguably so simple that it is of little practical relevance today. It is not practically feasible to solve more complicated problems using C for a mere benchmark.
The most obvious example of this is parallelism. Today, we all have multicores. Even my phone is a multicore. Multicore parallelism is notoriously difficult in C but can be easy in functional languages (I like F#). Other examples include anything that benefits from persistent data structures, e.g. undo buffers are trivial with purely functional data structures but can be a huge amount of work in imperative languages like C.
Why does it seem like all functional languages are slower than C, and why do they always need garbage collection and excessive use of the heap?
Functional languages will seem slower because you'll only ever see benchmarks comparing code that is easy enough to write well in C and you'll never see benchmarks comparing meatier tasks where functional languages start to excel.
However, you've correctly identified what is probably the single biggest bottleneck in functional languages today: their excessive allocation rates. Nice work!
The reasons why functional languages allocate so heavily can be split into historical and inherent reasons.
Historically, Lisp implementations have been doing a lot of boxing for 50 years now. This characteristic spread to many other languages which use Lisp-like intermediate representations. Over the years, language implementers have continually resorted to boxing as a quick fix for complications in language implementation. In object oriented languages, the default has been to always heap allocate every object even when it can obviously be stack allocated. The burden of efficiency was then pushed onto the garbage collector and a huge amount of effort has been put into building garbage collectors that can attain performance close to that of stack allocation, typically by using a bump-allocating nursery generation. I think that a lot more effort should be put into researching functional language designs that minimize boxing and garbage collector designs that are optimized for different requirements.
Generational garbage collectors are great for languages that heap allocate a lot because they can be almost as fast as stack allocation. But they add substantial overheads elsewhere. Today's programs are increasingly using data structures like queues (e.g. for concurrent programming) and these give pathological behaviour for generational garbage collectors. If the items in the queue outlive the first generation then they all get marked, then they all get copied ("evacuated"), then all of the references to their old locations get updated and then they become eligible for collection. This is about 3× slower than it needs to be (e.g. compared to C). Mark region collectors like Beltway (2002) and Immix (2008) have the potential to solve this problem because the nursery is replaced with a region that can either be collected as if it were a nursery or, if it contains mostly reachable values, it can be replaced with another region and left to age until it contains mostly unreachable values.
Despite the pre-existence of C++, the creators of Java made the mistake of adopting type erasure for generics, leading to unnecessary boxing. For example, I benchmarked a simple hash table running 17× faster on .NET than the JVM partly because .NET did not make this mistake (it uses reified generics) and also because .NET has value types. I actually blame Lisp for making Java slow.
All modern functional language implementations continue to box excessively. JVM-based languages like Clojure and Scala have little choice because the VM they target cannot even express value types. OCaml sheds type information early in its compilation process and resorts to tagged integers and boxing at run-time to handle polymorphism. Consequently, OCaml will often box individual floating point numbers and always boxes tuples. For example, a triple of bytes in OCaml is represented by a pointer (with an implicit 1-bit tag embedded in it that gets checked repeatedly at run-time) to a heap-allocated block with a 64 bit header and 192 bit body containing three tagged 63-bit integers (where the 3 tags are, again, repeatedly examined at run time!). This is clearly insane.
Some work has been done on unboxing optimizations in functional languages but it never really gained traction. For example, the MLton compiler for Standard ML was a whole-program optimizing compiler that did sophisticated unboxing optimizations. Sadly, it was before its time and the "long" compilation times (probably under 1s on a modern machine!) deterred people from using it.
The only major platform to have broken this trend is .NET but, amazingly, it appears to have been an accident. Despite having a Dictionary implementation very heavily optimized for keys and values that are of value types (because they are unboxed) Microsoft employees like Eric Lippert continue to claim that the important thing about value types is their pass-by-value semantics and not the performance characteristics that stem from their unboxed internal representation. Eric seems to have been proven wrong: more .NET developers seem to care more about unboxing than pass-by-value. Indeed, most structs are immutable and, therefore, referentially transparent so there is no semantic difference between pass-by-value and pass-by-reference. Performance is visible and structs can offer massive performance improvements. The performance of structs even saved Stack Overflow and structs are used to avoid GC latency in commercial software like Rapid Addition's!
The other reason for heavy allocation by functional languages is inherent. Imperative data structures like hash tables use huge monolithic arrays internally. If these were persistent then the huge internal arrays would need to be copied every time an update was made. So purely functional data structures like balanced binary trees are fragmented into many little heap-allocated blocks in order to facilitate reuse from one version of the collection to the next.
Clojure uses a neat trick to alleviate this problem when collections like dictionaries are only written to during initialization and are then read from a lot. In this case, the initialization can use mutation to build the structure "behind the scenes". However, this does not help with incremental updates and the resulting collections are still substantially slower to read than their imperative equivalents. On the up-side, purely functional data structures offer persistence whereas imperative ones do not. However, few practical applications benefit from persistence in practice so this is often not advantageous. Hence the desire for impure functional languages where you can drop to imperative style effortlessly and reap the benefits.
Does anyone know of a functional language appropriate for embedded / real-time applications, where memory allocation is kept to a minimum and the produced machine code is lean and fast?
Take a look at Erlang and OCaml if you haven't already. Both are reasonable for memory constrained systems but neither generate particularly great machine code.
Nothing is inherently anything. Here is an example where interpreted OCaml runs faster than equivalent C code, because the OCaml optimizer has different information available to it, due to differences in the language. Of course, it would be foolish to make a general claim that OCaml is categorically faster than C. The point is, it depends upon what you're doing, and how you do it.
That said, OCaml is an example of a (mostly) functional language which is actually designed for performance, in contrast to purity.
Functional languages require the elimination of mutable state that is visible at the level of the language abstraction. Therefore, data that would be mutated in place by an imperative language needs to be copied instead, with the mutation taking place on the copy. For a simple example, see a quick sort in Haskell vs. C.
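You can see the copying cost in one language by writing both styles side by side. This Python sketch mirrors the Haskell-vs-C contrast (the functional version allocates fresh lists on every call, the imperative one mutates a single array in place):

    def qsort_pure(xs):
        """Functional style: never mutate, always build new lists."""
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        return (qsort_pure([x for x in rest if x < pivot])
                + [pivot]
                + qsort_pure([x for x in rest if x >= pivot]))

    def qsort_inplace(xs, lo=0, hi=None):
        """Imperative style: swap elements within the one existing array."""
        if hi is None:
            hi = len(xs) - 1
        if lo >= hi:
            return
        pivot, i = xs[hi], lo
        for j in range(lo, hi):
            if xs[j] < pivot:
                xs[i], xs[j] = xs[j], xs[i]
                i += 1
        xs[i], xs[hi] = xs[hi], xs[i]
        qsort_inplace(xs, lo, i - 1)
        qsort_inplace(xs, i + 1, hi)

    data = [5, 3, 8, 1, 9, 2]
    print(qsort_pure(data))    # a brand new sorted list; `data` is untouched
    qsort_inplace(data)
    print(data)                # the same list, now sorted in place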
Furthermore, garbage collection is required because free() is not a pure function, as it has side effects. Therefore, the only way to free memory that does not involve side effects at the level of the language abstraction is with garbage collection.
Of course, in principle, a sufficiently smart compiler could optimize out much of this copying. This is already done to some degree, but making the compiler sufficiently smart to understand the semantics of your code at that level is just plain hard.
The short answer: because C is fast. As in, blazingly ridiculously crazy fast. A language simply doesn't have to be 'slow' to get its rear handed to it by C.
The reason why C is fast is that it was created by really great coders, and gcc has been optimized over the course of a couple more decades and by dozens more brilliant coders than 99% of languages out there.
In short, you're not going to beat C except for specialized tasks that require very specific functional programming constructs.
The control flow of procedural languages matches the actual processing patterns of modern computers much better.
C maps very closely onto the assembly code its compilation produces, hence the nickname "cross-platform assembly". Computer manufacturers have spent a few decades making assembly code run as fast as possible, so C inherits all of this raw speed.
In comparison, the side-effect-free, inherently parallel nature of functional languages does not map onto a single processor at all well. The arbitrary order in which functions can be invoked needs to be serialised down to the CPU bottleneck: without extremely clever compilation, you're going to be context switching all the time, none of the pre-fetching will work because you're constantly jumping all over the place, ... Basically, all the optimisation work that computer manufacturers have done for nice, predictable procedural languages is pretty much useless.
However! With the move towards lots of less powerful cores (rather than one or two turbo-charged cores), functional languages should begin to close the gap, as they naturally scale horizontally.
C is fast because it's basically a set of macros for assembler :) There is no "behind the scenes" when you are writing a program in C. You allocate memory when you decide it's time to do that, and you free it in the same fashion. This is a huge advantage when you are writing a real-time application, where predictability is important (more than anything else, actually).
Also, C compilers are generally extremely fast because the language itself is simple. It doesn't even do much type checking :) This also means that it is easier to make hard-to-find errors.
An advantage of the lack of type checking is that a function can be exported just by its name, for example, which makes C code easy to link with other languages' code.
Well Haskell is only 1.8 times slower than GCC's C++, which is faster than GCC's C implementation for typical benchmark tasks.
That makes Haskell very fast, even faster than C# (Mono, that is).
relative speed   Language (implementation)
1.0              C++ (GNU g++)
1.1              C (GNU gcc)
1.2              ATS
1.5              Java 6 (-server)
1.5              Clean
1.6              Pascal (Free Pascal)
1.6              Fortran (Intel)
1.8              Haskell (GHC)
2.0              C# (Mono)
2.1              Scala
2.2              Ada 2005 (GNAT)
2.4              Lisp (SBCL)
3.9              Lua (LuaJIT)
source
For the record I use Lua for Games on the iPhone, thus you could easily use Haskell or Lisp if you prefer, since they are faster.
As of now, functional languages aren't used heavily for industry projects, so not enough serious work goes into optimizers. Also, optimizing imperative code for an imperative target is probably way easier.
Functional languages have one feature that will let them outdo imperative languages really soon now: trivial parallelization.
Trivial not in the sense that it is easy, but that it can be built into the language environment, without the developer needing to think about it.
The cost of robust multithreading in a thread-agnostic language like C is prohibitive for many projects.
I disagree with tuinstoel. The important question is whether the functional language provides faster development time and results in faster code when it is used for what functional languages were meant to be used for. See the efficiency issues section on Wikipedia for a glimpse of what I mean.
One more reason for bigger executable size could be lazy evaluation and non-strictness. The compiler can't figure out at compile-time when certain expressions get evaluated, so some runtime gets stuffed into the executable to handle this (to call upon the evaluation of the so-called thunks). As for performance, laziness can be both good and bad. On one hand it allows for additional potential optimization, on the other hand the code size can be larger and programmers are more likely to make bad decisions, e.g. see Haskell's foldl vs. foldr vs. foldl' vs. foldr'.
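For readers who haven't met thunks, here is the gist of that machinery sketched in Python (Haskell's runtime does something conceptually similar, though far more efficiently, for every non-strict value):

    class Thunk:
        """Wrap an expression so it is evaluated at most once, on first demand."""
        def __init__(self, compute):
            self._compute = compute
            self._value = None
            self._forced = False

        def force(self):
            if not self._forced:
                self._value = self._compute()   # evaluate now
                self._forced = True
                self._compute = None            # drop the closure
            return self._value

    expensive = Thunk(lambda: sum(range(10_000_000)))  # nothing computed yet
    print(expensive.force())   # computed here, on first use
    print(expensive.force())   # cached result, returned immediately

Carrying objects like this around (instead of plain machine integers) is part of the code-size and performance cost the answer describes.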
