rules for exploiting pypy performance speedup vs python - performance

I have written a python program and run it both with pypy and python
i have inserted some timing prints to measure the performance difference.
in some cases the speedup is 10X, in other there is no change.
Can anyone explain if there are rules to follow when writing a program
in order to exploit the potential speedup given by pypy?
E.G. avoid some syntax, prefer some data strcutures vs others....
I found a speedup between 12x and 1,5x

PyPy tends to be fast in pure numerically-intensive codes with hot loops dealing with (small) integers/float number as it can directly use native types instead of variable-sized dynamic integer/float objects. It will still be slower than a natively-compiled C/C++ code because it needs to check the types at runtime and compile the code at runtime.
PyPy does not like (large) dynamic codes. It uses a tracing just-in-time compiler that can track which part of the code are more likely to be executed and compile this path dynamically at runtime if it is executed often. When there are many path executed changing dynamically, the overhead of the JIT can be significant, and in the worst case, PyPy can choose not to compile any path. The thing is the PyPy fallback interpreter is slower than the one of CPython (due to the ability to trace and compile the code at runtime). Some dynamic features like frame introspection is supported but it is slow (since a code is not expected to use it massively).
PyPy is not fast for short-running scripts since the JIT has to compile the code at runtime and the overhead of the JIT or using the fallback interpreter (slower than CPython) can be higher than just interpreting the code with CPython for such scripts.
PyPy use a Garbage Collector (GC) as opposed to CPython which use Automatic Reference Counting (ARC). GCs can be faster to allocate/free many objects (especially small temporary objects), but they needs to track the object alive to know which one are dead and then free them. This means codes dealing with a huge amount of references and regular object allocations can actually be slower. This includes dynamic graph-based data structures and trees for example.
The C binding APIs (C extension and CTypes but not CFFI) tends to be slower than CPython (mainly because it has been designed for CPython in the first place). This means glue codes calling a lot of wrapped C function will actually be slower with PyPy. A lot of work has been done recently to significantly improve the performance of PyPy in this case, but AFAIK PyPy is still slower. An example of use case is operation on large Numpy arrays (for small ones, embedded JITs like Numba are certainly better), as well as the CSV and pickling packages.
String operations tends to be often slower, and especially string concatenation. One reason is that CPython use efficient algorithms for string operations that are pretty-well optimized and written in C at the expense of a large and complex code base. This is a significant work for the small PyPy teams which needs to reimplement this and maintain it with the additional complexity of the JIT and the GC. As a result, operations can be less well optimized. Regarding the concatenation, the inefficiency comes from the JIT which cannot optimize out intermediate copies. That being said, note that string-appending loops should be avoided anyway.
Generators tends to be slower than simple basic loops. The simpler, the better. One should not expect the JIT to perform complex expensive optimizations at runtime since the overhead of the JIT should not be too big compared to the rest of the code (and PyPy do not know the time taken by the code ahead of time).
Global variables are slow in CPython but not in PyPy. That being said, they should not be used for software engineering reasons anyway.
This is a pretty broad topic. There are many other interesting point to consider to evaluate the performance of a given Python code. For more information, please read:
https://www.pypy.org/performance.html (performance tips)
https://speed.pypy.org (benchmark)
https://www.pypy.org/blog (articles about the development of PyPy)

There are no hard and fast rules. There are some hints in the PyPy FAQ https://doc.pypy.org/en/latest/faq.html#how-fast-is-pypy. PyPy can JIT hot python code, and may be able to make lists, dictionaries, and tuples that store one type of object (int, float, string) more efficient.

Related

Java vs. Julia: Differences in JIT Compiling and Resulting Performance

I've recently started reading about JIT compilation. On another note, I've read that well-written Julia code often performs on-par with statically compiled languages (see, e.g., paragraph 2 of the introduction section of the Julia docs) while I've recurrently heard Java often does not. Why is that?
On the surface, they seem to have in common that they both run JIT-compiled bytecode in a VM (although I am aware that Java dynamically infers which code to JIT). While I can rationalize the performance difference in Julia vs. (purely) interpreted languages like (vanilla) Python, how come two JIT-compiled languages have such different reputations for performance? Speaking of performance, I am particularly referring to scientific computing applications.
Please note that this question is intentionally phrased broadly. I feel like its possible answers could give me insights into what defines fast Julia code, given the way Julia's compiler works in comparison to other JIT compiled languages.
While AFAIK there is currently one implementation of Julia, there are several implementations of Java and not all behave the same nor use the same technics internally. Thus it does not mean much to compare languages. For example, GCJ is a GNU compiler meant to compile Java codes to native ones (ie. no JIT nor bytecode). It is now a defunct project since the open-source JIT-based implementations super-seeded this project (AFAIK even performance-wise).
The primary reference Java VM implementation is HotSpot (made by Oracle). The JIT of HotSpot use an adaptative strategy for compiling functions so to reduce the latency of the compilation. The code can be interpreted for a short period of time and if it is executed many times, then the JIT use more aggressive optimizations with multiple levels. As a result hot loops are very aggressively optimized while glue code executed once is mostly interpreted. Meanwhile, Julia is based on the LLVM compiler stack capable of producing very efficient code (it is used by Clang which is a compiler used to compile C/C++ code to native one), but it is also not yet very well suited for very dynamic codes (it works but the latency is pretty big compared to other existing JIT implementations).
The thing is Java and Julia target different domains. Java is used for example on embedded systems where latency matters a lot. It is also use for GUI applications and Web servers. Introducing a high latency during the execution is not very reasonable. This is especially why Java implementation spent a lot of time in the past so to optimize the GC (Garbage Collector) in order to reduce the latency of collections. Julia mainly target HPC/scientific applications that do not care much about latency. The main goal of Julia is to minimize the wall-clock time and not the responsiveness of the application.
I've read that well-written Julia code often performs on-par with statically compiled languages
Well, optimizing JITs like the one of Julia or the one of HotSpot are very good nowadays to compile scalar codes in hot loops. Their weakness lies in the capability to apply high-level expensive computations. For example, optimizing compilers like ICC/PGI can use the polyhedral model so to completely rewrite loops and vectorize them efficiently using SIMD instructions. This is frequent in HPC (numerically intensive) applications but very rare in embedded/Web/GUI ones. The use of the best specific instructions on the available platform is not always great in most JIT implementations (eg. bit operations) though the situation is rapidly improving. On the other hand, JIT can outperform static compilers by using runtime informations. For example, they can assume a value is a constant and optimize expressions based on that (eg. a runtime-dependent stride of 1 of a multi-dimensional array do not need additional multiplications). Still, static compilers can do similar optimisation with profile-guided optimizations (unfortunately rarely used in practice).
However, there is a catch: languages likes C/C++ compiled natively have access to lower-level features barely available in Java. This is a bit different in Julia since the link with native language code is easier and inline assembly is possible (as pointed out by #OscarSmith) enabling skilled developers to write efficient wrappers. Julia and Java use a GC that can speed up a bit some unoptimized codes but also slow down a lot some others (typically code manipulating big data-structures with a lot of references likes trees and graphs, especially in parallel codes). This is why a C/C++ code can significantly outperform a Julia/Java code. While JIT implementations can sometime (but rarely) outperform static C/C++ compilers, no compilers are perfect and there are case where nearly all implementations perform poorly. The use of (low-level) intrinsics enable developers to implement possibly faster codes at the expense of a lower portability and a higher complexity. This is not rare for SIMD code since auto-vectorization is far from being great so far. More generally, the access to lower-level features (eg. operating system specific functions, parallel tools) help to write faster codes for skilled programmers.
Chosen algorithms and methods matters often far more than the target language implementation. The best algorithm/method in one language implementation may not be the best in another. Two best algorithms/methods of two different implementation are generally hard to compare (it is fair to compare only the performance of codes if one is is nearly impossible to maintain and is very hard/long to write without bugs?). This is partially why comparing language implementation is so difficult, even on a specific problem to solve.
(purely) interpreted languages like (vanilla) Python
While the standard implementation of Python is the CPython interpreter, there are fast JIT for Python like PyPy or Pyston.
Speaking of performance, I am particularly referring to scientific computing applications
Note that scientific computing applications is still quite broad. While physicist tends to write heavily numerically intensive applications operating on large contiguous arrays where the use of multiple threads and SIMD instruction is critical, biologist tends to write codes requiring very different optimizations. For example, genomic codes tends to do a lot of string matching operations. They also often make use of complex data-structures/algorithms (eg. phylogenetic tree, compression).
Some Java features like boxing are performance killers for such applications though there are often complex way to mitigate their cost.
You may be interested by this famous language benchmark:
Julia VS C-GCC (one can see that Julia and Java are slow for binary trees, as expected, certainly due to the GC, though the Java's GC is more efficient at the expense of a bigger memory usage)
Julia VS Java-OpenJDK
C-GCC VS C-Clang
As you can see in the benchmark, the fastest implementations are generally the more-complex and/or bigger ones using the best algorithms and lower-level methods/tricks.

Difference between compiled and interpreted languages?

What are the relative strengths and weaknesses of compiled and interpreted languages?
Neither approach has a clear advantage over the other - if one approach was always better, chances are that we'd start using it everywhere!
Generally speaking, compilers offer the following advantages:
Because they can see all the code up-front, they can perform a number of analyses and optimizations when generating code that makes the final version of the code executed faster than just interpreting each line individually.
Compilers can often generate low-level code that performs the equivalent of a high-level ideas like "dynamic dispatch" or "inheritance" in terms of memory lookups inside of tables. This means that the resulting programs need to remember less information about the original code, lowering the memory usage of the generated program.
Compiled code is generally faster than interpreted code because the instructions executed are usually just for the program itself, rather than the program itself plus the overhead from an interpreter.
Generally speaking, compilers have the following drawbacks:
Some language features, such as dynamic typing, are difficult to compile efficiently because the compiler can't predict what's going to happen until the program is actually run. This means that the compiler might not generate very good code.
Compilers generally have a long "start-up" time because of the cost of doing all the analysis that they do. This means that in settings like web browsers where it's important to load code fast, compilers might be slower because they optimize short code that won't be run many times.
Generally speaking, interpreters have the following advantages:
Because they can read the code as written and don't have to do expensive operations to generate or optimize code, they tend to start up faster than compilers.
Because interpreters can see what the program does as its running, interpreters can use a number of dynamic optimizations that compilers might not be able to see.
Generally speaking, interpreters have the following disadvantages:
Interpreters typically have higher memory usage than compilers because the interpreter needs to keep more information about the program available at runtime.
Interpreters typically spend some CPU time inside of the code for the interpreter, which can slow down the program being run.
Because interpreters and compilers have complementary strengths and weaknesses, it's becoming increasingly common for language runtimes to combine elements of both. Java's JVM is a good example of this - the Java code itself is compiled, and initially it's interpreted. The JVM can then find code that's run many, many times and compile it directly to machine code, meaning that "hot" code gets the benefits of compilation while "cold" code does not. The JVM can also perform a number of dynamic optimizations like inline caching to speed up performance in ways that compilers typically don't.
Many modern JavaScript implementations use similar tricks. Most JavaScript code is short and doesn't do all that much, so they typically start off using an interpreter. However, if it becomes clear that the code is being run repeatedly, many JS engines will compile the code - or at least, compile bits and pieces of it - and optimize it using standard techniques. The net result is that the code is fast at startup (useful for loading web pages quickly) but gets faster the more that it runs.
One last detail is that languages are not compiled or interpreted. Usually, C code is compiled, but there are C interpreters available that make it easier to debug or visualize the code that's being run (they're often used in introductory programming classes - or at least, they used to be.) JavaScript used to be thought of as an interpreted language until some JS engines started compiling it. Some Python implementations are purely interpreters, but you can get Python compilers that generate native code. Now, some languages are easier to compile or interpret than others, but there's nothing stopping you from making a compiler or interpreter for any particular programming language. There's a theoretical result called the Futamura projections that shows that anything that can be interpreted can be compiled, for example.
Because the Start up time it’s cheaper and you can read interpreters program as is makes it a no brainer to me for the interpreter

Performance of Google's Go?

So has anyone used Google's Go? I was wondering how the mathematical performance (e.g. flops) is compared to other languages with a garbage collector... like Java or .NET?
Has anyone investigated this?
Theoretical performance: The theoretical performance of pure Go programs is somewhere between C/C++ and Java. This assumes an advanced optimizing compiler and it also assumes the programmer takes advantage of all features of the language (be it C, C++, Java or Go) and refactors the code to fit the programming language.
Practical performance (as of July 2011): The standard Go compiler (5g/6g/8g) is currently unable to generate efficient instruction streams for high-performance numerical codes, so the performance will be lower than C/C++ or Java. There are multiple reasons for this: each function call has an overhead of a couple of additional instructions (compared to C/C++ or Java), no function inlining, average-quality register allocation, average-quality garbage collector, limited ability to erase bound checks, no access to vector instructions from Go, compiler has no support for SSE2 on 32-bit x86 CPUs, etc.
Bottom line: As a rule of thumb, expect the performance of numerical codes implemented in pure Go, compiled by 5g/6g/8g, to be 2 times lower than C/C++ or Java. Expect the performance to get better in the future.
Practical performance (September 2013): Compared to older Go from July 2011, Go 1.1.2 is capable of generating more efficient numerical codes but they remain to run slightly slower than C/C++ and Java. The compiler utilizes SSE2 instructions even on 32-bit x86 CPUs which causes 32-bit numerical codes to run much faster, most likely thanks to better register allocation. The compiler now implements function inlining and escape analysis. The garbage collector has also been improved but it remains to be less advanced than Java's garbage collector. There is still no support for accessing vector instructions from Go.
Bottom line: The performance gap seems sufficiently small for Go to be an alternative to C/C++ and Java in numerical computing, unless the competing implementation is using vector instructions.
The Go math package is largely written in assembler for performance.
Benchmarks are often unreliable and are subject to interpretation. For example, Robert Hundt's paper Loop Recognition in C++/Java/Go/Scala looks flawed. The Go blog post on Profiling Go Programs dissects Hundt's claims.
You're actually asking several different questions. First of all, Go's math performance is going to be about as fast as anything else. Any language that compiles down to native code (which arguably includes even JIT languages like .NET) is going to perform extremely well at raw math -- as fast as the machine can go. Simple math operations are very easy to compile into a zero-overhead form. This is the area where compiled (including JIT) languages have a advantage over interpreted ones.
The other question you asked was about garbage collection. This is, to a certain extent, a bit of a side issue if you're talking about heavy math. That's not to say that GC doesn't impact performance -- actually it impacts quite a bit. But the common solution for tight loops is to avoid or minimize GC sweeps. This is often quite simple if you're doing a tight loop -- you just re-use your old variables instead of constantly allocating and discarding them. This can speed your code by several orders of magnitude.
As for the GC implementations themselves -- Go and .NET both use mark-and-sweep garbage collection. Microsoft has put a lot of focus and engineering into their GC engine, and I'm obliged to think that it's quite good all things considered. Go's GC engine is a work in progress, and while it doesn't feel any slower than .NET's architecture, the Golang folks insist that it needs some work. The fact that Go's specification disallows destructors goes a long way in speeding things up, which may be why it doesn't seem that slow.
Finally, in my own anecdotal experience, I've found Go to be extremely fast. I've written very simple and easy programs that have stood up in my own benchmarks against highly-optimized C code from some long-standing and well-respected open source projects that pride themselves on performance.
The catch is that not all Go code is going to be efficient, just like not all C code is efficient. You've got to build it correctly, which often means doing things differently than what you're used to from other languages. The profiling blog post mentioned here several times is a good example of that.
Google did a study comparing Go to some other popular languages (C++, Java, Scala). They concluded it was not as strong performance-wise:
https://days2011.scala-lang.org/sites/days2011/files/ws3-1-Hundt.pdf
Quote from the Conclusion, about Go:
Go offers interesting language features, which also allow for a concise and standardized notation. The compilers for this language are still immature, which reflects in both performance and binary sizes.

How can JVM implementations like Jython and JRuby beat their native counterparts?

I was watching this video here, where Robert Nicholson discusses P8, an implementation of PHP on the JVM. At some point he mentions that they aim to surpass native PHP in performance some time in the future.
He mentions JRuby and Jython, which started out slower than their native counterparts, but eventually surpassed them. Quercus, another PHP interpreter on the JVM claims to be 4x faster than mod_php and is also worth of note.
Does that mean that the general idea that the JVM is slower than C is wrong, or are there flaws in the original C implementations?
Does that mean that the general idea that the JVM is slower than C is wrong, or are there >flaws in the original C implementations?
A bit of both
The JVM has been around for a long time and has made significant progress in efficiency. The garbage collection, jitting, caching and other areas are more advanced than in 'reference' implementations such as PHP.
Anyone taking a look under the hood of PHP will understand why efficiency gains are easy to achieve.
I am personally doubtful that the JVM can outperform the CPython however ... but I could be wrong ... I am, this is down to the JVM GC being faster, and IronPython too. Performance improvements may be a non-reliance on the C call stack such as in stackless Python. The Jython site states
Jython is approximately as fast as CPython--sometimes faster, sometimes slower. Because >most JVMs--certainly the fastest ones--do long running, hot code will run faster overtime.
Which I can appricate as fact as the JVM will reach C performance levels as caches generate and so on basically negate the higher level aspects to the VM implementation code (a large part of which is written in C anyway)
In many interpreted languages such as PHP and Python are just bridges to equivalent C calls and dives into machine code. In the JVM, the Jitter performs a similar function by reducing the bytecode to machine-code equivalents. Eventually, the intermediate representations such as the high-level syntax and bytecode are usually reduced to C-speed or faster CPU operations anyway ... so it is all the same, just more intermediate steps which only affects the latency to full efficiency when loading new code. There comes a point in RAM where you say "what is the real difference?" and the answer is only the process that gets it there and the final representation that determines the speed of stack winding, garbage collection algorithms, register usage and logic representation such as arithmetic.
It's not too hard. If you write your implementation in C you have to write your own GC, JIT and more (to be fast and efficient). To do that really good you need really smart people with a lot of experience and give them a lot of time.
I will go out on a limb here and say that the current implementation of PHP (not based not on the knowledge of the inner working but rather on the benchmarks I have seen and on stuff people who know more about PHP told me) is not state of the art. Facebook tries to address this but they do it in a uncommon way (because of there special needs and the typical use of PHP see http://www.stanford.edu/class/ee380/Abstracts/100505.html).
Summary:
So if somebody implements PHP in java (or on any fast VM) he doesn't need to write a super GC or JIT to be fast "only" a compiler (which can be simple).
There are some hints about what the virtual machine does here. For example, it looks like the Java Virtual Machine first checks which parts of the bytecode are executed most often and then compiles the relevant parts into native code (which then should then execute with similar speed as e.g. compiled C code).
By the way, does PHP compile into bytecode or is it just interpreted using an in-memory data structure ? By translating PHP first into bytecode executable by the Java virtual machine, one benefits automatically from the existing (language-agnostic) optimizations of bytecode execution.

Why are Interpreted Languages Slow?

I was reading about the pros and cons of interpreted languages, and one of the most common cons is the slowness, but why are programs in interpreted languages slow?
Native programs runs using instructions written for the processor they run on.
Interpreted languages are just that, "interpreted". Some other form of instruction is read, and interpreted, by a runtime, which in turn executes native machine instructions.
Think of it this way. If you can talk in your native language to someone, that would generally work faster than having an interpreter having to translate your language into some other language for the listener to understand.
Note that what I am describing above is for when a language is running in an interpreter. There are interpreters for many languages that there is also native linkers for that build native machine instructions. The speed reduction (however the size of that might be) only applies to the interpreted context.
So, it is slightly incorrect to say that the language is slow, rather it is the context in which it is running that is slow.
C# is not an interpreted language, even though it employs an intermediate language (IL), this is JITted to native instructions before being executed, so it has some of the same speed reduction, but not all of it, but I'd bet that if you built a fully fledged interpreter for C# or C++, it would run slower as well.
And just to be clear, when I say "slow", that is of course a relative term.
All answers seem to miss the real important point here. It's the detail how "interpreted" code is implemented.
Interpreted script languages are slower because their method, object and global variable space model is dynamic. In my opinion this is the real definition of of script language not the fact that it is interpreted. This requires many extra hash-table lookups on each access to a variable or method call. And its the main reason why they are all terrible at multithreading and using a GIL (Global Interpreter Lock). This lookups is where most of the time is spent. It is a painful random memory lookup, which really hurts when you get a L1/L2 cache-miss.
Google's Javascript Core8 is so fast and targeting almost C speed for a simple optimization: they take the object data model as fixed and create internal code to access it like the data structure of a native compiled program. When a new variable or method is added or removed then the whole compiled code is discarded and compiled again.
The technique is well explained in the Deutsch/Schiffman paper "Efficient Implementation of the Smalltalk-80 System".
The question why php, python and ruby aren't doing this is pretty simple to answer: the technique is extremely complicated to implement.
And only Google has the money to pay for JavaScript because a fast browser-based JavaScript interpreter is their fundamental need of their billion dollar business model.
Think of the interpeter as an emulator for a machine you don't happen to have
The short answer is that the compiled languages are executed by machine instructions whereas the interpreted ones are executed by a program (written in a compiled language) that reads either the source or a bytecode and then essentially emulates a hypothetical machine that would have run the program directly if the machine existed.
Think of the interpreted runtime as an emulator for a machine that you don't happen to actually have around at the moment.
This is obviously complicated by the JIT (Just In Time) compilers that Java, C#, and others have. In theory, they are just as good as "AOT" ("At One Time") compilers but in practice those languages run slower and are handicapped by needing to have the compiler around using up memory and time at the program's runtime. But if you say any of that here on SO be prepared to attract rabid JIT defenders who insist that there is no theoretical difference between JIT and AOT. If you ask them if Java and C# are as fast as C and C++, then they start making excuses and kind of calm down a little. :-)
So, C++ totally rules in games where the maximum amount of available computing can always be put to use.
On the desktop and web, information-oriented tasks are often done by languages with more abstraction or at least less compilation, because the computers are very fast and the problems are not computationally intensive, so we can spend some time on goals like time-to-market, programmer productivity, reliable memory-safe environments, dynamic modularity, and other powerful tools.
This is a good question, but should be formulated a little different in my opinion, for example: "Why are interpreted languages slower than compiled languages?"
I think it is a common misconception that interpreted languages are slow per se. Interpreted languages are not slow, but, depending on the use case, might be slower than the compiled version. In most cases interpreted languages are actually fast enough!
"Fast enough", plus the increase in productivity from using a language like Python over, for example, C should be justification enough to consider an interpreted language. Also, you can always replace certain parts of your interpreted program with a fast C implementation, if you really need speed. But then again, measure first and determine if speed is really the problem, then optimize.
In addition to the other answers there's optimization: when you're compiling a programme, you don't usually care how long it takes to compile - the compiler has lots of time to optimize your code. When you're interpreting code, it has to be done very quickly so some of the more clever optimizations might not be able to be made.
Loop a 100 times, the contents of the loop are interpreted 100 times into low level code.
Not cached, not reused, not optimised.
In simple terms, a compiler interprets once into low level code
Edit, after comments:
JIT is compiled code, not interpreted. It's just compiled later not up-front
I refer to the classical definition, not modern practical implementations
A simple question, without any real simple answer. The bottom line is that all computers really "understand" is binary instructions, which is what "fast" languages like C are compiled into.
Then there are virtual machines, which understand different binary instructions (like Java and .NET) but those have to be translated on the fly to machine instructions by a Just-In-Compiler (JIT). That is almost as fast (even faster in some specific cases because the JIT has more information than a static compiler on how the code is being used.)
Then there are interpreted languages, which usually also have their own intermediate binary instructions, but the interpreter functions much like a loop with a large switch statement in it with a case for every instruction, and how to execute it. This level of abstraction over the underlying machine code is slow. There are more instructions involved, long chains of function calls in the interpreter to do even simple things, and it can be argued that the memory and cache aren't used as effectively as a result.
But interpreted languages are often fast enough for the purposes for which they're used. Web applications are invariably bound by IO (usually database access) which is an order of magnitude slower than any interpreter.
From about.com:
An Interpreted language is processed
at runtime. Every line is read,
analysed, and executed. Having to
reprocess a line every time in a loop
is what makes interpreted languages so
slow. This overhead means that
interpreted code runs between 5 - 10
times slower than compiled code. The
interpreted languages like Basic or
JavaScript are the slowest. Their
advantage is not needing to be
recompiled after changes and that is
handy when you're learning to program.
The 5-10 times slower is not necessarily true for languages like Java and C#, however. They are interpreted, but the just-in-time compilers can generate machine language instructions for some operations, speeding things up dramatically (near the speed of a compiled language at times).
There is no such thing as an interpreted language. Any language can be implemented by an interpreter or a compiler. These days most languages have implementations using a compiler.
That said, interpreters are usually slower, because they need process the language or something rather close to it at runtime and translate it to machine instructions. A compiler does this translation to machine instructions only once, after that they are executed directly.
Yeah, interpreted languages are slow...
However, consider the following. I had a problem to solve. It took me 4 minutes to solve the problem in Python, and the program took 0.15 seconds to run. Then I tried to write it in C, and I got a runtime of 0.12 seconds, and it took me 1 hour to write it. All this because the practical way to solve problem in question was to use hashtables, and the hashtable dominated the runtime anyway.
Interpreted languages need to read and interpret your source code at execution time. With compiled code a lot of that interpretation is done ahead of time (at compilation time).
Very few contemporary scripting languages are "interpreted" these days; they're typically compiled on the fly, either into machine code or into some intermediate bytecode language, which is (more efficiently) executed in a virtual machine.
Having said that, they're slower because your cpu is executing many more instructions per "line of code", since many of the instructions are spent understanding the code rather than doing whatever the semantics of the line suggest!
Read this Pros And Cons Of Interpreted Languages
This is the relevant idea in that post to your problem.
An execution by an interpreter is
usually much less efficient then
regular program execution. It happens
because either every instruction
should pass an interpretation at
runtime or as in newer
implementations, the code has to be
compiled to an intermediate
representation before every execution.
For the same reason that it's slower to talk via translator than in native language. Or, reading with dictionary. It takes time to translate.
Update: no, I didn't see that my answer is the same as the accepted one, to a degree ;-)
Wikipedia says,
Interpreting code is slower than running the compiled code because the interpreter must analyze each statement in the program each time it is executed and then perform the desired action, whereas the compiled code just performs the action within a fixed context determined by the compilation. This run-time analysis is known as "interpretive overhead". Access to variables is also slower in an interpreter because the mapping of identifiers to storage locations must be done repeatedly at run-time rather than at compile time.
Refer this IBM doc,
Interpreted program must be translated each time it is executed, there is a higher overhead. Thus, an interpreted language is generally more suited to ad hoc requests than predefined requests.
In Java though it is considered as an interpreted language, It uses JIT (Just-in-Time) compilation which mitigate the above issue by using a caching technique to cache the compiled bytecode.
The JIT compiler reads the bytecodes in many sections (or in full, rarely) and compiles them dynamically into machine code so the program can run faster. This can be done per-file, per-function or even on any arbitrary code fragment; the code can be compiled when it is about to be executed (hence the name "just-in-time"), and then cached and reused later without needing to be recompiled.

Resources