How do JIT interpreters handle variable names?

Let's say I am to design a JIT interpreter that translates IL or bytecode to executable instructions at runtime. Every time a variable name is encountered in the code, the JIT interpreter has to translate that into the respective memory address, right?
What technique do JIT interpreters use in order to resolve variable references in a performant enough manner? Do they use hashing, are the variables compiled to addresses ahead of time, or am I missing something altogether?

There is a huge variety of possible answers to this question, just as there are a huge variety of answers to how to design a JIT in general.
But to take one example, consider the JVM. Java bytecode does not contain variable names at all, except in debugging/reflection metadata. Instead, the compiler assigns each local variable an "index" from 0 to 65535, and bytecode instructions use that index. The VM is free to make further optimizations if it wants to. For example, it may convert everything into SSA form and then compile it into machine code, in which case variables end up being turned into machine registers or fixed offsets in the stack frame, or are optimized away entirely.
Consider another example: CPython. Python actually maintains variable names at runtime, due to its high-level, flexible nature. However, the interpreter still performs a few optimizations. For example, classes with a __slots__ attribute allocate a fixed-size array for the fields and use a name -> index hashmap for dynamic lookups. Local variables get similar treatment: normal local variable accesses (those not going through reflection) are converted to a fixed index at "compile" time.
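As a concrete illustration of the CPython case, the standard-library dis module shows that local variable accesses are compiled to index-based instructions such as LOAD_FAST and STORE_FAST rather than name lookups. A minimal sketch:

import dis

def add(a, b):
    total = a + b   # locals are resolved to slot indices when the function is compiled
    return total

dis.dis(add)
# Depending on the Python version, the output shows LOAD_FAST / STORE_FAST style
# instructions carrying numeric slot indices: the names have already been turned
# into array offsets before the code ever runs.

class Point:
    __slots__ = ('x', 'y')   # fields get fixed slots instead of a per-instance dict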
So in short, the answer to
Do they use hashing, are the variables compiled to addresses ahead of time, or am I missing something altogether?
is yes.

Related

Does Fortran's use module command have a higher operation cost if there are more variables in the module?

Would this:
module variables
! 100 variables declared here
end
Be significantly more computationally expensive when used everywhere than if there are 5 modules with 20 variables each and only some are called in different places?
In other words, does the use statement iterate through all the contents or does it simply give access in a more efficient way regardless of what the specific contents are?
Thanks!
Fortran programs are (nearly always) compiled, so declaring variables should have no overhead at runtime, beyond the space they take on the heap/stack. While this generally has no significant impact on the performance of programs, there are a few (very rare) pathological cases. For example, declared variables can affect the memory alignment of other variables, and alignment can slightly affect the speed of loading variables from memory. The organization of the variables into modules does not introduce any additional overhead (as long as optimizations are enabled), since Fortran programs are compiled to monolithic binaries. I think it is a good idea not to worry about this unless you see regressions; one should focus on readability & maintainability first.

Best practices to determine stack usage in Ravenscar program

I am writing an Ada program using the Ravenscar subset (thus, I am aware of the number of tasks running at execution time). The code is compiled by gcc with the -fstack-check switch enabled. This should cause the program to raise a STORAGE_ERROR at runtime if any of my tasks exceeds its stack.
Ada allows setting the upper limit for those (task-specific) stacks in the specification of the respective task, like so:
pragma Storage_Size (Some_Value);
Now I was wondering what options I have to determine Some_Value. What I have heard of so far:
Do wild guesses until no STORAGE_ERROR is raised anymore. This is more or less what the OP suggests here.
Feed the output of -fstack-usage in there.
Use some gnat specific extensions as outlined here (how does this technically differ from item #2?).
Get a stack analyzer like gnatstack and let it do the work for you.
If I understand this correctly, all the above techniques are dynamic (i.e. they require the program to run in order to work). Are static approaches also conceivable? E.g. by restricting myself further through some of Ada's high-integrity options (such as No_Recursion; what else?).
Perhaps any of you can name some best practices to tackle this problem and/or extend/comment on my (surely incomplete) list.
Bonus question: What is the default size of a task's stack when the above pragma is not specified? GCC's docs only state this value depends on the runtime, without giving any concrete numbers.
You can generally check the stack space required by individual types with the 'Storage_Size attribute (which counts in bits).
Once you have tabulated this (you may need to round it up to whole words/double words), you can add up how much stack space is used by each declarative region, and then walk through your calls to find the maximum stack usage.
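If you do go that static route, the bookkeeping itself is simple. Below is a minimal sketch in Python (subprogram names and frame sizes are made up for illustration) of the "walk through your calls" step: given per-subprogram frame sizes and a call graph, it computes the worst-case stack depth, assuming there is no recursion.

# Hypothetical per-subprogram frame sizes in bytes (from -fstack-usage or hand tabulation)
frame_size = {"main_task": 512, "read_sensor": 128, "filter": 256, "log": 64}

# Hypothetical call graph: who calls whom
calls = {
    "main_task": ["read_sensor", "filter"],
    "read_sensor": ["log"],
    "filter": ["log"],
    "log": [],
}

def worst_case(sub):
    """Worst-case stack usage starting at 'sub' (assumes no recursion)."""
    deepest_callee = max((worst_case(c) for c in calls[sub]), default=0)
    return frame_size[sub] + deepest_callee

print(worst_case("main_task"))  # 512 + 256 + 64 = 832 bytes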

"GLOBAL could be very inefficient"

I am using (in Matlab) a global statement inside an if command, so that I import the global variable into the local namespace only if it is really needed.
The code analyzer warns me that "global could be very inefficient unless it is a top-level statement in its function". Thinking about possible internal implementation, I find this restriction very strange and unusual. I am thinking about two possibilities:
What this warning really means is "global is very inefficient on its own, so don't use it in a loop". In particular, using it inside an if, like I'm doing, is perfectly safe, and the warning is issued spuriously (and is poorly worded)
The warning is correct; Matlab uses some really unusual variable loading mechanism in the background, so it is really much slower to import global variables inside an if statement. In this case, I'd like to have a hint or a pointer to how this stuff really works, because I am interested and it seems to be important if I want to write efficient code in future.
Which one of these two explanations is correct? (or maybe neither is?)
Thanks in advance.
EDIT: to make it clearer: I know that global is slow (and apparently I can't avoid using it, as it is a design decision of an old library I am using); what I am asking is why the Matlab code analyzer complains about
if(foo==bar)
global baz
baz=1;
else
do_other_stuff;
end
but not about
global baz
if(foo==bar)
baz=1;
else
do_other_stuff;
end
I find it difficult to imagine a reason why the first should be slower than the second.
To supplement eykanal's post, this technical note gives an explanation of why global is slow.
... when a function call involves global variables, performance is even more inhibited. This is because to look for global variables, MATLAB has to expand its search space to the outside of the current workspace. Furthermore, the reason a function call involving global variables appears a lot slower than the others is that MATLAB Accelerator does not optimize such a function call.
I do not know the answer, but I strongly suspect this has to do with how memory is allocated and shared at runtime.
Be that as it may, I recommend reading the following two entries on the Mathworks blogs by Loren and Doug:
Writing deployable code, the very first thing he writes in that post
Top 10 MATLAB code practices that make me cry, #2 on that list.
Long story short, global variables are almost never the way to go; there are many other ways to accomplish variable sharing - some of which she discusses - which are more efficient and less error-prone.
The answer from Walter Roberson here
http://mathworks.com/matlabcentral/answers/19316-global-could-be-very-inefficient#answer_25760
[...] This is not necessarily more work if not done in a top-level command, but people would tend to put the construct in a loop, or in multiple non-exclusive places in conditional structures. It is a lot easier for a person writing mlint warnings to not have to add clarifications like, "Unless you can prove those "global" will only be executed once, in which case it isn't less efficient but it is still bad form"
supports my option (1).
Fact (from Matlab 2014 up until Matlab 2016a, and not using the parallel toolbox): often, the fastest code you can achieve in Matlab uses nested functions, sharing your variables between functions without passing them.
The next best step is using global variables and splitting your project up into multiple files. This may pull performance down slightly, because Matlab (supposedly, although I have never seen it verified in any tests) incurs overhead when retrieving from the global workspace, and because there is (supposedly, although I have never seen any evidence of it) some kind of problem with the JIT acceleration.
In my own testing, passing very large data matrices (hi-res images) between calls to functions, nested functions and global variables were almost identical in performance.
The reason you can get superior performance with global variables or nested functions is that you avoid extra data copying that way. If you pass a variable to a function, Matlab does so by reference, but if you modify the variable inside the function, Matlab makes a copy on the fly (copy-on-write). There is no way I know of to avoid that in Matlab, except through nested functions and global variables. Any small loss from hindering the JIT or from global fetch times is more than regained by avoiding this extra data copying (when working with larger data).
This may have changed with newer versions of Matlab, but from what I hear from friends, I doubt it. I can't run any tests; I don't have a Matlab license anymore.
As proof, look no further than this video-processing toolbox I made back when I was working with Matlab. It is horribly ugly under the hood, because I had no way of getting performance without globals.
This fact about Matlab (that global variables are the most optimized way to code when you need to modify large data in different functions) is an indication that the language and/or interpreter needs to be updated.
Instead, Matlab could use a better, more dynamic notion of workspace. But nothing I have seen indicates this will ever happen, especially when the community of users seemingly ignores the facts and pushes forward opinions without any basis, such as "using globals in Matlab is slow".
They are not.
That said, you shouldn't use globals, ever. If you are forced to do real-time video processing in pure Matlab, and you find you have no option other than globals to reach the required performance, you should take the hint and change language. It's time to get into higher-performance languages... and also maybe write an occasional rant on Stack Overflow, in hopes that Matlab can be improved by swaying the opinions of its users.

what are the consequences of having unused functions

I'm wondering what / if any consequences there are in having unused functions in code?
If you hunt down and remove all unused functions and variables, would there be any perceivable improvement in performance?
Or is it just good practice to remove unused functions and variables?
Unused functions can't harm performance, but they make the job harder for the people maintaining the code. Modern IDEs keep track of unused functions/methods and variables. If that's not the case with the technology you are using, maintainers will have to deal with unused code, thinking it's necessary.
Depending on your compiler/linker, it may have no cost at all (and may even be removed automatically), or give a small penalty because the code is bigger and causes more cache misses. But I'd expect it to be a very minor difference.
Edit: removal cannot be done automatically when there is a chance that other code will call it, i.e. library code or another binary that can later be reused. It is also language dependent - if you write JavaScript, everything will get loaded and probably parsed, so the penalty is much bigger than in compiled languages.
There is also a security issue: if an attacker can control execution of your application (buffer overflow, cross-site scripting, etc.), code fragments in memory make it easier to achieve something significant (especially if those code fragments access privileged resources such as registry keys and files).
In most languages unused functions will not have any measurable performance impact on execution. Unused functions will affect the code/binary size. In Javascript this affects the download time and some parsing time.
Unused variables might affect performance a little bit, since they cause some memory allocation. But the overhead of an unused variable here and there is probably not measurable either.
The big benefit of removing unused code is better control during development. If you make a change, you don't need to go through lots of unused code to check whether it might be affected.
OCaml and Haskell warn you of unused functions/variables on the assumption that if you defined them it must have been for a reason, and not using them may indicate a typo somewhere else in the code (e.g. calling a similarly named function instead). For the benefit of this additional help, I try to avoid or at least comment out things that I don't use.
A good compiler will simply optimize away unused code, so there is no penalty at runtime.
Just a good practice.
Nearly every compiler/linker will skip unused code when compiling with optimizations turned on.
It will increase the compile time, but the final binary (or library) will not grow, because all unused symbols should be stripped.
As already mentioned, there is no run-time penalty.

Why are Interpreted Languages Slow?

I was reading about the pros and cons of interpreted languages, and one of the most common cons is the slowness, but why are programs in interpreted languages slow?
Native programs run using instructions written for the processor they run on.
Interpreted languages are just that, "interpreted". Some other form of instruction is read, and interpreted, by a runtime, which in turn executes native machine instructions.
Think of it this way. If you can talk in your native language to someone, that would generally work faster than having an interpreter having to translate your language into some other language for the listener to understand.
Note that what I am describing above applies when a language is running in an interpreter. Many languages have both interpreters and native compilers/linkers that build native machine instructions. The speed reduction (whatever its size) only applies to the interpreted context.
So, it is slightly incorrect to say that the language is slow, rather it is the context in which it is running that is slow.
C# is not an interpreted language: even though it employs an intermediate language (IL), this is JITted to native instructions before being executed, so it has some of the same speed reduction, but not all of it. Still, I'd bet that if you built a fully fledged interpreter for C# or C++, it would run slower as well.
And just to be clear, when I say "slow", that is of course a relative term.
All the answers seem to miss the really important point here: the detail of how "interpreted" code is implemented.
Interpreted scripting languages are slower because their method, object and global-variable space model is dynamic. In my opinion this, not the fact that they are interpreted, is the real defining feature of a scripting language. It requires many extra hash-table lookups on each access to a variable or each method call, and it's the main reason why they are all terrible at multithreading and use a GIL (Global Interpreter Lock). These lookups are where most of the time is spent: they are painful random memory accesses, which really hurt when you get an L1/L2 cache miss.
Google's V8 JavaScript engine is so fast, approaching C speed, because of one key optimization: it treats the object data model as fixed and generates internal code to access it like the data structures of a natively compiled program. When a new variable or method is added or removed, the compiled code is discarded and compiled again.
The technique is well explained in the Deutsch/Schiffman paper "Efficient Implementation of the Smalltalk-80 System".
The question of why PHP, Python and Ruby aren't doing this is simple to answer: the technique is extremely complicated to implement.
And only Google has the money to pay for this kind of JavaScript engine, because a fast browser-based JavaScript engine is fundamental to their billion-dollar business model.
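To make the idea above concrete, here is a toy sketch in Python (all names are invented) of the shape/hidden-class trick: objects that share the same set of field names share one layout table, so a field access becomes a fixed array index instead of a per-object hash lookup. Real engines also cache the index at each access site and throw away compiled code when a shape changes; those parts are omitted here.

# One shared layout ("hidden class") per distinct set of field names.
layouts = {}

def layout_for(field_names):
    key = tuple(field_names)
    if key not in layouts:
        layouts[key] = {name: i for i, name in enumerate(key)}  # name -> slot index
    return layouts[key]

class ShapedObject:
    def __init__(self, **fields):
        self.layout = layout_for(fields)          # hash lookups happen once, here
        self.slots = [None] * len(self.layout)    # flat storage, like a C struct
        for name, value in fields.items():
            self.slots[self.layout[name]] = value

    def get(self, name):
        return self.slots[self.layout[name]]      # an engine would cache this index inline

p = ShapedObject(x=1, y=2)
q = ShapedObject(x=3, y=4)               # same field names -> same layout object as p
print(p.get("y"), q.layout is p.layout)  # 2 True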
Think of the interpreter as an emulator for a machine you don't happen to have
The short answer is that the compiled languages are executed by machine instructions whereas the interpreted ones are executed by a program (written in a compiled language) that reads either the source or a bytecode and then essentially emulates a hypothetical machine that would have run the program directly if the machine existed.
Think of the interpreted runtime as an emulator for a machine that you don't happen to actually have around at the moment.
This is obviously complicated by the JIT (just-in-time) compilers that Java, C#, and others have. In theory they are just as good as "AOT" ("ahead-of-time") compilers, but in practice those languages run slower and are handicapped by needing to have the compiler around, using up memory and time at the program's runtime. But if you say any of that here on SO, be prepared to attract rabid JIT defenders who insist that there is no theoretical difference between JIT and AOT. If you ask them whether Java and C# are as fast as C and C++, they start making excuses and kind of calm down a little. :-)
So, C++ totally rules in games where the maximum amount of available computing can always be put to use.
On the desktop and web, information-oriented tasks are often done by languages with more abstraction or at least less compilation, because the computers are very fast and the problems are not computationally intensive, so we can spend some time on goals like time-to-market, programmer productivity, reliable memory-safe environments, dynamic modularity, and other powerful tools.
This is a good question, but in my opinion it should be formulated a little differently, for example: "Why are interpreted languages slower than compiled languages?"
I think it is a common misconception that interpreted languages are slow per se. Interpreted languages are not slow, but, depending on the use case, might be slower than the compiled version. In most cases interpreted languages are actually fast enough!
"Fast enough", plus the increase in productivity from using a language like Python over, for example, C should be justification enough to consider an interpreted language. Also, you can always replace certain parts of your interpreted program with a fast C implementation, if you really need speed. But then again, measure first and determine if speed is really the problem, then optimize.
In addition to the other answers there's optimization: when you're compiling a programme, you don't usually care how long it takes to compile - the compiler has lots of time to optimize your code. When you're interpreting code, it has to be done very quickly so some of the more clever optimizations might not be able to be made.
Loop 100 times, and the contents of the loop are interpreted 100 times into low-level code.
Not cached, not reused, not optimised.
In simple terms, a compiler translates it once into low-level code.
Edit, after comments:
JIT is compiled code, not interpreted. It's just compiled later, not up-front.
I refer to the classical definition, not modern practical implementations
A simple question, without any really simple answer. The bottom line is that the only thing computers really "understand" is binary machine instructions, which is what "fast" languages like C are compiled into.
Then there are virtual machines, which understand different binary instructions (like Java and .NET bytecode), but those have to be translated on the fly into machine instructions by a just-in-time compiler (JIT). That is almost as fast (even faster in some specific cases, because the JIT has more information than a static compiler about how the code is being used).
Then there are interpreted languages, which usually also have their own intermediate binary instructions, but the interpreter functions much like a loop with a large switch statement in it with a case for every instruction, and how to execute it. This level of abstraction over the underlying machine code is slow. There are more instructions involved, long chains of function calls in the interpreter to do even simple things, and it can be argued that the memory and cache aren't used as effectively as a result.
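A toy version of that "loop with a large switch statement" looks like the sketch below (Python, with an invented three-instruction bytecode). Every iteration pays for dispatch and operand decoding on top of the actual work, which is exactly the overhead a compiler removes by emitting the machine instructions directly.

# Invented bytecode: ("PUSH", n) pushes a constant, ("ADD",) adds the top two
# stack values, ("PRINT",) pops and prints.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PRINT",)]

def run(program):
    stack = []
    for instr in program:                 # the dispatch loop
        op = instr[0]
        if op == "PUSH":                  # the "big switch": one case per opcode
            stack.append(instr[1])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "PRINT":
            print(stack.pop())
        else:
            raise ValueError(f"unknown opcode {op}")

run(program)   # prints 5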
But interpreted languages are often fast enough for the purposes for which they're used. Web applications are invariably bound by IO (usually database access) which is an order of magnitude slower than any interpreter.
From about.com:
An interpreted language is processed at runtime. Every line is read, analysed, and executed. Having to reprocess a line every time in a loop is what makes interpreted languages so slow. This overhead means that interpreted code runs between 5 - 10 times slower than compiled code. The interpreted languages like Basic or JavaScript are the slowest. Their advantage is not needing to be recompiled after changes, and that is handy when you're learning to program.
The 5-10 times slower is not necessarily true for languages like Java and C#, however. They are interpreted, but the just-in-time compilers can generate machine language instructions for some operations, speeding things up dramatically (near the speed of a compiled language at times).
There is no such thing as an interpreted language. Any language can be implemented by an interpreter or a compiler. These days most languages have implementations using a compiler.
That said, interpreters are usually slower, because they need to process the language (or something rather close to it) at runtime and translate it into machine instructions. A compiler does this translation to machine instructions only once, after which they are executed directly.
Yeah, interpreted languages are slow...
However, consider the following. I had a problem to solve. It took me 4 minutes to solve it in Python, and the program took 0.15 seconds to run. Then I tried to write it in C: I got a runtime of 0.12 seconds, and it took me an hour to write. All this because the practical way to solve the problem in question was to use hashtables, and the hashtable dominated the runtime anyway.
Interpreted languages need to read and interpret your source code at execution time. With compiled code a lot of that interpretation is done ahead of time (at compilation time).
Very few contemporary scripting languages are "interpreted" these days; they're typically compiled on the fly, either into machine code or into some intermediate bytecode language, which is (more efficiently) executed in a virtual machine.
Having said that, they're slower because your cpu is executing many more instructions per "line of code", since many of the instructions are spent understanding the code rather than doing whatever the semantics of the line suggest!
Read this Pros And Cons Of Interpreted Languages
This is the idea in that post relevant to your problem.
An execution by an interpreter is usually much less efficient than regular program execution. It happens because either every instruction has to be interpreted at runtime or, as in newer implementations, the code has to be compiled to an intermediate representation before every execution.
For the same reason that it's slower to talk via a translator than in your native language, or to read with a dictionary. It takes time to translate.
Update: no, I didn't see that my answer is the same as the accepted one, to a degree ;-)
Wikipedia says,
Interpreting code is slower than running the compiled code because the interpreter must analyze each statement in the program each time it is executed and then perform the desired action, whereas the compiled code just performs the action within a fixed context determined by the compilation. This run-time analysis is known as "interpretive overhead". Access to variables is also slower in an interpreter because the mapping of identifiers to storage locations must be done repeatedly at run-time rather than at compile time.
Refer to this IBM doc:
An interpreted program must be translated each time it is executed, so there is higher overhead. Thus, an interpreted language is generally better suited to ad hoc requests than to predefined requests.
Though Java is considered an interpreted language, it uses JIT (just-in-time) compilation, which mitigates the above issue by caching the generated machine code.
The JIT compiler reads the bytecodes in many sections (or in full, rarely) and compiles them dynamically into machine code so the program can run faster. This can be done per-file, per-function or even on any arbitrary code fragment; the code can be compiled when it is about to be executed (hence the name "just-in-time"), and then cached and reused later without needing to be recompiled.
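The "compile once, cache, reuse" idea can be sketched with Python's built-in compile() and exec(): a fragment's source is translated to a code object the first time it is seen, and the cached code object is reused on later runs. (The helper names below are made up, and this produces bytecode rather than machine code, but the caching pattern is the same one a JIT applies to its native output.)

_code_cache = {}

def run_fragment(source, env):
    """Compile 'source' on first use, then reuse the cached code object."""
    code = _code_cache.get(source)
    if code is None:
        code = compile(source, "<fragment>", "exec")   # the expensive step
        _code_cache[source] = code
    exec(code, env)

env = {"x": 40}
run_fragment("y = x + 2", env)   # compiles and runs
run_fragment("y = x + 2", env)   # reuses the cached code object
print(env["y"])                  # 42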
