Obtaining source code from the binary - compilation

Shouldn't it be possible to obtain the source code from its binary?
Since compilation is the process of converting high level language (source code) into low level language (machine code), can't we just reverse the process in order to obtain the source code? If not, why?

Suppose I give you the number 3, and tell you that I obtained it by summing two numbers. Can you tell me the two numbers that 3 is a sum of? It's impossible, because sum is a one-way function - it's impossible to recover its arguments from its output. I could have obtained it from -55 and 58, even if for you 1 and 2 still works out to the same answer.
Compilation is similar. There's an infinite number of C++ programs that will generate any particular machine code output (more or less).
You can certainly reverse the compilation process and produce C or C++ code that would result, at least, in machine code with same semantics (meaning), but likely not byte-for-byte identical. Such tools exist to a varying degree.
So yes, it is possible, but because a lot of information from the source code necessarily has to be lost, the code you'll get back will not yield much insight into the design of the original source code. For any significantly-sized project, what you'd get back would be code that does the same thing, but is pretty much unreadable for a human. It'd be some very, very obfuscated C/C++.
Why is the information lost? Because the whole big deal with high-level languages is that they should be efficient for humans to deal with. The more high-level and human-comprehensible the source code is, the more it differs from the machine code that results when the compiler is done. As a software designer, your primary objective is to leverage the compiler and other code generating tools to transform high-level ideas and concepts into machine code. The larger the gap between the two, the more information about high-level design is lost.
Remember that the only thing that a compiler has to preserve is the semantics (meaning) of your code. As long as it appears that the code is doing what you meant it to do, everything is fine. Modern compilers can, for example, pre-execute parts of your code and only store the results of the operations, when it makes "sense" to do so according to some metric. Say that your entire program reads as follows:
#include <iostream>
#include <cmath>
int main() {
std::cout << sin(1)+cos(2) << std::endl;
return 0;
}
Assuming a Unix system, the compiler is perfectly within its right to produce machine code that executes two syscalls: a write to stdout, with a constant buffer, and an exit. Upon decompiling such code, you could get:
#include <unistd.h>
#include <cstdlib>
int main() {
write(0, "0.425324\n", 9);
exit(0);
}

Related

If you write a compiler in pure Prolog, will it work as a decompiler also?

If you write a compiler in pure Prolog (no extra-logical bits), will it work as a decompiler also?
(A book I was reading opined on this, but I wonder if anyone has actually tried it)
I once wrote the equivalent of cdecl.org as a reversible program. It was a bit tricky, but I demonstrated that it could be done. (Somewhere in a pile of papers is the source code; one of these days, I hope to publish it on github.) The code was 2 or 3 times as compact at some existing code that used tools such as yacc/lex (bison/flex).
For something like cdecl -- where you're translating between char ** const * const x and declare x as const pointer to const pointer to pointer to char, compiling/decompiling makes sense. But what does it mean to translate from arbitrary machine code to source code? Even translating between some IR and source code doesn't seem to make a lot of sense.
This question needs to be much more precise, as we don't know what a "compiler" is (an extraneous-information-dumping transformation from a graph - the program in language 1 - to another graph - the algorithmically equivalent graph in language 2, I suppose). It also not clear what "no-extra logical bits implies". If yo get rid of these, what kind of compilers can you still build?
Seen this way, compilation looks like pure deduction (Prolog running forward, or CHR) while decompilation looks like possibly very hard search (you will get a program among the gazillion possible ones but it won't be pleasant too look at and in no way resemble the one you had earlier). Someone who as a toolbox of theorems freshly in his mind can certainly say more.
But I would say not automagically, no. For one, there will be no guarantee that an infinite "recursion on the left" loop won't appear when "decompiling".

How to know what code has been optimized out?

Is there a way to get information during or after compilation about what parts of the code have been optimized out, but without looking at the assembly or executing the code.
It'd be nice to know immediately if a big code chunk gets optimized away.
Sorry, but your expectations do not match what compilers actually do. Whether you're trying to find dead code or to find bugs that cause code that should run to be skipped, it is not information that a compiler can provide in an easy-to-read form.
With a compiler that translates each line of source code into a sequence of machine instructions, the compiler could easily tell you that it didn't include anything corresponding to a particular line. Of course it couldn't tell you if a line was translated to machine instructions but those machine instructions in fact won't ever be executed — code reachability is undecidable — but I don't think that's what you're after anyway.
The problem is that modern optimizing compilers are a lot more complex than that. A piece of code is often copied around and compiled multiple times under different assumptions (specialization, partial evaluation, loop unrolling, …). Or, conversely, pieces of code can be merged together (function inlining, …). There isn't a simple correspondence between source code and machine code. (That's why debuggers sometimes have trouble reporting the exact source code location of a binary instruction.)
If a big chunk of code gets optimized away, that may simply because it's one of many specialized copies and that particular specialization never happens (e.g. there's separate code for x==0 and x!=0, and separate code for y==0 and y!=0, and x and y are never 0 together so the x==0 && y==0 branch is eventually dropped). It may be something generated by a compile-time conditional instruction, such as a C macro that the compiler optimizes; this happens so often in C code that if compilers reported all such instances, that would create a lot of false positives.
Getting useful reports of potentially unused code or suspicious-looking program code that could indicate a bug requires a rather different kind of static analysis than what compilers do. There are tools that can do that, but they're typically not the same tools that convert source code to optimized machine code. Making static analysis tools that both detect potential problems often enough to be useful and don't produce so many false positives that they're practically unusable is not easy.
gcc -S
will output the assembly code that would have been passed to the assembler (and eventually been linked into the executable). If you squint the right way (and are patient), you can work backwards from that to confirm whether a given bit of code has actually been included in the executable, or was optimized away.
Obviously not something you'd do unless you have a suspicion that something was going on, given the time and effort required...

Why is Befunge considered hard to compile?

One of the design goals of Befunge was to be hard to compile. However, it is quite easy to interpret. One can write an interpreter in a conventional language, say C. To translate a Befunge program to equivalent machine code, one can hard-code the Befunge code into the C interpreter, and compile the resulting C program to machine code. Or does "compile" mean something more restricted which excludes this translation?
To translate a Befunge program to equivalent machine code, one can hard-code the Befunge code into the C interpreter, and compile the resulting C program to machine code.
Yes, sure. This can be applied to any interpreter, esoteric language or not, and under some definitions this can be called a compiler.
But that's not what is meant in "compilation" in the context of Befunge - and I'd argue that calling this a "compiler" is very much missing the point of compilation, which is to convert code in some (higher) language to semantically equivalent code in some other (lower) language. No such conversion is being done here.
Under this definition, Befunge is indeed a hard language to convert in such a way, since given an instruction it's hard to know - at compile time - what the next instruction will be.
Befunge is impossible to really AOT compile due to p. In terms of JITs, it's a cakewalk compared to all those dynamic languages out there. I've worked on some fast implementations.
marsh gains it's speed by being a threaded interpreter. In order to speed up instruction dispatch it has to create 4 copies of the interpreter, for each direction. I optimize bounds checking & lookup by storing the program in a 80x32 space instead of a 80x25 space
bejit was my observation that the majority of program time is spent in moving around. bejit records a trace as it interprets, & if the same location is ever hit in the same direction we jump to an internal bytecode format that the trace recorded. When p performs a write on program source that we've traced, we drop all traces & return to the interpreter. In practice this executes stuff like mandel.bf 3x faster. It also opens up peephole optimization, where the tracer can apply constant propagation. This is especially useful in Befunge due to constants being built up out of multiple instructions
My python implementations compile the whole program before executing since Python's function's bytecode is immutable. This opens up possibility of whole program analysis
funge.py traces befunge instructions into CPython bytecode. It has to keep an int at the top of the stack to track stack height since CPython doesn't handle stack underflow. I was originally hoping to create a generic python bytecode optimizer, but I ended up realizing that it'd be more efficient to optimize in an intermediate format which lacked jump offsets. Besides that the common advice that arrays are faster than linked lists doesn't apply in CPython as much since arrays are arrays of pointers & a linked list will just be spreading those pointers out. So I created funge2.py
(wfunge.py is a port of funge.py in preparation for http://bugs.python.org/issue26647)
funge2.py traces instructions into a control flow graph. Unfortunately we don't get to have the static stack adjustments that the JVM & CIL demand, so optimizations are a bit harder. funge2.py does constant folding, loop unrolling, some stack depth tracking to reduce stack depth checks, & I'm in the process of adding more (jump to jump optimizations, smarter stack depth juggling, not-jump or jump-pop or dup-jump combining)
By the time funge2 gets to optimizing Befunge, it's a pretty simple IR
load const
binop (+, -, *, /, %, >)
not
pop
dup
swap
printint/printchar/printstr (the last for when constant folding makes these deterministic)
getint/getchar
readmem
writemem
jumprand
jumpif
exit
Which doesn't seem so hard to compile
Befunge is hard to compile due to the p and g commands. With these you can put and get commands during runtime, i.e. write self-altering code.
There is no way you can translate that directly to assembly, let alone binary code.
If you embed a Befunge-program into the interpreter code and compile that, you are still compiling the interpreter, not the Befunge-program...

AVR's Program memory

I ve written a code in C for ATmega128 and
I d like to know how the changes that I do in the code influence the Program Memory.
To be more specific, let's consider that the code is similar to that one:
d=fun1(a,b);
c=fun2(c,d);
the change that I do in the code is that I call the same functions more times e.g.:
d=fun1(a,b);
c=fun2(c,d);
h=fun1(k,l);
n=fun2(p,m);
etc...
I build the solution at the AtmelStudio 6.1 and I see the changes in the Program Memory.
Is there anyway to foresee, without builiding the solution, how the chages in the code will affect the program memory?
Thanks!!
Generally speaking this is next to impossible using C/C++ (that means the effort does not pay off).
In your simple case (the number of calls increase), you can determine the number of instructions for each call, and multiply by the number. This will only be correct, if the compiler does not inline in all cases, and does not apply optimzations at a higher level.
These calculations might be wrong, if you upgrade to a newer gcc version.
So normally you only get exact numbers when you compare two builds (same compiler version, same optimisations). avr-size and avr-nm gives you all information, for example to compare functions by size. You can automate this task (by converting the output into .csv files), and use a spreadsheet or diff to look for changes.
This method normally only pays off, if you have to squeeze a program into a smaller device (from 4k flash into 2k for example - you already have 128k flash, that's quite a lot).
This process is frustrating, because if you apply the same design pattern in C with small differences, it can lead to different sizes: So from C/C++, you cannot really predict what's going to happen.

Why are compilers so stupid?

I always wonder why compilers can't figure out simple things that are obvious to the human eye. They do lots of simple optimizations, but never something even a little bit complex. For example, this code takes about 6 seconds on my computer to print the value zero (using java 1.6):
int x = 0;
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
x += x + x + x + x + x;
}
System.out.println(x);
It is totally obvious that x is never changed so no matter how often you add 0 to itself it stays zero. So the compiler could in theory replace this with System.out.println(0).
Or even better, this takes 23 seconds:
public int slow() {
String s = "x";
for (int i = 0; i < 100000; ++i) {
s += "x";
}
return 10;
}
First the compiler could notice that I am actually creating a string s of 100000 "x" so it could automatically use s StringBuilder instead, or even better directly replace it with the resulting string as it is always the same. Second, It does not recognize that I do not actually use the string at all, so the whole loop could be discarded!
Why, after so much manpower is going into fast compilers, are they still so relatively dumb?
EDIT: Of course these are stupid examples that should never be used anywhere. But whenever I have to rewrite a beautiful and very readable code into something unreadable so that the compiler is happy and produces fast code, I wonder why compilers or some other automated tool can't do this work for me.
In my opinion, I don't believe it's the job of the compiler to fix what is, honestly, bad coding. You have, quite explicitly, told the compiler you want that first loop executed. It's the same as:
x = 0
sleep 6 // Let's assume this is defined somewhere.
print x
I wouldn't want the compiler removing my sleep statement just because it did nothing. You may argue that the sleep statement is an explicit request for a delay whereas your example is not. But then you will be allowing the compiler to make very high-level decisions about what your code should do, and I believe that to be a bad thing.
Code, and the compiler that processes it, are tools and you need to be a tool-smith if you want to use them effectively. How many 12" chainsaws will refuse to try cut down a 30" tree? How many drills will automatically switch to hammer mode if they detect a concrete wall?
None, I suspect, and this is because the cost of designing this into the product would be horrendous for a start. But, more importantly, you shouldn't be using drills or chainsaws if you don't know what you're doing. For example: if you don't know what kickback is (a very easy way for a newbie to take off their arm), stay away from chainsaws until you do.
I'm all for allowing compilers to suggest improvements but I'd rather maintain the control myself. It should not be up to the compiler to decide unilaterally that a loop is unnecessary.
For example, I've done timing loops in embedded systems where the clock speed of the CPU is known exactly but no reliable timing device is available. In that case, you can calculate precisely how long a given loop will take and use that to control how often things happen. That wouldn't work if the compiler (or assembler in that case) decided my loop was useless and optimized it out of existence.
Having said that, let me leave you with an old story of a VAX FORTRAN compiler that was undergoing a benchmark for performance and it was found that it was many orders of magnitude faster than its nearest competitor.
It turns out the compiler noticed that the result of the benchmark loops weren't being used anywhere else and optimized the loops into oblivion.
Oh, I don't know. Sometimes compilers are pretty smart. Consider the following C program:
#include <stdio.h> /* printf() */
int factorial(int n) {
return n == 0 ? 1 : n * factorial(n - 1);
}
int main() {
int n = 10;
printf("factorial(%d) = %d\n", n, factorial(n));
return 0;
}
On my version of GCC (4.3.2 on Debian testing), when compiled with no optimizations, or -O1, it generates code for factorial() like you'd expect, using a recursive call to compute the value. But on -O2, it does something interesting: It compiles down to a tight loop:
factorial:
.LFB13:
testl %edi, %edi
movl $1, %eax
je .L3
.p2align 4,,10
.p2align 3
.L4:
imull %edi, %eax
subl $1, %edi
jne .L4
.L3:
rep
ret
Pretty impressive. The recursive call (not even tail-recursive) has been completely eliminated, so factorial now uses O(1) stack space instead of O(N). And although I have only very superficial knowledge of x86 assembly (actually AMD64 in this case, but I don't think any of the AMD64 extensions are being used above), I doubt that you could write a better version by hand. But what really blew my mind was the code that it generated on -O3. The implementation of factorial stayed the same. But main() changed:
main:
.LFB14:
subq $8, %rsp
.LCFI0:
movl $3628800, %edx
movl $10, %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
xorl %eax, %eax
addq $8, %rsp
ret
See the movl $3628800, %edx line? gcc is pre-computing factorial(10) at compile-time. It doesn't even call factorial(). Incredible. My hat is off to the GCC development team.
Of course, all the usual disclaimers apply, this is just a toy example, premature optimization is the root of all evil, etc, etc, but it illustrates that compilers are often smarter than you think. If you think you can do a better job by hand, you're almost certainly wrong.
(Adapted from a posting on my blog.)
Speaking from a C/C++ point of view:
Your first example will be optimized by most compilers. If the java-compiler from Sun really executes this loop it's the compilers fault, but take my word that any post 1990 C, C++ or Fortran-compiler completely eliminates such a loop.
Your second example can't be optimized in most languages because memory allocation happens as a side-effect of concatenating the strings together. If a compiler would optimize the code the pattern of memory allocation would change, and this could lead to effects that the programmer tries to avoid. Memory fragmentation and related problems are issues that embedded programmers still face every day.
Overall I'm satisfied with the optimizations compilers can do these days.
Compilers are designed to be predictable. This may make them look stupid from time to time, but that's OK. The compiler writer's goals are
You should be able to look at your code and make reasonable predictions about its performance.
Small changes in the code should not result in dramatic differences in performance.
If a small change looks to the programmer like it should improve performance, it should at least not degrade performance (unless surprising things are happening in the hardware).
All these criteria militate against "magic" optimizations that apply only to corner cases.
Both of your examples have a variable updated in a loop but not used elsewhere. This case is actually quite difficult to pick up unless you are using some sort of framework that can combine dead-code elimination with other optimizations like copy propagation or constant propagation. To a simple dataflow optimizer the variable doesn't look dead. To understand why this problem is hard, see the paper by Lerner, Grove, and Chambers in POPL 2002, which uses this very example and explains why it is hard.
The HotSpot JIT compiler will only optimize code that has been running for some time. By the time your code is hot, the loop has already been started and the JIT compiler has to wait until the next time the method is entered to look for ways to optimize away the loop. If you call the method several times, you might see better performance.
This is covered in the HotSpot FAQ, under the question "I write a simple loop to time a simple operation and it's slow. What am I doing wrong?".
Seriously? Why would anyone ever write real-world code like that? IMHO, the code, not the compiler is the "stupid" entity here. I for one am perfectly happy that compiler writers don't bother wasting their time trying to optimize something like that.
Edit/Clarification:
I know the code in the question is meant as an example, but that just proves my point: you either have to be trying, or be fairly clueless to write supremely inefficient code like that. It's not the compiler's job to hold our hand so we don't write horrible code. It is our responsibility as the people that write the code to know enough about our tools to write efficiently and clearly.
Well, I can only speak of C++, because I'm a Java beginner totally. In C++, compilers are free to disregard any language requirements placed by the Standard, as long as the observable behavior is as-if the compiler actually emulated all the rules that are placed by the Standard. Observable behavior is defined as any reads and writes to volatile data and calls to library functions. Consider this:
extern int x; // defined elsewhere
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
x += x + x + x + x + x;
}
return x;
The C++ compiler is allowed to optimize out that piece of code and just add the proper value to x that would result from that loop once, because the code behaves as-if the loop never happened, and no volatile data, nor library functions are involved that could cause side effects needed. Now consider volatile variables:
extern volatile int x; // defined elsewhere
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
x += x + x + x + x + x;
}
return x;
The compiler is not allowed to do the same optimization anymore, because it can't prove that side effects caused by writing to x could not affect the observable behavior of the program. After all, x could be set to a memory cell watched by some hardware device that would trigger at every write.
Speaking of Java, I have tested your loop, and it happens that the GNU Java Compiler (gcj) takes in inordinate amount of time to finish your loop (it simply didn't finish and I killed it). I enabled optimization flags (-O2) and it happened it printed out 0 immediately:
[js#HOST2 java]$ gcj --main=Optimize -O2 Optimize.java
[js#HOST2 java]$ ./a.out
0
[js#HOST2 java]$
Maybe that observation could be helpful in this thread? Why does it happen to be so fast for gcj? Well, one reason surely is that gcj compiles into machine code, and so it has no possibility to optimize that code based on runtime behavior of the code. It takes all its strongness together and tries to optimize as much as it can at compile time. A virtual machine, however, can compile code Just in Time, as this output of java shows for this code:
class Optimize {
private static int doIt() {
int x = 0;
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
x += x + x + x + x + x;
}
return x;
}
public static void main(String[] args) {
for(int i=0;i<5;i++) {
doIt();
}
}
}
Output for java -XX:+PrintCompilation Optimize:
1 java.lang.String::hashCode (60 bytes)
1% Optimize::doIt # 4 (30 bytes)
2 Optimize::doIt (30 bytes)
As we see, it JIT compiles the doIt function 2 times. Based on the observation of the first execution, it compiles it a second time. But it happens to have the same size as bytecode two times, suggesting the loop is still in place.
As another programmer shows, execution time for certain dead loops even is increased for some cases for subsequently compiled code. He reported a bug which can be read here, and is as of 24. October 2008.
On your first example, it's an optimization that only works if the value is zero. The extra if statement in the compiler needed to look for this one rarely-seen clause may just not be worth it (since it'll have to check for this on every single variable). Furthermore, what about this:
int x = 1;
int y = 1;
int z = x - y;
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
z += z + z + z + z + z;
}
System.out.println(z);
This is still obviously the same thing, but now there's an extra case we have to code for in the compiler. There's just an infinite amount of ways that it can end up being zero that aren't worth coding in for, and I guess you could say that if you're going to have one of them you'd might as well have them all.
Some optimizations do take care of the second example you have posted, but I think I've seen it more in functional languages and not so much Java. The big thing that makes it hard in newer languages is monkey-patching. Now += can have a side-effect that means if we optimize it out, it's potentially wrong (e.g. adding functionality to += that prints out the current value will mean a different program altogether).
But it comes down to the same thing all over again: there's just too many cases you'd have to look for to make sure no side effects are being performed that will potentially alter the final program's state.
It's just easier to take an extra moment and make sure what you're writing is what you really want the computer to do. :)
Compilers in general are very smart.
What you must consider is that they must account for every possibly exception or situation where optimizing or re-factoring code could cause unwanted side-effects.
Things like, threaded programs, pointer aliasing, dynamically linked code and side effects (system calls/memory alloc) etc. make formally prooving refactoring very difficult.
Even though your example is simple, there still may be difficult situations to consider.
As for your StringBuilder argument, that is NOT a compilers job to choose which data structures to use for you.
If you want more powerful optimisations move to a more strongly typed language like fortran or haskell, where the compilers are given much more information to work with.
Most courses teaching compilers/optimisation (even acedemically) give a sense of appreciation about how making gerneral formally prooven optimisatons rather than hacking specific cases is a very difficult problem.
I think you are underestimating how much work it is to make sure that one piece of code doesn't affect another piece of code. With just a small change to your examples x, i, and s could all point to the same memory. Once one of the variables is a pointer, it is much harder to tell what code might have side effects depending on is point to what.
Also, I think people who program compliers would rather spend time making optimizations that aren't as easy for humans to do.
Because we're just not there yet. You could just as easily have asked, "why do I still need to write programs... why can't I just feed in the requirements document and have the computer write the application for me?"
Compiler writers spend time on the little things, because those are the types of things that application programmers tend to miss.
Also, they cannot assume too much (maybe your loop was some kind of ghetto time delay or something)?
It's an eternal arms race between compiler writers and programmers.
Non-contrived examples work great -- most compilers do indeed optimize away the obviously useless code.
Contrived examines will always stump the compiler. Proof, if any was needed, that any programmer is smarter than any program.
In the future, you'll need more contrived examples than the one's you've posted here.
As others have addressed the first part of your question adequately, I'll try to tackle the second part, i.e. "automatically uses StringBuilder instead".
There are several good reasons for not doing what you're suggesting, but the biggest factor in practice is likely that the optimizer runs long after the actual source code has been digested & forgotten about. Optimizers generally operate either on the generated byte code (or assembly, three address code, machine code, etc.), or on the abstract syntax trees that result from parsing the code. Optimizers generally know nothing of the runtime libraries (or any libraries at all), and instead operate at the instruction level (that is, low level control flow and register allocation).
Second, as libraries evolve (esp. in Java) much faster than languages, keeping up with them and knowing what deprecates what and what other library component might be better suited to the task would be a herculean task. Also likely an impossible one, as this proposed optimizer would have to precisely understand both your intent and the intent of each available library component, and somehow find a mapping between them.
Finally, as others have said (I think), the compiler/optimizer writer can reasonably assume that the programmer writing the input code is not brain-dead. It would be a waste of time to devote significant effort to asinine special cases like these when other, more general optimizations abound. Also, as others have also mentioned, seemingly brain-dead code can have an actual purpose (a spin lock, busy wait prior to a system-level yield, etc.), and the compiler has to honor what the programmer asks for (if it's syntactically and semantically valid).
Did you compile to release code? I think a good compiler detects in your second example that the string is never used an removes the entire loop.
Actually, Java should use string builder in your second example.
The basic problem with trying to optimize these examples away is that doing so would require theorem proving. Which means that the compiler would need to construct a mathematical proof of what you're code will actually do. And that's no small task at all. In fact, being able to prove that all code really does have an effect is equivalent to the halting problem.
Sure, you can come up with trivial examples, but the number of trivial examples is unlimited. You could always think of something else, so there is no way to catch them all.
Of course, it is possible for some code to be proven not to have any effect, as in your examples. What you would want to do is have the compiler optimize away every problem that can be proven unused in P time.
But anyway, that's a ton of work and it doesn't get you all that much. People spend a lot of time trying to figure out ways to prevent programs from having bugs in them, and type systems like those in Java and Scala are attempts to prevent bugs, but right now no one is using type systems to make statements about execution time, as far as I know.
You might want to look into Haskel, which I think has the most advanced theory proving stuff, although I'm not sure on that. I don't know it myself.
Mostly what you're complaining about is 'why are Java compiler so stupid', since most other language compilers are much smarter.
The reason for the stupidity of Java compilers is historical. First, the original java implementations were interpreter based, and performance was consisdered unimportant. Second, many of the original java benchmarks were problematic to optimize. I recall one benchmark that looked a lot like your second example. Unfortunately, if the compiler optimized the loop away, the benchmark would get a divide by zero exception when it tried to divide a baseline number by the elapsed time to compute its performance score. So when writing a optimizing java compiler, you had to be very careful NOT to optimize some things, as people would then claim your compiler was broken.
It's almost considered bad practice to optimize things like this when compiling down to JVM bytecode. Sun's javac does have some basic optimizations, as does scalac, groovyc, etc. In short, anything that's truely language-specific can get optimized within the compiler. However, things like this which are obviously so contrived as to be language agnostic will slip through simply out of policy.
The reason for this is it allows HotSpot to have a much more consistent view of the bytecode and its patterns. If the compilers start mucking about with edge cases, that reduces the VM's ability to optimize the general case which may not be apparent at compile time. Steve Yeggie likes to harp on about this: optimization is often easier when performed at runtime by a clever virtual machine. He even goes so far as to claim that HotSpot strips out javac's optimizations. While I don't know if this is true, it wouldn't surprise me.
To summarize: compilers targeting VMs have a very different set of criteria, particularly in the area of optimization and when it is appropriate. Don't go blaming the compiler writers for leaving the work to the far-more-capable JVM. As pointed out several times on this thread, modern compilers targeting the native architecture (like the gcc family) are extremely clever, producing obscenely fast code through some very smart optimizations.
I have never seen the point in dead code elimination in the first place. Why did the programmer write it?? If you're going to do something about dead code, declare it a compiler error! It almost certainly means the programmer made a mistake--and for the few cases it doesn't, a compiler directive to use a variable would be the proper answer. If I put dead code in a routine I want it executed--I'm probably planning to inspect the results in the debugger.
The case where the compiler could do some good is pulling out loop invariants. Sometimes clarity says to code the calculation in the loop and having the compiler pull such things out would be good.
Compilers that can do strict-aliasing optimizations, will optimize first example out. See here.
Second example can't be optimized because the slowest part here is memory allocation/reallocation and operator+= is redefined into a function that does the memory stuff. Different implementations of strings use different allocation strategies.
I myself also would rather like to have malloc(100000) than thousand malloc(100) too when doing s += "s"; but right now that thing is out of scope of compilers and has to be optimized by people. This is what D language tries to solve by introducing pure functions.
As mentioned here in other answers, perl does second example in less than a second because it allocates more memory than requested just in case more memory will be needed later.
In release mode VS 2010 C++ this doesnt take any time to run. However debug mode is another story.
#include <stdio.h>
int main()
{
int x = 0;
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
x += x + x + x + x + x;
}
printf("%d", x);
}
Absolute optimization is an undecidable problem, that means, there is no Turing machine (and, therefore, no computer program) that can yield the optimal version of ANY given program.
Some simple optimizations can be (and, in fact, are) done, but, in the examples you gave...
To detect that your first program always prints zero, the compiler would have to detect that x remains constant despite all the loop iterations. How can you explain (I know, it's not the best word, but I can't come up with another) that to a compiler?
How can the compiler know that the StringBuilder is the right tool for the job without ANY reference to it?
In a real-world application, if efficiency is critical in a part of your application, it must be written in a low-level language like C. (Haha, seriously, I wrote this?)
This is an example of procedural code v. functional code.
You have detailed a procedure for the compiler to follow, so the optimisations are going to be based around the procedure detailed and will minimise any side effects or not optimise where it will not be doing what you expect. This makes it easier to debug.
If you put in a functional description of what you want eg. SQL then you are giving the compiler a wide range of options to optimise.
Perhaps some type of code analysis would be able to find this type of issue or profiling at run-time, but then you will want to change the source to something more sensible.
Because compiler writers try add optimizations for things that matter (I hope) and that are measured in *Stone benchmarks (I fear).
There are zillions of other possible code fragments like yours, which do nothing and could be optimized with increasing effort on the compiler writer, but which are hardly ever encountered.
What I feel embarrassing is that even today most compilers generate code to check for the switchValue being greater than 255 for a dense or almost full switch on an unsigned character. That adds 2 instructions to most bytecode interpreter's inner loop.
I hate to bring this up on such an old question (how did I get here, anyway?), but I think part of this might be something of a holdout from the days of the Commodore 64.
In the early 1980s, everything ran on a fixed clock. There was no Turbo Boosting and code was always created for a specific system with a specific processor and specific memory, etc. In Commodore BASIC, the standard method for implementing delays looked a lot like:
10 FOR X = 1 TO 1000
20 NEXT : REM 1-SECOND DELAY
(Actually, in practice, it more closely resembled 10FORX=1TO1000:NEXT, but you know what I mean.)
If they were to optimize this, it would break everything—nothing would ever be timed. I don't know of any examples, but I'm sure there are lots of little things like this scattered through the history of compiled languages that prevented things from being optimized.
Admittedly, these non-optimizations aren't necessary today. There's probably, however, some unspoken rule among compiler developers not to optimize things like this. I wouldn't know.
Just be glad that your code is optimized somewhat, unlike code on the C64. Displaying a bitmap on the C64 could take up to 60 seconds with the most efficient BASIC loops; thus, most games, etc. were written in machine language. Writing games in machine language isn't fun.
Just my thoughts.
Premise: I studied compilers at university.
The javac compiler is extremely stupid and performs absolutely no optimization because it relies on the java runtime to do them. The runtime will catch that thing and optimize it, but it will catch it only after the function is executed a few thousand times.
If you use a better compiler (like gcc) enabling optimizations, it will optimize your code, because it's quite an obvious optimization to do.
A compiler's job is to optimize how the code does something, not what the code does.
When you write a program, you are telling the computer what to do. If a compiler changed your code to do something other than what you told it to, it wouldn't be a very good compiler! When you write x += x + x + x + x + x, you are explicitly telling the computer that you want it to set x to 6 times itself. The compiler may very well optimize how it does this (e.g. multiplying x by 6 instead of doing repeated addition), but regardless it will still calculate that value in some way.
If you don't want something to be done, don't tell someone to do it.
Compilers are as smart as we make them. I don't know too many programmers who would bother writing a compiler that would check for constructs such as the ones you used. Most concentrate on more typical ways to improve performance.
It is possible that someday we will have software, including compilers, that can actually learn and grow. When that day comes most, maybe all, programmers will be out of job.
The meaning of your two examples is pointless, useless and only made to fool the compiler.
The compiler is not capable (and should not be) to see the meaning of a method, a loop or a program. That is where you get into the picture. You create a method for a certain functionality/meaning, no matter how stupid it is. It's the same case for simple problems
or extreme complex programs.
In your case the compiler might optimize it, because it "thinks" it should be optimized
in another way but why stay there?
Extreme other situation. We have a smart compiler compiling Windows. Tons of code to compile. But if it's smart, it boils it down to 3 lines of code...
"starting windows"
"enjoy freecell/solitaire"
"shutting down windows"
The rest of the code is obsolete, because it's never used, touched, accessed.
Do we really want that?
It forces you (the programmer) to think about what you're writing. Forcing compilers to do your work for you doesn't help anyone: it makes the compilers much more complex (and slower!), and it makes you stupider and less attentive to your code.

Resources