Is this an F# compiler trick? - visual-studio-2010

I wrote a program that calculates the integral of an expression in one variable, such as "x", "x+1" or "sin(x)".
When I debug it in Visual Studio, the calculation is very slow.
But when I run the output executable directly (in bin/debug/), it is fast: the result appears almost immediately and, of course, matches the result I got in the debugger.
Please tell me what the trick is. Why does this happen?
And if you can, please point me to an algorithm for calculating the integral of an arbitrary expression in one variable.
Thanks a lot, I love Stack Exchange so much ^^

It's normal for a program to run slower in debug mode than in normal free execution. There are no tricks.

If you run without debugging from Visual Studio you will also get good performance. The reason debugging slows your program down is two-fold.
First, there are several optimizations that the JITer can do but will not do when a debugger is attached. These optimizations would make stepping through the code impossible, even though the optimized code produces equivalent results.
For instance, the following loop could potentially be removed completely by the JIT (everything except the definition of the array):
int[] data = new int[10000];
for (int i = 0; i < data.Length; i++)
    data[i] = 0;
The reason is that the constructor for the array already clears the contents, so this operation is redundant. However, when debugging you may want to step through the code, so it is preserved.
Second, in performance-critical sections you will often see something like a 50% slowdown because every other operation is a NOP, which does nothing but still takes a clock cycle. These NOPs are what enable breakpoints to function, but they would not be emitted by the JIT if a debugger were not attached.
Note that what I am saying is a generalization; the actual interactions of the compiler and JIT are a bit more complex.

For your second question, numerical integration (including the multivariate case), you can have a look at Monte Carlo integration.
If the function to be integrated is a probability distribution, you may want to read about the Gibbs sampling method.
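As a rough illustration of the Monte Carlo idea for a single variable, here is a minimal Java sketch. It is not the asker's program: the class name, method name and the explicit seed parameter are assumptions made for this example. It estimates the integral of f over [a, b] as (b - a) times the average of f at uniformly random points.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

/** Minimal Monte Carlo integration sketch (hypothetical names, not from the thread). */
final class MonteCarloIntegrator {
    /**
     * Estimates the integral of f over [a, b] by averaging f at
     * uniformly distributed random points and scaling by (b - a).
     */
    static double integrate(DoubleUnaryOperator f, double a, double b,
                            int samples, long seed) {
        Random rng = new Random(seed); // fixed seed keeps runs reproducible
        double sum = 0.0;
        for (int i = 0; i < samples; i++) {
            double x = a + (b - a) * rng.nextDouble(); // uniform point in [a, b)
            sum += f.applyAsDouble(x);
        }
        return (b - a) * sum / samples;
    }
}
```

For example, MonteCarloIntegrator.integrate(Math::sin, 0.0, Math.PI, 200000, 42L) should land close to the exact value 2. The error shrinks only with the square root of the sample count, so for smooth one-dimensional integrands a deterministic rule (trapezoid, Simpson) is usually the more efficient choice; Monte Carlo pays off mainly in higher dimensions.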

Related

How to analyze the computational overhead of a Processing Sketch

Please forgive me if the terms are inaccurate or this is the wrong forum.
Essentially I write sketches in Processing and I am struggling to find out why my code runs slow.
Sometimes a sketch runs fast and I have no idea why, other than that there are fewer lines of code. Sometimes a different sketch runs slow and I have no idea why.
I am curious if there is a way within the Processing IDE, or maybe a general tool, to determine or analyze which lines of code are causing the sketch to run slow?
Such as a way to isolate "Oh, these lines are causing it to run the slowest. Looks like it's a section of this loop. Maybe I should concentrate on improving this function rather than searching for a needle in a haystack."
Similar to how when my computer is running slow I can use task manager to take a look at which programs are running slow and then adjust. I don't just guess. Or develop an unfounded penchant for quitting one program over another.
I could of course upload my sketch, but this is an example-independent problem I am trying to get better at solving. I would like to find a way to analyze all sketches, or even code in different languages, etc.
If there is no general tool for analyzing a Processing sketch how do people go about this? Surely there must be a better method than trial and error, brute force, intuition, or otherwise. Of course those methods could yield better running code but there must be some formal analysis.
Even if you didn't have anything specific to share for Processing any suggestions on search terms, subjects, or topics would be appreciated as I have no idea how to proceed other than the brute force/trial and error method.
Thank you,
Without an example code snippet I can only provide a couple of general approaches.
The simplest thing you could do is time sections of code and add print statements (a poor man's profiler). The idea is to take a time snapshot before running a function/block of code you suspect is slow, take another one after, then print the difference. The chunks of code with the largest time difference are what you want to focus on. Here's a minimal example, assuming doSomethingInteresting() is a function that you suspect is slow:
// get the current millis since the sketch started
int before = millis();
// call the function here
doSomethingInteresting();
// get the millis again, after the function and calculate the time difference
int diff = millis() - before;
println("executed in ~" + diff + "ms");
If you use Processing's default Java mode you can use VisualVM to sample your PApplet. In your case you will want to sample CPU usage, and the nice thing about VisualVM is that you will see the results sorted by the slowest function calls (the first you should improve), and you can drill down to see what is your code versus what is part of the runtime or other native parts of code.
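If you end up timing several sections this way, the before/after bookkeeping can be wrapped in a small helper. The following is only a sketch in plain Java: SectionTimer and its methods are hypothetical names, not part of Processing, and it uses System.nanoTime() rather than millis() so it also works outside a sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Poor man's profiler sketch: accumulates elapsed time per label (hypothetical helper). */
final class SectionTimer {
    private final Map<String, Long> totals = new LinkedHashMap<>();

    /** Runs the section and adds its elapsed nanoseconds to the total for the label. */
    void time(String label, Runnable section) {
        long before = System.nanoTime();
        section.run();
        long elapsed = System.nanoTime() - before;
        totals.merge(label, elapsed, Long::sum);
    }

    /** Returns the accumulated nanoseconds for a label, or 0 if it was never timed. */
    long nanosFor(String label) {
        return totals.getOrDefault(label, 0L);
    }

    /** Returns the label with the largest accumulated time, or null if nothing was timed. */
    String slowest() {
        String worst = null;
        long max = -1;
        for (Map.Entry<String, Long> entry : totals.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                worst = entry.getKey();
            }
        }
        return worst;
    }
}
```

A call such as timer.time("physics", () -> updatePhysics()) inside draw(), where updatePhysics() stands for whatever function you suspect, accumulates totals across frames, and timer.slowest() then names the section to look at first.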

MATLAB: GUI progressively getting slower

I've been programming some MATLAB GUIs (not using GUIDE), mainly for viewing images and some other simple operations (such as selecting points and plotting some data from the images).
When the GUI starts, all the operations are performed quickly.
However, as the GUI is used (showing different frames from 3D/4D volumes and performing the operations mentioned above), it starts getting progressively slower, reaching a point where it is too slow for common usage.
I would like to hear some input regarding:
Possible strategies to find out why the GUI is getting slower;
Good MATLAB GUI programming practices to avoid this;
Possible references that address these issues.
I'm using set/getappdata to save variables in the main figure of the GUI and communicate between functions.
(I wish I could provide a minimal working example, but I don't think it is suitable in this case because this only happens in somewhat more complex GUIs.)
Thanks a lot.
EDIT: (Reporting back some findings using the profiler:)
I used the profiler on two occasions:
immediately after starting the GUI;
after playing around with it for some time, until it started getting too slow.
I performed the exact same procedure in both profiling operations, which was simply moving the mouse around the GUI (same "path" both times).
The profiler results are as follows:
I am having difficulties in interpreting these results...
Why is the number of calls of certain functions (such as impixelinfo) so much larger in the second case?
Any opinions?
Thanks a lot.
The single best way I have found around this problem was hinted at above: forced garbage collection. Great advice though the command forceGarbageCollection is not recognized in MATLAB. The command you want is java.lang.System.gc()... such a beast.
I was working on a project wherein I was reading 2 serial ports at 40Hz (using a timer) and one NIDAQ at 1000Hz (using startBackground()) and graphing them all in real-time. MATLAB's parallel processing limitations ensured that one of those processes would cause a buffer choke at any given time. Animations would not be able to keep up, and would eventually freeze, etc. I gained some initial success by making sure that I was defining a single plot and only updating parameters that changed inside my animation loop with the set command (ex. figure, subplot(311), axis([...]), hold on, p1 = plot(x1,y1,'erasemode','xor',...); etc. then --> tic, while (toc<8) set(p1,'xdata',x1,'ydata',y1)...).
Using set will make your animations MUCH faster and more fluid. However, you will still run into the buffer wall if you animate long enough with too much going on in the background, especially real-time data inputs. Garbage collection is your answer. It isn't instantaneous, so you don't want it to execute every loop cycle unless your loop is extremely long. My solution is to set up a counter variable outside the while loop and use a mod function so that it only executes every 'n' cycles (ex. counter = 0; while ()... counter = counter + 1; if (~mod(counter,n)) java.lang.System.gc(); and so on).
This will save you (and hopefully others) loads of time and headache, trust me, and you will have MATLAB executing real-time data acq and animation on par with LabVIEW.
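Since java.lang.System.gc() is itself a Java call, the every-n-cycles pattern described above can be sketched directly in Java. This is an illustration of the counter/mod idea only, not MATLAB code; PeriodicCollector and runLoop are hypothetical names, and the per-iteration work is left as a placeholder comment.

```java
/** Sketch of running an expensive step only every n-th loop iteration (hypothetical names). */
final class PeriodicCollector {
    /**
     * Runs the loop for the given number of iterations and invokes
     * the collect action on every n-th iteration. Returns how many
     * times the action was invoked.
     */
    static int runLoop(int iterations, int n, Runnable collect) {
        int collections = 0;
        for (int counter = 1; counter <= iterations; counter++) {
            // ... per-iteration work (read data, update the plot) would go here ...
            if (counter % n == 0) { // same test as ~mod(counter, n) in MATLAB
                collect.run();
                collections++;
            }
        }
        return collections;
    }
}
```

A call like PeriodicCollector.runLoop(10000, 500, System::gc) would request a collection 20 times. Note that in Java System.gc() is only a request to the JVM, so how much work each call does depends on the runtime.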
A good strategy to find out why anything is slow in MATLAB is to use the profiler. Here is the basic way to use the profiler:
profile on
% do stuff now that you want to measure
profile off
profile viewer
I would suggest profiling a freshly opened GUI, and also one that has been open for a while and is noticeably slow. Then compare results and look for functions that have a significant increase in "Self Time" or "Total Time" for clues as to what is causing the slowdown.

Possible shortcomings for using JIT with R?

I recently discovered that one can use JIT (just in time) compilation with R using the compiler package (I summarized my findings on this topic in a recent blog post).
One of the questions I was asked is:
Is there any pitfall? It sounds too good to be true, just put one line of code and that's it.
After looking around I could find one possible issue having to do with the "start up" time for the JIT. But is there any other issue to be careful about when using JIT?
I guess that there will be some limitation having to do with R's environments architecture, but I can not think of a simple illustration of the problem off the top of my head, any suggestions or red flags will be of great help?
The output of a simple test with rpart suggests that enableJIT should not be used in ALL cases:
library(rpart)
fo <- function() for(i in 1:500){rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)}
system.time(fo())
# user system elapsed
#2.11 0.00 2.11
require(compiler)
enableJIT(3)
system.time(fo())
# user system elapsed
#35.46 0.00 35.60
Any explanation?
The rpart example given above, no longer seems to be an issue:
library("rpart")
fo = function() {
  for(i in 1:500){
    rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
  }
}
system.time(fo())
# user system elapsed
# 1.212 0.000 1.206
compiler::enableJIT(3)
# [1] 3
system.time(fo())
# user system elapsed
# 1.212 0.000 1.210
I've also tried a number of other examples, such as
growing a vector;
a function that's just a wrapper around mean.
While I don't always get a speed-up, I've never experienced a significant slow-down.
R> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04 LTS
In principle, once the byte-code is compiled and loaded, it should always be interpreted at least as fast as the original AST interpreter. Some code will benefit from big speedups, this is usually code with a lot of scalar operations and loops where most time is spent in R interpretation (I've seen examples with 10x speedup but arbitrary micro-benchmarks could indeed inflate this as needed). Some code will run at the same speed, this is usually code well vectorized and hence spending nearly no time in interpretation. Now, compilation itself can be slow. Hence, the just in time compiler now does not compile functions when it guesses it won't pay off (and the heuristics change over time, this is already in 3.4.x). The heuristics don't always guess it right, so there may be situations when compilation won't pay off. Typical problematic patterns are code generation, code modification and manipulation of bindings of environments captured in closures.
Packages can be byte-compiled at installation time so that the compilation cost is not paid (repeatedly) at run time, at least for code that is known ahead of time. This is now the default in development version of R. While the loading of compiled code is much faster than compiling it, in some situations one may be loading even code that won't be executed, so there actually may be an overhead, but overall pre-compilation is beneficial. Recently some parameters of the GC have been tuned to reduce the cost of loading code that won't be executed.
My recommendation for package writers would be to use the defaults (just-in-time compilation is now on by default in released versions, byte-compilation at package installation time is now on in the development version). If you find an example where the byte-code compiler does not perform well, please submit a bug report (I've also seen a case involving rpart in earlier versions). I would recommend against code generation and code manipulation and particularly so in hot loops. This includes defining closures, deleting and inserting bindings in environments captured by closures. Definitely one should not do eval(parse(text= in hot loops (and this had been bad already without byte-compilation). It is always better to use branches than to generate new closures (without branches) dynamically. Also it is better to write code with loops than to dynamically generate code with huge expressions (without loops). Now with the byte-code compiler, it is now often ok to write loops operating on scalars in R (the performance won't be as bad as before, so one could more often get away without switching to C for the performance critical parts).
Further to the previous answer, experimentation shows the problem is not with the compilation of the loop, it is with the compilation of closures. [enableJIT(0) or enableJIT(1) leave the code fast, enableJIT(2) slows it down dramatically, and enableJIT(3) is slightly faster than the previous option (but still very slow)]. Also contrary to Hansi's comment, cmpfun slows execution to a similar extent.

Practical tips debugging deep recursion?

I'm working on a board game algorithm where a large tree is traversed using recursion; however, it's not behaving as expected. How do I handle this, and what are your experiences with these situations?
To make things worse, it's using alpha-beta pruning which means entire parts of the tree are never visited, as well that it simply stops recursion when certain conditions are met. I can't change the search-depth to a lower number either, because while it's deterministic, the outcome does vary by how deep is searched and it may behave as expected at a lower search-depth (and it does).
Now, I'm not gonna ask you "where is the problem in my code?" but I am looking for general tips, tools, visualizations, anything to debug code like this. Personally, I'm developing in C#, but any and all tools are welcome. Although I think that this may be most applicable to imperative languages.
Logging. Log in your code extensively. In my experience, logging is THE solution for these types of problems. When it's hard to figure out what your code is doing, logging extensively is a very good approach, as it lets you output the internal state from within your code. It's not a perfect solution, but as far as I've seen, it works better than any other method.
One thing I have done in the past is to format the logs to reflect the recursion depth. So you might add a new level of indentation for every recursive call, or use some other delimiter. Then make a debug DLL that logs everything you need to know about each iteration. Between the two, you should be able to read the execution path and hopefully tell what's wrong.
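The depth-indented log can be sketched as follows. This is a minimal Java example on a plain minimax search (no alpha-beta pruning), with a hypothetical Node type and log format invented for illustration; the same idea carries over to C# almost line for line.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: minimax that records an indented log line on entry and exit (hypothetical names). */
final class TracedSearch {
    /** Hypothetical tree node: a leaf carries a score, an inner node carries children. */
    static final class Node {
        final int score;
        final List<Node> children = new ArrayList<>();
        Node(int score) { this.score = score; }
        Node add(Node child) { children.add(child); return this; }
    }

    final List<String> log = new ArrayList<>();

    /** Two spaces of indentation per recursion level. */
    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    /** Plain minimax; every call logs its depth on entry and its result on exit. */
    int minimax(Node node, int depth, boolean maximizing) {
        log.add(indent(depth) + "enter depth=" + depth + (maximizing ? " max" : " min"));
        int best;
        if (node.children.isEmpty()) {
            best = node.score; // leaf: return the static score
        } else {
            best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
            for (Node child : node.children) {
                int value = minimax(child, depth + 1, !maximizing);
                best = maximizing ? Math.max(best, value) : Math.min(best, value);
            }
        }
        log.add(indent(depth) + "exit  depth=" + depth + " value=" + best);
        return best;
    }
}
```

Printing the log (or writing it to a file) gives one line per call, nested by depth, so a subtree that pruning should have skipped, or a value that flips sign at the wrong level, tends to stand out visually.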
I would normally unit-test such algorithms with one or more predefined datasets that have well-defined outcomes. I would typically make several such tests in increasing order of complexity.
If you insist on debugging, it is sometimes useful to doctor the code with statements that check for a given value, so you can attach a breakpoint at that time and place in the code:
if (depth == X && item.id == 32) {
    // Breakpoint here
}
Maybe you could convert the recursion into an iteration with an explicit stack for the parameters. Testing is easier in this way because you can directly log values, access the stack and don't have to pass data/variables in each self-evaluation or prevent them from falling out of scope.
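A minimal sketch of that conversion, in Java and under simplifying assumptions: the tree is given as an array of child indices, and the walk only visits nodes in depth-first order. IterativeTraversal and Frame are hypothetical names. Each Frame plays the role of one recursive call's parameters.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch: depth-first traversal with an explicit stack instead of recursion. */
final class IterativeTraversal {
    /** One pending "call": the node to visit and the depth it was reached at. */
    static final class Frame {
        final int node;
        final int depth;
        Frame(int node, int depth) { this.node = node; this.depth = depth; }
    }

    /**
     * Pre-order walk starting at root; children[i] lists the child indices of node i.
     * Returns a trace of "node@depth" entries in visiting order.
     */
    static List<String> visit(int[][] children, int root) {
        List<String> trace = new ArrayList<>();
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(root, 0));
        while (!stack.isEmpty()) {
            Frame frame = stack.pop();
            trace.add(frame.node + "@" + frame.depth);
            int[] kids = children[frame.node];
            // Push in reverse so the leftmost child is visited first, as in the recursive version.
            for (int i = kids.length - 1; i >= 0; i--) {
                stack.push(new Frame(kids[i], frame.depth + 1));
            }
        }
        return trace;
    }
}
```

Because the stack is an ordinary collection, it can be inspected or logged at any point in the loop. A search that combines child results on the way back up (minimax, alpha-beta) needs frames that also store partial results and the index of the next child, so the conversion is more work than this traversal-only sketch suggests.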
I once had a similar problem when I was developing an AI algorithm to play a Tetris game. After trying many things and losing a LOT of hours reading my own logs, debugging, and stepping in and out of functions, what worked for me was to code a quick visualizer and test my code with FIXED input.
So, if time is not a problem and you really want to understand what is going on, get a fixed board state and SEE what your program is doing with the data using a mix of debug logs/output and some sort of your own tools that shows information on each step.
Once you find a board state that gives you this problem, try to pin-point the function(s) where it starts and then you will be in a position to fix it.
I know what a pain this can be. At my job, we are currently working with a 3rd party application that basically behaves as a black box, so we have to devise some interesting debugging techniques to help us work around issues.
When I was taking a compiler theory course in college, we used a software library to visualize our trees; this might help you as well, as it could help you see what the tree looks like. In fact, you could build yourself a WinForms/WPF application to dump the contents of your tree into a TreeView control--it's messy, but it'll get the job done.
You might want to consider some kind of debug output, too. I know you mentioned that your tree is large, but perhaps debug statements or breaks at key point during execution that you're having trouble visualizing would lend you a hand.
Bear in mind, too, that intelligent debugging using Visual Studio can work wonders. It's tough to see how state is changing across multiple breaks, but Visual Studio 2010 should actually help with this.
Unfortunately, it's not particularly easy to help you debug without further information. Have you identified the first depth at which it starts to break? Does it continue to break with higher search depths? You might want to evaluate your working cases and try to determine how it's different.
Since you say that the traversal is not working as expected, I assume you have some idea of where things may go wrong. Then inspect the code to verify that you have not overlooked something basic.
After that I suggest you set up some simple unit tests. If they pass, then keep adding tests until they fail. If they fail, then reduce the tests until they either pass or are as simple as they can be. That should help you pinpoint the problems.
If you want to debug as well, I suggest you employ conditional breakpoints. Visual Studio lets you modify breakpoints, so you can set conditions on when the breakpoint should be triggered. That can reduce the number of iterations you need to look at.
I would start by instrumenting the function(s). At each recursive call log the data structures and any other info that will be useful in helping you identify the problem.
Print out the dump along with the source code then get away from the computer and have a nice paper-based debugging session over a cup of coffee.
Start from the base case, where you have your if/else statements, and then focus your thinking by writing it down with pen and paper, while printing the values to the console for the first few recursive calls.
The goal is to match the values you print against the values you worked out on paper for the first few steps of your recursive algorithm.

Why are compilers so stupid?

I always wonder why compilers can't figure out simple things that are obvious to the human eye. They do lots of simple optimizations, but never something even a little bit complex. For example, this code takes about 6 seconds on my computer to print the value zero (using java 1.6):
int x = 0;
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
    x += x + x + x + x + x;
}
System.out.println(x);
It is totally obvious that x is never changed so no matter how often you add 0 to itself it stays zero. So the compiler could in theory replace this with System.out.println(0).
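One detail worth checking before timing this loop: in Java the bound 100 * 1000 * 1000 * 1000 is evaluated in 32-bit int arithmetic, so it wraps around instead of reaching 100 billion. A minimal sketch (the class and method names are made up for illustration) that exposes the value the loop actually counts to:

```java
/** Sketch: the loop bound from the question, in int and in long arithmetic. */
final class LoopBound {
    /** The bound exactly as written in the question; int multiplication wraps on overflow. */
    static int asWritten() {
        return 100 * 1000 * 1000 * 1000;
    }

    /** The same product computed in 64-bit arithmetic, for comparison. */
    static long intended() {
        return 100L * 1000 * 1000 * 1000;
    }
}
```

Here asWritten() returns 1215752192 while intended() returns 100000000000, so the loop as written runs roughly 1.2 billion iterations rather than 100 billion.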
Or even better, this takes 23 seconds:
public int slow() {
    String s = "x";
    for (int i = 0; i < 100000; ++i) {
        s += "x";
    }
    return 10;
}
First, the compiler could notice that I am actually creating a string s of 100000 "x", so it could automatically use a StringBuilder instead, or even better, directly replace it with the resulting string, as it is always the same. Second, it does not recognize that I do not actually use the string at all, so the whole loop could be discarded!
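For reference, the manual rewrite alluded to here looks roughly like the sketch below (class and method names are hypothetical). Both methods build the same string; the first copies the growing string on every iteration, the second appends into one buffer.

```java
/** Sketch: repeated string concatenation versus a StringBuilder. */
final class Concat {
    /** Builds "x" repeated n + 1 times with +=, creating a new String each iteration. */
    static String withPlus(int n) {
        String s = "x";
        for (int i = 0; i < n; ++i) {
            s += "x";
        }
        return s;
    }

    /** Builds the same string by appending into a single pre-sized buffer. */
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder(n + 1);
        sb.append('x');
        for (int i = 0; i < n; ++i) {
            sb.append('x');
        }
        return sb.toString();
    }
}
```

The += version does work proportional to the current length on every step, which is quadratic overall, and that is where most of the time in the question's example goes.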
Why, after so much manpower is going into fast compilers, are they still so relatively dumb?
EDIT: Of course these are stupid examples that should never be used anywhere. But whenever I have to rewrite a beautiful and very readable code into something unreadable so that the compiler is happy and produces fast code, I wonder why compilers or some other automated tool can't do this work for me.
In my opinion, I don't believe it's the job of the compiler to fix what is, honestly, bad coding. You have, quite explicitly, told the compiler you want that first loop executed. It's the same as:
x = 0
sleep 6 // Let's assume this is defined somewhere.
print x
I wouldn't want the compiler removing my sleep statement just because it did nothing. You may argue that the sleep statement is an explicit request for a delay whereas your example is not. But then you will be allowing the compiler to make very high-level decisions about what your code should do, and I believe that to be a bad thing.
Code, and the compiler that processes it, are tools and you need to be a tool-smith if you want to use them effectively. How many 12" chainsaws will refuse to try cut down a 30" tree? How many drills will automatically switch to hammer mode if they detect a concrete wall?
None, I suspect, and this is because the cost of designing this into the product would be horrendous for a start. But, more importantly, you shouldn't be using drills or chainsaws if you don't know what you're doing. For example: if you don't know what kickback is (a very easy way for a newbie to take off their arm), stay away from chainsaws until you do.
I'm all for allowing compilers to suggest improvements but I'd rather maintain the control myself. It should not be up to the compiler to decide unilaterally that a loop is unnecessary.
For example, I've done timing loops in embedded systems where the clock speed of the CPU is known exactly but no reliable timing device is available. In that case, you can calculate precisely how long a given loop will take and use that to control how often things happen. That wouldn't work if the compiler (or assembler in that case) decided my loop was useless and optimized it out of existence.
Having said that, let me leave you with an old story of a VAX FORTRAN compiler that was undergoing a benchmark for performance and it was found that it was many orders of magnitude faster than its nearest competitor.
It turns out the compiler noticed that the result of the benchmark loops weren't being used anywhere else and optimized the loops into oblivion.
Oh, I don't know. Sometimes compilers are pretty smart. Consider the following C program:
#include <stdio.h> /* printf() */
int factorial(int n) {
    return n == 0 ? 1 : n * factorial(n - 1);
}
int main() {
    int n = 10;
    printf("factorial(%d) = %d\n", n, factorial(n));
    return 0;
}
On my version of GCC (4.3.2 on Debian testing), when compiled with no optimizations, or -O1, it generates code for factorial() like you'd expect, using a recursive call to compute the value. But on -O2, it does something interesting: It compiles down to a tight loop:
factorial:
.LFB13:
testl %edi, %edi
movl $1, %eax
je .L3
.p2align 4,,10
.p2align 3
.L4:
imull %edi, %eax
subl $1, %edi
jne .L4
.L3:
rep
ret
Pretty impressive. The recursive call (not even tail-recursive) has been completely eliminated, so factorial now uses O(1) stack space instead of O(N). And although I have only very superficial knowledge of x86 assembly (actually AMD64 in this case, but I don't think any of the AMD64 extensions are being used above), I doubt that you could write a better version by hand. But what really blew my mind was the code that it generated on -O3. The implementation of factorial stayed the same. But main() changed:
main:
.LFB14:
subq $8, %rsp
.LCFI0:
movl $3628800, %edx
movl $10, %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
xorl %eax, %eax
addq $8, %rsp
ret
See the movl $3628800, %edx line? gcc is pre-computing factorial(10) at compile-time. It doesn't even call factorial(). Incredible. My hat is off to the GCC development team.
Of course, all the usual disclaimers apply, this is just a toy example, premature optimization is the root of all evil, etc, etc, but it illustrates that compilers are often smarter than you think. If you think you can do a better job by hand, you're almost certainly wrong.
(Adapted from a posting on my blog.)
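For readers who do not read x86 assembly, the -O2 loop shown above corresponds roughly to the hand-written loop below. This is a Java rendering for illustration only (the class name is hypothetical); the comments map each statement to the instruction it mirrors.

```java
/** Sketch: the recursive factorial and the loop GCC effectively turned it into. */
final class Factorial {
    /** The recursive definition from the C program. */
    static int recursive(int n) {
        return n == 0 ? 1 : n * recursive(n - 1);
    }

    /** Loop form mirroring the -O2 assembly. */
    static int loop(int n) {
        int result = 1;      // movl $1, %eax
        while (n != 0) {     // testl %edi, %edi / je .L3, then jne .L4
            result *= n;     // imull %edi, %eax
            n -= 1;          // subl $1, %edi
        }
        return result;       // ret
    }
}
```

Both versions return 3628800 for n = 10, which is the constant that appears in the -O3 version of main().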
Speaking from a C/C++ point of view:
Your first example will be optimized by most compilers. If the Java compiler from Sun really executes this loop, it's the compiler's fault, but take my word that any post-1990 C, C++ or Fortran compiler completely eliminates such a loop.
Your second example can't be optimized in most languages because memory allocation happens as a side-effect of concatenating the strings together. If a compiler would optimize the code the pattern of memory allocation would change, and this could lead to effects that the programmer tries to avoid. Memory fragmentation and related problems are issues that embedded programmers still face every day.
Overall I'm satisfied with the optimizations compilers can do these days.
Compilers are designed to be predictable. This may make them look stupid from time to time, but that's OK. The compiler writer's goals are:
You should be able to look at your code and make reasonable predictions about its performance.
Small changes in the code should not result in dramatic differences in performance.
If a small change looks to the programmer like it should improve performance, it should at least not degrade performance (unless surprising things are happening in the hardware).
All these criteria militate against "magic" optimizations that apply only to corner cases.
Both of your examples have a variable updated in a loop but not used elsewhere. This case is actually quite difficult to pick up unless you are using some sort of framework that can combine dead-code elimination with other optimizations like copy propagation or constant propagation. To a simple dataflow optimizer the variable doesn't look dead. To understand why this problem is hard, see the paper by Lerner, Grove, and Chambers in POPL 2002, which uses this very example and explains why it is hard.
The HotSpot JIT compiler will only optimize code that has been running for some time. By the time your code is hot, the loop has already been started and the JIT compiler has to wait until the next time the method is entered to look for ways to optimize away the loop. If you call the method several times, you might see better performance.
This is covered in the HotSpot FAQ, under the question "I write a simple loop to time a simple operation and it's slow. What am I doing wrong?".
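One way to check this on your own machine is to call the method several times and time each call separately. The following is a sketch under assumptions: WarmUp, work() and timeCalls() are hypothetical names, work() is a stand-in for the method being measured, and the actual timings depend entirely on the JVM and its settings.

```java
/** Sketch: timing repeated calls to see the effect of JIT warm-up. */
final class WarmUp {
    /** Hypothetical stand-in for the method being measured. */
    static long work() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        return sum;
    }

    /** Calls work() several times and records the elapsed nanoseconds of each call. */
    static long[] timeCalls(int calls) {
        long[] nanos = new long[calls];
        long checksum = 0;
        for (int c = 0; c < calls; c++) {
            long before = System.nanoTime();
            checksum += work();
            nanos[c] = System.nanoTime() - before;
        }
        // Use the results so the calls cannot be discarded as dead code.
        if (checksum != calls * 499_999_500_000L) {
            throw new IllegalStateException("unexpected checksum");
        }
        return nanos;
    }
}
```

Printing the array returned by WarmUp.timeCalls(10) typically shows whether later calls are faster than the first ones; no particular numbers are claimed here.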
Seriously? Why would anyone ever write real-world code like that? IMHO, the code, not the compiler is the "stupid" entity here. I for one am perfectly happy that compiler writers don't bother wasting their time trying to optimize something like that.
Edit/Clarification:
I know the code in the question is meant as an example, but that just proves my point: you either have to be trying, or be fairly clueless to write supremely inefficient code like that. It's not the compiler's job to hold our hand so we don't write horrible code. It is our responsibility as the people that write the code to know enough about our tools to write efficiently and clearly.
Well, I can only speak of C++, because I'm a total Java beginner. In C++, compilers are free to disregard any language requirements placed by the Standard, as long as the observable behavior is as-if the compiler actually emulated all the rules that are placed by the Standard. Observable behavior is defined as any reads and writes to volatile data and calls to library functions. Consider this:
extern int x; // defined elsewhere
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
    x += x + x + x + x + x;
}
return x;
The C++ compiler is allowed to optimize out that piece of code and just add the proper value to x that would result from that loop once, because the code behaves as-if the loop never happened, and no volatile data, nor library functions are involved that could cause side effects needed. Now consider volatile variables:
extern volatile int x; // defined elsewhere
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
    x += x + x + x + x + x;
}
return x;
The compiler is not allowed to do the same optimization anymore, because it can't prove that side effects caused by writing to x could not affect the observable behavior of the program. After all, x could be set to a memory cell watched by some hardware device that would trigger at every write.
Speaking of Java, I have tested your loop, and it happens that the GNU Java Compiler (gcj) takes an inordinate amount of time to finish your loop (it simply didn't finish and I killed it). When I enabled optimization flags (-O2), it printed out 0 immediately:
[js#HOST2 java]$ gcj --main=Optimize -O2 Optimize.java
[js#HOST2 java]$ ./a.out
0
[js#HOST2 java]$
Maybe that observation could be helpful in this thread? Why does it happen to be so fast for gcj? Well, one reason surely is that gcj compiles into machine code, and so it has no possibility of optimizing that code based on its runtime behavior. It puts all its effort into optimizing as much as it can at compile time. A virtual machine, however, can compile code Just in Time, as this output of java shows for this code:
class Optimize {
    private static int doIt() {
        int x = 0;
        for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
            x += x + x + x + x + x;
        }
        return x;
    }
    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            doIt();
        }
    }
}
Output for java -XX:+PrintCompilation Optimize:
1 java.lang.String::hashCode (60 bytes)
1% Optimize::doIt # 4 (30 bytes)
2 Optimize::doIt (30 bytes)
As we see, it JIT compiles the doIt function 2 times. Based on the observation of the first execution, it compiles it a second time. But it happens to have the same size as bytecode two times, suggesting the loop is still in place.
As another programmer shows, execution time for certain dead loops even is increased for some cases for subsequently compiled code. He reported a bug which can be read here, and is as of 24. October 2008.
On your first example, it's an optimization that only works if the value is zero. The extra if statement in the compiler needed to look for this one rarely-seen clause may just not be worth it (since it'll have to check for this on every single variable). Furthermore, what about this:
int x = 1;
int y = 1;
int z = x - y;
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
    z += z + z + z + z + z;
}
System.out.println(z);
This is still obviously the same thing, but now there's an extra case we have to code for in the compiler. There's just an endless number of ways that it can end up being zero that aren't worth coding for, and I guess you could say that if you're going to have one of them you might as well have them all.
Some optimizations do take care of the second example you have posted, but I think I've seen it more in functional languages and not so much Java. The big thing that makes it hard in newer languages is monkey-patching. Now += can have a side-effect that means if we optimize it out, it's potentially wrong (e.g. adding functionality to += that prints out the current value will mean a different program altogether).
But it comes down to the same thing all over again: there are just too many cases you'd have to look for to make sure no side effects are being performed that will potentially alter the final program's state.
It's just easier to take an extra moment and make sure what you're writing is what you really want the computer to do. :)
Compilers in general are very smart.
What you must consider is that they must account for every possible exception or situation where optimizing or refactoring code could cause unwanted side effects.
Things like threaded programs, pointer aliasing, dynamically linked code and side effects (system calls, memory allocation), etc. make formally proving a refactoring correct very difficult.
Even though your example is simple, there still may be difficult situations to consider.
As for your StringBuilder argument, it is NOT a compiler's job to choose which data structures to use for you.
If you want more powerful optimisations, move to a more strongly typed language like Fortran or Haskell, where the compilers are given much more information to work with.
Most courses teaching compilers/optimisation (even academic ones) give you an appreciation of how making general, formally proven optimisations, rather than hacking specific cases, is a very difficult problem.
I think you are underestimating how much work it is to make sure that one piece of code doesn't affect another piece of code. With just a small change to your examples, x, i, and s could all point to the same memory. Once one of the variables is a pointer, it is much harder to tell what code might have side effects, depending on what points to what.
Also, I think people who program compilers would rather spend time making optimizations that aren't as easy for humans to do.
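The aliasing problem can be sketched in a few lines of Java, using array references in place of pointers. The names (Aliasing, writeBoth) are invented for this illustration. The method looks as if it must return 1, but that is only true when the two parameters refer to different arrays.

```java
// Hypothetical illustration of aliasing; the names are not from the original thread.
final class Aliasing {
    // Writes through both references, then reads through the first one.
    // If a and b refer to the same array, the second write overwrites the first.
    static int writeBoth(int[] a, int[] b) {
        a[0] = 1;
        b[0] = 2;
        return a[0]; // cannot be replaced by the constant 1 unless a != b is proven
    }

    public static void main(String[] args) {
        int[] first = new int[1];
        int[] second = new int[1];
        System.out.println(writeBoth(first, second)); // distinct arrays
        System.out.println(writeBoth(first, first));  // same array passed twice
    }
}
```

An optimizer that wants to fold the return value into a constant first has to rule out the second call pattern everywhere the method is used.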
Because we're just not there yet. You could just as easily have asked, "why do I still need to write programs... why can't I just feed in the requirements document and have the computer write the application for me?"
Compiler writers spend time on the little things, because those are the types of things that application programmers tend to miss.
Also, they cannot assume too much (maybe your loop was some kind of ghetto time delay or something).
It's an eternal arms race between compiler writers and programmers.
Non-contrived examples work great -- most compilers do indeed optimize away the obviously useless code.
Contrived examples will always stump the compiler. Proof, if any was needed, that any programmer is smarter than any program.
In the future, you'll need more contrived examples than the ones you've posted here.
As others have addressed the first part of your question adequately, I'll try to tackle the second part, i.e. "automatically uses StringBuilder instead".
There are several good reasons for not doing what you're suggesting, but the biggest factor in practice is likely that the optimizer runs long after the actual source code has been digested & forgotten about. Optimizers generally operate either on the generated byte code (or assembly, three address code, machine code, etc.), or on the abstract syntax trees that result from parsing the code. Optimizers generally know nothing of the runtime libraries (or any libraries at all), and instead operate at the instruction level (that is, low level control flow and register allocation).
Second, as libraries evolve (esp. in Java) much faster than languages, keeping up with them and knowing what deprecates what and what other library component might be better suited to the task would be a herculean task. Also likely an impossible one, as this proposed optimizer would have to precisely understand both your intent and the intent of each available library component, and somehow find a mapping between them.
Finally, as others have said (I think), the compiler/optimizer writer can reasonably assume that the programmer writing the input code is not brain-dead. It would be a waste of time to devote significant effort to asinine special cases like these when other, more general optimizations abound. Also, as others have also mentioned, seemingly brain-dead code can have an actual purpose (a spin lock, busy wait prior to a system-level yield, etc.), and the compiler has to honor what the programmer asks for (if it's syntactically and semantically valid).
Did you compile to release code? I think a good compiler detects in your second example that the string is never used and removes the entire loop.
Actually, Java should use StringBuilder in your second example.
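For reference, this is roughly what the two variants look like side by side. It is a sketch with invented names (Concat, withPlusEquals, withBuilder), not the asker's code. As far as I know, javac has historically turned each s += "s" into a fresh StringBuilder plus a toString() call for that one statement, so a builder is involved, but a new one (and a new String) per iteration rather than one for the whole loop.

```java
// Hypothetical comparison; the class and method names are made up for illustration.
final class Concat {
    // Builds the string with +=; every iteration produces a brand-new String.
    static String withPlusEquals(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "s";
        }
        return s;
    }

    // Builds the same string with a single StringBuilder, sized up front.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder(n); // n is only a capacity hint
        for (int i = 0; i < n; i++) {
            sb.append('s');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 10_000;
        // Both variants produce the same text; only the amount of copying differs.
        System.out.println(withPlusEquals(n).equals(withBuilder(n)));
    }
}
```

Rewriting the first form into the second means hoisting the builder out of the loop, which is a change of data structure across statements rather than a local instruction-level tweak.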
The basic problem with trying to optimize these examples away is that doing so would require theorem proving, which means that the compiler would need to construct a mathematical proof of what your code will actually do. And that's no small task at all. In fact, being able to prove that all code really does have an effect is equivalent to the halting problem.
Sure, you can come up with trivial examples, but the number of trivial examples is unlimited. You could always think of something else, so there is no way to catch them all.
Of course, it is possible for some code to be proven not to have any effect, as in your examples. What you would want is for the compiler to optimize away everything that can be proven unused in polynomial time.
But anyway, that's a ton of work and it doesn't get you all that much. People spend a lot of time trying to figure out ways to prevent programs from having bugs in them, and type systems like those in Java and Scala are attempts to prevent bugs, but right now no one is using type systems to make statements about execution time, as far as I know.
You might want to look into Haskell, which I think has the most advanced theorem-proving stuff, although I'm not sure about that. I don't know it myself.
Mostly what you're complaining about is 'why are Java compilers so stupid', since most other language compilers are much smarter.
The reason for the stupidity of Java compilers is historical. First, the original Java implementations were interpreter based, and performance was considered unimportant. Second, many of the original Java benchmarks were problematic to optimize. I recall one benchmark that looked a lot like your second example. Unfortunately, if the compiler optimized the loop away, the benchmark would get a divide-by-zero exception when it tried to divide a baseline number by the elapsed time to compute its performance score. So when writing an optimizing Java compiler, you had to be very careful NOT to optimize some things, as people would then claim your compiler was broken.
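The failure mode described above is easy to reproduce in a sketch. The scoring rule below (baseline time divided by elapsed time) is an assumption made for illustration, and the names BenchScore, score and work are invented; the original benchmark is not shown in this thread. Integer division by zero throws ArithmeticException in Java, so a benchmark that measures zero elapsed time has to guard against it, and it should also use the computed result so the work is not dead code.

```java
// Hypothetical benchmark skeleton; the scoring rule and all names are assumptions.
final class BenchScore {
    // Score = baseline / elapsed, clamping elapsed to at least 1 ms so that an
    // optimized-away (or simply very fast) loop cannot cause a division by zero.
    static long score(long baselineMillis, long elapsedMillis) {
        return baselineMillis / Math.max(1L, elapsedMillis);
    }

    // Some work whose result is returned, so the loop is not dead code.
    static int work(int iterations) {
        int sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = work(1_000_000);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        // Printing the result keeps it observable.
        System.out.println("result = " + result);
        System.out.println("score  = " + score(1000, elapsedMillis));
    }
}
```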
It's almost considered bad practice to optimize things like this when compiling down to JVM bytecode. Sun's javac does have some basic optimizations, as does scalac, groovyc, etc. In short, anything that's truly language-specific can get optimized within the compiler. However, things like this, which are so contrived as to be language agnostic, will slip through simply as a matter of policy.
The reason for this is that it allows HotSpot to have a much more consistent view of the bytecode and its patterns. If the compilers start mucking about with edge cases, that reduces the VM's ability to optimize the general case, which may not be apparent at compile time. Steve Yegge likes to harp on about this: optimization is often easier when performed at runtime by a clever virtual machine. He even goes so far as to claim that HotSpot strips out javac's optimizations. While I don't know if this is true, it wouldn't surprise me.
To summarize: compilers targeting VMs have a very different set of criteria, particularly in the area of optimization and when it is appropriate. Don't go blaming the compiler writers for leaving the work to the far-more-capable JVM. As pointed out several times on this thread, modern compilers targeting the native architecture (like the gcc family) are extremely clever, producing obscenely fast code through some very smart optimizations.
I have never seen the point in dead code elimination in the first place. Why did the programmer write it?? If you're going to do something about dead code, declare it a compiler error! It almost certainly means the programmer made a mistake--and for the few cases it doesn't, a compiler directive to use a variable would be the proper answer. If I put dead code in a routine I want it executed--I'm probably planning to inspect the results in the debugger.
The case where the compiler could do some good is pulling out loop invariants. Sometimes clarity says to code the calculation in the loop and having the compiler pull such things out would be good.
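A small sketch of what pulling out a loop invariant means, with invented names (Hoist, scaleInLoop, scaleHoisted). The first method is written the "clear" way, recomputing a value that never changes inside the loop; the second is what a hoisting optimizer (or a programmer) would turn it into. Whether a given JIT actually performs this transformation for a given loop is not something this example demonstrates.

```java
// Hypothetical illustration of loop-invariant code motion; names are made up.
final class Hoist {
    // The factor depends only on a and b, yet it is recomputed on every iteration.
    static double scaleInLoop(double[] v, double a, double b) {
        double sum = 0;
        for (int i = 0; i < v.length; i++) {
            double factor = Math.sqrt(a * a + b * b); // loop invariant
            sum += v[i] * factor;
        }
        return sum;
    }

    // The same computation with the invariant hoisted out of the loop by hand.
    static double scaleHoisted(double[] v, double a, double b) {
        double factor = Math.sqrt(a * a + b * b);
        double sum = 0;
        for (int i = 0; i < v.length; i++) {
            sum += v[i] * factor;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] v = {1, 2, 3};
        System.out.println(scaleInLoop(v, 3, 4));
        System.out.println(scaleHoisted(v, 3, 4));
    }
}
```

The transformation is safe here because Math.sqrt has no side effects and a and b are not modified inside the loop.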
Compilers that can do strict-aliasing optimizations will optimize the first example out. See here.
The second example can't be optimized because the slowest part here is memory allocation/reallocation, and operator+= is redefined into a function that does the memory work. Different string implementations use different allocation strategies.
I too would rather have one malloc(100000) than a thousand malloc(100) calls when doing s += "s"; but right now that is out of scope for compilers and has to be optimized by people. This is what the D language tries to solve by introducing pure functions.
As mentioned in other answers here, Perl does the second example in less than a second because it allocates more memory than requested, just in case more is needed later.
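The over-allocation strategy can be sketched as a buffer that doubles its capacity whenever it runs out of room. This is an illustration with invented names (GrowBuffer and its methods), not a description of how Perl or any particular string implementation works internally; the doubling factor is an assumption.

```java
import java.util.Arrays;

// Hypothetical growable buffer; the doubling policy is an assumption for illustration.
final class GrowBuffer {
    private char[] data = new char[1];
    private int size = 0;
    private int reallocations = 0;

    // Appends one character, doubling the backing array when it is full.
    void append(char c) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
            reallocations++;
        }
        data[size++] = c;
    }

    int size() {
        return size;
    }

    int reallocations() {
        return reallocations;
    }

    public static void main(String[] args) {
        GrowBuffer buffer = new GrowBuffer();
        for (int i = 0; i < 1000; i++) {
            buffer.append('s');
        }
        // Capacity goes 1, 2, 4, ..., 1024, so far fewer copies than appends.
        System.out.println(buffer.size() + " appends, "
                + buffer.reallocations() + " reallocations");
    }
}
```

Growing geometrically keeps the number of reallocations logarithmic in the final length, instead of one per append.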
In release mode VS 2010 C++ this doesn't take any time to run. However, debug mode is another story.
#include <stdio.h>
int main()
{
    int x = 0;
    for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
        x += x + x + x + x + x;
    }
    printf("%d", x);
}
Absolute optimization is an undecidable problem, that means, there is no Turing machine (and, therefore, no computer program) that can yield the optimal version of ANY given program.
Some simple optimizations can be (and, in fact, are) done, but, in the examples you gave...
To detect that your first program always prints zero, the compiler would have to detect that x remains constant despite all the loop iterations. How can you explain (I know, it's not the best word, but I can't come up with another) that to a compiler?
How can the compiler know that the StringBuilder is the right tool for the job without ANY reference to it?
In a real-world application, if efficiency is critical in a part of your application, it must be written in a low-level language like C. (Haha, seriously, I wrote this?)
This is an example of procedural code v. functional code.
You have detailed a procedure for the compiler to follow, so the optimisations are going to be based around the procedure detailed and will minimise any side effects or not optimise where it will not be doing what you expect. This makes it easier to debug.
If you put in a functional description of what you want, e.g. SQL, then you are giving the compiler a wide range of options to optimise.
Perhaps some type of code analysis would be able to find this type of issue or profiling at run-time, but then you will want to change the source to something more sensible.
Because compiler writers try to add optimizations for things that matter (I hope) and that are measured in *Stone benchmarks (I fear).
There are zillions of other possible code fragments like yours, which do nothing and could be optimized with increasing effort on the compiler writer, but which are hardly ever encountered.
What I find embarrassing is that even today most compilers generate code to check for the switchValue being greater than 255 for a dense or almost full switch on an unsigned character. That adds 2 instructions to most bytecode interpreters' inner loops.
I hate to bring this up on such an old question (how did I get here, anyway?), but I think part of this might be something of a holdout from the days of the Commodore 64.
In the early 1980s, everything ran on a fixed clock. There was no Turbo Boosting and code was always created for a specific system with a specific processor and specific memory, etc. In Commodore BASIC, the standard method for implementing delays looked a lot like:
10 FOR X = 1 TO 1000
20 NEXT : REM 1-SECOND DELAY
(Actually, in practice, it more closely resembled 10FORX=1TO1000:NEXT, but you know what I mean.)
If they were to optimize this, it would break everything—nothing would ever be timed. I don't know of any examples, but I'm sure there are lots of little things like this scattered through the history of compiled languages that prevented things from being optimized.
Admittedly, these non-optimizations aren't necessary today. There's probably, however, some unspoken rule among compiler developers not to optimize things like this. I wouldn't know.
Just be glad that your code is optimized somewhat, unlike code on the C64. Displaying a bitmap on the C64 could take up to 60 seconds with the most efficient BASIC loops; thus, most games, etc. were written in machine language. Writing games in machine language isn't fun.
Just my thoughts.
Premise: I studied compilers at university.
The javac compiler is extremely stupid and performs absolutely no optimization because it relies on the java runtime to do them. The runtime will catch that thing and optimize it, but it will catch it only after the function is executed a few thousand times.
If you use a better compiler (like gcc) enabling optimizations, it will optimize your code, because it's quite an obvious optimization to do.
A compiler's job is to optimize how the code does something, not what the code does.
When you write a program, you are telling the computer what to do. If a compiler changed your code to do something other than what you told it to, it wouldn't be a very good compiler! When you write x += x + x + x + x + x, you are explicitly telling the computer that you want it to set x to 6 times itself. The compiler may very well optimize how it does this (e.g. multiplying x by 6 instead of doing repeated addition), but regardless it will still calculate that value in some way.
If you don't want something to be done, don't tell someone to do it.
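The "how, not what" distinction can be shown with the statement from the question. In this sketch (the names SixTimes, repeatedAdd and multiply are invented), both methods compute the same value for every int, including values that overflow, because Java int arithmetic wraps around consistently for addition and multiplication. A compiler is free to pick either form, but it still has to produce that value.

```java
// Hypothetical illustration; the class and method names are not from the thread.
final class SixTimes {
    // The statement exactly as written in the question.
    static int repeatedAdd(int x) {
        x += x + x + x + x + x;
        return x;
    }

    // An equivalent way to compute the same result.
    static int multiply(int x) {
        return x * 6;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, -7, Integer.MAX_VALUE};
        for (int x : samples) {
            System.out.println(x + ": " + (repeatedAdd(x) == multiply(x)));
        }
    }
}
```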
Compilers are as smart as we make them. I don't know too many programmers who would bother writing a compiler that would check for constructs such as the ones you used. Most concentrate on more typical ways to improve performance.
It is possible that someday we will have software, including compilers, that can actually learn and grow. When that day comes, most, maybe all, programmers will be out of a job.
Your two examples are pointless, useless, and made only to fool the compiler.
The compiler is not capable of seeing (and should not try to see) the meaning of a method, a loop or a program. That is where you come into the picture. You create a method for a certain functionality/meaning, no matter how stupid it is. The same goes for simple problems and for extremely complex programs.
In your case the compiler might optimize it, because it "thinks" it should be optimized in another way, but why stop there?
Take the other extreme. We have a smart compiler compiling Windows. Tons of code to compile. But if it's smart, it boils it down to 3 lines of code...
"starting windows"
"enjoy freecell/solitaire"
"shutting down windows"
The rest of the code is obsolete, because it's never used, touched, accessed.
Do we really want that?
It forces you (the programmer) to think about what you're writing. Forcing compilers to do your work for you doesn't help anyone: it makes the compilers much more complex (and slower!), and it makes you stupider and less attentive to your code.

Resources