What is the -O3 optimization flag in gcc?

Can someone answer a simple question: why do we pass -O3 (which looks like the -o option) to gcc when compiling a C program? Does it simply speed up compilation, or reduce the time spent compiling?
Thanks!!!

It can potentially increase the performance of the generated code.
On the contrary, compilation usually takes longer, because the higher levels require (much) more analysis by the compiler.
For typical modern C++ code, the effect of -O2 and higher can be very dramatic (an order of magnitude, depending on the nature of the program).
Precisely which optimizations are performed at the various optimization levels is documented in the manual pages: http://linux.die.net/man/1/gcc
Keep in mind, though, that the various optimizations can potentially make latent bugs manifest, because compilers are allowed to exploit Undefined Behaviour¹ to achieve more efficient target code.
Undefined Behaviour lurks in places where the language standard(s) do not specify exactly what needs to happen. These can be extremely subtle.
So I recommend against using anything higher than -O2 unless you have rigid quality controls in place that guard against such hidden undefined behaviours (think of valgrind/purify, static analysis tools, and (stress) testing in general).
¹ A very insightful blog post about undefined behaviour in optimizing compilers is here: http://blog.regehr.org/archives/213 . In particular, it lets you take the perspective of a compiler writer, whose only objective is to generate the fastest possible code that still satisfies the specifications.

Related

Do compilers take the "status quo" when optimizations produced worse results?

To my knowledge, when using optimizations there is a risk of hitting the "maybe it will be worse" case (i.e. performance is degraded, or the code size is larger, or both). However, are compilers able to detect such cases and return to the "status quo" (i.e. fall back to the original non-optimized code) when an optimization produces worse results? Can someone give (if possible) particular examples of what compilers (for example, gcc, Clang (LLVM), etc.) do in this case?
In JIT compilers there is a thing called deoptimization. Normally the compiler will optimize heavily based on some assumption, but during execution that assumption may fail. For example, the compiler may assume the input of a function is always an integer and produce highly efficient code for integer manipulation, but if the input is suddenly an array or a string (and such things happen in dynamic languages), the code has to revert. See speculative optimization in V8's TurboFan for an example.
For non-JIT compilers there is no way to deoptimize at runtime, but the compiler may create multiple execution paths. Your question is not fully logical, because how would the compiler know it created suboptimal code? It can only use the same algorithm it used to do the optimization in the first place. That's probably why you are being downvoted.

Performance overhead with "-g" (debug) flag of GCC? [duplicate]

I'm compiling a program with -O3 for performance and -g for debug symbols (in case of a crash, I can use the core dump). One thing bothers me a lot: does the -g option result in a performance penalty? When I compare the output of compilation with and without -g, I see that the output without -g is 80% smaller than the output with -g. If the extra space goes to the debug symbols, I don't care about it (I guess), since that part is not used at runtime. But if for each instruction in the output without -g there are four more instructions in the output with -g, then I would certainly prefer to stop using the -g option, even at the cost of not being able to process core dumps.
How can I find out the size of the debug symbol section inside the program, and, in general, does compiling with -g create a program that runs slower than the same code compiled without -g?
Citing from the gcc documentation
GCC allows you to use -g with -O. The shortcuts taken by optimized
code may occasionally produce surprising results: some variables you
declared may not exist at all; flow of control may briefly move where
you did not expect it; some statements may not be executed because
they compute constant results or their values are already at hand;
some statements may execute in different places because they have been
moved out of loops.
that means:
I will insert debugging symbols for you, but I won't try to retain them if an optimization pass optimizes them away; you'll have to deal with that.
Debugging symbols aren't written into the code but into a separate section, the "debug section", which isn't even loaded at runtime (only by a debugger). That means: no code changes. You shouldn't notice any difference in code execution speed, but you might experience some slowness if the loader needs to deal with the larger binary or takes the increased binary size into account somehow. You will probably have to benchmark the app yourself to be 100% sure in your specific case.
Notice that there's also another option from gcc 4.8:
-Og
Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.
This flag will impact performance because it disables any optimization pass that would interfere with debugging information.
Finally, it might even happen that some optimizations are better suited to one specific architecture than another, and unless instructed to target your specific processor (see the -march/-mtune options for your architecture), at -O3 gcc will do its best for a generic architecture. That means you might even find -O3 slower than -O2 in some contrived scenarios. "Best effort" doesn't always mean "the best available".

Coding Style for GCC ARM Optimization levels

I've been doing embedded firmware since 1977 but I have never enabled optimization on any of the compilers I've used.
I'm working with the GCC ARM compiler for a CM4 micro.
Code runs as expected with NO optimization.
I use a lot of structures and pointers in my code.
I use volatile when a variable can change from within an interrupt routine.
I recently needed to speed up execution of my code, so I used optimization level -Og (the first time I've ever enabled optimization) - which still gives good debugging and increased performance where I wanted it.
My issue/concern is that the code behaves really flaky!
It behaves OK - then I make a small change - and it misbehaves - and the behavior changes each time I run the compiler - almost as if there is an issue with address alignment or instructions have been completely removed.
I can change some variables to volatile and that also changes the behavior, but I don't understand why making global variables that are not modified in an interrupt routine volatile would change the behavior for the better.
I'm about ready to give up with over-all optimization and look at using function specific optimization since I know which functions affect the performance I'm trying to improve.
Can anyone explain how coding style can be impacted negatively with optimization?
Any good documents that address coding style with optimization in mind?
Does GCC function level optimization work well?
Thanks.
Joe

GCC optimization levels. Which is better?

I am focusing on the CPU/memory consumption of programs compiled by GCC.
Is executing code compiled with -O3 always greedier in terms of resources?
Is there any scientific reference or specification that shows the difference of Mem/cpu consumption of different levels?
People working on this problem often focus on the impact of these optimizations on execution time, compiled code size, and energy. However, I can't find much work on resource consumption (when enabling optimizations).
Thanks in advance.
No, there is no absolute answer, because optimization in compilers is an art (it is not even well defined, and might be undecidable or intractable).
But some guidelines first:
be sure that your program is correct and has no bugs before optimizing anything, so do debug and test your program
have well designed test cases and representative benchmarks (see this).
be sure that your program has no undefined behavior (and this is tricky, see this), since GCC will optimize strangely (but very often correctly, according to C99 or C11 standards) if you have UB in your code; use the -fsanitize=style options (and gdb and valgrind ....) during debugging phase.
profile your code (on various benchmarks), in particular to find out what parts are worth optimization efforts; often (but not always) most of the CPU time happens in a small fraction of the code (rule of thumb: 80% of time spent in 20% of code; on some applications like the gcc compiler this is not true, check with gcc -ftime-report to ask gcc to show time spent in various compiler modules).... Most of the time "premature optimization is the root of all evil" (but there are exceptions to this aphorism).
improve your source code (e.g. use carefully and correctly restrict and const, add some pragmas or function or variable attributes, perhaps use wisely some GCC builtins __builtin_expect, __builtin_prefetch -see this-, __builtin_unreachable...)
use a recent compiler. Current version (october 2015) of GCC is 5.2 (and GCC 8 in june 2018) and continuous progress on optimization is made ; you might consider compiling GCC from its source code to have a recent version.
enable all warnings (gcc -Wall -Wextra) in the compiler, and try hard to avoid all of them; some warnings may appear only when you ask for optimization (e.g. with -O2)
Usually, compile with -O2 -march=native (or perhaps -mtune=native; I assume that you are not cross-compiling - if you are, add the appropriate -march option...) and benchmark your program with that
Consider link-time optimization by compiling and linking with -flto and the same optimization flags. E.g., put CC=gcc -flto -O2 -march=native in your Makefile (then remove -O2 -march=native from your CFLAGS there)...
Try also -O3 -march=native; usually (but not always - you might sometimes have slightly faster code with -O2 than with -O3, but this is uncommon) you get a tiny improvement over -O2
If you want to optimize the generated program size, use -Os instead of -O2 or -O3; more generally, don't forget to read the section Options That Control Optimization of the documentation. I guess that both -O2 and -Os would optimize the stack usage (which is very related to memory consumption). And some GCC optimizations are able to avoid malloc (which is related to heap memory consumption).
you might consider profile-guided optimizations, -fprofile-generate, -fprofile-use, -fauto-profile options
dive into the documentation of GCC, it has numerous optimization & code generation arguments (e.g. -ffast-math, -Ofast ...) and parameters and you could spend months trying some more of them; beware that some of them are not strictly C standard conforming!
recent GCC and Clang can emit DWARF debug information (somehow "approximate" if strong optimizations have been applied) even when optimizing, so passing both -O2 and -g could be worthwhile (you still would be able, with some pain, to use the gdb debugger on optimized executable)
if you have a lot of time to spend (weeks or months), you might customize GCC using MELT (or some other plugin) to add your own new (application-specific) optimization passes; but this is difficult (you'll need to understand GCC internal representations and organization) and probably rarely worthwhile, except in very specific cases (those when you can justify spending months of your time for improving optimization)
you might want to understand the stack usage of your program, so use -fstack-usage
you might want to understand the emitted assembler code, use -S -fverbose-asm in addition of optimization flags (and look into the produced .s assembler file)
you might want to understand the internal working of GCC, use various -fdump-* flags (you'll get hundred of dump files!).
Of course the above todo list should be used in an iterative and agile fashion.
For memory leak bugs, consider valgrind and the several -fsanitize= debugging options. Read also about garbage collection (and the GC handbook), notably Boehm's conservative garbage collector, and about compile-time garbage collection techniques.
Read about the MILEPOST project in GCC.
Consider also OpenMP, OpenCL, MPI, multi-threading, etc... Notice that parallelization is a difficult art.
Notice that even GCC developers are often unable to predict the effect (on CPU time of the produced binary) of such and such optimization. Somehow optimization is a black art.
Perhaps the gcc-help@gcc.gnu.org mailing list might be a good place to ask more specific, precise, and focused questions about optimizations in GCC
You could also contact me on basileatstarynkevitchdotnet with a more focused question... (and mention the URL of your original question)
For scientific papers on optimizations, you'll find lots of them. Start with ACM TOPLAS, ACM TACO etc... Search for iterative compiler optimization etc.... And define better what resources you want to optimize for (memory consumption means next to nothing....).

Compiling in GCC: Is -O3 harmful?

I have heard that one should not compile with -O3 option with gcc. Is that true? If so, what are the reasons for avoiding -O3?
The answer is: it depends on your code.
The basic rule of thumb is like this:
At -O1 the compiler does optimizations that don't take too long to compute.
At -O2 the compiler does "expensive" optimizations that may slow the compile process. They might also make the output program a little larger, but probably not so much.
-Os is roughly the same as -O2, but the optimizations are tuned more towards size than speed. For the most part these two goals don't conflict (more optimal code does fewer steps and is therefore smaller), but there are some tricks that duplicate code to avoid branching penalties, for example.
At -O3 the compiler really cranks up the space-hungry optimizations. It will inline functions much more aggressively, and try to use vectorization where possible.
You can read more details in the GCC documentation. If you really want to super-optimize your code then you can try enabling even more options not included even in -O3; the -floop-* options, for instance.
The problem with speed-space optimizations, in particular, is that they can have a negative impact on the effectiveness of your memory caches. The code might be better for the CPU, but if it's not better for your memory, then you lose. For this reason, if your program doesn't have a single hot spot where it spends all its time, then you might find it is slowed down overall.
Real-world optimization is an imprecise science, for three reasons:
User's hardware varies a lot.
What's good for one code base might not be good for another.
We want the compiler to run quickly, so it must make best guesses, rather than trying all the options and picking the best.
Basically, the answer is always: if performance matters, try all the optimization levels, measure how well your code performs, and choose the best one for you. And do this again every time something big changes.
If performance does not matter, -O2 is the choice for you.