One of my tests asked which switch produces the least efficient machine code. The possible answers were O4, O1, O2, or O3. I don't even know what those are supposed to mean.
GCC Options That Control Optimization
-O
-O1
Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function.
-O2
Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code.
-O3
Optimize yet more.
-Os
Optimize for size.
-Ofast
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs.
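To see what these levels do in practice, one option is to compile the same small function at each level and compare the generated assembly. The following is only an illustrative sketch; the file name sum.c and the function sum_to are made up for this example and are not part of the question.

```c
/* sum.c: a tiny function for comparing optimization levels.
 *
 * Emit assembly at different levels and compare the output:
 *   gcc -O0 -S sum.c -o sum_O0.s
 *   gcc -O1 -S sum.c -o sum_O1.s
 *   gcc -O2 -S sum.c -o sum_O2.s
 *   gcc -O3 -S sum.c -o sum_O3.s
 *   gcc -Os -S sum.c -o sum_Os.s
 */

/* Returns 0 + 1 + ... + (n - 1). */
unsigned long sum_to(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 0; i < n; i++) {
        total += i;
    }
    return total;
}
```

At -O0 the loop is usually translated quite literally, with the variables kept on the stack; at higher levels the compiler is free to keep them in registers or restructure the loop. The exact output depends on the GCC version and the target.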
Should I always specify the -O3 flag when compiling a release version with gcc, or is there any other possible drawback?
Should I always specify the -O3 flag when compiling a release version with gcc?
No, or at least maybe not. For performance, -O3 sometimes produces code that is slower than what you get from -O2.
Under the hood, there is really a bunch of different optimizations that can be enabled or disabled individually, and -O3 (like -O2 and -Os) is just a convenient shorthand for enabling a group of them. -O2 is supposed to represent "enable all optimizations that always help", and -O3 is supposed to represent "enable all optimizations that often help (but may make things worse)". Which optimizations are and aren't enabled for each -O setting is detailed in the manual (at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html ).
If you don't use the shortcuts and specify individual optimizations yourself, then (using a laborious "trial and error" approach and benchmarking the result for each case) you can find the set of optimizations that always help your program (and avoid enabling optimizations that make the performance of your program worse).
A more practical approach would be to start with -O2 and then determine which of the optimizations not already enabled by -O2 also help. (Per the GCC manual, most optimizations are completely disabled at -O0 or when no -O level is set, even if individual optimization flags are specified, so some base level is needed anyway.)
However, performance isn't the only thing that matters. To save time, most people just try -O2 and -O3 and pick whichever seems fastest. Part of the reason is that your software and the compiler are constantly changing, so any "laborious benchmarking" you do would need to be redone regularly.
Note: to actually get the maximum performance possible, each translation unit can be compiled with different optimization settings (so you can do the "laborious trial and error" for each individual source file); and then the resulting set of "optimized differently" object files can be fed into a link-time optimizer to optimize more.
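The "benchmark each setting" advice above could be sketched roughly as follows. This is a minimal illustration, not a rigorous benchmark: the workload dot and the helper time_kernel are hypothetical names invented here, and a real measurement would need larger inputs, warm-up runs, and repeated trials.

```c
/* bench.c: time a kernel so builds with different flags can be compared.
 *
 * Build the same file with different settings and compare the timings:
 *   gcc -O2 bench.c ...
 *   gcc -O3 bench.c ...
 *
 * To list which optimizations a given level enables:
 *   gcc -Q -O2 --help=optimizers
 */
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: dot product of two arrays. */
double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Calls kernel() `reps` times and returns the elapsed CPU time in seconds.
 * The sum of all results is stored through `out`. Calling through a
 * function pointer and using every result makes it harder (though not
 * impossible) for the optimizer to drop the repeated calls. */
double time_kernel(double (*kernel)(const double *, const double *, size_t),
                   const double *a, const double *b, size_t n, int reps,
                   double *out)
{
    double total = 0.0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        total += kernel(a, b, n);
    }
    clock_t end = clock();
    *out = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Because the result can differ between compiler versions and machines, any conclusion drawn from such a harness applies only to the configuration that was actually measured.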
I was thinking the other day and this came to mind. I heard that -g turns off optimizations for debugging, but -O0 does turn off all optimizations, right? I am just wondering.
I heard that -g turns off optimizations for debugging
That's not true. -g only tells the compiler to include debug information in the object file; it has no effect on the code that is actually generated, and it can be used in combination with any optimization option you like. It is more useful when combined with -O0 or -Og, as higher optimization levels can change the code in ways that make it hard to use a debugger, but this is not mandatory.
but -O0 does turn off all optimizations, right?
Yes, that is correct; at least, all optimizations that can be turned off at all. (As JcWasmx86 points out, there are some "optimizations" that are done even at -O0, but you can expect them to be transformations that won't require significant compile time or dramatically change the structure of the generated code.)
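One way to check this on your own toolchain is to look at the macros GCC predefines for the compilation mode. The sketch below assumes GCC (or a compatible compiler); the function name build_mode is made up for the example.

```c
/* mode.c: report how this translation unit was compiled.
 *
 * GCC defines __OPTIMIZE__ in all optimizing compilations and
 * __OPTIMIZE_SIZE__ when optimizing for size. Adding or removing -g
 * is not expected to change either macro. Compare, for example:
 *   gcc -g -O0 -c mode.c
 *   gcc -g -O2 -c mode.c
 *   gcc -O2 -c mode.c
 */
const char *build_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    return "optimized";
#else
    return "not optimized";
#endif
}
```

Running `gcc -dM -E -O2 - < /dev/null` with and without -g is another way to compare the predefined macros directly.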
Is there a way to enable only auto vectorization in GCC? I know that the -ftree-vectorize flag enables auto vectorization, but it requires at least the -O2 optimization level. Is there a way to enable auto vectorization without using the -O2 optimization flag?
Thanks in advance.
You could actually get decent auto vectorization with -ftree-vectorize combined with -O1, for example: Godbolt.
With -O0, however, vectorized code won't be generated, even for very simple examples.
I suspect that gcc's tree vectorizer isn't even called with -O0, or called and bails out, but that has to be verified in the gcc source code.
Generally, -O0 and auto vectorization don't mix very well. In compilers, optimizations happen in phases, where each optimization phase prepares the ground for the next one.
For auto vectorization to occur, at least on non-trivial examples, the compiler has to perform some optimizations beforehand. For example, loops that contain jumps usually cannot be vectorized unless the branches are eliminated and replaced with predicated instructions by an optimization called if-conversion, resulting in a flat block of code that can be vectorized more conveniently.
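As a rough illustration of the kind of loop that depends on if-conversion, consider the sketch below. The function name clamp_negative and the file name clamp.c are made up for this example, and whether GCC actually vectorizes the loop depends on the compiler version and the target.

```c
/* clamp.c: a loop whose body contains a branch.
 *
 * Ask GCC to report which loops it vectorized, or why it did not:
 *   gcc -O1 -ftree-vectorize -fopt-info-vec-optimized -c clamp.c
 *   gcc -O1 -ftree-vectorize -fopt-info-vec-missed -c clamp.c
 */
#include <stddef.h>

/* Copies src to dst, replacing negative values with zero.
 * restrict tells the compiler that dst and src do not overlap. */
void clamp_negative(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0.0f) {
            dst[i] = 0.0f;
        } else {
            dst[i] = src[i];
        }
    }
}
```

Repeating the same commands with -O0 in place of -O1 is a simple way to check the observation above that no vectorized code is produced at that level.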
Footnote - I came across this nice presentation about GCC auto vectorization, which you may find interesting - it gives a good introduction to auto vectorization with gcc, compiler flags and basic concepts.
I am focusing on the CPU/memory consumption of programs compiled by GCC.
Is code compiled with -O3 always so greedy in terms of resources?
Is there any scientific reference or specification that shows the difference in memory/CPU consumption between the different levels?
People working on this problem often focus on the impact of these optimizations on execution time, compiled code size, and energy. However, I can't find much work on resource consumption (with optimizations enabled).
Thanks in advance.
No, there is no absolute way, because optimization in compilers is an art (and is not even well defined, and might be undecidable or intractable).
But some guidelines first:
be sure that your program is correct and has no bugs before optimizing anything, so do debug and test your program
have well designed test cases and representative benchmarks (see this).
be sure that your program has no undefined behavior (and this is tricky, see this), since GCC will optimize strangely (but very often correctly, according to C99 or C11 standards) if you have UB in your code; use the -fsanitize= style options (and gdb and valgrind ....) during the debugging phase.
profile your code (on various benchmarks), in particular to find out what parts are worth optimization efforts; often (but not always) most of the CPU time happens in a small fraction of the code (rule of thumb: 80% of time spent in 20% of code; on some applications like the gcc compiler this is not true, check with gcc -ftime-report to ask gcc to show time spent in various compiler modules).... Most of the time "premature optimization is the root of all evil" (but there are exceptions to this aphorism).
improve your source code (e.g. use carefully and correctly restrict and const, add some pragmas or function or variable attributes, perhaps use wisely some GCC builtins __builtin_expect, __builtin_prefetch -see this-, __builtin_unreachable...)
use a recent compiler. The current version of GCC was 5.2 in October 2015 (and GCC 8 in June 2018), and continuous progress on optimization is being made; you might consider compiling GCC from its source code to have a recent version.
enable all warnings (gcc -Wall -Wextra) in the compiler, and try hard to avoid all of them; some warnings may appear only when you ask for optimization (e.g. with -O2)
Usually, compile with -O2 -march=native (or perhaps -mtune=native; I assume that you are not cross-compiling, and if you are, add the appropriate -march option ...) and benchmark your program with that
Consider link-time optimization by compiling and linking with -flto and the same optimization flags. E.g., put CC= gcc -flto -O2 -march=native in your Makefile (then remove -O2 -mtune=native from your CFLAGS there)...
Try also -O3 -march=native; usually you might get a tiny improvement over -O2 (but not always: you might sometimes have slightly faster code with -O2 than with -O3, though this is uncommon)
If you want to optimize the generated program size, use -Os instead of -O2 or -O3; more generally, don't forget to read the section Options That Control Optimization of the documentation. I guess that both -O2 and -Os would optimize the stack usage (which is very related to memory consumption). And some GCC optimizations are able to avoid malloc (which is related to heap memory consumption).
you might consider profile-guided optimizations, -fprofile-generate, -fprofile-use, -fauto-profile options
dive into the documentation of GCC, it has numerous optimization & code generation arguments (e.g. -ffast-math, -Ofast ...) and parameters and you could spend months trying some more of them; beware that some of them are not strictly C standard conforming!
recent GCC and Clang can emit DWARF debug information (somehow "approximate" if strong optimizations have been applied) even when optimizing, so passing both -O2 and -g could be worthwhile (you still would be able, with some pain, to use the gdb debugger on optimized executable)
if you have a lot of time to spend (weeks or months), you might customize GCC using MELT (or some other plugin) to add your own new (application-specific) optimization passes; but this is difficult (you'll need to understand GCC internal representations and organization) and probably rarely worthwhile, except in very specific cases (those when you can justify spending months of your time for improving optimization)
you might want to understand the stack usage of your program, so use -fstack-usage
you might want to understand the emitted assembler code, use -S -fverbose-asm in addition to optimization flags (and look into the produced .s assembler file)
you might want to understand the internal working of GCC, use various -fdump-* flags (you'll get hundreds of dump files!).
Of course the above todo list should be used in an iterative and agile fashion.
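A few of the source-level hints and inspection flags from the list above can be sketched together in one small file. This is only an illustration under stated assumptions: the macro UNLIKELY, the function scaled_add, and the file name hints.c are invented for the example, and whether any of these hints helps a given program has to be measured.

```c
/* hints.c: restrict, const and __builtin_expect in one small function.
 *
 * Compile with warnings and inspect the result:
 *   gcc -O2 -march=native -Wall -Wextra -c hints.c
 *   gcc -O2 -fstack-usage -c hints.c     (also writes a .su file)
 *   gcc -O2 -S -fverbose-asm hints.c     (writes hints.s)
 */
#include <stddef.h>

/* Fallback so the file still compiles with a non-GNU compiler. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Computes y[i] += a * x[i] for i in [0, n).
 * restrict promises that x and y do not overlap; const documents that
 * x is only read. Returns -1 if either pointer is null (hinted as the
 * rare path), 0 otherwise. */
int scaled_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    if (UNLIKELY(y == NULL || x == NULL)) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
    return 0;
}
```

Note that restrict is a promise made by the programmer: calling scaled_add with overlapping arrays would be undefined behavior, which is exactly the kind of bug the guidelines above warn about.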
For memory leak bugs, consider valgrind and several -fsanitize= debugging options. Read also about garbage collection (and the GC handbook), notably Boehm's conservative garbage collector, and about compile-time garbage collection techniques.
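As a small illustration of where such tools come in, the sketch below returns heap memory that the caller must release. The function name copy_string and the file name dup.c are hypothetical, and leak detection by the sanitizer is only available on platforms that support it.

```c
/* dup.c: heap allocation whose ownership passes to the caller.
 *
 * Build a test program with sanitizers enabled, or run it under valgrind:
 *   gcc -g -O1 -fsanitize=address,undefined dup.c ...
 *   valgrind --leak-check=full ./a.out
 */
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL if allocation fails.
 * The caller must free() the result; forgetting to do so is the kind
 * of leak these tools are designed to report. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;   /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}
```

Removing the free() call from a program that uses copy_string and rerunning it under valgrind is a quick way to see what a leak report looks like.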
Read about the MILEPOST project in GCC.
Consider also OpenMP, OpenCL, MPI, multi-threading, etc... Notice that parallelization is a difficult art.
Notice that even GCC developers are often unable to predict the effect (on CPU time of the produced binary) of such and such optimization. Somehow optimization is a black art.
Perhaps gcc-help#gcc.gnu.org might be a good place to ask more specific & precise and focused questions about optimizations in GCC
You could also contact me on basileatstarynkevitchdotnet with a more focused question... (and mention the URL of your original question)
For scientific papers on optimizations, you'll find lots of them. Start with ACM TOPLAS, ACM TACO etc... Search for iterative compiler optimization etc.... And define better what resources you want to optimize for (memory consumption means next to nothing....).
I think I won't find that in any textbook, because answering this takes experience.
I am currently in the stage of testing/validating my code / hunting bugs to get it into production state and any errors would lead to many people suffering e.g. the dark side.
What kind of flags do you set when you compile a Fortran program for debugging purposes?
What kind of flags do you set for the production system?
What do you do before you deploy?
The production version uses ifort as a compiler, yet I do my testing with gfortran. Am I doing it wrong?
Bare minimum
-Og/-O0
-O0 basically tells the compiler to make no optimisations. The optimiser can remove some local variables, merge some code blocks, etc., and as a result it can make debugging unpredictable. The price of the -O0 option is very slow code execution, but starting from version 4.8 the GCC compilers (including the Fortran one) accept a newly introduced optimisation level, -Og:
-Og
Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.
So, if possible use -Og, otherwise use -O0.
-g
This option actually makes debugging possible by requesting the compiler to produce debugging information intended to be used by an interactive debugger (GDB).
Additional
There are plenty of them. The most useful in my opinion are:
-Wall to "enable all the warnings about constructions that some users consider questionable, and that are easy to avoid (or modify to prevent the warning), even in conjunction with macros."
-Wextra to "enable some extra warning flags that are not enabled by -Wall."
-pedantic to generate warnings about language features that are supported by gfortran but are not part of the official Fortran 95 standard. It is possible to be even more "pedantic" and use the -std=f95 flag for warnings to become errors.
-fimplicit-none to "specify that no implicit typing is allowed, unless overridden by explicit IMPLICIT statements. This is the equivalent of adding implicit none to the start of every procedure."
-fcheck=all to "enable run-time tests", such as, for instance, array bounds checks.
-fbacktrace to "specify that, when a runtime error is encountered or a deadly signal is emitted (segmentation fault, illegal instruction, bus error or floating-point exception), the Fortran runtime library should output a backtrace of the error."
For debugging I use: -O2 -fimplicit-none -Wall -Wline-truncation -Wcharacter-truncation -Wsurprising -Waliasing -Wimplicit-interface -Wunused-parameter -fwhole-file -fcheck=all -std=f2008 -pedantic -fbacktrace. For ones not already explained, check the gfortran manual. -fcheck=all includes -fcheck=bounds.
For production I use: -O3 -march=native -fimplicit-none -Wall -Wline-truncation -fwhole-file -std=f2008. Runtime checks such as bounds checking increase the execution time of the resulting executable. I've found the cost to frequently be surprisingly low, but it can be high. Hence no runtime checks on compiles for production.