I have a small program that performs much better when compiled with -O1 than with no optimisation. I would like to know which optimisation(s) performed by the compiler lead to this speedup.
What I thought I would do is take the list of optimisation flags that -O1 is equivalent to (obtained from both the man page and gcc -Q -v) and then pick away at the list to see how the performance changes.
What I have found is that even enabling the whole list of optimisations still does not give me a program that performs as well as an -O1 optimised one.
In other words
gcc -O0 -fcprop-registers -fdefer-pop -fforward-propagate -fguess-branch-probability \
-fif-conversion -fif-conversion2 -finline -fipa-pure-const -fipa-reference \
-fmerge-constants -fsplit-wide-types -ftoplevel-reorder -ftree-ccp -ftree-ch \
-ftree-copy-prop -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse \
-ftree-fre -ftree-sink -ftree-sra -ftree-ter myprogram.c
is not the same as
gcc -O1 myprogram.c
I am using gcc version 4.5.3
Is there something else that -O1 does that isn't included in the list of optimisation flags associated with -O1 in the manual?
How about using the -S option to check the generated assembly?
From two experiments, also using "my_program.c", it seems that the -O0 option disables all optimizations regardless of the long list of flags suggested.
This is expected, not a bug:
https://gcc.gnu.org/wiki/FAQ#optimization-options
Is there something else that -O1 does that isn't included in the list of optimisation flags associated with -O1 in the manual?
Yes, it turns on optimization. Specifying individual -fxxx flags doesn't do that.
If you don't use one of the -O1, -O2, -O3, -Os, -Ofast, or -Og optimization options (-O0 does not count), then no optimization happens at all, so adjusting which optimization passes are active has no effect.
To find which optimization pass makes the difference you can turn on -O1 and then disable individual optimization passes until you find the one that makes a difference.
i.e. instead of:
gcc -fxxx -fyyy -fzzz ...
Use:
gcc -O1 -fno-xxx -fno-yyy -fno-zzz ...
Recently I read that certain compiler flags can prevent reverse engineering, or at least make it much more complicated. I'm using these flags:
-s -O3 -Os -fdata-sections -ffunction-sections -fvisibility=hidden -fvisibility-inlines-hidden -Wl,--gc-sections
Is this enough protection, or am I using too many flags?
I'm using MinGW-W64 x86_64-posix 11.3.0
I wrote a simple quicksort in C, in a file named test.c.
I'm trying to maximize optimization, so I apply -O3 repeatedly, like below:
gcc -S -O3 -o test.s test.c
gcc -S -O3 -o test1.s test.s
gcc -S -O3 -o test2.s test1.s
gcc -S -O3 -o test3.s test2.s
.
.
.
But something strange happens: the more times I repeat the procedure above, the more lines the assembly has.
I don't understand why, because I expected each repetition to produce a more optimized assembly file with fewer lines.
If this is not the right way, is using -O3 only once the best way to optimize?
Thanks
Most gcc optimizations operate on an intermediate-language representation of the C source code. I'm not aware of any optimization that operates specifically at the assembler-instruction level other than peephole optimization, and that is also included in -O3.
So yes, -O3 is supposed to be used only once, when turning C source into object files.
I've read in a few places that gcc tries to perform tail-call optimization when called with -O2 but not with -O3. Why would the latter optimize less than the former? The former should perform less optimization.
I don't think that's accurate. From the gcc documentation (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html), the flag -foptimize-sibling-calls is responsible for tail recursion elimination, and is enabled at both -O2 and -O3.
-foptimize-sibling-calls
Optimize sibling and tail recursive calls.
Enabled at levels -O2, -O3, -Os.
RTFM.
-O3 turns on all optimizations specified by -O2
(https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/Optimize-Options.html#Optimize-Options)
I am trying to understand how to turn off specific optimisation flags when compiling with GCC. I understand that some flags have a -fno option, but from what I have seen most flags don't. I am trying to compile a program with the -O1 flags but remove one of the -O1 flags on each compile.
For instance, -fauto-inc-dec does not have an equivalent -fno-auto-inc-dec flag that I could pass in the arguments like: -O1 -fno-auto-inc-dec.
I want to compile with -O1 but turn off specific options enabled by -O1, to see the difference each one makes.
Any help would be appreciated; I'm new to this, so I'm very much a beginner.
As stated in man gcc:
Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified.
So basically by not passing any -O flags you aren't using configurable optimizations.
Also, -O1 is not the default, -O0 is.
You could also approach it from the opposite direction: disable all optimizations and enable "batches" by hand. That is, look at gcc -Q --help=optimizers, see which optimizations are enabled at which level, and strip those.
To address your concern that the -O* options enable flags that aren't listed, I'd say that's a man-page issue. Querying the compiler directly on a particular architecture should give you an exhaustive list of the optimizations enabled by a particular -O flag, so using -O0 in combination with that list of flags should produce exactly the same result.
Why not go the other way round? Turn off all optimizations with -O0 and enable them selectively.
Or, if you prefer disabling them one by one, start with:
CFLAGS=-O0 \
-fauto-inc-dec \
-fcompare-elim -fcprop-registers \
-fdce -fdefer-pop -fdelayed-branch -fdse \
-fguess-branch-probability \
-fif-conversion2 -fif-conversion \
-fipa-pure-const -fipa-profile -fipa-reference \
-fmerge-constants \
-fsplit-wide-types \
-ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch \
-ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse \
-ftree-forwprop -ftree-fre -ftree-phiprop -ftree-slsr -ftree-sra \
-ftree-pta -ftree-ter \
-funit-at-a-time
(btw, all of this information is distilled from man gcc)
In a particular project, I saw the following compiler options used all at once:
gcc foo.c -o foo.o -Icomponent1/subcomponent1 -Icomponent2/subcomponent1 -Wall -fPIC -s
Are -fPIC and -s contradictory when used together here? If not, why not?
-s and -fPIC are two flags used for different purposes. They are not contradictory.
From the gcc manual:
-s
Remove all symbol table and relocation information from the executable.
-fPIC
If supported for the target machine, emit position-independent code, suitable for dynamic linking and avoiding any limit on the size of the global offset table. This option makes a difference on the m68k, PowerPC and SPARC.