gcc tail call optimization with -O2 but not -O3

I've read in a few places that gcc tries to perform tail-call optimization when called with -O2 but not with -O3. Why would the latter optimize less than the former? The former should perform less optimization.

I don't think that's accurate. From the gcc documentation (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html), the flag -foptimize-sibling-calls is responsible for tail recursion elimination, and is enabled at both -O2 and -O3.
-foptimize-sibling-calls
Optimize sibling and tail recursive calls.
Enabled at levels -O2, -O3, -Os.
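If you want to check this yourself rather than take the manual's word for it, here is a minimal sketch. The function name sum_to and the file name tail.c are made up for illustration; the only thing that matters is that the recursive call is in tail position.

```c
/* Hypothetical example: sum the integers 1..n using an accumulator.
 * The recursive call is the last thing the function does, so it is a
 * candidate for -foptimize-sibling-calls.
 *
 * Compare the assembly at both levels (file names are assumptions):
 *   gcc -O2 -S tail.c -o tail_O2.s
 *   gcc -O3 -S tail.c -o tail_O3.s
 * and check whether a recursive "call sum_to" is still present.
 */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);  /* tail position: nothing runs after it */
}
```

If the manual is right, neither listing should contain a recursive call; the exact code you get (a loop, or something simplified further) depends on the gcc version and target.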

RTFM.
-O3 turns on all optimizations specified by -O2
(https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/Optimize-Options.html#Optimize-Options)

Related

Compiler flags which make reverse engineering harder

Recently I read that using specific compiler flags can prevent or make reverse engineering much more complicated. I'm using those flags
-s -O3 -Os -fdata-sections -ffunction-sections -fvisibility=hidden -fvisibility-inlines-hidden -Wl,--gc-sections
Is this enough protection, or have I used too many flags?
I'm using MinGW-W64 x86_64-posix 11.3.0

Is gcc flags repetition and ordering important?

I see some gcc flags repeated when building a C extension for Python. When I run:
python setup.py build_ext
The running build command looks like this:
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fstack-protector-strong -fno-plt -flto=4 -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -march=x86-64 -mtune=generic -O3 -pipe -fstack-protector-strong -fno-plt -march=x86-64 -mtune=generic -O3 -pipe -fstack-protector-strong -fno-plt -fPIC -I/usr/include/python3.7m -c /tmp/src/source.c -o build/temp.linux-x86_64-3.7/tmp/src/source.o
gcc -pthread -shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=4 -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now build/temp.linux-x86_64-3.7/tmp/src/source.o -L/usr/lib -lpython3.7m -o build/lib.linux-x86_64-3.7/source.cpython-37m-x86_64-linux-gnu.so
Ok, that's a long one. But, as you can see some flags are repeated. Here is the repetition in the first command:
-O3 repeated 4 times.
-fno-plt repeated 3 times.
-fstack-protector-strong repeated 3 times.
-march=x86-64 repeated 3 times.
-mtune=generic repeated 3 times.
-pipe repeated 3 times.
Besides the -Wl,... flags, which are passed to the linker, do the repetition and ordering of these flags have any meaning?

It's not unusual to see repetitions of options in GCC commandlines
that have been generated by some tool-stack (frequently an IDE), with
human input "at the top".
None of the repetitions you have spotted makes any difference to the
meaning of the commandline. Usually such repetitions just amount to innocuous
redundancy, and they can have a rational
motive. A tool that is adding something incrementally to a GCC commandline may
wish to ensure that a certain option is enabled at that point, even if it
might have been disabled by an option appended since the last occurrence that enabled it. Repeating
the option redundantly may be cheaper than checking whether it is redundant.
But repetition is not necessarily innocuous...
If an option OPT occurs at some point in the commandline:
... OPT ...
then replacing that one occurrence with 2 or more will not make a difference.
However, if the commandline is of the form:
... OPT1 ... OPT2 ...
Then adding another occurrence of OPT1 anywhere after OPT2 may well make a
difference. Likewise adding another occurrence of OPT2 anywhere before OPT1.
That is because the order in which options occur very often makes a
difference.
An option is routinely composed of a flag and a value, e.g.
-O3 -> Flag = -O, value = 3
-I./inc -> Flag = -I, value = ./inc
Some flags, like -O can take any one of a set of mutually
exclusive values. Call these mutex flags, for short. When a mutex flag occurs repeatedly with
countervailing values, the last in the commandline prevails:
-O1 -O2 -O3 = -O3
-O3 -O2 -O1 = -O1
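One way to see the last-wins rule for -O in action is to ask the preprocessor. The sketch below relies on the __OPTIMIZE__ macro, which GCC predefines at every optimization level above -O0; the function name is made up for illustration.

```c
/* Returns 1 if this translation unit was compiled with optimization
 * enabled, 0 otherwise. GCC predefines __OPTIMIZE__ for -O1 and above,
 * so the result reflects whichever -O option came last on the
 * commandline. The function name is hypothetical. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

Compiled with -O3 -O0 this should return 0, and with -O0 -O3 it should return 1. It cannot tell -O1 from -O3, since the macro is defined for both.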
Other flags, like -I, can take arbitrary
non-exclusive values successively that are accumulated, in their order of occurrence, to form a sequence that
is one of the parameters of compilation or linkage. E.g.
-I./foo -I./bar
appends ./foo and then ./bar to the user-specified include-directory
search order for compilation. Call these cumulative flags.
Other flags are boolean and have an enabling form and a disabling
form, e.g. -fstack-protector,
-fno-stack-protector. These can be equated to mutex options with
exclusive possible values True and False.
And yet another kind of flag, like -l, accepts arbitrary non-exclusive values
successively that are not accumulated, but each just becomes the value of the flag
at that point in the commandline. As far as I can recall, -l is the only flag of this kind,
which is an anomalous kind: -lfoo isn't really an option so much as a positional argument
to which the flag attaches a method of interpretation. It says that a file
libfoo.{so|a} is to be input to the linkage at this point, whose absolute pathname
the linker is to discover algorithmically (with reference to the -L options). Let's
call such flags positional flags.
For mutex flags, the meaning of a commandline can be changed if an option
occurring somewhere is repeated later. E.g.
-fno-stack-protector -O1 -O3 -fstack-protector
already looks as if too many cooks have been spoiling the broth, and
is equivalent to:
-O3 -fstack-protector
But if we append some repetition:
-fno-stack-protector -O1 -O3 -fstack-protector -fno-stack-protector -O1
it becomes equivalent to:
-O1 -fno-stack-protector
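The stack-protector half of that example can be checked the same way. GCC predefines __SSP__, __SSP_ALL__, __SSP_STRONG__ or __SSP_EXPLICIT__ depending on which stack-protector option ends up in effect; the function name below is an assumption for illustration only.

```c
/* Reports which stack-protector mode was in effect when this file was
 * compiled, using GCC's predefined macros:
 *   0 = none (e.g. -fno-stack-protector came last)
 *   1 = -fstack-protector
 *   2 = -fstack-protector-all
 *   3 = -fstack-protector-strong
 *   4 = -fstack-protector-explicit
 * The function name is hypothetical. */
int stack_protector_mode(void)
{
#if defined(__SSP_ALL__)
    return 2;
#elif defined(__SSP_STRONG__)
    return 3;
#elif defined(__SSP_EXPLICIT__)
    return 4;
#elif defined(__SSP__)
    return 1;
#else
    return 0;
#endif
}
```

With the first commandline above it should return 1, and with the appended repetition it should return 0. Note that some distributions enable a stack protector by default, so the result with no explicit option may not be 0.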
For cumulative flags, it's easier to envisage messing with the meaning
of the commandline by repeating an option before some occurrence than after:
-I./foo -I./bar
means what it says. Whereas
-I./bar -I./foo -I./bar
means the same as:
-I./bar -I./foo
But that sort of messing hardly happens in practice, because repetitions of
options are almost always generated by appending
a repetition to a commandline during incremental construction.
Positional flags are by definition sensitive to order, both amongst themselves
and in relation to other options and positional arguments. Every permutation of
... -lfoo -lbar main.o ...
yields a different linkage. And repetition of options with positional flags
can also easily make a difference. Notoriously,
... -lfoo main.o ...
may well result in a linkage failure, which
... -lfoo main.o -lfoo
would fix.
So emphatically, yes, repetition and ordering of flags can be important.

How can I maximize optimization using gcc?

I wrote a simple quicksort in C, in a file named test.c.
I'm trying to maximize the optimization, so I use the -O3 option as below.
gcc -S -O3 -o test.s test.c
gcc -S -O3 -o test1.s test.s
gcc -S -O3 -o test2.s test1.s
gcc -S -O3 -o test3.s test2.s
.
.
.
But a strange thing happens. The more times I repeat the above procedure, the more lines of assembly I get.
I don't understand why, because I expected each pass to give a more optimized assembly file with fewer lines.
If this is not the right way, is using -O3 just once the way to get the best optimization?
Thanks

Most of the gcc optimizations operate on the representation of C source code in an intermediate language. I'm not aware of any optimization specifically operating at the assembler instruction level other than peephole. But that would also be included in -O3.
So yes, -O3 is supposed to be used only once, when turning C source into object files.

equivalent of pgcc "-Minfo=" flag for gcc compiler?

I just discovered the nice "-Minfo=" flag in pgcc, which outputs all the optimizations that the compiler is making.
For example:
pgcc -c -pg -O3 -Minfo=all -Minline -c -o example.o example.c
run:
55, Memory zero idiom, loop replaced by call to __c_mzero8
91, Memory zero idiom, loop replaced by call to __c_mzero8
pgcc -c -pg -O3 -Minfo=all -Minline -c -o controller.o controller.c
main:
82, second inlined, size=4, file controller.c (113)
84, second inlined, size=4, file controller.c (113)
is there an equivalent compiler flag for GCC?

Yes there is. -fopt-info is what you are looking for.
gcc -O3 -fopt-info example.c -o example
Plain -fopt-info prints only the optimizations that were applied, to stderr. To get everything, written to a file, you can do
gcc -O3 -fopt-info-all=all.dat example.c -o example
This will output all the optimization information to the file all.dat. You can also be specific about which optimization information you want by specifying -fopt-info-options like so:
-fopt-info-loop # info about all loop optimizations
-fopt-info-vec # info about auto-vectorization
-fopt-info-inline # info about function inlining
-fopt-info-ipa # info about all interprocedural optimizations
You can get more specific if you want by telling gcc to dump information only about loops/inlinings/vectorizations that were optimized or were missed
-fopt-info-inline-optimized # info only about functions that were inlined
-fopt-info-vec-missed # info only about vectorizations that were missed
-fopt-info-loop-note # verbose info about loop optimization
For more details look at the online documentation.
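As a small test subject, something like the following should do. This is only a sketch: the function name add_arrays and the file name add.c are made up, and whether the loop is reported as vectorized depends on your gcc version and target.

```c
#include <stddef.h>

/* Hypothetical example: element-wise addition, a simple loop that GCC
 * can usually auto-vectorize at -O3.
 *
 * Try (file name is an assumption):
 *   gcc -O3 -fopt-info-vec-optimized -c add.c
 *   gcc -O3 -fopt-info-vec-missed -c add.c
 */
void add_arrays(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The messages are tagged with the source file and line, which plays the same role as the line numbers in the pgcc -Minfo output.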

Selecting gcc optimisation flags equivalent to -O1

I have small program that performs much better when compiled with -O1 as opposed to no optimisation. I am interested in knowing what optimisation(s) done by the compiler is leading to this speedup.
What I thought I would do is to take the list of optimisation flags that -O1 is equivalent to (got both from the man page and from gcc -Q -v) and then to pick away at the list to see how the performance changes.
What I have found is that even including the whole list of optimisations still does not give me a program that performs as well as an -O1 optimised one.
In other words
gcc -O0 -fcprop-registers -fdefer-pop -fforward-propagate -fguess-branch-probability \
-fif-conversion -fif-conversion2 -finline -fipa-pure-const -fipa-reference \
-fmerge-constants -fsplit-wide-types -ftoplevel-reorder -ftree-ccp -ftree-ch \
-ftree-copy-prop -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse \
-ftree-fre -ftree-sink -ftree-sra -ftree-ter myprogram.c
is not the same as
gcc -O1 myprogram.c
I am using gcc version 4.5.3
Is there something else that -O1 does that isn't included in the list of optimisation flags associated with -O1 in the manual?

How about using -S option to check the produced assembler?
From two experiments with "my_program.c", it seems that the -O0 option disables all optimizations regardless of the long list of individual optimization flags.

This is expected, not a bug:
https://gcc.gnu.org/wiki/FAQ#optimization-options
Is there something else that -O1 does that isn't included in the list of optimisation flags associated with -O1 in the manual?
Yes, it turns on optimization. Specifying individual -fxxx flags doesn't do that.
If you don't use one of the -O1, -O2, -O3, -Os, -Ofast, or -Og optimization options (-O0 does not count), then no optimization happens at all, so adjusting which optimization passes are active doesn't do anything.
To find which optimization pass makes the difference you can turn on -O1 and then disable individual optimization passes until you find the one that makes a difference.
i.e. instead of:
gcc -fxxx -fyyy -fzzz ...
Use:
gcc -O1 -fno-xxx -fno-yyy -fno-zzz ...
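A minimal sketch of that bisection, assuming a made-up file bisect.c containing the function below. The function deliberately includes a computation whose result is never used, so a pass such as -ftree-dce (one of the flags in the question's list) has something to remove.

```c
/* Hypothetical test subject for bisecting -O1 (all names are assumptions).
 *
 *   gcc -O1 -Q --help=optimizers               # list what -O1 enables
 *   gcc -O1 -S bisect.c -o all.s
 *   gcc -O1 -fno-tree-dce -S bisect.c -o no_dce.s
 *   diff all.s no_dce.s
 */
int scaled_sum(const int *values, int count, int scale)
{
    int total = 0;
    for (int i = 0; i < count; ++i) {
        int unused = values[i] * 3;  /* dead: the result is never read */
        (void)unused;
        total += values[i] * scale;
    }
    return total;
}
```

The diff may well be empty for a single flag, since other passes can clean up the same code; in that case move on to the next -fno-xxx and time the real program after each change.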
