What is the optimization level of the `-S` switch to GCC?

In this question, I ran into a situation where gcc myfile.c -S produces assembly code that is better than gcc myfile.c -O0 but worse than gcc myfile.c -O1.
At -O0, both loops are generated. At -O1, both loops are optimized out. (Thanks to Raymond Chen for the reminder; cited from his comments.) Using just -S, only one loop is optimized out.
I searched the Internet and only found this:
-S (cited from Overall options)
Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file for each non-assembler input file specified.
By default, the assembler file name for a source file is made by replacing the suffix ‘.c’, ‘.i’, etc., with ‘.s’.
Input files that don't require compilation are ignored.
So my question is:
What exactly is the optimization level of the -S option when it compiles a file? (-O0.5?)
Why does it not just behave like -O0 or -O1... (or is it a bug?)
Edit: you can use this site to help you reproduce the problem. The code is in the question I mentioned. (If you just use the -S compiler option, or no compiler option at all, one loop is elided.)
step 1:
Open this site and copy the following code into the Code Editor.
#include <stdio.h>

int main (int argc, char *argv[]) {
    unsigned int j = 10;
    for (; j > -1; --j) {
        printf("%u", j);
    }
}
step 2:
Choose g++ 4.8 as the compiler. Leave the compiler options empty (or use -S).
step 3:
You get the first situation. Now change j > -1 to j >= -1 and you can see the second.

With your last edit, it's now somewhat clear what you're actually doing, so:
For the first case, j > -1:
This can never happen. j is an unsigned int, and -1 converted to an unsigned value corresponds to a value with all bits set. That's the same as UINT_MAX, and j can never be greater than that. So gcc eliminates the loop, since its condition is always false.
For the second case, j >= -1:
This can happen. j can certainly become (unsigned int)-1, or UINT_MAX as mentioned above. The loop is not eliminated.
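To make the conversion concrete, here is a minimal standalone check of my own (assuming a typical platform where unsigned int is 32 bits):

#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int j = 10;
    /* -1 converted to unsigned int has all bits set, i.e. UINT_MAX */
    printf("%u\n", (unsigned int)-1);             /* 4294967295 for 32-bit unsigned */
    printf("%d\n", (unsigned int)-1 == UINT_MAX); /* 1 */
    /* In j > -1 the int operand -1 is converted to unsigned, so the
       comparison is effectively j > UINT_MAX, which can never hold. */
    printf("%d\n", j > -1);                       /* 0, always */
    return 0;
}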
What exactly is the optimization level of the -S option when it compiles a file? (-O0.5?)
The optimization level is controlled with the -O flag. -S does not affect optimization. The default optimization level if no -O flag is given is -O0 (no optimization).

-S doesn't optimize. -O0, on the other hand, disables any and all optimizations, even the default ones.
So the effect that you see is that you're "enabling" the default optimizations if you use just -S.
Use -S with various -O options to see the effect on the assembler code.
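For example (file names here are just placeholders):

gcc -S -O0 myfile.c -o myfile-O0.s
gcc -S -O1 myfile.c -o myfile-O1.s
diff myfile-O0.s myfile-O1.s

Comparing the two .s files shows exactly which constructs each level removes.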
EDIT I've been using GCC since about 2.6 (in 1994). I'm pretty sure I remember that in some versions, the compiler would do default optimizations that you could disable with -O0 to debug the compiler (i.e. gcc ... crashes, gcc -O0 ... doesn't crash -> congrats, you found a bug).
But that doesn't seem to be the case here. I get the same assembler output for -S, -O0 and not giving either. So it seems that the simple optimizations (like if(0){} to comment out a code block) are always applied, no matter which optimization level is selected.
Therefore, I'd say that the original statement above:
At -O0, both loops are generated. At -O1, both loops are optimized out. (Thanks to Raymond Chen for the reminder; cited from his comments.) Using just -S, only one loop is optimized out.
is not correct to begin with (at least for GCC 4.8.2). The only other alternative is that the GCC version used by the OP (4.8) has a bug when it comes to enabling/disabling optimizer options.
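If you want to check for yourself what a given level enables, GCC can report the state of its optimizer flags (a quick inspection I'm adding here, not part of the original answer):

gcc -Q --help=optimizers -O0 | less
gcc -Q --help=optimizers -O1 | less

Diffing the two listings shows which passes -O1 turns on relative to -O0.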

Related

GCC: __atomic_always_lock_free compiles with -O3, but not with -O0

Sample code:
int *s;

int foo(void)
{
    return 4;
}

int bar(void)
{
    return __atomic_always_lock_free(foo(), s);
}
Invocations:
$ gcc t0.c -O3 -c
<nothing>
$ gcc t0.c -O0 -c
t0.c:10:10: error: non-constant argument 1 to '__atomic_always_lock_free'
Any ideas?
Relevant: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html.
This doesn't seem surprising. The documentation you linked says that "size must resolve to a compile-time constant" and so it's to be expected that you might get an error when passing foo(). However, it's typical that if GCC is able to determine the value of an expression at compile time, then it will treat it as a compile-time constant, even if it doesn't meet the language's basic definition of a constant expression. This may be considered an extension and is explicitly allowed by the C17 standard at 6.6p10.
The optimization level affects how hard the compiler tries to evaluate an expression at compile time. With optimizations off, it does little more than the basic constant folding that the standard requires (e.g. 2*4). With optimizations on, you get the benefit of its full constant propagation pass, as well as function inlining.
So in essence, under -O0, the compiler doesn't notice that foo() always returns the same value, because you've disabled the optimizations that would allow it to reach that conclusion. With -O3 it does and so it accepts it as a constant.
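For contrast, a variant that is accepted at any optimization level passes a genuine integer constant expression as the first argument, so no constant propagation is needed (a minimal sketch of my own, not from the original thread):

int *s;

int baz(void)
{
    /* sizeof(int) is an integer constant expression, so this compiles
       even at -O0 */
    return __atomic_always_lock_free(sizeof(int), s);
}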

How to find hanging LLVM optimization pass?

I've written an LLVM pass that replaces a few store instructions with calls to a function that performs some book-keeping and then performs the store in a special way. It works fine when I compile with -O0, but I can only guarantee the functionality of my pass when using -O3. When I compile with -O3 (or -O1/-O2), my pass completes successfully, and then the compiler hangs in some later optimization stage. Is there a way to discover which optimization pass is hanging, and why?
Just so I don't have to provide it later, here is my code and my compile line.
clang++-5.0 -std=c++11 -Xclang -load -Xclang ../../plugin/build/mylib.so single_param.cc -c -I ../../libs/ -S -emit-llvm -O3
The problem is not in code generation, because I'm only generating bitcode. I noticed that stores at -O3 (without my pass) include alias information, and I thought that since I'm deleting these instructions, some later optimization relying on this alias information might run into trouble, so I turned off most of the alias analysis with -fno-strict-aliasing.
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>

void __attribute__((noinline)) f(int *n) {
    *n = *n + 1;
}

int main() {
    int a = 4;
    f(&a);
    return a;
}
The way I was able to find the pass that was stalling was by turning remarks on with
-Rpass=.* -Rpass-missed=.* -Rpass-analysis=.*
I found that the only optimization pass giving remarks was tail call optimization, so I turned it off. I later found the problem in my own code, but this is how I tracked down the issue I was causing.
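Combined with the original compile line, the diagnosis step looks roughly like this (the remark patterns are quoted so the shell doesn't try to glob them):

clang++-5.0 -std=c++11 -Xclang -load -Xclang ../../plugin/build/mylib.so \
    -Rpass='.*' -Rpass-missed='.*' -Rpass-analysis='.*' \
    single_param.cc -c -I ../../libs/ -S -emit-llvm -O3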

Passing multiple -std switches to g++

Is it safe to assume that running g++ with
g++ -std=c++98 -std=c++11 ...
will compile using C++11? I haven't found an explicit confirmation in the documentation, but I see the -O flags behave this way.
The GCC manual doesn't state that the last of several mutually exclusive -std=... options takes effect, but the first occurrence and the last occurrence are the only realistic alternatives. There are numerous GCC flags that take mutually exclusive alternative values from a finite set - mutually exclusive, at least modulo the language of a translation unit. Let's call them mutex options for short.
Whether it is documented that the last setting takes effect seems almost random. It is documented for the -O options, as you've noted, and in general terms for mutually exclusive warning options, perhaps others. It is never documented that the first of multiple settings takes effect, because it's never true.
The documentation leans - with imperfect consistency - on the historical conventions of command usage in unix-like OSes. If a command accepts a mutex option, then the last occurrence of the option takes effect. If the command were - unusually - to act only on the first occurrence of the option, then it would be a bug for the command to accept subsequent occurrences at all: it should give a usage error.
This is custom and practice. The custom facilitates scripting with tools that
respect it, e.g. a script can invoke a tool passing a default setting of some
mutex option but enable the user to override that setting via a parameter of the script,
whose value can simply be appended to the default invocation.
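As a sketch of that convention (a hypothetical wrapper script, not something from the GCC sources):

#!/bin/sh
# build.sh - compile with a default standard that the caller can override.
# User-supplied flags land after -std=c++98 on the command line, so a
# later -std=c++11 from the user wins.
g++ -std=c++98 "$@" -o app main.cpp

Invoking ./build.sh -std=c++11 then overrides the script's default, precisely because the last setting takes effect.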
In the absence of official GCC documentation to the effect you want, you might get
reassurance by attempting to find any GCC mutex option for which it is not
the case that the last occurrence takes effect. Here's one stab:
I'll compile and link this program:
main.cpp
#include <cstdio>

#if __cplusplus >= 201103L
static const char * str = "C++11";
#else
static const char * str = "Not C++11";
#endif

int main()
{
    printf("%s\n%d\n", str, str); // Format `%d` for `str` mismatch
    return 0;
}
with the commandline:
g++ -std=c++98 -std=c++11 -m32 -m64 -O0 -O1 -g3 -g0 \
-Wformat -Wno-format -o wrong -o right main.cpp
which requests contradictory option pairs:
-std=c++98 -std=c++11: Conform to C++98. Conform to C++11.
-m32 -m64: Produce 32-bit code. Produce 64-bit code.
-O0 -O1: Do not optimise at all. Optimize to level 1.
-g3 -g0: Emit maximum debugging info. Emit no debugging info.
-Wformat -Wno-format: Sanity-check printf arguments. Don't sanity-check them.
-o wrong -o right: Output program wrong. Output program right.
It builds successfully with no diagnostics:
$ echo "[$(g++ -std=c++98 -std=c++11 -m32 -m64 -O0 -O1 -g3 -g0 \
-Wformat -Wno-format -o wrong -o right main.cpp 2>&1)]"
[]
It produces no program wrong:
$ ./wrong
bash: ./wrong: No such file or directory
It does output a program right:
$ ./right
C++11
-1713064076
which tells us it was compiled to C++11, not C++98.
The bug exposed by the garbage -1713064076 was not diagnosed because
-Wno-format, not -Wformat, took effect.
It is a 64-bit, not 32-bit executable:
$ file right
right: ELF 64-bit LSB shared object, x86-64 ...
It was optimized at -O1, not -O0, because:
$ echo "[$(nm -C right | grep str)]"
[]
shows that the local symbol str is not in the symbol table.
And it contains no debugging information:
echo "[$(readelf --debug-dump right)]"
[]
as per -g0, not -g3.
Since GCC is open-source software, another way of resolving doubts
about its behaviour that is available to C programmers, at least,
is to inspect the relevant source code, available via git source-control at
https://github.com/gcc-mirror/gcc.
The relevant source code for your question is in the file gcc/gcc/c-family/c-opts.c, in the function:
/* Handle switch SCODE with argument ARG.  VALUE is true, unless no-
   form of an -f or -W option was given.  Returns false if the switch was
   invalid, true if valid.  Use HANDLERS in recursive handle_option calls.  */
bool
c_common_handle_option (size_t scode, const char *arg, int value,
                        int kind, location_t loc,
                        const struct cl_option_handlers *handlers);
It is essentially a simple switch ladder over option settings enumerated by scode - which
is OPT_std_c__11 for option -std=c++11 - and leaves no doubt that it
puts an -std option setting into effect regardless of what setting was in effect previously. You can look at branches other than master
(gcc-{5|6|7}-branch) with the same conclusion.
It's not uncommon to find GCC build system scripts that rely on the validity of
overriding an option setting by appending a new setting. Legalistically, this
is usually counting on undocumented behaviour, but there's a better
chance of Russia joining NATO than of GCC ceasing to take the last setting that
it parses for a mutex option.

gcc optimizations: how to deal with macro expansion in strncmp & other functions

Take this sample code:
#include <string.h>

#define STRcommaLEN(str) (str), (sizeof(str)-1)

int main() {
    const char * b = "string2";
    const char * c = "string3";
    strncmp(b, STRcommaLEN(c));
}
If you don't use optimizations in GCC, all is fine, but if you add -O1 and above, as in gcc -E -std=gnu99 -Wall -Wextra -c -I/usr/local/include -O1 sample.c, strncmp becomes a macro, and at the preprocessing stage STRcommaLEN is not expanded. In fact, in the resulting "code", strncmp's arguments are completely stripped.
I know if I add #define NEWstrncmp(a, b) strncmp (a, b) and use it instead, the problem goes away. However, mapping your own functions to every standard function that may become a macro doesn't seem like a great solution.
I tried finding the specific optimization responsible for it and failed. In fact, if I replace -O1 with all the flags it enables according to man gcc, the problem goes away. My conclusion is that -O1 adds some optimizations that are not controlled by flags and this is one of them.
How would you deal with this issue in a generic way? There may be some macro magic I am not familiar with or compiler flags I haven't looked at? We have many macros and a substantial code base - this code is just written to demonstrate one example.
Btw, GCC version/platform is gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5).
You correctly noted that
at the preprocessing stage STRcommaLEN is not expanded
- more precisely, not before the strncmp macro gets expanded. This inevitably leads to an error you probably overlooked or forgot to mention:
sample.c:7:30: error: macro "strncmp" requires 3 arguments, but only 2 given
Your conclusion
that -O1 adds some optimizations that are not controlled by flags and
this is one of them
is also right - this is controlled by the macro __OPTIMIZE__ which apparently gets set by -O1.
If I were to do something like that (which I probably wouldn't, given the pitfall you demonstrated by taking sizeof of a char *), I'd still choose
mapping your own functions to every standard function that may become
a macro
- but rather like
#include <string.h>

#define STRNCMP(s1, s2) strncmp(s1, s2, sizeof(s2)-1)

int main()
{
    const char b[] = "string2";
    const char c[] = "string3";
    STRNCMP(b, c);
}
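Another generic escape hatch, if you want to keep the original STRcommaLEN macro, is to suppress the function-like macro expansion of strncmp altogether: the preprocessor only expands a function-like macro when its name is immediately followed by a parenthesis, which the C standard guarantees. A small sketch of my own (using arrays rather than pointers to sidestep the sizeof pitfall):

#include <string.h>

#define STRcommaLEN(str) (str), (sizeof(str)-1)

int main() {
    const char b[] = "string2";
    const char c[] = "string3";
    /* Parenthesizing the name bypasses any strncmp macro and calls
       the real function, so STRcommaLEN expands normally. */
    (strncmp)(b, STRcommaLEN(c));
}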

How effective is g++/gcc at unrolling recursive inline functions?

I have a recursive but not tail-recursive inline function for which I'd like gcc to unroll the recursion. Yes, I'm using g++ -O3 -funroll-loops, of course.
inline void recurse_fun(..., unsigned depth = 0, unsigned max_depth = 40) {
    if (++depth > max_depth) return;
    for (auto i = ..., iend = ...; i != iend; i++) {
        if (...) continue;
        ...
        recurse_fun(..., depth, max_depth);
    }
}
I could easily replace this by handling a stack<...> object manually, which gcc should unroll properly, but it would not be quite as elegant or maintainable.
I should really try profiling both versions regardless, but I'm curious if anyone can say with confidence that some recent gcc version would or would not handle this correctly.
GCC (at least recent versions like 4.5 or 4.6) does unroll some tail recursive calls.
Of course you need to ask it to optimize (so -O2 or -O3 is required).
To understand what it is doing you can
Ask for the assembly output with something like gcc -O3 -fverbose-asm -S yoursource.c
Ask for various dump files, like gcc -c -fdump-tree-all -fdump-ipa-all -O3 yoursource.c (and there are other dump files)
Beware that GCC will print a lot (hundreds!) of dump files. And the dump files are only meant to help GCC developers or GCC plugin developers (or GCC MELT developers). Don't expect them to stay in the same format from one release of GCC to the next.
The numbering of the dump files is useless: it is not chronological or logical.
And the dump options are likely to change in the next GCC release (4.7, probably in 2012).
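To see the tail-call transformation concretely, here is a minimal example of my own (not code from the thread). Compile it with gcc -O2 -S and note that the body of fact contains a loop rather than a recursive call:

/* Accumulator-style factorial: the recursive call is in tail position,
   so GCC's tail-call optimization rewrites it as a loop at -O2. */
unsigned long fact(unsigned long n, unsigned long acc)
{
    if (n <= 1)
        return acc;
    return fact(n - 1, acc * n);
}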
