I have a Fortran program that gives different results with -O0 and -O1 on 32-bit systems. Tracking down the difference, I came up with the following test case (test.f90):
program test
  implicit none
  character foo
  real*8 :: Fact, Final, Zeta, rKappa, Rnxyz, Zeta2
  read(5,*) rKappa
  read(5,*) Zeta
  backspace(5)
  read(5,*) Zeta2
  read(5,*) Rnxyz
  Fact = rKappa/Sqrt(Zeta**3)
  write(6,'(ES50.40)') Fact*Rnxyz
  Fact = rKappa/Sqrt(Zeta2**3)
  Final = Fact*Rnxyz
  write(6,'(ES50.40)') Final
end program test
with this data file:
4.1838698196228139E-013
20.148674000000000
-0.15444754236171612
The program should write exactly the same number twice. Note that Zeta2 is the same as Zeta, since the same number is read again (this prevents the compiler from realizing they are the same number and hiding the problem). The only difference is that in the first case the operation is done "on the fly" inside the write statement, while in the second the result is saved in a variable and the variable is printed.
Now I compile with gfortran 4.8.4 (Ubuntu 14.04 version) and run it:
$ gfortran -O0 -m32 test.f90 && ./a.out < data
-7.1447898573566615177997578153994664188136E-16
-7.1447898573566615177997578153994664188136E-16
$ gfortran -O1 -m32 test.f90 && ./a.out < data
-7.1447898573566615177997578153994664188136E-16
-7.1447898573566605317236262891347096541529E-16
So, with -O0 the numbers are identical, with -O1 they are not.
I tried checking the optimized code with -fdump-tree-optimized:
final.10_53 = fact_44 * rnxyz.9_52;
D.1835 = final.10_53;
_gfortran_transfer_real_write (&dt_parm.5, &D.1835, 8);
[...]
final.10_63 = rnxyz.9_52 * fact_62;
final = final.10_63;
[...]
_gfortran_transfer_real_write (&dt_parm.6, &final, 8);
The only difference I see is that in one case the number printed is fact*rnxyz, and in the other it is rnxyz*fact. Can this change the result? From High Performance Mark's answer, I guess it might have to do with which variable ends up in which register, and when. I also tried looking at the assembly output generated with -S, but I can't say I understand it.
And then, without the -m32 flag (on a 64-bit machine), the numbers are also identical...
Edit: The numbers are identical if I add -ffloat-store or -mfpmath=sse -msse2 (see here, at the end). This makes sense, I guess, when I compile on an i686 machine, as the compiler would by default use 387 math. But when I compile on an x86-64 machine with -m32, it shouldn't be needed according to the documentation:
-mfpmath=sse [...]
For the i386 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default.
[...]
This is the default choice for the x86-64 compiler.
Maybe -m32 makes these "defaults" ineffective? However, running gfortran -Q --help=target says mfpmath is 387 and msse2 is disabled...
Too long for a comment, but more of a suspicion than an answer. OP writes
The only difference is that first an operation is done "on the fly"
when writing, and then the result is saved in a variable and the
variable is printed.
which has me thinking about the x87 FPU's internal 80-bit f-p arithmetic (the default for 32-bit x86 code). The precise results of a sequence of f-p arithmetic operations will be affected by when intermediate values are trimmed from 80 to 64 bits. And that's the kind of thing which may differ from one compiler optimisation level to another.
Note too that the differences between the two numbers printed by the O1 version of the code kick in at the 15th decimal digit, around the limit of precision available in 64-bit f-p arithmetic.
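If you want to see the effect of that trimming in isolation, here is a minimal C++ sketch (my own example, not the OP's program). 1e308 * 1e308 overflows a 64-bit double but fits in the x87's 80-bit extended format, so whether the product is rounded before the division changes the answer. Built with g++ -m32 -O1 and 387 math the two lines can differ; with -ffloat-store or -mfpmath=sse (or typically at -O0, where intermediates are spilled to memory) both should print inf:
#include <cstdio>

int main() {
    // 1e308 * 1e308 overflows a 64-bit double, but fits comfortably in the
    // x87 FPU's 80-bit extended format (max ~1.19e4932).
    volatile double a = 1e308, b = 1e308;

    // If the product stays in an 80-bit register, dividing it back down
    // recovers 1e308 ...
    double on_the_fly = a * b / 1e308;

    // ... but a store to a double variable forces rounding to 64 bits,
    // at which point the product is already infinite.
    volatile double stored = a * b;
    double via_variable = stored / 1e308;

    std::printf("on the fly : %g\n", on_the_fly);
    std::printf("via a store: %g\n", via_variable);
    return 0;
}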
Some more fiddling around gives
1 01111001100 1001101111011110011111001110101101101100011000001110
as the IEEE-754 representation of
-7.1447898573566615177997578153994664188136E-16
and
1 01111001100 1001101111011110011111001110101101101100011000001101
as the IEEE-754 representation of
-7.1447898573566605317236262891347096541529E-16
The two numbers differ by 1 in the last bit of their significands. It's possible that at O0 your compiler adheres to IEEE-754 rules for f-p arithmetic (those rules are strict about matters such as rounding at the low-order bits) but at O1 adheres only to Fortran's rather more relaxed view of arithmetic. (The Fortran standard does not require the use of IEEE-754 arithmetic.)
You may find a compiler option to enforce adherence to IEEE-754 rules at higher levels of optimisation. You may also find that that adherence costs you a measurable amount of run time.
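For reference, here is a short C++ sketch (my own; any equivalent type-punning approach works) of how such sign/exponent/significand dumps can be produced:
#include <cstdio>
#include <cstring>
#include <cstdint>

// Print a double as its IEEE-754 fields: sign bit, 11-bit exponent,
// 52-bit significand.
static void dump_bits(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);   // well-defined way to reinterpret the bytes
    std::printf("%d ", (int)(u >> 63));
    for (int i = 62; i >= 52; --i) std::putchar('0' + (int)((u >> i) & 1));
    std::putchar(' ');
    for (int i = 51; i >= 0; --i) std::putchar('0' + (int)((u >> i) & 1));
    std::putchar('\n');
}

int main() {
    dump_bits(-7.1447898573566615177997578153994664188136e-16);
    dump_bits(-7.1447898573566605317236262891347096541529e-16);
    return 0;
}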
Related
Sample code:
int *s;
int foo(void)
{
return 4;
}
int bar(void)
{
return __atomic_always_lock_free(foo(), s);
}
Invocations:
$ gcc t0.c -O3 -c
<nothing>
$ gcc t0.c -O0 -c
t0.c:10:10: error: non-constant argument 1 to '__atomic_always_lock_free'
Any ideas?
Relevant: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html.
This doesn't seem surprising. The documentation you linked says that "size must resolve to a compile-time constant" and so it's to be expected that you might get an error when passing foo(). However, it's typical that if GCC is able to determine the value of an expression at compile time, then it will treat it as a compile-time constant, even if it doesn't meet the language's basic definition of a constant expression. This may be considered an extension and is explicitly allowed by the C17 standard at 6.6p10.
The optimization level is relevant to what the compiler tries in attempting to evaluate an expression at compile time. With optimizations off, it does little more than the basic constant folding that the standard requires (e.g. 2*4). With optimizations on, you get the benefit of its full constant propagation pass, as well as function inlining.
So in essence, under -O0, the compiler doesn't notice that foo() always returns the same value, because you've disabled the optimizations that would allow it to reach that conclusion. With -O3 it does and so it accepts it as a constant.
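To illustrate the boundary, here is a sketch (the sizeof variant is my own example, not from the question). An argument that is a constant expression by the language's rules compiles at any optimization level; one that merely folds to a constant needs the optimizer:
int *s;

int foo(void) { return 4; }

// Compiles at -O0 and -O3 alike: sizeof(*s) is a genuine constant expression.
int ok(void) { return __atomic_always_lock_free(sizeof(*s), s); }

// Compiles only when the optimizer folds foo() down to the constant 4
// (e.g. at -O3, with inlining and constant propagation).
int maybe(void) { return __atomic_always_lock_free(foo(), s); }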
The following works for g++
assert(nullptr == 0);
I need to know if there is any implicit type conversion that is happening.
From what I know, nullptr can be compared with pointers only and not with integers, and it is also more type-safe. Then why does the comparison with an integer work?
Then why does the comparison with an integer work?
Because, in most implementations, nullptr is represented as machine address 0. In other words, (intptr_t)nullptr is 0. This is the case on Linux/x86-64, for example. Check by inspecting the generated assembler code obtained with g++ -S -O2 -fverbose-asm.
The C++ standard (read e.g. n3337) does in fact guarantee that the comparison itself is true: the literal 0 is a null pointer constant, so it converts to a null pointer value, which compares equal to nullptr. What the standard does not guarantee is the all-zero bit representation.
However, if you compile your code with a recent GCC, e.g. g++ -Wall -Wextra, you could get a warning.
Read also assert(3). In some cases (with NDEBUG) it is expanded to a no-op at compilation time.
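To separate the two claims above, here is a short sketch (my own): the truth of the comparison is mandated by the language, because 0 is a null pointer constant, while the all-zero object representation is merely typical:
#include <cassert>
#include <cstdint>
#include <cstring>

int main() {
    assert(nullptr == 0);   // guaranteed: 0 is a null pointer constant
    // int i = 0; assert(nullptr == i);   // ill-formed: i is not a constant

    void *p = nullptr;
    std::uintptr_t bits = 0;
    std::memcpy(&bits, &p, sizeof p);     // inspect the object representation
    assert(bits == 0);      // true on mainstream platforms, not mandated
    return 0;
}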
The GCC 5.4 documentation says:
-O2 turns on all optimization flags specified by -O. It also turns on the following optimization flags:
-fthread-jumps
-falign-functions -falign-jumps
-falign-loops -falign-labels
-fcaller-saves
-fcrossjumping
-fcse-follow-jumps, etc
So it seems that using -O2 should have the same effect on the performance of the test programs as turning on all 83 optimization flags that -O2 enables in GCC 5.4.0.
However, I compared the running times of the executables test1 and test2 obtained by
gcc-5.4 -O2 test.c -o test1
and
gcc-5.4 -fauto-inc-dec
-fbranch-count-reg
-fcombine-stack-adjustments
-fcompare-elim ... -fthread-jumps -falign-functions ...(all the 83 flags) test.c -o test2
I tested 20 randomly generated C programs, running each test case 100000 times to make the timing measurements accurate enough. The result is that -O2 is on average about 60% faster than using all 83 flags.
I am really confused about why the effect of -O2 is not equivalent to using all the optimization flags it turns on.
I must have misunderstood something, but I couldn't find an explanation. I'd appreciate any help. Thanks a lot.
It is a common gotcha. In order to enable (or disable) specific optimizations, you must first enable the optimizer in general, i.e. use one of the -O... flags other than -O0 (or just -O, which is equivalent to -O1).
The optimisation level also affects decisions in other parts of the compiler besides determining which passes get run. These include mandatory stages such as transforming between internal representations of the code and register allocation, so the optimisation level is not exactly equivalent to a set of switches enabling every compiler pass.
Look at this thread for some discussion on this topic.
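A quick way to see this for yourself (my own minimal example, not one of the 20 generated programs) is a dead store that the optimizer removes but the individual flags alone do not:
// dead.cpp
int g;

void f(void) {
    g = 1;   // dead store: immediately overwritten below
    g = 2;
}
Compiling with g++ -O1 -S dead.cpp should emit only the second store, while g++ -O0 -fdce -ftree-dse -S dead.cpp should still emit both, because at -O0 the passes those flags control never run.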
Is it safe to assume that running g++ with
g++ -std=c++98 -std=c++11 ...
will compile using C++11? I haven't found an explicit confirmation in the documentation, but I see the -O flags behave this way.
The GCC manual doesn't state that the last of any mutually exclusive -std=... options specified takes effect, but the first occurrence or the last occurrence are the only alternatives. There are numerous GCC flags that take mutually exclusive alternative values from a finite set - mutually exclusive, at least modulo the language of a translation unit. Let's call them mutex options for short. It is a seemingly random rarity for it to be documented that the last setting takes effect: it is documented for the -O options, as you've noted, and in general terms for mutually exclusive warning options, perhaps others. It's never documented that the first of multiple settings takes effect, because that's never true.
The documentation leans - with imperfect consistency - on the historical conventions of command usage in unix-like OSes. If a command accepts a mutex option, then the last occurrence of the option takes effect. If the command were - unusually - to act only on the first occurrence of the option, then it would be a bug for the command to accept subsequent occurrences at all: it should give a usage error. This is custom and practice. The custom facilitates scripting with tools that respect it: e.g. a script can invoke a tool passing a default setting of some mutex option but enable the user to override that setting via a parameter of the script, whose value can simply be appended to the default invocation.
In the absence of official GCC documentation to the effect you want, you might get reassurance by attempting to find any GCC mutex option for which it is not the case that the last occurrence takes effect. Here's one stab:
I'll compile and link this program:
main.cpp
#include <cstdio>
#if __cplusplus >= 201103L
static const char * str = "C++11";
#else
static const char * str = "Not C++11";
#endif
int main()
{
printf("%s\n%d\n",str,str); // Format `%d` for `str` mismatch
return 0;
}
with the commandline:
g++ -std=c++98 -std=c++11 -m32 -m64 -O0 -O1 -g3 -g0 \
-Wformat -Wno-format -o wrong -o right main.cpp
which requests contradictory option pairs:
-std=c++98 -std=c++11: Conform to C++98. Conform to C++11.
-m32 -m64: Produce 32-bit code. Produce 64-bit code.
-O0 -O1: Do not optimise at all. Optimize to level 1.
-g3 -g0: Emit maximum debugging info. Emit no debugging info.
-Wformat -Wno-format: Sanity-check printf arguments. Don't sanity-check them.
-o wrong -o right: Output program wrong. Output program right.
It builds successfully with no diagnostics:
$ echo "[$(g++ -std=c++98 -std=c++11 -m32 -m64 -O0 -O1 -g3 -g0 \
-Wformat -Wno-format -o wrong -o right main.cpp 2>&1)]"
[]
It outputs no program wrong:
$ ./wrong
bash: ./wrong: No such file or directory
It does output a program right:
$ ./right
C++11
-1713064076
which tells us it was compiled to C++11, not C++98.
The bug exposed by the garbage -1713064076 was not diagnosed because
-Wno-format, not -Wformat, took effect.
It is a 64-bit, not 32-bit executable:
$ file right
right: ELF 64-bit LSB shared object, x86-64 ...
It was optimized -O1, not -O0, because:
$ "[$(nm -C right | grep str)]"
[]
shows that the local symbol str is not in the symbol table.
And it contains no debugging information:
echo "[$(readelf --debug-dump right)]"
[]
as per -g0, not -g3.
Since GCC is open-source software, another way of resolving doubts about its behaviour that is available to C programmers, at least, is to inspect the relevant source code, available via git source-control at https://github.com/gcc-mirror/gcc.
The relevant source code for your question is in the file gcc/gcc/c-family/c-opts.c, in the function:
/* Handle switch SCODE with argument ARG. VALUE is true, unless no-
form of an -f or -W option was given. Returns false if the switch was
invalid, true if valid. Use HANDLERS in recursive handle_option calls. */
bool
c_common_handle_option (size_t scode, const char *arg, int value,
int kind, location_t loc,
const struct cl_option_handlers *handlers);
It is essentially a simple switch ladder over option settings enumerated by scode - which is OPT_std_c__11 for option -std=c++11 - and it leaves no doubt that it puts an -std option setting into effect regardless of what setting was in effect previously. You can look at branches other than master (gcc-{5|6|7}-branch) and reach the same conclusion.
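As a toy model (not the real GCC source; the names and values here are illustrative stand-ins), the structure amounts to this:
#include <cstdio>

// Toy model of the switch ladder in c_common_handle_option: each -std case
// unconditionally overwrites the current dialect, so the last option parsed wins.
enum opt_code { OPT_std_c__98, OPT_std_c__11 };

static long cxx_dialect = 199711L;   // stand-in for the real dialect state

static void handle_option(opt_code scode) {
    switch (scode) {
    case OPT_std_c__98: cxx_dialect = 199711L; break;
    case OPT_std_c__11: cxx_dialect = 201103L; break;
    }
}

int main() {
    handle_option(OPT_std_c__98);        // -std=c++98
    handle_option(OPT_std_c__11);        // -std=c++11, parsed later
    std::printf("%ld\n", cxx_dialect);   // 201103: the last setting took effect
    return 0;
}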
It's not uncommon to find GCC build system scripts that rely on the validity of overriding an option setting by appending a new setting. Legalistically, this is usually counting on undocumented behaviour, but there's a better chance of Russia joining NATO than of GCC ceasing to take the last setting that it parses for a mutex option.
In this question, I ran into a situation where gcc myfile.c -S produces assembly code that is better than gcc myfile.c -O0 but worse than gcc myfile.c -O1.
At -O0, both loops are generated. At -O1, both loops are optimized out. (Thanks to @Raymond Chen for the reminder; cited from his comments.) Using -S alone optimizes just one loop out.
I searched the Internet and only found this:
-S (cited from Overall options)
Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file for each non-assembler input file specified.
By default, the assembler file name for a source file is made by replacing the suffix ‘.c’, ‘.i’, etc., with ‘.s’.
Input files that don't require compilation are ignored.
So my question is:
What exactly is the optimization level of the -S option when it compiles a file? (-O0.5?)
Why is it not simply equivalent to -O0 or -O1... (or is it a bug?)
Edit: you can use this site to help reproduce the problem. The code is in the question I mentioned. (If you just use the -S compiler option, or no option at all, you get one loop elided.)
step 1:
Open this site and copy the following code into the code editor.
#include <stdio.h>
int main (int argc, char *argv[]) {
unsigned int j = 10;
for (; j > -1; --j) {
printf("%u", j);
}
}
step 2:
Choose g++ 4.8 as the compiler. Leave the compiler options empty (or use -S).
step 3:
You get the first situation. Now change j > -1 to j >= -1 and you can see the second.
With your last edit, it's now somewhat clear what you're actually doing, so:
For the first case, j > -1:
This can never happen. j is an unsigned int, and -1 converted to an unsigned value corresponds to a value with all bits set. That's the same as UINT_MAX, and j can never be greater than that. So gcc eliminates the loop, since its condition will always be false.
For the second case, j >= -1:
This can happen. j can surely become (unsigned int)-1, or UINT_MAX as mentioned above. The loop is not eliminated.
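A small sketch of the conversion at work (these comparisons deliberately trigger -Wsign-compare warnings):
#include <cstdio>
#include <climits>

int main() {
    unsigned int j = 10;
    // The usual arithmetic conversions turn -1 into UINT_MAX here.
    std::printf("%d\n", (unsigned int)-1 == UINT_MAX);   // 1: all bits set
    std::printf("%d\n", j > -1);    // 0: no unsigned int exceeds UINT_MAX
    std::printf("%d\n", j >= -1);   // 0 for j == 10 ...
    j = UINT_MAX;
    std::printf("%d\n", j >= -1);   // ... but 1 when j == UINT_MAX
    return 0;
}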
What exactly is the optimization level of the -S option when it compiles a file? (-O0.5?)
The optimization level is controlled with the -O flags. -S does not affect optimization. If no -O flag is given, the default is -O0 (no optimization).
-S doesn't optimize. -O0, on the other hand, disables any and all optimizations, even the default ones.
So the effect that you see is that you're "enabling" the default optimizations if you use just -S.
Use -S with various -O options to see the effect on the assembler code.
EDIT: I've been using GCC since about 2.6 (in 1994). I'm pretty sure I remember that in some versions, the compiler would apply default optimizations that you could disable with -O0 to debug the compiler (i.e. gcc ... crashes, gcc -O0 ... doesn't crash -> congrats, you found a bug).
But that doesn't seem to be the case here. I get the same assembler output for -S, -O0 and not giving either. So it seems that the simple optimizations (like if(0){} to comment out a code block) are always applied, no matter which optimization level is selected.
Therefore, I'd say that the original statement above:
At -O0, both loops are generated. At -O1, both loops are optimized out. (Thanks to @Raymond Chen for the reminder; cited from his comments.) Using -S alone optimizes just one loop out.
is not correct to begin with (at least for GCC 4.8.2). The only other alternative is that the GCC version used by the OP (4.8) has a bug when it comes to enabling/disabling optimizer options.