What does '-Olimit 2000' mean for cc - gcc

I'm trying to compile an old program (originally built with cc) using gcc. In the makefile there is a line like this:
CFLAGS = -O2 -Olimit 2000 -w
There is no '-Olimit 2000' in gcc. I am wondering what it really means, and whether it is safe to just delete this option when using gcc.

As far as I can tell, this was only supported by IRIX's C compiler. I can't even find a solid reference as to what it was used for. Since it doesn't do anything with GCC, it's definitely safe to remove it.
In a little more detail, it was used to disable optimization on routines that were larger than the "Olimit", so that the time spent on optimization stays bounded. If you specify 0 for the Olimit, it means an "infinite Olimit" and every routine will be optimized. Here's a man page for MIPSpro: http://cimss.ssec.wisc.edu/~gumley/modis/old/mips_64.pdf
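For porting, a minimal sketch of the Makefile change (the note about --param is mine, not from the original answer):

# MIPSpro/cc: -Olimit 2000 skipped optimization for routines larger than 2000 units
#CFLAGS = -O2 -Olimit 2000 -w
# GCC: simply drop -Olimit; GCC bounds optimizer effort with its own --param knobs instead
CFLAGS = -O2 -w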

Related

How to use GCC LTO with differently optimized object files?

I'm compiling an executable with arm-none-eabi-gcc for a Cortex-M4 based microcontroller. Non-performance-critical code is compiled with -Os (optimized for executable code size) and performance-critical parts with other optimization flags, e.g. -Og / -O2 etc.
Is it safe to use -flto in such a build? If so, which optimization flag should be passed to the linker?
According to the GCC documentation on optimisation options:
It is recommended that you compile all the files participating in the same link with the same options
Such a statement is rather vague. Nevertheless, when digging into the release notes of GCC 5, there are some additional details:
Command-line optimization and target options are now streamed on a per-function basis and honored by the link-time optimizer. This change makes link-time optimization a more transparent replacement of per-file optimizations. It is now possible to build projects that require different optimization settings for different translation units (such as -ffast-math, -mavx, or -finline).
And also information about which flags are affected by such limitations and which aren't:
Note that this applies only to those command-line options that can be passed to optimize and target attributes. Command-line options affecting global code generation (such as -fpic), warnings (such as -Wodr), optimizations affecting the way static variables are optimized (such as -fcommon), debug output (such as -g), and --param parameters can be applied only to the whole link-time optimization unit. In these cases, it is recommended to consistently use the same options at both compile time and link time.
In your scenario, the optimisation flags -Og, -O2 and -Os can be passed to the optimize attribute and do not fall into the cases where the compile-time and link-time flags ought to be the same. So yes, it should be safe to use -flto in such a build.
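As an aside, the per-function mechanism the release notes describe is the same one exposed by GCC's optimize function attribute; a minimal illustration (not from the original question):

/* each function carries its own optimisation options, just as LTO streams
   per-function options from differently compiled translation units */
__attribute__((optimize("Os")))
void background_task(void) { /* size-sensitive code */ }

__attribute__((optimize("O2")))
void hot_loop(void) { /* performance-critical code */ }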
Regarding the optimisation flags passed at link time, as stated in the release notes:
Contrary to earlier GCC releases, the optimization and target options passed on the link command line are ignored.
GCC automatically determines which optimisation level to use, which is the highest level used when compiling the object files. You therefore don't need to pass any of your -O optimisation options to the linker.
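A minimal sketch of such a build (file names are hypothetical):

arm-none-eabi-gcc -c -Os -flto housekeeping.c -o housekeeping.o
arm-none-eabi-gcc -c -O2 -flto control_loop.c -o control_loop.o
# link through the gcc driver; since GCC 5 any -O level given here is ignored
# in favour of the per-function options streamed into the object files
arm-none-eabi-gcc -flto housekeeping.o control_loop.o -o firmware.elf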

C++ Compilation flags for an R package in Windows/Mac

I developed an R package which calls C++ code through Rcpp and RcppEigen. My Makevars.win looks like this (the enumeration is meant to refer to my questions below):
1. CXX_STD = CXX11
2. PKG_CPPFLAGS = -fopenmp -O3 -Wall -ftree-vectorize -march=native -mavx -mfma
3. PKG_CXXFLAGS += $(SHLIB_OPENMP_CXXFLAGS)
4. PKG_LIBS = -fopenmp
5. PKG_LIBS += $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(SHLIB_OPENMP_CXXFLAGS)
6. PKG_CPPFLAGS += -I../inst/include/
as I want to use OpenMP and to link the R package against the Intel MKL library. I am also adding the plugins // [[Rcpp::plugins(cpp11)]] and // [[Rcpp::plugins(openmp)]] in my source files.
When I compile the package everything works fine, but I am still getting the default compilation flags -O2 and -std=c++0x. So my questions are:
A. Isn't 1. supposed to force -std=c++11? (By the way, using the same Makevars yields the right C++ version, so there must be something specific to Windows.)
B. Does 3. repeat the -fopenmp in 2.?
C. How can I check whether 5. has been taken into account? I am asking because the same package built on Mac is much faster than on Windows, while their configurations are the same. I have done some benchmarking of the same code on Windows using Microsoft R Open and on Mac, and Windows was faster in that case.
Thank you very much for your help.
Where to start?
First off, compilation and linking options are based on the union of R's Makeconf and your package's src/Makevars. You can add to values, but you cannot replace them.
Second, and related, which BLAS you get is a system setup issue. You cannot generally govern that from your package.
Third, plugins are for sourceCpp() and cppFunction(). In packages you make direct declarations, i.e. CXX_STD = CXX11.
Fourth, there are almost 1000 packages on CRAN using Rcpp. Sometimes it helps just to look at what some of these do. Many employ OpenMP.
Fifth, OpenMP is severely challenging on OS X thanks to Apple. I've forgotten what the Windows situation is. It just works on Linux.
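For reference, a minimal Makevars.win along those lines (the exact placement of $(SHLIB_OPENMP_CXXFLAGS) follows common Rcpp practice and is an assumption, not code from this answer):

CXX_STD = CXX11
# let R supply the OpenMP flags on both the compile and the link line
PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)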

How does gcc's link-time optimisation (-flto flag) work

I understand the idea more or less: when separate modules are compiled to assembly code, functions calling each other have to respect the calling convention strictly, which kills the opportunity for many optimisations.
For instance if I have function A which calls function B which calls function C, all 3 in their own separate source files, it becomes possible to allocate registers evenly within the functions so that no register saving on the stack is necessary at all during those calls. With traditional compile-assembly-linking this is not possible, as the caller-saved and callee-saved registers are imposed by the calling convention.
Another optimisation is to inline functions which are called only once. This was previously possible only if the function was local, but thanks to link-time optimisation it's now possible even if the function is in another source file.
Now, if I compile with both -flto and -S flags, I see that instead of normal assembly instructions, gcc generates an encoded representation of the program, such as this:
.section .gnu.lto_.inline.c3c5e6ef8ec983c,"dr0"
.ascii "x\234mQ;N\303#\20}\273\353\17\370C\234\20\242`\"!Q\20\11Ah\322&\25\242\314\231|\4\32\220\220(,$.#\205D\343\3P Z.\341Tn\231\35\274\31L\342\342\355\314\274\371<\317\30\354\376\356\365\357\333\7\262"
.ascii "1\240G\325\273\202\7\216\232\204\36\205"
.ascii "8\242\370\240|\222"
.ascii "8\374\21\205ty\352\"*r\340!:!n\357n%]\224\345\10|\304\23\342\274z\346"
.ascii "8\35\23\370\7\4\1\366s\362\203j\271]\27bb{\316\353\27\343\310\4\371\374\237*n#\220\342rA\31"
.ascii "7\365\263\327\231\26\364\10"
.ascii "2\\-\311\277\255^w\220}|\340\233\306\352\263\362Qo+e+\314\354\277\246\354\252\277\20\364\224%T\233'eR\301{\32\340\372\313\362\263\242\331\314\340\24\6\21s\210\243!\371\347\325\333&m\210\305\203\355\277*\326\236\34\300-\213\327\306\2Td\317\27\231\26tl,\301\26\21cd\27\335#\262L\223"
.ascii "8\353\30\351\264{I\26\316\11\14"
.ascii "9\326h\254\220B}6a\247\13\353\27M\274\231"
.ascii "0\23M\332\272\272%d[\274\36Q\200\37\321\1&\35"
Since the data is in its own particular section, the linker sees this, and does the code generation. If the module was written in either assembly or with no -flto flag, then the linker would see data in the .text section instead, so there is no confusion possible for the linker.
The problem is: how can the linker generate code? Normally only gcc can generate code; the linker's role here is just to change a few offsets and adapt the binary format. In order to generate code, the linker would need to contain a second copy of the entire gcc backend (the half of the compiler which generates assembly code from the intermediate representation), as well as the entire assembler (since no assembly code was produced). How is such a thing possible, especially considering that binutils is a completely separate entity from gcc, developed by different teams?
GCC's -flto emits a serialized form of GCC's internal representation, as you discovered.
Then, at link time, the linker reinvokes GCC and passes it the objects that need final compilation. GCC reads the internal representation and does the work.
I think the actual work is done in collect2, which is the part of GCC that is used when invoking the linker (I'm a little fuzzy on the details). There is also a "linker plugin" system that enables this to work a little better (for example, letting the linker decide how to split the compilation). This is implemented at least by the binutils ld and by gold; but as far as I recall this is just an optimization and isn't needed to get the basic -flto feature to work. You can see a bit more information on the original LTO project page; the links from there may explain more.
There is more overlap between the GCC and binutils teams than you might think. The two projects share some code and have a long history of working together. Some people work on both projects.
From https://gcc.gnu.org/wiki/LinkTimeOptimization:
Despite the "link time" name, LTO does not need to use any special
linker features. The basic mechanism needed is the detection of GIMPLE
sections inside object files. This is currently implemented in
collect2 [which is called by gcc; -ps]. Therefore, LTO will work on any linker already supported by
GCC.
I assume this means you must link by calling the compiler driver gcc. Simply linking with the system's vanilla linker wouldn't optimize the whole program, as you already concluded.
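A minimal sketch of what that looks like in practice (file names are hypothetical):

gcc -c -O2 -flto a.c            # the objects carry GIMPLE in .gnu.lto_* sections
gcc -c -O2 -flto b.c
gcc -O2 -flto a.o b.o -o prog   # linking via the gcc driver triggers the LTO compile
# invoking the system ld directly on these objects would skip the LTO step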
Update:
https://gcc.gnu.org/onlinedocs/gccint/Collect2.html says
The program collect2 is installed as ld in the directory where the passes of the compiler are installed. When collect2 needs to find the real ld it tries the following file names: [...]
(The page goes on detailing how collect2 looks for configuration-dependent executables and ones with well-known names like real-ld, finally even ld; but will not call itself recursively.)

gcc; AArch64; ARMv8; enable crypto; -mcpu=cortex-a53+crypto

I am trying to optimize code for an Arm processor (Cortex-A53) with the ARMv8 architecture for crypto purposes.
The problem is that although the compiler accepts -mcpu=cortex-a53+crypto etc., it doesn't change the output (I checked the assembly output).
Changing -mfpu or -mcpu to add features like crypto or simd makes no difference; it is completely ignored.
To enable Neon code, -ftree-vectorize is needed; how do I make use of crypto?
(I checked the -O(1,2,3) flags; they don't help.)
Edit: I realized I made a mistake in thinking that the crypto flag works like an optimization that the compiler applies automatically. My bad.
You had two questions...
Why does -mcpu=cortex-a53+crypto not change code output?
The crypto extensions are an optional feature under the AArch64 state of ARMv8-A. The +crypto feature flag indicates to the compiler that these instructions are available for use. From a practical perspective, in GCC 4.8/4.9/5.1, this defines the macro __ARM_FEATURE_CRYPTO and controls whether or not you can use the crypto intrinsics defined in ACLE, for example:
uint8x16_t vaeseq_u8 (uint8x16_t data, uint8x16_t key)
There is no optimisation in current GCC which will automatically convert a sequence of C code to use the cryptography instructions. If you want to make this transformation, you have to do it by hand (and guard it by the appropriate feature macro).
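A sketch of doing that transformation by hand, guarded by the feature macro (the function name is illustrative):

#include <arm_neon.h>

uint8x16_t aes_round(uint8x16_t data, uint8x16_t key)
{
#ifdef __ARM_FEATURE_CRYPTO
    /* maps to a single AESE instruction when +crypto is available */
    return vaeseq_u8(data, key);
#else
    /* fall back to a software AES round here */
    return data; /* placeholder */
#endif
}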
Why do the +fp and +simd flags not change code output?
For -mcpu=cortex-a53 the +fp and +simd flags are implied by default (for some configurations of GCC +crypto may also be implied by default). Adding these feature flags will therefore not change code generation.

gcc 4.4.4 optimization error only with O1 OR O2+no-strict-aliasing

Post-Answer-Acceptance Summary: The problem was the use of a pointer to a stack variable that had gone out of scope. It had nothing to do with optimization. It is a pity that valgrind can't find stack errors...
I have a segfault that appears only when enabling -O1 level optimization in gcc 4.4.4 (CentOS 5.5). All other optimization levels (0,2,3,s) are fine. I haven't managed to create a reduced test case for it yet, but it appears to be related to an array offset calculation causing the stack to be overwritten.
If I enable -O1 and disable all optimizations that have a flag (http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) the bug still occurs.
If I use -O2 (or any other level) there is no problem. If I use -O2 and disable strict aliasing with -fno-strict-aliasing, then the segfault returns.
Edit: If I add -fstack-protector-all to the build flags (either O1 or O2 -fno-strict-aliasing) the segfault disappears.
So it appears to be caused by an optimization that happens by default at -O1 and is disabled by strict aliasing.
I suspect that this is a compiler bug (but without a reduced test case I can't prove it). This is a production server that needs a quick turnaround. The normal optimization level is -O1, and I'm loath to just change it to -O2, as it seems the fix might be more dangerous than the original problem.
I would really appreciate some suggestions. Currently I'm thinking to try compiling gcc 4.4.6 and seeing if that fixes it. However not knowing for sure what is causing the problem is a little worrying.
Edit: the server is compiled with -Wall -Werror (and a few others). It runs without error in valgrind (valgrind checks heap accesses, and this appears to be a stack-related error).
Often, compiler optimizations expose invalid or undefined behavior in source code that you were lucky to get away with otherwise. A few things I would try (see also the sketch after this list):
compiling with -Wall -Wextra
running under valgrind to see if you can get more of a hint of where the error is
finding that minimum testcase!
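Given the post-acceptance summary above (a pointer to a stack variable that had gone out of scope), here is a minimal sketch of that class of bug, not the asker's actual code:

#include <stdio.h>

int *dangling(void)
{
    int local = 42;   /* lives in dangling()'s stack frame */
    return &local;    /* undefined behavior: the pointer outlives the variable */
}

int main(void)
{
    int *p = dangling();
    printf("%d\n", *p);   /* may happen to work at one -O level and crash at another */
    return 0;
}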
