I would like to know if there is an option I can use with GCC to get a detailed report on the optimizations actually chosen and performed by the compiler. This is possible with the Intel C compiler using the -opt-report option. I do not want to look at the assembly file and work out the optimizations myself. I am specifically looking for the loop unrolling and loop tiling factors chosen by the compiler.
Although it's not a report in the sense of aggregated information, you might try the -fdump-ipa-all option, which makes GCC produce dump files that at least spare you from having to analyse the assembler output to see what happened.
Regarding loop optimization, the -fdump-rtl-loop2 option might be of interest.
For details on all this, please see the section Options for Debugging Your Program or GCC of the manual.
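For example, a command along these lines (foo.c is a placeholder here, and the exact dump file names depend on the GCC version and its internal pass numbering) leaves the RTL loop dumps next to the source for inspection:
gcc -O3 -funroll-loops -c foo.c -fdump-rtl-loop2
The resulting dump files (named roughly foo.c.<pass-number>r.loop2*) contain the RTL after the loop optimization passes, which is where unrolling decisions can be seen without reading the final assembly.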
GCC's reports are not as straightforward as Intel's, but they are still usable. Here are the options, in detail, for getting information about the optimizations GCC performs.
-fopt-info
-fopt-info-options
-fopt-info-options=filename
Controls optimization dumps from various optimization passes. If the ‘-options’ form is used, options is a list of ‘-’ separated option keywords to select the dump details and optimizations.
The options can be divided into three groups:
options describing what kinds of messages should be emitted,
options describing the verbosity of the dump, and
options describing which optimizations should be included.
The options from each group can be freely mixed as they are non-overlapping. However, in case of any conflicts, the later options override the earlier options on the command line.
The following options control which kinds of messages should be emitted:
‘optimized’
Print information when an optimization is successfully applied. It is up to a pass to decide which information is relevant. For example, the vectorizer passes print the source location of loops which are successfully vectorized.
‘missed’
Print information about missed optimizations. Individual passes control which information to include in the output.
‘note’
Print verbose information about optimizations, such as certain transformations, more detailed messages about decisions etc.
‘all’
Print detailed optimization information. This includes ‘optimized’, ‘missed’, and ‘note’.
The following option controls the dump verbosity:
‘internals’
By default, only “high-level” messages are emitted. This option enables additional, more detailed, messages, which are likely to only be of interest to GCC developers.
One or more of the following option keywords can be used to describe a group of optimizations:
‘ipa’
Enable dumps from all interprocedural optimizations.
‘loop’
Enable dumps from all loop optimizations.
‘inline’
Enable dumps from all inlining optimizations.
‘omp’
Enable dumps from all OMP (Offloading and Multi Processing) optimizations.
‘vec’
Enable dumps from all vectorization optimizations.
‘optall’
Enable dumps from all optimizations. This is a superset of the optimization groups listed above.
If options is omitted, it defaults to ‘optimized-optall’, which means to dump messages about successful optimizations from all the passes, omitting messages that are treated as “internals”.
If the filename is provided, then the dumps from all the applicable optimizations are concatenated into the filename. Otherwise the dump is output onto stderr. Though multiple -fopt-info options are accepted, only one of them can include a filename. If other filenames are provided then all but the first such option are ignored.
Note that the output filename is overwritten in case of multiple translation units. If a combined output from multiple translation units is desired, stderr should be used instead.
In the following example, the optimization info is output to stderr:
gcc -O3 -fopt-info
This example:
gcc -O3 -fopt-info-missed=missed.all
outputs missed optimization report from all the passes into missed.all, and this one:
gcc -O2 -ftree-vectorize -fopt-info-vec-missed
prints information about missed optimization opportunities from vectorization passes on stderr. Note that -fopt-info-vec-missed is equivalent to -fopt-info-missed-vec. The order of the optimization group names and message types listed after -fopt-info does not matter.
As another example,
gcc -O3 -fopt-info-inline-optimized-missed=inline.txt
outputs information about missed optimizations as well as optimized locations from all the inlining passes into inline.txt.
Finally, consider:
gcc -fopt-info-vec-missed=vec.miss -fopt-info-loop-optimized=loop.opt
Here the two output filenames vec.miss and loop.opt are in conflict since only one output file is allowed. In this case, only the first option takes effect and the subsequent options are ignored. Thus only vec.miss is produced which contains dumps from the vectorizer about missed opportunities.
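To connect this back to the original question, here is a small, hedged illustration (the file name opt_demo.c is made up, and the exact wording of the messages differs between GCC versions): a loop with independent iterations should trigger an "optimized" message from the vectorizer when compiled as shown in the comment.
/* opt_demo.c
 * Compile with:
 *   gcc -O3 -fopt-info-vec-optimized -fopt-info-loop-optimized -c opt_demo.c
 * GCC is then expected to print a message to stderr similar to
 * "opt_demo.c:<line>: optimized: loop vectorized ...",
 * though the exact text varies by version.
 */
#include <stddef.h>

void scale(float *restrict dst, const float *restrict src, size_t n)
{
    /* Independent iterations: a loop the vectorizer normally handles at -O3. */
    for (size_t i = 0; i < n; i++)
        dst[i] = 2.0f * src[i];
}
Depending on the GCC version, unrolling and tiling decisions may or may not show up in these messages; for that level of detail the dump-file options mentioned in the other answer remain the more complete source.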
You may also consider the -fsave-optimization-record option.
While running an executable in gdb, I encountered the following error:
Program received signal SIGFPE, Arithmetic exception.
0x08158307 in radtra_ ()
How do I find out what line number and file the address 0x08158307 corresponds to, without recompiling or otherwise modifying the source? If it helps, the source language was Fortran.
How do I find out what line number and file the address 0x08158307 corresponds to, without recompiling or otherwise modifying the source?
That isn't easy. You could use the GDB disassemble command, look for accesses to global variables and for CALL instructions, and make a guess at where inside radtra_ you are. This gets harder the larger the routine is, the more optimizations the compiler has applied to it, and the fewer calls and global variable accesses are performed.
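For instance (the address here is just the one from the question; the offsets printed will of course differ), inside gdb you could start with:
(gdb) disassemble radtra_
(gdb) info symbol 0x08158307
info symbol at least reports the offset of the faulting address within radtra_ (e.g. "radtra_ + <offset> in section .text"), which narrows down where in the disassembly to look.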
If you can't guess, your only options are:
Rebuild the application with the -g flag added, but with all other compile options unmodified, then use addr2line to translate the address to a line number (see the sketch after this list). (This is how you should build the application from the start.)
If you can't rebuild the entire application, rebuild just the source containing radtra_ (again with the same flags, but add -g). You should be able to match the output from objdump -d radtra.o with the output from disassemble. Once you have a match, read the output from readelf -wl radtra.o or objdump -g radtra.o to associate code offsets within radtra_ with the source lines the code was generated from.
Hire an expert to guess for you. This wouldn't be cheap, as people skilled in this kind of reverse engineering are usually gainfully employed and value their time.
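As a hedged sketch of the first option (the compiler, file names and link line below are just placeholders; the point is that every optimization flag must match the original build), after rebuilding with -g you translate the crash address like this:
gfortran -g -O2 -c radtra.f
gfortran -g -O2 -o myprog main.o radtra.o
addr2line -f -e myprog 0x08158307
addr2line then prints the function name and the file:line the address falls on, provided the rebuilt binary lays out the code at the same addresses as the one that crashed, which is exactly why all other compile options must stay unmodified.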
I am compiling code for Microchip dspic33 series processors using Microchip's XC16 compiler.
I have some code that is used in several applications (i.e. it is in a code library). For certain modules, I want to ensure that certain compiler flags are set during compilation, ideally using the pre-processor. In particular, I am interested in testing for the -mauxflash and -code-in-auxflash target flags.
Is there a way to test for compiler options during compilation?
I have tried dumping all the #defines using xc16-gcc -dM -E - < /dev/null, but nothing seems to change. There are 3 defines related to auxflash (AUXFLASH_LENGTH, __AUXFLASH_BASE, and __HAS_AUXFLASH), but nothing related to the target flags.
Not all flags affect CPP defines, so you may be out of luck there. Your use of -dM -E is the best way to check.
However, there are a few features that might be useful to you:
-grecord-gcc-switches: this records all the flags used at compile time, on a per-object basis, in the DWARF info. You could then have a script that checks the objects and throws an error if one was built without the flag you care about.
__attribute__((optimize("flags"))): GCC lets you force specific flags on a per-function basis.
#pragma GCC optimize ("flags"): GCC lets you force specific flags at the file level.
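As a rough sketch of the last two points (whether XC16 supports these to the same extent as mainline GCC is something to verify, and the option strings below are only illustrative), both forms take optimization option names as strings:
/* File level: force flags for everything in this translation unit. */
#pragma GCC optimize ("O2", "unroll-loops")

/* Function level: force flags for this one function only. */
__attribute__((optimize("O3")))
int hot_path(int x)
{
    /* performance-critical code */
    return x * x;
}
Note that these take optimization options rather than target switches, so they may not help with target-specific flags such as -mauxflash; -grecord-gcc-switches plus a post-build script is the more general check.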
I have a C program, which I compiled with the -g option, and then ran with:
valgrind --tool=cachegrind --branch-sim=yes ./myexecutable
This let me know which function contains a bottleneck. However, this is a pretty long function and it is not clear to me which part of the function produces most of the cache misses. I cannot (do not want to) divide it into two different parts.
Is there a way (maybe including a valgrind.h or with some magical #pragma stuff), to instruct Valgrind to make different statistics for different parts of a function?
To check function-by-function values, you have probably used cg_annotate like this:
cg_annotate cachegrind.out.1234
If you add the "--auto=yes" flag to that command, the values will be displayed for each line:
cg_annotate --auto=yes cachegrind.out.1234
You can print the result into a file so that you can search for your function. Note that only lines and functions that have a major performance impact will be displayed, so if you can't find certain lines they likely have a negligible impact on the execution.
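For example (the output file name is arbitrary):
cg_annotate --auto=yes cachegrind.out.1234 > annotated.txt
You can then search annotated.txt for the function in question; the per-line event counts (including the branch columns added by --branch-sim=yes) show which parts of the function account for most of the misses.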
I wonder what is the difference between these two:
gcc -s: Remove all symbol table and relocation information from the executable.
strip: Discard symbols from object files.
Do they have the same meaning?
Which one do you use to:
reduce the size of executable?
speed up its running?
gcc being a compiler/linker, its -s option is something done while linking. It's also not configurable: it removes a fixed set of information, no more and no less.
strip is something which can be run on an object file which is already compiled. It also has a variety of command-line options which you can use to configure which information will be removed. For example, -g strips only the debug information which gcc -g adds.
Note that strip is not a bash command, though you may be running it from a bash shell. It is a command totally separate from bash, part of the GNU binary utilities suite.
The accepted answer is very good, but just to complement it regarding your further questions (and also as a reference for anyone who ends up here):
What's the equivalent to gcc -s in terms of strip with some of its options?
They both do the same thing, removing the symbol table completely. However, as @JimLewis pointed out, strip allows finer control. For example, on a relocatable object, strip --strip-unneeded won't remove its global symbols, whereas strip or strip --strip-all removes the complete symbol table.
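As a small illustration (hello.c is just a placeholder), you can compare the two approaches directly; both leave a binary without a symbol table, and the file sizes show how much each removed:
gcc -O2 -o hello hello.c
gcc -O2 -s -o hello_s hello.c
cp hello hello_stripped
strip --strip-all hello_stripped
ls -l hello hello_s hello_stripped
nm hello_s and nm hello_stripped should both report "no symbols", while nm hello still lists them.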
Which one do you use to reduce the size of the executable and speed up its running?
The symbol table is a non-allocable section of the binary. This means that it never gets loaded into RAM. It stores information that can be useful for debugging purposes, for instance to print a stack trace when a crash happens. A case where it could make sense to remove the symbol table would be a scenario where you have serious storage capacity constraints (in that regard, gcc -Os -s or make CXXFLAGS="-Os -s" ... is useful, as it results in a smaller, slower binary that is also stripped to reduce size further). I don't think removing the symbol table would result in a speed gain, for the reasons given above.
Lastly, I recommend this link about stripping shared objects: http://www.technovelty.org/linux/stripping-shared-libraries.html
"gcc -s" removes the relocation information along with the symbol table which is not done by "strip". Note that, removing relocation information would have some effect on Address space layout randomization. See this link.
They do similar things, but strip allows finer-grained control over what gets removed from the file.
I am compiling some benchmarks, and the documentation says that I can try the option gcc-serial instead of just gcc. Can anyone please explain the difference between gcc and gcc-serial?
The place where that appears is here, and it is mentioned, for example, on slide 71. It is mentioned in more places, but none of them say what gcc-serial is.
Thank you.
The slides refer to a tool from Princeton (PARSEC) meant to benchmark multithreaded shared-memory programs -- a.k.a. parallel programs. In many cases, "serial" is the opposite of "parallel":
$ cat config/gcc-serial.bldconf
#!/bin/bash
#
# gcc-serial.bldconf - file containing global information necessary to build
# the serial versions of the PARSEC programs with gcc
#
# Copyright (C) 2006, 2007 Christian Bienia
# Global configuration is identical to multi-threaded version
source ${PARSECDIR}/config/gcc.bldconf
I've never heard of gcc-serial, and I've used gcc for quite a while. Can you clarify more precisely what your benchmarks are telling you? Maybe you meant "gcc -serial" (with a space after gcc and before -serial)? Even in that case, though, I still don't know, since I can't find any mention of a -serial option in my gcc manual.
One version of gcc I'm using has the -mserialize-volatile and -mno-serialize-volatile options, which enable and disable respectively the generation of code that ensures the sequential consistency of volatile memory accesses.
From the slides, it seems to be a configuration name for the benchmarking tool, not a command you should use. It probably means some special way of using gcc when the tool is used.