How Does The Debugging Option -g Change the Binary Executable? - gcc

When writing C/C++ code, in order to debug the binary executable the debug option must be enabled on the compiler/linker. In the case of GCC, the option is -g. When the debug option is enabled, how does this affect the binary executable? What additional data is stored in the file that allows the debugger to function as it does?

-g tells the compiler to store symbol table information in the executable. Among other things, this includes:
symbol names
type info for symbols
files and line numbers where the symbols came from
Debuggers use this information to output meaningful names for symbols and to associate instructions with particular lines in the source.
For some compilers, supplying -g will disable certain optimizations. For example, icc sets the default optimization level to -O0 with -g unless you explicitly indicate -O[123]. Also, even if you do supply -O[123], optimizations that prevent stack tracing will still be disabled (e.g. stripping frame pointers from stack frames; this has only a minor effect on performance).
With some compilers, -g will disable optimizations that can confuse where symbols came from (instruction reordering, loop unrolling, inlining etc). If you want to debug with optimization, you can use -g3 with gcc to get around some of this. Extra debug info will be included about macros, expansions, and functions that may have been inlined. This can allow debuggers and performance tools to map optimized code to the original source, but it's best effort. Some optimizations really mangle the code.
For more info, take a look at DWARF, the debugging format originally designed to go along with ELF (the binary format for Linux and other OS's).

A symbol table is added to the executable which maps function/variable names to data locations, so that debuggers can report back meaningful information, rather than just pointers. This doesn't affect the speed of your program, and you can remove the symbol table with the 'strip' command.

In addition to the debugging and symbol information (Google DWARF, a developer joke on ELF):
By default most compiler optimizations are turned off when debugging is enabled.
So the code is the pure translation of the source into Machine Code rather than the result of many highly specialized transformations that are applied to release binaries.
But the most important difference (in my opinion):
Memory in Debug builds is usually initialized to some compiler specific values to facilitate debugging. In release builds memory is not initialized unless explicitly done so by the application code.
Check your compiler documentation for more information:
But an example for DevStudio is:
0xCDCDCDCD Allocated in heap, but not initialized
0xDDDDDDDD Released heap memory.
0xFDFDFDFD "NoMansLand" fences automatically placed at boundary of heap memory. Should never be overwritten. If you do overwrite one, you're probably walking off the end of an array.
0xCCCCCCCC Allocated on stack, but not initialized

-g adds debugging information to the executable, such as the names of variables, the names of functions, and line numbers. This allows a debugger, such as gdb, to step through code line by line, set breakpoints, and inspect the values of variables. Because of this additional information, using -g increases the size of the executable.
Also, gcc allows -g to be used together with the -O flags, which turn on optimization. Debugging an optimized executable can be very tricky, because variables may be optimized away, or instructions may be executed in a different order. Generally, it is a good idea to turn off optimization when using -g, even though it results in much slower code.

Just as a matter of interest, you can crack open a hex editor and take a look at an executable produced with -g and one without. You can see the symbols and things that are added. It may change the assembly (-S) too, but I'm not sure.

There is some overlap with this question which covers the issue from the other side.

Some operating systems (like z/OS) produce a "side file" that contains the debug symbols. This helps avoid bloating the executable with extra information.

Related

How -fomit-frame-pointer gcc option could make debugging impossible?

The GCC online docs (3.10 Options That Control Optimization) affirm that the -fomit-frame-pointer gcc option can make debugging impossible.
-fomit-frame-pointer
Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.
I understand why local variables are harder to locate and stack traces are much harder to reconstruct without a frame pointer to help out.
But in what circumstances does it make debugging impossible?
It may be impossible in the sense that existing tools for these platforms (which are often provided by the platform vendor, not GNU) expect a frame pointer to be present for successful unwinding. One could theoretically modify them to be more intelligent, but in practice this is not possible.

gcc configure option explanations

I want to investigate a way for my other question (gcc: Strip unused functions) by building the latest gcc 6.3.0.
There are some options from https://gcc.gnu.org/install/configure.html and https://gcc.gnu.org/onlinedocs/libstdc++/manual/configure.html that I would like to try, but don't understand what they meant.
Specifically, these are the flags I want to try:
--disable-libstdcxx-verbose: I rarely use exceptions so I am not very familiar with how it works. I have never seen the "verbose messages" it mentioned before.
--enable-linker-build-id and --enable-gnu-unique-object: Simply don't understand what the explanations are trying to say. What are the benefits exactly?
--enable-cxx-flags="-ffunction-sections -fstrict-aliasing -fno-exceptions": If I use -fno-exceptions in libstdc++, doesn't that mean I get no exceptions if I use libstdc++? -ffunction-sections is used, but where to put -Wl,-gc-sections?
Although I always use --enable-lto, with ld.bfd it seems to be quite useless compared to the famous gold linker.
If you have more flags you think I should try, please let me know!
--disable-libstdcxx-verbose: I rarely use exceptions
so I am not very familiar with how it works. I have never seen the
"verbose messages" it mentioned before.
+1. If you normally don't run into the errors that trigger these friendly error messages, you can avoid paying for them.
--enable-linker-build-id and --enable-gnu-unique-object:
Simply don't understand what the explanations are trying to say.
What are the benefits exactly?
There are none.
Unique objects are a badly designed feature that prevents shared libraries which contain references to globally used objects (usually vtables) from unloading on dlclose. AFAIR it's enabled by default (as it's needed to simulate C++ semantics in a shared-library environment).
Build id is needed to support separate debuginfo.
--enable-cxx-flags="-ffunction-sections -fstrict-aliasing -fno-exceptions":
You won't benefit from -fstrict-aliasing as it's enabled at -O2 and higher by default.
-ffunction-sections is used, but where to put -Wl,-gc-sections?
To --enable-cxx-flags as well (note that it wants double-dash i.e. -Wl,--gc-sections).
Although I always use --enable-lto, with ld.bfd it seems to be quite useless compared to the famous gold linker.
This flag simply enables LTO support in GCC (it's actually equivalent to adding lto to --enable-languages). It won't make any difference unless you enable -flto in CXXFLAGS as well. Keep in mind that LTO will normally increase executable size (as the compiler will have more opportunities for inlining).
If you have more flags you think I should try, please let me know!
Speaking of size reduction, I'd say -ffunction-sections is your best bet (be sure to verify that the configure machinery passes all options correctly and that libstdc++.a indeed has one section per function). You could also add -fdata-sections.

How does GCC's '-pg' flag work in relation to profilers?

I'm trying to understand how the -pg (or -p) flag works when compiling C code with GCC.
The official GCC documentation only states:
-pg
Generate extra code to write profile information suitable for the analysis program gprof. You must use this option when compiling the source files you want data about, and you must also use it when linking.
This really interests me, as I'm doing some research on profilers. I'm trying to pick the best tool for the job.
Compiling with -pg instruments your code, so that Gprof reports detailed information. See gprof's manual, 9.1 Implementation of Profiling:
Profiling works by changing how every function in your program is compiled so that when it is called, it will stash away some information about where it was called from. From this, the profiler can figure out what function called it, and can count how many times it was called. This change is made by the compiler when your program is compiled with the -pg option, which causes every function to call mcount (or _mcount, or __mcount, depending on the OS and compiler) as one of its first operations.
The mcount routine, included in the profiling library, is responsible for recording in an in-memory call graph table both its parent routine (the child) and its parent's parent. This is typically done by examining the stack frame to find both the address of the child, and the return address in the original parent. Since this is a very machine-dependent operation, mcount itself is typically a short assembly-language stub routine that extracts the required information, and then calls __mcount_internal (a normal C function) with two arguments—frompc and selfpc. __mcount_internal is responsible for maintaining the in-memory call graph, which records frompc, selfpc, and the number of times each of these call arcs was traversed.
...
Please note that with such an instrumenting profiler, you're profiling the same code you would compile for release, but with the profiling instrumentation added. There is an overhead associated with the instrumentation code itself. Also, the instrumentation code may alter instruction and data cache usage.
Contrary to an instrumenting profiler, a sampling profiler like Intel VTune works on non-instrumented code by looking at the target program's program counter at regular intervals using operating system interrupts. It can also query special CPU registers to give you even more insight into what's going on.
See also Profilers Instrumenting Vs Sampling.
This link gives a brief explanation of how gprof works.
This link gives an extensive critique of it.
(Check my answer to the archived question.)
From "Measuring Function Duration with Ftrace":
Instrumentation comes in two main forms: explicitly declared tracepoints, and implicit tracepoints. Explicit tracepoints consist of developer defined declarations which specify the location of the tracepoint, and additional information about what data should be collected at a particular trace site. Implicit tracepoints are placed into the code automatically by the compiler, either due to compiler flags or by developer redefinition of commonly used macros.
To instrument functions implicitly, when the kernel is configured to support function tracing, the kernel build system adds -pg to the flags used with the compiler. This causes the compiler to add code to the prologue of each function, which calls a special assembly routine called mcount. This compiler option is specifically intended to be used for profiling and tracing purposes.

Clever uses of linker scripts?

A great comment on my answer describing how to use linker scripts to make a ctor-like function list pointed out that recent GNU ld has much improved support for grafting new sections into system linker scripts with -Wl,-T... and INSERT BEFORE/INSERT AFTER. This got me thinking about other linker script tricks.
For a network card firmware I modified the linker script to group together the runtime modules of the firmware so that they would all be in a contiguous block that could be in L1 cache without conflicts. To clean up stragglers (where I couldn't group by .o) I used section attributes on individual functions. Performance counters verified that it actually worked (reduced L1 instruction cache misses to almost nothing).
What other clever things have you accomplished with linker scripts?
On a certain platform, for reasons I won't go into, I needed to have a section of the executable which I could discard after load. Unfortunately, unmapping the memory for the executable was not possible, so I was compelled to resort to linker trickery.
What I ended up doing was introducing a section of the executable which aliased the bss. That way, presuming I could sneak some code in early enough, I could copy the data out and reinitialize the bss, and so long as my aliased section was smaller than the total bss of the executable, I paid no cost for the privilege. There are a couple of problems, in that I couldn't really change the crt at all and the earliest point I could inject code was still after tls initialization (which used some bss), but nothing impossible to work around.
I'm still sort of surprised it worked; I would have thought that the bss was initialized by the crt after all the program sections were loaded. I haven't tried it on any platform where I have access to the loader or crt source.

profile linking times with gcc/g++ and ld

I'm using g++ to compile and link a project consisting of about 15 c++ source files and 4 shared object files. Recently the linking time more than doubled, but I don't have the history of the makefile available to me. Is there any way to profile g++ to see what part of the linking is taking a long time?
Edit: After I noticed that the makefile was using -O3 optimizations all the time, I managed to halve the linking time just by removing that switch. Is there any good way I could have found this without trial and error?
Edit: I'm not actually interested in profiling how ld works. I'm interested in knowing how I can match increases in linking time to specific command line switches or object files.
Profiling g++ will prove futile, because g++ doesn't perform linking; the linker ld does.
Profiling ld will also likely not show you anything interesting, because linking time is most often dominated by disk I/O, and if your link isn't, you wouldn't know what to make of the profiling data, unless you understand ld internals.
If your link time is noticeable with only 15 files in the link, there is likely something wrong with your development system [1]; either it has a disk that is on its last legs and is constantly retrying, or you do not have enough memory to perform the link (linking is often RAM-intensive), and your system swaps like crazy.
Assuming you are on an ELF based system, you may also wish to try the new gold linker (part of binutils), which is often several times faster than the GNU ld.
[1] My typical links involve 1000s of objects, produce 200+MB executables, and finish in less than 60s.
If you have just hit your RAM limit, you'll probably be able to hear the disk working, and a system activity monitor will tell you that. But if linking is still CPU-bound (i.e. if CPU usage is still high), that's not the issue. And if linking is IO-bound, the most common culprit can be runtime info. Have a look at the executable size anyway.
To answer your problem in a different way: are you doing heavy template usage? For each usage of a template with a different type parameter, a new instance of the whole template is generated, so you get more work for the linker. To make that actually noticeable, though, you'd need to use some library really heavy on templates. Many of the ones from the Boost project qualify: I got template-based code bloat when using Boost::Spirit with a complex grammar. About 4000 lines of code compiled to a 7.7M executable, and changing one line doubled the number of specializations required and the size of the final executable. Inlining helped a lot, though, leading to 1.9M of output.
Shared libraries might be causing other problems. You might want to look at the documentation for -fvisibility=hidden; it will improve your code anyway. From the GCC manual for -fvisibility:
Using this feature can very substantially
improve linking and load times of shared object libraries, produce
more optimized code, provide near-perfect API export and prevent
symbol clashes. It is *strongly* recommended that you use this in
any shared objects you distribute.
In fact, the linker normally must support the possibility for the application or for other libraries to override symbols defined in the library, while typically this is not the intended usage. Note, however, that using it is not free: it does require (trivial) code changes.
The link suggested by the docs is: http://gcc.gnu.org/wiki/Visibility
Both gcc and g++ support the -v verbose flag, which makes them output details of the current task.
If you're interested in really profiling the tools, you may want to check out Sysprof or OProfile.
