Odd behavior with statically linked binaries compiled with gc and gccgo - compilation

Here is a Hello World in Go:
package main
import (
	"fmt"
)
func main() {
	fmt.Println("Go is great!")
}
Put it in hello.go and compile it with:
go build -o hello_go_build hello.go
go build -o hello_go_build_gccgo --compiler gccgo hello.go
gccgo -o hello_gccgo_shared hello.go
gccgo -static -o hello_gccgo_static hello.go
First, I noticed that hello_go_build_gccgo and hello_gccgo_shared are not the same size. I searched the Internet for an explanation without success. Does anyone know why that is? Or, even better, can anyone tell me how to figure it out myself? I tried keeping the temporary files with the -work flag, but I couldn't spot the relevant information.
Then, as you might notice, the two statically linked binaries do not have the same size either. In fact, the one compiled with go build (hello_go_build) works not only on my system but also on systems running other Linux distributions, while hello_go_build_gccgo fails on my system as well as on others with the error:
panic: runtime error: invalid memory address or nil pointer dereference
This is a bug about to be solved: https://groups.google.com/forum/?fromgroups=#!topic/golang-nuts/y2RIy0XLJ24
Finally, even if size does not matter much nowadays, I am curious: does any of the Go compilers have an option for function-level linking (instead of statically linking a package as a whole, linking only the functions needed and their dependencies)?

First, I noticed that hello_go_build_gccgo and hello_gccgo_shared are not the same size. I searched the Internet for an explanation without success. Does anyone know why that is?
I would find it odd if they were the same size. One is statically linked, the other uses shared libraries, so why should they be expected to be the same size?
Then, as you might notice, the two statically linked binaries do not have the same size either.
I would find it odd if they were the same size. One is compiled by gc, the other by gccgo, two completely different compilers. Why should they be expected to produce binaries of the same size?
Finally, even if size does not matter much nowadays, I am curious: does any of the Go compilers have an option for function-level linking (instead of statically linking a package as a whole, linking only the functions needed and their dependencies)?
There's no such thing as "statically link a package as a whole" with gc. Unused functions (and perhaps not only functions) are not present in the binary. And, IIRC, that has been the case since day 1 (counting from the public release). I am not sure whether this applies to gccgo as well, but I would expect it to do an equally good job here.
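One way to check the claim about unused functions yourself is a small program with one function that is called and one that is not. This is only a sketch: the names greeting and neverCalled are made up for illustration, and the //go:noinline directives are there so that the called function is not inlined away and stays visible as a symbol.

```go
package main

import "fmt"

// greeting is reachable from main, so the linker has to keep it.
//
//go:noinline
func greeting() string {
	return "Go is great!"
}

// neverCalled is not referenced anywhere. With gc, the linker's
// dead-code elimination is expected to leave it out of the binary.
//
//go:noinline
func neverCalled() string {
	return "unreachable"
}

func main() {
	fmt.Println(greeting())
}
```

After building it with go build, running go tool nm on the resulting binary and filtering for "main." should list main.greeting and main.main but not main.neverCalled. I have not checked what gccgo does with the same file.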

Related

Using binary breakpoints in GDB - how exact is the location?

I have some memorydumps from Linux Redhat GCC compiled programs like:
/apps/suns/runtime/bin/mardb82[0x40853b]
When I open mardb82 and put the breakpoint with break *0x40853b it will give me C filename/lineno which seems quite correct, but not completely.
Can I trust it, and what does it depend on? Is it sufficient if the source file in question is the same, or do all the files making up the executable have to be the same?
Can I find the locations in sources in some other way?
(Max debug info and sources are present, I haven't tried not having the sources present or passing them in)
When I open mardb82 and put the breakpoint with break *0x40853b it will give me C filename/lineno which seems quite correct, but not completely.
A faster way to get the filename/line:
addr2line -fe /path/to/mardb82 0x40853b
You didn't say where the ...bin/mardb82[0x40853b] line came from. Assuming it is part of a crash stack, note that the address is usually that of the instruction right after a CALL, so you may be interested in 0x40853b-5 (on x86 architectures) for all but the innermost level in the stack.
what does it depend on? Is it sufficient if the source file in question is the same, or do all the files making up the executable have to be the same?
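The adjustment described above can be sketched in a few lines of Go. The helper names frameAddr and callSite are made up for this example, and the CALL length of 5 bytes is an assumption that holds for a direct near CALL on x86; indirect calls are encoded differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// frameAddr extracts the hexadecimal address from a backtrace line such as
// "/apps/suns/runtime/bin/mardb82[0x40853b]".
func frameAddr(line string) (uint64, error) {
	open := strings.LastIndex(line, "[")
	end := strings.LastIndex(line, "]")
	if open < 0 || end < open {
		return 0, fmt.Errorf("no [address] found in %q", line)
	}
	digits := strings.TrimPrefix(line[open+1:end], "0x")
	return strconv.ParseUint(digits, 16, 64)
}

// callSite steps back from a return address to the start of the CALL.
// callLen is an assumption supplied by the caller (5 for a direct x86 CALL).
func callSite(returnAddr, callLen uint64) uint64 {
	return returnAddr - callLen
}

func main() {
	addr, err := frameAddr("/apps/suns/runtime/bin/mardb82[0x40853b]")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("return address:   %#x\n", addr)
	fmt.Printf("likely call site: %#x\n", callSite(addr, 5))
}
```

The second address (0x408536 here) is the one to feed to addr2line or break * for every frame except the innermost one.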
The instruction address depends on the particular executable. Any change to source code comprising that executable, to compilation or linking flags, etc. etc. may cause the instructions to shift to a different address.

What do I do with a SIGFPE address in gdb?

While running an executable in gdb, I encountered the following error:
Program received signal SIGFPE, Arithmetic exception.
0x08158307 in radtra_ ()
How do I find out which file and line number 0x08158307 corresponds to without recompiling or otherwise modifying the source? If it helps, the source language was Fortran.
How do I find out which file and line number 0x08158307 corresponds to without recompiling or otherwise modifying the source?
That isn't easy. You could use GDB's disassemble command, look for accesses to global variables and CALL instructions, and make a guess where inside radtra_ you are. This is harder the larger the routine is, the more optimizations the compiler has applied to it, and the fewer calls and global variable accesses it performs.
If you can't guess, your only options are:
1. Rebuild the application adding -g flag, but leaving all other compile options unmodified, then use addr2line to translate the address to line number. (This is how you should build the application from the start.)
2. If you can't rebuild the entire application, rebuild just the source containing radtra_ (again with same flags, but add -g). You should be able to match the output from objdump -d radtra.o with the output from disassemble. Once you have a match, read output from readelf -wl radtra.o or objdump -g radtra.o to associate code offsets within radtra_ with source lines that code was generated from.
3. Hire an expert to guess for you. This wouldn't be cheap, as people skilled in this kind of reverse engineering are usually gainfully employed and value their time.

What are gcc linker map files used for?

What are the ".map" files generated by the gcc/g++ linker option "-Map" used for?
And how do I read them?
I recommend generating a map file and keeping a copy for any software you put into production.
It can be useful for deciphering crash reports. Depending on the system, you likely can get a stack dump from the crash. The stack dump will include memory addresses and one of the registers will include the Instruction Pointer. That tells you the memory address code was executing at. On some systems, code addresses can be moved around (when loading dynamic libraries, hence, dynamic), but the lower order bytes should remain the same.
The map file is a MAP from memory location -> code location. It gives you the name of the function at a given memory address. Due to optimizations, it may not be extremely accurate, but it gives you a place to start in terms of looking for bugs that cause the crash.
Now, in 30 years of writing commercial software, this is the only thing I've used the map files for. Twice successfully.
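To make the "memory location -> code location" idea concrete, here is a rough Go sketch of the lookup you do by hand with a map file: find the function whose start address is the closest one at or below the crash address. The symbol type, the lookup function, and all addresses and names in it are hypothetical; a real map file would have to be parsed first.

```go
package main

import (
	"fmt"
	"sort"
)

// symbol is one function entry as it might be read from a map file.
type symbol struct {
	start uint64 // address where the function begins
	name  string
}

// lookup returns the function whose start address is the closest one at or
// below addr. syms must be sorted by start address.
func lookup(syms []symbol, addr uint64) (string, bool) {
	// Index of the first symbol that starts beyond addr.
	i := sort.Search(len(syms), func(i int) bool { return syms[i].start > addr })
	if i == 0 {
		return "", false // addr lies below every known function
	}
	return syms[i-1].name, true
}

func main() {
	// Hypothetical addresses and names, for illustration only.
	syms := []symbol{
		{0x400000, "_start"},
		{0x400400, "parse_input"},
		{0x400900, "main"},
	}
	name, ok := lookup(syms, 0x400412)
	fmt.Println(name, ok)
}
```

Note that this sketch has no function sizes, so an address far beyond the last function is still attributed to it. That is one of the reasons the result is only a starting point.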
What are the ".map" files generated by gcc/g++ linker option "-Map" used for?
There is no such thing as 'gcc linker' -- GCC and linker are independent and separate projects.
Usually the map is used for understanding decisions that ld made while linking the binary. From man ld:
-M
--print-map
Print a link map to the standard output.
A link map provides information about the link, including the following:
· Where object files are mapped into memory.
· How common symbols are allocated.
· All archive members included in the link, with a mention of the symbol which caused the archive member to be brought in.
· The values assigned to symbols.
...
If you don't understand what that means, you likely don't (yet) have the questions that this output answers, and hence have no need to read it.
The compiler gcc is one program that generates object code files, the linker ld is a second program to combine the object code files into an executable. The two can be combined into a single command line.
If you are generating a program to run on an ARM processor you need to use arm-none-eabi-gcc and arm-none-eabi-ld so that the code will be correct for the ARM architecture. Plain gcc and ld generate code for your host computer.

What is the difference between "gcc -s" and a "strip" command?

I wonder what is the difference between these two:
gcc -s: Remove all symbol table and relocation information from the executable.
strip: Discard symbols from object files.
Do they have the same meaning?
Which one do you use to:
· reduce the size of the executable?
· speed up its running?
gcc being a compiler/linker, its -s option is something done while linking. It's also not configurable - it has a set of information which it removes, no more no less.
strip is something which can be run on an object file which is already compiled. It also has a variety of command-line options which you can use to configure which information will be removed. For example, -g strips only the debug information which gcc -g adds.
Note that strip is not a bash command, though you may be running it from a bash shell. It is a command totally separate from bash, part of the GNU binary utilities suite.
The accepted answer is very good, but here is a complement addressing your further questions (and a reference for anyone who ends up here).
What's the equivalent to gcc -s in terms of strip with some of its options?
They both do the same thing, removing the symbol table completely. However, as @JimLewis pointed out, strip allows finer control. For example, in a relocatable object, strip --strip-unneeded won't remove its global symbols. However, strip or strip --strip-all would remove the complete symbol table.
Which one do you use to reduce the size of the executable and speed up its running?
The symbol table is a non-allocable section of the binary. This means that it never gets loaded into RAM. It stores information that can be useful for debugging purposes, for instance, to print out a stack trace when a crash happens. A case where it could make sense to remove the symbol table would be a scenario where you have serious constraints on storage capacity (in that regard, gcc -Os -s or make CXXFLAGS="-Os -s" ... is useful, as it will result in a smaller, slower binary that is also stripped to reduce size further). I don't think removing the symbol table would result in a speed gain, for the reasons given above.
Lastly, I recommend this link about stripping shared objects: http://www.technovelty.org/linux/stripping-shared-libraries.html
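If you want to see for yourself which sections are non-allocable, a small Go program using the standard debug/elf package can list them. This is only a sketch: loadedAtRuntime is a name I made up, and it simply tests the SHF_ALLOC flag of each section header.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// loadedAtRuntime reports whether a section with these flags occupies memory
// in the running process, i.e. whether SHF_ALLOC is set.
func loadedAtRuntime(flags elf.SectionFlag) bool {
	return flags&elf.SHF_ALLOC != 0
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: sections <elf-file>")
		return
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Println("cannot read ELF file:", err)
		return
	}
	defer f.Close()

	// Print every section with its size and whether it is loaded.
	for _, s := range f.Sections {
		fmt.Printf("%-20s %10d bytes  loaded=%v\n", s.Name, s.Size, loadedAtRuntime(s.Flags))
	}
}
```

Run against an unstripped Linux executable, I would expect .symtab and .strtab to show up with loaded=false and .text with loaded=true, and .symtab to be gone after strip or gcc -s.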
"gcc -s" removes the relocation information along with the symbol table which is not done by "strip". Note that, removing relocation information would have some effect on Address space layout randomization. See this link.
They do similar things, but strip allows finer-grained control over what gets removed from the file.

What is gcc serial?

I am compiling some benchmarks, and the instructions say that I can try the option gcc-serial instead of just gcc. Can anyone please explain the difference between gcc and gcc-serial?
The place where that appears is here, and it is mentioned, for example, on slide 71. It is mentioned in more places, but none of them says what gcc-serial is.
Thank you.
The slides refer to a tool from Princeton (PARSEC) meant to benchmark multithreaded shared-memory programs, a.k.a. parallel programs. In many cases, "serial" is the opposite of "parallel":
$ cat config/gcc-serial.bldconf
#!/bin/bash
#
# gcc-serial.bldconf - file containing global information necessary to build
# the serial versions of the PARSEC programs with gcc
#
# Copyright (C) 2006, 2007 Christian Bienia
# Global configuration is identical to multi-threaded version
source ${PARSECDIR}/config/gcc.bldconf
I've never heard of gcc-serial, and I've used gcc for quite a while. Can you clarify more precisely what your benchmarks are telling you? Maybe you meant "gcc -serial" (with a space after gcc and before -serial)? Even in that case, though, I still don't know, since I can't find any mention of a -serial option in my gcc manual.
One version of gcc I'm using has the -mserialize-volatile and -mno-serialize-volatile options, which enable and disable respectively the generation of code that ensures the sequential consistency of volatile memory accesses.
From the slides, it seems to be a configuration name for the benchmarking tool, not a command you should use. It probably means some special way of using gcc when the tool is used.

Resources