Why do common C compilers include the source filename in the output?

I have learnt from this recent answer that gcc and clang include the source filename somewhere in the binary as metadata, even when debugging is not enabled.
I can't really understand why this should be a good idea. Besides the tiny privacy risks, this happens also when one optimizes for the size of the resulting binary (-Os), which looks inefficient.
Why do the compilers include this information?

GCC includes the filename mainly for debugging purposes: it lets a programmer identify which source file a given symbol comes from, as (tersely) outlined in the ELF spec p1-17 and further expanded upon in some Oracle docs on linking.
An example of using the STT_FILE symbol is given by this SO question.
I'm still confused about why both GCC and Clang include it even if you specify -g0, but you can stop them from including STT_FILE with -s. I couldn't find any explanation for this, nor could I find an "official reason" why STT_FILE is included in the ELF specification (which is very terse).
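To make the STT_FILE mechanism concrete, here is a minimal sketch in C of how a tool could pick the FILE entries out of a symbol table. It assumes the Linux/glibc <elf.h> header, a 64-bit ELF, and that .symtab and .strtab have already been loaded into memory with a NUL-terminated string table; find_file_symbols is a hypothetical helper name, not part of any library.

```c
#include <assert.h>
#include <elf.h>     /* Linux/glibc header: Elf64_Sym, ELF64_ST_TYPE, STT_FILE */
#include <stddef.h>
#include <stdio.h>

/*
 * Print the name of every STT_FILE symbol in an in-memory ELF64 symbol table
 * and return how many were found.
 * syms/nsyms describe .symtab; strtab/strtab_size describe .strtab, which is
 * assumed to be NUL-terminated.
 * (find_file_symbols is a hypothetical helper, not a library function.)
 */
size_t find_file_symbols(const Elf64_Sym *syms, size_t nsyms,
                         const char *strtab, size_t strtab_size)
{
    size_t found = 0;

    assert(syms != NULL && strtab != NULL);

    for (size_t i = 0; i < nsyms; i++) {
        /* The symbol type lives in the low four bits of st_info. */
        if (ELF64_ST_TYPE(syms[i].st_info) != STT_FILE)
            continue;
        /* st_name is an offset into the string table; skip bogus offsets. */
        if (syms[i].st_name >= strtab_size)
            continue;
        printf("%zu: FILE %s\n", i, strtab + syms[i].st_name);
        found++;
    }
    return found;
}
```

On a stripped binary there is no .symtab left to pass in, which is why -s makes the filename disappear.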

I have learnt from this recent answer that gcc includes the source filename somewhere in the binary as metadata, even when debugging is not enabled.
Not quite. In modern ELF object files the file name indeed is a symbol of type FILE:
$ readelf -a bignum.o # Source bignum.c
[...]
Symbol table (.symtab) contains 36 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS bignum.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    8
     9: 00000000000003f0   172 FUNC    GLOBAL DEFAULT    1 add
    10: 00000000000004a0   104 FUNC    GLOBAL DEFAULT    1 copy
However, once stripped, the symbol is gone:
$ strip bignum.o
$ readelf -all bignum.o | grep bignum.c
$
So to keep your privacy, strip the executable, or compile/link with -s.

Related

Find Source File Function Declaration Location from ELF

Given an ELF file, I run readelf -sV bin/my_app | grep \ glob on it. This returns:
   205: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND glob@GLIBC_2.27 (6)
   326: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND glob@GLIBC_2.17 (2)
   334: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND globfree@GLIBC_2.17 (2)
 21011: 0000000000cd2748     8 OBJECT  LOCAL  DEFAULT   26 global_mask
 21968: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS globals_io.o
 40039: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND glob@@GLIBC_2.27
 46623: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND glob@GLIBC_2.17
 47377: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND globfree@@GLIBC_2.17
With the information in this output, can I find the location in my source files where this glob symbol is invoked? I would like to find the location to understand why two different versions of GLIBC are linked for the same symbol.

With the information in this output, can I find the location in my source files where this glob symbol is invoked?
No.
You could do: objdump -d bin/my_app, find CALLs to the two versions of glob, and that will tell you what functions the calls are coming from.
I would like to find the location to understand why two different versions of GLIBC are linked for the same symbol.
It's not "two different versions of GLIBC", it's "two different glob symbols with different ABIs".
I didn't think it was possible to reference different versions of glob in the same ELF binary, unless you do some creative symbol aliasing with asm(".symver ...") directives. Once you know which function references the (old) glob@GLIBC_2.17 symbol, run the preprocessor on the file in which that function is defined; I'd be very surprised if there is no asm(".symver ...") in that file.
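For reading such listings: readelf writes a versioned symbol as name@VERSION, and name@@VERSION marks the default version that unversioned references bind to at link time. The following is only a sketch of how the two forms could be told apart programmatically; struct versioned_name and split_versioned_name are hypothetical names made up for this illustration, and the 64-byte buffers are an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the parts of a versioned symbol name. */
struct versioned_name {
    char name[64];      /* e.g. "glob" */
    char version[64];   /* e.g. "GLIBC_2.17" */
    bool is_default;    /* true for the "@@" form */
};

/*
 * Split "name@VERSION" or "name@@VERSION" into its parts.
 * Returns false for unversioned names or parts that do not fit the buffers.
 */
bool split_versioned_name(const char *sym, struct versioned_name *out)
{
    assert(sym != NULL && out != NULL);

    const char *at = strchr(sym, '@');
    if (at == NULL)
        return false;                       /* unversioned symbol */

    bool is_default = (at[1] == '@');
    const char *ver = at + (is_default ? 2 : 1);
    size_t name_len = (size_t)(at - sym);
    size_t ver_len = strlen(ver);

    if (name_len == 0 || name_len >= sizeof out->name ||
        ver_len == 0 || ver_len >= sizeof out->version)
        return false;

    memcpy(out->name, sym, name_len);
    out->name[name_len] = '\0';
    memcpy(out->version, ver, ver_len + 1); /* copies the trailing NUL too */
    out->is_default = is_default;
    return true;
}
```

Applied to the listing in the question, this would report glob with version GLIBC_2.27 as the default and glob with version GLIBC_2.17 as an explicitly requested older version.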

set .symtab address and flags in elf to load it into memory with arm-none-eabi toolchain

I would like to load .symtab into memory with gdb debugger.
At most two steps are required for a normal section (for some sections, e.g. .text, .data, ..., step 1 can be skipped because the flag is set automatically by ld):
1 - Set the Alloc flag (in case of a special section) on the section in the ELF. For a normal section this can be done as follows:
arm-none-eabi-objcopy --set-section-flags .sectionName=alloc src.elf dst.elf
2 - Set the address of the section. AFAIK this can be done in two ways for a normal section:
A - Specifying the section memory area in the LD script e.g. for text section:
.text :
{
    *(.text)
    *(.text*)
} > FLASH
B - Using again objcopy
arm-none-eabi-objcopy --change-section-address .sectionName=0x0ABCD src.elf dst.elf
Since .symtab is generated automatically by the linker, I cannot treat it as a normal section, so neither of the steps above works.
Does anyone have any idea on how to solve this?
I have already implemented a workaround that generates a new ELF with all unneeded sections stripped. This works, but then you have to load two ELFs, and I'm looking for a cleaner solution.

Controlling ELF file size using linker script

I am generating an ELF file for an ARM platform using the Linaro toolchain.
The file is an executable that is supposed to run bare-metal.
I use a linker script to select the locations of sections in the memory because I want to put specific sections in specific locations.
The problem is that when I move some section forward in the memory I see that the image size increases, although no additional data has been added.
When I run readelf -a elf_file I see that both the virtual address (see the Address field below) and the offset in the image (see the Offset field below) have increased.
Example:
The following lines in the linker script
. = 0x2000000;
.__translations_block_0 : { TM_TranslationTables.o(__translations_block_0) }
Result in the following offsets in the elf file (output from readelf)
[Nr] Name              Type     Address          Offset   Size             EntSize          Flags Link Info Align
[10] .tdata            PROGBITS 0000000000279000 00279080 000000000000000c 0000000000000000 WAT   0    0    16
[11] .tbss             NOBITS   0000000000279080 0027908c 0000000000011bcc 0000000000000000 WAT   0    0    16
[12] .__translations_b PROGBITS 0000000002000000 02000080 0000000000000008 0000000000000000 WA    0    0    8
[13] .__translations_b PROGBITS 0000000002001000 02001080 0000000000000008 0000000000000000 WA    0    0    8
My question is:
Is there a way to increase the address of some section without blowing up the image size? I just want the section to be loaded at memory address 0x2000000; I don't want the image size to be 0x2000000.
Any help would be appreciated.

GCC: Section names containing the / character

The answer to this question:
gcc/ld: Allow Code Placement And Removal of Unused Functions
seems to be a very good one. However, trying to use it, I see that the section name gets truncated as soon as a slash (/) character is encountered.
__FILE__ contains the path to the file, and thus the / character. The linker drops everything following a / character when creating a section name, e.g.:
#define SEC_TEXT __attribute__((section(".mytext.bl/ah.c")))
unsigned char SEC_TEXT poll(void)
I end up with this section name:
[ 8] .mytext.bl PROGBITS 00000000 000120 00003d 00 0 0 1
If I use your answer, using __LINE__ and __FILE__:
#define __S(s) #s
#define _S(s) __S(s)
#define SECTION __FILE__ "." _S(__LINE__)
#define SEC_MYTEXT __attribute__((section(".mytext." SECTION)))
unsigned char SEC_MYTEXT poll(void)
I get this:
[ 8] .mytext. PROGBITS 00000000 000120 00003d 00 0 0 1
But you can see from the preprocessor output that it should give me a section name with the file and the line:
unsigned char __attribute__((section(".mytext." "/path/to/mycode/poll.c" "." "250"))) poll
Is there any way of getting around this issue?

Hmm, it's only the free Mentor Graphics Intel (x86) compiler that shows that behaviour, both 4.6.3 and 4.7.2. GCC 4.8.2 with Ubuntu 14.04 is OK with handling slashes in section names. So is the Mentor ARM compiler 4.6.3.
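For toolchains that do truncate at the slash, one possible way around it is to keep __FILE__ out of the section name and use a slash-free tag instead. This is only a sketch under that assumption, for GCC or Clang targeting ELF: SECTION_TAG is a hypothetical per-file macro (it could be supplied per translation unit with something like -DSECTION_TAG='"poll_c"'), the stringification macros are renamed because identifiers such as __S and _S are reserved for the implementation, and the function is renamed to poll_status so it does not clash with the POSIX poll().

```c
/* Two-level stringification so that __LINE__ is expanded before '#' applies. */
#define STRINGIFY_(s) #s
#define STRINGIFY(s)  STRINGIFY_(s)

/*
 * SECTION_TAG is a hypothetical per-file tag that contains no '/'.
 * The default below is only a placeholder for this example.
 */
#ifndef SECTION_TAG
#define SECTION_TAG "poll_c"
#endif

/* Expands to a section name of the form ".mytext.<tag>.<line>". */
#define SEC_MYTEXT \
    __attribute__((section(".mytext." SECTION_TAG "." STRINGIFY(__LINE__))))

/* Renamed from poll() to avoid clashing with the POSIX function of that name. */
unsigned char SEC_MYTEXT poll_status(void)
{
    return 1;
}
```

The resulting section name can be checked with readelf -S on the object file; whether the tag alone is unique enough depends on how the files in the project are named.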

ELF shared library: relocation offset out of bounds

There is a software package, elfutils, which includes a program called eu-elflint for checking ELF binaries (just like lint for C, hence the name).
Out of curiosity I have checked our own shared libraries with this tool and it found a lot of issues, e.g.:
eu-elflint libUtils.so
section [ 2] '.dynsym': _DYNAMIC symbol size 0 does not match dynamic segment size 248
section [ 2] '.dynsym': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got.plt section size 3076
section [ 8] '.rel.plt': relocation 0: offset out of bounds
section [ 8] '.rel.plt': relocation 1: offset out of bounds
...
section [ 8] '.rel.plt': relocation 765: offset out of bounds
As a crosscheck I built a very trivial shared library from the source code below
int foo(int a) {
    return a + 1;
}
// gcc -shared -fPIC -o libfoo.so foo.c
And tried again ...
eu-elflint libfoo.so
section [ 9] '.rel.plt': relocation 0: offset out of bounds
section [ 9] '.rel.plt': relocation 1: offset out of bounds
section [23] '.comment' has wrong flags: expected none, is MERGE|STRINGS
section [25] '.symtab': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got.plt section size 20
section [25] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 200
As you can see, even the trivial example shows a lot of issues.
BTW: I am on Ubuntu-Karmic-32bit with gcc v4.4.1
BTW: ... the same happens on Debian-Lenny-64bit with gcc v4.2.4
Is this something I should be concerned about?

Quick answer: "Is this something I should be concerned about?" No.
Longer answer: elflint checks not only ABI standards, but also some ELF conventions. Both change over time: ABIs are extended while having to remain backward compatible, and ELF conventions evolve (mainly to gain new features). As a consequence, elflint's expectations have to be kept in sync with what your assembler and linker (the GNU binutils in this case) produce. You can find lots of reports to elflint about new ELF extensions introduced in GNU binutils which elflint only catches up with later. Thus, it's most probable that you have a version of elflint that is too old for your installed binutils. As elflint is not widely used, it wouldn't surprise me if a Linux distro didn't keep the two in sync very well.
