Find Source File Function Declaration Location from ELF

Given an ELF file, I run readelf -sV bin/my_app | grep \ glob on it. This returns:
205: 0000000000000000 0 FUNC GLOBAL DEFAULT UND glob@GLIBC_2.27 (6)
326: 0000000000000000 0 FUNC GLOBAL DEFAULT UND glob@GLIBC_2.17 (2)
334: 0000000000000000 0 FUNC GLOBAL DEFAULT UND globfree@GLIBC_2.17 (2)
21011: 0000000000cd2748 8 OBJECT LOCAL DEFAULT 26 global_mask
21968: 0000000000000000 0 FILE LOCAL DEFAULT ABS globals_io.o
40039: 0000000000000000 0 FUNC GLOBAL DEFAULT UND glob@@GLIBC_2.27
46623: 0000000000000000 0 FUNC GLOBAL DEFAULT UND glob@GLIBC_2.17
47377: 0000000000000000 0 FUNC GLOBAL DEFAULT UND globfree@@GLIBC_2.17
With the information in this output, can I find the location in my source files where this glob symbol is invoked? I would like to find the location to understand why two different versions of GLIBC are linked for the same symbol.

With the information in this output, can I find the location in my source files where this glob symbol is invoked?
No.
You could do: objdump -d bin/my_app, find CALLs to the two versions of glob, and that will tell you what functions the calls are coming from.
I would like to find the location to understand why two different versions of GLIBC are linked for the same symbol.
It's not "two different versions of GLIBC", it's "two different glob symbols with different ABIs".
I didn't think it was possible to reference different versions of glob in the same ELF binary unless you do some creative symbol aliasing with asm(".symver ...") directives. Once you know which function references the (old) glob@GLIBC_2.17 symbol, run the preprocessor on the file in which that function is defined. I'd be very surprised if there is no asm(".symver ...") in that file.
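When reading readelf output, keep in mind how versioned names are written: name@VERSION or name@@VERSION. A single @ (as in glob@GLIBC_2.17) names one specific version. A double @@ (as in glob@@GLIBC_2.27) marks the default version, which is the one an unversioned reference binds to at link time.

The C sketch below splits such names into their parts. The struct and function names are made up for illustration. It assumes the trailing "(2)"-style version index from the .dynsym listing is simply dropped.

```c
#include <stdbool.h>
#include <string.h>

/* Parts of a versioned symbol name as printed by readelf. */
struct versioned_sym {
    char name[64];     /* e.g. "glob" */
    char version[64];  /* e.g. "GLIBC_2.17"; empty if unversioned */
    bool has_version;
    bool is_default;   /* true for "@@" (default version), false for "@" */
};

/* Split "name", "name@version" or "name@@version" (optionally followed by
   whitespace and readelf's "(N)" index) into its parts.
   Returns false if a part does not fit into the fixed-size buffers. */
bool parse_versioned_sym(const char *s, struct versioned_sym *out)
{
    const char *at = strchr(s, '@');
    size_t name_len = at ? (size_t)(at - s) : strcspn(s, " \t");
    if (name_len >= sizeof out->name)
        return false;
    memcpy(out->name, s, name_len);
    out->name[name_len] = '\0';

    out->has_version = (at != NULL);
    out->is_default = (at != NULL && at[1] == '@');
    out->version[0] = '\0';
    if (at) {
        const char *ver = at + (out->is_default ? 2 : 1);
        size_t ver_len = strcspn(ver, " \t");  /* stop before " (2)" */
        if (ver_len >= sizeof out->version)
            return false;
        memcpy(out->version, ver, ver_len);
        out->version[ver_len] = '\0';
    }
    return true;
}
```

For example, glob@GLIBC_2.17 and glob@@GLIBC_2.27 both parse to the name glob but carry different versions. That is exactly the pair of references the question is about.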

Related

How to properly set linker flags on esp-idf projects

In trying to port punyforth from esp8266 to esp32 using the esp-idf toolchain, I ran into a linker problem that stumps me.
If not using any special linker flags, I run into errors like:
.
.
.
/Users/k/.espressif/tools/xtensa-esp32-elf/esp-2020r1-8.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/8.2.0/../../../../xtensa-esp32-elf/bin/ld: /Users/k/esp/esp-idf/examples/punyforth/arch/esp8266/rtos/user/main/punyforth.S:22:(.irom0.text+0x1237): dangerous relocation: l32r: literal placed after use: (.irom0.literal+0xc)
/Users/k/.espressif/tools/xtensa-esp32-elf/esp-2020r1-8.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/8.2.0/../../../../xtensa-esp32-elf/bin/ld: /Users/k/esp/esp-idf/examples/punyforth/arch/esp8266/rtos/user/main/punyforth.S:24:(.irom0.text+0x123a): dangerous relocation: l32r: literal placed after use: (.irom0.literal+0x10)
/Users/k/.espressif/tools/xtensa-esp32-elf/esp-2020r1-8.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/8.2.0/../../../../xtensa-esp32-elf/bin/ld: /Users/k/esp/esp-idf/examples/punyforth/arch/esp8266/rtos/user/main/punyforth.S:27:(.irom0.text+0x123f): dangerous relocation: l32r: literal placed after use: (.irom0.literal+0x14)
/Users/k/.espressif/tools/xtensa-esp32-elf/esp-2020r1-8.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/8.2.0/../../../../xtensa-esp32-elf/bin/ld: /Users/k/esp/esp-idf/examples/punyforth/arch/esp8266/rtos/user/build/main/libmain.a(punyforth.o): in function `code_divmod':
/Users/k/esp/esp-idf/examples/punyforth/arch/esp8266/rtos/user/main/../../../primitives.S:121:(.irom0.text+0xcd): dangerous relocation: call0: call target out of range: forth_divmod
/Users/k/.espressif/tools/xtensa-esp32-elf/esp-2020r1-8.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/8.2.0/../../../../xtensa-esp32-elf/bin/ld: /Users/k/esp/esp-idf/examples/punyforth/arch/esp8266/rtos/user/build/main/libmain.a(punyforth.o): in function `code_random':
/Users/k/esp/esp-idf/examples/punyforth/arch/esp8266/rtos/user/main/../../../ext.S:388:(.irom0.text+0xb75): dangerous relocation: call0: call target out of range: forth_random
/Users/k/.espressif/tools/xtensa-esp32-elf/esp-2020r1-8.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/8.2.0/../../../../xtensa-esp32-elf/bin/ld: /Users/k/esp/esp-idf/examples/punyforth/arch/esp8266/rtos/user/build/main/libmain.a(punyforth.o): in function `code_usat':
/Users/k/esp/esp-idf/examples/punyforth/arch/esp8266/rtos/user/main/../../../ext.S:525:(.irom0.text+0xf85): dangerous relocation: call0: call target out of range: esp_timer_get_time
If I set LDFLAGS += -mtext-section-literals (as recommended here and in other places), I get an undefined reference to main instead:
/Users/k/.espressif/tools/xtensa-esp32-elf/esp-2020r1-8.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/8.2.0/../../../../xtensa-esp32-elf/bin/ld: /Users/k/.espressif/tools/xtensa-esp32-elf/esp-2020r1-8.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/8.2.0/../../../../xtensa-esp32-elf/lib/no-rtti/crt0.o:(.literal+0x0): undefined reference to `main'
I'm a complete noob regarding the esp-idf toolchain, so I'm pretty much stuck. Any pointers on how to approach this would be great.
I found this and this post, but I'm still stuck.

The last error you mention refers to crt0. Peeking into the toolchain-provided crt0.o shows that it expects a symbol "main":
xtensa-esp32-elf-objdump -t crt0.o
crt0.o: file format elf32-xtensa-le
SYMBOL TABLE:
...
00000000 g .text 00000000 _start
00000000 *UND* 00000000 main
This is why the linker is looking for a symbol "main".
ESP-IDF by default doesn't link against the standard C library, so specify -nostdlib when building. If punyforth itself does need the standard C runtime, then punyforth or your glue code has to provide "main".

objcopy is removing a section unless I declare a static volatile variable in that section (using attribute)

I have a linker script in which I have defined a section for containing the checksum of a software image. Something like:
...
.my_checksum :
{
__checksum_is_here = .;
KEEP (*(.my_checksum))
. = ALIGN(4);
_sw_image_code_end = .;
} > IMAGE
...
The checksum is placed into that section by using objcopy --update-section.
I build an elf file by using the arm gcc compiler, and I can see this section and its value within it:
> arm-none-eabi-objdump -h my_elf_file.elf
...
0 .text 0001496c 08010000 08010000 00010000 2**4
...
7 .my_checksum 00000004 080250c0 080250c0 000350c0 2**2
...
// Notice that 000350c0 is the file offset and 080250c0 is the LMA.
// The starting LMA is 08010000
And I can retrieve its value:
> xxd -s 0x000350c0 -l 4 my_elf_file.elf
000350c0: 015e 028e // I have checked this value and it is correct.
Now I generate a bin file by executing
> arm-none-eabi-objcopy -O binary --gap-fill 0xFF -S my_elf_file.elf my_elf_file.bin
Now, if I try to read the checksum value again, using the difference between the checksum LMA and the first section LMA (see above):
> xxd -s 0x150c0 -l 4 my_elf_file.bin
The result I obtain here is different from the one obtained in the elf file, that is, the checksum section has been removed by objcopy. (That's what I think at least).
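Here is the offset arithmetic behind the xxd command above. objcopy -O binary starts its memory dump at the load address of the lowest section it copies. A section's offset in the .bin file is therefore its LMA minus that base LMA. The helper name below is illustrative.

```c
#include <stdint.h>

/* Offset of a section inside an `objcopy -O binary` image: the dump starts
   at the lowest load address (LMA) among the copied sections, so subtract it.
   bin_offset(0x080250c0, 0x08010000) == 0x150c0, the value passed to xxd -s. */
uint32_t bin_offset(uint32_t section_lma, uint32_t base_lma)
{
    return section_lma - base_lma;
}
```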
Nevertheless, if I define this in my main.c file:
static volatile unsigned int __aux_checksum __attribute__((section(".my_checksum")));
...
int main() {
...
((void)__aux_checksum); // Avoid compiler/linker optimizations.
...
}
Now, if I replicate the same steps as above with the elf and bin files (using the proper offsets), I can retrieve the checksum from the bin file (elf and bin give the same result).
Questions
My first question is: I know that you can define a section using __attribute__((section)), but if you name a section already defined within the linker script, does the attribute then place the variable within that section instead of creating a new one?
My second question is: Is this the only way to prevent objcopy from removing this particular section?

Let's answer your second question first:
Is this the only way for preventing objcopy of removing this particular section?
The concept you need is documented in the GNU ld manual under SECTIONS:
4.6.8.1. Output Section Type
Each output section may have a type. The type is a keyword in parentheses. The following types are defined:
NOLOAD
The section should be marked as not loadable, so that it will not be loaded into memory when the program is run.
DSECT, COPY, INFO, OVERLAY
These type names are supported for backward compatibility, and are rarely used. They all have the same effect: the section should be marked as not allocatable, so that no memory is allocated for the section when the program is run.
The linker normally sets the attributes of an output section based on the input sections which map into it. You can override this by using the section type. For example, in the script sample below, the ROM section is addressed at memory location 0 and does not need to be loaded when the program is run. The contents of the ROM section will appear in the linker output file as usual.
SECTIONS {
ROM 0 (NOLOAD) : { … }
…
}
So what does that mean? Say you have debugging info in your objects. If you are burning a ROM image, you probably don't want to place the debugging info in the image. Likewise, the BSS segment is all zeros, so there is no need to store it in ROM, but you do need to clear out RAM (at the load address) to make way for it. The initial values for the .data section are stored in ROM, but the section resides in RAM. The concepts are 'loadable' and 'allocatable', and an ELF file has flags for both. By default your .my_checksum gets neither flag, i.e., it is not allocated and not loadable, just like debug info.
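Here is a rough sketch of that rule in C, assuming a Linux host whose glibc provides <elf.h>. A section is dumped by objcopy -O binary only if two conditions hold:

- it is allocatable (SHF_ALLOC);
- it actually has file contents, i.e. it is not SHT_NOBITS like .bss.

The helper name is hypothetical, and the real BFD logic may have more cases than this.

```c
#include <elf.h>
#include <stdbool.h>

/* Approximation: does a section end up in `objcopy -O binary` output?
   It must occupy memory at run time (SHF_ALLOC) and carry bytes in the
   file (anything but SHT_NOBITS). Debug sections have no SHF_ALLOC. */
bool lands_in_raw_binary(Elf32_Word sh_type, Elf32_Word sh_flags)
{
    return (sh_flags & SHF_ALLOC) != 0 && sh_type != SHT_NOBITS;
}
```

A section with no flags at all, like the .my_checksum described in the question, fails the first test and is left out of the .bin file.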
I know that you can define a section using attribute((section)), but if you use a section already defined within the linker script, does this command changes its behaviour for placing the variable within the section, instead of creating a new one?
From the above,
The linker normally sets the attributes of an output section based on the input sections which map into it.
Your input sections' flags are inherited by your output section, so at least one input section has to carry the allocatable flag.
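For the first question, here is a sketch of the attribute approach without the volatile read in main():

- The section attribute puts a 4-byte placeholder into the .my_checksum input section, which KEEP(*(.my_checksum)) collects.
- Because it is ordinary read-only data, GCC marks that input section allocatable, and the output section inherits the flag.
- GCC's used attribute stops the compiler from dropping the otherwise unreferenced object.

The variable name and the 0xFFFFFFFF value are assumptions. The contents are only a placeholder for objcopy --update-section to overwrite.

```c
#include <stdint.h>

/* 4-byte placeholder in the .my_checksum input section.
   - section(".my_checksum"): matched by KEEP(*(.my_checksum)) in the script;
     as const data it makes the input section allocatable.
   - used: keep the object even though no code references it. */
__attribute__((section(".my_checksum"), used))
static const uint32_t checksum_placeholder = 0xFFFFFFFFu;
```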
I would suggest that you just put your checksum at the end of either .text or .data. For instance, input sections such as .rodata (constant values) usually get placed in the output .text section. There is usually no need to invent another output section unless you want some bookkeeping that won't make it into the final image. Your __checksum_is_here label is sufficient to find it, and you can look at this question on CRCs.
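If the checksum is a CRC, a table-free bitwise CRC-32 in C can verify the image at run time. The question doesn't say which algorithm is used, so CRC-32 is an assumption here. The commented-out usage also assumes a linker-script symbol marking the start of the image, called _sw_image_start purely for illustration. Only __checksum_is_here comes from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib).
   Slow but needs no lookup table, which suits small firmware. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* 0u - (crc & 1u) is all ones when the low bit is set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical firmware usage (_sw_image_start is an assumed symbol):

   extern const uint8_t _sw_image_start[], __checksum_is_here[];
   uint32_t stored = *(const uint32_t *)__checksum_is_here;
   size_t image_len = (size_t)(__checksum_is_here - _sw_image_start);
   int image_ok = crc32_ieee(_sw_image_start, image_len) == stored;
*/
```

The host side would compute the same CRC over the .bin bytes in front of the checksum and inject it with objcopy --update-section.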

Why do common C compilers include the source filename in the output?

I have learnt from this recent answer that gcc and clang include the source filename somewhere in the binary as metadata, even when debugging is not enabled.
I can't really understand why this should be a good idea. Besides the tiny privacy risks, this also happens when one optimizes for the size of the resulting binary (-Os), which looks inefficient.
Why do the compilers include this information?

GCC includes the filename mainly for debugging purposes: it allows a programmer to identify which source file a given symbol comes from, as (tersely) outlined in the ELF spec p1-17 and further expanded upon in some Oracle docs on linking.
An example of using the STT_FILE symbol is given by this SO question.
I'm still confused why both GCC and Clang include it even if you specify -g0, but you can stop the STT_FILE entry from being included with -s. I couldn't find any explanation for this, nor could I find an "official reason" why STT_FILE is included in the ELF specification (which is very terse).

I have learnt from this recent answer that gcc includes the source filename somewhere in the binary as metadata, even when debugging is not enabled.
Not quite. In modern ELF object files the file name indeed is a symbol of type FILE:
$ readelf -a bignum.o # Source bignum.c
[...]
Symbol table (.symtab) contains 36 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS bignum.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 8
9: 00000000000003f0 172 FUNC GLOBAL DEFAULT 1 add
10: 00000000000004a0 104 FUNC GLOBAL DEFAULT 1 copy
However, once stripped, the symbol is gone:
$ strip bignum.o
$ readelf -a bignum.o | grep bignum.c
$
So to keep your privacy, strip the executable, or compile/link with -s.
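To see the FILE symbols without readelf, the C sketch below walks the .symtab of a 64-bit ELF file and prints every STT_FILE entry. It uses glibc's <elf.h> and does roughly what readelf -s ... | grep FILE shows. The function names are illustrative, and error handling is minimal.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Load a whole file into memory; returns NULL on failure. */
static unsigned char *load_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    unsigned char *buf = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long n = ftell(f);
        if (n > 0 && fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)n)) != NULL) {
            if (fread(buf, 1, (size_t)n, f) == (size_t)n) {
                *len = (size_t)n;
            } else {
                free(buf);
                buf = NULL;
            }
        }
    }
    fclose(f);
    return buf;
}

/* Print the name of every STT_FILE symbol in a 64-bit ELF file's .symtab.
   Returns how many were found (0 once stripped), or -1 on error. */
int print_file_symbols(const char *path)
{
    size_t len = 0;
    unsigned char *img = load_file(path, &len);
    if (!img)
        return -1;

    int count = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (len >= sizeof *eh && memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 &&
        eh->e_shoff + (Elf64_Off)eh->e_shnum * sizeof(Elf64_Shdr) <= len) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
        count = 0;
        for (unsigned i = 0; i < eh->e_shnum; i++) {
            /* FILE symbols live in .symtab (SHT_SYMTAB), which strip removes. */
            if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh->e_shnum)
                continue;
            const Elf64_Shdr *str = &sh[sh[i].sh_link];  /* its string table */
            if (sh[i].sh_offset + sh[i].sh_size > len ||
                str->sh_offset + str->sh_size > len)
                continue;
            const Elf64_Sym *sym = (const Elf64_Sym *)(img + sh[i].sh_offset);
            size_t nsym = sh[i].sh_size / sizeof(Elf64_Sym);
            for (size_t j = 0; j < nsym; j++) {
                if (ELF64_ST_TYPE(sym[j].st_info) == STT_FILE &&
                    sym[j].st_name < str->sh_size) {
                    printf("%s\n", (const char *)img + str->sh_offset + sym[j].st_name);
                    count++;
                }
            }
        }
    }
    free(img);
    return count;
}
```

On an unstripped object such as bignum.o it prints bignum.c. After strip there is no .symtab left, so it finds nothing and returns 0.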

What does the "aw" flag in the section attribute mean?

In the following line of code (which declares a global variable),
unsigned int __attribute__((section(".myVarSection,\"aw\",@nobits#"))) myVar;
what does the "aw" flag mean?
My understanding is that the nobits flag will prevent the variable from being initialised to zero, but I am struggling to find info about the "aw" flag.
Also, what meaning do the @ and # have around the nobits flag?

The section("section-name") attribute places a variable in a specific section by producing the following assembler line:
.section section-name,"aw",@progbits
When you set section-name to ".myVarSection,\"aw\",@nobits#" you exploit a kind of "code injection" in GCC to produce:
.section .myVarSection,"aw",@nobits#,"aw",@progbits
Note that # sign starts a one-line comment.
See GNU Assembler manual for the full description of .section directive. A general syntax is
.section name [, "flags"[, @type[,flag_specific_arguments]]]
so "aw" are flags:
a: section is allocatable
w: section is writable
and @nobits is a type:
@nobits: section does not contain data (i.e., section only occupies space)
All the above is also applicable to functions, not just variables.
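To connect the letters to the ELF header fields, here is a small C sketch, assuming glibc's <elf.h>. It maps the flags and type arguments of .section onto sh_flags and sh_type. It only handles the letters discussed here plus x; the GNU as manual lists more. The function names are illustrative.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Translate .section flag letters into ELF sh_flags bits.
   Letters other than a, w and x are ignored in this sketch. */
uint64_t section_flags(const char *letters)
{
    uint64_t flags = 0;
    for (const char *p = letters; *p; p++) {
        switch (*p) {
        case 'a': flags |= SHF_ALLOC; break;      /* occupies memory at run time */
        case 'w': flags |= SHF_WRITE; break;      /* writable */
        case 'x': flags |= SHF_EXECINSTR; break;  /* executable */
        default: break;
        }
    }
    return flags;
}

/* Translate the type argument into sh_type; SHT_NULL if unknown.
   GNU as writes @type; targets where @ starts a comment (e.g. ARM) use %type. */
uint32_t section_type(const char *type)
{
    if (*type == '@' || *type == '%')
        type++;
    if (strcmp(type, "progbits") == 0)
        return SHT_PROGBITS;  /* contents are stored in the file */
    if (strcmp(type, "nobits") == 0)
        return SHT_NOBITS;    /* space only, zero-filled at load time, like .bss */
    return SHT_NULL;
}
```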

what does the "aw" flag mean?
It means that the section is allocatable (i.e. it's loaded to the memory at runtime) and writable (and readable, of course).
My understanding is that the nobits flag will prevent the variable from being initialised to zero, but I am struggling to find info about the "aw" flag.
Also, what meaning do the @ and # have around the nobits flag?
@nobits (@ is just part of the name) means that the section isn't stored in the image on disk; it only exists at run time (and it's filled with zeros at startup).
The # character begins a comment, so whatever the compiler appends after what you have specified is ignored.
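The string splicing described in the first answer can be mimicked in a few lines of C:

1. Format the attribute value into the directive GCC emits.
2. Drop everything from # onward, as the x86 assembler does.

This is a simplified model, not GCC's actual code. The function name is hypothetical, and it ignores the possibility of # inside quoted strings.

```c
#include <stdio.h>
#include <string.h>

/* Build ".section NAME,\"aw\",@progbits" and cut it at the first '#',
   which the x86 assembler treats as the start of a comment. */
void effective_section_directive(const char *name, char *out, size_t out_size)
{
    snprintf(out, out_size, ".section %s,\"aw\",@progbits", name);
    char *comment = strchr(out, '#');
    if (comment)
        *comment = '\0';
}
```

With the name from the question, the assembler effectively sees .section .myVarSection,"aw",@nobits.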

interpreting gcc map file

I need to find the code size for a library developed using C on linux. I have generated the map file using the gcc linker options against a sample application that uses this library.
The map file is quite exhaustive. How do I find out the code size of the library from the map file? Any pointers to documentation on how to interpret the map file would also be very useful.

You want to find out the size of the machine instructions in a given shared object? Why do you need the map file?
This gives the size of the .text section. The .text section is where executable code is stored:
$ objdump -x /usr/bin/objdump | grep .text
13 .text 0002c218 0000000000403320 0000000000403320 00003320 2**4
In this example, there are 0x2c218 bytes of executable code. In decimal this is 180760 bytes, about 177 KiB:
$ printf %d\\n 0x2c218
180760
Edit: This is how it looks with a library:
$ objdump -x /usr/lib/libcairo.so | grep .text
11 .text 00054c18 000000000000cc80 000000000000cc80 0000cc80 2**4
$ printf %d\\n 0x54c18
347160
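The same number can be read programmatically. The C sketch below assumes a 64-bit ELF file and glibc's <elf.h>, and the function name is illustrative. It looks up .text by name through the section-header string table and returns its sh_size, the value objdump prints in the Size column.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Size in bytes of the ".text" section of a 64-bit ELF file, or -1 on error. */
long long text_section_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    long long result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strsec;
    char *names = NULL;

    /* Read and sanity-check the ELF header. */
    if (fread(&eh, sizeof eh, 1, f) != 1 || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_shnum == 0 ||
        eh.e_shstrndx >= eh.e_shnum)
        goto done;

    /* Read all section headers. */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto done;

    /* Section names live in the section-header string table. */
    strsec = &sh[eh.e_shstrndx];
    names = malloc(strsec->sh_size);
    if (!names || fseek(f, (long)strsec->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strsec->sh_size, f) != strsec->sh_size)
        goto done;

    for (unsigned i = 0; i < eh.e_shnum; i++) {
        /* Compare ".text" including its NUL, staying inside the table. */
        if (sh[i].sh_name + sizeof ".text" <= strsec->sh_size &&
            memcmp(names + sh[i].sh_name, ".text", sizeof ".text") == 0) {
            result = (long long)sh[i].sh_size;
            break;
        }
    }

done:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

Calling text_section_size on the library file and printing the result with %lld gives the decimal figure directly. Keep in mind that .text is the main code section but not the only one: .init, .fini and .plt also hold instructions.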
