I am generating an ELF file for an ARM platform using the Linaro toolchain.
The file is an executable that is supposed to run bare-metal.
I use a linker script to select the locations of sections in the memory because I want to put specific sections in specific locations.
The problem is that when I move some section forward in the memory I see that the image size increases, although no additional data has been added.
When I run readelf -a elf_file I see that both the virtual address (see the Address field below) and the offset in the image (see the Offset field below) have increased.
Example:
The following lines in the linker script
. = 0x2000000;
.__translations_block_0 : { TM_TranslationTables.o(__translations_block_0) }
Result in the following offsets in the elf file (output from readelf)
[Nr] Name Type Address Offset Size EntSize Flags Link Info Align
[10] .tdata PROGBITS 0000000000279000 00279080 000000000000000c 0000000000000000 WAT 0 0 16
[11] .tbss NOBITS 0000000000279080 0027908c 0000000000011bcc 0000000000000000 WAT 0 0 16
[12] .__translations_b PROGBITS 0000000002000000 02000080 0000000000000008 0000000000000000 WA 0 0 8
[13] .__translations_b PROGBITS 0000000002001000 02001080 0000000000000008 0000000000000000 WA 0 0 8
My question is:
Is there a way to increase the address of some section without blowing the image size? I just want the section to be loaded into memory address 0x2000000, I don't want the image size to be 0x2000000.
Any help would be appreciated.
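For what it's worth, the readelf rows above already show the pattern: for every PROGBITS section the file offset is the address plus the same constant, so moving a section to 0x2000000 pushes its file offset to 0x2000080. A small Python check of that arithmetic, using only the numbers quoted above (the variable names are mine, and this only describes the symptom, it does not fix it):

```python
# (name, address, file offset) copied from the readelf output above.
# .tbss is NOBITS (it occupies no space in the file), so it is left out.
rows = [
    (".tdata", 0x279000, 0x279080),
    (".__translations_b [12]", 0x2000000, 0x2000080),
    (".__translations_b [13]", 0x2001000, 0x2001080),
]

# Collect the distinct differences between file offset and address.
deltas = {offset - address for _, address, offset in rows}

for name, address, offset in rows:
    print(f"{name:24s} addr={address:#010x} off={offset:#010x} "
          f"delta={offset - address:#x}")

# A single distinct delta means the file offset simply tracks the address.
print("constant delta:", len(deltas) == 1)
```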
I need to access .symtab symbol table by parsing memory of the process.
At the moment, my algorithm is:
Find the dynamic segment (the program header with p_type == PT_DYNAMIC) and follow its p_vaddr.
Search that dynamic section for the entry whose d_tag is DT_SYMTAB and take the pointer at offset +4 (d_ptr), which should be the actual .symtab symbol table.
However, instead of .symtab, for some reason I'm getting .dynsym, which I confirmed by comparing the symbol names and other info against the output of readelf -Ws.
So, how to get the actual .symtab ptr?
Thank you.
For reference, I'm using:
https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#Program_header
http://labmaster.mi.infn.it/Laboratorio2/CompilerCD/clang/l1/ELF.html
More good resources are appreciated.
I need to access .symtab symbol table by parsing memory of the process.
This is generally impossible because .symtab is normally not loaded into the process memory at all.
E.g.
readelf -WS foo.o | egrep ' \.(data|text|symtab)'
[ 1] .text PROGBITS 0000000000000000 000040 00001b 00 AX 0 0 1
[ 5] .data PROGBITS 0000000000000000 0000d0 000000 00 WA 0 0 1
[ 9] .symtab SYMTAB 0000000000000000 000130 000120 18 10 10 8
Notice that .data and .text have A (allocatable) flag, while .symtab doesn't.
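To make the flag column concrete, here is a minimal Python sketch of how the three basic sh_flags bits map to the letters readelf prints. The bit values are the ones from the ELF specification; the helper names are mine.

```python
# Section flag bits from the ELF specification (sh_flags).
SHF_WRITE = 0x1      # readelf letter W
SHF_ALLOC = 0x2      # readelf letter A: section occupies memory at run time
SHF_EXECINSTR = 0x4  # readelf letter X


def flag_letters(sh_flags: int) -> str:
    """Render the three basic flags the way readelf abbreviates them."""
    letters = ""
    if sh_flags & SHF_WRITE:
        letters += "W"
    if sh_flags & SHF_ALLOC:
        letters += "A"
    if sh_flags & SHF_EXECINSTR:
        letters += "X"
    return letters


def is_loaded(sh_flags: int) -> bool:
    """A section is mapped into the process only if SHF_ALLOC is set."""
    return bool(sh_flags & SHF_ALLOC)


print(flag_letters(SHF_ALLOC | SHF_EXECINSTR), is_loaded(SHF_ALLOC | SHF_EXECINSTR))
print(flag_letters(SHF_WRITE | SHF_ALLOC), is_loaded(SHF_WRITE | SHF_ALLOC))
print(repr(flag_letters(0)), is_loaded(0))  # .symtab in the listing: no flags
```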
However, instead of .symtab, for some reason, I'm receiving .dynsym
.dynsym is the only symbol table used at runtime, and is the only symbol table you can get without reading the executable on disk.
P.S. Also note that a fully-stripped binary will not have a .symtab at all, while still being perfectly runnable.
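For reference, the lookup described in the question can be sketched in Python as below. This is only a sketch under some assumptions: it handles little-endian ELF64 only and reads the dynamic array from a file image via p_offset, whereas in a live process you would read it at the load bias plus p_vaddr. Note that in ELF64 the d_ptr field sits at offset +8 within each entry (+4 is the ELF32 layout). The value it returns is the address of .dynsym, as explained above.

```python
import struct

PT_DYNAMIC = 2   # program header type of the dynamic segment
DT_NULL = 0      # terminates the dynamic array
DT_SYMTAB = 6    # d_ptr holds the address of the dynamic symbol table

EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
PHDR64 = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr
DYN64 = struct.Struct("<qQ")                 # Elf64_Dyn: d_tag, d_un


def find_dt_symtab(image: bytes):
    """Return the DT_SYMTAB value of a little-endian ELF64 image, or None."""
    ehdr = EHDR64.unpack_from(image, 0)
    if ehdr[0][:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]

    for i in range(e_phnum):
        fields = PHDR64.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, p_offset, p_filesz = fields[0], fields[2], fields[5]
        if p_type != PT_DYNAMIC:
            continue
        # Walk the Elf64_Dyn array until DT_NULL or the end of the segment.
        for pos in range(p_offset, p_offset + p_filesz, DYN64.size):
            d_tag, d_val = DYN64.unpack_from(image, pos)
            if d_tag == DT_NULL:
                break
            if d_tag == DT_SYMTAB:
                return d_val
    return None

# Possible usage (the path is only an example):
#   with open("/bin/ls", "rb") as f:
#       print(hex(find_dt_symtab(f.read())))
```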
I have learnt from this recent answer that gcc and clang include the source filename somewhere in the binary as metadata, even when debugging is not enabled.
I can't really understand why this should be a good idea. Besides the tiny privacy risks, this happens also when one optimizes for the size of the resulting binary (-Os), which looks inefficient.
Why do the compilers include this information?
GCC includes the filename mainly for debugging purposes: it lets a programmer identify which source file a given symbol comes from, as (tersely) outlined in the ELF spec p1-17 and further expanded upon in some Oracle docs on linking.
An example of using the STT_FILE symbol is given by this SO question.
I'm still confused why both GCC and Clang still include it even if you specify -g0, but you can stop it from including STT_FILE with -s. I couldn't find any explanation for this, nor could I find an "official reason" why STT_FILE is included in the ELF specification (which is very terse).
I have learnt from this recent answer that gcc includes the source filename somewhere in the binary as metadata, even when debugging is not enabled.
Not quite. In modern ELF object files the file name indeed is a symbol of type FILE:
$ readelf -s bignum.o # Source bignum.c
[...]
Symbol table (.symtab) contains 36 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS bignum.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 8
9: 00000000000003f0 172 FUNC GLOBAL DEFAULT 1 add
10: 00000000000004a0 104 FUNC GLOBAL DEFAULT 1 copy
However, once stripped, the symbol is gone:
$ strip bignum.o
$ readelf -all bignum.o | grep bignum.c
$
So to keep your privacy, strip the executable, or compile/link with -s.
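To show what "a symbol of type FILE" means at the byte level, here is a minimal Python sketch that decodes one Elf64_Sym entry. The layout and the constants are from the ELF specification; the example entry is hand-built to resemble symbol 1 in the listing above (its st_name offset of 1 is made up), not read from a real bignum.o.

```python
import struct

STT_FILE = 4      # symbol type: name of the source file
SHN_ABS = 0xFFF1  # section index that readelf prints as ABS

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")


def decode_sym(entry: bytes) -> dict:
    """Split one 24-byte Elf64_Sym entry into its fields."""
    st_name, st_info, _other, st_shndx, st_value, st_size = ELF64_SYM.unpack(entry)
    return {
        "name_offset": st_name,  # index into the string table
        "bind": st_info >> 4,    # upper nibble: 0 = LOCAL, 1 = GLOBAL
        "type": st_info & 0xF,   # lower nibble: 4 = FILE
        "shndx": st_shndx,
        "value": st_value,
        "size": st_size,
    }


def is_file_symbol(entry: bytes) -> bool:
    return decode_sym(entry)["type"] == STT_FILE


# Hypothetical LOCAL FILE symbol with Ndx ABS, value 0 and size 0.
example = ELF64_SYM.pack(1, (0 << 4) | STT_FILE, 0, SHN_ABS, 0, 0)
print(decode_sym(example))
print(is_file_symbol(example))
```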
I am reading about data alignment, and I know that when an x86 program starts executing, its stack will be aligned to a 4-byte boundary. But will the .data and .bss sections also be aligned to a 4-byte boundary? For example, if I have the following:
section .data
number1 DW 1234
When a program with this code executes, will number1 always be on an address that is divisible by 4?
Yes. See the NASM manual:
The defaults assumed by NASM if you do not specify the above
qualifiers are:
section .data progbits alloc noexec write align=4
section .bss nobits alloc noexec write align=4
Notice it says align=4. This is for ELF output. You have forgotten to specify what you use.
For the win32 format, the relevant part is section 7.5.1:
The defaults assumed by NASM if you do not specify the above
qualifiers are:
section .data data align=4
section .bss bss align=4
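One caveat worth spelling out: align=4 applies to the start of the section, so number1 is 4-byte aligned here because it is the first item in .data. NASM does not align individual data items for you. A small Python sketch of the arithmetic, where the starting address and the second variable number2 are made up for illustration:

```python
def align_up(address: int, alignment: int) -> int:
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


# Hypothetical address at which the linker would otherwise place .data.
section_start = align_up(0x0804A011, 4)

# Two consecutive DW (2-byte) items; number2 is an assumed second variable.
number1 = section_start
number2 = number1 + 2

print(hex(number1), number1 % 4 == 0)  # first item inherits the section alignment
print(hex(number2), number2 % 4 == 0)  # second item is only 2-byte aligned
```

If a later item needs 4-byte alignment too, NASM's align directive (align 4) can be placed before it.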
I'm using valgrind to debug a binary which uses loadable libraries via dlopen.
On debian stable the stacktrace does not contain symbols for calls inside the loadable lib.
| | ->11.55% (114,688B) 0x769492C: ???
| | | ->11.55% (114,688B) 0x7697289: ???
| | | ->11.55% (114,688B) 0x769806F: ???
| | | ->11.55% (114,688B) 0x419812: myfunc (main.c:1010)
Valgrind on debian unstable works fine and the symbols are properly resolved. So I started looking what is different.
I have these packages on both systems (valgrind was updated to 3.7 from unstable):
ii valgrind 1:3.7.0-1+b1
ii libtool 2.2.6b-2
ii gcc 4:4.4.5-1
ii binutils 2.20.1-16
The libs are not stripped and contain debuginfo:
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x33ffd210859178c15bb3923c5491e1a1b6065015, not stripped
Looking closer, I noticed that the sizes of the libraries differ: on debian unstable the lib is slightly bigger. Comparing them with readelf shows that the debug info is bigger.
[26] .debug_aranges PROGBITS 0000000000000000 00a74c 000090 00 0 0 1
[27] .debug_pubnames PROGBITS 0000000000000000 00a7dc 000385 00 0 0 1
[28] .debug_info PROGBITS 0000000000000000 00ab61 00512f 00 0 0 1
[29] .debug_abbrev PROGBITS 0000000000000000 00fc90 0006e2 00 0 0 1
[30] .debug_line PROGBITS 0000000000000000 010372 002314 00 0 0 1
[31] .debug_str PROGBITS 0000000000000000 012686 0019d3 01 MS 0 0 1
[32] .debug_loc PROGBITS 0000000000000000 014059 000f24 00 0 0 1
[33] .debug_macinfo PROGBITS 0000000000000000 014f7d 179082 00 0 0 1
[34] .debug_ranges PROGBITS 0000000000000000 18dfff 000060 00 0 0 1
This makes me think that something is missing from the debug info section from the binaries built on debian stable. Now my question is: why and how are the binaries different? The tools (gcc, libtool, binutils) used in the build are the same, including the compiler/linker flags and commands (I checked with diff on make's output).
Update:
The debug_info section size difference came from the fact that the full path of the source file is stored there as well and the build home was different. Also there are different openssl versions on unstable/stable which added some different symbols to the debug_info section. Hence the difference in debug_info size.
Running valgrind in debug mode (-d -v -v) shows that it reads symbols from the loadable lib in both cases:
--19837-- Reading syms from /usr/lib/myplugin.so (0x6c62000)
If you load the library with dlopen, chances are that it was unloaded before the program terminated, so Valgrind can no longer resolve its symbols. Try not to call dlclose on this library. See http://valgrind.org/docs/manual/faq.html#faq.unhelpful for more information.
There have been a number of posts on stackoverflow and other places detailing how to embed binary blobs into elf binaries.
Embedding binary blobs using gcc mingw
and
C/C++ with GCC: Statically add resource files to executable/library
being the most complete answers.
But there's a possible issue which no one mentions. Here's a quick foo.txt converted to foo.o:
$ objdump -x foo.o
foo.o: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 0000000d 00000000 00000000 00000034 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
00000000 l d .data 00000000 .data
0000000d g .data 00000000 _binary_foo_txt_end
0000000d g *ABS* 00000000 _binary_foo_txt_size
00000000 g .data 00000000 _binary_foo_txt_start
Now, I don't really grok all this output - is there documentation for this stuff??? I guess most of it is obvious enough "g" is global and "l" is local etc etc...
What stands out is the alignment for the .data section, shown as 2**0, i.e. one byte. Does that mean what I think it means? i.e. when it comes to linking, the linker will go "ah yeah, wherever..."
If you embed char data or are working on an x86 then you'll never notice. But if you embed int data or, as I'm doing, 16 and 32 bit data on an ARM, then you could get an alignment trap at any point.
My gut feeling is that this means that either objcopy needs another option to specify alignment of the binary blob, or it's broken and you shouldn't use this method at all.
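For anyone unsure how to read the Algn column: 2**n is the alignment in bytes written as a power of two, so 2**0 is one byte, i.e. no constraint at all. A tiny Python sketch of that reading (the helper names are mine):

```python
def algn_to_bytes(algn: str) -> int:
    """Convert objdump's 'Algn' column (e.g. '2**0') to a byte count."""
    base, _, exponent = algn.partition("**")
    return int(base) ** int(exponent)


def is_safe_for(algn: str, access_size: int) -> bool:
    """True if the section alignment guarantees access_size-byte alignment
    for data placed at the very start of the section."""
    return algn_to_bytes(algn) % access_size == 0


print(algn_to_bytes("2**0"), is_safe_for("2**0", 4))  # the blob from objcopy
print(algn_to_bytes("2**2"), is_safe_for("2**2", 4))  # what 32-bit data needs
```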
To answer my own question, I'd assert that objcopy is broken in this instance. I believe that using assembly with GNU as is likely the best way to go here. Unfortunately I'm currently without a Linux machine, so I can't test this properly, but I'll put this answer here in case someone finds it or wants to check:
.section ".rodata"
.balign 4 /* byte count on every target; .align 4 means 4 or 2**4 depending on arch */
.global _binary_file_bin_start
.type _binary_file_bin_start, %object
_binary_file_bin_start:
.incbin "file.bin"
.balign 4 /* pads the data, so the end label is 4-byte aligned too */
.global _binary_file_bin_end
_binary_file_bin_end:
The underscores are the traditional way to annoy yourself with C/asm interoperability. In other words they vanish with MS/Borland compilers under Windows.
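If you have several blobs to embed, the wrapper can be generated rather than written by hand. Below is a rough Python sketch that emits such a GNU as wrapper for a given file. It is untested against a real toolchain; the function name and the symbol naming scheme are my own, and it uses .balign, which takes a byte count on every target, to sidestep the .align ambiguity.

```python
def make_incbin_asm(path: str, symbol: str, alignment: int = 4) -> str:
    """Return GNU as source that embeds `path` between start/end labels."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    start = f"_binary_{symbol}_start"
    end = f"_binary_{symbol}_end"
    lines = [
        '.section ".rodata"',
        f".balign {alignment}",      # align the start of the blob
        f".global {start}",
        f".type {start}, %object",
        f"{start}:",
        f'.incbin "{path}"',
        f".global {end}",
        f"{end}:",                   # placed right after the last data byte
    ]
    return "\n".join(lines) + "\n"


print(make_incbin_asm("file.bin", "file_bin"))
```

Here the end label is placed directly after the data, so end minus start is the exact blob size; add another .balign before it if you prefer a padded end.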
Create a linker script "lscript.ld":
MEMORY
{
memory : ORIGIN = 0x00000000, LENGTH = 0x80000000
}
SECTIONS
{
.data (ALIGN(4)) : {
*(.data)
*(.data.*)
__data_end = .;
} > memory
.text (ALIGN(4)) : {
*(.text)
*(.text.*)
__text_end = .;
} > memory
_end = .;
}
Link your file:
gcc -Wl,-T -Wl,lscript.ld -o linked_foo.elf foo.o
Find all the extraneous stuff added in linking:
objdump -x linked_foo.elf
Objcopy again, to remove the extra stuff:
objcopy --remove-section ".init_array" (repeat as necessary) --strip-all --keep-symbol "_binary_foo_txt_start" --keep-symbol "_binary_foo_txt_end" --keep-symbol "_binary_foo_txt_size" linked_foo.elf final_foo.elf
That gets you an ELF file with 2**2 alignment.