Why is the objcopy binary size so much larger than the actual ELF size? - compilation

I have an ELF file which we then convert to a binary format:
arm-none-eabi-objcopy -O binary MyElfFile.elf MyBinFile.bin
The ELF file is just under 300KB, but the binary output file is 446 times larger: 134000KB, or 130MB! How is this possible when the whole point of a binary is to remove symbols, section tables and debug info?
Looking at Reddit and SO it looks like the binary image should be smaller than the ELF, not larger.

so.s
b .
.section .data
.word 0x12345678
arm-none-eabi-as so.s -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <.text>:
0: eafffffe b 0 <.text>
Disassembly of section .data:
00000000 <.data>:
0: 12345678 eorsne r5, r4, #120, 12 ; 0x78
arm-none-eabi-readelf -a so.o
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000004 00 AX 0 0 4
[ 2] .data PROGBITS 00000000 000038 000004 00 WA 0 0 1
[ 3] .bss NOBITS 00000000 00003c 000000 00 WA 0 0 1
[ 4] .ARM.attributes ARM_ATTRIBUTES 00000000 00003c 000012 00 0 0 1
[ 5] .symtab SYMTAB 00000000 000050 000060 10 6 6 4
[ 6] .strtab STRTAB 00000000 0000b0 000004 00 0 0 1
[ 7] .shstrtab STRTAB 00000000 0000b4 00003c 00 0 0 1
So my "binary" has 8 bytes total, in two sections.
-rw-rw-r-- 1 oldtimer oldtimer 560 Oct 12 16:32 so.o
8 bytes relative to 560 for the object.
Link it.
MEMORY
{
one : ORIGIN = 0x00001000, LENGTH = 0x1000
two : ORIGIN = 0x00002000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > one
.data : { *(.data*) } > two
}
arm-none-eabi-ld -T so.ld so.o -o so.elf
arm-none-eabi-objdump -D so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00001000 <.text>:
1000: eafffffe b 1000 <.text>
Disassembly of section .data:
00002000 <.data>:
2000: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
arm-none-eabi-readelf -a so.elf
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00001000 001000 000004 00 AX 0 0 4
[ 2] .data PROGBITS 00002000 002000 000004 00 WA 0 0 1
[ 3] .ARM.attributes ARM_ATTRIBUTES 00000000 002004 000012 00 0 0 1
[ 4] .symtab SYMTAB 00000000 002018 000070 10 5 7 4
[ 5] .strtab STRTAB 00000000 002088 00000c 00 0 0 1
[ 6] .shstrtab STRTAB 00000000 002094 000037 00 0 0 1
Now we need 4 bytes at 0x1000 and 4 bytes at 0x2000. objcopy -O binary produces a flat memory image: the file starts with the lowest-addressed thing and ends with the highest-addressed thing, and everything in between is included. With this link the lowest address is 0x1000 and the highest is 0x2003, a total span of 0x1004 bytes:
arm-none-eabi-objcopy -O binary so.elf so.bin
ls -al so.bin
-rwxrwxr-x 1 oldtimer oldtimer 4100 Oct 12 16:40 so.bin
4100 = 0x1004 bytes
hexdump -C so.bin
00000000 fe ff ff ea 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000 78 56 34 12 |xV4.|
00001004
The assumption here is that the user knows the base address is 0x1000, as there is no address info in the file format, and that this is a contiguous memory image, so that the four bytes of .data also land at 0x2000. So -O binary pads the file to fill in everything between the two.
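The size rule can be sketched in a few lines of Python. This only models the arithmetic, not objcopy itself; the function name flat_binary_size and the (address, size) tuples are made up for illustration, with the values taken from the readelf output above.

```python
def flat_binary_size(sections):
    """Estimate the size of an `objcopy -O binary` image.

    sections: iterable of (load_address, size_in_bytes) for every section
    that is loaded and has contents in the file.
    The image runs from the lowest loaded byte to the highest, gaps included.
    """
    sections = [(addr, size) for addr, size in sections if size > 0]
    if not sections:
        return 0
    lowest = min(addr for addr, _ in sections)
    highest_end = max(addr + size for addr, size in sections)
    return highest_end - lowest


# .text at 0x1000 and .data at 0x2000, 4 bytes each (the linker script above)
print(flat_binary_size([(0x1000, 4), (0x2000, 4)]))  # 4100
```

Moving .data further away from .text only changes the second address, and the estimated size grows by exactly the size of the gap.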
If I change to this
MEMORY
{
one : ORIGIN = 0x00000000, LENGTH = 0x1000
two : ORIGIN = 0x10000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > one
.data : { *(.data*) } > two
}
You can easily see where this is headed.
ls -al so.bin
-rwxrwxr-x 1 oldtimer oldtimer 268435460 Oct 12 16:43 so.bin
So my ELF does not change size, but the -O binary file is 0x10000004 bytes. There are only 8 bytes I care about, but objcopy -O binary by its nature has to pad the middle.
Since the sizes and placement of things are specific to your project and your linker script, no general statement can be made about the size of the ELF file relative to the size of an -O binary file.
ls -al so.elf
-rwxrwxr-x 1 oldtimer oldtimer 131556 Oct 12 16:49 so.elf
arm-none-eabi-strip so.elf
ls -al so.elf
-rwxrwxr-x 1 oldtimer oldtimer 131336 Oct 12 16:50 so.elf
arm-none-eabi-as -g so.s -o so.o
ls -al so.o
-rw-rw-r-- 1 oldtimer oldtimer 1300 Oct 12 16:51 so.o
arm-none-eabi-ld -T so.ld so.o -o so.elf
ls -al so.elf
-rwxrwxr-x 1 oldtimer oldtimer 132088 Oct 12 16:51 so.elf
arm-none-eabi-strip so.elf
ls -al so.elf
-rwxrwxr-x 1 oldtimer oldtimer 131336 Oct 12 16:52 so.elf
The ELF file format does not have absolute rules on content. The consumer of the file can have rules as to what you have to put where, whether items with specific names have to be there, and so on. It is a somewhat open file format: a container, like a cardboard box, that you can fill to some extent how you like. You cannot fit a cruise ship in it, but you can put books or toys in it, and sometimes you can choose how you arrange them.
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 010000 000004 00 AX 0 0 4
[ 2] .data PROGBITS 10000000 020000 000004 00 WA 0 0 1
[ 3] .ARM.attributes ARM_ATTRIBUTES 00000000 020004 000012 00 0 0 1
[ 4] .shstrtab STRTAB 00000000 020016 000027 00 0 0 1
Even after stripping there is still extra stuff in there. If you study the file format, there is a relatively small header that gives the number of program headers and the number of section headers, followed by that many program headers and that many section headers. Depending on the consumer(s) of the file you may, for example, only need the main header and two program headers in this case, which would make for a much smaller file (as you can see with the object version of the file).
arm-none-eabi-as so.s -o so.o
ls -al so.o
-rw-rw-r-- 1 oldtimer oldtimer 560 Oct 12 16:57 so.o
arm-none-eabi-strip so.o
ls -al so.o
-rw-rw-r-- 1 oldtimer oldtimer 364 Oct 12 16:57 so.o
readelf that
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 6
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000004 00 AX 0 0 4
[ 2] .data PROGBITS 00000000 000038 000004 00 WA 0 0 1
[ 3] .bss NOBITS 00000000 00003c 000000 00 WA 0 0 1
[ 4] .ARM.attributes ARM_ATTRIBUTES 00000000 00003c 000012 00 0 0 1
[ 5] .shstrtab STRTAB 00000000 00004e 00002c 00 0 0 1
These are extra section headers we do not need, which can maybe be removed in the linker script. But I assume that for some consumers all you would need is the two program headers:
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Plus the 8 bytes and any padding for this file format.
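As a rough check of these numbers, the arithmetic can be written out in Python. The header sizes come from the readelf output above; the 4-byte alignment of the section header table is an assumption, and the "minimal" file is a hypothetical layout, not something the tools produced here.

```python
ELF32_HEADER = 52     # "Size of this header" from readelf
PROGRAM_HEADER = 32   # "Size of program headers"
SECTION_HEADER = 40   # "Size of section headers"


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


# Hypothetical minimal file: ELF header, two program headers, 8 payload bytes.
minimal = ELF32_HEADER + 2 * PROGRAM_HEADER + 8
print(minimal)  # 124

# Stripped so.o: contents end after .shstrtab (offset 0x4e, size 0x2c),
# then six section headers follow, assumed to start on a 4-byte boundary.
stripped = align_up(0x4E + 0x2C, 4) + 6 * SECTION_HEADER
print(stripped)  # 364
```

The second figure matches the 364 bytes that ls reports for the stripped object, and 240 of those bytes are section headers.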
Also note
arm-none-eabi-objcopy --only-section=.text -O binary so.elf text.bin
arm-none-eabi-objcopy --only-section=.data -O binary so.elf data.bin
ls -al text.bin
-rwxrwxr-x 1 oldtimer oldtimer 4 Oct 12 17:03 text.bin
ls -al data.bin
-rwxrwxr-x 1 oldtimer oldtimer 4 Oct 12 17:03 data.bin
hexdump -C text.bin
00000000 fe ff ff ea |....|
00000004
hexdump -C data.bin
00000000 78 56 34 12 |xV4.|
00000004

Related

Conversion from .elf to .bin increases file size

When we convert a .elf file generated by the arm-gcc toolchain to a .bin file, its size increases from 40kB to 1.1GB.
For conversion we are using :
./arm-none-eabi-objcopy -O binary test.elf test.bin
It might be because of non-contiguous memory map and the gaps between the memory regions are just being filled with zeros.
What options can be used in objcopy? Or is there any other method to convert?
Following is the elf information:
Tag_CPU_name: "Cortex-M7" Tag_CPU_arch: v7E-M
Tag_CPU_arch_profile: Microcontroller Tag_THUMB_ISA_use: Thumb-2
Tag_FP_arch: FPv5/FP-D16 for ARMv8 Tag_ABI_PCS_wchar_t: 4
Tag_ABI_FP_denormal: Needed Tag_ABI_FP_exceptions: Needed
Tag_ABI_FP_number_model: IEEE 754 Tag_ABI_align_needed: 8-byte
Tag_ABI_enum_size: small Tag_ABI_VFP_args: VFP registers
Tag_ABI_optimization_goals: Aggressive Debug
Tag_CPU_unaligned_access: v6
The listing of sections contained in the ELF file is - there are 25 section headers, starting at offset 0x3e982c:
Section Headers:
[Nr] Name
Type Addr Off Size ES Lk Inf Al
Flags
[ 0]
NULL 00000000 000000 000000 00 0 0 0
[00000000]:
[ 1] .flash_config
PROGBITS 60000000 020000 000200 00 0 0 4
[00000002]: ALLOC
[ 2] .ivt
PROGBITS 60001000 021000 000030 00 0 0 4
[00000002]: ALLOC
[ 3] .interrupts
PROGBITS 60002000 022000 000400 00 0 0 4
[00000002]: ALLOC
[ 4] .text
PROGBITS 60002400 022400 312008 00 0 0 16
[00000006]: ALLOC, EXEC
[ 5] .ARM
ARM_EXIDX 60314408 334408 000008 00 4 0 4
[00000082]: ALLOC, LINK ORDER
[ 6] .init_array
INIT_ARRAY 60314410 334410 000004 04 0 0 4
[00000003]: WRITE, ALLOC
[ 7] .fini_array
FINI_ARRAY 60314414 334414 000004 04 0 0 4
[00000003]: WRITE, ALLOC
[ 8] .interrupts_ram
PROGBITS 20200000 380000 000000 00 0 0 1
[00000001]: WRITE
[ 9] .data
PROGBITS 20200000 340000 014bd0 00 0 0 8
[00000007]: WRITE, ALLOC, EXEC
[10] .ncache.init
PROGBITS 20214bd0 354bd0 011520 00 0 0 4
[00000003]: WRITE, ALLOC
[11] .ncache
NOBITS 20226100 366100 0021d8 00 0 0 64
[00000003]: WRITE, ALLOC
[12] .bss
NOBITS 20229000 369000 077ce8 00 0 0 4096
[00000003]: WRITE, ALLOC
[13] .NVM_TABLE
PROGBITS 20000000 010000 00000c 00 0 0 4
[00000003]: WRITE, ALLOC
[14] .heap
NOBITS 2000000c 01000c 000404 00 0 0 1
[00000003]: WRITE, ALLOC
[15] .stack
NOBITS 20000410 01000c 000400 00 0 0 1
[00000003]: WRITE, ALLOC
[16] .NVM
PROGBITS 60570000 370000 010000 00 0 0 1
[00000003]: WRITE, ALLOC
[17] .ARM.attributes
ARM_ATTRIBUTES 00000000 380000 00002e 00 0 0 1
[00000000]:
[18] .comment
PROGBITS 00000000 38002e 00004c 01 0 0 1
[00000030]: MERGE, STRINGS
[19] .debug_frame
PROGBITS 00000000 38007c 001174 00 0 0 4
[00000000]:
[20] .stab
PROGBITS 00000000 3811f0 0000cc 0c 21 0 4
[00000000]:
[21] .stabstr
STRTAB 00000000 3812bc 0001b9 00 0 0 1
[00000000]:
[22] .symtab
SYMTAB 00000000 381478 046620 10 23 13540 4
[00000000]:
[23] .strtab
STRTAB 00000000 3c7a98 021cb2 00 0 0 1
[00000000]:
[24] .shstrtab
STRTAB 00000000 3e974a 0000df 00 0 0 1
[00000000]:
so.s
.thumb
nop
.data
.word 0x11223344
so.ld
MEMORY
{
one : ORIGIN = 0x00000000, LENGTH = 0x1000
two : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > one
.data : { *(.data*) } > two
}
build
arm-none-eabi-as so.s -o so.o
arm-none-eabi-ld -T so.ld so.o -o so.elf
arm-none-eabi-objdump -D so.elf
arm-none-eabi-objcopy -O binary so.elf so.bin
536870916 Apr 28 15:23 so.bin
131556 Apr 28 15:23 so.elf
from readelf
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x010000 0x00000000 0x00000000 0x00002 0x00002 R E 0x10000
LOAD 0x020000 0x20000000 0x20000000 0x00004 0x00004 RW 0x10000
now so.ld
MEMORY
{
one : ORIGIN = 0x00000000, LENGTH = 0x1000
two : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > one
.bss : { *(.bss*) } > two AT > one
.data : { *(.data*) } > two AT > one
}
It is actually the .bss line that is doing the magic here; why is some other research project. I could have started with a .c file but tried asm...
6 Apr 28 15:30 so.bin
131556 Apr 28 15:29 so.elf
Now it is the possibly desired 6 bytes without padding. Of course, you then have to add labels in the linker script and use them in the bootstrap code to copy .data to RAM, zero .bss and so on.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x010000 0x00000000 0x00000000 0x00002 0x00002 R E 0x10000
LOAD 0x020000 0x20000000 0x00000002 0x00004 0x00004 RW 0x10000
Notice how the physical address is now in the 0x00000000 range: the data was tacked onto the end of the space used by .text. The virtual address is still 0x20000000, which is where it wants to live, needs to live. Do not think MMU here or anything like that, just think of the two address spaces: where it sits in flash and where it is used.
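To make the effect on the file size concrete, here is a small Python sketch that reads the LOAD lines as printed by readelf and computes the span covered by the physical addresses. It assumes the flat image is laid out by PhysAddr, which is what the 536870916 and 6 byte results above suggest; the function name image_span is made up.

```python
import re

# Offset, VirtAddr, PhysAddr, FileSiz columns of a readelf LOAD line
LOAD_LINE = re.compile(
    r"LOAD\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+)",
    re.IGNORECASE,
)


def image_span(readelf_text):
    """Bytes from the lowest to the highest PhysAddr that has file contents."""
    segments = []
    for match in LOAD_LINE.finditer(readelf_text):
        _offset, _vaddr, paddr, filesz = (int(g, 16) for g in match.groups())
        if filesz:
            segments.append((paddr, filesz))
    if not segments:
        return 0
    start = min(paddr for paddr, _ in segments)
    end = max(paddr + size for paddr, size in segments)
    return end - start


before = """
LOAD 0x010000 0x00000000 0x00000000 0x00002 0x00002 R E 0x10000
LOAD 0x020000 0x20000000 0x20000000 0x00004 0x00004 RW 0x10000
"""
after = """
LOAD 0x010000 0x00000000 0x00000000 0x00002 0x00002 R E 0x10000
LOAD 0x020000 0x20000000 0x00000002 0x00004 0x00004 RW 0x10000
"""
print(image_span(before))  # 536870916
print(image_span(after))   # 6
```

Only the PhysAddr column differs between the two inputs, which is exactly what AT > one changes.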
In case this is not clear:
MEMORY
{
one : ORIGIN = 0xE0000000, LENGTH = 0x1000
two : ORIGIN = 0xE0000100, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > one
.data : { *(.data*) } > two
}
260 Apr 28 15:46 so.bin
66276 Apr 28 15:46 so.elf
objcopy starts the binary file at the lowest defined (loadable) address and not zero...The file size is the difference, inclusive, of the lowest addressed byte and the highest.
I had the same problem, and I eventually found that it was not objcopy's fault: changing my build commands fixed it. At first I used gcc alone for both compiling and linking and hit the same problem. Then I split the build into two steps, gcc -c followed by ld, and the file came out much smaller. So the problem may be in how gcc drives the link, not in objcopy. You can try the same; my build commands are now:
gcc-4.8 -g -e boot_start -fno-builtin -Ttext 0x7C00 -nostdlib -m32 -c bootloader.S -o bootasm.o
gcc-4.8 -Os -fno-builtin -nostdlib -m32 -c bootloader.c -o bootc.o
ld -m elf_i386 -e boot_start -nostdlib -N bootasm.o bootc.o -o bootloader.o
objcopy -S -O binary bootloader.o bootloader.bin
Hope this helps.

Can't understand the 'Off' value of a section in readelf output? Is it offset from 'Address'?

Here is the output of readelf -a test.elf
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 0000000040000000 010000 00007c 00 AX 0 0 8
[ 2] .rodata PROGBITS 0000000040000080 010080 000016 00 A 0 0 8
[ 3] .debug_info PROGBITS 0000000000000000 010096 0000af 00 0 0 1
[ 4] .debug_abbrev PROGBITS 0000000000000000 010145 000086 00 0 0 1
[ 5] .debug_aranges PROGBITS 0000000000000000 0101cb 000030 00 0 0 1
The .text section starts at 0x40000000. With a debugger, I can see that the PC value starts at 0x40000000, and the code there is startup.s, as intended. But I'm not sure why the 'Off' value for that section is 0x10000. What does this 'Off' value mean? Aren't Address and Size enough for a section?
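The Off field is the location of that section within the ELF file, not in memory. Here the .text section starts at file offset 0x10000 and is 0x7c bytes long, the next section, .rodata, starts at file offset 0x10080, and so on. Address says where the section lives at run time; Off says where its bytes are stored in the file, so both are needed.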
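A short Python sketch of how a tool would use Off and Size. The buffer below is a stand-in filled with zeros and a made-up marker, not the real test.elf, and section_bytes is a hypothetical helper name.

```python
def section_bytes(elf_image, offset, size):
    """Return the stored bytes of a section from the raw file contents."""
    return elf_image[offset:offset + size]


# Stand-in for the file: zeros, with a marker where .text is stored.
fake_file = bytearray(0x10100)
fake_file[0x10000:0x10004] = b"\xde\xad\xbe\xef"

text = section_bytes(bytes(fake_file), 0x10000, 0x7C)
print(len(text), text[:4].hex())  # 124 deadbeef

# .rodata follows .text in the file, rounded up to its alignment of 8.
rodata_off = (0x10000 + 0x7C + 7) & ~7
print(hex(rodata_off))  # 0x10080
```

Nothing in this lookup involves 0x40000000: the address only matters once the bytes are loaded into memory.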

RISC-V: Size of code size in an object file which is not linked

I have a .i file which I have compiled but not linked using the SiFive risc-v compiler as follows:
../riscv64-unknown-elf-gcc-8.2.0-2019.05.3-x86_64-linux-ubuntu14/bin/riscv64-unknown-elf-gcc clock.i
However when I do a readelf -S on the compiled object file the .text section is 0 bytes:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000000 00 AX 0 0 2
[ 2] .data PROGBITS 00000000 000034 000000 00 WA 0 0 1
[ 3] .bss NOBITS 00000000 000034 000000 00 WA 0 0 1
[ 4] .text.getTime PROGBITS 00000000 000034 00003a 00 AX 0 0 2
[ 5] .rela.text.ge RELA 00000000 000484 0000c0 0c I 18 4 4
[ 6] .text.timeGap PROGBITS 00000000 00006e 00002e 00 AX 0 0 2
[ 7] .rela.text.ti RELA 00000000 000544 000024 0c I 18 6 4
[ 8] .text.applyTO PROGBITS 00000000 00009c 000038 00 AX 0 0 2
[ 9] .rela.text.ap RELA 00000000 000568 0000c0 0c I 18 8 4
[10] .text.clearTO PROGBITS 00000000 0000d4 000038 00 AX 0 0 2
[11] .rela.text.cl RELA 00000000 000628 0000c0 0c I 18 10 4
[12] .rodata.getTi PROGBITS 00000000 00010c 00000c 01 AMS 0 0 4
[13] .rodata.__func__. PROGBITS 00000000 000118 00000c 00 A 0 0 4
[14] .rodata.__func__. PROGBITS 00000000 000124 00000c 00 A 0 0 4
[15] .rodata.__func__. PROGBITS 00000000 000130 00000c 00 A 0 0 4
[16] .comment PROGBITS 00000000 00013c 000029 01 MS 0 0 1
[17] .riscv.attributes LOPROC+0x3 00000000 000165 00001f 00 0 0 1
[18] .symtab SYMTAB 00000000 000184 000240 10 19 31 4
[19] .strtab STRTAB 00000000 0003c4 0000be 00 0 0 1
[20] .shstrtab STRTAB 00000000 0006e8 000100 00 0 0 1
If I do a size on the compiled object file I get a size of 264 bytes:
text data bss dec hex filename
264 0 0 264 108 clock.o
If I do an nm --print-size I get the following:
U __assert_func
00000000 0000000c r __func__.3507
00000000 0000000c r __func__.3518
00000000 0000000c r __func__.3522
0000000c t .L11
00000036 t .L12
00000034 t .L14
00000036 t .L18
00000034 t .L2
00000034 t .L20
00000032 t .L3
0000001e t .L8
00000000 r .LANCHOR0
00000000 r .LANCHOR1
00000000 r .LANCHOR2
00000000 r .LC0
00000004 r .LC1
00000000 00000038 T applyTO
00000000 00000038 T clearTO
00000000 0000003a T getTime
00000000 0000002e T timeGap
So to me the code size should be 0x38 + 0x38 + 0x3A + 0x2E = 0xD8 (216) bytes.
How can I calculate the size of a compiled object file?
Your object file is compiled with -ffunction-sections, a flag which places each function in its own section. You can see an individual .text.* section for each function in readelf's output:
[ 4] .text.getTime PROGBITS 00000000 000034 00003a 00 AX 0 0 2
...
[ 6] .text.timeGap PROGBITS 00000000 00006e 00002e 00 AX 0 0 2
...
To get full code size you'll need to sum sizes for each section starting with ".text". A Perl one-liner:
$ readelf ... | perl -ne 'if(/ \.text/) { s/^.*\] *//; $sz += hex((split(/ +/))[4]); print "$sz\n"; }'
0
58
104
160
216 <-- Full size
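The same sum can be sketched in Python, using the relevant rows of the readelf output from the question as input. The helper name total_size is made up. The .rodata total is included because it appears to account for the difference from the 264 reported by size, assuming size counts read-only data in its text column (as the default Berkeley format of GNU size does).

```python
import re

READELF_S = """\
[ 1] .text PROGBITS 00000000 000034 000000 00 AX 0 0 2
[ 4] .text.getTime PROGBITS 00000000 000034 00003a 00 AX 0 0 2
[ 6] .text.timeGap PROGBITS 00000000 00006e 00002e 00 AX 0 0 2
[ 8] .text.applyTO PROGBITS 00000000 00009c 000038 00 AX 0 0 2
[10] .text.clearTO PROGBITS 00000000 0000d4 000038 00 AX 0 0 2
[12] .rodata.getTi PROGBITS 00000000 00010c 00000c 01 AMS 0 0 4
[13] .rodata.__func__. PROGBITS 00000000 000118 00000c 00 A 0 0 4
[14] .rodata.__func__. PROGBITS 00000000 000124 00000c 00 A 0 0 4
[15] .rodata.__func__. PROGBITS 00000000 000130 00000c 00 A 0 0 4
"""

# Captures the Name and Size columns; Type, Addr and Off are skipped.
ROW = re.compile(r"\]\s+(\S+)\s+\S+\s+[0-9a-f]+\s+[0-9a-f]+\s+([0-9a-f]+)")


def total_size(readelf_text, prefix):
    """Sum the Size column of every section whose name starts with prefix."""
    total = 0
    for name, size in ROW.findall(readelf_text):
        if name.startswith(prefix):
            total += int(size, 16)
    return total


code = total_size(READELF_S, ".text")
rodata = total_size(READELF_S, ".rodata")
print(code, rodata, code + rodata)  # 216 48 264
```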

Meaning of a Common String In Executables?

There appear to be some similar-looking long alphanumeric strings that commonly occur in Mach-O 64 bit executables and ELF 64-bit LSB executables among other symbols that are not alphanumeric:
cat /bin/bash | grep -c "AWAVAUATSH"
has 181 results, and
cat /usr/bin/gzip | grep -c "AWAVAUATSH"
has 9 results.
What are these strings?
Interesting question. Since I didn't know the answer, here are the steps I took to figure it out:
Where in the file does the string occur?
strings -otx /bin/gzip | grep AWAVAUATUSH
35e0 AWAVAUATUSH
69a0 AWAVAUATUSH
7920 AWAVAUATUSH
8900 AWAVAUATUSH
92a0 AWAVAUATUSH
Which section is that in?
readelf -WS /bin/gzip
There are 28 section headers, starting at offset 0x16860:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 0000000000400238 000238 00001c 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 000254 000020 00 A 0 0 4
[ 3] .note.gnu.build-id NOTE 0000000000400274 000274 000024 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000400298 000298 000038 00 A 5 0 8
[ 5] .dynsym DYNSYM 00000000004002d0 0002d0 000870 18 A 6 1 8
[ 6] .dynstr STRTAB 0000000000400b40 000b40 000360 00 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000400ea0 000ea0 0000b4 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000400f58 000f58 000080 00 A 6 1 8
[ 9] .rela.dyn RELA 0000000000400fd8 000fd8 000090 18 A 5 0 8
[10] .rela.plt RELA 0000000000401068 001068 0007e0 18 A 5 12 8
[11] .init PROGBITS 0000000000401848 001848 00001a 00 AX 0 0 4
[12] .plt PROGBITS 0000000000401870 001870 000550 10 AX 0 0 16
[13] .text PROGBITS 0000000000401dc0 001dc0 00f1ba 00 AX 0 0 16
[14] .fini PROGBITS 0000000000410f7c 010f7c 000009 00 AX 0 0 4
... etc.
From the above output, we see that all instances of AWAVAUATUSH are in the .text section (which covers offsets [0x1dc0, 0x10f7a) of the file).
Since this is .text, we expect to find executable instructions there. The address we are interested in is 0x401dc0 (.text address) + 0x35e0 (offset of AWAVAUATUSH in the file) - 0x1dc0 (offset of .text in the file) == 0x4035e0.
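The conversion is simple enough to write as a helper. This is an illustrative Python sketch with a made-up function name; the section values are copied from the readelf output above.

```python
def file_offset_to_vaddr(file_offset, section_addr, section_offset, section_size):
    """Map an offset in the file to the run-time address of the same byte."""
    if not section_offset <= file_offset < section_offset + section_size:
        raise ValueError("offset is outside the section")
    return section_addr + (file_offset - section_offset)


# .text of this gzip: Address 0x401dc0, Off 0x1dc0, Size 0xf1ba
print(hex(file_offset_to_vaddr(0x35E0, 0x401DC0, 0x1DC0, 0xF1BA)))  # 0x4035e0
```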
First, let's check that the above arithmetic is correct:
gdb -q /bin/gzip
(gdb) x/s 0x4035e0
0x4035e0: "AWAVAUATUSH\203\354HdH\213\004%("
Yes, it is. Next, what are the instructions there?
(gdb) x/20i 0x4035e0
0x4035e0: push %r15
0x4035e2: push %r14
0x4035e4: push %r13
0x4035e6: push %r12
0x4035e8: push %rbp
0x4035e9: push %rbx
0x4035ea: sub $0x48,%rsp
0x4035ee: mov %fs:0x28,%rax
0x4035f7: mov %rax,0x38(%rsp)
0x4035fc: xor %eax,%eax
0x4035fe: mov 0x213363(%rip),%rax # 0x616968
0x403605: mov %rdi,(%rsp)
0x403609: mov %rax,0x212cf0(%rip) # 0x616300
0x403610: cmpb $0x7a,(%rax)
0x403613: je 0x403730
0x403619: mov $0x616300,%ebx
0x40361e: mov (%rsp),%rdi
0x403622: callq 0x4019f0 <strlen@plt>
0x403627: cmp $0x20,%eax
0x40362a: mov %rax,0x8(%rsp)
These indeed look like normal executable instructions. What is the opcode of push %r15? The x86-64 opcode table shows that 0x41, 0x57 is indeed push %r15, and these bytes just happen to spell AW in ASCII. Similarly, push %r14 is encoded as 0x41, 0x56, which just happens to spell AV. And so on.
P.S. My version of gzip is fully stripped, which is why GDB shows no symbols in the above disassembly. If I use a non-stripped version instead, I see:
strings -o -tx gzip | grep AWAVAUATUSH | head -1
6be0 AWAVAUATUSH
readelf -WS gzip | grep text
[13] .text PROGBITS 0000000000401b00 001b00 00d102 00 AX 0 0 16
So the string is still in .text.
gdb -q ./gzip
(gdb) p/a 0x0000000000401b00 + 0x6be0 - 0x001b00
$1 = 0x406be0 <inflate_dynamic>
(gdb) disas/r 0x406be0
Dump of assembler code for function inflate_dynamic:
0x0000000000406be0 <+0>: 41 57 push %r15
0x0000000000406be2 <+2>: 41 56 push %r14
0x0000000000406be4 <+4>: 41 55 push %r13
0x0000000000406be6 <+6>: 41 54 push %r12
0x0000000000406be8 <+8>: 55 push %rbp
0x0000000000406be9 <+9>: 53 push %rbx
0x0000000000406bea <+10>: 48 81 ec 38 05 00 00 sub $0x538,%rsp
...
Now you can clearly see the ASCII 0x4157415641554154... sequence of opcodes.
P.P.S. The original question asks about AWAVAUATSH, which does appear in my Mach-O bash and gzip, but not in Linux ones. Conversely, AWAVAUATUSH does not appear in my Mach-O binaries.
The answer is however the same. The AWAVAUATSH sequence is the same as AWAVAUATUSH, but with push %rbp omitted.
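This is easy to confirm by decoding the opcode bytes as ASCII. A minimal Python check, with the bytes taken from the disas/r output above:

```python
# push %r15, %r14, %r13, %r12, %rbp, %rbx, then the leading 0x48 byte of sub
prologue = bytes.fromhex("41 57 41 56 41 55 41 54 55 53 48")
print(prologue.decode("ascii"))  # AWAVAUATUSH

# The same prologue without push %rbp (0x55)
no_rbp = bytes.fromhex("41 57 41 56 41 55 41 54 53 48")
print(no_rbp.decode("ascii"))  # AWAVAUATSH
```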
P.P.P.S Here are some other "fun" strings of the same nature:
strings /bin/bash | grep '^A.A.A.' | sort | uniq -c | sort -nr | head
44 AWAVAUATUSH
27 AVAUATUSH
16 AWAVAUA
15 AVAUATUH
14 AWAVAUI
14 AWAVAUATUH
12 AWAVAUATI
8 AWAVAUE1
8 AVAUATI
6 AWAVAUATU

Force GCC to keep section when using link time optimization

I have a C struct that is compiled by GCC into a special section and placed at the beginning of an output binary via a linker script. It contains file metadata, including a magic value at the beginning.
Here's a simplified example using just a string as the struct.
const char __magic_value[9] __attribute__((section(".the_header"))) = "MAGICVAL";
GCC places this value into its own section, as we can see with readelf -S:
There are 10 section headers, starting at offset 0x190:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000000 00 AX 0 0 2
[ 2] .data PROGBITS 00000000 000034 000000 00 WA 0 0 1
[ 3] .bss NOBITS 00000000 000034 000000 00 WA 0 0 1
[ 4] .the_header PROGBITS 00000000 000034 000009 00 A 0 0 1
[ 5] .comment PROGBITS 00000000 00003d 00001e 01 MS 0 0 1
[ 6] .ARM.attributes ARM_ATTRIBUTES 00000000 00005b 000033 00 0 0 1
[ 7] .shstrtab STRTAB 00000000 00008e 000051 00 0 0 1
[ 8] .symtab SYMTAB 00000000 0000e0 000090 10 9 8 4
[ 9] .strtab STRTAB 00000000 000170 00001e 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
(Note section #4.)
Now, if I specify -flto to the GCC invocation, this section is no longer emitted!
There are 19 section headers, starting at offset 0x730:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000000 00 AX 0 0 2
[ 2] .data PROGBITS 00000000 000034 000000 00 WA 0 0 1
[ 3] .bss NOBITS 00000000 000034 000000 00 WA 0 0 1
[ 4] .gnu.lto_.profile PROGBITS 00000000 000034 00000f 00 E 0 0 1
[ 5] .gnu.lto_.icf.93e PROGBITS 00000000 000043 000017 00 E 0 0 1
[ 6] .gnu.lto_.jmpfunc PROGBITS 00000000 00005a 00000f 00 E 0 0 1
[ 7] .gnu.lto_.inline. PROGBITS 00000000 000069 00000f 00 E 0 0 1
[ 8] .gnu.lto_.purecon PROGBITS 00000000 000078 00000f 00 E 0 0 1
[ 9] .gnu.lto_.symbol_ PROGBITS 00000000 000087 000022 00 E 0 0 1
[10] .gnu.lto_.refs.93 PROGBITS 00000000 0000a9 00000f 00 E 0 0 1
[11] .gnu.lto_.decls.9 PROGBITS 00000000 0000b8 000239 00 E 0 0 1
[12] .gnu.lto_.symtab. PROGBITS 00000000 0002f1 00001d 00 E 0 0 1
[13] .gnu.lto_.opts PROGBITS 00000000 00030e 0000e8 00 E 0 0 1
[14] .comment PROGBITS 00000000 0003f6 00001e 01 MS 0 0 1
[15] .ARM.attributes ARM_ATTRIBUTES 00000000 000414 000033 00 0 0 1
[16] .shstrtab STRTAB 00000000 000447 00018c 00 0 0 1
[17] .symtab SYMTAB 00000000 0005d4 000130 10 18 17 4
[18] .strtab STRTAB 00000000 000704 00002c 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
My linker script does not find the .the_header section, resulting in a corrupt binary.
I have tried specifying __attribute__((used)) for the header variable, with no effect.
Here's the relevant part of the linker script:
ENTRY(main)
MEMORY
{
APP (rwx) : ORIGIN = 0, LENGTH = 65536
}
SECTIONS
{
.header :
{
KEEP(*(.the_header))
} > APP
...
}
How do I tell GCC to emit .the_header into the final binary when using link time optimization and the above linker script snippet?
