There appear to be some similar-looking long alphanumeric strings that commonly occur in Mach-O 64 bit executables and ELF 64-bit LSB executables among other symbols that are not alphanumeric:
cat /bin/bash | grep -c "AWAVAUATSH"
has 181 results, and
cat /usr/bin/gzip | grep -c "AWAVAUATSH"
has 9 results.
What are these strings?
Interesting question. Since I didn't know the answer, here are the steps I took to figure it out:
Where in the file does the string occur?
strings -otx /bin/gzip | grep AWAVAUATUSH
35e0 AWAVAUATUSH
69a0 AWAVAUATUSH
7920 AWAVAUATUSH
8900 AWAVAUATUSH
92a0 AWAVAUATUSH
Which section is that in?
readelf -WS /bin/gzip
There are 28 section headers, starting at offset 0x16860:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 0000000000400238 000238 00001c 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 000254 000020 00 A 0 0 4
[ 3] .note.gnu.build-id NOTE 0000000000400274 000274 000024 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000400298 000298 000038 00 A 5 0 8
[ 5] .dynsym DYNSYM 00000000004002d0 0002d0 000870 18 A 6 1 8
[ 6] .dynstr STRTAB 0000000000400b40 000b40 000360 00 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000400ea0 000ea0 0000b4 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000400f58 000f58 000080 00 A 6 1 8
[ 9] .rela.dyn RELA 0000000000400fd8 000fd8 000090 18 A 5 0 8
[10] .rela.plt RELA 0000000000401068 001068 0007e0 18 A 5 12 8
[11] .init PROGBITS 0000000000401848 001848 00001a 00 AX 0 0 4
[12] .plt PROGBITS 0000000000401870 001870 000550 10 AX 0 0 16
[13] .text PROGBITS 0000000000401dc0 001dc0 00f1ba 00 AX 0 0 16
[14] .fini PROGBITS 0000000000410f7c 010f7c 000009 00 AX 0 0 4
... etc.
From the above output, we see that all instances of AWAVAUATUSH are in the .text section (which covers file offsets [0x1dc0, 0x10f7a)).
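The same range check can be sketched in a few lines of Python. The offsets and sizes are copied from the readelf output above; the function name `section_for_offset` is my own, not part of any tool.

```python
# (name, file offset, size) rows copied from the readelf -WS output above.
SECTIONS = [
    (".init", 0x1848, 0x1A),
    (".plt", 0x1870, 0x550),
    (".text", 0x1DC0, 0xF1BA),
    (".fini", 0x10F7C, 0x9),
]


def section_for_offset(file_offset, sections=SECTIONS):
    """Return the name of the section whose file range holds file_offset."""
    for name, start, size in sections:
        if start <= file_offset < start + size:
            return name
    return None


# Offsets reported by strings for AWAVAUATUSH.
hits = [0x35E0, 0x69A0, 0x7920, 0x8900, 0x92A0]
print({hex(h): section_for_offset(h) for h in hits})
```

Every hit should map to .text, which matches the manual reading of the table.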
Since this is .text, we expect to find executable instructions there. The address we are interested in is 0x401dc0 (.text address) + 0x35e0 (offset of AWAVAUATUSH in the file) - 0x1dc0 (offset of .text in the file) == 0x4035e0.
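As a quick sketch of that arithmetic (the helper name `file_offset_to_vaddr` is hypothetical; the numbers come from the readelf and strings output above):

```python
def file_offset_to_vaddr(file_offset, section_addr, section_offset):
    """Map a file offset inside a section to its run-time address."""
    return section_addr + (file_offset - section_offset)


addr = file_offset_to_vaddr(0x35E0, section_addr=0x401DC0, section_offset=0x1DC0)
print(hex(addr))
```

This only holds for offsets that actually fall inside the section in question.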
First, let's check that the above arithmetic is correct:
gdb -q /bin/gzip
(gdb) x/s 0x4035e0
0x4035e0: "AWAVAUATUSH\203\354HdH\213\004%("
Yes, it is. Next, what are the instructions there?
(gdb) x/20i 0x4035e0
0x4035e0: push %r15
0x4035e2: push %r14
0x4035e4: push %r13
0x4035e6: push %r12
0x4035e8: push %rbp
0x4035e9: push %rbx
0x4035ea: sub $0x48,%rsp
0x4035ee: mov %fs:0x28,%rax
0x4035f7: mov %rax,0x38(%rsp)
0x4035fc: xor %eax,%eax
0x4035fe: mov 0x213363(%rip),%rax # 0x616968
0x403605: mov %rdi,(%rsp)
0x403609: mov %rax,0x212cf0(%rip) # 0x616300
0x403610: cmpb $0x7a,(%rax)
0x403613: je 0x403730
0x403619: mov $0x616300,%ebx
0x40361e: mov (%rsp),%rdi
0x403622: callq 0x4019f0 <strlen@plt>
0x403627: cmp $0x20,%eax
0x40362a: mov %rax,0x8(%rsp)
These indeed look like normal executable instructions. What is the opcode of push %r15? The x86-64 opcode table shows that 0x41, 0x57 is indeed push %r15 (0x41 is a REX prefix, 0x57 the push opcode), and these bytes just happen to spell AW in ASCII. Similarly, push %r14 is encoded as 0x41, 0x56, which just happens to spell AV. Etc.
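To see the coincidence without a disassembler, here is a tiny Python check that decodes those four bytes as ASCII. Nothing here is specific to gzip; it is only the byte values quoted above.

```python
# Machine code for: push %r15; push %r14
code = bytes([0x41, 0x57, 0x41, 0x56])

# Interpreting the same bytes as ASCII text gives the "string".
text = code.decode("ascii")
print(text)
```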
P.S. My version of gzip is fully stripped, which is why GDB shows no symbols in the above disassembly. If I use a non-stripped version instead, I see:
strings -o -tx gzip | grep AWAVAUATUSH | head -1
6be0 AWAVAUATUSH
readelf -WS gzip | grep text
[13] .text PROGBITS 0000000000401b00 001b00 00d102 00 AX 0 0 16
So the string is still in .text.
gdb -q ./gzip
(gdb) p/a 0x0000000000401b00 + 0x6be0 - 0x001b00
$1 = 0x406be0 <inflate_dynamic>
(gdb) disas/r 0x406be0
Dump of assembler code for function inflate_dynamic:
0x0000000000406be0 <+0>: 41 57 push %r15
0x0000000000406be2 <+2>: 41 56 push %r14
0x0000000000406be4 <+4>: 41 55 push %r13
0x0000000000406be6 <+6>: 41 54 push %r12
0x0000000000406be8 <+8>: 55 push %rbp
0x0000000000406be9 <+9>: 53 push %rbx
0x0000000000406bea <+10>: 48 81 ec 38 05 00 00 sub $0x538,%rsp
...
Now you can clearly see the ASCII 0x4157415641554154... sequence of opcodes.
P.P.S. The original question asks about AWAVAUATSH, which does appear in my Mach-O bash and gzip, but not in Linux ones. Conversely, AWAVAUATUSH does not appear in my Mach-O binaries.
The answer is however the same. The AWAVAUATSH sequence is the same as AWAVAUATUSH, but with push %rbp omitted.
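A rough sketch of why dropping push %rbp removes exactly the U. It assumes the standard one-byte push encoding (0x50 plus the register number, with a 0x41 REX prefix for r8 to r15) and assumes the instruction after the pushes starts with the 0x48 REX.W prefix, as the sub does in the disassembly above. The helper names are mine.

```python
# x86-64 register numbers for the legacy registers, then r8..r15.
REG_NUM = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
           "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}
REG_NUM.update({f"r{n}": n for n in range(8, 16)})


def encode_push(reg):
    """Encode `push %reg` for a 64-bit register."""
    n = REG_NUM[reg]
    if n >= 8:
        # r8..r15 need a REX.B prefix (0x41) in front of the opcode.
        return bytes([0x41, 0x50 + (n - 8)])
    return bytes([0x50 + n])


def prologue_string(regs):
    """ASCII view of the pushes, followed by an assumed 0x48 REX.W prefix."""
    code = b"".join(encode_push(r) for r in regs) + b"\x48"
    return code.decode("ascii")


with_rbp = prologue_string(["r15", "r14", "r13", "r12", "rbp", "rbx"])
without_rbp = prologue_string(["r15", "r14", "r13", "r12", "rbx"])
print(with_rbp, without_rbp)
```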
P.P.P.S Here are some other "fun" strings of the same nature:
strings /bin/bash | grep '^A.A.A.' | sort | uniq -c | sort -nr | head
44 AWAVAUATUSH
27 AVAUATUSH
16 AWAVAUA
15 AVAUATUH
14 AWAVAUI
14 AWAVAUATUH
12 AWAVAUATI
8 AWAVAUE1
8 AVAUATI
6 AWAVAUATU
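If you want to play with this outside the shell, the pipeline can be approximated in Python. This is only a sketch: `printable_strings` is a simplified stand-in for strings (runs of at least 4 printable ASCII characters), and the sample bytes are made up for illustration rather than read from a real binary.

```python
import re
from collections import Counter


def printable_strings(data, min_len=4):
    """Rough equivalent of `strings`: runs of printable ASCII or tab."""
    pattern = re.compile(
        rb"[\t\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}"
    )
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def count_matching(data, pattern=r"^A.A.A."):
    """Rough equivalent of `grep PATTERN | sort | uniq -c`."""
    rx = re.compile(pattern)
    return Counter(s for s in printable_strings(data) if rx.search(s))


# Made-up sample: three fake prologues plus one ordinary string.
sample = (b"\x00AWAVAUATUSH\x83\xec"
          b"\x00\x01AVAUATUSH\x81"
          b"\xffAWAVAUATUSH\x83"
          b"hello world\x00")
counts = count_matching(sample)
for text, n in counts.most_common():
    print(n, text)
```

To run it on a real file, pass the result of `open(path, "rb").read()` as `data`.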
I have an ELF file which we then convert to a binary format:
arm-none-eabi-objcopy -O binary MyElfFile.elf MyBinFile.bin
The ELF file is just under 300KB, but the binary output file is 446 times larger: 134000KB, or 130MB! How is this possible when the whole point of a binary is to remove symbols and section tables and debug info?
Looking at Reddit and SO, it looks like the binary image should be smaller than the ELF, not larger.
so.s
b .
.section .data
.word 0x12345678
arm-none-eabi-as so.s -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <.text>:
0: eafffffe b 0 <.text>
Disassembly of section .data:
00000000 <.data>:
0: 12345678 eorsne r5, r4, #120, 12 ; 0x78
arm-none-eabi-readelf -a so.o
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000004 00 AX 0 0 4
[ 2] .data PROGBITS 00000000 000038 000004 00 WA 0 0 1
[ 3] .bss NOBITS 00000000 00003c 000000 00 WA 0 0 1
[ 4] .ARM.attributes ARM_ATTRIBUTES 00000000 00003c 000012 00 0 0 1
[ 5] .symtab SYMTAB 00000000 000050 000060 10 6 6 4
[ 6] .strtab STRTAB 00000000 0000b0 000004 00 0 0 1
[ 7] .shstrtab STRTAB 00000000 0000b4 00003c 00 0 0 1
So my "binary" has 8 bytes total, in two sections.
-rw-rw-r-- 1 oldtimer oldtimer 560 Oct 12 16:32 so.o
8 bytes relative to 560 for the object.
Link it.
MEMORY
{
one : ORIGIN = 0x00001000, LENGTH = 0x1000
two : ORIGIN = 0x00002000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > one
.data : { *(.data*) } > two
}
arm-none-eabi-ld -T so.ld so.o -o so.elf
arm-none-eabi-objdump -D so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00001000 <.text>:
1000: eafffffe b 1000 <.text>
Disassembly of section .data:
00002000 <.data>:
2000: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
arm-none-eabi-readelf -a so.elf
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00001000 001000 000004 00 AX 0 0 4
[ 2] .data PROGBITS 00002000 002000 000004 00 WA 0 0 1
[ 3] .ARM.attributes ARM_ATTRIBUTES 00000000 002004 000012 00 0 0 1
[ 4] .symtab SYMTAB 00000000 002018 000070 10 5 7 4
[ 5] .strtab STRTAB 00000000 002088 00000c 00 0 0 1
[ 6] .shstrtab STRTAB 00000000 002094 000037 00 0 0 1
Now we need 4 bytes at 0x1000 and 4 bytes at 0x2000. With objcopy -O binary, the output covers the entire memory span: the file starts with the lowest-addressed byte and ends with the highest-addressed one. With this link the lowest is 0x1000 and the highest is 0x2003, a total span of 0x1004 bytes:
arm-none-eabi-objcopy -O binary so.elf so.bin
ls -al so.bin
-rwxrwxr-x 1 oldtimer oldtimer 4100 Oct 12 16:40 so.bin
4100 = 0x1004 bytes
hexdump -C so.bin
00000000 fe ff ff ea 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000 78 56 34 12 |xV4.|
00001004
The assumption here is that the user knows the base address is 0x1000, as there is no address info in the file format, and that this is a contiguous memory image so that the last four bytes also land at 0x2000. So -O binary pads the file to fill everything in.
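Here is a small Python sketch of that padding behaviour. It is not how objcopy is implemented; `flatten` is a hypothetical helper that lays sections out by address and zero-fills the gaps, using the two 4-byte sections from the hexdump above.

```python
def flatten(sections, fill=b"\x00"):
    """Build a flat image from (load_address, data) pairs.

    Returns (base_address, image). The base address is NOT stored in
    the image; the user has to know it, as with -O binary.
    """
    base = min(addr for addr, _ in sections)
    end = max(addr + len(data) for addr, data in sections)
    image = bytearray(fill * (end - base))
    for addr, data in sections:
        start = addr - base
        image[start:start + len(data)] = data
    return base, bytes(image)


base, image = flatten([
    (0x1000, bytes.fromhex("feffffea")),  # .text: b .
    (0x2000, bytes.fromhex("78563412")),  # .data: .word 0x12345678
])
print(hex(base), len(image))
```

The resulting image has the same length and layout as so.bin above: 4 bytes of code, zeros, then the 4 data bytes at offset 0x1000.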
If I change to this
MEMORY
{
one : ORIGIN = 0x00000000, LENGTH = 0x1000
two : ORIGIN = 0x10000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > one
.data : { *(.data*) } > two
}
You can easily see where this is headed.
ls -al so.bin
-rwxrwxr-x 1 oldtimer oldtimer 268435460 Oct 12 16:43 so.bin
So my elf does not change size, but the -O binary output is 0x10000004 bytes. There are only 8 bytes I care about, but by its nature objcopy -O binary has to pad the gap in the middle.
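The size can be predicted without building anything. A minimal sketch, assuming each section is described by its load address and size (the function name is made up):

```python
def flat_binary_size(sections):
    """Size of a flat image: lowest to highest byte, inclusive."""
    lowest = min(addr for addr, size in sections)
    highest = max(addr + size - 1 for addr, size in sections)
    return highest - lowest + 1


# The two linker scripts used above: 4 bytes of .text, 4 bytes of .data.
first = flat_binary_size([(0x00001000, 4), (0x00002000, 4)])
second = flat_binary_size([(0x00000000, 4), (0x10000000, 4)])
print(first, second)
```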
Since the sizes and spaces of things vary specific to your project and your linker script, no generic statements can be made relative to the size of the elf file and the size of an -O binary file.
ls -al so.elf
-rwxrwxr-x 1 oldtimer oldtimer 131556 Oct 12 16:49 so.elf
arm-none-eabi-strip so.elf
ls -al so.elf
-rwxrwxr-x 1 oldtimer oldtimer 131336 Oct 12 16:50 so.elf
arm-none-eabi-as -g so.s -o so.o
ls -al so.o
-rw-rw-r-- 1 oldtimer oldtimer 1300 Oct 12 16:51 so.o
arm-none-eabi-ld -T so.ld so.o -o so.elf
ls -al so.elf
-rwxrwxr-x 1 oldtimer oldtimer 132088 Oct 12 16:51 so.elf
arm-none-eabi-strip so.elf
ls -al so.elf
-rwxrwxr-x 1 oldtimer oldtimer 131336 Oct 12 16:52 so.elf
The elf binary file format does not have absolute rules on content; the consumer of the file can have rules as to what you have to put where, whether any specific names of items have to be there, etc. It is a somewhat open file format. It is a container like a cardboard box, and you can fill it to some extent how you like. You cannot fit a cruise ship in it, but you can put books or toys in it, and sometimes you can choose how you arrange them.
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 010000 000004 00 AX 0 0 4
[ 2] .data PROGBITS 10000000 020000 000004 00 WA 0 0 1
[ 3] .ARM.attributes ARM_ATTRIBUTES 00000000 020004 000012 00 0 0 1
[ 4] .shstrtab STRTAB 00000000 020016 000027 00 0 0 1
Even after stripping there is still extra stuff there. If you study the file format, you have a relatively small header giving the number of program headers and the number of section headers, followed by that many program headers and that many section headers. Depending on the consumer(s) of the file you may, for example, only need the main header and two program headers in this case and that is it, which makes for a much smaller file (as you can see with the object version of the file).
arm-none-eabi-as so.s -o so.o
ls -al so.o
-rw-rw-r-- 1 oldtimer oldtimer 560 Oct 12 16:57 so.o
arm-none-eabi-strip so.o
ls -al so.o
-rw-rw-r-- 1 oldtimer oldtimer 364 Oct 12 16:57 so.o
readelf that
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 6
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000004 00 AX 0 0 4
[ 2] .data PROGBITS 00000000 000038 000004 00 WA 0 0 1
[ 3] .bss NOBITS 00000000 00003c 000000 00 WA 0 0 1
[ 4] .ARM.attributes ARM_ATTRIBUTES 00000000 00003c 000012 00 0 0 1
[ 5] .shstrtab STRTAB 00000000 00004e 00002c 00 0 0 1
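The header fields readelf prints can also be read directly with Python's struct module. This is a sketch for little-endian ELF32 only; the test header is synthetic (the e_shoff value in particular is made up), built to mirror the counts shown above.

```python
import struct

# Little-endian ELF32 file header: e_ident plus 13 fixed-size fields.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")


def parse_elf32_header(data):
    """Return the header/table sizes and counts from an ELF32 header."""
    (ident, _type, _machine, _version, _entry, _phoff, _shoff, _flags,
     ehsize, phentsize, phnum, shentsize, shnum, _shstrndx) = \
        ELF32_HEADER.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {"ehsize": ehsize, "phentsize": phentsize, "phnum": phnum,
            "shentsize": shentsize, "shnum": shnum}


# Synthetic header: 32-bit, little-endian, relocatable, ARM (machine 40).
ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
fake = ELF32_HEADER.pack(ident, 1, 40, 1, 0, 0, 0x7C, 0,
                         52, 0, 0, 40, 6, 5)
info = parse_elf32_header(fake)
print(info)
```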
Extra section headers we don't need which maybe can be removed in the linker script. But I assume for some consumers all you would need is the two program headers
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Plus the 8 bytes and any padding for this file format.
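As back-of-the-envelope arithmetic, here is that minimal size in Python. It assumes the ELF32 sizes shown above (52-byte file header, 32-byte program headers) and ignores any alignment padding, so treat the result as a lower bound rather than what a real linker would emit.

```python
EHDR_SIZE = 52  # ELF32 file header
PHDR_SIZE = 32  # one ELF32 program header


def minimal_elf32_size(num_program_headers, payload_bytes):
    """Lower bound: header + program headers + payload, no padding."""
    return EHDR_SIZE + num_program_headers * PHDR_SIZE + payload_bytes


size = minimal_elf32_size(num_program_headers=2, payload_bytes=8)
print(size)
```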
Also note
arm-none-eabi-objcopy --only-section=.text -O binary so.elf text.bin
arm-none-eabi-objcopy --only-section=.data -O binary so.elf data.bin
ls -al text.bin
-rwxrwxr-x 1 oldtimer oldtimer 4 Oct 12 17:03 text.bin
ls -al data.bin
-rwxrwxr-x 1 oldtimer oldtimer 4 Oct 12 17:03 data.bin
hexdump -C text.bin
00000000 fe ff ff ea |....|
00000004
hexdump -C data.bin
00000000 78 56 34 12 |xV4.|
00000004
When we convert a .elf file generated by the arm-gcc toolchain to a .bin file, its size increases from 40kB to 1.1GB.
For conversion we are using :
./arm-none-eabi-objcopy -O binary test.elf test.bin
It might be because of non-contiguous memory map and the gaps between the memory regions are just being filled with zeros.
What options can be used in objcopy? Or is there any other method to convert?
Following is the elf information:
Tag_CPU_name: "Cortex-M7" Tag_CPU_arch: v7E-M
Tag_CPU_arch_profile: Microcontroller Tag_THUMB_ISA_use: Thumb-2
Tag_FP_arch: FPv5/FP-D16 for ARMv8 Tag_ABI_PCS_wchar_t: 4
Tag_ABI_FP_denormal: Needed Tag_ABI_FP_exceptions: Needed
Tag_ABI_FP_number_model: IEEE 754 Tag_ABI_align_needed: 8-byte
Tag_ABI_enum_size: small Tag_ABI_VFP_args: VFP registers
Tag_ABI_optimization_goals: Aggressive Debug
Tag_CPU_unaligned_access: v6
The listing of sections contained in the ELF file is - there are 25 section headers, starting at offset 0x3e982c:
Section Headers:
[Nr] Name
Type Addr Off Size ES Lk Inf Al
Flags
[ 0]
NULL 00000000 000000 000000 00 0 0 0
[00000000]:
[ 1] .flash_config
PROGBITS 60000000 020000 000200 00 0 0 4
[00000002]: ALLOC
[ 2] .ivt
PROGBITS 60001000 021000 000030 00 0 0 4
[00000002]: ALLOC
[ 3] .interrupts
PROGBITS 60002000 022000 000400 00 0 0 4
[00000002]: ALLOC
[ 4] .text
PROGBITS 60002400 022400 312008 00 0 0 16
[00000006]: ALLOC, EXEC
[ 5] .ARM
ARM_EXIDX 60314408 334408 000008 00 4 0 4
[00000082]: ALLOC, LINK ORDER
[ 6] .init_array
INIT_ARRAY 60314410 334410 000004 04 0 0 4
[00000003]: WRITE, ALLOC
[ 7] .fini_array
FINI_ARRAY 60314414 334414 000004 04 0 0 4
[00000003]: WRITE, ALLOC
[ 8] .interrupts_ram
PROGBITS 20200000 380000 000000 00 0 0 1
[00000001]: WRITE
[ 9] .data
PROGBITS 20200000 340000 014bd0 00 0 0 8
[00000007]: WRITE, ALLOC, EXEC
[10] .ncache.init
PROGBITS 20214bd0 354bd0 011520 00 0 0 4
[00000003]: WRITE, ALLOC
[11] .ncache
NOBITS 20226100 366100 0021d8 00 0 0 64
[00000003]: WRITE, ALLOC
[12] .bss
NOBITS 20229000 369000 077ce8 00 0 0 4096
[00000003]: WRITE, ALLOC
[13] .NVM_TABLE
PROGBITS 20000000 010000 00000c 00 0 0 4
[00000003]: WRITE, ALLOC
[14] .heap
NOBITS 2000000c 01000c 000404 00 0 0 1
[00000003]: WRITE, ALLOC
[15] .stack
NOBITS 20000410 01000c 000400 00 0 0 1
[00000003]: WRITE, ALLOC
[16] .NVM
PROGBITS 60570000 370000 010000 00 0 0 1
[00000003]: WRITE, ALLOC
[17] .ARM.attributes
ARM_ATTRIBUTES 00000000 380000 00002e 00 0 0 1
[00000000]:
[18] .comment
PROGBITS 00000000 38002e 00004c 01 0 0 1
[00000030]: MERGE, STRINGS
[19] .debug_frame
PROGBITS 00000000 38007c 001174 00 0 0 4
[00000000]:
[20] .stab
PROGBITS 00000000 3811f0 0000cc 0c 21 0 4
[00000000]:
[21] .stabstr
STRTAB 00000000 3812bc 0001b9 00 0 0 1
[00000000]:
[22] .symtab
SYMTAB 00000000 381478 046620 10 23 13540 4
[00000000]:
[23] .strtab
STRTAB 00000000 3c7a98 021cb2 00 0 0 1
[00000000]:
[24] .shstrtab
STRTAB 00000000 3e974a 0000df 00 0 0 1
[00000000]:
so.s
.thumb
nop
.data
.word 0x11223344
so.ld
MEMORY
{
one : ORIGIN = 0x00000000, LENGTH = 0x1000
two : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > one
.data : { *(.data*) } > two
}
build
arm-none-eabi-as so.s -o so.o
arm-none-eabi-ld -T so.ld so.o -o so.elf
arm-none-eabi-objdump -D so.elf
arm-none-eabi-objcopy -O binary so.elf so.bin
536870916 Apr 28 15:23 so.bin
131556 Apr 28 15:23 so.elf
from readelf
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x010000 0x00000000 0x00000000 0x00002 0x00002 R E 0x10000
LOAD 0x020000 0x20000000 0x20000000 0x00004 0x00004 RW 0x10000
now so.ld
MEMORY
{
one : ORIGIN = 0x00000000, LENGTH = 0x1000
two : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > one
.bss : { *(.bss*) } > two AT > one
.data : { *(.data*) } > two AT > one
}
It is actually .bss that is doing the magic here; that is some other research project. I could have started with a C file but tried asm...
6 Apr 28 15:30 so.bin
131556 Apr 28 15:29 so.elf
and now it is the possibly desired 6 bytes without padding, but of course you have to add labels in the linker script and use them in the bootstrap code to move .data to ram and zero .bss and such.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x010000 0x00000000 0x00000000 0x00002 0x00002 R E 0x10000
LOAD 0x020000 0x20000000 0x00000002 0x00004 0x00004 RW 0x10000
Notice that the physical (load) address is now in the 0x00000000 range: .data was tacked onto the end of the space used by .text. The virtual address (where it wants to live, needs to live, at run time) is still 0x20000000. Do not think MMU here or anything like that; just think of the two address spaces, the one in flash and the one where the data is used.
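To make the difference concrete, here is a small sketch that computes the span of the two LOAD segments above twice: once by virtual address and once by physical (load) address. It assumes the flat binary is laid out by load address, which is what the 6-byte result suggests; the dictionary keys are my own naming.

```python
def span(segments, key):
    """Bytes from the lowest to the highest address under `key`."""
    lowest = min(seg[key] for seg in segments)
    highest_end = max(seg[key] + seg["filesz"] for seg in segments)
    return highest_end - lowest


# The two LOAD program headers from the readelf output above.
segments = [
    {"vaddr": 0x00000000, "paddr": 0x00000000, "filesz": 2},
    {"vaddr": 0x20000000, "paddr": 0x00000002, "filesz": 4},
]

by_load_address = span(segments, "paddr")
by_virtual_address = span(segments, "vaddr")
print(by_load_address, by_virtual_address)
```

The load-address span matches the 6-byte so.bin, and the virtual-address span matches the earlier 536870916-byte one.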
In case this is not clear:
MEMORY
{
one : ORIGIN = 0xE0000000, LENGTH = 0x1000
two : ORIGIN = 0xE0000100, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > one
.data : { *(.data*) } > two
}
260 Apr 28 15:46 so.bin
66276 Apr 28 15:46 so.elf
objcopy starts the binary file at the lowest defined (loadable) address and not zero. The file size is the difference, inclusive, between the lowest-addressed byte and the highest.
I had the same problem as you, and I finally found that it was not objcopy's problem: I changed my compile commands and it worked. At first I used gcc alone and hit the same problem; then I split the build into two steps, gcc -c followed by ld, and the file came out much smaller. So maybe the problem is in gcc, not objcopy. You can try the same thing; my compile commands are now:
gcc-4.8 -g -e boot_start -fno-builtin -Ttext 0x7C00 -nostdlib -m32 -c bootloader.S -o bootasm.o
gcc-4.8 -Os -fno-builtin -nostdlib -m32 -c bootloader.c -o bootc.o
ld -m elf_i386 -e boot_start -nostdlib -N bootasm.o bootc.o -o bootloader.o
objcopy -S -O binary bootloader.o bootloader.bin
hope it can help you...
Here is the output of readelf -a test.elf
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 0000000040000000 010000 00007c 00 AX 0 0 8
[ 2] .rodata PROGBITS 0000000040000080 010080 000016 00 A 0 0 8
[ 3] .debug_info PROGBITS 0000000000000000 010096 0000af 00 0 0 1
[ 4] .debug_abbrev PROGBITS 0000000000000000 010145 000086 00 0 0 1
[ 5] .debug_aranges PROGBITS 0000000000000000 0101cb 000030 00 0 0 1
The .text section starts at 0x40000000. With debugger, I could see the PC value is starting from 0x40000000, and the code there is the startup.s which is meant to be there. But I'm not sure why the value 'Off' for that section is 0x10000. What does this 'Off' value mean? Isn't Address and Size enough for a section?
The Off field is the offset of that section within the ELF file itself, while Address is where the section lives in memory at run time. Here the .text section starts at file offset 0x10000 and is 0x7c bytes long, the next section, .rodata, starts at file offset 0x10080, etc.
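A short sketch of how a tool would use both fields: Off and Size locate the bytes in the file, and Address says where they belong in memory. It unpacks a little-endian ELF64 section header with struct; the file contents and the header are synthetic, built to mirror the .text row above, and `read_section` is a hypothetical helper.

```python
import struct

# Little-endian Elf64_Shdr: name, type, flags, addr, offset, size,
# link, info, addralign, entsize (64 bytes in total).
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")


def read_section(file_data, shdr_bytes):
    """Return (run-time address, section contents) for one header."""
    (_name, _type, _flags, addr, offset, size,
     _link, _info, _align, _entsize) = ELF64_SHDR.unpack(shdr_bytes)
    # Off and Size locate the bytes in the file; addr is not used here.
    return addr, file_data[offset:offset + size]


# Synthetic file: 0x10000 bytes of headers/padding, then 0x7c payload bytes.
payload = bytes(range(0x7C))
file_data = bytes(0x10000) + payload
# PROGBITS (1), flags AX (0x6), Address 0x40000000, Off 0x10000, Size 0x7c.
shdr = ELF64_SHDR.pack(0, 1, 0x6, 0x40000000, 0x10000, 0x7C, 0, 0, 8, 0)

addr, contents = read_section(file_data, shdr)
print(hex(addr), len(contents))
```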
I am used to getting nice listing files from C code where I can see lovely source code intertwined with opcodes and hex offsets for debugging as seen here: List File In C (.LST)
And the -S directive gets me the assembler code only from g++ for Ada.... but I can't seem to get it to give up the good stuff so I can debug a nasty elaboration crash.
Any thoughts on the GNAT compiler switches to send in?
Maybe this helps. The next command generates something similar to what you refer to:
$ gnatmake -g main.adb -cargs -Wa,-adhln > main.lst
The -cargs switch (a so-called mode switch) causes gnatmake to pass the subsequent arguments to the compiler. The compiler in turn passes the -adhln switches to the assembler via -Wa. But you might as well use objdump -d -S main.o to see the assembly/source code after the build.
main.adb
with Ada.Text_IO; use Ada.Text_IO;
procedure Main is
begin
Put_Line ("Hello, world!");
end Main;
output (main.lst)
1 .file "main.adb"
2 .text
3 .Ltext0:
4 .section .rodata
5 .LC1:
6 0000 48656C6C .ascii "Hello, world!"
6 6F2C2077
6 6F726C64
6 21
7 000d 000000 .align 8
8 .LC0:
9 0010 01000000 .long 1
10 0014 0D000000 .long 13
11 .text
12 .align 2
13 .globl _ada_main
15 _ada_main:
16 .LFB1:
17 .file 1 "main.adb"
1:main.adb **** with Ada.Text_IO; use Ada.Text_IO;
2:main.adb ****
3:main.adb **** procedure Main is
18 .loc 1 3 1
19 .cfi_startproc
20 0000 55 pushq %rbp
21 .cfi_def_cfa_offset 16
22 .cfi_offset 6, -16
23 0001 4889E5 movq %rsp, %rbp
24 .cfi_def_cfa_register 6
25 0004 53 pushq %rbx
26 0005 4883EC08 subq $8, %rsp
27 .cfi_offset 3, -24
28 .LBB2:
4:main.adb **** begin
5:main.adb **** Put_Line ("Hello, world!");
29 .loc 1 5 4
30 0009 B8000000 movl $.LC1, %eax
30 00
31 000e BA000000 movl $.LC0, %edx
31 00
32 0013 4889C1 movq %rax, %rcx
33 0016 4889D3 movq %rdx, %rbx
34 0019 4889D0 movq %rdx, %rax
35 001c 4889CF movq %rcx, %rdi
36 001f 4889C6 movq %rax, %rsi
37 0022 E8000000 call ada__text_io__put_line__2
37 00
38 .LBE2:
6:main.adb **** end Main;
39 .loc 1 6 5
40 0027 4883C408 addq $8, %rsp
41 002b 5B popq %rbx
42 002c 5D popq %rbp
43 .cfi_def_cfa 7, 8
44 002d C3 ret
45 .cfi_endproc
46 .LFE1:
48 .Letext0:
You might want to look at the section on debugging control in the top-secret GNAT documentation, especially the -gnatG switch.
I am learning how a C file is compiled to machine code. I know I can generate assembly from gcc with the -S flag, however it also produces a lot of code to do with main() and printf() that I am not interested in at the moment.
Is there a way to get gcc or clang to "compile" a function in isolation and output the assembly?
I.e. get the assembly for the following c in isolation:
int add( int a, int b ) {
return a + b;
}
There are two ways to do this for a specific object file:
The -ffunction-sections option to gcc instructs it to create a separate ELF section for each function in the sourcefile being compiled.
The symbol table contains section name, start address and size of a given function; that can be fed into objdump via the --start-address/--stop-address arguments.
The first example:
$ readelf -S t.o | grep ' .text.'
[ 1] .text PROGBITS 0000000000000000 00000040
[ 4] .text.foo PROGBITS 0000000000000000 00000040
[ 6] .text.bar PROGBITS 0000000000000000 00000060
[ 9] .text.foo2 PROGBITS 0000000000000000 000000c0
[11] .text.munch PROGBITS 0000000000000000 00000110
[14] .text.startup.mai PROGBITS 0000000000000000 00000180
This has been compiled with -ffunction-sections and there are four functions, foo(), bar(), foo2() and munch() in my object file. I can disassemble them separately like so:
$ objdump -w -d --section=.text.foo t.o
t.o: file format elf64-x86-64
Disassembly of section .text.foo:
0000000000000000 <foo>:
0: 48 83 ec 08 sub $0x8,%rsp
4: 8b 3d 00 00 00 00 mov 0(%rip),%edi # a <foo+0xa>
a: 31 f6 xor %esi,%esi
c: 31 c0 xor %eax,%eax
e: e8 00 00 00 00 callq 13 <foo+0x13>
13: 85 c0 test %eax,%eax
15: 75 01 jne 18 <foo+0x18>
17: 90 nop
18: 48 83 c4 08 add $0x8,%rsp
1c: c3 retq
The other option can be used like this (nm dumps symbol table entries):
$ nm -f sysv t.o | grep bar
bar |0000000000000020| T | FUNC|0000000000000026| |.text
$ objdump -w -d --start-address=0x20 --stop-address=0x46 t.o --section=.text
t.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000020 <bar>:
20: 48 83 ec 08 sub $0x8,%rsp
24: 8b 3d 00 00 00 00 mov 0(%rip),%edi # 2a <bar+0xa>
2a: 31 f6 xor %esi,%esi
2c: 31 c0 xor %eax,%eax
2e: e8 00 00 00 00 callq 33 <bar+0x13>
33: 85 c0 test %eax,%eax
35: 75 01 jne 38 <bar+0x18>
37: 90 nop
38: bf 3f 00 00 00 mov $0x3f,%edi
3d: 48 83 c4 08 add $0x8,%rsp
41: e9 00 00 00 00 jmpq 46 <bar+0x26>
In this case, the -ffunction-sections option hasn't been used, hence the start offset of the function isn't zero and it's not in its separate section (but in .text).
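The start and stop addresses can be derived from the nm line mechanically: stop is simply start plus size. A small sketch, assuming the `nm -f sysv` column layout shown above (the function name is made up):

```python
def function_range(nm_sysv_line):
    """Parse one `nm -f sysv` row into (name, start, stop)."""
    fields = [field.strip() for field in nm_sysv_line.split("|")]
    name, value, _class, _type, size = fields[:5]
    start = int(value, 16)
    return name, start, start + int(size, 16)


line = "bar |0000000000000020| T | FUNC|0000000000000026| |.text"
name, start, stop = function_range(line)
print(f"objdump -w -d --start-address={start:#x} "
      f"--stop-address={stop:#x} t.o --section=.text")
```

This prints the same objdump command line used above for bar.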
Beware though when disassembling object files ...
This isn't exactly what you want, because, for object files, the call targets (as well as addresses of global variables) aren't resolved - you can't see here that foo calls printf, because the resolution of that on binary level happens only at link time. The assembly source would have the call printf in there though. The information that this callq is actually to printf is in the object file, but separate from the code (it's in the so-called relocation section that lists locations in the object file to be 'patched' by the linker); the disassembler can't resolve this.
The best way to go would be to copy your function into a single temp.c file and compile it with the -S flag, like this: gcc -S temp.c -o temp.s
It should produce tighter assembly code with no other distractions (except for the header and footer).