How do I get full assembler output in gcc?

I know I can get the assembler source code generated by the compiler by using:
gcc -S ...
even though that annoyingly doesn't give me an object file as part of the process.
But how can I get everything about the compiled code? I mean addresses, the bytes generated and so forth.
The instructions output by gcc -S do not tell me anything about instruction lengths or encodings, which is what I want to see.

I like objdump for this, but the most useful options are non-obvious - especially if you're using it on an object file which contains relocations, rather than a final binary.
objdump -d some_binary does a reasonable job.
objdump -d some_object.o is less useful because calls to external functions don't get disassembled helpfully:
...
00000005 <foo>:
5: 55 push %ebp
6: 89 e5 mov %esp,%ebp
8: 53 push %ebx
...
29: c7 04 24 00 00 00 00 movl $0x0,(%esp)
30: e8 fc ff ff ff call 31 <foo+0x2c>
35: 89 d8 mov %ebx,%eax
...
The call is actually to printf()... adding the -r flag helps with that; it marks relocations. objdump -dr some_object.o gives:
...
29: c7 04 24 00 00 00 00 movl $0x0,(%esp)
2c: R_386_32 .rodata.str1.1
30: e8 fc ff ff ff call 31 <foo+0x2c>
31: R_386_PC32 printf
...
Then, I find it useful to see each line annotated as <symbol+offset>. objdump has a handy option for that, but it has the annoying side effect of turning off the dump of the actual bytes - objdump --prefix-addresses -dr some_object.o gives:
...
00000005 <foo> push %ebp
00000006 <foo+0x1> mov %esp,%ebp
00000008 <foo+0x3> push %ebx
...
But it turns out that you can undo that by providing another obscure option, finally arriving at my favourite objdump incantation:
objdump --prefix-addresses --show-raw-insn -dr file.o
which gives output like this:
...
00000005 <foo> 55 push %ebp
00000006 <foo+0x1> 89 e5 mov %esp,%ebp
00000008 <foo+0x3> 53 push %ebx
...
00000029 <foo+0x24> c7 04 24 00 00 00 00 movl $0x0,(%esp)
2c: R_386_32 .rodata.str1.1
00000030 <foo+0x2b> e8 fc ff ff ff call 00000031 <foo+0x2c>
31: R_386_PC32 printf
00000035 <foo+0x30> 89 d8 mov %ebx,%eax
...
And if you've built with debugging symbols (i.e. compiled with -g), and you replace the -dr with -Srl, it will attempt to annotate the output with the corresponding source lines.
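To see why the unlinked call at offset 0x30 above disassembles as call 31 <foo+0x2c>, it can help to decode the operand by hand. The following is a minimal Python sketch (Python and the helper name call_target are my own choices for illustration, not anything objdump provides): the e8 opcode is followed by a signed 32-bit little-endian displacement, counted from the address of the next instruction.

```python
import struct


def call_target(insn_addr: int, raw: bytes) -> int:
    """Decode an x86 near call (opcode e8 + rel32) and return its target.

    call_target is a hypothetical helper, used here for illustration only.
    """
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("expected a 5-byte e8 rel32 call")
    (disp,) = struct.unpack("<i", raw[1:5])  # signed little-endian displacement
    next_insn = insn_addr + len(raw)         # rel32 is relative to the next instruction
    return next_insn + disp


# Bytes copied from the listing above: 30: e8 fc ff ff ff
target = call_target(0x30, bytes.fromhex("e8fcffffff"))
print(hex(target))  # prints 0x31
```

The placeholder displacement is -4, so the apparent target is simply the address of the operand field itself (0x31), which is exactly where the R_386_PC32 relocation for printf is recorded.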

The easiest way to get a quick listing is to use the -a option to the assembler, which you can do by putting -Wa,-a on the gcc command line. You can use various modifiers to the a option to affect exactly what comes out -- see the as(1) man page.

It sounds to me like you want a disassembler. objdump is pretty much the standard (otool on Mac OS X); in concert with whatever map file information your linker gives you, the disassembly of your object file should give you everything you want.

gcc -S will produce an assembly language source file. You can then use as -a yourfile.s to produce a listing that includes offsets and encoded bytes for each instruction. -a also has some sub-options to control what shows up in the listing file (as --help will give a list of them along with the other available options).

nasm -f elf xx.asm -l x.lst
gcc xx.c xx.o -o xx
This generates a listing file, x.lst, which covers only xx.asm.
To inspect xx.c together with xx.asm, compile them both and then use gdb (the GNU debugger).

Related

How do I implement an FFI from Rust in assembler?

My Rust code needs to make winapi FFI calls, and I see winapi-rs is very popular. What I need now is to see the actual instructions behind these FFI calls. The binary object files are available on github (for example GLU32).
Just as an example, it contains a 663-byte object file, dsjfbs00001.o, which I'd like to disassemble to see the instructions. I've tried without giving an offset (which means it starts at 0):
objdump -b binary -Mintel,x86-64 -m i386 -D dsjfbs00001.o
This line comes from the similar question Disassembling A Flat Binary File Using objdump and I get this output (I show just the first 16 lines, it goes on for 247 lines):
dsjfbs00001.o: file format binary
Disassembly of section .data:
00000000 <.data>:
0: 64 86 07 xchg BYTE PTR fs:[rdi],al
3: 00 00 add BYTE PTR [rax],al
5: 00 00 add BYTE PTR [rax],al
7: 00 84 01 00 00 0a 00 add BYTE PTR [rcx+rax*1+0xa0000],al
e: 00 00 add BYTE PTR [rax],al
10: 00 00 add BYTE PTR [rax],al
12: 04 00 add al,0x0
14: 2e 74 65 cs je 0x7c
17: 78 74 js 0x8d
...
I have some knowledge of assembler, but here I'm at a loss. The executable code obviously doesn't start at 0, so I wonder how I can discover the correct offset.
The output shows that this is the .data section. But how does it tell this? Is this a guess? A hexdump returns exactly the same bytes with no header bytes (i.e. such as an ELF file would have):
0000000 8664 0007 0000 0000 0184 0000 000a 0000
0000010 0000 0004 742e 7865 0074 0000 0000 0000
Endianness aside, it starts with 0x64, 0x86, 0x07, as seen above for the xchg opcode. So how can it tell it's a .data section? And then... where's the .text section I'm interested in? It never says there's one.
From all of this I deduce that without an actual offset it's impossible to tell where the entry point is. Actually, the initial ~600 bytes contain many zeroes, while the last ~60 bytes have the typical entropy you'd expect from executable code. But I don't know how to determine this offset exactly by searching in the winapi-rs repo (the *.def files look useless to me, they just list the available routine names).
And as an additional question, would it be feasible to create those files on my own? Can't I just take/write some assembly code, produce an object file with NASM or similar, and use that for FFIs from my Rust code? Is this even possible?
Where would I start doing something like this, if I don't even have C/C++ WinAPI header files or Visual Studio?
BTW: I really need just some ~10 functions of GLU32, not the whole winapi.
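The first bytes of the hexdump above look consistent with a COFF object file header rather than raw code, which would explain why forcing -b binary produces nonsense from offset 0. The following is a sketch only, assuming the standard 20-byte COFF file header layout from the PE/COFF specification (the field names come from that specification, the variable names are mine):

```python
import struct

# The 32 bytes shown in the hexdump above, rewritten in file order
# (hexdump's default output prints 16-bit little-endian words).
data = bytes.fromhex(
    "64860700" "00000000" "84010000" "0a000000"
    "00000400" "2e746578" "74000000" "00000000"
)

# COFF file header: Machine, NumberOfSections, TimeDateStamp,
# PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
(machine, n_sections, timestamp,
 symtab_offset, n_symbols, opt_header_size, characteristics) = struct.unpack(
    "<HHIIIHH", data[:20]
)

# The first section header follows the file header; its first 8 bytes
# are the NUL-padded section name.
first_section_name = data[20:28].rstrip(b"\x00")

print(hex(machine), n_sections, hex(symtab_offset), n_symbols)
print(first_section_name)
```

If that reading is right, the Machine field 0x8664 marks an x86-64 COFF object with 7 sections, the first of them named .text. Dropping -b binary, so that objdump detects the format itself, should then list the sections properly, provided the installed objdump was built with PE/COFF support.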

What is the format of the content in the .text generated by the linker?

During the build process, the linker maps our code into the .text section of the code memory region. I would like to know what that content actually is: is it the code as source text, or as assembly?
Thanks a lot!

does it mean the actual code in text or in assembly?
Neither: it's actual code in machine instructions.
For example:
$ cat > t.c
int foo() { return 42; }
$ gcc -c t.c
$ objdump -d t.o
t.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 2a 00 00 00 mov $0x2a,%eax
9: 5d pop %rbp
a: c3 retq
The contents of the .text section are the following 11 bytes:
554889e5b82a0000005dc3
Update:
is it correct to say that it contains the assembly code converted into machine-readable binary format?
You could say that, but it is not very clear.
Perhaps "contains machine readable binary instructions, produced by compiling and assembling the program source, and applying relocations". (That last part is what the linker does, not demonstrated in the example above.)
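As a small illustration of "machine instructions, not text", the 11 bytes quoted above can be split along the instruction boundaries that objdump printed. This is only a sketch in Python (the variable names are mine), and the boundaries are copied from the listing rather than decoded:

```python
text = bytes.fromhex("554889e5b82a0000005dc3")

# Instruction boundaries copied from the objdump listing above:
# (offset, length in bytes, mnemonic)
layout = [
    (0x0, 1, "push %rbp"),
    (0x1, 3, "mov %rsp,%rbp"),
    (0x4, 5, "mov $0x2a,%eax"),
    (0x9, 1, "pop %rbp"),
    (0xA, 1, "retq"),
]

for offset, length, mnemonic in layout:
    raw = text[offset:offset + length].hex(" ")
    print(f"{offset:2x}: {raw:<15} {mnemonic}")

# The immediate operand of `mov $0x2a,%eax` is the 4 bytes after the
# b8 opcode, stored little-endian.
immediate = int.from_bytes(text[5:9], "little")
print(len(text), immediate)  # prints 11 42
```

The value 42 from the C source survives only as the bytes 2a 00 00 00; nothing resembling the source or assembly text is stored in the section.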

Calls to Addresses in the Middle of Routines

I am tracing wireshark-2.6.10 using Pin. At several points during the initialization, I can see some calls, such as this:
00000000004e9400 <__libc_csu_init##Base>:
...
4e9449: 41 ff 14 dc callq *(%r12,%rbx,8)
...
The target of this call is 0x197db0, shown here:
0000000000197cb0 <_start##Base>:
...
197db0: 55 push %rbp
197db1: 48 89 e5 mov %rsp,%rbp
197db4: 5d pop %rbp
197db5: e9 66 ff ff ff jmpq 197d20 <_start##Base+0x70>
197dba: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
...
Pin says that this is in the middle of the containing routine, i.e., _start##Base. But, when I reach this target using gdb, I see the following output:
>│0x5555556ebdb0 <frame_dummy> push %rbp
│0x5555556ebdb1 <frame_dummy+1> mov %rsp,%rbp
│0x5555556ebdb4 <frame_dummy+4> pop %rbp
│0x5555556ebdb5 <frame_dummy+5> jmpq 0x5555556ebd20 <register_tm_clones>
│0x5555556ebdba <frame_dummy+10> nopw 0x0(%rax,%rax,1)
│0x5555556ebdc0 <main_window_update()> xor %edi,%edi
Note that if I subtract the bias value, the runtime target address will be consistent with the compile time value (i.e., 0x5555556ebdb0 - 0x555555554000 = 0x197db0). It seems that there exists a pseudo-routine called frame_dummy inside _start##Base. How is that possible? How can I extract the addresses for these pseudo-routines, beforehand (i.e., before execution)?
UPDATE:
These types of calls to the middle of functions were not present in GIMP and Anjuta (which are written almost purely in C and built from source), but they are present in Inkscape and Wireshark (written in C++ and installed from packages, although I do not think that the language is the cause).
At first, it seemed that this situation occurs only during initialization, before main() is called. But in wireshark-2.6.10 it also occurs in at least one place after main() starts. Here, we have wireshark-qt.cpp, lines 522-524 (which are part of main()).
/* Get the compile-time version information string */
comp_info_str = get_compiled_version_info(get_wireshark_qt_compiled_info,
get_gui_compiled_info);
This is a call to get_compiled_version_info(). In assembly, the function is called at address 0x5555556e74c2 (0x1934c2 without bias), as shown below:
>│0x5555556e74c2 <main(int, char**)+178> callq 0x5555556f5870 <get_compiled_version_info>
│0x5555556e74c7 <main(int, char**)+183> lea 0x4972(%rip),%rdi # 0x5555556ebe40 <get_wireshark_runtime_info(_GString*)>
│0x5555556e74ce <main(int, char**)+190> mov %rax,%r13
Again, the target is in the middle of another function, _ZN7QStringD1Ev##Base:
00000000001980f0 <_ZN7QStringD1Ev##Base>:
...
1a1870: 41 54 push %r12
...
This is the output of gdb (0x5555556f5870 - 0x555555554000 = 0x1a1870):
>│0x5555556f5870 <get_compiled_version_info> push %r12
│0x5555556f5872 <get_compiled_version_info+2> mov %rdi,%r12
│0x5555556f5875 <get_compiled_version_info+5> push %rbp
│0x5555556f5876 <get_compiled_version_info+6> lea 0x349445(%rip),%rdi # 0x555555a3ecc2
As can be seen, the debugger recognizes that this address is the start address of get_compiled_version_info(). This is because it has access to debug_info. In all cases that I found, the symbols for these pseudo-routines were removed from the original binary (because .symtab was removed from the binary). But the strange thing is that it is located inside _ZN7QStringD1Ev##Base. Therefore, Pin considers get_compiled_version_info() to be inside _ZN7QStringD1Ev##Base.
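The bias arithmetic used throughout this question can be checked mechanically. A minimal Python sketch, where the helper name to_file_address is hypothetical and the bias is the value quoted above:

```python
LOAD_BIAS = 0x555555554000  # bias quoted in the question


def to_file_address(runtime_addr: int, bias: int = LOAD_BIAS) -> int:
    """Map a runtime address seen in gdb back to the address in the objdump listing."""
    return runtime_addr - bias


# Runtime addresses taken from the gdb output above.
for runtime in (0x5555556EBDB0, 0x5555556E74C2, 0x5555556F5870):
    print(hex(runtime), "->", hex(to_file_address(runtime)))
```

The three results are 0x197db0, 0x1934c2 and 0x1a1870, matching the addresses in the static disassembly.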

How is that possible?
The frame_dummy is a bona-fide C function. If Pin thinks it's in the middle of _start, it's probably because:
_start is an assembly function, and
its .st_size is set incorrectly in the symbol table.
You can confirm this by looking at readelf -Ws a.out | grep -E ' (_start|frame_dummy)'.
You are probably using a binary linked against a fairly old GLIBC.
GLIBC used to generate C runtime startup files (whence _start comes) by using gcc -S to create assembly from C source, then splitting and editing the assembly with sed. Getting the .size directive wrong was one problem with that approach, and it is no longer used on x86_64 as of 2012 (commit).
How can I extract the addresses for these pseudo-routines, beforehand (i.e., before execution)?
Pin doesn't magically create these pseudo-routines; they must be visible in the readelf -Ws output of the original binary.
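To illustrate how a wrong .st_size can make an address appear to sit in the middle of another routine, here is a rough Python sketch of a range-based symbol lookup. This is not Pin's actual algorithm, only an assumption about how a symbol-table-driven tool might attribute addresses; the start addresses come from the question, while every size below is hypothetical.

```python
from typing import List, Optional, Tuple

Symbol = Tuple[str, int, int]  # (name, st_value, st_size)


def containing_symbol(addr: int, symbols: List[Symbol]) -> Optional[str]:
    """Return the first symbol whose [st_value, st_value + st_size) range covers addr."""
    for name, value, size in symbols:
        if value <= addr < value + size:
            return name
    return None


# Hypothetical table: _start carries an oversized st_size and frame_dummy is absent.
bad_size = [("_start", 0x197CB0, 0x200)]
# Hypothetical table: plausible sizes, with frame_dummy present.
sane = [("_start", 0x197CB0, 0x2B), ("frame_dummy", 0x197DB0, 0xA)]

print(containing_symbol(0x197DB0, bad_size))  # prints _start
print(containing_symbol(0x197DB0, sane))      # prints frame_dummy
```

With the oversized entry, 0x197db0 is reported as part of _start even though a separate function begins there.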

GCC compile and link raw output

I am trying to get the raw instruction code output for a simple C program with function calls.
I have already searched on here and Google for the answer but can only find answers that are correct for single functions (no function calls).
A trivial example:
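/* Declare addition() before its first use; calling an undeclared
   function is invalid in C99 and later. */
int addition(int a, int b);
int main(){
    return addition(5, 7);
}
int addition(int a, int b){
    return a + b;
}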
When I use gcc -c test.c -o test.o and then objdump -d test.o on this, the JAL (jump and link) instruction shows a jump to address 0x00000000, which is obviously incorrect. However, when I compile the program fully, I get an enormous amount of junk in the objdump output.
I am compiling with the mips compiler (and associated mips-objdump, etc).
The programs I am compiling are self contained (no external libraries, or system include files). What I want is a dump of the instructions where the JAL and equivalent instructions point to the correct addresses for the functions they call.

While your code is not linked, the addresses may or may not be resolved (absolute addresses certainly won't be). If they are not resolved, you should see relocation entries if you use objdump -dr. If you link your program, those issues should be gone, and if the program is really standalone (i.e. not even the C library) then all you see should be your code. You might want to use the -nostdlib switch to gcc. I don't have mips gcc available, but here is the x86 version for illustration:
080480d8 <addition>:
80480d8: 8b 44 24 08 mov 0x8(%esp),%eax
80480dc: 03 44 24 04 add 0x4(%esp),%eax
80480e0: c3 ret
080480e1 <main>:
80480e1: 83 ec 08 sub $0x8,%esp
80480e4: c7 44 24 04 07 00 00 movl $0x7,0x4(%esp)
80480eb: 00
80480ec: c7 04 24 05 00 00 00 movl $0x5,(%esp)
80480f3: e8 e0 ff ff ff call 80480d8 <addition>
80480f8: 83 c4 08 add $0x8,%esp
80480fb: c3 ret
That is all the code in the binary, and as you can see at 80480f3 the call is resolved. I expect it works similarly for mips.
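For the x86 case, the resolved operand can be reproduced from the i386 relocation formula for R_386_PC32, which is S + A - P. The following Python sketch assumes that the call site carried such a relocation with the usual -4 addend before linking; that assumption and the helper name r_386_pc32 are mine, not something shown in the listing.

```python
import struct


def r_386_pc32(symbol_addr: int, addend: int, place: int) -> bytes:
    """Compute the 4 bytes stored for an R_386_PC32 relocation: S + A - P."""
    value = (symbol_addr + addend - place) & 0xFFFFFFFF  # wrap to 32 bits
    return struct.pack("<I", value)


S = 0x80480D8      # address of <addition> in the linked listing above
A = -4             # assumed addend: the usual placeholder fc ff ff ff
P = 0x80480F3 + 1  # the rel32 field starts one byte after the e8 opcode

field = r_386_pc32(S, A, P)
print(field.hex(" "))  # prints e0 ff ff ff
```

The result matches the operand bytes of the call at 80480f3 in the listing.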

How to link assembly functions without a linker?

I'm using NASM for some projects on Windows. I'd like to call C's printf function, but I don't want GCC with all its bureaucracy, which turns my small project (the assembly is actually 30 lines) into 24000 lines of disassembly. How can I do the linking (getting a function from the system definitions, like MessageBox, to be callable) without a linker?
Edit:
I made it using a disassembler. It's funny to see that almost everything is add and nop.
0000000000402b90 <MessageBoxA>:
402b90: ff 25 2a 58 00 00 jmpq *0x582a(%rip) # 4083c0 <__imp_MessageBoxA>
402b96: 90 nop
402b97: 90 nop
402b98: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
402b9f: 00
and __imp_MessageBoxA:
00000000004083c0 <__imp_MessageBoxA>:
4083c0: 14 87 adc $0x87,%al
4083c2: 00 00 add %al,(%rax)
4083c4: 00 00 add %al,(%rax)
What does it actually do?
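The thunk at 402b90 does not contain the function body; it jumps through a pointer stored at __imp_MessageBoxA. A Python sketch of the address arithmetic for that RIP-relative jump (the helper name rip_relative_slot is hypothetical):

```python
import struct


def rip_relative_slot(insn_addr: int, raw: bytes) -> int:
    """Return the memory address read by `jmp *disp32(%rip)` (bytes ff 25 + disp32)."""
    if len(raw) != 6 or raw[:2] != b"\xff\x25":
        raise ValueError("expected a 6-byte ff 25 disp32 jump")
    (disp,) = struct.unpack("<i", raw[2:6])  # signed little-endian displacement
    return insn_addr + len(raw) + disp       # %rip holds the next instruction's address


# Bytes copied from the listing above: 402b90: ff 25 2a 58 00 00
slot = rip_relative_slot(0x402B90, bytes.fromhex("ff252a580000"))
print(hex(slot))  # prints 0x4083c0
```

So the bytes at 4083c0 are most likely a data slot rather than code: assuming the usual PE import table mechanism, the Windows loader fills it with the real address of MessageBoxA at load time, which is why disassembling it as adc/add gives meaningless instructions.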

You can do the C library linking manually. Use the following command on your object file produced with NASM:
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 objectfile -lc
The command above was tested on 64-bit Linux, so to use it on Windows you must change /lib64/ld-linux-x86-64.so.2 to the path of your dynamic linker library, which is usually found under C:/cygwin/lib/.. if you are using Cygwin, or at C:/MingW/msys/1.0/lib64/ld-linux-x86-64.so.2 if you are using MinGW.
