Is there a way to format objdump output? - terminal

I'm examining some binary files with objdump (I'm on a Mac, but installed binutils).
Is there a way to align the columns so there is no column overflow? I've posted some example output below to illustrate the current state of things.
I don't want to send this to a text file and then auto-edit everything. Is there a way to adjust the current terminal formatting?
objdump -D first | grep -A10 main:
_main:
100000f30: 55 pushq %rbp
100000f31: 48 89 e5 movq %rsp, %rbp
100000f34: 48 83 ec 20 subq $32, %rsp
100000f38: c7 45 fc 00 00 00 00 movl $0, -4(%rbp)
100000f3f: 89 7d f8 movl %edi, -8(%rbp)
100000f42: 48 89 75 f0 movq %rsi, -16(%rbp)
100000f46: c7 45 ec 00 00 00 00 movl $0, -20(%rbp)
100000f4d: 83 7d ec 0a cmpl $10, -20(%rbp)
100000f51: 0f 8d 1f 00 00 00 jge 31 <_main+0x46>
100000f57: 48 8d 3d 40 00 00 00 leaq 64(%rip), %rdi

Use gobjdump to run the objdump from the binutils you installed on the Mac. Its output is automatically aligned into columns.
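If switching tools is not an option, the columns can also be padded by a small filter. The following is only a sketch in Python, not something from the answer above: it assumes each instruction line has the shape "address: byte byte ... mnemonic operands" as in the question's output, and it treats every leading two-digit hex token as an instruction byte. The names `split_line` and `align` are made up for this illustration.

```python
import re

HEX_BYTE = re.compile(r"^[0-9a-f]{2}$")
HEX_ADDR = re.compile(r"^[0-9a-f]+$")

def split_line(line):
    """Split 'addr: b0 b1 ... mnemonic operands' into (addr, bytes, insn).

    Returns None for lines that are not instruction lines, e.g. '_main:'.
    """
    tokens = line.split()
    if not tokens or not tokens[0].endswith(":"):
        return None
    addr = tokens[0][:-1]
    if not HEX_ADDR.match(addr):
        return None
    i = 1
    # Assumption: no mnemonic is itself a two-digit hex token.
    while i < len(tokens) and HEX_BYTE.match(tokens[i]):
        i += 1
    return addr, " ".join(tokens[1:i]), " ".join(tokens[i:])

def align(lines):
    """Pad the byte column so every mnemonic starts in the same column."""
    parsed = [split_line(line) for line in lines]
    pad = max((len(p[1]) for p in parsed if p), default=0)
    out = []
    for raw, p in zip(lines, parsed):
        if p is None:
            out.append(raw.rstrip("\n"))
        else:
            addr, raw_bytes, insn = p
            out.append(f"{addr}:  {raw_bytes.ljust(pad)}  {insn}".rstrip())
    return out

demo = [
    "_main:",
    "100000f30: 55 pushq %rbp",
    "100000f38: c7 45 fc 00 00 00 00 movl $0, -4(%rbp)",
]
print("\n".join(align(demo)))
```

To use it on live output, the lines read from the pipe (for example from `sys.stdin`) can be passed to `align` in place of `demo`.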

Related

The difference between mov and movl instruction in X86? and I meet some trouble when reading assembly [duplicate]

Recently, I read some books about computer science. I wrote some C code and disassembled it, using gcc and objdump.
The following C code:
#include <stdio.h>
#include <stdbool.h>
int dojob()
{
    static short num[ ][4] = { {2, 9, -1, 5}, {3, 8, 2, -6}};
    static short *pn[ ] = {num[0], num[1]};
    static short s[2] = {0, 0};
    int i, j;
    for (i=0; i<2; i++) {
        for (j=0; j<4; j++){
            s[i] += *pn[i]++;
        }
        printf ("sum of line %d: %d\n", i+1, s[i]);
    }
    return 0;
}
int main ( )
{
    dojob();
}
produced the following assembly code (AT&T syntax; only the assembly of function dojob and some data are listed):
00401350 <_dojob>:
401350: 55 push %ebp
401351: 89 e5 mov %esp,%ebp
401353: 83 ec 28 sub $0x28,%esp
401356: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%ebp)
40135d: eb 75 jmp 4013d4 <_dojob+0x84>
40135f: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%ebp)
401366: eb 3c jmp 4013a4 <_dojob+0x54>
401368: 8b 45 f4 mov -0xc(%ebp),%eax
40136b: 8b 04 85 00 20 40 00 mov 0x402000(,%eax,4),%eax
401372: 8d 48 02 lea 0x2(%eax),%ecx
401375: 8b 55 f4 mov -0xc(%ebp),%edx
401378: 89 0c 95 00 20 40 00 mov %ecx,0x402000(,%edx,4)
40137f: 0f b7 10 movzwl (%eax),%edx
401382: 8b 45 f4 mov -0xc(%ebp),%eax
401385: 0f b7 84 00 08 50 40 movzwl 0x405008(%eax,%eax,1),%eax
40138c: 00
40138d: 89 c1 mov %eax,%ecx
40138f: 89 d0 mov %edx,%eax
401391: 01 c8 add %ecx,%eax
401393: 89 c2 mov %eax,%edx
401395: 8b 45 f4 mov -0xc(%ebp),%eax
401398: 66 89 94 00 08 50 40 mov %dx,0x405008(%eax,%eax,1)
40139f: 00
4013a0: 83 45 f0 01 addl $0x1,-0x10(%ebp)
4013a4: 83 7d f0 03 cmpl $0x3,-0x10(%ebp)
4013a8: 7e be jle 401368 <_dojob+0x18>
4013aa: 8b 45 f4 mov -0xc(%ebp),%eax
4013ad: 0f b7 84 00 08 50 40 movzwl 0x405008(%eax,%eax,1),%eax
4013b4: 00
4013b5: 98 cwtl
4013b6: 8b 55 f4 mov -0xc(%ebp),%edx
4013b9: 83 c2 01 add $0x1,%edx
4013bc: 89 44 24 08 mov %eax,0x8(%esp)
4013c0: 89 54 24 04 mov %edx,0x4(%esp)
4013c4: c7 04 24 24 30 40 00 movl $0x403024,(%esp)
4013cb: e8 50 08 00 00 call 401c20 <_printf>
4013d0: 83 45 f4 01 addl $0x1,-0xc(%ebp)
4013d4: 83 7d f4 01 cmpl $0x1,-0xc(%ebp)
4013d8: 7e 85 jle 40135f <_dojob+0xf>
4013da: b8 00 00 00 00 mov $0x0,%eax
4013df: c9 leave
4013e0: c3 ret
Disassembly of section .data:
00402000 <__data_start__>:
402000: 08 20 or %ah,(%eax)
402002: 40 inc %eax
402003: 00 10 add %dl,(%eax)
402005: 20 40 00 and %al,0x0(%eax)
Disassembly of section .bss:
...
00405008 <_s.1927>:
405008: 00 00 add %al,(%eax)
...
I have two questions:
I don't understand the difference between the mov and movl instructions. Why does the compiler generate mov for some code and movl for other code?
I completely understand the meaning of the C code, but not the assembly that the compiler generated. Could someone comment it so that I can understand it? Thanks a lot.
mov and movl are the same instruction here. In AT&T syntax the l suffix states a 32-bit operand size, which matches your two int variables (i and j), since an int is 32 bits on this target. objdump prints the suffix only when no register operand implies the size: movl $0x0,-0xc(%ebp) stores an immediate to memory, so the size has to be spelled out, whereas in mov %esp,%ebp the 32-bit registers already determine it.
Other suffixes follow the same pattern: b for 8 bits, w for 16 bits, and q for 64 bits (as in movq). The suffix selects the operand size.
PS: objdump's -M intel option may help you understand the disassembly better; documentation on the Intel syntax is easy to find.
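As a rough illustration of how AT&T size suffixes map to operand widths, here is a small Python sketch. It is not a disassembler: `BASES` is a hand-picked, hypothetical subset of mnemonics chosen for this example, and anything outside it (such as call or movzwl) is returned unchanged rather than guessed at.

```python
# Operand width in bits for each AT&T size suffix.
SUFFIX_BITS = {"b": 8, "w": 16, "l": 32, "q": 64}

# Illustrative subset of base mnemonics that accept a size suffix.
BASES = ("mov", "add", "sub", "cmp", "push", "pop", "lea")

def split_suffix(mnemonic):
    """Return (base, bits); bits is None when no size suffix is present."""
    for base in BASES:
        if mnemonic == base:
            return base, None
        rest = mnemonic[len(base):]
        if mnemonic.startswith(base) and rest in SUFFIX_BITS:
            return base, SUFFIX_BITS[rest]
    return mnemonic, None

for m in ("mov", "movl", "addl", "subq", "call"):
    print(m, split_suffix(m))
```

Note that call ends in the letter l without carrying a size suffix, which is why the sketch matches against known base mnemonics instead of simply looking at the last character.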

Why does GCC insert a callq at the beginning of a function? [duplicate]

I know that when I run objdump -dr on my file, a call shows up in machine code as e8 00 00 00 00 because it has not yet been linked. But I need to find out what the 00 00 00 00 will turn into after the linker has done its job. I know it should calculate the offset, but I'm a little confused about that.
As an example, with the code below, what should the e8 00 00 00 00 become after linking? And how do I get to that answer?
I'm testing with this sample code (I'm trying to call moo):
Disassembly of section .text:
0000000000000000 <foo>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 89 7d fc mov %edi,-0x4(%rbp)
7: 8b 45 fc mov -0x4(%rbp),%eax
a: 83 e8 0a sub $0xa,%eax
d: 5d pop %rbp
e: c3 retq
000000000000000f <moo>:
f: 55 push %rbp
10: 48 89 e5 mov %rsp,%rbp
13: 89 7d fc mov %edi,-0x4(%rbp)
16: b8 01 00 00 00 mov $0x1,%eax
1b: 5d pop %rbp
1c: c3 retq
000000000000001d <main>:
1d: 55 push %rbp
1e: 48 89 e5 mov %rsp,%rbp
21: 48 83 ec 10 sub $0x10,%rsp
25: c7 45 fc 8e 0c 00 00 movl $0xc8e,-0x4(%rbp)
2c: 8b 45 fc mov -0x4(%rbp),%eax
2f: 89 c7 mov %eax,%edi
31: e8 00 00 00 00 callq 36 <main+0x19>
32: R_X86_64_PC32 moo-0x4
36: 89 45 fc mov %eax,-0x4(%rbp)
39: b8 00 00 00 00 mov $0x0,%eax
3e: c9 leaveq
3f: c3 retq
With objdump -r, the relocations are printed along with the disassembly from -d:
31: e8 00 00 00 00 callq 36 <main+0x19>
32: R_X86_64_PC32 moo-0x4
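For a call between functions in the same object file, as here, the static linker (ld) fills in this relocation when it builds the executable. The dynamic loader (ld-linux.so.2) then relocates shared objects at load time and, on modern systems, may place even the executable at a random address, filling the remaining relocations with the correct addresses.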
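The arithmetic behind the R_X86_64_PC32 moo-0x4 entry can be sketched in Python. For this relocation type the stored value is S + A - P: the symbol address, plus the addend, minus the address of the field being patched. The sketch assumes that foo, moo and main keep the relative layout shown in the question, so the final load address of .text cancels out; the function name `pc32_field` is made up for this example.

```python
import struct

def pc32_field(symbol_addr, addend, field_addr):
    """Bytes stored for an R_X86_64_PC32 relocation: S + A - P, little-endian."""
    value = symbol_addr + addend - field_addr
    return struct.pack("<i", value)

# Offsets taken from the question's listing (same section, so the base cancels).
moo = 0x0F     # S: address of moo
field = 0x32   # P: address of the four bytes that follow the e8 opcode
patched = pc32_field(moo, -0x4, field)

print("e8 " + " ".join(f"{b:02x}" for b in patched))  # e8 d9 ff ff ff
```

The addend of -4 accounts for the CPU measuring the displacement from the end of the 4-byte field (address 0x36), not from its start: 0x0f - 0x36 = -0x27, which is d9 ff ff ff as a little-endian 32-bit value.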
Check with gdb by setting a breakpoint at main and starting the program (the loader finishes its work before main starts):
gdb ./program
(gdb) start
(gdb) disassemble main
If you want to compile the code without relocations, show source code and compilation options.
Object files and executable files on several architectures that I know of do not necessarily fix jump destinations at link time.
This is a feature which provides flexibility.
Jump target addresses do not have to be fixed until just before the instruction executes. They do not need to be fixed up at link time—nor even at program start time!
Most systems (Windows, Linux, Unix, VAX/VMS) tag such locations in the object code as an address which needs adjustment. There is additional information about what the target address is, what type of reference it is (such as absolute or relative; 16-bit, 24-bit, 32-bit, 64-bit, etc.).
The zero value there is not necessarily a placeholder, but the base value upon which to evaluate the result. For example, if the instruction were—for whatever reason—call 5+external_address, then there might be 5 (e8 05 00 00 00) in the object code.
If you want to see what the address is at execution time, run the program under a debugger, place a breakpoint at that instruction and then view the instruction just before it executes.
A common security-enhancing feature known as ASLR (address space layout randomization) intentionally loads program sections at inconsistent addresses to thwart malicious code which alters programs or data. Programs operating in this environment may not have some target addresses assigned until after the program runs a bit.
(Of related interest, VAX/VMS in particular has a complex fixup mode in which an equation describes the operations needed to compute a value. Operations include addition, subtraction, multiplication, division, shifting, rotating, and probably others. I never saw it actually used, but it was interesting to contemplate how one might apply the capability.)
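But you clearly know how to do all of this. You know how to disassemble before linking; just disassemble after linking to see how the linker modifies those instructions. Here is a small source file, followed by its disassembly before linking and then after linking: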
asm(".globl _start; _start: nop\n");
unsigned int foo ( unsigned int x )
{
    return(x+5);
}
unsigned int moo ( unsigned int x )
{
    return(foo(x)+3);
}
int main ( void )
{
    return(moo(3)+2);
}
0000000000000000 <_start>:
0: 90 nop
0000000000000001 <foo>:
1: 55 push %rbp
2: 48 89 e5 mov %rsp,%rbp
5: 89 7d fc mov %edi,-0x4(%rbp)
8: 8b 45 fc mov -0x4(%rbp),%eax
b: 83 c0 05 add $0x5,%eax
e: 5d pop %rbp
f: c3 retq
0000000000000010 <moo>:
10: 55 push %rbp
11: 48 89 e5 mov %rsp,%rbp
14: 48 83 ec 08 sub $0x8,%rsp
18: 89 7d fc mov %edi,-0x4(%rbp)
1b: 8b 45 fc mov -0x4(%rbp),%eax
1e: 89 c7 mov %eax,%edi
20: e8 00 00 00 00 callq 25 <moo+0x15>
25: 83 c0 03 add $0x3,%eax
28: c9 leaveq
29: c3 retq
000000000000002a <main>:
2a: 55 push %rbp
2b: 48 89 e5 mov %rsp,%rbp
2e: bf 03 00 00 00 mov $0x3,%edi
33: e8 00 00 00 00 callq 38 <main+0xe>
38: 83 c0 02 add $0x2,%eax
3b: 5d pop %rbp
3c: c3 retq
0000000000001000 <_start>:
1000: 90 nop
0000000000001001 <foo>:
1001: 55 push %rbp
1002: 48 89 e5 mov %rsp,%rbp
1005: 89 7d fc mov %edi,-0x4(%rbp)
1008: 8b 45 fc mov -0x4(%rbp),%eax
100b: 83 c0 05 add $0x5,%eax
100e: 5d pop %rbp
100f: c3 retq
0000000000001010 <moo>:
1010: 55 push %rbp
1011: 48 89 e5 mov %rsp,%rbp
1014: 48 83 ec 08 sub $0x8,%rsp
1018: 89 7d fc mov %edi,-0x4(%rbp)
101b: 8b 45 fc mov -0x4(%rbp),%eax
101e: 89 c7 mov %eax,%edi
1020: e8 dc ff ff ff callq 1001 <foo>
1025: 83 c0 03 add $0x3,%eax
1028: c9 leaveq
1029: c3 retq
000000000000102a <main>:
102a: 55 push %rbp
102b: 48 89 e5 mov %rsp,%rbp
102e: bf 03 00 00 00 mov $0x3,%edi
1033: e8 d8 ff ff ff callq 1010 <moo>
1038: 83 c0 02 add $0x2,%eax
103b: 5d pop %rbp
103c: c3 retq
For example, the call to moo in main, before and after linking:
33: e8 00 00 00 00 callq 38 <main+0xe>
1033: e8 d8 ff ff ff callq 1010 <moo>
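The linked bytes can be checked by hand: an e8 call stores a signed 32-bit displacement relative to the address of the next instruction. A minimal Python sketch of that decoding, using the addresses and bytes from the linked listing above (the function name `call_target` is invented for this example):

```python
def call_target(insn_addr, insn_bytes):
    """Return the destination of a 5-byte `e8 rel32` near call."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("not an e8 rel32 call")
    # The displacement is little-endian and signed.
    rel32 = int.from_bytes(insn_bytes[1:], "little", signed=True)
    # It is added to the address of the instruction after the call.
    return insn_addr + 5 + rel32

print(hex(call_target(0x1033, bytes.fromhex("e8d8ffffff"))))  # 0x1010, moo
print(hex(call_target(0x1020, bytes.fromhex("e8dcffffff"))))  # 0x1001, foo
```

With the unlinked bytes e8 00 00 00 00 the same calculation yields the address of the next instruction, which is why objdump shows callq 38 <main+0xe> for the object file.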

GDB and opcodes

Coming from a Windows environment, when I do kernel debugging or even in user mode for that matter, I can see the disassembled code in a way that is quite detailed, for example:
80526db2 6824020000 push 224h
80526db7 6808a14d80 push offset nt!ObWatchHandles+0x8dc (804da108)
80526dbc e81f030100 call nt!_SEH_prolog (805370e0)
80526dc1 a140a05480 mov eax,dword ptr [nt!__security_cookie (8054a040)]
The first number is obviously the address, but the second represents the opcode bytes, and that is lacking in GDB, or at least I don't know how to get a similar result.
I usually will do something like this:
(gdb): display /i $pc
But all I get is something like this:
x/i $pc 0x21c4c: pop %eax
I can't see what the code bytes are, which is sometimes a bit of an issue for me. Is there something I can do with display that could help?
Edit: GDB in question is 6.3.50 on Mac OS X 10.8.3.
I think disassemble /r should give you what you are looking for:
(gdb) help disass
Disassemble a specified section of memory.
Default is the function surrounding the pc of the selected frame.
With a /m modifier, source lines are included (if available).
With a /r modifier, raw instructions in hex are included.
With a single argument, the function surrounding that address is dumped.
Two arguments (separated by a comma) are taken as a range of memory to dump,
in the form of "start,end", or "start,+length".
(gdb) disass /r main
Dump of assembler code for function main:
0x004004f8 <+0>: 55 push %ebp
0x004004f9 <+1>: 48 dec %eax
0x004004fa <+2>: 89 e5 mov %esp,%ebp
0x004004fc <+4>: 48 dec %eax
0x004004fd <+5>: 83 ec 10 sub $0x10,%esp
0x00400500 <+8>: 89 7d fc mov %edi,-0x4(%ebp)
0x00400503 <+11>: 48 dec %eax
0x00400504 <+12>: 89 75 f0 mov %esi,-0x10(%ebp)
0x00400507 <+15>: bf 0c 06 40 00 mov $0x40060c,%edi
0x0040050c <+20>: b8 00 00 00 00 mov $0x0,%eax
0x00400511 <+25>: e8 0a ff ff ff call 0x400420
0x00400516 <+30>: bf 00 00 00 00 mov $0x0,%edi
0x0040051b <+35>: e8 10 ff ff ff call 0x400430
End of assembler dump.
(gdb)
GDB disassemble command documentation
If you use lldb, you can use the -b option to disassemble to get the same effect:
(lldb) disassemble -b -p
Sketch`main + 46 at SKTMain.m:17:
-> 0x10001aa0e: 48 89 c7 movq %rax, %rdi
0x10001aa11: b0 00 movb $0, %al
0x10001aa13: e8 f2 48 00 00 callq 0x10001f30a ; symbol stub for: NSLog
0x10001aa18: 48 8d 35 99 fa 00 00 leaq 64153(%rip), %rsi ; #Sketch`.str3

how to generate a map of instructions when compiling?

When compiling a program with gcc or any other compiler, can I somehow make the compiler generate a map of instructions in memory?
something like:
0000: First Instruction
0001: Second Instruction
1000: Third Instruction (after a jump for example)
I would like to use these addresses as a pattern to test a design of an instruction cache. I don't care what instructions are compiled or anything like that, just the addresses of these instructions. Is this possible?
The easiest way has to be to use objdump on your compiled output. For instance:
$ objdump -d /tmp/test
/tmp/test: file format elf64-x86-64
Disassembly of section .text:
0000000000400410 <_start>:
400410: 31 ed xor %ebp,%ebp
400412: 49 89 d1 mov %rdx,%r9
400415: 5e pop %rsi
400416: 48 89 e2 mov %rsp,%rdx
400419: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
40041d: 50 push %rax
40041e: 54 push %rsp
40041f: 49 c7 c0 b0 05 40 00 mov $0x4005b0,%r8
400426: 48 c7 c1 20 05 40 00 mov $0x400520,%rcx
40042d: 48 c7 c7 fa 04 40 00 mov $0x4004fa,%rdi
400434: e8 b7 ff ff ff callq 4003f0 <__libc_start_main@plt>
400439: f4 hlt
40043a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
And so on. If you want to only have the addresses, just filter them out with sed or something.
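As an alternative to sed, the filtering step could look like the following Python sketch. It assumes instruction lines start with a hexadecimal address immediately followed by a colon, as in the output above; the function name `instruction_addresses` and the `SAMPLE` text are only for illustration.

```python
import re

# An instruction line starts with optional spaces, a hex address and a colon.
INSN_LINE = re.compile(r"^\s*([0-9a-f]+):\s")

def instruction_addresses(objdump_text):
    """Return the address of every instruction line in `objdump -d` output."""
    addrs = []
    for line in objdump_text.splitlines():
        m = INSN_LINE.match(line)
        if m:
            addrs.append(int(m.group(1), 16))
    return addrs

SAMPLE = """/tmp/test: file format elf64-x86-64
Disassembly of section .text:
0000000000400410 <_start>:
400410: 31 ed xor %ebp,%ebp
400412: 49 89 d1 mov %rdx,%r9
400415: 5e pop %rsi
"""

for addr in instruction_addresses(SAMPLE):
    print(f"{addr:06x}")
```

One limitation to keep in mind: when objdump wraps the bytes of a long instruction onto a second line, that continuation line also begins with an address and would be counted by this pattern.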

How to create raw binary from assembler "as" command

Is there any way to get raw binary output from the "as" command on Mac OS X?
When I assemble a simple assembly file, it outputs a Mach-O object file with its headers and some symbol information.
I want only the code part of the file.
If as doesn't have any useful commands or options for this, is there an easy way (a command) to extract a specified segment from a Mach-O file?
Yes, you can use otool after assembling and linking the file to extract the code. otool -t will extract the text (code) segment as ASCII hex. For example:
$ cat hello.S
.cstring
LC0:
.ascii "Hello world!\12\0"
.text
.align 4,0x90
.globl _main
_main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $LC0, (%esp)
call _puts
xorl %eax, %eax
leave
ret
.subsections_via_symbols
$ gcc -m32 hello.S -o hello
$ otool -t hello
hello:
(__TEXT,__text) section
00001f20 6a 00 89 e5 83 e4 f0 83 ec 10 8b 5d 04 89 1c 24
00001f30 8d 4d 08 89 4c 24 04 83 c3 01 c1 e3 02 01 cb 89
00001f40 5c 24 08 8b 03 83 c3 04 85 c0 75 f7 89 5c 24 0c
00001f50 e8 0b 00 00 00 89 04 24 e8 19 00 00 00 f4 90 90
00001f60 55 89 e5 83 ec 18 c7 04 24 a4 1f 00 00 e8 0a 00
00001f70 00 00 31 c0 c9 c3
$
Note that main actually starts at 00001f60:
$ otool -tv hello
hello:
(__TEXT,__text) section
start:
00001f20 pushl $0x00
00001f22 movl %esp,%ebp
00001f24 andl $0xf0,%esp
00001f27 subl $0x10,%esp
00001f2a movl 0x04(%ebp),%ebx
00001f2d movl %ebx,(%esp)
00001f30 leal 0x08(%ebp),%ecx
00001f33 movl %ecx,0x04(%esp)
00001f37 addl $0x01,%ebx
00001f3a shll $0x02,%ebx
00001f3d addl %ecx,%ebx
00001f3f movl %ebx,0x08(%esp)
00001f43 movl (%ebx),%eax
00001f45 addl $0x04,%ebx
00001f48 testl %eax,%eax
00001f4a jne 0x00001f43
00001f4c movl %ebx,0x0c(%esp)
00001f50 calll 0x00001f60
00001f55 movl %eax,(%esp)
00001f58 calll 0x00001f76
00001f5d hlt
00001f5e nop
00001f5f nop
_main:
00001f60 pushl %ebp
00001f61 movl %esp,%ebp
00001f63 subl $0x18,%esp
00001f66 movl $0x00001fa4,(%esp)
00001f6d calll 0x00001f7c
00001f72 xorl %eax,%eax
00001f74 leave
00001f75 ret
$
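Since otool -t prints ASCII hex, one more step is needed to obtain actual raw bytes. A possible Python sketch follows; it assumes the byte-per-column layout shown in the otool -t output above (an address column followed by two-digit hex bytes), and the function name `otool_text_to_bytes` is made up for this example.

```python
import re

HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")
ADDRESS = re.compile(r"^[0-9a-fA-F]{8,16}$")

def otool_text_to_bytes(dump):
    """Collect the byte columns of an `otool -t` hex dump into raw bytes."""
    raw = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        # Data lines start with an address column; headers such as "hello:" do not.
        if len(fields) < 2 or not ADDRESS.match(fields[0]):
            continue
        # Skip anything that is not purely two-digit hex bytes.
        if not all(HEX_BYTE.match(f) for f in fields[1:]):
            continue
        raw.extend(int(f, 16) for f in fields[1:])
    return bytes(raw)

# The last two data lines of the dump above, i.e. the bytes of _main.
DUMP = """hello:
(__TEXT,__text) section
00001f60 55 89 e5 83 ec 18 c7 04 24 a4 1f 00 00 e8 0a 00
00001f70 00 00 31 c0 c9 c3
"""

code = otool_text_to_bytes(DUMP)
print(len(code), code.hex())
```

The resulting bytes object can then be written to a file opened in binary mode to get a headerless image of the text section.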
No. The as assembler translates source code into object files, as most, if not all, assemblers do. You then have to use a linker to link one or more object files, possibly with runtime libraries, to create an executable file.
