I'm compiling a program that uses the sqlite3 library on Linux (CentOS 7, x86-64). Because the user has an old CPU, I set -march=nehalem in GCC (-march=nehalem -mtune=nehalem -m64 -O3). However, I can't restrict the instructions to Nehalem: some BMI instructions still appear in the final binary.
Following the output step by step, I found that the problem seems to come from the linker (ld).
libsqlite3.a:
632c2: 66 41 83 4f 26 01 orw $0x1,0x26(%r15)
632c8: 0f b6 84 24 80 00 00 movzbl 0x80(%rsp),%eax
632cf: 00
632d0: c1 e0 08 shl $0x8,%eax
632d3: 89 c2 mov %eax,%edx
632d5: 0f b6 84 24 81 00 00 movzbl 0x81(%rsp),%eax
632dc: 00
632dd: c1 e0 10 shl $0x10,%eax
632e0: 09 d0 or %edx,%eax
632e2: 8d 90 00 fe ff ff lea -0x200(%rax),%edx
632e8: 41 89 47 30 mov %eax,0x30(%r15)
632ec: 81 fa 00 fe 00 00 cmp $0xfe00,%edx
632f2: 0f 87 d1 05 00 00 ja 638c9 <sqlite3BtreeOpen+0xb29>
632f8: 8d 50 ff lea -0x1(%rax),%edx
632fb: 85 c2 test %eax,%edx
632fd: 0f 85 c6 05 00 00 jne 638c9 <sqlite3BtreeOpen+0xb29>
However, in the final binary:
9499f2: 66 41 83 4f 26 01 orw $0x1,0x26(%r15)
9499f8: 0f b6 84 24 80 00 00 movzbl 0x80(%rsp),%eax
9499ff: 00
949a00: 0f b6 94 24 81 00 00 movzbl 0x81(%rsp),%edx
949a07: 00
949a08: c1 e0 08 shl $0x8,%eax
949a0b: 89 c1 mov %eax,%ecx
949a0d: 89 d0 mov %edx,%eax
949a0f: c1 e0 10 shl $0x10,%eax
949a12: 09 c8 or %ecx,%eax
949a14: 8d 90 00 fe ff ff lea -0x200(%rax),%edx
949a1a: 41 89 47 30 mov %eax,0x30(%r15)
949a1e: 81 fa 00 fe 00 00 cmp $0xfe00,%edx
949a24: 0f 87 cf 05 00 00 ja 949ff9 <sqlite3BtreeOpen+0xb09>
949a2a: c4 e2 78 f3 c8 blsr %eax,%eax
949a2f: 85 c0 test %eax,%eax
949a31: 0f 85 c2 05 00 00 jne 949ff9 <sqlite3BtreeOpen+0xb09>
Notice the last few lines: the lea has become blsr, which is unexpected.
Why does this happen? Does the linker (ld) optimize the code further? How can I restrict the instructions the linker is allowed to use?
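For reference, the lea/cmp/ja and lea/test/jne sequences in both listings look like a page-size sanity check: a range check for 512..65536 followed by the classic power-of-two test x & (x - 1) == 0. BMI1's blsr computes exactly x & (x - 1), so a compiler targeting a BMI-capable CPU (for example with -mbmi, or a -march that includes BMI1) may emit blsr there, while -march=nehalem cannot. The sketch below mirrors that logic; it assumes, as in SQLite's file format, that the page size is stored in header bytes 16 and 17 with the value 1 meaning 65536, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the page size the way the movzbl/shl/or sequence does:
   byte 16 is shifted left by 8 and byte 17 by 16, so a stored
   0x00 0x01 decodes to 65536. (Assumed header layout, see above.) */
static uint32_t decode_page_size(const unsigned char hdr[18])
{
    return ((uint32_t)hdr[16] << 8) | ((uint32_t)hdr[17] << 16);
}

/* Mirrors the two checks in the listing:
     lea -0x200(%rax),%edx ; cmp $0xfe00,%edx ; ja   -> 512 <= size <= 65536
     lea -0x1(%rax),%edx   ; test %eax,%edx   ; jne  -> size & (size - 1) == 0
   With BMI1 enabled, size & (size - 1) can be a single blsr. */
static bool page_size_is_valid(uint32_t size)
{
    /* Unsigned wrap-around makes sizes below 512 fail this test too. */
    if (size - 512u > 0xfe00u)
        return false;
    return (size & (size - 1u)) == 0;
}
```

Compiling just page_size_is_valid with -O2 -march=nehalem and again with -O2 -mbmi, then comparing objdump -d output, should show the lea/test versus blsr difference directly.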
Many thanks for the comments. I have found the problem: as Peter Cordes said in a comment, I was linking against a different build of the sqlite library. I had installed several GCC toolchains, each with its own sqlite in its default library path, and my project was managed by CMake, which remembered all the previous GCC settings...
Steps to discover it:
1. Add the -v flag to the gcc command so it prints the ld (collect2) command it runs.
2. Copy that ld command and run it again with the extra flags "--print-map -Map=demo.map".
3. Search demo.map for the library name (sqlite here). It clearly showed that a different sqlite library had been linked. I realised how stupid I had been...
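Step 3 is just a text search; a plain grep libsqlite3 demo.map does the job. For completeness, here is a small C sketch of the same search (grep_map is a hypothetical helper). In GNU ld map files, archive members pulled into the link typically appear with the path of the .a they came from, so an unexpected directory stands out.

```c
#include <stdio.h>
#include <string.h>

/* Print every line of a linker map that contains `needle`
   (for example "libsqlite3") to `out`, and return the number of
   matching lines. Lines longer than the buffer are examined in
   pieces, so a match split across a piece boundary could be missed. */
static int grep_map(FILE *map, const char *needle, FILE *out)
{
    char line[4096];
    int hits = 0;
    while (fgets(line, sizeof line, map) != NULL) {
        if (strstr(line, needle) != NULL) {
            fputs(line, out);
            ++hits;
        }
    }
    return hits;
}
```

Typical use: open demo.map with fopen(..., "r") and call grep_map(f, "libsqlite3", stdout).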
Update: I have a new problem. If library.a was compiled with newer CPU instructions, how can I downgrade them at the link stage? It seems those instructions are copied into the binary without any check against GCC's -march flag.
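As far as I know, ld does not re-select instructions: -march only affects the compiler's code generation, and the linker copies each archive member's machine code into the output as-is, apart from filling in relocations and a few ABI-defined relaxations. So the instructions in a prebuilt .a cannot be downgraded at link time; the archive itself has to be rebuilt with -march=nehalem. What a program can do is fail gracefully on an unsupported CPU. A minimal sketch, assuming GCC or Clang on x86 (cpu_has_bmi1 is a hypothetical name):

```c
/* Returns 1 if the running CPU reports BMI1 (the extension that
   provides blsr), 0 otherwise. Uses the compiler built-in
   __builtin_cpu_supports, which exists only when targeting x86;
   on other targets this simply reports 0. */
static int cpu_has_bmi1(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   /* only strictly needed before constructors run */
    return __builtin_cpu_supports("bmi") ? 1 : 0;
#else
    return 0;
#endif
}
```

Calling it at the start of main() and printing a clear error is friendlier than dying with SIGILL, although it only helps if no BMI instruction executes before the check.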
This is the code I used to test the stack protection feature of gcc.
static inline void charcpy(char* temp)
{
temp[0]='a';
temp[1]='b';
temp[2]='c';
temp[3]='d';
temp[4]='\0';
}
int main()
{
char temp[3];
charcpy(temp);
return 0;
}
When I compile it with gcc 7.3 (without specifying any flags), I get the following runtime error on my desktop:
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
The uname -a output for my desktop, in case it matters:
Linux lixun-Desktop 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
However, when I tried the same thing on a server machine using gcc 5.4, no error shows up. The uname -a output for the server machine is:
Linux aggravation 4.4.0-137-generic #163-Ubuntu SMP Mon Sep 24 13:14:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I then used objdump -D a.out to compare the assembly code, but I still cannot figure out why the stack protection does not trigger on the server machine.
Here is the output on my desktop (I only paste the section I think might matter)
Disassembly of section .init:
0000000000000510 <_init>:
510: 48 83 ec 08 sub $0x8,%rsp
514: 48 8b 05 cd 0a 20 00 mov 0x200acd(%rip),%rax # 200fe8 <__gmon_start__>
51b: 48 85 c0 test %rax,%rax
51e: 74 02 je 522 <_init+0x12>
520: ff d0 callq *%rax
522: 48 83 c4 08 add $0x8,%rsp
526: c3 retq
Disassembly of section .plt:
0000000000000530 <.plt>:
530: ff 35 8a 0a 20 00 pushq 0x200a8a(%rip) # 200fc0 <_GLOBAL_OFFSET_TABLE_+0x8>
536: ff 25 8c 0a 20 00 jmpq *0x200a8c(%rip) # 200fc8 <_GLOBAL_OFFSET_TABLE_+0x10>
53c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000000540 <__stack_chk_fail@plt>:
540: ff 25 8a 0a 20 00 jmpq *0x200a8a(%rip) # 200fd0 <__stack_chk_fail@GLIBC_2.4>
546: 68 00 00 00 00 pushq $0x0
54b: e9 e0 ff ff ff jmpq 530 <.plt>
Disassembly of section .plt.got:
0000000000000550 <__cxa_finalize@plt>:
550: ff 25 a2 0a 20 00 jmpq *0x200aa2(%rip) # 200ff8 <__cxa_finalize@GLIBC_2.2.5>
556: 66 90 xchg %ax,%ax
...
Disassembly of section .text:
0000000000000560 <_start>:
560: 31 ed xor %ebp,%ebp
562: 49 89 d1 mov %rdx,%r9
565: 5e pop %rsi
566: 48 89 e2 mov %rsp,%rdx
569: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
56d: 50 push %rax
56e: 54 push %rsp
56f: 4c 8d 05 ea 01 00 00 lea 0x1ea(%rip),%r8 # 760 <__libc_csu_fini>
576: 48 8d 0d 73 01 00 00 lea 0x173(%rip),%rcx # 6f0 <__libc_csu_init>
57d: 48 8d 3d 24 01 00 00 lea 0x124(%rip),%rdi # 6a8 <main>
584: ff 15 56 0a 20 00 callq *0x200a56(%rip) # 200fe0 <__libc_start_main@GLIBC_2.2.5>
58a: f4 hlt
58b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
...
000000000000066a <charcpy>:
66a: 55 push %rbp
66b: 48 89 e5 mov %rsp,%rbp
66e: 48 89 7d f8 mov %rdi,-0x8(%rbp)
672: 48 8b 45 f8 mov -0x8(%rbp),%rax
676: c6 00 61 movb $0x61,(%rax)
679: 48 8b 45 f8 mov -0x8(%rbp),%rax
67d: 48 83 c0 01 add $0x1,%rax
681: c6 00 62 movb $0x62,(%rax)
684: 48 8b 45 f8 mov -0x8(%rbp),%rax
688: 48 83 c0 02 add $0x2,%rax
68c: c6 00 63 movb $0x63,(%rax)
68f: 48 8b 45 f8 mov -0x8(%rbp),%rax
693: 48 83 c0 03 add $0x3,%rax
697: c6 00 64 movb $0x64,(%rax)
69a: 48 8b 45 f8 mov -0x8(%rbp),%rax
69e: 48 83 c0 04 add $0x4,%rax
6a2: c6 00 00 movb $0x0,(%rax)
6a5: 90 nop
6a6: 5d pop %rbp
6a7: c3 retq
00000000000006a8 <main>:
6a8: 55 push %rbp
6a9: 48 89 e5 mov %rsp,%rbp
6ac: 48 83 ec 10 sub $0x10,%rsp
6b0: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
6b7: 00 00
6b9: 48 89 45 f8 mov %rax,-0x8(%rbp)
6bd: 31 c0 xor %eax,%eax
6bf: 48 8d 45 f5 lea -0xb(%rbp),%rax
6c3: 48 89 c7 mov %rax,%rdi
6c6: e8 9f ff ff ff callq 66a <charcpy>
6cb: b8 00 00 00 00 mov $0x0,%eax
6d0: 48 8b 55 f8 mov -0x8(%rbp),%rdx
6d4: 64 48 33 14 25 28 00 xor %fs:0x28,%rdx
6db: 00 00
6dd: 74 05 je 6e4 <main+0x3c>
6df: e8 5c fe ff ff callq 540 <__stack_chk_fail@plt>
6e4: c9 leaveq
6e5: c3 retq
6e6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
6ed: 00 00 00
And this is the output on the server machine
Disassembly of section .init:
00000000004003f0 <_init>:
4003f0: 48 83 ec 08 sub $0x8,%rsp
4003f4: 48 8b 05 fd 0b 20 00 mov 0x200bfd(%rip),%rax # 600ff8 <_DYNAMIC+0x1d0>
4003fb: 48 85 c0 test %rax,%rax
4003fe: 74 05 je 400405 <_init+0x15>
400400: e8 3b 00 00 00 callq 400440 <__libc_start_main@plt+0x10>
400405: 48 83 c4 08 add $0x8,%rsp
400409: c3 retq
Disassembly of section .plt:
0000000000400410 <__stack_chk_fail@plt-0x10>:
400410: ff 35 f2 0b 20 00 pushq 0x200bf2(%rip) # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
400416: ff 25 f4 0b 20 00 jmpq *0x200bf4(%rip) # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
40041c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000400420 <__stack_chk_fail@plt>:
400420: ff 25 f2 0b 20 00 jmpq *0x200bf2(%rip) # 601018 <_GLOBAL_OFFSET_TABLE_+0x18>
400426: 68 00 00 00 00 pushq $0x0
40042b: e9 e0 ff ff ff jmpq 400410 <_init+0x20>
0000000000400430 <__libc_start_main@plt>:
400430: ff 25 ea 0b 20 00 jmpq *0x200bea(%rip) # 601020 <_GLOBAL_OFFSET_TABLE_+0x20>
400436: 68 01 00 00 00 pushq $0x1
40043b: e9 d0 ff ff ff jmpq 400410 <_init+0x20>
Disassembly of section .plt.got:
0000000000400440 <.plt.got>:
400440: ff 25 b2 0b 20 00 jmpq *0x200bb2(%rip) # 600ff8 <_DYNAMIC+0x1d0>
400446: 66 90 xchg %ax,%ax
...
Disassembly of section .text:
0000000000400450 <_start>:
400450: 31 ed xor %ebp,%ebp
400452: 49 89 d1 mov %rdx,%r9
400455: 5e pop %rsi
400456: 48 89 e2 mov %rsp,%rdx
400459: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
40045d: 50 push %rax
40045e: 54 push %rsp
40045f: 49 c7 c0 40 06 40 00 mov $0x400640,%r8
400466: 48 c7 c1 d0 05 40 00 mov $0x4005d0,%rcx
40046d: 48 c7 c7 84 05 40 00 mov $0x400584,%rdi
400474: e8 b7 ff ff ff callq 400430 <__libc_start_main@plt>
400479: f4 hlt
40047a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000400546 <charcpy>:
400546: 55 push %rbp
400547: 48 89 e5 mov %rsp,%rbp
40054a: 48 89 7d f8 mov %rdi,-0x8(%rbp)
40054e: 48 8b 45 f8 mov -0x8(%rbp),%rax
400552: c6 00 61 movb $0x61,(%rax)
400555: 48 8b 45 f8 mov -0x8(%rbp),%rax
400559: 48 83 c0 01 add $0x1,%rax
40055d: c6 00 62 movb $0x62,(%rax)
400560: 48 8b 45 f8 mov -0x8(%rbp),%rax
400564: 48 83 c0 02 add $0x2,%rax
400568: c6 00 63 movb $0x63,(%rax)
40056b: 48 8b 45 f8 mov -0x8(%rbp),%rax
40056f: 48 83 c0 03 add $0x3,%rax
400573: c6 00 64 movb $0x64,(%rax)
400576: 48 8b 45 f8 mov -0x8(%rbp),%rax
40057a: 48 83 c0 04 add $0x4,%rax
40057e: c6 00 00 movb $0x0,(%rax)
400581: 90 nop
400582: 5d pop %rbp
400583: c3 retq
0000000000400584 <main>:
400584: 55 push %rbp
400585: 48 89 e5 mov %rsp,%rbp
400588: 48 83 ec 10 sub $0x10,%rsp
40058c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
400593: 00 00
400595: 48 89 45 f8 mov %rax,-0x8(%rbp)
400599: 31 c0 xor %eax,%eax
40059b: 48 8d 45 f0 lea -0x10(%rbp),%rax
40059f: 48 89 c7 mov %rax,%rdi
4005a2: e8 9f ff ff ff callq 400546 <charcpy>
4005a7: b8 00 00 00 00 mov $0x0,%eax
4005ac: 48 8b 55 f8 mov -0x8(%rbp),%rdx
4005b0: 64 48 33 14 25 28 00 xor %fs:0x28,%rdx
4005b7: 00 00
4005b9: 74 05 je 4005c0 <main+0x3c>
4005bb: e8 60 fe ff ff callq 400420 <__stack_chk_fail@plt>
4005c0: c9 leaveq
4005c1: c3 retq
4005c2: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4005c9: 00 00 00
4005cc: 0f 1f 40 00 nopl 0x0(%rax)
...
I also tried -fstack-protector-all and -fstack-protector-strong on the server machine, and it still shows no error.
Does anyone know why there is a difference?
The server machine places the array temp into a different place in the main stack frame. It uses this operation to compute the address, with a 16-byte offset from the frame pointer:
40059b: 48 8d 45 f0 lea -0x10(%rbp),%rax
The other machine uses this instead:
6bf: 48 8d 45 f5 lea -0xb(%rbp),%rax
There is only an 11-byte offset from the frame pointer. In both cases the canary is stored at -0x8(%rbp), in the 8 bytes just below the frame pointer.
As a result, on the server machine there are 5 unused bytes between the end of the 3-byte array and the canary, and the 2 overflowing bytes land there. The canary is not overwritten, which is why the overflow is not detected. But neither is the return address, so execution cannot be redirected here.
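The offset arithmetic can be checked directly. A small sketch (the function names are hypothetical) that treats every offset as relative to %rbp and uses the numbers from the two listings: a 3-byte array, a 5-byte write from charcpy(), and the canary at -0x8.

```c
#include <stdbool.h>

/* A write of `len` bytes starting at `buf_off` touches the bytes
   buf_off .. buf_off + len - 1. The canary occupies canary_off .. -1,
   so the write reaches it once its last byte is at or above canary_off.
   Assumes the buffer lies below the canary, as in both listings. */
static bool write_reaches_canary(long buf_off, long len, long canary_off)
{
    long last_written = buf_off + len - 1;
    return len > 0 && last_written >= canary_off;
}

/* Number of unused bytes between the end of the array and the canary. */
static long slack_after_array(long buf_off, long array_len, long canary_off)
{
    return canary_off - (buf_off + array_len);
}
```

For the desktop (array at -0xb) the last byte written is at -7, inside the canary; for the server (array at -0x10) it is at -12, inside the 5 bytes of slack.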
In real-world software, this would be a source-level stack-based buffer overflow that, by accident of stack layout, is not exploitable in the compiled binary. Such things happen occasionally.
Recently, I have been reading some books about computer science. I wrote some C code and disassembled it using gcc and objdump.
The following C code:
#include <stdio.h>
#include <stdbool.h>
int dojob()
{
static short num[ ][4] = { {2, 9, -1, 5}, {3, 8, 2, -6}};
static short *pn[ ] = {num[0], num[1]};
static short s[2] = {0, 0};
int i, j;
for (i=0; i<2; i++) {
for (j=0; j<4; j++){
s[i] += *pn[i]++;
}
printf ("sum of line %d: %d\n", i+1, s[i]);
}
return 0;
}
int main ( )
{
dojob();
}
This produced the following assembly (AT&T syntax; only the dojob function and some of the data are listed):
00401350 <_dojob>:
401350: 55 push %ebp
401351: 89 e5 mov %esp,%ebp
401353: 83 ec 28 sub $0x28,%esp
401356: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%ebp)
40135d: eb 75 jmp 4013d4 <_dojob+0x84>
40135f: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%ebp)
401366: eb 3c jmp 4013a4 <_dojob+0x54>
401368: 8b 45 f4 mov -0xc(%ebp),%eax
40136b: 8b 04 85 00 20 40 00 mov 0x402000(,%eax,4),%eax
401372: 8d 48 02 lea 0x2(%eax),%ecx
401375: 8b 55 f4 mov -0xc(%ebp),%edx
401378: 89 0c 95 00 20 40 00 mov %ecx,0x402000(,%edx,4)
40137f: 0f b7 10 movzwl (%eax),%edx
401382: 8b 45 f4 mov -0xc(%ebp),%eax
401385: 0f b7 84 00 08 50 40 movzwl 0x405008(%eax,%eax,1),%eax
40138c: 00
40138d: 89 c1 mov %eax,%ecx
40138f: 89 d0 mov %edx,%eax
401391: 01 c8 add %ecx,%eax
401393: 89 c2 mov %eax,%edx
401395: 8b 45 f4 mov -0xc(%ebp),%eax
401398: 66 89 94 00 08 50 40 mov %dx,0x405008(%eax,%eax,1)
40139f: 00
4013a0: 83 45 f0 01 addl $0x1,-0x10(%ebp)
4013a4: 83 7d f0 03 cmpl $0x3,-0x10(%ebp)
4013a8: 7e be jle 401368 <_dojob+0x18>
4013aa: 8b 45 f4 mov -0xc(%ebp),%eax
4013ad: 0f b7 84 00 08 50 40 movzwl 0x405008(%eax,%eax,1),%eax
4013b4: 00
4013b5: 98 cwtl
4013b6: 8b 55 f4 mov -0xc(%ebp),%edx
4013b9: 83 c2 01 add $0x1,%edx
4013bc: 89 44 24 08 mov %eax,0x8(%esp)
4013c0: 89 54 24 04 mov %edx,0x4(%esp)
4013c4: c7 04 24 24 30 40 00 movl $0x403024,(%esp)
4013cb: e8 50 08 00 00 call 401c20 <_printf>
4013d0: 83 45 f4 01 addl $0x1,-0xc(%ebp)
4013d4: 83 7d f4 01 cmpl $0x1,-0xc(%ebp)
4013d8: 7e 85 jle 40135f <_dojob+0xf>
4013da: b8 00 00 00 00 mov $0x0,%eax
4013df: c9 leave
4013e0: c3 ret
Disassembly of section .data:
00402000 <__data_start__>:
402000: 08 20 or %ah,(%eax)
402002: 40 inc %eax
402003: 00 10 add %dl,(%eax)
402005: 20 40 00 and %al,0x0(%eax)
Disassembly of section .bss:
...
00405008 <_s.1927>:
405008: 00 00 add %al,(%eax)
...
I have two questions:
I don't understand the difference between the mov and movl instructions. Why does the compiler generate mov for some code and movl for others?
I understand the C code completely, but not the assembly the compiler generated. Could someone annotate it to help me understand? Many thanks.
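The movl instruction appears where the code works directly on your two int variables i and j (for example movl $0x0,-0xc(%ebp) initializes i). movl performs a 32-bit move, and an int is 32 bits. mov and movl are the same instruction: the l suffix only spells out the 32-bit operand size. objdump prints the suffix when no register operand implies the size, as in an immediate-to-memory store; in mov %esp,%ebp or mov -0xc(%ebp),%eax the 32-bit registers already make the size clear.
There is a whole family of size variants (movb for 8 bits, movw for 16, movl for 32 and, on x86-64, movq for 64), and the compiler uses whichever matches the size of the data being moved. Extending loads follow the same naming, e.g. movzwl, which zero-extends a 16-bit word to 32 bits and is used here to load the short values.
PS: objdump's -M intel option may make the disassembly easier to understand; Intel syntax writes the size as DWORD PTR instead of a suffix, and documentation on it is easy to find.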
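For the second question, here is a hypothetical rewrite of dojob()'s loops (dojob_expanded is not from the original code) with one C statement per step of the listing and the matching instructions in the comments. It takes the arrays as parameters instead of using statics and returns the sums instead of printing them, so the result can be checked; the addresses 0x402000 (pn) and 0x405008 (s) are taken from the listing, which targets 32-bit x86 (4-byte pointers, 2-byte short).

```c
/* Step-by-step version of the nested loops in dojob(). */
static void dojob_expanded(short *pn[2], short s[2])
{
    for (int i = 0; i < 2; i++) {          /* i is kept at -0xc(%ebp)  */
        for (int j = 0; j < 4; j++) {      /* j is kept at -0x10(%ebp) */
            /* mov 0x402000(,%eax,4),%eax : load pn[i] (pn + 4*i) */
            short *p = pn[i];
            /* lea 0x2(%eax),%ecx ; mov %ecx,0x402000(,%edx,4) :
               store pn[i] + 1, i.e. the address plus 2 bytes (the ++) */
            pn[i] = p + 1;
            /* movzwl (%eax),%edx : load the 16-bit element *p.
               movzwl zero-extends; because only the low 16 bits of the
               sum are stored, sign- or zero-extension gives the same s[i]. */
            int elem = *p;
            /* movzwl 0x405008(%eax,%eax,1),%eax : load s[i] (s + i + i) */
            int sum = s[i];
            /* add %ecx,%eax ; mov %dx,0x405008(%eax,%eax,1) :
               add, then store only the low 16 bits back into s[i] */
            s[i] = (short)(sum + elem);
        }
        /* After the inner loop, movzwl + cwtl sign-extend s[i] for printf. */
    }
}
```

With the original data the sums come out as 15 and 7, the same values dojob() prints.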
I know that when I use objdump -dr on my object file, a call shows up in the machine code as e8 00 00 00 00 because the file has not been linked yet. But I need to find out what the 00 00 00 00 will turn into after the linker has done its job. I know it should compute an offset, but I'm a little confused about how.
As an example, with the code below, what should e8 00 00 00 00 become after linking? And how do I get to that answer?
I'm testing with this sample code (I'm trying to call moo):
Disassembly of section .text:
0000000000000000 <foo>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 89 7d fc mov %edi,-0x4(%rbp)
7: 8b 45 fc mov -0x4(%rbp),%eax
a: 83 e8 0a sub $0xa,%eax
d: 5d pop %rbp
e: c3 retq
000000000000000f <moo>:
f: 55 push %rbp
10: 48 89 e5 mov %rsp,%rbp
13: 89 7d fc mov %edi,-0x4(%rbp)
16: b8 01 00 00 00 mov $0x1,%eax
1b: 5d pop %rbp
1c: c3 retq
000000000000001d <main>:
1d: 55 push %rbp
1e: 48 89 e5 mov %rsp,%rbp
21: 48 83 ec 10 sub $0x10,%rsp
25: c7 45 fc 8e 0c 00 00 movl $0xc8e,-0x4(%rbp)
2c: 8b 45 fc mov -0x4(%rbp),%eax
2f: 89 c7 mov %eax,%edi
31: e8 00 00 00 00 callq 36 <main+0x19>
32: R_X86_64_PC32 moo-0x4
36: 89 45 fc mov %eax,-0x4(%rbp)
39: b8 00 00 00 00 mov $0x0,%eax
3e: c9 leaveq
3f: c3 retq
With objdump -r, relocations are printed along with the disassembly (-d):
31: e8 00 00 00 00 callq 36 <main+0x19>
32: R_X86_64_PC32 moo-0x4
The ld-linux.so.2 loader relocates objects (nowadays it even relocates the executable itself to a random address) and fills in the relocations with the correct addresses.
Check with gdb by setting a breakpoint at main and starting the program (the dynamic linker runs before main starts):
gdb ./program
(gdb) start
(gdb) disassemble main
If you want to build the code without relocations, post the source code and compilation options.
Object files and executable files on several architectures that I know of do not necessarily fix jump destinations at link time.
This is a feature which provides flexibility.
Jump target addresses do not have to be fixed until just before the instruction executes. They do not need to be fixed up at link time—nor even at program start time!
Most systems (Windows, Linux, Unix, VAX/VMS) tag such locations in the object code as an address which needs adjustment. There is additional information about what the target address is, what type of reference it is (such as absolute or relative; 16-bit, 24-bit, 32-bit, 64-bit, etc.).
The zero value there is not necessarily a placeholder, but the base value upon which to evaluate the result. For example, if the instruction were—for whatever reason—call 5+external_address, then there might be 5 (e8 05 00 00 00) in the object code.
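On x86-64 ELF specifically, relocations are of the RELA type, which carry the addend in the relocation entry itself (the -0x4 in moo-0x4); that is why the instruction bytes are 00 00 00 00. For R_X86_64_PC32 the linker stores S + A - P, where S is the symbol address, A the addend and P the address of the 4 patched bytes. In the question's object file moo is at 0xf and the patched bytes are at 0x32, all in one .text section, so the distance survives linking and the call should become e8 d9 ff ff ff. A minimal sketch of that arithmetic (function names hypothetical):

```c
#include <stdint.h>

/* R_X86_64_PC32: value = S + A - P.
   The addend -4 compensates for rel32 being relative to the end of the
   call instruction, which is 4 bytes past P. A real linker reports
   "relocation truncated to fit" if the value does not fit in 32 bits. */
static int64_t pc32_value(int64_t S, int64_t A, int64_t P)
{
    return S + A - P;
}

/* Store a 32-bit value little-endian, the way x86 encodes rel32. */
static void put_le32(unsigned char out[4], int32_t v)
{
    uint32_t u = (uint32_t)v;
    for (int k = 0; k < 4; k++)
        out[k] = (unsigned char)(u >> (8 * k));
}
```

pc32_value(0xf, -4, 0x32) gives -0x27, and 0x36 (the next instruction) minus 0x27 is 0xf, the address of moo.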
If you want to see what the address is at execution time, run the program under a debugger, place a breakpoint at that instruction and then view the instruction just before it executes.
A common security-enhancing feature known as ASLR (address space layout randomization) intentionally loads program sections at unpredictable addresses to thwart malicious code which alters programs or data. Programs operating in this environment may not have some target addresses assigned until after the program runs a bit.
(Of related interest, VAX/VMS in particular has a complex fixup mode in which an equation describes the operations needed to compute a value. Operations include addition, subtraction, multiplication, division, shifting, rotating, and probably others. I never saw it actually used, but it was interesting to contemplate how one might apply the capability.)
But you clearly know how to do all of this already. You know how to disassemble before linking, so just disassemble after linking too and see how the linker modifies those instructions.
asm(".globl _start; _start: nop\n");
unsigned int foo ( unsigned int x )
{
return(x+5);
}
unsigned int moo ( unsigned int x )
{
return(foo(x)+3);
}
int main ( void )
{
return(moo(3)+2);
}
Before linking:
0000000000000000 <_start>:
0: 90 nop
0000000000000001 <foo>:
1: 55 push %rbp
2: 48 89 e5 mov %rsp,%rbp
5: 89 7d fc mov %edi,-0x4(%rbp)
8: 8b 45 fc mov -0x4(%rbp),%eax
b: 83 c0 05 add $0x5,%eax
e: 5d pop %rbp
f: c3 retq
0000000000000010 <moo>:
10: 55 push %rbp
11: 48 89 e5 mov %rsp,%rbp
14: 48 83 ec 08 sub $0x8,%rsp
18: 89 7d fc mov %edi,-0x4(%rbp)
1b: 8b 45 fc mov -0x4(%rbp),%eax
1e: 89 c7 mov %eax,%edi
20: e8 00 00 00 00 callq 25 <moo+0x15>
25: 83 c0 03 add $0x3,%eax
28: c9 leaveq
29: c3 retq
000000000000002a <main>:
2a: 55 push %rbp
2b: 48 89 e5 mov %rsp,%rbp
2e: bf 03 00 00 00 mov $0x3,%edi
33: e8 00 00 00 00 callq 38 <main+0xe>
38: 83 c0 02 add $0x2,%eax
3b: 5d pop %rbp
3c: c3 retq
After linking (the linked .text starts at 0x1000):
0000000000001000 <_start>:
1000: 90 nop
0000000000001001 <foo>:
1001: 55 push %rbp
1002: 48 89 e5 mov %rsp,%rbp
1005: 89 7d fc mov %edi,-0x4(%rbp)
1008: 8b 45 fc mov -0x4(%rbp),%eax
100b: 83 c0 05 add $0x5,%eax
100e: 5d pop %rbp
100f: c3 retq
0000000000001010 <moo>:
1010: 55 push %rbp
1011: 48 89 e5 mov %rsp,%rbp
1014: 48 83 ec 08 sub $0x8,%rsp
1018: 89 7d fc mov %edi,-0x4(%rbp)
101b: 8b 45 fc mov -0x4(%rbp),%eax
101e: 89 c7 mov %eax,%edi
1020: e8 dc ff ff ff callq 1001 <foo>
1025: 83 c0 03 add $0x3,%eax
1028: c9 leaveq
1029: c3 retq
000000000000102a <main>:
102a: 55 push %rbp
102b: 48 89 e5 mov %rsp,%rbp
102e: bf 03 00 00 00 mov $0x3,%edi
1033: e8 d8 ff ff ff callq 1010 <moo>
1038: 83 c0 02 add $0x2,%eax
103b: 5d pop %rbp
103c: c3 retq
For example, the call to moo in main, before and after linking:
33: e8 00 00 00 00 callq 38 <main+0xe>
1033: e8 d8 ff ff ff callq 1010 <moo>
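Reading the linked call: rel32 is relative to the address of the next instruction (the call is 5 bytes), so 0x1038 + 0xffffffd8 (that is, -0x28) = 0x1010, the address of moo. Likewise the call in moo at 0x1020 encodes dc ff ff ff (-0x24), and 0x1025 - 0x24 = 0x1001, the address of foo. A small sketch of that arithmetic in both directions (function names hypothetical):

```c
#include <stdint.h>

/* For a 5-byte "e8 rel32" call at call_addr, the CPU jumps to
   (call_addr + 5) + rel32. Assumes the distance fits in 32 bits. */
static int32_t call_rel32(uint64_t call_addr, uint64_t target)
{
    return (int32_t)((int64_t)target - (int64_t)(call_addr + 5));
}

/* Inverse: recover the target from the call address and the
   4 little-endian bytes that follow the e8 opcode. */
static uint64_t call_target(uint64_t call_addr, const unsigned char rel[4])
{
    uint32_t u = (uint32_t)rel[0] | ((uint32_t)rel[1] << 8) |
                 ((uint32_t)rel[2] << 16) | ((uint32_t)rel[3] << 24);
    /* Sign-extend without relying on implementation-defined conversions. */
    int64_t r = (u & 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
    return call_addr + 5 + (uint64_t)r;
}
```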
I have a small hackme in which I have to brute-force the password. Once I have the right length, the program calls usleep(), and the sleep time changes whenever a letter is right.
That wouldn't be a problem, except that the sleep lasts about one minute, which is quite long.
Is there a way to make the usleep timer faster?
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs)
Method 1
You can override library functions with the LD_PRELOAD environment variable.
There's a good tutorial here and here to get you started with this.
Suppose you have the following program, which is then compiled to an ELF binary.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> /* for usleep() */
int main(int argc, char* argv[]) {
printf("Entry point. We'll now wait 10 seconds.\n");
system("date +\"%H:%M:%S\""); //Output time
usleep(10*1000*1000);
printf("Woke up again.\n");
system("date +\"%H:%M:%S\""); //Output time
return 0;
}
Running it normally would give you
root@kali:~/so# gcc -o prog prog.c
root@kali:~/so# ./prog
Entry point. We'll now wait 10 seconds.
20:31:10
Woke up again.
20:31:20
Now write your own version of usleep().
#include <unistd.h>
#include <stdio.h>
int usleep(useconds_t usec){
printf("Nope, you're not sleeping today :)\n");
return 0;
}
Compile it as a shared library.
root@kali:~/so# gcc -Wall -fPIC -shared -o usleep_override.so usleep_override.c
Now preload that library function before executing the original program.
root@kali:~/so# LD_PRELOAD=./usleep_override.so ./prog
Entry point. We'll now wait 10 seconds.
20:35:28
Nope, you're not sleeping today :)
Woke up again.
20:35:28
As the date output shows, the hooked function ran instead of the original and returned immediately.
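If the goal is to make the sleep shorter rather than skip it entirely, the same LD_PRELOAD approach can forward to a scaled-down delay. A minimal sketch, assuming glibc and a hypothetical speed-up factor SLEEP_SPEEDUP; it sleeps with nanosleep() rather than calling the original usleep(), so no lookup of the real function is needed. It would be built with the same gcc -fPIC -shared command as usleep_override.c.

```c
#define _DEFAULT_SOURCE      /* expose usleep(), useconds_t and nanosleep() in glibc */
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical factor: every requested delay is divided by this. */
#define SLEEP_SPEEDUP 1000u

/* Replacement usleep() with the signature declared in <unistd.h>.
   Sleeps for usec / SLEEP_SPEEDUP microseconds via nanosleep(). */
int usleep(useconds_t usec)
{
    unsigned long long scaled_us = (unsigned long long)usec / SLEEP_SPEEDUP;
    struct timespec req, rem;
    req.tv_sec = (time_t)(scaled_us / 1000000u);
    req.tv_nsec = (long)((scaled_us % 1000000u) * 1000u);
    /* Resume the remaining sleep if a signal interrupts it. */
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```

With the factor above, the 10-second usleep() in prog would take about 10 ms, which keeps any timing differences in the target program proportionally intact.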
Method 2
Modify the binary, specifically its instructions, so that usleep() is never called.
When we dump the instructions of the main() function of prog with objdump, we get:
root@kali:~/so# objdump -d -Mintel prog | grep -A20 "<main>"
0000000000400596 <main>:
400596: 55 push rbp
400597: 48 89 e5 mov rbp,rsp
40059a: 48 83 ec 10 sub rsp,0x10
40059e: 89 7d fc mov DWORD PTR [rbp-0x4],edi
4005a1: 48 89 75 f0 mov QWORD PTR [rbp-0x10],rsi
4005a5: bf 68 06 40 00 mov edi,0x400668
4005aa: e8 a1 fe ff ff call 400450 <puts@plt>
4005af: bf 90 06 40 00 mov edi,0x400690
4005b4: e8 a7 fe ff ff call 400460 <system@plt>
4005b9: bf 80 96 98 00 mov edi,0x989680
4005be: e8 cd fe ff ff call 400490 <usleep@plt>
4005c3: bf a2 06 40 00 mov edi,0x4006a2
4005c8: e8 83 fe ff ff call 400450 <puts@plt>
4005cd: bf 90 06 40 00 mov edi,0x400690
4005d2: e8 89 fe ff ff call 400460 <system@plt>
4005d7: b8 00 00 00 00 mov eax,0x0
4005dc: c9 leave
4005dd: c3 ret
4005de: 66 90 xchg ax,ax
We can see the offending lines that are responsible for the usleep(10*1000*1000) call:
4005b9: bf 80 96 98 00 mov edi,0x989680
4005be: e8 cd fe ff ff call 400490 <usleep@plt>
Since 0x989680 equals 10000000 in decimal, we can deduce that this is the argument to usleep(). So we can patch the binary: search for the byte sequence bf 80 96 98 00 e8 cd fe ff ff (the 10 bytes of these two instructions) and overwrite each byte with 0x90, the one-byte NOP instruction, which does nothing.
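The search-and-replace can be done with a hex editor, or with a few lines of C. A minimal sketch (nop_out is a hypothetical helper) that works on a buffer already read from the file. The patch is only safe here because usleep()'s return value is never used afterwards: the next instructions load a new argument and call puts, so removing both the argument setup and the call leaves the rest of main() unaffected. Check that exactly one match was patched before writing the buffer out as prog_cracked.

```c
#include <stddef.h>
#include <string.h>

/* Replace every occurrence of `pat` in `buf` with 0x90 (NOP) bytes and
   return how many occurrences were patched. Matches do not overlap. */
static size_t nop_out(unsigned char *buf, size_t len,
                      const unsigned char *pat, size_t patlen)
{
    size_t count = 0;
    if (patlen == 0 || patlen > len)
        return 0;
    for (size_t i = 0; i + patlen <= len; ) {
        if (memcmp(buf + i, pat, patlen) == 0) {
            memset(buf + i, 0x90, patlen);
            count++;
            i += patlen;
        } else {
            i++;
        }
    }
    return count;
}
```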
When we now dump the instructions:
root@kali:~/so# objdump -d -Mintel prog_cracked | grep -A28 "<main>"
0000000000400596 <main>:
400596: 55 push rbp
400597: 48 89 e5 mov rbp,rsp
40059a: 48 83 ec 10 sub rsp,0x10
40059e: 89 7d fc mov DWORD PTR [rbp-0x4],edi
4005a1: 48 89 75 f0 mov QWORD PTR [rbp-0x10],rsi
4005a5: bf 68 06 40 00 mov edi,0x400668
4005aa: e8 a1 fe ff ff call 400450 <puts@plt>
4005af: bf 90 06 40 00 mov edi,0x400690
4005b4: e8 a7 fe ff ff call 400460 <system@plt>
4005b9: 90 nop
4005ba: 90 nop
4005bb: 90 nop
4005bc: 90 nop
4005bd: 90 nop
4005be: 90 nop
4005bf: 90 nop
4005c0: 90 nop
4005c1: 90 nop
4005c2: 90 nop
4005c3: bf a2 06 40 00 mov edi,0x4006a2
4005c8: e8 83 fe ff ff call 400450 <puts@plt>
4005cd: bf 90 06 40 00 mov edi,0x400690
4005d2: e8 89 fe ff ff call 400460 <system@plt>
4005d7: b8 00 00 00 00 mov eax,0x0
4005dc: c9 leave
4005dd: c3 ret
4005de: 66 90 xchg ax,ax
Nice, the call is gone. Run and we get:
root@kali:~/so# chmod +x prog_cracked
root@kali:~/so# ./prog_cracked
Entry point. We'll now wait 10 seconds.
21:11:18
Woke up again.
21:11:18
And thus, the program is "cracked" again.