Cannot understand how my shellcode works. Shellcode for Windows (not Linux!) to open calc.exe - windows

I’ve got a shellcode. It opens calculator in my buffer overflow program.
0: eb 16 jmp 0x18
2: 5b pop ebx
3: 31 c0 xor eax,eax
5: 50 push eax
6: 53 push ebx
7: bb 4d 11 86 7c mov ebx,0x7c86114d
c: ff d3 call ebx
e: 31 c0 xor eax,eax
10: 50 push eax
11: bb ea cd 81 7c mov ebx,0x7c81cdea
16: ff d3 call ebx
18: e8 e5 ff ff ff call 0x2
1d: 63 61 6c arpl WORD PTR [ecx+0x6c],sp
20: 63 2e arpl WORD PTR [esi],bp
22: 65 78 65 gs js 0x8a
25: 00 90 90 90 90 90 add BYTE PTR [eax-0x6f6f6f70],dl
2b: 90 nop
2c: 90 nop
2d: 90 nop
2e: 90 nop
2f: 90 nop
Apart from the main question being “What does this shellcode do line by line”, I am particularly interested in:
The jmp operation, why and where does my program jump?
The arpl stuff: I am seeing it for the first time and Google does not help me much. Same with the gs operation.

The jmp 0x18 is a relative jump to offset 0x18, which is practically the end of your code. It then calls address 0x2 (again, relative). This call places the "return address" on the stack, so it could be popped from it, giving you a clue about the address in which this relative shellcode is being executed. And indeed, the pop ebx at offset 0x2 is getting the address from the stack.
I said that 0x18 is the end of the code because the bytes after it are data, not instructions. The disassembler does not know that and decodes them anyway, which is why you see arpl and the gs prefix. If you look at the hex values of the bytes, you will see:
calc.exe\0 ==> 0x63 0x61 0x6c 0x63 0x2e 0x65 0x78 0x65 0x00
Edited:
The full flow of the shellcode is:
jmp 0x18 - jump to the last code instruction of the shellcode
call 0x2 - jumps back to offset 2, and pushes the return address (the address of offset 0x1D) on the stack
pop ebx - ebx := address from the stack, which is the address of the string "calc.exe"
xor eax,eax - common opcode to zero the register: eax := 0
push eax - push the value 0 as the second argument for a future function call
push ebx - push the pointer to "calc.exe" as the first argument for a future function call
mov ebx,0x7c86114d - ebx will be a fixed address (probably WinExec)
call ebx - call the function: WinExec("calc.exe", 0)
xor eax,eax - again, eax := 0
push eax - push the value 0 as the first argument for a future function call
mov ebx,0x7c81cdea - ebx will be a fixed address (probably exit)
call ebx - call the function: exit(0)
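The offsets in this walk-through can be checked mechanically. The following is a minimal Python sketch, not part of the original answer: the helper names (jmp_short_target, call_rel32_target, mov_ebx_imm32, c_string_at) are made up for illustration, and the byte string is simply copied from the disassembly in the question. It decodes the two relative branches and pulls out the string that the disassembler rendered as arpl/gs/js.

```python
import struct

# Bytes copied from the disassembly in the question (offsets 0x00-0x2f).
SHELLCODE = bytes.fromhex(
    "eb16"                # 00: jmp  0x18
    "5b"                  # 02: pop  ebx
    "31c0"                # 03: xor  eax,eax
    "50"                  # 05: push eax
    "53"                  # 06: push ebx
    "bb4d11867c"          # 07: mov  ebx,0x7c86114d
    "ffd3"                # 0c: call ebx
    "31c0"                # 0e: xor  eax,eax
    "50"                  # 10: push eax
    "bbeacd817c"          # 11: mov  ebx,0x7c81cdea
    "ffd3"                # 16: call ebx
    "e8e5ffffff"          # 18: call 0x2
    "63616c632e65786500"  # 1d: data, not code
    "9090909090"          # 26: nop padding
    "9090909090"          # 2b: nop padding
)

def jmp_short_target(code, off):
    """Target of a short jmp (opcode EB, signed 8-bit displacement)."""
    assert code[off] == 0xEB
    rel8 = struct.unpack_from("<b", code, off + 1)[0]
    return off + 2 + rel8          # displacement is relative to the next instruction

def call_rel32_target(code, off):
    """Target of a near call (opcode E8, signed 32-bit displacement)."""
    assert code[off] == 0xE8
    rel32 = struct.unpack_from("<i", code, off + 1)[0]
    return off + 5 + rel32         # the return address pushed by call is off + 5

def mov_ebx_imm32(code, off):
    """Immediate operand of mov ebx, imm32 (opcode BB, little-endian)."""
    assert code[off] == 0xBB
    return struct.unpack_from("<I", code, off + 1)[0]

def c_string_at(code, off):
    """NUL-terminated ASCII string starting at off."""
    end = code.index(b"\x00", off)
    return code[off:end].decode("ascii")

print(hex(jmp_short_target(SHELLCODE, 0x00)))
print(hex(call_rel32_target(SHELLCODE, 0x18)))
print(c_string_at(SHELLCODE, 0x18 + 5))
```

The address that call pushes is the offset right after the call instruction (0x18 + 5 = 0x1d), which is exactly where the string starts; that is why pop ebx ends up pointing at it.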

Related

Errata in "Practical Reverse Engineering"?

I've just started the book Practical Reverse Engineering by Bruce Dang et alia, and am confused about a portion of the "walk-through" at the end of chapter one. This is the relevant portion of code:
65: ...
66: loc_10001d16:
67: mov eax, [ebp-118h]
68: mov ecx, [ebp-128h]
69: jmp short loc_10001d2a (line 73)
70: loc_10001d24:
71: mov eax, [ebp+0ch]
72: mov ecx, [ebp+0ch]
73: loc_10001d2a:
74: cmp eax, ecx
75: pop esi
76: jnz short loc_10001D38 (line 82)
77: xor eax, eax
78: pop edi
79: mov esp, ebp
80: pop ebp
81: retn 0ch
82: ...
And the authors' commentary:
"After the loop exits, execution resumes at line 66. Lines 67–68 save the matching PROCESSENTRY32’s th32ParentProcessID/th32ProcessID in EAX/ECX and
continue execution at 73. Notice that Line 66 is also a jump target in line 43.
Lines 70–74 read the fdwReason parameter of DllMain (EBP+C) and check
whether it is 0 (DLL_PROCESS_DETACH). If it is, the return value is set to 0 and
it returns; otherwise, it goes to line 82."
This is not how I interpreted the code when reading it; surely any jump to loc_10001d24 (line 70) will cause the function to terminate with return value 0 unconditionally, and not only if the value at ebp+0x0c is 0? (I assume that popping into esi does not affect the eflags register, and that the jump in line 76 conditions on the result of cmp eax, ecx in line 74?) This is also consistent with earlier portions in the code, which jump to loc_10001d24 if various called functions return with values indicating failure.
In addition, I thought the point of the section starting at line 66 was to also return with value 0 if PROCESSENTRY32 (a structure defined earlier, starting at position ebp-0x130 in memory) has equal th32ParentProcessID (ebp-0x118 in memory) and th32ProcessID (ebp-0x128 in memory) entries; is this correct? The authors' commentary did not seem to indicate this.
As a more general question, even just chapter 1 of the book has seemed to have had quite a large number of typos; does anyone know of a webpage collecting errata from the book anywhere?
Yes, ECX and EAX are both loaded from the same memory location, so unless something else has a pointer to it and is changing it asynchronously, cmp x,x / jne will always be not-taken. Unlike floating-point, every possible integer is equal to itself.
And you're correct, pop doesn't change EFLAGS, as per Intel's manuals: https://www.felixcloutier.com/x86/pop.
To check whether a memory location is zero, you can load it into a reg for test eax,eax / jnz
or cmp dword ptr [ebp + 0xc], 0 / jne.
(JNE and JNZ are the same instruction; the different mnemonics let you express the semantic meanings of equality or directly being zero based on ZF being set according to the value itself.)
Lines 70–74 read the fdwReason parameter of DllMain (EBP+C) and check whether it is 0 (DLL_PROCESS_DETACH)
This is bogus. If the book is full of stuff like that, that doesn't sound like a good book.
The cmp eax,ecx only makes any sense when reached from the path that loaded 2 different values. (And couldn't use test for that, x & y != 0 doesn't tell you whether they were equal.) This seems unlikely to be real compiler output.
This is the full listing. It's part of malware found in the wild:
01: ; BOOL __stdcall DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
02: _DllMain#12 proc near
03: 55 push ebp
04: 8B EC mov ebp, esp
05: 81 EC 30 01 00+ sub esp, 130h
06: 57 push edi
07: 0F 01 4D F8 sidt fword ptr [ebp-8]
08: 8B 45 FA mov eax, [ebp-6]
09: 3D 00 F4 03 80 cmp eax, 8003F400h
10: 76 10 jbe short loc_10001C88 (line 18)
11: 3D 00 74 04 80 cmp eax, 80047400h
12: 73 09 jnb short loc_10001C88 (line 18)
13: 33 C0 xor eax, eax
14: 5F pop edi
15: 8B E5 mov esp, ebp
16: 5D pop ebp
17: C2 0C 00 retn 0Ch
18: loc_10001C88:
19: 33 C0 xor eax, eax
20: B9 49 00 00 00 mov ecx, 49h
21: 8D BD D4 FE FF+ lea edi, [ebp-12Ch]
22: C7 85 D0 FE FF+ mov dword ptr [ebp-130h], 0
23: 50 push eax
24: 6A 02 push 2
25: F3 AB rep stosd
26: E8 2D 2F 00 00 call CreateToolhelp32Snapshot
27: 8B F8 mov edi, eax
28: 83 FF FF cmp edi, 0FFFFFFFFh
29: 75 09 jnz short loc_10001CB9 (line 35)
30: 33 C0 xor eax, eax
31: 5F pop edi
32: 8B E5 mov esp, ebp
33: 5D pop ebp
34: C2 0C 00 retn 0Ch
35: loc_10001CB9:
36: 8D 85 D0 FE FF+ lea eax, [ebp-130h]
37: 56 push esi
38: 50 push eax
39: 57 push edi
40: C7 85 D0 FE FF+ mov dword ptr [ebp-130h], 128h
41: E8 FF 2E 00 00 call Process32First
42: 85 C0 test eax, eax
43: 74 4F jz short loc_10001D24 (line 70)
44: 8B 35 C0 50 00+ mov esi, ds:_stricmp
45: 8D 8D F4 FE FF+ lea ecx, [ebp-10Ch]
46: 68 50 7C 00 10 push 10007C50h
47: 51 push ecx
48: FF D6 call esi
49: 83 C4 08 add esp, 8
50: 85 C0 test eax, eax
51: 74 26 jz short loc_10001D16 (line 66)
52: loc_10001CF0:
53: 8D 95 D0 FE FF+ lea edx, [ebp-130h]
54: 52 push edx
55: 57 push edi
56: E8 CD 2E 00 00 call Process32Next
57: 85 C0 test eax, eax
58: 74 23 jz short loc_10001D24 (line 70)
59: 8D 85 F4 FE FF+ lea eax, [ebp-10Ch]
60: 68 50 7C 00 10 push 10007C50h
61: 50 push eax
62: FF D6 call esi
63: 83 C4 08 add esp, 8
64: 85 C0 test eax, eax
65: 75 DA jnz short loc_10001CF0 (line 52)
66: loc_10001D16:
67: 8B 85 E8 FE FF+ mov eax, [ebp-118h]
68: 8B 8D D8 FE FF+ mov ecx, [ebp-128h]
69: EB 06 jmp short loc_10001D2A (line 73)
70: loc_10001D24:
71: 8B 45 0C mov eax, [ebp+0Ch]
72: 8B 4D 0C mov ecx, [ebp+0Ch]
73: loc_10001D2A:
74: 3B C1 cmp eax, ecx
75: 5E pop esi
76: 75 09 jnz short loc_10001D38 (line 82)
77: 33 C0 xor eax, eax
78: 5F pop edi
79: 8B E5 mov esp, ebp
80: 5D pop ebp
81: C2 0C 00 retn 0Ch
82: loc_10001D38:
83: 8B 45 0C mov eax, [ebp+0Ch]
84: 48 dec eax
85: 75 15 jnz short loc_10001D53 (line 93)
86: 6A 00 push 0
87: 6A 00 push 0
88: 6A 00 push 0
89: 68 D0 32 00 10 push 100032D0h
90: 6A 00 push 0
91: 6A 00 push 0
92: FF 15 20 50 00+ call ds:CreateThread
93: loc_10001D53:
94: B8 01 00 00 00 mov eax, 1
95: 5F pop edi
96: 8B E5 mov esp, ebp
97: 5D pop ebp
98: C2 0C 00 retn 0Ch
99: _DllMain#12 endp
So lines 70-74 make no sense on their own, but do serve the original purpose - if either Process32First()/Process32Next() returns FALSE then the code jumps here and eventually exits with 0.
And if the desired process was found then eax/ecx are set to ParentProcessID/ProcessID respectively so the function will continue.
Anyway, there are also lines 83-85, about which the book states:
...with lpStartAddress as 0x100032D0. This block can be decompiled as follows:
if (fdwReason == DLL_PROCESS_DETACH) { return FALSE; }
if (fdwReason == DLL_THREAD_ATTACH || fdwReason == DLL_THREAD_DETACH) { return TRUE; }
CreateThread(0, 0, (LPTHREAD_START_ROUTINE) 0x100032D0, 0, 0, 0);
return TRUE;
Lines 83-85 actually check if fdwReason equals DLL_PROCESS_ATTACH or not (bypassing the call to CreateThread if not, which makes perfect sense), and there's no special case for DLL_PROCESS_DETACH.
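To make the reading above concrete, here is a rough Python model of lines 66-98 of the listing. It is only a sketch of my interpretation, not decompiler output: the function name dllmain_tail and its parameters are invented for illustration, and the boolean found stands for "the _stricmp loop matched a process" (the path that reaches line 66) as opposed to Process32First/Process32Next returning FALSE (the path that reaches line 70).

```python
# fdwReason values from the Windows headers.
DLL_PROCESS_DETACH = 0
DLL_PROCESS_ATTACH = 1
DLL_THREAD_ATTACH = 2
DLL_THREAD_DETACH = 3

def dllmain_tail(found, parent_pid, pid, fdw_reason):
    """Model of listing lines 66-98.

    Returns (return_value, thread_created).
    """
    if found:
        # lines 67-68: th32ParentProcessID / th32ProcessID
        eax, ecx = parent_pid, pid
    else:
        # lines 71-72: both registers are loaded from [ebp+0Ch]
        eax = ecx = fdw_reason
    # lines 74-76: cmp eax, ecx / jnz (pop esi does not touch the flags)
    if eax == ecx:
        return 0, False              # lines 77-81: return FALSE
    # lines 83-85: mov eax, [ebp+0Ch] / dec eax / jnz
    if fdw_reason - 1 != 0:
        return 1, False              # lines 94-98: return TRUE, no thread
    return 1, True                   # lines 86-92: CreateThread, then return TRUE

# The failure path returns 0 for every fdwReason, not only DLL_PROCESS_DETACH.
for reason in (DLL_PROCESS_DETACH, DLL_PROCESS_ATTACH,
               DLL_THREAD_ATTACH, DLL_THREAD_DETACH):
    print(reason, dllmain_tail(False, 0, 0, reason))
```

In this model the only way to reach CreateThread is a matched process whose parent PID differs from its own PID, combined with fdwReason == DLL_PROCESS_ATTACH.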
I'll say that the book certainly lacks proper structure: some things it takes for granted, while other, perhaps mundane, things are emphasized. Still a very good resource.
Oh well, who said this was easy.

How can I insert repeated NOP statements using Visual C++'s inline assembler?

Visual C++, using Microsoft's compiler, allows us to define inline assembly code using:
__asm {
nop
}
What I need is a macro that makes it possible to repeat such an instruction n times, like:
ASM_EMIT_MULT(op, times)
for example:
ASM_EMIT_MULT(0x90, 160)
Is that possible? How could I do this?
With MASM, this is very simple to do. Part of the installation is a file named listing.inc (since everyone gets MASM as part of Visual Studio now, this will be located in your Visual Studio root directory/VC/include). This file defines a series of npad macros that take a single size argument and expand to an appropriate sequence of non-destructive "padding" opcodes. If you only need one byte of padding, you use the obvious nop instruction. But rather than using a long series of nops until you reach the desired length, Intel actually recommends other non-destructive opcodes of the appropriate length, as do other vendors. These pre-defined npad macros free you from having to memorize that table, not to mention making the code much more readable.
Unfortunately, inline assembly is not a full-featured assembler. There are a lot of things missing that you would expect to find in real assemblers like MASM. Macros (MACRO) and repeats (REPEAT/REPT) are among the things that are missing.
However, ALIGN directives are available in inline assembly. These will generate the required number of nops or other non-destructive opcodes to enforce alignment of the next instruction. Using this is drop-dead simple. Here is a very stupid example, where I've taken working code and peppered it with random aligns:
unsigned long CountDigits(unsigned long value)
{
__asm
{
mov edx, DWORD PTR [value]
bsr eax, edx
align 4
xor eax, 1073741792
mov eax, DWORD PTR [4 * eax + kMaxDigits+132]
align 16
cmp edx, DWORD PTR [4 * eax + kPowers-4]
sbb eax, 0
align 8
}
}
This generates the following output (MSVC's assembly listings use npad x, where x is the number of bytes, just as you'd write it in MASM):
PUBLIC CountDigits
_TEXT SEGMENT
_value$ = 8
CountDigits PROC
00000 8b 54 24 04 mov edx, DWORD PTR _value$[esp-4]
00004 0f bd c2 bsr eax, edx
00007 90 npad 1 ;// enforcing the "align 4"
00008 35 e0 ff ff 3f xor eax, 1073741792
0000d 8b 04 85 84 00
00 00 mov eax, DWORD PTR _kMaxDigits[eax*4+132]
00014 eb 0a 8d a4 24
00 00 00 00 8d
49 00 npad 12 ;// enforcing the "align 16"
00020 3b 14 85 fc ff
ff ff cmp edx, DWORD PTR _kPowers[eax*4-4]
00027 83 d8 00 sbb eax, 0
0002a 8d 9b 00 00 00
00 npad 6 ;// enforcing the "align 8"
00030 c2 04 00 ret 4
CountDigits ENDP
_TEXT ENDS
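The npad sizes in that listing follow directly from the offsets. As a small illustrative sketch (the function name align_padding is my own, not anything MSVC provides), the number of padding bytes is the distance from the current offset to the next multiple of the alignment:

```python
def align_padding(offset, alignment):
    """Bytes of padding needed so that offset becomes a multiple of alignment."""
    return -offset % alignment

# Offsets taken from the listing above, at the point of each align directive.
for offset, alignment in ((0x07, 4), (0x14, 16), (0x2A, 8)):
    pad = align_padding(offset, alignment)
    print(f"offset {offset:#04x}, align {alignment:2d} -> npad {pad}")
```

This also shows why align is a poor fit when you want a fixed number of filler bytes: the amount of padding depends on where the preceding code happens to end.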
If you aren't actually wanting to enforce alignment, but just want to insert an arbitrary number of nops (perhaps as filler for later hot-patching?), then you can use C macros to simulate the effect:
#define NOP1 __asm { nop }
#define NOP2 NOP1 NOP1
#define NOP4 NOP2 NOP2
#define NOP8 NOP4 NOP4
#define NOP16 NOP8 NOP8
// ...
#define NOP64 NOP16 NOP16 NOP16 NOP16
// ...etc.
And then pepper your code as desired:
unsigned long CountDigits(unsigned long value)
{
__asm
{
mov edx, DWORD PTR [value]
bsr eax, edx
NOP8
xor eax, 1073741792
mov eax, DWORD PTR [4 * eax + kMaxDigits+132]
NOP4
cmp edx, DWORD PTR [4 * eax + kPowers-4]
sbb eax, 0
}
}
to produce the following output:
PUBLIC CountDigits
_TEXT SEGMENT
_value$ = 8
CountDigits PROC
00000 8b 54 24 04 mov edx, DWORD PTR _value$[esp-4]
00004 0f bd c2 bsr eax, edx
00007 90 npad 1 ;// these are, of course, just good old NOPs
00008 90 npad 1
00009 90 npad 1
0000a 90 npad 1
0000b 90 npad 1
0000c 90 npad 1
0000d 90 npad 1
0000e 90 npad 1
0000f 35 e0 ff ff 3f xor eax, 1073741792
00014 8b 04 85 84 00
00 00 mov eax, DWORD PTR _kMaxDigits[eax*4+132]
0001b 90 npad 1
0001c 90 npad 1
0001d 90 npad 1
0001e 90 npad 1
0001f 3b 14 85 fc ff
ff ff cmp edx, DWORD PTR _kPowers[eax*4-4]
00026 83 d8 00 sbb eax, 0
00029 c2 04 00 ret 4
CountDigits ENDP
_TEXT ENDS
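The power-of-two macro family is mechanical enough to generate rather than type. The following Python sketch is only an illustration: the helper names are hypothetical, and the ASM_EMIT_MULT-style count of 160 comes from the question. It emits the #define lines and decomposes an arbitrary count into the macros to invoke.

```python
def nop_macro_definitions(max_power):
    """#define lines for NOP1, NOP2, NOP4, ... up to NOP(2**max_power)."""
    lines = ["#define NOP1 __asm { nop }"]
    for power in range(1, max_power + 1):
        half = 1 << (power - 1)
        lines.append(f"#define NOP{1 << power} NOP{half} NOP{half}")
    return lines

def nop_invocation(count):
    """Macro invocations that together expand to exactly count NOPs."""
    parts = []
    bit = 1
    while count:
        if count & 1:                # this power of two is part of count
            parts.append(f"NOP{bit}")
        count >>= 1
        bit <<= 1
    return " ".join(reversed(parts))

count = 160
max_power = count.bit_length() - 1   # largest macro needed is NOP(2**max_power)
print("\n".join(nop_macro_definitions(max_power)))
print(nop_invocation(count))
```

For 160 this asks for NOP128 followed by NOP32, so the generated header needs definitions up to NOP128.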
Or, even cooler, we can use a bit of template meta-programming magic to get the same effect in style. Just define the following template function and its specialization (important to prevent infinite recursion):
template <size_t N> __forceinline void npad()
{
npad<N-1>();
__asm { nop }
}
template <> __forceinline void npad<0>() { }
And use it like this:
unsigned long CountDigits(unsigned long value)
{
__asm
{
mov edx, DWORD PTR [value]
bsr eax, edx
}
npad<8>();
__asm
{
xor eax, 1073741792
mov eax, DWORD PTR [4 * eax + kMaxDigits+132]
}
npad<4>();
__asm
{
cmp edx, DWORD PTR [4 * eax + kPowers-4]
sbb eax, 0
}
}
That'll produce the desired output (exactly the same as the one just above) in all optimized builds—whether you optimize for size (/O1) or speed (/O2)—…but not in debugging builds. If you need it in debug builds, you'll have to resort to the C macros. :-(
Based on Cody Gray's answer and its code example for metaprogramming using template recursion with inline or __forceinline, as shown in the code above:
template <size_t N> __forceinline void npad()
{
npad<N-1>();
__asm { nop }
}
template <> __forceinline void npad<0>() { }
It won't work in Visual Studio without setting some options, and even then it is not guaranteed to work. As the documentation puts it:
Although __forceinline is a stronger indication to the compiler than __inline, inlining is still performed at the compiler's discretion, but no heuristics are used to determine the benefits from inlining this function.
You can read more about this here https://learn.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-4-c4714?view=vs-2019

ASM algorithm decoding

I am trying to understand this problem that is in ASM. Here is the code:
45 33 C9 xor r9d, r9d
C7 44 24 18 50 72 69 6D mov [rsp+arg_10], 6D697250h
66 C7 44 24 1C 65 53 mov [rsp+arg_14], 5365h
C6 44 24 1E 6F mov [rsp+arg_16], 6Fh
4C 63 C1 movsxd r8, ecx
85 C9 test ecx, ecx
7E 1C jle short locret_140001342
41 8B C9 mov ecx, r9d
loc_140001329:
48 83 F9 07 cmp rcx, 7
49 0F 4D C9 cmovge rcx, r9
48 FF C1 inc rcx
8A 44 0C 17 mov al, [rsp+rcx+arg_F]
30 02 xor [rdx], al
48 FF C2 inc rdx
49 FF C8 dec r8
75 E7 jnz short loc_140001329
locret_140001342:
C3 retn
And here is the encoded text:
07 1D 1E 41 45 2A 00 25 52 0D 04 01 73 06
24 53 49 39 0D 36 4F 35 1F 08 04 09 73 0E
34 16 1B 08 16 20 4F 39 01 49 4A 54 3D 1B
35 00 07 5C 53 0C 08 1E 38 11 2A 30 13 1F
22 1B 04 08 16 3C 41 33 1D 04 4A
I've been studying ASM for some time now and I know what most of the commands do, but I still have some questions I have not found the answer to.
How do I plug the encoded text into the algorithm?
What are arg_10, arg_14, etc.? I assume they are from the encoded part but I don't know exactly.
Could someone go through, line by line, what this algorithm does? I understand some of it but I need some clarification.
I have been using visual studio and c++ to test asm. I do know that to run an asm procedure you can declare a function like this
extern "C" int function(int a, int b, int c,int d, int f, int g);
and use it like this
printf("ASM Returned %d", function(92,2,3,4,5,6));
I am also aware that the first four parameters go into RCX, RDX, R8, and R9 and the rest are on the stack. I don't know much about the stack so I do not know how to access them right now. I also know that the returned value is the value contained in RAX. So something like this would add two numbers:
xor eax, eax
mov eax, ecx
add eax, edx
ret
So as Jester suggested, I will go line by line explaining what I think the code does.
xor r9d, r9d //xor on r9d (clears the register)
mov [rsp+arg_10], 6D697250h //moves 6D697250 to the address pointed at by rsp + arg_10
mov [rsp+arg_14], 5365h //moves 5365 to the address pointed at by rsp+arg_14
mov [rsp+arg_16], 6Fh //moves 6F to the address pointed at by rsp+arg_16
movsxd r8, ecx //moves ecx to r8 and sign-extends it, since ecx is 32 bit and r8 is 64 bit
test ecx, ecx //tests exc and sets the labels
jle short locret_140001342 //jumps to ret if ecx is zero or less
mov ecx, r9d //moves the lower 32 bits of r9 into ecx
loc_140001329: //label used by jump commands
cmp rcx, 7 //moves 7(decimal) into rcx
cmovge rcx, r9 //don't know
inc rcx //increases rcx by 1
mov al, [rsp + rcx + arg_F] //moves the value at address [rsp + rcx + arg_F] into al,
//this is probably the key step as al is 1 byte and each character is also one byte, it is also the rax register so it holds the value to be returned
xor [rdx], al //xor on the value at address [rdx] and al, stores the result at the address of [rdx]
inc rdx //increase rdx by 1
dec r8 //decrease r8 by 1
jnz short loc_140001329 //if r8 is not zero jump back to loc_140...
//this essentially is a while loop until r8 reaches 0 (assuming it starts as positive)
locret_140001342:
ret
I still don't know what the arg_xx are or how exactly is the encoded text plugged into this algorithm.
Here is my take on the code.
; rdx holds the message location
; ecx holds the message length
xor r9d, r9d ; r9d = 0
mov [rsp+arg_10], 6D697250h ; fix up the key
mov [rsp+arg_14], 5365h
mov [rsp+arg_16], 6Fh ; which is "PrimeSo"
movsxd r8, ecx ; length counter
test ecx, ecx ; test the message length
jle short locret_140001342 ; skip if invalid length
mov ecx, r9d ; reset key index to 0
loc_140001329:
cmp rcx, 7 ; check indexing of key
cmovge rcx, r9 ; reset if out of range
inc rcx ; obfuscate by incrementing first
mov al, [rsp+rcx+arg_F] ; ... and indexing wrong offset
xor [rdx], al ; encrypt the message byte
inc rdx ; advance message pointer
dec r8 ; loop count
jnz short loc_140001329 ; next message byte
locret_140001342:
retn
I decoded the message with a C program implementing the algorithm, but that would be too easy, so I won't post it.
Reverse engineering
The code does not contain enough information to solve it top-down, because some registers are used without being loaded, and labels are not defined. I solved it bottom-up, by identifying the instruction that does the encryption, and working out from there.
Although the stack labels are not defined, the nomenclature is enough of a clue to show that the parts of the key are actually consecutive, and the assumption of little-endian reveals the key. This is confirmed by looking at the hex byte tabulation, which shows the three values being stored at offsets whose low bytes are 18, 1C and 1E.
I think your understanding is largely correct, a few minor corrections:
Correction 1
test ecx, ecx //tests exc and sets the labels
This sets the flags (not the labels).
Correction 2
cmp rcx, 7 //moves 7(decimal) into rcx
This compares rcx to the immediate value 7, and sets the flags accordingly. (i.e. after this instruction, a conditional instruction with the g (greater) condition, such as jg or cmovg, will only take effect if rcx was greater than 7.)
Correction 3
cmovge rcx, r9 //don't know
This conditionally (based on the flags you have just set) moves r9 into rcx. The condition is ge, so this instruction only executes if rcx was greater than or equal to 7. r9 contains 0, so the effect of this is to set rcx back to 0 when it reaches 7.
Parameters
You are not given information on the parameters to the function, but it seems safe to assume that rcx is the original length of the data to be decrypted, and rdx is a pointer to the data.
One thing I noticed is that the values being stored at those stack offsets are ASCII:
>>> bytes.fromhex('5072696d65536f').decode('ascii')
'PrimeSo'
As for entering the data, you could use xxd -r -p and read it from stdin in the program: xxd -r -p data.hex | ./myprog
Those arg_14 etc. offsets have to be declared somewhere in the sources, but I would guess they're hex offsets 0xf, 0x10, 0x14, 0x16.
OK, I have figured out the algorithm and have made it work in ASM as well. You guys were right, the arg_xx were offsets. arg_10 == 0x10, arg_F == 0x0f. The data is passed in as an array along with its length. So rcx will be the data length, in this case 67, and rdx will point to the beginning of the array. Here is the function I used in C++ to call the ASM procedure.
extern "C" void function(int length, char* message);
The algorithm is pretty simple. The key phrase is "PrimeSo". All it does is do a XOR operation on each value passed in with one of the values in "PrimeSo" in increasing order, once it reaches the 'o' in "PrimeSo" it goes back to 'P'. Hence
cmp rcx, 7
cmovge rcx, r9 //as Peter de Rivaz stated this will put 0 into rcx if it is greater or equal to seven
inc rcx
and so
mov al, [rsp + rcx + 0Fh]
will effectively become [rsp + 1 + 0fh], [rsp + 2 + 0Fh], ..., [rsp + 7 + 0Fh]. Note that "PrimeSo" was stored at [rsp + 10h] meaning that [rsp + 1 + 0Fh] points to 'P'. In each iteration of the loop, al will become one of the characters in "PrimeSo" and it will cycle through them.
xor [rdx], al //This will do an xor operation on [rdx] (beginning of our message) and al, which is 'P' in the first loop.
//It will then store the result in its place.
inc rdx //move to next character
dec r8 //decrease counter
jnz short loc_140001329 //and start the loop again
With that being said, let's look at the first few.
xor P, 07 == xor 50, 07 --> 57 = W
xor r, 1D == xor 72, 1D --> 6F = o
xor i, 1E == xor 69, 1E --> 77 = w
xor m, 41 == xor 6D, 41 --> 2C = ,
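The same loop can be modelled outside of assembly. This is a minimal Python sketch of the algorithm as described in this thread, not the code anyone here actually ran: the names KEY, ENCODED and xor_with_key are mine, the key is rebuilt from the three immediates in the listing assuming little-endian stores, and the bytes are copied from the question.

```python
# Key rebuilt from the three mov immediates (stored little-endian, back to back).
KEY = (
    (0x6D697250).to_bytes(4, "little")   # mov [rsp+arg_10], 6D697250h
    + (0x5365).to_bytes(2, "little")     # mov [rsp+arg_14], 5365h
    + bytes([0x6F])                      # mov [rsp+arg_16], 6Fh
)

# Encoded text copied from the question.
ENCODED = bytes.fromhex(
    "07 1D 1E 41 45 2A 00 25 52 0D 04 01 73 06 "
    "24 53 49 39 0D 36 4F 35 1F 08 04 09 73 0E "
    "34 16 1B 08 16 20 4F 39 01 49 4A 54 3D 1B "
    "35 00 07 5C 53 0C 08 1E 38 11 2A 30 13 1F "
    "22 1B 04 08 16 3C 41 33 1D 04 4A"
)

def xor_with_key(data, key=KEY):
    """XOR every byte of data with the key, cycling through the key."""
    out = bytearray(data)            # plays the role of the buffer behind rdx
    index = 0                        # plays the role of rcx
    for i in range(len(out)):        # r8 counts the bytes down in the original
        if index >= len(key):        # cmp rcx, 7 / cmovge rcx, r9
            index = 0
        index += 1                   # inc rcx
        out[i] ^= key[index - 1]     # mov al, [rsp+rcx+arg_F] / xor [rdx], al
    return bytes(out)

decoded = xor_with_key(ENCODED)
print(KEY.decode("ascii"))
print(decoded.decode("ascii"))
```

Because XOR is its own inverse, running the function on the decoded bytes gives the encoded bytes back, which is a convenient sanity check.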
For those wondering here is the C++ code:
#include <cstdio>  // printf
#include <cstdlib> // system
extern "C" void function(int length, char* message);
int main()
{
char message[] = { 0x07, 0x1D, 0x1E, 0x41, 0x45, 0x2A, 0x00, 0x25, 0x52, 0x0D, 0x04, 0x01, 0x73, 0x06, 0x24, 0x53, 0x49, 0x39, 0x0D, 0x36, 0x4F, 0x35, 0x1F, 0x08, 0x04, 0x09, 0x73, 0x0E, 0x34, 0x16, 0x1B, 0x08, 0x16, 0x20, 0x4F, 0x39, 0x01, 0x49, 0x4A, 0x54, 0x3D, 0x1B, 0x35, 0x00, 0x07, 0x5C, 0x53, 0x0C, 0x08, 0x1E, 0x38, 0x11, 0x2A, 0x30, 0x13, 0x1F, 0x22, 0x1B, 0x04, 0x08, 0x16, 0x3C, 0x41, 0x33, 0x1D, 0x04, 0x4A, '\0'};
function(sizeof(message) - 1, message);
printf("Decoded Message is:\n%s\n", message);
printf("\n");
system("pause");
return 0;
}
No I did not manually insert the data into message. Also note that I added a string terminator at the end and used sizeof(message) - 1 to avoid decoding the string terminator.
Here is the ASM code, this is simply a new file called assembly.asm and has this in it.
.code
function proc
xor r9d, r9d
mov dword ptr [rsp + 18h], 6D697250h
mov word ptr [rsp + 1Ch], 5365h
mov byte ptr [rsp + 1Eh], 6Fh
movsxd r8, ecx
test ecx, ecx
jle short locret_140001342
mov ecx, r9d
loc_140001329:
cmp rcx, 7
cmovge rcx, r9
inc rcx
mov al, [rsp + rcx + 17h]
xor [rdx], al
inc rdx
dec r8
jnz short loc_140001329
locret_140001342:
ret
function endp
end
In visual studio, you can add a breakpoint in here and go to debug->windows->registers and debug->windows->memory-memory 1 to see the registers and the program's memory. Note that rcx will contain the count, and rdx will point to the beginning of the encoded message.
Thank you all for your help and suggestions, I couldn't have done it without you.

_InterlockedIncrement intrinsic implementation

Visual Studio produces the following machine code when _InterlockedIncrement is used:
; 40 : _InterlockedIncrement(&framecounter);
00078 b8 00 00 00 00 mov eax, OFFSET ?framecounter##3JA ; framecounter
0007d b9 01 00 00 00 mov ecx, 1
00082 f0 0f c1 08 lock xadd DWORD PTR [eax], ecx
If I were writing this, I would use just lock inc DWORD PTR [eax] instead of mov and xadd.
Is there a valid reason why Microsoft preferred xadd and using 2 instructions instead of 1?
Because _InterlockedIncrement also returns the new value.
You can't do that with lock inc DWORD PTR [eax], because now neither the old nor the new value is anywhere to be found. Except in memory, but if you do another read, clearly it won't be atomic (the increment itself would be, but you could get a value back that has nothing to do with what happened at the time of the increment).
Returning the value makes _InterlockedIncrement more useful.
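The difference can be illustrated with a toy model. This Python sketch is an analogy only: AtomicCounter, xadd, interlocked_increment and run_increments are invented names, and a threading.Lock stands in for the lock prefix. The point is that xadd hands the old value back as part of the same atomic step, so the new value can be derived without a second read.

```python
import threading

class AtomicCounter:
    """Toy stand-in for a DWORD that is only touched by locked instructions."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def xadd(self, delta):
        """Like lock xadd: add delta and return the OLD value, atomically."""
        with self._lock:
            old = self._value
            self._value = old + delta
            return old

    def interlocked_increment(self):
        """New value, computed from what xadd handed back (no second read)."""
        return self.xadd(1) + 1

    def lock_inc(self):
        """Like lock inc: the increment is atomic, but nothing is returned."""
        with self._lock:
            self._value += 1

    def read(self):
        with self._lock:
            return self._value

def run_increments(threads=4, per_thread=500):
    """Increment from several threads and collect every returned value."""
    counter = AtomicCounter()
    results = []

    def worker():
        mine = [counter.interlocked_increment() for _ in range(per_thread)]
        results.extend(mine)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results, counter.read()

results, final = run_increments()
print(len(set(results)) == len(results), final)
```

Every caller gets a distinct value. With lock_inc followed by a separate read, two threads could both observe the same value after both increments have landed.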

WinExec returns 0x21, but what exactly does it mean?

When I call WinExec to run a .exe, I get the return value 0x21.
According to MSDN, a return value greater than 31 (0x1F) means the function succeeded.
But what does 0x21 mean? Why didn't it return some other value?
It is not useful for you to know what it means. That is an implementation detail. Even if you knew what it meant for this version, the meaning might change in the next version. As a programmer, you are concerned only with programming against the interface, not the underlying implementation.
However, if you are really interested, I will take you through the approach I would take to reverse engineer the function. On my system, WinExec is disassembled to this:
764F2C21 > 8BFF MOV EDI,EDI
764F2C23 55 PUSH EBP
764F2C24 8BEC MOV EBP,ESP
764F2C26 81EC 80000000 SUB ESP,80
764F2C2C 53 PUSH EBX
764F2C2D 8B5D 0C MOV EBX,DWORD PTR SS:[EBP+C]
764F2C30 56 PUSH ESI
764F2C31 57 PUSH EDI
764F2C32 33FF XOR EDI,EDI
764F2C34 47 INC EDI
764F2C35 33F6 XOR ESI,ESI
764F2C37 85DB TEST EBX,EBX
764F2C39 79 4F JNS SHORT kernel32.764F2C8A
764F2C3B 8D45 FC LEA EAX,DWORD PTR SS:[EBP-4]
764F2C3E 50 PUSH EAX
764F2C3F 56 PUSH ESI
764F2C40 57 PUSH EDI
764F2C41 8D45 C8 LEA EAX,DWORD PTR SS:[EBP-38]
764F2C44 50 PUSH EAX
764F2C45 C745 FC 20000000 MOV DWORD PTR SS:[EBP-4],20
764F2C4C E8 90BE0200 CALL <JMP.&API-MS-Win-Core-ProcessThread>
764F2C51 85C0 TEST EAX,EAX
764F2C53 0F84 D2000000 JE kernel32.764F2D2B
764F2C59 56 PUSH ESI
764F2C5A 56 PUSH ESI
764F2C5B 6A 04 PUSH 4
764F2C5D 8D45 F8 LEA EAX,DWORD PTR SS:[EBP-8]
764F2C60 50 PUSH EAX
764F2C61 68 01000600 PUSH 60001
764F2C66 56 PUSH ESI
764F2C67 8D45 C8 LEA EAX,DWORD PTR SS:[EBP-38]
764F2C6A 50 PUSH EAX
764F2C6B C745 0C 00000800 MOV DWORD PTR SS:[EBP+C],80000
764F2C72 897D F8 MOV DWORD PTR SS:[EBP-8],EDI
764F2C75 E8 5CBE0200 CALL <JMP.&API-MS-Win-Core-ProcessThread>
764F2C7A 85C0 TEST EAX,EAX
764F2C7C 0F84 95000000 JE kernel32.764F2D17
764F2C82 8D45 C8 LEA EAX,DWORD PTR SS:[EBP-38]
764F2C85 8945 C4 MOV DWORD PTR SS:[EBP-3C],EAX
764F2C88 EB 03 JMP SHORT kernel32.764F2C8D
764F2C8A 8975 0C MOV DWORD PTR SS:[EBP+C],ESI
764F2C8D 6A 44 PUSH 44
764F2C8F 8D45 80 LEA EAX,DWORD PTR SS:[EBP-80]
764F2C92 56 PUSH ESI
764F2C93 50 PUSH EAX
764F2C94 E8 B5E9F7FF CALL <JMP.&ntdll.memset>
764F2C99 83C4 0C ADD ESP,0C
764F2C9C 33C0 XOR EAX,EAX
764F2C9E 3975 0C CMP DWORD PTR SS:[EBP+C],ESI
764F2CA1 897D AC MOV DWORD PTR SS:[EBP-54],EDI
764F2CA4 0F95C0 SETNE AL
764F2CA7 66:895D B0 MOV WORD PTR SS:[EBP-50],BX
764F2CAB 8D0485 44000000 LEA EAX,DWORD PTR DS:[EAX*4+44]
764F2CB2 8945 80 MOV DWORD PTR SS:[EBP-80],EAX
764F2CB5 8D45 E8 LEA EAX,DWORD PTR SS:[EBP-18]
764F2CB8 50 PUSH EAX
764F2CB9 8D45 80 LEA EAX,DWORD PTR SS:[EBP-80]
764F2CBC 50 PUSH EAX
764F2CBD 56 PUSH ESI
764F2CBE 56 PUSH ESI
764F2CBF FF75 0C PUSH DWORD PTR SS:[EBP+C]
764F2CC2 56 PUSH ESI
764F2CC3 56 PUSH ESI
764F2CC4 56 PUSH ESI
764F2CC5 FF75 08 PUSH DWORD PTR SS:[EBP+8]
764F2CC8 56 PUSH ESI
764F2CC9 E8 A4E3F7FF CALL kernel32.CreateProcessA
764F2CCE 85C0 TEST EAX,EAX
764F2CD0 74 27 JE SHORT kernel32.764F2CF9
764F2CD2 A1 3C005476 MOV EAX,DWORD PTR DS:[7654003C]
764F2CD7 3BC6 CMP EAX,ESI
764F2CD9 74 0A JE SHORT kernel32.764F2CE5
764F2CDB 68 30750000 PUSH 7530
764F2CE0 FF75 E8 PUSH DWORD PTR SS:[EBP-18]
764F2CE3 FFD0 CALL EAX
764F2CE5 FF75 E8 PUSH DWORD PTR SS:[EBP-18]
764F2CE8 8B35 A0054776 MOV ESI,DWORD PTR DS:[<&ntdll.NtClose>] ; ntdll.ZwClose
764F2CEE FFD6 CALL ESI
764F2CF0 FF75 EC PUSH DWORD PTR SS:[EBP-14]
764F2CF3 FFD6 CALL ESI
764F2CF5 6A 21 PUSH 21
764F2CF7 EB 1D JMP SHORT kernel32.764F2D16
764F2CF9 E8 C9E4F7FF CALL <JMP.&API-MS-Win-Core-ErrorHandling>
764F2CFE 48 DEC EAX
764F2CFF 48 DEC EAX
764F2D00 74 12 JE SHORT kernel32.764F2D14
764F2D02 48 DEC EAX
764F2D03 74 0B JE SHORT kernel32.764F2D10
764F2D05 2D BE000000 SUB EAX,0BE
764F2D0A 75 0B JNZ SHORT kernel32.764F2D17
764F2D0C 6A 0B PUSH 0B
764F2D0E EB 06 JMP SHORT kernel32.764F2D16
764F2D10 6A 03 PUSH 3
764F2D12 EB 02 JMP SHORT kernel32.764F2D16
764F2D14 6A 02 PUSH 2
764F2D16 5E POP ESI
764F2D17 F745 0C 00000800 TEST DWORD PTR SS:[EBP+C],80000
764F2D1E 74 09 JE SHORT kernel32.764F2D29
764F2D20 8D45 C8 LEA EAX,DWORD PTR SS:[EBP-38]
764F2D23 50 PUSH EAX
764F2D24 E8 A2BD0200 CALL <JMP.&API-MS-Win-Core-ProcessThread>
764F2D29 8BC6 MOV EAX,ESI
764F2D2B 5F POP EDI
764F2D2C 5E POP ESI
764F2D2D 5B POP EBX
764F2D2E C9 LEAVE
764F2D2F C2 0800 RETN 8
The calling convention used under Win32 is stdcall, which mandates that return values be held in EAX. In the case of WinExec, there is only one exit from the function (0x764F2D2F). Tracing back from there, EAX is set by (at least when the return is 0x21):
764F2D29 8BC6 MOV EAX,ESI
Tracing back further, ESI itself is set from POP ESI which pops the top of the stack into ESI. The value of this is dependent on what was previously pushed on the stack. In the case of 0x21, this happens at:
764F2CF5 6A 21 PUSH 21
Immediately afterwards, a JMP is made to the POP ESI. How we got to the PUSH 21 is interesting only from after the CreateProcess call.
764F2CC9 E8 A4E3F7FF CALL kernel32.CreateProcessA
764F2CCE 85C0 TEST EAX,EAX
764F2CD0 74 27 JE SHORT kernel32.764F2CF9
764F2CD2 A1 3C005476 MOV EAX,DWORD PTR DS:[7654003C]
764F2CD7 3BC6 CMP EAX,ESI
764F2CD9 74 0A JE SHORT kernel32.764F2CE5
764F2CDB 68 30750000 PUSH 7530
764F2CE0 FF75 E8 PUSH DWORD PTR SS:[EBP-18]
764F2CE3 FFD0 CALL EAX
764F2CE5 FF75 E8 PUSH DWORD PTR SS:[EBP-18]
764F2CE8 8B35 A0054776 MOV ESI,DWORD PTR DS:[<&ntdll.NtClose>] ; ntdll.ZwClose
764F2CEE FFD6 CALL ESI
764F2CF0 FF75 EC PUSH DWORD PTR SS:[EBP-14]
764F2CF3 FFD6 CALL ESI
764F2CF5 6A 21 PUSH 21
To see how the path will take you to the PUSH 21, observe different branches. The first occurs as:
764F2CD0 74 27 JE SHORT kernel32.764F2CF9
This is saying if CreateProcess returned 0, call Win-Core-ErrorHandling. The return values are then set differently (0x2, 0x3 and 0xB are all possible return values if CreateProcess failed).
The next branch is a lot less obvious to reverse engineer:
764F2CD9 74 0A JE SHORT kernel32.764F2CE5
What it does is read a memory address which probably contains a function pointer (we know this because the result of the read is called later on). This JE simply indicates whether or not to make this call at all. Regardless of whether the call is made, the next step is to call ZwClose (twice). Finally 0x21 is returned.
So one simple way of looking at it is that when CreateProcess succeeds, 0x21 is returned, otherwise 0x2, 0x3 or 0xB are returned. This is not to say these are the only return values. For example, 0x0 can also be returned from the branch at 0x764F2C53 (in this case, ESI is not used in the same way at all). There are a few more possible return values but I will leave those for you to look into yourself.
What I've shown you is how to do a very shallow analysis of WinExec specifically for the 0x21 return. If you want to find out more, you need to poke around more in-depth and try to understand from a higher level what is going on. You'll be able to find out a lot more just by breakpointing the function and stepping through it (this way you can observe data values).
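As a summary of the branches traced so far, here is a small Python model of the code between the CreateProcessA call and the final MOV EAX,ESI. It is a sketch of my reading of this one disassembly, not a specification: the function name winexec_return is made up, I am assuming the imported call at 764F2CF9 is GetLastError, and the named constants are the standard winerror.h values for those numbers.

```python
ERROR_FILE_NOT_FOUND = 2
ERROR_PATH_NOT_FOUND = 3
ERROR_BAD_EXE_FORMAT = 193           # 3 + 0xBE, the value the SUB tests for

def winexec_return(create_process_ok, last_error=0):
    """Value left in EAX, following the listing from 764F2CCE onwards."""
    if create_process_ok:
        return 0x21                  # 764F2CF5: PUSH 21, later POP ESI
    eax = last_error                 # 764F2CF9: assumed to be GetLastError
    eax -= 2                         # DEC EAX / DEC EAX
    if eax == 0:
        return 0x02                  # 764F2D14: PUSH 2
    eax -= 1                         # DEC EAX
    if eax == 0:
        return 0x03                  # 764F2D10: PUSH 3
    eax -= 0xBE                      # SUB EAX,0BE
    if eax == 0:
        return 0x0B                  # 764F2D0C: PUSH 0B
    # 764F2D0A: JNZ skips the POP ESI, so ESI still holds the 0
    # from XOR ESI,ESI near the top of the function.
    return 0x00

for err in (ERROR_FILE_NOT_FOUND, ERROR_PATH_NOT_FOUND, ERROR_BAD_EXE_FORMAT, 5):
    print(err, hex(winexec_return(False, err)))
print(hex(winexec_return(True)))
```

Other builds of kernel32 may map errors differently, which is one more reason to treat anything above 31 simply as success.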
One other way is to look at the Wine source, where someone has already done all the hard work for you:
UINT WINAPI WinExec( LPCSTR lpCmdLine, UINT nCmdShow )
{
PROCESS_INFORMATION info;
STARTUPINFOA startup;
char *cmdline;
UINT ret;
memset( &startup, 0, sizeof(startup) );
startup.cb = sizeof(startup);
startup.dwFlags = STARTF_USESHOWWINDOW;
startup.wShowWindow = nCmdShow;
/* cmdline needs to be writable for CreateProcess */
if (!(cmdline = HeapAlloc( GetProcessHeap(), 0, strlen(lpCmdLine)+1 ))) return 0;
strcpy( cmdline, lpCmdLine );
if (CreateProcessA( NULL, cmdline, NULL, NULL, FALSE,
0, NULL, NULL, &startup, &info ))
{
/* Give 30 seconds to the app to come up */
if (wait_input_idle( info.hProcess, 30000 ) == WAIT_FAILED)
WARN("WaitForInputIdle failed: Error %d\n", GetLastError() );
ret = 33;
/* Close off the handles */
CloseHandle( info.hThread );
CloseHandle( info.hProcess );
}
else if ((ret = GetLastError()) >= 32)
{
FIXME("Strange error set by CreateProcess: %d\n", ret );
ret = 11;
}
HeapFree( GetProcessHeap(), 0, cmdline );
return ret;
}
33d is 0x21 so this actually just confirms the fruits of our earlier analysis.
In regards to the reason 0x21 is returned, my guess is that perhaps there exists more internal documentation which makes it more useful in some way.
Other than that this means success, the meaning of the return value is not defined. Perhaps it was chosen such that legacy applications will work well with this particular value. One thing is certain: there are more important things to worry about!
http://msdn.microsoft.com/en-us/library/windows/desktop/ms687393(v=vs.85).aspx
EDIT: This answer is wrong because the OP's result is not an error code. I mistakenly thought it was said that it was an error code. I still think the practical info below can be useful, plus that it can be useful to see what a wrong assumption can lead to, so I let this answer stand.
If you have installed Visual Studio (full or express edition), then you have a tool called errlook, which uses the FormatMessage API function to tell you what an error code or HRESULT value means.
In this case,
The process cannot access the file because another process has locked a portion of the file.
You can do much of the same manually by looking in the <winerror.h> file. For example, type an #include of that in a C++ source file in Visual Studio, then right click and ask it to open the header. Where you find that
//
// MessageId: ERROR_LOCK_VIOLATION
//
// MessageText:
//
// The process cannot access the file because another process has locked a portion of the file.
//
#define ERROR_LOCK_VIOLATION 33L
By the way, WinExec is just an old compatibility function. Preferably use ShellExecute or CreateProcess. The ShellExecute function is able to play more nicely with Windows Vista and 7 User Account Control, and it is simpler to use, so it is generally preferable.

Resources