How can I insert repeated NOP statements using Visual C++'s inline assembler?

Visual C++, using Microsoft's compiler, allows us to define inline assembly code using:
__asm {
nop
}
What I need is a macro that makes it possible to repeat such an instruction n times, like:
ASM_EMIT_MULT(op, times)
for example:
ASM_EMIT_MULT(0x90, 160)
Is that possible? How could I do this?

With MASM, this is very simple to do. Part of the installation is a file named listing.inc (since everyone gets MASM as part of Visual Studio now, this will be located in your Visual Studio root directory/VC/include). This file defines a series of npad macros that take a single size argument and expand to an appropriate sequence of non-destructive "padding" opcodes. If you only need one byte of padding, you use the obvious nop instruction. But rather than using a long series of nops until you reach the desired length, Intel actually recommends other non-destructive opcodes of the appropriate length, as do other vendors. These pre-defined npad macros free you from having to memorize that table, not to mention making the code much more readable.
Unfortunately, inline assembly is not a full-featured assembler. There are a lot of things missing that you would expect to find in real assemblers like MASM. Macros (MACRO) and repeats (REPEAT/REPT) are among the things that are missing.
However, ALIGN directives are available in inline assembly. These will generate the required number of nops or other non-destructive opcodes to enforce alignment of the next instruction. Using this is drop-dead simple. Here is a very stupid example, where I've taken working code and peppered it with random aligns:
unsigned long CountDigits(unsigned long value)
{
__asm
{
mov edx, DWORD PTR [value]
bsr eax, edx
align 4
xor eax, 1073741792
mov eax, DWORD PTR [4 * eax + kMaxDigits+132]
align 16
cmp edx, DWORD PTR [4 * eax + kPowers-4]
sbb eax, 0
align 8
}
}
This generates the following output (MSVC's assembly listings use npad x, where x is the number of bytes, just as you'd write it in MASM):
PUBLIC CountDigits
_TEXT SEGMENT
_value$ = 8
CountDigits PROC
00000 8b 54 24 04 mov edx, DWORD PTR _value$[esp-4]
00004 0f bd c2 bsr eax, edx
00007 90 npad 1 ;// enforcing the "align 4"
00008 35 e0 ff ff 3f xor eax, 1073741792
0000d 8b 04 85 84 00
00 00 mov eax, DWORD PTR _kMaxDigits[eax*4+132]
00014 eb 0a 8d a4 24
00 00 00 00 8d
49 00 npad 12 ;// enforcing the "align 16"
00020 3b 14 85 fc ff
ff ff cmp edx, DWORD PTR _kPowers[eax*4-4]
00027 83 d8 00 sbb eax, 0
0002a 8d 9b 00 00 00
00 npad 6 ;// enforcing the "align 8"
00030 c2 04 00 ret 4
CountDigits ENDP
_TEXT ENDS
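The npad sizes in the listing above can be reproduced from the instruction offsets alone. Here is a minimal sketch of that arithmetic; the helper name padding_for is made up for illustration and is not part of MSVC or MASM:

```cpp
// Bytes of padding needed to advance `offset` to the next multiple of
// `alignment`. Hypothetical helper; `alignment` is assumed to be non-zero.
constexpr unsigned padding_for(unsigned offset, unsigned alignment) {
    return (alignment - offset % alignment) % alignment;
}
```

For the three align directives in the example, padding_for(0x07, 4), padding_for(0x14, 16) and padding_for(0x2a, 8) give the 1, 12 and 6 bytes shown as npad 1, npad 12 and npad 6.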
If you aren't actually wanting to enforce alignment, but just want to insert an arbitrary number of nops (perhaps as filler for later hot-patching?), then you can use C macros to simulate the effect:
#define NOP1 __asm { nop }
#define NOP2 NOP1 NOP1
#define NOP4 NOP2 NOP2
#define NOP8 NOP4 NOP4
#define NOP16 NOP8 NOP8
// ...
#define NOP64 NOP16 NOP16 NOP16 NOP16
// ...etc.
And then pepper your code as desired:
unsigned long CountDigits(unsigned long value)
{
__asm
{
mov edx, DWORD PTR [value]
bsr eax, edx
NOP8
xor eax, 1073741792
mov eax, DWORD PTR [4 * eax + kMaxDigits+132]
NOP4
cmp edx, DWORD PTR [4 * eax + kPowers-4]
sbb eax, 0
}
}
to produce the following output:
PUBLIC CountDigits
_TEXT SEGMENT
_value$ = 8
CountDigits PROC
00000 8b 54 24 04 mov edx, DWORD PTR _value$[esp-4]
00004 0f bd c2 bsr eax, edx
00007 90 npad 1 ;// these are, of course, just good old NOPs
00008 90 npad 1
00009 90 npad 1
0000a 90 npad 1
0000b 90 npad 1
0000c 90 npad 1
0000d 90 npad 1
0000e 90 npad 1
0000f 35 e0 ff ff 3f xor eax, 1073741792
00014 8b 04 85 84 00
00 00 mov eax, DWORD PTR _kMaxDigits[eax*4+132]
0001b 90 npad 1
0001c 90 npad 1
0001d 90 npad 1
0001e 90 npad 1
0001f 3b 14 85 fc ff
ff ff cmp edx, DWORD PTR _kPowers[eax*4-4]
00026 83 d8 00 sbb eax, 0
00029 c2 04 00 ret 4
CountDigits ENDP
_TEXT ENDS
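The doubling macros also compose into counts that are not powers of two, such as the 160 from the question (160 = 128 + 32). The sketch below uses a hypothetical stand-in unit, EMIT1, that bumps a counter instead of emitting a nop, so the expansion count can be checked on any compiler; it assumes C++14 or later for the constexpr function body.

```cpp
// Hypothetical stand-in for `__asm { nop }`: increments the local counter `n`
// so the number of expansions can be verified without MSVC.
#define EMIT1   ++n;
#define EMIT2   EMIT1 EMIT1
#define EMIT4   EMIT2 EMIT2
#define EMIT8   EMIT4 EMIT4
#define EMIT16  EMIT8 EMIT8
#define EMIT32  EMIT16 EMIT16
#define EMIT64  EMIT32 EMIT32
#define EMIT128 EMIT64 EMIT64

// 160 = 128 + 32, so two of the power-of-two macros are enough.
#define EMIT160 EMIT128 EMIT32

// Expands the unit 160 times and returns how often it ran.
constexpr int count_emit160() {
    int n = 0;
    EMIT160
    return n;
}
```

Swapping the body of EMIT1 for the nop unit used in NOP1 should give the same count of NOPs under MSVC.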
Or, even cooler, we can use a bit of template meta-programming magic to get the same effect in style. Just define the following template function and its specialization (important to prevent infinite recursion):
template <size_t N> __forceinline void npad()
{
npad<N-1>();
__asm { nop }
}
template <> __forceinline void npad<0>() { }
And use it like this:
unsigned long CountDigits(unsigned long value)
{
__asm
{
mov edx, DWORD PTR [value]
bsr eax, edx
}
npad<8>();
__asm
{
xor eax, 1073741792
mov eax, DWORD PTR [4 * eax + kMaxDigits+132]
}
npad<4>();
__asm
{
cmp edx, DWORD PTR [4 * eax + kPowers-4]
sbb eax, 0
}
}
That'll produce the desired output (exactly the same as the one just above) in all optimized builds—whether you optimize for size (/O1) or speed (/O2)—…but not in debugging builds. If you need it in debug builds, you'll have to resort to the C macros. :-(
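To see why the npad<0> specialization is what stops the recursion, here is a portable model of the same template shape. It is only a sketch: unit_count is a made-up name, and each level returns a count instead of emitting a nop.

```cpp
#include <cstddef>

// Same recursion shape as npad<N>(): level N defers to level N-1 and then
// contributes one unit (here: +1 instead of `__asm { nop }`).
template <std::size_t N>
constexpr std::size_t unit_count() {
    return unit_count<N - 1>() + 1;
}

// Terminating specialization; without it, N - 1 would wrap around at zero
// and instantiation would never stop.
template <>
constexpr std::size_t unit_count<0>() {
    return 0;
}
```

unit_count<8>() evaluates to 8, mirroring the eight NOPs that npad<8>() contributes once every level has been inlined.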

Based on Cody Gray's answer and its metaprogramming example, which uses template recursion together with inline or __forceinline, as in the code above:
template <size_t N> __forceinline void npad()
{
npad<N-1>();
__asm { nop }
}
template <> __forceinline void npad<0>() { }
This won't work in Visual Studio without setting some compiler options, and even then it is not guaranteed to work. From the documentation:
Although __forceinline is a stronger indication to the compiler than __inline, inlining is still performed at the compiler's discretion, but no heuristics are used to determine the benefits from inlining this function.
You can read more about this here: https://learn.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-4-c4714?view=vs-2019

Related

Why did additional pointer arguments disappear in assembly?

C Code:
void PtrArg1(int* a,int* b,int* c, int* d, int* e, int* f)
{
return;
}
void PtrArg2(int* a,int* b,int* c, int* d, int* e, int* f, int* g, int* h)
{
return;
}
Compiling with
gcc -c -m64 -o basics basics.c -O0
Running
objdump -d basics -M intel -r
then results in the following disassembly (Intel syntax):
000000000000000b <PtrArg1>:
b: f3 0f 1e fa endbr64
f: 55 push rbp
10: 48 89 e5 mov rbp,rsp
13: 48 89 7d f8 mov QWORD PTR [rbp-0x8],rdi
17: 48 89 75 f0 mov QWORD PTR [rbp-0x10],rsi
1b: 48 89 55 e8 mov QWORD PTR [rbp-0x18],rdx
1f: 48 89 4d e0 mov QWORD PTR [rbp-0x20],rcx
23: 4c 89 45 d8 mov QWORD PTR [rbp-0x28],r8
27: 4c 89 4d d0 mov QWORD PTR [rbp-0x30],r9
2b: 90 nop
2c: 5d pop rbp
2d: c3 ret
000000000000002e <PtrArg2>:
2e: f3 0f 1e fa endbr64
32: 55 push rbp
33: 48 89 e5 mov rbp,rsp
36: 48 89 7d f8 mov QWORD PTR [rbp-0x8],rdi
3a: 48 89 75 f0 mov QWORD PTR [rbp-0x10],rsi
3e: 48 89 55 e8 mov QWORD PTR [rbp-0x18],rdx
42: 48 89 4d e0 mov QWORD PTR [rbp-0x20],rcx
46: 4c 89 45 d8 mov QWORD PTR [rbp-0x28],r8
4a: 4c 89 4d d0 mov QWORD PTR [rbp-0x30],r9
4e: 90 nop
4f: 5d pop rbp
50: c3 ret
The number of arguments differs for PtrArg1 and PtrArg2, but the assembly instructions are the same for both. Why?
This is due to the calling convention (System V AMD64 ABI, current version 1.0). The first six integer or pointer parameters are passed in registers (rdi, rsi, rdx, rcx, r8 and r9, in that order); all others are pushed onto the stack.
After executing till location PtrArg2+0x4e, you get the following stack layout:
+----------+-----------------+
| offset | content |
+----------+-----------------+
| rbp-0x30 | f |
| rbp-0x28 | e |
| rbp-0x20 | d |
| rbp-0x18 | c |
| rbp-0x10 | b |
| rbp-0x8 | a |
| rbp+0x0 | saved rbp value |
| rbp+0x8 | return address |
| rbp+0x10 | g |
| rbp+0x18 | h |
+----------+-----------------+
Since g and h are pushed by the caller, you get the same disassembly for both functions. For the caller
void Caller()
{
PtrArg2(1, 2, 3, 4, 5, 6, 7, 8);
}
(I omitted the necessary casts for clarity) we would get the following disassembly:
Caller():
push rbp
mov rbp, rsp
push 8
push 7
mov r9d, 6
mov r8d, 5
mov ecx, 4
mov edx, 3
mov esi, 2
mov edi, 1
call PtrArg2
add rsp, 16
nop
leave
ret
(see compiler explorer)
The parameters h = 8 and g = 7 are pushed onto the stack, before calling PtrArg2.
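The register/stack split described in this answer can be written down as a small lookup. This is a sketch under two assumptions: every argument is an integer or pointer, and the callee uses the push rbp / mov rbp, rsp prologue seen in the disassembly. The names kIntArgRegs, int_arg_register and stack_arg_offset are made up for illustration.

```cpp
// Registers used, in order, for the first six integer/pointer arguments
// under the System V AMD64 calling convention.
constexpr const char* kIntArgRegs[6] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

// Register name for the argument at 0-based `index`, or nullptr if that
// argument is passed on the stack instead.
constexpr const char* int_arg_register(int index) {
    return (index >= 0 && index < 6) ? kIntArgRegs[index] : nullptr;
}

// rbp-relative offset of a stack-passed argument, or -1 if the argument
// lives in a register. [rbp+0x0] is the saved rbp and [rbp+0x8] the return
// address, so the first stack argument sits at rbp+0x10.
constexpr int stack_arg_offset(int index) {
    return index < 6 ? -1 : 0x10 + 8 * (index - 6);
}
```

With a..h numbered 0..7, stack_arg_offset(6) and stack_arg_offset(7) give 0x10 and 0x18, matching the rows for g and h in the stack layout table.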
Disappear? What did you expect the function to do with them that the compiler would emit asm instructions to implement?
You literally return; as the only statement in a void function so there's nothing the function needs to do other than ret. If you compile with a normal level of optimization like -O2, that's all you'll get. Debug-mode code is usually not interesting to look at, and is full of redundant / useless stuff.
How to remove "noise" from GCC/clang assembly output?
The only reason you're seeing any instructions for some args is that you compiled in debug mode, i.e. the default optimization level of -O0, anti-optimized debug mode. Every C object (except register locals) has a memory address, and debug mode makes sure that the value is actually there in memory before/after every C statement. This means spilling register args to the stack on function entry. Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
The x86-64 System V ABI's calling convention passes the first 6 integer args in registers, the rest on the stack. The stack args already have memory addresses; the compiler doesn't emit code to copy them down next to other local vars below the return address; that would be pointless. The callee "owns" its own stack args, i.e. it can store new values to the stack space where the caller wrote the args, so that space can be the true address of args even if the function were to modify them.

Is tooling available to 'assemble' WebAssembly to x86-64 native code?

I am guessing that a Wasm binary is usually JIT-compiled to native code, but given a Wasm source, is there a tool to see the actual generated x86-64 machine code?
Or asked in a different way, is there a tool that consumes Wasm and outputs native code?
The online WasmExplorer compiles C code to both WebAssembly and Firefox x86, using the SpiderMonkey compiler. Given the following simple function:
int testFunction(int* input, int length) {
int sum = 0;
for (int i = 0; i < length; ++i) {
sum += input[i];
}
return sum;
}
Here is the x86 output:
wasm-function[0]:
sub rsp, 8 ; 0x000000 48 83 ec 08
cmp esi, 1 ; 0x000004 83 fe 01
jge 0x14 ; 0x000007 0f 8d 07 00 00 00
0x00000d:
xor eax, eax ; 0x00000d 33 c0
jmp 0x26 ; 0x00000f e9 12 00 00 00
0x000014:
xor eax, eax ; 0x000014 33 c0
0x000016: ; 0x000016 from: [0x000024]
mov ecx, dword ptr [r15 + rdi] ; 0x000016 41 8b 0c 3f
add eax, ecx ; 0x00001a 03 c1
add edi, 4 ; 0x00001c 83 c7 04
add esi, -1 ; 0x00001f 83 c6 ff
test esi, esi ; 0x000022 85 f6
jne 0x16 ; 0x000024 75 f0
0x000026:
nop ; 0x000026 66 90
add rsp, 8 ; 0x000028 48 83 c4 08
ret
You can view this example online.
WasmExplorer compiles code into wasm / x86 via a service. You can see the scripts that it runs on GitHub, and you should be able to use them to construct a command-line tool yourself.

ASM algorithm decoding

I am trying to understand this problem that is in ASM. Here is the code:
45 33 C9 xor r9d, r9d
C7 44 24 18 50 72 69 6D mov [rsp+arg_10], 6D697250h
66 C7 44 24 1C 65 53 mov [rsp+arg_14], 5365h
C6 44 24 1E 6F mov [rsp+arg_16], 6Fh
4C 63 C1 movsxd r8, ecx
85 C9 test ecx, ecx
7E 1C jle short locret_140001342
41 8B C9 mov ecx, r9d
loc_140001329:
48 83 F9 07 cmp rcx, 7
49 0F 4D C9 cmovge rcx, r9
48 FF C1 inc rcx
8A 44 0C 17 mov al, [rsp+rcx+arg_F]
30 02 xor [rdx], al
48 FF C2 inc rdx
49 FF C8 dec r8
75 E7 jnz short loc_140001329
locret_140001342:
C3 retn
And here is the encoded text:
07 1D 1E 41 45 2A 00 25 52 0D 04 01 73 06
24 53 49 39 0D 36 4F 35 1F 08 04 09 73 0E
34 16 1B 08 16 20 4F 39 01 49 4A 54 3D 1B
35 00 07 5C 53 0C 08 1E 38 11 2A 30 13 1F
22 1B 04 08 16 3C 41 33 1D 04 4A
I've been studying ASM for some time now and I know what most of the instructions do, but I still have some questions I have not found the answer to.
How do I plug the encoded text into the algorithm?
What are arg_10, arg_14, etc.? I assume they come from the encoded part, but I don't know exactly.
Could someone go through, line by line, what this algorithm does? I understand some of it but I need some clarification.
I have been using Visual Studio and C++ to test ASM. I do know that to run an ASM procedure you can declare a function like this
extern "C" int function(int a, int b, int c,int d, int f, int g);
and use it like this
printf("ASM Returned %d", function(92,2,3,4,5,6));
I am also aware that the first four integer parameters go into RCX, RDX, R8, and R9 and the rest are on the stack. I don't know much about the stack, so I do not know how to access them right now. I also know that the returned value is the value contained in RAX. So something like this would add two numbers:
xor eax, eax
mov eax, ecx
add eax, edx
ret
So as Jester suggested, I will go line by line explaining what I think the code does.
xor r9d, r9d //xor on r9d (clears the register)
mov [rsp+arg_10], 6D697250h //moves 6D697250 to the address pointed at by rsp + arg_10
mov [rsp+arg_14], 5365h //moves 5365 to the adress pointed at by rsp+arg_14
mov [rsp+arg_16], 6Fh //moves 6F to the adress pointed at by rsp+arg_16
movsxd r8, ecx //moves ecx, to r8 and sign extends it since exc is 32 bit and r8 is 64 bit
test ecx, ecx //tests exc and sets the labels
jle short locret_140001342 //jumps to ret if ecx is zero or less
mov ecx, r9d //moves the lower 32 bits or r9 into ecx
loc_140001329: //label used by jump commands
cmp rcx, 7 //moves 7(decimal) into rcx
cmovge rcx, r9 //don't know
inc rcx //increases rcx by 1
mov al, [rsp + rcx + arg_F] //moves the the value at adress [rsp + rcx + arg_F] into al,
//this is probably the key step as al is 1 byte and each character is also one byte, it is also the rax register so it holds the value to be returned
xor [rdx], al //xor on the value at address [rdx] and al, stores the result at the address of [rdx]
inc rdx //increase rdx by 1
dec r8 //decrease r8 by 1
jnz short loc_140001329 //if r8 is not zero jump back to loc_140...
//this essentially is a while loop until r8 reaches 0 (assuming it starts as positive)
locret_140001342:
ret
I still don't know what the arg_xx are or how exactly is the encoded text plugged into this algorithm.
Here is my take on the code.
; rdx holds the message location
; ecx holds the message length
xor r9d, r9d ; r9d = 0
mov [rsp+arg_10], 6D697250h ; fix up the key
mov [rsp+arg_14], 5365h
mov [rsp+arg_16], 6Fh ; which is "PrimeSo"
movsxd r8, ecx ; length counter
test ecx, ecx ; test the message length
jle short locret_140001342 ; skip if invalid length
mov ecx, r9d ; reset key index to 0
loc_140001329:
cmp rcx, 7 ; check indexing of key
cmovge rcx, r9 ; reset if o/range
inc rcx ; obfuscate by incrementing first
mov al, [rsp+rcx+arg_F] ; ... and indexing wrong offset
xor [rdx], al ; encrypt the message byte
inc rdx ; advance message pointer
dec r8 ; loop count
jnz short loc_140001329 ; next message byte
locret_140001342:
retn
I decoded the message with a C program implementing the algorithm, but that would be too easy, so I won't post it.
Reverse engineering
The code does not contain enough information to solve it top-down, because some registers are used without being loaded, and labels are not defined. I solved it bottom-up, by identifying the instruction that does the encryption, and working out from there.
Although the stack labels are not defined, the nomenclature is enough of a clue to show that the parts of the key are actually consecutive, and the assumption of little-endian storage reveals the key. This is confirmed by the hex byte tabulation, which shows the three values being stored at displacements 18, 1C and 1E.
I think your understanding is largely correct, a few minor corrections:
Correction 1
test ecx, ecx //tests exc and sets the labels
This sets the flags (not the labels).
Correction 2
cmp rcx, 7 //moves 7(decimal) into rcx
This compares rcx to the immediate value 7, and sets the flags accordingly. (i.e. after this instruction a conditional instruction with the "greater" condition, such as jg or cmovg, will only take effect if rcx was greater than 7.)
Correction 3
cmovge rcx, r9 //don't know
This conditionally (based on the flags you have just set) moves r9 into rcx. The condition is ge, so this instruction only executes if rcx was greater than or equal to 7. r9 contains 0, so the effect of this is to set rcx back to 0 when it reaches 7.
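The three instructions cmp rcx, 7 / cmovge rcx, r9 / inc rcx can be modeled as one small function. This is only a sketch of the register arithmetic, assuming r9 holds 0 as set up by the xor at the top; the name next_key_index is made up.

```cpp
// One pass through `cmp rcx, 7` / `cmovge rcx, r9` / `inc rcx` with r9 == 0:
// reset rcx to 0 once it has reached 7, then increment it.
constexpr long long next_key_index(long long rcx) {
    return (rcx >= 7 ? 0 : rcx) + 1;
}
```

Starting from 0, repeated application yields 1, 2, ..., 7 and then 1 again, so the value used to index the stack cycles through seven positions.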
Parameters
You are not given information on the parameters to the function, but it seems safe to assume that rcx is the original length of the data to be decrypted, and rdx is a pointer to the data.
one thing I noticed is that the values being stored at those stack offsets are ASCII:
>>> '5072696d65536f'.decode('hex')
'PrimeSo'
as for entering the data, you could use xxd -r -p and read it from stdin in the program: xxd -r -p data.hex | ./myprog
those arg_14 etc. offsets have to be declared somewhere in the sources. but I would guess they're hex offsets 0xf, 0x10, 0x14, 0x16.
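The reason the three immediates spell out the key is the little-endian byte order: the lowest byte of each value lands at the lowest address. A minimal sketch that picks the bytes apart (byte_at is a made-up helper name):

```cpp
#include <cstdint>

// Byte number `i` of `value` as it is laid out in little-endian memory:
// i == 0 is the least significant byte, stored at the lowest address.
constexpr unsigned char byte_at(std::uint32_t value, unsigned i) {
    return static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
}
```

Applied to 6D697250h, 5365h and 6Fh in address order, this gives 'P', 'r', 'i', 'm', then 'e', 'S', and finally 'o'.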
OK, I have figured out the algorithm and have made it work in ASM as well. You were right, the arg_xx names are offsets: arg_10 == 0x10, arg_F == 0x0F. Going by the instruction bytes (44 24 18 and 44 0C 17), the assembled displacements are 8 bytes higher, 18h and 17h, which is what the ASM file further down uses. The data is passed in as an array together with its length, so rcx holds the data length (67 in this case) and rdx points to the beginning of the array. Here is the declaration I used in C++ to call the ASM procedure.
extern "C" void function(int length, char* message);
The algorithm is pretty simple. The key phrase is "PrimeSo". All it does is XOR each byte passed in with one of the characters of "PrimeSo", in increasing order; once it reaches the 'o' in "PrimeSo" it goes back to 'P'. Hence
cmp rcx, 7
cmovge rcx, r9 //as Peter de Rivaz stated this will put 0 into rcx if it is greater or equal to seven
inc rcx
and so
mov al, [rsp + rcx + 0Fh]
will effectively become [rsp + 1 + 0fh], [rsp + 2 + 0Fh], ..., [rsp + 7 + 0Fh]. Note that "PrimeSo" was stored at [rsp + 10h] meaning that [rsp + 1 + 0Fh] points to 'P'. In each iteration of the loop, al will become one of the characters in "PrimeSo" and it will cycle through them.
xor [rdx], al //XORs the byte at [rdx] (the beginning of our message) with al, which is 'P' in the first iteration,
//and stores the result in its place.
inc rdx //move to next character
dec r8 //decrease counter
jnz short loc_140001329 //and start the loop again
With that being said lets look at the first few ones.
xor P, 07 == xor 50, 07 --> 57 = W
xor r, 1D == xor 72, 1D --> 6F = o
xor i, 1E == xor 69, 1E --> 77 = w
xor m, 41 == xor 6D, 41 --> 2C = ,
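The same computation can be sketched in plain C++ without the ASM procedure. This is an illustrative equivalent of the loop rather than the code from the original binary; the names kKey, decode_byte and xor_decode are made up.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The seven key bytes written to the stack by the three mov instructions.
constexpr char kKey[] = "PrimeSo";

// XOR one message byte with the key byte for position `i`; the key repeats
// every 7 bytes, as the cmp/cmovge/inc sequence arranges.
constexpr unsigned char decode_byte(unsigned char c, std::size_t i) {
    return static_cast<unsigned char>(c ^ static_cast<unsigned char>(kKey[i % 7]));
}

// Decode a whole buffer into a string.
std::string xor_decode(const std::vector<unsigned char>& data) {
    std::string out;
    out.reserve(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        out.push_back(static_cast<char>(decode_byte(data[i], i)));
    }
    return out;
}
```

Because XOR is its own inverse, running the output back through xor_decode returns the original bytes.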
For those wondering here is the C++ code:
#include <cstdio>  // printf
#include <cstdlib> // system
extern "C" void function(int length, char* message);
int main()
{
char message[] = { 0x07, 0x1D, 0x1E, 0x41, 0x45, 0x2A, 0x00, 0x25, 0x52, 0x0D, 0x04, 0x01, 0x73, 0x06, 0x24, 0x53, 0x49, 0x39, 0x0D, 0x36, 0x4F, 0x35, 0x1F, 0x08, 0x04, 0x09, 0x73, 0x0E, 0x34, 0x16, 0x1B, 0x08, 0x16, 0x20, 0x4F, 0x39, 0x01, 0x49, 0x4A, 0x54, 0x3D, 0x1B, 0x35, 0x00, 0x07, 0x5C, 0x53, 0x0C, 0x08, 0x1E, 0x38, 0x11, 0x2A, 0x30, 0x13, 0x1F, 0x22, 0x1B, 0x04, 0x08, 0x16, 0x3C, 0x41, 0x33, 0x1D, 0x04, 0x4A, '\0'};
function(sizeof(message) - 1, message);
printf("Decoded Message is:\n%s\n", message);
printf("\n");
system("pause");
return 0;
}
No I did not manually insert the data into message. Also note that I added a string terminator at the end and used sizeof(message) - 1 to avoid decoding the string terminator.
Here is the ASM code, this is simply a new file called assembly.asm and has this in it.
.code
function proc
xor r9d, r9d
mov dword ptr [rsp + 18h], 6D697250h
mov word ptr [rsp + 1Ch], 5365h
mov byte ptr [rsp + 1Eh], 6Fh
movsxd r8, ecx
test ecx, ecx
jle short locret_140001342
mov ecx, r9d
loc_140001329:
cmp rcx, 7
cmovge rcx, r9
inc rcx
mov al, [rsp + rcx + 17h]
xor [rdx], al
inc rdx
dec r8
jnz short loc_140001329
locret_140001342:
ret
function endp
end
In visual studio, you can add a breakpoint in here and go to debug->windows->registers and debug->windows->memory-memory 1 to see the registers and the program's memory. Note that rcx will contain the count, and rdx will point to the beginning of the encoded message.
Thank you all for your help and suggestions, I couldn't have done it without you.

_InterlockedIncrement intrinsic implementation

Visual Studio produces the following machine code when _InterlockedIncrement is used:
; 40 : _InterlockedIncrement(&framecounter);
00078 b8 00 00 00 00 mov eax, OFFSET ?framecounter@@3JA ; framecounter
0007d b9 01 00 00 00 mov ecx, 1
00082 f0 0f c1 08 lock xadd DWORD PTR [eax], ecx
If I were writing this, I would just use lock inc DWORD PTR [eax] instead of mov and xadd.
Is there a valid reason why Microsoft preferred xadd, using 2 instructions instead of 1?
Because _InterlockedIncrement also returns the new value.
You can't do that with lock inc DWORD PTR [eax], because then neither the old nor the new value is anywhere to be found, except in memory. If you do another read, it clearly won't be atomic with the increment (the increment itself would be, but you could get back a value that has nothing to do with what happened at the time of the increment).
Returning the value makes _InterlockedIncrement more useful.
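The same idea can be expressed portably with std::atomic. This is a sketch of the semantics, not the intrinsic's actual implementation; increment_and_fetch and self_check are made-up names. fetch_add hands back the value from before the addition, much as xadd leaves the old value in its register operand, and adding 1 to that gives the new value without a second read of memory.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper with _InterlockedIncrement-like semantics: atomically
// add 1 and return the NEW value.
inline long increment_and_fetch(std::atomic<long>& counter) {
    // fetch_add returns the value held before the addition, so the new
    // value is computed from it locally instead of being re-read.
    return counter.fetch_add(1) + 1;
}

// Small self-check: two increments starting from 0 yield 1 and then 2.
inline void self_check() {
    std::atomic<long> counter{0};
    assert(increment_and_fetch(counter) == 1);
    assert(increment_and_fetch(counter) == 2);
}
```

When the caller ignores the result, as in the listing from the question, nothing needs to be computed from the old value.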

WinExec returns 0x21, but what exactly does it mean?

When I call WinExec to run an .exe, I get the return value 0x21.
According to MSDN, a return value greater than 31 (0x1F) means the function succeeded.
But what does 0x21 mean? Why didn't it return some other value?
It is not useful for you to know what it means. That is an implementation detail. Even if you knew what it meant for this version, the meaning might change in the next version. As a programmer, you are concerned only with programming against the interface, not the underlying implementation.
However, if you are really interested, I will take you through the approach I would take to reverse engineer the function. On my system, WinExec is disassembled to this:
764F2C21 > 8BFF MOV EDI,EDI
764F2C23 55 PUSH EBP
764F2C24 8BEC MOV EBP,ESP
764F2C26 81EC 80000000 SUB ESP,80
764F2C2C 53 PUSH EBX
764F2C2D 8B5D 0C MOV EBX,DWORD PTR SS:[EBP+C]
764F2C30 56 PUSH ESI
764F2C31 57 PUSH EDI
764F2C32 33FF XOR EDI,EDI
764F2C34 47 INC EDI
764F2C35 33F6 XOR ESI,ESI
764F2C37 85DB TEST EBX,EBX
764F2C39 79 4F JNS SHORT kernel32.764F2C8A
764F2C3B 8D45 FC LEA EAX,DWORD PTR SS:[EBP-4]
764F2C3E 50 PUSH EAX
764F2C3F 56 PUSH ESI
764F2C40 57 PUSH EDI
764F2C41 8D45 C8 LEA EAX,DWORD PTR SS:[EBP-38]
764F2C44 50 PUSH EAX
764F2C45 C745 FC 20000000 MOV DWORD PTR SS:[EBP-4],20
764F2C4C E8 90BE0200 CALL <JMP.&API-MS-Win-Core-ProcessThread>
764F2C51 85C0 TEST EAX,EAX
764F2C53 0F84 D2000000 JE kernel32.764F2D2B
764F2C59 56 PUSH ESI
764F2C5A 56 PUSH ESI
764F2C5B 6A 04 PUSH 4
764F2C5D 8D45 F8 LEA EAX,DWORD PTR SS:[EBP-8]
764F2C60 50 PUSH EAX
764F2C61 68 01000600 PUSH 60001
764F2C66 56 PUSH ESI
764F2C67 8D45 C8 LEA EAX,DWORD PTR SS:[EBP-38]
764F2C6A 50 PUSH EAX
764F2C6B C745 0C 00000800 MOV DWORD PTR SS:[EBP+C],80000
764F2C72 897D F8 MOV DWORD PTR SS:[EBP-8],EDI
764F2C75 E8 5CBE0200 CALL <JMP.&API-MS-Win-Core-ProcessThread>
764F2C7A 85C0 TEST EAX,EAX
764F2C7C 0F84 95000000 JE kernel32.764F2D17
764F2C82 8D45 C8 LEA EAX,DWORD PTR SS:[EBP-38]
764F2C85 8945 C4 MOV DWORD PTR SS:[EBP-3C],EAX
764F2C88 EB 03 JMP SHORT kernel32.764F2C8D
764F2C8A 8975 0C MOV DWORD PTR SS:[EBP+C],ESI
764F2C8D 6A 44 PUSH 44
764F2C8F 8D45 80 LEA EAX,DWORD PTR SS:[EBP-80]
764F2C92 56 PUSH ESI
764F2C93 50 PUSH EAX
764F2C94 E8 B5E9F7FF CALL <JMP.&ntdll.memset>
764F2C99 83C4 0C ADD ESP,0C
764F2C9C 33C0 XOR EAX,EAX
764F2C9E 3975 0C CMP DWORD PTR SS:[EBP+C],ESI
764F2CA1 897D AC MOV DWORD PTR SS:[EBP-54],EDI
764F2CA4 0F95C0 SETNE AL
764F2CA7 66:895D B0 MOV WORD PTR SS:[EBP-50],BX
764F2CAB 8D0485 44000000 LEA EAX,DWORD PTR DS:[EAX*4+44]
764F2CB2 8945 80 MOV DWORD PTR SS:[EBP-80],EAX
764F2CB5 8D45 E8 LEA EAX,DWORD PTR SS:[EBP-18]
764F2CB8 50 PUSH EAX
764F2CB9 8D45 80 LEA EAX,DWORD PTR SS:[EBP-80]
764F2CBC 50 PUSH EAX
764F2CBD 56 PUSH ESI
764F2CBE 56 PUSH ESI
764F2CBF FF75 0C PUSH DWORD PTR SS:[EBP+C]
764F2CC2 56 PUSH ESI
764F2CC3 56 PUSH ESI
764F2CC4 56 PUSH ESI
764F2CC5 FF75 08 PUSH DWORD PTR SS:[EBP+8]
764F2CC8 56 PUSH ESI
764F2CC9 E8 A4E3F7FF CALL kernel32.CreateProcessA
764F2CCE 85C0 TEST EAX,EAX
764F2CD0 74 27 JE SHORT kernel32.764F2CF9
764F2CD2 A1 3C005476 MOV EAX,DWORD PTR DS:[7654003C]
764F2CD7 3BC6 CMP EAX,ESI
764F2CD9 74 0A JE SHORT kernel32.764F2CE5
764F2CDB 68 30750000 PUSH 7530
764F2CE0 FF75 E8 PUSH DWORD PTR SS:[EBP-18]
764F2CE3 FFD0 CALL EAX
764F2CE5 FF75 E8 PUSH DWORD PTR SS:[EBP-18]
764F2CE8 8B35 A0054776 MOV ESI,DWORD PTR DS:[<&ntdll.NtClose>] ; ntdll.ZwClose
764F2CEE FFD6 CALL ESI
764F2CF0 FF75 EC PUSH DWORD PTR SS:[EBP-14]
764F2CF3 FFD6 CALL ESI
764F2CF5 6A 21 PUSH 21
764F2CF7 EB 1D JMP SHORT kernel32.764F2D16
764F2CF9 E8 C9E4F7FF CALL <JMP.&API-MS-Win-Core-ErrorHandling>
764F2CFE 48 DEC EAX
764F2CFF 48 DEC EAX
764F2D00 74 12 JE SHORT kernel32.764F2D14
764F2D02 48 DEC EAX
764F2D03 74 0B JE SHORT kernel32.764F2D10
764F2D05 2D BE000000 SUB EAX,0BE
764F2D0A 75 0B JNZ SHORT kernel32.764F2D17
764F2D0C 6A 0B PUSH 0B
764F2D0E EB 06 JMP SHORT kernel32.764F2D16
764F2D10 6A 03 PUSH 3
764F2D12 EB 02 JMP SHORT kernel32.764F2D16
764F2D14 6A 02 PUSH 2
764F2D16 5E POP ESI
764F2D17 F745 0C 00000800 TEST DWORD PTR SS:[EBP+C],80000
764F2D1E 74 09 JE SHORT kernel32.764F2D29
764F2D20 8D45 C8 LEA EAX,DWORD PTR SS:[EBP-38]
764F2D23 50 PUSH EAX
764F2D24 E8 A2BD0200 CALL <JMP.&API-MS-Win-Core-ProcessThread>
764F2D29 8BC6 MOV EAX,ESI
764F2D2B 5F POP EDI
764F2D2C 5E POP ESI
764F2D2D 5B POP EBX
764F2D2E C9 LEAVE
764F2D2F C2 0800 RETN 8
The calling convention used by the Win32 API is stdcall, in which return values are held in EAX. In the case of WinExec, there is only one exit from the function (0x764F2D2F). Tracing back from there, EAX is set by (at least when the return is 0x21):
764F2D29 8BC6 MOV EAX,ESI
Tracing back further, ESI itself is set from POP ESI which pops the top of the stack into ESI. The value of this is dependent on what was previously pushed on the stack. In the case of 0x21, this happens at:
764F2CF5 6A 21 PUSH 21
Immediately afterwards, a JMP is made to the POP ESI. How we got to the PUSH 21 is interesting only from after the CreateProcess call.
764F2CC9 E8 A4E3F7FF CALL kernel32.CreateProcessA
764F2CCE 85C0 TEST EAX,EAX
764F2CD0 74 27 JE SHORT kernel32.764F2CF9
764F2CD2 A1 3C005476 MOV EAX,DWORD PTR DS:[7654003C]
764F2CD7 3BC6 CMP EAX,ESI
764F2CD9 74 0A JE SHORT kernel32.764F2CE5
764F2CDB 68 30750000 PUSH 7530
764F2CE0 FF75 E8 PUSH DWORD PTR SS:[EBP-18]
764F2CE3 FFD0 CALL EAX
764F2CE5 FF75 E8 PUSH DWORD PTR SS:[EBP-18]
764F2CE8 8B35 A0054776 MOV ESI,DWORD PTR DS:[<&ntdll.NtClose>] ; ntdll.ZwClose
764F2CEE FFD6 CALL ESI
764F2CF0 FF75 EC PUSH DWORD PTR SS:[EBP-14]
764F2CF3 FFD6 CALL ESI
764F2CF5 6A 21 PUSH 21
To see how the path will take you to the PUSH 21, observe different branches. The first occurs as:
764F2CD0 74 27 JE SHORT kernel32.764F2CF9
This is saying if CreateProcess returned 0, call Win-Core-ErrorHandling. The return values are then set differently (0x2, 0x3 and 0xB are all possible return values if CreateProcess failed).
The next branch is a lot less obvious to reverse engineer:
764F2CD9 74 0A JE SHORT kernel32.764F2CE5
What it does is read a memory address which probably contains a function pointer (we know this because the result of the read is called later on). This JE simply indicates whether or not to make this call at all. Regardless of whether the call is made, the next step is to call ZwClose (twice). Finally 0x21 is returned.
So one simple way of looking at it is that when CreateProcess succeeds, 0x21 is returned, otherwise 0x2, 0x3 or 0xB are returned. This is not to say these are the only return values. For example, 0x0 can also be returned from the branch at 0x764F2C53 (in this case, ESI is not used in the same way at all). There are a few more possible return values but I will leave those for you to look into yourself.
What I've shown you is how to do a very shallow analysis of WinExec specifically for the 0x21 return. If you want to find out more, you need to poke around more in-depth and try to understand from a higher level what is going on. You'll be able to find out a lot more just by breakpointing the function and stepping through it (this way you can observe data values).
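The branches traced above can be summarized as a small model. It is a sketch of this particular listing only, not of WinExec in general, and it rests on the assumption that the error-handling call at 764F2CF9 returns the last error code. The names winexec_result_model and winexec_succeeded are made up. The DEC/DEC/DEC/SUB 0BE chain tests the error code against 2, 3 and 3 + 0xBE = 193 in turn.

```cpp
// Model of the part of the listing after the CreateProcessA call.
// `create_process_ok` is whether CreateProcessA returned non-zero;
// `last_error` is the assumed result of the error-handling call.
constexpr unsigned winexec_result_model(bool create_process_ok, unsigned last_error) {
    return create_process_ok ? 0x21u   // PUSH 21 after the two close calls
         : last_error == 2   ? 2u      // PUSH 2
         : last_error == 3   ? 3u      // PUSH 3
         : last_error == 193 ? 0x0Bu   // PUSH 0B (3 + 0xBE == 193)
         : 0u;                         // ESI still holds the 0 from XOR ESI,ESI
}

// The documented contract: any value greater than 31 means success.
constexpr bool winexec_succeeded(unsigned ret) {
    return ret > 31;
}
```

Callers should rely only on the greater-than-31 check, since the specific value 0x21 is an implementation detail.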
One other way is to look at the Wine source, where someone has already done all the hard work for you:
UINT WINAPI WinExec( LPCSTR lpCmdLine, UINT nCmdShow )
{
PROCESS_INFORMATION info;
STARTUPINFOA startup;
char *cmdline;
UINT ret;
memset( &startup, 0, sizeof(startup) );
startup.cb = sizeof(startup);
startup.dwFlags = STARTF_USESHOWWINDOW;
startup.wShowWindow = nCmdShow;
/* cmdline needs to be writable for CreateProcess */
if (!(cmdline = HeapAlloc( GetProcessHeap(), 0, strlen(lpCmdLine)+1 ))) return 0;
strcpy( cmdline, lpCmdLine );
if (CreateProcessA( NULL, cmdline, NULL, NULL, FALSE,
0, NULL, NULL, &startup, &info ))
{
/* Give 30 seconds to the app to come up */
if (wait_input_idle( info.hProcess, 30000 ) == WAIT_FAILED)
WARN("WaitForInputIdle failed: Error %d\n", GetLastError() );
ret = 33;
/* Close off the handles */
CloseHandle( info.hThread );
CloseHandle( info.hProcess );
}
else if ((ret = GetLastError()) >= 32)
{
FIXME("Strange error set by CreateProcess: %d\n", ret );
ret = 11;
}
HeapFree( GetProcessHeap(), 0, cmdline );
return ret;
}
33d is 0x21 so this actually just confirms the fruits of our earlier analysis.
In regards to the reason 0x21 is returned, my guess is that perhaps there exists more internal documentation which makes it more useful in some way.
Other than that this means success, the meaning of the return value is not defined. Perhaps it was chosen such that legacy applications will work well with this particular value. One thing is certain: there are more important things to worry about!
http://msdn.microsoft.com/en-us/library/windows/desktop/ms687393(v=vs.85).aspx
EDIT: This answer is wrong because the OP's result is not an error code. I mistakenly thought it was said that it was an error code. I still think the practical info below can be useful, plus that it can be useful to see what a wrong assumption can lead to, so I let this answer stand.
If you have installed Visual Studio (full or express edition), then you have a tool called errlook, which uses the FormatMessage API function to tell you what an error code or HRESULT value means.
In this case,
The process cannot access the file because another process has locked a portion of the file.
You can do much of the same manually by looking in the <winerror.h> file. For example, type an #include of that in a C++ source file in Visual Studio, then right click and ask it to open the header. Where you find that
//
// MessageId: ERROR_LOCK_VIOLATION
//
// MessageText:
//
// The process cannot access the file because another process has locked a portion of the file.
//
#define ERROR_LOCK_VIOLATION 33L
By the way, WinExec is just an old compatibility function. Preferably use ShellExecute or CreateProcess. The ShellExecute function is able to play more nicely with Windows Vista and 7 User Account Control, and it is simpler to use, so it is generally preferable.

Resources