In short
What would be the easiest way to make sure that the high word of ecx contains 0 by replacing following instruction in the exe file?
004044be 0fbf4dfc movsx ecx,word ptr [ebp-4]
Instead of containing FFFF9298 after executing adress 004044be I'd like ecx to contain 00009298.
What have I tried
I have tried replacing movsx with a simple movby replacing the opcode 0fbf in the executable with 8b0c but that didn't work out as planned (using the debugger to verify the change, I'm sure it was at the the right place)
Some background
I have a rather old program that for one reason or another AV's when I try to print to my HP printer. Everything works fine when I print to CutePDF so the current workaround is to generate a PDF file and print the PDF.
Getting my feet wet with WinDbg I tried to find the reason of why this was happening.
While this is not the root cause of my problem, it seems that ecx at some point get's a negative value which is used to allocate memory, ultimately resulting in an exception.
I could try to find why a negative value is returned but during my debugging session, I noticed that zeroing out the high word in ecx did the job (aka printed the file).
So instead of containing FFFF9298 after executing adress 004044be I'd like ecx to contain 00009298.
004044ac 0fbf05e8025100 movsx eax,word ptr [Encore32!SystemsPerPageDlogProc+0x10e4a1 (005102e8)]
004044b3 05416f0000 add eax,6F41h
004044b8 668945fc mov word ptr [ebp-4],ax
004044bc 6a40 push 40h
004044be 0fbf4dfc movsx ecx,word ptr [ebp-4] --> replace with ?
004044c2 51 push ecx
004044c3 8b55f8 mov edx,dword ptr [ebp-8]
004044c6 52 push edx
Use movzx, it does exactly what you need.
The opcode (for 16->32 version) is 0F B7 /r so just patching BF to B7 should do it.
movsx mean "move with sign extend". That means that the top bit of src is copied to all high bits of dest. You don't want that.
movzx fills top bits with 0.
movzx ecx,word ptr [ebp-4] is what you need. Using 32-bit code as in your example it can be encoded as 0F B7 4D FC.
Related
Unless I am misunderstanding something, it seems the position of the canary value can be before or after ebp, therefore in the second case the attacker can overwrite the frame pointer without touching the canary.
For example in this snippet, the canary is located at a lower address (ebp-0xc) than ebp therefore protecting it (an attacker must overwrite the canary to overwrite ebp):
0x080484e0 <+52>: mov eax,DWORD PTR [ebp-0xc]
0x080484e3 <+55>: xor eax,DWORD PTR gs:0x14
0x080484ea <+62>: je 0x80484f1 <func+69>
0x080484ec <+64>: call 0x8048360 <__stack_chk_fail#plt>
However looking at other code the canary is after rbp+8:
How should I interpret this? Does this depend on GCC version or something else?
The canary is always below the frame pointer, with every version of gcc I've tried. You can see that confirmed in the gdb disassembly immediately below the IDA disassembly in the blog post you linked, which has mov rax, QWORD PTR [rbp-0x8].
I think this is just an artifact of IDA's disassembler. Instead of displaying the numerical offset for rbp-relative addresses, it assigns a name to each stack slot, and displays the name instead; basically assuming that every rbp-relative access is to a local variable or argument. And it looks like it always displays that name with a + regardless of whether the offset is positive or negative. Note that buf and fd also get a + sign even though they are local variables which are clearly below the frame pointer.
In this example, it has named the canary var_8 as if it were a local variable. So I suppose to translate this properly, you have to think of var_8 as having the value -8.
I'm currently learning reverse engineering and therefore I need to learn assembly. The code is running well, but I got an error on JE and JNE instructions. To be more specific: I'm doing a detour hook as a practice and the program crashes, because it jumps to the false address.
I wanted to write this, but the compiler gave me an error (LNK2016):
JE 0x123 (0x123 example address)
I fixed that problem by writing:
JE short 0x123
Full function code:
my_hook:
push rbp
mov rbp, rsp
mov dword [rcx + 0x18], 99
test rcx, rcx
je 0x7FF7847902EE
lock dec dword [rcx + 0x08]
jne 0x7FF7847902EE
mov rcx, [rsp + 0x30]
leave
ret
But the problem now is, that it is jumping to the beginning of the function instead of 0x123 in our case.
The conditional jump instructions on x86 all take a relative displacement, not an absolute address. They work to jump to another label in your code, since the assembler can compute the displacement between the jump instruction and the destination, but you can't use them jump to an absolute address unless you know, at assembly time, the address where your code will be loaded.
Near absolute jumps on x86 are indirect: you need to have the address in a register or memory. And such jumps are only available in an unconditional form, so you'll have to use the conditional jump to get to them.
You could do
my_hook:
push rbp
mov rbp, rsp
mov dword [rcx + 0x18], 99
test rcx, rcx
je jump_elsewhere
lock dec dword [rcx + 0x08]
jne jump_elsewhere
mov rcx, [rsp + 0x30]
leave
ret
jump_elsewhere:
mov rax, 0x7FF7847902EE
jmp rax
If you can't spare a register, you could instead do
jump_elsewhere:
jmp [rel elsewhere_addr]
elsewhere_addr:
dq 0x7FF7847902EE
but this is more bytes of code.
If you do know the address where your code will be loaded, and it's within 2 GB of the destination address, you can use the ORG directive to tell the assembler about it. So if my_hook will be at address 0x7ff7DEADBEEF, you can do
org 0x7ff7DEADBEEF
my_hook:
; ...
je 0x7FF7847902EE
and the assembler will compute the correct 32-bit displacement.
By the way, the test rcx, rcx doesn't really make sense: if rcx were zero then the previous instruction mov dword [rcx + 0x18], 99 would have faulted, since on any decent operating system, the zero page will be unmapped. Maybe you wanted this test and conditional jump to go before the mov. Unless this is some sort of strange bare-metal code that you're patching?
I came across the following Go code:
type Element [12]uint64
//go:noescape
func CSwap(x, y *Element, choice uint8)
//go:noescape
func Add(z, x, y *Element)
where the CSwap and Add functions are basically coming from an assembly, and look like the following:
TEXT ·CSwap(SB), NOSPLIT, $0-17
MOVQ x+0(FP), REG_P1
MOVQ y+8(FP), REG_P2
MOVB choice+16(FP), AL // AL = 0 or 1
MOVBLZX AL, AX // AX = 0 or 1
NEGQ AX // RAX = 0x00..00 or 0xff..ff
MOVQ (0*8)(REG_P1), BX
MOVQ (0*8)(REG_P2), CX
// Rest removed for brevity
TEXT ·Add(SB), NOSPLIT, $0-24
MOVQ z+0(FP), REG_P3
MOVQ x+8(FP), REG_P1
MOVQ y+16(FP), REG_P2
MOVQ (REG_P1), R8
MOVQ (8)(REG_P1), R9
MOVQ (16)(REG_P1), R10
MOVQ (24)(REG_P1), R11
// Rest removed for brevity
What I try to do is that translate the assembly to a syntax that is more familiar to me (I think mine is more like NASM), while the above syntax is Go assembler. Regarding the Add method I didn't have much problem, and translated it correctly (according to test results). It looks like this in my case:
.text
.global add_asm
add_asm:
push r12
push r13
push r14
push r15
mov r8, [reg_p1]
mov r9, [reg_p1+8]
mov r10, [reg_p1+16]
mov r11, [reg_p1+24]
// Rest removed for brevity
But, I have a problem when translating the CSwap function, I have something like this:
.text
.global cswap_asm
cswap_asm:
push r12
push r13
push r14
mov al, 16
mov rax, al
neg rax
mov rbx, [reg_p1+(0*8)]
mov rcx, [reg_p2+(0*8)]
But this doesn't seem to be quite correct, as I get error when compiling it. Any ideas how to translate the above CSwap assembly part to something like NASM?
EDIT (SOLUTION):
Okay, after the two answers below, and some testing and digging, I found out that the code uses the following three registers for parameter passing:
#define reg_p1 rdi
#define reg_p2 rsi
#define reg_p3 rdx
Accordingly, rdx has the value of the choice parameter. So, all that I had to do was use this:
movzx rax, dl // Get the lower 8 bits of rdx (reg_p3)
neg rax
Using byte [rdx] or byte [reg_3] was giving an error, but using dl seems to work fine for me.
Basic docs about Go's asm: https://golang.org/doc/asm. It's not totally equivalent to NASM or AT&T syntax: FP is a pseudo-register name for whichever register it decides to use as the frame pointer. (Typically RSP or RBP). Go asm also seems to omit function prologue (and probably epilogue) instructions. As #RossRidge comments, it's a bit more like a internal representation like LLVM IR than truly asm.
Go also has its own object-file format, so I'm not sure you can make Go-compatible object files with NASM.
If you want to call this function from something other than Go, you'll also need to port the code to a different calling convention. Go appears to be using a stack-args calling convention even for x86-64, unlike the normal x86-64 System V ABI or the x86-64 Windows calling convention. (Or maybe those mov function args into REG_P1 and so on instructions disappear when Go builds this source for a register-arg calling convention?)
(This is why you could you had to use movzx eax, dl instead of loading from the stack at all.)
BTW, rewriting this code in C instead of NASM would probably make even more sense if you want to use it with C. Small functions are best inlined and optimized away by the compiler.
It would be a good idea to check your translation, or get a starting point, by assembling with the Go assembler and using a disassembler.
objdump -drwC -Mintel or Agner Fog's objconv disassembler would be good, but they don't understand Go's object-file format. If Go has a tool to extract the actual machine code or get it in an ELF object file, do that.
If not, you could use ndisasm -b 64 (which treats input files as flat binaries, disassembling all the bytes as if they were instructions). You can specify an offset/length if you can find out where the function starts. x86 instructions are variable length, and disassembly will likely be "out of sync" at the start of the function. You might want to add a bunch of single-byte NOP instructions (kind of a NOP sled) for the disassembler, so if it decodes some 0x90 bytes as part of an immediate or disp32 for a long instruction that was really not part of the function, it will be in sync. (But the function prologue will still be messed up).
You might add some "signpost" instructions to your Go asm functions to make it easy to find the right place in the mess of crazy asm from disassembling metadata as instructions. e.g. put a pmuludq xmm0, xmm0 in there somewhere, or some other instruction with a unique mnemonic that you can search for which the Go code doesn't include. Or an instruction with an immediate that will stand out, like addq $0x1234567, SP. (An instruction that will crash so you don't forget to take it out again is good here.)
Or you could use gdb's built-in disassembler: add an instruction that will segfault (like a load from a bogus absolute address (movl 0, AX null-pointer deref), or a register holding a non-pointer value e.g. movl (AX), AX). Then you'll have an instruction-pointer value for the instructions in memory, and can disassemble from some point behind that. (Probably the function start will be 16-byte aligned.)
Specific instructions.
MOVBLZX AL, AX reads AL, so that's definitely an 8-bit operand. The size for AX is given by the L part of the mnemonic, meaning long for 32 bit, like in GAS AT&T syntax. (The gas mnemonic for that form of movzx is movzbl %al, %eax). See What does cltq do in assembly? for a table of cdq / cdqe and the AT&T equivalent, and the AT&T / Intel mnemonic for the equivalent MOVSX instruction.
The NASM instruction you want is movzx eax, al. Using rax as the destination would be a waste of a REX prefix. Using ax as the destination would be a mistake: it wouldn't zero-extend into the full register, and would leave whatever high garbage. Go asm syntax for x86 is very confusing when you're not used to it, because AX can mean AX, EAX, or RAX depending on the operand size.
Obviously mov rax, al isn't a possibility: Like most instructions, mov requires both its operands to be the same size. movzx is one of the rare exceptions.
MOVB choice+16(FP), AL is a byte load into AL, not an immediate move. choice+16 is a an offset from FP. This syntax is basically the same as AT&T addressing modes, with FP as a register and choice as an assemble-time constant.
FP is a pseudo-register name. It's pretty clear that it should simply be loading the low byte of the 3rd arg-passing slot, because choice is the name of a function arg. (In Go asm, choice is just syntactic sugar, or a constant defined as zero.)
Before a call instruction, rsp points at the first stack arg, so that + 16 is the 3rd arg. It appears that FP is that base address (and might actually be rsp+8 or something). After a call (which pushes an 8 byte return address), the 3rd stack arg is at rsp + 24. After more pushes, the offset will be even larger, so adjust as necessary to reach the right location.
If you're porting this function to be called with a standard calling convention, the 3 integer args will be passed in registers, with no stack args. Which 3 registers depends on whether you're building for Windows vs. non-Windows. (See Agner Fog's calling conventions doc: http://agner.org/optimize/)
BTW, a byte load into AL and then movzx eax, al is just dumb. Much more efficient on all modern CPUs to do it in one step with
movzx eax, byte [rsp + 24] ; or rbp+32 if you made a stack frame.
I hope the source in the question is from un-optimized Go compiler output? Or the assembler itself makes such optimizations?
I think you can translate these as just
mov rbx, [reg_p1]
mov rcx, [reg_p2]
Unless I'm missing some subtlety, the offsets which are zero can just be ignored. The *8 isn't a size hint since that's already in the instruction.
The rest of your code looks wrong though. The MOVB choice+16(FP), AL in the original is supposed to be fetching the choice argument into AL, but you're setting AL to a constant 16, and the code for loading the other arguments seems to be completely missing, as is the code for all of the arguments in the other function.
So, I was messing around with assembly, and stumbled upon weird thing. Before writing anything, I set registers to these values:
AX 0000, BX 0000, SP 00FD.
Lets say, that we write simple code, which increases registers AX and BX:
A100
INC AX
CALL 0200
A200
INC BX
RETN
After writing it, I tried to see how registers change when executing commands, one by one, using T command. First time, it increases AX, moves to 0200, increases BX and returns. Here is the weird bit: when it returns, it executes command:
ADD [BX+SI],AX and then calls 0200 again.
When it calls 0200 again, it repeats itself, but now, when it returns, it returns to CALL 0100 command (not to CALL 0200) and increases AX again and so on. Why is this happening?
I have image of full output, maybe this can help understand my question better?:
http://s18.postimg.org/wt6eracg9/Untitled.png
Based on your screenshots, your code seems to be this (filled with nop's, disassembled with udcli):
echo 40 e8 fc 00 01 00 e8 f7 00 e8 f4 ff x{1..244} 43 c3 | sed 's/x[0-9]*\>/90/g' | udcli -o 0x100 -x -16
0000000000000100 40 inc ax
0000000000000101 e8fc00 call word 0x200
0000000000000104 0100 add [bx+si], ax
0000000000000106 e8f700 call word 0x200
0000000000000109 e8f4ff call word 0x100
000000000000010c *** never reached ***
0000000000000200 43 inc bx
0000000000000201 c3 ret
The code flow is the following, line by line:
0000000000000100 40 inc ax
ax gets incremented.
0000000000000101 e8fc00 call word 0x200
Return address 0x104 gets pushed to the stack, ip (instruction pointer) is set to 0x200.
0000000000000200 43 inc bx
bx gets incremented.
0000000000000201 c3 ret
Near return, that is, ip is popped from stack. New ip will be 0x104.
0000000000000104 0100 add [bx+si], ax
The value of ax is added to the word value at [bx+si].
0000000000000106 e8f700 call word 0x200
Return address 0x109 gets pushed to the stack, ip (instruction pointer) is set to 0x200.
0000000000000200 43 inc bx
bx gets incremented.
0000000000000201 c3 ret
Near return, that is, ip is popped from stack. New ip will be 0x109.
0000000000000109 e8f4ff call word 0x100
Return address 0x10c gets pushed to the stack, ip (instruction pointer) is set to 0x100. So this is actually an infinite recursive function and will run out of stack.
So your problem is that you don't define the code after CALL 0200. There happens to be 01 00 (add [bx+si], ax) and after return it will be executed, and after it other undefined instructions.
My advice: download any decent assembler (NASM, YASM, FASM...) ASAP and don't ruin your life trying to write assembly code with DEBUG. Trying to write assembly programs with DEBUG is an attempt destined to fail.
On possible reason is that you don't have some exit, you didn't use DOS's 0x4c service, and the program get bound the code and started to execute random commands or the execution flow hit some ret instruction where no call is used, it then will behave unexpectedly.
So I've got a very strange and specific problem here. We're putting seemingly valid code into a linker, and then the linker is making some mistakes, like removing a valid ptr label and instead of replacing it with a value, it's putting in 0 instead. But not everywhere either. It starts at a pretty arbitrary point.
So, some background: we have an interpreted language that is converted into assembly by putting together hand-generated chunks of assembly (by an in-house compiler-like application) and adding in variables where required. This is a system that has been working for around 10 years if not longer, in pretty much it's current form, so the validity of this method is not currently in question. This assembly is assembled to an obj file using the Microsoft Assembler (ml.exe or MASM).
Then in a separate step, this obj file is linked using the Microsoft Incremental Linker together with some other libraries (static, and import libraries for dlls) to create an executable.
Following is a portion of the assembly (.asm) file that the assembler outputs when it creates the obj file in the first step:
call _c_rt_strcmp
mov di, 1
mov ebp, esp
cmp ax, 0
je sym2148
dec di
sym2148: mov [ebp+6], di
add esp, 6
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 0bb49h
pop ax
mov byte ptr [ebx],al
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 012656h
push ebx
mov eax, OFFSET sym2151
push eax
call _c_rt_strcmp
mov di, 1
mov ebp, esp
cmp ax, 0
je sym2152
dec di
sym2152: mov [ebp+6], di
add esp, 6
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 0bb32h
pop ax
mov byte ptr [ebx],al
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 012656h
push ebx
mov eax, OFFSET sym2155
push eax
call _c_rt_strcmp
mov di, 1
mov ebp, esp
cmp ax, 0
je sym2156
dec di
sym2156: mov [ebp+6], di
add esp, 6
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 0bb4ah
pop ax
mov byte ptr [ebx],al
mov ebx, dword ptr [_smfv1_ptr]
add ebx, 0126bbh
push ebx
mov eax, OFFSET sym2159
push eax
call _c_rt_strcmp
As far as I can see, that is correct, and makes sense. I believe that code that generates this does a string compare and then stores a value based on the result of the string compare, and it's a section of 20-30 of them, so there are 10 or so more groups before the first bit I posted, and about 15-20 after the last bit I posted.
Following is the disassembled view of the crash location that Visual C++ 5.0 (old I know, but this is on a legacy system unfortunately) shows when the executable crashes:
004093D2 call 00629570
004093D7 mov di,1
004093DB mov ebp,esp
004093DD cmp ax,0
004093E1 je 004093E5
004093E3 dec di
004093E5 mov word ptr [ebp+6],di
004093E9 add esp,6
004093EC mov ebx,dword ptr ds:[64174Ch]
004093F2 add ebx,0BB49h
004093F8 pop ax
004093FA mov byte ptr [ebx],al
004093FC mov ebx,dword ptr ds:[64174Ch]
00409402 add ebx,12656h
00409408 push ebx
00409409 mov eax,0
0040940E push eax
0040940F call 00409414
00409414 mov di,1
00409418 mov ebp,esp
0040941A cmp ax,0
0040941E je 00409422
00409420 dec di
00409422 mov word ptr [ebp+6],di
00409426 add esp,6
00409429 mov ebx,dword ptr ds:[0]
0040942F add ebx,0BB32h
00409435 pop ax
00409437 mov byte ptr [ebx],al
00409439 mov ebx,dword ptr ds:[0]
0040943F add ebx,12656h
00409445 push ebx
00409446 mov eax,0
0040944B push eax
0040944C call 00409451
00409451 mov di,1
00409455 mov ebp,esp
00409457 cmp ax,0
0040945B je 0040945F
0040945D dec di
0040945F mov word ptr [ebp+6],di
00409463 add esp,6
00409466 mov ebx,dword ptr ds:[0]
0040946C add ebx,0BB4Ah
00409472 pop ax
00409474 mov byte ptr [ebx],al
00409476 mov ebx,dword ptr ds:[0]
0040947C add ebx,126BBh
00409482 push ebx
00409483 mov eax,0
00409488 push eax
00409489 call 0040948E
The actual crash location is 0x00409429.
The two bits of code match, as in they are the same section of code, except the first from the .asm file is what is going into the linker, and the second one is that has come out of the linker.
From what I can see, it's crashing at that location because it's attempting to de-reference an address of 0, right? So of course that's going to fail. Also, if you take a look at the different function calls at 0x004093D2, 0x0040940F, 0x0040944C and 0x00409489, only the first one is valid, the others are simply doing a function call to the line directly after them! And this broken code continues on for the next 15-20 similar segments of code that is defined in that file.
If you look at the corresponding lines for the function calls and the bad pointers in both sections, you will see that in the .asm file everything appears to be correct, but somehow the linker just breaks it all when it compiles the exe, and in a very specific spot, as there are similar chunks of code previous in the file which are constructed correctly.
We do get some linker warnings of the form: "LINK : warning LNK4049: locally defined symbol "_smfv1_ptr" imported". But we were getting those same warnings even when it was working.
The assembler used was ml.exe version 6.11, and the linker was link.exe version 5.10.7303 which were both the version from Visual C++ 5.0. Since the assembled code seems to be right, I'm going to be trying the linker from Visual Studio 2005, 2008 and 2010 and see if anything changes.
I can't really imagine what could create this kind of error, and I thought maybe it was symbols getting messed up, but there are jumps to locations (for small 'if' statements) which are stored as symbols which still work fine after they get through the linker.
Is it possible that a symbol table or something similar is getting overloaded inside the linker, and it's just reverting to bad or default values?
Call to the following address is a clear sign of unresolved symbol reference (it makes sense if you notice that all calls in .obj files are emitted as E8 00 00 00 00). You have zeroes for some data references as well (sym2151, some references to _smfv1_ptr, sym2159). What's strange is that the first call to _c_rt_strcmp did get resolved. I would suggest turning on all warning/debugging/verbose switches that you can find, and generating all kinds of listing and map files. Maybe something will jump out.
Okay, so the end result seems to be that it is a bug with the Visual C++ version of the masm assembler "ml.exe" (big surprise, huh?)
So, moving to the versions of masm and link that were provided in Visual Studio 2005 seems to be the best solution for us.
Thanks for your help guys :)