__debugbreak and int3 instruction - debugging

jmp 1f
push %eax
1:
movl $100, (%eax)
call __debugbreak
int3
ret
Suppose we want to put a breakpoint on push %eax instruction. Can we do it
by rewriting push %eax with call debugbreak? What is the problem with the above approach? Why does rewriting with int3 work in this case?

int3 is a single-byte instruction (and so is the push in this case). call isn't. You jmp will jump into the "middle" of call if you overwrite your push with call instead of int3.
But unless there's a way to reach the push (I see no label before it to jump/call to), putting a breakpoint on push is useless, isn't it?

Related

Why is my branch instruction not jumping to the given address?

I'm currently learning reverse engineering and therefore I need to learn assembly. The code is running well, but I got an error on JE and JNE instructions. To be more specific: I'm doing a detour hook as a practice and the program crashes, because it jumps to the false address.
I wanted to write this, but the compiler gave me an error (LNK2016):
JE 0x123 (0x123 example address)
I fixed that problem by writing:
JE short 0x123
Full function code:
my_hook:
push rbp
mov rbp, rsp
mov dword [rcx + 0x18], 99
test rcx, rcx
je 0x7FF7847902EE
lock dec dword [rcx + 0x08]
jne 0x7FF7847902EE
mov rcx, [rsp + 0x30]
leave
ret
But the problem now is, that it is jumping to the beginning of the function instead of 0x123 in our case.
The conditional jump instructions on x86 all take a relative displacement, not an absolute address. They work to jump to another label in your code, since the assembler can compute the displacement between the jump instruction and the destination, but you can't use them jump to an absolute address unless you know, at assembly time, the address where your code will be loaded.
Near absolute jumps on x86 are indirect: you need to have the address in a register or memory. And such jumps are only available in an unconditional form, so you'll have to use the conditional jump to get to them.
You could do
my_hook:
push rbp
mov rbp, rsp
mov dword [rcx + 0x18], 99
test rcx, rcx
je jump_elsewhere
lock dec dword [rcx + 0x08]
jne jump_elsewhere
mov rcx, [rsp + 0x30]
leave
ret
jump_elsewhere:
mov rax, 0x7FF7847902EE
jmp rax
If you can't spare a register, you could instead do
jump_elsewhere:
jmp [rel elsewhere_addr]
elsewhere_addr:
dq 0x7FF7847902EE
but this is more bytes of code.
If you do know the address where your code will be loaded, and it's within 2 GB of the destination address, you can use the ORG directive to tell the assembler about it. So if my_hook will be at address 0x7ff7DEADBEEF, you can do
org 0x7ff7DEADBEEF
my_hook:
; ...
je 0x7FF7847902EE
and the assembler will compute the correct 32-bit displacement.
By the way, the test rcx, rcx doesn't really make sense: if rcx were zero then the previous instruction mov dword [rcx + 0x18], 99 would have faulted, since on any decent operating system, the zero page will be unmapped. Maybe you wanted this test and conditional jump to go before the mov. Unless this is some sort of strange bare-metal code that you're patching?

FASM assembly program waits before exiting

This is the code that I use with FASM:
format PE console
entry main
include '..\MACRO\import32.inc'
section '.data' data readable writeable
msg db "привіт!",0dh,0ah,0 ;hi
lcl_set db ?
section '.code' code readable executable
main:
;fail without set locale
push msg
call [printf]
pop ecx
;succeed with set locale
push msg
call _liapnuty
pop ecx
push 0
call [ExitProcess]
_liapnuty:
push ebp
mov ebp, esp
;sub esp, 0
mov ebx,[ebp+8] ; 1st arg addr
mov al, [lcl_set]
or al, al
jnz _liapnuty_rest
call __set_locale
_liapnuty_rest:
push ebx
call [printf]
pop ebx
mov esp, ebp
pop ebp
ret 0
__set_locale:
mov al, [lcl_set]
or al, al
jnz __set_locale_rest
push 1251
call SetConsoleCP
call SetConsoleOutputCP
pop ecx
mov [lcl_set], 1
;push lcl
;call [system]
; pop ecx
; mov [lcl_set], 1
;push cls
;call [printf]
; pop ecx
__set_locale_rest:
ret 0
section '.idata' import data readable
library kernel,'kernel32.dll',\
msvcrt,'msvcrt.dll'
import kernel,\
SetConsoleCP,'SetConsoleCP',\
SetConsoleOutputCP,'SetConsoleOutputCP',\
ExitProcess,'ExitProcess'
import msvcrt,\
printf,'printf'
It works almost perfectly, except that before exiting it waits for like a second for some reason. It outputs data almost instantly, yet it fails to shut down quickly. If the reason is using these libraries or not clearing the stack after calling ExitProcess (which obviously can't be done), then let me know and I will mostly gladly accept this answer, but I want to be 100% sure I'm doing everything correctly.
The reason for all of it was because kernel32 functions pop their parameters themselves on return. If I remove unnecessary pops it starts working fast again. Of course, the program still runs with damaged stack but it does a lot of damage control at the end. That's why it was slow, but still worked. For everyone facing this issue, make sure to be careful with the calling convention.
To debug the application and find the error I used OLLYDBG. It's free and it works. It helps you debug EXEs and DLLs, allowing to step one command at a time. Also it shows the memory, the stack and all of the registers and flags.
Using the stack I was able to find out that it gets corrupted.

Translating Go assembler to NASM

I came across the following Go code:
type Element [12]uint64
//go:noescape
func CSwap(x, y *Element, choice uint8)
//go:noescape
func Add(z, x, y *Element)
where the CSwap and Add functions are basically coming from an assembly, and look like the following:
TEXT ·CSwap(SB), NOSPLIT, $0-17
MOVQ x+0(FP), REG_P1
MOVQ y+8(FP), REG_P2
MOVB choice+16(FP), AL // AL = 0 or 1
MOVBLZX AL, AX // AX = 0 or 1
NEGQ AX // RAX = 0x00..00 or 0xff..ff
MOVQ (0*8)(REG_P1), BX
MOVQ (0*8)(REG_P2), CX
// Rest removed for brevity
TEXT ·Add(SB), NOSPLIT, $0-24
MOVQ z+0(FP), REG_P3
MOVQ x+8(FP), REG_P1
MOVQ y+16(FP), REG_P2
MOVQ (REG_P1), R8
MOVQ (8)(REG_P1), R9
MOVQ (16)(REG_P1), R10
MOVQ (24)(REG_P1), R11
// Rest removed for brevity
What I try to do is that translate the assembly to a syntax that is more familiar to me (I think mine is more like NASM), while the above syntax is Go assembler. Regarding the Add method I didn't have much problem, and translated it correctly (according to test results). It looks like this in my case:
.text
.global add_asm
add_asm:
push r12
push r13
push r14
push r15
mov r8, [reg_p1]
mov r9, [reg_p1+8]
mov r10, [reg_p1+16]
mov r11, [reg_p1+24]
// Rest removed for brevity
But, I have a problem when translating the CSwap function, I have something like this:
.text
.global cswap_asm
cswap_asm:
push r12
push r13
push r14
mov al, 16
mov rax, al
neg rax
mov rbx, [reg_p1+(0*8)]
mov rcx, [reg_p2+(0*8)]
But this doesn't seem to be quite correct, as I get error when compiling it. Any ideas how to translate the above CSwap assembly part to something like NASM?
EDIT (SOLUTION):
Okay, after the two answers below, and some testing and digging, I found out that the code uses the following three registers for parameter passing:
#define reg_p1 rdi
#define reg_p2 rsi
#define reg_p3 rdx
Accordingly, rdx has the value of the choice parameter. So, all that I had to do was use this:
movzx rax, dl // Get the lower 8 bits of rdx (reg_p3)
neg rax
Using byte [rdx] or byte [reg_3] was giving an error, but using dl seems to work fine for me.
Basic docs about Go's asm: https://golang.org/doc/asm. It's not totally equivalent to NASM or AT&T syntax: FP is a pseudo-register name for whichever register it decides to use as the frame pointer. (Typically RSP or RBP). Go asm also seems to omit function prologue (and probably epilogue) instructions. As #RossRidge comments, it's a bit more like a internal representation like LLVM IR than truly asm.
Go also has its own object-file format, so I'm not sure you can make Go-compatible object files with NASM.
If you want to call this function from something other than Go, you'll also need to port the code to a different calling convention. Go appears to be using a stack-args calling convention even for x86-64, unlike the normal x86-64 System V ABI or the x86-64 Windows calling convention. (Or maybe those mov function args into REG_P1 and so on instructions disappear when Go builds this source for a register-arg calling convention?)
(This is why you could you had to use movzx eax, dl instead of loading from the stack at all.)
BTW, rewriting this code in C instead of NASM would probably make even more sense if you want to use it with C. Small functions are best inlined and optimized away by the compiler.
It would be a good idea to check your translation, or get a starting point, by assembling with the Go assembler and using a disassembler.
objdump -drwC -Mintel or Agner Fog's objconv disassembler would be good, but they don't understand Go's object-file format. If Go has a tool to extract the actual machine code or get it in an ELF object file, do that.
If not, you could use ndisasm -b 64 (which treats input files as flat binaries, disassembling all the bytes as if they were instructions). You can specify an offset/length if you can find out where the function starts. x86 instructions are variable length, and disassembly will likely be "out of sync" at the start of the function. You might want to add a bunch of single-byte NOP instructions (kind of a NOP sled) for the disassembler, so if it decodes some 0x90 bytes as part of an immediate or disp32 for a long instruction that was really not part of the function, it will be in sync. (But the function prologue will still be messed up).
You might add some "signpost" instructions to your Go asm functions to make it easy to find the right place in the mess of crazy asm from disassembling metadata as instructions. e.g. put a pmuludq xmm0, xmm0 in there somewhere, or some other instruction with a unique mnemonic that you can search for which the Go code doesn't include. Or an instruction with an immediate that will stand out, like addq $0x1234567, SP. (An instruction that will crash so you don't forget to take it out again is good here.)
Or you could use gdb's built-in disassembler: add an instruction that will segfault (like a load from a bogus absolute address (movl 0, AX null-pointer deref), or a register holding a non-pointer value e.g. movl (AX), AX). Then you'll have an instruction-pointer value for the instructions in memory, and can disassemble from some point behind that. (Probably the function start will be 16-byte aligned.)
Specific instructions.
MOVBLZX AL, AX reads AL, so that's definitely an 8-bit operand. The size for AX is given by the L part of the mnemonic, meaning long for 32 bit, like in GAS AT&T syntax. (The gas mnemonic for that form of movzx is movzbl %al, %eax). See What does cltq do in assembly? for a table of cdq / cdqe and the AT&T equivalent, and the AT&T / Intel mnemonic for the equivalent MOVSX instruction.
The NASM instruction you want is movzx eax, al. Using rax as the destination would be a waste of a REX prefix. Using ax as the destination would be a mistake: it wouldn't zero-extend into the full register, and would leave whatever high garbage. Go asm syntax for x86 is very confusing when you're not used to it, because AX can mean AX, EAX, or RAX depending on the operand size.
Obviously mov rax, al isn't a possibility: Like most instructions, mov requires both its operands to be the same size. movzx is one of the rare exceptions.
MOVB choice+16(FP), AL is a byte load into AL, not an immediate move. choice+16 is a an offset from FP. This syntax is basically the same as AT&T addressing modes, with FP as a register and choice as an assemble-time constant.
FP is a pseudo-register name. It's pretty clear that it should simply be loading the low byte of the 3rd arg-passing slot, because choice is the name of a function arg. (In Go asm, choice is just syntactic sugar, or a constant defined as zero.)
Before a call instruction, rsp points at the first stack arg, so that + 16 is the 3rd arg. It appears that FP is that base address (and might actually be rsp+8 or something). After a call (which pushes an 8 byte return address), the 3rd stack arg is at rsp + 24. After more pushes, the offset will be even larger, so adjust as necessary to reach the right location.
If you're porting this function to be called with a standard calling convention, the 3 integer args will be passed in registers, with no stack args. Which 3 registers depends on whether you're building for Windows vs. non-Windows. (See Agner Fog's calling conventions doc: http://agner.org/optimize/)
BTW, a byte load into AL and then movzx eax, al is just dumb. Much more efficient on all modern CPUs to do it in one step with
movzx eax, byte [rsp + 24] ; or rbp+32 if you made a stack frame.
I hope the source in the question is from un-optimized Go compiler output? Or the assembler itself makes such optimizations?
I think you can translate these as just
mov rbx, [reg_p1]
mov rcx, [reg_p2]
Unless I'm missing some subtlety, the offsets which are zero can just be ignored. The *8 isn't a size hint since that's already in the instruction.
The rest of your code looks wrong though. The MOVB choice+16(FP), AL in the original is supposed to be fetching the choice argument into AL, but you're setting AL to a constant 16, and the code for loading the other arguments seems to be completely missing, as is the code for all of the arguments in the other function.

Windows32 API: "mov edi,edi" on function entry?

I'm stepping through Structured Error Handling recovery code in Windows 7
(e.g, what happens after SEH handler is done and passes back "CONTINUE" code).
Here's a function which is called:
7783BD9F mov edi,edi
7783BDA1 push ebp
7783BDA2 mov ebp,esp
7783BDA4 push 1
7783BDA6 push dword ptr [ebp+0Ch]
7783BDA9 push dword ptr [ebp+8]
7783BDAC call 778692DF
7783BDB1 pop ebp
7783BDB2 ret 8
I'm used to the function prolog of "push ebp/mov ebp,esp". What's the purpose
of the "mov edi,edi"?
Raymond Chen (one of the Microsoft developers) has answered this exact question:
Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?
And he links an even earlier reference:
Why does the compiler generate a MOV EDI, EDI instruction at the beginning of functions?
Basically, it leaves space for a jump instruction to be inserted during hot patching.

What is the use of "push %ebp; movl %esp, %ebp" generated by GCC for x86?

What effect these two instructions cause in the assembly code generated by gcc for x86 machines:
push %ebp
movl %esp, %ebp
unwind's explanation is the literal truth (one minor directional error notwithstanding), but doesn't explain why.
%ebp is the "base pointer" for your stack frame. It's the pointer used by the C runtime to access local variables and parameters on the stack. Here's some typical function prologue code generated by GCC (g++ to be precise) First the C++ source.
// junk.c++
int addtwo(int a)
{
int x = 2;
return a + x;
}
This generates the following assembler.
.file "junk.c++"
.text
.globl _Z6addtwoi
.type _Z6addtwoi, #function
_Z6addtwoi:
.LFB2:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
subl $16, %esp
.LCFI2:
movl $2, -4(%ebp)
movl -4(%ebp), %edx
movl 8(%ebp), %eax
addl %edx, %eax
leave
ret
.LFE2:
.size _Z6addtwoi, .-_Z6addtwoi
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",#progbits
Now to explain that prologue code (all the stuff before .LCFI2:), first:
pushl %ebp stores the stack frame of the calling function on the stack.
movl %esp, %ebp takes the current stack pointer and uses it as the frame for the called function.
subl $16, %esp leaves room for local variables.
Now your function is ready for business. Any references with a negative offset from the %ebp% register are your local variables (x in this example). Any references with a positive offset from the %ebp% register are your parameters passed in.
The final point of interest is the leave instruction which is an x86 assembler instruction which does the work of restoring the calling function's stack frame. This is usually optimized away in to the faster move %ebp %esp and pop %ebp% sequence in C code. For illustrative purposes, however, I didn't compile with any optimizations on at all.
It's typical code that you see at the beginning of a function.
It saves the contents of the EBP register on the stack, and then stores the content of the current stack pointer in EBP.
The stack is used during a function call to store local arguments. But in the function, the stack pointer may change because values are stored on the stack.
If you save the original value of the stack, you can refer to the stored arguments via the EBP register, while you can still use (add values to) the stack.
At the end of the function you will probably see the command
pop %ebp ; restore original value
ret ; return
push %ebp
This will push the 32 bit (extended) base pointer register on the stack, i.e. the stack pointer (%esp) is subtracted by four, then the value of %ebp is copied to the location that the stack pointer points to.
movl %esp, %ebp
This copies the stack pointer register to the base pointer register.
The purpose of copying the stack pointer to the base pointer is to create a stack frame, i.e. an area on the stack where a subroutine can store local data. The code in the subroutine would use the base pointer to reference the data.
It's part of what is known as the function prolog.
It saves the current base pointer that is going to be retrieved when the function ends and sets the new ebp to the beginning of the new frame.
I also think that it is important to note that often after
push %ebp and
movl %esp, %ebp
the assembly will have push %ebx or push %edx. These are caller saves of the registers %ebx and %edx. At the end of the routine call the registers will be restored with their original values.
Also - %ebx, %esi, %edi are all callee save registers. So if you want to overwrite them you need to save them first, then restore them.
The piece of code sets up the stack for your program.
In x86 stack information is held by two registers.
Base pointer (bp): Holds starting address of the stack
Stack pointer (sp): Holds the address in which next value will be stored
These registers have different names in different modes:
Base pointer Stack pointer
16 bit real mode: bp sp
32 bit protected mode: ebp(%ebp) esp(%esp)
64 bit mode: rbp rsp
When you set up a stack, stack pointer and base pointer gets the same address at the beginning.
Now to explain your code,
push %ebp
This code pushes the current stack address into the stack so that the function can "exit" or "return" properly.
movl %esp, %ebp
This code sets up the stack for your function.
For more info, refer to this question.
Hope this help!

Resources