I was trying to implement a stack tracer, using stack pointers; RSP and RBP, but I think debuggers use an entirely different way to grab the return addresses, or maybe I am missing something. I can grab the return address of the last stack frame, but I can't get the others because I don't know the size of other stack frames, so I can't figure out how much bytes should I go back from stack frame, to get the return address. Are there anybody know which way do debuggers use to trace stack?
It is possible to trace the stack when the code uses frame pointers. In this case ebp/rbp is used as the frame pointer and functions begin with prologs and end with epilogs.
A typical prolog looks like this:
push rbp ; save previous frame pointer
mov rbp, rsp ; initialize this functions frame pointer
A typical epilog looks like this:
mov rsp, rbp ; restore the value of rsp
pop rbp ; restore previous frame pointer value from stack
retn
Thus in every place in a function rbp points to the stack position where the previous frame pointer is saved and rbp+8 contains the saved return address.
To get the called function a debugger should read [rbp+8] value and find a function to which this address belongs. This can be done by searching in debugging symbols.
Next it should read [rbp] value to get the frame pointer of the caller function. Continue this process until you find a toplevel function. This is typically a system library function that starts threads.
Related
Writing x64 Assembly code using MASM, we can use these directives to provide frame unwinding information. For example, from .SETFRAME definition:
These directives do not generate code; they only generate .xdata and .pdata.
Since these directives don't produce any code, I cannot see their effects in Disassembly window. So, I don't see any difference, when I write assembly function with or without these directives. How can I see the result of these directives - using dumpbin or something else?
How to write code that can test this unwinding capability? For example, I intentionally write assembly code that causes an exception. I want to see the difference in exception handling behavior, when function is written with or without these directives.
In my case caller is written in C++, and can use try-catch, SSE etc. - whatever is relevant for this situation.
Answering your question:
How can I see the result of these directives - using dumpbin or something else?
You can use dumpbin /UNWINDINFO out.exe to see the additions to the .pdata resulting from your use of .SETFRAME.
The output will look something like the following:
00000054 00001530 00001541 000C2070
Unwind version: 1
Unwind flags: None
Size of prologue: 0x04
Count of codes: 2
Frame register: rbp
Frame offset: 0x0
Unwind codes:
04: SET_FPREG, register=rbp, offset=0x00
01: PUSH_NONVOL, register=rbp
A bit of explanation to the output:
The second hex number found in the output is the function address 00001530
Unwind codes express what happens in the function prolog. In the example what happens is:
RBP is pushed to the stack
RBP is used as the frame pointer
Other functions may look like the following:
000000D8 000016D0 0000178A 000C20E4
Unwind version: 1
Unwind flags: EHANDLER UHANDLER
Size of prologue: 0x05
Count of codes: 2
Unwind codes:
05: ALLOC_SMALL, size=0x20
01: PUSH_NONVOL, register=rbx
Handler: 000A2A50
One of the main differences here is that this function has an exception handler. This is indicated by the Unwind flags: EHANDLER UHANDLER as well as the Handler: 000A2A50.
Probably your best bet is to have your asm function call another C++ function, and have your C++ function throw a C++ exception. Ideally have the code there depend on multiple values in call-preserved registers, so you can make sure they get restored. But just having unwinding find the right return addresses to get back into your caller requires correct metadata to indicate where that is relative to RSP, for any given RIP.
So create a situation where a C++ exception needs to unwind the stack through your asm function; if it works then you got the stack-unwind metadata directives correct. Specifically, try{}catch in the C++ caller, and throw in a C++ function you call from asm.
That thrower can I think be extern "C" so you can call it from asm without name mangling. Or call it via a function pointer, or just look at MSVC compiler output and copy the mangled name into asm.
Apparently Windows SEH uses the same mechanism as plain C++ exceptions, so you could potentially set up a catch for the exception delivered by the kernel in response to a memory fault from something like mov ds:[0], eax (null deref). You could put this at any point in your function to make sure the exception unwind info was correct about the stack state at every point, not just getting back into sync before a function-call.
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170&viewFallbackFrom=vs-2019 has details about the metadata.
BTW, the non-Windows (e.g. GNU/Linux) equivalent of this metadata is DWARF .cfi directives which create a .eh_frame section.
I don't know equivalent details for Windows, but I do know they use similar metadata that makes it possible to unwind the stack without relying on RBP frame pointers. This lets compilers make optimized code that doesn't waste instructions on push rbp / mov rbp,rsp and leave in function prologues/epilogues, and frees up RBP for use as a general-purpose register. (Even more useful in 32-bit code where 7 instead of 6 registers besides the stack pointer is a much bigger deal than 15 vs. 14.)
The idea is that given a RIP, you can look up the offset from RSP to the return address on the stack, and the locations of any call-preserved registers. So you can restore them and continue unwinding into the parent using that return address.
The metadata indicates where each register was saved, relative to RSP or RBP, given the current RIP as a search key. In functions that use an RBP frame pointer, one piece of metadata can indicate that. (Other metadata for each push rbx / push r12 says which call-preserved regs were saved in which order).
In functions that don't use RBP as a frame pointer, every push / pop or sub/add RSP needs metadata for which RIP it happened at, so given a RIP, stack unwinding can see where the return address is, and where those saved call-preserved registers are. (Functions that use alloca or VLAs thus must use RBP as a frame pointer.)
This is the big-picture problem that the metadata has to solve. There are a lot of details, and it's much easier to leave things up to a compiler!
If I am using the Win32 entry point and I have the following code (in NASM):
extern _ExitProcess#4
global _start
section .text
_start:
mov ebp, esp
; Reserve space onto the stack for two 4 bytes variables
sub esp, 4
sub esp, 4
; ExitProcess(0)
push 0
call _ExitProcess#4
Now before exiting the process, should I increase the esp value to remove the two variables from the stack like I do with any "normal" function?
ExitProcess api can be called from any place. in any function and sub-function. and stack pointer of course can be any. you not need set any registers (include stack pointer) to some (and which ?) values. so answer - you not need increase the esp
as noted #HarryJohnston of course stack must be valid and aligned. as and before any api call. ExiProcess is usual api. and can be call as any another api. and like any another api it require only valid stack but not concrete stack pointer value. non-volatile registers need restore only we return to caller. but ExiProcess not return to caller. it at all never return
so rule is very simply - if you return from any function (entry point or absolute any - does not matter) - we need restore non volatile registers (stack pointer esp or rsp based on calling conventions) and return. if we not return to caller - we and not need restore/preserve any registers. if we return from thread or process entry point, despite good practice also restore all registers as well - in current windows implementations - even if we not do this, any way all will be work, because kernel32 shell caller simply just call ExitThread after we return. it not use any non volatile registers or local variables here. so code will be worked even without restore this from entry point, but much better restore it anyway
As suggested to me in another question i checked the windows ABI and i'm left a little confused about what i can and cannot do if i'm not calling windows API myself.
My scenario is i'm programming .NET and need a small chunk of code in asm targeting a specific processor for a time critical section of code that does heavy multi pass processing on an array.
When checking the register information in the ABI at https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
I'm left a little confused about what applies to me if i
1) Don't call the windows API from the asm code
2) Don't return a value and take a single parameter.
Here is what i understand, am i getting all of it right?
RAX : i can overwrite this without preserving it as the function doesn't expect a return value
RCX : I need to preserve this as this is where the single int parameter will be passed, then i can overwrite it and not restore it
RDX/R8/R9 : Should not be initialized as there are no such parameters in my method, i can overwrite those and not restore them
R10/R11 : I can overwrite those without saving them, if the caller needs it he is in charge of preserving them
R12/R13/R14/R15/RDI/RSI/RBX : I can overwrite them but i first need to save them (or can i just not save them if i'm not calling the windows API?)
RBP/RSP : I'm assuming i shouldn't touch those?
If so am i correct that this is the right way to handle this (if i don't care about the time taking to preserve data and need as many registers available as possible)? Or is there a way to use even more registers?
; save required registers
push r12
push r13
push r14
push r15
push rdi
push rsi
push rbx
; my own array processing code here, using rax as the memory address passed as the first parameter
; safe to use rax rbx rcx rdx r8 r9 r10 r11 r12 r13 r14 r15 rdi rsi giving me 14 64bit registers
; 1 for the array address 13 for processing
; should not touch rbp rsp
; restore required registers
pop rbx
pop rsi
pop rdi
pop r15
pop r14
pop r13
pop r12
TL;DR: if you need registers that are marked preserved, push/pop them in proper order. With your code you can use those 14 registers you mention without issues. You may touch RBP if you preserve it, but don't touch RSP basically ever.
It does matter if you call Windows APIs but not in the way I assume you think. The ABI says what registers you must preserve. The preservation information means that the caller knows that there are registers you will not change. You don't need to call any Windows API functions for that requirement to be there.
The idea as an analogue (yeah, I know...): Here are five different colored stacks of sticky notes. You can use any of them, but if you need the red or the blue ones, could you keep the top one in a safe place and put it back when you stop since I need the phone numbers on them. About the other colors I don't care, they were just scratch paper and I've written the information elsewhere.
So if you call an external function you know that no function will ever change the value of the registers marked as preserved. Any other register may change their values and you have to make sure you don't have anything there that needs to be preserved.
And when your function is called, the caller expects the same: if they put a value in a preserved register, it will have the same value after the call. But any non-preserved registers may be whatever and they will make sure they store those values if they need to keep them.
The return value register you may use however you want. If the function doesn't return a value the caller must not expect it to have any specific value and also will not expect it to preserve its value.
You only need to preserve the registers you use. If you don't use all of these, you don't need to preserve all of them.
You can freely use RAX, RCX, RDX, R8, R9, R10 and R11. The latter two must be preserved by the caller, if necessary, not by your function.
Most of the time, these registers (or their subregisters like EAX) are enough for my purposes. I hardly ever need more.
Of course, if any of these (e.g. RCX) contain arguments for your function, it is up to you to preserve them for yourself as long as you need them. How you do that is also up to you. But if you push them, make sure that there is a corresponding pop somewhere.
Use This MSDN page as a guide.
In the context of exploit development, I end up with this stack:
ESP 5d091399 Return address -- points to a RETN in a DLL
ESP+4 42424242 ASCII 'BBBB' -- labelled "junk" in my tutorial
ESP+8 5209398c Beginning of ROP chain and shellcode
...
In the stack above, the return address has been overwritten with a pointer to a RETN instruction in a DLL. The effect of this is that the stack goes from ESP to ESP+8, where the ROP chain starts.
ESP+4 is never at the top of the stack, hence it's label "junk" and the fact that it can contain any garbage bytes we want. It will have no effect on the execution of the exploit.
How does a return address pointing to a RETN result in the top of the stack going from ESP to ESP+8.
(Or why are the four middle bytes unimportant?)
Note:
The pointer in ESP is definitely to a single RETN on it's own. It is not preceded by a POP.
EDIT:
This tutorial also shows the "junk" (scroll down a tiny bit until exploit code).
This is the same context as my other tutorial (which is on paper, can't link)
Treat this more as pseudocode than anything. If there's some macro or other element that you feel should be included, let me know.
I'm rather new to assembly. I programmed on a pic processor back in college, but nothing since.
The problem here (segmentation fault) is the first instruction after "Compile function entrance, setup stack frame." or "push %ebp". Here's what I found out about those two instructions:
http://unixwiz.net/techtips/win32-callconv-asm.html
Save and update the %ebp :
Now that we're in the new function, we need a new local stack frame pointed to by %ebp, so this is done by saving the current %ebp (which belongs to the previous function's frame) and making it point to the top of the stack.
push ebp
mov ebp, esp // ebp « esp
Once %ebp has been changed, it can now refer directly to the function's arguments as 8(%ebp), 12(%ebp). Note that 0(%ebp) is the old base pointer and 4(%ebp) is the old instruction pointer.
Here's the code. This is from a JIT compiler for a project I'm working on. I'm doing this more for the learning experience than anything.
IL_CORE_COMPILE(avs_x86_compiler_compile)
{
X86GlobalData *gd = X86_GLOBALDATA(ctx);
ILInstruction *insn;
avs_debug(print("X86: Compiling started..."));
/* Initialize X86 Assembler opcode context */
x86_context_init(&gd->ctx, 4096, 1024*1024);
/* Compile function entrance, setup stack frame*/
x86_emit1(&gd->ctx, pushl, ebp);
x86_emit2(&gd->ctx, movl, esp, ebp);
/* Setup floating point rounding mode to integer truncation */
x86_emit2(&gd->ctx, subl, imm(8), esp);
x86_emit1(&gd->ctx, fstcw, disp(0, esp));
x86_emit2(&gd->ctx, movl, disp(0, esp), eax);
x86_emit2(&gd->ctx, orl, imm(0xc00), eax);
x86_emit2(&gd->ctx, movl, eax, disp(4, esp));
x86_emit1(&gd->ctx, fldcw, disp(4, esp));
for (insn=avs_il_tree_base(tree); insn != NULL; insn = insn->next) {
avs_debug(print("X86: Compiling instruction: %p", insn));
compile_opcode(gd, obj, insn);
}
/* Restore floating point rounding mode */
x86_emit1(&gd->ctx, fldcw, disp(0, esp));
x86_emit2(&gd->ctx, addl, imm(8), esp);
/* Cleanup stack frame */
x86_emit0(&gd->ctx, emms);
x86_emit0(&gd->ctx, leave);
x86_emit0(&gd->ctx, ret);
/* Link machine */
obj->run = (AvsRunnableExecuteCall) gd->ctx.buf;
return 0;
}
And when obj->run is called, it's called with obj as its only argument:
obj->run(obj);
If it helps, here are the instructions for the entire function call. It's basically an assignment operation: foo=3*0.2;. foo is pointing to a float in C.
0x8067990: push %ebp
0x8067991: mov %esp,%ebp
0x8067993: sub $0x8,%esp
0x8067999: fnstcw (%esp)
0x806799c: mov (%esp),%eax
0x806799f: or $0xc00,%eax
0x80679a4: mov %eax,0x4(%esp)
0x80679a8: fldcw 0x4(%esp)
0x80679ac: flds 0x806793c
0x80679b2: fsts 0x805f014
0x80679b8: fstps 0x8067954
0x80679be: fldcw (%esp)
0x80679c1: add $0x8,%esp
0x80679c7: emms
0x80679c9: leave
0x80679ca: ret
Edit: Like I said above, in the first instruction in this function, %ebp is void. This is also the instruction that causes the segmentation fault. Is that because it's void, or am I looking for something else?
Edit: Scratch that. I keep typing edp instead of ebp. Here are the values of ebp and esp.
(gdb) print $esp
$1 = (void *) 0xbffff14c
(gdb) print $ebp
$3 = (void *) 0xbffff168
Edit: Those values above are wrong. I should have used the 'x' command, like below:
(gdb) x/x $ebp
0xbffff168: 0xbffff188
(gdb) x/x $esp
0xbffff14c: 0x0804e481
Here's a reply from someone on a mailing list regarding this. Anyone care to illuminate what he means a bit? How do I check to see how the stack is set up?
An immediate problem I see is that the
stack pointer is not properly aligned.
This is 32-bit code, and the Intel
manual says that the stack should be
aligned at 32-bit addresses. That is,
the least significant digit in esp
should be 0, 4, 8, or c.
I also note that the values in ebp and
esp are very far apart. Typically,
they contain similar values --
addresses somewhere in the stack.
I would look at how the stack was set
up in this program.
He replied with corrections to the above comments. He was unable to see any problems after further input.
Another edit: Someone replied that the code page may not be marked executable. How can I insure it is marked as such?
The problem had nothing to do with the code. Adding -z execstack to the linker fixed the problem.
If push %ebp is causing a segfault, then your stack pointer isn't pointing at valid stack. How does control reach that point? What platform are you on, and is there anything odd about the runtime environment? At the entry to the function, %esp should point to the return address in the caller on the stack. Does it?
Aside from that, the whole function is pretty weird. You go out of your way to set the rounding bits in the fp control word, and then don't perform any operations that are affected by rounding. All the function does is copy some data, but uses floating-point registers to do it when you could use the integer registers just as well. And then there's the spurious emms, which you need after using MMX instructions, not after doing x87 computations.
Edit See Scott's (the original questioner) answer for the actual reason for the crash.