If I am using the Win32 entry point, should I increase the esp value to remove the variables from the stack? - winapi

If I am using the Win32 entry point and I have the following code (in NASM):
extern _ExitProcess@4
global _start
section .text
_start:
mov ebp, esp
; Reserve space on the stack for two 4-byte variables
sub esp, 4
sub esp, 4
; ExitProcess(0)
push 0
call _ExitProcess@4
Now before exiting the process, should I increase the esp value to remove the two variables from the stack like I do with any "normal" function?

ExitProcess can be called from anywhere, in any function or sub-function, and the stack pointer can have any value at that point. You do not need to set any registers (including the stack pointer) to particular values beforehand. So the answer is no: you do not need to increase esp.
As @HarryJohnston noted, the stack must of course be valid and aligned, as it must be before any API call. ExitProcess is an ordinary API and can be called like any other: it requires only a valid stack, not a specific stack pointer value. Non-volatile registers need to be restored only when you return to the caller, and ExitProcess does not return to its caller. It never returns at all.
So the rule is simple: if you return from a function (the entry point or any other, it does not matter), you must restore the non-volatile registers (including the stack pointer, esp or rsp depending on the calling convention) and then return. If you do not return to the caller, you do not need to restore or preserve any registers. Returning from a thread or process entry point is a special case. It is good practice to restore all registers there as well, but in current Windows implementations everything still works if you do not (the stack pointer itself has to be correct for ret to find the return address), because the kernel32 stub that called the entry point simply calls ExitThread after you return. It does not use any non-volatile registers or local variables at that point. So the code works even without restoring them at the entry point, but it is much better to restore them anyway.

Related

How debuggers can trace the stack?

I was trying to implement a stack tracer using the stack registers RSP and RBP, but I think debuggers use an entirely different way to grab the return addresses, or maybe I am missing something. I can grab the return address of the last stack frame, but I can't get the others because I don't know the size of the other stack frames, so I can't figure out how many bytes I should go back from a stack frame to reach its return address. Does anybody know how debuggers trace the stack?
It is possible to trace the stack when the code uses frame pointers. In this case ebp/rbp is used as the frame pointer and functions begin with prologs and end with epilogs.
A typical prolog looks like this:
push rbp ; save previous frame pointer
mov rbp, rsp ; initialize this functions frame pointer
A typical epilog looks like this:
mov rsp, rbp ; restore the value of rsp
pop rbp ; restore previous frame pointer value from stack
retn
Thus, anywhere in a function after its prolog has run, rbp points to the stack slot where the previous frame pointer is saved, and rbp+8 holds the saved return address.
To identify the calling function, a debugger reads the value at [rbp+8] and finds the function that this address belongs to. This can be done by searching the debugging symbols.
Next it reads the value at [rbp] to get the caller's frame pointer. It repeats this process until it reaches a top-level function, which is typically a system library function that starts threads.
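As a rough illustration of the walk described above, here is a toy model in Python rather than a real debugger. Memory is a plain dictionary, every address is made up, and stopping when the saved frame pointer is 0 is an assumption of this sketch (real debuggers use other stop conditions and symbol lookups).

```python
# Toy model: memory maps an address to the 8-byte value stored there.
# All addresses below are invented for illustration.
def walk_frames(memory, rbp, max_frames=64):
    """Follow the rbp chain and collect the saved return addresses."""
    return_addresses = []
    while rbp != 0 and len(return_addresses) < max_frames:
        return_addresses.append(memory[rbp + 8])  # saved return address
        rbp = memory[rbp]                         # caller's saved frame pointer
    return return_addresses

# Three nested frames: main -> f -> g (g is the innermost, lowest address).
memory = {
    0x7000: 0x7020, 0x7008: 0x401234,  # g: saved rbp of f, returns into f
    0x7020: 0x7040, 0x7028: 0x401100,  # f: saved rbp of main, returns into main
    0x7040: 0x0,    0x7048: 0x400F00,  # main: end of chain (assumed marker 0)
}

trace = walk_frames(memory, rbp=0x7000)
print([hex(address) for address in trace])
```

Each collected address would then be looked up in the debugging symbols to name the function it falls in.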

How to see result of MASM directives such as PROC, .SETFRAME, .PUSHREG

Writing x64 Assembly code using MASM, we can use these directives to provide frame unwinding information. For example, from .SETFRAME definition:
These directives do not generate code; they only generate .xdata and .pdata.
Since these directives don't produce any code, I cannot see their effects in Disassembly window. So, I don't see any difference, when I write assembly function with or without these directives. How can I see the result of these directives - using dumpbin or something else?
How to write code that can test this unwinding capability? For example, I intentionally write assembly code that causes an exception. I want to see the difference in exception handling behavior, when function is written with or without these directives.
In my case caller is written in C++, and can use try-catch, SSE etc. - whatever is relevant for this situation.
Answering your question:
How can I see the result of these directives - using dumpbin or something else?
You can use dumpbin /UNWINDINFO out.exe to see the additions to the .pdata resulting from your use of .SETFRAME.
The output will look something like the following:
00000054 00001530 00001541 000C2070
Unwind version: 1
Unwind flags: None
Size of prologue: 0x04
Count of codes: 2
Frame register: rbp
Frame offset: 0x0
Unwind codes:
04: SET_FPREG, register=rbp, offset=0x00
01: PUSH_NONVOL, register=rbp
A bit of explanation to the output:
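The second hex number in the first line of each entry (00001530 here) is the function's start address; the third (00001541) is its end address and the fourth (000C2070) is the address of its unwind info.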
Unwind codes express what happens in the function prolog. In the example what happens is:
RBP is pushed to the stack
RBP is used as the frame pointer
Other functions may look like the following:
000000D8 000016D0 0000178A 000C20E4
Unwind version: 1
Unwind flags: EHANDLER UHANDLER
Size of prologue: 0x05
Count of codes: 2
Unwind codes:
05: ALLOC_SMALL, size=0x20
01: PUSH_NONVOL, register=rbx
Handler: 000A2A50
One of the main differences here is that this function has an exception handler. This is indicated by the Unwind flags: EHANDLER UHANDLER as well as the Handler: 000A2A50.
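To make the "Unwind codes" lines less opaque, here is a small Python sketch that decodes the 2-byte unwind code slots described in Microsoft's x64 exception handling documentation. It handles only the three operations that appear in the listings above, and the input bytes are hand-assembled to match those listings (an assumption; they were not read from a real binary).

```python
# Register numbers and operation codes follow the documented x64 unwind format.
# Only PUSH_NONVOL, ALLOC_SMALL and SET_FPREG are handled in this sketch.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def decode_unwind_codes(raw, frame_reg="rbp"):
    """Decode 2-byte unwind code slots into dumpbin-like text lines."""
    lines = []
    for i in range(0, len(raw), 2):
        offset, op_byte = raw[i], raw[i + 1]
        op, info = op_byte & 0x0F, op_byte >> 4
        if op == 0:    # UWOP_PUSH_NONVOL: info is the register number
            text = f"PUSH_NONVOL, register={REGS[info]}"
        elif op == 2:  # UWOP_ALLOC_SMALL: allocation size is info * 8 + 8
            text = f"ALLOC_SMALL, size=0x{info * 8 + 8:X}"
        elif op == 3:  # UWOP_SET_FPREG: register is named in the header
            text = f"SET_FPREG, register={frame_reg}"
        else:
            raise ValueError(f"operation {op} is not handled in this sketch")
        lines.append(f"{offset:02X}: {text}")
    return lines

# Hand-assembled bytes (assumed) for the two functions shown above.
first = decode_unwind_codes(bytes([0x04, 0x03, 0x01, 0x50]))
second = decode_unwind_codes(bytes([0x05, 0x32, 0x01, 0x30]))
print("\n".join(first + second))
```

The first byte of each slot is the prolog offset just after the instruction it describes, which is why the codes are listed from the end of the prolog backwards. Operations that occupy more than one slot would need extra handling.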
Probably your best bet is to have your asm function call another C++ function, and have your C++ function throw a C++ exception. Ideally have the code there depend on multiple values in call-preserved registers, so you can make sure they get restored. But just having unwinding find the right return addresses to get back into your caller requires correct metadata to indicate where that is relative to RSP, for any given RIP.
So create a situation where a C++ exception needs to unwind the stack through your asm function; if it works then you got the stack-unwind metadata directives correct. Specifically, try{}catch in the C++ caller, and throw in a C++ function you call from asm.
That thrower can I think be extern "C" so you can call it from asm without name mangling. Or call it via a function pointer, or just look at MSVC compiler output and copy the mangled name into asm.
Apparently Windows SEH uses the same mechanism as plain C++ exceptions, so you could potentially set up a catch for the exception delivered by the kernel in response to a memory fault from something like mov ds:[0], eax (null deref). You could put this at any point in your function to make sure the exception unwind info was correct about the stack state at every point, not just getting back into sync before a function-call.
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170&viewFallbackFrom=vs-2019 has details about the metadata.
BTW, the non-Windows (e.g. GNU/Linux) equivalent of this metadata is DWARF .cfi directives which create a .eh_frame section.
I don't know equivalent details for Windows, but I do know they use similar metadata that makes it possible to unwind the stack without relying on RBP frame pointers. This lets compilers make optimized code that doesn't waste instructions on push rbp / mov rbp,rsp and leave in function prologues/epilogues, and frees up RBP for use as a general-purpose register. (Even more useful in 32-bit code where 7 instead of 6 registers besides the stack pointer is a much bigger deal than 15 vs. 14.)
The idea is that given a RIP, you can look up the offset from RSP to the return address on the stack, and the locations of any call-preserved registers. So you can restore them and continue unwinding into the parent using that return address.
The metadata indicates where each register was saved, relative to RSP or RBP, given the current RIP as a search key. In functions that use an RBP frame pointer, one piece of metadata can indicate that. (Other metadata for each push rbx / push r12 says which call-preserved regs were saved in which order).
In functions that don't use RBP as a frame pointer, every push / pop or sub/add RSP needs metadata for which RIP it happened at, so given a RIP, stack unwinding can see where the return address is, and where those saved call-preserved registers are. (Functions that use alloca or VLAs thus must use RBP as a frame pointer.)
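The "RIP as a search key" idea can be sketched with a tiny lookup table. This is a conceptual model only, not the Windows or DWARF encoding; the function layout (push rbx, push r12, sub rsp, 0x28) and the offsets are invented for illustration.

```python
import bisect

# Each row: (code offset from which this state holds,
#            distance in bytes from RSP to the return address).
# Hypothetical prolog: push rbx (1 byte), push r12 (2 bytes), sub rsp, 0x28 (4 bytes).
UNWIND_ROWS = [(0x00, 0), (0x01, 8), (0x03, 16), (0x07, 56)]

def return_address_offset(rip_offset):
    """Find the RSP-relative location of the return address for a given RIP."""
    starts = [start for start, _ in UNWIND_ROWS]
    index = bisect.bisect_right(starts, rip_offset) - 1
    return UNWIND_ROWS[index][1]

for rip_offset in (0x00, 0x02, 0x07, 0x40):
    print(hex(rip_offset), return_address_offset(rip_offset))
```

An unwinder would read the return address at RSP plus that distance, restore the saved registers recorded for the same range, and repeat the lookup in the caller.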
This is the big-picture problem that the metadata has to solve. There are a lot of details, and it's much easier to leave things up to a compiler!

Printing a string in x86 Assembly on Mac OS X (NASM)

I'm doing x86 on Mac OS X with NASM. Copying an example and experimenting, I noticed that my print command needed four extra bytes pushed onto the stack after the other parameters, but I can't figure out why line five is necessary:
1 push dword len ;Length of message
2 push dword msg ;Message to write
3 push dword 1 ;STDOUT
4 mov eax,4 ;Command code for 'writing'
5 sub esp,4 ;<<< Effectively 'push' Without this the print breaks
6 int 0x80 ;SYSCALL
7 add esp,16 ;Functionally 'pop' everything off the stack
I am having trouble finding any documentation on this 'push the parameters to the stack' syntax that NASM/OS X seems to require. If anyone can point me to a resource for that in general that would most likely answer this question as well.
(Most of the credit goes to @Michael Petch's comment; I'm repeating it here so that it is an answer, and also in order to further clarify the reason for the additional four bytes on the stack.)
macOS is based on BSD, and, as per FreeBSD's documentation re system calls, by default the kernel uses the C calling conventions (which means arguments are pushed to the stack, from last to first), but assuming four extra bytes pushed to the stack, as "it is assumed the program will call a function that issues int 80h, rather than issuing int 80h directly".
That is, the kernel is not built for direct int 80h calls, but rather for code that looks like this:
kernel: ; subroutine to make system calls
int 80h
ret
.
.
.
; code that makes a system call
call kernel ; instead of invoking int 80h directly
Notice that call kernel would push the return address (used by the kernel subroutine's ret to return to calling code after the system call) onto the stack, accounting for four additional bytes – that's why it's necessary to manually push four bytes to the stack (any four bytes – their actual value doesn't matter, as it is ignored by the kernel – so one way to achieve this is sub esp, 4) when invoking int 80h directly.
The reason the kernel expects this behaviour – of calling a method which invokes the interrupt instead of invoking it directly – is that when writing code that can be run on multiple platforms it's then only needed to provide a different version of the kernel subroutine, rather than of every place where a system call is invoked (more details and examples in the link above).
Note: all the above is for 32-bit; for 64-bit the calling conventions are different – registers are used to pass the arguments rather than the stack (there's also a call convention for 32-bit which uses registers, but even then it's not the same registers), the syscall instruction is used instead of int 80h, and no extra four bytes (which, on 64-bit systems, would actually be eight bytes) need to be pushed.
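The effect of the extra four bytes in the 32-bit case can be seen in a small Python model of the stack. It is only a sketch: the values are made up, and the "kernel" here is a one-line stand-in that skips the top slot the way the convention described above does.

```python
def push(stack, value):
    """Push one 4-byte slot; stack[0] is the top of the stack."""
    stack.insert(0, value)

def kernel_reads_args(stack, count):
    """Skip slot 0 (assumed to be a return address) and read the arguments."""
    return stack[1:1 + count]

LEN, MSG, STDOUT = 13, 0x2000, 1      # made-up length, buffer address, fd

with_pad = []
for value in (LEN, MSG, STDOUT):      # arguments pushed last to first
    push(with_pad, value)
push(with_pad, 0xDEADBEEF)            # stands in for `sub esp, 4`

without_pad = [0x11111111]            # whatever was already on the stack
for value in (LEN, MSG, STDOUT):
    push(without_pad, value)

print(kernel_reads_args(with_pad, 3))     # fd, buffer, length as intended
print(kernel_reads_args(without_pad, 3))  # every argument shifted by one slot
```

Without the padding slot the kernel takes the buffer address as the file descriptor and the length as the buffer, which is why the print breaks.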

How is the stack set up in a Windows function call?

To begin, I'd like to say I have sufficient background in assembly to understand most of what one needs to know to be a functional assembly programmer. Unfortunately I do not understand how a Windows API call works in terms of the return address.
Here's some example code written in GAS assembly for Windows using MinGW's as as the assembler and MinGW's ld as the linker...
.extern _ExitProcess@4
.text
.globl _main
_main:
pushl $0
call _ExitProcess@4
This code compiles and runs after assembling...
as program.s -o program.o
And linking it...
ld program.o -o program.exe -lkernel32
From my understanding, Windows API calls take arguments via push instructions, as can be seen above. Then during the call;
call _ExitProcess@4
the return address for the function is placed on the stack. Then, and this is where I'm confused, the function pops all the arguments off the stack.
I am confused because, since the stack is last in first out, in my mind while popping the arguments on the stack it would pop off the return address first. The arguments came first and the return address came next so it would technically be popped off first.
My question is, what does the layout of the stack look like after the arguments have been passed via push operations and the return address has been placed on the stack? How are the arguments and the return address popped off the stack by the function as it executes? And finally, how is the return address popped off the stack so that the function call returns to the address it specifies?
Almost all Windows API functions use the stdcall calling convention. This works like the normal "cdecl" convention, except that, as you've seen, the called function is responsible for removing the arguments when it returns. It does this using the RET instruction, which takes an optional immediate operand. This operand is the number of bytes to pop off the stack after first popping off the return address.
In both the cdecl and stdcall calling conventions the arguments to a function aren't popped off the stack while the function is executing. They're left on the stack and accessed using ESP- or EBP-relative addressing. So when ExitProcess needs to access its argument it uses an instruction like mov 4(%esp), %eax at function entry or, once it has set up an EBP frame with push %ebp / mov %esp, %ebp, an instruction like mov 8(%ebp), %eax.
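The order of events can be traced with a minimal Python model of a 32-bit stack. It models a generic stdcall function with one argument that does return (ExitProcess itself never does), and the addresses are made up.

```python
class Stack:
    """Toy 32-bit stack: esp moves down by 4 on push and up by 4 on pop."""
    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

def ret(stack, imm=0):
    """Model of `ret imm`: pop the return address, then discard imm bytes."""
    target = stack.pop()
    stack.esp += imm
    return target

s = Stack()
start_esp = s.esp
s.push(0)                      # pushl $0: the single argument
s.push(0x401005)               # call: pushes the return address (made up)
argument = s.mem[s.esp + 4]    # callee reads its argument at 4(%esp)
target = ret(s, 4)             # stdcall callee ends with `ret 4`
print(argument, hex(target), s.esp == start_esp)
```

The return address sits on top of the arguments, so RET removes it first and only then skips over the argument bytes, leaving the stack pointer where it was before the push.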

Windows 64 ABI, correct register use if I do NOT call the Windows API?

As suggested to me in another question, I checked the Windows ABI and I'm left a little confused about what I can and cannot do if I'm not calling the Windows API myself.
My scenario is that I'm programming in .NET and need a small chunk of asm code targeting a specific processor for a time-critical section that does heavy multi-pass processing on an array.
When checking the register information in the ABI at https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
I'm left a little confused about what applies to me if I
1) Don't call the Windows API from the asm code
2) Don't return a value and take a single parameter.
Here is what I understand; am I getting all of it right?
RAX: I can overwrite this without preserving it, as the function doesn't return a value
RCX: I need to preserve this, as this is where the single int parameter will be passed; after that I can overwrite it and not restore it
RDX/R8/R9: These will not be initialized, as my method has no such parameters; I can overwrite them and not restore them
R10/R11: I can overwrite these without saving them; if the caller needs them, the caller is in charge of preserving them
R12/R13/R14/R15/RDI/RSI/RBX: I can overwrite them, but I first need to save them (or can I just not save them if I'm not calling the Windows API?)
RBP/RSP: I'm assuming I shouldn't touch those?
If so, am I correct that this is the right way to handle this (if I don't care about the time taken to preserve data and need as many registers available as possible)? Or is there a way to use even more registers?
; save required registers
push r12
push r13
push r14
push r15
push rdi
push rsi
push rbx
; my own array processing code here, using rcx as the memory address passed as the first parameter
; safe to use rax rbx rcx rdx r8 r9 r10 r11 r12 r13 r14 r15 rdi rsi giving me 14 64bit registers
; 1 for the array address 13 for processing
; should not touch rbp rsp
; restore required registers
pop rbx
pop rsi
pop rdi
pop r15
pop r14
pop r13
pop r12
TL;DR: if you need registers that are marked preserved, push/pop them in proper order. With your code you can use those 14 registers you mention without issues. You may touch RBP if you preserve it, but don't touch RSP basically ever.
It does matter if you call Windows APIs but not in the way I assume you think. The ABI says what registers you must preserve. The preservation information means that the caller knows that there are registers you will not change. You don't need to call any Windows API functions for that requirement to be there.
The idea as an analogy (yeah, I know...): Here are five different colored stacks of sticky notes. You can use any of them, but if you need the red or the blue ones, could you keep the top one in a safe place and put it back when you stop since I need the phone numbers on them. About the other colors I don't care, they were just scratch paper and I've written the information elsewhere.
So if you call an external function you know that no function will ever change the value of the registers marked as preserved. Any other register may change their values and you have to make sure you don't have anything there that needs to be preserved.
And when your function is called, the caller expects the same: if they put a value in a preserved register, it will have the same value after the call. But any non-preserved registers may be whatever and they will make sure they store those values if they need to keep them.
The return value register you may use however you want. If the function doesn't return a value the caller must not expect it to have any specific value and also will not expect it to preserve its value.
You only need to preserve the registers you use. If you don't use all of these, you don't need to preserve all of them.
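As a quick way to apply that rule, the following Python sketch takes the set of registers a routine uses and lists the pushes and the matching pops in reverse order. It covers only the general-purpose registers that can be saved with push/pop; RSP and the non-volatile XMM registers are deliberately left out, and the function name is hypothetical.

```python
# General-purpose registers the Windows x64 convention marks as preserved
# (RSP is also preserved, but it is not saved with push/pop like these).
NONVOLATILE = ["rbx", "rbp", "rdi", "rsi", "r12", "r13", "r14", "r15"]

def save_restore_plan(used):
    """Return (pushes, pops) for the preserved registers in `used`."""
    to_save = [reg for reg in NONVOLATILE if reg in used]
    pushes = [f"push {reg}" for reg in to_save]
    pops = [f"pop {reg}" for reg in reversed(to_save)]  # reverse order
    return pushes, pops

pushes, pops = save_restore_plan({"rax", "rcx", "r10", "rbx", "r12"})
print(pushes)
print(pops)
```

Here rax, rcx and r10 are scratch registers and need no saving, so only rbx and r12 show up in the plan.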
You can freely use RAX, RCX, RDX, R8, R9, R10 and R11. The latter two must be preserved by the caller, if necessary, not by your function.
Most of the time, these registers (or their subregisters like EAX) are enough for my purposes. I hardly ever need more.
Of course, if any of these (e.g. RCX) contain arguments for your function, it is up to you to preserve them for yourself as long as you need them. How you do that is also up to you. But if you push them, make sure that there is a corresponding pop somewhere.
Use this MSDN page as a guide.

Resources