Is the value of the ebp register always a multiple of 8? - gcc

I am new to reverse engineering. Whenever I disassemble a program, I always find that the value of the ebp register is a multiple of 8.
For performance reasons, modern x64 calling conventions require the stack to be aligned to 16 bytes.
This is also the case for GCC's x86 calling convention.
Because ebp is normally copied from esp in the function prologue, this alignment carries over to ebp, not only esp.
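As a rough illustration of why the alignment carries over, here is a minimal 32-bit sketch in NASM syntax. The function name `example` is made up, and the arithmetic in the comments assumes the caller kept esp 16-byte aligned at the call instruction, as GCC's convention does.

```nasm
; Sketch only: `example` is a hypothetical function.
; Assumption: the caller had esp = 16n at the call instruction.
bits 32
section .text
global example
example:
                        ; on entry: esp = 16n - 4 (call pushed a 4-byte return address)
    push ebp            ; esp = 16n - 8
    mov  ebp, esp       ; ebp = 16n - 8, which is a multiple of 8
    mov  eax, ebp
    and  eax, 7         ; eax = ebp mod 8, which is 0 under the assumption above
    pop  ebp            ; esp was not changed after the push, so a plain pop restores ebp
    ret
```

On x86-64 the same prologue starts from rsp = 16n - 8 (an 8-byte return address), so after `push rbp` the frame pointer ends up 16-byte aligned.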


Why does Windows use RCX, RDX for pointers in a fresh x64 process, different from EAX, EBX in a newly created 32-bit process?

When I create a Windows x86 process in a suspended state (CREATE_SUSPENDED) its CONTEXT contains:
Virtual Address of Entry Point in Eax register;
Virtual Address of Process Environment Block structure in Ebx register.
But when I do the same for x86_64 process then CONTEXT contains:
Virtual Address of Entry Point in Rcx register (why not Rax?)
Virtual Address of PEB structure in Rdx register (why not Rbx?)
It seems logical to me to take Rax in x64 in place of Eax in x86 and Rbx in x64 in place of Ebx in x86.
But instead of Eax→Rax and Ebx→Rbx we see Eax→Rcx and Ebx→Rdx.
Also, I see that 64-bit Cheat Engine is aware of this when opening the 32-bit process (notice the migration of the values eax↔ecx and ebx↔edx).
What was the reason to move from *ax register to *cx and from *bx to *dx in 64-bit processes?
Is it somehow connected to calling conventions?
Is it related to Windows only or do other OSes also have this kind of register repurposing?
[Screenshots: a just-created x64 process in a suspended state]
I don't see why it would be logical to assume so.
Even if, at MS, they had defined an internal ABI documenting the context of a just-created 32-bit process, the 64-bit version would have been designed anew, so there is no reason to assume it carries anything over from the old 32-bit ABI.
If Windows uses sysret to return to user space, a process created with a suspended state may leak the target address in rcx.
Returning via other mechanisms (e.g. iret/retf), as could be the case for 32-bit code, will of course leak different data in different registers.
What you are seeing is probably an artifact of how Windows returns to user mode. I don't know exactly what the Windows kernel code to return to user mode is, but it is reasonable to assume that MS kept the same interface for 32-bit processes and that this interface was designed before sysret was widely used.
Note that at the PE entry-point rcx contains a pointer to the PEB and rdx to the entry-point (not the other way around). The former appears to be an undocumented parameter passed to the entry-point function, the latter may be just an artifact of how the entry-point is called.
In fact, a 32-bit process will find a pointer to the PEB in the stack, as the first parameter for the PE entry-point code.
Regarding other OSes, anything that is not documented to be stable is free to change at any time (including what's left in the registers). This is true in general.
As far as stability goes, passing from a 32-bit to a 64-bit implementation is a pretty big step and, again, there is no reason to keep using a very old interface (but with wider registers) instead of improving it with all the recent knowledge.
You can easily see that, for example, Linux "repurposed" the registers in the 64-bit system call ABI.
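To make the Linux comparison concrete, here is a minimal sketch of a 64-bit program that makes two system calls, with the 32-bit `int 0x80` register assignments noted in the comments. The labels `msg` and `len` and the message text are made up for illustration.

```nasm
; Build (assumed toolchain): nasm -f elf64 demo.asm && ld demo.o -o demo
global _start

section .data
msg:    db "hi", 10
len:    equ $ - msg

section .text
_start:
    mov  eax, 1             ; rax = 1, write       (32-bit ABI: eax = 4)
    mov  edi, 1             ; rdi = fd (stdout)    (32-bit ABI: ebx)
    lea  rsi, [rel msg]     ; rsi = buffer         (32-bit ABI: ecx)
    mov  edx, len           ; rdx = byte count     (32-bit ABI: edx)
    syscall                 ; the instruction itself overwrites rcx and r11

    mov  eax, 60            ; rax = 60, exit       (32-bit ABI: eax = 1)
    xor  edi, edi           ; rdi = exit status 0  (32-bit ABI: ebx)
    syscall
```

Note that `syscall` stores the return address in rcx, which is why the 64-bit Linux system call ABI passes its fourth argument in r10 rather than rcx.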

Windows 64 ABI, correct register use if i do NOT call windows API?

As suggested to me in another question, I checked the Windows ABI, and I'm left a little confused about what I can and cannot do if I'm not calling the Windows API myself.
My scenario: I'm programming in .NET and need a small chunk of asm code targeting a specific processor for a time-critical section that does heavy multi-pass processing on an array.
When checking the register information in the ABI at
I'm left a little confused about what applies to me if I
1) Don't call the Windows API from the asm code
2) Don't return a value and take a single parameter.
Here is what I understand. Am I getting all of it right?
RAX: I can overwrite this without preserving it, as the function doesn't return a value
RCX: I need to preserve this as this is where the single int parameter will be passed, then I can overwrite it and not restore it
RDX/R8/R9: Should not be initialized as there are no such parameters in my method; I can overwrite those and not restore them
R10/R11: I can overwrite those without saving them; if the caller needs them, the caller is in charge of preserving them
R12/R13/R14/R15/RDI/RSI/RBX: I can overwrite them, but I first need to save them (or can I just not save them if I'm not calling the Windows API?)
RBP/RSP: I'm assuming I shouldn't touch those?
If so, am I correct that this is the right way to handle this (if I don't care about the time taken to preserve data and need as many registers available as possible)? Or is there a way to use even more registers?
; save required registers
push r12
push r13
push r14
push r15
push rdi
push rsi
push rbx
; my own array processing code here, using rcx as the memory address passed as the first parameter
; safe to use rax rbx rcx rdx r8 r9 r10 r11 r12 r13 r14 r15 rdi rsi giving me 14 64bit registers
; 1 for the array address 13 for processing
; should not touch rbp rsp
; restore required registers
pop rbx
pop rsi
pop rdi
pop r15
pop r14
pop r13
pop r12
TL;DR: if you need registers that are marked preserved, push/pop them in proper order. With your code you can use those 14 registers you mention without issues. You may touch RBP if you preserve it, but don't touch RSP basically ever.
It does matter if you call Windows APIs but not in the way I assume you think. The ABI says what registers you must preserve. The preservation information means that the caller knows that there are registers you will not change. You don't need to call any Windows API functions for that requirement to be there.
The idea as an analogy (yeah, I know...): Here are five different colored stacks of sticky notes. You can use any of them, but if you need the red or the blue ones, could you keep the top one in a safe place and put it back when you stop since I need the phone numbers on them. About the other colors I don't care, they were just scratch paper and I've written the information elsewhere.
So if you call an external function, you know that no function will ever change the value of the registers marked as preserved. Any other register may change its value, and you have to make sure you don't have anything there that needs to be preserved.
And when your function is called, the caller expects the same: if they put a value in a preserved register, it will have the same value after the call. But any non-preserved registers may be whatever and they will make sure they store those values if they need to keep them.
The return value register you may use however you want. If the function doesn't return a value the caller must not expect it to have any specific value and also will not expect it to preserve its value.
You only need to preserve the registers you use. If you don't use all of these, you don't need to preserve all of them.
You can freely use RAX, RCX, RDX, R8, R9, R10 and R11. All of these are volatile: if the caller needs their values, it is the caller that must preserve them, not your function.
Most of the time, these registers (or their subregisters like EAX) are enough for my purposes. I hardly ever need more.
Of course, if any of these (e.g. RCX) contain arguments for your function, it is up to you to preserve them for yourself as long as you need them. How you do that is also up to you. But if you push them, make sure that there is a corresponding pop somewhere.
Use this MSDN page as a guide.
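To tie the answers together, here is a minimal sketch of such a routine for the Windows x64 convention in NASM syntax. The name `process_array`, the fixed `ELEMENT_COUNT`, and the "add 3 to every element" work are all assumptions for illustration; the question only says that a single parameter arrives in RCX.

```nasm
; Assemble (assumed toolchain): nasm -f win64 process.asm
ELEMENT_COUNT equ 1024          ; hypothetical fixed array length (64-bit elements)

section .text
global process_array
process_array:                  ; rcx = address of the array (first integer argument)
    push rbx                    ; rbx is nonvolatile, so save it before using it
    mov  ebx, 3                 ; value added to every element (zero-extends into rbx)
    mov  edx, ELEMENT_COUNT     ; rdx is volatile: no save needed
.next:
    mov  rax, [rcx]             ; rax is volatile as well
    add  rax, rbx
    mov  [rcx], rax
    add  rcx, 8                 ; rcx held our argument; we may modify it freely
    dec  rdx
    jnz  .next
    pop  rbx                    ; restore in reverse order of the pushes
    ret
```

If the work fits in RAX, RCX, RDX and R8 to R11 alone, the push/pop pair disappears entirely. One caveat, as far as I know: the ABI expects unwind metadata for functions that push nonvolatile registers or adjust RSP, which this sketch leaves out; that only matters if an exception or a stack walk passes through the routine.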

Windows x64 ABI. How can debugger show you arguments passed to functions

In x86 calling conventions, parameters are passed on the stack, and when frames use base pointers it is possible to reconstruct from a call stack which parameters were passed to each function on it (the process actually runs in reverse order, from the last function called going back).
How can we do the same in the x64 ABI, considering that the registers used for parameter passing (RCX, RDX, R8, R9) are all volatile and thus lose their values between frames (with no stack backup)?

Can I use a register as a loop counter?

Since the calling convention of a function states which registers are preserved, can a register be used as a loop counter?
I first thought that the ecx register is used as a loop counter, but after finding out that an stdcall function I have used has not preserved the value of ecx, I thought otherwise.
Is there a register that is guaranteed (by mostly used calling conventions at least) to be preserved?
Note: I don't have a problem in using a stack variable as a loop counter, I just want to make sure that it is the only way.
You can use any general-purpose register, and occasionally others, as the loop counter (just not the stack pointer of course ☺).
Either you use one to loop manually, i.e. replace…
loop label
… with…
dec ebp
jnz label
… which is faster anyway (because AMD (and later Intel, when they caught up, MHz-wise) artificially slowed down the loop instruction as otherwise, Windows® and some Turbo Pascal compiled software crashed).
Or you just save the counter in between:
push ecx
call func
pop ecx
loop label
Both are standard strategies.
Is there a register that is guaranteed (by mostly used calling conventions at least) to be preserved?
You can choose any free register in your own code if your loop code will not call any external entity.
If your loop code will call an external entity where the only guaranteed contract is the ABI and calling convention then you must save/restore your registers and make the register choice case-by-case.
Quoting Agner Fog's excellent paper Calling conventions for different C++ compilers and operating systems:
6 Register usage
The rules for register usage depend on the operating system, as shown in table 4. Scratch registers are registers that can be used for temporary storage without restrictions (also called caller-save or volatile registers). Callee-save registers are registers that you have to save before using them and restore after using them (also called non-volatile registers). You can rely on these registers having the same value after a call as before the call...
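As a sketch of what that means for a loop counter in 32-bit code: ebx is callee-saved in the common 32-bit conventions (cdecl, stdcall), so it survives the call, but for the same reason we must save and restore it ourselves. The names `call_ten_times` and `func` are hypothetical, and `func` is assumed to take no arguments.

```nasm
bits 32
extern func                 ; hypothetical external function, no arguments
global call_ten_times

section .text
call_ten_times:
    push ebx                ; save our caller's ebx (callee-saved register)
    sub  esp, 8             ; keeps esp 16-byte aligned at the call for GCC's convention; harmless elsewhere
    mov  ebx, 10            ; loop counter lives in a callee-saved register
.again:
    call func               ; may clobber eax, ecx, edx but must preserve ebx
    dec  ebx
    jnz  .again
    add  esp, 8
    pop  ebx                ; restore our caller's ebx
    ret
```

With ecx as the counter instead, the push/pop around each call shown earlier would be needed, since ecx is a scratch register.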
See also:
Wikipedia: x86 calling conventions

Compile for multiple calling conventions

I'm looking at some Linux code coming out of the Intel compiler. It looks like functions are being compiled for 2 calling conventions at once. The map file has lots of function name pairs like this:
0x0000000008000000 __foo
0x0000000008000008 __foo.
The offset between the pairs of functions is 4, 8, or 12 bytes. Each of those corresponds to 1, 2, or 3 mov instructions that are moving stack args to registers like this:
mov eax, [esp+4]
mov edx, [esp+8]
push ebp
After those instructions, it looks like a function using the regparm convention starts.
Does the Intel compiler generate functions with two different calling conventions and then use whichever entry address is correct for the given caller?
Actually, I'd say you have answered your own question:
Does the Intel compiler generate functions with two different calling conventions and then use whichever entry address is correct for the given caller?
The foo function appears to be declared with the __regcall attribute. My educated guess is that your program was probably compiled using a Debug profile, as stack-frame-based calling conventions allow some information to be available more easily.
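For illustration, the layout described in the question can be reproduced by hand as below. This is a sketch of the idea, not confirmed Intel compiler output; the labels `foo_stack` and `foo_reg` and the function body are made up.

```nasm
bits 32
section .text
global foo_stack
global foo_reg

foo_stack:                  ; entry for callers that pass both arguments on the stack
    mov  eax, [esp+4]       ; load first stack argument (a 4-byte instruction)
    mov  edx, [esp+8]       ; load second stack argument (another 4 bytes)
foo_reg:                    ; entry for callers that pass arguments in eax and edx
    add  eax, edx           ; shared body: return the sum in eax
    ret
```

Each `mov reg, [esp+disp8]` encodes in 4 bytes, which fits the 4, 8, or 12 byte offsets seen between the paired symbols in the map file.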
