Why gcc does so when creating assembler code? - gcc

I am playing around with gcc -S to understand how memory and stack works. During these plays I found several things unclear to me. Could you please help me to understand the reasons?
When calling function sets arguments for a called one it uses mov to esp instead push. What is the advantage not using push?
Function which works with its stack located arguments points to them as ebp + (N + offset) (where N is a size reserved for return address). I expect to see esp - offset which is more understandable. What is the reason to use ebp as fundamental point everywhere? I know these ones are equal but anyway?
What is this magic for in the beginning of main? Why esp must be initialized in this way only?
and esp,0xfffffff0
Thanks,

I will assume you are working under a 32-bit environment because in a 64-bit environment arguments are passed in registers.
Question 1
Perhaps you are passing a floating point argument here. You cannot push these directly, as the push instruction in a 32-bit runtime pushes 4 bytes at a time so you would have to break up the value. It is sometimes easier to subtract 8 from esp and them mov the 8-byte quadword into [esp].
Question 2
ebp is frequently used to index the parameters and locals in stack frames in 32-bit code. This allows the offsets within frames to be fixed even as the stack pointer moves. For example consider
void f(int x) {
int a;
g(x, 5);
}
Now if you only accessed the stack frame contents with esp, then a is at [esp], the return address would be at [esp+4] and x would be at [esp+8]. Now let's generate code to call g. We have to first push 5 then push x. But after pushing 5, the offset of x from esp has changed! This is why ebp is used. Normally on entry to functions we push the old value of ebp to save it, then copy esp to ebp. Now ebp can be used to access stack frame contents. It won't move when we are in the middle of passing arguments.
Question 3
This and instruction zeros out the last 4 bits of esp, aligning it to a 16-byte boundary. Since the stack grows downward, this is nice and safe.

Related

Windows x86 assembly language syntax [duplicate]

This question already has an answer here:
Which segment register is used by default?
(1 answer)
Closed 6 years ago.
(1) What does the following code mean? I cannot find any reference about the ds:[ ] syntax anywhere online. How is it different from without the ds:?
cmp eax,dword ptr ds:[12B656Ch]
(2) In the following instruction,
movsx eax,word ptr [esi+24h]
What is the esi register used for? Is it possible to guess what the original C code is doing from using such a rare register?
DS refers to the Data Segment.
In Win32, CS = DS = ES = SS = 0.
That is these segments do not matter and a flat 32 bit address space is used.
The Data segment is the default segment when accessing memory. Some disassemblers mistakenly list it, even though it serves no purpose to list a default segment.
You can list a different segment if you do wish by using a segment override.
CS is de Code Segment which is the default segment for jumps and calls and SS is the Stack segment which is the default for addresses based on ESP.
ES is the Extra Segment which is used for string instructions.
The only segment override that makes sense in Win32 is FS (The F does not stand for anything, but it comes after E).
FS links to the Thread Information Block (TIB) which houses thread specific data and is very useful for Thread Local Storage and multi-threading in general.
There is also a GS which is reserved for future use in Win32 and is used for the TIB in Win64.
In Linux the picture is more or less the same.
What is register X for
You must let go of the notion that registers have special purposes.
In x86 you can use almost any register for almost any purpose.
Only a few complex instructions use specific registers, but the normal instructions can use any register.
The compiler will try and use as many registers as possible to avoid having to use memory.
Having said this the original purposes of the 8 x86 registers are as follows:
EAX : accumulator, some instructions using this register have 'short versions'.
EDX : overflow for EAX, used to store 64 bit values when multiplying or dividing.
ECX : counter, used in string instructions like rep mov and shifts.
EBX : miscellaneous general purpose register.
ESI : Source Index register, used as source pointer for string instructions
EDI : Destination Index register, used as destination pointer
ESP : Stack pointer, used to keep track of the stack
EBP : Base pointer, used in stack frames
You can use any register pretty much as you please, with the exception of ESP. Although ESP will work in many instructions, it is just too awkward to lose track of the stack.
Is it possible to guess what the original C code is doing from using such a rare register?
My guess:
struct x {
int a,b,c,d,e,f,g,h,i,j; //36 bytes
short s };
....
int i = x.s;
ESI likely points to some structure or object. At offset 24h (36) a short is present which is transfered into an int. (hence the mov with Sign eXtend).
ESI does not link local variable, because in that case EBP or ESP would be used.
If you want to know more about the c code you'd need more context.
Many c constructs translate into multiple cpu instructions.
The best way to see this is to write c code and inspect the cpu code that gets generated.

How does `sub rsp, 16` aligns the stack on Mac OSX?

I'm currently learning x64 asm on Mac OSX using nasm.
I've come across the problem of aligning the stack, a necessary step for some system calls such as malloc, which is done with these instructions:
push rbp
mov rbp, rsp
sub rsp, 16
Can anyone explain me how the function prolog does align the stack ? I mean, if it's not already on a multiple of 16, why would sub rsp, 16 correct it ?
Let's say that esp = 0x35 , after sub rsp, 16, esp = 0x25 right ? So esp wasn't aligned on a multiple of 16 before the sub, and the sub didn't align it neither so I think I haven't quite understood what "aligning the stack" means.
Can someone tell me what I should understand when I read "the stack needs to be aligned on a 16 bytes boundary" ?
It doesn't align it, as you say it just keeps alignment. Obviously the sub rsp, 16 is only needed if you want to allocate space for 1-15 bytes of local variables. You should make sure the number there is the next multiple of 16 above the space needed, assuming the stack is already aligned. Note that the return address and the frame pointer also add up to 16 bytes, if you don't use a frame pointer you need to account for that too.
In general, the calling convention mandates that it be aligned in a particular way upon entry to all functions, so you just have to maintain that. The only place that is normally not the case is possibly at process or thread startup but that is usually taken care of by the system libraries.

Understanding OSX 16-Byte alignment

So it seems like everyone knows that OSX syscalls are always 16 byte stack aligned. Great, that makes sense when you have code like this:
section .data
message db 'something', 10, 0
section .text
global start
start:
push 10 ; size of the message (4 bytes)
push msg ; the address of the message (4 bytes)
push 1 ; we want to write to STD_OUT (4 bytes)
mov eax, 4 ; write(...) syscall
sub esp, 4 ; move stack pointer down to 4 bytes for a total of 16.
int 0x80 ; invoke
add esp, 16 ; clean
Perfect, the stack is aligned to 16 bytes, makes perfect sense. How about though we call syscall(1) (exit). Logically that would look something like this:
push 69 ; return value
mov eax, 1 ; exit(...) syscall
sub esp, 12 ; push down stack for total of 16 bytes.
int 0x80 ; invoke
This doesn't work though, but this does:
push 69 ; return value
mov eax, 1 ; exit(...) syscall
sub esp, 4 ; push down stack for total of 8 bytes.
int 0x80 ; invoke
That works fine, but that's only 8 bytes???? Osx is cool, but this ABI is driving me nuts. Can someone shed some light on what I'm not understanding?
Short version: you probably don't need to align to 16 bytes, you just need to always leave a 4-byte gap before your argument list.
Long version:
Here's what I think is happening: I'm not sure that it's true that the stack should be 16-byte aligned. However, logic dictates that if it is and if padding or adjusting the stack is necessary to achieve that alignment, it must happen before the arguments for the syscall are pushed, not after. There can't be an arbitrary number of bytes between the stack pointer at the time of the int 0x80 instruction and where the arguments actually are. The kernel wouldn't know where to find the actual arguments. Subtracting from the stack pointer after pushing the arguments to achieve "alignment" doesn't align the arguments, it aligns the stack pointer by inserting an arbitrary number of bytes between the stack pointer and the arguments. Whatever else may be true, that can't be right.
Then why do the first and third snippets work at all? Don't they also insert arbitrary bytes there? They work by accident. It's because they both happen to insert 4 bytes. That adjustment isn't "successful" because it achieves stack alignment, it's part of the syscall ABI. Apparently, the syscall ABI expects and requires that there be a 4-byte slot before the argument list.
The source for the syscall() function can be found here. It looks like this:
LEAF(___syscall, 0)
popl %ecx // ret addr
popl %eax // syscall number
pushl %ecx
UNIX_SYSCALL_TRAP
movl (%esp),%edx // add one element to stack so
pushl %ecx // caller "pop" will work
jnb 2f
BRANCH_EXTERN(cerror)
2:
END(___syscall)
To call this library function, the caller will have set up the stack pointer to point to the arguments to the syscall() function, which starts with the syscall number and then has the real arguments for the actual syscall. However, the caller will then have used a call instruction to call it, which pushed the return address onto the stack.
So, the above code pops the return address, pops the syscall number into %eax, pushes the return address back onto the stack (where the syscall number originally was), and then does int 0x80. So, the stack pointer points to the return address and then the arguments. There's the extra 4 bytes: the return address. I suspect the kernel ignores the return address. I guess its presence in the syscall ABI may just be to make the ABI for system calls similar to that of function calls.
What does this mean for the alignment requirement of syscalls? Well, this function is guaranteed to change the alignment of the stack from how it was set up by its caller. The caller presumably set up the stack with 16-byte alignment and this function moves it by 4 bytes before the interrupt. It may just be a myth that the stack needs to be 16-byte aligned for syscalls. On the other hand, the 16-byte alignment requirement is definitely real for calling system library functions. The Wine project, for which I develop, was burned by it. It is mostly necessary for 128-bit SSE argument data types, but Apple made their lazy symbol resolver deliberately blow up if the alignemtn is wrong even for functions which don't use such arguments so that problems would be found early. Syscalls would not be subject to that early-failure mechanism. It may be that the kernel doesn't require the 16-byte alignment. I'm not sure if any syscalls take 128-bit arguments.

NASM on OS X work with bit, not byte

I am working on my first NASM program, and while trying to figure out the not instruction, I realized that instead of reversing the bit 0, it was reversing the byte 00000000. How would I tell it to work with a bit or otherwise fix this? Here is my code...
section .text
global start
start:
mov eax, 255
not eax
push eax
mov eax, 0x1
sub esp, 4
int 0x80
Feel free to give me pointers also on my assembly coding, as I don't want to get into any bad habits.
In most computer architectures (including the x86), a bit is not a directly addressable unit of memory. The smallest unit that you can directly refer to is a byte, which happens to contain 8 bits on the x86. You have not stated what you're exactly trying to accomplish, so I'm not able to give you an exact solution to your problem, but working with single bits (or groups of bits) is most often achieved by masking out the bits that are of no interest with the AND instruction, eventually shifting the value left or right, and then doing the processing.
If you want to actually get the value of the n-th bit in a register, then you're most probably looking for the instruction BT. It stores the value of the n-th bit in the Carry Flag.
When it comes to other tips : the push instruction decrements the stack pointer by the number of bytes pushed to the stack. This is a characteristic of the x86 architecture - the stack grows, by design, downwards. Therefore, if you want to free some space on the stack, you do add esp, number_of_bytes, not sub (the way you did), which just reserves more space on the stack.

Questions regarding base pointer and stack pointer

I am writing a report to summarize stack. If you click on my profile you will see that I have been doing this for a while. Right now, I have some troubles because on GDB it shows me a different thing than on visual studio.
As a result, I am not too sure about my understanding of base pointer and stack pointer, and I am hoping that someone can lead me in the right direction if I am wrong.
For x86 computer, stack is typical growing downward (from higher memory address to lower).
So when a program begins, we called the main function.
In general, at the entry of each function call, a stack is created at the current esp location, and this is what we called "the top of the stack". Is this correct?
When the old ebp gets pushed onto the stack, is it pushed onto where the esp was first pointed to?
Afterward, the esp will move down to point to an empty memory location, is that correct?
Finally, esp is always changing, moving down pointing at the next available memory space. Is that correct?
Does esp move per byte, or per 4 bytes down?
I know there's a lot of questions. But thanks for your time!
Thank you for the response, sir!
#iSciurus
I am confused how everyone define esp pointing at the most recent entry that was pushed onto the stack.
For x86, since the stack grows downward, from your explanation, the esp will first point at the lowest address of the stack. When I look at the the assembly code, we have
0x080483f4 <+0>: push %ebp
0x080483f5 <+1>: mov %esp,%ebp
0x080483f7 <+3>: sub $0x10,%esp
So esp is decremented 16 bytes. So this is the size of the stack of this function call. Local variables come right after return address (ebp-4, ebp-8, etc). So what is the overall purpose of esp here? From what I understand, stack overflow occurs when we try to access an address smaller than that.
The last thing is: when we say the top of the stack, are we referring to the lowest address (for x86).
This is the picture I have in mind (growing downward)
[Parameter n ]
...
[Parameter 2 ]
[Parameter 1 ]
[Return Address ] 0x002CF744
[Previous EBP ] 0x002CF740 (current ebp)
[Local Variables ]
-- ESP
Sorry for these long questions. But I really appreciate your help.
To be correct, a stack frame is created at the current esp location, not the stack itself. The stack is created once at a thread startup, and each thread has its own stack, which is just an area in the process memory space. The stack frame is what actually created at each function entry, and it is a region inside the thread stack.
No, it is pushed onto the address [old_esp - 4] (or [old_rsp - 8] in x64), because esp is the top of the stack and points to the lowest used address. The next DWORD (or QWORD) in the stack is free and ebp is pushed there.
Yes, this is typically done with sub esp, value
No. First, esp is moving down pointing at the lowest used address in the stack, not at the next available space. Second, keep in mind that esp may point to anywhere, not only to the stack: it is ok until you use stack-related instuctions like push/pop.
Esp moves per machine word size: in x86 it moves per 4 bytes, while in x64 it moves per 8 bytes.
The overall purpose of esp is almost always the same: to store the top of the stack. All stack-related instructions like pop/push use esp as their argument. In x86 both ebp and esp are used to store information about the stack frame (the bottom and the top correspondingly). Maybe, you got confused about this redundancy. However, in x64 only rsp is used for stack-based parameters, rbp is a general purpose register.
What about buffer overflows in stack, they often occur when the code tries to write higher than the last element of an array (or struct or whatever). Stack grows downwards, but arrays grow upwards. When we write higher, we can access the return address, the SEH handler as well as internal variables of our caller.
Yes, when we say top, we mean the lowest address. So, most debuggers show the stack in reverse order:
-- ESP
[Local Variables ]
[Previous EBP ] 0x002CF740 (current ebp)
[Return Address ] 0x002CF744
[Parameter 1 ]
[Parameter 2 ]
...
[Parameter n ]
Here, ESP points to the value "above" all the data and looks more like the "top". Though it is still the lowest used address.

Resources