I am a newbie in assembly programming and I am using push and pop instructions that use the memory stack.
So, what is the default stack size, how do I modify it, and what is the limit of its size?
Stack size depends upon a lot of factors.
It depends on where you start the stack, how much memory you have, what CPU you are using, etc.
The CPU you are using is not called a "Windows CPU".
When you say what CPU you are using, give its name in detail and also, very importantly, its architecture. In this case, you are probably using the x86 architecture.
Here is the memory map of the first megabyte of an x86 PC in real mode (BIOS systems):
0x00000 - 0x003FF - Interrupt vector table
0x00400 - 0x004FF - BIOS data area
0x00500 - 0x07BFF - Free
0x07C00 - 0x07DFF - Boot loader
0x07E00 - 0x9FBFF - Free
0x9FC00 - 0x9FFFF - Extended BIOS data area
0xA0000 - 0xBFFFF - Video memory
0xC0000 - 0xFFFFF - BIOS
0x100000 and above - Free (extended memory)
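In x86, stack information is held in two registers:
Base pointer (bp): Conventionally holds a fixed reference point, such as the starting address of the stack or of the current stack frame
Stack pointer (sp): Holds the address of the most recently pushed value, i.e. the top of the stack (in real mode it is combined with the stack segment register SS)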
These registers have different names in different modes:
Mode | Base pointer | Stack pointer
16-bit real mode | bp | sp
32-bit protected mode | ebp | esp
64-bit mode | rbp | rsp
When you set up a stack, the stack pointer and the base pointer are typically given the same address.
The stack starts at that address; strictly speaking, push and pop only use the stack pointer, and bp simply keeps a copy of the starting point.
You can set up your stack anywhere in memory that is free, and the stack grows downwards.
Each time you "push" something onto the stack, the stack pointer register is first decremented, and the value is then stored at the address in the stack pointer (so the first push lands just below the starting address kept in the base pointer).
Each time you "pop" something from the stack, the value stored at the address in the stack pointer register is copied into the register specified by the programmer, and the stack pointer register is then incremented.
In 16-bit real mode, you "push" and "pop" 16 bits. So each time you "push" or "pop", the stack pointer register is decremented or incremented by 0x02, since each address holds 8 bits.
In 32-bit protected mode, you "push" and "pop" 32 bits. So each time you "push" or "pop", the stack pointer register is decremented or incremented by 0x04, since each address holds 8 bits.
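To make the push/pop mechanics concrete, here is a small C simulation of a 16-bit stack. It is only a sketch: the 64 KiB `mem` array and the `push16`/`pop16` helpers are hypothetical, segment registers are ignored, and it follows what x86 hardware does (decrement SP first, then store, with the low byte at the lower address, i.e. little-endian).

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit real-mode stack: 64 KiB of byte-addressed
   memory and a 16-bit stack pointer. Segment registers are ignored. */
static uint8_t mem[0x10000];
static uint16_t sp;

/* push: decrement SP by 2 first, then store the word at SP (little-endian) */
static void push16(uint16_t value) {
    sp = (uint16_t)(sp - 2);
    mem[sp] = (uint8_t)(value & 0xFF);                 /* low byte, lower address */
    mem[(uint16_t)(sp + 1)] = (uint8_t)(value >> 8);   /* high byte */
}

/* pop: read the word at SP, then increment SP by 2 */
static uint16_t pop16(void) {
    uint16_t value = (uint16_t)(mem[sp] | (mem[(uint16_t)(sp + 1)] << 8));
    sp = (uint16_t)(sp + 2);
    return value;
}
```

For example, with sp = 0x7C00, push16(0x1234) leaves sp at 0x7BFE with 0x34 stored at 0x7BFE and 0x12 at 0x7BFF, and a following pop16() returns 0x1234 and restores sp to 0x7C00.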
You will have to set up the stack in the right place depending on how many values you are going to be "pushing".
If you keep "pushing", your stack keeps growing downwards, and at some point it may overwrite something. So be wise and set up the stack at an address in memory where there is plenty of room for the stack to grow downwards.
For example:
If you set up your stack at 0x7c00, just below the boot loader, and you "push" too many values, your stack will eventually run past the free area (0x500 - 0x7BFF) and overwrite the BIOS data area, which causes a lot of errors.
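Using the free region from the memory map (0x500 up to 0x7BFF, assuming the standard BIOS layout), the room available to a stack started at 0x7c00 can be worked out. A rough sketch in C; the constant and function names are made up for this example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Addresses from the real-mode memory map (assumed standard BIOS layout). */
#define STACK_TOP 0x7C00u  /* stack starts just below the boot loader       */
#define BDA_END   0x0500u  /* first free byte above the BIOS data area      */
#define PUSH_SIZE 2u       /* bytes per push in 16-bit real mode            */

/* How many 16-bit pushes fit before SP would drop below floor_addr. */
static unsigned max_pushes(uint32_t stack_top, uint32_t floor_addr) {
    return (unsigned)((stack_top - floor_addr) / PUSH_SIZE);
}

/* True if one more push from the current SP still writes only free memory. */
static bool push_is_safe(uint32_t current_sp, uint32_t floor_addr) {
    return current_sp >= floor_addr + PUSH_SIZE;
}
```

max_pushes(STACK_TOP, BDA_END) gives (0x7C00 - 0x500) / 2 = 15232 pushes, i.e. about 30 KiB, before the BIOS data area is hit.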
You should have a basic idea of a stack and the size of it by now.
Whatever loaded your program into memory ("the loader") and passed control to it determines where in memory the stack is located, and how much space is available for the stack.
It does so by the simple artifice of loading the stack pointer, typically with a MOV ESP, ... instruction, before calling or jumping to your code. Your program then uses the stack area supplied.
If your program uses too much, it will write beyond the end of the allocated stack area. This is a program bug, because the memory past the end may be allocated for some other purpose in the application. Writing to that other memory is likely to change the program's behavior (i.e., cause a bug) when that memory gets used, and finding the cause of that bug is likely to be difficult (people assume that stacks don't damage program data and vice versa).
If your application wants to use a larger stack, generally all you have to do is allocate your own area, large enough for your purposes, and do a MOV ESP, ... yourself to set the stack to the chosen location. How you allocate an area depends on the execution environment in which you run. (You need to respect ESP conventions: it must be a multiple of 4, and many modern ABIs require 16-byte alignment at function calls; since the stack grows downwards, initialize it to the high end of your area, preferably aligned to a cache-line boundary, and often usefully to a virtual memory page boundary.)
It is generally a good idea when "switching" stacks to save the old value of ESP provided by the loader, and restore ESP to that old value before returning control to the loader/caller/OS. Likewise, you should free the extended stack space no longer being used.
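One way to try the "allocate your own area and switch to it" idea from C, without writing MOV ESP by hand, is the ucontext API (getcontext/makecontext/swapcontext), which was removed from newer POSIX editions but is still provided by glibc. This is a swapped-in technique, not the assembly approach described above: swapcontext saves the old stack pointer for you, and uc_link brings control back to it when the function finishes. `deep_sum`, `worker` and `run_on_big_stack` are hypothetical names, and the 64 MiB size is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700  /* some libcs only declare ucontext with this */
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx, big_ctx;
static long long result;

/* Hypothetical workload: deep recursion that needs a lot of stack. */
static long long deep_sum(long long n) {
    if (n == 0) return 0;
    long long rest = deep_sum(n - 1);
    return n + rest;
}

static void worker(void) { result = deep_sum(100000); }

/* Run worker() on a freshly allocated stack of `size` bytes.
   swapcontext saves the caller's registers (including its stack pointer)
   in caller_ctx; uc_link resumes there when worker returns. */
static int run_on_big_stack(size_t size) {
    void *stack = malloc(size);
    if (stack == NULL) return -1;
    if (getcontext(&big_ctx) == -1) { free(stack); return -1; }
    big_ctx.uc_stack.ss_sp = stack;
    big_ctx.uc_stack.ss_size = size;
    big_ctx.uc_link = &caller_ctx;
    makecontext(&big_ctx, worker, 0);
    if (swapcontext(&caller_ctx, &big_ctx) == -1) { free(stack); return -1; }
    free(stack);  /* back on the original stack: release the extra one */
    return 0;
}
```

Calling run_on_big_stack((size_t)64 << 20) runs the recursion on a 64 MiB heap-allocated stack, returns to the original stack, and then frees the extra area, which is the save/restore/free discipline described above.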
This scheme will work if you know the amount of stack space you need in advance. In practice, this is rather hard to "guess" (and may be impossible if your code has a recursive algorithm that nests deeply). So you can either pick a really huge number bigger than you need (ick) or you can use an organized approach to switch stacks when it is clear to the program that it needs more.
See How does a stackless language work? for more discussion.
The reason this gets me confused is that all addresses hold a sequence of 1's and 0's. So how does the CPU differentiate, let's say, 00000100(integer) from 00000100(CPU instruction)?
First of all, different commands have different values (opcodes). That's how the CPU knows what to do.
Finally, the question remains: what's a command, and what's data?
Modern PCs use the von Neumann architecture ( https://en.wikipedia.org/wiki/John_von_Neumann), in which data and opcodes are stored in the same memory space. (There are architectures that separate the two, such as the Harvard architecture.)
Explaining everything in detail would be far beyond the scope of Stack Overflow; most likely the character limit per post would not be sufficient.
To answer the question in as few words as possible (everyone actually working at this level would kill me for the shortcuts in this explanation):
Data in the memory is stored at certain addresses.
In this simplified model, each CPU instruction consists of 3 different addresses (NOT values, just addresses!):
Address of what to do
Address of a value
Address of an additional value
So, assuming an addition should be performed (in this case 5+7) and you have 3 addresses available in memory, the application would store (I used "verbs" for the instructions):
Address | Stored Value
1 | ADD
2 | 5
3 | 7
Finally, the CPU receives the instruction 1 2 3, which then means ADD 5 7 (these things are order-sensitive: [Command] [v1] [v2]). And now things get complicated.
The CPU moves these values (actually not the values, just the addresses of the values) into its registers and then processes them. The exact registers chosen depend on the data type, data size, and opcode.
In the case of the command #1 #2 #3, the CPU first reads these memory addresses and then knows that ADD 5 7 is desired.
Based on the opcode for ADD the CPU will know:
Put Address #2 into r1
Put Address #3 into r2
Read Memory-Value Stored at the address stored in r1
Read Memory-Value stored at the address stored in r2
Add both values
Write result somewhere in memory
Store Address of where I put the result into r3
Store Address stored in r3 into the Memory-Address stored in r1.
Note that this is simplified. Actually, the CPU needs exact instructions on whether it is handling a value or an address. In assembly this is done by using
eax (means the value stored in register eax)
[eax] (means the value stored in memory at the address stored in register eax)
On load/store architectures the CPU cannot perform calculations directly on values stored in memory, so it is quite busy moving values from memory to registers and from registers to memory. (x86 does allow a memory operand in many instructions, such as add eax, [ebx], but the value still has to be fetched into the CPU before it can be added.)
For example, if you have
eax = 0x2
and in memory
0x2 = 110011
and the instruction
MOV ebx, [eax]
this means: move the value, currently stored at the address, that is currently stored in eax into the register ebx. So finally
ebx = 110011
(In this simplified picture, this happens EVERY time the CPU does a single calculation: memory -> register -> memory.)
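The difference between eax and [eax] can be sketched in C by treating memory as a byte array and the registers as plain variables. All names here are hypothetical; the 4-byte read is little-endian, as x86 reads multi-byte values.

```c
#include <stdint.h>

/* Hypothetical byte-addressed memory and two "registers". */
static uint8_t memory[16];
static uint32_t eax, ebx;

/* Little-endian 32-bit read, as x86 does for a dword memory operand. */
static uint32_t load32(uint32_t addr) {
    return (uint32_t)memory[addr]
         | (uint32_t)memory[addr + 1] << 8
         | (uint32_t)memory[addr + 2] << 16
         | (uint32_t)memory[addr + 3] << 24;
}

/* mov ebx, eax   : copy the value held in the register itself */
static void mov_ebx_eax(void) { ebx = eax; }

/* mov ebx, [eax] : treat eax as an address and copy what memory holds there */
static void mov_ebx_mem_eax(void) { ebx = load32(eax); }
```

With memory[2] = 0x33 (binary 110011) and eax = 2, mov_ebx_eax() sets ebx to 2, while mov_ebx_mem_eax() sets ebx to 51 (0x33), because the other three bytes it reads are zero.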
Finally, the requesting application can read its predefined memory address #2,
which yields address #2568, and then knows that the outcome of the calculation is stored at address #2568. Reading that address yields the value 12 (5+7).
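The simplified three-address walkthrough above can be mirrored in a few lines of C. This is a toy model only (real x86 instructions are not encoded as three addresses); `mem`, `OP_ADD` and `execute` are made up for this sketch, and 2568 is the free result cell from the example.

```c
#include <stddef.h>

/* Toy model of the simplified explanation; NOT how x86 encodes instructions. */
enum { OP_ADD = 1 };
enum { MEM_SIZE = 4096, RESULT_ADDR = 2568 };  /* 2568: free cell from the example */

static int mem[MEM_SIZE];

/* Execute one instruction given as three addresses: [command] [v1] [v2]. */
static void execute(size_t cmd_addr, size_t a_addr, size_t b_addr) {
    size_t r1 = a_addr;               /* put address #2 into r1 */
    size_t r2 = b_addr;               /* put address #3 into r2 */
    if (mem[cmd_addr] == OP_ADD) {
        int sum = mem[r1] + mem[r2];  /* read both values and add them */
        size_t r3 = RESULT_ADDR;      /* write the result somewhere in memory */
        mem[r3] = sum;
        mem[r1] = (int)r3;            /* store the result's address at address #2 */
    }
}
```

After setting mem[1] = OP_ADD, mem[2] = 5, mem[3] = 7 and calling execute(1, 2, 3), mem[2] holds 2568 and mem[mem[2]] holds 12.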
This is just a tiny, tiny example of what's going on. For a more detailed introduction, refer to http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
One cannot really grasp the amount of data movement and calculation done for a simple addition of 2 values. Doing what a CPU does (on paper) would take you several minutes just to calculate "5+7", since there is no "5" and no "7": everything is hidden behind an address in memory, pointing to some bits, resulting in different values depending on what the bits at address 0x1 are instructing...
Short form: The CPU does not know what's stored there, but the instructions tell the CPU how to interpret it.
Let's have a simplified example.
If the CPU is told to add a word (let's say, a 32-bit integer) stored at location X, it fetches the content of that address and adds it.
If the program counter reaches the same location, the CPU will again fetch this word and execute it as a command.
The CPU (other than security stuff like the NX bit) is blind to whether it's data or code.
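A toy machine makes the point that the same byte can be an instruction or a value depending only on how it is reached. The instruction set below (HALT/LOAD/ADD/STORE, one opcode byte plus one address byte) is invented for this sketch and has nothing to do with x86.

```c
#include <stdint.h>

/* Toy von Neumann machine (hypothetical ISA, not x86): code and data share
   one memory; only the program counter decides which bytes get executed. */
enum { HALT = 0x00, LOAD = 0x01, ADD = 0x02, STORE = 0x03 };

static uint8_t ram[256];

/* Run from `pc` until HALT, an unknown opcode, or `max_steps` instructions.
   Returns the accumulator. */
static uint8_t run(uint8_t pc, int max_steps) {
    uint8_t acc = 0;
    for (int i = 0; i < max_steps; i++) {
        uint8_t op = ram[pc];                   /* byte fetched as an instruction */
        uint8_t arg = ram[(uint8_t)(pc + 1)];
        pc = (uint8_t)(pc + 2);
        switch (op) {
        case LOAD:  acc = ram[arg]; break;      /* byte read as data */
        case ADD:   acc = (uint8_t)(acc + ram[arg]); break;
        case STORE: ram[arg] = acc; break;
        default:    return acc;                 /* HALT or a meaningless byte */
        }
    }
    return acc;
}
```

With the program LOAD 0, ADD 8, STORE 9, HALT at addresses 0-6 and the value 5 at address 8, run(0, 10) returns 6: the first LOAD reads its own opcode byte (1) as data. Started at address 8 instead, run(8, 10) tries to execute the 5 as an opcode, which this machine does not know, and stops.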
The only way data doesn't accidentally get executed as code is by carefully organizing the code to never refer to a location holding data with an instruction meant to operate on code.
When a program is started, the processor starts executing it at a predefined spot. The author of a program written in machine language will have intentionally put the beginning of their program there. From there, each instruction will always end up setting the next location the processor will execute to somewhere there is an instruction. This continues to be the case for all of the instructions that make up the program, unless there is a serious bug in the code.
There are two main ways instructions can set where the processor goes next: jumps/branches, and not explicitly specifying. If the instruction doesn't explicitly specify where to go next, the CPU defaults to the location directly after the current instruction. Contrast that with jumps and branches, which have space to explicitly encode the address of the next instruction. Jumps always jump to the place specified. Branches check if a condition is true. If it is, the CPU will jump to the encoded location. If the condition is false, it will simply go to the instruction directly after the branch.
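The rule for choosing the next instruction address fits in one small C function. The `insn` struct and `flow_kind` enum are hypothetical stand-ins for a decoded instruction; the length field reflects that x86 instructions are variable-length (1 to 15 bytes), so "directly after" means the current address plus the instruction's length.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded instruction: how it chooses the next address. */
typedef enum { SEQUENTIAL, JUMP, BRANCH } flow_kind;

typedef struct {
    flow_kind kind;
    uint32_t length;  /* encoded size in bytes */
    uint32_t target;  /* address encoded in a jump or branch */
} insn;

/* Address of the next instruction after `in`, which sits at address `pc`. */
static uint32_t next_pc(uint32_t pc, insn in, bool condition) {
    switch (in.kind) {
    case JUMP:   return in.target;                              /* always taken */
    case BRANCH: return condition ? in.target : pc + in.length; /* taken if true */
    default:     return pc + in.length;                         /* fall through */
    }
}
```

An instruction of length 3 at 0x10 falls through to 0x13; a jump to 0x40 always goes to 0x40; a 2-byte branch to 0x80 goes to 0x80 if its condition holds and to 0x12 otherwise.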
Additionally, a machine language program should never write data to a location that is meant for instructions, or some other instruction at some future point in the program could try to run what was overwritten with data. Having that happen could cause all sorts of bad things. The data there could have an "opcode" that doesn't match anything the processor knows what to do with. Or, the data there could tell the computer to do something completely unintended. Either way, you're in for a bad day. Be glad that your compiler never messes up and accidentally inserts something that does this.
Unfortunately, sometimes the programmer using the compiler messes up, and does something that tells the CPU to write data outside of the area they allocated for data. (A common way this happens in C/C++ is to allocate an array L items long, and use an index >=L when writing data.) Having data written to an area set aside for code is what buffer overflow vulnerabilities are made of. Some program may have a bug that lets a remote machine trick the program into writing data (which the remote machine sent) beyond the end of an area set aside for data, and into an area set aside for code. Then, at some later point, the processor executes that "data" (which, remember, was sent from a remote computer). If the remote computer/attacker was smart, they carefully crafted the "data" that went past the boundary to be valid instructions that do something malicious. (To give them more access, destroy data, send back sensitive data from memory, etc).
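The "index >= L" bug can be shown without real undefined behaviour by putting a simulated data area and simulated code bytes in one array. The layout and names are hypothetical (real code pages are usually read-only nowadays), but the sketch shows how a write just past the data lands in whatever follows it, and how a bounds check prevents that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: an L-byte data buffer directly followed by "code"
   bytes. One array keeps the overflow well-defined in C. */
enum { L = 8, CODE_SIZE = 4 };
static uint8_t region[L + CODE_SIZE];
static uint8_t *const data = region;      /* indices 0..L-1 are data    */
static uint8_t *const code = region + L;  /* the bytes after it: "code" */

/* The bug: nothing checks that index < L. */
static void unchecked_write(size_t index, uint8_t value) {
    data[index] = value;  /* index >= L overwrites the code bytes */
}

/* The fix: refuse indices outside the L-element data area. */
static bool checked_write(size_t index, uint8_t value) {
    if (index >= L) return false;
    data[index] = value;
    return true;
}
```

checked_write(L, ...) is rejected and leaves code[0] untouched, while unchecked_write(L, ...) overwrites code[0].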
This is because an ISA (instruction set architecture) must define what the valid instructions are and how their operands are encoded: memory addresses, registers, literals.
See this for more general information on how an ISA is designed:
https://en.wikipedia.org/wiki/Instruction_set
In short, the CPU itself keeps track of where the next instruction is; the operating system only tells it where to start when it loads the program. In the case of x64 there is a special register called rip (instruction pointer), which holds the address of the next instruction to be executed. The CPU automatically reads the data at this address, decodes and executes it, and automatically advances rip by the number of bytes in the instruction (unless the instruction is a jump or taken branch).
Generally, the OS can mark regions of memory (pages) as holding executable code or not. If a bug or an exploit tries to modify executable memory that is marked read-only, an error should occur; similarly, if the CPU finds itself trying to execute non-executable memory, it will (or should) raise a fault, and the OS will typically terminate the program. Now you're into the wonderful world of software viruses!
I have a process that is overflowing the stack when run from within an IIS process, but works fine when run on its own. I suspect that on its own it gets the default 1MB stack, but within IIS gets somewhat less.
To avoid messing with the IIS worker processes I am using a sub-thread within the IIS process to allocate a bigger stack, but I suspect the stack size argument to Thread creation is being ignored as per the documentation (http://msdn.microsoft.com/en-us/library/ms149581.aspx)
When the stack overflows I can view the halted process in the debugger, but how do I find out how big a stack was actually allocated?
The answer is as follows.
In debugger, add a watch on the pseudo register TIB (http://msdn.microsoft.com/en-us/library/aa232399(v=vs.60).aspx )
Now take this value and display that address in a memory window. Subtract the third 4-byte word (StackLimit) from the second 4-byte word (StackBase), remembering to use little-endian byte ordering.
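The subtraction can be written out in C. Following the Win32 TIB layout linked here, the second DWORD is StackBase and the third is StackLimit; the bytes in the usage note are made-up sample data, not taken from a real process. StackLimit marks the lowest committed address, so strictly this measures the committed part of the stack; after a stack overflow that should be close to the whole reserved size.

```c
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* `tib` holds the first 12 bytes of a 32-bit TIB as shown in a memory window:
   word 0 = ExceptionList, word 1 = StackBase (top), word 2 = StackLimit. */
static uint32_t stack_bytes(const uint8_t tib[12]) {
    uint32_t stack_base  = le32(tib + 4);
    uint32_t stack_limit = le32(tib + 8);
    return stack_base - stack_limit;
}
```

For the sample bytes B0 FF 12 00 | 00 00 13 00 | 00 D0 12 00, StackBase is 0x00130000 and StackLimit is 0x0012D000, so stack_bytes returns 0x3000 (12 KiB).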
http://en.wikipedia.org/wiki/Win32_Thread_Information_Block
I am writing a report summarizing the stack. If you click on my profile you will see that I have been doing this for a while. Right now I am having some trouble, because GDB shows me something different from Visual Studio.
As a result, I am not too sure about my understanding of base pointer and stack pointer, and I am hoping that someone can lead me in the right direction if I am wrong.
For an x86 computer, the stack typically grows downward (from higher memory addresses to lower).
So when a program begins, the main function is called.
In general, at the entry of each function call, a stack is created at the current esp location, and this is what we called "the top of the stack". Is this correct?
When the old ebp gets pushed onto the stack, is it pushed onto where the esp was first pointed to?
Afterward, the esp will move down to point to an empty memory location, is that correct?
Finally, esp is always changing, moving down pointing at the next available memory space. Is that correct?
Does esp move per byte, or per 4 bytes down?
I know there's a lot of questions. But thanks for your time!
Thank you for the response, sir!
@iSciurus
I am confused about how everyone defines esp as pointing at the most recent entry that was pushed onto the stack.
For x86, since the stack grows downward, from your explanation, esp will first point at the lowest address of the stack. When I look at the assembly code, we have
0x080483f4 <+0>: push %ebp
0x080483f5 <+1>: mov %esp,%ebp
0x080483f7 <+3>: sub $0x10,%esp
So esp is decremented by 16 bytes, and this is the size of the stack of this function call. Local variables come right after the return address (ebp-4, ebp-8, etc.). So what is the overall purpose of esp here? From what I understand, a stack overflow occurs when we try to access an address smaller than that.
The last thing is: when we say the top of the stack, are we referring to the lowest address (for x86).
This is the picture I have in mind (growing downward)
[Parameter n ]
...
[Parameter 2 ]
[Parameter 1 ]
[Return Address ] 0x002CF744
[Previous EBP ] 0x002CF740 (current ebp)
[Local Variables ]
-- ESP
Sorry for these long questions. But I really appreciate your help.
To be precise, a stack frame is created at the current esp location, not the stack itself. The stack is created once at thread startup, and each thread has its own stack, which is just an area in the process memory space. The stack frame is what is actually created at each function entry, and it is a region inside the thread stack.
No, it is pushed onto the address [old_esp - 4] (or [old_rsp - 8] in x64), because esp is the top of the stack and points to the lowest used address. The next DWORD (or QWORD) in the stack is free and ebp is pushed there.
Yes, this is typically done with sub esp, value
No. First, esp moves down pointing at the lowest used address in the stack, not at the next available space. Second, keep in mind that esp may point anywhere, not only into the stack: that is fine until you use stack-related instructions like push/pop.
Esp moves per machine word size: in x86 it moves per 4 bytes, while in x64 it moves per 8 bytes.
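To connect these answers with the prologue in the question, here is a C sketch that tracks esp and ebp through push %ebp; mov %esp,%ebp; sub $0x10,%esp. The simulated stack memory, its base address STACK_LO, and the caller's ebp and return-address values are hypothetical; only the register arithmetic mirrors what x86 does.

```c
#include <stdint.h>

/* Hypothetical model: 32 dwords of stack memory starting at STACK_LO. */
#define STACK_LO 0x002CF700u
static uint32_t stack_mem[32];

static void store32(uint32_t addr, uint32_t v) { stack_mem[(addr - STACK_LO) / 4] = v; }
static uint32_t load32(uint32_t addr) { return stack_mem[(addr - STACK_LO) / 4]; }

typedef struct { uint32_t esp, ebp; } regs;

/* push: esp -= 4 first, then store the value at [esp] */
static void push(regs *r, uint32_t value) { r->esp -= 4; store32(r->esp, value); }

/* push %ebp ; mov %esp,%ebp ; sub $0x10,%esp */
static void prologue(regs *r) {
    push(r, r->ebp);   /* save the caller's frame pointer */
    r->ebp = r->esp;   /* new frame pointer points at the saved ebp */
    r->esp -= 0x10;    /* reserve 16 bytes for local variables */
}
```

Starting with esp = 0x002CF744 (just after call pushed the return address) and a caller ebp of 0x002CF770, the prologue leaves ebp = 0x002CF740 holding the old ebp, the return address at ebp + 4 = 0x002CF744, and esp = 0x002CF730, 16 bytes below ebp, matching the diagram in the question.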
The overall purpose of esp is almost always the same: to store the top of the stack. All stack-related instructions like pop/push use esp implicitly. In x86 both ebp and esp are used to store information about the stack frame (the bottom and the top, respectively). Maybe you got confused by this redundancy. However, in x64 code rsp alone is normally used to address stack-based parameters, and rbp is usually free to be used as a general-purpose register.
As for buffer overflows on the stack, they often occur when the code tries to write higher than the last element of an array (or struct or whatever). The stack grows downwards, but arrays grow upwards. When we write higher, we can reach the return address, the SEH handler, as well as the internal variables of our caller.
Yes, when we say top, we mean the lowest address. That is why most debuggers show the stack in reverse order:
-- ESP
[Local Variables ]
[Previous EBP ] 0x002CF740 (current ebp)
[Return Address ] 0x002CF744
[Parameter 1 ]
[Parameter 2 ]
...
[Parameter n ]
Here, ESP points to the value "above" all the data and looks more like the "top". Though it is still the lowest used address.
Is there a call to determine the stack size of a running thread? I've been looking in MSDN thread functions documentation, and can't seem to find one.
Whilst there isn't an API to find out stack size directly, contiguous virtual address space must be reserved up to the maximum stack size - it's just that a lot of that space isn't committed yet. You can take advantage of this and make two calls to VirtualQuery.
For the first call, pass it the address of any value on the stack to get the base address and size, in bytes, of the committed stack space. On an x86 machine where the stack grows downwards, subtract the size from the base address and VirtualQuery again: this will give you the size of the space reserved for the stack (assuming you're not precisely on the limit of stack size at the time). Summing the two naturally gives you the total stack size.
You can get the current committed size from the Top and Bottom in the TEB. You can get the process initial reserve and commit sizes from the PE header. But you cannot retrieve the actual sizes passed to CreateThread, nor is there any API to get the remaining size of reserved nor committed from current stack, see Thread Stack Size.
For example, for an 8-bit CPU the stack is expected to be 8 bits wide, for a 16-bit CPU 16 bits wide, and likewise for 32-bit and 64-bit CPUs. Is this true for all architectures?
A CPU has a data bus and an address bus. They can have the same width, but often do not.
The stack pointer is a pointer to memory, so it is often as wide as the address bus, unless some (weird/obscure) conversion is used internally. The instruction pointer (which points to the current instruction) is also a pointer to memory and as such is as wide as the stack pointer.
Other registers mostly deal with data, and as such have the same dimensions as the data bus. But as usual, there are exceptions.
To take an old example: the 6502, an 8-bit CPU (8-bit data bus, 16-bit address bus). It has the (more or less) general-purpose registers X and Y, and the "accumulator" called A, all 8-bit registers. There is a stack pointer and an instruction pointer: the stack pointer has 8 explicit and 8 implicit bits (the stack is always in the same 256-byte region), so the stack pointer register has 8 bits, while the instruction pointer has 16 bits.
The 8086 has a 16-bit data bus and a 20-bit address bus. Its general registers were 16-bit (some also usable as pairs of 8-bit registers). The instruction pointer and stack pointer were 16-bit, but used segment registers (also 16-bit) to form the full 20-bit address.
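The two address schemes can be written as small C functions (the function names are hypothetical): the 6502 forms its stack address from a fixed high byte of 0x01 plus the 8-bit stack pointer, and the 8086 forms a 20-bit physical address as segment * 16 + offset, wrapping at 1 MiB.

```c
#include <stdint.h>

/* 6502: the stack always lives in page 1 (0x0100-0x01FF); the 8-bit stack
   pointer supplies the low byte, the high byte is implicitly 0x01. */
static uint16_t addr_6502_stack(uint8_t s) {
    return (uint16_t)(0x0100u | s);
}

/* 8086: 16-bit segment and 16-bit offset combine into a 20-bit address,
   segment * 16 + offset, wrapping around at 1 MiB. */
static uint32_t addr_8086(uint16_t segment, uint16_t offset) {
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

addr_6502_stack(0xFF) is 0x01FF, and both 0x0000:0x7C00 and 0x07C0:0x0000 give the physical address 0x07C00 on the 8086.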
"for an 8-bit CPU, the stack is expected to be 8 bits wide"
You might expect that, but you would be wrong. The basic use of the stack is to store & retrieve return addresses, and on 8-bit processors these are actually 16-bit values. For example the Z80 instruction:
PUSH HL
pushes a 16-bit value (the contents of the two 8-bit registers, H and L) onto the stack. There is no Z80 instruction which pushes an 8-bit value.
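What PUSH HL does with the two 8-bit registers can be sketched in C (the memory array and helper names are hypothetical): the Z80 stores H at SP-1 and L at SP-2 and then lowers SP by 2, so a 16-bit value goes onto the stack; POP HL reverses this.

```c
#include <stdint.h>

/* Hypothetical Z80-style model: 64 KiB memory, 16-bit SP, 8-bit H and L. */
static uint8_t z80_mem[0x10000];
static uint16_t z80_sp;

/* PUSH HL: H to SP-1, L to SP-2, then SP -= 2 (a 16-bit push). */
static void push_hl(uint8_t h, uint8_t l) {
    z80_mem[(uint16_t)(z80_sp - 1)] = h;
    z80_mem[(uint16_t)(z80_sp - 2)] = l;
    z80_sp = (uint16_t)(z80_sp - 2);
}

/* POP HL: L from SP, H from SP+1, then SP += 2. */
static void pop_hl(uint8_t *h, uint8_t *l) {
    *l = z80_mem[z80_sp];
    *h = z80_mem[(uint16_t)(z80_sp + 1)];
    z80_sp = (uint16_t)(z80_sp + 2);
}
```

With SP = 0x0000, push_hl(0x12, 0x34) wraps SP to 0xFFFE, leaving 0x12 at 0xFFFF and 0x34 at 0xFFFE; pop_hl reads them back and restores SP to 0x0000.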
The stack is really just a concept, and "stack width" doesn't really mean anything.
Presumably, you mean the width of the stack elements, but you can typically store arbitrary sized values on any stack (with some padding or alignment), regardless of the size of the architecture's bus width or register size.
Note that the terms 8-bit, 16-bit etc are a little arbitrary, and aren't really tight definitions, so this complicates any attempted definition of "width".
One point not yet mentioned is that some CPUs (e.g. the Microchip PIC family) have a dedicated stack which is not accessed like normal RAM. On the PIC16 series, the stack is tied exclusively to the program counter, and the stack is (presumably) the same width as the program counter. On the PIC18 series, the stack and program counter are both 21 bits wide; the upper 20 bits can be copied to or from the program counter; code can also examine the top stack element as a 5-bit part and two 8-bit parts.
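On the PIC18, the top stack entry is exposed through the TOSU (upper 5 bits), TOSH and TOSL (8 bits each) registers. Splitting and rejoining a 21-bit return address that way looks like this in C; `tos_regs`, `split_tos` and `join_tos` are hypothetical names, not part of any Microchip library.

```c
#include <stdint.h>

/* The three parts of a 21-bit PIC18 top-of-stack value. */
typedef struct { uint8_t tosu, tosh, tosl; } tos_regs;

/* Split a 21-bit value into a 5-bit upper part and two 8-bit parts. */
static tos_regs split_tos(uint32_t tos) {
    tos_regs r = {
        (uint8_t)((tos >> 16) & 0x1F),  /* TOSU: bits 20..16 */
        (uint8_t)((tos >> 8) & 0xFF),   /* TOSH: bits 15..8  */
        (uint8_t)(tos & 0xFF)           /* TOSL: bits 7..0   */
    };
    return r;
}

/* Reassemble the 21-bit value from its three parts. */
static uint32_t join_tos(tos_regs r) {
    return ((uint32_t)(r.tosu & 0x1F) << 16) | ((uint32_t)r.tosh << 8) | r.tosl;
}
```

split_tos(0x1ABCDE) gives TOSU = 0x1A, TOSH = 0xBC and TOSL = 0xDE, and join_tos puts them back together.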