Stack based memory allocation - memory-management

With reference to Stack Based Memory Allocation, it is stated as "...each thread has a reserved region of memory referred to as its stack. When a function executes, it may add some of its state data to the top of the stack; when the function exits it is responsible for removing that data from the stack" and "...that memory on the stack is automatically, and very efficiently, reclaimed when the function exits"
The first quoted sentence says the current thread is responsible, while the second says it is done automatically.
Question 1: Is it done automatically, or by the currently running thread?
Question 2: How does the deallocation of memory take place on the stack?

Question 1: by "automatically" (and "very efficiently") they mean that simply by shifting a memory pointer (cutting the top off the stack), all memory used there is reclaimed. No complex garbage collection is necessary.
Question 2: the stack is just a contiguous chunk of memory delimited by a start and an end pointer. Everything between the pointers belongs to the stack, everything beyond the end pointer is considered free memory. You allocate and deallocate memory by moving the end pointer (the top of the stack) around. Things are much more complicated on the heap, where memory use is fragmented.
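As a toy illustration of that pointer-moving idea (made-up names, no overflow checking, and growing upward for simplicity, whereas real stacks usually grow downward):

#include <stdio.h>
#include <stddef.h>

static unsigned char region[1024];    /* the reserved "stack" region */
static unsigned char *top = region;   /* the end pointer: top of the stack */

static void *stack_alloc(size_t n)
{
    void *p = top;
    top += n;            /* allocation: just move the pointer */
    return p;            /* (no overflow check in this toy) */
}

static void stack_free(size_t n)
{
    top -= n;            /* deallocation: move it back; nothing else to do */
}

int main(void)
{
    void *frame = stack_alloc(64);   /* "function entry" */
    printf("frame at %p, %zu bytes in use\n", frame, (size_t)(top - region));
    stack_free(64);                  /* "function exit": instantly reclaimed */
    printf("%zu bytes in use\n", (size_t)(top - region));
    return 0;
}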

You might understand more by looking at an example of a Call Stack (such as in C on many machines).

Question 1: Yes.
Question 2: by decreasing the stack pointer, i.e. the reverse operation of allocation.

The stack is managed by the compiler.
The heap is managed by a library.

Answer to question 1: yes, it is done automatically. The garbage collector is a daemon process that always runs alongside the JVM; it checks all the references, and objects that no longer have references (or are out of reach) are removed from the heap.
Answer to question 2: local variables and method calls are stored on the stack, and as soon as they go out of scope they are removed from the stack.

Related

Checking a process' stack usage in Linux

I am using version 3.12.10 of Linux. I am writing a simple module that loops through the task list and checks the stack usage of each process to see if any are in danger of overflowing the stack. To get the stack limit of the process I use:
tsk->signal->rlim[ RLIMIT_STACK ].rlim_cur
To get the memory address for the start of the stack I use:
tsk->mm->start_stack
I then subtract from it the result of this macro:
KSTK_ESP( tsk )
Most of the time this seems to work just fine, but on occasion I encounter a situation where a process uses more than its stack limit (usually 8 MB), yet the process continues to run and Linux itself reports no issue.
My question is, am I using the right variables to check this stack usage?
After doing more research I think I have realized that this is not a good way of determining how much stack was used. The problem arises when the kernel allocates more pages of memory to the stack for that process. Those pages may not be contiguous to the other pages. Thus the current stack pointer may be some value that would result in an invalid calculation.
The value in task->mm->stack_vm can be used to determine how much space was actually allocated to a process' stack. This is not as accurate as measuring how much is actually used, but for my purposes it is good enough.
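For what it's worth, here is a minimal sketch of that check as a module (untested, written against the 3.12-era fields discussed above; note that peeking at tsk->mm from a module like this is racy without further locking):

#include <linux/module.h>
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/resource.h>

static int __init stackcheck_init(void)
{
    struct task_struct *tsk;

    rcu_read_lock();
    for_each_process(tsk) {
        unsigned long limit, allocated;

        if (!tsk->mm)       /* kernel threads have no user stack */
            continue;
        limit = tsk->signal->rlim[RLIMIT_STACK].rlim_cur;
        allocated = tsk->mm->stack_vm << PAGE_SHIFT;  /* pages -> bytes */
        if (allocated > limit)
            pr_info("%s (pid %d): stack %lu bytes exceeds limit %lu\n",
                    tsk->comm, tsk->pid, allocated, limit);
    }
    rcu_read_unlock();
    return 0;
}

static void __exit stackcheck_exit(void) { }

module_init(stackcheck_init);
module_exit(stackcheck_exit);
MODULE_LICENSE("GPL");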

Does memory layout of a program depend on address binding technique?

I have learned that with run-time address binding, the program can be allocated frames in the physical memory non-contiguously. Also, as described here and here, every segment of the program in the logical address space is contiguous, but not all segments are placed together side-by-side. The text, data, BSS and heap segments are placed together, but the stack segment is not. In other words, there are pages between the heap and the stack segments (between the program break and stack top) in the logical address space that are not mapped to any frames in the physical address space, thus implying that the logical address space is non-contiguous in the case of run-time address binding.
But what about the memory layout in the case of compile-time or load-time binding? Now that the logical address space is not an abstract address space but the actual physical address space, how is a program laid out in physical memory? More specifically, how is the stack segment placed in the physical address space of a program? Is it placed together with the rest of the segments, or separately, just as in the case of run-time binding?
To answer your questions, I first have to explain a bit about stack and heap allocation in modern operating systems.
The stack, as the name suggests, is a contiguous memory allocation, where the CPU uses push and pop instructions to add and remove data at the top of the stack. I assume that you already know how a stack works. A process stores return addresses, function arguments and local variables on the stack. Every time a function is called, more data is pushed (which can ultimately lead to stack overflow if no data is ever popped, e.g. infinite recursion). The stack size is fixed for a program when it is loaded into memory. Most programming languages let you decide the stack size during compilation; if not, they pick a default. On Linux, the maximum stack size (the hard limit) is set by ulimit; you can check and change it with ulimit -s.
Heap space, however, has no fixed upper limit on *nix systems (it depends; confirm it with ulimit -v). Every program starts with a default amount of heap and can grow it as needed. The heap of a process is effectively two linked lists: free blocks and used blocks. Whenever a memory allocation is requested from the heap, one or more free blocks are combined into a large-enough block and moved to the used list as a single block. Freeing means moving a block from the used list back to the free list. Once blocks have been freed, the heap can exhibit external fragmentation. If the free blocks cannot hold the requested data, the process asks the OS for more memory; these newer blocks are generally allocated at higher addresses, which is why diagrams show the heap growing upward. To rephrase: the heap does not allocate memory contiguously in one upward direction.
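To illustrate the two lists (a toy sketch only; the names toy_alloc and toy_free are made up, it uses first-fit, and it skips the block combining and splitting that real allocators do):

#include <stdio.h>
#include <stddef.h>

struct block {
    size_t size;
    struct block *next;
};

static struct block *free_list;
static struct block *used_list;

/* first-fit: move the first large-enough block to the used list */
static struct block *toy_alloc(size_t size)
{
    struct block **pp;
    for (pp = &free_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->size >= size) {
            struct block *b = *pp;
            *pp = b->next;        /* unlink from the free list */
            b->next = used_list;  /* push onto the used list */
            used_list = b;
            return b;
        }
    }
    return NULL;  /* a real allocator would now ask the OS for more memory */
}

/* freeing is the reverse move; freed blocks keep their old sizes,
   which is where external fragmentation comes from */
static void toy_free(struct block *b)
{
    struct block **pp;
    for (pp = &used_list; *pp; pp = &(*pp)->next) {
        if (*pp == b) {
            *pp = b->next;
            b->next = free_list;
            free_list = b;
            return;
        }
    }
}

int main(void)
{
    static struct block b1 = { 64, NULL }, b2 = { 128, NULL };
    struct block *p;

    b1.next = &b2;
    free_list = &b1;              /* seed the free list with two blocks */
    p = toy_alloc(100);           /* first fit: takes the 128-byte block */
    printf("got a block of %zu bytes\n", p ? p->size : (size_t)0);
    toy_free(p);                  /* back to the free list */
    return 0;
}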
Now to answer your questions.
With compile-time or load-time address binding, how are the stack and
the heap segments placed in the physical address space of a program ?
A fixed-size stack is allocated at compile time, along with some heap memory. How they are placed has been explained above.
Is the space between the heap and the stack reserved for the program
or is it available for the OS to be used for other programs ?
Yes, it is reserved for the program. The process, however, can request more memory from the OS to add free blocks to its heap; that is different from sharing its own heap.
Note: there are many more topics that could be covered here, as the question is broad; some of them are garbage collection, block selection and shared memory. Some references follow.
References:
Memory Management in JVM
Stack vs Heap
Heap memory allocation strategies

How to release memory allocated by gcnew?

After some tests with the help of Task Manager, I noticed one thing about gcnew: memory allocated for local variables remains allocated even after control leaves the function, and is reused only when control re-enters that function. So I am puzzled about how to deallocate the memory myself. Here is an example of the problem:
void Foo(void)
{
    System::Text::StringBuilder ^ t = gcnew System::Text::StringBuilder("");
    int i = 0;
    while (++i < 20000000) t->Append(i);  // grows the builder to many MB
    return;
}
As I mentioned, the memory for variable t remains after leaving Foo(); delete does not work as it does for new, so calling Foo() even once leaves me with pointlessly allocated memory.
This is gcnew, which means garbage-collected allocation. It will be disposed of and deallocated by the GC thread.
Your function uses memory for code and data. The code is a fixed amount and will be used the entire time the library or program is loaded. The data is only used when the function is executing.
Data used by a program is either static or dynamic. Static data is laid out by the compiler and is basically equivalent to code (except that it might be marked as non-executable and/or read-only to prevent accidents). Dynamic data is temporary and allocated from a stack or heap (or CPU registers).
In a classic program, the stack and heap share the same memory address range with the stack at one end, growing toward the heap and the heap at the other end, trying not to grow into the stack. However, with modern address ranges on the order of 1TB, a heap generally has a lot of room.
Keep in mind that when a program requests an address range, it's just signaling to the operating system that it's okay to use that address for data reading, data writing and/or code execution. Until it actually puts something there, there is no load on the system. Also keep in mind with a virtual memory system, process memory is effectively allocated on the swap file/device (hard drive) with optimizations especially using RAM for caching, copy on write and many other techniques. (Data written to a memory address might never make it to the swap file, but that's up to the operating system.)
The data your function needs is for the two variables: t and i. t is a reference to a garbage collected object. i is an integer. Both are quite small and short-lived. You could think of them as being on the stack. When the function returns, the stack frame is popped and their memory is reused by the next stack operation. If you are looking at memory allocation, there won't be a change because the amount of memory allocated to the stack would not be changed.
Now in the execution of your function, a new object is created and, the way it's filled with data, it takes up quite a bit of memory. You could consider that object to be created in the heap. You don't need to delete it since it is a garbage collection object. When the garbage collector runs by walking all objects reachable from a set of root objects, it will find that the object is not reachable and add its space to a free list. When space for a new object is needed that doesn't fit into any blocks on the free list, more of the heap's address range will be used.
The CLR heap is compactable, which means it can move objects around in order to coalesce free blocks. Using this ability, it can move objects out of areas of allocated memory and give it back to the operating system, thereby freeing up space in the swap file.
So, there are three things that have to happen for you to see a reduction in the amount of memory allocated to the process:
The garbage collection has run to find unreachable objects.
The heap has been compacted.
The heap allocation has been reduced.
None of these things are really necessary until the swap file can't grow anymore. Obviously, the system has been designed for performance and to be a good citizen so it wouldn't take it that far. You can influence when garbage collection runs but this is only very rarely helpful and is generally not done.

Can you "allocate" stack space with VirtualAlloc?

I'm messing around with VirtualAlloc and dynamic code generation, and I've become curious about something.
The first parameter of VirtualAlloc specifies the start of the address range to be allocated, or more accurately, the page containing that address specifies the start of the page range to be allocated. Right?
I started wondering. Could you just make a bunch of space on the stack and "allocate" that memory with VirtualAlloc? For instance, to change its permissions to PAGE_EXECUTE_READWRITE?
(As an extension of the above, I'm curious where exactly the stack is in a Windows process. How is it set up? What sets it up?)
tl;dr Can you "allocate" stack space with VirtualAlloc?
Stack space is allocated by VirtualAlloc with the MEM_RESERVE flag (or perhaps directly using the underlying syscall) when a thread is created. This causes a chunk of the process's address space to be reserved for that thread's stack.
A guard page is used to cause an access-violation when the stack grows past the region which is actually committed. The OS handles this automatically, by committing additional memory (if there is enough reserved space) or generating EXCEPTION_STACK_OVERFLOW to the process if the edge of the reserved area is reached. In the first case, a new guard page is set up. In the second, recreating the guard page is an important step if you try to handle that exception and recover.
You could use VirtualAlloc and VirtualProtect to precommit your thread's stack. But they don't touch the stack pointer, so they can't be used for stack allocation (code using the stack pointer would happily reuse "your" allocation for automatic variables, function parameters, etc). To allocate space from the stack, you need to adjust the stack pointer. Most C and C++ compilers provide an _alloca() intrinsic for doing this.
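For example (a sketch; _alloca is the MSVC spelling, and other compilers call it alloca and declare it in alloca.h):

#include <malloc.h>   /* MSVC header for _alloca */
#include <stdio.h>
#include <string.h>

static void greet(const char *name)
{
    /* this space comes straight off greet()'s stack frame and
       vanishes automatically when greet() returns */
    size_t n = strlen(name) + 8;
    char *buf = (char *)_alloca(n);
    snprintf(buf, n, "hello %s", name);
    puts(buf);
}

int main(void)
{
    greet("stack");
    return 0;
}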
If you're doing dynamic code generation, don't use the stack for that. Non-executable stack is a valuable protection against remote execution vulnerabilities. You certainly can use VirtualAlloc for dynamic allocation in specialized cases like this, instead of the general-purpose allocators HeapAlloc and malloc and new[]. The general-purpose allocators all ultimately get their memory from VirtualAlloc, but then parcel it out in chunks that don't line up with page boundaries.
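Here is a minimal sketch of that specialized use (assumes x86/x64 Windows; the fn_t typedef is made up): reserve and commit a page as read-write, emit the machine code for mov eax, 42 ; ret, flip the page to executable, then call it.

#include <windows.h>
#include <stdio.h>
#include <string.h>

typedef int (*fn_t)(void);   /* signature of the generated function */

int main(void)
{
    /* x86/x64 machine code: mov eax, 42 ; ret */
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    DWORD old;
    void *p = VirtualAlloc(NULL, sizeof code,
                           MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!p) return 1;
    memcpy(p, code, sizeof code);
    /* write first, then flip to executable (keeps W^X hygiene) */
    VirtualProtect(p, sizeof code, PAGE_EXECUTE_READ, &old);
    FlushInstructionCache(GetCurrentProcess(), p, sizeof code);
    printf("%d\n", ((fn_t)p)());         /* prints 42 */
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}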

Where can one find detailed information about stack operation in x86 processors

I am interested in the layout of an executable and dynamic memory allocation using the stack, and in how the processor and kernel together manage the stack region, for example during function calls and other scenarios that use stack-based memory allocation. Also, how do stack overflow and the other hazards associated with this model occur? Are there other designs of code execution that are not stack based and don't have such issues? A video or an animation would be of great help.
Typically (on any processor, not just x86) there is one RAM address space, and typically the program sits in lower memory and grows upwards as you run. Say your program is 0x1000 bytes and is loaded at 0x0000; if you then malloc() 0x3000 bytes, the address returned would be 0x1000 in this hypothetical situation, and now the lower 0x4000 bytes are being actively used by the program. Additional mallocs continue to grow in this way. free()s do not necessarily cause this consumption to go down; it depends on how the memory is managed and the program's mixture of malloc()s and free()s.
The stack, though, normally grows from the top down. Say the stack pointer starts at address 0x10000, and say you have a function with three 32-bit unsigned int variables and no parameters passed in. You need three stack locations to hold those variables (assuming no optimization has reduced that requirement), so upon entry to the function the stack pointer is reduced by 3*4 = 12 bytes to 0xFFF4; one of your variables is at address 0xFFF4+0, one at 0xFFF4+4 and the third at 0xFFF4+8. If that function calls another function, the stack pointer continues to move toward zero in memory. Meanwhile, as you continue to malloc(), your used program memory grows upward. Unchecked, the two will collide, and the code needed to do that checking is cost-prohibitive enough that it is rarely used. This is why local variables are good for optimization and a few other things, but bad in that stack consumption is often non-deterministic, or at least the analysis is not done by the average programmer.
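You can observe that downward movement by printing the addresses of locals across nested calls (a quick sketch; the exact addresses, any padding, and even the growth direction are platform and compiler dependent):

#include <stdio.h>

static void inner(void)
{
    unsigned int c = 3;
    printf("inner: &c = %p\n", (void *)&c);
}

static void outer(void)
{
    unsigned int a = 1, b = 2;
    printf("outer: &a = %p, &b = %p\n", (void *)&a, (void *)&b);
    inner();   /* on a descending stack its locals land at lower addresses */
}

int main(void)
{
    outer();
    return 0;
}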
On ISAs (instruction set architectures) like x86, where there is a limited number of usable registers, functions often need to pass arguments on the stack as well. The rules governing where and how things are passed and returned are defined and well understood by the compiler; this is not some random thing. Anyway, in addition to leaving room for the local variables, some of the arguments to the function are on the stack, and sometimes the return value is on the stack. In particular with x86, each function call causes the stack to grow downward, and functions calling functions make that worse. Think about what recursion can do to your stack.
What are your alternatives? Use an instruction set with more registers with a function calling spec that uses more registers and less stack. Use fewer arguments when calling functions. Use fewer local variables. Malloc less. Use a good compiler with a good optimizer as well as help the optimizer by using easy to optimize habits when coding.
Realistically though, to have a generically useful processor for which you write generically useful programs you have to have a stack and the possibility that the stack overflows and/or collides with the heap.
Now the segmented memory model of the x86, as well as MMUs in general, gives you the opportunity to keep the program memory and the stack well away from each other. Protection mechanisms can also be used so that if either the heap or the stack ventures outside its allocated space, a protection fault occurs. That is still an oversight by the programmer, but it is easier to know what happened and debug it than to chase the random side effects that occur when the stack grows down into program memory. Using a protection mechanism like this is a much easier way to help the programmer control stack growth than building something into the compiler-generated code to check for a collision on every function call and malloc.
Another pitfall which is often asked in job interviews is something along the lines of:
int *myfun(int a)
{
    int i;
    i = a + 7;
    return (&i);   /* bad: returns the address of a stack variable */
}
This can take many forms; the thing to understand is that the variable i is temporarily allocated on the stack and only exists while the function is executing. When the function returns, the stack pointer frees up the memory allocated for i, and the next function called may very well clobber that memory. So returning the address of a variable stored on the stack is a bad idea. Code that does something like this may run properly for weeks, months or years before the bug is detected.
Now this, by contrast, is acceptable even on stack-based CPUs (the Zylin ZPU for example).
int myfun(int a)
{
    int i;
    i = a + 7;
    return (i);    /* fine: the value is returned by copy */
}
Partly because there isn't much else you can do other than use globals (yes, this specific case does not require the additional variable i, but assume your code is complicated enough that you need that local return variable), and partly because in C the calling code frees up its portion of the stack. Meaning, on an x86 for example, if you call a function with two parameters on the stack, let's say two 4-byte ints, the calling code moves the stack pointer down by 8 and places those two parameters in that memory (sp+0 and sp+4); then, when the function returns, the calling code is the one that deallocates those two variables by adding 8 to the stack pointer. So in the above code, using i and returning i by value, the C calling convention for that processor knows where to get the return value, and once that value is captured, the stack memory holding it is no longer needed. My understanding is that in Pascal (Borland Turbo Pascal, for example) the callee cleaned up the stack: the caller would put the two variables on the stack and the function being called would clean them up. Not a bad idea as far as stack management goes; you can nest much deeper this way. There are pros and cons to both approaches.
