Does memory layout of a program depend on address binding technique? - memory-management

I have learned that with run-time address binding, the program can be allocated frames in the physical memory non-contiguously. Also, as described here and here, every segment of the program in the logical address space is contiguous, but not all segments are placed together side-by-side. The text, data, BSS and heap segments are placed together, but the stack segment is not. In other words, there are pages between the heap and the stack segments (between the program break and stack top) in the logical address space that are not mapped to any frames in the physical address space, thus implying that the logical address space is non-contiguous in the case of run-time address binding.
But what about the memory layout in the case of compile-time or load-time binding? Now that the logical address space is not an abstract address space but the actual physical address space, how is a program laid out in physical memory? More specifically, how is the stack segment placed in the physical address space of a program? Is it placed together with the rest of the segments, or separately, just as in the case of run-time binding?

To answer your questions, I first have to explain a bit about stack and heap allocation in modern operating systems.
As the name suggests, the stack is a contiguous region of memory, where the CPU uses push and pop instructions to add and remove data at the top of the stack. I assume that you already know how a stack works. A process stores return addresses, function arguments and local variables on the stack. Every time a function is called, more data is pushed (this can ultimately lead to a stack overflow if nothing is ever popped, e.g. with infinite recursion). The maximum stack size is fixed for a program when it is loaded into memory. Many toolchains let you choose the stack size when building the program (often as a linker option); otherwise a default is used. On Linux, the maximum stack size of the main thread is governed by the stack resource limit: you can check and set the soft limit with ulimit -s, and it can only be raised up to the hard limit (ulimit -Hs).
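As a small illustration of the limit mentioned above, here is a sketch that reads it programmatically instead of through the shell. It assumes a POSIX system that provides getrlimit() and RLIMIT_STACK (Linux or macOS); the helper names print_limit and print_stack_limits are my own. It only covers the main thread: other threads get their stack size when they are created.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Print one limit, treating RLIM_INFINITY as "unlimited". */
static void print_limit(const char *label, rlim_t value) {
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu KiB\n", label, (unsigned long long)value / 1024);
}

/* Query the limits that `ulimit -s` (soft) and `ulimit -Hs` (hard) report.
   Returns 0 on success, -1 on failure. */
int print_stack_limits(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* The soft limit can never exceed the hard limit. */
    assert(rl.rlim_max == RLIM_INFINITY || rl.rlim_cur <= rl.rlim_max);
    print_limit("soft stack limit", rl.rlim_cur);
    print_limit("hard stack limit", rl.rlim_max);
    return 0;
}
```

Calling print_stack_limits() from main should print the same values that ulimit -s and ulimit -Hs show in the shell that launched it (both in KiB).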
Heap space, however, has no fixed upper limit on *nix systems (it depends; you can check it with ulimit -v, which caps the total virtual memory of the process). Every program starts with a default amount of heap, which can grow as much as needed. Conceptually, a simple heap can be pictured as two linked lists: free blocks and used blocks. Whenever memory is requested from the heap, a free block that is large enough is taken (a bigger block may be split, and adjacent free blocks may previously have been merged into a bigger one) and moved to the used list as a single block. Freeing moves a block from the used list back to the free list. After blocks are freed, the heap can suffer from external fragmentation. If none of the free blocks can hold the requested data, the process requests more memory from the OS. Generally, the newer blocks are allocated from higher addresses, which is why diagrams show the heap growing upward. To be precise, though, the heap does not necessarily allocate memory contiguously in the upward direction.
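The free-list idea described above can be sketched as a tiny first-fit allocator. This is a hypothetical, simplified variant rather than how any particular libc works: instead of two separate lists it keeps one address-ordered list in which every block carries a free flag, and a static 4 KiB array stands in for memory obtained from the OS. The names heap_init, heap_alloc and heap_free are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 4096   /* hypothetical heap size; a real heap asks the OS for more */
#define ALIGN 16

/* Header stored just before every block's payload. */
typedef struct block {
    _Alignas(ALIGN) size_t size;  /* payload size in bytes */
    int free;                     /* 1 = free, 0 = in use */
    struct block *next;           /* next block in address order */
} block_t;

static _Alignas(ALIGN) unsigned char arena[ARENA_SIZE];
static block_t *head;

static size_t align_up(size_t n) { return (n + ALIGN - 1) & ~(size_t)(ALIGN - 1); }

/* Start with one big free block covering the whole arena. */
void heap_init(void) {
    head = (block_t *)arena;
    head->size = ARENA_SIZE - sizeof(block_t);
    head->free = 1;
    head->next = NULL;
}

/* First fit: take the first free block that is large enough, splitting off
   the unused tail as a new free block when it is big enough to be useful. */
void *heap_alloc(size_t n) {
    n = align_up(n ? n : 1);
    for (block_t *b = head; b != NULL; b = b->next) {
        if (!b->free || b->size < n)
            continue;
        if (b->size >= n + sizeof(block_t) + ALIGN) {
            block_t *rest = (block_t *)((unsigned char *)(b + 1) + n);
            rest->size = b->size - n - sizeof(block_t);
            rest->free = 1;
            rest->next = b->next;
            b->next = rest;
            b->size = n;
        }
        b->free = 0;
        return b + 1;
    }
    return NULL;  /* nothing fits: a real allocator would request more memory here */
}

/* Mark the block free, then merge runs of adjacent free blocks to limit
   external fragmentation. */
void heap_free(void *p) {
    if (p == NULL)
        return;
    assert((unsigned char *)p > arena && (unsigned char *)p < arena + ARENA_SIZE);
    ((block_t *)p - 1)->free = 1;
    for (block_t *b = head; b != NULL; b = b->next) {
        while (b->free && b->next != NULL && b->next->free) {
            b->size += sizeof(block_t) + b->next->size;
            b->next = b->next->next;
        }
    }
}
```

Freeing a block and allocating a smaller one hands back the same address (first fit), and once everything is freed the coalescing step lets a large request succeed again from the start of the arena.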
Now to answer your questions.
With compile-time or load-time address binding, how are the stack and the heap segments placed in the physical address space of a program?
A fixed-size stack is reserved at compile or load time, along with some initial heap memory. How they are placed has been explained above.
Is the space between the heap and the stack reserved for the program, or is it available for the OS to use for other programs?
Yes, it is reserved for the program. The process can, however, request more memory from the OS to add free blocks to its heap. That is different from sharing its own heap.
Note: There are many topics that could be covered here, as the question is broad. Some of them are garbage collection, block selection and shared memory. I will add more references here soon.
References:
Memory Management in JVM
Stack vs Heap
Heap memory allocation strategies

Related

What is the use of attaching static data to a program when it is loaded into main memory?

When the operating system loads a program into main memory, it also attaches the static data, along with the stack and heap memory. I googled what the static data contains and found that it holds the global variables and static variables. But I am confused: both of these are already present in the text file of the program, so why do we add them separately?
The data in the executable is often referred to as the data segment. The CPU doesn't interact with the hard disk, only with RAM, so the data segment must be loaded into RAM before the CPU can access it. Also, the executable is not really a text file: it is a binary executable with its own format and extension. "Text file" usually refers to a plain-text file, often with a .txt extension.
With that said, you also asked another question not long ago (If the amount of stack memory provided to a program is fixed then why does it grow downwards in the process architecture? Or am I getting it wrong?) so I will try to give some insight for both of these in this same answer.
I don't know much about caching and low-level CPU internals, but today the CPU mostly doesn't operate on RAM directly. It loads chunks of RAM into the cache, operates on them, and keeps RAM and cache consistent through complex mechanisms. The OS also has a role to play in RAM-cache consistency but, like I said, I am far from an expert here. Other than that, caching is mostly transparent to the OS: the CPU handles it, and the OS simply provides instructions for the CPU to execute.
Today, paging is used by most operating systems and implemented on most CPU architectures. With paging, every process sees a full, contiguous virtual address space. The process accesses virtual addresses, and the hardware MMU translates them to physical ones automatically by walking the page tables. The OS is responsible for keeping the page tables consistent, and the MMU does the rest of the job (for more info read: What is paging exactly? OSDEV). If you understand paging well, things become much clearer.
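To make "walking the page tables" concrete, here is a sketch of how x86-64 with 4-level paging and 4 KiB pages splits a virtual address: four 9-bit indices (one per table level) plus a 12-bit offset inside the page. The struct and function names are hypothetical; the sketch only does the bit arithmetic and ignores the sign extension of bit 47 into bits 48-63 that real canonical addresses require.

```c
#include <assert.h>
#include <stdint.h>

/* Indices used by x86-64 4-level paging with 4 KiB pages. */
typedef struct {
    unsigned pml4, pdpt, pd, pt;  /* 9-bit index into each page-table level */
    unsigned offset;              /* 12-bit byte offset inside the 4 KiB page */
} va_parts;

va_parts split_va(uint64_t va) {
    va_parts p;
    p.offset = (unsigned)(va & 0xFFF);          /* bits 0-11 */
    p.pt     = (unsigned)((va >> 12) & 0x1FF);  /* bits 12-20 */
    p.pd     = (unsigned)((va >> 21) & 0x1FF);  /* bits 21-29 */
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);  /* bits 30-38 */
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);  /* bits 39-47 */
    return p;
}

/* Inverse of split_va (without sign-extending bit 47 into bits 48-63). */
uint64_t join_va(va_parts p) {
    assert(p.offset < 4096 && p.pt < 512 && p.pd < 512 && p.pdpt < 512 && p.pml4 < 512);
    return ((uint64_t)p.pml4 << 39) | ((uint64_t)p.pdpt << 30) |
           ((uint64_t)p.pd << 21) | ((uint64_t)p.pt << 12) | p.offset;
}
```

The MMU uses each index in turn to pick an entry in the corresponding table, and the final entry supplies the physical frame to which the offset is added.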
For a process, there are mainly three types of memory: the stack (often called automatic storage), the heap, and the static/global data. I will describe each of them to give the overall picture.
The stack is given a maximum size when the process begins. The OS handles that: it creates the page tables and places the proper address in the stack pointer register so that stack accesses reach the proper region of physical memory. The stack is automatic storage, which means that it isn't handled manually by the high-level programmer. For example, in C/C++ the stack is managed by the compiler, which, at the entry of a function, creates a stack frame and encodes offsets into the instructions. On x86 with a frame pointer, each local variable (within a function) is typically accessed at a negative offset from the stack base pointer; without a frame pointer, offsets are relative to the stack pointer instead. What the compiler needs to do is create a stack frame of the proper size so that there is enough room for all the local variables of a particular function (for more info on the stack see: Each program allocates a fixed stack size? Who defines the amount of stack memory for each application running?).
For the heap, the OS makes a very large range of virtual memory available. Today, virtual address spaces are very big (2^48 bytes or more). The amount of heap available to each process is often limited only by the amount of physical memory (and swap) available to back virtual memory allocations. For example, a process could use malloc() to allocate 4 KB of memory in C. If libc (the implementation of the C standard library) has no free block large enough, it calls the OS through a system call. The OS then reserves a page of the virtual memory available for the heap and changes the page tables so that accessing that portion of virtual memory translates to somewhere in RAM (probably somewhere another process wasn't already using).
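The page-granular request described above can be sketched by calling the kernel directly, bypassing malloc. This assumes a POSIX system with anonymous mmap (MAP_ANONYMOUS, enabled here via _DEFAULT_SOURCE on glibc); get_pages and put_pages are hypothetical helper names. Note that on Linux the physical frame is typically assigned lazily, on the first access to the page, rather than at the time of the mmap call.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Ask the kernel for `bytes` of zero-filled, read/write memory. The kernel
   works in whole pages, so the length is rounded up to a page multiple and
   reported back through `mapped`. Returns NULL on failure. */
void *get_pages(size_t bytes, size_t *mapped) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (bytes + page - 1) / page * page;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *mapped = len;
    return p;
}

/* Give the pages back to the kernel; `mapped` must be the length reported by get_pages. */
int put_pages(void *p, size_t mapped) {
    assert(p != NULL);
    return munmap(p, mapped);
}
```

Even a 1-byte request comes back as a full page, which is one reason allocators such as malloc carve small allocations out of larger chunks instead of making a system call per request.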
The static/global data is simply placed in the executable, in the data segment. The data segment is loaded into virtual memory alongside the text segment, so code in the text segment can access this data, often using RIP-relative addressing (on x86-64).
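A short C sketch of what ends up in static storage. The section names in the comments (.data, .bss, .rodata) are the usual ELF conventions, and the exact placement is up to the compiler and linker. Zero-initialized .bss data is not stored in the executable at all, only its size, so the loader maps zero-filled pages for it; this is part of why static data is set up at load time rather than just being "read from the file". All names here are hypothetical.

```c
#include <assert.h>

int initialized_global = 42;        /* initialized: typically placed in .data */
int zeroed_global;                  /* zero-initialized: typically placed in .bss */
const char *const greeting = "hi";  /* the literal itself typically lives in .rodata */

/* A function-scope static also has static storage: it is allocated once when
   the program is loaded, not on the stack at every call, so it keeps its value. */
int next_id(void) {
    static int counter;   /* zero-initialized before main runs */
    return ++counter;
}

/* Check the values a freshly loaded program should see. */
void check_initial_values(void) {
    assert(initialized_global == 42);
    assert(zeroed_global == 0);
    assert(greeting[0] == 'h');
}
```

Calling next_id() repeatedly returns 1, 2, 3, ..., because counter lives in the data/BSS area rather than in a stack frame that disappears after each call.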

How does macOS allocate stack and heap for a process?

I want to know how macOS allocate stack and heap memory for a process, i.e. the memory layout of a process in macOS. I only know that the segments of a mach-o executable are loaded into pages, but I can't find a segment that correspond to stack or heap area of a process. Is there any document about that?
Stacks and heaps are just memory. The only thing that makes a stack a stack, or a heap a heap, is the way it is accessed. Stacks and heaps are allocated the same way all memory is: by mapping pages into the logical address space.
Let's take a step back: the Mach-O format describes how the binary's segments are mapped into virtual memory. Importantly, each segment's memory pages get their own read, write and execute permissions. If the file is an executable (i.e. not a dylib), it must contain the __PAGEZERO segment, which has no permissions at all. This is a guard area that catches accidental accesses to low virtual addresses (this is where the infamous null pointer crashes come from when a program tries to access address zero).
The __TEXT segment (readable and executable, typically not writable) follows; in virtual memory it maps the beginning of the file itself, including the Mach-O header. This implies that all the executable code lives here, along with immutable data such as string constants.
The order may vary. There is also a read-only __LINKEDIT segment (in typical binaries it comes last), which holds the information dyld uses to set up externally loaded functions. This is too broad to cover here, but there are numerous answers on the topic.
The readable and writable __DATA segment is the first place a process can actually write to. It is used for global/static variables and for the external call addresses that dyld fills in.
We have roughly covered the initial setup of the process, which is launched through either LC_UNIXTHREAD or, in modern macOS (10.8+), LC_MAIN. This starts the process's main thread. Each thread must have its own stack, and its creation (including allocation) is handled by the operating system. Notice that so far the process has no awareness of the heap at all (it's the operating system that does the heavy lifting of preparing the stack).
So, to sum up, so far we have two independent sources of memory: the process memory representing the Mach-O structure (its size is fixed and determined by the executable's structure) and the main thread stack (also with a predefined size). When the process runs a C-like main function, any local variables declared move the thread's stack pointer, and so does any function call (local or external), which at least sets up a stack frame holding the return address. Accessing a global/static variable references the __DATA segment's virtual memory directly.
Reserving stack space in x86-64 assembly looks like this:
```asm
sub rsp, 16    ; move the stack pointer down 16 bytes to make room for locals
```
There are some great SO answers on the stack alignment requirements of the System V AMD64 ABI (which macOS follows on x86-64), like this one.
Any new thread created will have its own stack to allow setting up stack frames for local variables and calling functions.
Now we can cover heap allocation, which is handled by libSystem (the macOS C standard library), which provides malloc/free. Internally, it obtains memory pages from the kernel through virtual-memory system calls such as mmap and munmap.
Using those system calls directly is possible, but can turn out to be inefficient, so malloc/free use an internal memory pool to limit the number of system calls (which are costly to make).
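Here is a minimal sketch of the pooling idea: one mmap call provides a chunk, and many later allocations are served from it without any system call. Real malloc implementations are far more elaborate (size classes, per-thread caches and so on); this hypothetical pool only hands out fixed-size 64-byte slots from a free list. It assumes a POSIX system with anonymous mmap, and all names are illustrative.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define SLOT_SIZE  64           /* hypothetical fixed object size */
#define POOL_BYTES (64 * 1024)  /* one chunk requested from the kernel */

/* A free slot stores the link to the next free slot inside itself. */
typedef union slot {
    union slot *next;
    unsigned char bytes[SLOT_SIZE];
} slot_t;

typedef struct {
    unsigned char *base;  /* start of the mmap'd chunk */
    slot_t *free_list;    /* singly linked list of free slots */
} pool_t;

/* One system call, then thread every slot onto the free list (lowest address first). */
int pool_init(pool_t *p) {
    void *mem = mmap(NULL, POOL_BYTES, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    p->base = mem;
    p->free_list = NULL;
    for (size_t i = POOL_BYTES / SLOT_SIZE; i-- > 0;) {
        slot_t *s = (slot_t *)(p->base + i * SLOT_SIZE);
        s->next = p->free_list;
        p->free_list = s;
    }
    return 0;
}

/* Pop a slot: no system call involved. Returns NULL when the pool is empty. */
void *pool_alloc(pool_t *p) {
    slot_t *s = p->free_list;
    if (s != NULL)
        p->free_list = s->next;
    return s;
}

/* Push the slot back onto the free list; the memory stays mapped. */
void pool_free(pool_t *p, void *ptr) {
    assert((unsigned char *)ptr >= p->base && (unsigned char *)ptr < p->base + POOL_BYTES);
    slot_t *s = ptr;
    s->next = p->free_list;
    p->free_list = s;
}

/* Return the whole chunk to the kernel with a single munmap. */
int pool_destroy(pool_t *p) {
    return munmap(p->base, POOL_BYTES);
}
```

The trade-off is the one described above: a single mmap/munmap pair covers 1024 allocations, at the cost of keeping the whole chunk mapped even when most slots are free.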
The changing addresses you mentioned in the comment are caused by:
ASLR (address space layout randomization), a security measure that randomizes where things are placed in virtual memory; for the executable's own memory this requires it to be built as a PIE (position-independent executable)
Thread stacks being prepared by the operating system

Variable allocation and tracking

I recently started searching and reading about ALDS and memory management after I ran into a question about memory allocation. After a couple of days of study I learnt a lot about memory management, but the actual question remains unsolved.
So the question is: while allocating memory to a variable, how exactly does the system know which block of memory is in use and which is free? Similarly, when we destroy an object, set a variable to null, or the GC frees up some memory, what exactly does it do with that block of memory? As far as I know, the actual data is never erased on deletion; the block just gets marked as free somewhere in some table. But does that table keep track of each and every bit of memory? If so, wouldn't that itself become a lot of data to store?
For example, if I declare a linked list, a block is allocated on the heap, with its next pointer set to null, as there is no other node to reference. As I keep adding more nodes, the system keeps allocating more blocks, each containing a reference to the next one. These blocks can be at arbitrary locations, depending on what memory is available at allocation time, and can only be reached through their preceding nodes.
So, for any given block of memory, how does the system know whether it is free and just holds garbage, or is actually a node of some linked list?
On a modern operating system the process has a logical, linear address space. Part of that address space is reserved for the system and is common to all processes. Some of the address space may be reserved but most of the remainder is available to the process.
The address space is defined by PAGE TABLES. The structure of the page table is defined by the processor, but the operating system maintains a table for each process. Memory is allocated to a process in PAGES. The smallest page size I am aware of is 512 bytes, but the size can go up to a megabyte or even larger in some processors and some processor configurations. The size is always a power of 2.
The page table defines:
Whether a page has actually been mapped to the process
Whether the page has a corresponding physical memory location
If so, the mapping to that physical location.
The operating system only knows about pages.
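To address the worry about tracking "every bit", here is a sketch of one classic way to track physical pages: a bitmap with one bit per page frame. This is an illustrative choice, not what every kernel does (Linux, for instance, uses a buddy allocator). The overhead is small: with 4 KiB pages, 16 GiB of RAM is 4,194,304 frames, so the bitmap needs 4,194,304 bits = 512 KiB, about 0.003% of memory. The frame count and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_FRAMES 1024   /* hypothetical: 1024 frames x 4 KiB = 4 MiB of RAM */

/* One bit per physical frame: 1 = in use, 0 = free. */
static uint8_t frame_bitmap[NUM_FRAMES / 8];

/* Find the first free frame, mark it used and return its number (-1 if none). */
long alloc_frame(void) {
    for (size_t i = 0; i < NUM_FRAMES; i++) {
        if (!(frame_bitmap[i / 8] & (1u << (i % 8)))) {
            frame_bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            return (long)i;
        }
    }
    return -1;  /* out of physical memory */
}

/* Clear the frame's bit; the old contents are not erased, only marked free. */
void free_frame(long i) {
    assert(i >= 0 && i < NUM_FRAMES);
    frame_bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

Tracking happens per page, not per bit or byte of memory, which is what keeps the table small; finer-grained bookkeeping inside a page is left to the memory managers described next.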
At the next level down there are memory managers. These are not part of the operating system. Memory managers manage heaps that consist of pages allocated by the operating system. The memory manager has to keep track of the heap size and of what memory has been allocated within it.
Memory managers operate in a huge number of different ways. There are malloc/free implementations galore that you can link into your code to get different behaviors.

Can you "allocate" stack space with VirtualAlloc?

I'm messing around with VirtualAlloc and dynamic code generation, and I've become curious about something.
The first parameter of VirtualAlloc specifies the start of the address range to be allocated, or more accurately, the page containing that address specifies the start of the page range to be allocated. Right?
I started wondering. Could you just make a bunch of space on the stack and "allocate" that memory with VirtualAlloc? For instance, to change its permissions to PAGE_EXECUTE_READWRITE?
(As an extension of the above, I'm curious where exactly the stack is in a Windows process. How is it set up? What sets it up?)
tl;dr Can you "allocate" stack space with VirtualAlloc?
Stack space is allocated by VirtualAlloc with the MEM_RESERVE flag (or perhaps directly using the underlying syscall) when a thread is created. This causes a chunk of the process's address space to be reserved for that thread's stack.
A guard page is used to cause an access-violation when the stack grows past the region which is actually committed. The OS handles this automatically, by committing additional memory (if there is enough reserved space) or generating EXCEPTION_STACK_OVERFLOW to the process if the edge of the reserved area is reached. In the first case, a new guard page is set up. In the second, recreating the guard page is an important step if you try to handle that exception and recover.
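The reserve-then-commit scheme described above can be sketched as follows. On Windows the sketch calls VirtualAlloc with MEM_RESERVE and then MEM_COMMIT; on other systems it swaps in a POSIX analogue (mmap with PROT_NONE to reserve, mprotect to commit), which is my own substitution for portability, not how Windows does it. The helper names are hypothetical, and a real thread stack commits from the top down and keeps a PAGE_GUARD page below the committed area, which this sketch leaves out.

```c
#ifdef _WIN32
#include <windows.h>
#else
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <sys/mman.h>
#include <unistd.h>
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of one page on this system. */
size_t page_size(void) {
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return si.dwPageSize;
#else
    return (size_t)sysconf(_SC_PAGESIZE);
#endif
}

/* Reserve address space only: nothing is readable or writable yet. */
void *reserve_region(size_t size) {
#ifdef _WIN32
    return VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS);
#else
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}

/* Commit part of a reservation so it can be read and written. */
int commit_region(void *addr, size_t size) {
    assert((uintptr_t)addr % page_size() == 0);  /* commit whole pages only */
#ifdef _WIN32
    return VirtualAlloc(addr, size, MEM_COMMIT, PAGE_READWRITE) != NULL ? 0 : -1;
#else
    return mprotect(addr, size, PROT_READ | PROT_WRITE);
#endif
}

/* Release the whole reservation; `base` must be what reserve_region returned. */
int release_region(void *base, size_t size) {
#ifdef _WIN32
    (void)size;
    return VirtualFree(base, 0, MEM_RELEASE) ? 0 : -1;
#else
    return munmap(base, size);
#endif
}
```

For example, reserving 16 pages and committing only the highest one mimics a freshly created thread stack: the top page is usable, and the rest of the range is claimed but not yet backed by memory.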
You could use VirtualAlloc and VirtualProtect to precommit your thread's stack. But they don't touch the stack pointer, so they can't be used for stack allocation (code using the stack pointer would happily reuse "your" allocation for automatic variables, function parameters, etc). To allocate space from the stack, you need to adjust the stack pointer. Most C and C++ compilers provide an _alloca() intrinsic for doing this.
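As a small illustration of allocating from the stack by moving the stack pointer, here is a sketch using _alloca on Windows and alloca (from <alloca.h>) on Linux and macOS. Neither is part of standard C, and sum_of_squares is a hypothetical example function. The memory disappears when the function returns, so a pointer to it must never be returned or stored.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <malloc.h>
#define STACK_ALLOC(n) _alloca(n)
#else
#include <alloca.h>
#define STACK_ALLOC(n) alloca(n)
#endif

/* Sum of squares 1..n using a scratch array carved out of the current stack
   frame; it is released automatically when this function returns. */
long sum_of_squares(int n) {
    assert(n > 0 && n <= 1024);   /* keep stack usage small and bounded */
    long *squares = STACK_ALLOC((size_t)n * sizeof *squares);
    for (int i = 0; i < n; i++)
        squares[i] = (long)(i + 1) * (i + 1);
    long total = 0;
    for (int i = 0; i < n; i++)
        total += squares[i];
    return total;
}
```

Bounding n matters: alloca has no way to report failure, so an oversized request simply runs into the guard page and ends in a stack overflow.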
If you're doing dynamic code generation, don't use the stack for that. Non-executable stack is a valuable protection against remote execution vulnerabilities. You certainly can use VirtualAlloc for dynamic allocation in specialized cases like this, instead of the general-purpose allocators HeapAlloc and malloc and new[]. The general-purpose allocators all ultimately get their memory from VirtualAlloc, but then parcel it out in chunks that don't line up with page boundaries.

Windows stack and heap address ranges

Having worked with Linux until now, where stack addresses are very high and heap addresses are pretty low (as seen by printing heap and stack addresses from a C program), I have a problem with the Win32 process memory layout. MSDN says that stack addresses are higher than heap addresses, but from what I saw in practice, stack addresses are lower than heap addresses. So I am confused. Can someone please explain?
"Stack addresses are higher than heap addresses" is simply not true. Both stacks and heaps can reside anywhere in the address space of a process on Windows.
If you start a lot of threads, make huge heap allocations and load hundreds of dlls, you will find that all these objects are evenly spread around the address space.
This picture shows the structure of virtual allocations in a typical 32-bit process on Windows. Green shows free areas, and blue shows allocated ones. Activity mostly takes place at the beginning of the address space, but it is present in other address ranges as well.
