How does stack growing work on Windows and Linux?

I just read that Windows programs call _alloca on function entry to grow the stack if they need more than 4k on the stack. I guess that every time the guard page is hit, Windows allocates a new page for the stack, and that _alloca therefore accesses the stack in 4k steps to allocate the space.
I also read that this only applies to Windows. How does Linux (or other OSes) solve this problem if they don't need _alloca?

Linux relies on heavily optimized page-fault handling: the program just pushes things onto the stack, and the page-fault handler extends the stack on the fly.
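To make the guard-page behaviour from the question concrete, here is a toy model in Python. It is only a sketch: the class GuardedStack, the function alloc_with_probes and the 4 KiB page size are assumptions made for illustration, not any Windows API, and a real system keeps extra pages in reserve that this model ignores. It shows why a large frame has to be touched one page at a time: only an access to the single guard page grows the stack, while an access further down faults.

```python
PAGE = 4096  # assumed page size in bytes


class GuardedStack:
    """Toy model of a downward-growing stack with one guard page.

    Hypothetical names; this does not correspond to any real OS API.
    """

    def __init__(self, top, reserved_pages):
        self.top = top                             # highest address (exclusive)
        self.limit = top - reserved_pages * PAGE   # lowest reserved address
        self.committed_low = top - PAGE            # one page committed at start
        # The guard page is the page directly below committed_low.

    def touch(self, addr):
        """Access one address; return 'ok', 'grown' or 'fault'."""
        if self.committed_low <= addr < self.top:
            return "ok"
        guard_low = self.committed_low - PAGE
        if guard_low >= self.limit and guard_low <= addr < self.committed_low:
            # Guard page hit: commit it and move the guard one page down.
            self.committed_low = guard_low
            return "grown"
        return "fault"  # jumped past the guard page, or reserve exhausted


def alloc_with_probes(stack, sp, size):
    """Move sp down by `size`, touching one address per page on the way.

    Returns the new stack pointer, or None if any touch faulted.
    """
    new_sp = sp - size
    addr = sp - PAGE
    while addr > new_sp:
        if stack.touch(addr) == "fault":
            return None
        addr -= PAGE
    return new_sp if stack.touch(new_sp) != "fault" else None
```

In this model, touching an address three pages below the stack pointer on a fresh stack returns "fault", whereas alloc_with_probes reaches the same address because each step lands on the current guard page. On Linux, per the answer above, the fault handler extends the mapping directly, so the compiler does not need to emit such a loop for ordinary frames.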

Related

Does the Windows ABI allow me to change the stack pointer?

I know that the Windows ABI has some restrictions on code generation for a procedure's prolog and epilog, but I was wondering whether it's fine by the OS to allocate a large block of heap storage and point the stack pointer at it (restoring RSP before the function returns).
Basically, from what I understand, Windows threads have a hard stack limit of 4GB, and I wonder if it's OK to increase the stack limit that way or if there's another way to do so.
I have read the information that MSDN has about x64 stack usage, but I could not find anything about assigning a new value to the stack register.

How does the linux kernel avoid the stack overwriting the text (instructions)?

I was curious about how the kernel prevents the stack from growing too big, and I found this Q/A:
Q: how does the linux kernel enforce stack size limits?
A: The kernel can control this due to the virtual memory. The virtual
memory (also known as memory mapping), is basically a list of virtual
memory areas (base + size) and a target physical memory area that
the kernel can manipulate that is unique to each program. When a
program tries to access an address that is not on this list, an
exception happens. This exception will cause a context switch into
kernel mode. The kernel can look up the fault. If the memory is to
become valid, it will be put into place before the program can
continue (swap and mmap not read from disk yet for instance) or a
SEGFAULT can be generated.
In order to decide the stack size limit, the kernel simply manipulates
the virtual memory map. - Stian Skjelstad
But I didn't quite find this answer satisfactory. "When a program tries to access an address that is not on this list, an exception happens." - But wouldn't the text section (instructions) of the program be part of the virtual memory map?
I'm asking about how the kernel enforces the stack size of user programs.
There's a growth limit, set with ulimit -s for the main stack, that will stop the stack from getting anywhere near .text. (And the guard pages below that make sure there's a segfault if the stack does overflow past the growth limit.) See How is Stack memory allocated when using 'push' or 'sub' x86 instructions?. (Or for thread stacks (not the main thread), stack memory is just a normal mmap allocation with no growth; the only lazy allocation is physical pages to back the virtual ones.)
Also, .text is a read+exec mapping of the executable, so there's no way to modify it without calling mprotect first. (It's a private mapping, so doing so would only affect the pages in memory, not the actual file. This is how text relocations work: runtime fixups for absolute addresses, to be fixed up by the dynamic linker.)
The actual mechanism for limiting growth is simple: when the process triggers a hardware page fault with the stack pointer below the existing stack area and the limit has been reached, the kernel does not extend the mapping or allocate a new page. The page fault is then treated as invalid, rather than as the soft (aka minor) fault of the normal stack-growth case, so a SIGSEGV is delivered.
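The decision described above can be sketched as a small function. This is a simplified model, not kernel code: the name handle_stack_fault, its parameters and the 4 KiB page size are assumptions for illustration, and the real handler performs more checks than this.

```python
PAGE = 4096  # assumed page size in bytes


def handle_stack_fault(fault_addr, stack_low, stack_high, rlimit_stack):
    """Toy decision for a fault at or below a grows-down stack mapping.

    stack_low / stack_high bound the current mapping; rlimit_stack is the
    growth limit in bytes (what `ulimit -s` sets, there in KiB).
    Returns the new lowest mapped address if the stack may grow, or None
    where the real kernel would deliver SIGSEGV.
    """
    if fault_addr >= stack_low:
        return stack_low                         # already mapped, nothing to extend
    new_low = fault_addr - (fault_addr % PAGE)   # round down to a page boundary
    if stack_high - new_low > rlimit_stack:
        return None                              # growth would exceed the limit
    return new_low
```

With an 8 MiB limit, a fault just below the current mapping extends it by one page, while a fault more than 8 MiB below stack_high returns None, which corresponds to the invalid-fault case.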
If a program used alloca or a C99 VLA with an unchecked size, malicious input could make it jump over any guard pages and into some other read/write mapping such as .data or stuff that's dynamically allocated.
To harden buggy code against that so it segfaults instead of actually allowing a stack clash attack, there are compiler options that make it touch every intervening page as the stack grows, so it's certain to set off the "tripwire" in the form of an unmapped guard page below the stack-growth limit. See Linux process stack overrun by local variables (stack guarding).
If you set ulimit -s unlimited, you could perhaps grow the stack into some other mapping, but only if Linux truly does allow unlimited growth in that case without reserving a guard page as you approach another mapping.

Checking a process' stack usage in Linux

I am using version 3.12.10 of Linux. I am writing a simple module that loops through the task list and checks the stack usage of each process to see if any are in danger of overflowing the stack. To get the stack limit of the process I use:
tsk->signal->rlim[ RLIMIT_STACK ].rlim_cur
To get the memory address for the start of the stack I use:
tsk->mm->start_stack
I then subtract from it the result of this macro:
KSTK_ESP( tsk )
Most of the time this seems to work just fine, but on occasion I see a situation where a process uses more than its stack limit (usually 8 MB), yet the process continues to run and Linux itself does not report any sort of issue.
My question is, am I using the right variables to check this stack usage?
After doing more research, I think I have realized that this is not a good way of determining how much stack was used. The problem arises when the kernel allocates more pages of memory to the stack for that process. Those pages may not be contiguous with the other pages, so the current stack pointer may hold a value that makes the calculation invalid.
The value in task->mm->stack_vm can be used to determine how much space was actually allocated to a process' stack. This is not as accurate as how much is actually used, but for my use, good enough.
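For a quick cross-check from user space, /proc/<pid>/status has a VmStk line giving the stack size in kB, which as far as I know comes from the same stack_vm counter. A minimal Python sketch for reading it follows; the function name parse_vmstk_kib is my own, and it takes the file contents as a string so it can be tried on sample text.

```python
def parse_vmstk_kib(status_text):
    """Return the VmStk value (in kB) from /proc/<pid>/status text.

    Returns None if there is no VmStk line, which is what you get for
    kernel threads since they have no user address space.
    """
    for line in status_text.splitlines():
        if line.startswith("VmStk:"):
            fields = line.split()   # e.g. ['VmStk:', '132', 'kB']
            return int(fields[1])
    return None
```

On a Linux machine you would pass it the contents of /proc/self/status or /proc/<pid>/status. Like stack_vm, this reports how much is mapped for the stack, not how much of it is currently in use.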

Some clarification on the TCB of an operating system

I'm a computer science undergraduate taking an operating systems course. For my assignment, I am required to implement a simple thread management system.
I'm in the process of creating a struct for a TCB. According to my lecture notes, what I could have in my TCB are:
registers,
program counter,
stack pointer,
thread ID and
process ID
Now according to my lecture notes, each thread should have its own stack. And my problem is this:
Just by storing the stack pointer, can I keep a unique stack per thread? If I did so, won't one thread's stack overwrite another's?
How can I prevent that? Limit the stack for each thread? Please tell me how this is usually done in a normal operating system.
Please help. Thanks in advance.
The OS may control stack growth by monitoring page faults from inaccessible pages located around the stack portion of the address space. This can help with detection of stack overflows by small amounts.
But if you move the stack pointer way outside the stack region of the address space and use it to access memory, you may step into the global variables or into the heap or the code or another thread's stack and corrupt whatever's there.
Threads run in the same address space for a reason: to share code and data with one another with minimal overhead. Their stacks usually aren't exempt from that sharing, so one thread's stack remains accessible to the others.
The OS is generally unable to do anything about preventing programs from stack overflows and corruptions and helping them to recover from those. The OS simply doesn't and can't know how an arbitrary program works and what it's supposed to do, hence it can't know when things start going wrong and what to do about them. The only thing the OS can do is just terminate a program that's doing something very wrong like trying to access inaccessible resources (memory, system registers, etc) or execute invalid or inaccessible instructions.
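For the assignment itself, one common approach is to give every thread its own fixed-size stack region and record the region's bounds in the TCB alongside the saved stack pointer. A rough Python sketch of that bookkeeping is below. All names and the 64 KiB stack size are assumptions for illustration; in a real C implementation each region would come from an allocator and the fields would live in a struct.

```python
from dataclasses import dataclass, field

STACK_SIZE = 64 * 1024  # assumed fixed per-thread stack size in bytes


@dataclass
class TCB:
    """Hypothetical thread control block for a simple thread library."""
    thread_id: int
    process_id: int
    stack_base: int               # lowest address of this thread's region
    stack_size: int = STACK_SIZE
    stack_pointer: int = 0        # saved SP; set to the region's top on creation
    program_counter: int = 0
    registers: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.stack_pointer == 0:
            # The stack grows down, so a new thread starts at the top.
            self.stack_pointer = self.stack_base + self.stack_size

    def sp_in_bounds(self):
        """True if the saved SP still lies inside this thread's own region."""
        return self.stack_base <= self.stack_pointer <= self.stack_base + self.stack_size


def create_threads(count, first_stack_base, process_id=1):
    """Give each thread its own non-overlapping stack region."""
    return [
        TCB(thread_id=i, process_id=process_id,
            stack_base=first_stack_base + i * STACK_SIZE)
        for i in range(count)
    ]
```

A scheduler could call sp_in_bounds() at each context switch. As the answer says, that only detects an overflow after the fact and does nothing to stop a thread from writing into another thread's region.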

OS X, gcc, x86, segmentation, paging, seg fault, bus error

In the case of osx, gcc, modern x86:
How is the x86 segmentation h/w and paging h/w used?
For the most part[1], the segmentation hardware isn't used. Most current OSes set CS, DS, SS, and ES to all point to all memory (base address of 0, limit of 4 GiB). Each is set to allow full access to all that memory (CS->execute, DS, ES, SS->read/write).
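To see why a base of 0 and a 4 GiB limit effectively switches segmentation off, here is a simplified model of the translation step in Python. The function name is hypothetical, and it ignores descriptor types, privilege levels and expand-down segments.

```python
FOUR_GIB = 1 << 32


def linear_address(seg_base, seg_limit, offset):
    """Translate a segment-relative offset to a 32-bit linear address.

    Raises ValueError where the CPU would raise a protection fault.
    Simplified model: only the base addition and the limit check.
    """
    if offset > seg_limit:
        raise ValueError("offset beyond segment limit")
    return (seg_base + offset) % FOUR_GIB   # 32-bit wraparound
```

With seg_base = 0 and seg_limit = 0xFFFFFFFF, the result always equals the offset and the limit check can never fail, so the only protection left is whatever the paging unit enforces.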
That means nearly all real access control is done with the paging unit. The basic idea is that pages accessible by a particular process are mapped to that process. Pages that are in virtual memory are mapped, but marked not present, so attempting to read/write them will cause an exception; the OS reads the data from the paging file into RAM, marks the data as present, and re-starts the instruction.
As far as how pages are marked, most executable code will be marked read-only, and will be shared between processes. Most data and stack will be marked read/write and will not be shared. Depending on the exact system, stack space will usually have the NX bit set to prevent it from being executed.
There are a few other bits and pieces that are a bit different. For example, most OSes (including OS X, if memory serves) set up a stack guard page -- a page at the top of the stack that allows no access. When/if you try to access it, the OS catches an exception, allocates another page of stack space, and re-starts the instruction. This means you can reserve (say) 4 megabytes of address space for the stack, but only allocate actual RAM for roughly the space that's been used (obviously in page-sized increments).
The hardware also supports "large" (4 megabyte) pages. These are used primarily for mapping large chunks of contiguous memory like the part of the memory on the graphics card that's directly visible to the CPU.
That's only a very high-level view, but it's hard to provide more detail without knowing what you care about. Trying to cover all the use of paging by an entire OS could occupy an entire (large) book.
[1] Windows (unlike most other systems) does make a minimal use of segmentation -- it sets up FS as a pointer to a Thread Information Block (TIB), which gives access to some basic information about the current thread. This is useful (and used) particularly by Windows' Structured Exception Handling (and Vectored Exception Handling).

Resources