Call stack questions - Windows

I have been reading up on the call stack lately. However, all the examples and articles I have read have been single-threaded. I am interested in what the call stack looks like in memory and how we can analyse it.
Sorry for including so many questions in one post, but it seems messy to create one post per question when they are all related.
My questions here are for Windows x86.
So the questions I am having difficulty with are:
1) Is there always one call stack for each thread in a process? I.e., threads do not share call stacks?
2) Is the size of each call stack fixed? Or can it be different for each thread?
3) Let's pretend that we are doing everything ourselves and writing our program in assembly. Is the call stack magically given to us, or do we have to implement it ourselves?
4) If we write our program in assembly, do we then reserve some memory and load its start address into ESP in order to set up the call stack?

1) Each thread has its own stack - almost by definition.
2) The default stack size (reserve and initial commit) is specified in the executable's PE header and applies to every thread that doesn't ask for something else. The stack size of an individual thread is a thread creation parameter: see the dwStackSize parameter of the CreateThread() API. So the size can differ from thread to thread.
3) The OS manages all memory. The stack for a new thread is dynamically allocated by the kernel when the thread is created, and the top of the stack is filled in with a stack frame that, amongst other things, allows the thread to begin execution by popping the frame, in a similar manner to an interrupt return. Don't try this at home.
4) NO! Import and call the CreateThread() API.
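To make answers 1) and 2) concrete, here is a small Python sketch. It is not Windows-specific; it only relies on CPython threads being native OS threads, each with its own stack. threading.stack_size() sets the stack size for threads created afterwards (roughly analogous to the dwStackSize parameter of CreateThread()). The helper names (stack_depth, worker, start_thread_at_depth) are hypothetical, and frame counting via inspect.currentframe() assumes CPython.

```python
import inspect
import threading


def stack_depth() -> int:
    """Count the Python frames on the *calling thread's* call stack (CPython)."""
    frame = inspect.currentframe()
    depth = 0
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth


def worker(result: dict, key: str) -> None:
    # Runs on a new thread, whose call stack starts fresh at the thread entry point.
    result[key] = stack_depth()


def start_thread_at_depth(levels: int, result: dict, key: str) -> None:
    """Recurse `levels` calls deep on the current thread, then start and join a worker."""
    if levels > 0:
        start_thread_at_depth(levels - 1, result, key)
        return
    t = threading.Thread(target=worker, args=(result, key))
    t.start()
    t.join()


if __name__ == "__main__":
    # Threads created from now on get a 512 KiB stack: a per-thread setting.
    threading.stack_size(512 * 1024)
    depths = {}
    start_thread_at_depth(0, depths, "shallow")
    start_thread_at_depth(50, depths, "deep")
    print(depths)  # both values are equal
```

However deep the creating thread is when it starts the worker, the worker sees the same small stack depth: the creator's frames live on a different stack and are not shared.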

Related

How to measure the amount of memory consumed by the stack?

With Lauterbach TRACE32, how to measure the amount of memory consumed by the stack when the application is running?
I am using AUTOSAR OS on a PowerPC CPU.
If you want to know the memory consumed by the stacks of the tasks, I think the easiest way is to open the TASK.STacK.view window.
Ensure the following to get TASK.STacK.view working:
Set up the debugger's OS awareness (after loading your ELF), with TASK.ORTI for AUTOSAR or TASK.CONFIG for any other target OS.
Initialize the stacks of your tasks with a magic pattern, which can be done in the start-up code of your OS or with the TRACE32 Data.Set command.
If the OS awareness does not detect it, declare the magic initialization pattern to the debugger with the command TASK.STacK.PATtern.
If you want to know the memory consumed by the stack of a bare-metal application, you have to check it with a PRACTICE script.
The basic idea is to initialize the stack with a fixed pattern (before the application starts) and then check later what percentage of the stack no longer contains the initialization pattern.
You can do this in the following three steps:
First, initialize your stack with a magic pattern after loading your ELF, like this (or do this in the start-up code of your application):
GLOBAL &lowAddr &highAddr &magicPattern
&lowAddr=ADDRESS.OFFSET(__stack_start) // lower border of the address range occupied by your stack
&highAddr=ADDRESS.OFFSET(__stack_end)-1 // upper border (last byte) of the address range occupied by your stack
&magicPattern=0xCCCCCCCC // any 32-bit value that is unlikely to appear on the used part of the stack
Data.Set &lowAddr--&highAddr %Long &magicPattern // initialize the stack
Second, create a script (stackcheck.cmm) that checks the stack usage, like this:
PRIVATE &lowAddr &highAddr &pattern &addr
ENTRY &lowAddr &highAddr &pattern
IF ("&lowAddr"=="")||("&highAddr"=="")||("&pattern"=="")
(
PRINT %ERROR "At least one of the 3 required parameters is missing"
ENDDO
)
VAR.NEWGLOBAL int \stacksize=0
VAR.NEWGLOBAL int \stackusage=0
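&addr=&lowAddr
// The stack grows down, so the unused part (still holding the pattern) is at the low end:
// scan upward until the first word that no longer contains the pattern.
WHILE (Data.LONG(D:&addr)==&pattern)&&(&addr<&highAddr)
(
&addr=&addr+4
)
// total stack size in bytes, and bytes from the first modified word up to the top of the stack
Var.Set %SPaces \stacksize = (&highAddr - &lowAddr) + 1
Var.Set %SPaces \stackusage = (&highAddr - &addr) + 1
Var.View \stacksize \stackusage (100*\stackusage)/\stacksize // show size, usage and usage in percent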
ENDDO
(You might want to optimize the way to search through the address-range of the stack.)
Finally call the script to detect the current stack usage like this:
DO stackcheck.cmm &lowAddr &highAddr &magicPattern
If you want to check the stack usage frequently, you might want to use MENU.AddTool to create a button for this in the TRACE32 toolbar.
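The same paint-and-scan idea can be modeled in a few lines of Python. This is a toy simulation, not TRACE32 code: the stack region is a list of 32-bit words, the "application" simply writes words from the top down, and the function names (paint, run_app, stack_usage) are hypothetical.

```python
from __future__ import annotations

MAGIC = 0xCCCCCCCC  # same role as &magicPattern in the script above


def paint(stack: list[int], pattern: int = MAGIC) -> None:
    """Step 1: fill every word of the stack region with the pattern."""
    stack[:] = [pattern] * len(stack)


def run_app(stack: list[int], frames: list[int]) -> None:
    """Stand-in for the application: a descending stack is written from the top down."""
    for depth, value in enumerate(frames):
        stack[len(stack) - 1 - depth] = value


def stack_usage(stack: list[int], pattern: int = MAGIC) -> tuple[int, int]:
    """Step 2: scan up from the low end while the pattern is intact (like stackcheck.cmm).

    Returns (used words, usage in percent, rounded down).
    """
    idx = 0
    while idx < len(stack) and stack[idx] == pattern:
        idx += 1
    used = len(stack) - idx
    return used, 100 * used // len(stack)


if __name__ == "__main__":
    stack = [0] * 64
    paint(stack)
    run_app(stack, list(range(40)))  # deepest call chain touches 40 words
    run_app(stack, list(range(8)))   # later, shallower calls don't erase the mark
    print(stack_usage(stack))        # (40, 62)
```

Note that this measures a high-water mark: once a word is overwritten it stays "used", even after the stack shrinks again. If the lowest used word happens to equal the pattern, the scan undercounts slightly, which is why the pattern should be a value unlikely to appear on the stack.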

How is the kernel stack used in the different processor modes of the ARM architecture?

As I understand it, every process has a user stack and a kernel stack. Apart from that, there is a stack for every mode in the ARM architecture. So I want to know how the different stacks and stack pointers work in the ARM modes. Also, when will the kernel stack associated with the process be used?
... when will the kernel stack associated with the process be used?
When you make a system call. For example, if you want to get the IP address of an interface, the kernel, just like any other program, needs some stack space to prepare what you asked for. So there is a corresponding kernel stack that is used when execution switches to the kernel side of a system call.
How do the different stacks and stack pointers work in the ARM modes?
ARM defines several hardware modes to handle different inputs to the system. For example, a program can unexpectedly execute an illegal (undefined) instruction. In this case the CPU switches into a different mode (Undefined mode) and needs to be told how to proceed. Since most of the time you need some stack space to handle this gracefully, you need a separate stack for this mode. ARM provides a separate (banked) stack pointer register for each of these modes, so when you switch to a different hardware mode you don't overwrite the previous mode's stack pointer.
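The banked stack pointer can be illustrated with a toy Python model (not an emulator; class and function names are hypothetical). It assumes the classic ARM A/R-profile mode set, where User and System mode share one SP and each exception mode (SVC, IRQ, FIQ, Abort, Undefined) has its own. Changing mode only changes which SP register the code sees.

```python
from dataclasses import dataclass, field

# Which SP bank each mode uses (classic ARM A/R profile): User and System share one.
SP_BANK = {"usr": "usr", "sys": "usr", "svc": "svc", "irq": "irq",
           "fiq": "fiq", "abt": "abt", "und": "und"}


@dataclass
class ToyArmCpu:
    mode: str = "svc"  # classic ARM cores come out of reset in Supervisor mode
    banked_sp: dict = field(default_factory=dict)

    @property
    def sp(self) -> int:
        """R13 as seen by the current mode."""
        return self.banked_sp[SP_BANK[self.mode]]

    @sp.setter
    def sp(self, value: int) -> None:
        self.banked_sp[SP_BANK[self.mode]] = value

    def push(self, nbytes: int) -> None:
        """Full-descending stack: SP moves toward lower addresses."""
        self.sp -= nbytes

    def switch_mode(self, mode: str) -> None:
        """Only selects a different SP bank; no stack pointer is overwritten."""
        self.mode = mode


def boot(cpu: ToyArmCpu, stack_tops: dict) -> None:
    """Like early startup code: enter each mode once and load its SP, then drop to User."""
    for mode, top in stack_tops.items():
        cpu.switch_mode(mode)
        cpu.sp = top
    cpu.switch_mode("usr")


if __name__ == "__main__":
    cpu = ToyArmCpu()
    boot(cpu, {"svc": 0x8000, "irq": 0x7000, "und": 0x6000, "sys": 0x5000})
    cpu.push(16)            # user code uses its own stack
    cpu.switch_mode("und")  # e.g. an undefined instruction traps
    cpu.push(32)            # the handler uses the Undefined-mode stack
    cpu.switch_mode("usr")
    print(hex(cpu.sp))      # user SP is untouched by the handler
```

The User/System SP is set from System mode in the boot loop because, on real hardware, User mode cannot switch itself into a privileged mode.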
The kernel stack is not associated with any particular process; it is used by the kernel to keep track of its own functions and of the system calls invoked by processes. Since a system call manipulates kernel data structures, its stack cannot be kept on the process stack; otherwise the process could access the kernel's private data structures, which would be harmful to the kernel.

get_user_pages -EFAULT error caused by VM_GROWSDOWN flag not set

I'm continuing my work on the FPGA driver.
Now I'm adding OpenCL support, so I have the following test.
It just adds NUM_OF_EXEC write and read requests for the same buffers and then waits for completion.
Each write/read request is serialized in the driver and executed sequentially as a DMA transaction. The DMA-related code can be viewed here.
So the driver takes a transaction, executes it (rsp_setup_dma and fpga_push_data_to_device), waits for an interrupt from the FPGA (fpga_int_handler), releases resources (fpga_finish_dma_write) and begins a new one. When NUM_OF_EXEC equals 1, everything seems to work, but if I increase it, the problem appears: at some point get_user_pages (in rsp_setup_dma) returns -EFAULT. Debugging the kernel, I found that the allocated vma doesn't have the VM_GROWSDOWN flag set (in find_extend_vma in mmap.c). But at this point I'm stuck, because I'm neither sure I understand why this flag is needed nor have any idea why it is not set. Why can get_user_pages fail with the above symptoms? How can I debug this?
On some architectures the stack grows up and on others it grows down. See hppa and hppa64 for the weirdos that created the need for such a flag.
So whenever you have to deal with setting up the stack for a kernel thread or process, you'll have to provide the direction in which the stack grows as well.
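Where the flag matters can be seen in a simplified Python model of find_extend_vma() as it looked in older kernels (details differ between versions, and on grows-up architectures such as parisc the check uses VM_GROWSUP instead). Everything here is a toy: the flag value, the Vma class and the stack expansion (no rlimit or guard-gap checks) are illustrative assumptions, not kernel code.

```python
from __future__ import annotations

from dataclasses import dataclass

PAGE_SIZE = 4096
VM_GROWSDOWN = 1 << 0  # illustrative flag value for this model only


@dataclass
class Vma:
    start: int  # inclusive
    end: int    # exclusive
    flags: int = 0


def find_vma(vmas: list[Vma], addr: int) -> Vma | None:
    """First VMA (sorted by start) whose end lies above addr, like the kernel's find_vma()."""
    for vma in vmas:
        if addr < vma.end:
            return vma
    return None


def find_extend_vma(vmas: list[Vma], addr: int) -> Vma | None:
    vma = find_vma(vmas, addr)
    if vma is None:
        return None                      # addr is above every mapping
    if vma.start <= addr:
        return vma                       # addr is inside an existing mapping
    if not vma.flags & VM_GROWSDOWN:
        return None                      # addr is in a gap: get_user_pages() fails with -EFAULT
    vma.start = addr & ~(PAGE_SIZE - 1)  # simplified expand_stack(): grow down to addr's page
    return vma
```

In this model the VM_GROWSDOWN check is only reached when the address is not inside any mapping, so a missing flag is a symptom rather than the cause: it means the user address handed to get_user_pages() was not mapped at that moment (for example, because the buffer was already unmapped or freed), and the kernel merely checked whether the next mapping was a stack it could grow.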

Thread stack or Thread call stack

From what I have read on MSDN:
Each new thread or fiber receives its own stack space consisting of both reserved and initially committed memory.
Does the word 'stack' here really mean a 'call stack' or does it mean that it gets piece of memory that is called a stack?
The call stack lives on the stack. Each thread or fiber has its own private stack and that's what the topic you link to is discussing.
That is referring to the call stack - each thread/fiber needs its own to function. Is there a reason that you think it wouldn't be the call stack?

Will allocating a buffer larger than a page on the stack corrupt memory?

In Windows, the stack is implemented as follows: the committed stack pages are followed by a page whose protection is marked as a guard page (PAGE_GUARD). When a thread references an address on the guard page, a memory fault is raised, which makes the memory manager commit the guard page to the stack and clear its guard flag; it then sets up the next page as the new guard page.
However, when I allocate a buffer larger than one page (4 KB), the expected error doesn't happen. Why?
Excellent question (+1).
There's a trick, and few people know about it (besides driver writers).
When you allocate a large buffer on the stack, the compiler automatically adds so-called stack probes. This is extra code (usually implemented in the CRT) that probes the allocated region page by page, in the required order, so that each guard page is touched before the page below it.
EDIT:
The function is _chkstk.
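Why the probing order matters can be shown with a toy Python model of a downward-growing stack with one guard page. This is only an illustration of the idea, not how _chkstk or the Windows memory manager is implemented; the class, exceptions and allocation functions are hypothetical, and the overflow rule is simplified.

```python
PAGE = 4096


class AccessViolation(Exception):
    """Touched a page that is reserved but not committed."""


class ToyStack:
    """Toy thread stack growing down from its top.

    Offsets are bytes below the top of the stack; page 0 is the highest page.
    """

    def __init__(self, reserve_pages: int, commit_pages: int = 1) -> None:
        self.reserve_pages = reserve_pages
        self.committed = set(range(commit_pages))
        self.guard = commit_pages  # the guard page sits just below the committed pages

    def touch(self, offset: int) -> None:
        page = offset // PAGE
        if page in self.committed:
            return
        if page == self.guard:
            if page + 1 >= self.reserve_pages:
                raise OverflowError("stack overflow: no room for a new guard page")
            # Guard-page fault handled by the OS: commit it and move the guard down.
            self.committed.add(page)
            self.guard = page + 1
            return
        raise AccessViolation(f"page {page} is reserved but not committed")


def allocate_unprobed(stack: ToyStack, sp: int, size: int) -> int:
    """Grow the frame by `size` bytes and write to its far (lowest-address) end."""
    new_sp = sp + size
    stack.touch(new_sp - 1)
    return new_sp


def allocate_probed(stack: ToyStack, sp: int, size: int) -> int:
    """What a stack probe achieves: touch one byte per page, top down, then the far end."""
    new_sp = sp + size
    offset = sp
    while offset + PAGE < new_sp:
        offset += PAGE
        stack.touch(offset)  # each step lands exactly one page lower
    stack.touch(new_sp - 1)
    return new_sp
```

With a 16-page reservation and only the top page committed, allocate_unprobed(ToyStack(16), 100, 3 * PAGE) jumps past the guard page and raises AccessViolation, while allocate_probed commits pages 1, 2 and 3 one at a time and succeeds. Allocations smaller than a page never skip the guard page, which is why the compiler only needs probes for large frames.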
The fault doesn't reach your program; it is handled by the operating system. A similar thing happens when your program tries to read memory that has been paged out to the swap file: a trap occurs, the operating system pages the memory back in, and your program continues.
