In the clone(2) man page, for the child stack, it is mentioned that:
Since the child and calling process may share memory, it is not possible for the child process to execute in the same stack as the calling process.
Can anybody please explain how "sharing memory", specifically, makes it impossible? OTOH, a common perception is that the sequence of function calls in one thread will differ from the others, so each thread needs its own stack.
Thanks,
Kapil
Two threads can't use the same stack. They'd just mess it up for each other, and soon crash.
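When using fork, there's no memory sharing. Both processes have the same value of the stack pointer, but it refers to separate physical memory: the child's pages are copy-on-write copies of the parent's, so as soon as either process writes to its stack, it writes to its own private page.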
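A minimal sketch of the fork case, assuming Linux/glibc; the function name fork_stack_demo is hypothetical. The child writes to a stack variable at the same virtual address as the parent's, yet the parent's copy is unaffected:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's view of a stack variable after the
   child has modified its own copy; *child_value receives the child's view. */
int fork_stack_demo(int *child_value) {
    volatile int on_stack = 1;   /* same virtual address in both processes */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        on_stack = 2;            /* touches the child's private copy only */
        _exit(on_stack);         /* report the child's view via the exit status */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    *child_value = WEXITSTATUS(status);
    return on_stack;             /* still 1 in the parent */
}
```

Calling it should give 1 for the parent and 2 for the child.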
When using pthread_create, a new stack pointer is chosen for the new thread, separate from the parent. This way they don't corrupt each other's stack.
clone is a low-level function, somewhere between the two. It can keep memory shared (with CLONE_VM), so the threads must not share the stack. But unlike pthread_create, the new stack pointer is determined by the caller, who may choose it as they wish. The sentence you quote warns that you should choose it with care.
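A minimal sketch of choosing the child stack by hand, assuming Linux with glibc's clone() wrapper and a downward-growing stack (as on x86); run_clone_demo and the 64 KiB size are illustrative choices. Because CLONE_VM shares memory, the child gets its own mmap'd stack, and its write to the parent's variable is visible once waitpid returns:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* illustrative size */

static int child_fn(void *arg) {
    int *shared = arg;
    *shared = 42;   /* visible to the parent because of CLONE_VM */
    return 0;
}

/* Hypothetical demo: returns the value the child wrote into shared memory. */
int run_clone_demo(void) {
    int shared_value = 0;
    /* A separate region for the child's stack: it must not run on ours. */
    char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;
    /* The stack grows down, so pass the highest address of the region. */
    char *stack_top = stack + CHILD_STACK_SIZE;
    pid_t pid = clone(child_fn, stack_top, CLONE_VM | SIGCHLD, &shared_value);
    if (pid == -1) {
        munmap(stack, CHILD_STACK_SIZE);
        return -1;
    }
    waitpid(pid, NULL, 0);   /* SIGCHLD as exit signal lets plain waitpid work */
    munmap(stack, CHILD_STACK_SIZE);
    return shared_value;
}
```

If stack_top were instead the parent's current stack pointer, both tasks would push frames into the same memory, which is exactly what the man page warns about.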
I have a heavily multi-threaded application under Linux consuming lots of memory, and I am trying to categorize its RSS. I find it particularly challenging to estimate the total RSS of all thread stacks in the program. I had the following ideas:
Idea 1: look into /proc/<pid>/smaps and consider the mappings for stacks. It includes the resident size of each mapping, but only the main thread's mapping is annotated as [stack]; the rest are indistinguishable from other regular 8 MiB mappings (with the default stack size). Also, reading /proc/<pid>/smaps is pretty expensive, as it causes contention on the kernel's internal VMA data structures.
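A minimal sketch of the lookup in Idea 1, assuming Linux; smaps_rss_kb_containing is a hypothetical helper that returns the Rss (in kB) of whichever mapping contains a given address, for example a thread's stack pointer. It pays the full smaps cost mentioned above on every call:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: Rss in kB of the mapping in /proc/self/smaps that
   contains addr, or -1 if no mapping (or no Rss line) was found. */
long smaps_rss_kb_containing(uintptr_t addr) {
    FILE *f = fopen("/proc/self/smaps", "r");
    if (!f)
        return -1;
    char line[512];
    int in_target = 0;
    long rss = -1;
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            /* Header line of a new mapping. */
            if (in_target)
                break;                      /* left the target without an Rss line */
            in_target = addr >= start && addr < end;
        } else if (in_target && sscanf(line, "Rss: %ld kB", &rss) == 1) {
            break;                          /* found the target's Rss */
        }
    }
    fclose(f);
    return rss;
}
```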
Idea 2: look into /proc/<tid>/status. There is a VmStk field which should describe the stack's resident size, but it always shows the stack size of the main thread. It looks pretty clear why: the main thread is the only one whose stack the kernel allocates itself, while the other threads get their stacks from the pthreads code, which allocates them as regular memory mappings.
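To observe this, a small sketch (Linux only; vmstk_kb is a hypothetical name) that reads VmStk for one thread through /proc/self/task/<tid>/status. Calling it with different thread IDs of the same process should return the same number each time:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: VmStk (kB) as reported for thread `tid` of this
   process, or -1 if the field cannot be read. */
long vmstk_kb(pid_t tid) {
    char path[64];
    snprintf(path, sizeof path, "/proc/self/task/%d/status", (int)tid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "VmStk: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}
```

For the main thread, the thread ID equals the process ID, so vmstk_kb(getpid()) reads the main thread's entry.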
Idea 3: traverse threads from user space using some pthreads facilities, retrieve the stack mapping address and stack size for each thread, and then find out how many pages are resident using mincore(2). As a possible optimization, we could skip calling mincore for sleeping threads and use a cached value for them. Unfortunately, I did not find any suitable way to iterate over pthread_t structures. Note that some of the threads come from libraries which I cannot control, so maintaining any kind of thread registry by registering threads at startup is not possible.
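Idea 3 can at least be sketched for threads whose pthread_t you hold, or for the calling thread itself. This assumes glibc, since pthread_getattr_np is a non-portable GNU extension; resident_stack_bytes and the probe helpers are hypothetical names. It does not solve the enumeration problem for threads created by foreign libraries:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: resident bytes in the stack of thread `t`, or -1.
   For the main thread the reported range may include not-yet-mapped pages,
   in which case mincore() fails with ENOMEM and -1 is returned. */
long resident_stack_bytes(pthread_t t) {
    pthread_attr_t attr;
    void *addr;
    size_t size;
    if (pthread_getattr_np(t, &attr) != 0)
        return -1;
    int rc = pthread_attr_getstack(&attr, &addr, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* mincore() needs a page-aligned start address. */
    uintptr_t lo = (uintptr_t)addr & ~(uintptr_t)(page - 1);
    uintptr_t hi = ((uintptr_t)addr + size + page - 1) & ~(uintptr_t)(page - 1);
    size_t npages = (hi - lo) / page;
    unsigned char *vec = malloc(npages);
    if (!vec)
        return -1;
    long resident = -1;
    if (mincore((void *)lo, hi - lo, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            if (vec[i] & 1)            /* low bit set: page is resident */
                resident += (long)page;
    }
    free(vec);
    return resident;
}

static void *probe_self(void *out) {
    *(long *)out = resident_stack_bytes(pthread_self());
    return NULL;
}

/* Hypothetical demo: measure a freshly created thread's own stack from inside it. */
long resident_stack_bytes_in_new_thread(void) {
    long result = -1;
    pthread_t t;
    if (pthread_create(&t, NULL, probe_self, &result) != 0)
        return -1;
    pthread_join(t, NULL);
    return result;
}
```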
Idea 4: use ptrace(2) to retrieve thread registers, retrieve the stack pointers from them, then proceed with Idea 1. This approach looks excessively hard and intrusive.
Can anybody suggest a more or less intended way to do this? Being non-portable is OK.
Two more ideas I got after some extra research:
Idea 5: from man 5 proc on /proc/<pid>/maps:
There are additional helpful pseudo-paths:
[stack]
The initial process's (also known as the main thread's) stack.
[stack:<tid>] (since Linux 3.4)
A thread's stack (where the <tid> is a thread ID). It corresponds to the /proc/[pid]/task/[tid]/ path.
It looks intriguing, but it seems this logic was reverted because its implementation was inefficient: https://lore.kernel.org/patchwork/patch/716239/. The man page seems obsolete (at least on my Ubuntu Disco 19.04).
Idea 6: This one may actually work. There is a /proc/<tid>/syscall file which exposes the stack pointer register of a blocked thread. Since most of my threads are sleeping on I/O, this lets me track their rsp values, which I can project onto /proc/<pid>/maps to find which stack mapping belongs to which thread. After that I can implement Idea 3.
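A sketch of the parsing step in Idea 6, assuming Linux and a 64-bit build. The field layout follows the proc(5) description of /proc/[pid]/syscall: a thread blocked in a syscall shows the syscall number, six argument registers, then the stack pointer and program counter; a thread blocked outside a syscall shows -1 followed by just the stack pointer and program counter; a running thread shows only "running". The function names are hypothetical, and reading another thread's file requires ptrace-level access to it:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: extract the stack pointer from one line of
   /proc/<pid>/task/<tid>/syscall. Returns 0 on success, -1 if the thread is
   running or the line has an unexpected shape. */
int parse_syscall_sp(const char *line, unsigned long long *sp) {
    char buf[512];
    char *fields[10];
    int n = 0;
    char *save = NULL;

    if (strncmp(line, "running", 7) == 0)
        return -1;                 /* not blocked: no register snapshot */
    snprintf(buf, sizeof buf, "%s", line);
    for (char *tok = strtok_r(buf, " \n", &save); tok && n < 10;
         tok = strtok_r(NULL, " \n", &save))
        fields[n++] = tok;

    if (n == 3)                    /* "-1 sp pc": blocked outside a syscall */
        *sp = strtoull(fields[1], NULL, 0);
    else if (n == 9)               /* "nr a1..a6 sp pc": blocked in a syscall */
        *sp = strtoull(fields[7], NULL, 0);
    else
        return -1;
    return 0;
}

/* Hypothetical helper: read and parse the file for one thread. */
int read_thread_sp(pid_t pid, pid_t tid, unsigned long long *sp) {
    char path[64], line[512];
    snprintf(path, sizeof path, "/proc/%d/task/%d/syscall", (int)pid, (int)tid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int rc = fgets(line, sizeof line, f) ? parse_syscall_sp(line, sp) : -1;
    fclose(f);
    return rc;
}
```

The returned value can then be matched against the start-end ranges in /proc/<pid>/maps to identify the thread's stack mapping.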
I'm doing a project where I need full control over the address space of the process. I need to move the thread's stack away from where it currently is to a predefined area chosen by me, because I need to deallocate the original stack memory. I couldn't find anything on how to do this, only on how to deal with the stack size, which is not what I need. I have two ideas for how to do this, neither of them ideal:
Set ESP and EBP to my predefined area and update the stack base and stack limit fields in the thread's TEB. This sounds like a bad idea, since it's hard to know whether there are other places I would have to update as well, let alone whether the kernel keeps internal bookkeeping about the stack's location.
Reserve memory everywhere to basically force a new thread's stack to be allocated in the space that I've left available. This is an awful idea, I know.
Is it at all possible to do something like this? It doesn't have to be the same thread.
Edit: Anything will do as long as I get to deallocate the original stack and decide where the new stack lives. So copying/moving the stack, killing the old thread and starting a new one with a stack at a predefined location, etc. should do just fine. I don't need the old thread; I just need a way to force a thread to run at a certain location (already solved) and have its stack in a safe location decided by me. So in that case it's fine to discard the old stack data, as I currently don't depend on it.
If you want to free the system-allocated stack, you are opening a can of worms. The problem is that you need to know the structure of all the stack frames above your thread routine. These frames could reference addresses on the stack, so deleting them could cause all kinds of problems.
You could create a thread with a 1-page stack and not deallocate it. Then, in your top-level thread routine, allocate your own block of memory and move its address into the stack pointer register.
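The question is about Windows, but for comparison here is a POSIX analogue of "start a new thread with a stack at a location I decide", assuming Linux/glibc. It is a swapped-in technique (pthread_attr_setstack), not the TEB approach discussed above. The region here comes from mmap, but it could be any suitably aligned memory you control; the function name and the 256 KiB size are hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/mman.h>

#define MY_STACK_SIZE (256 * 1024)   /* illustrative size */

static void *record_local_address(void *out) {
    volatile int local = 0;          /* lives on this thread's stack */
    *(uintptr_t *)out = (uintptr_t)&local;
    return NULL;
}

/* Hypothetical demo: returns 1 if the new thread ran on the region we chose,
   0 if it did not, -1 on error. */
int thread_runs_on_chosen_stack(void) {
    void *region = mmap(NULL, MY_STACK_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    /* Lowest address of the region plus its size; the library does not free it. */
    pthread_attr_setstack(&attr, region, MY_STACK_SIZE);
    uintptr_t seen = 0;
    pthread_t t;
    int created = pthread_create(&t, &attr, record_local_address, &seen) == 0;
    pthread_attr_destroy(&attr);
    if (created)
        pthread_join(t, NULL);
    munmap(region, MY_STACK_SIZE);   /* safe only after the thread has been joined */
    if (!created)
        return -1;
    return seen >= (uintptr_t)region && seen < (uintptr_t)region + MY_STACK_SIZE;
}
```

Because the caller owns the memory, it can be unmapped once the thread is joined, which is the "deallocate it myself" property the question asks for.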
I'm a computer undergraduate taking operating systems course. For my assignment, I am required to implement a simple thread management system.
I'm in the process of creating a struct for a TCB. According to my lecture notes, what I could have in my TCB are:
registers,
program counter,
stack pointer,
thread ID and
process ID
Now according to my lecture notes, each thread should have its own stack. And my problem is this:
Just by storing the stack pointer, can I keep a unique stack per thread? If I did so, won't one thread's stack overwrite another's?
How can I prevent that? Limit the stack size for each thread? Please tell me how this is usually done in a normal operating system.
Please help. Thanks in advance.
The OS may control stack growth by monitoring page faults from inaccessible pages located around the stack portion of the address space. This can help with detection of stack overflows by small amounts.
But if you move the stack pointer way outside the stack region of the address space and use it to access memory, you may step into the global variables or into the heap or the code or another thread's stack and corrupt whatever's there.
Threads run in the same address space for a reason: to share code and data with one another with minimal overhead. Their stacks are usually not excluded from that sharing; every thread can access them.
The OS is generally unable to prevent stack overflows and corruption in a program, or to help the program recover from them. The OS simply doesn't and can't know how an arbitrary program works and what it's supposed to do, hence it can't know when things start going wrong and what to do about them. The only thing the OS can do is terminate a program that's doing something very wrong, like trying to access inaccessible resources (memory, system registers, etc.) or execute invalid or inaccessible instructions.
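For the assignment's TCB question, one common arrangement, sketched here for a user-level thread library on Linux (struct tcb, tcb_alloc_stack and tcb_free_stack are hypothetical names), is to give each thread its own separately allocated stack region, record its base and size in the TCB next to the saved stack pointer, and leave an inaccessible guard page at the low end so that a small overflow faults instead of silently running into a neighbour, as described above:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

struct tcb {
    uint64_t regs[16];   /* hypothetical saved general-purpose registers */
    uintptr_t pc;        /* saved program counter */
    uintptr_t sp;        /* saved stack pointer, always inside [stack_base, stack_base + stack_size) */
    int tid, pid;
    void *stack_base;    /* lowest address of the mapping, guard page included */
    size_t stack_size;   /* total mapping size, guard page included */
};

/* Give `t` a private stack of at least `usable` bytes plus one guard page. */
int tcb_alloc_stack(struct tcb *t, size_t usable) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    usable = (usable + page - 1) & ~(page - 1);
    size_t total = usable + page;
    char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    /* The stack grows down, so the guard page goes at the lowest address. */
    if (mprotect(base, page, PROT_NONE) != 0) {
        munmap(base, total);
        return -1;
    }
    t->stack_base = base;
    t->stack_size = total;
    t->sp = (uintptr_t)(base + total);   /* empty stack: sp starts at the top */
    return 0;
}

void tcb_free_stack(struct tcb *t) {
    if (t->stack_base) {
        munmap(t->stack_base, t->stack_size);
        t->stack_base = NULL;
        t->stack_size = 0;
        t->sp = 0;
    }
}
```

Each call gets a distinct mapping, so two TCBs never share stack memory; the guard page only catches overflows that step into it, not a stack pointer that jumps far away.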
I have been playing with ptrace for a while. I followed some tutorials like this one or this one. So far, when I have a ptrace-d child process, I am able to:
Detect system calls and browse the registers.
Fetch the strings stored at addresses pointed to by the registers, thanks to the PTRACE_PEEKDATA option of ptrace.
Change the values of those registers and change memory values in the user space of the child process thanks to the PTRACE_POKEDATA option of ptrace.
My problem is the following: say I have just detected an open system call. I can modify the name of the file to be opened through the address stored in the ebx register. However, I wonder whether I can change the filename to anything I want, of any size. If the new name is really large (say, 50 times the original filename length), wouldn't I be overwriting memory I should not be writing to? Should I 'allocate' some memory in the child's memory space? If so, how would this be done?
Note that the child process is some program executed with execve, I cannot access its source code.
The pathname passed to open could be dynamically allocated by the program (so it's on the heap or stack somewhere), or it could be in the read-only section if it was a compile-time constant. In either case, you don't know what other parts of the program might be using it, so it's probably not a good idea to change its contents. You would definitely overwrite adjacent memory if you wrote past the current length (which would probably lead to subtle problems like corrupting heap metadata or other, unrelated allocated objects).
Here are some random ideas (totally untested) on how to allocate memory in a child process:
invoke an mmap syscall on its behalf (this would probably be pretty tricky), which would get you a page (or more) of memory to play with
allocate some space in the current stack (don't change the child's registers, but use your knowledge of which part of the stack the child is using to put temporary objects in the unused section). Technically it's legal for the child process to do this same thing (so you could end up corrupting that data), but it's very unlikely.
hide stuff at the far end of the stack (again assuming the child isn't also playing this trick).
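The stack idea can be sketched with two generic helpers that copy an arbitrary-length buffer into and out of a stopped tracee word by word with PTRACE_POKEDATA and PTRACE_PEEKDATA, preserving the tracee's bytes past the end of a partial last word. This assumes Linux; the helper names and the round-trip demo (which uses a forked, self-stopping child instead of a real execve'd program) are hypothetical. To use it for open, you would pick an address well below the child's stack pointer (on x86-64, beyond the 128-byte red zone), write the new path there, and point the path-argument register at it with PTRACE_SETREGS:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copy len bytes from src into the tracee at addr. */
static int poke_bytes(pid_t pid, uintptr_t addr, const void *src, size_t len) {
    const unsigned char *p = src;
    for (size_t done = 0; done < len; done += sizeof(long)) {
        long word = 0;
        size_t chunk = len - done < sizeof word ? len - done : sizeof word;
        if (chunk < sizeof word) {
            /* Partial last word: keep the tracee's bytes beyond the end. */
            errno = 0;
            word = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + done), NULL);
            if (word == -1 && errno != 0)
                return -1;
        }
        memcpy(&word, p + done, chunk);
        if (ptrace(PTRACE_POKEDATA, pid, (void *)(addr + done), (void *)word) == -1)
            return -1;
    }
    return 0;
}

/* Copy len bytes out of the tracee at addr into dst. */
static int peek_bytes(pid_t pid, uintptr_t addr, void *dst, size_t len) {
    unsigned char *p = dst;
    for (size_t done = 0; done < len; done += sizeof(long)) {
        size_t chunk = len - done < sizeof(long) ? len - done : sizeof(long);
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + done), NULL);
        if (word == -1 && errno != 0)
            return -1;
        memcpy(p + done, &word, chunk);
    }
    return 0;
}

static char target_buf[64];   /* same address in the parent and the forked child */

/* Hypothetical demo: write a string into a stopped child and read it back. */
int ptrace_roundtrip_demo(void) {
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);       /* stop so the parent can inspect us */
        _exit(0);
    }
    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    const char msg[] = "/tmp/a-much-longer-replacement-path";
    char back[sizeof msg];
    int ok = poke_bytes(child, (uintptr_t)target_buf, msg, sizeof msg) == 0 &&
             peek_bytes(child, (uintptr_t)target_buf, back, sizeof back) == 0 &&
             memcmp(msg, back, sizeof msg) == 0;
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return ok;
}
```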
I didn't think invoking malloc would be easy, but googling for 'ptrace child allocate memory' I found: http://www.hick.org/code/skape/papers/needle.txt (which finds the malloc routine used by the ELF dynamic linker and constructs a call out to there to allocate memory).
I know that there is no special difference between threads and processes in Linux, except that the cr3 register is left untouched on a thread switch, whereas a process switch reloads it and flushes the TLB.
Since the threads in a group share the same address space and the pgd (page table) is not changed, the whole memory layout is shared, and hence the stack space is also shared. But by the general definition, a thread owns its own stack. How is this achieved in Linux?
If thread A has its stack in the range x-y, then when the first page fault occurs the page table is updated; similarly, thread B, which uses the range u-v, would update the same page table. Hence it is possible for thread A to mess up thread B's stack.
I just want to get a clear picture of this; please help me out. Is this a safe implementation of threads?
That's correct, there is no OS-enforced protection of the stack memory between threads. One thread A can corrupt the stack of another thread B (if thread A knows where in memory to look).
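A small demonstration of that point, assuming POSIX threads on Linux (struct shared and cross_stack_write_demo are hypothetical names): thread B publishes the address of one of its local variables, and the main thread overwrites it through that pointer without any fault:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

struct shared {
    pthread_barrier_t published, written;
    volatile int *victim_local;   /* address of a variable on thread B's stack */
    int observed;
};

static void *thread_b(void *arg) {
    struct shared *s = arg;
    volatile int local = 1;                 /* lives on thread B's own stack */
    s->victim_local = &local;
    pthread_barrier_wait(&s->published);    /* let the main thread see the address */
    pthread_barrier_wait(&s->written);      /* wait until it has scribbled on it */
    s->observed = local;
    return NULL;
}

/* Hypothetical demo: returns the value thread B finds in its own local variable. */
int cross_stack_write_demo(void) {
    struct shared s = {0};
    pthread_barrier_init(&s.published, NULL, 2);
    pthread_barrier_init(&s.written, NULL, 2);
    pthread_t b;
    if (pthread_create(&b, NULL, thread_b, &s) != 0)
        return -1;
    pthread_barrier_wait(&s.published);
    *s.victim_local = 1234;                 /* write straight into B's stack frame */
    pthread_barrier_wait(&s.written);
    pthread_join(b, NULL);
    pthread_barrier_destroy(&s.published);
    pthread_barrier_destroy(&s.written);
    return s.observed;                      /* 1234, not the 1 that B stored */
}
```

Nothing in the page tables distinguishes B's stack pages from any other writable memory of the process, so the write simply succeeds.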