Question Purpose: Reality check on the MS docs of DllMain.
It is "common" knowledge that you shouldn't do too much in DllMain, there are definite things you must never do, some best practises.
I have now stumbled upon a new gem in the docs that makes little sense to me (emphasis mine):
When handling DLL_PROCESS_DETACH, a DLL should free resources such as
heap memory only if the DLL is being unloaded dynamically (the
lpReserved parameter is NULL). If the process is terminating (the
lpvReserved parameter is non-NULL), all threads in the process except
the current thread either have exited already or have been explicitly
terminated by a call to the ExitProcess function, which might leave
some process resources such as heaps in an inconsistent state. In this
case, it is not safe for the DLL to clean up the resources. Instead,
the DLL should allow the operating system to reclaim the memory.
Since global C++ objects are destroyed during DllMain/DETACH, this would imply that global C++ objects must not free any dynamic memory, because the heap may be in an inconsistent state - at least when the DLL is "linked statically" against the executable (and is therefore only unloaded at process termination).
That is certainly not what I see out there: the global C++ objects (if there are any) of various libraries (ours and third party) allocate and deallocate just fine in their destructors. (Barring other ordering bugs, of course.)
So, what specific technical problem is this warning aimed at?
Since the paragraph mentions thread termination, could there be a heap corruption problem when some threads are not cleaned up correctly?
The ExitProcess API in general does the following:
Enter the loader lock critical section.
Lock the main process heap (the one returned by GetProcessHeap()) via HeapLock(GetProcessHeap()) - in fact via RtlLockHeap. (This is a very important step for avoiding deadlock.)
Terminate all threads in the process, except the current one, by calling NtTerminateProcess(0, 0).
Call LdrShutdownProcess - inside this API the loader walks the loaded-module list and sends DLL_PROCESS_DETACH with lpvReserved non-NULL.
Finally, call NtTerminateProcess(NtCurrentProcess(), ExitCode), which terminates the process.
The problem here is that threads are terminated at arbitrary points. For example, a thread can be allocating or freeing memory from some heap and be inside that heap's critical section when it is terminated. As a result, if code running during DLL_PROCESS_DETACH tries to free a block from the same heap, it deadlocks when trying to enter that heap's critical section (assuming, of course, that the heap implementation uses one).
Note that this does not affect the main process heap, because HeapLock is called for it before all the other threads are terminated. The purpose of this is that the call waits until all other threads have left the process heap's critical section, and once the critical section has been acquired, no other thread can enter it - the main process heap is locked.
So, because the other threads are terminated only after the main heap has been locked, we can be sure that none of the killed threads was inside the main heap's critical section or left that heap's structures in an inconsistent state - thanks to the RtlLockHeap call. But this applies only to the main process heap. Any other heaps in the process are not locked, so they can be in an inconsistent state during DLL_PROCESS_DETACH, or can be exclusively owned by an already terminated thread.
So using HeapFree on GetProcessHeap() - or, say, LocalFree - is safe here (although this is not documented).
Using HeapFree on any other heap is not safe if DllMain is called during process termination.
Also, if you have custom data structures shared by several threads, they can be in an inconsistent state, because the other threads (which may have been using them) were terminated at an arbitrary point.
So this note warns that when the lpvReserved parameter is non-NULL (meaning DllMain is called during process termination), you need to be especially careful when cleaning up resources. In any case, all internal memory allocations will be freed by the operating system when the process dies.
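As an illustration (a minimal sketch, not from the original answer; FreeMyCaches and g_privateHeap are hypothetical names), a DllMain that follows this advice might look like:

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
    switch (fdwReason)
    {
    case DLL_PROCESS_DETACH:
        if (lpvReserved != NULL)
        {
            /* Process termination: other threads were killed at arbitrary
               points, so private heaps and shared structures may be in an
               inconsistent state. Skip cleanup; the OS reclaims the memory. */
            break;
        }
        /* Dynamic unload via FreeLibrary: normal cleanup is safe. */
        FreeMyCaches();              /* hypothetical cleanup helper */
        HeapDestroy(g_privateHeap);  /* hypothetical private heap */
        break;
    }
    return TRUE;
}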
As an addendum to RbMm's excellent answer, I'll add a quote from the ExitProcess documentation that does a much better job than the DllMain docs at explaining why heap operations (or any operations, really) can be compromised:
If one of the terminated threads in the process holds a lock and the
DLL detach code in one of the loaded DLLs attempts to acquire the same
lock, then calling ExitProcess results in a deadlock. In contrast, if
a process terminates by calling TerminateProcess, the DLLs that the
process is attached to are not notified of the process termination.
Therefore, if you do not know the state of all threads in your
process, it is better to call TerminateProcess than ExitProcess. Note
that returning from the main function of an application results in a
call to ExitProcess.
So it all boils down to this: if your application has "runaway" threads that may hold any lock, the (CRT) heap lock being a prominent example, you have a big problem during shutdown when you need to access the same structures (e.g. the heap) that your "runaway" threads are using.
Which just goes to show that you should shut down all your threads in a controlled way.
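As a minimal sketch of such a controlled shutdown (the event and worker handles are hypothetical; a real application would track all of its threads):

HANDLE g_hStopEvent;   /* manual-reset event: CreateEventW(NULL, TRUE, FALSE, NULL) */
HANDLE g_hWorker;      /* worker thread handle from CreateThread */

void ShutdownWorkers(void)
{
    SetEvent(g_hStopEvent);                    /* ask the worker to finish */
    WaitForSingleObject(g_hWorker, INFINITE);  /* join it before leaving main */
    CloseHandle(g_hWorker);
    CloseHandle(g_hStopEvent);
}

/* The worker loop checks the event periodically, e.g.
   while (WaitForSingleObject(g_hStopEvent, 0) == WAIT_TIMEOUT) { ... } */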
I am working on a kernel module where I need to be "aware" that a given process has crashed.
Right now my approach is to set up a periodic timer interrupt in the kernel module; on every timer interrupt, I check the task_struct.state and task_struct.exit_state values for that process.
I am wondering if there's a way to set up an interrupt in the kernel module that would go off when the process terminates, or, when the process receives a given signal (e.g., SIGINT or SIGHUP).
Thanks!
EDIT: A catch here is that I can't modify the user application. Or at least, it would be a much tougher sell to the customer if I place additional requirements/constraints on s/w from another vendor...
You could have your module create a character device node and then open that node from your userspace process. It's only about a dozen lines of boilerplate to register a simple cdev in your module. Your cdev's open method will get called when the process opens the device node and the release method will be called when the device node is closed. If a process exits, either intentionally or because of a signal, all open file descriptors are closed by the kernel. So you can be certain that release will be called. This avoids any need to poll the process status and you can avoid modifying any kernel code outside of your module.
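A rough sketch of that boilerplate (using the miscdevice helper for brevity; the device name "procwatch" and the log messages are made up):

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/sched.h>

static int watch_open(struct inode *inode, struct file *filp)
{
        pr_info("procwatch: pid %d opened the node\n", current->pid);
        return 0;
}

static int watch_release(struct inode *inode, struct file *filp)
{
        /* Called when the last descriptor goes away - including when the
           process exits normally or is killed by a signal. */
        pr_info("procwatch: watched process went away\n");
        return 0;
}

static const struct file_operations watch_fops = {
        .owner   = THIS_MODULE,
        .open    = watch_open,
        .release = watch_release,
};

static struct miscdevice watch_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "procwatch",          /* appears as /dev/procwatch */
        .fops  = &watch_fops,
};

module_misc_device(watch_dev);
MODULE_LICENSE("GPL");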
You could also set up a watchdog-style system, where your process must write one byte to the device every so often. Have the write method of the cdev reset a timer. If too much time passes without a write and the timer expires, it is assumed the process has somehow failed, even if it hasn't crashed and terminated - for instance, a programming bug that led to a mutex deadlock or put the process into an infinite loop.
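A sketch of that watchdog (the 5-second deadline is made up; the write handler would be added to the fops above):

#include <linux/timer.h>
#include <linux/jiffies.h>

static struct timer_list watch_timer;

static void watch_timeout(struct timer_list *t)
{
        pr_warn("procwatch: no heartbeat - assuming the process is stuck or dead\n");
}

static ssize_t watch_write(struct file *filp, const char __user *buf,
                           size_t count, loff_t *ppos)
{
        /* Any write counts as a heartbeat: push the deadline out again. */
        mod_timer(&watch_timer, jiffies + 5 * HZ);
        return count;
}

/* In module init: timer_setup(&watch_timer, watch_timeout, 0);
   and arm it once: mod_timer(&watch_timer, jiffies + 5 * HZ); */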
There is a point in the kernel code where signals are delivered to user processes. You could patch that, check the process name, and signal a condition variable if it matches. This would only catch signals, not intentional process exits. IMHO, this is much uglier, and you'll need to deal with maintaining a kernel patch. But it's not that hard: there is a single point (I don't recall which function, sorry) where one can insert the necessary code, and it will catch all signals.
What is the disadvantage of using a mutex in interrupt context? Why is a spinlock preferred here?
A mutex will force the caller to sleep if it is contended, and sleeping is illegal when preemption is disabled or in interrupt context.
Many functions in the kernel sleep (ie. call schedule()) directly or
indirectly: you can never call them while holding a spinlock, or with
preemption disabled. This also means you need to be in user context:
calling them from an interrupt is illegal.
The following is worth reading...
https://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/c557.html
There's a ton of information in that doc.
When a thread tries to acquire a mutex and does not succeed, either because another thread has already acquired it or because of a context switch, the thread goes to sleep until it is woken up - which must not happen in an ISR.
When a thread fails to acquire a spin_lock, on the other hand, it keeps trying until it finally succeeds, and thus never sleeps. Using spin_lock in the "top half" is common practice when writing Linux device driver interrupt handlers.
Hence you should use a spin-lock instead!
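A minimal sketch of the usual pattern (the device structure and counter are made up for illustration):

#include <linux/interrupt.h>
#include <linux/spinlock.h>

struct my_dev {                        /* hypothetical device state */
        spinlock_t lock;               /* initialized elsewhere with spin_lock_init() */
        unsigned int pending;
};

static irqreturn_t my_isr(int irq, void *dev_id)
{
        struct my_dev *dev = dev_id;

        /* No mutex here - the ISR cannot sleep. The spinlock just busy-waits
           briefly instead of scheduling away. */
        spin_lock(&dev->lock);
        dev->pending++;
        spin_unlock(&dev->lock);
        return IRQ_HANDLED;
}

/* Process-context code sharing the data must also keep the local ISR out: */
static unsigned int my_take_pending(struct my_dev *dev)
{
        unsigned long flags;
        unsigned int v;

        spin_lock_irqsave(&dev->lock, flags);
        v = dev->pending;
        dev->pending = 0;
        spin_unlock_irqrestore(&dev->lock, flags);
        return v;
}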
I have two programs: X, the normal program with which the user interacts, and Y, which cleans up the resources acquired by X. There can be multiple instances of X but only one of Y (I already solved that part with named mutexes). Now, since Y is a cleanup program, it should be blocked until the last instance of X disappears.
I tried using a semaphore but I couldn't figure it out. Can somebody help me?
A semaphore is one valid way of doing this, but not necessarily the best. Whenever program X starts, call ReleaseSemaphore. Whenever a process terminates, call WaitForSingleObject with a timeout of zero on the semaphore handle (be sure to also include this in the exception handler, in case the program crashes).
Process Y can then regularly poll WaitForSingleObject with a zero (or a few milliseconds) timeout. If the return value is WAIT_OBJECT_0, it must release the semaphore again immediately (otherwise it will block the last X process trying to exit!). If the return value is WAIT_TIMEOUT, there are no X processes any more.
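A sketch of that polling loop in Y (the semaphore name "Global\\XAliveSem" and the 50 ms poll interval are made up; each X would call ReleaseSemaphore on start and WaitForSingleObject with a zero timeout on exit, as described above):

/* CreateSemaphore opens the existing object if an X has already created it. */
HANDLE hSem = CreateSemaphoreW(NULL, 0, 1000000, L"Global\\XAliveSem");

for (;;)
{
    DWORD r = WaitForSingleObject(hSem, 50);
    if (r == WAIT_OBJECT_0)
    {
        ReleaseSemaphore(hSem, 1, NULL);   /* give the count back immediately */
        Sleep(50);                         /* at least one X is still alive */
    }
    else if (r == WAIT_TIMEOUT)
    {
        break;                             /* count is zero: no X left */
    }
}
/* ... proceed with cleanup ... */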
The best solution would of course be to launch all X processes from Y. In that case, Y could just WaitForMultipleObjects on the process handles that it gets from CreateProcess with no extra "ifs and whens". This will "just work", always. It is more efficient, too.
Which leads to the second best solution... getting handles to the running processes with OpenProcess and WaitForMultipleObjects on those. The problem is where to get the process IDs from. A shared memory area might do, a pipe might do, or CreateToolhelp32Snapshot might give you that info.
Another way would be to use a named mutex object. All processes X call CreateMutex. If the mutex already exists, no harm is done (GetLastError will return ERROR_ALREADY_EXISTS, but so what). If the process terminates or crashes, all open handles are closed, and thus the mutex reference count is decremented.
The Y process calls OpenMutex. This either succeeds or fails. If it succeeds, it closes the handle again, sleeps, and tries again. If it fails, not a single X process is running any more.
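A sketch of that approach (the object name "Global\\XAliveMutex" and the poll interval are made up):

/* In each X, at startup - keep the handle open for the process lifetime: */
HANDLE hAlive = CreateMutexW(NULL, FALSE, L"Global\\XAliveMutex");

/* In Y - the named object disappears when the last open handle is closed: */
for (;;)
{
    HANDLE h = OpenMutexW(SYNCHRONIZE, FALSE, L"Global\\XAliveMutex");
    if (h == NULL)
        break;          /* object no longer exists: no X is running */
    CloseHandle(h);
    Sleep(250);
}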
Yet another way (though it might possibly have race issues) would be creating a named shared memory segment and calling InterlockedIncrement and InterlockedDecrement at process X start and exit. Process Y knows that no X processes are running if either the shared memory object cannot be opened or the counter is zero.
I am reading the following article by Robert Love:
http://www.linuxjournal.com/article/6916
which says:
"...Let's discuss the fact that work queues run in process context. This is in contrast to the other bottom-half mechanisms, which all run in interrupt context. Code running in interrupt context is unable to sleep, or block, because interrupt context does not have a backing process with which to reschedule. Therefore, because interrupt handlers are not associated with a process, there is nothing for the scheduler to put to sleep and, more importantly, nothing for the scheduler to wake up..."
I don't get it. AFAIK, the scheduler in the kernel is O(1), implemented through a bitmap. So what stops the scheduler from putting the interrupt context to sleep, taking the next schedulable process, and passing control to it?
So what stops the scheduler from putting the interrupt context to sleep, taking the next schedulable process, and passing control to it?
The problem is that the interrupt context is not a process, and therefore cannot be put to sleep.
When an interrupt occurs, the processor saves the registers onto the stack and jumps to the start of the interrupt service routine. This means that when the interrupt handler is running, it is running in the context of the process that was executing when the interrupt occurred. The interrupt is executing on that process's stack, and when the interrupt handler completes, that process will resume executing.
If you tried to sleep or block inside an interrupt handler, you would wind up not only stopping the interrupt handler, but also the process it interrupted. This could be dangerous, as the interrupt handler has no way of knowing what the interrupted process was doing, or even if it is safe for that process to be suspended.
A simple scenario where things could go wrong would be a deadlock between the interrupt handler and the process it interrupts.
Process1 enters kernel mode.
Process1 acquires LockA.
Interrupt occurs.
ISR starts executing using Process1's stack.
ISR tries to acquire LockA.
ISR calls sleep to wait for LockA to be released.
At this point, you have a deadlock. Process1 can't resume execution until the ISR is done with its stack. But the ISR is blocked waiting for Process1 to release LockA.
I think it's a design decision.
Sure, you could design a system in which you can sleep in an interrupt, but apart from making the system hard to comprehend and complicated (there are many, many situations you would have to take into account), it doesn't help anything. So from a design point of view, declaring that interrupt handlers cannot sleep is very clear and easy to implement.
From Robert Love (a kernel hacker):
http://permalink.gmane.org/gmane.linux.kernel.kernelnewbies/1791
You cannot sleep in an interrupt handler because interrupts do not have
a backing process context, and thus there is nothing to reschedule back
into. In other words, interrupt handlers are not associated with a task,
so there is nothing to "put to sleep" and (more importantly) "nothing to
wake up". They must run atomically.
This is not unlike other operating systems. In most operating systems,
interrupts are not threaded. Bottom halves often are, however.
The reason the page fault handler can sleep is that it is invoked only
by code that is running in process context. Because the kernel's own
memory is not pagable, only user-space memory accesses can result in a
page fault. Thus, only a few certain places (such as calls to
copy_{to,from}_user()) can cause a page fault within the kernel. Those
places must all be made by code that can sleep (i.e., process context,
no locks, et cetera).
Because the thread-switching infrastructure is unusable at that point. When servicing an interrupt, only stuff of higher priority can execute - see the Intel Software Developer's Manual on interrupt, task and processor priority. If you did allow another thread to execute (which you imply in your question would be easy to do), you wouldn't be able to let it do anything - if it caused a page fault, you'd have to use services in the kernel that are unusable while the interrupt is being serviced (see below for why).
Typically, your only goal in an interrupt routine is to get the device to stop interrupting and to queue something at a lower interrupt level (in Unix this is typically a non-interrupt level, but for Windows it's dispatch, APC or passive level) to do the heavy lifting, where you have access to more features of the kernel/OS. See: Implementing a handler.
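In Linux terms, the usual pattern is for the handler to acknowledge the device and defer everything else to a workqueue, which does run in process context and is allowed to sleep (the device-specific helpers here are hypothetical):

#include <linux/interrupt.h>
#include <linux/workqueue.h>

static struct work_struct my_work;

static void my_work_fn(struct work_struct *work)
{
        /* Process context: sleeping, mutexes, copy_to_user() etc. are allowed. */
        do_the_heavy_lifting();        /* hypothetical */
}

static irqreturn_t my_isr(int irq, void *dev_id)
{
        ack_device_interrupt(dev_id);  /* hypothetical: stop the device interrupting */
        schedule_work(&my_work);       /* defer the rest to process context */
        return IRQ_HANDLED;
}

/* In module init: INIT_WORK(&my_work, my_work_fn); */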
It's a property of how O/S's have to work, not something inherent in Linux. An interrupt routine can execute at any point so the state of what you interrupted is inconsistent. If you interrupted the thread scheduling code, its state is inconsistent so you can't be sure you can "sleep" and switch threads. Even if you protect the thread switching code from being interrupted, thread switching is a very high level feature of the O/S and if you protected everything it relies on, an interrupt becomes more of a suggestion than the imperative implied by its name.
So what stops the scheduler from putting the interrupt context to sleep, taking the next schedulable process, and passing control to it?
Scheduling happens on timer interrupts. The basic rule is that only one interrupt can be open at a time, so if you go to sleep in the "got data from device X" interrupt, the timer interrupt cannot run to schedule it out.
Interrupts also happen many times and overlap. If you put the "got data" interrupt to sleep, and then get more data, what happens? It's confusing (and fragile) enough that the catch-all rule is: no sleeping in interrupts. You will do it wrong.
Disallowing an interrupt handler from blocking is a design choice. When some data arrives on the device, the interrupt handler interrupts the current process, prepares the transfer of the data, and re-enables the interrupt; until the handler re-enables the interrupt, the device has to wait. If we want to keep our I/O busy and our system responsive, we had better not block the interrupt handler.
I don't think the "inconsistent states" are the essential reason. Processes, whether they are in user mode or kernel mode, should be aware that they may be interrupted by interrupts. If some kernel-mode data structure is accessed by both the interrupt handler and the current process, and a race condition exists, then the current process should disable local interrupts; moreover, on multi-processor architectures, spinlocks should be used during the critical sections.
I also don't think that, if the interrupt handler were allowed to block, it could not be woken up. When we say "block", it basically means that the blocked process is waiting for some event/resource, so it links itself into some wait queue for that event/resource. Whenever the resource is released, the releasing process is responsible for waking up the waiting process(es).
However, the really annoying thing is that the blocked process can do nothing during the blocking time; it did nothing wrong to deserve this punishment, which is unfair. And nobody can reliably predict the blocking time, so the innocent process has to wait for an unclear reason and for an unbounded time.
Even if you could put an ISR to sleep, you wouldn't want to do it. You want your ISRs to be as fast as possible to reduce the risk of missing subsequent interrupts.
The Linux kernel has two ways to allocate the interrupt stack. One is on the kernel stack of the interrupted process; the other is a dedicated per-CPU interrupt stack. If the interrupt context is saved on the dedicated per-CPU interrupt stack, then the interrupt context is indeed not associated with any process at all. The "current" macro would produce an invalid pointer to the currently running process, since on some architectures the "current" macro is computed from the stack pointer, and the stack pointer in interrupt context may point to the dedicated interrupt stack rather than to the kernel stack of some process.
At its core, the question is whether you can get a valid "current" (the address of the current process's task structure) inside an interrupt handler. If you can, it would in principle be possible to modify its contents to put it into a "sleep" state, from which the scheduler could bring it back later once the state changes. The answer may be hardware-dependent.
But on ARM it is impossible, since 'current' has nothing to do with the interrupted process while in interrupt mode. See the code below:
#linux/arch/arm/include/asm/thread_info.h
static inline struct thread_info *current_thread_info(void)
{
        register unsigned long sp asm ("sp");
        return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}
The sp in USER mode and SVC mode are the "same" ("same" here does not mean they are equal; rather, user mode's sp points to the user-space stack, while SVC mode's sp (r13_svc) points to the kernel stack, where the user process's task structure was set up at the previous task switch. When a system call occurs, the process enters kernel space again with sp (sp_svc) still unchanged; the two stack pointers are associated with each other, and in this sense they are the 'same'). So in SVC mode, kernel code can get a valid 'current'. But in other privileged modes, say interrupt mode, sp is 'different': it points to a dedicated address defined in cpu_init(). The 'current' calculated in these modes is unrelated to the interrupted process, and accessing it will result in unexpected behavior. That's why it is always said that a system call can sleep but an interrupt handler can't: a system call works in process context, but an interrupt does not.
High-level interrupt handlers mask the operations of all lower-priority interrupts, including those of the system timer interrupt. Consequently, the interrupt handler must avoid involving itself in an activity that might cause it to sleep. If the handler sleeps, then the system may hang because the timer is masked and incapable of scheduling the sleeping thread.
Does this make sense?
If a higher-level interrupt routine gets to the point where the next thing it must do has to happen after a period of time, then it needs to put a request into the timer queue, asking that another interrupt routine be run (at lower priority level) some time later.
When that interrupt routine runs, it would then raise priority level back to the level of the original interrupt routine, and continue execution. This has the same effect as a sleep.
It is just a design/implementation choice in the Linux OS. The advantage of this design is simplicity, but it may not be good for real-time OS requirements.
Other OSes have other designs/implementations.
For example, in Solaris interrupts can have different priorities, which allows most device interrupts to be handled in interrupt threads. Interrupt threads are allowed to sleep because each interrupt thread has a separate stack in the context of that thread.
The interrupt-thread design is good for real-time threads, which should have higher priorities than interrupts.