DllMain DLL_PROCESS_DETACH and GetMessage Function reentrancy - winapi

I have written a global hook that hooks using SetWindowsHookEx the WH_GETMESSAGE, WH_CALLWNDPROC and WH_CALLWNDPROCRET.
The hook dll creates a new thread in the hooked process, which, among other things, checks the audio state of the process and calls IAudioSessionManager2::GetSessionEnumerator().
Now the interesting part, I had called UnhookWindowsHookEx() from the hook host AND during the time my dll's worker thread was running the call to IAudioSessionManager2::GetSessionEnumerator(). That call was in the same thread's call stack, where the DllMain with DLL_PROCESS_DETACH was invoked.
I assume the reason was, that GetSessionEnumerator() invokes GetMessage() function somewhere and the latter is reentrant. Unfortunately I do not remember precisely, but I think I saw that in the call stack.
But there are multiple important things I wonder about and things that remain unclear. So here come my related questions:
Can DllMain with DLL_PROCESS_DETACH invoked any time, even in a thread which runs functions from that dll which is currently being unloaded?
What happens with the functions up the stack when DllMain DLL_PROCESS_DETACH exits? Will the code in the functions up the call stack execute eventually?
What if these functions do not exit? When will the dll be unloaded?
Can the DllMain DLL_PROCESS_DETACH be similarly invoked during the callbacks for WH_GETMESSAGE, WH_CALLWNDPROC and WH_CALLWNDPROCRET hooks? I know and have experimentally confirmed that sometimes, although not too often, these functions are reentrant, so the calls to these functions can be injected during the time previous call is still running in the same stack, but I do not know whether also calls to DllMain can be injected in a similar manner.
When precisely the DllMain can be invoked in a thread - are there some specific Windows API functions that need to be called and which in turn may lead to DllMain DLL_PROCESS_DETACH call, or could it happen at any instruction?
If the DllMain DLL_PROCESS_DETACH call can be "injected" at any time AND functions up the call stack do not get executed anymore, then how do I know where precisely was the function up the call stack interrupted? So I could inside DllMain release some handles or resources allocated by the function up the stack.
Is there any way to temporarily prevent/postpone calls to DllMain DLL_PROCESS_DETACH? Locks obviously do not help if the call/interruption occurs in the same stack.
Unfortunately I probably cannot experimentally solve these questions since I have had my hooking (and also unhooking) code running on multiple computers for months before such situation with DllMain occured during unhooking. Though for some reason it then occured with four different programs at once...
Also please, would someone with enough reputation like to merge the tags "reentrant" and "reentrancy"?

So thanks to Hans I now know regarding point (4) that DllMain DLL_PROCESS_DETACH will not be reentrant with hook procedures.
Regarding the thread that the hook created, my log files currently indicate that if DllMain DLL_PROCESS_DETACH is injected into the stack of this thread then that thread indeed will be terminated after DllMain exits and will NOT run to completion. That should answer points (2) and (3). The question itself implicitly answers the point (1).
But for solving the problem for that thread the hook created, I assume that the DllMain DLL_PROCESS_DETACH can be prevented by calling
GetModuleHandleEx
(
GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS,
(LPCTSTR)DllMain,
&hModule_thread
)
and before the thread termination calling
FreeLibraryAndExitThread(hModule_thread, 0)
So using GetModuleHandleEx should answer the point (7), which in turn makes all other points irrelevant. Of course I have to use some IPC to trigger the thread termination in the hooked processes.
The remaining interesting question is point (5), but it is just out of curiosity:
"When precisely the DllMain DLL_PROCESS_DETACH can be invoked in a thread - are there some specific Windows API functions that need to be called and which in turn may lead to DllMain DLL_PROCESS_DETACH call, or could it happen at any instruction?"

Related

A DLL should free heap memory only if the DLL is unloaded dynamically?

Question Purpose: Reality check on the MS docs of DllMain.
It is "common" knowledge that you shouldn't do too much in DllMain, there are definite things you must never do, some best practises.
I now stumbled over a new gem in the docs, that makes little sense to me: (emph. mine)
When handling DLL_PROCESS_DETACH, a DLL should free resources such as
heap memory only if the DLL is being unloaded dynamically (the
lpReserved parameter is NULL). If the process is terminating (the
lpvReserved parameter is non-NULL), all threads in the process except
the current thread either have exited already or have been explicitly
terminated by a call to the ExitProcess function, which might leave
some process resources such as heaps in an inconsistent state. In this
case, it is not safe for the DLL to clean up the resources. Instead,
the DLL should allow the operating system to reclaim the memory.
Since global C++ objects are cleaned up during DllMain/DETACH, this would imply that global C++ objects must not free any dynamic memory, because the heap may be in an inconsistent state. / When the DLL is "linked statically" to the executable. / Certainly not what I see out there - global C++ objects (iff there are) of various (ours, and third party) libraries allocate and deallocate just fine in their destructors. (Barring other ordering bugs, o.c.)
So, what specific technical problem is this warning aimed at?
Since the paragraph mentions thread termination, could there be a heap corruption problem when some threads are not cleaned up correctly?
The ExitProcess API in general does the follwoing:
Enter Loader Lock critical section
lock main process heap (returned by GetProcessHeap()) via HeapLock(GetProcessHeap()) (ok, of course via RtlLockHeap) (this is very important step for avoid deadlock)
then terminate all threads in process, except current (by call NtTerminateProcess(0, 0) )
then call LdrShutdownProcess - inside this api loader walk by loaded module list and sends DLL_PROCESS_DETACH with lpvReserved nonnull.
finally call NtTerminateProcess(NtCurrentProcess(), ExitCode ) which terminates the process.
The problem here is that threads terminated in arbitrary place. For example, thread can allocate or free memory from any heap and be inside heap critical section, when it terminated. As a result, if code during DLL_PROCESS_DETACH tries to free a block from the same heap, it deadlocks when trying to enter this heap's critical section (if of course heap implementation use it).
Note that this does not affect the main process heap, because we call HeapLock for it before terminate all threads (except current). The purpose of this: We wait in this call until all another threads exit from process heap critical section and after we acquire the critical section, no other threads can enter it - because the main process heap is locked.
So, when we terminate threads after locking the main heap - we can be sure that no other threads that are killed are inside main heap critical section or heap structure in inconsistent state. Thanks to RtlLockHeap call. But this is related only to main process heap. Any other heaps in the process are not locked. So these can be in inconsistent state during DLL_PROCESS_DETACH or can be exclusively acquired by an already terminated thread.
So - using HeapFree for GetProcessHeap or saying LocalFree is safe (however not documented) here.
Using HeapFree for any other heaps is not safe if DllMain is called during process termination.
Also if you use another custom data structures by several threads - it can be in inconsistent state, because another threads (which can use it) terminated in arbitrary point.
So this note is warning that when lpvReserved parameter is non-NULL (what is mean DllMain is called during process termination) you need to be especially careful in clean up the resources. Anyway all internal memory allocations will be free by operation system when process died.
As an addendum to RbMm's excellent answer, I'll add a quote from ExitProcess that does a much better job - than the DllMain docs do - at explaining, why heap operation (or any operation, really) can be compromised:
If one of the terminated threads in the process holds a lock and the
DLL detach code in one of the loaded DLLs attempts to acquire the same
lock, then calling ExitProcess results in a deadlock. In contrast, if
a process terminates by calling TerminateProcess, the DLLs that the
process is attached to are not notified of the process termination.
Therefore, if you do not know the state of all threads in your
process, it is better to call TerminateProcess than ExitProcess. Note
that returning from the main function of an application results in a
call to ExitProcess.
So, it all boils down to: IFF you application has "runaway" threads that may hold any lock, the (CRT) heap lock being a prominent example, you have a big problem during shutdown, when you need to access the same structures (e.g. the heap), that your "runaway" threads are using.
Which just goes to show that you should shut down all your threads in a controlled way.

Why is ExitProcess necessary under Win32 when you can use a RET?

I've noticed that many assembly language examples built using straight Win32 calls (no C Runtime dependency) illustrate the use of an explicit call to ExitProcess() to end the program at the end of the entry-point code. I'm not talking about using ExitProcess() to exit at some nested location within the program. There are surprisingly fewer examples where the entry-point code simply exits with a RET instruction. One example that comes to mind is the famous TinyPE, where the program variations exit with a RET instruction, because a RET instruction is a single byte. Using either ExitProcess() or a RET both seem to do the job.
A RET from an executable's entry-point returns the value of EAX back to the Windows loader in KERNEL32, which ultimately propagates the exit code back to NtTerminateProcess(), at least on Windows 7. On Windows XP, I think I remember seeing that ExitProcess() was even called directly at the end of the thread-cleanup chain.
Since there are many respected optimizations in assembly language that are chosen purely on generating smaller code, I wonder why more code floating around prefers the explicit call to ExitProcess() rather than RET. Is this habit or is there another reason?
In its purest sense, wouldn't a RET instruction be preferable to a direct call to ExitProcess()? A direct call to ExitProcess() seems akin to exiting your program by killing it from the task manager as this short-circuits the normal flow of returning back to where the Windows loader called your entry-point and thus skipping various thread cleanup operations?
I can't seem to locate any information specific to this issue, so I was hoping someone could shed some light on the topic.
If your main function is being called from the C runtime library, then exiting will result in a call to ExitProcess() and the process will exit.
If your main function is being called directly by Windows, as may well be the case with assembly code, then exiting will only cause the thread to exit. The process will exit if and only if there are no other threads. That's a problem nowadays, because even if you didn't create any threads, Windows may have created one or more on your behalf.
As far as I know this behaviour is not properly documented, but is described in Raymond Chen's blog post, "If you return from the main thread, does the process exit?".
(I have also tested this myself on both Windows 7 and Windows 10 and confirmed that they behaved as Raymond describes.)
Addendum: in recent versions of Windows 10, the process loader is itself multi-threaded, so there will always be additional threads present when the process first starts.

BUG: Scheduling while atomic .... using sysfs_notify()

I have a kernel module that uses hrtimers to notify userspace when the timer has fired. I understand I can just use userspace timers, but it is emulating a driver that will actually talk to hardware in the future. Every once in a while I get a BUG: Scheduling while atomic. After doing some research I am assuming that the hrtimer.function that I register as a callback, is being called from an interrupt routine by the kernel internals (making my callback function in an "Atomic Context"). Then when I call sysfs_notify() within the callback, I get the kernel bug, because sysfs_notify() acquires a mutex.
1) Is this a correct assumption?
If this is correct, I have seen that there is a function called sys_notify_dirent() that I can use to notify userspace from an atomic context. But according to this source:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-10/msg07510.html
It can only be called from a "process" context, and not an interrupt context (due to the spinlock).
2) Could someone explain the difference between process, interrupt, and atomic context?
3) If this cannot be used in an interrupt context, what is an alternative to notifying userspace in this context?
Correct, sysfs_notify() cannot be called from atomic context. And yes, sysfs_notify_dirent() appears to be safe to call from atomic context. The source you cite is a bug report that notices in an old kernel version that statement wasn't actually true, along with a patch to fix it. It now appears to be safe to call.
Follow the source code in gpiolib_sysfs.c, and you'll notice that sysfs_notify_dirent() eventually calls schedule_work(), which defers the actual call to sysfs_notify(), which is exactly what the comments to your question are advising you to do. It's just wrapped inside the convenience function.

Loading/calling ntdll from DllMain

One should not use functions other than those in kernel32.dll from DllMain:
From MS documentation:
Because Kernel32.dll is guaranteed to be loaded in the process address space when the entry-point function is called, calling functions in Kernel32.dll does not result in the DLL being used before its initialization code has been executed. Therefore, the entry-point function can call functions in Kernel32.dll that do not load other DLLs. For example, DllMain can create synchronization objects such as critical sections and mutexes, and use TLS. Unfortunately, there is not a comprehensive list of safe functions in Kernel32.dll.
...
Calling functions that require DLLs other than Kernel32.dll may result in problems that are difficult to diagnose. For example, calling User, Shell, and COM functions can cause access violation errors, because some functions load other system components. Conversely, calling functions such as these during termination can cause access violation errors because the corresponding component may already have been unloaded or uninitialized.
My question:
But the documentation does not mention ntdll.dll. - Can I call LoadLibrary for "ntdll" and use functions in ntdll from DllMain:
1) during DLL_PROCESS_ATTACH (load and use functions of ntdll)?
2) during DLL_PROCESS_DETACH (use functions of previously loaded ntdll)?
Also, please, would somebody with 1500+ reputation like to create a new tag titled "dllmain" ?
The answer to the question "is it safe in DllMain" always defaults to "no". In this case, calling LoadLibrary is never okay.
Generally speaking, calling anything in ntdll.dll is not recommended even places where it is safe to do so.

Is it necessary to explicitly stop all threads prior to exiting a Win32 application?

I have a Win32 native VC++ application that upon entering WinMain() starts a separate thread, then does some useful job while that other thread is running, then simply exits WinMain() - the other thread is not explicitly stopped.
This blog post says that a .NET application will not terminate in this case since the other thread is still running. Does the same apply to native Win32 applications?
Do I have to stop all threads prior to exiting?
Yes, you have to if you are simply exiting or terminating the main thread via ExitThread or TerminateThread, otherwise your application may not fully shutdown. I recommend reading Raymond Chen's excellent blog posts on this topic:
The old-fashioned theory on how processes exit
Quick overview of how processes exit on Windows XP
How my lack of understanding of how processes exit on Windows XP forced a security patch to be recalled
During process termination, the gates are now electrified
If you return from the main thread, does the process exit?
But please note in particular that if you properly return from the main or WinMain function, the process will exit as described by the ExitProcess API documentation and the last post by Raymond Chen that is being linked above!
The short of it is:
For a native Win32 process to terminate, one of two conditions must be met:
Someone calls ExitProcess or TerminateProcess.
All the threads exit (by returning from their ThreadProc (including the WinMainEntryPoint that is the first thread created by windows)), close (by calling ExitThread), or terminated (someone calls TerminateThread).
(The first condition is actually the same as the 2nd: ExitProcess and TerminateProcess, as part of their cleanup, both call TerminateThread on each thread in the process).
The c-runtime imposes different conditions: For a C/C++ application to terminate, you must either:
return from main (or WinMain).
call exit()
Calling exit() or returning from main() both cause the c-runtime to call ExitProcess(). Which is how c & c++ applications exit without cleaning up their threads. I, personally, think this is a bad thing.
However, non trivial Win32 processes can never terminate because many perfectly, otherwise reasonable, Win32 subsystems create worker threads. winsock, ole, etc. And do not provide any way to cause those threads to spontaneously close.
No, when WinMain returns, the process will be terminated, and this means all threads spawned by the process should be terminated though they might not be closed gracefully.
However, it is possible that a primary thread is terminated while the other threads are running, resulting in the application is still running. If you call ExitThread (not exit or ExitProcess) in WinMain, and there are running threads (eventually created by the primary thread), then, you may observe this behavior. Nonetheless, just return in WinMain will call ExitProcess, and that means all threads are should be terminated.
Correct me if it's wrong.
I think you can first close all your windows(so the user won't see your application), and then set a flag for exit, your thread should check the flag periodicly, and once found set, the thread should return.
after set the flag, your main thread could call ::WaitForSingleObject() or ::WaitForMultipleObjects() for a while (say, three seconds), if the thread(s) not return, just kill them by ::TerminateThread().
Want to improve this post? Provide detailed answers to this question, including citations and an explanation of why your answer is correct. Answers without enough detail may be edited or deleted.
short answer : yes

Resources