Why does process loads modules(dlls) in different phases?

Why does process loads modules(dlls) in different phases? - windows

The phenomenon goes like this.
I'm trying to implement dll injection. My application create process in 'suspended' state, (Using CreateProcess with CREATE_SUSPENDED), wait for an important dll to be loaded(kernel32.dll of course), and then executes injection. The suspended process is resumed using ResumeThread after the injection.
I'm using suspended process creation in the hope that dll is injected prior to any other execution except dll load. But while executing this, I found something interesting.
I've simply tested this injection using notepad.exe and it worked just fine. But when I've duplicated the executable as notepad2.exe, my injection waits for infinite time. Kernel32.dll never loads up until the main thread of the process is resumed.
It seems there's some sort of 'authenticated' executable that modules are loaded beforehand (or maybe some security solutions installed in my computer that already uses dll injection might cause this different execution).
Does Anyone knows if there's any sort of thing that modules are loaded in different manner? (Frankly, I still don't understand the life cycle of windows process...)

I think I've found an answer. The answer is that though the process is in SUSPENDED state, and therefore main thread is stopped, when certain thread in that process tries to access to a module, then the required modules are loaded (Which means, that windows application seems to 'Lazy Load' require modules)
In the comment, I said that the process protected by certain security solution installed on my PC uses Detour to inject DLL, and therefore modules in certain Process loaded while the process is in SUSPENDED state.
But that was not a specific case of Detour, but general issues on every processes.
I was stupid that the injection technique I used was, to list all the modules loaded in the target process, find kernel32.dll, calculate the offset of LoadLibraryW, and call that function using remote thread. In this technique, the injector enumerates all the modules BEFORE LoadLibrary functions are ever called, and therefore LazyLoading never happens, and therefore I can never see kernel32.dll loaded on the process.
When I used simple technique that use GetProcAddress from current process, get the address of LoadLibrary function, and remotely calling the process, kernel32.dll was successfully loaded and the injection succeeded.
In case of several applications protected through the 'security solutions using Detour', even though I suspended the process, the injection tries do happen, and that's why kernel32.dll was loaded on notepad.exe (which is in the list of the security solution), but not loaded on notepad2.exe (as image name differed, it is not in the list).
I was just stupid, trying to use more complicated methods for dll injection, and that's what caused all these problems.
And thanks a lot for the comments.

Related

How Does GetStringType Cause a Deadlock in Dll Main?

I'm looking at an application which hangs on Server 2016 but runs find on Server 2008 R2. I have traced it to a hang caused by deadlock when a particular DLL loads. Analysing the DLL (I don't have the source code) I can see that it violates the guideline here
Call GetStringTypeA, GetStringTypeEx, or GetStringTypeW (either
directly or indirectly). This can cause a deadlock or a crash
Specifically it calls GetStringTypeW from DllMain.
I'm trying to understand how this function can cause deadlocks in DllMain.

Your DllMain function runs inside the loader lock, one of the few
times the OS lets you run code while one of its internal locks is
held. This means that you must be extra careful not to violate a lock
hierarchy in your DllMain; otherwise, you are asking for a deadlock. (Refer to "Another reason not to do anything scary in your DllMain: Inadvertent deadlock")
DllMain is called while the loader-lock is held, so if GetStringType acquires the loader lock, deadlock appears.
Significant restrictions are imposed on the functions that can be
called within DllMain. As such, DllMain is designed to perform minimal
initialization tasks, by using a small subset of the Microsoft®
Windows® API. You cannot call any function in DllMain that directly or
indirectly tries to acquire the loader lock. Otherwise, you will
introduce the possibility that your application deadlocks or crashes.
We can't see source code of GetStringType function. It is suggested to follow Dynamic-Link Library Best Practices.

Calling a function from DllMain that will (directly or indirectly) attempt to load another module is going to deadlock.
Module loading is serialized. The system uses a lock to ensure that at any given time no more than a single DLL entry point is entered. If you are calling LoadLibrary (directly or indirectly) from your DLL entry point, you are holding the loader lock. LoadLibrary will attempt to acquire the loader lock itself prior to calling into the module's entry point. So while your code is waiting for LoadLibrary to return, LoadLibrary is waiting for the lock you are holding. That's a deadlock.
This is likely the reason why calling GetStringType from DllMain deadlocks. Without access to source code I cannot verify this, though. You may be able to find evidence by enabling loader snaps.

What's the lifetime of a DLL after being compiled?

I was reading about it and the part about memory was really confusing. What exactly happens to a DLL after compilation?
Questions that were somewhat bugging me:
Is it loaded into memory only once and all processes that require access to it are given only a pointer to where it is?
When is it loaded? I am sure it isn't just arbitrarily loaded into memory after compilation so is there a special procedure to load it or does Windows load a DLL when a process requires it and keeps it for sharing among other processes?
From the Microsoft docs
Every process that loads the DLL maps it into its virtual address space. After the process loads the DLL into its virtual address, it can call the exported DLL functions.
How does that "mapping" look like? I found that a bit confusing.
I don't know if this is a relevant piece of info but I am specifically interested in custom DLLs (DLLs written by me), not system DLLs

An exe file lists DLLs is wants to link to, so when the loader loads an exe, it loads DLLs that are listed as required, unless they are already loaded. A DLL may run initialization code the first time it loads.
Of course a DLL can be loaded dynamically by name, then it's loaded when the LoadLibrary API call is issued by a program. This is useful to implement dynamically loadable plugins.
Windows keeps a reference counter for each DLL, so when all processes stopped referencing a DLL, either by exiting or explicitly calling FreeLibrary, Windows will unload the DLL, giving it a chance to run any cleanup code.

Diagnosing Win32 RegisterClass leak

We are trying to troubleshoot a nasty problem on a production server where the server will start misbehaving after running for awhile.
Diagnostics have led us to believe there may be a bug in a DLL that is used by one of the processes running on this server that is resulting in a global atom leak. The assumed vector is a process that is calling RegisterClass without a corresponding UnregisterClass (and the class name is using a random number as part of the name, so it's a different class name each time the process starts).
This article provided some information: https://blogs.msdn.microsoft.com/ntdebugging/2012/01/31/identifying-global-atom-table-leaks/
But we are reluctant to attempt kernel mode debugging on a production server, so we have tried installing windbg and using the !gatom command to list atoms for a given session.
I use windbg to attach to a process in one of the sessions (these processes are running as Windows Services if that matters), then invoke the !gatom command. The returned atom list doesn't have any window classes in it.
Then I read this: https://blogs.msdn.microsoft.com/oldnewthing/20150429-00/?p=44984
and it sounds like there is a separate atom table for windows classes. And no way to query it. I was hoping that we'd be able to actually see how many windows class atoms have been registered, and see if that list gets bigger over time, indicating a leak.
The documentation on !gatom is sparse, and I'm hoping I can get some expert confirmation or recommendations on how to proceed.
Does anyone have any ideas on how we can get at the list of registered Windows classes on a production server?
More detail about what happens when the server starts to misbehave:
We run many instances (>50) of the same application as separately registered services running from isolated executables and DLLs - so each of those 50 instances has their own private executables and DLLs.
During their normal run, the processes unload and reload a DLL (about every hour). There is a windows class used that's part of a "session handle" used by the DLL (the session handle is part of the registered windows class name), and that session handle is unique each time the DLL is loaded. So every hour, there is an additional Window class registration, made by a DLL (our service stays running).
After some period of time, the system will get into a state where further attempts to load the DLL in question fail. This may happen for one of the services, then gradually over time, other services will start to have the same problem.
When this happens, restarting the service does not fix the problem. The only way that we've found to get things running properly again is to reboot the server.
We are monitoring memory commit load, and we are well within the virtual memory of the server. We are even within the physical memory size.
I just did a code review the vendor of the DLL, and it looks like they are not actually calling RegisterClass from the DLL itself (they only make one RegisterClass call from the DLL, and it's a static string - not a different class name for each session). The DLL launches an EXE, and that EXE is the one that registers the session specific class name. Their EXE does call UnregisterClass (and even if it didn't, the EXE is terminated when we unload their DLL, so it seems that this may not be what is going on).
I am now out of bullets on this one. The behavior seems like some sort of resource leak or pool exhaustion. The next time this happens, I will try connecting to the failing process with windbg and see what the application atom pool looks like - but I'm not hopeful that is going to shed any light.
Update: The excellent AtomTableMonitor tool has narrowed the problem to rogue RegisterWindowMessage. I'm going to ask a more specific question focused on this exact issue: Diagnosing RegisterWindowsMessage leak

You may try using this standalone global atom monitor
The application appears to have capabilities to monitor atoms in services
that run in a different session
btw if you have narrowed it to RegisterWindowMessage
then spy++ can log the Registered messages system wide along with thread and process
spy++ (i am using it from vs2015 community)
ctrl+m select all windows in system
in the messages tab clear all and select registered
and start logging
you can also save the log (it is plain text in-spite of strange extension )
powershell -c "gc spy++.sxl -Tail 3"
<000152> 001F01A4 P message:0xC1B2 [Registered:"nsAppShell:EventID"] wParam:00000000 lParam:06EDFCE0 time:4:2
7:49.584 point:(408, 221)
<000153> 001F01A4 P message:0xC1B2 [Registered:"nsAppShell:EventID"] wParam:00000000 lParam:06EDFCE0 time:4:2
7:49.600 point:(408, 221)
<000154> 001F01A4 P message:0xC1B2 [Registered:"nsAppShell:EventID"] wParam:00000000 lParam:06EDFCE0 time:4:2
7:49.600 point:(408, 221)

File operation functions return, but are not actually committed when Windows shuts down

I am working on an MFC application that can (among other things) be used to shut Windows down. When doing this, Windows of course sends the WM_QUERYENDSESSION and WM_ENDSESSION to all applications, mine included. However, the problem is that my application, as part of some destructors, delete certain files (with CFile::Remove) that have been used during the execution. I have reason to believe that the destructors are called (but that is hard to know for certain) when the application is closed by Windows.
However, when Windows starts back up again, I do occasionally notice that the files that were supposed to be deleted are still present. This does not happen consistently, even when the execution of the program is identical (I have a script for testing this). This leads me to think that one of two things are happening: Either a) the destructors are not consistently being called, or b) the Remove function returns, but the file is not actually deleted before Windows is shut down.
The only work-around I have found so far is that if I get the system to wait with the shutdown for approximately 10 seconds after my program has stopped, then the files will be properly deleted. This leads me to believe that b) may be the case.
I hope someone is able to help me with this problem.
Regards
Mort

Once your program returns from WM_ENDSESSION, Windows can terminate it at any time:
If the session is being ended, this parameter is TRUE; the session can end any time after all applications have returned from processing this message.
If the session ends quickly, then it may end before your destructors run. You must do all your cleanup before returning from WM_ENDSESSION, because there is no guarantee that you will get a chance to do it afterwards.

The problem here is that some versions of Windows report back that file handling operations have been completed before they actually have. This isn't a problem unless shutdown is triggered as some operations, including file delete will be abandoned.
I would suggest that you cope with this by forcing your code to wait for a confirmed deletion of the files (have a process look for the files and raise an event when they've gone) before calling for system shutdown.

If the system is properly shut down (nut went sudden power loss or etc.) then all the cached data is flushed. In particular this includes flushing the global file descriptor table (or whatever it's called in your file system) which should commit the file deletion.
So the problem seems to be that the user-mode code doesn't call DeleteFile, or it failes (for whatever reason).
Note that there are several ways the application (process) may exit, whereas not always d'tors are called. There are automatic objects which are destroyed in the context of their callstack, plus there are global/static objects, which are initialized and destroyed by the CRT init/cleanup code.
Below is a short summary of ways to terminate the process, with the consequences:
All process threads exit conventionally (return from their procedure). The OS terminates the process that has no threads. All the d'tors are executed.
Some threads either exit via ExitThread or killed by TerminateThread. The automatic objects of those threads are not d'tructed.
Process exited by ExitProcess. Automatic objects are not destructed, global may be destructed (this happens in the CRT is used in a DLL)
Process is terminated by TerminateProcess. All d'tors are not called.
I suggest you check if the DeleteFile (or CFile::Remove that wraos it) is called indeed, and check also if it succeeds. For instance you may open the same file twice for whatever reason

Flow of execution of file in WINDOWS

My question is:
What is the flow of execution of an executable file in WINDOWS? (i.e. What happens when we start a application.)
How does the OS integrate with the application to operate or handle the application?
Does it have a central control place that checks execution of each file or process?
Is the process registered in the execution directory? (if any)
What actually happens that makes the file perform its desired task within WINDOWS environment?
All help is appreciated...

There's plenty going on beyond the stages already stated (loading the PE, enumerating the dlls it depends on, calling their entry points and calling the exe entry point).
Probably the earliest action is the OS and the processor collaborating to create a new address space infrastructure (I think essentially a dedicated TLB). The OS initializes a Process Environment Block, with process-wide data (e.g., Process ID and environment variables). It initializes a Thread Environment Block, with thread-specific data (thread id, SEH root handlers, etc). Once an appropriate address is selected for every dll and it is loaded there, re-addressing of exported functions happens by the windows loader. (very briefly - at compile time, the dll cannot know the address at which it will be loaded by every consumer exe. the actual call addresses of its functions is thus determined only at load time). There are initializations of memory pages shared between processes - for windows messages, for example - and I think some initialization of disk-paging structures. there's PLENTY more going on. The main windows component involved is indeed the windows loader, but the kernel and executive are involved. Finally, the exe entry point is called - it is by default BaseProcessStart.
Typically a lot of preparation happens still after that, above the OS level, depending on used frameworks (canonical ones being CRT for native code and the CLR for managed): the framework must initialize its own internal structures to be able to deliver services to the application - memory management, exception handling, I/O, you name it.
A great place to go for such in depth discussions is Windows Internals. You can also dig a bit deeper in forums like SO, but you have to be able to break it down into more focused bits. As phrased, this is really too much for an SO post.

This is high-level and misses many details:
The OS reads the PE Header to find the dlls that the exe depends on and the exe's entry point
The DllMain function in all linked dlls is called with the DLL_PROCESS_ATTACH message
The entry point of the exe is called with the appropriate arguments
There is no "execution directory" or other central control other than the kernel itself. However, there are APIs that allow you to enumerate and query the currently-running processes, such as EnumProcesses.

Your question isn't very clear, but I'll try to explain.
When you open an application, it is loaded into RAM from your disk.
The operating system jumps to the entry point of the application.
The OS provides all the calls required for showing a window, connecting with stuff and receiving user input. It also manages processing time, dividing it evenly between applications.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio