Windows must do something to parse the PE header, load the executable in memory, and pass command line arguments to main().
Using OllyDbg I have set the debugger to break on main() so I could view the call stack:
It seems as if symbols are missing so we can't get the function name, just its memory address as seen. However we can see the caller of main is kernel32.767262C4, which is the callee of ntdll.77A90FD9. Towards the bottom of the stack we see RETURN to ntdll.77A90FA4 which I assume to be the first function to ever be called to run an executable. It seems like the notable arguments passed to that function are the Windows' Structured Exception Handler address and the entry point of the executable.
So how exactly do these functions end up in loading the program into memory and getting it ready for the entry point to execute? Is what the debugger shows the entire process executed by the OS before main()?
When you call CreateProcess, the system internally calls ZwCreateThread[Ex] to create the first thread in the process.
When a thread is created, either you (if you call ZwCreateThread directly) or the system initializes the CONTEXT record for the new thread; Eip (i386) or Rip (amd64) in that record is the thread's entry point. If you fill in the CONTEXT yourself, you can specify any address. But when you call, say, Create[Remote]Thread[Ex], the system fills in the CONTEXT, as mentioned, and sets its own routine as the thread entry point. Your original entry point is saved in the Eax (i386) or Rcx (amd64) register.
The name of this routine depends on the Windows version.
Earlier it was BaseThreadStartThunk, or BaseProcessStartThunk (when called from CreateProcess), from kernel32.dll.
Now the system specifies RtlUserThreadStart from ntdll.dll. RtlUserThreadStart usually calls BaseThreadInitThunk from kernel32.dll (except in native (boot-execute) applications such as smss.exe and chkdsk.exe, which do not have kernel32.dll in their address space at all). BaseThreadInitThunk in turn calls your original thread entry point, and after (if) it returns, RtlExitUserThread is called.
The main goal of this common thread startup wrapper is to set the top-level SEH filter. Only because of this can we call the SetUnhandledExceptionFilter function. If a thread starts directly from your entry point, without the wrapper, the top-level exception filter functionality becomes unavailable.
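To make the role of the wrapper concrete, here is a toy model in Python (not Windows code). The names set_unhandled_exception_filter and thread_start_wrapper are hypothetical stand-ins for SetUnhandledExceptionFilter and the RtlUserThreadStart / BaseThreadInitThunk pair, and the model assumes the filter simply returns a string instead of deciding how the exception is dispositioned.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the process-wide top-level filter that
# SetUnhandledExceptionFilter installs.
_top_level_filter: Optional[Callable[[BaseException], str]] = None


def set_unhandled_exception_filter(new_filter):
    """Install a new top-level filter and return the previous one."""
    global _top_level_filter
    previous = _top_level_filter
    _top_level_filter = new_filter
    return previous


def thread_start_wrapper(entry_point, argument, log):
    """Model of the common wrapper: guard the real entry point, then exit the thread."""
    try:
        exit_code = entry_point(argument)
    except Exception as exc:
        # Only the wrapper gives the top-level filter a chance to run.
        if _top_level_filter is None:
            raise
        log.append(_top_level_filter(exc))
        exit_code = -1  # simplification: the real system would end the process here
    # Models the RtlExitUserThread call made after the entry point returns.
    log.append("RtlExitUserThread(%d)" % exit_code)
    return exit_code
```

If the entry point were called directly instead of through thread_start_wrapper, the exception would propagate without the filter ever being consulted, which mirrors the point made above.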
But whatever the thread entry point is, a thread in user space NEVER begins executing from that point!
Earlier, when a user-mode thread began executing, the system inserted an APC into the thread with LdrInitializeThunk as the APC routine. This was done by copying (saving) the thread CONTEXT to the user stack and then calling KiUserApcDispatcher, which called LdrInitializeThunk. When LdrInitializeThunk finished, control returned to KiUserApcDispatcher, which called NtContinue with the saved thread CONTEXT; only after this did the thread entry point begin executing.
Now the system has optimized this process: it copies (saves) the thread CONTEXT to the user stack and calls LdrInitializeThunk directly. At the end of this function NtContinue is called, and the thread entry point begins executing.
So EVERY thread begins executing in user mode from LdrInitializeThunk. (A function with exactly this name exists and is called in all Windows versions from NT4 to Windows 10.)
What does this function do, and what is it for? You may have heard of the DLL_THREAD_ATTACH notification. When a new thread in the process begins executing (with the exception of special system worker threads, such as LdrpWorkCallback), it walks the loaded DLL list and calls the DLL entry points with the DLL_THREAD_ATTACH notification (provided, of course, that the DLL has an entry point and DisableThreadLibraryCalls was not called for it). How is this implemented? Through LdrInitializeThunk, which calls LdrpInitialize -> LdrpInitializeThread -> LdrpCallInitRoutine (for the DLL entry points).
When the first thread in a process starts, it is a special case: a lot of extra work is needed for process initialization. At this time only two modules are loaded in the process: the EXE and ntdll.dll. LdrInitializeThunk calls LdrpInitializeProcess for this job. Very briefly:
Various process structures are initialized.
All DLLs to which the EXE is statically linked (and their dependencies) are loaded, but their entry points are not called yet!
LdrpDoDebuggerBreak is called. This function checks whether a debugger is attached to the process and, if so, executes int 3, so the debugger receives an exception message, STATUS_BREAKPOINT. Most debuggers can begin UI debugging only from this point. However, there are debuggers that let us debug the process from LdrInitializeThunk; all my screenshots are from this kind of debugger.
An important point: until now, only code from ntdll.dll (and maybe from kernel32.dll) has executed in the process. Code from other DLLs, or any third-party code, has not executed in the process yet.
Optionally, a shim DLL is loaded into the process and the Shim Engine is initialized. But this is OPTIONAL.
The loaded DLL list is walked and the DLL entry points are called with DLL_PROCESS_ATTACH.
TLS initialization is performed and TLS callbacks are called (if any exist).
ZwTestAlert is called. This call checks whether there are APCs in the thread queue and executes them. This point exists in all versions from NT4 to Windows 10. It lets us, for example, create a process in the suspended state and then queue an APC (QueueUserAPC) to its thread (PROCESS_INFORMATION.hThread). As a result, the APC will be executed after the process is fully initialized and all DLL_PROCESS_ATTACH notifications have been delivered, but before the EXE entry point, in the context of the first process thread.
Finally, NtContinue is called. This restores the saved thread context, and we finally jump to the thread entry point.
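The ordering of these steps can be summarized with a small Python sketch. It is only a toy model of the sequence described in this answer, not loader code; the function name first_thread_startup and its parameters are made up for illustration.

```python
def first_thread_startup(static_dlls, debugger_attached=False,
                         shim_dll=None, queued_apcs=()):
    """Return the ordered user-mode steps before the EXE entry point (toy model)."""
    steps = ["LdrInitializeThunk",
             "LdrpInitializeProcess: init process structures"]
    # Static dependencies are mapped first; their entry points are not called yet.
    steps += ["map " + dll for dll in static_dlls]
    if debugger_attached:
        steps.append("LdrpDoDebuggerBreak: int 3 (STATUS_BREAKPOINT)")
    if shim_dll is not None:
        steps.append("load shim " + shim_dll)  # optional step
    # Only now are the DLL entry points called.
    steps += [dll + " entry point: DLL_PROCESS_ATTACH" for dll in static_dlls]
    steps.append("TLS initialization and callbacks")
    steps.append("ZwTestAlert")
    # APCs queued while the process was suspended run here.
    steps += ["run APC " + apc for apc in queued_apcs]
    steps.append("NtContinue -> thread entry point")
    return steps
```

Calling first_thread_startup(["kernel32.dll"], debugger_attached=True, queued_apcs=["my_apc"]) returns a list in which the debugger break precedes every DLL_PROCESS_ATTACH entry and the queued APC runs just before the final NtContinue step.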
See also: Flow of CreateProcess.
Related
I'm trying to write an APC dll injection driver, I've found this example and thought to modify it to my needs.
After I understood the code, this is how I thought to modify it (and my question comes from there).
In the code, the writer used PsLookupThreadByThreadId to receive a referenced pointer to the ETHREAD structure of the targeted process.
PsLookupThreadByThreadId(pSpi->Threads[0].ClientId.UniqueThread,&Thread)
But to get the SYSTEM_THREAD_INFORMATION for the UniqueThread handle, he used ZwQuerySystemInformation.
I want to load my dll right after ntdll is loaded, so I want to use PsSetCreateProcessNotifyRoutineEx and save the UniqueThread from the PS_CREATE_NOTIFY_INFO I got when the callback is called for the process I'm targeting.
And after ntdll is loaded, which I'll know thanks to PsSetLoadImageNotifyRoutineEx I could inject my dll using his APC injection logic.
My goal is to inject my dll in the PloadImageNotifyRoutine callback without using ZwQuerySystemInformation, as he does, to get the UniqueThread; instead I would save it in the PcreateProcessNotifyRoutineEx callback.
So, my question is: Can I trust that the UniqueThread I get from PS_CREATE_NOTIFY_INFO stays the same during the whole process loading time?
I want to use PsSetCreateProcessNotifyRoutineEx and save the
UniqueThread from the PS_CREATE_NOTIFY_INFO I got when the
callback is called for the process I'm targeting.
about CreatingThreadId from PS_CREATE_NOTIFY_INFO
The process ID and thread ID of the process and thread that
created the new process
This ID is not for the newly created process/thread, but for the creator. If you want to inject your own dll in the PloadImageNotifyRoutine callback, PcreateProcessNotifyRoutineEx is useless to you.
PloadImageNotifyRoutine is called when an image is mapped into the target process, inside ZwMapViewOfSection. You need to check that ProcessId (the second parameter of PloadImageNotifyRoutine: the process ID of the process where the image is loaded) is equal to PsGetCurrentProcessId(). This means that the image is being loaded into the current process and you can use KeGetCurrentThread(); you do not need PsLookupThreadByThreadId at all.
I want to load my dll right after ntdll is loaded
At this moment none of the user-mode structures in the process are initialized yet, because they are initialized by ntdll. As a result, if you inject your APC and force it to execute at this moment, you will just crash the process, nothing more.
I would advise you to inject your dll when kernel32.dll is loaded. Here you need to check that it is being loaded as a DLL, not simply mapped as an image: check whether ArbitraryUserPointer in the thread's TEB points to L"*\\kernel32.dll". smss.exe maps kernel32.dll while creating \\KnownDlls (ArbitraryUserPointer == 0 in this case), and a wow64 process maps kernel32.dll several times (32- and 64-bit) with the names L"WOW64_IMAGE_SECTION" or L"NOT_AN_IMAGE" in ArbitraryUserPointer.
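The name check itself is simple string matching. The Python sketch below only models the decision; in a real driver the string would have to be read safely from user memory through the TEB. The function name looks_like_kernel32_dll_load is hypothetical, and the case-insensitive comparison is an assumption on my part.

```python
def looks_like_kernel32_dll_load(arbitrary_user_pointer):
    """Return True when the name seen in the TEB indicates a real DLL load of kernel32.dll.

    The argument is the decoded string the pointer refers to, or None for NULL.
    """
    if not arbitrary_user_pointer:
        # NULL pointer: for example smss.exe mapping the image for KnownDlls.
        return False
    # Assumption: compare case-insensitively, since Windows paths vary in case.
    name = arbitrary_user_pointer.lower()
    # Marker names such as WOW64_IMAGE_SECTION or NOT_AN_IMAGE never match this suffix.
    return name.endswith("\\kernel32.dll")
```

Requiring the leading backslash keeps a file such as mykernel32.dll from matching.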
I am writing a simple debugger for learning purposes. I need to know where the Initial Breakpoint set by Windows is located to handle it properly. I read somewhere that it should be at the function DbgBreakPoint() from ntdll.dll; however, that function resolves to address 0x77ab0a60, and in my tests the Initial Breakpoint is always raised at address 0x77aedbcf. Is this a function or just some random address with an INT 3 instruction? If I am not mistaken, ntdll.dll is always loaded at the same address; if so, do programs always break at this exact address, or is there variation?
A process begins executing in user mode from LdrInitializeThunk, which calls LdrpInitializeProcess. This routine, after loading all static dependencies but before calling their initialization routines, checks whether a debugger is present (the BeingDebugged member of the PEB) and, if so, calls LdrpDoDebuggerBreak, which contains the int 3 instruction. In a wow64 process LdrpDoDebuggerBreak is called twice, from the 64-bit and the 32-bit DLL. As a result, a 64-bit debugger gets two breakpoints: STATUS_BREAKPOINT and STATUS_WX86_BREAKPOINT.
How to handle this is up to the debugger itself. An interactive debugger simply stops here. Other debugging tools usually just skip (handle) the first STATUS_BREAKPOINT (and STATUS_WX86_BREAKPOINT) by returning DBG_CONTINUE.
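The "skip the first breakpoint" policy can be sketched as a small decision helper. This is a Python model of the logic only, not a working debugger loop: the class InitialBreakpointSkipper is hypothetical, while the numeric constants are the standard values from the Windows headers. A real debugger would also have to recognize breakpoints it planted itself, which this sketch ignores.

```python
STATUS_BREAKPOINT = 0x80000003
STATUS_WX86_BREAKPOINT = 0x4000001F
DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001


class InitialBreakpointSkipper:
    """Swallow the first occurrence of each loader breakpoint code, pass on the rest."""

    def __init__(self):
        # Codes whose first occurrence is attributed to LdrpDoDebuggerBreak.
        self._pending = {STATUS_BREAKPOINT, STATUS_WX86_BREAKPOINT}

    def continue_status(self, exception_code):
        """Return the status a debugger would pass to ContinueDebugEvent."""
        if exception_code in self._pending:
            self._pending.discard(exception_code)
            return DBG_CONTINUE
        # Anything else is left to the debuggee's own exception handlers.
        return DBG_EXCEPTION_NOT_HANDLED
```

One skipper instance per debugged process is assumed, so a second STATUS_BREAKPOINT in the same process is no longer treated as the initial one.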
I've just found out by accident that doing this GetModuleHandle("ntdll.dll") works without a previous call to LoadLibrary("ntdll.dll").
This means ntdll.dll is already loaded in my process.
Is it safe to assume that ntdll.dll will always be loaded on Win32 applications, so that a call to LoadLibrary is not necessary?
From MSDN on LoadLibrary() (emphasis mine):
The system maintains a per-process reference count on all loaded
modules. Calling LoadLibrary increments the reference count. Calling
the FreeLibrary or FreeLibraryAndExitThread function decrements the
reference count. The system unloads a module when its reference count
reaches zero or when the process terminates (regardless of the
reference count).
In other words, continue to call LoadLibrary() and ensure you get your handle to ntdll.dll to be safe -- but the system will almost certainly be bumping a reference count as it should already be loaded.
As for "is it really always loaded?", see Windows Internals on the Image Loader (the short answer is yes, ntdll.dll is part of the loader itself and is always present).
The relevant paragraph is:
The image loader lives in the user-mode system DLL Ntdll.dll and not in the kernel library. Therefore, it behaves just like standard code that is part of a DLL, and it is subject to the same restrictions in terms of memory access and security rights. What makes this code special is the guaranty that it will always be present in the running process (Ntdll.dll is always loaded) and that it is the first piece of code to run in user mode as part of a new application. (When the system builds the initial context, the program counter, or instruction pointer is set to an initialization function inside Ntdll.dll.)
I'm having a lot of trouble dealing with a DLL I've written in Delphi. I've set up a DllMain function using the following code in the library:
begin
DllProc := DllMain;
end.
My DllMain procedure looks like this:
procedure DllMain(reason: Integer);
begin
if reason = DLL_PROCESS_DETACH then
OutputDebugString('DLL PROCESS DETACH')
else if reason = DLL_PROCESS_ATTACH then
OutputDebugString('DLL PROCESS ATTACH')
else if reason = DLL_THREAD_ATTACH then
OutputDebugString('DLL THREAD ATTACH')
else if reason = DLL_THREAD_DETACH then
OutputDebugString('DLL THREAD DETACH')
else
OutputDebugString('DllMain');
end;
What I'm finding is that DETACH seems to be called (twice?!) by a caller (that I don't control) before ATTACH is ever called. Is that even possible, or am I misunderstanding how this is supposed to work? My expectation would be that every ATTACH call would be met with a matching DETACH call, but that doesn't appear to be the case.
What's goin' on here?!
Unfortunately, when begin is executed in your dll code, the OS has already called DllMain in your library. So when your DllProc := DllMain; statement executes, it is already too late. The Delphi compiler does not allow user code to execute when the dll is attached to a process. The suggested workaround (if you can call that a workaround) is to call your own DllMain function yourself in a unit initialization section or in the library code:
begin
DllProc := DllMain;
DllMain(DLL_PROCESS_ATTACH);
end.
The relevant documentation:
Note: DLL_PROCESS_ATTACH is passed to the procedure only if the DLL's initialization code calls the procedure and specifies DLL_PROCESS_ATTACH as a parameter.
What I'm finding is that DETACH seems to be called (twice?!) by a caller (that I don't control) before ATTACH is ever called.
According to "Programming Windows, 5th edition" by Petzold:
DLL_PROCESS_ATTACH gets called when the application starts.
DLL_THREAD_ATTACH gets called when a new thread inside an attached application is started.
DLL_PROCESS_DETACH gets called when an application attached to your DLL quits.
DLL_THREAD_DETACH gets called when a thread inside an attached application quits.
Note that it is possible for DLL_THREAD_DETACH to be called without a corresponding earlier DLL_THREAD_ATTACH.
This occurs when the thread was started prior to the application linking to the dll.
This mainly occurs when an application manually loads the dll using LoadLibrary instead of statically linking at compile time.
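A toy Python model shows how an unmatched DLL_THREAD_DETACH arises. ToyProcess and its methods are invented for illustration; the model assumes the documented rule that threads already running when a DLL is loaded do not receive DLL_THREAD_ATTACH for it.

```python
class ToyProcess:
    """Records which DllMain notifications a process would deliver (toy model)."""

    def __init__(self):
        self.threads = set()
        self.dlls = []
        self.log = []

    def start_thread(self, tid):
        self.threads.add(tid)
        # A new thread notifies every DLL that is already loaded.
        for dll in self.dlls:
            self.log.append((dll, "DLL_THREAD_ATTACH", tid))

    def exit_thread(self, tid):
        self.threads.discard(tid)
        # An exiting thread notifies every loaded DLL, whether or not
        # that DLL ever saw a matching attach for this thread.
        for dll in self.dlls:
            self.log.append((dll, "DLL_THREAD_DETACH", tid))

    def load_library(self, dll, tid):
        # Only DLL_PROCESS_ATTACH is sent, on the loading thread;
        # threads that already exist get no DLL_THREAD_ATTACH.
        self.dlls.append(dll)
        self.log.append((dll, "DLL_PROCESS_ATTACH", tid))
```

Starting threads 1 and 2, loading a DLL on thread 1, and then exiting thread 2 produces a DLL_THREAD_DETACH for thread 2 with no earlier DLL_THREAD_ATTACH in the log.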
Take a standard Windows application. It loads a DLL using LoadLibrary to call a function in it (we'll call this DLL_A). That function loads another DLL (we'll call it DLL_B). The application now unloads the DLL_A DLL using FreeLibrary as it no longer requires it.
The question is:
Is DLL_B still in memory and loaded?
Is this something I can depend upon, or is it undocumented?
Yes. DLL_B will not be unloaded. The LoadLibrary() call made by DLL_A will increment the load count for DLL_B. Since there is no corresponding FreeLibrary() call for DLL_B, the refcount will not go to zero.
From the LoadLibrary() docs:
The system maintains a per-process
reference count on all loaded modules.
Calling LoadLibrary increments the
reference count. Calling the
FreeLibrary or
FreeLibraryAndExitThread function
decrements the reference count. The
system unloads a module when its
reference count reaches zero or when
the process terminates (regardless of
the reference count).
You will have a handle leak in the case:
Program -Load> Dll A
               Dll A -Load> Dll B
Program -Unload> Dll A
No code is implicitly executed by a module being unloaded to unload the modules that it loaded.
Since no code is executed to decrease the reference count, the module B will never be unloaded.
Here are the rules for loading / unloading dlls:
Each call to LoadLibrary and LoadLibraryEx will increment the reference count for that module. This is in the context of the calling process only, not across process boundaries.
Each call to FreeLibrary or FreeLibraryAndExitThread will decrement the reference count.
When the reference count reaches 0, it will be unloaded.
When Windows sees that your program has closed, any leaked modules that are still loaded will then be unloaded.
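These rules can be checked against a minimal reference-count model. The Python sketch below is not the Windows loader; ToyLoader and its methods are hypothetical names, and it models only explicit LoadLibrary / FreeLibrary calls within one process.

```python
class ToyLoader:
    """Per-process reference counts for explicitly loaded modules (toy model)."""

    def __init__(self):
        self.ref_counts = {}

    def load_library(self, name):
        # Every load increments the count, whoever the caller is.
        self.ref_counts[name] = self.ref_counts.get(name, 0) + 1
        return name

    def free_library(self, name):
        if name not in self.ref_counts:
            return False
        self.ref_counts[name] -= 1
        if self.ref_counts[name] == 0:
            # The module is unloaded only when its own count reaches zero.
            del self.ref_counts[name]
        return True

    def is_loaded(self, name):
        return name in self.ref_counts
```

Loading DLL_A, loading DLL_B (as DLL_A's function would), and then freeing only DLL_A leaves DLL_B with a count of 1, which is the leak described above.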
Depending on what you are doing, DllCanUnloadNow might be useful to you.
Still in memory vs still loaded:
There is no guarantee that your module will be released from memory at a certain time when the reference count reaches 0. But you should treat the module as unloaded once the reference count reaches 0.
Stopping the DLL from being unloaded:
To prevent the DLL from being unloaded you could try one of the following:
The system calls DllMain with the DLL_PROCESS_DETACH flag. You could try to not return from this via some kind of blocking operation.
You could try to call LoadLibrary from within the DLL that you want to not be able to unload. (Self load)
Edit:
You mentioned that your goal is to inject code into the running program and that you wanted to leak the handle on purpose.
That is fine, but if you run this operation a lot it can lead to a crash in your source Program because too many handles will be used, or eventually too much memory will be used.
You can return FALSE from your DllMain to stop it from being loaded so that you don't waste memory. You do this when fdwReason is DLL_PROCESS_ATTACH. You can read more about it here.
If you are trying to emulate a DLL and add in your own extra functionality, you will need to implement all of the functions that the source DLL implements and delegate each call back to the source DLL.
Read the Remarks section for a detailed explanation.
The key thing to note is:
The system maintains a per-process reference count for each loaded module
and further down
When a module's reference count reaches zero or the process terminates, the system unloads the module from the address space of the process
From MSDN:
Frees the loaded dynamic-link library (DLL) module and, if necessary, decrements its reference count. When the reference count reaches zero, the module is unloaded from the address space of the calling process and the handle is no longer valid.
DLLs in Windows are reference counted. When A is unloaded, you are decrementing the reference count on A; if it hits zero, A will unload and (assuming no bugs in the code) decrement the reference count on B. If the refcount on B goes to zero, B will then be unloaded. It is possible that DLL C also holds a reference on B, in which case unloading A will not unload B.