Delphi DllMain DLL_PROCESS_DETACH called before DLL_PROCESS_ATTACH - windows

I'm having a lot of trouble dealing with a DLL I've written in Delphi. I've set up a DllMain function using the following code in the library:
begin
DllProc := DllMain;
end.
My DllMain procedure looks like this:
procedure DllMain(reason: Integer);
begin
if reason = DLL_PROCESS_DETACH then
OutputDebugString('DLL PROCESS DETACH')
else if reason = DLL_PROCESS_ATTACH then
OutputDebugString('DLL PROCESS ATTACH')
else if reason = DLL_THREAD_ATTACH then
OutputDebugString('DLL THREAD ATTACH')
else if reason = DLL_THREAD_DETACH then
OutputDebugString('DLL THREAD DETACH')
else
OutputDebugString('DllMain');
end;
What I'm finding is that DETACH seems to be called (twice?!) by a caller (that I don't control) before ATTACH is ever called. Is that even possible, or am I misunderstanding how this is supposed to work? My expectation would be that every ATTACH call would be met with a matching DETACH call, but that doesn't appear to be the case.
What's goin' on here?!

Unfortunately when begin is executed in your dll code, the OS has already called DllMain in your library. So when your DllProc := DllMain; statement executes it is already too late. The Delphi compiler does not allow user code to execute when the dll is attached to a process. The suggested workaround (if you can call that a workaround) is to call your own DllMain function yourself in a unit initalization section or in the library code:
begin
DllProc := DllMain;
DllMain(DLL_PROCESS_ATTACH);
end;
The relevant documentation:
Note: DLL_PROCESS_ATTACH is passed to the procedure only if the DLL's initialization code calls the procedure and specifies DLL_PROCESS_ATTACH as a parameter.

What I'm finding is that DETACH seems to be called (twice?!) by a caller (that I don't control) before ATTACH is ever called.
According to "Programming Windows 5th edition" by Petzold.
DLL_PROCESS_ATTACH gets called when the application starts and
DLL_THREAD_ATTACH when a new thread inside an attached application is started.
DLL_PROCESS_DETACH gets called when an application attached to your application quits.
DLL_THREAD_DETACH gets called when a thread inside an attached application quits.
Note that it is possible for DLL_THREAD_DETACH to be called without a corresponsing earlier DLL_THREAD_ATTACH.
This occurs when the thread was started prior to the application linking to the dll.
This mainly occurs when an application manually loads the dll using LoadLibrary instead of statically linking at compile time.

Related

How mingw32-g++ compiler know where to inject system calls in the WIN32 machine executable?

Today, only for the testing purposes, I came with the following idea, to create and compile a naive source code in CodeBlocks, using Release target to remove the unnecessary debugging code, a main function with three nop operations only to find faster where the entry point for the main function is.
CodeBlocks sample naive program:
Using IDA disassembler, I have seen something strange, OS actually can add aditional machine code calls in the main function (added implicitly), a call to system function which reside in kernel32.dll what is used for OS thread handling.
IDA program view:
In the machine code only for test reason the three "nop" (90) was replaced by "and esp, 0FFFFFFF0h", program was re-pached again, this is why "no operation" opcodes are not disponible in the view.
Observed behaviour:
It is logic to create a new thread for each process is opened, as we can explore it in the TaskManager, a process run in it's own thread, that is a reason why compiler add this code (the implicit default thread).
My questions:
How compiler know where to "inject" this call code automatically?
Why this call is not made before in the upper function (sub_401B8C) which will route to main function entry point?
To quote the gcc manual:
If no init section is available, when GCC compiles any function called
main (or more accurately, any function designated as a program entry
point by the language front end calling expand_main_function), it
inserts a procedure call to __main as the first executable code after
the function prologue. The __main function is defined in libgcc2.c and
runs the global constructors.

VCL/LCL – a form in DLL – no Application taskbar window, cannot minimize the main form

I have one problem, and I tried to search a solution but can't achieve what I want. Sorry if that is actually simple, please just point me to correct way of how to do it.
So! I have a C program that is a loader. It must call my DLL written in Delphi or Lazarus (Free Pascal). The DLL is actually a standalone GUI application: during debugging I conditionally compile it as EXE and it working.
My build script compiles it as DLL with one entry point that must execute it just as it works standalone. I expect exactly the same behavior, but I can do some things different (especially setting the Application icon) if needed.
Loader is a console-style program but compiled without a console – no windows, no anything. It just loads DLL and calls a function.
Problem is that when I build even empty default project with one form as an EXE – it will actually have "master" Application (.Handle <> 0) window in taskbar. So I can set its title independently from main form caption.
But when the same thing is inside a DLL – there is no Application window (.Handle = 0), the title will be the form caption, but the most important bug: a form cannot be minimized!
In Delphi 7 it goes background under other windows (but taskbar thing stays!); in Lazarus it just minimizes to nowhere (hided, no way to restore anymore); both without any minimizing animation.
Other than that, my application seems to behave normally. This is only issue I have.
OK, I know that forms in libraries is a bad thing to do, but:
I’m fine to instantiate "another" VCL completely independent from host’s instance, maybe even in different thread.
There is no VCL in my particular host application! For me, it must work exactly as it will in EXE alone…
I searched something about Application.Handle in DLL, and now understand than I need to pass a handle to host’s Application object, so DLL will be joined with others host forms, but I have none! It’s even not Delphi… (and Application:=TApplication.Create(nil); didn’t help either)
Anything of following will probably help me:
A) How to instruct VCL to create a normal Application object for me? How it does it when in EXE, maybe I can copy that code?
B) How to create a suitable master window from C (proper styles, etc.) to pass it’s handle to DLL? Also, I believe, in Free Pascal there is no direct access to TApplication handle value, so I couldn’t probably assign it.
C) How to live without a taskbar window, but have my form (good news: my program has only one form!) to minimize correctly (or just somehow…)?
I now you all love to see some code, so here it is:
// default empty project code, produces valid working EXE:
program Project1;
uses Forms, Unit1 in 'Unit1.pas' {Form1};
{$R *.res}
begin
Application.Initialize;
Application.CreateForm(TForm1, Form1);
Application.Run;
end.
+
// that's how I tried to put it in a DLL:
library Project1;
uses Forms, Unit1 in 'Unit1.pas' {Form1};
{$R *.res}
function entry(a, b, c, d: Integer): Integer; stdcall;
begin
Application.Initialize;
Application.CreateForm(TForm1, Form1);
Application.Run;
Result := 0;
end;
exports
entry;
begin
end.
I specially crafted entry() function to be callable with rundll32, just for testing.
Also, I tried to put the body directly to "begin end." initialization section – same wrong behavior.
// To call a DLL, this can be used:
program Project1;
function entry(a, b, c, d: Integer): Integer; stdcall; external 'Project1.dll';
begin
entry(0, 0, 0, 0);
end.
Also, CMD-command "rundll32 project1.dll entry" will run it instantly. (Yeah, that way I might get a handle that Rundll gives me, but it isn’t what I want anyway.)
Last notes: (a) the DLL must be compiled in Lazarus; actually first thing I thought that it is a bug in LCL, but now when tested in Delphi7 I see the same; and since Delphi case is more simpler and robust, I decided to put here that; (b) my C loader doesn’t call LoadLibrary, it uses TFakeDLL hack (that OBJ file was tweaked to work without Delphi wrapper) and loads my DLL from memory (so I don’t have a handle to DLL itself), but otherwise their behavior is the same.
Okay, thanks to #Sertac Akyuz, I tried with .ShowModal:
// working Delphi solution:
library Project1;
uses Forms, Dialogs, SysUtils, Unit1 in 'Unit1.pas' {Form1};
{$R *.res}
function entry(a, b, c, d: Integer): Integer; stdcall;
begin
Result := 0;
Application.Initialize;
Form1 := TForm1.Create(nil);
try
Form1.ShowModal;
except
on e: Exception do
ShowMessage(e.message);
end;
Form1.Free;
end;
exports
entry;
begin
end.
There is still no Application window (taskbar title equal to form caption), but now my form can be successfully minimized (with system animation). Note that for EXE compilation I have to go default way with Application, because when I tried to create the form like this – it started to minimize to desktop (iconify) instead of the taskbar.
It works perfect in empty default Lazarus project too. But when I tried to implement it to my production program, it gave me "Disk Full" exception at .ShowModal!
That thing was frustrating me little earlier (and that’s why I got rid of modality altogether, tried it no more), but now I was determined enough to get the bottom of this.
And I found the problem! My build script doesn’t pass "-WG" ("Specify graphic type application") compiler option. Looks like something in LCL was using console-alike mode, and modality loop failed for some reason.
Then, I had another very confusing issue that I want to share. My form’s OnCreate was rather big and complex (even starting other threads), and some internal function give me access violation when tried to do some stuff with one of controls on the form. It looked like the control is not constructed yet, or the form itself…
Turns out that the actual call Form1:=TForm1.Create(nil); obviously will leave the global variable "Form1" unassigned during FormCreate event. The fix was simple: to add Form1:=Self; in the beginning of TForm1.FormCreate(Sender: TObject);
Now everything is working without any problems. I can even use other forms with a normal Form2.Show(); if I firstly add them to my entry() function, like Form2:=TForm2.Create(Form1);
(edit: minor note, if you would use Lazarus and try to run entry() function from any different thread than one that loaded DLL library itself – then you should put MainThreadID:=GetCurrentThreadId(); just above Application.Initialize;)
Yay, this question is solved!

What Does Windows Do Before Main() is Called?

Windows must do something to parse the PE header, load the executable in memory, and pass command line arguments to main().
Using OllyDbg I have set the debugger to break on main() so I could view the call stack:
It seems as if symbols are missing so we can't get the function name, just its memory address as seen. However we can see the caller of main is kernel32.767262C4, which is the callee of ntdll.77A90FD9. Towards the bottom of the stack we see RETURN to ntdll.77A90FA4 which I assume to be the first function to ever be called to run an executable. It seems like the notable arguments passed to that function are the Windows' Structured Exception Handler address and the entry point of the executable.
So how exactly do these functions end up in loading the program into memory and getting it ready for the entry point to execute? Is what the debugger shows the entire process executed by the OS before main()?
if you call CreateProcess system internally call ZwCreateThread[Ex] to create first thread in process
when you create thread - you (if you direct call ZwCreateThread) or system initialize the CONTEXT record for new thread - here Eip(i386) or Rip(amd64) the entry point of thread. if you do this - you can specify any address. but when you call say Create[Remote]Thread[Ex] - how i say - the system fill CONTEXT and it set self routine as thread entry point. your original entry point is saved in Eax(i386) or Rcx(amd64) register.
the name of this routine depended from Windows version.
early this was BaseThreadStartThunk or BaseProcessStartThunk (in case from CreateProcess called) from kernel32.dll.
but now system specify RtlUserThreadStart from ntdll.dll . the RtlUserThreadStart usually call BaseThreadInitThunk from kernel32.dll (except native (boot execute) applications, like smss.exe and chkdsk.exe which no have kernel32.dll in self address space at all ). BaseThreadInitThunk already call your original thread entry point, and after (if) it return - RtlExitUserThread called.
the main goal of this common thread startup wrapper - set the top level SEH filter. only because this we can call SetUnhandledExceptionFilter function. if thread start direct from your entry point, without wrapper - the functional of Top level Exception Filter become unavailable.
but whatever the thread entry point - thread in user space - NEVER begin execute from this point !
early when user mode thread begin execute - system insert APC to thread with LdrInitializeThunk as Apc-routine - this is done by copy (save) thread CONTEXT to user stack and then call KiUserApcDispatcher which call LdrInitializeThunk. when LdrInitializeThunk finished - we return to KiUserApcDispatcher which called NtContinue with saved thread CONTEXT - only after this already thread entry point begin executed.
but now system do some optimization in this process - it copy (save) thread CONTEXT to user stack and direct call LdrInitializeThunk. at the end of this function NtContinue called - and thread entry point being executed.
so EVERY thread begin execute in user mode from LdrInitializeThunk. (this function with exactly name exist and called in all windows versions from nt4 to win10)
what is this function do ? for what is this ? you may be listen about DLL_THREAD_ATTACH notification ? when new thread in process begin executed (with exception for special system worked threads, like LdrpWorkCallback)- he walk by loaded DLL list, and call DLLs entry points with DLL_THREAD_ATTACH notification (of course if DLL have entry point and DisableThreadLibraryCalls not called for this DLL). but how this is implemented ? thanks to LdrInitializeThunk which call LdrpInitialize -> LdrpInitializeThread -> LdrpCallInitRoutine (for DLLs EP)
when the first thread in process start - this is special case. need do many extra jobs for process initialization. at this time only two modules loaded in process - EXE and ntdll.dll . LdrInitializeThunk
call LdrpInitializeProcess for this job. if very briefly:
different process structures is initialized
loading all DLL (and their dependents) to which EXE statically
linked - but not call they EPs !
called LdrpDoDebuggerBreak - this function look - are debugger
attached to process, and if yes - int 3 called - so debugger
receive exception message - STATUS_BREAKPOINT - most debuggers can
begin UI debugging only begin from this point. however exist
debugger(s) which let as debug process from LdrInitializeThunk -
all my screenshots from this kind debugger
important point - until in process executed code only from
ntdll.dll (and may be from kernel32.dll) - code from another
DLLs, any third-party code not executed in process yet.
optional loaded shim dll to process - Shim Engine initialized. but
this is OPTIONAL
walk by loaded DLL list and call its EPs with
DLL_PROCESS_DETACH
TLS Initializations and TLS callbacks called (if exists)
ZwTestAlert is called - this call check are exist APC in thread
queue, and execute its. this point exist in all version from NT4 to
win 10. this let as for example create process in suspended state
and then insert APC call ( QueueUserAPC ) to it thread
(PROCESS_INFORMATION.hThread) - as result this call will be
executed after process will be fully initialized, all
DLL_PROCESS_DETACH called, but before EXE entry point. in context
of first process thread.
and NtContinue called finally - this restore saved thread context
and we finally jump to thread EP
read also Flow of CreateProcess

Why is ExitProcess necessary under Win32 when you can use a RET?

I've noticed that many assembly language examples built using straight Win32 calls (no C Runtime dependency) illustrate the use of an explicit call to ExitProcess() to end the program at the end of the entry-point code. I'm not talking about using ExitProcess() to exit at some nested location within the program. There are surprisingly fewer examples where the entry-point code simply exits with a RET instruction. One example that comes to mind is the famous TinyPE, where the program variations exit with a RET instruction, because a RET instruction is a single byte. Using either ExitProcess() or a RET both seem to do the job.
A RET from an executable's entry-point returns the value of EAX back to the Windows loader in KERNEL32, which ultimately propagates the exit code back to NtTerminateProcess(), at least on Windows 7. On Windows XP, I think I remember seeing that ExitProcess() was even called directly at the end of the thread-cleanup chain.
Since there are many respected optimizations in assembly language that are chosen purely on generating smaller code, I wonder why more code floating around prefers the explicit call to ExitProcess() rather than RET. Is this habit or is there another reason?
In its purest sense, wouldn't a RET instruction be preferable to a direct call to ExitProcess()? A direct call to ExitProcess() seems akin to exiting your program by killing it from the task manager as this short-circuits the normal flow of returning back to where the Windows loader called your entry-point and thus skipping various thread cleanup operations?
I can't seem to locate any information specific to this issue, so I was hoping someone could shed some light on the topic.
If your main function is being called from the C runtime library, then exiting will result in a call to ExitProcess() and the process will exit.
If your main function is being called directly by Windows, as may well be the case with assembly code, then exiting will only cause the thread to exit. The process will exit if and only if there are no other threads. That's a problem nowadays, because even if you didn't create any threads, Windows may have created one or more on your behalf.
As far as I know this behaviour is not properly documented, but is described in Raymond Chen's blog post, "If you return from the main thread, does the process exit?".
(I have also tested this myself on both Windows 7 and Windows 10 and confirmed that they behaved as Raymond describes.)
Addendum: in recent versions of Windows 10, the process loader is itself multi-threaded, so there will always be additional threads present when the process first starts.

DllMain DLL_PROCESS_DETACH and GetMessage Function reentrancy

I have written a global hook that hooks using SetWindowsHookEx the WH_GETMESSAGE, WH_CALLWNDPROC and WH_CALLWNDPROCRET.
The hook dll creates a new thread in the hooked process, which, among other things, checks the audio state of the process and calls IAudioSessionManager2::GetSessionEnumerator().
Now the interesting part, I had called UnhookWindowsHookEx() from the hook host AND during the time my dll's worker thread was running the call to IAudioSessionManager2::GetSessionEnumerator(). That call was in the same thread's call stack, where the DllMain with DLL_PROCESS_DETACH was invoked.
I assume the reason was, that GetSessionEnumerator() invokes GetMessage() function somewhere and the latter is reentrant. Unfortunately I do not remember precisely, but I think I saw that in the call stack.
But there are multiple important things I wonder about and things that remain unclear. So here come my related questions:
Can DllMain with DLL_PROCESS_DETACH invoked any time, even in a thread which runs functions from that dll which is currently being unloaded?
What happens with the functions up the stack when DllMain DLL_PROCESS_DETACH exits? Will the code in the functions up the call stack execute eventually?
What if these functions do not exit? When will the dll be unloaded?
Can the DllMain DLL_PROCESS_DETACH be similarly invoked during the callbacks for WH_GETMESSAGE, WH_CALLWNDPROC and WH_CALLWNDPROCRET hooks? I know and have experimentally confirmed that sometimes, although not too often, these functions are reentrant, so the calls to these functions can be injected during the time previous call is still running in the same stack, but I do not know whether also calls to DllMain can be injected in a similar manner.
When precisely the DllMain can be invoked in a thread - are there some specific Windows API functions that need to be called and which in turn may lead to DllMain DLL_PROCESS_DETACH call, or could it happen at any instruction?
If the DllMain DLL_PROCESS_DETACH call can be "injected" at any time AND functions up the call stack do not get executed anymore, then how do I know where precisely was the function up the call stack interrupted? So I could inside DllMain release some handles or resources allocated by the function up the stack.
Is there any way to temporarily prevent/postpone calls to DllMain DLL_PROCESS_DETACH? Locks obviously do not help if the call/interruption occurs in the same stack.
Unfortunately I probably cannot experimentally solve these questions since I have had my hooking (and also unhooking) code running on multiple computers for months before such situation with DllMain occured during unhooking. Though for some reason it then occured with four different programs at once...
Also please, would someone with enough reputation like to merge the tags "reentrant" and "reentrancy"?
So thanks to Hans I now know regarding point (4) that DllMain DLL_PROCESS_DETACH will not be reentrant with hook procedures.
Regarding the thread that the hook created, my log files currently indicate that if DllMain DLL_PROCESS_DETACH is injected into the stack of this thread then that thread indeed will be terminated after DllMain exits and will NOT run to completion. That should answer points (2) and (3). The question itself implicitly answers the point (1).
But for solving the problem for that thread the hook created, I assume that the DllMain DLL_PROCESS_DETACH can be prevented by calling
GetModuleHandleEx
(
GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS,
(LPCTSTR)DllMain,
&hModule_thread
)
and before the thread termination calling
FreeLibraryAndExitThread(hModule_thread, 0)
So using GetModuleHandleEx should answer the point (7), which in turn makes all other points irrelevant. Of course I have to use some IPC to trigger the thread termination in the hooked processes.
The remaining interesting question is point (5), but it is just out of curiosity:
"When precisely the DllMain DLL_PROCESS_DETACH can be invoked in a thread - are there some specific Windows API functions that need to be called and which in turn may lead to DllMain DLL_PROCESS_DETACH call, or could it happen at any instruction?"

Resources