Does using LoadLibrary affect performance?

When you dynamically load a library at runtime using LoadLibrary on Windows (C++), is it loaded into memory the same way as the rest of your program, or might there be some overhead associated with calling functions from that library?
In other words, if you plan on making frequent calls to a function, will it be just as fast from the library as it would be if you linked it into your program at compile time, or do you lose some performance?
(This is not related to libraries that link to or against a program during compile-time via .lib/.a files.)

Once the DLL is loaded and the function pointer variable has been initialized by GetProcAddress, there is no additional overhead in the function call.
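As a rough illustration of that pattern, the sketch below resolves a function once and then calls it through the cached pointer. The helper names (load_strlen, dynamic_strlen) and the choice of library and function (strlen from msvcrt.dll) are assumptions made for the example, not part of the question. The non-Windows branch uses dlopen/dlsym and assumes glibc's libc.so.6, only so that the same sketch can be built outside Windows.

```cpp
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

// Signature of the function we expect to find in the library.
using StrlenFn = std::size_t (*)(const char*);

// Hypothetical helper: loads the library and resolves one symbol.
// Library and symbol names are illustrative assumptions.
StrlenFn load_strlen() {
#ifdef _WIN32
    HMODULE lib = LoadLibraryA("msvcrt.dll");
    if (!lib) return nullptr;
    return reinterpret_cast<StrlenFn>(GetProcAddress(lib, "strlen"));
#else
    void* lib = dlopen("libc.so.6", RTLD_NOW);
    if (!lib) return nullptr;
    return reinterpret_cast<StrlenFn>(dlsym(lib, "strlen"));
#endif
}

// Hypothetical wrapper: the lookup runs once, on first use;
// later calls go through the cached pointer.
std::size_t dynamic_strlen(const char* s) {
    static const StrlenFn fn = load_strlen();
    return fn ? fn(s) : 0;  // returns 0 if the symbol could not be resolved
}
```

After the first call, each dynamic_strlen call is an indirect call through a pointer, which is broadly comparable to how a call into a load-time-linked DLL goes through the import address table.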

Related

How does a JIT compiler compile a function that contains a call to another non-jitted function or virtual call

I understand what a JIT compiler does: it uses counters to detect hot code, compiles it, and stores the result in the code cache so that the function does not need to be interpreted the next time. But this brings me to two questions:
First, how does a JIT compiler compile a function that contains calls to other functions that have not been jitted? The jitted function, which is native code, has to call an outside function that is still supposed to be executed by the interpreter.
The second question is a variant of the first: how does a JIT compiler compile a function that contains a virtual call? It seems that it has to hard-code the vtable lookup procedure into the native code. But if the vtable lives inside the virtual machine, the generated native code needs some way to access it. If the virtual machine is implemented in a language like C or C++, maybe the memory address of the vtable can be hard-coded into the native code. How can I do the same if the virtual machine is implemented in a managed language whose users are discouraged from, or simply prevented from, taking the address of an object? And even if I can get the address, the object might be moved during the compaction phase of the garbage collector.

Safe place to put unsafe DLL cleanup code on Windows?

We hit a case where it would be the best solution for us to put a FreeLibrary call into DllMain / DLL_PROCESS_DETACH.
Of course, you must not do that:
It is not safe to call FreeLibrary from DllMain.
The use case is that we have a situation like this:
(unknown client dll or exe) links dynamically or statically to ->
-> DLL_1, loads dynamically -> DLL_x
DLL_1 should load DLL_x transparently with respect to its client code, and it should load DLL_x dynamically. Now, the loading can be done lazily, so the LoadLibrary call needn't reside in the DLL_PROCESS_ATTACH part of DLL_1.
But once the client is done with DLL_1, when/before DLL_1 is unloaded from the process, it should also unload (== FreeLibrary) DLL_x.
Is there any way to do this without an explicit DLL_1/Uninitialize function that must be called by the client?
I'll note:
DllMain, and thus also any C++ global static destructor, cannot be used.
Is there any other callback mechanism in either kernel32/ntdll or maybe in the shared MS CRT to make this happen?
Are there other patterns to make this use case work?
The correct approach is an explicit Uninitialize function in DLL_1.
However, if you can't do that, you can work around the problem by launching a helper thread to do the unload for you. If you want to play it safe, launch the thread at the same time you load DLL_x and have it wait on an event object. (For the record, though, it is generally considered safe to launch a thread from DllMain so long as you respect the fact that it won't start up until DllMain has exited.)
Obviously, the helper thread's code can't be in DLL_1. If you can modify DLL_x you can put it there. If not, you'll need a helper DLL. In either case, the DLL containing the helper thread's code can safely self-unload using the FreeLibraryAndExitThread function.

How much of shared object is loaded to memory

Suppose there is a shared object file, libComponent.so, built from two object files, Component_1.o and Component_2.o.
An application links to libComponent.so but only uses functions from Component_1.o.
When the application runs, will the entire shared object (libComponent.so) be loaded into memory, or just Component_1.o?
Is there a gcc option to toggle this behaviour, so that only the required symbols are loaded from a shared object?
Well, it depends on what you mean by 'loaded'.
The dynamic linker will map all of the library into the process's virtual memory space and will fill in entries in the executable's import table for each library function used with the addresses of functions in the shared library. But filling in the import table doesn't actually load from those addresses, so they won't be loaded into physical memory.
From then on, the library code will be paged into physical memory on demand when the function is called, just like any other pageable memory in the process's virtual address space. If a function is never called (directly from the application or indirectly from another library function called by the application), it won't be paged in. (Well, paging occurs with page size granularity, so you might pull in a function the application doesn't call if it's next to a function it does call. Some compilers use profile-guided optimization to place functions commonly called together next to each other to minimize the number of pages used.)
(Aside: if your library wasn't compiled as position-independent code and it's loaded at an address other than its default base address, the linker will need to fix up addresses in the code when it's loaded, which would cause the entire library to be paged in. This could be done lazily when each page is first loaded, though I'm not sure which linkers do this.)

Loading/calling ntdll from DllMain

One should not use functions other than those in kernel32.dll from DllMain:
From MS documentation:
Because Kernel32.dll is guaranteed to be loaded in the process address space when the entry-point function is called, calling functions in Kernel32.dll does not result in the DLL being used before its initialization code has been executed. Therefore, the entry-point function can call functions in Kernel32.dll that do not load other DLLs. For example, DllMain can create synchronization objects such as critical sections and mutexes, and use TLS. Unfortunately, there is not a comprehensive list of safe functions in Kernel32.dll.
...
Calling functions that require DLLs other than Kernel32.dll may result in problems that are difficult to diagnose. For example, calling User, Shell, and COM functions can cause access violation errors, because some functions load other system components. Conversely, calling functions such as these during termination can cause access violation errors because the corresponding component may already have been unloaded or uninitialized.
My question:
But the documentation does not mention ntdll.dll. Can I call LoadLibrary for "ntdll" and use functions in ntdll from DllMain:
1) during DLL_PROCESS_ATTACH (load and use functions of ntdll)?
2) during DLL_PROCESS_DETACH (use functions of previously loaded ntdll)?
The answer to the question "is it safe in DllMain" always defaults to "no". In this case, calling LoadLibrary is never okay.
Generally speaking, calling anything in ntdll.dll is not recommended, even in places where it is safe to do so.

An impasse with hooking calls to HeapAlloc for a memory tracking application

I am writing a memory tracking application that hooks all calls to HeapAlloc using the IAT patching mechanism. The idea is to capture every call to HeapAlloc and get a call stack.
However, I am currently facing a problem with getting the call stack using the DbgHelp APIs. I found that the DbgHelp DLL itself links to the MSVCRT DLL, and this dependency results in a recursive call. When I try to get a call stack for any call from the target application, DbgHelp internally calls some function in MSVCRT that in turn calls HeapAlloc. Since I have already patched MSVCRT, this results in an infinite loop.
Has anyone faced this problem and solved it? Is there any way out of this impasse?
This is a standard problem in function interception code. We had a similar issue with a logging library that used shared memory for storing log level information, while the shared memory library had to log information.
The way we fixed it could be applied to your situation, I believe.
In your intercept code, maintain a static flag that indicates whether or not you're in the middle of an intercept. When your intercept is called and the flag isn't set, set the flag then do what you currently do, including calling DbgHelp, then clear the flag.
If your intercept is called while the flag is set, only call the back-end HeapAlloc code without doing any of the other stuff (including calling DbgHelp which is what's causing your infinite recursion).
Something along the lines of (pseudo-code):
function MyHookCode:
    static flag inInterceptMode = false
    if inInterceptMode:
        call HeapAlloc
        return
    inInterceptMode = true
    call DbgHelp stuff
    call HeapAlloc
    inInterceptMode = false
    return
function main:
    hook HeapAlloc with MyHookCode
    : : :
    return
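The pseudo-code above might translate to C++ roughly as follows. This is a sketch under assumptions: backend_alloc and capture_stack are hypothetical stand-ins for the saved original HeapAlloc pointer and the DbgHelp stack walk, so the example runs without any actual hooking. It also uses a thread_local flag rather than a plain static one, which is a deliberate substitution, since HeapAlloc can be called from several threads at once.

```cpp
#include <cstddef>
#include <cstdlib>

// Counters used only to observe the behaviour of the sketch.
static int g_backend_calls = 0;  // times the "real" allocator ran
static int g_traces_taken = 0;   // times a stack trace was captured

void* hooked_alloc(std::size_t size);  // the intercept, defined below

// Hypothetical stand-in for the original HeapAlloc saved before patching.
void* backend_alloc(std::size_t size) {
    ++g_backend_calls;
    return std::malloc(size);
}

// Hypothetical stand-in for the DbgHelp stack walk. Like DbgHelp via
// MSVCRT, it allocates through the hooked path and re-enters the hook.
void capture_stack() {
    ++g_traces_taken;
    void* scratch = hooked_alloc(64);
    std::free(scratch);
}

// The intercept: tracks the allocation unless it is already running
// on this thread, in which case it only forwards to the back end.
void* hooked_alloc(std::size_t size) {
    thread_local bool in_hook = false;  // assumption: per-thread guard
    if (in_hook) {
        return backend_alloc(size);     // nested call: no tracking
    }
    in_hook = true;
    capture_stack();                    // may allocate; guard stops recursion
    void* result = backend_alloc(size);
    in_hook = false;
    return result;
}
```

With this guard, one outer call to hooked_alloc takes a single trace, and the allocation made while tracing goes straight to the back end instead of recursing. Allocations made by the tracing code itself are not tracked, which is the trade-off of this approach.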
What about using some real memory tracking products like GlowCode?
You can use Deviare API Hook to get a full stack trace without using that API, which has a large number of problems.

Resources