Internal mechanism of dynamic DLL loading in C++ from the OS perspective? (Windows)

I haven't been able to find much information about dynamically loading DLL files from C++.
I know it uses functions like LoadLibrary, FreeLibrary and GetProcAddress. But how does it actually work internally, from the OS perspective: where does it look for the DLL file, and where in memory does it load it? Can someone help me with that, ideally with some diagrams?

The DLL search order is described on MSDN, and there's an article on DLL loading and a two-part article describing the PE format (they're slightly old, but I don't think they're outdated). Look through the MSDN Magazine and MSJ archives and you'll probably find more.

There are two ways to use a DLL. You can load it dynamically at run time, or link against it statically at link time (via its import library).
If you load it dynamically using LoadLibrary, the OS has a mechanism for deciding where to look for the DLL. It then attempts to load it. You can then retrieve pointers to the functions you name (by name string or by ordinal) and call those functions.
If you link statically, the linker basically adds a reference to the DLL and a jump table (the import address table) with an entry for each function you use from the DLL. When the OS loads your application, it finds the references to those DLLs, attempts to load them, and patches the addresses of the DLL's functions into the jump table. Only then is your application considered loaded, and it starts running.
Note that in reality this is a bit more complicated. For example, DLLs can in turn reference other DLLs, so before the loader can consider a DLL loaded, it may need to (possibly recursively) load other DLLs as well.
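The load-time steps described above (load each imported DLL, recursively loading its own dependencies, then patch the jump table) can be sketched as a toy model in portable C++. Everything here is a hypothetical stand-in: `ToyImage` and `ToyLoader` are not Windows types, the "disk" is just a map, and the real loader works on the PE import/export directories and does much more (search paths, relocations, TLS, DllMain calls, unwinding on failure).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of implicit (load-time) linking.
using Fn = int (*)(int);

struct ToyImage {
    std::map<std::string, Fn> exports;                         // "export table"
    std::vector<std::pair<std::string, std::string>> imports;  // (dll, symbol)
};

struct ToyLoader {
    std::map<std::string, ToyImage> disk;        // modules the loader can find
    std::set<std::string> loaded;                // modules already "mapped"
    std::map<std::string, std::vector<Fn>> iat;  // per-module patched import slots

    // Load a module: first (recursively) load every DLL it imports from,
    // then patch each import slot with the exporting DLL's address.
    void load(const std::string& name) {
        if (loaded.count(name)) return;          // already loaded: nothing to do
        auto it = disk.find(name);
        if (it == disk.end()) throw std::runtime_error("missing DLL: " + name);
        loaded.insert(name);                     // mark early so import cycles terminate
        std::vector<Fn> slots;
        for (const auto& [dll, sym] : it->second.imports) {
            load(dll);                           // dependency before dependent
            const auto& dll_exports = disk.at(dll).exports;
            auto e = dll_exports.find(sym);
            if (e == dll_exports.end())
                throw std::runtime_error("missing export: " + dll + "!" + sym);
            slots.push_back(e->second);          // "patch the jump table"
        }
        iat[name] = std::move(slots);
    }
};
```

A missing DLL or export makes `load` throw, which mirrors the real behavior that the process never starts if any static import cannot be resolved.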

For Win32, the loader details are documented on MSDN.
From your C++ code you're right (for Windows): you load with ::LoadLibrary and resolve function pointers with ::GetProcAddress. Typically you cast the result of GetProcAddress to the function-pointer type you know the entry point to have, and then use it in your program.
For example, if you have a plug-in architecture like a browser, you'd decide what your plug-in directory is, get the filename list for that directory, and call ::LoadLibrary for each DLL (filtering filenames would be up to you). For each, you'd resolve the required entry points with GetProcAddress, store them in a structure for that library, and put them in some plug-in list. Later, you'd call through those function pointers to let the plug-in do its work.
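A minimal sketch of that plug-in pattern, written portably so it runs without Windows: `FakeModule` and `fake_get_proc_address` are hypothetical stand-ins for an `HMODULE` and `::GetProcAddress`, and the `plugin_init`/`plugin_run` contract is invented for illustration. The part that carries over to real code is the cast: GetProcAddress returns a generic `FARPROC`, which you `reinterpret_cast` to the signature you expect before calling it, e.g. `reinterpret_cast<RunFn>(::GetProcAddress(h, "plugin_run"))`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for FARPROC: a generic function pointer the caller must cast.
using GenericProc = void (*)();

// Hypothetical in-memory "module" with a name -> address export table.
struct FakeModule {
    std::string file;
    std::map<std::string, GenericProc> exports;
};

// Toy analogue of GetProcAddress: nullptr when the name is not exported.
GenericProc fake_get_proc_address(const FakeModule& m, const std::string& name) {
    auto it = m.exports.find(name);
    return it == m.exports.end() ? nullptr : it->second;
}

// Entry points every plug-in must export (hypothetical contract).
using InitFn = int (*)();
using RunFn = int (*)(int);

struct Plugin {
    std::string file;
    InitFn init;
    RunFn run;
};

// Resolve the required entry points; skip modules missing any of them.
std::vector<Plugin> discover(const std::vector<FakeModule>& modules) {
    std::vector<Plugin> plugins;
    for (const auto& m : modules) {
        GenericProc pi = fake_get_proc_address(m, "plugin_init");
        GenericProc pr = fake_get_proc_address(m, "plugin_run");
        if (!pi || !pr) continue;  // not a valid plug-in
        // Cast back to the known signatures before storing them.
        plugins.push_back({m.file, reinterpret_cast<InitFn>(pi), reinterpret_cast<RunFn>(pr)});
    }
    return plugins;
}
```

Checking for null before casting matters in real code too: calling through a null FARPROC is the classic crash when a plug-in does not export what you expect.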
If you specify a relative path (e.g. "foo.dll" rather than "c:\foo.dll"), the OS library search path kicks in. Details are on MSDN.
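As a rough illustration, the following models the standard search order for desktop applications when SafeDllSearchMode is enabled: application directory, system directory, 16-bit system directory, Windows directory, current directory, then the PATH directories. It is a simplification under stated assumptions: the file system is a set of strings, "fully qualified" is approximated as "starts with a drive letter", and the steps the real loader checks first (already-loaded modules, KnownDLLs, manifests and redirection, `LoadLibraryEx` flags) are left out.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of the directories the simplified search uses.
struct SearchContext {
    std::string app_dir;            // directory the EXE was loaded from
    std::string system_dir;         // e.g. C:\Windows\System32
    std::string system16_dir;       // 16-bit system directory
    std::string windows_dir;        // e.g. C:\Windows
    std::string current_dir;
    std::vector<std::string> path;  // directories listed in PATH
};

// `exists` stands in for the file system.
std::optional<std::string> find_dll(const std::string& name, const SearchContext& ctx,
                                    const std::set<std::string>& exists) {
    // Fully qualified path (approximated as "X:..."): only that path is tried.
    if (name.size() > 1 && name[1] == ':') {
        if (exists.count(name)) return name;
        return std::nullopt;
    }
    std::vector<std::string> dirs = {ctx.app_dir, ctx.system_dir, ctx.system16_dir,
                                     ctx.windows_dir, ctx.current_dir};
    dirs.insert(dirs.end(), ctx.path.begin(), ctx.path.end());
    for (const auto& d : dirs) {
        std::string candidate = d + "\\" + name;
        if (exists.count(candidate)) return candidate;  // first hit wins
    }
    return std::nullopt;
}
```

The "first hit wins" rule is why a DLL planted earlier in the search order (for example in the current directory when the system directory lacks it) can be picked up instead of the intended one.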
Also, DLLs get loaded into your process's address space. Typically you don't care where, but in the past you could get faster load times by "rebasing" your DLLs so that their preferred base addresses didn't collide. There are no guarantees about where the OS loader places libraries in memory (especially with ASLR), but you can always get a module's base address in your process's address space: the HMODULE returned by LoadLibrary is that base address.
Your DLL's entry point (DllMain) can also respond to various notifications (process attach/detach, thread attach/detach) to do initialization and cleanup in a sensible way.

Related

How to tell if an exe will load a DLL statically or dynamically by looking at the PE file header?

As the title says, how to tell if an exe will load a DLL statically or dynamically by looking at the PE file header?
In other words, how to tell if the DLL is part of the executable, or will be called by the loader?
Thanks
Let me first clarify some terminology to avoid confusion.
Anything executed within a DLL is by definition dynamic. But, a DLL may be statically bound or dynamically bound to an executable.
With static binding, the EXE links against a DLL's import library (actually a .LIB file that is built alongside the DLL). Corresponding DLL function prototypes in header files will usually be declared with __declspec(dllimport). This results in the EXE being filled with stubs for each DLL symbol that are filled in by the Windows loader at runtime. This is facilitated by the final EXE having an import section structure in its PE headers listing all the DLLs to be resolved by the Windows loader at runtime and their symbolic names (e.g. functions). Windows then does all the dirty work to find and load these DLLs and referenced symbolic addresses before the EXE starts execution of the primary thread at its entry point. If Windows fails to find any DLL(s) or referenced symbolic addresses, the EXE won't start.
With dynamic binding, the EXE explicitly invokes code to load DLL(s) and resolve symbolic addresses. This is done using the two KERNEL32 API functions LoadLibrary() and GetProcAddress(). If an EXE does this, there will be no associated import section describing that DLL and its symbols, and the Windows loader will happily load the EXE knowing nothing about said DLL(s). It is then up to the application how to handle success or failure of LoadLibrary() and/or GetProcAddress().
It is worth noting at this point, that libraries like the C-Runtime may be provided in DLL form (dynamic library) or static form (static library). If you link to one of these libraries statically, there will be no DLL import section in the built EXE's PE header and no function stubs to resolve at runtime for that library. Instead of stubs, these symbols (functions and/or data variables) become part of the EXE. Static library functions and/or data are copied into the EXE and are assigned relative addresses explicitly by the linker; no different than if those symbols were implemented directly by the EXE. Additionally, there will be no LoadLibrary() or GetProcAddress() resolution either implicitly (by the Windows loader) or explicitly in code for these functions as they will be directly present and self-contained within the final EXE. As a side-note, debugging symbols may be used in this case to try and differentiate between EXE implemented functions and library implemented functions (should you care) but this is highly dependent on the settings used to build both the EXE and the static library.
With terminology cleared up, let me attempt to answer your question! :)
Let me also add that I'm not going to go into the specifics of bound and unbound import symbols in a module's import section, because this distinction has nothing to do with the original question and has more to do with speeding up the work done by the Windows loader. If you are interested in those details, however, you can read up on Microsoft's PE COFF Specification.
To see if an EXE is statically bound to a DLL, you can either parse the PE headers yourself to locate the DLL imports section or use one of dozens of tools to do this for you, such as Dependency Walker. If you load your EXE in Dependency Walker for example, you will see a list of all statically bound DLLs in the top-left pane underneath the EXE itself. If any of these DLLs are not found at runtime, the program will fail to load. In the right pane, top table, you will see symbols (e.g. functions) that are referenced in the EXE for the selected DLL. All of these symbols must additionally be found for the EXE to load. The lower table simply shows all of the symbols exported by the DLL, referenced or not.
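Parsing the headers yourself is less work than it sounds if you only want the statically bound DLL names. The sketch below walks the DOS header, the `PE\0\0` signature, the COFF and optional headers, data directory entry 1 (the import table) and the section table, using offsets from the PE/COFF specification. It is deliberately minimal and an illustration only: it works on an in-memory copy of the file, ignores the delay-load directory (entry 13) and bound imports, and validates little beyond bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Read a little-endian integer of `width` bytes, or nothing if out of range.
static std::optional<std::uint32_t> rd(const Bytes& b, std::size_t off, std::size_t width) {
    if (off > b.size() || width > b.size() - off) return std::nullopt;
    std::uint32_t v = 0;
    for (std::size_t i = width; i-- > 0;) v = (v << 8) | b[off + i];
    return v;
}

// Map an RVA to a file offset via the section table (40-byte entries:
// VirtualAddress at +12, SizeOfRawData at +16, PointerToRawData at +20).
static std::optional<std::size_t> rva_to_off(const Bytes& b, std::size_t sec_tab,
                                             std::uint32_t nsec, std::uint32_t rva) {
    for (std::uint32_t i = 0; i < nsec; ++i) {
        std::size_t s = sec_tab + 40u * i;
        auto va = rd(b, s + 12, 4), raw_size = rd(b, s + 16, 4), raw_ptr = rd(b, s + 20, 4);
        if (!va || !raw_size || !raw_ptr) return std::nullopt;
        if (rva >= *va && rva - *va < *raw_size) return std::size_t(*raw_ptr) + (rva - *va);
    }
    return std::nullopt;
}

// List the DLL names in the import directory (data directory entry 1).
std::vector<std::string> imported_dlls(const Bytes& b) {
    std::vector<std::string> names;
    auto mz = rd(b, 0, 2), lfanew = rd(b, 0x3C, 4);
    if (!mz || *mz != 0x5A4D || !lfanew) return names;          // "MZ"
    std::size_t pe = *lfanew;
    auto sig = rd(b, pe, 4);
    if (!sig || *sig != 0x00004550) return names;                // "PE\0\0"
    auto nsec = rd(b, pe + 6, 2), opt_size = rd(b, pe + 20, 2);  // COFF header fields
    std::size_t opt = pe + 24;                                   // optional header
    auto magic = rd(b, opt, 2);
    if (!nsec || !opt_size || !magic) return names;
    bool pe32plus = *magic == 0x20B;
    if (!pe32plus && *magic != 0x10B) return names;
    // NumberOfRvaAndSizes, followed by the 8-byte data directories.
    auto ndirs = rd(b, opt + (pe32plus ? 108 : 92), 4);
    if (!ndirs || *ndirs < 2) return names;
    auto imp_rva = rd(b, opt + (pe32plus ? 112 : 96) + 8, 4);
    if (!imp_rva || *imp_rva == 0) return names;                 // no static imports
    std::size_t sec_tab = opt + *opt_size;
    auto desc = rva_to_off(b, sec_tab, *nsec, *imp_rva);
    if (!desc) return names;
    // IMAGE_IMPORT_DESCRIPTORs are 20 bytes with the Name RVA at +12; the
    // array ends with an all-zero entry (detected here via Name == 0).
    for (std::size_t d = *desc;; d += 20) {
        auto name_rva = rd(b, d + 12, 4);
        if (!name_rva || *name_rva == 0) break;
        auto name_off = rva_to_off(b, sec_tab, *nsec, *name_rva);
        if (!name_off) break;
        std::string name;
        for (std::size_t i = *name_off; i < b.size() && b[i] != 0; ++i) name += char(b[i]);
        names.push_back(name);
    }
    return names;
}
```

On a real file you would first read its bytes into a `Bytes` vector (for example with `std::ifstream`). A DLL that appears in neither this list nor the delay-load directory is, at best, loaded manually at run time.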
If the EXE uses dynamic binding (also called manual binding) for a given DLL, there will be no import section for that DLL, and thus you won't see it referenced in tools like Dependency Walker. BUT, you can click on KERNEL32.DLL in Dependency Walker (all EXEs will have this dependency, though there are exceptions to this rule I won't get into here) and locate references to LoadLibrary() and GetProcAddress(). Unfortunately, most EXEs reference these functions in boilerplate code such as the C-Runtime, so this won't tell you much.
If you want to dig deeper into figuring out which DLLs are manually loaded by an application, the first thing to try is to locate the DLL name string by searching the EXE for it. Note that the DLL name string need not end in ".DLL", as LoadLibrary() appends this extension if the name has none. The standard tool for searching for strings within a binary module is Sysinternals Strings. This works great for modules that make no attempt to hide what they are doing.
With that said, obfuscated code (found in unpackers, viruses and the like) may obfuscate or encrypt DLL names as well as the functions referenced. Code may even resolve LoadLibrary() and GetProcAddress() themselves at runtime to further hinder efforts to figure out what it is doing. Your best bet in these situations is to use a tool like Sysinternals Process Monitor, or a debugger with verbose logging enabled, to watch the DLLs being loaded as the program runs. You can also use a disassembler (such as IDA) to try to piece together what the code is doing. To find out which DLL symbols are being used, you might start the EXE in a debugger and wait for the initial break at the entry point. Then add a breakpoint on the first instruction of KERNEL32.GetProcAddress and run the program. When that breakpoint is hit, the arguments will contain the symbol being resolved (on the stack for 32-bit code; in the RDX register for 64-bit code).
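The string search can be approximated in a few lines. This sketch collects runs of printable ASCII (roughly what Sysinternals Strings does, although Strings also finds Unicode strings, which matters because LoadLibraryW takes UTF-16 names) and keeps the ones ending in `.dll`. `ascii_strings` and `looks_like_dll` are hypothetical helper names, and, as noted above, names passed without the extension would need a search for the bare name instead.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Collect runs of printable ASCII at least `min_len` characters long.
std::vector<std::string> ascii_strings(const std::vector<std::uint8_t>& bin, std::size_t min_len = 4) {
    std::vector<std::string> out;
    std::string cur;
    for (std::uint8_t c : bin) {
        if (c >= 0x20 && c < 0x7F) {  // printable: extend the current run
            cur += char(c);
            continue;
        }
        if (cur.size() >= min_len) out.push_back(cur);
        cur.clear();                  // any other byte ends the run
    }
    if (cur.size() >= min_len) out.push_back(cur);
    return out;
}

// Case-insensitive check for a ".dll" suffix.
bool looks_like_dll(const std::string& s) {
    if (s.size() < 4) return false;
    std::string tail = s.substr(s.size() - 4);
    for (char& ch : tail) ch = char(std::tolower(static_cast<unsigned char>(ch)));
    return tail == ".dll";
}
```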
As you can see, if an application resolves DLL symbols manually (dynamic binding), the process of figuring out what DLLs are being referenced is not as straightforward.

Is it possible to load a DLL into the address space not from a file-system file?

I have to create a wrapper DLL that exports some symbols (functions). Within its resources it contains another encrypted DLL that actually does the job.
Upon the wrapper DLL's initialization, it decrypts the original one, saves it to a file, and loads it into the address space with LoadLibrary. However, I'd like to avoid saving this DLL to a file.
I know that this doesn't guarantee bullet-proof protection; one may actually dump the process's virtual memory and see it there. I also know that it's possible to create a file with the FILE_FLAG_DELETE_ON_CLOSE flag, which ensures the file will be deleted as soon as all of its handles are closed (at the latest, when the process terminates). But I'd still like to know whether there's an option to load the DLL "not from a file".
So far I thought about the following:
Allocate a virtual memory block with adequate protection (PAGE_EXECUTE_READ or PAGE_EXECUTE_READWRITE), preferably at the image's preferred base address.
Extract/decrypt the DLL image there.
If the image base address isn't its preferred address, do the relocation "manually", i.e. analyze the relocation table and patch the image in place.
Handle the image imports. Load its dependency DLLs and fill symbol addresses.
Invoke its initialization function (DllMain).
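The manual relocation step is the most mechanical part of that list. A rough sketch, assuming the image has already been copied into a buffer laid out by section RVA and that you have located the base relocation directory (`.reloc` data) inside it: each block starts with a 4-byte page RVA and a 4-byte block size, followed by 16-bit entries whose top 4 bits are the type and whose bottom 12 bits are the offset within the page. Only `IMAGE_REL_BASED_DIR64` (10), `IMAGE_REL_BASED_HIGHLOW` (3) and padding (`IMAGE_REL_BASED_ABSOLUTE`, 0) are handled, and the code assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Apply base relocations in place. Returns false on malformed data or an
// unsupported relocation type. `reloc_off`/`reloc_size` locate the
// relocation directory inside `image`.
bool apply_relocations(std::vector<std::uint8_t>& image, std::size_t reloc_off,
                       std::size_t reloc_size, std::uint64_t preferred_base,
                       std::uint64_t actual_base) {
    if (reloc_off > image.size() || reloc_size > image.size() - reloc_off) return false;
    const std::uint64_t delta = actual_base - preferred_base;  // modular arithmetic
    std::size_t p = reloc_off, end = reloc_off + reloc_size;
    while (p + 8 <= end) {
        std::uint32_t page_rva, block_size;
        std::memcpy(&page_rva, &image[p], 4);
        std::memcpy(&block_size, &image[p + 4], 4);
        if (block_size < 8 || block_size > end - p) return false;
        for (std::size_t e = p + 8; e + 2 <= p + block_size; e += 2) {
            std::uint16_t entry;
            std::memcpy(&entry, &image[e], 2);
            unsigned type = entry >> 12, offset = entry & 0x0FFF;
            std::size_t at = std::size_t(page_rva) + offset;
            if (type == 0) continue;                       // ABSOLUTE: padding
            if (type == 10) {                              // DIR64: 64-bit pointer
                if (at + 8 > image.size()) return false;
                std::uint64_t v;
                std::memcpy(&v, &image[at], 8);
                v += delta;
                std::memcpy(&image[at], &v, 8);
            } else if (type == 3) {                        // HIGHLOW: 32-bit pointer
                if (at + 4 > image.size()) return false;
                std::uint32_t v;
                std::memcpy(&v, &image[at], 4);
                v += std::uint32_t(delta);
                std::memcpy(&image[at], &v, 4);
            } else {
                return false;                              // unsupported type
            }
        }
        p += block_size;
    }
    return true;
}
```

Each absolute address baked into the image simply gets the difference between the actual and preferred base added to it, which is why an image loaded at its preferred base needs no patching at all.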
That is, I can do the work of the loader. But unfortunately there are some areas where the DLL loaded by the above trick will behave differently, since it's not a properly-loaded DLL from the OS's perspective. This includes the following:
DllMain receives the DLL "module handle", which is just its base address. It may use this handle in calls to various API functions, such as LoadResource. Those calls will probably fail.
There will be problems with exception handling. The OS won't know about the DLL's exception-handling data, so its exception handlers won't be invoked. (For a 64-bit DLL this data is the .pdata function table, which can be registered manually, e.g. with RtlAddFunctionTable; SAFESEH applies only to 32-bit x86 images.)
Here's my question: Is there an API to properly load the DLL into the process address space without the need for it to be in a file? An alternative variant of LoadLibrary that works, say, on a file mapping instead of a file-system file?
Thanks in advance.
Yes, it is possible to load a DLL which is located in the resources of another image and execute it without needing a file! Take a look at this article, this is exactly what you want. It works, I tried it.

Architecturally what is the difference between a shared object (SO) and a dynamic link library (DLL)?

The question is pretty much in the title: in terms of OS-level implementation, how are shared objects and dlls different?
The reason I ask this is because I recently read this page on extending Python, which states:
Unix and Windows use completely different paradigms for run-time loading of code. Before you try to build a module that can be dynamically loaded, be aware of how your system works.
In Unix, a shared object (.so) file contains code to be used by the program, and also the names of functions and data that it expects to find in the program. When the file is joined to the program, all references to those functions and data in the file’s code are changed to point to the actual locations in the program where the functions and data are placed in memory. This is basically a link operation.
In Windows, a dynamic-link library (.dll) file has no dangling references. Instead, an access to functions or data goes through a lookup table. So the DLL code does not have to be fixed up at runtime to refer to the program’s memory; instead, the code already uses the DLL’s lookup table, and the lookup table is modified at runtime to point to the functions and data.
Could anyone elaborate on that? Specifically I'm not sure I understand the description of shared objects containing references to what they expect to find. Similarly, a DLL sounds like pretty much the same mechanism to me.
Is this a complete explanation of what is going on? Are there better ones? Is there in fact any difference?
I am aware of how to link to a DLL or shared object and a couple of mechanisms (.def listings, dllexport/dllimport) for writing DLLs so I'm explicitly not looking for a how to on those areas; I'm more intrigued as to what is going on in the background.
(Edit: another obvious point - I'm aware they work on different platforms, use different file types (ELF vs PE), are ABI-incompatible etc...)
A DLL is pretty much the same mechanism as that used by .so or .dylib (macOS) files, so it is very hard to explain exactly what the differences are.
The core difference is in what is visible by default from each type of file. .so files export at the level of the language's (gcc's) linkage, which means that by default all C and C++ symbols with external linkage are available for linking when .so files are pulled in.
It also means that, because resolving .so files is essentially a link step, the loader doesn't care which .so file a symbol comes from. It just searches the specified .so files in some order, following the usual link rules that .a files adhere to.
DLL files, on the other hand, are an operating-system feature, completely separate from the language's link step. MSVC uses .lib files for linking both static and dynamic libraries (building a DLL also generates a paired .lib file that is used for linking), so the resulting program is fully "linked" (from a language-centric point of view) once it's built.
During the link stage, however, symbols were resolved against the .lib files that represent the DLLs, allowing the linker to build the import table in the PE file, which contains an explicit list of DLLs and the entry points referenced in each. At load time, Windows does not have to perform a "link" to resolve symbols from shared libraries: that step was already done, and the Windows loader just loads the DLLs and hooks up the functions directly.
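A toy contrast of the two lookup models may make this concrete. In the ELF-style function a symbol is found by name alone in a global search list, so the first library that defines it wins; in the PE-style function the import already names the DLL, so only that DLL's exports are consulted. The library and symbol names are made up, and real ELF resolution has further wrinkles (symbol versioning, `-Bsymbolic`, `RTLD_LOCAL`) that this sketch ignores.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A "library": its name plus exported symbol -> identifying value.
using Library = std::pair<std::string, std::map<std::string, int>>;

// ELF-style: look the symbol up by name alone across the global search
// list (in load order); the first definition wins.
std::optional<int> elf_resolve(const std::vector<Library>& search_list, const std::string& sym) {
    for (const auto& lib : search_list) {
        auto it = lib.second.find(sym);
        if (it != lib.second.end()) return it->second;
    }
    return std::nullopt;
}

// PE-style: the import table names the DLL, so only that DLL is consulted.
std::optional<int> pe_resolve(const std::vector<Library>& loaded, const std::string& dll,
                              const std::string& sym) {
    for (const auto& lib : loaded) {
        if (lib.first != dll) continue;
        auto it = lib.second.find(sym);
        if (it != lib.second.end()) return it->second;
        return std::nullopt;  // named DLL lacks the symbol: no fallback
    }
    return std::nullopt;
}
```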

Some basic questions about the DLL file

When does Windows Operating System load a DLL into memory?
Does the operation occur when the application starts or when the application first calls one of the procedures in the DLL?
Could a DLL be unloaded once it has been loaded?
When does Windows Operating System load a DLL into memory?
If you've linked your EXE to a DLL implicitly through a .lib file, as you normally do for most Windows APIs such as those in user32.dll and kernel32.dll, then the default behavior is for the DLL to be loaded when the process starts, before your WinMain/main function is called. See below for delay loading.
If one DLL depends on another, the loader loads its dependencies first if they are not already loaded.
If you load code explicitly through a DLL (LoadLibrary, CoCreateInstance, etc.), it gets loaded when you make those calls.
Does the operation occur when the application starts or when the application first calls one of the procedures in the DLL?
You can have it both ways. By default, the DLL is loaded at application startup. If you use the /DELAYLOAD linker option, loading the DLL can be deferred until it's actually needed (the first call into it). This is "best effort": delay loading does not support importing data (such as global variables) from the DLL, only functions.
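Conceptually, a delay-loaded import is a call slot that binds itself on first use. The sketch below models that with a hypothetical `DelaySlot`: the `resolve` callback stands in for what the MSVC delay-load helper does with LoadLibrary and GetProcAddress, and it runs only once, on the first call. It illustrates the idea, not how the helper is actually implemented.

```cpp
#include <cassert>

using Fn = int (*)(int);
using Resolver = Fn (*)();  // stands in for LoadLibrary + GetProcAddress

// Toy delay-load thunk: resolve on first call, then call directly.
struct DelaySlot {
    Fn target = nullptr;       // patched on first use
    Resolver resolve = nullptr;
    int loads = 0;             // how often the "DLL" was actually loaded

    int operator()(int x) {
        if (!target) {         // first call: load and bind
            target = resolve();
            ++loads;
        }
        return target(x);      // later calls go straight to the target
    }
};
```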
Could a DLL be unloaded once it has been loaded?
Short answer is "no" for implicit DLL dependencies that you've linked. FreeLibrary and CoFreeUnusedLibrary can be used for LoadLibrary/CoCreateInstance calls.
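Unloading of explicitly loaded DLLs is governed by a per-process reference count: LoadLibrary on a module that is already loaded just increments the count, FreeLibrary decrements it, and the module is unmapped when the count reaches zero. A toy `ModuleTable` (hypothetical, keyed by file name rather than by HMODULE) captures that bookkeeping:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy per-process module reference counts.
class ModuleTable {
    std::map<std::string, int> refs_;

public:
    // Like LoadLibrary: map on first load, otherwise just add a reference.
    void load(const std::string& name) { ++refs_[name]; }

    // Like FreeLibrary: returns false if the module is not loaded.
    bool release(const std::string& name) {
        auto it = refs_.find(name);
        if (it == refs_.end()) return false;
        if (--it->second == 0) refs_.erase(it);  // last reference: unmap
        return true;
    }

    bool is_loaded(const std::string& name) const { return refs_.count(name) != 0; }
};
```

This is why every LoadLibrary call needs a matching FreeLibrary: one extra load keeps the DLL mapped even after the code that "owns" it believes it has released it.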
I'm going to assume we are talking about .NET. It is guaranteed to happen before you need the code, but you can use late binding to do it at some other time. See this page.
In the windows API, you can explicitly control the loading and unloading of a .dll.
See LoadLibrary and FreeLibrary as a starting point.
Depending on the language/tools you are using many of the details of loading libraries will be taken care of for you, but usually you can still get explicit control if you really want it.

What exactly are DLL files, and how do they work?

How exactly do DLL files work? There seems to be an awful lot of them, but I don't know what they are or how they work.
So, what's the deal with them?
What is a DLL?
Dynamic Link Libraries (DLL)s are like EXEs but they are not directly executable. They are similar to .so files in Linux/Unix. That is to say, DLLs are MS's implementation of shared libraries.
DLLs are so much like an EXE that the file format itself is the same. Both EXE and DLLs are based on the Portable Executable (PE) file format. DLLs can also contain COM components and .NET libraries.
What does a DLL contain?
A DLL contains functions, classes, variables, UIs and resources (such as icons, images, files, ...) that an EXE, or other DLL uses.
Types of libraries:
On virtually all operating systems, there are two types of libraries: static libraries and dynamic libraries. On Windows, the file extensions are .lib for static libraries and .dll for dynamic libraries (the import library that accompanies a DLL also uses .lib). The main difference is that static libraries are linked into the executable at build (link) time, whereas dynamic-link libraries are not linked until run time.
More on static and dynamic libraries:
You don't normally see static libraries on your computer, though, because a static library is embedded directly inside a module (EXE or DLL). A dynamic library is a stand-alone file.
A DLL can be changed at any time, and it is only loaded at run time, either when the EXE starts or when the EXE explicitly loads it. A static library cannot be changed once it has been linked into the EXE.
A DLL can be updated individually without updating the EXE itself.
Loading a DLL:
A program loads a DLL at startup, explicitly via the Win32 API LoadLibrary, or when it is a dependency of another DLL. A program uses GetProcAddress to look up a function, or LoadResource to load a resource.
Further reading:
Please check MSDN or Wikipedia for further reading; they were also the sources of this answer.
What is a DLL?
DLL files are binary files that can contain executable code and resources like images, etc. Unlike applications, these cannot be directly executed, but an application will load them as and when they are required (or all at once during startup).
Are they important?
Most applications will load the DLL files they require at startup. If any of these are not found the system will not be able to start the process at all.
DLL files might require other DLL files
In the same way that an application requires a DLL file, a DLL file might itself depend on other DLL files. If any DLL file in the dependency chain is not found, the application will not load. This is easy to debug with a dependency viewer such as Dependency Walker.
There are so many of them in the system folders
Most system functionality is exposed to user programs in the form of DLL files, as they are a standard way of sharing code and resources. Each piece of functionality is kept in a separate DLL file so that only the required DLL files are loaded, which reduces memory use on the system.
Installed applications also use DLL files
DLL files also become a way of physically separating functionality, as explained above. Good applications also try not to load DLL files until they are absolutely required, which reduces memory requirements. This, too, causes applications to ship with a lot of DLL files.
DLL Hell
However, system upgrades sometimes break other programs when there is a version mismatch between the shared DLL files and the programs that require them. System checkpoints, the DLL cache, etc. have been Microsoft's initiatives to solve this problem. The .NET platform may avoid this issue altogether.
How do we know what's inside a DLL file?
You have to use an external tool like DUMPBIN or Dependency Walker, which will show not only the publicly visible functions (known as exports) contained in the DLL file, but also which other DLL files it requires and which exports from those DLL files it depends on.
How do we create / use them?
Refer to the programming documentation from your vendor. For C++, refer to LoadLibrary on MSDN.
Let’s say you are making an executable that uses some functions found in a library.
If the library you are using is static, the linker will copy the object code for these functions directly from the library and insert them into the executable.
Now if this executable is run, it has everything it needs, so the executable loader just loads it into memory and runs it.
If the library is dynamic the linker will not insert object code but rather it will insert a stub which basically says this function is located in this DLL at this location.
Now if this executable is run, bits of the executable are missing (i.e. the stubs), so the loader goes through the executable fixing up the missing stubs. Only after all the stubs have been resolved will the executable be allowed to run.
To see this in action delete or rename the DLL and watch how the loader will report a missing DLL error when you try to run the executable.
Hence the name Dynamic Link Library: part of the linking process is done dynamically, at run time, by the executable loader.
On a final note, if you don't link to the DLL, then no stubs will be inserted by the linker, but Windows still provides the GetProcAddress API, which allows you to load and execute a DLL function's entry point long after the executable has started.
DLLs (dynamic link libraries) and SLs (shared libraries, equivalent under UNIX) are just libraries of executable code which can be dynamically linked into an executable at load time.
Static libraries are inserted into an executable at compile time and are fixed from that point. They increase the size of the executable and cannot be shared.
Dynamic libraries have the following advantages:
1/ They are loaded at run time rather than compile time so they can be updated independently of the executable (all those fancy windows and dialog boxes you see in Windows come from DLLs so the look-and-feel of your application can change without you having to rewrite it).
2/ Because they're independent, the code can be shared across multiple executables - this saves memory since, if you're running 100 apps with a single DLL, there may only be one copy of the DLL in memory.
Their main disadvantage is advantage #1: having DLLs change independently of your application may cause your application to stop working or start behaving in a bizarre manner. DLL versioning tends not to be managed very well under Windows, and this leads to the quaintly named "DLL Hell".
DLL files contain an export table, which is a list of symbols that can be looked up by the calling program. The symbols are typically functions using the C (__cdecl) or standard (__stdcall) calling convention. The export table also contains the address of each function, as a relative virtual address (RVA).
With this information, the calling program can then call the functions within the DLL even though it did not have access to the DLL at compile time.
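The lookup that GetProcAddress performs on the export table can be sketched with the same three parallel arrays a PE export directory uses: `AddressOfFunctions` (RVAs), `AddressOfNames` (sorted, so it can be binary-searched) and `AddressOfNameOrdinals` (indexes into the function array), plus the ordinal `Base`. In this toy `ExportDir` the names are stored directly rather than as RVAs to strings, and forwarded exports are not handled; the loader would add the returned RVA to the module's base address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy export directory mirroring the PE layout's parallel arrays.
struct ExportDir {
    std::uint32_t base;                        // ordinal base (often 1)
    std::vector<std::uint32_t> functions;      // AddressOfFunctions: RVAs
    std::vector<std::string> names;            // AddressOfNames: sorted
    std::vector<std::uint16_t> name_ordinals;  // AddressOfNameOrdinals
};

// By name: binary-search the sorted names, then map the matching slot
// through name_ordinals into the function array.
std::optional<std::uint32_t> rva_by_name(const ExportDir& d, const std::string& name) {
    auto it = std::lower_bound(d.names.begin(), d.names.end(), name);
    if (it == d.names.end() || *it != name) return std::nullopt;
    std::size_t slot = std::size_t(it - d.names.begin());
    std::uint16_t idx = d.name_ordinals.at(slot);
    return d.functions.at(idx);
}

// By ordinal: subtract the ordinal base to index the function array.
std::optional<std::uint32_t> rva_by_ordinal(const ExportDir& d, std::uint32_t ordinal) {
    if (ordinal < d.base || ordinal - d.base >= d.functions.size()) return std::nullopt;
    return d.functions[ordinal - d.base];
}
```

Note that a function can be exported by ordinal only (it has no entry in the name arrays), which is why some DLL functions cannot be found with a name lookup at all.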
Introducing Dynamic Link Libraries has some more information.
http://support.microsoft.com/kb/815065
A DLL is a library that contains code and data that can be used by more than one program at the same time. For example, in Windows operating systems, the Comdlg32 DLL performs common dialog box related functions. Therefore, each program can use the functionality that is contained in this DLL to implement an Open dialog box. This helps promote code reuse and efficient memory usage.
By using a DLL, a program can be modularized into separate components. For example, an accounting program may be sold by module. Each module can be loaded into the main program at run time if that module is installed. Because the modules are separate, the load time of the program is faster, and a module is only loaded when that functionality is requested.
Additionally, updates are easier to apply to each module without affecting other parts of the program. For example, you may have a payroll program, and the tax rates change each year. When these changes are isolated to a DLL, you can apply an update without needing to build or install the whole program again.
http://en.wikipedia.org/wiki/Dynamic-link_library
DLL is a file extension for the "dynamic link library" file format, used for holding code and procedures for Windows programs. Software and games run on the basis of DLL files; DLL files were created so that multiple applications could use their contents at the same time.
If you want more information about DLL files, or are facing a DLL error, read the following post.
https://www.bouncegeek.com/fix-dll-errors-windows-586985/
DLLs (Dynamic Link Libraries) contain resources used by one or more applications or services. They can contain classes, icons, strings, objects, interfaces, and pretty much anything a developer would need to store except a UI.
According to Microsoft
(DLL) Dynamic link libraries are files that contain data, code, or resources needed for the running of applications. These are files that are created by the windows ecosystem and can be shared between two or more applications.
When a program runs on Windows, much of how the application works depends on its DLL files. For instance, if a particular application has several modules, how those modules interact with each other is determined by its DLL files.
If you want a detailed explanation, check these useful resources:
What are DLL files, About DLL files

Resources