Exported variable vs exported function in a DLL - windows

How can I tell whether a symbol exported from a DLL is actually a variable or a function?
One way might be to check whether the symbol's destination address lies in a code section (such as .text) or not.
Another method could be to check the memory protection attributes of the section that contains it.
But all these methods seem to be unreliable.
What is the best way?
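The section-based check described in the question can be sketched as follows. This is only a heuristic, assuming the caller has already parsed the section headers and the export data directory out of the image. The `section_info` struct and `classify_export` function are hypothetical names for illustration; the two flag values and the rule that an export RVA inside the export directory is a forwarder string come from the PE/COFF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Section flag values from the PE/COFF specification. */
#define SCN_CNT_CODE    0x00000020u
#define SCN_MEM_EXECUTE 0x20000000u

/* Hypothetical, trimmed-down view of one section header. */
typedef struct {
    uint32_t virtual_address;  /* RVA where the section starts */
    uint32_t virtual_size;     /* size of the section in memory */
    uint32_t characteristics;  /* IMAGE_SCN_* flags */
} section_info;

typedef enum {
    EXPORT_UNKNOWN,    /* RVA is not inside any known section */
    EXPORT_FORWARDER,  /* RVA points at a forwarder string */
    EXPORT_CODE,       /* RVA lies in an executable section */
    EXPORT_DATA        /* RVA lies in a non-executable section */
} export_kind;

/* Classify an export by where its RVA lands. exp_dir_rva and exp_dir_size
   describe the export data directory of the same image. */
export_kind classify_export(uint32_t rva,
                            uint32_t exp_dir_rva, uint32_t exp_dir_size,
                            const section_info *sections, size_t count)
{
    assert(sections != NULL || count == 0);

    /* An export address inside the export directory is a forwarder. */
    if (rva >= exp_dir_rva && rva - exp_dir_rva < exp_dir_size)
        return EXPORT_FORWARDER;

    for (size_t i = 0; i < count; ++i) {
        const section_info *s = &sections[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->virtual_size) {
            if (s->characteristics & (SCN_MEM_EXECUTE | SCN_CNT_CODE))
                return EXPORT_CODE;
            return EXPORT_DATA;
        }
    }
    return EXPORT_UNKNOWN;
}
```

As the question already suspects, the result is a guess: constant data can live in an executable section, and sections can be merged, so EXPORT_CODE does not prove the symbol is a function.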

Related

Using Linker Symbol from C++ code as a fixed constant (NOT relocated) in a shared library (DLL)

Sorry if the title is not very clear. I am using MinGW with GCC 6.3.0 to build an x86 32-bit DLL on Windows (so far). I'll spare you the details of why I need hacky offsets among its sections accessible from code, so please do not ask if it's useful or not (because I don't want to bother explaining that).
So, if I can get the following testcase to work, I'm good. Here's my problem:
In a C++ file, I want to access a linker symbol as an absolute numeric value, not relocated, directly. Remember that I am building a 32-bit DLL which requires a .reloc section for relocations, but in this case I do NOT want relocation, in fact a relocation would screw it up completely.
Here's an example: retrieve the offset of, say, __imp__MessageBoxW@16 relative to __IAT_start__. In case you don't know what they are, __imp__MessageBoxW@16 is the relocated pointer to the actual function at runtime, and __IAT_start__ is a linker symbol in the default script file. Here's where it is defined:
.idata BLOCK(__section_alignment__) :
{
    /* This cannot currently be handled with grouped sections.
       See pe.em:sort_sections. */
    KEEP (SORT(*)(.idata$2))
    KEEP (SORT(*)(.idata$3))
    /* These zeroes mark the end of the import list. */
    LONG (0); LONG (0); LONG (0); LONG (0); LONG (0);
    KEEP (SORT(*)(.idata$4))
    __IAT_start__ = .;
    KEEP (SORT(*)(.idata$5))
    __IAT_end__ = .;
    KEEP (SORT(*)(.idata$6))
    KEEP (SORT(*)(.idata$7))
}
So far, no problem. Because GAS doesn't allow me to "subtract" two externally defined symbols (both symbols are defined in the linker), I have to define the symbol in the linker script, so at the end of the linker script I have this:
test_symbol = ABSOLUTE("__imp__MessageBoxW@16" - __IAT_start__);
Then in C++ I use this little inline asm to retrieve this relative difference which is supposed to be a fixed value once linked:
asm("movl $test_symbol, %0":"=r"(var));
Now var should contain that fixed number right? Wrong!
Because test_symbol is an "undefined" symbol as far as the assembler is concerned, it makes it relocated. Or I don't know why, but I tried so many things to force it to be an "absolute constant value symbol" instead of a "relocated symbol" to no avail. Even editing the linker script with many things like LD_FEATURE("SANE_EXPR") and others, doesn't work at all.
Its value is correct only if the DLL does not get relocated.
You see, either GNU LD or the assembler adds an entry in the .reloc section for that movl instruction, which is WRONG!
Is there a way to force it to treat an external/undefined symbol as a fixed CONSTANT and apply no relocation to it whatsoever? Basically, omit it from the .reloc section.
I am going crazy with this, please tell me there's something easy I overlooked, I searched for hours!
In other words, is there a way to use a linker symbol from within inline asm/C++ without having it relocated at all? No entry in the .reloc section or anything, basically the same as a constant like $1234. So if the DLL gets loaded at another base address, that constant would be the same every time.
UPDATE: I forgot about this question but decided to bring an update, since it seems it's likely not possible as nobody even commented. For anyone else in the same boat as me, I presume this is a limitation of the COFF object format itself. In other words, external symbols are implicitly relocated, and it doesn't seem there's a way against this.
I didn't "fix" it the way I wanted, I did it in a very hacky way though. If anyone is interested, here's my ugly "hack":
First I put a special "custom" instruction in the inline assembly where I reference this external symbol from C++. This "custom" instruction holds a placeholder instruction that grabs the symbol (normal x86 asm instruction with a dummy constant, e.g. 1234) and a way to identify it. Then let GCC generate the assembly files (.S files), then I parse the assembly with a simple script and when I find that "custom" instruction I insert a label for the linker (make it .global) and at the same time add a directive to a custom "on-the-fly" generated linker script that gets included from my main linker script at the end.
This places data in a temporary section in the resulting DLL with absolute offsets to the custom instruction that I need, but without relocation.
Next, I parse the binary DLL itself, in particular that temporary section I added with all this hack. I take the offsets from there, convert them to file offsets, and modify the DLL's .text section directly where those offsets point (remember those placeholder instructions? This replaces their immediate constants 1234 with the respective values from the linker's non-relocated constants). Then I strip the temporary section from the DLL, and it's done. Of course, all of this is done automatically by a helper program and script.
It's an insane hack, but it works and it's fully automatic now that I got it going. If my assumption is correct that COFF doesn't support non-relocated external symbols, then it's really the only way to use linker constants from C++ without them being relocated, which would be a disaster.
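The "convert them to file offsets and patch the immediate" step of the hack above might look roughly like this. It is a sketch under assumptions, not the author's helper program: `section_range`, `rva_to_file_offset` and `patch_imm32` are hypothetical names, and the section table is assumed to have been read from the DLL's headers already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down section header: where the section sits in
   memory (RVA) and where its bytes sit in the file. */
typedef struct {
    uint32_t virtual_address;
    uint32_t virtual_size;
    uint32_t pointer_to_raw_data;
    uint32_t size_of_raw_data;
} section_range;

/* Translate an RVA to a file offset. Returns 1 and stores the offset in
   *out on success, 0 if the RVA is not backed by file data. */
int rva_to_file_offset(uint32_t rva, const section_range *s, size_t count,
                       uint32_t *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (rva >= s[i].virtual_address) {
            uint32_t delta = rva - s[i].virtual_address;
            if (delta < s[i].size_of_raw_data) {
                *out = s[i].pointer_to_raw_data + delta;
                return 1;
            }
        }
    }
    return 0;
}

/* Overwrite a 32-bit little-endian immediate inside the file image.
   Returns 0 if the four bytes would not fit in the buffer. */
int patch_imm32(uint8_t *image, size_t image_size, uint32_t file_offset,
                uint32_t value)
{
    assert(image != NULL);
    if (file_offset > image_size || image_size - file_offset < 4)
        return 0;
    image[file_offset + 0] = (uint8_t)(value & 0xFFu);
    image[file_offset + 1] = (uint8_t)((value >> 8) & 0xFFu);
    image[file_offset + 2] = (uint8_t)((value >> 16) & 0xFFu);
    image[file_offset + 3] = (uint8_t)((value >> 24) & 0xFFu);
    return 1;
}
```

The offset passed to patch_imm32 has to point at the immediate operand itself, not at the start of the instruction, which is why the hack records a label per placeholder.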

Why does PE need the Original First Thunk (OFT)?

There is the "First Thunk" (FT), which the loader overwrites with the correct addresses when the image is loaded.
But when does PE use the OFT?
Does PE even need it?
The original first thunk is needed if the imports are bound but the imported .DLL does not match.
On a fresh unpatched version of Windows, all addresses of all functions in the base .DLLs (ntdll, kernel32, user32 etc) are known. Take shell32 for example, it links to kernel32!CreateProcess and the true address of CreateProcess can be stored directly in shell32. This is called import binding and lets the loader skip the step where it looks up all the addresses of the imported functions.
This does not work if the imported .DLL has not been loaded at its preferred address nor if the .DLL has changed (security update etc). If this happens then the loader has to look up the functions "the normal way" and the original first thunk array has to be used because that is the only place where the RVAs of the function names are stored.
If import binding is not used then the original first thunk array is optional and might not be present.
ASLR has probably made this optimization irrelevant.
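For reference, each entry of the original first thunk array in a 32-bit image is a single 32-bit value, and the array ends with a zero entry. A minimal sketch of how such an entry is decoded, assuming a PE32 (not PE32+) image; `import_ref`, `decode_thunk32` and `count_thunks` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In PE32 the top bit of a thunk marks "import by ordinal". */
#define ORDINAL_FLAG32 0x80000000u

typedef struct {
    int      by_ordinal;  /* 1 if imported by ordinal, 0 if by name */
    uint16_t ordinal;     /* valid when by_ordinal is 1 */
    uint32_t name_rva;    /* RVA of the hint/name entry when by_ordinal is 0 */
} import_ref;

/* Decode one 32-bit import lookup entry. */
import_ref decode_thunk32(uint32_t thunk)
{
    import_ref r = {0, 0, 0};
    if (thunk & ORDINAL_FLAG32) {
        r.by_ordinal = 1;
        r.ordinal = (uint16_t)(thunk & 0xFFFFu);
    } else {
        r.name_rva = thunk & 0x7FFFFFFFu;
    }
    return r;
}

/* Count entries up to the zero terminator, never reading past max_entries. */
size_t count_thunks(const uint32_t *thunks, size_t max_entries)
{
    size_t n = 0;
    assert(thunks != NULL || max_entries == 0);
    while (n < max_entries && thunks[n] != 0)
        ++n;
    return n;
}
```

Once the loader has overwritten the first thunk array with addresses, this name or ordinal information survives only in the original first thunk array, which is the point the answer makes.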
Let me summarize a lot of things for you here. When a process imports a library, for example Milad.dll, and then calls a function from it such as MPrint, the dynamic loader of the Windows operating system has to resolve the address of MPrint before the call can work. How does the OS resolve that address?
Windows goes through some fairly complicated steps, which I will describe in simple terms. Two different structures are involved. The importing module has an import directory, whose entries point to the Import Name Table (INT) and the Import Address Table (IAT). The exporting DLL has an export directory, whose AddressOfNames, AddressOfNameOrdinals and AddressOfFunctions members point to its name table, ordinal table and address table.
After the OS maps Milad.dll into the address space of the target process (the same work LoadLibrary does), it resolves each imported function much as GetProcAddress would and writes the resulting addresses into the IAT of the importing module.
The import directory is an array of IMAGE_IMPORT_DESCRIPTOR structures, one per imported DLL. Each has the members OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name and FirstThunk, which hold RVAs of some important data.
Name is the RVA of the name of the DLL that the process imports from; in this example that is Milad.dll.
OriginalFirstThunk points to the Import Name Table, which lists the functions the module imports from Milad.dll, each by name (with a hint) or by ordinal. The loader searches for each name in the AddressOfNames array of Milad.dll's export directory, and uses the index where it finds the name to read an integer from the AddressOfNameOrdinals array.
That integer is an index into the AddressOfFunctions array, whose entry holds the RVA of the function. Adding the base address of Milad.dll gives the real address of MPrint.
FirstThunk points to the IAT. Once the loader has found the correct address of the function, it stores that address in the IAT slot for MPrint, so the process can call the function through that slot.
This is a simple explanation of the complicated work the loader does to resolve function addresses in DLLs via the Name, OriginalFirstThunk (INT) and FirstThunk (IAT) members of IMAGE_IMPORT_DESCRIPTOR.
We need to know that when the PE file is loaded into memory, the PE loader looks at the IMAGE_THUNK_DATAs and IMAGE_IMPORT_BY_NAMEs and determines the addresses of the imported functions. It then replaces the IMAGE_THUNK_DATAs in the array pointed to by FirstThunk with the real addresses of the functions. Thus, when the PE file is ready to run, the FirstThunk array holds real addresses, while the array of RVAs pointed to by OriginalFirstThunk remains unchanged, so that if the need arises to find the names of the imported functions, the PE loader can still find them.
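The name lookup that the loader performs against the exporting DLL can be modelled in a few lines. This is a simplified sketch, not the real loader: the tables are plain arrays instead of RVAs into a mapped image, forwarders are ignored, and `export_tables`, `resolve_export` and `bind_import` are hypothetical names. It relies on the PE rule that the export name table is sorted, so a binary search works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory model of a DLL's three export tables. */
typedef struct {
    const char *const *names;       /* AddressOfNames, sorted by name */
    const uint16_t *name_ordinals;  /* AddressOfNameOrdinals, parallel to names */
    const uint32_t *functions;      /* AddressOfFunctions, holds RVAs */
    size_t number_of_names;
    size_t number_of_functions;
} export_tables;

/* Return the RVA of the named export, or 0 if it is not exported. */
uint32_t resolve_export(const export_tables *t, const char *name)
{
    size_t lo = 0, hi;
    assert(t != NULL && name != NULL);
    hi = t->number_of_names;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, t->names[mid]);
        if (cmp == 0) {
            /* The name's position selects an index into the address table. */
            uint16_t index = t->name_ordinals[mid];
            return index < t->number_of_functions ? t->functions[index] : 0;
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}

/* Model of filling one IAT slot: real address = DLL base + export RVA. */
int bind_import(uint32_t *iat_slot, uint32_t dll_base,
                const export_tables *t, const char *name)
{
    uint32_t rva = resolve_export(t, name);
    assert(iat_slot != NULL);
    if (rva == 0)
        return 0;
    *iat_slot = dll_base + rva;
    return 1;
}
```

Only the slot written by bind_import changes; the name that was looked up stays where it was, which is why the names remain available through OriginalFirstThunk after loading.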

How do call instructions get generated for imported functions in a compiled module

I am not sure if I am phrasing the question correctly, but basically I want to know how the call instruction is generated when calling a function imported from another library.
For example,
GetModuleFileName(...)
is compiled to
call 0x4D0000
where 0x4D0000 is the address of the imported function, which is dynamic.
How does Windows set up those calls, and would it be possible to circumvent this and set a custom address instead?
The address used in the call statement isn't dynamic. It's a relative address that's fixed at link time, like a call to any other function. That's because the call is actually to a stub, and the stub performs an indirect jump to the real function. The indirect jump uses a memory operand that refers to a location in the import table. When the executable (or DLL) is loaded by Windows, the loader updates the import table with the addresses of all the functions the executable or DLL uses from the DLLs it's linked to.
So if an executable has a call instruction like this:
call _GetModuleFileNameA@12
Then somewhere else in the same executable is a stub like this:
_GetModuleFileNameA@12:
jmp [__imp__GetModuleFileNameA@12]
And somewhere in the import table there is a definition like this:
__imp__GetModuleFileNameA@12:
DD ?
Windows sets the value of __imp__GetModuleFileNameA@12 in the import table when the executable (or DLL) is loaded. There's not much you can do to change this, though it's not too hard to change the value after the executable (or DLL) has been loaded. Note that the import table might be located in a read-only section, meaning you may need to change the virtual memory protections in order to do this.
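The call, stub and import slot relationship can be imitated in portable C with a function pointer. This is only a model of the mechanism, with made-up names throughout (`imp_get_len`, `get_len_stub`, `loader_bind`, `patch_import`). In a real process the slot lives in the image's import table, and overwriting it may first require VirtualProtect to make the page writable.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*get_len_fn)(const char *);

/* Stands in for the real function that lives in another DLL. */
int dll_get_len(const char *s)
{
    int n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}

/* A custom replacement we would like calls to reach instead. */
int custom_get_len(const char *s)
{
    (void)s;
    return -1;
}

/* Models the import table slot, i.e. the __imp__ symbol. */
get_len_fn imp_get_len = NULL;

/* Models the stub: an indirect jump through the slot. */
int get_len_stub(const char *s)
{
    assert(imp_get_len != NULL);
    return imp_get_len(s);
}

/* Models what the loader does at load time: fill in the slot. */
void loader_bind(void)
{
    imp_get_len = dll_get_len;
}

/* Models patching the slot after load; returns the previous target. */
get_len_fn patch_import(get_len_fn replacement)
{
    get_len_fn previous = imp_get_len;
    imp_get_len = replacement;
    return previous;
}
```

Callers keep calling get_len_stub at its fixed address; only the slot content decides where the call ends up, which is why changing the slot after load redirects every call made through that stub.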

How can I use GetProcAddress() to load functions with unlimited function arguments?

I have searched the internet but haven't found an answer.
Previously we linked to the DLL at build time using a .def file.
That approach is no longer suitable, because there are cases when the DLL is not accessible.
So now we need to dynamically load a function that takes a variable number of arguments.
Is there a common approach? A push in the right direction or a related topic would be enough.
GetProcAddress does not care about the number of arguments the function has. If you use C++ and your problem is name mangling, you can either mark the functions with extern "C" or pass the mangled name to GetProcAddress.
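GetProcAddress returns an untyped function pointer, and it is the caller's cast that supplies the signature, ellipsis included. The sketch below models that without Windows headers: `lookup_proc` is a hypothetical stand-in for GetProcAddress, `generic_proc` plays the role of FARPROC, and `sum_ints` stands in for an exported variadic function. On 32-bit Windows a variadic function has to be __cdecl, so the function pointer type used for the cast would carry that convention.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* A variadic function as a DLL might export it: sums `count` ints. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    va_start(args, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}

/* Untyped function pointer, playing the role of FARPROC. */
typedef void (*generic_proc)(void);

/* Hypothetical stand-in for GetProcAddress: name in, untyped pointer out.
   It knows nothing about the arguments of the function it returns. */
generic_proc lookup_proc(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "sum_ints") == 0)
        return (generic_proc)sum_ints;
    return NULL;
}

/* The caller supplies the real signature, including the ellipsis. */
typedef int (*sum_ints_fn)(int, ...);
```

A caller would write `sum_ints_fn f = (sum_ints_fn)lookup_proc("sum_ints");`, check the result for NULL, and then call `f(3, 1, 2, 3)` like any other variadic function.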

Using GetProcAddress when the name might be decorated

What is the correct way to use GetProcAddress() on a 32-bit DLL? On Win32 there are three calling conventions: cdecl, stdcall and fastcall. If the function in the DLL is foo, they decorate the name as _foo, _foo@N and @foo@N respectively.
But if the author of the dll has used a .def file, then the exported name will be changed to just "foo" without any decoration.
This spells trouble for me because if I want to load foo from a dll that is using stdcall, should I use the decorated name:
void *h = LoadLibraryEx(L"foo.dll", NULL, 0);
GetProcAddress((HMODULE)h, "_foo@16");
Or the undecorated one:
void *h = LoadLibraryEx(L"foo.dll", NULL, 0);
GetProcAddress((HMODULE)h, "foo");
? Should I guess? I've looked at lots of 32 bit DLL files (stdcall and cdecl) and they all seem to export the undecorated name. But can you assume that is always the case?
There's really no shortcut or definitive rule here. You have to know the name of the function. The normal scenario is that you know at compile time the name of the function. In which case it does not matter whether the exported name is mangled, decorated, or indeed completely unrelated to the semantic name. Functions can be exported without names, by ordinal. Again, you need to know how the function was exported.
If you are presented with a header file for a library, and want to link to it with explicit linking (LoadLibrary/GetProcAddress) then you will need to find out the names of the function. Use a tool like dumpbin or Dependency Walker to do that.
Now, the other scenario which might lead to you asking the question is that you don't know the name at compile time. For instance, the name is provided by the user of your program in one way or another. Again, it is quite reasonable to require the user to know the exported name of the function.
Finally, you can parse the PE metadata of the executable file to enumerate its exported functions. This will give you a list of exported function names and exported function ordinals. This is what tools like dumpbin and Dependency Walker do.
If __declspec(dllexport) is used during compilation and __declspec(dllimport) in the header file, together with extern "C", then the C++ name mangling is removed and, for __cdecl functions, the export is the plain name. Function overloads, namespaces, and classes still require some way to distinguish them.
C++ class members are sometimes exported by ordinal instead of by their decorated names. To look up by ordinal, cast the ordinal as (char*)(unsigned short)ordinal, thus: GetProcAddress(module, (char*)(unsigned short)ordinal);
Edit: while most of the Windows API uses UTF-16, GetProcAddress only takes a narrow (char) string, so it cannot be given a wide character string.
Because the second byte of L"foo" is zero, GetProcAddress(module, (const char*)L"foo") behaves the same as GetProcAddress(module, "f");
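If you do end up guessing, the three 32-bit decorations from the question are easy to generate so that each candidate can be tried in turn. This is a sketch with hypothetical names (`call_conv`, `decorate_name`); it only builds strings and assumes you already know the total size of the arguments in bytes. As the first answer says, inspecting the DLL's export table is still the dependable way to find the name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { CONV_CDECL, CONV_STDCALL, CONV_FASTCALL } call_conv;

/* Write the 32-bit decorated form of `name` into `out`.
   arg_bytes is the total size of the arguments (ignored for cdecl).
   Returns 1 on success, 0 if the buffer is too small. */
int decorate_name(char *out, size_t out_size, const char *name,
                  call_conv conv, unsigned arg_bytes)
{
    int n;
    assert(out != NULL && name != NULL && strlen(name) > 0);
    switch (conv) {
    case CONV_STDCALL:   /* _foo@N */
        n = snprintf(out, out_size, "_%s@%u", name, arg_bytes);
        break;
    case CONV_FASTCALL:  /* @foo@N */
        n = snprintf(out, out_size, "@%s@%u", name, arg_bytes);
        break;
    default:             /* cdecl: _foo */
        n = snprintf(out, out_size, "_%s", name);
        break;
    }
    return n >= 0 && (size_t)n < out_size;
}
```

A loader routine could try the plain name first, since exports made through a .def file are undecorated, and fall back to the decorated candidates afterwards. Each candidate is an ordinary narrow string, which is what GetProcAddress expects.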

Resources