I have recently written a simple test program to call a few private functions in a DLL that ships with Windows. Since private functions are not exported, their address cannot be found with GetProcAddress(), and I am looking for a way to achieve the same without it.
With IDA Pro, I took note of the offset of an exported function, DllRegisterServer in this case. Since this function is exported, I can query its address with GetProcAddress. By knowing its offset from the beginning of the .text section of the PE executable, I can then dynamically query the address of the .text section after loading the DLL with LoadLibrary.
Again, still using IDA Pro, I took note of the offsets of each of the private functions of interest. Once I have the address of the .text section, I only need to add the offset for the known private functions and I now have the correct address to call those functions. There is only one problem, however: since the offset has been hardcoded, this will only work with this exact version of the DLL.
I know there are many tools that can read PE and give addresses of private functions. I assume that the debug symbols are used to query the names of those private functions. What I'm looking for is a simple library that can read PE and debug symbols and allow me to query by name the address of a private function.
Does anybody know of such a library that can help find the addresses of private function by names, making use if debug symbols when available?
Related
Sorry if the title is not very clear. I am using MinGW with GCC 6.3.0 to build a x86 32-bit DLL on Windows (so far). I'll spare you the details why I need hacky offsets amongst its sections accessible from code, so please do not ask if it's useful or not (because I don't want to bother explaining that).
So, if I can get the following testcase to work, I'm good. Here's my problem:
In a C++ file, I want to access a linker symbol as an absolute numeric value, not relocated, directly. Remember that I am building a 32-bit DLL which requires a .reloc section for relocations, but in this case I do NOT want relocation, in fact a relocation would screw it up completely.
Here's an example: retrieve the offset of say __imp__MessageBoxW#16 relative to __IAT_start__, in case you don't know what they are, __imp__MessageBoxW#16 is the relocated pointer to the actual function at runtime, and __IAT_start__ is a linker symbol in the default script file. Here's where it is defined:
.idata BLOCK(__section_alignment__) :
{
/* This cannot currently be handled with grouped sections.
See pe.em:sort_sections. */
KEEP (SORT(*)(.idata$2))
KEEP (SORT(*)(.idata$3))
/* These zeroes mark the end of the import list. */
LONG (0); LONG (0); LONG (0); LONG (0); LONG (0);
KEEP (SORT(*)(.idata$4))
__IAT_start__ = .;
KEEP (SORT(*)(.idata$5))
__IAT_end__ = .;
KEEP (SORT(*)(.idata$6))
KEEP (SORT(*)(.idata$7))
}
So far, no problem. Because GAS doesn't allow me to "subtract" two externally defined symbols (both symbols are defined in the linker), I have to define the symbol in the linker script, so at the end of the linker script I have this:
test_symbol = ABSOLUTE("__imp__MessageBoxW#16" - __IAT_start__);
Then in C++ I use this little inline asm to retrieve this relative difference which is supposed to be a fixed value once linked:
asm("movl $test_symbol, %0":"=r"(var));
Now var should contain that fixed number right? Wrong!
Because test_symbol is an "undefined" symbol as far as the assembler is concerned, it makes it relocated. Or I don't know why, but I tried so many things to force it to be an "absolute constant value symbol" instead of a "relocated symbol" to no avail. Even editing the linker script with many things like LD_FEATURE("SANE_EXPR") and others, doesn't work at all.
Its value is correct only if the DLL does not get relocated.
You see, either GNU LD or the assembler adds an entry in the .reloc section for that movl instruction, which is WRONG!
Is there a way to force it to treat an external/undefined symbol as a fixed CONSTANT and apply no relocation to it whatsoever? Basically, omit it from the .reloc section.
I am going crazy with this, please tell me there's something easy I overlooked, I searched for hours!
In other words, is there a way to use a Linker Symbol from within inline asm/C++ without having it relocated whatsoever? No entry to the .reloc section or anything, basically same as a constant like $1234. So if a DLL gets loaded into another base address, that constant would be the same everytime.
UPDATE: I forgot about this question but decided to bring an update, since it seems it's likely not possible as nobody even commented. For anyone else in the same boat as me, I presume this is a limitation of the COFF object format itself. In other words, external symbols are implicitly relocated, and it doesn't seem there's a way against this.
I didn't "fix" it the way I wanted, I did it in a very hacky way though. If anyone is interested, here's my ugly "hack":
First I put a special "custom" instruction in the inline assembly where I reference this external symbol from C++. This "custom" instruction holds a placeholder instruction that grabs the symbol (normal x86 asm instruction with a dummy constant, e.g. 1234) and a way to identify it. Then let GCC generate the assembly files (.S files), then I parse the assembly with a simple script and when I find that "custom" instruction I insert a label for the linker (make it .global) and at the same time add a directive to a custom "on-the-fly" generated linker script that gets included from my main linker script at the end.
This places data in a temporary section in the resulting DLL with absolute offsets to the custom instruction that I need, but without relocation.
Next, I parse the binary DLL itself, in particular that temporary section I added with all this hack. I take the offsets from there, convert them to file offsets, and modify the DLL's .text section directly where those offsets point (remember those placeholder instructions? it is replacing their immediate constants 1234 with the respective value from the linker's non-relocated constant). Then I strip the temporary section from the DLL, and it's done. Of course, all of this is done automatically by a helper program and script
It's an insane hack, but it works and it's fully automatic now that I got it going. If my assumption is correct that COFF doesn't support non-relocated external symbols, then it's really the only way to use linker constants from C++ without them being relocated, which would be a disaster.
How to know if the exported symbol from a dll is actually variable or a function ?
One way may be to look whether the destination address of the symbol resides in the .code section or not.
Another method could be to check the memory protection attributes of the selected section.
But all these methods seem to be unreliable.
What is the best way ?
I have been doing some x86 programming in Windows with NASM and I have run into some confusion. I am confused as to why I must do this:
extern _ExitProcess#4
Specifically I am confused about the '_' and the '#4'. I know that the '#4' is the size of the stack but why is it needed? When I looked in the kernel32.dll with a hex editor I only saw 'ExitProcess' not '_ExitProcess#4'.
I am also confused as to why C Functions do not need the underscore and the stack size such as this:
extern printf
Why don't C Functions need decorations?
My third question is "Is this the way I should be using these functions?" Right now I am linking with the actual dll files themselves.
I know that the '#4' is the size of the stack but why is it needed?
To enable the linker to report a fatal error if your compiler assumed the wrong calling convention for the function (this can happen if you forget to include header files in C and ignore all the compiler warnings or if a declaration doesn't exactly match the function in the shared library).
Why don't C Functions need decorations?
Functions that use the cdecl calling convention are decorated with a single leading (so it would actually be _printf).
The reason why no parameter size is encoded into the decorated name is that the caller is responsible for both setting up and tearing down the stack, so an argument count mismatch will not be fatal for the stack setup (though the calling function might still crash if it isn't given the right arguments, of course). It might even be possible that the argument count is variable, like in the case of printf.
When I looked in the kernel32.dll with a hex editor I only saw ExitProcess not _ExitProcess#4.
The mangled names are usually mapped to the actual exported names of the DLL using definition files (*.def), which then get compiled to *.lib import library files that can be used in your linker invocation. An example of such a definition file for kernel32.dll is this one. The following line defines the mapping for ExitProcess:
_ExitProcess#4 = ExitProcess
Is this the way I should be using these functions?
I don't know NASM very well, but the code I've seen so far usually specifies the decorated name, like in your example.
You can find more information on this excellent page about Win32 calling conventions.
I've been looking recently into creating a new native language. I understand the (very) basics of the PE format and I've grabbed an assembler with a fairly kind interface off the webs, which I've successfully used to implement some simple functions. But I've run into a problem using functions from a library. The only way that I've called library functions from a dynamically compiled function previously is to pass in the function pointer manually- something I can't do if I create PE files and execute them in their own process. Now, I'm not planning on using the CRT, but I will need access to the Win API to implement my own standard libraries. How do I generate a reference to a WinAPI function so that the PE loader will patch it up?
You need to write an import table. It's basically a list of function names that you wish to use in your application. It's pointed to by the PE header. The loader loads the DLL files into the process memory space for you, finds the requested function in their export table and leaves the address for it in the import table. You then usually dereference that and jmp there.
Check out Izelion's assembly tutorial for the full details and for asm examples.
How about starting by emitting C instead of assembly? Then writing directly to ASM is just an optimization.
I'm not being facetious: most compilers turn out some kind of intermediate code before the final native code pass.
I realize you're trying to get away from all the null-delmited rigmarole, but you'll need that for the WinAPI functions anyway.
Re-reading your question: you do realize that you can get the WinAPI function addresses by calling LoadLibrary(), then calling GetProcAddress(), and then setting up the call...right?
If you want to see how to bootstrap this from pure assembly: the old SDKs had ASM sample code, probably the new ones still do. If they don't, the DDK will.
I have the offset address's of all symbols (obtained with libelf executing on its own binary .so). Now, at runtime, I would need to calculate the absolutue address's of all those symbols and for that I would need to get the base address (where the shared library is loaded) and do a calculation:
symbol_address = base_address + symbol_offset
How can a shared lib get its own base address? On Windows I would use the parameter passed to DllMain, is there some equivalent in linux?
On Linux, dladdr() on any symbol from libfoo.so will give you
void *dli_fbase; /* Load address of that object */
More info here.
Alternatively, dl_iterate_phdr can give you load address of every ELF image loaded into current process.
Both are GLIBC extensions. If you are not using GLIBC, do tell what you are using, so more appropriate answer can be given.
This is an old question, but still relevant.
I found this example code from ubuntu to be very useful. It will print all your shared libraries and their segments.
http://manpages.ubuntu.com/manpages/bionic/man3/dl_iterate_phdr.3.html
After some research I managed to find out the method of discovering the address of the library loading by its descriptor, which is returned by the dlopen() function. It is performed with the help of such macro:
#define LIBRARY_ADDRESS_BY_HANDLE(dlhandle) ((NULL == dlhandle) ? NULL : (void*)*(size_t const*)(dlhandle))