Address fixup for calls to DLL functions is a multistage process: the linker directs the call instruction to an indirect jump instruction, and the indirect jump instruction to a word of memory in the import table in the .rdata section where the Windows program loader will place the address of the function when the DLL is loaded at runtime.
The indirect jump instruction must be generated by the linker because the compiler doesn't know the function will turn out to be in a DLL. Program file size is minimized by generating only one indirect jump instruction for each function, no matter how many places it's called from.
Given that, the obvious way to do it is to gather all the indirect jump instructions at the end of the text section, after all the compiler-generated code in all the object files, and that does seem to be what happens when I try a simple test case with the Microsoft linker /nodefaultlib switch (which generates a small enough executable that I can understand the full disassembly).
When I link a small program in the normal way with the C standard library, the resulting executable is large enough that I can't follow all of the disassembly, but as far as I can see, the indirect jump instructions seem to be scattered throughout the code in small groups of maybe three at a time.
Is there a reason for this that I'm missing?
The indirect jump instruction must be generated by the linker because
the compiler doesn't know the function will turn out to be in a DLL.
Actually, this is not always the case. If you mark the function with __declspec(dllimport), the compiler does know it will be a DLL import and in that case it can generate an indirect call:
; HMODULE = LoadLibrary("mylib");
push offset $SG66630
call [__imp__LoadLibraryA@4]
(__imp__LoadLibraryA@4 is the pointer to the import in the IAT)
If you do not use dllimport then the compiler generates a relative function call:
push offset $SG66630
call _LoadLibraryA@4
And in such a case the linker has to generate a jump stub:
LoadLibraryA proc near
jmp [__imp__LoadLibraryA@4]
LoadLibraryA endp
And, in fact, it does group such jump stubs together (though possibly by compile unit and/or imported DLL, not 100% sure here).
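To make the two call shapes concrete, here is a minimal Python sketch of how the instructions would be encoded in 32-bit x86 code. The addresses (CALL_SITE, STUB, IAT_SLOT) are hypothetical values chosen for illustration, not taken from any real binary, and the helper names are mine.

```python
import struct

def encode_call_rel32(call_addr: int, target: int) -> bytes:
    """E8 rel32: the displacement is counted from the end of the 5-byte call."""
    rel = (target - (call_addr + 5)) & 0xFFFFFFFF
    return b"\xE8" + struct.pack("<I", rel)

def encode_jmp_indirect(iat_slot: int) -> bytes:
    """FF 25 abs32 (32-bit mode): jmp dword ptr [iat_slot], i.e. the jump stub."""
    return b"\xFF\x25" + struct.pack("<I", iat_slot)

def encode_call_indirect(iat_slot: int) -> bytes:
    """FF 15 abs32 (32-bit mode): call dword ptr [iat_slot], the dllimport form."""
    return b"\xFF\x15" + struct.pack("<I", iat_slot)

# Hypothetical addresses, for illustration only.
CALL_SITE = 0x00401000   # where the compiler-generated call lives
STUB = 0x00401100        # where the linker placed the jump stub
IAT_SLOT = 0x00402000    # import table word filled in by the loader

without_dllimport = encode_call_rel32(CALL_SITE, STUB)   # call -> stub
stub_bytes = encode_jmp_indirect(IAT_SLOT)               # stub -> [IAT slot]
with_dllimport = encode_call_indirect(IAT_SLOT)          # call -> [IAT slot]
```

Without dllimport the call site costs 5 bytes plus one shared 6-byte stub per imported function. With dllimport each call site is 6 bytes and no stub is needed.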
Note: in the past, the linker did not explicitly generate jump stubs but took them from the import libraries. They contained complete object files with both the stubs and the structures necessary for generating the PE import directory. See this article for how it all worked: https://www.microsoft.com/msj/0498/hood0498.aspx
These days the import libraries have only the API and DLL names and the linker knows how to generate the necessary code and metadata for importing them.
How can I find the main method in the PE executable file, should I find the entry point address and start from that point or find three pushes of the stack in case the PE is written in C?
There are not going to be 3 pushes because main is not the real entry point on Windows. The compiler will insert extra code that initializes things and then calls main/WinMain. There is probably too much code between the real start and main to automate finding main. You would have to consider multiple versions of Visual Studio and MinGW. And some exe files do not use the C run-time at all and execute directly from the real entrypoint.
The entry point is a function that takes zero arguments. Its address is the load address of the .exe (Starting with MZ) + IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint.
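As a rough sketch of that calculation, the following Python reads AddressOfEntryPoint from an image's headers and adds it to a load address. The function names and the synthetic header built by make_fake_header are assumptions for illustration only. The offsets used (e_lfanew at 0x3C, a 4-byte PE signature, a 20-byte IMAGE_FILE_HEADER, AddressOfEntryPoint at offset 16 of the optional header) follow the PE format.

```python
import struct

def entry_point_rva(image: bytes) -> int:
    """Return IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint from raw header bytes."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ image")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)   # offset of "PE\0\0"
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = e_lfanew + 4 + 20    # skip signature + IMAGE_FILE_HEADER
    (rva,) = struct.unpack_from("<I", image, optional_header + 16)
    return rva

def entry_point_address(load_address: int, image: bytes) -> int:
    """Load address of the module (where MZ sits) plus the entry point RVA."""
    return load_address + entry_point_rva(image)

def make_fake_header(rva: int) -> bytes:
    """Build a tiny synthetic header (hypothetical, demonstration only)."""
    buf = bytearray(0x100)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)             # e_lfanew
    buf[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<I", buf, 0x80 + 24 + 16, rva)    # AddressOfEntryPoint
    return bytes(buf)

example = entry_point_address(0x00400000, make_fake_header(0x1234))
```

The address found this way is the CRT startup code in most programs, not main itself, as the answer above points out.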
When I compile C code with GCC for MIPS, the output contains code like:
daddiu $28,$28,%lo(%neg(%gp_rel(f)))
And I have trouble understanding instructions starting with %.
I found that they are called macros, and that the predefined macros depend on the assembler, but I couldn't find a description of these macros (%lo, %neg, etc.) in the gas documentation.
So is there any official documentation that explains the macros GCC uses when generating MIPS code?
EDIT: The snippet of the code comes from this code.
This is a very odd instruction to find in compiled C code, since this instruction is not just using $28/$gp as a source but also updating that register, which the compiler shouldn't be doing, I would think. That register is the global data pointer, which is set up on program start and used by all code accessing near global variables, so it shouldn't ever change once established. (Share a godbolt.org example, if you would.)
The functions you're referring to are for composing the address of labels that are located in global data. Unlike x86, MIPS cannot load (or otherwise have) a 32-bit immediate in one instruction, and so it uses multiple instructions to do work with 32-bit immediates including address immediates. A 32-bit immediate is subdivided into 2 parts — the top 16-bits are loaded using an LUI and the bottom 16-bits using an ADDI (or LW/SW instruction), forming a 2 instruction sequence.
MARS does not support these built-in functions. Instead, it uses the pseudo instruction la $reg, label, which is expanded by the assembler into such a sequence. MARS also allows lw $reg, label to directly access the value of a global variable; however, that also expands to a multiple-instruction sequence (sometimes 3 instructions, of which only 2 are really necessary).
%lo computes the low 16-bits of a 32-bit address for the label of the argument to the "function". %hi computes the upper 16-bits of same, and would be used with LUI. Fundamentally, I would look at these "functions" as being a syntax for the assembly author to communicate to the assembler to share certain relocation information/requirements to the linker. (In reverse, a disassembler may read relocation information and determine usage of %lo or %hi, and reflect that in the disassembly.)
I don't know %neg() or %gp_rel(), though could guess that %neg negates and %gp_rel produces the $28/$gp relative value of the label.
%lo and %hi are a bit odd in that the value of the high immediate sometimes is offset by +1 — this is done when the low 16-bits will appear negative. ADDI and LW/SW will sign extend, which will add -1 to the upper 16-bits loaded via LUI, so %hi offsets its value by +1 to compensate when that happens. This is part of the linker's operation since it knows the full 32-bit address of the label.
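The +1 compensation can be checked with a little arithmetic. Below is a minimal Python sketch, assuming the conventional formulation in which the high part is computed as (address + 0x8000) >> 16. The function names are mine, and the final helper only models what an LUI followed by a sign-extending ADDIU would leave in a 32-bit register.

```python
def hi16(addr: int) -> int:
    """Upper 16 bits, bumped by 1 when the low half will sign-extend negative."""
    return ((addr + 0x8000) >> 16) & 0xFFFF

def lo16(addr: int) -> int:
    """Lower 16 bits, as stored in the instruction's immediate field."""
    return addr & 0xFFFF

def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field the way ADDIU and LW/SW do."""
    return value - 0x10000 if value & 0x8000 else value

def lui_then_addiu(addr: int) -> int:
    """Model 'lui r, hi16(addr)' followed by 'addiu r, r, lo16(addr)'."""
    reg = (hi16(addr) << 16) & 0xFFFFFFFF
    return (reg + sign_extend16(lo16(addr))) & 0xFFFFFFFF

plain = 0x12341234       # low half is positive: no adjustment needed
adjusted = 0x12348765    # low half has bit 15 set: high half becomes 0x1235
```

For 0x12348765 the low half sign-extends to a negative number, so the high half is loaded as 0x1235 and the addition brings the register back down to the intended address.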
That generated code is super weird, and completely different from what the 32-bit version of the same compiler generates. I added the option -msym32 and then the generated code looks like I would expect.
So, this has something to do with the large(?) memory model on MIPS 64, using a multiple instruction sequence to locate and invoke g, and swapping the $28/$gp register as part of the call. Register $25/$t9 is somehow also involved as the generated code sources it without defining it; later, prior to where we would expect the call it sets $25.
One thing I particularly don't understand, though, is where is the actual function invocation in that sequence! I would have expected a jalr instruction, if it's using an indirect branch because it doesn't know where g is (except as data), but there's virtually nothing but loads and stores.
There are two additional oddities in the output: one is the blank line near where the actual invocation should be (maybe those are normal, but usually don't see those inside a function) and the other is a nop that is unnecessary but might have been intended for use in the delay slot following an invocation instruction.
Today, purely for testing purposes, I had the following idea: create and compile a naive program in CodeBlocks, using the Release target to leave out unnecessary debugging code, with a main function containing only three nop operations, so that I could find the entry point of main faster.
CodeBlocks sample naive program:
Using the IDA disassembler, I saw something strange: the OS can apparently add additional machine code calls to the main function (added implicitly), namely a call to a system function that resides in kernel32.dll and is used for OS thread handling.
IDA program view:
For test purposes only, the three "nop" (90) opcodes in the machine code were replaced by "and esp, 0FFFFFFF0h" and the program was patched again, which is why the "no operation" opcodes are not visible in the view.
Observed behaviour:
It is logical to create a new thread for each process that is opened; as we can see in Task Manager, a process runs in its own thread, which is the reason the compiler adds this code (the implicit default thread).
My questions:
How does the compiler know where to "inject" this call automatically?
Why is this call not made earlier, in the upper function (sub_401B8C) that routes to the main function's entry point?
To quote the gcc manual:
If no init section is available, when GCC compiles any function called
main (or more accurately, any function designated as a program entry
point by the language front end calling expand_main_function), it
inserts a procedure call to __main as the first executable code after
the function prologue. The __main function is defined in libgcc2.c and
runs the global constructors.
I want to generate shellcode using the following NASM code:
global _start
extern exit
section .text
_start:
xor rcx, rcx
or rcx, 10
call exit
The problem here is that I cannot use this because the address of exit function cannot be hard coded. So, how do I go about using library functions without having to re-implement them using system calls?
One way that I can think of, is to retrieve the address of exit function in a pre-processing program using GetProcAddress and substitute it in the shellcode at the appropriate place.
However, this method does not generate shellcode that can be run as it is. I'm sure there must be a better way to do it.
I am not an expert on writing shellcode, but you could try to find the import address table (IAT) of your target program and use the stored function pointers to call Windows functions.
Note that you would be limited to the functions the target program uses.
Also you would have to let your shellcode calculate IAT's position relative to the process's base address due to relocations. Of course you could rely on Windows not relocating, but this might result in errors in a few cases.
Another issue is that you would have to find the target process's base address from outside.
A totally different approach would be to use syscalls, but they are really hard to use, not to mention the danger of using them.
Information on PE file structure:
https://msdn.microsoft.com/en-us/library/ms809762.aspx
So recently I've been wanting to call some win32 calls from assembly, and I've been using NASM as my external assembler. I was calling SendMessage in my code in the following way:
call __imp__SendMessageW@16
This was assembled into a relative jump (0xE8 opcode) and the result was an access violation. In the debugger, the computed jump offset seemed to be the correct one (in that __imp__SendMessageW@16 really did seem to reside there) but nonetheless it did not work. Examining the assembly produced by Visual Studio when I called the function from C++, I noticed that it wasn't a relative immediate jump it was using, but instead (in the language of MASM) a call dword ptr [__imp__SendMessageW@16], corresponding to an 0xFF15 opcode. After some futzing around I figured out that NASM syntax encodes this as call dword near [dword __imp__SendMessageW@16], and after making the change my code suddenly worked.
My question is, why does one work and not the other? Is there some relocation of code going on that causes the relative immediate call to jump somewhere unfriendly? I've never been much of an assembly programmer but my impression was always that the two calls should do the same thing and the main difference is that one is position independent and the other is not (assuming that they move the IP to the same place). The relocation of code theory makes sense given that, but then how do you explain the debugger showing the right address?
Also: what's the logic behind the [] syntax in this call? The offset is still an immediate (just little endian encoded immediately after 0xFF15), there's no memory access going on here beyond the instruction fetch (I tend to think of [] as a dereference outside the context of lea).
call dword [__imp__SendMessageW@16]
__imp__SendMessageW@16 is an address in your imports section, and the word stored there is the address of the API function. You use the square brackets to dereference it (call the address stored at this address).
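The difference can be modelled in a few lines of Python. This is only a sketch of where each call form transfers control. IAT_SLOT, API_ENTRY, and CALL_ADDR are hypothetical values, and the memory dictionary stands in for the process address space.

```python
import struct

IAT_SLOT = 0x00402000     # hypothetical address of the __imp__ symbol
API_ENTRY = 0x7E3A1234    # hypothetical address the loader stored in that slot
CALL_ADDR = 0x00401000    # hypothetical address of the call instruction

# The import slot holds a 4-byte pointer (data), not instructions.
memory = {IAT_SLOT: struct.pack("<I", API_ENTRY)}

def target_of_call_rel32(call_addr: int, rel32: int) -> int:
    """E8 rel32: control goes to (end of instruction) + displacement."""
    return (call_addr + 5 + rel32) & 0xFFFFFFFF

def target_of_call_indirect(slot: int) -> int:
    """FF 15 [slot]: control goes to the pointer read from memory at slot."""
    (pointer,) = struct.unpack("<I", memory[slot])
    return pointer

rel32 = (IAT_SLOT - (CALL_ADDR + 5)) & 0xFFFFFFFF
direct_target = target_of_call_rel32(CALL_ADDR, rel32)   # the slot itself
indirect_target = target_of_call_indirect(IAT_SLOT)      # the API function
```

In this model the relative call really does land on the __imp__ symbol, which matches what the debugger showed, but the CPU then tries to execute the pointer bytes stored there as if they were instructions. The bracketed form reads those four bytes as data and transfers control to the function they point to.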