Low-level details on linking and loading of (PE) programs in Windows

Low-level details on linking and loading of (PE) programs in Windows - windows

Low-level details on linking and loading of (PE) programs in Windows.
I'm looking for an answer or tutorial that clarifies how a Windows program are linked and loaded into memory after it has been assembled.
Especially, I'm uncertain about the following points:
After the program is assembled, some instructions may reference memory within the .DATA section. How are these references translated, when the program is loaded into memory starting at some arbitrary address? Does RVA's and relative memory references take care of these issues (BaseOfCode and BaseOfData RVA-fields of the PE-header)?
Is the program always loaded at the address specified in ImageBase header field? What if a loaded (DLL) module specifies the same base?

First I'm going to answer your second question:
No, a module (being an exe or dll) is not allways loaded at the base address. This can happen for two reasons, either there is some other module already loaded and there is no space for loading it at the base address contained in the headers, or because of ASLR (Address Space Layout Randomization) which mean modules are loaded at random slots for exploit mitigation purposes.
To address the first question (it is related to the second one):
The way a memory location is refered to can be relative or absolute. Usually jumps and function calls are relative (though they can be absolute), which say: "go this many bytes from the current instruction pointer". Regardless of where the module is loaded, relative jumps and calls will work.
When it comes to addressing data, they are usually absolute references, that is, "access these 4-byte datum at this address". And a full virtual address is specified, not an RVA but a VA.
If a module is not loaded at its base address, absolute references will all be broken, they are no longer pointing to the correct place the linker assumed they should point to. Let's say the ImageBase is 0x04000000 and you have a variable at RVA 0x000000F4, the VA will be 0x040000F4. Now imagine the module is loaded not at its BaseAddress, but at 0x05000000, everything is moved 0x1000 bytes forward, so the VA of your variable is actually 0x050000F4, but the machine code that accessess the data still has the old address hardcoded, so the program is corrupted. In order to fix this, linkers store in the executable where these absolute references are, so they can be fixed by adding to them how much the executable has been displaced: the delta offset, the difference between where the image is loaded and the image base contained in the headers of the executable file. In this case it's 0x1000. This process is called Base Relocation and is performed at load time by the operating system: before the code starts executing.
Sometimes a module has no relocations, so it can't be loaded anywhere else but at its base address. See How do I determine if an EXE (or DLL) participate in ASLR, i.e. is relocatable?
For more information on ASLR: https://insights.sei.cmu.edu/cert/2014/02/differences-between-aslr-on-windows-and-linux.html
There is another way to move the executable in memory and still have it run correctly. There exists something called Position Independent Code. Code crafted in such a way that it will run anywhere in memory without the need for the loader to perform base relocations.
This is very common in Linux shared libraries and it is done addressing data relatively (access this data item at this distance from the instruction pointer).
To do this, in the x64 architecture there is RIP-relative addressing, in x86 a trick is used to emulate it: get the content of the instruction pointer and then calculate the VA of a variable by adding to it a constant offset.
This is very well explained here:
https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
I don't think PIC code is common in Windows, more often than not, Windows modules contain base relocations to fix absolute addresses when it is loaded somewhere else than its prefered base address, although I'm not exactly sure of this last paragraph so take it with a grain of salt.
More info:
http://opensecuritytraining.info/LifeOfBinaries.html
How are windows DLL actually shared? (a bit confusing because I didn't explain myself well when asking the question).
https://www.iecc.com/linker/
I hope I've helped :)

Related

What is there before ImageBase address in Virtual Address?

I know from the Microsoft documentation that the image base is set to 0x140000000 for 64-bit images and it is the base address where the executable file is first loaded into the memory.
So my questions are as follows
What comes before 0x140000000 address and starting of virtual address first page (0x0000000)
What does it mean by executable first loaded? Is it the entry point of the program (which is of course not the main function) or something else

What comes before 0x140000000 address and starting of virtual address first page (0x0000000)
Whatever happens to allocate there, like DLLs, file mappings, heap memory, or this memory can be free. The first page is always inaccessible.
What does it mean by executable first loaded? Is it the entry point of the program (which is of course not the main function) or something else
Loaded means mapped into memory. After it is mapped into memory, its imports are resolved, statically linked DLLs are mapped into memory, their entry points are executed, and only then it comes to the executable entry point. Executable entry point is not really the first function to execute from the executable if it has TLS callbacks.

I don't know the technical reason why the 64-bit default is so high, perhaps just to make sure your app does not have 32-bit pointer truncation bugs with data/code in the module? And it is important to note that this default comes from the Microsoft compiler, Windows itself will accept a lower value. The default for 32-bit applications is 0x00400000 and there are actual hardware and technical reasons for that.
The first page starting at 0 is off limits in most operating systems to prevent issues with de-referencing a NULL pointer. The first couple of megabytes might have BIOS/firmware or other legacy things mapped there.
By first loaded, it means the loader will map the file into memory starting at that address. First the MZ part (DOS header and stub code) and the PE header. After this comes the various sections listed in the PE header.
Most applications are using ASLR these days so the base address will be random and not the preferred address listed in the PE. ntdll and kernel32 are mapped before the exe so if you choose their base address you will also be relocated.

Why ELF executables have a fixed load address?

ELF executables have a fixed load address (0x804800 on 32-bit x86 Linux binaries, and 0x40000 on 64-bit x86_64 binaries).
I read the SO answers (e.g., this one) about the historical reasons for those specific addresses. What I still don't understand is why to use a fixed load address and not a randomized one (given some range to be randomized within)?

why to use a fixed load address and not a randomized one
Traditionally that's how executables worked. If you want a randomized load address, build a PIE binary (which is really a special case of shared library that has startup code in it) with -fPIE and link with -pie flags.
Building with -fPIE introduces runtime overhead, in some cases as bad as 10% performance degradation, which may not be tolerable if you have a large cluster or you need every last bit of performance.

not sure if I understood your question correct, but saying I did, that's sort-off a "legacy" / historical issue, ELF is the file format used by UNIX derived operating systems, both POSIX (IOS) and Unix-like (Linux).
and the elf format simply states that there must be some resolved and absolute virtual address that the code is loaded into and begins running from...
and simply that's how the file format is, and during to historical reasons that cant be changed... you couldn't just "throw" the executable in any memory address and have it run successfully, back in the 90's when the ELF format was introduced problems such as calling functions with virtual tables we're raised and it was decided that the elf format would have absolute addresses within it.
Also think about it, take a look at the elf format -https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
how would you design an OS executable-loader that would be able to handle an executable load it to ANY desired virtual address and have the code run successfully without actually having to change the binary itself... if you would like to do something like that you'd either need to vastly change the output compilers generate or the format itself, which again isn't possible
As time passed the requirement of position independent executing (PIE/PIC) has raised and shared objects we're introduced in order to allow that and ASLR
(Address Space Layout Randomization) - which means that the code could be thrown in any memory address and still be able to execute, that is simply implemented by making sure that all calls within the code itself are relative to the current address of the executed instruction, AND that when the shared object is loaded the OS loader would have to change some data within the binary given that the data changed is not executable instructions (R E) but actual data (RW, e.g .data segment), which also is implemented by calling functions from some "Jump tables" ( which would be changed at load time ) for example PLT / GOT.... those shared objects allow absolute randomization of the addresses the code is loaded to and if you want to execute some more "secure" code you'd have to compile it as a shared object and and dynamically link it and load time or run time..
( hope I've cleared some things out :) )

ELF, PIE ASLR and everything in between, specifically within Linux

Before asking my question, I would like to cover some few technical details I want to make sure I've got correct:
A Position Independent Executable (PIE) is a program that would be able to execute regardless of which memory address it is loaded into, right?
ASLR (Address Space Layout Randomization) pretty much states that in order to keep addresses static, we would randomize them in some manner,
I've read that specifically within Linux and Unix based systems, implementing ASLR is possible regardless of if our code is a PIE, if it is PIE, all jumps, calls and offsets are relative hence we have no problem.
If it's not, code somehow gets modified and addresses are edited regardless of whether the code is an executable or a shared object.
Now this leads me to ask a few questions
If ASLR is possible to implement within codes that aren't PIE and are executables AND NOT SHARED / RELOCATABLE OBJECT (I KNOW HOW RELOCATION WORKS WITHIN RELOCATABLE OBJECTS!!!!), how is it done? ELF format should hold no section that states where within the code sections are functions so the kernel loader could modify it, right? ASLR should be a kernel functionality so how on earth could, for example, an executable containing, for example, these instructions.
pseudo code:
inc_eax:
add eax, 5
ret
main:
mov eax, 5
mov ebx, 6
call ABSOLUTE_ADDRES{inc_eax}
How would the kernel executable loader know how to change the
addresses if they aren't stored in some relocatable table within the ELF
file and aren't relative in order to load the executable into some
random address?
Let's say I'm wrong, and in order to implement ASLR you must have a
PIE executable. All segments are relative. How would one compile a
C++ OOP code and make it work, for example, if I have some instance
of a class using a pointer to a virtual table within its struct,
and that virtual table should hold absolute addresses, hence I
wouldn't be able to compile a pure PIE for C++ programs that have
usage of run time virtual tables, and again ASLR isn't possible....
I doubt that virtual tables would contain relative addresses and
there would be a different virtual table for each call of some
virtual function...
My last and least significant question is regarding ELF and PIE — is there some special way to detect an ELF executable is PIE? I'm familiar with the ELF format so I doubt that there is a way, but I might be wrong. Anyway, if there isn't a way, how does the kernel loader know if our executable is PIE hence it could use ASLR on it.
I've got this all messed up in my head and I'd love it if someone could help me here.

Your question appears to be a mish-mash of confusion and misunderstanding.
A Position Independent Executable (PIE) is a program that would be able to execute regardless of which memory address it is loaded into, right?
Almost. A PIE binary usually can not be loaded into memory at arbitrary address, as its PT_LOAD segments will have some alignment requirements (e.g. 0x400, or 0x10000). But it can be loaded and will run correctly if loaded into memory at address satisfying the alignment requirements.
ASLR (Address Space Layout Randomization) pretty much states that in order to keep addresses static we would randomize them in some manner,
I can't parse the above statement in any meaningful way.
ASLR is a technique for randomizing various parts of address space, in order to make "known address" attacks more difficult.
Note that ASLR predates PIE binaries, and does not in any way require PIE. When ASLR was introduced, it randomized placement of stack, heap, and shared libraries. The placement of (non-PIE) main executable could not be randomized.
ASLR has been considered a success, and therefore extended to also support PIE main binary, which is really a specially crafted shared library (and has ET_DYN file type).
call ABSOLUTE_ADDRES{inc_eax}
how would the kernel executable loader know how to change the addresses if > they aren't stored in some relocatable table
Simple: on x86, there is no instruction to call ABSOLUTE_ADDRESS -- all calls are relative.
2 ... I wouldn't be able to compile a pure PIE for C++ programs that have usage of run time virtual tables, and again ASLR isn't possible..
PIE binary requires relocation, just like a shared library. Virtual tables in PIE binaries work exactly the same way they work in shared libraries: ld-linux.so.2 updates GOT (global offset table) before transferring control to the PIE binary.
3 ... is there some special way to detect an ELF executable is PIE
Simple: a PIE binary has ELF file type set to ET_DYN (a non-PIE binary will have type ET_EXEC). If you run file a.out on a PIE executable, you'll see that it's a "shared library".

What are possible reasons for not mapping Win32 Portable Executable images at offset 0?

I've been looking into Window's PE format lately and I have noticed that in most examples,
people tend to set the ImageBase offset value in the optional header to something unreasonably high like 0x400000.
What could make it unfavorable not to map an image at offset 0x0?

First off, that's not a default of Windows or the PE file format, it is the default for the linker's /BASE option when you use it to link an EXE. The default for a DLL is 0x10000000.
Selecting /BASE:0 would be bad choice, no program can ever run at that base address. The first 64 KB of the address space is reserved and can never be mapped. Primarily to catch null pointer dereference bugs. And expanded to 64KB to catch pointer bugs in programs that started life in 16-bits and got recompiled to 32-bits.
Why 0x40000 and not 0x10000 is the default is a historical accident as well and goes back to at least Windows 95. Which reserved the first 4 megabytes of the address space for the "16-bit/MS-DOS Compatibility Arena". I don't remember much about it, Windows 9x had a very different 16-bit VM implementation from NT. You can read some more about it in this ancient KB article. It certainly isn't relevant anymore these days, a 64-bit OS will readily allocate heap memory in the space between 0x010000 and 0x400000.
There isn't any point in changing the /BASE option for an EXE. However, there's lots of point changing it for a DLL. They are much more effective if they don't overlap and thus don't have to be relocated, they won't take any space in the paging file and can be shared between processes. There's even an SDK tool for it so you can change it after building, rebase.exe

Practically, the impact of setting /BASE to 0 depends on the Address Space Layout Randomization (ASLR) setting of your image (which is also put by the Linker - /DYNAMICBASE:NO).
Should your image have /BASE:0 and ASLR is on (/DYNAMICBASE:YES), then your image will start and run because the loader will automatically load it at a "valid" address.
Should your image have /BASE:0 and ASLR is off (/DYNAMICBASE:NO), then your image will NOT start because the loader will NOT load it at the desired based address (which is, as explained above, unvalid/reserved).

If you map it to address 0 then that means the code expects to be running starting at address zero.
For the OS, address zero is NULL, which is an invalid address.
(Not "fundamentally", but for modern-day OSes, it is.)
Also, in general you don't want anything in the lower 16 MiB of memory (even virtual), for numerous reasons.
But what's the alternative? It has to be mapped somewhere, so they chose 0x400000... no particular reason for that particular address, probably. It was probably just handy.

Microsoft chose that address as the default starting address specified by the linker at which the PE file will be memory mapped. The linker assumes this address and and can optimize the executable with that assumption. When the file is memory mapped at that address the code can be run without needing to modify any internal offsets.
If for some reason the file cannot be loaded to that location (another exe/dll already loaded there) relocations will need to occur before the executable can run which will increase load times.
Lower memory addresses are usually assumed to contain low level system routines and are generally left alone. The only real requirement for the ImageBase address is that it is a multiple of 0x10000.
Recommended reading:
http://msdn.microsoft.com/en-us/library/ms809762.aspx

How can I determine if Windows applies ASLR without rebooting?

As far as I understand, ASLR Address Space Layout Randomization will only do random relocation per system start (per reboot).
Address Space Layout Randomization (ASLR)
ASLR moves executable images into random locations when a system
boots, making it harder for exploit
code to operate predictably. (...)
If this is the case, how can I then "test" or, rather, check that ASLR is happening for my C++ module or for a system module (say, kernel32.dll) without repeatedly restarting Windows and hoping the randomness kicks in?

This is what I would try:
Remember that a module's HMODULE handle is actually the base address of the module's image. You can use GetModuleHandle to obtain this value. If you compare that to the base address in the image's optional header values, we would expect those two values to be different when ASLR is turned on.
Keep in mind that this would only be a clear indicator of ASLR when GetModuleHandle is used on certain system DLLs; it would work for kernel32 because it is not a typical candidate for image relocation:
Microsoft system DLLs are all given unique recommended base addresses; and
It is one of the first DLLs mapped into the process address space.
Since kernel32 wouldn't typically be relocated, if ASLR was turned off it would be reasonable to expect it to be loaded at its recommended base address.
How do you obtain the recommended base address from the image headers? The easiest way is to use the DUMPBIN utility included with Visual C++. If you'd rather do it programatically, you will need to do some spelunking through the executable image's headers until you locate the IMAGE_OPTIONAL_HEADER structure's ImageBase field. For more information about PE headers, I'd recommend "An In-Depth Look into the Win32 Portable Executable File Format" by Matt Pietrek.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio