How would one restore missing PE headers? - windows

I have a binary file which once was a valid PE executable, but all the headers were erased (DOS-header, PE-header and sections table). I managed to guess that one section is .text since if converted to asm in IDA it shows some valid asm code. .rdata was easy to find as well since it contains some strings which correspond to program's logic. But no further progress. I guess I'm not the first one to stumble upon this problem and there are tools/methods to generate PE headers. Any suggestions?

I think you will have some problem that you couldn't fix
the entry point ( where the binary begin)
the relocation (but you can fix the base adress to skip it)
the base adress (but in general it is always the same just need to know if it x86 or x64)
the library used it and the extern functions
perhaps the resourse for instance py2exe create a resource for the python bytecode
and last things bu certainly some other if you have a tls fls in the binary

Related

Where is _start symbol likely to be defined

I have some startup assembly for RISCV which defines the .text section as beginning at .globl _start.
I know what this is - as a disassembly shows me the address, but I cannot see where it is defined. It's not in the linker script and a grep in the build directories shows it is in various binary files, but I cannot find a definition.
I am guessing this appears in a file somewhere as a function of the architecture, but can anyone tell me where? (This is all being built using RISCV GNU cross compilers on Linux)
Unless you control it yourself there is usually at least in the gnu tools world a file called crt0.s. Or perhaps some other name. Should be one per architecture since it is in assembly language. It is the default bootstrap, zeros .bss copies .data as needed, etc.
I dont remember if it is part of the C library (glibc, newlib, etc), or if it is added on later by folks that build a toolchain targeting some specific platform.
Not required certainly but it is not uncommon to see _start be the label of the beginning of the binary, it is supposed to be the entry point certainly. So if you have an operating system/loader that uses a binary with labels present (elf, etc), then it can load the binary and instead of branching to the first address it branches to the entry point.
So the _start is merely defined as being at the start of the .text section, and the address of the .text section is defined in the linker script.

Low-level details on linking and loading of (PE) programs in Windows

Low-level details on linking and loading of (PE) programs in Windows.
I'm looking for an answer or tutorial that clarifies how a Windows program are linked and loaded into memory after it has been assembled.
Especially, I'm uncertain about the following points:
After the program is assembled, some instructions may reference memory within the .DATA section. How are these references translated, when the program is loaded into memory starting at some arbitrary address? Does RVA's and relative memory references take care of these issues (BaseOfCode and BaseOfData RVA-fields of the PE-header)?
Is the program always loaded at the address specified in ImageBase header field? What if a loaded (DLL) module specifies the same base?
First I'm going to answer your second question:
No, a module (being an exe or dll) is not allways loaded at the base address. This can happen for two reasons, either there is some other module already loaded and there is no space for loading it at the base address contained in the headers, or because of ASLR (Address Space Layout Randomization) which mean modules are loaded at random slots for exploit mitigation purposes.
To address the first question (it is related to the second one):
The way a memory location is refered to can be relative or absolute. Usually jumps and function calls are relative (though they can be absolute), which say: "go this many bytes from the current instruction pointer". Regardless of where the module is loaded, relative jumps and calls will work.
When it comes to addressing data, they are usually absolute references, that is, "access these 4-byte datum at this address". And a full virtual address is specified, not an RVA but a VA.
If a module is not loaded at its base address, absolute references will all be broken, they are no longer pointing to the correct place the linker assumed they should point to. Let's say the ImageBase is 0x04000000 and you have a variable at RVA 0x000000F4, the VA will be 0x040000F4. Now imagine the module is loaded not at its BaseAddress, but at 0x05000000, everything is moved 0x1000 bytes forward, so the VA of your variable is actually 0x050000F4, but the machine code that accessess the data still has the old address hardcoded, so the program is corrupted. In order to fix this, linkers store in the executable where these absolute references are, so they can be fixed by adding to them how much the executable has been displaced: the delta offset, the difference between where the image is loaded and the image base contained in the headers of the executable file. In this case it's 0x1000. This process is called Base Relocation and is performed at load time by the operating system: before the code starts executing.
Sometimes a module has no relocations, so it can't be loaded anywhere else but at its base address. See How do I determine if an EXE (or DLL) participate in ASLR, i.e. is relocatable?
For more information on ASLR: https://insights.sei.cmu.edu/cert/2014/02/differences-between-aslr-on-windows-and-linux.html
There is another way to move the executable in memory and still have it run correctly. There exists something called Position Independent Code. Code crafted in such a way that it will run anywhere in memory without the need for the loader to perform base relocations.
This is very common in Linux shared libraries and it is done addressing data relatively (access this data item at this distance from the instruction pointer).
To do this, in the x64 architecture there is RIP-relative addressing, in x86 a trick is used to emulate it: get the content of the instruction pointer and then calculate the VA of a variable by adding to it a constant offset.
This is very well explained here:
https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
I don't think PIC code is common in Windows, more often than not, Windows modules contain base relocations to fix absolute addresses when it is loaded somewhere else than its prefered base address, although I'm not exactly sure of this last paragraph so take it with a grain of salt.
More info:
http://opensecuritytraining.info/LifeOfBinaries.html
How are windows DLL actually shared? (a bit confusing because I didn't explain myself well when asking the question).
https://www.iecc.com/linker/
I hope I've helped :)

Why ELF executables have a fixed load address?

ELF executables have a fixed load address (0x804800 on 32-bit x86 Linux binaries, and 0x40000 on 64-bit x86_64 binaries).
I read the SO answers (e.g., this one) about the historical reasons for those specific addresses. What I still don't understand is why to use a fixed load address and not a randomized one (given some range to be randomized within)?
why to use a fixed load address and not a randomized one
Traditionally that's how executables worked. If you want a randomized load address, build a PIE binary (which is really a special case of shared library that has startup code in it) with -fPIE and link with -pie flags.
Building with -fPIE introduces runtime overhead, in some cases as bad as 10% performance degradation, which may not be tolerable if you have a large cluster or you need every last bit of performance.
not sure if I understood your question correct, but saying I did, that's sort-off a "legacy" / historical issue, ELF is the file format used by UNIX derived operating systems, both POSIX (IOS) and Unix-like (Linux).
and the elf format simply states that there must be some resolved and absolute virtual address that the code is loaded into and begins running from...
and simply that's how the file format is, and during to historical reasons that cant be changed... you couldn't just "throw" the executable in any memory address and have it run successfully, back in the 90's when the ELF format was introduced problems such as calling functions with virtual tables we're raised and it was decided that the elf format would have absolute addresses within it.
Also think about it, take a look at the elf format -https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
how would you design an OS executable-loader that would be able to handle an executable load it to ANY desired virtual address and have the code run successfully without actually having to change the binary itself... if you would like to do something like that you'd either need to vastly change the output compilers generate or the format itself, which again isn't possible
As time passed the requirement of position independent executing (PIE/PIC) has raised and shared objects we're introduced in order to allow that and ASLR
(Address Space Layout Randomization) - which means that the code could be thrown in any memory address and still be able to execute, that is simply implemented by making sure that all calls within the code itself are relative to the current address of the executed instruction, AND that when the shared object is loaded the OS loader would have to change some data within the binary given that the data changed is not executable instructions (R E) but actual data (RW, e.g .data segment), which also is implemented by calling functions from some "Jump tables" ( which would be changed at load time ) for example PLT / GOT.... those shared objects allow absolute randomization of the addresses the code is loaded to and if you want to execute some more "secure" code you'd have to compile it as a shared object and and dynamically link it and load time or run time..
( hope I've cleared some things out :) )

Relocation of PE DLLs - Load-time or like ELF?

It's my understanding (mainly from Wikipedia's article on the Portable Executable format), that Windows DLLs don't use position-independent code and instead have a link-time-defined preferred base address. In the event that two libraries' base addresses conflict, though, one needs to be relocated via its relocation table.
Is this PE relocation similar to ELF's GOT and PLT (process-local tables in the .data sections that require each absolute address to go through indirection), or is it more like a dynamic-relocation (at load-time all absolute addresses are translated)? If the latter, does this have problems on x64?
The situation is different between WIN32 and WIN64.
For WIN32 images where relocation information is present (non-EXEs, typically), all absolute addresses in the binary code each have a corresponding fixup record so that the address can be patched up by the loader in case the module's preferred load address has already been taken by something else.
For WIN64 images, the situation is similar in principle, but in reality nearly all 64-bit instructions actually use a position-independent encoding where offsets are IP-relative and not absolute, so far fewer relocation fixups are necessary (if at all).

The ImageBase's Address of Excutable file

Why does the ImageBase is difference between CFF and OllyDbg?
CFF is showing you static analysis, meaning it's just reading the file from disk not debugging it at runtime, showing you the "Preferred ImageBase" from the PE header, this is where the binary wants to be loaded at runtime.
When you do dynamic analysis such as with ollydbg it's showing you the values in the PE header at runtime, ImageBase will hold the current ImageBase, not the preferred one from disk. But when you read DllCharacteristics from the optional PE header you will see it is set to 0x8140.
The 0x0040 = IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE meaning Address Space Layout Randomisation is enabled, causing the Windows loader to load modules into random addresses in effort to make exploits more difficult because hard coded jmps and addresses will not be accurate.
The exploit developers now need to calculate everything at runtime, well unless you use Windows 8 or 10 because a bug has been causing ASLR to not work correctly...

Resources