Windows PE detemine entry point virtual address - windows

I am examining a Windows executable with 'PE Editor' which displays the Entry point as 0x15B8, how do we determine this entry point's address as a virtual address?

The entry point is stored relative to the load address of the module.
The module can state its preferred address by setting the ImageBase field in the IMAGE_OPTIONAL_HEADER (see this page). However, the OS is free to select another address, either because the preferred address is in use, or, these days, because of ASLR.
I'm not sure what environment you're running this on, but if you're doing this with a live running program: It's an implementation detail, but on NT you can cast an HMODULE into a pointer and that is the load address of the module. You can also read PE headers based on that. So for example you can add the AddressOfEntryPoint member to the address of the HMODULE and find an entry point... If rather than load time info, you'd like something like a byte offset into the file, you'll have to parse the section headers to find where in the file it goes.

Related

What is there before ImageBase address in Virtual Address?

I know from the Microsoft documentation that the image base is set to 0x140000000 for 64-bit images and it is the base address where the executable file is first loaded into the memory.
So my questions are as follows
What comes before 0x140000000 address and starting of virtual address first page (0x0000000)
What does it mean by executable first loaded? Is it the entry point of the program (which is of course not the main function) or something else
What comes before 0x140000000 address and starting of virtual address first page (0x0000000)
Whatever happens to allocate there, like DLLs, file mappings, heap memory, or this memory can be free. The first page is always inaccessible.
What does it mean by executable first loaded? Is it the entry point of the program (which is of course not the main function) or something else
Loaded means mapped into memory. After it is mapped into memory, its imports are resolved, statically linked DLLs are mapped into memory, their entry points are executed, and only then it comes to the executable entry point. Executable entry point is not really the first function to execute from the executable if it has TLS callbacks.
I don't know the technical reason why the 64-bit default is so high, perhaps just to make sure your app does not have 32-bit pointer truncation bugs with data/code in the module? And it is important to note that this default comes from the Microsoft compiler, Windows itself will accept a lower value. The default for 32-bit applications is 0x00400000 and there are actual hardware and technical reasons for that.
The first page starting at 0 is off limits in most operating systems to prevent issues with de-referencing a NULL pointer. The first couple of megabytes might have BIOS/firmware or other legacy things mapped there.
By first loaded, it means the loader will map the file into memory starting at that address. First the MZ part (DOS header and stub code) and the PE header. After this comes the various sections listed in the PE header.
Most applications are using ASLR these days so the base address will be random and not the preferred address listed in the PE. ntdll and kernel32 are mapped before the exe so if you choose their base address you will also be relocated.

Why relocations (.reloc section) in executable file?

I wonder why some Windows executables do have relocations. Why is there a need for it when an executable always can be loaded at any virtual address, unlike a DLL?
yes, relocation in EXE is optional and can be stripped. but if we want /DYNAMICBASE - generate an executable image that can be randomly rebased at load time by using the address space layout randomization (ASLR) - we need relocs. so this i be say only for security reasons. like security cookies in stack, Control Flow Guard and etc.. - all this is optional but used
that because of pointers and address reference look to this code :
int i;
int *ptr = &i;
If the linker assumed an image base of 0x10000, the address of the variable i will end up containing something like 0x12004. At the memory used to hold the pointer "ptr", the linker will have written out 0x12004, since that's the address of the variable i. If the loader for whatever reason decided to load the file at a base address of 0x70000, the address of i would be 0x72004. The .reloc section is a list of places in the image where the difference between the linker assumed load address and the actual load address needs to be factored in.

Low-level details on linking and loading of (PE) programs in Windows

Low-level details on linking and loading of (PE) programs in Windows.
I'm looking for an answer or tutorial that clarifies how a Windows program are linked and loaded into memory after it has been assembled.
Especially, I'm uncertain about the following points:
After the program is assembled, some instructions may reference memory within the .DATA section. How are these references translated, when the program is loaded into memory starting at some arbitrary address? Does RVA's and relative memory references take care of these issues (BaseOfCode and BaseOfData RVA-fields of the PE-header)?
Is the program always loaded at the address specified in ImageBase header field? What if a loaded (DLL) module specifies the same base?
First I'm going to answer your second question:
No, a module (being an exe or dll) is not allways loaded at the base address. This can happen for two reasons, either there is some other module already loaded and there is no space for loading it at the base address contained in the headers, or because of ASLR (Address Space Layout Randomization) which mean modules are loaded at random slots for exploit mitigation purposes.
To address the first question (it is related to the second one):
The way a memory location is refered to can be relative or absolute. Usually jumps and function calls are relative (though they can be absolute), which say: "go this many bytes from the current instruction pointer". Regardless of where the module is loaded, relative jumps and calls will work.
When it comes to addressing data, they are usually absolute references, that is, "access these 4-byte datum at this address". And a full virtual address is specified, not an RVA but a VA.
If a module is not loaded at its base address, absolute references will all be broken, they are no longer pointing to the correct place the linker assumed they should point to. Let's say the ImageBase is 0x04000000 and you have a variable at RVA 0x000000F4, the VA will be 0x040000F4. Now imagine the module is loaded not at its BaseAddress, but at 0x05000000, everything is moved 0x1000 bytes forward, so the VA of your variable is actually 0x050000F4, but the machine code that accessess the data still has the old address hardcoded, so the program is corrupted. In order to fix this, linkers store in the executable where these absolute references are, so they can be fixed by adding to them how much the executable has been displaced: the delta offset, the difference between where the image is loaded and the image base contained in the headers of the executable file. In this case it's 0x1000. This process is called Base Relocation and is performed at load time by the operating system: before the code starts executing.
Sometimes a module has no relocations, so it can't be loaded anywhere else but at its base address. See How do I determine if an EXE (or DLL) participate in ASLR, i.e. is relocatable?
For more information on ASLR: https://insights.sei.cmu.edu/cert/2014/02/differences-between-aslr-on-windows-and-linux.html
There is another way to move the executable in memory and still have it run correctly. There exists something called Position Independent Code. Code crafted in such a way that it will run anywhere in memory without the need for the loader to perform base relocations.
This is very common in Linux shared libraries and it is done addressing data relatively (access this data item at this distance from the instruction pointer).
To do this, in the x64 architecture there is RIP-relative addressing, in x86 a trick is used to emulate it: get the content of the instruction pointer and then calculate the VA of a variable by adding to it a constant offset.
This is very well explained here:
https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
I don't think PIC code is common in Windows, more often than not, Windows modules contain base relocations to fix absolute addresses when it is loaded somewhere else than its prefered base address, although I'm not exactly sure of this last paragraph so take it with a grain of salt.
More info:
http://opensecuritytraining.info/LifeOfBinaries.html
How are windows DLL actually shared? (a bit confusing because I didn't explain myself well when asking the question).
https://www.iecc.com/linker/
I hope I've helped :)

How to get relocated base address in PE Files?

I'm trying to make Simple PE Packer. My PE Viewer show me base address 0x40000000, but OllyDbg show me 0x01900400 or other address.
I guess that it is address relocation.
how to get relocated address ?
what do make packer simple sequence ?
A PE file has a preferred base address. If you're writing a PE Viewer, then it sounds like it will analyze the PE file only. This is a static analysis, so you'll only get the preferred base address, which is 0x40000000.
OllyDbg is a debugger, which is a totally different thing than a PE Viewer. A debugger performs a dynamic analysis at runtime. At runtime, the PE file might have been loaded to a different address, since the preferred address was already used.
So, in my opinion, your PE Viewer program does what it should do - except if you wanted to write a debugger.
Thomas has already explained that base address is just a preferred address, it does not guarantee you that file will load on that address only.
Still, in most of the cases, it should be 400000. If you are using Windows XP, this condition is satisfied most of the times. But from Windows Vista and Windows 7, a new concept known as ASLR was introduced.
When you see some other address while the file is loaded in debugger, it is because of ASLR(Address Space Randomization).
What is ASLR?
Address Space Layout Randomization calculates the address of PE files in memory based on the processor's timestamp counter.
Formula = ([SHR4(Timestamp Counter) mod 254] + 1)*64KB
**need confirmation for formula
Why ASLR?
The main motto behind ASL was to stop malware authors from using various flaws of memory structures like buffer overflow, etc. The randomly arranged memory structures and modules made the guessing of memory addresses(where they wished to put malicious code) difficult for them.
Now, coming back to your question: How to get relocated address:
If you can find the CPU timestamp, which I doubt is possible, you can calculate the base location of executable.How to bypass ASLR
Otherwise, you cannot retrieve this address(for Windows Vista and above) after ASLR from a PE file structure.
Also, you can refer to this:
https://security.stackexchange.com/questions/18556/how-do-aslr-and-dep-work

About the entry point of PE in Windows

Is it always at the lowest address of code section?
No, not necessarily. The PE entry point is defined in the IMAGE_OPTIONAL_HEADER structure, in the AddressOfEntryPoint field:
A pointer to the entry point function, relative to the image base address. For executable files, this is the starting address. For device drivers, this is the address of the initialization function. The entry point function is optional for DLLs. When no entry point is present, this member is zero.
A linker can set this to be whatever it wants to be, as long as its a valid relative virtual offset into the PE. Some compilers and linkers might have the convention of putting the entry point at the beginning of the text/code section, but there's no OS or PE format requirement for it.

Resources