How can i calculate the file offset of the memory virtual address of the export table? - portable-executable

so, i was trying to read a DLL file, everything was fine till i reach the Optional Header Data Directories, specifically its first member, the Export Table.
My problem is that i can't move the offset of my reader because the virtual address member is based on memory VA, and my reader is based on file offset. May a visual example helps:
As you can see, the loaded virtual address that this PE viewer reads at the Export Table Address from the Data Directory(Optional Header) is the value 0x00002630(lets refer to it as hex1 from now on).
However, when i click on the Export Table to see the actual content, the program does the conversion of this address from memory to file offset, redirecting me to this address as the result:
The address that it redirects me is the 0x00001a30(lets refer to it as hex2 from now on).
I did some tests on my own like dividing the hex1 per 8 because i thought it could be the transition from memory alignment which is 4096 and the file alignment which is 512 but it didn't gave me the same result as hex2. I also did some weird stuff to try to get that formula, but it gave me even more bizarre results.
So, my question would be, how could i get/calculate that file offset(hex2) if i only know the memory offset at the Data Directory(hex1)?

Assuming you are using MSVC C/C++, you first need to locate the array of IMAGE_SECTION_HEADER structures following the Optional Header. The SDK has a macro called IMAGE_FIRST_SECTION(pNtHeaders) in which you just pass the pointer of your PE header to make this process easier. It basically just skips past the optional header in memory which is where the section headers start. This macro will also work on either 32-bit or 64-bit Windows PE files.
Once you have the address of the IMAGE_SECTION_HEADER array, you loop through the structures up to FileHeader.NumberOfSections using pointer math. Each of the structures describe the relative starting of a memory address (VirtualAddress) for each of the named PE sections along with the file offset (PointerToRawData) to that section within the file you have loaded.
The size of the section WITHIN the file is SizeOfRawData. At this point, you now have everything you need to translate any given RVA to a file offset. First range check each IMAGE_SECTION_HEADER's VirtualAddress with the RVA you are looking up. I.e.:
if (uRva >= pSect->VirtualAddress && (uRva < (pSect->VirtualAddress + pSect->SizeOfRawData))
{
//found
}
If you find a matching section, you then subtract the VirtualAddress from your lookup RVA, then add the PointerToRawData offset:
uFileOffset = uRva - pSect->VirtualAddress + pSect->PointerToRawData
This results in an offset from the beginning of the file corresponding to that RVA. At this point you have translated the RVA to a file offset.
NOTE: Due to padding, incorrect PE files, etc., you may find not all RVAs will map to a location within the file at which point you might display an error message.

Related

What parts of a PE file are mapped into memory by the MS loader?

What parts of a PE file are mapped into memory by the MS loader?
From the PE documentation, I can deduce the typical format of a PE executable (see below).
I know, by inspection, that all contents of the PE file, up to and including the section headers, gets mapped into memory exactly as stored on disk.
What happens next?
Is the remainder of the file also mapped (here I refer to the Image Pages part in the picture below), so that the whole file is in memory exactly like stored on disk, or is the loader more selective than that?
In the documentation, I've found the following snippet:
Another exception is that attribute certificate and debug information
must be placed at the very end of an image file, with the attribute
certificate table immediately preceding the debug section, because the
loader does not map these into memory. The rule about attribute
certificate and debug information does not apply to object files,
however.
This is really all I can find about loader behavior; it just says that these two parts must be placed last in the file, since they don't go into memory.
But, if the loader loads everything except these two parts, and I set the section RVA's suffiently high, then section data will actually be duplicated in memory (once in the mapped file and once for the position specified by the RVA)?
If possible, link to places where I can read further about loading specific to MS Windows.
Finding this information is like an egg hunt, because MS always insists on using its own terminology when the COFF description uses AT&T terms.
What parts of a PE file are mapped into memory by the MS loader?
It depends.
All sections covered by a section header are mapped into the run-time address space.
However sections that have an RVA of 0 are not mapped and thus never loaded.
Each debug directory entry identifies the location and size of a block of debug information. The RVA specified may be 0 if the debug information is not covered by a section header (i.e., it resides in the image file and is not mapped into the run-time address space). If it is mapped, the RVA is its address.
Memory contains an exact replica of the file on disk.
Note that executables and dll's are mapped into virtual memory, not physical!
As you access the executable parts of it are swapped into RAM as needed.
If a section is not accessed then it obviously does not get swapped into physical RAM, it is however still mapped into virtual memory.
You can read up on everything you might ever want to know about PE files (and more) on MSDN.
Your quote is lifted from the documentation of the COFF file format.
The critical part is:
The rule on attribute certificate and debug information does not apply to object files.
From: https://support.microsoft.com/en-us/kb/121460
Size: Size of the optional header, which is included for executable files but not object files. An object file should have a value of 0 here.
Ergo: executable files or not object files, they are image files.
as such the exception to the rule does not apply to them.

IDA Pro disassembly shows ? instead of hex or plain ascii in .data?

I am using IDA Pro to disassemble a Windows DLL file. At one point I have a line of code saying
mov esi, dword_xxxxxxxx
I need to know what the dword is, but double-clicking it brings me to the .data page and everything is in question marks.
How do I get the plain text that is supposed to be there?
If you see question marks in IDA, this means that there's no physical data at this location on the file (on your disk drive).
Sections in PE files have a physical size (given by the SizeOfRawData field of the section header). This physical size (on disk) might be different from the size of the section once it is mapped onto the process memory by the Windows' loader (this size is given by the VirtualSize field of the section header).
So, if the VirtualSize field is bigger than the SizeOfRawData field, a part of the section has no physical existence and it exists only in memory (once the file is mapped onto the process address space).
On most case, at program entry point, you can assume this memory is filled with 0 (but some parts of the memory might be written by the windows loader).
To get the locations where the data is being written, read or loaded you can use cross-references (xref). Here's an example :
Click on the name of the data from which you want the xref :
Then press 'x', you'll be shown all known (to ida) location where the data is used :
The second column indicates how the data is used:
r means it is read
w means it is written
o means it is loaded as a pointer

How do I make space for my code cave in a Windows PE 32bit executable

So I want to make a space for my code caves in minesweeper.exe (typical Windows XP minesweeper game, link: Minesweeper). So I modified the PE header of the file via CFF Explorer to increase size of the .text section.
I tried increasing raw size of .text segment by 1000h (new size was 3B58), but Windows was unable to locate the entry point and the game failed to launch. Then I tried increasing the size of the .rsrc section, adding a new section, increasing the image size, but none of those attempts were successful, Windows was saying that "This is not x32 executable".
So here is the question: how do I make space for my code cave? I don't want to search for empty space left by the compiler, I want to have nice and clean 1000h bytes for my code. A tutorial for that and a detailed explanation for how to do that without corrupting a game would be GREAT! (And yes, I am actually hacking a minesweeper)
You can't increase the size of a section without invalidating the following ones (typically because it invalidates offsets and addresses in those sections). This remains possible but it's extremely error prone and doesn't worth the hassle when you have a simpler solution.
Typically, you juste need to add a section at the end of the PE and jump there from the code section. There is usually a little bit of space at the end of the code section (code cave) so you can place your JMPs (or a little code stub) there to redirect to the new section. You can also add other new sections for data or new resources or whatever you want.
Note: I'm using two tools: CFF explorer as a PE browser; an hex editor.
This file is quite particular so it is a little bit harder than usual to add a new section.
Let's start!
Below is an hex view of the array of IMAGE_SECTION_HEADER:
Usually there is some room to add a new section but in this particular case, there's none... The last section header is immediately followed by something.
Judging by the content, this is probably a bound import directory, which is confirmed in CFF explorer (offset of the bound directory is 0x248):
Bound import directory are of no use today, especially with ASLR, so we can zero out the whole directory (its size is 0xA8 bytes as indicated in the previous screenshot):
You can also zero out the Bound Import directory RVA in the Data Directories although this is not strictly required:
Now, it's time to add the new section.
Add a new section
Minesweeper comes with 3 sections by default, so increment the Number of sections from 3 to 4:
Go to the sections headers and add a new section (you can do it directly in CFF explorer; I named mine, .foobar, be wary that section names are at most 8 characters and don't need to end with a NULL byte):
You need to choose two numbers:
The raw size of the new section (I picked 0x400) ; it must be a multiple of FileAlignment (which is 0x200 in this case).
The virtual size of the new section (I picked 0x1000); it must be a multiple of SectionAlignement (which is 0x1000 for this binary).
Now we" need to calculate the two other members, Virtual Address and Raw Address.
Virtual Address
Let's take an example with the first and second section.
The first section starts at virtual address 0x1000 and has a virtual Size of 0x3A56. The next section virtual address must be aligned on SectionAlignement (0x1000) so the calculation is (using python here):
>>> def round_up_multiple_of(number, multiple):
num = number + (multiple - 1)
return num - (num % multiple)
>>> hex(round_up_multiple_of(0x1000 + 0x3a56, 0x1000))
'0x5000'
Which gives 0x5000 which is right (.data section starts at virtual address 0x5000).
Now, where our last section should start?
.rsrc section starts at 0x6000 and has a size of 0x19160:
>>> hex(round_up_multiple_of(0x6000 + 0x19160, 0x1000))
'0x20000'
So it must start at virtual address 0x20000. Put that number in Virtual Address.
Raw address
(Typically this is not needed as all sections are already aligned the last section must start right at the end of the file, but we'll do it).
As a reminder, the raw address is an address in the file (not in memory).
Let's start with an example (first and second section):
The first section raw address is 0x400 and its raw size 0x3c00. FileAlignement is 0x200, thus:
>>> hex(round_up_multiple_of(0x400 + 0x3c00, 0x200))
'0x4000'
The second section should start on the file (its Raw address) at 0x4000 which is right.
Thus for our new section, the calculation is:
.rsrc section starts in the file at 0x4200
.rsrc section size on file is 0x19200
FileAligment is 0x200
The calculation is as follow:
>>> hex(round_up_multiple_of(0x4200 + 0x19200, 0x200))
'0x1d400'
Our last section starts at the raw address 0x1d400 in the file which is confirmed with an hex editor:
Final steps
One last step is required, the calculation of the SizeOfImage field in the Optional header. According to the PE specification the field is:
The size (in bytes) of the image, including all headers, as the image
is loaded in memory. It must be a multiple of SectionAlignment.
Hence the calculation can be simplified as: VirtualAddress + VirtualSize of the last section, aligned on SectionAlignment (0x1000):
>>> hex(round_up_multiple_of(0x20000 + 0x1000, 0x1000))
'0x21000'
Now, save all your modifications in CFF explorer and exit.
Adding room for the new section
The last step is to add the required bytes for the last section. As I choose a Raw size of 0x400, I insert 0x400 bytes at Raw Address (0x1d400) with an hex editor.
Save you file. If you followed all the steps it must work (tested on Win 10) as is and you can start the modified executable without any errors.
Try to experience with a different raw size for the new section if 0x400 is not enough.
Now you have a new empty section, the rest is up to you for modifying the code :)

How does a PE file get mapped into memory?

So I have been reasearching the PE format for the last couple days, and I still have a couple of questions
Does the data section get mapped into the process' memory, or does the program read it from the disk?
If it does get mapped into its memory, how can the process aqquire the offset of the section? ( And other sections )
Is there any way the get the entry point of a process that has already been mapped into the memory, without touching the file on disk?
Does the data section get mapped into the process' memory
Yes. That's unlikely to survive for very long, the program is apt to write to that section. Which triggers a copy-on-write page copy that gets the page backed by the paging file instead of the PE file.
how can the process aqquire the offset of the section?
The linker already calculated the offsets of variables in the section. It might be relocated, common for DLLs that have an awkward base address that's already in use when the DLL gets loaded. In which case the relocation table in the PE file is used by the loader to patch the addresses in the code. The pages that contain such patched code get the same treatment as the data section, they are no longer backed by the PE file and cannot be shared between processes.
Is there any way the get the entry point of a process
The entire PE file gets mapped to memory, including its headers. So you can certainly read IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint from memory without reading the file. Do keep in mind that it is painful if you do this for another process since you don't have direct access to its virtual address space. You'd have to use ReadProcessMemory(), that's fairly little joy and unlikely to be faster than reading the file. The file is pretty likely to be present in the file system cache. The Address Space Layout Randomization feature is apt to give you a headache, designed to make it hard to do these kind of things.
Does the data section get mapped into the process' memory, or does the program read it from the disk?
It's mapped into process' memory.
If it does get mapped into its memory, how can the process aqquire the offset of the section? ( And other sections )
By means of a relocation table: every reference to a global object (data or function) from the executable code, that uses direct addressing, has an entry in this table so that the loader patches the code, fixing the original offset. Note that you can make a PE file without relocation section, in which case all data and code sections have a fixed offset, and the executable has a fixed entry point.
Is there any way the get the entry point of a process that has already been mapped into the memory, without touching the file on disk?
Not sure, but if by "not touching" you mean not even reading the file, then you may figure it out by walking up the stack.
Yes, all sections that are described in the PE header get mapped into memory. The IMAGE_SECTION_HEADER struct tells the loader how to map it (the section can for example be much bigger in memory than on disk).
I'm not quite sure if I understand what you are asking. Do you mean how does code from the code section know where to access data in the data section? If the module loads at the preferred load address then the addresses that are generated statically by the linker are correct, otherwise the loader fixes the addresses with relocation information.
Yes, the windows loader also loads the PE Header into memory at the base address of the module. There you can file all the info that was in the file PE header - also the Entry Point.
I can recommend this article for everything about the PE format, especially on relocations.
Does the data section get mapped into the process' memory, or does the
program read it from the disk?
Yes, everything before execution by the dynamic loader of operating systems either Windows or Linux must be mapped into memory.
If it does get mapped into its memory, how can the process acquire the
offset of the section? ( And other sections )
PE file has a well-defined structure which loader use that information and also parse that information to acquire the relative virtual address of sections around ImageBase. Also, if ASLR - Address randomization feature - was activated on the system, the loader has to use relocation information to resolve those offsets.
Is there any way the get the entry point of a process that has already
been mapped into the memory, without touching the file on disk?
NOPE, the loader of the operating system for calculation of OEP uses ImageBase + EntryPoint member values of the optional header structure and in some particular places when Address randomization is enabled, It uses relocation table to resolve all addresses. So we can't do anything without parsing of PE file on the disk.

Import Table in PE (.exe)

I found the pointer for the "Import Table" field. Which is 8 bytes in size and is divided into Virtual Address and Size. However the value in Virtual Address field is to big and is misleading my efforts to extract any information relating the whereabouts for entries relating to the Import Table. Is the value pointing to the offset, if so the (.exe) file finishes before reaching the desired offset.
The RVA (relative virtual address) of the Import Directory in the Directory Table must be valid. Perhaps your conversion of it into a physical offset is malfunctioning. Of course, that is done by traversing the section table to find the containing section. Then subtract that section's starting RVA from the target RVA. Then simply add the physical offset of the section to this result. That will give you the position within the file of the Import Directory. Conversions to and from RVAs to physical offsets may be necessary often if you are working with the on-disk file. If working with the in-memory image, sometimes protection utilities modify or destroy parts of the PE header in memory to deter dumping.
Once you get to the Import Directory you still have more work to do. A quick Google showed a better explanation than I'm likely to write here: http://sandsprite.com/CodeStuff/Understanding_imports.html
I am the author of the now 'classic' PECompact, PEBundle (now discontinued), and other PE manipulation utilities.

Resources