Loading of ELF executable - loading

On Pages 2-7 and 2-8 of the specification of the ELF file format there are two pictures giving an example of an executable's program headers and how they are going to be loaded into memory:
The specification explains:
Although the example’s file offsets and virtual addresses are
congruent modulo 4 KB for both text and data, up to four file pages
hold impure text or data (depending on page size and file system block
size).
The first text page contains the ELF header, the program header table, and other information.
The last text page holds a copy of the beginning of data.
The first data page has a copy of the end of text.
The last data page may contain file information not relevant to the running process.
My questions are:
What does the i-th "text page" and "data page" mean?
What do the 2nd and 3rd items in the above four statements mean?
Why does the data padding appears right after the text segment, while the text padding appears before the data segment, making an interleaved layout?
What if the executable has more than two segments (other than text and data) to load?

A page is the smallest mappable unit of virtual memory. If you are not familiar with the basics, see the wikipedia article on virtual memory. On common systems, the size of a page is 4096 bytes, or 0x1000 in hex.
A "text page" contain executable code; a "data page" contains data. These must be mapped at fixed addresses so that offsets in the code are correct. In shared libraries or position-independent-executables, the exact virtual addresses are no longer specified but their relative position is. In this example, the 0th text page goes from 0x8048000 to 0x8048fff, which is before the start of the text segment (at 0x8048100). The 1th text page goes from 0x8049000 to 0x8049fff. The last text page goes from 0x8073000 to 0x8073fff, which goes beyond the end of the text segment (at 0x8073eff).
The first data page is at 0x8074000, but the data segment doesn't start until 0x8074f00. This page is backed by the same part of the file as the last text page, but has to be mapped separately because it has different permissions (PROT_EXEC|PROT_READ vs PROT_READ). This is what it means by "copy of the beginning of the data" / "copy of the end of the text".
If there are more than two segments, the are loaded exactly the same. "text" and "data" are completely arbitrary, what matters are the flags and addresses specified for each segment. You can view this information with readelf or objdump.
Note that in the real world, there is usually an unmapped space (a "hole") between the text and data segments, though not necessarily between read-only-data and read-write-data or initialized vs uninitialized data.
For example, running cat /proc/self/maps gives me:
ben#joyplim ~ % cat /proc/self/maps
00400000-0040c000 r-xp 00000000 fe:01 36176026 /bin/cat
0060b000-0060c000 r--p 0000b000 fe:01 36176026 /bin/cat
0060c000-0060d000 rw-p 0000c000 fe:01 36176026 /bin/cat
<plus the heap, stack, library, and special kernel stuff>

Related

How can i calculate the file offset of the memory virtual address of the export table?

so, i was trying to read a DLL file, everything was fine till i reach the Optional Header Data Directories, specifically its first member, the Export Table.
My problem is that i can't move the offset of my reader because the virtual address member is based on memory VA, and my reader is based on file offset. May a visual example helps:
As you can see, the loaded virtual address that this PE viewer reads at the Export Table Address from the Data Directory(Optional Header) is the value 0x00002630(lets refer to it as hex1 from now on).
However, when i click on the Export Table to see the actual content, the program does the conversion of this address from memory to file offset, redirecting me to this address as the result:
The address that it redirects me is the 0x00001a30(lets refer to it as hex2 from now on).
I did some tests on my own like dividing the hex1 per 8 because i thought it could be the transition from memory alignment which is 4096 and the file alignment which is 512 but it didn't gave me the same result as hex2. I also did some weird stuff to try to get that formula, but it gave me even more bizarre results.
So, my question would be, how could i get/calculate that file offset(hex2) if i only know the memory offset at the Data Directory(hex1)?
Assuming you are using MSVC C/C++, you first need to locate the array of IMAGE_SECTION_HEADER structures following the Optional Header. The SDK has a macro called IMAGE_FIRST_SECTION(pNtHeaders) in which you just pass the pointer of your PE header to make this process easier. It basically just skips past the optional header in memory which is where the section headers start. This macro will also work on either 32-bit or 64-bit Windows PE files.
Once you have the address of the IMAGE_SECTION_HEADER array, you loop through the structures up to FileHeader.NumberOfSections using pointer math. Each of the structures describe the relative starting of a memory address (VirtualAddress) for each of the named PE sections along with the file offset (PointerToRawData) to that section within the file you have loaded.
The size of the section WITHIN the file is SizeOfRawData. At this point, you now have everything you need to translate any given RVA to a file offset. First range check each IMAGE_SECTION_HEADER's VirtualAddress with the RVA you are looking up. I.e.:
if (uRva >= pSect->VirtualAddress && (uRva < (pSect->VirtualAddress + pSect->SizeOfRawData))
{
//found
}
If you find a matching section, you then subtract the VirtualAddress from your lookup RVA, then add the PointerToRawData offset:
uFileOffset = uRva - pSect->VirtualAddress + pSect->PointerToRawData
This results in an offset from the beginning of the file corresponding to that RVA. At this point you have translated the RVA to a file offset.
NOTE: Due to padding, incorrect PE files, etc., you may find not all RVAs will map to a location within the file at which point you might display an error message.

What parts of a PE file are mapped into memory by the MS loader?

What parts of a PE file are mapped into memory by the MS loader?
From the PE documentation, I can deduce the typical format of a PE executable (see below).
I know, by inspection, that all contents of the PE file, up to and including the section headers, gets mapped into memory exactly as stored on disk.
What happens next?
Is the remainder of the file also mapped (here I refer to the Image Pages part in the picture below), so that the whole file is in memory exactly like stored on disk, or is the loader more selective than that?
In the documentation, I've found the following snippet:
Another exception is that attribute certificate and debug information
must be placed at the very end of an image file, with the attribute
certificate table immediately preceding the debug section, because the
loader does not map these into memory. The rule about attribute
certificate and debug information does not apply to object files,
however.
This is really all I can find about loader behavior; it just says that these two parts must be placed last in the file, since they don't go into memory.
But, if the loader loads everything except these two parts, and I set the section RVA's suffiently high, then section data will actually be duplicated in memory (once in the mapped file and once for the position specified by the RVA)?
If possible, link to places where I can read further about loading specific to MS Windows.
Finding this information is like an egg hunt, because MS always insists on using its own terminology when the COFF description uses AT&T terms.
What parts of a PE file are mapped into memory by the MS loader?
It depends.
All sections covered by a section header are mapped into the run-time address space.
However sections that have an RVA of 0 are not mapped and thus never loaded.
Each debug directory entry identifies the location and size of a block of debug information. The RVA specified may be 0 if the debug information is not covered by a section header (i.e., it resides in the image file and is not mapped into the run-time address space). If it is mapped, the RVA is its address.
Memory contains an exact replica of the file on disk.
Note that executables and dll's are mapped into virtual memory, not physical!
As you access the executable parts of it are swapped into RAM as needed.
If a section is not accessed then it obviously does not get swapped into physical RAM, it is however still mapped into virtual memory.
You can read up on everything you might ever want to know about PE files (and more) on MSDN.
Your quote is lifted from the documentation of the COFF file format.
The critical part is:
The rule on attribute certificate and debug information does not apply to object files.
From: https://support.microsoft.com/en-us/kb/121460
Size: Size of the optional header, which is included for executable files but not object files. An object file should have a value of 0 here.
Ergo: executable files or not object files, they are image files.
as such the exception to the rule does not apply to them.

IDA Pro disassembly shows ? instead of hex or plain ascii in .data?

I am using IDA Pro to disassemble a Windows DLL file. At one point I have a line of code saying
mov esi, dword_xxxxxxxx
I need to know what the dword is, but double-clicking it brings me to the .data page and everything is in question marks.
How do I get the plain text that is supposed to be there?
If you see question marks in IDA, this means that there's no physical data at this location on the file (on your disk drive).
Sections in PE files have a physical size (given by the SizeOfRawData field of the section header). This physical size (on disk) might be different from the size of the section once it is mapped onto the process memory by the Windows' loader (this size is given by the VirtualSize field of the section header).
So, if the VirtualSize field is bigger than the SizeOfRawData field, a part of the section has no physical existence and it exists only in memory (once the file is mapped onto the process address space).
On most case, at program entry point, you can assume this memory is filled with 0 (but some parts of the memory might be written by the windows loader).
To get the locations where the data is being written, read or loaded you can use cross-references (xref). Here's an example :
Click on the name of the data from which you want the xref :
Then press 'x', you'll be shown all known (to ida) location where the data is used :
The second column indicates how the data is used:
r means it is read
w means it is written
o means it is loaded as a pointer

How do I make space for my code cave in a Windows PE 32bit executable

So I want to make a space for my code caves in minesweeper.exe (typical Windows XP minesweeper game, link: Minesweeper). So I modified the PE header of the file via CFF Explorer to increase size of the .text section.
I tried increasing raw size of .text segment by 1000h (new size was 3B58), but Windows was unable to locate the entry point and the game failed to launch. Then I tried increasing the size of the .rsrc section, adding a new section, increasing the image size, but none of those attempts were successful, Windows was saying that "This is not x32 executable".
So here is the question: how do I make space for my code cave? I don't want to search for empty space left by the compiler, I want to have nice and clean 1000h bytes for my code. A tutorial for that and a detailed explanation for how to do that without corrupting a game would be GREAT! (And yes, I am actually hacking a minesweeper)
You can't increase the size of a section without invalidating the following ones (typically because it invalidates offsets and addresses in those sections). This remains possible but it's extremely error prone and doesn't worth the hassle when you have a simpler solution.
Typically, you juste need to add a section at the end of the PE and jump there from the code section. There is usually a little bit of space at the end of the code section (code cave) so you can place your JMPs (or a little code stub) there to redirect to the new section. You can also add other new sections for data or new resources or whatever you want.
Note: I'm using two tools: CFF explorer as a PE browser; an hex editor.
This file is quite particular so it is a little bit harder than usual to add a new section.
Let's start!
Below is an hex view of the array of IMAGE_SECTION_HEADER:
Usually there is some room to add a new section but in this particular case, there's none... The last section header is immediately followed by something.
Judging by the content, this is probably a bound import directory, which is confirmed in CFF explorer (offset of the bound directory is 0x248):
Bound import directory are of no use today, especially with ASLR, so we can zero out the whole directory (its size is 0xA8 bytes as indicated in the previous screenshot):
You can also zero out the Bound Import directory RVA in the Data Directories although this is not strictly required:
Now, it's time to add the new section.
Add a new section
Minesweeper comes with 3 sections by default, so increment the Number of sections from 3 to 4:
Go to the sections headers and add a new section (you can do it directly in CFF explorer; I named mine, .foobar, be wary that section names are at most 8 characters and don't need to end with a NULL byte):
You need to choose two numbers:
The raw size of the new section (I picked 0x400) ; it must be a multiple of FileAlignment (which is 0x200 in this case).
The virtual size of the new section (I picked 0x1000); it must be a multiple of SectionAlignement (which is 0x1000 for this binary).
Now we" need to calculate the two other members, Virtual Address and Raw Address.
Virtual Address
Let's take an example with the first and second section.
The first section starts at virtual address 0x1000 and has a virtual Size of 0x3A56. The next section virtual address must be aligned on SectionAlignement (0x1000) so the calculation is (using python here):
>>> def round_up_multiple_of(number, multiple):
num = number + (multiple - 1)
return num - (num % multiple)
>>> hex(round_up_multiple_of(0x1000 + 0x3a56, 0x1000))
'0x5000'
Which gives 0x5000 which is right (.data section starts at virtual address 0x5000).
Now, where our last section should start?
.rsrc section starts at 0x6000 and has a size of 0x19160:
>>> hex(round_up_multiple_of(0x6000 + 0x19160, 0x1000))
'0x20000'
So it must start at virtual address 0x20000. Put that number in Virtual Address.
Raw address
(Typically this is not needed as all sections are already aligned the last section must start right at the end of the file, but we'll do it).
As a reminder, the raw address is an address in the file (not in memory).
Let's start with an example (first and second section):
The first section raw address is 0x400 and its raw size 0x3c00. FileAlignement is 0x200, thus:
>>> hex(round_up_multiple_of(0x400 + 0x3c00, 0x200))
'0x4000'
The second section should start on the file (its Raw address) at 0x4000 which is right.
Thus for our new section, the calculation is:
.rsrc section starts in the file at 0x4200
.rsrc section size on file is 0x19200
FileAligment is 0x200
The calculation is as follow:
>>> hex(round_up_multiple_of(0x4200 + 0x19200, 0x200))
'0x1d400'
Our last section starts at the raw address 0x1d400 in the file which is confirmed with an hex editor:
Final steps
One last step is required, the calculation of the SizeOfImage field in the Optional header. According to the PE specification the field is:
The size (in bytes) of the image, including all headers, as the image
is loaded in memory. It must be a multiple of SectionAlignment.
Hence the calculation can be simplified as: VirtualAddress + VirtualSize of the last section, aligned on SectionAlignment (0x1000):
>>> hex(round_up_multiple_of(0x20000 + 0x1000, 0x1000))
'0x21000'
Now, save all your modifications in CFF explorer and exit.
Adding room for the new section
The last step is to add the required bytes for the last section. As I choose a Raw size of 0x400, I insert 0x400 bytes at Raw Address (0x1d400) with an hex editor.
Save you file. If you followed all the steps it must work (tested on Win 10) as is and you can start the modified executable without any errors.
Try to experience with a different raw size for the new section if 0x400 is not enough.
Now you have a new empty section, the rest is up to you for modifying the code :)

why system loader load strings at read/execute segment?

I compiled this simple program at ubuntu 15.10 x64
char *glob = "hello strings"
void main() {
}
and using the gdb I could find the "hello strings" are located at the
read/execute segment with .text section.
I already know that some strings contained in the ELF header are located in the code segment
but why the user defined strings are located at the same segment with code?
I've also tried to enlarge the size of the strings to 0x1000 for checking
whether it is compiler optimization to locate small sized strings with code section, but
they are also located at the same segment with code.
It's very interesting to me because intuitively strings should be readable not executable.
By default, the Linux linker creates two PT_LOAD segments: a read-only one, containing .text, and a writable one containing .data (initialized data).
Your string literal resides in .rodata section.
Which of the above two segments would you like this section to go into? If it is to be read-only, then it will have to go into the same segment that contains .text, and that segment must be executable. If the section is to go into the writable segment, it will not have execute permissions, but then you would be able to write to these strings, and they would not be shared when multiple instances of your binary run.
You can see the assignment of sections to segments in the output of readelf -l a.out.
With older versions of GCC (before 4.0), you can see that adding -fwritable-strings moves the string into .data, and into non-executable segment.
Gold linker supports --rodata flag, which moves all read-only non-executable sections into a separate PT_LOAD segment. But that increases the number of mmap and mprotect calls that the dynamic loader has to perform, and so is not the default.

Resources