How does a PE file get mapped into memory? (Windows)

So I have been researching the PE format for the last couple of days, and I still have a couple of questions:
Does the data section get mapped into the process's memory, or does the program read it from the disk?
If it does get mapped into its memory, how can the process acquire the offset of the section (and other sections)?
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?

Does the data section get mapped into the process's memory?
Yes. That mapping is unlikely to survive for very long; the program is apt to write to that section, which triggers a copy-on-write page copy that leaves the page backed by the paging file instead of the PE file.
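A quick way to see that a global variable really lives in the mapped image (rather than in private heap memory) is to ask the memory manager about it. A minimal sketch, assuming a normal desktop Windows build:

    #include <windows.h>
    #include <stdio.h>

    static int g_data = 42;   /* ends up in the mapped data section of the image */

    int main(void)
    {
        MEMORY_BASIC_INFORMATION mbi;
        VirtualQuery(&g_data, &mbi, sizeof(mbi));

        /* Type comes back as MEM_IMAGE: the page is backed by the mapped
           executable until a write forces the copy-on-write described above. */
        printf("allocation base: %p, type: 0x%lX (MEM_IMAGE = 0x%lX)\n",
               mbi.AllocationBase, mbi.Type, (unsigned long)MEM_IMAGE);

        g_data = 43;          /* this write triggers the copy-on-write page copy */
        return 0;
    }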
How can the process acquire the offset of the section?
The linker already calculated the offsets of variables in the section. The section might be relocated, which is common for DLLs whose preferred base address is already in use when the DLL gets loaded. In that case the loader uses the relocation table in the PE file to patch the addresses in the code. Pages that contain such patched code get the same treatment as the data section: they are no longer backed by the PE file and cannot be shared between processes.
Is there any way to get the entry point of a process?
The entire PE file gets mapped into memory, including its headers, so you can certainly read IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint from memory without reading the file. Do keep in mind that this is painful for another process, since you don't have direct access to its virtual address space. You'd have to use ReadProcessMemory(), which is fairly little joy and unlikely to be faster than reading the file; the file is pretty likely to be present in the file system cache anyway. The Address Space Layout Randomization feature is also apt to give you a headache, since it is designed to make these kinds of things hard.
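For the in-process case this boils down to following the headers that are already mapped at the module base. A minimal sketch, reading the current executable's own entry point:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Base address of the current module; the PE headers are mapped right here. */
        BYTE *base = (BYTE *)GetModuleHandle(NULL);

        /* The DOS header sits at the very start of the mapped image and
           e_lfanew gives the offset of the NT headers. */
        IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
        IMAGE_NT_HEADERS *nt  = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);

        /* AddressOfEntryPoint is an RVA; add the actual base to get the VA,
           which already accounts for any ASLR slide. */
        BYTE *entry = base + nt->OptionalHeader.AddressOfEntryPoint;

        printf("image base:  %p\n", (void *)base);
        printf("entry point: %p\n", (void *)entry);
        return 0;
    }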

Does the data section get mapped into the process's memory, or does the program read it from the disk?
It's mapped into the process's memory.
If it does get mapped into its memory, how can the process acquire the offset of the section (and other sections)?
By means of a relocation table: every reference to a global object (data or function) made from the executable code with direct addressing has an entry in this table, so that the loader can patch the code and fix up the original offset. Note that you can build a PE file without a relocation section, in which case all data and code sections have a fixed offset and the executable has a fixed entry point.
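A minimal sketch of how to tell the two cases apart for a loaded module (here the current executable is assumed): a stripped relocation table shows up both as the IMAGE_FILE_RELOCS_STRIPPED characteristic and as an empty base relocation directory.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        BYTE *base = (BYTE *)GetModuleHandle(NULL);
        IMAGE_NT_HEADERS *nt =
            (IMAGE_NT_HEADERS *)(base + ((IMAGE_DOS_HEADER *)base)->e_lfanew);

        BOOL stripped =
            (nt->FileHeader.Characteristics & IMAGE_FILE_RELOCS_STRIPPED) != 0;
        DWORD relocSize =
            nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size;

        /* Without relocations the image can only run at its preferred ImageBase. */
        printf("relocations stripped: %d, relocation table size: %lu\n",
               stripped, relocSize);
        return 0;
    }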
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Not sure, but if by "not touching" you mean not even reading the file, then you may be able to figure it out by walking up the stack.

Yes, all sections that are described in the PE header get mapped into memory. The IMAGE_SECTION_HEADER struct tells the loader how to map it (the section can for example be much bigger in memory than on disk).
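A minimal sketch (for the current module, assuming a normal image) that walks those section headers and shows how the size in memory can differ from the size on disk:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        BYTE *base = (BYTE *)GetModuleHandle(NULL);
        IMAGE_NT_HEADERS *nt =
            (IMAGE_NT_HEADERS *)(base + ((IMAGE_DOS_HEADER *)base)->e_lfanew);

        /* IMAGE_FIRST_SECTION points just past the optional header. */
        IMAGE_SECTION_HEADER *sec = IMAGE_FIRST_SECTION(nt);
        for (WORD i = 0; i < nt->FileHeader.NumberOfSections; ++i) {
            printf("%-8.8s RVA 0x%06lX  size in memory 0x%06lX  size on disk 0x%06lX\n",
                   (const char *)sec[i].Name,
                   sec[i].VirtualAddress,
                   sec[i].Misc.VirtualSize,
                   sec[i].SizeOfRawData);
        }
        return 0;
    }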
I'm not quite sure if I understand what you are asking. Do you mean how does code from the code section know where to access data in the data section? If the module loads at the preferred load address then the addresses that are generated statically by the linker are correct, otherwise the loader fixes the addresses with relocation information.
Yes, the Windows loader also maps the PE header into memory at the base address of the module. There you can find all the information that was in the PE header on disk, including the entry point.
I can recommend this article for everything about the PE format, especially on relocations.

Does the data section get mapped into the process' memory, or does the program read it from the disk?
Yes. On both Windows and Linux, everything must be mapped into memory by the operating system's dynamic loader before execution.
If it does get mapped into its memory, how can the process acquire the offset of the section? (And other sections)
A PE file has a well-defined structure; the loader parses that structure to obtain the relative virtual address of each section with respect to ImageBase. Also, if ASLR (the address randomization feature) is active on the system, the loader has to use the relocation information to resolve those offsets.
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Nope. To calculate the OEP, the operating system's loader uses the ImageBase and entry point members of the optional header structure, and in particular cases, when address randomization is enabled, it uses the relocation table to resolve the addresses. So we can't do anything without parsing the PE file on disk.

Related

Can someone explain the Windows ZwMapViewOfSection system call so that a noob (me) can understand?

I'm investigating a set of Windows API system calls made by a piece of malware running in a sandbox so that I can understand its malicious intent. Unfortunately, I'm struggling to understand the ZwMapViewOfSection function described in the documentation: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/nf-wdm-zwmapviewofsection
Now, I do understand that this function is related to the mapping of physical memory to virtual memory in a page table. Apart from that, I find the documentation arcane and not friendly to beginners. I am also confused about why they are calling blocks of physical memory "sections" rather than "frames" (if that is what they are indeed referring to -- it's not clear to me). Can anyone provide a more intuitive explanation of this system call and what it does in general? Is this a common system call for programs, or is it limited to malware? Thank you.
It is extremely common for normal programs to make this call (not directly, of course); every program is going to invoke it multiple times during initialization at the very least, since ZwMapViewOfSection provides the memory backing for the executable's code sections themselves. It is less common during normal program execution, but not rare either: it shows up particularly often when a program performs dynamic DLL loads, and legitimate programs can also do memory-mapped I/O for their own reasons.
It operates on memory section objects (I've never really understood that name either), which are one part of the link between disk files and memory-mapped regions. The section is created via ZwCreateSection or opened with ZwOpenSection, and then the other part comes into play with ZwMapViewOfSection.
What part of this, exactly, is confusing you? Knowing that would make it far easier to provide an informative response.
As far as I understand it, you have to open the file and acquire a file handle, which you then map with CreateFileMapping; that calls NtCreateSection, which calls MmCreateSection. If the file is mapped for the first time, a new segment object and control area are created first; then, depending on whether the section is data-backed, image-backed or page-file-backed, MiCreateDataFileMap, MiCreateImageFileMap or MiCreatePagingFileMap is called.
MiCreateDataFileMap sets up the subsection object and section object. In the normal case only one subsection is created, but under some special conditions multiple subsections are used, e.g. if the file is very large. For data files, the subsection object field SubsectionBase is left blank. Instead, the SegmentPteTemplate field of the segment object is set up properly, and it can be used to create the PPTEs when necessary. This defers the creation of PPTEs until a view is mapped for the first time, which avoids wasting memory when very large data files are mapped. Note that a PPTE is a PTE that is serving as a prototype PTE, whereas an _MMPTE_PROTOTYPE is a PTE that points to a prototype.
MiCreateImageFileMap creates the section object, loads the PE header of the specified file and verifies it; then one subsection is created for the PE header and one for each PE section. If a very small image file is mapped, only one subsection is created for the complete file. Besides the subsections, the related PPTEs for each of them are also created, and their page protection flags are set according to the protection settings of the related PE section. These PPTEs will be used as a template for building the real PTEs when a view is mapped and accessed.
After a section is created, it can be mapped into the address space by creating a view from it. The flProtect passed to CreateFileMapping specifies the protection of the section object, and all mapped views of the object must be compatible with this protection. You specify dwMaximumSizeLow and dwMaximumSizeHigh as 0 so that the maximum size of the section is set to the length of the file automatically.
You then pass the returned section object handle to MapViewOfFile, which calls NtMapViewOfSection on it, which calls MmMapViewOfSegment, which calls MmCreateMemoryArea; that is where the view is mapped into the VAD of the process with the dwDesiredAccess protection supplied to MapViewOfFile, which serves as the protection type for all PTEs that the VAD entry covers. Passing dwNumberOfBytesToMap = 0 and dwFileOffsetLow = 0 to MapViewOfFile maps the whole file.
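A minimal user-mode sketch of that whole sequence (the file name is just a placeholder); CreateFileMapping ends up in NtCreateSection and MapViewOfFile in NtMapViewOfSection, as described above:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE file = CreateFileA("example.bin", GENERIC_READ, FILE_SHARE_READ,
                                  NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (file == INVALID_HANDLE_VALUE) return 1;

        /* Section object; 0/0 for the maximum size means "size of the file". */
        HANDLE section = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
        if (!section) return 1;

        /* View of the section; length 0 and offset 0 map the whole file. */
        const BYTE *view = (const BYTE *)MapViewOfFile(section, FILE_MAP_READ, 0, 0, 0);
        if (!view) return 1;

        /* Touching a page is what triggers the fault that actually reads it. */
        printf("first byte: 0x%02X\n", view[0]);

        UnmapViewOfFile(view);
        CloseHandle(section);
        CloseHandle(file);
        return 0;
    }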
When a view is mapped, I believe that all of the PTEs are made to point to the prototype PTEs and are given the protection of the PPTE. For an image file, the PPTEs have already been initialised to subsection PTEs. For a data file, the PPTEs for the view need to be initialised to subsection PTEs. The VAD entry for the view is now created. The VAD entry protection isn't always reflective of the protection of the PTEs it covers, because it can cover multiple subsections and multiple blocks within those subsections.
The first time an address in the mapping is actually accessed, the subsection prototype PTE is filled in on demand with a newly allocated physical page that is filled by the I/O for that range, and the process PTE is filled in with that same address. For an image, the PPTE was already filled in when the subsections were created, along with protection information derived from the section header characteristics in the image, so the fault handler just fills in the PTE with that address and the protection information in it.
When the PTE is trimmed from the process working set, the working set manager accesses the PFN to locate the PPTE address, decreases the share count, and inserts the PPTE address into the PTE.
I'm not sure when a VAD PTE (which has a prototype bit and a prototype address of 0xFFFFFFFF0000 and is not valid) occurs. I would have thought the PPTEs are always there at their virtual address and can be pointed to as soon as the VAD entry is created.

Loading data segment of already loaded shared library

For the global offset table to work, the GOT must be at a fixed offset from the text segment. Now assume a program needs a shared library, and that the shared library has already been loaded by the OS for some other process. For our program, since the text section of the shared library is already loaded, only the data segment needs to be loaded; the shared library's text section is mapped into the virtual address space of our process. But what if there is already some data/text or whatever at the fixed offset from the virtual address of our shared library? How does the dynamic linker resolve that conflict? One approach would be to leave R_386_GOTPC in the text section until load time and let the dynamic linker change it to the new offset. Is this how it is done in practice?
On GNU, even the same DSO is mapped at different addresses in different processes. No data at all is shared between them. This means that the GOT is just private data (like .data), and is initialized at load time with the proper addresses (either stubs or the proper function addresses with BIND_NOW).
(This assumes that prelink is not in use, which is somewhat broken anyway.)

What parts of a PE file are mapped into memory by the MS loader?

What parts of a PE file are mapped into memory by the MS loader?
From the PE documentation, I can deduce the typical format of a PE executable (see below).
I know, by inspection, that all contents of the PE file, up to and including the section headers, get mapped into memory exactly as stored on disk.
What happens next?
Is the remainder of the file also mapped (here I refer to the Image Pages part in the picture below), so that the whole file is in memory exactly as stored on disk, or is the loader more selective than that?
In the documentation, I've found the following snippet:
Another exception is that attribute certificate and debug information
must be placed at the very end of an image file, with the attribute
certificate table immediately preceding the debug section, because the
loader does not map these into memory. The rule about attribute
certificate and debug information does not apply to object files,
however.
This is really all I can find about loader behavior; it just says that these two parts must be placed last in the file, since they don't go into memory.
But, if the loader loads everything except these two parts, and I set the section RVAs sufficiently high, then section data will actually be duplicated in memory (once in the mapped file and once at the position specified by the RVA)?
If possible, link to places where I can read further about loading specific to MS Windows.
Finding this information is like an egg hunt, because MS always insists on using its own terminology when the COFF description uses AT&T terms.
What parts of a PE file are mapped into memory by the MS loader?
It depends.
All sections covered by a section header are mapped into the run-time address space.
However, sections that have an RVA of 0 are not mapped and thus never loaded.
Each debug directory entry identifies the location and size of a block of debug information. The RVA specified may be 0 if the debug information is not covered by a section header (i.e., it resides in the image file and is not mapped into the run-time address space). If it is mapped, the RVA is its address.
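A minimal sketch (for the current module) that walks the debug directory and shows, for each entry, the RVA (0 when the data is not mapped) alongside the file offset:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        BYTE *base = (BYTE *)GetModuleHandle(NULL);
        IMAGE_NT_HEADERS *nt =
            (IMAGE_NT_HEADERS *)(base + ((IMAGE_DOS_HEADER *)base)->e_lfanew);

        IMAGE_DATA_DIRECTORY dir =
            nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_DEBUG];
        if (dir.Size == 0) {
            printf("no debug directory\n");
            return 0;
        }

        IMAGE_DEBUG_DIRECTORY *dbg =
            (IMAGE_DEBUG_DIRECTORY *)(base + dir.VirtualAddress);
        for (DWORD i = 0; i < dir.Size / sizeof(IMAGE_DEBUG_DIRECTORY); ++i) {
            /* AddressOfRawData is 0 when the debug data lives only in the file. */
            printf("type %lu: RVA 0x%lX, file offset 0x%lX, size %lu\n",
                   dbg[i].Type, dbg[i].AddressOfRawData,
                   dbg[i].PointerToRawData, dbg[i].SizeOfData);
        }
        return 0;
    }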
Memory contains an exact replica of the file on disk.
Note that executables and DLLs are mapped into virtual memory, not physical!
As you access the executable, parts of it are paged into RAM as needed.
If a section is not accessed, then it obviously does not get paged into physical RAM; it is, however, still mapped into virtual memory.
You can read up on everything you might ever want to know about PE files (and more) on MSDN.
Your quote is lifted from the documentation of the COFF file format.
The critical part is:
The rule on attribute certificate and debug information does not apply to object files.
From: https://support.microsoft.com/en-us/kb/121460
Size: Size of the optional header, which is included for executable files but not object files. An object file should have a value of 0 here.
Ergo: executable files are not object files; they are image files,
and as such the exception to the rule does not apply to them.

Low-level details on linking and loading of (PE) programs in Windows

Low-level details on linking and loading of (PE) programs in Windows.
I'm looking for an answer or tutorial that clarifies how a Windows program is linked and loaded into memory after it has been assembled.
Especially, I'm uncertain about the following points:
After the program is assembled, some instructions may reference memory within the .DATA section. How are these references translated when the program is loaded into memory starting at some arbitrary address? Do RVAs and relative memory references take care of these issues (the BaseOfCode and BaseOfData RVA fields of the PE header)?
Is the program always loaded at the address specified in the ImageBase header field? What if an already loaded (DLL) module specifies the same base?
First I'm going to answer your second question:
No, a module (whether an EXE or a DLL) is not always loaded at its base address. This can happen for two reasons: either some other module is already loaded and there is no room to load it at the base address contained in the headers, or ASLR (Address Space Layout Randomization) is in effect, which means modules are loaded at random slots for exploit mitigation purposes.
To address the first question (it is related to the second one):
The way a memory location is referred to can be relative or absolute. Usually jumps and function calls are relative (though they can be absolute), which says: "go this many bytes from the current instruction pointer". Regardless of where the module is loaded, relative jumps and calls will work.
When it comes to addressing data, the references are usually absolute, that is, "access this 4-byte datum at this address". And a full virtual address is specified, not an RVA but a VA.
If a module is not loaded at its base address, absolute references will all be broken: they no longer point to the place the linker assumed they would point to. Let's say the ImageBase is 0x04000000 and you have a variable at RVA 0x000000F4; the VA will be 0x040000F4. Now imagine the module is loaded not at its base address but at 0x05000000. Everything is moved 0x01000000 bytes forward, so the VA of your variable is actually 0x050000F4, but the machine code that accesses the data still has the old address hardcoded, so the program is corrupted. In order to fix this, linkers store in the executable where these absolute references are, so they can be fixed by adding to them how much the executable has been displaced: the delta offset, the difference between where the image is loaded and the image base contained in the headers of the executable file. In this case it is 0x01000000. This process is called base relocation and is performed at load time by the operating system, before the code starts executing.
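A minimal sketch of what that fix-up pass looks like, assuming a 64-bit image that has already been laid out in memory at mappedBase with its sections at their RVAs (that part is the loader's job; this only reproduces the patching step):

    #include <windows.h>

    void ApplyBaseRelocations(BYTE *mappedBase)
    {
        IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)mappedBase;
        IMAGE_NT_HEADERS64 *nt =
            (IMAGE_NT_HEADERS64 *)(mappedBase + dos->e_lfanew);

        /* The delta offset: where the image is minus where it wanted to be. */
        ULONGLONG delta = (ULONGLONG)mappedBase - nt->OptionalHeader.ImageBase;
        if (delta == 0)
            return;  /* loaded at the preferred base, nothing to patch */

        IMAGE_DATA_DIRECTORY dir =
            nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC];
        if (dir.Size == 0)
            return;  /* no relocations: the image only works at its preferred base */

        BYTE *cur = mappedBase + dir.VirtualAddress;
        BYTE *end = cur + dir.Size;
        while (cur < end) {
            IMAGE_BASE_RELOCATION *block = (IMAGE_BASE_RELOCATION *)cur;
            WORD *entry = (WORD *)(block + 1);
            DWORD count = (block->SizeOfBlock - sizeof(*block)) / sizeof(WORD);

            for (DWORD i = 0; i < count; ++i) {
                WORD type   = entry[i] >> 12;      /* top 4 bits: relocation type */
                WORD offset = entry[i] & 0x0FFF;   /* low 12 bits: offset in the page */
                if (type == IMAGE_REL_BASED_DIR64) {
                    /* Patch the absolute 64-bit address by the delta offset. */
                    ULONGLONG *patch =
                        (ULONGLONG *)(mappedBase + block->VirtualAddress + offset);
                    *patch += delta;
                }
                /* IMAGE_REL_BASED_ABSOLUTE entries are just alignment padding. */
            }
            cur += block->SizeOfBlock;
        }
    }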
Sometimes a module has no relocations, so it can't be loaded anywhere else but at its base address. See How do I determine if an EXE (or DLL) participate in ASLR, i.e. is relocatable?
For more information on ASLR: https://insights.sei.cmu.edu/cert/2014/02/differences-between-aslr-on-windows-and-linux.html
There is another way to move the executable in memory and still have it run correctly: Position Independent Code, code crafted in such a way that it will run anywhere in memory without the need for the loader to perform base relocations.
This is very common in Linux shared libraries, and it is done by addressing data relatively ("access this data item at this distance from the instruction pointer").
To do this, the x64 architecture has RIP-relative addressing; on x86 a trick is used to emulate it: get the content of the instruction pointer and then calculate the VA of a variable by adding a constant offset to it.
This is very well explained here:
https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
I don't think PIC is common on Windows; more often than not, Windows modules contain base relocations to fix absolute addresses when a module is loaded somewhere other than its preferred base address, although I'm not exactly sure about this last paragraph, so take it with a grain of salt.
More info:
http://opensecuritytraining.info/LifeOfBinaries.html
How are Windows DLLs actually shared? (a bit confusing because I didn't explain myself well when asking the question).
https://www.iecc.com/linker/
I hope I've helped :)

When Resources of a PE file are loaded

When using a resource included in a PE file (for example a binary resource) in C++, we have to first call
1) FindResource and then
2) LoadResource
to access the resource.
Taking the function name "LoadResource" literally, I wonder whether the Windows loader loads all resources of an application into memory when it loads the other parts (like the code or data sections), or whether they are loaded lazily, only when we need them?
If so, can we unload these resources after we have used them in order to free the allocated memory?
These functions are old; they date back to Windows versions that did not yet support virtual memory. Back in the olden days they would actually physically load a resource into RAM.
Those days are long gone. The OS loader creates a memory-mapped file to map the executable into memory, and anything from the file (code and resources) is only mapped into RAM when the program dereferences a pointer. You only pay for what you use.
So LoadResource() does very little; it simply returns a pointer, disguised as an HGLOBAL handle. LockResource() does nothing interesting, it simply casts the HGLOBAL back to a pointer. When you actually start using the pointer you'll trip a page fault and the kernel reads the file, loading it into RAM. UnlockResource() and FreeResource() do nothing. If the OS needs RAM for another process then it can unmap the RAM for the resource; nothing needs to be preserved, since the memory is backed by the file, so the page can simply be discarded. It is paged back in when necessary if you use the resource again.
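A minimal usage sketch (the resource ID 101 and the RT_RCDATA type are placeholders for whatever your module actually contains), showing that there is nothing to free afterwards:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HMODULE module = GetModuleHandle(NULL);      /* resources of the EXE itself */

        HRSRC res = FindResource(module, MAKEINTRESOURCE(101), RT_RCDATA);
        if (!res) return 1;

        HGLOBAL handle = LoadResource(module, res);  /* merely locates the data */
        if (!handle) return 1;

        const void *data = LockResource(handle);     /* really just a cast to a pointer */
        DWORD size = SizeofResource(module, res);

        /* Reading the bytes is what faults the pages in from the mapped image;
           there is no corresponding unload call that needs to be made. */
        printf("resource is %lu bytes at %p\n", size, data);
        return 0;
    }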
