How can a page table entry become dirty? - memory-management

The accessed and dirty (A/D) bits record whether a page has been accessed or written. When a file is loaded into memory, some changes exist only in memory and are not yet synchronized with the file stored on disk. A page that has been modified but not written back is a dirty page.
My question is whether this concept also applies to ELF files.
Can .code and .data also get dirty? If yes, then how?

My question is whether this concept also applies to ELF files.
Yes.
Can .code and .data also get dirty? If yes, then how?
The .code usually doesn't have write permission (only read and execute), and so it usually doesn't get dirty.
However, you can mprotect a .code page to be writable, and write to it (this is often used in runtime patching). If you do, the corresponding page will become dirty, and will stay dirty because it is mapped with MAP_PRIVATE (you generally don't want a running program to change its image on-disk).
You could also get dirty .code pages if your binary has text relocations (which often happens when code compiled without -fPIC is linked into a shared library on ix86).
Finally, the .data pages are modified all the time (every time you modify an initialized global variable), and these pages then stay dirty for the duration of the program (again, you generally don't want a running program to modify its on-disk image).
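The MAP_PRIVATE behaviour described above can be observed from user space. The following is a minimal sketch, assuming a POSIX system with a writable /tmp; the function name and the temp-file template are made up for illustration. It maps a scratch file the way a loader maps .data (writable, private), writes through the mapping, and then re-reads the file to check that the on-disk bytes did not change.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a write through a MAP_PRIVATE mapping changed memory but left
 * the file unchanged, 0 otherwise, -1 on a setup error.
 * Function name and temp-file template are illustrative only. */
int private_write_leaves_file_intact(void)
{
    char path[] = "/tmp/dirty_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file lives on until fd is closed */

    const char original[] = "initialized data";
    if (write(fd, original, sizeof original) != (ssize_t)sizeof original) {
        close(fd);
        return -1;
    }

    /* Writable but private, like a .data segment. */
    char *p = mmap(NULL, sizeof original, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    assert(p[0] == 'i');

    p[0] = 'I'; /* the write gives this process its own dirty copy of the page */
    int memory_changed = (p[0] == 'I');

    /* Re-read the file: a private dirty page is never written back to it. */
    char on_disk[sizeof original];
    ssize_t n = pread(fd, on_disk, sizeof on_disk, 0);

    munmap(p, sizeof original);
    close(fd);
    if (n != (ssize_t)sizeof on_disk)
        return -1;
    return memory_changed && memcmp(on_disk, original, sizeof original) == 0;
}
```

Calling private_write_leaves_file_intact() from main and printing the result is enough to try it out; with MAP_SHARED instead of MAP_PRIVATE the write would eventually reach the file.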
Update:
Text (.code) relocations without -fPIC are those applied to shared libraries at load time. Does this mean these relocations make .code dirty even before the entry instruction executes?
Not necessarily. Two cases to consider:
1. a.out that directly depends on foo.so
2. a.out that uses dlopen to load foo.so
In case 1, you are correct: text relocations in foo.so will cause (some of) its .text pages to become dirty before the first instruction of a.out is executed (note that user-space starts executing from ld.so entry, not from a.out entry).
In case 2, the .text pages will become dirty as part of the dlopen, which is long after main (which is itself long after the entry instruction).
When .data pages are modified, should .code pages also become dirty in response, for PIC or non-PIC code?
No: modifying .data does not cause .code to also become dirty. Why would it?

Related

Easiest way to do runtime md5sum on the .text section of a static library

Brainhive,
I'm looking for a way to make sure my code wasn't altered. My initial thought was to find the start address of the .text section and its size, run md5sum (or another hash) over it, and compare the result to a constant.
My code is compiled to a static library and I don't want to hash the entire binary, only my library.
How do I do that? Would adding an ld script with reserved labels help?
System is arm64 and I'm using GNU arm compiler (linaro implementation).
I'm looking for a way to make sure my code wasn't altered. My initial thought was to find the start address of the .text section and its size, run md5sum (or another hash) over it, and compare the result to a constant.
There are several reasons your request is likely misguided:
Anybody who is willing to modify your compiled code will also be willing to modify the checksum that you are going to compare against at runtime. That is, you appear to want to do something like:
/* 0xabcd1234 is the precomputed checksum over the library. */
if (checksum_over_my_code() != 0xabcd1234) abort();
The attacker can easily replace this entire code with a sequence of NOP instructions, and proceed to use your modified library.
Your static library (usually) doesn't end up as a contiguous sequence of bytes in the final binary. If you have foo.o and bar.o in your library, and the end-user links your library with his own code in main.o and baz.o, then the .text section of the resulting executable could well be composed of .text from main.o, then .text from foo.o, then .text from baz.o, and finally .text from bar.o.
When the final executable is linked, the instructions in your library are updated (relocated). That is, suppose your original code has a CALL foo instruction. The actual bytes in your .text section will be something like 0xE8 0x00 0x00 0x00 0x00 (with a relocation record stating that the bytes following 0xE8 should be updated with whatever the final address of foo ends up being).
After the link is done, and assuming foo ends up at address 0x08010203, the bytes in .text of the executable will no longer be 0s. Instead they'll be 0xE8 0x03 0x02 0x01 0x08 (they actually wouldn't be that for reasons irrelevant here, but they certainly wouldn't be all 0s).
So computing the checksum over actual .text section of your archive library is completely pointless.
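To make the relocation argument concrete, here is a small sketch that checksums a 5-byte x86 CALL rel32 (opcode 0xE8) before and after its displacement is patched. FNV-1a is used only as a short, non-cryptographic stand-in for MD5, and the function names and the addresses in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 32-bit: a simple non-cryptographic hash standing in for MD5. */
uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;       /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;             /* FNV prime */
    }
    return h;
}

/* Patch the little-endian displacement of an x86 CALL rel32 stored in code[].
 * site_addr is the address the instruction ends up at after linking,
 * target_addr is the final address of the callee. */
void relocate_call_rel32(uint8_t code[5], uint32_t site_addr,
                         uint32_t target_addr)
{
    assert(code[0] == 0xE8);
    /* rel32 is measured from the end of the 5-byte instruction. */
    uint32_t rel = target_addr - (site_addr + 5u);
    for (int i = 0; i < 4; i++)
        code[1 + i] = (uint8_t)(rel >> (8 * i));
}
```

Hashing {0xE8, 0, 0, 0, 0}, then calling relocate_call_rel32 with, say, a call site at 0x08010000 and foo at 0x08010203, and hashing again gives two different values, so a constant computed over the archive's .text cannot match the linked executable.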
There are tools that allow you to dump an ELF section. elfcat makes it very easy (elfcat --section-name=.text the_file.o), and it should also be doable with objdump. Once you've dumped the section, the problem is reduced to sizing and hashing a file.

Low-level details on linking and loading of (PE) programs in Windows

I'm looking for an answer or tutorial that clarifies how Windows programs are linked and loaded into memory after they have been assembled.
Especially, I'm uncertain about the following points:
After the program is assembled, some instructions may reference memory within the .DATA section. How are these references translated when the program is loaded into memory starting at some arbitrary address? Do RVAs and relative memory references take care of these issues (the BaseOfCode and BaseOfData RVA fields of the PE header)?
Is the program always loaded at the address specified in ImageBase header field? What if a loaded (DLL) module specifies the same base?
First I'm going to answer your second question:
No, a module (be it an exe or a dll) is not always loaded at its base address. This can happen for two reasons: either some other module is already loaded and there is no space left at the base address given in the headers, or ASLR (Address Space Layout Randomization) is in effect, which means modules are loaded at random slots for exploit mitigation purposes.
To address the first question (it is related to the second one):
The way a memory location is referred to can be relative or absolute. Usually jumps and function calls are relative (though they can be absolute), which says: "go this many bytes from the current instruction pointer". Regardless of where the module is loaded, relative jumps and calls will work.
When it comes to addressing data, references are usually absolute, that is, "access this 4-byte datum at this address". And a full virtual address is specified: not an RVA but a VA.
If a module is not loaded at its base address, all absolute references are broken: they no longer point to the place the linker assumed. Say the ImageBase is 0x04000000 and you have a variable at RVA 0x000000F4; its VA will be 0x040000F4. Now imagine the module is loaded not at its ImageBase but at 0x05000000. Everything is moved 0x01000000 bytes forward, so the VA of your variable is actually 0x050000F4, but the machine code that accesses the data still has the old address hardcoded, so the program is corrupted. To fix this, linkers record in the executable where these absolute references are, so each can be fixed by adding the amount by which the executable has been displaced: the delta offset, i.e. the difference between where the image is loaded and the image base contained in the headers of the executable file. In this case it's 0x01000000. This process is called base relocation and is performed at load time by the operating system, before the code starts executing.
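The fix-up itself is just an addition. The sketch below is a simplified model of it, not the real loader: it assumes 32-bit absolute addresses (what the PE format calls a HIGHLOW relocation), a little-endian host such as x86, and a flat array of site offsets instead of the real .reloc block format. The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add the load delta to every 32-bit absolute address stored in the image.
 * site_rvas[] lists the offsets (from the start of the image) of the
 * hardcoded addresses that need fixing. */
void apply_base_relocs(uint8_t *image, size_t image_size,
                       const uint32_t *site_rvas, size_t nsites,
                       uint32_t image_base, uint32_t load_base)
{
    /* delta = where the image really is minus where the linker assumed it */
    uint32_t delta = load_base - image_base;

    for (size_t i = 0; i < nsites; i++) {
        assert((size_t)site_rvas[i] + 4 <= image_size);
        uint32_t va;
        memcpy(&va, image + site_rvas[i], 4);  /* hardcoded VA (host byte order) */
        va += delta;                           /* now valid for the new base */
        memcpy(image + site_rvas[i], &va, 4);
    }
}
```

With an ImageBase of 0x04000000 and an actual load address of 0x05000000, a stored 0x040000F4 becomes 0x050000F4.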
Sometimes a module has no relocations, so it can't be loaded anywhere else but at its base address. See How do I determine if an EXE (or DLL) participate in ASLR, i.e. is relocatable?
For more information on ASLR: https://insights.sei.cmu.edu/cert/2014/02/differences-between-aslr-on-windows-and-linux.html
There is another way to move the executable in memory and still have it run correctly: Position Independent Code, that is, code crafted in such a way that it will run anywhere in memory without the need for the loader to perform base relocations.
This is very common in Linux shared libraries, and it is done by addressing data relatively (access this data item at this distance from the instruction pointer).
To do this, the x64 architecture has RIP-relative addressing; in x86 a trick is used to emulate it: get the content of the instruction pointer and then calculate the VA of a variable by adding a constant offset to it.
This is very well explained here:
https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
I don't think PIC code is common in Windows. More often than not, Windows modules contain base relocations to fix absolute addresses when a module is loaded somewhere other than its preferred base address, although I'm not exactly sure of this last paragraph, so take it with a grain of salt.
More info:
http://opensecuritytraining.info/LifeOfBinaries.html
How are windows DLL actually shared? (a bit confusing because I didn't explain myself well when asking the question).
https://www.iecc.com/linker/
I hope I've helped :)

Does the .data section get loaded into memory?

I have attempted the following test to see if the .data section gets loaded into memory when the program is executed:
global _start
section .data
arr times 99999999 DB 0xAF
section .text
_start:
jmp _start ; prevent process from terminating
Assemble and link:
nasm -f win32 D:\file.asm
link D:\file.obj /OUT:D:\file.exe /ENTRY:start /SUBSYSTEM:CONSOLE
I have executed the program, and the result was the following:
As you can see the program only occupied 276 KB of memory while it has an array with a size of 99999999 bytes!
The paging model on most systems causes the pages that make up a binary's sections (those not requiring some kind of dynamic linking) to be loaded only when they are accessed, and Windows is no exception. So the .data section is memory-mapped from the binary file into your process's address space, but a page is not actually paged in until you touch it. By default, the process monitor reports only the memory that is actually resident, although you can configure the columns to show all of the memory in the image as well. There may also be compiler options you can use to change the paging behavior, and you can always remap the memory manually (perhaps locking it in) if you need to.
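The same effect can be observed on Linux, as a rough analogue of the Windows experiment above. This is a sketch under several assumptions: it uses an anonymous mapping rather than a file-backed .data section, it relies on the Linux mincore() call to ask which pages are resident, and the function names are made up. It reserves a large region, touches a single byte, and reports how many pages are resident before and after.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages of [addr, addr + len) that are resident in RAM.
 * addr must be page-aligned. Returns -1 on error. */
long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    if (!vec)
        return -1;

    long count = -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;   /* low bit set means the page is resident */
    }
    free(vec);
    return count;
}

/* Reserve len bytes, touch only the first one, and report residency
 * before and after the touch. Returns 0 on success, -1 on error. */
int demand_paging_demo(size_t len, long *before, long *after)
{
    assert(len > 0 && before != NULL && after != NULL);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = resident_pages(p, len);
    p[0] = 0xAF;                   /* first access faults the page in */
    *after = resident_pages(p, len);

    munmap(p, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a typical Linux system the count before the write is far below the number of pages reserved, which is the same reason the task manager showed only a few hundred KB for a program with a roughly 100 MB array.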

How does a PE file get mapped into memory?

So I have been researching the PE format for the last couple of days, and I still have a couple of questions:
Does the data section get mapped into the process' memory, or does the program read it from the disk?
If it does get mapped into its memory, how can the process acquire the offset of the section? (And other sections)
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Does the data section get mapped into the process' memory
Yes. But the mapping is unlikely to survive for very long, since the program is apt to write to that section. That triggers a copy-on-write page copy, after which the page is backed by the paging file instead of the PE file.
how can the process acquire the offset of the section?
The linker already calculated the offsets of variables in the section. It might be relocated, common for DLLs that have an awkward base address that's already in use when the DLL gets loaded. In which case the relocation table in the PE file is used by the loader to patch the addresses in the code. The pages that contain such patched code get the same treatment as the data section, they are no longer backed by the PE file and cannot be shared between processes.
Is there any way to get the entry point of a process
The entire PE file gets mapped to memory, including its headers. So you can certainly read IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint from memory without reading the file. Do keep in mind that it is painful to do this for another process, since you don't have direct access to its virtual address space. You'd have to use ReadProcessMemory(), which is little joy and unlikely to be faster than reading the file; the file is pretty likely to be present in the file system cache. The Address Space Layout Randomization feature is apt to give you a headache as well, since it is designed to make this kind of thing hard.
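Reading the field out of the mapped headers could look like the sketch below. It is written against a plain byte buffer so it does not depend on windows.h; on Windows the buffer would be the module's base address (for the current process, what GetModuleHandle(NULL) returns). The function names are hypothetical, and the offsets used are the documented ones: e_lfanew at 0x3C in the DOS header, then the 4-byte "PE\0\0" signature, the 20-byte COFF file header, and AddressOfEntryPoint at offset 16 of the optional header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Stores AddressOfEntryPoint (an RVA) in *entry_rva and returns 0,
 * or returns -1 if the headers do not look like a PE image. */
int pe_entry_point_rva(const uint8_t *image, size_t len, uint32_t *entry_rva)
{
    assert(image != NULL && entry_rva != NULL);

    if (len < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;                              /* no DOS header */

    uint32_t e_lfanew = read_le32(image + 0x3C);
    /* need: signature (4) + COFF header (20) + first 20 bytes of optional header */
    if (e_lfanew > len || len - e_lfanew < 4 + 20 + 20)
        return -1;

    const uint8_t *nt = image + e_lfanew;
    if (memcmp(nt, "PE\0\0", 4) != 0)
        return -1;                              /* no PE signature */

    const uint8_t *optional_header = nt + 4 + 20;
    *entry_rva = read_le32(optional_header + 16);
    return 0;
}
```

The value is an RVA, so the actual entry address in a running process is the module's load address plus this value, which is also why ASLR does not invalidate it.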
Does the data section get mapped into the process' memory, or does the program read it from the disk?
It's mapped into process' memory.
If it does get mapped into its memory, how can the process acquire the offset of the section? (And other sections)
By means of a relocation table: every reference to a global object (data or function) from the executable code, that uses direct addressing, has an entry in this table so that the loader patches the code, fixing the original offset. Note that you can make a PE file without relocation section, in which case all data and code sections have a fixed offset, and the executable has a fixed entry point.
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Not sure, but if by "not touching" you mean not even reading the file, then you may figure it out by walking up the stack.
Yes, all sections that are described in the PE header get mapped into memory. The IMAGE_SECTION_HEADER struct tells the loader how to map it (the section can for example be much bigger in memory than on disk).
I'm not quite sure if I understand what you are asking. Do you mean how does code from the code section know where to access data in the data section? If the module loads at the preferred load address then the addresses that are generated statically by the linker are correct, otherwise the loader fixes the addresses with relocation information.
Yes, the Windows loader also loads the PE header into memory at the base address of the module. There you can find all the info that was in the file's PE header, including the entry point.
I can recommend this article for everything about the PE format, especially on relocations.
Does the data section get mapped into the process' memory, or does the
program read it from the disk?
Yes. Before execution starts, the operating system's loader (on Windows or Linux alike) must map everything into memory.
If it does get mapped into its memory, how can the process acquire the
offset of the section? ( And other sections )
A PE file has a well-defined structure; the loader parses it to obtain the relative virtual address of each section with respect to ImageBase. Also, if ASLR (address space layout randomization) is active on the system, the loader has to use the relocation information to resolve those offsets.
Is there any way to get the entry point of a process that has already
been mapped into the memory, without touching the file on disk?
No. To compute the OEP, the operating system's loader uses the ImageBase and AddressOfEntryPoint members of the optional header structure (ImageBase + AddressOfEntryPoint) and, where address randomization is enabled, the relocation table to resolve all addresses. So we can't do anything without parsing the PE file on disk.

Cleared RW (write protect) flag for PTEs of a process in kernel yet no segmentation fault on write

I implemented incremental process checkpointing at page level (I just dump the data from the process address space into a file).
The approach I used is as follows. I used two system calls:
Complete checkpoint: copy the entire address space. Also, if the write bit is set for a page, clear it.
Incremental checkpoint: dump a page's data only if its write bit is set, then clear the bit again.
Test program:
char a[10000];
sys_cp_range(a,a+10000);
a[3]='A';
sys_incr_cp_range(a,a+10000);
As far as I know, the write should trigger a page fault, and the kernel should handle the illegal write by killing the process with SIGSEGV. Yet the program is successfully checkpointed.
What exactly is happening here?
If you modify a PTE when it's still cached in the TLB, the effect of the modification may be unseen for a while (until the PTE gets evicted from the TLB and has to be reread from the page table).
You need to invalidate the PTE in the TLB with the invlpg instruction (I'm assuming x86) after modifying the PTE, and this has to be done on all CPUs. The kernel has dedicated functions for this purpose (in Linux, the flush_tlb_* helpers such as flush_tlb_page()).
Also it wouldn't hurt to double check that the compiler didn't reorder or throw away anything from the above code.

Resources