Does the .data section gets loaded into memory? - windows

I have attempted the following test to see if the .data section gets loaded into memory when the program is executed:
global _start
section .data
arr times 99999999 DB 0xAF
section .text
_start:
jmp _start ; prevent process from terminating
Assemble and link:
nasm -f win32 D:\file.asm
link D:\file.obj /OUT:D:\file.exe /ENTRY:start /SUBSYSTEM:CONSOLE
I have executed the program, and the result was the following:
As you can see the program only occupied 276 KB of memory while it has an array with a size of 99999999 bytes!

The paging model on most systems will cause the pages comprising the sections of the binary not requiring some kind of dynamic linking to only be loaded when they are accessed - Windows is no exception. So, the .data section is memory-mapped as a binary file to your process memory space, but is not actually swapped in until you need it. The process monitor only reports the memory actually in by default, although you can configure the columns to show all of the memory in the image, also. There may also be compiler options you can use to change the paging behavior, and you can always remap the memory manually (perhaps locking it in) if you need.

Related

Accessing global variables in ARM64 position independent assembly code

I'm writing some ARM64 assembly code for macOS, and it needs to access a global variable.
I tried to use the solution in this SO answer, and it works fine if I just call the function as is. However, my application needs to patch some instructions of this function, and the way I'm doing it, the function gets moved somewhere else in memory in the process. Note the adrp/ldr pair is untouched during patching.
However, if I try to run the function after moving it elsewhere in memory, it no longer returns correct results. This happens even if I just memcpy() the code as is, without patching. After tracing with a debugger, I isolated the issue to the address of the global valuable being incorrectly loaded by the adrp/ldr pair (and weirdly, the ldr is assembled as an add, as seen with objdump straight after compiling the binary -- not sure if it's somehow related to the issue here.)
What would be the correct way to load a global variable, so that it survives the function being copied somewhere else and run from there?
Note the adrp/ldr pair is untouched during patching.
There's the issue. If you rip code out of the binary it's in, then you effectively need to re-link it.
There's two ways of dealing with this:
If you have complete control over the segment layout, then you could have one executable segment with all of your assembly in it, and right next to it one segment with all addresses that code needs, and make sure the assembly ONLY has references to things on that page. Then wherever you copy your assembly, you'd also copy the data page next to it. This would enable you to make use of static addresses that get rebased by the dynamic linker at the time your binary is loaded. This might look something like:
.section __ASM,__asm,regular
.globl _asm_stub
.p2align 2
_asm_stub:
adrp x0, _some_ref#PAGE
ldr x0, [x0, _some_ref#PAGEOFF]
ret
.section __REF,__ref
.globl _some_ref
.p2align 3
_some_ref:
.8byte _main
Compile that with -Wl,-segprot,__ASM,rx,rx and you'll get an executable __ASM and a writeable __REF segment. Those two would have to maintain their relative position to each other when they get copied around.
(Note that on arm64 macOS you cannot put symbol references into executable segments for the dynamic linker to rebase, because it will fault and crash while trying to do so, and even if it were able to do that, it would invalidate the code signature.)
You act as a linker, scanning for PC-relative instructions and re-linking them as you go. The list of PC-relative instructions in arm64 is quite short, so it should be a feasible amount of work:
adr and adrp
b and bl
b.cond (and bc.cond with FEAT_HBC)
cbz and cbnz
tbz and tbnz
ldr and ldrsw (literal)
ldr (SIMD & FP literal)
prfm (literal)
(You can look for the string PC[] in the ARMv8 Reference Manual to find all uses.)
For each of those you'd have to check whether their target address lies within the range that's being copied or not. If it does, then you'd leave the instruction alone (unless you copy the code to a different offset within the 4K page than it was before, in which case you have to fix up adrp instructions). If it isn't then you'll have to recalculate the offset and emit a new instruction. Some of the instructions have a really low maximum offset (tbz/tbnz ±32KiB). But usually the only instructions that reference addresses across function boundaries are adr, adrp, b, bl and ldr. If all code on the page is written by you then you can do adrp+add instead of adr and adrp+ldr instead of just ldr, and if you have compiler-generated code on there, then all adr's and ldr's will have a nop before or after, which you can use to turn them into an adrp combo. That should get your maximum reference range up to ±128MiB.

What does write_cr0(read_cr0() | 0x10000) do?

I searched the web a lot but didn't find a short explanation about what write_cr0(read_cr0() | 0x10000) really do. It is related to the Linux kernel and I curios about developing LKM's. I want to know what this really do and what are the security issues with this.
It used to remove the write protection on the syscall table.
But how it is really works? and what does each thing in this line?
CR0 is one of the control registers available on x86 CPUs, which contains flags controlling CPU features related to memory protection, multitasking, paging, etc. You can find a full description in Volume 3, Section 2.5 of Intel's Software Developer's Manual.
These registers are accessed by special instructions that the compiler doesn't normally generate, so read_cr0() is a function which executes the instruction to read this register (via inline assembly) and returns the result in a general-purpose register. Likewise, write_cr0() writes to this register.
The function calls are likely to be inlined, so that the generated code would be something like
mov eax, cr0
or eax, 0x10000
mov cr0, eax
The OR with 0x10000 sets bit 16, the Write Protect bit. On early 32-bit x86 CPUs, code running at supervisor level (like the kernel) was always allowed to write all of virtual memory, regardless of whether the page was marked read-only. This bit makes that optional, so that when it is set, such accesses will cause page faults. This line of code probably follows an earlier line which temporarily cleared the bit.

DOS inserting segment addresses at runtime

I noticed a potential bug in some code i'm writing.
I though that if I used mov ax, seg segment_name, the program might be non-portable and only work on one machine in a specific configuration since the load location can vary from machine to machine.
So I decided to disassemble a program containing just that one instruction on two different machines running DOS and I found that the problem was magically solved.
Output of debug on machine one: 0C7A:014C B8BB0C MOV AX,0CBB
Output of debug on machine two: 06CA:014C B80B07 MOV AX,070B
After hex dumping the program I found that the unaltered bytes are actually B84200.
Manually inserting those bytes back into the program results in mov ax, 0042
So does the PE format store references to those instructions and update them at runtime?
As Peter Cordes noted, MS-DOS doesn't use the PECOFF executable format that Windows uses. It has it's own "MZ" executable format, named after the first two bytes of the executable that identify as being in this format.
The MZ format supports the use of multiple segments through a relocation table containing relocations. These relocations are just simple segment:offset values that indicate the location of 16-bit segment values that need to be adjusted based on where the executable was loaded in memory. MS-DOS performs these adjustments by simply adding the actual load segment of the program to the value contained in the executable. This means that without relocations applied the executable would only work if loaded at segment 0, which happens to be impossible.
Note this isn't just necessary for a program to work on multiple machines, it's also necessary for the same program to work reliably on the same machine. The load address can change based on what various configuration details, was well as other programs and drivers that have already been loaded in memory, so the load address of an MS-DOS executable is essentially unpredictable.
Working backwards from your example, we can tell where your example program was loaded into memory on both machines. Since 0042h was relocated into 0CBBh on the first machine and into 070Bh on the second machine, we know MS-DOS loaded your program on the two machines at segments 0C79h and 06C9h respectively:
0CBB - 0042 = 0C79
070B - 0042 = 06C9
From that we can determine that your example executable has the entry 0001:014D, or equivalent segment:offset value, in it's relocation table:
0C7A:014D - 0C79:0000 = 0001:014D
06CA:014D - 06C9:0000 = 0001:014D
This entry indicates the unrelocated location of the 16-bit immediate operand of the mov ax, seg segname instruction that needs adjusting.

Easiest way to do runtime md5sum on the .text section of a static library

Brainhive,
I'm looking for a way to make sure my code wasn't altered, initial thought was to find the start address of the .text section and the size of it, run md5sum (or other hash) and compare to a constant.
My code is compiled to a static library and I don't want to hash the entire binary, only my library.
How do I? Will it help adding an ld script with reserved labels?
System is arm64 and I'm using GNU arm compiler (linaro implementation).
I'm looking for a way to make sure my code wasn't altered, initial thought was to find the start address of the .text section and the size of it, run md5sum (or other hash) and compare to a constant.
There are several reasons your request is likely misguided:
Anybody who is willing to modify your compiled code, will also be willing to modify the checksum that you are going to compare against at runtime. That is, you appear to want to do something like:
/* 0xabcd1234 is the precomputed checksum over the library. */
if (checksum_over_my_code() != 0xabcd1234) abort();
The attacker can easily replace this entire code with a sequence of NOP instructions, and proceed to use your modified library.
Your static library (usually) doesn't end up as sequence of bytes in the final binary. If you have foo.o and bar.o in your library, and the end-user links your library with his own code in main.o and baz.o, then the .text section of the resulting executable could well be composed of .text from main.o, then .text from foo.o, then .text from baz.o, and finally .text from bar.o.
When the final executable is linked, the instructions in your library are updated (relocated). That is, suppose your original code has CALL foo instruction. The actual bytes in your .text section will be something like 0xE9 0x00 0x00 0x00 0x00 (with a relocation record stating that the bytes following 0xE9 should be updated with whatever the final address of foo ends up being).
After the link is done, and assuming foo ends up at address 0x08010203, the bytes in .text of the executable will no longer be 0s. Instead they'll be 0xE9 0x03 0x02 0x01 0x08 (they actually wouldn't be that for reasons irrelevant here, but they certainly wouldn't be all 0s).
So computing the checksum over actual .text section of your archive library is completely pointless.
There are tools that allow you to dump an ELF section. elfcat makes it super easy, (elfcat --section-name=test the_file.o) but it should also be doable with objdump too. Once you've dumped the section, the problem is reduced to sizing and hashing a file.

the ways by which page table entry can become dirty

The accessed and dirty (A/D) bits inform about a page whether it is accessed or written. when a file is loaded in memory some changes are only in memory which are not still synchronized with file stored on the disk. that page which is modified but not written back is dirty page.
My question is whether this concept also implies on ELF files?
Can .code, .data also get dirty? if yes then how?
My question is whether this concept also implies on ELF files?
Yes.
Can .code, .data also get dirty? if yes then how?
The .code usually doesn't have write permission (only read and execute), and so it usually doesn't get dirty.
However, you can mprotect a .code page to be writable, and write to it (this is often used in runtime patching). If you do, the corresponding page will become dirty, and will stay dirty because it is mapped with MAP_PRIVATE (you generally don't want a running program to change its image on-disk).
You could also get dirty .code pages if your binary has text relocations (which often happens when non-fPIC code is linked into a shared library on ix86).
Finally, the .data pages are modified all the time (every time you modify an initialized global variable), and these pages then stay dirty for the duration of the program (again, you generally don't want a running program to modify its on-disk image).
Update:
text/.code relocations with out fpic are those which are made for shared libraries at load time. then it means these relocations make .code dirty before even execution of entry instruction.
Not necessarily. Two cases to consider:
a.out that directly depends on foo.so
a.out that uses dlopen to load foo.so
In case 1, you are correct: text relocations in foo.so will cause (some of) its .text pages to become dirty before the first instruction of a.out is executed (note that user-space starts executing from ld.so entry, not from a.out entry).
In case 2, the .text pages will become dirty as part of the dlopen, which is long after main (which is itself long after the entry instruction).
when .data pages are modified, in response should .code pages also become dirty for fpic or non fpic?
No: modifying .data does not cause .code to also become dirty. Why would it?

Resources