Explain to me how Windows allocates process virtual memory - windows

I have pretty complex question combined of multiple related questions. Let me give you the preamble.
I wrote a simple Win64 program in assembly language which prints "2 + 3 = 5" using printf and then "Hello World!" using puts:
format PE64
entry start
section '.text' code readable executable
start:
sub rsp,8*5 ; reserve stack for API use and make stack dqword aligned
mov edx, 3
mov ecx, 2
call print_sum
lea rcx,[_hw_message]
call [puts]
mov ecx,0
call [ExitProcess]
print_sum:
sub rsp, 20h
mov r9d, ecx
add r9d, edx
mov r8d, edx
mov edx, ecx
lea ecx, [_format_message]
call [printf]
add rsp, 20h
ret
section '.data' data readable writeable
_hw_message db 'Hello World!',0
_format_message db '%d + %d = %d',13,10,0
section '.idata' import data readable writeable
dd 0,0,0,RVA kernel_name,RVA kernel_table
dd 0,0,0,RVA msvcrt_name,RVA msvcrt_table
kernel_table:
ExitProcess dq RVA _ExitProcess
dq 0
msvcrt_table:
printf dq RVA _printf
puts dq RVA _puts
dq 0
kernel_name db 'KERNEL32.DLL',0
msvcrt_name db 'msvcrt.dll',0
_ExitProcess dw 0
db 'ExitProcess',0
_printf dw 0
db 'printf',0
_puts dw 0
db 'puts',0
and built it with fasm. Resulting binary size is 2048 bytes.
I've opened it with CFF Explorer to see PE header values.
Image base is 0x400000, entry point is 0x1000, .text section virtual address is 0x1000 too, so, as far as I understand, it should start in virtual memory at offset 0x401000 and it is also its entry point.
Then I've opened it in debugger (I use x64dbg) to confirm my guess:
Looks believable. Also note that stack is located at 0x8A000.
Fine, then I've tried the same with another program – notepad.exe from C:\Windows:
Wait, what? 0x140000000 + 0x24050 = 0x140024050, not 0x7FF75FD04050. And I can't find in PE headers such big values starting with 7FF.
In addition, the stack is again located somewhere at the beginning of the process's memory map, but now its address is already much larger:
I thought that perhaps this is because notepad.exe is a system program and is tightly tied to the Windows system APIs, and some parts of it (and maybe all the code) are always loaded into RAM while Windows is running. Therefore, I tried to do the same with x64dbg itself, and saw about the same picture:
image base: 0x140000000
entry point (in headers): 0x2440
entry point in VM: 0x7FF6B0E82440
location of stack in VM: 0xFDA07F8000
So the questions are:
Why are sections of some programs mapped to addresses greater than 0x7ff000000000, which doesn't match PE headers?
How are these processes different from others?
How does the OS decide where to place the stack in virtual memory?
Each thread has its own stack. As you can see from the screenshots, thread stacks are usually placed before code sections. If the program starts a dynamic number of threads, this memory may not be enough. Where, in this case, will stacks be allocated for new threads?
How can I programmatically, having an executable file, but not running it, statically determine at what addresses in the virtual memory of its process the sections, the stack will be located, and what address spaces will be available for allocation on the heap?
I understand that this can be difficult to explain in a nutshell, so I appreciate if, in addition to answering my questions, you can recommend me some reading material that will help me improve my understanding of the Windows virtual memory mapping.

What you're seeing is Address space layout randomization, which is enabled by default in MSVC with the linker flag:
/DYNAMICBASE.
To enable this, the flag IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x40) must be set in the PE header, at FileHeader -> OptionalHeader -> DllCharacteristics.
When enabled, the OS will select a random address for the base image, stack, and heap. The ImageBase specified in the PE header will be ignored.

Related

Direction Flag DF in the Windows 64-bit calling convention?

According to the docs I can find on calling windows functions, the following applies:-
The Microsoft x64 calling convention[12][13] is followed on Windows
and pre-boot UEFI (for long mode on x86-64). It uses registers RCX,
RDX, R8, R9 for the first four integer or pointer arguments (in that
order), and additional arguments are pushed onto the stack (right to
left). Integer return values (similar to x86) are returned in RAX if
64 bits or less.
In the Microsoft x64 calling convention, it's the caller's
responsibility to allocate 32 bytes of "shadow space" on the stack
right before calling the function (regardless of the actual number of
parameters used), and to pop the stack after the call. The shadow
space is used to spill RCX, RDX, R8, and R9,[14] but must be made
available to all functions, even those with fewer than four
parameters.
The registers RAX, RCX, RDX, R8, R9, R10, R11 are considered volatile
(caller-saved).[15]
The registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15 are
considered nonvolatile (callee-saved).[15]
So, I have been happily calling kernel32 until a call to GetEnvironmentVariableA failed under certain circumstances. I finally traced it back to the fact that the direction flag DF was set and I needed to clear it.
I have not up till now been able to find any mention of this and wondered if it was prudent to always clear it before a call.
Or maybe that would cause other problems. Anyone aware of the conventions of calling in this instance?
Windows assumes that the direction flag is cleared. Despite in article said about C run-time only, this is true for whole windows (I think because windows code itself is primarily written in c/c++). So when your programme begins to execute - you can assume that DF is 0. Usually you do not need to change this flag. However if you temporarily change it (set it to 1) in some internal routine you must clear it by cld before calling any windows API or any external module (because it assumes that DF is 0).
All windows interrupts at very beginning of execution clear DF to 0 - so it is safe to temporarily set DF to 1 in own internal code, main - before any external call reset it back to 0.

Assemble IA-32 mov [bootdrv], dl

I just start to program IA-32 assemble and boot loader and I can't understand one command: mov [bootdrv], dl.
dl is the low 8 bits of data register, but I dont know what is [bootdrv]. Is it a variable or something? How could a register be placed in [bootdrv]?
start:
mov ax,0x7c0 ; BIOS puts us at 0:07C00h, so set DS accordinly
mov ds,ax ; Therefore, we don't have to add 07C00h to all our data
mov [bootdrv], dl ; quickly save what drive we booted from
This is the beginning 3 line of a boot loader and [bootdrv] just appear without any definition, I couldn't understand.
Any information would be helpful and appreciated, thank you!
[bootdrv] is a specification of an absolute memory address. The code:
mov [bootdrv], dl
copies the contents of the 8-bit DL register into a byte in memory, at the address resulting of multiplying the current value of DS by 16, then add the value bootdrv. bootdrv itself is a label, which a value that represents where in the current data segment is the memory position located.
On the other hand, the symbol bootdrv must be defined somewhere. Otherwise, the assembler will stop with a "symbol not defined" error. Maybe it's defined past the code (assemblers do two passes through the source code in order to get all symbols so they can be used even if they are defined after the code sequence that uses them). Maybe it's in a separate .INC file.
mov [bootdrv], dl indicates a segment:offset memory access. In the previous instruction, you configured the Data Segment register with an address, so the mov [bootdrv], dl instruction writes to the segment:offset address 0x7c0:bootdrv, whatever bootdrv might be.

How to disambiguate instructions from data in the .text segment of a PE file?

I have a PE file and I try to disassemble it in order to get it's instructions. However I noticed that .text segment contains not only instructions but also some data (I used IDA to notice that). Here's the example:
.text:004037E4 jmp ds:__CxxFrameHandler3
.text:004037EA ; [00000006 BYTES: COLLAPSED FUNCTION _CxxThrowException. PRESS KEYPAD "+" TO EXPAND]
.text:004037F0 ;
.text:004037F0 mov ecx, [ebp-10h]
.text:004037F3 jmp ds:??1exception#std##UAE#XZ ; std::exception::~exception(void)
.text:004037F3 ;
.text:004037F9 byte_4037F9 db 8Bh, 54h, 24h ; DATA XREF: sub_401440+2o
.text:004037FC dd 0F4428D08h, 33F04A8Bh, 0F6B2E8C8h, 0C4B8FFFFh, 0E9004047h
.text:004037FC dd 0FFFFFFD0h, 3 dup(0CCCCCCCCh), 0E904458Bh, 0FFFFD9B8h
.text:00403828 dword_403828 dd 824548Bh, 8BFC428Dh, 0C833F84Ah, 0FFF683E8h, 47F0B8FFh
.text:00403828 ; DATA XREF: sub_4010D0+2o
.text:00403828 ; .text:00401162o
.text:00403828 dd 0A1E90040h, 0CCFFFFFFh, 3 dup(0CCCCCCCCh), 50E0458Dh
.text:00403828 dd 0FFD907E8h, 458DC3FFh, 0D97EE9E0h
.text:00403860 db 2 dup(0FFh)
.text:00403862 word_403862 dw 548Bh
How can I distinct such data from instructions? My solution to this problem was to find simply the first instruction (enter address) and visit each instruction and all called functions. Unfortunatelly it occured that there are some blocks of code which are not directly called but their addresses are in .rdata segment among some data and I have no idea how distinct valid instruction addresses from data.
To sum up: is there any way to decide whether some address in .text segment contains data or instructions? Or maybe is there any way to decide which potential addresses in .rdata should be interpreted as instructions addresses and which as data?
You cannot, in general. The .text section of a PE file can mix up code and constants any way the author likes. Programs like IDA try to make sense of this by starting with the entrypoints and then disassembling, and seeing which addresses are targets of jumps, and which of reads. But devious programs can 'pun' between instructions and data.

Malloc and Free multiple arrays in assembly

I'm trying to experiment with malloc and free in assembly code (NASM, 64 bit).
I have tried to malloc two arrays, each with space for 2 64 bit numbers. Now I would like to be able to write to their values (not sure if/how accessing them will work exactly) and then at the end of the whole program or in the case of an error at any point, free the memory.
What I have now works fine if there is one array but as soon as I add another, it fails on the first attempt to deallocate any memory :(
My code is currently the following:
extern printf, malloc, free
LINUX equ 80H ; interupt number for entering Linux kernel
EXIT equ 60 ; Linux system call 1 i.e. exit ()
segment .text
global main
main:
push dword 16 ; allocate 2 64 bit numbers
call malloc
add rsp, 4 ; Undo the push
test rax, rax ; Check for malloc failure
jz malloc_fail
mov r11, rax ; Save base pointer for array
; DO SOME CODE/ACCESSES/OPERATIONS HERE
push dword 16 ; allocate 2 64 bit numbers
call malloc
add rsp, 4 ; Undo the push
test rax, rax ; Check for malloc failure
jz malloc_fail
mov r12, rax ; Save base pointer for array
; DO SOME CODE/ACCESSES/OPERATIONS HERE
malloc_fail:
jmp dealloc
; Finish Up, deallocate memory and exit
dealloc:
dealloc_1:
test r11, r11 ; Check that the memory was originally allocated
jz dealloc_2 ; If not, try the next block of memory
push r11 ; push the address of the base of the array
call free ; Free this memory
add rsp, 4
dealloc_2:
test r12, r12
jz dealloc_end
push r12
call free
add rsp, 4
dealloc_end:
call os_return ; Exit
os_return:
mov rax, EXIT
mov rdi, 0
syscall
I'm assuming the above code is calling the C functions malloc() and free()...
If 1st malloc() fails, you arrive at dealloc_1 with whatever garbage is in r11 and r12 after returning from the malloc().
If 2nd malloc() fails, you arrive at dealloc_1 with whatever garbage is in r12 after returning from the malloc().
Therefore, you have to zero out r11 and r12 before doing the first allocation.
Since this is 64-bit mode, all pointers/addresses and sizes are normally 64-bit. When you pass one of those to a function, it has to be 64-bit. So, push dword 16 isn't quite right. It should be push qword 16 instead. Likewise, when you are removing these parameters from the stack, you have to remove exactly as many bytes as you've put there, so add rsp, 4 must change to add rsp, 8.
Finally, I don't know which registers malloc() and free() preserve and which they don't. You may need to save and restore the so-called volatile registers (see your C compiler documentation). The same holds for the code not shown. It must preserve r11 and r12 so they can be used for deallocation. EDIT: And I'd check if it's the right way of passing parameters through the stack (again, see your compiler documentation).
EDIT: you're testing r11 for 0 right before 2nd free(). It should be r12. But free() doesn't really mind receiving NULL pointers. So, these checks can be removed.
Pay attention to your code.
You have to obey x86-64 calling conventions: arguments might be passed through registers, in the case of malloc that would be RDI for the size. And as already pointed out, you have to watch out which registers are preserved by the called functions. (afaik only RBP, RSP and R12-R15 are preserved across function calls)
There are at least two bugs, because you test r11 again (the line test r11,r11 after dealloc_2:, but you supposedly wanted to test r12 here. Additionally you want to push a qword, if you are in 64 bit mode.
The reason the deallocation doesn't work at all may be because you are changing the contents of r11 or r12.
Not that both tests are not needed, as it is perfectly safe to call free with an null pointer.

System Calls in Windows & Native API?

Recently I've been using lot of assembly language in *NIX operating systems. I was wondering about the Windows domain.
Calling convention in Linux:
mov $SYS_Call_NUM, %eax
mov $param1 , %ebx
mov $param2 , %ecx
int $0x80
Thats it. That is how we should make a system call in Linux.
Reference of all system calls in Linux:
Regarding which $SYS_Call_NUM & which parameters we can use this reference : http://docs.cs.up.ac.za/programming/asm/derick_tut/syscalls.html
OFFICIAL Reference : http://kernel.org/doc/man-pages/online/dir_section_2.html
Calling convention in Windows:
???
Reference of all system calls in Windows:
???
Unofficial : http://www.metasploit.com/users/opcode/syscalls.html , but how do I use these in assembly unless I know the calling convention.
OFFICIAL : ???
If you say, they didn't documented it. Then how is one going to write libc for windows without knowing system calls? How is one gonna do Windows Assembly programming? Atleast in the driver programming one needs to know these. right?
Now, whats up with the so called Native API? Is Native API & System calls for windows both are different terms referring to same thing? In order to confirm I compared these from two UNOFFICIAL Sources
System Calls: http://www.metasploit.com/users/opcode/syscalls.html
Native API: http://undocumented.ntinternals.net/aindex.html
My observations:
All system calls are beginning with letters Nt where as Native API is consisting of lot of functions which are not beginning with letters Nt.
System Call of windows are subset of Native API. System calls are just part of Native API.
Can any one confirm this and explain.
EDIT:
There was another answer. It was a 2nd answer. I really liked it but I don't know why answerer has deleted it. I request him to repost his answer.
If you're doing assembly programming under Windows you don't do manual syscalls. You use NTDLL and the Native API to do that for you.
The Native API is simply a wrapper around the kernelmode side of things. All it does is perform a syscall for the correct API.
You should NEVER need to manually syscall so your entire question is redundant.
Linux syscall codes do not change, Windows's do, that's why you need to work through an extra abstraction layer (aka NTDLL).
EDIT:
Also, even if you're working at the assembly level, you still have full access to the Win32 API, there's no reason to be using the NT API to begin with! Imports, exports, etc all work just fine in assembly programs.
EDIT2:
If you REALLY want to do manual syscalls, you're going to need to reverse NTDLL for each relevant Windows version, add version detection (via the PEB), and perform a syscall lookup for each call.
However, that would be silly. NTDLL is there for a reason.
People have already done the reverse-engineering part: see https://j00ru.vexillium.org/syscalls/nt/64/ for a table of system-call numbers for each Windows kernel. (Note that the later rows do change even between versions of Windows 10.) Again, this is a bad idea outside of personal-use-only experiments on your own machine to learn more about asm and/or Windows internals. Don't inline system calls into code that you distribute to anyone else.
The other thing you need to know about the windows syscall convention is that as I understand it the syscall tables are generated as part of the build process. This means that they can simply change - no one tracks them. If someone adds a new one at the top of the list, it doesn't matter. NTDLL still works, so everyone else who calls NTDLL still works.
Even the mechanism used to perform syscalls (which int, or sysenter) is not fixed in stone and has changed in the past, and I think that once upon a time the same version of windows used different DLLs which used different entry mechanisms depending on the CPU in the machine.
I was interested in doing a windows API call in assembly with no imports (as an educational exercise), so I wrote the following FASM assembly to do what NtDll!NtCreateFile does. It's a rough demonstration on my 64-bit version of Windows (Win10 1803 Version 10.0.17134), and it crashes out after the call, but the return value of the syscall is zero so it is successful. Everything is set up per the Windows x64 calling convention, then the system call number is loaded into RAX, and then it's the syscall assembly instruction to run the call. My example creates the file c:\HelloWorldFile_FASM, so it has to be run "as administrator".
format PE64 GUI 4.0
entry start
section '.text' code readable executable
start:
;puting the first four parameters into the right registers
mov rcx, _Handle
mov rdx, [_access_mask]
mov r8, objectAttributes
mov r9, ioStatusBlock
;I think we need 1 stack word of padding:
push 0x0DF0AD8B
;pushing the other params in reverse order:
push [_eaLength]
push [_eaBuffer]
push [_createOptions]
push [_createDisposition]
push [_shareAcceses]
push [_fileAttributes]
push [_pLargeInterger]
;adding the shadow space (4x8)
; push 0x0
; push 0x0
; push 0x0
; push 0x0
;pushing the 4 register params into the shadow space for ease of debugging
push r9
push r8
push rdx
push rcx
;now pushing the return address to the stack:
push endOfProgram
mov r10, rcx ;copied from ntdll!NtCreateFile, not sure of the reason for this
mov eax, 0x55
syscall
endOfProgram:
retn
section '.data' data readable writeable
;parameters------------------------------------------------------------------------------------------------
_Handle dq 0x0
_access_mask dq 0x00000000c0100080
_pObjectAttributes dq objectAttributes ; at 00402058
_pIoStatusBlock dq ioStatusBlock
_pLargeInterger dq 0x0
_fileAttributes dq 0x0000000000000080
_shareAcceses dq 0x0000000000000002
_createDisposition dq 0x0000000000000005
_createOptions dq 0x0000000000000060
_eaBuffer dq 0x0000000000000000 ; "optional" param
_eaLength dq 0x0000000000000000
;----------------------------------------------------------------------------------------------------------
align 16
objectAttributes:
_oalength dq 0x30
_rootDirectory dq 0x0
_objectName dq unicodeString
_attributes dq 0x40
_pSecurityDescriptor dq 0x0
_pSecurityQualityOfService dq securityQualityOfService
unicodeString:
_unicodeStringLength dw 0x34
_unicodeStringMaxumiumLength dw 0x34, 0x0, 0x0
_pUnicodeStringBuffer dq _unicodeStringBuffer
_unicodeStringBuffer du '\??\c:\HelloWorldFile_FASM' ; may need to "run as adinistrator" for the file create to work.
ioStatusBlock:
_status_pointer dq 0x0
_information dq 0x0
securityQualityOfService:
_sqlength dd 0xC
_impersonationLevel dd 0x2
_contextTrackingMode db 0x1
_effectiveOnly db 0x1, 0x0, 0x0
I used the documentation for Ntdll!NtCreateFile, and I also used the kernel debugger to look at and copy a lot of the params.
__kernel_entry NTSTATUS NtCreateFile(
OUT PHANDLE FileHandle,
IN ACCESS_MASK DesiredAccess,
IN POBJECT_ATTRIBUTES ObjectAttributes,
OUT PIO_STATUS_BLOCK IoStatusBlock,
IN PLARGE_INTEGER AllocationSize OPTIONAL,
IN ULONG FileAttributes,
IN ULONG ShareAccess,
IN ULONG CreateDisposition,
IN ULONG CreateOptions,
IN PVOID EaBuffer OPTIONAL,
IN ULONG EaLength
);
Windows system calls are performed by calling into system DLLs such as kernel32.dll or gdi32.dll, which is done with ordinary subroutine calls. The mechanisms for trapping into the OS privileged layer is undocumented, but that is okay because DLLs like kernel32.dll do this for you.
And by system calls, I'm referring to documented Windows API entry points like CreateProcess() or GetWindowText(). Device drivers will generally use a different API from the Windows DDK.
OFFICIAL Calling convention in Windows: http://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx
(hope this link survives in the future; if it doesn't, just search for "x64 Software Conventions" on MSDN).
The function calling convention differs in Linux & Windows x86_64. In both ABIs, parameters are preferably passed via registers, but the registers used differ. More on the Linux ABI can be found at http://www.x86-64.org/documentation/abi.pdf

Resources