The third parameter of VirtualProtect accepts protection flags such as the following:
PAGE_EXECUTE
PAGE_NOACCESS
PAGE_READWRITE
PAGE_READONLY
...
At first I thought VirtualProtect might implement this through flags in the PTE. But when I read the structure of the PTE, I could not find a field that records this function's third parameter.
The PTE's structure is as follows:
Sorry, I cannot post images (I don't have 10 reputation), but you can find it with Google.
I want to find out where Windows records the protection flags of a virtual memory page. Is it not the PTE?
After reading some more material, I noticed that when a PTE is invalid, the meaning of its fields changes, and 5 bits are then used for the protection flags.
The available protection flags are a superset of what an Intel processor supports. Keep in mind that Windows was written to run on a variety of processors; it once supported MIPS, Itanium, Alpha and PowerPC as well. Those are a mere footnote today: AMD/Intel won by a landslide, with ARM popular on mobile devices.
An Intel processor has pretty limited support for the page protection attributes. A page table entry has:
bit 1 for (R/W), a 1 allows write access, a 0 only allows read access
bit 2 for (U/S), user/supervisor, not relevant to user mode code
bit 63 for (XD), eXecute Disabled. A late addition to AMD cores, originally marketed as "Enhanced Virus Protection", adopted by Intel. All processors you'll find today support it.
So the kernel maps the protection flags like this:
PAGE_NOACCESS: the page simply won't be mapped to RAM
PAGE_READONLY: R/W = 0, XD = 1
PAGE_READWRITE: R/W = 1, XD = 1
PAGE_EXECUTE: R/W = 0, XD = 0
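The mapping above can be sketched in portable C. This is only an illustration of the idea, not how the Windows kernel is actually written: the `sketch_protect` enum and `sketch_pte_bits` function are hypothetical names, the enum values are not the real PAGE_* constants, and the Present bit (bit 0) is my addition to model "not mapped".

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 page table entry bits. R/W and XD are the ones listed above;
 * the Present bit (bit 0) is added here to model "not mapped". */
#define PTE_PRESENT (1ULL << 0)
#define PTE_RW      (1ULL << 1)
#define PTE_XD      (1ULL << 63)

/* Hypothetical stand-ins for the PAGE_* constants (not the real values). */
enum sketch_protect {
    SKETCH_NOACCESS,
    SKETCH_READONLY,
    SKETCH_READWRITE,
    SKETCH_EXECUTE
};

/* Return the hardware bits a kernel could program for each request. */
uint64_t sketch_pte_bits(enum sketch_protect prot)
{
    switch (prot) {
    case SKETCH_NOACCESS:  return 0;                             /* not present  */
    case SKETCH_READONLY:  return PTE_PRESENT | PTE_XD;          /* R/W=0, XD=1  */
    case SKETCH_READWRITE: return PTE_PRESENT | PTE_RW | PTE_XD; /* R/W=1, XD=1  */
    case SKETCH_EXECUTE:   return PTE_PRESENT;                   /* R/W=0, XD=0  */
    }
    assert(!"unknown protection value");
    return 0;
}
```

Note that several distinct requests collapse onto the same hardware bits, which is why the kernel has to remember the requested protection somewhere other than the valid PTE.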
I am running a simulated RV64GC core in QEMU and am trying to better understand the virtual memory subsystem and address translation process in RISC-V. My simulated system runs with OpenSBI, the Linux kernel v5.5, and a minimal rootfs.
In QEMU debug traces, I see that sometimes (most commonly with ecalls) control is passed to the SBI and the addresses change from kernel (virtual?) addresses with an offset of 0xffffffe000000000 into something that looks like real, physical, addresses in RAM. For example,
...
0xffffffe00003a192: 00000073 ecall
...
IN: sbi_ecall_0_1_handler
0x0000000080004844: 00093603 ld a2,0(s2)
0x0000000080004848: 4785 addi a5,zero,1
0x000000008000484a: 00a797b3 sll a5,a5,a0
...
In the RISC-V privileged specification version 1.11, section 4.1.12, the satp CSR (control and status register) is defined to have a MODE field that selects the address translation scheme. A MODE of 0 means that translation is bare (addresses are treated as physical), a MODE of 8 or 9 selects Sv39 or Sv48 page-based virtual addressing, respectively, and all other MODE values are reserved.
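For reference, the field layout just described can be decoded like this. It is a minimal sketch that assumes the RV64 satp layout (MODE in bits 63:60, ASID in bits 59:44, PPN in bits 43:0); the function names are mine and are not taken from QEMU, OpenSBI or Linux.

```c
#include <assert.h>
#include <stdint.h>

/* MODE encodings from the paragraph above (RV64). */
enum { SATP_MODE_BARE = 0, SATP_MODE_SV39 = 8, SATP_MODE_SV48 = 9 };

/* RV64 satp layout: MODE[63:60] | ASID[59:44] | PPN[43:0]. */
unsigned satp_mode(uint64_t satp) { return (unsigned)(satp >> 60); }
unsigned satp_asid(uint64_t satp) { return (unsigned)((satp >> 44) & 0xFFFFu); }
uint64_t satp_ppn(uint64_t satp)  { return satp & ((1ULL << 44) - 1); }

/* Physical address of the root page table; only meaningful when
 * MODE is not Bare, which the assertion enforces. */
uint64_t satp_root_table(uint64_t satp)
{
    assert(satp_mode(satp) != SATP_MODE_BARE);
    return satp_ppn(satp) << 12;   /* PPN counts 4 KiB pages */
}
```

For example, a satp value with MODE 8, ASID 1 and PPN 0x80200 would describe an Sv39 address space whose root page table sits at physical address 0x80200000.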
Now, neither the RISC-V privileged nor the unprivileged specification seems to mention when satp may be changed (other than with csrrw), so this leads me to the following questions:
When control is handed to the SBI (as with the ecall above), does the satp MODE change to 0? If yes, does this mean the satp mode should be reset on a u/s/mret instruction? Are there other instances (other than csrrw) where satp is supposed to change?
If not, is there some other mechanism by which the addresses are interpreted and designated as physical? Or are the addresses (the 0x80XXXXXX addresses above) instead considered virtual and should go through the usual virtual address translation process (as outlined in section 4.3.2 of the RISC-V privileged specification)? If this is the case, when are page table entries created for this?
The memory model of RISC-V works in the following way:
M mode has its own memory protection system, described in section 3.6 of the privileged specification, called PMP (physical memory protection). It imposes memory protection on the lower privilege levels and also on M mode itself (if the lock bit is used). There is no virtual memory system in M mode.
S mode has a page-based virtual memory system that it can use to set up virtual-to-physical address mappings and to impose memory restrictions on S mode itself and on U mode.
So each privilege level has control over its own resources and the resources of the levels below it, but never over the resources of a privilege level above it.
M mode can control the memory accessible by M, S and U modes, and S mode can control the memory view (virtual memory) and accessibility of S and U modes, but not of M mode. So satp never even changes when moving to M mode, because the mapping it points to is simply not applicable to M mode. M mode has its own memory protection unit.
It would be a huge security hole if lower privilege levels could impose memory restrictions on higher privilege levels.
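The rule described above boils down to a small predicate. This is a simplified sketch: `translation_active` is a hypothetical helper name, and it deliberately ignores the mstatus.MPRV bit, which can make M-mode loads and stores use the translation of a lower privilege level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RISC-V privilege level encodings. */
enum priv_level { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

/* Does an instruction fetch at this privilege level go through the
 * page tables that satp points to?  (Simplification: MPRV ignored.) */
bool translation_active(enum priv_level priv, uint64_t satp)
{
    assert(priv == PRIV_U || priv == PRIV_S || priv == PRIV_M);
    if (priv == PRIV_M)
        return false;             /* M mode always uses physical addresses */
    return (satp >> 60) != 0;     /* MODE field: 0 means Bare */
}
```

With satp left at Sv39 by the kernel, a fetch such as 0x80004844 in sbi_ecall_0_1_handler is still treated as a physical address, because the hart is in M mode at that point.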
On Microsoft Docs I read:
In 64-bit Windows, the theoretical amount of virtual address space is 2^64 bytes (16 exabytes), but only a small portion of the 16-exabyte range is actually used. The 8-terabyte range from 0x000'00000000 through 0x7FF'FFFFFFFF is used for user space, and portions of the 248-terabyte range from 0xFFFF0800'00000000 through 0xFFFFFFFF'FFFFFFFF are used for system space.
Since I have 64 bit pointers, I could possibly construct a pointer that points to some 0xFFFFxxxxxxxxxxxx address.
The site continues:
Code running in user mode has access to user space but does not have access to system space.
If I were able to guess a valid address in system virtual address space, what mechanism prevents me from writing there?
I know about memory protection but that doesn't seem to offer something that distinguishes between user memory and system memory.
According to the comments by @RbMm, this information is stored in the PTE (page table entry). There seems to be a bit which defines whether access is granted from user mode.
This is confirmed by an article on OSR online, which says
Bit Name: User access
The structure itself does not seem to be part of the Microsoft symbols:
0:000> dt ntdll!_page*
ntdll!_PAGED_LOOKASIDE_LIST
ntdll!_PAGEFAULT_HISTORY
0:000> dt ntdll!page*
0:000> dt ntdll!*pte*
00007fff324fe910 ntdll!RtlpTestHookInitialize
The PTEs are closely supported by the CPU (the MMU, memory management unit, specifically). That's why we find additional information at OSDev, which says
U, the 'User/Supervisor' bit, controls access to the page based on privilege level. If the bit is set, then the page may be accessed by all; if the bit is not set, however, only the supervisor can access it.
In some leaked SDK files, the bit seems to be
unsigned __int64 Owner : 1;
Since the PTE is supported by the CPU, we should find similar things in Linux. And voilà, I see this SO answer which also has the bit:
#define _PAGE_USER 0x004
which exactly matches the information of OSDev.
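To make the effect of that bit concrete, here is a rough model of the check the MMU performs on a single PTE. It is a sketch under stated assumptions: `sketch_access_allowed` is a hypothetical name, only the final-level entry is looked at (real hardware also checks the U/S bit at every level of the page table hierarchy), and CR0.WP is assumed to be set so that supervisor writes honour the R/W bit. SMEP and SMAP are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x001ULL
#define PTE_RW      0x002ULL
#define PTE_USER    0x004ULL   /* same value as _PAGE_USER above */

/* Would this access succeed, or would the CPU raise a page fault? */
bool sketch_access_allowed(uint64_t pte, bool user_mode, bool is_write)
{
    if (!(pte & PTE_PRESENT))
        return false;                 /* page not mapped at all */
    if (user_mode && !(pte & PTE_USER))
        return false;                 /* supervisor-only page */
    if (is_write && !(pte & PTE_RW))
        return false;                 /* read-only page (assumes CR0.WP=1) */
    return true;
}
```

So guessing a valid system-space address does not help: the page is present and writable for the kernel, but the cleared user bit makes the same write fault when it comes from user mode.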
I was reading section 'Part Id' of the following document. I'm not sure how relevant this document is to kernel 2.6.35, for instance. Specifically, it says:
..the DMA address of the memory must be within the dma_mask of the device..
and they recommend to pass certain flags, such as GFP_DMA, to kmalloc, so that it ensures the memory will fall within DMA mask provided.
However, if the memory is allocated from a cache pool created by kmem_cache_create, and with kmem_cache_alloc(.. GFP_ATOMIC), does this fail to meet the requirements outlined in DMA-API.txt?
On the other hand, LDD talks about __GFP_DMA flag with regard to legacy ISA devices, therefore I'm not sure this is applicable to PCI/PCIe devices.
This is x86 64-bit platform if it matters:
pci_set_dma_mask(dev, 0xffffffffffffffffULL);
pci_set_consistent_dma_mask(dev, 0xffffffffffffffffULL);
I would appreciate an explanation.
For GFP_* for DMA
On x86:
ISA - when using kmalloc() you need to bitwise-or GFP_DMA with GFP_KERNEL (or _ATOMIC) because of the following:
GFP_DMA guarantees:
(1) physical addresses are consecutive when get_free_page returns more than one page and
(2) only addresses lower than MAX_DMA_ADDRESS are returned. MAX_DMA_ADDRESS is 16MB on the PC because of ISA constraints
PCI - you don't need to use GFP_DMA because there is no MAX_DMA_ADDRESS limit
The device's dma_mask is checked when calling dma_map_* or dma_alloc_coherent.
dma_alloc_coherent ensures the allocated memory can be used by dma_map_*, which gives other benefits too (the implementation may choose to ignore flags that affect the location of the returned memory, like GFP_DMA).
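The mask check itself is simple arithmetic. The following is a userspace model of the idea only, not kernel code: `sketch_dma_fits` and `SKETCH_DMA_BIT_MASK` are hypothetical names I made up for illustration (the macro mirrors the shape of the kernel's DMA_BIT_MASK), and the real check lives inside the DMA API implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low n address bits set (n in 1..64). */
#define SKETCH_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Can a device with this mask reach every byte of the buffer? */
bool sketch_dma_fits(uint64_t bus_addr, uint64_t size, uint64_t mask)
{
    assert(size > 0);
    uint64_t last = bus_addr + size - 1;   /* last byte of the buffer */
    if (last < bus_addr)
        return false;                      /* address range wrapped around */
    return last <= mask;
}
```

With a 24-bit mask (the ISA case), a buffer must end at or below 16MB - 1, which is what GFP_DMA arranges at allocation time. With the 64-bit mask from the question, every address passes, so the zone the memory came from no longer matters.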
You can refer to http://coweb.cc.gatech.edu/sysHackfest/uploads/58/DMA_howto.1.txt
I'm disassembling "Test Drive III". It's a 1990 DOS game. The *.EXE has MZ format.
I've never dealt with segmentation or DOS, so I would be grateful if you answered some of my questions.
1) The game's system requirements mention 286 CPU, which has protected mode. As far as I know, DOS was 90% real mode software, yet some applications could enter protected mode. Can I be sure that the app uses the CPU in real mode only? IOW, is it guaranteed that the segment registers contain actual offset of the segment instead of an index to segment descriptor?
2) Said system requirements mention 1 MB of RAM. How is this amount of RAM even meant to be accessed if the uppermost 384 KB of the address space are reserved for stuff like MMIO and ROM? I've heard about UMBs (using holes in UMA to access RAM) and about HMA, but they still don't allow access to the whole 1 MB of physical RAM. So, was precious RAM just wasted because its physical address happened to be reserved for UMA? Or maybe the game uses some crutches like LIM EMS or XMS?
3) Is CS incremented automatically when the code crosses segment boundaries? Say, the IP reaches 0xFFFF, and what then? Does CS switch to the next segment before next instruction is executed? Same goes for SS. What happens when SP goes all the way down to 0x0000?
4) The MZ header of the executable looks like this:
signature 23117 "0x5a4d"
bytes_in_last_block 117
blocks_in_file 270
num_relocs 0
header_paragraphs 32
min_extra_paragraphs 3349
max_extra_paragraphs 65535
ss 11422
sp 128
checksum 0
ip 16
cs 8385
reloc_table_offset 30
overlay_number 0
Why does it have no relocation information? How is it even meant to run without address fixups? Or is it built as completely position-independent code consisting of program-counter-relative instructions? The game comes with a cheat utility which is also an MZ executable. Despite being much smaller (8448 bytes - so small that it fits into a single segment), it still has relocation information:
offset 1
segment 0
offset 222
segment 0
offset 272
segment 0
This allows IDA to properly disassemble the cheat's code. But the game EXE has nothing, even though it clearly has lots of far pointers.
5) Is there even such thing as 'sections' in DOS? I mean, data section, code (text) section etc? The MZ header points to the stack section, but it has no information about data section. Is data and code completely mixed in DOS programs?
6) Why even have a stack section in the EXE file at all? It has nothing but zeroes. Why waste disk space instead of just saying, "start stack from here", as is done with the BSS section?
7) MZ header contains information about initial values of SS and CS. What about DS? What's its initial value?
8) What does an MZ executable have after the exe data? The cheat utility has a whole 3507 bytes at the end of the executable file which look like
__exitclean.__exit.__restorezero._abort.DGROUP#.__MMODEL._main._access.
_atexit._close._exit._fclose._fflush._flushall._fopen._freopen._fdopen
._fseek._ftell._printf.__fputc._fputc._fputchar.__FPUTN.__setupio._setvbuf
._tell.__MKNAME._tmpnam._write.__xfclose.__xfflush.___brk.___sbrk._brk._sbrk
.__chmod.__close._ioctl.__IOERROR._isatty._lseek.__LONGTOA._itoa._ultoa.
_ltoa._memcpy._open.__open._strcat._unlink.__VPRINTER.__write._free._malloc
._realloc.__REALCVT.DATASEG#.__Int0Vector.__Int4Vector.__Int5Vector.
__Int6Vector.__C0argc.__C0argv.__C0environ.__envLng.__envseg.__envSize
Is this some kind of debugging symbol information?
Thank you in advance for your help.
Re. 1. No, you can't be sure until you prove otherwise to yourself. One giveaway would be the presence of LMSW (the 286 instruction that sets the protected-mode bit) or MOV CR0, ... (386 and later) in the code.
Re. 2. While marketing materials aren't to be confused with an engineering specification, there's a technical reason for this. A 286 CPU could address more than 1 MB of physical address space. The RAM was only "wasted" in real mode, and only if an EMM (or EMS) driver wasn't used. On 286 systems, the RAM past 640 KB was usually "pushed up" to start at the 1088 KB mark. The ISA and on-board peripherals' memory address space was mapped 1:1 into the 640-1024 KB window. Using that RAM from real mode required an EMM or EMS driver. From protected mode, it was simply "there" as soon as you set up the segment descriptor correctly.
If the game actually needed the extra 384 KB of RAM over the 640 KB available in real mode, it's a strong indication that it either switched to protected mode or required the services of an EMM or EMS driver.
Re. 3. I wish I remembered that. On reflection, I wish not :) Someone else please edit or answer separately. Hah, I did know it at some point in time :)
Re. 4. You say "[the code] has lots of instructions like call far ptr 18DCh:78Ch". This implies one of three things:
Protected mode is used and the segment part of the address is a selector into the segment descriptor table.
There is code there that relocates those instructions without DOS having to do it.
There is code there that forcibly relocates the game to a constant position in the address space. If the game doesn't use DOS to access on-disk files, it can remove DOS completely and take over, gaining lots of memory in the process. I don't recall whether you could exit from the game back to the command prompt. Some games were "play until you reboot".
Re. 5. The .EXE header does not "point" to any stack, and there is no stack section as you imply; the concept of sections doesn't exist as far as the .EXE file is concerned. The SS register value is obtained by adding the segment the executable was loaded at to the SS value from the header.
It's true that the linker can arrange sections contiguously in the .EXE file, but such sections' properties are not included in the .EXE header. They often can be reverse-engineered by inspecting the executable.
Re. 6. The SS and SP values in the .EXE header are not file pointers. The EXE file might have a part that maps to the stack, but that's entirely optional.
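To illustrate Re. 5 and Re. 6 with the numbers from the question, here is a small sketch of the arithmetic a loader performs on the header fields. The function names are hypothetical, and the load segment used in the example below (0x1000) is an arbitrary assumption; the real value depends on where DOS places the program.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of the file that belong to the MZ image (header included).
 * A bytes_in_last_block of 0 means the last 512-byte block is full. */
uint32_t mz_file_image_bytes(uint16_t blocks_in_file, uint16_t bytes_in_last_block)
{
    assert(blocks_in_file > 0);
    if (bytes_in_last_block == 0)
        return (uint32_t)blocks_in_file * 512u;
    return ((uint32_t)blocks_in_file - 1u) * 512u + bytes_in_last_block;
}

/* Bytes actually loaded into memory: the image minus the header. */
uint32_t mz_load_module_bytes(uint16_t blocks_in_file, uint16_t bytes_in_last_block,
                              uint16_t header_paragraphs)
{
    return mz_file_image_bytes(blocks_in_file, bytes_in_last_block)
           - (uint32_t)header_paragraphs * 16u;
}

/* Initial SS or CS: header value plus the segment the module was loaded at. */
uint16_t mz_relocate_segment(uint16_t load_segment, uint16_t header_segment)
{
    return (uint16_t)(load_segment + header_segment);
}

/* Real-mode address formation: segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For the game's header (270 blocks, 117 bytes in the last block, 32 header paragraphs) this gives a 137845-byte image and a 137333-byte load module. The header SS of 11422 paragraphs is 182752 bytes past the load address, which is beyond the end of the load module, so in this case the stack lives in the extra memory requested by min_extra_paragraphs rather than in the file.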
Re. 7. This is already asked and answered here.
Re. 8. This looks like a debug symbol list. The cheat utility was linked with the debugging information left in. You can have completely arbitrary data there; often it would be various resources (graphics, music, etc.).
In Windows, the high memory of every process (starting at 0x80000000 or 0xc0000000) is reserved for kernel code. User code cannot access these regions of memory; if it tries to, an access violation exception will be thrown.
I wish to know how the kernel space is protected.
Is it via memory segmentation or via paging?
I would like to hear a technical explanation.
Thanks a lot,
Michael.
Assuming you are talking about x86 and x64 architectures.
Memory protection is achieved using the paging system. Each page table entry on an x86/x64 CPU has a bit to indicate whether it is a user or supervisor page. Accesses to supervisor pages are only permitted for code running with CPL<3, whereas accesses to non supervisor pages are possible regardless of CPL.
CPL is the "Current Privilege Level" which is sometimes referred to as Ring. Windows only uses two rings, although the CPU implements 4. Ring 0 is the CPU mode in which what Windows refers to as "kernel mode" runs. Ring 3 is the CPU mode in which "User mode" runs. Since code running at CPL=3 cannot access supervisor pages, this is how memory protection is implemented.
The answer for ARM is likely to be similar in principle, though the details differ.
That's an easy one and doesn't require talking about rings and kernel behavior. Accessing virtual memory at a particular address requires that address to be mapped: the operating system has to allocate a memory page for that address. The low-level winapi function that does that is VirtualAlloc(), which takes an optional address as its first argument. The OS will simply fail a request for an unmappable address. It is the exact same mechanism that prevents you from mapping any address in the lowest 64 KB of the address space.