why virtual memory copy on write need to be backed by disk page - memory-management

reading the copy on write about window's memory management, it is saying that system will find a free page in the RAM for the shared memory ( be backed immediately by disk page ).
why it is necessary to back the RAM page with disk page ? it is not swapped out, it is just created ?
I remember the RAM page only get swapped when there is not enough RAM page

The system needs the guarantee that when a write happens, space will be available. You can't fail an allocation now if the system will run out of diskspace later.
That doesn't mean the disk s written to; the page reservation is merely bookkeeping.

Related

A Process accessing memory outside of allocated region

Assume a process is allocated a certain region of virtual memory.
How will the processor react if the process happens to access a memory region outside this allocation region?
Does the processor kill the process? Or does it raise a Fault?
Thank you in advance.
Processes are not really allocated a certain region of virtual memory. They are allocated physical frames that they can access using virtual memory. Processes have virtual access to all virtual memory available.
When a high level language is compiled, it is placed in an executable. This executable is a file format which specifies several things among which is the virtual memory in use by the program. When the OS launches that executable, it will allocate certain physical pages to the newly created process. These pages contain the actual code. The OS needs to set up the page tables so that the virtual addresses that the process uses are translated to the right position in memory (the right physical addresses).
When a process attempts to jump nowhere at a virtual address it shouldn't jump to, several things can happen. It is undefined behavior.
As stated on osdev.org (https://wiki.osdev.org/Paging):
A page fault exception is caused when a process is seeking to access an area of virtual memory that is not mapped to any physical memory, when a write is attempted on a read-only page, when accessing a PTE or PDE with the reserved bit or when permissions are inadequate.
The CPU pushes an error code on the stack before firing a page fault exception. The error code must be analyzed by the exception handler to determine how to handle the exception. The bottom 3 bits of the exception code are the only ones used, bits 3-31 are reserved.
It really depends on the language you used and several factors come into play. For example, in assembly, if you try to jump in RAM to a random virtual address. Several things can happen.
If you jump into an allocated page, then the page could contain anything. It could as well contain zeroes. If it contain zeroes, then the process will keep executing the instructions until it reaches a page which isn't present in RAM and trigger a page fault. Or it could as well just end up executing a jmp to somewhere else in RAM and in the end trigger page fault.
If you jump into a page which has the present bit not set (unallocated page), then the CPU will trigger a page fault immediately. Since the page is not allocated, it will not magically become allocated. The OS needs to take action. If the page was supposed to be accessed by the process then maybe it was swapped to the hard disk and the OS needs to swap it back in RAM. If it wasn't supposed to be accessed (like in this case), the OS needs to kill the process (and it does). The OS knows the process should not access a page by looking at its memory map for that process. It should not just blindly allocate a page to a process which jumps nowhere. If the process needs more memory during execution it can ask the OS properly using system calls.
If you jump to a virtual address which, once translated by the MMU using the page tables, lands in RAM in kernel mode code (supervisor code), the CPU will trigger a page fault with supervisor and present error codes (1 0 1).
The OS uses 2 levels of permission (0 and 3). Thus all user mode processes run with permission 3. Nothing prevents one user process from accessing the memory and the code of another process except the way the page tables are set up. The page tables are often not filled up completely. If you jump to a random virtual address, anything can happen. The virtual address can be translated to anything.

What part of the RAM is used by the system file cache in Windows?

According to general notions about the page cache and this answer the system file cache essentially uses all the RAM not used by any other process. This is, as far as I know, the case for the page cache in Linux.
Since the notion of "free RAM" is a bit blurry in Windows, my question is, what part of the RAM does the system file cache use? For example, is the same as "Available RAM" in the task manager?
Yes, the RAM used by the file cache is essentially the RAM displayed as available in the Task Manager. But not exactly. I'll go into details and explain how to measure it more precisely.
The file cache is not a process listed in the list of processes in the Task Manager. However, since Vista, its memory is managed like a process. Thus I'll explain a bit of memory management for processes, the file cache being a special case.
In Windows, the RAM used by a process has essentially two states: "Active" and "Standby":
"Active" RAM is displayed in the Task Manager and resource monitor as "In Use". It is also the RAM displayed for each process in the Task Manager.
"Standby" RAM is visible in the Resource monitor globally and for each process with RAMMap.
"Standby" + "Free" RAM is what is called "Available" in the task manager. "Free" RAM tends to be near 0 in Windows but you can meaningfully consider Standby RAM is free as well.
Standby RAM is considered as "not used for a while by the process". It is the part of the RAM that will be used to give new memory to processes needing it. But it still belongs to the process and could be used directly if the owning process suddenly access it (which is considered as unlikely by the system).
Thus the file cache has "Active" RAM and "Standby" RAM. "Active" RAM is somehow the cache for data recently accessed. "Standby" RAM is the cache for data accessed a while ago. The "Active" RAM of the file cache is usually relatively small. The Standby RAM of the file cache is most often all the RAM of your computer: Total RAM - Active RAM of all processes. Indeed, other processes rarely have Standby RAM because it tends to go to the file cache if you do disk I/O quite a bit.
This is the info displayed by RAMMap for a busy server doing a lot of I/O and computation:
The file cache is the second row called "Mapped file". See that most of the 32 GB is either in the Active part of other processes, or in the Standby part of the file cache.
So finally, yes, the RAM used by the file cache is essentially the RAM displayed as available in the Task Manager. If you want to measure with more certainty, you can use RAMMap.
Your answer is not entirely true.
The file cache, also called the system cache, describes a range of virtual addresses, it has a physical working set that is tracked by MmSystemCacheWs, and that working set is a subset of all the mapped file physical pages on the system.
The system cache is a range of virtual addresses, hence PTEs, that point to mapped file pages. The mapped file pages are brought in by a process creating a mapping or brought in by the system cache manager in response to a file read.
Existing pages that are needed by the file cache in response to a read become part of the system working set. If a page in a mapped file is not present then it is paged in and it becomes part of the system working set. When a page is in more than one working set (i.e. system and a process or process and another process), it is considered to be in a shared working set on programs like VMMap.
The actual mapped file pages themselves are controlled by a section object, one per file, a data control area (for the file) and subsection objects for the file, and a segment object for the file with prototype PTEs for the file. These get created the first time a process creates a mapping object for the file, or the first time the system cache manager creates the mapping object (section object) for the file due to it needing to access the file in response to a file IO operation performed by a process.
When the system cache manager needs to read from the file, it maps 256KiB views of the file at a time, and keeps track of the view in a VACB object. A process maps a variable view of a file, typically the size of the whole file, and keeps track of this view in the process VAD. The act of mapping the view is simply filling in PTEs to point to physical pages that contain the file that are already resident by looking at the prototype PTE for that range in the file and seeing what it contains, and in the event that the prototype PTE does not point to a physical page, initialising the PTE to point to the prototype PTE instead of the page it points to, and the PTE is left invalid, and this fault will be resolved on demand on a page by page basis when the read from the view is actually performed.
The VACBs keep track of the 256KiB views of files that the cache manager has opened and the virtual address range of that view, which describes the range of 64 PTEs that service that range of virtual addresses. There is no virtual external fragmentation or page table external fragmentation as all views are the same size, and there is no physical external fragmentation, because all pages in the view are 4KiB. 256KiB is the size chosen because if it were smaller, there would be too many VACB objects (64 times as many, taking up space), and if it were larger, there would effectively be a lot of internal fragmentation from reads and hence large virtual address pollution, and also, the VACB uses the lower bits of the virtual address to store the number of I/O operations that are currently being performed on that range, so the VACB size would have to be increased by a few bits or it would be able to handle fewer concurrent I/O operations.
If the view were the whole size of the file, there would quickly be a lot of virtual address pollution, because it would be mapping in the whole of every file that is read, and file mappings are supposed to be for user processes which knowingly map a whole file view into its virtual address space, expecting the whole of the file to be accessed. There would also be a lot of virtual external fragmentation, because the views wouldn't be the same size.
As for executable images, they are mapped in separately with separate prototype PTEs and separate physical pages, separate control area, separate segment and subsection object to the data file map for the file. The process maps the image in, but the kernel also maps images for ntoskrnl.exe, hal.dll in large pages, and then driver images are on the system PTE working set.

Is Kernel Virtual Memory pages are swappable

Like each user level process has its own Virtual memory space whose pages are swapped out/in, does Linux Kernel's Virtual memory pages are swappable ?
Kernel space pages don't get page-{in,out} by design and are pinned to memory. The pages in the kernel can usually be trusted from a security point of view, while the user space pages should NOT be trusted.
For this reason you don't have to worry about accessing kernel buffers directly in your code. While its not the same the user space buffers, without worrying about handling page faults.
Kernel space pages cannot page-out by design, as you may want to consider what would your application do when the page containing the instructions for handling a page fault gets page-out!
No, kernel memory is not swapped on Linux.

TLB Hit - Checking if the page is within the process's memory space

I have been reading about the translation of virtual addresses to physical addresses. I understand that the TLB is a hardware cache that resides in the CPU's Memory Management Unit and contains mappings of recent pages accessed.
However, say there is a TLB hit - How does the OS ensure that the page can actually be accessed by the process (is within the process's allocated address space)?
I believe that one way to do that would be to check with the process's page table, but that seems to defeat the whole purpose of using a TLB. Any insights ?
It depends upon the memory management strategy that the OS is using. For examples, in case of the OS using the inverted paging table, each entry in the page table contains the id of the process (PID) that are owning the page.
For the "normal" paging, each paging entry may contain extra bits for memory protection and sharing.
At a basic level the TLB only contains pages that are in ram, and the os clears the TLB whenever a page is removed from ram.

Why there is no SIGSEGV signal on copy on write?

The copy-on-write article on wikipedia says that copy-on-write is usually implemented by giving read only access to the pages, so that when one is written, the page fault trap handler can map a unique physical memory page for it. So my question is why a user-level application doesn't receive a SIGSEGV signal when such page fault happens? Afterall, the wikipedia article on SIGSEGV says that SIGSEGV is the signal sent to a process when it makes an invalid memory reference, or segmentation fault. So in this case, that is on copy-on-write case, why no SIGSEGV is sent to the process.
I know it's been a while since this was asked, but I wanted to expand on Alexey's answer a bit.
Copy-on-write (I assume you're talking about virtual memory and not filesystems) usually works like so:
The OS knows which pages need to be copied on write. (They are the pages which are private to a process.) These pages are marked in hardware as read-only. However, the virtual memory map of the process has the pages marked as readable and writable. This means that the user process believes it has full access to the pages in question.
When a user process attempts to write to one of these pages, a page fault is generated because the processor recognizes that the page is read-only (based on the hardware marks before). Page faults are sort of like segfaults, but for the kernel instead of for user processes.
This triggers the page fault handler to run within the kernel, which looks at the page in question and sees that it's a private page which has not yet been copied. The handler will create a copy of the page and mark the copy as writable.
Then the handler will replace the old page's address with the new one in the virtual-to-physical translation table and exit.
The last instruction will be retried by the user process at this point, and this time the write will succeed because the new page is writeable at both the virtual memory map (the user process' view of memory permissions) and hardware (the kernel's view of memory permissions) levels.
A page fault is generated every time a segmentation fault occurs, but most page faults are handled by the kernel and are never passed to the process that caused them as segfaults. There are many reasons why a page fault might be handled at a lower level, including:
The page which was accessed was paged out to disk because it hadn't been used in a long time. The OS must bring it back into memory so the process can use it again.
The process is accessing a newly-allocated page for the first time, and the actual physical page hasn't been allocated yet. The OS must allocate a page and then insert it into the virtual-to-physical translation table before the memory can actually be used.
The OS is playing a hardware page access permissions trick to allow it to watch for accesses to a particular page. This is what happens in copy-on-write, but it can have other uses as well. Consider an OS-level virtualization technology like kvm, where writing to a memory-mapped device's location in memory in the guest OS should actually write to a file or the display in the host OS.
The main idea of COW is that COW is completely transparent to the user process as if it fully owned the memory without any sharing.

Resources