Can a TLB hit lead to page fault in memory? - memory-management

In UC Berkley Video lectures on OS by John Kubiatowicz (Prof. Kuby) available on web, he mentioned that TLB hit doesn't mean that corresponding page is in main memory. Page fault can still occur.
Technically TLBs are cache for page table entry and since all page table entries don't have their corresponding page available in main memory. Same can be true for TLBs. A TLB hit may lead to page fault.
But according to algorithms given in text books I am unable to find such a case. On a TLB miss kernel refer to page tables and update the TLB cache for appropriate address translation. Next TLB hit can't lead to page fault. When kernel swap out the page, it updates the appropriate bits for that page table entry and invalidate the corresponding TLB, so there can't be a TLB hit next time until page is loaded in main memory.
So can someone stand for correctness of Prof kuby's claim and point out a case when instead of TLB hit (the translated physical address for corresponding virtual address in found in TLB), a page fault can occur?

One example is if the memory access is different from the allowed one.
e.g. you want to write to memory that's write protected. A TLB exists, it's a hit and the address is translated. But on access you get a trap, as you're trying to write to memory that's read-only

A page fault doesnt mean a missing page in the memory. A page can still be present and be dirty. This is also a page fault.
On a general note, the page fault refers to the scenario where the obtained translation cannot be effectively used.
It may be a missing page or a dirty page or access permission mismatch.
So a TLB hit can still lead to a page fault.

patterson says:"cannot have a translation in TLB if page is not present in memory" [computer organization and design,4th ed revised, page 507]

Related

What is in the PTE address field for an anonymously zero-fill-on-demand mapped page?

When a program calls mmap to allocate an anonymous page, also known as a demand-zero page, what appears in the address field of the corresponding page table entry (PTE)? I am assuming that the kernel does not create a zero-initialized page in physical memory (and enter that physical page's page number into the PTE) until the requesting process actually touches the page — hence the term demand-zero. Since it would not be a disk address, and would not be 0 (which is for unallocated pages), what value would appear there? As a different but related question, how does the kernel "know" that this page is to be handled as a demand-zero page, i.e., that the fault handler should find a physical page and initialize it with 0 rather than copy a page from disk?
I am assuming that the kernel does not create a zero-initialized page in physical memory
Indeed, this is usually the case. Unless special cases, like for example if MAP_POPULATE is specified to explicitly request the page to be initialized (also called "pre-fauting").
what appears in the address field of the corresponding page table entry (PTE)?
Right after mmap you don't even have a PTE allocated for the page (or in general, you don't have any entry at any page table level). For what the CPU is concerned, the page doesn't even exist. If you were to walk the page table you would just get to a point (at an arbitrary level) where the corresponding entry is marked as "not present".
Since it would not be a disk address, and would not be 0 (which is for unallocated pages), what value would appear there?
For what the CPU is concerned, the page is unallocated. At the first page fault, two things can happen:
For a read page fault, the PTE is updated to point to the zero page: this is a special page that is always entirely zeroed-out and is pointed to by the PTEs of any anonymous (demand-zero) page in the system that has not been modified yet.
For a write page fault, an actual physical page will be allocated and the corresponding PTE updated to point to its physical address.
Quoting directly from the documentation:
The anonymous memory or anonymous mappings represent memory that is not backed by a filesystem. Such mappings are implicitly created for program’s stack and heap or by explicit calls to mmap(2) system call. Usually, the anonymous mappings only define virtual memory areas that the program is allowed to access. The read accesses will result in creation of a page table entry that references a special physical page filled with zeroes. When the program performs a write, a regular physical page will be allocated to hold the written data. The page will be marked dirty and if the kernel decides to repurpose it, the dirty page will be swapped out.
how does the kernel "know" that this page is to be handled as a demand-zero page, i.e., that the fault handler should find a physical page and initialize it with 0 rather than copy a page from disk?
When a page fault occurs, the kernel page fault handler (architecture-dependent) determines to which VMA the page belongs to, and retrieves the corresponding struct vm_area_struct (which was created earlier either by the kernel itself or by a mmap syscall). This structure is then passed on to architecture-independent code (do_fault()) along with the needed fault information (struct vm_fault).
The vm_area_struct then contains all the remaining necessary information to handle the fault (for example the ->vm_file field which is != NULL in case of a file-backed mapping). The field ->vm_ops points to a struct vm_operations_struct which defines a set of function pointers to call in different occasions. In particular anonymous VMAs have ->vm_ops == NULL.
For other kind of pages, ->fault() is the function used when handling a page fault. This function knows what to check and how to actually handle the fault.
B & O also describe the VMA, but do not explain how the kernel could use the VMA to distinguish between, say, an unallocated page and an allocated page to be created and zero-initialized.
Simple, just check vma->vm_ops == NULL and in such case you know that the page is a demand-zero anon page. Then on a page fault act as needed (read fault -> update PTE to point to global zero page, write fault -> allocate a page and update PTE).

How do modern OS kernels (UNIX, Windows) distinguish between page faults? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I'm trying to understand how page faults are handled by the OS kernel. The wikipedia article at https://en.wikipedia.org/wiki/Page_fault distinguishes between Minor, Major and Invalid page faults.
A major page fault is one where the virtual->real address mapping is not yet present in main memory, but the real address is present on disk, and for this page fault exception, the exception handler searches the disk and brings the page frame to main memory and does the virtual->real address mapping.
An invalid page fault is when an application tries to access an unmapped address, for example, a rogue pointer. The same page fault exception is raised, but the exception handler now decides to terminate the program, mostly with a Seg Fault (core dumped) error.
My question is, how does the kernel distinguish between these two types of page faults? I'd like the answer to go into a bit of depth about this, and hopefully link me to more elaborate articles if possible. Please ask me for any clarifications!
Thanks.
Grossly speaking, the kernel has some representation of the virtual address space of the (current) process. It knows for each segment of pages how to handle page faults for it. It works in physical addresses (so its address space is not the user-mode address space), but maintain some complex data structures to efficiently represent the mapping between virtual and physical addresses (if any) and configure the MMU according to these.
See for example Gorman's book understanding the Linux virtual memory manager (some details are probably outdated).
Read also about GNU Hurd external pager mechanism.
A page fault is given the relevant (physical and/or virtual) addresses at fault (e.g. by MMU hardware). See the paging web page of osdev, and read about page tables. The kernel handles all page faults (it gets the same hardware exception for every page faults, with data describing the fault - including faulting virtual address) and determine what kind of fault it is.
On Linux you could even handle (in a non-portable, ABI & processor specific manner) the SIGSEGV signal. (Hence the kernel has gathered all the information it is able to give to your SIGSEGV handler. But read carefully signal(7)). But it is usually not worth the pain.
Look also inside the mm/ subtree of the Linux kernel source.
Read also the extensive documentation of Intel processors. Perhaps read some books on processor architecture and on operating systems, and study simpler architectures (like MMIX or RISC-V).
See Operating Systems : Three Easy Pieces notably its introduction to paging.
I would ignore the model in the Wikipedia article. An invalid page fault, is not a page fault at all but rather a failure of logical memory translation.
The concept of a major and minor page fault, IMHO is confusing. In fact, the Wikipedia article describes two different things as being a minor page fault. I even wonder if something different was intended than how the text reads.
I would rethink as this:
A process accesses a memory address.
The memory management unit attempts to translate the referenced LOGICAL PAGE to a PHYSICAL PAGE FRAME using the page tables.
If no such translation is possible (no corresponding table table entry, page table entry is marked as invalid), an access violation fault exception of some kind is generated (Invalid Page Fault in the Wiki article).
If there is already a direct mapping between the logical page and the physical page frame, we're all done.
If the page table indicates there is no physical page frame corresponding to the logical page at the moment, the CPU triggers a page fault exception.
The page fault handler executes.
The page fault handler has to find where the logical (now a virtual) page is stored.
During this process, the page fault handler may find that the page is sitting in physical memory already. There are a number of ways in which this can occur. Be this the case, all the page fault handler has to do is update the page table to reference the physical page frame and restart the instruction (This is one of the circumstances the wiki article calls "minor page fault"). All done.
The other alternative is that the virtual page is stored on disk in a page file, executable file, or shared file. In that case, the handler needs to allocate a physical page frame, read the virtual page from disk into the page frame, update the page table, then restart the instruction (what the wiki calls a "major page fault"). Because of the disk read, the "major" fault takes much longer to handle than the "minor" fault.
One of the functions of the operating system is to keep track of where all the virtual pages are stored. The specific mechanism used to find the page will depend upon a nmber of factors.

What's purpose of having valid-invalid bits is page table?

I was reading Operating System Concepts , I'm unable to understand use of valid-invalid bits in page table. Each process has it's own process table, shouldn't all entries be valid then ?
Valid-invalid bit attached to each entry in the page table:
“valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page
“invalid” indicates that the page is not in the process’ logical address space
In demand paging, only the pages that are required currently are brought into the main memory.
Assume that a process has 5 pages: A, B, C, D, E and A and B are in the memory. With the help of valid-invalid bit, the system can know, when required, that pages C, D and E are not in the memory.
In short:
a 1 in valid-invalid bit signifies that the page is in memory and 0 signifies that the page may be invalid or haven't brought into the memory just yet.
If an entry is invalid, then the MMU won't use it for address translation, causing a page fault when accessing the corresponding memory area.
Because the entry isn't used by the MMU the operating system can use it to store its own information, like for example a reference to the filesystem entity (for example inode number) where it stored the data to free the main memory for some other processes (it swapped that page out)
Upon a page fault the operating system can react then, using this information it previously stored inside that entry, to get back that data from the disk into the main memory.
Of course, the invalid bit is also used to mark just as it says pages as invalid: In most systems in usea process needs to explicitly request memory from the operating system, accessing memory that hasn't been granted to that process is an access violation.
Valid indicates that the associated page is in the logical address space.
Invalid indicates that the associated page is not in logical address space.
Each process will have a page table.
Each page contains frame_number field, valid/invalid bit and other info.
Let's assume there is no valid/invalid bit entry in the page table.
Now CPU generated a logical address, MMU will translate to physical address.
How?
Hardware will be triggered and value of frame_number field of respective page number slot will be considered for translation. Whether it is garbage value, valid value or zero whatsoever it will be taken into consideration for translation. It might violates protection if the value is zero or any garbage value.
We don't want that too happen so we need special field that indicates the validity.
You may want delete the entry in page table instead of having a special field but that will lead lot of chaos. Now you have include page number field also to resolve inconsistencies in the absence of valid or invalid bit.

TLB Hit - Checking if the page is within the process's memory space

I have been reading about the translation of virtual addresses to physical addresses. I understand that the TLB is a hardware cache that resides in the CPU's Memory Management Unit and contains mappings of recent pages accessed.
However, say there is a TLB hit - How does the OS ensure that the page can actually be accessed by the process (is within the process's allocated address space)?
I believe that one way to do that would be to check with the process's page table, but that seems to defeat the whole purpose of using a TLB. Any insights ?
It depends upon the memory management strategy that the OS is using. For examples, in case of the OS using the inverted paging table, each entry in the page table contains the id of the process (PID) that are owning the page.
For the "normal" paging, each paging entry may contain extra bits for memory protection and sharing.
At a basic level the TLB only contains pages that are in ram, and the os clears the TLB whenever a page is removed from ram.

Page fault and dirty pages

I have started reading about CPU caches and I have two questions:
Lets say the CPU receives a page fault and transfers control to the kernel handler. The handler decides to evict a frame in memory which is marked dirty. Lets say the CPU caches are write back with valid and modified bits. Now, the memory content of this frame are stale and the cache contains the latest data. How does the kernel force the caches to flush?
The way the page table entry (PTE) gets marked as dirty is as follows: The TLB has a modify bit which is set when the CPU modifies the page's content. This bit is copied back to the PTE on context switch. If we get a page fault, the PTE might be non-dirty but the TLB entry might have the modified bit set (it has not been copied back yet). How is this situation resolved?
As for flushing cache, that's just a privileged instruction. The OS calls the instruction and the hardware begins flushing. There's one instruction for invalidating all values and signaling an immediate flush without write back, and there's another instruction that tells the hardware to write data back before flushing. After the instruction call, the hardware (cache controller and I/O) takes over. There are also privileged instructions that tell the hardware to flush the TLB.
I'm not certain about your second question because it's been a while since I've taken an operating systems course, but my understanding is that in the event of a page fault the page will first be brought into the page table. Any page that is removed depends on available space as well as the page replacement algorithm used. Before that page can be brought in, if the page that it is replacing has the modified bit set it must be written out first so an IO is queued up. If it's not modified, then the page is immediately replaced. Same process for the TLB. If the modified bit is set then before that page is replaced you must write it back out so an IO is queued up and you just have to wait.

Resources