What is a "pagecache page"? - linux-kernel

/*
* Each physical page in the system has a struct page associated with
* it to keep track of whatever it is we are using the page for at the
* moment. Note that we have no way to track which tasks are using
* a page, though if it is a pagecache page, rmap structures can tell us
* who is mapping it.
*/
include/linux/mm_types.h
Please let me know what "pagecache page" means here.
Thanks!

The pagecache is - as the name suggests - a cache of physical pages.
http://www.moses.uklinux.net/patches/lki-4.html
It is used to store filesystem cache (disk cache).
Mmapped files seem to use the pagecache too.
A "pagecache page" is simply a page that belongs to the pagecache.

Related

Interaction of driver-supplied vm_operations_struct.fault method with page cache

https://manybutfinite.com/post/page-cache-the-affair-between-memory-and-files/ says "all regular file I/O happens through the page cache" (including for mmapped files). Great!
However, device drivers have the authority to choose the physical page returned by a page fault to an mmapped VMA, by installing a .fault method to the VMA's vm_operations_struct (example).
Let's say a file backed by such a driver is mmapped by a userspace app, and an access to the mapped VA page faults. What physical page ends up in the page cache representing that section of the file? Does the kernel accept the physical page returned by the driver's .fault method as the "kernel-official" page cache entry representing that section of the file? Or does the kernel take the page returned by the driver's .fault, copy it to some other kernel-managed page, and set that other page as the "official" page cache entry? Or does something else happen?
Thanks in advance for your help!
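To make the setup concrete, here is a hedged sketch (my_fault and my_vm_ops are hypothetical names) of the mechanism the question refers to: a driver-supplied .fault handler that picks the physical page itself and hands it to the fault path via vmf->page. On recent kernels the handler returns vm_fault_t; older kernels use int.

#include <linux/mm.h>
#include <linux/highmem.h>

static vm_fault_t my_fault(struct vm_fault *vmf)
{
    struct page *page = alloc_page(GFP_KERNEL);  /* the driver picks the page */

    if (!page)
        return VM_FAULT_OOM;
    clear_highpage(page);     /* hand back a zeroed page, reference held */
    vmf->page = page;         /* the fault path maps this page into the VMA */
    return 0;
}

static const struct vm_operations_struct my_vm_ops = {
    .fault = my_fault,
};

/* In the driver's ->mmap(): vma->vm_ops = &my_vm_ops; */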

What is in the PTE address field for an anonymously zero-fill-on-demand mapped page?

When a program calls mmap to allocate an anonymous page, also known as a demand-zero page, what appears in the address field of the corresponding page table entry (PTE)? I am assuming that the kernel does not create a zero-initialized page in physical memory (and enter that physical page's page number into the PTE) until the requesting process actually touches the page — hence the term demand-zero. Since it would not be a disk address, and would not be 0 (which is for unallocated pages), what value would appear there? As a different but related question, how does the kernel "know" that this page is to be handled as a demand-zero page, i.e., that the fault handler should find a physical page and initialize it with 0 rather than copy a page from disk?
I am assuming that the kernel does not create a zero-initialized page in physical memory
Indeed, this is usually the case, barring special cases: for example, if MAP_POPULATE is specified to explicitly request that the pages be populated up front (also called "pre-faulting").
what appears in the address field of the corresponding page table entry (PTE)?
Right after mmap you don't even have a PTE allocated for the page (or in general, you don't have an entry at any page table level). As far as the CPU is concerned, the page doesn't even exist. If you were to walk the page table, you would just get to a point (at an arbitrary level) where the corresponding entry is marked as "not present".
Since it would not be a disk address, and would not be 0 (which is for unallocated pages), what value would appear there?
As far as the CPU is concerned, the page is unallocated. At the first page fault, one of two things happens:
For a read page fault, the PTE is updated to point to the zero page: this is a special page that is always entirely zeroed-out and is pointed to by the PTEs of any anonymous (demand-zero) page in the system that has not been modified yet.
For a write page fault, an actual physical page will be allocated and the corresponding PTE updated to point to its physical address.
Quoting directly from the documentation:
The anonymous memory or anonymous mappings represent memory that is not backed by a filesystem. Such mappings are implicitly created for program’s stack and heap or by explicit calls to mmap(2) system call. Usually, the anonymous mappings only define virtual memory areas that the program is allowed to access. The read accesses will result in creation of a page table entry that references a special physical page filled with zeroes. When the program performs a write, a regular physical page will be allocated to hold the written data. The page will be marked dirty and if the kernel decides to repurpose it, the dirty page will be swapped out.
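A minimal userspace sketch of the behaviour described above (my own illustration): the mapping is created instantly, the first read is serviced from the shared zero page, and the first write allocates a real private page.

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* anonymous, demand-zero mapping: no PTE exists yet */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    printf("%d\n", p[0]);   /* read fault: PTE now points at the zero page */
    p[0] = 42;              /* write fault: a private physical page is allocated */
    printf("%d\n", p[0]);
    return 0;
}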
how does the kernel "know" that this page is to be handled as a demand-zero page, i.e., that the fault handler should find a physical page and initialize it with 0 rather than copy a page from disk?
When a page fault occurs, the kernel page fault handler (architecture-dependent) determines which VMA the faulting address belongs to and retrieves the corresponding struct vm_area_struct (which was created earlier, either by the kernel itself or by an mmap syscall). This structure is then passed on to architecture-independent code (do_fault()) along with the needed fault information (struct vm_fault).
The vm_area_struct contains all the remaining information needed to handle the fault (for example the ->vm_file field, which is != NULL in the case of a file-backed mapping). The ->vm_ops field points to a struct vm_operations_struct, which defines a set of function pointers to call on different occasions. In particular, anonymous VMAs have ->vm_ops == NULL.
For other kinds of pages, ->fault() is the function used when handling a page fault. This function knows what to check and how to actually handle the fault.
B & O also describe the VMA, but do not explain how the kernel could use the VMA to distinguish between, say, an unallocated page and an allocated page to be created and zero-initialized.
Simple: just check vma->vm_ops == NULL, and in that case you know the page is a demand-zero anonymous page. Then, on a page fault, act as needed (read fault -> update the PTE to point to the global zero page; write fault -> allocate a page and update the PTE).
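As a hedged sketch (simplified, not the exact kernel code) of that check and of what the fault path then does:

#include <linux/mm.h>

/* Simplified: the kernel's vma_is_anonymous() is essentially this test. */
static bool is_demand_zero(struct vm_area_struct *vma)
{
    return vma->vm_ops == NULL;
}

/*
 * Fault path, roughly:
 *   if (is_demand_zero(vma))
 *       read fault  -> point the PTE at the global zero page;
 *       write fault -> allocate a zeroed page and point the PTE at it;
 *   else
 *       call vma->vm_ops->fault() for the file-backed case.
 */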

How a virtual address is mapped to an address on the swap partition during paging

I'm wondering if anyone could help me understand how a virtual address is mapped to its address on the backing store, which is used to hold moved-out pages of all user processes.
Is it a static mapping or a hash algorithm? If it's static, where is such a mapping kept? It seems it can't be in the TLB or page table, since according to https://en.wikipedia.org/wiki/Page_table, the PTE will be removed from both the TLB and the page table when a page is moved out. A description of the algorithm and the C structs containing such info would be helpful.
Whether it's a static mapping or a hash algorithm, how do you guarantee that no two processes will map their addresses to the same location on the swap partition, given that the virtual address space of each process is so big (2^64) and the swap space is so small?
So:
during page-in, how does the OS know where to find the address (corresponding to the virtual address accessed by the user process) on the swap partition to move in?
when a physical page needs to be paged out, how does the OS know where to put it on the swap partition?
For the first part of your question: it is actually hardware dependent, but the generic way is to keep a reference to the swap block containing the swapped-out page (depending on the implementation of the swap subsystem, it could be a pointer, a block number, or an offset into a table) in its corresponding page table entry.
EDIT: The TLB is a fast associative cache that helps do the virtual-to-physical page mapping very quickly. When a page is swapped out, its entry in the TLB may be replaced by a newly active page. But the entry in the page table cannot be replaced, because page tables are not associative memory. A page table remains in memory for the entire duration of the process, and no entry can be removed or replaced (by another virtual page). Entries in page tables can only be mapped or unmapped. When they are unmapped (because of swapping or freeing), the content of the entry can either hold a reference to the swap block or just an invalid value.
For the second part of your question : The system kernel maintains a list of free blocks in the swap partition. Whenever it needs to evict a RAM page, it allocates a free block and then the block reference is returned so that it can be inserted in the PTE. When the page comes back to RAM, the disk block is freed so that it could be used by other pages.
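On Linux specifically, a hedged sketch of what that looks like (based on the swp_entry_t helpers from include/linux/swapops.h): the non-present PTE itself is rewritten to encode which swap area the page went to and the slot within it.

#include <linux/mm.h>
#include <linux/swapops.h>

/* Sketch: a swapped-out page leaves behind a non-present, non-empty PTE
 * holding a swp_entry_t: a swap area ("type") plus an offset into it. */
static void report_swapped_pte(pte_t pte)
{
    if (!pte_present(pte) && !pte_none(pte)) {
        swp_entry_t entry = pte_to_swp_entry(pte);

        pr_info("swap area %u, slot %lu\n",
                swp_type(entry), (unsigned long)swp_offset(entry));
    }
}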
During page-in, how does the OS know where to find the page (corresponding to the virtual address accessed by the user process) on the swap device to move in?
That can actually be a fairly complicated process. The operating system has to maintain a table of where the process's pages are mapped. This can be complicated because pages can be mapped to multiple devices, and even to multiple files on the same device. Some systems use the executable file itself for paging.
when a physical page needs to be paged out, after the virtual address for a physical page is looked up in the TLB, how does the OS know where to put it on the swap device?
On a rationally designed operating system, the secondary storage is allocated (or determined) when the virtual page is mapped to the process. That location remains fixed for the duration of the program being run.

Page translation of process code section in Linux. Why does the Page Table Entry get 0 for some pages?

For some reason, I need to translate the virtual address of the code section to physical address. I did following experiment:
I get the virtual address from the start_code and end_code in mm_struct of process A, which are the initial address and final address of the executable code.
I get the CR3 of process A.
I translate the virtual addresses to physical addresses page by page. For example, if there are 10 pages in the code section of process A, I translate the 10 virtual addresses at the beginning of each page.
I found that some pages get a Page Table Entry (PTE) == 0, while other pages translate successfully to a physical address.
I tried Firefox and Minicom as my process, and both of them run into this situation.
I guess my question is: could anyone explain to me why PTE == 0? Does it mean these pages have been swapped out to disk? If this is the case, how can I find these pages?
Thanks for any input!!
It looks as if you are trying to perform page table introspection without using the kernel APIs for it. Note that the address space is arranged as a red-black tree of vm_area_struct structs, and you should probably use the APIs that traverse them. The mappings might change at any time, so using the proper locking for these data structures is necessary.
For example, see the get_user_pages() function. It can be used to swap in and temporarily pin the pages into memory. Using this function for page table introspection is usually called for because, even once you have the physical address in hand, the kernel can swap the page out at any time unless it is pinned.
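A hedged sketch of that approach. Note that the exact get_user_pages() signature has changed across kernel versions (newer kernels dropped the vmas argument); this roughly follows recent ones, and show_phys() is a name of my own.

#include <linux/mm.h>
#include <linux/sched.h>
#include <asm/io.h>

/* Sketch: pin one page of the current process and print its physical
 * address.  get_user_pages() must be called with the mmap lock held for
 * read, and the page stays pinned until put_page(). */
static int show_phys(unsigned long uaddr)
{
    struct page *page;
    long got;

    mmap_read_lock(current->mm);
    got = get_user_pages(uaddr & PAGE_MASK, 1, FOLL_WRITE, &page);
    mmap_read_unlock(current->mm);
    if (got != 1)
        return -EFAULT;

    pr_info("virt %#lx -> phys %#llx\n",
            uaddr, (unsigned long long)page_to_phys(page));

    put_page(page);
    return 0;
}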

Can a TLB hit lead to page fault in memory?

In the UC Berkeley video lectures on OS by John Kubiatowicz (Prof. Kuby), available on the web, he mentions that a TLB hit doesn't mean that the corresponding page is in main memory; a page fault can still occur.
Technically, TLBs are a cache for page table entries, and since not all page table entries have their corresponding page available in main memory, the same can be true for TLB entries: a TLB hit may lead to a page fault.
But according to the algorithms given in textbooks, I am unable to find such a case. On a TLB miss, the kernel refers to the page tables and updates the TLB cache with the appropriate address translation, so the next TLB hit can't lead to a page fault. When the kernel swaps out the page, it updates the appropriate bits in that page table entry and invalidates the corresponding TLB entry, so there can't be a TLB hit next time until the page is loaded back into main memory.
So can someone vouch for the correctness of Prof. Kuby's claim and point out a case where, despite a TLB hit (the translated physical address for the corresponding virtual address is found in the TLB), a page fault can occur?
One example is if the memory access is different from the allowed one.
e.g. you want to write to memory that's write-protected. A TLB entry exists, it's a hit, and the address is translated. But on the access you get a trap, as you're trying to write to memory that's read-only.
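A minimal userspace sketch of that scenario (my own illustration, not from the answer): the read-only page has a perfectly valid translation, yet the write traps.

#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static void on_segv(int sig)
{
    static const char msg[] = "trapped: write to read-only page\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    _exit(0);
}

int main(void)
{
    char *p;
    volatile char c;

    signal(SIGSEGV, on_segv);
    p = mmap(NULL, 4096, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    c = p[0];   /* read works: the translation is valid                  */
    (void)c;
    p[0] = 1;   /* write traps even though the translation is in the TLB */
    return 0;   /* not reached */
}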
A page fault doesn't necessarily mean a missing page in memory. A page can still be present and be dirty; this is also a page fault.
On a general note, a page fault refers to the scenario where the obtained translation cannot be effectively used.
It may be a missing page, a dirty page, or an access-permission mismatch.
So a TLB hit can still lead to a page fault.
Patterson says: "cannot have a translation in TLB if page is not present in memory" [Computer Organization and Design, 4th ed. revised, page 507].
