swapper_pg_dir in kernel page table bootstrapping - memory-management

I am going through the book "Understanding the Linux Virtual Memory Manager" by Mel Gorman, and the section on kernel page tables really confuses me.
He says that an array called swapper_pg_dir is placed at 0x00101000 and that this array holds the table entries for pg0 and pg1.
What does he mean by first pointers and second pointers?
Why do PSE-capable processors use 4 MiB pages while the others use 4 KiB?
Furthermore, if we want to cover memory from 1-9 MiB, why does the array that acts as the table reside within the very memory we want to build a table for?
What do they mean by "paging unit"? Isn't that covered by the software flow, or is there a separate hardware unit called the paging unit? Is that the MMU they are talking about?
I am completely lost in the world of Linux memory management, but I still want to understand how it works; any guidance is appreciated.
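Not from the book itself, but a quick sketch of the arithmetic usually clears this up. On 32-bit x86 without PSE, a linear address splits 10/10/12 (page directory index, page table index, byte offset), so one page table of 1024 entries covers 4 MiB and mapping the first 8 MiB takes two such tables; that is what pg0 and pg1 are, and swapper_pg_dir is the directory whose entries point at them. With PSE a directory entry can map a 4 MiB page directly (a 10/22 split), so no second-level tables are needed. There is also nothing wrong with the tables living inside the memory they describe: they are just ordinary physical pages. The constants below are the architectural ones; the sample address is only illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* 32-bit x86 linear address, 4 KiB pages (no PSE): 10/10/12 split. */
#define PGD_INDEX(va)      (((uint32_t)(va) >> 22) & 0x3FF)  /* which page directory entry */
#define PTE_INDEX(va)      (((uint32_t)(va) >> 12) & 0x3FF)  /* which entry inside pg0/pg1 */
#define PAGE_OFFSET_4K(va) ((uint32_t)(va) & 0xFFF)

/* With PSE, one directory entry maps a whole 4 MiB page: 10/22 split. */
#define PAGE_OFFSET_4M(va) ((uint32_t)(va) & 0x3FFFFF)

int main(void)
{
    uint32_t va = 0x00101000;   /* where the book says swapper_pg_dir itself lives */

    printf("PGD index: %u, PTE index: %u, offset: 0x%x\n",
           (unsigned)PGD_INDEX(va), (unsigned)PTE_INDEX(va),
           (unsigned)PAGE_OFFSET_4K(va));
    /* -> PGD index 0, PTE index 257: the first 4 MiB are covered by pg0,
     *    the next 4 MiB by pg1, so two directory entries describe 0-8 MiB.
     *    With PSE the same 8 MiB still needs two directory entries, but
     *    no pg0/pg1 at all, because each entry maps 4 MiB directly. */
    printf("PSE offset within the 4 MiB page: 0x%x\n",
           (unsigned)PAGE_OFFSET_4M(va));
    return 0;
}
```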

Related

About TLB entries and Page table entries

From a website about the TLB (https://www.bottomupcs.com/virtual_memory_hardware.xhtml#other-page-related-faults):
My questions are about the parts I highlighted (marked 1-4 below):
1: Is the format of TLB entries the same as that of PTEs (page table entries)? Also, it's not clear whether "the page" in "the page can be marked" means a TLB entry or a PTE.
2: For "the pages" in "go through all the pages", are they TLB entries or PTEs?
3: Why is it "moved out of" and not "moved into"?
4: Is the order (1) set the two bits, then (2) put the entry into the TLB, or the converse?
There are two other important faults that the TLB can generally generate which help to manage accessed and dirty pages. Each page generally contains an attribute in the form of a single bit which flags if the page has been accessed or is dirty.
An accessed page is simply any page that has been accessed. 1When a page translation is initially loaded into the TLB the page can be marked as having been accessed (else why were you loading it in?[19])
2The operating system can periodically go through all the pages and clear the accessed bit to get an idea of what pages are currently in use. When system memory becomes full and it comes time for the operating system to choose pages to be swapped out to disk, obviously those pages whose accessed bit has not been reset are the best candidates for removal, because they have not been used the longest.
A dirty page is one that has data written to it, and so does not match any data already on disk. For example, if a page is loaded in from swap and then written to by a process, 3before it can be moved out of swap it needs to have its on disk copy updated. A page that is clean has had no changes, so we do not need the overhead of copying the page back to disk.
Both are similar in that they help the operating system to manage pages. The general concept is that a page has two extra bits; the dirty bit and the accessed bit. 4When the page is put into the TLB, these bits are set to indicate that the CPU should raise a fault.
When a process tries to reference memory, the hardware does the usual translation process. However, it also does an extra check to see if the accessed flag is not set. If so, it raises a fault to the operating system, which should set the bit and allow the process to continue. Similarly if the hardware detects that it is writing to a page that does not have the dirty bit set, it will raise a fault for the operating system to mark the page as dirty.
I suggest ignoring that link as a source. It is highly confusing. I do not see any mention of a specific implementation but it is clearly describing one.
In any rationally designed processor, the TLB is completely transparent to programmers (even systems programmers). It is entirely a piece of hardware.
1: Is the format of TLB entries the same as that of PTEs (page table entries)?
A programmer never sees TLB entries. They could have the same format as PTEs. They might not.
2: For "the pages" in "go through all the pages", are they TLB entries or PTEs?
Programmers have no access to TLB entries. They have to refer to PTEs.
3: Why is it "moved out of" and not "moved into"?
Probably, but there seems to be a lot of confusion in your link.
4: Is the order (1) set the two bits, then (2) put the entry into the TLB, or the converse?
This is describing some specific, unnamed implementation. Most processors have a dirty bit but not all have an accessed bit.
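To make the "clear the accessed bit, come back later" idea from the quoted text concrete, here is a tiny software-only simulation in C. It is not the x86 PTE layout or any real kernel code, just the bookkeeping the text describes: the hardware (or the fault handler, on processors without a hardware-set bit) sets the bits on access, and a periodic OS scan reads and clears the accessed bit to see which pages were touched since the last scan.

```c
#include <stdbool.h>
#include <stdio.h>

#define NPAGES 8

/* Toy software PTE: just the two status bits the quoted text talks about. */
struct pte {
    bool accessed;  /* set when the page is touched     */
    bool dirty;     /* set when the page is written to  */
};

static struct pte table[NPAGES];

/* What the MMU / fault handler effectively does on a memory reference. */
static void touch(int page, bool write)
{
    table[page].accessed = true;
    if (write)
        table[page].dirty = true;
}

/* What the OS scan does: note who was referenced, then clear the bits
 * so the next scan measures a fresh interval. */
static void scan(void)
{
    for (int i = 0; i < NPAGES; i++) {
        if (table[i].accessed)
            printf("page %d was referenced%s\n", i,
                   table[i].dirty ? " (and is dirty, must be written back)" : "");
        table[i].accessed = false;   /* dirty stays set until the page is cleaned */
    }
}

int main(void)
{
    touch(1, false);
    touch(3, true);
    scan();        /* reports pages 1 and 3 */
    scan();        /* reports nothing: accessed bits were cleared */
    return 0;
}
```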

UNIX system call to unset the reference bit of a specific page in page table?

I'm trying to count hits of a specific set of pages, by hacking the reference bits in the page table. Is there any system call or any other way to unset reference bits (in UNIX-like systems)?
A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. (https://en.wikipedia.org/wiki/Page_table)
In Unix-like systems there is a bit associated with each page table entry, called the "reference" bit, which indicates whether the page has been accessed since the bit was last unset.
The Linux kernel unsets these reference bits periodically and checks a while later to see which pages have been accessed, in order to detect "hot" pages. But this information is very coarse-grained and low-precision, since it says nothing about the number of accesses or when they happened.
I want to count accesses to specific pages during shorter epochs by unsetting the reference bits and then checking, after a short time, whether the pages have been accessed.
Therefore, I was wondering if there is any system call or CPU interrupt which provides a means to unset the reference bits. Otherwise, I need to dive deep into the kernel to see what goes on down there.
There is no API for resetting the page reference bits. Page management is a very twitchy aspect of kernel tuning and no one wants to upset it. Of course you could modify the kernel to your needs.
Instead, you might look into Valgrind, which is a debugging and profiling tool for running a single program. Ordinarily it detects subtle memory errors, such as use of a dynamic memory block after it has been freed.
If you need page management information for the system as a whole, I think the most expedient solution is hacking the kernel.
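One Linux-specific detail worth adding, hedged because it may postdate the answer above and is per-process rather than per-arbitrary-physical-page: kernels built with CONFIG_PROC_PAGE_MONITOR expose /proc/[pid]/clear_refs and /proc/[pid]/smaps. Writing "1" to clear_refs resets the referenced flags for all of the process's pages, and the Referenced: fields in smaps then report how much of each mapping has been touched since. A rough sketch of the polling loop the question describes, run against the current process:

```c
#include <stdio.h>
#include <unistd.h>

/* Reset the referenced bits for every page of the current process.
 * Writing "1" to /proc/self/clear_refs is documented in proc(5). */
static int clear_refs(void)
{
    FILE *f = fopen("/proc/self/clear_refs", "w");
    if (!f)
        return -1;
    fputs("1\n", f);
    fclose(f);
    return 0;
}

/* Sum the "Referenced:" lines of /proc/self/smaps, i.e. how many kB of
 * this process's pages have been accessed since the last clear_refs. */
static long referenced_kb(void)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    char line[256];
    long total = 0, kb;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "Referenced: %ld kB", &kb) == 1)
            total += kb;
    fclose(f);
    return total;
}

int main(void)
{
    clear_refs();
    sleep(1);                       /* the "short epoch" from the question */
    printf("referenced in the last second: %ld kB\n", referenced_kb());
    return 0;
}
```

No root is needed for your own process, but the granularity is per mapping, not per individual page, so it is still only an approximation of what the question asks for.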

windows memory management: check if a page is in memory

Is there a way, in Windows, to check if a page is in memory or on disk (swap space)?
The reason I want to know this is to avoid causing a page fault: if the page is on disk, I can avoid accessing it.
There is no documented way that I am aware of for accomplishing this in user mode.
That said, it is possible to determine this in kernel mode, but it would involve inspecting the Page Table Entries, which belong to the Memory Manager - not something you would want to do in any sort of production code.
What is the real problem you're trying to solve?
The whole point of Virtual Memory is to abstract this sort of thing away. If you are storing your own data in user-land, put it in a data structure that supports caching and don't think about pages.
If you are writing code in kernel space, I know that in Linux you need to convert a user-land memory address to a kernel-space one; then there are API calls in the VMM to get at the page table entry, and subsequently the page struct, from the address. Once that is done, you use logical operators to check for flags, one of which is "swapped". If you are trying to make something fast, though, traversing and messing with memory at the page level might not be the most efficient (or safest) thing to do.
More information is needed in order to provide a more complete answer.
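For comparison: on POSIX systems the user-mode version of this question does have a documented answer, mincore(2), which reports per page whether part of a mapping is resident in RAM (and, as far as I know, later Windows versions added QueryWorkingSetEx for a similar purpose, though the details are not covered here). A minimal Linux sketch, just to show the shape of such an interface:

```c
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t len = 4 * (size_t)pagesize;
    unsigned char vec[4];

    /* Anonymous mapping: the pages exist virtually but are not resident yet. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    p[0] = 'x';                         /* fault in the first page only */

    if (mincore(p, len, vec) != 0)      /* per-page residency snapshot  */
        return 1;

    for (int i = 0; i < 4; i++)
        printf("page %d: %s\n", i, (vec[i] & 1) ? "resident" : "not resident");

    munmap(p, len);
    return 0;
}
```

Note that the result is only a snapshot; a page can be evicted the instant after the call returns, which is part of why the answers above steer away from this kind of check.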

Why memory-mapped files are always mapped at page boundaries?

This is my first question here; I'm not sure if it is off-topic.
While self-studying, I have found the following statement regarding Operating Systems:
Operating systems that allow memory-mapped files always require files to be mapped at page boundaries. For example, with a 4-KB page size, a file can be mapped in starting at virtual address 4096, but not starting at virtual address 5000.
This statement is explained in the following way:
If a file could be mapped into the middle of page, a single virtual page would
need two partial pages on disk to map it. The first page, in particular, would
be mapped onto a scratch page and also onto a file page. Handling a page
fault for it would be a complex and expensive operation, requiring copying of
data. Also, there would be no way to trap references to unused parts of pages.
For these reasons, it is avoided.
I would like to ask for help to understand this answer. Particularly, what does it mean to say that "a single virtual page would need two partial pages on disk to map it"? From what I found about memory-mapped files, virtual pages are mapped to files on disk, and not to a paging file. Is this what is meant by "partial page"?
Also, what is meant by "scratch page" here? I've tried to look up this term on books (Tanenbaum's "Modern Operating Systems" and "Structured Computer Organization") and on the Web, but haven't found it.
First of all, when reading books and documentation, always try to look critically at what you see. Authors sometimes use language like "there is no other way" just to promote the solution they are describing. Other ways are always possible.
Now to the matter. Modern operating systems keep a disk location for every allocated memory page. This makes sense: when it becomes necessary to evict the page from memory, it is already clear where to write it if it is dirty, or it can simply be discarded if it has not been modified. This strategy is widely accepted, although alternative policies are possible too.
The disk location can be either the paging file or a memory-mapped file. The most common use of memory-mapped files is executables and DLLs. They are (almost) never modified. If a page of code is not used for some time, discard it; if control comes back to it, reread it from the file.
In the passage you quoted, they say a single virtual page "would need two partial pages on disk to map it. The first page, in particular, would be mapped onto a scratch page." They present the situation as if there were only one solution. In fact, it is possible to allocate a page in the paging file for such a combined page and handle the required data copying. It is also possible to have nothing in the paging file for such a page and assemble it from the files using a transient page: in 99% of cases the disk controller can read/write only from/to a page boundary, so you would read from the first file into the memory page, read from the second file into the transient page, copy the data over from the transient page, and immediately discard it.
As you see, it is perfectly possible to combine several files in one page; there is no fundamental problem here. However, the algorithms handling this would be more complex and would consume more CPU cycles, and reconstructing such a page (if it is discarded) would require reading from several different files. These days 4 KB is a rather small quantity, and saving 2 KB is not a huge gain. Looking at the benefits and the cost, in my opinion the benefits are not significant enough.
Virtual address pages (on every machine I've ever heard of) are aligned on page-sized boundaries. This is simply because it makes the math very easy. On x86, the page size is 4096. That is exactly 12 bits. To figure out which virtual page an address is referring to, you simply shift the address to the right by 12. If you were to map a disk block (assume 4096 bytes) to an address of 5000, it would start on page #1 (5000 >> 12 == 1) and end on page #2 (9095 >> 12 == 2).
Memory-mapped files work by mapping a chunk of virtual address space to the file, but the data is loaded on demand (indeed, the file could be far larger than physical memory and may not fit). When you first access the virtual address, if the data isn't there (i.e. it's not in physical memory), the processor will fault and the OS has to fetch the data. When you fetch the data, you need to fetch all of the data for the page, or else you wouldn't be able to turn off the fault. If you don't have the addresses aligned, then you'd have to bring in multiple disk blocks to fill the page. You can certainly do this; it's just messy and inefficient.
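The same alignment rule shows up directly in the mmap interface: the file offset passed to mmap must be a multiple of the page size, and the returned address is itself page-aligned, so the >> 12 arithmetic above always lands on whole pages. A minimal POSIX sketch (the file name is only an example; any readable file works):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);     /* 4096 on typical x86 */
    int fd = open("/etc/hostname", O_RDONLY);  /* example file */
    if (fd < 0)
        return 1;

    /* The offset must be a multiple of the page size; an offset of 5000
     * would make mmap fail with EINVAL - the rule the question is about. */
    off_t ok_offset = 0;                       /* page-aligned: allowed */

    char *p = mmap(NULL, (size_t)pagesize, PROT_READ, MAP_PRIVATE, fd, ok_offset);
    if (p == MAP_FAILED)
        return 1;

    /* The mapping itself starts on a page boundary, so address >> 12
     * identifies exactly one virtual page, never one and a half. */
    printf("mapped at %p, page number %lu\n",
           (void *)p, (unsigned long)p >> 12);

    munmap(p, (size_t)pagesize);
    close(fd);
    return 0;
}
```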

Windows Memory Workings - page tables, and data

I was trying to understand the following:
I know that page tables are built by the virtual memory manager at some point for translation between virtual memory and physical memory. Since there are many processes running on a system, even though only one process is active at a time, I was wondering whether page tables for inactive processes are moved to the page file at any point in time? Given the fact that the lower 2 GB area is reserved for Windows, it would make sense that Windows would keep the page tables for all processes on the system. Then again, it would also make sense for them to be moved to the page file when the current process is switched out.
The same goes for writable (data) pages. Will Windows keep all the data pages for all processes in memory, or move them to the page file at some point? On my machine, task manager says 1.5 GB of RAM is being utilized out of 3 GB, and 1.5 GB is system cache in the performance tab, so my understanding is that data stays in physical memory for all applications. But would there be a time when it needs to be moved to the paging file?
I was wondering whether page tables for inactive processes are moved to the page file at any point in time?
Yes, page tables are pageable.
Will Windows keep all the data pages for all processes in memory, or move them to the page file at some point?
As far as the Windows paging policy is concerned, there are two kinds of memory: pageable and non-pageable. It doesn't really matter which process it belongs to, or even whether it belongs to the O/S itself; if it's pageable then it's subject to being paged out. So, yes, Windows will page out process data pages if necessary.
I suggest reading the memory management chapter in the Windows Internals book, it should cover all of this.
-scott
You are actually asking two questions here.
What is the paging policy regarding page tables?
What is the paging policy for "writable data" pages (i.e. virtual memory with R/W permissions)?
First I'll correct you a little.
Given the fact that the lower 2 GB area is reserved for Windows, it would make sense that Windows would keep page tables for all processes on the system.
To be exact, it's the upper 2 GB that is reserved for Windows; more precisely, it may be accessed only in kernel mode, by the Windows kernel and drivers.
Now, this may surprise you, but kernel memory may be pageable too! So technically it's not important at all which portion of the 32-bit address space is visible in user or kernel mode; it's not related to paging.
Another correction: virtual memory may be in physical memory and saved to the page file at the same time. There's a common belief that the OS frees physical storage by saving pages to the page file on demand. Wrong.
Actually Windows saves memory pages to the page file before they need to be freed. In fact it writes all the memory pages to the page file (apart from those that are backed by other files, such as mapped sections) in the background. There are two reasons for this:
During high load the OS can free memory pages more quickly (since they're already saved).
In kernel mode paging is not always possible. Drivers that run at high IRQL (i.e. that serve the most time-critical events) may not access the physical storage drivers, hence paging is not possible there.
So, the answers to your questions are:
I don't know for sure; it depends on OS implementation details. But I see no reason why a per-process page table could not be paged out: it is needed during a context switch and when modifying the process's virtual memory, and neither of those situations belongs to the time-critical events.
"Writable data" memory pages are definitely saved to the page file. Are they removed from physical memory? Only on demand, under memory load, in least-recently-used order.

Resources