I was given a question in a quiz: "A process's size is 2^6 + 2^12 + 2^23 bytes, the total memory size of the system is 4 GB, and the page size is 4K. How many page tables are there, how many page directories, and how many pages? Assume that initially all memory was free."
How do I solve this?
Can a process have more than one page table?
Yes, some systems use multiple page tables. On the VAX, e.g., each process has three page tables.
how many page tables are there
Entirely system specific.
how many page directories
Entirely system specific. Some systems do not even use page directories.
how many pages
Add the page size to the process size and divide by the page size.
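As a worked version of that recipe for the quiz numbers (a sketch only; I'm using the usual ceiling-division idiom, which agrees with the recipe above except when the size is an exact multiple of the page size):

    #include <stdio.h>

    int main(void) {
        /* Quiz numbers: process size = 2^6 + 2^12 + 2^23 bytes, page size = 4 KB */
        unsigned long process_size = (1UL << 6) + (1UL << 12) + (1UL << 23); /* 8,392,768 bytes */
        unsigned long page_size = 1UL << 12;                                 /* 4,096 bytes */

        /* Ceiling division: a partial page at the end still needs a whole page */
        unsigned long pages = (process_size + page_size - 1) / page_size;

        printf("%lu pages\n", pages); /* prints 2050 */
        return 0;
    }

So the process occupies 2050 pages: 2048 for the 2^23 part, one for the 2^12 part, and one partial page for the remaining 64 bytes.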
To ask the question another way, can you confirm that when you mmap() a file that you do in fact access the exact physical pages that are already in the page cache?
I ask because I’m doing testing on a 192 core machine with 1TB of RAM, on a 400GB data file that is pre-cached into the page cache prior to the test (by just dropping the cache, then doing md5sum on the file).
Initially, I had all 192 threads each mmap the file separately, on the assumption that they would all get (basically) the same memory region back (or perhaps the same memory region but somehow mapped multiple times). Accordingly, I assumed two threads using two different mappings to the same file would both have direct access to the same pages. (Let’s ignore NUMA for this example, though obviously it’s significant at higher thread counts.)
However, in practice I found performance would get terrible at higher thread counts when each thread separately mmapped the file. When we removed that and instead just did a single mmap that was passed into the thread (such that all threads just directly access the same memory region), then performance improved dramatically.
That’s all great, but I’m trying to figure out why. If in fact mmapping a file just grants direct access to the existing page cache, then I would think that it shouldn’t matter how many times you map it — it should all go to the exact same place.
But given that there was such a performance cost, it seemed to me that in fact each mmap was being independently and redundantly populated (perhaps by copying from the page cache, or perhaps by reading again from disk).
Can you comment on why I was seeing such different performance between shared access to the same memory, versus mmapping the same file?
Thanks, I appreciate your help!
I think I found my answer, and it deals with the page tables. The answer is yes, two mmapped regions of the same file will access the same underlying page cache data. However, each mapping needs to independently map each of the virtual pages to the physical pages -- meaning 2x as many page-table entries to access the same RAM.
Basically, each mmap() creates a new range in virtual memory. Every page of that range corresponds to a page of physical memory, and that mapping is stored in a hierarchical structure of page directories and page tables -- with one leaf entry per 4KB page. So every mmap() of a large region generates a huge number of entries.
My guess is it doesn't actually define them all up front, which is why mmap() is instant to call even for a giant file. But it has to establish those entries as faults occur on the mmapped range, so the tables get filled out over time. This extra work to populate the entries is probably why threads using different mmaps are slower than threads sharing the same mmap. And I bet the kernel needs to erase all those entries when unmapping the range -- which is why munmap() is so slow.
(There's also the translation lookaside buffer, but that's per-CPU, and so small I don't think that matters much here.)
Anyway, it sounds like re-mapping the same region just adds extra overhead, for what seems to me like no gain.
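For what it's worth, here is a minimal sketch of the faster pattern (POSIX; the file name data.bin and the thread count are placeholders I made up): map the file once with MAP_SHARED and hand the same pointer to every thread, so the translations get populated a single time:

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define NTHREADS 4 /* illustrative; the post used 192 */

    static const unsigned char *data; /* ONE shared mapping for all threads */
    static size_t length;

    static void *worker(void *arg) {
        long id = (long)arg;
        unsigned long sum = 0;
        /* Each thread reads its slice through the SAME mapping, so the
           virtual-to-physical translations are set up once, not per thread.
           (Any tail remainder of the file is ignored for brevity.) */
        size_t chunk = length / NTHREADS;
        for (size_t i = id * chunk; i < (id + 1) * chunk; i++)
            sum += data[i];
        printf("thread %ld: checksum %lu\n", id, sum);
        return NULL;
    }

    int main(void) {
        int fd = open("data.bin", O_RDONLY); /* hypothetical file name */
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) { perror("open/fstat"); return 1; }
        length = st.st_size;

        /* Map ONCE; MAP_SHARED maps the page cache pages directly. */
        data = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);

        munmap((void *)data, length);
        close(fd);
        return 0;
    }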
Working Set algorithm: There are two processes, each with its own working set window. According to the theory, that window holds the Δ most recently referenced pages of the process.
My problem is this: when a page must be brought into the window, do we move that page directly from the disk (Disk -> Window), meaning there's no need for virtual memory; or should there be an inverted page table that stores the pages, so that we move it from there (Disk -> Inverted Page Table -> Window)?
Long question made short: is the WS algorithm connected (in any way) with the inverted page table?
-Thanks
It sounds like you are confused here.
Inverted page tables are simply a mechanism for implementing page tables (logical memory translation). For learning how virtual memory works, you can ignore inverted page tables.
If you move a page from disk to physical memory, you are using virtual memory.
So, no, the WS algorithm is not connected with inverted page tables.
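In case it helps to see the mechanism, here is a toy sketch of an inverted page table (all numbers invented; real implementations hash instead of scanning): there is one entry per physical frame, and a lookup searches for the (pid, vpn) pair rather than indexing by virtual page:

    #include <stdio.h>

    #define NFRAMES 8 /* toy physical memory: 8 frames */

    /* One entry per PHYSICAL frame; the frame number is the array index. */
    struct ipt_entry {
        int pid;  /* owning process, -1 if the frame is free */
        int vpn;  /* virtual page number mapped into this frame */
    };

    static struct ipt_entry ipt[NFRAMES] = {
        {1, 5}, {2, 0}, {1, 9}, {-1, 0}, {2, 3}, {-1, 0}, {1, 0}, {2, 7},
    };

    /* Lookup: search for (pid, vpn); the matching INDEX is the frame. */
    static int lookup(int pid, int vpn) {
        for (int frame = 0; frame < NFRAMES; frame++)
            if (ipt[frame].pid == pid && ipt[frame].vpn == vpn)
                return frame;
        return -1; /* not resident: page fault */
    }

    int main(void) {
        printf("pid 1, vpn 9 -> frame %d\n", lookup(1, 9)); /* frame 2 */
        printf("pid 2, vpn 4 -> frame %d\n", lookup(2, 4)); /* -1: fault */
        return 0;
    }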
I know that page tables are stored in memory, and each process has its own table, but each table has as many entries as there are virtual pages in the virtual address space. So how can every process have a table, with each table residing in main memory, when the number of entries in each table is larger than the number of physical pages in main memory? Can someone explain that to me? I'm very confused.
Thanks in advance.
Typically, page tables are stored in kernel-owned physical memory. However, page tables can get awfully big, since each process has its own page table (unless the OS uses an inverted paging scheme). Even for a 32-bit address space with a typical 4KB page size, we need a 20-bit virtual page number and a 12-bit offset. A 20-bit VPN (Virtual Page Number) implies 2^20 translations. Even if each translation, i.e. each page table entry, requires only 4 bytes of memory, that amounts to 4 x 2^20 = 4MB of memory per process, all just for address translations, which is awful.
Hence modern OSes place such large page tables in kernel virtual memory, which can be swapped out to the hard disk and brought back into physical memory whenever required. Thus the page table is virtualized the same way every other page is virtualized.
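Here is that arithmetic as a quick sketch you can re-run with other address widths and page sizes (the 4-byte entry size is the assumption from above):

    #include <stdio.h>

    int main(void) {
        unsigned int va_bits = 32;      /* 32-bit virtual address space */
        unsigned int page_bits = 12;    /* 4 KB pages -> 12-bit offset */
        unsigned long pte_bytes = 4;    /* assumed size of one page-table entry */

        unsigned int vpn_bits = va_bits - page_bits;       /* 20-bit VPN */
        unsigned long entries = 1UL << vpn_bits;           /* 2^20 translations */
        unsigned long table_bytes = entries * pte_bytes;   /* 4 MB per process */

        printf("%u-bit VPN -> %lu entries -> %lu KB per process\n",
               vpn_bits, entries, table_bytes >> 10);      /* prints 4096 KB */
        return 0;
    }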
I would suggest you go through this wonderful and easy book to get a clear understanding of memory virtualization and paging concepts:
http://pages.cs.wisc.edu/~remzi/OSTEP.
1) So let's say we have a single-level page table.
2) A TLB miss happens.
3) The required page table is in main memory.
Question: Does the MMU always fetch the required page table into a set of registers inside it, so that a fast hardware search like the TLB's can be performed? I guess not; that would be costly hardware.
4) The MMU fetches the physical page number (I guess the MMU must store it in a format like high n bits as the virtual page number and low m bits as the physical page frame number. Please correct and explain if I am wrong.)
Question: I guess there has to be a key-value map with the virtual page number as key and the physical frame number as value. How does the MMU search for the key in the page table? If it is something like a software linear search, it would be very costly.
5) In hardware, it appends the offset bits to the page frame number, and finally a read occurs at the physical address.
So this question is bugging me a lot: how does the MMU perform the search for a given key (virtual page number) in the page table?
The use of registers for a page table is satisfactory if the page table is reasonably small (for example, 256 entries). Most contemporary computers, however, allow the page table to be very large (for example, 1 million entries). For these machines, the use of fast registers to implement the page table is not feasible. Rather, the page table is kept in main memory, and a page-table base register (PTBR) points to the page table. Changing page tables requires changing only this one register, substantially reducing context-switch time.
The problem with this approach is the time required to access a user memory location. If we want to access location i, we must first index into the page table, using the value in the PTBR offset by the page number for i. This task requires a memory access. It provides us with the frame number, which is combined with the page offset to produce the actual address. We can then access the desired place in memory. With this scheme, two memory accesses are needed to access a byte (one for the page-table entry, one for the byte). Thus, memory access is slowed by a factor of 2. This delay would be intolerable under most circumstances. We might as well resort to swapping!
The standard solution to this problem is to use a special, small, fast-lookup hardware cache, called a translation look-aside buffer (TLB). The TLB is associative, high-speed memory. Each entry in the TLB consists of two parts: a key (or tag) and a value. When the associative memory is presented with an item, the item is compared with all keys simultaneously. If the item is found, the corresponding value field is returned. The search is fast; the hardware, however, is expensive. Typically, the number of entries in a TLB is small, often numbering between 64 and 1,024.
Source: Operating System Concepts by Silberschatz et al., page 333
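To make the indexing point concrete: for a single-level table the MMU never searches the in-memory page table at all; only the TLB does an associative compare. The table is indexed directly, with the PTBR as the base and the virtual page number as the offset. A toy sketch (all sizes invented for illustration):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS 12                        /* 4 KB pages */
    #define NPAGES    16                        /* toy virtual address space */

    /* The "page table": the PTBR is just the base address of this array.
       Index = virtual page number, value = physical frame number. */
    static uint32_t page_table[NPAGES] = {
        [0] = 7, [1] = 3, [2] = 12, [3] = 5,    /* other VPNs left unmapped */
    };

    static uint32_t translate(uint32_t vaddr) {
        uint32_t vpn    = vaddr >> PAGE_BITS;              /* high bits: page number */
        uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1); /* low bits: offset */
        uint32_t frame  = page_table[vpn];                 /* ONE indexed memory load, no search */
        return (frame << PAGE_BITS) | offset;              /* append offset to frame */
    }

    int main(void) {
        uint32_t vaddr = (2u << PAGE_BITS) | 0xAB;         /* VPN 2, offset 0xAB */
        printf("virtual 0x%04x -> physical 0x%04x\n", vaddr, translate(vaddr));
        /* prints: virtual 0x20ab -> physical 0xc0ab (frame 12) */
        return 0;
    }

The "search" you were worried about happens only inside the TLB, where hardware compares all keys in parallel, exactly as the quoted passage describes.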
On Windows, can a single piece of data in memory ever span contiguously across virtual memory pages?
For example,
The string "hello", where "he" on one page and "llo" is on the next.
Any large block of data that exceeds the maximum page size, if that's possible.
Of course.
Memory pages might not appear contiguously in physical memory, but through the magic of virtual memory your program is none the wiser.
VirtualQueryEx doesn't return individual pages, but ranges of pages having the same access. If you're asking whether a string could span two pages with different access, theoretically yes, but this would in general be VERY rare. It's more likely that the string you want is swapped out to disk.
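If you want to see a string straddling a page boundary, here is a small Win32 sketch (error handling mostly omitted) that commits two adjacent pages and writes "hello" across the boundary; VirtualQueryEx would report both pages as a single region since they share the same protection:

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        SIZE_T page = si.dwPageSize; /* typically 4096 */

        /* Reserve and commit two adjacent pages with the same protection. */
        char *base = VirtualAlloc(NULL, 2 * page, MEM_COMMIT | MEM_RESERVE,
                                  PAGE_READWRITE);
        if (!base) return 1;

        /* Place "hello" so "he" ends page 1 and "llo\0" begins page 2. */
        char *s = base + page - 2;
        strcpy(s, "hello");

        printf("page boundary at %p, string at %p: %s\n",
               (void *)(base + page), (void *)s, s);

        VirtualFree(base, 0, MEM_RELEASE);
        return 0;
    }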