In studying shadow paging mechanisms, I learned of a case where a shadow page table starts out empty and only gets filled in as the guest VM accesses memory. It got me thinking about traditional page tables. When the OS is running and a page table becomes empty (perhaps when the page table's process terminates), I would think that page table gets released as a free page of memory.
Is there ever a case where an empty page table or even empty page directory table can exist during normal operations? Three cases I can think of are:
1) When the OS boots - but my understanding is that modern OSes like Linux start in real mode and then switch to paging mode, during which I would imagine process 1 gets its own page table with kernel mappings among other things. Is this correct?
2) If the last valid entry in a page table is then unmapped or swapped out - but I've also read that invalid entries could be used to store swap addresses, so I'm not sure exactly.
3) When a new process is spawned - although I think, similar to 1), a new process is started with kernel mappings and linked library mappings, so it would already have a small page table upon starting.
UPDATE: I learned that even in the shadow page table where it starts out "empty", it still has some mappings to hypervisor memory, so even then the page tables are not truly empty.
There's no point in having an empty page table, so I'll say no.
If you mean one particular table, then leaving it empty is a waste of memory. If you have an empty page table, you can free it, and in the place that pointed to the page table, you tell the CPU that there is no page table. For example, if a level-1 page table is empty, instead of pointing to it in the level-2 page table, you can put an entry in the level-2 page table which says "there is no level-1 page table for this address".
If you mean the entire set of page tables, so that no pages are mapped at all: the CPU can't run any instructions without page tables (unless paging is turned off), so that's still a no. On x86 the CPU would triple-fault and reset.
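The idea of freeing an empty level-1 table and marking its level-2 entry as "not present" can be sketched roughly as follows. This is a toy model, not how any real kernel does it: the class name TwoLevelPageTable, the 1024-entry table size, and the use of a Python dict for each level-1 table are all assumptions made for illustration.

```python
ENTRIES = 1024  # entries per table (assumed: 32-bit addresses, 4 KiB pages)


class TwoLevelPageTable:
    """Toy two-level page table; all names here are hypothetical."""

    def __init__(self):
        # Level-2 table: None means "there is no level-1 table for this range".
        self.directory = [None] * ENTRIES

    def map(self, vpn, frame):
        d, t = divmod(vpn, ENTRIES)
        if self.directory[d] is None:
            self.directory[d] = {}  # allocate a level-1 table on demand
        self.directory[d][t] = frame

    def unmap(self, vpn):
        d, t = divmod(vpn, ENTRIES)
        table = self.directory[d]
        if table is None or t not in table:
            return  # nothing mapped at this page
        del table[t]
        if not table:
            # The level-1 table is now empty: free it and record
            # "no table" in the level-2 entry instead of keeping it.
            self.directory[d] = None

    def allocated_tables(self):
        return sum(1 for table in self.directory if table is not None)
```

Unmapping the last page in a range drops the whole level-1 table, so an empty table never lingers in this model.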
In two-level address translation, it is said that the first-level page table (1K entries) will always be in main memory for a process.
Of the 1K second-level page tables, only those currently in use will be in memory.
Where will we store the other second-level page tables (those not currently in use) in the absence of any secondary storage (e.g. in embedded systems)?
If we can't swap second-level page tables out of memory, is there no advantage to two-level address translation?
The advantage of a multi-level table for logical address translation without virtual memory is that the page table can have a dynamic size (even if it is never paged out). Paging is just one possible benefit (however, systems with dedicated system address spaces can page their page tables without nesting).
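To put rough numbers on the "dynamic size" point, here is a small back-of-the-envelope calculation. It assumes a 32-bit address space, 4 KiB pages and 4-byte entries (none of which are stated in the answer), and the function names are made up for this sketch.

```python
PAGE_SIZE = 4096                              # bytes per page (assumed)
ENTRY_SIZE = 4                                # bytes per table entry (assumed)
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE   # 1024 entries fit in one page
ADDRESS_SPACE = 2 ** 32                       # 32-bit virtual addresses (assumed)


def flat_table_bytes():
    # A single-level table needs one entry per virtual page, mapped or not.
    return (ADDRESS_SPACE // PAGE_SIZE) * ENTRY_SIZE


def second_level_tables_needed(mapped_vpns):
    # One second-level table covers 1024 consecutive virtual pages.
    return len({vpn // ENTRIES_PER_TABLE for vpn in mapped_vpns})


def two_level_bytes(mapped_vpns):
    # The first-level table is always present, plus one page per
    # second-level table that covers at least one mapped page.
    return PAGE_SIZE * (1 + second_level_tables_needed(mapped_vpns))
```

Under these assumptions the flat table always costs 4 MiB, while a process that touches pages in only three 4 MiB regions needs 16 KiB of tables, with nothing swapped out anywhere.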
Suppose the page table changes with each process; then we don't require a TLB or memory for the page table. We can implement it with some reasonable number of registers. But the Galvin book says (not precisely, but this is my interpretation) that the page table has an entry for every page and that there is a separate table for each process, so a pointer is used to refer to a particular table.
Am I correct (in my understanding of the book)?
If so, what is the need to change the page table on each context switch?
If the argument is that one page table could serve the whole system, the simple answer is that a page table per process provides more security through memory isolation among processes running on the same system. Each process having its own page table means it cannot interfere with another process's memory. Page tables cannot be held in registers because of their size and number. Even if you had extra registers to hold the active page table, you would still need memory to store the inactive ones, so this method is just as expensive (regarding your first line). I suggest you spend some time understanding present hardware facilities and OS functionality before trying to innovate in design; otherwise you will stray from learning.
Your title asks "does the page table change with a context switch?" Yes, the page table changes on a context switch.
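A minimal sketch of what "the page table changes on a context switch" means in practice: the tables themselves stay in memory, and only a pointer to the active one is swapped. The Process and CPU classes and the page_table_base attribute are hypothetical stand-ins (page_table_base plays the role of a register such as CR3 on x86); TLB handling is deliberately left out.

```python
PAGE_SIZE = 4096  # bytes per page (assumed)


class Process:
    def __init__(self, pid, page_table):
        self.pid = pid
        self.page_table = page_table  # dict: virtual page number -> frame number


class CPU:
    def __init__(self):
        # Stand-in for a page-table base register; not a real hardware model.
        self.page_table_base = None

    def context_switch(self, process):
        # Only the pointer changes; each process's table stays in memory.
        self.page_table_base = process.page_table

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        frame = self.page_table_base.get(vpn)
        if frame is None:
            raise LookupError("page fault at virtual address %#x" % vaddr)
        return frame * PAGE_SIZE + offset
```

With two processes that both map virtual page 0, the same virtual address resolves to different physical addresses depending on which process is current, which is the isolation the answer describes.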
I am reading the book Understanding the Linux Kernel, and the topic of address translation confuses me. The book says each linear address has three fields: Directory, Table, and Offset. The Directory field relates to the Directory Table, and the Table field relates to the Page Table.
One thing it does not point out, or that I may have missed, is whether each entry in the tables relates to a page, which is a group of linear addresses, or to an individual linear address.
Can someone help me?
Ok, so there are (at least) two types of page tables: single-level, and multi-level.
Each entry in a single-level page table maps one virtual page (a contiguous group of linear addresses, not a single address) directly to a page frame.
Entries in a multi-level page table can point to two different places:
They may point directly to a page frame in memory (as in a single-level table).
They may point to a secondary (or tertiary, and so on) page table.
Here's an example of a multi-level page table:
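Remember, each page table entry describes a whole page and holds the physical address of a frame (or of the next-level table), not a virtual address. The operating system sets up these entries, and the hardware uses them to translate virtual addresses to physical addresses (the benefits of which are outside of this particular topic).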
Most paging systems also maintain a frame table that keeps track of used and unused frames. The frame table is traditionally a different data structure than the page table.
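To make the Directory / Table / Offset split from the question concrete, here is a small sketch. It assumes the classic 32-bit x86 layout with 4 KiB pages (10 bits of directory index, 10 bits of table index, 12 bits of offset); the function name split_linear_address is made up for this example.

```python
OFFSET_BITS = 12   # low 12 bits: byte offset within a 4 KiB page (assumed layout)
TABLE_BITS = 10    # next 10 bits: index into a page table
DIR_BITS = 10      # top 10 bits: index into the page directory


def split_linear_address(addr):
    """Return (directory index, table index, offset) for a 32-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    table = (addr >> OFFSET_BITS) & ((1 << TABLE_BITS) - 1)
    directory = (addr >> (OFFSET_BITS + TABLE_BITS)) & ((1 << DIR_BITS) - 1)
    return directory, table, offset


first = split_linear_address(0x00403004)   # (1, 3, 4)
last = split_linear_address(0x00403FFF)    # same page, offset 4095
```

Both addresses yield directory index 1 and table index 3 and differ only in the offset, so a single page table entry serves all 4096 addresses in that page rather than one individual address.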
You can read more about paging tables here.
You can read about page tables here.
I've seen a few (literally, only a few) links, and nothing in the documentation, suggesting that clustering can be done with Firebird.
Then I shot for the moon with this question, CLUSTER command for Firebird?, but the answerer told me that Firebird doesn't even have clustered indexes at all, so now I'm really confused.
Does Firebird physically order data at all? If so, can it be ordered by any key, not just primary, and can the clustering/defragging be turned on and off so that it only does it during downtime?
If not, isn't this a hit to performance since it will take the disk longer to put together disparate rows that naturally should be right next to each other?
(DB noob)
MVCC
I found out that Firebird is based upon MVCC, so old data actually isn't overwritten until a "sweep". I like that a lot!
Again, I can't find much, but it seems like a real shame that data wouldn't be defragged according to a key.
This says that database pages are defragmented but provides no further explanation.
Firebird does not cluster records. It was designed to avoid the problems that require clustering and the fragmentation problems that come with clustered indexes. Indexes and data are stored separately, on different types of pages. Each data page contains data from only one table. Records are stored in the order they were inserted, give or take concurrent inserts, which generally go on separate pages. When old records are removed, new records will be stored in their place, so new records sometimes appear on the same page as older ones.
Many tables use an artificial primary key, generally ascending, which might be a database-generated sequence or a timestamp. That practice causes records to be stored in key order, but that order is by no means guaranteed. Nor is it very interesting. When the primary key is artificial, most queries that return groups of related records are done on secondary indexes. That's a performance hit for clustered records, because a look-up on a secondary index requires traversing two indexes: the secondary index provides only the key to the primary index, which must then be traversed to find the data.
On the larger issue of defragmentation and space usage, Firebird tracks the free space on pages so new records will be inserted on pages that have had records removed. If a page becomes completely empty, it will be reallocated. This space management is done as the database runs.
As you know, Firebird uses Multi-Version Concurrency Control, so when a record is updated or deleted, Firebird creates a new record version, but keeps the old version around. When all transactions that were running before the change was committed have ended, the old record version no longer serves any purpose, and Firebird will remove it. In many applications, old versions are removed in the normal course of running the database. When a transaction touches a record with old versions, Firebird checks the state of the old versions and removes them if no running transaction can read them. There is a function called "Sweep" that systematically removes unneeded old record versions. Sweep can run concurrently with other database activity, though it's better to schedule it when the database load is low. So no, it's not true that nothing is removed until you run a sweep.
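The rule that an old version can go once no running transaction can read it can be illustrated with a toy model. This is a generic MVCC sketch and not Firebird's actual algorithm or on-disk format: the function collect_garbage, the commit sequence numbers, and the idea of a snapshot as a single number are all assumptions for illustration.

```python
def collect_garbage(versions, active_snapshots):
    """Drop record versions that no running transaction can still read.

    versions: list of (commit_seq, value) pairs, oldest first.
    active_snapshots: for each running transaction, the sequence number
        as of which it sees the database (a hypothetical simplification).
    """
    if not versions:
        return []
    keep = {len(versions) - 1}  # the newest version is always kept
    for snapshot in active_snapshots:
        # A transaction reads the newest version committed at or before
        # its snapshot; that version must survive while it is running.
        visible = [i for i, (seq, _) in enumerate(versions) if seq <= snapshot]
        if visible:
            keep.add(visible[-1])
    return [version for i, version in enumerate(versions) if i in keep]
```

With no running transactions only the newest version survives; a transaction whose snapshot predates the latest commit keeps exactly the one older version it can still see.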
Best regards,
Ann Harrison
who's worked with Firebird and its predecessors for an embarrassingly long time
BTW - as the first person to answer mentioned, Firebird does leave space on pages so that the old version of a record stays on the same page as the newer version. It's not a fixed percentage of the space, but 16 bytes per record stored on the page, so pages of tables with very short records have more free space and tables that have long records have less.
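A quick calculation shows why a fixed 16 bytes per record leaves proportionally more free space with short records. This sketch ignores page headers, record headers and compression, so the numbers are only indicative; reserved_fraction and the record sizes used are hypothetical.

```python
RESERVED_PER_RECORD = 16  # bytes reserved per stored record, as quoted above


def reserved_fraction(record_size):
    # Share of the space consumed by one record (data plus reservation)
    # that is held back for a future old version of that record.
    return RESERVED_PER_RECORD / (record_size + RESERVED_PER_RECORD)
```

Under these simplifications a table of 16-byte records keeps half of its used space in reserve, while a table of 1000-byte records keeps under 2%.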
On restore, database pages are created ~70% full (as I recall, unless you specify gbak's -use_all_space switch) and the restore is done one table at a time, writing pages to the end of the database file as needed. You can imagine a scenario where the pages could be condensed into far fewer, bringing the data together and "defragging" it.
As far as controlling the physical grouping on disk or doing an online defrag -- in Firebird there is none. Remember that just because you need to access a page does not mean your disk does a read -- file system and database cache can avoid it!
Does every process have its own page table, or does it simply add its page entries to one big page table?
Yes, every process has its own page tables. They might be shared with the parent process (copy-on-write) or with other processes (shared memory), but in general every process has its own.
Yes, unless you use an inverted page table (see this answer). Because an inverted page table is global, each entry must also record which process it belongs to.
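A rough sketch of the inverted page table idea: one global table with one slot per physical frame, where each slot records the owning process and virtual page. The class InvertedPageTable and its methods are hypothetical, and the linear search stands in for the hashed lookup a real design would use.

```python
class InvertedPageTable:
    """Toy global table indexed by frame number; names are illustrative only."""

    def __init__(self, num_frames):
        # One slot per physical frame: None, or a (pid, vpn) owner tuple.
        self.frames = [None] * num_frames

    def map(self, pid, vpn, frame):
        if self.frames[frame] is not None:
            raise ValueError("frame %d is already in use" % frame)
        self.frames[frame] = (pid, vpn)

    def lookup(self, pid, vpn):
        # The pid is part of the key, because the table is shared by all
        # processes and the same vpn may be mapped by several of them.
        for frame, owner in enumerate(self.frames):
            if owner == (pid, vpn):
                return frame
        return None  # not mapped: this would be a page fault
```

Two processes can both map virtual page 0 here, and the process identifier in each entry is what keeps their translations apart.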