Cache miss, TLB miss and page fault

Can someone clearly explain to me the difference between a cache miss, a TLB miss and a page fault, and how these affect the effective memory access time?

Let me explain all of these step by step.
The CPU generates the logical address, which contains the page number and the page offset.
The page number is used to index into the page table to get the corresponding page frame number, and once we have the page frame in physical memory (also called main memory), we apply the page offset to get the right word of memory.
Why the TLB (Translation Lookaside Buffer)?
The thing is that the page table is stored in physical memory and can sometimes be very large, so to speed up the translation of logical addresses to physical addresses we often use a TLB, which is made of expensive, faster associative memory. Instead of going to the page table first, we go to the TLB and use the page number to index into it, getting the corresponding page frame number. If it is found, we completely avoid the page table (because we have both the page frame number and the page offset) and form the physical address.
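As a rough sketch, here is what the split and the TLB lookup could look like in C. The page size, TLB size, and structures are illustrative assumptions, not any real MMU's layout:

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_SIZE   4096u                 /* assumed 4 KiB pages */
    #define OFFSET_BITS 12                    /* log2(PAGE_SIZE) */
    #define TLB_ENTRIES 64                    /* illustrative TLB size */

    struct tlb_entry { uint64_t vpn; uint64_t pfn; bool valid; };
    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Translate a logical address: try the TLB first, and fall back to
     * the page table (a flat array here, standing in for the real thing). */
    uint64_t translate(uint64_t vaddr, const uint64_t *page_table)
    {
        uint64_t vpn    = vaddr >> OFFSET_BITS;        /* page number */
        uint64_t offset = vaddr & (PAGE_SIZE - 1);     /* page offset */

        struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES]; /* direct-mapped TLB */
        if (e->valid && e->vpn == vpn)                 /* TLB hit */
            return (e->pfn << OFFSET_BITS) | offset;

        uint64_t pfn = page_table[vpn];                /* TLB miss: read table */
        *e = (struct tlb_entry){ vpn, pfn, true };     /* refill the TLB */
        return (pfn << OFFSET_BITS) | offset;
    }

Real TLBs are fully or set associative rather than direct-mapped, but the hit/miss structure is the same.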
TLB Miss
If we don't find the page frame number inside the TLB, it is called a TLB miss; only then do we go to the page table to look for the corresponding page frame number.
TLB Hit
If we find the page frame number in the TLB, it's called a TLB hit, and we don't need to go to the page table at all.
Page Fault
Occurs when a page accessed by a running program is not present in physical memory, i.e. the page exists in secondary memory but has not yet been loaded into a frame of physical memory.
Cache Hit
Cache memory is a small memory that operates faster than physical memory, and we always go to the cache before we go to physical memory. If we are able to locate the corresponding word in the cache, it's called a cache hit, and we don't even need to go to physical memory.
Cache Miss
Only when mapping to cache memory fails to find the corresponding block (a block is to the cache what a page frame is to physical memory), called a cache miss, do we go to physical memory and do all that work of going through the TLB or the page table.
So the flow is basically this (a pseudocode sketch follows the list):
1. First go to the cache memory; if it's a cache hit, we are done.
2. If it's a cache miss, go to step 3.
3. Go to the TLB; if it's a TLB hit, go to physical memory using the physical address formed, and we are done.
4. If it's a TLB miss, go to the page table to get the frame number of your page and form the physical address.
5. If the page is not found there, it's a page fault. Load the required page from secondary memory into a physical memory frame, using one of the page replacement algorithms if all the frames are already occupied.
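A minimal sketch of that flow in C-like pseudocode; every helper function here is hypothetical, named only to make the order of the checks concrete:

    word_t access(vaddr_t va)
    {
        word_t w;
        if (cache_lookup(va, &w))             /* 1. cache hit: done          */
            return w;

        paddr_t pa;                           /* 2. cache miss: translate    */
        if (!tlb_lookup(va, &pa)) {           /* 3./4. TLB miss: page table  */
            if (!page_table_walk(va, &pa)) {  /* 5. page fault               */
                page_in_from_disk(va);        /*    may evict a victim page  */
                page_table_walk(va, &pa);
            }
        }
        w = read_physical(pa);                /* fetch from main memory      */
        cache_fill(va, w);                    /* refill the cache            */
        return w;
    }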
End Note
The flow I have discussed applies to a virtual cache (VIVT; faster, but not sharable between processes); the flow would definitely change in the case of a physical cache (PIPT; slower, but sharable between processes). Caches can be addressed in multiple ways; if you are willing to dive deeper, have a look at this and this.

This diagram might help you see what happens when there is a hit or a miss.

Just imagine a process is running and requires a data item X.
At first the cache memory is checked to see if it has the requested data item; if it is there (cache hit), it is returned. If it is not there (cache miss), it will be loaded from main memory.
If there is a cache miss, main memory is checked to see if there is a page containing the requested data item (page hit); if no such page is there (page fault), the page containing the desired item has to be brought into main memory from disk.
While processing the page fault, the TLB is checked to see if the desired page's frame number is available there (TLB hit); otherwise (TLB miss) the OS has to consult the page table in order to service the page fault.
Time required to access these types of memory:
cache << main memory << disk
Cache access requires the least time, so a hit or a miss at a given level drastically changes the effective access time.
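To make "effective access time" concrete, the standard approach is a hit-rate-weighted sum. Assuming a virtually-indexed cache checked before translation (as in the flow above), a single-level page table, and ignoring the time taken to detect a miss itself, it could be written as (h = hit rate, t = access time at each level):

    EMAT = h_cache * t_cache
           + (1 - h_cache) * [ h_TLB * (t_TLB + t_mem)
                             + (1 - h_TLB) * (t_TLB + 2 * t_mem) ]

A page fault adds a further term weighted by the page-fault rate and multiplied by the disk access time; since disk is orders of magnitude slower than RAM, even a tiny fault rate dominates the result.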

What causes page faults? Is it always because the memory has been moved to the hard disk? Or just moved around for other applications?
Well, it depends. If your system does not support multiprogramming (in a multiprogramming system there are one or more programs loaded in main memory which are ready to execute), then the page fault has definitely occurred because the memory was moved out to the hard disk.
If your system does support multiprogramming, then it depends on whether your operating system uses global or local page replacement. If it uses global replacement, then yes, there is a chance that the memory has been moved around for other applications; with local replacement, the memory has been moved back to the hard disk. When a process incurs a page fault, a local page replacement algorithm selects for replacement some page that belongs to that same process, whereas a global replacement algorithm is free to select any page from the entire pool of frames (see the sketch below). This distinction comes up most often when dealing with thrashing.
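A hypothetical sketch of the difference in victim selection (the names and the LRU policy here are illustrative only):

    /* Local replacement: the victim frame must belong to the faulting
     * process; global replacement may take a frame from any process. */
    frame_t *choose_victim(struct process *faulting, bool global_policy)
    {
        if (global_policy)
            return lru_oldest(all_frames());          /* any process's frame */
        return lru_oldest(frames_owned_by(faulting)); /* same process only   */
    }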
I am confused about the difference between a TLB miss and a page fault.
A TLB miss occurs when the page table entry required for conversion of a virtual address to a physical address is not present in the TLB (translation lookaside buffer). The TLB is like a cache, but it does not store data; rather it stores page table entries, so that in the case of a TLB hit we can completely bypass the page table, as you can see in the diagram.
Is a page fault a crash? Or is it the same as a TLB miss?
Neither of them is a crash: a crash is not recoverable, whereas we can recover from both a page fault and a TLB miss without any need to abort the process's execution.

The operating system uses virtual memory, and page tables map these virtual addresses to physical addresses. The TLB works as a cache for this mapping.
program >>> TLB >>> cache >>> Ram
A program searches for a page in the TLB; if it doesn't find the page there, that's a TLB miss, and it then looks for the page in the cache.
If the page is not in the cache, that's a cache miss, and it then looks for the page in RAM.
If the page is not in RAM, that's a page fault, and the program looks for the data in secondary storage.
So the flow, when everything misses, would be
Page requested >> TLB miss >> cache miss >> page fault >> look in secondary memory.

Related

Virtual Address to Physical address translation in the light of the cache memory

I understand how a virtual address is translated to a physical address to access main memory. I also understand how the cache memory works.
But my problem is in putting the two concepts together and understanding the big picture of how a process accesses memory and what happens if we have a cache miss. So I have this drawing that will help me ask the following questions (assume a one-level cache):
1- Does the process access the cache with the exact same physical address that represents the location of the byte in main memory?
2- Is the TLB actually in the first level of cache, or is it a separate memory inside the CPU chip that is dedicated to the translation purpose?
3- When there is a cache miss, I need to get a whole block and allocate it in the cache, but main memory is organized in frames (pages), not blocks. So is a process's page itself divided into cache blocks that can be brought into the cache in case of a miss?
4- Let's assume there is a TLB miss. Does that mean that I need to go all the way to main memory and do the page walk there, or does the page walk happen in the cache?
5- Does a TLB miss guarantee that there will be a cache miss?
6- If you have any reading material that explains the big picture I am trying to understand, I would really appreciate you sharing it with me.
Thanks, and feel free to answer any single question I have asked.
Yes. The cache is not memory that can be addressed separately. Cache mapping translates a physical address into an address for the cache, but this mapping is not something a process usually controls. For some CPU architectures it is completely controlled by the hardware (e.g. Intel x86); for others the operating system would be expected to program the mapping.
The TLB in the diagram you gave is for virtual-to-physical address mapping. It is probably not for the cache. Again, on some architectures the TLBs are programmed by software, whereas on others they are controlled by the hardware.
Page size and cache line size do not have to be the same, as one relates to virtual memory and the other to physical memory. When a process accesses a virtual address, that address is translated to a physical address using the TLB, taking the page size into account. Once that's done, the size of a page is of no concern: the access is for a byte/word at a physical address. If this causes a cache miss, then the cache block that is read will be the block-sized region of physical memory that covers the address being accessed.
A TLB miss requires a page translation, done by reading other memory. This process can occur in hardware on some CPUs (such as Intel x86/x64) or needs to be handled in software. Once the page translation has completed, the TLB is reloaded with the result.
A TLB miss does not imply a cache miss. A TLB miss just means that the virtual-to-physical address mapping was not known and required a page address translation to occur. A cache miss means that the physical memory content could not be provided quickly.
To recap:
the TLB is there to convert virtual addresses to physical addresses quickly; it caches the virtual-to-physical mapping and has nothing to do with physical memory content.
the cache is there to allow faster access to memory; it only provides the content of physical memory faster.
Keep in mind that the term cache is used for lots of purposes (note the use of "cache" when describing the TLB itself). TLB is a bit more specific and usually implies virtual memory translation, though that's not universal. For example, some DMA controllers have a TLB too, but that TLB is not necessarily used to translate virtual to physical addresses; rather, it converts block addresses to physical addresses.

What is the difference between demand paging and page replacement?

From what I understand, demand paging is basically paging with swapping, so you can swap in a page when it is needed. But page replacement seems like more or less the same thing: you bring in a page when it is needed, switching it with an existing page in physical memory.
So is there a distinct difference?
In a system that uses demand paging, the operating system copies a disk page into physical memory only if an attempt is made to access it and that page is not already in memory (i.e., if a page fault occurs). It follows that a process begins execution with none of its pages in physical memory, and many page faults will occur until most of a process's working set of pages is located in physical memory. This is an example of a lazy loading technique.
From Wikipedia's Demand paging:
Demand paging follows that pages should only be brought into memory if the executing process demands them. This is often referred to as lazy evaluation as only those pages demanded by the process are swapped from secondary storage to main memory. Contrast this to pure swapping, where all memory for a process is swapped from secondary storage to main memory during the process startup.
Page replacement, on the other hand, is simply the technique that is applied when a page fault occurs; it is utilised in both pure swapping and demand paging.
Page replacement simply means swapping a page between memory and disk: an existing page in memory is evicted to make room for the incoming one.
Demand paging is a concept in which only the required pages are brought into memory. In case a required page is not in memory, the system looks for free frames in memory. If there are no free frames, a page replacement is done to bring the required page from disk into memory (a sketch of this follows).
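A hedged sketch of how a page-fault handler could combine the two under demand paging; every helper (find_free_frame, select_victim, and so on) is hypothetical:

    void handle_page_fault(struct process *p, uint64_t vpn)
    {
        frame_t *f = find_free_frame();        /* demand paging: load lazily */
        if (f == NULL) {                       /* no free frame: replacement */
            f = select_victim();               /* e.g. LRU or clock          */
            if (f->dirty)
                write_to_disk(f);              /* save the evicted page      */
            unmap(f->owner, f->vpn);           /* invalidate the old mapping */
        }
        read_from_disk(p, vpn, f);             /* bring the needed page in   */
        map(p, vpn, f);                        /* update p's page table      */
    }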

Paging and TLB operating systems

I'm really stuck on this question for my OS class. I don't want someone to just give me the answer, though; instead, could someone tell me how to work it out?
Example Question:
This system uses simple paging and TLB
Each memory access requires 80ns
TLB access requires 10ns
TLB hit rate is 80%.
Work out the actual speedup due to the TLB.
NOTE: I changed the memory access time and the TLB access time from the original question because, as I said, I don't want the answer, just a way to work it out.
In case the virtual address translation is cached in the TLB, all we need is one lookup in the TLB to get the physical address, and we are done. The interesting part is when we need to do the page table walk. Think carefully about what the system has to do in case it does not find an address in the TLB (it has already paid for the TLB lookup). A memory access takes 80 ns, but how many memory accesses do you need to actually get the physical address? Pretty much every paging architecture follows the scheme that page tables are stored in memory and only the entry point, the address that points to the base of the first (root) page table, is in a register. A sketch of a two-level walk follows.
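For intuition, here is a sketch of a hypothetical two-level walk, similar in spirit to classic 32-bit x86 paging (10-bit indices, 4-byte entries); each read_mem call (a made-up helper) is one full memory access, 80 ns in the question's terms:

    /* root comes from a register (e.g. CR3 on x86); everything
     * else requires reading memory. */
    paddr_t walk_two_level(vaddr_t va, paddr_t root)
    {
        uint32_t i1  = (va >> 22) & 0x3ff;       /* top-level index    */
        uint32_t i2  = (va >> 12) & 0x3ff;       /* second-level index */
        uint32_t off =  va        & 0xfff;       /* page offset        */

        paddr_t l2   = read_mem(root + i1 * 4);  /* memory access #1   */
        paddr_t page = read_mem(l2   + i2 * 4);  /* memory access #2   */
        return page + off;  /* the data access itself is then #3       */
    }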
Once you have that amount of time, you can calculate the speed-up by comparing it to the TLB access time.
On a TLB hit (80% of accesses) you pay the TLB access time, say 2 ns, plus one main memory access of, say, 20 ns to fetch the data, so that part is
0.8 × (2 + 20)
On a TLB miss, i.e. the remaining (1 − 0.8) = 20%, you still check the TLB first, costing 2 ns. You must then consult the page table, which lives in main memory, so that read costs 20 ns; having found the desired frame, you need one more memory access to fetch the data itself, so the miss part is
0.2 × (2 + 20 + 20)
Adding the two parts:
Effective access time = 0.8 × (2 + 20) + 0.2 × (2 + 20 + 20)
= 26 ns
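As a sanity check, here is the same arithmetic in C (the 2 ns and 20 ns figures are this answer's assumptions, with a single-level page table):

    #include <stdio.h>

    int main(void)
    {
        double hit = 0.80, t_tlb = 2.0, t_mem = 20.0;
        /* hit:  TLB lookup + memory access;
         * miss: TLB lookup + page-table read + memory access */
        double emat = hit * (t_tlb + t_mem)
                    + (1.0 - hit) * (t_tlb + t_mem + t_mem);
        printf("EMAT = %.1f ns\n", emat);   /* prints 26.0 */
        return 0;
    }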

What is TLB shootdown?

What is a TLB shootdown in SMPs?
I am unable to find much information regarding this concept. Any good example would be very much appreciated.
A TLB (Translation Lookaside Buffer) is a cache of the translations from virtual memory addresses to physical memory addresses. When a processor changes the virtual-to-physical mapping of an address, it needs to tell the other processors to invalidate that mapping in their caches.
That process is called a "TLB shootdown".
A quick example:
You have some memory shared by all of the processors in your system.
One of your processors restricts access to a page of that shared memory.
Now, all of the processors have to flush their TLBs, so that the ones that were allowed to access that page can't do so any more.
The actions of one processor causing the TLBs to be flushed on other processors is what is called a TLB shootdown.
I think the question demands a more detailed answer.
page table: a data structure that stores the mapping between virtual memory (software) and physical memory (hardware)
However, the page table can be quite large, and traversing it (to find the physical address corresponding to a virtual address) can be a time-consuming process. To make this process faster, a cache called the TLB (Translation Lookaside Buffer) is used, which stores recently accessed virtual-to-physical translations.
As can be clearly seen, the TLB entries need to be in sync with their respective page table entries at all times. Now, TLBs are a per-core cache, i.e. every core has its own TLB.
Whenever a page table entry is modified by any of the cores, that particular TLB entry has to be invalidated in all of the cores. This process is called a TLB shootdown.
TLB flushing can be triggered by various virtual memory operations that change page table entries, such as page migration, freeing pages, etc. (A rough sketch follows.)
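In rough pseudocode, the initiating core's side could look like this; the helper names and the IPI mechanism are illustrative, not any particular kernel's API:

    /* Invalidate one mapping on every core that may have cached it. */
    void tlb_shootdown(vaddr_t va, cpumask_t cpus_sharing_this_mm)
    {
        modify_page_table_entry(va);              /* 1. change the PTE       */
        invalidate_local_tlb(va);                 /* 2. flush our own entry  */
        send_ipi(cpus_sharing_this_mm,            /* 3. interrupt the others */
                 TLB_FLUSH, va);
        wait_for_acks(cpus_sharing_this_mm);      /* 4. wait until every core
                                                     has flushed the entry   */
    }

The wait in step 4 is why shootdowns are expensive: the initiator stalls until all the other cores have taken the interrupt and flushed.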

Committed memory goes to physical RAM or reserves space in the paging file?

When I do VirtualAlloc with MEM_COMMIT this "Allocates physical storage in memory or in the paging file on disk for the specified reserved memory pages" (quote from MSDN article http://msdn.microsoft.com/en-us/library/aa366887%28VS.85%29.aspx).
All is fine up until now BUT:
the description of the Committed Bytes counter says that "Committed memory is the physical memory which has space reserved on the disk paging file(s)."
I also read "Windows via C/C++, 5th edition", and this book says that committing memory means reserving space in the page file...
The last two cases don't make sense to me... If you commit memory, doesn't that mean you commit to physical storage (RAM)? The page file is there for swapping out currently unused pages of memory in case memory gets low.
The book says that when you commit memory, you actually reserve space in the paging file. If this were true, then for a committed page there would be space reserved in the paging file and a page frame in physical memory, so twice as much space would be needed?! Isn't the page file's purpose to make the total physical memory look larger than it actually is? If I have 1 GB of RAM with a 1 GB page file => 2 GB of usable "physical memory" (the book also states this, but right after that it says what I described in point 2).
What am I missing? Thanks.
EDIT: The way I see it is perfectly described here: http://support.microsoft.com/kb/555223
"It shows how many bytes have been allocated by processes and to which the operating system has committed a RAM page frame or a page slot in the pagefile (perhaps both)"
but I have read too many things that contradict my belief, like the two points above and others like this one, for instance: http://blogs.msdn.com/ricom/archive/2005/08/01/446329.aspx
You're misunderstanding the way Windows' memory model works. The terminology and the documentation confuse things a bit, which doesn't help.
When you commit memory, the OS is providing you with a "commitment" to provide a page to back that memory. It is not actually allocating one, either from physical memory or from the page file; it is simply checking that the "uncommitted pages" counter is larger than zero and then decrementing it. If this succeeds, the page is marked as committed in your page table.
What happens next depends on whether you access the memory. If you don't, all you did was stop someone else from using a page; it was never actually allocated, though, so it is impossible to say which page you didn't use. When you touch the memory, a page fault is generated. At this point the page fault handler sees that the page is committed and starts looking for a usable page on the various lists of pages the memory manager keeps. If it can't find one, it forces something else out to the page file and gives you that page.
So really, a page is never actually allocated until you need it, at which point it is allocated by the page fault handler. The reason the documentation is confusing is that the description above is quite complicated: the documentation has to describe how this works without going into the gory details of a paging memory manager, and the description it gives is good enough. A small demo of this behaviour follows.
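A small C program makes this visible: the commit succeeds instantly (it only charges the commit counter), and physical pages appear only as you touch them. Watch the process's working set in Task Manager at each pause (error handling trimmed for brevity):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T size = 64 * 1024 * 1024;   /* 64 MiB */

        /* Reserve + commit: commit charge rises, but no RAM is used yet. */
        char *p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
                               PAGE_READWRITE);
        if (!p) return 1;

        printf("Committed; working set is still small. Press Enter...\n");
        getchar();

        /* Touch one byte per page: each first touch page-faults, and only
         * now does the working set (physical RAM use) actually grow. */
        for (SIZE_T i = 0; i < size; i += 4096)
            p[i] = 1;

        printf("Touched; working set grew. Press Enter to exit.\n");
        getchar();
        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }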
Each page of physical memory is backed by either a pagefile (for volatile pages of memory, such as data) or a disk file (for read-only memory pages, such as code and resources). When I say backed, I mean that when the page is needed, the operating system reads it in from the disk file, and when it is no longer needed, the operating system frees it and flushes the content back to the disk file.
When you commit a block of virtual memory, the Virtual Memory Manager (VMM) allocates an entry in the process's virtual address descriptor (VAD) tree and reserves the space in the paging file. Physical memory is only allocated once this block of memory is accessed. In the case where the page file is disabled, the space is reserved directly in physical memory.
Please refer to these links: Managing Virtual Memory and Managing Memory-Mapped Files
