With a system that employs cache memory as well as virtual memory, a choice must be made whether to use virtual addresses to access the cache or to wait until each virtual address has been mapped to the corresponding physical address and then use the physical address to access the cache.
What do you think are the consequences of using one approach versus the other in terms of cache hit ratio, on page faults and on the information that must be used to manage the cache and virtual memory systmes?
Related
I have a picture of the virtual address space of a process (for x86-64):
However, I am confused about a few things.
What is the "Physical memory map" region for?
I know the 4-page tables are found in the high canonical region but where exactly are they? (data, code, stack, heap or physical memory map?)
What is the "Physical memory map" region for?
Direct-mapping of all physical RAM (usually with hugepages) allows easy access to memory given a physical address. (i.e. add an offset to generate a virtual address you can use to load or store from there.)
Having phys<->virt be cheap makes it easier to manage memory allocations, so you can primarily track what regions of physical memory are in use.
This is how kmalloc works: it returns a kernel virtual address that points into the direct-mapped region. This is great: it doesn't have to spend any time finding free virtual address space as well, just bookkeeping for physical memory. And it doesn't have to create or modify any page tables (And freeing doesn't have to tear down page tables and invlpg.)
kmalloc requires the memory to be contiguous in physical memory, not stitching together multiple 4k pages into a contiguous virtual allocation (that's what vmalloc does), so that's one reason to maybe not use kmalloc for everything, like for larger allocations that might fail or have to stop and defrag or page out memory if the kernel can't find enough contiguous physical pages. Which it couldn't do in a context that must run without pre-emption, like in an interrupt handler. (Correct me if I'm wrong, I don't regularly actually look at Linux kernel code. Regardless of actual Linux details, the basics of this way of handling allocation is important and relevant to any OS that direct-maps all physical RAM.)
Related:
What is the rationality of Linux kernel's mapping as much RAM as possible in direct-mapping(linear mapping) area?
Confusion about different meanings of "HighMem" in Linux Kernel re: how Linux uses physical RAM that it doesn't have enough virtual address-space to keep mapped all the time. (On architectures where Linux supports the concept of Highmem, e.g. i386 but not x86-64). Still, thinking about that can be a useful thought exercise in how kernels have to deal with memory, and why it's nice that x86-64 kernels generally don't have to deal with that pain.
Linux Torvalds has ranted about 32-bit x86 PAE which expanded physical address space but not virtual, when 4GiB virtual was already not enough to comfortably deal with 4GiB physical. It's a useful perspective on how this looks from an OS developer's perspective.
I know the 4-page tables are found in the high canonical region but where exactly are they? (data, code, stack, heap or physical memory map?)
Page tables for user-space task are in physical memory dynamically allocated by the kernel, probably with kmalloc. I haven't looked at the code. Every user-space page-table refers to the page directories for the kernel part of virtual address space, which are also stored somewhere.
They're only accessed by the CPU by physical address, so there's no need for there to be a virtual mapping of them other than the direct mapping of all physical RAM.
(The CPU accesses them on TLB miss, to fetch a PTE with the translation for this virtual address. But if they used virtual addresses themselves, you'd have a catch-22 unless there was a way for the OS to prime the TLB with an entry for the virtual address in CR3, and so on. Much better to just have the OS put physical linear addresses into CR3 and the page-directory / page-table entries.)
For Linux on x86-64, each process has its own page tables. The page tables are independent 4KiB physical pages that can be allocated anywhere in physical memory. The page tables are not part of the virtual address space -- they are accessed by the page table walker hardware using their physical addresses with the bit fields of the requested virtual address as indices into the page table hierarchy. The control register CR3 contains the physical address of the 4KiB page that holds the root of the page table tree for the currently running process. The kernel knows the CR3 of each process (since it must be saved and restored on context switches), so the kernel can walk a process's page tables in software (by emulating what the page table walker does in hardware) for any desired virtual address.
I do understand how the a virtual address is translated to a physical address to access the main memory. I also understand how the cache memory works as well.
But my problem is in putting the 2 concepts together and understanding the big picture of how a process accesses memory and what will happen if we have a cache miss. so i have this drawing that will help me asks the following questions:
click to see the image ( assume one-level cache)
1- Does the process access the cache with the exact same physical address that represent the location of byte in the main memory ?
2- Is the TLB actually in the first level of Cache or is it a separate memory inside the CPU chip that is dedicated for the translation purpose ?
3- When there is a cache miss, i need to get a whole block and allocated in the cache, but the main memory organized in frames(pages) not blocks. So does a process page is divided itself to cache blocks that can be brought to cache in case of a miss ?
4- Lets assume there is a TLB miss, does that mean that I need to go all the way to the main memory and do the page walk there , or does the page walk happen in the cache ?
5- Does a TLB miss guarantee that there will be a cache miss ?
6- If you have any reading material that explain the big picture that i am trying to understand i would really appreciate sharing it with me.
Thanks and feel free to answer any single question i have asked
Yes. The cache is not memory that can be addressed separately. Cache mapping will translate a physical address into an address for the cache but this mapping is not something a process usually controls. For some CPU architecture it is completely controlled by the hardware (e.g. Intel x86). For others the operating system would be expected to program the mapping.
The TLB in the diagram you gave is for virtual to physical address mapping. It is probably not for the cache. Again on some architecture the TLBs are programmed whereas on others it is controlled by the hardware.
Page size and cache line size do not have to be the same as one relates to virtual memory and the other to physical memory. When a process access a virtual address that address will be translated to a physical address using the TLB considering page size. Once that's done the size of a page is of no concern. The access is for a byte/word at a physical address. If this causes a cache miss occurs then the cache block that will be read will be of the size of a cache block that covers the physical memory address that's being accessed.
TLB miss will require a page translation by reading other memory. This process can occur in hardware on some CPU (such as Intel x86/x64) or need to be handled in software. Once the page translation has been completed the TLB will be reloaded with the page translation.
TLB miss does not imply cache miss. TLB miss just means the virtual to physical address mapping was not known and required a page address translation to occur. A cache miss means the physical memory content could not be provided quickly.
To recap:
the TLB is to convert virtual addresses to physical address quickly. It exist to cache the virtual to physical memory mapping quickly. It does not have anything to do with physical memory content.
the cache is to allow faster access to memory. It is only there to provide the content of physical memory faster.
Keep in mind that the term cache can be used for lots of purposes (e.g. note the usage of cache when describing the TLB). TLB is a bit more specific and usually implies a virtual memory translation though that's not universal. For example some DMA controllers have a TLB too but that TLB is not necessarily used to translate virtual to physical addresses but rather to convert block addresses to physical addresses.
How is the Virtual address space greater than Physical address space ?
suppose a Virtual 0x7000 maps to physical address 0x8000, can another virtual address lets say
0x7500 map to the same physical location as 0x8000, if not then how can there be more virtual
address and limited physical memory since mapping has to convert to physical address?
Please help me understand this concept.
http://en.wikipedia.org/wiki/Virtual_memory.
Virtual Memory uses both physical ram and hard disk space to represent more memory than may physically exist, and provides an interface whereby each program can request memory resources without having to be concerned with the other programs existent on the machine and which memory addresses they may request.
The whole virtual address space does not have to be mapped to physical memory at the same time. That's what makes it "virtual". The contents of that virtual memory which is allocated but not currently mapped to physical memory reside on some form of external storage, typically disk.
It is the memory management system's job to move virtual memory pages into and out of physical memory as needed, and the requirement to do so is why virtual-memory computers can slow down overall when enough memory is allocated that it no longer all fits in physical memory at the same time.
I am not able to exactly difference between kernel logical address and virtual address. In Linux device driver book it says that all logical address are kernel virtual address, and virtual address doesn't have any linear mapping. But logically wise when we say it is logical and when we say virtual and in which situation we use these two ?
The Linux kernel maps most of the virtual address space that belongs to the kernel to perform 1:1 mapping with an offset of the first part of physical memory. (slightly less then for 1Gb for 32bit x86, can be different for other processors or configurations). For example, for kernel code on x86 address 0xc00000001 is mapped to physical address 0x1.
This is called logical mapping - a 1:1 mapping (with an offset) that allows the kernel to access most of the physical memory of the machine.
But this is not enough - sometime we have more then 1Gb physical memory on a 32bit machine, sometime we want to reference non contiguous physical memory blocks as contiguous to make thing simple, sometime we want to map memory mapped IO regions which are not RAM.
For this, the kernel keeps a region at the top of its virtual address space where it does a "random" page to page mapping. The mapping there do not follow the 1:1 pattern of the logical mapping area. This is what we call the virtual mapping.
It is important to add that on many platforms (x86 is an example), both the logical and virtual mapping are done using the same hardware mechanism (TLB controlling virtual memory). In many cases, the "logical mapping" is actually done using virtual memory facility of the processor, so this can be a little confusing. The difference therefore is the pattern according to which the mapping is done: 1:1 for logical, something random for virtual.
Basically there are 3 kinds of addressing, namely
Logical Addressing : Address is formed by base and offset. This is nothing but segmented addressing, where the address (or offset) in the program is always used with the base value in the segment descriptor
Linear Addressing : Also called virtual address. Here adresses are contigous, but the physical address are not. Paging is used to implement this.
Physical addressing : The actual address on the Main Memory!
Now, in linux, Kernel memory (in address space) is beyond 3 GB ( 3GB to 4GB), i.e. 0xc000000..The addresses used by Kernel are not Physical addresses. To map the virtual address it uses PAGE_OFFSET. Care must be taken that no page translation is involved. i.e. these addresses are contiguous in nature. However there is a limit to this, i.e. 896 MB on x86. Beyond which paging is used for translation. When you use vmalloc, these addresses are returned to access the allocated memory.
In short, when someone refers to Virtual Memory in context of User Space, then it is through Paging. If Kernel Virtual Memory is mentioned then it is either PAGE_OFFSETed or vmalloced address.
(Reference - Understanding Linux Kernel - 2.6 based )
Shash
Kernel logical addresses are mappings accessible to kernel code through normal CPU memory access functions. On 32-bit systems, only 4GB of kernel logical address space exists, even if more physical memory than that is in use. Logical address space backed by physical memory can be allocated with kmalloc.
Virtual addresses do not necessarily have corresponding logical addresses. You can allocate physical memory with vmalloc and get back a virtual address that has no corresponding logical address (on 32-bit systems with PAE, for example). You can then use kmap to assign a logical address to that virtual address.
Simply speaking, virtual address would include "high memory", which doesn't do the 1:1 mapping for the physical address,if your RAM size is more than the address range of kernel(typically,For 1G/3G in X86,your RAM is 3G but your kernel addressing range is 1G) and also the address return from kmap() and vmalloc(), which requires the kernel to establish page table for the memory mapping. since logic address is always memory mapped by the kernel(1:1 mapping), you don't need to explicitly call kernel API,like set_pte to set up the page table entry for the particular page.
so virtual address can't be logic address all the time.
I'm a little confused by the meaning of "Aliasing" between CPU-cache and Physical address.
First I found It's definition on Wikipedia :
However, VIVT suffers from aliasing problems, where several different virtual addresses may refer to the same physical address. Another problem is homonyms, where the same virtual address maps to several different physical addresses.
but after a while I saw a different definition on a presentation(ppt)
of DAC'05: "Energy-Efficient Physically Tagged Caches for Embedded Processors with
Virtual Memory"
Cache aliasing and synonyms:
Alias: Same virtual address from different contexts mapped to different physical addresses Synonym: Different virtual address mapped to the same physical address (data sharing)
As I'm not a native speaker, I don't know which is correct,
though I feel the Wiki's definition is correct.
Edit:
Concept of "aliasing" in CPU cache usually means "synonym", on the contrary is "homonym". In a more generic level, "aliasing" is "confusing" or "chaos" or something like that. So In my opinion, "aliasing" exactly means the mapping of (X->Y) is "not bijective", where
"X" = the subset of physical addresses units which has been cached. (each element is a line of byte)
"Y" = the set of valid cache lines. (elements a also "line")
You'd need to learn about Virtual Memory first, but basically it's this:
The memory addresses your program uses aren't the physical addresses that the RAM uses; they're virtual addresses mapped to physical addresses by the CPU.
Multiple virtual addressses can point to the same physical address.
That means that you can have two copies of the same data in separate parts of the cache without knowing it... and they wouldn't be updated correctly, so you'd get wrong results.
Edit:
Exerpt of reference:
Cache aliasing occurs when multiple mappings to a physical page of memory have conflicting caching states, such as cached and uncached. Due to these conflicting states, data in that physical page may become corrupted when the processor's cache is flushed. If that page is being used for DMA by a driver, this can lead to hardware stability problems and system lockups.
For those who are still unconvinced:
On ARMv4 and ARMv5 processors, cache is organized as a virtual-indexed, virtual-tagged (VIVT) cache in which both the index and the tag are based on the virtual address. The main advantage of this method is that cache lookups are faster because the translation look-aside buffer (TLB) is not involved in matching cache lines for a virtual address. However, this caching method does require more frequent cache flushing because of cache aliasing, in which the same physical address can be mapped to multiple virtual addresses.
#Wu yes you do need to understand virtual memory little to understand aliasing. Let me give you a few lines of explanation first:
Lets say I have a RAM (physical memory) of 1GB. I want to present my programmer with a view that I have 4GB memory then I use virtual memory. In virtual memory, the programmer thinks that he/she has 4GB and writes their program from that perspective. They do not need to know how much physical memory exists. The advantage is that program will run on computers with different amounts of RAM. Also, the program can run on a computer together with other programs (also consuming physical memory).
So here is how virtual memory is implement. I will give a simple 1-level virtual memory system (Intel has a 2/3-level system which just makes it complicated for explanation.
Our problem here is that the programmer has 4 Billion addresses and we only have 1 billion places to put those 4 billion addresses. So, addresses are from the virtual address space need to be mapped to physical address space. This is done using a simple index table called a Page Table. You access a Page Table with a virtual address and it gives you the physical address of that memory location.
Some details: Remember that physical space is only 1GB so the system only keeps the most recently accessed 1GB worth in physical memory and keeps the rest in system disk. When the program requests a particular address, we first check if it is already in physical memory. If so, it is returned to the program. If not, it brought from the disk and put into physical memory and then returned to the program. The latter is known as a Page Fault.
Coming back to aliasing in context of virtual memory: since there is mapping between virtual -> physical addresses, it is possible to make two virtual addresses to map to the same physical address. it is the same as saying that if I look at my page table for virtual
address X and Y, I will get the same physical address in BOTH cases.
I show below a simple example of a 8 entry Page Table. Say there are 8 vitual addresses and only 3 physical addresses. The page table looks as follows:
0: 1
1: On disk
2: 2
3: 1
4: On disk
5: On disk
6: On disk
7: 0
This mean that if virtual address 4 is accessed, you will get a page fault.
If virtual addresses 3 is accessed, you will get the physical address 1
In this case, virtual addresses 0 and 3 are aliasing to the same physical address 1 for both of them
NOTE: I used the terms physical and virtual addresses everywhere to simplify the concept. In a real system, the virtual-to-physical mapping is not on a per address basis . Instead, we map chunks of virtual space to physical space. Each chunk is called a Page (thats why the mapping table is called a page table) and the size of the chunk is a property of the ISA, e.g., Intel x86 has 4Kbyte pages.