Are relocation registers really useful in paging? - memory-management

In a memory management system that makes use of paging, is it really useful to have a relocation register?
I think that a relocation register makes sense only in a memory management system that allocates the address space of a process in a contiguous region of memory.
With paging, all the work is done by the page table, so I can't understand the role of a relocation register in paging.

Related

Avoiding Translation Lookaside Buffer (TLB) pollution when using mmap()

When we want to write a data item, the block containing the data is brought into the cache first and the data item is written into the cache. This can cause cache pollution. To avoid this, Intel has introduced non-temporal instructions.
If I'm going to use mmap() to write data to a file and never read it again, is it possible to avoid TLB entry creation for this? Is there any instruction similar to the non-temporal instructions available?
TLB entries are needed by the CPU to map from the virtual address to the physical address, so it is not possible to avoid them with mmap() or any similar API.
Even if it were possible to avoid storing the mapping in the TLB, every access to the mapped memory would need to reload the corresponding entries from the page tables, so the performance would be much worse.
Non-temporal accesses make sense only for stores, but the page table entries are read.
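To make the answer's last point concrete, here is a minimal sketch (not from the thread; the function name and alignment assumptions are mine) of the cache-side technique the question alludes to: an SSE non-temporal store bypasses the data cache on the way out, yet the virtual address it writes through still has to be translated via the TLB and page tables like any other access.

    #include <immintrin.h>
    #include <stddef.h>

    /* Write-once copy using non-temporal stores: the data is not
     * pulled into the cache hierarchy, but translating "dst" still
     * creates/uses TLB entries. Assumes dst is 16-byte aligned and
     * n is a multiple of 4. */
    void write_once(float *dst, const float *src, size_t n)
    {
        for (size_t i = 0; i < n; i += 4) {
            __m128 v = _mm_loadu_ps(&src[i]);
            _mm_stream_ps(&dst[i], v);  /* non-temporal: no cache fill */
        }
        _mm_sfence();  /* make NT stores visible before later accesses */
    }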

Virtual address to physical address translation in light of the cache memory

I understand how a virtual address is translated to a physical address to access main memory, and I also understand how the cache memory works.
But my problem is in putting the two concepts together and understanding the big picture of how a process accesses memory and what happens when there is a cache miss. I have a drawing that will help me ask the following questions:
[diagram omitted; assume a one-level cache]
1- Does the process access the cache with the exact same physical address that represents the location of the byte in main memory?
2- Is the TLB actually in the first level of cache, or is it a separate memory inside the CPU chip dedicated to the translation purpose?
3- When there is a cache miss, I need to get a whole block and allocate it in the cache, but main memory is organized in frames (pages), not blocks. So is a process's page itself divided into cache blocks that can be brought into the cache in case of a miss?
4- Let's assume there is a TLB miss. Does that mean I need to go all the way to main memory and do the page walk there, or does the page walk happen in the cache?
5- Does a TLB miss guarantee that there will be a cache miss?
6- If you have any reading material that explains the big picture I am trying to understand, I would really appreciate you sharing it with me.
Thanks, and feel free to answer any single question I have asked.
Yes. The cache is not memory that can be addressed separately. Cache mapping will translate a physical address into an address for the cache, but this mapping is not something a process usually controls. For some CPU architectures it is completely controlled by the hardware (e.g. Intel x86). For others, the operating system would be expected to program the mapping.
The TLB in the diagram you gave is for virtual-to-physical address mapping. It is probably not for the cache. Again, on some architectures the TLBs are programmed by software, whereas on others they are controlled by the hardware.
Page size and cache line size do not have to be the same, as one relates to virtual memory and the other to physical memory. When a process accesses a virtual address, that address is translated to a physical address using the TLB, taking the page size into account. Once that's done, the size of a page is of no concern: the access is for a byte/word at a physical address. If this causes a cache miss, then the block that will be read is the cache block covering the physical address being accessed.
A TLB miss requires a page translation, done by reading other memory. This process can occur in hardware on some CPUs (such as Intel x86/x64) or may need to be handled in software. Once the page translation has been completed, the TLB is reloaded with it.
A TLB miss does not imply a cache miss. A TLB miss just means the virtual-to-physical address mapping was not known and a page address translation had to occur. A cache miss means the physical memory content could not be provided quickly.
To recap:
the TLB is there to convert virtual addresses to physical addresses quickly. It exists to cache the virtual-to-physical memory mapping. It does not have anything to do with physical memory content.
the cache is there to allow faster access to memory. It is only there to provide the content of physical memory faster.
Keep in mind that the term cache can be used for lots of purposes (e.g. note the usage of "cache" when describing the TLB). TLB is a bit more specific and usually implies virtual memory translation, though that's not universal. For example, some DMA controllers have a TLB too, but that TLB is not necessarily used to translate virtual to physical addresses but rather to convert block addresses to physical addresses.
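To put the two mechanisms side by side, here is a small self-contained sketch with assumed parameters (4 KiB pages, 64-byte cache lines, 128 cache sets; none of these come from the question) showing how one access decomposes: the virtual page number is what the TLB/page tables translate, and the resulting physical address is then split into tag/set/offset for a physically addressed cache.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12u  /* 4 KiB pages (assumed) */
    #define LINE_SHIFT  6u  /* 64-byte cache lines (assumed) */
    #define SETS_SHIFT  7u  /* 128 cache sets (assumed) */

    int main(void)
    {
        uint64_t vaddr  = 0x7f3a12345678ull;   /* example virtual address */
        uint64_t vpn    = vaddr >> PAGE_SHIFT; /* TLB/page-table lookup key */
        uint64_t offset = vaddr & ((1ull << PAGE_SHIFT) - 1);

        uint64_t pfn   = 0x1234ull;  /* pretend the TLB returned this frame */
        uint64_t paddr = (pfn << PAGE_SHIFT) | offset;

        /* the physically addressed cache splits the SAME physical address */
        uint64_t line_off  = paddr & ((1ull << LINE_SHIFT) - 1);
        uint64_t set_index = (paddr >> LINE_SHIFT) & ((1ull << SETS_SHIFT) - 1);
        uint64_t tag       = paddr >> (LINE_SHIFT + SETS_SHIFT);

        printf("vpn=%#llx offset=%#llx -> paddr=%#llx (tag=%#llx set=%llu line_off=%llu)\n",
               (unsigned long long)vpn, (unsigned long long)offset,
               (unsigned long long)paddr, (unsigned long long)tag,
               (unsigned long long)set_index, (unsigned long long)line_off);
        return 0;
    }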

In the context of ARMv7, what is the advantage of the Linux kernel's one-to-one mapped memory when the MMU has to do a page table translation?

Linux kernel virtual addresses are one-to-one mapped, so by subtracting PAGE_OFFSET from a virtual address we get the physical address. That is how virt_to_phys and phys_to_virt are implemented in memory.h.
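For reference, a simplified sketch of that linear conversion (the real arch/arm/include/asm/memory.h also folds in PHYS_OFFSET, the board's RAM base, and supports runtime patching; the constants below are assumptions for a typical 3G/1G split):

    /* assumed values for illustration only */
    #define PAGE_OFFSET 0xC0000000UL  /* kernel's virtual base */
    #define PHYS_OFFSET 0x80000000UL  /* board-specific RAM base */

    static inline unsigned long my_virt_to_phys(unsigned long va)
    {
        return va - PAGE_OFFSET + PHYS_OFFSET;  /* pure arithmetic, no walk */
    }

    static inline unsigned long my_phys_to_virt(unsigned long pa)
    {
        return pa - PHYS_OFFSET + PAGE_OFFSET;
    }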
My question is: what is the advantage of this one-to-one mapping on the ARMv7 MMU, when the MMU has to do the page table translation anyway when there is a TLB miss?
Is the only advantage of the one-to-one mapping that software can directly get the physical address of a virtual address by just subtracting PAGE_OFFSET, or is there some other advantage for ARMv7 MMU page translation too?
If there is no advantage of 1:1 mapped memory over MMU page table translation, then why do we need page tables for 1:1 mapped memory? I mean, the MMU could do the operation the same way virt_to_phys does, instead of walking the page tables.
My question is: what is the advantage of this one-to-one mapping on the ARMv7 MMU, when the MMU has to do the page table translation anyway when there is a TLB miss?
Your answer is partially in the question. The 1:1 mappings are implemented with 1MB sections, so each TLB entry covers more memory. I.e., a 4k page needs both a level 1 and a level 2 page table entry, and its TLB entry only encompasses 4k of memory. The ARM kernel must always remain mapped, as it has interrupt, page fault and other critical code which may be called at any time.
For user-space code, each 4k chunk of code is backed by an inode and may be evicted from memory during times of memory pressure. The user-space code is usually only a few hot processes/routines, so the TLB entries for them are not as critical. The TLB is often secondary to the L1/L2 caches.
As well, device drivers typically need to know physical addresses, as the devices are outside of the CPU and do not know virtual addresses. The simplicity of subtracting PAGE_OFFSET makes for efficient code.
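As a purely illustrative aside (the helper name is made up, and the cacheability/TEX bits are omitted), an ARMv7 short-descriptor first-level entry for such a 1MB section looks roughly like this:

    #include <stdint.h>

    /* hypothetical helper: build a first-level 1MB section descriptor */
    static uint32_t make_section_entry(uint32_t phys_base, uint32_t ap, uint32_t domain)
    {
        return (phys_base & 0xFFF00000u)  /* section base address, bits [31:20] */
             | ((ap & 0x3u) << 10)        /* AP[1:0] access permissions */
             | ((domain & 0xFu) << 5)     /* domain field, bits [8:5] */
             | 0x2u;                      /* descriptor type '10' = section */
    }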
Is the only advantage of the one-to-one mapping that software can directly get the physical address of a virtual address by just subtracting PAGE_OFFSET, or is there some other advantage for ARMv7 MMU page translation too?
The 1:1 mapping allows larger ranges to be mapped at one time. Typical SDRAM/core memory comes in 1MB increments. It is also very efficient. There are other possibilities, but these are probably the wins behind this choice.
Is the only advantage of the one-to-one mapping that software can directly get the physical address of a virtual address by just subtracting PAGE_OFFSET, or is there some other advantage for ARMv7 MMU page translation too?
The MMU must be on to use the data cache and for memory protection: between user-space processes, and between user and kernel. Examining the kernel's use of 1:1 mappings by itself is not the full story; other parts of the kernel need the MMU. Without the MMU, the 1:1 mapping would be the identity, i.e. PAGE_OFFSET == 0. The only reason to have a fixed offset is to allow memory at any physical address to be mapped to a common virtual address. Not all platforms have the same PAGE_OFFSET value.
Another benefit of the virt_to_phys relation: the kernel is written to execute at a fixed virtual address. This means the kernel code doesn't need to be PC-relative and yet can run on platforms with different physical addresses for core memory. Care is taken in the arm/boot assembler code to be PC-relative, as the boot loader hands over control with the MMU off. This arm/boot code sets up an initial mapping.
See also: Find the physical address of the vector table, an exception to the virt_to_phys mapping.
Kernel data swappable?
How does the kernel manage less than 1gb?
Some details on ARM Linux boot?
Page table in linux kernel - early boot and MMU.

Paging and TLB operating systems

I'm really stuck on this question for my OS class. I don't want someone to just give me the answer, though; instead, could someone tell me how to work it out?
Example question:
This system uses simple paging and a TLB.
Each memory access requires 80ns.
A TLB access requires 10ns.
The TLB hit rate is 80%.
Work out the actual speedup because of the TLB.
NOTE: I changed the memory access time and the TLB access time in the question because, as I said, I don't want the answer, just a way to work it out.
In case the virtual address translation is cached in the TLB, all we need is one lookup in the TLB that will give us the physical address, and we are done. The interesting part is if we need to do the page table walk. Think carefully about what the system has to do in case it does not find an address in the TLB (well, it already had to do a TLB lookup). A memory access takes 80ns, but how many of them do you need to actually get the physical address? Pretty much every paging architecture follows the scheme that page tables are stored in memory and only the entry point, the address that points to the base of the first page table (the root), is in a register.
Once you have that amount of time, you can calculate the speedup by comparing the effective access time with the TLB to the access time without it.
On a TLB hit (80%), you need the 2ns TLB access time plus 20ns to access that page in main memory, so one part is
0.8×(2+20)
On a TLB miss, i.e. the remaining 20%, you still check the TLB first, which takes 2ns. On a miss it has to look into the page table, whose base address is in main memory, so that requires a 20ns access; having found the desired frame there, it needs one more 20ns memory access to fetch the data from main memory. So the miss part is
0.2×(2+20+20)
Putting the two together:
Effective access time = 0.8×(2+20) + 0.2×(2+20+20)
= 26ns
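As a sanity check, here is a tiny sketch that runs the same formula with the numbers from the question above (80ns memory access, 10ns TLB access, 80% hit rate), assuming a single-level page table so that a walk costs exactly one extra memory access:

    #include <stdio.h>

    int main(void)
    {
        double mem = 80.0, tlb = 10.0, hit = 0.80;

        /* hit: TLB lookup + data access; miss: TLB lookup + walk + data */
        double eat_with_tlb = hit * (tlb + mem) + (1 - hit) * (tlb + mem + mem);
        double eat_no_tlb   = mem + mem;  /* always walk, then access data */

        printf("EAT with TLB:    %.1f ns\n", eat_with_tlb);            /* 106.0 */
        printf("EAT without TLB: %.1f ns\n", eat_no_tlb);              /* 160.0 */
        printf("Speedup:         %.2fx\n", eat_no_tlb / eat_with_tlb); /* 1.51 */
        return 0;
    }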

What is TLB shootdown?

What is a TLB shootdown in SMPs?
I am unable to find much information regarding this concept. Any good example would be very much appreciated.
A TLB (Translation Lookaside Buffer) is a cache of the translations from virtual memory addresses to physical memory addresses. When a processor changes the virtual-to-physical mapping of an address, it needs to tell the other processors to invalidate that mapping in their caches.
That process is called a "TLB shootdown".
A quick example:
You have some memory shared by all of the processors in your system.
One of your processors restricts access to a page of that shared memory.
Now, all of the processors have to flush their TLBs, so that the ones that were allowed to access that page can't do so any more.
The actions of one processor causing the TLBs to be flushed on other processors is what is called a TLB shootdown.
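A user-space sketch of that example on Linux (illustrative, not from the original answer): changing the protection of a mapped page forces the kernel to rewrite the page table entry and shoot down any stale TLB entries on other CPUs. On x86 Linux the resulting inter-processor interrupts show up as the TLB line in /proc/interrupts.

    #include <sys/mman.h>
    #include <stdio.h>

    int main(void)
    {
        size_t len = 4096;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        p[0] = 'x';  /* touch the page so TLB entries get populated */

        /* Revoking write access changes the page table entry; the
         * kernel must now invalidate cached translations everywhere. */
        if (mprotect(p, len, PROT_READ) != 0) { perror("mprotect"); return 1; }

        printf("page is now read-only: %c\n", p[0]);
        munmap(p, len);
        return 0;
    }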
I think the question demands a more detailed answer.
page table: a data structure that stores the mapping between virtual memory (software) and physical memory (hardware)
However, the page table can be quite large, and traversing it (to find a virtual address's corresponding physical address) can be a time-consuming process. To make this process faster, a cache called the TLB (Translation Lookaside Buffer) is used, which stores recently used virtual-to-physical translations.
As can be clearly seen, the TLB entries need to be in sync with their respective page table entries at all times. TLBs are a per-core cache, i.e. every core has its own TLB.
Whenever a page table entry is modified by any of the cores, that particular TLB entry must be invalidated in all of the cores. This process is called a TLB shootdown.
TLB flushing can be triggered by various virtual memory operations that change the page table entries, such as page migration, freeing pages, etc.
