In context of ARMv7 what is the advantage of Linux kernel one to one mapped memory when mmu has to do a page table translation - linux-kernel

Linux kernel virtual address are one-to-one mapped. So by subtracting a PAGE_OFFSET to virtual address we will get the physical address. That is how virt_to_phys and phys_to_virt are implemented in memory.h.
My question is what is the advantage of these one to one mapping on the armv7 mmu, when the mmu has to do the page table translation when there is a TLB miss?
Is the only advantage of one to one mapping so that S/W can directly gets the physical address of respective virtual address by just subtracting PAGE_OFFSET or there is some other advantage on ARMV7 MMU page translation too?
If there is no advantage of 1:1 mapped memory over mmu page table translation then why we need page tables for 1:1 mapped memory. I mean mmu can do the operation in similar way of virt_to_phys instead walking all the page tables.

My question is what is the advantage of these one to one mapping on
the armv7 mmu, when the mmu has to do the page table translation when
there is a TLB miss?
Your answer is partially in the question. The 1:1 mappings are implemented with 1MB sections so the TLB entry is smaller. Ie, a 4k page needs a level 1 and level 2 TLB entry and it only encompasses 4k of memory. The ARM kernel must always remain mapped as it has interrupt, page fault and other critical code which maybe called at any time.
For user space code, each 4k chunk of code is backed by an inode and maybe evicted from memory during times of memory pressure. The user space code is usually only a few hot processes/routines, so the TLB entries for them are not as critical. The TLB is often secondary to L1/L2 caches.
As well, device drivers typically need to know physical addresses as they are outside of the CPU and do not know virtual addresses. The simplicity of subtracting PAGE_OFFSET makes for efficient code.
Is the only advantage of one to one mapping so that S/W can directly gets the physical address of respective virtual address by just subtracting PAGE_OFFSET or there is some other advantage on ARMV7 MMU page translation too?
The 1:1 mapping allows for larger ranges to be mapped a one time. Typical SDRAM/core memory comes in 1MB increments. It is also very efficient. There are other possibilities, but these are probably wins for this choice.
Is the only advantage of one to one mapping so that S/W can directly
gets the physical address of respective virtual address by just
subtracting PAGE_OFFSET or there is some other advantage on ARMV7 MMU
page translation too?
The MMU must be on to use the data cache and for memory protection between user space process; each other as well as user/kernel separation. Examining the kernels use of 1:1 mappings by itself is not the full story. Other parts of the kernel need the MMU. Without the MMU, the 1:1 mapping would be the identity. Ie. PAGE_OFFSET==0. The only reason to have a fixed offset is to allow memory at any physical address to be mapped to a common virtual address. Not all platforms have the same PAGE_OFFSET value.
Another benefit of the virt_to_phys relation; the kernel is written to execute at a fixed virtual address. This means the kernel code doesn't need to be PC-relative and yet can run on platforms with different physical addresses of the core memory. Care is taken in the arm/boot assembler code to be PC-relative as the boot loader is to hand control with the MMU off. This arm/boot code sets up up an initial mapping.
See also: Find the physical address of the vector table, an exception to the virt_to_phys mapping.
Kernel data swappable?
How does the kernel manage less than 1gb?
Some details on ARM Linux boot?
Page table in linux kernel - early boot and MMU.

Related

Physical memory mapping and location of page tables

I have a picture of the virtual address space of a process (for x86-64):
However, I am confused about a few things.
What is the "Physical memory map" region for?
I know the 4-page tables are found in the high canonical region but where exactly are they? (data, code, stack, heap or physical memory map?)
What is the "Physical memory map" region for?
Direct-mapping of all physical RAM (usually with hugepages) allows easy access to memory given a physical address. (i.e. add an offset to generate a virtual address you can use to load or store from there.)
Having phys<->virt be cheap makes it easier to manage memory allocations, so you can primarily track what regions of physical memory are in use.
This is how kmalloc works: it returns a kernel virtual address that points into the direct-mapped region. This is great: it doesn't have to spend any time finding free virtual address space as well, just bookkeeping for physical memory. And it doesn't have to create or modify any page tables (And freeing doesn't have to tear down page tables and invlpg.)
kmalloc requires the memory to be contiguous in physical memory, not stitching together multiple 4k pages into a contiguous virtual allocation (that's what vmalloc does), so that's one reason to maybe not use kmalloc for everything, like for larger allocations that might fail or have to stop and defrag or page out memory if the kernel can't find enough contiguous physical pages. Which it couldn't do in a context that must run without pre-emption, like in an interrupt handler. (Correct me if I'm wrong, I don't regularly actually look at Linux kernel code. Regardless of actual Linux details, the basics of this way of handling allocation is important and relevant to any OS that direct-maps all physical RAM.)
Related:
What is the rationality of Linux kernel's mapping as much RAM as possible in direct-mapping(linear mapping) area?
Confusion about different meanings of "HighMem" in Linux Kernel re: how Linux uses physical RAM that it doesn't have enough virtual address-space to keep mapped all the time. (On architectures where Linux supports the concept of Highmem, e.g. i386 but not x86-64). Still, thinking about that can be a useful thought exercise in how kernels have to deal with memory, and why it's nice that x86-64 kernels generally don't have to deal with that pain.
Linux Torvalds has ranted about 32-bit x86 PAE which expanded physical address space but not virtual, when 4GiB virtual was already not enough to comfortably deal with 4GiB physical. It's a useful perspective on how this looks from an OS developer's perspective.
I know the 4-page tables are found in the high canonical region but where exactly are they? (data, code, stack, heap or physical memory map?)
Page tables for user-space task are in physical memory dynamically allocated by the kernel, probably with kmalloc. I haven't looked at the code. Every user-space page-table refers to the page directories for the kernel part of virtual address space, which are also stored somewhere.
They're only accessed by the CPU by physical address, so there's no need for there to be a virtual mapping of them other than the direct mapping of all physical RAM.
(The CPU accesses them on TLB miss, to fetch a PTE with the translation for this virtual address. But if they used virtual addresses themselves, you'd have a catch-22 unless there was a way for the OS to prime the TLB with an entry for the virtual address in CR3, and so on. Much better to just have the OS put physical linear addresses into CR3 and the page-directory / page-table entries.)
For Linux on x86-64, each process has its own page tables. The page tables are independent 4KiB physical pages that can be allocated anywhere in physical memory. The page tables are not part of the virtual address space -- they are accessed by the page table walker hardware using their physical addresses with the bit fields of the requested virtual address as indices into the page table hierarchy. The control register CR3 contains the physical address of the 4KiB page that holds the root of the page table tree for the currently running process. The kernel knows the CR3 of each process (since it must be saved and restored on context switches), so the kernel can walk a process's page tables in software (by emulating what the page table walker does in hardware) for any desired virtual address.

How does the address translation between Main Memory and Disk storage work?

This question comes up when I learn about virtual memory and memory management.
The following describes what I understand so far:
The memory hierarchy, which benefits from locality principle, briefly includes (from top to bottom):
register
cache (SRAM)
main memory (DRAM)
disk storage
A page table provides the address translation from virtual address to physical address
The virtual memory provides an abstraction for physical memory
Page(Frame) is the basic unit when memory management unit (MMU) manipulates memory between main memory and disk.
A process can only understand the addresses inside the virtual address space.
Each process has its own virtual address space.
Each process has its own page table, and all page tables are maintained by the kernel.
a physical address describes a location inside main memory
Translation Lookaside Buffer (TLB) is introduced as a cache-version page table.
Cache stores a tag field for each cache line to determine its mapping to main memory
Cache line is the basic unit when MMU manipulates memory between main memory and cache.
In each cache line, it stores a tag field and a valid bit to determine what range of physical addresses (e.g. what part of main memory) resides in the cache line.
After tranlsating virtual address into physical address with TLB or page table, MMU compares the physical address with cache line tag field, and finds the desired memory content (assuming that cache hits).
I believe there's an address translation mechanism between the main memory and the disk for 2 reasons:
The main memory is the locality principle result for the disk.
To reduce main memory miss rate, the main memory applies the full associative placement policy.
However, I only find few material which possiblely relates to the mechanism.
wiki: frame table data says:
Frame table data
The simplest page table systems often maintain a frame table and a page table. The frame table holds information about which frames are mapped.
and wiki: LBA says:
Logical block addressing (LBA) is a common scheme used for specifying the location of blocks of data stored on computer storage devices, generally secondary storage systems such as hard disk drives.
So I guess there's a frame table to store the address translation between physical address and LBA, and MMU would refer to the frame table when page fault occurs.
Please help to point out how does the address translation between Main Memory and Disk storage work.
Thanks for help!
Hard-disk addressing and main memory addressing is done much differently. The CPU doesn't support hard-disk addressing by default. What modern CPUs support is PCI devices that can present an interface to hard-disks like NVME or SATA.
PCI devices have registers that are memory mapped in RAM. The position of these registers is specified in the MCFG which is an ACPI table. ACPI is a convention to represent hardware for software (the os) to be able to determine what is present on the motherboard that it needs to drive. ACPI is also a power management convention which is required for software to even shutdown the computer.
With that said, you can take example on Linux and how it does things to understand how a modern os does to make the link between pages on the hard-disk and the pages in main memory. On the swap management chapter of the kernel.org documentation (https://www.kernel.org/doc/gorman/html/understand/understand014.html), you can read the following:
11.2 Mapping Page Table Entries to Swap Entries
When a page is swapped out, Linux uses the corresponding PTE to store enough information to locate the page on disk again. Obviously a PTE is not large enough in itself to store precisely where on disk the page is located, but it is more than enough to store an index into the swap_info array and an offset within the swap_map and this is precisely what Linux does.
Each PTE, regardless of architecture, is large enough to store a swp_entry_t which is declared as follows in <linux/shmem_fs.h>
16 typedef struct {
17 unsigned long val;
18 } swp_entry_t;
Two macros are provided for the translation of PTEs to swap entries and vice versa. They are pte_to_swp_entry() and swp_entry_to_pte() respectively.
You should probably read the link above along with one of my answers on cs.stackexchange.com: https://cs.stackexchange.com/questions/142525/data-transfer-between-cpu-ram-and-secondary-storage/142553#142553. This will probably provide a fair understanding of what is going on.
Everything is PCI today. You can think of all graphics cards, Intel HD Audio, the AHCI for SATA, the xHCI to drive USB or network cards. That pretty much sums what a current modern computer supports which is audio, USB, SATA, network and graphics. Understanding PCI is thus the key to understand how low level drivers work. The higher level implementation detail is unimportant and varies between os.

How does a memory managemet unit map a virtual address to a physical address

If a computer system has a main memory of 1mb and virtual address space of 16mb while the disk block size is 1kb. How does the memory management unit map a virtual address to a physical address?
I assume you intend to ask "how does MMU maps virtual memory to physical memory" (as in your question description).
To start off, virtual memory is managed by operating systems. MMU only provides hardware mechanism to take advantage of it.
Operating systems will keep a map of virtual_address -> physical_address for each individual process.
For example if a program uses virtual page [0, 1, 2, 3], operating systems can map these pages to physical pages as [64, 128, 256, 512].
Since the virtual address space is larger than physical address space, not all of the virtual memory will be mapped to physical memory at any moment if physical memory cannot hold all of them. Therefore, some of the data would be swapped out to disk, and thus not present in physical memory.
For example, let's simplify your case by assuming that the virtual memory has 8 pages, but the physical memory can only hold 4 pages of data. If the process has data on virtual page [0,1,2,3,4], apparently physical memory cannot hold all 5 pages. Therefore one of the virtual pages will be put in disk, and the memory mappings of system will be something like [0->2, 1->1, 2->3, 4->0], and in this case virtual page 3 will be swapped out in disk.
Those swapped out pages will only be brought back to main memory by OS if the program needs the data, and one of the pages previously present in main memory needs to be swapped out to make space. Algorithms to determine which page to swap out is another topic (for example, LRU, clock algorithm).
In reality the memory system is more complicated than this scenario, because modern operating systems allow multiple processes running in the system, and OS itself uses other techniques (setting threshold to trigger page swapping, for example) to make memory system more efficient.
The memory management using translates LOGICAL addresses to PHYSICAL addresses.
It does that translation using PAGE TABLE defined by the operating system. The format of page tables varies with systems and there are at least three major approaches processors take to define them. Generally, a processor with have one or more privileged registers that point to the page tables for the current process. These registers are normally loaded as part of the context change that brings in a new process.
In the simple case a page table is just an array that contains the mapping between logical and physical address. Some number of high order bits of an address serve as an index into the page table. The corresponding page table entry specifies the physical page frame the logical page is mapped to.
Some number of low order bits of the addressserve as the offset into the physical page frame.
The operating system maintains the page tables in the format the MMU expects them to be in. The MMU does the translation between logical addresses an physical addresses transparently.
The disk block size is irrelevant to this translation.

What data structures use 128MB of 1GB Linux kernel space?

In almost all books and articles I have read about about HIGHMEM in Linux kernel, they say while using 3:1 split, not all of the 1GB is available to the kernel for mapping. And normally its 896MB or so, with the rest being used for kernel data structures, memory maps, page tables and such.
My question is, what exactly are these data structures? Page tables are normally accessed via a page table address register, right? And the base address of page table is normally stored as a physical address. Now why do one need to reserve a virtual address space for the entire table?
Similarly, I read about kernel code itself occupying space. What does that have to do with virtual address space? Is it not the physical memory that would be consumed for storing the code?
And finally, these data structures why do they have to reserve the 128MB space? Why can't they be used out of the entire 1GB address space, as required, like any other normal data structure in kernel would do?
I have gone through LDD3, Professional Linux Kernel Architecture and several posts here at stack-overflow (like: Why Linux Kernel ZONE_NORMAL is limited to 896 MB?) and an older LWN article, but found no specific information on the same.
With regards to page tables, it's true that the MMU wouldn't care if the page tables themselves weren't mapped in the virtual address space - for the purposes of address translations, that would be OK. But when the kernel needs to modify the page tables, they do need to be mapped in the virtual address space - and the kernel can't just map them in "just in time", because it needs to modify the page tables themselves to do that. It's a chicken-and-egg problem, which means that the page tables need to remain mapped at all times.
A similar problem exists with the kernel code. For the code to execute, it must be mapped in the virtual address space - and if the code that does the page table modification were itself not present, we'd have a similar chicken-and-egg problem. Given this, it's easier to leave the entireity of the kernel code mapped all the time, along with the kernel-mode stacks and any kernel data structures access by code where you wouldn't want to potentially take a page fault. One large example of such data structures is the array of struct page structures, representing each physical memory page.
The 128MB reserve is not for a specific data structure, that always use it.
It's virtual memory, reserved for various users, which might use it. Normally, it's not all used.
About physical and virtual memory: every allocation needs three things - a physical page, a virutal page, and mapping connecting the two. Linux almost never uses physical addresses directly, it always passes virtual address translation.
For most kernel memory allocation (called lowmem), this translation is quite simple - subtract some constant from the virtual address to get the physical. But still, a virtual address is used.
Linux's memory management was written when the virtual memory space (4GB) was much larger
than the physical memory, even on the largest machines. In such cases, wasting virtual addresses is not a problem. Today, when physical memory is large, this leads to inefficiencies and problems.
The vmalloc virtual address range is used by any caller of vmalloc. For example:
1. Loading kernel drivers (using modprobe or insmod).
2. Kernel modules often allocate with vmalloc. The alternative function kmalloc used to be limited to 128K, and it rounds the size up to a power of 2, so vmalloc is often preferred for large allocations.

Difference between Kernel Virtual Address and Kernel Logical Address?

I am not able to exactly difference between kernel logical address and virtual address. In Linux device driver book it says that all logical address are kernel virtual address, and virtual address doesn't have any linear mapping. But logically wise when we say it is logical and when we say virtual and in which situation we use these two ?
The Linux kernel maps most of the virtual address space that belongs to the kernel to perform 1:1 mapping with an offset of the first part of physical memory. (slightly less then for 1Gb for 32bit x86, can be different for other processors or configurations). For example, for kernel code on x86 address 0xc00000001 is mapped to physical address 0x1.
This is called logical mapping - a 1:1 mapping (with an offset) that allows the kernel to access most of the physical memory of the machine.
But this is not enough - sometime we have more then 1Gb physical memory on a 32bit machine, sometime we want to reference non contiguous physical memory blocks as contiguous to make thing simple, sometime we want to map memory mapped IO regions which are not RAM.
For this, the kernel keeps a region at the top of its virtual address space where it does a "random" page to page mapping. The mapping there do not follow the 1:1 pattern of the logical mapping area. This is what we call the virtual mapping.
It is important to add that on many platforms (x86 is an example), both the logical and virtual mapping are done using the same hardware mechanism (TLB controlling virtual memory). In many cases, the "logical mapping" is actually done using virtual memory facility of the processor, so this can be a little confusing. The difference therefore is the pattern according to which the mapping is done: 1:1 for logical, something random for virtual.
Basically there are 3 kinds of addressing, namely
Logical Addressing : Address is formed by base and offset. This is nothing but segmented addressing, where the address (or offset) in the program is always used with the base value in the segment descriptor
Linear Addressing : Also called virtual address. Here adresses are contigous, but the physical address are not. Paging is used to implement this.
Physical addressing : The actual address on the Main Memory!
Now, in linux, Kernel memory (in address space) is beyond 3 GB ( 3GB to 4GB), i.e. 0xc000000..The addresses used by Kernel are not Physical addresses. To map the virtual address it uses PAGE_OFFSET. Care must be taken that no page translation is involved. i.e. these addresses are contiguous in nature. However there is a limit to this, i.e. 896 MB on x86. Beyond which paging is used for translation. When you use vmalloc, these addresses are returned to access the allocated memory.
In short, when someone refers to Virtual Memory in context of User Space, then it is through Paging. If Kernel Virtual Memory is mentioned then it is either PAGE_OFFSETed or vmalloced address.
(Reference - Understanding Linux Kernel - 2.6 based )
Shash
Kernel logical addresses are mappings accessible to kernel code through normal CPU memory access functions. On 32-bit systems, only 4GB of kernel logical address space exists, even if more physical memory than that is in use. Logical address space backed by physical memory can be allocated with kmalloc.
Virtual addresses do not necessarily have corresponding logical addresses. You can allocate physical memory with vmalloc and get back a virtual address that has no corresponding logical address (on 32-bit systems with PAE, for example). You can then use kmap to assign a logical address to that virtual address.
Simply speaking, virtual address would include "high memory", which doesn't do the 1:1 mapping for the physical address,if your RAM size is more than the address range of kernel(typically,For 1G/3G in X86,your RAM is 3G but your kernel addressing range is 1G) and also the address return from kmap() and vmalloc(), which requires the kernel to establish page table for the memory mapping. since logic address is always memory mapped by the kernel(1:1 mapping), you don't need to explicitly call kernel API,like set_pte to set up the page table entry for the particular page.
so virtual address can't be logic address all the time.

Resources