What data structures use 128MB of 1GB Linux kernel space? - memory-management

In almost all books and articles I have read about HIGHMEM in the Linux kernel, they say that with the 3:1 split, not all of the 1GB is available to the kernel for mapping. Normally it's 896MB or so, with the rest being used for kernel data structures, memory maps, page tables and such.
My question is, what exactly are these data structures? Page tables are normally accessed via a page table address register, right? And the base address of a page table is normally stored as a physical address. So why does one need to reserve virtual address space for the entire table?
Similarly, I read about the kernel code itself occupying space. What does that have to do with virtual address space? Isn't it physical memory that would be consumed for storing the code?
And finally, why do these data structures have to reserve the 128MB space? Why can't they be allocated out of the entire 1GB address space, as required, like any other normal kernel data structure?
I have gone through LDD3, Professional Linux Kernel Architecture, several posts here on Stack Overflow (like: Why Linux Kernel ZONE_NORMAL is limited to 896 MB?) and an older LWN article, but found no specific information on this.

With regards to page tables, it's true that the MMU wouldn't care if the page tables themselves weren't mapped in the virtual address space - for the purposes of address translations, that would be OK. But when the kernel needs to modify the page tables, they do need to be mapped in the virtual address space - and the kernel can't just map them in "just in time", because it needs to modify the page tables themselves to do that. It's a chicken-and-egg problem, which means that the page tables need to remain mapped at all times.
A similar problem exists with the kernel code. For the code to execute, it must be mapped in the virtual address space - and if the code that does the page table modification were itself not present, we'd have a similar chicken-and-egg problem. Given this, it's easier to leave the entirety of the kernel code mapped all the time, along with the kernel-mode stacks and any kernel data structures accessed by code where you wouldn't want to potentially take a page fault. One large example of such data structures is the array of struct page structures, representing each physical memory page.
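To get a feel for the scale, here's a back-of-the-envelope calculation in ordinary user-space C. The 32-byte sizeof(struct page) is an assumption for illustration - the real value depends on kernel version and configuration:

#include <stdio.h>

int main(void)
{
    unsigned long long ram_bytes = 4ULL * 1024 * 1024 * 1024; /* 4GiB of RAM */
    unsigned long page_size = 4096;       /* 4KiB pages */
    unsigned long struct_page_size = 32;  /* assumed sizeof(struct page) */

    unsigned long long npages = ram_bytes / page_size;
    printf("%llu pages -> %llu MiB of always-mapped struct page entries\n",
           npages, npages * struct_page_size / (1024 * 1024));
    return 0;
}

For 4GiB of RAM that's a million struct page entries, i.e. 32MiB of kernel virtual address space that must stay mapped at all times.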

The 128MB reserve is not for a specific data structure that always uses it.
It's virtual memory, reserved for various users which might need it. Normally, it's not all used.
About physical and virtual memory: every allocation needs three things - a physical page, a virtual page, and a mapping connecting the two. Linux almost never uses physical addresses directly; it always goes through virtual address translation.
For most kernel memory allocations (called lowmem), this translation is quite simple - subtract some constant from the virtual address to get the physical one. But a virtual address is still used.
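As a minimal sketch of that translation (the PAGE_OFFSET value below is the 32-bit 3:1-split constant; the kernel's own __pa()/__va() macros boil down to essentially this arithmetic for lowmem):

#define PAGE_OFFSET 0xC0000000UL

static inline unsigned long lowmem_virt_to_phys(unsigned long vaddr)
{
    return vaddr - PAGE_OFFSET;   /* e.g. 0xC0100000 -> 0x00100000 */
}

static inline unsigned long lowmem_phys_to_virt(unsigned long paddr)
{
    return paddr + PAGE_OFFSET;
}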
Linux's memory management was written when the virtual address space (4GB) was much larger than the physical memory, even on the largest machines. In such cases, wasting virtual addresses is not a problem. Today, when physical memory is large, this leads to inefficiencies and problems.
The vmalloc virtual address range is used by any caller of vmalloc. For example:
1. Loading kernel drivers (using modprobe or insmod).
2. Kernel modules often allocate with vmalloc. The alternative, kmalloc, used to be limited to 128K, and it rounds the size up to a power of 2, so vmalloc is often preferred for large allocations.
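As a sketch of that choice (kernel-style C; the 128K cut-off is illustrative, not a kernel constant):

#include <linux/slab.h>     /* kmalloc(), kfree() */
#include <linux/vmalloc.h>  /* vmalloc(), vfree() */

/* Pick an allocator by size, as described above. */
static void *alloc_buffer(size_t size)
{
        if (size <= 128 * 1024)
                return kmalloc(size, GFP_KERNEL); /* physically contiguous */
        return vmalloc(size);                     /* virtually contiguous only */
}

Modern kernels provide kvmalloc()/kvfree(), which implement roughly this try-kmalloc-first, fall-back-to-vmalloc policy for you.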


Physical memory mapping and location of page tables

I have a picture of the virtual address space of a process (for x86-64):
However, I am confused about a few things.
What is the "Physical memory map" region for?
I know the 4-page tables are found in the high canonical region but where exactly are they? (data, code, stack, heap or physical memory map?)
What is the "Physical memory map" region for?
Direct-mapping of all physical RAM (usually with hugepages) allows easy access to memory given a physical address. (i.e. add an offset to generate a virtual address you can use to load or store from there.)
Having phys<->virt be cheap makes it easier to manage memory allocations, so you can primarily track what regions of physical memory are in use.
This is how kmalloc works: it returns a kernel virtual address that points into the direct-mapped region. This is great: it doesn't have to spend any time finding free virtual address space as well, just bookkeeping for physical memory. And it doesn't have to create or modify any page tables (And freeing doesn't have to tear down page tables and invlpg.)
kmalloc requires the memory to be contiguous in physical memory, not stitched together from multiple 4k pages into a contiguous virtual allocation (that's what vmalloc does). That's one reason to maybe not use kmalloc for everything: larger allocations might fail, or have to stop and defrag or page out memory if the kernel can't find enough contiguous physical pages - which it couldn't do in a context that must run without preemption, like an interrupt handler. (Correct me if I'm wrong, I don't regularly actually look at Linux kernel code. Regardless of actual Linux details, the basics of this way of handling allocation are important and relevant to any OS that direct-maps all physical RAM.)
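A toy user-space model of the bookkeeping point above - with a direct map, the allocator only needs to track physical page frames, and the virtual address falls out of a constant offset. Every name and constant here is invented for illustration (the base merely resembles x86-64 Linux's direct-map base):

#include <stdint.h>
#include <stdio.h>

#define DIRECT_MAP_BASE 0xffff888000000000ULL
#define PAGE_SIZE 4096ULL
#define NPAGES 8

static uint8_t frame_in_use[NPAGES];   /* the only bookkeeping needed */

static uint64_t toy_alloc_page(void)
{
    for (int i = 0; i < NPAGES; i++) {
        if (!frame_in_use[i]) {
            frame_in_use[i] = 1;
            /* no page-table edits, no search for free virtual space */
            return DIRECT_MAP_BASE + i * PAGE_SIZE;
        }
    }
    return 0;  /* out of physical pages */
}

int main(void)
{
    printf("got kernel vaddr 0x%llx\n", (unsigned long long)toy_alloc_page());
    return 0;
}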
Related:
What is the rationality of Linux kernel's mapping as much RAM as possible in direct-mapping(linear mapping) area?
Confusion about different meanings of "HighMem" in Linux Kernel re: how Linux uses physical RAM that it doesn't have enough virtual address-space to keep mapped all the time. (On architectures where Linux supports the concept of Highmem, e.g. i386 but not x86-64). Still, thinking about that can be a useful thought exercise in how kernels have to deal with memory, and why it's nice that x86-64 kernels generally don't have to deal with that pain.
Linus Torvalds has ranted about 32-bit x86 PAE, which expanded physical address space but not virtual, when 4GiB virtual was already not enough to comfortably deal with 4GiB physical. It's a useful read on how this looks from an OS developer's perspective.
I know the 4-page tables are found in the high canonical region but where exactly are they? (data, code, stack, heap or physical memory map?)
Page tables for a user-space task are in physical memory, dynamically allocated by the kernel, probably with kmalloc. I haven't looked at the code. Every user-space page table refers to the page directories for the kernel part of the virtual address space, which are also stored somewhere.
They're only accessed by the CPU by physical address, so there's no need for there to be a virtual mapping of them other than the direct mapping of all physical RAM.
(The CPU accesses them on TLB miss, to fetch a PTE with the translation for this virtual address. But if they used virtual addresses themselves, you'd have a catch-22 unless there was a way for the OS to prime the TLB with an entry for the virtual address in CR3, and so on. Much better to just have the OS put physical linear addresses into CR3 and the page-directory / page-table entries.)
For Linux on x86-64, each process has its own page tables. The page tables are independent 4KiB physical pages that can be allocated anywhere in physical memory. The page tables are not part of the virtual address space -- they are accessed by the page table walker hardware using their physical addresses with the bit fields of the requested virtual address as indices into the page table hierarchy. The control register CR3 contains the physical address of the 4KiB page that holds the root of the page table tree for the currently running process. The kernel knows the CR3 of each process (since it must be saved and restored on context switches), so the kernel can walk a process's page tables in software (by emulating what the page table walker does in hardware) for any desired virtual address.
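A sketch of such a software walk for the x86-64 4-level, 4KiB-page case. phys_read64() is a hypothetical helper; a real kernel would read each entry through its direct map of physical RAM, and would also have to handle 2MiB/1GiB huge pages, which this ignores:

#include <stdint.h>

extern uint64_t phys_read64(uint64_t paddr);  /* hypothetical */

#define PRESENT   0x1ULL
#define ADDR_MASK 0x000ffffffffff000ULL  /* bits 51:12 of an entry */

static uint64_t walk(uint64_t cr3, uint64_t vaddr)
{
    uint64_t table = cr3 & ADDR_MASK;

    /* PML4, PDPT, PD, PT: each level consumes 9 bits of the address. */
    for (int shift = 39; shift >= 12; shift -= 9) {
        uint64_t index = (vaddr >> shift) & 0x1ff;
        uint64_t entry = phys_read64(table + index * 8);

        if (!(entry & PRESENT))
            return (uint64_t)-1;      /* hardware would page-fault here */
        table = entry & ADDR_MASK;
    }
    return table | (vaddr & 0xfff);   /* frame base + page offset */
}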

How does the address translation between Main Memory and Disk storage work?

This question comes up when I learn about virtual memory and memory management.
The following describes what I understand so far:
The memory hierarchy, which benefits from the locality principle, briefly includes (from top to bottom):
register
cache (SRAM)
main memory (DRAM)
disk storage
A page table provides the address translation from virtual address to physical address
The virtual memory provides an abstraction for physical memory
Page (frame) is the basic unit when the memory management unit (MMU) manipulates memory between main memory and disk.
A process can only understand the addresses inside the virtual address space.
Each process has its own virtual address space.
Each process has its own page table, and all page tables are maintained by the kernel.
a physical address describes a location inside main memory
Translation Lookaside Buffer (TLB) is introduced as a cached version of the page table.
Cache stores a tag field for each cache line to determine its mapping to main memory
Cache line is the basic unit when MMU manipulates memory between main memory and cache.
In each cache line, it stores a tag field and a valid bit to determine what range of physical addresses (e.g. what part of main memory) resides in the cache line.
After translating a virtual address into a physical address with the TLB or page table, the MMU compares the physical address with the cache line tag fields, and finds the desired memory content (assuming a cache hit).
I believe there's an address translation mechanism between the main memory and the disk for 2 reasons:
Main memory acts, by the locality principle, as a cache for the disk.
To reduce the main memory miss rate, main memory uses a fully associative placement policy.
However, I have only found a few materials that possibly relate to the mechanism.
wiki: frame table data says:
Frame table data
The simplest page table systems often maintain a frame table and a page table. The frame table holds information about which frames are mapped.
and wiki: LBA says:
Logical block addressing (LBA) is a common scheme used for specifying the location of blocks of data stored on computer storage devices, generally secondary storage systems such as hard disk drives.
So I guess there's a frame table that stores the address translation between physical addresses and LBAs, and the MMU would refer to the frame table when a page fault occurs.
Please help point out how the address translation between main memory and disk storage works.
Thanks for the help!
Hard-disk addressing and main-memory addressing are done quite differently. The CPU doesn't support hard-disk addressing directly. What modern CPUs support are PCI devices that present an interface to hard disks, like NVMe or SATA.
PCI devices have registers that are memory-mapped into the physical address space. The position of these registers is specified in the MCFG, which is an ACPI table. ACPI is a convention for describing hardware so that software (the OS) can determine what is present on the motherboard that it needs to drive. ACPI is also a power-management convention, which is required for software to even shut down the computer.
With that said, you can take Linux as an example to understand how a modern OS makes the link between pages on the hard disk and pages in main memory. In the swap management chapter of the kernel.org documentation (https://www.kernel.org/doc/gorman/html/understand/understand014.html), you can read the following:
11.2 Mapping Page Table Entries to Swap Entries
When a page is swapped out, Linux uses the corresponding PTE to store enough information to locate the page on disk again. Obviously a PTE is not large enough in itself to store precisely where on disk the page is located, but it is more than enough to store an index into the swap_info array and an offset within the swap_map and this is precisely what Linux does.
Each PTE, regardless of architecture, is large enough to store a swp_entry_t which is declared as follows in <linux/shmem_fs.h>
typedef struct {
        unsigned long val;
} swp_entry_t;
Two macros are provided for the translation of PTEs to swap entries and vice versa. They are pte_to_swp_entry() and swp_entry_to_pte() respectively.
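A user-space analog of that packing may make it concrete. The field width below is invented; the kernel's real helpers are the swp_entry()/swp_type()/swp_offset() macros:

#include <stdio.h>

typedef struct { unsigned long val; } swp_entry_t;

#define SWP_TYPE_BITS 5   /* assumed: up to 32 swap areas */

/* Pack which swap area (type) and which slot in it (offset) into one word. */
static swp_entry_t swp_entry(unsigned long type, unsigned long offset)
{
    return (swp_entry_t){ (offset << SWP_TYPE_BITS) | type };
}

int main(void)
{
    swp_entry_t e = swp_entry(1, 42);  /* swap area 1, page 42 within it */
    printf("type=%lu offset=%lu\n",
           e.val & ((1UL << SWP_TYPE_BITS) - 1), e.val >> SWP_TYPE_BITS);
    return 0;
}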
You should probably read the link above along with one of my answers on cs.stackexchange.com: https://cs.stackexchange.com/questions/142525/data-transfer-between-cpu-ram-and-secondary-storage/142553#142553. This will probably provide a fair understanding of what is going on.
Everything is PCI today. Think of graphics cards, Intel HD Audio, the AHCI for SATA, the xHCI to drive USB, or network cards. That pretty much sums up what a current modern computer supports: audio, USB, SATA, network and graphics. Understanding PCI is thus the key to understanding how low-level drivers work. The higher-level implementation details are unimportant and vary between OSes.

virtual memory effects and relations between paging and segmentation

This is my first post. I want to ask how virtual memory relates to paging and segmentation. I have been searching the internet for a few days, but still can't manage to put the information into the right order. Here is what I know so far:
We can talk about addresses (we could say they are levels of memory abstraction) in memory:
physical level (the CPU talking to the memory controller: "hey, give me the contents of address 0xFFEABCD"). These addresses are addresses of cells in RAM, so cell 0xABCD has physical address 0xABCD. The memory controller can only use physical addresses, so if an address is not physical it must be changed to a physical one.
logical level. This is an abstraction over physical addresses. Here, when processes ask for memory (assume successful allocation), they are given addresses which have no direct relation to cells in RAM. We can say these addresses are from a different pool (world?) than physical addresses. As I said before, the memory controller only understands physical addresses, so to use logical addresses we need to convert them to physical addresses. There are two ways for the OS to be able to create logical addresses:
paging - physical memory (RAM) is divided into contiguous blocks of memory (called frames), and logical memory (this other world) is divided into blocks of the same length (called pages). The OS keeps a data structure in RAM called the page table. It's an associative array (map) whose primary goal is to translate logical-level addresses to physical-level addresses. Paging has the following effect: memory allocated by a process in RAM (i.e. in frames of physical memory belonging to the program) need not be contiguous (there may be holes in between).
segmentation - the program is divided into parts called segments. Segment sizes are not fixed, so different segments may have different sizes. The program is divided into a few segments and each segment has its own place in (physical) RAM. So one segment (call it segmentA) and another (call it segmentB) may not be near each other; segmentA doesn't have to have segmentB as a neighbour.
internal fragmentation - when memory which belongs to a process isn't 100% used. If a process wants 2 bytes for its use, the OS needs to allocate a page (or pages) whose total size is greater than or equal to the amount requested. The typical page size is 4KB, and the unit in which the OS gives memory to a process is pages, so it can't give less than 4KB. So if we use 2 bytes, 4KB - 2B = 4094 bytes are wasted (the memory is associated with our process, so other processes can't use it; only we can, but we only need 2B).
external fragmentation - when allocated blocks of memory sit near one another but with a little hole between them. The hole is free, so other programs could use it, but that is unlikely because it is very small; such holes will most probably be wasted. More holes - more wasted memory.
Paging may cause internal fragmentation. Segmentation may cause external fragmentation.
virtual level - addresses used in virtual memory. This is an extension of the logical memory level. Now a program doesn't even need to have all of its allocated pages in RAM to start execution. It can be implemented with the following techniques:
paged segmentation - a method in which segments are divided into pages (see the sketch after this list).
segmented paging - a less-used method, but also possible.
Combining them takes the positive aspects of both solutions.
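Here is the sketch promised above - a toy model of paged segmentation in C, where the segment check happens first and ordinary paging finishes the translation. All structures are invented for illustration; real architectures differ in the details:

#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

struct segment {
    uint32_t limit;        /* segment length in bytes */
    uint32_t *page_table;  /* frame number for each page of this segment */
};

/* Returns the physical address, or -1 on a limit violation ("segfault"). */
static int64_t translate(const struct segment *segs, uint32_t seg,
                         uint32_t offset_in_seg)
{
    if (offset_in_seg >= segs[seg].limit)
        return -1;
    uint32_t page  = offset_in_seg >> PAGE_SHIFT;
    uint32_t frame = segs[seg].page_table[page];
    return ((int64_t)frame << PAGE_SHIFT) | (offset_in_seg & PAGE_MASK);
}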
What I have read about the pros and cons of virtual memory:
PROS:
processes have their own address space, which means that if we have two processes A and B and both have a pointer to, e.g., address 17, processA's pointer will point to a different frame than processB's. This results in greater process isolation: processes are protected from each other (one process can't touch another process's memory unless it is shared memory, because no such entry exists in its mapping), and the OS is more protected from processes.
you can have more memory than your physical first-level memory (RAM), due to swapping to secondary storage.
better use of memory due to:
swapping unused parts of programs to secondary memory.
making shared pages possible, which also makes "copy on write" possible.
improved multiprogramming capability (when unneeded parts of programs are swapped out to secondary memory, they free space in RAM which can be used for new processes).
improved CPU utilisation (if you can have more processes loaded into memory, there is a bigger probability that some program currently needs the CPU rather than IO; in such cases the CPU is better utilised).
CONS:
virtual memory has its overhead, because we need to access memory twice (though a lot of improvement can be achieved using TLBs)
it makes the memory-managing part of the OS more complicated.
So here we come to the parts which I don't really understand:
Why do some sources describe logical addresses and virtual addresses as synonyms? Am I getting something wrong?
Does virtual memory really provide protection to processes? I mean, segmentation for example also checked whether a process accessed other memory (resulting in a segfault if it did), and paging also has a protection bit in the page table - so doesn't the protection come simply from extending the abstraction of logical-level addresses? If VM (Virtual Memory) brings extended protection features, what are they and how do they work? In other words: does creating a separate address space for each process bring extended memory protection? If so, what can't be achieved with paging without VM?
How does paged segmentation really differ from segmented paging? I know the difference lies in how an address is constructed (a page number, segment number, that stuff...), but I suppose that alone isn't enough to justify two strategies. I read that segmented paging is less elastic, and that's the reason it is rarely used. But why is it less elastic? Is the reason that a program can have only a few segments instead of a lot of pages? If that's the case, paging indeed allows better "granularity".
If VM makes a separate address space for each process, does that mean paging without VM uses logical addresses from "one pool" (is every logical address globally unique in that case)?
Any help on that topic would be appreciated.
Edit: #1
Ok. I finally understood that paging without demand paging is also virtual memory. Some clarification was all I needed to understand the topic. Below is a link to an image which I made to visualize the differences. Thanks for the help.
differences between paging, demand paging and swapping
Why do some sources describe logical addresses and virtual addresses as synonyms? Am I getting something wrong?
Many sources conflate logical and virtual memory translation. In ye olde days, logical address translation never took place without virtual address translation so processor documentation referred to them as the same.
Now we have large memory systems that use logical memory translation without virtual memory.
Does virtual memory really provide protection to processes?
It is the logical memory translation that implements page protections.
How does paged segmentation really differ from segmented paging?
You can really ignore segments. No rationally designed processor architecture designed after 1970 used segments and they are finally dying out.
If VM makes a separate address space for each process, does that mean paging without VM uses logical addresses from "one pool"?
It is logical memory that creates the separate address space for each process. Paging is virtual memory. You cannot have one without the other.

How does a memory management unit map a virtual address to a physical address

If a computer system has a main memory of 1MB and a virtual address space of 16MB, while the disk block size is 1KB, how does the memory management unit map a virtual address to a physical address?
I assume you intend to ask "how does the MMU map virtual memory to physical memory" (as in your question description).
To start off, virtual memory is managed by the operating system. The MMU only provides the hardware mechanism to take advantage of it.
Operating systems will keep a map of virtual_address -> physical_address for each individual process.
For example if a program uses virtual page [0, 1, 2, 3], operating systems can map these pages to physical pages as [64, 128, 256, 512].
Since the virtual address space is larger than physical address space, not all of the virtual memory will be mapped to physical memory at any moment if physical memory cannot hold all of them. Therefore, some of the data would be swapped out to disk, and thus not present in physical memory.
For example, let's simplify your case by assuming that the virtual memory has 8 pages but the physical memory can only hold 4 pages of data. If the process has data on virtual pages [0,1,2,3,4], physical memory clearly cannot hold all 5 pages. Therefore one of the virtual pages will be put on disk, and the system's memory mappings will be something like [0->2, 1->1, 2->3, 4->0]; in this case virtual page 3 is swapped out to disk.
Those swapped-out pages will only be brought back to main memory by the OS when the program needs the data, at which point one of the pages previously present in main memory may need to be swapped out to make space. Algorithms to determine which page to swap out are another topic (for example, LRU or the clock algorithm).
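That mapping can be made concrete with a toy translation function (plain C; the sentinel values are invented):

#include <stdio.h>

#define SWAPPED_OUT -1
#define NOT_MAPPED  -2

/* Virtual pages 0..4 from the example above; page 3 lives on disk. */
static int page_table[8] = { 2, 1, 3, SWAPPED_OUT, 0,
                             NOT_MAPPED, NOT_MAPPED, NOT_MAPPED };

static void access_page(int vpage)
{
    int frame = page_table[vpage];
    if (frame >= 0)
        printf("vpage %d -> frame %d\n", vpage, frame);
    else if (frame == SWAPPED_OUT)
        printf("vpage %d -> page fault: OS must swap it back in\n", vpage);
    else
        printf("vpage %d -> fault: never mapped\n", vpage);
}

int main(void)
{
    access_page(1);  /* hit   */
    access_page(3);  /* fault */
    return 0;
}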
In reality the memory system is more complicated than this scenario, because modern operating systems allow multiple processes running in the system, and OS itself uses other techniques (setting threshold to trigger page swapping, for example) to make memory system more efficient.
The memory management unit translates LOGICAL addresses to PHYSICAL addresses.
It does that translation using PAGE TABLES defined by the operating system. The format of page tables varies between systems, and there are at least three major approaches processors take to define them. Generally, a processor will have one or more privileged registers that point to the page tables for the current process. These registers are normally loaded as part of the context switch that brings in a new process.
In the simple case, a page table is just an array that contains the mapping between logical and physical addresses. Some number of high-order bits of an address serve as an index into the page table. The corresponding page table entry specifies the physical page frame the logical page is mapped to.
Some number of low-order bits of the address serve as the offset into the physical page frame.
The operating system maintains the page tables in the format the MMU expects them to be in. The MMU does the translation between logical addresses and physical addresses transparently.
The disk block size is irrelevant to this translation.
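In code, the simple array case described above looks something like this (a minimal sketch assuming 4KB pages and a single-level table):

#include <stdint.h>

#define PAGE_SHIFT 12  /* 4KB pages */

static uint32_t translate(const uint32_t *page_table, uint32_t laddr)
{
    uint32_t page   = laddr >> PAGE_SHIFT;            /* index into table   */
    uint32_t offset = laddr & ((1u << PAGE_SHIFT) - 1);
    uint32_t frame  = page_table[page];               /* physical frame no. */
    return (frame << PAGE_SHIFT) | offset;
}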

Definition/meaning of Aliasing? (CPU cache architectures)

I'm a little confused by the meaning of "aliasing" between the CPU cache and physical addresses.
First I found its definition on Wikipedia:
However, VIVT suffers from aliasing problems, where several different virtual addresses may refer to the same physical address. Another problem is homonyms, where the same virtual address maps to several different physical addresses.
but after a while I saw a different definition in a presentation (ppt) from DAC'05: "Energy-Efficient Physically Tagged Caches for Embedded Processors with Virtual Memory":
Cache aliasing and synonyms:
Alias: the same virtual address from different contexts mapped to different physical addresses.
Synonym: different virtual addresses mapped to the same physical address (data sharing).
As I'm not a native speaker, I don't know which is correct, though I feel the Wiki's definition is correct.
Edit:
Concept of "aliasing" in CPU cache usually means "synonym", on the contrary is "homonym". In a more generic level, "aliasing" is "confusing" or "chaos" or something like that. So In my opinion, "aliasing" exactly means the mapping of (X->Y) is "not bijective", where
"X" = the subset of physical addresses units which has been cached. (each element is a line of byte)
"Y" = the set of valid cache lines. (elements a also "line")
You'd need to learn about Virtual Memory first, but basically it's this:
The memory addresses your program uses aren't the physical addresses that the RAM uses; they're virtual addresses mapped to physical addresses by the CPU.
Multiple virtual addresses can point to the same physical address.
That means that you can have two copies of the same data in separate parts of the cache without knowing it... and they wouldn't be updated correctly, so you'd get wrong results.
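A toy simulation of that failure mode - everything here is invented: a 4-line direct-mapped cache indexed by virtual address, and two virtual aliases of one physical word. A write through one alias leaves the copy under the other alias stale:

#include <stdio.h>

#define NLINES 4

struct line { int valid; unsigned vtag; int data; };
static struct line cache[NLINES];
static int memory = 100;   /* one physical word, initially 100 */

static int read_via(unsigned vaddr)
{
    unsigned idx = vaddr % NLINES;       /* virtually indexed */
    if (!cache[idx].valid || cache[idx].vtag != vaddr)
        cache[idx] = (struct line){ 1, vaddr, memory };  /* miss: fill */
    return cache[idx].data;
}

static void write_via(unsigned vaddr, int v)
{
    unsigned idx = vaddr % NLINES;
    cache[idx] = (struct line){ 1, vaddr, v };  /* write-back: memory stale */
}

int main(void)
{
    unsigned aliasA = 0x10, aliasB = 0x21;  /* same physical word, by fiat */
    read_via(aliasA);            /* line 0 now caches the value 100 */
    write_via(aliasB, 200);      /* line 1 now holds 200; line 0 unchanged */
    printf("read via aliasA: %d (stale!)\n", read_via(aliasA));  /* 100 */
    return 0;
}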
Edit:
Excerpt of reference:
Cache aliasing occurs when multiple mappings to a physical page of memory have conflicting caching states, such as cached and uncached. Due to these conflicting states, data in that physical page may become corrupted when the processor's cache is flushed. If that page is being used for DMA by a driver, this can lead to hardware stability problems and system lockups.
For those who are still unconvinced:
On ARMv4 and ARMv5 processors, cache is organized as a virtual-indexed, virtual-tagged (VIVT) cache in which both the index and the tag are based on the virtual address. The main advantage of this method is that cache lookups are faster because the translation look-aside buffer (TLB) is not involved in matching cache lines for a virtual address. However, this caching method does require more frequent cache flushing because of cache aliasing, in which the same physical address can be mapped to multiple virtual addresses.
@Wu, yes, you do need to understand virtual memory a little to understand aliasing. Let me give you a few lines of explanation first:
Let's say I have 1GB of RAM (physical memory). I want to present my programmer with a view that there is 4GB of memory; for that I use virtual memory. With virtual memory, the programmer thinks that he/she has 4GB and writes the program from that perspective. They do not need to know how much physical memory exists. The advantage is that the program will run on computers with different amounts of RAM. Also, the program can run on a computer together with other programs (which also consume physical memory).
So here is how virtual memory is implemented. I will describe a simple 1-level virtual memory system (Intel has a multi-level system, which just complicates the explanation).
Our problem here is that the programmer has 4 billion addresses and we only have 1 billion places to put them. So addresses from the virtual address space need to be mapped to the physical address space. This is done using a simple index table called a Page Table. You access a page table with a virtual address and it gives you the physical address of that memory location.
Some details: remember that physical space is only 1GB, so the system only keeps the most recently accessed 1GB worth of data in physical memory and keeps the rest on the system disk. When the program requests a particular address, we first check if it is already in physical memory. If so, it is returned to the program. If not, it is brought in from the disk, put into physical memory, and then returned to the program. The latter is known as a Page Fault.
Coming back to aliasing in the context of virtual memory: since there is a mapping between virtual and physical addresses, it is possible for two virtual addresses to map to the same physical address. It is the same as saying that if I look at my page table for virtual addresses X and Y, I will get the same physical address in BOTH cases.
Below is a simple example of an 8-entry page table. Say there are 8 virtual addresses and only 3 physical addresses. The page table looks as follows:
0: 1
1: On disk
2: 2
3: 1
4: On disk
5: On disk
6: On disk
7: 0
This means that if virtual address 4 is accessed, you will get a page fault.
If virtual address 3 is accessed, you will get physical address 1.
In this case, virtual addresses 0 and 3 are aliasing to the same physical address, 1.
NOTE: I used the terms physical and virtual addresses everywhere to simplify the concept. In a real system, the virtual-to-physical mapping is not on a per-address basis. Instead, we map chunks of virtual space to physical space. Each chunk is called a Page (that's why the mapping table is called a page table) and the size of the chunk is a property of the ISA; e.g., Intel x86 has 4KB pages.
