I understand that for every process, virtual addresses are mapped to physical pages. The corresponding physical page number for a given virtual page number is available in the page table entry.
But I am curious to know how this mapping is done by the kernel. How does the kernel know which physical page is free before allocating that page to a virtual page number? Does it keep track of all the available empty pages in physical memory?
Yes, the kernel keeps a data structure describing the current status of all the physical pages that are available - an array of struct page entries, one for each physical page.
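A minimal sketch of the idea in C (the real struct page in the kernel has many more fields and the real allocator is the buddy system; alloc_frame and the free list here are purely illustrative):

```c
#include <stddef.h>

/* Simplified illustration, not the kernel's real struct page:
 * one descriptor per physical page frame, plus a free list the
 * allocator can pull frames from. */
struct page {
    unsigned long flags;     /* e.g. reserved, dirty, locked */
    int refcount;            /* 0 means the frame is free */
    struct page *next_free;  /* links free frames together */
};

#define NR_FRAMES 1024
static struct page mem_map[NR_FRAMES];  /* one entry per physical frame */
static struct page *free_list;          /* head of the chain of free frames */

/* Hand out a free physical frame, or NULL if none is left. */
static struct page *alloc_frame(void)
{
    struct page *p = free_list;
    if (p) {
        free_list = p->next_free;
        p->refcount = 1;
    }
    return p;
}

/* The frame number is just the index into mem_map, since the
 * array parallels physical memory. */
static size_t page_to_pfn(const struct page *p)
{
    return (size_t)(p - mem_map);
}
```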
If a computer system has a main memory of 1 MB and a virtual address space of 16 MB, while the disk block size is 1 KB, how does the memory management unit map a virtual address to a physical address?
I assume you intend to ask "how does the MMU map virtual memory to physical memory" (as in your question description).
To start off, virtual memory is managed by the operating system; the MMU only provides the hardware mechanism to take advantage of it.
Operating systems will keep a map of virtual_address -> physical_address for each individual process.
For example, if a program uses virtual pages [0, 1, 2, 3], the operating system can map these pages to physical pages such as [64, 128, 256, 512].
Since the virtual address space is larger than the physical address space, not all of the virtual memory can be mapped to physical memory at any given moment if physical memory cannot hold it all. Therefore, some of the data is swapped out to disk and thus not present in physical memory.
For example, let's simplify your case by assuming that virtual memory has 8 pages but physical memory can only hold 4 pages of data. If the process has data on virtual pages [0, 1, 2, 3, 4], physical memory clearly cannot hold all 5 pages. Therefore one of the virtual pages will be put on disk, and the system's memory mapping will be something like [0->2, 1->1, 2->3, 4->0]; in this case virtual page 3 has been swapped out to disk.
Swapped-out pages are only brought back to main memory by the OS when the program needs the data, and one of the pages currently in main memory has to be swapped out to make space. Algorithms to determine which page to evict are another topic (for example, LRU or the clock algorithm).
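A toy sketch of that example mapping in C, with a present bit per entry (the struct layout and the numbers are invented for illustration; real page tables use a hardware-defined format):

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_VPAGES 8

/* One entry per virtual page: either it names a physical frame,
 * or it records where the page sits on disk. */
struct pte {
    bool present;
    unsigned frame;      /* valid only when present     */
    unsigned disk_slot;  /* valid only when swapped out */
};

/* The mapping from the example above: 0->2, 1->1, 2->3, 4->0,
 * with virtual page 3 swapped out to disk. */
static struct pte page_table[NUM_VPAGES] = {
    [0] = { .present = true,  .frame = 2 },
    [1] = { .present = true,  .frame = 1 },
    [2] = { .present = true,  .frame = 3 },
    [3] = { .present = false, .disk_slot = 7 },
    [4] = { .present = true,  .frame = 0 },
};

int main(void)
{
    for (unsigned v = 0; v < 5; v++) {
        if (page_table[v].present)
            printf("virtual page %u -> physical frame %u\n", v, page_table[v].frame);
        else
            printf("virtual page %u -> page fault (on disk, slot %u)\n",
                   v, page_table[v].disk_slot);
    }
    return 0;
}
```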
In reality the memory system is more complicated than this scenario, because modern operating systems allow multiple processes to run at once, and the OS itself uses other techniques (setting thresholds to trigger page swapping, for example) to make the memory system more efficient.
The memory management unit translates LOGICAL addresses to PHYSICAL addresses.
It does that translation using PAGE TABLES defined by the operating system. The format of page tables varies among systems, and there are at least three major approaches processors take to define them. Generally, a processor will have one or more privileged registers that point to the page tables for the current process. These registers are normally loaded as part of the context switch that brings in a new process.
In the simple case a page table is just an array that contains the mapping between logical and physical addresses. Some number of high-order bits of an address serves as an index into the page table; the corresponding page table entry specifies the physical page frame the logical page is mapped to.
Some number of low-order bits of the address serves as the offset into the physical page frame.
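As a rough sketch in C (assuming a single-level table and 4 KB pages; a real MMU does this in hardware, often with multiple levels):

```c
#include <stdint.h>

#define PAGE_SHIFT 12                      /* 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define PAGE_MASK  (PAGE_SIZE - 1)

/* page_table[p] holds the physical page frame number for logical page p. */
uint32_t translate(uint32_t logical, const uint32_t *page_table)
{
    uint32_t page   = logical >> PAGE_SHIFT;  /* high-order bits index the table */
    uint32_t offset = logical &  PAGE_MASK;   /* low-order bits pass through      */
    uint32_t frame  = page_table[page];       /* the page table entry             */
    return (frame << PAGE_SHIFT) | offset;    /* physical address                 */
}
```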
The operating system maintains the page tables in the format the MMU expects them to be in. The MMU does the translation between logical addresses and physical addresses transparently.
The disk block size is irrelevant to this translation.
The book Understanding the Linux Kernel, 3rd Edition, by Daniel P. Bovet and Marco Cesati, talks about the advantages of paging in Chapter 2, Memory Addressing.
One of the advantages it lists is:
Distinguish pages (groups of data) from page frames (physical addresses in main memory). This allows the same page to be stored in a page frame, then saved to disk and later reloaded in a different page frame. This is the basic ingredient of the virtual memory mechanism.
I am unable to fully understand this. Does it mean that when a swapped page is loaded back into physical memory, its virtual address remains the same but the physical address changes?
A process address space is organized into logical pages. A logical page may be mapped to a physical page frame.
Does it mean that when a swapped page is loaded back into physical memory, its virtual address remains the same but the physical address changes?
It means a lot more than that. However, yes, a logical page may be mapped to different physical page frames over time.
I've been trying to understand virtual memory but when I get into the real specifics of it I just get confused. I understand (or feel like I do) the fact that virtual memory is a way for a process to "think" that it has a certain amount of memory allocated to it. The virtual address space is partitioned into equal sized pages, the physical memory is partitioned into equal sized frames, and the pages map to the frames.
But like... when does this happen? In this diagram, the CPU is running Program P. That means that a part of P was already in the physical memory, correct? (Since the CPU only has access to the physical/main memory.) So what exactly is being pointed at by the CPU? I see that it's a page in the virtual memory space, so like... what exactly does this page represent? Is it an instruction? Are we moving an instruction from virtual memory to physical memory, so that more of the program is in physical memory (that hadn't been needed up until that point)? Am I way off? Can someone explain this to me?
The diagram shows the process of translating a virtual address to a physical address.
The fat arrow from Program P to CPU symbolizes the program being "fed" into the CPU.1
The CPU "points" to a virtual address used by an instruction to address a memory location in the program P. It is divided into two parts:
Page Table Index (p): the virtual address contains an index into the page table, which maps a page to a page frame (f). For a description of the mechanism, including multi-level paging, read this.
Offset (o): as you can see, the offset is appended directly to the physical frame address, since translation works at the granularity of a page, not a byte; the offset just selects the byte within the page frame.
Finally, the calculated address is used to address a memory location in physical memory.
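A quick worked example (page size and numbers chosen purely for illustration): with 4 KB pages, virtual address 0x12345 splits into p = 0x12 and o = 0x345; if the page table maps page 0x12 to frame 0x7A, the resulting physical address is (0x7A << 12) | 0x345 = 0x7A345.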
1 "fed" means "read (pronounced like red) from secondary storage into RAM and executing the program instruction by instruction".
I would not bother trying to understand that diagram because it makes no sense.
It is titled "Paging" but the diagram does not show paging at all.
What you are missing is that there are two steps. First there is logical memory translation (which is what the diagram kinda, sorta shows).
Physical memory is arranged in an array of PAGE FRAMES of some fixed size (e.g., 1K, 4K).
Each process has a LOGICAL ADDRESS SPACE consisting of PAGES that match the page frame size.
The logical address space is defined by a PAGE TABLE managed by the operating system. The page table maps logical pages to physical page frames.
If there are two processes (X and Y), logical address Q in process X and logical address Q in process Y map to different physical page frames in most cases.
Usually there is a range of logical addresses that are assigned to the SYSTEM ADDRESS SPACE. Those logical pages map to the same physical page for all processes.
Processes only address logical pages. They have no knowledge of physical pages. The Program Counter register always refers to logical addresses. The CPU automatically translates logical pages to physical page frames. The translation is entirely transparent to the process. The operating system is the only thing that has any knowledge of physical page frames, but it only manages the page tables; the CPU does the translation.
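A toy illustration of that split, using made-up frame numbers (real systems do this through the page-table mechanism described above, not literal arrays):

```c
/* 8 logical pages per process; the top 2 form the shared "system" range. */
#define NPAGES       8
#define SYSTEM_START 6

/* Per-process page tables: user pages map to different frames...          */
unsigned page_table_x[NPAGES] = { 10, 11, 12, 13, 14, 15, /* system: */ 100, 101 };
unsigned page_table_y[NPAGES] = { 20, 21, 22, 23, 24, 25, /* system: */ 100, 101 };
/* ...but logical pages 6 and 7 map to the same physical frames (100, 101)
 * in both processes, so kernel code and data appear at the same logical
 * addresses in every address space.                                        */
```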
Paging is something different but related.
When a program accesses a memory address, the CPU attempts to translate that into a physical address within a page frame. Several steps occur.
The CPU locates the page table entry for the requested page.
There may not be a page table entry at all for the page. The diagram shows a contiguous logical-to-physical mapping. That rarely occurs. The logical address space usually has clusters of valid pages with gaps between them. If there is no page table entry for the address, the CPU triggers an exception.
The CPU reads the page table entry to determine if it references a valid page frame.
There may be an entry for the page that has not been mapped to the logical address space (e.g., the first page is usually not mapped to trap null pointer errors). If the page has not been mapped, that triggers an exception.
The CPU checks whether the access is permitted for the current processor mode.
Read/Write/Execute protection can be set for a page and access can be restricted by mode (kernel mode, user mode, or some other mode in some processors).
If the access is not permitted, the CPU triggers an exception.
[Here is where paging comes in] The CPU checks whether the page has been mapped to a physical page frame. If not, the CPU triggers a PAGE FAULT. The OS responds by locating where the page is stored in a paging file, mapping the page to a free physical page frame, loading the data from the paging file into memory, and then restarting the instruction.
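Put together, the sequence of checks looks roughly like this (an illustrative sketch in C; the field and function names are invented, not any particular CPU's page table format):

```c
enum fault {
    FAULT_NONE,              /* translation succeeds                   */
    FAULT_NO_ENTRY,          /* gap in the logical address space       */
    FAULT_NOT_MAPPED,        /* entry exists but page was never mapped */
    FAULT_PROTECTION,        /* access not permitted in current mode   */
    FAULT_PAGE_NOT_PRESENT   /* page fault: OS must bring the page in  */
};

struct pte {
    unsigned valid    : 1;   /* entry describes a mapped logical page  */
    unsigned present  : 1;   /* page currently resides in a page frame */
    unsigned writable : 1;
    unsigned user     : 1;   /* accessible from user mode              */
    unsigned frame    : 20;  /* physical page frame number             */
};

/* Mirror the steps above: locate the entry, check validity, check
 * protection, and only then check whether the page is resident. */
enum fault check_access(const struct pte *pte, int is_write, int is_user)
{
    if (!pte)
        return FAULT_NO_ENTRY;
    if (!pte->valid)
        return FAULT_NOT_MAPPED;
    if ((is_write && !pte->writable) || (is_user && !pte->user))
        return FAULT_PROTECTION;
    if (!pte->present)
        return FAULT_PAGE_NOT_PRESENT;
    return FAULT_NONE;
}
```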
I guess most of your confusion comes from the fact that the above diagram is a little bit misleading.
Please note that the missing IP register and the extra text at both 'Tables' are the problematic parts. The rest is OK.
Below I show the same diagram, fixed, so that it makes more sense.
As others have already told you, the above diagram is just the translation scheme for the addresses that the CPU uses to fetch the actual instructions & operands from P's virtual address space. Did you see it? It's all about addresses and nothing else!!!
It shows you how virtual addresses are managed by the CPU (in a paged scheme) in order to reach the next instruction or operand from the real physical memory using physical addresses.
The explanation of 'Downvoter' is great for that, so no need to add anything else.
In 64-bit Linux, IA-32e paging is used with 4 levels of paging structures (PML4/PDPT/PD/PT). The entries in the first three structures give the physical address of the corresponding next-level structure. My question is: will the physical addresses of all these paging structures themselves be mapped in the page table? If they are mapped, in which mode (User/Supervisor)? Thanks very much!
I captured some specific memory addresses which a vCPU has accessed during a period in KVM. These addresses are in gfn (guest physical frame number) form. I wanted to tell whether these gfns were mapped in the kernel or in userspace. So I traversed the guest's (virtual machine's) page table to find the corresponding page table entries mapping these gfns. See my previous question here.
I found that the physical addresses of some paging structures are mapped in the page table while some are not. That is, the physical addresses of some paging structures (such as the address of a PT given by a PDE) do not have a valid corresponding PTE in the page table. Since I have changed the memory mechanism of KVM a lot, I am afraid that this phenomenon may be caused by my code, or that there is something wrong with my page-table-walking code.
So I want to know how these things are handled in a normal Linux.
Thanks very much!
In 64-bit Linux, all physical addresses are always mapped with a Supervisor mapping in the kernel half of the address space.
You can convert a physical address to the corresponding virtual address in the linear kernel mapping by adding PAGE_OFFSET, which on x86-64 is 0xffff880000000000.
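In code that conversion is just an addition or subtraction; the kernel's __va()/__pa() helpers boil down to this (a sketch that ignores details such as the separate kernel text mapping and holes in physical memory):

```c
/* Linear ("direct") kernel mapping on x86-64:
 *   kernel virtual address = physical address + PAGE_OFFSET */
#define PAGE_OFFSET 0xffff880000000000UL

static inline void *phys_to_virt_linear(unsigned long phys)
{
    return (void *)(phys + PAGE_OFFSET);
}

static inline unsigned long virt_to_phys_linear(const void *virt)
{
    return (unsigned long)virt - PAGE_OFFSET;
}
```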
Are you sure that you are correctly handling 1GB and 2MB "huge pages" in your page table walker?
In normal Linux, CR3 contains the physical address of the frame containing the PML4 of the page table. A field of bits from the virtual address is used as an index into that frame, and the entry at that index contains the physical address of the frame for the next level. Repeating this at each level eventually reaches the page frame containing the desired data, with the lowest bits of the virtual address giving the offset inside it. The addresses of the frames holding the page-table structures themselves are not mapped in any page table.
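A hedged sketch of that walk in C (simplified: no huge pages or permission checks; the read_phys callback stands in for however you read guest physical memory, e.g. via KVM's gfn-to-hva machinery):

```c
#include <stdint.h>

#define ENTRY_PRESENT 0x1ULL
#define ADDR_MASK     0x000ffffffffff000ULL  /* bits 51:12 hold the next frame's PA */

/* Callback standing in for however physical (or guest physical) memory is read. */
typedef uint64_t (*read_phys_fn)(uint64_t paddr);

/* Walk PML4 -> PDPT -> PD -> PT for one virtual address.
 * Returns the physical address, or 0 if a level is not present. */
uint64_t walk(uint64_t cr3, uint64_t vaddr, read_phys_fn read_phys)
{
    uint64_t table = cr3 & ADDR_MASK;

    for (int level = 3; level >= 0; level--) {
        unsigned idx   = (vaddr >> (12 + 9 * level)) & 0x1ff; /* 9 index bits per level */
        uint64_t entry = read_phys(table + idx * 8);          /* each entry is 8 bytes  */

        if (!(entry & ENTRY_PRESENT))
            return 0;                       /* hole: not mapped at this level */
        table = entry & ADDR_MASK;          /* PA of next level, or of the final frame */
    }
    return table | (vaddr & 0xfff);         /* add the page offset */
}
```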
In the case of KVM, guest physical pages are backed by virtual addresses mmapped by the host. The addresses used by the guest need mapping to physical frames, which is the duty and discretion of the host kernel. So the host kernel can map some pages and not others according to its own algorithm. Hence it is a quite natural and correct phenomenon that some gfns are mapped while others are not.
In almost all books and articles I have read about HIGHMEM in the Linux kernel, they say that with the 3:1 split, not all of the 1 GB is available to the kernel for mapping. Normally it's 896 MB or so, with the rest being used for kernel data structures, memory maps, page tables and such.
My question is, what exactly are these data structures? Page tables are normally accessed via a page table address register, right? And the base address of the page table is normally stored as a physical address. So why does one need to reserve virtual address space for the entire table?
Similarly, I read about the kernel code itself occupying space. What does that have to do with virtual address space? Isn't it physical memory that would be consumed for storing the code?
And finally, why do these data structures have to reserve the 128 MB space? Why can't they be allocated out of the entire 1 GB address space, as required, like any other normal data structure in the kernel?
I have gone through LDD3, Professional Linux Kernel Architecture, and several posts here on Stack Overflow (like: Why Linux Kernel ZONE_NORMAL is limited to 896 MB?) and an older LWN article, but found no specific information on this.
With regards to page tables, it's true that the MMU wouldn't care if the page tables themselves weren't mapped in the virtual address space - for the purposes of address translations, that would be OK. But when the kernel needs to modify the page tables, they do need to be mapped in the virtual address space - and the kernel can't just map them in "just in time", because it needs to modify the page tables themselves to do that. It's a chicken-and-egg problem, which means that the page tables need to remain mapped at all times.
A similar problem exists with the kernel code. For the code to execute, it must be mapped in the virtual address space - and if the code that does the page table modification were itself not present, we'd have a similar chicken-and-egg problem. Given this, it's easier to leave the entirety of the kernel code mapped all the time, along with the kernel-mode stacks and any kernel data structures accessed by code where you wouldn't want to potentially take a page fault. One large example of such data structures is the array of struct page structures, representing each physical memory page.
The 128 MB reserve is not for a specific data structure that always uses it.
It's virtual address space reserved for various users that might need it. Normally, not all of it is used.
About physical and virtual memory: every allocation needs three things - a physical page, a virtual page, and a mapping connecting the two. Linux almost never uses physical addresses directly; it always goes through virtual address translation.
For most kernel memory allocations (so-called lowmem), this translation is quite simple: subtract some constant from the virtual address to get the physical one. But still, a virtual address is used.
Linux's memory management was written when the virtual address space (4 GB) was much larger than the physical memory, even on the largest machines. In such cases, wasting virtual addresses is not a problem. Today, when physical memory is large, this leads to inefficiencies and problems.
The vmalloc virtual address range is used by any caller of vmalloc. For example:
1. Loading kernel drivers (using modprobe or insmod).
2. Kernel modules often allocate with vmalloc. The alternative function kmalloc used to be limited to 128K, and it rounds the size up to a power of 2, so vmalloc is often preferred for large allocations.
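As a rough illustration of the two allocators (a sketch, not taken from any particular driver):

```c
#include <linux/slab.h>     /* kmalloc, kfree */
#include <linux/vmalloc.h>  /* vmalloc, vfree */

static void allocation_example(void)
{
    /* Small allocation: physically contiguous, served from lowmem. */
    char *small = kmalloc(4096, GFP_KERNEL);

    /* Large allocation: the underlying pages may be scattered physically;
     * the reserved vmalloc range supplies a contiguous *virtual* window
     * to map them into. */
    char *large = vmalloc(4 * 1024 * 1024);

    kfree(small);   /* both helpers are NULL-safe */
    vfree(large);
}
```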