This might be a silly question but it just popped up in my mind. All the text about process address space and virtual memory layout mentions that the process address space has
space reserved for kernel. For e.g. on 32 bit systems the process address space is 4GB of which 1 GB is reserved for kernel in Linux (Might be different on other OS).
I am just wondering why kernel is said to be in the process address space when a process cannot address the kernel directly. Why don't we say that the kernel has a separate address space than a process and why can't we have a different page table for kernel itself which is separate from the page tables of the processes?
When the process makes a system call, we don't need to switch the page tables (from process address space page table to kernel address space page table) for servicing the system call (which should be done only in kernel mode). This is said to be that the kernel is running in the process context.
Some kernel events which won't run in process context will load the page tables only for kernel.
Got it ?
Related
I think I'm missing a fundamental concept of how the OS manages memory.
OS is responsible for keeping track of what parts of physical memory are free.
OS creates and manages page tables, which have mappings between virtual to physical addresses.
For each instruction that specifies a virtual address, the hardware reads the page table to get the corresponding physical address. One way the hardware may know the location of the current page table is by a register that the OS updates.
This makes sense for how processes access memory. However, I don't understand how the OS itself accesses memory.
Assuming it uses the same instructions, the hardware would still be translating from virtual addresses to physical? Is there, for example, a known physical location for a page table for the OS itself? I know the question is murky, having trouble even understanding what to ask.
At some point there has to be a page table in a physical location. The method used for this depends upon the processor.
Let me give a simple example based upon the VAX processor. Suppose you divide the logical address space into a system range shared by all processes and a user range that is unique to each process. Then give each of those ranges its own page table.
Now you can place the user page table in the system address range of the logical address space.
If you access memory in the user space, you go to page table that the system finds at a logical address in the system space, that the processor then had to translate into a physical address using the page table for the system space; a two level translation.
If you use logical addresses for the the system space page table then you'd have no way of translating those into physical addresses. Instead, the local of the system page table is defined using physical addresses.
Another approach would be to define all page tables using physical addresses.
I don't understand how the OS itself accesses memory.
Think of the operating system as a process. The OS basically is a process, just like other processes, with elevated privileges. Whenever the OS wants to use eome memory location,it uses page tables for virtual to physical address translation, just like other processes would.
Think of it this way: Every process has a page table of its own, the same goes for the OS. The OS remembers the location of these page tables in the control structures associated with each process (e.g. the PCB), and for the currently running process the address (physical pointer) to the page table is kept in hardware (for the x86 architecture this is in control register 4 (CR4)). On x86, whenever the OS switches the running process it changes the value in CR4 so that the address points to the correct page table (its own if it switches to itself).
However, this is greatly simplified in modern operating systems, where the kernel (the OS) is mapped into the memory space of all processes, so that the kernel can run whenever it wants without having to switch page tables (which is costly). The pages in a process' page table belonging to the kernel are restricted to the process, and only accessible once the kernel takes control to do some management task.
If there is a 32 bit system (assume Windows), the virtual address space is 4GB. So CPu can generate any address between this range. Then shoudn't a process also be able to address anywhere in this range?
It is said that each process has its own private virtual address space.Then How does the system facilitate this?
In other words the CPU generates a 32 bit address, and that gets translated into physical address. Now how does CPU know that a specific process has to address only a specific part of the virtual address space(its private virtual address space).
Suppose a process addresses an address out of its private virtual address space, what happens?
A program has to call VirtualAlloc() on Windows to tell the operating system that it wants to use a chunk of virtual memory. Often called indirectly as a result of allocating memory from a heap or loading a DLL.
The operating system, in turn, sets up the page mapping tables that the CPU uses to translate a virtual address as used in the program to a physical RAM address as output on its address bus pins. One of three unusual things can happen whenever the CPU reads or writes data or executes code at a virtual memory address:
if there is no entry in the page mapping tables then the CPU raises a general protection fault trap. The operating system verifies that the address is invalid and terminates the program
if the page is not mapped to RAM yet then the CPU raises a page fault trap. The operating system finds a page of RAM that's unused, swapping out a used page if necessary. And ensures the content is valid, loading it from a file or the paging file if necessary. And updates the table entry so it now has the physical address of the RAM page. Execution resumes as normal
the CPU verifies that access to the page is allowed. A write to a page that is marked as read-only or an execute of a instruction in the page that's marked as no-execute generates a general protection fault trap. The operating system terminates the program.
Every process has its own set of page mapping tables, ensuring that one process cannot access the RAM pages that are used by another. Unless sharing is specifically requested, common for pages of code loaded from an executable file and memory mapped files. A context switch loads the CR2 register, the CPU register that contains the address of the page mapping table.
So there is no scenario where a process can ever address memory outside of its private virtual address space, the lack of a matching paging table entry ensures that this terminates the program.
The whole 4 GB address space is available to the process (although typically the upper half is reserved for kernel data), and the MMU maps parts of it to physical memory. The process cannot go "out" of its address space (all the 4 GB of it are allowed to be used), but if some part of it hasn't been mapped to physical memory a hardware exception is raised.
The address space is said to be private since the operating system changes the settings of the MMU at task switch, so every process sees a different independent memory layout (although parts of the address space can be shared with other processes).
In Linux OS, after enable the page table, kernel will only map PTEs belong to kernel space once and never remap them again ? This action is opposite with PTEs in the user space which needs to remap every time process switching happening ?
So, I want know the difference in management of PTEs in kernel and user space.
This question is a extended part from the question at:
Page table in Linux kernel space during boot
Each process has its own page tables (although the parts that describe the kernel's address space are the same and are shared.)
On a process switch, the CPU is told the address of the new table (this is a single pointer which is written to the CR3 register on x86 CPUs).
So, I want know the difference in management of PTEs in kernel and user space.
See these related questions,
Does Linux use self map for page tables?
Linux Virtual memory
Kernel developer on memory management
Position independent code and shared libraries
There are many optimizations to this,
Each task has a different PGD, but PTE values maybe shared between processes, so large chunks of memory can be mapped the same for each process; only the top-level directory (CR3 on x86, TTB on ARM) is updated.
Also, many CPUs have a TLB and cache. These need to be maintained with the memory mapping. Some caches are VIVT, VIPT and PIPT. The first two have to have some cache flushing iff the PGD and/or PTE change. Often a CPU will support a process, thread or domain id. The OS only needs to switch this register during a context switch. The hardware cache and TLB entries must contains tags with the process, thread, or domain id. This is an implementation detail for each architecture.
So it is possible that TLB flushes could be needed when a top level page registers changes. The CPU could flush the entire TLB when this happens. However, this would be a disadvantage to pages that remain mapped.
Also, sub-sections of memory can be the same. A loader or other library can use mmap to create code that is similar between processes. This common code may not need to be swapped at the page table level, depending on architecture, loader and Linux version. It could of course have a virtual alias and then it needs to be swapped.
And the final point to the answer; kernel pages are always mapped. Only a non-preemptive OS could not map the kernel, but that would make little sense as every process wants to call the kernel. I guess the micro-kernel paradigm allows for device drivers to unload when they are not in use. Linux uses module loading to handle this.
When loading the contents of a virtual address into a particular register, what are some general sequence of events that need to happen in the hardware and operating system as part of the process?
For example,
LD 0xffe4ca32, R1
The address used for this is the virtual address right?
And it would need to go through some address translation first to get a physical address.
My first question is,
When this instruction executes, how is this instruction handled by the Hardware and Operating System?
And my second question is,
Is the "value" of that virtual address, 0xffe4ca32, the contents of its mapped physical address or is it the physical address itself?
Im just not clear what is being loaded into R1
Here:
Let's assume x86. First, the CPU asks the MMU (memory management unit) to to translate the address. First the MMU checks something called the TLB (translation look-aside buffer), where recent translations from virtual to physical are stored. If it is there, the referenced address is returned. Otherwise, the MMU looks up the address in the page table. If the page is either a supervisor only page, or a page marked as not present in memory, the CPU throws a protection fault, or a page fault. For the protection fault, the OS will usually terminate the responsible process however it does that. For a page fault, the OS then checks it's own special paging structures to see if that page has been paged out, or if it just doesn't exist. If it has been paged out, it is read in to some page somewhere in memory, and the virtual address is remapped to that new place. If space cannot be found, another page will be put on disk to make room (a lot of this is called thrashing). If it has not been paged out, the OS will most likely kill the process, as it is trying to reference a non existing page.
Value of mapped physical address. Virtual memory pointers behave just like physical memory pointers in the perspective of user-space. In kernel space, there are some complications as physical memory access is needed (this is usually achieved through something called identity paging, where the first few hundred pages are mapped directly to their corresponding physical memory.
If I understand correctly, a memory adderss in system space is accesible only from kernel mode. Does it mean when components mapped in system space are executed the processor must be swicthed to kernel mode?
For ex: the virtual memory manager is a frequently used component and is mapped in system space. Whenever the VMM runs in the context of user process (lets say it translated an address), does the processor must be swicthed to kernel mode?
Thanks,
Suresh.
Typically, there's 2 parts involved.The MMU(Memory manage unit) which is a hardware component that does the translation from virtual addresses to physical addresses. And the operating system VM subsystem.
The operating system part needs to run in privileged mode (a.k.a. kernel mode) and will set up/change the mapping in the MMU based on the the user space needs.
E.g. to request more (virtual) memory, or map a file into memory, a transition to kernel mode is needed and the VM subsystem can change the mapping of the process.
Around this there's often a ton of tricks to be made - e-g. map the whole address space of the kernel into the user process virtual space, but change its access so the process can't use that memory - this means whenever you transit to kernel mode you don't need to reload the mapping for the kernel.
Taking your example of the virtual memory manager, it never actually runs in user space. To allocate memory, user mode applications make calls to the Win32 API (NTDLL.DLL as one example) to routines such as VirtualAlloc.
With regards to address translation, here's a summary of how it works (based on the content from Windows Internals 5th Edition).
The VMM uses page tables which the CPU uses to translate virtual addresses to physical addresses. The page tables live in the system space. Each table contains many PTEs (page table entries) which stores the physical address to which a virtual address is mapped. I won't go into too much detail here, but the point is that all of the VMM's work is performed in system space and not in user space.
As for context switching - when a thread running in user space needs to run in the system space, then a context switch will occur. Since the memory manager lives in system space, it's threads never need to make a context switch, since it already lives in the system space.
Apologies for the simplistic explanation, this is quite a complicated topic of discussion in depth. I would highly recommend that you pick up a copy of Windows Internals as this sounds like it would come in handy for you.