The remap_pfn_range() function (used in a driver's mmap handler) can be used to map kernel memory into user space. How is this done? Can anyone explain the precise steps? Kernel mode is a privileged mode (PM), while user space is non-privileged (NPM). In PM the CPU can access all memory, while in NPM some memory is restricted and cannot be accessed by the CPU. When remap_pfn_range() is called, how does a range of memory that was restricted to PM become accessible from user space?
Looking at the remap_pfn_range() code, there is a pgprot_t struct. This is a protection-mapping-related struct. What is protection mapping? Is it the answer to the question above?
It's simple really: kernel memory (usually) simply has a page table entry with the architecture-specific bit that says "this page table entry is only valid while the CPU is in kernel mode".
What remap_pfn_range() does is create another page table entry, with a different virtual address to the same physical memory page, that doesn't have that bit set.
Usually, it's a bad idea btw :-)
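To make that concrete, here is a minimal sketch in plain C (not real kernel code; the bit positions are the x86 ones, and make_pte is invented for the example) of two PTEs that reference the same physical frame but differ only in the user/supervisor bit:

    #include <stdint.h>

    #define PTE_PRESENT (1ULL << 0)   /* page is mapped */
    #define PTE_WRITE   (1ULL << 1)   /* page is writable */
    #define PTE_USER    (1ULL << 2)   /* x86 User/Supervisor bit: set = user mode may access */

    /* Build a PTE for physical frame 'pfn'; 'user_accessible' decides
     * whether the mapping is usable from user mode. */
    uint64_t make_pte(uint64_t pfn, int user_accessible)
    {
        uint64_t pte = (pfn << 12) | PTE_PRESENT | PTE_WRITE;
        if (user_accessible)
            pte |= PTE_USER;   /* conceptually, what remap_pfn_range adds */
        return pte;
    }

    /* Same frame, two mappings:
     *   uint64_t kernel_pte = make_pte(pfn, 0);  // valid only at CPL < 3
     *   uint64_t user_pte   = make_pte(pfn, 1);  // same physical page, user-visible
     */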
The core of the mechanism is the MMU's page table:

(figure: http://windowsitpro.com/content/content/3686/figure_01.gif)

The picture above illustrates the x86 hardware MMU; it has nothing to do with the Linux kernel specifically.
Below is how the VMAs are linked to the process's task_struct:

(figure: http://image9.360doc.com/DownloadImg/2010/05/0320/3083800_2.gif; source: slideplayer.com)
And looking into the function itself here:
http://lxr.free-electrons.com/source/mm/memory.c#L1756
The data in physical memory can be accessed by the kernel through the kernel's PTEs, as shown below:

(figure: source tldp.org)
But after calling remap_pfn_range(), a PTE is created (for the existing kernel memory, but to be used from user space) with different page-protection flags. The process's VMA is updated to use this PTE to access the same physical memory, which avoids wasting memory on a copy. Kernel and user-space PTEs have different attributes, which are used to control access to the physical memory, and the VMA also specifies the attributes at the process level:
vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
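To see where the flags line above fits, here is a minimal sketch of a driver's mmap handler, assuming a hypothetical buffer my_buffer that was allocated elsewhere with kmalloc() (so it is physically contiguous):

    #include <linux/fs.h>
    #include <linux/io.h>
    #include <linux/mm.h>
    #include <linux/module.h>

    /* Hypothetical kernel buffer, assumed allocated elsewhere with kmalloc(). */
    static char *my_buffer;

    /* Minimal mmap handler: create user-space PTEs for the same physical
     * pages that back my_buffer. The protection bits for the new PTEs come
     * from vma->vm_page_prot, not from the kernel's own mapping. */
    static int my_mmap(struct file *filp, struct vm_area_struct *vma)
    {
        unsigned long size = vma->vm_end - vma->vm_start;
        unsigned long pfn  = virt_to_phys(my_buffer) >> PAGE_SHIFT;

        if (remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot))
            return -EAGAIN;
        return 0;
    }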
I know how vmalloc() works. When a process (in kernel space) wants to access memory that was allocated with vmalloc(), a page fault happens and the fault handler synchronizes the process's page table with the master kernel page table.
But when vfree() is invoked, how does the process update its page table to stay in sync with the master kernel page table? Or do I have some misunderstanding here?
Thanks.
Your understanding of memory allocation seems to be incorrect. No memory permanently belongs to vmalloc. A fixed virtual address range (in kernel space) is assigned to vmalloc at boot time. Later, when vmalloc() is called, virtual addresses are picked from that fixed range and physical pages are allocated from the buddy system.
Virtual addresses and physical pages are mapped one to one.
When vfree() is called, the virtual address range is freed again, and the physical pages are returned to the buddy system.
Hope this corrects your understanding.
I suggest you go through some online tutorials about kernel memory; I am also reading them now.
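A minimal kernel-module sketch of this lifecycle (demo_init and demo_exit are just illustrative names): vmalloc() takes virtual addresses from the fixed vmalloc range and backs them with buddy-system pages; vfree() unmaps the range and returns the pages:

    #include <linux/mm.h>
    #include <linux/module.h>
    #include <linux/vmalloc.h>

    static void *buf;

    static int __init demo_init(void)
    {
        /* Picks virtual addresses from the fixed vmalloc range and backs
         * them with (possibly non-contiguous) pages from the buddy system. */
        buf = vmalloc(4 * PAGE_SIZE);
        if (!buf)
            return -ENOMEM;
        return 0;
    }

    static void __exit demo_exit(void)
    {
        /* Unmaps the virtual range and returns the pages to the buddy system. */
        vfree(buf);
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");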
Just as each user-level process has its own virtual memory space whose pages are swapped out and in, are the Linux kernel's virtual memory pages swappable?
Kernel-space pages don't get paged in/out by design; they are pinned in memory. The pages in the kernel can usually be trusted from a security point of view, while user-space pages should NOT be trusted.
For this reason, in kernel code you can access kernel buffers directly without worrying about handling page faults, while the same is not true for user-space buffers.
Kernel-space pages cannot be paged out by design: consider what your system would do if the page containing the instructions for handling a page fault were itself paged out!
No, kernel memory is not swapped on Linux.
In Windows, the high part of every process's address space (above 0x80000000 or 0xC0000000) is reserved for kernel code. User code cannot access these regions of memory; if it tries, an access violation exception is thrown.
I wish to know: how is kernel space protected?
Is it via memory segmentation or via paging?
I would like to hear a technical explanation.
Thanks a lot,
Michael.
Assuming you are talking about x86 and x64 architectures.
Memory protection is achieved using the paging system. Each page table entry on an x86/x64 CPU has a bit to indicate whether it is a user or supervisor page. Accesses to supervisor pages are only permitted for code running with CPL < 3, whereas accesses to non-supervisor pages are possible at any CPL.
CPL is the "Current Privilege Level" which is sometimes referred to as Ring. Windows only uses two rings, although the CPU implements 4. Ring 0 is the CPU mode in which what Windows refers to as "kernel mode" runs. Ring 3 is the CPU mode in which "User mode" runs. Since code running at CPL=3 cannot access supervisor pages, this is how memory protection is implemented.
The answer for ARM is likely to be similar, but different.
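As a rough illustration of the x86/x64 check described above (not any vendor's actual logic; PTE_PRESENT and PTE_USER stand for the present and user/supervisor bits of a page table entry):

    #include <stdbool.h>
    #include <stdint.h>

    #define PTE_PRESENT (1ULL << 0)
    #define PTE_USER    (1ULL << 2)   /* x86 User/Supervisor bit */

    /* Sketch of the check the paging hardware performs on each access. */
    bool access_allowed(uint64_t pte, int cpl)
    {
        if (!(pte & PTE_PRESENT))
            return false;              /* not mapped: page fault */
        if (!(pte & PTE_USER) && cpl == 3)
            return false;              /* supervisor page touched from ring 3 */
        return true;
    }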
That's an easy one and doesn't require talking about rings or kernel behaviour. Accessing virtual memory at a particular address requires that address to be mapped; the operating system has to allocate a memory page for that address. The low-level winapi function that does this is VirtualAlloc(), which takes an optional address as its first argument. The OS will simply fail a request for an unmappable address. It is the exact same mechanism that prevents you from mapping any address in the lowest 64KB of the address space.
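A small user-mode sketch of that behaviour (Win32 API; the exact failing address is just an example):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Asking for a reservation inside the lowest 64KB fails... */
        void *low = VirtualAlloc((LPVOID)0x1000, 4096,
                                 MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        printf("at 0x1000: %p (expected NULL)\n", low);

        /* ...while letting the OS choose the address succeeds. */
        void *ok = VirtualAlloc(NULL, 4096,
                                MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        printf("OS-chosen: %p\n", ok);
        if (ok)
            VirtualFree(ok, 0, MEM_RELEASE);
        return 0;
    }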
In Linux, after paging is enabled, does the kernel map the PTEs belonging to kernel space only once and never remap them again? And is this the opposite of PTEs in user space, which need to be remapped every time a process switch happens?
So I want to know the difference in the management of PTEs between kernel and user space.
This question is an extension of the question at:
Page table in Linux kernel space during boot
Each process has its own page tables (although the parts that describe the kernel's address space are the same and are shared).
On a process switch, the CPU is told the address of the new table (this is a single pointer which is written to the CR3 register on x86 CPUs).
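For x86-64 with GCC inline assembly, the heart of that switch can be sketched in a couple of lines (illustrative only; in the real kernel this happens inside switch_mm()):

    /* Install a new process's page-table base. 'pgd_phys' is the physical
     * address of the new top-level directory. */
    static inline void load_page_table(unsigned long pgd_phys)
    {
        /* Writing CR3 installs the new table and, as a side effect,
         * flushes non-global TLB entries. */
        asm volatile("mov %0, %%cr3" : : "r"(pgd_phys) : "memory");
    }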
See these related questions,
Does Linux use self map for page tables?
Linux Virtual memory
Kernel developer on memory management
Position independent code and shared libraries
There are many optimizations to this:

Each task has a different PGD, but PTE values may be shared between processes, so large chunks of memory can be mapped the same way for each process; only the top-level directory (CR3 on x86, TTB on ARM) is updated.

Also, many CPUs have a TLB and caches, which must be kept consistent with the memory mapping. Caches may be VIVT, VIPT or PIPT; the first two need some cache flushing if the PGD and/or PTEs change. Often a CPU supports a process, thread or domain ID: the OS only needs to switch this register during a context switch, and the hardware cache and TLB entries carry tags with the process, thread or domain ID. This is an implementation detail of each architecture.

So it is possible that TLB flushes are needed when the top-level page register changes. The CPU could flush the entire TLB when this happens, but that penalizes pages that remain mapped.

Also, sub-sections of memory can be the same between processes. A loader or other library can use mmap to create code that is shared between processes. This common code may not need to be switched at the page-table level, depending on the architecture, loader and Linux version. It could of course live at a different virtual alias in each process, and then it does need to be switched.

And the final point of the answer: kernel pages are always mapped. Only a non-preemptive OS could get away with not mapping the kernel, but that would make little sense, as every process wants to call into the kernel. I guess the micro-kernel paradigm allows device drivers to be unloaded when they are not in use; Linux uses module loading to handle this.
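A conceptual sketch of why kernel pages are "always mapped" (the constants and the function name are illustrative, not the real kernel API): when a new process's top-level directory is created, the kernel half is copied from the master kernel table, so every process shares the same kernel mappings:

    /* Illustrative constants for x86-64: 512 top-level entries, of which
     * the upper half describes kernel space. */
    #define PTRS_PER_PGD     512
    #define KERNEL_PGD_START 256

    void init_process_pgd(unsigned long *new_pgd, const unsigned long *master_pgd)
    {
        int i;

        for (i = 0; i < KERNEL_PGD_START; i++)
            new_pgd[i] = 0;                 /* user half: starts out empty */
        for (i = KERNEL_PGD_START; i < PTRS_PER_PGD; i++)
            new_pgd[i] = master_pgd[i];     /* kernel half: shared by all */
    }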
When loading the contents of a virtual address into a particular register, what is the general sequence of events that needs to happen in the hardware and operating system?
For example,
LD 0xffe4ca32, R1
The address used for this is the virtual address, right?
And it would need to go through some address translation first to get a physical address.
My first question is,
When this instruction executes, how is it handled by the hardware and operating system?
And my second question is,
Is the "value" of that virtual address, 0xffe4ca32, the contents of its mapped physical address or is it the physical address itself?
Im just not clear what is being loaded into R1
Taking the two questions in order:
Let's assume x86. First, the CPU asks the MMU (memory management unit) to translate the address. The MMU first checks the TLB (translation look-aside buffer), where recent virtual-to-physical translations are cached; if the translation is there, the referenced physical address is returned. Otherwise, the MMU looks up the address in the page table. If the page is a supervisor-only page, or a page marked as not present in memory, the CPU raises a protection fault or a page fault respectively. For a protection fault, the OS will usually terminate the responsible process. For a page fault, the OS checks its own paging structures to see whether that page has been paged out or simply doesn't exist. If it has been paged out, it is read back into some free page frame and the virtual address is remapped to that new location; if no free frame can be found, another page is written out to disk to make room (doing this excessively is called thrashing). If the page was never mapped at all, the OS will most likely kill the process, as it is referencing a non-existent page.
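The same sequence as a C pseudocode sketch; every helper function and the pte_t layout here are invented for illustration:

    /* Layout of a page table entry as this sketch imagines it. */
    typedef struct {
        int present;              /* page is in memory */
        int user;                 /* user mode may access it */
        int paged_out;            /* page exists but lives on disk */
        unsigned long frame;      /* physical frame number */
    } pte_t;

    /* Hypothetical MMU/OS services assumed by this sketch. */
    int    tlb_lookup(unsigned long vaddr, unsigned long *paddr);
    void   tlb_insert(unsigned long vaddr, unsigned long paddr);
    pte_t *page_table_walk(unsigned long vaddr);
    void   raise_protection_fault(void);    /* does not return */
    void   raise_page_fault_kill(void);     /* does not return */
    void   page_in(unsigned long vaddr);    /* reads page back, updates PTE */

    unsigned long translate(unsigned long vaddr, int cpl)
    {
        unsigned long paddr;

        if (tlb_lookup(vaddr, &paddr))          /* 1. recent translation cached? */
            return paddr;

        pte_t *pte = page_table_walk(vaddr);    /* 2. consult the page table */

        if (!pte->user && cpl == 3)
            raise_protection_fault();           /* 3a. supervisor page from ring 3 */

        if (!pte->present) {
            if (pte->paged_out)
                page_in(vaddr);                 /* 3b. OS reads the page back in */
            else
                raise_page_fault_kill();        /* 3c. no such page: kill process */
        }

        paddr = (pte->frame << 12) | (vaddr & 0xfff);
        tlb_insert(vaddr, paddr);               /* cache the translation */
        return paddr;
    }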
The value at the mapped physical address. Virtual-memory pointers behave just like physical-memory pointers from the perspective of user space. In kernel space there are some complications, as access to specific physical memory is sometimes needed (this is usually achieved through something called identity paging, where the first few hundred pages are mapped directly to their corresponding physical memory).
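And a toy sketch of the identity paging just mentioned (constants illustrative): the first N pages are mapped so that virtual address equals physical address:

    #define N_IDENTITY_PAGES 256   /* e.g. the first 1 MiB with 4 KiB pages */

    void set_up_identity_map(unsigned long *page_table)
    {
        unsigned long pfn;

        for (pfn = 0; pfn < N_IDENTITY_PAGES; pfn++)
            page_table[pfn] = (pfn << 12) | 0x3;   /* present + writable; frame == index */
    }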