vma's in the kernel - linux-kernel

I was reading up on virtual memory areas for linux processes here.
In the kernel, the VMAs form a linked list pointed to by the "mmap" field of the memory descriptor (an instance of mm_struct).
I understand that there is a vma for each contiguous set of virtual addresses.
Are there vma's in that list corresponding to the kernel mapping?

Related

Physical memory mapping and location of page tables

I have a picture of the virtual address space of a process (for x86-64):
However, I am confused about a few things.
What is the "Physical memory map" region for?
I know the 4-level page tables are found in the high canonical region, but where exactly are they? (data, code, stack, heap or physical memory map?)
What is the "Physical memory map" region for?
Direct-mapping of all physical RAM (usually with hugepages) allows easy access to memory given a physical address. (i.e. add an offset to generate a virtual address you can use to load or store from there.)
Having phys<->virt be cheap makes it easier to manage memory allocations, so you can primarily track what regions of physical memory are in use.
This is how kmalloc works: it returns a kernel virtual address that points into the direct-mapped region. This is great: it doesn't have to spend any time finding free virtual address space as well, just bookkeeping for physical memory. And it doesn't have to create or modify any page tables. (And freeing doesn't have to tear down page tables and run invlpg.)
kmalloc requires the memory to be contiguous in physical memory; it doesn't stitch together multiple 4k pages into a contiguous virtual allocation (that's what vmalloc does). That's one reason not to use kmalloc for everything: a larger allocation might fail, or might have to stop and defragment or page out memory if the kernel can't find enough contiguous physical pages, which it can't do in a context that must run without pre-emption, like an interrupt handler. (Correct me if I'm wrong, I don't regularly look at actual Linux kernel code. Regardless of the Linux details, the basics of this way of handling allocation are important and relevant to any OS that direct-maps all physical RAM.)
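The offset arithmetic behind a direct mapping can be sketched in a few lines of Python. This is only a user-space illustration, not kernel code: `DIRECT_MAP_BASE`, `phys_to_virt` and `virt_to_phys` are names chosen here, and the base value is an assumed example (the real base depends on kernel version, paging mode and KASLR).

```python
# Assumed example base of the direct-mapped region; the real value varies
# with kernel version, 4- vs 5-level paging, and KASLR.
DIRECT_MAP_BASE = 0xFFFF_8880_0000_0000


def phys_to_virt(paddr: int) -> int:
    """Turn a physical address into a direct-map virtual address."""
    return DIRECT_MAP_BASE + paddr


def virt_to_phys(vaddr: int) -> int:
    """Inverse conversion; only valid for addresses inside the direct map."""
    if vaddr < DIRECT_MAP_BASE:
        raise ValueError("address is below the direct-mapped region")
    return vaddr - DIRECT_MAP_BASE
```

Because both directions are a single add or subtract, an allocator that hands out direct-map addresses only needs to track physical memory.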
Related:
What is the rationality of Linux kernel's mapping as much RAM as possible in direct-mapping(linear mapping) area?
Confusion about different meanings of "HighMem" in Linux Kernel re: how Linux uses physical RAM that it doesn't have enough virtual address-space to keep mapped all the time. (On architectures where Linux supports the concept of Highmem, e.g. i386 but not x86-64). Still, thinking about that can be a useful thought exercise in how kernels have to deal with memory, and why it's nice that x86-64 kernels generally don't have to deal with that pain.
Linus Torvalds has ranted about 32-bit x86 PAE, which expanded physical address space but not virtual, when 4GiB virtual was already not enough to comfortably deal with 4GiB physical. It's a useful look at how this appears from an OS developer's perspective.
I know the 4-level page tables are found in the high canonical region, but where exactly are they? (data, code, stack, heap or physical memory map?)
Page tables for a user-space task are in physical memory dynamically allocated by the kernel, probably with kmalloc. I haven't looked at the code. Every user-space page table refers to the page directories for the kernel part of virtual address space, which are also stored somewhere.
They're only accessed by the CPU by physical address, so there's no need for there to be a virtual mapping of them other than the direct mapping of all physical RAM.
(The CPU accesses them on TLB miss, to fetch a PTE with the translation for this virtual address. But if they used virtual addresses themselves, you'd have a catch-22 unless there was a way for the OS to prime the TLB with an entry for the virtual address in CR3, and so on. Much better to just have the OS put physical addresses into CR3 and the page-directory / page-table entries.)
For Linux on x86-64, each process has its own page tables. The page tables are independent 4KiB physical pages that can be allocated anywhere in physical memory. The page tables are not part of the virtual address space -- they are accessed by the page table walker hardware using their physical addresses with the bit fields of the requested virtual address as indices into the page table hierarchy. The control register CR3 contains the physical address of the 4KiB page that holds the root of the page table tree for the currently running process. The kernel knows the CR3 of each process (since it must be saved and restored on context switches), so the kernel can walk a process's page tables in software (by emulating what the page table walker does in hardware) for any desired virtual address.
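A software walk of this kind can be sketched as follows, assuming standard 4-level x86-64 paging (four 9-bit indices plus a 12-bit page offset). This is a toy model, not kernel code: `phys_mem` is a hypothetical dict standing in for physical memory, each table is a dict from index to the next physical address, and flag bits and huge pages are ignored.

```python
PAGE_SHIFT = 12   # 4 KiB pages
INDEX_BITS = 9    # 512 entries per table
LEVELS = 4        # PML4, PDPT, PD, PT


def table_indices(vaddr):
    """Return the (PML4, PDPT, PD, PT) indices, top level first."""
    mask = (1 << INDEX_BITS) - 1
    return tuple(
        (vaddr >> (PAGE_SHIFT + INDEX_BITS * level)) & mask
        for level in reversed(range(LEVELS))
    )


def walk(phys_mem, cr3, vaddr):
    """Translate vaddr starting from the root table at physical address cr3.

    phys_mem maps a table's physical address to {index: next physical address}.
    Returns the physical address, or None if an entry is missing.
    """
    paddr = cr3
    for index in table_indices(vaddr):
        entry = phys_mem[paddr].get(index)
        if entry is None:
            return None  # not present; real hardware would raise a page fault
        paddr = entry
    # After the last level, paddr is the page frame; add the page offset.
    return paddr + (vaddr & ((1 << PAGE_SHIFT) - 1))
```

Every address followed during the walk is physical, which is why the tables themselves need no virtual mapping beyond the direct map.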

What is the difference between kernel logical address space, kernel virtual address space and user virtual address space

Let me lay out my understanding.
Suppose we have a 32-bit address space on a system, so a process can access any memory in the 4GB range.
If the system has 4GB of RAM, the kernel divides it 1:3: 1GB for the kernel and the remaining 3GB for user-space processes.
A user-space process gets access to system memory only within that 3GB, and which address it gets is determined by the page table.
A kernel logical address is in that 1GB (approximately 896MB) of memory which is reserved only for the kernel. Is this correct?
A kernel virtual address is in the memory that is left, i.e. 104MB + 3GB, which can also be assigned to user space. Is this correct?
A user virtual address is an address generated by a user-space process, and its corresponding memory is assigned by the kernel from the 3GB reserved for user-space processes.
Is my understanding above correct? If not, can you please explain in detail the difference between kernel logical address space, kernel virtual address space and user virtual address space?
Your understanding is a mixture of right and wrong. I'll try to point out some of the problems:
On 32-bit machines, we're not always limited to 4GB of addressable RAM; check this question for more detail: link
Memory is an abstraction for user-space programs: they see it as one big contiguous chunk of memory. The kernel manages this abstraction with hardware support from the MMU, mapping the virtual addresses used by the user-space program to actual physical addresses, or even to blocks on the hard drive if swapping is enabled.
The kernel can actually access physical memory in order to manage the abstraction mentioned above; it can also use this abstraction itself, depending on the kernel's design.
As for the difference between virtual and logical addressing, check this answer: link

Process virtual address space and kernel address space? How?

I am very new to kernel and system programming.
I have a couple of questions related to virtual memory, mostly about static vs. run time (i.e. ELF and loading/linking etc.), specific to Linux on x86.
My understanding might be completely wrong...
I am aware of virtual memory and its 1G/3G split, where a process cannot access addresses above PAGE_OFFSET in user mode. PAGE_OFFSET is a virtual address.
At static time, does ELF define the process's virtual address space?
If ELF defines the virtual address space, does ELF also define the kernel virtual address space? How?
[I assume the kernel virtual address space is dynamically mapped at run time?]
If the kernel address space is mapped into the process address space, then why doesn't the process size (virtual) include the kernel size as well?
When and how is this kernel address space mapped/linked?
For example, in the case of a shared library, the particular file is pointed to by a vm struct etc.
Is it when the code flow hits a system call, for example?
Does the executable size determine the process size (virtual) completely? In what contexts do the sizes differ, or are they completely different?
Is there any article that explains the overall flow?
Compile --> link/load --> virtual memory structure (kernel address space/shared objects etc)
I know it's very vast, but an explanation of the overall flow will do.
Most systems define logical address ranges for kernel and user address spaces. On some systems the range is entirely up to the operating system (how it sets up the page tables); on others it is fixed in hardware.
For the former, page tables are usually nested, in which case multiple page tables share identical entries.
For the latter, there are usually separate page tables for the user and kernel address spaces.
If ELF defines the virtual address space, does ELF also define the kernel virtual address space? How? [I assume the kernel virtual address space is dynamically mapped at run time?]
The executable file only defines the initial layout of the user address space.
If the kernel address space is mapped into the process address space, then why doesn't the process size (virtual) include the kernel size as well?
That would depend upon the system and how it does the counting.
When and how is this kernel address space mapped/linked? For example, in the case of a shared library, the particular file is pointed to by a vm struct etc.
The kernel address space exists independent of any processes. As mentioned above, it is mapped to a process either by having a system page table shared by all processes or a set of nested page table entries shared by all processes.
Does the executable size determine the process size (virtual) completely? In what contexts do the sizes differ, or are they completely different?
Not really. Large executables are indicative of larger ranges of logical addresses required. However, a small EXE can easily describe a large number of demand zero pages. In addition, the application can map logical pages as it executes. The EXE only defines the initial state of the logical address space.
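The point about demand-zero pages and mapping at run time can be illustrated from a tiny script. This is a minimal sketch using Python's standard `mmap` module; `map_demand_zero` is a name chosen here, and whether physical memory is really committed lazily depends on the operating system.

```python
import mmap


def map_demand_zero(size):
    """Map `size` bytes of anonymous memory (fileno -1 means no backing file).

    The pages read back as zero; on typical systems physical memory is only
    committed when a page is first touched.
    """
    return mmap.mmap(-1, size)


# A few lines of code, yet the process now has 64 MiB more address space.
region = map_demand_zero(64 * 1024 * 1024)
first_byte = region[0]            # reads as zero
region[len(region) - 1] = 0x7F    # touching a page makes the OS back it
```

The size of the program text says nothing about how much address space the process ends up using.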

Windows - How does this memory addressing work?

Assuming x86, I'm starting to learn that addresses 0x0 thru 0x7FFFFFFF are for the process; whereas anything higher is reserved for the kernel.
I have three curiosities:
1) Does a process EVER call an address higher than 0x7FFFFFFF? I assume it will always result in some sort of access denied? How is that access denied enforced?
2) Does "shared memory" IPC work by mapping two processes virtual addresses to the same physical address range?
3) The amount of RAM in your machine can vary. You may have 2GB, or much more like 16GB. How does this affect the addressing of RAM? Does the kernel ever leave a bunch of RAM unused because it was reserved for itself, but doesn't need it? How can I see this?
I am not entirely sure, but this MSDN doc covers most of how it works:
The range of virtual addresses that is available to a process is
called the virtual address space for the process. Each user-mode
process has its own private virtual address space. For a 32-bit
process, the virtual address space is usually the 2-gigabyte range
0x00000000 through 0x7FFFFFFF. For a 64-bit process, the virtual
address space is the 8-terabyte range 0x000'00000000 through
0x7FF'FFFFFFFF. A range of virtual addresses is sometimes called a
range of virtual memory.
The diagram shows the virtual address spaces for two 64-bit processes:
Notepad.exe and MyApp.exe. Each process has its own virtual address
space that goes from 0x000'00000000 through 0x7FF'FFFFFFFF. Each shaded
block represents one page (4 kilobytes in size) of virtual or physical
memory. Notice that the Notepad process uses three contiguous pages of
virtual addresses, starting at 0x7F7'93950000. But those three
contiguous pages of virtual addresses are mapped to noncontiguous
pages in physical memory. Also notice that both processes use a page
of virtual memory beginning at 0x7F7'93950000, but those virtual pages
are mapped to different pages of physical memory.
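The situation described in the quote can be modelled with one page table per process. This is a toy sketch, not how Windows stores its tables: the dictionaries, the `translate` helper and all physical frame numbers below are made up for illustration.

```python
PAGE_SIZE = 4096

# Virtual page number of the address used in the quoted example.
BASE_VPN = 0x7F7_9395_0000 // PAGE_SIZE

# Per-process tables: virtual page number -> physical frame number.
# The frame numbers are invented; note they are not contiguous.
notepad_table = {BASE_VPN: 0x1A2, BASE_VPN + 1: 0x040, BASE_VPN + 2: 0x3F7}
myapp_table = {BASE_VPN: 0x2B0}


def translate(page_table, vaddr):
    """Translate a virtual address using one process's page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table[vpn]  # KeyError models an access to an unmapped page
    return frame * PAGE_SIZE + offset
```

The same virtual address gives a different physical address depending on which table is used, which is what keeps the two address spaces private.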

Difference between Kernel Virtual Address and Kernel Logical Address?

I am not able to tell exactly what the difference is between a kernel logical address and a kernel virtual address. The Linux Device Drivers book says that all logical addresses are kernel virtual addresses, and that virtual addresses don't have any linear mapping. But when do we call an address logical and when virtual, and in which situations do we use each of the two?
The Linux kernel maps most of the virtual address space that belongs to the kernel as a 1:1 mapping, with an offset, of the first part of physical memory (slightly less than 1GB for 32-bit x86; this can be different for other processors or configurations). For example, for kernel code on x86, address 0xc0000001 is mapped to physical address 0x1.
This is called logical mapping - a 1:1 mapping (with an offset) that allows the kernel to access most of the physical memory of the machine.
But this is not enough: sometimes we have more than 1GB of physical memory on a 32-bit machine, sometimes we want to reference non-contiguous physical memory blocks as contiguous to make things simple, and sometimes we want to map memory-mapped I/O regions which are not RAM.
For this, the kernel keeps a region at the top of its virtual address space where it does "random" page-to-page mappings. The mappings there do not follow the 1:1 pattern of the logical mapping area. This is what we call the virtual mapping.
It is important to add that on many platforms (x86 is an example), both the logical and virtual mappings are done using the same hardware mechanism (the MMU's page tables, cached in the TLB). In many cases, the "logical mapping" is actually done using the virtual memory facility of the processor, so this can be a little confusing. The difference therefore is the pattern according to which the mapping is done: 1:1 for logical, something random for virtual.
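The two patterns can be told apart by address range alone. The sketch below assumes the classic 32-bit x86 layout (a 3G/1G split with 896MB of directly mapped low memory); `classify` and `logical_to_phys` are names chosen here, and the small guard gap that real kernels leave before the vmalloc area is ignored.

```python
PAGE_OFFSET = 0xC000_0000          # start of kernel space in a 3G/1G split
LOWMEM_SIZE = 896 * 1024 * 1024    # directly mapped part on classic 32-bit x86
LOWMEM_END = PAGE_OFFSET + LOWMEM_SIZE


def classify(vaddr):
    """Say which kind of address vaddr is under the assumed layout."""
    if vaddr < PAGE_OFFSET:
        return "user"
    if vaddr < LOWMEM_END:
        return "kernel logical"
    return "kernel virtual"


def logical_to_phys(vaddr):
    """Undo the fixed-offset mapping; only meaningful for logical addresses."""
    if classify(vaddr) != "kernel logical":
        raise ValueError("not a kernel logical address")
    return vaddr - PAGE_OFFSET
```

For a kernel virtual address there is no such formula; the physical page can only be found by consulting the page tables.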
Basically there are 3 kinds of addressing, namely
Logical addressing: The address is formed from a base and an offset. This is nothing but segmented addressing, where the address (or offset) in the program is always used with the base value in the segment descriptor.
Linear addressing: Also called virtual addressing. Here the addresses are contiguous, but the physical addresses behind them need not be. Paging is used to implement this.
Physical addressing: The actual address in main memory!
Now, in Linux, kernel memory (in the address space) lies above 3GB (3GB to 4GB), i.e. from 0xc0000000. The addresses used by the kernel are not physical addresses. To map these virtual addresses it uses PAGE_OFFSET. Note that no per-page lookup is needed for this translation: the addresses map linearly onto contiguous physical memory. However, there is a limit to this, i.e. 896MB on x86, beyond which arbitrary page-by-page mappings are used. When you use vmalloc, addresses of that second kind are returned to access the allocated memory.
In short, when someone refers to virtual memory in the context of user space, it is mapped through paging. If kernel virtual memory is mentioned, it is either a PAGE_OFFSETed or a vmalloced address.
(Reference - Understanding Linux Kernel - 2.6 based )
Shash
Kernel logical addresses are mappings accessible to kernel code through normal CPU memory access functions. On 32-bit systems, only 4GB of kernel logical address space exists, even if more physical memory than that is in use. Logical address space backed by physical memory can be allocated with kmalloc.
Virtual addresses do not necessarily have corresponding logical addresses. You can allocate physical memory with vmalloc and get back a virtual address that has no corresponding logical address (on 32-bit systems with PAE, for example). You can then use kmap to assign a logical address to that virtual address.
Simply speaking, virtual addresses include those for "high memory", which has no 1:1 mapping to physical addresses. This arises if your RAM is larger than the kernel's address range (typically, for the 1G/3G split on x86, your RAM is 3G but the kernel's addressing range is 1G). They also include the addresses returned from kmap() and vmalloc(), which require the kernel to set up page table entries for the mapping. Since a logical address is always mapped by the kernel (1:1 mapping), you don't need to explicitly call a kernel API such as set_pte to set up the page table entry for that page.
So a virtual address is not always a logical address.

Resources