I am implementing a device driver in guest OS. For this I need to allocate a buffer space which is required to be contiguous physical memory. Does allocating buffer using kmalloc in guest OS guarantee contiguous physical address? If not, how can I achieve this?
kmalloc() will guarantee contiguous physical memory, and is supposed to be used for small objects, as stated in the function documentation:
* kmalloc is the normal method of allocating memory
* for objects smaller than page size in the kernel.
For bigger physically contiguous allocations, you should use alloc_pages() instead.
However, since you are in a guest OS, the physical memory you will be allocating is the one seen by the guest, not by the hypervisor (the "real" one). Whether the memory allocated is actually physically contiguous depends on how your hypervisor exposes memory to the guest OS.
Related
I have a picture of the virtual address space of a process (for x86-64):
However, I am confused about a few things.
What is the "Physical memory map" region for?
I know the 4-page tables are found in the high canonical region but where exactly are they? (data, code, stack, heap or physical memory map?)
What is the "Physical memory map" region for?
Direct-mapping of all physical RAM (usually with hugepages) allows easy access to memory given a physical address. (i.e. add an offset to generate a virtual address you can use to load or store from there.)
Having phys<->virt be cheap makes it easier to manage memory allocations, so you can primarily track what regions of physical memory are in use.
This is how kmalloc works: it returns a kernel virtual address that points into the direct-mapped region. This is great: it doesn't have to spend any time finding free virtual address space as well, just bookkeeping for physical memory. And it doesn't have to create or modify any page tables (And freeing doesn't have to tear down page tables and invlpg.)
kmalloc requires the memory to be contiguous in physical memory, not stitching together multiple 4k pages into a contiguous virtual allocation (that's what vmalloc does), so that's one reason to maybe not use kmalloc for everything, like for larger allocations that might fail or have to stop and defrag or page out memory if the kernel can't find enough contiguous physical pages. Which it couldn't do in a context that must run without pre-emption, like in an interrupt handler. (Correct me if I'm wrong, I don't regularly actually look at Linux kernel code. Regardless of actual Linux details, the basics of this way of handling allocation is important and relevant to any OS that direct-maps all physical RAM.)
Related:
What is the rationality of Linux kernel's mapping as much RAM as possible in direct-mapping(linear mapping) area?
Confusion about different meanings of "HighMem" in Linux Kernel re: how Linux uses physical RAM that it doesn't have enough virtual address-space to keep mapped all the time. (On architectures where Linux supports the concept of Highmem, e.g. i386 but not x86-64). Still, thinking about that can be a useful thought exercise in how kernels have to deal with memory, and why it's nice that x86-64 kernels generally don't have to deal with that pain.
Linux Torvalds has ranted about 32-bit x86 PAE which expanded physical address space but not virtual, when 4GiB virtual was already not enough to comfortably deal with 4GiB physical. It's a useful perspective on how this looks from an OS developer's perspective.
I know the 4-page tables are found in the high canonical region but where exactly are they? (data, code, stack, heap or physical memory map?)
Page tables for user-space task are in physical memory dynamically allocated by the kernel, probably with kmalloc. I haven't looked at the code. Every user-space page-table refers to the page directories for the kernel part of virtual address space, which are also stored somewhere.
They're only accessed by the CPU by physical address, so there's no need for there to be a virtual mapping of them other than the direct mapping of all physical RAM.
(The CPU accesses them on TLB miss, to fetch a PTE with the translation for this virtual address. But if they used virtual addresses themselves, you'd have a catch-22 unless there was a way for the OS to prime the TLB with an entry for the virtual address in CR3, and so on. Much better to just have the OS put physical linear addresses into CR3 and the page-directory / page-table entries.)
For Linux on x86-64, each process has its own page tables. The page tables are independent 4KiB physical pages that can be allocated anywhere in physical memory. The page tables are not part of the virtual address space -- they are accessed by the page table walker hardware using their physical addresses with the bit fields of the requested virtual address as indices into the page table hierarchy. The control register CR3 contains the physical address of the 4KiB page that holds the root of the page table tree for the currently running process. The kernel knows the CR3 of each process (since it must be saved and restored on context switches), so the kernel can walk a process's page tables in software (by emulating what the page table walker does in hardware) for any desired virtual address.
Generally as we know virtual memory is larger than physical memory.But when is it advantageous to define virtual memory smaller than physical memory?
If you have pointer-heavy code, you can save memory by choosing a smaller address space. For example, a pointer on a 32-bit platform occupies 4 bytes versus 8 bytes on 64-bit. The same goes for integer types like size_t.
This only works and makes sense if:
Your code/application/server uses multiple processes and all processes together need more memory than the amount of virtual memory (otherwise you wouldn't need more physical than virtual memory).
Your platform supports more physical than virtual memory (for example, Intel PAE).
The smaller amount of virtual memory is enough for each single process.
Imagine a large server system supporting multiple users. You don't want users to hog memory, so you restrict the size of the logical (virtual) address space by limiting page table size.
I was looking at allocating a buffer for a DMA transaction. I read that there are two ways of doing this - coherent mappings or streaming mappings.
Now, we want cache coherent buffers. However, we are not doing any scatter/gather so we want the benefits of the dma_map_single call. We have set aside some contiguous memory in the bootargs so we always have enough contiguous memory available.
So I was wondering, can we call dma_alloc_coherent and then dma_map_single after it with the virtual address that dma_alloc_coherent returns? The returned physical address of the map single would then be set as the dma handle that dma_alloc_coherent returned in its call.
Does this make sense? Or is it superfluous/incorrect to do it this way?
You can't use both. But you're also trying to solve a nonexistent problem. Just use dma_alloc_coherent(), which gives you a contiguous DMA memory buffer with both virtual and physical addresses. DMA to the physical address, access it from the CPU with the virtual address. What's the issue?
I trying to to make the linux memory management a little bit more clear for tuning and performances purposes.
By reading this very interesting redbook "Linux Performance and Tuning Guidelines" found on the IBM website I came across something I don't fully understand.
On 32-bit architectures such as the IA-32, the Linux kernel can directly address only the first gigabyte of physical memory (896 MB when considering the reserved range). Memory above the so-called ZONE_NORMAL must be mapped into the lower 1 GB. This mapping is completely transparent to applications, but allocating a memory page in ZONE_HIGHMEM causes a small performance degradation.
why the memory above 896 MB has to be mapped into the lower 1GB ?
Why there is an impact on performances by allocating a memory page in ZONE_HIGHMEM ?
what is the ZONE_HIGHMEM used for then ?
why a kernel that is able to recognize up to 4gb ( CONFIG_HIGHMEM=y ) can just use the first gigabyte ?
Thanks in advance
When a user process traps in to the kernel, the page tables are not changed. This means that one linear address space must be able to cover both the memory addresses available to the user process, and the memory addresses available to the kernel.
On IA-32, which allows a 4GB linear address space, usually the first 3GB of the linear address space are allocated to the user process, and the last 1GB of the linear address space is allocated to the kernel.
The kernel must use its 1GB range of addresses to be able to address any part of physical memory it needs to. Memory above 896MB is not "mapped into the low 1GB" - what happens is that physical memory below 896MB is assigned a permanent linear address in the kernel's part of the linear address space, whereas as memory above that limit must be assigned a temporary mapping in the remaining part of the linear address space.
There is no impact on performance when mapping a ZONE_HIGHMEM page into a userspace process - for a userspace process, all physical memory pages are equal. The impact on performance exists when the kernel needs to access a non-user page in ZONE_HIGHMEM - to do so, it must map it into the linear address space if it is not already mapped.
I've read that, on a 32-bit system with 4GB system memory, 2GB is allocated to user mode and 2GB allocated to kernel mode. But, If I had a system with 512 MB of memory, would it be partitioned as 256 MB to user and 256 MB to kernel address space?
You are confusing physical and virtual memory. 2GB is allocated to user/system, but it is virtual memory. It is even more correct to say that they are not rather allocated but they constitute an addressing space. Initially this space is not bound to physical memory at all. When application actually needs memory (first time is at start up) physical memory is allocated and some addresses from address space are mapped to it. When memory is allocated but not used long enough or PC is running out of physical memory data can be dumped in swap file, and stay there until requested. This mapping is transparent for application and it has no idea where data currently is: on chip or on HDD. So the address space is always splitted the same way.
This is not about memory (physical or virtual), but about address space.
You can plug 16GB of physical memory into your computer and make a 100GB swapfile, but 32-bit (non-enterprise) Windows will still only see 4GB (and subtract 0.75 GB for GPU memory and such). Via PAE, it could use more, but non-enterprise versions won't do that.
On top of the actual amount of memory, there is address space, which is limited to 4GB as well. Basically it is no more and no less than the collection of "numbers" (which, in this case, are addresses) that can be represented by a 32 bit number.
Since the kernel will need memory too, there is some arbitrary line drawn, which happens to be at the 2GB boundary for 32bit Windows, but can be configured differently, too.
It has nothing to do with the amount of memory on your computer (virtual or phsyical), but it is a limiting factor of how much memory you can use within a single program instance. It is not, however, a limiting factor on the memory that several programs could use.
As far as I can tell, what you are referring to are limits of how much memory can be allocated. This is much different than how much memory the OS allocated during runtime.