Is there any difference between kernel-space and user-space memory allocations?
From which region of memory do they get allocated?
Can anyone please provide some pointers on this?
Thanks.
Best Regards,
Sandeep Singh
The memory regions for the two are governed by their respective address-space ranges; the boundary value is stored in the fence register.
User and kernel memory do have differences, in the sense that different physical attributes are tagged onto them:
https://unix.stackexchange.com/questions/87625/what-is-difference-between-user-space-and-kernel-space
But as for the allocation algorithm itself: userspace memory allocation always falls back on the kernel for its ultimate implementation.
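To make that concrete, here is a minimal sketch (plain Linux/POSIX, nothing project-specific assumed) of what a userspace allocator ultimately falls back on: a system call such as mmap(2), whose backing pages are handed out by the kernel's page allocator:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* The kernel allocates and maps the backing pages; user space only
       receives a virtual mapping onto that kernel-managed memory. */
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    printf("user-space page backed by a kernel allocation: %p\n", p);
    munmap(p, 4096);
    return 0;
}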
And because kernel-mode memory is so much more privileged than usermode memory, there is a hardware mechanism called SMEP to prevent usermode memory from being executed from inside kernel mode:
https://www.ncsi.com/nsatc11/presentations/wednesday/emerging_technologies/fischer.pdf
And hardware features like the NX bit are always controlled from kernel mode (ring 0); as a normal user (ring 3) you will not be able to access the bit.
More hardware features:
http://hypervsir.blogspot.sg/2014/11/page-structure-table-corruption-attacks.html
Why do we need the memory management unit?
It seems the only task of the memory management unit is to convert virtual addresses to physical addresses. Can't this be done in software? Why do we need another hardware device to do this?
The MMU (Memory Management Unit) is a hardware component, available on most hardware platforms, that translates virtual addresses to physical addresses. This translation brings the following benefits:
Swap: your system can handle more memory than is physically available. For example, on a 32-bit architecture, the system "sees" 4 GB of memory, regardless of the amount of physical memory actually installed. If you use more memory than is actually available, memory pages are swapped out onto the swap disk.
Memory protection: the MMU enforces memory protection by preventing a user-mode task from accessing memory owned by other tasks.
Relocation: each task can use addresses at a certain offset (e.g., for variables), regardless of the real addresses assigned at run time.
It is possible to partially implement a software translation mechanism. For example, for relocation you can have a look at the implementation of gcc's -fPIC option. However, a software mechanism can't provide memory protection (which, in turn, affects system security and reliability).
The reason for an MMU component in a CPU is to make the logical-to-physical address translation transparent to the executing process. Doing it in software would require stopping to translate every memory access a process makes. Plus, you'd have the chicken-and-egg problem that, if memory translation is done by software, who does that software's memory translation?
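To see the protection and relocation points in action, here is a small sketch (standard POSIX, nothing beyond fork() assumed): parent and child print the same virtual address, yet after the child's write the MMU has mapped that address to a different physical page, so the parent's copy is untouched:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int x = 1;
    if (fork() == 0) {   /* child */
        x = 2;           /* copy-on-write gives the child its own physical page */
        printf("child:  &x = %p, x = %d\n", (void *)&x, x);
        exit(0);
    }
    wait(NULL);
    printf("parent: &x = %p, x = %d\n", (void *)&x, x);   /* same address, still 1 */
    return 0;
}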
I was reading this on a page that:
Because of hardware limitations, the kernel cannot treat all pages as identical. Some pages, because of their physical address in memory, cannot be used for certain tasks. Because of this limitation, the kernel divides pages into different zones.
I was wondering about those hardware limitations. Can somebody please explain those hardware limitations to me and give an example? Also, is there any software guide from Intel explaining this?
Also, I read that virtual memory is divided into two parts: 1 GB for kernel space and 3 GB for user space. Why do we give 1 GB of the virtual address space of every process to the kernel? How is it mapped to actual physical pages? Can somebody please point me to a clear text explaining this?
Thanks in advance.
The hardware limitations mostly concern old devices. For example, you have ZONE_DMA, which spans 0-16 MB. This is needed e.g. for older ISA devices, which are not capable of addressing above the 16 MB limit. Then you have ZONE_NORMAL, where most of the kernel's operations take place and which is permanently mapped into the kernel's address space.
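As a rough illustration of how that limitation shows up in code, a kernel module can ask for memory from ZONE_DMA explicitly by adding the GFP_DMA flag (a minimal sketch, error handling trimmed; the module name is made up):

#include <linux/module.h>
#include <linux/slab.h>

MODULE_LICENSE("GPL");

static void *low_buf;

static int __init zone_demo_init(void)
{
    /* GFP_DMA steers the allocation into ZONE_DMA (the low 16 MB on x86),
       which even old ISA-era devices can address. */
    low_buf = kmalloc(4096, GFP_KERNEL | GFP_DMA);
    return low_buf ? 0 : -ENOMEM;
}

static void __exit zone_demo_exit(void)
{
    kfree(low_buf);
}

module_init(zone_demo_init);
module_exit(zone_demo_exit);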
The 1 GB / 3 GB split is simple. You are dealing with virtual addresses here, so for your application the address space always starts at 0x00000000, and the top 1 GB of it is reserved for kernel stuff. Why this is done is pretty simple: you have kernel mode and user mode. In kernel mode you are allowed to use system calls. If the kernel memory were not mapped into your virtual address space, entering kernel mode would require switching to a different address space (save the current context to memory, load another context from memory -> time consuming). But because kernel-mode operations can take place in the same virtual address space, you don't need to switch address spaces to, for example, allocate new memory or do any other system call.
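A quick way to see the split from user space (assuming a 32-bit Linux with the default 3 GB / 1 GB configuration): every address your process can legally obtain stays below 0xC0000000, because the top gigabyte is the kernel mapping:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int on_stack;
    void *on_heap = malloc(16);
    /* On a default 32-bit 3G/1G kernel both values are below 0xC0000000. */
    printf("stack: %p\nheap:  %p\n", (void *)&on_stack, on_heap);
    free(on_heap);
    return 0;
}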
Regarding your second question, about the 1 GB kernel mapping in each process's address space: the kernel is mapped there, of course, to save time by not having to switch address spaces on every system call. The 1 GB is for kernel functionality, so that if the kernel needs to map new memory for its own use, it can do so. Any book on Unix internals can give you the details.
I have an SoC which has both DSP and ARM cores on it, and I would like to create a section of shared memory that both my userspace software and DSP software are able to access. What would be the best way to allocate a buffer like this in Linux? Here is a little background: right now what I have is a kernel module in which I use kmalloc() to get a kernel buffer; I then use the __pa() macro from asm/page.h to get the physical address of my kernel buffer. I save this address as a sysfs entry so that my userspace code can get the physical address of this buffer. I can then write this address to the DSP so it knows where the shared memory location is, and I can also mmap /dev/mem or my own kernel module so that I can access this buffer from userspace (I could also use the read/write fileops).
For some reason I feel like this is overboard but I cannot find the best way to do what I am trying to do.
Would it be possible to just mmap a section of memory via /dev/mem and just read and write to this section? My feeling is that this would not 'lock' this section of memory from the kernel, so the kernel could still read/write to this memory without me knowing. Is this the case? After reading the memory-management chapter of LDD3 I see that mmap creates a new VMA for the mapping. Would this lock this area of memory so that other processes would not get allocated this section of memory?
Any and all help is appreciated
Depending on the kind of DMA you're using, you need to allocate the buffer with dma_alloc_coherent(), or use standard allocations and the dma_map_* functions.
(You must not use __pa(); physical addresses are not necessarily the same as DMA bus addresses.)
To map the buffers to user space, use dma_mmap_coherent() for coherent buffers, or map the memory pages manually for streaming buffers.
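For the coherent case, a hedged sketch of the kernel side might look like the following; the buffer size, the shm_* names and the assumption that open() stashes the struct device pointer in filp->private_data are all placeholders, not your actual driver:

#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/module.h>

#define SHM_SIZE (1 << 20)          /* 1 MiB shared buffer (placeholder size) */

static void *cpu_addr;              /* kernel virtual address of the buffer   */
static dma_addr_t dma_handle;       /* bus address to hand to the DSP         */

static int shm_alloc(struct device *dev)
{
    cpu_addr = dma_alloc_coherent(dev, SHM_SIZE, &dma_handle, GFP_KERNEL);
    return cpu_addr ? 0 : -ENOMEM;
}

/* .mmap handler of the char device: maps the very same buffer into user space. */
static int shm_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct device *dev = filp->private_data;   /* assumed to be set in open() */
    return dma_mmap_coherent(dev, vma, cpu_addr, dma_handle,
                             vma->vm_end - vma->vm_start);
}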
For a similar requirement of mine, I reserved about 16 MB of memory towards the end of RAM and used it in both kernel and user space. Suppose you have 128 MB of RAM: you can set the BOOTMEM argument to 112 MB in your boot loader (I am assuming you are using U-Boot). This will reserve the 16 MB towards the end of RAM. Now, in both kernel and user space, you can map this area and use it as shared memory.
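A minimal user-space sketch of that approach (the 112 MB offset assumes the RAM visible to Linux starts at physical address 0; adjust PHYS_BASE for your SoC's memory map):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define PHYS_BASE (112UL * 1024 * 1024)   /* start of the reserved region  */
#define REGION_SZ (16UL * 1024 * 1024)    /* the 16 MB shared with the DSP */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    void *shm = mmap(NULL, REGION_SZ, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, PHYS_BASE);
    if (shm == MAP_FAILED) { perror("mmap"); return 1; }

    /* This process and the DSP now see the same physical memory. */
    ((volatile unsigned int *)shm)[0] = 0xdeadbeef;

    munmap(shm, REGION_SZ);
    close(fd);
    return 0;
}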
I have a sequential user-space program (some kind of memory-intensive search data structure). The program's performance, measured as a number of CPU cycles, depends on the memory layout of the underlying data structures and on the data cache size (LLC).
So far my user-space program has been tuned to death; now I am wondering if I can get a performance gain by moving the user-space code into the kernel (as a kernel module). I can think of the following factors that might improve performance in kernel space ...
No system call overhead (how many CPU cycles are gained per avoided system call?). This is less critical, as I barely use any system calls in my program except for allocating memory, and that only when the program starts.
Control over scheduling: I can create a kernel thread and make it run on a given core without being thrown off it.
I can use kmalloc() for memory allocation and thus have more control over the allocated memory; maybe I can also control cache coloring more precisely by controlling the allocated memory. Is it worth trying?
My questions to the kernel experts...
Have I missed any factors in the above list that can improve performance further?
Is it worth trying, or is it known straight away that I will NOT get much performance improvement?
If a performance gain is possible in the kernel, is there any estimate of how much it could be (any theoretical guess)?
Thanks.
Regarding point 1: kernel threads can still be preempted, so unless you're making lots of syscalls (which you aren't) this won't buy you much.
Regarding point 2: you can pin a thread to a specific core by setting its affinity, using sched_setaffinity() on Linux.
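For example (Linux-specific, needs _GNU_SOURCE; core number 2 is arbitrary):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);                                   /* run only on core 2 */
    if (sched_setaffinity(0, sizeof(set), &set) == -1)  /* 0 = calling thread */
        perror("sched_setaffinity");
    /* ... run the memory-intensive search here ... */
    return 0;
}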
Regarding point 3: What extra control are you expecting? You can already allocate page-aligned memory from user space using mmap(). This already lets you control for the cache's set associativity, and you can use inline assembly or compiler intrinsics for any manual prefetching hints or non-temporal writes. The main difference between memory allocated in the kernel and in user space is that kmalloc() allocates wired (non-pageable) memory. I don't see how this would help.
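For instance, a page-aligned, pre-faulted buffer straight from user space (MAP_POPULATE is Linux-specific):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1 << 20;                        /* 1 MiB */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    printf("page-aligned buffer at %p\n", buf);  /* mmap always returns page-aligned memory */
    munmap(buf, len);
    return 0;
}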
I suspect you'll see much better ROI on parallelising using SIMD, multithreading or making further algorithmic or memory optimisations.
Create a dedicated cpuset for your program and move all other processes out of it. Then bump your process' priority to realtime with FIFO scheduling policy using something like:
#include <sched.h>   /* sched_setscheduler(), sched_get_priority_max(), SCHED_FIFO */

struct sched_param schedparams;
// Be portable - don't just set priority to 99 :)
schedparams.sched_priority = sched_get_priority_max(SCHED_FIFO);
sched_setscheduler(0, SCHED_FIFO, &schedparams);   /* 0 = the calling process */
Don't do that on a single-core system!
Reserve large enough stack space with alloca(3) and touch all of the allocated stack memory; map more than enough heap space and then use mlock(2) or mlockall(2) to pin the process memory.
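For the locking step, something like this (needs CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Lock all current and future pages of the process into RAM
       so the working set never gets paged out. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        perror("mlockall");
    /* ... memory-intensive work here ... */
    return 0;
}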
Even though your program is sequential, NUMA effects can slow it down if it runs on a multi-socket Nehalem or post-Nehalem Intel system or on an AMD64 system. Use API functions from numa(3) to allocate memory and keep it as close as possible to the NUMA node where your program executes.
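A sketch with libnuma (link with -lnuma; node 0 is just an example):

#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() == -1) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }
    numa_run_on_node(0);                          /* execute on node 0      */
    void *buf = numa_alloc_onnode(1 << 20, 0);    /* allocate on node 0 too */
    /* ... build and search the data structure in buf ... */
    numa_free(buf, 1 << 20);
    return 0;
}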
Try other compilers - some of them might optimise better than the compiler you are currently using. Intel's compiler, for example, is very aggressive about laying out instructions so as to benefit from out-of-order execution, pipelining and branch prediction.
This article http://msdn.microsoft.com/en-us/library/aa366912(v=vs.85).aspx states that in a Win32 environment (32-bit assumed), half of the virtual memory is dedicated to user-mode processes and half to kernel-mode processes.
If I recall correctly from paging, every process should have its own address space from 0 up to some maximum (0x7FFFFFFF according to the article). But what about a kernel driver? Does every kernel driver/program have its own kernel address space from 0x80000000 through 0xFFFFFFFF?
Or am I just getting this wrong?
I believe that you are under the impression that drivers are separate processes; with monolithic and hybrid kernels (NT is considered a hybrid), they are not. Think of drivers as modules that the kernel loads into itself in ring 0. In effect, they become part of the kernel.
Parts of that address space may change between processes, but most of the kernel address space would be shared between all processes.
As far as I know, there is only one kernel. :-)
The address ranges seem OK though, unless the system is configured for 3 GB user space.
In Windows, kernel mode drivers live in the kernel and share the kernel's address space.