Kernel thread does not have memory descriptor it use mm_struct of last used process
how and what part of mm_struct use by kernel thread?
is it clear all detail of previous process?
kernel threads runs only in kernel address space. They don't have access to user space virtual memory and they only use kernel space memory address after PAGE_OFFSET. So (struct task_struct *)->mm field in the process descriptor is NULL. You would require to dynamically allocate memory within your kernel thread if required.
Related
I have few questions after reading Mel Gorman's book Understanding the Linux Virtual Memory Manager. Section 4.3 Process Address Space Descriptor says kernel threads never page fault or access the user space portion. The only exception is page faulting within the vmalloc space . Following are my questions.
kenrel threads never page fault: Does this mean only user space code triggers page fault? If a kmalloc() or vmalloc() is called, will it not page fault? I believe the kernel has to map these to the anon pages. When a write to this pages is performed, a page fault occurs. Is my understanding correct?
Why can't kernel threads access user space? Aren't copy_to_user() or copy_from_user() do that?
Exception is page faulting within vmalloc space: Does that mean vmalloc() triggers a page fault and kmalloc() doesn't ? Why kmalloc() does not page fault? The physical frames to kernel's virtual address need not to be kept as a page table entry?
kernel threads never page fault: The page fault talked about is when making a virtual page resident, or bringing it back from swap. Kernel pages not only get paged in on kmalloc(), but also remain resident for their lifetime. The same does not hold for user space pages, which A) may be lazy allocated (i.e. just reserved as page table entries on malloc(), but not actually faulted in until a memset() or other dereference) and B) may be swapped out on low memory conditions.
Why can't kernel threads access user space? Aren't copy_to_user() or copy_from_user() do that?
That's a great question, with a hardware-specific reply. It used to be the case that kernel threads were discouraged from accessing user space, exactly because of the possible page fault hit that might occur, if accessing unpaged/paged out memory in user space (recall, that wouldn't happen in kernel space, as above ensures). So copy_to/from would be normal memcpy, but wrapped in a page fault handler. This way, any potential page fault would be handled transparently (i.e. the memory would be paged in) and all would be well. But there were certainly cases where the bad approach of memcpy to/from user memory would just work - worse, it would work more often than not, as page faults very with RAM residency and availability - and thus unhandled faults would cause random panics. Hence the decree of always using the copy_from/to_user.
Recently, however, kernel/user memory isolation became important from a security standpoint. This is due to many exploitation techniques (NULL pointer dereferencing being a very common and powerful one), where fake kernel objects (or code) could be constructed in user space (and thus, easily controlled) memory, and could lead to code execution in kernel.
Most architectures thus have a page table bit which physically prevents a page belonging to user mode from being accessed by kernel. Taking ARM64 as an example, this feature is called PAN/PXN (Privileged Access/Execute Never).
Thus, copy_from/to now not only handles page faults, but also disables PAN/PXN before the operation, and restores it after.
Exception is page faulting within vmalloc space: vmalloc() allocates memory which is swappable, whereas kmalloc does not. The difference is in the implementation (kmalloc uses GFP_KERNEL). This also means that kmalloc is more likely to fail (if there is no RAM available for this), but will not page fault (it would return NULL, which itself would be a problem..)
I think you get counfused because you haven't understand clearly about the start of kernel, process, and virtual memeory.
kenrel threads never page fault: This is because the pages of kernel space and user space use different allocation methods. For the kernel space, we allocate pages when initialization, but for user space, we allocate them when running process and calling funcitons like malloc(), and after mapping, when truly using that virtual memory, we trigger page fault.
Why can't kernel threads access user space? When kenrel start, the process 0 will create process 1 and process 2. The process 1 is used to form the user space process tree, while the process 2 is used to manage the kernel threads. And the functions you mensioned are always used by those user threads to transmit data into/out of kernel to realise some function like open file or socket and so on.
Exception is page faulting within vmalloc space: The vmalloc space is not function vmalloc(), it is an area in kernel memory space for some dynamic memory allocation used as an exception.
I know how vmalloc() does。 When a process(in kernel space) want to access the memory that belongs to vmalloc(),a page fault happens and does the synchronization。
But when it invokes the vfree(), how the process update its page table to sync with the master kernel page table? Or I have some understandings with it.
Thanks.
your understanding of memory allocation is seems to be incorrect. No memory belongs to vmalloc. A fixed virtual address (of kernel space) is assigned to vmalloc at boot up time. Later when vmalloc is called then virtual addresses are picked from fix allocated range and physical memory pages are allocated from buddy system.
Virtual addresses and physical pages are mapped one to one.
when vfree() is called, the virtual address range is freed again , so does the physical pages are return to buddy system.
Hope this correct your understanding.
I suggest you go through some online tutorial about kernel memory as also reading them now.
I am developing a kernel application which involves kthreads. I create an array of structure and allocate memory using malloc in user-space. Then I call a system call (which I implemented) and pass the address of array to kernel-space. In the handler of system-call I create I create 2 kthreads which will monitor the array. kthread can change some value and user-space threads can also change some values. The idea is to use the array as a shared memory. But some when I access the memory in kernel space (using copy_from_user) the data are somehow changed. I can verify that the address are same when it was assigned and in kernel. But when using copy_from_user it is giving various values like garbage values.
Also is the following statement ok?
int kthread_run_function(void* data){
struct entry tmp;
copy_from_user(&tmp, data, sizeof(struct entry));
}
This is not OK because copy_from_user() copies from the current user process (which should be obvious, since there's no way to tell it which user process to copy from).
In a syscall invoked by your userspace process this is OK, because the current process is your userspace process. However, within the kernel thread the current process could be any other process on the system - so you're copying from a random process's memory, which is why you get garbage.
If you want to share memory between the kernel and a userspace process, the right way to do this is to have the kernel allocate it, then allow the userspace process to map it into its address space with mmap(). The kernel thread and the userspace process will use different pointers to refer to the memory region - the kernel thread will use a pointer to the memory allocated within the kernel address space, and the userspace process will use a pointer to the memory region returned by mmap().
No, generally it's not OK since data is kernel virtual address, not a user virtual address.
However, IFF you called kthread_create with the data argument equal to an __user pointer, this should be ok.
This might be a silly question but it just popped up in my mind. All the text about process address space and virtual memory layout mentions that the process address space has
space reserved for kernel. For e.g. on 32 bit systems the process address space is 4GB of which 1 GB is reserved for kernel in Linux (Might be different on other OS).
I am just wondering why kernel is said to be in the process address space when a process cannot address the kernel directly. Why don't we say that the kernel has a separate address space than a process and why can't we have a different page table for kernel itself which is separate from the page tables of the processes?
When the process makes a system call, we don't need to switch the page tables (from process address space page table to kernel address space page table) for servicing the system call (which should be done only in kernel mode). This is said to be that the kernel is running in the process context.
Some kernel events which won't run in process context will load the page tables only for kernel.
Got it ?
I've to implement a char device, a LKM.
I know some basics about OS, but I feel I don't have the big picture.
In a C programm, when I call a syscall what I think it happens is that the CPU is changed to ring0, then goes to the syscall vector and jumps to a kernel memmory space function that handle it. (I think that it does int 0x80 and in eax is the offset of the syscall vector, not sure).
Then, I'm in the syscall itself, but I guess that for the kernel is the same process that was before, only that it is in kernel mode, I mean the current PCB is the process that called the syscall.
So far... so good?, correct me if something is wrong.
Others questions... how can I write/read in process memory?.
If in the syscall handler I refer to address, say, 0xbfffffff. What it means that address? physical one? Some virtual kernel one?
To read/write memory from the kernel, you need to use function calls such as get_user or __copy_to_user.
See the User Space Memory Access API of the Linux Kernel.
You can never get to ring0 from a regular process.
You'll have to write a kernel module to get to ring0.
And you never have to deal with any physical addresses, 0xbfffffff represents an address in a virtual address space of your process.
Big picture:
Everything happens in assembly. So in Intel assembly, there is a set of privilege instruction which can only be executed in Ring0 mode (http://en.wikipedia.org/wiki/Privilege_level). To make the transition into Ring0 mode, you can use the "Int" or "Sysenter" instruction:
what all happens in sysenter instruction is used in linux?
And then inside the Ring0 mode (which is your kernel mode), accessing the memory will require the privilege level to be matched via DPL/CPL/RPL attributes bits tagged in the segment register:
http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection/
You may asked, how the CPU initialize the memory and register in the first place: it is because when bootup, x86 CPU is running in realmode, unprotected (no Ring concept), and so everything is possible and lots of setup work is done.
As for virtual vs non-virtual memory address (or physical address): just remember that anything in the register used for memory addressing, is always via virtual address (if the MMU is setup, protected mode enabled). Look at the picture here (noticed that anything from the CPU is virtual address, only the memory bus will see physical address):
http://en.wikipedia.org/wiki/Memory_management_unit
As for memory separation between userspace and kernel, you can read here:
http://www.inf.fu-berlin.de/lehre/SS01/OS/Lectures/Lecture14.pdf