Linux kernel code space write protection - linux-kernel

I had couple of questions on linux kernel memory page write protection.
How can i figure out if the kernel
code (text segment) is write
protected or not. I can look at
/proc/<process-id>/map to see the
memory map for various processes.
But not sure where to look for the
kernel code memory map.
If the kernel code segment is write
protected, then is it possible for
the code segment pages to be
overwritten by any other kernel
level code. In other words, does the
write protect on a text segment page
protects against only the user space
code writing to it or will it
prevent writes even from within the
kernel space code.
Thanks

Code running in the kernel has direct access to the page tables for the current address space, so it can check for write access by examining those. There are probably functions to help you with that check, but I'm not familiar enough with the mm code to point them out. Is there an easier way? I'm not sure.
The kernel text should never be writable from user-space. The text can additionally be protected against writing from kernel code too (I think this is what you're talking about). This is only a basic protection against bugs. Kernel code, if it really wants to, can disable that protection by modifying the page tables directly.

There is one paper talking about that. Basically, it uses a small hypervisor to protect the OS kernel.
SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes.
http://www.sosp2007.org/papers/sosp079-seshadri.pdf

Related

How to pin a interrupt to a CPU in driver

Is it possible to pin a softirq, or any other bottom half to a processor. I have a doubt that this could be done from within a softirq code.
But then inside a driver is it possible to pin a particular IRQ to a
core.
From user mode, you can easily do this by writing to /proc/irq/N/smp_affinity to control which processor(s) an interrupt is directed to. The symbols for the code implementing this are not exported though, so it's difficult to do from the kernel (at least for a loadable module which is how most drivers are structured).
The fact that the implementing function symbols aren't exported is a sign that the kernel developers don't want to encourage this. Presumably that's because it takes control away from the user. And also embeds assumptions about number of processors and so forth into the driver.
So, to answer your question, yes, it's possible, but it's discouraged, and you would need to do one of several "ugly" things to implement it ((a) change kernel exports, (b) link your driver statically into main kernel, or (c) open/write to the proc file from kernel mode).
The usual way to achieve this is by writing a user-mode program (can even be a shell script) that programs core numbers/masks into the appropriate proc file. See Documentation/IRQ-affinity.txt in the kernel source directory for details.

Callback from userspace to kernel space

I am looking into the fpga driver code which will write some value to FPGA device at low level. At top level in user space value is being written to /dev/fpga, now I guess this is the logic how driver gets its value from user-space and exposed file in user space is "/dev/fpga".
But now how actually this value from fpga is reached to device , there must some callback maintained.
But I really could not figure out how it actually happens,Is there any standard way?
Anybody can help me find out this userspace to kernel space link.
It's probably a character device. You can create one in your kernel module, and your callback functions will be called in the kernel when it is opened, something is written to it etc. See:
http://linux.die.net/lkmpg/x569.html
for an explanation how it works and sample code.

Modifying Linux process page table for physical memory access without system call

I am developing a real-time application for Linux 3.5.7. The application needs to manage a PCI-E device.
In order to access the PCI-E card spaces, I have been using mmap in combination with /dev/mem. However (please correct me if I am wrong) each time I read or write the mapped memory, a system call is required for the /dev/mem pseudo-driver to handle the memory access.
To avoid the overhead of this system call, I think it should be possible to write a kernel module so that, within e.g. a ioctl call I can modify the process page table, in order to map the physical device pages to userspace pages and avoid the system call.
Can you give me some orientation on this?
Thanks and regards
However (please correct me if I am wrong) each time I read or write the mapped memory, a system call is required
You are wrong.
it should be possible to write a kernel module so that, within e.g. a ioctl call I can modify the process page table
This is precisely what mmap() does.

What happens when I printk a char * that was initialized in userspace?

I implemented a new system call as an intro exercise. All it does is take in a buffer and printk that buffer. I later learned that the correct practice would be to use copy_from_user.
Is this just a precautionary measure to validate the address, or is my system call causing some error (page fault?) that I cannot see?
If it is just a precautionary measure, what is it protecting against?
Thanks!
There are several reasons.
Some architectures employ segmented memory, where there is a separate segment for the user memory. In that case, copy_from_user is essential to actually get the right memory address.
The kernel has access to everything, including (almost by definition) a lot of privileged information. Not using copy_from_user could allow information disclosure if a user passes in a kernel address. Worse, if you are writing to a user-supplied buffer without copy_to_user, the user could overwrite kernel memory.
You'd like to prevent the user from crashing the kernel module just by passing in a bad pointer; using copy_from_user protects against faults so e.g. a system call handler can return EFAULT in response to a bad user pointer.

Change user space memory protection flags from kernel module

I am writing a kernel module that has access to a particular process's memory. I have done an anonymous mapping on some of the user space memory with do_mmap():
#define MAP_FLAGS (MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS)
prot = PROT_WRITE;
retval = do_mmap(NULL, vaddr, vsize, prot, MAP_FLAGS, 0);
vaddr and vsize are set earlier, and the call succeeds. After I write to that memory block from the kernel module (via copy_to_user), I want to remove the PROT_WRITE permission on it (like I would with mprotect in normal user space). I can't seem to find a function that will allow this.
I attempted unmapping the region and remapping it with the correct protections, but that zeroes out the memory block, erasing all the data I just wrote; setting MAP_UNINITIALIZED might fix that, but, from the man pages:
MAP_UNINITIALIZED (since Linux 2.6.33)
Don't clear anonymous pages. This flag is intended to improve performance on embedded
devices. This flag is only honored if the kernel was configured with the
CONFIG_MMAP_ALLOW_UNINITIALIZED option. Because of the security implications, that option
is normally enabled only on embedded devices (i.e., devices where one has complete
control of the contents of user memory).
so, while that might do what I want, it wouldn't be very portable. Is there a standard way to accomplish what I've suggested?
After some more research, I found a function called get_user_pages() (best documentation I've found is here) that returns a list of pages from userspace at a given address that can be mapped to kernel space with kmap() and written to that way (in my case, using kernel_read()). This can be used as a replacement for copy_to_user() because it allows forcing write permissions on the pages retrieved. The only drawback is that you have to write page by page, instead of all in one go, but it does solve the problem I described in my question.
In userspace there is a system call mprotect that can modify the protection flags on existing mapping. You probably need to follow from the implementation of that system call, or maybe simply call it directly from your code. See mm/protect.c.

Resources