What options do we have for communication between a user program and a Linux Kernel Module?

I am a newcomer to Linux kernel module programming. From the material that I have read so far, I have found three ways for a user program to request services from, or communicate with, a Linux kernel module:
 1. a device file in /dev
 2. a file in the /proc file system
 3. an ioctl() call
Question: What other options do we have for communication between a user program and a Linux kernel module?

Your option 3) is really a sub-option of option 1) - ioctl() is one way of interacting with a device file (read() and write() being the usual ways).
Two other ways worth considering are:
The sysfs filesystem;
Netlink sockets.
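As a minimal user-space sketch of the device-file route, the helpers below open a device node and issue a plain read() or write(). The names read_device and write_device are made up for this illustration, and /dev/zero and /dev/null only stand in for a module's own node.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Read up to `len` bytes from the device file at `path` into `buf`.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_device(const char *path, void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);  /* kernel dispatches to the driver's read handler */
    close(fd);
    return n;
}

/* Write `len` bytes from `buf` to the device file at `path`.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_device(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, len); /* kernel dispatches to the driver's write handler */
    close(fd);
    return n;
}
```

For example, read_device("/dev/zero", buf, 16) fills buf with zero bytes. An ioctl() call would use the same file descriptor, with request codes defined by the driver.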

Basically, many standard IPC mechanisms (cf. http://en.wikipedia.org/wiki/Inter-process_communication) can be used:
 1. File and memory-mapped file: a device file (as above) or a similarly special file in /dev, procfs, sysfs, debugfs, or a filesystem of your own, each of which can be combined with read/write, ioctl, or mmap
 2. Possibly signals (for use with a kthread)
 3. Sockets, using a protocol of choice: TCP, UDP (cf. knfsd, but likely not too easy), PF_LOCAL, or Netlink (many subinterfaces: base netlink, genetlink, Connector, ...)
Furthermore,
 4. System calls (not really usable from modules though)
 5. Network interfaces (akin to tun).
Working examples of Netlink can be found, to name just a few, in:
git://git.netfilter.org/libmnl (userspace side)
net/core/rtnetlink.c (base netlink)
net/netfilter/nf_conntrack_netlink.c (nfnetlink)
fs/quota/netlink.c (genetlink)
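For a feel of the user-space side without libmnl, here is a rough sketch that talks to rtnetlink directly: it sends an RTM_GETLINK dump request and counts the RTM_NEWLINK replies. The function name count_links and the 2-second receive timeout are choices made for this example only.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Ask the kernel to dump all network interfaces over a NETLINK_ROUTE
 * socket. Returns the number of interfaces reported, or -1 on error. */
int count_links(void)
{
    struct {
        struct nlmsghdr nh;
        struct ifinfomsg ifi;
    } req;
    struct nlmsghdr buf[1024];              /* 16 KiB, aligned for nlmsghdr */
    struct timeval tv = { .tv_sec = 2 };    /* do not block forever */
    int count = 0;

    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    memset(&req, 0, sizeof req);
    req.nh.nlmsg_len = NLMSG_LENGTH(sizeof req.ifi);
    req.nh.nlmsg_type = RTM_GETLINK;
    req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nh.nlmsg_seq = 1;
    req.ifi.ifi_family = AF_UNSPEC;

    /* An unconnected netlink socket sends to the kernel (port 0). */
    if (send(fd, &req, req.nh.nlmsg_len, 0) < 0)
        goto fail;

    for (;;) {
        int len = (int)recv(fd, buf, sizeof buf, 0);
        if (len <= 0)
            goto fail;
        /* One datagram may carry several netlink messages. */
        for (struct nlmsghdr *nh = buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
            if (nh->nlmsg_type == NLMSG_DONE) {
                close(fd);
                return count;
            }
            if (nh->nlmsg_type == NLMSG_ERROR)
                goto fail;
            if (nh->nlmsg_type == RTM_NEWLINK)
                count++;
        }
    }

fail:
    close(fd);
    return -1;
}
```

A module with its own netlink family would be addressed the same way, except that the protocol number and message types are whatever the module registers.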

This page covers all of these mechanisms, with examples :)
http://people.ee.ethz.ch/~arkeller/linux/kernel_user_space_howto.html

Runnable examples of everything
Too much talk is making me bored!
file operations:
  file types that implement file operations:
    procfs. See also: proc_create() example for kernel module
    debugfs
    character devices. See also: https://unix.stackexchange.com/questions/37829/how-do-character-device-or-character-special-files-work/371758#371758
    sysfs. See also: How to attach file operations to sysfs attribute in platform driver?
  file operation syscalls themselves
    open, read, write, close, lseek. See also: How to add poll function to the kernel module code?
    poll. See also: How do I use ioctl() to manipulate my kernel module?
    ioctl. See also: How do I use ioctl() to manipulate my kernel module?
    mmap. See also: How to mmap a Linux kernel buffer to user space?
    anonymous inodes. See also: What is an anonymous inode in Linux?
netlink sockets. See also: How to use netlink socket to communicate with a kernel module?
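From the user-space side, all of the file-based entries above look like ordinary file I/O. As a small sketch, the function below reads a procfs entry that the kernel itself provides, /proc/self/status, and extracts the "Pid:" field. The name proc_self_pid is made up for this example; a module's own procfs file would be read the same way.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Return the value of the "Pid:" line in /proc/self/status,
 * or -1 if the file cannot be read or the line is missing. */
long proc_self_pid(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;

    char line[256];
    long pid = -1;
    while (fgets(line, sizeof line, f)) {
        /* Lines look like "Pid:\t1234"; "PPid:" and "TracerPid:" do not match. */
        if (sscanf(line, "Pid: %ld", &pid) == 1)
            break;
    }
    fclose(f);
    return pid;
}
```

The contents of the file are generated by kernel code at read time, which is what makes procfs usable as a communication channel.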

This Linux document gives some of the ways in which the kernel and user space can interact (communicate). They are the following.
Procfs, sysfs, and similar mechanisms. This includes /dev entries as well, and all the methods in which kernel space exposes a file in user space (/proc, /dev, etc. entries are basically files exposed from the kernel space).
Socket-based mechanisms. Netlink is a type of socket designed specifically for communication between user space and kernel space.
System calls.
Upcalls. The kernel executes code in user space, for example by spawning a new process.
mmap - Memory mapping a region of kernel memory to user space. This allows both the kernel, and the user space to read/write to the same memory area.
Other than these, the following list adds some other mechanisms I know.
Interrupts. User space can raise software interrupts to talk to kernel space. For example, some CPUs use int 0x80 to make system calls (while others may use a different mechanism, such as the syscall instruction). The kernel has to define the corresponding interrupt handler in advance.
vDSO/vsyscall - These are mechanisms in the Linux kernel that optimize the execution of some system calls. The idea is to have a shared memory region; when a process makes one of these system calls, the user-space library gets the data from this region instead of actually entering the kernel. This saves the overhead of switching to kernel mode.
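To make the system call and vDSO items a bit more concrete, here is a small user-space sketch. raw_getpid issues getpid by number through the generic syscall() wrapper, and vdso_base asks the C library where the kernel mapped the vDSO into the current process. Both function names are made up for this example, and vdso_base may return 0 on systems without a vDSO.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/auxv.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke the getpid system call by number, bypassing the getpid() wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Address at which the kernel mapped the vDSO into this process,
 * as reported in the auxiliary vector; 0 if there is none. */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}
```

raw_getpid() returns the same value as getpid(); the only difference is that the syscall number is passed explicitly.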

Related

Virtual file for AES/SHA1 kernel side computing

Is there a file (in /dev perhaps) that allows me to compute AES or SHA1 on data? There are analogs like /dev/urandom /dev/zero etc.
It would work like this: open said file, write data to it and read results out of it. Using sendfile syscall would be useful here as well, copying data directly within kernel space.
Not as a device node. There is an interface to the kernel CryptoAPI, but it is exposed through a dedicated socket family (AF_ALG) rather than a file. More information is available in the Linux kernel documentation.
However, it is rarely useful unless you have a hardware crypto accelerator which is only available from the kernel. The overhead of system calls will often make this interface much slower than performing crypto operations directly in your process.
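For reference, a minimal sketch of hashing through AF_ALG could look like the following. It assumes the kernel was built with the user-space crypto API and a "sha1" hash; the function name kernel_sha1 is made up for this example, and it simply returns -1 when the interface is not available.

```c
#include <assert.h>
#include <linux/if_alg.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_ALG
#define AF_ALG 38   /* value from the Linux headers, for older C libraries */
#endif

/* Compute the SHA-1 digest of `data` using the kernel crypto API.
 * Returns 0 on success, -1 if AF_ALG or the algorithm is unavailable. */
int kernel_sha1(const void *data, size_t len, unsigned char digest[20])
{
    struct sockaddr_alg sa = {
        .salg_family = AF_ALG,
        .salg_type = "hash",
        .salg_name = "sha1",
    };

    /* The bound socket selects the algorithm ... */
    int tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfm < 0)
        return -1;
    if (bind(tfm, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(tfm);
        return -1;
    }

    /* ... and accept() yields a descriptor for one hashing operation. */
    int op = accept(tfm, NULL, NULL);
    if (op < 0) {
        close(tfm);
        return -1;
    }

    int rc = -1;
    if (write(op, data, len) == (ssize_t)len && read(op, digest, 20) == 20)
        rc = 0;

    close(op);
    close(tfm);
    return rc;
}
```

So the "write data, read result" workflow from the question does exist, only on a socket descriptor instead of a file in /dev.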

From Kernel Space to User Space: Inner-workings of Interrupts

I have been trying to understand how h/w interrupts end up in some user space code, through the kernel.
My research led me to understand that:
1- An external device needs attention from the CPU
2- It signals the CPU by raising an interrupt (h/w trace to the CPU or bus)
3- The CPU acknowledges the interrupt, saves the current context, and looks up the address of the ISR in the interrupt descriptor table (vector)
4- The CPU switches to kernel (privileged) mode and executes the ISR.
Question #1: How did the kernel store the ISR address in the interrupt vector table? Is it perhaps done by sending the CPU some piece of assembly described in the CPU's user manual? The more detail on this subject the better, please.
Question #2: In user space, how can a programmer write a piece of code that listens for h/w device notifications?
This is what I understand so far.
5- The kernel driver for that specific device now has the message from the device and is executing the ISR.
Question #3: If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understood so far). How is this done? How can a driver tell the kernel to be called upon a specific system call so that it can execute the request from the user? And then what happens: how does the driver give the requested data back to user space?
I might be completely off track here; any guidance would be appreciated.
I am not looking for detailed answers; I am only trying to understand the general picture.
"Question #1: How did the kernel store ISR address in interrupt vector table?"
The driver calls the request_irq kernel function (see include/linux/interrupt.h and kernel/irq/manage.c), and the Linux kernel registers the handler in the right way for the current CPU/architecture.
"It might probably be done by sending the CPU some piece of assembly described in the CPUs user manual?"
On x86, the Linux kernel stores ISR addresses in the Interrupt Descriptor Table (IDT). Its format is described by the vendor (Intel, volume 3) and also in many resources such as http://en.wikipedia.org/wiki/Interrupt_descriptor_table and http://wiki.osdev.org/IDT and http://phrack.org/issues/59/4.html and http://en.wikibooks.org/wiki/X86_Assembly/Advanced_Interrupts.
A pointer to the IDT is kept in a special CPU register (IDTR), which is loaded with the LIDT instruction and read back with SIDT.
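To illustrate what "storing the ISR address" means on x86-64, here is a sketch of the 16-byte gate descriptor layout with helpers that split a handler address into it and reassemble it. This only models the data format in user space; nothing is loaded into IDTR. The helper names and the selector value used in the usage note are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* One entry of the x86-64 Interrupt Descriptor Table. */
struct idt_gate {
    uint16_t offset_low;    /* handler address bits 0..15 */
    uint16_t selector;      /* code segment selector for the handler */
    uint8_t  ist;           /* bits 0..2: Interrupt Stack Table index */
    uint8_t  type_attr;     /* present bit, DPL, gate type */
    uint16_t offset_mid;    /* handler address bits 16..31 */
    uint32_t offset_high;   /* handler address bits 32..63 */
    uint32_t reserved;
} __attribute__((packed));

_Static_assert(sizeof(struct idt_gate) == 16, "x86-64 IDT entries are 16 bytes");

#define GATE_INTERRUPT 0x8E /* present, DPL 0, 64-bit interrupt gate */

/* Build a gate descriptor pointing at `handler`. */
struct idt_gate make_gate(uint64_t handler, uint16_t selector, uint8_t type_attr)
{
    struct idt_gate g = {
        .offset_low  = (uint16_t)(handler & 0xFFFF),
        .selector    = selector,
        .ist         = 0,
        .type_attr   = type_attr,
        .offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF),
        .offset_high = (uint32_t)(handler >> 32),
        .reserved    = 0,
    };
    return g;
}

/* Recover the handler address that the CPU would jump to. */
uint64_t gate_handler(const struct idt_gate *g)
{
    return (uint64_t)g->offset_low |
           ((uint64_t)g->offset_mid << 16) |
           ((uint64_t)g->offset_high << 32);
}
```

For instance, make_gate(0xffffffff81a00b40, 0x10, GATE_INTERRUPT) yields offset_low 0x0b40, offset_mid 0x81a0 and offset_high 0xffffffff. Drivers never build these entries themselves; request_irq hooks the handler into the kernel's generic interrupt code.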
"If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understood so far). How is this done? How can a driver tell the kernel to be called upon a specific systemcall so that it can execute the request from the user? And then what happens, how does the driver gives back the requested data to user space?"
The driver usually registers a device special file in /dev; pointers to several driver functions are registered for this file as "file operations". The user-space program opens this file (syscall open), and the kernel calls the driver's code for open; the program then calls the poll or read syscall on this fd, and the kernel calls the *poll or *read of the driver's file operations (http://www.makelinux.net/ldd3/chp-3-sect-7.shtml). The driver may put the caller to sleep (wait_event*), and the IRQ handler will wake it up (wake_up* - http://www.makelinux.net/ldd3/chp-6-sect-2 ).
You can read more about Linux driver creation in the book LINUX DEVICE DRIVERS (2005) by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman: https://lwn.net/Kernel/LDD3/
Chapter 3: Char Drivers https://lwn.net/images/pdf/LDD3/ch03.pdf
Chapter 10: Interrupt Handling https://lwn.net/images/pdf/LDD3/ch10.pdf
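The user-space half of that flow can be sketched as below. wait_readable is a made-up helper that blocks in poll() until the descriptor has data or the timeout expires. With a real driver, fd would come from opening its /dev node, and the wakeup would come from the driver's interrupt handler; for trying it out, a pipe works as a stand-in.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait until `fd` becomes readable.
 * Returns 1 if data is ready, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    /* For a device file, the kernel calls the driver's poll file
     * operation here and puts the caller to sleep if nothing is ready. */
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

After wait_readable() returns 1, a read() on the same descriptor fetches the data without blocking.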

What is the difference between Virtual File System and System Call?

As I understand it, the kernel provides mainly two interfaces for user space to do something in the kernel: system calls and the virtual file system (procfs, sysfs, etc.).
I read in a book that the VFS internally also uses system calls.
So I want to know how exactly these two are connected, and in which situations we should use the VFS over a system call and vice versa.
A system call is the generic facility for any user space process to switch from user mode to kernel mode.
It is like a call to a function that resides in the kernel, invoked from user space with a variable number of parameters, the most important of which is the syscall number.
The kernel maintains an architecture-specific array of supported system calls (= kernel functions) and basically dispatches any syscall coming from user space to the correct function based on the system call number passed from user space.
The Virtual File System is just an abstraction of a file system that provides you with standard functions to deal with anything that can be considered a file. So, for example, you can call "open", "close", "read", etc. on any file without being concerned about which filesystem the file is stored in.
The relation between the VFS and syscalls is that the VFS is basically code that resides in the kernel, and the only way to get to the kernel is through syscalls ("open" is a syscall, so is "close", etc.).
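The dispatch-by-number idea can be modelled in a few lines of plain C. This is a toy model only, not kernel code: the numbers NR_ADD and NR_NEG and the handlers behind them are invented for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(long a, long b);

/* Hypothetical syscall numbers for this toy model. */
enum { NR_ADD, NR_NEG, NR_MAX };

static long sys_add(long a, long b) { return a + b; }
static long sys_neg(long a, long b) { (void)b; return -a; }

/* The table maps a syscall number to the function implementing it. */
static const syscall_fn syscall_table[NR_MAX] = {
    [NR_ADD] = sys_add,
    [NR_NEG] = sys_neg,
};

/* Look up `nr` and call the handler; unknown numbers get -ENOSYS,
 * mirroring what the real kernel returns for unimplemented syscalls. */
long dispatch(unsigned long nr, long a, long b)
{
    if (nr >= NR_MAX || syscall_table[nr] == NULL)
        return -ENOSYS;
    return syscall_table[nr](a, b);
}
```

In this picture, "open" and "read" are simply two entries in the table whose handlers happen to call into the VFS layer.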

bypassing tty layer and copy to user

I would like to copy data to user space from a kernel module that receives data from a serial port and transfers it to DMA, which in turn forwards the data to the tty layer and finally to user space.
The current flow is
serial driver FIFO --> DMA --> TTY layer --> user space (the data is emptied from DMA to the tty layer upon expiration of a timer)
What I want to achieve is
serial driver FIFO --> DMA --> user space. (I am OK with using a timer to send the data to user space; if there is a better way, let me know.)
Also, the kernel module handling the serial FIFO -> DMA path is not a character device.
I would like to bypass the tty layer completely. What is the best way to achieve this?
Any pointers/code snippet would be appreciated.
In >=3.10.5 the "serial FIFO" that you refer to is called a uart_port. These are defined in drivers/tty/serial.
I assume that what you want to do is to copy the driver for your UART to a new file, then instead of using uart_insert_char to insert characters from the UART RX FIFO, you want to insert the characters into a buffer that you can access from user space.
The way to do this is to create a second driver, a misc class device driver that has file operations, including mmap, and that allocates kernel memory that the driver's mmap file operation associates with the user-space mapping. There is a good example of code for this written by Maxime Ripard. This example was written for a FIQ-handled device, but you can use just the probe routine's dma_zalloc_coherent call and the mmap routine, with its call to remap_pfn_range, to do the trick, that is, to associate a user-space mmap on the misc device file with the allocated memory.
You need to connect the memory that you allocated in your misc driver to the buffer that you write to in your UART driver using either a global void pointer, or else by using an exported symbol, if your misc driver is a module. Initialize the pointer to a known invalid value in the UART driver and test it to make sure the misc driver has assigned it before you try to insert characters to the address to which it points.
Note that you can't add an mmap function to the UART driver directly because the UART driver class does not support an mmap file operation. It only supports the operations defined in the include/linux/serial_core.h struct uart_ops.
Admittedly this is a cumbersome solution (two device drivers), but the alternative is to write a new device class, a UART device that has an mmap operation. That would be elegant, but a lot of work compared with the above solution. No one has done this to date because, as Jonathan Corbet says, "...not every device lends itself to the mmap abstraction; it makes no sense, for instance, for serial ports and other stream-oriented devices", though this is exactly what you are asking for.
I implemented this solution for a polling mode UART driver based on the mxs-auart.c code and Maxime's example. It was non-trivial effort but mostly because I am using a FIQ handler for the polling timer. You should allow two to three weeks to get the whole thing up and running.
The DMA aspect of your question depends on whether the UART supports DMA transfer mode. If so, then you should be able to set it using the serial flags. The i.MX28's PrimeCell auarts support DMA transfer but for my application there was no advantage over simply reading bytes directly from the UART RX FIFO.
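The answer above does not say how the shared buffer should be organized. One common choice, shown here purely as an assumption, is a single-producer/single-consumer ring: the UART driver only advances head, and the user-space reader only advances tail. The sketch below models that layout in user space with C11 atomics; a real driver would use the kernel's own barrier primitives instead. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 256u  /* illustrative; must be a power of two */

/* Hypothetical layout of the memory region shared via mmap. */
struct ring {
    _Atomic unsigned head;          /* written by the producer only */
    _Atomic unsigned tail;          /* written by the consumer only */
    unsigned char data[RING_SIZE];
};

/* Producer side (the UART RX path): returns 1 if stored, 0 if full. */
int ring_put(struct ring *r, unsigned char c)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return 0;
    r->data[head % RING_SIZE] = c;
    /* Publish the byte before making it visible through head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side (user space): returns 1 and fills *c, or 0 if empty. */
int ring_get(struct ring *r, unsigned char *c)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;
    *c = r->data[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```

With this layout the reader never needs a system call to fetch bytes; it only needs some way to learn that new data arrived, for example the timer mentioned in the question.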

Modifying Linux process page table for physical memory access without system call

I am developing a real-time application for Linux 3.5.7. The application needs to manage a PCI-E device.
In order to access the PCI-E card spaces, I have been using mmap in combination with /dev/mem. However (please correct me if I am wrong) each time I read or write the mapped memory, a system call is required for the /dev/mem pseudo-driver to handle the memory access.
To avoid the overhead of this system call, I think it should be possible to write a kernel module so that, within e.g. a ioctl call I can modify the process page table, in order to map the physical device pages to userspace pages and avoid the system call.
Can you give me some orientation on this?
Thanks and regards
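"However (please correct me if I am wrong) each time I read or write the mapped memory, a system call is required"
You are wrong. Once mmap() has set up the mapping, reads and writes to the mapped region are ordinary memory accesses; no system call is made per access.
"it should be possible to write a kernel module so that, within e.g. a ioctl call I can modify the process page table"
This is precisely what mmap() does.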
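A small sketch of the point: the only system calls are open(), mmap() and close(); after that the region is accessed through a plain pointer. map_device is a made-up helper, and /dev/zero is used in the usage note as a harmless stand-in for /dev/mem or a PCI resource file, which need privileges and real hardware addresses.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of the device file at `path` into this process.
 * Returns the mapped address, or NULL on failure. */
void *map_device(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    /* This is where the kernel updates the process page tables. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

For example, after unsigned char *p = map_device("/dev/zero", 4096), the statements p[0] = 42 and x = p[0] compile to ordinary store and load instructions. For device registers the pointer would normally be declared volatile.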
