I just learned to use ftrace and perf, and there is something they have in common that I don't understand: trace events. I guess they are some kernel-internal functions, and ftrace records their names when they're called, if they're enabled. Is that right? All the events are sorted into the groups listed below. Could someone tell me what they stand for, or where I can find detailed information about them? Thanks.
block btrfs compaction drm ext3 ext4 fs ftrace gpio header_event header_page irq jbd jbd2 kmem mce module napi net power raw_syscalls rcu regmap regulator rpm sched scsi signal skb sock syscalls timer udp vfs vmscan vsyscall workqueue writeback xen xfs
Each of those is the name of the part of the Linux kernel that defines the trace events (static tracepoints) in that group. For example, rcu is the Read-Copy-Update code, a lockless synchronization mechanism. The names roughly match the names of files or directories in the kernel source. Look in the Documentation directory of the kernel source (in particular Documentation/trace/) for more information.
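To see these groups on a live system, you can list the events directory of tracefs. Below is a minimal sketch. It assumes tracefs is mounted at /sys/kernel/tracing (older setups use /sys/kernel/debug/tracing) and that the program runs as root. Each group is a subdirectory holding one subdirectory per event. Entries such as header_page and header_event are plain files that describe the ring-buffer format, so the sketch counts directories only. The function name count_event_groups is made up for this example.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Count (and optionally print) the trace event groups in a tracefs
 * "events" directory. Only subdirectories are groups; plain files such as
 * header_page, header_event, enable and filter are skipped.
 * Returns the number of groups, or -1 if the directory cannot be opened. */
int count_event_groups(const char *events_dir, int print)
{
    DIR *dir = opendir(events_dir);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        /* Build the full path and keep only directories. */
        char path[4096];
        snprintf(path, sizeof(path), "%s/%s", events_dir, ent->d_name);
        struct stat st;
        if (stat(path, &st) == 0 && S_ISDIR(st.st_mode)) {
            if (print)
                printf("%s\n", ent->d_name);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Calling count_event_groups("/sys/kernel/tracing/events", 1) should print names such as sched and irq. To enable a single event, write 1 to its enable file, e.g. events/sched/sched_switch/enable.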
Is there a file (in /dev, perhaps) that allows me to compute AES or SHA1 on data? There are analogs such as /dev/urandom and /dev/zero.
It would work like this: open the file, write data to it, and read the results out of it. Using the sendfile syscall would be useful here as well, copying data directly within kernel space.
Not as a device node. There is an interface to the kernel CryptoAPI, but it goes through a dedicated socket address family (AF_ALG) rather than a file in /dev. More information is available in the Linux kernel documentation.
However, it is rarely useful unless you have a hardware crypto accelerator that is only accessible from the kernel. The overhead of system calls often makes this interface much slower than performing the crypto operations directly in your process.
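Below is a minimal sketch of the AF_ALG interface computing SHA-1. It assumes a kernel built with CONFIG_CRYPTO_USER_API_HASH, and the helper names sha1_af_alg and to_hex are illustrative. The pattern has five steps: create an AF_ALG socket, bind() it to an algorithm type and name, accept() an operation socket, write() the data, then read() the digest.

```c
#define _GNU_SOURCE
#include <linux/if_alg.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_ALG
#define AF_ALG 38 /* Linux value, in case the C library lacks the constant */
#endif

/* Compute SHA-1 of `data` through the kernel crypto API (AF_ALG).
 * Returns 0 on success, -1 if AF_ALG or sha1 is unavailable. */
int sha1_af_alg(const void *data, size_t len, unsigned char digest[20])
{
    struct sockaddr_alg sa;
    memset(&sa, 0, sizeof(sa));
    sa.salg_family = AF_ALG;
    strcpy((char *)sa.salg_type, "hash");
    strcpy((char *)sa.salg_name, "sha1");

    /* The "transform" socket selects the algorithm... */
    int tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfm < 0)
        return -1;
    if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(tfm);
        return -1;
    }
    /* ...and each accept()ed socket is one operation using it. */
    int op = accept(tfm, NULL, 0);
    if (op < 0) {
        close(tfm);
        return -1;
    }

    int rc = -1;
    /* write() feeds the data in; read() returns the final digest. */
    if (write(op, data, len) == (ssize_t)len && read(op, digest, 20) == 20)
        rc = 0;

    close(op);
    close(tfm);
    return rc;
}

/* Format a binary buffer as lowercase hex (out needs 2*n+1 bytes). */
void to_hex(const unsigned char *in, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++)
        sprintf(out + 2 * i, "%02x", in[i]);
    out[2 * n] = '\0';
}
```

Each message costs one write() and one read(), plus the socket setup. That is the syscall overhead mentioned above. The kernel documentation also describes a zero-copy input path using splice(), which is closer to the sendfile idea in the question.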
I have been trying to understand how hardware interrupts end up in user-space code, through the kernel.
My research led me to understand that:
1- An external device needs attention from the CPU.
2- It signals the CPU by raising an interrupt (a hardware line to the CPU or bus).
3- The CPU acknowledges the interrupt, saves the current context, and looks up the address of the ISR in the interrupt descriptor table (vector).
4- The CPU switches to kernel (privileged) mode and executes the ISR.
Question #1: How did the kernel store the ISR address in the interrupt vector table? Is it perhaps done by sending the CPU some piece of assembly described in the CPU's user manual? The more detail on this subject, the better, please.
Question #2: In user space, how can a programmer write a piece of code that listens for hardware device notifications?
This is what I understand so far.
5- The kernel driver for that specific device now has the message from the device and is executing the ISR.
Question #3: If the programmer in user space wanted to poll the device, I assume this would be done through a system call (or at least this is what I have understood so far). How is this done? How can a driver tell the kernel that it should be called on a specific system call, so that it can execute the user's request? And then what happens: how does the driver give the requested data back to user space?
I might be completely off track here; any guidance would be appreciated.
I am not looking for detailed answers; I am only trying to understand the general picture.
Question #1: How did the kernel store the ISR address in the interrupt vector table?
The driver calls the request_irq() kernel function (declared in include/linux/interrupt.h; the underlying request_threaded_irq() is implemented in kernel/irq/manage.c), and the Linux kernel registers the handler in the right way according to the current CPU/architecture rules.
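request_irq(irq, handler, flags, name, dev) records the handler in the kernel's generic IRQ layer. One effect visible from user space is that `name` shows up on that IRQ's row in /proc/interrupts. The row also carries per-CPU counters that increase each time the interrupt fires. Below is a sketch of a parser for those rows. It assumes the usual layout: a header line of CPUn columns, then rows of "label: count count ... description". The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Count the "CPUn" column headers on the first line of /proc/interrupts. */
int count_cpu_columns(const char *header)
{
    int n = 0;
    for (const char *p = header; (p = strstr(p, "CPU")) != NULL; p += 3)
        n++;
    return n;
}

/* Parse one /proc/interrupts row: copy the label before the first ':'
 * (an IRQ number or a name such as "NMI") into `label` and sum the
 * per-CPU counters. Returns 0 on success, -1 if the row has no ':'. */
int parse_irq_row(const char *row, int ncpus, char *label, size_t label_len,
                  unsigned long long *total)
{
    const char *colon = strchr(row, ':');
    if (colon == NULL)
        return -1;

    /* Skip the leading spaces used for column alignment. */
    while (*row == ' ')
        row++;
    size_t n = (size_t)(colon - row);
    if (n >= label_len)
        n = label_len - 1;
    memcpy(label, row, n);
    label[n] = '\0';

    /* The per-CPU counts follow the colon; some rows have fewer columns. */
    *total = 0;
    const char *p = colon + 1;
    for (int i = 0; i < ncpus; i++) {
        char *end;
        unsigned long long v = strtoull(p, &end, 10);
        if (end == p)
            break; /* reached the text description */
        *total += v;
        p = end;
    }
    return 0;
}
```

To watch a driver's interrupt count change, read /proc/interrupts line by line with fgets(). Pass the first line to count_cpu_columns() and every later line to parse_irq_row().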
Is it perhaps done by sending the CPU some piece of assembly described in the CPU's user manual?
On x86, the Linux kernel stores ISRs in the Interrupt Descriptor Table (IDT). Its format is described by the vendor (Intel SDM, Volume 3) and in many other resources, such as http://en.wikipedia.org/wiki/Interrupt_descriptor_table, http://wiki.osdev.org/IDT, http://phrack.org/issues/59/4.html and http://en.wikibooks.org/wiki/X86_Assembly/Advanced_Interrupts.
The base address and size of the IDT are loaded into a special CPU register (IDTR) with the LIDT instruction; the SIDT instruction stores the current IDTR contents back to memory.
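To make the IDT format concrete, here is an illustrative C layout of one 64-bit gate descriptor and of the 10-byte operand that LIDT/SIDT use, based on the Intel SDM Vol. 3 description. It is a sketch, not the kernel's own definition. On x86, recent kernels fill the table from C code in arch/x86/kernel/idt.c and then execute LIDT. The handler address and the selector 0x10 used below are example values.

```c
#include <stdint.h>

/* Illustrative layout of one x86-64 IDT gate descriptor (16 bytes). */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment selector (kernel CS) */
    uint8_t  ist;          /* bits 0..2: Interrupt Stack Table index */
    uint8_t  type_attr;    /* gate type, DPL and present bit */
    uint16_t offset_mid;   /* handler address bits 16..31 */
    uint32_t offset_high;  /* handler address bits 32..63 */
    uint32_t reserved;
} __attribute__((packed));

/* The memory operand of LIDT/SIDT in 64-bit mode. */
struct idtr {
    uint16_t limit;        /* size of the table in bytes, minus 1 */
    uint64_t base;         /* linear address of the first gate */
} __attribute__((packed));

_Static_assert(sizeof(struct idt_gate) == 16, "IDT gate must be 16 bytes");
_Static_assert(sizeof(struct idtr) == 10, "IDTR image must be 10 bytes");

/* Fill a present, DPL 0, 64-bit interrupt gate pointing at `handler`. */
struct idt_gate make_interrupt_gate(uint64_t handler, uint16_t selector)
{
    struct idt_gate g = {0};
    g.offset_low  = (uint16_t)(handler & 0xffff);
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xffff);
    g.offset_high = (uint32_t)(handler >> 32);
    g.selector    = selector;
    g.type_attr   = 0x8E;  /* P=1, DPL=0, type=0xE (interrupt gate) */
    return g;
}

/* Reassemble the handler address the CPU jumps to for this vector. */
uint64_t gate_handler(const struct idt_gate *g)
{
    return (uint64_t)g->offset_low |
           ((uint64_t)g->offset_mid << 16) |
           ((uint64_t)g->offset_high << 32);
}
```

The handler address is split across three fields. The CPU reassembles it when the vector fires, which is why the table is filled by ordinary stores rather than a special "register ISR" instruction.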
If the programmer in user space wanted to poll the device, I assume this would be done through a system call (or at least this is what I have understood so far). How is this done? How can a driver tell the kernel that it should be called on a specific system call, so that it can execute the user's request? And then what happens: how does the driver give the requested data back to user space?
The driver usually registers a device special file in /dev; pointers to several driver functions are registered for this file as its "file operations". A user-space program opens this file (the open syscall), and the kernel calls the device's open handler; then the program calls the poll or read syscall on this fd, and the kernel calls the *poll or *read method from the driver's file operations (http://www.makelinux.net/ldd3/chp-3-sect-7.shtml). The driver's read method copies the data into the user's buffer (copy_to_user()). The driver may put the caller to sleep (wait_event*), and the IRQ handler will wake it up (wake_up*, http://www.makelinux.net/ldd3/chp-6-sect-2).
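From the user-space side, the poll-then-read sequence looks like the sketch below. /dev/mydev is a hypothetical device node. Any descriptor whose driver implements .poll and .read behaves the same way, so a pipe can stand in for it when trying the code. The name wait_and_read is also made up.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for `fd` to become readable, then read from it.
 * For a character device, poll() ends up in the driver's .poll method and
 * read() in its .read method. Returns bytes read, 0 on timeout (or EOF),
 * or -1 on error. */
ssize_t wait_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR); /* retry if a signal interrupted us */

    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0; /* timed out, nothing to read */
    if (pfd.revents & (POLLERR | POLLNVAL))
        return -1;

    return read(fd, buf, len);
}
```

Typical use is int fd = open("/dev/mydev", O_RDONLY); followed by wait_and_read(fd, buf, sizeof(buf), 1000) in a loop. While the process waits in poll(), the driver has it asleep on a wait queue, and the IRQ handler's wake_up* call is what makes poll() return.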
You can read more about writing Linux drivers in the book LINUX DEVICE DRIVERS, 3rd edition (2005), by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman: https://lwn.net/Kernel/LDD3/
Chapter 3: Char Drivers https://lwn.net/images/pdf/LDD3/ch03.pdf
Chapter 10: Interrupt Handling https://lwn.net/images/pdf/LDD3/ch10.pdf
Assume that GPIO X can be exported in sysfs as an input pin. After doing that, a directory called gpioX is created in /sys/class/gpio/. gpioX/ contains a few files, such as "value", which represents the current state of GPIO X (high or low).
What happens (in kernel space) when the signal applied to pin X changes its state (for example, from low to high)?
I mean, before the transition gpioX/value reads 0 (low), but after it, it reads 1 (high). How is this file updated by the OS?
I think that an interrupt mechanism is required. Does it use an interrupt mechanism to update sysfs?
How is this file updated by the OS? I think that an interrupt mechanism is required.
It does not require an interrupt mechanism unless it supports polling (man poll) or other asynchronous notifications. At least in most versions, /sys/class/gpio/ only reads the GPIO level when someone reads the file.
sysfs, debugfs, configfs, procfs, etc. are virtual file systems. When you access the file, code within the Linux kernel runs to provide the value. sysfs only provides a file-like interface; that doesn't mean it is backed by stored state. The state is the GPIO level, which can be read at any time.
gpio_value_show() appears to be the current implementation. What you describe with interrupts is possible. It can be done through the sysfs_set_active_low() function or the sysfs file /sys/class/gpio/gpioN/edge. Writing to the file may return an error if the GPIO doesn't support interrupts. See gpio.txt for more (especially for your particular version of Linux).
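With edge detection enabled, user space can sleep until the interrupt fires instead of re-reading the value. The sketch below follows the sysfs GPIO documentation. It assumes gpioN has been exported as an input and that "both" (or "rising"/"falling") has been written to its edge file. The helper names are hypothetical. Newer kernels deprecate this sysfs interface in favour of the GPIO character device (/dev/gpiochipN).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Read the current level from an open gpioN/value descriptor. It seeks
 * back to offset 0 first, as the sysfs GPIO documentation asks after each
 * poll() wake-up. Returns 0 or 1, or -1 on error. */
int read_gpio_value(int fd)
{
    char c;
    if (lseek(fd, 0, SEEK_SET) < 0 || read(fd, &c, 1) != 1)
        return -1;
    if (c == '0' || c == '1')
        return c - '0';
    return -1;
}

/* Block until the pin changes (the edge file must already be set), then
 * return the new level. Returns -1 on error, -2 on timeout. */
int wait_gpio_edge(int value_fd, int timeout_ms)
{
    /* Read once so that only later changes wake us up. */
    if (read_gpio_value(value_fd) < 0)
        return -1;

    /* sysfs reports an edge as POLLPRI | POLLERR, not POLLIN. */
    struct pollfd pfd = { .fd = value_fd, .events = POLLPRI | POLLERR };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;
    return read_gpio_value(value_fd);
}
```

The value_fd would come from open("/sys/class/gpio/gpioN/value", O_RDONLY). The kernel side then notifies the sysfs file from the GPIO's interrupt handler, which is the interrupt path the question describes.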
Is there any way to get the address and size of the code segment of a Linux kernel thread (like task_struct->mm->mmap->vm_start and vm_end for an active task with task_struct->mm != 0)?
I would recommend going through the taskstats interface of the Linux kernel, which can provide information on all Linux threads, including VM stats.
Have a look at the documentation, as well as the header for the interface.
There is no easy way to hack into the kernel to enumerate all the available task_structs.
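taskstats is a generic netlink family. A client therefore first has to look up its numeric family id by name ("TASKSTATS") through the generic netlink controller. The sketch below covers only that first step and uses hypothetical helper names. The per-task query (TASKSTATS_CMD_GET with a TASKSTATS_CMD_ATTR_PID attribute) uses the same message layout. tools/accounting/getdelays.c in the kernel tree is a complete client.

```c
#define _GNU_SOURCE
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build a CTRL_CMD_GETFAMILY request for `family` into buf.
 * Returns the message length, or 0 if buf is too small. */
size_t build_getfamily(char *buf, size_t buflen, const char *family)
{
    size_t name_len = strlen(family) + 1;                 /* include the NUL */
    size_t len = NLMSG_LENGTH(GENL_HDRLEN) + NLA_HDRLEN + NLA_ALIGN(name_len);
    if (len > buflen)
        return 0;
    memset(buf, 0, len);

    struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
    nlh->nlmsg_len = (__u32)len;
    nlh->nlmsg_type = GENL_ID_CTRL;                       /* the controller */
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq = 1;

    struct genlmsghdr *genl = NLMSG_DATA(nlh);
    genl->cmd = CTRL_CMD_GETFAMILY;
    genl->version = 1;

    /* One attribute: the family name we are asking about. */
    struct nlattr *nla = (struct nlattr *)((char *)genl + GENL_HDRLEN);
    nla->nla_type = CTRL_ATTR_FAMILY_NAME;
    nla->nla_len = (__u16)(NLA_HDRLEN + name_len);
    memcpy((char *)nla + NLA_HDRLEN, family, name_len);
    return len;
}

/* Send the request and return CTRL_ATTR_FAMILY_ID from the reply,
 * or -1 (no netlink access, unknown family, ...). */
int resolve_genl_family(const char *family)
{
    char req[128];
    char resp[4096] __attribute__((aligned(4)));
    size_t len = build_getfamily(req, sizeof(req), family);
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
    if (len == 0 || fd < 0) {
        if (fd >= 0)
            close(fd);
        return -1;
    }

    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK }; /* pid 0 = kernel */
    int id = -1;
    if (sendto(fd, req, len, 0, (struct sockaddr *)&kernel, sizeof(kernel)) == (ssize_t)len) {
        ssize_t n = recv(fd, resp, sizeof(resp), 0);
        struct nlmsghdr *nlh = (struct nlmsghdr *)resp;
        if (n > 0 && NLMSG_OK(nlh, n) && nlh->nlmsg_type != NLMSG_ERROR) {
            /* Walk the attributes that follow the genetlink header. */
            struct nlattr *nla = (struct nlattr *)((char *)NLMSG_DATA(nlh) + GENL_HDRLEN);
            int rem = (int)nlh->nlmsg_len - (int)NLMSG_LENGTH(GENL_HDRLEN);
            while (rem >= NLA_HDRLEN && nla->nla_len >= NLA_HDRLEN && nla->nla_len <= rem) {
                if ((nla->nla_type & NLA_TYPE_MASK) == CTRL_ATTR_FAMILY_ID) {
                    id = *(__u16 *)((char *)nla + NLA_HDRLEN);
                    break;
                }
                rem -= NLA_ALIGN(nla->nla_len);
                nla = (struct nlattr *)((char *)nla + NLA_ALIGN(nla->nla_len));
            }
        }
    }
    close(fd);
    return id;
}
```

resolve_genl_family("TASKSTATS") returns the id to put in nlmsg_type for the actual taskstats requests. On kernels without CONFIG_TASKSTATS it fails.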
I am a newcomer to Linux kernel module programming. From the material I have read so far, I have found that there are 3 ways for a user program to request services from, or communicate with, a Linux kernel module:
a device file in /dev
a file in /proc file system
ioctl() call
Question: What other options do we have for communication between user program and linux kernel module?
Your option 3) is really a sub-option of option 1) - ioctl() is one way of interacting with a device file (read() and write() being the usual ways).
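To show that ioctl() is just another operation on the device file's descriptor, here is a hypothetical command set for an imaginary /dev/mydev. MYDEV_*, mydev_config and mydev_get_count are invented for illustration. The _IO/_IOR/_IOW macros pack four things into the request code: the direction, a "magic" type byte, a command number and the argument size. The driver decodes the same value in its .unlocked_ioctl file operation.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical argument structure shared by driver and user space. */
struct mydev_config {
    uint32_t sample_rate;
    uint32_t flags;
};

/* Illustrative magic letter and numbers; a real driver should pick values
 * that do not clash with those in the kernel's ioctl-number list. */
#define MYDEV_IOC_MAGIC  'm'
#define MYDEV_RESET      _IO(MYDEV_IOC_MAGIC, 0)                       /* no argument */
#define MYDEV_GET_COUNT  _IOR(MYDEV_IOC_MAGIC, 1, uint32_t)            /* kernel -> user */
#define MYDEV_SET_CONFIG _IOW(MYDEV_IOC_MAGIC, 2, struct mydev_config) /* user -> kernel */

/* User-space side: fetch a counter through the same descriptor that
 * read()/write() use. Returns 0 on success, -1 with errno set on failure. */
int mydev_get_count(int fd, uint32_t *count)
{
    return ioctl(fd, MYDEV_GET_COUNT, count);
}
```

In this layout, read()/write() carry the data stream, and ioctl() carries out-of-band commands such as reset or configuration.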
Two other ways worth considering are:
The sysfs filesystem;
Netlink sockets.
Basically, many standard IPC mechanisms (cf. http://en.wikipedia.org/wiki/Inter-process_communication) can be used:
Files and memory-mapped files: a device file (as above) or a similar special file in /dev, procfs, sysfs, debugfs, or a filesystem of your own, each combined with read/write, ioctl, or mmap
Possibly signals (for use with a kthread)
Sockets: using a protocol of choice: TCP, UDP (cf. knfsd, but likely not too easy), PF_LOCAL, or Netlink (many subinterfaces - base netlink, genetlink, Connector, ...)
Furthermore,
4. System calls (not really usable from modules though)
5. Network interfaces (akin to tun).
Working examples of Netlink, to name just a few, can be found in:
git://git.netfilter.org/libmnl (userspace side)
net/core/rtnetlink.c (base netlink)
net/netfilter/nf_conntrack_netlink.c (nfnetlink)
fs/quota/netlink.c (genetlink)
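For a feel of the raw protocol, here is a userspace sketch that talks to the base netlink code in net/core/rtnetlink.c. It requests a dump of all network interfaces (RTM_GETLINK with NLM_F_DUMP) and counts the RTM_NEWLINK replies. count_links is a made-up name. Real programs would usually use libmnl or a similar library instead of the raw macros.

```c
#define _GNU_SOURCE
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Dump all links over NETLINK_ROUTE and count them.
 * Returns the number of links, or -1 on error. */
int count_links(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    struct {
        struct nlmsghdr nlh;
        struct ifinfomsg ifi;
    } req;
    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req.nlh.nlmsg_type = RTM_GETLINK;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nlh.nlmsg_seq = 1;
    req.ifi.ifi_family = AF_UNSPEC;

    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    if (sendto(fd, &req, req.nlh.nlmsg_len, 0,
               (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        close(fd);
        return -1;
    }

    /* A dump arrives as several datagrams, terminated by NLMSG_DONE. */
    char buf[32768] __attribute__((aligned(4)));
    int count = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n <= 0) {
            count = -1;
            break;
        }
        int len = (int)n;
        for (struct nlmsghdr *nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len);
             nlh = NLMSG_NEXT(nlh, len)) {
            if (nlh->nlmsg_type == NLMSG_DONE)
                goto out;
            if (nlh->nlmsg_type == NLMSG_ERROR) {
                count = -1;
                goto out;
            }
            if (nlh->nlmsg_type == RTM_NEWLINK)
                count++;
        }
    }
out:
    close(fd);
    return count;
}
```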
This includes all types with examples :)
http://people.ee.ethz.ch/~arkeller/linux/kernel_user_space_howto.html
Runnable examples of everything
Too much talk is making me bored!
file operations:
file types that implement file operations:
procfs. See also: proc_create() example for kernel module
debugfs
character devices. See also: https://unix.stackexchange.com/questions/37829/how-do-character-device-or-character-special-files-work/371758#371758
sysfs. See also: How to attach file operations to sysfs attribute in platform driver?
file operation syscalls themselves
open, read, write, close, lseek
poll. See also: How to add poll function to the kernel module code?
ioctl. See also: How do I use ioctl() to manipulate my kernel module?
mmap. See also: How to mmap a Linux kernel buffer to user space?
anonymous inodes. See also: What is an anonymous inode in Linux?
netlink sockets. See also: How to use netlink socket to communicate with a kernel module?
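A quick way to see an anonymous inode from user space is eventfd(). The descriptor has no path in any filesystem, yet read() and write() on it are served by kernel file operations. Here is a minimal sketch (eventfd_roundtrip is a hypothetical name): writes add to a 64-bit counter, and a read returns the total and resets it.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create an eventfd, add a and b to its counter with two writes, and read
 * the sum back (which also resets the counter to zero).
 * Returns 0 on success, -1 on failure. */
int eventfd_roundtrip(uint64_t a, uint64_t b, uint64_t *sum)
{
    int fd = eventfd(0, EFD_CLOEXEC);
    if (fd < 0)
        return -1;

    int rc = -1;
    if (write(fd, &a, sizeof(a)) == (ssize_t)sizeof(a) &&
        write(fd, &b, sizeof(b)) == (ssize_t)sizeof(b) &&
        read(fd, sum, sizeof(*sum)) == (ssize_t)sizeof(*sum))
        rc = 0;

    close(fd);
    return rc;
}
```

On a typical system, ls -l /proc/<pid>/fd shows such a descriptor as anon_inode:[eventfd]. Kernel subsystems such as KVM and VFIO also accept eventfds as a kernel-to-user notification channel.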
This Linux document gives some of the ways in which the kernel and user space can interact (communicate). They are the following:
Procfs, sysfs, and similar mechanisms. This includes /dev entries as well, and all the methods by which kernel space exposes a file to user space (/proc, /dev, etc. entries are basically files exposed from kernel space).
Socket-based mechanisms. Netlink is a type of socket that is meant specifically for communication between user space and kernel space.
System calls.
Upcalls. The kernel executes code in user space, for example by spawning a new process.
mmap. Memory-mapping a region of kernel memory into user space. This allows both the kernel and user space to read/write the same memory area.
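The mmap mechanism can be sketched with a MAP_SHARED mapping. With a driver, fd would be a device node whose .mmap file operation maps kernel pages, often with remap_pfn_range(). Here a regular file stands in so the code runs anywhere, and write_via_mmap is a hypothetical name. Stores through the mapping are visible to read()/pread() on the same file without any write() call.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first `len` bytes of `fd` shared and copy `msg` (at least len
 * bytes) into them with plain memory stores. The file must already be at
 * least `len` bytes long. Returns 0 on success, -1 on failure. */
int write_via_mmap(int fd, const char *msg, size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, msg, len);  /* no write() syscall involved */
    munmap(p, len);
    return 0;
}
```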
Other than these, the following list adds some other mechanisms I know of.
Software interrupts. User space can raise a software interrupt to enter kernel space. For example, 32-bit x86 Linux has traditionally used int 0x80 to make system calls (while other CPUs and modes use a dedicated instruction, such as syscall). The kernel has to install the corresponding interrupt handler in advance.
vDSO/vsyscall. These are mechanisms in the Linux kernel to speed up some system calls. The kernel maps a small shared code and data region into every process; for calls such as gettimeofday() or clock_gettime(), the user-space library runs this code, which reads the data from the shared region instead of actually making the system call. This saves the overhead of entering the kernel.
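Both points can be observed from user space. The sketch below does three things, and its function names are hypothetical. It issues getpid through the generic syscall() helper, which enters the kernel with the architecture's system-call instruction. It finds where the kernel mapped the vDSO via getauxval(AT_SYSINFO_EHDR). It calls clock_gettime(), which glibc normally services through the vDSO. Whether a given call actually avoids the kernel depends on the architecture and the clock source.

```c
#define _GNU_SOURCE
#include <sys/auxv.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Compare getpid() via the libc wrapper with a raw syscall(SYS_getpid). */
int raw_and_libc_getpid_match(void)
{
    long raw = syscall(SYS_getpid);
    return raw == (long)getpid();
}

/* Address where the kernel mapped the vDSO into this process
 * (0 if the auxiliary vector has no AT_SYSINFO_EHDR entry). */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Monotonic time in nanoseconds; usually served from the vDSO. */
int monotonic_ns(long long *ns)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    *ns = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return 0;
}
```

Running the program under strace typically shows the getpid call but no clock_gettime call. That absence is the vDSO at work.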