change smp_affinity from linux device driver - linux-kernel

If I run
cat /proc/interrupts
on an SMP system, all the IRQs are listed under CPU0.
I can change the smp_affinity mask to bind an IRQ to a particular CPU with the following command:
echo "4" > /proc/irq/230/smp_affinity
The above command sets the affinity mask of interrupt 230 to CPU 2.
I would like to achieve the same from a Linux kernel module. How can I do this?
I see the create_proc_entry method, which allows creating a new proc entry.
Is there any method we can use to write to an existing proc entry?

In a kernel module you can just call the kernel API function irq_set_affinity(...) directly. No need to go through /proc. See: http://lxr.free-electrons.com/source/kernel/irq/manage.c#L189
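For reference, the value written in the question is a hexadecimal bitmap with one bit per CPU, so CPU 2 corresponds to bit 2, i.e. 4. The user-space sketch below only derives that string; `affinity_mask_for_cpu` is a hypothetical helper name, and it is limited to CPUs 0 to 31 because larger masks are written as comma-separated 32-bit groups. Inside a module, the same CPU would be passed as a `const struct cpumask *` (for example `cpumask_of(2)`) to `irq_set_affinity()`, assuming your kernel version exports that symbol to modules.

```c
#include <stdio.h>

/* Format the smp_affinity mask that selects exactly one CPU.
 * Hypothetical helper for illustration; handles CPUs 0..31 only.
 * Returns 0 on success, -1 if the CPU is out of range or buf is too small. */
int affinity_mask_for_cpu(unsigned int cpu, char *buf, size_t len)
{
    if (cpu >= 32)
        return -1;

    unsigned long mask = 1UL << cpu;          /* one bit per CPU */
    int n = snprintf(buf, len, "%lx", mask);  /* hex, no "0x" prefix */

    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

Calling `affinity_mask_for_cpu(2, buf, sizeof buf)` fills `buf` with the string that the `echo` command above writes to /proc/irq/230/smp_affinity.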

Related

How can I capture combined kernel and userspace stacks with perf

I'm trying to capture combined user and kernel stacks with perf, so I can see which user-space code produces a particular kernel call chain.
Basically I want to create a flamegraph looking like this:
Unfortunately all my kernel stacks end at entry_SYSCALL_64_fastpath and there is no connection to the userspace stacks.
I'm using perf record -g --call-graph dwarf -F 99 --pid 12345 to capture. I have debug symbols for the kernel, libc and my program.
This is kernel 4.8.14 on a Fedora 25 system.
Try the bcc utilities, which are built on BPF. Take a look at the profile tool.
https://github.com/iovisor/bcc/blob/master/docs/tutorial.md

How is userspace able to write to sysfs

Recently I was looking through the kernel at kobjects and sysfs.
I know/understand the following..
All kernel objects use addresses > 0x80000000
kobjects should be no exception to this rule
The sysfs is nothing but a hierarchy of kobjects (maybe includes ksets and other k* stuff..not sure)
Given this information, I'm not sure I understand exactly what happens when I run echo ondemand >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
I can see that the cpufreq module has a function called store_scaling_governor which handles writes to this 'file'..but how does usermode transcend into kernelmode with this simple echo?
When you execute echo ondemand >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor, your shell issues the write system call, and the kernel dispatches it to the corresponding handler.
cpufreq sets up struct kobj_type ktype_cpufreq with its sysfs_ops and registers it in cpufreq_add_dev_interface(). After that, the kernel can look up the corresponding handler to execute on a write syscall.
I can describe one implementation I have used for accessing kernel-space variables through sysfs from a shell prompt in user space. Each set of variables exposed to user space in the sys filesystem appears as a separate file under /sys/. When you run echo value > /sys/file-path at the shell prompt, the method called in kernel space is the attribute's .store method. Likewise, when you run cat /sys/file-path, the method called in the kernel is .show. You can find more information here: http://lwn.net/Articles/31220/
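From the user-space side, the echo redirection described in these answers amounts to an ordinary open/write/close sequence. The sketch below spells that out; `write_attr` is a hypothetical helper name, and the flags mirror what a shell typically uses for `>` redirection. Nothing in it is specific to sysfs: the switch into kernel mode happens inside write(2), and for a sysfs file the kernel then routes the data to the attribute's store handler.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a string to a file the way `echo value > path` does:
 * open(2), write(2), close(2).
 * Hypothetical helper; returns 0 on success, -1 on any error. */
int write_attr(const char *path, const char *value)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);   /* the system call that enters the kernel */
    int rc = close(fd);

    return (n == (ssize_t)len && rc == 0) ? 0 : -1;
}
```

Note that echo appends a newline, so the equivalent call would be `write_attr(path, "ondemand\n")`.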

How to switch from user mode to kernel mode?

I'm learning about the Linux kernel but I don't understand how to switch from user mode to kernel mode in Linux. How does it work? Could you give me some advice or give me some link to refer or some book about this?
The only way a user-space application can explicitly initiate a switch to kernel mode during normal operation is by making a system call such as open, read, or write.
Whenever a user application calls one of these system call APIs with appropriate parameters, a software interrupt/exception (SWI) is triggered.
As a result of this SWI, control jumps from the user application to a predefined location in the Interrupt Vector Table (IVT) provided by the OS.
The IVT contains the address of the SWI exception handler routine, which performs all the steps required to switch to kernel mode and start executing kernel instructions on behalf of the user process.
To switch from user mode to kernel mode you need to perform a system call.
If you just want to see what is going on under the hood, TLDP is your new friend: see the code there (it is well documented, so no additional knowledge is needed to understand the assembly).
You are interested in:
movl $len,%edx # third argument: message length
movl $msg,%ecx # second argument: pointer to message to write
movl $1,%ebx # first argument: file handle (stdout)
movl $4,%eax # system call number (sys_write)
int $0x80 # call kernel
As you can see, a system call wrapper is just a thin layer around assembly code that raises an interrupt (0x80), and as a result the handler for this system call is invoked.
Let's cheat a bit and use the C preprocessor here to build an executable (foo.S is a file containing the code from the linked page):
gcc -o foo -nostdlib foo.S
Run it via strace to ensure that we'll get what we write:
$ strace -t ./foo
09:38:28 execve("./foo", ["./foo"], 0x7ffeb5b771d8 /* 57 vars */) = 0
09:38:28 stat(NULL, Hello, world!
NULL) = 14
09:38:28 write(0, NULL, 14)
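The strace output above most likely labels the calls stat and write because the binary was built as a 64-bit executable while int $0x80 uses the 32-bit system call numbers, so strace decodes numbers 4 and 1 against the wrong table. A way to see the same mechanism without assembly is to go through syscall(2), which takes the system call number explicitly. This is a minimal sketch; `raw_write` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke write(2) by number through syscall(2), bypassing the libc
 * write() wrapper. Returns the number of bytes written, or -1 with
 * errno set on failure. */
long raw_write(int fd, const char *msg)
{
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

For example, `raw_write(1, "Hello, world!\n")` prints the message on standard output, and running it under strace should show a correctly decoded write call because SYS_write always matches the ABI the program was compiled for.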
I just read through this, and it's a pretty good resource. It explains user mode and kernel mode, why changes happen, how expensive they are, and gives some interesting related reading.
https://blog.codinghorror.com/understanding-user-and-kernel-mode
Here's a short excerpt:
Kernel Mode
In Kernel mode, the executing code has complete and unrestricted access to the underlying hardware. It can execute any CPU instruction and reference any memory address. Kernel mode is generally reserved for the lowest-level, most trusted functions of the operating system. Crashes in kernel mode are catastrophic; they will halt the entire PC.
User Mode
In User mode, the executing code has no ability to directly access hardware or reference memory. Code running in user mode must delegate to system APIs to access hardware or memory. Due to the protection afforded by this sort of isolation, crashes in user mode are always recoverable. Most of the code running on your computer will execute in user mode.

Handling Hardware interrupts in Linux

I am working on an embedded Linux platform running Linux 2.6. I would love to know how to do the following.
1) I have a hardware interrupt source irq7 which shows up in /proc/interrupts
cat /proc/interrupts | grep IRQ7
M547X_8X 71: 1916076 PCI IRQ7
2) For PCI IRQ7, each time I press a button, the third value changes
M547X_8X 71: 2177862 PCI IRQ7
Doesn't this mean my switch press is recognized?
Now I want to wake a user program from sleep when I press this button. How do I write this user-space program using interrupts or signals?
Should I write a driver for this?
Can you suggest resources that I should look into?
You should take a look at the gpio-keys driver on Linux. Once the interrupt is exposed as /dev/input/eventXXX, you can use the evtest tool to check it from user space.
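If the button does end up exposed as an input event device, a user program can sleep in read(2) until the kernel delivers an event. The sketch below assumes that setup; `wait_for_key_press` is a hypothetical helper, and the device node to open depends on your system.

```c
#include <linux/input.h>
#include <unistd.h>

/* Block on an input event file descriptor until a key-press event
 * (type EV_KEY, value 1) arrives. Hypothetical helper.
 * Returns the key code, or -1 on EOF, short read, or error. */
int wait_for_key_press(int fd)
{
    struct input_event ev;

    for (;;) {
        ssize_t n = read(fd, &ev, sizeof ev);   /* sleeps until an event is queued */
        if (n != (ssize_t)sizeof ev)
            return -1;
        if (ev.type == EV_KEY && ev.value == 1) /* 1 = press, 0 = release */
            return ev.code;
    }
}
```

A caller would open the event node read-only, call `wait_for_key_press(fd)`, and run its work when the call returns; the process consumes no CPU while blocked.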

What options do we have for communication between a user program and a Linux Kernel Module?

I am a newcomer to Linux kernel module programming. From the material I have read so far, I have found that there are 3 ways for a user program to request services from or communicate with a Linux kernel module:
a device file in /dev
a file in /proc file system
ioctl() call
Question: What other options do we have for communication between user program and linux kernel module?
Your option 3) is really a sub-option of option 1) - ioctl() is one way of interacting with a device file (read() and write() being the usual ways).
Two other ways worth considering are:
The sysfs filesystem;
Netlink sockets.
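To illustrate the point that ioctl() is just another operation on an open file descriptor, here is a minimal sketch using the generic FIONREAD request, which works on pipes and many other descriptors. FIONREAD is a stand-in chosen for this example; a real module would define its own request codes, and `bytes_pending` is a hypothetical helper name.

```c
#include <sys/ioctl.h>

/* Ask the kernel how many bytes are queued for reading on fd.
 * Hypothetical helper; returns the count, or -1 if the ioctl fails. */
int bytes_pending(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```

With a device file from your own module, the call shape stays the same: open the node under /dev, then pass your request code and an argument pointer to ioctl().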
Basically, many standard IPC mechanisms — cf. http://en.wikipedia.org/wiki/Inter-process_communication — can be used:
File and memory-mapped file: a device file (as above) or similarly special file in /dev, procfs, sysfs, debugfs, or a filesystem of your own, combined with any of read/write, ioctl, or mmap
Possibly signals (for use with a kthread)
Sockets: using a protocol of choice: TCP, UDP (cf. knfsd, but likely not too easy), PF_LOCAL, or Netlink (many subinterfaces - base netlink, genetlink, Connector, ...)
Furthermore,
 4. System calls (not really usable from modules though)
 5. Network interfaces (akin to tun).
Working examples of Netlink — just to name a few — can be found for example in
git://git.netfilter.org/libmnl (userspace side)
net/core/rtnetlink.c (base netlink)
net/netfilter/nf_conntrack_netlink.c (nfnetlink)
fs/quota/netlink.c (genetlink)
This includes all types with examples :)
http://people.ee.ethz.ch/~arkeller/linux/kernel_user_space_howto.html
Runnable examples of everything
Too much talk is making me bored!
file operations:
file types that implement file operations:
procfs. See also: proc_create() example for kernel module
debugfs
character devices. See also: https://unix.stackexchange.com/questions/37829/how-do-character-device-or-character-special-files-work/371758#371758
sysfs. See also: How to attach file operations to sysfs attribute in platform driver?
file operation syscalls themselves
open, read, write, close, lseek.
poll. See also: How to add poll function to the kernel module code?
ioctl. See also: How do I use ioctl() to manipulate my kernel module?
mmap. See also: How to mmap a Linux kernel buffer to user space?
anonymous inodes. See also: What is an anonymous inode in Linux?
netlink sockets. See also: How to use netlink socket to communicate with a kernel module?
This Linux document describes some of the ways in which the kernel and user space can interact (communicate). They are the following.
Procfs, sysfs, and similar mechanisms. This includes /dev entries as well, and all the methods in which kernel space exposes a file in user space (/proc, /dev, etc. entries are basically files exposed from the kernel space).
Socket based mechanisms. Netlink is a type of socket, which is meant specially for communication between user space and kernel space.
System calls.
Upcalls. The kernel executes code in user space, for example by spawning a new process.
mmap - Memory mapping a region of kernel memory to user space. This allows both the kernel, and the user space to read/write to the same memory area.
Other than these, the following list adds some other mechanisms I know.
Interrupts. User space can raise software interrupts to talk to kernel space. For example, some CPUs use int 0x80 to make system calls (while others may use a different mechanism, such as the syscall instruction). The kernel has to define the corresponding interrupt handler in advance.
vDSO/vsyscall - These are mechanisms in the Linux kernel that optimize the execution of some system calls. The idea is to have a shared memory region that the kernel maps into each process; when the process makes one of these calls, the user-space library reads the data from this region instead of actually entering the kernel. This saves the overhead of switching to kernel mode and back.
