How is userspace able to write to sysfs - linux-kernel

Recently I was looking through the kernel at kobjects and sysfs.
I know/understand the following:
All kernel objects use addresses > 0x80000000
kobjects should be no exception to this rule
sysfs is nothing but a hierarchy of kobjects (it may also include ksets and other k* structures; I'm not sure)
Given this information, I'm not sure I understand exactly what happens when I run echo ondemand >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
I can see that the cpufreq module has a function called store_scaling_governor which handles writes to this 'file', but how does user mode cross over into kernel mode with this simple echo?

When you execute the command echo ondemand >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor, your shell opens the file and calls the write system call; the kernel then dispatches it to the corresponding handler.
The cpufreq driver sets up struct kobj_type ktype_cpufreq with its sysfs_ops, then registers it in cpufreq_add_dev_interface(). After that, the kernel can find the corresponding handler to execute on a write syscall: the store callback in those sysfs_ops ends up calling store_scaling_governor.
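The user-space half of this can be sketched in C. The helper below (write_sysfs_attr is a hypothetical name, not a standard API) does roughly what the shell does for the echo command: open(2), write(2), close(2). Each of these is a system call, and it is the write(2) that the kernel routes through the VFS to the sysfs store handler. It is only a sketch: a real shell also passes O_CREAT | O_TRUNC when redirecting with >.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Roughly what `echo value > path` does in user space.
 * Returns 0 on success, -errno on failure. */
int write_sysfs_attr(const char *path, const char *value)
{
    /* A shell's ">" also passes O_CREAT | O_TRUNC; O_WRONLY is enough
     * for an attribute file that already exists. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);   /* this call enters kernel mode */
    int err = 0;
    if (n < 0)
        err = -errno;
    else if ((size_t)n != len)
        err = -EIO;                      /* short write: treat as failure */

    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}
```

For example, write_sysfs_attr("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "ondemand\n") mirrors the echo command above (echo appends the newline, and like the echo it needs root).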

I can describe one implementation I have used for accessing kernel-space variables from sysfs (from user space, at the shell prompt). Each variable exposed to user space through the sysfs filesystem appears as a separate file under /sys/. When you run echo value > /sys/file-path at the shell prompt (user space), the kernel calls the attribute's .store method. Likewise, when you run cat /sys/file-path, the kernel calls the attribute's .show method. You can find more information here: http://lwn.net/Articles/31220/
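To illustrate the .show/.store dispatch, here is a small user-space simulation. The structs mimic the shape of the kernel's struct attribute and struct sysfs_ops, but they are simplified stand-ins (the real callbacks also receive a struct kobject *), and governor, gov_show, gov_store, model_read and model_write are hypothetical names. It models the idea; it is not kernel code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define MODEL_PAGE_SIZE 4096  /* sysfs hands show() a one-page buffer */

/* Simplified stand-ins for the kernel's structs (NOT the real definitions). */
struct attribute {
    const char *name;
};

struct sysfs_ops {
    ssize_t (*show)(struct attribute *attr, char *buf);
    ssize_t (*store)(struct attribute *attr, const char *buf, size_t count);
};

/* Hypothetical driver state behind a scaling_governor-like attribute. */
static char governor[16] = "performance";

static ssize_t gov_show(struct attribute *attr, char *buf)
{
    (void)attr;
    return snprintf(buf, MODEL_PAGE_SIZE, "%s\n", governor);
}

static ssize_t gov_store(struct attribute *attr, const char *buf, size_t count)
{
    size_t len = 0;
    (void)attr;
    /* Stop at the newline that echo appends (or at the end of the data). */
    while (len < count && buf[len] != '\n' && buf[len] != '\0')
        len++;
    if (len == 0 || len >= sizeof governor)
        return -EINVAL;
    memcpy(governor, buf, len);
    governor[len] = '\0';
    return (ssize_t)count;  /* report the whole write as consumed */
}

static const struct sysfs_ops gov_ops = { .show = gov_show, .store = gov_store };

/* Conceptually what happens once read(2)/write(2) reach sysfs:
 * the attribute's ops table is consulted and the matching callback runs. */
ssize_t model_read(const struct sysfs_ops *ops, struct attribute *attr, char *page)
{
    return ops->show ? ops->show(attr, page) : -EIO;
}

ssize_t model_write(const struct sysfs_ops *ops, struct attribute *attr,
                    const char *buf, size_t count)
{
    return ops->store ? ops->store(attr, buf, count) : -EIO;
}
```

After model_write(&gov_ops, &a, "ondemand\n", 9), a following model_read returns "ondemand\n", just as cat would after the echo.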

Related

How to Send a Value to Another Driver's Sysfs Attribute

This is all in Linux 4.14.73. I'd upgrade if I could, but I can't.
I'm trying to trigger an LED flash in a standard LED class instance from another kernel-space driver. I know all about the "bad form" of accessing files from kernel space, so I figure there must be an already-defined way of accessing sysfs attributes from kernel space.
The LED is defined here:
/sys/class/leds/fpga_led0
Its trigger is set to [oneshot] so it has a device attribute called "shot" exposed. To get a single LED flash all I need to do from the command line is this:
echo 1 > /sys/class/leds/fpga_led0/shot
I can easily write a user-space program to open the "shot" attribute and write a "1" string to it. There are various published methods of forcing file operations into a kernel driver, but most of them are fairly limited. I've yet to see one that exposes file seek operations, which are key to repeatedly writing to an attribute without wasting time opening and closing the file. To be clear, this is not about setting values at boot time: I have one driver that needs to send a value to another driver's sysfs entry at a specific moment in its own operation. Is there a standard, accepted way of sending a value from one running kernel driver to the sysfs attribute of another kernel driver?
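On the user-space side, the open/close cost mentioned above can be avoided by keeping the descriptor open and writing at offset 0 each time with pwrite(2), which is equivalent to lseek(fd, 0, SEEK_SET) followed by write(2). This is only a sketch of that user-space approach (attr_open and attr_fire are hypothetical helper names); it does not address the kernel-to-kernel part of the question.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open the attribute once and keep the descriptor around. */
int attr_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* pwrite(2) writes at an explicit offset without moving the file
 * position, so every "shot" starts at offset 0, as after a fresh open.
 * Returns 0 on success, -errno (or -EIO on a short write) on failure. */
int attr_fire(int fd, const char *val, size_t len)
{
    ssize_t n = pwrite(fd, val, len, 0);
    if (n < 0)
        return -errno;
    return (size_t)n == len ? 0 : -EIO;
}
```

Usage: int fd = attr_open("/sys/class/leds/fpga_led0/shot"); then call attr_fire(fd, "1", 1) whenever a flash is needed, and close(fd) at the end.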

What is the difference between Virtual File System and System Call?

As I understand it, the kernel provides two main interfaces for user space to get something done in the kernel: system calls and virtual file systems (procfs, sysfs, etc.).
I read in a book that internally the VFS also uses system calls.
So I want to know how exactly these two are connected, and in which situations we should use the VFS over a system call and vice versa.
A system call is the generic facility for any user space process to switch from user space mode to kernel mode.
It is like a call to a function that resides in the kernel, invoked from user space with a variable number of parameters; the most important one is the syscall number.
The kernel always maintains an architecture-specific array of supported system calls (= kernel functions) and dispatches any syscall coming from user space to the correct function based on the system call number passed from user space.
The Virtual File System is just an abstraction of a file system that provides you with standard functions to deal with anything that can be considered a file. For example, you can call "open", "close", "read", etc. on any file without being concerned about which filesystem the file is stored in.
The relation between the VFS and syscalls is that the VFS is code that resides in the kernel, and the only way to get into the kernel is through syscalls ("open" is a syscall, so is "close", etc.).
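The syscall-number dispatch can be seen directly from user space with glibc's syscall(2) wrapper, which takes the number (here SYS_getpid and SYS_write from <sys/syscall.h>) followed by the arguments. A minimal sketch, assuming Linux with glibc; pid_via_number and write_via_number are illustrative names. The VFS operation write is reached through exactly the same path as any other syscall.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same kernel function as getpid(): the kernel looks up SYS_getpid
 * in its syscall table and calls the matching handler. */
long pid_via_number(void)
{
    return syscall(SYS_getpid);
}

/* A VFS operation invoked by number: the kernel's write path then
 * picks the right implementation for whatever fd refers to. */
long write_via_number(int fd, const char *s)
{
    return syscall(SYS_write, fd, s, strlen(s));
}
```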

How to access kernel parameters in kernel space

This is one of my lab assignments: I have to create a proc entry under /proc/sys/kernel/ and write a system call that manipulates a user-space variable differently depending on the value of that proc entry. For example, say the user-space variable is 1 and the proc entry is 0 or 1. The system call should increment the user-space variable by 1 if the proc entry is 0 (off), or multiply it by two if the proc entry is 1 (on).
I did the following to add the proc entry: I created an entry xxx by adding a struct to the kernel ctl table section in kernel/sysctl.c. I compiled the kernel, and the system boots fine with it. The entry also shows up in the proc directory as /proc/sys/kernel/xxx.
I am now able to read and write it from user space, using cat and echo respectively.
I did the following in the system call: I wrote a system call that reads the user-space variable. I also implemented and tested the access_ok, copy_from_user and copy_to_user parts, and the manipulation of the user-space variable (for now it always increments).
Problem I am facing: now I have to add an if condition that checks the value of "xxx" to decide whether to increment or multiply the user-space variable. This is where I am stuck; writing the system call itself is not the problem. I don't know how to read the proc entry "xxx".
Can I use file handling?
If so, should I use the open() system call inside my system call? Will it work?
When I checked, there was a sysctl system call, but it seems to be deprecated now. This IBM tutorial talks about reading a proc entry, but create_proc_entry does not apply to parameters inside the /proc/sys/kernel directory, right? If so, how can I use a read-proc-entry function at all?
"But, now I have to write a system call to read the value of xxx."
I suspect that the term "system call" is being used in a formal sense and that you are being asked to add a new system call to the kernel (similar to open, read, mmap, signal etc) that returns your value.
See Adding a new system call in Linux kernel 3.3
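Either way, code inside the kernel does not need to open /proc/sys/kernel/xxx at all: a sysctl table entry's .data field points at a kernel variable (for an int, typically handled by proc_dointvec), so kernel code can simply read that variable. The sketch below models only the decision logic in plain C; sysctl_xxx and xxx_apply are hypothetical names, and the real system call would still move the user value in and out with copy_from_user/copy_to_user.

```c
/* User-space model only. In the kernel, sysctl_xxx would be the int that
 * the new ctl_table entry's .data field points to, so
 * `echo 1 > /proc/sys/kernel/xxx` updates this variable directly. */
int sysctl_xxx = 0;  /* hypothetical name */

/* Core of the hypothetical system call: no file I/O needed,
 * just read the variable that backs the proc entry. */
long xxx_apply(long value)
{
    return sysctl_xxx ? value * 2 : value + 1;
}
```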

change smp_affinity from linux device driver

If I examine the output of cat /proc/interrupts, all the IRQs are listed under CPU0 on an SMP system.
I can change the smp_affinity mask to pin the IRQ to a particular CPU using the following command:
echo "4" > /proc/irq/230/smp_affinity
The above command sets the affinity mask of interrupt 230 to CPU 2.
I would like to achieve the same from a Linux kernel module. How can I do this?
I see the create_proc_entry method, which allows creating a new proc entry.
Is there any method we can use to write to an existing proc entry?
In a kernel module you can just call the kernel API function irq_set_affinity(...) directly. No need to go through /proc. See: http://lxr.free-electrons.com/source/kernel/irq/manage.c#L189
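The mask written to smp_affinity is a hexadecimal CPU bitmap: bit n selects CPU n, so "4" (binary 100) is CPU 2. The helper below (affinity_mask_for_cpu is a hypothetical name) computes that string for a single CPU. Inside a module, the corresponding in-kernel mask passed to irq_set_affinity() would be something like cpumask_of(2), although whether irq_set_affinity() is exported to modules depends on the kernel version. The sketch assumes fewer than 32 CPUs, since larger masks are printed as comma-separated 32-bit groups.

```c
#include <stdio.h>

/* Format the hex bitmap that /proc/irq/<N>/smp_affinity expects for one
 * CPU. Assumes cpu < 32. Returns the number of chars written, or -1. */
int affinity_mask_for_cpu(unsigned int cpu, char *buf, size_t len)
{
    if (cpu >= 32)
        return -1;                      /* would need comma-separated groups */
    int n = snprintf(buf, len, "%x", 1u << cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For CPU 2 this yields "4", the value used in the echo command in the question.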

What options do we have for communication between a user program and a Linux Kernel Module?

I am a newcomer to Linux kernel module programming. From the material I have read so far, I have found that there are 3 ways for a user program to request services from, or communicate with, a Linux kernel module:
a device file in /dev
a file in /proc file system
ioctl() call
Question: What other options do we have for communication between user program and linux kernel module?
Your option 3) is really a sub-option of option 1) - ioctl() is one way of interacting with a device file (read() and write() being the usual ways).
Two other ways worth considering are:
The sysfs filesystem;
Netlink sockets.
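To make the point about ioctl() concrete: it is just another operation on an open file descriptor. The sketch below uses the generic FIONREAD request on a pipe (bytes_pending is a hypothetical name); a driver would instead define its own request codes and handle them in its unlocked_ioctl file operation.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the kernel, via ioctl(2) on the fd, how many bytes are waiting
 * to be read. Returns that count, or -1 on error. */
int bytes_pending(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```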
Basically, many standard IPC mechanisms (cf. http://en.wikipedia.org/wiki/Inter-process_communication) can be used:
File and memory-mapped file: a device file (as above) or a similar special file in /dev, procfs, sysfs, debugfs, or a filesystem of your own, each combined with read/write, ioctl, or mmap
Possibly signals (for use with a kthread)
Sockets: using a protocol of choice: TCP, UDP (cf. knfsd, but likely not too easy), PF_LOCAL, or Netlink (many subinterfaces - base netlink, genetlink, Connector, ...)
Furthermore,
 4. System calls (not really usable from modules though)
 5. Network interfaces (akin to tun).
Working examples of Netlink, just to name a few, can be found in:
git://git.netfilter.org/libmnl (userspace side)
net/core/rtnetlink.c (base netlink)
net/netfilter/nf_conntrack_netlink.c (nfnetlink)
fs/quota/netlink.c (genetlink)
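For a feel of the user-space side of base netlink without libmnl, here is a raw-socket sketch that asks rtnetlink (net/core/rtnetlink.c) for a dump of all network links and counts the replies. count_links is a hypothetical name; the sketch assumes a Linux host where AF_NETLINK sockets are permitted, and it handles only this simple single-request dump.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Send an RTM_GETLINK dump request to the kernel and count the
 * RTM_NEWLINK replies. Returns the number of links, or -1 on error. */
int count_links(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    struct {
        struct nlmsghdr nh;
        struct ifinfomsg ifi;
    } req;
    memset(&req, 0, sizeof req);
    req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req.nh.nlmsg_type = RTM_GETLINK;
    req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nh.nlmsg_seq = 1;
    req.ifi.ifi_family = AF_UNSPEC;

    struct sockaddr_nl kernel;
    memset(&kernel, 0, sizeof kernel);
    kernel.nl_family = AF_NETLINK;       /* nl_pid 0 addresses the kernel */

    if (sendto(fd, &req, req.nh.nlmsg_len, 0,
               (struct sockaddr *)&kernel, sizeof kernel) < 0) {
        close(fd);
        return -1;
    }

    union { struct nlmsghdr align; char raw[16384]; } buf; /* aligned buffer */
    int count = 0, result = -1, done = 0;
    while (!done) {
        int len = (int)recv(fd, buf.raw, sizeof buf.raw, 0);
        if (len <= 0)
            break;
        for (struct nlmsghdr *nh = (struct nlmsghdr *)buf.raw;
             NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
            if (nh->nlmsg_type == NLMSG_DONE) {  /* end of the dump */
                result = count;
                done = 1;
                break;
            }
            if (nh->nlmsg_type == NLMSG_ERROR) {
                done = 1;
                break;
            }
            if (nh->nlmsg_type == RTM_NEWLINK)
                count++;
        }
    }
    close(fd);
    return result;
}
```

On a typical machine the result is at least 1, since the loopback interface is always present.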
This includes all types with examples :)
http://people.ee.ethz.ch/~arkeller/linux/kernel_user_space_howto.html
Runnable examples of everything
Too much talk is making me bored!
file operations:
file types that implement file operations:
procfs. See also: proc_create() example for kernel module
debugfs
character devices. See also: https://unix.stackexchange.com/questions/37829/how-do-character-device-or-character-special-files-work/371758#371758
sysfs. See also: How to attach file operations to sysfs attribute in platform driver?
file operation syscalls themselves
open, read, write, close, lseek
poll. See also: How to add poll function to the kernel module code?
ioctl. See also: How do I use ioctl() to manipulate my kernel module?
mmap. See also: How to mmap a Linux kernel buffer to user space?
anonymous inodes. See also: What is an anonymous inode in Linux?
netlink sockets. See also: How to use netlink socket to communicate with a kernel module?
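For the poll entry above: user space calls poll(2) on the file descriptor, and for a driver's device file the kernel answers by calling the driver's .poll file operation. This sketch (wait_readable is a hypothetical name) uses a pipe in place of a device file.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```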
This Linux document describes some of the ways in which the kernel and user space can interact (communicate). They are the following:
Procfs, sysfs, and similar mechanisms. This includes /dev entries as well, and all the methods in which kernel space exposes a file in user space (/proc, /dev, etc. entries are basically files exposed from the kernel space).
Socket based mechanisms. Netlink is a type of socket, which is meant specially for communication between user space and kernel space.
System calls.
Upcalls. The kernel executes code in user space, for example by spawning a new process.
mmap - Memory mapping a region of kernel memory to user space. This allows both the kernel, and the user space to read/write to the same memory area.
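The user-space side of the mmap mechanism is an mmap(2) call with MAP_SHARED on the file descriptor; a driver that implements the .mmap file operation decides what memory gets mapped. In this sketch a regular file stands in for the device file (write_via_mmap is a hypothetical name), so the stores land in the page cache rather than in a driver buffer.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first len bytes of fd shared, then write through the mapping.
 * The file must already be at least len bytes long. Returns 0 or -1. */
int write_via_mmap(int fd, const char *msg, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, msg, len);   /* no write(2): plain memory stores */
    return munmap(p, len);
}
```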
Besides these, here are some other mechanisms I know of:
Interrupts. User space can raise (software) interrupts to talk to kernel space. For example, 32-bit x86 Linux uses int 0x80 to make system calls (while other paths use a different mechanism, such as the syscall instruction). The kernel has to define the corresponding interrupt handler in advance.
vDSO/vsyscall - These are mechanisms in the Linux kernel to speed up some system calls. The idea is to have a memory region shared with the kernel: when a process makes one of these calls (for example, to read the current time), the user-space library gets the data from this region instead of actually invoking the corresponding system call. This saves the overhead of switching into the kernel.

Resources