What is the difference between Virtual File System and System Call? - linux-kernel

As I understand it, the kernel provides mainly two interfaces for user space to do something in the kernel: system calls and the Virtual File System (procfs, sysfs, etc.).
I read in a book that the VFS also uses system calls internally.
So I want to know how exactly these two are connected, and in which situations we should use the VFS rather than a system call, and vice versa.

A system call is the generic facility for any user space process to switch from user mode to kernel mode.
It is like a call to a function that resides in the kernel, invoked from user space with a variable number of parameters, the most important of which is the syscall number.
The kernel maintains an architecture-specific array of supported system calls (i.e. kernel functions) and dispatches each syscall coming from user space to the correct function based on the system call number passed in.
The Virtual File System is just an abstraction over file systems that provides standard functions for dealing with anything that can be considered a file. For example, you can call "open", "close", "read", etc. on any file without being concerned about which filesystem the file is stored on.
The relation between the VFS and syscalls is that the VFS is code that resides in the kernel, and the only way for user space to get into the kernel is through syscalls ("open" is a syscall, so is "close", etc.).

Related

Is there a way to call a user space function from a Linux kernel module?

Imagine a situation like this: I'll take a function pointer, which is located in the user space, from a syscall, and the kernel module calls back this function.
(It would be important for this function to run in user space)
Will the kernel module see the same memory address (the acquired function pointer) as the user space application? (I mean the user's virtual address space or linear address space.)
First off, you are trying to do something wrong. If you need custom code in the kernel, you provide it as a kernel module.
The answer in the linked duplicate ( Executing a user-space function from the kernel space ) is largely crap. This would "work" on certain architectures as long as no syscalls are used and no tls/whatever other stuff is used. In fact this is how plenty of exploits do it.
I'll take a function pointer, which is located in the user space, from a syscall, and the kernel module calls back this function.
It really sounds like you are trying to do something backwards. If you need a userspace component, that's the thing which should have all the logic. Then you call the kernel telling it what to do.
(It would be important for this function to run in user space?)
Who are you asking? I can only state that calling a function which was planted by userspace does not mean it starts "running in user space". Switching to userspace is a lot of work, definitely not done by calling a function.
Will the kernel module see the same memory address (acquired function pointer) as the user space application?
Depends on the architecture; typically it will. But even then there are hardware protections against using this "feature", which have to be explicitly turned off.
But again, you DON'T want to do it. I strongly suggest you state the actual problem.

How is userspace able to write to sysfs

Recently I was looking through the kernel at kobjects and sysfs.
I know/understand the following..
All kernel objects use addresses > 0x80000000
kobjects should be no exception to this rule
The sysfs is nothing but a hierarchy of kobjects (maybe includes ksets and other k* stuff..not sure)
Given this information, I'm not sure I understand exactly what happens when I run echo ondemand >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
I can see that the cpufreq module has a function called store_scaling_governor which handles writes to this 'file'..but how does usermode transcend into kernelmode with this simple echo?
When you execute the command echo ondemand >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor, your shell opens the file and calls the write system call, and the kernel dispatches it to the corresponding handler.
cpufreq sets up struct kobj_type ktype_cpufreq with sysfs_ops, then registers it in cpufreq_add_dev_interface(). After that, the kernel can find the corresponding handler to execute on a write syscall.
I can tell you about one implementation I have used for accessing kernel space variables through sysfs (from user space, at a shell prompt). Basically, each variable (or set of variables) exposed to user space in the sys file system appears as a separate file under /sys/. When you issue echo value > /sys/file-path at the shell prompt, the method that gets called in kernel space is the .store method. Likewise, when you issue cat /sys/file-path, the method that gets called in the kernel is .show. You can find more information here: http://lwn.net/Articles/31220/

Modifying Linux process page table for physical memory access without system call

I am developing a real-time application for Linux 3.5.7. The application needs to manage a PCI-E device.
In order to access the PCI-E card spaces, I have been using mmap in combination with /dev/mem. However (please correct me if I am wrong) each time I read or write the mapped memory, a system call is required for the /dev/mem pseudo-driver to handle the memory access.
To avoid the overhead of this system call, I think it should be possible to write a kernel module so that, within e.g. an ioctl call, I can modify the process page table in order to map the physical device pages to userspace pages and avoid the system call.
Can you give me some orientation on this?
Thanks and regards
However (please correct me if I am wrong) each time I read or write the mapped memory, a system call is required
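You are wrong. Once mmap() has set up the mapping, reads and writes to the mapped region are ordinary load and store instructions translated by the MMU; no system call is made per access.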
it should be possible to write a kernel module so that, within e.g. a ioctl call I can modify the process page table
This is precisely what mmap() does.

win32k.sys mapping address in the session space

My question:
when win32k.sys is loaded into the session space, does it get the same base address in every session?
Details:
I'm writing a kernel-mode device driver for Windows (32 bit). It loads as a standard WDM driver into the system space (global kernel-mode memory) during the system boot.
However, in some situations I need to access functions exported by win32k.sys. To be exact, I'm writing a sort of driver that sometimes needs to pretend to be a display driver.
I cannot statically import those functions (that is, import them via the executable import table). This is because win32k.sys is loaded at a later stage, when sessions are created. Moreover, it's loaded into the session space.
Nevertheless, I've found a workaround. During session creation I import the needed functions dynamically. I use ZwQuerySystemInformation with SystemModuleInformation to find the base address of win32k.sys in the current session. Then, using this base address, I parse the image to find the export directory of win32k.sys and obtain the needed function pointers.
Currently I keep a separate array of imported functions for every session. However, in practice those functions are always at the same addresses in all sessions. That is, win32k.sys is mapped at the same session-space address in every session.
Hence, my question is, is there a guarantee that win32k.sys will be mapped into the same address in all the sessions?
Apart from saving some memory this will make things easier for me. Currently in order to call such a function I need a session-specific context where the function pointers are stored.
My experience is that the win32k.sys base address is the same in the context of all processes the driver is mapped into. During its initialization, win32k.sys calls ntoskrnl.exe to create Object Type kernel objects for desktops, window stations and possibly other objects used by the driver. These kernel objects must be at the same addresses in the context of all processes to keep the kernel data structures consistent (for example, there is an array of pointers to all Object Type objects inside the ntoskrnl.exe module).
Moreover, win32k.sys contains a system call table (win32k!W32pServiceTable). The address of the table is, again, stored in a fixed location in ntoskrnl.exe (nt!KeServiceDescriptorTableShadow).
So, if the win32k.sys driver were mapped to different addresses in different sessions, ntoskrnl.exe would have to vary per session in the same way. And this is not the case (such behavior would cause additional problems, for example with SYSENTER/SYSCALL). But I have not seen this fact stated in any official documentation.
I am not very sure, but I guess the answer is YES. Win32k.sys is just another (special) dll file, and every dll file on Windows has a base address in its PE header. For win32k.sys, which is provided by Windows (I think), the base address should never conflict with other system dll (.sys) files.
To be safe, you can make your program a little bit flexible. At the beginning, you assume the address is the same. But you check the address before you actually call it. That way, at least the system will not hang because of a bad address.

What options do we have for communication between a user program and a Linux Kernel Module?

I am a newcomer to Linux Kernel Module programming. From the material that I have read so far, I have found that there are 3 ways for a user program to request services from or communicate with a Linux Kernel Module:
 1. a device file in /dev
 2. a file in the /proc file system
 3. an ioctl() call
Question: What other options do we have for communication between a user program and a Linux kernel module?
Your option 3) is really a sub-option of option 1) - ioctl() is one way of interacting with a device file (read() and write() being the usual ways).
Two other ways worth considering are:
The sysfs filesystem;
Netlink sockets.
Basically, many standard IPC mechanisms — cf. http://en.wikipedia.org/wiki/Inter-process_communication — can be used:
 1. File and memory-mapped file: a device file (as above) or a similarly special file in /dev, procfs, sysfs, debugfs, or a filesystem of your own, combined with any of read/write, ioctl, mmap
 2. Possibly signals (for use with a kthread)
 3. Sockets: using a protocol of choice: TCP, UDP (cf. knfsd, but likely not too easy), PF_LOCAL, or Netlink (many subinterfaces: base netlink, genetlink, Connector, ...)
Furthermore,
 4. System calls (not really usable from modules though)
 5. Network interfaces (akin to tun).
Working examples of Netlink — just to name a few — can be found for example in
git://git.netfilter.org/libmnl (userspace side)
net/core/rtnetlink.c (base netlink)
net/netfilter/nf_conntrack_netlink.c (nfnetlink)
fs/quota/netlink.c (genetlink)
This includes all types with examples :)
http://people.ee.ethz.ch/~arkeller/linux/kernel_user_space_howto.html
Runnable examples of everything
Too much talk is making me bored!
file operations:
  file types that implement file operations:
    procfs. See also: proc_create() example for kernel module
    debugfs
    character devices. See also: https://unix.stackexchange.com/questions/37829/how-do-character-device-or-character-special-files-work/371758#371758
    sysfs. See also: How to attach file operations to sysfs attribute in platform driver?
  file operation syscalls themselves:
    open, read, write, close, lseek. See also: How to add poll function to the kernel module code?
    poll. See also: How do I use ioctl() to manipulate my kernel module?
    ioctl. See also: How do I use ioctl() to manipulate my kernel module?
    mmap. See also: How to mmap a Linux kernel buffer to user space?
  anonymous inodes. See also: What is an anonymous inode in Linux?
netlink sockets. See also: How to use netlink socket to communicate with a kernel module?
This Linux document gives some of the ways in which the kernel and user space can interact (communicate). They are the following.
Procfs, sysfs, and similar mechanisms. This includes /dev entries as well, and all the methods in which kernel space exposes a file in user space (/proc, /dev, etc. entries are basically files exposed from the kernel space).
Socket based mechanisms. Netlink is a type of socket, which is meant specially for communication between user space and kernel space.
System calls.
Upcalls. The kernel executes code in user space, for example by spawning a new process.
mmap - Memory mapping a region of kernel memory to user space. This allows both the kernel, and the user space to read/write to the same memory area.
Other than these, the following list adds some other mechanisms I know.
Interrupts. User space can raise a software interrupt to enter kernel space. For example, 32-bit x86 Linux uses int 0x80 to make system calls (while other architectures and modes use a different mechanism, such as the syscall instruction). The kernel has to define the corresponding interrupt handler in advance.
vDSO/vsyscall. These are mechanisms in the Linux kernel for speeding up certain system calls (for example gettimeofday and clock_gettime). The kernel maps a small shared region into every process, and when the process makes one of these calls, the user space library gets the data from this region instead of actually entering the kernel. This saves the overhead of switching to kernel mode.

Resources