I wonder if there is any reference material (a table of some sort) detailing which Linux kernel functions are schedule-free (sleep-free) and therefore safe to use in atomic or interrupt context.
It would be handy to have something to check against while coding.
I'm new to eBPF, and a lot of tutorials say eBPF is just "extended BPF", but I cannot understand what "extended" means. So what is the difference between BPF and eBPF? Are the samples residing in the Linux source tree under [root]/samples/bpf examples of eBPF or just BPF?
BPF is sometimes used to refer to eBPF (created in 2014) and sometimes to cBPF (its predecessor from 1991). As Qeole noted, you can find a detailed comparison of the two in the kernel documentation.
cBPF (classic BPF) is a small bytecode with two 32-bit registers to perform basic filtering on packets and syscalls. No state can be persisted between two calls to a cBPF program.
cBPF is still used by e.g. seccomp and tcpdump, but is actually translated to eBPF bytecode in recent kernels.
eBPF (extended BPF) is a new bytecode with significant extensions. The bytecode has a more "modern" form, with 10 64-bit registers, fall-through jumps, and a stack space, enabling easier JIT-compilation to native instruction sets. It can call special functions, called helpers, to interact with the kernel. It can save state to maps using those helpers. It comes with a new syscall, bpf(2), to manipulate BPF objects (e.g., maps, programs, etc.). A good introduction to the eBPF ecosystem is available at ebpf.io.
eBPF programs can be written in C and compiled to the bytecode using LLVM/Clang. The examples in the kernel sources are eBPF programs.
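For illustration, here is a minimal sketch of an eBPF program in the libbpf style (the map and function names are made up for this example, not taken from the kernel samples). It counts packets at an XDP hook, persisting the count in a map via a helper call:

    // Minimal eBPF sketch (libbpf style); compile with e.g.:
    //   clang -O2 -g -target bpf -c count.c -o count.o
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    // A one-slot array map: this is the "state saved to maps" part.
    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
    } pkt_count SEC(".maps");

    SEC("xdp")
    int count_packets(struct xdp_md *ctx)
    {
        __u32 key = 0;
        // bpf_map_lookup_elem() is one of the helpers mentioned above.
        __u64 *value = bpf_map_lookup_elem(&pkt_count, &key);
        if (value)
            __sync_fetch_and_add(value, 1);
        return XDP_PASS;    // let the packet continue up the stack
    }

    char LICENSE[] SEC("license") = "GPL";

The resulting object file can then be loaded and attached with libbpf or bpftool.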
I am trying to construct two shared queues (one command queue and one reply queue) between user and kernel space, so that the kernel can send messages to user space and user space can send replies back once it finishes processing.
What I have done so far is allocate kernel memory pages (for the queues) and mmap them to user space; now both the user and kernel side can access those pages (i.e., what is written in kernel space can be correctly read in user space, and vice versa).
The problem is that I don't know how to synchronize access between kernel and user space. Say I am going to build a ring buffer for a multi-producer/1-consumer scheme: how do I keep the ring buffer from being corrupted by simultaneous writes?
I did some research this week, and here are some possible approaches, but I am quite new to kernel module development and not sure whether they will work. While I dig into them, I would be glad to get any comments or suggestions:
1. Use a shared semaphore between user and kernel space: Shared semaphore between user and kernel spaces.
However, this relies on system calls such as sem_timedwait(), so I am worried about how efficient it would be.
2. What I would really prefer is a lock-free scheme, as described in https://lwn.net/Articles/400702/. Related files in the kernel tree are:
kernel/trace/ring_buffer_benchmark.c
kernel/trace/ring_buffer.c
Documentation/trace/ring-buffer-design.txt
How the lock-free design is achieved is documented here: https://lwn.net/Articles/340400/
However, I assume these are kernel implementations that cannot be used directly in user space (as is the case for the example in ring_buffer_benchmark.c). Is there any way I can reuse this scheme in user space? I also hope to find more examples.
3. In that article (LWN 400702), an alternative approach using the perf tool is also mentioned, which seems similar to what I am trying to do. If approach 2 doesn't work out, I will try this one:
    The user-space perf tool therefore interacts with the kernel through reads and writes in a shared memory region without using system calls.
Sorry for the English grammar... I hope it makes sense.
To synchronize between kernel and user space you may use the circular buffer mechanism (documented in Documentation/circular-buffers.txt).
The key feature of such buffers is that they have two pointers (head and tail) which can be updated separately, which fits well with separate user and kernel code. Also, the implementation of a circular buffer is quite simple, so implementing it in user space is not difficult.
Note that for multiple producers on the kernel side you need to synchronize them with a spinlock or similar.
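A rough kernel-side producer sketch, following the pattern documented in Documentation/circular-buffers.txt (the struct and function names here are made up; the user-space consumer would mirror this with C11 acquire/release atomics on the mmap'ed page):

    #include <linux/circ_buf.h>
    #include <linux/errno.h>

    #define RING_SIZE 1024                  /* must be a power of two */

    struct shared_ring {
        unsigned int head;                  /* updated by the producer */
        unsigned int tail;                  /* updated by the consumer */
        char buf[RING_SIZE];
    };

    static int ring_put(struct shared_ring *r, char c)
    {
        unsigned int head = r->head;
        /* Pairs with the consumer's release store of tail. */
        unsigned int tail = smp_load_acquire(&r->tail);

        if (CIRC_SPACE(head, tail, RING_SIZE) < 1)
            return -ENOSPC;                 /* ring is full */

        r->buf[head] = c;
        /* Publish the data before exposing the new head index. */
        smp_store_release(&r->head, (head + 1) & (RING_SIZE - 1));
        return 0;
    }

The head and tail indices stay masked into [0, RING_SIZE), which is the convention the CIRC_SPACE()/CIRC_CNT() macros expect.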
I am examining how the kernel memory allocators work (SLAB and SLUB). To exercise them, I need to trigger kernel memory allocations from a user-land program.
The obvious way would be calling fork(), which creates new process instances, for which the kernel must maintain PCB structures; these require a fair amount of memory.
That is about as far as my ideas go. I would not like to limit my experiments to merely calling fork() and tracing it with SystemTap. Are there other convenient ways to do something similar, perhaps involving kernel objects (other than proc_t) with various characteristics (the most important of which: their sizes)?
Thanks.
SLUB is just a more efficient way (in comparison with SLAB) of managing cache objects; it is more or less the same thing. You can read here why SLUB was introduced, and this link talks about what exactly a slab allocator is. Now, on to tracing what exactly happens in the kernel and how to trace it:
The easier but less efficient way is to read the source code, but for that you need to know where to start in the source.
Another, more accurate way is to write a driver that allocates memory using kmem_cache_create(), and then call it from your user program. Now you have a well-defined starting point: use kgdb and step through the entire sequence.
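A minimal module along those lines might look like this (demo_cache and demo_obj are made-up names; error handling is kept to the essentials), giving you a precise place to set a kgdb breakpoint:

    #include <linux/module.h>
    #include <linux/slab.h>

    struct demo_obj {
        char payload[128];      /* choose the object size you want to study */
    };

    static struct kmem_cache *demo_cache;
    static struct demo_obj *obj;

    static int __init demo_init(void)
    {
        /* Creates a dedicated slab cache, visible in /proc/slabinfo. */
        demo_cache = kmem_cache_create("demo_cache", sizeof(struct demo_obj),
                                       0, SLAB_HWCACHE_ALIGN, NULL);
        if (!demo_cache)
            return -ENOMEM;

        obj = kmem_cache_alloc(demo_cache, GFP_KERNEL);
        if (!obj) {
            kmem_cache_destroy(demo_cache);
            return -ENOMEM;
        }
        return 0;
    }

    static void __exit demo_exit(void)
    {
        kmem_cache_free(demo_cache, obj);
        kmem_cache_destroy(demo_cache);
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");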
I came across a few articles discussing the differences between mutexes and critical sections.
One of the major differences I came across is that mutexes run in kernel mode whereas critical sections mainly run in user mode.
So if this is the case, aren't applications which use mutexes harmful for the system if the application crashes?
Thanks.
Use Win32 mutex handles when you need a lock or synchronization across threads in different processes.
Use Win32 CRITICAL_SECTIONs when you need a lock between threads within the same process. They are cheaper in terms of time and don't involve a kernel system call unless there is lock contention. Critical section objects in Win32 can't span process boundaries anyway.
"Harmful" is the wrong word to use. It is more accurate to say "Win32 mutexes are slightly more expensive than Win32 critical sections in terms of performance". A running app that uses mutexes instead of critical sections is unlikely to hurt system performance; it will just run minutely slower. Depending on how often your lock is acquired and released, the difference may not even be measurable.
I forget the exact perf metrics I gathered a long time ago, but the bottom line is that the EnterCriticalSection and LeaveCriticalSection APIs are on the order of 10-100x faster than the equivalent usage of WaitForSingleObject and ReleaseMutex (on the order of 1 microsecond vs. 1 millisecond).
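To make the contrast concrete, here is a quick sketch of both APIs side by side (the mutex name is made up; error handling omitted):

    #include <windows.h>

    static CRITICAL_SECTION cs;   /* intra-process only */
    static HANDLE mutex;          /* can be shared across processes */

    static void with_critical_section(void)
    {
        EnterCriticalSection(&cs);   /* stays in user mode unless contended */
        /* ... touch shared state ... */
        LeaveCriticalSection(&cs);
    }

    static void with_mutex(void)
    {
        WaitForSingleObject(mutex, INFINITE);  /* kernel transition */
        /* ... touch shared state ... */
        ReleaseMutex(mutex);
    }

    int main(void)
    {
        InitializeCriticalSection(&cs);
        /* Named, so another process could open the same mutex. */
        mutex = CreateMutex(NULL, FALSE, TEXT("Global\\demo_mutex"));

        with_critical_section();
        with_mutex();

        CloseHandle(mutex);
        DeleteCriticalSection(&cs);
        return 0;
    }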
How can I let threads in a kernel module communicate? I'm writing a kernel module whose architecture is going to use three threads that need to communicate. So far, my research has led me to believe the only way is to use shared memory (declaring global variables) together with locking mechanisms to synchronize reads/writes between the threads. Material on this is rather scarce.
Is there any other way I should take into consideration? What is the most common, standard approach in kernel code?
You don't say what operating system you're programming on. I'll assume Linux, which is the most common unix system.
There are several good books on Linux kernel programming. Linux Device Drivers is available online as well as on paper. Chapter 5 deals with concurrency; you can jump in directly to chapter 5 though it would be best to skim through at least chapters 1 and 3 first. Subsequent chapters have relevant sections as well (in particular wait queues are discussed in chapter 6).
The Linux kernel concurrency model is built on shared variables. There is a large range of synchronization methods: atomic integer variables, mutual exclusion locks (spinlocks for nonblocking critical sections, semaphores for blocking critical sections), reader-writer locks, condition variables, wait queues, …
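As a small sketch of that shared-variable model (all names here are illustrative, and error handling is omitted): one kthread updates a value under a mutex and signals a wait queue, while a second kthread sleeps on the wait queue until data is ready.

    #include <linux/delay.h>
    #include <linux/kthread.h>
    #include <linux/module.h>
    #include <linux/mutex.h>
    #include <linux/wait.h>

    static DEFINE_MUTEX(msg_lock);
    static DECLARE_WAIT_QUEUE_HEAD(msg_wq);
    static int msg_ready;                 /* wait-queue condition  */
    static int msg_data;                  /* the shared variable   */
    static struct task_struct *prod, *cons;

    static int producer_fn(void *unused)
    {
        int i = 0;

        while (!kthread_should_stop()) {
            mutex_lock(&msg_lock);
            msg_data = i++;
            msg_ready = 1;
            mutex_unlock(&msg_lock);
            wake_up_interruptible(&msg_wq);   /* nudge the consumer */
            msleep(1000);
        }
        return 0;
    }

    static int consumer_fn(void *unused)
    {
        while (!kthread_should_stop()) {
            wait_event_interruptible(msg_wq,
                                     msg_ready || kthread_should_stop());
            mutex_lock(&msg_lock);
            if (msg_ready) {
                pr_info("consumer got %d\n", msg_data);
                msg_ready = 0;
            }
            mutex_unlock(&msg_lock);
        }
        return 0;
    }

    static int __init demo_init(void)
    {
        prod = kthread_run(producer_fn, NULL, "demo_prod");
        cons = kthread_run(consumer_fn, NULL, "demo_cons");
        return 0;
    }

    static void __exit demo_exit(void)
    {
        kthread_stop(cons);
        kthread_stop(prod);
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");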