eBPF vs non-eBPF tracepoint/kprobes - linux-kernel

As per this document (amongst others): https://blogs.oracle.com/linux/post/taming-tracepoints-in-the-linux-kernel
It is possible using both eBPF and other kernel-provided mechanisms to register callbacks for tracepoints or kprobes.
It seems that nowadays everybody wants to use eBPF for this task. What is the advantage of using eBPF instead of just registering tracepoints as explained e.g. here https://www.kernel.org/doc/Documentation/trace/tracepoints.txt ?

Registering tracepoints without using eBPF requires you to use Linux kernel modules. Contrary to eBPF programs, kernel modules are not verified at load time; they may crash your system.
See https://stackoverflow.com/a/70404149/6884590 for the longer explanation.

Related

Why is having an userspace version of eBPF interesting?

I've seen that userspace version of ebpf (runtime, assembler, dissasembler) are being developped (uBPF, rbpf).
Why is having an userspace version of eBPF interesting ?
Do those alternatives focus on the same goal than the eBPF program types (network, observability and security) ?
One of the main advantages of eBPF is that it runs code in the kernel. Observability, in-kernel data aggregation, early packet processing: it all happens in kernel space. So the question sounds legit: Why were uBPF or rbpf created?
I think they were created mostly as prototypes. uBPF was introduced very early in eBPF history, and was probably a proof-of-concept implementation of an eBPF interpreter and x86_64 JIT in user space. I wrote rbpf, which is strongly based on uBPF, and the main objective for me was to get more familiar with two things: eBPF and Rust. Very little afterthought :).
I've always been curious to see what people could do with it. Truth to tell, there are not so many users. The biggest users for rbpf are probably the people from Solana, who implement some blockchain tool with smart contracts run in the eBPF machine. One other use case I've had in the past was to debug some eBPF bytecode, because it is easy to have breakpoints in an user space interpreter (by contrast, runtime debugging for regular kernel eBPF is quite limited at this time).
uBPF had more success and was used as a basis for other projects like DPDK as a filtering library or Oko, an extension to Open vSwitch (both about high-performance network processing). [Edit August 2021] More recently, it was chosen to serve as an eBPF runtime for the implementation of eBPF for Windows (references: announcement, my analysis).
As you can see, the interest of having these user space eBPF machines entirely depends on what you do with it. They're available to run eBPF programs, they don't have a specific focus by themselves, but they can help if you have a use case. In that regard, one of the particularities for uBPF and rbpf is their licenses (Apache, MIT): They are not under GPL, which means that you could reuse them with a larger number of projects, including proprietary ones. This is not the case with code from the kernel.
One big limitation for those user space eBPF machines is also that they tend to be quite out-of-date with regards to what happens in the kernel, where things evolve fast. They don't have a solid verifier, so you cannot assert security or safety of the programs. They hardly support eBPF maps if at all, they do not support function calls, or BTF, or even the latest eBPF instructions for that matter. Some of it could be added, but this would require some engineering efforts and time. [Edit August 2021] uBPF is getting a lot of activity now that Microsoft contributes to it for its eBPF implementation. They also use a user-space verifier, PREVAIL.

Can Windows device drivers have dependencies?

I wrote a kernel-mode driver using C. When I examined it using dependency walker I saw that it depends on some NT*.dll and HAL.dll.
I have several questions:
When does the OS load these DLLs? I thought kernel is responsible for loading DLLs in that case how can driver load a DLL if it is already in kernel-mode
Why don't the standard C dependencies show up like ucrtbase, concrt, vcruntime, msvcp etc? Would it be possible for a driver to have these dependencies and still function?
(A continuation of the last question). If Windows will still load DLLs even in kernel mode, I don't see why drivers cannot be written in (MS) C++
Thanks,
Most of the API in the driver is exported from ntoskrnl.exe.
Your driver is actually a "kernel module", which is part of a process, just like the modules in Ring3.
The driver's "process" is "System", with a Pid of 4, which you can see in the task manager.
ntoskrnl.exe and HAL.dll are modules in the "System", they will Loaded at system startup, while other modules are loaded at time of use (such as your drivers).
You can write and load "driver DLLs", but I haven't done so yet, so I can't answer that.
Ring3 modules are not loaded into the kernel, so you can't call many common Ring3 APIs, but Microsoft has mostly provided alternative APIs for them.
You can't load the Ring3 module directly into the kernel and call its export function. There may be some very complicated methods or tricks to do this, but it's really not necessary.
You can write drivers in C++, but this is not officially recommended by Microsoft at this time as it will encounter many problems, such as:
Constructors and destructors of global variables cannot be called automatically.
You can't use C++ standard libraries directly.
You can't use new and delete directly, they need to be overridden.
C++ exceptions cannot be used directly, and will consume a lot of stack space if you support them manually. Ring0 driver stack space is usually much smaller than Ring3 application stack space, indicating that BSOD may be caused.
Fortunately:
Some great people have solved most of the problems, such as the automatic calling of constructors and destructors and the use of standard libraries.
GitHub Project Link (But I still don't recommend using standard libraries in the kernel unless it's necessary, because they are too complex and large and can lead to some unanticipated issues)
My friend told me that Microsoft seems to have a small team currently trying to make drivers support C++. But I don't have time to confirm the veracity of this claim.

Instrumenting Linux kernel functions

I am looking for a way to instrument a function in Linux kernel. It seems that GCC's -finstrument-functions flag allows instrumentation, but is there any way to instrument only a particular Linux function using compiler directives (i.e., function attributes) instead of instrumenting all functions?
It seems that KProbe also has functionality of instrumentation, but KProbe maintains a blacklist of functions and does not allow to monitor those functions, resulting limited scope of instrumentation.
I am running Ubuntu-16.04 with kernel version 4.8.11 on x86_64.
The purpose of instrumentation is to monitor the entry and exit of the target function by setting and clearing a flag.
The answer is it depends. However it is not as straightforward to inject as one might think.
In particular, example you can use -finstrument-functions for specific C files and then use attribute((no_instrument_function)) to exclude the ones you don't want instrumented
You would have to keep track of preemption and fastcalls, you would have to make sure that you're not accidentally stopping or removing instrumentation. Oh you also have to track any random stack layout that the kernel is self inflicting.

How to add a custom semaphore to the linux kernel?

Basically I want implement my own semaphore inside the linux kernel and be able to use it in user programs.
I've made some progress implementing the kernel code however I do not know how to make semaphore type and the functions I've written available to user programs.
User programs would need to have access to my semaphore type and its functions (wait, signal, ...)
Is there any way to this so that a linux using a kernel compiled with my code would be able to use my semaphore simply by including a header file?
I'm no pro when it comes to the linux kernel, so if I'm making any obvious mistakes feel free to point them out.Thanks.
The kernel version I'm using is 2.6.32.
I would recommend looking into the user space libraries for how a semaphore implemented for user space programs.
Semaphores are only available in kernels older 2.6.16 kernels, as mutex's appeared after that version of the kernel. Only the previous implementation used semaphores. The newer code should use mutexes instead which are used only in process context. You may want to look the following headers, struct's and api's.
#include <linux/mutex.h>
struct mutex
mutex_{lock,trylock,unlock,lock_interruptable}()
Also you may want to look semaphore.c for the implementation.

Differences in kernel mode and drivers

I am just trying to understand the differences to patching into the kernel and writing a driver.
It is my understanding that a kernel mode driver can do anything the kernel can do, and is similar in some ways to a linux module.
Why then, were AV makers so upset when Microsoft stopped them from patching into the Windows kernel?
What kind of stuff can you do through kernel patching that you can't do through a driver?
In this context patching the kernel means modifying its (undocumented?) internal structures in order to achieve some functionality, typically hooking various functions (e.g. opening a file). You are not supposed to go messing around with internal kernel structures that do not belong to you. In the past Microsoft did not provide official hooks for some things, so security companies reverse engineered the internals and hooked the kernel directly. Recently Microsoft has provided official hooks for some things, so the need to hook the kernel directly is not as strong.
It's true that a kernel-mode driver can do anything the kernel can do - after all, they both run in ring 0. The key question here is: how difficult is it? Patching things relies on internal details that may change between different kernel releases. For example, the system call number of NtTerminateProcess will change between versions, so a driver which hooks the SSDT will break between versions (although the system call number can be obtained through other means). Reading or modifying fields of internal structures such as EPROCESS or ETHREAD is risky as well, because again, these structures change between versions. None of this is impossible for a driver to do, but it's hard.
If an official interface is provided for hooking, Microsoft can guarantee compatibility between versions as well as being able to control who can do what (e.g. only signed drivers can use the object manager callbacks). However, Microsoft can't do this for everything, because some things are just implementation details that drivers shouldn't know about.

Resources