Why is having a userspace version of eBPF interesting?

I've seen that userspace versions of eBPF (runtime, assembler, disassembler) are being developed (uBPF, rbpf).
Why is having a userspace version of eBPF interesting?
Do those alternatives focus on the same goals as the eBPF program types (network, observability, and security)?

One of the main advantages of eBPF is that it runs code in the kernel. Observability, in-kernel data aggregation, early packet processing: it all happens in kernel space. So the question sounds legit: Why were uBPF or rbpf created?
I think they were created mostly as prototypes. uBPF was introduced very early in eBPF history, and was probably a proof-of-concept implementation of an eBPF interpreter and x86_64 JIT in user space. I wrote rbpf, which is strongly based on uBPF, and my main objective was to get more familiar with two things: eBPF and Rust. There was very little planning beyond that :).
I've always been curious to see what people could do with it. Truth be told, there are not that many users. The biggest users of rbpf are probably the people from Solana, who implement blockchain tooling with smart contracts run in the eBPF machine. Another use case I've had in the past was debugging eBPF bytecode, because it is easy to set breakpoints in a user space interpreter (by contrast, runtime debugging for regular kernel eBPF is quite limited at this time).
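To give an idea of what running eBPF in user space looks like, here is a minimal sketch in C against uBPF's public API (ubpf_create/ubpf_load/ubpf_exec). The exact ubpf_exec signature has changed across uBPF versions, so take this as illustrative rather than definitive:

    #include <stdio.h>
    #include <stdint.h>
    #include <ubpf.h>

    int main(void) {
        /* Hand-assembled eBPF bytecode: mov64 r0, 1 ; exit */
        uint8_t code[] = {
            0xb7, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
            0x95, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        };

        struct ubpf_vm *vm = ubpf_create();
        char *errmsg = NULL;
        if (ubpf_load(vm, code, sizeof(code), &errmsg) < 0) {
            fprintf(stderr, "load failed: %s\n", errmsg);
            return 1;
        }

        /* No packet data is passed to the program here. */
        uint64_t ret = ubpf_exec(vm, NULL, 0);
        printf("program returned %llu\n", (unsigned long long)ret);

        ubpf_destroy(vm);
        return 0;
    }

Since the interpreter is just a library inside your own process, attaching gdb and breaking anywhere in it is trivial, which is what makes the debugging use case above possible.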
uBPF had more success and served as a basis for other projects, such as a filtering library for DPDK, or Oko, an extension to Open vSwitch (both aimed at high-performance network processing). [Edit August 2021] More recently, it was chosen as the eBPF runtime for the implementation of eBPF for Windows (references: announcement, my analysis).
As you can see, the value of these user space eBPF machines entirely depends on what you do with them. They are available to run eBPF programs and have no specific focus by themselves, but they can help if you have a use case. In that regard, one particularity of uBPF and rbpf is their licenses (Apache, MIT): they are not under the GPL, which means you can reuse them in a larger number of projects, including proprietary ones. This is not the case with code from the kernel.
One big limitation of those user space eBPF machines is that they tend to be quite out-of-date with regard to what happens in the kernel, where things evolve fast. They do not have a solid verifier, so you cannot assert the security or safety of the programs. They hardly support eBPF maps, if at all; they do not support function calls, BTF, or even the latest eBPF instructions for that matter. Some of this could be added, but it would require engineering effort and time. [Edit August 2021] uBPF is seeing a lot of activity now that Microsoft contributes to it for its eBPF implementation. They also use a user-space verifier, PREVAIL.

Related

How to specify the physical CoreIDs used for "CLOSE" when specifying OMP_PROC_BIND?

We are trying to optimize HPC applications using OpenMP on a new hardware platform. These applications need precise placement/pinning of their threads to cores, or performance falls by half. Currently, we provide the user a custom GOMP_CPU_AFFINITY map for each platform, but this is cumbersome because it differs on each hardware version, and even platforms with different firmware versions sometimes change their CoreID physical mappings - all things impossible for the user to detect on the fly.
It would be a great help if HPC applications could simply set GOMP_PROC_BIND to "close" and OpenMP would do the right thing for the given platform - but to make this possible, the hardware vendor would need to define what "close" means for each machine. We'd like to do this, but we can't tell how/where OpenMP gets CoreID lists to use for things like close, spread, etc. (For various external requirements, the CoreID spatial pattern on this machine would appear utterly random to a software writer.)
Any advice as to where/how OpenMP defines the CoreID lists for OMP_PROC_BIND so we could configure them? We are comfortable with the idea that we might need a custom version of OpenMP (with altered source code) for this platform if needed.
Thanks, everyone. :)
Jeff
Expanding on what @VictorEijkhout said...
You seem to have confused an envirable that I can't find anywhere with Google (GOMP_PROC_BIND) with the OpenMP standard envirable (OMP_PROC_BIND). If GOMP_PROC_BIND exists, the name suggests that it is a GNU feature. Note too that one of the two Google hits for GOMP_PROC_BIND says "Code that reads the setting is buggy. Setting is invalid and ignored at runtime." So, if you are setting that, it is unsurprising that it has no effect!
I will therefore answer for the more general case of OMP_PROC_BIND.
The binding of OpenMP threads to logical CPUs clearly has to be done at runtime, since, beyond its ISA, the compiler has no knowledge of the hardware on which the compiled code will run. Therefore you need to be looking at the runtime library code.
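As a concrete illustration (not specific to any one runtime), OpenMP 4.5 added API calls that let a program inspect the place list the runtime built, which is handy for checking what "close" actually did on a given machine. A minimal sketch:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* Run with e.g.: OMP_PLACES=cores OMP_PROC_BIND=close ./a.out */
        printf("%d places available\n", omp_get_num_places());
        #pragma omp parallel
        {
            printf("thread %d bound to place %d\n",
                   omp_get_thread_num(), omp_get_place_num());
        }
        return 0;
    }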
I have not looked at GNU's libgomp, but, where it can, LLVM's libomp uses the hwloc library to explore the machine hardware. Since hwloc also includes other useful tools for machine exploration (such as lstopo) it is likely that your effort is best invested in ensuring good hwloc support on your machine, at which point there will be no need to delve inside the OpenMP runtime.
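To make that concrete, here is a minimal sketch of using hwloc to enumerate processing units and their OS indices - the kind of mapping an OpenMP runtime (or your custom configuration) ultimately needs:

    #include <stdio.h>
    #include <hwloc.h>

    int main(void) {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* Walk every processing unit (hardware thread) in the machine. */
        int n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
        for (int i = 0; i < n; i++) {
            hwloc_obj_t pu = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, i);
            printf("logical PU %d -> OS index %u\n", i, pu->os_index);
        }

        hwloc_topology_destroy(topo);
        return 0;
    }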

Can userspace code leverage NVIDIA's open-sourcing of their kernel modules?

NVIDIA has recently announced they are open-sourcing (a variant of) their GPU Linux kernel driver. They are not, however, open-sourcing the user-mode driver libraries (e.g. libcuda.so).
It's a gradual process and not all GPUs are supported initially, but regardless of these details: Is there some way that developers of user-space code can leverage this open-sourcing? Or is it only interesting/useful for kernel developers?
What I would personally love to be able to do is avoid having to make libcuda calls to get the current context. If that piece of information were somehow readable now from userspace, that could be neat. Of course that's just wishful thinking on my part - I don't know how to check what the driver directly "exposes" - if anything.
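For reference, the status quo being described is the driver-API call sketched below; whether the open-sourced kernel modules expose an equivalent through some device or sysfs interface is exactly the open question. This is a sketch of the existing libcuda path, not of any new interface:

    #include <stdio.h>
    #include <cuda.h>

    int main(void) {
        CUcontext ctx = NULL;
        cuInit(0);
        /* The libcuda call the question would like to avoid. */
        if (cuCtxGetCurrent(&ctx) != CUDA_SUCCESS) {
            fprintf(stderr, "cuCtxGetCurrent failed\n");
            return 1;
        }
        printf("current context: %p\n", (void *)ctx);
        return 0;
    }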

Measuring time in omp_fn routines

I am writing a pintool that gathers metrics on a subset of an application's routines (some of which are generated by the compiler).
The goal is to get the execution time of those routines.
Below is a set of attempts I have already made:
Of course doing it with pin is a bad idea because of the Virtual Machine overhead.
The gcc option -finstrument-functions does not cover the OpenMP functions the compiler generates (its hooks are sketched after this list).
LD_PRELOAD does not work with OpenMP functions which are statically linked.
Maybe if pin allowed dumping statically instrumented binaries, we could avoid the virtual environment overhead, but as far as I know that isn't possible.
I know about the Maqao instrumentation tool, which does not use a virtual environment, but I want to avoid using too many frameworks or translating my pintool into a Maqao Lua script.
I guess I am left with manual binary instrumentation, but if anybody has a better solution, the help will be appreciated.
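For context on the -finstrument-functions attempt above: GCC emits calls to the two hooks below around every instrumented function, which is why compiler-generated OpenMP outlined functions can slip through if they are not compiled with the flag. A minimal sketch of the hooks:

    /* Compile the target with: gcc -finstrument-functions ... */
    #include <stdio.h>

    void __cyg_profile_func_enter(void *fn, void *site)
        __attribute__((no_instrument_function));
    void __cyg_profile_func_exit(void *fn, void *site)
        __attribute__((no_instrument_function));

    void __cyg_profile_func_enter(void *fn, void *site) {
        fprintf(stderr, "enter %p (called from %p)\n", fn, site);
    }

    void __cyg_profile_func_exit(void *fn, void *site) {
        fprintf(stderr, "exit  %p\n", fn);
    }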
If you just want the results, use a comprehensive measurement infrastructure that supports OpenMP, such as Intel VTune, Extrae/Paraver, or Score-P. These will provide profiling or tracing information about the OpenMP regions.
If you want to implement the measurement yourself, you can use the underlying source-to-source transformation tool Opari. You could also use the much cleaner OpenMP tools interface (OMPT), but AFAIK it is not widely supported yet. You might have some luck with recent Intel OpenMP runtimes.
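If OMPT is an option, the entry point (as later standardized in OpenMP 5.0) is a single function the runtime looks up in your tool. A minimal skeleton, assuming a runtime with OMPT support:

    #include <omp-tools.h>

    /* Called by the runtime once it is ready; register callbacks here. */
    static int my_init(ompt_function_lookup_t lookup,
                       int initial_device_num, ompt_data_t *tool_data) {
        return 1; /* non-zero keeps the tool active */
    }

    static void my_fini(ompt_data_t *tool_data) { }

    /* The runtime looks this symbol up at startup. */
    ompt_start_tool_result_t *ompt_start_tool(unsigned int omp_version,
                                              const char *runtime_version) {
        static ompt_start_tool_result_t result = { my_init, my_fini, {0} };
        return &result;
    }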

Differences in kernel mode and drivers

I am just trying to understand the difference between patching the kernel and writing a driver.
It is my understanding that a kernel-mode driver can do anything the kernel can do, and is similar in some ways to a Linux kernel module.
Why then, were AV makers so upset when Microsoft stopped them from patching into the Windows kernel?
What kind of stuff can you do through kernel patching that you can't do through a driver?
In this context patching the kernel means modifying its (undocumented?) internal structures in order to achieve some functionality, typically hooking various functions (e.g. opening a file). You are not supposed to go messing around with internal kernel structures that do not belong to you. In the past Microsoft did not provide official hooks for some things, so security companies reverse engineered the internals and hooked the kernel directly. Recently Microsoft has provided official hooks for some things, so the need to hook the kernel directly is not as strong.
It's true that a kernel-mode driver can do anything the kernel can do - after all, they both run in ring 0. The key question here is: how difficult is it? Patching things relies on internal details that may change between different kernel releases. For example, the system call number of NtTerminateProcess will change between versions, so a driver which hooks the SSDT will break between versions (although the system call number can be obtained through other means). Reading or modifying fields of internal structures such as EPROCESS or ETHREAD is risky as well, because again, these structures change between versions. None of this is impossible for a driver to do, but it's hard.
If an official interface is provided for hooking, Microsoft can guarantee compatibility between versions as well as being able to control who can do what (e.g. only signed drivers can use the object manager callbacks). However, Microsoft can't do this for everything, because some things are just implementation details that drivers shouldn't know about.
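To illustrate the "official interface" route, here is a minimal sketch of a driver using one of those documented hooks, PsSetCreateProcessNotifyRoutine, which keeps working across kernel versions in a way SSDT patching does not:

    #include <ntddk.h>

    /* Documented callback: invoked on process creation and exit. */
    VOID CreateProcessNotify(HANDLE ParentId, HANDLE ProcessId, BOOLEAN Create)
    {
        if (Create)
            DbgPrint("process %p created (parent %p)\n", ProcessId, ParentId);
        else
            DbgPrint("process %p exiting\n", ProcessId);
    }

    NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
    {
        UNREFERENCED_PARAMETER(DriverObject);
        UNREFERENCED_PARAMETER(RegistryPath);
        /* Register through the supported interface instead of patching. */
        return PsSetCreateProcessNotifyRoutine(CreateProcessNotify, FALSE);
    }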

Quick CPU ring mode protection question

I am very curious about messing with HW. But my top-level "messing" so far has been linked or inline assembler in a C program. If my understanding of CPU ring modes is right, I cannot directly access some low-level CPU features from a user-mode app, like disabling interrupts or changing protected-mode segments, so I must use system calls to do everything I want.
But, if I am right, drivers can run in ring 0. I actually don't know much about drivers, but this is what I am asking about: is learning how to write my own drivers, and then calling them, the way to go to do what I described?
I know I could write a whole new OS (at least to some point), but what I exactly want to do is access some low-level features of the HW from a standard Windows application. So, is a driver the way to go?
Short answer: yes.
Long answer: Managing access to low-level hardware features is exactly the job of the OS kernel, and if you only want access to a single feature there's no need to start your own OS from scratch. Most modern OSes, such as Windows, Linux, or the BSDs, allow you to add code to the kernel through kernel modules.
When writing a kernel module (or device driver), you write code that is going to be executed inside the OS kernel and will thus be running in CPU ring 0. Great power comes with great responsibility, which in this case means that you should really know what you're doing as there will be no pre-configured OS interface to prevent you from doing the wrong things. You should therefore study the manuals of your hardware (e.g., Intel's x86 software developer's manuals, device specs, ...) as well as standard operating systems development literature (where you're also going to find plenty on the web -- OSDev, OSDever, OSR, Linux Device Drivers).
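You can see the ring protection directly with a few lines of inline assembly: CLI is a privileged instruction, so executing it in an ordinary ring-3 process raises a general-protection fault, which the OS turns into a crash. A sketch, assuming Linux/x86 with GCC:

    /* Build and run: gcc cli_test.c -o cli_test && ./cli_test */
    int main(void) {
        /* Privileged instruction: allowed in ring 0 (or with raised IOPL),
         * but from a normal user-mode process this traps with #GP and
         * the process is killed (SIGSEGV on Linux). */
        __asm__ volatile ("cli");
        return 0;
    }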
If you want to play with HW, write some programs for 16-bit real mode (or even with your own transition to protected mode). There you have to deal with ASM, BIOS interrupts, segments, video memory, and a lot of other low-level stuff.
