ebpf: intercepting function calls - linux-kernel

I am reading about kprobes BPF program type, and am wondering if it is possible to not just intercept a function call for tracing purposes or collect some low-level information (registers, stack etc.), but substitute a call and execute instead of the actual function?
Does kprobe provide this capability or I'm looking at the wrong tool?

No, kprobes BPF programs have only read access to the syscall parameters and return value, they cannot modify registers and therefore cannot intercept function calls. This is a limitation imposed by the BPF verifier.
Kernel modules, however, can intercept function calls using kprobes.

Related

What is the difference between the following two eBPF program types BPF_PROG_TYPE_SYSCALL and BPF_PROG_TYPE_KPROBE?

So I am assuming that BPF_PROG_TYPE_SYSCALL programs are triggered whenever a particular syscall is executed inside the kernel. Can't BPF_PROG_TYPE_KPROBE ebpf programs be used for that purpose? kprobes can hook into any kernel function and syscalls are also kernel functions.
So what is the difference between the two types of programs and when to use which?
You would think that but actually BPF_PROG_TYPE_SYSCALL is a program type which can execute syscalls itself. https://lwn.net/Articles/854228/ It was introduced as an attempt to make one BPF program load another so the first program can be signed with a certificate. But it hasn't caught on very well yet as of writing this.
Indeed if you want to trigger on syscall execution, kprobes are the way to go.

How can an ebpf program change kernel execution flow or call kernel functions?

I'm trying to figure out how an ebpf program can change the outcome of a function (not a syscall, in my case) in kernel space. I've found numerous articles and blog posts about how ebpf turns the kernel into a programmable kernel, but it seems like every example is just read-only tracing and collecting statistics.
I can think of a few ways of doing this: 1) make a kernel application read memory from an ebpf program, 2) make ebpf change the return value of a function, 3) allow an ebpf program to call kernel functions.
The first approach does not seem like a good idea.
The second would be enough, but as far as I understand it's not easy. This question says syscalls are read-only. This bcc document says it is possible but the function needs to be whitelisted in the kernel. This makes me think that the whitelist is fixed and can only be changed by recompiling the kernel, is this correct?
The third seems to be the most flexible one, and this blog post encouraged me to look into it. This is the one I'm going for.
I started with a brand new 5.15 kernel, which should have this functionality
As the blog post says, I did something no one should do (security is not an issue since I'm just toying with this) and opened every function to ebpf by adding this to net/core/filter.c (which I'm not sure is the correct place to do so):
static bool accept_the_world(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
return true;
}
bool export_the_world(u32 kfunc_id)
{
return true;
}
const struct bpf_verifier_ops all_verifier_ops = {
.check_kfunc_call = export_the_world,
.is_valid_access = accept_the_world,
};
How does the kernel know of the existence of this struct? I don't know. None of the other bpf_verifier_ops declared are used anywhere else, so it doesn't seem like there is a register_bpf_ops
Next I was able to install bcc (after a long fight due to many broken installation guides).
I had to checkout v0.24 of bcc. I read somewhere that pahole is required when compiling the kernel, so I updated mine to v1.19.
My python file is super simple, I just copied the vfs example from bcc and simplified it:
bpf_text_kfunc = """
extern void hello_test_kfunc(void) __attribute__((section(".ksyms")));
KFUNC_PROBE(vfs_open)
{
stats_increment(S_OPEN);
hello_test_kfunc();
return 0;
}
"""
b = BPF(text=bpf_text_kfunc)
Where hello_test_kfunc is just a function that does a printk, inserted as a module into the kernel (it is present in kallsyms).
When I try to run it, I get:
/virtual/main.c:25:5: error: cannot call non-static helper function
hello_test_kfunc();
^
And this is where I'm stuck. It seems like it's the JIT that is not allowing this, but who exactly is causing this issue? BCC, libbpf or something else? Do I need to manually write bpf code to call kernel functions?
Does anyone have an example with code of what the lwn blog post I linked talks about actually working?
eBPF is fundamentally made to extend kernel functionality in very specific limited ways. Essentially a very advanced plugin system. One of the main design principles of the eBPF is that a program is not allowed to break the kernel. Therefor it is not possible to change to outcome of arbitrary kernel functions.
The kernel has facilities to call a eBPF program at any time the kernel wants and then use the return value or side effects from helper calls to effect something. The key here is that the kernel always knows it is doing this.
One sort of exception is the BPF_PROG_TYPE_STRUCT_OPS program type which can be used to replace function pointers in whitelisted structures.
But again, explicitly allowed by the kernel.
make a kernel application read memory from an ebpf program
This is not possible since the memory of an eBPF program is ephemaral, but you could define your own custom eBPF program type and pass in some memory to be modified to the eBPF program via a custom context type.
make ebpf change the return value of a function
Not possible unless you explicitly call a eBPF program from that function.
allow an ebpf program to call kernel functions.
While possible for a number for purposes, this typically doesn't give you the ability to change return values of arbitrary functions.
You are correct, certain program types are allowed to call some kernel functions. But these are again whitelisted as you discovered.
How does the kernel know of the existence of this struct?
Macro magic. The verifier builds a list of these structs. But only if the program type exists in the list of program types.
/virtual/main.c:25:5: error: cannot call non-static helper function
This seems to be a limitation of BCC, so if you want to play with this stuff you will likely have to manually compile your eBPF program and load it with libbpf or cilium/ebpf.

Calling schedule() inside Linux IRQ

I'm making an emulation driver that requires me to call schedule() in ATOMIC contexts in order to make the emulation part work. For now I have this hack that allows me to call schedule() inside ATOMIC (e.g. spinlock) context:
int p_count = current_thread_info()->preempt_count;
current_thread_info()->preempt_count = 0;
schedule();
current_thread_info()->preempt_count = p_count;
But that doesn't work inside IRQs, the system just stops afer calling schedule().
Is there any way to hack the kernel in a way to allow me to do it? I'm using Linux kernel 4.2.1 with User Mode Linux
In kernel code you can be either in interrupt context or in process context.
When you are in interrupt context, you cannot call any blocking function (e.g., schedule()) or access the current pointer. That's related to how the kernel is designed and there is no way for having such functionalities in interrupt context. See also this answer.
Depending on what is your purpose, you can find some strategy that allows you to reach your goal. To me, it sounds strange that you have to call schedule() explicitly instead of relying on the natural kernel flow.
One possible approach follows (but, again, it depends on your specific goal). Form the IRQ you can schedule the work on a work queue through schedule_work(). The work queue, in fact, by design, executes kernel code in process context. From there, you are allowed to call blocking functions and access the current process data.

Syscall implementation kernel module 2.6

after doing some reading I came to understand that adding a new syscall via a LKM has gotten harder in 2.6. It seems that the syscall table is not exported any longer, therefore making it (impossible?) to insert a new call at runtime.
The stuff I want to achieve is the following.
I have a kernel module which is doing a specific task.
This task depends on input which should be provided by a user land process.
This information needs to reach the module.
For this purpose I would introduce a new syscall which is implemented in the kernel module and callable from the user land process.
If I have to recompile the kernel in order to add my new syscall, I would also need to write the actual syscall logic outside of the kernel module, correct?
Is there another way to do this?
Cheers,
eeknay
Syscalls are not the correct interface for this sort of work. At least, that's the reason kernel developers made adding syscalls difficult.
There are lots of different ways to move data between userspace and a kernel module: the proc and sysfs pseudo-filesystems, char device interface (using read or write or ioctl), or the local pseudo-network interface netlink.
Which one you choose depends on the amount of type of data you want to send. You should probably only use proc/sysfs if you intend to pass only tiny amounts of data; for big bulk transfers char device or netlink are better suited.
Impossible -- no.
AV modules and rootkits do it all the time.

An impasse with hooking calls to HeapAlloc for a memory tracking application

I am writing a memory tracking application that hooks all the calls to HeapAlloc using IAT patching mechanism. The idea is to capture all the calls to HeapAlloc and get a callstack.
However I am currently facing a problem with getting the callstack using DBGHELP Apis. I found that the dbghelp dll itself is linking to MSVCRT dll and this dependancy results in a recursive call. When I try to get a callstack for any of the calls from the target application, dbghelp internally calls some method from MSVCRT that again calls HeapAlloc. And since I have already patched MSVCRT it results in an infinite loop.
Has anyone faced this problem and solved it ? Is there any way out of this impasse?
This is a standard problem in function interception code. We had a similar issue with a logging library that used shared memory for storing log level information, while the shared memory library had to log information.
The way we fixed it could be applied to your situation, I believe.
In your intercept code, maintain a static flag that indicates whether or not you're in the middle of an intercept. When your intercept is called and the flag isn't set, set the flag then do what you currently do, including calling DbgHelp, then clear the flag.
If your intercept is called while the flag is set, only call the back-end HeapAlloc code without doing any of the other stuff (including calling DbgHelp which is what's causing your infinite recursion).
Something along the lines of (pseudo-code):
function MyHookCode:
static flag inInterceptMode = false
if inInterceptMode:
call HeapAlloc
return
inInterceptMode = true
call DbgHelp stuff
call HeapAlloc
inInterceptMode = false
return
function main:
hook HeapAlloc with MyHookCode
: : :
return
What about using some real memory tracking products like GlowCode?
You can use Deviare API Hook and get full stack-trace without using that API that has a big number of problems.

Resources