Hook Arbitrary Kernel Function through Module - linux-kernel

I am trying to monitor all times the PCIe stack writes configures to a device. In the absence of a PCI equivalent of usbmon, I thought to monitor all times the pci_bus_write_config_byte() function is called. I wanted to write a kernel module that essentially did this:
int (*original)(struct pci_bus *, unsigned int, int, u8);
original = &pci_bus_write_config_byte;
pci_bus_write_config_byte = &my_custom_func;
And then my custom function will printk() whatever data is passed, and return the original pci_bus_write_config_byte. However, when I load the module nothing happens. I suspect this is due to some sort of RW protection.
My google searches revealed that set_memory_rw() is supposed to make a function pointer writable, but I am not able to properly include it or use this function - when I go to insmod the module, the kernel says there are unknown symbols.
Any ideas on how one would do this?

Related

How can an ebpf program change kernel execution flow or call kernel functions?

I'm trying to figure out how an ebpf program can change the outcome of a function (not a syscall, in my case) in kernel space. I've found numerous articles and blog posts about how ebpf turns the kernel into a programmable kernel, but it seems like every example is just read-only tracing and collecting statistics.
I can think of a few ways of doing this: 1) make a kernel application read memory from an ebpf program, 2) make ebpf change the return value of a function, 3) allow an ebpf program to call kernel functions.
The first approach does not seem like a good idea.
The second would be enough, but as far as I understand it's not easy. This question says syscalls are read-only. This bcc document says it is possible but the function needs to be whitelisted in the kernel. This makes me think that the whitelist is fixed and can only be changed by recompiling the kernel, is this correct?
The third seems to be the most flexible one, and this blog post encouraged me to look into it. This is the one I'm going for.
I started with a brand new 5.15 kernel, which should have this functionality
As the blog post says, I did something no one should do (security is not an issue since I'm just toying with this) and opened every function to ebpf by adding this to net/core/filter.c (which I'm not sure is the correct place to do so):
static bool accept_the_world(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
return true;
}
bool export_the_world(u32 kfunc_id)
{
return true;
}
const struct bpf_verifier_ops all_verifier_ops = {
.check_kfunc_call = export_the_world,
.is_valid_access = accept_the_world,
};
How does the kernel know of the existence of this struct? I don't know. None of the other bpf_verifier_ops declared are used anywhere else, so it doesn't seem like there is a register_bpf_ops
Next I was able to install bcc (after a long fight due to many broken installation guides).
I had to checkout v0.24 of bcc. I read somewhere that pahole is required when compiling the kernel, so I updated mine to v1.19.
My python file is super simple, I just copied the vfs example from bcc and simplified it:
bpf_text_kfunc = """
extern void hello_test_kfunc(void) __attribute__((section(".ksyms")));
KFUNC_PROBE(vfs_open)
{
stats_increment(S_OPEN);
hello_test_kfunc();
return 0;
}
"""
b = BPF(text=bpf_text_kfunc)
Where hello_test_kfunc is just a function that does a printk, inserted as a module into the kernel (it is present in kallsyms).
When I try to run it, I get:
/virtual/main.c:25:5: error: cannot call non-static helper function
hello_test_kfunc();
^
And this is where I'm stuck. It seems like it's the JIT that is not allowing this, but who exactly is causing this issue? BCC, libbpf or something else? Do I need to manually write bpf code to call kernel functions?
Does anyone have an example with code of what the lwn blog post I linked talks about actually working?
eBPF is fundamentally made to extend kernel functionality in very specific limited ways. Essentially a very advanced plugin system. One of the main design principles of the eBPF is that a program is not allowed to break the kernel. Therefor it is not possible to change to outcome of arbitrary kernel functions.
The kernel has facilities to call a eBPF program at any time the kernel wants and then use the return value or side effects from helper calls to effect something. The key here is that the kernel always knows it is doing this.
One sort of exception is the BPF_PROG_TYPE_STRUCT_OPS program type which can be used to replace function pointers in whitelisted structures.
But again, explicitly allowed by the kernel.
make a kernel application read memory from an ebpf program
This is not possible since the memory of an eBPF program is ephemaral, but you could define your own custom eBPF program type and pass in some memory to be modified to the eBPF program via a custom context type.
make ebpf change the return value of a function
Not possible unless you explicitly call a eBPF program from that function.
allow an ebpf program to call kernel functions.
While possible for a number for purposes, this typically doesn't give you the ability to change return values of arbitrary functions.
You are correct, certain program types are allowed to call some kernel functions. But these are again whitelisted as you discovered.
How does the kernel know of the existence of this struct?
Macro magic. The verifier builds a list of these structs. But only if the program type exists in the list of program types.
/virtual/main.c:25:5: error: cannot call non-static helper function
This seems to be a limitation of BCC, so if you want to play with this stuff you will likely have to manually compile your eBPF program and load it with libbpf or cilium/ebpf.

cdev_alloc() vs cdev_init()

In Linux kernel modules, two different approaches can be followed when creating a struct cdev, as suggested in this site and in this answer:
First approach, cdev_alloc()
struct cdev *my_dev;
...
static int __init example_module_init(void) {
...
my_dev = cdev_alloc();
if (my_dev != NULL) {
my_dev->ops = &my_fops; /* The file_operations structure */
my_dev->owner = THIS_MODULE;
}
else
...
}
Second approach, cdev_init()
static struct cdev my_cdev;
...
static int __init example_module_init(void) {
...
cdev_init(&my_cdev, my_fops);
my_cdev.owner = THIS_MODULE;
...
}
(assuming that my_fops is a pointer to an initialized struct file_operations).
Is the first approach deprecated, or still in use?
Can cdev_init() be used also in the first approach, with cdev_alloc()? If no, why?
The second question is also in a comment in the linked answer.
Can cdev_init() be used also in the first approach, with cdev_alloc()?
No, cdev_init shouldn't be used for a character device, allocated with cdev_alloc.
At some extent, cdev_alloc is equivalent to kmalloc plus cdev_init. So calling cdev_init for a character device, created with cdev_alloc, has no sense.
Moreover, a character device allocated with cdev_alloc contains a hint that the device should be deallocated when no longer be used. Calling cdev_init for that device will clear that hint, so you will get a memory leakage.
Selection between cdev_init and cdev_alloc depends on a lifetime you want a character device to have.
Usually, one wants lifetime of a character device to be the same as lifetime of the module. In that case:
Define a static or global variable of type struct cdev.
Create the character device in the module's init function using cdev_init.
Destroy the character device in the module's exit function using cdev_del.
Make sure that file operations for the character device have .owner field set to THIS_MODULE.
In complex cases, one wants to create a character device at specific point after module's initializing. E.g. a module could provide a driver for some hardware, and a character device should be bound with that hardware. In that case the character device cannot be created in the module's init function (because a hardware is not detected yet), and, more important, the character device cannot be destroyed in the module's exit function. In that case:
Define a field inside a structure, describing a hardware, of pointer type struct cdev*.
Create the character device with cdev_alloc in the function which creates (probes) a hardware.
Destroy the character device with cdev_del in the function which destroys (disconnects) a hardware.
In the first case cdev_del is called at the time, when the character device is not used by a user. This guarantee is provided by THIS_MODULE in the file operations: a module cannot be unloaded if a file, corresponded to the character device, is opened by a user.
In the second case there is no such guarantee (because cdev_del is called NOT in the module's exit function). So, at the time when cdev_del returns, a character device can be still in use by a user. And here cdev_alloc really matters: deallocation of the character device will be deferred until a user closes all file descriptors associated with the character device. Such behavior cannot be obtained without cdev_alloc.
They do different things. The preference would be usual - prefer not to use dynamic allocation when not needed and allocate on stack when it's possible.
cdev_alloc() dynamically allocates my_dev, so it will call kfree(pointer) when cdev_del().
cdev_init() will not free the pointer.
Most importantly, the lifetime of the structure my_cdev is different. In cdev_init() case struct cdev my_cdev is bound to the containing lexical scope, while cdev_alloc() returns dynamically allocate pointer valid up until free-d.

Crash while accessing Address passed via ioctl in linux kernel

I am passing pointer to below structure through ioctl.
typedef struct myparam_t {
int i;
char *myname;
}myparam;
I allocate memory for myname in the user space before passing the argument.
In the ioctl implementation at kernel space I use copy_from_user() to copy the argument in kspace variable (say kval). call is like copy_from_user(kval,uval,sizeof(myparam)); kval & uval are of myparam * type.
Now in ioctl function I have check like -
if (uval->myname != NULL) {
}
I thought this will result in to kernel crash or panic due to page fault.
However I see different behaviour with different version of kernel.
With 4.1.21 kernel I see no issue with NULL check, But with 4.10.17 kernel I see a page fault that results in to kernel panic.
My question is what might make NULL check working in kernel 4.1.21 ?
Is there a way I can make same code working for 4.10.17 as well ?
Note that I am facing this issue as part of kernel migration. We are trying to migrate from version 4.1.21 to 4.10.17. There may be many instance of such usage in our code base. Hence, if possible, I'd prefer to put a patch in kernel to fix the issue, rather than fixing all such instance.
Thanks
Dipak.

Successfully de-referenced userspace pointer in kernel space without using copy_from_user()

There's a bug in a driver within our company's codebase that's been there for years.
Basically we make calls to the driver via ioctls. The data passed between userspace and driver space is stored in a struct, and the pointer to the data is fed into the ioctl. The driver is responsible to dereference the pointer by using copy_from_user(). But this code hasn't been doing that for years, instead just dereferencing the userspace pointer. And so far (that I know of) it hasn't caused any issues until now.
I'm wondering how this code hasn't caused any problems for so long? In what case will dereferencing a pointer in kernel space straight from userspace not cause a problem?
In userspace
struct InfoToDriver_t data;
data.cmd = DRV_SET_THE_CLOCK;
data.speed = 1000;
ioctl(driverFd, DEVICE_XX_DRIVER_MODIFY, &data);
In the driver
device_xx_driver_ioctl_handler (struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
struct InfoToDriver_t *user_data;
switch(cmd)
{
case DEVICE_XX_DRIVER_MODIFY:
// what we've been doing for years, BAD
// But somehow never caused a kernel oops until now
user_data = (InfoToDriver_t *)arg;
if (user_data->cmd == DRV_SET_THE_CLOCK)
{ .... }
// what we're supposed to do
copy_from_user(user_data, (void *)arg, sizeof(InfoToDriver_t));
if (user_data->cmd == DRV_SET_THE_CLOCK)
{ ... }
A possible answer is, this depends on the architecture. As you have seen, on a sane architecture (such as x86 or x86-64) simply dereferencing __user pointers just works. But Linux pretends to support every possible architecture, there are architectures where simple dereference does not work. Otherwise copy_to/from_user won't existed.
Another reason for copy_to/from_user is possibility that usermode side modifies its memory simultaneously with the kernel side (in another thread). You cannot assume that the content of usermode memory is frozen while accessing it from kernel. Rougue usermode code can use this to attack the kernel. For example, you can probe the pointer to output data before executing the work, but when you get to copy the result back to usermode, this pointer is already invalid. Oops. The copy_to_user API ensures (should ensure) that the kernel won't crash during the copy, instead the guilty application will be killed.
A safer approach is to copy the whole usermode data structure into kernel (aka 'capture'), check this copy for consistency.
The bottom line... if this driver is proven to work well on certain architecture, and there are no plans to port it, there's no urgency to change it. But check carefully robustness of the kernel code, if capture of the usermode data is needed, or problem may arise during copying from usermode.

Regarding msgrcv in android-kernel?

I was running a test suite for testing IPC related functionality in android kernel. while I was testing msgrcv system call , it return error function not implemented.
So is it true msgrcv() system call not implemented in android-kernel, if so why and which system call in android kernel serve purpose of msgrcv() system call.
I got related statement which says System V IPCs (including message queues) are not implemented on Bionic. but not sure what does it mean.
Update : I am able to find definition of msgrcv in android kernel , but not sure why it is returning error function not implemented.
Below code snippet :
SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz,
long, msgtyp, int, msgflg)
{
return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill);
}
Please comment if information seems incomplete or vague ,Help is appreciated.
System V IPC may be available in the kernel but system call interfaces are not implemented in Bionic lib C. For Example, /bionic/libc/arch-arm/syscalls/ contains all system call implementations with respect to ARM.

Resources