Detect write to DebugFS - linux-kernel

I have a kernel module that creates several DebugFS entries, each 4 to 8 bytes. I would like to use one (or more) of these entries to initiate action within the kernel module--in other words, I want to use an entry for configuration purposes.
Is there a common idiom to detect the user write to the DebugFS entry without polling (some kind of user-space to kernel-space signal) within my kernel module, or is sleep/poll the best (only?) option?

Helper functions like debugfs_create_u32() are intended for cases where you want to be able to change a variable without any other helper code.
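If you want to do anything other than set a variable, you have to implement your own file operations and register them with debugfs_create_file(). Your write handler then runs each time user space writes to the entry, so no polling is needed.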

Where are page permissions stored on hardware and how can I alter them directly?

I'm trying to write a pseudo kernel driver (it uses CVE-2018-8120 to get kernel permission, so it's technically not a driver) and I want to be as safe as possible when entering ring0. I'm writing a function to read and write MSRs from userland, and before the transition to ring0 I'm trying to guarantee that the void pointer given to my function can be written. I decided the ideal way to do this was to make it writable if it is not already.
The problem is that the only way I know how to do this is with VirtualProtect() and NtAllocateVirtualMemory, but VirtualProtect() sometimes fails and returns an error instead. I want to know precisely where these access permissions are stored (in RAM? in some special CPU register?), how I can obtain their address, and how I can modify them directly.
User-mode code should never try to muck around in kernel data structures, and any properly written kernel will prevent it anyway. The best way for user mode code to ensure that an address can be written is to write to it. If the page was not already writeable, the page fault will cause the kernel to make it so.
Nevertheless, the kernel code /cannot/ rely on the application having done so, for two reasons:
1) Even if the application does it properly, the page might be unmapped again before (or after) entering ring 0.
2) The kernel should /never/ rely on application code to do the right thing. It always has to protect itself.
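The access permissions are stored in the page directory and page table entries, which live in RAM. CR3 holds the physical address of the top-level paging structure, and CR0 holds global paging controls such as the PG (paging enable) and WP (write protect) bits.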
More information can be found here: https://wiki.osdev.org/Paging.
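To make the layout concrete, here is a minimal sketch of how the permission bits of an x86-64 paging entry can be decoded. It assumes 4-level paging with 4 KiB pages and ignores large pages; the macro and function names are made up for illustration and do not come from any OS header. Only kernel-mode code can actually read these entries, so the values in the usage note are plain numbers rather than real table contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural flag bits of an x86-64 paging-structure entry. */
#define PTE_PRESENT   (1ULL << 0)   /* P: entry is valid */
#define PTE_WRITABLE  (1ULL << 1)   /* R/W: writes allowed */
#define PTE_USER      (1ULL << 2)   /* U/S: ring 3 access allowed */
#define PTE_NO_EXEC   (1ULL << 63)  /* NX: instruction fetch forbidden */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL /* bits 12..51: physical frame */

/* Physical address of the next-level table (or of the final page). */
uint64_t pte_frame(uint64_t entry)
{
    return entry & PTE_ADDR_MASK;
}

/*
 * A user-mode write succeeds only if P, R/W and U/S are set in the
 * entries of every level of the walk (PML4E, PDPTE, PDE, PTE).
 * `walk` holds one entry per level, top level first.
 */
bool user_can_write(const uint64_t *walk, size_t levels)
{
    const uint64_t need = PTE_PRESENT | PTE_WRITABLE | PTE_USER;

    assert(walk != NULL);
    for (size_t i = 0; i < levels; i++) {
        if ((walk[i] & need) != need)
            return false;
    }
    return true;
}
```

For example, a walk whose entries all end in 0x7 (P, R/W and U/S set) is writable from user mode, while clearing R/W in just the last entry (0x5) makes the page read-only. Whether a supervisor write to such a read-only page faults depends on CR0.WP.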

BPF: owner of a map

This is a follow-up to "who creates map in BPF", since my new question is not directly relevant to that thread.
So, it seems to me that there has to be a single point where a BPF map is created: either it is a BPF program, or a user program that loads BPF, etc.
A BPF program has to know the types of the maps it is going to work with at compile time, so we need:
struct bpf_map_def SEC("maps") my_map = {
...
};
So it means that a user program, for example bpftool, will initiate creation of the maps found in the BPF ELF sections, as was shown in the "who creates map in BPF" thread.
On the other hand, a user application will need to add/delete entries in the map. For this to happen, it has to know the map's ID in order to obtain the map's fd with bpf_map_get_fd_by_id() from libbpf. After that we can enjoy bpf_map_update_elem() and similar APIs.
Also, if we declared a map section in the BPF program and do have the map API in use, the map(s) will be preserved in the kernel and will be allocated IDs.
So in this case, we are going to have two maps with two different IDs: one created as a result of bpf_prog_load() from bpftool, and the other from the user application's bpf_create_map() (assuming that the application continues running, e.g. to update maps, and does not return to the shell).
There must be a way to bypass this ambiguity?
I am not completely sure I understand your question, let me try to rephrase this.
You load an eBPF program with bpftool, which creates all maps needed by the program. bpftool is a user space application, and ultimately creates maps with the bpf(BPF_MAP_CREATE, …) syscall.
You have another user space application foobar that interacts with these maps, possibly by using libbpf (which in turn ends up performing bpf(BPF_MAP_*, …) syscalls) to look up, update or delete elements from the maps.
As I understand it, this second application foobar also tries to create the maps. Hence you have a conflict between the maps created by bpftool and the one created by foobar.
If this is correct, the solution is “simple”: do not create the maps twice.
This means that you should either delete the calls to bpf_create_map() from your other application foobar, or load your programs with something other than bpftool. Usually, the workflow consists of having the maps described in the eBPF object file, and created by the same application that loads the program, just before loading; this is what bpftool does. Then the application has the file descriptor for the map and can work on it.
Alternatively, it is possible to pin the map under the BPF virtual file system (/sys/fs/bpf/) so that another application can retrieve the file descriptor, and also access this map. This is done with the syscall bpf(BPF_OBJ_GET, …) (not yet documented on the man page at this time, at least on my system).
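As a rough sketch of that second approach, the following helper issues the raw bpf(BPF_OBJ_GET, …) syscall to get a file descriptor for a pinned object. The helper name open_pinned_map and the pin path in the usage note are hypothetical; the sketch assumes a Linux system with the kernel UAPI header linux/bpf.h installed. libbpf offers bpf_obj_get() for the same purpose, which is probably the better choice in a real application.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns a new file descriptor for the BPF object
 * pinned at `path`, or -1 with errno set if the syscall fails.
 */
int open_pinned_map(const char *path)
{
    union bpf_attr attr;

    assert(path != NULL);
    memset(&attr, 0, sizeof(attr));            /* unused fields must be zero */
    attr.pathname = (uint64_t)(uintptr_t)path; /* user pointer passed as u64 */

    return (int)syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}
```

If the loader pinned the map at, say, /sys/fs/bpf/my_map (an assumed path), foobar could call open_pinned_map("/sys/fs/bpf/my_map") and pass the returned descriptor to bpf_map_update_elem() and friends, without ever creating a second map.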
If I am correct, using pinned maps can also allow one to reuse an already existing map when loading a new eBPF program. I believe tc from package iproute2 intends to do that if the map described exists and is pinned already (see file lib/bpf.c, but the code is not exactly easy to read). This would typically be performed at relocation time.
Map IDs were added recently, primarily for debugging and introspection, but they may provide another way to retrieve the file descriptor of a map in your case, as you describe with bpf_map_get_fd_by_id(). You still have to find a way to get the ID in the first place, though.
Hope this helps!

Which methods/calls perform the disk I/O operations and how to find them?

Which methods and system calls should I hook into, so I can replace 'how' an OS X app (the target) reads and writes to/from the HD?
How may I determine that list of functions or system calls?
Adding more context:
This is a final project and I'm looking for advice. The goal is to alter the behavior of an OS X app, adding data encryption and decryption capabilities to it.
Which tools could I use to achieve my goal, and why?
For instance, assume the target app is Text Edit. Instead of saving "hello world" as plain text in a .txt file in the HD, it'll save: "ifmmnXxnpme". Opening the file will show the original text.
I think it's better to be more realistic, or at least more aware of what you want to do.
The lowest level in software is a kernel module on top of the storage modules, that "encrypt" the data.
In Windows you can stack drivers, so conceptually you simply intercept the call for a read/write, edit it and pass it down the driver stack.
Under BSD there is an equivalent mechanism surely, but I don't know precisely what it is.
I don't think you want to dig into kernel programming.
At the lowest level from a user space application's point of view, there are the system calls.
The system calls used to read and write are numbers 3 and 4 respectively (see here); in BSD-derived OSes like OS X, they become 2000003h and 2000004h (see here).
This is IA32e-specific, which applies since you are using Apple computers.
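The 2000000h prefix is the system call class placed in the upper bits of the number. A small sketch of the arithmetic, assuming the class value 2 (Unix/BSD) and the shift of 24 used by the 64-bit XNU system call convention; the macros are defined locally here rather than taken from a system header, and the function name is made up:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values mirroring the XNU convention; defined locally. */
#define SYSCALL_CLASS_SHIFT 24
#define SYSCALL_CLASS_UNIX  2u  /* class of BSD system calls */

/* Combine the class with a BSD system call number (hypothetical helper). */
uint32_t unix_syscall_number(uint32_t bsd_number)
{
    /* The BSD number must fit below the class bits. */
    assert(bsd_number < (1u << SYSCALL_CLASS_SHIFT));
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | bsd_number;
}
```

With these assumptions, unix_syscall_number(3) gives 0x2000003 for read and unix_syscall_number(4) gives 0x2000004 for write, which is the value loaded into rax before the syscall instruction.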
Files can be read/written by memory mapping them, so you would need to hijack the system call sys_mmap too.
This is more complex as you need to detect page faults or any mechanism used to implement file mapping.
To hijack system calls you need a kernel module again.
The next level of abstraction up is the runtime, which is probably the Obj C runtime (to date, Swift still uses the Obj C runtime AFAIK).
An Obj C application uses the Cocoa Framework and can read/write files with calls like [NSData dataWithContentsOfFile: myFileName] or [myData writeToFile: myFileName atomically:myAtomicalBehavior].
There are plenty of Cocoa methods that write to or read from file, but internally the framework will use few methods from the Obj C runtime.
I'm not an expert of the internals of Cocoa, so you need to take a debugger and look what the invocation chain is.
Once you have found the "low level" methods that read or write to files you can use method swizzling.
If the target app loads your code as part of a library, this is really simple; otherwise you need more clever techniques (like infecting or manipulating the memory of the other process directly). You can google around for more info.
Again, to be honest, this is still a lot of work, although manageable.
You may consider simply hijacking a limited set of Cocoa methods, for example the writeToFile of NSData or its NSString equivalent, and treating the project as a work-in-progress demo.
A similar question has been asked and answered here.

user defined page fault and exception handlers

I am trying to understand if we can add our page fault handlers / exception handlers in kernel / user mode and handle the fault we induced before giving the control back to the kernel.
The task here is not to modify the existing kernel code (the do_page_fault function) but to add a user-defined handler which will be looked up when a page fault or an exception is triggered.
One could find tools like "kprobe" which provide hooks at the instruction level, but it looks like this will not serve my purpose.
It would be great if somebody could help me understand this or point to good references.
From user space, you can define a signal handler for SIGSEGV, so your own function will be invoked whenever an invalid memory access is made. When combined with mprotect(), this lets a program manage its own virtual memory, all from user-space.
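A minimal sketch of that SIGSEGV plus mprotect() combination, assuming Linux and a single-threaded program. All names (guarded_page, on_segv, demo_user_pager) are made up for illustration. Note that mprotect() is not on the POSIX list of async-signal-safe functions, although calling it from a handler like this is a common technique on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;                 /* region we manage ourselves */
static size_t guarded_len;
static volatile sig_atomic_t fault_count;  /* faults the handler resolved */

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;    /* faulting address */

    (void)sig;
    (void)ctx;
    if (addr >= guarded_page && addr < guarded_page + guarded_len) {
        fault_count++;
        /* "Page in": grant access; the faulting instruction is retried. */
        mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE);
        return;
    }
    /* Not our region: restore the default action so the retry is fatal. */
    signal(SIGSEGV, SIG_DFL);
}

/* Returns the number of faults handled (1 on success), or -1 on error. */
int demo_user_pager(void)
{
    struct sigaction sa;
    volatile char *p;
    int ok;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;              /* deliver siginfo_t with si_addr */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    fault_count = 0;
    guarded_len = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, guarded_len, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;

    p = guarded_page;
    p[0] = 42;                             /* faults once, then succeeds */
    ok = (fault_count == 1 && p[0] == 42);

    munmap(guarded_page, guarded_len);
    return ok ? (int)fault_count : -1;
}
```

Calling demo_user_pager() from main() should return 1: the first write to the PROT_NONE page traps, the handler unprotects the page, and the write is restarted transparently.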
However, I get the impression that you're looking for a way to intercept all page faults (major, minor, and invalid) and invoke an arbitrary kernel function in response. I don't know a clean way to do this. When I needed this functionality in my own research projects, I ended up adding code to do_page_fault(). It works fine for me, but it's a hack. I would be very interested if someone knew of a clean way to do this (i.e., that could be used by a module on a vanilla kernel).
If you don't want to change the way the kernel handles these faults and just want to add your own handling before it, then kprobes will serve your purpose. They are a little difficult to handle, because you get the arguments of the probed function in a structure containing the registers and on the stack, and you have to know exactly where the compiler put each of them. BUT, if you need it for specific functions (known when the probes are created), then you can use jprobes (here is a nice example on how to use both), which require probe functions with exactly the same arguments as the probed one (so no digging through registers/stack).
You can dynamically load a kernel module and install jprobes on chosen functions without having to modify your kernel.
You can install a user-level pager with GNU libsigsegv. I haven't used it, but it seems to be just what you are looking for.
I do not think it would be possible. First of all, the page fault handler is a complex function which needs direct access to the virtual memory subsystem's structures.
Secondly, even if that were not an issue, in order to write a page fault handler in user space you would have to capture a fault which by default is a forced transfer to kernel space, so at the very least you would have to prevent that from happening.
To this end you would need a supervisor to keep track of all memory accesses, but you cannot guarantee that the supervisor code is already mapped and present in memory.

Linux Kernel - programmatically retrieve block numbers as they are written to

I want to maintain a list of block numbers as they are physically written to using the linux kernel source. I plan to modify the kernel source to do this. I just need to find the structure and functions in the kernel source that handle writing to physical partitions and get the block numbers as they write to the physical partition.
Any way of doing this? Any help is appreciated. If I can find where the kernel is actually writing to the partitions and returning the block numbers, that'd work.
I believe you could do this entirely from userspace, without modifying the kernel, using the blktrace interface.
It isn't just one place to check. For instance, if the block device was an iSCSI or AoE target, you would be looking for their respective drivers, then ultimately the same on the other end.
The same would go for normal SCSI, misc flash devices, etc, minus network interaction.
VFS just pulls these all together in a convenient, unified and consistent interface for calls like read() and write() to work, while providing buffering. The actual magic, including ordering and write barriers, is handled by the block device drivers themselves.
In the case of using device mapper, the path alters slightly. It goes from vfs -> dm_(target) -> blockdev_driver.