Read highmemory page address in a BPF program - linux-kernel

Could someone provide a replacement for kmap_local_page(page) or kmap_atomic(page) so that page memory can be read at an offset? That is, how can the following be done in a BPF program?
struct request *req;
struct bio *bio = req->bio;
struct bio_vec *bi_ivec = bio->bi_io_vec;
void *page;

page = kmap_local_page(bi_ivec->bv_page);
bpf_probe_read(&curr_io->data, bi_ivec->bv_len,
               page + bi_ivec->bv_offset);
kunmap_local(page);
The code above currently fails, I think because kernel functions cannot be called from a BPF program just by including the header file.

Related

What is the purpose of the function "blk_rq_map_user" in the NVME disk driver?

I am trying to understand the NVMe Linux drivers. I am now tackling the function nvme_submit_user_cmd, which I reproduce partially here:
static int nvme_submit_user_cmd(struct request_queue *q,
        struct nvme_command *cmd, void __user *ubuffer,
        unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
        u32 meta_seed, u32 *result, unsigned timeout)
{
    bool write = nvme_is_write(cmd);
    struct nvme_ns *ns = q->queuedata;
    struct gendisk *disk = ns ? ns->disk : NULL;
    struct request *req;
    struct bio *bio = NULL;
    void *meta = NULL;
    int ret;

    req = nvme_alloc_request(q, cmd, 0, NVME_QID_ANY);
    [...]
    if (ubuffer && bufflen) {
        ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
                GFP_KERNEL);
        [...]
The ubuffer is a pointer to some data in the virtual address space (since it comes from an ioctl command issued by a user application).
Following blk_rq_map_user I was expecting some sort of mmap mechanism to translate the userspace address into a physical address, but I can't wrap my head around what the function is doing. For reference here's the call chain:
blk_rq_map_user -> import_single_range -> blk_rq_map_user_iov
Following those functions only created more confusion for me, and I'd like some help.
The reason I think that this function is doing a sort of mmap is (apart from the name) that this address will be part of the struct request in the struct request queue, which will eventually be processed by the NVME disk driver (https://lwn.net/Articles/738449/) and my guess is that the disk wants the physical address when fulfilling the requests.
However, I don't understand how this mapping is done.
ubuffer is a user virtual address, which means it can only be used in the context of a user process, which it is when submit is called. To use that buffer after this call ends, it has to be mapped to one or more physical addresses for the bios/bvecs. The unmap call frees the mapping after the I/O completes. If the device can't directly address the user buffer due to hardware constraints then a bounce buffer will be mapped and a copy of the data will be made.
Edit: note that unless a copy is needed, there is no kernel virtual address mapped to the buffer because the kernel never needs to touch the data.
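To make "mapped to one or more physical addresses for the bios/bvecs" more concrete, here is a minimal user-space sketch of the splitting step only: a virtually contiguous buffer is cut at page boundaries into (page, offset, length) pieces, which is the shape a bio_vec describes. It does not pin or translate anything. `struct seg`, `split_into_segs` and the 4 KiB page size are assumptions made for illustration, not kernel names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a bio_vec: one page-bounded piece of a buffer. */
struct seg {
    uintptr_t page_index;  /* page the piece lives in (address / page size) */
    size_t    offset;      /* byte offset inside that page */
    size_t    len;         /* bytes of the buffer that fall in that page */
};

/*
 * Split [addr, addr + len) at page boundaries. Writes at most max_segs
 * entries to out and returns how many segments the range needs in total.
 */
size_t split_into_segs(uintptr_t addr, size_t len,
                       struct seg *out, size_t max_segs)
{
    size_t n = 0;

    assert(out != NULL || max_segs == 0);
    while (len > 0) {
        size_t offset = addr % SKETCH_PAGE_SIZE;
        size_t chunk = SKETCH_PAGE_SIZE - offset;  /* room left in this page */

        if (chunk > len)
            chunk = len;
        if (n < max_segs) {
            out[n].page_index = addr / SKETCH_PAGE_SIZE;
            out[n].offset = offset;
            out[n].len = chunk;
        }
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}
```

Only the first segment can start at a non-zero offset, and only the last can be shorter than the remainder of its page. In the kernel each piece would additionally refer to a pinned physical page rather than a plain index.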

Create array of struct scatterlist from buffer

I am trying to build an array of type "struct scatterlist" from a buffer pointed to by a kernel virtual address (I know the byte size of the buffer, but it may be large). Ideally I would like to have a function like init_sg_array_from_buf:
void my_function(void *buffer, int buffer_length)
{
    struct scatterlist *sg;
    int sg_count;

    sg_count = init_sg_array_from_buf(buffer, buffer_length, sg);
}
Which function in the scatterlist API does something similar? Currently the only possibility I see is to manually determine the number of pages spanned by the buffer. Windows has a kernel macro called "ADDRESS_AND_SIZE_TO_SPAN_PAGES", but I couldn't find anything like it in the Linux kernel.
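The page-span count mentioned in the question is plain arithmetic on the start address and length. Below is a minimal user-space sketch of that calculation; `span_pages` is a hypothetical helper name and the 4 KiB page size is an assumption, so in kernel code the real PAGE_SHIFT would be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u  /* assumption: 4 KiB pages */

/*
 * Number of pages touched by the byte range [addr, addr + len).
 * An empty range touches no pages.
 */
size_t span_pages(uintptr_t addr, size_t len)
{
    uintptr_t first, last;

    if (len == 0)
        return 0;
    assert(addr <= UINTPTR_MAX - (len - 1));  /* range must not wrap around */

    first = addr >> SKETCH_PAGE_SHIFT;              /* page of the first byte */
    last = (addr + len - 1) >> SKETCH_PAGE_SHIFT;   /* page of the last byte */
    return (size_t)(last - first + 1);
}
```

For example, a 4096-byte buffer that starts one byte past a page boundary spans two pages, so it would need two scatterlist entries if each entry covers at most one page.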

Basic mmap implementation

I am fairly new to Linux and am writing a custom driver for which I need to use mmap. I have found the sample code below in many places:
static int simple_remap_mmap(struct file *filp, struct vm_area_struct *vma)
{
    /* Third argument: page frame number of the first page to map. */
    if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                        vma->vm_end - vma->vm_start,
                        vma->vm_page_prot))
        return -EAGAIN;

    vma->vm_ops = &simple_remap_vm_ops;
    simple_vma_open(vma);
    return 0;
}
But I want to know where exactly this code specifies which portion of kernel memory to map. For example, if I have allocated a buffer in the kernel using kmalloc, pointed to by kptr, how can I mmap that buffer to user space? Thank you.
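In the sample, the region is selected by the third argument of remap_pfn_range, a page frame number. The sample passes vma->vm_pgoff there, which is the offset the caller gave to mmap(), expressed in pages, so user space chooses the region. For a driver-owned buffer one would pass the buffer's own page frame number instead; in kernel code this is commonly derived as virt_to_phys(kptr) >> PAGE_SHIFT, which is stated here as an assumption about the intended setup rather than a tested driver. The user-space sketch below shows only the arithmetic; the helper names and the 4 KiB page size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u  /* assumption: 4 KiB pages */
#define SKETCH_PAGE_MASK  ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)

/* Page frame number of the page that contains a physical address. */
uint64_t phys_to_pfn(uint64_t phys_addr)
{
    return phys_addr >> SKETCH_PAGE_SHIFT;
}

/*
 * Value that ends up in vma->vm_pgoff for a given mmap() offset:
 * the byte offset converted to a page count.
 */
uint64_t mmap_offset_to_pgoff(uint64_t offset)
{
    assert((offset & SKETCH_PAGE_MASK) == 0);  /* mmap offsets are page aligned */
    return offset >> SKETCH_PAGE_SHIFT;
}
```

Because the mapping works in whole pages, a buffer that does not start on a page boundary also exposes whatever precedes it in its first page, so page-aligned, page-sized allocations are the safer choice for memory that will be mapped to user space.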

Linux kernel module - accessing memory mapping

I'm running into an odd issue on kernel module load that I suspect has to do with linking and loading. How do I programmatically figure out the address of each section after it is loaded in memory (from inside the module itself)? For example, where are .bss, .data, .text and so on?
From reading this article
https://lwn.net/Articles/90913/
It is sort of in the direction I'm looking for.
You can see the section start addresses from userspace like this (root permissions needed):
sudo cat /sys/module/<modulename>/sections/.text
I have looked at how sysfs retrieves these addresses, and I found the following.
There is a section attributes field in struct module:
/* Section attributes */
struct module_sect_attrs *sect_attrs;
This structure holds an array of per-section attribute structs:
struct module_sect_attrs {
    struct attribute_group grp;
    unsigned int nsections;
    struct module_sect_attr attrs[0];
};
where sect attr is the thing you are looking for
struct module_sect_attr {
    struct module_attribute mattr;
    char *name;
    unsigned long address;
};
Within the module's code, the THIS_MODULE macro is a pointer to the module's struct module object. Its module_init and module_core fields point to the memory regions where all module sections are loaded.
As I understand it, the section layout is not accessible from the module code (struct load_info is dropped after the module is loaded into memory). But given the module's file, you can easily deduce the section addresses after load:
module_init:
- init sections with code (.init.text)
- init sections with readonly data
- init sections with writable data
module_core:
- sections with code (.text)
- sections with readonly data
- sections with writable data
If several sections fall into one category, they are placed in the same order as in the module's file.
Within the module's code you can also print the address of any of its symbols, and from that calculate the start of the section that contains the symbol.
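The last step, going from a symbol address to the section that contains it, can be sketched in user space once the section start addresses are known (for instance from the sysfs files mentioned earlier). `struct sect_info` and `section_of` are hypothetical names and the addresses in the usage note are made up. The lookup assumes only start addresses are available, not sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: a section name and its load address. */
struct sect_info {
    const char   *name;
    unsigned long address;
};

/*
 * Return the section with the highest start address that is not above
 * sym_addr, or NULL if sym_addr lies below every section. Without section
 * sizes, an address past the end of the last section is still attributed
 * to that last section.
 */
const struct sect_info *section_of(const struct sect_info *sects, size_t n,
                                   unsigned long sym_addr)
{
    const struct sect_info *best = NULL;
    size_t i;

    assert(sects != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (sects[i].address <= sym_addr &&
            (best == NULL || sects[i].address > best->address))
            best = &sects[i];
    }
    return best;
}
```

With sections starting at 0x1000 (.text), 0x3000 (.rodata) and 0x5000 (.data), a symbol at 0x34ff resolves to .rodata. The table does not need to be sorted.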
While this question is five years old, I thought I would contribute my two-cents. I was able to access the kernel's sections in a sort of hack-y way inspired by Alex Hoppus' answer. I don't advocate doing things this way, unless you are writing the kernel module to debug things or understand the kernel etc.
Anyway, I copy the following two structs into my module to help resolve incomplete types.
struct module_sect_attr {
    struct module_attribute mattr;
    char *name;
    unsigned long address;
};

struct module_sect_attrs {
    struct attribute_group grp;
    unsigned int nsections;
    struct module_sect_attr attrs[0];
};
Then, in my module initialization function, I do the following to get the section addresses.
unsigned long text = 0;
unsigned int nsections = 0;
unsigned int i;
struct module_sect_attr *sect_attr;

nsections = THIS_MODULE->sect_attrs->nsections;
sect_attr = THIS_MODULE->sect_attrs->attrs;
for (i = 0; i < nsections; i++) {
    if (strcmp((sect_attr + i)->name, ".text") == 0)
        text = (sect_attr + i)->address;
}
Finally, it should be noted that if you are looking for the address of .rodata, .bss, or .data you will need to define constant global variables, uninitialized global variables, or regular global variables, respectively, if you don't want those sections to be omitted.

sysfs: free to use struct device platform_data field?

Summary: is the platform_data field of struct device free to use in a device driver module?
I am creating a very simple sysfs entry for my character device driver module to allow me to control an internal variable (because I know using ioctl() and the proc filesystem are deprecated.) I call class_create() to make a class in /sys/class/ and then device_create() to make a new device entry. Then I call device_create_file() to set up my load and store routines for the driver. I want to lock my driver in these routines. I have a mutex in my driver's main structure. Can I use the platform_data field to store a pointer to this structure like I would the private_data field of struct file in the module's open() routine or is this reserved? It's set to NULL after device_create so it would appear OK but I don't know for sure.
What I'd like to do is:
struct mymodule mymod; // main module structure, has a mutex called lockmx

static ssize_t mydev_store_val(struct device *dev,
                               struct device_attribute *attr,
                               const char *buf, size_t count)
{
    struct mymodule *mymodp = (struct mymodule *)dev->platform_data;

    if (mutex_lock_interruptible(&mymodp->lockmx))
        return -ERESTARTSYS; // interrupted; returning 0 would report "0 bytes written"
    // get data from buf
    mutex_unlock(&mymodp->lockmx);
    return count;
}
DEVICE_ATTR(mydeva, S_IWUSR | S_IRUGO, NULL, mydev_store_val);

static int __init modinit(void)
{
    ...
    dev_t dev; // alloc'ed already
    myclass = class_create(THIS_MODULE, "myclass");
    mydev = device_create(myclass, NULL, dev, NULL, "mydev");
    mydev->platform_data = &mymod;
    device_create_file(mydev, &dev_attr_mydeva);
    ...
}
So this will create the entry /sys/class/myclass/mydev/mydeva which can be written to. If the platform_data field is available then I can avoid using globals. But if it moves under me my kernel is going to oops at best and probably panic.
Such a pointer can be stored in the drvdata field (which has been cleverly hidden so that you will not see it if you look at the definition of struct device).
Initialize it through the fourth parameter of device_create, and read it with dev_get_drvdata:
mydev = device_create(myclass, NULL, dev, &mymod, "mydev");
...
struct mymodule *mymodp = dev_get_drvdata(dev);
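The pattern in this answer, attaching a private pointer to the device at creation time and recovering it in the attribute callback, can be modelled in user space as follows. This is a mock for illustration only: `struct mock_device`, `mock_set_drvdata`, `mock_get_drvdata` and `mock_store_val` are hypothetical stand-ins, not the kernel's definitions, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device: it only carries the private pointer. */
struct mock_device {
    void *driver_data;
};

/* Hypothetical driver state; the real one would also hold the mutex. */
struct mymodule {
    long val;
};

void mock_set_drvdata(struct mock_device *dev, void *data)
{
    dev->driver_data = data;
}

void *mock_get_drvdata(const struct mock_device *dev)
{
    return dev->driver_data;
}

/*
 * Same shape as a sysfs store routine: the callback receives only the
 * device, so the driver state is recovered from it instead of a global.
 */
long mock_store_val(struct mock_device *dev, const char *buf, size_t count)
{
    struct mymodule *mymodp = mock_get_drvdata(dev);

    assert(mymodp != NULL);
    mymodp->val = strtol(buf, NULL, 10);  /* parse the written text */
    return (long)count;                   /* report all bytes as consumed */
}
```

The callback never names the global instance, so the same store routine would work unchanged for several devices, each with its own state.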

Resources