How to get device from cdev - linux-kernel

I am writing a kernel module that will allocate some coherent memory and return the corresponding virtual and physical addresses.
I register the device as a cdev, allocate the buffer with dma_alloc_coherent(), and want to mmap it to userspace using dma_common_mmap().
dma_common_mmap() requires a pointer to struct device: how can I obtain it?

void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flag);
This function handles both the allocation and the mapping of the buffer. The first two arguments are the device structure and the size of the buffer needed.
The function returns the result of the DMA mapping in two places. The return value from the function is a kernel virtual address for the buffer, which may be used by the driver.
The associated bus address, meanwhile, is returned in dma_handle.

Take a look at
int cdev_device_add(struct cdev *cdev, struct device *dev)
You can find a working example at
linux-source/drivers/gpio/gpiolib.c

Related

Userspace virtual address to kernel virtual address translation

I have a userspace process that allocates memory from huge pages. This memory needs to be shared with kernel-space threads and work queues. To do so, I'm using an ioctl to register the process memory.
In IOCTL:
take the struct ring address from userspace and vmap() its pages.
for memory blocks that can show up in the ring, I call pin_user_pages() with the FOLL_LONGTERM flag.
And here are the questions I have:
If I vmap() the struct ring pages, can I (should I) unpin the user pages backing them?
Should I vmap() the buffers that are going to be placed in the ring, or is pinning the user pages enough?
How do I translate a userspace address (remote to the kthread) into a kernel address that can be accessed from the kernel?
For the second and third questions:
So far I have analyzed the code of the pin_user_pages_remote() function. It ends up in __get_user_pages_locked(), which contains this piece of code:
pages[i] = virt_to_page((void *)start);
if (pages[i])
get_page(pages[i]);
Does this mean that when pages are pinned in ioctl then in kthread I can:
u64 offset = user_address & (PAGE_SIZE - 1);
struct page *page = virt_to_page(user_address);
u64 phys = page_to_phys(page);
void *kva = phys_to_virt(phys) + offset;
Does the memory need to be vmap()ed for this trick to work?
Or is there a much simpler method?

Get PFN from DMA address (dma_addr_t)?

I would like to get the PFN associated with a memory block allocated with dma_alloc_coherent for use with a PCIe device as shown below:
unsigned long pfn;
buffer = dma_alloc_coherent(&pcie->dev, size, &bus_addr, GFP_KERNEL);
// Get PFN?
pfn = virt_to_phys(buffer) >> PAGE_SHIFT;
I'm aware that this is probably not the correct method, but it seems to work... I'm just looking for the right solution to translate the potential bus address (since I do not know if there is an IOMMU) to a PFN. Thanks in advance.
Note: there seems to be an ARM-specific function in the kernel called dma_to_pfn, which looks like exactly what I need, but I need it for x86.
What you're doing is indeed wrong. From the man page for virt_to_phys():
This function does not give bus mappings for DMA transfers. In almost all conceivable cases a device driver should not be using this function.
The equivalent function for DMA addresses is dma_to_phys(), defined in include/linux/dma-direct.h as follows:
phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr);
Therefore you can do:
dma_to_phys(&pcie->dev, bus_addr) >> PAGE_SHIFT;
Notice that I am using the bus_addr returned by dma_alloc_coherent(), not buffer, since you obviously need to pass a DMA address (dma_addr_t) to this function, not a virtual address.
There also seems to be a macro PHYS_PFN() defined in include/linux/pfn.h to get the PFN for a given physical address, if you prefer to use that:
PHYS_PFN(dma_to_phys(&pcie->dev, bus_addr));

mmap query on linux platform

On a Linux machine, I am trying to write a driver and map some kernel memory into an application's address space for performance gains.
Looking at driver mmap implementations online, I found several varieties.
As per the man pages, mmap creates a new mapping in the virtual address space of the calling process.
1) Who allocates the physical memory during the mmap call: the kernel or the device driver?
I have seen the following varieties of driver mmap implementations.
a) the driver allocates physically contiguous kernel memory and maps it into the process address space.
static int driver_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long start = vma->vm_start;
    unsigned long size = vma->vm_end - vma->vm_start;
    char *pos = kmalloc(size, GFP_KERNEL); /* allocate physically contiguous memory */

    if (!pos)
        return -ENOMEM;
    while (size > 0) {
        unsigned long pfn = virt_to_phys(pos) >> PAGE_SHIFT; /* get the page frame number */
        if (remap_pfn_range(vma, start, pfn, PAGE_SIZE, PAGE_SHARED)) /* create the mapping */
            return -EAGAIN;
        start += PAGE_SIZE;
        pos += PAGE_SIZE;
        size -= PAGE_SIZE;
    }
    return 0;
}
b) the driver allocates virtually contiguous kernel memory and maps it into the process address space.
static struct vm_operations_struct dr_vm_ops = {
    .open = dr_vma_open,
    .close = dr_vma_close,
};

static int driver_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;
    char *kp = vmalloc(size); /* allocate virtually contiguous memory */
    unsigned long up;
    int err;

    if (!kp)
        return -ENOMEM;
    for (up = vma->vm_start; up < vma->vm_end; up += PAGE_SIZE) {
        struct page *page = vmalloc_to_page(kp); /* find the physical page behind the virtual address */
        err = vm_insert_page(vma, up, page);     /* how is this different from remap_pfn_range? */
        if (err)
            break;
        kp += PAGE_SIZE;
    }
    vma->vm_ops = &dr_vm_ops;
    dr_vma_open(vma);
    return 0;
}
c) not sure who allocates the memory in this case.
static int driver_mmap(struct file *filp, struct vm_area_struct *vma)
{
    if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                        vma->vm_end - vma->vm_start,
                        vma->vm_page_prot)) /* create the mapping */
        return -EAGAIN;
    return 0;
}
2) If the kernel allocates the memory for mmap, isn't memory wasted in cases a & b?
3) remap_pfn_range maps multiple pages, whereas vm_insert_page maps just a single page. Is that the only difference between these two APIs?
Thank You,
Gopinath.
Which you use depends on what you're trying to accomplish.
(1) A device driver is part of the kernel so it doesn't really make sense to differentiate that way. For these cases, the device driver is asking for memory to be allocated for its own use from the (physical) memory resources available to the entire kernel.
With (a), a physically contiguous space is being allocated. You might do this if there is some piece of external hardware (a PCI device, for example) that will be reading or writing that memory. The return value from kmalloc already has a mapping to kernel virtual address space. remap_pfn_range is being used to map the page into the user virtual address space of the current process as well.
For (b), a virtually contiguous space is being allocated. If there is no external hardware involved, this is what you would typically use. There is still physical memory being allocated to your driver, but it isn't guaranteed that the pages are physically contiguous -- hence fewer constraints on which pages can be allocated. (They will still be contiguous in kernel virtual address space.) And then you are simply using a different API to implement the same kind of mapping into user virtual address space.
For (c), the memory being mapped is allocated under control of some other subsystem. The vm_pgoff field has already been set to the base physical address of the resource. For example, the memory might correspond to a PCI device's address region (a network interface controller's registers, say) where that physical address is determined/assigned by your BIOS (or whatever mechanism your machine uses).
(2) Not sure I understand this question. How can the memory be "wasted" if it's being used by the device driver and a cooperating user process? And if the kernel needs to read and write the memory, there must be kernel virtual address space allocated and it needs to be mapped to the underlying physical memory. Likewise, if the user space process is to access the memory, there must be user virtual address space allocated and that must be mapped to the physical memory as well.
"Allocating virtual address space" essentially just means allocating page table entries for the memory. That is done separately from actually allocating the physical memory. And it's done separately for kernel space and user space. And "mapping" means setting the page table entry (the virtual address of the beginning of the page) to point to the correct physical page address.
(3) Yes. They are different APIs that accomplish much the same thing. Sometimes you have a struct page, sometimes you have a pfn. It can be confusing: there are often several ways to accomplish the same thing. Developers typically use the one most obvious for the item they already have ("I already have a struct page. I could calculate its pfn. But why do that when there's this other API that accepts a struct page?").

What does actually cdev_add() do? in terms of registering a device to the kernel

What does cdev_add() actually do? I'm asking in terms of registering a device with the kernel.
Does it add the pointer to the cdev structure in some map that is indexed by major and minor number? What exactly happens when you say the device is added/registered with the kernel? I want to know what steps cdev_add() takes to register the device in the running kernel. We create a node for user space using the mknod command; even this command uses the major and minor numbers. Does registration do something similar?
cdev_add registers a character device with the kernel. The kernel maintains a list of character devices under cdev_map:
static struct kobj_map *cdev_map;
kobj_map is basically an array of probes, which in this case is the list of character devices:
struct kobj_map {
    struct probe {
        struct probe *next;
        dev_t dev;
        unsigned long range;
        struct module *owner;
        kobj_probe_t *get;
        int (*lock)(dev_t, void *);
        void *data;
    } *probes[255];
    struct mutex *lock;
};
You can see that each entry in the list has the major and minor numbers for the device (dev_t dev), and a way to get at the device structure (via kobj_probe_t, which returns a kernel object; in this case it represents a cdev). cdev_add adds your character device to the probes list:
int cdev_add(struct cdev *p, dev_t dev, unsigned count)
{
    ...
    error = kobj_map(cdev_map, dev, count, NULL,
                     exact_match, exact_lock, p);
When you do an open on a device from a process, the kernel finds the inode associated with the filename of your device (via the namei function). The inode has the major and minor numbers for the device (dev_t i_rdev) and flags (i_mode) indicating that it is a special (character) device. With these it can access the cdev list I explained above and get the cdev structure instantiated for your device. From there it can create a struct file with the file operations of your cdev, and install a file descriptor in the process's file descriptor table.
This is what actually 'registering' a character device means and why it needs to be done. Registering a block device is similar. The kernel maintains another list for registered gendisks.
You can read Linux Device Drivers. It is a little bit old, but the main ideas are the same. It is difficult to explain a simple-looking operation like cdev_add() and all the machinery around it in a few lines.
I suggest you read the book and the source code. If you have trouble navigating the source code, you can use a tag system like etags + Emacs, or the Eclipse indexer.
Please see the code comments here:
/**
 * cdev_add() - add a char device to the system
 * @p: the cdev structure for the device
 * @dev: the first device number for which this device is responsible
 * @count: the number of consecutive minor numbers corresponding to this device
 *
 * cdev_add() adds the device represented by @p to the system, making it
 * live immediately. A negative error code is returned on failure.
 */
The immediate answer to any such question is: read the code. That's what Linus says.
[edit]
cdev_add() basically adds the device to the system. Essentially, this means that after the cdev_add() operation your new device gets visibility through the /sys filesystem. The function does all the necessary housekeeping related to that; in particular, the kobj reference to your device gets inserted at its position in the object hierarchy. If you want more information, I would suggest some reading around sysfs and struct kobj.

How to get a struct page from any address in the Linux kernel

I have existing code that takes a list of struct page * and builds a descriptor table to share memory with a device. The upper layer of that code currently expects a buffer allocated with vmalloc or from user space, and uses vmalloc_to_page to obtain the corresponding struct page *.
Now the upper layer needs to cope with all kinds of memory, not just memory obtained through vmalloc. This could be a buffer obtained with kmalloc, a pointer inside the stack of a kernel thread, or other cases that I'm not aware of. The only guarantee I have is that the caller of this upper layer must ensure that the memory buffer in question is mapped in kernel space at that point (i.e. it is valid to access buffer[i] for all 0<=i<size at this point). How do I obtain a struct page* corresponding to an arbitrary pointer?
Putting it in pseudo-code, I have this:
lower_layer(struct page*);
upper_layer(void *buffer, size_t size) {
for (addr = buffer & PAGE_MASK; addr < buffer + size; addr += PAGE_SIZE) {
struct page *pg = vmalloc_to_page(addr);
lower_layer(pg);
}
}
and I now need to change upper_layer to cope with any valid buffer (without changing lower_layer).
I've found virt_to_page, which Linux Device Drivers indicates operates on “a logical address, [not] memory from vmalloc or high memory”. Furthermore, is_vmalloc_addr tests whether an address comes from vmalloc, and virt_addr_valid tests if an address is a valid virtual address (fodder for virt_to_page; this includes kmalloc(GFP_KERNEL) and kernel stacks). What about other cases: global buffers, high memory (it'll come one day, though I can ignore it for now), possibly other kinds that I'm not aware of? So I could reformulate my question as:
What are all the kinds of memory zones in the kernel?
How do I tell them apart?
How do I obtain page mapping information for each of them?
If it matters, the code is running on ARM (with an MMU), and the kernel version is at least 2.6.26.
I guess what you want is a page table walk, something like (warning, not actual code, locking missing etc):
struct mm_struct *mm = current->mm;
pgd = pgd_offset(mm, address);
pmd = pmd_offset(pgd, address);
pte = *pte_offset_map(pmd, address);
page = pte_page(pte);
But you should be very careful with this: the kmalloc address you got might very well not be page-aligned, for example. This sounds like a very dangerous API to me.
Mapping Addresses to a struct page
There is a requirement for Linux to have a fast method of mapping virtual addresses to physical addresses and for mapping struct pages to their physical address. Linux achieves this by knowing where, in both virtual and physical memory, the global mem_map array is, because that array has pointers to all struct pages representing physical memory in the system. All architectures achieve this with very similar mechanisms, but, for illustration purposes, we will only examine the x86 carefully.
Mapping Physical to Virtual Kernel Addresses
Any virtual address in the kernel's linear (direct) mapping can be translated to its physical address by simply subtracting PAGE_OFFSET, which is essentially what the function virt_to_phys() with the macro __pa() does:
/* from <asm-i386/page.h> */
#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET)

/* from <asm-i386/io.h> */
static inline unsigned long virt_to_phys(volatile void *address)
{
        return __pa(address);
}
Obviously, the reverse operation involves simply adding PAGE_OFFSET, which is carried out by the function phys_to_virt() with the macro __va(). Next we see how this helps the mapping of struct pages to physical addresses.
There is one exception where virt_to_phys() cannot be used to convert virtual addresses to physical ones. Specifically, on the PPC and ARM architectures, virt_to_phys() cannot be used to convert addresses that have been returned by the function consistent_alloc(). consistent_alloc() is used on PPC and ARM architectures to return non-cached memory for use with DMA.
What are all the kinds of memory zones in the kernel? <---see here
For user-space allocated memory, you want to use get_user_pages, which will give you the list of pages associated with the malloc'd memory, and also increment their reference counter (you'll need to call page_cache_release on each page once done with them.)
For vmalloc'd pages, vmalloc_to_page is your friend, and I don't think you need to do anything.
For 64-bit architectures, the answer of gby should be adapted to:
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
struct page *page = NULL;
void *kernel_address;

pgd = pgd_offset(mm, address);
pud = pud_offset(pgd, address);
pmd = pmd_offset(pud, address);
pte = pte_offset_map(pmd, address);
page = pte_page(*pte);
pte_unmap(pte);

/* mapping into kernel memory: */
kernel_address = kmap(page);
/* work with kernel_address... */
kunmap(page);
You could try virt_to_page. I am not sure it is what you want, but at least it is somewhere to start looking.
