How can my PCI device driver remap PCI memory to userspace?

I am trying to implement a PCI device driver for a virtual PCI device on QEMU. The device defines a BAR region as RAM, and the driver can ioremap() this region and access it without any issues. The next step is to expose this region (or a fraction of it) to a user application.
To do this, I have also implemented an .mmap function as part of my driver's file operations. This mmap simply calls remap_pfn_range(), passing the PFN derived from the pointer returned by the earlier ioremap().
However, upon running the user space application, the mmap is successful, but when the app tries to access the memory, it gets killed and I get the following dmesg errors.
"
a.out: Corrupted page table at address 7f66248b8000
..Some page table info..
Bad pagetable: 000f [#2] SMP NOPTI
..and the core dump..
"
Does anyone know what I have done wrong? Did I miss a step? Or could it be an error specific to QEMU?
I am running x86_softmmu as my QEMU configuration, and my kernel is 4.14.

I've solved this issue and managed to map PCI memory to user space via the driver. As #IanAbbott implied, I changed the pfn argument of the remap_pfn_range() call in my custom ->mmap().
The original was:
io_remap_pfn_range(vma, vma->vm_start, pfn, vma->vm_end - vma->vm_start, vma->vm_page_prot);
where pfn was derived from the virtual address returned by ioremap(). I changed the pfn to:
pfn = pci_resource_start(pdev, BAR) >> PAGE_SHIFT;
That points to the actual physical start address of the BAR. My working call is now:
io_remap_pfn_range(vma, vma->vm_start, pci_resource_start(pdev, BAR) >> PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot);
I confirmed that it works by doing some dummy writes through the buffer pointer in my driver, then reading those values back and doing some writes from my user space application.
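For context, here is a minimal sketch of what the complete ->mmap handler could look like (the BAR number, the global pdev, and the size check are illustrative assumptions, not the poster's actual code):

#include <linux/pci.h>
#include <linux/mm.h>

#define BAR_NR 0                      /* assumed BAR index */

static struct pci_dev *pdev;          /* assumed: saved in the probe function */

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long vsize = vma->vm_end - vma->vm_start;
    unsigned long pfn = pci_resource_start(pdev, BAR_NR) >> PAGE_SHIFT;

    /* Refuse mappings larger than the BAR itself. */
    if (vsize > pci_resource_len(pdev, BAR_NR))
        return -EINVAL;

    /* Device memory should generally not be cached by the CPU. */
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    return io_remap_pfn_range(vma, vma->vm_start, pfn, vsize,
                              vma->vm_page_prot);
}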

Related

Get PFN from DMA address (dma_addr_t)?

I would like to get the PFN associated with a memory block allocated with dma_alloc_coherent for use with a PCIe device as shown below:
unsigned long pfn;
void *buffer = dma_alloc_coherent(&pcie->dev, size, &bus_addr, GFP_KERNEL);
// Get PFN?
pfn = virt_to_phys(buffer) >> PAGE_SHIFT;
I'm aware that this is probably not the correct method, but it seems to work... I'm just looking for the right way to translate the bus address (since I do not know whether there is an IOMMU) to a PFN. Thanks in advance.
Note: there seems to be an ARM-specific kernel function called dma_to_pfn(), which looks like exactly what I need, but I need an equivalent for x86.
What you're doing is indeed wrong. From the man page for virt_to_phys():
This function does not give bus mappings for DMA transfers. In almost all conceivable cases a device driver should not be using this function.
The equivalent function for DMA addresses is dma_to_phys(), defined in include/linux/dma-direct.h as follows:
phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr);
Therefore you can do:
dma_to_phys(&pcie->dev, bus_addr) >> PAGE_SHIFT;
Notice that I am using the bus_addr returned by dma_alloc_coherent(), not buffer, since you obviously need to pass a DMA address (dma_addr_t) to this function, not a virtual address.
There also seems to be a macro PHYS_PFN() defined in include/linux/pfn.h to get the PFN for a given physical address, if you prefer to use that:
PHYS_PFN(dma_to_phys(&pcie->dev, bus_addr));
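Putting the pieces together, a minimal sketch (the device pointer, buffer size, and pr_info() reporting are assumptions for illustration, not a definitive implementation):

#include <linux/dma-mapping.h>
#include <linux/dma-direct.h>
#include <linux/pfn.h>

static int alloc_and_report_pfn(struct device *dev, size_t size)
{
    dma_addr_t bus_addr;
    void *buffer;
    unsigned long pfn;

    buffer = dma_alloc_coherent(dev, size, &bus_addr, GFP_KERNEL);
    if (!buffer)
        return -ENOMEM;

    /* Translate the DMA (bus) address back to a CPU physical address,
       then convert it to a page frame number. */
    pfn = PHYS_PFN(dma_to_phys(dev, bus_addr));
    pr_info("coherent buffer pfn = 0x%lx\n", pfn);

    return 0;
}

Note that dma-direct.h is an internal header and dma_to_phys() assumes direct-mapped DMA; if an IOMMU is translating the bus address, the result may not be meaningful.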

io_remap_pfn_range issue on powerpc

I'd like to access PCIe IO from userland.
In the module driver, I'm able to write/read using the pointer returned by ioremap() without any problem.
From userland, I want to use the pointer returned by mmap(), but the host hangs whenever I write or read on the PCIe bus.
I implemented the mmap call in the file operations structure, which calls io_remap_pfn_range(vma, vma->vm_start, start >> PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot); where start is the value returned by pci_resource_start().
What did I miss?
Note that my module works fine on x86.
Thanks,
Fred
The POWER architecture does not support PCIe IO accesses; you'll need to use PCIe memory cycles instead. It's likely that your PCIe device has a corresponding resource for MMIO space; perhaps you can use that.
Also, depending on your usage, you may want to perform accesses on the resource<N> files in sysfs, under /sys/bus/pci/devices/<id>. This might mean that you don't need any kernel code at all.
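For the sysfs route, a hedged userspace sketch (the device address 0000:01:00.0, resource index, and mapping length are assumptions; this only works for memory BARs, and typically requires root):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* resource0 corresponds to BAR0; adjust the path for your device. */
    int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0",
                  O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    size_t len = 4096;                 /* must not exceed the BAR size */
    volatile uint32_t *bar = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }

    printf("reg0 = 0x%x\n", bar[0]);   /* 32-bit MMIO read at offset 0 */

    munmap((void *)bar, len);
    close(fd);
    return 0;
}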

PCIe driver error for enabling device and allocating memory

I'm using the PCIe bus on a Freescale MPC8308 (as root complex), and the endpoint device is an ASIC with just one 256 MB memory region and just one BAR register. The device configuration space registers are readily accessible through the "pciutils" package. At first I tried to access the memory region using mmap(), but it didn't work. So as the next step, I prepared a device driver for the PCIe endpoint device, a kernel module that I load after Linux boots.
In my driver the endpoint device is matched against the device ID table, but when I try to enable the device with pci_enable_device(), I see this error:
driver-pci 0000:00:00.0: device not available because of BAR 0 [0x000000-0xfffffff] collisions
Also, when I try to claim the memory region for the PCIe device using pci_request_region(), it fails.
Here is the part of the driver code that is not working:
pci_enable_result = pci_enable_device(pdev);
if (pci_enable_result) {
    printk(KERN_INFO "PCI enable encountered a problem\n");
    return pci_enable_result;
} else {
    printk(KERN_INFO "PCI enable was successful\n");
}
And here is the result in dmesg:
driver-pci 0000:00:00.0: device not available because of BAR 0 [0x000000-0xfffffff] collisions
PCI enable encountered a problem
driver-pci: probe of 0000:00:00.0 failed with error -22
It is worth noting that in the driver I can read and write configuration registers correctly by using functions like pci_read_config_dword() and pci_write_config_dword().
What do you think the problem is? Is it possible that it appears because the kernel initializes the device before my kernel module loads? What should I do to prevent this?
BAR regions are generally small. Your 256 MB BAR0 seems to be too large. Try with less memory (less than 1 MB); it should work.

memory map a device file in linux kernel

In the Linux kernel, when I run
cat /proc/pid/maps
I get some entries that map files in /dev/XXX. I understand these are device files, which correspond to hardware devices rather than actual files. How does memory management in the Linux kernel handle such mappings? What happens if I read from or write to /dev/XXX?
This is what I believe is correct: when a device maps a particular memory region, say X to Y, a vm_area_struct (vm_start = X and vm_end = Y) is associated with that region and mapped into the process's virtual memory map. That vm_area_struct is then associated with a vm_operations_struct supplied by the device driver.
That means a device driver will implement some or all of the functions in the vm_operations_struct, the most important one being the fault function.
Now, assume the process references a page in that area for the first time. That will trigger the page fault handler, which contains this code:
3646 if (pte_none(entry)) {
3647 if (vma->vm_ops) {
3648 if (likely(vma->vm_ops->fault))
3649 return do_linear_fault(mm, vma, address,
3650 pte, pmd, flags, entry);
3651 }
3652 return do_anonymous_page(mm, vma, address,
3653 pte, pmd, flags);
3654 }
Notice line #3648. It checks if the vm_operations_struct is implemented for that vm_area_struct. If it is, it will call the "fault" member function. Look at some of the implementations for that function.
The implementation of this function supplies a pointer to a page struct (via the vm_fault structure). The page itself will be allocated and populated with data by the device driver, i.e. the "fault" function.
The rest of the page handler will associate a pte entry with that page. That means the next time the page is accessed and causes a page fault the check pte_none on line #3646 will fail. This will result in skipping the device driver fault function the second time that particular page is referenced.
Note: a vm_area_struct may be backed by more than one page struct. With a 4 KB PAGE_SIZE, a single vm_area_struct can cover N * 4 KB for any integer N >= 1.
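To make this concrete, here is a hedged sketch of a driver-supplied fault handler, using the older ->fault signature that matches the quoted page fault code (struct my_dev, its pages array, and all names are illustrative assumptions):

#include <linux/mm.h>

struct my_dev {
    struct page **pages;        /* pages backing the mapping */
    unsigned long num_pages;
};

static int my_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
    struct my_dev *dev = vma->vm_private_data;
    struct page *page;

    if (vmf->pgoff >= dev->num_pages)
        return VM_FAULT_SIGBUS;

    /* Look up (or allocate and populate) the page for this offset. */
    page = dev->pages[vmf->pgoff];
    if (!page)
        return VM_FAULT_SIGBUS;

    get_page(page);             /* take a reference for this mapping */
    vmf->page = page;           /* the generic handler installs the PTE */
    return 0;
}

static const struct vm_operations_struct my_vm_ops = {
    .fault = my_fault,
};

The driver's mmap function would set vma->vm_ops = &my_vm_ops (and vma->vm_private_data) so that this handler runs on the first access to each page.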

User space mmap and driver space mmap point to different addresses..?

[I am a newbie to device driver programming, so please be patient.]
I am writing a character device driver, and I am trying to mmap some portion of the allocated memory in the driver to the user space.
In the init_module() function, I allocate some buffer space like this:
buf = (char*)vmalloc_user(SIZE_OF_BUFFER);
buf now points to some address.
Now, in the driver's mmap function, I set the VM_RESERVED flag, and call
remap_vmalloc_range(vma, (void*)buf, 0);
Then I create a character device file in /dev with the correct major number.
Now I create a simple program in the user space to open the character device file, then call mmap() and read data from this mmap'ed memory.
In the call to mmap() in user space, I know there is an option to pass a desired start address for the mapping. But is there a way to make the user space mmap() return the same address as buf in the driver?
I think that because the address of buf in the driver is different from the one returned by mmap() in user space, my user space program ends up reading junk values. Is there any way to solve this problem other than hard-coding the address in the user space mmap() call?
You pretty much have to design your driver interface so that the userspace map address doesn't matter. This means, for example, not storing pointers in an mmap region that's accessed outside of a single userspace process.
Typically, you'd store offsets from the base mapped address instead of full pointers. The kernel driver and userspace code can both add these offsets to their base pointers, and get to the virtual address that's right for their respective contexts.
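For example, a minimal sketch of the offset convention (the structure layout and names are purely illustrative):

#include <stdint.h>

/* Lives at offset 0 of the shared mapping; both sides agree on it. */
struct shared_hdr {
    uint32_t data_off;          /* payload offset from the mapping base */
    uint32_t data_len;
};

/* Each side resolves offsets against its own base address. */
static inline void *resolve(void *base, uint32_t off)
{
    return (char *)base + off;
}

/* Userspace usage:
 *   struct shared_hdr *hdr = mmap_base;
 *   char *payload = resolve(mmap_base, hdr->data_off);
 * The kernel driver does the same arithmetic against buf, so the
 * differing virtual addresses never matter. */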
