Linux driver DMA transfer to a PCIe card with PC as master - linux-kernel

I am working on a DMA routine to transfer data from the PC to an FPGA on a PCIe card. I have read DMA-API.txt and LDD3 ch. 15 for details. However, I could not figure out how to do a DMA transfer from the PC to a consistent block of iomem on the PCIe card. The dad sample driver for PCI in LDD3 maps a buffer and then tells the card to do the DMA transfer, but I need the PC to do this.
What I already found out:
Request bus master
pci_set_master(pdev);
Set the DMA mask
if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(32))) {
        dev_err(&pdev->dev, "No suitable DMA available.\n");
        goto cleanup;
}
Request a DMA channel
if (request_dma(dmachannel, DRIVER_NAME)) {
        dev_err(&pdev->dev, "Could not reserve DMA channel %d.\n", dmachannel);
        goto cleanup;
}
Map a buffer for DMA transfer
dma_handle = pci_map_single(pci_dev, buffer, count, DMA_TO_DEVICE);
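Taken together, the steps above fit into a probe routine roughly as follows. This is only a minimal sketch: buffer and count are assumed to be set up elsewhere in the driver, and request_dma() is omitted because it programs the legacy ISA DMA controller (see the answers below).

#include <linux/pci.h>
#include <linux/dma-mapping.h>

extern void *buffer;    /* assumed to exist elsewhere in the driver */
extern size_t count;

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        dma_addr_t dma_handle;
        int err;

        err = pci_enable_device(pdev);
        if (err)
                return err;

        pci_set_master(pdev);   /* enable bus mastering for the card */

        if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(32))) {
                dev_err(&pdev->dev, "No suitable DMA available.\n");
                pci_disable_device(pdev);
                return -ENODEV;
        }

        /* Map an existing kernel buffer for a host-to-card transfer. */
        dma_handle = pci_map_single(pdev, buffer, count, DMA_TO_DEVICE);
        if (pci_dma_mapping_error(pdev, dma_handle)) {
                pci_disable_device(pdev);
                return -ENOMEM;
        }

        return 0;
}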
Question:
What do I have to do in order to let the PC perform the DMA transfer instead of the card?
Thank you for your help!
First of all, thank you for your replies. Maybe I should put my questions more precisely:
In my understanding, the PC has to have a DMA controller. How do I access this DMA controller to start a transfer to a memory-mapped I/O region in the PCIe card?
Our specification demands that the PC's DMA controller initiate the transfer. However, I could only find examples where the device does the DMA job (DMA_mapping.txt, LDD3 ch. 15). Is there a reason why nobody uses the PC's DMA controller (it still has DMA channels, though)? Would it be better to request a specification change for our project?
Thanks for your patience.

Look up DMA_mapping.txt. There's a long section in there that tells you how to set the direction ('DMA direction', line 408).
EDIT
Ok, since you edited your question... your specification is wrong. You could set up the system DMA controller, but it would be pointless, because it's too slow, as I said in the comments. Read this thread.
You must change your FPGA to support bus mastering. I do this for a living - contact me off-thread if you want to sub-contract.

What you are talking about is not really DMA. DMA is when your device accesses memory without the CPU being involved (with the exception of the PC's memory controller, which these days is usually embedded in the CPU). Not all devices can do it, and if you are using an FPGA, then you surely need some sort of DMA controller in your design (e.g. the Expresso DMA Core or similar). In your case, you just have to write to the mapped memory region (i.e. the one you obtain with ioremap_nocache) using iowrite calls (e.g. iowrite32), followed by write memory barriers, wmb(). Which BAR and offset you have to write to depends entirely on your device.
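As a rough sketch, writing to the card with programmed I/O then looks like this (a minimal example: BAR 0 and the register offset are made-up assumptions, so check your device's datasheet):

#include <linux/pci.h>
#include <linux/io.h>

#define MY_REG_OFFSET 0x10   /* made-up offset of a card register */

static int my_pio_write(struct pci_dev *pdev, u32 value)
{
        void __iomem *regs;

        /* Map BAR 0; pci_iomap() handles both MMIO and port I/O BARs. */
        regs = pci_iomap(pdev, 0, 0);
        if (!regs)
                return -ENOMEM;

        iowrite32(value, regs + MY_REG_OFFSET);  /* CPU-driven write to the card */
        wmb();                                   /* order the write before anything that follows */

        pci_iounmap(pdev, regs);
        return 0;
}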
Hope it helps. Good Luck!

Related

How can I improve SPDK performance on userspace DMA access?

I am working on a userspace PCI driver which uses the SPDK/VFIO APIs to do DMA access.
Currently, for each DMA allocation request I need to fill in the structure spdk_vfio_dma_map, then call ioctl(fd, VFIO_IOMMU_MAP_DMA, &dma_map) to map the DMA region through the IOMMU, and later call ioctl(fd, VFIO_IOMMU_UNMAP_DMA, &dma_map) to unmap it.
This is working fine so far and seems to be what the SPDK examples do. However, I am wondering whether there is a way to pre-allocate all memory buffers in userspace, so that each DMA allocation request just uses the pre-allocated memory instead of making an ioctl call each time?
Any idea is well appreciated.
I don't know if I get the issue, but the whole idea (of DPDK and SPDK) is to allocate all the memory you are going to use at application start or driver probe.
If the memory is under application control all the time, then you don't need to do VFIO_IOMMU_MAP_DMA and VFIO_IOMMU_UNMAP_DMA for every DMA transaction. If that is not the case, you have two options:
Do the VFIO_IOMMU_MAP_DMA and VFIO_IOMMU_UNMAP_DMA for every I/O.
Copy the payload into memory that is already registered with VFIO_IOMMU_MAP_DMA.
The first option is better for huge memory blocks, while the second is better for small I/O chunks. A sketch of the map-once approach is shown below.
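A minimal sketch of the map-once idea, using the raw VFIO type 1 ioctl interface rather than the SPDK wrappers (the container fd, pool size, and IOVA are assumptions you would adapt):

#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* Map one big region once at startup; sub-allocate from it afterwards. */
static void *map_dma_pool(int container_fd, size_t pool_size, unsigned long long iova)
{
        struct vfio_iommu_type1_dma_map dma_map;
        void *pool;

        pool = mmap(NULL, pool_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (pool == MAP_FAILED)
                return NULL;

        memset(&dma_map, 0, sizeof(dma_map));
        dma_map.argsz = sizeof(dma_map);
        dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        dma_map.vaddr = (unsigned long)pool;
        dma_map.iova  = iova;           /* device-visible base address of the pool */
        dma_map.size  = pool_size;

        if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &dma_map) < 0) {
                munmap(pool, pool_size);
                return NULL;
        }
        return pool;
}

Per-request buffers are then carved out of this pool, and the device-visible address of a buffer buf is simply iova + (buf - pool), with no further ioctl calls on the I/O path.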

Can I specify what dma physical address to use in Linux kernel instead of requesting one using DMA APIs?

Instead of using dma_map_single(), or kmalloc() plus dma_map_sg(), to allocate a CPU-accessible buffer and then obtain an IOMMU-mapped DMA address, can I specify a particular dma_addr_t DMA address and pass it to the kernel to use?
The reason I have to do this is that my customized hardware provides a way to calculate an IOMMU-mapped DMA address that is available for the device driver to use, but I am not sure whether I can correlate this with CPU virtual memory. dma_map_sg() and dma_map_single() work in my case, but I have no control over the DMA address they return (I would like to check a specific bit in the DMA address, and only use the address when the bit is set).
I have checked several APIs, and it looks like dma_map_sg() might be able to do so... any ideas are appreciated.
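As far as the standard DMA API goes, the caller does not get to choose the returned address; what can be done is to inspect the dma_addr_t the mapping call hands back and reject it if it doesn't meet the constraint. A sketch of that bit check (MY_REQUIRED_BIT is hypothetical):

#include <linux/bits.h>
#include <linux/dma-mapping.h>

#define MY_REQUIRED_BIT BIT_ULL(40)   /* hypothetical: the bit the hardware wants set */

static dma_addr_t map_checked(struct device *dev, void *buf, size_t len)
{
        dma_addr_t handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

        if (dma_mapping_error(dev, handle))
                return DMA_MAPPING_ERROR;

        if (!(handle & MY_REQUIRED_BIT)) {
                /* Address does not satisfy the constraint; undo the mapping. */
                dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
                return DMA_MAPPING_ERROR;
        }
        return handle;
}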

Mapping external memory device

I am using the GCC toolchain and an ARM Cortex-M0 uC. I would like to ask whether it is possible to define a space in the linker script so that read and write operations on it would call the external device driver's functions for reading and writing its space (e.g. an SPI memory). Can anyone give some hints on how to do this?
Regards, Rafal
EDIT:
Thank you for your comments and replies. My setup is:
The random-access SPI memory is connected via an SPI controller, and I use a "standard" driver to access the memory space and store/read data from it.
What I wanted to do is avoid calling the driver's functions explicitly, and instead hide them behind some fixed RAM address, so that any read of that address would call the SPI memory driver's read function and any write would call its write function (the offset from the initial address would be the address of the data in the external memory). I doubt this is at all possible on a uC without an MMU, but I think it is always worth asking someone else who might have had a similar idea.
No, this is not how it works. The Cortex-M0 has no memory management unit, and is therefore unable to intercept accesses to specific memory regions.
It's not really clear what you are trying to achieve. If you have connected SPI memory external to the chip, you have to perform all accesses through a driver; it is not possible to memory-map the SPI port abstraction.
If this is an on-device SPI memory controller, it will have two regions in the memory map. One will be the 'memory' region, which will probably behave as read-only; the other will be the control registers for the memory controller hardware, and it is these registers that the device driver talks to. In particular, to write to the SPI memory, you have to go through driver accesses.
In the extreme case (for example, the Cortex-M1 for Xilinx), there will be an eXecute In Place (XIP) peripheral for the memory-map behaviour, and an SPI master device for the read/write functionality. A GPIO pin is used to multiplex the SPI EEPROM pins between 'memory mode' and 'configuration mode'.
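Without an MMU (or even an MPU trap) to intercept loads and stores, the closest practical equivalent is a pair of accessor functions that make the driver calls look like addressed reads and writes (a sketch; spi_mem_read()/spi_mem_write() stand in for whatever your driver actually provides):

#include <stdint.h>
#include <stddef.h>

/* Hypothetical driver entry points for the external SPI memory. */
extern void spi_mem_read(uint32_t addr, void *dst, size_t len);
extern void spi_mem_write(uint32_t addr, const void *src, size_t len);

/* Address-style wrappers: ext_read32(0x100) instead of a magic RAM address. */
static inline uint32_t ext_read32(uint32_t addr)
{
        uint32_t v;
        spi_mem_read(addr, &v, sizeof(v));
        return v;
}

static inline void ext_write32(uint32_t addr, uint32_t v)
{
        spi_mem_write(addr, &v, sizeof(v));
}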

Physical Memory Allocation in Kernel

I am writing a kernel module that is going to trigger an external PCIe device to read a block of data from my internal memory. To do this, I need to send the PCIe device a pointer to the physical address of the data that I would like to send. Ultimately, this data is going to be written from userspace to the kernel with the write() function (userspace) and copy_from_user() (kernel space). As I understand it, the address that my kernel module sees is still a virtual address. I need a way to get the physical address of it so that the PCIe device can find it.
1) Can I just use mmap() from userspace and place my data at a known location in DDR memory, instead of using copy_from_user()? I do not want to accidentally overwrite another process's data in memory, though.
2) My kernel module reserves PCIe data space at initialization using ioremap_nocache(); can I do the same for this buffer, or is it a bad idea to treat this memory as I/O memory? If I can, what would happen if the memory that I try to reserve is already in use? I do not want to hard-code a static memory location and then find out that it is in use.
Thanks in advance for your help.
You don't choose a memory location and put your data there. Instead, you ask the kernel to tell you the location of your data in physical memory, and tell the board to read from that location. Each page of memory (4 KB) may be at a different physical location, so if you are sending more data than that, your device will hopefully support "scatter-gather" DMA, so that it can read a sequence of pages at different locations in memory.
The API is this: dma_map_page() returns a value of type dma_addr_t, which you can give to the board; call dma_unmap_page() when the transfer is finished. If you're doing scatter-gather, you'll put those values in the list of descriptors that you feed to the board instead. Again, if scatter-gather is supported, dma_map_sg() and friends will help with this mapping of a large buffer into a set of pages. It's still your responsibility to set up the page descriptors in the format expected by your device. A minimal single-page sketch is shown below.
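A minimal single-page version of that flow might look like this (dev and page are assumed to come from your driver; the descriptor programming is device-specific and only indicated by a comment):

#include <linux/dma-mapping.h>
#include <linux/mm.h>

static int send_page_to_board(struct device *dev, struct page *page)
{
        dma_addr_t bus_addr;

        bus_addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, bus_addr))
                return -EIO;

        /* Program bus_addr into the board's DMA descriptor here, start the
         * transfer, and wait for the board to signal completion. */

        dma_unmap_page(dev, bus_addr, PAGE_SIZE, DMA_TO_DEVICE);
        return 0;
}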
This is all very well written up in Linux Device Drivers (Chapter 15), which is required reading. http://lwn.net/images/pdf/LDD3/ch15.pdf. Some of the APIs have changed from when the book was written, but the concepts remain the same.
Finally, mmap(): sure, you can allocate a kernel buffer, mmap() it out to user space and fill it there, then dma_map that buffer for transmission to the device. This is in fact probably the cleanest way to avoid copy_from_user(); a sketch of the mmap() side follows.
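One way to wire up that mmap() path, assuming the buffer came from dma_alloc_coherent() and that cpu_buf, dma_handle, and buf_size are kept in driver state (all names here are placeholders):

#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/mm.h>

/* Assumed driver state, filled in at probe time by dma_alloc_coherent(). */
extern struct device *my_dev;
extern void *cpu_buf;
extern dma_addr_t dma_handle;
extern size_t buf_size;

static int my_mmap(struct file *file, struct vm_area_struct *vma)
{
        /* Userspace maps the coherent buffer and fills it; the driver then
         * hands dma_handle to the device, avoiding copy_from_user(). */
        return dma_mmap_coherent(my_dev, vma, cpu_buf, dma_handle, buf_size);
}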

Convert DMA mapping to virtual address

I have a somewhat unusual situation where I'm developing a simulation module for an Ethernet device. Ideally, the simulation layer would just be identical to the real hardware with regard to the register set. The issue I've run into is that the DMA registers in the hardware are loaded with the DMA mapping (physical) address of the data. I need to use those physical addresses to copy the data from the Tx buffer on the source device to the Rx buffer on the destination device. To do that in module code, I need pointers to virtual memory. I looked at phys_to_virt() and I didn't understand this comment in the man page:
This function does not handle bus mappings for DMA transfers.
Does this mean that a physical address that is retrieved via dma_map_single cannot be converted back to a virtual address using phys_to_virt()? Is there another way to accomplish this conversion?
There is no general way to map a DMA address back to a virtual address. The dma_map_single() function might be programming an IOMMU (e.g. VT-d on an Intel x86 system), which results in a DMA address that is completely unrelated to the original physical or virtual address. However, this presentation and the linked slides give one approach to hooking an emulated hardware model up to a real driver (basically, use virtualization).
I am not too clear about this question, but if you are using phys_to_virt(), the reason may be that an address available on the bus cannot be converted to a virtual address by that function. I am not sure; just try the bus_to_virt(bus_addr) function.
Try dma_virt = bus_to_virt(dma_handle);
it worked for me. It gives the same virtual address that was mapped by dma_alloc_coherent().
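Since there is no general inverse, another option for a simulation layer is to do its own bookkeeping: record each (dma_addr_t, virtual address) pair at mapping time and look it up when the emulated device is handed a DMA address. A sketch with made-up names and a fixed-size table:

#include <linux/dma-mapping.h>

#define SIM_MAX_MAPPINGS 64

/* Made-up bookkeeping: remember each mapping as it is created. */
static struct sim_map {
        dma_addr_t dma;
        void *virt;
} sim_table[SIM_MAX_MAPPINGS];
static int sim_count;

static void sim_record(dma_addr_t dma, void *virt)
{
        if (sim_count < SIM_MAX_MAPPINGS) {
                sim_table[sim_count].dma = dma;
                sim_table[sim_count].virt = virt;
                sim_count++;
        }
}

static void *sim_lookup(dma_addr_t dma)
{
        int i;

        for (i = 0; i < sim_count; i++)
                if (sim_table[i].dma == dma)
                        return sim_table[i].virt;
        return NULL;    /* unknown DMA address */
}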
