Configure DMA in a Linux Kernel Module - linux-kernel

For my application I would like to send some data allocated in RAM to the PWM FIFO through DMA, in kernel space.
I would like the DMA to generate an interrupt when the data vector is complete, so that I can load the next vector and trigger other behaviour...
I have read "Linux Device Drivers", 3rd edition from O'Reilly, but I'm a bit confused about using the DMA Engine.
Which steps do I have to follow to start a memory-to-device (PWM) DMA transaction with an interrupt callback?
EDIT 1:
I need to learn how to use the Linux DMA API for my case (memory -> PWM FIFO), in kernel space.

I have submitted a patch that improves Ethernet performance by using the DMA engine. In that patch the driver moves packets from the RX FIFO to RAM (device to memory), so you can get some information about using the DMA engine in the Linux kernel from it: sun4i-emac.c: add dma support
Steps (see the sketch after this list):
request a DMA channel (API: dma_request_chan)
set up the DMA channel (API: dmaengine_slave_config)
map the data buffer to a DMA region (API: dma_map_single)
prepare the transfer (API: dmaengine_prep_slave_single)
submit the DMA transfer request (API: dmaengine_submit)
launch! (API: dma_async_issue_pending)
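A minimal sketch of those steps for a memory-to-PWM-FIFO transfer with a completion callback; the channel name "tx", PWM_FIFO_PHYS, the bus width and maxburst are assumptions to be taken from your SoC's device tree and datasheet:

    #include <linux/dmaengine.h>
    #include <linux/dma-mapping.h>
    #include <linux/err.h>

    #define PWM_FIFO_PHYS  0x01c21400   /* assumption: physical address of the PWM FIFO */

    static void pwm_dma_done(void *param)
    {
        /* Called from the DMA completion interrupt (tasklet context):
         * load the next vector and resubmit here. */
    }

    static int pwm_start_dma(struct device *dev, void *buf, size_t len)
    {
        struct dma_chan *chan;
        struct dma_slave_config cfg = { 0 };
        struct dma_async_tx_descriptor *desc;
        dma_addr_t dma_addr;
        dma_cookie_t cookie;

        chan = dma_request_chan(dev, "tx");                 /* 1. request a channel */
        if (IS_ERR(chan))
            return PTR_ERR(chan);

        cfg.direction = DMA_MEM_TO_DEV;                     /* 2. set up the channel */
        cfg.dst_addr = PWM_FIFO_PHYS;
        cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
        cfg.dst_maxburst = 1;
        if (dmaengine_slave_config(chan, &cfg))
            goto err_release;

        dma_addr = dma_map_single(chan->device->dev, buf,   /* 3. map the buffer */
                                  len, DMA_TO_DEVICE);
        if (dma_mapping_error(chan->device->dev, dma_addr))
            goto err_release;

        desc = dmaengine_prep_slave_single(chan, dma_addr,  /* 4. prepare the transfer */
                                           len, DMA_MEM_TO_DEV,
                                           DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
        if (!desc)
            goto err_unmap;

        desc->callback = pwm_dma_done;                      /* completion interrupt callback */
        desc->callback_param = NULL;

        cookie = dmaengine_submit(desc);                    /* 5. submit */
        if (dma_submit_error(cookie))
            goto err_unmap;

        dma_async_issue_pending(chan);                      /* 6. launch! */
        return 0;

    err_unmap:
        dma_unmap_single(chan->device->dev, dma_addr, len, DMA_TO_DEVICE);
    err_release:
        dma_release_channel(chan);
        return -EIO;
    }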

Related

Linux PCIe DMA driver

I'm currently writing a driver for a PCIe device that should send data to a Linux system using DMA. As far as I can understand my PCIe device needs a DMA controller (DMA master) and my Linux system too (DMA slave). Currently the PCIe device has no DMA controller and should not get one. That confuses me.
A. Is the following possible?
PCIe device sends interrupt
Wait for interrupt in the Linux driver
Start DMA transfer from memory mapped PCIe registers to Linux system DMA.
Read the data from memory in userspace
I have everything set up for this; the only thing I'm missing is how to transfer the data from the PCIe registers to memory.
B. Which system call (or series of) do I need to call to do a DMA transfer?
C. I probably need to setup the DMA on the Linux system but what I find points to code that assumes there is a slave, e.g. struct dma_slave_config.
The use case is collecting data from the PCIe device and make it available in memory to userspace.
Any help is much appreciated. Thanks in advance!
DMA, by definition, is completely independent of the CPU and any software (i.e. OS kernel) running on it. DMA is a way for devices to perform memory reads and writes against host memory without the involvement of the host CPU.
The way DMA usually works is something like this: software will allocate a DMA accessible region in memory and share the physical address with the device, say, by performing memory writes against the address space associated with one of the device's BARs. Then, the device will perform a DMA read or write against that block of memory. When that operation is complete, the device will issue an interrupt to the device driver so it can handle the data and/or free the memory.
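A rough driver-side sketch of that flow for a hypothetical PCIe device that takes the buffer's bus address through a few BAR0 registers (the register offsets here are invented for illustration):

    #include <linux/pci.h>
    #include <linux/io.h>
    #include <linux/dma-mapping.h>

    /* Hypothetical register layout: the device expects the buffer's bus
     * address and length in BAR0 and starts the transfer on a doorbell write. */
    #define REG_DMA_ADDR_LO  0x10
    #define REG_DMA_ADDR_HI  0x14
    #define REG_DMA_LEN      0x18
    #define REG_DMA_START    0x1c

    /* bar0 must already be mapped with pci_iomap() and the DMA mask set. */
    static int start_device_dma(struct pci_dev *pdev, void __iomem *bar0,
                                size_t len, void **cpu_buf, dma_addr_t *bus_addr)
    {
        *cpu_buf = dma_alloc_coherent(&pdev->dev, len, bus_addr, GFP_KERNEL);
        if (!*cpu_buf)
            return -ENOMEM;

        iowrite32(lower_32_bits(*bus_addr), bar0 + REG_DMA_ADDR_LO);
        iowrite32(upper_32_bits(*bus_addr), bar0 + REG_DMA_ADDR_HI);
        iowrite32((u32)len, bar0 + REG_DMA_LEN);
        iowrite32(1, bar0 + REG_DMA_START);
        /* The device now DMAs into the buffer and raises an interrupt when it
         * is done; the ISR can then hand the data to userspace. */
        return 0;
    }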
If your device does not have the capability of issuing a DMA read or write against host memory, then you'll have to interact with it using the CPU only. Discrete DMA controllers have not been a thing for a very long time.

How is DMA cache coherency kept on Intel chipsets?

I was reading something a few months ago about Windows chipset iterations and the PCH upgrades between them, and I'm pretty sure I saw something on DMA cache coherency saying that it involves the home agent or QHL (Nehalem), but I can't find it now.
So I'm asking whether anyone knows the details of any method of DMA cache coherency that Intel has employed, and how it works.
Nehalem's global queue, from the optimisation manual:
Cacheline requests from the cores or from a remote package or the I/O Hub are handled by the GQ.
The global queue checks whether the line is on the package and, if it is, snoops the appropriate cores using the core valid bits. If this is a dual-socket system and home snoop is being used, the request will be sent to the QHL (home agent on SnB), which will then send it over the QPI link that the NUMA node bitmap refers to. If source snoop is being used, the GQ will check its own 2-bit I/O directory cache in order to generate a message for the correct QPI link; the QHL (QPI agent on SnB) must generate another message to the correct LLC that has been assigned that address range. I'm not sure what happens in COD mode on Haswell or with SNC on the mesh architecture.

Is it possible to set the dma buffer address for a network card?

My understanding of network cards is that when receiving data, that data is DMA'd into main memory through the network card driver. The kernel then copies this memory into user space and sends any necessary messages.
My question is, in Windows, is it possible to set the address that the DMA is writing to? My goal is to eliminate the extra memory copy similar to the way NVidia's GPUDirect pipeline works.
Yes, this is possible. I believe this is called "common buffer DMA". It is used for intelligent network adapters. Taking advantage of this would require writing your own network driver. Here is some Microsoft documentation on it: http://msdn.microsoft.com/en-us/library/windows/hardware/ff565359%28v=vs.85%29.aspx
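Roughly, in a KMDF driver the common buffer comes from a DMA enabler. A sketch under the assumption of a 64-bit packet DMA profile (error paths trimmed):

    #include <ntddk.h>
    #include <wdf.h>

    /* Sketch: create a DMA enabler and a common buffer the NIC can DMA into.
     * The device then writes received packets directly into this buffer. */
    static NTSTATUS CreateCommonBuffer(WDFDEVICE device, size_t length,
                                       WDFCOMMONBUFFER *bufferOut)
    {
        WDF_DMA_ENABLER_CONFIG dmaConfig;
        WDFDMAENABLER dmaEnabler;
        PVOID va;
        PHYSICAL_ADDRESS la;
        NTSTATUS status;

        WDF_DMA_ENABLER_CONFIG_INIT(&dmaConfig, WdfDmaProfilePacket64, length);

        status = WdfDmaEnablerCreate(device, &dmaConfig,
                                     WDF_NO_OBJECT_ATTRIBUTES, &dmaEnabler);
        if (!NT_SUCCESS(status))
            return status;

        status = WdfCommonBufferCreate(dmaEnabler, length,
                                       WDF_NO_OBJECT_ATTRIBUTES, bufferOut);
        if (!NT_SUCCESS(status))
            return status;

        /* Virtual address for the driver, logical (bus) address for the device. */
        va = WdfCommonBufferGetAlignedVirtualAddress(*bufferOut);
        la = WdfCommonBufferGetAlignedLogicalAddress(*bufferOut);
        UNREFERENCED_PARAMETER(va);
        UNREFERENCED_PARAMETER(la);
        return STATUS_SUCCESS;
    }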

Can hardware registers be mapped to userspace

I'm developing an LED driver on a Freescale MPC8306. In the driver code I ioremap the GPIO registers and call remap_pfn_range on the remapped GPIO register address, then call mmap in userspace to map the GPIO registers into userspace. I haven't done this before and I want to know whether this method works or not. Can someone help me? Thanks in advance.
You should be using the /dev/mem interface for accessing the GPIO registers. A good reference for controlling LEDs via GPIOs on another embedded board is given here.
An easier way would probably be to just mmap the relevant offset of /dev/mem in your userspace program directly. This allows you to access the physical memory layout by seeking into it.
AFAIK, this is what the RaspberryPi developers have done to make GPIO memory mapped I/O registers available to userspace programs.
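A minimal userspace sketch of the /dev/mem approach; the base address and register offset below are placeholders, so take the real (page-aligned) values from the MPC8306 reference manual:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define GPIO_PHYS_BASE 0xe0000000UL   /* placeholder: page-aligned base of the register block */
    #define GPIO_DAT_OFF   0xc08          /* placeholder: offset of the GPIO data register */
    #define MAP_LEN        0x1000UL

    int main(void)
    {
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) {
            perror("open /dev/mem");
            return 1;
        }

        /* Map one page of physical register space into this process. */
        volatile uint8_t *base = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, GPIO_PHYS_BASE);
        if (base == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        volatile uint32_t *gpdat = (volatile uint32_t *)(base + GPIO_DAT_OFF);
        *gpdat |= 1u << 31;               /* e.g. set one GPIO output bit (LED on) */

        munmap((void *)base, MAP_LEN);
        close(fd);
        return 0;
    }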

Memory Alignment for a DMA transaction (Windows Driver Foundation)

We are writing a DMA-based driver for a custom made PCI-Express device using WDF for Windows 7.
As you may know, PCI-Express bus transactions are not allowed to cross a 4k memory boundary. The custom device does not check this, and therefore we need to ensure that the driver only requests DMA transfers which are aligned to 4k memory boundaries.
The profile for the device is WdfDmaProfilePacket64.
We tried using WdfDeviceSetAlignmentRequirement(DevExt->Device, 4095), but this does not result in the DMA start address being properly aligned.
How can we configure the WDF framework so that it only requests properly aligned addresses?
You can handle this in the user-space application: allocate the memory already aligned in user space, and then hand it to the kernel driver. It is not easy for a driver to align a buffer that has already been allocated and initialized. Even in a user-space application you have to allocate extra space and then use the aligned part (I know it's not pretty, which is why I recommend solving this problem on the device side).
For example, if you use C or C++ for your user-space application you can do something like the sketch below.
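A minimal sketch of the over-allocate-and-round-up approach (plain C; posix_memalign or aligned_alloc would also give you 4k-aligned memory directly):

    #include <stdint.h>
    #include <stdlib.h>

    #define DMA_ALIGN 4096u   /* PCIe transactions must not cross a 4k boundary */

    /* Allocate len bytes plus slack, then round the pointer up to the next
     * 4k boundary. Keep the raw pointer around: that is what free() needs. */
    static void *alloc_aligned_4k(size_t len, void **raw_out)
    {
        void *raw = malloc(len + DMA_ALIGN - 1);
        if (!raw)
            return NULL;

        uintptr_t p = ((uintptr_t)raw + DMA_ALIGN - 1) & ~(uintptr_t)(DMA_ALIGN - 1);
        *raw_out = raw;               /* pass this to free(), not the aligned pointer */
        return (void *)p;
    }

The aligned pointer is what you hand to the driver for the DMA transfer; the raw pointer is only kept so the allocation can be freed later.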
