When writing a standalone (bare-metal) driver, I clean the cache before using DMA to transmit data to a peripheral device, and I do not need to flush/invalidate/clean the cache after the transmission has completed.
But in a Linux kernel device driver, we call dma_map_single() before the transmission and dma_unmap_single() after it. I think it should not be necessary for dma_unmap_single() to clean the cache after the transmission. Does it do some job for the MMU?
For example, here is the code for transmission in drivers/net/ethernet/cadence/macb_main.c:
macb_tx_map()
{
        ...
        mapping = dma_map_single(&bp->pdev->dev,
                                 skb->data + offset,
                                 size, DMA_TO_DEVICE);
        ...
}

macb_tx_unmap()
{
        ...
        dma_unmap_single(&bp->pdev->dev, tx_skb->mapping,
                         tx_skb->size, DMA_TO_DEVICE);
        ...
}
I checked the code and found that both of them end up calling __dma_clean_area in arch/arm64/mm/cache.S.
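For reference, the canonical streaming-mapping sequence from Documentation/DMA-API-HOWTO.txt is roughly the following (a sketch; dev, buf and len stand in for the driver's real variables):

#include <linux/dma-mapping.h>

static int start_tx(struct device *dev, void *buf, size_t len)
{
        dma_addr_t mapping;

        mapping = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, mapping))
                return -ENOMEM;

        /* ... hand 'mapping' to the device and kick off the transfer ... */

        /* once the device reports completion: */
        dma_unmap_single(dev, mapping, len, DMA_TO_DEVICE);
        return 0;
}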
I have connected a hardware device to an embedded Linux board via the I2C lines.
I can see the device at /dev/i2c-1:

filename = "/dev/i2c-1";
filehandle = open(filename, O_RDWR);
write(filehandle, <buffer to be written>, <number of bytes>);
/* similarly for read: read(filehandle, <buffer to read into>, <number of bytes>) */

Now my question is: am I using Linux's I2C driver (read/write) when I invoke the write() and read() system calls on the file handle as above?
Also, is this implementation independent of the i2c module? I verified that my code runs only after I do modprobe i2c_dev.
Does modprobe i2c_dev load the I2C module and create /dev/i2c-1 in the /dev directory because I have connected the I2C device to it?
User-space interface to I2C
The user-space interface to the I2C subsystem is provided via /dev/i2c-* files and documented in Documentation/i2c/dev-interface. There are two ways to send an I2C message:
send plain buffer via write(); you need to include linux/i2c-dev.h for this
send i2c_msg structure via ioctl() with I2C_RDWR request; you need to include linux/i2c.h for this
See this question for examples.
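For a rough idea, a minimal plain-buffer write could look like this (a sketch; the slave address 0x50 and the register/value bytes are made up):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>

int main(void)
{
    int fd = open("/dev/i2c-1", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (ioctl(fd, I2C_SLAVE, 0x50) < 0) {   /* bind this fd to slave address 0x50 */
        perror("ioctl");
        return 1;
    }
    unsigned char buf[2] = { 0x00, 0xab };  /* e.g. register, then value */
    if (write(fd, buf, sizeof(buf)) != sizeof(buf))
        perror("write");
    close(fd);
    return 0;
}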
How /dev/i2c-1 is associated with I2C subsystem
The /dev/i2c-1 file is just an interface to the I2C subsystem. You can send an I2C message, receive an I2C message and configure I2C using the write(), read() and ioctl() syscalls respectively. Once you perform one of these operations on the /dev/i2c-1 file, it is passed through the virtual file system to the I2C layer, where those operations are implemented. The actual callbacks for those operations are implemented in the drivers/i2c/i2c-dev.c file, more specifically in the i2cdev_fops structure.
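That structure wires each syscall to its handler; in the kernel source it looks approximately like this:

static const struct file_operations i2cdev_fops = {
        .owner          = THIS_MODULE,
        .llseek         = no_llseek,
        .read           = i2cdev_read,
        .write          = i2cdev_write,
        .unlocked_ioctl = i2cdev_ioctl,
        .open           = i2cdev_open,
        .release        = i2cdev_release,
};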
For example, when you perform the open() syscall on the /dev/i2c-1 file, the i2cdev_open() function is called in the kernel; it creates an i2c_client structure for further send/receive operations, and that structure is assigned to the file's private data field:
/* This creates an anonymous i2c_client, which may later be
 * pointed to some address using I2C_SLAVE or I2C_SLAVE_FORCE.
 *
 * This client is ** NEVER REGISTERED ** with the driver model
 * or I2C core code!! It just holds private copies of addressing
 * information and maybe a PEC flag.
 */
client = kzalloc(sizeof(*client), GFP_KERNEL);
...
file->private_data = client;
When you subsequently perform operations on that /dev/i2c-1 file, the i2c_client structure is extracted from the file->private_data field, and the corresponding function is called on it.
For the write() syscall, the i2cdev_write() function is called, which leads to an i2c_master_send() call:
struct i2c_client *client = file->private_data;
...
ret = i2c_master_send(client, tmp, count);
In the same way, read() leads to i2cdev_read(), which leads to i2c_master_recv(). And ioctl() leads to i2cdev_ioctl(), which just assigns the corresponding flags to the i2c_client structure.
How /dev/i2c-1 is associated with hardware I2C driver
Operations performed on a /dev/i2c-* file eventually lead to the execution of I2C hardware driver functions. Let's look at one example. When we do a write() syscall, the whole chain is:

write() -> i2cdev_write() -> i2c_master_send() -> i2c_transfer() -> __i2c_transfer() -> adap->algo->master_xfer()

where adap is an i2c_adapter structure that stores hardware-specific data about the I2C controller, such as the callbacks of the I2C hardware driver. That .master_xfer callback is implemented in the I2C hardware driver. For example, for OMAP platforms it is implemented in the drivers/i2c/busses/i2c-omap.c file; see the omap_i2c_xfer() function.
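To illustrate that last link, a bus driver registers its master_xfer callback roughly like this (a skeleton with made-up foo_* names, not any particular driver):

#include <linux/i2c.h>

static int foo_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
{
        /* program the controller to move each msg on the wire */
        return num;     /* number of messages transferred */
}

static u32 foo_functionality(struct i2c_adapter *adap)
{
        return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL;
}

static const struct i2c_algorithm foo_algo = {
        .master_xfer   = foo_xfer,
        .functionality = foo_functionality,
};

/* in probe(): adap->algo = &foo_algo; followed by i2c_add_adapter(adap); */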
I am the maintainer of an open-source project that relies on the DMA controller to do PWM on Raspberry Pi IO pins. This technique requires the use of one DMA channel. We have historically hard-coded DMA channel 0, but we have received multiple bug reports stating that the program does not work properly when X is running at the same time (bug reports: here and here, etc.).
We found the Mailbox API in the Raspberry Pi firmware, which includes an API to manage shared resources such as DMA channels and to figure out which ones are available.
Pattrick Hueper gave this a try, but it still reports channel 0 as available. Maybe X does not use this API to announce which channel it is using.
I found dma_request_channel() for kernel-space code, but it is not available from user space.
What is the proper way to use a DMA channel from user space while being a good citizen on the computer and avoiding conflicts with other tools?
I have been able to confirm the following. You include:

#include <mach/dma.h>

void __iomem *dma_base;  /* returned */
int dma_irq;             /* returned */

int rc = bcm_dma_chan_alloc(
        BCM_DMA_FEATURE_NORMAL,  /* features found in mach/dma.h */
        &dma_base,
        &dma_irq
);

rc is negative if an error occurs. When rc >= 0, it is the DMA channel that was allocated.
To release the channel:

bcm_dma_chan_free(dma_chan);
So far, it has returned DMA channel 2:
[ 99.372778] chan = rc = 2, dma_base=f3007200, IRQ=77
[ 99.372790] Returned DMA channel 2.
[ 103.971670] Releasing DMA Channel 2
and channel 4 (when I left DMA channel 2 unreleased).
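Since bcm_dma_chan_alloc() is only exported to kernel code, one way to use it is from a small out-of-tree module; a minimal sketch (the chan_demo_* names are made up, error handling trimmed):

#include <linux/module.h>
#include <mach/dma.h>

static void __iomem *dma_base;
static int dma_irq;
static int dma_chan = -1;

static int __init chan_demo_init(void)
{
        dma_chan = bcm_dma_chan_alloc(BCM_DMA_FEATURE_NORMAL,
                                      &dma_base, &dma_irq);
        if (dma_chan < 0)
                return dma_chan;        /* allocation failed */
        pr_info("Got DMA channel %d, IRQ %d\n", dma_chan, dma_irq);
        return 0;
}

static void __exit chan_demo_exit(void)
{
        if (dma_chan >= 0)
                bcm_dma_chan_free(dma_chan);
}

module_init(chan_demo_init);
module_exit(chan_demo_exit);
MODULE_LICENSE("GPL");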
I have written a DirectShow filter and a video decoder driver for Windows CE 7. The filter is loaded in user mode and the decoder driver is loaded in kernel mode. The filter needs memory to receive the input buffers, and it allocates this memory by calling a video decoder driver function. The decoder driver allocates the memory and returns its virtual address from kernel space (>2GB) because it is loaded in kernel mode. But this memory is not accessible to the filter, because the filter is loaded in user mode.
By considering the above scenario, how can I make the memory allocated by the video decoder driver accessible for the filter?
I'm not sure if this will help given that you're on Win CE 7, but on Windows 7 I have a driver that maps a kernel-mode address to a user-mode address before returning the resulting user-mode address to my application.
void *userSpaceAddr;
PMDL pmdl;

/* Allocate the MDL describing our kernel memory */
pmdl = IoAllocateMdl((PVOID)&my_heap_var,
                     (ULONG)size_of_my_heap_var,
                     FALSE,
                     FALSE,
                     NULL);
if (!pmdl) {
    DbgPrintEx(DPFLTR_IHVVIDEO_ID, DPFLTR_INFO_LEVEL,
               "Error on IoAllocateMdl. Returning from driver early.\n");
    return STATUS_INSUFFICIENT_RESOURCES;
}

MmBuildMdlForNonPagedPool(pmdl);
userSpaceAddr = (void *)MmMapLockedPagesSpecifyCache(pmdl, UserMode,
                                                     MmWriteCombined, NULL,
                                                     FALSE, LowPagePriority);
userSpaceAddr is mapped to a user-space virtual address in the context of the process that called the driver. You can then return userSpaceAddr to your application.
This hinges on the MmMapLockedPagesSpecifyCache function. MSDN doc here: http://msdn.microsoft.com/en-us/library/windows/hardware/ff554629(v=vs.85).aspx
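When the mapping is no longer needed (at the latest in the driver's cleanup path), it should be torn down; a short sketch, assuming pmdl and userSpaceAddr were kept around:

/* Must run in the context of the process that received the mapping */
MmUnmapLockedPages(userSpaceAddr, pmdl);
IoFreeMdl(pmdl);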
I am working on a DMA routine to transfer data from the PC to a FPGA on a PCIe card. I read DMA-API.txt and LDD3 ch. 15 for details, but I could not figure out how to do a DMA transfer from the PC to a consistent block of iomem on the PCIe card. The dad sample for PCI in LDD3 maps a buffer and then tells the card to do the DMA transfer, but I need the PC to do this.
What I already found out:
Request bus master
pci_set_master(pdev);
Set the DMA mask
if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(32))) {
        dev_err(&pdev->dev, "No suitable DMA available.\n");
        goto cleanup;
}
Request a DMA channel
if (request_dma(dmachannel, DRIVER_NAME)) {
        dev_err(&pdev->dev, "Could not reserve DMA channel %d.\n", dmachannel);
        goto cleanup;
}
Map a buffer for DMA transfer
dma_handle = pci_map_single(pci_dev, buffer, count, DMA_TO_DEVICE);
Question:
What do I have to do in order to let the PC perform the DMA transfer instead of the card?
Thank you for your help!
First of all, thank you for your replies. Maybe I should put my questions more precisely:
In my understanding, the PC has to have a DMA controller. How do I access this DMA controller to start a transfer to a memory-mapped IO region in the PCIe card?
Our specification demands that the PC's DMA controller initiate the transfer. However, I could only find examples where the device does the DMA job (DMA_mapping.txt, LDD3 ch. 15). Is there a reason why nobody uses the PC's DMA controller (it still has DMA channels, though)? Would it be better to request a specification change for our project?
Thanks for your patience.
Look up DMA_mapping.txt. There's a long section in there that tells you how to set the direction ('DMA direction', line 408).
EDIT
Ok, since you edited your question... your specification is wrong. You could set up the system DMA controller, but it would be pointless, because it's too slow, as I said in the comments. Read this thread.
You must change your FPGA to support bus mastering. I do this for a living - contact me off-thread if you want to sub-contract.
What you are talking about is not really DMA. DMA is when your device accesses memory while the CPU itself is not involved (with the exception of the PC's memory controller, which is usually embedded in the PC's CPU these days). Not all devices can do it, and if you are using an FPGA, then you surely need some sort of DMA controller in your design (e.g. the Expresso DMA Core or similar). In your case, you just have to write to the mapped memory region (i.e. one that you obtain with ioremap_nocache) using iowrite calls (e.g. iowrite32) followed by write memory barriers, wmb(). Which I/O BAR and address you have to write to depends entirely on your device.
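As a rough sketch of that pattern (the BAR number and the FPGA_DATA/FPGA_DOORBELL register offsets are made up; a real device defines its own layout):

#include <linux/pci.h>
#include <linux/io.h>

#define FPGA_DATA_OFFSET     0x0000  /* hypothetical data window */
#define FPGA_DOORBELL_OFFSET 0x1000  /* hypothetical "go" register */

static int push_buffer(struct pci_dev *pdev, const u32 *src, size_t words)
{
        void __iomem *bar = pci_iomap(pdev, 0, 0);  /* map all of BAR0 */
        size_t i;

        if (!bar)
                return -ENOMEM;

        for (i = 0; i < words; i++)
                iowrite32(src[i], bar + FPGA_DATA_OFFSET + i * 4);
        wmb();  /* make sure the data lands before the doorbell rings */
        iowrite32(1, bar + FPGA_DOORBELL_OFFSET);

        pci_iounmap(pdev, bar);
        return 0;
}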
Hope it helps. Good Luck!
I have a computer that is connected to external devices via serial communication (i.e. RS-232/RS-422 on physical or emulated serial ports). They communicate with each other through frequent data exchange (30 Hz), but only with small data packets (less than 16 bytes per packet).
The most critical requirement of the communication is low latency, i.e. the delay between transmitting and receiving.
The data exchange pattern is handshake-like. One host device initiates communication and keeps sending notifications to a client device. The client device needs to reply to every notification from the host device as quickly as possible (this is exactly where low latency needs to be achieved). The data packets of notifications and replies are well defined, i.e. the data length is known.
And basically, data loss is not allowed.
I have used the following common Win API functions to do the I/O reads/writes in a synchronous manner:
CreateFile, ReadFile, WriteFile
The client device uses ReadFile to read data from the host device. Once the client has read a complete data packet, whose length is known, it uses WriteFile to reply to the host with the corresponding data packet. The reads and writes are always sequential, without concurrency.
Somehow the communication is not fast enough; the time between sending and receiving data is too long. I guess it could be a problem with serial-port buffering or interrupts.
Here I summarize some possible actions to improve the delay.
Please give me some suggestions and corrections :)
call CreateFile with the FILE_FLAG_NO_BUFFERING flag? I am not sure if this flag is relevant in this context.
call FlushFileBuffers after each WriteFile? Or any action that can notify/interrupt the serial port to transmit data immediately?
set a higher priority for the thread and process handling the serial communication
set the latency timer or transfer size for emulated devices (in their driver). But what about physical serial ports?
is there anything on Windows equivalent to setserial/low_latency under Linux?
disable the FIFO?
thanks in advance!
I solved this in my case by setting the comm timeouts to {MAXDWORD,0,0,0,0}.
After years of struggling with this, on this very day I was finally able to make my serial comms terminal thingy fast enough with Microsoft's CDC-class USB UART driver (USBSER.SYS, which is now built into Windows 10, making it actually usable).
Apparently the aforementioned set of values is a special case that sets minimal timeouts as well as minimal latency (at least with the Microsoft driver, or so it seems to me anyway) and also causes ReadFile to return immediately if no new characters are in the receive buffer.
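Spelled out field by field, that initializer corresponds to:

COMMTIMEOUTS t;
t.ReadIntervalTimeout         = MAXDWORD; /* with the two zeros below: ReadFile    */
t.ReadTotalTimeoutMultiplier  = 0;        /* returns immediately with whatever is  */
t.ReadTotalTimeoutConstant    = 0;        /* already in the receive buffer         */
t.WriteTotalTimeoutMultiplier = 0;        /* no write timeouts */
t.WriteTotalTimeoutConstant   = 0;
SetCommTimeouts(port, &t);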
Here's my code (Visual C++ 2008; the project character set was changed from "Unicode" to "Not set" to avoid LPCWSTR type-cast problems with the port name) to open the port:
static HANDLE port = INVALID_HANDLE_VALUE;
static COMMTIMEOUTS originalTimeouts;
static bool ComSetParams(HANDLE port, int baud);

static bool OpenComPort(char *p, int targetSpeed) { // e.g. OpenComPort("COM7", 115200);
    char portname[16];
    sprintf(portname, "\\\\.\\%s", p);
    port = CreateFile(portname, GENERIC_READ | GENERIC_WRITE, 0, 0, OPEN_EXISTING, 0, 0);
    if (port == INVALID_HANDLE_VALUE) { // CreateFile returns INVALID_HANDLE_VALUE, not NULL, on failure
        printf("COM port is not valid: %s\n", portname);
        return false;
    }
    if (!GetCommTimeouts(port, &originalTimeouts)) {
        printf("Cannot get comm timeouts\n");
        CloseHandle(port);
        return false;
    }
    COMMTIMEOUTS newTimeouts = {MAXDWORD, 0, 0, 0, 0};
    SetCommTimeouts(port, &newTimeouts);
    if (!ComSetParams(port, targetSpeed)) {
        SetCommTimeouts(port, &originalTimeouts);
        CloseHandle(port);
        printf("Failed to set COM parameters\n");
        return false;
    }
    printf("Successfully set COM parameters\n");
    return true;
}

static bool ComSetParams(HANDLE port, int baud) {
    DCB dcb;
    memset(&dcb, 0, sizeof(dcb));
    dcb.DCBlength = sizeof(dcb);
    dcb.BaudRate = baud;
    dcb.fBinary = 1;
    dcb.Parity = NOPARITY;
    dcb.StopBits = ONESTOPBIT;
    dcb.ByteSize = 8;
    return SetCommState(port, &dcb) != 0;
}
And here's a USB trace of it working. Note the OUT transactions (output bytes) followed by IN transactions (input bytes) and then more OUT transactions (output bytes), all within 3 milliseconds.
And finally, since you are reading this, you might be interested to see my function that sends and receives characters over the UART:
unsigned char outbuf[16384];
unsigned char inbuf[16384];
unsigned char *inLast = inbuf;
unsigned char *inP = inbuf;
unsigned long bytesWritten;
unsigned long bytesReceived;

// Read a character from the UART; while waiting, forward keypresses to the UART.
unsigned char vgetc() {
    while (inP >= inLast) {        // Input buffer is empty, try to read from UART
        while (_kbhit()) {         // If keyboard input is available, send it to UART
            outbuf[0] = _getch();  // Get keyboard character
            WriteFile(port, outbuf, 1, &bytesWritten, NULL);  // Send it to the UART
        }
        ReadFile(port, inbuf, 1024, &bytesReceived, NULL);
        inP = inbuf;
        inLast = &inbuf[bytesReceived];
    }
    return *inP++;
}
Large transfers are handled elsewhere in code.
On a final note, apparently this is the first fast UART code I've managed to write since abandoning DOS in 1998. O, doest the time fly when thou art having fun.
This is where I found the relevant information: http://www.egmont.com.pl/addi-data/instrukcje/standard_driver.pdf
I have experienced a similar problem with a serial port.
In my case I resolved the problem by decreasing the latency of the serial port.
You can change the latency of every port (which by default is set to 16 ms) using the Control Panel.
You can find the method here:
http://www.chipkin.com/reducing-latency-on-com-ports/
Good Luck!!!