USB2.0 Transfer using usb_submit_urb gives kernel panic - linux-kernel

Scenario
I am building Ethernet packets in an application and transferring them over USB 2.0.
Inside the USB class driver, I issue a request to send these packets to a BULK endpoint using usb_submit_urb. My Ethernet packet size is 112 bytes. I am able to transfer 8000 packets in roughly 200 msec when presentation time is not considered.
I make ioctl calls at a very fast rate, roughly one every 3-4 usec, to send the packets. Inside the ioctl handler I issue usb_submit_urb, which is a non-blocking call, unlike usb_bulk_msg.
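For reference, the submission path boils down to something like the sketch below (simplified, not my actual driver; the endpoint number and buffer handling are placeholders):

static void pkt_write_complete(struct urb *urb)
{
    /* runs in completion (interrupt) context */
    if (urb->status)
        dev_err(&urb->dev->dev, "bulk urb failed: %d\n", urb->status);
    kfree(urb->transfer_buffer);
    usb_free_urb(urb);
}

static int submit_packet(struct usb_device *udev, unsigned int bulk_out_ep,
                         const void *pkt, size_t len)
{
    struct urb *urb;
    void *buf;
    int ret;

    urb = usb_alloc_urb(0, GFP_KERNEL);
    if (!urb)
        return -ENOMEM;

    buf = kmemdup(pkt, len, GFP_KERNEL);    /* copy of the 112-byte packet */
    if (!buf) {
        usb_free_urb(urb);
        return -ENOMEM;
    }

    usb_fill_bulk_urb(urb, udev, usb_sndbulkpipe(udev, bulk_out_ep),
                      buf, len, pkt_write_complete, NULL);

    ret = usb_submit_urb(urb, GFP_KERNEL);  /* returns immediately */
    if (ret) {
        kfree(buf);
        usb_free_urb(urb);
    }
    return ret;
}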
Problem
When I do consider presentation time, the kernel panics and the dmesg log reads kernel panic - not syncing: Fatal exception in interrupt. For info: with presentation time considered, packets wait in the hardware device until timestamp_in_packet == hardware time.
I would like to understand how EHCI behaves under these conditions, and what state the endpoint queues can be in, in such a scenario. I am working with an Ethernet-over-USB chip. What is the actual cause of such kernel panics?
Any input would help a lot.

Related

Zero-length packets for USB control transfers

In the context of a DFU driver, I'm trying to respond to a USB control IN transfer with a packet of length zero (not a ZLP terminating a transfer that is a multiple of the max packet size, just zero bytes of data). However, the host returns with a timeout condition. I tried both the dfu-util tool with the corresponding protocol and a minimal working example with pyusb that just issues a control IN transfer of some length while the device returns no data.
My key question is: Do I achieve this by responding with a NAK, or should I set the endpoint valid but with no data? The specs are rather vague about this, in my opinion.
Here are some technical details since I'm not sure where the problem is:
Host: Linux Kernel 5.16.10, dfu-util and pyusb (presumably) both using libusb 0.1.12
Device: STM32L1 with ChibiOS 21.11.1 USB stack (sends NAK in the above situation, I also tried to modify it to send a zero-length packet without success)
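For reference, the transfer I'm issuing is equivalent to something like the following with libusb-1.0 (not my exact pyusb script; the VID/PID and request fields are placeholders):

#include <libusb-1.0/libusb.h>
#include <stdio.h>

int main(void)
{
    libusb_context *ctx;
    libusb_device_handle *h;
    unsigned char buf[64];
    int n;

    libusb_init(&ctx);
    h = libusb_open_device_with_vid_pid(ctx, 0x1234, 0x5678);  /* placeholder IDs */
    if (!h)
        return 1;

    /* Control IN transfer of "some length"; the device is expected to
     * answer with zero data bytes. */
    n = libusb_control_transfer(h,
            LIBUSB_ENDPOINT_IN | LIBUSB_REQUEST_TYPE_CLASS |
            LIBUSB_RECIPIENT_INTERFACE,
            0x01 /* placeholder request */, 0, 0,
            buf, sizeof(buf), 1000 /* ms timeout */);

    /* n == 0 would be the desired outcome (zero-length data stage);
     * instead I get a negative error (timeout). */
    printf("control transfer returned %d\n", n);

    libusb_close(h);
    libusb_exit(ctx);
    return 0;
}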
It sounds like you are programming the firmware of a device, and you want your device to give a response that is 0 bytes long when the host starts a control read transfer.
You can't simply send a NAK token: that is what the device does when the data isn't ready yet, and it causes the host to try again later to read the data.
Instead, you must actually send a 0-length IN packet to the host. When the host receives this packet, it sees that the packet is shorter than the maximum packet size, so it knows the data phase of the control transfer is done, and it moves on to the status stage.
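On a ChibiOS device, queuing that zero-length response from the control-request hook looks roughly like this (a sketch, assuming ChibiOS's usbSetupTransfer()-based control handling; the request matching below is a placeholder, not DFU-specific code):

#include "hal.h"

#define MY_REQUEST 0x01U    /* placeholder request code */

/* Registered as the requests hook in the USBConfig. */
static bool my_requests_hook(USBDriver *usbp) {
    const uint8_t bmRequestType = usbp->setup[0];
    const uint8_t bRequest      = usbp->setup[1];

    if ((bmRequestType & 0x80U) != 0U &&      /* device-to-host (IN) */
        bRequest == MY_REQUEST) {
        /* Queue a 0-byte IN data stage. The host sees a zero-length
         * (short) packet, ends the data stage and moves on to the
         * status stage, rather than retrying on NAK. */
        usbSetupTransfer(usbp, NULL, 0, NULL);
        return true;                          /* request handled */
    }
    return false;                             /* defer to the default handler */
}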

serial stops between mbed and processing

I'm looking for a solution to the following problem.
The LPC1768, one of the mbed prototyping boards, communicates with Processing over serial at a baud rate of 115200. However, as time passes, the serial communication stops.
The setup is this: the LPC1768 sends sensor data with serial.putc() from the default serial library, and Processing receives the data with serial.read(). The Processing code is the following:
if(serial.available()>1) { serial.read(); }
To explore possible solutions to this problem, I tried the following.
I checked that serial.available() was 46 and used serial.clear() in Processing, because I thought the cause was an overflow of Processing's receive buffer. But the mbed still stopped and didn't send data.
I added a serial.writable() check to verify that the send buffer has space. If there is no space, I reset the UART FIFOs and re-initialized the serial with the following code:
LPC_UART2->FCR |= 0x06;  // reset the UART2 RX and TX FIFOs (FCR bits 1 and 2)
serial.baud(115200);
I did this because I thought the cause was an overflow of the mbed's send buffer. However, it didn't work either.
Please note that the code basically works correctly.
However, the serial communication suddenly stops. What else can I try?
Best regards

Linux Driver and API architecture for a data acquisition device

We're trying to write a driver/API for a custom data acquisition device, which captures several "channels" of data. For the sake of discussion, let's assume this is a several-channel video capture device. The device is connected to the system via an 8xPCIe Gen-1 link, which has a theoretical throughput of 16Gbps. Our actual data rate will be around 2.8Gbps (~350MB/sec).
Because of the data rate requirement, we think we have to be careful about the driver/API architecture. We've already implemented a descriptor based DMA mechanism and the associated driver. For example, we can start a DMA transaction for 256KB from the device and it completes successfully. However, in this implementation we're only capturing the data in the kernel driver, and then dropping it and we aren't streaming the data to the user-space at all. Essentially, this is just a small DMA test implementation.
We think we have to separate the problem into three sections: 1. Kernel driver 2. Userspace API 3. User Code
The acquisition device has a register in the PCIe address space which indicates whether there is data to read for any channel from the device, so our kernel driver must poll this bit-vector. When the kernel driver sees a channel's bit set, it starts a DMA transaction. The user application, however, does not need to know about all these DMA transactions and data until an entire chunk of data is ready (for example, assume the device provides us with 16 lines of video data per transaction, but we need to notify the user only when the entire video frame is ready). We only need to transfer entire frames to the user application.
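To make the polling requirement concrete, the monitoring loop in the kernel driver looks roughly like the sketch below (simplified; the register offset, the acq_dev fields and acq_start_dma() are placeholders, not our real code):

#include <linux/kthread.h>
#include <linux/io.h>
#include <linux/delay.h>
#include <linux/bitops.h>

#define DATA_READY_REG 0x40            /* placeholder offset in BAR0 */

struct acq_dev {                       /* placeholder driver context */
    void __iomem *bar0;
    unsigned int nr_channels;
};

static void acq_start_dma(struct acq_dev *dev, int channel);  /* placeholder */

static int acq_poll_thread(void *arg)
{
    struct acq_dev *dev = arg;

    while (!kthread_should_stop()) {
        unsigned long ready = ioread32(dev->bar0 + DATA_READY_REG);
        int ch;

        /* One bit per channel: start a DMA transaction for every
         * channel that reports data ready. */
        for_each_set_bit(ch, &ready, dev->nr_channels)
            acq_start_dma(dev, ch);

        if (!ready)
            usleep_range(50, 100);     /* back off briefly when idle */
    }
    return 0;
}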
Here was our first attempt:
Our user-side API allows a user application to register a function callback for a "channel".
The user-side API has a "start" function, which can be called by the user application, which uses ioctl to send a start message to the kernel driver.
In the kernel driver, upon receiving the start message, we started a kernel thread, which continuously monitors the "data ready" bit-vector, and when it sees new data, copies it over to a driver-allocated (kmalloc) buffer. It keeps doing this until the size of the collected data reaches the "frame size".
At this point a custom Linux signal (similar to SIGINT, SIGHUP, etc.) is sent to the process that is using the driver. Our API catches this signal and then calls the appropriate user callback function.
The user callback function calls a function in the API (transfer_data), which uses an ioctl call to send a userspace buffer address to the kernel, and the kernel completes the data transfer by doing a copy_to_user of the channel frame data to userspace.
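On the kernel side, that transfer_data ioctl boils down to something like this (simplified; the command number and the frame fields are placeholders):

#include <linux/fs.h>
#include <linux/uaccess.h>

static long acq_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct acq_dev *dev = filp->private_data;

    switch (cmd) {
    case ACQ_IOC_TRANSFER_DATA:        /* placeholder ioctl number */
        /* arg is the userspace buffer address passed in by the API;
         * the completed frame is copied out with copy_to_user(). */
        if (copy_to_user((void __user *)arg, dev->frame_buf, dev->frame_size))
            return -EFAULT;
        return 0;
    default:
        return -ENOTTY;
    }
}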
All of the above is working OK, except that the performance is abysmal. We can only achieve about 2MB/sec of transfer rate. We need to completely re-write this and we're open to any suggestions or pointers to examples.
Other notes:
Unfortunately, we can not change anything in the hardware device. So we must poll for the "data-ready" bit and start DMA based on that bit.
Some people suggested to look at Infiniband drivers as a reference, but we're completely lost in that code.
You're probably way past this now, but if not here's my 2p.
It's hard to believe that your card can't generate interrupts when it has transferred data. It's got a DMA engine, and it can handle 'descriptors', which are presumably elements of a scatter-gather list. I'll assume that it can generate a PCIe 'interrupt'; YMMV.
Don't bother trawling the kernel for existing similar drivers. You might get lucky, but I suspect not.
You need to write a blocking read, to which you supply a large memory buffer. The driver read op (a) gets a list of user pages for your user buffer and locks them in memory (get_user_pages); (b) creates a scatter list with pci_map_sg; (c) iterates through the list (for_each_sg); (d) for each entry, writes the corresponding physical bus address and data length to the DMA controller as what I presume you're calling a 'descriptor'.
The card now has a list of descriptors which correspond to the physical bus addresses of your large user buffer. When data arrives at the card, it writes it directly into user space, into your user buffer, while your user-level read is still blocked. When it has finished the descriptor list, the card has to be able to interrupt, or it's useless. The driver responds to the interrupt and unblocks your user-level read.
And that's it. The details are nasty, of course, and poorly documented, but that should be the basic architecture. If you really haven't got interrupts you can set up a timer in the kernel to poll for completion of transfer, but if it is really a custom card you should get your money back.
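A skeleton of that read path, in the spirit of steps (a)-(d) (the exact kernel APIs have shifted across versions; dma_map_sg() below is the modern spelling of pci_map_sg(), and the acq_* helpers, device struct and wait queue are placeholders for whatever your DMA engine needs):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pci.h>
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>
#include <linux/wait.h>

struct acq_dev {                         /* placeholder driver context */
    struct pci_dev *pdev;
    wait_queue_head_t done_wq;
    bool dma_done;
};

static void acq_write_descriptor(struct acq_dev *dev,
                                 dma_addr_t addr, unsigned int len);  /* placeholder */
static void acq_start_dma(struct acq_dev *dev);                       /* placeholder */

static ssize_t acq_read(struct file *filp, char __user *ubuf,
                        size_t len, loff_t *off)
{
    struct acq_dev *dev = filp->private_data;
    int nr_pages = DIV_ROUND_UP(offset_in_page(ubuf) + len, PAGE_SIZE);
    struct page **pages;
    struct sg_table sgt;
    struct scatterlist *sg;
    int i, pinned, nents;
    ssize_t ret;

    pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
    if (!pages)
        return -ENOMEM;

    /* (a) pin the user pages so they cannot be paged out during DMA */
    pinned = get_user_pages_fast((unsigned long)ubuf, nr_pages, FOLL_WRITE, pages);
    if (pinned <= 0) {
        ret = pinned ? pinned : -EFAULT;
        goto out_free;
    }

    /* (b) build a scatter-gather list and map it for the device */
    if (sg_alloc_table_from_pages(&sgt, pages, pinned,
                                  offset_in_page(ubuf), len, GFP_KERNEL)) {
        ret = -ENOMEM;
        goto out_unpin;
    }
    nents = dma_map_sg(&dev->pdev->dev, sgt.sgl, sgt.nents, DMA_FROM_DEVICE);

    /* (c)+(d) hand one descriptor per segment to the DMA engine */
    for_each_sg(sgt.sgl, sg, nents, i)
        acq_write_descriptor(dev, sg_dma_address(sg), sg_dma_len(sg));
    acq_start_dma(dev);

    /* block until the card's completion interrupt wakes us */
    ret = wait_event_interruptible(dev->done_wq, dev->dma_done);
    if (ret == 0)
        ret = len;

    dma_unmap_sg(&dev->pdev->dev, sgt.sgl, sgt.nents, DMA_FROM_DEVICE);
    sg_free_table(&sgt);
out_unpin:
    for (i = 0; i < pinned; i++) {
        set_page_dirty_lock(pages[i]);   /* the device wrote into these pages */
        put_page(pages[i]);
    }
out_free:
    kvfree(pages);
    return ret;
}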

netfilter speed limit

I am testing a netfilter application on a 1 Gbit/sec network: I have a user-space function that sends verdicts to a netlink socket, and another user-space routine that performs asynchronous reads of marked packets from the netlink socket and runs a custom filter function. For bitrates above 300 Mbps I see netlink socket read errors, "no buffer space available". I take this to be a netlink buffer overflow.
Can someone recommend an approach to improving netlink throughput on a high-speed network? My kernel version is 2.6.38.
There is a socket between the kernel and user space, and packets are passed up to user space through it. That socket's buffer is filling up, which is why you get the error.
In C you can set the socket buffer size and increase it (this works for netlink sockets too).
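A minimal sketch of enlarging the receive buffer on an already-open netlink socket (nl_fd stands for whatever descriptor your reader uses; SO_RCVBUFFORCE needs CAP_NET_ADMIN but ignores the rmem_max limit):

#include <stdio.h>
#include <sys/socket.h>

#ifndef SO_RCVBUFFORCE
#define SO_RCVBUFFORCE 33              /* value on most architectures */
#endif

static int grow_netlink_rcvbuf(int nl_fd, int bytes)
{
    /* Try the privileged option first (not capped by net.core.rmem_max),
     * then fall back to the ordinary SO_RCVBUF. */
    if (setsockopt(nl_fd, SOL_SOCKET, SO_RCVBUFFORCE, &bytes, sizeof(bytes)) == 0)
        return 0;
    if (setsockopt(nl_fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == 0)
        return 0;
    perror("setsockopt");
    return -1;
}

Raising net.core.rmem_max / net.core.rmem_default with sysctl, and draining the socket faster (larger reads, a dedicated reader thread), help as well.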

Why would a device driver cause page faults?

I have a Windows console application that uses a parallel IO card for high speed data transmission. (General Standards HPDI32ALT)
My process runs in user mode; however, I am sure that somewhere behind the device's API there is some kernel-mode driver activity (PCI DMA transfers, reading device status registers, etc.). The working model is roughly this:
at startup: I request a pointer to an IO buffer from API.
in my main loop:
block on API waiting for room in device's buffer (low watermark)
fill the IO buffer with transmission data
begin transmission to device by passing it the pointer to the IO buffer (during this time the API uses DMA on PCI bus to move the data to the card)
block on API waiting for IO to complete
The application appears to work correctly, with the proper data rate and sustained throughput over long periods of time. However, when I look at the process in the Sysinternals tool Process Explorer, I see a large number of page faults (~6k per second). I am moving ~30MB/s to the card.
I have plenty of RAM and am reasonably sure the page faults are not disk IO related.
Any thoughts on what could be causing the page faults? I also have a receive side to this application that uses an identical IO card in receive mode. The receive-mode use of the API does not cause a large number of page faults.
Could the act of moving the IO buffer to kernel mode cause page faults?
So your application asks the driver for a memory buffer and you copy the send data into that buffer? That's a pretty strange model; usually you let the application manage the buffers.
If you're faulting 6K pages/s and you're only transferring 30MB/s, you're getting almost a page fault for every page you transfer. When you get the data buffer from the driver, is it always zero-filled? I'm wondering if you're getting demand-zero faults for every transfer.
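One way to test that theory is to touch (and optionally lock) the buffer once before the transfer loop and watch whether the fault rate in Process Explorer drops; a sketch, where io_buf/io_len stand for the pointer and size returned by the card's API:

#include <windows.h>
#include <string.h>

static void prefault_io_buffer(void *io_buf, size_t io_len)
{
    /* Touch every page once so any demand-zero faults happen here,
     * not inside the transfer loop. */
    memset(io_buf, 0, io_len);

    /* Optionally pin the pages into the working set so they are not
     * trimmed and re-faulted later; if this fails, the working-set
     * quota may need raising with SetProcessWorkingSetSize(). */
    VirtualLock(io_buf, io_len);
}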
-scott
