What may cause a limit on the SG_IO ioctl's maximum sector count per transfer?

I need to pass a direct ATA request (0x25, READ DMA EXT) to a hard drive, in order to disregard the max sector count limit (long story) and to bypass all possible OS caches, buffers, reorderings, and so on.
The HDIO_DRIVE_TASKFILE ioctl is no longer available because of libata.
I accomplished the goal with the SG_IO ioctl using ATA pass-through (SG_ATA_16). It works perfectly except for one problem: I can read a maximum of 8192 sectors in one command, and I need to read a full 32767 sectors.
max_hw_sectors_kb is 32767, so the drive supports a transfer this large.
max_sectors_kb was lower, but I raised it to 32767 as well, to no avail.
The scheduler is set to noop; no change.
I tried a gather buffer (iovec_count > 0, with iovecs properly set to consecutive buffer slices); no change.
Environment: Ubuntu 16.04/16.10/17.04 with standard kernels, SATA drive connected to a standard AHCI interface on an Intel chipset.
No matter what I do, starting at 8193 sectors the ioctl fails with an "Invalid argument" (EINVAL) error.
Where should I look? What else can cause a 4 MB (8192 × 512-byte) transfer cap?
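For reference, here is a minimal sketch of the SG_ATA_16 path described above. The device path, LBA, and sector count are placeholders; it assumes 512-byte logical sectors and a SAT-compliant libata that accepts the 16-byte ATA PASS-THROUGH CDB (opcode 0x85):

```c
/* Minimal SG_IO + ATA PASS-THROUGH (16) sketch: READ DMA EXT (0x25).
 * Assumptions: /dev/sdX is a placeholder, logical sectors are 512 bytes,
 * and libata's SAT layer accepts opcode 0x85. Error handling trimmed. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(void)
{
    uint64_t lba   = 0;
    uint16_t nsect = 8192;                  /* fails above 8192 here   */
    uint8_t *buf   = aligned_alloc(4096, (size_t)nsect * 512);
    uint8_t  sense[32] = {0};
    uint8_t  cdb[16]   = {0};

    cdb[0]  = 0x85;                         /* ATA PASS-THROUGH (16)   */
    cdb[1]  = (6 << 1) | 1;                 /* protocol DMA, extend=1  */
    cdb[2]  = (1 << 3) | (1 << 2) | 2;      /* t_dir=in, byt_blok=blocks,
                                               t_length=sector count   */
    cdb[5]  = (uint8_t)(nsect >> 8);        /* sector count 15:8       */
    cdb[6]  = (uint8_t)nsect;               /* sector count 7:0        */
    cdb[7]  = (uint8_t)(lba >> 24);         /* LBA 31:24               */
    cdb[8]  = (uint8_t)lba;                 /* LBA 7:0                 */
    cdb[9]  = (uint8_t)(lba >> 32);         /* LBA 39:32               */
    cdb[10] = (uint8_t)(lba >> 8);          /* LBA 15:8                */
    cdb[11] = (uint8_t)(lba >> 40);         /* LBA 47:40               */
    cdb[12] = (uint8_t)(lba >> 16);         /* LBA 23:16               */
    cdb[13] = 0x40;                         /* device: LBA mode        */
    cdb[14] = 0x25;                         /* READ DMA EXT            */

    sg_io_hdr_t io = {0};
    io.interface_id    = 'S';
    io.cmd_len         = sizeof(cdb);
    io.cmdp            = cdb;
    io.dxfer_direction = SG_DXFER_FROM_DEV;
    io.dxfer_len       = (unsigned int)nsect * 512;
    io.dxferp          = buf;
    io.sbp             = sense;
    io.mx_sb_len       = sizeof(sense);
    io.timeout         = 60000;             /* milliseconds            */

    int fd = open("/dev/sdX", O_RDONLY);    /* placeholder device      */
    if (fd < 0 || ioctl(fd, SG_IO, &io) < 0)
        perror("SG_IO");                    /* EINVAL above the cap    */
    return 0;
}
```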

Related

Max transfer size of SCSI Read(10) on a Physical Drive

As described here, IOCTL_STORAGE_QUERY_PROPERTY with StorageAdapterProperty can be used to get the maximum transfer size per SCSI Read(10) command.
In this code, 16 sectors are read from the starting LBA. I tried modifying the number, and in my Win7 environment the maximum is 256 sectors through SATA and 128 sectors through a SATA-USB bridge to an SSD, which matches the result from IOCTL_STORAGE_QUERY_PROPERTY with StorageAdapterProperty.
As far as I know, when installing an OS (Win7, Win10, macOS), a device can receive SCSI Read(10) commands of up to 2048 sectors. I wonder which layer limits the transfer size (operating system, device driver, ...), and whether there is any way to bypass that layer and send a SCSI Read(10) command longer than the limit in one go.
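A minimal sketch of that query (assuming \\.\PhysicalDrive0; MaximumTransferLength is the adapter's per-request byte cap, and MaximumPhysicalPages bounds the scatter/gather list):

```c
/* Sketch: query the adapter's max transfer size on Windows.
 * Assumes \\.\PhysicalDrive0; error handling trimmed. */
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileA("\\\\.\\PhysicalDrive0", 0,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    STORAGE_PROPERTY_QUERY q = {0};
    q.PropertyId = StorageAdapterProperty;
    q.QueryType  = PropertyStandardQuery;

    STORAGE_ADAPTER_DESCRIPTOR d = {0};
    DWORD n = 0;
    if (DeviceIoControl(h, IOCTL_STORAGE_QUERY_PROPERTY,
                        &q, sizeof(q), &d, sizeof(d), &n, NULL)) {
        printf("MaximumTransferLength: %lu bytes\n",
               (unsigned long)d.MaximumTransferLength);
        printf("MaximumPhysicalPages:  %lu\n",
               (unsigned long)d.MaximumPhysicalPages);
    }
    CloseHandle(h);
    return 0;
}
```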

STM32F411: I need to send a lot of data over USB at high speed

I'm using an STM32F411 with the USB CDC library, and the maximum speed I get with this library is ~1 Mb/s.
I'm building a project with 8 microphones connected to ADC lines (this part works fine). I need 16-bit samples, so I increase the effective resolution by summing the first 16 readings from each line (the ADC gives only 12-bit samples). I need 96k 16-bit samples per line, i.e. 0.768M samples across all 8 lines. This data needs 12000 Kb of space, but the STM32 has only 128 Kb of SRAM, so I decided to send about 120 chunks of ~100 Kb each within one second.
The conclusion is that I need ~11.72 Mb/s to send this (96000 samples × 16 bits × 8 lines = 12,288,000 bits per second ≈ 11.72 Mb/s).
The problem is that I'm unable to do that, because CDC USB limits me to ~1 Mb/s.
The question is how to increase the USB speed to 12 Mb/s on the STM32F4; I need a pointer or a library.
Or should I set up an "audio device" class in CubeMX instead?
If the lowercase b in your question means bytes, the answer is: it is not possible, as your micro has a Full Speed USB peripheral whose maximum raw speed is 12 Mbit/s.
If it means bits, your ~1 Mb/s assumption is wrong, but you will still not reach 12 Mbit/s of payload throughput.
You may try to write your own USB class (only if b means bits), but I'm afraid you will not find a ready-made library, and you would also need to write the device driver for the host computer.

ACP and DMA: how do they work together?

I'm using an ARM Cortex-A53 platform; it has an ACP component, and I'm trying to use DMA to transfer data through the ACP.
According to the ARM TRM, if I understand it correctly, the DMA transfer size is limited to 64 bytes per transfer when using the ACP.
If so, doesn't this limitation make DMA unusable? It seems pointless to configure a DMA descriptor only to transfer 64 bytes at a time.
Or should the DMA automatically divide its transfer length into many ACP-size-limited (64-byte) packets, without any software intervention?
I need an expert to explain how the ACP and DMA work together.
Something in the interface from the DMA to the ACP's AXI port should automatically divide the transfer length as needed into transfers of appropriate size. For the Cortex-A53 ACP, AXI transfers are limited to 64 B (perhaps intentionally one cache line).
From https://developer.arm.com/documentation/ddi0500/e/level-2-memory-system/acp/transfer-size-support :
x-byte INCR request characterized by: (some list of limitations)
Note the use of INCR instead of FIXED: INCR automatically increments the address according to the size of each transfer, while FIXED does not. This makes it simple for the peripheral to break a large transfer into a series of multiple INCR transfers.
However, do note that on the Cortex-A53 the transfer size (x in the quote) is fixed at 16- or 64-byte aligned transfers. If the DMA sends an inappropriately sized transfer (because it is misconfigured or the correct size is unsupported), the AXI port will respond with a SLVERR. If the buffer is not appropriately aligned, I think this also causes a SLVERR.
Lastly, the on-chip network routing must connect the DMA to the ACP, which is decided at chip design time. In my experience this is more commonly done for network accelerators and FPGA fabric glue, and less often for low-speed peripherals like UART/SPI/I2C.
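To make the dividing concrete, here is a schematic sketch of software splitting a buffer into 64-byte, 64-byte-aligned INCR bursts for the ACP. The descriptor layout and dma_submit() are hypothetical; a real DMA IP has its own programming model:

```c
/* Schematic only: split a buffer into 64-byte, 64-byte-aligned bursts
 * for an ACP-routed DMA. dma_desc_t and dma_submit() are hypothetical;
 * a real DMA IP defines its own descriptor format and registers. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACP_BURST 64u /* Cortex-A53 ACP: 16- or 64-byte aligned transfers */

typedef struct { uintptr_t src, dst; uint32_t len; } dma_desc_t;
extern void dma_submit(const dma_desc_t *d); /* hypothetical */

static void acp_dma_copy(uintptr_t src, uintptr_t dst, size_t len)
{
    /* A misaligned or odd-sized request would raise an AXI SLVERR. */
    assert(src % ACP_BURST == 0 && dst % ACP_BURST == 0);
    assert(len % ACP_BURST == 0);

    for (size_t off = 0; off < len; off += ACP_BURST) {
        dma_desc_t d = { src + off, dst + off, ACP_BURST };
        dma_submit(&d); /* one 64-byte INCR burst per descriptor */
    }
}
```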

FTDI driver (Windows) FT_Write() issue with large (1KB) chunk - (version 2.12.16.0)

My PC application sends a file (2 MB) to an embedded device in chunks of 1 KB.
I use the FTDI Windows driver and the classic FT_Write() API function, as my code is cross-platform.
Note: the issues below appear when I use a 1 KB chunk size; smaller chunks (I tried 64 bytes) work fine.
The problem is that the function returns "0 bytes sent" every couple of hundred packets and gets stuck. I found a workaround: purging both the TX and RX queues, followed by a ResetDevice() call, recovers the chip. It still happens every couple of hundred packets, but at least I can send the whole 2 MB file.
But when I use a USB isolator (http://www.bb-elec.com/Products/USB-Connectivity/USB-Isolators/Compact-USB-Port-Guardian.aspx), the workaround fails. In any case, I don't believe my workaround is a graceful solution.
Note: I use a large chunk because of a suggestion I found in the FTDI application note below:

"When writing data to an FTDI device, as much data as possible should be buffered in the application and written to the device in a single write function call (either WriteFile for a VCP application using the Win32 API, FT_Write if using the D2XX classic interface or FT_WriteFile if using the D2XX FT_W32 interface). The result of this is that the data will be written to the device with 64 bytes per USB packet."
Any idea what the proper fix for these issues is? Is it related to FTDI initialization? My driver version is 2.12.16.0 (3/9/2016).
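The workaround currently looks roughly like this sketch (the chunk size and recovery policy are ad hoc, which is exactly what feels wrong about it):

```c
/* Sketch of the workaround: write in chunks; on a 0-byte write, purge
 * both queues and reset the device, then retry the same chunk.
 * Chunk size and the retry policy are arbitrary. */
#include "ftd2xx.h"

#define CHUNK 1024

static int send_file(FT_HANDLE h, const unsigned char *data, DWORD len)
{
    DWORD off = 0;
    while (off < len) {
        DWORD want = (len - off < CHUNK) ? (len - off) : CHUNK;
        DWORD sent = 0;
        if (FT_Write(h, (LPVOID)(data + off), want, &sent) != FT_OK)
            return -1;
        if (sent == 0) {                        /* stuck: recover chip */
            FT_Purge(h, FT_PURGE_RX | FT_PURGE_TX);
            FT_ResetDevice(h);
            continue;                           /* retry this chunk    */
        }
        off += sent;
    }
    return 0;
}
```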
I saw the same problem of FT_Write() not working right when too much data is passed, while working on the library for my USB device Nusbio. I mostly work in synchronous bit-bang mode rather than UART, but after all it is the same hardware, driver and API.
There are the USB 2.0 specification and the FTDI FT232RL specification, and then there is the reality of the electrons and the bits. The expected transfer speeds never really match, at least at first. In other words, it is complicated (see more below in my referenced blog post).
In 2015 I was under the impression that with the FTDI FT232RL chip a size of 384 bytes worked well, a number that comes from the chip datasheet (128-byte receive buffer and 256-byte transmit buffer). A size of 500 bytes would still work, but above 600 bytes things would not.
I later used the FT231X chip, which has larger buffers (1 KB: a 512-byte receive buffer and a 512-byte transmit buffer), and was able to transfer 1 KB and 2 KB buffers with FT_Write(), thereby more than doubling my transfer speed. But above 2 KB things would not work.
In 2016, having read everything you can read about FTDI's USB 2.0 Full Speed chips, I came to the conclusion that FT_Write() should support up to 64 KB (see the datasheets for the FT232RL, FT231X, FT232H, FT260 and FT4222).
I also did some research on driving a serial port from .NET faster than 115200 baud. Somehow I was able to update my C# library to send data in 32 KB buffers with FT_Write(), and it works with both the FT232RL and FT231X chips, but I can't tell you what changed. I probably did not completely understand the ins and outs of FTDI's USB 2.0 Full Speed technology.
For example, let's say you are using the FT232RL and transferring 384 bytes at a time with FT_Write(). Knowing that there is at least a 1-millisecond latency in USB 2.0 Full Speed whatever you do, from a USB point of view you are transferring at most 384 × 1000 / 1024 = 375 KB/s in theory. The question then is what baud rate your embedded device supports, and what baud rate you are actually using.
The FT232RL's maximum baud rate is 900,000 baud, which would give you only 900000 / (1 + 8 + 1) ≈ 87 KB/s. Right away you can tell there is going to be a problem; maybe the FTDI driver takes care of it, maybe not. I can't tell.
Redo the math based on the baud rate supported by your embedded device and a 384-byte buffer sent 1000 times per second, then slow down your USB writes with a sleep() to match your baud rate, as in the sketch below. That is where I would start.
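A rough sketch of that pacing idea (the baud rate and chunk size here are example values, not recommendations):

```c
/* Sketch: pace FT_Write() so USB never outruns the UART drain rate.
 * The baud rate and chunk size are example values only. */
#include <windows.h>
#include "ftd2xx.h"

static void paced_send(FT_HANDLE h, const unsigned char *data, DWORD len)
{
    const DWORD baud  = 115200;             /* your device's real rate */
    const DWORD chunk = 384;
    /* 10 bits on the wire per byte: 1 start + 8 data + 1 stop */
    const DWORD bytes_per_sec = baud / 10;
    const DWORD ms_per_chunk  = chunk * 1000 / bytes_per_sec + 1;

    for (DWORD off = 0; off < len; off += chunk) {
        DWORD want = (len - off < chunk) ? (len - off) : chunk;
        DWORD sent = 0;
        FT_Write(h, (LPVOID)(data + off), want, &sent);
        Sleep(ms_per_chunk);                /* throttle to baud rate   */
    }
}
```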

strange behaviour using mmap

I'm using Angstrom embedded Linux, kernel v2.6.37, based on the Technexion distribution.
DM3730 SoC, TDM3730 module, custom baseboard.
CodeSourcery toolchain v2010.09-50.
Here is the data flow in my system:
http://i.stack.imgur.com/kPhKw.png
The FPGA generates incrementing data, and the kernel reads it via GPMC DMA. GPMC packet size = 512 data samples. Buffer size = 61440 32-bit samples (= 60 RAM pages).
The DMA buffer is allocated with dma_alloc_coherent() and mapped to userspace with an mmap() call. The user application reads data directly from the DMA buffer and saves it to NAND using fwrite(). The user reads 4096 samples at a time.
And what do I see in my file? http://i.stack.imgur.com/etzo0.png
The red line marks the first border (wrap) of the ring buffer. Oops! Small chunks (~16 samples) start to go missing after the border, and their values are exactly the "old" values of the corresponding buffer positions. But WHY? 16 samples is much smaller than both the DMA packet size and the user read size, so it cannot be a pointer mismatch.
I guess some mmap() feature is hiding somewhere. I have tried different flags for mmap(), such as MAP_LOCKED, MAP_POPULATE and MAP_NONBLOCK, with no success. I completely misunderstand this behaviour :(
P.S. When I use copy_to_user() from the kernel instead of mmap() and zero-copy access, there is no such behaviour.
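One guess worth checking in the driver, since the mmap() implementation is not shown: if the user mapping is created with a plain remap_pfn_range(), it is cacheable, while the dma_alloc_coherent() buffer is not, and stale cache lines would look exactly like "old" values. A kernel-side sketch with hypothetical driver fields (dma_mmap_coherent() is available on ARM in this kernel era):

```c
/* Sketch: map a dma_alloc_coherent() buffer to userspace with matching
 * (uncached/coherent) attributes, so user reads cannot be satisfied
 * from stale cache lines. struct mydev and its fields are hypothetical. */
#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/mm.h>

struct mydev {                      /* hypothetical driver state       */
    struct device *dev;
    void *cpu_addr;                 /* from dma_alloc_coherent()       */
    dma_addr_t dma_handle;
    size_t size;
};

static int mydrv_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct mydev *md = filp->private_data;
    size_t want = vma->vm_end - vma->vm_start;

    if (want > md->size)
        return -EINVAL;

    /* Gives the VMA the same memory attributes as the kernel-side
     * coherent buffer, unlike a bare remap_pfn_range(). */
    return dma_mmap_coherent(md->dev, vma, md->cpu_addr,
                             md->dma_handle, want);
}
```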
