I tried the example provided by Atmel's ASF for a USB mass storage host to write/read a file to/from a USB flash drive. When reading a file, I'm getting 1.7 MB/s. I have tried a lot of solutions, including:
- Made sure it's running in high-speed mode and that the board is running at 300 MHz.
- Tried increasing the buffer size passed to f_read, and managed to raise the speed to 2.2 MB/s (a sketch of such a read loop follows this question).
- Tested the file system itself (FAT32) on a virtual-memory example and got 30 MB/s on read operations (not sure whether that helps for debugging purposes).
- Ran the same program reading from an SD card instead, which gave 1 MB/s.
- Also tested it in full-speed mode, which gave 0.66 MB/s.
- One extreme idea I tested was running two boards, one in host mode and the other in device mode, and measuring the raw USB transfer speed: 1.66 MB/s using bulk transfers (running in high speed).
- Tried the Keil examples, which gave worse results than Atmel's.
Can someone please suggest solutions? I've read all the documentation on USB communication provided by Atmel and Keil.
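For reference, the buffer-size experiment boils down to a read loop like the sketch below (assuming FatFs, since f_read is mentioned; the file name and buffer size are placeholders to vary):

```c
/* Sketch of a buffered read loop with FatFs: a larger buffer means
 * fewer f_read calls, hence fewer but bigger USB transactions. */
#include "ff.h"                  /* FatFs */

#define BUF_SIZE (16u * 1024u)   /* try 4 KB, 16 KB, 64 KB ... */

static BYTE buf[BUF_SIZE];

FRESULT read_test(const TCHAR *path, DWORD *total)
{
    FIL f;
    UINT br;
    FRESULT res = f_open(&f, path, FA_READ);
    if (res != FR_OK)
        return res;

    *total = 0;
    do {
        res = f_read(&f, buf, sizeof buf, &br);  /* one big request */
        *total += br;
        /* process buf[0..br-1] here */
    } while (res == FR_OK && br == sizeof buf);  /* stop at EOF or error */

    f_close(&f);
    return res;
}
```

Note, though, that enlarging this buffer cannot help beyond a point if the layer below still splits every request into single sectors, which is exactly what the answer below describes.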
Atmel's mass storage USB stack lacks multi-sector reads and writes, even though the SCSI layer does implement the proper command to fetch many sectors in a row (see uhi_msc_scsi_read_10). The abstraction layer reading data above the SCSI commands (uhi_msc_mem_read_10_ram and uhi_msc_mem_write_10_ram, for instance) only reads sector by sector, yielding very poor performance.
In order to achieve USB high-speed performance (~35 MB/s) you will have to hack these functions (and all the layers above) to use multi-sector reads/writes.
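A sketch of the change (the types and the read_sectors() wrapper are hypothetical stand-ins, since the exact ASF signatures vary between versions; the point is the shape of the loop):

```c
#include <stdint.h>
#include <stdbool.h>

#define SECTOR_SIZE 512u

/* Hypothetical wrapper: one SCSI READ(10), via uhi_msc_scsi_read_10,
 * covering nb_sectors consecutive sectors. */
extern bool read_sectors(void *lun, uint32_t lba, uint8_t *dst,
                         uint16_t nb_sectors);

/* What the stock abstraction layer effectively does: one command per
 * sector, so every 512 bytes pays a full SCSI + USB round trip. */
bool read_slow(void *lun, uint32_t lba, uint8_t *dst, uint16_t n)
{
    for (uint16_t s = 0; s < n; s++)
        if (!read_sectors(lun, lba + s, dst + (uint32_t)s * SECTOR_SIZE, 1))
            return false;
    return true;
}

/* The fix: one READ(10) for the whole run, amortising the overhead. */
bool read_fast(void *lun, uint32_t lba, uint8_t *dst, uint16_t n)
{
    return read_sectors(lun, lba, dst, n);
}
```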
No amount of experience seems sufficient to answer the strange issues that pop up with serial communication buses. We are trying to copy data from an external flash into SRAM. Below are the details of how we have configured our system:
- Controller: RH850 (D1M1), PLL speed at 60 MHz
- External flash: IS25LP128
- SPI speed: 5 MHz (clock observed using an oscilloscope)
- Data size: 4 MB
Now, in theory, if my SPI is operating at 5 MHz it should copy 5 Mbit/s. We are trying to copy 4 MB, which is 32 Mbit, so in theory the transfer should take about 7 seconds. OK, we have some implementation overhead: my driver code can accept only up to 64 KB per read call, so we chose to copy 40 KB about 100 times in a for loop. Let me add a whopping 5 seconds of overhead (sorry, RH850!), so 12 seconds in total; well, let's add some more buffer and call 15 seconds the comfort zone (the maximum expected!). But when we run the code, it takes a whole 40 seconds to finish the copy. We have checked the clock: it is 5 MHz as expected, and at least the clocks are continuous.
Has anyone here faced this? What should we look into? I know I have a flash driver provided by my vendor to dig into, but before I do that I wanted to be sure. Any help will be really appreciated.
At first glance, I can think of at least ten things which may be responsible for this. One thing I'm sure of: this problem is complex, and there is no simple one-line solution. The main suspect is what is not yours: the flash driver. So isolate the pieces one by one and verify them, starting from the bottom.
Is there an operating system? Is DMA in use? Any issue with memory or resource arbitration/sharing? Are interrupts in use, or polling? Are any higher-priority jobs running? Is data read from registers or memory-mapped? Does the driver use a generic SPI peripheral or a dedicated serial-flash controller (I don't know the RH850; some microcontrollers have one)?
Your post is not precise enough, so maybe these questions will help you. What would I do? Write my own driver!
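As a first step in that isolation, it is cheap to time each driver call; a minimal sketch, where flash_read() and get_ms() are hypothetical placeholders for the vendor driver call and a millisecond timebase:

```c
/* Time each 40 KB driver call. On a continuous 5 MHz clock, 40 KB
 * (327,680 bits) should take roughly 66 ms on the wire, so chunks
 * reporting much more point at driver overhead or inter-byte gaps. */
#include <stdint.h>
#include <stdio.h>

#define CHUNK  (40u * 1024u)   /* 40 KB per driver call */
#define CHUNKS 100u            /* ~4 MB total */

extern int flash_read(uint32_t addr, uint8_t *dst, uint32_t len); /* hypothetical */
extern uint32_t get_ms(void);                                     /* hypothetical */

static uint8_t sram_buf[CHUNK];

void profile_copy(void)
{
    uint32_t min = UINT32_MAX, max = 0, total = 0;
    for (uint32_t i = 0; i < CHUNKS; i++) {
        uint32_t t0 = get_ms();
        flash_read(i * CHUNK, sram_buf, CHUNK);
        uint32_t dt = get_ms() - t0;
        total += dt;
        if (dt < min) min = dt;
        if (dt > max) max = dt;
    }
    printf("total %lu ms, per-chunk min %lu / max %lu ms\n",
           (unsigned long)total, (unsigned long)min, (unsigned long)max);
}
```

If every chunk is uniformly slow, look at the driver's transfer loop (polling byte by byte?); if only some are, look for preemption or bus arbitration.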
Today I figured out something that really made me wonder. I have a Samsung Exynos 4412 (a quad-core Cortex-A9 SoC with a Mali-400 GPU). I tried to get a texture from the GPU to the CPU by every method I know, and it is really slow. The same slowdown also happens with modern CPUs and GPUs on the PC platform. What puzzles me is that the Exynos is an SoC where the CPU and GPU share the same memory, so I shouldn't have to care about the bus.
Why does that happen?
I have tried many methods of transferring the data from the GPU to the CPU: glReadPixels, glTexSubImage2D, glTexImage2D, and FBOs.
The frame rate drops from 40 FPS to 7 FPS when using any of those methods, on a 1024x1024, 24-bit texture.
Possible answers taken from the OpenGL forums:
- Latency: it takes time for the read command to reach the hardware.
- OpenGL command buffering: reading the data requires the OpenGL driver to complete all outstanding commands.
- Hardware buffering: the hardware must empty all GPU core pipelines before doing a readback.
Possible solution:
- Copy the data internally on the GPU to another location and read it back some number of frames after computing it. This should allow everything writing to that location to have completed before you attempt to read it.
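In OpenGL that idea is usually implemented with pixel buffer objects; a sketch, assuming desktop GL or OpenGL ES 3.0 (the Mali-400 is ES 2.0 only, so this exact API is unavailable there, but the read-one-frame-late idea is the same):

```c
/* Delayed readback with two PBOs: glReadPixels into a bound pack PBO
 * returns without stalling; we map the buffer filled one frame earlier,
 * by which time the GPU has normally finished writing it. GL_RGBA is
 * used because glReadPixels is guaranteed to support it. */
#include <string.h>
#include <GL/glew.h>

#define NUM_PBOS 2
static GLuint pbos[NUM_PBOS];
static unsigned frame = 0;

void init_pbos(int width, int height)
{
    glGenBuffers(NUM_PBOS, pbos);
    for (int i = 0; i < NUM_PBOS; i++) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, (GLsizeiptr)width * height * 4,
                     NULL, GL_STREAM_READ);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* Call once per frame: start an asynchronous readback into one PBO and
 * map the other one, which was filled last frame. Returns 1 on success. */
int read_pixels_async(int width, int height, unsigned char *dst)
{
    unsigned cur  = frame % NUM_PBOS;
    unsigned prev = (frame + 1) % NUM_PBOS;
    frame++;

    /* Asynchronous copy into the current PBO (no CPU stall). */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[cur]);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

    /* Map the PBO written one frame ago and copy it out. */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[prev]);
    void *src = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                 (GLsizeiptr)width * height * 4,
                                 GL_MAP_READ_BIT);
    if (src) {
        memcpy(dst, src, (size_t)width * height * 4);
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    return src != NULL;
}
```

The first frame maps a PBO that has not been filled yet, so real code should skip or zero that result; the steady-state cost is one frame of latency instead of a pipeline flush.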
I'm working with a board with an ARM-based processor running Linux (3.0.35). The board has 1 GB of RAM and is connected to a fast SSD and to a 5 MP camera.
My goal is to capture high-resolution images and write them directly to disk.
All goes well until I try to save a very long video (over 1 GB of data).
After saving a large file, it seems that I'm unable to reload the camera driver: it fails to allocate a large enough DMA memory block for streaming (the call to dma_alloc_coherent() fails).
I narrowed it down to this scenario: Linux boots (with most of the memory available), I write random data into a large file (>1 GB), and then loading the camera driver fails.
To my question:
When I open a file for writing, write a large amount of data, and close the file, isn't the memory that was used for writing the data to disk supposed to be freed?
I can understand why the memory becomes fragmented during disk access, but once the transactions to the disk have completed, why is the memory still so fragmented that I cannot allocate 15 MB of contiguous RAM?
Thanks
[...] close the file, isn't the memory which was used for writing the data to HD supposed to be freed?
No, it will be cached; you can check /proc/meminfo for this. Whether dma_alloc_coherent() uses only free memory is a good question.
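A quick way to see the cache at work (a sketch; dropping caches requires root): print the "Cached" line from /proc/meminfo, then sync and ask the kernel to drop clean caches by writing "3" to /proc/sys/vm/drop_caches, then print it again.

```c
/* Show how much of RAM is page cache before and after dropping caches. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void print_cached(void)
{
    char line[128];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return;
    while (fgets(line, sizeof line, f))
        if (strncmp(line, "Cached:", 7) == 0)
            fputs(line, stdout);
    fclose(f);
}

int main(void)
{
    print_cached();
    sync();                                        /* flush dirty pages */
    FILE *dc = fopen("/proc/sys/vm/drop_caches", "w");
    if (dc) {                                      /* needs root */
        fputs("3", dc);                            /* pagecache + dentries + inodes */
        fclose(dc);
    }
    print_cached();
    return 0;
}
```

If the camera driver loads fine right after dropping caches, the page cache (and the fragmentation it leaves behind) is confirmed as the culprit.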
As compilation is mostly reading small files, I wonder whether buying a fast USB key to work from could speed up compilation compared to a standard SATA drive, at a lower price than an SSD (16 GB keys are under $30).
Am I right?
PS: I'm working with .NET, VS and the whole MS toolchain, but I think this holds for all languages...
[Edit] According to Tom's Hardware, a recent HDD's average read access time is about 14 ms, whereas even the slowest USB key's is 1.5 ms.
That's why I'm wondering whether my hypothesis is right.
One year later, the cost of SSD drives has melted.
This is now the simplest solution: you can host the operating system and all the dev tools on a fast SSD drive, which costs less than $120 (€100).
The specifications on Wikipedia's SATA page suggest otherwise: SATA revision 3.0 is 50% faster than USB 3.0, according to the spec.
A Python program I created is IO-bound. The majority of the time (over 90%) is spent in a single loop which repeats ~10,000 times. In this loop, ~100 KB of data is generated and written to a temporary file; it is then read back by another program, which collects statistics about that data. This is the only way to pass data into the second program.
Since this is the main bottleneck, I thought that moving the temporary file from my main HDD to a (~40 MB) RAMdisk (with over 2 GB of RAM free) would greatly increase the IO speed for this file and so reduce the run-time. However, I obtained the following results (each averaged over 20 runs):
Test data 1: Without RAMdisk - 72.7s, With RAMdisk - 78.6s
Test data 2: Without RAMdisk - 223.0s, With RAMdisk - 235.1s
It would appear that the RAMdisk is slower than my HDD.
What could be causing this?
Are there any alternatives to using a RAMdisk for faster file IO?
Your operating system is almost certainly buffering/caching disk writes already, so it's not surprising that the RAM disk is so close in performance.
Without knowing exactly what you're writing or how, we can only offer general suggestions. Some ideas:
- If you have 2 GB of RAM you probably have a decent processor, so you could write this data to a filesystem with compression enabled. That trades I/O operations for CPU time, assuming your data is amenable to it.
- If you're doing many small writes, combine them and write larger pieces at once (can we see the source code?); see the sketch after this list.
- Are you removing the 100 KB file after use? If you don't need it, delete it; otherwise the OS may be forced to flush it to disk.
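A sketch of the combining idea (the file name, item size, and item contents are made up for illustration):

```c
/* Accumulate items in a RAM buffer, then issue one large write instead
 * of one write() per item. */
#include <stdio.h>
#include <string.h>

#define ITEM_SIZE   100
#define NUM_ITEMS   1000
#define BATCH_BYTES (ITEM_SIZE * NUM_ITEMS)

int main(void)
{
    static char batch[BATCH_BYTES];
    char item[ITEM_SIZE];

    /* Build all items in memory first... */
    for (int i = 0; i < NUM_ITEMS; i++) {
        memset(item, 'x', sizeof item);  /* stand-in for generated data */
        memcpy(batch + (size_t)i * ITEM_SIZE, item, ITEM_SIZE);
    }

    /* ...then hit the filesystem once. */
    FILE *f = fopen("/tmp/batch.dat", "wb");
    if (!f)
        return 1;
    fwrite(batch, 1, sizeof batch, f);
    fclose(f);
    return 0;
}
```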
Can you write the data out in batches rather than one item at a time? Are you caching resources like open file handles, or cleaning them up each time? Are your disk writes blocking; could you use background threads to saturate IO without affecting compute performance?
I would look at optimising the disk writes first, and then look at faster disks once that is done.
I know that Windows is very aggressive about caching disk data in RAM, and 100 KB would fit easily. The writes go straight to the cache and are then perhaps flushed to disk via a non-blocking write, which lets the program continue. The RAM disk probably doesn't support non-blocking operations, because it expects those operations to be quick and not worth the bother.
By reducing the amount of memory available to programs and to caching, you will increase the amount of disk I/O for paging, even if only slightly.
This is all speculation on my part, since I'm not familiar with the kernel or the drivers. I also speculate that Linux would behave similarly.
In my tests I found that not only the batch size but also the nature of the data itself affects overall performance. I managed to get write times 5x better than an SSD in only one scenario: writing a 100 MB chunk of pre-cooked random bytes to the RAM drive. Writing more "predictable" data, like the letter "a" repeated or the current datetime, yields quite the opposite result: the SSD is always faster or equal. So my guess is that the operating system (Windows 7 in my case) does a lot of caching and optimisation.
It looks like the most hindering case for a RAM drive is performing lots of small writes instead of a few big ones, while the RAM drive shines at writing large amounts of hard-to-compress data.
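A minimal sketch of that experiment, if you want to reproduce it (the path is a placeholder; point it at the RAM drive and at the SSD in turn):

```c
/* Write the same number of bytes twice, once as repeated 'a's and once
 * as pseudo-random bytes, timing each. Note: clock() is wall-clock time
 * on Windows; on POSIX it measures CPU time, so use a wall-clock timer
 * there instead. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE (100u * 1024u * 1024u)   /* 100 MB, as in the post */

static double time_write(const char *path, const unsigned char *buf)
{
    clock_t t0 = clock();
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1.0;
    fwrite(buf, 1, SIZE, f);
    fflush(f);
    fclose(f);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    unsigned char *buf = malloc(SIZE);
    if (!buf)
        return 1;

    for (size_t i = 0; i < SIZE; i++)   /* highly compressible data */
        buf[i] = 'a';
    printf("aaa:    %.2f s\n", time_write("R:\\test.bin", buf));

    srand(42);                          /* pre-cooked random bytes */
    for (size_t i = 0; i < SIZE; i++)
        buf[i] = (unsigned char)rand();
    printf("random: %.2f s\n", time_write("R:\\test.bin", buf));

    free(buf);
    return 0;
}
```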
I had the same mind-boggling experience, and after many tries I figured it out.
When the ramdisk is formatted as FAT32, then even though benchmarks show high numbers, real-world use is actually slower than an NTFS-formatted SSD.
But an NTFS-formatted ramdisk is faster in real life than the SSD.
I join the people having problems with RAM disk speeds (only on Windows).
The SSD I have can write 30 GiB in one big block (dumping a 30 GiB RAM array) at 550 MiB/s (around 56 seconds to write the 30 GiB), when the write is issued as a single statement in the source code.
The RAM disk (ImDisk) I have writes the same 30 GiB block (dumped from a 30 GiB RAM array) at a bit less than 100 MiB/s (around 5 minutes and 13 seconds), again with the write issued as a single statement.
I also did another RAM test: a sequential direct write from source code (one byte per loop pass) to a 30 GiB RAM array (I have 64 GiB of RAM) reaches nearly 1.3 GiB/s (1298 MiB per second).
Why on earth is a RAM disk on Windows so slow for one big sequential write?
This low write speed clearly belongs to RAM disks on Windows: I tested the same concept on Linux with its native RAM disk, and it can write at nearly one gigabyte per second.
Note that I also tested SoftPerfect and other RAM disks on Windows; the speeds are all about the same, never more than about one hundred megabytes per second.
Windows versions tested: 10 and 11 (both Home and Pro, 64-bit), with the RAM disk formatted as exFAT and as NTFS. Since the RAM disk speed was so slow, I tried to find a Windows version where it behaved normally, but found none.
Linux kernel tested: only 5.15.11; since the native RAM disk speed was normal there, I did not test any other kernel.
I hope this helps other people, since knowledge is the basis for solving a problem.