My program on Linux got a drastic speed increase when I called fcntl(fd, F_SETPIPE_SZ, size). fd is a pipe to a child process I created with fork+execv. I raised the pipe capacity from 64K to 1MB, which seems to be the Linux maximum without root permissions.
I wrote a test to see how big the pipe is on Mac: it's also 64K, but I can't figure out how to increase the pipe size. Does anyone know how? I'm using an M2 and Ventura.
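For reference, the Linux version looks roughly like this (a minimal sketch; my real code has more error handling):

    /* Linux-only: F_SETPIPE_SZ does not exist on macOS. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) < 0) { perror("pipe"); return 1; }

        /* Request a 1 MiB capacity; the kernel may round up and returns
         * the actual size, capped by /proc/sys/fs/pipe-max-size for
         * unprivileged processes. */
        int actual = fcntl(fds[1], F_SETPIPE_SZ, 1024 * 1024);
        if (actual < 0) perror("F_SETPIPE_SZ");
        else printf("pipe capacity is now %d bytes\n", actual);

        close(fds[0]);
        close(fds[1]);
        return 0;
    }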
As far as I know, macOS doesn't have a similar option. However, if your end goal is to speed up your program, it might be worth taking a closer look at why increasing the pipe capacity on Linux improves performance so much, as that seems to indicate an underlying problem elsewhere in the program.
An example off the top of my head: if you are sending large chunks of data (>64K), the reading side may not handle short reads correctly and may hang for some amount of time when a partial chunk arrives. With a small pipe buffer this happens more often, so increasing the capacity would improve performance without actually fixing the root problem.
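For illustration, a read loop that tolerates short reads might look like this minimal C sketch (read_full is a hypothetical helper, not from your code):

    #include <errno.h>
    #include <stddef.h>
    #include <unistd.h>

    /* Keep calling read() until `len` bytes have arrived or the pipe
     * closes; a single read() on a pipe may return fewer bytes than
     * requested, regardless of the pipe's capacity. */
    ssize_t read_full(int fd, void *buf, size_t len) {
        size_t got = 0;
        while (got < len) {
            ssize_t n = read(fd, (char *)buf + got, len - got);
            if (n < 0) {
                if (errno == EINTR) continue; /* interrupted: retry */
                return -1;                    /* real error */
            }
            if (n == 0) break;                /* EOF: writer closed */
            got += (size_t)n;
        }
        return (ssize_t)got;
    }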
I have a few Rust programs that read data from a file, do some operations, and write the results to another file.
Simple enough, but I've been having a big issue: my programs saturate the HDD's maximum I/O, and can only run when no other process is using the disk.
To be more precise, I'm currently using BufReader and BufWriter with a buffer size of 64 KB, which is great for reading and writing a file as quickly as possible. But reading at 250 MB/s while simultaneously writing at 250 MB/s tends to exceed what the HDD can manage. I'm all for speed, but these programs demand so much from the HDD that they seem to get stalled by the operating system (Windows) to let other processes work in peace. The files I'm reading and writing are generally a few gigabytes.
I know I could just add some form of wait() between each read/write operation on the disk, but I don't know how to find out at what speed I'm currently reading or writing, and I'm looking for a more principled solution. Even after reading the docs, I can't find an option on BufReader/BufWriter to limit HDD I/O to some arbitrary value (say, 100 MB/s).
I looked through the sysinfo crate, but it doesn't seem to help with finding the current or maximum I/O rates of the HDD.
Am I out of luck? Should I delve deeper into systems programming to find a solution, or is there already something that can prioritize my calls to the HDD, or simply limit them to some value calculated from the currently available I/O rate?
After reading a bit more on the subject, it seems that, apart from reading/writing a lot of data and measuring the resulting performance, you can't determine the HDD's maximum I/O rate during execution; you can only pick a constant that the rate is unlikely to exceed (see https://superuser.com/questions/795483/how-to-limit-hdd-write-speed-for-chosen-programs/795488#795488).
But you can still monitor disk activity, and combined with the constant guessed earlier, use wait() more accurately than by always limiting yourself to a fixed speed (here is a crate for Rust: https://github.com/myfreeweb/systemstat).
Prioritizing the process with the OS might be overkill, since I'm trying to slip in between other processes and share whatever resources are available at the time.
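To make that concrete, here is a minimal sketch of the throttling loop, written in C, but the structure maps directly onto a BufReader/BufWriter loop in Rust. The 100 MB/s target and 64 KB chunk size are arbitrary:

    #include <stdio.h>
    #include <time.h>

    #define CHUNK      (64 * 1024)          /* 64 KB per read/write */
    #define TARGET_BPS (100.0 * 1000000.0)  /* arbitrary 100 MB/s cap */

    /* Copy src to dst, sleeping whenever we get ahead of TARGET_BPS. */
    static void throttled_copy(FILE *src, FILE *dst) {
        char buf[CHUNK];
        double copied = 0.0;
        struct timespec t0;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        size_t n;
        while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
            fwrite(buf, 1, n, dst);
            copied += (double)n;

            struct timespec now;
            clock_gettime(CLOCK_MONOTONIC, &now);
            double elapsed = (now.tv_sec - t0.tv_sec)
                           + (now.tv_nsec - t0.tv_nsec) / 1e9;

            /* If we're ahead of schedule, sleep off the difference. */
            double ahead = copied / TARGET_BPS - elapsed;
            if (ahead > 0) {
                struct timespec pause = {
                    .tv_sec  = (time_t)ahead,
                    .tv_nsec = (long)((ahead - (time_t)ahead) * 1e9)
                };
                nanosleep(&pause, NULL);
            }
        }
    }

    int main(void) {
        throttled_copy(stdin, stdout);
        return 0;
    }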
I just stumbled onto this SO question and was wondering if there would be any performance improvement if:
The file was compared in blocks no larger than the hard disk sector size (1/2KB, 2KB, or 4KB)
AND the comparison was done multithreaded (or maybe even with the .NET 4 parallel stuff)
I imagine there being 2 threads: one that reads from the beginning of the file and another that reads from the end until they meet in the middle.
I understand that in this situation the disk I/O is going to be the slowest part, but if the reads never have to cross sector boundaries (which, in my twisted imagination, somehow eliminates any possible fragmentation overhead), then it may reduce head movement and hence result in better performance (maybe?).
Of course other factors could play in as well, such as single vs. multiple processors/cores or SSD vs. non-SSD, but with those aside: is the disk I/O speed, plus potentially sharing processor time, insurmountable? Or is my concept of computer theory completely off-base?
If you're comparing two files that are on the same drive, the only benefit you could get from multithreading is to have one thread reading (populating the next buffers) while another thread compares the previously read buffers.
If the files you're comparing are on different physical drives, then you can have two asynchronous reads going concurrently, one on each drive.
But your idea of having one thread read from the beginning and another from the end will make things slower, because seek time is going to kill you. The disk drive heads will continually seek from one end of the file to the other. Think of it this way: would it be faster to read a file sequentially from the start, or to read 64K from the front, then 64K from the end, then seek back near the start for the next 64K, and so on?
Fragmentation is an issue, to be sure, but excessive fragmentation is the exception, not the rule. Most files are going to be unfragmented, or only partially fragmented. Reading alternately from either end of the file would be like reading a file that's pathologically fragmented.
Remember, a typical disk drive can only satisfy one I/O request at a time.
Making single-sector reads will probably slow things down. In my tests of .NET I/O speed, reading 32K at a time was significantly faster (between 10 and 20 percent) than reading 4K at a time. As I recall (it's been some time since I did this), on my machine at the time, the optimum buffer size for sequential reads was 256K. That will undoubtedly differ for each machine, based on processor speed, disk controller, hard drive, and operating system version.
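If you want to find that sweet spot on a particular machine, a rough sketch like the following will do. The file name and buffer sizes are placeholders, and note that after the first pass the file will likely be served from the OS cache, so use a file larger than RAM (or flush the cache between runs) for honest numbers:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Time one sequential pass over `path` using `bufsize`-byte reads. */
    static double time_pass(const char *path, size_t bufsize) {
        FILE *f = fopen(path, "rb");
        if (!f) { perror(path); exit(1); }
        char *buf = malloc(bufsize);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        while (fread(buf, 1, bufsize, f) == bufsize) { /* discard data */ }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        free(buf);
        fclose(f);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void) {
        const size_t sizes[] = { 4096, 32768, 262144, 1048576 };
        for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
            printf("%7zu-byte reads: %.3f s\n",
                   sizes[i], time_pass("testfile.bin", sizes[i]));
        return 0;
    }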
Would keeping, say, 512 file handles to files of 3 GB+ open for the lifetime of a program (say, a week or so) cause issues on 32-bit Linux? On Windows?
Potential workaround: How bad is the performance penalty of opening/closing file handles?
The size of the files doesn't matter. The number of file descriptors does, though. On Mac OS X, for example, the default limit is 256 open files per process, so your program would not be able to run.
I don't know about Linux, but in Windows, 512 files doesn't seem like that many to me. As a rule of thumb, though, any more than a thousand is too many. (Although I have to say I haven't seen any program first-hand open more than, say, 50.)
And the cost of opening/closing handles isn't that big unless you do it every time you want to read or write a small amount, in which case it is too high and you should buffer your data.
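On POSIX systems you can inspect, and usually raise, the per-process descriptor limit at runtime; a minimal sketch (the 1024 target is arbitrary):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit rl;

        /* RLIMIT_NOFILE is the per-process cap on open descriptors. */
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) { perror("getrlimit"); return 1; }
        printf("soft limit: %llu, hard limit: %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);

        /* Raise the soft limit toward the hard limit if it isn't enough;
         * only root can raise the hard limit itself. */
        if (rl.rlim_cur < 1024 && rl.rlim_max >= 1024) {
            rl.rlim_cur = 1024;
            if (setrlimit(RLIMIT_NOFILE, &rl) != 0) perror("setrlimit");
        }
        return 0;
    }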
When I read a large file in the file system, can the cache improve the speed of the operation?
I think there are two possible answers:
1. Yes, because the cache can prefetch, so performance improves.
2. No, because although reading from the cache is much faster than reading from the disk, the data still has to be read from the disk in the first place, so in the end the overall reading speed is still the disk's speed.
Which one is correct? How can I test the answer?
[edit]
Another question: what I am not sure about is, when the cache is turned on, whether the disk's bandwidth is used for
1. prefetching only, or
2. both prefetching and reading.
Which one is correct? (If the cache is turned off, the disk's bandwidth is used only for reading.)
If I turn off the cache and access the disk randomly, is the time needed comparable to the time needed to read sequentially with the cache turned on?
1 is definitely correct. The operating system can fetch from the disk into the cache while your code is processing the data it has already received. Yes, the disk may well still be the bottleneck - but you won't have read, process, read, process, read, process; you'll have read+process, read+process, read+process. For example, suppose processing takes half as long as reading. Representing time going down the page, we might have this sort of activity without prefetching:
Read
Read
Process
Read
Read
Process
Read
Read
Process
Whereas with prefetching, this is optimised to:
Read
Read
Read Process
Read
Read Process
Read
Process
Basically the total time will be "time to read whole file + time to process last piece of data" instead of "time to read whole file + time to process whole file".
Testing this is tricky: you'll need an operating system where you can tweak or turn off the cache. An alternative is to change how you're opening the file; for instance, in .NET, if you open the file with FileOptions.SequentialScan the cache is more likely to do the right thing. Try with and without that option.
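On POSIX systems, the nearest analogue I know of is posix_fadvise; a minimal sketch, with a placeholder file name, that hints a sequential scan and then drops the file's cached pages so the next run starts cold:

    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("testfile.bin", O_RDONLY);  /* placeholder name */
        if (fd < 0) { perror("open"); return 1; }

        /* Hint that we'll read front to back, so the kernel can
         * prefetch aggressively (akin to FileOptions.SequentialScan). */
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        char buf[64 * 1024];
        while (read(fd, buf, sizeof buf) > 0) { /* process buf here */ }

        /* For before/after comparisons: ask the kernel to drop this
         * file's cached pages so the next timing run starts cold. */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

        close(fd);
        return 0;
    }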
This has mostly been about prefetching; general caching (keeping the data around even after it's been delivered to the application) is a different matter, and obviously a big win if you want to use the same data more than once. There's also "something in between", where the application has only requested a small amount of data but the disk has read a whole block: the OS isn't actively prefetching blocks that haven't been requested, but it can cache the whole block so that if the app then requests more data from the same block, it can return that data from the cache.
The first answer is correct.
The disk has a fixed underlying performance, but that performance differs with the access pattern: you get better real throughput from a drive when you read long, contiguous sections of data, which is exactly what reading ahead into a cache does. So caching permits the drive to achieve a genuine improvement in its real performance.
In the general case, it will be faster with the cache. Some points to consider:
The data on the disk is organized in surfaces (aka heads), tracks, and blocks. It takes the disk some time to position the read heads before you can start reading a track. Now suppose you need five blocks from that track, but unfortunately you ask for them in a different order than they appear on the physical medium. The cache helps greatly by reading the whole track into memory (many more blocks than you need) and then re-indexing them (when the head starts to read, it will probably be somewhere in the middle of the track, not at the start of the first block). Without the cache, you'd have to wait until the first block of the track rotated under the head before starting to read, so the time to read a track would effectively double. With a cache, you can consume the blocks of a track in any order, and reading starts as soon as the head arrives over the track.
If the file system is fairly full, the OS will start to squeeze your data into whatever empty spaces are available. Imagine block 1 is on track 5, block 2 is on track 7, and block 3 is again on track 5. Without a cache, you'd lose a lot of time repositioning the head. With a cache, track 5 is read and kept in RAM while the head moves to track 7, and when you ask for block 3, you get it immediately.
Large files need a lot of metadata, namely where the file's data blocks are. In this case, the cache keeps this metadata live as you read the file, saving you from a lot more head thrashing.
The cache allows other programs to access their data efficiently while you hog the disk, so overall performance is better. This is very important when a second program starts writing while you read: the cache will collect several writes before interrupting your reads. Also, most programs read data, process it, and then write it back. Without the cache, a program would either get in its own way or would have to implement its own caching scheme to avoid head thrashing.
A cache allows the OS to reorder the disk I/O. Say you have blocks on track 5, 7 and 13 but file order asks for track 5, 13 and then 7. Obviously, it's more efficient to read track 7 on the way to 13 rather than going all the way to 13 and then come back to 7.
So while theoretically, reading lots of data would be faster without a cache, this is only true if your file is the only one on the disk and all meta-data is ordered perfectly, the physical layout of the data is in such a way that the reading heads always start reading a track at the start of the first block, etc.
Jon Skeet has a very interesting benchmark with .NET on this topic. The basic result was that prefetching helps more the more processing you have to do per unit of data read.
If the files are larger than your memory, then caching them for re-use definitely cannot help.
Another point: chances are, frequently used files will already be in the cache before one even starts reading them.
I am using VB6 and the Win32 API to write data to a file. This functionality is for exporting data, so write performance to the disk is the key consideration. For that reason I am using the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH options when opening the file with a call to CreateFile.
FILE_FLAG_NO_BUFFERING requires that I use my own buffer and write data to the file in multiples of the disk's sector size. This is generally no problem, except for the last chunk of data: if it is not an exact multiple of the sector size, the file ends up padded with zero bytes. How do I set the file size once the last block is written so that it does not include these zero bytes?
I can use SetEndOfFile, but this requires me to close the file and re-open it without FILE_FLAG_NO_BUFFERING. I have seen someone mention NtSetInformationFile, but I cannot find out how to declare and use it in VB6. SetFileInformationByHandle does exactly what I want, but it is only available from Windows Vista on, and my application needs to be compatible with earlier versions of Windows.
I believe SetEndOfFile is the only way.
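A sketch of that approach in C (the calls map one-to-one onto VB6 Declare statements; exact_size is a stand-in for however many real bytes you wrote before padding):

    #include <windows.h>

    /* Reopen `path` as a normal (buffered) handle and truncate it to
     * `exact_size` bytes, cutting off the zero padding written to
     * satisfy FILE_FLAG_NO_BUFFERING's sector-multiple rule. */
    static BOOL trim_padding(const char *path, LONGLONG exact_size) {
        HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) return FALSE;

        LARGE_INTEGER pos;
        pos.QuadPart = exact_size;
        BOOL ok = SetFilePointerEx(h, pos, NULL, FILE_BEGIN)
               && SetEndOfFile(h);   /* file now ends at exact_size */

        CloseHandle(h);
        return ok;
    }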
And I agree with Mike G. that you should bench your code with and without FILE_FLAG_NO_BUFFERING. Windows file buffering on modern OS's is pretty darn effective.
I'm not sure, but are YOU sure that setting FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH gives you maximum performance?
They'll certainly result in your data hitting the disk as soon as possible, but that sort of thing doesn't actually help performance - it just helps reliability for things like journal files that you want to be as complete as possible in the event of a crash.
For a data export routine like you describe, allowing the operating system to buffer your data will probably result in BETTER performance, since the writes will be scheduled in line with other disk activity, rather than forcing the disk to jump back to your file every write.
Why don't you benchmark your code without those options? Leave in the zero-byte padding logic to make it a fair test.
If it turns out that skipping those options is faster, then you can remove the 0-padding logic, and your file size issue fixes itself.
For a one-gigabyte file, Windows buffering will indeed probably be faster, especially if you're doing many small I/Os. If you're dealing with files much larger than available RAM and doing large-block I/O, the flags you were setting WILL produce much better throughput (up to three times faster for heavily threaded and/or random large-block I/O).
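For anyone benchmarking the comparison, here is a rough sketch of the unbuffered path under discussion. The 4096-byte sector size is an assumption; real code should query it with GetDiskFreeSpace. VirtualAlloc is used because it returns page-aligned memory, which satisfies FILE_FLAG_NO_BUFFERING's buffer-alignment requirement:

    #include <windows.h>

    int main(void) {
        const DWORD sector = 4096;  /* assumption; query the real value */
        HANDLE h = CreateFileA("export.dat", GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS,
                               FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                               NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        /* Page-aligned buffer; writes must be sector-size multiples. */
        BYTE *buf = VirtualAlloc(NULL, 64 * sector, MEM_COMMIT, PAGE_READWRITE);
        if (!buf) { CloseHandle(h); return 1; }

        DWORD written;
        /* ... fill buf ...; the length passed to WriteFile must be a
         * multiple of `sector`, hence the zero padding being trimmed
         * with SetEndOfFile above. */
        WriteFile(h, buf, 64 * sector, &written, NULL);

        VirtualFree(buf, 0, MEM_RELEASE);
        CloseHandle(h);
        return 0;
    }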