I/O completion port silently fails to read completely - windows

I'm developing a program that needs to write a large amout of data to disk then read back much smaller amount of data back later on. It needs to "bin" related data together then once it figures out what to do with it, then it can process the data further. It's basically acting like a database, but with temp files on disk. Portions of the temp files get reused fairly frequently as I don't care about the data on disk after I read it back out, so that portion of the file can be recycled. I'm using I/O completion ports to implement this because sequential I/O is simply too slow.
The problem is that sometimes when I read the data, I don't get all of it back. For example, I will zero out my read buffer, do a read operation of, say, 20 bytes, and when the corresponding completion event triggers, some or even none of my read buffer will match what should be on disk, but all of it won't be zeroed out. Occasionally, I can detect this and try sleeping 5 seconds and reading the same portion again, and it matches what I read in the first try. This is taking place on a top of the line SSD, so 5 seconds should be plenty to flush to disk. However, when I stop my application and look at the contents of the file, it's correct on disk. It's as if the previous write hasn't flushed to disk and it tried reading old data.
To test that theory, I tried writing 0xFF on entire sections as I read them. When this error happened again, my read buffer did not contain 0xFFs as I would have expected. So presumably, I'm not reading old data.
I also checked to make sure that the number of bytes returned from the completion event matched the number of bytes that I passed to ReadFile, and they do match. There is no error returned by the completion event or by ReadFile (other than ERROR_IO_PENDING). I am creating my temp files with FILE_ATTRIBUTE_NORMAL, FILE_FLAG_OVERLAPPED, and FILE_FLAG_RANDOM_ACCESS.
I also tried waiting for all pending writes for a given portion of the file to complete before trying to read, but to no avail. I would hope that Windows would do that for me, but it isn't covered in any documentation that I've read.
I'm really at a loss as to why I'm getting what look to be partial or corrupted reads. I'm really just looking for some ideas that might cause this behavior because I'm all out.

From the sound of things you're firing off writes and reads to the same portions of the same file and sometimes the data that the read returns isn't what you think you have previously written.
I assume you are waiting for the write completion for a piece of data before issuing a read request for the same area of the file? If not the read could be occurring before the write completes? When lots of data is being written to the same disk the write completions may begin to slow down and writes may spend more time pending (watch out for the resources that this consumes!)
Personally I'd include my own memory cache layer which knows about the data block until the write completion occurs - you can then satisfy reads for this part of the file from your cache if the write has not yet completed.

Related

How to get a Win32 program to update the file size while still writing files

I have a Win32 program that keeps a file open and writes data to it over a period of several hours. I'd like for the file size, as shown in an Explorer window, to be updated every so often.
As an example, when a browser is downloading a large file, you can see the file size change over time, even though the file is still downloading.
With my current naive implementation, the file size remains zero until I close the file.
How do I do this in Win32? Currently the file is open using std::ofstream. Is this a proper application of std::ostream::flush() ? Or do I need to close and reopen the file with some regularity?
std::ostream::flush() makes sure you have your data safe on disk. Flushing the buffer is a valid use case in situations where the automatic flushes ain't good enough for you (e.g. there's too little data written over too long periods, the data is written constantly but needs to be accessible constantly too, you need to be sure the data gets logged in case of crash or power down etc.); yet, on some OS/filesystem combinations (see Why is the file size reported incorrectly for files that are still being written to?), that still won't update the file size accordingly. On Win32, you usually won't see size updates before actually closing/reopening the handle; sometimes re-reading the dir etc. will help, and sometimes it simply won't.
As such, you can use e.g. ReOpenFile to force that update, or simply use close/open instead of flushing. The exact solution depends whether you need the updated filesize so direly and the reduced output rate is not a real problem (in which case reopening is the best option), or if you can live with a wrong size reported (in which case flushes are your best option IMO).

Why are read operations much faster after writing files using OS and disk buffers?

I am writing about a hundred files with a size of 50MB each sequentially to a directory on my disk using CreateFile() and WriteFile(). In a second steps, the contents of those files are read using CreateFile() and ReadFile().
I noticed some partially weird things:
If I pass FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH when writing the files, reading takes a noticably long time (usually hundreds of milliseconds). However, when I do not pass those flags (but use FlushFileBuffers() instead), writing appears to happen at roughly the same speed but reading those files after writing them is blazingly fast (less than 20 milliseconds per file!).
How is this possible? How do the flags passed when writing 5000MB of data affect reading later? Does the disk cache the whole 5GB in its cache?
When you pass FILE_FLAG_NO_BUFFERING then you are telling the system not to put the data in its disk cache. Then when you read the data, the system has to get the data from the disk.
When you omit FILE_FLAG_NO_BUFFERING, the system can put the data in its disk cache. And so when you read the data subsequently, it can be read directly from memory, which is faster than disk.
From https://support.microsoft.com/en-us/kb/99794:
The FILE_FLAG_WRITE_THROUGH flag for CreateFile() causes any writes made to that handle to be written directly to the file without being buffered. The data is cached (stored in the disk cache); however, it is still written directly to the file. This method allows a read operation on that data to satisfy the read request from cached data (if it's still there), rather than having to do a file read to get the data. The write call doesn't return until the data is written to the file. This applies to remote writes as well--the network redirector passes the FILE_FLAG_WRITE_THROUGH flag to the server so that the server knows not to satisfy the write request until the data is written to the file.
The FILE_FLAG_NO_BUFFERING takes this concept one step further and eliminates all read-ahead file buffering and disk caching as well, so that all reads are guaranteed to come from the file and not from any system buffer or disk cache.
You might find this article from Raymond Chen of interest: We’re currently using FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH, but we would like our WriteFile to go even faster. An excerpt:
A customer said that their program’s I/O pattern is to open a file and
then every so often write about 100KB of data into the file. They are
currently using the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH
flags to open a file, and they wanted to know what else they could do
to make their writes go even faster.
Um, for one thing, you stop passing those two flags!
Those two flags in combination basically mean “Give me the slowest
possible I/O performance!” because they force all I/O to go through to
the physical media right away.

two programs accessing one file

New to this forum - looks great!
I have some Processing code that periodically reads data wirelessly from remote devices and writes that data as bytes to a file, e.g. data.dat. I want to write an Objective C program on my Mac Mini using Xcode to read this file, parse the data, and act on the data if data values indicate a problem. My question is: can my two different programs access the same file asynchronously without a problem? If this is a problem can you suggest a technique that will allow these operations?
Thanks,
Kevin H.
Multiple processes can read from the same file at a time without any problem. A process can also read from a file while another writes without problem, although you'll have to take care to ensure that you read in any new data that was written. Multiple processes should not write to the same file at at the same time, though. The OS will let you do it, but the ordering of data will be undefined, and you'll like overwrite data—in general, you're gonna have a bad time if you do that. So you should take care to ensure that only one process writes to a file at a time.
The simplest way to protect a file so that only one process can write to it at a time is with the C function flock(), although that function is admittedly a bit rudimentary and may or may not suit your use case.

Is the order of cashed writes preserved in Windows 7?

When writing to a file in Windows 7, Windows will cache the writes by default. When it completes the writes, does Windows preserve the order of writes, or can the writes happen out of order?
I have an existing application that writes continuously to a binary file. Every 20 seconds, it writes a block of data, updates the file's Table of Contents, and calls _commit() to flush the data to disk.
I am wondering if it is necessary to call commit, or if we can rely on Windows 7 to get the data to disk properly.
If the computer goes down, I'm not too worried about losing the most recent 20 seconds worth of data, but I am concerned about making the file invalid. If the file's Table of Contents is updated, but the data isn't present, then the file will not be correct. If the data is updated, but the Table of Contents isn't, then there will be extra data at the end of the file, but since it's not referenced by the Table of Contents, it is ignored when reading the file, and we have a correct file.
The writes will not necessarily happen in order. In particular if there are multiple disk I/Os outstanding, the filesystem/disk driver may reorder the I/O operations to reduce head motion. That means that there is no guarantee that data that is written to disk will be written in the order it was written to the file.
Having said that, flushing the file to disk will stall until the I/O is complete - that may mean several dozen milliseconds (or even longer) of inactivity when you application could be doing something more useful.

Custom Prefetch

Any programmatic techniques, portable or specific to NT and Linux that get the result of number of large files loading faster? I am after a 'ahead of time', a prior, whatever you prefer to call it mechanisms that I can control in code for two OS in a question.
Each file has to be processed in full, i.e. completely in size and sequentially for its contents. The aim is to speed up some batch file processing.
I don't know about NT, but one option on Linux would be to use madvise with the MADV_WILLNEED flag shortly before you actually need the next file to start reading it in early.
Alternately, a more portable option would be to simply manually do readahead in a separate thread from your buffer-processing thread - that is, read data in to fill an X MB buffer in thread A, process it as fast as you can in thread B.
I am not aware of a Win32 (NT) API similar to madvise().
However, I would suggest an approach.
First, pass the Win32 flag FILE_FLAG_SEQUENTIAL_SCAN to CreateFile(). This will allow the Windows operating system to perform better buffering of the file once you have opened it.
With FILE_FLAG_SEQUENTIAL_SCAN, your file parser may operate more quickly once the file is in memory. Unlike madvise() on Linux, the file will not begin loading into memory any earlier due to the use of the Win32 flag.
Next, we need to trigger the file to begin loading. Asynchronously read the first page of the file by calling ReadFileEx() with an OVERLAPPED structure and a FileIOCompletionRoutine function.
Your FileIOCompletionRoutine can simply return, or you can set the event in the overlapped structure -- read the MSDN details of ReadFileEx for details.
Since it would not be a critical failure if the pre-fetch hasn't completed when you actually read from the file, the easiest implementation would be to "fire and forget" -- execute the overlapped file read and then never check the result of it. Be sure that you read the data into valid buffers, though!
If you perform this operation for a file while reading the previous file, the result should be that the next file will commence paging in.
Be aware that this may slow your performance. As the next file begins to page in, the disk I/O to access that file will compete with disk I/O for the file you are currently parsing. If the two files are physically distant from each other on the same disk, the result of pre-fetching might be additional delay as the drive head seeks. Although modern drives have huge buffers which mitigate this, queuing the first page of a new file is likely to cause a head seek.
bdonlan's suggestion of a 'pre-fetch' thread which loads the files asynchronously from the processing would be a workable solution for Win32, also.

Resources