How to get a Win32 program to update the file size while still writing files

How to get a Win32 program to update the file size while still writing files - windows

I have a Win32 program that keeps a file open and writes data to it over a period of several hours. I'd like for the file size, as shown in an Explorer window, to be updated every so often.
As an example, when a browser is downloading a large file, you can see the file size change over time, even though the file is still downloading.
With my current naive implementation, the file size remains zero until I close the file.
How do I do this in Win32? Currently the file is open using std::ofstream. Is this a proper application of std::ostream::flush() ? Or do I need to close and reopen the file with some regularity?

std::ostream::flush() makes sure you have your data safe on disk. Flushing the buffer is a valid use case in situations where the automatic flushes ain't good enough for you (e.g. there's too little data written over too long periods, the data is written constantly but needs to be accessible constantly too, you need to be sure the data gets logged in case of crash or power down etc.); yet, on some OS/filesystem combinations (see Why is the file size reported incorrectly for files that are still being written to?), that still won't update the file size accordingly. On Win32, you usually won't see size updates before actually closing/reopening the handle; sometimes re-reading the dir etc. will help, and sometimes it simply won't.
As such, you can use e.g. ReOpenFile to force that update, or simply use close/open instead of flushing. The exact solution depends whether you need the updated filesize so direly and the reduced output rate is not a real problem (in which case reopening is the best option), or if you can live with a wrong size reported (in which case flushes are your best option IMO).

Related

Appropriate way to cancel saving file via file stream?

A tool I'm writing is responsible for downloading thousands of image files over a matter of many hours. Originally, using TIdHTTP, I would Get the file(s) into a TMemoryStream, and then save that to a file, so long as there were no exceptions. In order to improve speed, I changed the TMemoryStream to a TFileStream.
However, now if the resource was not found, or otherwise any sort of exception which results in no actual file, it still saves an empty file.
Completely understandable, since I simply create a file stream just prior to the download...
FileStream:= TFileStream.Create(FileName, fmCreate);
try
Web.Get(AURL, FileStream);
finally
FileStream.Free;
end;
I know I could simply delete the file if there was an exception. But it seems far too sloppy. I'm sure there's a more appropriate method of aborting such a situation.
How should I make this to not save a file if there was an exception, while not altering the performance (if at all possible)?

How should I make this to not save a file if there was an exception, while not altering the performance (if at all possible)?
This isn't possible in general. Errors and failures can happen at any step if the way, including part way through the download. Once this point is understood, then you must accept that the file can be partially downloaded and then abandoned. At which point where do you store it?
The obvious choices are memory and file. You don't want to store to memory, which leaves to file.
This takes you back to your current solution.
I know I could simply delete the file if there was an exception.
This is the correct approach. There are a few variants on this. For instance you might download to a temporary file that is created with flags to arrange its deletion when closed. Only if the download completes do you then copy to the true destination. This is the approach that a browser takes. But the basic idea is to download to file and deal with any failure by tidying up.

Instead of downloading the entire image in one go, you could consider using HTTP range requests if the server supports it. Then you could chunk the file into smaller parts, requesting the next part after the first finishes (or even requesting multiple parts at the same time to increase performance). If there is an exception then you can about the future requests, so they never start in the first place.
YouTube and a number of streaming media sites started doing this a while ago. It used to be if you started playing a video, then paused it, then it would eventually cache the entire video. Now it only caches a little ahead of the current position. This saves a ton of bandwidth because of the abandon rate for videos.
You could write the partial file to disk or keep it in memory.

Move or copy and truncate a file that is in use

I want to be able to (programmatically) move (or copy and truncate) a file that is constantly in use and being written to. This would cause the file being written to would never be too big.
Is this possible? Either Windows or Linux is fine.
To be specific what I'm trying to do is log video with FFMPEG and create hour long videos.

It is possible in both Windows and Linux, but it would take cooperation between the applications involved. If the application that is writing the new data to the file is not aware of what the other application is doing, it probably would not work (well ... there is some possibility ... back to that in a moment).
In general, to get this to work, you would have to open the file shared. For example, if using the Windows API CreateFile, both applications would likely need to specify FILE_SHARE_READ and FILE_SHARE_WRITE. This would allow both (multiple) applications to read and write the file "concurrently".
Beyond sharing the file, though, it would also be necessary to coordinate the operations between the applications. You would need to use some kind of locking mechanism (either by locking some part of the file or some shared mutex/semaphore). Note that if you use file locking, you could lock some known offset in the file to act as a "semaphore" (it can even be a byte value beyond the physical end of the file). If one application were appending to the file at the same exact time that the other application were truncating it, then it would lead to unpredictable results.
Back to the comment about both applications needing to be aware of each other ... It is possible that if both applications opened the file exclusively and kept retrying the operations until they succeeded, then perform the operation, then close the file, it would essentially allow them to work without "knowledge" of each other. However, that would probably not work very well and not be very efficient.
Having said all that, you might want to consider alternatives for efficiency reasons. For example, if it were possible to have the writing application write to new files periodically, it might be more efficient than having to "move" the data constantly out of one file to another. Also, if you needed to maintain some portion of the file (e.g., move out the first 100 MB to another file and then move the second 100 MB to the beginning) that could be a fairly expensive operation as well.

logrotate would be a good option is linux, comes stock on just about any distro. I'm sure there's a similar windows service out there somewhere

Opening a custom file on-demand

I have a custom file type that is implemented in sections with a header at the shows the offset and length of each section within the file.
Currently, whenever I want to interact with the file, I must either load and parse the entire thing up front, or else pick only the sections that I need and load just them.
What I would like to do is to achieve a hybrid approach where each of the sections is loaded on-demand.
It seems however that doing this has a lot of potential downsides in terms of leaving filesystem handles open for longer than I would like and the additional code complexity that I would incur.
Are there any standard patterns for this sort of thing? It seems that my options are to:
Just load the entire file and stop grousing about the cycles/memory wasted
Load the entire file into memory as raw bytes and then satisfy any requests for unloaded sections from the memory buffer rather than disk. This saves me the cost of parsing the unneeded sections and requires less memory (since the disk representation is much more compact than the object model around it), but still means that I waste memory for sections that I never end up loading.
Load whatever sections I need right away and close the file but hold onto the source location of the file. Then if another section is requested, re-open the file and load the data. In this case I could get strange results if the underlying file is changed.
Same as the above but leave a file handle open (perhaps allowing read sharing).
Load the file using Memory-Mapped IO and leave a view on the file open.
Any thoughts

If possible, MMAP-ing the whole file is usually the easiest thing to do if you have a random-access pattern. This way you just delegate the loading/unloading issue to the OS and you have 1 & 2 for free.
If you have very special access patterns, you can even use something like fadvise() (I don't the exact Win32 equivalent) to tell the OS your access intend.
If your file is more than 2GB and you can either go the 64bits way or to mmap() the file on demand.

If the file is relatively small, mmap-ing the entire file is good enough. If the file is large, you could leave a mmap view open, and just move it around the file and resize it to view each section when needed.

Programmatically empty out large text file when in use by another process

I am running a batch job that has been going for many many hours, and the log file it is generating is increasing in size very fast and I am worried about disk space.
Is there any way through the command line, or otherwise, that I could hollow out that text file (set its contents back to nothing) with the utility still having a handle on the file?
I do not wish to stop the job and am only looking to free up disk space via this file.
Im on Vista, 64 bit.
Thanks for the help,

Well, it depends on how the job actually works. If it's a good little boy and it pipes it's log info out to stdout or stderr, you could redirect the output to a program that you write, which could then write the contents out to disk and manage the sizes.
If you have access to the job's code, you could essentially tell it to close the file after each write (hopefully it's an append) operation, and then you would have a timeslice in which you could actually wipe the file.
If you don't have either one, it's going to be a bit tough. If someone has an open handle to the file, there's not much you can do, IMO, without asking the developer of the application to find you a better solution, or just plain clearing out disk space.

Depends on how it is writing the log file. You can not just delete the start of the file, because the file handle has a offset of where to write next. It will still be writing at 100mb into the file even though you just deleted the first 50mb.
You could try renaming the file and hoping it just creates a new one. This is usually how rolling logs work.

You can use a rolling log class, which will wrap the regular file class but silently seek back to the beginning of the file when the file reaches a maximum designated size.
It is a very simple wrap, either write it yourself or try finding an implementation online.

Memory mapped files optional write possible?

When using memory-mapped files it seems it is either read-only, or write-only. By this I mean you can't:
have one open for writing, and later decide not to save it
have open open for reading, and later decide to save it
Our application uses a writeable memory-mapped file to save data files, but since the user might want to exit without saving changes, we have to use a temporary file which the user actually edits. When the user opts to save the changes, the original file is overwritten with the temporary file so it has the latest changes. This is cumbersome because the files can be very large (>1GB) and it takes a long time to copy them.
I've tried many combinations of the flags used to create the file mapping but none seem to allow the flexibility of saving on demand. Can anyone confirm this is the case? Our application is written in Delphi, but it uses the standard Windows API to create the mapping, in our case
FMapHandle := CreateFileMapping(FFileHandle, nil, PAGE_READWRITE, 0, 2 * 65536, nil);
FBasePointer := MapViewOfFile(FileMapHandle, FILE_MAP_WRITE, FileOffsetHigh,
FileOffsetLow, NumBytes);

I don't think you can. By that I mean you may be able to, but it doesn't make any sense to me :-)
The whole point of a memory-mapped file is that it's a window onto the actual file. If you don't wany changes reflected in the file, you'll probably have to do something like batch up the changes in a data structure (e.g., an array of base address, size and data) and apply them when saving.
In which case, you wouldn't actually need the memory mapped file, just read in and maintain the chunks you want to change (lock the file first if there's a chance of multi-user access).
Update:
Have you thought of the possibility of, when doing a save, deleting the original file and just renaming the temporary file to the original file name? That's likely to be much faster than copying 1G of data from temporary to original. That way, if you don't want it saved, just delete the temporary file and keep the original.
You'll still have to copy the original data to the temporary file when loading but you won't have to copy the temporary data back (whether you save it or not) - that would halve the time taken.

Possible, but non-trivial.
You have to understand memory mapped basics, and the difference between the three modes of memory-mapped files. Both set aside a part of your virtual address space and create a mapping entry in an internal table. No physical RAM is initially allocated. Hence, when you DO try to access the memory, the CPU faults and the OS has to fix up. It does so by copying the file contents to RAM and mapping the RAM to your process, at the faulting address.
Now, the difference between the three modes is how the descriptors are set on the mapped pages. In all cases you get read access on the pages. (The first mode). However, if you ask for write access and subsequently write to it, on your first write the page is marked as writeable and dirty. It can then be written back to the original file, at the discretion of the OS (Second mode). Finally, it's possible to get copy-on-write semantics. You still start out with only read access to the page in memory. When you write to it, the CPU still faults and the OS needs to fix it up. With copy-on-write, that fixup is done by setting the backing store of the changed page to the page file, instead of the original mapped file.
So, in your case you want to use copy-on-write mode. If the user decides to discard the modifications, no problem. You simply discard the memory mapping. All pages that were modified in memory, and were backed by the page file are also discarded.
If the user does decide to save, you've got a slightly harder task. You now need to figure out which parts of the file have changed. Those changes are in memory, and you need to reapply those to the source file. You can do this with Page Guards. So, when the user decides to save, copy all modified pages to a separate memory block, remap the (unchanged) file for write, and apply the changes.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio