Should I write a temp file to a temp dir? or write a temp file to the final directory? - windows

When an application saves a file, a typical model is to save the file to a temporary location, then move the temporary file to the final location. In some cases that "move" becomes "replace". In pseudo code:
Save temp file;
if final file exists
    delete final file;
move temp file to final filename;
There's a window in there where the delete might succeed but the move may not, so you can handle that with something like:
Save temp file;
if final file exists
    move final file to parking lot;
move temp file to final filename;
if move succeeded
    delete previous final file;
else
    restore previous final file;
Now to my questions:
is it preferred to save the temporary file to a temporary directory, and then move it, as opposed to saving the temporary file to the final directory? (if so, why?)
Is there a difference in attributes and permissions on a file that is first saved to a temp dir, then moved to the final file in a different directory, as compared to a file that is saved to a temp file in the final directory, and then renamed within the directory?
If the answers to both are YES, then how can I do the preferred thing while getting the appropriate ACL on file which was first saved to a temporary directory and then moved to a final directory?

Create a temp file in the temp folder if it is just a temporary file. Otherwise, create it in its final destination.
Caveats:
1) This may not work if the final destination is a 'pickup' folder (unless the 'pickup' process checks for locked files, which it should).
2) The final destination may have special permissions that have to be created in code and applied before the file can be moved into place.

Microsoft Word saves a temp file to the original directory starting with a tilde (~). I would just follow that convention.

If these are temp files that turn into permanent files, create them in the same location to avoid any risk of having to "move" files across disks/partitions, which will result in more I/O (as a copy followed by a delete).
If these are temp files that are truly temporary, create (and leave them) in the temp dir.

A reason why you might never want to write a file to one directory and then move it to another is that those directories might be on different filesystems. Although this is less often a problem on Windows, it is still quite possible, since NTFS allows other volumes to be mounted into a directory tree. On Unix, it is standard practice for /tmp to be a separate filesystem.
The reason this could be a problem is that the file then has to be copied from one place to another. This significantly impacts performance for files of substantial size, and even a small file will require many more seeks. Additionally, there are many more ways for the operation to fail when moving a file across filesystem boundaries. Of course, access permissions could be different, but the target filesystem could also be full, or any number of other complications could arise that you are now deferring until much later.

It is preferable to create a temp file using the GetTempFileName routine, because this creates temp files in a predefined location (e.g. C:\temp) that cleanup utilities can purge if your app crashes or leaves corrupt files behind. If the same thing happens in your final directory, it is unrecoverable.
Yes, attributes could be different if the target file's attributes or ACL has been edited. This could happen even if you create the temp file in the same folder.
You fix this by using the File.Replace routine, which performs an atomic replacement of one file with another, replacing the new file's attributes and ACLs with the old file's.
A C# method that does this is an answer to Safe stream update of file.

I prefer saving the temporary file to the final directory:
It avoids the potential permission problems that you've described.
The final directory might be on a different volume, in which case the move (of the temporary to the final file) is really a copy + delete -- which incurs a lot of overhead if you do it often or if the file is big.
You can always rename the existing file to a second temporary file, rename the new temporary file to the existing file's name, and rollback on error. That seems to me to be the safest combination.
EDITED: I see that your "parking lot" already described my suggestion, so I'm not sure I've added much here.

1. Yes, it is preferred to save to a temporary file first.
Because the final file will never be in a corrupt state should the creation of the file fail for any reason. If you write directly to the final file and your program crashes mid-way, it will almost certainly leave the final file in an invalid state.
2. Yes.
The "inherited" attributes and permissions will, of course, be different. But temporary directories on most systems are usually pre-configured for all applications to use. The "final file" directory might, however, need to be configured: think of the "Program Files" folder and Vista UAC, for example.
3. Copy the ACL from the final file to the temp file prior to replacing?

By default, Android uses .tmp as the suffix when the suffix param is set to null in File.createTempFile(). I would suggest you just follow that.
File file = File.createTempFile(imageFileName, null, storageDir);
You should call file.delete() yourself as soon as you're done with your .tmp file in your app. You shouldn't depend on file.deleteOnExit() since there's absolutely no guarantee it'll be used by the Android system/VM.

Why not make it user configurable? Some users don't like temp files polluting their current directory.

Related

How to overwrite a file in a tarball

I've got an edge case where two files have the same name but different contents and are written to the same tarball. This causes there to be two entries in the tarball. I'm wondering if there's anything I can do to make the tar overwrite the file if it already exists in the tarball as opposed to creating another file with the same name.
There is no way, as the first file has already been written by the time you ask to write the second one, and the stream has advanced past it. Remember that tar files are written sequentially.
You should do deduplication before starting to write.

ioutil.TempFile and umask

In my Go application, instead of writing to a file directly I would like to write to a temporary file that is renamed into the final file when everything is done. This is to avoid leaving partially written content in the file if the application crashes.
Currently I use ioutil.TempFile, but the issue is that it creates the file with the 0600 permission, not 0666. Thus with typical umask values one gets 0600, not the expected 0644 or 0660. This is not a problem if the destination file already exists, as I can fix the permissions on the temporary file to match the existing ones, but if the file does not exist, then I need somehow to deduce the current umask.
I suppose I could just duplicate the ioutil.TempFile implementation and pass 0666 into os.OpenFile, but that does not sound nice. So the question is: is there a better way?
I don't quite grok your problem.
Temporary files must be created with as tight permissions as possible because the whole idea of having them is to provide your application with secure means of temporary storing data which is too big to fit in memory (or to hand the generated file over to another process). (Note that on POSIX systems, where an opened file counts as a live reference to it, it's even customary to immediately remove the file while having it open so that there's no way to modify its data other than writing it from the process which created it.)
So in my opinion you're trying to use a wrong solution to your problem.
So what I do in a case like yours is:
Create a file with the same name as old one but with the ".temp" suffix appended.
Write data there.
Close, rename it over the old one.
If you feel like using a fixed suffix is lame, you can "steal" the implementation of picking a unique, non-conflicting file name from ioutil.TempFile(). But IMO this would be overengineering.
You can use ioutil.TempDir to get the folder where temporary files should be stored, and then create the file on your own with the right permissions.

How to reliably overwrite a file in Windows

I want to overwrite the content of a file atomically. I need this to maintain integrity when overwriting a config file, so an update should either pass or fail and never leave the file half-written or corrupted.
I went through multiple iterations to solve this problem; here is my current solution.
Steps to overwrite file "foo.config":
Enter a global mutex (unique per file name)
Write the new content in "foo.config.tmp"
Call FlushFileBuffers on the file handle before closing the file to flush the OS file buffers
Call ReplaceFile, which will internally:
rename "foo.config" to "foo.config.bak"
rename "foo.config.tmp" to "foo.config"
Delete "foo.config.bak"
Release the global mutex
I thought this solution to be robust, but the dreaded issue occurred again in production after a power failure. The config file was found corrupted, filled with NULL characters; the .tmp and .bak files did not exist.
My theory is that the original file content was zeroed out when deleting "foo.config.bak" but the filesystem metadata update caused by the ReplaceFile call was not flushed to disk. So after reboot, "foo.config" is pointing to the original file content that has been zeroed out, is that even possible since ReplaceFile is called before DeleteFile?
The file was stored on an SSD (SanDisk X110).
Do you see a flaw in my file overwrite procedure? Could it be a hardware failure in the SSD? Do you have an idea to guarantee the atomicity of the file overwrite even in case of power failure? Ideally I'd like to delete the tmp and bak files after the overwrite.
Thanks,
Use MoveFileEx with the MOVEFILE_WRITE_THROUGH flag when renaming the file. This tells Windows to flush the change to disk right away instead of caching it.

How to change file contents atomically?

I am given a binary file (consider it large) and several binary blobs, which I should insert/replace somewhere in the middle of the file (offsets are known).
At the same time, a user may gain access to the file, so I must have "all or nothing": either the user gets the old version of the file if she opens it before I have updated everything, or she gets the new version if I succeeded.
I am interested in solutions for Linux, Windows and OS X. Of course, the implementation may be different.
For Linux:
Do everything on a temporary file.
fsync() the temporary file.
rename() the temporary file to the real file.
This idiom is known as atomic-rename.

Write multiple files atomically

Suppose I have a folder with a few files (images, texts, whatever; it only matters that there are multiple files) and the folder is rather large (> 100 MB). Now I want to update five files in this folder, but I want to do this atomically. Normally I would just create a temporary folder, write everything into it and, if that succeeds, replace the existing folder with it. But because I/O is expensive, I don't really want to go this way (resaving hundreds of files just to update five seems like a huge overhead). How am I supposed to write these five files atomically? Note, I want the writing of all files to be atomic, not each file separately.
You could adapt your original solution:
Create a temporary folder full of hard links to the original files.
Save the five new files into the temporary folder.
Delete the original folder and move the folder of hard links in its place.
Creating a few links should be speedy, and it avoids rewriting all the files.
