Atomic file update in Windows - windows

I have a program that periodically updates a file that is then updated on other machines via Dropbox.
So far I've been creating a temporary file, writing to it, and then calling ReplaceFile. However, Dropbox sees it as a separate delete+create operation, and as a result, the file will be deleted on other machines before being created again (possibly after some delay), which is not acceptable.
In fact, Microsoft recommends ReplaceFile as an alternative to Transactional NTFS, when it is clearly not doing the job.
Is there any way to atomically update the file contents in this scenario?

Related

Disable or flush page cache on Windows

I assume Windows has a similar concept to Linux's page cache for storing in memory data from disks, like files, executables and dynamic libraries. I wonder if it is possible at all to disable such cache or to the very least to clear/flush it.
This is called Standby List under windows. You can purge it globally, or for one volume, or for one file handle.
Globally
You can do it using a readily available program from Microsoft Technet, by selecting Empty → Empty Standby List
Programmatically, you can achieve the same thing using the undocumented NtSetSystemInformation function, for details see line 239 in a program which does the same thing as the previously mentioned one, among other things.
For one file handle
Open the file with FILE_FLAG_NO_BUFFERING: The documentation is lying insofar as it says that you open the file without buffering, but the true, observable behavior on all Windows versions from Windows 98 up to Windows 8 is that it simply throws away the complete cache contents for that file (for everyone!) and doesn't repopulate the cache from reads that use this handle.
For one complete volume
A volume handle is just a file handle (a somewhat special one, but still), so assuming you have appropriate privilegues to open a volume handle, you can do the same for a complete volume.
Also, as pointed out in the answer here, there seems to be a feature/bug (feature-bug?) which allows you to invalidate a volume's cache even without proper privilegues merely by attepting to open it without shared writes, at least under one recent version of Windows.
It makes perfect sense that this happens when any open which is valid for writing succeeds as you may change filesystem-internal data doing so (insofar it is a feature), but apparently it also works when opening the volume fails (which is a bug).

StreamWriter : Is there a way to write to memory without writing directly to a disk?

I have an application that in a certain moment has to create a temporary file and then delete it. So I'm thinking about whether it is possible to create a temporary file into memory instead of directly into disk. Then I would like to read it from memory to perform some actions and finally after doing some stuff, delete if from memory. If it is possible, how to do this in .NET Framework and C#. Also, Can performance be affected by using directly memory and not disk?
I would like to use memory directly instead of disk because I am afraid of some kind of windows permissions that does not allow to write file to some places of disk... so another doubt is: if using disk instead of memory is there a safe place to write temporary files and avoid writting permissions conflicts for windows XP,Vista,7,8? I mean, a place where is always safe to write temporary files and then delete it. Maybe, in user\Local Settings\Temp folder? is it the best place? I would like to know a safe place to save temporary files and then delete them that is working for windows versions: XP, Vista, 7, and 8.
Thanks.
You can use the MemoryStream for writing to memory instead of disk. Preformance can possibly be better using a memorystream because disks are often slower to write to. As always: profile before optimizing.
For information about the temporary folder see this thread: How to get temporary folder for current user
You should be able to write your own class inherited from Stream class, override its methods (read/write etc), and inside them maintain your own stringbuilder or whatever suits you. Everything will be inside the memory and of course it'll be really fast.
I think XP had "Documents and Settings", vista/7 has Users folder which has its very own place for each user to write temp files at, you should not have a problem writing there. What you are thinking is correct imo.

Avoid updating last-accessed date/time when reading a file

We're building a Windows-based application that traverses a directory structure recursively, looking for files that meet certain criteria and then doing some processing on them. In order to decide whether or not to process a particular file, we have to open that file and read some of its contents.
This approach seems great in principle, but some customers testing an early version of the application have reported that it's changing the last-accessed time of large numbers of their files (not surprisingly, as it is in fact accessing the files). This is a problem for these customers because they have archive policies based on the last-accessed times of files (e.g. they archive files that have not been accessed in the past 12 months). Because our application is scheduled to run more frequently than the archive "window", we're effectively preventing any of these files from ever being archived.
We tried adding some code to save each file's last-accessed time before reading it, then write it back afterwards (hideous, I know) but that caused problems for another customer who was doing incremental backups based on a file system transaction log. Our explicit setting of the last-accessed time on files was causing those files to be included in every incremental backup, even though they hadn't actually changed.
So here's the question: is there any way whatsoever in a Windows environment that we can read a file without the last-accessed time being updated?
Thanks in advance!
EDIT: Despite the "ntfs" tag, we actually can't rely on the filesystem being NTFS. Many of our customers run our application over a network, so it could be just about anything on the other end.
The documentation indicates you can do this, though I've never tried it myself.
To preserve the existing last access time for a file even after accessing a file, call SetFileTime immediately after opening the file handle with this parameter's FILETIME structure members initialized to 0xFFFFFFFF.
From Vista onwards NTFS does not update the last access time by default. To enable this see http://technet.microsoft.com/en-us/library/cc959914.aspx
Starting NTFS transaction and rolling back is very bad, and the performance will be terrible.
You can also do
FSUTIL behavior set disablelastaccess 0
I don't know what your client minimum requirements are, but have you tried NTFS Transactions? On the desktop the first OS to support it was Vista and on the server it was Windows Server 2008. But, it may be worth a look at.
Start an NTFS transaction, read your file, rollback the transaction. Simple! :-). I actually don't know if it will rollback the Last Access Date though. You will have to test it for yourself.
Here is a link to a MSDN Magazine article on NTFS transactions which includes other links. http://msdn.microsoft.com/en-us/magazine/cc163388.aspx
Hope it helps.

Locking Executing Files: Windows does, Linux doesn't. Why?

I noticed when a file is executed on Windows (.exe or .dll), it is locked and cannot be deleted, moved or modified.
Linux, on the other hand, does not lock executing files and you can delete, move, or modify them.
Why does Windows lock when Linux does not? Is there an advantage to locking?
Linux has a reference-count mechanism, so you can delete the file while it is executing, and it will continue to exist as long as some process (Which previously opened it) has an open handle for it. The directory entry for the file is removed when you delete it, so it cannot be opened any more, but processes already using this file can still use it. Once all processes using this file terminate, the file is deleted automatically.
Windows does not have this capability, so it is forced to lock the file until all processes executing from it have finished.
I believe that the Linux behavior is preferable. There are probably some deep architectural reasons, but the prime (and simple) reason I find most compelling is that in Windows, you sometimes cannot delete a file, you have no idea why, and all you know is that some process is keeping it in use. In Linux it never happens.
As far as I know, linux does lock executables when they're running -- however, it locks the inode. This means that you can delete the "file" but the inode is still on the filesystem, untouched and all you really deleted is a link.
Unix programs use this way of thinking about the filesystem all the time, create a temporary file, open it, delete the name. Your file still exists but the name is freed up for others to use and no one else can see it.
Linux does lock the files. If you try to overwrite a file that's executing you will get "ETXTBUSY" (Text file busy). You can however remove the file, and the kernel will delete the file when the last reference to it is removed. (If the machine wasn't cleanly shutdown, these files are the cause of the "Deleted inode had zero d-time" messages when the filesystem is checked, they weren't fully deleted, because a running process had a reference to them, and now they are.)
This has some major advantages, you can upgrade a process thats running, by deleting the executable, replacing it, then restarting the process. Even init can be upgraded like this, replace the executable, and send it a signal, and it'll re-exec() itself, without requiring a reboot. (THis is normally done automatically by your package management system as part of it's upgrade)
Under windows, replacing a file that's in use appears to be a major hassle, generally requiring a reboot to make sure no processes are running.
There can be some problems, such as if you have an extremely large logfile, and you remove it, but forget to tell the process that was logging to that file to reopen the file, it'll hold the reference, and you'll wonder why your disk didn't suddenly get a lot more free space.
You can also use this trick under linux for temporary files. open the file, delete it, then continue to use the file. When your process exits (for no matter what reason -- even power failure), the file will be deleted.
Programs like lsof and fuser (or just poking around in /proc//fd) can show you what processes have files open that no longer have a name.
I think linux / unix doesn't use the same locking mechanics because they are built from the ground up as a multi-user system - which would expect the possibility of multiple users using the same file, maybe even for different purposes.
Is there an advantage to locking? Well, it could possibly reduce the amount of pointers that the OS would have to manage, but now a days the amount of savings is pretty negligible. The biggest advantage I can think of to locking is this: you save some user-viewable ambiguity. If user a is running a binary file, and user b deletes it, then the actual file has to stick around until user A's process completes. Yet, if User B or any other users look on the file system for it, they won't be able to find it - but it will continue to take up space. Not really a huge concern to me.
I think largely it's more of a question on backwards compatibility with window's file systems.
I think you're too absolute about Windows. Normally, it doesn't allocate swap space for the code part of an executable. Instead, it keeps a lock on the excutable & DLLs. If discarded code pages are needed again, they're simply reloaded. But with /SWAPRUN, these pages are kept in swap. This is used for executables on CD or network drives. Hence, windows doesn't need to lock these files.
For .NET, look at Shadow Copy.
If executed code in a file should be locked or not is a design decision and MS simply decided to lock, because it has clear advantages in practice: That way you don't need to know which code in which version is used by which application. This is a major problem with Linux default behaviour, which is simply ignored by most people. If system wide libs are replaced, you can't easily know which apps use code of such libs, most of the times the best you can get is that the package manager knows some users of those libs and restarts them. But that only works for general and well know things like maybe Postgres and its libs or such. The more interesting scenarios are if you develop your own application against some 3rd party libs and those get replaced, because most of the times the package manager simply doesn't know your app. And that's not only a problem of native C code or such, it can happen with almost everything: Just use httpd with mod_perl and some Perl libs installed using a package manager and let the package manager update those Perl libs because of any reason. It won't restart your httpd, simply because it doesn't know the dependencies. There are plenty of examples like this one, simply because any file can potentially contain code in use in memory by any runtime, think of Java, Python and all such things.
So there's a good reason to have the opinion that locking files by default may be a good choice. You don't need to agree with that reasons, though.
So what did MS do? They simply created an API which gives the calling application the chance to decide if files should be locked or not, but they decided that the default value of this API is to provide an exclusive lock to the first calling application. Have a look at the API around CreateFile and its dwShareMode argument. That is the reason why you might not be able to delete files in use by some application, it simply doesn't care about your use case, used the default values and therefore got an exclusive lock by Windows for a file.
Please don't believe in people telling you something about Windows doesn't use ref counting on HANDLEs or doesn't support Hardlinks or such, that is completely wrong. Almost every API using HANDLEs documents its behaviour regarding ref counting and you can easily read in almost any article about NTFS that it in deed does support Hardlinks and always did. Since Windows Vista it has support for Symlinks as well and the Support for Hardlinks has been improved by providing APIs to read all of those for a given file etc.
Additionally, you may simply want to have a look at the structures used to describe a file in e.g. Ext4 compared to those of NTFS, which have a lot in common. Both work with the concept of extents, which separates data from attributes like file name, and inodes are pretty much just another name for an older, but similar concept of that. Even Wikipedia lists both file systems in its article.
There's really a lot of FUD around file locking in Windows compared to other OSs on the net, just like about defragmentation. Some of this FUD can be ruled out by simply reading a bit on the Wikipedia.
NT variants have the
openfiles
command, which will show which processes have handles on which files. It does, however, require enabling the system global flag 'maintain objects list'
openfiles /local /?
tells you how to do this, and also that a performance penalty is incurred by doing so.
Executables are progressively mapped to memory when run. What that means is that portions of the executable are loaded as needed. If the file is swapped out prior to all sections being mapped, it could cause major instability.

How do I make Windows file-locking more like UNIX file-locking?

UNIX file-locking is dead-easy: The operating system assumes that you know what you are doing and lets you do what you want:
For example, if you try to delete a file which another process has opened the operating system will usually let you do it. The original process still keeps it's file-handles until it terminates - at which point the the file-system will quietly re-cycle the disk-resources. No fuss, that's the way I like it.
How different things are on Windows: If I try to delete a file which another process is using I get an Operating-System error. The file is untouchable until the original process releases it's lock on the file. That was great back in the single-user days of MS-DOS when any locking process was likely to be on the same computer that contained the files, however on a network it's a nightmare:
Consider what happens when a process hangs while writing to a shared file on a Windows file-server. Before the file can be deleted we have to locate the computer and ID the process on that computer which originally opened the file. Only then can we kill the process and delete our unwanted file.
What a nuisance!
Is there a way to make this better? What I want is for file-locking on Windows to behave a like file-locking in UNIX. I want the operating system to just let me do what I want because I'm in charge and I know what I'm doing...
...so can it be done?
No. Windows is designed for the "average user", that is people who don't understand anything about a computer. Therefore, the OS tries to be smart to avoid PEBKACs. To quote Bill Gates: "There are no issues with Windows that any number of people want to be fixed." Of course, he knows that 99.9999% of all Windows users can't tell whether the program just did something odd because of them or the guy who wrote it.
Unix was designed when the world was more simple and anyone close enough to a computer to touch it, probably knew how to assemble it from dirty sand. Therefore, the OS usually lets you do what you want because it assumes that you know better (and if you didn't, you will next time).
Technical answer: Unix allocates an "i-nodes" if you create a file. I-nodes can be shared between processes. If two processes create the same file (that is, two processes call create() with the same path), then you end up with two i-nodes. This is by design. It allows for a fancy security feature: You can create files which no one can open but yourself:
Open a file
Delete it (but keep the file handle)
Use the file any way you like
Close the file
After step #2, the only process in the universe who can access the file is the one who created it (unless you want to read the hard disk block by block). The OS will keep the data alive until you either close the file or your process dies (at which time Unix will clean up after you).
This design is the foundation of all Unix filesystems. The Windows file system NTFS works much the same way but the high level API is different. Many applications open files in exclusive mode (which prevents anyone, even backup programs) to read the file. This is even true for applications which just display information like PDF viewers.
That means you'll have to fix all the Windows applications to achieve the desired effect. If you have access to the source, you can create a file in a shared mode. That would allow other processes to access it at the same time but then, you will have to check before every read/write if the file still exists, whether someone has made changes, etc.
According to MSDN you can specify to CreateFile() 3rd parameter (dwSharedMode) shared mode flag FILE_SHARE_DELETE which:
Enables subsequent open operations on a file or device to request delete access.
Otherwise, other processes cannot open the file or device if they request delete access.
If this flag is not specified, but the file or device has been opened for delete access, the function fails.
Note Delete access allows both delete and rename operations.
http://msdn.microsoft.com/en-us/library/aa363858(VS.85).aspx
So if you're can control your applications you can use this flag.
Note that Process Explorer allow for force closing of file handles (for processes local to the box on which you are running it) via Handle -> Close Handle.
Unlocker purports to do a lot more, and provides a helpful list of other tools.
Also deleting on reboot is an option (though this sounds like not what you want)
That doesn't really help if the hung process still has the handle open. It won't release the resources until that hung process releases the handle. But anyway, in Windows it is possible to force close a file out from under a process that's using it. Process Explorer from sysinternals.com will let you look at and close handles that a process has open.

Resources