Is an atomic file rename (with overwrite) possible on Windows? - windows

On POSIX systems rename(2) provides for an atomic rename operation, including overwriting of the destination file if it exists and if permissions allow.
Is there any way to get the same semantics on Windows? I know about MoveFileTransacted() on Vista and Server 2008, but I need this to support Win2k and up.
The key word here is atomic... the solution must not be able to fail in any way that leaves the operation in an inconsistent state.
I've seen a lot of people say this is impossible on win32, but I ask you, is it really?
Please provide reliable citations if possible.

See ReplaceFile() in Win32 (http://research.microsoft.com/pubs/64525/tr-2006-45.pdf)

Win32 does not guarantee atomic file metadata operations. I'd provide a citation, but there is none; the fact that there's no written or documented guarantee means as much.
You're going to have to write your own routines to support this. It's unfortunate, but you can't expect win32 to provide this level of service - it simply wasn't designed for it.

In Windows Vista and Windows Server 2008 an atomic move function has been added - MoveFileTransacted()
Unfortunately this doesn't help with older versions of Windows.
Interesting article here on MSDN.

Starting with Windows 10 1607, NTFS does support an atomic superseding rename operation. To do this call NtSetInformationFile(..., FileRenameInformationEx, ...) and specify the FILE_RENAME_POSIX_SEMANTICS flag.
Or equivalently in Win32 call SetFileInformationByHandle(..., FileRenameInfoEx, ...) and specify the FILE_RENAME_FLAG_POSIX_SEMANTICS flag.

You still have the rename() call on Windows, though I imagine the guarantees you want cannot be made without knowing the filesystem you're using; there are no guarantees if you're using FAT, for instance.
However, you can use MoveFileEx and use the MOVEFILE_REPLACE_EXISTING
and MOVEFILE_WRITE_THROUGH options. The latter has this description in MSDN:
Setting this value guarantees that a
move performed as a copy and delete
operation is flushed to disk before
the function returns. The flush occurs
at the end of the copy operation.
I know that's not necessarily the same as a rename operation, but I think it might be the best guarantee you'll get - if it does that for a file move, it should for a simpler rename.

The MSDN documentation avoids clearly stating which APIs are atomic and which are not, but Niall Douglas states in his CppCon 2015 talk that the only atomic function is SetFileInformationByHandle with FILE_RENAME_INFO.ReplaceIfExists set to true. It's available starting with Windows Vista / Server 2008.
Niall is the author of the highly complicated LLFIO library and an expert in file system race conditions. So if you're writing an algorithm where atomicity is crucial, I believe it's better to be safe than sorry and use the suggested function, even though nothing in ReplaceFile's description states that it's not atomic.

A fair number of answers, but not the one I was expecting... I had the understanding (perhaps incorrectly) that MoveFile could be atomic provided that the proper stars aligned: the right flags were used and the source and target were on the same file system. Otherwise, the operation would fall back to a [Copy->Delete]File.
Given that, I also had the understanding that MoveFile, when it is atomic, was just setting the file information, which could also be done with SetFileInformationByHandle.
Someone gave a talk called "Racing the Filesystem" which goes into some more depth about this (about two-thirds of the way through, they talk about atomic rename).

There is std::rename and starting with C++17 std::filesystem::rename.
With std::rename, what happens if the destination exists is implementation-defined:
If new_filename exists, the behavior is implementation-defined.
POSIX rename, however, is required to replace existing files atomically:
This rename() function is equivalent for regular files to that defined
by the ISO C standard. Its inclusion here expands that definition to
include actions on directories and specifies behavior when the new
parameter names a file that already exists. That specification
requires that the action of the function be atomic.
Thankfully, std::filesystem::rename requires that it behaves just like POSIX:
Moves or renames the filesystem object identified by old_p to new_p as
if by the POSIX rename
However, when I tried to debug, it appears that std::filesystem::rename as implemented by VS2019 (as of March 2020) simply calls MoveFileEx, which isn't atomic in some cases.
So, possibly, when all bugs in its implementation are fixed, we'll see portable atomic std::filesystem::rename.
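A minimal sketch of how the std::filesystem::rename route might be used, assuming a C++17 toolchain. The helper names replace_file_contents and read_file are hypothetical and not part of any library. The sketch writes the new contents to a sibling temporary file and then renames it over the target; whether that final step is truly atomic on Windows depends on the standard library implementation, as described above.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: write `data` to a sibling temporary file, then rename
// it over `target`. On POSIX the rename replaces `target` atomically; on
// Windows the guarantee depends on the standard library implementation.
bool replace_file_contents(const fs::path& target, const std::string& data) {
    fs::path tmp = target;
    tmp += ".tmp";  // same directory, hence the same filesystem as the target

    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << data;
        out.flush();
        if (!out) return false;
    }  // the stream is closed here, before the rename

    std::error_code ec;
    fs::rename(tmp, target, ec);  // replaces an existing regular file
    if (ec) {
        std::error_code ignored;
        fs::remove(tmp, ignored);  // best-effort cleanup of the temporary
        return false;
    }
    return true;
}

// Hypothetical helper used only to inspect the result.
std::string read_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling replace_file_contents twice on the same path and then read_file should return the data from the second call, with no ".tmp" file left behind. The sketch does not flush data to disk before renaming, so it addresses visibility to other processes rather than durability across a crash.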

Related

How to access FD_SETSIZE in Ruby? spawn cpp (??)

While reading up on I/O in Ruby, and refreshing my own limited knowledge of I/O in generally POSIX-friendly libc environments, I found a question here at Stack Overflow, poll() in Ruby?, which raised the question I am asking here.
The responses mentioned the availability of a select method in Ruby. However, they also raised a concern about using select under certain conditions on some operating systems, including Linux, e.g. when there may be 1024 or more file descriptors open in the Ruby process.
Some of the responses to poll() in Ruby? suggested that calling select in such an environment could result in memory corruption in the application. Other documentation does not describe the concern as being that severe, and later reading indicates there may be a way to portably avoid calling select in such circumstances. Still, the question remains of how to address this portably for Ruby's select.
Reading more about it, I noticed that the "BUGS" section of the select(2) manual page, on Linux, provides what may represent an expansive discussion of the issue. The text mentions a constant, FD_SETSIZE as apparently representing the exclusive upper limit on the number of file descriptors that can be open at the time when select is called, such that select might be expected to perform normally then (roughly paraphrased).
Quoting the select(2) manual page:
POSIX allows an implementation to define an upper limit,
advertised via the constant FD_SETSIZE, on the range of file
descriptors that can be specified in a file descriptor set. The
Linux kernel imposes no fixed limit, but the glibc implementation
makes fd_set a fixed-size type, with FD_SETSIZE defined as 1024,
and the FD_*() macros operating according to that limit. To
monitor file descriptors greater than 1023, use poll(2) or
epoll(7) instead.
The implementation of the fd_set arguments as value-result
arguments is a design error that is avoided in poll(2) and
epoll(7).
According to POSIX, select() should check all specified file
descriptors in the three file descriptor sets, up to the limit
nfds-1. However, the current implementation ignores any file
descriptor in these sets that is greater than the maximum file
descriptor number that the process currently has open. According
to POSIX, any such file descriptor that is specified in one of
the sets should result in the error EBADF.
Towards making use of this in Ruby, albeit in what may be a guess of an approach: What might be the best way to determine FD_SETSIZE for the Ruby environment?
If it was available as a constant, this assumes that the value of that constant could be used in a conditional test before calling 'select' on any open file descriptor. The Ruby program might then raise an exception internally, before calling select on any file descriptor equal to or greater than the value of FD_SETSIZE for the instance, at least for generally POSIX-friendly operating systems?
If there's no cleaner way to approach this, maybe it could be worked into the distribution tooling for a project, such as to determine that constant's value for the target architecture then to store it along with any other application constants? I'm guessing a cpp could be used for this - whether from GCC, LLVM, or any other toolchain - perhaps in some ways similar to sb-grovel.
Maybe there's some other way to approach this, and portably so? Perhaps there's already a constant for it, somewhere in Ruby?
Maybe there's already some checking about it, in the Ruby source code? I suppose it's time to look for that GitHub repository now....
Ruby does not export FD_SETSIZE in any way, so the only way to check the size is to use a compiler.
Instead of building your own extension, the most hassle-free way may be to use RubyInline, which makes the check very simple:
gem install RubyInline
Then:
require 'inline'
class FdTest
  inline do |builder|
    builder.c "
      long fd_setsize(void) {
        return FD_SETSIZE;
      }"
  end
end
puts FdTest.new.fd_setsize
=> 1024
This is even semi-portable to Windows, provided you are running under WSL, Cygwin, MinGW, or something similar. Might even work under Visual Studio, provided it is installed with C-support.
Building it as an extension might be another solution to ensure better compatibility, which you can then ship with precompiled binaries for your required platforms.
It all depends on how much trouble you are willing to go through in order to extract this information on all possible platforms, since there really does not exist a fully platform independent solution to something like this.
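For comparison, the range check the question proposes could look roughly like this on the native side, for example inside such an extension. This is only a sketch assuming a POSIX system where <sys/select.h> defines FD_SETSIZE; the function names fits_in_fd_set and checked_fd_set are hypothetical.

```cpp
#include <sys/select.h>  // defines FD_SETSIZE and fd_set on POSIX systems

// Hypothetical helper: true if `fd` may legally be stored in an fd_set.
bool fits_in_fd_set(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}

// Hypothetical wrapper: adds `fd` to `set` only when it is in range,
// instead of letting FD_SET write outside the fixed-size bitmap.
bool checked_fd_set(int fd, fd_set* set) {
    if (!fits_in_fd_set(fd)) return false;
    FD_SET(fd, set);
    return true;
}
```

A caller would treat a false return from checked_fd_set as the signal to raise an error or switch to poll(2) rather than calling select.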

Windows: redirect ReadFile to run process and pipe its stdout

I was wondering how hard it would be to create a set-up under Windows where a regular ReadFile on certain files is redirected by the file system to actually run (e.g. ShellExecute) those files, with the new process's stdout then used as the file content streamed back to the caller of ReadFile...
What I envision the set-up to look like, is that you can configure it to denote a certain folder as 'special', and that this extra functionality is then only available on that folder's content (so it doesn't need to be disk-wide). It might be accessible under a new drive letter, or a path parallel to the source folder; the location it is hooked up to is irrelevant to me.
To those of you that wonder if this is a classic xy problem: it might very well be ;) It's just that this idea has intrigued me, and I want to know what possibilities there are. In my particular case I want to employ it to #include content in my C++ code base, where the actual content included is being made up on the spot, different on each compile round. I could of course also create a script to create such content to include, call it as a pre-build step and leave it at that, but why choose the easy route.
Maybe there are already ready-made solutions for this? I did an extensive Google search for it, but came up empty-handed. But then I'm not sure I know all the keywords involved to do a good search...
When coding up something myself, I think a minifilter driver might be needed intercepting ReadFile calls, but then it must at that spot run usermode apps from kernel space - not a happy marriage I assume. Or use an existing file system driver framework that allows for usermode parts, but I found the price of existing solutions to be too steep for my taste (several thousand dollars).
And I also assume that a standard file system (minifilter) driver might be required to return a consistent file size for such files, although the actual data size returned through ReadFile would of course differ on each call. Not to mention negating any buffering that takes place.
All in all I think that a create-it-yourself solution will take quite some effort, especially when you have never done Windows driver development in your life :) Although I see myself as quite capable of reading up on it, the time invested will be prohibitive, I think.
Another approach might be to hook ReadFile calls from the process doing the ReadFile, via IAT hooking or via code injection. But I want this solution to work more 'out-of-the-box', i.e. all ReadFile requests for these special files trigger the correct behavior, regardless of origin. In my case I'd need to intercept my C++ compiler's (G++) behavior, but that one is called on the fly by the IDE, so I see no easy way to detect its startup and hook it up quickly before it does its ReadFiles. And besides, I only want certain files to be special in this regard; intercepting all ReadFiles for a certain process is overkill.
You want something like FUSE (which I have used with profit many times), but for Windows. Apparently there's Dokan; I've never used it, but it seems to be well known enough (and, at the very least, can be used as inspiration to see "how it's done").

Is there an OS-agnostic way to verify that a file isn't being written to or opened by another process?

Wondering if there is a way to validate at runtime that a file isn't being written to or hasn't been opened by another process. Preferably a way that would work on all OSes.
Not in general.
The most ubiquitous general application-level mechanism for detecting and preventing use or alteration of a file that is being used by another process is file locking.
One reason there isn't a cross-platform solution is that some operating systems, for example Linux and most Unix variants, provide for cooperative locking, where file locks are advisory.
So, on those platforms, you can only guarantee knowledge of other process using a file where the other process is known in advance to be using a specific type of advisory lock.
Most of those platforms do have mandatory locking available. It is set on a per-file basis as part of the file attributes. There are some problems with this (e.g. race conditions).
So no, the underlying mechanisms that could provide the verification you seek are very different. It would probably be very troublesome to provide a reliable cross-platform mechanism in Go that would be guaranteed to work on a variety of popular platforms where other processes are or can be uncooperative.
References
https://www.kernel.org/doc/Documentation/filesystems/mandatory-locking.txt
https://unix.stackexchange.com/questions/244543/mandatory-locking-in-unix
That won't answer your question, but since we might be dealing with an XY problem here, I'd like to look at the problem from a point of view other than locking or otherwise detecting that the file is not being written to: the update-then-rename-over approach. It is the only sensible way to do atomic updates to files, and it is sadly not very well known among (novice) programmers.
Since the filesystem is inherently racy, to ensure proper "database-like" work with files, where everyone sees a consistent state of the file's contents, you have to use either locking or atomic updates or both.
To update the file's contents in an atomic way, you do this:
1. Read the file's data.
2. Open a temporary file (on the same filesystem).
3. Write the updated data into it.
4. Rename the new file over the old one.
Renaming is guaranteed to be atomic on all contemporary commodity OSes so that when a process tries to open a file, it opens either an old copy or the new one but not something in between.
On POSIX systems, Go's os.Rename() has always been atomic, since it ends up calling rename(2); on Windows this has been fixed since Go 1.5.
Note that this approach merely provides consistency of the file's contents
in the sense no two processes would ever end up updating it at the same time
but it does not ensure "serialized" updates which is only possible to ensure
through locking or other "side-channel" signaling.
That is, with atomic updates, it's still possible to have this situation:
Processes A and B read the file's data.
They both modify it and do atomic updates.
The file's contents will be consistent, but the state would be of whatever
process ended up calling the OS's renaming API function last.
So if you need serialization, you need locking.
I'm afraid that no cross-platform file locking solution exists for Go
(approaches to locking differ greatly even across Unix-y systems,
let alone Windows; see this for an entertaining read), but one way to do it is to use platform-specific
locking of the temporary file created in step 2 above.
The approach to update a file then changes to:
1. Open a temporary file with a well-known name.
   Say, if the file to update is named "foo.state", call it "foo.state.lock".
2. Lock it using any platform-specific locking.
   If locking fails, this means another process is updating the file,
   so back out or wait; this really depends on what you're after.
3. Once the lock is held, read the file's data.
4. Modify it, and re-write the locked temporary file with this data.
5. Rename the temporary file over the original one.
6. Close the temporary file and release the lock.
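The locked variant can be sketched as follows. This is an illustration in C++ rather than Go, assuming a POSIX system and using flock(2) as the "platform-specific locking"; the function name locked_update and its modify callback are hypothetical. The extra inode comparison is an addition not described in the answer: it guards against having locked a file that a previous updater has already renamed over the original.

```cpp
#include <fcntl.h>
#include <sys/file.h>   // flock(): POSIX/BSD-specific advisory locking
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>       // std::rename
#include <fstream>
#include <functional>
#include <iterator>
#include <string>

// Hypothetical helper: update `path` under an advisory lock. `modify` maps
// the old contents to the new contents. Returns false if the lock is held
// elsewhere or if any step fails.
bool locked_update(const std::string& path,
                   const std::function<std::string(const std::string&)>& modify) {
    const std::string lock_path = path + ".lock";

    // Steps 1-2: open the well-known temporary file and try to lock it.
    int fd = ::open(lock_path.c_str(), O_RDWR | O_CREAT, 0644);
    if (fd < 0) return false;
    if (::flock(fd, LOCK_EX | LOCK_NB) != 0) {  // another updater holds it
        ::close(fd);
        return false;
    }

    // Back out if the name no longer refers to the file we locked, i.e. a
    // previous updater renamed it away between our open() and flock().
    struct stat by_fd {}, by_name {};
    if (::fstat(fd, &by_fd) != 0 || ::stat(lock_path.c_str(), &by_name) != 0 ||
        by_fd.st_ino != by_name.st_ino || by_fd.st_dev != by_name.st_dev) {
        ::close(fd);
        return false;
    }

    // Step 3: read the current data (a missing file reads as empty).
    std::ifstream in(path, std::ios::binary);
    const std::string old_data((std::istreambuf_iterator<char>(in)),
                               std::istreambuf_iterator<char>());

    // Step 4: rewrite the locked temporary file with the modified data.
    const std::string new_data = modify(old_data);
    bool ok = ::ftruncate(fd, 0) == 0;
    std::size_t written = 0;
    while (ok && written < new_data.size()) {
        const ssize_t n = ::write(fd, new_data.data() + written,
                                  new_data.size() - written);
        if (n < 0) ok = false;
        else written += static_cast<std::size_t>(n);
    }
    ok = ok && ::fsync(fd) == 0;

    // Step 5: rename the temporary file over the original.
    ok = ok && std::rename(lock_path.c_str(), path.c_str()) == 0;

    // Step 6: closing the descriptor releases the flock.
    ::close(fd);
    return ok;
}
```

Two successive calls that each append one character to the contents should leave the file holding both characters in call order. A caller that prefers to wait instead of backing out could drop LOCK_NB and retry when the function returns false.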

Providing a basic filesystem from a char driver

I have an existing Linux device driver that exposes a basic char device to userland. (I am not its original author, but I'm trying to modify it.)
Currently it provides a maze of ioctl functions to do various things (though also wrapped in a handy library so most user code doesn't need to deal with the details of it).
One of the things that it does is to provide a sub-stream interface, where given a bunch of device-specific identifying information (including a string and some numeric ids) it can read or write (but not both at once) some data (up to a small number of MB) in a strictly sequential manner. Currently it does this with explicit ioctls.
I'm wondering if there is a way to leverage the existing file_operations infrastructure or similar to provide either a virtual filesystem or just an ioctl that can return a new already-open fd that can then be used with read/write/close (but not lseek) from userland as you'd normally expect?
The device does have a concept of a filename (that's the string) but it is not possible to enumerate existing valid filenames (only to try to open a specific filename and see if it gives an error or not), and the filename is not sufficient to open a stream by itself, which is why I'm currently leaning more towards the "special open" ioctl on the parent device rather than trying to expose things directly in some userland-visible fs that can be opened directly. (Also there's no concept of subdirs and only basic write-protect permissions, so a full fs seems like overkill anyway.) But I'm willing to be persuaded otherwise if there's a better way to do it.
I have written basic char drivers from scratch myself before, so I'm reasonably confident that I can get the read/write ops and other supporting things to work; I'm just not sure how to best handle that initial step of opening the handle.
I'm currently targeting kernel 3.2+.
Edit: The main reason that I think making an actual filesystem (or trying to expose it via procfs or sysfs) wouldn't work is that there's no way to populate a directory -- the only ops available are "open for read" and "open for write", and there's no way to tell which names are valid prior to the open attempt (the files are stored in external hardware and accessed via a protocol I cannot change). If I'm missing something and it is possible to support this sort of thing, that would be useful to know as well.
You can most certainly create a file system where readdir() is not implemented, but the open() method is. It's normally not done because it's not particularly user-friendly, but it certainly is doable.
You're targeting really ancient kernels if you're looking at 3.2. The upstream kernel developers don't even bother to try to backport security fixes that far back, so I certainly wouldn't recommend shipping something as ancient as 3.2, but it's technically doable.
All you need to do is implement the lookup() method in the inode_operations structure for directories. You'll need to figure out some way of creating inodes with unique inode numbers that contain private information, so you can identify the substream. The inode will have a file_operations structure that implements the read/write methods for reading and writing the substream.
You can try looking at a simple file system such as cramfs or minix to see how things are done.

NtOpenSection(L"\\Device\\PhysicalMemory") returns STATUS_OBJECT_NAME_NOT_FOUND

I am implementing SMBIOS reading functionality for Windows systems. As API levels vary, there are several methods to support:
1. trouble-free GetSystemFirmwareTable('RSMB'), available on Windows Server 2003 and later;
2. hardcore NtOpenSection(L"\\Device\\PhysicalMemory") for legacy systems up to and including Windows XP;
3. essential WMI data in the L"Win32_ComputerSystemProduct" path through cumbersome COM automation calls as a fallback.
Methods 1 and 3 are already implemented, but I am stuck with \Device\PhysicalMemory, as NtOpenSection always yields 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND) — definitely not one of the possible result codes in the ZwOpenSection documentation. Of course, I am aware that accessing this section is prohibited starting from Windows Server 2003sp1 and perhaps Windows XP-64 as well, so I am trying this on a regular Windows XP-32 system — and the outcome is no different to that of a Windows 7-64, for example. I am also aware that administrator rights may be required even on legacy systems, but people on the internets having faced this issue reported more relevant error codes for such scenario, like 0xC0000022 (STATUS_ACCESS_DENIED) and 0xC0000005 (STATUS_ACCESS_VIOLATION).
My approach is based on the Libsmbios library by Dell, which I assume to be working.
UNICODE_STRING wsMemoryDevice;
OBJECT_ATTRIBUTES oObjAttrs;
HANDLE hMemory;
NTSTATUS ordStatus;
RtlInitUnicodeString(&wsMemoryDevice, L"\\Device\\PhysicalMemory");
InitializeObjectAttributes(&oObjAttrs, &wsMemoryDevice,
OBJ_CASE_INSENSITIVE, NULL, NULL);
ordStatus = NtOpenSection(&hMemory, SECTION_MAP_READ, &oObjAttrs);
if (!NT_SUCCESS(ordStatus)) goto Finish;
I thought it could be possible to debug this, but native API seems to be transparent to debuggers like OllyDbg: the execution immediately returns once SYSENTER instruction receives control. So I have no idea why Windows cannot find this object. I also tried changing the section name, as there are several variants in examples available online, but that always yields 0xC0000033 (STATUS_OBJECT_NAME_INVALID).
Finally, I found the cause of this strange behavior, thanks to you people confirming that my code snippet (it was an actual excerpt, not a forged example) really works. The problem was that I did not have the Windows DDK installed initially (I do now, but still cannot integrate it with Visual Studio the way the Windows SDK integrates automatically), so I had to write the definitions by hand. In particular, when I realized that InitializeObjectAttributes is actually a preprocessor macro rather than a Win32 function, I defined RtlInitUnicodeString as a macro too, since its effect is even simpler. However, I was not careful enough to notice that UNICODE_STRING.Length and .MaximumLength are in fact the content size and buffer size rather than lengths, i.e. numbers of bytes rather than numbers of characters. Consequently, my macro was setting the fields to half of their expected values, making Windows see only the first half of the L"\\Device\\PhysicalMemory" string, with the obvious outcome.
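To make the byte-versus-character distinction concrete, here is a small portable sketch. UnicodeStringSketch is a hypothetical stand-in for the real UNICODE_STRING (it uses char16_t because Windows WCHAR is a 16-bit code unit), and init_unicode_string only imitates the documented behavior of RtlInitUnicodeString for a null-terminated source string; it is not the DDK definition.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the Windows UNICODE_STRING layout.
struct UnicodeStringSketch {
    std::uint16_t Length;         // bytes in use, excluding the terminator
    std::uint16_t MaximumLength;  // bytes available in Buffer
    const char16_t* Buffer;
};

// Number of UTF-16 code units before the terminating zero.
constexpr std::size_t unit_count(const char16_t* s) {
    std::size_t n = 0;
    while (s[n] != u'\0') ++n;
    return n;
}

// Both fields are byte counts: code units multiplied by sizeof(char16_t),
// with MaximumLength also covering the terminating zero.
UnicodeStringSketch init_unicode_string(const char16_t* s) {
    const std::size_t bytes = unit_count(s) * sizeof(char16_t);
    return {static_cast<std::uint16_t>(bytes),
            static_cast<std::uint16_t>(bytes + sizeof(char16_t)),
            s};
}
```

For u"\\Device\\PhysicalMemory" the string has 22 code units, so Length is 44 and MaximumLength is 46. A macro that stored 22 in Length would describe only the first 11 characters, which matches the failure described above.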

Resources