Find Out If Two HANDLEs are Hardlinks to the Same File

(This question is a toughie... it might require knowledge of NTFS and/or the use of NT Native APIs; be warned.) :)
If I'm given two HANDLEs to two files, how can I definitively (not just with high probability) find out if the two HANDLEs belong to the exact same file and stream on the disk?
This means, for example, checking the 8-byte NTFS file IDs isn't enough, because two HANDLEs with the same file ID can be pointing to different data streams of the same file, and I need to find out if the two streams are really the same and only differ by the name (hardlink).
(What's the use? This way, if I want to perform an operation on all files inside a folder, I don't do the operation twice on the same data stream with different names.)

This requires GetFileInformationByHandleEx(), asking for FileStreamInfo. That returns the stream name.
This warning in the SDK docs should be noted:
Certain file information classes behave slightly differently on different operating system releases. These classes are supported by the underlying drivers, and any information they return is subject to change between operating system releases.
Avoid relying on recovering info that is (or should be) readily available in your program.
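For the use case in the question (not processing the same data twice under different names), a file-level check can be sketched in Go with os.SameFile. This is only an illustration under stated assumptions: the helper names uniqueFiles and exampleUniqueFiles are made up for this sketch, and os.SameFile compares file identity only, so it does not tell alternate data streams of one file apart.

```go
package main

import (
	"os"
	"path/filepath"
)

// uniqueFiles lists the regular files directly inside dir, returning one path
// per underlying file so that hardlinked names are only visited once.
func uniqueFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir) // entries come back sorted by file name
	if err != nil {
		return nil, err
	}
	var seen []os.FileInfo
	var paths []string
	for _, e := range entries {
		p := filepath.Join(dir, e.Name())
		info, err := os.Lstat(p)
		if err != nil {
			return nil, err
		}
		if !info.Mode().IsRegular() {
			continue
		}
		dup := false
		for _, s := range seen { // linear scan: fine for a sketch, O(n^2) overall
			if os.SameFile(s, info) {
				dup = true
				break
			}
		}
		if !dup {
			seen = append(seen, info)
			paths = append(paths, p)
		}
	}
	return paths, nil
}

// exampleUniqueFiles builds a scratch directory holding a.txt, a hardlink
// b.txt to it, and an unrelated c.txt, then counts the distinct files.
func exampleUniqueFiles() (int, error) {
	dir, err := os.MkdirTemp("", "samefile")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	a := filepath.Join(dir, "a.txt")
	if err := os.WriteFile(a, []byte("data"), 0o644); err != nil {
		return 0, err
	}
	if err := os.Link(a, filepath.Join(dir, "b.txt")); err != nil {
		return 0, err
	}
	if err := os.WriteFile(filepath.Join(dir, "c.txt"), []byte("data"), 0o644); err != nil {
		return 0, err
	}
	paths, err := uniqueFiles(dir)
	return len(paths), err
}
```

Distinguishing streams within one file would still need the stream name on top of this, for example via FileStreamInfo as described above.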

Related

GetFileInformationByHandleEx/FileIdInfo vs DeviceIoControl/FSCTL_CREATE_OR_GET_OBJECT_ID for OpenFileById

Recently I've stumbled upon "If you want to use GUIDs to identify your files, then nobody's stopping you" article by Raymond Chen and wanted to implement this method. But then I found that there is another way to get file ID and this is GetFileInformationByHandleEx with FILE_INFO_BY_HANDLE_CLASS::FileIdInfo and using the FileId field (128 bit).
I tried both, and both methods work as expected, but I have a few questions I cannot find answers to:
These methods return different IDs (and the ID from GetFileInformationByHandleEx seems to use only the low 64 bits, leaving the high part as zero). What does each of them represent? Are they essentially the same thing, or two independent mechanisms to achieve the same goal?
Edit: Actually I've just found some information. So the ObjectID from DeviceIoControl is the NTFS object ID, but what is the other ID then? How do they relate (if at all)? Are both methods available only on NTFS, or will at least one of them work on FAT16/32, exFAT, etc.?
The documentation for FILE_INFO_BY_HANDLE_CLASS::FileIdInfo doesn't say that the ID may not exist, unlike FSCTL_CREATE_OR_GET_OBJECT_ID, where I need to explicitly state that I want the ID to be created if there isn't one already. Will there be any bad consequences if I just blindly request creation of object IDs for every file I work with?
I found a comment for this question that these IDs remain unchanged if a file is moved to another volume (logical or physical). I only tested the DeviceIoControl method, and the IDs indeed don't change across drives, but if I move the file I'm required to supply OpenFileById with the destination volume handle, otherwise it won't open the file. So, is there a way to make OpenFileById find a file without keeping the volume reference?
I'm thinking of enumerating all connected volumes and trying to open the file by ID on each until it succeeds, but I'm not sure how reliable this is. Could two equal IDs reference different files on different volumes?
How fast is it to ask the system to get (or create) an ID? Will it hurt performance if I add the ID query to regular file enumeration procedures, or would it be better to do that only on demand, when I really need it?

Is there an OS-agnostic way to verify that a file isn't being written to or opened by another process?

Wondering if there is a way to validate at runtime that a file isn't being written to or hasn't been opened by another process. Preferably a way that would work on all operating systems.

Not in general.
The most ubiquitous general application-level mechanism for detecting and preventing use or alteration of a file that is being used by another process is file locking.
One reason there isn't a cross-platform solution is that some operating systems, for example Linux and most Unix variants, provide cooperative locking, where file locks are advisory.
So, on those platforms, you can only be sure that another process is using a file when that process is known in advance to use a specific type of advisory lock.
Most of those platforms do have mandatory locking available. It is set on a per-file basis as part of the file attributes. There are some problems with this (e.g. race conditions).
So no, the underlying mechanisms that could provide the verification you seek are very different. It would probably be very troublesome to provide a reliable cross-platform mechanism in Go that would be guaranteed to work on a variety of popular platforms where other processes are or can be uncooperative.
References
https://www.kernel.org/doc/Documentation/filesystems/mandatory-locking.txt
https://unix.stackexchange.com/questions/244543/mandatory-locking-in-unix

This won't answer your question directly, but since we might be dealing with an XY problem here, I'd like to look at it from a point of view other than locking or otherwise detecting that the file is not being written to: the update-then-rename-over approach. It is the only sensible way to do atomic updates to files, and it is sadly not well known among (novice) programmers.
Since the filesystem is inherently racy, to get proper "database-like" work with files, where everyone sees a consistent state of the file's contents, you have to use locking, atomic updates, or both.
To update the file's contents in an atomic way, you do this:
1. Read the file's data.
2. Open a temporary file (on the same filesystem).
3. Write the updated data into it.
4. Rename the new file over the old one.
Renaming is guaranteed to be atomic on all contemporary commodity OSes so that when a process tries to open a file, it opens either an old copy or the new one but not something in between.
On POSIX systems, Go's os.Rename() has always been atomic, since it ends up calling rename(2); on Windows this has been fixed since Go 1.5.
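The update-then-rename steps above can be sketched in Go roughly as follows. The helper names (writeFileAtomic, updateFileAtomic, exampleAtomicUpdate) are made up for illustration. The sketch assumes the temporary file can live in the same directory as the target, which keeps both on the same filesystem, and it leaves the new file with the 0600 mode that os.CreateTemp assigns.

```go
package main

import (
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temporary file next to path and then
// renames it over path, so readers see either the old or the new contents.
func writeFileAtomic(path string, data []byte) (err error) {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer func() {
		if err != nil { // on any failure, do not leave the temp file behind
			tmp.Close()
			os.Remove(tmp.Name())
		}
	}()
	if _, err = tmp.Write(data); err != nil {
		return err
	}
	if err = tmp.Sync(); err != nil { // flush data before it becomes visible
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// updateFileAtomic reads path, applies modify, and atomically replaces path.
// It does NOT serialize concurrent updaters: the last rename wins.
func updateFileAtomic(path string, modify func([]byte) []byte) error {
	old, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	return writeFileAtomic(path, modify(old))
}

// exampleAtomicUpdate creates foo.state containing "v1" in a scratch
// directory, appends "+v2" through updateFileAtomic, and returns the result.
func exampleAtomicUpdate() (string, error) {
	dir, err := os.MkdirTemp("", "atomic")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "foo.state")
	if err := os.WriteFile(path, []byte("v1"), 0o600); err != nil {
		return "", err
	}
	appendV2 := func(b []byte) []byte { return append(b, "+v2"...) }
	if err := updateFileAtomic(path, appendV2); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}
```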
Note that this approach merely provides consistency of the file's contents
in the sense no two processes would ever end up updating it at the same time
but it does not ensure "serialized" updates which is only possible to ensure
through locking or other "side-channel" signaling.
That is, with atomic updates, it's still possible to have this situation:
1. Processes A and B read the file's data.
2. They both modify it and do atomic updates.
The file's contents will be consistent, but the state would be of whatever
process ended up calling the OS's renaming API function last.
So if you need serialization, you need locking.
I'm afraid that no cross-platform file locking solution exists for Go
(anyway, approaches to locking differ greatly even across Unix-y systems,
let alone Windows; see this for an entertaining read), but one way to do it is to use platform-specific
locking of the temporary file created in step 2 above.
The approach to update a file then changes to:
1. Open a temporary file with a well-known name.
   Say, if the file to update is named "foo.state", call it "foo.state.lock".
2. Lock it using any platform-specific locking.
   If locking fails, this means another process is updating the file,
   so back out or wait; this really depends on what you're after.
3. Once the lock is held, read the file's data.
4. Modify it, re-write the temporary file being locked with this data.
5. Rename the temporary file over the original one.
6. Close the temp. file and release the lock.
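A Unix-only sketch of the locked update above, using flock(2) through syscall.Flock as the platform-specific lock (on Windows something like LockFileEx would be needed instead). The names updateLocked, errBusy and exampleLockedUpdate are invented for this example. One extra check, which the steps above do not mention, is an assumption of this sketch: after taking the lock, it verifies that the locked file is still the one at the ".lock" path, because another process may have renamed it away between our open and our lock.

```go
package main

import (
	"errors"
	"os"
	"path/filepath"
	"syscall"
)

// errBusy is returned when another updater holds (or just consumed) the lock file.
var errBusy = errors.New("another process is updating the file")

// updateLocked applies modify to the contents of path while holding an
// exclusive, non-blocking flock on path+".lock". Unix-only sketch.
func updateLocked(path string, modify func([]byte) []byte) error {
	lockPath := path + ".lock"
	f, err := os.OpenFile(lockPath, os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		return err
	}
	defer f.Close() // closing the descriptor also releases the flock

	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		if errors.Is(err, syscall.EWOULDBLOCK) {
			return errBusy
		}
		return err
	}
	// Make sure the file we locked is still the one named lockPath.
	held, err := f.Stat()
	if err != nil {
		return err
	}
	if cur, err := os.Stat(lockPath); err != nil || !os.SameFile(held, cur) {
		return errBusy
	}

	old, err := os.ReadFile(path)
	if err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	if err := f.Truncate(0); err != nil {
		return err
	}
	if _, err := f.WriteAt(modify(old), 0); err != nil {
		return err
	}
	if err := f.Sync(); err != nil {
		return err
	}
	return os.Rename(lockPath, path) // the locked temp file becomes the new data file
}

// exampleLockedUpdate holds the lock once to show the busy case, then updates
// foo.state from "1" to "1!" and returns what it observed.
func exampleLockedUpdate() (bool, string, error) {
	dir, err := os.MkdirTemp("", "locked")
	if err != nil {
		return false, "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "foo.state")
	if err := os.WriteFile(path, []byte("1"), 0o644); err != nil {
		return false, "", err
	}
	bang := func(b []byte) []byte { return append(b, '!') }

	holder, err := os.OpenFile(path+".lock", os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		return false, "", err
	}
	if err := syscall.Flock(int(holder.Fd()), syscall.LOCK_EX); err != nil {
		holder.Close()
		return false, "", err
	}
	busy := errors.Is(updateLocked(path, bang), errBusy) // lock is held elsewhere
	holder.Close()

	if err := updateLocked(path, bang); err != nil {
		return busy, "", err
	}
	got, err := os.ReadFile(path)
	return busy, string(got), err
}
```

Whether a caller should retry, wait, or give up on errBusy is left open here, as in the steps above.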

Providing a basic filesystem from a char driver

I have an existing Linux device driver that exposes a basic char device to userland. (I am not its original author, but I'm trying to modify it.)
Currently it provides a maze of ioctl functions to do various things (though also wrapped in a handy library so most user code doesn't need to deal with the details of it).
One of the things that it does is to provide a sub-stream interface, where given a bunch of device-specific identifying information (including a string and some numeric ids) it can read or write (but not both at once) some data (up to a small number of MB) in a strictly sequential manner. Currently it does this with explicit ioctls.
I'm wondering if there is a way to leverage the existing file_operations infrastructure or similar to provide either a virtual filesystem or just an ioctl that can return a new already-open fd that can then be used with read/write/close (but not lseek) from userland as you'd normally expect?
The device does have a concept of a filename (that's the string) but it is not possible to enumerate existing valid filenames (only to try to open a specific filename and see if it gives an error or not), and the filename is not sufficient to open a stream by itself, which is why I'm currently leaning more towards the "special open" ioctl on the parent device rather than trying to expose things directly in some userland-visible fs that can be opened directly. (Also there's no concept of subdirs and only basic write-protect permissions, so a full fs seems like overkill anyway.) But I'm willing to be persuaded otherwise if there's a better way to do it.
I have written basic char drivers from scratch myself before, so I'm reasonably confident that I can get the read/write ops and other supporting things to work; I'm just not sure how to best handle that initial step of opening the handle.
I'm currently targeting kernel 3.2+.
Edit: The main reason that I think making an actual filesystem (or trying to expose it via procfs or sysfs) wouldn't work is that there's no way to populate a directory -- the only ops available are "open for read" and "open for write", and there's no way to tell which names are valid prior to the open attempt (the files are stored in external hardware and accessed via a protocol I cannot change). If I'm missing something and it is possible to support this sort of thing, that would be useful to know as well.

You can most certainly create a file system where readdir() is not implemented, but the open() method is. It's normally not done because it's not particularly user-friendly, but it certainly is doable.
You're targeting really ancient kernels if you're looking at 3.2 -- the upstream kernel developers don't even bother to backport security fixes that far back, so I certainly wouldn't recommend shipping something as ancient as 3.2, but it's technically doable.
All you need to do is implement the lookup() method in the inode_operations structure for directories. You'll need to figure out some way of creating inodes with unique inode numbers that contain private information so you can identify the substream. The inode will have a file_operations structure that implements the read/write methods for reading and writing the substream.
You can try looking at a simple file system such as cramfs or minix to see how things are done.

How to determine a volume supports resolving bookmarks to renamed or moved files?

- bookmarkDataWithOptions:includingResourceValuesForKeys:relativeToURL:error:
Documentation states:
This method returns bookmark data that can later be
resolved into a URL object for a file even if the user moves or
renames it (if the volume format on which the file resides supports
doing so).
My question is, how can I query if a volume supports this feature?
From trial and error it seems only (internal?) hard drives support it, but I am looking for some kind of sure test like a NSURLVolumeSupports???Key.
NSURLVolumeSupportsPersistentIDsKey looks like a good candidate, but I failed to find any docs or google-info about it. Any hints?

It definitely sounds like the NSURLVolumeSupportsPersistentIDsKey would apply.
Following the hints in this forum thread here (archived version here), the documentation for the VOL_CAP_FMT_PERSISTENTOBJECTIDS volume capability flag (from man getattrlist(2)) says:
If this bit is set the volume format supports persistent object identifiers and can look up file system objects by their IDs. See ATTR_CMN_OBJPERMANENTID for details about how to obtain these identifiers.
and the common attribute ATTR_CMN_OBJPERMANENTID documentation says
An fsobj_id_t structure that uniquely and persistently identifies the file system object within its volume; persistence implies that this attribute is unaffected by mount/unmount operations on the volume.
Some file systems can not return this attribute when the volume is mounted read-only and will fail the request with error EROFS. (e.g. original HFS modifies on disk structures to generate persistent identifiers, and hence cannot do so if the volume is mounted read only.)

Is an atomic file rename (with overwrite) possible on Windows?

On POSIX systems rename(2) provides for an atomic rename operation, including overwriting of the destination file if it exists and if permissions allow.
Is there any way to get the same semantics on Windows? I know about MoveFileTransacted() on Vista and Server 2008, but I need this to support Win2k and up.
The key word here is atomic... the solution must not be able to fail in any way that leaves the operation in an inconsistent state.
I've seen a lot of people say this is impossible on win32, but I ask you, is it really?
Please provide reliable citations if possible.

See ReplaceFile() in Win32 (http://research.microsoft.com/pubs/64525/tr-2006-45.pdf)

Win32 does not guarantee atomic file metadata operations. I'd provide a citation, but there is none; the fact that there's no written or documented guarantee says as much.
You're going to have to write your own routines to support this. It's unfortunate, but you can't expect win32 to provide this level of service - it simply wasn't designed for it.

In Windows Vista and Windows Server 2008 an atomic move function has been added - MoveFileTransacted()
Unfortunately this doesn't help with older versions of Windows.
Interesting article here on MSDN.

Starting with Windows 10 1607, NTFS does support an atomic superseding rename operation. To do this call NtSetInformationFile(..., FileRenameInformationEx, ...) and specify the FILE_RENAME_POSIX_SEMANTICS flag.
Or equivalently in Win32 call SetFileInformationByHandle(..., FileRenameInfoEx, ...) and specify the FILE_RENAME_FLAG_POSIX_SEMANTICS flag.

You still have the rename() call on Windows, though I imagine the guarantees you want cannot be made without knowing the filesystem you're using - no guarantees if you're using FAT, for instance.
However, you can use MoveFileEx and use the MOVEFILE_REPLACE_EXISTING
and MOVEFILE_WRITE_THROUGH options. The latter has this description in MSDN:
Setting this value guarantees that a
move performed as a copy and delete
operation is flushed to disk before
the function returns. The flush occurs
at the end of the copy operation.
I know that's not necessarily the same as a rename operation, but I think it might be the best guarantee you'll get - if it does that for a file move, it should for a simpler rename.

The MSDN documentation avoids clearly stating which APIs are atomic and which are not, but Niall Douglas states in his CppCon 2015 talk that the only atomic function is SetFileInformationByHandle with FILE_RENAME_INFO.ReplaceIfExists set to true. It's available starting with Windows Vista / Server 2008.
Niall is the author of the highly complicated LLFIO library and is an expert in file system race conditions, so I believe that if you're writing an algorithm where atomicity is crucial, it's better to be safe than sorry and use the suggested function, even though nothing in ReplaceFile's description states it's not atomic.

A fair number of answers, but not the one I was expecting... I had the understanding (perhaps incorrectly) that MoveFile could be atomic provided that the proper stars aligned, the right flags were used, and the file system was the same for the source and the target. Otherwise, the operation would fall back to a [Copy->Delete]File.
Given that, I also had the understanding that MoveFile -- when it is atomic -- was just setting the file information, which could also be done via SetFileInformationByHandle.
Someone gave a talk called "Racing the Filesystem" which goes into some more depth about this. (about 2/3rds down they talk about atomic rename)

There is std::rename and starting with C++17 std::filesystem::rename.
With std::rename, what happens if the destination exists is implementation-defined:
If new_filename exists, the behavior is implementation-defined.
POSIX rename, however, is required to replace existing files atomically:
This rename() function is equivalent for regular files to that defined
by the ISO C standard. Its inclusion here expands that definition to
include actions on directories and specifies behavior when the new
parameter names a file that already exists. That specification
requires that the action of the function be atomic.
Thankfully, std::filesystem::rename requires that it behaves just like POSIX:
Moves or renames the filesystem object identified by old_p to new_p as
if by the POSIX rename
However, when I tried to debug, it appears that std::filesystem::rename as implemented by VS2019 (as of March 2020) simply calls MoveFileEx, which isn't atomic in some cases.
So, possibly, when all bugs in its implementation are fixed, we'll see portable atomic std::filesystem::rename.
