GetFileInformationByHandleEx/FileIdInfo vs DeviceIoControl/FSCTL_CREATE_OR_GET_OBJECT_ID for OpenFileById - windows

Recently I stumbled upon the article "If you want to use GUIDs to identify your files, then nobody's stopping you" by Raymond Chen and wanted to implement this method. But then I found that there is another way to get a file ID: GetFileInformationByHandleEx with FILE_INFO_BY_HANDLE_CLASS::FileIdInfo, using the FileId field (128-bit).
I tried both; both methods work as expected, but I have a few questions I cannot find any answers to:
These methods return different IDs (and the ID from GetFileInformationByHandleEx seems to use only the low 64 bits, leaving the high part zero). What does each of them represent? Are they essentially the same thing, or two independent mechanisms to achieve the same goal?
Edit: Actually, I've just found some information. So the ObjectID from DeviceIoControl is the NTFS object ID, but what is the other ID then? How do they relate (if at all)? Are both methods available only on NTFS, or will at least one of them work on FAT16/32, exFAT, etc.?
The documentation for FILE_INFO_BY_HANDLE_CLASS::FileIdInfo doesn't say the ID may not exist, unlike FSCTL_CREATE_OR_GET_OBJECT_ID, where I need to explicitly state that I want the ID to be created if there isn't one already. Will there be any bad consequences if I just blindly request creation of object IDs for every file I work with?
I found a comment on this question saying that these IDs remain unchanged if a file is moved to another volume (logical or physical). I only tested the DeviceIoControl method, but the IDs indeed don't change across drives. However, when I move the file, I'm required to supply OpenFileById with the destination volume handle, otherwise it won't open the file. So, is there a way to make OpenFileById find a file without keeping the volume reference?
I'm thinking of enumerating all connected volumes and trying to open the file by ID on each until one succeeds, but I'm not sure how reliable this is. Could two identical IDs exist that reference different files on different volumes?
How fast is it to ask the system to get (or create) an ID? Will it hurt performance if I add the ID query to regular file enumeration procedures, or should I do it only on demand, when I really need it?
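For concreteness, here is a minimal sketch (mine, not from the question) of the two mechanisms side by side, plus the try-every-volume idea from the last two points. The path is a placeholder, FileIdInfo needs Windows 8 or later, and error handling is trimmed:

```cpp
#include <windows.h>
#include <winioctl.h>
#include <cstring>
#include <cwchar>
#include <cstdio>

// Open a file by its NTFS object ID on one specific volume. OpenFileById
// always needs a handle on the volume (or a file on it) to search.
static HANDLE OpenByObjectId(HANDLE volume, const GUID& objectId)
{
    FILE_ID_DESCRIPTOR desc = {};
    desc.dwSize   = sizeof(desc);
    desc.Type     = ObjectIdType;
    desc.ObjectId = objectId;
    return OpenFileById(volume, &desc, GENERIC_READ, FILE_SHARE_READ, nullptr, 0);
}

int main()
{
    HANDLE file = CreateFileW(L"C:\\temp\\example.txt", GENERIC_READ,
                              FILE_SHARE_READ, nullptr, OPEN_EXISTING, 0, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    // Mechanism 1: the 128-bit ID (volume serial number + per-volume file ID).
    FILE_ID_INFO idInfo = {};
    GetFileInformationByHandleEx(file, FileIdInfo, &idInfo, sizeof(idInfo));

    // Mechanism 2: the NTFS object ID (a GUID), created here if missing.
    FILE_OBJECTID_BUFFER objIdBuf = {};
    DWORD bytes = 0;
    DeviceIoControl(file, FSCTL_CREATE_OR_GET_OBJECT_ID, nullptr, 0,
                    &objIdBuf, sizeof(objIdBuf), &bytes, nullptr);
    CloseHandle(file);

    GUID objectId;
    std::memcpy(&objectId, objIdBuf.ObjectId, sizeof(objectId));

    // "Which volume is the file on now?" -- try each mounted volume in turn.
    wchar_t volName[MAX_PATH];
    HANDLE find = FindFirstVolumeW(volName, MAX_PATH);
    if (find == INVALID_HANDLE_VALUE) return 1;
    do {
        // Drop the trailing backslash so CreateFileW opens the volume itself.
        size_t len = wcslen(volName);
        if (len && volName[len - 1] == L'\\') volName[len - 1] = L'\0';

        HANDLE volume = CreateFileW(volName, 0,
                                    FILE_SHARE_READ | FILE_SHARE_WRITE,
                                    nullptr, OPEN_EXISTING, 0, nullptr);
        if (volume != INVALID_HANDLE_VALUE) {
            HANDLE found = OpenByObjectId(volume, objectId);
            if (found != INVALID_HANDLE_VALUE) {
                wprintf(L"Found on %s\n", volName);
                CloseHandle(found);
            }
            CloseHandle(volume);
        }
    } while (FindNextVolumeW(find, volName, MAX_PATH));
    FindVolumeClose(find);
    return 0;
}
```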

Related

How to determine whether a volume supports resolving bookmarks to renamed or moved files?

- bookmarkDataWithOptions:includingResourceValuesForKeys:relativeToURL:error:
Documentation states:
This method returns bookmark data that can later be resolved into a URL object for a file even if the user moves or renames it (if the volume format on which the file resides supports doing so).
My question is, how can I query if a volume supports this feature?
From trial and error it seems only (internal?) hard drives support it, but I am looking for some kind of sure test, like an NSURLVolumeSupports???Key.
NSURLVolumeSupportsPersistentIDsKey looks like a good candidate, but I failed to find any docs or google-info about it. Any hints?
It definitely sounds like the NSURLVolumeSupportsPersistentIDsKey would apply.
Following the hints in this forum thread here (archived version here), the documentation for the VOL_CAP_FMT_PERSISTENTOBJECTIDS volume capability flag (from man getattrlist(2)) says:
If this bit is set the volume format supports persistent object identifiers and can look up file system objects by their IDs. See ATTR_CMN_OBJPERMANENTID for details about how to obtain these identifiers.
and the common attribute ATTR_CMN_OBJPERMANENTID documentation says
An fsobj_id_t structure that uniquely and persistently identifies the file system object within its volume; persistence implies that this attribute is unaffected by mount/unmount operations on the volume.
Some file systems can not return this attribute when the volume is mounted read-only and will fail the request with error EROFS. (e.g. original HFS modifies on disk structures to generate persistent identifiers, and hence cannot do so if the volume is mounted read only.)
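To make that concrete, here is a small sketch (my own, based on the man page, not from the answer) that asks getattrlist for the capabilities of whatever volume contains a given path and tests the persistent-object-IDs bit:

```cpp
#include <sys/attr.h>
#include <unistd.h>
#include <cstdio>

// Buffer layout for a getattrlist() reply: a length word followed by the
// attributes we asked for (just the volume capabilities here).
struct VolCapReply {
    u_int32_t               length;
    vol_capabilities_attr_t caps;
} __attribute__((aligned(4), packed));

bool VolumeHasPersistentIds(const char* anyPathOnVolume)
{
    attrlist request = {};
    request.bitmapcount = ATTR_BIT_MAP_COUNT;
    request.volattr     = ATTR_VOL_INFO | ATTR_VOL_CAPABILITIES;

    VolCapReply reply = {};
    if (getattrlist(anyPathOnVolume, &request, &reply, sizeof(reply), 0) != 0)
        return false;

    // A capability bit only means something if the volume marks it valid.
    u_int32_t valid     = reply.caps.valid[VOL_CAPABILITIES_FORMAT];
    u_int32_t supported = reply.caps.capabilities[VOL_CAPABILITIES_FORMAT];
    return (valid & supported & VOL_CAP_FMT_PERSISTENTOBJECTIDS) != 0;
}

int main()
{
    printf("root volume persistent IDs: %d\n", VolumeHasPersistentIds("/"));
    return 0;
}
```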

(why) is FSCTL_SET_OBJECT_ID dangerous?

NTFS files can have object IDs. These IDs can be set using FSCTL_SET_OBJECT_ID. However, the MSDN article says:
Modifying an object identifier can result in the loss of data from portions of a file, up to and including entire volumes of data.
But it doesn't go into any more detail. How can this result in loss of data? Is it talking about potential object id collisions in the file system, and does NTFS rely on them in some way?
Side note: I did some experimenting with this before I found that paragraph, and set the object IDs of some newly created files; here's hoping my file system's still intact.
I really don't think this can directly result in loss of data.
The only way I can imagine it being possible is if e.g. a backup program assumes that (1) every file has an Object Id, and (2) that the program is keeping track of all IDs at all times. In that case it might assume that an ID that is not in its database must refer to a file that should not exist, and it might delete the file.
Yeah, I know it sounds ridiculous, but that's the only way I can think of in which this might happen. I don't think you can lose data just by changing IDs.
They are used by the Distributed Link Tracking service, which enables client applications to track link sources that have moved. The link tracking service maintains its link to an object only by using these object identifiers (IDs).
So coming back to your question:
Is it talking about potential object id collisions in the file system?
I don't think so. Windows does provide the option to set object IDs using FSCTL_SET_OBJECT_ID, but that doesn't bring the risk of ID collisions:
Attempting to set an object identifier on an object that already has an object identifier will fail.
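For reference, "setting" an ID boils down to a single DeviceIoControl call. This is a hedged sketch of mine, not code from the answer; my understanding is the call may also require backup/restore privileges, and per the quote above it is expected to fail when the file already carries an object ID:

```cpp
#include <windows.h>
#include <winioctl.h>
#include <cstring>

// Attempt to stamp an object ID onto a file opened for writing attributes.
bool SetObjectId(HANDLE file, const GUID& id)
{
    FILE_OBJECTID_BUFFER buf = {};
    std::memcpy(buf.ObjectId, &id, sizeof(id));  // first 16 bytes are the ID

    DWORD bytes = 0;
    return DeviceIoControl(file, FSCTL_SET_OBJECT_ID, &buf, sizeof(buf),
                           nullptr, 0, &bytes, nullptr) != FALSE;
}
```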
.. and does NTFS rely on them in some way?
Yes. Object identifiers are used to track files and directories. An index of all object IDs is stored on the volume. Rename, backup, and restore operations preserve object IDs. However, copy operations do not preserve object IDs, because that would violate their uniqueness.
How can this result in loss of data?
You won't get into serious problems if you change (or rather set) the object ID of user-created files (as you did). However, if a user (knowingly or unknowingly) sets an object ID used by a shared object file/library, the change will not be reflected as is.
Since Windows doesn't want everyone (except developers) playing with crucial library files, it issues a generic warning:
Modifying an object identifier can result in the loss of data from portions of a file, up to and including entire volumes of data.
Bottom line: Change it if you know what you are doing.
There's another MSDN article on distributed link tracking and object identifiers.
Hope it helps!
EDIT:
Thanks to @Mehrdad for pointing this out. I didn't mean the object identifiers of DLLs themselves, but ones which they use internally.
OLEACC (a DLL) provides the Active Accessibility runtime and manages requests from Active Accessibility clients [source]. It uses the OBJID_QUERYCLASSNAMEIDX object identifier [source].

How can I find out how many GDI objects my process is allowed to create?

There's a registry key where I can check (and set) the currently configured GDI object quota for processes. However, if a user changes that registry key, the old value remains in effect until a reboot occurs. In my program, I need a way to determine, programmatically, how many more GDI objects I can create. Is there an API for getting GDI information for the current process? What about at the system level?
Always hard to prove the definite absence of an API, but this one is a 95% no-go. Lots of system settings are configured through the registry without an API to tweak it afterward.
Raymond Chen's typical response to questions like these is "if you want to know then you are doing something wrong". It applies here, the default quota of 10,000 handles is enormous.
If you want to find the current quota that matters to you, create GDI objects until that fails. Record that number. Then, destroy all of them.
If you feel like doing this on a regular basis to get an accurate number, you can do so. It's probably going to be fairly expensive though.
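As a sketch of that probe (my own illustration; it assumes pens count against the same quota as any other GDI object), note also that GetGuiResources reports the process's current GDI object count:

```cpp
#include <windows.h>
#include <vector>
#include <cstdio>

int main()
{
    // How many GDI objects is this process using right now?
    DWORD inUse = GetGuiResources(GetCurrentProcess(), GR_GDIOBJECTS);

    // Create pens until GDI refuses, which reveals the remaining headroom.
    std::vector<HPEN> pens;
    for (;;) {
        HPEN pen = CreatePen(PS_SOLID, 1, RGB(0, 0, 0));
        if (pen == nullptr) break;  // quota (or memory) exhausted
        pens.push_back(pen);
    }
    printf("in use: %lu, created %zu more before failing\n",
           inUse, pens.size());

    for (HPEN pen : pens)  // clean up everything we probed with
        DeleteObject(pen);
    return 0;
}
```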
Since Hans mentioned Raymond already, we should play his "Imagine if this were true" game. If this API - GetGDIObjectLimit or whatever - existed, what would it return? If the object count limit is 10000, then you expect it to return that right? So what happens when the system is low on memory? The API tells you a value which has no actual meaning. If you're getting close to 10000 GDI objects, you are doing something wrong and you should concentrate on fixing that.

Get Windows hardlink count without GetFileInformationByHandle()

Is there a way to get a file's hard link count on Windows without using GetFileInformationByHandle()?
MSDN says:
Depending on the underlying network features of the operating system and the type of server connected to, the GetFileInformationByHandle function may fail, return partial information, or full information for the given file.
In practice, retrieving the link count on a network share, whatever the Windows version at both ends, always returns 1. The only case where it works is when accessing a Samba share; looks like they forgot to duplicate the Windows bug/limitation. Also, returning "partial results" without telling you they are partial is pretty nice for an API call.
It may seem a little strange, but what about GetFileInformationByHandleEx? Its documentation doesn't contain the caveat you quoted above, so perhaps it has the smarts built in to handle some of the problems that GetFileInformationByHandle can have.
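For what it's worth, a sketch of that suggestion (the path is a placeholder): the FILE_STANDARD_INFO structure returned for the FileStandardInfo class carries a NumberOfLinks field.

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    HANDLE file = CreateFileW(L"C:\\temp\\example.txt", 0,
                              FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    FILE_STANDARD_INFO info = {};
    if (GetFileInformationByHandleEx(file, FileStandardInfo, &info, sizeof(info)))
        printf("NumberOfLinks: %lu\n", info.NumberOfLinks);

    CloseHandle(file);
    return 0;
}
```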
For that you can try FindFirstFileNameW and FindNextFileNameW.
Enumerating the names just to get a count isn't a great option, but it is another way.
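A sketch of what that enumeration could look like (my own; the path is a placeholder, and names longer than MAX_PATH would need a grow-the-buffer retry on ERROR_MORE_DATA):

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    wchar_t linkName[MAX_PATH];
    DWORD length = MAX_PATH;

    HANDLE find = FindFirstFileNameW(L"C:\\temp\\example.txt", 0,
                                     &length, linkName);
    if (find == INVALID_HANDLE_VALUE) return 1;

    int count = 1;       // the link name we just received
    length = MAX_PATH;   // reset: the API overwrites this on every call
    while (FindNextFileNameW(find, &length, linkName)) {
        ++count;
        length = MAX_PATH;
    }
    // ERROR_HANDLE_EOF here simply means we ran out of link names.
    FindClose(find);

    printf("hard link count: %d\n", count);
    return 0;
}
```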

How can one detect changes in a directory across program executions?

I am making a protocol, client, and server which provide file transfer functionality similar to FTP (among other features). One difference between my protocol and FTP is that I would like to store a copy of the remote server's directory structure in a local cache. The server will only be running on Windows (written in C++), so any applicable Win32 API calls would be appreciated. When initially connected, the client requests the immediate children (both files and directories, just like "ls" or "dir" with no options); then, when a user navigates into a directory, this step repeats with the new parent, as you might expect.
Of course, most of the time, if the same directory of a given server is requested twice by a client, the directory's contents will be the same. Therefore I would like to cache the results of each directory listing on the client. I would like a simple way of implementing this, but it needs to handle expiring cache entries when file/directory access and modification times or names change, which is the tricky part. Ideally I would like something that enables almost instant directory listings by the client, such as a hash which takes into account not only file contents but also changes to subdirectories' contents: filenames, data, and modification and access dates.
This is NOT something that can rely completely on FileSystemWatcher (or similar) objects, because it needs to maintain the cache even if the program is only run occasionally. Of course those would be nice to help maintain the cache, but that's only part of the problem.
My best(?) idea so far is using FindFirstFile() and FindNextFile(), then sorting (somehow), concatenating, and hashing the values found in the WIN32_FIND_DATA structs (maybe with file contents), and using that as a token for expiration (just to indicate a change in any of these fields). Then I would keep one such token per directory. When a directory is requested, the server would hash everything and compare that to the cached hash provided by the client; if it's different, it returns the normal data, otherwise an HTTP 304 equivalent. Is there a less elaborate way of doing something like this? Does "directory last modified date" take into account every one of its subdirectories' files' modification dates under all circumstances? I'm sure the built-in Windows indexing service has something like this, but ideally I wouldn't need to rely on it.
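For what it's worth, here is a rough sketch of that token (my own, hypothetical): fold each entry's name, size, and last-write time into a single FNV-1a hash per directory, deliberately leaving file contents out. (In my experience, a directory's NTFS last-write time only reflects changes to its immediate children, not to deeper descendants, so a recursive per-directory scheme like this is still needed.)

```cpp
#include <windows.h>
#include <algorithm>
#include <cstdint>
#include <cwchar>
#include <string>
#include <vector>

static void FnvMix(uint64_t& h, const void* data, size_t len)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    for (size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;  // 64-bit FNV-1a prime
    }
}

// One change token per directory: any rename, resize, or rewrite of an
// immediate child produces a different value.
uint64_t DirectoryToken(const std::wstring& dir)
{
    WIN32_FIND_DATAW fd;
    HANDLE find = FindFirstFileW((dir + L"\\*").c_str(), &fd);
    if (find == INVALID_HANDLE_VALUE) return 0;

    std::vector<WIN32_FIND_DATAW> entries;
    do {
        if (wcscmp(fd.cFileName, L".") != 0 && wcscmp(fd.cFileName, L"..") != 0)
            entries.push_back(fd);
    } while (FindNextFileW(find, &fd));
    FindClose(find);

    // Sort by name so the token doesn't depend on enumeration order.
    std::sort(entries.begin(), entries.end(),
              [](const WIN32_FIND_DATAW& a, const WIN32_FIND_DATAW& b) {
                  return wcscmp(a.cFileName, b.cFileName) < 0;
              });

    uint64_t h = 14695981039346656037ULL;  // 64-bit FNV-1a offset basis
    for (const auto& e : entries) {
        FnvMix(h, e.cFileName, wcslen(e.cFileName) * sizeof(wchar_t));
        FnvMix(h, &e.nFileSizeHigh, sizeof(e.nFileSizeHigh));
        FnvMix(h, &e.nFileSizeLow, sizeof(e.nFileSizeLow));
        FnvMix(h, &e.ftLastWriteTime, sizeof(e.ftLastWriteTime));
    }
    return h;
}
```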
Because this service is for file sharing, something involving hashes would be especially nice so that I could automatically and efficiently find other people sharing a given file, but that's less of a concern than hosing the disk during the hash calculation.
I'm wondering what others more experienced with programming than I am would do to solve this problem (rsync and Subversion have solved similar problems, but not identical ones).
You're asking a lot of a File System Implementation of Very Little Brain (with apologies to A. A. Milne).
This is actually well-trammeled ground and you'd do well to look at the existing literature on distributed filesystems. AFS comes to mind as an example of a very well studied approach.
I doubt you'll be able to come up with something useful and accurate without doing some serious homework. Put another way, 'twould be folly to ignore all the prior art.
