How to determine a volume supports resolving bookmarks to renamed or moved files? - cocoa

- bookmarkDataWithOptions:includingResourceValuesForKeys:relativeToURL:error:
Documentation states:
This method returns bookmark data that can later be
resolved into a URL object for a file even if the user moves or
renames it (if the volume format on which the file resides supports
doing so).
My question is, how can I query if a volume supports this feature?
From trial and error it seems only (internal?) hard drives support it, but I am looking for some kind of sure test like a NSURLVolumeSupports???Key.
NSURLVolumeSupportsPersistentIDsKey looks like a good candidate, but I failed to find any docs or google-info about it. Any hints?

It definitely sounds like the NSURLVolumeSupportsPersistentIDsKey would apply.
Following the hints in this forum thread here (archived version here), the documentation for the VOL_CAP_FMT_PERSISTENTOBJECTIDS volume capability flag (from man getattrlist(2)) says:
If this bit is set the volume format supports persistent object identifiers and can look up file system objects by their IDs. See ATTR_CMN_OBJPERMANENTID for details about how to obtain these identifiers.
and the common attribute ATTR_CMN_OBJPERMANENTID documentation says
An fsobj_id_t structure that uniquely and persistently identifies the file system object within its volume; persistence implies that this attribute is unaffected by mount/unmount operations on the volume.
Some file systems can not return this attribute when the volume is mounted read-only and will fail the request with error EROFS. (e.g. original HFS modifies on disk structures to generate persistent identifiers, and hence cannot do so if the volume is mounted read only.)

Related

GetFileInformationByHandleEx/FileIdInfo vs DeviceIoControl/FSCTL_CREATE_OR_GET_OBJECT_ID for OpenFileById

Recently I've stumbled upon "If you want to use GUIDs to identify your files, then nobody's stopping you" article by Raymond Chen and wanted to implement this method. But then I found that there is another way to get file ID and this is GetFileInformationByHandleEx with FILE_INFO_BY_HANDLE_CLASS::FileIdInfo and using the FileId field (128 bit).
I tried both, both methods works as expected but I have a few questions I cannot find any answers to:
These methods return different IDs (and the id from GetFileInformationByHandleEx seems to use only the low 64 bit leaving the high part as zero). What each of them represent? Are they essentially the same thing or just two independent mechanisms to achieve the same goal?
Edit: Actually I've just found some information. So the ObjectID from DeviceIoControl is NTFS object ID but what is the other ID then? How do they relate (if at all)? Are both methods available only on NTFS or at least one of them will work on FAT16/32, exFAT, etc?
Documentation for FILE_INFO_BY_HANDLE_CLASS::FileIdInfo doesn't tell us that the ID may not exist unlike FSCTL_CREATE_OR_GET_OBJECT_ID where I need to explicitly state that I want the ID to be created if there isn't one already. Will it have any bad consequences if I'd just blindly request creation of object IDs for any file I'll be working with?
I found a comment for this question that these IDs remain unchanged if a file is moved to another volume (logical or physical). I did test only the DeviceIoControl method but they indeed don't chnage across drives but if I do move the file I'm required to supply OpenFileById with the destination volume handle, otherwise it won't open the file. So, is there a way to make OpenFileById find a file without keeping the volume reference?
I'm thinking of enumerating all connected volumes to try to open the file by ID for each until it succeed but I'm not sure how reliable is this. Could it be that there could exist two equal IDs that reference different files on different volumes?
How fast it is to ask system to get (or create) an ID? Will it hurt performance if I add the ID query to regular file enumeration procedures or I'd better to do that only on demand when I really need this?

Providing a basic filesystem from a char driver

I have an existing Linux device driver that exposes a basic char device to userland. (I am not its original author, but I'm trying to modify it.)
Currently it provides a maze of ioctl functions to do various things (though also wrapped in a handy library so most user code doesn't need to deal with the details of it).
One of the things that it does is to provide a sub-stream interface, where given a bunch of device-specific identifying information (including a string and some numeric ids) it can read or write (but not both at once) some data (up to a small number of MB) in a strictly sequential manner. Currently it does this with explicit ioctls.
I'm wondering if there is a way to leverage the existing file_operations infrastructure or similar to provide either a virtual filesystem or just an ioctl that can return a new already-open fd that can then be used with read/write/close (but not lseek) from userland as you'd normally expect?
The device does have a concept of a filename (that's the string) but it is not possible to enumerate existing valid filenames (only to try to open a specific filename and see if it gives an error or not), and the filename is not sufficient to open a stream by itself, which is why I'm currently leaning more towards the "special open" ioctl on the parent device rather than trying to expose things directly in some userland-visible fs that can be opened directly. (Also there's no concept of subdirs and only basic write-protect permissions, so a full fs seems like overkill anyway.) But I'm willing to be persuaded otherwise if there's a better way to do it.
I have written basic char drivers from scratch myself before, so I'm reasonably confident that I can get the read/write ops and other supporting things to work; I'm just not sure how to best handle that initial step of opening the handle.
I'm currently targeting kernel 3.2+.
Edit: The main reason that I think making an actual filesystem (or trying to expose it via procfs or sysfs) wouldn't work is that there's no way to populate a directory -- the only ops available are "open for read" and "open for write", and there's no way to tell which names are valid prior to the open attempt (the files are stored in external hardware and accessed via a protocol I cannot change). If I'm missing something and it is possible to support this sort of thing, that would be useful to know as well.
You can most certainly create a file system where readdir() is not implemented, but the open() method is. It's normally not done because it's not particularly user-friendly, but it certainly is doable.
You're targetting really ancient kernels if you're looking at 3.2 -- the upstream kernel developers aren't even bother to try to backport security fixes that far back, so I certainly wouldn't recommend shipping something as ancient as 3.2, but it's technically doable.
All you need to do is to implement lookup() method in the inode_operations structure for directories. You'll need to figure out some way of creating inodes with unique inode numbers, that contains private information so you can identify the subtream. The inode will have a file_operations structure that implements the read/write methods for reading and writing the substream.
You can try looking at a simple file system such as cramfs or minix to see how things are done.

Why do WebDAV implementations not support GETing a folder

RFC 2518 states:
The semantics of GET are unchanged when applied to a collection,
since GET is defined as, "retrieve whatever information (in the form
of an entity) is identified by the Request-URI" [RFC2068]. GET when
applied to a collection may return the contents of an "index.html"
resource, a human-readable view of the contents of the collection, or
something else altogether. Hence it is possible that the result of a
GET on a collection will bear no correlation to the membership of the
collection.
As a user of owncloud I often find myself suffering from the low performance of an initial sync of a folder containing lots of small files (See owncloud bugtracker for others reporting the same issue). After some investigation I came to the conclusion that the culprit is the underlying WebDAV implementation, which yields an index.html for a collection and thus forces the client to issue a GET request for each file. Since each GET causes a significant overhead (in the order of several hundreds of ms), the whole operation never uses the available bandwidth and is perceived as agonizingly slow.
So what is the reason that widely used WebDAV implementations do not allow a client to download a whole folder at a time? The specification does not explicitly forbid it. Surely this would increase performance, so I guess there must be some technical reason to this limitation.
The specification does not explicitly forbid it.
It does not forbid it, but it does not even remotely suggests that it's a something that the implementations should do. All the examples given are about retrieving a list or index of contents, not the contents itself.
Moreover, even if the server implementation chooses to support retrieving contents of a collection, there's no specification for format of that (how to package individual files into one download). So such implementation would be proprietary and your WebDAV client won't support it anyway.

(why) is FSCTL_SET_OBJECT_ID dangerous?

NTFS files can have object ids. These ids can be set using FSCTL_SET_OBJECT_ID. However, the msdn article says:
Modifying an object identifier can result in the loss of data from portions of a file, up to and including entire volumes of data.
But it doesn't go into any more detail. How can this result in loss of data? Is it talking about potential object id collisions in the file system, and does NTFS rely on them in some way?
Side node: I did some experimenting with this before I found that paragraph, and set the object id's of some newly created files, here's hoping that my file system's still intact.
I really don't think this can directly result in loss of data.
The only way I can imagine it being possible is if e.g. a backup program assumes that (1) every file has an Object Id, and (2) that the program is keeping track of all IDs at all times. In that case it might assume that an ID that is not in its database must refer to a file that should not exist, and it might delete the file.
Yeah, I know it sounds ridiculous, but that's the only way I can think of in which this might happen. I don't think you can lose data just by changing IDs.
They are used by distributed link tracking service which enables client applications to track link sources that have moved. The link tracking service maintains its link to an object only by using these object identifier (ID).
So coming back to your question,
Is it talking about potential object id collisions in the file system
?
I dont think so. Windows does provides us the option to set the object IDs using FSCTL_SET_OBJECT_ID but that doesnt bring the risk of ID collision.
Attempting to set an object identifier on an object that already has an object identifier will fail.
.. and does NTFS rely on them in some way?
Yes. Object identifiers are used to track files and directories. An index of all object IDs is stored on the volume. Rename, backup, and restore operations preserve object IDs. However, copy operations do not preserve object IDs, because that would violate their uniqueness.
How can this result in loss of data?
You wont get into a serious problem if you change(or rather set) object ID of user-created files(as you did). However, if a user(knowingly/unknowingly) sets object ID used by a shared object file/library, change will not be reflected as is.
Since Windows doesnt want everyone(but developers) to play with crutial library files, it issues a generic warning:
Modifying an object identifier can result in the loss of data from
portions of a file, up to and including entire volumes of data.
Bottom line: Change it if you know what you are doing.
There's another msn article on distributed link tracking and object identifiers.
Hope it helps!
EDIT:
Thanks to #Mehrdad for pointing out.I didnt mean object identifiers of DLLs themselves but ones which they use internally.
OLEACC(a dll), provides the Active Accessibility runtime and manages requests from Active Accessibility clients[source]. It use OBJID_QUERYCLASSNAMEIDX object identifier [ source ]

Find Out If Two HANDLEs are Hardlinks to the Same File

(This question is a toughie... it might require knowledge of NTFS and/or the use of NT Native APIs; be warned.) :)
If I'm given two HANDLEs to two files, how can I definitively (not just with high probability) find out if the two HANDLEs belong to the exact same file and stream on the disk?
This means, for example, checking the 8-byte NTFS file IDs isn't enough, because two HANDLEs with the same file ID can be pointing to different data streams of the same file, and I need to find out if the two streams are really the same and only differ by the name (hardlink).
(What's the use? This way, if I want to perform an operation on all files inside a folder, I don't do the operation twice on the same data stream with different names.)
This requires GetFileInformationByHandleEx(), asking for FileStreamInfo. That returns the stream name.
This warning in the SDK docs should be noted:
Certain file information classes
behave slightly differently on
different operating system releases.
These classes are supported by the
underlying drivers, and any
information they return is subject to
change between operating system
releases.
Avoid relying on recovering info that is (or should be) readily available in your program.

Resources