Related
I'm playing around with rb-kqueue on OS X and I can't work out how to get the extend flag to fire.
From the OpenBSD kqueue man page (which also matches the OS X man page):
NOTE_EXTEND The file referenced by the descriptor was extended
Which is very circular.
The FreeBSD kqueue man page has:
NOTE_EXTEND For regular file, the file referenced by the descriptor
was extended. For directory, reports that a directory entry was
added or removed, as the result of rename operation. The NOTE_EXTEND
event is not reported when a name is changed inside the directory.
This is a lot more descriptive, however, I've been running a kqueue on a directory and try as I might I cannot get the extend flag to fire. I've tried mv, rename, xattr (because searching for "extended" brings back lots of results on extended attributes), adding sub dirs and files with mkdir and touch and redirects but nothing results in the extend flag being part of an event, just write and/or link.
Hence I'm confused as to what extend really is. Is it just because I'm running it on OS X?
I hope to shed some light on this issue.
The flag NOTE_EXTEND corresponds to the action taken on a file. In your case you should trigger the action when a file size has increased.
The reasoning
To quote the original paper - Kquote: A generic and scalable event notification facility:
The fflags field is used to specify which actions on the descriptor
the application is interested in on registration, and upon return,
which actions have occurred. The possible actions are:
NOTE DELETE
NOTE WRITE
NOTE EXTEND
NOTE ATTRIB
NOTE LINK
NOTE RENAME
These correspond to the actions that the filesystem performs on the
file and thus will not be explained here. These notes may be OR-d
together in the returned kevent, if multiple actions have occurred.
E.g.: a file was written, then renamed. The final general purpose
filter is the PROC filter, which detects process changes. For this
filter, the ident field is interpreted as a process identifier. This
filter can watch for several types of events, and the fflags that
control this filter are outlined in Figure 3
The Figure 3 for EVFILT PROC:
Input/Output Flags:
NOTE EXIT Process exited.
NOTE FORK Process called
fork()
NOTE EXEC Process executed a new process via execve(2) or
similar call.
NOTE TRACK Follow a process across fork() calls. The
parent process will return with NOTE TRACK set in the flags field,
while the child process will return with NOTE CHILD set in fflags and
the parent PID in data.
Output Flags only:
NOTE CHILD This is the
child process of a TRACKed process which called fork().
NOTE TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
From the doc (ReadDirectoryChangesW):
"Retrieves information that describes the changes within the specified directory. The function does not report changes to the specified directory itself."
My question is: What do I use to report changes to the specified directory itself?
I want to be able to capture changes not only to things and sub-things in the folder itself but also to detect for example, when the folder itself has been deleted.
One strategy would be to actually monitor for changes on the parent of the folder I'm really interested in and then use that to generate an event when the folder I'm interested in is deleted. This works but has the potential to generate thousands of 'uninteresting' events.
A second strategy is to have a recursive monitor for stuff under the folder I'm actually interested in and then a non-recursive monitor on it'a parent. The non-recursive monitor would then be able to tell me when the real folder of interest is deleted.
The latter, second strategy, generates fewer events and is the strategy I would like to use. BUT: It doesn't work 'in process'. That is, if I start to monitor the folder of interest recursively (HANDLE A), and it's parent non-recursively (HANDLE B) and then in the same process, I try and delete the folder of interest, no removal event is generated for it (even though I verify from a console that the thing no longer exists). My suspicion is that this is due to HANDLE A on the folder still being open, and even though I have included the "FILE_SHARE_DELETE" flag in the call to CreateFileW that gave me HANDLE A, it simply can't work.
Note that 'Out of process', i.e. when I delete the folder from within a completely separate process, the above strategy does work.
So, what are my options?
Many thanks,
Ben.
I am working on a tool which writes data to files.
At some point, a file might be "locked" and is not writable until other handles have been closed.
I could use the CreateFile API in a loop until the file is available for writing access.
But I have 2 concerns using CreateFile in a loop:
The Harddrive (cache) is always running...?!
I need to call CreateFile again to obtain a valid writing handle with different flags...?!
So my question is:
What is the best solution to wait for a file to be writable and instantly get a valid handle?
Are there any event solutions or anything, which allows to "queue/reserve" for a handle once, so that there is no "uncontrolled" race condition with others?
A file can be "locked" for two reasons:
An actual file lock which prevents writing to, and possibly reading from the file.
The file being opened without sharing access (accidentially or voluntarily) which even prevents you from opening a handle. If you already see CreateFile failing, that's likely the case rather than a real lock.
There are conceptually[1] at least two ways of knowing that no other process has locked a file without busy waiting:
By finding out who holds locks and waiting on the process or thread to exit (or, by outright killing them...)
By locking the file yourself
Who holds locks?
Finding out about lock owners is rather nasty, you can do it via the totally undocumented SystemLocksInformation class used with the undocumented NtQuerySystemInformation function (the latter is "only undocumented", but the former is so much undocumented that it's really hard to find any information at all). The returned structure is explained here, and it contains an owning thread id.
Luckily, holding a lock presumes holding a handle. Closing the file handle will unlock all file ranges. Which means: No lock without handle.
In other words, the problem can also be expressed as "who is holding an open handle to the file?". Of course not all processes that hold a handle to a file will have the file locked, but no process having a handle guarantees that no process has the file locked.
Code for finding out which processes have a file open is much easier (using restart manager) and is readily available at Raymond Chen's site.
Now that you know which processes and threads are holding file handles and locks, make a list of all thread/process handles and use WaitForMultipleObjects on the list of process handles. When a process exits, all handles are closed.
This also transparently deals with the possibility of a "lock" because a process does not share access.
Locking the file yourself
You can use LockFileEx, which operates asynchronously. Note that LockFileEx needs a valid handle that has been opened with either read or write permissions (getting write permission may not be possible, but read should work almost always -- even if you are prevented from actually reading by an exclusive lock, it's still possible to create a handle that could read if there was no lock).
You can then wait on the asynchronous locking to complete either via the event in the OVERLAPPED structure, or on a completion port, and can even do other useful stuff in the mean time, too. Once you have locked the file, you know that nobody else has it locked.
[1] The wording "conceptually" suggests that I am pretty sure either method will work, but I have not tested them.
Apart from a busy loop, repeatedly trying to open the file with write access (which doesn't smell right - what if the file is locked by a process that is stuck and requires a reboot or manual termination, you'll never be able to write to it.
You could write to a temporary file and rename it afterwards (you can tell the OS a file rename operation is required and it will do it at next boot). If you need to append instead of write, then you'll have to write a process to append your temporary file to the correct one, possibly at startup (write the instructions of which file to append to where to a file that your process reads).
If you need to modify a locked file, then you'll just have to take a lock on it as soon as you can, and refuse to start the program if you don't have write access - warn the user right at the start.
There is a possibility that you can wait in a better way: if a file is locked for writing, you can assume that someone is going to write to it, and so use FindFirstChangeNotification to receive events for the FILE_NOTIFY_CHANGE_LAST_WRITE or FILE_NOTIFY_CHANGE_ATTRIBUTES events. Its not perfect in that someone could request exclusive access for reading too.
I suppose you could try to get the handle to the file that is locked and wait on that, so when it is released your WaitForSingleObject will return. However, there's a good chance you will not be allowed to get the handle owned by a different process (by the security subsystem)
I've written a program that reads the NTFS index and journal similar to what is described here:
http://ejrh.wordpress.com/2012/07/06/using-the-ntfs-journal-for-backups/
And It works fairly well.
In addition to the normal journal events USN_REASON_CLOSE, USN_REASON_FILE_CREATE, USN_REASON_FILE_DELETE etc' I'm receiving an event with reason USN_REASON_HARD_LINK_CHANGE. I'd like to be able to update the directory index according to this event but I can't find any information about it. The only documentation is:
An NTFS file system hard link is added to or removed from the file or
directory. An NTFS file system hard link, similar to a POSIX hard
link, is one of several directory entries that see the same file or
directory.
What does this mean? where was the hard-link created? or was it removed? how do I get more information about what happened?
I know this is ancient, but I stumbled upon this while researching a related problem. Here's what I found: The hard-links are a complicating factor when reading the USN. You can get journal entries describing change to a single file reference number by way of changes made through any hard-link that's been created. Generally, and to the original question, hard-links are alternative directory entries through which a single file might be accessed. Thus, all the file's characteristics are shared for each link (except for the names and parent file reference numbers). Technically, you can't tell which entry is the original and which is a link.
A subtle difference does exist, and it manifests if you query the master file table (using DeviceIOControl and Fsctl_Enum_Usn_Data). The query will return only a single representative file regardless of how many links exist. You can query for the links using NtQueryInformationFile, querying for FILE_HARD_LINK_INFORMATION. I think of the entry returned by the MFT query as the main entry and the NtQueryInformationFile-returned items as links...however, the main entry can get deleted and one of the links will get promoted...so it's only a housekeeping thought and little else.
Note that a problem arises where one of the hard-links is moved or renamed. In this case, the journal entries for the rename or move reflect the filename and parent file reference number of the affected link. The problem arises if you ask for only the summary "on-close" records. In such a case, you won't ever see the USN_REASON_RENAME_OLD_NAME record...because that USN entry never gets an associated REASON_CLOSE associated with it. Without this tidbit, you won't be able to easily determine which link's name or location was changed. You have to read the usn with ReadOnlyOnClose set to 0 in the Read_Usn_Journal_Data_V0. This is a far chattier query, but without it, you can't accurately associate the change with one link or the other.
As always with the USN, I expect you'll need to go through a bit of trial and error to get it to work right. These observations/guesses may, I hope, be helpful:
When the last hard link to a file is deleted, the file is deleted; so if the last hard link has been removed you should see USN_REASON_FILE_DELETE instead of USN_REASON_HARD_LINK_CHANGE. I believe that each reference number refers to a file (or directory, but NTFS doesn't support multiple hard links to directories AFAIK) rather than to a hard link. So immediately after the event is recorded, at least, the file reference number should still be valid, and point to another name for the file.
If the file still exists, you can look it up by reference number and use FindFirstFileNameW and friends to find the current links. Comparing this to the event record in question plus any relevant later events should give you enough information, although if multiple hard links for the same file are deleted and/or created you might not be able to reconstruct the order in which this happened, and if you don't have enough information about the prior state of the file system you might not be able to identify the deleted hard links. I don't know whether that would matter to you or not.
If the file no longer exists, you should still be able to identify it by the USN record in which it was deleted. Again, taking all relevant events into consideration, and with enough information about the prior state, you should be able to reconstruct most of what happened, if not the order.
There is some hope that we can do better than this: the file name and/or ParentFileReference number in the event record might refer to the hard link that was created or deleted, rather than to an arbitrary link to the file. In this case you'll have all the relevant information about the sequence of events except for whether any particular event was a create or a delete, which you should be able to work out by looking at the current state of the file and working backwards through the records.
I assume you've already looked for nearby change records that might contain additional information? There isn't, for example, a USN_REASON_RENAME_NEW_NAME record generated when a hard link is created or a USN_REASON_RENAME_OLD_NAME when a hard link is removed? Or paired USN_REASON_HARD_LINK_CHANGE records, one for the file, one for the directory containing the affected hard link to the file? (Wishful thinking, I expect, but it wouldn't hurt to look!)
For testing purposes, you can create hard links with the mklink command.
Observed that FWRITE or KAUTH_FILEOP_CLOSE_MODIFIED is not consistenly set in action KAUTH_FILEOP_CLOSE during file modification or file copy.
My usecase is - I am trying to figure out whether the file being closed is modified file or newly created file. I want to ignore files that are not modified.
As per documentation, I am checking for KAUTH_FILEOP_CLOSE_MODIFIED flag when the file action is KAUTH_FILEOP_CLOSE. Most of the time, I have observed KAUTH_FILEOP_CLOSE_MODIFIED is not set when file is copied from one location to other or file is modified.
I also observed that FWRITE flag is set, but not consistently for modified or copied files. I am just wondering why the behavior is so inconsistent.
Another way I was thinking was to rely on vnode actions KAUTH_VNODE_WRITE_DATA, But I have observed that there KAUTH_VNODE_WRITE_DATA multiple calls comes after KAUTH_FILEOP_CLOSE and even when file is not modified.
Any idea why such behavior exist?
Thanks in advance.
Regards,
Rupesh
KAuth and especially KAUTH_FILEOP_CLOSE_MODIFIED is buggy, and I already reported some problems related to it to Apple (a long time ago):
Events happening on file descriptor inherited from parent process seem to not trigger call to the KAuth callback at all. (See http://www.openradar.me/8898118)
The flag KAUTH_FILEOP_CLOSE_MODIFIED is not specified when the given file has transparent zlib compression enabled. (See http://www.openradar.me/23029109)
That said, I am quite confident that (as of 10.5.x, 10.6.x, 10.7.x) the callbacks are always called directly from the kernel thread doing the syscall. For example when open(2) is called, it calls the kauth callbacks for the vnode context and then (if allowed by return value) calls the VFS driver to realize the operation. The fileop (KAUTH_FILEOP_CLOSE) works also from the same thread but is just called after the closing itself.
Hence I don't think KAUTH_VNODE_WRITE_DATA can come after KAUTH_FILEOP_CLOSE for the same event.
Either you have a bug in your code, or it is another event (e.g. next open of the same file after it was closed in the same or other process.)
Still there are some traps you must be aware of:
Any I/O which is performed by kernel itself (including other kexts) does not trigger the kauth callbacks at all.
If there are multiple callbacks for the vnode context (e.g. from multiple Kexts), kernel then calls them one by one for every event. However as soon as some of them returns KAUTH_RESULT_ALLOW or KAUTH_RESULT_DENY, it finally decides and the rest of the callbacks is not called. I.e. all callbacks are called only if all of them but the last return KAUTH_RESULT_DEFER. (AFAIK, for fileop this is not true, because in this case the return value is completely ignored.)