I'm playing around with rb-kqueue on OS X and I can't work out how to get the extend flag to fire.
From the OpenBSD kqueue man page (which also matches the OS X man page):
NOTE_EXTEND The file referenced by the descriptor was extended
Which is very circular.
The FreeBSD kqueue man page has:
NOTE_EXTEND For regular file, the file referenced by the descriptor
was extended. For directory, reports that a directory entry was
added or removed, as the result of rename operation. The NOTE_EXTEND
event is not reported when a name is changed inside the directory.
This is a lot more descriptive. However, I've been running a kqueue on a directory and, try as I might, I cannot get the extend flag to fire. I've tried mv, rename, xattr (because searching for "extended" brings back lots of results on extended attributes), and adding subdirectories and files with mkdir, touch and redirects, but nothing results in the extend flag being part of an event, just write and/or link.
Hence I'm confused as to what extend really is. Is it just because I'm running it on OS X?
I hope to shed some light on this issue.
The NOTE_EXTEND flag corresponds to an action taken on a file; in your case it should fire when a file's size has increased, for example when data is appended to it.
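For example, here is a minimal C sketch (OS X/BSD; the path /tmp/watched.txt is just an example) that watches a single regular file; appending to it, e.g. with echo more >> /tmp/watched.txt, should produce an event with NOTE_EXTEND set:

    /* Minimal EVFILT_VNODE watcher: prints the fflags of each event on the
       watched file. Appending data should set NOTE_EXTEND (and NOTE_WRITE). */
    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/watched.txt", O_RDONLY);   /* example path */
        int kq = kqueue();
        struct kevent change, event;

        EV_SET(&change, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR,
               NOTE_WRITE | NOTE_EXTEND | NOTE_DELETE | NOTE_RENAME, 0, NULL);

        for (;;) {
            if (kevent(kq, &change, 1, &event, 1, NULL) <= 0)
                break;
            printf("fflags = 0x%x%s\n", event.fflags,
                   (event.fflags & NOTE_EXTEND) ? " (NOTE_EXTEND)" : "");
        }
        close(kq);
        close(fd);
        return 0;
    }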
The reasoning
To quote the original paper, Kqueue: A generic and scalable event notification facility:
The fflags field is used to specify which actions on the descriptor
the application is interested in on registration, and upon return,
which actions have occurred. The possible actions are:
NOTE_DELETE
NOTE_WRITE
NOTE_EXTEND
NOTE_ATTRIB
NOTE_LINK
NOTE_RENAME
These correspond to the actions that the filesystem performs on the
file and thus will not be explained here. These notes may be OR-d
together in the returned kevent, if multiple actions have occurred.
E.g.: a file was written, then renamed. The final general purpose
filter is the PROC filter, which detects process changes. For this
filter, the ident field is interpreted as a process identifier. This
filter can watch for several types of events, and the fflags that
control this filter are outlined in Figure 3.
Figure 3 for EVFILT_PROC:
Input/Output Flags:
NOTE_EXIT  Process exited.
NOTE_FORK  Process called fork().
NOTE_EXEC  Process executed a new process via execve(2) or similar call.
NOTE_TRACK  Follow a process across fork() calls. The parent process will return with NOTE_TRACK set in the flags field, while the child process will return with NOTE_CHILD set in fflags and the parent PID in data.
Output Flags only:
NOTE_CHILD  This is the child process of a TRACKed process which called fork().
NOTE_TRACKERR  This flag is returned if the system was unable to attach an event to the child process, usually due to resource limitations.
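For illustration, here is a small sketch of how that filter is registered (separate from the directory question; the PID to watch is taken from the command line):

    /* Watch an existing process for exit/fork/exec using EVFILT_PROC. */
    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;

        pid_t pid = (pid_t)atoi(argv[1]);             /* PID to watch */
        int kq = kqueue();
        struct kevent change, event;

        EV_SET(&change, pid, EVFILT_PROC, EV_ADD,
               NOTE_EXIT | NOTE_FORK | NOTE_EXEC, 0, NULL);
        kevent(kq, &change, 1, NULL, 0, NULL);        /* register the watch */

        if (kevent(kq, NULL, 0, &event, 1, NULL) > 0) /* wait for one event */
            printf("pid %d: fflags = 0x%x\n", (int)event.ident, event.fflags);

        close(kq);
        return 0;
    }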
Related
From the doc (ReadDirectoryChangesW):
"Retrieves information that describes the changes within the specified directory. The function does not report changes to the specified directory itself."
My question is: What do I use to report changes to the specified directory itself?
I want to be able to capture changes not only to things and sub-things in the folder itself but also to detect for example, when the folder itself has been deleted.
One strategy would be to actually monitor for changes on the parent of the folder I'm really interested in and then use that to generate an event when the folder I'm interested in is deleted. This works but has the potential to generate thousands of 'uninteresting' events.
A second strategy is to have a recursive monitor for stuff under the folder I'm actually interested in and then a non-recursive monitor on its parent. The non-recursive monitor would then be able to tell me when the real folder of interest is deleted.
The latter, second strategy generates fewer events and is the strategy I would like to use. BUT: it doesn't work 'in process'. That is, if I start to monitor the folder of interest recursively (HANDLE A) and its parent non-recursively (HANDLE B), and then, in the same process, I try to delete the folder of interest, no removal event is generated for it (even though I can verify from a console that it no longer exists). My suspicion is that this is due to HANDLE A on the folder still being open; even though I included the FILE_SHARE_DELETE flag in the call to CreateFileW that gave me HANDLE A, it simply can't work.
Note that 'Out of process', i.e. when I delete the folder from within a completely separate process, the above strategy does work.
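For reference, here is roughly the kind of setup I mean (paths are examples, error handling omitted; the recursive watch on HANDLE A would run separately, e.g. on another thread):

    /* Sketch of the two-watcher setup: HANDLE A watches the folder of interest
       (recursive watch not shown here), HANDLE B watches its parent
       non-recursively so that deleting the folder itself shows up as a
       FILE_ACTION_REMOVED entry on the parent. */
    #include <windows.h>
    #include <stdio.h>

    static HANDLE open_dir(const wchar_t *path)
    {
        return CreateFileW(path, FILE_LIST_DIRECTORY,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_EXISTING,
                           FILE_FLAG_BACKUP_SEMANTICS, NULL);
    }

    int main(void)
    {
        HANDLE a = open_dir(L"C:\\data\\watched");   /* folder of interest (HANDLE A) */
        HANDLE b = open_dir(L"C:\\data");            /* its parent (HANDLE B)         */
        DWORD buf[16 * 1024];                        /* DWORD-aligned change buffer   */
        DWORD bytes;

        /* Non-recursive, synchronous watch on the parent: reports when the
           "watched" entry itself is added, removed or renamed. */
        if (ReadDirectoryChangesW(b, buf, sizeof(buf), FALSE,
                                  FILE_NOTIFY_CHANGE_DIR_NAME, &bytes, NULL, NULL)) {
            FILE_NOTIFY_INFORMATION *fni = (FILE_NOTIFY_INFORMATION *)buf;
            if (fni->Action == FILE_ACTION_REMOVED)
                wprintf(L"directory entry removed: %.*ls\n",
                        (int)(fni->FileNameLength / sizeof(WCHAR)), fni->FileName);
        }
        CloseHandle(a);
        CloseHandle(b);
        return 0;
    }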
So, what are my options?
Many thanks,
Ben.
Dir-s seem awkward as compared to File-s. Many of the methods are similar to IO methods, but a Dir doesn't inherit from IO. For example, tell in the IO docs reads:
Returns the current offset (in bytes) of ios.
When read-ing and tell-ing through a normal Dir, I get large numbers like 346723732 and 422823816. I was originally expecting these integers to be more "array-like" and just be a simple range.
Are these the bytes of the files contained in the Dir?
If not, is there any meaning to the numbers returned like IO#tell?
Also why do Dir-s have an open and close function if they are not Streams?
Is it still just as important to close a Dir as a normal IO?
Any general explanation of how a Ruby Dir works would be appreciated.
Update: Another confusing part: if Dirs are not IOs, why does close raise an IOError?
Closes the directory stream. Any further attempts to access dir will raise an IOError.
Also notice that in the documentation it considers it a "directory stream". So this brings up the question again of are they streams or not and if not, why the naming convention?
The docs for Dir#tell say:
Returns the current position in dir.
without specifying what the position means. What the returned value signifies is likely to vary based on the OS you're using and possibly the type of file system that contains the directory. The value should be treated as opaque; don't try to interpret it in any way. Its only purpose is to be handed back to the OS, for example by calling Dir#seek.
A directory is not just a giant file. More typically it just maps file names to information about where each file's data is stored.
You should not (and, as far as I'm aware, cannot) write to directories yourself.
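On Unix systems this corresponds to the C telldir/seekdir interface, where the value is likewise opaque. A small sketch (the directory path is just an example):

    /* telldir() returns an opaque cookie; its only valid use is to hand it
       back to seekdir() on the same directory stream. */
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
        DIR *dir = opendir("/tmp");             /* example directory */
        if (dir == NULL)
            return 1;

        struct dirent *entry = readdir(dir);    /* read the first entry */
        long pos = telldir(dir);                /* opaque position cookie */

        if (entry != NULL)
            printf("first entry: %s, cookie: %ld\n", entry->d_name, pos);

        rewinddir(dir);                         /* back to the start... */
        seekdir(dir, pos);                      /* ...and jump back to the cookie */
        closedir(dir);                          /* releases the underlying fd */
        return 0;
    }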
So after some IRC chat here's the conclusion I've come to:
The Dir object is NOT an IO
Dir does not inherit from the IO class and is only readable. I'm still not sure why an IOError is raised on #close.
An opened Dir IS a stream however
Objects of class Dir are directory streams representing directories in the underlying file system.
Also, if you check the source for Dir#close you will see that it ultimately calls the C function closedir(3). man closedir prints:
The closedir() function closes the directory stream associated with
dirp. A successful call to closedir() also closes the underlying file
descriptor associated with dirp. The directory stream descriptor dirp
is not available after this call.
...with dirp being the directory stream parameter.
So yes, an instantiated Dir opens a stream, and yes, a Dir uses a file descriptor and needs to be closed if you do not want to rely on garbage collection.
Big thanks to injekt and others on #ruby-lang irc!
I have observed that FWRITE or KAUTH_FILEOP_CLOSE_MODIFIED is not consistently set for the KAUTH_FILEOP_CLOSE action during file modification or file copy.
My use case is: I am trying to figure out whether the file being closed is a modified file or a newly created file. I want to ignore files that are not modified.
As per the documentation, I am checking for the KAUTH_FILEOP_CLOSE_MODIFIED flag when the file action is KAUTH_FILEOP_CLOSE. Most of the time, I have observed that KAUTH_FILEOP_CLOSE_MODIFIED is not set when a file is copied from one location to another or when a file is modified.
I also observed that the FWRITE flag is set, but not consistently, for modified or copied files. I am just wondering why the behavior is so inconsistent.
Another approach I considered was to rely on the vnode action KAUTH_VNODE_WRITE_DATA, but I have observed that multiple KAUTH_VNODE_WRITE_DATA calls come after KAUTH_FILEOP_CLOSE, even when the file is not modified.
Any idea why such behavior exist?
Thanks in advance.
Regards,
Rupesh
KAuth and especially KAUTH_FILEOP_CLOSE_MODIFIED is buggy, and I already reported some problems related to it to Apple (a long time ago):
Events happening on a file descriptor inherited from a parent process seem not to trigger the KAuth callback at all. (See http://www.openradar.me/8898118)
The flag KAUTH_FILEOP_CLOSE_MODIFIED is not specified when the given file has transparent zlib compression enabled. (See http://www.openradar.me/23029109)
That said, I am quite confident that (as of 10.5.x, 10.6.x, 10.7.x) the callbacks are always called directly from the kernel thread doing the syscall. For example, when open(2) is called, it calls the kauth callbacks for the vnode context and then (if allowed by the return value) calls the VFS driver to carry out the operation. The fileop callback (KAUTH_FILEOP_CLOSE) also runs on the same thread, but is called after the close itself.
Hence I don't think KAUTH_VNODE_WRITE_DATA can come after KAUTH_FILEOP_CLOSE for the same event.
Either you have a bug in your code, or it is another event (e.g. the next open of the same file after it was closed, in the same or another process).
Still there are some traps you must be aware of:
Any I/O performed by the kernel itself (including other kexts) does not trigger the kauth callbacks at all.
If there are multiple callbacks for the vnode context (e.g. from multiple kexts), the kernel calls them one by one for every event. However, as soon as one of them returns KAUTH_RESULT_ALLOW or KAUTH_RESULT_DENY, the decision is made and the remaining callbacks are not called. I.e. all callbacks are called only if all of them but the last return KAUTH_RESULT_DEFER. (AFAIK, this is not true for fileop, because in that case the return value is completely ignored.)
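For reference, here is a stripped-down sketch of the kind of fileop listener being discussed (kernel extension code; error handling and the matching kauth_unlisten_scope call are omitted, and the start routine name mykext_start is illustrative):

    /* Stripped-down KAuth fileop listener: logs whether a closed file was
       flagged as modified. */
    #include <mach/mach_types.h>
    #include <sys/kauth.h>
    #include <libkern/libkern.h>

    static kauth_listener_t g_fileop_listener;

    static int fileop_callback(kauth_cred_t cred, void *idata,
                               kauth_action_t action,
                               uintptr_t arg0, uintptr_t arg1,
                               uintptr_t arg2, uintptr_t arg3)
    {
        if (action == KAUTH_FILEOP_CLOSE) {
            const char *path = (const char *)arg1;   /* arg1: path of the closed file */
            int flags = (int)arg2;                   /* arg2: flags for CLOSE          */

            if (flags & KAUTH_FILEOP_CLOSE_MODIFIED)
                printf("closed (modified): %s\n", path);
        }
        return KAUTH_RESULT_DEFER;   /* return value is ignored for fileop listeners */
    }

    kern_return_t mykext_start(kmod_info_t *ki, void *d)
    {
        g_fileop_listener = kauth_listen_scope(KAUTH_SCOPE_FILEOP,
                                               fileop_callback, NULL);
        return KERN_SUCCESS;
    }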
If I'm writing a simple text log file from multiple processes, can they overwrite/corrupt each other's entries?
(Basically, this question Is file append atomic in UNIX? but for Windows/NTFS.)
You can get atomic append on local files. Open the file with FILE_APPEND_DATA access (documented in the WDK). When you omit FILE_WRITE_DATA access, all writes ignore the current file pointer and are done at end of file. Alternatively, you may use FILE_WRITE_DATA access and, for append writes, specify it in the overlapped structure (Offset = FILE_WRITE_TO_END_OF_FILE and OffsetHigh = -1, documented in the WDK).
The append behavior is properly synchronized between writes via different handles. I use that regularly for logging by multiple processes. I write a BOM at offset 0 on every open, and all other writes are appended. The timestamps are not a problem; they can be sorted when needed.
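A minimal sketch of that pattern (the log path is an example; error handling omitted):

    /* Atomic append: opening with FILE_APPEND_DATA (and without
       FILE_WRITE_DATA) makes every WriteFile land at the current end of
       file, even with several handles/processes appending concurrently. */
    #include <windows.h>

    int main(void)
    {
        HANDLE h = CreateFileW(L"C:\\logs\\app.log",            /* example path        */
                               FILE_APPEND_DATA,                /* no FILE_WRITE_DATA  */
                               FILE_SHARE_READ | FILE_SHARE_WRITE,
                               NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        const char line[] = "2016-01-01 12:00:00 something happened\r\n";
        DWORD written;

        WriteFile(h, line, sizeof(line) - 1, &written, NULL);   /* appended atomically */
        CloseHandle(h);
        return 0;
    }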
Even if append is atomic (which I don't believe it is), it may not give you the results you want. For example, assuming a log includes a timestamp, it seems reasonable to expect more recent logs to be appended after older logs. With concurrency, this guarantee doesn't hold: if multiple processes are waiting to write to the same file, any one of them might get the write lock, not just the oldest one waiting. Thus, logs can be written out of sequence.
If this is not desirable behaviour, you can avoid it by publishing log entries from all processes to a shared queue, such as a named pipe. You then have a single process that writes from this queue to the log file. This avoids the concurrency issues, ensures that logs are written in order, and works even when file appends are not atomic, since the file is only written to directly by one process.
From this MSDN page on creating and opening Files:
An application also uses CreateFile to specify whether it wants to share the file for reading, writing, both, or neither. This is known as the sharing mode. An open file that is not shared (dwShareMode set to zero) cannot be opened again, either by the application that opened it or by another application, until its handle has been closed. This is also referred to as exclusive access.
and:
If you specify an access or sharing mode that conflicts with the modes specified in the previous call, CreateFile fails.
So if you use CreateFile rather than, say, File.Open, which doesn't offer the same level of control over file access, you should be able to open the file in such a way that it can't be corrupted by other processes.
You'll obviously have to add code to your processes to cope with the case where they can't get exclusive access to the log file.
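For example, a sketch of such an exclusive open (the path is illustrative):

    /* Exclusive open: dwShareMode of 0 means no other open of this file can
       succeed (in this or any other process) until the handle is closed. */
    #include <windows.h>

    HANDLE open_log_exclusively(void)
    {
        return CreateFileW(L"C:\\logs\\app.log",    /* example path */
                           GENERIC_WRITE,
                           0,                       /* dwShareMode = 0: exclusive access */
                           NULL, OPEN_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);
        /* On conflict this returns INVALID_HANDLE_VALUE and GetLastError()
           reports ERROR_SHARING_VIOLATION, which the caller must handle. */
    }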
No, it isn't. If you need this, there is Transactional NTFS in Windows Vista/7.
Please note this is not a duplicate of File r/w locking and unlink. (The difference is the platform: operations on files such as locking and deletion have totally different semantics, so the solution would be different.)
I have the following problem. I want to create a file-system-based session storage where each session's data is stored in a simple file named with the session id.
I want the following API: write(sid,data,timeout), read(sid,data,timeout), remove(sid)
where sid == file name. I also want some kind of GC that may remove all timed-out sessions.
This is quite a simple task if you work with a single process, but absolutely not trivial when working with multiple processes or even over shared folders.
The simplest solution I thought about was:
write/read:
handle = CreateFile
LockFile(handle)
read/write data
UnlockFile(handle)
CloseHandle(handle)
GC (for each file in directory):
handle = CreateFile
LockFile(handle)
check if timeout occurred
DeleteFile
UnlockFile(handle)
CloseHandle(handle)
But AFAIK I can't call DeleteFile on an opened, locked file (unlike on Unix, where file locking is not mandatory and unlink is allowed on opened files).
But if I put DeleteFile outside of the locking block, a bad scenario may happen:
GC - CreateFile/LockFile/Unlock/CloseHandle
write - CreateFile/LockFile/WriteUpdatedData/Unlock/CloseHandle
GC - DeleteFile
Does anybody have an idea how such an issue may be solved? Are there any tricks that allow combining file locking with file removal, or making the operations on the file atomic (Win32)?
Notes:
I don't want to use a database.
I'm looking for a solution using the Win32 API on NT 5.01 and above.
Thanks.
I don't really understand how this is supposed to work. However, deleting a file that's opened by another process is possible. The process that creates the file has to use the FILE_SHARE_DELETE flag for the dwShareMode argument of CreateFile(). A subsequent DeleteFile() call will succeed. The file doesn't actually get removed from the file system until the last handle on it is closed.
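A small sketch of that behaviour (the path is an example; error handling omitted):

    /* A handle opened with FILE_SHARE_DELETE allows a later DeleteFile() to
       succeed; the file only disappears once the last handle is closed. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE h = CreateFileW(L"C:\\sessions\\abc123.session",   /* example path */
                               GENERIC_READ | GENERIC_WRITE,
                               FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                               NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

        if (DeleteFileW(L"C:\\sessions\\abc123.session"))
            printf("delete accepted; file vanishes when the handle is closed\n");

        CloseHandle(h);   /* the file is actually removed here */
        return 0;
    }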
You currently have data in the record that allows the GC to determine whether the record has timed out. How about extending that housekeeping info with a "TooLateWeAlreadyTimedItOut" flag?
GC sets TooLateWeAlreadyTimedItOut = true
Release lock
<== writer comes in here, sees the "TooLate" flag and so does not write
GC deletes
In other words, we're using a kind of optimistic locking approach. This does require some additional complexity in the Writer, but now you're not dependent upon any OS-specific wrinkles.
I'm not clear what happens in the case:
GC checks timeout
GC deletes
Writer attempts write, and finds no file ...
Whatever you have planned for this case can also be used in the "TooLate" case.
Edited to add:
You have said that it's valid for this sequence to occur:
GC Deletes
(Very slightly later) Writer attempts a write, sees no file, creates a new one
The writer can treat the "tooLate" flag as identical to this case. It just creates a new file with a different name, using a version number as the trailing part of its name. Opening a session file for the first time requires a directory search, but then you can stash the latest name in the session.
This does assume that there can only be one Writer thread for a given session, or that we can mediate between two Writer threads creating the file, but that must be true for your simple GC/Writer case to work.
For Windows, you can use the FILE_FLAG_DELETE_ON_CLOSE option to CreateFile; that will cause the file to be deleted when you close the handle. But I'm not sure that this satisfies your semantics (because I don't believe you can clear the delete-on-close attribute).
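Something along these lines, as a sketch (the path is an example; DELETE access is requested explicitly here to be safe):

    /* FILE_FLAG_DELETE_ON_CLOSE: the file is removed as soon as the last
       handle to it is closed. */
    #include <windows.h>

    HANDLE create_transient_session_file(void)
    {
        return CreateFileW(L"C:\\sessions\\abc123.session",   /* example path */
                           GENERIC_READ | GENERIC_WRITE | DELETE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_DELETE_ON_CLOSE,
                           NULL);
        /* Closing this handle (and any duplicates) deletes the file; as noted
           above, clearing this behaviour afterwards may not be possible. */
    }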
Here's another thought. What about renaming the file before you delete it? You simply can't close the window where a write comes in after you've decided to delete the file, but what if you rename the file before deleting it? Then when the write comes in, it'll see that the session file doesn't exist and recreate it.
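For instance, a sketch (the tombstone name is hypothetical; error handling omitted):

    /* GC: rename the session file out of the way first, then delete the
       renamed file. A writer arriving after the rename sees no session file
       and simply recreates it. */
    #include <windows.h>

    void gc_remove_session(const wchar_t *session_path)
    {
        /* hypothetical tombstone name; in practice derive it from session_path */
        const wchar_t *tombstone = L"C:\\sessions\\abc123.deleting";

        if (MoveFileExW(session_path, tombstone, MOVEFILE_REPLACE_EXISTING))
            DeleteFileW(tombstone);
    }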
The key thing to keep in mind is that you simply can't close the window in question. IMHO there are two solutions:
Adding a flag like djna mentioned or
Require that a per-session named mutex be acquired, which has the unfortunate side effect of serializing writes on the session.
What is the downside of having a TooLate flag? In other words, what goes wrong if you delete the file prematurely? After all, your system has to deal with the file not being present...