Usually, when an application writes to one of it's files on disk, the file modified timestamp changes.
Sometimes, and in my case it is an application written in ProvideX (a Business Basic derivative i believe) doing the writing, the modified timestamp does not change after a write. A program like MyTrigger will not pick up on the write operation either, but Sysinternals ProcessMonitor does log the disk activity.
It seems obvious that there are different ways to ask windows to perform write operations, and the request could then be hooked or logged in various different ways as well.
I need to be able to hook the write operations coming from the ProvideX application. Any pointers on the different ways windows writes to disk, and the type of hooks available for them would be greatly appreciated.
Thanks
User-mode process can write to the file either using WriteFile API function or using MMF, memory-mapped file API (CreateFileMapping/MapViewOfFile/Write to memory block). Maybe your application goes MMF way. MMF writes to files very differently from WriteFile API, but they both lead to the same end point - IRP sent to file system driver. File system filter driver (such as the one used by Sysinternals stuff) can track write requests on that IRP level. It is technically possible to distinguish between write operations initiated by MMF and WriteFile as different IRPs are sent (cached and non-cached writing is involved). It seems that directory change monitoring function in windows tracks only one IRP type, and this causes MyTrigger to miss the change.
Related
Apparently the Windows file cache flushes data to disk asynchronously, even when using the synchronous WriteFile() API. Quoting "File Caching" on MSDN:
By default, [...] write operations write file data to the system
file cache rather than to the disk, and this type of cache is
referred to as a write-back cache.
Assuming that write-through and no-buffering flags are not used, what happens if the actual write to disk fails? Can clients be notified of such failures? What is the expected client error handling model for such failures? "Fire and forget" and "Write and pray" come to mind but maybe there is something else.
Secondary question: are there certain classes of errors that are guaranteed to be detected early? E.g. will WriteFile() always return an error if the disk is full? -- even though the actual write to disk would be deferred?
I would like to know how to write reliable file i/o that responds to these kinds of errors without disabling the Windows File Cache.
Bonus points: is this handled differently on other operating systems? Can you recommend a good resource on the topic?
In Windows 7, the user is notified via a pop-up dialog from the notification area.
Normal errors (such as the disk being full, lack of permissions, etc.) are reported back to the application immediately, these do not cause late failures.
Late failures can only happen in a handful of situations, such as a hardware failure or operating system crash. They can also happen when writing to a network share if the connection drops unexpectedly for any reason.
In most cases, it doesn't make sense for an application to worry about this. Data loss is to be expected under these circumstances; let the user deal with it.
If the data you are writing is unusually important, then you may need to worry, in which case you will have to use the write-through and/or no-buffering flags.
There is no third option.
AFAIK, OS X is a BSD derivation, which doesn't have actual mandatory file locking. If so, it seems that I have no way to prevent writing access from other programs even while I am writing a file.
How to guarantee file integrity in such environment? I don't care integrity after my program exited, because that's now user's responsibility. But at least, I think I need some kind of guarantee while my program is running.
How do other programs guarantee file content integrity without mandatory locking? Especially database programs. If there's common technique or recommended practice, please let me know.
Update
I am looking for this for data layer of GUI application for non-engineer users. And currently, my program have this situations.
Data is too big that it cannot be fit to RAM. And even hard to be temporarily copied. So it cannot be read/written atomically, and should be used from disk directly while program is running.
A long running professional GUI content editor application used by humans who are non-engineers. Though users are not engineers, but they still can access the file simultaneously with Finder or another programs. So users can delete or write on currently using file accidentally. Problem is users don't understand what is actually happening, and expect program handles file integrity at least program is running.
I think the only way to guarantee file's integrity in current situation is,
Open file with system-wide exclusive mandatory lock. Now the file is program's responsibility.
Check for integrity.
Use the file as like external memory while program is running.
Write all the modifications.
Unlock. Now the file is user's responsibility.
Because OS X lacks system-wide mandatory lock, so now I don't know what to do for this. But still I believe there's a way to archive this kind of file integrity, which just I don't know. And I want to know how everybody else handles this.
This question is not about my programming error. That's another problem. Current problem is protecting data from another programs which doesn't respect advisory file lockings. And also, users are usually root and the program is running with same user, so trivial Unix file privilege is not useful.
You have to look at the problem that you are trying to actually solve with mandatory locking.
File content integrity is not guaranteed by mandatory locking; unless you keep your file locked 24/7; file integrity will still depend on all processes observing file format/access conventions (and can still fail due to hard drive errors etc.).
What mandatory locking protects you against is programming errors that (by accident, not out of malice) fail to respect the proper locking protocols. At the same time, that protection is only partial, since failure to acquire a lock (mandatory or not) can still lead to file corruption. Mandatory locking can also reduce possible concurrency more than needed. In short, mandatory locking provides more protection than advisory locking against software defects, but the protection is not complete.
One solution to the problem of accidental corruption is to use a library that is aggressively tested for preserving data integrity. One such library (there are others) is SQlite (see also here and here for more information). On OS X, Core Data provides an abstraction layer over SQLite as a data storage. Obviously, such an approach should be complemented by replication/backup so that you have protection against other causes for data corruption where the storage layer cannot help you (media failure, accidental deletion).
Additional protection can be gained by restricting file access to a database and allowing access only through a gateway (such as a socket or messaging library). Then you will just have a single process running that merely acquires a lock (and never releases it). This setup is fairly easy to test; the lock is merely to prevent having more than one instance of the gateway process running.
One simple solution would be to simply hide the file from the user until your program is done using it.
There are various ways to hide files. It depends on whether you're modifying an existing file that was previously visible to the user or creating a new file. Even if modifying an existing file, it might be best to create a hidden working copy and then atomically exchange its contents with the file that's visible to the user.
One approach to hiding a file is to create it in a location which is not normally visible to users. (That is, it's not necessary that the file be totally impossible for the user to reach, just out of the way so that they won't stumble on it.) You can obtain such a location using -[NSFileManager URLForDirectory:inDomain:appropriateForURL:create:error:] and passing NSItemReplacementDirectory and NSUserDomainMask for the first two parameters. See -replaceItemAtURL:withItemAtURL:backupItemName:options:resultingItemURL:error: method for how to atomically move the file into its file place.
You can set a file to be hidden using various APIs. You can use -[NSURL setResourceValue:forKey:error:] with the key NSURLIsHiddenKey. You can use the chflags() system call to set UF_HIDDEN. The old Unix standby is to use a filename starting with a period ('.').
Here's some details about this topic:
https://developer.apple.com/library/ios/documentation/FileManagement/Conceptual/FileSystemProgrammingGuide/FileCoordinators/FileCoordinators.html
Now I think the basic policy on OSX is something like this.
Always allow access by any process.
Always be prepared for shared data file mutation.
Be notified when other processes mutates the file content, and provide proper response on them. For example you can display an error to end users if other process is trying to access the file. And then users will learn that's bad, and will not do it again.
I need a two-way communication between a kernel-mode WFP driver and a user-mode application. The driver initiates the communication by passing a URL to the application which then does a categorization of that URL (Entertainment, News, Adult, etc.) and passes that category back to the driver. The driver needs to know the category in the filter function because it may block certain web pages based on that information. I had a thread in the application that was making an I/O request that the driver would complete with the URL and a GUID, and then the application would write the category into the registry under that GUID where the driver would pick it up. Unfortunately, as the driver verifier pointed out, this is unstable because the Zw registry functions have to run at PASSIVE_LEVEL. I was thinking about trying the same thing with mapped memory buffers, but I’m not sure what the interrupt requirements are for that. Also, I thought about lowering the interrupt level before the registry function calls, but I don't know what the side effects of that are.
You just need to have two different kinds of I/O request.
If you're using DeviceIoControl to retrieve the URLs (I think this would be the most suitable method) this is as simple as adding a second I/O control code.
If you're using ReadFile or equivalent, things would normally get a bit messier, but as it happens in this specific case you only have two kinds of operations, one of which is a read (driver->application) and the other of which is a write (application->driver). So you could just use WriteFile to send the reply, including of course the GUID so that the driver can match up your reply to the right query.
Another approach (more similar to your original one) would be to use a shared memory buffer. See this answer for more details. The problem with that idea is that you would either need to use a spinlock (at the cost of system performance and power consumption, and of course not being able to work on a single-core system) or to poll (which is both inefficient and not really suitable for time-sensitive operations).
There is nothing unstable about PASSIVE_LEVEL. Access to registry must be at PASSIVE_LEVEL so it's not possible directly if driver is running at higher IRQL. You can do it by offloading to work item, though. Lowering the IRQL is usually not recommended as it contradicts the OS intentions.
Your protocol indeed sounds somewhat cumbersome and doing a direct app-driver communication is probably preferable. You can find useful information about this here: http://msdn.microsoft.com/en-us/library/windows/hardware/ff554436(v=vs.85).aspx
Since the callouts are at DISPATCH, your processing has to be done either in a worker thread or a DPC, which will allow you to use ZwXXX. You should into inverted callbacks for communication purposes, there's a good document on OSR.
I've just started poking around WFP but it looks like even in the samples that they provide, Microsoft reinject the packets. I haven't looked into it that closely but it seems that they drop the packet and re-inject whenever processed. That would be enough for your use mode engine to make the decision. You should also limit the packet capture to a specific port (80 in your case) so that you don't do extra processing that you don't need.
Is there any way to hook all disk writes going thru the system, and receive the file names of whatever's being modified, using the Win32 API? Or is this something that would require writing a driver?
You can't do this in user mode, it needs to be kernel mode and so that means a driver. You need a File System Filter Driver.
If you don't care about intercepting the actual data, and only want to know which files are being modified/created/deleted then you can use the ReadDirectoryChangesW API to get that info from userland. Note however that it's one of the hardest functions to use effectively and efficiently, and you should be familiar with IOCP to use it correctly.
I'm developing a network-redirector like SMB.
I want to test various file I/O to compare NTFS or SMB implementation.
What I want to test are,
CreateFile
Read, WriteFile
DeleteFile
RenameFile
Set, GetFileInformationByHandle
etc.
And it' would be better if it can measure each I/Os duration.
Is there a program I can use?
If you are developing a file system driver or use some redirector driver (either our Callback File System or alternatives), you can use IFSTest tool to check your implementation for correctness.
XPerf will answer all of these questions, allowing you to see perf at both the file level and the block level. Check out the PDC09 video on the topic at http://www.microsoftpdc.com/2009/CL16
You can use the File Server Capacity Tool (FSCT) provided by Microsoft. It will let you simulate a typical user workload against a file server that supports SMB. The tool can simulate multiple users from a single client and aggregates the results into text files in the end.
You can get more information, including links to download and detailed presentations at http://blogs.technet.com/b/josebda/archive/2009/09/16/file-server-capacity-tool-fsct-1-0-available-for-download.aspx