How to guarantee file integrity without mandatory file lock on OS X? - macos

AFAIK, OS X is a BSD derivative, which has no true mandatory file locking. If so, it seems I have no way to prevent other programs from writing to a file, even while I am writing it myself.
How can I guarantee file integrity in such an environment? I don't care about integrity after my program has exited, because that is then the user's responsibility. But I think I need some kind of guarantee at least while my program is running.
How do other programs, especially databases, guarantee file content integrity without mandatory locking? If there is a common technique or recommended practice, please let me know.
Update
I am looking into this for the data layer of a GUI application for non-engineer users. Currently, my program is in this situation:
The data is too big to fit in RAM, and even hard to copy temporarily. So it cannot be read or written atomically, and must be used directly from disk while the program is running.
It is a long-running professional GUI content editor used by people who are not engineers. Even so, they can still access the file at the same time with Finder or other programs, so they can accidentally delete or overwrite a file that is currently in use. The problem is that users don't understand what is actually happening, and expect the program to keep the file intact at least while it is running.
I think the only way to guarantee the file's integrity in the current situation is:
Open the file with a system-wide exclusive mandatory lock. Now the file is the program's responsibility.
Check for integrity.
Use the file like external memory while the program is running.
Write all the modifications.
Unlock. Now the file is the user's responsibility.
Because OS X lacks a system-wide mandatory lock, I don't know how to do this. But I still believe there is a way to achieve this kind of file integrity that I just don't know about, and I want to know how everybody else handles it.
This question is not about my own programming errors; that is a separate problem. The current problem is protecting data from other programs that don't respect advisory file locks. Also, users are usually root and the program runs as the same user, so plain Unix file permissions are not useful.

You have to look at the problem that you are trying to actually solve with mandatory locking.
File content integrity is not guaranteed by mandatory locking. Unless you keep your file locked 24/7, file integrity will still depend on all processes observing file format and access conventions (and it can still fail due to hard drive errors, etc.).
What mandatory locking protects you against is programming errors that (by accident, not out of malice) fail to respect the proper locking protocols. At the same time, that protection is only partial, since failure to acquire a lock (mandatory or not) can still lead to file corruption. Mandatory locking can also reduce possible concurrency more than needed. In short, mandatory locking provides more protection than advisory locking against software defects, but the protection is not complete.
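To make the advisory/mandatory distinction concrete, here is a minimal Python sketch. It assumes a POSIX system where fcntl.flock is available (OS X and Linux both provide it); the file name and the helper names are hypothetical. It shows that an opener that asks for the lock is refused, while one that never asks can still write.

```python
import fcntl
import os
import tempfile


def try_lock(fd):
    """Try to take an exclusive advisory lock without blocking; True on success."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


def demo():
    # Hypothetical scratch file standing in for the application's document.
    path = os.path.join(tempfile.mkdtemp(), "document.dat")
    owner = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    other = os.open(path, os.O_RDWR)  # a second, independent open of the same file
    try:
        owner_locked = try_lock(owner)      # the "editor" takes the lock
        other_locked = try_lock(other)      # a cooperating opener is refused
        written = os.write(other, b"oops")  # a non-cooperating write still goes through
        return owner_locked, other_locked, written
    finally:
        os.close(other)
        os.close(owner)


if __name__ == "__main__":
    print(demo())
```

The lock only helps against programs that check it, which is exactly the limitation the question describes.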
One solution to the problem of accidental corruption is to use a library that is aggressively tested for preserving data integrity. One such library (there are others) is SQLite. On OS X, Core Data provides an abstraction layer over SQLite as a data store. Obviously, such an approach should be complemented by replication/backup so that you have protection against other causes of data corruption where the storage layer cannot help you (media failure, accidental deletion).
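As a rough illustration of what such a library buys you, the following Python sketch uses the standard sqlite3 module to save several rows in one transaction. The table name, function names and the simulated failure are assumptions made for the example. If the save is interrupted, the database keeps its previous contents instead of a half-written state.

```python
import os
import sqlite3
import tempfile


def init_db(db_path):
    """Create the (hypothetical) items table if it does not exist yet."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT NOT NULL)")
        conn.commit()
    finally:
        conn.close()


def save_items(db_path, items, fail=False):
    """Insert all items in one transaction; nothing is kept if anything fails."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO items (name) VALUES (?)",
                             [(name,) for name in items])
            if fail:
                raise RuntimeError("simulated failure in the middle of a save")
    finally:
        conn.close()


def count_items(db_path):
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()
```

This does not stop another program from overwriting the database file itself; it only covers consistency of your own writes, which is why the backup advice above still applies.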
Additional protection can be gained by restricting file access to a database and allowing access only through a gateway (such as a socket or messaging library). Then you will just have a single process running that merely acquires a lock (and never releases it). This setup is fairly easy to test; the lock is merely to prevent having more than one instance of the gateway process running.

One simple solution would be to hide the file from the user until your program is done using it.
There are various ways to hide files. It depends on whether you're modifying an existing file that was previously visible to the user or creating a new file. Even if modifying an existing file, it might be best to create a hidden working copy and then atomically exchange its contents with the file that's visible to the user.
One approach to hiding a file is to create it in a location which is not normally visible to users. (That is, it's not necessary that the file be totally impossible for the user to reach, just out of the way so that they won't stumble on it.) You can obtain such a location using -[NSFileManager URLForDirectory:inDomain:appropriateForURL:create:error:] and passing NSItemReplacementDirectory and NSUserDomainMask for the first two parameters. See the -replaceItemAtURL:withItemAtURL:backupItemName:options:resultingItemURL:error: method for how to atomically move the file into its final place.
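The same idea can be sketched without the Cocoa APIs. The Python example below is only an approximation under stated assumptions: it uses a dot-prefixed temporary file in the target's own directory and a POSIX rename (os.replace) rather than NSItemReplacementDirectory and -replaceItemAtURL:..., and the function name is hypothetical.

```python
import os
import tempfile


def replace_atomically(target_path, data):
    """Write data to a hidden working copy, then rename it over target_path."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # The leading dot keeps the working copy out of normal Finder/ls listings.
    fd, tmp_path = tempfile.mkstemp(prefix=".working-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        # On POSIX the rename is atomic when both paths are on the same volume.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Leave the original untouched and clean up the working copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers of target_path see either the old contents or the new ones, never a partially written file.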
You can set a file to be hidden using various APIs. You can use -[NSURL setResourceValue:forKey:error:] with the key NSURLIsHiddenKey. You can use the chflags() system call to set UF_HIDDEN. The old Unix standby is to use a filename starting with a period ('.').

Here are some details about this topic:
https://developer.apple.com/library/ios/documentation/FileManagement/Conceptual/FileSystemProgrammingGuide/FileCoordinators/FileCoordinators.html
Now I think the basic policy on OS X is something like this:
Always allow access by any process.
Always be prepared for the shared data file to be mutated.
Get notified when another process mutates the file content, and respond appropriately. For example, you can display an error to the end user if another process is trying to access the file. Users will then learn that this is bad and will not do it again.
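The linked guide covers Apple's file coordinator and file presenter APIs. As a much cruder stand-in, the "be prepared and notice changes" policy can be sketched in Python by comparing file metadata between checks. This is an assumption-laden polling sketch, not those APIs, and the function names are hypothetical.

```python
import os
import tempfile


def snapshot(path):
    """Return (inode, size, mtime_ns) for path, or None if the file is gone."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_ino, st.st_size, st.st_mtime_ns)


def describe_change(before, after):
    """Classify what happened to the file between two snapshots."""
    if before == after:
        return "unchanged"
    if after is None:
        return "deleted"
    if before is None:
        return "created"
    if before[0] != after[0]:
        return "replaced"  # a different inode now sits at the same path
    return "modified"      # same inode, but size or modification time differs
```

An editor could take a snapshot after each of its own writes and call describe_change before the next one, showing the user an error when the result is anything other than "unchanged".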

Related

Ways to find out if the process is created by system (by pid) on macOS?

I'm implementing an API that launches other apps (using NSTask) inside a VFS (FUSE on macOS). After the VFS is mounted, a bunch of processes start accessing the VFS in which my app runs. I'd like to implement some kind of filtering mechanism that detects whether a process accessing the VFS was created by the system (and is potentially safe); if so, it will be granted access to the file system where my app runs.
So far I'm able to get basic information about a process from its pid, for example the process path, uid, ppid, and code signature (using the Security framework, libproc, etc.).
I've done a couple of tests and see that there are processes with uid != 0 that are still critical for my app to run: if I deny them access, the app started in the VFS crashes (e.g. /usr/libexec/secinitd, /System/Library/CoreServices/Dock.app/Contents/MacOS/Dock). So it looks like filtering processes by pid, uid, or ppid might not work.
So the question is: is it possible to tell whether a process that is accessing my app was created by the system and is potentially safe? I also don't want to break things by denying access to critical system processes that the app needs in order to start and run in the VFS.
Judging from the comment thread, your threat model is data theft via malware etc.
In this case, you can trust almost nothing, so the best way is probably to maintain an explicit whitelist of processes which are allowed to access your mount point, and block access to everything else by default. Log any processes to which access is denied, and allow the user to reverse that decision and add them to the whitelist. In other words, let the user decide what applications they consider safe.
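A default-deny whitelist of this kind could look roughly like the Python sketch below. The class and method names are hypothetical, and keying on the executable path alone is an assumption made for brevity; a real filter would more likely compare code signatures obtained from the Security framework.

```python
class AccessPolicy:
    """Default-deny access policy with a user-editable whitelist."""

    def __init__(self, allowed_paths=()):
        self.allowed = set(allowed_paths)
        self.denied_log = []  # processes that were blocked, for the user to review

    def check(self, executable_path):
        """Return True if the process may access the mount point."""
        if executable_path in self.allowed:
            return True
        if executable_path not in self.denied_log:
            self.denied_log.append(executable_path)
        return False

    def approve(self, executable_path):
        """Let the user reverse a denial and whitelist the process."""
        self.allowed.add(executable_path)
        if executable_path in self.denied_log:
            self.denied_log.remove(executable_path)
```

For example, a policy seeded with /usr/libexec/secinitd would admit that process, block and log anything else, and admit a blocked process only after the user approves it.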
You said that, according to your inspection, several processes were mandatory for the app to run, so why not use a trial-and-error approach?
Deploy your FUSE drive in a clean environment and record all processes that attempt to access your files. Try blocking each process in turn, keep only those whose blocking crashes your app, and add them to a whitelist.
Of course, this list is subject to change between macOS versions, but it can give you the general idea.
Alternatively, you can break your app into a couple of parts. For example, put the sensitive logic inside a separate dylib file and restrict access to this file only. Since a dylib is not the main executable of your app, I believe fewer processes require mandatory access to it.

In Windows is it always necessary to use a handle to access a file?

In other words, is it possible to access a file without a handle being utilized?
You could use the CreateFile() API to create a handle to the raw file system and then parse the file structure yourself (this is more work than it sounds!).
Though this would require admin-rights. This wouldn't trigger any hooks you have on CreateFile() or other file-related API-functions.
This wouldn't create a handle to the file but you still need a handle to the device.
For code running in user mode, any operation on a file will involve a handle of some kind, though not necessarily to the file in question. There are APIs that don't expose the handle to the programmer, but there is always one there.
In kernel mode, although it is usual to use handles for file operations, it is not necessary. For example, the file server component doesn't appear to open file handles when it is accessing a file on behalf of a remote user.

What errors can happen when (Windows) system file cache disk write-back fails? how are they reported?

Apparently the Windows file cache flushes data to disk asynchronously, even when using the synchronous WriteFile() API. Quoting "File Caching" on MSDN:
By default, [...] write operations write file data to the system
file cache rather than to the disk, and this type of cache is
referred to as a write-back cache.
Assuming that write-through and no-buffering flags are not used, what happens if the actual write to disk fails? Can clients be notified of such failures? What is the expected client error handling model for such failures? "Fire and forget" and "Write and pray" come to mind but maybe there is something else.
Secondary question: are there certain classes of errors that are guaranteed to be detected early? E.g. will WriteFile() always return an error if the disk is full? -- even though the actual write to disk would be deferred?
I would like to know how to write reliable file i/o that responds to these kinds of errors without disabling the Windows File Cache.
Bonus points: is this handled differently on other operating systems? Can you recommend a good resource on the topic?
In Windows 7, the user is notified via a pop-up dialog from the notification area.
Normal errors (such as the disk being full, lack of permissions, etc.) are reported back to the application immediately; these do not cause late failures.
Late failures can only happen in a handful of situations, such as a hardware failure or operating system crash. They can also happen when writing to a network share if the connection drops unexpectedly for any reason.
In most cases, it doesn't make sense for an application to worry about this. Data loss is to be expected under these circumstances; let the user deal with it.
If the data you are writing is unusually important, then you may need to worry, in which case you will have to use the write-through and/or no-buffering flags.
There is no third option.

Best secure single running app guard on windows

I would like to improve the way an application checks that another instance is not already running. Right now we use named mutexes combined with a check of running processes.
The goal is to prevent security attacks (as this is security software). My idea right now is that the only "bulletproof" solution is to write a driver that serves this kind of information and authenticates clients via signed binaries.
Has anyone solved such a problem?
What are your opinions and recommendations?
First, let me say that there is ultimately no way to protect your process from agents that have administrator or system access. Even if you write a rootkit driver that intercepts all system calls (a difficult and unsafe practice in and of itself), there are still ways to use admin access to get in. You have the wrong design if this is a requirement.
If you set up your secure process to run as a service, you can use the Service Control Manager to start it. The SCM will only start one instance, will monitor that it stays up, allow you to define actions to execute if it crashes, and allow you to query the current status. Since this is controlled by the SCM and the service database can only be modified by administrators, an attacking process would not be able to spoof it.
I don't think there's a secure way of doing this. No matter what kind of system-unique, or user-unique named object you use - malicious 3rd party software can still use the exact same name and that would prevent your application from starting at all.
If you check the currently executing processes to see whether an executable with the same name is running, you'd run into problems if the malicious software has the same executable name. If you also check the path of that executable, then it would be possible to run two copies of your app from different locations.
If you create/delete a file when starting/finishing - that might be tricked as well.
The only thing that comes to mind is that you may be able to achieve the desired effect by putting all the logic of your app into a COM object and having a GUI application interact with it through COM interfaces. This would only ensure that there is a single COM object; you would still be able to run as many GUI clients as you want. Note that I'm not suggesting this as a bulletproof method: it may have its own holes (for example, someone could make your GUI client connect to a third-party COM object simply by editing the registry).
So, the short answer - there is no truly secure way of doing this.
I use a named pipe¹, where the name is derived from the conditions that must be unique:
Name of the application (this is not the file name of the executable)
Username of the user who launched the application
If the named pipe creation fails because a pipe with that name already exists, then I know an instance is already running. I use a second lock around this check for thread (process) safety. The named pipe is automatically closed when the application terminates (even if the termination was due to an End Process command).
¹ This may not be the best general option, but in my case I end up sending data on it at a later point in the application lifetime.
In pseudo code:
numberofapps = 0
for each process in processes
    if path to module file equals path to this module file
        increment numberofapps
if numberofapps > 1
    exit
See msdn.microsoft.com/en-us/library/ms682623(VS.85).aspx for details on how to enumerate processes.

How to emulate shm_open on Windows?

My service needs to store a few bits of information (at least 20 bits or so, but I can easily make use of more) such that
it persists across service restarts, even if the service crashed or was otherwise terminated abnormally
it does not persist across a reboot
can be read and updated with very little overhead
If I store this information in the registry or in a file, it will not get automatically emptied when the system reboots.
Now, if I were on a modern POSIX system, I would use shm_open, which would create a shared memory segment which persists across process restarts but not system reboots, and I could use shm_unlink to clean it up if the persistent data somehow got corrupted.
I found MSDN: Creating Named Shared Memory and started reimplementing pieces of it within my service; this basically uses CreateFileMapping(INVALID_HANDLE_VALUE, ..., PAGE_READWRITE, ..., "Global\\my_service") instead of shm_open("/my_service", O_RDWR | O_CREAT, ...).
However, I have a few concerns, especially centered around the lifetime of this pagefile-backed mapping. I haven't found answers to these questions in the MSDN documentation:
Does the mapping persist across reboots?
If not, does the mapping disappear when all open handles to it are closed?
If not, is there a way to remove or clear the mapping? Doesn't need to be while it's in use.
If it does persist across reboots, or does disappear when unreferenced, or is not able to be reset manually, this method is useless to me.
Can you verify or find faults in these points, and/or recommend a different approach?
If there were a directory that were guaranteed to be cleaned out upon reboot, I could save data in a temporary file there, but it still wouldn't be ideal: under certain system loads, we are encountering file open/write failures (rare, under 0.01% of the time, but still happening), and this functionality is to be used in the logging path. I would like not to introduce any more file operations here.
The shared memory mapping would not persist across reboots and it will disappear when all of its handles are closed. A memory mapping object is a kernel object - they always get deleted when the last reference to them goes away, either explicitly via a CloseHandle or when the process containing the reference exits.
Try creating a registry key with RegCreateKeyEx with REG_OPTION_VOLATILE - the data will not be preserved when the corresponding hive is unloaded. This will be at system shutdown for HKLM or user logoff for HKCU.
Sounds like maybe you want serialization instead of shared memory? If that is indeed appropriate for your application, the way you serialize will depend on your language. If you're using C++, check out Boost.Serialization. C# undoubtedly has lots of serialization options (like Java), if that's what you're using.

Resources