I'm working on a Win32-based document management system that employs an automatic check-in/check-out model. The model it currently uses for tracking documents in use (monitoring the processes of the applications that open the documents) is not particularly robust, so I'm researching alternatives.
Check-outs are easy, as the DocMgt application is responsible for launching the other application (Word, Adobe, Notepad, etc.) and passing it the document.
It's the automatic check-in requirement that is more difficult. When the user closes the document in Word/Adobe/Notepad, ideally the DocMgt system would be automatically notified so it can perform an automatic check-in of the updated document.
To complicate things further, the document is likely to be stored on a network drive, not a local drive.
Anyone got any tips on API calls, techniques or architectures to support this sort of functionality?
I'm not expecting a magic three-line solution; the research I've done so far leads me to believe that this is far from a trivial problem and will require some significant work to implement. I'm interested in all suggestions, whether they're for a full or partial solution.
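For reference, the launch side of the current model looks roughly like this (a minimal sketch of the launch-and-wait approach; CheckInDocument is a hypothetical placeholder for the DocMgt check-in routine, not a real API):

#include <windows.h>
#include <stdio.h>

// Hypothetical helper provided by the DocMgt system (not a Win32 API).
void CheckInDocument(const wchar_t* docPath);

// Launch the editor for a checked-out document and check the document back
// in once the launched process exits.
bool EditAndCheckIn(const wchar_t* editorExe, const wchar_t* docPath)
{
    wchar_t cmdLine[1024];
    swprintf_s(cmdLine, 1024, L"\"%s\" \"%s\"", editorExe, docPath);

    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    if (!CreateProcessW(nullptr, cmdLine, nullptr, nullptr, FALSE,
                        0, nullptr, nullptr, &si, &pi))
        return false;

    // Wait for the editor process to exit. This is the weak point: editors
    // that hand the file off to an already-running instance exit immediately.
    WaitForSingleObject(pi.hProcess, INFINITE);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);

    CheckInDocument(docPath);  // check the (possibly updated) document back in
    return true;
}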
What you describe is a common task. It is perfectly doable, though not without its share of hassle. Here I assume that the files are closed on the computer where your code can run (even if the files are stored on the mounted network share).
There are two approaches to controlling files while they are in use: a filter and a virtual filesystem.
The filter sits in the middle, between the process and the filesystem (any filesystem, whether local, network or fully virtual), and intercepts file requests that go to that filesystem. This requires that the filter code runs on the computer through which the requests pass (a requirement that seems to be met in your scenario).
The virtual filesystem is an endpoint for the requests that come from the applications. When you implement the virtual filesystem, you handle all requests, so you always fully control the lifetime of the files. As the filesystem is virtual, you are free to keep the files anywhere, including on a real disk (local or network) or even in the cloud.
The benefit of the filter approach is that you can control individual files that reside on real disks, while the virtual filesystem can be mounted only to a new drive letter or into an empty directory on an NTFS drive, which is not always feasible. At the same time, sitting in the middle, the filter is to some extent more restricted in what it can do, and the files can be altered while the filter is not running. Finally, filters are more complicated and potentially error-prone, as they sit in the middle and must play nicely with other filters and with endpoints.
I don't have specific recommendations, but if a separate drive letter is an option, I would recommend the virtual filesystem.
Our company developed (and continues to maintain for the new owner) two products, CBFS Filter and CBFS Connect, which let you create a filter and a virtual filesystem respectively, all in the user mode. Those products are used in many software titles, including some Document Management Systems (which is close to what you do). You will find both products on their website.
There are some applications (let us call them providers), which (when running) provide a virtual file and directory structure under a new drive letter. Access requests from other processes to those files and directories are served by the provider.
One example of such a provider is Google Drive for Windows (the new one, not the old Backup and Sync), which maps the contents of your Google Drive to a chosen drive letter.
I thought there should be some simple user-mode API which would allow my app to provide a new drive and the contents of the files and directories on it. I thought many applications use such an API, but I cannot find it. The closest I could get are IFS (installable file system) drivers and file system filter drivers, but those are kernel-mode and seem too complex. They just don't seem designed to accomplish such a task.
So, what API should I use to make a simple software-implemented drive?
In addition to the suggestions in the comments there is also now the Projected File System, which allows software to provide a drive-like interface through callbacks and not just by creating an actual disk image. It is my understanding that Projected FS is how, for instance, SQL Server does its table-backed files interface.
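For reference, a heavily trimmed sketch of what a ProjFS provider looks like. This assumes Windows 10 1809 or later with the "Windows Projected File System" optional feature enabled and linking against ProjectedFSLib.lib; the callbacks are empty stubs and C:\VirtualRoot is a placeholder:

#include <windows.h>
#include <projectedfslib.h>
#pragma comment(lib, "ProjectedFSLib.lib")

// Stub callbacks - a real provider would enumerate its backing store and
// return placeholder info and file data here.
static HRESULT CALLBACK StartDirEnum(const PRJ_CALLBACK_DATA*, const GUID*) { return S_OK; }
static HRESULT CALLBACK EndDirEnum(const PRJ_CALLBACK_DATA*, const GUID*) { return S_OK; }
static HRESULT CALLBACK GetDirEnum(const PRJ_CALLBACK_DATA*, const GUID*, PCWSTR,
                                   PRJ_DIR_ENTRY_BUFFER_HANDLE) { return S_OK; }
static HRESULT CALLBACK GetPlaceholderInfo(const PRJ_CALLBACK_DATA*)
{ return HRESULT_FROM_WIN32(ERROR_FILE_NOT_FOUND); }
static HRESULT CALLBACK GetFileData(const PRJ_CALLBACK_DATA*, UINT64, UINT32)
{ return HRESULT_FROM_WIN32(ERROR_FILE_NOT_FOUND); }

int wmain()
{
    const wchar_t* root = L"C:\\VirtualRoot";      // placeholder path
    CreateDirectoryW(root, nullptr);               // the root must exist and be empty

    GUID instanceId;
    CoCreateGuid(&instanceId);
    PrjMarkDirectoryAsPlaceholder(root, nullptr, nullptr, &instanceId);

    PRJ_CALLBACKS callbacks = {};
    callbacks.StartDirectoryEnumerationCallback = StartDirEnum;
    callbacks.EndDirectoryEnumerationCallback   = EndDirEnum;
    callbacks.GetDirectoryEnumerationCallback   = GetDirEnum;
    callbacks.GetPlaceholderInfoCallback        = GetPlaceholderInfo;
    callbacks.GetFileDataCallback               = GetFileData;

    PRJ_NAMESPACE_VIRTUALIZATION_CONTEXT ctx = nullptr;
    if (FAILED(PrjStartVirtualizing(root, &callbacks, nullptr, nullptr, &ctx)))
        return 1;

    Sleep(INFINITE);           // serve requests until the process is killed
    PrjStopVirtualizing(ctx);
    return 0;
}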
Scenario
Customers are provided with a server-client solution to accomplish some business-related task. There is a central server installed on a respective machine. Clients are installed on individual machines of users of the software.
The server uses PostgreSQL and stores serialized data as well as media on the designated server-machine.
A related company has experienced a ransomware attack in the past 6 months and we are worried this scenario might also hit our customers. These customers supposedly implemented some security measures, such as a RAID setup, but we remain unconvinced based on prior communication. Even though this is a problem outside our scope of responsibility, adverse effects resulting from a possible attack are likely to affect us as well. This is why I am looking to at least increase security for their database wherever possible.
Question
Given that scenario, one small tweak to their server system is to enable Windows' controlled folder access protection for the folders related to their database.
This guide describes how to activate this function using Windows UI:
https://www.isumsoft.com/windows-10/how-to-protect-files-folders-against-ransomware-attacks.html
I would like to accomplish this without relying on the customer's sysadmins, using our NSIS-based installers only. Therefore my resulting question is - can additional protected folders be declared via registry manipulation? If not, is there a different way to achieve this?
There is a PowerShell API, see "Customize controlled folder access":
Set-MpPreference -EnableControlledFolderAccess Enabled
Add-MpPreference -ControlledFolderAccessProtectedFolders "<the folder to be protected>"
Add-MpPreference -ControlledFolderAccessAllowedApplications "<the app that should be allowed, including the path>"
I was browsing the Win32 API functions for file and directory management operations. I saw that some of those functions have so-called "transactional" counterparts.
Examples:
CreateDirectory and CreateDirectoryTransacted
RemoveDirectory and RemoveDirectoryTransacted
CreateFile and CreateFileTransacted
CopyFile and CopyFileTransacted
I read explanations of these transacted functions, the Wikipedia article Transactional NTFS and this MSDN Magazine page. But because of the heavy terminology (for me) in these pages, I didn't clearly understand these explanations. They all come to a common consensus that these functions are "atomic". But as far as I understand from the word "atom", it is a nucleus with spinning electrons around it...
Can you please explain to me, in basic and simple English sentences, what the purposes and operations of these functions are? Why and when would one prefer the transacted version of an API function?
Why and when would one prefer transacted version of an API function?
There are a couple of scenarios given in the link I quoted above.
One of these is the use case of an installer application, which needs to copy/install several files to different locations and then maybe perform some updates to the registry. Before the installer runs, the system can be considered consistent. Once the installer has done all its work, the software is completely installed and the system is again in a consistent state.
If, however, the computer crashes during the installation process it may not be trivial to determine which steps of the installation procedure have already been performed successfully before the crash and which have not. Transactional operations can give support in this situation by 'automatically' restoring the consistent system state as it was before the installer ran, if for any reason the installation fails in the process.
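As a rough illustration of that installer case, a sketch using the Kernel Transaction Manager (file names are placeholders; requires ktmw32.h/KtmW32.lib, and bear in mind that Microsoft has since deprecated the whole TxF API):

#include <windows.h>
#include <ktmw32.h>
#pragma comment(lib, "KtmW32.lib")

// Copy two files as one all-or-nothing operation: either both copies become
// visible, or neither does.
bool InstallFilesAtomically()
{
    HANDLE tx = CreateTransaction(nullptr, nullptr, 0, 0, 0, 0, nullptr);
    if (tx == INVALID_HANDLE_VALUE)
        return false;

    bool ok =
        CopyFileTransactedW(L"setup\\app.exe", L"C:\\Program Files\\App\\app.exe",
                            nullptr, nullptr, nullptr, 0, tx) &&
        CopyFileTransactedW(L"setup\\app.dll", L"C:\\Program Files\\App\\app.dll",
                            nullptr, nullptr, nullptr, 0, tx);

    if (ok)
        ok = CommitTransaction(tx) != FALSE;   // both files appear together
    else
        RollbackTransaction(tx);               // neither file appears at all

    CloseHandle(tx);
    return ok;
}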
As Microsoft states, the transactional file system operations have never been adopted much by developers, which may serve as an indication that the functionality is not really needed for the vast majority of applications, or that there are easier ways to achieve the desired result in an application-specific way, for which MS gives examples too.
Besides, the concept of "atomic" operations is present in different areas of software development, for instance in concurrent programming or in database management systems. See the Wikipedia article.
In short, a transaction (be it a file system, database or bank) will only be completed if no errors occurred in the process.
Using a non-transactional file system and API, say you have a file containing:
AAAA
Now you want to fill the file with all B's, but halfway through, the power is lost and not all data is committed to the disk. Now you have an inconsistent state when you read the file back (after power returns):
BBAA
Remember FAT and scandisk?
Now with transactions, the file system basically first writes the changes to a different location on the disk, and only when finished changes the "file data location pointers" (inodes) to point to the new location of the data, marking the space the old data occupied as 'available' again.
You don't need Transactional NTFS (TxF) for this, as 'standard' NTFS also promises to ensure consistency:
NTFS is a recoverable file system that guarantees the consistency of the volume by using standard transaction logging and recovery techniques. In the event of a system failure, NTFS runs a recovery procedure that accesses information stored in a transaction log file. The NTFS recovery procedure guarantees that the volume is restored to a consistent state. Transaction logging requires very little overhead.
IMPORTANT: Note that Microsoft marked the entire "Transactional NTFS" API as deprecated and strongly discourages its usage.
See Alternatives to using Transactional NTFS.
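For completeness, the usual application-level alternative (write the new contents to a temporary file, then swap it into place) can be sketched like this; the temp-file naming scheme is just a placeholder:

#include <windows.h>
#include <stdio.h>

// Readers see either the old file or the complete new file, never a
// half-written one. The target file must already exist for ReplaceFileW.
bool SaveFileAtomically(const wchar_t* path, const void* data, DWORD size)
{
    wchar_t tmp[MAX_PATH];
    swprintf_s(tmp, MAX_PATH, L"%s.tmp", path);

    HANDLE h = CreateFileW(tmp, GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return false;

    DWORD written = 0;
    bool ok = WriteFile(h, data, size, &written, nullptr) && written == size;
    ok = FlushFileBuffers(h) && ok;            // make sure the data hit the disk
    CloseHandle(h);
    if (!ok)
        return false;

    // Atomically swap the temp file into place of the original.
    return ReplaceFileW(path, tmp, nullptr, 0, nullptr, nullptr) != FALSE;
}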
Question: I have to come up with a unique ID for each networked client, such that:
it (the ID) should persist once the client software is installed on the target computer, and should continue to persist if the software is re-installed on the same computer and the same OS installation,
it should not change if the hardware configuration is modified in most ways (except changing the motherboard),
when a hard drive with the client software installed is cloned to another computer with an identical hardware configuration (or one as similar as possible), the client software should be aware of that change.
A little bit of explanation and some back-story:
This question is basically an age-old question that also touches on the topic of software copy protection, as some of the mechanisms used in that area are mentioned here. I should be clear at this point that I'm not looking for a copy-protection scheme. Please, read on. :)
I'm working on client-server software that is supposed to work in a local network. One of the problems I have to solve is to identify each unique client in the network (not so much of a problem), so that I can apply certain attributes to every specific client, and retain and enforce those attributes during the deployment lifetime of a specific client.
While I was looking for a solution, I was aware of the following:
The Windows activation system uses some kind of heavy fingerprinting mechanism that is extremely sensitive to hardware modifications,
Disk imaging software copies along all volume IDs (tied to each partition when formatted), as well as any custom, uniquely generated IDs created during the installation process, during the first run, or in any other way that is strictly software in nature and stored in the registry or on the hard drive, so it's very easy to confuse the two.
The obvious choice for this kind of problem would be to find out the BIOS identifiers (not 100% sure whether these are unique across identical motherboard models, though), as that's the only thing I can rely on that isn't duplicated or transferred by cloning and that can't be changed (at least not by using some user-space program). Everything else fails as either being unreliable (MAC cloning, anyone?) or too demanding (in the sense that it's too sensitive to configuration changes).
Sub-question that I'd like to ask is, am I doing it correctly, architecture-wise? Perhaps there is a better tool for the task that I have to accomplish...
Another approach I had in mind is something similar to a handshake mechanism, where the server maintains an internal lookup table of connected client IDs (which can even be completely software-based and non-unique at any given moment) and tells the client to come up with a different ID during the handshake if a duplicate ID is provided upon connection. That approach, unfortunately, doesn't play nicely with the requirement to tie attributes to a specific client during its lifetime.
It seems to me that you should construct a unique ID corresponding to your requirements. This ID can be constructed as a hash (like MD5, SHA-1 or SHA-512) of the information that is important to you (some information about the software and hardware components).
You can make your solution more secure if you sign such a hash with your private key and your software verifies at startup that the signed hash value is valid (only the public key needs to be installed together with your software). One could expand this kind of solution with various online services, but corporate clients may not find online services so attractive.
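A minimal sketch of the hashing step, assuming you have already collected the software/hardware components you care about into a single string (SHA-256 via CNG here; signing the resulting digest with your private key would be layered on top of this):

#include <windows.h>
#include <bcrypt.h>
#include <string>
#include <vector>
#pragma comment(lib, "bcrypt.lib")

// Hash a concatenation of machine identifiers (BIOS serial, disk serial, ...)
// into a fixed-size fingerprint.
std::vector<unsigned char> MachineFingerprint(const std::wstring& components)
{
    std::vector<unsigned char> digest;

    BCRYPT_ALG_HANDLE alg = nullptr;
    if (BCryptOpenAlgorithmProvider(&alg, BCRYPT_SHA256_ALGORITHM, nullptr, 0) != 0)
        return digest;

    BCRYPT_HASH_HANDLE hash = nullptr;
    if (BCryptCreateHash(alg, &hash, nullptr, 0, nullptr, 0, 0) == 0)
    {
        BCryptHashData(hash, (PUCHAR)components.data(),
                       (ULONG)(components.size() * sizeof(wchar_t)), 0);

        digest.resize(32);                         // SHA-256 digest length
        if (BCryptFinishHash(hash, digest.data(), (ULONG)digest.size(), 0) != 0)
            digest.clear();
        BCryptDestroyHash(hash);
    }
    BCryptCloseAlgorithmProvider(alg, 0);
    return digest;
}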
What you're looking for is the Windows WMI. You can get the motherboard ID (which is unique across the same type of motherboard) or many many other types of unique identifiers and come up with some clever seeded function to generate a UHID. Whoa did I just make up an acronym?
And if you're looking specifically for getting the Motherboard (BIOS) ID:
WMI class: Win32_BIOS
Namespace: \Root\Cimv2
Documentation: http://msdn.microsoft.com/en-us/library/aa394077(VS.85).aspx
Sample code: http://msdn.microsoft.com/en-us/library/aa390423%28VS.85%29.aspx
Edit: You didn't specify a language (and I assumed C++), but this can be done in Java (with a COM driver), and any .NET language, as well.
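In case the sample link goes stale, here is a condensed C++ version of the same Win32_BIOS query (error handling omitted for brevity):

#include <windows.h>
#include <comdef.h>
#include <wbemidl.h>
#include <stdio.h>
#pragma comment(lib, "wbemuuid.lib")

int wmain()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);
    CoInitializeSecurity(nullptr, -1, nullptr, nullptr, RPC_C_AUTHN_LEVEL_DEFAULT,
                         RPC_C_IMP_LEVEL_IMPERSONATE, nullptr, EOAC_NONE, nullptr);

    IWbemLocator* locator = nullptr;
    CoCreateInstance(CLSID_WbemLocator, nullptr, CLSCTX_INPROC_SERVER,
                     IID_IWbemLocator, (LPVOID*)&locator);

    IWbemServices* services = nullptr;
    locator->ConnectServer(_bstr_t(L"ROOT\\CIMV2"), nullptr, nullptr, nullptr,
                           0, nullptr, nullptr, &services);
    CoSetProxyBlanket(services, RPC_C_AUTHN_WINNT, RPC_C_AUTHZ_NONE, nullptr,
                      RPC_C_AUTHN_LEVEL_CALL, RPC_C_IMP_LEVEL_IMPERSONATE,
                      nullptr, EOAC_NONE);

    IEnumWbemClassObject* results = nullptr;
    services->ExecQuery(_bstr_t(L"WQL"),
                        _bstr_t(L"SELECT SerialNumber FROM Win32_BIOS"),
                        WBEM_FLAG_FORWARD_ONLY | WBEM_FLAG_RETURN_IMMEDIATELY,
                        nullptr, &results);

    IWbemClassObject* obj = nullptr;
    ULONG returned = 0;
    while (results->Next(WBEM_INFINITE, 1, &obj, &returned) == S_OK && returned)
    {
        VARIANT serial;
        obj->Get(L"SerialNumber", 0, &serial, nullptr, nullptr);
        if (serial.vt == VT_BSTR)
            wprintf(L"BIOS serial: %s\n", V_BSTR(&serial));
        VariantClear(&serial);
        obj->Release();
    }

    results->Release();
    services->Release();
    locator->Release();
    CoUninitialize();
    return 0;
}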
Many programs use the hostId in order to build a license code (like those based on FlexLM). Have a look at what Matlab does depending on the operating system:
http://www.mathworks.com/support/solutions/en/data/1-171PI/index.html
Also have a look at this question:
Getting a unique id from a unix-like system
I also once saw some programs basing their licenses on the serial number of the hard drive, and maybe that is the thing least likely to change. Some would suggest using the MAC of your Ethernet card, but that can be reprogrammed.
MAC
DON'T RELY ON MAC! EVER. It is not permanent. The user can easily change it (under 30 seconds).
Volume ID
DON'T RELY ON Volume ID! EVER. It is not permanent. The user can easily change it. It also changes by simply formatting the drive.
WMI
WMI is a service. It can be easily disabled. Actually, I tried that approach and found out that on many computers it is disabled or broken (yes, quite often broken).
License server
Connecting to a validation server may also cause you lots of trouble because:
* your customers may not always be connected to the Internet.
* your customers may connect with special settings (router/NAT/proxy/gateway) that they need to input into your program in order to let it connect to the validation server.
* they may be behind a firewall that will block all programs except a few (my case). In some cases the firewall may not be under their control (valid for MOST corporate users)!
* it is super easy to redirect your program to a local fake webserver that emulates your licensing server.
Hardware data
If you need strong protection you need to rely on hardware. Something that cannot be edited by the user. Something like the CPU ID instruction available in Intel/AMD CPUs and the serial number written into the drive's IDE interface.
The CPU ID and HDD ID are permanent. They will never change, not even after you format the computer and reinstall Windows.
It is doable. For example, this library reads the hardware ID of a computer. There is a compiled demo and also source code/DLL. Disclaimer: the link leads to a commercial product (19€/no royalties).
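For reference, reading that CPU information with the MSVC __cpuid intrinsic looks roughly like this (note that on current CPUs it identifies the processor model and stepping, not an individual chip):

#include <intrin.h>
#include <stdio.h>
#include <string.h>

int main()
{
    int regs[4] = {};

    // Leaf 0: vendor string ("GenuineIntel", "AuthenticAMD", ...).
    __cpuid(regs, 0);
    char vendor[13] = {};
    memcpy(vendor + 0, &regs[1], 4);   // EBX
    memcpy(vendor + 4, &regs[3], 4);   // EDX
    memcpy(vendor + 8, &regs[2], 4);   // ECX
    printf("Vendor:    %s\n", vendor);

    // Leaf 1: processor signature (family/model/stepping). Stable across OS
    // reinstalls, but identical CPU models report identical values.
    __cpuid(regs, 1);
    printf("Signature: %08X\n", regs[0]);
    return 0;
}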
In what way is the Windows registry meant to be used? I know it's alright to store a small amount of user preferences, but is it considered bad practice to store all your user data there? I would think it would depend on the data set, so how about small amounts of data, say less than 2 KB, in 100 or so different key/value pairs? Is this bad practice? Would a flat file or a SQLite db be a better practice?
I'm going to take a contrarian view.
The registry is a fine place to put configuration data of all types. In general it is faster than most configuration files and more reliable (individual operations on the registry are transacted so if your app crashes during a write the registry isn't corrupted - in general that isn't the case with ini files).
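For illustration, a round trip through HKEY_CURRENT_USER looks roughly like this (key path and value name are placeholders):

#include <windows.h>
#include <string.h>

// Write a per-user string setting.
bool SaveSetting(const wchar_t* value)
{
    HKEY key = nullptr;
    if (RegCreateKeyExW(HKEY_CURRENT_USER, L"Software\\ExampleCo\\ExampleApp",
                        0, nullptr, 0, KEY_SET_VALUE, nullptr, &key, nullptr) != ERROR_SUCCESS)
        return false;

    DWORD bytes = (DWORD)((wcslen(value) + 1) * sizeof(wchar_t));
    LONG rc = RegSetValueExW(key, L"LastProject", 0, REG_SZ, (const BYTE*)value, bytes);
    RegCloseKey(key);
    return rc == ERROR_SUCCESS;
}

// Read it back; RegGetValueW guarantees the returned string is terminated.
bool LoadSetting(wchar_t* buffer, DWORD bufferBytes)
{
    return RegGetValueW(HKEY_CURRENT_USER, L"Software\\ExampleCo\\ExampleApp",
                        L"LastProject", RRF_RT_REG_SZ, nullptr,
                        buffer, &bufferBytes) == ERROR_SUCCESS;
}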
Marcelo MD is totally right: storing things like an operation's percentage complete in the registry (or any other non-volatile storage) is a horrible idea. On the other hand, storing data like the most recently used files is just fine - the registry was built for just that kind of problem.
A number of the other commenters on this post talking about the MRU list have discussed the problem of what happens when the MRU list gets out of sync due to application crashes. I'm wondering why storing the MRU list in a flat file in per-user storage is any better?
I'm also not sure what the "security implications" of storing your data in the registry are. The registry is just as secure as the filesystem - the registry and the filesystem use the same ACL mechanism to protect their data.
If you ARE going to store your user data in a file, you should absolutely put your data in %APPDATA%\CompanyName\ApplicationName at least - that way if two different developers create an application with the same name (how many "Media Manager" applications are there out there?) you won't have collisions.
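A sketch of building that per-user folder with SHGetKnownFolderPath (company and application names are placeholders; links against Shell32.lib/Ole32.lib, which Visual Studio projects pull in by default):

#include <windows.h>
#include <shlobj.h>
#include <knownfolders.h>
#include <string>

// Returns something like C:\Users\yourname\AppData\Roaming\ExampleCo\ExampleApp
std::wstring GetAppDataDir()
{
    std::wstring dir;
    PWSTR appData = nullptr;
    if (SUCCEEDED(SHGetKnownFolderPath(FOLDERID_RoamingAppData, 0, nullptr, &appData)))
    {
        dir = appData;
        CoTaskMemFree(appData);
        dir += L"\\ExampleCo\\ExampleApp";                      // CompanyName\ApplicationName
        SHCreateDirectoryExW(nullptr, dir.c_str(), nullptr);    // creates intermediate dirs
    }
    return dir;
}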
For me, simple user configuration items and user data are better stored in either a simple XML configuration file, a SQLite db, or a MS SQL Server Compact db. The exact storage medium depends on the specifics of the implementation.
I only use the registry for things that I need to set infrequently and that users don't need to be able to change/see. For example, I have stored encrypted license information in the registry before to avoid accidental user removal of the data.
Using the registry to store data has mainly one problem: it's not very user-friendly. Users have virtually no chance of backing up their settings, copying them to another computer, troubleshooting them (or resetting them) if they get corrupted, or generally just seeing what their software is doing.
My rule of thumb is to use the registry only to communicate with the OS. Filetype associations, uninstaller entries, processes to run at startup, those things obviously have to be in the registry.
But data that is for use in your application only belongs in a file in your App Data folder. (Whichever of the 3+ App Data folders Microsoft currently wants you to use, anyway.)
As each user already has directory space in Windows dedicated to storing application user data, I store user-level data (preferences, for instance) there.
In C#, I would get it by doing something like this:
Environment.GetFolderPath( Environment.SpecialFolder.ApplicationData);
Typically, I'll store SQLite files there or whatever is appropriate for the application.
If your app is going to be deployed "in the enterprise", keep in mind that administrators can tweak the registry using group policy tools. For example, if Firefox used the registry for things like the proxy server, it would make deployment a snap because an admin can use the standard tools in Active Directory to set it up. If you use anything else, I don't think such things can be done very easily.
So don't dismiss the registry all together. If there is a chance an admin might want to standardize parts of your configuration across a network, put the setting in the registry.
I think Microsoft is encouraging use of isolated storage instead of the Windows registry.
Here's an article that explains how to use it in .Net.
You can find those files in Windows XP under Documents & Settings\\Local Settings\ App Data\Isolated Storage. The data is in .dat files
I would differentiate:
On the one hand there is application-specific configuration data that is needed for the app to run, e.g. IP addresses to connect to, which folders to use for what sort of files, etc., and non-trivial per-user settings.
Those I put in a config file, ini format for simple stuff, xml if it gets more complex.
On the other hand there are trivial per-user settings (best example: window positions and layout). To avoid cluttering the config files (which some users will want to edit themselves, so few and clearly arranged entries are a must), I like to put those in the registry (with conservative defaults being set in the app if no settings can be found in the registry).
I mainly do it like istmatt says: I store config files inside the %APPDATA% folder, usually in %APPDATA%\ApplicationName. I don't like the .NET default of %APPDATA%\CompanyName\ApplicationName\Version; that level of detail and complexity is counterproductive for most small to medium-sized applications.
I disagree with Marcelo MD's example of not storing recently used files in the registry. IMO this is exactly the volatile sort of user-specific information that can be stored there.
(His example of what not to do is very good, though!)
To me it seems easier to think of what you should NOT put there.
e.g.: dynamic data, such as an editor's "last file opened" and per-project options. It is really annoying when your app loses sync with the registry (file deletion, system crash, etc.) and retrieves information that is no longer valid, possibly deadlocking the user.
At an earlier job I saw a guy who stored a data transfer completeness percentage there, writing the new value at every 10k or so and having the GUI retrieve this value every second so it could be shown in the title bar.