NTFS Junctions, trouble understanding the API - winapi

Update: This question has evolved into one about how to use the Win32 API in backup applications and other programs that need to know what a file really is on disk. Junctions and reparse points are the key concepts I needed to consider, and they are the most confusing things in the NTFS filesystem.
The original question follows:
What is the Win32 API used to detect if a directory is a junction?
'Where' (for lack of better understanding) in the NTFS hierarchy are junctions stored?
If I create a junction (say, c:\thejunction pointing at c:\mydir), do both directories become junctions of one another, i.e. the created and the referenced?

How do I detect a reparse point?
Determining Whether a Directory Is a Mounted Folder is the answer. It shows how to determine whether a folder is a reparse point, and thus whether it is potentially a junction, a symlink, or a mount point.
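For reference, here is a minimal Win32 sketch of that check (the path comes from the command line): GetFileAttributes exposes the reparse attribute, and the reparse tag returned in WIN32_FIND_DATA.dwReserved0 distinguishes junctions and mounted folders from symlinks.

#include <windows.h>
#include <stdio.h>

int wmain(int argc, wchar_t** argv)
{
    if (argc < 2) { fwprintf(stderr, L"usage: %s <path>\n", argv[0]); return 1; }

    DWORD attrs = GetFileAttributesW(argv[1]);
    if (attrs == INVALID_FILE_ATTRIBUTES) return 1;

    if (!(attrs & FILE_ATTRIBUTE_REPARSE_POINT)) {
        wprintf(L"Not a reparse point.\n");
        return 0;
    }

    // FindFirstFile exposes the reparse tag in dwReserved0 when
    // FILE_ATTRIBUTE_REPARSE_POINT is set.
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW(argv[1], &fd);
    if (h == INVALID_HANDLE_VALUE) return 1;
    FindClose(h);

    switch (fd.dwReserved0) {
    case IO_REPARSE_TAG_MOUNT_POINT:  // junctions and mounted folders
        wprintf(L"Junction or mounted folder.\n"); break;
    case IO_REPARSE_TAG_SYMLINK:
        wprintf(L"Symbolic link.\n"); break;
    default:
        wprintf(L"Other reparse tag: 0x%08lx\n", fd.dwReserved0); break;
    }
    return 0;
}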

Related

How are Windows symbolic links treated by the apps?

I thought that symbolic links in Windows 10 behave similarly to Linux symlinks, i.e., they are transparent to the apps. However, I'm confused by the actual behavior.
As an example, I've both softlinked and hardlinked the same CSS file:
$ mklink softlinked.css Default.css
symbolic link created for softlinked.css <<===>> Default.css
$ mklink /H hardlinked.css Default.css
Hardlink created for hardlinked.css <<===>> Default.css
The hardlink behaves predictably (it is indistinguishable from the original file), but I don't understand the soft-linked one. See, for example, how Explorer and Notepad++ display the two files (screenshots omitted).
Also, when the CSS is consumed by the Caret editor, the hardlinked stylesheet works fine, while the softlinked one is broken (screenshots omitted).
The questions are:
How do the symbolic links actually behave on Windows?
Can soft links be made transparent to the apps? By transparent, I mean the app would always see the file as being on the symlinked path (...\symlinked.css) and never resolve to the original path (...\Default.css). Is there some Windows registry setting or something?
Symlinks are transparent to applications that are using the underlying file system, e.g., CreateFile() and friends, unless the application makes a specific effort to be aware of them.
However, they are not transparent to applications that are using the shell namespace (for example the standard Open File dialog) because the shell treats symlinks as if they were shortcuts, even to the point of modifying the displayed icon. Whether this was a sensible decision on Microsoft's part is a moot point at this stage, since it isn't about to change. So far as I'm aware, it is not configurable.
In practice this usually means that symlinks will behave transparently for non-GUI applications and for internal files (DLLs, built-in templates, configuration files, etc.) in GUI applications, but not for the user's documents.
So your first two examples (the way Explorer displays the files and the behaviour of Notepad++) are features rather than bugs; like it or not, this is the way Windows is designed to work.
Your last example does appear to be a bug (or at best an undesirable design limitation) in the application in question. It might be worth contacting the vendor.
You should also be aware that creating a symlink requires administrative privilege, and by default they don't work at all over network shares. Personally, given all these limitations, I've never found them very useful. For most user tasks I would use shortcuts instead, and for most system administration tasks junction points are more reliable.
They should be transparent to most apps, but some apps are too clever for their own good.
They might pass FILE_FLAG_OPEN_REPARSE_POINT to CreateFile, or be too aggressive when "verifying" file attributes and choke on FILE_ATTRIBUTE_REPARSE_POINT.
In your specific case, I'm guessing the advanced editor should use FOS_NODEREFERENCELINKS in its open dialog. The CSS switcher might be using FILE_FLAG_OPEN_REPARSE_POINT; you should be able to verify that with Process Monitor.
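As an illustration (using the file names from the question; the scenario itself is a guess), here is the difference such an application sees. Without the flag, CreateFile follows the symlink to Default.css; with FILE_FLAG_OPEN_REPARSE_POINT it opens the link itself.

#include <windows.h>
#include <stdio.h>

int wmain(void)
{
    // Follows the link: this handle reads the contents of Default.css.
    HANDLE hTarget = CreateFileW(L"softlinked.css", GENERIC_READ,
        FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    // Opens the link itself rather than its target.
    HANDLE hLink = CreateFileW(L"softlinked.css", GENERIC_READ,
        FILE_SHARE_READ, NULL, OPEN_EXISTING,
        FILE_FLAG_OPEN_REPARSE_POINT, NULL);

    if (hTarget != INVALID_HANDLE_VALUE && hLink != INVALID_HANDLE_VALUE) {
        // The sizes differ: the first is Default.css, the second is the
        // reparse point's own data stream (typically zero bytes).
        wprintf(L"followed: %lu bytes, link itself: %lu bytes\n",
                GetFileSize(hTarget, NULL), GetFileSize(hLink, NULL));
    }
    if (hTarget != INVALID_HANDLE_VALUE) CloseHandle(hTarget);
    if (hLink != INVALID_HANDLE_VALUE) CloseHandle(hLink);
    return 0;
}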
There is no magical registry entry you can use, you have to contact the application authors.
A file is a pointer to a certain node.
When you create a hard link you are just making a new file that points to the same node as the original file.
When you create a soft link, you are not making a pointer to a node but to a file. Because of that, a soft link resolves its path via the file it points to.
Since a symlink contains both its own path and the path it points to, it really depends on the application developers which path they choose to put in their UI.
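The same distinction shows up in the Win32 calls behind mklink; a rough sketch using the file names from the question:

#include <windows.h>
#include <stdio.h>

int wmain(void)
{
    // Hard link: a second directory entry for the same underlying file data
    // (what mklink /H does).
    if (!CreateHardLinkW(L"hardlinked.css", L"Default.css", NULL))
        wprintf(L"CreateHardLink failed: %lu\n", GetLastError());

    // Symbolic link: a reparse point that stores the target *path* (what
    // plain mklink does). Requires SeCreateSymbolicLinkPrivilege, i.e.
    // admin rights, or Developer Mode plus
    // SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE on Windows 10.
    if (!CreateSymbolicLinkW(L"softlinked.css", L"Default.css", 0))
        wprintf(L"CreateSymbolicLink failed: %lu\n", GetLastError());
    return 0;
}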

Graceful File Reading without Locking

Whiteboard Overview
(Whiteboard diagrams, "Internal" and "Global", 1000 x 750 px JPEGs originally hosted on ImageShack; images omitted.)
Additional Information
I should mention that each user (of the client boxes) will be working straight off the /Foo share. Due to the nature of the business, users will never need to see or work on each other's documents concurrently, so conflicts of this nature will never be a problem. Access needs to be as simple as possible for them, which probably means mapping a drive to their respective /Foo/username sub-directory.
Additionally, no one but my applications (in-house and the ones on the server) will be using the FTP directory directly.
Possible Implementations
Unfortunately, it doesn't look like I can use off the shelf tools such as WinSCP because some other logic needs to be intimately tied into the process.
I figure there are two simple ways for me to accomplish the above on the in-house side.
Method one (slow):
Walk the /Foo directory tree every N minutes.
Diff with the previous tree using a combination of timestamps (which can be faked by file-copying tools, but that's not relevant in this case) and checksumming; a sketch of this walk follows the list.
Merge changes with off-site FTP server.
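Here is a rough sketch of the walk-and-snapshot step, assuming a hypothetical //server/Foo path and leaving checksumming out for brevity:

#include <filesystem>
#include <unordered_map>
#include <string>
#include <cstdio>

namespace fs = std::filesystem;

// Collect path -> (size, mtime) so two successive snapshots can be diffed.
// A real implementation would also hash files whose size/mtime changed.
std::unordered_map<std::string, std::pair<uintmax_t, fs::file_time_type>>
snapshot(const fs::path& root)
{
    std::unordered_map<std::string, std::pair<uintmax_t, fs::file_time_type>> out;
    for (const auto& e : fs::recursive_directory_iterator(root)) {
        if (e.is_regular_file())
            out[e.path().string()] = { e.file_size(), e.last_write_time() };
    }
    return out;
}

int main()
{
    auto before = snapshot("//server/Foo");   // hypothetical share path
    // ... N minutes later ...
    auto after = snapshot("//server/Foo");
    for (const auto& [path, meta] : after) {
        auto it = before.find(path);
        if (it == before.end() || it->second != meta)
            std::printf("changed: %s\n", path.c_str());
    }
}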
Method two:
Register for directory change notifications (e.g., using ReadDirectoryChangesW from the WinAPI, or FileSystemWatcher if using .NET).
Log changes.
Merge changes with off-site FTP server every N minutes.
I'll probably end up using something like the second method due to performance considerations; a minimal sketch of the watcher follows.
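This is a minimal synchronous sketch only (the share path is hypothetical); a real service would use overlapped I/O or .NET's FileSystemWatcher so the loop never blocks the rest of the program.

#include <windows.h>
#include <stdio.h>

int wmain(void)
{
    HANDLE hDir = CreateFileW(L"\\\\server\\Foo", FILE_LIST_DIRECTORY,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL,
        OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (hDir == INVALID_HANDLE_VALUE) return 1;

    alignas(DWORD) BYTE buf[64 * 1024];
    DWORD bytes;
    while (ReadDirectoryChangesW(hDir, buf, sizeof(buf), TRUE /* recursive */,
           FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_LAST_WRITE |
           FILE_NOTIFY_CHANGE_SIZE, &bytes, NULL, NULL))
    {
        const FILE_NOTIFY_INFORMATION* fni = (const FILE_NOTIFY_INFORMATION*)buf;
        for (;;) {
            // FileName is NOT null-terminated; FileNameLength is in bytes.
            wprintf(L"action %lu: %.*s\n", fni->Action,
                    (int)(fni->FileNameLength / sizeof(WCHAR)), fni->FileName);
            if (fni->NextEntryOffset == 0) break;
            fni = (const FILE_NOTIFY_INFORMATION*)
                  ((const BYTE*)fni + fni->NextEntryOffset);
        }
    }
    CloseHandle(hDir);
    return 0;
}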
Problem
Since this synchronization must take place during business hours, the first problem that arises is during the off-site upload stage.
While I'm transferring a file off-site, I effectively need to prevent the users from writing to the file (e.g., use CreateFile with FILE_SHARE_READ or something) while I'm reading from it. The internet upstream speeds at their office are nowhere near symmetrical to the file sizes they'll be working with, so it's quite possible that they'll come back to the file and attempt to modify it while I'm still reading from it.
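In Win32 terms that looks roughly like this (the file name is hypothetical): the share mode admits concurrent readers but makes any writer's CreateFile fail with ERROR_SHARING_VIOLATION until the handle is closed.

#include <windows.h>

// Open for reading while denying writers for the lifetime of the handle.
// Other processes can still read; attempts to open for writing fail with
// ERROR_SHARING_VIOLATION until CloseHandle is called.
HANDLE OpenForUpload(const wchar_t* path)   // path is hypothetical
{
    return CreateFileW(path, GENERIC_READ,
                       FILE_SHARE_READ,     // readers OK, no writers
                       NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
}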
Possible Solution
The easiest solution to the above problem would be to create a copy of the file(s) in question elsewhere on the file-system and transfer those "snapshots" without disturbance.
The files (some will be binary) that these guys will be working with are relatively small, probably ≤20 MB, so copying (and therefore temporarily locking) them will be almost instant. The chances of them attempting to write to the file in the same instant that I'm copying it should be close to nil.
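The snapshot itself is a single call (both paths are hypothetical); CopyFile holds the source open only for the duration of the copy.

#include <windows.h>
#include <stdio.h>

int wmain(void)
{
    // Copy the user's file to a staging area, then upload the copy at
    // leisure. FALSE = overwrite any stale snapshot from a previous run.
    if (!CopyFileW(L"\\\\server\\Foo\\alice\\report.doc",
                   L"D:\\staging\\alice\\report.doc", FALSE))
        wprintf(L"snapshot failed: %lu\n", GetLastError());
    return 0;
}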
This solution seems kind of ugly, though, and I'm pretty sure there's a better way to handle this type of problem.
One thing that comes to mind is something like a file system filter that takes care of the replication and synchronization at the IRP level, kind of like what some A/Vs do. This is overkill for my project, however.
Questions
This is the first time that I've had to deal with this type of problem, so perhaps I'm thinking too much into it.
I'm interested in clean solutions that don't require going overboard with the complexity of their implementations. Perhaps I've missed something in the WinAPI that handles this problem gracefully?
I haven't decided what I'll be writing this in, but I'm comfortable with: C, C++, C#, D, and Perl.
After the discussions in the comments, my proposal would be as follows:
Create a partition on your data server, about 5GB for safety.
Create a Windows Service project in C# that monitors your data drive / location.
When a file has been modified, create a local copy of it, preserving the directory structure, on the new partition.
Create another service that would do the following:
Monitor bandwidth usage.
Monitor file creations on the temporary partition.
Transfer several files at a time (using threads) to your FTP server, abiding by the current bandwidth usage and decreasing/increasing the number of worker threads depending on network traffic.
Remove files from the partition once they have transferred successfully.
So basically you have your drives:
C: Windows Installation
D: Share Storage
X: Temporary Partition
Then you would have following services:
LocalMirrorService - Watches D: and copies to X: with the dir structure
TransferClientService - Moves files from X: to ftp server, removes from X:
It also uses multiple threads to move several files at once and monitors bandwidth; a sketch of a simple throttle follows.
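A hypothetical sketch of the throttling part: pace each chunk so the average rate stays at or below a cap, which the service would adjust from observed traffic. (Everything here, including the chunk sizes, is illustrative.)

#include <chrono>
#include <thread>
#include <cstddef>
#include <cstdio>

// Pace a transfer so its average rate stays at or below maxBytesPerSec.
class Throttle {
    double maxBytesPerSec;
    std::chrono::steady_clock::time_point start;
    std::size_t sent = 0;
public:
    explicit Throttle(double bps)
        : maxBytesPerSec(bps), start(std::chrono::steady_clock::now()) {}

    void onChunkSent(std::size_t bytes) {
        sent += bytes;
        double shouldHaveTaken = sent / maxBytesPerSec;   // seconds
        double elapsed = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        if (shouldHaveTaken > elapsed)
            std::this_thread::sleep_for(
                std::chrono::duration<double>(shouldHaveTaken - elapsed));
    }
};

int main() {
    Throttle t(512 * 1024);            // cap at 512 KiB/s
    for (int i = 0; i < 100; ++i) {
        // sendChunkToFtp(...) would go here (hypothetical)
        t.onChunkSent(64 * 1024);      // pretend we sent a 64 KiB chunk
    }
    std::puts("done");
}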
I would bet that this is the idea that you had in mind, but it seems like a reasonable approach as long as you're really good with your application development and able to create a solid system that would handle most issues.
When a user edits a document in Microsoft Word, for instance, the file will change on the share, and it may be copied to X: even though the user is still working on it. Within Windows you can check whether the file handle is still open; if it is, you can watch for when the user actually closes the document, so that all their edits are complete, and only then migrate it to drive X:.
That being said, if the user is working on the document and their PC crashes for some reason, the document's file handle may not be released until the document is opened at a later date, which can cause issues.
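One common heuristic for that check (hedged: this is not a true open-handle query, it just detects sharing conflicts, which is usually enough here, since Word opens documents with restrictive sharing):

#include <windows.h>

// Try to open the file with no sharing. If another process already has it
// open, CreateFile fails with ERROR_SHARING_VIOLATION; use that to defer
// the copy until the user has closed the document.
bool IsProbablyStillOpen(const wchar_t* path)
{
    HANDLE h = CreateFileW(path, GENERIC_READ, 0 /* exclusive */,
                           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return GetLastError() == ERROR_SHARING_VIOLATION;
    CloseHandle(h);
    return false;
}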
For anyone in a similar situation (I'm assuming the person who asked the question implemented a solution long ago), I would suggest an implementation of rsync.
rsync.net's Windows Backup Agent does what is described in method 1, and can be run as a service as well (see "Advanced Usage"). Though I'm not entirely sure if it has built-in bandwidth limiting...
Another (probably better) solution that does have bandwidth limiting is Duplicati. It also properly backs up currently-open or locked files. Uses SharpRSync, a managed rsync implementation, for its backend. Open source too, which is always a plus!

Does Win32 support memory-mapped files (CreateFileMapping) on FAT file systems?

I'm concerned about the dangers of using memory-mapped IO, via CreateFileMapping, on FAT filesystems. The specific scenario is users opening documents directly from USB sticks (yeah, you try and ban them doing this!).
The MSDN Managing Memory-Mapped Files article doesn't say anything about file system constraints.
Update
I didn't have any real reason to be concerned, just a vague feeling that I'd read about problems with them at some point (my career spans over 25 years, so I have a lot of vague depths in my memory, all the way back to 8-bit micros!). Whether or not they should be supported is pretty important to my recommendation, so I wanted to ask whether anyone could corroborate my concerns. Thanks for putting my mind at rest.
Memory-mapped files are one of my favorite features. There is absolutely no danger; they are among the most fundamental and heavily optimized Windows I/O features. Starting an EXE or loading a DLL is implemented internally as a memory-mapped file mapping.
It is supported on all types of file systems including FAT.
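For what it's worth, the calls are identical regardless of the underlying file system; here is a minimal read-only mapping (the drive letter and file name are placeholders for a document on a USB stick):

#include <windows.h>
#include <stdio.h>

int wmain(void)
{
    HANDLE hFile = CreateFileW(L"E:\\document.dat", GENERIC_READ,
        FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    // Map the whole file read-only (fails for a zero-length file).
    HANDLE hMap = CreateFileMappingW(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (!hMap) { CloseHandle(hFile); return 1; }

    const BYTE* view = (const BYTE*)MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
    if (view) {
        wprintf(L"first byte: 0x%02x\n", view[0]);
        UnmapViewOfFile(view);
    }
    CloseHandle(hMap);
    CloseHandle(hFile);
    return 0;
}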
By the way, atzz says that memory-mapped files are allowed on network drives. I can add that it is not only allowed, it is strongly recommended to use memory-mapped files with files from the network as well. In that case the I/O operations are cached very effectively, which is not done for other (C/C++) I/O.
If you want an EXE not to crash when it is opened from a CD or the network, you can mark the executable with one bit in its header (linker switch /SWAPRUN, see http://msdn.microsoft.com/en-us/library/chzz5ts6.aspx). There is no such option for documents opened from a USB stick.
But what exact problem do the users have? Do they not use the "Safely Remove Hardware" icon? Then they have to learn to do so, exactly as they have to learn not to cut the computer's power but to shut it down properly.
Could you explain why you see dangers in using memory-mapped files, in which situations you have had problems, and why other I/O operations would not have the same problems?
Yes it does. It even supports mapping of files on CDFS or on network drives. What is the source of your doubts?

how to write back to an existing file, ensuring the bits on the disk get overwritten in OS X

What APIs, Cocoa or Core Foundation, can I use to make sure that when I write back to a file that already exists on the storage device, the bits on disk get overwritten? The idea is to clear out the bits for security reasons.
In modern filesystems, there's no way to guarantee that you are overwriting the same physical location on disk -- the lack of that guarantee allows the filesystem to offer you better performance. You might be interested in reading this paper (PDF), appropriately titled Why secure delete doesn't work and no sensitive data should ever be stored unencrypted on disk (emphasis mine).
Consider also this warning from the shred manpage:
CAUTION: Note that shred relies on a very important assumption: that the file system overwrites data in place. This is the traditional way to do things, but many modern file system designs do not satisfy this assumption. The following are examples of file systems on which shred is not effective, or is not guaranteed to be effective in all file system modes:
log-structured or journaled file systems, such as those supplied with AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)
file systems that write redundant data and carry on even if some writes fail, such as RAID-based file systems
file systems that make snapshots, such as Network Appliance's NFS server
file systems that cache in temporary locations, such as NFS version 3 clients
compressed file systems
That being said, there might be some way within Cocoa or Core Foundation to allow you to write to a specific head/cylinder/sector/etc. on a disk -- I'm not familiar enough with those APIs -- but I highly doubt it.
No such API exists. It would be much better to only store properly encrypted data on disk if you are worried about security.

English UI Terminology: Directory, or Folder?

When you are designing an application (assumed in English), and you ask the user to provide a path to a directory/folder, do you use the term Directory or Folder?
Is one more understood than the other? Is one more "correct" than the other?
Please note that they are not synonyms. Directories and Folders behave differently. For example, if you want to remove a File from a Folder, you need access to the Folder and the File, because the File is stored inside the Folder.
If, however, you want to remove a File from a Directory, you need access only to the Directory, because a Directory itself is just a regular File that lists the locations of (but does not contain) other Files. So, you just need to strike out that entry from the Directory, no access to the File is required.
This distinction is pretty important, because false and thus misleading metaphors can be at least confusing and in the worst case pretty dangerous when talking about filesystems. (Confusion about filesystem behaviour often translates into accidental information disclosure, data loss or security holes.)
A great percentage of questions on Unix mailing lists, and also here, on ServerFault, and on SuperUser, about filesystem behaviour that seems confusing to the asker can be traced directly back to thinking in terms of Folders when Unix in fact has Directories.
So, in other words: use "Folder" when dealing with folders and "Directory" when dealing with directories.
In my experience, these tend to be the norm:
on Windows or any Mac OS: "Folder"
on *nix: "Directory"
The correctness of the term is wrapped up in how much your application behaves and talks like other applications on the platform, so it is best to stick with convention as to not confuse your users.
On the other hand, if the term needs to be cross-platform, we can always use the term Folder: even if we use it on Linux-based platforms, anyone who knows the technical difference between a Folder and a Directory (mentioned above) will understand that a Directory was meant. For example, on macOS we use the term Folder, even though, technically speaking, they are directories.
'Directory' is older and usually used on Unix-ish systems. 'Folder' is usually used on Windows. Personally, I use 'folder' even for GUI apps on both Linux and Windows, it just sounds more "user friendly". (And I doubt anyone will really care that I didn't use the "correct" term.)
If you think your users (e.g. technical users) will be happier with 'directory', use that, but otherwise, I would go with 'folder'.
Use whatever the target OS/DE uses. This definitely means "folder" on Win32, not sure about other platforms (though I think it is also definitely "folder" on OS X, and uncertain on Unix-likes). What you want is for your application to use the same terminology as all other apps, and system dialogs.
It also depends on the type of the application. For command-line applications, "directory" rather than "folder" seems to be the norm everywhere (including Win32).
The term Folder has been primarily used by Windows systems to make a better association with document organization and is, as others said, just another term. If you don't want to serve different terms for different systems, use the term Directory.