How to read disk file entries faster than FindFile API? [duplicate] - winapi

I am in the middle of writing a tool that finds lost files of an iTunes library, for both Mac and Windows. On the Mac, I can quickly find files by naming using the wonderful "CatalogSearch" function.
On Windows, however, there seems to be no OS API for searching by file name (or is there?).
After some googling, I learned that there are tools (like TFind, Everything) that read the NTFS directory directly and scan it to find files by name.
I would like to do the same, but without having to start from scratch (although I've written quite a few disk tools in the past, I've never had the energy to dig into NTFS).
I wonder if there are ready-made libs around, possibly as a .dll, that would give me this search feature: Pass in a file name, get back its path.
Alternatively, what about the Windows indexing service? At least when I tried this on a recently installed XP Home system, the Search operation under the Start menu would actually scan all directories, which suggests that it has no complete database. As I'm not a Windows user at all, I wonder why this isn't working.
In the end, the complete solution I need is: I have a list of file names to find, and I need code that searches the entire disk (or uses a DB for it) to get me all results in one go. E.g, the search should not start a new full scan for every file I'm looking up. That's why I think the MFT way would be optimal, as it could quickly iterate over all names, comparing each to my list.

The best way to solve your problem seems to be by using the Windows Change Journal.
Problem: If it is not enabled for a volume or the volume is a non-NTFS you need a fallback (or enable the Change Journal if it is NTFS). You need administrator rights as well to access the Change Journal.
You get the files by using the FSCTL_ENUM_USN_DATA and DeviceIOControll with LowUsn=0. This directly accesses the MFT and writes all filenames into the supplied buffer. Because it sequentially acesses the MFT it is faster than the FindFirstFile API.

Related

Creating Virtual Folders and hooking them into the file system

I have a big collection of folders for projects I'm working on. I've been trying to find a better way to sort them all for a long time and I want to write an app that creates groups based on whatever criteria I say, such as "folders from 2011" or "folders containing a x type of file" etc.
This is fairly straightforward, and wouldn't present much of a problem to code using its own UI in winForms or WPF or something. But I think it would be far better if I could make these folders appear to be part of the filesystem, so other apps (like existing file explorers) can see them.
Is this possible? Would it cause problems I haven't considered? How do I go about doing it if it is possible?
One way I thought of doing it would be to have the app monitor the filesystem and create folder shortcuts every time there's a change, but I'm curious about whether its possible to actually present a fake filesystem to explorer through a 'gateway' folder
EDIT: Ok it's obviously possible since http://www.virtualfolder.net/ can do it, and now that I think of it so can TrueCrypt, although it would be nice if it didn't have to appear as a separate drive. So the question becomes, how do I implement it?
You can create a Shell Namespace Extension that gathers the file information you want and displays it within Windows Explorer any way you wish. You can choose where your extension is located, whether as its own top-level node, a child of another system virtual folder/extension, or as a child of a file system folder.
Writing a SNE is not trivial, but it is a lot easier then writing a lower-level file system driver, and it does not require special driver-oriented compilers. Any compiler that supports developing COM objects will work.
This is accomplished using filesystem drivers or filesystem filter drivers. First let you create a virtual filesystem and mount it to a drive letter and also to a folder on NTFS drive (folder must exist but its contents are "replaced" with a virtual filesystem directory tree). Filesystem filter drivers let you introduce virtual files and folders in existing folders without replacing them.
VirtualFolder uses filesystem driver as it creates a drive letter.
Both types of drivers are written in C and work in kernel-mode. Writing them requires deep knowledge of Windows internals and experience with driver development (since filesystem drivers are one of the most complicated driver types).
We offer several products related to virtual storage. One of them, Callback File System, is a filesystem driver. It calls your user-mode code to perform actual filesystem functions. Another product, CallbackFilter, is an FS filter driver (and it also calls your user-mode code). However, current version of CallbackFilter doesn't let you introduce virtual files and folders (this would be implemented in the next release).
There's also Pismo File Mount product available, they use filter driver techniques. You can check with them if what you need can be accomplished.
From what I gather you are looking for a way to present the results of predefined file queries to appear as though they are located at a specific location in the file system. If that is correct you may want to look into Hard Links and Junctions. There are limits on what you can do with these file system services. However it is really straight forward to implement.

How can I know whether a particular file on a Windows machine supports Alternate Data Streams?

Using the raw Windows programming API from C/C++ and a file handle or a path to a file, folder, link, etc; how can I programmatically decide whether the file (etc) supports ADS (Alternate Data Streams)?
I assume one thing I have to know is whether the file is on an NTFS partition, but then again for all I know it might be possible to mount some kind of Mac or *nix filesystems which support data forks or alternate data streams of some kind, and all such cases might be covered by a single API call or data structure.
Secondly I'm not sure whether every kind of object that can exist on an NTFS partition can have ADSs - such as folders, symlinks, hardlinks, anything else?
What API etc can handle all cases to tell me whether a given file etc has the ability to have ADSs?
(For this question I'm not looking for whether files have ADSs, just whether its possible for files to have them. It could include a file I've just created for instance.)
ADS is a feature of NTFS. You can use GetVolumeInformation() to detect if a given path is on an NTFS file system, and even if that volume supports ADS at all. AFAIK, only a real file can have an ADS attached to it. You can use GetFileAttributes() to detect if a path is a file, directory, symbolic link, etc.
Like any other file, Directories can also host other ADS! Any file object on NTFS can store more than one DATA Stream. The 'visible' one is named, any additional data stream is 'invisible' as far as Explorer is concerned. Actually, at the prompt now one can display ADS using the /R switch when invoking dir.

Windows - download a file on-demand, when FileNotFound in file system?

I want to put some sort of "hook" into windows (only has to work on Windows Server 2008 R2 and above) which when I ask for a file on disk and it's not there it then requests it from a web server and caches it locally.
The files are immutable and have unique file names.
The application which is trying to open these files is written in C and just opens a file using the operating system in the normal way. Say it calls OpenFile asking for c:\scripts\1234.12.script, and that is there then it will just open it normally. If then it asks for c:\scripts\1234.13.script and it isn't then my hook in the operating system will then go and ask my web service for the file, download it and then return that file as it it were there all the time.
I'd prefer to write this as a usermode process (I've never written a windows driver), it should only fire when files are not found in a specific folder, and I'd prefer if possible to write it in a managed language (C# would be perfect). The files are small (< 50kB) and the web service is fast and the internet connection blinding so I'm not expecting it to take more than a second to download the file.
My question is - where do I start looking for information about this kind of thing? And if anyone has done anything similar - do you know what options I have (eg can it be done in C#?)?
You would need to create a kernel-mode filesystem filter driver which would intercept requests for opening such files and would "fake" those files. I should say that this is a very complicated task even for driver development. Our CallbackFilter product would be able to solve your problem however mechanism for "faking" files is not yet ready (we plan this feature for CallbackFilter 3). Until then I don't know any user-mode solutions (frankly speaking, no kernel-mode solutions as well) that would solve your problem.
If you can change the folder the application is accessing, then you can create a virtual file system and map it to the drive letter or a folder on NTFS drive. From the virtual file system you can direct most requests to/from real disk and if the file doesn't exist, you can download the file and cache it. Our other product, Callback File System, lets you do what I described in user-mode. If you have a one-time task you need to accomplish, and don't have a budget for it, please contact us anyway and maybe we can find some solution. There also exists an open-source solution with similar (but not so comprehensive) functionality named Dokan, yet I will refrain from commenting on its quality.
You can also try Dokan , it open source and you can check its discussion group for question and guides.

Finding a set of file names quickly on NTFS volumes, ideally via its MFT

I am in the middle of writing a tool that finds lost files of an iTunes library, for both Mac and Windows. On the Mac, I can quickly find files by naming using the wonderful "CatalogSearch" function.
On Windows, however, there seems to be no OS API for searching by file name (or is there?).
After some googling, I learned that there are tools (like TFind, Everything) that read the NTFS directory directly and scan it to find files by name.
I would like to do the same, but without having to start from scratch (although I've written quite a few disk tools in the past, I've never had the energy to dig into NTFS).
I wonder if there are ready-made libs around, possibly as a .dll, that would give me this search feature: Pass in a file name, get back its path.
Alternatively, what about the Windows indexing service? At least when I tried this on a recently installed XP Home system, the Search operation under the Start menu would actually scan all directories, which suggests that it has no complete database. As I'm not a Windows user at all, I wonder why this isn't working.
In the end, the complete solution I need is: I have a list of file names to find, and I need code that searches the entire disk (or uses a DB for it) to get me all results in one go. E.g, the search should not start a new full scan for every file I'm looking up. That's why I think the MFT way would be optimal, as it could quickly iterate over all names, comparing each to my list.
The best way to solve your problem seems to be by using the Windows Change Journal.
Problem: If it is not enabled for a volume or the volume is a non-NTFS you need a fallback (or enable the Change Journal if it is NTFS). You need administrator rights as well to access the Change Journal.
You get the files by using the FSCTL_ENUM_USN_DATA and DeviceIOControll with LowUsn=0. This directly accesses the MFT and writes all filenames into the supplied buffer. Because it sequentially acesses the MFT it is faster than the FindFirstFile API.

Change Journal for Blocks in Windows(NTFS)

I have written a backup tool that is able to backup files and images of volumes for Windows. To detect which files have changed I use the Windows Change Journal. I already use the shadow copy functionality to do a consistent copy of both the files and the volume images.
To detect which blocks have changed I use hashes at the moment. This means the whole volume has to be read once (because to see which block has changed hashes of all blocks have to be calculated).
The backup integrated into Windows 7 is able to create incremental volume images without checking all blocks. I wasn't able to find an API for a kind of block level change journal.
Does anybody know how to access this information?
(I'm willing to dive deep into NTFS internals - even reading and parsing special files)
I don't think block level change info is available anywhere. Most probably what the Windows 7 integrated backup does is it installs a File System Filter Driver like some backup products does and anti-virus software. A filter driver can intercept all file system calls and in this way know which blocks changed. If you do this you can basically build your own change journal that works block level but only for the files that you are interested in.
I would really like to know a better answer myself here.
When you say Windows Change Journal I take it you are referring to the NTFS USN? It looks very much like the Windows 7 backup uses a combination of VSC and NTFS USN to detect changes and create incremental images much like you are already doing.

Resources