Is creating hardlinks from linux on a ntfs partition viable? - windows

I have found a program that can make you save space by hardlinking files that are actually the same, thus leaving only one copy of the file on the file system with more than one hardlink pointing to it. The program is called hardlink.
This is very nice as I have at last found a way to save space on my backup disk for old backups I have made before I knew about rsync and incremental backups.
After such a lengthy introduction, any reader would expect a question, so here it is:
Would it be safe to use hardlinks to save space on a ntfs partition? The hardlinks would of course be created from Linux, using the hardlink program mentioned above. More precisely, will Windows (any version) be able to then use the files that would have been replaced by hardlinks?
Many thanks

There are hardlinks on Windows. They are created by the CreateHardLink system call in kernel32.dll. As to whether your hardlink program would work over remote shares, I wouldn't know, but a native one or one from cygwin would.
Now the real question is whether or not Windows programs handle them. Even Windows Explorer fails to calculate disk space used for hardlinks correctly.

I did a small test. Creating a hardlink (using 'ln TargetName LinkName') yields the same file at creation time, but after that the file and the hardlink content change independently. I would therefore discourage any use of unix based hard links on an NTFS partition. Use either an Ext4 partition (linux only) or software adapted for windows-like links on NTFS partitioning (windows software or perhaps some linux software if explicitly mentioned).

I guess that the program hardlink will either fail because hard links doesn’t exists on Windows or create Windows shortcuts.

Related

Creating Virtual Folders and hooking them into the file system

I have a big collection of folders for projects I'm working on. I've been trying to find a better way to sort them all for a long time and I want to write an app that creates groups based on whatever criteria I say, such as "folders from 2011" or "folders containing a x type of file" etc.
This is fairly straightforward, and wouldn't present much of a problem to code using its own UI in winForms or WPF or something. But I think it would be far better if I could make these folders appear to be part of the filesystem, so other apps (like existing file explorers) can see them.
Is this possible? Would it cause problems I haven't considered? How do I go about doing it if it is possible?
One way I thought of doing it would be to have the app monitor the filesystem and create folder shortcuts every time there's a change, but I'm curious about whether its possible to actually present a fake filesystem to explorer through a 'gateway' folder
EDIT: Ok it's obviously possible since http://www.virtualfolder.net/ can do it, and now that I think of it so can TrueCrypt, although it would be nice if it didn't have to appear as a separate drive. So the question becomes, how do I implement it?
You can create a Shell Namespace Extension that gathers the file information you want and displays it within Windows Explorer any way you wish. You can choose where your extension is located, whether as its own top-level node, a child of another system virtual folder/extension, or as a child of a file system folder.
Writing a SNE is not trivial, but it is a lot easier then writing a lower-level file system driver, and it does not require special driver-oriented compilers. Any compiler that supports developing COM objects will work.
This is accomplished using filesystem drivers or filesystem filter drivers. First let you create a virtual filesystem and mount it to a drive letter and also to a folder on NTFS drive (folder must exist but its contents are "replaced" with a virtual filesystem directory tree). Filesystem filter drivers let you introduce virtual files and folders in existing folders without replacing them.
VirtualFolder uses filesystem driver as it creates a drive letter.
Both types of drivers are written in C and work in kernel-mode. Writing them requires deep knowledge of Windows internals and experience with driver development (since filesystem drivers are one of the most complicated driver types).
We offer several products related to virtual storage. One of them, Callback File System, is a filesystem driver. It calls your user-mode code to perform actual filesystem functions. Another product, CallbackFilter, is an FS filter driver (and it also calls your user-mode code). However, current version of CallbackFilter doesn't let you introduce virtual files and folders (this would be implemented in the next release).
There's also Pismo File Mount product available, they use filter driver techniques. You can check with them if what you need can be accomplished.
From what I gather you are looking for a way to present the results of predefined file queries to appear as though they are located at a specific location in the file system. If that is correct you may want to look into Hard Links and Junctions. There are limits on what you can do with these file system services. However it is really straight forward to implement.

How to read disk file entries faster than FindFile API? [duplicate]

I am in the middle of writing a tool that finds lost files of an iTunes library, for both Mac and Windows. On the Mac, I can quickly find files by naming using the wonderful "CatalogSearch" function.
On Windows, however, there seems to be no OS API for searching by file name (or is there?).
After some googling, I learned that there are tools (like TFind, Everything) that read the NTFS directory directly and scan it to find files by name.
I would like to do the same, but without having to start from scratch (although I've written quite a few disk tools in the past, I've never had the energy to dig into NTFS).
I wonder if there are ready-made libs around, possibly as a .dll, that would give me this search feature: Pass in a file name, get back its path.
Alternatively, what about the Windows indexing service? At least when I tried this on a recently installed XP Home system, the Search operation under the Start menu would actually scan all directories, which suggests that it has no complete database. As I'm not a Windows user at all, I wonder why this isn't working.
In the end, the complete solution I need is: I have a list of file names to find, and I need code that searches the entire disk (or uses a DB for it) to get me all results in one go. E.g, the search should not start a new full scan for every file I'm looking up. That's why I think the MFT way would be optimal, as it could quickly iterate over all names, comparing each to my list.
The best way to solve your problem seems to be by using the Windows Change Journal.
Problem: If it is not enabled for a volume or the volume is a non-NTFS you need a fallback (or enable the Change Journal if it is NTFS). You need administrator rights as well to access the Change Journal.
You get the files by using the FSCTL_ENUM_USN_DATA and DeviceIOControll with LowUsn=0. This directly accesses the MFT and writes all filenames into the supplied buffer. Because it sequentially acesses the MFT it is faster than the FindFirstFile API.

Finding a set of file names quickly on NTFS volumes, ideally via its MFT

I am in the middle of writing a tool that finds lost files of an iTunes library, for both Mac and Windows. On the Mac, I can quickly find files by naming using the wonderful "CatalogSearch" function.
On Windows, however, there seems to be no OS API for searching by file name (or is there?).
After some googling, I learned that there are tools (like TFind, Everything) that read the NTFS directory directly and scan it to find files by name.
I would like to do the same, but without having to start from scratch (although I've written quite a few disk tools in the past, I've never had the energy to dig into NTFS).
I wonder if there are ready-made libs around, possibly as a .dll, that would give me this search feature: Pass in a file name, get back its path.
Alternatively, what about the Windows indexing service? At least when I tried this on a recently installed XP Home system, the Search operation under the Start menu would actually scan all directories, which suggests that it has no complete database. As I'm not a Windows user at all, I wonder why this isn't working.
In the end, the complete solution I need is: I have a list of file names to find, and I need code that searches the entire disk (or uses a DB for it) to get me all results in one go. E.g, the search should not start a new full scan for every file I'm looking up. That's why I think the MFT way would be optimal, as it could quickly iterate over all names, comparing each to my list.
The best way to solve your problem seems to be by using the Windows Change Journal.
Problem: If it is not enabled for a volume or the volume is a non-NTFS you need a fallback (or enable the Change Journal if it is NTFS). You need administrator rights as well to access the Change Journal.
You get the files by using the FSCTL_ENUM_USN_DATA and DeviceIOControll with LowUsn=0. This directly accesses the MFT and writes all filenames into the supplied buffer. Because it sequentially acesses the MFT it is faster than the FindFirstFile API.

Where is data on a non-persistant Live CD stored?

When I boot up Linux Mint from a Live CD, I am able to save files to the "File System". But where are these files being saved to? Can't be the disc, since it's a CDR. I don't think it's stored in the RAM, because it can only hold so much data and isn't really intended to be used as a "hard drive". The only other option is the hard drive... but it's certainly not saving to any partition on the hard drive I know about, since none of them are mounted. Then where are my files being saved to??
Believe it or not, it's a ramdisk :)
All live distros mount a temporary hard disk in RAM memory. The process is completely user-transparent and is all because of the magic of Linux kernel.
The OS, in fact, first allocates an area of your RAM memory into a virtual device, then mounts it as a regular hard drive in your file system.
Once you reboot, you lose all your data from that ramdrive.
Ramdrive is needed by almost all software running on Live CDs. In fact, almost all programs, in particular desktop managers, are designed in order to write files, even temporary, during their execution.
As an example, there are two ways to run KDE on a Live CD: either modify its code deeply in order to disallow you to change wallpaper etc. (the desktop settings are stored inside ~/.kde) or redeploy it onto a writable file system such as a ramdrive in order to avoid write fails on read-only file systems.
Obviously, you can mount your real HDD or any USB drive into your virtual file system and make all writes to them permanent, but by default no live distro mounts your drives into the root file system, instead they usually mount into specific subdirectories like /mnt, /media, /windows
Hope to have been of help.
It does indeed emulate a disk using RAM; from Wikipedia:
It is able to run without permanent
installation by placing the files that
typically would be stored on a hard
drive into RAM, typically in a RAM
disk, though this does cut down on the
RAM available to applications.
RAM. In Linux, and indeed most unix systems, any kind of device is seen as a file system.
For example, to get memory info on linux you use cat /proc/meminfo, where cat is used to read files. Then, there's all sorts of strange stuff like /dev/random (to read random crap) and /dev/null (to throw away crap). ;-)
To make it persistent - use a USB device - properly formatted and with a special name. See here:
https://help.ubuntu.com/community/LiveCD/Persistence

Windows equivalent to Linux namespaces (per-process filesystem mounts)?

Linux has a feature called namespaces, which let you give a different "view" of the filesystem to different processes. In Windows terms, this would be useful for example if you had a legacy program "floyd" that always loaded its configuration from C:\floyd\floyd.ini. If Windows had namespaces, you could write a wrapper script which would create a namespace in which to run floyd, making it so when Alice ran the script, floyd would start up in an environment where C:\floyd existed but actually pointed to C:\Users\Alice\Floyd.
Now you may be thinking, "OK, just use soft or hard links and make C:\floyd an alias for C:\Users\Alice." But with namespaces, Bob can also run the startup script, but his instance of floyd (on the same computer, running at the same time) will see C:\floyd with the contents of, say, C:\Users\Bob\Program Settings\Floyd Config (or any other path we like).
You can do this on Linux with namespaces. Is there something similar or analogous on Windows? It's fine if it requires writing a C program, and it's OK if it only works on recent versions of Windows.
NTFS hard links are really a simple case of reparse points. Reparse points are typed, and can include more advanced behavior. For instance, they're also used for "offline storage" (transparent migration of files to and from secondary storage). You can therefore also use reparse points to implement per-user symbolic links, by creating a new reparse type.
The reparse point type even has an explicit "Name surrogate" bit, which (if set) indicates that reparse points of those types are some kind of symbolic link.
You can even have multiple reparse points in a path. Therefore files inside your symbolic namepace can still be migrated to secondary storage - you'd just have two reparse points in the path.
I think Virtual Store does this automatically for legacy programs that try to write to nonstandard directories. So the legacy program writes to a user- and program-specific directory instead to C:\floyd.
This sounds like Windows Vista's file system virtualization. For example, it can silently redirect c:\Program Files\Floyd to c:\Users\<username>\AppData\Local\VirtualStore\Program Files\Floyd. However, file system virtualization isn't nearly as configurable as Linux namespaces. From what I can tell from reading, file system virtualization should apply any time a 32-bit interactive process opens for writing a file, folder, or registry key that's only writable by administrators. (So you typically end up with some read-only files under c:\Program Files and some per-user writable files under c:\Users\<username>\AppData\Local\VirtalStore.)
An application virtualization product can probably also do this, although those are often more complicated and more expensive.
You can use hardlinks for that, but only with NTFS. http://en.wikipedia.org/wiki/Hard_link
I think windows doesn't have virtual FS view per process.
The most relevant thing is probably special environment folders, such as %temp%, %appdata%, %localappdata%. Not that they're equivalent, but they fulfil the same purpose.
You can define your own environment variables then use '%myspecialplace%\myfile.txt' to access them.
As a horrid kludge (and here I book my passage to programming hell) could you use a NamedPipe for C:\Floyd that mapped the IO operations onto a file specific to the current process user?
I know it's not pretty and I don't know enough about NamedPipes (FIFOs in other dialects) on Windows to know how feasible this is.
Dan
There are several things that come to my mind.
First of all you can create a file system filter driver (or use a ready driver, such as our CallbackFilter product) that would redirect all file system calls, coming from the application, to other location. This is close to virtualization that you mentioned but this won't change the list of drive letters though. Such approach is both powerful and non-trivial, so see the other option first of all.
And the other option is:
there exist several products (Thinstall, Molebox if memory serves) that "wrap" the application redirecting it's file I/O to some other location. There was also some SDK to do the same, but I don't remember it's name at all.
e.g. http://www.msigeek.com/4819/file-re-direction-using-correctfilepaths-shim-to-fix-broken-applications
But I think it's not configurable per user, although the target can vary per-user based on environment variable substitution.
Most programs, though, store configuration in the registry, in which case RegOverridePredefKey would do the trick.
There's a shortage of good solutions for this. For simplicity, I can't improve on using NTFS soft links (junctions) for this - as you correctly point out, this creates issues if you want per-user configuration. As MSalters correctly says, all NTFS soft and hard links are just special cases of reparse points, so you could do something more general by impplementing a new reparse type, if you don't mind some work digging into NTFS..
(Junction is a pretty useful tool when experimenting with NTFS soft links: http://technet.microsoft.com/en-us/sysinternals/bb896768.aspx )
You could just take a direct approach - give each user (or your program initialization if you only care about one piece of software) a logon script that sets up the appropriate junction into their user directory (and make sure you clean it up afterwards). But it's clumsy.
In general the right Windows approach is to put things into the folders pointed at by the %localappdata% (from Vista on) and, more generally, %userprofile% system variables.
Win file system virtualization is intended to ensure this in the cases where it applies.

Resources