Windows remembering lower case filename, how to force it to forget?

Here's my problem:
I've got source files I'm publishing (.dita files, publishing using Oxygen) and I need to change capitalization on a lot of them, along with folders and subfolders that they're in. Everything is in source control using SVN.
When I change only the capitalization, say an initial cap, and otherwise leave the filename the same, Windows "remembers" the lower-case name, and that's what gets published, even though the source name is now upper case.
I can even search for the filename, for example Foobar.dita, and the search results will show me "foobar.dita". When I go to that location directly in the file explorer, the file is named Foobar.dita. It's not a duplicate; it's the same file.
What I understand from reading up on this is that Windows isn't case-sensitive, but it "remembers" the filename as one case or the other. So my question is: if I can't force Windows to be case-sensitive, can I somehow force Windows to forget the filename? I've tried deleting the file from both Windows and SVN and recreating it, but it still gets read as lower case even though it is now initial-capped.
If I rename the file, even slightly, that solves the problem, but many of the filenames are exactly what they need to be, and it's a lot more work to rename them (to think of another good filename) than just to change the initial cap.
UPDATE:
Here's where I read about the "remembering" idea, in response two, the one with 7 recommendations.
To be explicit: I'm not updating from SVN and thus turning it back to lower case, it's upper case in SVN. It appears upper case in the Windows folder.
UPDATE II: This seems to be what I'm up against:
http://support.microsoft.com/kb/100625
In NTFS, you can create unique file names, stored in the same directory, that differ only in case. For example, the following filenames can coexist in one directory on an NTFS volume:
CASE.TXT
case.txt
case.TXT
However, if you attempt to open one of these files in a Win32 application, such as Notepad, you would only have access to one of the files, regardless of the case of the filename you type in the Open File dialog box.
So it sounds like the only answer is to rename the files, not just change their case.

Related

Find next file (but not FindNextFile)

If a user opens a file in a program (for example using GetOpenFileNameW, DragQueryFileW, a command line argument, or whatever else to get the path, followed by a CreateFileW call), is there a way to find the next file in the parent directory of the opened file?
The obvious solution is to cycle through the results from FindNextFileW or NtQueryDirectoryFileEx until the opened file is encountered, and just open the next file.
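In Python terms, that cycling approach might look like the following sketch (on Windows, os.scandir wraps FindFirstFileW/FindNextFileW; the helper name next_file is made up, and the enumeration order is whatever the file system returns):
import os

def next_file(path):
    """Return the file after `path` in its parent directory, by cycling
    through the directory listing until `path` is encountered (the
    approach described above). Returns None if it is the last file."""
    parent = os.path.dirname(os.path.abspath(path))
    target = os.path.normcase(os.path.basename(path))
    found = False
    with os.scandir(parent) as it:
        for entry in it:
            if found and entry.is_file():
                return entry.path
            if os.path.normcase(entry.name) == target:
                found = True
    return None  # `path` was last, or it vanished mid-enumeration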
However, this cycling approach seems undesirable.
First, because these functions use paths (instead of, for example, a handle), the original file is decoupled from the search algorithm, so the original file might not even be encountered in that search. This is not much of an issue (as failing in this case is the expected outcome), and it could probably be resolved by (temporarily) changing the sharing mode, using LockFile or similar (though I would like to avoid that).
Second, this cycling search would have to be done every time, because the contents of the directory might have changed (retaining hFindFile does not work, because only FindFirstFileW calls NtQueryDirectoryFileEx and enumerates the contents of the directory). That seems like unnecessary work and might even affect performance (for example, if the directory contains a lot of files).
In theory any file system has some way of enumerating the files in a directory. Meaning there is some ordered data structure of the files' metadata. And getting the next file should only involve going back from the existing file handle to that file's entry, and then getting the next entry from that data structure. So there does not seem to be a fundamental reason why this cannot be done more sanely.
I thought maybe there exists a better way to do this somewhere in the WinAPI...
Same question for finding the previous file.

What is the purpose of creating a symbolic link between files?

Recently I came across the os library in Python and found out about the existence of symbolic links. I would like to know what a symbolic link is, why it exists, and what its various uses are.
I will answer this from the perspective of a *nix user (specifically Linux). If you're interested in how this relates to Windows, I suggest you look for tutorials like this one. This will be a bit of a roundabout, but I find that symbolic links, or symlinks, are best explained together with hard links and the generic properties of a filesystem on Linux.
Links and files on Linux
As a rule of thumb, in Linux everything is treated as a file. Directories are files that contain mappings from names (paths) to inodes, which are just unique identifiers of the different objects residing on your system. Basically, if I give you a name like /home/gst/mydog.png, the accessing process will first look into the / directory (the root directory), where it will find information on where to find home; then, opening that file, it will look into it to see where gst is; and finally, in that file, it will try to find the location of mydog.png and, if successful, try to do whatever it set out to do with it. Going back to directory files: the mappings they contain are called links. Which brings us to hard and symbolic links.
Hard vs Symbolic links
A hard link is just a mapping like the one we discussed previously. It points directly to a certain object. A symlink on the other hand does not point directly to an object. Rather it just saves a path to an object. For example, say that I created a symbolic link to /home/gst/mydog.png at /home/gst/Desktop/mycat.png with os.symlink("/home/gst/mydog.png", "/home/gst/Desktop/mycat.png"). When I try to open it, the name /home/gst/Desktop/mycat.png is usually resolved to /home/gst/mydog.png. By following the symlink located at /home/gst/Desktop/mycat.png I actually (try to) access an object pointed to by /home/gst/mydog.png.
If I create a hard link (for example by calling os.link) I just add entries to the relevant directory files, such that the specific name can be followed to the linked object. When I create a symbolic link I create a file that contains a path to another file (which might be another symbolic link).
More specific to your question: if I pass /home/gst/Desktop/mycat.png to os.readlink, it will return /home/gst/mydog.png. This name resolution also happens when calling functions in os with the (optional) parameter follow_symlinks set to True; if it's set to False, the name does not get resolved (for instance, you'd set it to False when you want to manipulate the symlink itself, not the object it points to). From the module documentation:
not following symlinks: If follow_symlinks is False, and the last element of the path to operate on is a symbolic link, the function will operate on the symbolic link itself instead of the file the link points to. (For POSIX systems, Python will call the l... version of the function.)
You can check whether or not follow_symlinks is supported on your platform using os.supports_follow_symlinks. If it is unavailable, using it will raise a NotImplementedError.
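To make the follow/no-follow distinction concrete, here's a small sketch (the demo paths are made up; it assumes Linux, where os.symlink needs no special privileges):
import os

os.makedirs("demo", exist_ok=True)
with open("demo/mydog.png", "wb") as f:
    f.write(b"some image data")

os.symlink("mydog.png", "demo/mycat.png")      # stores a path, relative to the link
os.link("demo/mydog.png", "demo/samedog.png")  # second name for the same inode

print(os.readlink("demo/mycat.png"))             # -> mydog.png
dog = os.stat("demo/mydog.png").st_ino
print(os.stat("demo/mycat.png").st_ino == dog)   # True: stat follows the symlink
print(os.lstat("demo/mycat.png").st_ino == dog)  # False: lstat operates on the
                                                 # symlink itself
print(os.stat("demo/samedog.png").st_ino == dog) # True: hard link, same object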
Why use hard links?
This question has already been answered here, quoting from the accepted answer:
The main advantage of hard links is that, compared to soft links, there is no size or speed penalty. Soft links are an extra layer of indirection on top of normal file access; the kernel has to dereference the link when you open the file, and this takes a small amount of time. The link also takes a small amount of space on the disk, to hold the text of the link. These penalties do not exist with hard links because they are built into the very structure of the filesystem.
I'd like to add that hard links allow for an easy method of file backup. For every file, the system keeps a count of its hard links. Once this count reaches 0, the disk space on which the file is located is marked as free, meaning that the system will eventually overwrite it with other data (effectively deleting the previous file - which doesn't happen for at least as long as a running process has an open stream associated with the file, but that's another story). Why would that matter?
Let's say you have a huge directory full of files you'd like to manipulate somehow (rename some, delete others, etc.) and you write a script to do this for you. However, you're not completely sure that the script will work as intended, and you fear it might delete some wrong files. You also don't want to copy all the files, as this would take up too much space and time. One solution is to just create a hard link for each file at some other point in the filesystem. If you delete a file in the target directory, the associated object is still available because there's another hard link associated with it. Creating that many hard links will consume much less time and space than copying all the files, yet it gives you a reasonable backup strategy.
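A rough sketch of that backup idea (hardlink_backup and the paths are hypothetical; both directories must be on the same filesystem):
import os

def hardlink_backup(src_dir, backup_dir):
    """Add a second hard link for every regular file in src_dir, so the data
    survives even if a script later deletes the original names."""
    os.makedirs(backup_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src) and not os.path.islink(src):
            os.link(src, os.path.join(backup_dir, name))

hardlink_backup("/home/gst/huge_dir", "/home/gst/huge_dir_backup")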
This is not the case with symbolic links. Remember, symlinks point to other links (possibly other symlinks as well), not to actual files. Hence, I might create a symlink to a file, but it will only save the path. If the (eventual) hard link that the symlink is pointing to gets removed from the system, trying to resolve the symlink won't lead you to a file. Such symlinks are said to be "broken" or "dangling". Thus you cannot rely on symlinks to preserve access to a certain file. (Conversely, deleting a symlink does not affect the link count associated with the target file.) So what's their use?
Why use symbolic links?
You can operate on symlinks as if they were the actual files to which they point somewhere down the line (except for deleting them). This allows you to have multiple "access points" to a file without having excess copies (ones that remain up to date, too, since they always access the same file). If you want to replace the file that is being accessed, you only need to change it once, and all of the symlinks will point to it (as long as the path they save is not changed). However, if you have hard links to a certain file and you then replace that file with another one, you also need to replace the hard links, as otherwise they'll still be pointing to the old file.
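A short sketch of that contrast (made-up filenames): replacing the target with a new file at the same path changes what the symlink resolves to, while a pre-existing hard link keeps the old contents.
import os

with open("target.txt", "w") as f:
    f.write("version 1")
os.symlink("target.txt", "soft.txt")
os.link("target.txt", "hard.txt")

# "Replace" the target: unlink the old name, create a new file at the same path.
os.remove("target.txt")
with open("target.txt", "w") as f:
    f.write("version 2")

print(open("soft.txt").read())  # "version 2": the symlink re-resolves the path
print(open("hard.txt").read())  # "version 1": still linked to the old object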
Lastly, it is not uncommon to have different filesystems mounted on the same Linux machine. That is to say, the way data is organized and interpreted at some point in the file hierarchy (say /home/gst/fs1) can be different from how it is organized and interpreted at another point (say /home/gst/Desktop/fs2). A hard link can only reside on the same filesystem as the file it's pointing to, whereas a symlink can be created on one filesystem but effectively point to a file on another filesystem (see the answers to this question).
Symbolic links, also known as soft links, are special types of files that point to other files, much like shortcuts in Windows and aliases on the Macintosh. Unlike a hard link, a symbolic link does not contain the data of the target file; it merely points to another file system entry.
Read more here:
https://kb.iu.edu/d/abbe

Undo a botched command prompt copy which concatenated all of my files

In a Windows 8 Command Prompt, I had a backup drive plugged in and I navigated to my User directory. I executed the command:
copy Documents G:/Seagate_backup/Documents
What I assumed was that copy would create the Documents directory on my backup drive and then copy the contents of the C: Documents directory into it. That is not what happened!
I proceeded to wipe my hard drive and re-install the operating system, thinking I had backed up the important files, only to find out that copy had seemingly concatenated all the C: Documents files of different types (.doc, .pdf, .txt, etc.) into one file called "Documents". This file is of course unreadable, but opening it in Notepad reveals what happened: I can see some of my documents which were plain text throughout the massively long file.
How do I undo this!? It's terrible because I was actually helping a friend and was so sure of myself, but now this has happened. The only thing I can think of doing is searching for some common separator among the concatenated files and writing some sort of script to split the file back apart. But then I would have to guess the extensions of each of the pieces...
Merging files together in the fashion that copy uses discards important file system information such as file size and file name. While the file name may not be that important, the size is. Both parameters are used by the OS to tell files apart.
This problem might sound familiar if you have ever damaged your file allocation table and all your files disappeared. In both cases, you end up with a binary blob (be it an actual disk, or something like your file, which might resemble a disk image) that lacks any size and filename information.
Fortunately, this is where a lot of file system recovery tools can help. They are specialized in matching patterns. Specifically, they look for giveaway clues as to what type a file is, where it starts, and what its size is.
This is made possible by, for instance, many file types having a set of magic numbers that allow a program to check whether a file really is of the type that its extension claims.
In principle it is possible to undo this process more or less well.
You will need to use data recovery tools, or other analysis tools like binwalk, to extract files from the concatenated binary blob. Essentially, the same tools that are used to recover deleted files should be able to extract your documents again - without any filenames, of course. I recommend renaming the file to a disk image (.img) and either mounting it from within the operating system as a virtual hard disk (don't worry that it has no file system - it should show up as an unformatted drive) or feeding it directly to a data recovery or analysis tool that can read binary files (binwalk, for instance, can do that directly, but may not find all types of files, as it's mainly for unpacking firmware images, which may be assembled in the same or a similar way to how your files ended up).
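As a toy illustration of the magic-number idea (a real carving tool does far more, e.g. working out where each file ends; this sketch just cuts the blob at each signature match, and any bytes before the first match are dropped):
# Toy file carver: split a blob at a few well-known magic numbers.
SIGNATURES = {
    b"%PDF": ".pdf",
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"PK\x03\x04": ".zip",   # also .docx/.xlsx, which are ZIP containers
}

def carve(blob_path):
    data = open(blob_path, "rb").read()
    hits = []
    for magic, ext in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, ext))
            pos = data.find(magic, pos + 1)
    hits.sort()
    for i, (start, ext) in enumerate(hits):
        end = hits[i + 1][0] if i + 1 < len(hits) else len(data)
        with open(f"carved_{i:04d}{ext}", "wb") as out:
            out.write(data[start:end])

carve("Documents")  # the concatenated blob from the question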

How should I mark a folder as processed in a script?

A script shall process files in a folder on a Windows machine and mark the folder as done once it is finished, so that it is not picked up in the next round of processing.
My tendency is to let the script rename the folder to a different name, like adding "_done".
But on Windows, renaming a folder is not possible if some process has the folder or a file within it open. In this setup, there is a minor chance that some user may have the folder open.
Alternatively I could just write a stamp-file into that folder.
Are there better alternatives?
Is there a way to force the renaming anyway, in particular when it is on a shared drive or some NAS drive?
You have several options:
1. Put a token file of some sort in each processed folder and skip the folders that contain said file.
2. Keep track of the last folder processed and only process newer ones (either by timestamp or, since they're numbered sequentially, by sequence number).
3. Rename the folder.
Since you've already stated that other users may already have the folder/files open, we can rule out #3.
In this situation, I'm in favor of option #1 even though you'll end up with extra files. If someone needs to figure out which folders have already been processed, they have a quick, easy way of discerning that with the naked eye, rather than trying to find a counter somewhere in a different file. It's also a bit less code to write, so fewer pieces to break.
Option #2 is good in this situation as well (I've used both, depending on the circumstances), but I tend to favor it for things that a human wouldn't really need to care about or look for very often.
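For what it's worth, option #1 is only a few lines in, say, Python (the .processed marker name and the folder path are just examples):
import os

MARKER = ".processed"  # hypothetical marker filename

def process_pending(root):
    for entry in os.scandir(root):
        if not entry.is_dir():
            continue
        marker = os.path.join(entry.path, MARKER)
        if os.path.exists(marker):
            continue                 # already done in an earlier round
        handle_folder(entry.path)    # your actual processing
        open(marker, "w").close()    # mark as done only after success

def handle_folder(path):
    print("processing", path)        # placeholder for the real work

process_pending(r"C:\incoming")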

How to programmatically find the difference between two directories

First off: I am not necessarily looking for Delphi code; spit it out any way you want.
I've been searching around (especially here) and found a bit about people looking for ways to compare two directories (including subdirectories), though they were using byte-by-byte methods. Second off, I am not looking for a difftool; I am "just" looking for a way to find files which do not match and, just as important, files which are in one directory but not the other, and vice versa.
To be more specific: I have one directory (the backup folder) which I constantly update using FindFirstChangeNotification. The first time, though, I need to copy all files, and I also need to check the backup directory against the original when the application starts (in case something happened while the application wasn't running, or FindFirstChangeNotification didn't catch a file change). To solve this, I am thinking of creating a CRC list for the backed-up files, then running through the original directory computing the CRC for every file, and finally comparing the two CRC lists. Then somehow look for files which are in one directory and not the other (again, vice versa).
Here's the question: Is this the fastest way? If so, how would one (roughly) get the job done?
You don't necessarily need CRCs for each file; for most normal purposes you can just compare the "last modified" date for every file. It's WAY faster. If you need additional safety, you can also compare the lengths. You get both of these metrics for free with the find functions.
And in your change notification, you should probably add the files to a queue and use a timer object to copy the new queued files every ~30sec or something, so you don't bog down the system with frequent updates/checks.
For additional speed, use the Win32 functions wherever possible, and avoid any Delphi find/copy/getfileinfo functions. I'm not familiar with the Delphi framework, but, for example, the equivalent C# functions are WAY WAY WAY slower than the Win32 functions.
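To sketch that name/size/"last modified" comparison in Python rather than Delphi (the question allows any language; the directory paths are examples):
import os

def snapshot(root):
    """Map relative path -> (size, mtime) for every file under root."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            files[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return files

def compare(original, backup):
    a, b = snapshot(original), snapshot(backup)
    only_in_original = sorted(a.keys() - b.keys())
    only_in_backup = sorted(b.keys() - a.keys())
    changed = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return only_in_original, only_in_backup, changed

print(compare(r"C:\data", r"D:\backup\data"))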
Regardless of you "not looking for a difftool", are you opposed to using Cygwin with its diff command from the shell? If you are open to this, it's quite easy, particularly using diff with the -r ("recursive") option.
The following generates the differences between two Rails installs on my machine and greps out not only information about differences between files but also, specifically by grepping for 'Only', files that are in one directory but not the other:
$ diff -r pgnindex pgnonrails | egrep '^Only|diff'
Only in pgnindex/app/controllers: openings_controller.rb
Only in pgnindex/app/helpers: openings_helper.rb
Only in pgnindex/app/views: openings
diff -r pgnindex/config/environment.rb pgnonrails/config/environment.rb
diff -r pgnindex/config/initializers/session_store.rb pgnonrails/config/initializers/session_store.rb
diff -r pgnindex/log/development.log pgnonrails/log/development.log
Only in pgnindex/test/functional: openings_controller_test.rb
Only in pgnindex/test/unit: helpers
The fastest way to compare one directory on the local machine to a directory on another machine thousands of miles away is exactly as you propose:
generate a CRC/checksum for every file
send the name, path, and CRC/checksum for each file over the internet to the other machine
compare
Perhaps the easiest way to do that is to use rsync with the "--dry-run" or "--list-only" option.
(Or use one of the many applications that use the rsync algorithm, or compile the rsync algorithm into your application.)
cd some_backup_directory
rsync --dry-run myname@remote_host:latest_version_directory .
For speed, the default rsync assumes, as Blindy suggested, that two files with the same name and the same path and the same length and the same modification time are the same.
For extra safety, you can give rsync the "--checksum" option to ignore the length and modification time and force it to compare (the checksum of) the actual contents of the file.
