How do I resolve a canonical filename in Windows? - windows

If I have a string that resolves to a file path in Windows, is there an accepted way to get a canonical form of the file name?
For example, I'd like to know whether
C:\stuff\things\etc\misc\whatever.txt
and
C:\stuff\things\etc\misc\other\..\whatever.txt
actually point to the same file or not, and store the canonical form of the path in my application.
Note that simple string comparisons won't work, nor will any RegEx magic. Remember that we have things like NTFS reparse points to deal with since Windows 2000 and the new Libraries structure in Windows 7.

Short answer: not really.
There is no simple way to get the canonical name of a file on Windows. Local files can be available via reparse points or via SUBST. Do you want to deal with NTFS junctions? Windows shortcuts? What about \\?\-escaped filenames?
Remote files can be available via mapped drive letter or via UNC. Is that the UNC to the origin server? Are you using DFS? Is the server using reparse points, etc.? Is the server available by more than one name? What about the IP address? Does it have more than one IP address?
So, if you're looking for something like the inode number on Windows, it ain't there. See, for example, this page.

Roger is correct: there is no simple way. If the volume supports a unique file index, you can open the file and call GetFileInformationByHandle, but this will not work on all volumes.
The Windows API call GetFullPathName may be the best simple approach.
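For what it's worth, the file-index idea can be tried from Python without calling the API by hand. This is only a sketch: it assumes Python 3, where (as far as I know) os.stat on Windows fills st_dev and st_ino from the volume serial number and the file index. The helper name same_file is made up for illustration, and the caveat above still applies: not every volume reports a usable index.

```python
import os


def same_file(path_a: str, path_b: str) -> bool:
    """Return True if both paths refer to the same file on disk."""
    stat_a = os.stat(path_a)
    stat_b = os.stat(path_b)
    # The (device, index) pair identifies a file only on volumes
    # that report a stable file index.
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)
```

The standard library's os.path.samefile performs essentially the same comparison, so in practice you would probably call that instead. Both files must exist, since the check opens them.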

GetFinalPathNameByHandle, which is available starting with Windows Vista, appears to do what you're asking for.
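If Python is an option, os.path.realpath gives a similar result: on Windows with Python 3.8 or later it resolves symbolic links and junctions, and I believe it does so through GetFinalPathNameByHandle internally. A minimal sketch follows; canonical_path is a hypothetical helper name, and the file has to exist for links to be resolved.

```python
import os


def canonical_path(path: str) -> str:
    """Resolve '..', symbolic links and junctions, then fold case where the OS does."""
    # realpath walks the file system, so links are followed.
    resolved = os.path.realpath(path)
    # normcase lowercases and unifies slashes on Windows; it is a no-op on POSIX.
    return os.path.normcase(resolved)
```

Two paths whose canonical_path values are equal reached the same final path. This does not settle the remote-file questions from the first answer (server aliases, multiple IP addresses and so on).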

Using FileInfo (example in C#):
FileInfo info1 = new FileInfo(@"C:\stuff\things\etc\misc\whatever.txt");
FileInfo info2 = new FileInfo(@"C:\stuff\things\etc\misc\other\..\whatever.txt");
if (info1.FullName.Equals(info2.FullName)) {
Console.WriteLine("yep, they're equal");
}
Console.WriteLine(info1.FullName);
Console.WriteLine(info2.FullName);
Output is:
yep, they're equal
C:\stuff\things\etc\misc\whatever.txt
C:\stuff\things\etc\misc\whatever.txt

jheddings has a nice answer, but since you didn't indicate which language you are using, I thought I'd give a Python way to do it that also works from the command line, using os.path.abspath:
> python -c "import os.path; print(os.path.abspath(r'C:\stuff\things\etc\misc\other\..\whatever.txt'))"
C:\stuff\things\etc\misc\whatever.txt

I would use System.IO.Path.GetFullPath. It takes a string as input (C:\stuff\things\etc\misc\other\..\whatever.txt in your case) and returns a string (C:\stuff\things\etc\misc\whatever.txt).

I guess I'm a little late, but you can use System.IO.Path.GetFullPath(@"C:\stuff\things\etc\misc\other\..\whatever.txt") and it will return "C:\stuff\things\etc\misc\whatever.txt".

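To get a canonical path, use the PathCanonicalize function. Note that it only simplifies the string (removing "." and ".." elements); it does not look at the file system.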
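Worth spelling out: GetFullPathName, Path.GetFullPath, os.path.abspath and PathCanonicalize all work on the string, not on the file system. The sketch below shows that kind of purely lexical normalization using Python's ntpath module, which applies Windows path rules on any platform. The helper name lexical_key is made up for illustration.

```python
import ntpath

a = r"C:\stuff\things\etc\misc\whatever.txt"
b = r"C:\stuff\things\etc\misc\other\..\whatever.txt"


def lexical_key(path: str) -> str:
    """Build a comparison key from the string alone (no file system access)."""
    # normpath collapses "." and ".." and converts "/" to "\".
    # normcase additionally lowercases the result.
    return ntpath.normcase(ntpath.normpath(path))


print(ntpath.normpath(b))
print(lexical_key(a) == lexical_key(b))
```

This handles the example in the question, but it gives the wrong answer whenever "other" is a junction or symbolic link pointing somewhere else, because "other\.." is then not the same directory as "misc". That is the case the question explicitly worries about.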

Related

Is it possible to use Windows Api (win7) in order to mass rename files?

While in a folder with lots of files, one can select many and rename only one. This one will get the name NewName (1) and the rest will follow as NewName (2) etc..
Is there a way to use this algorithm?
I mostly interested in using WinApi methods in general. It is easy to implement this specific algorithm. I don't know how to dig into explorer.exe and see what method it uses but probably it would be something reusable.
I mostly use c# but any language example would be accepted.
Not with a single function call, no. But you can loop through the files one at a time using SHFileOperation() with the FOF_RENAMEONCOLLISION flag to rename each file to the same target filename so Windows will generate its own unique filenames.
As pointed out by Remy Lebeau I came up with the official way to do it.
IFileOperation::RenameItems
Declares a set of items that are to be given a new display name.
All items are given the same name.
...
If more than one of the items in the collection at pUnkItems is
in the same folder, the renamed files are appended with a number in
parentheses to differentiate them, for instance newfile(1).txt,
newfile(2).txt, and newfile(3).txt.
Here is the referenced link.
This also answers my question on where to start to using windows shell api to do stuff. The answer is here.
Yes, it is possible; see http://blog.gadodia.net/stupid-windows-trick-mass-renaming/. I used this to organize and number files en masse.
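Since the question notes that the numbering scheme itself is easy to implement, here is a rough sketch of it in Python for comparison with the shell API route. It does not call IFileOperation or SHFileOperation and makes no claim to reproduce Explorer's exact behaviour; explorer_style_rename is a hypothetical helper name.

```python
from pathlib import Path


def explorer_style_rename(files, new_stem):
    """Rename files to 'new_stem (1).ext', 'new_stem (2).ext', ... keeping each extension."""
    renamed = []
    counter = 1
    for old in map(Path, files):
        # Advance the counter until the target name is not already taken.
        while True:
            target = old.with_name(f"{new_stem} ({counter}){old.suffix}")
            counter += 1
            if not target.exists():
                break
        old.rename(target)
        renamed.append(target)
    return renamed
```

Calling explorer_style_rename(paths, "NewName") on three files in one folder would leave them as "NewName (1)", "NewName (2)" and "NewName (3)", each with its original extension.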

What does "\\?\" mean before a path?

I'm having a look at the code of FastCopy, where I want to add a few features.
Internally, it seems that FastCopy stores its paths with a \\?\ before the path. eg. \\?\c:\Program Files\Adobe. These paths are passed on directly to Windows API functions like DeleteFile, RemoveDirectory, etc. so it seems Windows understands the format.
But what do these extra characters mean, and why does FastCopy store them that way?
The thing that's probably most relevant for FastCopy is that it allows you to work with file names more than ~256 characters long.
If memory serves, it also prevents Windows from parsing a file name looking for things like \\server\file to access a shared file (though you can still use \\?\UNC\whatever), but that's probably not what's really intended/relevant here.
You are referring to Long UNC paths : https://en.wikipedia.org/wiki/Path_%28computing%29 Hope that helps.
Generally that means it's supporting long file names - names up to about 32K in length.
It can also be used to specify UNC paths, e.g. \\?\UNC\server\share.
Without that support, FastCopy wouldn't be able to access all files properly.
More detail at http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
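To make the two forms above concrete, here is a small sketch that builds a \\?\ path from an ordinary absolute path, including the \\?\UNC\ variant for shares. It uses Python's ntpath module so it runs on any platform. The function name to_extended_length is made up for illustration, and this is not how FastCopy itself does it. Because Windows skips its usual parsing for \\?\ paths, the sketch normalizes "..", "." and forward slashes first.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"  # the four characters \\?\


def to_extended_length(path: str) -> str:
    """Return an absolute Windows path in \\\\?\\ (extended-length) form."""
    if path.startswith(("\\\\?\\", "\\\\.\\")):
        return path  # already a \\?\ or \\.\ path; leave it untouched
    path = ntpath.normpath(path)  # collapse ".." and "." and convert "/" to "\"
    if path.startswith("\\\\"):
        # \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_PREFIX + "UNC\\" + path[2:]
    drive, rest = ntpath.splitdrive(path)
    if not drive or not rest.startswith("\\"):
        raise ValueError("extended-length paths must be absolute: %r" % path)
    return EXTENDED_PREFIX + path
```

For example, to_extended_length(r"c:\Program Files\Adobe") gives the same \\?\c:\Program Files\Adobe form quoted in the question.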

Rename file in Win32 to name with only differences in capitalization

Does anyone know a pure Win32 solution for renaming a file and only changing its capitalization, one that does not involve intermediate renaming to a different name or special privileges (e.g. backup, restore)?
Since the Win32 subsystem generally regards two file names differing only in capitalization as the same, I haven't been able to find any solution to the problem.
A test program I made with the MoveFile API seems to work. So does the rename command in cmd.exe. What have you tried, and what error are you getting?
This isn't relevant, but further testing shows that renaming a long filename in this way works but will change the short filename (alternating between ~1 and ~2 for example), incidentally.
Just use the normal MoveFile API. That call probably just turns into ZwSetInformationFile(..., FileRenameInformation, ...). The docs for FILE_RENAME_INFORMATION state that you need DELETE access and that the file can't be locked, etc., but those restrictions will probably apply to other solutions as well.
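A quick way to try the direct rename without writing a Win32 test program is Python's os.rename, which on Windows ends up in the MoveFile family of calls as far as I know. This is only a sketch; rename_case_only is a hypothetical helper name.

```python
import os


def rename_case_only(directory: str, old_name: str, new_name: str) -> list:
    """Rename a file when only the capitalization of its name changes."""
    if old_name.lower() != new_name.lower():
        raise ValueError("names differ by more than capitalization")
    os.rename(os.path.join(directory, old_name),
              os.path.join(directory, new_name))
    # Report the spelling the file system now stores for this entry.
    return [n for n in os.listdir(directory) if n.lower() == new_name.lower()]
```

If the direct rename works as the answers above report, the returned list contains exactly the new spelling and no intermediate name is needed.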
I do not believe there is a way to expose two files whose names differ only in capitalization to the Win32 subsystem. Even if you were somehow able to create these files, the most likely result would be that only one of them would be accessible, defeating the purpose of staying solely in Win32.
If you want to go into the Native layer, you can create a file with NtCreateFile and InitializeObjectAttributes without OBJ_CASE_INSENSITIVE, or you can pad the end with extra spaces (if you pad with extra spaces, the file will not be accessible from Win32 DOS paths). See here: http://www.osronline.com/ddkx/kmarch/k109_66uq.htm . I'm pretty sure you were already aware of this, but I included it in case you were not.
So long as your file is not immediately needed by another program, you can use my solution.
When you rename the file, capitalize, and delete the last letter. Then rename again and return the letter.
:)

Best way of getting path to "Application Data" directory?

There are several possible ways of getting the path to the application data directory:
using the %APPDATA% environment variable
calling SHGetFolderPath with CSIDL_APPDATA
What is the best way to get the path from within a program? Are there any gotchas when I use the environment variable?
Which method is safest across XP, Vista and upcoming versions?
I would suggest that calling SHGetFolderPath() is the most appropriate, and portable method; the alternatives, such as reading an environment variable, or (worse) trying to extract it from the registry are likely to trip you up in the future.
Raymond Chen has an article explaining why pulling such paths from the registry is a bad idea.
string appDataPath =
Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
You'll need to use the GetFolderPath method to get the actual path, as Environment.SpecialFolder.ApplicationData is just an enum.
If you're programming in .NET you can use this:
string appDataPath = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
One important difference in Python 2: in the case of Unicode file paths, ctypes.windll.shell32.SHGetFolderPathW gives you a unicode string, whereas os.environ['APPDATA'] returns a byte string.
In case anyone was wondering, in Ruby the command would look like the following:
appDataPath = ENV['APPDATA']
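For completeness, here is roughly what the SHGetFolderPath call looks like from Python via ctypes. Treat it as a sketch: the constants are the documented values as far as I know (CSIDL_APPDATA = 0x001A, SHGFP_TYPE_CURRENT = 0), appdata_dir is a made-up helper name, and falling back to the environment variable on non-Windows systems is my own assumption so the snippet runs anywhere.

```python
import ctypes
import os
import sys

CSIDL_APPDATA = 0x001A      # roaming "Application Data" folder
SHGFP_TYPE_CURRENT = 0      # ask for the current path, not the default one
MAX_PATH = 260


def appdata_dir():
    """Return the Application Data directory, or None if it cannot be determined."""
    if sys.platform == "win32":
        buf = ctypes.create_unicode_buffer(MAX_PATH)
        # SHGetFolderPathW(hwnd, csidl, hToken, dwFlags, pszPath) returns an HRESULT.
        hresult = ctypes.windll.shell32.SHGetFolderPathW(
            None, CSIDL_APPDATA, None, SHGFP_TYPE_CURRENT, buf)
        if hresult != 0:
            raise OSError("SHGetFolderPathW failed with HRESULT %#x" % (hresult & 0xFFFFFFFF))
        return buf.value
    # Assumption for this sketch: off Windows there is nothing to call.
    return os.environ.get("APPDATA")
```

The API route does not depend on what the process environment happens to contain, which is the main gotcha with %APPDATA%: the variable can be missing or overridden by whoever launched the program.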

Are there any invalid Linux filenames?

If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
e.g.
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
There are almost no restrictions - apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
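The rules from these two answers are small enough to write down as a check. A minimal sketch in Python follows. The 255-byte limit is an assumption (it is the usual NAME_MAX on common Linux file systems, but it is file-system dependent), and is_valid_linux_filename is a made-up name. It tests a single name component, not a whole path.

```python
NAME_MAX = 255  # assumption: typical per-component limit in bytes


def is_valid_linux_filename(name: str) -> bool:
    """Return True if 'name' could be the name of a regular file in some directory."""
    if name in ("", ".", ".."):
        return False  # empty is invalid; "." and ".." always denote directories
    if "/" in name or "\0" in name:
        return False  # the only two characters that are never allowed
    # The limit applies to the encoded bytes, not to the number of characters.
    return len(name.encode("utf-8", "surrogateescape")) <= NAME_MAX
```

Note that "?this-is-not.png" from the question passes this check: the characters Windows rejects are perfectly legal on Linux.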
Technically it's not invalid, but files with a dash (-) at the beginning of their name will cause you a lot of trouble, because the name gets mistaken for a command-line option.
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take Amarok, for example. Recently I noticed that certain artists I had copied from my Windows machine were not appearing in the library. I checked and confirmed that the files were there, and then I noticed that certain characters in the folder names (named for the artist) were represented by a weird-looking square rather than an actual character.
In a shell terminal the filenames look even stranger: /Music/Albums/Einst$'\374'rzende\ Neubauten is an example of how strange.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions which I could then re-name with ASCII-only characters using Musicbrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
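The $'\374' in that listing is a single byte, octal 374 (0xFC), which is "ü" in Latin-1. So my guess is that the names were written in a legacy Windows code page while the Linux tools expected UTF-8. Here is a sketch of the repair step in Python instead of shell. The Latin-1 fallback is an assumption that fits this example only, and repair_name is a hypothetical helper name.

```python
def repair_name(raw: bytes, legacy_encoding: str = "latin-1") -> str:
    """Decode a file name stored as bytes, falling back to a legacy encoding."""
    try:
        # Names that are already valid UTF-8 are left as they are.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else came from a Latin-1 system.
        return raw.decode(legacy_encoding)


raw = b"Einst\xfcrzende Neubauten"
print(repair_name(raw))
```

To apply it to a real directory you would list it with a bytes argument (os.listdir then returns the raw byte names) and pass the original bytes plus the repaired name, encoded as UTF-8, to os.rename.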
Overall this is a tricky area, and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux where certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.

Resources