I'm having a look at the code of FastCopy, where I want to add a few features.
Internally, it seems that FastCopy stores its paths with a \\?\ before the path. eg. \\?\c:\Program Files\Adobe. These paths are passed on directly to Windows API functions like DeleteFile, RemoveDirectory, etc. so it seems Windows understands the format.
But what do these extra characters mean, and why does FastCopy store them that way?
The thing that's probably most relevant for FastCopy is that it allows you to work with paths longer than MAX_PATH (260 characters).
If memory serves, it also prevents Windows from parsing a file name looking for things like \\server\file to access a shared file (though you can still use \\?\UNC\whatever), but that's probably not what's really intended/relevant here.
You are referring to Long UNC paths : https://en.wikipedia.org/wiki/Path_%28computing%29 Hope that helps.
Generally that means it's supporting long file names - names up to about 32K in length.
It can also be used to specify UNC paths, e.g. \\?\UNC\server\share.
Without that support, FastCopy wouldn't be able to access all files properly.
More detail at http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
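To make the prefixing concrete, here is a minimal sketch in Python of how a tool might add the prefix before handing a path to the Unicode API. The helper name to_extended_length is hypothetical and not taken from the FastCopy source, and the sketch assumes the input is already an absolute, normalized path, since Windows skips its usual path parsing for \\?\ paths.

```python
def to_extended_length(path: str) -> str:
    """Add the extended-length prefix to an absolute Windows path (hypothetical helper)."""
    # Already prefixed, or a device path: leave untouched.
    if path.startswith("\\\\?\\") or path.startswith("\\\\.\\"):
        return path
    # UNC path: the two leading backslashes are replaced by the UNC form of the prefix.
    if path.startswith("\\\\"):
        return "\\\\?\\UNC\\" + path[2:]
    # Ordinary drive-letter path.
    return "\\\\?\\" + path


print(to_extended_length("c:\\Program Files\\Adobe"))      # \\?\c:\Program Files\Adobe
print(to_extended_length("\\\\server\\share\\file.txt"))   # \\?\UNC\server\share\file.txt
```

This is only string handling, so it runs on any platform; it does not check that the path exists or that it is absolute.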
I'm reading this. Here I've found some code lines, for example: wsprintf(szDrive, "\\\\.\\%c:", *lpszSrc); I want to ask, what does this string give?
I tried to look for information but all that I've found is:
In the ANSI version of this function, the name is limited to MAX_PATH
characters. To extend this limit to 32,767 wide characters, call the
Unicode version of the function and prepend "\\?\" to the path. For
more information, see Naming Files, Paths, and Namespaces.
and this does not answer my question, so I'm asking here. I think it is connected with something Windows-specific or NTFS-specific, but I'm not sure about that.
The %c is the single character format specifier for wsprintf.
The code is used to generate path names of this form:
\\.\C:
This is the path to a physical volume. You use such a path when performing file operations directly on a volume, bypassing the file system. So you'd use such a path when implementing raw disk copy, for example. The documentation for CreateFile has more detail.
This all ties in with the fact that the code you found this in performs a raw disk copy.
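For readers who don't use wsprintf every day, the same formatting can be sketched in Python. The function name volume_device_path is made up for illustration; the format string is the one from the question.

```python
def volume_device_path(src: str) -> str:
    """Build the volume path from the first character of a source path."""
    # Same idea as wsprintf(szDrive, "\\\\.\\%c:", *lpszSrc):
    # %c is filled with the drive letter taken from the start of the string.
    return "\\\\.\\%c:" % src[0]


print(volume_device_path("C:\\data\\disk.img"))  # \\.\C:
```

The doubled backslashes are only source-code escaping; the resulting string contains \\.\C: exactly as shown above.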
Does anyone know a pure Win32 solution for renaming a file and only changing its capitalization, one that does not involve intermediate renaming to a different name or special privileges (e.g. backup, restore)?
Since the Win32 subsystem generally regards two file names differing only in capitalization as the same, I haven't been able to find any solution to the problem.
A test program I made with the MoveFile API seems to work. So does the rename command in cmd.exe. What have you tried, and what error are you getting?
This isn't relevant, but further testing shows that renaming a long filename in this way works but will change the short filename (alternating between ~1 and ~2 for example), incidentally.
Just use the normal MoveFile API. That call probably just turns into ZwSetInformationFile(..., FileRenameInformation,...). The docs for FILE_RENAME_INFORMATION state that you need DELETE access and that the file can't be locked, etc., but those restrictions will probably apply to other solutions as well.
I do not believe there is a way to expose two files whose names differ only in capitalization to the Win32 subsystem. Even if you somehow managed to create these files, the most likely result would be that only one of them would be accessible, defeating the purpose of staying solely in Win32.
If you want to go into the Native layer, you can create a file with NtCreateFile and InitializeObjectAttributes without OBJ_CASE_INSENSITIVE, or you can pad the end with extra spaces (if you pad with extra spaces, the file will not be accessible from Win32 DOS paths). See here: http://www.osronline.com/ddkx/kmarch/k109_66uq.htm . I'm pretty sure you were already aware, but I included it in case you did not know.
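As a quick way to try the single-rename approach without writing C, here is a small Python sketch. As far as I know, CPython's os.rename goes through the same MoveFile family of calls on Windows, but treat that as an assumption; rename_case_only is a hypothetical helper name.

```python
import os
import tempfile


def rename_case_only(directory: str, old_name: str, new_name: str) -> None:
    """Rename a file when only the capitalization of its name changes."""
    if old_name.lower() != new_name.lower():
        raise ValueError("names must differ only in capitalization")
    # One direct rename, no intermediate name.
    os.rename(os.path.join(directory, old_name),
              os.path.join(directory, new_name))


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "readme.txt"), "w").close()
    rename_case_only(tmp, "readme.txt", "ReadMe.TXT")
    print(os.listdir(tmp))
```

If the directory listing shows the new capitalization afterwards, the direct rename worked on that file system.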
So long as your file is not immediately needed by another program, you can use my solution.
When you rename the file, capitalize, and delete the last letter. Then rename again and return the letter.
:)
I'm working on maintenance of an application that transfers a file to another system and uses a structured filename to include meta data including a language code. The current app uses a two character language code and a dash/hyphen for a delimiter.
Ex. Canada-EN-ProdName-ProdCode.txt
I'm converting it to use IETF language code and so the dash delimiter won't do and need a replacement. I'm trying to determine a delimiter to avoid future errors and am considering the tilde ~.
Ex. Canada~en-GB~ProdName~ProdCode.txt
This will be used only on Windows Server 2003+ systems. I certainly didn't come up with this system of parsing a filename to get metadata. Unfortunately, I can't include this in the file itself, and the destination system is expecting the language code to be in IETF format with the dash.
Any thoughts on potential issues with using the tilde in the filename, or perhaps a better character to use? I'm just looking for a second opinion in case I'm overlooking a possible failure. I believe Windows will use the tilde when shortening a long filename to 8.3 format, but I don't see that as an issue here as the OSs can handle long filenames.
The tilde is probably fine, but what's wrong with the good old underscore _ ? It has no special meaning on either windows or unix, and makes names that are relatively easy to read. If there are no other special considerations, I would avoid the tilde solely out of paranoia, since windows does use it as a special character sometimes, as you mentioned.
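Whichever delimiter you pick, the parsing side stays the same. A minimal sketch in Python, assuming the four fields from the example; the field names and the function parse_name are hypothetical, since the question doesn't name them.

```python
import os

# Hypothetical field names for the four parts shown in the question.
FIELDS = ("region", "language", "product_name", "product_code")


def parse_name(filename: str, delimiter: str = "_") -> dict:
    """Split a structured filename into its metadata fields."""
    stem, _ext = os.path.splitext(filename)
    parts = stem.split(delimiter)
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    return dict(zip(FIELDS, parts))


print(parse_name("Canada_en-GB_ProdName_ProdCode.txt"))
print(parse_name("Canada~en-GB~ProdName~ProdCode.txt", delimiter="~"))
```

Because the dash no longer doubles as the delimiter, the IETF tag en-GB survives intact. The field count check catches names where the delimiter sneaks into a product name.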
For anyone reading this question, I would strongly recommend anything but the tilde in the file name, or at least careful testing for speed problems in any .NET path work where a tilde is present.
I used this as a file name delimiter some time ago. I couldn't understand why simply getting a list of files from the folders was taking so long. It was a number of years later (having written a lot of speed-up code that had marginal advantage) that I discovered there is a problem (with DirectoryInfo(path).Name in .NET at least) where the simple existence of the tilde was forcing the underlying code to jump through a lot of hoops.
The slow down was substantial (it was over a network so I had thought it was bandwidth/Network issues for a fair while)
I understand this is a legacy overhang for when alternative short versions of filenames could be used for Windows files.
I am now stuck with the tilde in these file names but, given that the problem lay in some of the .NET path functions (I don't actually know if it still does), I could work around it by spotting a tilde and creating my own answers when it existed rather than passing it through.
If in any doubt just run speed tests with and without the tilde in filenames for say just 500-1,000 files.
If I have a string that resolves to a file path in Windows, is there an accepted way to get a canonical form of the file name?
For example, I'd like to know whether
C:\stuff\things\etc\misc\whatever.txt
and
C:\stuff\things\etc\misc\other\..\whatever.txt
actually point to the same file or not, and store the canonical form of the path in my application.
Note that simple string comparisons won't work, nor will any RegEx magic. Remember that we have things like NTFS reparse points to deal with since Windows 2000 and the new Libraries structure in Windows 7.
Short answer: not really.
There is no simple way to get the canonical name of a file on Windows. Local files can be available via reparse points or via SUBST. Do you want to deal with NTFS junctions? Windows shortcuts? What about \\?\-escaped filenames?
Remote files can be available via mapped drive letter or via UNC. Is that the UNC to the origin server? Are you using DFS? Is the server using reparse points, etc.? Is the server available by more than one name? What about the IP address? Does it have more than one IP address?
So, if you're looking for something like the inode number on Windows, it ain't there. See, for example, this page.
Roger is correct, there is no simple way. If the volume supports a unique file index, you can open the file and call GetFileInformationByHandle, but this will not work on all volumes.
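The file-index idea can be tried from Python without calling the API directly. As far as I know, CPython fills st_dev and st_ino on Windows from the volume serial number and file index, but that is an assumption here; same_file is a hypothetical helper, and os.path.samefile performs essentially the same comparison.

```python
import os
import tempfile


def same_file(path_a: str, path_b: str) -> bool:
    """Compare two paths by volume and file identifier instead of by string."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "other"))
    direct = os.path.join(tmp, "whatever.txt")
    open(direct, "w").close()
    indirect = os.path.join(tmp, "other", "..", "whatever.txt")
    print(same_file(direct, indirect))
```

Both files have to exist and be reachable for this to work, and it inherits the caveat above: on a volume without stable file identifiers the comparison is not reliable.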
The Windows API call GetFullPathName may be the best simple approach.
GetFinalPathNameByHandle, which is available starting with Windows Vista, appears to do what you're asking for.
Using FileInfo (example in C#):
FileInfo info1 = new FileInfo(@"C:\stuff\things\etc\misc\whatever.txt");
FileInfo info2 = new FileInfo(@"C:\stuff\things\etc\misc\other\..\whatever.txt");
if (info1.FullName.Equals(info2.FullName)) {
    Console.WriteLine("yep, they're equal");
}
Console.WriteLine(info1.FullName);
Console.WriteLine(info2.FullName);
Output is:
yep, they're equal
C:\stuff\things\etc\misc\whatever.txt
C:\stuff\things\etc\misc\whatever.txt
jheddings has a nice answer, but since you didn't indicate which language you are using, I thought I'd give a Python way to do it that also works from the command line, using os.path.abspath:
> python -c "import os.path; print(os.path.abspath(r'C:\stuff\things\etc\misc\other\..\whatever.txt'))"
C:\stuff\things\etc\misc\whatever.txt
I would use System.IO.Path.GetFullPath. It takes a string as input (C:\stuff\things\etc\misc\other\..\whatever.txt in your case) and returns a string (C:\stuff\things\etc\misc\whatever.txt).
I guess I'm a little late, but you can use System.IO.Path.GetFullPath(@"C:\stuff\things\etc\misc\other\..\whatever.txt") and it will return "C:\stuff\things\etc\misc\whatever.txt"
To get a canonical path, you should use the PathCanonicalize function.
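Note that the string-based suggestions in this thread only normalize the text of the path. A small Python sketch of that kind of lexical comparison, using the standard ntpath module (the Windows flavor of os.path, importable on any platform); lexically_equal is a hypothetical helper name.

```python
import ntpath


def lexically_equal(path_a: str, path_b: str) -> bool:
    """Compare two Windows paths after collapsing '..' and ignoring case."""
    def norm(p: str) -> str:
        # normpath collapses '.' and '..'; normcase lowercases and unifies slashes.
        return ntpath.normcase(ntpath.normpath(p))
    return norm(path_a) == norm(path_b)


a = r"C:\stuff\things\etc\misc\whatever.txt"
b = r"C:\stuff\things\etc\misc\other\..\whatever.txt"
print(ntpath.normpath(b))     # C:\stuff\things\etc\misc\whatever.txt
print(lexically_equal(a, b))  # True
```

This never touches the file system, so it cannot see junctions, SUBST drives, or two UNC names for the same server, which is exactly the limitation the earlier answers describe.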
If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
e.g.
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
There are almost no restrictions - apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
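Expressed as a check, that rule is tiny. A sketch in Python for a single name component (not a whole path); the function name is made up, and file-system-specific length limits are deliberately not checked.

```python
def could_be_linux_filename(name: str) -> bool:
    """True if the string could be a single file name component on Linux."""
    # Only the slash and the NUL character are forbidden; the name must be non-empty.
    return name != "" and "/" not in name and "\0" not in name


print(could_be_linux_filename("?this-is-not.png"))  # True
print(could_be_linux_filename("a/b.png"))           # False
print(could_be_linux_filename(""))                  # False
```

So the Windows trick of prepending ? does not help on Linux: the result is still a perfectly legal name.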
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
Technically it's not invalid, but files with a dash (-) at the beginning of their name will cause you a lot of trouble, because the name gets confused with command-line options.
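A common workaround when passing such names to commands is to prefix them with ./ so they no longer look like options. A small Python sketch of that idea, with a hypothetical helper name:

```python
def safe_argument(name: str) -> str:
    """Make a relative file name safe to pass as a command-line argument."""
    # "./-rf" refers to the same file as "-rf" but cannot be parsed as an option.
    return "./" + name if name.startswith("-") else name


print(safe_argument("-rf"))        # ./-rf
print(safe_argument("notes.txt"))  # notes.txt
```

Many commands also accept -- to mark the end of options, but that depends on the individual program, so the ./ prefix is the more portable habit.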
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take Amarok, for example. Recently I noticed that certain artists I had copied from my Windows machine were not appearing in the library. I checked and confirmed that the files were there, and then I noticed that certain characters in the folder names (named for the artist) were represented with a weird-looking square rather than an actual character.
In a shell terminal the filenames look even stranger: /Music/Albums/Einst$'\374'rzende\ Neubauten is an example of how strange.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions which I could then re-name with ASCII-only characters using Musicbrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
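The $'\374' in the shell output is a single raw byte (octal 374, hex FC), which suggests the name was written in a one-byte encoding rather than UTF-8. A Python sketch of what is probably going on, assuming Latin-1 (Windows-1252 would give the same result for this byte); decode_legacy_name is a hypothetical helper.

```python
def decode_legacy_name(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode a file name as UTF-8, falling back to a one-byte encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Byte 0xFC is not valid UTF-8, so this branch is taken for the name below.
        return raw.decode(fallback)


raw_name = b"Einst\xfcrzende Neubauten"  # \xfc is the same byte as octal \374
print(decode_legacy_name(raw_name))      # Einstürzende Neubauten
```

An application that insists on UTF-8 names will choke on that byte, which would explain why the folder was on disk but invisible in the library.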
Overall this is a tricky area, and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux where certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.