Uppercase/lowercase filenames regarded as same - windows

File and folder names in Windows (Windows 10) are recorded in UTF-16 LE. And exceptionally,
A and a
B and b
C and c
...
Z and z
are regarded as the same character. For example, we cannot generate abc.txt and aBc.txt in a same folder(without special method).
My question is, are these 26 pairs the only exceptions ?

No, it is not just ASCII. NTFS volumes store the mapping in a hidden special file named $UpCase. This means that the actual mapping can be different on different volumes on the same machine (if there are different NTFS versions on said volumes).
Windows itself handles case sensitivity in multiple ways.
When applications opens a file they can pass a POSIX flag to request different semantics.
The NT API allows the caller to specify case handling on names it passes to the object manager.
Windows 10 allows you to turn off case-sensitivity on a folder with fsutil file setCaseSensitiveInfo ....

Related

How to shorten a Windows filename programmatically - "Bitly for filenames"?

Is there an easy way to take a very long filename (which contains parameters and settings that were used to generate the file) and shorten it programmatically so it will save on Windows?
The filename cannot be sent to a URL shortening service (Bitly, Google, etc.) because the information is confidential and the network is segregated.
The filename has to be shortened internally within the application and without using third-party library and without sending the result to a database.
e.g. Start with a file name so long that Windows will fail to save it:
E:\Results\Job\<SomeJobComponentName>\[A very long file name with the parameters and settings that were used to generate it].csv
And save it as
E:\Results\Job\<SomeJobComponentName>\0eVfd878swg9.csv
And then when the file is read, the code can convert
0eVfd878swg9.csv
back to
[A very long file name with the parameters and settings that were used to generate it].csv
Sort of like a Bitly for filenames.
This is not about encrypting the CONTENT of the file. It's simply about making sure that filenames of any length can be saved without hindrance. The filename needs to be convertible back to the original without opening the file, so that the un-shortened filename can be parsed and viewed in a viewer (which the user uses to browse the result files in the job directory).
I've previously tried the "\?\" prefix to the full path but for whatever reason this fails when using StreamWriter. It can't handle this workaround.
Let me know if I'm barking up the wrong tree here!
Using C# 7.0, Windows Server 2012 R2 and Windows 10 desktop.
According to your question you want to start with a filename so long that Windows will fail to save it. The limit is 255 characters per component (bit between backslashes: folder or filename) and very nearly 32,767 characters, or about 9 A4 pages, for the entire path, which should be sufficient for most normal purposes.
If you are dealing with a filesystem other than NTFS (for example FAT, NFS, ISO-9660) then the limits are considerably tighter. This is about NTFS on Windows 10 Anniversary Update (2016) or later.
While Windows will save and retrieve a file with a path this long, it is possible that some APIs will not. This answer assumes that the file is actually saved under the name you want, but you have to pass a shorter name to such an API.
If the path or filename really is so long that Windows cannot save it, then the file cannot actually exist under that name, and you will have to put the name somewhere outside the filesystem, and one of your constraints is without sending the result to a database, so that possibility is ruled out.
There are two approaches to compression. One exploits redundancy in the data you want to compress, for example run length encoding or Huffman. But that won't work here. There is unlikely to be enough redundancy in the names to make a significant difference. The other is to generate short names and maintain a lookup table. That is what bitly does. Since you have disallowed creating your own lookup table (without sending the result to a database), your only option is to use built-in Windows facilities.
When you save a file in a modern version of Windows, the filesystem will automatically make a short 8.3 filename that will allow the file to be seen and opened by legacy applications. You can retrieve the short filename very simply, like this:
>>> import win32api
>>> win32api.GetShortPathName(r"E:\Dropbox\Rocket Cottage\Sicilian fennel and orange salad with red onion and mint.fdx")
'E:\\Dropbox\\ROCKET~1\\SICILI~1.FDX'
To convert back:
>>> win32api.GetLongPathName(r"E:\Dropbox\ROCKET~1\SICILI~1.FDX")
'E:\\Dropbox\\Rocket Cottage\\Sicilian fennel and orange salad with red onion and mint.fdx'
If using win32api falls foul of your requirement not to use a third-party library (though in a Windows installation that, frankly, borders on religious mania) then you can use subprocess to call dir /X.
C:\Users\xxxxx>dir /x E:\Dropbox\ROCKET~1\SICILI~1.FDX
Volume in drive E is Enigma
Volume Serial Number is D45D-0655
Directory of E:\Dropbox\ROCKET~1
2013-04-17 18:07 17,125 SICILI~1.FDX Sicilian fennel and orange salad with red onion and mint.fdx
C:\Users\xxxxx>dir /x "E:\Dropbox\Rocket Cottage\Sicilian fennel and orange salad with red onion and mint.fdx"
Volume in drive E is Enigma
Volume Serial Number is D45D-0655
Directory of E:\Dropbox\Rocket Cottage
2013-04-17 18:07 17,125 SICILI~1.FDX Sicilian fennel and orange salad with red onion and mint.fdx
The best answer appears to be to upgrade to Windows Server 2016 or later and switch on long file names.
There is no point fighting an uphill battle work around on outdated server operating system when a permanent & official solution is easily available.
Thanks everyone for all the comments.

Is there such a thing as a filename that is too long?

I'm thinking about filenames which are very long and therefore considered as bad practice. For example, the RxJava
project on GitHub contains a Java file that is called "CompletableMergeDelayErrorIterable.java". I'm wondering if filenames like these should be shortened in order to enhance the readability. Is there a rule of thumb? Or should a filename just be self-explanatory?
For NTFS file systems (i.e. Windows), an individual filename is limited to 255 characters. The same goes for any parts of a path, like directory names. Full paths should be shorter than the constant MAX_PATH, which is 260 characters.
See Maximum filename length in NTFS (Windows XP and Windows Vista)?
For Linux, the maximum length is 255 bytes in most cases (depends on the exact file system used).
See Filename length limits on linux?
So yes, long file (and path) names can cause problems and should be avoided!
Additionally, very long class names make your code harder to read. If you want to convey that a class supports certain features, it should be sufficient to mark it with the appropriate interface(s) (e.g. Iterable<T>).

Why does the Win32 MAX_PATH limitation still exist? [duplicate]

I have come up against this problem a few times at inopportune moments:
Trying to work on open source Java projects with deep paths
Storing deep Fitnesse wiki trees in source control
An error trying to use Bazaar to import my source control tree
Why does this limit exist?
Why hasn't it been removed yet?
How do you cope with the path limit?
And no, switching to Linux or Mac OS X is not a valid answer to this question ;)
Quoting this article https://learn.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file#maximum-path-length-limitation
Maximum Path Length Limitation
In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string<NUL>" where "<NUL>" represents the invisible terminating null character for the current system codepage. (The characters < > are used here for visual clarity and cannot be part of a valid path string.)
Now we see that it is 1+2+256+1 or [drive][:\][path][null] = 260. One could assume that 256 is a reasonable fixed string length from the DOS days. And going back to the DOS APIs we realize that the system tracked the current path per drive, and we have 26 (32 with symbols) maximum drives (and current directories).
The INT 0x21 AH=0x47 says “This function returns the path description without the drive letter and the initial backslash.” So we see that the system stores the CWD as a pair (drive, path) and you ask for the path by specifying the drive (1=A, 2=B, …), if you specify a 0 then it assumes the path for the drive returned by INT 0x21 AH=0x15 AL=0x19. So now we know why it is 260 and not 256, because those 4 bytes are not stored in the path string.
Why a 256 byte path string, because 640K is enough RAM.
This is not strictly true as the NTFS filesystem supports paths up to 32k characters. You can use the win32 api and "\\?\" prefix the path to use greater than 260 characters.
A detailed explanation of long path from the .Net BCL team blog.
A small excerpt highlights the issue with long paths
Another concern is inconsistent behavior that would result by exposing long path support. Long paths with the \\?\ prefix can be used in most of the file-related Windows APIs, but not all Windows APIs. For example, LoadLibrary, which maps a module into the address of the calling process, fails if the file name is longer than MAX_PATH. So this means MoveFile will let you move a DLL to a location such that its path is longer than 260 characters, but when you try to load the DLL, it would fail. There are similar examples throughout the Windows APIs; some workarounds exist, but they are on a case-by-case basis.
The question is why does the limitation still exist. Surely modern Windows can increase the side of MAX_PATH to allow longer paths. Why has the limitation not been removed?
The reason it cannot be removed is that Windows promised it would never change.
Through API contract, Windows has guaranteed all applications that the standard file APIs will never return a path longer than 260 characters.
Consider the following correct code:
WIN32_FIND_DATA findData;
FindFirstFile("C:\Contoso\*", ref findData);
Windows guaranteed my program that it would populate my WIN32_FIND_DATA structure:
WIN32_FIND_DATA {
DWORD dwFileAttributes;
FILETIME ftCreationTime;
FILETIME ftLastAccessTime;
FILETIME ftLastWriteTime;
//...
TCHAR cFileName[MAX_PATH];
//..
}
My application didn't declare the value of the constant MAX_PATH, the Windows API did. My application used that defined value.
My structure is correctly defined, and only allocates 592 bytes total. That means that i am only able to receive a filename that is less than 260 characters. Windows promised me that if i wrote my application correctly, my application would continue to work in the future.
If Windows were to allow filenames longer than 260 characters then my existing application (which used the correct API correctly) would fail.
For anyone calling for Microsoft to change the MAX_PATH constant, they first need to ensure that no existing application fails. For example, i still own and use a Windows application that was written to run on Windows 3.11. It still runs on 64-bit Windows 10. That is what backwards compatibility gets you.
Microsoft did create a way to use the full 32,768 path names; but they had to create a new API contract to do it. For one, you should use the Shell API to enumerate files (as not all files exist on a hard drive or network share).
But they also have to not break existing user applications. The vast majority of applications do not use the shell api for file work. Everyone just calls FindFirstFile/FindNextFile and calls it a day.
From Windows 10. you can remove the limitation by modifying a registry key.
Tip Starting in Windows 10, version 1607, MAX_PATH limitations have been removed from common Win32 file and directory functions. However, you must opt-in to the new behavior.
A registry key allows you to enable or disable the new long path behavior. To enable long path behavior set the registry key at HKLM\SYSTEM\CurrentControlSet\Control\FileSystem LongPathsEnabled (Type: REG_DWORD). The key's value will be cached by the system (per process) after the first call to an affected Win32 file or directory function (list follows). The registry key will not be reloaded during the lifetime of the process. In order for all apps on the system to recognize the value of the key, a reboot might be required because some processes may have started before the key was set.
The registry key can also be controlled via Group Policy at Computer Configuration > Administrative Templates > System > Filesystem > Enable NTFS long paths.
You can also enable the new long path behavior per app via the manifest:
<application xmlns="urn:schemas-microsoft-com:asm.v3">
<windowsSettings xmlns:ws2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
<ws2:longPathAware>true</ws2:longPathAware>
</windowsSettings>
</application>
You can mount a folder as a drive. From the command line, if you have a path C:\path\to\long\folder you can map it to drive letter X: using:
subst x: \path\to\long\folder
One way to cope with the path limit is to shorten path entries with symbolic links.
For example:
create a C:\p directory to keep short links to long paths
mklink /J C:\p\foo C:\Some\Crazy\Long\Path\foo
add C:\p\foo to your path instead of the long path
You can enable long path names using PowerShell:
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name LongPathsEnabled -Type DWord -Value 1
Another version is to use a Group Policy in Computer Configuration/Administrative Templates/System/Filesystem:
As to why this still exists - MS doesn't consider it a priority, and values backwards compatibility over advancing their OS (at least in this instance).
A workaround I use is to use the "short names" for the directories in the path, instead of their standard, human-readable versions. So e.g. for C:\Program Files\ I would use C:\PROGRA~1\ You can find the short name equivalents using dir /x.
As to how to cope with the path size limitation on Windows - using 7zip to pack (and unpack) your path-length sensitive files seems like a viable workaround. I've used it to transport several IDE installations (those Eclipse plugin paths, yikes!) and piles of autogenerated documentation and haven't had a single problem so far.
Not really sure how it evades the 260 char limit set by Windows (from a technical PoV), but hey, it works!
More details on their SourceForge page here:
"NTFS can actually support pathnames up to 32,000 characters in
length."
7-zip also support such long names.
But it's disabled in SFX code. Some users don't like long paths, since
they don't understand how to work with them. That is why I have
disabled it in SFX code.
and release notes:
9.32 alpha 2013-12-01
Improved support for file pathnames longer than 260 characters.
4.44 beta 2007-01-20
7-Zip now supports file pathnames longer than 260 characters.
IMPORTANT NOTE: For this to work properly, you'll need to specify the destination path in the 7zip "Extract" dialog directly, rather than dragging & dropping the files into the intended folder. Otherwise the "Temp" folder will be used as an interim cache and you'll bounce into the same 260 char limitation once Windows Explorer starts moving the files to their "final resting place". See the replies to this question for more information.
It does, and it is a default for some reason, but you could easily override it with this registry key:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
"LongPathsEnabled"=dword:00000001
See: https://blogs.msdn.microsoft.com/jeremykuhne/2016/07/30/net-4-6-2-and-long-paths-on-windows-10/
Another way to cope with it is to use Cygwin, depending on what do you want to do with the files (i.e. if Cygwin commands suit your needs)
For example it allows to copy, move or rename files that even Windows Explorer can't. Or of course deal with the contents of them like md5sum, grep, gzip, etc.
Also for programs that you are coding, you could link them to the Cygwin DLL and it would enable them to use long paths (I haven't tested this though)

On Windows, what is the maximum file name length considered acceptable for an app to output? (Updated and clarified)

Many Windows applications (e.g., almost all .NET apps) cannot open paths more than 260 characters in length. I am batch renaming a list of podcast files. I want to name each file after the title of the episode, but the titles are up to 100 characters long. This means that if a user saves the file in a deep directory with a very long path, they may hit the limit and be unable to open the file in those other applications.
Is it acceptable for my program to put out file names this long, and leave it to the user to deal with very long paths when it comes up? iTunes crops at 40 chars, but that seems very conservative.
Thanks to Ben Voigt for clarifying that this only applies to certain apps.
Windows does NOT have a limit of 255 characters for file paths.
CreateFileA has a limit of 260 characters. CreateFileW supports names up to about 32760 characters (64kB).
Some filesystems impose additional limits on the maximum directory nesting level, or the maximum length of each part.
You're probably thinking of certain popular Windows programs that have a 255 character limit, but accommodating those with a warning or user-configurable setting is probably more appropriate than adding your own hard limit.
Well I remember having an USB that didn't support more than 32 characters in the filename (can't remember which FS it had). I also just found this link on Google: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
Character count limitations can also be different and can vary depending on the file system and path name prefix format used. This is further complicated by support for backward compatibility mechanisms. For example, the older MS-DOS FAT file system supports a maximum of 8 characters for the base file name and 3 characters for the extension, for a total of 12 characters including the dot separator. This is commonly known as an 8.3 file name. The Windows FAT and NTFS file systems are not limited to 8.3 file names, because they have long file name support, but they still support the 8.3 version of long file names.

Should I deal with files longer than MAX_PATH?

Just had an interesting case.
My software reported back a failure caused by a path being longer than MAX_PATH.
The path was just a plain old document in My Documents, e.g.:
C:\Documents and Settings\Bill\Some Stupid FOlder Name\A really ridiculously long file thats really very very very..........very long.pdf
Total length 269 characters (MAX_PATH==260).
The user wasn't using a external hard drive or anything like that. This was a file on an Windows managed drive.
So my question is this. Should I care?
I'm not saying can I deal with the long paths, I'm asking should I. Yes I'm aware of the "\?\" unicode hack on some Win32 APIs, but it seems this hack is not without risk (as it's changing the behaviour of the way the APIs parse paths) and also isn't supported by all APIs .
So anyway, let me just state my position/assertions:
First presumably the only way the user was able to break this limit is if the app she used uses the special Unicode hack. It's a PDF file, so maybe the PDF tool she used uses this hack.
I tried to reproduce this (by using the unicode hack) and experimented. What I found was that although the file appears in Explorer, I can do nothing with it. I can't open it, I can't choose "Properties" (Windows 7). Other common apps can't open the file (e.g. IE, Firefox, Notepad). Explorer will also not let me create files/dirs which are too long - it just refuses. Ditto for command line tool cmd.exe.
So basically, one could look at it this way: a rouge tool has allowed the user to create a file which is not accessible by a lot of Windows (e.g. Explorer). I could take the view that I shouldn't have to deal with this.
(As an aside, this isn't an vote of approval for a short max path length: I think 260 chars is a joke, I'm just saying that if Windows shell and some APIs can't handle > 260 then why should I?).
So, is this a fair view? Should I say "Not my problem"?
UPDATE: Just had another user with the same problem. This time an mp3 file. Am I missing something? How can these users be creating files that violate the MAX_PATH rule?
It's not a real problem. NTFS support filenames up to 32K (32,767 wide characters). You need only use correct API and correct syntax of filenames. The base rule is: the filename should start with '\\?\' (see http://msdn.microsoft.com/en-us/library/aa365247(v=VS.85).aspx) like \\?\C:\Temp. The same syntax you can use with UNC: \\?\UNC\Server\share\Path. Important to understand that you can use only a small subset of API function. For example look at MSDN description of functions
CreateFile
CreateDirectory
MoveFile
and so on
you will find text like :
In the ANSI version of this function,
the name is limited to MAX_PATH
characters. To extend this limit to
32,767 wide characters, call the
Unicode version of the function and
prepend "\?\" to the path. For more
information, see Naming a File.
This functions you can safe use. If you have a file handle from CreateFile you can use all other functions used hFile (ReadFile, WriteFile etc.) without any restriction.
If you write a program like virus scanner or backup software or some good software running on a server you should write your program so, that all file operations support filenames up to 32K characters and not MAX_PATH characters.
This limitation is baked into a lot of software written in C or C++. Including MSFT code, although they've been chipping away at it. It is only partly a Win32 limitation, it still has a hard upper limit on the length of a file name (not path) through WIN32_FIND_DATA for example. One reason that even .NET has length restrictions. This is not going away any time soon, Win32 is still going strong and the stone-age C string won't disappear.
Your customer will have little sympathy with it, no doubt, probably until you can show them another program that fails the same way. Do however make sure that your code reliably can detect the potential string buffer overflow, followed by a reasonable diagnostic. No sympathy for programs bombing on heap corruption.
As you mentioned many of the Windows Shell functions only work on paths up to MAX_PATH. Windows XP and I believe Vista both have problems in Explorer when nesting directories with long file names. I've not checked Windows 7 - perhaps they have fixed that. This unfortunately means that users have a hard time browsing these file.
If you really wish to support long paths you'll need to check any functions you are using in Shell32.dll that take paths to ensure they support long paths. For those that don't you'll have to use write them yourself using Kernel32 functions.
If you decide to use Shell32 and be limited to MAX_PATH, writing your code to support long file paths would be advisable. If Microsoft later change Shell32 (or create an alternative), you will be better positioned to add support for them.
Just to add another couple of dimensions to the problem, remember that filenames are UTF-16, and you may encounter non NTFS or FAT filesystems that may be case sensitive!
Your own APIs should not hard-code a fixed limit on the path length (or any other hard limits); however, you shouldn't violate the preconditions of the system APIs in order to accomplish some task. IMHO, the fact that Windows limits the length of path names is absurd and should be considered a bug. That said, no I would suggest you not attempt to use the various system APIs other than as documented, even if that results in certain undesireable behavior such as limits to the maximum path length. So, in short, your view is completely fair; if the OS doesn't support it, then the OS doesn't support it. That said, you may want to make it clear to users that this is a limitation of Windows and not of your own code.
One easy way how these files with long paths could be created even by software that does not support paths longer than MAX_PATH: through a file share.
Example:
"C:\My veeeeeeeeeeeeeeeeeeeeery looooooooooooooooooong folder" could be shared as "data". Users could then access that folder through the UNC path \\computer\data or (even shorter) through a drive letter (M:\) assuming that M: is mapped to \\computer\data.
This often happens on file servers.
Paths often can be bigger than 260, one example would be when symlinks get nested and repeat over and over sometimes even on purpose. I think programmers should think about whether they want their program to handle these insanely large paths or not. IMO, 260 is PLENTY of space but thats just me. My answer to this is:
if you have to ask yourself so deeply about breaking the 260 char limit, then thats probably what you should do. We often look for confirmation when we are about to do something that we are unsure about...
I think the maximum path anywhere in the API is about 32k long but thats up to you. Back in the day that was a pretty big chunk of change (half of an entire memory segment!! sheesh!) but nowdays, in the segment-transparent addressing environment in which we live, where all memory is heaped together on the flat, 32k is nothin'... AFAIK paths wouldn't need to be that long unless you are using some fancy unicode language that requires lots of other characters, etc, etc.. we could blab about this all day but you get the idea. I hope this helps..... or hurts?
I am doung some C programming and I was searching for a way to get the maximum length of a given filename, after a search for MAX_PATH I stumbled to this thread and after som thoughts on this matter and after reading the comments on this thread I have come to the following conclusion.
So I understand that NTFS support filenames up to 32.767 characters in length, however, according to knowledge FAT16 only support 11 character filenames, 8 + 3, so in reallity operating systems should have a function which our program can call to dertemine the maximum filename size, simply because all filesystems have different limitations including the length of the filename.
So the end conclusion must be that since us, the developers, don't know anything about which filesystem the data is going to be stored in, so therefore the only solution must be an try and error method.
Not strictly an answer to your specific question, but it might help those who do need to handle long file names.
The Delimon library is a .NET Framework 4 based library on Microsoft TechNet for overcoming the long filenames problem:
Delimon.Win32.I​O Library (V4.0).
It has its own versions of key methods from System.IO. For example, you would replace:
System.IO.Directory.GetFiles
with
Delimon.Win32.IO.Directory.GetFiles
which will let you handle long files and folders.
From the website:
Delimon.Win32.IO replaces basic file functions of System.IO and
supports File & Folder names up to up to 32,767 Characters.
This Library is written on .NET Framework 4.0 and can be used either
on x86 & x64 systems. The File & Folder limitations of the standard
System.IO namespace can work with files that have 260 characters in a
filename and 240 characters in a folder name (MAX_PATH is usually
configured as 260 characters). Typically you run into the
System.IO.PathTooLongException Error with the Standard .NET Library.

Resources