Is there an easy way to take a very long filename (which contains parameters and settings that were used to generate the file) and shorten it programmatically so it will save on Windows?
The filename cannot be sent to a URL shortening service (Bitly, Google, etc.) because the information is confidential and the network is segregated.
The filename has to be shortened internally within the application and without using third-party library and without sending the result to a database.
e.g. Start with a file name so long that Windows will fail to save it:
E:\Results\Job\<SomeJobComponentName>\[A very long file name with the parameters and settings that were used to generate it].csv
And save it as
E:\Results\Job\<SomeJobComponentName>\0eVfd878swg9.csv
And then when the file is read, the code can convert
0eVfd878swg9.csv
back to
[A very long file name with the parameters and settings that were used to generate it].csv
Sort of like a Bitly for filenames.
This is not about encrypting the CONTENT of the file. It's simply about making sure that filenames of any length can be saved without hindrance. The filename needs to be convertible back to the original without opening the file, so that the un-shortened filename can be parsed and viewed in a viewer (which the user uses to browse the result files in the job directory).
I've previously tried the "\?\" prefix to the full path but for whatever reason this fails when using StreamWriter. It can't handle this workaround.
Let me know if I'm barking up the wrong tree here!
Using C# 7.0, Windows Server 2012 R2 and Windows 10 desktop.
According to your question you want to start with a filename so long that Windows will fail to save it. The limit is 255 characters per component (bit between backslashes: folder or filename) and very nearly 32,767 characters, or about 9 A4 pages, for the entire path, which should be sufficient for most normal purposes.
If you are dealing with a filesystem other than NTFS (for example FAT, NFS, ISO-9660) then the limits are considerably tighter. This is about NTFS on Windows 10 Anniversary Update (2016) or later.
While Windows will save and retrieve a file with a path this long, it is possible that some APIs will not. This answer assumes that the file is actually saved under the name you want, but you have to pass a shorter name to such an API.
If the path or filename really is so long that Windows cannot save it, then the file cannot actually exist under that name, and you will have to put the name somewhere outside the filesystem, and one of your constraints is without sending the result to a database, so that possibility is ruled out.
There are two approaches to compression. One exploits redundancy in the data you want to compress, for example run length encoding or Huffman. But that won't work here. There is unlikely to be enough redundancy in the names to make a significant difference. The other is to generate short names and maintain a lookup table. That is what bitly does. Since you have disallowed creating your own lookup table (without sending the result to a database), your only option is to use built-in Windows facilities.
When you save a file in a modern version of Windows, the filesystem will automatically make a short 8.3 filename that will allow the file to be seen and opened by legacy applications. You can retrieve the short filename very simply, like this:
>>> import win32api
>>> win32api.GetShortPathName(r"E:\Dropbox\Rocket Cottage\Sicilian fennel and orange salad with red onion and mint.fdx")
'E:\\Dropbox\\ROCKET~1\\SICILI~1.FDX'
To convert back:
>>> win32api.GetLongPathName(r"E:\Dropbox\ROCKET~1\SICILI~1.FDX")
'E:\\Dropbox\\Rocket Cottage\\Sicilian fennel and orange salad with red onion and mint.fdx'
If using win32api falls foul of your requirement not to use a third-party library (though in a Windows installation that, frankly, borders on religious mania) then you can use subprocess to call dir /X.
C:\Users\xxxxx>dir /x E:\Dropbox\ROCKET~1\SICILI~1.FDX
Volume in drive E is Enigma
Volume Serial Number is D45D-0655
Directory of E:\Dropbox\ROCKET~1
2013-04-17 18:07 17,125 SICILI~1.FDX Sicilian fennel and orange salad with red onion and mint.fdx
C:\Users\xxxxx>dir /x "E:\Dropbox\Rocket Cottage\Sicilian fennel and orange salad with red onion and mint.fdx"
Volume in drive E is Enigma
Volume Serial Number is D45D-0655
Directory of E:\Dropbox\Rocket Cottage
2013-04-17 18:07 17,125 SICILI~1.FDX Sicilian fennel and orange salad with red onion and mint.fdx
The best answer appears to be to upgrade to Windows Server 2016 or later and switch on long file names.
There is no point fighting an uphill battle work around on outdated server operating system when a permanent & official solution is easily available.
Thanks everyone for all the comments.
Related
In a Windows 8 Command Prompt, I had a backup drive plugged in and I navigated to my User directory. I executed the command:
copy Documents G:/Seagate_backup/Documents
What I assumed was that copy would create the Documents directory on my backup drive and then copy the contents of the C: Documents directory into it. That is not what happened!
I proceeded to wipe my hard-drive and re-install the operating system, thinking I had backed up the important files, only to find out that copy seemingly concatenated all the C: Documents files of different types (.doc, .pdf, .txt, etc) into one file called "Documents." This file is of course unreadable but opening it in Notepad reveals what happened. I can see some of my documents which were plain text throughout the massively long file.
How do I undo this!!? It's terrible because I was actually helping a friend and was so sure of myself but now this has happened. The only thing I can think of doing is searching for some common separator amongst the concatenated files and write some sort of script to split the file back apart. But then I would have to guess the extensions of each of the pieces...
Merging files together in the fashion that copy uses, discards important file system information such as file size and file name. While the file name may not be as important the size is. Both parameters are used by the OS to discriminate files.
This problem might sound familiar if you have damaged your file allocation table before and all files disappeared. In both cases, you will end up with a binary blob (be it an actual disk or something like your file which might resemble a disk image) that lacks any size and filename information.
Fortunately, this is where a lot of file system recovery tools can help. They are specialized in matching patterns. Specifically they are looking for giveaway clues to what type a file is of, where it starts and what it's size is.
This is for instance enabled by many file types having a set of magic numbers that are used to allow a program to check if a file really is of the type that the extension claims to be.
In principle it is possible to undo this process more or less well.
You will need to use data recovery tools or other analysis tools like binwalk to extract the concatenated binary blob. Essentially the same tools that are used to recover deleted files should be able to extract your documents again. Without any filename of course. I recommend renaming the file to a disk image (.img) and either mounting it from within the operating system as a virtual harddisk (don't worry that it has no file system - it should show up as an unformatted drive) or directly using a data recovery tool or analysis tool which can read binary files (binwalk, for instance, can do that directly, but may not find all types of files as it's mainly for unpacking firmware images that may be assembled in the same or a similar way to how your files ended up).
I have come up against this problem a few times at inopportune moments:
Trying to work on open source Java projects with deep paths
Storing deep Fitnesse wiki trees in source control
An error trying to use Bazaar to import my source control tree
Why does this limit exist?
Why hasn't it been removed yet?
How do you cope with the path limit?
And no, switching to Linux or Mac OS X is not a valid answer to this question ;)
Quoting this article https://learn.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file#maximum-path-length-limitation
Maximum Path Length Limitation
In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string<NUL>" where "<NUL>" represents the invisible terminating null character for the current system codepage. (The characters < > are used here for visual clarity and cannot be part of a valid path string.)
Now we see that it is 1+2+256+1 or [drive][:\][path][null] = 260. One could assume that 256 is a reasonable fixed string length from the DOS days. And going back to the DOS APIs we realize that the system tracked the current path per drive, and we have 26 (32 with symbols) maximum drives (and current directories).
The INT 0x21 AH=0x47 says “This function returns the path description without the drive letter and the initial backslash.” So we see that the system stores the CWD as a pair (drive, path) and you ask for the path by specifying the drive (1=A, 2=B, …), if you specify a 0 then it assumes the path for the drive returned by INT 0x21 AH=0x15 AL=0x19. So now we know why it is 260 and not 256, because those 4 bytes are not stored in the path string.
Why a 256 byte path string, because 640K is enough RAM.
This is not strictly true as the NTFS filesystem supports paths up to 32k characters. You can use the win32 api and "\\?\" prefix the path to use greater than 260 characters.
A detailed explanation of long path from the .Net BCL team blog.
A small excerpt highlights the issue with long paths
Another concern is inconsistent behavior that would result by exposing long path support. Long paths with the \\?\ prefix can be used in most of the file-related Windows APIs, but not all Windows APIs. For example, LoadLibrary, which maps a module into the address of the calling process, fails if the file name is longer than MAX_PATH. So this means MoveFile will let you move a DLL to a location such that its path is longer than 260 characters, but when you try to load the DLL, it would fail. There are similar examples throughout the Windows APIs; some workarounds exist, but they are on a case-by-case basis.
The question is why does the limitation still exist. Surely modern Windows can increase the side of MAX_PATH to allow longer paths. Why has the limitation not been removed?
The reason it cannot be removed is that Windows promised it would never change.
Through API contract, Windows has guaranteed all applications that the standard file APIs will never return a path longer than 260 characters.
Consider the following correct code:
WIN32_FIND_DATA findData;
FindFirstFile("C:\Contoso\*", ref findData);
Windows guaranteed my program that it would populate my WIN32_FIND_DATA structure:
WIN32_FIND_DATA {
DWORD dwFileAttributes;
FILETIME ftCreationTime;
FILETIME ftLastAccessTime;
FILETIME ftLastWriteTime;
//...
TCHAR cFileName[MAX_PATH];
//..
}
My application didn't declare the value of the constant MAX_PATH, the Windows API did. My application used that defined value.
My structure is correctly defined, and only allocates 592 bytes total. That means that i am only able to receive a filename that is less than 260 characters. Windows promised me that if i wrote my application correctly, my application would continue to work in the future.
If Windows were to allow filenames longer than 260 characters then my existing application (which used the correct API correctly) would fail.
For anyone calling for Microsoft to change the MAX_PATH constant, they first need to ensure that no existing application fails. For example, i still own and use a Windows application that was written to run on Windows 3.11. It still runs on 64-bit Windows 10. That is what backwards compatibility gets you.
Microsoft did create a way to use the full 32,768 path names; but they had to create a new API contract to do it. For one, you should use the Shell API to enumerate files (as not all files exist on a hard drive or network share).
But they also have to not break existing user applications. The vast majority of applications do not use the shell api for file work. Everyone just calls FindFirstFile/FindNextFile and calls it a day.
From Windows 10. you can remove the limitation by modifying a registry key.
Tip Starting in Windows 10, version 1607, MAX_PATH limitations have been removed from common Win32 file and directory functions. However, you must opt-in to the new behavior.
A registry key allows you to enable or disable the new long path behavior. To enable long path behavior set the registry key at HKLM\SYSTEM\CurrentControlSet\Control\FileSystem LongPathsEnabled (Type: REG_DWORD). The key's value will be cached by the system (per process) after the first call to an affected Win32 file or directory function (list follows). The registry key will not be reloaded during the lifetime of the process. In order for all apps on the system to recognize the value of the key, a reboot might be required because some processes may have started before the key was set.
The registry key can also be controlled via Group Policy at Computer Configuration > Administrative Templates > System > Filesystem > Enable NTFS long paths.
You can also enable the new long path behavior per app via the manifest:
<application xmlns="urn:schemas-microsoft-com:asm.v3">
<windowsSettings xmlns:ws2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
<ws2:longPathAware>true</ws2:longPathAware>
</windowsSettings>
</application>
You can mount a folder as a drive. From the command line, if you have a path C:\path\to\long\folder you can map it to drive letter X: using:
subst x: \path\to\long\folder
One way to cope with the path limit is to shorten path entries with symbolic links.
For example:
create a C:\p directory to keep short links to long paths
mklink /J C:\p\foo C:\Some\Crazy\Long\Path\foo
add C:\p\foo to your path instead of the long path
You can enable long path names using PowerShell:
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name LongPathsEnabled -Type DWord -Value 1
Another version is to use a Group Policy in Computer Configuration/Administrative Templates/System/Filesystem:
As to why this still exists - MS doesn't consider it a priority, and values backwards compatibility over advancing their OS (at least in this instance).
A workaround I use is to use the "short names" for the directories in the path, instead of their standard, human-readable versions. So e.g. for C:\Program Files\ I would use C:\PROGRA~1\ You can find the short name equivalents using dir /x.
As to how to cope with the path size limitation on Windows - using 7zip to pack (and unpack) your path-length sensitive files seems like a viable workaround. I've used it to transport several IDE installations (those Eclipse plugin paths, yikes!) and piles of autogenerated documentation and haven't had a single problem so far.
Not really sure how it evades the 260 char limit set by Windows (from a technical PoV), but hey, it works!
More details on their SourceForge page here:
"NTFS can actually support pathnames up to 32,000 characters in
length."
7-zip also support such long names.
But it's disabled in SFX code. Some users don't like long paths, since
they don't understand how to work with them. That is why I have
disabled it in SFX code.
and release notes:
9.32 alpha 2013-12-01
Improved support for file pathnames longer than 260 characters.
4.44 beta 2007-01-20
7-Zip now supports file pathnames longer than 260 characters.
IMPORTANT NOTE: For this to work properly, you'll need to specify the destination path in the 7zip "Extract" dialog directly, rather than dragging & dropping the files into the intended folder. Otherwise the "Temp" folder will be used as an interim cache and you'll bounce into the same 260 char limitation once Windows Explorer starts moving the files to their "final resting place". See the replies to this question for more information.
It does, and it is a default for some reason, but you could easily override it with this registry key:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
"LongPathsEnabled"=dword:00000001
See: https://blogs.msdn.microsoft.com/jeremykuhne/2016/07/30/net-4-6-2-and-long-paths-on-windows-10/
Another way to cope with it is to use Cygwin, depending on what do you want to do with the files (i.e. if Cygwin commands suit your needs)
For example it allows to copy, move or rename files that even Windows Explorer can't. Or of course deal with the contents of them like md5sum, grep, gzip, etc.
Also for programs that you are coding, you could link them to the Cygwin DLL and it would enable them to use long paths (I haven't tested this though)
I'm looking for a method of converting .epsf to .eps for a publication I'm submitting. The submission site requires .eps (even though my understanding is that modern renderers should be able to read .epsf as well - the site is archaic, I have to upload all 100 images individually.) My co-author sent me the zipped files to upload (and now to convert) - I didn't make them myself. Further, the programs that made these images may exist on my co-authors computer but where is uncertain.
I've tried this in Mathematica 8 to reasonable but not full success - as in colored files become black and white, files with duplicate entries (as in Fig11a.eps and Fig11a.epsf both exist though they are different, it seems that the .eps is the background and the .epsf is the foreground layer) convert incorrectly. My attempt was to import the .epsf files to Mathematica and export them as .eps.
Also, I've using a middle man format - e.g. gif/tiff/png/jpg - with similar results. I haven't been able to find a program that's free (I assume photoshop could pull this off) that I could use - also I'd like to do it as a batch. A method that uses requires python/Mathematica or XP/Linux OS's would be fine. Thanks.
You do not need to convert anything. Encapsulated PostScript files can have both extensions (both EPS and EPSF). If you publisher refuses to accept files with an EPSF extension just rename them to EPS.
Any processing/conversion you do on the files (using GhostScript, Mathematica, etc.) carries the risk of corrupting the graphics in some way. But there's no need to do it. Send them as they are or rename them if you prefer.
(If you have any doubt, you can check the EPS Format Specification from 1992 which says that on the Macintish the recommended file extension is .epsf while on DOS it's .EPS)
Just had an interesting case.
My software reported back a failure caused by a path being longer than MAX_PATH.
The path was just a plain old document in My Documents, e.g.:
C:\Documents and Settings\Bill\Some Stupid FOlder Name\A really ridiculously long file thats really very very very..........very long.pdf
Total length 269 characters (MAX_PATH==260).
The user wasn't using a external hard drive or anything like that. This was a file on an Windows managed drive.
So my question is this. Should I care?
I'm not saying can I deal with the long paths, I'm asking should I. Yes I'm aware of the "\?\" unicode hack on some Win32 APIs, but it seems this hack is not without risk (as it's changing the behaviour of the way the APIs parse paths) and also isn't supported by all APIs .
So anyway, let me just state my position/assertions:
First presumably the only way the user was able to break this limit is if the app she used uses the special Unicode hack. It's a PDF file, so maybe the PDF tool she used uses this hack.
I tried to reproduce this (by using the unicode hack) and experimented. What I found was that although the file appears in Explorer, I can do nothing with it. I can't open it, I can't choose "Properties" (Windows 7). Other common apps can't open the file (e.g. IE, Firefox, Notepad). Explorer will also not let me create files/dirs which are too long - it just refuses. Ditto for command line tool cmd.exe.
So basically, one could look at it this way: a rouge tool has allowed the user to create a file which is not accessible by a lot of Windows (e.g. Explorer). I could take the view that I shouldn't have to deal with this.
(As an aside, this isn't an vote of approval for a short max path length: I think 260 chars is a joke, I'm just saying that if Windows shell and some APIs can't handle > 260 then why should I?).
So, is this a fair view? Should I say "Not my problem"?
UPDATE: Just had another user with the same problem. This time an mp3 file. Am I missing something? How can these users be creating files that violate the MAX_PATH rule?
It's not a real problem. NTFS support filenames up to 32K (32,767 wide characters). You need only use correct API and correct syntax of filenames. The base rule is: the filename should start with '\\?\' (see http://msdn.microsoft.com/en-us/library/aa365247(v=VS.85).aspx) like \\?\C:\Temp. The same syntax you can use with UNC: \\?\UNC\Server\share\Path. Important to understand that you can use only a small subset of API function. For example look at MSDN description of functions
CreateFile
CreateDirectory
MoveFile
and so on
you will find text like :
In the ANSI version of this function,
the name is limited to MAX_PATH
characters. To extend this limit to
32,767 wide characters, call the
Unicode version of the function and
prepend "\?\" to the path. For more
information, see Naming a File.
This functions you can safe use. If you have a file handle from CreateFile you can use all other functions used hFile (ReadFile, WriteFile etc.) without any restriction.
If you write a program like virus scanner or backup software or some good software running on a server you should write your program so, that all file operations support filenames up to 32K characters and not MAX_PATH characters.
This limitation is baked into a lot of software written in C or C++. Including MSFT code, although they've been chipping away at it. It is only partly a Win32 limitation, it still has a hard upper limit on the length of a file name (not path) through WIN32_FIND_DATA for example. One reason that even .NET has length restrictions. This is not going away any time soon, Win32 is still going strong and the stone-age C string won't disappear.
Your customer will have little sympathy with it, no doubt, probably until you can show them another program that fails the same way. Do however make sure that your code reliably can detect the potential string buffer overflow, followed by a reasonable diagnostic. No sympathy for programs bombing on heap corruption.
As you mentioned many of the Windows Shell functions only work on paths up to MAX_PATH. Windows XP and I believe Vista both have problems in Explorer when nesting directories with long file names. I've not checked Windows 7 - perhaps they have fixed that. This unfortunately means that users have a hard time browsing these file.
If you really wish to support long paths you'll need to check any functions you are using in Shell32.dll that take paths to ensure they support long paths. For those that don't you'll have to use write them yourself using Kernel32 functions.
If you decide to use Shell32 and be limited to MAX_PATH, writing your code to support long file paths would be advisable. If Microsoft later change Shell32 (or create an alternative), you will be better positioned to add support for them.
Just to add another couple of dimensions to the problem, remember that filenames are UTF-16, and you may encounter non NTFS or FAT filesystems that may be case sensitive!
Your own APIs should not hard-code a fixed limit on the path length (or any other hard limits); however, you shouldn't violate the preconditions of the system APIs in order to accomplish some task. IMHO, the fact that Windows limits the length of path names is absurd and should be considered a bug. That said, no I would suggest you not attempt to use the various system APIs other than as documented, even if that results in certain undesireable behavior such as limits to the maximum path length. So, in short, your view is completely fair; if the OS doesn't support it, then the OS doesn't support it. That said, you may want to make it clear to users that this is a limitation of Windows and not of your own code.
One easy way how these files with long paths could be created even by software that does not support paths longer than MAX_PATH: through a file share.
Example:
"C:\My veeeeeeeeeeeeeeeeeeeeery looooooooooooooooooong folder" could be shared as "data". Users could then access that folder through the UNC path \\computer\data or (even shorter) through a drive letter (M:\) assuming that M: is mapped to \\computer\data.
This often happens on file servers.
Paths often can be bigger than 260, one example would be when symlinks get nested and repeat over and over sometimes even on purpose. I think programmers should think about whether they want their program to handle these insanely large paths or not. IMO, 260 is PLENTY of space but thats just me. My answer to this is:
if you have to ask yourself so deeply about breaking the 260 char limit, then thats probably what you should do. We often look for confirmation when we are about to do something that we are unsure about...
I think the maximum path anywhere in the API is about 32k long but thats up to you. Back in the day that was a pretty big chunk of change (half of an entire memory segment!! sheesh!) but nowdays, in the segment-transparent addressing environment in which we live, where all memory is heaped together on the flat, 32k is nothin'... AFAIK paths wouldn't need to be that long unless you are using some fancy unicode language that requires lots of other characters, etc, etc.. we could blab about this all day but you get the idea. I hope this helps..... or hurts?
I am doung some C programming and I was searching for a way to get the maximum length of a given filename, after a search for MAX_PATH I stumbled to this thread and after som thoughts on this matter and after reading the comments on this thread I have come to the following conclusion.
So I understand that NTFS support filenames up to 32.767 characters in length, however, according to knowledge FAT16 only support 11 character filenames, 8 + 3, so in reallity operating systems should have a function which our program can call to dertemine the maximum filename size, simply because all filesystems have different limitations including the length of the filename.
So the end conclusion must be that since us, the developers, don't know anything about which filesystem the data is going to be stored in, so therefore the only solution must be an try and error method.
Not strictly an answer to your specific question, but it might help those who do need to handle long file names.
The Delimon library is a .NET Framework 4 based library on Microsoft TechNet for overcoming the long filenames problem:
Delimon.Win32.IO Library (V4.0).
It has its own versions of key methods from System.IO. For example, you would replace:
System.IO.Directory.GetFiles
with
Delimon.Win32.IO.Directory.GetFiles
which will let you handle long files and folders.
From the website:
Delimon.Win32.IO replaces basic file functions of System.IO and
supports File & Folder names up to up to 32,767 Characters.
This Library is written on .NET Framework 4.0 and can be used either
on x86 & x64 systems. The File & Folder limitations of the standard
System.IO namespace can work with files that have 260 characters in a
filename and 240 characters in a folder name (MAX_PATH is usually
configured as 260 characters). Typically you run into the
System.IO.PathTooLongException Error with the Standard .NET Library.
I am creating a utility that will store data on flat file in a specific binary format.
I want the filename extension to be specific to my application. Is there any reason other than the old 8.3 filename limit for restricting the extension to 3 characters, and if not, what is the limit? Can I have myfilename.MyExtensionSoHandsOffEverybodyElse ?
This is a hold over from the old windows 3.x/MSDOS days. Today, there are plenty of file names that have more than 3 character extensions.
If I remember correctly, Windows XP had a maximum character limit for path names (including the file name) of 255 characters.
In my experience, having seen a few non-3-character extensions I'd say that it's a matter of tradition, and you're perfectly welcome to use myfilename.MyExtensionSoHandsOffEverybodyElse.
The only good reason for doing this is if you plan to support Windows 9x. If you're only targeting XP and later, as with most projects nowdays, the 8.3 thing is irrelevant.
In fact, Windows itself stores things in long-extension filenames in Vista and later, for example, .search-ms for saved searches.
No, there isn't a good reason to limit the extension to 3 characters. However, a shorter, descriptive name is better if a user has to remember it. For example, most people know what a .html or .doc file would contain.
As long as you make a reasonable attempt to avoid naming collisions with major software there shouldn't be an issue. A corollary to that is the fact that unless you create some insanely long extension that will only ever be unique to your software (and even then, it's not guaranteed), the extension you choose will always be subject to name collision by other people's software when they choose their program's data file extension as you are doing here.