Should I deal with files longer than MAX_PATH? - winapi

Just had an interesting case.
My software reported back a failure caused by a path being longer than MAX_PATH.
The path was just a plain old document in My Documents, e.g.:
C:\Documents and Settings\Bill\Some Stupid FOlder Name\A really ridiculously long file thats really very very very..........very long.pdf
Total length 269 characters (MAX_PATH==260).
The user wasn't using an external hard drive or anything like that. This was a file on a Windows-managed drive.
So my question is this. Should I care?
I'm not asking whether I can deal with the long paths, I'm asking whether I should. Yes, I'm aware of the "\\?\" Unicode hack on some Win32 APIs, but it seems this hack is not without risk (as it changes the way the APIs parse paths) and also isn't supported by all APIs.
So anyway, let me just state my position/assertions:
First presumably the only way the user was able to break this limit is if the app she used uses the special Unicode hack. It's a PDF file, so maybe the PDF tool she used uses this hack.
I tried to reproduce this (by using the unicode hack) and experimented. What I found was that although the file appears in Explorer, I can do nothing with it. I can't open it, I can't choose "Properties" (Windows 7). Other common apps can't open the file (e.g. IE, Firefox, Notepad). Explorer will also not let me create files/dirs which are too long - it just refuses. Ditto for command line tool cmd.exe.
So basically, one could look at it this way: a rogue tool has allowed the user to create a file which is not accessible by a lot of Windows itself (e.g. Explorer). I could take the view that I shouldn't have to deal with this.
(As an aside, this isn't a vote of approval for a short max path length: I think 260 chars is a joke. I'm just saying that if the Windows shell and some APIs can't handle > 260, then why should I?)
So, is this a fair view? Should I say "Not my problem"?
UPDATE: Just had another user with the same problem. This time an mp3 file. Am I missing something? How can these users be creating files that violate the MAX_PATH rule?

It's not a real problem. NTFS supports paths up to 32K (32,767 wide characters). You only need to use the correct API and the correct filename syntax. The basic rule is: the filename should start with '\\?\' (see http://msdn.microsoft.com/en-us/library/aa365247(v=VS.85).aspx), like \\?\C:\Temp. You can use the same syntax with UNC: \\?\UNC\Server\share\Path. It is important to understand that this works with only a subset of the API functions. For example, look at the MSDN descriptions of functions such as
CreateFile
CreateDirectory
MoveFile
and so on
you will find text like :
In the ANSI version of this function,
the name is limited to MAX_PATH
characters. To extend this limit to
32,767 wide characters, call the
Unicode version of the function and
prepend "\\?\" to the path. For more
information, see Naming a File.
These functions you can use safely. Once you have a file handle from CreateFile, you can use all the other functions that take hFile (ReadFile, WriteFile, etc.) without any restriction.
If you write a program like virus scanner or backup software or some good software running on a server you should write your program so, that all file operations support filenames up to 32K characters and not MAX_PATH characters.
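The prefix rule above can be sketched as a small helper. This is only an illustration: `to_extended_path` is a hypothetical name, not a Windows or Python API. It assumes the input is an absolute drive-letter or UNC path, and it normalizes first because the \\?\ form turns off the usual Win32 path normalization.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"            # \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"   # \\?\UNC\


def to_extended_path(path: str) -> str:
    """Return `path` in extended-length form (hypothetical helper)."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in extended form
    if path.startswith("\\\\.\\"):
        raise ValueError("device paths are not handled in this sketch")
    # Resolve '.', '..' and forward slashes up front; the \\?\ form
    # is passed to the file system without that clean-up.
    normalized = ntpath.normpath(path)
    if normalized.startswith("\\\\"):
        # \\Server\share\Path -> \\?\UNC\Server\share\Path
        return EXTENDED_UNC_PREFIX + normalized[2:]
    drive, rest = ntpath.splitdrive(normalized)
    if len(drive) == 2 and drive[1] == ":" and rest.startswith("\\"):
        # C:\Temp -> \\?\C:\Temp
        return EXTENDED_PREFIX + normalized
    raise ValueError(f"not an absolute path: {path!r}")
```

The result would then be passed to the Unicode (W) variant of the file function; relative paths are rejected rather than guessed at.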

This limitation is baked into a lot of software written in C or C++. Including MSFT code, although they've been chipping away at it. It is only partly a Win32 limitation, it still has a hard upper limit on the length of a file name (not path) through WIN32_FIND_DATA for example. One reason that even .NET has length restrictions. This is not going away any time soon, Win32 is still going strong and the stone-age C string won't disappear.
Your customer will have little sympathy with it, no doubt, probably until you can show them another program that fails the same way. Do however make sure that your code reliably can detect the potential string buffer overflow, followed by a reasonable diagnostic. No sympathy for programs bombing on heap corruption.
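A minimal sketch of "detect it and give a reasonable diagnostic", assuming the classic limit of MAX_PATH = 260 including the terminating NUL. The helper names are made up for illustration. Length is counted in UTF-16 code units, since that is what the wide-character APIs count.

```python
from typing import Optional

MAX_PATH = 260  # Win32 constant; includes the terminating NUL


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to store `text`."""
    return len(text.encode("utf-16-le")) // 2


def check_path_length(path: str, limit: int = MAX_PATH) -> Optional[str]:
    """Return a diagnostic message if `path` will not fit, else None."""
    needed = utf16_units(path) + 1  # +1 for the NUL terminator
    if needed > limit:
        return (f"Path needs {needed} characters including the terminator, "
                f"but the limit is {limit}: {path[:40]}...")
    return None
```

For the 269-character path from the question this reports 270 needed against a limit of 260, which is something you can show the user instead of a crash.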

As you mentioned, many of the Windows Shell functions only work on paths up to MAX_PATH. Windows XP and, I believe, Vista both have problems in Explorer when nesting directories with long file names. I've not checked Windows 7; perhaps they have fixed that. This unfortunately means that users have a hard time browsing these files.
If you really wish to support long paths, you'll need to check any functions you are using in Shell32.dll that take paths to ensure they support long paths. For those that don't, you'll have to write them yourself using Kernel32 functions.
If you decide to use Shell32 and be limited to MAX_PATH, writing your code to support long file paths would be advisable. If Microsoft later change Shell32 (or create an alternative), you will be better positioned to add support for them.
Just to add another couple of dimensions to the problem, remember that filenames are UTF-16, and you may encounter non NTFS or FAT filesystems that may be case sensitive!

Your own APIs should not hard-code a fixed limit on the path length (or any other hard limits); however, you shouldn't violate the preconditions of the system APIs in order to accomplish some task. IMHO, the fact that Windows limits the length of path names is absurd and should be considered a bug. That said, I would suggest you not attempt to use the various system APIs other than as documented, even if that results in certain undesirable behavior such as limits on the maximum path length. So, in short, your view is completely fair; if the OS doesn't support it, then the OS doesn't support it. You may, however, want to make it clear to users that this is a limitation of Windows and not of your own code.

One easy way how these files with long paths could be created even by software that does not support paths longer than MAX_PATH: through a file share.
Example:
"C:\My veeeeeeeeeeeeeeeeeeeeery looooooooooooooooooong folder" could be shared as "data". Users could then access that folder through the UNC path \\computer\data or (even shorter) through a drive letter (M:\) assuming that M: is mapped to \\computer\data.
This often happens on file servers.
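To put numbers on the share scenario, here is a small sketch with made-up folder names: the same file is well under the limit when reached through the mapped drive, but over it when addressed by its local path on the server.

```python
import ntpath

MAX_PATH = 260  # Win32 constant; includes the terminating NUL


def path_lengths(relative: str, mapped_root: str, local_root: str):
    """Length of the same file's path as seen via the mapping and locally."""
    via_mapping = ntpath.join(mapped_root, relative)
    on_server = ntpath.join(local_root, relative)
    return len(via_mapping), len(on_server)


# Hypothetical layout: a 203-character shared folder mapped to M:\
local_root = "C:\\" + "d" * 200
relative = "reports\\" + "f" * 60 + ".pdf"   # 72 characters
client_len, server_len = path_lengths(relative, "M:\\", local_root)
print(client_len, server_len)  # 75 276
```

A program that never uses long-path tricks can therefore create the file through M:\ and leave behind something the server's own Explorer cannot open.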

Paths often can be bigger than 260; one example would be when symlinks get nested and repeat over and over, sometimes even on purpose. I think programmers should think about whether they want their program to handle these insanely large paths or not. IMO, 260 is PLENTY of space, but that's just me. My answer to this is:
if you have to ask yourself so deeply about breaking the 260 char limit, then that's probably what you should do. We often look for confirmation when we are about to do something that we are unsure about...
I think the maximum path anywhere in the API is about 32k long, but that's up to you. Back in the day that was a pretty big chunk of change (half of an entire memory segment!! sheesh!) but nowadays, in the segment-transparent addressing environment in which we live, where all memory is heaped together on the flat, 32k is nothin'... AFAIK paths wouldn't need to be that long unless you are using some fancy Unicode language that requires lots of other characters, etc, etc.. we could blab about this all day but you get the idea. I hope this helps..... or hurts?

I am doing some C programming and was searching for a way to get the maximum length of a given filename. After a search for MAX_PATH I stumbled on this thread, and after some thought on the matter and reading the comments here I have come to the following conclusion.
So I understand that NTFS supports filenames up to 32,767 characters in length; however, to my knowledge FAT16 only supports 11-character filenames (8 + 3). So in reality operating systems should have a function which our program can call to determine the maximum filename size, simply because all filesystems have different limitations, including the length of the filename.
So the conclusion must be that since we, the developers, don't know which filesystem the data is going to be stored on, the only solution is a trial-and-error method.
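For what it's worth, the per-component limit can be queried rather than guessed. A hedged sketch: on Windows, GetVolumeInformationW reports the maximum component length of a volume through its lpMaximumComponentLength output; the POSIX branch uses os.pathconf and is included only so the sketch runs elsewhere. `max_component_length` is a made-up helper name, and it reports the limit for a single name, not for the whole path.

```python
import ctypes
import os


def max_component_length(root: str) -> int:
    """Longest single file or directory name the volume at `root` accepts."""
    if os.name == "nt":
        max_len = ctypes.c_ulong(0)  # DWORD
        ok = ctypes.windll.kernel32.GetVolumeInformationW(
            ctypes.c_wchar_p(root),   # e.g. "C:\\"; needs the trailing backslash
            None, 0,                  # volume name buffer not wanted
            None,                     # serial number not wanted
            ctypes.byref(max_len),    # lpMaximumComponentLength
            None,                     # file system flags not wanted
            None, 0)                  # file system name buffer not wanted
        if not ok:
            raise ctypes.WinError()
        return max_len.value
    # POSIX analogue, included so the sketch runs off Windows too.
    return os.pathconf(root, "PC_NAME_MAX")
```

Calling it once per target volume would replace at least part of the trial and error.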

Not strictly an answer to your specific question, but it might help those who do need to handle long file names.
The Delimon library is a .NET Framework 4 based library on Microsoft TechNet for overcoming the long filenames problem:
Delimon.Win32.IO Library (V4.0).
It has its own versions of key methods from System.IO. For example, you would replace:
System.IO.Directory.GetFiles
with
Delimon.Win32.IO.Directory.GetFiles
which will let you handle long files and folders.
From the website:
Delimon.Win32.IO replaces basic file functions of System.IO and
supports File & Folder names up to up to 32,767 Characters.
This Library is written on .NET Framework 4.0 and can be used either
on x86 & x64 systems. The File & Folder limitations of the standard
System.IO namespace can work with files that have 260 characters in a
filename and 240 characters in a folder name (MAX_PATH is usually
configured as 260 characters). Typically you run into the
System.IO.PathTooLongException Error with the Standard .NET Library.

Related

Allocate file on NTFS without zeroing

I want to make a tool similar to zerofree for linux. I want to do it by allocating a big file without zeroing it, look for nonzero blocks and rewrite them.
With admin privileges it is possible, uTorrent can do this: http://www.netcheif.com/Articles/uTorrent/html/AppendixA_02_12.html#diskio.no_zero , but it's closed source.
I am not sure this answers your question (need), but such a tool already exists. You might have a look at the fsutil.exe command line tool. It has huge potential for exploring the internal structures of NTFS files, and it can also create a file of any size (without the need to zero it manually). Hope that helps.
Wrote a tool: https://github.com/basinilya/winzerofree . It uses SetFileValidData() as @RaymondChen suggested.
You should try SetFilePointerEx
Note that it is not an error to set the file pointer to a position
beyond the end of the file.
So after you create the file, call SetFilePointerEx and then SetEndOfFile, WriteFile, or WriteFileEx, and close the file; the size should be increased.
EDIT
Raymond's suggested SetFileValidData is also a good solution, but it requires privileges, so it shouldn't be used routinely.
My solution is the best on NTFS, because NTFS supports a feature known as initialized size: after using SetFilePointerEx the data won't be initialized to zeros, but an attempt to read the uninitialized data will return zeros.
To sum up: on NTFS use SetFilePointerEx; on FAT (not very likely) use SetFileValidData.
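The "seek past the end, then set end-of-file" pattern can be tried out through Python's file API. This is only a portable analogue of the SetFilePointerEx + SetEndOfFile sequence, not those Win32 calls themselves, and how the runtime implements truncate() on a given platform is not something the sketch relies on. `extend_file` is a hypothetical helper.

```python
def extend_file(path: str, size: int) -> None:
    """Grow an existing file to `size` bytes without writing the new bytes."""
    with open(path, "r+b") as f:
        f.seek(size)   # positioning beyond end-of-file is not an error
        f.truncate()   # set end-of-file at the current position
```

Reading the extended region afterwards returns zeros, which matches the behaviour described above and is why this approach does not expose old disk contents the way SetFileValidData can.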

Why does the Win32 MAX_PATH limitation still exist? [duplicate]

I have come up against this problem a few times at inopportune moments:
Trying to work on open source Java projects with deep paths
Storing deep Fitnesse wiki trees in source control
An error trying to use Bazaar to import my source control tree
Why does this limit exist?
Why hasn't it been removed yet?
How do you cope with the path limit?
And no, switching to Linux or Mac OS X is not a valid answer to this question ;)
Quoting this article https://learn.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file#maximum-path-length-limitation
Maximum Path Length Limitation
In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string<NUL>" where "<NUL>" represents the invisible terminating null character for the current system codepage. (The characters < > are used here for visual clarity and cannot be part of a valid path string.)
Now we see that it is 1+2+256+1 or [drive][:\][path][null] = 260. One could assume that 256 is a reasonable fixed string length from the DOS days. And going back to the DOS APIs we realize that the system tracked the current path per drive, and we have 26 (32 with symbols) maximum drives (and current directories).
The INT 0x21 AH=0x47 says “This function returns the path description without the drive letter and the initial backslash.” So we see that the system stores the CWD as a pair (drive, path) and you ask for the path by specifying the drive (1=A, 2=B, …), if you specify a 0 then it assumes the path for the drive returned by INT 0x21 AH=0x15 AL=0x19. So now we know why it is 260 and not 256, because those 4 bytes are not stored in the path string.
Why a 256 byte path string, because 640K is enough RAM.
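The 1+2+256+1 breakdown above can be checked mechanically; a trivial sketch that builds the longest path the quoted documentation describes:

```python
MAX_PATH = 260

drive = "D"          # 1 character
separator = ":\\"    # 2 characters: colon and backslash
body = "x" * 256     # the 256-character path string
nul = "\0"           # 1 terminating null

longest = drive + separator + body + nul
print(len(longest))  # 260
```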
This is not strictly true as the NTFS filesystem supports paths up to 32k characters. You can use the win32 api and "\\?\" prefix the path to use greater than 260 characters.
A detailed explanation of long path from the .Net BCL team blog.
A small excerpt highlights the issue with long paths
Another concern is inconsistent behavior that would result by exposing long path support. Long paths with the \\?\ prefix can be used in most of the file-related Windows APIs, but not all Windows APIs. For example, LoadLibrary, which maps a module into the address of the calling process, fails if the file name is longer than MAX_PATH. So this means MoveFile will let you move a DLL to a location such that its path is longer than 260 characters, but when you try to load the DLL, it would fail. There are similar examples throughout the Windows APIs; some workarounds exist, but they are on a case-by-case basis.
The question is why does the limitation still exist. Surely modern Windows can increase the size of MAX_PATH to allow longer paths. Why has the limitation not been removed?
The reason it cannot be removed is that Windows promised it would never change.
Through API contract, Windows has guaranteed all applications that the standard file APIs will never return a path longer than 260 characters.
Consider the following correct code:
WIN32_FIND_DATA findData;
HANDLE hFind = FindFirstFile(TEXT("C:\\Contoso\\*"), &findData);
Windows guaranteed my program that it would populate my WIN32_FIND_DATA structure:
WIN32_FIND_DATA {
DWORD dwFileAttributes;
FILETIME ftCreationTime;
FILETIME ftLastAccessTime;
FILETIME ftLastWriteTime;
//...
TCHAR cFileName[MAX_PATH];
//..
}
My application didn't declare the value of the constant MAX_PATH, the Windows API did. My application used that defined value.
My structure is correctly defined, and only allocates 592 bytes total. That means that I am only able to receive a filename that is less than 260 characters. Windows promised me that if I wrote my application correctly, my application would continue to work in the future.
If Windows were to allow filenames longer than 260 characters then my existing application (which used the correct API correctly) would fail.
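The 592-byte figure can be reproduced with a sketch of the Unicode layout. The fields elided by `//...` in the excerpt are filled in here from the SDK definition of WIN32_FIND_DATAW as I recall it, so treat the exact field list as an assumption. WCHAR is modelled as a 16-bit integer so the size comes out the same on any platform.

```python
import ctypes

MAX_PATH = 260


class FILETIME(ctypes.Structure):
    _fields_ = [("dwLowDateTime", ctypes.c_uint32),
                ("dwHighDateTime", ctypes.c_uint32)]


class WIN32_FIND_DATAW(ctypes.Structure):
    # WCHAR is modelled as c_uint16 (ctypes.c_wchar is 4 bytes on most
    # non-Windows systems, which would distort the size).
    _fields_ = [("dwFileAttributes", ctypes.c_uint32),
                ("ftCreationTime", FILETIME),
                ("ftLastAccessTime", FILETIME),
                ("ftLastWriteTime", FILETIME),
                ("nFileSizeHigh", ctypes.c_uint32),
                ("nFileSizeLow", ctypes.c_uint32),
                ("dwReserved0", ctypes.c_uint32),
                ("dwReserved1", ctypes.c_uint32),
                ("cFileName", ctypes.c_uint16 * MAX_PATH),
                ("cAlternateFileName", ctypes.c_uint16 * 14)]


print(ctypes.sizeof(WIN32_FIND_DATAW))      # 592
print(len(WIN32_FIND_DATAW().cFileName))    # 260 WCHARs, terminator included
```

The buffer for the name is part of the caller's structure, which is the point being made: an existing binary has no room to receive anything longer.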
For anyone calling for Microsoft to change the MAX_PATH constant, they first need to ensure that no existing application fails. For example, I still own and use a Windows application that was written to run on Windows 3.11. It still runs on 64-bit Windows 10. That is what backwards compatibility gets you.
Microsoft did create a way to use the full 32,767-character path names; but they had to create a new API contract to do it. For one, you should use the Shell API to enumerate files (as not all files exist on a hard drive or network share).
But they also have to not break existing user applications. The vast majority of applications do not use the shell api for file work. Everyone just calls FindFirstFile/FindNextFile and calls it a day.
From Windows 10, you can remove the limitation by modifying a registry key.
Tip Starting in Windows 10, version 1607, MAX_PATH limitations have been removed from common Win32 file and directory functions. However, you must opt-in to the new behavior.
A registry key allows you to enable or disable the new long path behavior. To enable long path behavior set the registry key at HKLM\SYSTEM\CurrentControlSet\Control\FileSystem LongPathsEnabled (Type: REG_DWORD). The key's value will be cached by the system (per process) after the first call to an affected Win32 file or directory function (list follows). The registry key will not be reloaded during the lifetime of the process. In order for all apps on the system to recognize the value of the key, a reboot might be required because some processes may have started before the key was set.
The registry key can also be controlled via Group Policy at Computer Configuration > Administrative Templates > System > Filesystem > Enable NTFS long paths.
You can also enable the new long path behavior per app via the manifest:
<application xmlns="urn:schemas-microsoft-com:asm.v3">
<windowsSettings xmlns:ws2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
<ws2:longPathAware>true</ws2:longPathAware>
</windowsSettings>
</application>
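To see whether the machine-wide opt-in quoted above is set, the registry value can be read with Python's standard winreg module. A minimal sketch: `long_paths_enabled` is a hypothetical helper, and it returns None off Windows or when the value is absent.

```python
import sys
from typing import Optional


def long_paths_enabled() -> Optional[bool]:
    """True/False from the registry, or None if it cannot be determined."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module
    try:
        with winreg.OpenKey(
                winreg.HKEY_LOCAL_MACHINE,
                r"SYSTEM\CurrentControlSet\Control\FileSystem") as key:
            value, _type = winreg.QueryValueEx(key, "LongPathsEnabled")
    except OSError:
        return None  # key or value missing, or access denied
    return value == 1
```

Because the quoted text says the value is cached per process, a change made while your program is running would not be picked up until it restarts.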
You can mount a folder as a drive. From the command line, if you have a path C:\path\to\long\folder you can map it to drive letter X: using:
subst x: C:\path\to\long\folder
One way to cope with the path limit is to shorten path entries with directory junctions (or symbolic links).
For example:
create a C:\p directory to keep short links to long paths
mklink /J C:\p\foo C:\Some\Crazy\Long\Path\foo
add C:\p\foo to your path instead of the long path
You can enable long path names using PowerShell:
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name LongPathsEnabled -Type DWord -Value 1
Another option is to use the Group Policy setting under Computer Configuration/Administrative Templates/System/Filesystem.
As to why this still exists - MS doesn't consider it a priority, and values backwards compatibility over advancing their OS (at least in this instance).
A workaround I use is to use the "short names" for the directories in the path, instead of their standard, human-readable versions. So e.g. for C:\Program Files\ I would use C:\PROGRA~1\ You can find the short name equivalents using dir /x.
As to how to cope with the path size limitation on Windows - using 7zip to pack (and unpack) your path-length sensitive files seems like a viable workaround. I've used it to transport several IDE installations (those Eclipse plugin paths, yikes!) and piles of autogenerated documentation and haven't had a single problem so far.
Not really sure how it evades the 260 char limit set by Windows (from a technical PoV), but hey, it works!
More details on their SourceForge page here:
"NTFS can actually support pathnames up to 32,000 characters in
length."
7-zip also support such long names.
But it's disabled in SFX code. Some users don't like long paths, since
they don't understand how to work with them. That is why I have
disabled it in SFX code.
and release notes:
9.32 alpha 2013-12-01
Improved support for file pathnames longer than 260 characters.
4.44 beta 2007-01-20
7-Zip now supports file pathnames longer than 260 characters.
IMPORTANT NOTE: For this to work properly, you'll need to specify the destination path in the 7zip "Extract" dialog directly, rather than dragging & dropping the files into the intended folder. Otherwise the "Temp" folder will be used as an interim cache and you'll bounce into the same 260 char limitation once Windows Explorer starts moving the files to their "final resting place". See the replies to this question for more information.
It does, and it is a default for some reason, but you could easily override it with this registry key:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
"LongPathsEnabled"=dword:00000001
See: https://blogs.msdn.microsoft.com/jeremykuhne/2016/07/30/net-4-6-2-and-long-paths-on-windows-10/
Another way to cope with it is to use Cygwin, depending on what you want to do with the files (i.e. if Cygwin commands suit your needs).
For example, it lets you copy, move or rename files that even Windows Explorer can't. Or, of course, work with their contents using md5sum, grep, gzip, etc.
Also, for programs that you are coding, you could link them to the Cygwin DLL and it would enable them to use long paths (I haven't tested this though).

On Windows, what is the maximum file name length considered acceptable for an app to output? (Updated and clarified)

Many Windows applications (e.g., almost all .NET apps) cannot open paths more than 260 characters in length. I am batch renaming a list of podcast files. I want to name each file after the title of the episode, but the titles are up to 100 characters long. This means that if a user saves the file in a deep directory with a very long path, they may hit the limit and be unable to open the file in those other applications.
Is it acceptable for my program to put out file names this long, and leave it to the user to deal with very long paths when it comes up? iTunes crops at 40 chars, but that seems very conservative.
Thanks to Ben Voigt for clarifying that this only applies to certain apps.
Windows does NOT have a limit of 255 characters for file paths.
CreateFileA has a limit of 260 characters. CreateFileW supports names up to about 32760 characters (64kB).
Some filesystems impose additional limits on the maximum directory nesting level, or the maximum length of each part.
You're probably thinking of certain popular Windows programs that have a 255 character limit, but accommodating those with a warning or user-configurable setting is probably more appropriate than adding your own hard limit.
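One way to do the "warning or user-configurable setting" idea is to shorten the title only when the full path would exceed a configurable budget, and to report that it happened. A sketch under stated assumptions: `fit_filename` is a made-up helper, the default budget is MAX_PATH minus the terminator, and length is counted in Python characters (fine for BMP text, slightly optimistic for characters outside it).

```python
from typing import Tuple

MAX_PATH = 260  # includes the terminating NUL


def fit_filename(directory: str, title: str, ext: str,
                 limit: int = MAX_PATH - 1) -> Tuple[str, bool]:
    """Return (file name, was_shortened) so that directory\\name fits `limit`."""
    # Characters left for the title after the directory, one separator and ext.
    room = limit - len(directory.rstrip("\\")) - 1 - len(ext)
    if room < 1:
        raise ValueError("directory is already too deep for any file name")
    shortened = len(title) > room
    name = title[:room].rstrip() + ext
    return name, shortened
```

With a 100-character title, nothing is cropped in a shallow folder, while a 200-character directory leaves room for 54 title characters; the boolean is what you would use to show the warning.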
Well, I remember having a USB drive that didn't support more than 32 characters in the filename (can't remember which FS it had). I also just found this link on Google: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
Character count limitations can also be different and can vary depending on the file system and path name prefix format used. This is further complicated by support for backward compatibility mechanisms. For example, the older MS-DOS FAT file system supports a maximum of 8 characters for the base file name and 3 characters for the extension, for a total of 12 characters including the dot separator. This is commonly known as an 8.3 file name. The Windows FAT and NTFS file systems are not limited to 8.3 file names, because they have long file name support, but they still support the 8.3 version of long file names.

Is it possible to create a file that cannot be copied?

To restrict the scope, let's assume we are in the Windows world only.
Also assume we don't want to play with permission policy.
Is it possible for us to create a file that cannot be copied?
Thank you in advance.
"Trying to make digital files uncopyable is like trying to make water not wet." ~ Bruce Schneier
No. You can't create a file that a SYSADMIN can't copy. You could encrypt it, though.
Well, how about creating a file that uses up more than 50% of the total space on that machine and that is not compressible?
For instance, let us assume that you want to save a boolean (true or false) in such a fashion.
Depending on its value, you could then write a bit stream of ones or zeroes and encrypt said stream using some kind of encryption algorithm, such as AES in CBC mode. This gives you the added advantage of error correction. Even in case of massive data corruption, you should be able to recover your boolean by checking whether ones or zeroes are prevalent in the decrypted stream.
In that case you cannot copy it around (completely) on the machine...
Of course, any type of external memory that can be added to the system would pose a problem in this scenario. But the file would be already encrypted, so don't worry about it too much...
Any file that can be read can have its contents written to another location (such as another file, i.e. copied).
The only thing you can do is limit who/what can read the file.
What is the motivation behind this? If it is a read-only file, you can embed it as a resource within your assembly.
Nice try, RIAA.
But seriously, no, you cannot. It is always possible to copy; you can just make it more difficult for people to make sense of the file, or try to hide it using something like encryption. Spotify does it.
If you try really hard, though, you could make a rootkit for Windows and use it to prevent Windows from even knowing about the file and also prevent copies. The file will still be there and copyable by other tools, or by Linux accessing the NTFS volume.
If in a running process you open a file and hold an exclusive lock, then other processes cannot read the file until you close the handle or your process terminates. However, as admin you could forcibly remove the lock handle.
Short answer: No.
You can, of course, use security settings to limit who can read the file. But if someone can read it, then they can copy it. Even if you found some operating system trick to disable "ordinary" copying, if someone can read the file, they can extract the contents, store it in memory, and then write it somewhere else.
You can encrypt the contents so it's only useful to your own program, that knows how to decrypt it.
That's about it.
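The "read it, hold it in memory, write it elsewhere" argument fits in a few lines. A sketch with a hypothetical helper name; nothing here uses a copy API, only ordinary read and write access:

```python
def copy_by_reading(src: str, dst: str) -> int:
    """Duplicate `src` using nothing but read and write; returns bytes copied."""
    with open(src, "rb") as fin:
        data = fin.read()        # anyone allowed to read gets the bytes
    with open(dst, "wb") as fout:
        fout.write(data)         # ...and can put them wherever they like
    return len(data)
```

Blocking a particular "copy" operation therefore changes nothing as long as read access is granted.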
When using Windows 7 to copy some files from a hard drive, certain files popped up a message saying they could not be copied in their entirety; certain data would be omitted from the copy. I suspect that had something to do with slack space at the end of the files, though I thought the message was curious. I would have expected the copy operation to just ignore the slack space.
If you are running old (OLD) versions of windows, there are certain characters you can put in the filename that make it invalid, not listed in folders, etc. They were used a lot in the old pub ftp days of filesharing ;)
In the old DOS days, you used to be able to flag disk sectors as bad and still read from them. This meant the OS ignored the sector in question but your application would know where to look and be able to get the data. Not sure this would work these days.
Another old MS-DOS trick was to put a space character in the middle of the filename (yes, spaces were valid characters for filenames). Since there was no method on the command line to escape a space, the file couldn't be copied using the DOS commands.
This answer is outside Windows so yeah
Don't know if it's already been said, but what about a file that is an inseparable part of the firmware, so that it is always on AND running? Perhaps it has firmware that generates a sequence that is required for the other . An incidental effect of its running is to prevent 80% or more of its code from being replicated. Let's say it's on an entirely different board, protected by surge protectors, heavy EM-proof shielding and anything else required to make it completely unerasable.
If its possible to make a program that is ALWAYS on and running as long as the copying software is running then yes.
I have another way and this IS with windows. I will come to your house and give you a disk, i will then proceed to destroy every single computer you put the disk into. This doesnt work on XP
Well technically you could create and write to a write-only network share.

Is there a good reason to limit Windows filename extensions to three characters?

I am creating a utility that will store data on flat file in a specific binary format.
I want the filename extension to be specific to my application. Is there any reason other than the old 8.3 filename limit for restricting the extension to 3 characters, and if not, what is the limit? Can I have myfilename.MyExtensionSoHandsOffEverybodyElse ?
This is a holdover from the old Windows 3.x/MS-DOS days. Today, there are plenty of file names that have more than 3 character extensions.
If I remember correctly, Windows XP had a maximum character limit for path names (including the file name) of 255 characters.
In my experience, having seen a few non-3-character extensions I'd say that it's a matter of tradition, and you're perfectly welcome to use myfilename.MyExtensionSoHandsOffEverybodyElse.
The only good reason for doing this is if you plan to support Windows 9x. If you're only targeting XP and later, as with most projects nowadays, the 8.3 thing is irrelevant.
In fact, Windows itself stores things in long-extension filenames in Vista and later, for example, .search-ms for saved searches.
No, there isn't a good reason to limit the extension to 3 characters. However, a shorter, descriptive name is better if a user has to remember it. For example, most people know what a .html or .doc file would contain.
As long as you make a reasonable attempt to avoid naming collisions with major software there shouldn't be an issue. A corollary to that is the fact that unless you create some insanely long extension that will only ever be unique to your software (and even then, it's not guaranteed), the extension you choose will always be subject to name collision by other people's software when they choose their program's data file extension as you are doing here.

Resources