After years of programming, it's still some of the simple things that keep tripping me up.
Is there a commonly agreed definition of "filename"?
Even the Wikipedia article mixes up the two interpretations.
It starts by defining it as 'a special kind of string used to uniquely identify a file stored on the file system of a computer'. That seems clear enough, and suggests that a filename is a fully qualified filename, specifying the complete path to the file.
However, it then goes on to:
talk about the basename and extension (so would the basename contain an absolute path?)
say that a filename in DOS is limited to the 8.3 format
say that a filename without a path part is assumed to refer to a file in the current working directory (so the filename does not uniquely identify a file)
So, simple questions:
what is a correct definition of 'filename'? (Please include references.)
how should I unambiguously name variables for:
a path to a file (which can be absolute/full or relative)
a path to a resource that can be a file/directory/socket
No references, just vernacular from experience. When I'm being specific I tend to use:
path or filespec (or file specification): all of the characters needed to identify a file on a filesystem. The path may be absolute (starting from the root, or topmost, directory) or relative (starting from the currently active directory).
filename: the characters needed to identify a file within the current directory.
extension: characters at the end of the filename that typically identify the type of the file. By convention, the extension usually starts with a dot ("."), and a filename may contain more than one extension.
basename: the filename up to (but not including) the dot that begins the first extension.
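A minimal C sketch of the definitions above, assuming the "first extension" starts at the first dot of the filename. `split_path` and `path_parts` are hypothetical names. Be aware that POSIX `basename()` uses the word differently, for the whole last path component.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: pointers into the original path string. */
typedef struct {
    const char *filename;   /* everything after the last separator */
    const char *extension;  /* from the first dot in the filename, or "" */
    size_t basename_len;    /* filename up to (not including) that dot */
} path_parts;

/* Treats both '/' and '\\' as separators, which matches Windows;
   on Unix a backslash is an ordinary file name character. */
static path_parts split_path(const char *path)
{
    assert(path != NULL);
    const char *name = path;
    for (const char *c = path; *c != '\0'; ++c)
        if (*c == '/' || *c == '\\')
            name = c + 1;

    const char *dot = strchr(name, '.');
    path_parts p;
    p.filename = name;
    p.extension = dot != NULL ? dot : name + strlen(name);
    p.basename_len = (size_t)(p.extension - name);
    return p;
}
```

With this definition a hidden Unix file such as .bashrc has an empty basename and the extension .bashrc, one of the edge cases where conventions differ.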
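See also the Javadoc for Java's File.getName() method, which returns only the last name in the pathname's name sequence, that is, the file name without any directory part.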
file·name also file name (fīl'nām'), n.: A name given to a computer file to distinguish it from other files, often containing an extension that classifies it by type.
(Dictionary.com)
It states that a filename is used to name a file (just as you name a person) and to distinguish it from other files. It does not say that a filename includes a path or any other attributes imposed by the file system.
The definition does say that a filename often has an extension, but it is very carefully worded, which I think is a good thing.
So before you start thinking about paths and such, you have to set your scope: are you in a Unix world, or in a DOS/Windows world?
Again, no references, but the file name specification depends on the operating system, or more accurately on the file system. Let's start with early versions of DOS (Disk Operating System). File names were up to 8 characters long and contained numbers, letters, dashes, and underscores. They were followed by an extension of up to three characters (possibly none) that identified the file type. A dot separated the name from the extension. The name had to be unique within its directory.
You could extend the name by adding a directory name, or a series of directory names. A backslash (\) separated the directory names from each other and from the file name. This was usually referred to as the path name. A path that started with a backslash was relative to the root of the current drive; otherwise it was relative to the current directory.
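Finally, in DOS you could include the drive: usually a single letter followed by a colon (other systems used other conventions). A drive letter followed by a backslash, as in C:\DOS\EDIT.COM, made the path absolute instead of relative; without the backslash, as in C:EDIT.COM, the path was still relative to the current directory on that drive.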
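The DOS path forms described above, plus the Windows-only \\server\share, \\.\ and \\?\ forms that come up later in this thread, can be told apart by their first few characters. A minimal sketch; `classify_path` and `path_kind` are hypothetical names, and the sketch accepts both / and \ as separators, as the Windows API does for ordinary paths.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical classification of common DOS/Windows path forms. */
typedef enum {
    PATH_RELATIVE,        /* DOS\EDIT.COM: relative to the current directory */
    PATH_ROOTED,          /* \DOS\EDIT.COM: root of the current drive */
    PATH_DRIVE_RELATIVE,  /* C:EDIT.COM: current directory of drive C: */
    PATH_ABSOLUTE,        /* C:\DOS\EDIT.COM */
    PATH_UNC,             /* \\server\share\file */
    PATH_DEVICE,          /* \\.\C: */
    PATH_EXTENDED         /* \\?\C:\very long path */
} path_kind;

static int is_sep(char c) { return c == '\\' || c == '/'; }

static path_kind classify_path(const char *p)
{
    assert(p != NULL);
    if (strncmp(p, "\\\\?\\", 4) == 0)
        return PATH_EXTENDED;
    if (strncmp(p, "\\\\.\\", 4) == 0)
        return PATH_DEVICE;
    if (is_sep(p[0]) && is_sep(p[1]))
        return PATH_UNC;
    if (is_sep(p[0]))
        return PATH_ROOTED;
    if (isalpha((unsigned char)p[0]) && p[1] == ':')
        return is_sep(p[2]) ? PATH_ABSOLUTE : PATH_DRIVE_RELATIVE;
    return PATH_RELATIVE;
}
```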
Today most of us use long file names that do not follow the old 8.3 pattern. Still, many file systems also keep an old-style 8.3 short name for each file alongside its long name (for example, FAT with long file name support, and NTFS when short-name generation is enabled).
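A sketch of a check for the 8.3 pattern exactly as this answer describes it: a name of 1 to 8 letters, digits, dashes, or underscores, optionally followed by a dot and up to three more such characters. `is_83_name` is a hypothetical helper; the real FAT rules also allowed some other punctuation, so this is stricter than DOS itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Characters allowed by the simplified description (not the full FAT rules). */
static bool is_83_char(char c)
{
    return isalnum((unsigned char)c) || c == '-' || c == '_';
}

static bool is_83_name(const char *name)
{
    assert(name != NULL);
    size_t i = 0;
    while (name[i] != '\0' && name[i] != '.') {
        if (!is_83_char(name[i]) || i == 8)
            return false;                /* bad character, or a 9th character */
        i++;
    }
    if (i == 0)
        return false;                    /* the name part may not be empty */
    if (name[i] == '\0')
        return true;                     /* no extension at all */
    size_t ext = 0;
    for (i++; name[i] != '\0'; i++, ext++)
        if (!is_83_char(name[i]) || ext == 3)
            return false;                /* bad character, second dot, or 4th char */
    return true;
}
```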
I had to help someone delete a folder whose name contained weird characters that caused the path to be reinterpreted as a different path:
c:\test. -> c:\test
It took me a while to recall the \\?\ construct, since I have no idea what it's called or how to search for it. Once I remembered it, though, it was easy:
\\?\c:\test. -> c:\test.
What is the name of this construct, so that I (and others) can search for it?
I don't think it has an official name in widespread use, so I doubt that you'll get very far in any searches. It is described here: https://msdn.microsoft.com/en-gb/library/windows/desktop/aa365247.aspx#maxpath
The Windows API has many functions that also have Unicode versions to permit an extended-length path for a maximum total path length of 32,767 characters. This type of path is composed of components separated by backslashes, each up to the value returned in the lpMaximumComponentLength parameter of the GetVolumeInformation function (this value is commonly 255 characters). To specify an extended-length path, use the "\\?\" prefix. For example, "\\?\D:\very long path".
For this usage it might be called the extended-length path prefix. However, the prefix serves other purposes, most notably suppressing user-mode path canonicalisation, which is the purpose you were relying on.
As you can see from comments to this answer, there are lots of varied opinions on the most suitable name. I think that we can all agree that there is no single officially used name for this thing!
I found a reference to a file in a log that had the following format:
\\?\C:\Path\path\file.log
I cannot find a reference to what the \\?\ sequence means. I believe the part between the backslashes refers to a hostname.
For instance, on my Windows computer, the following works just fine:
dir \\?\C:\
and this also works, with the same result:
dir \\.\C:\
Questions:
Is there a reference to what the question mark means in this particular path format?
What might generate a file path in such a format?
A long read, but worth reading if you are in this domain: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
Extract:
The Windows API has many functions that also have Unicode versions to permit an extended-length path for a maximum total path length of 32,767 characters. This type of path is composed of components separated by backslashes, each up to the value returned in the lpMaximumComponentLength parameter of the GetVolumeInformation function (this value is commonly 255 characters). To specify an extended-length path, use the "\\?\" prefix. For example, "\\?\D:\very long path".
and:
The "\\?\" prefix can also be used with paths constructed according to the universal naming convention (UNC). To specify such a path using UNC, use the "\\?\UNC\" prefix. For example, "\\?\UNC\server\share", where "server" is the name of the computer and "share" is the name of the shared folder. These prefixes are not used as part of the path itself. They indicate that the path should be passed to the system with minimal modification, which means that you cannot use forward slashes to represent path separators, or a period to represent the current directory, or double dots to represent the parent directory.
Because you cannot use the "\\?\" prefix with a relative path, relative paths are always limited to a total of MAX_PATH characters.
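Following the documentation quoted above, an absolute path can be turned into its extended-length form by prefixing \\?\ to a drive path, or by replacing the leading \\ of a UNC path with \\?\UNC\. A minimal sketch with a hypothetical `to_extended_path` helper. It uses narrow `char` strings for illustration, whereas a real program would pass `wchar_t` strings to the Unicode (W) functions, and it assumes the input already uses backslashes only and contains no . or .. components, since the prefix disables those translations.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the extended-length form of an absolute path
   into out. Returns 0 on success, or -1 if the path is relative, is a
   device-namespace path (not handled here), or does not fit in out. */
static int to_extended_path(const char *path, char *out, size_t out_size)
{
    assert(path != NULL && out != NULL);
    int n;
    if (strncmp(path, "\\\\?\\", 4) == 0)
        n = snprintf(out, out_size, "%s", path);                  /* already prefixed */
    else if (strncmp(path, "\\\\.\\", 4) == 0)
        return -1;                                                /* device namespace */
    else if (path[0] == '\\' && path[1] == '\\')
        n = snprintf(out, out_size, "\\\\?\\UNC\\%s", path + 2);  /* UNC server/share */
    else if (isalpha((unsigned char)path[0]) && path[1] == ':' && path[2] == '\\')
        n = snprintf(out, out_size, "\\\\?\\%s", path);           /* drive path */
    else
        return -1;                                                /* relative path */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```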
The Windows API parses input strings for file I/O. Among other things, it translates / to \ as part of converting the name to an NT-style name, and it interprets the . and .. pseudo-directories. With few exceptions, the Windows API also limits path names to 260 characters.
The documented purpose of the \\?\ prefix is:
For file I/O, the "\\?\" prefix to a path string tells the Windows APIs to disable all string parsing and to send the string that follows it straight to the file system.
Among other things, this allows using otherwise reserved names in paths (such as . or ..). Because the prefix opts out of all translations, the system no longer has to maintain an internal buffer, and the arbitrary limit of 260 characters can also be lifted (as long as the underlying file system supports it). Note that lifting the limit is not the purpose of the \\?\ prefix but a corollary of it, even if the prefix is primarily used for that corollary.
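To see why c:\test. and \\?\c:\test. behave differently, here is a deliberately simplified model of two of the translations the Windows API performs on ordinary paths: replacing / with \, and trimming trailing periods and spaces from the last path component. `simplified_normalize` is hypothetical and is not the real Win32 algorithm (it does not resolve . and .. or collapse repeated separators); the property it shares with the real thing is that a \\?\ path is passed through untouched.

```c
#include <assert.h>
#include <string.h>

/* Deliberately simplified model, not the real Win32 algorithm: slashes
   become backslashes and trailing periods and spaces are trimmed from
   the last component. Paths with the extended-length prefix are skipped. */
static void simplified_normalize(char *path)
{
    assert(path != NULL);
    if (strncmp(path, "\\\\?\\", 4) == 0)
        return;                                  /* sent to the file system as-is */

    for (char *c = path; *c != '\0'; ++c)
        if (*c == '/')
            *c = '\\';

    const char *last_sep = strrchr(path, '\\');
    const char *last = last_sep != NULL ? last_sep + 1 : path;
    if (strcmp(last, ".") == 0 || strcmp(last, "..") == 0)
        return;                                  /* relative components: not modelled */

    size_t len = strlen(path);
    while (len > 0 && (path[len - 1] == '.' || path[len - 1] == ' '))
        path[--len] = '\0';
}
```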
I'm reading this. In it I found some lines of code, for example:
wsprintf(szDrive, "\\\\.\\%c:", *lpszSrc);
What string does this produce?
I tried to look for information but all that I've found is:
In the ANSI version of this function, the name is limited to MAX_PATH characters. To extend this limit to 32,767 wide characters, call the Unicode version of the function and prepend "\\?\" to the path. For more information, see Naming Files, Paths, and Namespaces.
This does not answer my question, so I'm asking here. I think it is something Windows-specific or NTFS-specific, but I'm not sure.
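The %c is the single-character format specifier for wsprintf. In the C string literal each \\ is an escaped single backslash, so "\\\\.\\%c:" is the format \\.\%c:.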
The code is used to generate path names of this form:
\\.\C:
This is the path to a physical volume. You use such a path when performing file operations directly on a volume, bypassing the file system. So you'd use such a path when implementing raw disk copy, for example. The documentation for CreateFile has more detail.
This all ties in with the fact that the code you found this in performs a raw disk copy.
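A portable sketch of what the wsprintf call builds, using standard `snprintf` in place of the Windows-only `wsprintf`. `volume_path` is a hypothetical helper that takes the drive letter directly rather than reading `*lpszSrc`; on Windows the resulting string would then be passed to CreateFile as described in its documentation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Portable stand-in for the wsprintf call: builds the volume path for
   drive letter X. Returns 0 on success, -1 for a non-letter or a
   too-small buffer. */
static int volume_path(char drive, char *out, size_t out_size)
{
    assert(out != NULL);
    if (!isalpha((unsigned char)drive))
        return -1;
    /* Each pair of backslashes in the literal is one backslash in the output. */
    int n = snprintf(out, out_size, "\\\\.\\%c:", drive);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```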
I want to programmatically create a folder hierarchy. The problem is that in some cases a folder name exceeds 260 characters and folder creation fails. I have tried to create this folder hierarchy using Win32 File Namespaces.
I want to create a folder structure in the following format. DRIVE_LETTER:\FOLDER1\FOLDER2\FOLDER3\FOLDER4........\FOLDER(N-1)\FOLDER(N)
FOLDER1, FOLDER2, FOLDER3, etc. are folder names, and each name is more than 260 characters long.
For example:
FOLDER1 name is qwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnm
FOLDER2 name is mnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewqmnbvcxzlkjhgfdsapoiuytrewq
and so on.
How can I overcome this folder name/file name length constraint?
OS: Windows 7 64-bit and Windows Server 2008 R2 64-bit.
Please help
The MSDN documentation for the CreateDirectory function explains exactly this:
To extend this limit to 32,767 wide characters, call the Unicode version of the function and prepend \\?\ to the path. For more information, see Naming a File.
See also: Should I deal with files longer than MAX_PATH?
NTFS supports paths of up to 32K (32,767 wide characters). You need only use the correct API and the correct path syntax. The basic rule is that the path should start with \\?\, as in \\?\C:\Temp. For UNC paths the syntax is \\?\UNC\Server\share\Path.
You can use one of these two tricks:
To create a folder structure with a path longer than 260 characters, like C:\folder1\folder2\...\folder20, you can create C:\folder19 and C:\folder20, then move folder20 with all its subfolders into C:\folder19, then create C:\folder18 and move C:\folder19 (with folder20 inside) into C:\folder18. Repeat until you have finished creating the structure.
You can use the \\?\C:\folder1\folder2\...\folder20 notation to create your path. More information is here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx (search for the words "Maximum Path Length Limitation").
A single path component (e.g. a folder name or file name) is limited by the maximum component length that GetVolumeInformation returns through its lpMaximumComponentLength parameter. In theory this is file-system-specific, but in practice it is almost always 255.
So you can't do what you asked unless you write your own file system driver that supports longer path components. What you can do, however, is create a path whose total length exceeds 260 characters, as has been pointed out in other answers.
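Because the limit applies to each component separately, a program can check a path up front before calling CreateDirectory. A minimal sketch; `components_fit` is hypothetical, 255 is assumed as the typical maximum (a real program would query GetVolumeInformation for the target volume), and it counts bytes, which equals the UTF-16 count only for ASCII names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 255 is the usual value reported through
   lpMaximumComponentLength; query GetVolumeInformation for the real one. */
#define TYPICAL_MAX_COMPONENT 255

/* Hypothetical check: is every component between separators at most
   max_len characters long? Counts bytes, which equals UTF-16 units
   only for ASCII names. */
static bool components_fit(const char *path, size_t max_len)
{
    assert(path != NULL);
    size_t run = 0;
    for (const char *c = path;; ++c) {
        if (*c == '\\' || *c == '/' || *c == '\0') {
            if (run > max_len)
                return false;
            if (*c == '\0')
                return true;
            run = 0;
        } else {
            ++run;
        }
    }
}
```

A name like the 312-character FOLDER1 in the question fails this check no matter which prefix is used.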
If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
For example:
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
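For the Windows half of the question, the check the asker describes can be sketched as follows. Microsoft's naming rules also reserve the double quote (") and the control characters 1 to 31, so they are included here. `has_reserved_windows_char` is a hypothetical helper; it checks a single name component rather than a full path, and it ignores reserved device names such as CON and NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check for a single name component (not a full path):
   true if it contains a character Windows reserves in file names. */
static bool has_reserved_windows_char(const char *name)
{
    assert(name != NULL);
    if (strpbrk(name, "<>:\"/\\|*?") != NULL)
        return true;
    for (const char *c = name; *c != '\0'; ++c)
        if ((unsigned char)*c < 32)   /* control characters 1-31 */
            return true;
    return false;
}
```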
There are almost no restrictions: apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
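The rules from these answers can be written as a check on a single file name component. A sketch; `is_possible_linux_name` is hypothetical, it takes an explicit length so that an embedded '\0' can be detected at all, and it assumes NAME_MAX is 255, which is typical for Linux file systems but not guaranteed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch: could these len bytes be the name of a new file on Linux?
   Assumes a NAME_MAX of 255 (typical, but filesystem-dependent). */
static bool is_possible_linux_name(const char *name, size_t len)
{
    if (len == 0 || len > 255)
        return false;                  /* empty, or longer than NAME_MAX */
    assert(name != NULL);
    if (memchr(name, '/', len) != NULL || memchr(name, '\0', len) != NULL)
        return false;                  /* the only two forbidden bytes */
    if ((len == 1 && name[0] == '.') ||
        (len == 2 && name[0] == '.' && name[1] == '.'))
        return false;                  /* . and .. always mean this/parent directory */
    return true;
}
```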
Technically it's not invalid, but files whose names begin with a dash (-) will get you into a lot of trouble, because command-line programs treat such names as options.
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take Amarok, for example. Recently I noticed that certain artists I had copied from my Windows machine were not appearing in the library. I checked and confirmed that the files were there, and then I noticed that certain characters in the folder names (named for the artists) were shown as a weird-looking square rather than the actual character.
In a shell terminal the filenames look even stranger, for example: /Music/Albums/Einst$'\374'rzende\ Neubauten. Here \374 is the octal escape for the byte 0xFC, which is ü in Latin-1 (ISO 8859-1) but is never valid in UTF-8.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions, which I could then rename with ASCII-only characters using MusicBrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
Overall this is a tricky area, and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux where certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.