I have searched for an answer to this, but unfortunately I have had little luck in finding any information on it!
In essence: What is the set of valid names for a memory-mapped file in Windows?
How long can they be?
What are legal characters, e.g. are forward slashes, hyphens, punctuation, etc. legal?
Are there limitations on character ordering, e.g. an mmf name cannot start with an underscore?
EDIT: I realize that the answer to this question might be "They are exactly the same as normal file naming conventions in Windows." However it is important that this be clarified.
MemoryMappedFile.CreateNew corresponds to CreateFileMapping. The documentation for CreateFileMapping says
The name can have a "Global\" or "Local\" prefix to explicitly create
the object in the global or session namespace. The remainder of the
name can contain any character except the backslash character (\).
Creating a file mapping object in the global namespace from a session
other than session zero requires the SeCreateGlobalPrivilege
privilege. For more information, see Kernel Object Namespaces.
In other words, you can use any string you like as long as it doesn't contain a backslash.
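As a rough illustration of the rule quoted above, here is a minimal Python sketch that checks a candidate name before it is handed to CreateFileMapping or MemoryMappedFile.CreateNew. The function name is_valid_mapping_name is hypothetical. Matching the prefix with exact case and rejecting an empty name are assumptions of this sketch, not something the quoted passage states. The passage gives no length limit, so none is checked.

```python
# Optional namespace prefixes named in the CreateFileMapping documentation.
NAMESPACE_PREFIXES = ("Global\\", "Local\\")


def is_valid_mapping_name(name: str) -> bool:
    """Return True if `name` follows the quoted rule: an optional
    "Global\\" or "Local\\" prefix, then no further backslashes."""
    for prefix in NAMESPACE_PREFIXES:
        if name.startswith(prefix):  # assumption: exact-case match
            name = name[len(prefix):]
            break
    # Assumption: an empty remainder is treated as invalid here.
    return name != "" and "\\" not in name


if __name__ == "__main__":
    for candidate in ("my-map_01", "Global\\shared/cache", "Local\\a\\b"):
        print(repr(candidate), is_valid_mapping_name(candidate))
```

Forward slashes, hyphens, punctuation and a leading underscore all pass this check, which matches the answer: only the backslash is singled out.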
I had to help someone delete a folder that had weird characters in it that caused the path to be reinterpreted as a different path:
c:\test. -> c:\test
It took me a while to recall the \\?\ construct, since I have no idea what it's called or how to search for it. Once I remembered it, though, it was easy:
\\?\c:\test. -> c:\test.
What is the name of this construct, that I (and others) may search for it?
I don't think it has an official name in widespread use, so I doubt that you'll get very far in any searches. It is described here: https://msdn.microsoft.com/en-gb/library/windows/desktop/aa365247.aspx#maxpath
The Windows API has many functions that also have Unicode versions to permit an extended-length path for a maximum total path length of 32,767 characters. This type of path is composed of components separated by backslashes, each up to the value returned in the lpMaximumComponentLength parameter of the GetVolumeInformation function (this value is commonly 255 characters). To specify an extended-length path, use the "\\?\" prefix. For example, "\\?\D:\very long path".
For this usage it might be called the extended-length path prefix. However the prefix serves other purposes, most specifically suppressing user mode path canonicalisation, the purpose that you were availing yourself of.
As you can see from comments to this answer, there are lots of varied opinions on the most suitable name. I think that we can all agree that there is no single officially used name for this thing!
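For anyone who wants to apply the prefix programmatically, here is a minimal Python sketch of the string manipulation involved. The function name to_extended_length is hypothetical, and the sketch assumes the input is already an absolute path written with backslashes (the prefix disables normalisation, so relative paths and forward slashes would not be fixed up for you). UNC paths use the documented "\\?\UNC\server\share" form.

```python
EXTENDED_PREFIX = "\\\\?\\"      # the four characters \\?\
DEVICE_PREFIX = "\\\\.\\"        # the four characters \\.\


def to_extended_length(path: str) -> str:
    """Prepend the \\\\?\\ prefix to an absolute Windows path.

    Assumption: `path` is absolute and uses backslashes only.
    """
    if path.startswith((EXTENDED_PREFIX, DEVICE_PREFIX)):
        return path                              # already prefixed; leave alone
    if path.startswith("\\\\"):
        # UNC path: \\server\share -> \\?\UNC\server\share
        return EXTENDED_PREFIX + "UNC\\" + path[2:]
    return EXTENDED_PREFIX + path


if __name__ == "__main__":
    print(to_extended_length("c:\\test."))
    print(to_extended_length("\\\\server\\share\\dir"))
```

With the prefix in place, a trailing dot such as the one in c:\test. is passed through to the file system instead of being stripped.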
I'm reading this. Here I've found some code lines, for example: wsprintf(szDrive, "\\\\.\\%c:", *lpszSrc); I want to ask, what does this string give?
I tried to look for information but all that I've found is:
In the ANSI version of this function, the name is limited to MAX_PATH
characters. To extend this limit to 32,767 wide characters, call the
Unicode version of the function and prepend "\\?\" to the path. For
more information, see Naming Files, Paths, and Namespaces.
This does not answer my question, so I am asking here. I think it is something specific to Windows or NTFS, but I am not sure about that.
The %c is the single character format specifier for wsprintf.
The code is used to generate path names of this form:
\\.\C:
This is the path to a physical volume. You use such a path when performing file operations directly on a volume, bypassing the file system. So you'd use such a path when implementing raw disk copy, for example. The documentation for CreateFile has more detail.
This all ties in with the fact that the code you found this in performs a raw disk copy.
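To see what the format string produces without compiling the C code, here is a small Python sketch that mirrors the wsprintf call. The function name volume_device_path is hypothetical, and the sketch only builds the string; it does not open the volume.

```python
def volume_device_path(src: str) -> str:
    """Mirror wsprintf(szDrive, "\\\\\\\\.\\\\%c:", *lpszSrc):
    take the first character of `src` as the drive letter."""
    # In the C source each backslash is doubled, so the literal
    # "\\\\.\\%c:" denotes the text \\.\ followed by the letter and a colon.
    return "\\\\.\\%c:" % src[0]


if __name__ == "__main__":
    print(volume_device_path("C:\\data\\disk.img"))
```

The doubled backslashes in the original line are only C string escaping; the resulting path contains two backslashes, a dot, one backslash, the drive letter and a colon.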
I'm working on maintenance of an application that transfers a file to another system and uses a structured filename to include meta data including a language code. The current app uses a two character language code and a dash/hyphen for a delimiter.
Ex. Canada-EN-ProdName-ProdCode.txt
I'm converting it to use IETF language codes, so the dash delimiter won't do and I need a replacement. I'm trying to choose a delimiter that avoids future errors and am considering the tilde ~.
Ex. Canada~en-GB~ProdName~ProdCode.txt
This will be used only on Windows Server 2003 and later systems. I certainly didn't come up with this system of parsing a filename to get meta data. Unfortunately, I can't include this in the file itself and the destination system is expecting the language code to be in IETF format with the dash.
Any thoughts on potential issues with using the tilde in the filename, or perhaps a better character to use? I'm just looking for a second opinion in case I'm overlooking a possible failure. I believe Windows will use the tilde when shortening a long filename to 8.3 format, but I don't see that as an issue here as the OSs can handle long filenames.
The tilde is probably fine, but what's wrong with the good old underscore _ ? It has no special meaning on either Windows or Unix, and makes names that are relatively easy to read. If there are no other special considerations, I would avoid the tilde solely out of paranoia, since Windows does use it as a special character sometimes, as you mentioned.
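Whichever delimiter is chosen, keeping it in one place makes it cheap to change later. Here is a minimal Python sketch of the parsing side; the function name parse_structured_name, the field names and the four-field layout are assumptions based on the example file names in the question.

```python
import os

FIELDS = ("country", "language", "product_name", "product_code")


def parse_structured_name(filename: str, delimiter: str = "_") -> dict:
    """Split e.g. 'Canada_en-GB_ProdName_ProdCode.txt' into its fields.

    Raises ValueError if the number of fields is not the expected four,
    which also catches a delimiter that appears inside a field.
    """
    stem, _extension = os.path.splitext(filename)
    parts = stem.split(delimiter)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


if __name__ == "__main__":
    print(parse_structured_name("Canada_en-GB_ProdName_ProdCode.txt"))
    print(parse_structured_name("Canada~en-GB~ProdName~ProdCode.txt", "~"))
```

Because the IETF language tag keeps its dash, the old dash delimiter now fails the field-count check instead of silently mis-parsing.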
For anyone reading this question, I would strongly recommend anything but the tilde in the file name, or at least careful testing for speed problems in any .NET path work where a tilde is present.
I used this as a file name delimiter some time ago. I couldn't understand why simply getting a list of files from the folders was taking so long. It was a number of years later (having written a lot of speed-up code that had marginal advantage) that I discovered there is a problem (with DirectoryInfo(path).Name in .NET at least) where the simple existence of the tilde was forcing the underlying code to jump through a lot of hoops.
The slow down was substantial (it was over a network so I had thought it was bandwidth/Network issues for a fair while)
I understand this is a legacy overhang for when alternative short versions of filenames could be used for Windows files.
I am now stuck with the tilde in these file names but, given that the problem lay in some of the .NET path functions (I don't actually know if it still does), I could work around it by spotting a tilde and creating my own answers when it existed rather than passing it through.
If in any doubt just run speed tests with and without the tilde in filenames for say just 500-1,000 files.
PLEASE don't tell me why you think it's a bad idea. Just tell me if it's a workable idea.
I want to create files in a folder with names like the following:
asdf@qwerty.com.eml
abc+def@asdf.net.eml
abc_def@sasdf.at.eml
Is there some fundamental incompatibility in the characters allowed in email addresses and those allowed by a unix system?
I will have a bash script read the file names, strip the ".eml" ending, convert the rest into the "correct" usable format and send an email to the address.
A simple test showed that it saved the above as files called
asdf\@qwerty.com.eml
abc+def\@asdf.net.eml
abc_def\@sasdf.at.eml
The only characters not allowed in a *nix filename are \0 and /, neither of which is allowed in an email address anyways. How your shell may handle symbols is another matter.
There are no characters disallowed in UNIX file names except / (directory separator) and ASCII 0 (string terminator), so there is no problem at a fundamental level.
Handling those file names in shell scripts is a different matter; it requires at least quoting every variable reference as "$FILENAME", and forgetting even one quotation will create a very rare, insidious bug. (Also, many other utilities will fail on strange characters such as | or newline unless you consistently use NUL-delimited options such as find -print0 and xargs -0.)
So yes, technically your bad idea is workable :-)
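To make the two rules above concrete, here is a minimal Python sketch that checks whether a string can be a single Unix file name and recovers the address from a stored name. The function names are hypothetical, and rejecting "." and ".." is an extra assumption of the sketch (they are reserved directory entries, not forbidden characters). Per-filesystem length limits are not checked.

```python
def is_valid_unix_filename(name: str) -> bool:
    """True if `name` contains neither '/' nor NUL and is not a
    reserved entry ('', '.', '..')."""
    return name not in ("", ".", "..") and "/" not in name and "\0" not in name


def address_from_filename(filename: str, suffix: str = ".eml") -> str:
    """Strip the trailing suffix to get the address part back."""
    if not filename.endswith(suffix):
        raise ValueError(f"{filename!r} does not end with {suffix!r}")
    return filename[: -len(suffix)]


if __name__ == "__main__":
    name = "abc+def@asdf.net.eml"
    print(is_valid_unix_filename(name), address_from_filename(name))
```

Working with the names as Python strings (or as arguments in a list passed to subprocess) sidesteps the shell quoting problem described above, because no shell ever re-parses them.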
Short answer:
przemek@linux-634b:~/tmp/email> touch john.smith@example.com
przemek@linux-634b:~/tmp/email> ls
john.smith@example.com
Works perfectly;)
Long answer:
It depends on the filesystem you're using. See the Wikipedia entry which lists allowed characters in file names. Most UNIX file systems support all characters that can be included in e-mail addresses. Non-UNIX filesystems, such as FAT, however, may cause problems.
Note that your problems may come from improper escaping. Check how you are creating your files.
What was your "simple test"?
Typing abc and hitting tab?
The bash autocompletion will add a \ before every special character.
But this does not mean, it is stored with a \ in its name.
Use ls to see the true name.
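If you would rather check from a script than by eye, here is a small Python sketch that creates files in a temporary directory and reads the stored names back. The helper name create_and_list and the sample file names are hypothetical; the sketch assumes a Unix-like filesystem that accepts these characters.

```python
import os
import tempfile


def create_and_list(names):
    """Create empty files with the given names in a fresh temporary
    directory and return the names the filesystem reports back."""
    with tempfile.TemporaryDirectory() as directory:
        for name in names:
            # No shell is involved here, so nothing gets escaped.
            with open(os.path.join(directory, name), "w"):
                pass
        return sorted(os.listdir(directory))


if __name__ == "__main__":
    for stored in create_and_list(["abc+def@asdf.net.eml", "asdf#qwerty.com.eml"]):
        print(repr(stored))
```

The names come back without any backslash, which is consistent with the backslash being something the shell's completion adds for display and re-parsing rather than part of the stored name.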
There is no problem with such file names on systems which treat file names as blobs and allow all byte sequences, i.e. Linux. But they are not portable to systems which treat file names as Unicode strings and disallow certain characters (Windows) or transform file names (Mac OS X, canonical decomposition).
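The "transform file names" point can be illustrated with Python's unicodedata module. This is only a sketch of canonical decomposition (NFD) itself; the file name is a made-up example, and how a particular filesystem applies normalization is outside what the code shows.

```python
import unicodedata

composed = "caf\u00e9@example.com.eml"      # 'é' as one code point
decomposed = unicodedata.normalize("NFD", composed)  # 'e' + combining accent

# The two strings look the same when displayed but differ as bytes,
# so a byte-oriented filesystem would treat them as two different names.
same_text = composed == decomposed
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")
same_after_nfd = unicodedata.normalize("NFD", composed) == decomposed

if __name__ == "__main__":
    print(len(composed), len(decomposed))
    print(same_text, same_bytes, same_after_nfd)
```

A script that stores a name on one system and compares it byte-for-byte on another could therefore fail to find its own file unless both sides normalize first.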
More specifically, what is the authoritative source for that information?
This may look like a non-programming question, but I need to know whether a registry path fed to my code contains a regular expression or not. I decided the best way to do that is assume that any occurrence of an invalid character (like '*') means a wildcard search.
For allowable key and value names, see the MSDN page on Structure of the Registry. In particular:
Each key has a name consisting of one or more printable characters.
Key names are not case sensitive. Key names cannot include the
backslash character (\), but any other printable character can be
used. Value names and data can include the backslash character.
Registry value types are explained in detail on MSDN here, in case you need to know the allowable values.
For all things Windows, MSDN has to be the authoritative source -- the article on Registry Element Size Limits implies Unicode is good and Structure of the Registry says that backslash and non-printable characters are disallowed in key names. Values merely have to be entirely printable characters.
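For the original goal of spotting wildcards, the rules above can be turned into a small check. Here is a minimal Python sketch; the function names are hypothetical, and the 255-character limit is my reading of the Registry Element Size Limits page, so treat it as an assumption. It deliberately does not test for "printable" characters.

```python
MAX_KEY_NAME_LENGTH = 255  # assumption: per Registry Element Size Limits


def is_valid_key_name(name: str) -> bool:
    """True if `name` is non-empty, within the length limit and
    contains no backslash (the only character the docs single out)."""
    return 0 < len(name) <= MAX_KEY_NAME_LENGTH and "\\" not in name


def invalid_components(path: str) -> list:
    """Split a registry path on backslashes and return the key names
    that fail the check above."""
    return [part for part in path.split("\\") if not is_valid_key_name(part)]


if __name__ == "__main__":
    print(is_valid_key_name("Soft*ware"))
    print(invalid_components("HKEY_CURRENT_USER\\Software\\\\Example"))
```

Note that a name containing '*' passes, so by these rules an asterisk cannot be assumed to mean a wildcard; it may be part of a real key name.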
Just did an experiment with the Windows 7 registry: programmatically creating a key name with the 01 Hex (ASCII SOH) character in front of the word 'TEST' (in Delphi that is the string: #1'Test'). This is something that REGEDIT will not allow you to do by typing - even with ALT-Keypad operations.
Not only did it create the key, it showed the key in REGEDIT as having a 'wide' space where the #1 character resided.
Copying and pasting this new subkey name into TEXTPAD allowed me to verify that it was indeed still a #1 character.
I've never read anywhere that #1 is deemed 'printable', but in Windows anything other than 00 Hex can be put into a print string and literally anything can be sent to a printer, so I guess the MSDN statement about this limitation is an oxymoron: because in Windows being a character implies being printable, ergo unprintable character becomes ...well, meaningless.
Whilst you cannot type that #1 character directly into REGEDIT as a keyname (using the ALT-keypad-number entry method), you can nonetheless paste it back from TEXTPAD to REGEDIT as part of a rename-operation. REGEDIT will even complain if you paste it to rename another peer subkey to your original one because the 'specified key already exists'.
Interestingly, I also experimented with the character #256 (which is no longer ASCII, but is theoretically a Unicode Widechar, though not necessarily one deemed "printable" if any part of the input, storage or output mechanisms rejects it).
Whilst I could create such a key programmatically, and see a strange looking 'A' in REGEDIT, it became somewhat less reliable in cut-and-paste. I'm guessing that the clipboard operations and interactions with different applications make this sort of thing a very dubious practice since TEXTPAD, for instance, might be making assumptions about whether you were pasting byte characters or wide characters that don't quite match what REGEDIT put into the clipboard - and vice-versa. If the code behind these operations is just expecting ANSI strings or UTF-16 Wide-Strings, and is being given something different, including byte-order differences and UTF-8 or similar differences that it was not expecting, then things are very likely to go wrong.
Finally, I experimented with an attempt to inject a widechar with order 0FFFF hex. That did not actually give any visual presence of the character in REGEDIT - how "unprintable" is that, then? But the name did include the invisible character. I confirmed this by actually trying to create a separate peer subkey in REGEDIT without the offending character and as a result obtained what visually looked like two identical keys!
So in summary: It seems that you can put literally any character into a subkey name as long as it isn't a '\'. But it probably is not a very good idea to do so. And I think the term 'unprintable' in Windows generally only applies to 00 hex - and that is because it is usually used as a string terminator and therefore is a little bit difficult to 'send' through the registry API as a character!
What is quite worrying is the ability that this gives hackers to confuse and mislead. You could quite literally create a whole raft of registry subkeys that appear to have no names at all and can only be meaningfully used by applications, not humans. Yes, you could do that with space-characters, but some unicode characters (like FFFFh) have no width, and you can use any number of them together to create a unique and invisible name, or parts in a name! This makes them almost impossible to detect without using a laborious cut-and-paste, or a dedicated automated tool. In REGEDIT they all just look like identically named, or indeed unnamed, keys.
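A dedicated tool for this does not need to be elaborate. Here is a minimal Python sketch that makes such characters visible in a key name; the function names reveal and has_hidden_characters are hypothetical, and "hidden" here simply means whatever Python's str.isprintable() rejects, which is an assumption about what counts as invisible, not a Windows definition. Reading the names out of the registry is left out.

```python
def reveal(name: str) -> str:
    """Replace every character Python considers non-printable with a
    visible <U+XXXX> marker, leaving everything else untouched."""
    return "".join(
        ch if ch.isprintable() else "<U+{:04X}>".format(ord(ch))
        for ch in name
    )


def has_hidden_characters(name: str) -> bool:
    """True if `name` contains at least one non-printable character."""
    return not name.isprintable()


if __name__ == "__main__":
    for key_name in ("\x01Test", "TEST\uffff", "TEST"):
        print(reveal(key_name), has_hidden_characters(key_name))
```

Run over a list of subkey names, this distinguishes the two keys that look identical in REGEDIT: one shows up as TEST and the other as TEST<U+FFFF>.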