Apparently, Windows (or at least some part of Windows) ignores multiple backslashes in a path and treats them as a single backslash. For example, executing any of these commands from a command prompt or the Run window opens Notepad:
C:\Windows\System32\Notepad.exe
C:\Windows\System32\\Notepad.exe
C:\Windows\System32\\\Notepad.exe
C:\Windows\System32\\\\Notepad.exe
C:\\Windows\\System32\\Notepad.exe
C:\\\Windows\\\System32\\\Notepad.exe
This can even work with arguments passed on the command line:
notepad "C:\Users\username\Desktop\\\\myfile.txt"
Is this behavior documented anywhere? I tried several searches, and only found this SO question that even mentions the behavior.
Note: I am not asking about UNC paths (\\servername), the \\?\ prefix, or the \\" double-quote escape.
Note: I stumbled upon this behavior while working with a batch file. One line in the batch file looked something like this:
"%SOME_PATH%\myapp.exe"
After variable expansion, the command looked like:
"C:\Program Files\Vendor\MyApp\\myapp.exe"
To my surprise, the batch file executed as desired and did not fail with some kind of "path not found" error.
In most cases, Win32 API functions will accept a wide range of variations in the path name format, including converting a relative path into an absolute path based on the current directory or per-drive current directory, interpreting a single dot as "this directory" and two dots as "the parent directory", converting forward slashes into backslashes, and removing extraneous backslashes and trailing periods.
So something like
c:\documents\..\code.\\working\.\myprogram\\\runme.exe..
will wind up interpreted as
c:\code\working\myprogram\runme.exe
Some of this is documented, some is not. (As Hans points out, documenting this sort of workaround legitimizes doing it wrong.)
Note that this applies to the Win32 API, not necessarily to every application, or even every system component. In particular, the command interpreter has stricter rules when dealing with a long path, and Explorer will not accept the dot or double-dot and typically will not accept forward slashes. Also, the rules may be different for network drives if the server is not running Windows.
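To make that list of normalizations concrete, here is a rough Python sketch that imitates them on a plain string. It is not the actual Win32 algorithm: approx_win32_normalize is a made-up name, the function relies on the standard library's ntpath.normpath for forward slashes, repeated separators, "." and "..", and then strips trailing periods from each component as an approximation of the behavior described above.

```python
import ntpath


def approx_win32_normalize(path: str) -> str:
    """Approximate the path clean-up described above (hypothetical helper)."""
    # normpath converts "/" to "\", collapses repeated separators,
    # and resolves "." and ".." components.
    normalized = ntpath.normpath(path)
    drive, rest = ntpath.splitdrive(normalized)
    parts = []
    for part in rest.split("\\"):
        # Keep "." and ".." as they are; strip trailing periods elsewhere.
        if part not in (".", ".."):
            part = part.rstrip(".")
        parts.append(part)
    return drive + "\\".join(parts)


print(approx_win32_normalize(
    r"c:\documents\..\code.\\working\.\myprogram\\\runme.exe.."))
# prints c:\code\working\myprogram\runme.exe
```

Because it uses ntpath rather than os.path, the sketch gives the same result on any platform. It never touches the file system, so it says nothing about how a particular application or server will treat the path.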
There is no consequence because you can't even name a file or folder with a backslash. So multiple consecutive backslashes will always be seen as one separator in the path.
I have pattern to search. Say "*.txt".
Now I have some files I do not want to list there. I believe they do not match this pattern.
But on windows, they do.
I know the tilde character is used in the short form of legacy 8.3 filenames; for example, LongFilename.json might become LONGFI~1.JSO. But I did not know that tildes are handled specially in file search patterns on Windows. They are. I cannot find any documentation about what they mean or how to match files the way I want.
My problem is NOT with short forms. Or I think it is not directly related to it.
I have a file "A.txt". I wanted a temporary file and named it "A.txt~". That is the Unix convention for backup files, which are usually not visible. But on Windows, such a name should have no special meaning by itself, only for my application.
Now I want list of "*.txt" files. Command
dir *.txt
returns, to my surprise, all the .txt~ files in the same directory as well, and I do not want them. I use FindFirstFile from the Win32 API, and I did not find anything about the tilde character in its documentation. FindFirstFile("*.txt", handle) also returns files such as "A.txt~". Can I use some flag to exclude them? I know I can add a special condition, like the one I have for "." and "..". How does the ~ operator work? A.txt~1 is also matched. Is everything after the tilde ignored? Is that a feature or a bug?
I am testing that on Windows 7 Professional, 64 edition, if that changes anything.
FindFirstFile also matches against short (8.3) names for legacy reasons, so the pattern *.txt matches any file whose 8.3 representation ends in .txt. That includes any long name of the form *.txtANYTHING, not just names containing the ~ character (see dir /x for the short names being matched against).
You will need to filter the results yourself in your FindNextFile enumeration.
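The filtering step can be sketched in Python. This is only an illustration of the idea, not Win32 code: match_long_names is a hypothetical helper that re-checks each returned name against the pattern using the long name only, case-insensitively. It uses fnmatch, whose wildcard syntax is close to but not identical to the Windows one (for example, fnmatch treats [...] as a character class).

```python
import fnmatch


def match_long_names(names, pattern):
    """Keep only names whose long name matches the pattern (hypothetical helper)."""
    lowered_pattern = pattern.lower()
    # fnmatchcase does no platform-specific case folding, so lower-case both
    # sides explicitly to imitate case-insensitive matching.
    return [name for name in names
            if fnmatch.fnmatchcase(name.lower(), lowered_pattern)]


found = ["A.txt", "A.txt~", "A.txt~1", "b.TXT", "notes.md"]
print(match_long_names(found, "*.txt"))
# prints ['A.txt', 'b.TXT']
```

In a C program the same check would go inside the FindFirstFile / FindNextFile loop, comparing the long file name in the returned find data against the extension you actually want.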
If you are searching for .txt files, for example, you can use the "kind:text" option in Windows search to exclude txt~ and similar files, since they are no longer a recognized type.
That works in the regular Windows search. I'm not 100% sure about the API, but it should be available there too.
When defining an environment variable (on Windows for me, maybe there is a more general guideline)
set MY_TOOL=C:\DevTools\bin\mytool.exe
if the tool is located on a path with spaces
set MY_TOOL=C:\Program Files (x86)\Foobar\bin\mytool.exe
should the environment variable already contain the necessary quotes?
That is, should it read:
set MY_TOOL="C:\Program Files (x86)\Foobar\bin\mytool.exe"
instead of the above version without quotes?
Note: In light of Joey's answer, I really should narrow this question to the examples I gave. That is, environment variables that contain one single (executable / batch) tool to be invoked by a user or by another batch script.
Maybe the spaces should be escaped differently?
I'd say, do it without quotes and use them everywhere you use the variable:
set MY_TOOL=C:\Program Files (x86)\Foobar\bin\mytool.exe
"%MY_TOOL%" -someoption someargument somefile
I guess this is the safest option, especially if you let the user set the value somewhere, since users usually do not surround the value with quotes.
If there are plenty of places where you use the variable you can of course redefine:
set MY_TOOL="%MY_TOOL%"
which makes things more resilient for you. Optionally you could detect whether there are quotes or not and add them if not present to be totally sure.
When your variable represents only a path to a directory and you want to append file names there, then the "no quotes" thing is even more important, otherwise you'd be building paths like
"C:\Program Files (x86)\Foobar\bin"\mytool.exe
or even:
""C:\Program Files (x86)\Foobar\bin"\my tool with spaces.exe"
which I doubt will parse correctly.
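The "detect whether there are quotes and add them if not present" idea, and the directory case, can be sketched in Python. The helper names unquote, quoted and join_tool are made up for this illustration; the sketch only handles one pair of surrounding double quotes and makes no attempt to cover other cmd.exe quoting rules.

```python
import ntpath


def unquote(value: str) -> str:
    """Remove one pair of surrounding double quotes, if present."""
    value = value.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def quoted(value: str) -> str:
    """Return the value wrapped in exactly one pair of double quotes."""
    return '"' + unquote(value) + '"'


def join_tool(directory: str, filename: str) -> str:
    """Append a file name to a directory value, quoting only the final result."""
    return quoted(ntpath.join(unquote(directory), filename))


print(quoted(r'C:\Program Files (x86)\Foobar\bin\mytool.exe'))
print(join_tool(r'"C:\Program Files (x86)\Foobar\bin"', 'my tool with spaces.exe'))
```

Storing the value unquoted and quoting at the point of use means the quotes are added exactly once, whichever form the user supplied.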
The command shell can answer your question: type C:\Pro and hit the tab key.
Autocomplete will leave all spaces as-is and add quotes around the filename. So, this is what is "officially" expected.
(this assumes that autocomplete is turned on, I'm not sure whether the default is on or off, but most people have it on anyway, I guess)
I'm trying to pass a string to a Win32 program from the command line so that it is printed without changes.
Why do I have to escape
"AAA <BBB#pobox.com>" as """AAA <BBB#pobox.com>"""
but
"AAA <BBB#pobox.com>", (comma included) as "\"AAA ^<BBB#pobox.com^>\","
I see no consistency in the escaping rules for the Windows command line.
P.S. I'm trying to generate a .cmd file
Update:
I'm using a simple C program for testing, compiled with gcc with no additional object files linked. If I replace it with perl, the rules remain the same.
I'm trying to create a general escaping algorithm. It will generate a .cmd file that calls perl with output redirection. Currently my problem is that if the string contains an odd number of double quotes escaped with backslashes, output redirection does not work. The same problem is described in the last comment to http://blogs.msdn.com/b/oldnewthing/archive/2010/09/17/10063629.aspx .
If I use "" as the escape for ", the string is split on the space, so it results in two parameters instead of one. Also, "" has some artifacts.
On Windows there is no single way of getting a command line and parsing it. Programs have generally been left to deal with that themselves.
There is a recent post by Raymond Chen about the CommandLineToArgvW function which mentions various rules about quoting, but they only apply if the program uses that particular function. http://blogs.msdn.com/b/oldnewthing/archive/2010/09/17/10063629.aspx
On Windows the command line is passed to the program unmolested (i.e. no wildcards expanded) and then the program needs to deal with it. The programming language may provide a convenience which does some default argument parsing, and this might use a standard Windows function like CommandLineToArgvW, but even so the program could opt to read the unadulterated string itself, thereby skipping those standards.
This means you need to figure out the rules for the particular program you are trying to script yourself and then use them.
I've just tried those as parameters to one of my own programs, and both versions (with or without the comma) can be escaped in both ways (using either """ or \" to escape the quotes). The only reason I can see that the < and > need to be escaped with ^ in the second version is that the command interpreter sees them as I/O redirections before passing the line to the application, because the different way of escaping the quotes leaves them outside a quoted region as far as the interpreter is concerned.
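The two layers (the program's own argument parsing, then the command interpreter) can be sketched in Python. This is an illustration under explicit assumptions, not a general solution: it assumes the target program splits its command line with the backslash and double-quote rules used by CommandLineToArgvW, and quote_arg and escape_for_cmd are hypothetical names. Percent signs and exclamation marks are deliberately not handled.

```python
def quote_arg(arg: str) -> str:
    """Quote one argument for a program that uses the backslash/quote rules
    described for CommandLineToArgvW (assumption: the target program does)."""
    if arg and not any(ch in arg for ch in ' \t\n\v"'):
        return arg  # nothing to protect
    out = ['"']
    backslashes = 0
    for ch in arg:
        if ch == "\\":
            backslashes += 1  # defer: meaning depends on what follows
        elif ch == '"':
            # Double the pending backslashes, then escape the quote itself.
            out.append("\\" * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            out.append("\\" * backslashes + ch)
            backslashes = 0
    # Backslashes before the closing quote must be doubled.
    out.append("\\" * (2 * backslashes) + '"')
    return "".join(out)


# Characters the command interpreter treats specially (% and ! not covered).
CMD_META = '^"<>&|()'


def escape_for_cmd(text: str) -> str:
    """Caret-escape interpreter metacharacters, including the quotes, so the
    interpreter never enters a quoted region and never sees a redirection."""
    return "".join("^" + ch if ch in CMD_META else ch for ch in text)


print(quote_arg('"AAA <BBB#pobox.com>",'))
# prints "\"AAA <BBB#pobox.com>\","
print(escape_for_cmd(quote_arg('"AAA <BBB#pobox.com>",')))
# prints ^"\^"AAA ^<BBB#pobox.com^>\^",^"
```

The first output matches the \" form from the question without the carets; the second adds a caret in front of every interpreter metacharacter, which is why < and > no longer act as redirections. Whether the target program then sees the intended string still depends on how that program parses its command line.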
PLEASE don't tell me why you think it's a bad idea. Just tell me if it's a workable idea.
I want to create files in a folder with names like the following:
asdf#qwerty.com.eml
abc+def#asdf.net.eml
abc_def#sasdf.at.eml
Is there some fundamental incompatibility in the characters allowed in email addresses and those allowed by a unix system?
I will be having a bash script reading the file names, subtracting the ".eml" ending, converting it into the "correct" usable format and sending an email to the address.
A simple test showed that it saved the above as files called
asdf\#qwerty.com.eml
abc+def\#asdf.net.eml
abc_def\#sasdf.at.eml
The only characters not allowed in a *nix filename are \0 and /, neither of which is allowed in an email address anyways. How your shell may handle symbols is another matter.
There are no characters disallowed in UNIX file names except / (directory separator) and ASCII 0 (string terminator), so there is no problem at a fundamental level.
Handling those file names in shell scripts is a different matter; it requires at least quoting every variable reference as "$FILENAME", and forgetting even one quotation will create a very rare, insidious bug. (Also, many other utilities will fail on strange characters such as | or newline unless you consistently use the -0 option, for example xargs -0 together with find -print0.)
So yes, technically your bad idea is workable :-)
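For comparison, here is a small Python sketch of the "read the file names and strip the .eml ending" step. It is not the asker's bash script; addresses_in is a hypothetical helper, and it sidesteps the quoting issue only because the names never pass through a shell.

```python
import os


def addresses_in(folder, suffix=".eml"):
    """Return the file names in folder that end in suffix, with it removed."""
    names = os.listdir(folder)  # names come back exactly as stored on disk
    return sorted(name[:-len(suffix)] for name in names
                  if name.endswith(suffix))


if __name__ == "__main__":
    # List the address-like names in the current directory.
    for address in addresses_in("."):
        print(address)
```

Converting each name into the "correct" usable address format and sending the mail is left out, since the question does not say how that is done.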
Short answer:
przemek#linux-634b:~/tmp/email> touch john.smith#example.com
przemek#linux-634b:~/tmp/email> ls
john.smith#example.com
Works perfectly;)
Long answer:
It depends on the filesystem you're using. See the Wikipedia entry which lists allowed characters in file names. Most UNIX file systems support all characters that can be included in e-mail addresses. Non-UNIX filesystems, such as FAT, however, may cause problems.
Note that your problems may come from improper escaping. Check how you are creating your files.
What was your "simple test"?
Typing abc and hitting tab?
The bash autocompletion will add a \ before every special character.
But this does not mean that the file is stored with a \ in its name.
Use ls to see the true name.
There is no problem with such file names on systems which treat file names as blobs and allow all byte sequences, e.g. Linux. But they are not portable to systems which treat file names as Unicode strings and disallow certain characters (Windows) or transform file names (Mac OS X, canonical decomposition).
If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
e.g.
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
There are almost no restrictions - apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
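As a sketch of that rule in Python: is_valid_linux_filename is a hypothetical helper that checks a single name (one path component, not a whole path). It ignores length limits and the special names "." and "..", so treat it as an illustration of the character rule only.

```python
def is_valid_linux_filename(name: str) -> bool:
    """True if name is non-empty and contains neither '/' nor a NUL character."""
    return name != "" and "/" not in name and "\0" not in name


print(is_valid_linux_filename("this-is-a-filename.png"))  # prints True
print(is_valid_linux_filename("?this-is-not.png"))        # prints True
print(is_valid_linux_filename("a/b.png"))                 # prints False
```

Note that the name starting with "?" is perfectly valid on Linux, which is why the Windows trick from the question does not carry over.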
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
Technically it's not invalid, but files with a dash (-) at the beginning of their name will cause you a lot of trouble, because such names conflict with command-line options.
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take for example Amarok. Recently I noticed that certain artists I had copied from my Windows machine were not appearing in the library. I checked and confirmed that the files were there, and then I noticed that certain characters in the folder names (named for the artist) were represented with a weird-looking square rather than an actual character.
In a shell terminal the filenames look even stranger: /Music/Albums/Einst$'\374'rzende\ Neubauten is an example of how strange.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions which I could then re-name with ASCII-only characters using Musicbrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
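The \374 in that name is the single byte 0xFC, which is "ü" in Latin-1 but is not valid UTF-8. A minimal Python sketch of recovering a readable name from such bytes follows. It is not the shell script I used; decode_legacy_name is a made-up helper, and Latin-1 as the fallback encoding is an assumption that happens to fit this example.

```python
def decode_legacy_name(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode a file name as UTF-8, falling back to a legacy single-byte encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: names that are not valid UTF-8 were written in `fallback`.
        return raw.decode(fallback)


print(decode_legacy_name(b"Einst\xfcrzende Neubauten"))
# prints Einstürzende Neubauten
```

In Python, os.listdir returns names as bytes when it is given a bytes path, which is one way to get at the raw names before deciding how to rename them.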
Overall this is a tricky area, and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux wherein certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.