escaping string on Windows command line - windows

I'm trying to pass a string to a Win32 program from the command line so that it is printed unchanged.
Why do I have to escape
"AAA <BBB#pobox.com>" as """AAA <BBB#pobox.com>"""
but
"AAA <BBB#pobox.com>", (comma included) as "\"AAA ^<BBB#pobox.com^>\","
I see no consistency in the escaping rules for the Windows command line.
P.S. I'm trying to generate a .cmd file
Update:
I'm using a simple C program for testing, compiled with gcc with no additional object files linked. If I replace it with perl, the rules remain the same.
I'm trying to create a general escaping algorithm. It will generate a .cmd file that calls perl with output redirection. Currently the problem is that if the string contains an odd number of double quotes escaped with backslashes, the output redirection does not work. The same problem is described in the last comment to http://blogs.msdn.com/b/oldnewthing/archive/2010/09/17/10063629.aspx .
If I use "" as the escape for ", the argument is split on the space, so it results in two parameters instead of one. Also, "" has some artifacts.

On Windows there is no single way of obtaining a command line and parsing it. Programs have generally been left to deal with that themselves.
There is a recent post by Raymond Chen about the CommandLineToArgvW function that mentions various quoting rules, but they only apply if the program uses that particular function. http://blogs.msdn.com/b/oldnewthing/archive/2010/09/17/10063629.aspx
On Windows the command line is passed to the program unmolested (i.e. no wildcards are expanded) and the program then needs to deal with it. The programming language may provide a convenience that does some default argument parsing, and this might use a standard Windows function like CommandLineToArgvW, but even so the program could opt to read the unadulterated string itself, thereby skipping those standards.
This means you need to figure out the rules for the particular program you are trying to script, and then use them.
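For programs that do follow the CommandLineToArgvW / Microsoft C runtime conventions, one possible approach is to quote in two layers: first for the program's own parser, then for cmd.exe. The Python sketch below is only an illustration under assumptions: the function names are invented here, the target program is assumed to parse its command line with those conventions, and the generated .cmd file is assumed to run with delayed expansion off (so ! is left alone).

```python
import re

def quote_arg_msvcrt(arg: str) -> str:
    """Quote one argument for a parser that follows the CommandLineToArgvW rules."""
    if arg and not re.search(r'[ \t"]', arg):
        return arg  # nothing that needs quoting
    out = ['"']
    backslashes = 0
    for ch in arg:
        if ch == '\\':
            backslashes += 1  # hold back: meaning depends on what follows
        elif ch == '"':
            # backslashes before a quote must be doubled, then the quote escaped
            out.append('\\' * (backslashes * 2) + '\\"')
            backslashes = 0
        else:
            out.append('\\' * backslashes + ch)
            backslashes = 0
    # backslashes before the closing quote must be doubled as well
    out.append('\\' * (backslashes * 2) + '"')
    return ''.join(out)

def escape_for_cmd(text: str) -> str:
    """Caret-escape cmd.exe metacharacters (quotes included) and double % for a .cmd file."""
    return re.sub(r'([()^"<>&|])', r'^\1', text).replace('%', '%%')

def for_cmd_file(arg: str) -> str:
    """Hypothetical helper: apply both layers in the right order."""
    return escape_for_cmd(quote_arg_msvcrt(arg))

if __name__ == "__main__":
    print(for_cmd_file('"AAA <BBB#pobox.com>",'))
```

Because every double quote is caret-escaped, cmd.exe never enters its own quoted mode, so an odd number of quotes should no longer interfere with a redirection placed after the argument.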

I've just tried those as parameters to one of my own programs, and both versions (with or without the comma) can be escaped in both ways (using either """ or \" to escape the quotes). The only reason I can see that the < and > need to be escaped with ^ in the second version is that the command interpreter sees them as I/O redirections before passing the line to the application, because of the different way the quotes are escaped.
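The difference can be made visible with a small simulation of how cmd.exe tracks quotes. This is a simplified sketch, not cmd.exe's actual parser: it assumes that every " toggles quoted mode (cmd.exe gives \ no special meaning), that ^ outside quotes escapes the next character, and that only < > & | matter. The function name is made up for the example.

```python
def unquoted_metachars(line: str) -> list:
    """Return the redirection/pipe characters that cmd.exe would see outside quotes."""
    in_quotes = False
    found = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_quotes = not in_quotes  # a backslash before it makes no difference to cmd
        elif not in_quotes and ch == '^':
            i += 1  # the caret protects the following character
        elif not in_quotes and ch in '<>&|':
            found.append(ch)
        i += 1
    return found

if __name__ == "__main__":
    print(unquoted_metachars('"""AAA <BBB#pobox.com>"""'))       # []
    print(unquoted_metachars('"\\"AAA <BBB#pobox.com>\\","'))    # ['<', '>']
    print(unquoted_metachars('"\\"AAA ^<BBB#pobox.com^>\\","'))  # []
```

In the """ form the three quotes leave cmd.exe inside a quoted region, so < and > are protected. In the \" form the second quote closes the region, so the angle brackets are exposed unless they carry a caret.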

Related

Documented behavior for multiple backslashes in Windows paths

Apparently, Windows (or at least some part of Windows) ignores multiple backslashes in a path and treats them as a single backslash. For example, executing any of these commands from a command prompt or the Run window opens Notepad:
C:\Windows\System32\Notepad.exe
C:\Windows\System32\\Notepad.exe
C:\Windows\System32\\\Notepad.exe
C:\Windows\System32\\\\Notepad.exe
C:\\Windows\\System32\\Notepad.exe
C:\\\Windows\\\System32\\\Notepad.exe
This can even work with arguments passed on the command line:
notepad "C:\Users\username\Desktop\\\\myfile.txt"
Is this behavior documented anywhere? I tried several searches, and only found this SO question that even mentions the behavior.
Note: I am not asking about UNC paths (\\servername), the \\?\ prefix, or the \\" double-quote escape.
Note: I stumbled upon this behavior while working with a batch file. One line in the batch file looked something like this:
"%SOME_PATH%\myapp.exe"
After variable expansion, the command looked like:
"C:\Program Files\Vendor\MyApp\\myapp.exe"
To my surprise, the batch file executed as desired and did not fail with some kind of "path not found" error.
In most cases, Win32 API functions will accept a wide range of variations in the path name format, including converting a relative path into an absolute path based on the current directory or per-drive current directory, interpreting a single dot as "this directory" and two dots as "the parent directory", converting forward slashes into backslashes, and removing extraneous backslashes and trailing periods.
So something like
c:\documents\..\code.\\working\.\myprogram\\\runme.exe..
will wind up interpreted as
c:\code\working\myprogram\runme.exe
Some of this is documented, some is not. (As Hans points out, documenting this sort of workaround legitimizes doing it wrong.)
Note that this applies to the Win32 API, not necessarily to every application, or even every system component. In particular, the command interpreter has stricter rules when dealing with a long path, and Explorer will not accept the dot or double-dot and typically will not accept forward slashes. Also, the rules may be different for network drives if the server is not running Windows.
There is no consequence because you can't even name a file or folder with a backslash. So multiple consecutive backslashes will always be seen as one separator in the path.
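A purely string-based approximation of this normalization can be tried with Python's ntpath module. This is only a sketch: ntpath.normpath is not the Win32 implementation, it never touches the file system, and it is not assumed here to strip trailing periods. The wrapper name is invented for the example.

```python
import ntpath

def normalize_windows_path(path: str) -> str:
    """Collapse repeated separators, '.' and '..' using Windows path rules."""
    return ntpath.normpath(path)

if __name__ == "__main__":
    print(normalize_windows_path(r"C:\Windows\System32\\\Notepad.exe"))
    print(normalize_windows_path(r"c:\documents\..\code\working\.\myprogram\\\runme.exe"))
    print(normalize_windows_path("C:/Windows/System32//Notepad.exe"))
```

Since ntpath is available on every platform, the sketch can be run on a non-Windows machine as well.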

Using a loop in a batch file to convert all occurrences of a backslash into a forward slash

I am sure there are options using sed or other programming languages, but I would like to keep this final step as simple as possible. It has to be run on several systems, all of which have a Windows/DOS OS. This would be the last step in a multi-stage batch file that performs several specific tasks using "oldtext.txt" as input and ending with "newtext.txt".
The final output text file though has a single backslash on every line that needs to be converted to a forward slash. I need a way to add one more line to the script to convert that single "\" to a "/" so that no line in the file has a "\" anywhere.
Every loop I have tried ends up with an error or fails to do anything and with all the rest working perfectly I hate to start all over using a different method.
Thanks
Perhaps
set "var=%var:\=/%"
would be useful?

Is there a "standard" format for command line/shell help text?

If not, is there a de facto standard? Basically I'm writing a command line help text like so:
usage: app_name [options] required_input required_input2
options:
-a, --argument Does something
-b required Does something with "required"
-c, --command required Something else
-d [optlistitem1 optlistitem 2 ... ] Something with list
I made that from basically just reading the help text of various tools, but is there a list of guidelines or something? For example, do I use square brackets or parentheses? How to use spacing? What if the argument is a list? Thanks!
Typically, your help output should include:
Description of what the app does
Usage syntax, which:
Uses [options] to indicate where the options go
arg_name for a required, singular arg
[arg_name] for an optional, singular arg
arg_name... for a required arg of which there can be many (this is rare)
[arg_name...] for an arg for which any number can be supplied
note that arg_name should be a descriptive, short name, in lower, snake case
A nicely-formatted list of options, each:
having a short description
showing the default value, if there is one
showing the possible values, if that applies
Note that if an option can accept a short form (e.g. -l) or a long form (e.g. --list), include them together on the same line, as their descriptions will be the same
Brief indicator of the location of config files or environment variables that might be the source of command line arguments, e.g. GREP_OPTS
If there is a man page, indicate as such, otherwise, a brief indicator of where more detailed help can be found
Note further that it's good form to accept both -h and --help to trigger this message and that you should show this message if the user messes up the command-line syntax, e.g. omits a required argument.
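Many argument-parsing libraries generate help text along these lines automatically. As one illustration, here is a minimal sketch using Python's standard argparse module. The option names, defaults and descriptions are placeholders chosen for the example, not a recommendation.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose generated help follows the layout described above."""
    parser = argparse.ArgumentParser(
        prog="app_name",
        description="One sentence describing what the app does.",
        epilog="See the man page, if there is one, for more detailed help.",
    )
    # short and long form share one line and one description
    parser.add_argument("-a", "--argument", action="store_true",
                        help="does something")
    # possible values and the default are shown in the help
    parser.add_argument("-l", "--level", choices=["low", "high"], default="low",
                        help="how thorough to be (default: %(default)s)")
    parser.add_argument("input_file", help="required, singular argument")
    parser.add_argument("extra_files", nargs="*",
                        help="any number of further files")
    return parser

if __name__ == "__main__":
    build_parser().print_help()
```

argparse adds -h and --help by itself, and it prints the usage line with an error if a required argument is omitted.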
Take a look at docopt. It is a formal standard for documenting (and automatically parsing) command line arguments.
For example...
Usage:
my_program command --option <argument>
my_program [<optional-argument>]
my_program --another-option=<with-argument>
my_program (--either-that-option | <or-this-argument>)
my_program <repeating-argument> <repeating-argument>...
I think there is no standard syntax for command line usage, but most use this convention:
Microsoft Command-Line Syntax, IBM has similar Command-Line Syntax
Text without brackets or braces
Items you must type as shown
<Text inside angle brackets>
Placeholder for which you must supply a value
[Text inside square brackets]
Optional items
{Text inside braces}
Set of required items; choose one
Vertical bar {a|b}
Separator for mutually exclusive items; choose one
Ellipsis <file> …
Items that can be repeated
We are running Linux, a mostly POSIX-compliant OS, so the POSIX standard is the one to follow: Utility Argument Syntax.
An option is a hyphen followed by a single alphanumeric character,
like this: -o.
An option may require an argument (which must appear
immediately after the option); for example, -o argument or
-oargument.
Options that do not require arguments can be grouped after a hyphen, so, for example, -lst is equivalent to -t -l -s.
Options can appear in any order; thus -lst is equivalent to -tls.
Options can appear multiple times.
Options precede other nonoption
arguments: -lst nonoption.
The -- argument terminates options.
The - option is typically used to represent one of the standard input
streams.
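These rules can be tried out with Python's standard getopt module, which is modelled on the C getopt() function. The sketch below assumes a made-up utility with the flags -l, -s, -t and an option -o that takes an argument.

```python
import getopt

def parse(argv):
    """Parse argv with flags -l -s -t and option -o ARG; returns (options, operands)."""
    return getopt.getopt(argv, "lsto:")

if __name__ == "__main__":
    print(parse(["-lst", "nonoption"]))    # grouped flags
    print(parse(["-t", "-l", "-s"]))       # the same flags given separately
    print(parse(["-o", "argument"]))       # option argument as a separate word
    print(parse(["-oargument"]))           # option argument attached
    print(parse(["-l", "--", "-s"]))       # "--" ends the options
```

getopt.getopt stops at the first non-option argument, which matches the rule that options precede the other arguments.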
The GNU Coding Standard is a good reference for things like this. This section deals with the output of --help. In this case it is not very specific. You probably can't go wrong with printing a table showing the short and long options and a succinct description. Try to get the spacing between all arguments right for readability. You probably want to provide a man page (and possibly an info manual) for your tool to provide a more elaborate explanation.
Microsoft has their own Command Line Standard specification:
This document is focused at developers of command line utilities. Collectively, our goal is to present a consistent, composable command line user experience. Achieving that allows a user to learn a core set of concepts (syntax, naming, behaviors, etc) and then be able to translate that knowledge into working with a large set of commands. Those commands should be able to output standardized streams of data in a standardized format to allow easy composition without the burden of parsing streams of output text. This document is written to be independent of any specific implementation of a shell, set of utilities or command creation technologies; however, Appendix J - Using Windows Powershell to implement the Microsoft Command Line Standard shows how using Windows PowerShell will provide implementation of many of these guidelines for free.
There is no standard but http://docopt.org/ has created their version of a specification for help text for command line tools.
Yes, you're on the right track.
Yes, square brackets are the usual indicator for optional items.
Typically, as you have sketched out, there is a commandline summary at the top, followed by details, ideally with samples for each option. (Your example shows lines in between each option description, but I assume that is an editing issue, and that your real program outputs indented option listings with no blank lines in between. This would be the standard to follow in any case.)
A newer trend, (maybe there is a POSIX specification that addresses this?), is the elimination of the man page system for documentation, and including all information that would be in a manpage as part of the program --help output. This extra will include longer descriptions, concepts explained, usage samples, known limitations and bugs, how to report a bug, and possibly a 'see also' section for related commands.
I hope this helps.
It may be a bit off-topic, but I once wrote two small tools that make creation and maintenance of command line tools help pages more efficient:
The MAIN DOCLET that generates an HTML document for the main method of a Java program by processing Javadoc comments in the source code
The HTML2TXT tool that formats an HTML document as plain text (which is what we want our help texts to be)
I integrate these two tools in the MAVEN build process of my programs so they execute automatically on every build.
For example:
The relevant source file of my ZZFIND tool
The POM file that builds the project (and runs the two tools mentioned above)
Example output when ZZFIND is run with the --help command line option
Hope this is useful for others!?
I use the CSS formal notation for this.
Component values may be arranged into property values as follows:
Several juxtaposed words mean that all of them must occur, in the given order.
A bar (|) separates two or more alternatives: exactly one of them must occur.
A double bar (||) separates two or more options: one or more of them must occur, in any order.
A double ampersand (&&) separates two or more components, all of which must occur, in any order.
Brackets ([ ]) are for grouping.
Juxtaposition is stronger than the double ampersand, the double ampersand is stronger than the double bar, and the double bar is stronger than the bar. Thus, the following lines are equivalent:
a b | c || d && e f
[ a b ] | [ c || [ d && [ e f ]]]
Every type, keyword, or bracketed group may be followed by one of the following modifiers:
An asterisk (*) indicates that the preceding type, word, or group occurs zero or more times.
A plus (+) indicates that the preceding type, word, or group occurs one or more times.
A question mark (?) indicates that the preceding type, word, or group is optional.
A pair of numbers in curly braces ({A,B}) indicates that the preceding type, word, or group occurs at least A and at most B times.
If you need examples, see Formal definition sections on MDN; here is one for font: https://developer.mozilla.org/en-US/docs/Web/CSS/font#formal_syntax.
And here is a simple example from my own Pandoc's cheat sheet:
$ pandoc <input_file>.md --from [markdown|commonmark_x][-smart]? --to html --standalone --table-of-contents? --number-sections? [--css <style_sheet>.css]? --output <output_file>.html
I would follow official projects like tar as an example. In my opinion a help message needs to be as simple and descriptive as possible. Examples of use are good too. There is no real need for a "standard help".

Should environment variables that contain an executable path with spaces also contain the necessary quotes?

When defining an environment variable (on Windows for me, maybe there is a more general guideline)
set MY_TOOL=C:\DevTools\bin\mytool.exe
if the tool is located on a path with spaces
set MY_TOOL=C:\Program Files (x86)\Foobar\bin\mytool.exe
should the environment variable already contain the necessary quotes?
That is, should it read:
set MY_TOOL="C:\Program Files (x86)\Foobar\bin\mytool.exe"
instead of the above version without quotes?
Note: In light of Joey's answer, I really should narrow this question to the examples I gave. That is, environment variables that contain one single (executable / batch) tool to be invoked by a user or by another batch script.
Maybe the spaces should be escaped differently?
I'd say, do it without quotes and use them everywhere you use the variable:
set MY_TOOL=C:\Program Files (x86)\Foobar\bin\mytool.exe
"%MY_TOOL%" -someoption someargument somefile
Especially if you let the user set the value somewhere, I guess this is the safest option, since users tend not to surround the value with quotes.
If there are plenty of places where you use the variable you can of course redefine:
set MY_TOOL="%MY_TOOL%"
which makes things more resilient for you. Optionally you could detect whether there are quotes or not and add them if not present to be totally sure.
When your variable represents only a path to a directory and you want to append file names there, then the "no quotes" thing is even more important, otherwise you'd be building paths like
"C:\Program Files (x86)\Foobar\bin"\mytool.exe
or even:
""C:\Program Files (x86)\Foobar\bin"\my tool with spaces.exe"
which I doubt will parse correctly.
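The "detect quotes and add them if needed" idea, and the reason for storing directories unquoted, can be sketched outside of batch as well. The Python below is only an illustration with invented helper names. It works on the string value alone and assumes at most one pair of surrounding quotes.

```python
import ntpath

def strip_quotes(value: str) -> str:
    """Remove one pair of surrounding double quotes, if present."""
    value = value.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value

def quoted(value: str) -> str:
    """Return the value wrapped in exactly one pair of double quotes."""
    return '"' + strip_quotes(value) + '"'

def tool_in_dir(directory: str, filename: str) -> str:
    """Append a file name to a directory value first, then quote the whole path."""
    return quoted(ntpath.join(strip_quotes(directory), filename))

if __name__ == "__main__":
    print(quoted(r'C:\Program Files (x86)\Foobar\bin\mytool.exe'))
    print(tool_in_dir(r'"C:\Program Files (x86)\Foobar\bin"', 'my tool with spaces.exe'))
```

Quoting only at the point of use keeps the quotes around the complete path, rather than somewhere in the middle of it.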
The command shell can answer your question: type C:\Pro and hit the tab key.
Autocomplete will leave all spaces as-is and add quotes around the filename. So, this is what is "officially" expected.
(this assumes that autocomplete is turned on, I'm not sure whether the default is on or off, but most people have it on anyway, I guess)

Convert plain text to latex code programmatically

I'd like to take some user input text and quickly parse it to produce some latex code. At the moment, I'm replacing % with \% and \n with \n\n, but I'm wondering if there are other replacements I should be making to make the conversion from plain text to latex.
I'm not super worried about safety here (can you even write malicious latex code?), as this should only be used by the user to convert their own text into latex, and so they should probably be allowed to use their own latex markup in the pre-converted text, but I'd like to make sure the output doesn't include accidental latex commands if possible. If there's a good library to make such a conversion, I'd take a look.
Apparently, the following characters
\ { } $ ^ _ % ~ # &
are special in LaTeX, so you should make sure to escape them (prefixing with backslash will do for some of them, see Thomas' answer for special cases) or tell your users not to use them unless they deliberately want to use LaTeX commands (or a mix of both, depending on the character).
Some additional pitfalls:
Not every line break in the text might be intended as a new paragraph.
If your users use a language other than English (or Latin), you will need to \usepackage something that deals with the encoding (like utf8) or convert the characters yourself (e.g. ä -> \"a).
As dmckee points out, quotes also need to be treated separately.
EDIT: Since this has become the accepted answer, I also added the points raised in the other answers, so this is now a summary.
As Heinzi said, the following need attention:
\ { } $ ^ _ % ~ # &
Most can be escaped with a backslash, but \ becomes \textbackslash and ~ becomes \textasciitilde.
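A minimal Python sketch of such an escaping step is shown below. The function name is made up, and one detail goes beyond the list above: ^ is mapped to \textasciicircum{}, because \^ is the circumflex accent command in LaTeX rather than a literal caret. The replacement is done in a single pass so that the backslashes and braces it inserts are not escaped a second time.

```python
import re

# one replacement per special character; {} after the \text... commands
# keeps them from swallowing a following space
_LATEX_ESCAPES = {
    '\\': r'\textbackslash{}',
    '~': r'\textasciitilde{}',
    '^': r'\textasciicircum{}',
    '{': r'\{',
    '}': r'\}',
    '$': r'\$',
    '_': r'\_',
    '%': r'\%',
    '#': r'\#',
    '&': r'\&',
}

_SPECIAL = re.compile(r'[\\{}$^_%~#&]')

def escape_latex(text: str) -> str:
    """Escape LaTeX special characters in plain text, in one pass."""
    return _SPECIAL.sub(lambda m: _LATEX_ESCAPES[m.group()], text)

if __name__ == "__main__":
    print(escape_latex('50% of $5 & a_b'))
    print(escape_latex('C:\\temp'))
```

This treats all input as plain text, so users who deliberately want their own LaTeX markup to pass through would need a separate mechanism.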
I think you might want to leave line breaks alone. LaTeX handles these in exactly the same way as many content management systems; many people have come to expect that "double line break" = "paragraph break". Heck, even stackoverflow itself works that way.
(You cannot write malicious LaTeX code; everything that happens inside LaTeX stays inside LaTeX. Unless you explicitly enable write18 when running latex, but it's disabled by default.)
Heinzi has already shown most of the basic characters that need to be escaped, but the hard part here is ensuring that the quoting comes out right.
She said "He didn't do it".
needs to be converted to
She said ``He didn't do it''.
which looks easy in this trivial case, but is full of gotchas that require careful handling. For modest-size texts, I generally use a naive substitution generated in sed and diddle the results by hand. Things are both easier and harder if your "plain text" uses curly quotes.
Here "naive quote substitution" means that quotes followed by word characters are replaced by (one or two as appropriate) back ticks, and all others are replaced by (one or two) single-quotes ('). That catches most cases in prose, but you will have to clean up all the triple-quote cases by hand.
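The naive substitution can be sketched in Python instead of sed. One refinement is assumed here that the description above does not spell out: a quote only counts as an opening quote if it is not preceded by a word character, so that apostrophes in contractions such as "didn't" are left alone. The function name is invented for the example, and nested or triple quotes still need manual cleanup.

```python
import re

def naive_latex_quotes(text: str) -> str:
    """Turn straight quotes into LaTeX opening/closing quotes (naive heuristic)."""
    # opening double quote: followed by a word character, not preceded by one
    text = re.sub(r'(?<!\w)"(?=\w)', '``', text)
    # every remaining double quote is treated as a closing quote
    text = text.replace('"', "''")
    # opening single quote becomes a back tick; the rest stay as '
    text = re.sub(r"(?<!\w)'(?=\w)", '`', text)
    return text

if __name__ == "__main__":
    print(naive_latex_quotes('She said "He didn\'t do it".'))
```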
Another possible solution is to make all "special" characters into ordinary ones before inserting the user's text. That might avoid many headaches, but might also create new ones...
You can do this by changing the catcode of the character. The TeX Wikibook knows more.
\catcode`\$=12
will turn $ into an ordinary character. However, for some reason some characters don't come out as you'd expect. \ becomes a double open quote, { becomes a dash... and redefining } inside a group ({...}) makes TeX choke entirely.
Long story short: only recommended if you know what you're doing.
