How to escape powershell command arguments when calling from Ruby/Chef?

I'm currently running into a bog-standard Bobby Tables problem, but the environment is Chef + Ruby + powershell.
All of the solutions I've seen so far appear inadequate: they surround the arguments with quotes, but do not fully escape the arguments. Shellwords and shellescape look promising, but they appear to be bash-specific.
For example, I may want to construct in Chef this windows shell command:
.\foo.exe BAR="#{node['baz']}"
Generalizing from the SQL dev world I'd naively anticipate an interface something like this:
cmd = "foo.exe BAR=?"
args = [node['baz']]
run_command(cmd, args)
Where run_command would handle escaping any arguments. Instead I see interfaces that remind me of the bad old SQL days, when developers had to construct SQL as a string and escape any arguments "by hand".
Any pointers to best practices on how to proceed? Use system? Thanks!
Edit: to be clear, the argument baz above can be expected to include arbitrary text, including possibly any combination of characters. Think of it as being identical to the Bobby Tables problem.

The easiest generic solution is to use the array form for the execute resource's command property. This avoids all shell parsing and runs the command verbatim.
execute 'whatever' do
  command ['foo.exe', "BAR=#{node['baz']}"]
end
This doesn't work for script-style resources, though, which take a single string for the script chunk to run; that includes powershell_script. There you would need something tailored to PowerShell specifically, and I don't know its quoting rules well enough to say whether Shellwords' output would match them.

Related

How can I generate a list of every valid syntactic operator in Bash including input and output?

According to the Bash Reference Manual, the Bash scripting language consists of four distinct subclasses of syntactic elements:
built-in commands (alias, cd)
reserved words (if, function)
parameters and variables ($, IFS)
functions (abort, end-of-file - activated with keybindings such as Ctrl-d)
Apart from reading the manual, I became curious whether there is a programmatic way to list out or generate all such keywords, at least for one of the above categories. I think this could be useful in some contexts. Sometimes I wish I could see all the options available to me at any given moment, and having that information as data, instead of as a formatted manual, is convenient, focused, and editable, in case you want to strike out commands you know well, or ones that are too obscure for now.
My understanding is that Bash takes input on stdin and passes it to the running shell process. When code is distributed in production-ready form it is compiled, so it runs faster. Unlike a Python REPL, you don't have access to the Bash source code from within Bash, so writing a program that searches through source files for the various defined commands is not a very direct route. I mean that if you wanted to list all functions in Python, you could use the dir() function, which programmatically looks up the names in a namespace; I don't think Bash can do that. I think it doesn't have a special syntax in its source files that makes it easy to find and identify all the keywords. Instead, they are found when you simply enter them - cd will "find" the program cd because $PATH returns the path to that command - but there's no special way to discover them.
Or am I wrong? Technically, you could run a "brute force" search by generating every combination of symbols of every length and recording when you did not get "error: unknown command" as a response.
Is there any other clever programmatic way to do this?
I mean I want to see a list of every symbol or string that the bash compiler
Bash is not a compiler. It and every other shell I know are interpreters of various languages.
recognises and knows what to do with, including commands like “ls” or just a symbol like “*”. I also want to see the inputs and outputs for each symbol, i.e., some commands are executed in the shell prompt by themselves, but what data type do they return?
All commands executed by the shell have an exit status, which is a number between 0 and 255. This is as close to a "return type" as you get. Many of them also produce idiosyncratic output to one or two streams (a standard output stream and a standard error stream) under some conditions, and many have other effects on the shell environment or operating environment.
And some require a certain data type to standard input.
I can't think of a built-in utility whose expected input is well characterized as having a particular data type. That's not really a stream-oriented concept.
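For example, the exit status of the most recent command is available in the special parameter $?:
grep root /etc/passwd >/dev/null
echo $?    # prints 0: grep found a match
grep nosuchuser /etc/passwd >/dev/null
echo $?    # prints 1: grep found no match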
I want to do this just as a rigorous way to study the language.
If you want to rigorously study the language, then you should study its manual, where everything you describe has already been compiled. You might also want to study the POSIX shell command language manual for a slightly different perspective, which is more thorough in some areas, though what it documents differs in a few details from Bash's default behavior.
If you want to compile your own summary of Bash syntax and behavior, then those are the best source materials for such an effort.
You can get a list of all reserved words and syntactic elements of bash using this trick:
help -s '*' | cut -d: -f1
Or more accurately:
help -s \* | awk -F ': ' 'NR>2&&!/variables/{print $1}'
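If you want machine-readable lists rather than help summaries, the compgen builtin can also enumerate several of these categories directly:
compgen -k             # reserved words (if, then, while, ...)
compgen -b             # built-in commands (alias, cd, ...)
compgen -a             # currently defined aliases
compgen -A function    # currently defined shell functions
compgen -c             # every command name currently resolvable, including those on $PATH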

What is the rationale behind variable assignment without space in bash script

I am trying to write an automated process for AWS that requires some JSON processing and other things in a bash script. I am following a few blogs about bash scripting and I found this:
a=b
with the following note:
There is no space on either side of the equals ( = ) sign. We
also leave off the $ sign from the beginning of the variable name when
setting it
This is ugly and very difficult to read. Compared to other scripting languages, it is easy for a user to make a mistake when writing a bash script by leaving a space in between. I think everyone likes to write clean and readable code; this restriction is surely bad for readability.
Can you explain why? Explanations with examples are highly appreciated.
It's because otherwise the syntax would be ambiguous. Consider this command line:
cat = foo
Is that an assignment to the variable cat, or running the command cat with the arguments "=" and "foo"? Note that "=" and "foo" are both perfectly legal filenames, and therefore reasonable things to run cat on. Shell syntax settles this in favor of the command interpretation, so to avoid this interpretation you need to leave out the spaces. cat =foo has the same problem.
On the other hand, consider:
var= cat
Is that the command cat run with the variable var set to the empty string (i.e. a shorthand for var='' cat), or an assignment to the shell variable var? Again, the shell syntax favors the command interpretation so you need to avoid the temptation to add spaces.
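To see the command interpretation in action, note that an assignment prefix affects only the environment of that one command:
var=hello bash -c 'echo "$var"'    # prints: hello
echo "$var"                        # prints an empty line; var is unset in this shell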
There are many places in shell syntax where spaces are important delimiters. Another commonly-messed-up place is in tests, where if you leave out any of the spaces in:
if [ "$foo" = "$bar" ]
...it will lead to a different meaning, which might cause an error, or might just silently do the wrong thing.
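For example:
foo=abc; bar=xyz
["$foo" = "$bar" ]     # error: "[abc: command not found"
[ "$foo"="$bar" ]      # exit status 0: "abc=xyz" is one non-empty word, so the test is always true
[ "$foo" = "$bar" ]    # exit status 1: the intended three-argument string comparison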
What I'm getting at is that shell syntax does not allow you to arbitrarily add or remove spaces to improve readability. Don't even try, you'll just break things.
What you need to understand is that the shell language and syntax is old. Really old. The first version of the UNIX shell with variables was the Bourne shell which was designed and implemented in 1977. Back then, there were few precedents. (AFAIK, just the Thompson shell, which didn't support variables according to the manual entry.)
The rationale for the design decisions made in the 1970s is ... lost in the mists of time. The design decisions were made by Steve Bourne and colleagues working at Bell Labs on v6 UNIX. They probably had no idea that their decisions would still be relevant 40+ years later.
The Bourne shell was designed to be general purpose and simple to use ... compared with the alternative of writing programs in C. And small. It was an outstanding success in those terms.
However, any language that is successful has the "problem" that it gets widely adopted. And that makes it more difficult to fix any issues (real or perceived) that may arise. Any proposal to change a language needs to be balanced against the impact of that change on existing users / uses of the language. You don't want to break existing programs or scripts.
Irrespective of arguments about whether spaces around = should be allowed in a shell variable assignment, changing this would break millions of shell scripts. It is just not going to happen.
Of course, Linux (and UNIX before it) allow you to design and implement your own shell. You could (in theory) replace the default shell. It is just a lot of work.
And there is nothing stopping you from writing your scripts in another scripting language (e.g. Python, Ruby, Perl, etc) or designing and implementing your own scripting language.
In summary:
We cannot know for sure why they designed the shell with this syntax for variable assignment, but it is moot anyway.
Reference:
Evolution of shells in Linux: a history of shells.
It prevents ambiguity in a lot of cases. Otherwise, if you have a statement foo = bar, it could then either mean run the foo program with = and bar as arguments, or set the foo variable to bar. When you require that there are no spaces, now you've limited ambiguity to the case where a program name contains an equals sign, which is basically unheard of.
I agree with @StephenC, and here's some more context with sources:
Unix v6 from 1975 did not have an environment; there was just an exec syscall that took a program and a string array of arguments. The system sh, written by Thompson, did not support variables, only single-digit numbered arguments like $1 (probably why $12 to this day is interpreted as ${1}2).
Unix v7 from 1979, emboldened by advances in hardware, added a ton of features including a second string array to the exec call. The man page described it like this, which is still how it works to this day:
An array of strings called the environment is made available by exec(2) when a process begins. By convention these strings have the form name=value
The system sh, now written by Bourne, worked much like the v6 shell, but now allowed you to specify these environment strings in the same format in front of commands (because what other format would you use?). The simplistic parser essentially split words on spaces, and flagged a word as destined for a variable if it contained a = and all preceding characters were alphanumeric.
Thanks to Unix v7's incredible popularity, forks and clones copied a lot of things including this behavior, and that's what we're still seeing today.

Escaping for proper shell injection prevention

I need to run some shell commands from a Lua interpreter embedded into another Mac/Windows application where shell commands are the only way to achieve certain things, like opening a help page in a browser. If I have a list of arguments (which might be the result of user input), how can I escape each argument to prevent trouble?
Inspired by this article, an easy solution seems to be to escape all non-alphanumeric characters, on Unix-like systems with \, on Windows with ^. As far as I can tell, this prevents any argument from causing
execution of another command because of an intervening newline, ; (Unix) or & (Windows)
command substitution on Unix with $ or `
variable evaluation on Windows with %
redirection with <, | and >
In addition, any character that works as an escape character on the respective platform will itself be escaped properly.
This seems sound to me, but are there any pitfalls I might have missed? I know that in bash, \ followed by a newline will effectively remove the newline, which is not a problem here.
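For the Unix side, here is a minimal sketch of that strategy as a shell function (the name escape_unix is just for illustration; it assumes arguments without embedded newlines, since sed processes one line at a time):
escape_unix() {
  # prefix every non-alphanumeric character with a backslash
  printf '%s' "$1" | sed 's/[^a-zA-Z0-9]/\\&/g'
}
escape_unix 'foo; rm -rf /'    # prints: foo\;\ rm\ \-rf\ \/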
EDIT
My conclusion: there is no single mechanism that will work on both Windows and *nix just by swapping the escape character. It turns out it is not so straightforward to make sure that a Windows program actually sees the command-line arguments we want it to see, since splitting the command string into arguments on Windows is not handled by the shell but by the called program itself.
Therefore two layers of escaping need to be taken into account:
First, the Windows shell will process what we give it. What it might do is variable substitution at %, splitting into multiple commands at & or piping to another command at |.
Then, it will hand a single command string to the called program, which the program will split, ideally but not necessarily following the rules described by Microsoft.
Assuming it follows these rules, one can work one's way backwards, first escaping to these rules, then escaping further for the shell.
Calling sub-processes with dynamic arguments is prone to error and danger, and many languages don't provide good mechanisms to protect the developer. In Python, for example, os.system() is no longer recommended, instead the subprocess module provides a proper mechanism for safely making system calls. In particular, you pass subprocess.run() a list of arguments, rather than a single string, thereby avoiding needing to implement any error-prone escaping in the first place.
A quick search for a subprocess-like tool for Lua uncovered lua-subprocess, which doesn't appear to be actively developed, but might still be better than trying to implement proper escaping yourself.
If you must do so, take a look at the Python code for shlex.quote() (source) - it properly escapes an input string for use "in a shell command line":
# use single quotes, and put single quotes into double quotes
# the string $'b is then quoted as '$'"'"'b'
return "'" + s.replace("'", "'\"'\"'") + "'"
You ought to be able to replicate that in Lua.
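You can check the quoting scheme at a shell prompt; the concatenated pieces parse back to the original string:
printf '%s\n' '$'"'"'b'    # prints: $'b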

Single quotes and double quotes: How to have the same behaviour in unix and windows?

I have a batch script that calls an exe with some parameters.
Currently I am passing the parameters to my exe like this:
$>my_cmd.exe %*
One of the options of the my_cmd.exe program takes arguments that can contain spaces
$>my_cmd.bat --req "in: lava" (OK my prog receives in: lava)
$>my_cmd.bat --req 'in: lava' (NOK my program receives 'in: lava')
Users use indifferently single quotes or double quotes.
It works with double quotes because they are eaten at the batch-script level, but when users use single quotes, these are left in place and passed to my program.
my_cmd is multiplatform, and on Unix both single quotes and double quotes are special characters.
I would like to avoid having to do something specific in my_cmd program depending on the platform.
Is there a way to have the same behaviour in shell scripts and batch scripts?
For example, could the batch script eat the single quotes if they are present?
Tell me what you think would be the best solution.
Many thanks
On Windows, argument handling (and rules for quoting, globbing, etc) is the responsibility of the application. If your code uses anything except a single string containing all parameters with quotes intact, understand that this is because your development tools have done some preprocessing on the result of GetCommandLine. Therefore, for different quote handling, you need to look at your development tools, not at the OS. The best option is often to call GetCommandLine yourself and use your choice of library for processing it, instead of the one provided with your compiler.
That said, the Windows shell team has provided one of these libraries. See CommandLineToArgvW. But this is not part of the core OS, and using it is completely optional.
In addition, the batch processor does consider quotes when doing variable substitution. And that behavior is hard to change or disable, but it doesn't sound like it is the source of your problems.
Why not just change the quotes to doubles?
set args=%*
my_cmd.exe %args:'="%
(Batch's %var:old=new% expansion syntax replaces every occurrence of old with new, so here every single quote in %args% becomes a double quote.)

What are the most important shell/terminal concepts/commands for novice to learn?

Although I've had to dabble in shell scripting and commands, I still consider myself a novice, and I'm interested to hear from others what they consider to be crucial bits of knowledge.
Here's an example of something that I think is important:
I think understanding $PATH is crucial. In order to run psql, for instance, the PostgreSQL folder has to be added to the $PATH variable, a step easily overlooked by beginners.
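For example (the install location below is hypothetical; use wherever your package manager actually put the binaries):
export PATH="$PATH:/usr/local/pgsql/bin"    # hypothetical PostgreSQL bin directory
psql --version                              # now found without typing the full path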
The concept of pipes. The fact that you can easily redirect output and divide a complex task into several simple ones is crucial.
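A classic illustration, chaining three simple tools to answer one question:
cut -d: -f7 /etc/passwd | sort | uniq -c    # count how many accounts use each login shell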
Do yourself a favor and get this book: Learning the Bash Shell
Read and understand:
The Official Bash FAQs
Greg Wooledge's Bash FAQs and Bash Pitfalls and everything else on that site
If you're writing shell scripts, an important habit to get into is to always put double quotes around variable substitutions. That is, always write "$myvariable" (and similarly "$(mycommand)"), never plain $myvariable or $(mycommand), unless you understand exactly why you need to leave them out. (Again, the question is not “should I use quotes?”, it's “why would I want to omit the quotes?”)
The reason is that the shell does nasty things when you leave a variable substitution unquoted. (Those nasty things are called field splitting and pathname expansion. They're good in some situations, but almost never on the result of a variable or command substitution.)
If you leave out the quotes, your script may appear to work at first glance. This is because nasty things only happen if the value of the variable contains some special characters (whitespace, \, *, ? and [). This sort of latent bug tends to be revealed the day you create a file whose name contains a space and your script ends up deleting your source tree/thesis/baby pictures/...
So for example, if you have a variable $filename that contains the name of a file you want to pass to a command, always write
mycommand "$filename"
and not mycommand $filename.
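A quick way to see the difference:
filename='my thesis.txt'
touch "$filename"
ls -l $filename      # word splitting: ls looks for 'my' and 'thesis.txt', two errors
ls -l "$filename"    # one argument: lists 'my thesis.txt' as intended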
