I am looking for the quoting/splitting rules for a command passed to script -c command. The man page just says
-c, --command command: Run the command rather than an interactive shell.
but I want to make sure "command" is properly escaped.
The COMMAND argument is just a regular string that is processed by the shell as if it were an excerpt of a file. We may think of -c COMMAND as being functionally equivalent to
printf '%s' COMMAND > /tmp/command_to_execute.sh
sh /tmp/command_to_execute.sh
The form -c COMMAND is, however, superior to the version relying on an auxiliary file, because it avoids the race conditions associated with using an auxiliary file.
In the typical usage of the -c COMMAND option we pass COMMAND as a single-quoted string, as in this pseudo-code example:
sh -c '
do_some_complicated_tests "$1" "$2";
if something; then
proceed_this_way "$1" "$2";
else
proceed_that_way "$1" "$2";
fi' ARGV0 ARGV1 ARGV2
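For instance, here is a concrete, runnable sketch of how the trailing arguments populate the positional parameters (the names myprog, foo and bar are invented for the illustration):
sh -c 'printf "%s received %s and %s\n" "$0" "$1" "$2"' myprog foo bar
myprog received foo and bar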
If COMMAND must itself contain single-quoted strings, we can rely on printf to build the COMMAND string, but this can be tedious. An example of this technique is illustrated
by the overcomplicated grep-like COMMAND defined here:
% AWKSCRIPT='$0 ~ expr {print($0)}'
% COMMAND=$(printf 'awk -v expr="$1" \047%s\047' "$AWKSCRIPT")
% sh -c "$COMMAND" print_matching 'tuning' < /usr/share/games/fortune/freebsd-tips
"man tuning" gives some tips how to tune performance of your FreeBSD system.
Recall that \047 is the octal escape representing the ASCII code of the single-quote character.
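An alternative that avoids printf is the standard quote-splicing idiom: close the single-quoted string, emit an escaped \', and reopen it. Applied to the same grep-like COMMAND:
% sh -c 'awk -v expr="$1" '\''$0 ~ expr {print($0)}'\''' print_matching 'tuning' < /usr/share/games/fortune/freebsd-tips
This builds exactly the same COMMAND string as the printf construction above; which of the two is more readable is a matter of taste.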
As a side note, these constructions are quite common in Makefiles, where they can replace shell functions.
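As a sketch of that idea (the target and file names here are invented), such a construction can stand in for a shell function in a Makefile, remembering that make requires a literal $ to be written as $$:
COMPRESS = sh -c 'gzip -c "$$1" > "$$1.gz"' _
archive: data.txt
	$(COMPRESS) data.txt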
Related
This is almost the exact same question as in this post, except that I do not want to use eval.
To make a long question short: I want to execute the command echo aaa | grep a by first storing it in a string variable, Command='echo aaa | grep a', and then running it without using eval.
In the post above, the selected answer uses eval, and that works for me too. What concerns me is that there are plenty of warnings about eval below it, followed by some attempts to circumvent it; however, none of them solve my problem (which is essentially the OP's). I have commented below those attempts, but since that post has been around for a long time, I thought it better to ask the question again with the restriction of not using eval.
Concrete Example
What I want is a shell script that runs my command when I am happy:
#!/bin/bash
# This script run-if.sh runs the command when I am happy
# Warning: the following script does not work (on purpose)
if [ "$1" == "I-am-happy" ]; then
"$2"
fi
$ run-if.sh I-am-happy [insert-any-command]
Your sample usage can't ever work with an assignment, because assignments are scoped to the current process and its children. Because there's no reason to try to support assignments, things suddenly get far easier:
#!/bin/sh
if [ "$1" = "I-am-happy" ]; then
shift; "$@"
fi
This can then use all the usual techniques to run shell pipelines, such as:
run-if-happy "$happiness" \
sh -c 'echo "$1" | grep "$2"' _ "$untrustedStringOne" "$untrustedStringTwo"
Note that we're passing the execve() syscall an argv with six elements:
sh (the shell to run; change to bash etc if preferred)
-c (telling the shell that the following argument is the code for it to run)
echo "$1" | grep "$2" (the code for sh to parse)
_ (a constant which becomes $0)
...whatever the shell variable untrustedStringOne contains... (which becomes $1)
...whatever the shell variable untrustedStringTwo contains... (which becomes $2)
Note here that echo "$1" | grep "$2" is a constant string -- in single-quotes, with no parameter expansions or command substitutions -- and that untrusted values are passed into the slots that fill in $1 and $2, out-of-band from the code being evaluated; this is essential to have any kind of increase in security over what eval would give you.
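For contrast, here is the unsafe shape this is guarding against (a deliberately bad sketch, reusing the same variable names): interpolating the untrusted values into the code string itself.
run-if-happy "$happiness" \
  sh -c "echo $untrustedStringOne | grep $untrustedStringTwo"
In that version a value such as $(reboot) or x; rm -rf ~ inside either variable is parsed and executed as code rather than treated as data.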
I am trying to run a script on multiple lists of files while also passing arguments in parallel. I have file_list1.dat, file_list2.dat, file_list3.dat. I would like to run script.sh which accepts 3 arguments: arg1, arg2, arg3.
For one run, I would do:
sh script.sh file_list1.dat $arg1 $arg2 $arg3
I would like to run this command in parallel for all the file lists.
My attempt:
Ncores=4
ls file_list*.dat | xargs -P "Ncores" -n 1 [sh script.sh [$arg1 $arg2 $arg3]]
This results in the error: invalid number for -P option. I think the order of this command is wrong.
My 2nd attempt:
echo $arg1 $arg2 $arg3 | xargs ls file_list*.dat | xargs -P "$Ncores" -n 1 sh script.sh
But this results in the error: xargs: ls: terminated by signal 13
Any ideas on what the proper syntax is for passing arguments to a bash script with xargs?
I'm not sure I understand exactly what you want to do. Is it to execute something like these commands, but in parallel?
sh script.sh $arg1 $arg2 $arg3 file_list1.dat
sh script.sh $arg1 $arg2 $arg3 file_list2.dat
sh script.sh $arg1 $arg2 $arg3 file_list3.dat
...etc
If that's right, this should work:
Ncores=4
printf '%s\0' file_list*.dat | xargs -0 -P "$Ncores" -n 1 sh script.sh "$arg1" "$arg2" "$arg3"
The two major problems in your version were that you were passing "Ncores" as a literal string (rather than using $Ncores to get the value of the variable), and that you had [ ] around the command and arguments (which just isn't any relevant piece of shell syntax). I also added double-quotes around all variable references (a generally good practice), and used printf '%s\0' (and xargs -0) instead of ls.
Why did I use printf instead of ls? Because ls isn't doing anything useful here that printf or echo or whatever couldn't do as well. You may think of ls as the tool for getting lists of filenames, but in this case the wildcard expression file_list*.dat gets expanded to a list of files before the command is run; all ls would do with them is look at each one, say "yep, that's a file" to itself, then print it. echo could do the same thing with less overhead. But with either ls or echo the output can be ambiguous if any filenames contain spaces, quotes, or other funny characters. Some versions of ls attempt to "fix" this by adding quotes or something around filenames with funny characters, but that might or might not match how xargs parses its input (if it happens at all).
But printf '%s\0' is unambiguous and predictable -- it prints each string (filename in this case) followed by a NULL character, and that's exactly what xargs -0 takes as input, so there's no opportunity for confusion or misparsing.
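A quick demonstration, using two argument strings (invented for the test) that would trip up whitespace-based splitting:
printf '%s\0' 'a file' "it's" | xargs -0 -n 1 printf '[%s]\n'
[a file]
[it's]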
Well, ok, there is one edge case: if there aren't any matching files, the wildcard pattern will just get passed through literally, and it'll wind up trying to run the script with the unexpanded string "file_list*.dat" as an argument. If you want to avoid this, use shopt -s nullglob before this command (and shopt -u nullglob afterward, to get back to normal mode).
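One way to write that guard, reusing the command from above (bash syntax, since shopt and arrays are bashisms):
shopt -s nullglob
files=(file_list*.dat)
shopt -u nullglob
if [ "${#files[@]}" -gt 0 ]; then
    printf '%s\0' "${files[@]}" | xargs -0 -P "$Ncores" -n 1 sh script.sh "$arg1" "$arg2" "$arg3"
fi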
Oh, and one more thing: sh script.sh isn't the best way to run scripts. Give the script a proper shebang line at the beginning (#!/bin/sh if it uses only basic shell features, #!/bin/bash or #!/usr/bin/env bash if it uses any bashisms), and run it with ./script.sh.
I'm trying to understand -c option for bash better. The man page says:
-c: If the -c option is present, then commands are read from the first non-option argument command_string. If there are arguments after the command_string, they are assigned to the positional parameters, starting with $0.
I'm having trouble understanding what this means.
If I do the following command with and without bash -c, I get the same result (example from http://www.tldp.org/LDP/abs/html/abs-guide.html):
$ set w x y z; IFS=":-;"; echo "$*"
w:x:y:z
$ bash -c 'set w x y z; IFS=":-;"; echo "$*"'
w:x:y:z
bash -c isn't as interesting when you're already running bash. Consider, on the other hand, the case when you want to run bash code from a Python script:
#!/usr/bin/env python
import subprocess
fileOne='hello'
fileTwo='world'
p = subprocess.Popen(['bash', '-c', 'diff <(sort "$1") <(sort "$2")',
'_', # this is $0 inside the bash script above
fileOne, # this is $1
fileTwo, # and this is $2
])
print p.communicate() # run that bash interpreter and wait for it to finish
Here, because we're using bash-only syntax (<(...)), you couldn't run this with anything that used POSIX sh by default, which is the case for subprocess.Popen(..., shell=True); using bash -c thus provides access to capabilities that wouldn't otherwise be available without playing with FIFOs yourself.
Incidentally, this isn't the only way to do that: One could also use bash -s, and pass code in on stdin. Below, that's being done not from Python but POSIX sh (/bin/sh, which likewise is not guaranteed to have <(...) available):
#!/bin/sh
# ...this is POSIX sh code, not bash code; you can't use <() here
# ...so, if we want to do that, one way is as follows:
fileOne=hello
fileTwo=world
bash -s "$fileOne" "$fileTwo" <<'EOF'
# the inside of this heredoc is bash code, not POSIX sh code
diff <(sort "$1") <(sort "$2")
EOF
The -c option finds its most important uses when bash is launched by another program, and especially when the code to be executed may include redirections, pipelines, shell built-ins, shell variable assignments, and/or non-trivial lists. On systems where /bin/sh is a link to bash, it is specifically what backs the C library's system() function, which runs /bin/sh -c command.
Equivalent behavior is much trickier to implement on top of fork / exec without using -c, though not altogether impossible.
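As a small illustration of why -c earns its keep there, a single -c string can carry a variable assignment, a pipeline, and a redirection at once, none of which a bare execve() of the target program could express (the output file name is invented):
bash -c 'greeting=hello; printf "%s\n" "$greeting" | tr a-z A-Z > shout.txt'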
How to execute BASH code from outside the BASH shell?
The answer is: using the -c option, which makes BASH execute whatever has been passed as the argument to -c.
So, yes, that is the purpose of this option: to execute arbitrary BASH code, just invoked in another way.
I'm trying to execute the following commands:
mkdir 'my dir'
CMD="ls 'my dir'"
RESULT=$($CMD)
This results in:
ls: 'my: No such file or directory
ls: dir': No such file or directory
Using "set -x" before the second command reveals that the command that's actually being issued is:
++ ls ''\''my' 'dir'\'''
This is obviously abstracted from what I was actually trying to do; this code on its own doesn't serve any purpose. But my question is why does bash tokenize the quoted string like this and how can I make it stop?
(Almost) all languages differentiate between code and data:
args="1, 2"
myfunc(args) != myfunc(1, 2)
The same is true in bash. Putting single quotes in a literal string will not make bash interpret them.
The correct way of storing a program name and arguments (aka a simple command) is using an array:
cmd=(ls 'my dir')
result=$("${cmd[@]}")
Bash will automatically split words on spaces unless you double quote, which is why your example sometimes appears to work. This is surprising and error prone, and the reason why you should always double quote your variables unless you have a good reason not to.
It's also possible to get bash to interpret a string as if it was bash code entirely, using eval. This is frequently recommended, but almost always wrong.
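To see why, compare what happens once the string is no longer fully trusted (the second assignment below is a deliberately hostile example):
cmd="ls 'my dir'"
eval "$cmd"                       # works: eval re-parses the quotes
cmd="ls 'my dir'; rm -rf \$HOME"  # the data has silently become code
Running eval "$cmd" after the second assignment would execute the rm as well.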
To provide another approach -- one that works on POSIX systems -- xargs can do this parsing for you, as long as you can guarantee that the argument list is short enough that it doesn't get split into multiple separate commands:
CMD="ls 'my dir'"
printf '%s\n' "$CMD" | xargs sh -c '"$#"' sh
Mind, to do this securely (against an attacker who intentionally generates a string that goes over the maximum argv length to cause xargs to split it into multiple commands) you'd want to break out the first word of CMD to be something known/trusted, and parameterize only the following arguments. For example:
args="'my dir' 'other dir'"
printf '%s\n' "$args" | xargs sh -c 'ls "$#"' sh
...or simpler, at that point...
printf '%s\n' "$args" | xargs ls
Solution 1: Use sh -c
mkdir 'my dir'
CMD="ls 'my dir'" # careful: your example was missing a '
RESULT=$(sh -c "$CMD")
Solution 2: Declare CMD as an array
mkdir 'my dir'
CMD=(ls 'my dir') # array with 2 elements
RESULT=$("${CMD[@]}")
I want to inject a transparent wrapper command around each shell command that a makefile runs. Something like the time shell command. (However, not the time command; this is a completely different command.)
Is there a way to specify some sort of wrapper or decorator for each shell command that gmake will issue?
Kind of. You can tell make to use a different shell.
SHELL = myshell
where myshell is a wrapper like
#!/bin/sh
time /bin/sh "$@"
However, the usual way to do that is to prefix a variable to all command calls. While I can't see any show-stopper for the SHELL approach, the prefix approach has the advantage that it's more flexible (you can specify different prefixes for different commands, and override prefix values on the command line), and could be visibly faster.
# Set Q=@ to not display command names
TIME = time
foo:
$(Q)$(TIME) foo_compiler
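Because TIME is an ordinary make variable, the wrapper can also be changed or dropped per invocation, without editing the Makefile:
make TIME='strace -f' foo    # wrap each command in strace instead
make TIME= foo               # no wrapper at all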
And here's a complete, working example of a shell wrapper:
#!/bin/bash
RESULTZ=/home/rbroger1/repos/knl/results
if [ "$1" == "-c" ] ; then
shift
fi
strace -f -o `mktemp $RESULTZ/result_XXXXXXX` -e trace=open,stat64,execve,exit_group,chdir /bin/sh -c "$@" | awk '{if (match($0, /Process PID=[0-9]+ runs in (64|32) bit/) == 0) {print $0}}'
# EOF
I don't think there is a way to do what you want within GNUMake itself.
I have done things like modify the PATH environment variable in the Makefile so that a directory containing my script, linked under the names of all the bins I wanted wrapped, was searched before the actual bins. The script would then look at how it was called and exec the actual bin with the wrapper command, i.e.
exec time "$0" "$@"
These days I usually just update the targets in the Makefile itself. Keeping all your modifications to one file is usually better IMO than managing a directory of links.
Update
I defer to Gilles's answer. It's a better answer than mine.
The program that GNU make(1) uses to run commands is specified by the SHELL make variable. It will run each command as
$SHELL -c <command>
You cannot get make to not put the -c in, since that is required for most shells. -c is passed as the first argument ($1) and <command> is passed as a single argument string as the second argument ($2).
You can write your own shell wrapper that prepends the command that you want, taking into account the -c:
#!/bin/sh
eval time "$2"
That will cause time to be run in front of each command. You need eval since $2 will often not be a single command and can contain all sorts of shell metacharacters that need to be expanded or processed.
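To wire the wrapper up, point make's SHELL variable at it (the path below is an assumption for the example):
SHELL = /usr/local/bin/timeshell
or, for a single invocation:
make SHELL=/usr/local/bin/timeshell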