Bash tokenizing quoted string with spaces as individual words - bash

I'm trying to execute the following commands:
mkdir 'my dir'
CMD="ls 'my dir'"
RESULT=$($CMD)
This results in:
ls: 'my: No such file or directory
ls: dir': No such file or directory
Using "set -x" before the second command reveals that the command that's actually being issued is:
++ ls ''\''my' 'dir'\'''
This is obviously abstracted from what I was actually trying to do; this code on its own doesn't serve any purpose. But my question is why does bash tokenize the quoted string like this and how can I make it stop?

(Almost) all languages differentiate between code and data:
args="1, 2"
myfunc(args) != myfunc(1, 2)
The same is true in bash. Putting single quotes in a literal string will not make bash interpret them.
The correct way of storing a program name and arguments (aka a simple command) is using an array:
cmd=(ls 'my dir')
result=$("${cmd[#]}")
Bash will automatically split words on spaces unless you double quote, which is why your example sometimes appears to work. This is surprising and error prone, and the reason why you should always double quote your variables unless you have a good reason not to.
It's also possible to get bash to interpret a string as if it was bash code entirely, using eval. This is frequently recommended, but almost always wrong.

To provide another approach -- one that works on POSIX systems -- xargs can do this parsing for you, as long as you can guarantee that the argument list is short enough that it doesn't get split into multiple separate commands:
CMD="ls 'my dir'"
printf '%s\n' "$CMD" | xargs sh -c '"$#"' sh
Mind, to do this securely (against an attacker who intentionally generates a string that goes over the maximum argv length to cause xargs to split it into multiple commands) you'd want to break out the first word of CMD to be something known/trusted, and parameterize only the following arguments. For example:
args="'my dir' 'other dir'"
printf '%s\n' "$args" | xargs sh -c 'ls "$#"' sh
...or simpler, at that point...
printf '%s\n' "$args" | xargs ls

Solution 1: Use sh -c
mkdir 'my dir'
CMD="ls 'my dir'" # careful: your example was missing a '
RESULT=$(sh -c "$CMD")
Solution 2: Declare CMD as an array
mkdir 'my dir'
CMD=(ls 'my dir') # array with 2 elements
RESULT=$("${CMD[#]}")

Related

What is the meaning of "${psql[#]}" in this script?

I came across a script that is supposed to set up postgis in a docker container, but it references this "${psql[#]}" command in several places:
#!/bin/sh
# Perform all actions as $POSTGRES_USER
export PGUSER="$POSTGRES_USER"
# Create the 'template_postgis' template db
"${psql[#]}" <<- 'EOSQL'
CREATE DATABASE template_postgis;
UPDATE pg_database SET datistemplate = TRUE WHERE datname = 'template_postgis';
EOSQL
I'm guessing it's supposed to use the psql command, but the command is always empty so it gives an error. Replacing it with psql makes the script run as expected. Is my guess correct?
Edit: In case it's important, the command is being run in a container based on postgres:11-alpine.
$psql is supposed to be an array containing the psql command and its arguments.
The script is apparently expected to be run from here, which does
psql=( psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --no-password )
and later sources the script in this loop:
for f in /docker-entrypoint-initdb.d/*; do
case "$f" in
*.sh)
# https://github.com/docker-library/postgres/issues/450#issuecomment-393167936
# https://github.com/docker-library/postgres/pull/452
if [ -x "$f" ]; then
echo "$0: running $f"
"$f"
else
echo "$0: sourcing $f"
. "$f"
fi
;;
*.sql) echo "$0: running $f"; "${psql[#]}" -f "$f"; echo ;;
*.sql.gz) echo "$0: running $f"; gunzip -c "$f" | "${psql[#]}"; echo ;;
*) echo "$0: ignoring $f" ;;
esac
echo
done
See Setting an argument with bash for the reason to use an array rather than a string.
The #!/bin/sh and the [#] are incongruous. This is a bash-ism, where the psql variable is an array. This literal quote dollarsign psql bracket at bracket quote is expanded into "psql" "array" "values" "each" "listed" "and" "quoted" "separately." It's the safer way, e.g., to accumulate arguments to a command where any of them might have spaces in them.
psql=(/foo/psql arg arg arg) is the best way to define the array you need there.
It might look obscure, but it would work like so...
Let's say we have a bash array wc, which contains a command wc, and an argument -w, and we feed that a here document with some words:
wc=(wc -w)
"${wc[#]}" <<- words
one
two three
four
words
Since there are four words in the here document, the output is:
4
In the quoted code, there needs to be some prior point, (perhaps a calling script), that does something like:
psql=(psql -option1 -option2 arg1 arg2 ... )
As to why the programmer chose to invoke a command with an array, rather than just invoke the command, I can only guess... Maybe it's a crude sort of operator overloading to compensate for different *nix distros, (i.e. BSD vs. Linux), where the local variants of some necessary command might have different names from the same option, or even use different commands. So one might check for BSD or Linux or a given version, and reset psql accordingly.
The answer from #Barmar is correct.
The script was intended to be "sourced" and not "executed".
I faced the same problem and came to the same answer after I read that it had been reported here and fixed by "chmod".
https://github.com/postgis/docker-postgis/issues/119
Therefore, the fix is to change the permissions.
This can be done either in your git repository:
chmod -x initdb-postgis.sh
or add a line to your docker file.
RUN chmod -x /docker-entrypoint-initdb.d/10_postgis.sh
I like to do both so that it is clear to others.
Note: if you are using git on windows then permission can be lost. Therefore, "chmod" in the docker file is needed.

Using xargs to run bash scripts on multiple input lists with arguments

I am trying to run a script on multiple lists of files while also passing arguments in parallel. I have file_list1.dat, file_list2.dat, file_list3.dat. I would like to run script.sh which accepts 3 arguments: arg1, arg2, arg3.
For one run, I would do:
sh script.sh file_list1.dat $arg1 $arg2 $arg3
I would like to run this command in parallel for all the file lists.
My attempt:
Ncores=4
ls file_list*.dat | xargs -P "$Ncores" -n 1 [sh script.sh [$arg1 $arg2 $arg3]]
This results in the error: invalid number for -P option. I think the order of this command is wrong.
My 2nd attempt:
echo $arg1 $arg2 $arg3 | xargs ls file_list*.dat | xargs -P "$Ncores" -n 1 sh script.sh
But this results in the error: xargs: ls: terminated by signal 13
Any ideas on what the proper syntax is for passing arguments to a bash script with xargs?
I'm not sure I understand exactly what you want to do. Is it to execute something like these commands, but in parallel?
sh script.sh $arg1 $arg2 $arg3 file_list1.dat
sh script.sh $arg1 $arg2 $arg3 file_list2.dat
sh script.sh $arg1 $arg2 $arg3 file_list3.dat
...etc
If that's right, this should work:
Ncores=4
printf '%s\0' file_list*.dat | xargs -0 -P "$Ncores" -n 1 sh script.sh "$arg1" "$arg2" "$arg3"
The two major problems in your version were that you were passing "Ncores" as a literal string (rather than using $Ncores to get the value of the variable), and that you had [ ] around the command and arguments (which just isn't any relevant piece of shell syntax). I also added double-quotes around all variable references (a generally good practice), and used printf '%s\0' (and xargs -0) instead of ls.
Why did I use printf instead of ls? Because ls isn't doing anything useful here that printf or echo or whatever couldn't do as well. You may think of ls as the tool for getting lists of filenames, but in this case the wildcard expression file_list*.dat gets expanded to a list of files before the command is run; all ls would do with them is look at each one, say "yep, that's a file" to itself, then print it. echo could do the same thing with less overhead. But with either ls or echo the output can be ambiguous if any filenames contain spaces, quotes, or other funny characters. Some versions of ls attempt to "fix" this by adding quotes or something around filenames with funny characters, but that might or might not match how xargs parses its input (if it happens at all).
But printf '%s\0' is unambiguous and predictable -- it prints each string (filename in this case) followed by a NULL character, and that's exactly what xargs -0 takes as input, so there's no opportunity for confusion or misparsing.
Well, ok, there is one edge case: if there aren't any matching files, the wildcard pattern will just get passed through literally, and it'll wind up trying to run the script with the unexpanded string "file_list*.dat" as an argument. If you want to avoid this, use shopt -s nullglob before this command (and shopt -u nullglob afterward, to get back to normal mode).
Oh, and one more thing: sh script.sh isn't the best way to run scripts. Give the script a proper shebang line at the beginning (#!/bin/sh if it uses only basic shell features, #!/bin/bash or #!/usr/bin/env bash if it uses any bashisms), and run it with ./script.sh.

In a Bash script, how to use a variable to set multiple command options?

I want to store options to arbitrary commands as strings in bash so that I can do e.g.
presets_A='-A'
presets_B='-A -l -F'
ls $presets_A
ls $presets_B
The first one works, the socond gives ls: invalid option -- ' '.
The same happens when I try to store the entire command in a string variable (as opposed to a function
or an alias, which is not what I want):
presets_A='ls -A'
presets_B='ls -A -l -F'
$presets_A
$presets_B
This gives ls -A: command not found. Not good. Obviously, I haven't yet found the correct arbitrary
mixture of $%"(#]}" quotes and parens that Bash is so famous for. ${!presets_A} also didn't work
but chances are I got confused and messed up.
EDIT for clarification, I do know how to use a function to encapsulate setting options and subsuming a bunch of parametrized calls under a single command. What I'm looking for is the equivalent of (Bash) foo "$#" or (Python) foo( *P, **Q ) or JS foo( ...P ) such that I get a single serializable and transportable value that comprises the arguments to a call in a single place.
If functions really don't do what you want, then you can use an array to separate the arguments:
presets_A=( -A )
presets_B=( "${presets_A[#]}" -l -F ) # or just -A -l -F
ls "${presets_A[#]}" # ls -A
ls "${presets_B[#]}" # ls -A -l -F
Use "${array[#]}" to expand the array into a list of arguments, separated by the field separator (a space, by default).
But I would consider just defining two functions
la () {
ls -A
}
lalf () {
ls -A -l -F
}
I've since found the source for my troubles; I had tried to use a variable for options in a script with this header:
#!/usr/bin/env bash
set -euo pipefail; IFS=$'\n\t'
presets_A='-A'
presets_B='-A -l -F'
ls $presets_A
ls $presets_B
It does work as intended when I remove the Internal Field Separator setting, and I therefore think the correct answer to my question is:
"Storing multiple command arguments in a variable is OK in Bash and should work as long as IFS has not been reset to something other than a space."
I can affirm that
#!/usr/bin/env bash
set -euo pipefail; IFS=$' '
presets_A='-A'
presets_B='-A -l -F'
command_C='ls -A -l -F'
ls $presets_A
echo '----------------------------------'
ls $presets_B
echo '----------------------------------'
$command_C
does work as expected; thanks go to #TomFenech for pointing out that much (note the explicit IFS=' ' near the top).
FWIW the reason i had re-set IFS is that there are quite a few recommendations to do so out there; they call it 'Bash strict mode'. Turns out it makes your scripts incompatible in subtle ways.

Quoting rules for script -c 'command'

I am looking for the quoting/splitting rules for a command passed to script -c command. The man pages just says
-c, --command command: Run the command rather than an interactive shell.
but I want to make sure "command" is properly escaped.
The COMMAND argument is just a regular string that is processed by the shell as if it were an excerpt of a file. We may think of -c COMMAND as being functionally equivalent of
printf '%s' COMMAND > /tmp/command_to_execute.sh
sh /tmp/command_to_execute.sh
The form -c COMMAND is however superior to the version relying of an auxiliary file because it avoids race conditions related to using an auxiliary file.
In the typical usage of the -c COMMAND option we pass COMMAND as a single-quoted string, as in this pseudo-code example:
sh -c '
do_some_complicated_tests "$1" "$2";
if something; then
proceed_this_way "$1" "$2";
else
proceed_that_way "$1" "$2";
fi' ARGV0 ARGV1 ARGV2
If command must contain single-quoted string, we can rely on printf to build the COMMAND string, but this can be tedious. An example of this technique is illustrated
by the overcomplicated grep-like COMMAND defined here:
% AWKSCRIPT='$0 ~ expr {print($0)}'
% COMMAND=$(printf 'awk -v expr="$1" \047%s\047' "$AWKSCRIPT")
% sh -c "$COMMAND" print_matching 'tuning' < /usr/share/games/fortune/freebsd-tips
"man tuning" gives some tips how to tune performance of your FreeBSD system.
Recall that 047 is octal representation of the ASCII code for the single quote character.
As a side note, these constructions are quite command in Makefiles where they can replace shell functions.

Preserving whitespaces in a string as a command line argument

I'm facing a small problem here, I want to pass a string containing whitespaces , to another program such that the whole string is treated as a command line argument.
In short I want to execute a command of the following structure through a bash shell script:
command_name -a arg1 -b arg2 -c "arg with whitespaces here"
But no matter how I try, the whitespaces are not preserved in the string, and is tokenized by default. A solution please,
edit: This is the main part of my script:
#!/bin/bash
#-------- BLACKRAY CONFIG ---------------#
# Make sure the current user is in the sudoers list
# Running all instances with sudo
BLACKRAY_BIN_PATH='/opt/blackray/bin'
BLACKRAY_LOADER_DEF_PATH='/home/crozzfire'
BLACKRAY_LOADER_DEF_NAME='load.xml'
BLACKRAY_CSV_PATH='/home/crozzfire'
BLACKRAY_END_POINT='default -p 8890'
OUT_FILE='/tmp/out.log'
echo "The current binary path is $BLACKRAY_BIN_PATH"
# Starting the blackray 0.9.0 server
sudo "$BLACKRAY_BIN_PATH/blackray_start"
# Starting the blackray loader utility
BLACKRAY_INDEX_CMD="$BLACKRAY_BIN_PATH/blackray_loader -c $BLACKRAY_LOADER_DEF_PATH/$BLACKRAY_LOADER_DEF_NAME -d $BLACKRAY_CSV_PATH -e "\"$BLACKRAY_END_POINT\"""
sudo time $BLACKRAY_INDEX_CMD -a $OUT_FILE
#--------- END BLACKRAY CONFIG ---------#
You're running into this problem because you store the command in a variable, then expand it later; unless there's a good reason to do this, don't:
sudo time $BLACKRAY_BIN_PATH/blackray_loader -c $BLACKRAY_LOADER_DEF_PATH/$BLACKRAY_LOADER_DEF_NAME -d $BLACKRAY_CSV_PATH -e "$BLACKRAY_END_POINT" -a $OUT_FILE
If you really do need to store the command and use it later, there are several options; the bash-hackers.org wiki has a good page on the subject. It looks to me like the most useful one here is to put the command in an array rather than a simple variable:
BLACKRAY_INDEX_CMD=($BLACKRAY_BIN_PATH/blackray_loader -c $BLACKRAY_LOADER_DEF_PATH/$BLACKRAY_LOADER_DEF_NAME -d $BLACKRAY_CSV_PATH -e "$BLACKRAY_END_POINT")
sudo time "${BLACKRAY_INDEX_CMD[#]}" -a $OUT_FILE
This avoids the whole confusion between spaces-separating-words and spaces-within-words because words aren't separated by spaces -- they're in separate elements of the array. Expanding the array in double-quotes with the [#] suffix preserves that structure.
(BTW, another option would be to use escaped quotes rather like you're doing, then run the command with eval. Don't do this; it's a good way to introduce weird parsing bugs.)
Edit:
Try:
BLACKRAY_END_POINT="'default -p 8890'"
or
BLACKRAY_END_POINT='"default -p 8890"'
or
BLACKRAY_END_POINT="default\ -p\ 8890"
or
BLACKRAY_END_POINT='default\ -p\ 8890'
and
BLACKRAY_INDEX_CMD="$BLACKRAY_BIN_PATH/blackray_loader -c $BLACKRAY_LOADER_DEF_PATH/$BLACKRAY_LOADER_DEF_NAME -d $BLACKRAY_CSV_PATH -e $BLACKRAY_END_POINT"
Original answer:
Is blackray_loader a shell script?
Here is a demonstration that you have to deal with this issue both when specifying the parameter and when handling it:
A text file called "test.txt" (include the line numbers):
1 two words
2 two words
3 two
4 words
A script called "spacetest":
#!/bin/bash
echo "No quotes in script"
echo $1
grep $1 test.txt
echo
echo "With quotes in script"
echo "$1"
grep "$1" test.txt
echo
Running it with ./spacetest "two--------words" (replace the hyphens with spaces):
No quotes in script
two words
grep: words: No such file or directory
test.txt:1 two words
test.txt:2 two words
test.txt:3 two
With quotes in script
two words
2 two words
You can see that in the "No quotes" section it tried to do grep two words test.txt which interpreted "words" as a filename in addition to "test.txt". Also, the echo dropped the extra spaces.
When the parameter is quoted, as in the second section, grep saw it as one argument (including the extra spaces) and handled it correctly. And echo preserved the extra spaces.
I used the extra spaces, by the way, merely to aid in the demonstration.
I have a suggestion:
# iterate through the passed arguments, save them to new properly quoted ARGS string
while [ -n "$1" ]; do
ARGS="$ARGS '$1'"
shift
done
# invoke the command with properly quoted arguments
my_command $ARGS
probably you need to surround the argument by double quotes (e.g. "${6}").
Following OP comment it should be "$BLACKRAY_END_POINT"
Below is my example of restarting a script via exec su USER or exec su - USER. It accommodates:
being called from a relative path or current working directory
spaces in script name and arguments
single and double-quotes in arguments, without crazy escapes like: \\"
#
# This script should always be run-as a specific user
#
user=jimbob
if [ $(whoami) != "$user" ]; then
exec su -c "'$(readlink -f "$0")' $(printf " %q" "$#")" - $user
exit $?
fi
A post on other blog saved me for this whitespaces problem: http://logbuffer.wordpress.com/2010/09/23/bash-scripting-preserve-whitespaces-in-variables/
By default, whitespaces are trimed:
bash> VAR1="abc def gh ijk"
bash> echo $VAR1
abc def gh ijk
bash>
"The cause of this behaviour is the internal shell variable $IFS (Internal Field Separator), that defaults to whitespace, tab and newline.
To preserve all contiguous whitespaces you have to set the IFS to something different"
With IFS bypass:
bash> IFS='%'
bash> echo $VAR1
abc def gh ijk
bash>unset IFS
bash>
It works wonderfully for my command case:
su - user1 -c 'test -r "'${filepath}'"; ....'
Hope this helps.

Resources