How to separate string into shell arguments?

I have this test variable in ZSH:
test_str='echo "a \" b c"'
I'd like to parse this into an array of two strings ("echo" "a \" b c"), i.e. read test_str as the shell itself would and give me back an array of arguments.
Please note that I'm not looking to split on whitespace or anything like that. This is really about parsing arbitrarily complex strings into shell arguments.

Zsh has the (z) parameter expansion flag:
ARGS=( ${(z)test_str} )
This produces echo and "a \" b c": it splits the string the way the shell would, but it does not remove the quotes. To unquote, add the (Q) flag:
ARGS=( ${(Q)${(z)test_str}} )
This leaves echo and a " b c in the $ARGS array. Neither flag executes code in `…` or $(…), but (z) does keep $(false true) together as a single argument. That is to say:
% testfoo=${(z):-'blah $(false true)'}; echo $testfoo[2]
$(false true)

A simpler (?) answer is hinted at by the wording of the question. To set the shell's arguments, use set:
#!/bin/sh
test_str='echo "a \" b"'
eval "set -- $test_str"
for i; do printf '%s\n' "$i"; done
This sets $1 to echo and $2 to a " b. eval certainly has risks (it runs any expansion or command substitution contained in the string), but this is portable sh. It does not assign to an array, of course, but you can use $# and "$@" in the normal way.
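To make the risk concrete, here is a small sketch showing that eval runs a command substitution embedded in the string instead of treating it as data. The input string is made up for illustration, and the quoted `eval "set -- …"` form is used so the string reaches eval unmodified.

```shell
#!/bin/sh
# Hypothetical input: the second word hides a command substitution.
test_str='echo "$(echo injected)"'

# eval re-parses the string as shell code, so $(...) is executed.
eval "set -- $test_str"

count=$#
second=$2
printf '%s\n' "$count" "$second"
```

Here the embedded command is harmless, but with untrusted input it could be anything, which is why this approach is only reasonable for strings you control.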

Related

why there is different output in for-loop

Linux bash: why do the following two shell scripts produce different results?
[root@yumserver ~]# data="a,b,c";IFS=",";for i in $data;do echo $i;done
a
b
c
[root@yumserver ~]# IFS=",";for i in a,b,c;do echo $i;done
a b c
Expected output: the second script should also print:
a
b
c
I think I now understand what @M.NejatAydin means. Thanks also to @EdMorton and @HaimCohen!
[root@k8smaster01 ~]# set -x;data="a,b,c";IFS=",";echo $data;echo "$data";for i in $data;do echo $i;done
+ data=a,b,c
+ IFS=,
+ echo a b c
a b c
+ echo a,b,c
a,b,c
+ for i in '$data'
+ echo a
a
+ for i in '$data'
+ echo b
b
+ for i in '$data'
+ echo c
c
[root@k8smaster01 ~]# IFS=",";for i in a,b,c;do echo $i;done
+ IFS=,
+ for i in a,b,c
+ echo a b c
a b c
Word splitting is performed on the results of unquoted expansions (specifically, parameter expansions, command substitutions, and arithmetic expansions, with a few exceptions which are not relevant here). The literal string a,b,c in the second for loop is not an expansion at all. Thus, word splitting is not performed on that literal string. But note that, in the second example, word splitting is still performed on $i (an unquoted expansion) in the command echo $i.
It seems the point of confusion is where and when IFS is used. It is used in the word splitting phase following an (unquoted) expansion. It is not used when the shell reads its input and breaks the input into words, which is an earlier phase.
Note: IFS is also used in other contexts (e.g., by the read builtin command) which are not relevant to this question.
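The distinction can be checked by counting words rather than printing them. This is only a sketch; count_args is a hypothetical helper introduced here that reports how many arguments it received.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints the number of arguments.
count_args() { echo "$#"; }

data="a,b,c"
old_ifs=$IFS
IFS=","

from_expansion=$(count_args $data)   # unquoted expansion: split on ","
from_literal=$(count_args a,b,c)     # literal word: IFS is never consulted
from_quoted=$(count_args "$data")    # quoted expansion: not split

IFS=$old_ifs
echo "$from_expansion $from_literal $from_quoted"
```

Only the unquoted expansion is split into three words; the literal and the quoted expansion each remain a single word.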
@HaimCohen explained in detail why you get a different result with those two approaches, which is what you asked. His answer is correct; it should get upvoted and accepted.
Just a trivial addition from my side: you can make the second approach behave like the first by defining the variable on the fly, so that the loop list comes from an expansion (note that ${var=...} only assigns if var is not already set):
IFS=",";for i in ${var="a,b,c"};do echo $i;done

Looping through variable with spaces

This piece of code works as expected:
for var in a 'b c' d;
do
echo $var;
done
The bash script loops through 3 arguments printing
a
b c
d
However, if this string is read in via jq and then looped over like so:
JSON_FILE=path/to/jsonfile.json
ARGUMENTS=$(jq -r '.arguments' "${JSON_FILE}")
for var in ${ARGUMENTS};
do
echo $var;
done
The result is 4 arguments as follows:
a
'b
c'
d
Example json file for reference:
{
"arguments" : "a 'b c' d"
}
What is the reason for this? I tried putting quotes around the variable like suggested in other SO answers but that caused everything to just be handled as 1 argument.
What can I do to get the behavior of the first case (3 arguments)?
What is the reason for this?
Word splitting is run over the unquoted results of other expansions. Because the ${ARGUMENTS} expansion in for var in ${ARGUMENTS}; is unquoted, word splitting is performed on it. Word splitting does not interpret quotes that come out of a variable expansion; it treats them as ordinary characters and only splits on the characters in IFS (whitespace by default).
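A short sketch of that behavior, using a hypothetical count_args helper that just reports how many words it received:

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints the number of arguments.
count_args() { echo "$#"; }

arguments="a 'b c' d"

unquoted_count=$(count_args $arguments)    # split on whitespace only
quoted_count=$(count_args "$arguments")    # one single word

# The single quotes survive as literal characters inside the words.
set -- $arguments
second_word=$2

echo "$unquoted_count $quoted_count $second_word"
```

The unquoted expansion yields four words and the second one still starts with a literal single quote, while the quoted expansion yields exactly one word. Neither gives the three arguments that the shell parser produces for the literal command line.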
What can I do to get the behavior of the first case (3 arguments)?
The good way™ would be to write your own parser that interprets the quotes inside the string and splits the arguments accordingly.
I advise using xargs instead; by default (a behavior that is usually confusing) it parses quotes in its input:
$ arguments="a 'b c' d"
$ echo "${arguments}" | xargs -n1 echo
a
b c
d
# convert to array
$ readarray -d '' arr < <(<<<"${arguments}" xargs printf "%s\0")
As presented in the other answer, you may use eval, but please do not: eval is evil and will run expansions over the input string.
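Change IFS to a newline so that only newlines separate arguments. Note that IFS='\n' sets IFS to a backslash and the letter n; in bash, write $'\n' instead. This only yields three arguments if each argument is stored on its own line:
...
IFS=$'\n'; for var in $ARGUMENTS;
do
echo "$var";
done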

The semantics of arrays in bash

Check out the following transcript. With all possible rigor and formality, what is going on at each step?
$> ls -1 #This command prints 3 items. no explanation required.
a
b
c
$> X=$(ls -1) #Capture the output (as what? a string?)
$> Y=($(ls -1)) #Capture it again (as an array now?)
$> echo ${#X[@]} #Why is the length 1?
1
$> echo ${#Y[@]} #This works because Y is an array of the 3 items?
3
$> echo $X #Why are the linefeeds now spaces?
a b c
$> echo $Y #Why does the array echo as its first element
a
$> for x in $X;do echo $x; done #iterate over $X
a
b
c
$> for y in $Y;do echo $y; done #iterating over y doesn't work
a
$> echo ${X[2]} #I can loop over $X but not index into it?
$> echo ${Y[2]} #Why does this work if I can't loop over $Y?
c
I assume bash has well established semantics about how arrays and text variables (if that's even what they're called) work, but the user manual is not organized in an optimal fashion for someone who wants to reason about scripts based on whatever small set of underlying principles the language designer intended.
Let me preface the following with the very strong suggestion that you never use ls to populate an array. The correct code would be
Z=( * )
to create an array with each (non-hidden) file in the current directory as a distinct array element.
$> ls -1 #This command prints 3 items. no explanation required.
a
b
c
Correct. Each file name is printed on a separate line (although, beware of file names containing newlines; the parts before and after each newline would appear as separate file names.)
$> X=$(ls -1) #Capture the output (as what? a string?)
Yes. The command substitution captures the output of ls as a single string, embedded newlines included; only trailing newlines are stripped. (The result of the command substitution would be subject to word-splitting if it weren't the right-hand side of an assignment; word-splitting will come up below.)
$> Y=($(ls -1)) #Capture it again (as an array now?)
Same as with X, but now each of the words in the result of the command substitution is treated as a separate array element. As long as none of the output lines contain any characters in the value of IFS, each file name is one word and will be treated as a separate array element.
$> echo ${#X[@]} #Why is the length 1?
1
X, not being a real array, is treated as an array with a single element, namely the value of $X.
$> echo ${#Y[@]} #This works because Y is an array of the 3 items?
3
Correct.
$> echo $X #Why are the linefeeds now spaces?
a b c
When $X is unquoted, the resulting expansion is subject to word-splitting. In this case, the newlines are simply treated the same as any other whitespace, separating the result into a sequence of words that are passed to echo as distinct arguments, which are then displayed separated by a single space each.
$> echo $Y #Why does the array echo as its first element
a
For a true array, $Y is equivalent to ${Y[0]}.
$> for x in $X;do echo $x; done #iterate over $X
a
b
c
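This works because the unquoted $X is word-split into a, b, and c, but it has caveats: a file name containing whitespace is split into several words, and any glob characters in the words are expanded.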
$> for y in $Y;do echo $y; done #iterating over y doesn't work
a
See above; $Y only expands to the first element. You want for y in "${Y[@]}"; do to iterate over all the elements.
$> echo ${X[2]} #I can loop over $X but not index into it?
Correct. X is not an array, so only index 0 exists and ${X[2]} expands to nothing. The loop worked because the unquoted $X was word-split into a list of words which the for loop could iterate over.
$> echo ${Y[2]} #Why does this work if I can't loop over $Y?
c
Indexing and iteration are two completely different things in shell. You don't actually iterate over an array; you iterate over the resulting sequence of words of a properly expanded array.
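The transcript can be condensed into a bash sketch. It uses printf in place of ls -1 (an assumption made here so the result does not depend on the current directory) and collects the loop output in a string so it is easy to inspect.

```shell
#!/bin/bash
# printf stands in for `ls -1` (assumption, for a reproducible example).
X=$(printf 'a\nb\nc\n')       # scalar: one string with embedded newlines
Y=( $(printf 'a\nb\nc\n') )   # array: the unquoted result is word-split

x_len=${#X[@]}    # a scalar behaves like a one-element array
y_len=${#Y[@]}
first=$Y          # $Y is the same as ${Y[0]}
x_third=${X[2]}   # empty: a scalar only has index 0
y_third=${Y[2]}

# Iterate over the words produced by expanding the whole array.
joined=""
for y in "${Y[@]}"; do
    joined="$joined<$y>"
done

echo "$x_len $y_len $first [$x_third] $y_third $joined"
```

Quoting "${Y[@]}" matters: each element becomes exactly one word, even if it contains whitespace.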

Why is my csh script not working with special characters?

#!/bin/csh -f
foreach line ("`cat test`")
set x=`echo "$line" | awk '{split($0, b, " "); print b[1]}'`
echo "$x"
end
test file contains following contents:
How to Format
Stack[7] Overflow
Put returns between paragraphs
On executing the script I am getting following error:
set: No match.
How to store the string which contains special character like square brackets [] in a variable and then use them in code?
The problem is that you're doing:
set x = [some string with shell globbing characters]
This won't work for the same reason that set x = foo works, but set x = [foo] doesn't. You need to use set x = "[foo]" (or '[foo]') to escape the special shell globbing characters ([ and ] in this case).
Nesting quotes in the C shell is pretty hard, and it's one of the reasons it's generally discouraged to use the C shell for scripting. It's perhaps possible for your command, but I'm not smart enough (or too lazy) to figure out how. My solution is typically to set the special noglob variable to prevent expansion of globbing characters:
set noglob
foreach line ("`cat test`")
set x = `echo "$line" | awk '{split($0, b, " "); print b[1]}'`
echo "$x"
end
outputs:
How
Stack[7]
Put
P.S. There is an easier way to echo the first word of every line; put it in a list:
set noglob
foreach line ("`cat test`")
set x = ($line)
echo "$x[1]"
end

bash Script Arguments grouping

I am having problems with the expansion of command-line options containing spaces. They are not getting grouped as I expect them to be. How can I modify the following code to get the desired output shown below?
function myFunction {
while getopts "a:b:A:" optionName; do
echo "$optionName::$OPTARG"
done
}
#dynamic variable, cannot be hardcoded into $MY_ARGS
MY_VAR="X1=162356374 X2=432876 X3=342724"
#$MY_ARGS is useful and will be used more than once,
#so we don't want to eliminate it and replace it's usage with its value everywhere
MY_ARGS="-a 24765437643 -b '$MY_VAR' -A jeeywewueuye"
myFunction $MY_ARGS
Actual Output:
a::24765437643
b::'X1=162356374
Desired Output:
a::24765437643
b::X1=162356374 X2=432876 X3=342724
A::jeeywewueuye
The best way to store a list of arguments is in an array. An array can handle arguments with whitespace without problem, and you don't have to figure out how to get the quotes and backslashes just right.
MY_ARGS=(-a 24765437643 -b "$MY_VAR" -A jeeywewueuye)
myFunction "${MY_ARGS[@]}"
The only unnatural part about arrays is the weird syntax to expand them: "${array[@]}". The quotes, curly braces, and [@] notation are all important.
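Put together with the function from the question, a runnable bash sketch looks like this. The local OPTIND=1 line is an addition made here so that the function can be called more than once; it is not part of the original code.

```shell
#!/bin/bash
myFunction() {
    # Addition for illustration: reset getopts state on every call.
    local OPTIND=1 optionName
    while getopts "a:b:A:" optionName; do
        echo "$optionName::$OPTARG"
    done
}

MY_VAR="X1=162356374 X2=432876 X3=342724"

# Each array element is one argument, whitespace included.
MY_ARGS=(-a 24765437643 -b "$MY_VAR" -A jeeywewueuye)

output=$(myFunction "${MY_ARGS[@]}")
echo "$output"
```

Because the array keeps the element boundaries, no quote characters need to be embedded in the data at all.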
I agree that arrays answer the question in the best way.
Perhaps you don't want to use arrays (colleagues will not understand them), or you must obey the Google guidelines for bash (nice work; I agree with over 90% of them), which claim: "If you find you need to use arrays for anything more than assignment of ${PIPESTATUS}, you should use Python."
In that case you must look for other solutions.
An ugly solution is changing IFS:
function myFunction {
while getopts "a:b:A:" optionName; do
echo "$optionName::$OPTARG"
done
}
#dynamic variable, cannot be hardcoded into $MY_ARGS
MY_VAR="X1=162356374 X2=432876 X3=342724"
MY_ARGS="-a/24765437643/-b/${MY_VAR}/-A/jeeywewueuye"
IFS=/
myFunction ${MY_ARGS}
Alternatively, change myFunction so that -b takes no argument and the function reads MY_VAR directly:
function myFunction {
while getopts "a:bA:" optionName; do
case "${optionName}" in
b) echo "${optionName}::${MY_VAR}" ;;
*) echo "${optionName}::${OPTARG}" ;;
esac
done
}
Or you could tr the spaces into another character before calling myFunction and tr that character back into spaces inside myFunction.
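A minimal sketch of the tr idea in portable sh, assuming that the placeholder character (an underscore here, chosen arbitrarily) never occurs in the real option values:

```shell
#!/bin/sh
myFunction() {
    OPTIND=1
    while getopts "a:b:A:" optionName; do
        # Turn the placeholder back into spaces.
        value=$(printf '%s' "$OPTARG" | tr '_' ' ')
        echo "$optionName::$value"
    done
}

MY_VAR="X1=162356374 X2=432876 X3=342724"

# Hide the spaces so that word splitting keeps the value in one piece.
encoded=$(printf '%s' "$MY_VAR" | tr ' ' '_')
MY_ARGS="-a 24765437643 -b $encoded -A jeeywewueuye"

output=$(myFunction $MY_ARGS)
echo "$output"
```

This keeps MY_ARGS a plain string, at the cost of reserving one character that the data must never contain.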
