How do bash variable types work and how to work around automatic interpretation? - bash

I am trying to set up a variable that contains a string representation of a value with leading zeroes. I know I can printf the value to the terminal, and I can assign the output of printf to a variable. However, it seems that assigning the value string to a new variable reinterprets the value: if I then print it, the value has lost its leading zeroes.
How do we work with variables in bash scripts to avoid implicit type inference, and ultimately, how do I get to the solution I'm looking for? FYI, I'm looking to build a large fixed-length numeric string, something like a part number, by concatenating smaller prepared strings.
Update:
It turns out that exactly how the variables are assigned changes their interpretation in some way; see below:
Example:
#!/bin/bash
a=3
b=4
aStr=$(printf %03d $a)
bStr=$(printf %03d $b)
echo $aStr$bStr
output
$ ./test.sh
003004
$
Alternate form:
#!/bin/bash
((a = 3))
((b = 4))
((aStr = $(printf %03d $a)))
((bStr = $(printf %03d $b)))
echo $aStr$bStr
output
$ ./test.sh
34
$

How do bash variable types
There are no variable types. All variables are strings. Variables store a value (a string), but variables also have some additional magic attributes associated with them.
There are Bash arrays, but I think being an array is also just an attribute of the variable. In any case, every array element holds a string. There is a "numeric" variable, declare -i var, but that is an attribute of the variable as well: in memory, the variable is still a string. The attribute only makes Bash evaluate the string being assigned (still a string!) as an arithmetic expression and store the result.
assigning the value string to a new variable reinterprets the value
Bash does not "interpret" the value on assignment.
How do we work with variables in bash scripts to avoid implicit type inferences
There are no "type inferences". The type of variable does not change - it holds a string.
The value of the variable undergoes different expansions and conversions depending on the context where it is used. For example $(...) removes trailing newlines. Most notably unquoted variable expansions undergo word splitting and filename expansion.
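A minimal sketch of the point above (the variable names are illustrative): the stored text never changes, and only the context it is used in decides how it is read.

```shell
#!/usr/bin/env bash
# Plain assignment stores the text exactly as written; nothing is inferred.
x=007
echo "$x"          # prints 007

# Arithmetic context reads the same text as a number; a leading 0 means octal.
echo "$((x))"      # prints 7

# declare -i adds an attribute: later assignments are evaluated arithmetically.
declare -i n
n=007
echo "$n"          # prints 7

# Command substitution strips all trailing newlines from the output.
s=$(printf 'abc\n\n\n')
echo "${#s}"       # prints 3
```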
Example:
Posting your code (the version with spaces around =) to ShellCheck results in:
Line 2:
a = 3
^-- SC2283 (error): Remove spaces around = to assign (or use [ ] to compare, or quote '=' if literal).
Line 3:
b = 4
^-- SC2283 (error): Remove spaces around = to assign (or use [ ] to compare, or quote '=' if literal).
Line 4:
aStr = $(printf %03d $a)
^-- SC2283 (error): Remove spaces around = to assign (or use [ ] to compare, or quote '=' if literal).
^-- SC2046 (warning): Quote this to prevent word splitting.
^-- SC2154 (warning): a is referenced but not assigned.
^-- SC2086 (info): Double quote to prevent globbing and word splitting.
Did you mean:
aStr = $(printf %03d "$a")
Line 5:
bStr = $(printf %03d $b)
^-- SC2283 (error): Remove spaces around = to assign (or use [ ] to compare, or quote '=' if literal).
^-- SC2046 (warning): Quote this to prevent word splitting.
^-- SC2154 (warning): b is referenced but not assigned.
^-- SC2086 (info): Double quote to prevent globbing and word splitting.
Did you mean:
bStr = $(printf %03d "$b")
Line 7:
echo $aStr$bStr
^-- SC2154 (warning): aStr is referenced but not assigned.
^-- SC2086 (info): Double quote to prevent globbing and word splitting.
^-- SC2154 (warning): bStr is referenced but not assigned.
^-- SC2086 (info): Double quote to prevent globbing and word splitting.
Did you mean:
echo "$aStr""$bStr"
ShellCheck tells you what is wrong. After fixing the problems:
#!/bin/bash
a=3
b=4
aStr=$(printf %03d "$a")
bStr=$(printf %03d "$b")
echo "$aStr$bStr"
Running it prints your expected output:
003004
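As an alternative sketch (not part of the original answer), printf -v assigns the formatted text directly to a variable, which avoids the command substitution and can format several pieces in one call. The names partno and partno2 are illustrative.

```shell
#!/usr/bin/env bash
a=3
b=4

# One call formats and concatenates both pieces into partno.
printf -v partno '%03d%03d' "$a" "$b"
echo "$partno"      # prints 003004

# Or prepare the pieces separately and append them with +=.
printf -v aStr '%03d' "$a"
printf -v bStr '%03d' "$b"
partno2=$aStr
partno2+=$bStr
echo "$partno2"     # prints 003004
```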

By doing ((aStr = $(printf %03d $a))), you destroy the careful formatting done by printf again. You would see the same effect with:
(( x = 005 ))
echo $x
which outputs 5.
In fact, the zeroes inserted by printf can corrupt your number, as the following example shows:
(( x = 015 ))
echo $x
which outputs 13, because ((...)) interprets a leading zero as the prefix of an octal number.
Hence, if you have a string representing a formatted (pretty-printed) number, don't use this string in numeric context anymore.
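If you do need arithmetic on a zero-padded string, one common workaround is to force base 10 with the 10# prefix and pad again afterwards. This is a sketch with illustrative names. Note that without the prefix, a value such as 008 is not even valid octal, so arithmetic on it fails with an error.

```shell
#!/usr/bin/env bash
part="015"                      # a zero-padded piece of a part number

echo "$((part))"                # prints 13: without a prefix, 015 is octal
n=$((10#$part))                 # 10# forces base 10
echo "$n"                       # prints 15

# Do the arithmetic on the plain number, then pad again for display.
printf -v next '%03d' "$((n + 1))"
echo "$next"                    # prints 016
```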

Related

Extract value for a key in a key/pair string

I have key value pairs in a string like this:
key1 = "value1"
key2 = "value2"
key3 = "value3"
In a bash script, I need to extract the value of one of the keys like for key2, I should get value2, not in quote.
My bash script needs to work on both Red Hat and Ubuntu Linux hosts.
What would be the easiest and most reliable way of doing this?
I tried something like this simplified script:
pattern='key2\s*=\s*\"(.*?)\".*$'
if [[ "$content" =~ $pattern ]]
then
key2="${BASH_REMATCH[1]}"
echo "key2: $key2"
else
echo 'not found'
fi
But it does not work consistently.
Any better/easier/more reliable way of doing this?
To separate the key and value from your $content variable, you can use:
[[ $content =~ (^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$ ]]
That will properly populate the BASH_REMATCH array with both values where your key is in BASH_REMATCH[1] and the value in BASH_REMATCH[2].
Explanation
In bash, [[...]] treats what appears on the right side of =~ as an extended regular expression and matches it according to man 3 regex. See man 1 bash under the section heading for [[ expression ]] (4th paragraph). Sub-expressions in parentheses (..) are saved in the array variable BASH_REMATCH: BASH_REMATCH[0] contains the portion of the string (your $content) that matched the whole regex, and each remaining element contains a sub-expression enclosed in (..), in the order the parentheses appear in the regex.
The Regular Expression (^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$ is explained as:
(^[^ ]+) - '^' anchors the match at the beginning of the line, and [^ ]+ matches one or more characters that are not a space. Since this sub-expression is enclosed in (..), it will be saved as BASH_REMATCH[1], followed by;
[[:blank:]]* - zero or more blank characters (spaces or tabs), followed by;
= - an equal sign, followed by;
[[:blank:]]* - zero or more blank characters, followed by;
[[:punct:]] - a punctuation character (matching the '"', which avoids caveats associated with using quotes within the regex), followed by the sub-expression;
(.*) - zero or more characters (the rest of the characters); since it is a sub-expression in (..), the characters will be stored in BASH_REMATCH[2], followed by;
[[:punct:]] - a punctuation character (matching the '"' ... ditto), at the;
$ - end of line anchor.
So if you match your key and value input lines, separated by an = sign, against this regex, the key and value end up in the BASH_REMATCH array as you wanted.
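To check the regex above in isolation, here is a small sketch with a hard-coded line. Storing the regex in a variable (re, an illustrative name) avoids quoting problems on the right-hand side of =~.

```shell
#!/usr/bin/env bash
content='key2 = "value2"'
re='(^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$'

if [[ $content =~ $re ]]; then
  key=${BASH_REMATCH[1]}     # first (..) group: the key
  value=${BASH_REMATCH[2]}   # second (..) group: the value without quotes
  echo "key=$key value=$value"
fi
```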
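Bash's =~ uses POSIX extended regular expressions (ERE), so you cannot portably use \s (glibc accepts it as an extension, other platforms may not), and the non-greedy .*? is not supported at all.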
As an alternative, please try:
found=0
while IFS= read -r content; do
  # [[:blank:]] replaces \s, and [^"]* (anything but a quote) replaces the
  # non-greedy match of the original 'key2\s*=\s*\"(.*)\".*$'
  pattern='key2[[:blank:]]*=[[:blank:]]*"([^"]*)"'
  if [[ $content =~ $pattern ]]
  then
    key2="${BASH_REMATCH[1]}"
    echo "key2: $key2"
    (( found++ ))
  fi
done < input-file.txt
if (( found == 0 )); then
  echo "not found"
fi
When you start talking about key-value pairs, it is best to use an associative array:
declare -A map
Now looking at your lines, they look like key = "value" where we assume that:
the value is always enclosed in double quotes, but could itself contain a quote;
any number of spaces may appear before and/or after the equal sign.
So assuming we have a variable line which contains key = "value", the following operations will extract that value:
key="${line%%=*}"; key="${key// /}"                             # key: text before the first =, spaces removed
value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"  # value: text between the first and last " after the =
IFS=$' \t=' read -r key _ <<<"$line"                           # alternative way to get the key
This allows us now to have something like:
declare -A map
while read -r line; do
  key="${line%%=*}"; key="${key// /}"
  value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"
  map["$key"]="$value"
done <inputfile
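A self-contained sketch of the loop above, reading the three sample lines from a here-document instead of inputfile and then looking up one key. As assumptions beyond the answer, it uses IFS= read, skips lines without an =, and strips tabs as well as spaces from the key.

```shell
#!/usr/bin/env bash
declare -A map
while IFS= read -r line; do
  [[ $line == *=* ]] || continue                  # skip lines without "="
  key="${line%%=*}"; key="${key//[[:blank:]]/}"  # text before "=", blanks removed
  value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"
  map["$key"]="$value"
done <<'EOF'
key1 = "value1"
key2 = "value2"
key3 = "value3"
EOF

echo "${map[key2]}"      # prints value2
```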
With awk:
awk -v key="key2" '$1 == key { gsub("\"","",$3);print $3 }' <<< "$string"
This reads the contents of the variable called string, passes the required key in as an awk variable called key, and, if the first space-delimited field equals the key, removes the quotes from the third field with the gsub function and prints it.
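If a value can contain spaces, splitting on whitespace puts only part of it in $3. One alternative sketch, assuming values never contain a double quote themselves, is to use the double quote as awk's field separator, so the value is always field 2:

```shell
#!/usr/bin/env bash
string='key1 = "value1"
key2 = "value two"
key3 = "value3"'

# -F'"' splits each line at double quotes: $1 is 'key2 = ', $2 is the value.
# k is $1 with spaces, tabs and "=" removed, i.e. the bare key.
val=$(awk -F'"' -v key="key2" '{ k = $1; gsub(/[ \t=]/, "", k) } k == key { print $2 }' <<< "$string")
echo "$val"        # prints value two
```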
Ok, after spending so many hours, this is how I solved the problem:
If you don't know where your script will run and what type of file (Windows/Mac/Linux) you are reading:
Avoid non-greedy matching in Linux bash instead of tweaking different switches.
Don't trust the end-of-line match $ when you might get data from Windows or Mac.
This post solved my problem: Non greedy text matching and extrapolating in bash
This pattern works for me in many Linux environments and with all types of line endings:
pattern='key2\s*=\s*"([^"]*)"'
The value is in BASH_REMATCH[1]
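A small sketch of the line-ending problem described above: a line read from a Windows file keeps a trailing carriage return, so a pattern anchored with $ no longer matches. Either leave the pattern unanchored or strip the \r first. The sample line is simulated with $'...', and the variable names are illustrative.

```shell
#!/usr/bin/env bash
line=$'key2 = "value2"\r'    # simulated line from a Windows (CRLF) file

anchored='key2[[:blank:]]*=[[:blank:]]*"([^"]*)"$'
unanchored='key2[[:blank:]]*=[[:blank:]]*"([^"]*)"'

if [[ $line =~ $anchored ]]; then echo "anchored: match"; else echo "anchored: no match"; fi
if [[ $line =~ $unanchored ]]; then value=${BASH_REMATCH[1]}; echo "unanchored: $value"; fi

# Alternatively, strip one trailing carriage return before matching.
clean=${line%$'\r'}
if [[ $clean =~ $anchored ]]; then echo "after stripping CR: ${BASH_REMATCH[1]}"; fi
```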

Looping through variable with spaces

This piece of code works as expected:
for var in a 'b c' d;
do
echo $var;
done
The bash script loops through 3 arguments printing
a
b c
d
However, if this string is read in via jq, and then looped over like so:
JSON_FILE=path/to/jsonfile.json
ARGUMENTS=$(jq -r '.arguments' "${JSON_FILE}")
for var in ${ARGUMENTS};
do
echo $var;
done
The result is 4 arguments as follows:
a
'b
c'
d
Example json file for reference:
{
"arguments" : "a 'b c' d"
}
What is the reason for this? I tried putting quotes around the variable like suggested in other SO answers but that caused everything to just be handled as 1 argument.
What can I do to get the behavior of the first case (3 arguments)?
What is the reason for this?
Word splitting is performed on the unquoted results of other expansions. Because the ${ARGUMENTS} expansion in for var in ${ARGUMENTS}; is unquoted, word splitting is performed on it. Word splitting ignores quotes that resulted from a variable expansion; it only cares about the characters in IFS (by default, whitespace).
What can I do to get the behavior of the first case (3 arguments)?
The good way™ would be to write your own parser, to parse the quotes inside the strings and split the argument depending on the quotes.
I advise using xargs: by default (a behavior that usually confuses people) it parses quotes in its input:
$ arguments="a 'b c' d"
$ echo "${arguments}" | xargs -n1 echo
a
b c
d
# convert to array
$ readarray -d '' arr < <(<<<"${arguments}" xargs printf "%s\0")
As presented in the other answer, you could use eval, but please don't: eval is evil and will run expansions over the input string.
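A sketch combining the xargs approach with a quoted loop over the resulting array (readarray -d '' requires bash 4.4 or later). Keep in mind that xargs quote parsing is similar to, but not identical with, shell quoting.

```shell
#!/usr/bin/env bash
arguments="a 'b c' d"

# xargs honours the single quotes and passes each argument to printf,
# which prints it NUL-terminated; readarray -d '' splits on the NULs.
readarray -d '' arr < <(xargs printf '%s\0' <<<"$arguments")

echo "${#arr[@]}"            # prints 3
for var in "${arr[@]}"; do   # quoted, so each element stays whole
  echo "$var"
done
```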
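Change IFS to a newline to make it work. This only helps when each argument is on its own line in the jq output (for example, if arguments were a JSON array printed with jq -r '.arguments[]'), and IFS must be set with $'\n', because '\n' in single quotes is a literal backslash followed by n:
...
IFS=$'\n'; for var in $ARGUMENTS;
do
echo "$var";
done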
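A sketch of that IFS pitfall, using a hypothetical ARGUMENTS value that already has one argument per line (no jq needed): '\n' is two characters, while $'\n' is the newline. Globbing is switched off during the unquoted expansion so that an argument like * would not be expanded.

```shell
#!/usr/bin/env bash
# Hypothetical input: one argument per line, as jq -r '.arguments[]' would
# print it if "arguments" were a JSON array instead of a single string.
ARGUMENTS=$'a\nb c\nd'

wrong='\n'    # two characters: a backslash and the letter n
right=$'\n'   # one character: a newline
echo "${#wrong} ${#right}"    # prints 2 1

count=0
old_ifs=$IFS
IFS=$right
set -f                        # no filename expansion on the unquoted $ARGUMENTS
for var in $ARGUMENTS; do
  echo "$var"
  count=$((count + 1))
done
set +f
IFS=$old_ifs
echo "$count"                 # prints 3
```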

Why does IFS not affect the length of an array in bash?

I have two specific questions about the IFS. I'm aware that changing the internal field separator, IFS, changes what the bash script iterates over.
So, why is it that the length of the array doesn't change?
Here's my example:
delimiter=$1
strings_to_find=$2
OIFS=$IFS
IFS=$delimiter
echo "the internal field separator is $IFS"
echo "length of strings_to_find is ${#strings_to_find[@]}"
for string in ${strings_to_find[@]}
do
echo "string is separated correctly and is $string"
done
IFS=$OIFS
But why does the length not get affected by the new IFS?
The second thing that I don't understand is how to make the IFS affect the input arguments.
Let's say I'm expecting my input arguments to look like this:
./executable_shell_script.sh first_arg:second_arg:third_arg
And I want to parse the input arguments by setting the IFS to :. How do I do this? Setting the IFS doesn't seem to do anything. I must be doing this wrong....?
Thank you.
Bash arrays are, in fact, arrays. They are not strings which are parsed on demand. Once you create an array, the elements are whatever they are, and they won't change retroactively.
However, nothing in your example creates an array. If you wanted to create an array out of argument 2, you would need to use a different syntax:
strings_to_find=($2)
Although your strings_to_find is not an array, bash allows you to refer to it as though it were an array of one element. So ${#strings_to_find[@]} will always be one, regardless of the contents of strings_to_find. Also, your line:
for string in ${strings_to_find[@]}
is really no different from
for string in $strings_to_find
Since that expansion is not quoted, it will be word-split, using the current value of IFS.
If you use an array, most of the time you will not want to write for string in ${strings_to_find[@]}, because that just reassembles the elements of an array into a string and then word-splits them again, which loses the original array structure. Normally you will avoid the word-splitting by using double quotes:
strings_to_find=(...)
for string in "${strings_to_find[@]}"
As for your second question, the value of IFS does not alter the shell grammar. Regardless of the value of IFS, words in a command are separated by unquoted whitespace. After the line is parsed, the shell performs parameter and other expansions on each word. As mentioned above, if the expansion is not quoted, the expanded text is then word-split using the value of IFS.
If the word does not contain any expansions, no word-splitting is performed. And even if the word does contain expansions, word-splitting is only performed on the expansion itself. So, if you write:
IFS=:
my_function a:b:c
my_function will be called with a single argument; no expansion takes place, so no word-splitting occurs. However, if you use $1 unquoted inside the function, the expansion of $1 will be word-split (if it is expanded in a context in which word-splitting occurs).
On the other hand,
IFS=:
args=a:b:c
my_function $args
will cause my_function to be invoked with three arguments.
And finally,
IFS=:
args=c
my_function a:b:$args
is exactly the same as the first invocation, because there is no : in the expansion.
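For the second question (parsing first_arg:second_arg:third_arg), a common idiom is to set IFS only for a read command and split the argument into an array. A sketch, where set -- simulates the script's command-line argument:

```shell
#!/usr/bin/env bash
# Simulate: ./executable_shell_script.sh first_arg:second_arg:third_arg
set -- first_arg:second_arg:third_arg

# IFS=: applies only to this read; -a stores the fields in an array.
IFS=: read -r -a parts <<<"$1"

echo "${#parts[@]}"           # prints 3
for part in "${parts[@]}"; do
  echo "$part"
done
```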
This is an example script based on @rici's answer:
#!/bin/bash
fun()
{
echo "Total Params : " $#
}
fun2()
{
array1=($1) # Word splitting occurs here based on the IFS ':'
echo "Total elements in array1 : "${#array1[@]}
# Here '#' before array counts the length of the array
array2=("$1") # No word splitting because we have enclosed $1 in double quotes
echo "Total elements in array2 : "${#array2[@]}
}
IFS_OLD="$IFS"
IFS=$':' #Changing the IFS
fun a:b:c #Nothing to expand here, so no use of IFS at all. See fun2 at last
fun a b c
fun abc
args="a:b:c"
fun $args # Expansion! Word splitting occurs with the current IFS ':' here
fun "$args" # preventing word splitting by enclosing the string in double quotes
fun2 a:b:c
IFS="$IFS_OLD"
Output
Total Params : 1
Total Params : 3
Total Params : 1
Total Params : 3
Total Params : 1
Total elements in array1 : 3
Total elements in array2 : 1
Bash manpage says :
The shell treats each character of IFS as a delimiter, and splits the
results of the other expansions into words on these characters.
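To illustrate the manpage sentence above with a sketch (the values are illustrative): every character in IFS acts as a delimiter, and adjacent non-whitespace delimiters produce an empty field between them.

```shell
#!/usr/bin/env bash
s='a:b,c'
old_ifs=$IFS
IFS=':,'                  # both ':' and ',' are delimiters
set -f                    # disable globbing for the unquoted expansion
fields=($s)               # unquoted: split on every IFS character
set +f
IFS=$old_ifs
echo "${#fields[@]}"      # prints 3

# Two adjacent non-whitespace delimiters produce an empty field in between.
IFS=: read -r -a g <<<"a::b"
echo "${#g[@]}"           # prints 3
```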

How does bash eval expansion work with single quotes and double quotes?

Can someone please help explain how this works? Single quotes should prevent any interpretation, but the results are not what I expected. I expect echo $testvar to print exactly '"123b"'.
a="testvar"
b="'"123b"'"
eval $a='$b'
echo $testvar
'123b'
a="testvar"
b='"123b"'
eval $a='$b'
echo $testvar
"123b"
a="testvar"
b='"123b"'
eval $a=$b
echo $testvar
123b
Guessing that you want testvar to be <single-quote><double-quote>123b<double-quote><single-quote>:
testvar=\'\"123b\"\'
Consider this in C or Java:
char* str = "123b";
printf("%s\n", str);
String str = "123b";
System.out.println(str);
Why does this write 123b when we clearly used double quotes? Why doesn't it write "123b", with quotes?
The answer is that the quotes are not part of the data. The quotes are used by the programming language to determine where strings start and stop, but they're not in any way part of the string. This is just as true for Bash as for C and Java.
Just like there's no way in Java to differentiate Strings created with "123" + "b" and "123b", there's no way in Bash to tell that b='"123b"' used single quotes in its definition, as opposed to e.g. b=\"123b\".
If given a variable you want to assign its value surrounded by single quotes, you can use e.g.
printf -v testvar "'%s'" "$b"
But this just adds new literal single quotes around a string. It doesn't and cannot care how b was originally quoted, because that information is not stored.
To instead add a layer of escaping to a variable, so that when evaluated once it turns into a literal string identical to your input, you can use:
printf -v testvar "%q" "$b"
This will produce a value which is quoted equivalently but not necessarily identically to your original definition. For "value" (a literal with double quotes in it), it may produce \"value\" or '"value"' or '"'value'"' which all evaluate exactly to "value".
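Applied to the original question, a sketch of the %q approach: the variable b holds the text '"123b"' with both kinds of quotes as data, and eval with the %q-escaped value assigns exactly that text to testvar.

```shell
#!/usr/bin/env bash
a="testvar"
b=\'\"123b\"\'          # b now holds: '"123b"'  (the quotes are part of the data)

# %q adds one layer of escaping, so a single eval gives back the original text.
printf -v quoted '%q' "$b"
eval "$a=$quoted"
echo "$testvar"         # prints '"123b"'
```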

Replace " " with "\ " in a file path string with variable expansion

I know there is a better way to do this.
What is the better way?
How do you do a string replace on a string variable in bash?
For Example: (using php because that's what I know)
$path = "path/to/directory/foo bar";
$path = str_replace(" ", "\ ", "$path");
echo $path;
returns:
path/to/directory/foo\ bar
To perform the specific replacement in bash:
path='path/to/directory/foo bar'
echo "${path// /\\ }"
Don't prefix the variable name with $ when assigning to variables in bash.
No spaces are allowed around the =.
Note that path is assigned with single quotes, whereas the string replacement occurs in double quotes - this distinction is important: bash does NOT interpret single-quoted strings, whereas you can refer to variables (and do other things) in double-quoted strings; (also, not quoting a variable reference at all has other ramifications, often undesired - in general, double-quote your variable references)
Explanation of string replacement "${path// /\\ }":
In order to perform value substitution on a variable, you start with enclosing the variable name in {...}
// specifies that ALL occurrences of the following search pattern are to be replaced (use / to replace the first occurrence only).
/ separates the search pattern (a single space) from the replacement string (a backslash followed by a space).
The replacement string must be written as \\ followed by the space, because \ has special meaning as an escape character and must therefore itself be escaped for literal use.
The above is an instance of what bash (somewhat cryptically) calls shell parameter expansion and also parameter expansion and [parameter and] variable expansion. There are many more flavors, such as for extracting a substring, providing a default value, stripping a prefix or suffix, ... - see the BashGuide page on the topic or the manual.
As for what types of expressions are supported in the search and replacement strings:
The search expression is a globbing pattern of the same type used in filename expansion (e.g, *.txt); for instance, v='dear me'; echo "${v/m*/you}" yields 'dear you'. Note that the longest match will be used.
Additionally, the first character of the pattern has special meaning in this context:
/, as we've seen above, causes all matching occurrences of the pattern to be replaced - by default, only the first one is replaced.
# causes the rest of the pattern to only match at the beginning of the input variable
% only matches at the end
The replacement expression is a string that is subject to shell expansions; while there is no support for backreferences, the fact that the string is expanded allows you to have the replacement string reference other variables, contain commands, with $(...), ...; e.g.:
v='sweet home'; echo "${v/home/$HOME}" yields, for instance, 'sweet /home/jdoe'.
v='It is now %T'; echo "${v/\%T/$(date +%T)}" yields, for instance, It is now 10:05:17.
o1=1 o2=3 v="$o1 + $o2 equals result"; echo "${v/result/$(( $o1 + $o2 ))}" yields '1 + 3 equals 4'.
There are many more features and subtleties - refer to the link above.
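A quick sketch of the variants listed above; the sample strings are illustrative.

```shell
#!/usr/bin/env bash
path='path/to/directory/foo bar baz'
all="${path// /\\ }"       # // replaces every space
first="${path/ /\\ }"      # / replaces only the first space
echo "$all"                # prints path/to/directory/foo\ bar\ baz
echo "$first"              # prints path/to/directory/foo\ bar baz

f='report.txt.bak'
start="${f/#report/summary}"   # /# : pattern must match at the start
end="${f/%.bak/}"              # /% : pattern must match at the end
echo "$start"                  # prints summary.txt.bak
echo "$end"                    # prints report.txt
```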
How about sed? Is that what you're looking for?
#!/bin/bash
path="path/to/directory/foo bar"
new_path=$(echo "$path" | sed 's/ /\\ /g')
echo "New Path: '$new_path'"
But as @n0rd pointed out in his comment, it is probably better to just quote the path when you want to use it, something like...
path="path/to/directory/foo bar"
echo "test" > "$path"
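If the goal is to turn a path into text that can be pasted back into a shell command, bash's printf %q is another option (a swapped-in technique, not from the answers above). It escapes every character the shell treats specially, not only spaces, so its output may differ from a plain space replacement. A sketch:

```shell
#!/usr/bin/env bash
path="path/to/directory/foo bar"

# %q produces a form that the shell reads back as the original string.
printf -v escaped '%q' "$path"
echo "$escaped"

# One level of evaluation restores the original value.
eval "roundtrip=$escaped"
echo "$roundtrip"          # prints path/to/directory/foo bar
```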

Resources