How to parse a string into variables?

I know how to parse a string into variables in the manner of this SO question, e.g.
ABCDE-123456
becomes:
var1=ABCDE
var2=123456
via, say, cut. I can do that in one script, no problem.
But I have a few dozen scripts which parse strings/arguments all in the same fashion (same arguments & variables, i.e. same parsing strategy).
And sometimes I need to make a change or add a variable to the parsing mechanism.
Of course, I could go through every one of my dozens of scripts and change the parsing manually (even if just by copy & paste), but that would be tedious and error-prone.
Is there a modular way to parse strings/arguments like this?
I thought of writing a script which parses the string/args into variables and then exports them, but export does not work from child to parent (only vice versa).

Something like this might work:
parse_it () {
SEP=${SEP--}
string=$1
names=${@:2}
IFS="$SEP" read $names <<< "$string"
}
$ parse_it ABCDE-123456 var1 var2
$ echo "$var1"
ABCDE
$ echo "$var2"
123456
$ SEP=: parse_it "foo:bar:baz" id1 id2 id3
$ echo $id2
bar
The first argument is the string to parse, the remaining arguments are names of variables that get passed to read as the variables to set. (Not quoting $names here is intentional, as we will let the shell split the string into multiple words, one per variable. Valid variable names consist of only _, letters, and numbers, so there are no worries about undesired word splitting or pathname generation by not quoting $names). The function assumes the string uses a single separator of "-", which can be overridden via the environment.
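A variation on the same idea, sketched under the assumption that you are happy to require bash: the variable names can be handed to read directly as "${@:2}", which avoids the intermediate unquoted $names string. The name parse_fields and the underscore-prefixed locals are hypothetical; the prefixes only reduce the chance that a caller-supplied name collides with a local (read would then set the local instead).

```shell
#!/usr/bin/env bash
# Hypothetical variant of parse_it: pass the variable names straight to read.
parse_fields() {
  local _pf_sep=${SEP:--}   # separator: "-" unless SEP is set and non-empty
  local _pf_string=$1
  # -r keeps backslashes literal; the here-string supplies the input line.
  # Any extra fields end up in the last named variable.
  IFS="$_pf_sep" read -r "${@:2}" <<< "$_pf_string"
}

parse_fields ABCDE-123456 var1 var2
echo "$var1"   # ABCDE
echo "$var2"   # 123456

SEP=: parse_fields "foo:bar:baz" id1 id2 id3
echo "$id2"    # bar
```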
For more complex parsing, you may want to use a custom regular expression (bash 4.2 or later is required for the -g flag to declare):
parse_it () {
reg_ex=$1
string=$2
shift 2
[[ $string =~ $reg_ex ]] || return
i=1
for name; do
declare -g "$name=${BASH_REMATCH[i++]}"
done
}
$ parse_it '(.*)-(.*):(.*)' "abc-123:xyz" id1 id2 id3
$ echo "$id2"
123
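If declare -g is not available (bash older than 4.2), the regex approach above can be sketched with printf -v instead, which assigns to a variable whose name is held in another variable and exists since bash 3.1. The function name parse_re is hypothetical; as with declare -g, names that are not local to the function are set globally.

```shell
#!/usr/bin/env bash
# Hypothetical bash-3-friendly variant of the regex parser.
parse_re() {
  local _re=$1 _str=$2 _i=1 _name
  shift 2
  [[ $_str =~ $_re ]] || return 1
  for _name; do
    # Assign capture group _i to the variable named by $_name.
    printf -v "$_name" %s "${BASH_REMATCH[_i]}"
    _i=$((_i + 1))
  done
}

parse_re '(.*)-(.*):(.*)' "abc-123:xyz" id1 id2 id3
echo "$id1 $id2 $id3"   # abc 123 xyz
```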

I think what you really want is to write your function in one script and include it in all of your other scripts.
You can include other shell scripts by the source or . command.
For example, you can define your parse function in parseString.sh
function parseString {
...
}
And then in any of your other script, do
source parseString.sh
# now we can call parseString function
parseString abcde-12345
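One practical detail when sourcing: in bash's default (non-POSIX) mode, source with a bare file name searches $PATH first and then the current directory, so scripts run from another directory may not find the library. A common remedy is to locate it relative to the calling script via BASH_SOURCE. The sketch below writes a hypothetical parseString.sh into a temp directory only so that it is self-contained.

```shell
#!/usr/bin/env bash
# Create a throwaway library file (stands in for parseString.sh).
lib_dir=$(mktemp -d) || exit 1
cat > "$lib_dir/parseString.sh" <<'EOF'
parseString() {
  IFS=- read -r var1 var2 <<< "$1"
}
EOF

# In a real script you would typically write:
#   source "$(dirname "${BASH_SOURCE[0]}")/parseString.sh"
source "$lib_dir/parseString.sh"

parseString abcde-12345
echo "$var1 $var2"   # abcde 12345
rm -r "$lib_dir"
```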

Related

Parameter expansion with replacement, avoid additional variable

I'm trying to join input $* which is one parameter consisting of all the parameters added together.
This works.
#!/bin/bash
foo() {
params="${*}"
echo "${params//[[:space:]]/-}"
}
foo 1 2 3 4
1-2-3-4
However, is it possible to skip the assignment of variable?
"${"${*}"//[[:space:]]/-}"
I'm getting bad substitution error.
I can also do
: "${*}"
echo "${_//[[:space:]]/-}"
But it feels hacky.

One option could be to set bash's internal field separator, IFS, to - locally and just echo "$*":
foo() {
local IFS=$'-'
echo "$*"
}
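The local IFS trick above generalizes to a small join helper. This is a sketch with a hypothetical name, join_by; note that "$*" joins with only the first character of IFS, so a multi-character delimiter is truncated to its first character.

```shell
#!/usr/bin/env bash
# join_by DELIM ARGS...: join ARGS with the first character of DELIM.
join_by() {
  local IFS=$1   # local: the caller's IFS is untouched
  shift
  echo "$*"
}

join_by - 1 2 3 4   # 1-2-3-4
join_by , a b c     # a,b,c
```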

To answer your question, you can do global pattern substitutions on the positional parameters like this:
${*//pat/sub}
${@//pat/sub}
And also arrays like this:
${arr[*]//pat/sub}
${arr[@]//pat/sub}
This won’t join the parameters, but substitute inside them.
Setting IFS to dash adds a dash in between each parameter for echo "$*", or p=$*, but won’t replace anything inside a parameter.
Eg:
$ set -- aa bb 'cc cc'
$ IFS=-
$ echo "$*"
aa-bb-cc cc
To replace all whitespace with dashes, including whitespace inside a parameter, you can combine them:
IFS=-
echo "${*//[[:space:]]/-}"
Or just assign to a name first, like you were doing:
no_spaces=$*
echo "${no_spaces//[[:space:]]/-}"
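To make the difference between substituting and joining concrete, here is a small sketch on a hypothetical parameter list: "${@//...}" rewrites inside each parameter but keeps them as separate words, while "${*//...}" rewrites and then joins with the first character of IFS.

```shell
#!/usr/bin/env bash
set -- aa 'b b' 'c  c'

# Substitution inside each parameter; the parameters stay separate.
printf '<%s>\n' "${@//[[:space:]]/_}"   # <aa> <b_b> <c__c>, one per line
per_item=$(printf '<%s>' "${@//[[:space:]]/_}")

# Substitution, then joining with IFS=- (inside a subshell, so IFS is restored).
joined=$(IFS=-; echo "${*//[[:space:]]/-}")
echo "$joined"                          # aa-b-b-c--c
```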

read multiple values from a property file using bash shell script

I would like to read multiple values from a property file using a shell script.
My properties file looks something like the one below. I structured it this way so that if more students join in the future, I only need to add them to the properties file without changing anything in the shell script.
student.properties
total_student=6
student_name_1="aaaa"
student_name_2="bbbb"
student_name_3="cccc"
student_name_4="dddd"
student_name_5="eeee"
When I run the script below, I'm not getting the desired output when reading the student names from the properties file.
student.sh
#!/bin/bash
. /student.properties
i=1
while [ $i -lt $total_student ]
do
{
std_Name=$student_name_$i
echo $std_Name
#****** my logic *******
} || {
echo "ERROR..."
}
i=`expr $i + 1`
done
Output is something like this
1
2
3
4
5
I understand the script is not getting anything for $student_name_, hence only the $i value is printed.
So I would like to know how to read values from the properties file.

You can do variable name interpolation with ${!foo}. If $foo is "bar", then ${!foo} gives you the value of $bar. In your code that means changing
std_Name=$student_name_$i
to
var=student_name_$i
std_Name=${!var}
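Putting the indirect-expansion fix into the question's loop gives roughly the sketch below. It writes a hypothetical properties file to a temp path instead of /student.properties, and it uses -le on the assumption that total_student equals the number of names (the question's file lists 5 names with total_student=6, which is why -lt happened to work there).

```shell
#!/usr/bin/env bash
props=$(mktemp) || exit 1
cat > "$props" <<'EOF'
total_student=3
student_name_1="aaaa"
student_name_2="bbbb"
student_name_3="cccc"
EOF
. "$props"

i=1
while [ "$i" -le "$total_student" ]; do   # assumes total_student == number of names
  var=student_name_$i    # the variable's name, e.g. student_name_1
  std_Name=${!var}       # indirect expansion: that variable's value
  echo "$std_Name"
  i=$((i + 1))
done
rm -f "$props"
```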
Alternatively, you could store the names in an array. Then you wouldn't have to do any parsing.
student.properties
student_names=("aaaa" "bbbb" "cccc" "dddd" "eeee")
student.sh
#!/bin/bash
. /student.properties
for student_name in "${student_names[@]}"; do
...
done
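A self-contained version of the array approach, again using a hypothetical temp file in place of /student.properties. The count comes from ${#student_names[@]}, so no separate total_student is needed.

```shell
#!/usr/bin/env bash
props=$(mktemp) || exit 1
cat > "$props" <<'EOF'
student_names=("aaaa" "bbbb" "cccc" "dddd" "eeee")
EOF
. "$props"

for student_name in "${student_names[@]}"; do
  echo "$student_name"
done
echo "total: ${#student_names[@]}"   # total: 5
rm -f "$props"
```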

You can use indirect expansion:
std_Name=student_name_$i
echo "${!std_Name}"
The expression ${!var} basically evaluates the variable twice:
first evaluation: student_name_1
second evaluation: aaaa (the value of student_name_1)
Note that this is rarely a good idea and that using an array is almost always preferred.

Function with awk to print single or multiple columns

I use awk a lot to select single columns and after learning what an alias was I started off with
alias a1='awk '\''{print $1}'\'''
alias a2='awk '\''{print $2}'\'''
...
After I learned a little more I thought those were cheesy and replaced them with
function a() {
awk "{print \$$1}"
}
so now I can do a 3 or a 11 without needing to create explicit aliases.
So that's good, but sometimes I need to select more than one column, and when I do I have to resort to typing out the actual full awk '{print ...}' command (the horror!).
So I'm trying to come up with a way to do something similar to the a function but one that will accept different numbers of arguments, so I could do a 3 or a 5 7 or a 2 4 9.
I've tried different things with $# and $* but can't get it right, and everything I'm trying now I know is a cheesy workaround, so I'd rather just stop and ask how to do it the proper way.
Thanks all.

$ cat tst.sh
function a {
awk -v args="$*" '
BEGIN { n=split(args,f) }
{ for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS) }
'
}
echo "a b c d e f" | a 1 3 5
echo "---"
echo "a b c d e f" | a 1 3 4 6
$ ./tst.sh
a c e
---
a c d f

You could get arbitrary complicated with this sort of thing (what if you wanted to be able to say a 2-5 7 11-, as with cut?) but here's one that will work with a list of numbers:
a() { (IFS=,; awk '{print '"${*/#/$}"'}'); }
That requires a bit of explanation.
a() { ... }
defines a shell function, which differs from an alias in various ways, one of which being that you can give it parameters.
Inside the shell function, I want to change the value of IFS; to avoid having to remember the old value and change it back, I surround the command I actually want to execute with (...), which causes it to execute in a subshell. When the subshell finishes, any changes it made to the shell environment are discarded, so it effectively makes the change to IFS local.
IFS is the set of characters used for word splitting, but it also defines the character used to separate elements in the expansion of "$*" (that is, the list of function or script arguments) when it is surrounded by quotes. So setting it to , means the $* expansion will be a comma-separated list.
The awk program I want to create is actually something like {print $1,$4,$7}, so aside from putting commas between the list, I need to add a $ before each number. I do that with the bash parameter expansion substitute syntax: ${parameter/pattern/replacement}. By specifying * as the parameter, I get $* with the substitution applied to each argument. (Note that the expansion is quoted. If it weren't, it wouldn't work.)
In the replacement expression, the pattern is empty; the # character at the beginning of the pattern indicates that the match must be at the beginning of the string. Since the actual pattern is empty, the first match will always be at the beginning of the string, so the replacement ($) is inserted at the beginning of each argument. The # is needed because // is syntactically different: it means "change all occurrences of the pattern", instead of just the first one.
Unlike many languages, in bash search-and-replace expressions are not terminated with a /, but rather with the matching }. If you type ${p/foo/bar/}, it will replace the first instance of foo with bar/.
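To check what the expansion actually produces, a sketch can print the generated awk program instead of running it. show_prog is a hypothetical debugging helper that uses exactly the same quoting as a.

```shell
#!/usr/bin/env bash
a() { (IFS=,; awk '{print '"${*/#/$}"'}'); }

# Hypothetical helper: print the awk program that a would run.
show_prog() { (IFS=,; printf '%s\n' '{print '"${*/#/$}"'}'); }

show_prog 1 4 7                   # {print $1,$4,$7}
echo "a b c d e f g" | a 1 4 7    # a d g
```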

Create associative array in bash 3

After thoroughly searching for a way to create an associative array in bash, I found that declare -A array will do the trick. But the problem is, it is only for bash version 4 and the bash version the server has in our system is 3.2.16.
How can I achieve some sort of associative array-like hack in bash 3? The values will be passed to a script like
ARG=array[key];
./script.sh ${ARG}
EDIT: I know that I can do this in awk, or other tools but strict bash is needed for the scenario I am trying to solve.

Bash 3 has no associative arrays, so you're going to have to use some other language feature(s) for your purpose. Note that even under bash 4, the code you wrote doesn't do what you claim it does: ./script.sh ${ARG} does not pass the associative array to the child script, because ${ARG} expands to nothing when ARG is an associative array. You cannot pass an associative array to a child process, you need to encode it anyway.
You need to define some argument passing protocol between the parent script and the child script. A common one is to pass arguments in the form key=value. This assumes that the character = does not appear in keys.
You also need to figure out how to represent the associative array in the parent script and in the child script. They need not use the same representation.
A common method to represent an associative array is to use separate variables for each element, with a common naming prefix. This requires that the key name only consists of ASCII letters (of either case), digits and underscores. For example, instead of ${myarray[key]}, write ${myarray__key}. If the key is determined at run time, you need a round of expansion first: instead of ${myarray[$key]}, write
n=myarray__${key}; echo "${!n}"
For an assignment, use printf -v. Note the %s format, which makes printf use the specified value verbatim. Do not write printf -v "myarray__${key}" "$value", since that would treat $value as a format and perform printf % expansion on it.
printf -v "myarray__${key}" %s "$value"
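The get/set pattern above can be wrapped in two small helpers. This is a sketch; assoc_set and assoc_get are hypothetical names, and keys must consist only of letters, digits and underscores. The first call shows that a % in the value stays literal because it is passed as an argument, not as the format.

```shell
#!/usr/bin/env bash
# assoc_set ARRAY KEY VALUE / assoc_get ARRAY KEY (hypothetical helpers)
assoc_set() { printf -v "${1}__$2" %s "$3"; }
assoc_get() { local _n="${1}__$2"; printf '%s\n' "${!_n}"; }

assoc_set myarray color 'blue %s'   # %s is data here, not a format
assoc_set myarray size 42
assoc_get myarray color             # blue %s
assoc_get myarray size              # 42
```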
If you need to pass an associative array represented like this to a child process with the key=value argument representation, you can use ${!myarray__*} to enumerate over all the variables whose name begins with myarray__.
args=()
for k in ${!myarray__*}; do
n=$k
args+=("${k#myarray__}=${!n}")
done
In the child process, to convert arguments of the form key=value to separate variables with a prefix:
for x; do
if [[ $x != *=* ]]; then echo 1>&2 "KEY=VALUE expected, but got $x"; exit 120; fi
printf -v "myarray__${x%%=*}" %s "${x#*=}"
done
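An end-to-end sketch of that protocol, with the child process simulated by bash -c so it is self-contained. The prefix is stripped when encoding, so the child receives plain key=value pairs; the child__ prefix is a hypothetical choice for the receiving side.

```shell
#!/usr/bin/env bash
# Parent: the "associative array" is a set of myarray__<key> variables.
myarray__cow=moo
myarray__bird=chirp

# Encode as key=value arguments (prefix stripped from each name).
args=()
for k in ${!myarray__*}; do
  args+=("${k#myarray__}=${!k}")
done

# Child (simulated): decode key=value into child__<key> variables.
out=$(bash -c '
  for x; do
    if [[ $x != *=* ]]; then echo >&2 "KEY=VALUE expected, but got $x"; exit 120; fi
    printf -v "child__${x%%=*}" %s "${x#*=}"
  done
  echo "$child__cow $child__bird"
' child "${args[@]}")
echo "$out"   # moo chirp
```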
By the way, are you sure that this is what you need? Instead of calling a bash script from another bash script, you might want to run the child script in a subshell instead. That way it would inherit from all the variables of the parent.

Here is another post/explanation on associative arrays in bash 3 and older using parameter expansion:
https://stackoverflow.com/a/4444841
Gilles' method has a nice if statement to catch delimiter issues, sanitize oddball input, etc. Use that.
If you are somewhat familiar with parameter expansion:
http://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
To use in your scenario [ as stated: sending to script ]:
Script 1:
sending_array.sh
# A pretend Python dictionary with bash 3
ARRAY=( "cow:moo"
"dinosaur:roar"
"bird:chirp"
"bash:rock" )
bash ./receive_arr.sh "${ARRAY[@]}"
Script 2: receive_arr.sh
argAry1=("$@")
function process_arr () {
declare -a hash=("${!1}")
for animal in "${hash[@]}"; do
echo "Key: ${animal%%:*}"
echo "Value: ${animal#*:}"
done
}
process_arr argAry1[@]
exit 0
Method 2, sourcing the second script:
Script 1:
sending_array.sh
source ./receive_arr.sh
# A pretend Python dictionary with bash 3
ARRAY=( "cow:moo"
"dinosaur:roar"
"bird:chirp"
"bash:rock" )
process_arr ARRAY[@]
Script 2: receive_arr.sh
function process_arr () {
declare -a hash=("${!1}")
for animal in "${hash[@]}"; do
echo "Key: ${animal%%:*}"
echo "Value: ${animal#*:}"
done
}
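With the key:value representation, a single lookup by key only needs a linear scan. This is a minimal sketch; lookup is a hypothetical name, and it assumes keys never contain a colon.

```shell
#!/usr/bin/env bash
# lookup KEY PAIRS...: print the value for KEY, or return 1 if absent.
lookup() {
  local key=$1 pair
  shift
  for pair in "$@"; do
    if [[ ${pair%%:*} == "$key" ]]; then
      printf '%s\n' "${pair#*:}"
      return 0
    fi
  done
  return 1
}

ARRAY=("cow:moo" "dinosaur:roar" "bird:chirp" "bash:rock")
lookup bird "${ARRAY[@]}"                     # chirp
lookup cat "${ARRAY[@]}" || echo "no such key"
```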
References:
Passing arrays as parameters in bash

If you don't want to handle a lot of variables, or your keys are not valid variable identifiers, and your array is guaranteed to have fewer than 256 items (a function's return status is limited to 0-255), you can abuse function return values. This solution requires neither a subshell, since the value is readily available as a variable, nor any iteration, so performance is excellent. It is also very readable, almost like the Bash 4 version.
Here's the most basic version:
hash_index() {
case $1 in
'foo') return 0;;
'bar') return 1;;
'baz') return 2;;
esac
}
hash_vals=("foo_val"
"bar_val"
"baz_val");
hash_index "foo"
echo ${hash_vals[$?]}
More details and variants in this answer
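One caveat with the basic version: when no pattern matches, case returns status 0, so an unknown key silently yields foo_val. The sketch below adds a hypothetical sentinel status (255, which also lowers the usable item count to 255) and a hypothetical hash_get wrapper.

```shell
#!/usr/bin/env bash
hash_index() {
  case $1 in
    'foo') return 0 ;;
    'bar') return 1 ;;
    'baz') return 2 ;;
    *)     return 255 ;;   # hypothetical "not found" sentinel
  esac
}
hash_vals=("foo_val" "bar_val" "baz_val")

# hash_get KEY: print the value, or return 1 for an unknown key.
hash_get() {
  local idx=0
  hash_index "$1" || idx=$?   # "|| idx=$?" also keeps this safe under set -e
  if (( idx == 255 )); then
    return 1
  fi
  printf '%s\n' "${hash_vals[idx]}"
}

hash_get bar                       # bar_val
hash_get qux || echo "qux: not found"
```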

You can write the key-value pairs to a file and then grep by key. If you use a pattern like
key=value
then you can grep for ^key= which makes this pretty safe.
To "overwrite" a value, just append the new value at the end of the file and use tail -n 1 to get only the last match.
Alternatively, you can put this information into a normal array, using key=value as the value of each element, and then iterate over the array to find the value.
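A minimal sketch of the file-based approach; kv_set and kv_get are hypothetical names, the store is a temp file removed on exit, and keys are assumed to contain no regex metacharacters. cut -f2- keeps any = that appears inside a value.

```shell
#!/usr/bin/env bash
store=$(mktemp) || exit 1
trap 'rm -f "$store"' EXIT

kv_set() { printf '%s=%s\n' "$1" "$2" >> "$store"; }            # append key=value
kv_get() { grep "^$1=" "$store" | tail -n 1 | cut -d= -f2-; }   # last match wins

kv_set cow moo
kv_set bird chirp
kv_set cow MOO     # "overwrite" by appending a newer line
kv_get cow         # MOO
kv_get bird        # chirp
```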

This turns out to be ridiculously easy. I had to convert a bash 4 script that used a bunch of associative arrays to bash 3. These two helper functions did it all:
array_exp() {
exp=${@//[/__}
eval "${exp//]}"
}
array_clear() {
unset $(array_exp "echo \${!$1__*}")
}
I'm flabbergasted that this actually works, but that's the beauty of bash.
E.g.
((all[ping_lo] += counts[ping_lo]))
becomes
array_exp '((all[ping_lo] += counts[ping_lo]))'
Or this print statement:
printf "%3d" ${counts[ping_lo]} >> $return
becomes
array_exp 'printf "%3d" ${counts[ping_lo]}' >> $return
The only syntax that changes is clearing. This:
counts=()
becomes
array_clear counts
and you're set. You could easily tell array_exp to recognize expressions like "=()" and handle them by rewriting them as array_clear expressions, but I prefer the simplicity of the above two functions.

Capturing multiple line output into a Bash variable

I've got a script 'myscript' that outputs the following:
abc
def
ghi
in another script, I call:
declare RESULT=$(./myscript)
and $RESULT gets the value
abc def ghi
Is there a way to store the result either with the newlines, or with '\n' character so I can output it with 'echo -e'?

Actually, RESULT contains what you want — to demonstrate:
echo "$RESULT"
What you show is what you get from:
echo $RESULT
As noted in the comments, the difference is that (1) the double-quoted version of the variable (echo "$RESULT") preserves internal spacing of the value exactly as it is represented in the variable — newlines, tabs, multiple blanks and all — whereas (2) the unquoted version (echo $RESULT) replaces each sequence of one or more blanks, tabs and newlines with a single space. Thus (1) preserves the shape of the input variable, whereas (2) creates a potentially very long single line of output with 'words' separated by single spaces (where a 'word' is a sequence of non-whitespace characters; there needn't be any alphanumerics in any of the words).
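A quick sketch of the difference, with a function standing in for ./myscript:

```shell
#!/usr/bin/env bash
myscript() { printf 'abc\ndef\nghi\n'; }   # stand-in for ./myscript

RESULT=$(myscript)
echo $RESULT      # unquoted: word-split, prints "abc def ghi" on one line
echo "$RESULT"    # quoted: prints the three lines unchanged
```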

Another pitfall with this is that command substitution — $() — strips trailing newlines. Probably not always important, but if you really want to preserve exactly what was output, you'll have to use another line and some quoting:
RESULTX="$(./myscript; echo x)"
RESULT="${RESULTX%x}"
This is especially important if you want to handle all possible filenames (to avoid undefined behavior like operating on the wrong file).
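A sketch that shows the stripping and the sentinel fix side by side; the stand-in myscript ends its output with three newlines.

```shell
#!/usr/bin/env bash
myscript() { printf 'abc\n\n\n'; }   # stand-in: output ends in 3 newlines

plain=$(myscript)                    # $() strips all trailing newlines
RESULTX=$(myscript; echo x)          # the sentinel "x" protects them
RESULT=${RESULTX%x}                  # remove only the sentinel
echo "${#plain} ${#RESULT}"          # 3 6
```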

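In case you're interested in specific lines, use a result array. Note that this splits the output on any whitespace, not only on newlines, and also performs pathname expansion, so each element is one line only if no line contains spaces or glob characters:
declare RESULT=($(./myscript)) # (..) = array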
echo "First line: ${RESULT[0]}"
echo "Second line: ${RESULT[1]}"
echo "N-th line: ${RESULT[N]}"
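When lines may contain spaces, a read loop gives one element per line and also works in bash 3 (mapfile needs bash 4). A sketch with a stand-in myscript:

```shell
#!/usr/bin/env bash
myscript() { printf '%s\n' 'first line' 'second line'; }   # stand-in

words=($(myscript))       # splits on spaces too: 4 elements
echo "${#words[@]}"       # 4

lines=()
while IFS= read -r line; do
  lines+=("$line")
done < <(myscript)
echo "${#lines[@]}"                # 2
echo "Second line: ${lines[1]}"    # Second line: second line
```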

In addition to the answer given by @l0b0, I just had the situation where I needed to both keep any trailing newlines output by the script and check the script's return code.
And the problem with l0b0's answer is that the 'echo x' was resetting $? back to zero... so I managed to come up with this very cunning solution:
RESULTX="$(./myscript; echo x$?)"
RETURNCODE=${RESULTX##*x}
RESULT="${RESULTX%x*}"

Parsing multiple output
Introduction
So your myscript, which outputs 3 lines, could look like:
myscript() { echo $'abc\ndef\nghi'; }
or
myscript() { local i; for i in abc def ghi ;do echo $i; done ;}
OK, this is a function, not a script (no need for the ./ path), but the output is the same:
myscript
abc
def
ghi
Considering result code
To check the result code, the test function becomes:
myscript() { local i;for i in abc def ghi ;do echo $i;done;return $((RANDOM%128));}
1. Storing multiple output in one single variable, showing newlines
Your operation is correct:
RESULT=$(myscript)
About result code, you could add:
RCODE=$?
even in same line:
RESULT=$(myscript) RCODE=$?
Then
echo $RESULT $RCODE
abc def ghi 66
echo "$RESULT"
abc
def
ghi
echo ${RESULT@Q}
$'abc\ndef\nghi'
printf '%q\n' "$RESULT"
$'abc\ndef\nghi'
but for showing variable definition, use declare -p:
declare -p RESULT RCODE
declare -- RESULT="abc
def
ghi"
declare -- RCODE="66"
2. Parsing multiple output in array, using mapfile
Storing answer into myvar variable:
mapfile -t myvar < <(myscript)
echo ${myvar[2]}
ghi
Showing $myvar:
declare -p myvar
declare -a myvar=([0]="abc" [1]="def" [2]="ghi")
Considering result code
In case you have to check for result code, you could:
RESULT=$(myscript) RCODE=$?
mapfile -t myvar <<<"$RESULT"
declare -p myvar RCODE
declare -a myvar=([0]="abc" [1]="def" [2]="ghi")
declare -- RCODE="40"
3. Parsing multiple output with consecutive reads in a command group
{ read firstline; read secondline; read thirdline;} < <(myscript)
echo $secondline
def
Showing variables:
declare -p firstline secondline thirdline
declare -- firstline="abc"
declare -- secondline="def"
declare -- thirdline="ghi"
I often use:
{ read foo;read foo total use free foo ;} < <(df -k /)
Then
declare -p use free total
declare -- use="843476"
declare -- free="582128"
declare -- total="1515376"
Considering result code
Same prepended step:
RESULT=$(myscript) RCODE=$?
{ read firstline; read secondline; read thirdline;} <<<"$RESULT"
declare -p firstline secondline thirdline RCODE
declare -- firstline="abc"
declare -- secondline="def"
declare -- thirdline="ghi"
declare -- RCODE="50"

After trying most of the solutions here, the easiest thing I found was the obvious - using a temp file. I'm not sure what you want to do with your multiple line output, but you can then deal with it line by line using read. About the only thing you can't really do is easily stick it all in the same variable, but for most practical purposes this is way easier to deal with.
./myscript.sh > /tmp/foo
while IFS= read -r line ; do
echo "whatever you want to do with $line"
done < /tmp/foo
Quick hack to make it do the requested action:
result=""
./myscript.sh > /tmp/foo
while IFS= read -r line ; do
result="$result$line\n"
done < /tmp/foo
echo -e "$result"
Note this adds an extra line. If you work on it you can code around it, I'm just too lazy.
EDIT: While this case works perfectly well, people reading this should be aware that a command inside the while loop can easily consume your stdin, giving you a script that processes one line, drains stdin, and exits. ssh, for example, reads from stdin and can do this. Other code examples here: https://unix.stackexchange.com/questions/24260/reading-lines-from-a-file-with-bash-for-vs-while
One more time! This time with a different filehandle (stdin, stdout, stderr are 0-2, so we can use &3 or higher in bash).
result=""
./test>/tmp/foo
while IFS= read -r line <&3; do
result="$result$line\n"
done 3</tmp/foo
echo -e "$result"
You can also use mktemp, but this is just a quick code example. Usage of mktemp looks like:
filenamevar=$(mktemp /tmp/tempXXXXXX)
./test > "$filenamevar"
Then use $filenamevar like you would the actual name of a file. Probably doesn't need to be explained here, but someone complained in the comments.
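Combining those pieces, a sketch of the temp-file approach that removes the file automatically on exit and reads on file descriptor 3 so commands in the loop cannot swallow the lines. The printf line is a stand-in for ./myscript.sh; result keeps real newlines rather than literal \n sequences, so no echo -e is needed.

```shell
#!/usr/bin/env bash
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT

printf 'abc\ndef\nghi\n' > "$tmp"    # stand-in for: ./myscript.sh > "$tmp"

result=
while IFS= read -r line <&3; do
  result+="$line"$'\n'               # append the line plus a real newline
done 3< "$tmp"
printf '%s' "$result"
```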

How about this? It reads each line into a variable, which can then be used subsequently.
Say myscript's output is redirected to a file called myscript_output:
awk 'BEGIN { while ((getline var < "myscript_output") > 0) { print var } close("myscript_output") }'
