How to iterate a dictionary through indirection - bash

I want to implement the following pseudocode in bash:
function gen_items() {
    dict=$1 # $1 is a name of a dictionary declared globally
    for key in $dict[@]
    do
        echo $key ${dict[$key]}
        # process the key and its value in the dictionary
    done
}
The best I have come up with is
function gen_items() {
    dict=$1
    tmp="${dict}[@]"
    for key in "${!tmp}"
    do
        echo $key
    done
}
This actually only gets the values from the dictionary, but I need the keys as well.

Use a nameref:
show_dict() {
    (( BASH_VERSINFO[0] < 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] < 3) )) &&
        { printf '%s\n' "Need Bash version 4.3 or above" >&2; exit 1; }
    declare -n hash=$1
    for key in "${!hash[@]}"; do
        echo key=$key
    done
}
declare -A h
h=([one]=1 [two]=2 [three]=3)
show_dict h
Output:
key=two
key=three
key=one
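For Bash older than 4.3 (no namerefs, but associative arrays exist from 4.0 on), one workaround is to build the loop with eval. This is an editorial sketch, not part of the original answer, and needs the usual care that the passed name is a trusted variable name:
show_dict_compat() {
    local name=$1 key
    # with name=h, eval runs:  for key in "${!h[@]}"; do echo "key=$key value=${h[$key]}"; done
    eval "for key in \"\${!${name}[@]}\"; do echo \"key=\$key value=\${${name}[\$key]}\"; done"
}
declare -A h=([one]=1 [two]=2 [three]=3)
show_dict_compat h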
See:
How can I use variable variables (indirect variables, pointers, references) or associative arrays?
Shell Parameters

Related

BASH assign value in an ARRAY dynamically

I want to assign the values of an array dynamically, so that when I change the variable the array changes automatically.
var=5
arr=( $(($var-1)) $var $(($var+1)) )
echo ${arr[@]}
var=10
echo ${arr[@]}
result
4 5 6
4 5 6
I wanted
4 5 6
9 10 11
Try something like this:
v1=abc
v2=efd
v3=ghj
arr=( v1 v2 v3 )
for i in ${arr[@]}; { echo ${!i}; }
abc
efd
ghj
Now let's change the values of those vars:
v1=123
v2=456
v3=789
$ for i in ${arr[@]}; { echo ${!i}; }
123
456
789
Elaborating on Ivan's trick and applying get/set "method"-style functions:
[P2759474@sdp-bastion ~]$ cat tst
#! /bin/bash
arr=( v1 v2 v3 )
v1(){ (($1))&& var=$(($1+1)); echo $((var-1)); }
v2(){ (($1))&& var=$1; echo $var; }
v3(){ (($1))&& var=$(($1-1)); echo $((var+1)); }
var=5
for i in ${arr[@]}; { printf "%s=" $i; $i; }
v1 14
for i in ${arr[@]}; { printf "%s=" $i; $i; }
[P2759474@sdp-bastion ~]$ ./tst
v1=4
v2=5
v3=6
14
v1=14
v2=15
v3=16
While such automatic updates are perfectly common in spreadsheet applications, Bash doesn’t work that way. If you want to recalculate a few values based on formulas, you have to explicitly tell Bash to do so. The example below generates the outputs that you expect and defines a function called recalculate that you need to call after changes to the formulas or the formulas’ input variables. The rest of the trick is based around how integer evaluation works in Bash.
recalculate() {
    local -n source="$1" target="$2"
    target=("${source[@]}")
}
formulas=('var - 1' 'var' 'var + 1')
declare -ai arr
var=5
recalculate formulas arr
echo "${arr[@]}" # 4 5 6
var=10
recalculate formulas arr
echo "${arr[@]}" # 9 10 11
(It would be awesome if Bash had an additional pseudo-signal for the trap command, say assignment, which could work like trap 'echo "variable ${1} set to ${!1}"' assignment, but AFAIK there is no such functionality, and trap does not pass separate arguments that way either. Without that kind of functionality, a function like recalculate might be the closest you can get to the updates you asked for.)
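As a side note on the "integer evaluation" part: assigning a string to a variable declared with -i makes Bash evaluate the string as arithmetic at assignment time, which is exactly what happens when recalculate copies the formula strings into the integer array. A minimal illustration:
var=5
declare -i n
n='var + 1'   # evaluated arithmetically because n was declared with -i
echo "$n"     # 6
var=10
n='var + 1'   # must be re-assigned; n does not track later changes to var by itself
echo "$n"     # 11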
A slightly more elaborate version of recalculate could also (1) handle sparse arrays of formulas correctly, i.e. guarantee to store results under the same indices under which the corresponding formulas were found, and (2) introduce a “reserved” variable name, say index, which can occur in the formulas with an obvious meaning. Just for fun, “because we can”:
recalculate() {
    local -n source="$1" target="$2"
    local -i index
    target=()
    for index in "${!source[@]}"; do
        target[index]="${source[index]}"
    done
}
formulas[1]='index * var - 1'
formulas[3]='index * var'
formulas[5]='index * var + 1'
declare -ai arr
var=5
recalculate formulas arr
echo "${arr[@]@A}" # declare -ai arr=([1]="4" [3]="15" [5]="26")
var=10
recalculate formulas arr
echo "${arr[@]@A}" # declare -ai arr=([1]="9" [3]="30" [5]="51")

Processing a delimited line in bash

I am given a single line of input with 'n' arguments which are space-delimited. The input arguments themselves are variable. The input is given through an external file.
I want to move specific elements to variables depending on regular expressions. As such, I was thinking of declaring a pointer variable first to keep track of where on the line I am. In addition, the assignment to variable is independent of numerical order, and depending on input some variables may be skipped entirely.
My current method is to use
awk '{print $1}' file.txt
However, not all elements are fixed and I need to account for elements that may be absent, or may have multiple entries.
UPDATE: I found another method.
file=$(cat /file.txt)
for i in ${file[@]}; do
echo $i >> split.txt;
done
This way, instead of a single line with multiple arguments, we get multiple lines with a single argument each. As such, we can now use something like var#=$(grep --regexp="[pattern]" split.txt). Now I just need to figure out how best to use regular expressions to filter this mess.
Let me take an example.
My input strings are:
RON KKND 1534Z AUTO 253985G 034SRT 134OVC 04/32
RON KKND 5256Z 143623G72K 034OVC 074OVC 134SRT 145PRT 13/00
RON KKND 2234Z CON 342523G CLS 01/M12 RMK
So the variable assignment for each of the above would be:
var1=RON var2=KKND var3=1534Z var4=TRUE var5=FALSE var6=253985G varC=2 varC1=034SRT varC2=134OVC var7=04/32
var1=RON var2=KKND var3=5256Z var4=FALSE var5=FALSE var6=143623G72K varC=4 varC1=034OVC varC2=074OVC varC3=134SRT varC4=145PRT var7=13/00
var1=RON var2=KKND var3=2234Z var4=FALSE var5=TRUE var6=342523G varC=0 var7=01/M12
So, the fourth argument might be var4, var5, or var6.
The fifth argument might be var5, var6, or match another criteria.
The sixth argument may or may not be var6. Between var6 and var7 can be determined by matching each argument with */*
Boiling this down even more, the positions of var1, var2 and var3 in the input are fixed, but after that I need to compare, order, and assign. In addition, the arguments themselves can vary in character length. The relative position of each section to be divided is fixed in relation to its neighbors: var7 will never come before var6 in the input, for example, and if var4 and var5 are true, then the 4th and 5th arguments will always be 'AUTO CON'. Some segments will always be one argument, and others more than one. The relative position of each is known. As for each pattern, some have a specific character in a specific location, while others have no flag at all identifying what they are aside from their position in the sequence.
So I need awk to recognize a pointer variable, as every argument needs to be checked until a specific match is found:
#Check to see if var4 or var5 exists. if so, flag and increment pointer
pointer=4
if (awk '{print $$pointer}' file.txt) == "AUTO" ; then
var4="TRUE"
pointer=$pointer+1
else
var4="FALSE"
fi
if (awk '{print $$pointer}' file.txt) == "CON" ; then
var5="TRUE"
pointer=$pointer+1
else
var5="FALSE"
fi
#position of var6 is fixed once var4 and var5 are determined
var6=$(awk '{print $$pointer}' file.txt)
pointer=$pointer+1
#Count the arguments between var6 and var7 (there may be up to ten)
#and separate each to decode later. varC[0-9] is always three upcase
# letters followed by three numbers. Use this counter later when decoding.
varC=0
until (awk '{print $$pointer}' file.txt) == "*/*" ; do
varC($varC+1)=(awk '{print $$pointer}' file.txt)
varC=$varC+1
pointer=$pointer+1
done
#position of var7 is fixed after all arguments of varC are handled
var7=$(awk '{print $$pointer}' file.txt)
pointer=$pointer+1
I know the above syntax is incorrect. The question is how do I fix it.
var7 is not always at the end of the input line. Arguments after var7 however do not need to be processed.
I haven't actually gotten to interpreting the patterns yet. I intend to handle that using case statements, comparing the variables against patterns. I don't want to use awk to interpret the patterns directly, as that would get very messy. I have contemplated using for n in $string, but that would mean comparing every argument against every possible combination directly (and there are multiple segments, each with multiple patterns), which is impractical. I'm trying to make this a two-step process.
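(Editorial sketch, not from the original post: the two-step, per-token classification described above can be prototyped with a case statement. The patterns below are only guesses based on the sample data, not a full specification.)
classify() {
    case $1 in
        AUTO|CON)                        echo "modifier (var4/var5)" ;;
        [0-9][0-9][0-9][A-Z][A-Z][A-Z])  echo "varC-style group" ;;
        */*)                             echo "var7-style group" ;;
        *)                               echo "other" ;;
    esac
}
classify 034SRT   # varC-style group
classify 04/32    # var7-style group
classify AUTO     # modifier (var4/var5)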
Please try the following:
#!/bin/bash
# template for variable names
declare -a namelist1=( "var1" "var2" "var3" "var4" "var5" "var6" "varC" )
declare -a ary
# read each line and assign ary to the elements
while read -r -a ary; do
    if [[ ${ary[3]} = AUTO ]]; then
        ary=( "${ary[@]:0:3}" "TRUE" "FALSE" "${ary[4]}" "" "${ary[@]:5:3}" )
    elif [[ ${ary[3]} = CON ]]; then
        ary=( "${ary[@]:0:3}" "FALSE" "TRUE" "${ary[4]}" "" "${ary[@]:5:3}" )
    else
        ary=( "${ary[@]:0:3}" "FALSE" "FALSE" "${ary[3]}" "" "${ary[@]:4:5}" )
    fi
    # initial character of the 7th element
    ary[6]=${ary[7]:0:1}
    # locate the index of the */* entry in ary and adjust the variable names
    for (( i=0; i<${#ary[@]}; i++ )); do
        if [[ ${ary[$i]} == */* ]]; then
            declare -a namelist=( "${namelist1[@]}" )
            for (( j=1; j<=i-7; j++ )); do
                namelist+=( "$(printf "varC%d" "$j")" )
            done
            namelist+=( "var7" )
        fi
    done
    # assign variables to array elements
    for (( i=0; i<${#ary[@]}; i++ )); do
        # echo -n "${namelist[$i]}=${ary[$i]} " # for debugging
        declare -n p="${namelist[$i]}"
        p="${ary[$i]}"
    done
    # echo "var1=$var1 var2=$var2 var3=$var3 ..." # for debugging
done < file.txt
Note that the script above just assigns bash variables and does not print anything
unless you explicitly echo or printf the variables.
Updated: this answer shows how to decide a variable's value based on pattern matching, applied multiple times.
There is one code block in pure bash and another in the gawk manner.
The bash code block requires array support, which is not available in very early versions, and grep is also required to do the pattern matching.
It was tested with GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu) and grep (GNU grep) 2.20.
I stick to printf rather than echo, after learning why-is-printf-better-than-echo, and when using bash I consider it good practice to be more defensive.
#!/bin/bash
declare -ga outVars
declare -ga lineBuf
declare -g NF
#force valid index starts from 1
#consistent with var* name pattern
outVars=(unused var1 var2 var3 var4 var5 var6 varC var7)
((numVars=${#outVars[@]} - 1))
declare -gr numVars
declare -r outVars
function e_unused {
return
}
function e_var1 {
printf "%s" "${lineBuf[1]}"
}
function e_var2 {
printf "%s" "${lineBuf[2]}"
}
function e_var3 {
printf "%s" "${lineBuf[3]}"
}
function e_var4 {
if [ "${lineBuf[4]}" == "AUTO" ] ;
then
printf "TRUE"
else
printf "FALSE"
fi
}
function e_var5 {
if [ "${lineBuf[4]}" == "CON" ] ;
then
printf "TRUE"
else
printf "FALSE"
fi
}
function e_varC {
local var6_idx=4
if [ "${lineBuf[4]}" == "AUTO" -o "${lineBuf[4]}" == "CON" ] ;
then
var6_idx=5
fi
local var7_idx=$NF
local i
local count=0
for ((i=NF;i>=1;i--));
do
if [ $(grep -cE '^.*/.*$' <<<${lineBuf[$i]}) -eq 1 ];
then
var7_idx=$i
break
fi
done
((varC = var7_idx - var6_idx - 1))
if [ $varC -eq 0 ];
then
printf 0
return;
fi
local cFamily=""
local append
for ((i=var6_idx;i<=var7_idx;i++));
do
if [ $(grep -cE '^[0-9]{3}[A-Z]{3}$' <<<${lineBuf[$i]}) -eq 1 ];
then
((count++))
cFamily="$cFamily varC$count=${lineBuf[$i]}"
fi
done
printf "%s %s" $count "$cFamily"
}
function e_var6 {
if [ "${lineBuf[4]}" == "AUTO" -o "${lineBuf[4]}" == "CON" ] ;
then
printf "%s" "${lineBuf[5]}"
else
printf "%s" "${lineBuf[4]}"
fi
}
function e_var7 {
local i
for ((i=NF;i>=1;i--));
do
if [ $(grep -cE '^.*/.*$' <<<${lineBuf[$i]}) -eq 1 ];
then
printf "%s" "${lineBuf[$i]}"
return
fi
done
}
while read -a lineBuf ;
do
NF=${#lineBuf[@]}
lineBuf=(unused ${lineBuf[@]})
for ((i=1; i<=numVars; i++));
do
printf "%s=" "${outVars[$i]}"
(e_${outVars[$i]})
printf " "
done
printf "\n"
done <file.txt
The gawk-specific extension Indirect Function Call is used in the awk code below.
The code assigns a function name to every desired output variable, so a different pattern or other transformation can be applied in each variable's own function.
Doing so avoids tons of if-else-if-else, and it is also easier to read and extend.
For the special varC family, the function pick_varC plays a trick: after varC is determined, its value consists of multiple output fields.
If varC=2, the value of varC is returned as 2 varC1=034SRT varC2=134OVC, that is, the actual value of varC with all the following members appended.
gawk '
BEGIN {
keys["var1"] = "pick_var1";
keys["var2"] = "pick_var2";
keys["var3"] = "pick_var3";
keys["var4"] = "pick_var4";
keys["var5"] = "pick_var5";
keys["var6"] = "pick_var6";
keys["varC"] = "pick_varC";
keys["var7"] = "pick_var7";
}
function pick_var1 () {
return $1;
}
function pick_var2 () {
return $2;
}
function pick_var3 () {
return $3;
}
function pick_var4 () {
for (i=1;i<=NF;i++) {
if ($i == "AUTO") {
return "TRUE";
}
}
return "FALSE";
}
function pick_var5 () {
for (i=1;i<=NF;i++) {
if ($i == "CON") {
return "TRUE";
}
}
return "FALSE";
}
function pick_varC () {
for (i=1;i<=NF;i++) {
if (($i=="AUTO" || $i=="CON")) {
break;
}
}
var6_idx = 5;
if ( i!=4 ) {
var6_idx = 4;
}
var7_idx = NF;
for (i=1;i<=NF;i++) {
if ($i~/.*\/.*/) {
var7_idx = i;
}
}
varC = var7_idx - var6_idx - 1;
if ( varC == 0) {
return varC;
}
count = 0;
cFamily = "";
for (i = 1; i<=varC;i++) {
if ($(var6_idx+i)~/[0-9]{3}[A-Z]{3}/) {
cFamily = sprintf("%s varC%d=%s",cFamily,i,$(var6_idx+i));
count++;
}
}
varC = sprintf("%d %s",count,cFamily);
return varC;
}
function pick_var6 () {
for (i=1;i<=NF;i++) {
if (($i=="AUTO" || $i=="CON")) {
break;
}
}
if ( i!=4 ) {
return $4;
} else {
return $5
}
}
function pick_var7 () {
for (i=1;i<=NF;i++) {
if ($i~/.*\/.*/) {
return $i;
}
}
}
{
for (k in keys) {
pickFunc = keys[k];
printf("%s=%s ",k,@pickFunc());
}
printf("\n");
}
' file.txt
test input
RON KKND 1534Z AUTO 253985G 034SRT 134OVC 04/32
RON KKND 5256Z 143623G72K 034OVC 074OVC 134SRT 145PRT 13/00
RON KKND 2234Z CON 342523G CLS 01/M12 RMK
script output
var1=RON var2=KKND var3=1534Z var4=TRUE var5=FALSE varC=2 varC1=034SRT varC2=134OVC var6=253985G var7=04/32
var1=RON var2=KKND var3=5256Z var4=FALSE var5=FALSE varC=4 varC1=034OVC varC2=074OVC varC3=134SRT varC4=145PRT var6=143623G72K var7=13/00
var1=RON var2=KKND var3=2234Z var4=FALSE var5=TRUE varC=0 var6=342523G var7=01/M12

find the position of an element in a list in bash

I am using bash. I need to find the position of a given element in a list.
I have searched for solutions but they all only check if an element exists in a list instead of finding the index.
#!/bin/bash
LIST=(A B C D)
# index_of_A = LIST.index('A') # which should return 0
echo ${LIST[${index_of_A}]} # prints 'A'
Or is it impossible to do this through bash?
First, let's define the list:
list=(A B C D)
Next, let's define a function to find the index of elements:
indexof() { i=0; while [ "$i" -lt "${#list[@]}" ] && [ "${list[$i]}" != "$1" ]; do ((i++)); done; echo $i; }
Now, it is easy to find the index of elements of array list:
$ indexof "A"
0
$ indexof "C"
2
If we ask for the index of an element that is not in list, we get one more than the largest index in the array:
$ indexof "E"
4
Alternative
Depending on your programming background, you might prefer that the index of an unknown element be returned as -1. In that case:
indexof() { i=-1; for ((j=0;j<${#list[@]};j++)); do [ "${list[$j]}" = "$1" ] && { i=$j; break; } done; echo $i; }
For example:
$ indexof "A"
0
$ indexof "B"
1
$ indexof "E"
-1
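As an editorial aside, the helpers above hard-code the array name list; with Bash 4.3+ namerefs the same idea can take the array name as a parameter, in the spirit of the indirection question earlier in this page. A sketch with illustrative names (index_in is not from the original answer):
index_in() {
    local -n _arr=$1        # nameref to the caller's array (Bash 4.3+)
    local needle=$2 i
    for i in "${!_arr[@]}"; do
        [[ ${_arr[$i]} == "$needle" ]] && { echo "$i"; return 0; }
    done
    echo -1
    return 1
}
fruits=(apple pear plum)
index_in fruits plum    # 2
index_in fruits kiwi    # -1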
If you insist on a clever / inefficient / awkward way:
#! /bin/bash
indexof()
{
    local word
    local item
    local idx
    word=$1
    shift
    item=$(printf '%s\n' "$@" | fgrep -nx "$word")
    let idx=${item%%:*}-1
    echo $idx
}
list=(A B C D)
indexof C "${list[@]}" # 2
indexof Z "${list[@]}" # -1
This works as long as the elements of your list don't contain newlines.
Assuming that the array values are unique (or else one would need to define more precisely what an “index” means (first one / last one / any / all matching)), the best option might be to pre-index the array into a reverse mapping (== an integer-valued associative array). Advantages:
Subsequent use of the reverse mapping avoids repeated linear traversals of the array. The complexity of an index lookup is sub-linear (e.g. constant or logarithmic, depending on underlying implementation).
The solution uses zero external processes (i.e. only Bash itself).
create_reverse() {
    declare -gAi "$2"
    local -nr array="$1"
    local -n mapping="$2"
    local -i index
    mapping=()
    for index in "${!array[@]}"; do
        mapping["${array[index]}"]="$((index))"
    done
}
index_of() {
    local -r value="$1"
    local -nr reverse="$2"
    echo "$((reverse["$value"]))"
}
list=(zero one two three four five)
create_reverse list reverse_mapping_of_list
index_of zero reverse_mapping_of_list # 0
index_of three reverse_mapping_of_list # 3
index_of five reverse_mapping_of_list # 5
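One caveat worth adding (editorial note, not from the original answer): because index_of uses arithmetic expansion, asking for a value that is not in the mapping silently yields 0, which collides with the first index. If that matters, a variant can test membership first; [[ -v array[key] ]] should work in reasonably recent Bash, and the name index_of_checked is just illustrative:
index_of_checked() {
    local -r value="$1"
    local -nr reverse="$2"
    [[ -v reverse["$value"] ]] || { echo -1; return 1; }   # report unknown values as -1
    echo "${reverse["$value"]}"
}
index_of_checked six reverse_mapping_of_list  # -1
index_of_checked five reverse_mapping_of_list # 5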

Storing multiple columns of data from a file in a variable

I'm trying to read a file and pull two important pieces of data out of it to use in a bash script: a string and then a number, for example:
Box 12
Toy 85
Dog 13
Bottle 22
I was thinking I could write a while loop to loop through the file and store the data into a variable. However I need two different variables, one for the number and one for the word. How do I get them separated into two variables?
Example code:
#!/bin/bash
declare -a textarr numarr
while read -r text num; do
    textarr+=("$text")
    numarr+=("$num")
done < file
echo ${textarr[1]} ${numarr[1]} # will print Toy 85
The data are stored in two array variables: textarr and numarr.
You can access any single element by index, ${textarr[$index]}, or all of them at once with ${textarr[@]}.
To read all the data into a single associative array (in bash 4.0 or newer):
#!/bin/bash
declare -A data=( )
while read -r key value; do
    data[$key]=$value
done < file
With that done, you can retrieve a value by key efficiently:
echo "${data[Box]}"
...or iterate over all keys:
for key in "${!data[@]}"; do
    value=${data[$key]}
    echo "Key $key has value $value"
done
You'll note that read takes multiple names on its argument list. When given more than one argument, it splits fields by IFS, putting columns into their respective variables (with the entire rest of the line going into the last variable named, if more columns exist than variables are named).
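As a small editorial aside, read's splitting is governed by IFS, so the same pattern works for other delimiters. Assuming a colon-separated file with lines like Box:12 (not part of the original question):
while IFS=: read -r text num; do
    printf '%s -> %s\n' "$text" "$num"
done < file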
Here I provide my own solution, which should be discussed; I am not sure whether it is a good solution or not. A while read loop fed from a pipeline runs in a subshell and cannot update variables outside the loop (redirecting the file into the loop, as the answers above do, avoids that drawback). Here is example code which you can modify to suit your own needs. If you have more columns of data to use, a slight adjustment is needed.
#!/bin/sh
res=$(awk 'BEGIN{OFS=" "}{print $2, $3 }' mytabularfile.tab)
n=0
for x in $res; do
    row=$(expr $n / 2)
    col=$(expr $n % 2)
    #echo "row: $row column: $col value: $x"
    if [ $col -eq 0 ]; then
        if [ $n -gt 0 ]; then
            echo "row: $row "
            echo col1=$col1 col2=$col2
        fi
        col1=$x
    else
        col2=$x
    fi
    n=$(expr $n + 1)
done
row=$(expr $row + 1)
echo "last row: $row col1=$col1 col2=$col2"

Iterate over lists embedded as values in key/value pairs in bash

I'm trying to get a (key,multiple-value) structure (some sort of hashmap) in bash, like this :
[
[ "abc" : 1, 2, 3, 4 ],
[ "def" : "w", 33, 2 ]
]
I'd like to iterate through each key (some kind of for key in ...), and get each value with something like map["def",2] or map[$key,2].
I've seen a couple of threads talking about single-value hashmap, but nothing about this issue.
I could go with N arrays, N being the number of keys in my map, each filled with the fields of one row, but I want to avoid duplicating code as much as possible.
Thanks in advance !
Edit :
I'd like to go through the structure with something like this :
for key in ${map[@]}; do
    echo $key # "abc" then "def"
    for value in ${map[$key,@]}; do
        ...
    done
done
Using modern bash features with the multiple-array case:
Assignment (manual):
map_abc=( 1 2 3 4 )
map_def=( w 33 2 )
Assignment (programmatic):
append() {
    local array_name="${1}_$2"; shift; shift
    declare -g -a "$array_name"
    declare -n array="$array_name" # BASH 4.3 FEATURE
    array+=( "$@" )
}
append map abc 1 2 3 4
append map def w 33 2
Iteration (done inside a function to contain the namevar's scope):
iter() {
    for array in ${!map_@}; do
        echo "Iterating over array ${array#map_}"
        declare -n cur_array="$array" # BASH 4.3 FEATURE
        for key in "${!cur_array[@]}"; do
            echo "$key: ${cur_array[$key]}"
        done
    done
}
iter
This can also be done without namevars, but in an uglier and more error-prone fashion. (To be clear, I believe the code given here uses eval safely, but it's easy to get wrong -- if trying to build your own implementation on this template, please be very cautious).
# Compatible with older bash (should be through 3.x).
append() {
    local array_name="${1}_$2"; shift; shift
    declare -g -a "$array_name"
    local args_str cmd_str
    printf -v args_str '%q ' "$@"
    printf -v cmd_str "%q+=( %s )" "$array_name" "$args_str"
    eval "$cmd_str"
}
...and, to iterate in a way compatible with bash back through 3.x:
for array in ${!map_@}; do
    echo "Iterating over array ${array#map_}"
    printf -v cur_array_cmd 'cur_array=( ${%q[@]} )' "$array"
    eval "$cur_array_cmd"
    for key in "${!cur_array[@]}"; do
        echo "$key: ${cur_array[$key]}"
    done
done
This is more computationally efficient than filtering through a single large array (the other answer given) -- and, when namevars are available, arguably results in cleaner code as well.
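For reference, with the append calls above, running iter should print something like the following (my own run-through, not shown in the original answer; the order in which ${!map_@} lists the arrays is not something I would rely on):
Iterating over array abc
0: 1
1: 2
2: 3
3: 4
Iterating over array def
0: w
1: 33
2: 2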
Do-able. The declaration is somewhat ugly
declare -A map=(
    [abc,0]=1
    [abc,1]=2
    [abc,2]=3
    [abc,3]=4
    [def,0]=w
    [def,1]=33
    [def,2]=2
)
key="def"
i=1
echo "${map[$key,$i]}" # => 33
Iterating: helpful to keep a separate array of "keys":
keys=(abc def)
Then
for key in "${keys[@]}"; do
    echo "$key"
    for idx in "${!map[@]}"; do
        if [[ $idx == $key,* ]]; then
            n=${idx##*,}
            printf "\t%s\t%s\n" "$n" "${map["$idx"]}"
        fi
    done
done
abc
0 1
1 2
2 3
3 4
def
1 33
0 w
2 2
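If you also want to pull all of one key's values out into an ordinary indexed array (closer to the map[$key,2]-style access the question asks about), a small helper along these lines should work; the function name values_of and the row result array are illustrative, not part of the original answers:
values_of() {
    local key=$1 idx
    row=()                                              # result array, indexed by the numeric suffix
    for idx in "${!map[@]}"; do
        [[ $idx == "$key",* ]] && row[${idx##*,}]=${map[$idx]}
    done
}
values_of def
echo "${row[@]}"  # w 33 2
echo "${row[1]}"  # 33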
