Read a config file into 2D array - bash

I have a config file:
map_a 1234,3788,9940
map_b 9948,8901
map_c
map_d 7789,30400
map_e 499423
map_f
The array variable should contain:
Name Attribute 1 Attribute 2 Attribute 3 Attribute ...
---------------------------------------------------------------------------
map_a 1234 3788 9940
map_b 9948 8901
map_c
map_d 7789 30400
map_e 499423
map_f
...
So:
foo[0,0] = map_a
foo[0,1] = 1234
foo[3,2] = 30400
...
How can I achieve this with bash? Or are there recommendations for changing the delimiters of the .cfg file? The format is still flexible, since I'm starting from scratch.
Regards
Joe C.

Bash arrays are one-dimensional, but associative arrays let you represent something that looks like a 2D array (each key is the indices joined with ','). Without changing the file structure:
#! /bin/bash
declare -A foo
n=0
maxcol=0
IFS=" ,"
while read k v ; do
foo[$n,0]=$k
IFS=', ' read -r -a vv <<< "$v"
i=1
for v1 in ${vv[*]} ; do
foo[$n,$i]=$v1
[ $i -gt $maxcol ] && maxcol=$i
let i=i+1
done
let n=n+1
done < config.txt
for ((i=0; i<n ; i++)) do
for ((j=0 ; j<=maxcol ; j++)) do
echo "foo($i,$j)=${foo[$i,$j]}"
done
echo
done

An alternative is to use the name as the first key, so that it is possible to query ${foo[map_a,2]}, the 2nd attribute of map_a:
#! /bin/bash
declare -A foo
n=0
maxcol=0
IFS=" ,"
while read k v ; do
foo[$k,0]=$k
IFS=', ' read -r -a vv <<< "$v"
i=1
for v1 in ${vv[*]} ; do
# Use key as first index
foo[$k,$i]=$v1
[ $i -gt $maxcol ] && maxcol=$i
let i=i+1
done
let n=n+1
done < config.txt
for i in map_a map_b map_c map_d map_e map_f ; do
for ((j=0 ; j<=maxcol ; j++)) do
echo "foo($i,$j)=${foo[$i,$j]}"
done
echo
done

Alright, I've also put maxcol into an array:
declare -A foo
declare -A maxcol
n=0
IFS=" ,"
while read k v
do
foo[$n,0]=$k
IFS=', ' read -r -a vv <<< "$v"
i=1
for v1 in ${vv[*]}
do
foo[$n,$i]=$v1
let i=i+1
done
maxcol[$n]=$i
let n=n+1
done < config.txt
for (( i=0; i<n; i++ ))
do
for (( j=0 ; j<maxcol[$i] ; j++ ))
do
echo "foo($i,$j)=${foo[$i,$j]}"
done
echo "maxcol($i)=${maxcol[$i]}"
echo
done
I think this will fit my needs now perfectly.

Related

How can I initialise a 2D array in bash Scripts but in minimalistic method [duplicate]

I'm wondering how to declare a 2D array in bash and then initialize to 0.
In C it looks like this:
int a[4][5] = {0};
And how do I assign a value to an element? As in C:
a[2][3] = 3;
You can simulate them, for example with hashes (associative arrays), but you need to take care with leading zeroes and many other things. The following demonstration works, but it is far from an optimal solution.
#!/bin/bash
declare -A matrix
num_rows=4
num_columns=5
for ((i=1;i<=num_rows;i++)) do
for ((j=1;j<=num_columns;j++)) do
matrix[$i,$j]=$RANDOM
done
done
f1="%$((${#num_rows}+1))s"
f2=" %9s"
printf "$f1" ''
for ((i=1;i<=num_rows;i++)) do
printf "$f2" $i
done
echo
for ((j=1;j<=num_columns;j++)) do
printf "$f1" $j
for ((i=1;i<=num_rows;i++)) do
printf "$f2" ${matrix[$i,$j]}
done
echo
done
The above example creates a 4x5 matrix of random numbers and prints it transposed. Example result:
           1         2         3         4
 1     18006     31193     16110     23297
 2     26229     19869      1140     19837
 3      8192      2181     25512      2318
 4      3269     25516     18701      7977
 5     31775     17358      4468     30345
The principle is to create one associative array whose index is a string like 3,4. The benefits:
it works for arrays of any dimension ;) for example 30,40,2 for 3 dimensions.
the syntax is close to C-like arrays: ${matrix[2,3]}
Bash doesn't have multi-dimensional arrays, but you can simulate a somewhat similar effect with associative arrays. The following is an example of an associative array standing in for a multi-dimensional array:
declare -A arr
arr[0,0]=0
arr[0,1]=1
arr[1,0]=2
arr[1,1]=3
echo "${arr[0,0]} ${arr[0,1]}" # will print 0 1
If you don't declare the array as associative (with -A), the above won't work. For example, if you omit the declare -A arr line, the echo will print 2 3 instead of 0 1, because subscripts such as 0,0 and 1,0 are then evaluated as arithmetic expressions, and the comma operator yields its right-hand value (0 for both), so arr[1,0]=2 overwrites arr[0,0].
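To see the difference side by side, here is a small sketch that runs the same four assignments twice, once on a plain indexed array and once after declare -A. The variable names plain_result, assoc_result and the two counters are only for illustration.

```shell
#!/usr/bin/env bash
# Without declare -A, arr is an indexed array and "i,j" is arithmetic:
# the comma operator keeps only the right-hand value, so [1,0] is index 0.
unset arr
arr[0,0]=0; arr[0,1]=1; arr[1,0]=2; arr[1,1]=3
plain_result="${arr[0,0]} ${arr[0,1]}"
plain_count=${#arr[@]}

# With declare -A the subscript is the literal string "i,j".
unset arr
declare -A arr
arr[0,0]=0; arr[0,1]=1; arr[1,0]=2; arr[1,1]=3
assoc_result="${arr[0,0]} ${arr[0,1]}"
assoc_count=${#arr[@]}

echo "indexed:     '$plain_result' ($plain_count elements)"
echo "associative: '$assoc_result' ($assoc_count elements)"
```

The indexed version ends up with only two elements, because each pair of assignments lands on the same index.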
Bash does not support multidimensional arrays.
You can simulate it though by using indirect expansion:
#!/bin/bash
declare -a a0=(1 2 3 4)
declare -a a1=(5 6 7 8)
var="a1[1]"
echo ${!var} # outputs 6
Assignments are also possible with this method:
let $var=55
echo ${a1[1]} # outputs 55
Edit 1: To read such an array from a file, with each row on a line, and values delimited by space, use this:
idx=0
while read -a a$idx; do
let idx++;
done </tmp/some_file
Edit 2: To declare and initialize a0..a3[0..4] to 0, you could run:
for i in {0..3}; do
eval "declare -a a$i=( $(for j in {0..4}; do echo 0; done) )"
done
Another approach is to represent each row as a string, i.e. to map the 2D array onto a 1D array. Then all you need to do is unpack and repack the row's string whenever you make an edit:
# Init a 4x5 matrix
a=("00 01 02 03 04" "10 11 12 13 14" "20 21 22 23 24" "30 31 32 33 34")
aset() {
row=$1
col=$2
value=$3
IFS=' ' read -r -a rowdata <<< "${a[$row]}"
rowdata[$col]=$value
a[$row]="${rowdata[@]}"
}
aget() {
row=$1
col=$2
IFS=' ' read -r -a rowdata <<< "${a[$row]}"
echo ${rowdata[$col]}
}
aprint() {
for rowdata in "${a[@]}"; do
echo $rowdata
done
}
echo "Matrix before change"
aprint
# Outputs: a[2][3] == 23
echo "a[2][3] == $( aget 2 3 )"
echo "a[2][3] = 9999"
aset 2 3 9999
# Show result
echo "Matrix after change"
aprint
Outputs:
Matrix before change
00 01 02 03 04
10 11 12 13 14
20 21 22 23 24
30 31 32 33 34
a[2][3] == 23
a[2][3] = 9999
Matrix after change
00 01 02 03 04
10 11 12 13 14
20 21 22 9999 24
30 31 32 33 34
You can also approach this in a much less clever fashion:
q=()
q+=( 1-2 )
q+=( a-b )
for set in ${q[@]};
do
echo ${set%%-*}
echo ${set##*-}
done
Of course, a 22-line solution or indirection is probably the better way to go, and why not sprinkle eval everywhere too.
A 2D array can be simulated in bash by declaring a 1D array and accessing its elements with the index (r * col_size) + c. The logic below declares a 1D array (str_2d_arr) and prints it as a 2D array.
col_size=3
str_2d_arr=()
str_2d_arr+=('abc' '200' 'xyz')
str_2d_arr+=('def' '300' 'ccc')
str_2d_arr+=('aaa' '400' 'ddd')
echo "Print 2D array"
col_count=0
for elem in ${str_2d_arr[@]}; do
if [ ${col_count} -eq ${col_size} ]; then
echo ""
col_count=0
fi
echo -e "$elem \c"
((col_count++))
done
echo ""
Output is
Print 2D array
abc 200 xyz
def 300 ccc
aaa 400 ddd
The logic below retrieves a single row from the 1D array str_2d_arr declared above.
# Get the nth row and append it to the array named by the 2nd argument
get_row_n()
{
row=$1
local -n a=$2
start_idx=$((row * col_size))
for ((i = 0; i < ${col_size}; i++)); do
idx=$((start_idx + i))
a+=(${str_2d_arr[${idx}]})
done
}
arr=()
get_row_n 0 arr
echo "Row 0"
for e in ${arr[@]}; do
echo -e "$e \c"
done
echo ""
Output is
Row 0
abc 200 xyz
A way to simulate arrays in bash (it can be adapted for any number of dimensions of an array):
#!/bin/bash
## The following functions implement vectors (arrays) operations in bash:
## Definition of a vector <v>:
## v_0 - variable that stores the number of elements of the vector
## v_1..v_n, where n=v_0 - variables that store the values of the vector elements
VectorAddElementNext () {
# Vector Add Element Next
# Adds the string contained in variable $2 in the next element position (vector length + 1) in vector $1
local elem_value
local vector_length
local elem_name
eval elem_value=\"\$$2\"
eval vector_length=\$$1\_0
if [ -z "$vector_length" ]; then
vector_length=$((0))
fi
vector_length=$(( vector_length + 1 ))
elem_name=$1_$vector_length
eval $elem_name=\"\$elem_value\"
eval $1_0=$vector_length
}
VectorAddElementDVNext () {
# Vector Add Element Direct Value Next
# Adds the string $2 in the next element position (vector length + 1) in vector $1
local elem_value
local vector_length
local elem_name
eval elem_value="$2"
eval vector_length=\$$1\_0
if [ -z "$vector_length" ]; then
vector_length=$((0))
fi
vector_length=$(( vector_length + 1 ))
elem_name=$1_$vector_length
eval $elem_name=\"\$elem_value\"
eval $1_0=$vector_length
}
VectorAddElement () {
# Vector Add Element
# Adds the string contained in the variable $3 in the position contained in $2 (variable or direct value) in the vector $1
local elem_value
local elem_position
local vector_length
local elem_name
eval elem_value=\"\$$3\"
elem_position=$(($2))
eval vector_length=\$$1\_0
if [ -z "$vector_length" ]; then
vector_length=$((0))
fi
if [ $elem_position -ge $vector_length ]; then
vector_length=$elem_position
fi
elem_name=$1_$elem_position
eval $elem_name=\"\$elem_value\"
if [ ! $elem_position -eq 0 ]; then
eval $1_0=$vector_length
fi
}
VectorAddElementDV () {
# Vector Add Element
# Adds the string $3 in the position $2 (variable or direct value) in the vector $1
local elem_value
local elem_position
local vector_length
local elem_name
eval elem_value="$3"
elem_position=$(($2))
eval vector_length=\$$1\_0
if [ -z "$vector_length" ]; then
vector_length=$((0))
fi
if [ $elem_position -ge $vector_length ]; then
vector_length=$elem_position
fi
elem_name=$1_$elem_position
eval $elem_name=\"\$elem_value\"
if [ ! $elem_position -eq 0 ]; then
eval $1_0=$vector_length
fi
}
VectorPrint () {
# Vector Print
# Prints all the element names and values of the vector $1 on separate lines
local vector_length
vector_length=$(($1_0))
if [ "$vector_length" = "0" ]; then
echo "Vector \"$1\" is empty!"
else
echo "Vector \"$1\":"
for ((i=1; i<=$vector_length; i++)); do
eval echo \"[$i]: \\\"\$$1\_$i\\\"\"
###OR: eval printf \'\%s\\\n\' \"[\$i]: \\\"\$$1\_$i\\\"\"
done
fi
}
VectorDestroy () {
# Vector Destroy
# Empties all the elements values of the vector $1
local vector_length
vector_length=$(($1_0))
if [ ! "$vector_length" = "0" ]; then
for ((i=1; i<=$vector_length; i++)); do
unset $1_$i
done
unset $1_0
fi
}
##################
### MAIN START ###
##################
## Setting vector 'params' with all the parameters received by the script:
for ((i=1; i<=$#; i++)); do
eval param="\${$i}"
VectorAddElementNext params param
done
# Printing the vector 'params':
VectorPrint params
read temp
## Setting vector 'params2' with the elements of the vector 'params' in reversed order:
if [ -n "$params_0" ]; then
for ((i=1; i<=$params_0; i++)); do
count=$((params_0-i+1))
VectorAddElement params2 count params_$i
done
fi
# Printing the vector 'params2':
VectorPrint params2
read temp
## Getting the values of 'params2'`s elements and printing them:
if [ -n "$params2_0" ]; then
echo "Printing the elements of the vector 'params2':"
for ((i=1; i<=$params2_0; i++)); do
eval current_elem_value=\"\$params2\_$i\"
echo "params2_$i=\"$current_elem_value\""
done
else
echo "Vector 'params2' is empty!"
fi
read temp
## Creating a two dimensional array ('a'):
for ((i=1; i<=10; i++)); do
VectorAddElement a 0 i
for ((j=1; j<=8; j++)); do
value=$(( 8 * ( i - 1 ) + j ))
VectorAddElementDV a_$i $j $value
done
done
## Manually printing the two dimensional array ('a'):
echo "Printing the two-dimensional array 'a':"
if [ -n "$a_0" ]; then
for ((i=1; i<=$a_0; i++)); do
eval current_vector_lenght=\$a\_$i\_0
if [ -n "$current_vector_lenght" ]; then
for ((j=1; j<=$current_vector_lenght; j++)); do
eval value=\"\$a\_$i\_$j\"
printf "$value "
done
fi
printf "\n"
done
fi
################
### MAIN END ###
################
If each row of the matrix is the same size, then you can simply use a linear array and multiplication.
That is,
a=()
for (( i=0; i<4; ++i )); do
for (( j=0; j<5; ++j )); do
a[i*5+j]=0
done
done
Then your a[2][3] = 3 becomes (note that a bash assignment takes no spaces around the =)
a[2*5+3]=3
This approach might be worth turning into a set of functions, but since you can't pass arrays to or return arrays from functions, you would have to use pass-by-name and sometimes eval. So I tend to file multidimensional arrays under "things bash is simply Not Meant To Do".
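For what it's worth, here is a minimal sketch of what such a pair of pass-by-name functions could look like. It assumes bash 4.3 or later for namerefs (local -n) and a fixed row width stored in a global cols. The names mat_set and mat_get are hypothetical, not from any library.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; namerefs (local -n) need bash 4.3+.
cols=5   # row width, assumed to be the same for every row

mat_set() {  # mat_set NAME ROW COL VALUE
  local -n _m=$1
  _m[$2*cols+$3]=$4
}

mat_get() {  # mat_get NAME ROW COL
  local -n _m=$1
  printf '%s\n' "${_m[$2*cols+$3]}"
}

# Initialise a 4x5 matrix to 0, as in the loop above.
a=()
for (( i=0; i<4; ++i )); do
  for (( j=0; j<cols; ++j )); do
    a[i*cols+j]=0
  done
done

mat_set a 2 3 3
mat_get a 2 3            # element at row 2, column 3
echo "${a[2*cols+3]}"    # same element, accessed directly
```

The caller's array must not itself be named _m, otherwise the nameref would refer to itself.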
One can simply define two functions to write ($4 is the assigned value) and read a matrix with arbitrary name ($1) and indexes ($2 and $3) exploiting eval and indirect referencing.
#!/bin/bash
matrix_write () {
eval $1"_"$2"_"$3=$4
# aux=$1"_"$2"_"$3 # Alternative way
# let $aux=$4 # ---
}
matrix_read () {
aux=$1"_"$2"_"$3
echo ${!aux}
}
for ((i=1;i<10;i=i+1)); do
for ((j=1;j<10;j=j+1)); do
matrix_write a $i $j $((i*10+j))
done
done
for ((i=1;i<10;i=i+1)); do
for ((j=1;j<10;j=j+1)); do
echo "a_"$i"_"$j"="$(matrix_read a $i $j)
done
done
Mark Reed suggested a very good solution for 2D arrays (matrices): they can always be converted into a 1D array (vector). Although Bash has no native support for 2D arrays, it's not that hard to create a simple ADT around this principle.
Here is a bare-bones example with no argument checks, just to keep the solution clear. The array's size is stored as the first two elements of the instance (documentation for the Bash module that implements a matrix ADT: https://github.com/vorakl/bash-libs/blob/master/src.docs/content/pages/matrix.rst )
#!/bin/bash
matrix_init() {
# matrix_init instance x y data ...
declare -n self=$1
declare -i width=$2 height=$3
shift 3;
self=(${width} ${height} "$@")
}
matrix_get() {
# matrix_get instance x y
declare -n self=$1
declare -i x=$2 y=$3
declare -i width=${self[0]} height=${self[1]}
echo "${self[2+y*width+x]}"
}
matrix_set() {
# matrix_set instance x y data
declare -n self=$1
declare -i x=$2 y=$3
declare data="$4"
declare -i width=${self[0]} height=${self[1]}
self[2+y*width+x]="${data}"
}
matrix_destroy() {
# matrix_destroy instance
declare -n self=$1
unset self
}
# my_matrix[3][2]=( (one, two, three), ("1 1" "2 2" "3 3") )
matrix_init my_matrix \
3 2 \
one two three \
"1 1" "2 2" "3 3"
# print my_matrix[2][0]
matrix_get my_matrix 2 0
# print my_matrix[1][1]
matrix_get my_matrix 1 1
# my_matrix[1][1]="4 4 4"
matrix_set my_matrix 1 1 "4 4 4"
# print my_matrix[1][1]
matrix_get my_matrix 1 1
# remove my_matrix
matrix_destroy my_matrix
For simulating a 2-dimensional array, I first load the first n-elements (the elements of the first column)
local pano_array=()
i=0
for line in $(grep "filename" "$file")
do
url=$(extract_url_from_xml $line)
pano_array[i]="$url"
i=$((i+1))
done
To add the second column, I define the size of the first column and calculate the values in an offset variable
array_len="${#pano_array[@]}"
i=0
while [[ $i -lt $array_len ]]
do
url="${pano_array[$i]}"
offset=$(($array_len+i))
found_file=$(get_file $url)
pano_array[$offset]=$found_file
i=$((i+1))
done
The code below works provided you have bash version 4 or later (worth checking on a Mac, where the default bash is older). Besides initializing to 0, it is a more general approach that accepts values dynamically.
2D Array
declare -A arr
echo "Enter the row"
read r
echo "Enter the column"
read c
i=0
j=0
echo "Enter the elements"
while [ $i -lt $r ]
do
j=0
while [ $j -lt $c ]
do
echo $i $j
read m
arr[${i},${j}]=$m
j=`expr $j + 1`
done
i=`expr $i + 1`
done
i=0
j=0
while [ $i -lt $r ]
do
j=0
while [ $j -lt $c ]
do
echo -n ${arr[${i},${j}]} " "
j=`expr $j + 1`
done
echo ""
i=`expr $i + 1`
done

Bash : Exclude range from array

I have an array with numbers from 1 to 100:
array=$(seq 100)
My task is to exclude range from 60 to 80.
You can use parameter expansion with the offset/length specification.
#! /bin/bash
arr=({1..100})
exclude_from=60
exclude_to=80
echo "${arr[@]:0:exclude_from-1}" "${arr[@]:exclude_to}"
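The same slicing on a smaller array makes the index arithmetic easier to follow. This is only a sketch; the kept array is a name introduced here for illustration.

```shell
#!/usr/bin/env bash
arr=({1..10})
exclude_from=4
exclude_to=6
# ${arr[@]:offset:length}: offset and length are arithmetic expressions.
# Value N lives at index N-1, so the first slice takes exclude_from-1
# elements starting at index 0, and the second slice starts at index
# exclude_to, which holds the value exclude_to+1.
kept=("${arr[@]:0:exclude_from-1}" "${arr[@]:exclude_to}")
echo "${kept[*]}"   # 1 2 3 7 8 9 10
```

Note that this relies on the array being dense and starting at value 1; for sparse arrays the offsets refer to positions among the existing elements.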
An arithmetic test condition
for n in {1..100}; do
(( 60 <= n && n <= 80 )) && continue
echo $n
done
However, to remove those elements from an array
ary=({1..100})
# note that number 1 is stored in _index_ 0
for ((n=60; n <= 80; n++)); do
unset "ary[$((n-1))]"
done
declare -p ary
outputs
declare -a ary=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5" [5]="6" [6]="7" [7]="8" [8]="9" [9]="10" [10]="11" [11]="12" [12]="13" [13]="14" [14]="15" [15]="16" [16]="17" [17]="18" [18]="19" [19]="20" [20]="21" [21]="22" [22]="23" [23]="24" [24]="25" [25]="26" [26]="27" [27]="28" [28]="29" [29]="30" [30]="31" [31]="32" [32]="33" [33]="34" [34]="35" [35]="36" [36]="37" [37]="38" [38]="39" [39]="40" [40]="41" [41]="42" [42]="43" [43]="44" [44]="45" [45]="46" [46]="47" [47]="48" [48]="49" [49]="50" [50]="51" [51]="52" [52]="53" [53]="54" [54]="55" [55]="56" [56]="57" [57]="58" [58]="59" [80]="81" [81]="82" [82]="83" [83]="84" [84]="85" [85]="86" [86]="87" [87]="88" [88]="89" [89]="90" [90]="91" [91]="92" [92]="93" [93]="94" [94]="95" [95]="96" [96]="97" [97]="98" [98]="99" [99]="100")
# ... note that indices 59 to 79 are gone: [58]="59" is followed directly by [80]="81"
And
for n in "${ary[@]}"; do echo $n; done
# or more concisely
printf '%d\n' "${ary[@]}"
excludes 60-80
Something like this maybe?
#!/usr/bin/env bash
arr=({1..100})
exclude=({60..80})
exclude_pattern=$(IFS='|'; printf '%s' "@(${exclude[*]})")
for i in "${arr[@]}"; do
[[ $i == $exclude_pattern ]] && continue
printf '%d\n' "$i"
done
Or create a temp array to build up the elements.
#!/usr/bin/env bash
arr=({1..100})
exclude=({60..80})
exclude_pattern=$(IFS='|'; printf '%s' "@(${exclude[*]})")
for i in "${arr[@]}"; do
[[ $i == $exclude_pattern ]] && continue
included_arr+=("$i")
done
arr=("${included_arr[@]}")
declare -p arr

How do I create large CSVs in seconds?

I am trying to create 1000s of large CSVs rapidly. This function generates the CSVs:
function csvGenerator () {
for ((i=1; i<=$NUMCSVS; i++)); do
CSVNAME=$DIRNAME"-"$CSVPREFIX$i$CSVEXT
HEADERARRAY=()
if [[ ! -e $CSVNAME ]]; then #Only create csv file if it not exist
touch $CSVNAME
echo "file: "$CSVNAME "created at $(date)" >> ../status.txt
fi
for ((j=1; j<=$NUMCOLS; j++)); do
if (( j < $NUMCOLS )) ; then
HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j", "
elif (( j == $NUMCOLS )) ; then
HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j
fi
HEADERARRAY+=$HEADERNAME
done
echo $HEADERARRAY > $CSVNAME
for ((k=1; k<=$NUMROWS; k++)); do
ROWARRAY=()
for ((l=1; l<=$NUMCOLS; l++)); do
if (( l < $NUMCOLS )) ; then
ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l", "
elif (( l == $NUMCOLS )) ; then
ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l
fi
ROWARRAY+=$ROWVALUE
done
echo $ROWARRAY >> $CSVNAME
done
done
}
The script takes ~3 mins to generate a CSV with 100k rows and 70 cols. What do I need to do to generate these CSVs at the rate of 1 CSV/~10 seconds?
Let me start by saying that bash and "performant" don't usually go together in the same sentence. As other commentators suggested, awk may be a good choice that's adjacent in some senses.
I haven't yet had a chance to run your code, but it opens and closes the output file once per row — in this example, 100,000 times. Each time it must seek to the end of the file so that it can append the latest row.
Try pulling the actual generation (everything after for ((j=1; j<=$NUMCOLS; j++)); do) into a new function, like generateCsvContents. In that new function, don't reference $CSVNAME, and remove the redirections on the echo statements. Then, in the original function, call the new function and redirect its output to the filename. Roughly:
function csvGenerator () {
for ((i=1; i<=NUMCSVS; i++)); do
CSVNAME=$DIRNAME"-"$CSVPREFIX$i$CSVEXT
if [[ ! -e $CSVNAME ]]; then # Only log creation if the CSV file does not exist yet
echo "file: $CSVNAME created at $(date)" >> ../status.txt
fi
# This will create $CSVNAME if it doesn't yet exist
generateCsvContents > "$CSVNAME"
done
}
function generateCsvContents() {
HEADERARRAY=()
for ((j=1; j<=NUMCOLS; j++)); do
if (( j < NUMCOLS )) ; then
HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j", "
elif (( j == NUMCOLS )) ; then
HEADERNAME=$DIRNAME"-csv-"$i"-header-"$j
fi
HEADERARRAY+=$HEADERNAME
done
echo $HEADERARRAY
for ((k=1; k<=NUMROWS; k++)); do
ROWARRAY=()
for ((l=1; l<=NUMCOLS; l++)); do
if (( l < NUMCOLS )) ; then
ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l", "
elif (( l == NUMCOLS )) ; then
ROWVALUE=$DIRNAME"-csv-"$i"-r"$k"c"$l
fi
ROWARRAY+=$ROWVALUE
done
echo "$ROWARRAY"
done
}
"Not this way" is I think the answer.
There are a few problems here.
You're not using your arrays as arrays. When you treat them like strings, you affect only the first element in the array, which is misleading.
The way you're using >> causes the output file to be opened and closed once for every line. That's potentially wasteful.
You're not quoting your variables. In fact, you're quoting the stuff that doesn't need quoting, and not quoting the stuff that does.
Upper case variable names are not recommended, due to the risk of collision with system variables.
Bash isn't good at this. Really.
A cleaned up version of your function might look like this:
csvGenerator2() {
for (( i=1; i<=NUMCSVS; i++ )); do
CSVNAME="$DIRNAME-$CSVPREFIX$i$CSVEXT"
# Only create the CSV file if it does not exist
[[ -e "$CSVNAME" ]] && continue
touch "$CSVNAME"
date "+[%F %T] created: $CSVNAME" | tee -a status.txt >&2
HEADER=""
for (( j=1; j<=NUMCOLS; j++ )); do
printf -v HEADER '%s, %s-csv-%s-header-%s' "$HEADER" "$DIRNAME" "$i" "$j"
done
echo "${HEADER#, }" > "$CSVNAME"
for (( k=1; k<=NUMROWS; k++ )); do
ROW=""
for (( l=1; l<=NUMCOLS; l++ )); do
printf -v ROW '%s, %s-csv-%s-r%sc%s' "$ROW" "$DIRNAME" "$i" "$k" "$l"
done
echo "${ROW#, }"
done >> "$CSVNAME"
done
}
(Note that I haven't switched the variables to lower case because I'm lazy, but it's still a good idea.)
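To illustrate the first point in the list above (arrays treated as strings), here is a small sketch contrasting += with a bare string against += with parentheses. The names as_string, as_array and header are made up for the example.

```shell
#!/usr/bin/env bash
# += with a bare string appends to element 0 only:
as_string=()
as_string+="a, "
as_string+="b"
echo "${#as_string[@]} element: ${as_string[0]}"

# += with parentheses appends new elements, which can be joined once at the end:
as_array=()
for (( j=1; j<=3; j++ )); do
  as_array+=("header-$j")
done
# "${as_array[*]}" joins on the first character of IFS; the subshell keeps
# the IFS change from leaking into the rest of the script.
header=$(IFS=,; echo "${as_array[*]}")
echo "${#as_array[@]} elements: $header"
```

Joining with IFS only gives a single-character separator, so this produces "," rather than the ", " used in the question.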
And if you were to make something functionally equivalent in awk:
csvGenerator3() {
awk -v NUMCSVS="$NUMCSVS" -v NUMCOLS="$NUMCOLS" -v NUMROWS="$NUMROWS" -v DIRNAME="$DIRNAME" -v CSVPREFIX="$CSVPREFIX" -v CSVEXT="$CSVEXT" '
BEGIN {
for ( i=1; i<=NUMCSVS; i++) {
out=sprintf("%s-%s%s%s", DIRNAME, CSVPREFIX, i, CSVEXT)
if (!system("test -e " out)) continue
system("date '\''+[%F %T] created: " out "'\'' | tee -a status.txt >&2")
comma=""
for ( j=1; j<=NUMCOLS; j++ ) {
printf "%s%s-csv-%s-header-%s", comma, DIRNAME, i, j > out
comma=", "
}
printf "\n" >> out
for ( k=1; k<=NUMROWS; k++ ) {
comma=""
for ( l=1; l<=NUMCOLS; l++ ) {
printf "%s%s-csv-%s-r%sc%s", comma, DIRNAME, i, k, l >> out
comma=", "
}
printf "\n" >> out
}
}
}
'
}
Note that awk does not suffer from the same open/close overhead mentioned earlier with bash; when a file is used for output or as a pipe, it gets opened once and is left open until it is closed.
Comparing the two really highlights the choice you need to make:
$ time bash -c '. file; NUMCSVS=1 NUMCOLS=10 NUMROWS=100000 DIRNAME=2 CSVPREFIX=x CSVEXT=.csv csvGenerator2'
[2019-03-29 23:57:26] created: 2-x1.csv
real 0m30.260s
user 0m28.012s
sys 0m1.395s
$ time bash -c '. file; NUMCSVS=1 NUMCOLS=10 NUMROWS=100000 DIRNAME=3 CSVPREFIX=x CSVEXT=.csv csvGenerator3'
[2019-03-29 23:58:23] created: 3-x1.csv
real 0m4.994s
user 0m3.297s
sys 0m1.639s
Note that even my optimized bash version is only a little faster than your original code.
Refactoring your two inner for-loops to loops like this will save time:
for ((j=1; j<$NUMCOLS; ++j)); do
HEADERARRAY+=$DIRNAME"-csv-"$i"-header-"$j", "
done
HEADERARRAY+=$DIRNAME"-csv-"$i"-header-"$NUMCOLS

bash read strings and output as one key and multiple values

Assuming there is an input:
1,2,C
We are trying to output it as
KEY=1, VAL1=2, VAL2=C
So far trying to modify from here:
Is there a way to create key-value pairs in Bash script?
for i in 1,2,C ; do KEY=${i%,*,*}; VAL1=${i#*,}; VAL2=${i#*,*,}; echo $KEY" XX "$VAL1 XX "$VAL2"; done
Output:
1 XX 2,c XX c
Not entirely sure what the pound ("#") and % here mean above, making the modification kinda hard.
Could any guru enlighten? Thanks.
I would generally prefer easier to read code, as bash can get ugly pretty fast.
Try this:
key_values.sh
#!/bin/bash
IFS=,
count=0
# $* is the expansion of all the params passed in, i.e. $1, $2, $3, ...
for i in $*; do
# '-eq' is checking for equality, i.e. is $count equal to zero.
if [ $count -eq 0 ]; then
echo -n "KEY=$i"
else
echo -n ", VAL${count}=$i"
fi
count=$(( $count + 1 ))
done
echo
Example
key_values.sh 1,2,ABC,123,DEF
Output
KEY=1, VAL1=2, VAL2=ABC, VAL3=123, VAL4=DEF
Expanding on anishsane's comment:
$ echo $1
1,2,3,4,5
$ IFS=, read -ra args <<<"$1" # read into an array
$ out="KEY=${args[0]}"
$ for ((i=1; i < ${#args[@]}; i++)); do out+=", VAL$i=${args[i]}"; done
$ echo "$out"
KEY=1, VAL1=2, VAL2=3, VAL3=4, VAL4=5
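As for what # and % mean in the question's attempt: inside ${...}, # removes the shortest matching pattern from the front of the value and % removes it from the end, while the doubled forms ## and %% remove the longest match. A short sketch for the three-field case (the variable names are just for illustration):

```shell
#!/usr/bin/env bash
i='1,2,C'
key=${i%%,*}       # drop the longest suffix matching ",*"  -> 1
rest=${i#*,}       # drop the shortest prefix matching "*," -> 2,C
val1=${rest%%,*}   # same trick on the remainder            -> 2
val2=${i##*,}      # drop the longest prefix matching "*,"  -> C
echo "KEY=$key, VAL1=$val1, VAL2=$val2"
```

This explains the output in the question: ${i#*,} only strips up to the first comma, so VAL1 came out as 2,C.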

slow running script. How can I increase its speed?

How can I speed this up? it's taking about 5 minutes to make one file...
it runs correctly, but I have a little more than 100000 files to make.
Is my implementation of awk or sed slowing it down? I could break it down into several smaller loops and run it on multiple processors but one script is much easier.
#!/bin/zsh
#1000 configs per file
alpha=( a b c d e f g h i j k l m n o p q r s t u v w x y z )
m=1000 # number of configs per file
t=1 #file number
for (( i=1; i<=4; i++ )); do
for (( j=i; j<=26; j++ )); do
input="arc"${alpha[$i]}${alpha[$j]}
n=1 #line number
#length=`sed -n ${n}p $input| awk '{printf("%d",$1)}'`
#(( length= $length + 1 ))
length=644
for ((k=1; k<=$m; k++ )); do
echo "$hmbi" >> ~/Glycine_Tinker/configs/config$t.in
echo "jobtype = energy" >> ~/Glycine_Tinker/configs/config$t.in
echo "analyze_only = false" >> ~/Glycine_Tinker/configs/config$t.in
echo "qm_path = qm_$t" >> ~/Glycine_Tinker/configs/config$t.in
echo "mm_path = aiff_$t" >> ~/Glycine_Tinker/configs/config$t.in
cat head.in >> ~/Glycine_Tinker/configs/config$t.in
water=4
echo $k
for (( l=1; l<=$length; l++ )); do
natom=`sed -n ${n}p $input| awk '{printf("%d",$1)}'`
number=`sed -n ${n}p $input| awk '{printf("%d",$6)}'`
if [[ $natom -gt 10 && $number -gt 0 ]]; then
symbol=`sed -n ${n}p $input| awk '{printf("%s",$2)}'`
x=`sed -n ${n}p $input| awk '{printf("%.10f",$3)}'`
y=`sed -n ${n}p $input| awk '{printf("%.10f",$4)}'`
z=`sed -n ${n}p $input| awk '{printf("%.10f",$5)}'`
if [[ $water -eq 4 ]]; then
echo "--" >> ~/Glycine_Tinker/configs/config$t.in
echo "0 1 0.4638" >> ~/Glycine_Tinker/configs/config$t.in
water=1
fi
echo "$symbol $x $y $z" >> ~/Glycine_Tinker/configs/config$t.in
(( water= $water + 1 ))
fi
(( n= $n + 1 ))
done
cat tail.in >> ~/Glycine_Tinker/configs/config$t.in
(( t= $t + 1 ))
done
done
done
One thing that is going to be killing you here is the sheer number of processes being created. Especially when they are doing the exact same thing.
Consider doing the sed -n ${n}p $input once per loop iteration.
Also consider doing the equivalent of awk as a shell array assignment, then accessing the individual elements.
With these two things you should be able to get the 12 or so processes (and the shell invocation via back quotes) down to a single shell invocation and the backquote.
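Taking this one step further, the shell can split each line into fields itself while reading the file once, so no sed or awk process is needed per line. The sketch below is in bash rather than zsh and uses a made-up two-line sample in a temporary file in place of the real input. The field order (natom, symbol, x, y, z, number) is taken from the awk calls in the question, and the out array is a name introduced for the example.

```shell
#!/usr/bin/env bash
LC_NUMERIC=C   # make printf accept "." as the decimal point

# Hypothetical sample data standing in for one of the arc?? input files.
input=$(mktemp)
printf '%s\n' '11 O 0.1 0.2 0.3 1' '5 H 1.0 2.0 3.0 0' > "$input"

out=()
# One pass over the file; read splits every line into the six fields.
while read -r natom symbol x y z number _; do
  if (( natom > 10 && number > 0 )); then
    # printf -v formats into a variable without starting a subshell
    printf -v line '%s %.10f %.10f %.10f' "$symbol" "$x" "$y" "$z"
    out+=("$line")
  fi
done < "$input"
rm -f "$input"

printf '%s\n' "${out[@]}"
```

Collecting the lines in an array and writing them with a single redirection also avoids reopening the output file for every line.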
Obviously, Ed's advice is far preferable, but if you don't want to follow that, I had a couple of thoughts...
Thought 1
Rather than run echo 5 times and cat head.in onto the Glycine file, each of which causes the file to be opened, seeked (or sought maybe) to the end, and appended, you could do that in one go like this:
# Instead of
hmbi=3
echo "$hmbi" >> ~/Glycine_thing
echo "jobtype = energy" >> ~/Glycine_thing
echo "somethingelse" >> ~/Glycine_thing
echo ... >> ~/Glycine_thing
echo ... >> ~/Glycine_thing
cat ... >> ~/Glycine_thing
# Try this
{
echo "$hmbi"
echo "jobtype = energy"
echo "somethingelse"
echo
echo
cat head.in
} >> ~/Glycine_thing
# Or, better still, this
echo -e "$hmbi\njobtype = energy\nsomethingelse" >> Glycine_thing
# Or, use a here-document, as suggested by @mklement0
cat -<<EOF >>Glycine
$hmbi
jobtype = energy
next thing
EOF
Thought 2
Rather than invoke sed and awk 5 times to find 5 parameters, just let awk do what sed was doing, and also do all 5 things in one go:
read symbol x y z < <(awk '...{printf "%s %.10f %.10f %.10f\n", $2, $3, $4, $5}' "$input")

Resources