How do I print out 2 separate arrays with new lines in bash script - bash

So basically I want to be able to print out 2 separate arrays with newlines between each element.
Sample output I'm looking for:
a x
b y
(a,b being apart of one array x,y being a separate array)
Currently im using:
printf "%s\n" "${words[#]} ${newWords[#]}"
But the output comes out like:
a
b x
y

As bash is tagged, you could use paste from GNU coreutils with each array as an input:
$ words=(a b)
$ newWords=(x y)
$ paste <(printf '%s\n' "${words[#]}") <(printf '%s\n' "${newWords[#]}")
a x
b y
TAB is the default column separator but you can change it with option -d.
If you have array items that might contain newlines, you can switch to e.g. NUL-delimited strings by using the -z flag and producing each input using printf '%s\0'.

What does "${words[#]} ${newWords[#]}" produce? Let's put that expansion into another array and see what's inside it:
words=(a b)
newWords=(x y)
tmp=("${words[#]} ${newWords[#]}")
declare -p tmp
declare -a tmp=([0]="a" [1]="b x" [2]="y")
So, the last element of the first array and the first element of the second array are joined as a string; the other elements remain individual.
paste with 2 process substitutions is a good way to solve this. If you want to do it in plain bash, iterate over the indices of the arrays:
for idx in "${!words[#]}"; do
printf '%s\t%s\n' "${words[idx]}" "${newWords[idx]}"
done

Related

How to iterate over multiple variables and echo them using Shell Script?

Consider the below variables which are dynamic and might change each time. Sometimes there might even be 5 variables, But the length of all the variables will be the same every time.
var1='a b c d e... upto z'
var2='1 2 3 4 5... upto 26'
var3='I II III IV V... upto XXVI'
I am looking for a generalized approach to iterate the variables in a for loop & My desired output should be like below.
a,1,I
b,2,II
c,3,III
d,4,IV
e,5,V
.
.
goes on upto
z,26,XXVI
If I use nested loops, then I get all possible combinations which is not the expected outcome.
Also, I know how to make this work for 2 variables using for loop and shift using below link
https://unix.stackexchange.com/questions/390283/how-to-iterate-two-variables-in-a-sh-script
With paste
paste -d , <(tr ' ' '\n' <<<"$var1") <(tr ' ' '\n' <<<"$var2") <(tr ' ' '\n' <<<"$var3")
a,1,I
b,2,II
c,3,III
d,4,IV
e...z,5...26,V...XXVI
But clearly having to add other parameter substitutions for more varN's is not scalable.
You need to "zip" two variables at a time.
var1='a b c d e...z'
var2='1 2 3 4 5...26'
var3='I II III IV V...XXVI'
zip_var1_var2 () {
set $var1
for v2 in $var2; do
echo "$1,$v2"
shift
done
}
zip_var12_var3 () {
set $(zip_var1_var2)
for v3 in $var3; do
echo "$1,$v3"
shift
done
}
for x in $(zip_var12_var3); do
echo "$x"
done
If you are willing to use eval and are sure it is safe to do so, you can write a single function like
zip () {
if [ $# -eq 1 ]; then
eval echo \$$1
return
fi
a1=$1
shift
x=$*
set $(eval echo \$$a1)
for v in $(zip $x); do
printf '=== %s\n' "$1,$v" >&2
echo "$1,$v"
shift
done
}
zip var1 var2 var3 # Note the arguments are the *names* of the variables to zip
If you can use arrays, then (for example, in bash)
var1=(a b c d e)
var2=(1 2 3 4 5)
var3=(I II III IV V)
for i in "${!var1[#]}"; do
printf '%s,%s,%s\n' "${var1[i]}" "${var2[i]}" "${var3[i]}"
done
Use this Perl one-liner:
perl -le '#in = map { [split] } #ARGV; for $i ( 0..$#{ $in[0] } ) { print join ",", map { $in[$_][$i] } 0..$#in; }' "$var1" "$var2" "$var3"
Prints:
a,1,I
b,2,II
c,3,III
d,4,IV
e,5,V
z,26,XXVI
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
The input variables must be quoted with double quotes "like so", to keep the blank-separated words from being treated as separate arguments.
#ARGV is an array of the command line arguments, here $var1, $var2, $var3.
#in is an array of 3 elements, each element being a reference to an array obtained as a result of splitting the corresponding element of #ARGV on whitespace. Note that split splits the string on whitespace by default, but you can specify a different delimiter, it accepts regexes.
The subsequent for loop prints #in elements separated by comma.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlvar: Perl predefined variables
The following is (almost) a copy of this answer with a few tweaks that make it fit this question.
The Original Question
First let’s assign a few variables to play with, 26 tokens in each of them:
var1="$(echo {a..z})"
var2="$(echo {1..26})"
var3="$(echo I II III IV \
V{,I,II,III} IX \
X{,I,II,III} XIV \
XV{,I,II,III} XIX \
XX{,I,II,III} XXIV \
XXV XXVI)"
var4="$(echo {A..Z})"
var5="$(echo {010101..262626..10101})"
Now we want a “magic” function that zips an arbitrary number of variables, ideally in pure Bash:
zip_vars var1 # a trivial test
zip_vars var{1..2} # a slightly less trivial test
zip_vars var{1..3} # the original question
zip_vars var{1..4} # more vars, becasuse we can
zip_vars var{1..5} # more vars, because why not
What could zip_vars look like? Here’s one in pure Bash, without any external commands:
zip_vars() {
local var
for var in "$#"; do
local -a "array_${var}"
local -n array_ref="array_${var}"
array_ref=(${!var})
local -ar "array_${var}"
done
local -n array_ref="array_${1}"
local -ir size="${#array_ref[#]}"
local -i i
local output
for ((i = 0; i < size; ++i)); do
output=
for var in "$#"; do
local -n array_ref="array_${var}"
output+=",${array_ref[i]}"
done
printf '%s\n' "${output:1}"
done
}
How it works:
It splits all variables (passed by reference (by variable name)) into arrays. For each variable varX it creates a local array array_varX.
It would be actually way easier if the input variables were already Bash arrays to start with (see below), but … we stick with the original question initially.
It determines the size of the first array and then blindly expects all arrays to be of that size.
For each index i from 0 to size - 1 it concatenates the ith elements of all arrays, separated by ,.
Arrays Make Things Easier
If you use Bash arrays from the very start, the script will be shorter and look simpler and there won’t be any string-to-array conversions.
zip_arrays() {
local -n array_ref="$1"
local -ir size="${#array_ref[#]}"
local -i i
local output
for ((i = 0; i < size; ++i)); do
output=
for arr in "$#"; do
local -n array_ref="$arr"
output+=",${array_ref[i]}"
done
printf '%s\n' "${output:1}"
done
}
arr1=({a..z})
arr2=({1..26})
arr3=( I II III IV
V{,I,II,III} IX
X{,I,II,III} XIV
XV{,I,II,III} XIX
XX{,I,II,III} XXIV
XXV
XXVI)
arr4=({A..Z})
arr5=({010101..262626..10101})
zip_arrays arr1 # a trivial test
zip_arrays arr{1..2} # a slightly less trivial test
zip_arrays arr{1..3} # (almost) the original question
zip_arrays arr{1..4} # more arrays, becasuse we can
zip_arrays arr{1..5} # more arrays, because why not

Convert a bash array into an awk array

I have an array in bash and want to use this array in an awk script. How can I pass the array from bash to awk?
The keys of the awk array should be the indices of the bash array. For simplicity, we can assume that the bash array is dense, that is, the array is not sparse like a=([3]=x [5]=y).
The elements inside the array can have any value. Besides strange unicode symbols and ascii control characters they may contain spaces or even newlines. Also, there might be empty ("") entries which should be retained. As an example consider the following array:
a=(AB " C D " $'E\nF\tG' "¼ẞ🍕" "")
Extending approach #1 provided by Socowi, it is possible to address the shortcoming that he identified using the awk split function. Note that this solution does not use the stdin - it uses command line options - allowing awk to process stdin, files, etc.
The solution will convert the 'a' bash array into the 'a' awk, using intermediate awk file AVG (process substituion). This is a workaround to the bash limit that prevent NUL from being stored in a string.
a=(AB " C D " $'E\nF\tG' "¼ẞ🍕" "")
awk -v AVF=<(printf '%s\0' "${a[#]}") '
BEGIN {
# Temporary RS to allow reading the array with a single read.
saveRS=RS
RS=""
getline AV < AVF
rs = saveRS
na=split(AV, a, "\\0")
# Remove trailing empty element (printf add trailing separator).
delete a[na]
na-- ; for (i=1 ; i<=na ; i++ ) print "AV#", i, "=" a[i]
}{
# Use a[x]
}
'
Output:
1 AB
2 C D
3 E
F G
4 ¼ẞ🍕
5
Previous solution: For practical reason, Using the '\001' character as separator. make the script much easier (can use any other character sequence that is known not to appear in the info array). Bash command substitution does not allow NUL character. Hopefully, not a major issue, as this control character is not used for normal files, etc. I believe possible to solve this, but I'm not how.
The solution will convert the 'a' bash array into the 'a' awk, using intermediate awk variable 'AV'.
a=(AB " C D " $'E\nF\tG' "¼ẞ🍕" "")
awk -v AV="$(printf '%s\1' "${a[#]}")" '
BEGIN {
na=split(AV, a, "\\1") }
# Remove trailing empty element (printf add trailing separator).
delete a[na]
for (i=1 ; i<=na ; i++ ) print "AV#", i, "=" a[i]
{
# Use a[x]
}
'
Approach 1: Reading in awk
Since the array elements can contain any character but the null byte (\0) we have to delimit them by \0. This is done with printf. For simplicity we assume that the array has at least one entry.
Due to the \0 we can no longer pass the string to awk as an argument but have to use (or emulate) a file instead. We then read that file in awk using \0 as the record separator RS (may require GNU awk).
awk 'BEGIN {RS="\0"} {a[n++]=$0; next}' <(printf %s\\0 "${a[#]}")
This reliably constructs the awk array a from the bash array a. The length of a is stored in n.
This approach is ugly when you actually want to use it. There is no simple step-by-step instruction on how to incorporate this approach into your existing awk script. Normally, your awk script would read another file afterwards, therefore you have to change the record separator RS after the array file was read. This can be done with NR>FNR. However, if your awk script already reads multiple files and relies on something like NR==FNR things get complicated.
Approach 2: Generating awk Code with bash
Instead of parsing the array in awk we hard-code the array by generating awk code. This code will be injected at the beginning of an existing awk script and initialize the array. This approach also supports sparse arrays and associative arrays and should work with all awk versions, not only GNU.
For the code generation we have to correctly quote all strings. For example, the code generator echo "a[0]=${a[0]}" would fail if ${a[0]} was " resulting in the code a[1]=""". POSIX awk supports octal escape sequences (\012) which can encode all bytes. We simply encoding everything. That way we cannot forget any special symbols (even though the generated code is a bit inefficient).
octString() {
printf %s "$*" | od -bvAn | tr ' ' '\\' | tr -d '\n'
}
arrayToAwk() {
printf 'BEGIN{'
n=0
for key in "${!a[#]}"; do
printf 'a["%s"]="%s";' "$(octString "$key")" "$(octString "${a[$key]}")"
((n++))
done
echo "n=$n}"
}
The function arrayToAwk converts the bash array a (can be sparse or associative) into a BEGIN block. After inserting the generated code block at the begging of your existing awk program you can use the awk array a anywhere inside awk without having to adapt anything (assuming that the variable names a and n were unused before). n is the size of the awk array a.
For awk commands of the form awk ... 'program' ... use
awk ... "$(arrayToAwk)"'program' ...
For big arrays this might result in the error Argument list too long. You can circumvent this problem using a program file:
awk ... -f <(arrayToAwk; echo 'program') ...
For awk commands of the form awk ... -f progfile ... use
awk ... -f <(arrayToAwk; cat progfile) ...
I'd like to point out that this can be extremely simple if you do not mind using ARGV and deleting all the non-file arguments. One way:
>cat awk_script.sh
#!/bin/awk -f
BEGIN{
i=1
while(ARGV[i] != "--" && i < ARGC) {
print ARGV[i]
delete ARGV[i]
i++
}
if(i < ARGC)
delete ARGV[i]
} {
print "File 1 contains at 1",$1
}
Then run it with:
>./awk_script.sh "${a[#]}" -- file1
AB
C D
E
F G
¼ẞ�
File 1 contains at 1 a
Obviously I'm missing some symbols.
Note while I like this method it assumes -- is not in the array, as pointed out by Oguz Ismail. They give a great alternate solution of having the first argument the length of your list.
This can be a one liner to where you have
awk 'BEGIN{... get and delete first arguments ...}{process files}END{if wanted} "${a[#]}" file1 file2...
but will become unreadable very quickly.

Loop over two associative arrays in Bash

Say I have two associative arrays in Bash
declare -A a
declare -A b
a[xz]=1
b[xz]=2
a[zx]=3
b[zx]=4
I want to do something like this
for arr in ${a[#]} ${b[#]}; do echo ${arr[zx]}; done
and get 3 and 4 in output
but I get
$ for arr in ${a[#]} ${b[#]}; do echo ${arr[zx]}; done
1
3
2
4
Is there a way to do this in Bash?
You don't want to iterate over the contents; you want to iterate over the names of the arrays, then use indirect expansion to get the desired value of the fixed key from each array.
for arr in a b; do
t=$arr[zx] # first a[zx], then b[zx]
printf '%s\n' "${!t}"
done
Here, the variable "name" for use in indirect expansion is the name of the array along with the desired index.
Assuming the keys in both the arrays match(a major assumption), you can use one array as reference and loop over the keys and print in each array.
for key in "${!a[#]}"; do
printf "Array-1(%s) %s Array-2(%s) %s\n" "$key" "${a[$key]}" "$key" "${b[$key]}"
done
which produces an output as below. You can of-course remove the fancy debug words(Array-1, Array-2) which was added just for an understanding purpose.
Array-1(xz) 1 Array-2(xz) 2
Array-1(zx) 3 Array-2(zx) 4
One general good practice is always quote (for key in "${!a[#]}") your array expansions in bash, so that the elements are not subjected to word-splitting by the shell.

The semantics of arrays in bash

Check out the following transcript. With all possible rigor and formality, what is going on at each step?
$> ls -1 #This command prints 3 items. no explanation required.
a
b
c
$> X=$(ls -1) #Capture the output (as what? a string?)
$> Y=($(ls -1)) #Capture it again (as an array now?)
$> echo ${#X[#]} #Why is the length 1?
1
$> echo ${#Y[#]} #This works because Y is an array of the 3 items?
3
$> echo $X #Why are the linefeeds now spaces?
a b c
$> echo $Y #Why does the array echo as its first element
a
$> for x in $X;do echo $x; done #iterate over $X
a
b
c
$> for y in $Y;do echo $y; done #iterating over y doesn't work
a
$> echo ${X[2]} #I can loop over $X but not index into it?
$> echo ${Y[2]} #Why does this work if I can't loop over $Y?
c
I assume bash has well established semantics about how arrays and text variables (if that's even what they're called) work, but the user manual is not organized in an optimal fashion for someone who wants to reason about scripts based on whatever small set of underlying principles the language designer intended.
Let me preface the following with the very strong suggestion that you never use ls to populate an array. The correct code would be
Z=( * )
to create an array with each (non-hidden) file in the current directory as a distinct array element.
$> ls -1 #This command prints 3 items. no explanation required.
a
b
c
Correct. Each file name is printed on a separate line (although, beware of file names containing newlines; the parts before and after each newline would appear as separate file names.)
$> X=$(ls -1) #Capture the output (as what? a string?)
Yes. The output of ls is concatenated by the command substitution into a single string using a single space to separate each line. (The command substitution would be subject to word-splitting if it weren't the right-hand side of an assignment; word-splitting will come up below.)
$> Y=($(ls -1)) #Capture it again (as an array now?)
Same as with X, but now each of the words in the result of the command substitution is treated as a separate array element. As long as none of the output lines contain any characters in the value of IFS, each file name is one word and will be treated as a separate array element.
$> echo ${#X[#]} #Why is the length 1?
1
X, not being a real array, is treated as an array with a single element, namely the value of $X.
$> echo ${#Y[#]} #This works because Y is an array of the 3 items?
3
Correct.
$> echo $X #Why are the linefeeds now spaces?
a b c
When $X is unquoted, the resulting expansion is subject to word-splitting. In this case, the newlines are simply treated the same as any other whitespace, separating the result into a sequence of words that are passed to echo as distinct arguments, which are then displayed separated by a single space each.
$> echo $Y #Why does the array echo as its first element
a
For a true array, $Y is equivalent to ${Y[0]}.
$> for x in $X;do echo $x; done #iterate over $X
a
b
c
This works, but has caveats.
$> for y in $Y;do echo $y; done #iterating over y doesn't work
a
See above; $Y only expands to the first element. You want for y in "${Y[#]}"; do to iterate over all the elements.
$> echo ${X[2]} #I can loop over $X but not index into it?
Correct. X is not an array, but $X expanded to a space-separated list which the for loop could iterate over.
$> echo ${Y[2]} #Why does this work if I can't loop over $Y?
c
Indexing and iteration are two completely different things in shell. You don't actually iterate over an array; you iterate over the resulting sequence of words of a properly expanded array.

Element matching within the different lists

As usually :) I have two sets of input files with the same names but different extensions.
Using bash I made simple script which create 2 lists with identical elements names while looping 2 sets of the files from 2 dirs and using only it names w/o extensions as the elements of those lists:
#!/bin/bash
workdir=/data2/Gleb/TEST/claire+7d4+md/water_analysis/MD1
traj_all=${workdir}/tr_all
top_all=${workdir}/top_all
#make 2 lists for both file types
Trajectories=('');
Topologies=('');
#looping of 1st input files
echo "Trr has been found in ${traj_all}:"
for tr in ${traj_all}/*; do # ????
tr_n_full=$(basename "${tr}")
tr_n="${tr_n_full%.*}"
Trajectories=("${Trajectories[#]}" "${tr_n}");
done
#sort elements within ${Trajectories[#]} lists!! >> HERE I NEED HELP!
#looping of 2nd files
echo "Top has been found in ${top_all}:"
for top in ${top_all}/*; do # ????
top_n_full=$(basename "${top}")
top_n="${top_n_full%.*}"
Topologies=("${Topologies[#]}" "${top_n}");
done
#sort elements within ${Topologies[#] lists!! >> HERE I NEED HELP!
#make input.in file for some program- matching of elements from both lists >> HERE I NEED HELP!
for i in $(seq 1 ${#Topologies[#]}); do
printf "parm $top_all/${Topologies[i]}.top \ntrajin $traj_all/${Trajectories[i]}.mdcrd\nwatershell ${Area} ${output}/watershell_${Topologies[i]}_${Area}.dat > output.in
done
I'd thankful if someone provide me with good possibility how to improve this script:
1) I need to sort elements in both lists in the similar pattern after last elements have been added in each of them;
2) I need to add some test on the LAST step of the script which will create final output.in file only in case if elements are the same (in principle in this case it always should be the same!) in both lists which are matched during this operation by printf.
Thanks for the help,
Gleb
Here's a simpler way to create the arrays:
# Create an empty array
Trajectories=();
for tr in "${traj_all}"/*; do
# Remove the string of directories
tr_base=${tr##*/}
# Append the name without extension to the array
Trajectories+="${tr_base%.*}"
done
In bash, this will normally result in a sorted list, because the expansion of * in the glob is sorted. But you can sort it with sort; it is simplest if you are certain there are no newlines in the filenames:
mapfile -t sorted_traj < <(printf %s\\n "${Trajectories[#]}" | sort)
To compare two sorted arrays, you could use join:
# convenience function; could have helped above, too.
lines() { printf %s\\n "$#"; }
# Some examples:
# 1. compare a with b and print the lines which are only in a
join -v 1 -t '' <(lines "${a[#]}") <(lines "${b[#]}")
# 2. create c as an array with the lines which are only in b
mapfile -t c < <( join -v 2 -t '' <(lines "${a[#]}") <(lines "${b[#]}") )
If you create both difference lists, then the two arrays are equal if both lists are empty. If you are expecting both arrays to be the same, though, and if this is time-critical (probably not), you could do a simple precheck:
if [[ "${a[*]}" = "${b[*]}" ]]; then
# the arrays are the same
else
# the arrays differ; do some more work to see how.
fi

Resources