BASH Iterating into an array from a file [duplicate] - bash

This question already has answers here:
Creating an array from a text file in Bash
(7 answers)
Closed 2 years ago.
If I have a document and want to read the second column of the document into an array, would there be a simple way to do this? At present I am trying:
cat file.txt | awk -F'\t' '{print $2}' | sort -u
This lists all the unique items in the second column to standard out.
The question is: how do I now add these items to an array, considering some of these items contain whitespace?
I have been trying to declare an array
arr=()
and then tried
${arr}<<cat file.txt | awk -F'\t' '{print $2}' | sort -u

Bash 4+ has mapfile (aka readarray), which can be combined with process substitution.
mapfile -t array < <(awk -F'\t' '{print $2}' file.txt | sort -u)
If you don't have bash 4+, use a read loop:
while IFS= read -r line; do
array+=("$line")
done < <(awk -F'\t' '{print $2}' file.txt | sort -u)
To see the structure in the array
declare -p array
By default, read strips leading and trailing whitespace, so to work around that you need IFS= (an empty IFS) to preserve each line exactly as it appears.
The -t option of mapfile removes a trailing delimiter (newline by default) from each line read.
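A quick way to see the difference IFS= makes with read (the sample line and file path below are made up for illustration):

```shell
# Hypothetical sample line with leading/trailing spaces
printf '  padded value  \n' > /tmp/ifs_demo.txt

# Default IFS: read strips the surrounding whitespace
read -r stripped < /tmp/ifs_demo.txt

# Empty IFS: the line is preserved exactly
IFS= read -r kept < /tmp/ifs_demo.txt

echo "[$stripped]"   # [padded value]
echo "[$kept]"       # [  padded value  ]
```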

Bash 3 has read -a to read IFS-delimited fields from a stream into an array.
The -d '' switch tells read that the record delimiter is the NUL character, so it reads fields until it reaches the end of the stream (EOF) or a NUL byte.
declare -a my_array
IFS=$'\n' read -r -d '' -a my_array < <(cut -f2 < file.txt | sort -u)
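For example, with a small made-up tab-separated file (the file name and contents are hypothetical), the pipeline fills the array like this; note that read -d '' returns a non-zero status when it hits EOF, which is harmless here:

```shell
# Hypothetical tab-separated input
printf 'a\tsecond col\nb\tother value\nc\tsecond col\n' > /tmp/bash3_demo.tsv

declare -a my_array
# read exits non-zero at EOF; ignore that status
IFS=$'\n' read -r -d '' -a my_array < <(cut -f2 < /tmp/bash3_demo.tsv | sort -u) || true

echo "${#my_array[@]}"   # 2
printf '<%s>\n' "${my_array[@]}"
# <other value>
# <second col>
```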

Related

Bash: read property file into Array

I'm trying to read a property file like this one into a set of arrays:
DATABASE="mysql57"
DB_DRIVER_XA="com.mysql.cj.jdbc.MysqlXADataSource"
DB_DRIVER_CLASS="com.mysql.cj.jdbc.Driver"
DATABASE="db2_111"
DB_DRIVER_XA="com.ibm.db2.jcc.DB2XADataSource"
DB_DRIVER_CLASS="com.ibm.db2.jcc.DB2Driver"
I've found the following grep to be useful to store each key into its array:
filename=conf.properties
dblist=($(grep "DATABASE" $filename))
xadriver=($(grep "DB_DRIVER_XA" $filename))
driver=($(grep "DB_DRIVER_CLASS" $filename))
The problem is that the above solution stores into the array KEY=VALUE:
printf '%s\n' "${dblist[@]}"
DATABASE="mysql57"
DATABASE="db2_111"
I'd like to have in each array only the value. Is there a simple way to do it rather than looping over the array and maybe use "cut" to remove the "KEY=" part?
Sure:
databases=()
xas=()
classes=()
while IFS="=" read -r var value; do
    without_quotes=${value//\"/}
    case $var in
        DATABASE)        databases+=( "$without_quotes" ) ;;
        DB_DRIVER_XA)    xas+=( "$without_quotes" ) ;;
        DB_DRIVER_CLASS) classes+=( "$without_quotes" ) ;;
    esac
done < file
declare -p databases xas classes
declare -a databases='([0]="mysql57" [1]="db2_111")'
declare -a xas='([0]="com.mysql.cj.jdbc.MysqlXADataSource" [1]="com.ibm.db2.jcc.DB2XADataSource")'
declare -a classes='([0]="com.mysql.cj.jdbc.Driver" [1]="com.ibm.db2.jcc.DB2Driver")'
The take-away is to use IFS with the read command to split the line into fields, and store the results in separate variables.
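For instance (the value below is made up), a single line splits like this:

```shell
# IFS="=" splits on the first '='; var gets the key, value gets the rest
IFS="=" read -r var value <<< 'DATABASE="mysql57"'

echo "$var"     # DATABASE
echo "$value"   # "mysql57"

# Stripping the quotes, as the case loop does:
without_quotes=${value//\"/}
echo "$without_quotes"   # mysql57
```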
Use awk -F= to split each line into key and value, and sed to strip out the quotes.
dblist=( $(awk -F= '$1=="DATABASE" {print $2}' "$filename" | sed 's/"//g'))
xadriver=($(awk -F= '$1=="DB_DRIVER_XA" {print $2}' "$filename" | sed 's/"//g'))
driver=( $(awk -F= '$1=="DB_DRIVER_CLASS" {print $2}' "$filename" | sed 's/"//g'))
Then, it would be better to use readarray to populate arrays to prevent word splitting on spaces and glob expansion on * and ?.
readarray -t dblist < <(awk -F= '$1=="DATABASE" {print $2}' "$filename" | sed 's/"//g')
readarray -t xadriver < <(awk -F= '$1=="DB_DRIVER_XA" {print $2}' "$filename" | sed 's/"//g')
readarray -t driver < <(awk -F= '$1=="DB_DRIVER_CLASS" {print $2}' "$filename" | sed 's/"//g')
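To see why the unquoted arr=( $(...) ) form is fragile, here is a short sketch with a hypothetical value containing spaces and a glob character:

```shell
value='com.example.* my driver'

# Unquoted command substitution: word splitting (and possibly glob
# expansion of *) breaks the value into several elements
broken=( $(echo "$value") )
echo "${#broken[@]}"     # 3 (or more, if the glob matches files)

# readarray keeps each input line as exactly one element
readarray -t safe < <(echo "$value")
echo "${#safe[@]}"       # 1
```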

How to assign options to a subscript by looping through a TSV?

I have a TSV file with 3 columns, that is assigned to paramfile.
Here is my script:
#! /bin/bash -l
paramfile=/path/to/file
while
sample=`sed -n ${number}p $paramfile | awk '{print $1}'`
Reads1=`sed -n ${number}p $paramfile | awk '{print $2}'`
Reads2=`sed -n ${number}p $paramfile | awk '{print $3}'`
do
./program.sh $sample $reads1 $reads2
done
I want it to read the TSV line by line, and for each line take the content of each column and insert it into my program, to be used as an option in program.sh
I know I haven't got the loop quite right; what am I missing?
read with a ‘custom’ $IFS can read TSV* into variables, e.g.:
#!/bin/bash
paramfile=/path/to/file
while IFS="$(printf '\t')" read -r sample reads1 reads2 _
do
./program.sh "${sample}" "${reads1}" "${reads2}"
done < "${paramfile}"
The _ is for dropping any trailing cells.
And I took the liberty to quote all variables, as one should.
*Not quoted TSV, though.

Bash - readarray contains only one element

I'm writing this script to count some variables from an input file. I can't figure out why it is not counting the elements in the array (should be 500) but only counts 1.
#initializing variables
timeout=5
headerFile="lab06.output"
dataFile="fortune500.tsv"
dataURL="http://www.tech.mtu.edu/~toarney/sat3310/lab09/"
dataPath="/home/pjvaglic/Documents/labs/lab06/data/"
curlOptions="--silent --fail --connect-timeout $timeout"
#creating the array
declare -a myWebsitearray #=('cut -d '\t' -f3 "dataPath$dataFile"')
#obtaining the data file
wget $dataURL$dataFile -O $dataPath$dataFile
#getting rid of the crap from dos
sed -e "s/^m//" $dataPath$dataFile | readarray -t $myWebsitesarray
readarray -t myWebsitesarray < <(cut -d, -f3 $dataPath$dataFile)
myWebsitesarray=("${#myWebsitesarray[@]:1}")
#printf '%s\n' "${myWebsitesarray2[@]}"
websitesCount=${#myWebsitesarray[*]}
echo $websitesCount
You are overwriting your array with the count of elements in this line
myWebsitesarray=("${#myWebsitesarray[@]:1}")
Remove the hash sign
myWebsitesarray=("${myWebsitesarray[@]:1}")
Also, @chepner's suggestions are good to follow.
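The difference between the two expansions is easy to check (the array contents below are made up):

```shell
arr=(one two three four)

# With the hash: the COUNT of elements
echo "${#arr[@]}"    # 4

# Without the hash, :1 slices off the first element
arr=("${arr[@]:1}")
echo "${arr[@]}"     # two three four
echo "${#arr[@]}"    # 3
```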

Unix file pattern issue: append changing value of variable pattern to copies of matching line

I have a file with contents:
abc|r=1,f=2,c=2
abc|r=1,f=2,c=2;r=3,f=4,c=8
I want a result like below:
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
The third column value is the r value; a new output line is produced for each occurrence.
I have tried with:
for i in `cat $xxxx.txt`
do
#echo $i
live=$(echo $i | awk -F " " '{print $1}')
home=$(echo $i | awk -F " " '{print $2}')
echo $live
done
but it is not working properly. I am a beginner with sed/awk and not sure how I can use them. Can someone please help with this?
awk to the rescue!
$ awk -F'[,;|]' '{c=0;
for(i=2;i<=NF;i++)
if(match($i,/^r=/)) a[c++]=substr($i,RSTART+2);
delim=substr($0,length($0))=="|"?"":"|";
for(i=0;i<c;i++) print $0 delim a[i]}' file
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
Use an inner routine (made up of GNU grep, sed, and tr) to compile a second more elaborate sed command, the output of which needs further cleanup with more sed. Call the input file "foo".
sed -n $(grep -no 'r=[0-9]*' foo | \
sed 's/^[0-9]*/&s#.*#\&/;s/:r=/|/;s/.*/&#p;/' | \
tr -d '\n') foo | \
sed 's/|[0-9|]*|/|/'
Output:
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
Looking at the inner sed code:
grep -no 'r=[0-9]*' foo | \
sed 's/^[0-9]*/&s#.*#\&/;s/:r=/|/;s/.*/&#p;/' | \
tr -d '\n'
It's purpose is to parse foo on-the-fly (when foo changes, so will the output), and in this instance come up with:
1s#.*#&|1#p;2s#.*#&|1#p;2s#.*#&|3#p;
Which is almost perfect, but it leaves in old data on the last line:
sed -n '1s#.*#&|1#p;2s#.*#&|1#p;2s#.*#&|3#p;' foo
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1|3
...which old data |1 is what the final sed 's/|[0-9|]*|/|/' removes.
Here is a pure bash solution. I wouldn't recommend actually using this, but it might help you understand better how to work with files in bash.
# Iterate over each line, splitting into three fields
# using | as the delimiter. (f3 is only there to make
# sure a trailing | is not included in the value of f2)
while IFS="|" read -r f1 f2 f3; do
    # Create an array of variable groups from $f2, using ;
    # as the delimiter
    IFS=";" read -a groups <<< "$f2"
    for group in "${groups[@]}"; do
        # Get each variable from the group separately
        # by splitting on ,
        IFS=, read -a vars <<< "$group"
        for var in "${vars[@]}"; do
            # Split each assignment on =, create
            # the variable for real, and quit once we
            # have found r
            IFS== read name value <<< "$var"
            declare "$name=$value"
            [[ $name == r ]] && break
        done
        # Output the desired line for the current value of r
        printf '%s|%s|%s\n' "$f1" "$f2" "$r"
    done
done < $xxxx.txt
Changes for ksh:
read -A instead of read -a.
typeset instead of declare.
If <<< is a problem, you can use a here document instead. For example:
IFS=";" read -A groups <<EOF
$f2
EOF

How to sort and get unique values from an array in bash?

I'm new to bash scripting... I'm trying to sort and store unique values from an array into another array.
eg:
list=('a','b','b','b','c','c');
I need,
unique_sorted_list=('b','c','a')
I tried a couple of things, but they didn't help me:
sorted_ids=($(for v in "${ids[@]}"; do echo "$v"; done | sort | uniq | xargs))
or
sorted_ids=$(echo "${ids[#]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')
Can you guys please help me in this ....
Try:
$ list=(a b b b c c)
$ unique_sorted_list=($(printf "%s\n" "${list[@]}" | sort -u))
$ echo "${unique_sorted_list[@]}"
a b c
Update based on comments:
$ uniq=($(printf "%s\n" "${list[@]}" | sort | uniq -c | sort -rnk1 | awk '{ print $2 }'))
The accepted answer doesn't work if array elements contain spaces.
Try this instead:
readarray -t unique_sorted_list < <( printf "%s\n" "${list[@]}" | sort -u )
In Bash, readarray is an alias to the built-in mapfile command. See help mapfile for details.
The -t option removes the trailing newline (added by printf here) from each line read.
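For example, with elements that contain spaces (the list below is made up), each element survives intact:

```shell
list=("b b" "a a" "b b" "c")

readarray -t unique_sorted_list < <(printf "%s\n" "${list[@]}" | sort -u)

echo "${#unique_sorted_list[@]}"   # 3
printf '<%s>\n' "${unique_sorted_list[@]}"
# <a a>
# <b b>
# <c>
```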
