How to read a table-like file into multiple arrays (bash)

If I have a file example1.txt containing multiple strings
str1
str2
str3
...
I can read them into a bash array with:
mapfile -t mystrings < example1.txt
Now say my file example2.txt is formatted as a table
str11 str12 str13
str21 str22 str23
str31 str32 str33
... ... ...
and I want to read each column into a different array. I know I can use other tools such as awk to separate each line into fields. Is there some way to combine this functionality with mapfile? I'm looking for something like
mapfile -t firstcol < $(cat example2.txt | awk '//{printf $1"\n"}')
mapfile -t secondcol < $(cat example2.txt | awk '//{printf $2"\n"}')
(which doesn't work).
Any other suggestion on how to handle a table in bash is also welcome.

Reading each row is simple, so let's build off that. I'll assume you have a proper matrix (i.e., each row has the same number of columns). This is much easier since you are using bash 4.3, which has namerefs.
while read -r -a row; do
    c=0
    for value in "${row[@]}"; do
        # Nameref to column_0, column_1, ... (the array itself is
        # implicitly created by the += below)
        declare -n column=column_$(( c++ ))
        column+=( "$value" )
    done
done < table.txt
There! Now, did it work?
$ echo "${column_0[@]}"
str11 str21 str31
$ echo "${column_1[@]}"
str12 str22 str32
I think so!
declare -n makes a nameref to an array (implicitly declared by the += on the next line) using a counter that increments as we walk the fields of each row and resets at the start of the next. Then we simply append the current field's value to the array behind the current nameref.
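If you later need to walk all of the generated column arrays without knowing how many columns there were, bash can expand variable names by prefix; a small follow-up sketch using the column_ prefix from above:
for name in "${!column_@}"; do
    declare -n col=$name    # re-point the nameref at each column array
    printf '%s: %s\n' "$name" "${col[*]}"
done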

You should be using process substitution. With < $(...) the command's output is substituted and then treated as a filename to redirect from, which fails; < <(command) instead redirects from a file connected to the command's output:
mapfile -t firstcol < <(awk '{print $1}' example2.txt)
mapfile -t secondcol < <(awk '{print $2}' example2.txt)
mapfile -t thirdcol < <(awk '{print $3}' example2.txt)

Hmm. Something like this, perhaps?
readarrays() {
    declare -a values
    declare idx line=0
    while read -r -a values; do
        for idx in "${!values[@]}"; do
            # Stop if there are more fields than array names supplied
            [[ ${@:idx+1:1} ]] || break
            # Assign field idx of this line to element $line of the
            # array named by positional parameter idx+1
            declare -g "${@:idx+1:1}[$line]=${values[@]:idx:1}"
        done
        (( ++line ))
    done
}
Tested as:
bash4-4.3$ (readarrays one two three <<<$'a b c\nd e f'; declare -p one two three)
declare -a one='([0]="a" [1]="d")'
declare -a two='([0]="b" [1]="e")'
declare -a three='([0]="c" [1]="f")'
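Applied to the question's example2.txt it would run like this (a hypothetical invocation, reusing the asker's array names):
readarrays firstcol secondcol thirdcol < example2.txt
echo "${firstcol[@]}"    # str11 str21 str31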

Related

How to use two `IFS` in Bash

So I know I can use a single IFS in a read statement, but is it possible to use two? For instance, if I have the text
variable = 5 + 1;
print variable;
I have code that assigns every split word to an array, but I also want to split at ; as well as at a space, when one comes up.
Here is the code so far
INPUT="$1"
declare -a raw_parse
while IFS=' ' read -r -a raw_input; do
    for raw in "${raw_input[@]}"; do
        raw_parse+=("$raw")
    done
done < "$INPUT"
What comes out:
declare -a raw_parse=([0]="variable" [1]="=" [2]="5" [3]="+" [4]="1;" [5]="print" [6]="variable;")
What I want:
declare -a raw_parse=([0]="variable" [1]="=" [2]="5" [3]="+" [4]="1" [5]=";" [6]="print" [7]="variable" [8]=";")
A workaround with GNU sed. This inserts a space before every ; and replaces every newline with a space.
read -r -a raw_input < <(sed -z 's/;/ ;/g; s/\n/ /g' "$INPUT")
declare -p raw_input
Output:
declare -a raw_input=([0]="variable" [1]="=" [2]="5" [3]="+" [4]="1" [5]=";" [6]="print" [7]="variable" [8]=";")
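If GNU sed is unavailable (the -z option is a GNU extension), the same padding can be done in pure bash with parameter expansion; a sketch under that assumption:
declare -a raw_parse=()
while IFS= read -r line; do
    # Insert a space before each ';' so it splits into its own word
    read -r -a words <<< "${line//;/ ;}"
    raw_parse+=("${words[@]}")
done < "$INPUT"
declare -p raw_parse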

Loop through table and parse multiple arguments to scripts in Bash

I am in a situation similar to this one and am having difficulty implementing this kind of solution for my situation.
I have file.tsv formatted as follows:
x y
dog woof
CAT meow
loud_goose honk-honk
duck quack
with a fixed number of columns (but a variable number of rows), and I need to loop over those pairs of values, skipping the header row, in a script like the following (pseudocode)
for elements in list; do
./script1 elements[1] elements[2]
./script2 elements[1] elements[2]
done
so that script* can take the arguments from the pair and run with it.
Is there a way to do it in Bash?
I was thinking I could do something like this:
list1={`awk 'NR > 1{print $1}' file.tsv`}
list2={`awk 'NR > 1{print $2}' file.tsv`}
and then call them in the loop based on their position, but I am not sure how.
Thanks!
Shell arrays are not multi-dimensional, so an array element cannot store the two arguments for your scripts. However, since you are processing lines from file.tsv, you can iterate over each line, reading both elements at once, like this:
#!/usr/bin/env sh
# Populate tab with a literal tab character (command substitution
# strips trailing newlines, but a lone tab survives)
tab="$(printf '\t')"
{
    # Read the first line into the dummy variable _ to skip the header
    read -r _
    # Iterate, reading tab-delimited x and y from each line; the
    # || [ -n "$x" ] keeps a final line that lacks a trailing
    # newline from being dropped
    while IFS="$tab" read -r x y || [ -n "$x" ]; do
        ./script1 "$x" "$y"
        ./script2 "$x" "$y"
    done
} < file.tsv # from this file
You could try just a while + read loop with the -a flag and IFS.
#!/usr/bin/env bash
while IFS=$' \t' read -ra line; do
    echo ./script1 "${line[0]}" "${line[1]}"
    echo ./script2 "${line[0]}" "${line[1]}"
done < <(tail -n +2 file.tsv)
Or without the tail:
#!/usr/bin/env bash
skip=0 start=-1
while IFS=$' \t' read -ra line; do
    if ((start++ >= skip)); then
        echo ./script1 "${line[0]}" "${line[1]}"
        echo ./script2 "${line[0]}" "${line[1]}"
    fi
done < file.tsv
Remove the echos if you're satisfied with the output.

Bash loop to change data in one CSV based on another CSV

I'm trying to change the value of a column based on a column in another CSV.
Let's say we have CSV_1, with over 1000 lines and 3 columns:
shape Color size
round 2 big
triangle 1 small
square 3 medium
Then we have CSV_2, which has only 10 lines, with the following information:
color
1 REd
2 Blue
3 Yellow
etc
Now I want to change the value in the Color column of CSV_1 to the name of the color from CSV_2. In other words, something like:
for (i=0; i<column.color(csv1); i++) {
    if color.csv1=1; then
        substitute with color.csv2=1 }
so that the loop iterates over the whole CSV_1 Color column and replaces the values with the values from CSV_2.
An explicit loop for this would be very slow in bash. Use a command that does the line-wise processing for you.
sed 's/abc/xyz/' searches abc in each line and replaces it by xyz. Use this to search and replace the numbers in your 2nd column by the names from your 2nd file. The sed command can be automatically generated from the 2nd file using another sed command:
The following script assumes a CSV file without spaces around the delimiting ,.
sed -E "$(sed -E '1d;s#^([^,]*),(.*)#s/^([^,]*,)\1,/\\1\2,/#' 2.csv)" 1.csv
Interactive Example
$ cat 1.csv
shape,Color,size
round,2,big
triangle,1,small
square,3,medium
$ cat 2.csv
color
1,REd
2,Blue
3,Yellow
$ sed -E "$(sed -E '1d;s#^([^,]*),(.*)#s/^([^,]*,)\1,/\\1\2,/#' 2.csv)" 1.csv
shape,Color,size
round,Blue,big
triangle,REd,small
square,Yellow,medium
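Running only the inner command against the 2.csv above shows the sed script that gets generated for the outer pass:
$ sed -E '1d;s#^([^,]*),(.*)#s/^([^,]*,)\1,/\\1\2,/#' 2.csv
s/^([^,]*,)1,/\1REd,/
s/^([^,]*,)2,/\1Blue,/
s/^([^,]*,)3,/\1Yellow,/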
Here is one approach with mapfile, which is a bash 4+ feature, and some common utilities on Linux/Unix.
Assuming both files are delimited with a comma (,):
#!/usr/bin/env bash
mapfile -t colors_csv2 < csv2.csv
head -n1 csv1.csv
while IFS=, read -r shape_csv1 color_csv1 size_csv1; do
    for color_csv2 in "${colors_csv2[@]:1}"; do
        if [[ $color_csv1 == "${color_csv2%,*}" ]]; then
            printf '%s,%s,%s\n' "$shape_csv1" "${color_csv2#*,}" "$size_csv1"
        fi
    done
done < <(tail -n +2 csv1.csv)
This would be very slow on a large set of data/files.
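A faster variant of the same idea (a sketch, assuming bash 4+ and the comma-delimited files above): build an associative lookup from csv2.csv once, then translate csv1.csv in a single pass instead of rescanning csv2 for every row.
#!/usr/bin/env bash
declare -A color_name
# One pass over csv2.csv (minus its header) builds the lookup table
while IFS=, read -r key name; do
    color_name[$key]=$name
done < <(tail -n +2 csv2.csv)
# Print csv1's header, then translate the Color column of each row
head -n1 csv1.csv
while IFS=, read -r shape color size; do
    # Fall back to the original value if the key is unknown
    printf '%s,%s,%s\n' "$shape" "${color_name[$color]:-$color}" "$size"
done < <(tail -n +2 csv1.csv)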
If ed is available/acceptable, with the bash shell:
#!/usr/bin/env bash
ed -s csv1.csv < <(
    printf '%s\n' '1d' $'g|.|s|,|/|\\\ns|^|,s/|\\\ns|$|/|' '$a' ',p' 'Q' . ,p |
    ed -s csv2.csv
)
To add to Jetchisel's interesting answer, here is an old-bash way to achieve that. It should work with bash release 2, since it only relies on escape literals, indexed arrays, string expansion, and indirect variable references. It assumes that the color keys in csv2.csv will always be numeric. Add shopt -s compat31 at the beginning to test it 'the old way' with a recent bash. You can also replace declare -a csv2 with a bash 4+ declare -A csv2 for an associative array, in which case the key can be anything.
#!/bin/bash
declare -a csv2
esc=$'\x1B'
while read -r colors; do
    if [ "${colors}" ] ; then
        # Protect embedded spaces before word-splitting on commas
        colors="${colors// /${esc}}"
        set ${colors//,/ }
        if [ "$1" ] ; then
            csv2["$1"]="$2"
        fi
    fi
done < csv2.csv
while read -r output; do
    if [ "${output}" ] ; then
        outputfilter="${output// /${esc}}"
        set ${outputfilter//,/ }
        if [ "$2" ] ; then
            color="${csv2["$2"]}"
            [ "${color}" ] && { tmp="$1,${color},$3"; output="${tmp//${esc}/ }"; }
        fi
        echo "${output}"
    fi
done < csv1.csv

Bash Associative Array from String?

A command emits the string: "[abc]=kjlkjkl [def]=yutuiu [ghi]=jljlkj"
I want to load a bash associative array using these key|value pairs, but the result I'm getting is a single-row array where the key is formed from the first pair [abc]=kjlkjkl and the value is the whole of the rest of the string, so declare -p arr returns declare -A arr["[abc]=kjlkjkl"]="[def]=yutuiu [ghi]=jljlkj"
This is what I am doing at the moment. Where am I going wrong please?
declare -A arr=()
while read -r a b; do
    arr["$a"]="$b"
done < <(command that outputs the string "[abc]=kjlkjkl [def]=yutuiu [ghi]=jljlkj")
You need to parse it: split the string on spaces, split each key-value pair on the equals sign, and get rid of the brackets.
Here's one way, using tr to replace the spaces with newlines, then tr again to remove all brackets (including any that occur in a value), then IFS="=" to split the key-value pairs. I'm sure this could be done more effectively, like with AWK or Perl, but I don't know how.
declare -A arr=()
while IFS="=" read -r a b; do
    arr["$a"]="$b"
done < <(
    echo "[abc]=kjlkjkl [def]=yutuiu [ghi]=jljlkj" |
    tr ' ' '\n' |
    tr -d '[]'
)
echo "${arr[def]}" # -> yutuiu
See Cyrus's answer for another take on this, with the space and equals steps combined.
Append this to your command which outputs the string:
| tr ' =' '\n ' | tr -d '[]'
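Put together with the question's loop, that becomes (echo stands in here for the real command; the key order printed by declare -p may vary):
declare -A arr=()
while read -r a b; do
    arr["$a"]="$b"
done < <(echo "[abc]=kjlkjkl [def]=yutuiu [ghi]=jljlkj" | tr ' =' '\n ' | tr -d '[]')
declare -p arr   # declare -A arr=([abc]="kjlkjkl" [ghi]="jljlkj" [def]="yutuiu" )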
You can use the "eval declare" trick - but be sure your input is clean.
#! /bin/bash
s='[abc]=kjlkjkl [def]=yutuiu [ghi]=jljlkj'
eval declare -A arr=("$s")
echo ${arr[def]} # yutuiu
If the input is insecure, don't use it. Imagine (don't try) what would happen if
s='); rm -rf / #'
The "proper" good™ solution would be to write your own parser and tokenize the input. For example read the input char by char, handle [ and ] and = and space and optionally quoting. After parsing the string, assign the output to an associative array.
A simple way could be:
echo "[abc]=kjlkjkl [def]=yutuiu [ghi]=jljlkj" |
xargs -n1 |
{
declare -A arr;
while IFS= read -r line; do
if [[ "$line" =~ ^\[([a-z]*)\]=([a-z]*)$ ]]; then
arr[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
fi
done
declare -p arr
}
outputs:
declare -A arr=([abc]="kjlkjkl" [ghi]="jljlkj" [def]="yutuiu" )

Unix file pattern issue: append changing value of variable pattern to copies of matching line

I have a file with contents:
abc|r=1,f=2,c=2
abc|r=1,f=2,c=2;r=3,f=4,c=8
I want a result like below:
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
The third column value is the r value; a new line should be emitted for each occurrence.
I have tried with:
for i in `cat $xxxx.txt`
do
    #echo $i
    live=$(echo $i | awk -F " " '{print $1}')
    home=$(echo $i | awk -F " " '{print $2}')
    echo $live
done
but it is not working properly. I am a beginner with sed/awk and not sure how I can use them. Can someone please help with this?
awk to the rescue!
$ awk -F'[,;|]' '{c=0;
      for(i=2;i<=NF;i++)
          if(match($i,/^r=/)) a[c++]=substr($i,RSTART+2);
      delim=substr($0,length($0))=="|"?"":"|";
      for(i=0;i<c;i++) print $0 delim a[i]}' file
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
Use an inner routine (made up of GNU grep, sed, and tr) to compile a second more elaborate sed command, the output of which needs further cleanup with more sed. Call the input file "foo".
sed -n $(grep -no 'r=[0-9]*' foo | \
         sed 's/^[0-9]*/&s#.*#\&/;s/:r=/|/;s/.*/&#p;/' | \
         tr -d '\n') foo | \
    sed 's/|[0-9|]*|/|/'
Output:
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
Looking at the inner sed code:
grep -no 'r=[0-9]*' foo | \
sed 's/^[0-9]*/&s#.*#\&/;s/:r=/|/;s/.*/&#p;/' | \
tr -d '\n'
Its purpose is to parse foo on the fly (when foo changes, so will the output) and, in this instance, to come up with:
1s#.*#&|1#p;2s#.*#&|1#p;2s#.*#&|3#p;
Which is almost perfect, but it leaves in old data on the last line:
sed -n '1s#.*#&|1#p;2s#.*#&|1#p;2s#.*#&|3#p;' foo
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1|3
The leftover |1 on the last line is what the final sed 's/|[0-9|]*|/|/' removes.
Here is a pure bash solution. I wouldn't recommend actually using this, but it might help you understand better how to work with files in bash.
# Iterate over each line, splitting into three fields
# using | as the delimiter. (f3 is only there to make
# sure a trailing | is not included in the value of f2)
while IFS="|" read -r f1 f2 f3; do
    # Create an array of variable groups from $f2, using ;
    # as the delimiter
    IFS=";" read -a groups <<< "$f2"
    for group in "${groups[@]}"; do
        # Get each variable from the group separately
        # by splitting on ,
        IFS=, read -a vars <<< "$group"
        for var in "${vars[@]}"; do
            # Split each assignment on =, create
            # the variable for real, and quit once we
            # have found r
            IFS== read name value <<< "$var"
            declare "$name=$value"
            [[ $name == r ]] && break
        done
        # Output the desired line for the current value of r
        printf '%s|%s|%s\n' "$f1" "$f2" "$r"
    done
done < "$xxxx.txt"
Changes for ksh:
read -A instead of read -a.
typeset instead of declare.
If <<< is a problem, you can use a here document instead. For example:
IFS=";" read -A groups <<EOF
$f2
EOF
