Can I use 'column' in bash where a column has a newline? - bash

Suppose I have an associative array of ( [name]="Some description" ).
declare -A myItems=(
[item1]='Item1 description'
[item2]='Item2 description'
)
I now want to print a table of myItems with nice, even column lengths.
str=''
for n in $(echo "${!myItems[@]}" | tr " " "\n" | sort); do
str+="$n\t${myItems[$n]}\n"
done
# $(printf '\t') was the simplest way I could find to capture a tab character
echo -e "$str" | column -t -s "$(printf '\t')"
### PRINTS ###
item1 Item1 description
item2 Item2 description
Cool. This works nicely.
Now suppose an item has a description that is multiple lines.
myItems=(
[item1]='Item1 description'
[item2]='Item2 description'
[item3]='This item has
multiple lines
in it'
)
Now running my script prints
item1 Item1 description
item2 Item2 description
item3 This item has
multiple lines
in it
What I want is
item1 Item1 description
item2 Item2 description
item3 This item has
      multiple lines
      in it
Is this achievable with column? If not, can I achieve it through some other means?

Assuming the array keys are always single lines, you can insert a tab after every newline inside the values, so that each continuation line gets an empty first column and column aligns it correctly.
By the way: Bash supports ANSI-C quoting ($'...'), which lets you write tabs and newlines as escape sequences. $'\n\t' is a newline followed by a tab.
for key in "${!myItems[@]}"; do
printf '_%s\t%s\n' "$key" "${myItems[$key]//$'\n'/$'\n_\t'}"
done |
column -ts $'\t' |
sed 's/^_/ /'
If you also want to sort the keys as in your question, I'd suggest something more robust than for ... in $(echo ...), for instance
printf %s\\n "${!myItems[@]}" | sort |
while IFS= read -r key; do
...
And here is a general solution allowing for multi-line keys and values:
printf %s\\0 "${!myItems[@]}" | sort -z |
while IFS= read -rd '' key; do
paste <(printf %s "$key") <(printf %s "${myItems[$key]}")
done |
sed 's/^/_/' | column -ts $'\t' | sed 's/^_//'
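To see what the first loop hands to column, it can help to run the substitution on its own. This is only a sketch: the function name mark_continuations is made up for the illustration, and sed is used here just to make the tabs visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the answer above): emit one table row as
# "_key<TAB>value" and turn every newline inside the value into
# "<NEWLINE>_<TAB>", so each continuation line gets a placeholder first column.
mark_continuations() {
  local key=$1 value=$2
  printf '_%s\t%s\n' "$key" "${value//$'\n'/$'\n_\t'}"
}

# Show the rows with the tab characters spelled out as <TAB>.
mark_continuations item3 $'This item has\nmultiple lines\nin it' | sed $'s/\t/<TAB>/g'
```

Every line now has exactly two tab-separated fields, which is what lets column -t line up the continuation lines under the description.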

You may use this 2 pass code:
# find max length of key
for i in "${!myItems[@]}"; do ((${#i} > max)) && max=${#i}; done
# create a variable filled with space using max+4; assuming 4 is tab space size
printf -v spcs "%*s" $((max+4)) " "
# finally print key and value using max and spcs filler
for i in "${!myItems[@]}"; do
printf '%-*s\t%s\n' $max "$i" "${myItems[$i]//$'\n'/$'\n'$spcs}"
done
item1 Item1 description
item2 Item2 description
item3 This item has
multiple lines
in it
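A variant of the same two-pass idea, sketched with spaces only so the alignment does not depend on the terminal's tab width. The function name print_items and the gap of 2 are arbitrary choices for this example, and the keys are assumed to contain no newlines.

```shell
#!/usr/bin/env bash
declare -A myItems=(
  [item1]='Item1 description'
  [item2]='Item2 description'
  [item3]=$'This item has\nmultiple lines\nin it'
)

# Hypothetical helper: print myItems as a two-column table padded with spaces.
print_items() {
  local gap=2 max=0 key pad
  local -a keys
  # Pass 1: find the longest key.
  for key in "${!myItems[@]}"; do
    (( ${#key} > max )) && max=${#key}
  done
  # A blank prefix as wide as the key column plus the gap.
  printf -v pad '%*s' "$((max + gap))" ''
  # Sort the keys (assumes no key contains a newline).
  mapfile -t keys < <(printf '%s\n' "${!myItems[@]}" | sort)
  # Pass 2: print each row, indenting continuation lines with the blank prefix.
  for key in "${keys[@]}"; do
    printf '%-*s%s\n' "$((max + gap))" "$key" "${myItems[$key]//$'\n'/$'\n'$pad}"
  done
}

print_items
```

Because the padding is computed from the longest key, the description column moves automatically when a longer key is added.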

Something tested
#!/usr/bin/env bash
declare -A myItems=(
[item1]='Item1 description'
[item2]='Item2 description'
[item3]='This item has
multiple lines
in it'
)
# Iterate each item by key
for item in "${!myItems[@]}"; do
# Transform lines of item name into an array
IFS=$'\n' read -r -d '' -a itemName <<<"$item"
# Transform lines of item description into an array
IFS=$'\n' read -r -d '' -a itemDesc <<<"${myItems[$item]}"
# Compute the larger of the two line counts
lines=$((${#itemName[@]}>${#itemDesc[@]}?${#itemName[@]}:${#itemDesc[@]}))
# Print each line formatted
for (( l=0; l<lines; l++)); do
printf '%-8s%s\n' "${itemName[$l]}" "${itemDesc[$l]}"
done
done
Output:
item1   Item1 description
item2   Item2 description
item3   This item has
        multiple lines
        in it

Related

What's the best way to loop over single line with several separator?

I want to parse the output of fio. I have formatted it so that it uses clear delimiters.
182.07 MB/s|182.55 MB/s|364.62 MB/s|45.5k|45.6k|91.2k#682.65 MB/s|686.24 MB/s|1.36 GB/s|10.7k|10.7k|21.4k#665.21 MB/s|700.56 MB/s|1.36 GB/s|1.3k|1.4k|2.7k#751.97 MB/s|802.05 MB/s|1.55 GB/s|0.7k|0.8k|1.5k
I want to process each string separated with # sign, currently this is what I do.
Convert # to \n (newline)
fio_result=$(printf %s "$fio_result" | tr '#' '\n')
This will output the string like so.
182.07 MB/s|182.55 MB/s|364.62 MB/s|45.5k|45.6k|91.2k
682.65 MB/s|686.24 MB/s|1.36 GB/s|10.7k|10.7k|21.4k
665.21 MB/s|700.56 MB/s|1.36 GB/s|1.3k|1.4k|2.7k
751.97 MB/s|802.05 MB/s|1.55 GB/s|0.7k|0.8k|1.5k
Only after that, loop through the variable fio_result.
echo "$fio_result" | while IFS='|' read -r bla bla...
Does anyone have a better idea of how to achieve this?
With bash you can do:
#!/bin/bash
fio_result='182.07 MB/s|182.55 MB/s|364.62 MB/s|45.5k|45.6k|91.2k#682.65 MB/s|686.24 MB/s|1.36 GB/s|10.7k|10.7k|21.4k#665.21 MB/s|700.56 MB/s|1.36 GB/s|1.3k|1.4k|2.7k#751.97 MB/s|802.05 MB/s|1.55 GB/s|0.7k|0.8k|1.5k'
while IFS='|' read -d '#' -ra arr
do
declare -p arr #=> shows what's inside 'arr'
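done < <(
# the trailing '#' terminates the last record; without it, read hits EOF and the loop skips that record
printf '%s#' "$fio_result"
)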
But if you need to format, extract, or compute something from the fio output, you should switch to another tool that is better suited to the job than bash.
Example with awk: calculate the average of the first two columns:
printf '%s' "$fio_result" |
awk -F'|' -v RS='#' '{print ($1+$2)/2}'
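If you would rather stay in bash, here is a sketch that splits in two steps with read -a and here-strings, so there is no tr call and no pipeline subshell. It assumes the data contains no newlines; the array names records and fields are just illustrative.

```shell
#!/usr/bin/env bash
fio_result='182.07 MB/s|182.55 MB/s|364.62 MB/s|45.5k|45.6k|91.2k#682.65 MB/s|686.24 MB/s|1.36 GB/s|10.7k|10.7k|21.4k#665.21 MB/s|700.56 MB/s|1.36 GB/s|1.3k|1.4k|2.7k#751.97 MB/s|802.05 MB/s|1.55 GB/s|0.7k|0.8k|1.5k'

# Step 1: split the whole string on '#' into an array of records.
IFS='#' read -r -a records <<< "$fio_result"

for record in "${records[@]}"; do
  # Step 2: split one record on '|'; spaces inside a field are preserved
  # because IFS is only '|' for this read.
  IFS='|' read -r -a fields <<< "$record"
  printf 'fields=%d first=%s last=%s\n' \
    "${#fields[@]}" "${fields[0]}" "${fields[${#fields[@]}-1]}"
done
```

Since the loop runs in the current shell, anything you assign inside it is still available after done.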

shell script declare associative variables from array

I have an array of key-value pairs separated by spaces:
array=Name:"John" ID:"3234" Designation:"Engineer" Age:"32" Phone:"+123 456 789"
Now I want to convert the above array into an associative array like the one below:
declare -A newmap
newmap[Name]="John"
newmap[ID]="3234"
newmap[Designation]="Engineer"
newmap[Age]="32"
newmap[Phone]="+123 456 789"
echo ${newmap[Name]}
echo ${newmap[ID]}
echo ${newmap[Designation]}
echo ${newmap[Age]}
echo ${newmap[Phone]}
I'm able to get the value for a given key when reading from a file:
declare -A arr
while IFS='=' read -r k v; do
arr[$k]=$v;
done < "file.txt"
echo "${arr[name]}"
But I want to implement the same functionality using the array instead of a file.
You can just use a sed to reformat input data before calling declare -A:
s='array=Name:"John" ID:"3234" Designation:"Engineer" Age:"32" Phone:"+123 456 789"'
declare -A "newmap=(
$(sed -E 's/" ([[:alpha:]])/" [\1/g; s/:"/]="/g' <<< "[${s#*=}")
)"
Then check output:
declare -p newmap
declare -A newmap=([ID]="3234" [Designation]="Engineer" [Age]="32" [Phone]="+123 456 789" [Name]="John" )
A version without eval:
array='Name:"John" ID:"3234" Designation:"Engineer" Age:"32" Phone:"+123 456 789"'
declare -A "newmap=($(perl -pe 's/(\w+):"/[\1]="/g' <<< "$array"))"
echo ${newmap[Phone]}
# output : +123 456 789
Working with the variable array that's been defined as follows:
$ array='Name:"John" ID:"3234" Designation:"Engineer" Age:"32" Phone:"+123 456 789"'
NOTES:
assuming no white space between the attribute, ':' and value
assuming there may be variable amount of white space between attribute/value pairs
assuming all values are wrapped in a pair of double quotes
And assuming the desire is to parse this string and store in an array named newmap ...
We can use sed to break our string into separate lines as such:
$ sed 's/" /"\n/g;s/:/ /g' <<< ${array}
Name "John"
ID "3234"
Designation "Engineer"
Age "32"
Phone "+123 456 789"
We can then feed this to a while loop to populate our array:
$ unset newmap
$ typeset -A newmap
$ while read -r k v
do
newmap[${k}]=${v//\"} # strip off the double quote wrapper
done < <(sed 's/" /"\n/g;s/:/ /g' <<< ${array})
$ typeset -p newmap
declare -A newmap=([ID]="3234" [Name]="John" [Phone]="+123 456 789" [Age]="32" [Designation]="Engineer" )
And applying the proposed (and slightly modified) echo statements:
$ (
echo "Name - ${newmap[Name]}"
echo "ID - ${newmap[ID]}"
echo "Designation - ${newmap[Designation]}"
echo "Age - ${newmap[Age]}"
echo "Phone - ${newmap[Phone]}"
)
Name - John
ID - 3234
Designation - Engineer
Age - 32
Phone - +123 456 789
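For comparison, a sketch that does the same parsing with bash's own regex matching, so no sed or perl is needed and nothing is passed to declare as code. It relies on the assumptions listed above plus one more: values contain no embedded double quotes. The variable names rest and re are arbitrary.

```shell
#!/usr/bin/env bash
array='Name:"John" ID:"3234" Designation:"Engineer" Age:"32" Phone:"+123 456 789"'

declare -A newmap
rest=$array
# Optional leading whitespace, a key (no ':' or whitespace), then :"value",
# then whatever is left of the string.
re='^[[:space:]]*([^:[:space:]]+):"([^"]*)"(.*)$'
while [[ $rest =~ $re ]]; do
  newmap[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
  rest=${BASH_REMATCH[3]}   # continue with the unparsed remainder
done

declare -p newmap
```

The loop stops as soon as the remainder no longer looks like key:"value", so malformed trailing text is ignored rather than evaluated.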

bash to store unique value if array in variable

The bash script below goes to a folder and loops over the .html file names in f1. It then removes all text after the _ and stores the result in $p. I added a for loop to get the unique ids from $p. The terminal output shows $p is correct, but only the last value is stored in the new array ($sorted_unique_ids), and I am not sure why all three are not.
dir=/path/to
var=$(ls -td "$dir"/*/ | head -1) ## store newest <run> in var
for f1 in "$var"/qc/*.html ; do
# Grab file prefix
bname=`basename $f1` # strip off path
p="$(echo $bname|cut -d_ -f1)"
typeset -A myarray ## define associative array
myarray[${p}]=yes ## store p in myarray
for i in ${!myarray[@]}; do echo ${!myarray[@]} | tr ' ' '\n' | sort; done
done
output
id1
id1
id1
id2
id1
id2
id1
id2
id3
id1
id2
id3
desired sorted_unique_ids
id1
id2
id3
Maybe something like this:
dir=$(ls -td "$dir"/*/ | head -1)
find "$dir" -maxdepth 1 -type f -name '*_*.html' -printf "%f\n" |
cut -d_ -f1 | sort -u
For input directory structure created like:
dir=dir
mkdir -p dir/dir
touch dir/dir/id{1,2,3}_{a,b,c}.html
So it looks like this:
dir/dir/id2_b.html
dir/dir/id1_c.html
dir/dir/id2_c.html
dir/dir/id1_b.html
dir/dir/id3_b.html
dir/dir/id2_a.html
dir/dir/id3_a.html
dir/dir/id1_a.html
dir/dir/id3_c.html
The script will output:
id1
id2
id3
Tested on repl.
latest=`ls -t "$dir"|head -1` # or …|sed q` if you're really jonesing for keystrokes
for f in "$latest"/qc/*_*.html; do f=${f##*/}; printf %s\\n "${f%_*}"; done | sort -u
Define an associative array:
typeset -A myarray
Use each p value as the index for an array element; assign any value you want to the array element (the value just acts as a placeholder):
myarray[${p}]=yes
If you run across the same p value more than once, each assignment to the array will overwrite the previous assignment; net result is that you'll only ever have a single element in the array with a value of p.
To obtain your unique list of p values, you can loop through the indexes for the array, eg:
for i in ${!myarray[@]}
do
echo ${i}
done
If you need the array indexes generated in sorted order try:
echo ${!myarray[@]} | tr ' ' '\n' | sort
You can then use this sorted result set as needed (eg, dump to stdout, feed to a loop, etc).
So, adding my code to the OPs original code would give us:
typeset -A myarray ## define associative array
dir=/path/to
var=$(ls -td "$dir"/*/ | head -1) ## store newest <run> in var
for f1 in "$var"/qc/*.html ; do
# Grab file prefix
bname=`basename $f1` # strip off path
p="$(echo $bname|cut -d_ -f1)"
myarray[${p}]=yes ## store p in myarray
done
# display sorted, unique set of p values (once, after the loop has collected them all)
echo ${!myarray[@]} | tr ' ' '\n' | sort
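Since the question asks for the ids to end up in an array, here is a sketch that collects them into sorted_unique_ids using only parameter expansion, an associative array and mapfile. The function name unique_ids and the throwaway directory are made up for the demo, and the ids are assumed to contain no newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sorted, unique prefixes (the text before the
# first '_') of the *_*.html files directly inside the directory given as $1.
unique_ids() {
  local dir=$1 f name
  local -A seen=()
  for f in "$dir"/*_*.html; do
    [[ -e $f ]] || continue    # the glob matched nothing
    name=${f##*/}              # strip the path, like basename
    seen[${name%%_*}]=1        # keep what precedes the first '_', like cut -d_ -f1
  done
  (( ${#seen[@]} )) && printf '%s\n' "${!seen[@]}" | sort
}

# Demo on a temporary directory with made-up file names.
tmp=$(mktemp -d)
touch "$tmp"/id{1,2,3}_{a,b,c}.html
mapfile -t sorted_unique_ids < <(unique_ids "$tmp")
printf '%s\n' "${sorted_unique_ids[@]}"
rm -rf "$tmp"
```

The process substitution keeps mapfile in the current shell, so sorted_unique_ids is still populated after the command finishes.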

How to merge two or more lines if they start with the same word?

I have a file like this:
AAKRKA HIST1H1B AAGAGAAKRKATGPP
AAKRKA HIST1H1E RKSAGAAKRKASGPP
AAKRLN ACAT1 LMTADAAKRLNVTPL
AAKRLN SUCLG2 NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
I would like to kind of merge 2 lines if they are exactly the same in the 1st column. The desired output is:
AAKRKA HIST1H1B,HIST1H1E AAGAGAAKRKATGPP,RKSAGAAKRKASGPP
AAKRLN ACAT1,SUCLG2 LMTADAAKRLNVTPL,NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
Sometimes there could be more than two lines starting with the same word. How could I reach the desired output with bash/awk?
Thanks for help!
Since this resembles an SQL-like GROUP BY operation, you can use sqlite3, which can be called from a shell script
with the given inputs
$ cat aqua.txt
AAKRKA HIST1H1B AAGAGAAKRKATGPP
AAKRKA HIST1H1E RKSAGAAKRKASGPP
AAKRLN ACAT1 LMTADAAKRLNVTPL
AAKRLN SUCLG2 NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
$
Script:
$ cat ./sqlite_join.sh
#!/bin/sh
sqlite3 << EOF
create table data(a,b,c);
.separator ' '
.import $1 data
select a, group_concat(b) , group_concat(c) from data group by a;
EOF
$
Results
$ ./sqlite_join.sh aqua.txt
AAKRKA HIST1H1B,HIST1H1E AAGAGAAKRKATGPP,RKSAGAAKRKASGPP
AAKRLN ACAT1,SUCLG2 LMTADAAKRLNVTPL,NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
$
This is a two-liner in awk; the first line stores the second and third fields in associative arrays indexed by the first field, accumulating fields with identical indices with leading commas before each field, and the second line iterates over the two arrays, deleting the leading comma on output:
{ second[$1] = second[$1] "," $2; third[$1] = third[$1] "," $3 }
END { for (i in second) print i, substr(second[i],2), substr(third[i],2) }
I made no assumptions about the order of the input or the output. If you want sorted output, pipe the output through sort. You can run the program at https://ideone.com/sbgLNk.
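The same accumulate-by-key idea can also be sketched in plain bash with associative arrays instead of awk. Unlike the awk program, this version keeps the keys in the order they first appear. The function name merge_by_key is hypothetical, and the input is assumed to have exactly three whitespace-separated fields per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "key second third" lines on stdin and merge the
# rows that share a key, keeping keys in order of first appearance.
merge_by_key() {
  local -A second=() third=()
  local -a order=()
  local k b c
  while read -r k b c; do
    [[ -n $k ]] || continue               # skip blank lines
    if [[ -n ${second[$k]+x} ]]; then     # key already seen: append with a comma
      second[$k]+=",$b"
      third[$k]+=",$c"
    else                                  # first time: remember its position
      order+=("$k")
      second[$k]=$b
      third[$k]=$c
    fi
  done
  for k in "${order[@]}"; do
    printf '%s %s %s\n' "$k" "${second[$k]}" "${third[$k]}"
  done
}

merge_by_key <<'EOF'
AAKRKA HIST1H1B AAGAGAAKRKATGPP
AAKRKA HIST1H1E RKSAGAAKRKASGPP
AAKRLN ACAT1 LMTADAAKRLNVTPL
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
EOF
```

For large files awk will be much faster; the bash version is mainly useful when the merged values are needed in shell variables afterwards.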
try this:
DATAFILE=data.txt
cut -d " " -f1 < $DATAFILE | sort | uniq |
while read key; do
column1="$key"
column2=""
column3=""
# anchor the pattern so the key only matches in the first column
grep "^$key " $DATAFILE |
while read line; do
set -- $line
[ -n "$column2" ] && [ -n "$2" ] && column2="$column2,"
[ -n "$column3" ] && [ -n "$3" ] && column3="$column3,"
column2="$column2$2"
column3="$column3$3"
echo "$column1 $column2 $column3"
done | tail -n1
done

Associative array pipe to Column command

I'm looking for a way to print out an associative array with the column command. I feel like there is probably a way to do this, but I haven't had much luck.
declare -A list
list=(
[a]="x is in this one"
[b]="y is here"
[areallylongone]="z down here"
)
I'd like the outcome to be a simple table. I've used a loop with tabs but in my case the lengths are great enough to offset the second column.
The output should look like
a               x is in this one
b               y is here
areallylongone  z down here
You are looking for something like this?
declare -A assoc=(
[a]="x is in this one"
[b]="y is here"
[areallylongone]="z down here"
)
for i in "${!assoc[@]}" ; do
echo -e "${i}\t=\t${assoc[$i]}"
done | column -s$'\t' -t
Output:
areallylongone  =  z down here
a               =  x is in this one
b               =  y is here
I'm using a tab char to delimit key and value and use the column -t to tabulate the output and -s to set the input delimiter to the tab char. From man column:
-t Determine the number of columns the input contains and create a table. Columns are delimited with whitespace, by default, or with the characters supplied using the -s option. Useful for pretty-printing displays.
-s Specify a set of characters to be used to delimit columns for the -t option.
One (simple) way to do it is by pasting together keys column and values column:
paste -d $'\t' <(printf "%s\n" "${!list[@]}") <(printf "%s\n" "${list[@]}") | column -s $'\t' -t
For your input, it yields:
areallylongone  z down here
a               x is in this one
b               y is here
To handle spaces in (both) keys and values, we used TAB (\t) as column delimiter, in both paste (-d option) and column (-s option) commands.
To obtain the desired output from the answer of hek2mgl
declare -A assoc=(
[a]="x is in this one"
[b]="y is here"
[areallylongone]="z down here"
)
for i in "${!assoc[@]}" ; do
echo "${i}=${assoc[$i]}"
done | column -s= -t | sort -k 2
