Bash rows to columns

I know how to transpose rows in a file to columns, but I want to append the lines of the bottom half of a file to the lines of the upper half.
Like:
A1
A2
A3
B1
B2
B3
to
A1 | B1
A2 | B2
A3 | B3
The list comes from two greps: the output of the second grep is appended to that of the first, and both greps produce the same number of hits.
I want to do this within a bash script.

What about combining head and tail together with paste?
paste -d'|' <(head -3 file) <(tail -3 file)
It returns:
A1|B1
A2|B2
A3|B3
paste merges corresponding lines of its input files; here we simply feed it two different slices of the same file, and that's all there is to it.
Since it is a matter of taking the first half of the lines with head and the rest with tail, here is a more generic way:
paste -d'|' <(head -n $(($(wc -l <file)/2)) file) \
            <(tail -n $(($(wc -l <file)/2)) file)
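Since the question says the two halves actually come from two greps, paste can also consume both grep outputs directly through process substitution, with no intermediate file; a minimal sketch (patternA, patternB and input.log are placeholders for the real patterns and file):
paste -d'|' <(grep 'patternA' input.log) <(grep 'patternB' input.log)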

You're looking for the pr tool:
printf "%s\n" {A,B}{1,2,3} | pr -2 -T -s" | "
A1 | B1
A2 | B2
A3 | B3
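The same flags work when reading the file directly; pr fills the first column before the second, so the top half pairs with the bottom half (assuming the six lines sit in a file named file, as in the other answers):
pr -2 -T -s" | " file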

$ awk '{a[NR]=$0} END{ m=NR/2; for (i=1;i<=m;i++) print a[i] " | " a[i+m]}' file
A1 | B1
A2 | B2
A3 | B3

Just as an alternative:
awk 'BEGIN{c=0}
{a[c++] = $1}
END { for (i = 0; i < c/2; i++) print a[i] " " a[i+c/2]}'
This assumes you have an even number of lines as input.
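A minimal sketch of the same idea that also tolerates an odd number of input lines (the first half gets the extra line, whose unmatched entry is printed with an empty partner):
awk '{ a[NR] = $0 }                      # buffer every line
     END {
         m = int((NR + 1) / 2)           # ceiling of NR/2
         for (i = 1; i <= m; i++)
             print a[i] " | " a[i + m]   # a[i+m] is empty for the unmatched entry
     }' file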

Related

awk or sed command for columns and rows selection from multiple files

Looking for a command for the following task:
I have three files, each with two columns, as seen below.
I would like to create file4 with four columns.
The output should resemble a merge-sorted version of file1, file2 and file3 such that the first column is sorted, the second column is the second column of file1, the third column is the second column of file2, and the fourth column is the second column of file3.
The entries in columns 2 to 4 should not be sorted themselves but should stay matched to the key in the first column of their original files.
I tried computing an intersection in Linux, but it did not give the desired output.
Any help will be appreciated. Thanks in advance!!
$ cat -- file1
A1 B5
A10 B2
A3 B15
A15 B6
A2 B10
A6 B19
$ cat -- file2
A10 C4
A4 C8
A6 C5
A3 C10
A12 C14
A15 C18
$ cat -- file3
A3 D1
A22 D9
A20 D3
A10 D5
A6 D10
A21 D11
$ cat -- file4
col1 col2 col3 col4
A1 B5
A2 B10
A3 B15 C10 D1
A4 C8
A6 B19 C5 D10
A10 B2 C4 D5
A12 C14
A15 B6 C18
A20 D3
A21 D11
A22 D9
Awk + Bash version:
( echo "col1, col2, col3, col4" &&
awk 'ARGIND==1 { a[$1]=$2; allkeys[$1]=1 } ARGIND==2 { b[$1]=$2; allkeys[$1]=1 } ARGIND==3 { c[$1]=$2; allkeys[$1]=1 }
END{
for (k in allkeys) {
print k", "a[k]", "b[k]", "c[k]
}
}' file1 file2 file3 | sort -V -k1,1 ) | column -t -s ','
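ARGIND is a gawk extension; a rough portable sketch of the same idea keyed on FILENAME instead (this assumes the inputs really are named file1, file2 and file3):
( echo "col1, col2, col3, col4" &&
  awk '{ allkeys[$1] = 1 }
       FILENAME == "file1" { a[$1] = $2 }
       FILENAME == "file2" { b[$1] = $2 }
       FILENAME == "file3" { c[$1] = $2 }
       END { for (k in allkeys) print k ", " a[k] ", " b[k] ", " c[k] }
      ' file1 file2 file3 | sort -V -k1,1 ) | column -t -s ','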
Pure Bash version:
declare -A a
while read key value; do a[$key]="${a[$key]:-}${a[$key]:+, }$value"; done < file1
while read key value; do a[$key]="${a[$key]:-, }${a[$key]:+, }$value"; done < file2
while read key value; do a[$key]="${a[$key]:-, , }${a[$key]:+, }$value"; done < file3
(echo "col1, col2, col3, col4" &&
 for i in ${!a[@]}; do
   echo $i, ${a[$i]}
 done | sort -V -k1,1) | column -t -s ','
For an explanation of "${a[$key]:-, , }${a[$key]:+, }$value", please check Shell Parameter Expansion in the bash manual.
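A quick illustration of the two expansions in isolation (the key A3 is just an example):
declare -A a
key=A3
echo "unset: [${a[$key]:-, , }] [${a[$key]:+, }]"   # fallback ", , " is used; :+ expands to nothing
a[$key]=B15
echo "set:   [${a[$key]:-, , }] [${a[$key]:+, }]"   # stored value wins; :+ now expands to ", "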
Using GNU Awk:
gawk '{ a[$1] = substr($1, 2); b[$1, ARGIND] = $2 }
      END {
        PROCINFO["sorted_in"] = "@val_num_asc"
        for (i in a) {
          t = i
          for (j = 1; j <= ARGIND; ++j)
            t = t OFS b[i, j]
          print t
        }
      }' file{1..3} | column -t
There is a simple tool called join that allows you to perform this operation:
#!/usr/bin/env bash
cut -d ' ' -f1 file{1,2,3} | sort -k1,1 -u > ftmp
for f in file1 file2 file3; do
  mv -- ftmp file4
  join -a1 -e "---" -o auto file4 <(sort -k1,1 "$f") > ftmp
done
sort -k1,1V ftmp > file4
cat file4
This outputs
A1 B5 --- ---
A2 B10 --- ---
A3 B15 C10 D1
A4 --- C8 ---
A6 B19 C5 D10
A10 B2 C4 D5
A12 --- C14 ---
A15 B6 C18 ---
A20 --- --- D3
A21 --- --- D11
A22 --- --- D9
I used --- to indicate an empty field. If you want to pretty print this, you have to re-parse it with awk or anything else.
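For example, one possible re-parsing pass (the header names are taken from the question; the sed step blanks out the placeholders only after column has aligned everything, since --- and three spaces are the same width):
{ echo "col1 col2 col3 col4"; cat file4; } | column -t | sed 's/---/   /g'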
This might work for you (GNU sed and sort):
s=''; for f in file{1,2,3}; do s="$s\t"; sed -E "s/\s+/$s/" $f; done |
sort -V |
sed -Ee '1i\col1\tcol2\tcol3\tcol4' -e ':a;N;s/^((\S+\t).*\S).*\n\2\t+/\1\t/;ta;P;D'
Replace spaces by tabs and insert the number of tabs between the key and value depending on which file is being processed.
Sort the output by key column order.
Coalesce each line with its key and print the result.

uniq -c in one column

Imagine we have a txt file like the next one:
Input:
a1 D1
b1 D1
c1 D1
a1 D2
a1 D3
c1 D3
I want to count the number of times each element in the first column appears, but also keep (in some way) the information provided by the second column. Two possible output formats are shown below, but any coherent alternative is also accepted:
Possible output 1:
3 a1 D1,D2,D3
1 b1 D1
2 c1 D1,D3
Possible output 2:
3 a1 D1
1 b1 D1
2 c1 D1
3 a1 D2
3 a1 D3
2 c1 D3
How can I do this? I guess some combination like sort -k1 input | uniq -c (somehow keeping column 2), or perhaps awk, but I was not able to write anything that works. However, all answers are considered.
I would harness GNU AWK for this task in the following way. Let the content of file.txt be
a1 D1
b1 D1
c1 D1
a1 D2
a1 D3
c1 D3
then
awk 'FNR==NR{arr[$1]+=1;next}{print arr[$1],$0}' file.txt file.txt
gives output
3 a1 D1
1 b1 D1
2 c1 D1
3 a1 D2
3 a1 D3
2 c1 D3
Explanation: this is a 2-pass solution (observe that file.txt is given twice). The first pass counts the number of occurrences of each first-column value, storing the counts in the array arr; the second pass prints the computed count from that array, followed by the whole line.
(tested in GNU Awk 5.0.1)
Using any awk:
$ awk '
{
vals[$1] = ($1 in vals ? vals[$1] "," : "") $2
cnts[$1]++
}
END {
for (key in vals) {
print cnts[key], key, vals[key]
}
}
' file
3 a1 D1,D2,D3
1 b1 D1
2 c1 D1,D3
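For completeness, the sort | uniq -c combination guessed at in the question can also be wired up; a sketch that first counts the keys and then stitches the counts back onto the original lines (single-space-separated columns assumed, as in the sample), giving the second output format:
awk 'NR == FNR { cnt[$2] = $1; next }   # first input: count/key pairs from uniq -c
     { print cnt[$1], $0 }              # second input: prepend each original line with its count
    ' <(cut -d' ' -f1 file.txt | sort | uniq -c) file.txt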

Sort multiple tables inside Markdown file with text interspersed between them

There is a Markdown file with headings, text, and unsorted tables. I want to programmatically sort each table by ID (the 3rd column) in descending order, preferably using PowerShell or Bash. Each table should remain in its place in the file.
# Heading
Text
| Col A | Col B | ID |
|---------|---------|----|
| Item 1A | Item 1B | 8 |
| Item 2A | Item 2B | 9 |
| Item 3A | Item 3B | 6 |
# Heading
Text
| Col A | Col B | ID |
|---------|---------|----|
| Item 4A | Item 4B | 3 |
| Item 5A | Item 5B | 2 |
| Item 6A | Item 6B | 4 |
I have no control over how the Markdown file is generated. Truly.
Ideally the file would remain in Markdown after the sort for additional processing. However, I explored these options without success:
Convert to JSON and sort (the solutions I tried didn't agree with tables)
Convert to HTML and sort (only found JavaScript solutions)
This script alone, while helpful, would need to be modified to parse through the Markdown file (having trouble finding understandable guidance on how to run a script on content between two strings)
The reason for command line (and not JavaScript on the HTML, for example) is that this transformation will take place in an Azure Release Pipeline. It is possible to add an Azure Function to the pipeline, which would allow me to run JavaScript code in the cloud, and I will pursue that if all else fails. I want to exhaust command-line options first because I am not very familiar with JavaScript or how to pass content between Functions and releases.
Thank you for any ideas.
By modifying the referenced script, how about:
flush() {
  printf "%s\n" "${lines[@]:0:2}"
  printf "%s\n" "${lines[@]:2}" | sort -t \| -nr -k 4
  lines=()
}
while IFS= read -r line; do
  if [[ ${line:0:1} = "|" ]]; then
    lines+=("$line")
  else
    (( ${#lines[@]} > 0 )) && flush
    echo "$line"
  fi
done < input.md
(( ${#lines[@]} > 0 )) && flush
Output:
# Heading
Text
| Col A | Col B | ID |
|---------|---------|----|
| Item 2A | Item 2B | 9 |
| Item 1A | Item 1B | 8 |
| Item 3A | Item 3B | 6 |
# Heading
Text
| Col A | Col B | ID |
|---------|---------|----|
| Item 6A | Item 6B | 4 |
| Item 4A | Item 4B | 3 |
| Item 5A | Item 5B | 2 |
BTW, if Perl is your option, here is an alternative:
perl -ne '
  sub flush {
    print splice(@ary, 0, 2);   # print the two header lines
    # sort the table rows on the ID key via a Schwartzian transform
    print map  { $_->[0] }
          sort { $b->[1] <=> $a->[1] }
          map  { [$_, (split(/\s*\|\s*/))[3]] }
          @ary;
    @ary = ();
  }
  # main loop
  if (/^\|/) {                  # table line: buffer it
    push(@ary, $_);
  } else {                      # any other line
    if ($#ary > 0) {            # a buffered table precedes this line
      &flush;
    }
    print;
  }
  END {
    if ($#ary > 0) { &flush; }
  }
' input.md
Hope this helps.
If it is possible to identify the Markdown tables, a small awk (or bash/python/perl) script can filter the output. It assumes each table has 2 header lines.
awk -F'|' '
  function cmp_id(i1, v1, i2, v2) {
    return v1 - v2
  }
  function show() {
    asorti(k, d, "cmp_id")
    # for (i = 1; i <= n; i++) print i, k[i], d[i]
    # Print the first 2 original header rows, followed by the sorted data lines
    print s[1]; print s[2]
    for (i = 1; i <= n; i++) if (d[i] >= 3) print s[d[i]]
    n = 0
  }
  # Capture tables
  /^\|/ { s[++n] = $0; k[n] = $4; next }
  n > 0 { show() }
  { print }
  END { if (n > 0) show() }
'

How to sort columns using bash script? [duplicate]

I have a file full of data in columns
sarah mark john
10 20 5
x y z
I want to sort the data so the columns stay intact but the second row is in increasing order so it looks like this:
john sarah mark
5 10 20
z x y
I've been looking at the sort command but have only been able to find vertical sorting, not horizontal. I'm happy to use any tool, any help is appreciated.
Thank you!
Let's create a function to transpose a file (make rows become columns, and columns become rows):
transpose () {
awk '{for (i=1; i<=NF; i++) a[i,NR]=$i; max=(max<NF?NF:max)}
END {for (i=1; i<=max; i++)
{for (j=1; j<=NR; j++)
printf "%s%s", a[i,j], (j<NR?OFS:ORS)
}
}'
}
This just loads all the data into a two-dimensional array indexed by (column, line) and then prints it back with the column index in the outer loop, so that it transposes the given input. The wrapper transpose () { } is used to store it as a bash function. You just need to copy-paste it into your shell (or into ~/.bashrc if you want it to be a permanent function, available any time you open a session).
Then, using it, we can easily solve the problem with sort -n -k2 (sort numerically based on column 2) and transpose back.
$ cat a | transpose | sort -n -k2 | transpose
john sarah mark
5 10 20
z x y
In case you want to have a nice format as final output, just pipe to column like this:
$ cat a | transpose | sort -n -k2 | transpose | column -t
john sarah mark
5 10 20
z x y
Step by step:
$ cat a | transpose
sarah 10 x
mark 20 y
john 5 z
$ cat a | transpose | sort -n -k2
john 5 z
sarah 10 x
mark 20 y
$ cat a | transpose | sort -n -k2 | transpose
john sarah mark
5 10 20
z x y
Coming from a duplicate question, this would sort the columns by the first row:
#!/bin/bash
input="$1"
order=$( (for i in $(head -1 $input); do echo $i; done) | nl | sort -k2 | cut -f1)
grep ^ $input | (while read line
do
  read -a columns <<< "${line%"${line##*[![:space:]]}"}"
  orderedline=()
  for i in ${order[@]}
  do
    orderedline+=("${columns[$i - 1]}")
  done
  line=$(printf "\t%s" "${orderedline[@]}")
  echo ${line:1}
done)
To sort by second row, replace head -1 $input with head -2 $input | tail -1. If the sort should be numeric, put in sort -n -k2 instead of sort -k2.
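Concretely, for the numeric second row of the question's sample, the order line would become (a direct substitution of the two replacements just described):
order=$( (for i in $(head -2 $input | tail -1); do echo $i; done) | nl | sort -n -k2 | cut -f1)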
A Perl one-liner that sorts the fields of each line independently (lexically) gets that horizontal-sorting job done:
perl -ane '$,=" "; print sort @F; print "\n";' file
I found it here: http://www.unix.com/unix-for-advanced-and-expert-users/36039-horizontal-sorting-lines-file-sed-implementation.html
