Bash: Output columns from array consisting of two columns

Problem
I am writing a bash script and I have an array, where each value consists of two columns. It looks like this:
for i in "${res[@]}"; do
echo "$i"
done
#Stream1
0 a1
1 b1
2 c1
4 d1
6 e1
#Stream2
0 a2
1 b2
3 c2
4 d2
9 f2
...
I would like to combine the output from this array into a larger table, multiplexing the indices. Furthermore, I would like to format the top row by inserting the comment #Sec.
I would like the result to be something like this:
#Sec Stream1 Stream2
0 a1 a2
1 b1 b2
2 c1
3 c2
4 d1 d2
6 e1
9 f2
The insertion of #Sec and the removal of the # in front of the Stream keyword are not necessary, but desired if not too difficult.
Tried Solutions
I have tried piping to column and awk, but have not been able to produce the desired results.
EDIT
res is an array in a bash script. It is quite large, so I will only provide a short selection. Running echo "$(typeset -p res)" produces the following output:
declare -a res='([1]="#Stream1
0 3072
1 6144
2 5120
3 1024
5 6144
..." [2]="#Stream2
0 3072
1 5120
2 4096
3 3072
53 3072
55 1024
57 2048")'
As for the 'result', my initial intention was to assign the resulting table to a variable and use it in another awk script to calculate moving averages for specified indices, and plot the results. This will be done for ~20 different files. However, I am open to other solutions.
The number of streams may vary from 10 to 50. Each stream having from 100 to 300 rows.

You may use this awk solution:
cat tabulate.awk
NF == 1 {                    # header line such as "#Stream1"
    h = h OFS substr($1, 2)  # append its name, minus the leading "#"
    ++numSec                 # one more stream (section)
    next
}
{
    keys[$1]                 # remember every index ever seen
    map[$1,numSec] = $2      # store the value for this (index, stream) pair
}
END {
    print h
    for (k in keys) {        # one output row per index
        printf "%s", k
        for (i=1; i<=numSec; ++i)
            printf "\t%s", map[k,i]  # empty cell if a stream lacks this index
        print ""
    }
}
Then use it as:
awk -v OFS='\t' -v h='#Sec' -f tabulate.awk file
#Sec Stream1 Stream2
0 a1 a2
1 b1 b2
2 c1
3 c2
4 d1 d2
6 e1
9 f2
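Since res is a bash array rather than a file, you can feed it to the script with process substitution and capture the table in a variable, as the question intends. A minimal sketch (the variable name table is just an example):
table=$(awk -v OFS='\t' -v h='#Sec' -f tabulate.awk <(printf '%s\n' "${res[@]}"))
Note that for (k in keys) visits the indices in an unspecified order, so the rows may come out unsorted. With GNU awk you can add PROCINFO["sorted_in"] = "@ind_num_asc" at the top of the END block to force ascending numeric order; with any awk, piping the data rows through sort -n also works.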

Related

uniq -c in one column

Imagine we have a txt file like the following:
Input:
a1 D1
b1 D1
c1 D1
a1 D2
a1 D3
c1 D3
I want to count the number of times each element in the first column appears, but also keep (in some way) the information provided by the second column. Two possible output formats are shown, but any coherent alternative is also accepted:
Possible output 1:
3 a1 D1,D2,D3
1 b1 D1
2 c1 D1,D3
Possible output 2:
3 a1 D1
1 b1 D1
2 c1 D1
3 a1 D2
3 a1 D3
2 c1 D3
How can I do this? I guess some combination like sort -k 1 input | uniq -c <keep col2>, or perhaps awk, could do it, but I was not able to write anything that works. However, all answers are considered.
I would harness GNU AWK for this task in the following way. Let file.txt content be
a1 D1
b1 D1
c1 D1
a1 D2
a1 D3
c1 D3
then
awk 'FNR==NR{arr[$1]+=1;next}{print arr[$1],$0}' file.txt file.txt
gives output
3 a1 D1
1 b1 D1
2 c1 D1
3 a1 D2
3 a1 D3
2 c1 D3
Explanation: this is a 2-pass solution (observe that file.txt is given twice). The first pass counts the number of occurrences of each first-column value, storing the counts in array arr; the second pass prints the computed count from the array, followed by the whole line. Note that because the file is read twice, the input must be a named file; this will not work on a pipe.
(tested in GNU Awk 5.0.1)
Using any awk:
$ awk '
{
    vals[$1] = ($1 in vals ? vals[$1] "," : "") $2  # build a CSV list of $2s per key
    cnts[$1]++                                      # count occurrences of the key
}
END {
    for (key in vals) {
        print cnts[key], key, vals[key]
    }
}
' file
3 a1 D1,D2,D3
1 b1 D1
2 c1 D1,D3
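As with any for (key in ...) loop, the output order above is unspecified. If you need the keys in their first-seen input order, a minimal sketch of the same idea:
awk '
!($1 in cnts) { order[++n] = $1 }  # remember first-seen order of keys
{
    cnts[$1]++
    vals[$1] = ($1 in vals ? vals[$1] "," : "") $2
}
END {
    for (i=1; i<=n; i++)
        print cnts[order[i]], order[i], vals[order[i]]
}
' file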

BASH - Summarising information from several fields in unique field using Loop and If statements

I have the following tab-separated file:
A1 A1 0 0 2 1 1 1 1 1 1 1 2 1 1 1
A2 A2 0 0 2 1 1 1 1 1 1 1 1 1 1 1
A3 A3 0 0 2 2 1 1 2 2 1 1 1 1 1 1
A5 A5 0 0 2 2 1 1 1 1 1 1 1 2 1 1
The idea is to summarise the information from column 7 (inclusive) to the end of each row in a new column that is added at the end of the file.
To do so, these are the rules:
If the total number of “2”s in the row (between column 7 and the end) is 0: add “1 1” to the new last column
If the total number of “2”s in the row (between column 7 and the end) is 1: add “1 2” to the new last column
If the total number of “2”s in the row (between column 7 and the end) is 2 or more: add “2 2” to the new last column
I started to extract the columns I want to work on using the command:
awk '{for (i = 7; i <= NF; i++) printf $i " "; print ""}' myfile.ped > tmp_myfile.txt
Then I count the number of occurrences in each row using:
sed 's/[^2]//g' tmp_myfile.txt | awk '{print NR, length}' > tmp_occurences.txt
Which outputs:
1 1
2 0
3 2
4 1
Then my idea was to write a for loop that loops through the lines to add the new summary column.
I was thinking in this kind of structure, based on what I found here: http://www.thegeekstuff.com/2010/06/bash-if-statement-examples:
while read line ;
do
set $line
If ["$2"==0]
then
$3=="1 1"
elif ["$2"==1 ]
then
$3=="1 2”
elif ["$2">=2 ]
then
$3=="2 2"
else
print ["error"]
fi
done < tmp_occurences.txt
But I am stuck here. Do I have to create the new column before starting the loop? Am I going in the right direction?
Ideally, the final output (after merging the first 6 columns from the initial file and the summary column) would be:
A1 A1 0 0 2 1 1 2
A2 A2 0 0 2 1 1 1
A3 A3 0 0 2 2 2 2
A5 A5 0 0 2 2 1 2
Thank you for your help!
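For reference, the while loop sketched in the question can be written in valid bash as follows (a minimal sketch, assuming tmp_occurences.txt holds the line-number/count pairs shown above), though the awk answers below avoid the temporary files entirely:
while read -r lineno count; do
    if   (( count == 0 )); then summary="1 1"
    elif (( count == 1 )); then summary="1 2"
    else                        summary="2 2"
    fi
    echo "$lineno $summary"
done < tmp_occurences.txt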
Using gnu-awk you can do:
awk -v OFS='\t' '{
    c=0
    for (i=7; i<=NF; i++)  # count the 2s from field 7 onward
        if ($i==2)
            c++
    if (c==0)
        s="1 1"
    else if (c==1)
        s="1 2"
    else
        s="2 2"
    NF=6                   # truncate the record to its first 6 fields (gnu-awk)
    print $0, s
}' file
A1 A1 0 0 2 1 1 2
A2 A2 0 0 2 1 1 1
A3 A3 0 0 2 2 2 2
A5 A5 0 0 2 2 1 2
PS: If not using gnu-awk you can use:
awk -v OFS='\t' '{c=0; for (i=7; i<=NF; i++) {if ($i==2) c++; $i=""} if (c==0) s="1 1"; else if (c==1) s="1 2"; else s="2 2"; NF=6; print $0, s}' file
With GNU awk for the 3rd arg to match():
$ awk '{match($0,/((\S+\s+){6})(.*)/,a); c=gsub(2,2,a[3]); print a[1] (c>1?2:1), (c>0?2:1)}' file
A1 A1 0 0 2 1 1 2
A2 A2 0 0 2 1 1 1
A3 A3 0 0 2 2 2 2
A5 A5 0 0 2 2 1 2
With other awks you'd replace \S/\s with [^[:space:]]/[[:space:]] and use substr() instead of a[].
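An untested sketch of that portable variant, using RLENGTH and substr() in place of the array argument:
awk '{
    match($0, /^([^[:space:]]+[[:space:]]+){6}/)  # span of the first 6 fields
    head = substr($0, 1, RLENGTH)                 # fields 1-6 plus separators
    tail = substr($0, RLENGTH+1)                  # fields 7 through NF
    c = gsub(2, 2, tail)                          # count the 2s in the tail
    print head (c>1 ? 2 : 1), (c>0 ? 2 : 1)
}' file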
We can keep the format by using gensub() and capturing groups: we capture the first 6 fields and replace the line with them plus the calculated values:
awk '{
    for (i=7; i<=NF; i++) {
        if ($i==2)
            twos+=1        # count number of 2s from the 7th field to the last
    }
    f7=1; f8=1             # default values for the two new fields
    if (twos)
        f8=2               # set 8th = 2 if the count is > 0
    if (twos>1)
        f7=2               # set 7th = 2 if the count is > 1
    $0=gensub(/^((\S+\s*){6}).*/,"\\1 " f7 FS f8, 1)  # perform the replacement
    twos=0                 # reset the counter for the next record
}1' file
As a one-liner:
$ awk '{for (i=7; i<=NF; i++) {if ($i==2) twos+=1} f7=1; f8=1; if (twos) f8=2; if (twos>1) f7=2; $0=gensub(/^((\S+\s*){6}).*/,"\\1 " f7 FS f8,1); twos=0}1' file
A1 A1 0 0 2 1 1 2
A2 A2 0 0 2 1 1 1
A3 A3 0 0 2 2 2 2
A5 A5 0 0 2 2 1 2
$ cat > test.awk
{
for(i=1;i<=NF;i++) { # for every field
if(i<7)
printf "%s%s", $i,OFS # only output the first 6
else a[$i]++ # count the values of the remaining fields
}
print (a[2]>1?"2 2":(a[2]==1?"1 2":"1 1")) # output logic
delete a # reset a for next record
}
$ awk -f test.awk test
A1 A1 0 0 2 1 1 2
A2 A2 0 0 2 1 1 1
A3 A3 0 0 2 2 2 2
A5 A5 0 0 2 2 1 2
Borrowing some ideas from @anubhava's solution above:
$ cat > another.awk
{
for(i=7;i<=NF;i++)
a[$i]++ # count 2s
NF=6 # truncate $0
print $0 OFS (a[2]<2?"1 "(a[2]?"2":"1"):"2 2") # append "2 2", "1 2" or "1 1" depending on the count of 2s
delete a # reset a for next record
}

Awk - Control when my $# variables are expanded to merge two files with variable number of columns

My bash script is calling an awk script that nicely merges two files:
mapfieldfile1=1
mapfieldfile2=2
awk -v FS="\t" 'BEGIN {OFS="\t"}
FNR==NR{hash1['"\$${mapfieldfile2}"']=$1 FS $3 FS $4 FS $5 FS $6;next}
('"\$${mapfieldfile1}"' in hash1){ print $0, hash1['"\$${mapfieldfile1}"']}' file2 file1
However, I want a more general version, where I don't have to hardcode the columns that I want to print; I simply want to print everything but my id column. Replacing $1 FS $3 FS $4 FS $5 FS $6 with $0 "almost" does the work, except that it repeats the id column. I have been trying to dynamically build a string similar to $1 FS $3 FS $4 FS $5 FS $6, but I am getting the literal strings $1 $3 $4 $5 $6 in the merged file, as opposed to their expanded values. There are also smaller side effects: I am adding a tab in the middle and losing some headers. Below are the code and example files.
I would like to find the solution to my merge and also understand what I am doing wrong and why my variables are not expanding.
I appreciate any help!
mapfieldfile1=1
mapfieldfile2=2
awk -v FS="\t" 'BEGIN {OFS="\t";strfields=""}
FNR==NR{for(i=1;i<=NF;i++) if(i!='"${mapfieldfile2}"') {strfields=strfields" "FS" $"i};
hash1['"\$${mapfieldfile2}"']=strfields;strfields="";next}
('"\$${mapfieldfile1}"' in hash1){print $0, hash1['"\$${mapfieldfile1}"']}' file2 file1
$ cat file1
sampleid s1 s2 s3 s4
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
$ cat file2
a0 sampleid a1 a2 a3 a4
a0 1 a a a a4
a0 2 b b b a4
a0 3 c c c a4
a0 5 e e e a4
$ cat first_code_result.txt (good one!)
sampleid s1 s2 s3 s4 a0 a1 a2 a3 a4
1 1 1 1 1 a0 a a a a4
2 2 2 2 2 a0 b b b a4
3 3 3 3 3 a0 c c c a4
$ cat second_code_result.txt
sampleid s1 s2 s3 s4 $1 $3 $4 $5 $6
1 1 1 1 1 $1 $3 $4 $5 $6
2 2 2 2 2 $1 $3 $4 $5 $6
3 3 3 3 3 $1 $3 $4 $5 $6
Try this (untested):
awk -v mf1="$mapfieldfile1" -v mf2="$mapfieldfile2" '
BEGIN {FS=OFS="\t"}
FNR==NR{sub(/\t[^\t]+/,""); hash1[$mf2]=$0; next}
($mf1 in hash1){ print $0, hash1[$mf1]}
' file2 file1
Don't let shell variables expand within awk scripts; use a regexp to remove fields from the record. The reason your second script prints literal $3, etc. is that you are building them into a string.
Check where mf1 vs mf2 should appear, I got confused reading your scripts.
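The cause is worth spelling out: inside an awk string, "$" is just a character, so the concatenation " $" i produces the two-character text "$1" rather than the value of field 1; the field's value is $i, with no quotes around the $. A quick demonstration:
$ echo 'hello world' | awk '{ i = 1; print "$" i, $i }'
$1 hello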
EDIT - I had to tweak it, as above I was deleting $2 before using it:
$ awk -v mf1="1" -v mf2="2" '
BEGIN {FS=OFS="\t"}
FNR==NR{key=$mf2; sub(/\t[^\t]+/,""); hash1[key]=$0; next}
($mf1 in hash1){ print $0, hash1[$mf1]}
' file2 file1
sampleid s1 s2 s3 s4 a0 a1 a2 a3 a4
1 1 1 1 1 a0 a a a a4
2 2 2 2 2 a0 b b b a4
3 3 3 3 3 a0 c c c a4
Note that the sub() above relies on the key field being $2 and FS being a tab. If you need a more general solution let us know.
Here's a version that'll do what you want for any key field values and will work in any awk, it just requires the FS to be a tab or some other fixed string (i.e. not a regexp):
$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR==FNR {
    key = $mf2
    val = ""
    nf = 0
    for (i=1; i<=NF; i++) {
        if (i != mf2) {                    # skip the key field itself
            val = (nf++ ? val FS : "") $i  # append every other field
        }
    }
    hash1[key] = val
    next
}
$mf1 in hash1 { print $0, hash1[$mf1] }
$ awk -v mf1="1" -v mf2="2" -f tst.awk file2 file1
sampleid s1 s2 s3 s4 a0 a1 a2 a3 a4
1 1 1 1 1 a0 a a a a4
2 2 2 2 2 a0 b b b a4
3 3 3 3 3 a0 c c c a4
If your files are already sorted, the default output of join is what you want:
$ join -t$'\t' -11 -22 file1 file2
sampleid s1 s2 s3 s4 a0 a1 a2 a3 a4
1 1 1 1 1 a0 a a a a4
2 2 2 2 2 a0 b b b a4
3 3 3 3 3 a0 c c c a4
or, after prettifying with column:
$ join -t$'\t' -11 -22 file1 file2 | column -t
sampleid s1 s2 s3 s4 a0 a1 a2 a3 a4
1 1 1 1 1 a0 a a a a4
2 2 2 2 2 a0 b b b a4
3 3 3 3 3 a0 c c c a4

Only display the largest number in head

I use sort -r | head and get output like this:
8 a1
8 a2
5 a3
5 a4
4 a5
4 a6
4 a7
4 a8
4 a9
4 a0
What can I do to make the output like this:
8 a1
8 a2
so that only the lines with the largest k1 (first-column) value show up?
There are several ways to do it, but here is one using awk. Since the input is already sorted, you just need to print the lines whose first column matches the first line's value, by piping the sorted output into something like
awk 'NR==1 {maxval=$1}; ($1==maxval) {print $0}'
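For example, as a complete pipeline over a file of count/name pairs like the one in the question (a sketch, assuming the first column is numeric, hence sort -rn):
$ sort -rn file | awk 'NR==1 {maxval=$1}; ($1==maxval) {print $0}'
8 a1
8 a2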

Compare two file columns (unsorted files)

Input File 1
A1 123 AA
B1 123 BB
C2 44 CC1
D1 12 DD1
E1 11 EE1
Input File 2
A sad21 1
DD1 124f2 2
CC 123tges 3
BB 124sdf 4
AA 1asrf 5
Output File
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
Making of Output file
We check 3rd column of Input File 1 and 1st Col of Input File 2.
If they match , we print it in Output file.
Note :
The files are not sorted
I tried :
join -t, A B | awk -F "\t" 'BEGIN{OFS="\t"} {if ($3==$4) print $1,$2,$3,$4,$6}'
But this does not work, as the files are unsorted, so the condition ($3==$4) won't hold all the time. Please help.
nawk 'FNR==NR{a[$3]=$0;next}{if($1 in a){p=$1;$1="";print a[p],$0}}' file1 file2
tested below:
> cat file1
A1 123 AA
B1 123 BB
C2 44 CC1
D1 12 DD1
E1 11 EE1
> cat file2
A sad21 1
DD1 124f2 2
CC 123tges 3
BB 124sdf 4
AA 1asrf 5
> awk 'FNR==NR{a[$3]=$0;next}{if($1 in a){p=$1;$1="";print a[p],$0}}' file1 file2
D1 12 DD1 124f2 2
B1 123 BB 124sdf 4
A1 123 AA 1asrf 5
>
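The same logic, expanded with comments (a more readable sketch of the one-liner above):
awk '
FNR==NR { a[$3]=$0; next }  # 1st file: index each whole line by its 3rd column
$1 in a {                   # 2nd file: does column 1 match a stored key?
    p = $1                  # remember the key,
    $1 = ""                 # blank it out of the current line,
    print a[p], $0          # then print the stored line followed by the rest
}
' file1 file2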
You can use join, but you need to sort on the key field first and tell join that the key in the first file is column 3 (-1 3):
join -1 3 <(sort -k 3,3 file1) <(sort file2)
This will get you the correct fields; output (with column -t for formatting):
AA A1 123 1asrf 5
BB B1 123 124sdf 4
DD1 D1 12 124f2 2
To get the same column ordering listed in the question, you need to specify the output format:
join -1 3 -o 1.1,1.2,1.3,2.2,2.3 <(sort -k 3,3 file1) <(sort file2)
i.e. file 1 fields 1 through 3 then file 2 fields 2 and 3. Output (again with column -t):
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
perl -F'/\t/' -anle 'BEGIN{$f=1}if($f==1){$H{$F[2]}=$_;$f++ if eof}else{$l=$H{$F[0]};print join("\t",$l,@F[1..$#F]) if defined$l}' f1.txt f2.txt
or shorter
perl -F'/\t/' -anle'$f?($l=$H{$F[0]})&&print(join"\t",$l,@F[1..$#F]):($H{$F[2]}=$_);eof&&$f++' f1.txt f2.txt
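Both perl one-liners follow the same pattern as the awk answers: while reading f1.txt they hash each line by its 3rd field ($F[2]), flipping the flag $f at eof; while reading f2.txt they look up the 1st field ($F[0]) in the hash and, on a hit, print the stored line joined with the remaining fields @F[1..$#F].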
One way using awk:
awk 'BEGIN { FS=OFS="\t" } FNR==NR { array[$1]=$2 OFS $3; next } { if ($3 in array) print $0, array[$3] }' file2.txt file1.txt
Results:
A1 123 AA 1asrf 5
B1 123 BB 124sdf 4
D1 12 DD1 124f2 2
This might work for you (GNU sed):
sed 's|\(\S*\)\(.*\)|/\\s\1$/s/$/\2/p|' file2 | sed -nf - file1
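This uses file2 to generate a sed script: each line of file2 becomes a command of the form /\sKEY$/s/$/ REST/p (KEY being its first field), and the second sed invocation applies that generated script to file1, appending the remaining fields to, and printing, every file1 line whose last field is KEY.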
