Well, I have several files and I'm having trouble joining them with awk.
Here are three sample files:
FileA
2 2 31771 15 5327353 4.73E-04 1 1
2 2 40642 23 27428869 -8.29E-04 1 1
2 2 21517 7 89002990 -2.41E-04 1 1
2 2 33787 16 44955568 2.86E-05 1 1
FileB
2 2 31771 15 5327353 5.07E-04 1 1
2 2 40642 23 27428869 5.45E-04 1 1
2 2 21517 7 89002990 1.85E-04 1 1
2 2 33787 16 44955568 3.73E-04 1 1
FileC
2 2 31771 15 5327353 4.28E-04 1 1
2 2 40642 23 27428869 -7.55E-04 1 1
2 2 21517 7 89002990 -2.01E-04 1 1
2 2 33787 16 44955568 3.09E-05 1 1
Each file has 8 columns, but I do not need columns 1, 2, 7, and 8. Columns 3, 4, and 5 are common to all files, so they can serve as the reference columns for joining; column 6 holds the information that differs between files. My final file should look like this:
Finalfile
31771 15 5327353 4.73E-04 5.07E-04 4.28E-04
40642 23 27428869 -8.29E-04 5.45E-04 -7.55E-04
21517 7 89002990 -2.41E-04 1.85E-04 -2.01E-04
33787 16 44955568 2.86E-05 3.73E-04 3.09E-05
I tried the following commands:
awk 'NR==FNR{a[$3]=$6;next}{print $3"\t"$4"\t"$5"\t"$6"\t"a[$3]}' FileA FileB FileC > Finalfile
But unfortunately it only works for two files at a time, and I get something like this:
2 2 31771 15 5327353 4.73E-04 5.07E-04
2 2 40642 23 27428869 -8.29E-04 5.45E-04
2 2 21517 7 89002990 -2.41E-04 1.85E-04
2 2 33787 16 44955568 2.86E-05 3.73E-04
Can someone please help? Bear in mind that there are actually ten files, not just three. Thank you very much!
Give this a try:
awk '{a[$3FS$4FS$5]=a[$3FS$4FS$5]""$6FS}END{for (i in a){print i, a[i]}}' file*
A cleaner version (thanks, @james-brown):
awk '{ a[$3 OFS $4 OFS $5] = a[$3 OFS $4 OFS $5] ( a[$3 OFS $4 OFS $5] == "" ? "" : OFS) $6 }
END{ for (i in a){print i,a[i]} }' OFS="\t" file*
Output
33787 16 44955568 2.86E-05 3.73E-04 3.09E-05
21517 7 89002990 -2.41E-04 1.85E-04 -2.01E-04
40642 23 27428869 -8.29E-04 5.45E-04 -7.55E-04
31771 15 5327353 4.73E-04 5.07E-04 4.28E-04
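Note that for (i in a) visits keys in arbitrary order, which is why the rows above come out shuffled relative to the input. If the original order matters, one option (a small sketch using the same accumulation idea) is to pipe the result through sort:

```shell
# Build three small sample files matching the question's layout.
printf '2 2 31771 15 5327353 4.73E-04 1 1\n2 2 40642 23 27428869 -8.29E-04 1 1\n' > FileA
printf '2 2 31771 15 5327353 5.07E-04 1 1\n2 2 40642 23 27428869 5.45E-04 1 1\n' > FileB
printf '2 2 31771 15 5327353 4.28E-04 1 1\n2 2 40642 23 27428869 -7.55E-04 1 1\n' > FileC

# Same accumulation as above, then sort numerically on the first key column.
awk '{ key = $3 OFS $4 OFS $5
       a[key] = (a[key] == "" ? "" : a[key] OFS) $6 }
     END { for (i in a) print i, a[i] }' File[ABC] | sort -k1,1n
```

This prints the two rows in ascending order of column 3, regardless of how awk happened to traverse the array.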
A paste + awk approach (with "pretty" output):
paste FileA FileB FileC | awk '{print $3,$4,$5,$6,$14,$22}' | column -tx
The output:
31771 15 5327353 4.73E-04 5.07E-04 4.28E-04
40642 23 27428869 -8.29E-04 5.45E-04 -7.55E-04
21517 7 89002990 -2.41E-04 1.85E-04 -2.01E-04
33787 16 44955568 2.86E-05 3.73E-04 3.09E-05
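Since every file has exactly eight columns, the sixth column of the k-th pasted file lands at field 8*(k-1)+6, so the paste idea extends to any number of files without hard-coding $14 and $22. A sketch (it assumes, as paste itself does, that every file lists the rows in the same order):

```shell
# Two sample files with the question's 8-column layout.
printf '2 2 31771 15 5327353 4.73E-04 1 1\n' > FileA
printf '2 2 31771 15 5327353 5.07E-04 1 1\n' > FileB

# Print the shared key columns once, then column 6 of each 8-column block.
paste FileA FileB | awk '{
    printf "%s %s %s", $3, $4, $5      # key columns from the first file
    for (i = 6; i <= NF; i += 8)       # column 6 of every pasted file
        printf " %s", $i
    print ""
}'
```

With ten files the loop simply runs to field 78; nothing else changes.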
Related
I have 4-column data files of approximately 100 lines each. I'd like to subtract every nth line's value from the (n+3)th line's value and print the difference in a new column ($5). The fourth column does not follow a regular pattern.
My sample file:
cat input
1 2 3 20
1 2 3 10
1 2 3 5
1 2 3 20
1 2 3 30
1 2 3 40
1 2 3 .
1 2 3 .
1 2 3 . (and so on)
Output should be:
1 2 3 20 0 #(20-20)
1 2 3 10 20 #(30-10)
1 2 3 5 35 #(40-5)
1 2 3 20 ? #(. - 20)
1 2 3 30 ? #(. - 30)
1 2 3 40 ? #(. - 40)
1 2 3 .
1 2 3 .
1 2 3 . (and so on)
How can I do this in awk?
Thank you
For this, I think the easiest approach is to read the file twice. On the first pass (the NR==FNR block) we save all the fourth-column values in an array indexed by line number. The second block executes during the second pass and creates a fifth column with the desired calculation (checking first that we would not reach past the end of the file).
$ cat input
1 2 3 20
1 2 3 10
1 2 3 5
1 2 3 20
1 2 3 30
1 2 3 40
$ awk 'NR==FNR{a[NR]=$4; last=NR; next} {$5 = (FNR+3 <= last ? a[FNR+3] - $4 : "")}1' input input
1 2 3 20 0
1 2 3 10 20
1 2 3 5 35
1 2 3 20
1 2 3 30
1 2 3 40
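If reading the input twice is not an option (say, the data arrives on a pipe), a single-pass sketch that buffers the three pending lines produces the same result:

```shell
# Single pass: hold each line until the line three positions later arrives.
printf '1 2 3 20\n1 2 3 10\n1 2 3 5\n1 2 3 20\n1 2 3 30\n1 2 3 40\n' |
awk '{
    buf[NR] = $0; val[NR] = $4
    if (NR > 3) {                # line NR-3 can now be completed
        n = NR - 3
        print buf[n], $4 - val[n]
        delete buf[n]; delete val[n]
    }
}
END {                            # the last three lines have no partner
    for (i = NR - 2; i <= NR; i++) if (i in buf) print buf[i]
}'
```

Memory use stays constant at three buffered lines, which matters once the files grow past a few hundred lines.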
You can do this using tac + awk + tac (reversing the file turns "three lines ahead" into "three lines back"):
tac input |
awk '{a[NR]=$4} NR>3 { $5 = (a[NR-3] ~ /^[0-9]+$/ ? a[NR-3] - $4 : "?") } 1' |
tac | column -t
1 2 3 20 0
1 2 3 10 20
1 2 3 5 35
1 2 3 20 ?
1 2 3 30 ?
1 2 3 40 ?
1 2 3 .
1 2 3 .
1 2 3 .
I have this data file:
1 2 3
1 5 7
2 5 9
11 21 110
6 17 -2
10 2 8
6 4 3
5 1 8
6 1 5
7 3 1
I want to add the number 1 to the third column, but only on lines 1, 3, 6, 8, 9, and 10, and add 2 to the second column on lines 6 through 9.
I know how to add 2 to the entire second column and 1 to the entire third column using awk:
awk '{print $1, $2+2, $3+1}' data > data2
But how can I modify this code to touch only specific lines of the second and third columns?
Thanks
Best,
awk to the rescue! You can test NR in a condition, but for six values that gets tedious; alternatively, you can do a string match against NR anchored with delimiters.
$ awk 'BEGIN{lines=",1,3,6,8,9,10,"}
match(lines,","NR","){$3++}
NR>=6 && NR<=9{$2+=2}1' nums
1 2 4
1 5 7
2 5 10
11 21 110
6 17 -2
10 4 9
6 6 3
5 3 9
6 3 6
7 3 2
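To see why the surrounding commas matter: without them, NR=1 would also match the 1 inside 10. A quick demonstration of the anchored match (using seq just to generate line numbers):

```shell
# Each line number is wrapped in commas before matching, so ",1," matches
# only line 1, never the "1" inside ",10,".
seq 12 | awk 'BEGIN{lines=",1,3,6,8,9,10,"}
              { print $0, (match(lines, "," NR ",") ? "bump" : "-") }'
```

Only lines 1, 3, 6, 8, 9, and 10 are marked "bump"; 11 and 12 are not, even though "1" appears in both.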
$ cat tst.awk
BEGIN {
for (i=6;i<=9;i++) {
d[2,i] = 2
}
split("1 3 6 8 9 10",t);
for (i in t) {
d[3,t[i]] = 1
}
}
{ $2 += d[2,NR]; $3 += d[3,NR]; print }
$ awk -f tst.awk file
1 2 4
1 5 7
2 5 10
11 21 110
6 17 -2
10 4 9
6 6 3
5 3 9
6 3 6
7 3 2
I have a bunch of files from simulation output, all with the same number of rows and fields.
What I need to do is combine them so that I get a single file with the numbers summed up, which is essentially the addition of several matrices.
Example:
File1.txt
1 1 1
1 1 1
1 1 1
File2.txt
2 2 2
2 2 2
2 2 2
File3.txt
3 3 3
3 3 3
3 3 3
required output
6 6 6
6 6 6
6 6 6
I'm going to integrate this into a larger shell script, so I would prefer a solution in awk, though other languages are welcome as well.
awk '{for(i=1;i<=NF;i++)a[FNR,i]=$i+a[FNR,i]}
END{for(i=1;i<=FNR;i++)
for(j=1;j<=NF;j++)printf "%s%s", a[i,j],(j==NF?"\n":FS)}' f1 f2 f3
This works for more than three input files as well.
A test with your data:
kent$ head f[1-3]
==> f1 <==
1 1 1
1 1 1
1 1 1
==> f2 <==
2 2 2
2 2 2
2 2 2
==> f3 <==
3 3 3
3 3 3
3 3 3
kent$ awk '{for(i=1;i<=NF;i++)a[FNR,i]=$i+a[FNR,i]}END{for(i=1;i<=FNR;i++)for(j=1;j<=NF;j++)printf "%s%s", a[i,j],(j==NF?"\n":FS)}' f1 f2 f3
6 6 6
6 6 6
6 6 6
Quick hack:
paste f1 f2 f3 | awk '{for(i=1;i<=m;i++)printf "%d%s",$i+$(i+m)+$(i+2*m),i==m?ORS:OFS}' m=3
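The hard-coded $i+$(i+m)+$(i+2*m) works only for exactly three files; an inner loop over the m-wide blocks handles any number of pasted files. A sketch, still assuming every file has m columns:

```shell
# Sample 2x3 matrices.
printf '1 1 1\n1 1 1\n' > f1
printf '2 2 2\n2 2 2\n' > f2
printf '3 3 3\n3 3 3\n' > f3

# Sum column i of every m-wide block, however many files were pasted.
paste f1 f2 f3 | awk -v m=3 '{
    for (i = 1; i <= m; i++) {
        s = 0
        for (j = i; j <= NF; j += m) s += $j   # same column in each block
        printf "%d%s", s, (i == m ? ORS : OFS)
    }
}'
```

Adding a fourth or tenth file only lengthens the paste argument list; the awk stays the same.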
These are my two input files:
file1.txt
1 34
2 55
3 44
6 77
file2.txt
1 12
2 7
5 32
And I wish my output to be:
1 34 12
2 55 0
3 44 0
5 0 32
6 77 0
I need to do this in awk, and although I was able to merge the files, I do not know how to do it without losing information...
awk -F"\t" 'NR==FNR {h[$1] = $2; next }{print $1,$2,h[$2]}' file1.txt file2.txt > try.txt
awk '{ if ($3 !="") print $1,$2,$3; else print $1,$2,"0";}' try.txt > output.txt
And the output is:
1 34 12
2 55 7
3 44 0
6 77 0
Sorry, I know this must be very easy, but I am quite new to this! Thanks in advance!
This command gives you the desired output:
awk 'NR==FNR{a[$1]=$2;next}
{if($1 in a){print $0,a[$1];delete a[$1]}
else print $0,"0"}
END{for(x in a)print x,"0",a[x]}' file2 file1|sort -n|column -t
Note that I used sort and column to sort and format the output.
Output (note: I guess the 2 55 0 was a typo in your expected output):
1 34 12
2 55 7
3 44 0
5 0 32
6 77 0
Here is another way using join and awk:
join -a1 -a2 -o 1.1,2.1,1.2,2.2 -e0 file1 file2 | awk '{print ($1?$1:$2),$3,$4}' OFS='\t'
1 34 12
2 55 7
3 44 0
5 0 32
6 77 0
The -a switch includes unpairable lines from the given file in the output.
-o builds our output format.
-e specifies what to print for values that do not exist.
awk just completes the final formatting.
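One caveat: join requires both inputs to be sorted on the join field, which the sample files here happen to be. For unsorted data, sort each file first (the .sorted file names below are just placeholders I made up):

```shell
# Deliberately unsorted copies of the question's files.
printf '6 77\n1 34\n2 55\n3 44\n' > file1.txt
printf '5 32\n1 12\n2 7\n' > file2.txt

# join needs lexicographic order on the join field, so sort first.
sort -k1,1 file1.txt > file1.sorted
sort -k1,1 file2.txt > file2.sorted

join -a1 -a2 -o 1.1,2.1,1.2,2.2 -e0 file1.sorted file2.sorted |
awk '{print ($1 ? $1 : $2), $3, $4}' OFS='\t'
```

With unsorted input, join silently drops or misplaces lines, so this step is worth keeping even when the data looks ordered.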
Input file:
AAA 2 3 4 5
BBB 3 4 5
AAA 23 21 34
BBB 4 5 62
I want the output to be:
AAA 2 3 4 5 23 21 34
BBB 3 4 5 4 5 62
I feel that I should use awk and sed but I am not sure how to go about it. Does anyone have any good ideas? Thanks.
This might work for you:
sort -sk1,1 file | sed ':a;$!N;s/^\([^ ]* \)\(.*\)\n\1/\1\2/;ta;P;D'
AAA 2 3 4 5 23 21 34
BBB 3 4 5 4 5 62
or with GNU awk (asort is a gawk extension):
awk '{if($1 in a){line=$0;sub(/[^ ]* /,"",line);a[$1]=a[$1]line;next};a[$1]=$0}END{n=asort(a);for(i=1;i<=n;i++)print a[i]}' file
AAA 2 3 4 5 23 21 34
BBB 3 4 5 4 5 62
Here is an awk one-liner that solves the above problem:
awk '{line=$2;for(i=3; i<=NF; i++) line=line " " $i; arr[$1]=arr[$1] " " line} END{for (val in arr) print val, arr[val]}' file
Using bash version 4's associative arrays:
$ declare -A vals
$ while read key nums; do vals[$key]+="$nums "; done < filename
$ for key in "${!vals[@]}"; do printf "%s %s\n" "$key" "${vals[$key]}"; done
AAA 2 3 4 5 23 21 34
BBB 3 4 5 4 5 62