Well, I have several files and I'm having trouble joining them with awk.
Here are three sample files:
FileA
2 2 31771 15 5327353 4.73E-04 1 1
2 2 40642 23 27428869 -8.29E-04 1 1
2 2 21517 7 89002990 -2.41E-04 1 1
2 2 33787 16 44955568 2.86E-05 1 1
FileB
2 2 31771 15 5327353 5.07E-04 1 1
2 2 40642 23 27428869 5.45E-04 1 1
2 2 21517 7 89002990 1.85E-04 1 1
2 2 33787 16 44955568 3.73E-04 1 1
FileC
2 2 31771 15 5327353 4.28E-04 1 1
2 2 40642 23 27428869 -7.55E-04 1 1
2 2 21517 7 89002990 -2.01E-04 1 1
2 2 33787 16 44955568 3.09E-05 1 1
Each file has 8 columns, but I do not need columns 1, 2, 7, and 8. Columns 3, 4, and 5 are common to all files, so they can serve as the reference columns for joining; column 6 holds the information that differs between files. My final file should look like this:
Finalfile
31771 15 5327353 4.73E-04 5.07E-04 4.28E-04
40642 23 27428869 -8.29E-04 5.45E-04 -7.55E-04
21517 7 89002990 -2.41E-04 1.85E-04 -2.01E-04
33787 16 44955568 2.86E-05 3.73E-04 3.09E-05
I tried the following commands:
awk 'NR==FNR{a[$3]=$6;next}{print $3"\t"$4"\t"$5"\t"$6"\t"a[$3]}' FileA FileB FileC > Finalfile
But unfortunately it only works for two files at a time, and I get something like this:
2 2 31771 15 5327353 4.73E-04 5.07E-04
2 2 40642 23 27428869 -8.29E-04 5.45E-04
2 2 21517 7 89002990 -2.41E-04 1.85E-04
2 2 33787 16 44955568 2.86E-05 3.73E-04
Can someone please help? Bear in mind that there are actually ten files, not just three. Thank you very much!
Give this a try:
awk '{a[$3FS$4FS$5]=a[$3FS$4FS$5]""$6FS}END{for (i in a){print i, a[i]}}' file*
A cleaner version (thanks, @james-brown):
awk '{ a[$3 OFS $4 OFS $5] = a[$3 OFS $4 OFS $5] ( a[$3 OFS $4 OFS $5] == "" ? "" : OFS) $6 }
END{ for (i in a){print i,a[i]} }' OFS="\t" file*
Output
33787 16 44955568 2.86E-05 3.73E-04 3.09E-05
21517 7 89002990 -2.41E-04 1.85E-04 -2.01E-04
40642 23 27428869 -8.29E-04 5.45E-04 -7.55E-04
31771 15 5327353 4.73E-04 5.07E-04 4.28E-04
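Note that for (i in a) visits keys in arbitrary order, which is why the rows above come out shuffled relative to the input. If the original order matters, one option (a small sketch using the same accumulation idea) is to pipe the result through sort:

```shell
# Build three small sample files matching the question's layout.
printf '2 2 31771 15 5327353 4.73E-04 1 1\n2 2 40642 23 27428869 -8.29E-04 1 1\n' > FileA
printf '2 2 31771 15 5327353 5.07E-04 1 1\n2 2 40642 23 27428869 5.45E-04 1 1\n' > FileB
printf '2 2 31771 15 5327353 4.28E-04 1 1\n2 2 40642 23 27428869 -7.55E-04 1 1\n' > FileC

# Same accumulation as above, then sort numerically on the first key column.
awk '{ key = $3 OFS $4 OFS $5
       a[key] = (a[key] == "" ? "" : a[key] OFS) $6 }
     END { for (i in a) print i, a[i] }' File[ABC] | sort -k1,1n
```

This prints the two rows in ascending order of column 3, regardless of how awk happened to traverse the array.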
A paste + awk approach (with "pretty" output):
paste FileA FileB FileC | awk '{print $3,$4,$5,$6,$14,$22}' | column -tx
The output:
31771 15 5327353 4.73E-04 5.07E-04 4.28E-04
40642 23 27428869 -8.29E-04 5.45E-04 -7.55E-04
21517 7 89002990 -2.41E-04 1.85E-04 -2.01E-04
33787 16 44955568 2.86E-05 3.73E-04 3.09E-05
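Since every file has exactly eight columns, the sixth column of the k-th pasted file lands at field 8*(k-1)+6, so the paste idea extends to any number of files without hard-coding $14 and $22. A sketch (it assumes, as paste itself does, that every file lists the rows in the same order):

```shell
# Two sample files with the question's 8-column layout.
printf '2 2 31771 15 5327353 4.73E-04 1 1\n' > FileA
printf '2 2 31771 15 5327353 5.07E-04 1 1\n' > FileB

# Print the shared key columns once, then column 6 of each 8-column block.
paste FileA FileB | awk '{
    printf "%s %s %s", $3, $4, $5      # key columns from the first file
    for (i = 6; i <= NF; i += 8)       # column 6 of every pasted file
        printf " %s", $i
    print ""
}'
```

With ten files the loop simply runs to field 78; nothing else changes.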
Related
I have 4-column data files of approximately 100 lines each. I'd like to subtract every nth line's value from the (n+3)th line's value and print the difference in a new column ($5). The fourth column does not follow a regular pattern.
My sample file:
cat input
1 2 3 20
1 2 3 10
1 2 3 5
1 2 3 20
1 2 3 30
1 2 3 40
1 2 3 .
1 2 3 .
1 2 3 . (and so on)
Output should be:
1 2 3 20 0 #(20-20)
1 2 3 10 20 #(30-10)
1 2 3 5 35 #(40-5)
1 2 3 20 ? #(. - 20)
1 2 3 30 ? #(. - 30)
1 2 3 40 ? #(. - 40)
1 2 3 .
1 2 3 .
1 2 3 . (and so on)
How can I do this in awk?
Thank you
For this, I think the easiest approach is to read the file twice. On the first pass (the NR==FNR block) we save all the fourth-column values in an array indexed by line number. The second block executes during the second pass and creates a fifth column with the desired calculation (checking first that we would not reach past the end of the file).
$ cat input
1 2 3 20
1 2 3 10
1 2 3 5
1 2 3 20
1 2 3 30
1 2 3 40
$ awk 'NR==FNR{a[NR]=$4; last=NR; next} {$5 = (FNR+3 <= last ? a[FNR+3] - $4 : "")}1' input input
1 2 3 20 0
1 2 3 10 20
1 2 3 5 35
1 2 3 20
1 2 3 30
1 2 3 40
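If reading the input twice is not an option (say, the data arrives on a pipe), a single-pass sketch that buffers the three pending lines produces the same result:

```shell
# Single pass: hold each line until the line three positions later arrives.
printf '1 2 3 20\n1 2 3 10\n1 2 3 5\n1 2 3 20\n1 2 3 30\n1 2 3 40\n' |
awk '{
    buf[NR] = $0; val[NR] = $4
    if (NR > 3) {                # line NR-3 can now be completed
        n = NR - 3
        print buf[n], $4 - val[n]
        delete buf[n]; delete val[n]
    }
}
END {                            # the last three lines have no partner
    for (i = NR - 2; i <= NR; i++) if (i in buf) print buf[i]
}'
```

Memory use stays constant at three buffered lines, which matters once the files grow past a few hundred lines.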
You can do this using tac + awk + tac (reversing the file turns "three lines ahead" into "three lines back"):
tac input |
awk '{a[NR]=$4} NR>3 { $5 = (a[NR-3] ~ /^[0-9]+$/ ? a[NR-3] - $4 : "?") } 1' |
tac | column -t
1 2 3 20 0
1 2 3 10 20
1 2 3 5 35
1 2 3 20 ?
1 2 3 30 ?
1 2 3 40 ?
1 2 3 .
1 2 3 .
1 2 3 .
I have this data file:
1 2 3
1 5 7
2 5 9
11 21 110
6 17 -2
10 2 8
6 4 3
5 1 8
6 1 5
7 3 1
I want to add the number 1 to the third column, but only on lines 1, 3, 6, 8, 9, and 10, and add 2 to the second column on lines 6 through 9.
I know how to add 2 to the entire second column and 1 to the entire third column using awk:
awk '{print $1, $2+2, $3+1}' data > data2
But how can I modify this code to touch only specific lines of the second and third columns?
Thanks
Best,
awk to the rescue! You can test NR in a condition, but for six values that gets tedious; alternatively, you can do a string match against NR anchored with delimiters.
$ awk 'BEGIN{lines=",1,3,6,8,9,10,"}
match(lines,","NR","){$3++}
NR>=6 && NR<=9{$2+=2}1' nums
1 2 4
1 5 7
2 5 10
11 21 110
6 17 -2
10 4 9
6 6 3
5 3 9
6 3 6
7 3 2
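To see why the surrounding commas matter: without them, NR=1 would also match the 1 inside 10. A quick demonstration of the anchored match (using seq just to generate line numbers):

```shell
# Each line number is wrapped in commas before matching, so ",1," matches
# only line 1, never the "1" inside ",10,".
seq 12 | awk 'BEGIN{lines=",1,3,6,8,9,10,"}
              { print $0, (match(lines, "," NR ",") ? "bump" : "-") }'
```

Only lines 1, 3, 6, 8, 9, and 10 are marked "bump"; 11 and 12 are not, even though "1" appears in both.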
$ cat tst.awk
BEGIN {
for (i=6;i<=9;i++) {
d[2,i] = 2
}
split("1 3 6 8 9 10",t);
for (i in t) {
d[3,t[i]] = 1
}
}
{ $2 += d[2,NR]; $3 += d[3,NR]; print }
$ awk -f tst.awk file
1 2 4
1 5 7
2 5 10
11 21 110
6 17 -2
10 4 9
6 6 3
5 3 9
6 3 6
7 3 2
I have a bunch of files from simulation output, all with the same number of rows and fields.
What I need to do is combine them so that I get a single file with the numbers summed up, which is essentially the addition of several matrices.
Example:
File1.txt
1 1 1
1 1 1
1 1 1
File2.txt
2 2 2
2 2 2
2 2 2
File3.txt
3 3 3
3 3 3
3 3 3
required output
6 6 6
6 6 6
6 6 6
I'm going to integrate this into a larger shell script, so I would prefer a solution in awk, though other languages are welcome as well.
awk '{for(i=1;i<=NF;i++)a[FNR,i]=$i+a[FNR,i]}
END{for(i=1;i<=FNR;i++)
for(j=1;j<=NF;j++)printf "%s%s", a[i,j],(j==NF?"\n":FS)}' f1 f2 f3
This works for more than three input files as well.
A test with your data:
kent$ head f[1-3]
==> f1 <==
1 1 1
1 1 1
1 1 1
==> f2 <==
2 2 2
2 2 2
2 2 2
==> f3 <==
3 3 3
3 3 3
3 3 3
kent$ awk '{for(i=1;i<=NF;i++)a[FNR,i]=$i+a[FNR,i]}END{for(i=1;i<=FNR;i++)for(j=1;j<=NF;j++)printf "%s%s", a[i,j],(j==NF?"\n":FS)}' f1 f2 f3
6 6 6
6 6 6
6 6 6
Quick hack:
paste f1 f2 f3 | awk '{for(i=1;i<=m;i++)printf "%d%s",$i+$(i+m)+$(i+2*m),i==m?ORS:OFS}' m=3
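The hard-coded $i+$(i+m)+$(i+2*m) works only for exactly three files; an inner loop over the m-wide blocks handles any number of pasted files. A sketch, still assuming every file has m columns:

```shell
# Sample 2x3 matrices.
printf '1 1 1\n1 1 1\n' > f1
printf '2 2 2\n2 2 2\n' > f2
printf '3 3 3\n3 3 3\n' > f3

# Sum column i of every m-wide block, however many files were pasted.
paste f1 f2 f3 | awk -v m=3 '{
    for (i = 1; i <= m; i++) {
        s = 0
        for (j = i; j <= NF; j += m) s += $j   # same column in each block
        printf "%d%s", s, (i == m ? ORS : OFS)
    }
}'
```

Adding a fourth or tenth file only lengthens the paste argument list; the awk stays the same.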
These are my two input files:
file1.txt
1 34
2 55
3 44
6 77
file2.txt
1 12
2 7
5 32
And I wish my output to be:
1 34 12
2 55 0
3 44 0
5 0 32
6 77 0
I need to do this in awk, and although I was able to merge the files, I do not know how to do it without losing information...
awk -F"\t" 'NR==FNR {h[$1] = $2; next }{print $1,$2,h[$2]}' file1.txt file2.txt > try.txt
awk '{ if ($3 !="") print $1,$2,$3; else print $1,$2,"0";}' try.txt > output.txt
And the output is:
1 34 12
2 55 7
3 44 0
6 77 0
Sorry, I know this must be very easy, but I am quite new to this! Thanks in advance!
This command gives you the desired output:
awk 'NR==FNR{a[$1]=$2;next}
{if($1 in a){print $0,a[$1];delete a[$1]}
else print $0,"0"}
END{for(x in a)print x,"0",a[x]}' file2 file1|sort -n|column -t
Note that I used sort and column to sort and format the output.
Output (note: I guess the 2 55 0 was a typo in your expected output):
1 34 12
2 55 7
3 44 0
5 0 32
6 77 0
Here is another way using join and awk:
join -a1 -a2 -o 1.1,2.1,1.2,2.2 -e0 file1 file2 | awk '{print ($1?$1:$2),$3,$4}' OFS='\t'
1 34 12
2 55 7
3 44 0
5 0 32
6 77 0
The -a switch includes unpairable lines from the given file in the output.
-o builds our output format.
-e specifies what to print for values that do not exist.
awk just completes the final formatting.
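One caveat: join requires both inputs to be sorted on the join field, which the sample files here happen to be. For unsorted data, sort each file first (the .sorted file names below are just placeholders I made up):

```shell
# Deliberately unsorted copies of the question's files.
printf '6 77\n1 34\n2 55\n3 44\n' > file1.txt
printf '5 32\n1 12\n2 7\n' > file2.txt

# join needs lexicographic order on the join field, so sort first.
sort -k1,1 file1.txt > file1.sorted
sort -k1,1 file2.txt > file2.sorted

join -a1 -a2 -o 1.1,2.1,1.2,2.2 -e0 file1.sorted file2.sorted |
awk '{print ($1 ? $1 : $2), $3, $4}' OFS='\t'
```

With unsorted input, join silently drops or misplaces lines, so this step is worth keeping even when the data looks ordered.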
Input file:
AAA 2 3 4 5
BBB 3 4 5
AAA 23 21 34
BBB 4 5 62
I want the output to be:
AAA 2 3 4 5 23 21 34
BBB 3 4 5 4 5 62
I feel that I should use awk and sed but I am not sure how to go about it. Does anyone have any good ideas? Thanks.
This might work for you:
sort -sk1,1 file | sed ':a;$!N;s/^\([^ ]* \)\(.*\)\n\1/\1\2/;ta;P;D'
AAA 2 3 4 5 23 21 34
BBB 3 4 5 4 5 62
or with GNU awk (asort is a gawk extension):
awk '{if($1 in a){line=$0;sub(/[^ ]* /,"",line);a[$1]=a[$1]line;next};a[$1]=$0}END{n=asort(a);for(i=1;i<=n;i++)print a[i]}' file
AAA 2 3 4 5 23 21 34
BBB 3 4 5 4 5 62
Here is an awk one-liner that solves the above problem:
awk '{line=$2;for(i=3; i<=NF; i++) line=line " " $i; arr[$1]=arr[$1] " " line} END{for (val in arr) print val, arr[val]}' file
Using bash version 4's associative arrays:
$ declare -A vals
$ while read key nums; do vals[$key]+="$nums "; done < filename
$ for key in "${!vals[@]}"; do printf "%s %s\n" "$key" "${vals[$key]}"; done
AAA 2 3 4 5 23 21 34
BBB 3 4 5 4 5 62