awk; searching file2 by file1 - bash

I have searched high and low, but I cannot find the exact awk code for this.
I have 2 files.
File 1 (single column):
1407859648
1639172851
1427051689
1023011285
1437152683
1508869405
1790775963
1932373552
File 2 (three columns):
1790775963,1932373552,65
1639206006,1437337425,15
1265418669,1477541563,145
1053424648,1316944317,182
1184611535,1821014457,26
1003906082,1134327133,152
1376530121,1841236684,168
1316921570,1962555771,23
1396962627,1184732489,87
1194958421,1255333456,113
1538156732,1336215482,62
File 1 and 2 have an unequal number of records.
I would like to print records from File 2, when both Col1 and Col2 in File2 match Col1 from File1.
In this example output should be:
1790775963,1932373552,65
Thank you!
A

Try the following:
awk -F',' 'NR==FNR {arr[$1]++; next} (($1 in arr) && ($2 in arr)) {print $0}' file1 file2
Output:
1790775963,1932373552,65
EDIT
Or, more concisely, as suggested by sudo_O:
awk -F, 'NR==FNR{a[$0];next}($1 in a)&&($2 in a)' file1 file2
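The NR==FNR idiom that both commands rely on can be seen in isolation in the sketch below. It is only an illustration: the scratch directory and the names tmp, seen and result are assumptions, and file2 is shortened to three of the question's sample lines.

```shell
# Sketch: recreate (part of) the question's sample files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 1407859648 1639172851 1427051689 1023011285 \
              1437152683 1508869405 1790775963 1932373552 > "$tmp/file1"
printf '%s\n' 1790775963,1932373552,65 1639206006,1437337425,15 \
              1265418669,1477541563,145 > "$tmp/file2"

# NR counts records across all inputs, FNR restarts at 1 for each file,
# so NR==FNR is true only while the first file (file1) is being read.
result=$(awk -F, '
    NR == FNR { seen[$0]; next }   # first file: remember every ID
    ($1 in seen) && ($2 in seen)   # second file: print if both columns are known
' "$tmp/file1" "$tmp/file2")

echo "$result"
rm -rf "$tmp"
```

One caveat: if file1 is empty, NR==FNR is also true while file2 is read, so the lookup pattern never runs.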

Related

Vlookup using awk command

I have two files in my linux server.
File 1
9190784
9197256
9170546
9184139
9196854
File 2
S NO.,Column1,Column2,Column3
72070,9196854,TGM,AP
72071,9172071,BGM,MP
72072,9184139,AGM,KN
72073,9172073,TGM,AP
I want to write a script or a one-line bash command using awk that matches each element of File 1 against Column1 of File 2 and prints that element together with Column2 and Column3. If an entry is not found, it should print the entry from File 1 with NA in Column2 and Column3.
Output: it should be redirected to a new file, as below.
new_file
9190784,TGM,AP
9197256,NA,NA
9170546,NA,NA
9184139,AGM,KN
9196854,TGM,AP
I hope the question is clear. Can anyone please help?
This is a standard join operation with awk:
$ awk 'BEGIN {FS=OFS=","}
NR==FNR {a[$2]=$3 OFS $4; next}
{print $1, (($1 in a)?a[$1]:"NA" OFS "NA")}' file2 file1
Substring variation (not tested), matching on the first seven characters of the key:
$ awk 'BEGIN {FS=OFS=","}
NR==FNR {a[substr($2,1,7)]=$3 OFS $4; next}
{key=substr($1,1,7);
print $1, ((key in a)?a[key]:"NA" OFS "NA")}' file2 file1
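For reference, here is a runnable sketch of the exact-match variant on the question's sample data. The file names keys and table, the scratch directory and the header skip (FNR > 1) are assumptions added for illustration. With exact matching, 9190784 has no partner in File 2, so it comes out as NA rather than as shown in the question's expected output.

```shell
tmp=$(mktemp -d)
printf '%s\n' 9190784 9197256 9170546 9184139 9196854 > "$tmp/keys"
printf '%s\n' 'S NO.,Column1,Column2,Column3' 72070,9196854,TGM,AP \
    72071,9172071,BGM,MP 72072,9184139,AGM,KN 72073,9172073,TGM,AP > "$tmp/table"

result=$(awk 'BEGIN { FS = OFS = "," }
    # table (read first): skip the header, index Column2,Column3 by Column1
    NR == FNR { if (FNR > 1) a[$2] = $3 OFS $4; next }
    # keys (read second): print the stored columns, or NA,NA when the key is unknown
    { print $1, (($1 in a) ? a[$1] : "NA" OFS "NA") }
' "$tmp/table" "$tmp/keys")

echo "$result" > "$tmp/new_file"   # redirect to a new file, as the question asks
cat "$tmp/new_file"
rm -rf "$tmp"
```

The output keeps the order of the key file, because that file is the one being streamed while the table is held in memory.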
Does it have to be awk? This can be done with join.
Given two files (note that the names are swapped relative to the question):
echo '9190784
9197256
9170546
9184139
9196854' >file2
echo 'S NO.,Column1,Column2,Column3
72070,9196854,TGM,AP
72071,9172071,BGM,MP
72072,9184139,AGM,KN
72073,9172073,TGM,AP' > file1
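join uses , as the separator (-t,) and matches the second field of the first input (-12, i.e. -1 2) against the first field of the second input (-21, i.e. -2 1). The first input is file1 with its header line removed (tail -n +2) and sorted on the second field (sort -t, -k2); the second input is file2, sorted (sort). Keys without a match are dropped rather than printed with NA.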
join -t, -12 -21 -o1.2,1.3,1.4 <(tail -n +2 file1 | sort -t, -k2) <(sort file2)
will output:
9184139,AGM,KN
9196854,TGM,AP
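If the unmatched keys should appear with NA, join's -a, -e and -o options can cover that. The following is a sketch under assumptions, not part of the answer above: the file names keys and table are hypothetical, temporary files replace the process substitution so that it also runs in a plain POSIX shell, and LC_ALL=C is set so that sort and join agree on ordering.

```shell
tmp=$(mktemp -d)
export LC_ALL=C   # assumption: force byte-wise ordering for both sort and join
printf '%s\n' 9190784 9197256 9170546 9184139 9196854 > "$tmp/keys"
printf '%s\n' 'S NO.,Column1,Column2,Column3' 72070,9196854,TGM,AP \
    72071,9172071,BGM,MP 72072,9184139,AGM,KN 72073,9172073,TGM,AP > "$tmp/table"

sort "$tmp/keys" > "$tmp/keys.sorted"
tail -n +2 "$tmp/table" | sort -t, -k2,2 > "$tmp/table.sorted"

# -a 1  also print lines of the first input that have no partner
# -e NA fill fields that are missing for such lines with NA
# -o    output the join field (0), then fields 3 and 4 of the second input
result=$(join -t, -1 1 -2 2 -a 1 -e NA -o 0,2.3,2.4 \
    "$tmp/keys.sorted" "$tmp/table.sorted")

echo "$result"
rm -rf "$tmp"
```

Unlike the awk version, the result comes out in sorted key order, not in the original order of the key file.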

Compare two files based on fields

I have two UNIX files with the data below. I need to compare fields 1, 2 and 3 of file 1 with file 2. When they match, I need to check whether field 5 of file 1 matches field 5 of file 2: if it does not, the line from file 1 should be printed; otherwise it should be ignored.
file 1
A|B|C|1|D|
A|B|D|1|D|
A|B|E|1|D|
A|B|F|1|D|
file 2
A|B|Z|1|D|
A|B|C|1|x|
A|B|D|1|y|
A|B|E|1|D|
So the result should be
A|B|C|1|D|
A|B|D|1|D|
awk to the rescue!
This matches on fields 1, 2, 3 and 5:
$ awk -F'|' '{k=$1 FS $2 FS $3 FS $5} NR==FNR{a[k];next} k in a' file2 file1
A|B|E|1|D|
Your question asks for something different, though: print the file 1 records whose first three fields match a file 2 record but whose field 5 differs.
$ awk -F'|' '{k=$1 FS $2 FS $3}
NR==FNR {a[k]=$5; next}
k in a && a[k]!=$5' file2 file1
A|B|C|1|D|
A|B|D|1|D|
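One thing to be aware of: a[k]=$5 keeps only the last field 5 seen for each key, so the result changes if file 2 repeats a key. The sketch below shows one possible way to handle that, under the assumption that a file 1 line should be suppressed when any file 2 record with the same key has the same field 5. The extra A|B|E|1|q| line and the array names keys and pair are hypothetical.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'A|B|C|1|D|' 'A|B|D|1|D|' 'A|B|E|1|D|' 'A|B|F|1|D|' > "$tmp/file1"
# The last line is a hypothetical duplicate key, added to show the difference.
printf '%s\n' 'A|B|Z|1|D|' 'A|B|C|1|x|' 'A|B|D|1|y|' 'A|B|E|1|D|' 'A|B|E|1|q|' > "$tmp/file2"

result=$(awk -F'|' '
    { k = $1 FS $2 FS $3 }                        # composite key from fields 1-3
    NR == FNR { keys[k]; pair[k FS $5]; next }    # file2: record key and key+field5
    (k in keys) && !((k FS $5) in pair)           # file1: key known, field 5 never seen with it
' "$tmp/file2" "$tmp/file1")

echo "$result"
rm -rf "$tmp"
```

With the a[k]=$5 version, the duplicate would overwrite D with q and A|B|E|1|D| would be printed as well.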

awk: two files are queried

I have two files
file1:
>string1<TAB>Name1
>string2<TAB>Name2
>string3<TAB>Name3
file2:
>string1<TAB>sequence1
>string2<TAB>sequence2
I want to use awk to compare column 1 of respective files. If both files share a column 1 value I want to print column 2 of file1 followed by column 2 of file2. For example, for the above files my expected output is:
Name1<TAB>sequence1
Name2<TAB>sequence2
this is my code:
awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1] = $1; next } $1 in a { print a[$2], $2 }' file1 file2 >out
But all I get is an empty first column followed by the sequence.
Where is the error here?
Your assignment is not right: store $2 as the value (a[$1]=$2) and look it up with $1 (a[$1]), not with $2.
$ awk 'BEGIN {FS=OFS="\t"}
NR==FNR {a[$1]=$2; next}
$1 in a {print a[$1],$2}' file1 file2
Name1 sequence1
Name2 sequence2

Compare two csv files, use the first three columns as identifier, then print common lines

I have two csv files. File 1 has 9861 rows and 4 columns, while File 2 has 6037 rows and 5 columns.
The first three columns are years, months, days respectively.
I want to get the lines in File 2 with the same identifier in File 1 and print this to File 3.
I found this command in some posts here, but it only works with one column as the identifier:
awk -F, 'NR==FNR {a[$1]=$0;next}; $1 in a {print a[$1]; print}' file1 file2
Is there a way to do this using awk or any simpler commands where I can use the first three columns as identifier?
I'll appreciate any help.
Just use more columns to make the uniqueness you need:
$ awk -F, 'NR==FNR {a[$1, $2, $3] = $0; next}
$1 SUBSEP $2 SUBSEP $3 in a' file1 file2
SUBSEP is the subscript separator. It has the default value of "\034", and is used to separate the parts of the indices of a multi-dimensional array. Thus, the expression foo["A", "B"] really accesses foo["A\034B"].
awk -F, '{k=$1 FS $2 FS $3} NR==FNR{a[k];next} k in a' file1 file2
Untested of course since you didn't provide any sample input/output.
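Since no sample data was posted, here is a small sketch with made-up rows (year, month, day, then arbitrary values) to show the three-column key in action. All data values and the scratch directory are hypothetical. It uses the ($1, $2, $3) in a form, which is equivalent to joining the fields with SUBSEP by hand.

```shell
tmp=$(mktemp -d)
# Hypothetical File 1: year,month,day,value
printf '%s\n' 2001,1,1,10.5 2001,1,2,11.0 2001,1,3,9.8 > "$tmp/file1"
# Hypothetical File 2: year,month,day,value,value
printf '%s\n' 2001,1,2,0.3,7 2001,1,4,0.1,5 2001,1,1,0.0,6 > "$tmp/file2"

result=$(awk -F, '
    NR == FNR { a[$1, $2, $3]; next }   # file1: remember each year/month/day triple
    ($1, $2, $3) in a                   # file2: print lines whose triple was seen
' "$tmp/file1" "$tmp/file2")

echo "$result" > "$tmp/file3"   # File 3 holds the common lines from File 2
cat "$tmp/file3"
rm -rf "$tmp"
```

The matching lines are printed in File 2 order, because File 2 is the file being streamed.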

Values missing in awk

My Input files :
file1
231|35000
234|15000
242|60000
254|12313
345|50000
435|24300
file2
1|madhan|retl|231|tcs
2|vaisakh|retl|234|tcs
4|sam|ins|242|infy
5|tina|bfs|254|tcs
3|ram|bfs|345|infy
6|subbu|bfs|435|infy
Output:
I am trying to get col1 and col2 of file1 and col2 of file2, joined on the common column (col1 of file1 and col4 of file2).
My code:
awk 'BEGIN { FS="|";} NR==FNR{a[$1] = $2;next} ($4 in a) {print $2 "|" $4 "|" a[$1]} ' file_1 file_2
Output I got:
madhan|231|
vaisakh|234|
sam|242|
tina|254|
ram|345|
subbu|435|
Can you help me understand why the last column comes out blank?
Try something like:
join -t '|' -1 1 -2 4 file1 file2 | awk -F'|' '{print $1 "|" $2 "|" $4}'
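This joins on field 1 of file1 and field 4 of file2, then uses awk to extract the fields you need. Note that join expects both inputs to be sorted on the join field, which is the case for the sample data.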
This should do:
awk -F\| 'FNR==NR {a[$1]=$0;next} {for (i in a) if (i==$4) print a[i]"|"$2}' file1 file2
231|35000|madhan
234|15000|vaisakh
242|60000|sam
254|12313|tina
345|50000|ram
435|24300|subbu
It stores file1 in array a, using the first field as the index.
Then it tests each index from file1 against the fourth field in file2.
If they are equal, it prints the data from file1 and the second field from file2.
It is coming up blank because the key you look up does not exist in the array. You store the first column of file1 as the key, and that value is the 4th column of file2, so the lookup must be a[$4], not a[$1].
$ awk '
BEGIN { FS=OFS="|" }
NR==FNR { a[$1]=$2; next }
($4 in a) { print $2, $4, a[$4] }
' file1 file2
madhan|231|35000
vaisakh|234|15000
sam|242|60000
tina|254|12313
ram|345|50000
subbu|435|24300
If you need the column order you asked for, then:
$ awk 'BEGIN {FS=OFS="|"}NR==FNR{a[$4]=$2;next} ($1 in a) {print $0, a[$1]}' file2 file1
231|35000|madhan
234|15000|vaisakh
242|60000|sam
254|12313|tina
345|50000|ram
435|24300|subbu
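The blank last column in the question follows from how awk treats a missing key: referencing an element that was never stored yields an empty string and, as a side effect, creates that element. A minimal sketch of this behavior (the variable names before, value and after are made up for the demonstration):

```shell
result=$(awk 'BEGIN {
    a["231"] = 35000
    before = ("1" in a)   # 0: key "1" was never stored
    value  = a["1"]       # referencing it yields an empty string ...
    after  = ("1" in a)   # ... and creates the element as a side effect
    print before "|" "[" value "]" "|" after
}')
echo "$result"
```

This is why a guard such as ($4 in a) before using a[$4] is the safer pattern: the in test checks for the key without creating it.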

Resources