Compare each row of one file with each row of another file - bash

I have two files:
file1
cat file1
A,1
B,2
C,2
D,3
file2
cat file2
A
A
A
B
B
C
C
D
Desired output
cat output
A,A,1
A,A,1
A,A,1
B,B,2
B,B,2
C,C,2
C,C,2
D,D,3
As you can see, each line of file1 should be matched with each line of file2 and if they match, the line from file1 should be added to the matched line in file2. I have tried join but it doesn't work. I guess the search needs to be recursive but I am not sure how to do it when two files are involved.
Any help would be greatly appreciated.
Thanks

Using awk:
awk -F, -v OFS=, 'FNR==NR {a[$1]=$0;next} $1 in a{print $1, a[$1]}' file1 file2
A,A,1
A,A,1
A,A,1
B,B,2
B,B,2
C,C,2
C,C,2
D,D,3

join -t',' -o'1.1,2.1,2.2' file2 file1
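This line does it. Note that join requires both inputs to be sorted on the join field, which the sample files already are.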
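To check that the awk and join answers agree, here is a small self-contained shell sketch. It recreates the question's sample files in a scratch directory and sorts explicitly before calling join, because join only works on sorted input. The use of `mktemp -d` and the `.sorted` file names are assumptions for illustration, not part of the answers.

```shell
#!/bin/sh
# Sketch: rebuild the question's sample files in a scratch directory
# (mktemp -d is assumed to be available) and compare the awk and join answers.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' A,1 B,2 C,2 D,3 > file1
printf '%s\n' A A A B B C C D > file2

# awk: store each file1 row under its key ($1), then look up every file2 row.
awk_out=$(awk -F, -v OFS=, 'FNR==NR {a[$1]=$0; next} $1 in a {print $1, a[$1]}' file1 file2)

# join: both inputs must be sorted on the join field; use the C locale so
# sort and join agree on the collation order.
LC_ALL=C sort -t, -k1,1 file1 > file1.sorted
LC_ALL=C sort file2 > file2.sorted
join_out=$(LC_ALL=C join -t, -o 1.1,2.1,2.2 file2.sorted file1.sorted)

printf '%s\n' "$awk_out"
if [ "$awk_out" = "$join_out" ]; then echo "awk and join agree"; fi
```

The awk version keeps file2's original order and needs no sorting; the join version relies on sorted input but streams both files without holding either in memory.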

Related

Merge File1 with File2 (keep appending from File1 to File2 until no more rows)

I can't find a solution.
So here is the problem.
The result should be 100 rows (File1), with the contents of File2 repeating 25 times.
I want to join the contents even though the files have different numbers of rows: keep cycling through the lines of File2 until every row of File1 has been used.
File1:
test1#domain.com
test2#domain2.com
test3#domain3.com
test4#domain4.com
File2:
A1,B11
A2,B22
A3,B33
A4,B44
I want to combine the files to get the following expected result:
File3:
test1#domain.com,A1,B11
test2#domain2.com,A2,B22
test3#domain3.com,A3,B33
test4#domain4.com,A4,B44
Note: after using all 4 rows of File2, it should start again from the first line and repeat.
test5#domain5.com,A1,B11
test6#domain6.com,A2,B22
test7#domain7.com,A3,B33
test8#domain8.com,A4,B44
The example in your question isn't clear, but I THINK this is what you're trying to do:
$ awk -v OFS=',' 'NR==FNR{a[++n]=$0;next} {print $0, a[(FNR-1)%n+1]}' file2 file1
test1#domain.com,A1,B11
test2#domain2.com,A2,B22
test3#domain3.com,A3,B33
test4#domain4.com,A4,B44
test5#domain5.com,A1,B11
test6#domain6.com,A2,B22
The above was run against this input:
$ cat file1
test1#domain.com
test2#domain2.com
test3#domain3.com
test4#domain4.com
test5#domain5.com
test6#domain6.com
$
$ cat file2
A1,B11
A2,B22
A3,B33
A4,B44
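The index expression (FNR-1)%n+1 is what makes file2 repeat: FNR-1 counts file1 lines from 0, %n wraps that count back to 0 after every n lines, and +1 shifts it to awk's 1-based array index (for n=4, file1 lines 1 to 8 map to rows 1,2,3,4,1,2,3,4). A self-contained sketch that regenerates the sample input in a scratch directory (assuming `mktemp -d` is available):

```shell
#!/bin/sh
# Sketch: cycle through file2's rows while walking file1, using (FNR-1)%n+1.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' 'test1#domain.com' 'test2#domain2.com' 'test3#domain3.com' \
  'test4#domain4.com' 'test5#domain5.com' 'test6#domain6.com' > file1
printf '%s\n' A1,B11 A2,B22 A3,B33 A4,B44 > file2

# First pass (NR==FNR) stores file2 as a[1..n]; second pass picks row
# (FNR-1)%n+1, i.e. 1,2,...,n,1,2,... for consecutive file1 lines.
out=$(awk -v OFS=',' 'NR==FNR {a[++n]=$0; next} {print $0, a[(FNR-1)%n+1]}' file2 file1)
printf '%s\n' "$out"
```

For the question's 100-row File1 and 4-row File2, the same command cycles through File2 25 times.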
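Could you please try the following.
awk '
BEGIN{
OFS=","
}
FNR==NR{ ##reading Input_file2: store each line by its position
a[++count]=$0
next
}
{
count_curr++ ##advance to the next stored Input_file2 line...
count_curr=count_curr>count?1:count_curr ##...and wrap back to 1 after the last one
print $0,a[count_curr] ##Input_file1 line first, then the Input_file2 line
}
' Input_file2 Input_file1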

Vlookup using awk command

I have two files in my linux server.
File 1
9190784
9197256
9170546
9184139
9196854
File 2
S NO.,Column1,Column2,Column3
72070,9196854,TGM,AP
72071,9172071,BGM,MP
72072,9184139,AGM,KN
72073,9172073,TGM,AP
I want to write a script or a single-line command in bash using awk, so that each entry in File 1 is matched against Column1 in File 2, printing Column1, Column2 and Column3. If an entry is not found, it should print the entry from File 1 with NA in Column2 and Column3.
Output: it should be redirected to a new file, as below.
new_file
9190784,NA,NA
9197256,NA,NA
9170546,NA,NA
9184139,AGM,KN
9196854,TGM,AP
I hope the question is clear. Can anyone please help me with this?
Standard join operation with awk:
$ awk 'BEGIN {FS=OFS=","}
NR==FNR {a[$2]=$3 OFS $4; next}
{print $1, (($1 in a)?a[$1]:"NA" OFS "NA")}' file2 file1
Substring variation (not tested):
$ awk 'BEGIN {FS=OFS=","}
NR==FNR {a[substr($2,1,7)]=$3 OFS $4; next}
{key=substr($1,1,7);
print $1, ((key in a)?a[key]:"NA" OFS "NA")}' file2 file1
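A self-contained sketch of the standard awk join, with one small addition that is not in the answer: FNR > 1 skips the header row of file2 instead of storing it under the key Column1 (harmless, but unnecessary). It writes the result to new_file as the question asked; `mktemp -d` is assumed available.

```shell
#!/bin/sh
# Sketch: awk "vlookup" with NA for missing keys, header row skipped.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' 9190784 9197256 9170546 9184139 9196854 > file1
printf '%s\n' 'S NO.,Column1,Column2,Column3' 72070,9196854,TGM,AP \
  72071,9172071,BGM,MP 72072,9184139,AGM,KN 72073,9172073,TGM,AP > file2

# Pass 1 (file2): map Column1 -> "Column2,Column3", skipping the header.
# Pass 2 (file1): print each entry with its match, or NA,NA if absent.
awk 'BEGIN {FS = OFS = ","}
     NR == FNR {if (FNR > 1) a[$2] = $3 OFS $4; next}
     {print $1, (($1 in a) ? a[$1] : "NA" OFS "NA")}' file2 file1 > new_file
out=$(cat new_file)
printf '%s\n' "$out"
```

Output lines follow the order of file1, which join-based approaches do not preserve.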
Does it have to be awk? It can be done with join.
Given two files (note that here file2 holds the list and file1 holds the table):
echo '9190784
9197256
9170546
9184139
9196854' >file2
echo 'S NO.,Column1,Column2,Column3
72070,9196854,TGM,AP
72071,9172071,BGM,MP
72072,9184139,AGM,KN
72073,9172073,TGM,AP' > file1
This uses join with , as the separator (-t,), joining the second field of file1 (-12), after removing its header line (tail -n +2) and sorting it on that field (sort -t, -k2,2), with the first field of file2 (-21), also sorted (sort):
join -t, -12 -21 -o1.2,1.3,1.4 <(tail -n +2 file1 | sort -t, -k2,2) <(sort file2)
It will output only the matched entries, in sorted order (unmatched entries do not get an NA line):
9184139,AGM,KN
9196854,TGM,AP
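If the NA rows are wanted from join as well, join's -a, -e and -o options can produce them. This sketch assumes GNU coreutils join (for `-o 0`, the join field) and `mktemp -d`; the output comes out in sorted key order rather than in the order of the list. File names follow the join answer: file2 is the list, file1 is the table.

```shell
#!/bin/sh
# Sketch (assumes GNU coreutils join): emit NA for list entries with no match.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
export LC_ALL=C   # sort and join must agree on the collation order

printf '%s\n' 9190784 9197256 9170546 9184139 9196854 > file2
printf '%s\n' 'S NO.,Column1,Column2,Column3' 72070,9196854,TGM,AP \
  72071,9172071,BGM,MP 72072,9184139,AGM,KN 72073,9172073,TGM,AP > file1

tail -n +2 file1 | sort -t, -k2,2 > table.sorted   # drop header, sort on key
sort file2 > list.sorted

# -a 2: also print list entries that have no partner in the table
# -e NA: fill the missing table fields with NA
# -o 0,1.3,1.4: join field, then Column2 and Column3 from the table
out=$(join -t, -1 2 -2 1 -a 2 -e NA -o 0,1.3,1.4 table.sorted list.sorted)
printf '%s\n' "$out"
```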

Awk syntax error in loop

cat file1
xi=zaoshui jiao=#E0488_5#
chi=fan da qiu=#E0488_3#
gong=zuo you xi #E0977_5#
cat file2
#E0488_3# #E21562_3#
#E0488_5# #E21562_5#
#E0977_3# #E21630_3#
#E0977_5# #E21630_5#
#E0977_6# #E21631_1#
Purpose: if $NF in file1 is found in $1 of file2, then replace $NF in file1 with $2 of file2; otherwise, make no change.
My Code:
awk 'NR==FNR{a[$1]=$2;next}
{split($NF,a,"=");for($NF in a){$NF=a[$NF]}}1' test2.txt test1.txt
But it comes error:
awk: cmd. line:1: NR==FNR{a[$1]=$2;next}{split($NF,a,"=");for($NF in a){$NF=a[$NF]}}1
awk: cmd. line:1: ^ syntax error
Does my code look right? It seems to be a syntax issue. How can I fix it?
My expected output:
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#
for($NF in a) is not valid syntax: $NF yields a field's value, but for-in needs a variable name. The form is:
for (var in array)
    body
Read more in the GNU Awk manual: Scanning-an-Array.
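For reference, a minimal correct for (key in array) loop. It is illustrative only: the replacement itself needs no loop, because ($NF in a) used as a pattern already performs the lookup. Iteration order of for-in is unspecified in awk, so this sketch pipes the result through sort.

```shell
#!/bin/sh
# Sketch: the loop variable in for (k in a) must be a plain variable name.
set -eu
out=$(printf '%s\n' '#E0488_5# #E21562_5#' '#E0488_3# #E21562_3#' |
  awk '{ a[$1] = $2 }                       # key: old code, value: new code
       END { for (k in a) print k, a[k] }' | LC_ALL=C sort)
printf '%s\n' "$out"
```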
I used sub($NF,a[$NF]) to retain your original field separators, because in the last record the last field is preceded by a space, whereas in the other lines it is preceded by =. This assumes the value does not also occur earlier in the line, since sub() replaces the first match.
Test Results:
$ cat file1
xi=zaoshui jiao=#E0488_5#
chi=fan da qiu=#E0488_3#
gong=zuo you xi #E0977_5#
$ cat file2
#E0488_3# #E21562_3#
#E0488_5# #E21562_5#
#E0977_3# #E21630_3#
#E0977_5# #E21630_5#
#E0977_6# #E21631_1#
$ awk 'FNR==NR{a[$1]=$NF;next}($NF in a){sub($NF,a[$NF])}1' file2 FS='[ =]' file1
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#
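sub() treats its first argument as a regular expression and replaces the first match anywhere in the line. A variant that avoids both caveats (a sketch, not the answer's method) replaces only the tail of the line with plain string functions: since the last field ends the line, the part before it is substr($0, 1, length($0) - length($NF)). The fourth input line below is hypothetical, added to show a key that also appears earlier in the line; `mktemp -d` is assumed available.

```shell
#!/bin/sh
# Sketch: replace only the trailing field, keeping the original separators.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' 'xi=zaoshui jiao=#E0488_5#' 'chi=fan da qiu=#E0488_3#' \
  'gong=zuo you xi #E0977_5#' \
  '#E0488_3# again=#E0488_3#' > file1     # last line: hypothetical test input
printf '%s\n' '#E0488_3# #E21562_3#' '#E0488_5# #E21562_5#' \
  '#E0977_3# #E21630_3#' '#E0977_5# #E21630_5#' '#E0977_6# #E21631_1#' > file2

# Assumes no trailing separators, so each file1 line ends with $NF.
out=$(awk 'FNR==NR {a[$1]=$NF; next}
  ($NF in a) {$0 = substr($0, 1, length($0) - length($NF)) a[$NF]}
  1' file2 FS='[ =]' file1)
printf '%s\n' "$out"
```

Assigning to $0 directly (rather than to $NF) means awk does not rebuild the line with OFS, so the mix of = and space separators survives.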
Not completely sure, but could you please try the following and let me know if it helps.
awk 'FNR==NR{a[$1]=$NF;next} ($NF in a){sub(/[^= ]+$/,a[$NF])} 1' file2 FS='[= ]' file1
Output will be as follows.
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#
EDIT: Adding an explanation too.
awk '
FNR==NR{ ##Checking condition FNR==NR, which is TRUE while the first Input_file, file2, is being read (with the default whitespace FS).
a[$1]=$NF; ##making an array named a whose index is $1 of the current line and whose value is the last field of the current line.
next ##next will skip all the further statements.
}
($NF in a){ ##Checking if the last field of the current line of Input_file file1 is present in array a; if yes, then do the following.
sub(/[^= ]+$/,a[$NF]) ##replacing the trailing run of characters other than = and space (the last field) with its array value; editing $0 this way keeps the original = or space separators.
}
1 ##1 will print the lines of Input_file file1.
' file2 FS='[= ]' file1 ##file2 is read with the default FS; FS is then set to either = or space for file1.
My code is below; I hope it helps, even if it's not the most efficient answer.
awk '$NF ~ /=/ {gsub("="," # ",$NF)}{print $0}' file1 > file3
cat file3
xi=zaoshui jiao # #E0488_5#
chi=fan da qiu # #E0488_3#
gong=zuo you xi #E0977_5#
Then, as you described, but with file3 in place of file1: if $NF of file3 is found in $1 of file2, replace $NF of file3 with $2 of file2, and finally turn " # " back into "=":
awk 'NR==FNR {a[$1]=$2;next}($NF in a){$NF=a[$NF]}1' file2 file3 | sed 's/ # /=/g'
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#

awk; searching file2 by file1

I have searched high and low, but I can't find the exact awk code for this.
I have 2 files.
File 1 (single column):
1407859648
1639172851
1427051689
1023011285
1437152683
1508869405
1790775963
1932373552
File 2 (three columns):
1790775963,1932373552,65
1639206006,1437337425,15
1265418669,1477541563,145
1053424648,1316944317,182
1184611535,1821014457,26
1003906082,1134327133,152
1376530121,1841236684,168
1316921570,1962555771,23
1396962627,1184732489,87
1194958421,1255333456,113
1538156732,1336215482,62
File 1 and 2 have an unequal number of records.
I would like to print records from File 2, when both Col1 and Col2 in File2 match Col1 from File1.
In this example output should be:
1790775963,1932373552,65
Thank you!
Try the following:
awk -F',' 'NR==FNR {arr[$1]++; next} (($1 in arr) && ($2 in arr)) {print $0}' file1 file2
Output:
1790775963,1932373552,65
EDIT
Or, more concisely, as suggested by sudo_O:
awk -F, 'NR==FNR{a[$0];next}($1 in a)&&($2 in a)' file1 file2

comparing 2 files and extracting elements from file

I have two files. One has a list of names (only one column), and the second has three columns: name, phone number, country.
What I want is to extract the data of the people whose names are not present in file1, only in file2.
#!/bin/bash
for i in `cat file1 `
do
cat file2 | awk '{ if ($1 != "'$i'") {print $1 "\t" $2 "\t" $3 }}'>>NonResp
done
What I get is a weird result with more data than expected.
Kindly help.
You can do this with grep:
grep -v -F -f file1 file2
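grep -F -f treats each name in file1 as a fixed substring that may match anywhere in a line, so a short name can also hide a longer one. This sketch adds a hypothetical name, teddy, to show the effect, and compares it with -w (whole-word matching, supported by GNU and BSD grep but not required by POSIX). `mktemp -d` is assumed available.

```shell
#!/bin/sh
# Sketch: grep -F -f matches each name as a substring anywhere in the line.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' alan bert ted > file1
# "teddy" is a hypothetical name added to show the substring effect
printf '%s\n' 'bert 01 AU' 'teddy 11 US' 'zorro 09 AG' > file2

# grep exits 1 when it selects no lines; "|| true" keeps set -e happy.
plain=$(grep -v -F -f file1 file2 || true)     # "ted" also hides "teddy"
words=$(grep -v -w -F -f file1 file2 || true)  # whole-word matches only
printf '%s\n' "--- -F:" "$plain" "--- -w -F:" "$words"
```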
awk '{print $1}' file2 | comm -1 -3 file1 - | join file2 -
The files must already be sorted for this to work properly.
Explanation:
=> awk '{print $1}' file2 |
print only the first field of file2 and feed it to the next command (|)
=> comm -1 -3 file1 - |
compare file1 and the output of the last command (-), suppressing lines only in file1 (-1) and lines in both files (-3); that leaves the lines only in file2, which are fed to the next command (|)
=> join file2 -
join the original file2 and the output from the last command (-) and write out the fields of the matching lines (runs of whitespace between fields are collapsed to single spaces, however)
Testcase:
cat <<EOF >file1
alan
bert
cindy
dave
fred
sunny
ted
EOF
cat <<EOF >file2
bert 01 AU
cindy 03 CZ
ginny 05 CN
ted 07 CH
zorro 09 AG
EOF
awk '{print $1}' file2 | comm -1 -3 file1 - | join file2 -
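The testcase stops before showing its result. This sketch reruns it in a scratch directory with the C locale, so that comm and join agree with the byte order of the already-sorted files (the locale setting and `mktemp -d` are assumptions, not part of the answer), and captures what the pipeline prints: the two people who appear only in file2.

```shell
#!/bin/sh
# Sketch: rerun the testcase in a scratch directory and capture its output.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
export LC_ALL=C

printf '%s\n' alan bert cindy dave fred sunny ted > file1
printf '%s\n' 'bert 01 AU' 'cindy 03 CZ' 'ginny 05 CN' 'ted 07 CH' 'zorro 09 AG' > file2

# names only in file2 -> full file2 rows for those names
out=$(awk '{print $1}' file2 | comm -1 -3 file1 - | join file2 -)
printf '%s\n' "$out"
```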
Assuming the field delimiter in file2 is ",":
awk -F, 'FNR==NR{a[$1];next}!($1 in a)' file1 file2
If "," is not the delimiter, then simply
awk 'FNR==NR{a[$1];next}!($1 in a)' file1 file2
would be sufficient.
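One caveat of the FNR==NR idiom here: if file1 is empty, FNR==NR also holds for every line of file2, so file2 gets stored instead of printed and the output is empty, even though every person is then "only in file2". Testing FILENAME==ARGV[1] (a swapped-in variant, not the answer's code) identifies the first file even when it has no lines. A sketch with whitespace-separated sample data, assuming `mktemp -d`:

```shell
#!/bin/sh
# Sketch: FNR==NR vs FILENAME==ARGV[1] when the first file is empty.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' 'bert 01 AU' 'ginny 05 CN' 'zorro 09 AG' > file2
printf '%s\n' bert > file1
: > empty                                  # zero-length first file

fnr_nr=$(awk 'FNR==NR {a[$1]; next} !($1 in a)' empty file2)
by_name=$(awk 'FILENAME==ARGV[1] {a[$1]; next} !($1 in a)' empty file2)
normal=$(awk 'FILENAME==ARGV[1] {a[$1]; next} !($1 in a)' file1 file2)
printf '%s\n' "FNR==NR, empty file1: [$fnr_nr]" "FILENAME, empty file1:" "$by_name"
```

With a non-empty file1 both forms behave the same, so the FNR==NR version is fine whenever file1 is guaranteed to have at least one line.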
