Find in word from one file to another file using awk, NR==FNR - shell

I am trying to find words from one file to another file using following awk command:
awk 'NR==FNR{a[$2]=$1;next}($1 in a){print a[$2] " " $2}' file1 file2
content of files are :
file1:
vidhu 1
gangwar 2
file2:
1 1
2 4980022
Expected Output:
vidhu 1
gangwar 4980022
But output is coming like that :
vidhu 1
4980022
Help me out to find this problem.

#try:
awk 'FNR==NR{A[$2]=$1;next} ($1 in A){print A[$1], $2}' Input_file1 Input_file2
Explanation:
awk 'FNR==NR #### Checking condition of FNR==NR which will be TRUE when first Input_file1 is being read.
{A[$2]=$1; #### create an array named A with index of field 2 and have value as first field.
next} #### using next keyword(built-in awk) so it will skip all next statements.
($1 in A) #### Now checking if first field of file2 is present in array A, this will be checked only when Input_file2 is being read.
{print A[$1], $2 #### printing value of array A's value whose index is $1 and $2 of Input_file2.
}' Input_file1 Input_file2 #### Mentioning the Input_file1 and Input_file2 here.

You can also use something like this:
join -1 2 -2 1 file1 file2 -o 1.1,2.2
Assuming that your files are in sorted order.
If not then first sort them.

Related

How to compare two files and print the values of both the files which are different

There are 2 files. I need to sort them first and then compare the 2 files and then the difference I need to print the value from File 1 and File 2.
file1:
pair,bid,ask
AED/MYR,3.918000,3.918000
AED/SGD,3.918000,3.918000
AUD/CAD,3.918000,3.918000
file2:
pair,bid,ask
AUD/CAD,3.918000,3.918000
AUD/CNY,3.918000,3.918000
AED/MYR,4.918000,4.918000
Output should be:
pair,inputbid,inputask,outputbid,outtputask
AED/MYR,3.918000,3.918000,4.918000,4.918000
The only difference in 2 files is AED/MYR with different bid/ask rates. How can I print difference value from file 1 and file 2.
I tried using below commands:
nawk -F, 'NR==FNR{a[$1]=$4;a[$2]=$5;next} !($4 in a) || !($5 in a) {print $1 FS a[$1] FS a[$2] FS $4 FS $5}' file1 file2
Result output as below:
pair,bid,ask,bid,ask
AUD/CAD,3.918000,3.918000,3.918000,3.918000
AUD/CHF,3.918000,3.918000,3.918000,3.918000
AUD/CNH,3.918000,3.918000,3.918000,3.918000
AUD/CNY,3.918000,3.918000,3.918000,3.918000
AED/MYR,3.918000,3.918000,4.918000,4.918000
We are still not able to get only the difference.
Could you please try following, written and tested in GNU awk with shown samples.
awk -v header="pair,inputbid,inputask,outputbid,outtputask" '
BEGIN{
FS=OFS=","
}
FNR==NR{
arr[$1]=$0
next
}
($1 in arr) && arr[$1]!=$0{
val=$1
$1=""
sub(/^,/,"")
if(!found){
print header
found=1
}
print arr[val],$0
}' Input_file1 Input_file2
Explanation: Adding detailed explanation for above.
awk -v header="pair,inputbid,inputask,outputbid,outtputask" ' ##Starting awk program from here and setting this to header value here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS="," ##Setting field separator and output field separator as comma here.
}
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when Input_file1 is being read.
arr[$1]=$0 ##Creating arr with index $1 and keep value as current line.
next ##next will skip all further statements from here.
}
($1 in arr) && arr[$1]!=$0{ ##Checking condition if first field is present in arr and its value NOT equal to $0
val=$1 ##Creating val which has current line value in it.
$1="" ##Nullifying irst field here.
sub(/^,/,"") ##Substitute starting , with NULL here.
if(!found){ ##Checking if found is NULL then do following.
print header ##Printing header here only once.
found=1 ##Setting found here.
}
print arr[val],$0 ##Printing arr with index of val and current line here.
}' Input_file1 Input_file2 ##Mentioning Input_files here.
With bash process substitution, then join and then choosing with awk:
# print header
printf "%s\n" "pair,inputbid,inputask,outputbid,outtputask"
# remove first line from both files, then sort them on first field
# then join them on first field and output first 5 fields
join -t, -11 -21 -o1.1,1.2,1.3,2.2,2.3 <(tail -n +2 file1 | sort -t, -k1) <(tail -n +2 file2 | sort -t, -k1) |
# output only those lines, that columns differ
awk -F, '$2 != $4 || $3 != $5'

Vlookup using awk command

I have two files in my linux server.
File 1
9190784
9197256
9170546
9184139
9196854
File 2
S NO.,Column1,Column2,Column3
72070,9196854,TGM,AP
72071,9172071,BGM,MP
72072,9184139,AGM,KN
72073,9172073,TGM,AP
I want to write a script or a single line command in bash using awk command, so as whatever the element in File -1 should match the same with column 1 in File -2 and print Column 1, Column2 and Column3. Also if any entry is not found it should print entry from file 1 and print NA in Column 2 and Column 3
Output : it should redirect the output to a new file as below.
new_file
9190784,TGM,AP
9197256,NA,NA
9170546,NA,NA
9184139,AGM,KN
9196854,TGM,AP
I hope the query is understandable. Anyone please help me on the same.
standard join operation with awk
$ awk 'BEGIN {FS=OFS=","}
NR==FNR {a[$2]=$3 OFS $4; next}
{print $1, (($1 in a)?a[$1]:"NA" OFS "NA")} file2 file1
substring variation (not tested)
$ awk 'BEGIN {FS=OFS=","}
NR==FNR {a[substr($2,1,7)]=$3 OFS $4; next}
{key=substr($1,1,7);
print $1, ((key in a)?a[key]:"NA" OFS "NA")} file2 file1
Does it have to be awk? It's done with join:
Having two files:
echo '9190784
9197256
9170546
9184139
9196854' >file2
echo 'S NO.,Column1,Column2,Column3
72070,9196854,TGM,AP
72071,9172071,BGM,MP
72072,9184139,AGM,KN
72073,9172073,TGM,AP' > file1
One can join the on , as separator on the second field from the first file1 -12 with removed the first header line tail -n +2 and sorted using the second field sort -t, -k2 with the first field from the second file -21 sorted sort.
join -t, -12 -21 -o1.2,1.3,1.4 <(tail -n +2 file1 | sort -t, -k2) <(sort file2)
will output:
9184139,AGM,KN
9196854,TGM,AP

Compare two columns of two files and display the third column with input if it matching of not in unix

I would like to compare the first two columns of two files file1.txt and file2.txt and if they match to write to another file output.txt with the third column of both file1,file 2 along with details if it matches or not .
file1.txt
ab|2001|name1
cd|2000|name2
ef|2002|name3
gh|2003|name4
file2.txt
xy|2001|name5
cd|2000|name6
ef|2002|name7
gh|2003|name8
output.txt
name1 name5 does not match
name2 name6 matches
name3 name7 matches
name4 name8 matches
Welcome to stack overflow, could you please try following and let me know if this helps you.
awk -F"|" 'FNR==NR{a[$2]=$1;b[$2]=$3;next} ($2 in a) && ($1==a[$2]){print b[$2],$3,"matched properly.";next} {print b[$2],$3,"Does NOT matched."}' file1.txt file2.txt
EDIT: Adding a non-one liner form of solution too here.
awk -F"|" '
FNR==NR{
a[$2]=$1;
b[$2]=$3;
next
}
($2 in a) && ($1==a[$2]){
print b[$2],$3,"matched properly.";
next
}
{
print b[$2],$3,"Does NOT matched."
}
' file1.txt file2.txt
Explanation: Adding explanation for above code.
awk -F"|" ' ##Starting awk program from here and setting field separator as | here.
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when file1.txt is being read.
a[$2]=$1; ##Creating an array with named a whose index is $2 and value is $1.
b[$2]=$3; ##Creating an array named b whose index is $2 and value is $3.
next ##next will skip all further statements from here.
} ##Closing BLOCK for FNR==NR condition here.
($2 in a) && ($1==a[$2]){ ##Checking condition if $2 is in array a AND first field is equal to value of array a value of index $2.
print b[$2],$3,"matched properly."; ##Printing array b value with index $2 and 3rd field with string value matched properly.
next ##next will skip all statements from here.
} ##Closing BLOCK for above condition here.
{
print b[$2],$3,"Does NOT matched." ##Printing value of array b with index $2 and 3rd field with string Does NOT matched here.
}
' file1.txt file2.txt ##Mentioning Input_file names here.
You can use paste and awk to get what you want.
Below solution is assuming the fields in file1 and file2 will be always delimited by "|"
paste -d "|" file1.txt file2.txt | awk -F "|" '{ if( $1 == $4 && $2 == $5 ){print $3, $6, "matches"} else {print $3, $6, "does not match"} }' > output.txt

AWK: search substring in first file against second

I have the following files:
data.txt
Estring|0006|this_is_some_random_text|more_text
Fstring|0010|random_combination_of_characters
Fstring|0028|again_here
allids.txt (here the columns are separated by semicolon; the real input is tab-delimited)
Estring|0006;MAR0593
Fstring|0002;MAR0592
Fstring|0028;MAR1195
please note: data.txt: the important part is here the first two "columns" = name|number)
Now I want to use awk to search the first part (name|number) of data.txt in allids.txt and output the second column (starting with MAR)
so my expected output would be (again tab-delimited):
Estring|0006|this_is_some_random_text|more_text;MAR0593
Fstring|0010|random_combination_of_characters
Fstring|0028|again_here;MAR1195
I do not know now how to search that first conserved part within awk, the rest should then be:
awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1] = $1; next } $1 in a { print a[$0], [$1] }' data.txt allids.txt
I would use a set of field delimiters, like this:
awk -F'[|\t;]' 'NR==FNR{a[$1"|"$2]=$0; next}
$1"|"$2 in a {print a[$1"|"$2]"\t"$NF}' data.txt allids.txt
In your real-data example you can remove the ;. It is in here just to be able to reproduce the example in the question.
Here is another awk that uses a different field separator for both files:
awk -F ';' 'NR==FNR{a[$1]=FS $2; next} {k=$1 FS $2}
k in a{$0=$0 a[k]} 1' allids.txt FS='|' data.txt
Estring|0006|this_is_some_random_text|more_text;MAR0593
Fstring|0010|random_combination_of_characters
Fstring|0028|again_here;MAR1195
This command uses ; as FS for allids.txt and uses | as FS for data.txt.

Compare two columns of different files and add new column if it matches

I would like to compare the first two columns of two files, if matched need to print yes else no.
input.txt
123,apple,type1
123,apple,type2
456,orange,type1
6567,kiwi,type2
333,banana,type1
123,apple,type2
qualified.txt
123,apple,type4
6567,kiwi,type2
output.txt
123,apple,type1,yes
123,apple,type2,yes
456,orange,type1,no
6567,kiwi,type2,yes
333,banana,type1,no
123,apple,type2,yes
I was using the below command for split the data, and then i will add one more column based on the result.
Now the the input.txt has duplicate(1st column) so the below method is not working, also the file size was huge.
Can we get the output.txt in awk one liner?
comm -2 -3 input.txt qualified.txt
$ awk -F, 'NR==FNR {a[$1 FS $2];next} {print $0 FS (($1 FS $2) in a?"yes":"no")}' qual input
123,apple,type1,yes
123,apple,type2,yes
456,orange,type1,no
6567,kiwi,type2,yes
333,banana,type1,no
123,apple,type2,yes
Explained:
NR==FNR { # for the first file
a[$1 FS $2];next # aknowledge the existance of qualified 1st and 2nd field pairs
}
{
print $0 FS ($1 FS $2 in a?"yes":"no") # output input row and "yes" or "no"
} # depending on whether key found in array a
No need to redefine the OFS as $0 isn't modified and doesn't get rebuilt.
You can use awk logic for this as below. Not sure why do you mention one-liner awk command though.
awk -v FS="," -v OFS="," 'FNR==NR{map[$1]=$2;next} {if($1 in map == 0) {$0=$0FS"no"} else {$0=$0FS"yes"}}1' qualified.txt input.txt
123,apple,type1,yes
123,apple,type2,yes
456,orange,type1,no
6567,kiwi,type2,yes
333,banana,type1,no
123,apple,type2,yes
The logic is
The command FNR==NR parses the first file qualified.txt and stores the entries in column 1 and 2 in first file with first column being the index.
Then for each of the line in 2nd file {if($1 in map == 0) {$0=$0FS"no"} else {$0=$0FS"yes"}}1 the entry in column 1 does not match the array, append the no string and yes otherwise.
-v FS="," -v OFS="," are for setting input and output field separators
It looks like all you need is:
awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1];next} {print $0, ($1 in a ? "yes" : "no")}' qualified.txt output.txt

Resources