Compare two files with awk - check if values from two columns in file1 are included somewhere in two columns in file2 - bash

For each line, I want to check whether the values in columns $2 and $3 of file1 appear anywhere in columns $1 and $2 of file2. If they do, I would like to print $1, $2, $3 from file1 as well as $3 from file2 (as a 4th column).
File1:
# 139.51 -62.48
# 137.36 -63.36
# 135.44 -64.09
File2:
137.35 -63.36 6.349
137.36 -63.36 6.348
137.37 -63.36 6.346
Here is what I've got so far:
awk 'NR == FNR {a[$1$2];c[FNR] =$3;next} $2$3 in a {print $1, $2, $3, c[FNR]}' $file2 $file1 > $output
But somehow, the resulting values in $4 are not equal to the 3rd column of file2. Could someone help me out? Thank you so much! :)
I am new to programming and have only used awk and shell so far, so I am always happy about explanations! Thank you!

Since you haven't shown your expected output, I wrote this code based only on your description.
awk 'FNR==NR{a[$2,$3]=$0;next} (($1,$2) in a){print a[$1,$2],$NF}' file1 file2
Output will be as follows.
# 137.36 -63.36 6.348
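The asker's own command goes wrong because c[FNR] stores file2's third column under file2's line number and is then read back with file1's line number. The fourth column therefore comes from whichever file2 line has the same line number as the current file1 line, which is only right by coincidence. The key is also built as $1$2 with no separator, so pairs such as 1/23 and 12/3 would collide. Below is a minimal sketch that keeps the original argument order (file2 first) but stores the third column under a comma-subscripted key. The scratch directory and the variable name result are only for illustration.

```shell
#!/bin/sh
# Illustrative scratch directory holding the sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' '# 139.51 -62.48' '# 137.36 -63.36' '# 135.44 -64.09' > "$tmp/file1"
printf '%s\n' '137.35 -63.36 6.349' '137.36 -63.36 6.348' '137.37 -63.36 6.346' > "$tmp/file2"

# Pass 1 (file2, NR == FNR): remember column 3 under the key (column 1, column 2).
# Pass 2 (file1): if (column 2, column 3) is a known key, append the stored value.
result=$(awk 'NR == FNR { c[$1, $2] = $3; next }
              ($2, $3) in c { print $1, $2, $3, c[$2, $3] }' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"    # prints: # 137.36 -63.36 6.348

rm -r "$tmp"
```

The value is looked up with the same two columns that were used to store it, so the result no longer depends on the line order of either file.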

Related

awk: two files are queried

I have two files
file1:
>string1<TAB>Name1
>string2<TAB>Name2
>string3<TAB>Name3
file2:
>string1<TAB>sequence1
>string2<TAB>sequence2
I want to use awk to compare column 1 of the two files. If both files share a column 1 value, I want to print column 2 of file1 followed by column 2 of file2. For example, for the files above my expected output is:
Name1<TAB>sequence1
Name2<TAB>sequence2
This is my code:
awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1] = $1; next } $1 in a { print a[$2], $2 }' file1 file2 >out
But all I get is an empty first column followed by the sequence.
Where is the error?
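Your assignment is not right: a[$1] = $1 stores the key as its own value, and a[$2] then looks up the sequence (e.g. sequence1), which was never stored, so the first column comes out empty. Store column 2 of file1 under column 1 and look it up with $1: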
$ awk 'BEGIN {FS=OFS="\t"}
NR==FNR {a[$1]=$2; next}
$1 in a {print a[$1],$2}' file1 file2
Name1 sequence1
Name2 sequence2
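The self-contained sketch below recreates the tab-separated sample files and runs both the original command and the corrected one, so the difference is visible side by side. The scratch directory and the variable names broken and fixed are illustrative.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Sample data from the question; printf turns \t into real tab characters.
printf '>string1\tName1\n>string2\tName2\n>string3\tName3\n' > "$tmp/file1"
printf '>string1\tsequence1\n>string2\tsequence2\n' > "$tmp/file2"

# Original: stores each key as its own value, then looks up a[$2] (e.g. "sequence1"),
# which was never set, so the first output column is empty.
broken=$(awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1] = $1; next } $1 in a { print a[$2], $2 }' "$tmp/file1" "$tmp/file2")

# Fixed: store file1's column 2 under column 1, then look it up with file2's column 1.
fixed=$(awk 'BEGIN{FS=OFS="\t"} NR == FNR { a[$1] = $2; next } $1 in a { print a[$1], $2 }' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$broken" "$fixed"
rm -r "$tmp"
```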

Compare two csv files, use the first three columns as identifier, then print common lines

I have two csv files. File 1 has 9861 rows and 4 columns, while File 2 has 6037 rows and 5 columns. Here are the files:
Link to File 1
Link to File 2
The first three columns are year, month, and day, respectively.
I want to get the lines in File 2 with the same identifier in File 1 and print this to File 3.
I found this command in some posts here, but it only works with one column as the identifier:
awk -F, 'NR==FNR {a[$1]=$0;next}; $1 in a {print a[$1]; print}' file1 file2
Is there a way to do this using awk, or any simpler commands, where I can use the first three columns as the identifier?
I'd appreciate any help.
Just use more columns to make the uniqueness you need:
$ awk -F, 'NR==FNR {a[$1, $2, $3] = $0; next}
$1 SUBSEP $2 SUBSEP $3 in a' file1 file2
SUBSEP is the subscript separator. It has the default value "\034" and is used to separate the parts of the indices of a multi-dimensional array. Thus, the expression foo["A", "B"] really accesses foo["A\034B"].
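The SUBSEP behaviour described above can be checked directly. This short sketch stores one element with a comma subscript and then addresses it with the separator spelled out by hand.

```shell
#!/bin/sh
# Demonstrates that a comma-separated subscript is joined with SUBSEP.
result=$(awk 'BEGIN {
    foo["A", "B"] = 1
    # The same element, addressed with the subscript built by hand.
    if (("A" SUBSEP "B") in foo) print "same element"
    # A parenthesized multi-part "in" test uses the same joining rule.
    if (("A", "B") in foo) print "found"
    # SUBSEP defaults to the control character \034 (octal 34).
    if (SUBSEP == "\034") print "default SUBSEP"
}')
printf '%s\n' "$result"
```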
awk -F, '{k=$1 FS $2 FS $3} NR==FNR{a[k];next} k in a' file1 file2
Untested, of course, since you didn't provide any sample input/output.
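Since the real files are only linked, the sketch below runs both approaches on a few made-up year,month,day lines. These sample rows are hypothetical and do not come from the question. The first variant is written with the parenthesized ($1, $2, $3) in a test, which is equivalent to joining the fields with SUBSEP by hand.

```shell
#!/bin/sh
# Hypothetical sample data (the question only links to the real files):
# year,month,day,... in both files.
tmp=$(mktemp -d)
printf '%s\n' '2001,1,1,10.5' '2001,1,2,11.0' '2001,1,3,9.8' > "$tmp/file1"
printf '%s\n' '2001,1,2,a,b' '2001,1,4,c,d' '2001,1,3,e,f' > "$tmp/file2"

# Variant 1: multi-part subscript, joined internally with SUBSEP.
v1=$(awk -F, 'NR == FNR { a[$1, $2, $3]; next } ($1, $2, $3) in a' "$tmp/file1" "$tmp/file2")

# Variant 2: build the key with FS (a comma), which cannot occur inside a field here.
v2=$(awk -F, '{ k = $1 FS $2 FS $3 } NR == FNR { a[k]; next } k in a' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$v1"
rm -r "$tmp"
```

Both variants print only the file2 lines whose first three columns also occur in file1. Here those are the 2001,1,2 and 2001,1,3 rows.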

Split file into different parts based on the data using awk

I need to split the data in file 1 into separate files based on its value in $4, using awk. The target file names should be taken from a mapping file, file 2.
File 1
text;text;text;AB;text
text;text;text;AB;text
text;text;text;CD;text
text;text;text;CD;text
text;text;text;EF;text
text;text;text;EF;text
File 2
AB;valid
CD;not_valid
EF;not_specified
Desired output, where the file names are the values of $2 in file 2:
File valid
text;text;text;AB;text
text;text;text;AB;text
File not_valid
text;text;text;CD;text
text;text;text;CD;text
File not_specified
text;text;text;EF;text
text;text;text;EF;text
Any suggestions on how to perform the split?
Using awk:
awk -F';' 'FNR==NR {a[$1]=$2;next} $4 in a {print > a[$4]}
$4 != p {if (p) close(a[p]); p=$4}' file2 file1
It seems that just the first part of the code works on its own:
awk -F';' 'FNR==NR {a[$1]=$2;next} $4 in a {print > a[$4]}' file2 file1
So why is the second half of the code:
$4 != p {if (p) close(a[p]); p=$4}
needed? Thanks!
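The second rule keeps at most one output file open at a time: whenever $4 changes, it closes the file used for the previous key. Some awk implementations, particularly non-GNU ones, can run out of open file descriptors when there are many distinct keys, and the first rule alone never closes anything. The catch is that this only works when lines with the same $4 are adjacent. After close(), the next print > to that file truncates it again, so an unsorted file1 would lose earlier lines. The sketch below runs the full command on the sample data in a scratch directory. The variable names are illustrative.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Sample data from the question; file1 is already grouped by column 4.
printf '%s\n' 'text;text;text;AB;text' 'text;text;text;AB;text' \
    'text;text;text;CD;text' 'text;text;text;CD;text' \
    'text;text;text;EF;text' 'text;text;text;EF;text' > file1
printf '%s\n' 'AB;valid' 'CD;not_valid' 'EF;not_specified' > file2

# Rule 1 loads the key -> file name map, rule 2 writes each line to its file,
# rule 3 closes the previous output file whenever column 4 changes.
awk -F';' 'FNR==NR {a[$1]=$2; next}
           $4 in a {print > a[$4]}
           $4 != p {if (p) close(a[p]); p=$4}' file2 file1

valid=$(cat valid)
not_valid=$(cat not_valid)
not_specified=$(cat not_specified)
printf '%s\n' "$valid" "$not_valid" "$not_specified"
cd / && rm -r "$tmp"
```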

How to check whether rows of a file are within the rows of another file

I am new to shell scripting and Bash. I have file1 with one column and about 5000 rows, and file2 with five columns and about 240k rows. How can I check whether each of the roughly 5000 values in file1 appears in the second column of file2?
$ wc -l file1
5188 file1
$ wc -l file2
240888 file2
You can do this with awk, something like this:
awk 'NR == FNR {a[$1] = $1; next} {if ($2 in a){print(a[$2], $1)}}' file1 file2
Basically you read the first file in and store its contents in an array "a". Then you read the second file and check if the second field of each line is contained within array "a" and print it if it is.
My answer assumes your fields are separated by whitespace; if they are not, you will have to change the separator. For example, if your fields are separated by commas, you will need:
awk -F, .....
The above syntax does work, and it can be shortened to the following, which reads file2 first and prints only the file1 values found in its second column:
awk 'FNR==NR{a[$2]; next} $1 in a' file2 file1
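Here is a small sketch of the two-pass pattern described in the answer, applied to the question as asked. It loads column 2 of file2 into an array, then reports for every value in file1 whether it was found. The sample files and the found/not found labels are made up for illustration, and the fields are assumed to be whitespace-separated.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical sample data: file1 has one column, file2 has five.
printf '%s\n' 'id1' 'id2' 'id3' > "$tmp/file1"
printf '%s\n' 'a id1 x y z' 'b id3 x y z' 'c id9 x y z' > "$tmp/file2"

# Pass 1 (file2): collect every value from column 2 as an array key.
# Pass 2 (file1): print each value followed by whether it was seen.
report=$(awk 'NR == FNR { seen[$2]; next }
              { print $1, (($1 in seen) ? "found" : "not found") }' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$report"
rm -r "$tmp"
```

Only the keys of file2 are kept in memory, so memory use grows with the number of distinct values in column 2, not with the full 240k-line file.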

How can I compare the numeric values of the last two fields in a file?

I have a file that contains the following information:
organic_apple;2;organic_apple_212_212
organic_tomato;3;organic_tomato_24_29
fruit_juice;5;fruit_juice_15_15
So I want a file that contains the following output:
organic_apple;2;organic_apple_212
organic_tomato;3;organic_tomato_24_29
fruit_juice;5;fruit_juice_15
Compare the last two fields: if they are the same, display the value once; if not, display both.
I'm writing this in Bash on Solaris (Unix).
Regardless of the number of underscores, compare the last two:
awk 'BEGIN{FS=OFS="_"}$NF==$(NF-1){--NF;$1=$1}1' test.in
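Try this (it only works for lines with exactly three _-separated fields; the sample lines above contain more underscores, so there $2 and $3 are not the two numbers):
awk -v OFS=_ -F_ '{if ($2 == $3) print $1, $2; else print $1, $2, $3}' file.txt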
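This script removes the last field if it is equal to the one before it. OFS must also be set to "_", otherwise assigning to $NF rebuilds the line with spaces, and the trailing "_" left by the empty field has to be removed:
awk 'BEGIN{FS=OFS="_"} $NF==$(NF-1){$NF=""; sub(/_$/, "")}1' file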
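Decrementing NF, as in the first answer, is not portable: the gawk manual notes that some awk versions do not rebuild $0 when NF is decremented. On Solaris, such scripts are commonly run with /usr/xpg4/bin/awk or nawk rather than the legacy /usr/bin/awk. The sketch below avoids touching NF at all. When the last two fields match, it strips the final "_field" from the record with sub(), and it is checked against the sample input from the question.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 'organic_apple;2;organic_apple_212_212' \
    'organic_tomato;3;organic_tomato_24_29' \
    'fruit_juice;5;fruit_juice_15_15' > "$tmp/test.in"

# Split on "_"; if the last two fields are equal, delete the trailing "_<field>"
# from the record itself instead of rebuilding it from the fields.
result=$(awk -F_ '$NF == $(NF-1) { sub(/_[^_]*$/, "") } 1' "$tmp/test.in")
printf '%s\n' "$result"
rm -r "$tmp"
```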
