Awk syntax error in loop - shell

cat file1
xi=zaoshui jiao=#E0488_5#
chi=fan da qiu=#E0488_3#
gong=zuo you xi #E0977_5#
cat file2
#E0488_3# #E21562_3#
#E0488_5# #E21562_5#
#E0977_3# #E21630_3#
#E0977_5# #E21630_5#
#E0977_6# #E21631_1#
Purpose: if $NF of a line in file1 matches $1 of a line in file2, replace $NF in file1 with the corresponding $2 from file2. Otherwise, leave the line unchanged.
My code:
awk 'NR==FNR{a[$1]=$2;next}
{split($NF,a,"=");for($NF in a){$NF=a[$NF]}}1' test2.txt test1.txt
But it produces an error:
awk: cmd. line:1: NR==FNR{a[$1]=$2;next}{split($NF,a,"=");for($NF in a){$NF=a[$NF]}}1
awk: cmd. line:1: ^ syntax error
Does my code look right? It seems to be a syntax problem. How can I fix it?
My expected output:
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#

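for($NF in a) is not valid syntax: $NF evaluates to a field value, but the loop needs a variable name. Note also that split($NF,a,"=") overwrites the lookup array a that was built from the first file.
The loop has to take this form: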
for (var in array)
body
Read more: GNU AWK, Scanning-an-Array
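As a minimal sketch of the valid loop form (the array name map and the two sample pairs are illustrative, taken from file2 above), the loop variable key receives each index in turn. The output is piped through sort because awk does not guarantee any iteration order:

```shell
# Build a small lookup table, then iterate over its indices with for (key in map).
result=$(printf '%s\n' '#E0488_3# #E21562_3#' '#E0488_5# #E21562_5#' |
  awk '{ map[$1] = $2 }   # index = column 1, value = column 2
       END { for (key in map) print key, "->", map[key] }' |
  LC_ALL=C sort)
printf '%s\n' "$result"
```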
I used sub($NF,a[$NF]) to keep your original separators: in the last record the last field is preceded by a space, whereas in the other lines it is preceded by =. This assumes the value of the last field does not also occur earlier in the line.
Test Results:
$ cat file1
xi=zaoshui jiao=#E0488_5#
chi=fan da qiu=#E0488_3#
gong=zuo you xi #E0977_5#
$ cat file2
#E0488_3# #E21562_3#
#E0488_5# #E21562_5#
#E0977_3# #E21630_3#
#E0977_5# #E21630_5#
#E0977_6# #E21631_1#
$ awk 'FNR==NR{a[$1]=$NF;next}($NF in a){sub($NF,a[$NF])}1' file2 FS='[ =]' file1
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#
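One caveat: sub() treats its first argument as a regular expression, so a last field containing characters such as . or * could match more than intended. A hedged alternative, assuming the same file1 and file2 as above (the names map, parts and key are illustrative), rebuilds the line with substr() so the key is handled as a literal string:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'xi=zaoshui jiao=#E0488_5#' 'chi=fan da qiu=#E0488_3#' \
  'gong=zuo you xi #E0977_5#' > "$tmp/file1"
printf '%s\n' '#E0488_3# #E21562_3#' '#E0488_5# #E21562_5#' '#E0977_3# #E21630_3#' \
  '#E0977_5# #E21630_5#' '#E0977_6# #E21631_1#' > "$tmp/file2"

result=$(awk '
  FNR == NR { map[$1] = $2; next }       # first file: build the lookup table
  {
    n = split($0, parts, "[ =]")         # the last piece is the candidate key
    key = parts[n]
    if (key in map)                      # swap the literal tail of the line
      $0 = substr($0, 1, length($0) - length(key)) map[key]
  }
  1' "$tmp/file2" "$tmp/file1")
rm -rf "$tmp"
printf '%s\n' "$result"
```

Because only the tail of the line is replaced, the separators before the last field are left exactly as they were.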

I am not completely sure, but please try the following and let me know whether it helps.
awk 'FNR==NR{a[$1]=$NF;next} ($NF in a){$NF=a[$NF]} 1' FS="=" file2 FS='[= ]' OFS="=" file1
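As written, this prints file1 unchanged: with FS="=" no line of file2 is split, so the whole line becomes the index in a and $NF of file1 never matches.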
xi=zaoshui jiao=#E0488_5#
chi=fan da qiu=#E0488_3#
gong=zuo you xi #E0977_5#
EDIT: Adding an explanation as well.
awk '
FNR==NR{ ## True while the first input file (file2) is being read.
a[$1]=$NF; ## Store the last field of the current line in array a, indexed by $1.
next ## Skip the remaining rules for this line.
}
($NF in a){ ## For file1: if the last field of the current line is an index of array a, then
$NF=a[$NF] ## replace the last field with the value stored in a under that index.
}
1 ## 1 is an always-true pattern, so every line of file1 is printed.
' FS="=" file2 FS='[= ]' OFS="=" file1 ## FS is = while file2 is read and either = or space while file1 is read; OFS is = for file1.
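The per-file settings rely on a standard awk feature: a var=value operand on the command line is applied when awk reaches it, so it only affects the files that follow it. A small sketch, using two hypothetical scratch files that hold the same line, shows the field count changing between them:

```shell
tmp=$(mktemp -d)
printf 'a b c=d\n' > "$tmp/one"
printf 'a b c=d\n' > "$tmp/two"

# The first file is split on whitespace (default FS); FS="=" applies only to the second.
result=$(awk '{ print NF }' "$tmp/one" FS='=' "$tmp/two")
rm -rf "$tmp"
printf '%s\n' "$result"
```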

My code is below. I hope it helps, even if it is not the most efficient answer.
awk '$NF ~ /=/ {gsub("="," # ",$NF)}{print $0}' file1 > file3
cat file3
xi=zaoshui jiao # #E0488_5#
chi=fan da qiu # #E0488_3#
gong=zuo you xi #E0977_5#
Then, as you described, use file3 in place of file1: if $NF of file3 matches $1 in file2, replace $NF of file3 with $2 from file2.
awk 'NR==FNR {a[$1]=$2;next}($NF in a){$NF=a[$NF]}1' file2 file3 | sed 's/ # /=/g'
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#

Related

awk-IF...ELSE IF issue in command

cat file1
xizaoshuijiao #E0488_5#
chifandaqiu #E0488_3#
gongzuoyouxi #E0977_5#
cat file2
#E0488_3# #E0488_3#
#E0488_5# #E0488_5#
#E0977_3# #E0977_3#
#E0977_5# #E0977_5#
#E0977_6# #E0977_6#
Purpose: if $NF of a line in file1 matches $1 of a line in file2, replace $NF in file1 with the corresponding $2 from file2. Otherwise, leave the line unchanged.
My code:
awk '\
NR==FNR{a[$1]=$1;b[$2]=$2;next}\
{if($NF in a)\
{$NF=b[FNR];print $0}\
else if!($NF in a)\
{print $0}\
}' file2 file1
Then it produced an error:
awk: cmd. line:5: else if!($NF in a)\
awk: cmd. line:5: ^ syntax error
awk: cmd. line:6: {print $0}\
awk: cmd. line:6: ^ syntax error
So the problem seems to be the "!". I want to print everything in file1, both changed and unchanged lines. How can I do that?
You can rewrite it in this form:
awk 'NR==FNR {a[$1]=$2; next}
$NF in a {$NF=a[$NF]}1' file2 file1
Since $1 and $2 hold the same values in your file2, the replacement has no visible effect.
Since you want to print every line unconditionally, do not print inside the conditional block. The trailing 1 is equivalent to {print}, which is the same as {print $0}.
Replace:
if!($NF in a)
With:
if(!($NF in a))
! is part of the test condition, and awk expects the whole test condition to be inside parentheses.
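A minimal sketch of the corrected if / else if form, with an illustrative one-entry lookup table map and two made-up input lines instead of the question's files:

```shell
result=$(printf '%s\n' 'x #E0488_5#' 'y #E9999_9#' |
  awk 'BEGIN { map["#E0488_5#"] = "#E21562_5#" }
       {
         if ($NF in map) {               # known key: replace, then print
           $NF = map[$NF]
           print
         } else if (!($NF in map)) {     # the negation sits inside the parentheses
           print
         }
       }')
printf '%s\n' "$result"
```

Since the second test is the exact opposite of the first, a plain else behaves the same way here.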
Here is my code after testing.
awk '\
NR==FNR{a[$1]=$1;b[$2]=$2;next}\
{if($NF in a)\
{$NF=b[FNR];print $0}\
else # Plain else works here, so else if is not needed. But why, and how can I write it as else if !($NF in a)?
{print $0}\
}' file2 file1

awk: two files are queried

I have two files
file1:
>string1<TAB>Name1
>string2<TAB>Name2
>string3<TAB>Name3
file2:
>string1<TAB>sequence1
>string2<TAB>sequence2
I want to use awk to compare column 1 of respective files. If both files share a column 1 value I want to print column 2 of file1 followed by column 2 of file2. For example, for the above files my expected output is:
Name1<TAB>sequence1
Name2<TAB>sequence2
This is my code:
awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1] = $1; next } $1 in a { print a[$2], $2 }' file1 file2 >out
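But all I get is an empty first column followed by the sequence.
Where is the error?
Your assignment is wrong: a[$1] = $1 stores the key itself instead of the name, and a[$2] looks up the wrong column. Store $2 under $1 while reading file1, then look up a[$1] while reading file2: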
$ awk 'BEGIN {FS=OFS="\t"}
NR==FNR {a[$1]=$2; next}
$1 in a {print a[$1],$2}' file1 file2
Name1 sequence1
Name2 sequence2

Compare all but last N Columns across two files in bash

I have 2 files: one with 18 columns; another with many more. I need to find the rows that mismatch on ONLY the first 18 columns while ignoring the rest in the other file. However, I need to preserve and print the entire row (cut will not work).
File 1:
F1 F2 F3....F18
A B C.... Y
AA BB CC... YY
File 2:
F1 F2 F3... F18... F32
AA BB CC... YY... 123
AAA BBB CCC... YYY...321
Output Not In File 1:
AAA BBB CCC YYY...321
Output Not In File 2:
A B C...Y
If possible, I would like to use diff or awk with as few loops as possible.
You can use awk:
awk '{k=""; for(i=1; i<=18; i++) k=k SUBSEP $i} FNR==NR{a[k]; next} !(k in a)' file1 file2
For each row in both files, we first build a key by concatenating the first 18 fields.
While reading the first file, we store this key in an associative array.
Finally, we print each row of the second file whose key is not found in the array.
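The same steps can be sketched on a smaller scale. This version assumes 3 key columns instead of 18, passed in through a hypothetical variable n, and uses made-up rows shaped like the example above:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'A B C' 'AA BB CC' > "$tmp/file1"
printf '%s\n' 'AA BB CC 123' 'AAA BBB CCC 321' > "$tmp/file2"

result=$(awk -v n=3 '
  { key = ""; for (i = 1; i <= n; i++) key = key SUBSEP $i }   # key from the first n fields
  FNR == NR { seen[key]; next }                                # first file: remember the key
  !(key in seen)                                               # second file: print unmatched rows whole
' "$tmp/file1" "$tmp/file2")
rm -rf "$tmp"
printf '%s\n' "$result"
```

Swapping the two file arguments gives the rows of file1 that are missing from file2.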
You can use grep:
grep -vf file1 file2
grep -vf <(cut -d" " -f1-18 file2) file1
To get the set differences in both directions you need a little more, similar to @anubhava's answer:
$ awk 'NR==FNR{f1[$0]; next}
{k=$1; for(i=2;i<=18;i++) k=k FS $i;
if(k in f1) delete f1[k];
else f2[$0]}
END{print "not in f1";
for(k in f2) print k;
print "\nnot in f2";
for(k in f1) print k}' file1 file2
This can be rewritten to preserve the order of file2:
$ awk 'NR==FNR{f1[$0]; next}
{k=$1; for(i=2;i<=18;i++) k=k FS $i;
if(k in f1) delete f1[k];
else {if(!p) print "not in f1";
f2[$0]; print; p=1}}
END{print "\nnot in f2";
for(k in f1) print k}' file1 file2

Compare each row of one file with each row of another file

I have two files:
file1
cat file1
A,1
B,2
C,2
D,3
file2
cat file2
A
A
A
B
B
C
C
D
Desired output
cat output
A,A,1
A,A,1
A,A,1
B,B,2
B,B,2
C,C,2
C,C,2
D,D,3
As you can see, each line of file1 should be matched with each line of file2 and if they match, the line from file1 should be added to the matched line in file2. I have tried join but it doesn't work. I guess the search needs to be recursive but I am not sure how to do it when two files are involved.
Any help would be greatly appreciated.
Thanks
Using awk:
awk -F, -v OFS=, 'FNR==NR {a[$1]=$0;next} $1 in a{print $1, a[$1]}' file1 file2
A,A,1
A,A,1
A,A,1
B,B,2
B,B,2
C,C,2
C,C,2
D,D,3
join -t',' -o'1.1,2.1,2.2' file2 file1
This line does it, provided both files are sorted on the join field.
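If the inputs are not already sorted, join may skip matches or complain. A hedged sketch, using small made-up unsorted inputs and hypothetical .sorted scratch files, sorts both sides first and then applies the same join options:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'D,3' 'B,2' 'A,1' > "$tmp/file1"
printf '%s\n' 'B' 'A' 'D' 'A' > "$tmp/file2"

# join expects both inputs ordered on the join field, so sort them first.
LC_ALL=C sort -t, -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort "$tmp/file2" > "$tmp/file2.sorted"

result=$(LC_ALL=C join -t, -o 1.1,2.1,2.2 "$tmp/file2.sorted" "$tmp/file1.sorted")
rm -rf "$tmp"
printf '%s\n' "$result"
```

The output follows the sorted order, so the original line order of file2 is not preserved; the awk version keeps it.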

extract different lines from files using Bash

I have two files and I use the "comm -23 file1 file2" command to extract the lines that are different from a file to another.
I would also need something that extracts the different lines but also preserves the string "line_$NR".
Example:
file1:
line_1: This is line0
line_2: This is line1
line_3: This is line2
line_4: This is line3
file2:
line_1: This is line1
line_2: This is line2
line_3: This is line3
I need this output:
differences file1 file2:
line_1: This is line0.
In short, I need to compare the lines as if they did not start with line_$NR, but the printed result must still include line_$NR.
Try using awk
awk -F: 'NR==FNR {a[$2]; next} !($2 in a)' file2 file1
Output:
line_1: This is line0
Short Description
awk -F: ' # Set the field separator to a colon: $1 holds line_<n>, $2 holds the text after it
NR==FNR { # NR equals FNR only while the first file (file2) is being read
a[$2]; # Store $2 as a key in the associative array a
next # Skip the remaining rules and go to the next record
}
!($2 in a) # Print the line if its $2 is not a key of array a
' file2 file1
Additional Requirement (In comment)
What if a line contains several ":" characters, as in "line_1: This :is: line0"?
Then it doesn't work. How can I split off only the line_x prefix?
In that case, try the following (GNU awk only):
awk -F'line_[0-9]+:' 'NR==FNR {a[$2]; next} !($2 in a)' file2 file1
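A possible alternative that does not depend on a regular-expression field separator is to cut the line at its first colon with index() and substr(). This is only a sketch with made-up two-line inputs; the names text and seen are illustrative:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'line_1: This :is: line0' 'line_2: This is line1' > "$tmp/file1"
printf '%s\n' 'line_1: This is line1' > "$tmp/file2"

result=$(awk '
  { text = substr($0, index($0, ":") + 1) }   # everything after the first colon
  NR == FNR { seen[text]; next }              # first file: remember the text
  !(text in seen)                             # second file: print lines with unseen text
' "$tmp/file2" "$tmp/file1")
rm -rf "$tmp"
printf '%s\n' "$result"
```

Later colons stay part of the compared text, so only the line_x prefix is ignored.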
This awk one-liner is longer, but it works no matter where the differences are located:
awk 'NR==FNR{a[$NF]=$0;next}a[$NF]{a[$NF]=0;next}7;END{for(x in a)if(a[x])print a[x]}' file1 file2
test:
kent$ head f*
==> f1 <==
line_1: This is line0
line_2: This is line1
line_3: This is line2
line_4: This is line3
==> f2 <==
line_1: This is line1
line_2: This is line2
line_3: This is line3
#test f1 f2
kent$ awk 'NR==FNR{a[$NF]=$0;next}a[$NF]{a[$NF]=0;next}7;END{for(x in a)if(a[x])print a[x]}' f1 f2
line_1: This is line0
#test f2 f1:
kent$ awk 'NR==FNR{a[$NF]=$0;next}a[$NF]{a[$NF]=0;next}7;END{for(x in a)if(a[x])print a[x]}' f2 f1
line_1: This is line0
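For readability, the one-liner can be spread out roughly as follows. This is a sketch rather than the original author's code: it uses in and delete instead of zeroing entries, the array name first is illustrative, and the input is the same f1 and f2 as in the test above. The 7 in the one-liner is just an always-true pattern, like 1.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'line_1: This is line0' 'line_2: This is line1' \
  'line_3: This is line2' 'line_4: This is line3' > "$tmp/f1"
printf '%s\n' 'line_1: This is line1' 'line_2: This is line2' \
  'line_3: This is line3' > "$tmp/f2"

result=$(awk '
  NR == FNR { first[$NF] = $0; next }        # first file: index each line by its last word
  $NF in first { delete first[$NF]; next }   # present in both files: drop it
  { print }                                  # only in the second file: print right away
  END { for (k in first) print first[k] }    # what remains was only in the first file
' "$tmp/f1" "$tmp/f2")
rm -rf "$tmp"
printf '%s\n' "$result"
```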
