awk-IF...ELSE IF issue in command - shell

cat file1
xizaoshuijiao #E0488_5#
chifandaqiu #E0488_3#
gongzuoyouxi #E0977_5#
cat file2
#E0488_3# #E0488_3#
#E0488_5# #E0488_5#
#E0977_3# #E0977_3#
#E0977_5# #E0977_5#
#E0977_6# #E0977_6#
Purpose: if $NF in file1 is found in $1 of file2, replace $NF in file1 with $2 of file2; otherwise, leave the line unchanged.
My code:
awk '\
NR==FNR{a[$1]=$1;b[$2]=$2;next}\
{if($NF in a)\
{$NF=b[FNR];print $0}\
else if!($NF in a)\
{print $0}\
}' file2 file1
Then I got this error:
awk: cmd. line:5: else if!($NF in a)\
awk: cmd. line:5: ^ syntax error
awk: cmd. line:6: {print $0}\
awk: cmd. line:6: ^ syntax error
So the "!" seems to be the problem. I want to print every line of file1, both the changed and the unchanged ones. How can I do that?

You can rewrite it in this form:
awk 'NR==FNR {a[$1]=$2; next}
$NF in a {$NF=a[$NF]}1' file2 file1
Since your file2 has the same values in $1 and $2, the replacement has no visible effect on this sample data.
Since you want to print unconditionally, don't print inside the condition block. Here 1 is an always-true pattern whose default action is {print}, which is the same as {print $0}.
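To make the role of the trailing 1 concrete, here is a minimal sketch on made-up input (the two lines "keep 1" and "swap 2" are hypothetical, not from the question). It compares the short pattern form with the spelled-out if/print form:

```shell
# Hypothetical two-line input, used only for this demonstration.
lines=$(printf '%s\n' 'keep 1' 'swap 2')

# Short form: modify matching records, then let the always-true pattern 1 print every record.
short=$(printf '%s\n' "$lines" | awk '$1 == "swap" { $2 = "changed" } 1')

# Long form: the same logic with an explicit if and an explicit print.
long=$(printf '%s\n' "$lines" | awk '{ if ($1 == "swap") $2 = "changed"; print $0 }')

printf '%s\n' "$short"
# keep 1
# swap changed
```

Both variables hold the same text, so the choice between the two forms is a matter of style.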

Replace:
if!($NF in a)
With:
if(!($NF in a))
The ! is part of the test condition, and awk expects the entire condition to be inside the parentheses that follow if.

Here is my code after verification.
awk '\
NR==FNR{a[$1]=$1;b[$2]=$2;next}\
{if($NF in a)\
{$NF=b[FNR];print $0}\
else # a plain else works here, so else if is not needed. But why? How can I do it with else if !($NF in a)?
{print $0}\
}' file2 file1
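On the "but why?" in the comment above: an else branch runs exactly when the if test is false, so repeating the negated test in an else if adds nothing. The sketch below compares both forms. It is an illustration under assumptions, not the asker's exact code: the replacement values #NEW_3# and #NEW_5# are made up so the substitution is visible, and the lookup is written as a[$1]=$2 with $NF=a[$NF] rather than b[FNR].

```shell
# Hypothetical sample data: file1 as in the question, file2 with made-up
# replacement values and no entry for #E0977_5#.
dir=$(mktemp -d)
printf '%s\n' 'xizaoshuijiao #E0488_5#' 'chifandaqiu #E0488_3#' 'gongzuoyouxi #E0977_5#' > "$dir/file1"
printf '%s\n' '#E0488_3# #NEW_3#' '#E0488_5# #NEW_5#' > "$dir/file2"

# Form 1: explicit else if, with the whole negated test inside the parentheses of if.
out_elseif=$(awk '
NR==FNR { a[$1] = $2; next }
{
    if ($NF in a) { $NF = a[$NF]; print $0 }
    else if (!($NF in a)) { print $0 }
}' "$dir/file2" "$dir/file1")

# Form 2: plain else, which covers exactly the case where the if test is false.
out_else=$(awk '
NR==FNR { a[$1] = $2; next }
{
    if ($NF in a) { $NF = a[$NF]; print $0 }
    else { print $0 }
}' "$dir/file2" "$dir/file1")

printf '%s\n' "$out_else"
# xizaoshuijiao #NEW_5#
# chifandaqiu #NEW_3#
# gongzuoyouxi #E0977_5#
```

Both forms print the same three lines; the third line is untouched because its last field is not a key in file2.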

Related

How to compare two different files and two different columns bash

file 1
Client ID,USER ID,DH SERV, ...
,abs,2022-04-24, ...
,btg,2022-04-24, ...
file 2
abs,124235235
dsg,262356527
If the second column of the first file equals the first column of the second file, put the second column of the second file into the first column of the first file.
I need to get:
Client ID,USER ID,DH SERV, ...
124235235,abs,2022-04-24, ...
,btg,2022-04-24, ...
How can I do this?
These are my attempts, but I don't understand them very well:
#!/bin/bash
#awk -F, 'FNR==NR{a[$1]=$0;next} ($1 in a){print $2,a[$1]}' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
#awk -F, 'NR==FNR{a[FNR]=$1; next} {$2 == a[FNR] ? a[FNR]","$0 : $0}' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
#awk -F, 'NR==FNR{a[FNR]=$1; next} {$2 == a[FNR] ? a[FNR]","$0 : $0}' wamfactory_6100.csv mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv > test
#awk -F, '{print FILENAME, NR, FNR, a[FNR]=$2,"||", b[NR]=$1}' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
#Work
#awk -F, 'NR==FNR{A[$2]; next}$1 in A' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
#awk -F, 'NR==FNR{A[NR]=$1; next}($2 in A) {print A[NR]}' wamfactory_6100.csv mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv> test
#awk -F, 'NR==FNR{A[$2]=$2; next}$1 in A{print A[$2]}' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
#awk -F, 'FNR==NR{A[$1]=$1; next}$2 in A{print A[$1]}' mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv wamfactory_6100.csv > test
awk -F, 'NR==FNR {arr[$1]=$2 $1; next}
{print arr[$1]","$0}
' wamfactory_6100.csv mgcom_Deloviye_Linii_RU_conve_2022-04-24.csv > test
$ awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1]=$2; next} $2 in a{$1=a[$2]} 1' file2 file1
Client ID,USER ID,DH SERV, ...
124235235,abs,2022-04-24, ...
,btg,2022-04-24, ...
This should do the trick:
awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=$2} NR!=FNR{if($2 in a){print a[$2],$2,$3,$4 }else{print $0}}' file2 file1
Client ID,USER ID,DH SERV, ...
124235235,abs,2022-04-24, ...
,btg,2022-04-24, ...
Just using bash's associative and indexed arrays:
f(){
    # split and join fields on commas, not whitespace
    local IFS=,
    # treat m as an associative array (key -> value)
    declare -A m
    # initialize m from the second file
    while read -r k v; do
        [[ -n $k ]] && m[$k]=$v
    done <"$2"
    # process the lines of the first file
    while read -ra i; do
        # the key is the second column
        k=${i[1]}
        # fill the first column when the key is known; otherwise leave the line as it is
        [[ -n $k && -n ${m[$k]+set} ]] && i[0]=${m[$k]}
        # print the line (quoting "${i[*]}" joins the fields with the first character of IFS)
        echo "${i[*]}"
    done <"$1"
}
f "file 1" "file 2"

Why does awk not work in a script file when selecting something between two files

In my project, I have two files.
The content of file1 is like:
bme-zhangyl
chem-abbott
chem-hef
chem-lijun
chem-liuch
chem-lix
chem-nisf
chem-quanm
chem-sunli
chem-taohq
chem-wanggc
chem-wangyg
The content of file2 is like:
bme-zhangyl bme-zhangmm
phy-dongert phy-zhangwq
chem-lijun phy-zhangwq
ls-liulj bio-chenw
phy-zhangyb phy-zhangwq
mee-xingw mee-rongym
cs-likm cs-hisao
cs-nany cs-hisao
cs-pengym cs-hisao
chem-quanm cs-hisao
cs-likq cs-hisao
cs-wujx cs-liuyp
mse-mar mse-liangyy
ccse-xiezy ccse-xiezy
maad-chensm maad-wanmp
Now I have a script file whose content is:
#!/bash/sh
for i in $(cat file1)
do
groupname=`awk '($1=='"$i"'){print $2}' file2`
echo $groupname
done
Unfortunately, it displays nothing.
I have tried another way:
#!/bash/sh
for i in $(cat file1)
do
groupname=`awk '{if($1=='"$i"')print $2}' file2`
echo $groupname
done
and
#!/bash/sh
for i in $(cat file1)
do
groupname=`awk '{if($1==$i)print $2}' file2`
echo $groupname
done
They all fail. Nothing looks wrong to me. Can anyone help?
The correct output should be:
bme-zhangmm
phy-zhangwq
cs-hisao
Using bare awk:
$ awk 'NR==FNR{a[$1];next}$1 in a{print $2}' file1 file2
Output:
bme-zhangmm
phy-zhangwq
cs-hisao
Explained:
$ awk '
NR==FNR { # hash the file1 strings
a[$1]
next
}
$1 in a { # if file2 field 1 keyword was hashed from file1
print $2 # output word from field 2
}' file1 file2
Update: as a script:
#!/bin/sh
awk 'NR==FNR{a[$1];next}$1 in a{print $2}' file1 file2
I have tested:
groupname=`awk '{if($1=="'"$i"'") print $2}' UGfrompwdguprst`
and it works.
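The loop versions in the question print nothing for two reasons. With '"$i"' spliced in unquoted on the awk side, awk sees $1==bme-zhangyl, which it reads as the subtraction of two unset variables (that is, 0). With $i inside the single quotes, awk sees its own $i, and since the awk variable i is unset this means $0. If the shell loop is wanted anyway, passing the value with -v avoids the quoting puzzle. The following is a minimal sketch on excerpts of the question's data; the awk variable name "name" and the temporary directory are assumptions made for the example.

```shell
# Hypothetical excerpts of file1 and file2 from the question.
dir=$(mktemp -d)
printf '%s\n' bme-zhangyl chem-abbott chem-lijun chem-quanm > "$dir/file1"
printf '%s\n' 'bme-zhangyl bme-zhangmm' 'phy-dongert phy-zhangwq' \
    'chem-lijun phy-zhangwq' 'chem-quanm cs-hisao' > "$dir/file2"

result=$(
    while IFS= read -r name; do
        # -v copies the shell value into the awk variable "name" as a string,
        # so awk compares $1 with the text instead of parsing it as an expression.
        awk -v name="$name" '$1 == name { print $2 }' "$dir/file2"
    done < "$dir/file1"
)

printf '%s\n' "$result"
# bme-zhangmm
# phy-zhangwq
# cs-hisao
```

Note that -v interprets backslash escapes in the value, which does not matter for names like these. The single awk call over both files shown earlier avoids starting one awk process per line.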

Syntax error in awk

I am writing a small programme that works as follows. It uses a file named hist1.dat, and the first step is:
awk '{if(NR>1) s+=$2*($1-x); x=$1}END{print s}' hist1.dat >int1.txt
hist1.dat looks like this:
0.259990113102 4
0.261752339307 10
0.263514565512 15
0.265276791717 35
0.267039017922 58
0.268801244127 84
0.270563470333 147
0.272325696537 217
0.274087922742 316
0.275850148947 410
0.277612375152 583
0.279374601357 750
0.281136827562 881
0.282899053767 1004
0.284661279972 1241
int1.txt contains the value 79.9504. The second step uses this value and processes it further:
awk '{if(NR>1) s+=$2*($1-x); print $1,s/c1; x=$1}' c1="$(cat int1.txt)" hist1.dat >cutoff1_org.txt
cutoff1_org.txt looks like this:
0.321668030277 0.832661
0.323430256483 0.851265
0.325192482688 0.867267
0.326954708893 0.882167
0.328716935097 0.895634
0.330479161302 0.906677
0.332241387507 0.917587
0.334003613712 0.927175
0.335765839917 0.935287
0.337528066122 0.9432
0.339290292327 0.94968
0.341052518532 0.955675
0.342814744737 0.961097
0.344576970942 0.96555
0.346339197147 0.969231
The next script uses cutoff1_org.txt and finds the value in column 1 whose corresponding value in column 2 is closest to 0.95. This works well too:
awk -v c=2 -v t=0.95 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$1}END{print v}' cutoff1_org.txt >final_cutoff1.txt
The next two scripts just use the value from final_cutoff1.txt, like this:
awk '{ if ($1 >= cutoff1) print $1 }' cutoff1="$(cat final_cutoff1.txt)" hist1.dat >hist_oc1.dat
awk '{ if ($1 <= cutoff1) print $1 }' cutoff1="$(cat final_cutoff1.txt)" hist1.dat >hist_uc1.dat
Now I want to put all of this inside a loop, like this:
for i in {1..22}; do
awk '{if(NR>1) s+=$2*($1-x); x=$1}END{print s}' hist${i}.dat >int${i}.txt
awk '{if(NR>1) s+=$2*($1-x); print $1,s/c${i}; x=$1}' c${i}="$(cat int${i}.txt)" hist${i}.dat >cutoff${i}_org.txt
awk -v c=2 -v t=0.95 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$1}END{print v}' cutoff${i}_org.txt >final_cutoff${i}.txt
awk '{ if ($1 >= cutoff${i}) print $1 }' cutoff${i}="$(cat final_cutoff${i}.txt)" hist${i}.dat >hist_oc${i}.dat
awk '{ if ($1 <= cutoff${i}) print $1 }' cutoff${i}="$(cat final_cutoff${i}.txt)" hist${i}.dat >hist_uc${i}.dat
cat hist_oc${i}.dat |stdev >stat_oc${i}.txt
cat hist_uc${i}.dat |stdev >stat_uc${i}.txt
done
However, I got errors like the following:
awk: cmd. line:1: {if(NR>1) s+=$2*($1-x); print $1,s/c${i}; x=$1}
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1: {if(NR>1) s+=$2*($1-x); print $1,s/c${i}; x=$1}
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1: { if ($1 >= cutoff${i}) print $1 }
awk: cmd. line:1: ^ syntax error
I think I have brain fade now. It seems like a very easy error to fix. Can anyone please point it out and help with it? Thanks a lot in advance.
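The likely cause: everything between single quotes reaches awk unchanged, so awk receives the literal text c${i} and cutoff${i}, which is not valid awk. Only the file names need the index; the awk variable can keep one fixed name. A minimal sketch under stated assumptions follows: the two small hist files are made up so the loop runs on its own, the loop covers 1 and 2 instead of {1..22}, the stdev steps are left out because that command is not standard, and the temporary directory "work" is hypothetical.

```shell
# Hypothetical sample data; the real hist${i}.dat files would replace it.
work=$(mktemp -d)
printf '%s\n' '1 10' '2 20' '3 30' > "$work/hist1.dat"
printf '%s\n' '1 5' '2 5' '3 10' '4 20' > "$work/hist2.dat"

for i in 1 2; do    # the question loops over {1..22}
    hist="$work/hist${i}.dat"

    # Step 1: integrate the histogram.
    awk '{ if (NR > 1) s += $2 * ($1 - x); x = $1 } END { print s }' \
        "$hist" > "$work/int${i}.txt"

    # Step 2: normalise the running integral. The awk variable is always
    # called c; only the file names carry the index.
    awk -v c="$(cat "$work/int${i}.txt")" \
        '{ if (NR > 1) s += $2 * ($1 - x); print $1, s / c; x = $1 }' \
        "$hist" > "$work/cutoff${i}_org.txt"

    # Step 3: program text copied from the question; only the file names change.
    awk -v c=2 -v t=0.95 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$1}END{print v}' \
        "$work/cutoff${i}_org.txt" > "$work/final_cutoff${i}.txt"

    # Steps 4 and 5: split at the cutoff, again with a fixed awk variable name.
    cutoff=$(cat "$work/final_cutoff${i}.txt")
    awk -v cutoff="$cutoff" '$1 >= cutoff { print $1 }' "$hist" > "$work/hist_oc${i}.dat"
    awk -v cutoff="$cutoff" '$1 <= cutoff { print $1 }' "$hist" > "$work/hist_uc${i}.dat"
done
```

The shell still expands ${i} in the file names because those sit in double quotes, outside the awk program text.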

Awk syntax error in loop

cat file1
xi=zaoshui jiao=#E0488_5#
chi=fan da qiu=#E0488_3#
gong=zuo you xi #E0977_5#
cat file2
#E0488_3# #E21562_3#
#E0488_5# #E21562_5#
#E0977_3# #E21630_3#
#E0977_5# #E21630_5#
#E0977_6# #E21631_1#
Purpose: if $NF in file1 is found in $1 of file2, replace $NF in file1 with $2 of file2; otherwise, leave the line unchanged.
My Code:
awk 'NR==FNR{a[$1]=$2;next}
{split($NF,a,"=");for($NF in a){$NF=a[$NF]}}1' test2.txt test1.txt
But it gives an error:
awk: cmd. line:1: NR==FNR{a[$1]=$2;next}{split($NF,a,"=");for($NF in a){$NF=a[$NF]}}1
awk: cmd. line:1: ^ syntax error
Does my code look right? It seems to be a syntax issue. How can I fix it?
My expected output:
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#
for($NF in a) is not valid syntax: $NF yields a value, while a for-in loop needs a variable name.
It should look like
for (var in array)
body
Read more: GNU AWK, Scanning-an-Array
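A minimal sketch of the for (var in array) form, assuming a made-up input string and the hypothetical array name parts. A separate array name is used on purpose: split() empties its target array before filling it, so calling split($NF, a, "=") as in the question would also wipe the lookup table built from the first file.

```shell
result=$(awk 'BEGIN {
    # split returns the number of pieces and stores them in parts[1..n]
    n = split("jiao=#E0488_5#", parts, "=")
    # the loop variable is a plain name; it receives each index of parts in turn
    for (idx in parts)
        count++
    print n, count, parts[n]
}')

printf '%s\n' "$result"
# 2 2 #E0488_5#
```

The order in which for-in visits the indices is unspecified, which is why the sketch only counts them and reads the last piece through parts[n].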
I used sub($NF,a[$NF]) to retain your original field separators: the last field of the last record is preceded by a space, whereas the last field of the other lines is preceded by =. This assumes the value of the last field does not also occur earlier in the line.
Test Results:
$ cat file1
xi=zaoshui jiao=#E0488_5#
chi=fan da qiu=#E0488_3#
gong=zuo you xi #E0977_5#
$ cat file2
#E0488_3# #E21562_3#
#E0488_5# #E21562_5#
#E0977_3# #E21630_3#
#E0977_5# #E21630_5#
#E0977_6# #E21631_1#
$ awk 'FNR==NR{a[$1]=$NF;next}($NF in a){sub($NF,a[$NF])}1' file2 FS='[ =]' file1
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#
I am not completely sure, but could you please try the following and let me know if it helps.
awk 'FNR==NR{a[$1]=$NF;next} ($NF in a){$0=substr($0,1,length($0)-length($NF)) a[$NF]} 1' file2 FS='[= ]' file1
The output will be as follows.
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#
EDIT: Adding an explanation.
awk '
FNR==NR{ ##FNR==NR is TRUE only while the first Input_file, file2, is being read.
a[$1]=$NF; ##Creating array a whose index is $1 of the current line and whose value is the last field of the current line.
next ##next skips all further statements for this line.
}
($NF in a){ ##Checking whether the last field of the current line of file1 is an index of array a.
$0=substr($0,1,length($0)-length($NF)) a[$NF] ##Rebuilding the line as everything before the last field plus the replacement, so the original separators are kept.
}
1 ##1 prints every line of file1.
' file2 FS='[= ]' file1 ##file2 is read with the default FS; FS is set to either = or space before file1 is read.
My code is below; I hope it is helpful even if it is not the most efficient answer.
awk '$NF ~ /=/ {gsub("="," # ",$NF)}{print $0}' file1 > file3
cat file3
xi=zaoshui jiao # #E0488_5#
chi=fan da qiu # #E0488_3#
gong=zuo you xi #E0977_5#
As you said, with file3 in place of file1: if $NF of file3 is found in $1 of file2, replace $NF of file3 with $2 of file2.
awk 'NR==FNR {a[$1]=$2;next}($NF in a){$NF=a[$NF]}1' file2 file3 | sed 's/ # /=/g'
xi=zaoshui jiao=#E21562_5#
chi=fan da qiu=#E21562_3#
gong=zuo you xi #E21630_5#

awk: two files are queried

I have two files
file1:
>string1<TAB>Name1
>string2<TAB>Name2
>string3<TAB>Name3
file2:
>string1<TAB>sequence1
>string2<TAB>sequence2
I want to use awk to compare column 1 of respective files. If both files share a column 1 value I want to print column 2 of file1 followed by column 2 of file2. For example, for the above files my expected output is:
Name1<TAB>sequence1
Name2<TAB>sequence2
this is my code:
awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1] = $1; next } $1 in a { print a[$2], $2 }' file1 file2 >out
But all I get is an empty first column followed by the sequence.
Where is the error?
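Your assignment is not right: you store $1 as the value and then look up a[$2], which was never set. Store $2 of file1 under the key $1, and look it up with $1 of file2.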
$ awk 'BEGIN {FS=OFS="\t"}
NR==FNR {a[$1]=$2; next}
$1 in a {print a[$1],$2}' file1 file2
Name1 sequence1
Name2 sequence2
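To see the difference side by side, here is a minimal sketch that writes the sample files with real tab characters (the temporary directory is an assumption made for the example) and runs both the question's program and the corrected one:

```shell
# Sample files from the question, written with real tabs.
dir=$(mktemp -d)
printf '>string1\tName1\n>string2\tName2\n>string3\tName3\n' > "$dir/file1"
printf '>string1\tsequence1\n>string2\tsequence2\n' > "$dir/file2"

# The question's program: a[$2] is looked up with a sequence as the key,
# which was never stored, so the first output column is empty.
broken=$(awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1] = $1; next } $1 in a { print a[$2], $2 }' \
    "$dir/file1" "$dir/file2")

# Corrected: store the name under the shared key and look it up with that key.
fixed=$(awk 'BEGIN{FS=OFS="\t"} NR==FNR { a[$1] = $2; next } $1 in a { print a[$1], $2 }' \
    "$dir/file1" "$dir/file2")

printf '%s\n' "$fixed"
```

The broken variant prints a tab followed by the sequence on each line, while the fixed variant prints the name, a tab, and the sequence.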
