AWK syntax error while using the IF statement - bash

I am very new to AWK although I have previously used the command prompt/terminal.
I have the script below, which creates subsets of data based on country code and state code, but I get a syntax error.
BEGIN{
FS = "\t"
OFS = "\t"
}
# Subset data from the states you need for all years
if ($5 == "IN-GA" || $5 == "IN-DD" || $5 == "IN-DN" || $5 == "IN-KA" || $5 == "IN-KL" || $5 == "IN-MH" || $5 == "IN-TN" || $5 == "IN-GJ"){
if (substr($17, 1, 4) == "2000"){
print $5, $12, $13, $14, $15, $16, $17, $22, $23, $24, $25, $26, $28 > "Y2000_India_sampling_output.txt"
}
}
On Cygwin, I run the script as shown below, and the syntax error appears immediately:
$ gawk -f sampling_India.awk sampling_relFeb-2017.txt
gawk: sampling_India.awk:20: gawk if ($5 == "IN-GA" || $5 == "IN-DD" || $5 == "IN-DN" || $5 == "IN-KA" || $5 == "IN-KL" || $5 == "IN-MH" || $5 == "IN-TN" || $5 == "IN-GJ"){
gawk: sampling_India.awk:20: ^ syntax error
Any thoughts?

Your if statement is not enclosed in a { ... } action block; outside of BEGIN/END, awk statements must live inside a pattern { action } rule.
Have it like this:
BEGIN {
FS = OFS = "\t"
}
# Subset data from the states you need for all years
$5 ~ /^IN-(GA|DD|DN|KA|KL|MH|TN|GJ)$/ && substr($17, 1, 4) == "2000" {
print $5, $12, $13, $14, $15, $16, $17, $22, $23, $24, $25, $26, $28 > "Y2000_India_sampling_output.txt"
}
Note how a regex lets you combine the multiple == comparisons into a single condition, and how that condition serves directly as the pattern, so no if is needed.
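As a self-contained sanity check (using made-up tab-separated rows with 17 fields, where only $5 and $17 carry meaningful values), the pattern-action version parses cleanly and selects only the matching row:

```shell
# Hypothetical sample: 17 tab-separated fields per row; only $5 and $17 matter here.
printf 'a\tb\tc\td\tIN-GA\tf\tg\th\ti\tj\tk\tl\tm\tn\to\tp\t2000-01-15\n' > sample.tsv
printf 'a\tb\tc\td\tIN-XX\tf\tg\th\ti\tj\tk\tl\tm\tn\to\tp\t2000-02-01\n' >> sample.tsv

awk 'BEGIN { FS = OFS = "\t" }
$5 ~ /^IN-(GA|DD|DN|KA|KL|MH|TN|GJ)$/ && substr($17, 1, 4) == "2000" {
    print $5, $17
}' sample.tsv
# only the IN-GA row is printed
```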

Related

AWK: Read a file "x" and compare its values against the values of Column 1 and Column 2 of file "y"

I'm trying to read a file and manipulate its column values. For a particular row in File X: if column 6 is 2, replace it with "REVERSE-CHECK"; also, if the row's column 2 matches column 2 of File Y and its column 3 matches column 1 of File Y, set column 7 of File X to "ACCEPTED", otherwise mark it "NON ACCEPTABLE".
File X:
2019-08-01 00:00:04,00000011111,0000002221,111111000000000,2,2,0
2019-08-01 00:00:08,00000011112,0000002222,211111000000000,2,12,0
2019-08-01 00:00:20,00000011113,0000002223,311111000000000,2,12,0
2019-08-01 00:00:04,00000011114,0000002224,411111000000000,2,2,0
2019-08-01 00:00:08,00000011115,0000002225,511111000000000,2,2,0
2019-08-01 00:00:20,00000011116,0000002226,611111000000000,2,8,0
File Y:
0000002221,00000011111
0000002226,00000011116
0000002223,00000011114
Expected Output:
2019-08-01 00:00:04,00000011111,0000002221,111111000000000,INTERESTING,REVERSE-CHECK,ACCEPTABLE
2019-08-01 00:00:08,00000011112,0000002222,211111000000000,INTERESTING,SIMPLE-CHECK,NON-ACCEPTABLE
2019-08-01 00:00:20,00000011113,0000002223,311111000000000,INTERESTING,SIMPLE-CHECK,NON-ACCEPTABLE
2019-08-01 00:00:04,00000011114,0000002224,411111000000000,INTERESTING,REVERSE-CHECK,NON-ACCEPTABLE
2019-08-01 00:00:08,00000011115,0000002225,511111000000000,INTERESTING,REVERSE-CHECK,NON-ACCEPTABLE
2019-08-01 00:00:20,00000011116,0000002226,611111000000000,INTERESTING,BASIC-CHECK,ACCEPTABLE
Code Block 1: This helped me manipulate the values of columns $5 and $6 easily.
awk -F, '{
if ( $5 == "1" )
$5 = "INTERESTING"
else if ( $5 == "2" )
$5 = "IMPORTANT";
else
$5 = "UNKNOWN";
if ( $6 == "2" )
$6="REVERSE-CHECK";
else if ( $6 == "12" )
$6="SIMPLE-CHECK";
else if ( $6 == "8" )
$6="BASIC-CHECK";
else
$6="UNHANDLED";
print }' OFS=, $exeDir/FileX.log > /home/standardOutput.log
Code Block 2: When I try to manipulate the value of the 7th column through nested checks, it doesn't work at all.
awk '
BEGIN { FS = OFS = ","
}
FNR == NR {
i[$1]=$1
j[$1]=$2
next
}
{ if($3 in i){
if ($2 in j){
$7 = "ACCEPTABLE";
}
}
else{
$7 = "NOT ACCEPTABLE";
}
}
1' FileY.log FileX.log
I'm having difficulty merging these two pieces of code. Please help.
Your posted expected output doesn't match the description of what you want to do, so I don't know whether this is right, but it does what I think you described:
$ cat tst.awk
BEGIN {
FS = OFS = ","
}
NR==FNR {
map[$2] = $1
next
}
{
$6 = ( $6 == 2 ? "REVERSE-CHECK" : $6 )
$7 = ( ($2 in map) && ($3 == map[$2]) ? "ACCEPTED" : "NON ACCEPTABLE" )
print
}
$ awk -f tst.awk fileY fileX
2019-08-01 00:00:04,00000011111,0000002221,111111000000000,2,REVERSE-CHECK,ACCEPTED
2019-08-01 00:00:08,00000011112,0000002222,211111000000000,2,12,NON ACCEPTABLE
2019-08-01 00:00:20,00000011113,0000002223,311111000000000,2,12,NON ACCEPTABLE
2019-08-01 00:00:04,00000011114,0000002224,411111000000000,2,REVERSE-CHECK,NON ACCEPTABLE
2019-08-01 00:00:08,00000011115,0000002225,511111000000000,2,REVERSE-CHECK,NON ACCEPTABLE
2019-08-01 00:00:20,00000011116,0000002226,611111000000000,2,8,ACCEPTED
Here is an interpretation of the code you posted that does produce the expected output you posted (though the relationship between the 2 fields from fileX and map[] is ambiguous in your question, so I'm guessing at what you really wanted):
$ cat tst.awk
BEGIN {
FS = OFS = ","
}
NR==FNR {
map[$2] = $1
next
}
{
if ( $5 == 1 ) $5 = "INTERESTING"
else if ( $5 == 2 ) $5 = "IMPORTANT"
else $5 = "UNKNOWN"
if ( $6 == 2 ) $6 = "REVERSE-CHECK"
else if ( $6 == 12 ) $6 = "SIMPLE-CHECK"
else if ( $6 == 8 ) $6 = "BASIC-CHECK"
else $6 = "UNHANDLED"
$7 = ( ($2 in map) && ($3 == map[$2]) ? "ACCEPTED" : "NON ACCEPTABLE" )
print
}
$ awk -f tst.awk fileY fileX
2019-08-01 00:00:04,00000011111,0000002221,111111000000000,IMPORTANT,REVERSE-CHECK,ACCEPTED
2019-08-01 00:00:08,00000011112,0000002222,211111000000000,IMPORTANT,SIMPLE-CHECK,NON ACCEPTABLE
2019-08-01 00:00:20,00000011113,0000002223,311111000000000,IMPORTANT,SIMPLE-CHECK,NON ACCEPTABLE
2019-08-01 00:00:04,00000011114,0000002224,411111000000000,IMPORTANT,REVERSE-CHECK,NON ACCEPTABLE
2019-08-01 00:00:08,00000011115,0000002225,511111000000000,IMPORTANT,REVERSE-CHECK,NON ACCEPTABLE
2019-08-01 00:00:20,00000011116,0000002226,611111000000000,IMPORTANT,BASIC-CHECK,ACCEPTED

Compare two files using awk having many columns and get the column in which data is different

file 1:
field1|field2|field3|
abc|123|234
def|345|456
hij|567|678
file2:
field1|field2|field3|
abc|890|234
hij|567|658
desired output:
field1|field2|field3|
abc|N|Y
def|345|456
hij|Y|N
I need to compare them: if the fields match, it should put Y, else N, in the output file.
Using awk, you may try this:
awk -F '|' 'FNR == NR {
p = $1
sub(p, "")
a[p] = $0
next
}
{
if (FNR > 1 && $1 in a) {
split(a[$1], b, /\|/)
printf "%s", $1 FS
for (i=2; i<=NF; i++)
printf "%s%s", ($i == b[i] ? "Y" : "N"), (i == NF ? ORS : FS)
}
else
print
}' file2 file1
field1|field2|field3|
abc|N|Y
def|345|456
hij|Y|N
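The FNR == NR guard is what restricts the first block to the first file named on the command line: NR counts records across all input files while FNR resets for each file, so the two are equal only while the first file is being read. A minimal illustration with two throwaway files:

```shell
printf 'a\nb\n' > f1
printf 'c\nd\n' > f2

# The first action fires only while reading f1; `next` skips the
# second action for those lines.
awk 'FNR == NR { print "first:", $0; next } { print "second:", $0 }' f1 f2
# prints:
# first: a
# first: b
# second: c
# second: d
```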

Subtract single largest number from multiple specific columns in awk

I have a comma delimited file that looks like
R,F,TE,K,G,R
1,0,12,f,1,18
2,1,17,t, ,17
3,1, , ,1,
4,0,15, ,0,16
Some items are missing, and the first row is a header that I want to ignore. I want to find the second smallest number in specific columns and subtract it from every element in that column, unless the element is the column's minimum. In this example, I want to subtract the second minimum from columns 3 and 6, so my final values would be:
R,F,TE,K,G,R
1,0,12,f,1,1
2,1, 2,t, ,0
3,1, , ,0,
4,0, 0, ,0,16
I tried handling single columns individually, hard-coding thresholds so that the chosen value is the second smallest:
awk 'BEGIN {FS=OFS=",";
};
{ min=1000000;
if($3<min && $3 != "" && $3>12) min = $3;
if($3>0) $3 = $3-min+1;
print}
END{print min}
' try1.txt
It finds the min all right, but the output is not as expected. There should be an easier way in awk.
I'd loop over the file twice, once to find the minima, once to adjust the values. It's a trade-off of time versus memory.
awk -F, -v OFS=, '
NR == 1 {min3 = $3; min6 = $6}
NR == FNR {if ($3 < min3) min3 = $3; if ($6 < min6) min6 = $6; next}
$3 != min3 {$3 -= min3}
$6 != min6 {$6 -= min6}
{print}
' try1.txt try1.txt
For prettier output:
awk -F, -v OFS=, '
NR == 1 {min3 = $3; min6 = $6; next}
NR == FNR {if ($3 < min3) min3 = $3; if ($6 < min6) min6 = $6; next}
FNR == 1 {len3 = length("" min3); len6 = length("" min6)}
$3 != min3 {$3 = sprintf("%*d", len3, $3-min3)}
$6 != min6 {$6 = sprintf("%*d", len6, $6-min6)}
{print}
' try1.txt try1.txt
Given the new requirements:
min2_3=$(cut -d, -f3 try1.txt | tail -n +2 | sort -n | grep -v '^ *$' | sed -n '2p')
min2_6=$(cut -d, -f6 try1.txt | tail -n +2 | sort -n | grep -v '^ *$' | sed -n '2p')
awk -F, -v OFS=, -v min2_3=$min2_3 -v min2_6=$min2_6 '
NR==1 {print; next}
$3 !~ /^ *$/ && $3 >= min2_3 {$3 -= min2_3}
$6 !~ /^ *$/ && $6 >= min2_6 {$6 -= min2_6}
{print}
' try1.txt
R,F,TE,K,G,R
1,0,12,f,1,1
2,1,2,t, ,0
3,1, , ,1,
4,0,0, ,0,16
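As a sketch of why the pipeline yields the second minimum: the numeric sort puts blank entries (which compare as 0) first, grep then drops them, and sed takes the second remaining line. Verifying against the sample file from the question:

```shell
cat > try1.txt <<'EOF'
R,F,TE,K,G,R
1,0,12,f,1,18
2,1,17,t, ,17
3,1, , ,1,
4,0,15, ,0,16
EOF

# Column 3 without the header is 12, 17, (blank), 15; after a numeric
# sort and dropping blanks, the second line is the second smallest value.
cut -d, -f3 try1.txt | tail -n +2 | sort -n | grep -v '^ *$' | sed -n '2p'
# prints: 15
```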
An alternative (this one requires gawk, since asort is a gawk extension): buffer the file, collect the non-empty numeric values of columns 3 and 6, sort them, and subtract the second smallest from every value except the minimum.
BEGIN {
    FS = OFS = ","
}
{
    if (NR == 1) { print; next }    # pass the header through
    if (+$3) a[NR] = $3             # remember non-empty numeric $3 values
    if (+$6) b[NR] = $6             # remember non-empty numeric $6 values
    s[NR] = $0                      # buffer the whole line for the END pass
}
END {
    asort(a, c)                     # c[1] = minimum, c[2] = second minimum
    asort(b, d)
    for (i = 2; i <= NR; i++) {
        split(s[i], t)
        if (t[3] != c[1] && +t[3] != 0) t[3] = t[3] - c[2]
        if (t[6] != d[1] && +t[6] != 0) t[6] = t[6] - d[2]
        print t[1], t[2], t[3], t[4], t[5], t[6]
    }
}

bash- replace empty field with . in piped statement

I'm looping over a list of files; for each file, I'm scanning for specific things to grep out.
# create .fus file
grep DIF $file | awk '{
if ( $7 != $13 )
print $4, "\t", $5, "\t", $20, "\t", $10, "\t", $11, "\t", $22, "\t", "Null";
}' > $file_name.fus
# create .inv_fus file
grep DIF $file | awk '{
if ( $7 == $13 )
print $4, "\t", $5, "\t", $20, "\t", $10, "\t", $11, "\t", $22, "\t", "Null";
}' > $file_name.inv_fus
# create .del file
echo -e '1\t1\t1\t1' > ${file_name}.del
echo -e '1\t1\t1\t3' >> ${file_name}.del
grep DEL ${file} | awk '{print $4, "\t", $5, "\t", $12, "\t", "2"}' >> ${file_name}.del
The first awk checks whether the values of columns 7 and 13 differ; if they do, it writes to a file.
The second awk checks whether the values are the same; if they are, it writes to a file. The third creates a file with 2 fixed lines, followed by the lines containing 'DEL'.
I use the output files to generate a plot, but this fails because some fields are empty. How can I change my code (the awk statement, I guess?) so that it checks columns 4, 5, 20, 10, 11 and 22 for empty fields and replaces them with dots '.'?
Like the other response said, there's a lot of simplification that could happen here, but without knowing your input or expected output it's hard to say which changes would be beneficial.
Regardless, the question seems to boil down to replacing empty fields with dots in your output for some process down the line. Adding a function like this to your awk scripts would seem to do the trick:
function clean() {
for(i = 1; i <= NF; i++) { if($i==""){$i="."}; }
}
For example, given this input in test.txt:
a1,a2,a3,a4
b1,,b3,b4
,,c3,
d1,d2,d3,
Running the following awk replaces the empty fields with periods.
awk -F',' 'function clean() {
for(i = 1; i <= NF; i++) { if($i==""){$i="."}; }
}
BEGIN {OFS=","}
{clean(); print;}' test.txt
Example output:
a1,a2,a3,a4
b1,.,b3,b4
.,.,c3,.
d1,d2,d3,.
Let's start by cleaning up your script. Replace the whole thing with just one simple awk command:
awk -v file_name="$file_name" '
BEGIN {OFS="\t"; print 1, 1, 1, 1 ORS 1, 1, 1, 3 > (file_name ".del")}
/DIF/ {print $4, $5, $20, $10, $11, $22, "Null" > (file_name "." ($7==$13?"inv_":"") "fus")}
/DEL/ {print $4, $5, $12, 2 > (file_name ".del")}
' "$file"
Now, update your question with sample input and expected output that captures what else you need it to do.
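The key trick in the consolidated script above is that the redirection target of print can be any parenthesized string expression, so one pass can route lines to different files. A minimal sketch with made-up four-field input (the names sample.log, out.fus, out.inv_fus, and out.del are hypothetical):

```shell
printf 'DIF x 1 1\nDIF y 1 2\nDEL z 3 4\n' > sample.log

# Each output filename is built by string concatenation: a DIF line
# whose $3 equals $4 goes to the "inv_fus" file, others to "fus".
awk -v name=out '
/DIF/ { print $2 > (name "." ($3 == $4 ? "inv_" : "") "fus") }
/DEL/ { print $2 > (name ".del") }
' sample.log

cat out.inv_fus   # prints: x
cat out.fus       # prints: y
cat out.del       # prints: z
```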

awk check if the value of a specific field exist in an array or a file

I have this code:
awk -F' *\\| *' 'FNR==NR {a[$1];next} $2==2 && $4==3 && $5 in a && ($3==11 || $8 in ..)' file1.conf file2
And I want to add another condition that verifies that field $8 exists in an array of values [1, 2, 3, 4, 5 ...].
I already use a file to check whether the value of $5 exists in it, but I don't know how to add another array to check the existence of the value of field $8 in an array or file.
here is the list.txt file:
1 #error code
2 #submit code
3 #delivery code
.
.
thank you
Assuming your list data is in a file called list.txt, you can modify your awk command to:
awk -F' *\\| *' -v lf='list.txt' 'BEGIN {
while ((getline p < lf) > 0) {
sub(/[ #].*$/, "", p);
d[p];
}
close(lf)
}
FNR==NR {
a[$1];next
}
$2==2 && $4==3 && $5 in a && ($3==11 || $8 in d)' file1.conf file2
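A quick self-contained check of the list-loading idiom (using the list.txt sample from the question and a throwaway stream of codes on stdin; the bare membership test $1 in d stands in for the full condition):

```shell
cat > list.txt <<'EOF'
1 #error code
2 #submit code
3 #delivery code
EOF

# Load the codes into array d in BEGIN, stripping the trailing comment;
# then print only the input lines whose first field is a known code.
printf '2\n9\n3\n' | awk -v lf='list.txt' '
BEGIN {
    while ((getline p < lf) > 0) {
        sub(/[ #].*$/, "", p)
        d[p]
    }
    close(lf)
}
$1 in d'
# prints: 2 and 3 (9 is not in the list)
```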
