Using AWK to check a six column txt file - bash

I am brand new to using Awk and I am running into a bit of a problem. I have multiple tab delimited text files that are made up of six columns. The column layout is:
col1=int
col2=float
col3=float
col4=int
col5=int
col6=DATE (yyyy-mm-dd)
The task at hand is basically to do a quality check on the text files to make sure that each column is of that type. I also need to skip the first line, since each tab delimited text file has a header. So far this is what I have:
#!/bin/sh
awk < file1.txt -F\\t '
{(NR!=1)}
{if ($1 != int($1)||($2 != /[0-9]+\.[0-9]*/)||($3 != /[0-9]+\.[0-9]*/)||($4 != int($4)||($5 != int($5))print "Error At " NR; }
'
I am not required to use Awk, it is just that it seemed the most appropriate.
EDIT 1:
#!/bin/sh
awk < file1.txt -F\\t '
{if (NR!=1){
if ($1 != int($1)) print "Error col1 at " NR;
else if ($4 != int($4)) print "Error col4 at " NR;
else if ($5 != int($5)) print "Error col5 at " NR;
}
}
'
This seems to work fine so my questions now are:
1- How do I check for floats?
2- How do I run this over multiple files?

If this isn't what you want then edit your question to include some sample input and expected output:
awk '
function act_type(n,   t) {
    if      (n ~ /^[0-9]{4}(-[0-9]{2}){2}$/) { t = "date" }
    else if (n ~ /^-?[0-9]+\.[0-9]+$/)       { t = "float" }
    else if (n ~ /^-?[0-9]+$/)               { t = "int" }
    return t
}
BEGIN { split("int float float int int date", exp_type) }
{
    for (i=1; i<=NF; i++) {
        if (act_type($i) != exp_type[i]) {
            print "Error col", i, "at", NR, "in", FILENAME | "cat>&2"
        }
    }
}
' file
Massage the regexps to suit your data (e.g. if your ints can start with + and/or include commas, then account for that in the regexp).

To test whether a field is a number, you can check if
$1 + 0 == $1
This works because adding 0 forces a numeric conversion: if the field isn't a number, the converted value no longer compares equal to the original string.
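A quick demonstration of the test (a sketch; awk compares numerically only when both sides look numeric, so fields like abc or 4x2 fail):

```shell
echo 'abc 42 3.14 4x2' |
awk '{ for (i = 1; i <= NF; i++) if ($i + 0 == $i) print $i " is a number" }'
# prints: 42 is a number
#         3.14 is a number
```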
To run a script on multiple files, you can just add them as extra parameters, e.g.
awk 'commands' file1 file2 file3
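Putting both answers back into the original six-column task, here is a sketch that runs the integer checks over any number of files, using FNR (the per-file line number) to skip each file's header and FILENAME to say where the error is. The filenames and message wording are placeholders:

```shell
awk -F'\t' '
FNR == 1 { next }   # skip the header line of each file
$1 != int($1) { print FILENAME ": line " FNR ": col1 is not an int" }
$4 != int($4) { print FILENAME ": line " FNR ": col4 is not an int" }
$5 != int($5) { print FILENAME ": line " FNR ": col5 is not an int" }
' file1.txt file2.txt
```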

Related

Comparing columns and printing comments in a new column based on column values

I have a file with multiple columns. I want to check the following conditions :
file.csv
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY
AIT-A;Y9A;RAIT;VIR;67;217;X;X
if $4 contains UNKNOWN print in a new error column "XTRUC is UNKNOWN "
Example :
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2;error
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY;"XTRUC is UNKNOWN."
if for the same value in $3 we have different values in $4 print in a new column "multiple XTRUC value for the same FNAME" and if the previous error exist print the new error in a new line in the same cell.
Example :
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2;error
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY;"XTRUC is UNKNOWN.
multiple XTRUC value for the same FNAME."
AIT-A;Y9A;RAIT;VIR;67;217;X;X;"multiple XTRUC value for the same FNAME"
if $5 and $6 do not match, or one or both of them contain something other than numbers, print the error in a new column: "XIZE NOK" and/or "XIZE2 NOK" and/or "XIZE and XIZE2 don't match", on a new line if previous errors exist in the same cell.
Example :
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2;error
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY;"XTRUC is UNKNOWN.
multiple XTRUC value for the same FNAME.
XIZE NOK."
AIT-A;Y9A;RAIT;VIR;67;217;X;X;"multiple XTRUC value for the same FNAME.
XIZE and XIZE2 don't match."
if $7 and $8 do not match print the error in a new column "ORG and ORG2 don't match" in a new line if previous errors exist in the same cell.
Example and expected result:
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2;error
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;X;"XTRUC is UNKNOWN.
multiple XTRUC value for the same FNAME.
XIZE NOK."
AIT-A;Y9A;RAIT;VIR;67;217;X;X Y;"multiple XTRUC value for the same FNAME.
XIZE and XIZE2 don't match.
ORG and ORG2 don't match."
Visual result from CSV file :
I tried to use multiple awk commands like :
awk '{if($5!=$6) print "XIZE and XIZE2 do not match" ; elif($5!='^[0-9]+$' print "`XIZE` NOK" ; elif($6!="^-\?[0-9]+$" print "`XIZE` NOK"}' file.csv
It didn't work, and with multiple conditions I wonder if there's a simpler way to do it.
I assume you want to add these messages to a new final column.
awk -F ';' 'BEGIN {OFS = FS}
{new_field = NF + 1}
$5 != $6 {$new_field = $new_field "XIZE and XIZE2 do not match\n"}
$5 !~ "^[0-9]+$" {$new_field = $new_field "`XIZE` NOK\n"}
$6 !~ "^-\\?[0-9]+$" {$new_field = $new_field "`XIZE2` NOK\n"}
{print}' file.csv > new-file.csv
This may output more newlines than you want. If that's a problem, it's possible to fix that, perhaps using an array and a for loop or building a string and adding it at print time (see below) instead of simple concatenation.
This script
sets the field delimiter for input (-F) and output (OFS) to a semicolon
calculates the field number of a new error field at the end of the row; it does this for each row, so it may differ if row lengths vary
for each true field test it concatenates a message to the error field
regex tests use the negated regex match operator !~
each field in each row is tested; the tests are not mutually exclusive (no else). If you want them to be mutually exclusive, you can change the tests back to using if and else
prints the whole row whether an error field was added or not
redirects the output to a new file
I used the shorter messages from your AWK script rather than the longer ones in your examples. You can easily change them if needed.
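For illustration, a mutually exclusive variant of those three tests (a sketch: only the first matching message is recorded per row, using the short messages from above):

```shell
awk -F ';' 'BEGIN { OFS = FS }
{
    new_field = NF + 1
    if      ($5 != $6)          $new_field = "XIZE and XIZE2 do not match"
    else if ($5 !~ /^[0-9]+$/)  $new_field = "XIZE NOK"
    else if ($6 !~ /^[0-9]+$/)  $new_field = "XIZE2 NOK"
    print                       # rows with no error print unchanged
}' file.csv
```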
Here is an array version that eliminates an excess newline and wraps the new field in quotes:
awk -F ';' 'BEGIN {OFS = FS}
NR == 1 {print; next}
{new_field = NF + 1; delete arr; i = 0; d = ""; msg = ""}
$5 != $6 {arr[i++] = "XIZE and XIZE2 do not match"}
$5 !~ "^[0-9]+$" {arr[i++] = "`XIZE` NOK"}
$6 !~ "^-\\?[0-9]+$" {arr[i++] = "`XIZE2` NOK"}
{
if (i > 0) {
msg = "\"";
for (idx = 0; idx < i; idx++) {
msg = msg d arr[idx];
d = "\n";
}
msg = msg "\"";
$new_field = msg;
};
print
}' file.csv > new-file.csv
I think this might be what you want:
$ cat tst.awk
BEGIN { FS=OFS=";" }
NR == 1 { print $0, "error"; next }
{ numErrs = 0 }
($4 == "UNKNOWN") { errs[++numErrs] = "XTRUC is UNKNOWN" }
($3 != $4) { errs[++numErrs] = "multiple XTRUC value for the same FNAME" }
($5 != $6) || ($5+0 != $5) || ($6+0 != $6) { errs[++numErrs] = "XIZE and XIZE2 don't match" }
($7 != $8) { errs[++numErrs] = "ORG and ORG2 don't match" }
{
printf "%s%s\"", $0, OFS
for ( errNr=1; errNr<=numErrs; errNr++ ) {
printf "%s%s", (errNr>1 ? "\n\t\t\t\t" : ""), errs[errNr]
}
print "\""
}
$ awk -f tst.awk file.csv
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2;error
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY;"XTRUC is UNKNOWN
multiple XTRUC value for the same FNAME
XIZE and XIZE2 don't match
ORG and ORG2 don't match"
AIT-A;Y9A;RAIT;VIR;67;217;X;X;"multiple XTRUC value for the same FNAME
XIZE and XIZE2 don't match"
If you don't REALLY want a bunch of white space at the start of the lines in the quoted fields (I only added it to get output that looks like what you said you wanted in your question), then just get rid of \t\t\t\t from the printf but leave the \n, i.e. printf "%s%s", (errNr>1 ? "\n" : ""), errs[errNr]. I'd normally print ORS instead of \n, but you may be doing this to create output for MS-Excel, in which case you'd set ORS="\r\n" in the BEGIN section and leave that printf with a \n in it for consistency with Excel's CSV format.
printf "AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY" | tr ';' '\n' > stack
[[ $(sed -n '/UNKNOWN/p' stack) ]] && printf "\"XTRUC is UNKNOWN\"" >> stack
tr '\n' ';' < stack > s2
You can do the same thing with whatever other tests you like. Just replace the semicolons with newlines, and then use ed or sed to read the line number corresponding to the line you want. After that, replace the newlines with semicolons again.

Editing text in Bash

I am trying to edit text in Bash. I got to a point where I am no longer able to continue, and I need help.
The text i need to edit:
Symbol Name Sector Market Cap, $K Last Links
AAPL
Apple Inc
Computers and Technology
2,006,722,560
118.03
AMGN
Amgen Inc
Medical
132,594,808
227.76
AXP
American Express Company
Finance
91,986,280
114.24
BA
Boeing Company
Aerospace
114,768,960
203.30
The text i need:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
I already tried :
sed 's/$/,/' BIPSukol.txt > BIPSukol1.txt | awk 'NR==1{print}' BIPSukol1.txt | awk '(NR-1)%5{printf "%s ", $0;next;}1' BIPSukol1.txt | sed 's/.$//'
But it doesn't quite do the job.
(BIPSukol1.txt is the name of the file I am editing)
The biggest problem you have is that you do not have consistent delimiters between your fields. Some have commas, some don't, and some are just a combination of 3 fields that happen to run together.
The tool you want is awk. It will allow you to treat the first line differently and then condition the output that follows with convenient counters you keep within the script. In awk you write rules (what comes between the outer {...}), and awk applies your rules in the order they are written. This allows you to fix up your haphazard format and arrive at the desired output.
The first rule, FNR==1, is applied to the 1st line. It loops over the fields, finds the problematic "Market Cap $K" field and treats it as one field, skipping beyond it to output the remaining headings. It stores the per-record line count n = NF - 3 (you only have 5 lines of data for each Symbol) and skips to the next record.
When count==n the next rule is triggered, which just outputs the records stored in the a[] array, zeros count and deletes the a[] array for refilling.
The next rule is applied to every record (line) of input from the 2nd on. It simply removes any whitespace from the fields by forcing awk to recalculate them with $1 = $1, and then stores the record in the array, incrementing count.
The last rule, END, is a special rule that runs after all records are processed (it lets you sum final tallies or output final lines of data). Here it is used to output the records that remain in a[] when the end of the file is reached.
Putting it all together in another cut at awk:
awk '
FNR==1 {
for (i=1;i<=NF;i++)
if ($i == "Market") {
printf ",Market Cap $K"
i = i + 2
}
else
printf (i>1?",%s":"%s"), $i
print ""
n = NF-3
count = 0
next
}
count==n {
for (i=1;i<=n;i++)
printf (i>1?",%s":"%s"), a[i]
print ""
delete a
count = 0
}
{
$1 = $1
a[++count] = $0
}
END {
for (i=1;i<=count;i++)
printf (i>1?",%s":"%s"), a[i]
print ""
}
' file
Example Use/Output
Note: you can simply select-copy the script above and then middle-mouse-paste it into an xterm with the directory set so it contains file (you will need to rename file to whatever your input filename is)
$ awk '
>   ... (the script above, pasted verbatim at the prompt) ...
> ' file
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
(note: it is unclear why you want the "Links" heading included since there is no information for that field -- but that is how your desired output is specified)
More Efficient No Array
Afterthoughts always creep in after you post an answer, no different from remembering a better way to answer a question as you are walking out of an exam.
The following does essentially the same thing, but without using arrays. Instead of buffering records in an array and printing them all at once, it simply formats and outputs the information as it goes. It was one of those afterthoughts:
awk '
FNR==1 {
for (i=1;i<=NF;i++)
if ($i == "Market") {
printf ",Market Cap $K"
i = i + 2
}
else
printf (i>1?",%s":"%s"), $i
print ""
n = NF-3
count = 0
next
}
count==n {
print ""
count = 0
}
{
$1 = $1
printf (++count>1?",%s":"%s"), $0
}
END { print "" }
' file
(same output)
With your shown samples, could you please try the following (written and tested in GNU awk). Considering your attempts, it assumes that after the header of Input_file you want to join every 5 lines into a single line.
awk '
BEGIN{
OFS=","
}
FNR==1{
NF--
match($0,/Market.*\$K/)
matchedPart=substr($0,RSTART,RLENGTH)
firstPart=substr($0,1,RSTART-1)
lastPart=substr($0,RSTART+RLENGTH)
gsub(/,/,"",matchedPart)
gsub(/ +/,",",firstPart)
gsub(/ +/,",",lastPart)
print firstPart matchedPart lastPart
next
}
{
sub(/^ +/,"")
}
++count==5{
print val,$0
count=0
val=""
next
}
{
val=(val?val OFS:"")$0
}
' Input_file
OR if your awk doesn't support NF--, then try the following.
awk '
BEGIN{
OFS=","
}
FNR==1{
match($0,/Market.*\$K/)
matchedPart=substr($0,RSTART,RLENGTH)
firstPart=substr($0,1,RSTART-1)
lastPart=substr($0,RSTART+RLENGTH)
gsub(/,/,"",matchedPart)
gsub(/ +/,",",firstPart)
gsub(/ +Links( +)?$/,"",lastPart)
gsub(/ +/,",",lastPart)
print firstPart matchedPart lastPart
next
}
{
sub(/^ +/,"")
}
++count==5{
print val,$0
count=0
val=""
next
}
{
val=(val?val OFS:"")$0
}
' Input_file
NOTE: Your header/first line needs special manipulation because we can't simply turn every space into a comma, so this solution takes care of it separately, as per the shown samples.
With GNU awk. If your first line is always the same.
echo 'Symbol,Name,Sector,Market Cap $K,Last,Links'
awk 'NR>1 && NF=5' RS='\n ' ORS='\n' FS='\n' OFS=',' file
Output:
Symbol,Name,Sector,Market Cap $K,Last,Links
AAPL,Apple Inc,Computers and Technology,2,006,722,560,118.03
AMGN,Amgen Inc,Medical,132,594,808,227.76
AXP,American Express Company,Finance,91,986,280,114.24
BA,Boeing Company,Aerospace,114,768,960,203.30
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
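Why that one-liner works (a sketch with made-up input, assuming each block after the header starts with a leading space, as RS='\n ' implies): RS='\n ' makes each space-indented block one record, FS='\n' turns each of its lines into a field, and assigning NF=5 both drops any stray extra field and forces awk to rebuild the record with OFS=','. Parentheses are added around the assignment here for clarity:

```shell
printf 'HDR\n A\nB\nC\nD\nE\n F\nG\nH\nI\nJ\n' |
awk 'NR>1 && (NF=5)' RS='\n ' ORS='\n' FS='\n' OFS=','
# prints: A,B,C,D,E
#         F,G,H,I,J
```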

bash - select columns based on values

I am new to bash and have the below requirement:
I have a file as below:
col1,col2,col3....col25
s1,s2,s2..........s1
col1,col2,col3....col25
s3,s2,s2..........s2
If you notice the values of these columns can be of 3 types only: s1,s2,s3
I can extract the last 2 rows from the given file, which gives me:
col1,col2,col3....col25
s3,s1,s2..........s2
I want to further parse the above lines so that I get only the columns with say value s1.
Desired output:
say col3 and col25 are the only columns with value s2; then a comma separated list is also fine, e.g.:
col3,col25
Can someone please help?
P.S. I found many examples where a file is parsed based on the value of, say, the 2nd (fixed) column, but how do we do it when the column number is not fixed?
Checked URLs:
awk one liner select only rows based on value of a column
Assumptions:
there are 2 input lines
each input line has the same number of comma-separated items
We can use a couple arrays to collect the input data, making sure to use the same array indexes. Once the data is loaded into arrays we loop through the array looking for our value match.
$ cat col.awk
/col1/ { for (i=1; i<=NF; i++) { arr_c[i]=$i } ; n=NF }
! /col1/ { for (i=1; i<=NF; i++) { arr_s[i]=$i } }
END {
sep=""
for (i=1; i<=n; i++)
{ if (arr_s[i]==smatch)
{ printf "%s%s" ,sep,arr_c[i]
sep=", "
}
}
}
/col1/ : for the line that contains col1, store the fields in array arr_c
n=NF : grab our max array index value (NF=number of fields)
! /col1/ : for line that does not contain col1, store the fields in array arr_s
END ... : executed once the arrays have been loaded
sep="" : set our initial output separator to a null string
for (...) : loop through our array indexes (1 to n)
if (arr_s[i]==smatch) : if the s array value matches our input parameter (smatch - see below example), then ...
printf "%s%s",sep,arr_c[i] : printf our sep and the matching c array item, then ...
sep=", " : set our separator for the next match in the loop
We use printf because without specifying '\n' (a new line), all output goes to one line.
Example:
$ cat col.out
col1,col2,col3,col4,col5
s3,s1,s2,s1,s3
$ awk -F, -f col.awk smatch=s1 col.out
col2, col4
-F, : define the input field separator as a comma
here we pass in our search pattern s1 via the variable named smatch, which is referenced in the awk code (see col.awk above)
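As an aside: var=value arguments placed after the script are assigned only when awk reaches them in the argument list, which is why smatch is set by the time the END block runs. The more common alternative is -v, which assigns the variable before the script starts (a minimal sketch with a made-up variable name):

```shell
# -v makes the variable available even in the BEGIN block
echo x | awk -v who=world 'BEGIN { print "hello, " who }'
# prints: hello, world
```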
If you want to do the whole thing at the command line:
$ awk -F, '
/col1/ { for (i=1; i<=NF; i++) { arr_c[i]=$i } ; n=NF }
! /col1/ { for (i=1; i<=NF; i++) { arr_s[i]=$i } }
END {
sep=""
for (i=1; i<=n; i++)
{ if (arr_s[i]==smatch)
{ printf "%s%s" ,sep,arr_c[i]
sep=", "
}
}
}
' smatch=s1 col.out
col2, col4
Or collapsing the END block to a single line:
awk -F, '
/col1/ { for (i=1; i<=NF; i++) { arr_c[i]=$i } ; n=NF }
! /col1/ { for (i=1; i<=NF; i++) { arr_s[i]=$i } }
END { sep="" ; for (i=1; i<=n; i++) { if (arr_s[i]==smatch) { printf "%s%s" ,sep,arr_c[i] ; sep=", " } } }
' smatch=s1 col.out
col2, col4
I'm not so good with awk, but here is something that seems to work, outputting only the column names whose corresponding values are s1 :
#<yourTwoLines> |
tac |
awk -F ',' 'NR == 1 { for (f=1; f<=NF; f++) { relevant[f]= ($f == "s1") } };
NR == 2 { for (f=1; f<=NF; f++) { if(relevant[f]) print($f) } }'
It works in the following way :
reverse the lines order with tac, so the value (criteria) are handled before the headers (which we will print based on the criteria).
when handling the first line (now values) with awk, store in an array which ones are s1
when handling the second line (now headers) with awk, print those who correspond to an s1 value thanks to the previously filled array.
A solution in awk that prints a result row after parsing each set of 2 rows.
$ cat tst.awk
BEGIN { FS=","; p=0 }
/s1|s2|s3/ {
    for (i=1; i<=NF; i++) {
        if ($i=="s2") str = sprintf("%s%s", str ? str ", " : str, c[i])
    }
    p=1
}
!p { for (i=1; i<=NF; i++) { c[i] = $i } }
p  { print str; p=0; str="" }
Rationale: build up your result string str while looping through the value row.
whenever the input contains s1, s2 or s3, loop through the fields and, if a value equals s2, add the column name c[i] to the result string str; set the print flag p to 1.
if p = 0 build up column array
if p = 1 print resultstring str
With input:
$ cat input.txt
col1,col2,col3,col4,col5
s1,s2,s2,s3,s1
col1,col2,col3,col4,col5
s1,s1,s2,s3,s3
col1,col2,col3,col4,col5
s1,s1,s1,s3,s3
col1,col2,col3,col4,col5
s1,s1,s2,s3,s3
The result is:
$ awk -f tst.awk input.txt
col2, col3
col3
col3
Notice the empty 3rd line: no s2's for that one.
Let's say you have this:
cat file
col1,col2,col3,..,col25
s3,s1,s2,........,s2
Then you can use this awk:
awk -F, -v val='s2' '{
s="";
for (i=1; i<=NF; i++)
if (NR==1)
hdr[i]=$i
else if ($i==val)
s=s hdr[i] FS;
if (s) {
sub(/,$/, "", s);
print s
}
}' file
col3,col25
If the order of the columns returned is not a concern:
awk -F"," 'NR==1{for(i=1;i<=NF;i++){a[i]=$i};next}{for(i=1;i<=NF;i++){if($i=="s2")b[i]=$i}}END{for( i in b) m=m a[i]","; gsub(/,$/,"", m); print m }'

Got stuck with multiple value validation against in particular columns in awk?

I have a text file and I'm trying to validate a particular column (5). If that column contains only values like ACT, LFP, TST and EPO, the file goes on for further processing; otherwise the script should exit. In other words, if column number 5 contains only those four values the file is processed further; if the column contains anything apart from those four values, the script terminates.
Code
cat test.txt \
| awk -F '~' -v ERR="/a/x/ERROR" -v NAME="/a/x/z/" -v WRKD="/a/x/b/" -v DATE="23_09_16" -v PD="234" -v FILE_NAME="FILENAME" \
'{ if ($5 != "ACT" || $5 != "LFP" || $5 != "EPO" || $5 != "TST")
system("mv "NAME" "ERR);
system("rm -f"" "WRKD);
print DATE" " PD " " "[" FILE_NAME "]" " ERROR: Panel status contains invalid value due to this file move to error folder";
print DATE" " PD " " "[" FILE_NAME "]" " INFO: Script is exited";
system("exit");
}' >>log.txt
Txt file: test.txt (Note: this file should be processed successfully)
161518~CHEM~ACT~IRPMR~ACT~UD
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~LFP~UD
030767~CHEM~ACT~IRPMR~LFP~UD
Txt file: test1.txt (Note: this file should not be processed successfully; it contains one invalid value)
161518~CHEM~ACT~IRPMR~**ACT1**~UD
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~TST~UD
030767~CHEM~ACT~IRPMR~LFP~UD
awk to the rescue!
Let's assume the following input file:
010282~CHEM~ACT~IRPMR~ACT~UD
121212~CHEM~ACT~IRPMR~ZZZ~UD
162794~CHEM~ACT~IRPMR~TST~UD
020202~CHEM~ACT~IRPMR~YYY~UD
030767~CHEM~ACT~IRPMR~LFP~UD
987654~CHEM~ACT~IRPMR~EPO~UD
010101~CHEM~ACT~IRPMR~XXX~UD
123456~CHEM~ACT~IRPMR~TST~UD
1) This example illustrates how to check for invalid lines/records in the input file:
#!/bin/awk
BEGIN {
FS = "~"
s = "ACT,LFP,TST,EPO"
n = split( s, a, "," )
}
{
for( i = 1; i <= n; i++ )
if( a[i] == $5 )
next
print "Unexpected value # line " NR " [" $5 "]"
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
Unexpected value # line 2 [ZZZ]
Unexpected value # line 4 [YYY]
Unexpected value # line 7 [XXX]
2) This example illustrates how to filter out (remove) invalid lines/records from the input file:
#!/bin/awk
BEGIN {
FS = "~"
s = "ACT,LFP,TST,EPO"
n = split( s, a, "," )
}
{
for( i = 1; i <= n; i++ )
{
if( a[i] == $5 )
{
print $0
next
}
}
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~TST~UD
030767~CHEM~ACT~IRPMR~LFP~UD
987654~CHEM~ACT~IRPMR~EPO~UD
123456~CHEM~ACT~IRPMR~TST~UD
3) This example illustrates how to display the invalid lines/records from the input file:
#!/bin/awk
BEGIN {
FS = "~"
s = "ACT,LFP,TST,EPO"
n = split( s, a, "," )
}
{
for( i = 1; i <= n; i++ )
if( a[i] == $5 )
next
print $0
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
121212~CHEM~ACT~IRPMR~ZZZ~UD
020202~CHEM~ACT~IRPMR~YYY~UD
010101~CHEM~ACT~IRPMR~XXX~UD
Hope it Helps!
Without getting into the calls to system, this will show you an answer.
awk -F"~" '{ if (! ($5 == "ACT" || $5 == "LFP" || $5 == "EPO" || $5 == "TST")) print $0}' data.txt
output
161518~CHEM~ACT~IRPMR~**ACT1**~UD
This version tests whether $5 matches at least one item in the list. If it doesn't (which is what the ! in front of the || chain tests), it prints the record as an error.
Of course, $5 will match only one from that list at a time, but that is all you need.
By contrast, when you say
if ($5 != "ACT" || $5 != "LFP" ...)
you're creating a logic test that can never be false. $5 cannot equal all four strings at once, so at least one of the != comparisons must be true; as soon as one is, the || chain succeeds and the remaining comparisons are not even checked.
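The contrast is easy to see with a two-line sample (a sketch: bad1 counts matches of the always-true != chain, bad2 matches of the correct negated == chain):

```shell
printf '161518~CHEM~ACT~IRPMR~ACT~UD\n161519~CHEM~ACT~IRPMR~BAD~UD\n' |
awk -F'~' '
    ($5 != "ACT" || $5 != "LFP")                                { bad1++ }
    !($5 == "ACT" || $5 == "LFP" || $5 == "EPO" || $5 == "TST") { bad2++ }
    END { print bad1, bad2 }
'
# prints: 2 1   (the != chain flags both lines, the == chain only the bad one)
```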
IHTH

Display input file without print in awk

My code is in the middle of manipulating two input files.
awk -F'|' -v PARM_VAL="${PARM_VALUE[*]}" '
BEGIN { split(PARM_VAL,pa," ") }
FNR==NR
{
for(i=1;i<=NF;i++)
a[NR,i]=$i;
}
END {printf "second value of SPPIN : "a[2,2]", parm : "pa[2]", File val : " FILENAME "First rec of SPPOUT: " $0 ;printf "\n" } ' SPP_IN SPP_OUT
I am passing a parameter array to awk and storing the first input file in an array. I just executed the above command.
My first input file gets displayed without any print statement. Is there a way to suppress or avoid that?
Don't split FNR == NR and the { of the action.
FNR == NR
{
Put them on the same line instead.
FNR == NR {
awk is seeing FNR==NR as a pattern without an action and using the default action of print.
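A minimal demonstration of that default action, which is exactly what made the file appear without a print:

```shell
# A bare pattern with no action prints every record it matches.
printf 'a\nb\nc\n' | awk 'NR <= 2'
# prints: a
#         b
```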
