Stuck on validating multiple values against a particular column in awk? - shell

I have a text file that I'm trying to validate against a particular column (5). If that column contains one of the values ACT, LFP, TST or EPO, the file goes on for further processing; otherwise the script should exit. In other words, if every record's column 5 is ACT, LFP, TST or EPO, processing continues; if the column contains anything apart from those four values, the script should terminate.
Code
cat test.txt \
| awk -F '~' -v ERR="/a/x/ERROR" -v NAME="/a/x/z/" -v WRKD="/a/x/b/" -v DATE="23_09_16" -v PD="234" -v FILE_NAME="FILENAME" \
'{ if ($5 != "ACT" || $5 != "LFP" || $5 != "EPO" || $5 != "TST")
system("mv "NAME" "ERR);
system("rm -f"" "WRKD);
print DATE" " PD " " "[" FILE_NAME "]" " ERROR: Panel status contains invalid value due to this file move to error folder";
print DATE" " PD " " "[" FILE_NAME "]" " INFO: Script is exited";
system("exit");
}' >>log.txt
Txt file: test.txt (Note: this file should be processed successfully)
161518~CHEM~ACT~IRPMR~ACT~UD
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~LFP~UD
030767~CHEM~ACT~IRPMR~LFP~UD
Txt file: test1.txt (Note: this file should not be processed successfully; it contains one invalid value)
161518~CHEM~ACT~IRPMR~**ACT1**~UD
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~TST~UD
030767~CHEM~ACT~IRPMR~LFP~UD

awk to the rescue!
Let's assume the following input file:
010282~CHEM~ACT~IRPMR~ACT~UD
121212~CHEM~ACT~IRPMR~ZZZ~UD
162794~CHEM~ACT~IRPMR~TST~UD
020202~CHEM~ACT~IRPMR~YYY~UD
030767~CHEM~ACT~IRPMR~LFP~UD
987654~CHEM~ACT~IRPMR~EPO~UD
010101~CHEM~ACT~IRPMR~XXX~UD
123456~CHEM~ACT~IRPMR~TST~UD
1) This example illustrates how to check for invalid lines/records in the input file:
#!/bin/awk
BEGIN {
    FS = "~"
    s = "ACT,LFP,TST,EPO"
    n = split( s, a, "," )
}
{
    for( i = 1; i <= n; i++ )
        if( a[i] == $5 )
            next
    print "Unexpected value # line " NR " [" $5 "]"
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
Unexpected value # line 2 [ZZZ]
Unexpected value # line 4 [YYY]
Unexpected value # line 7 [XXX]
2) This example illustrates how to filter out (remove) invalid lines/records from the input file:
#!/bin/awk
BEGIN {
    FS = "~"
    s = "ACT,LFP,TST,EPO"
    n = split( s, a, "," )
}
{
    for( i = 1; i <= n; i++ )
    {
        if( a[i] == $5 )
        {
            print $0
            next
        }
    }
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
010282~CHEM~ACT~IRPMR~ACT~UD
162794~CHEM~ACT~IRPMR~TST~UD
030767~CHEM~ACT~IRPMR~LFP~UD
987654~CHEM~ACT~IRPMR~EPO~UD
123456~CHEM~ACT~IRPMR~TST~UD
3) This example illustrates how to display the invalid lines/records from the input file:
#!/bin/awk
BEGIN {
    FS = "~"
    s = "ACT,LFP,TST,EPO"
    n = split( s, a, "," )
}
{
    for( i = 1; i <= n; i++ )
        if( a[i] == $5 )
            next
    print $0
}
# eof #
Testing:
$ awk -f script.awk -- input.txt
121212~CHEM~ACT~IRPMR~ZZZ~UD
020202~CHEM~ACT~IRPMR~YYY~UD
010101~CHEM~ACT~IRPMR~XXX~UD
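4) A further sketch (an addition, not part of the answer above): if, as in your original script, you want the run to stop as soon as an invalid value appears, exit with a non-zero status and let the calling shell decide whether to continue (the mv/rm housekeeping from your script is left out here):
#!/bin/awk
BEGIN {
    FS = "~"
    s = "ACT,LFP,TST,EPO"
    n = split( s, a, "," )
}
{
    for( i = 1; i <= n; i++ )
        if( a[i] == $5 )
            next
    # invalid value: report it and stop with a failing exit status
    print "Unexpected value # line " NR " [" $5 "]"
    exit 1
}
# eof #
Testing (the shell can branch on awk's exit status):
$ awk -f script.awk -- input.txt && echo "file OK, continue" || echo "invalid value, stop"
Unexpected value # line 2 [ZZZ]
invalid value, stop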
Hope it Helps!

Without getting into the calls to system, this will show you an answer.
awk -F"~" '{ if (! ($5 == "ACT" || $5 == "LFP" || $5 == "EPO" || $5 == "TST")) print $0}' data.txt
output
161518~CHEM~ACT~IRPMR~**ACT1**~UD
This version tests whether $5 matches at least one item in the list. If it doesn't (which is what the ! in front of the parenthesised || chain checks), the record is printed as an error.
Of course, $5 will match only one from that list at a time, but that is all you need.
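An equivalent membership test can also be written as one anchored regex alternation (just a compact sketch, not taken from your script; the ^ and $ anchors matter so that a value like ACT1 cannot slip through):
awk -F"~" '$5 !~ /^(ACT|LFP|EPO|TST)$/' data.txt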
By contrast, when you say
if ($5 != "ACT" || $5 != "LFP" ...)
you're creating a logic test that can never be false. If $5 does not equal "ACT" (because it is, say, "LFP"), the first comparison is already true, so the || chain short-circuits and the whole condition succeeds without checking the rest; and if $5 is "ACT", the second comparison ($5 != "LFP") makes it true instead. Either way the condition is true for every record, which is why every line is treated as invalid.
IHTH

Related

Comparing columns and printing comments in a new column based on column values

I have a file with multiple columns. I want to check the following conditions :
file.csv
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY
AIT-A;Y9A;RAIT;VIR;67;217;X;X
if $4 contains UNKNOWN print in a new error column "XTRUC is UNKNOWN "
Example :
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2;error
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY;"XTRUC is UNKNOWN."
if, for the same value in $3, we have different values in $4, print "multiple XTRUC value for the same FNAME" in a new column, and if the previous error exists, print the new error on a new line in the same cell.
Example :
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2;error
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY;"XTRUC is UNKNOWN.
multiple XTRUC value for the same FNAME."
AIT-A;Y9A;RAIT;VIR;67;217;X;X;"multiple XTRUC value for the same FNAME"
if $5 and $6 do not match, or one or both of them contain something other than numbers, print the error in a new column: "XIZE NOK" and/or "XIZE2 NOK" and/or "XIZE and XIZE2 don't match", on a new line if previous errors exist in the same cell.
Example :
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2;error
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY;"XTRUC is UNKNOWN.
multiple XTRUC value for the same FNAME.
XIZE NOK."
AIT-A;Y9A;RAIT;VIR;67;217;X;X;"multiple XTRUC value for the same FNAME.
XIZE and XIZE2 don't match."
if $7 and $8 do not match print the error in a new column "ORG and ORG2 don't match" in a new line if previous errors exist in the same cell.
Example and expected result:
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2;error
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;X;"XTRUC is UNKNOWN.
multiple XTRUC value for the same FNAME.
XIZE NOK."
AIT-A;Y9A;RAIT;VIR;67;217;X;X Y;"multiple XTRUC value for the same FNAME.
XIZE and XIZE2 don't match.
ORG and ORG2 don't match."
I tried to use multiple awk commands like :
awk '{if($5!=$6) print "XIZE and XIZE2 do not match" ; elif($5!='^[0-9]+$' print "`XIZE` NOK" ; elif($6!="^-\?[0-9]+$" print "`XIZE` NOK"}' file.csv
It didn't work, and with multiple conditions I wonder if there's a simpler way to do it.
I assume you want to add these messages to a new final column.
awk -F ';' 'BEGIN {OFS = FS}
{new_field = NF + 1}
$5 != $6 {$new_field = $new_field "XIZE and XIZE2 do not match\n"}
$5 !~ "^[0-9]+$" {$new_field = $new_field "`XIZE` NOK\n"}
$6 !~ "^-\\?[0-9]+$" {$new_field = $new_field "`XIZE` NOK\n"}
{print}' file.csv > new-file.csv
This may output more newlines than you want. If that's a problem, it's possible to fix that, perhaps using an array and a for loop or building a string and adding it at print time (see below) instead of simple concatenation.
This script
sets the field delimiter for input (-F) and output (OFS) to a semicolon
calculates the field number of a new error field at the end of the row; it does this for each row, so it may differ if row lengths vary
for each true field test it concatenates a message to the error field
regex tests use the negated regex match operator !~
each field test is applied to each row; the tests are not mutually exclusive (there is no else). If you want them to be mutually exclusive you can change the form of the tests back to using if and else
prints the whole row whether an error field was added or not
redirects the output to a new file
I used the shorter messages from your AWK script rather than the longer ones in your examples. You can easily change them if needed.
Here is an array version that eliminates an excess newline and wraps the new field in quotes:
awk -F ';' 'BEGIN {OFS = FS}
NR == 1 {print; next}
{new_field = NF + 1; delete arr; i = 0; d = ""; msg = ""}
$5 != $6 {arr[i++] = "XIZE and XIZE2 do not match"}
$5 !~ "^[0-9]+$" {arr[i++] = "`XIZE` NOK"}
$6 !~ "^-\\?[0-9]+$" {arr[i++] = "`XIZE` NOK"}
{
    if (i > 0) {
        msg = "\"";
        for (idx = 0; idx < i; idx++) {
            msg = msg d arr[idx];
            d = "\n";
        }
        msg = msg "\"";
        $new_field = msg;
    };
    print
}' file.csv > new-file.csv
I think this might be what you want:
$ cat tst.awk
BEGIN { FS=OFS=";" }
NR == 1 { print $0, "error"; next }
{ numErrs = 0 }
($4 == "UNKNOWN") { errs[++numErrs] = "XTRUC is UNKNOWN" }
($3 != $4) { errs[++numErrs] = "multiple XTRUC value for the same FNAME" }
($5 != $6) || ($5+0 != $5) || ($6+0 != $6) { errs[++numErrs] = "XIZE and XIZE2 don't match" }
($7 != $8) { errs[++numErrs] = "ORG and ORG2 don't match" }
{
    printf "%s%s\"", $0, OFS
    for ( errNr=1; errNr<=numErrs; errNr++ ) {
        printf "%s%s", (errNr>1 ? "\n\t\t\t\t" : ""), errs[errNr]
    }
    print "\""
}
$ awk -f tst.awk file.csv
A.B.P;FATH;FNAME;XTRUC;XIZE;XIZE2;ORG;ORG2;error
AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY;"XTRUC is UNKNOWN
multiple XTRUC value for the same FNAME
XIZE and XIZE2 don't match
ORG and ORG2 don't match"
AIT-A;Y9A;RAIT;VIR;67;217;X;X;"multiple XTRUC value for the same FNAME
XIZE and XIZE2 don't match"
If you don't REALLY want a bunch of white space at the start of the lines in the quoted fields (I only added it to get output that looks like what you say you wanted in your question), then just get rid of \t\t\t\t from the printf but leave the \n, i.e. printf "%s%s", (errNr>1 ? "\n" : ""), errs[errNr]. I'd normally print ORS instead of \n, but you may be doing this to create output for MS Excel, in which case you'd set ORS="\r\n" in the BEGIN section and leave that printf with a \n in it for consistency with Excel's CSV format.
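For reference, the Excel-oriented variant mentioned above would only change the BEGIN line, something like this (assuming your output really is destined for Excel, which the question doesn't say):
BEGIN { FS=OFS=";"; ORS="\r\n" }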
printf "AIT;Y9A;RAIT;UNKNOWN;UNKNOWN;80;X;XY" | tr ';' '\n' > stack
[[ $(sed -n '/UNKNOWN/p' stack) ]] && printf "\"XTRUC is UNKNOWN\"" >> stack
tr '\n' ';' < stack > s2
You can do the same thing with whatever other tests you like. Just replace the semicolons with newlines, then use ed or sed to read the line number corresponding to the field you want. After that, replace the newlines with semicolons again.
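For example, the ORG/ORG2 comparison from the question could be bolted on in the same style (a sketch only; once the semicolons are newlines, fields 7 and 8 are simply lines 7 and 8 of the scratch file):
printf "AIT-A;Y9A;RAIT;VIR;67;217;X;XY" | tr ';' '\n' > stack
[[ "$(sed -n '7p' stack)" != "$(sed -n '8p' stack)" ]] && printf "\"ORG and ORG2 don't match\"" >> stack
tr '\n' ';' < stack > s2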

UNIX group by two values

I have a file with the following lines (values are separated by ";"):
dev_name;dev_type;soft
name1;ASR1;11.1
name2;ASR1;12.2
name3;ASR1;11.1
name4;ASR3;15.1
I know how to group them by one value, like count of all ASRx, but how can I group it by two values, as for example:
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
another awk
$ awk -F';' 'NR>1 {a[$2]; b[$3]; c[$2,$3]++}
END {for(k in a) {print k;
for(p in b)
if(c[k,p]) print "\t*"p,"-",c[k,p]}}' file
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
$ cat tst.awk
BEGIN { FS=";"; OFS=" - " }
NR==1 { next }
$2 != prev { prt(); prev=$2 }
{ cnt[$3]++ }
END { prt() }
function prt( soft) {
if ( prev != "" ) {
print prev
for (soft in cnt) {
print " *" soft, cnt[soft]
}
delete cnt
}
}
$ awk -f tst.awk file
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
Or if you like pipes....
$ tail -n +2 file | cut -d';' -f2- | sort | uniq -c |
awk -F'[ ;]+' '{print ($3!=prev ? $3 ORS : "") " *" $4 " - " $2; prev=$3}'
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
try something like
awk -F ';' '
NR==1 { next }
{ aRaw[$2 "-" $3]++ }
END {
    n = asorti( aRaw, aVal )
    for( Val = 1; Val <= n; Val++ ) {
        split( aVal[Val], aTmp, /-/ )
        if ( aTmp[1] != Last ) { Last = aTmp[1]; print Last }
        print "  " aTmp[2] " " aRaw[ aVal[Val] ]
    }
}
' YourFile
The key here is to use the two fields combined as a single array key. The END part, which presents the values, is more involved than the counting itself.
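If GNU awk 4.0 or later is available, true arrays of arrays plus PROCINFO["sorted_in"] keep the END block a little simpler (a sketch under that gawk-only assumption, not part of the answer above):
awk -F';' 'NR>1 { cnt[$2][$3]++ }
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"    # gawk: iterate array keys in sorted order
    for (type in cnt) {
        print type
        for (soft in cnt[type])
            print "\t*" soft " - " cnt[type][soft]
    }
}' file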
Using Perl
$ cat bykub.txt
dev_name;dev_type;soft
name1;ASR1;11.1
name2;ASR1;12.2
name3;ASR1;11.1
name4;ASR3;15.1
$ perl -F";" -lane ' $kv{$F[1]}{$F[2]}++ if $.>1;END { while(($x,$y) = each(%kv)) { print $x;while(($p,$q) = each(%$y)){ print "\t\*$p - $q" }}}' bykub.txt
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
$
Yet Another Solution, this one using the always useful GNU datamash to count the groups:
$ datamash -t ';' --header-in -sg 2,3 count 3 < input.txt |
awk -F';' '$1 != curr { curr = $1; print $1 } { print "\t*" $2 " - " $3 }'
ASR1
*11.1 - 2
*12.2 - 1
ASR3
*15.1 - 1
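For what it's worth, the intermediate datamash output (before the awk reformatting) should look like one sorted group per line with the count appended, e.g. for the sample data:
ASR1;11.1;2
ASR1;12.2;1
ASR3;15.1;1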
I don't want to encourage lazy questions, but I wrote a solution, and I'm sure someone can point out improvements. I love posting answers on this site because I learn so much. :)
One binary subcall to sort, otherwise all built-in processing. That means using read, which is slow. If your file is large, I'd recommend rewriting the loop in awk or perl, but this will get the job done.
sed 1d groups | # strip the header
sort -t';' -k2,3 > group.srt # pre-sort to collect groupings
declare -i ctr=0 # initialize integer record counter
IFS=';' read x lastA lastB < group.srt # priming read for comparators
printf "$lastA\n\t*$lastB - " # priming print (assumes at least one record)
while IFS=';' read x a b # loop through the file
do if [[ "$lastA" < "$a" ]] # on every MAJOR change
then printf "$ctr\n$a\n\t*$b - " # print total, new MAJOR header and MINOR header
lastA="$a" # update the MAJOR comparator
lastB="$b" # update the MINOR comparator
ctr=1 # reset the counter
elif [[ "$lastB" < "$b" ]] # on every MINOR change
then printf "$ctr\n\t*$b - " # print total and MINOR header
ctr=1 # reset the counter
else (( ctr++ )) # otherwise increment
fi
done < group.srt # feed read from sorted file
printf "$ctr\n" # print final group total at EOF

AWK in single line pass multiple commands

I would like to combine the following multiple awk commands into a single awk program:
awk -F 'FS' '{ $1 = ($1 == "}" ? "" : $1) } 1' sorce > destfil
awk -F 'FS' '{ $3 = ($3 == "]" ? "" : $3) } 1' sorce > destfil
awk -F 'FS' '{ $5 = ($5 == "}" ? "}," : $5) } 1' sorce > destfil
I have tried to accomplish this using && but the result is not what I expected.
awk -F 'FS' '{ $1 = ($1 == "}" ? "" : $1) && $3 = ($3 == "]" ? "" : $3) && $5 = ($5 == "}" ? "}," : $5) } 1' sorce > destfil
The output seems to have various ZEROs in it.
Question:
How can I merge these lines?
What is the origin of the ZEROS?
Thank you!
@RavinderSingh13, I tried your code; my sample input file and output file are below:
[user#restt]$ tail source
{
}
]
}
{
" e t
{
}
]
}
[user#test]$ awk -F 'FS' '{$1=($1=="}"?"":$1); $3=($3=="]" ? "" : $3) ; $5=($5=="}" ? "}," :$5);} 1' source > target
[user#test]$ tail target
{
}
]
}
{
" e t
{
}
]
}
I think the issue is related to the field separator -F 'FS', but I am not sure.
@kvantour, below I have given my sample input file, the command I am running, the output I am getting, and the output I require.
Source file content :
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
{
"metadata": [
{
sample content line 1
sample content line n
}
]
}
The command I am running:
$ awk '($1=="}"){$1="First Column"}
($3=="]"){$3="third Column"}
($5=="}"){$5="Fifth Column"}
{$1=$1}1' sample.json > out
Output I am getting :
[root#centos-src ~]# cat out
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
{
"metadata": [
{
sample content line 1
sample content line n
First Column
]
First Column
but the output I am expecting is:
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
{
"metadata": [
{
sample content line 1
sample content line n
Fifth Column
third Column
First Column
In a nice awk structure, one would write:
awk -F 'FS' '($1=="}"){$1=""}
($3=="]"){$3=""}
($5=="}"){$5="},"}
{$1=$1}1' <file>
The reason I add $1=$1 to the list is to rebuild $0 with the correct OFS in case none of the above conditions was satisfied. If you don't do this, you will have some lines printed with FS as the field separator and others with OFS.
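A quick illustration of the difference (throwaway input, not from the question): unless some field is assigned, awk prints $0 exactly as it was read, so OFS is never applied:
$ echo "a b c" | awk 'BEGIN{OFS=","} {print}'
a b c
$ echo "a b c" | awk 'BEGIN{OFS=","} {$1=$1; print}'
a,b,c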
So why are you getting a bunch of zeros?
Let's look at your one-liner:
$1 = ($1 == "}" ? "" : $1) && $3 = ($3 == "]" ? "" : $3) && $5 = ($5 == "}" ? "}," : $5)
And simplify it by assuming that the ternary operators between brackets return a variable. So we can rewrite it as:
$1 = var1 && $3 = var3 && $5 = var5
Taking into account that:
expr1 && expr2 has a higher precedence than value = expr.
lvalue = expr returns the value of expr
We can see that awk interprets this as
$1 = var1 && ($3 = (var3 && ($5 = var5) ) )
So the result will be:
$5 = var5
$3 = var3 && $5 equalling var3 && var5
$1 = var1 && $3 equalling var1 && var3 && var5
This is visible in the following example:
$ echo "a b c d e f" | awk '{ $1="p" && $3 = "q" && $5 = "r"}1'
1 b 1 d r f
Finally, in awk an empty string and a numeric zero have the logical value false, and anything else is true. Since two of your original ternary operators can return empty strings, they ensure that the logical AND returns false, which awk represents as the number ZERO. Hence $1 and $3 will both be assigned ZERO if the original $3 equals ]
Update (after receiving [mcve])
What you try to achieve is not that easy. First off, it seems you assume that the column number implies the character number in the line. This is sadly not the case. Awk, in default mode, assumes that field $n is the nth word in the line where a word is a sequence of characters not containing any blank. So in the following text,
}
]
}
all characters are actually referenced by $1.
Under the assumption that your JSON file is perfectly indented, one could use the following:
awk '/^} *$/{$0="First Column"}
/^ ] *$/{$0=" Third Column"}
/^ } *$/{$0=" Fifth Column"}
{print $0}' <file>
However, if your JSON file is not indented uniformly, things become rather messy. The easiest would be to parse the file first with jq as
jq . <json-file> | awk ...
Is this what you're trying to do (given your source input file)?
$ awk '
BEGIN{ FS="[ ]"; map[1,"}"]=map[3,"]"]=map[5,"}"]="" }
{ for (i=1;i<=NF;i++) $i=((i,$i) in map ? map[i,$i] : $i); print }
' file
{
{
" e t
{
Use ; to separate statements:
awk ... '{ $1 = ($1 == "}" ? "" : $1); $3 = ($3 == "]" ? "" : $3); $5 = ($5 == "}" ? "}," : $5); } 1' ...
Since you haven't shown your sample Input_file I couldn't test it, but could you please try the following.
awk -F 'FS' '{$1=($1=="}"?"":$1);$3=($3=="]"?"":$3);$5=($5=="}"?"":$5);} 1' sorce > destfil

Finding Contiguous Ranges

I would like to find the contiguous ranges given a set of dates by day
given the following sample
2016-01-01
2016-01-02
2016-01-03
2016-01-04
2016-01-05
2016-01-06
2016-01-08
2016-01-09
2016-01-10
2016-01-11
2016-01-12
2016-01-15
2016-01-16
2016-01-17
2016-01-20
2016-01-21
2016-01-30
2016-01-31
2016-02-01
I expect the following result
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-01-31
2016-02-01-2016-02-01
I have already come across this question, which is almost the opposite of what I want, but with integers.
I have formulated the following which works with integers.
awk 'NR==1 {l=$1; n=$1} {if ($1==n){n=$1+1} else{print l"-"n-1; l=$1 ;n=$1+1} } END {print l"-"$1}' file.txt
With GNU awk for mktime():
$ cat tst.awk
BEGIN { FS=OFS="-" }
{ currSecs = mktime( $1" "$2" "$3" 0 0 0" ) }
(currSecs - prevSecs) > (24*60*60) {
    if (NR>1) {
        print startDate, prevDate
    }
    startDate = $0
}
{ prevSecs = currSecs; prevDate = $0 }
END { print startDate, prevDate }
$ awk -f tst.awk file
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-02-01
With any awk, if you don't mind ranges restarting when months change (as is apparent in your expected output and the comment under your question):
$ cat tst.awk
BEGIN { FS=OFS="-" }
{ currYrMth = $1 FS $2; currDay = $3 }
(currYrMth != prevYrMth) || ((currDay - prevDay) > 1) {
    if (NR>1) {
        print startDate, prevDate
    }
    startDate = $0
}
{ prevYrMth = currYrMth; prevDay = currDay; prevDate = $0 }
END { print startDate, prevDate }
$ awk -f tst.awk file
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-01-31
2016-02-01-2016-02-01
If you have GNU Awk you can use its time functions.
gawk -F - 'NR==1 || $1 "-" $2 "-" $3 != following {
if (following != "") print start "-" latest;
start = $1 "-" $2 "-" $3
this = mktime($1 " " $2 " " $3 " 0 0 0")
}
{
this += 24*60*60
following = strftime("%F", this)
latest = $1 "-" $2 "-" $3 }
END { if (start != latest) print start "-" latest }' filename
Unit ranges will print like "2016-04-15-2016-04-15" which is a bit of a wart, but easy to fix if you need to. Also the END block has a bug in this case, but again, this should at least get you started.
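For instance, the boundary print could use a ternary so that one-day groups collapse to a single date (a sketch, only if you want that behaviour):
print (start == latest ? start : start "-" latest)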
gawk:
#!/bin/awk -f
BEGIN{
    FS="-"
}
{
    a[NR]=mktime($1" "$2" "$3" 0 0 0")
    b[NR]=$2;
    if ( (a[NR-1]+86400) != a[NR] || b[NR-1]!=b[NR] ) {
        if(NR!=1){
            print s" - "strftime("%Y-%m-%d",a[NR-1])
        };
        s=$0
    }
}
END{
    print s" - "$0
}
Create array a with index NR and, as the value, the epoch time derived from $0 using the awk time function mktime.
Create array b with index NR and the month from $2 as the value.
If the epoch time from the previous line plus 86400 (+1 day) is not equal to the epoch time of the current line, or the month of the previous line differs from that of the current line, then (except on the first line) print s" - "strftime("%Y-%m-%d",a[NR-1]) and reassign s, the start date, to $0.
END:
Print the last start time s and the last line.

Using AWK to check a six column txt file

I am brand new to using Awk and I am running into a bit of a problem. I have multiple tab delimited text files that are made up of six columns. The column layout is:
col1=int
col2=float
col3=float
col4=int
col5=int
col6=DATE (yyyy-mm-dd)
The task at hand is basically to do a quality check on the text files to make sure that each column is of that type. I also need to skip the first line since each tab delimited text file has a header. So far this is what I have:
#!/bin/sh
awk < file1.txt -F\\t '
{(NR!=1)}
{if ($1 != int($1)||($2 != /[0-9]+\.[0-9]*/)||($3 != /[0-9]+\.[0-9]*/)||($4 != int($4)||($5 != int($5))print "Error At " NR; }
'
I am not required to use Awk; it just seemed the most appropriate tool.
EDIT 1:
#!/bin/sh
awk < file1.txt -F\\t '
{if (NR!=1){
if ($1 != int($1)) print "Error col1 at " NR;
else if ($4 != int($4)) print "Error col4 at " NR;
else if ($5 != int($5)) print "Error col5 at " NR;
}
}
'
This seems to work fine so my questions now are:
1- How do I check for floats?
2- How do I run this over multiple files?
If this isn't what you want then edit your question to include some sample input and expected output:
awk '
function act_type(n,   t) {
    if (n ~ /^[0-9]{4}(-[0-9]{2}){2}$/) { t = "date" }
    else if (n ~ /^-?[0-9]+\.[0-9]+$/)  { t = "float" }
    else if (n ~ /^-?[0-9]+$/)          { t = "int" }
    return t
}
BEGIN { split("int float float int int date",exp_type) }
{
    for (i=1; i<=NF; i++) {
        if (act_type($i) != exp_type[i]) {
            print "Error col", i, "at", NR, "in", FILENAME | "cat>&2"
        }
    }
}
' file
massage the regexp to suit your data (i.e. if your ints can start with + and/or include ,s then include that in the regexp).
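For instance, allowing an optional sign and comma-grouped digits in the int test might look like this (an illustration only, adjust to your data):
else if (n ~ /^[-+]?[0-9]{1,3}(,[0-9]{3})*$/) { t = "int" }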
To test if a field is a number, you can check if
$1 + 0 == $1
This works because adding 0 forces a numeric conversion; if the field isn't a valid number, the numeric result won't compare equal to the original string.
To run a script on multiple files, you can just add them as extra parameters, e.g.
awk 'commands' file1 file2 file3
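Putting both follow-up questions together, a minimal sketch (the file names are placeholders): FNR restarts at 1 for each input file, so FNR > 1 skips every header, and simple regexes cover the int and float columns:
awk -F'\t' 'FNR > 1 {
    if ($1 !~ /^-?[0-9]+$/)          print "Error col1 at line " FNR " in " FILENAME
    if ($2 !~ /^-?[0-9]+\.[0-9]+$/)  print "Error col2 at line " FNR " in " FILENAME
    if ($3 !~ /^-?[0-9]+\.[0-9]+$/)  print "Error col3 at line " FNR " in " FILENAME
}' file1.txt file2.txt file3.txt
Columns 4-6 can be checked the same way with the int and date patterns shown earlier.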
