I have a text file a.txt whose contents are shown below. I do not want "wide" to be printed as a violation, but it shows up in the output of the command below. Can anybody help me so that "wide" does not appear as a violation?
The command I used:
awk '{
if ($0 =="") {rsave=0}
else {if (rsave==0) rule=$1; rsave=1};
if ($0 ~ ":.* [1-9] violations? found")
{printf "%s\n", $1; rsave=0}
else if ($0 ~ "[1-9] violations? found")
{printf "%s\n", rule; rsave=0}}' a.txt \
| sort -u
Output from the above command:
DM5.S.7:IP_TIGHTEN_BOUNDARY
DM6.S.7:IP_TIGHTEN_BOUNDARY
text_net:text_short
wide
Expected output:
DM5.S.7:IP_TIGHTEN_BOUNDARY
DM6.S.7:IP_TIGHTEN_BOUNDARY
text_net:text_short
a.txt file contents:
ERROR SUMMARY
DM5.S.7:IP_TIGHTEN_BOUNDARY : To avoid > 1.4 um x
1.4 um Metal empty space after IP abutment Metal
empty space must <= 0.7 um x 1.4 um on IP boundary
edge Metal empty space must <= 0.7 um x 0.7 um on
IP boundary corner
contains ........................................... 1 violation found.
wide ............................................... 4 violations found.
DM6.S.7:IP_TIGHTEN_BOUNDARY : To avoid > 1.4 um x
1.4 um Metal empty space after IP abutment Metal
empty space must <= 0.7 um x 1.4 um on IP boundary
edge Metal empty space must <= 0.7 um x 0.7 um on
IP boundary corner
contains ........................................... 1 violation found.
wide ............................................... 4 violations found.
Violation
text_net:text_short ................................ 4 violations found.
text_abcd:text_short ................................ 0 violations found.
The easiest way to do it might be to pipe through grep -v wide after the awk and the sort.
From grep's manpage:
-v, --invert-match  Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
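Putting the suggestion together, the filter is just appended to the end of the pipeline. A minimal self-contained sketch (using a literal rule list in place of the awk output; the pattern is anchored so a rule name that merely contains "wide" would survive):

```shell
#!/bin/sh
# Simulate the sorted awk output, then drop the literal line "wide".
printf '%s\n' 'wide' 'DM5.S.7:IP_TIGHTEN_BOUNDARY' 'text_net:text_short' \
  | sort -u | grep -v '^wide$'
```

With the original command this reads: awk '…' a.txt | sort -u | grep -v '^wide$'.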
Related
I have one file (let's say a.txt) whose contents are shown below. I want to grep only the error names before the colon (:), like DK2.a.Iq_abc_vu and LAP.ABCD.1, but not 11xAB2_B_1, since its violation count is 0. That is, we only want errors whose violation count is non-zero. The format of a.txt stays the same across different files. There is one special case: when the word "Violation" appears on its own line, we have to report "text_abcd" and "text_jkl" as the errors, not "Violation". Can you please help me grep these errors to produce the output shown below?
$ cat a.txt
DK2.a.Iq_abc_vu : To avoid > 500 um x 500.0 um Metal empty space after IP abutment empty space must on IP boundary corner
interacting ........................................ 1 violation found.
interacting ........................................ 1 violation found.
DM3.a.7.abc_vu : To avoid > 100.0 um x 100.0 um Metal empty space after TV boundary corner having some thing
interacting ........................................ 2 violations found.
LAP.ABCD.1 : Voltage high this is one type of error coming some thing violations. This error can be removed by providing spacing
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 10 violation found.
net_abcd:net_abcd .............................. 1 violation found.
11xAB2_B_1 : 10xAB area >= 100um2
not_inside ......................................... 0 violations found.
Violation
text_abcd:text_pqrs .......................... 2 violations found.
text_jkl:jkl_jkl ............................. 2 violations found.
Desired output:
DK2.a.Iq_abc_vu
DM3.a.7.abc_vu
LAP.ABCD.1
text_abcd
text_jkl
Assuming the answer doesn't have to be based on sed ...
We can use egrep to keep only those lines that meet one of the following criteria:
the line contains a colon surrounded by spaces ( : ), or the word violation (case-insensitive)
from the resulting lines we then discard lines that contain 0 violations
At this point we have:
$ egrep -i " : |violation" a.txt | egrep -v " 0 violations"
DK2.a.Iq_abc_vu : To avoid > 500 um x 500.0 um Metal empty space after IP abutment empty space must on IP boundary corner
interacting ........................................ 1 violation found.
interacting ........................................ 1 violation found.
DM3.a.7.abc_vu : To avoid > 100.0 um x 100.0 um Metal empty space after TV boundary corner having some thing
interacting ........................................ 2 violations found.
LAP.ABCD.1 : Voltage high this is one type of error coming some thing violations. This error can be removed by providing spacing
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 10 violation found.
net_abcd:net_abcd .............................. 1 violation found.
11xAB2_B_1 : 10xAB area >= 100um2
Violation
text_abcd:text_pqrs .......................... 2 violations found.
text_jkl:jkl_jkl ............................. 2 violations found.
Now we can use awk to keep track of 2 types of errors:
(1) line contains [space]:[space], so we store the error name and if the next line contains the string violation we print the error name and then clear the error name (to keep from printing it twice)
(2) line starts with ^Violation in which case we'll obtain/print the error name from each follow-on line that contains the string violation (error name is the portion of the line before the :)
The awk code to implement this looks like:
awk '
/ : / { errname = $1 ; next }
/^Violation/ { errname = $1 ; next }
/violation/ { if ( errname == "Violation" ) { split($1,a,":") ; print a[1] ; next }
if ( errname != "" ) { print errname ; errname="" ; next }
}
'
Pulling the egrep and awk snippets together gives us:
$ egrep -i " : |violation" a.txt | egrep -v " 0 violations" | awk '
/ : / { errname = $1 ; next }
/^Violation/ { errname = $1 ; next }
/violation/ { if ( errname == "Violation" ) { split($1,a,":") ; print a[1] ; next }
if ( errname != "" ) { print errname ; errname="" ; next }
}
'
With the following results:
DK2.a.Iq_abc_vu
DM3.a.7.abc_vu
LAP.ABCD.1
text_abcd
text_jkl
I've a question concerning the sed command: how do I keep all lines where a field's value is > 3500?
This is my problem:
I have this output (from a .csv file):
String1;Val1;String2;Val2
I would like to keep only the lines where Val1 > 3500 and Val2 >= 60,00 (and <= 99,99).
So I tried this:
`sed -nr 's/^(.*);
([^([0-9]|[1-9][0-9]|[1-9][0-9]{2}|[1-2][0-9]{3}|3[0-4][0-9]{2}|3500)]);
(.*);
([6-9][0-9],[0-9]*)$
/Dans la ville de \1, \2 votants avec un pourcentage de \4 pour \3/p'
`
but I get this error:
`sed -e expression #1, char 174: Unmatched ) or \)`
I think the problem comes from the match on the second field: I listed all numbers <= 3500 and tried to negate those tests.
Do you have an idea how I should proceed?
Thanks (and sorry for my English).
Awk is the right way to go in such a case:
awk 'BEGIN{ FS=OFS=";" }$2 > 3500 && ($4 >= 60.00 && $4 <= 99.99)' file
The parsing error is in [^([0-9]|[1-9][0-9]|[1-9][0-9]{2}|[1-2][0-9]{3}|3[0-4]. I'm not entirely sure where exactly, but that doesn't matter since there is an error in your approach:
(Inverted) character classes [^...] do not work on full strings. [^ab|xy] matches all single characters that are not a, b, |, x, or y.
If you want to say »all strings except 0, 1, 2, ..., 3500« you have to use something different, probably a positive formulation like »all strings from 3500, 3501, ...«.
The following regex should work for numbers >= 3500.
0*([1-9][0-9]{4,}|[4-9][0-9]{3}|3[5-9][0-9]{2})
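A quick sanity check of that regex with grep -E (a sketch; the anchors are added here because the values are tested in isolation):

```shell
#!/bin/sh
# 3499 should not match; 3500, 4200 and 10000 should.
re='^0*([1-9][0-9]{4,}|[4-9][0-9]{3}|3[5-9][0-9]{2})$'
for n in 3499 3500 4200 10000; do
  if printf '%s\n' "$n" | grep -Eq "$re"; then
    echo "$n matches"
  else
    echo "$n does not match"
  fi
done
```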
I'm trying to find the mean of several numbers in a file, taken from the lines that contain "<Overall>".
My code:
awk -v file=$file '{if ($1~"<Overall>") {rating+=$1; count++;}} {rating=rating/count; print file, rating;}}' $file | sed 's/<Overall>//'
I'm getting
awk: cmd. line:1: (FILENAME=[file] FNR=1) fatal: division by zero attempted
for every file. I can't see why count would be zero if the file does contain a line such as "<Overall>5".
EDIT:
Sample from the (very large) input file, as requested:
<Author>RW53
<Content>Location! Location? view from room of nearby freeway
<Date>Dec 26, 2008
<No. Reader>-1
<No. Helpful>-1
<Overall>3
<Value>4
<Rooms>3
<Location>2
<Cleanliness>4
<Check in / front desk>3
<Service>-1
<Business service>-1
Expected output:
[filename] X
Where X is the average of the values on all lines containing <Overall>
Use awk as below:
awk -F'<Overall>' 'NF==2 {sum+=$2; count++}
END{printf "[%s] %s\n",FILENAME,(count?sum/count:0)}' file
For an input file containing two <Overall> entries like the one below (file name input-file), it produces the following result.
<Author>RW53
<Content>Location! Location? view from room of nearby freeway
<Date>Dec 26, 2008
<No. Reader>-1
<No. Helpful>-1
<Overall>3
<Value>4
<Rooms>3
<Location>2
<Cleanliness>4
<Check in / front desk>3
<Service>-1
<Business service>-1
<Overall>2
Running it produces,
[input-file] 2.5
The part -F'<Overall>' splits input lines with <Overall> as the delimiter, so only lines containing <Overall> end up with two fields; the number after it is $2, which is added to sum, while the number of matches is tracked in count.
The END clause is executed after all lines are processed; it prints the file name using the awk special variable FILENAME, which retains the name of the file being processed, and the average, which is calculated only if count is not zero (otherwise 0 is printed, avoiding a division by zero).
You aren't waiting until you've completely read the file to compute the average rating. This is simpler if you use patterns rather than an if statement. You also need to remove <Overall> before you attempt to increment rating.
awk '$1 ~ /<Overall>/ {sub("<Overall>", "", $1); rating+=$1; count++;}
END {rating=rating/(count?count:1); print FILENAME, rating;}' "$file"
(Answer has been updated to fix a typo in the call to sub and to correctly avoid dividing by 0.)
awk -F '>' '
# field separator is ">"
# for lines that contain <Overall>
/<Overall>/ {
# add the value to the sum and increment the counter
Rate+=$2;Count++}
# at the end of the input
END{
# print the average.
printf( "[%s] %f\n", FILENAME, Rate / ( Count + ( ! Count ) ) )
}
' ${File}
# one liner
awk -F '>' '/<Overall>/{r+=$2;c++}END{printf("[%s] %f\n",FILENAME,r/(c+(!c)))}' ${File}
Note:
( c + ( ! c ) ) uses a side effect of logical NOT (!): it evaluates to 1 if c == 0 and to 0 otherwise. So it adds 1 when c is 0 and adds 0 when it isn't, ensuring a divisor of at least 1.
assumes the whole file follows the format of the sample
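A quick sketch of that divide-by-zero guard in isolation:

```shell
#!/bin/sh
# c + (!c) is 1 when c is 0, and c otherwise -- a branch-free way
# to keep the divisor from ever being zero in awk.
awk 'BEGIN {
  c = 0; print 10 / (c + (!c))   # divisor forced to 1
  c = 4; print 10 / (c + (!c))   # divisor stays 4
}'
```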
For our webshop we get a CSV file (automatically updated) with product data from the manufacturers.
Some manufacturers list prices without tax and some with.
I want to change the prices with a shell script to add 21% VAT and round to the nearest .95 or .50.
For example I get a sheet:
sku|ean|name|type|price_excl_vat|price
EU-123|123123123123|Product name|simple|24.9900
I use this code:
sed -i "1 s/price/price_excl_vat/" inputfile
awk '{FS="|"; OFS="|"; if (NR<=1) {print $0 "|price"} else {print $0 "|" $5*1.21}}' inputfile > outputfile
the output is:
sku|ean|name|type|price_excl_vat|price
EU-123|123123123123|Product name|simple|24.9900|30.2379
How do I round it to the correct price like below ?
sku|ean|name|type|price_excl_vat|price
EU-123|123123123123|Product name|simple|24.9900|29.95
awk to the rescue!
awk 'BEGIN {FS=OFS="|"}
$NF==$NF+0 {a=$NF*1.21;
r=a-int(a);
if (r<0.225) a=a-r-0.05;
else if (r<0.725) a=a-r+0.50;
else a=a-r+0.95;
$(NF+1)=a} 1'
Note that in your example the nearest number for 30.2379 will be 30.50. Perhaps you want to round down?
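A self-contained sketch of that script run against the sample rows (the header passes through untouched because "price" fails the numeric test $NF==$NF+0):

```shell
#!/bin/sh
# Multiply the last field by 1.21, round to the nearest .50/.95 step,
# and append the result as a new field.
printf '%s\n' 'sku|ean|name|type|price_excl_vat|price' \
              'EU-123|123123123123|Product name|simple|24.9900' \
| awk 'BEGIN {FS=OFS="|"}
       $NF==$NF+0 {a=$NF*1.21;
                   r=a-int(a);
                   if (r<0.225) a=a-r-0.05;
                   else if (r<0.725) a=a-r+0.50;
                   else a=a-r+0.95;
                   $(NF+1)=a} 1'
```

For the sample price this appends 30.5 (i.e. 30.50, the nearest step to 30.2379).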
To round down instead of rounding to the nearest, and with a configurable price column (k), use the following; the new computed value is appended to the end of the row.
awk 'BEGIN {FS=OFS="|"; k=5}
$k==$k+0 {a=$k*1.21;
r=a-int(a);
if (r<0.50) a=a-r-0.05;
else if (r<0.95) a=a-r+0.50;
else a=a-r+0.95;
$(NF+1)=a} 1'
awk '# define field separator, in and out
BEGIN{FS=OFS="|"}
# on the header line only: set or add a 6th field for the price label
NR==1 && NF == 6 { $6 = "price"; print; next}
NR==1 && NF == 5 { $6 = "price"; print; next}
# add the price with VAT, truncated to 0.01, if missing
NF == 5 { $6 = int( $5 * 121 ) / 100 }
# print the line (modified or not, e.g. empty lines); 7 is just a *not zero*, i.e. always true
7
' inputfile \
> outputfile
Self-documented.
I'm not sure about your sed for the header, because the sample already shows a header with price, so take the variant you want.
Not knowing what your program looks like, it's difficult to give you more information.
However, both awk and bash have the printf command, which can be used for rounding floating-point numbers. (Yes, Bash arithmetic is integer-only, but printf can still format a string as a decimal number.)
I gave you the link for the C printf command because the one for Bash doesn't include the formatting codes. Read it and weep because the documentation is a bit dense, and if you've never used printf before, it can be quite difficult to understand. Fortunately, an example will bring things to light:
$ foo="23.42532"
$ printf "%2.2f\n" "$foo"
23.43 #All rounded for you!
The f means it's a floating-point number. The % marks the beginning of a formatting sequence. In %2.2f, the first 2 is the minimum field width and the .2 means two digits after the decimal point. If you said %8.2f, printf would make sure the result is at least eight characters wide, left-padding the number with spaces. The \n at the end is the newline character.
Fortunately, although printf can be hard to understand at first, it's pretty much the same in almost all programming languages. It's in awk, Perl, Python, C, Java, and many more languages. And, if the information you need isn't in printf, try the documentation on sprintf which is like printf, but prints the formatted text into a string.
The best documentation I've seen is in the Perl sprintf documentation because it gives you plenty of examples.
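For instance, sprintf in awk can do plain two-decimal rounding of the taxed price (not the .95/.50 stepping from the question, just ordinary rounding):

```shell
#!/bin/sh
# Round the taxed price to two decimals with awk's sprintf.
awk 'BEGIN {
  price = 24.99 * 1.21              # 30.2379
  rounded = sprintf("%.2f", price)  # "30.24"
  print rounded
}'
```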
I have normally done this with Excel, but as I am trying to learn bash, I'd like to ask for advice here on how to do so. My input file resembles:
# s0 legend "1001"
# s1 legend "1002"
#target G0.S0
#type xy
2.0 -1052.7396157664
2.5 -1052.7330560932
3.0 -1052.7540013664
3.5 -1052.7780321236
4.0 -1052.7948229060
4.5 -1052.8081313831
5.0 -1052.8190310613
&
#target G0.S1
#type xy
2.0 -1052.5384564253
2.5 -1052.7040374678
3.0 -1052.7542803612
3.5 -1052.7781686744
4.0 -1052.7948927247
4.5 -1052.8081704241
5.0 -1052.8190543049
&
where the above only shows two data sets: s0 and s1. In reality I have 17 data sets and will combine them arbitrarily. By combine, I mean I would like to:
For two data sets, extract the second column of each separately.
Subtract these two columns row by row.
Multiply the difference by a constant, $C.
Note: $C multiplies very small numbers and the only way I could get it to not divide by zero was to take a massive scale.
Edit: After requests, I was apparently not entirely clear what I was going for. Take for example:
set0
2 x
3 y
4 z
set1
2 r
3 s
4 t
I also have defined a constant C.
I would like to perform the following operation:
C*(r - x)
C*(s - y)
C*(t - z)
I will be doing this for sets > 1, up to 16, for example (set 10) minus (set 0). Therefore, I need the flexibility to target a value based on its line number and column number, and preferably acting over a range of line numbers to make it efficient.
So far this works:
C=$(echo "scale=45;x=(small numbers)*(small numbers); x" | bc -l)
sed -n '5,11p' input.in | cut -c 5-20 > tmp1.in
sed -n '15,21p' input.in | cut -c 5-20 > tmp2.in
pr -m -t -s tmp1.in tmp2.in > tmp3.in
awk '{printf $2-$1 "\n"}' tmp3.in > tmp4.in
but the multiplication failed:
awk '{printf "%11.2f\n", "$C"*$1 }' tmp4.in > tmp5.in
returning:
0.00
0.00
0.00
0.00
0.00
0.00
0.00
I have a feeling the whole thing can be accomplished more elegantly with awk. I also tried this:
for (( i=0; i<=6; i++ ))
do
n=5+$i
m=10+n
awk 'NR==n{a=$2};NR==m{b=$2} {printf "%d\n", $b-$a}' input.in > temp.in
done
but all I get in temp.in is a long column of 0s.
I also tried
awk 'NR==5,NR==11{a=$2};NR==15,NR==21{b=$2} {printf "%d\n", $b-$a}' input.in > temp.in
but got the error
awk: (FILENAME=input.in FNR=20) fatal: attempt to access field -1052
Any idea how to formulate this with awk, and if that doesn't work, then why I cannot multiply with awk above? Thank you!
this does the math in one go
$ awk -v c=1 '/^&/ {s++}
s==1 {a[$1]=$2}
s==3 {print $1,a[$1],$2,c*(a[$1]-$2)}
/#type/ {s++}' file
2.0 -1052.7396157664 -1052.5384564253 -0.201159
2.5 -1052.7330560932 -1052.7040374678 -0.0290186
3.0 -1052.7540013664 -1052.7542803612 0.000278995
3.5 -1052.7780321236 -1052.7781686744 0.000136551
4.0 -1052.7948229060 -1052.7948927247 6.98187e-05
4.5 -1052.8081313831 -1052.8081704241 3.9041e-05
5.0 -1052.8190310613 -1052.8190543049 2.32436e-05
You can remove the decorations and add print formatting easily. The magic numbers 1 (= g1) and 3 (= 2*g2-1) correspond to data groups 1 and 2 in the order they appear in the data file; they can be turned into awk variables as well.
The counter s keeps track of whether you're inside a set or not: odd values correspond to sets and even values to the gaps between sets. It is incremented both at the start pattern (#type) and at the end pattern (&). The increment statements are ordered so that the marker lines themselves are not printed (unset first, print set values, reset last). You can change the order and observe the effects.
This might be what you're looking for:
$ cat tst.awk
/^[#&]/ { lineNr=0; next }
{
++lineNr
if (lineNr in prev) {
print $1, c * ($2 - prev[lineNr])
}
prev[lineNr] = $2
}
$ awk -v c=100000 -f tst.awk file
2.0 20115.9
2.5 2901.86
3.0 -27.8995
3.5 -13.6551
4.0 -6.98187
4.5 -3.9041
5.0 -2.32436
In your first try, you should replace that line:
awk '{printf "%11.2f\n", "$C"*$1 }' tmp4.in > tmp5.in
with that one:
awk -v C=$C '{printf "%11.2f\n", C*$1 }' tmp4.in > tmp5.in
You are mixing shell notation with awk notation.
In the shell you define a variable without $ and use it with $.
But here you are inside an awk script: there is no $ prefix for variables. (awk does have $1, $2, ..., but those are fields, which is different.)
You have put single quotes ' around your awk script, so shell variables can't be expanded inside it: you wrote $C, but the shell cannot see it inside single quotes. That is why you have to write awk -v C=$C, so that the shell variable $C is passed in as an awk variable called C.
We can see the same errors in your other awk attempts. Now I think you'll make it.
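A tiny sketch of the difference (using an arbitrary example value for C):

```shell
#!/bin/sh
C=100000
# Inside single quotes the shell does not expand $C, so awk sees the
# literal string "$C" (numeric value 0) and every product comes out 0.00.
echo 0.5 | awk '{ printf "%.2f\n", "$C" * $1 }'
# Passing the value with -v makes it a real awk variable.
echo 0.5 | awk -v C="$C" '{ printf "%.2f\n", C * $1 }'
```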