I have one file (say a.txt) whose contents are shown below. I want to grep only the error names before the colon (:), like DK2.a.Iq_abc_vu and LAP.ABCD.1, but not 11xAB2_B_1, because its violation count is 0. In other words, I want only those errors whose violation count is non-zero. The format of a.txt stays the same across different files. There is one special case: when a bare "Violation" line appears, the errors to report are "text_abcd" and "text_jkl" (the part before the colon on the following lines), not "Violation" itself. How can I grep these errors to produce the output shown below?
$ cat a.txt
DK2.a.Iq_abc_vu : To avoid > 500 um x 500.0 um Metal empty space after IP abutment empty space must on IP boundary corner
interacting ........................................ 1 violation found.
interacting ........................................ 1 violation found.
DM3.a.7.abc_vu : To avoid > 100.0 um x 100.0 um Metal empty space after TV boundary corner having some thing
interacting ........................................ 2 violations found.
LAP.ABCD.1 : Voltage high this is one type of error coming some thing violations. This error can be removed by providing spacing
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 10 violation found.
net_abcd:net_abcd .............................. 1 violation found.
11xAB2_B_1 : 10xAB area >= 100um2
not_inside ......................................... 0 violations found.
Violation
text_abcd:text_pqrs .......................... 2 violations found.
text_jkl:jkl_jkl ............................. 2 violations found.
Desired output:
DK2.a.Iq_abc_vu
DM3.a.7.abc_vu
LAP.ABCD.1
text_abcd
text_jkl
Assuming the answer doesn't have to be based on sed ...
We can use egrep to keep only those lines that meet one of the following criteria:
the line contains a colon surrounded by spaces ( : ) or the word violation (case-insensitive)
from the resulting lines we then discard those that contain 0 violations
At this point we have:
$ egrep -i " : |violation" a.txt | egrep -v " 0 violations"
DK2.a.Iq_abc_vu : To avoid > 500 um x 500.0 um Metal empty space after IP abutment empty space must on IP boundary corner
interacting ........................................ 1 violation found.
interacting ........................................ 1 violation found.
DM3.a.7.abc_vu : To avoid > 100.0 um x 100.0 um Metal empty space after TV boundary corner having some thing
interacting ........................................ 2 violations found.
LAP.ABCD.1 : Voltage high this is one type of error coming some thing violations. This error can be removed by providing spacing
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 10 violation found.
net_abcd:net_abcd .............................. 1 violation found.
11xAB2_B_1 : 10xAB area >= 100um2
Violation
text_abcd:text_pqrs .......................... 2 violations found.
text_jkl:jkl_jkl ............................. 2 violations found.
Now we can use awk to keep track of 2 types of errors:
(1) the line contains [space]:[space]: store the error name; if the next line contains the string violation, print the stored name and then clear it (to keep from printing it twice)
(2) the line starts with ^Violation: obtain/print the error name from each follow-on line that contains the string violation (the error name is the portion of the line before the :)
The awk code to implement this looks like:
awk '
/ : / { errname = $1 ; next }
/^Violation/ { errname = $1 ; next }
/violation/ { if ( errname == "Violation" ) { split($1,a,":") ; print a[1] ; next }
if ( errname != "" ) { print errname ; errname="" ; next }
}
'
Pulling the egrep and awk snippets together gives us:
$ egrep -i " : |violation" a.txt | egrep -v " 0 violations" | awk '
/ : / { errname = $1 ; next }
/^Violation/ { errname = $1 ; next }
/violation/ { if ( errname == "Violation" ) { split($1,a,":") ; print a[1] ; next }
if ( errname != "" ) { print errname ; errname="" ; next }
}
'
With the following results:
DK2.a.Iq_abc_vu
DM3.a.7.abc_vu
LAP.ABCD.1
text_abcd
text_jkl
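For what it's worth, the two egrep filters can also be folded into the awk script itself, so the whole job runs in one process. A sketch of the same logic (the heredoc recreates an abbreviated a.txt with the structurally distinct lines):

```shell
# Abbreviated sample input (same structure as the question's a.txt).
cat > /tmp/a.txt <<'EOF'
DK2.a.Iq_abc_vu : To avoid > 500 um x 500.0 um Metal empty space
interacting ................ 1 violation found.
interacting ................ 1 violation found.
DM3.a.7.abc_vu : To avoid > 100.0 um x 100.0 um Metal empty space
interacting ................ 2 violations found.
LAP.ABCD.1 : Voltage high this is one type of error
net_abcd:net_abcd .......... 1 violation found.
11xAB2_B_1 : 10xAB area >= 100um2
not_inside ................. 0 violations found.
Violation
text_abcd:text_pqrs ........ 2 violations found.
text_jkl:jkl_jkl ........... 2 violations found.
EOF

awk '
  / 0 violations/ { next }                              # discard zero-violation lines
  / : /           { errname = $1; special = 0; next }   # header line: remember error name
  /^Violation/    { special = 1; next }                 # special section marker
  /violation/     { if (special)            { split($1, a, ":"); print a[1] }
                    else if (errname != "") { print errname; errname = "" } }
' /tmp/a.txt
```

On this input it prints the same five names as the egrep+awk pipeline above.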
count=`grep -c success <fileName.txt>`
The above will only give me a total count of the word "success" but I want to keep a running total. For example, if there is a total of 'expected' 25 hits of which only 20 were found. This would mean that there were 5 failures. So I think I need to keep a running total so in the end I can report (echo) as follows:
20 out of 25 expected success found; 5 failures.
You could use awk, which will also print the custom output.
awk '/success/ { success++ } END { expected=25; fail=expected-success; print success " out of " expected " success found; " fail " failures" }' input_file
Example Output
$ for i in {1..25}; do echo "success"; done |
> awk '/success/ { success++ } END { expected=25; fail=expected-success; print success " out of " expected " success found; " fail " failures" }'
25 out of 25 success found; 0 failures
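To exercise the failure branch described in the question (20 of 25 expected), here is a sketch with the totals computed in the END block; expected is hard-coded to 25 as in the question:

```shell
# 20 successes against an expected total of 25.
for i in $(seq 1 20); do echo "success"; done |
  awk '/success/ { success++ }
       END { expected = 25
             fail = expected - success
             print success " out of " expected " success found; " fail " failures" }'
# -> 20 out of 25 success found; 5 failures
```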
This should suffice:
~$ count=$(grep -o -i <search_term> <data_source> | wc -l)
e.g.
~$ count=$(grep -o -i computer myfile.txt | wc -l)
~$ echo $count
--- flag explanations ---
-o means print only the matching part of the line, one match per output line. Can also be written as --only-matching.
-i makes the search case-insensitive. Also written as --ignore-case.
-l (a wc flag, not a grep flag here) counts the lines of its input. Because -o makes grep print each match on its own line, wc -l effectively counts matches.
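As a side note, grep -c counts matching lines, while grep -o | wc -l counts individual matches; the two differ when a line contains the term more than once. A quick sketch (the sample file is made up):

```shell
# Made-up sample file: "computer" appears twice on the first line.
printf 'computer and computer\nComputer\nno match here\n' > /tmp/myfile.txt

grep -c -i computer /tmp/myfile.txt           # counts matching lines  -> 2
grep -o -i computer /tmp/myfile.txt | wc -l   # counts each match      -> 3
```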
I've recently been working on some lab assignments and in order to collect and analyze results well, I prepared a bash script to automate my job. It was my first attempt to create such script, thus it is not perfect and my question is strictly connected with improving it.
Exemplary output of the program is shown below, but I would like to make it more general for more purposes.
>>> VARIANT 1 <<<
Random number generator seed is 0xea3495cc76b34acc
Generate matrix 128 x 128 (16 KiB)
Performing 1024 random walks of 4096 steps.
> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds
Walks accrued elements worth: 534351478
All the data I want to collect is on different lines. My first attempt was to run the same program twice (or more times, depending on the amount of data) and use grep in each run to extract the data I need by looking for a keyword. That is very inefficient; it should be possible to parse the whole output of a single run, but I could not come up with a way to do it. At the moment the script is:
#!/bin/bash
write() {
o1=$(./progname args | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
o2=$(./progname args | grep "cycle" | grep -o -E '[0-9]+.[0-9]+')
o3=$(./progname args | grep "Total" | grep -o -E '[0-9]+.[0-9]+')
echo "$1 $o1 $o2 $o3"
}
for ((i = 1; i <= 10; i++)); do
write $i >> times.dat
done
It is worth mentioning that echoing results in one line is crucial, as I am using gnuplot later and having data in columns is perfect for that use. Sample output should be:
1 0.019306 3.369 170620476
2 0.019559 3.375 170620475
3 0.021971 3.334 170620478
4 0.020536 3.378 170620480
5 0.019692 3.390 170620475
6 0.020833 3.375 170620477
7 0.019951 3.450 170620477
8 0.019417 3.381 170620476
9 0.020105 3.374 170620476
10 0.020255 3.402 170620475
My question is: how could I improve the script to collect such data in just one program execution?
You could use awk here to capture the values into an array, which you can later access by index 0, 1 and 2, in case you want to do this in a single command.
myarr=($(your_program args | awk '/Total/{print $NF;next} /cycle/{print $NF;next} /Time/{print $(NF-1)}'))
Or use the following to force all values onto a single line, in case the command substitution is double-quoted (") and would otherwise keep the newlines in the values.
myarr=($(your_program args | awk '/Total/{val=$NF;next} /cycle/{val=(val?val OFS:"")$NF;next} /Time/{print val OFS $(NF-1)}'))
Explanation: Adding detailed explanation of awk program above.
awk ' ##Starting awk program from here.
/Total/{ ##Checking if a line has Total keyword in it then do following.
print $NF ##Printing last field of that line which has Total in it here.
next ##next keyword will skip all further statements from here.
}
/cycle/{ ##Checking if a line has cycle in it then do following.
print $NF ##Printing last field of that line which has cycle in it here.
next ##next keyword will skip all further statements from here.
}
/Time/{ ##Checking if a line has Time in it then do following.
print $(NF-1) ##Printing 2nd last field of that line which has Time in it here.
}'
To access the individual items:
echo ${myarr[0]}, echo ${myarr[1]} and echo ${myarr[2]} for Total, cycle and Time respectively.
Example to access all elements by loop in case you need:
for i in "${myarr[@]}"
do
echo $i
done
You can execute your program once and save the output at a variable.
o0=$(./progname args)
Then you can grep that saved string as many times as you like:
o1=$(echo "$o0" | grep "Time" | grep -o -E '[0-9]+.[0-9]+')
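The two pieces combine into a write() that runs the program once per iteration. A sketch, with a stub progname function standing in for ./progname args (the stub just echoes the relevant sample lines from the question):

```shell
#!/bin/bash
# Stub standing in for "./progname args"; it echoes the sample output lines.
progname() {
  echo '> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds'
}

write() {
  local out o1 o2 o3
  out=$(progname)                                                 # single run
  o1=$(echo "$out" | grep "Time"  | grep -o -E '[0-9]+\.[0-9]+')
  o2=$(echo "$out" | grep "cycle" | grep -o -E '[0-9]+\.[0-9]+')
  o3=$(echo "$out" | grep "Total" | grep -o -E '[0-9]+')
  echo "$1 $o1 $o2 $o3"
}

write 1    # -> 1 0.042127 3.386 170620482
```

Note the regexes escape the dot (\.) so it matches a literal decimal point rather than any character, and the Total line uses a plain integer pattern.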
Assumptions:
each of the 3x search patterns (Time, cycle, Total) occur just once in a set of output from ./progname
format of ./progname output is always the same (ie, same number of space-separated items for each line of output)
I've created my own progname script that just does an echo of the sample output:
$ cat progname
echo ">>> VARIANT 1 <<<
Random number generator seed is 0xea3495cc76b34acc
Generate matrix 128 x 128 (16 KiB)
Performing 1024 random walks of 4096 steps.
> Total instructions: 170620482
> Instructions per cycle: 3.386
Time elapsed: 0.042127 seconds
Walks accrued elements worth: 534351478"
One awk solution to parse and print the desired values:
$ i=1
$ ./progname | awk -v i=${i} ' # assign awk variable "i" = ${i}
/Time/ { o1 = $3 } # o1 = field 3 of line that contains string "Time"
/cycle/ { o2 = $5 } # o2 = field 5 of line that contains string "cycle"
/Total/ { o3 = $4 } # o3 = field 4 of line that contains string "Total"
END { printf "%s %s %s %s\n", i, o1, o2, o3 } # print 4x variables to stdout
'
1 0.042127 3.386 170620482
I have one text file a.txt whose contents are shown below. I do not want "wide" printed as a violation, but it shows up in the output of the command I used. Can anybody help me with this so that "wide" does not appear as a violation?
Command used by me :
awk '{
if ($0 =="") {rsave=0}
else {if (rsave==0) rule=$1; rsave=1};
if ($0 ~ ":.* [1-9] violations? found")
{printf "%s\n", $1; rsave=0}
else if ($0 ~ "[1-9] violations? found")
{printf "%s\n", rule; rsave=0}}' a.txt \
| sort -u
Output which is coming by using the above command:
DM5.S.7:IP_TIGHTEN_BOUNDARY
DM6.S.7:IP_TIGHTEN_BOUNDARY
text_net:text_short
wide
Expected output:
DM5.S.7:IP_TIGHTEN_BOUNDARY
DM6.S.7:IP_TIGHTEN_BOUNDARY
text_net:text_short
a.txt file contents:
ERROR SUMMARY
DM5.S.7:IP_TIGHTEN_BOUNDARY : To avoid > 1.4 um x
1.4 um Metal empty space after IP abutment Metal
empty space must <= 0.7 um x 1.4 um on IP boundary
edge Metal empty space must <= 0.7 um x 0.7 um on
IP boundary corner
contains ........................................... 1 violation found.
wide ............................................... 4 violations found.
DM6.S.7:IP_TIGHTEN_BOUNDARY : To avoid > 1.4 um x
1.4 um Metal empty space after IP abutment Metal
empty space must <= 0.7 um x 1.4 um on IP boundary
edge Metal empty space must <= 0.7 um x 0.7 um on
IP boundary corner
contains ........................................... 1 violation found.
wide ............................................... 4 violations found.
Violation
text_net:text_short ................................ 4 violations found.
text_abcd:text_short ................................ 0 violations found.
The easiest way to do it might be to pipe through a grep -v wide after the awk and the sort.
From grep's manpage:
-v, --invert-match Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
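For example, anchoring the pattern (^wide$) keeps grep -v from also discarding legitimate names that merely contain the substring wide. A sketch on stand-in data (the input names are made up):

```shell
# Stand-in for the awk output; ^wide$ drops only the bare "wide" line.
printf 'DM5.S.7:IP_TIGHTEN_BOUNDARY\nwide\ntext_net:text_short\nwide_net:ok\n' |
  sort -u | grep -v '^wide$'
```

Without the anchors, the made-up name wide_net:ok would be discarded as well.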
Need help with "printf" and "for" loop.
I have individual files each named after a user (e.g. john.txt, david.txt) and contains various commands that each user ran. Example of commands are (SUCCESS, TERMINATED, FAIL, etc.). Files have multiple lines with various text but each line contains one of the commands (1 command per line).
Sample:
command: sendevent "-F" "SUCCESS" "-J" "xxx-ddddddddddddd"
command: sendevent "-F" "TERMINATED" "-J" "xxxxxxxxxxx-dddddddddddddd"
I need to go through each file, count the number of each command and put it in another output file in this format:
==== John ====
SUCCESS - 3
TERMINATED - 2
FAIL - 4
TOTAL 9
==== David ====
SUCCESS - 1
TERMINATED - 1
FAIL - 2
TOTAL 4
P.S. This code can be made more compact, e.g there is no need to use so many echo's etc, but the following structure is being used to make it clear what's happening:
ls | grep '\.txt$' | sed 's/\.txt$//' > names
for s in $(cat names)
do
suc=$(grep "SUCCESS" "$s.txt" | wc -l)
termi=$(grep "TERMINATED" "$s.txt"|wc -l)
fail=$(grep "FAIL" "$s.txt"|wc -l)
echo "=== $s ===" >>docs
echo "SUCCESS - $suc" >> docs
echo "TERMINATED - $termi" >> docs
echo "FAIL - $fail" >> docs
echo "TOTAL $(($termi+$fail+$suc))">>docs
done
Output from my test files was like :
===new===
SUCCESS - 0
TERMINATED - 0
FAIL - 0
TOTAL 0
===vv===
SUCCESS - 0
TERMINATED - 0
FAIL - 0
TOTAL 0
Based on karafka's suggestion, instead of using the above lines for the for loop you can directly use the following:
for f in *.txt
do
something
#in order to print the required name without the .txt suffix you can do a
printf "%s\n" "${f%.txt}"
done
awk to the rescue!
$ awk -vOFS=" - " 'function pr() {s=0;
for(k in a) {s+=a[k]; print k,a[k]};
print "\nTOTAL "s"\n\n\n"}
NR!=1 && FNR==1 {pr(); delete a}
FNR==1 {print "==== " FILENAME " ===="}
{a[$4]++}
END {pr()}' file1 file2 ...
If your input file is not structured (i.e. the key is not always in the fourth field), you can do the same with a pattern match.
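A sketch of that pattern-match variant, assuming the command name is always the double-quoted uppercase word following "-F" (the file name and contents below are made up, with deliberately irregular field positions):

```shell
# Made-up input; the command name is pulled out by regex, not field number.
cat > /tmp/john.txt <<'EOF'
command: sendevent "-F" "SUCCESS" "-J" "xxx-1"
extra junk command: sendevent "-F" "FAIL" "-J" "xxx-2"
command: sendevent "-F" "SUCCESS" "-J" "xxx-3"
EOF

awk 'match($0, /"-F" "[A-Z]+"/) {
       # RSTART+6 skips the  "-F" "  prefix; RLENGTH-7 also drops the closing quote
       cmd = substr($0, RSTART + 6, RLENGTH - 7)
       a[cmd]++
     }
     END { for (k in a) print k " - " a[k] }' /tmp/john.txt
```

The for (k in a) loop prints the counts in an unspecified order; pipe through sort if a fixed order matters.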
I need to keep the warnings from my script log and add "LAST" to every existing line on each start, so I can tell at a glance when an alert occurred. So I added this as the first line of my script:
echo "$( cat $ALERT_LOG_FILE | grep WARNING | tail -n 2k | ts "LAST ")" > $ALERT_LOG_FILE
Script log looks like this at first run :
WARNING : ...
WARNING : ...
WARNING : ...
WARNING : ...
When script start/restart the echo line adds "LAST" to each line and make it like this :
LAST WARNING : ...
LAST WARNING : ...
LAST WARNING : ...
LAST WARNING : ...
Problem is the log file becomes like this after some restarts:
LAST LAST LAST LAST WARNING : ....
LAST LAST LAST WARNING : ....
LAST LAST WARNING : ....
LAST LAST WARNING : ....
LAST WARNING : ....
WARNING:
Any way to make it like this:
LAST 4 WARNING : ....
LAST 3 WARNING : ....
LAST 2 WARNING : ....
LAST 2 WARNING : ....
LAST 2 WARNING : ....
LAST 1 WARNING : ....
WARNING:
EDIT:
code with #Yoda suggestion:
echo "$(cat $LOG_FILE | grep WARNING | tail -n 2k | ts "LAST " | awk '{n=gsub("LAST ",X);if(n) print "LAST",n,$0;else print}')" > $LOG_FILE
Output log after some restarts with #Yoda's suggestion:
LAST 2 2 1 WARNING : ...
LAST 2 1 WARNING : ...
LAST 1 WARNING : ...
WARNING : ...
Based on some assumptions:-
$ awk '{n=gsub("LAST ",X);if(n) print "LAST",n,$0;else print}' file
LAST 4 WARNING : ....
LAST 3 WARNING : ....
LAST 2 WARNING : ....
LAST 2 WARNING : ....
LAST 1 WARNING : ....
WARNING:
If this is not what you are looking for, then I would suggest posting a representative sample of your log file and expected output.
Here is something that might help:-
awk '
{
n = gsub("LAST ",X)
if( n )
{
for ( i = 1; i <= NF; i++ )
{
if ( $i ~ /WARNING/ )
{
sub(/^ */,X)
print "LAST",n,$0;
next
}
if ( $i ~ /^[0-9]+$/ )
{
n += $i
$i = ""
}
}
}
else
print $0
}
'
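Run against the post-restart log from the EDIT above (a sketch; the heredoc recreates those lines), this folds any earlier digit counts into the gsub count n before printing:

```shell
# The log lines from the EDIT, after a few restarts.
cat > /tmp/alert.log <<'EOF'
LAST 2 2 1 WARNING : ...
LAST 2 1 WARNING : ...
LAST 1 WARNING : ...
WARNING : ...
EOF

awk '
{
  n = gsub("LAST ", "")                 # count and strip the LAST markers
  if (n) {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /WARNING/) {             # reached the message: emit one total
        sub(/^ */, "")
        print "LAST", n, $0
        next
      }
      if ($i ~ /^[0-9]+$/) {            # fold an earlier count into n
        n += $i
        $i = ""
      }
    }
  } else
    print
}' /tmp/alert.log
```

On this input it prints LAST 6, LAST 4 and LAST 2 for the first three lines and leaves the bare WARNING line untouched.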