Need grep statement to exclude lines - bash

I am running ufw in "open" mode just to collect stats to see if there are any attempts to access the server. UFW is running in "medium" logging so I can see all access to the server. When I check ufw.log, I need to run through the whole list manually.
I currently use:
grep 'IN=eth0' ufw.log
But this still leaves too many records for me to check manually.
What I really need is:
Grep must only look for lines that contain IN=eth0 (this part is easy)
Grep must IGNORE lines with SRC=0.0.0.0 (These are dhcp broadcasts)
Grep must IGNORE lines with SRC=10.0.1.15 (10.0.X.X is my Nagios checking ftp service)
Can someone please help?
Thank you.

I would use awk:
awk '/IN=eth0/ && !/SRC=0\.0\.0\.0/ && !/SRC=10\.0\.1\.15/' ufw.log
Since awk supports boolean operations, multiple conditions can be expressed in a pretty simple way.
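As a quick check of the awk filter, the sketch below runs it on a few made-up log lines. The addresses and fields are illustrative assumptions, not real ufw output; only the eth0 line whose source is neither excluded address should survive.

```shell
# Made-up sample lines; real ufw.log entries carry more fields.
sample='IN=eth0 OUT= SRC=0.0.0.0 DST=255.255.255.255 PROTO=UDP DPT=67
IN=eth0 OUT= SRC=10.0.1.15 DST=10.0.1.2 PROTO=TCP DPT=21
IN=eth0 OUT= SRC=203.0.113.7 DST=10.0.1.2 PROTO=TCP DPT=22
IN=lo OUT= SRC=127.0.0.1 DST=127.0.0.1 PROTO=TCP DPT=80'

# Keep lines that match the first pattern and neither of the other two.
filtered=$(printf '%s\n' "$sample" |
  awk '/IN=eth0/ && !/SRC=0\.0\.0\.0/ && !/SRC=10\.0\.1\.15/')

printf '%s\n' "$filtered"
```

Only the line with SRC=203.0.113.7 is printed.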

grep -v: makes grep exclude the lines that match the pattern.
grep -E: makes grep use extended regular expressions (here, several alternatives separated by a pipe).
grep "IN=eth0" ufw.log | grep -Ev 'SRC=0\.0\.0\.0|SRC=10\.0\.1\.15'
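One thing to watch with this kind of exclusion: a pattern ending in 10\.0\.1\.15 also matches 10.0.1.150. The sketch below assumes the log fields are separated by spaces (the sample lines are made up) and adds a trailing space to the pattern so that only the exact address is excluded.

```shell
# Made-up sample lines in a space-separated KEY=value layout.
log='IN=eth0 OUT= SRC=10.0.1.15 DST=10.0.1.2 PROTO=TCP DPT=21
IN=eth0 OUT= SRC=10.0.1.150 DST=10.0.1.2 PROTO=TCP DPT=22
IN=eth0 OUT= SRC=0.0.0.0 DST=255.255.255.255 PROTO=UDP DPT=67
IN=lo OUT= SRC=127.0.0.1 DST=127.0.0.1 PROTO=TCP DPT=80'

# The trailing space after each address anchors the end of the SRC field.
kept=$(printf '%s\n' "$log" |
  grep 'IN=eth0' |
  grep -Ev 'SRC=(0\.0\.0\.0|10\.0\.1\.15) ')

printf '%s\n' "$kept"
```

Here the 10.0.1.150 line is kept, while the 10.0.1.15 and 0.0.0.0 lines are dropped.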

With GNU sed:
sed -rn '/IN=eth0/{/(SRC=0\.0\.0\.0|SRC=10\.0\.1\.15)/!p}' ufw.log

Related

Search all occurrences of instance IDs in the variable

I have a bash variable which has the following content:
SSH exit status 255 for i-12hfhf578568tn
i-12hdfghf578568tn is able to connect
i-13456tg is not able to connect
SSH exit status 255 for 1.2.3.4
I want to search the string starting with i- and then extract only that instance id. So, for the above input, I want to have output like below:
i-12hfhf578568tn
i-12hdfghf578568tn
i-13456tg
I am open to use grep, awk, sed.
I am trying to achieve my task by using following command but it gives me whole line:
grep -oE 'i-.*'<<<$variable
Any help?
You can just change your grep command to:
grep -oP 'i-[^\s]*' <<<$variable
Tested on your input:
$ cat test
SSH exit status 255 for i-12hfhf578568tn
i-12hdfghf578568tn is able to connect
i-13456tg is not able to connect
SSH exit status 255 for 1.2.3.4
$ var=`cat test`
$ grep -oP 'i-[^\s]*' <<<$var
i-12hfhf578568tn
i-12hdfghf578568tn
i-13456tg
grep is exactly what you need for this task. sed would be more suitable if you had to reformat the input, and awk would be a good fit if you had to reformat a string or compute something from fields in the rows or columns.
Explanation:
-P is to use perl regex
i-[^\s]* is a regex that matches a literal i- followed by zero or more non-whitespace characters. You could replace the * with a + to require at least one character after the -, or use the {min,max} syntax to impose a range.
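To make the difference between * and + concrete, here is a small sketch on made-up input. It uses the POSIX class [^[:space:]] with grep -E rather than -P, so it does not depend on PCRE support.

```shell
# Made-up input: the first line has a bare "i-" with nothing after it.
text='i- alone
i-abc123 ok'

# "*" allows zero characters after "i-", so the bare "i-" is reported too.
star=$(printf '%s\n' "$text" | grep -oE 'i-[^[:space:]]*')

# "+" requires at least one character after "i-".
plus=$(printf '%s\n' "$text" | grep -oE 'i-[^[:space:]]+')

printf 'with *:\n%s\n' "$star"
printf 'with +:\n%s\n' "$plus"
```

With *, both i- and i-abc123 are printed; with +, only i-abc123.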
Let me know if there is something unclear.
Bonus:
Following Sundeep's comment, you can use one of these improved versions of the regex I proposed (the first one uses PCRE and the second one a POSIX regex):
grep -oP 'i-\S*' <<<$var
or
grep -o 'i-[^[:blank:]]*' <<<$var
You could also use the following (tested with GNU awk):
echo "$var" | awk -v RS='[ |\n]' '/^i-/'
You can also use this command (tested on Unix):
echo "$test" | grep -o 'i-[[:alnum:]]*'
Here,
-o # Prints only the matching part of each line
i-[[:alnum:]]* # Matches i- followed by any run of letters and digits. (A range such as [0-z] would also match the punctuation that sits between the digits and the letters in ASCII.)

bash how to extract a field based on its content from a delimited string

Problem - I have a set of strings that essentially look like this:
|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|...|ZZZZZZZZZ|
The '...' denotes omitted fields.
Please note that the fields between the pipes ('|') can appear in ANY ORDER and not all fields are necessarily present. My task is to find the "XXXXXXX" field and extract it from the string; I can specify that field with a regex and find it with grep/awk/etc., but once I have that one line extracted from the file, I am at a loss as to how to extract just that text between the pipes.
My searches have turned up splitting the line into individual fields and then extracting the Nth field, however, I do not know what N is, that is the trick.
I've thought of splitting the string by the delimiter, substituting the delimiter with a newline, piping those lines into a grep for the field, but that involves running another program and this will be run on a production server through near-TB of data, so I wanted to minimize program invocations. And I cannot copy the files to another machine nor do I have the benefit of languages like Python, Perl, etc., I'm stuck with the "standard" UNIX commands on SunOS. I think I'm being punished.
Thanks
As an example, let's extract the field that matches MyField:
Using sed
$ s='|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|12MyField34|ZZZZZZZZZ|'
$ sed -E 's/.*[|]([^|]*MyField[^|]*)[|].*/\1/' <<<"$s"
12MyField34
Using awk
$ awk -F\| -v re="MyField" '{for (i=1;i<=NF;i++) if ($i~re) print $i}' <<<"$s"
12MyField34
Using grep -P
$ grep -Po '(?<=\|)[^|]*MyField[^|]*' <<<"$s"
12MyField34
The -P option requires GNU grep.
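Because the field can sit at any position, it may help to test with the field in different places. The sketch below reuses the awk loop on made-up lines; the id=NUMBER field and the anchored regex are assumptions for illustration. Anchoring with ^ and $ makes the whole field match, and break stops scanning a line after the first hit.

```shell
# Made-up lines: the wanted field is third, first, or absent.
lines='|AAA|BBB|id=42|ZZZ|
|id=7|CCC|
|DDD|EEE|'

# Scan every pipe-separated field and print the first one matching re.
found=$(printf '%s\n' "$lines" |
  awk -F'|' -v re='^id=[0-9]+$' '{
    for (i = 1; i <= NF; i++)
      if ($i ~ re) { print $i; break }
  }')

printf '%s\n' "$found"
```

This prints id=42 and id=7; the line without the field produces no output.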
$ sed -e 's/^.*|\(XXXXXXXXX\)|.*$/\1/'
Naturally, this only makes sense if XXXXXXXXX is a regular expression.
This should be really fast if you pre-filter the lines with grep, something like:
$ grep '|XXXXXXXXX|' somefile | sed -e ...
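A complete version of that pipeline might look like the sketch below. The X[0-9]* pattern stands in for the real field regex and the input lines are made up; in basic regular expressions the | is a literal pipe, so it needs no escaping here.

```shell
# Made-up lines: the wanted field is in the middle, absent, or first.
lines='|AAAAAA|X123|ZZZZZZZZZ|
|BBBBBB|CCCCCCC|
|X9|AAAAAA|'

# grep discards lines without the field; sed keeps only the captured field.
extracted=$(printf '%s\n' "$lines" |
  grep '|X[0-9]*|' |
  sed -e 's/^.*|\(X[0-9]*\)|.*$/\1/')

printf '%s\n' "$extracted"
```

This prints X123 and X9.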
One hackish way -
sed 's/^.*|\(<whatever your regex is>\)|.*$/\1/'
but that might be too slow for your production server since it may involve a fair amount of regex backtracking.

Does grep support the OR in a group?

I am looking at this question: https://leetcode.com/problems/valid-phone-numbers/
which asks for a command to extract the phone numbers.
I found this command works:
cat file.txt | grep -Eo '^(\([0-9]{3}\) ){1}[0-9]{3}-[0-9]{4}$|^([0-9]{3}-){2}[0-9]{4}$'
while this failed:
cat file.txt | grep -E '(^(\([0-9]{3}\))|^([0-9]{3}-))[0-9]{3}-[0-9]{4}'
I don't know why the second failed. Is it because grep doesn't support OR in a group?
No, it's because you dropped the space, so a space in a phone number is no longer allowed.
Also, the grouping in your regex seems to be off by a whack or two. What are you actually trying to express?
Finally, you have a useless use of cat -- grep can perfectly well read one or more input files without the help of cat.
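To confirm that alternation inside a group works with grep -E, here is one possible grouped form, run on made-up sample numbers. It is a sketch of the idea rather than the only correct pattern: the group holds the two allowed prefixes, the space stays inside the first alternative, and the trailing $ anchors the end.

```shell
# Made-up sample input: two valid numbers and one invalid one.
numbers='987-123-4567
123 456 7890
(123) 456-7890'

# Either "(xxx) " or "xxx-" as the prefix, then "xxx-xxxx".
valid=$(printf '%s\n' "$numbers" |
  grep -E '^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$')

printf '%s\n' "$valid"
```

The first and third numbers are printed; the one separated by spaces is rejected.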

grep or sed pattern matching of domain name and truncating of subdomain?

I am trying to extract a list of domain names from a httrack data stream using grep. I have it close to working, but the result also includes any and all sub-domains.
httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" | grep -iEo "([0-9,a-z\.-]+)\.(com)"
Here is my current example result:
domain1.com
domain2.com
www.domain3.com
subdomain.domain4.com
whatever.domain5.com
Here is my desired example result.
domain1.com
domain2.com
domain3.com
domain4.com
domain5.com
Is there something I can add to this grep expression, or should I pipe it to a new sed expression to truncate any subdomains? And if so, how do I accomplish this task? I'm stuck. Any help is much appreciated.
Regards,
Wyatt
You could drop the . from the bracket expression in the grep pattern, so that a match can no longer span several subdomain labels. The following should work:
httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" |
grep -iEo '[[:alnum:]-]+\.(com|net|org)'
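As a quick check that does not need httrack, the sketch below feeds the host names from the question straight into the same grep expression.

```shell
# Host names taken from the example result in the question.
hosts='domain1.com
domain2.com
www.domain3.com
subdomain.domain4.com
whatever.domain5.com'

# Without "." in the bracket expression, only the last label before the TLD is kept.
domains=$(printf '%s\n' "$hosts" | grep -iEo '[[:alnum:]-]+\.(com|net|org)')

printf '%s\n' "$domains"
```

This prints domain1.com through domain5.com. Treat it as a rough filter: a label that itself starts with com, net, or org (for example www.commerce.com) would yield a false match such as www.com.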
If you only want .com domains, the following will work: it skips http:// or https:// and any subdomains. As written, it only handles .com.
/(?:https?:\/\/[a-z0-9.]*?)([a-zA-Z0-9-]*\.com)/
Example Dataset
http://www.ilovefreestuff.com/
https://test.ilovefreestuff.com/
https://test.sub.ilovefreestuff.com/
That said, parsing or validating domain names with a regex is generally bad practice, because there are too many variants to account for fully. The exception is when the matching conditions and the dataset are clearly defined and limited. This post has more details on the process and covers a few more situations.
I use this command, which includes all domains and subdomains:
grep -oE '[[:alnum:]_.-]+[.][[:alnum:]_.-]+' file_name | sed -re 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}//g' | sort -u > test.txt

Grep with number

How can I grep for a number? I have the following data:
[240465] SERVICE ALERT: localhost;demo-jms2:Critical Services;CRITICAL;SOFT;2;Disk Space,OutConnectorResponse-MCASMS,OutConnectorResponse-SPONSORED-SMS,
[240465] SERVICE EVENT HANDLER: localhost;demo-jms2:Critical Services;CRITICAL;SOFT;1;notify-service-by-email
I want to grep this data for the number 2, which is the parameter after SOFT. My problem is that I get both lines when I grep for 2, because 2 also appears in the timestamp of the second line.
There is already a bunch of answers recommending fixes to your grep expression. It can be the right thing to do if you work on a problem interactively, and a quick hack that narrows down the results right now is enough for you.
If you're writing a script, I would recommend something like this awk command:
awk -F';' '{if ($5==2) print}'
Or a more readable equivalent provided by @sudo_O in a comment:
awk -F';' '$5==2'
First we have to reconstruct a specification of what you want from your example. You're in the best position to do it reliably, but it looks like you want to find 2 in the fifth field of semicolon-separated fields. That's what the above command does.
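To see the field-based match in action, the sketch below runs it on the two lines from the question. The extra $4 == "SOFT" test is an assumption about what is wanted; drop it to match the answer exactly.

```shell
# The two log lines from the question.
data='[240465] SERVICE ALERT: localhost;demo-jms2:Critical Services;CRITICAL;SOFT;2;Disk Space,OutConnectorResponse-MCASMS,OutConnectorResponse-SPONSORED-SMS,
[240465] SERVICE EVENT HANDLER: localhost;demo-jms2:Critical Services;CRITICAL;SOFT;1;notify-service-by-email'

# Split on ";" and compare whole fields, so a 2 elsewhere in the line is ignored.
matches=$(printf '%s\n' "$data" | awk -F';' '$4 == "SOFT" && $5 == 2')

printf '%s\n' "$matches"
```

Only the SERVICE ALERT line is printed, even though both lines contain a 2 in other places.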
Simply use the pattern ;2;
$ grep ';2;' file
[240465] SERVICE ALERT: localhost;demo-jms2:Critical Services;CRITICAL;SOFT;2;...
Or if you only want to match 2 following SOFT then use ;SOFT;2;
$ grep ';SOFT;2;' file
[240465] SERVICE ALERT: localhost;demo-jms2:Critical Services;CRITICAL;SOFT;2;...
Grep with more context:
grep "SOFT;2;" data
Mind the quotes, otherwise the ";" will be interpreted by the shell.
Try this:
grep 'SERVICE.*ALERT.*SOFT;2' file
I added more patterns to make the match unique.
grep -Po "(?<=SOFT;)\d+"
with your data
kent$ echo "[240465] SERVICE ALERT: localhost;demo-jms2:Critical
Services;CRITICAL;SOFT;2;Disk Space,OutConnectorResponse-MCASMS,OutConnectorResponse-SPONSORED-SMS,
[240465] SERVICE EVENT HANDLER: localhost;demo-jms2:Critical Services;CRITICAL;SOFT;1;notify-service-by-email"|grep -Po "(?<=SOFT;)\d+"
2
1
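Where grep -P is not available, a similar extraction can be sketched with plain sed. This is a substitute technique, not the lookbehind itself: the capture group keeps only the digits that follow SOFT;.

```shell
# The two log lines from the question.
data='[240465] SERVICE ALERT: localhost;demo-jms2:Critical Services;CRITICAL;SOFT;2;Disk Space,OutConnectorResponse-MCASMS,OutConnectorResponse-SPONSORED-SMS,
[240465] SERVICE EVENT HANDLER: localhost;demo-jms2:Critical Services;CRITICAL;SOFT;1;notify-service-by-email'

# -n with the p flag prints only the lines where the substitution succeeded.
values=$(printf '%s\n' "$data" | sed -n 's/.*SOFT;\([0-9][0-9]*\).*/\1/p')

printf '%s\n' "$values"
```

As with the grep -Po version, this prints 2 and then 1.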
