Logfile reformatting with regex - bash

I'm using grep to filter certain lines from a log file and present them to my conky config.
The log file is /var/log/messages.
The entries pertain to UFW block events.
The trouble is that I only care about certain strings of each line.
I can grep only the UFW block lines, but each line is too long to fit in conky.
Even if conky were not part of the equation, learning to show only pieces of a log line would benefit me in the future.
I have got somewhere by using the following:
grep -Ewoh '(IN=([a-z]){4,})|((DST|SRC)=(([0-9]){1,3}\.){3,}([0-9]){1,3})|(PROTO=[a-z]{2,6})|((SPT|DPT)=[0-9]{1,5})' /var/log/messages
This ugly-looking regex filters for entries like this:
IN=wlan0
SRC=10.10.123.23
DST=192.168.41.23
PROTO=TCP
SPT=443
DPT=41080
Which breaks down, (nearly) line for line, as follows:
'
IN=([a-z]){4,}
(DST|SRC)=(([0-9]){1,3}\.){3,}([0-9]){1,3}
PROTO=[a-z]{2,6}
(SPT|DPT)=[0-9]{1,5}
'
The problem is that this prints a new line for each matching word, whereas I want the matched strings from each log line kept together on their own line.
$ grep -Ewoh '(IN=([a-z]){4,})|((DST|SRC)=(([0-9]){1,3}\.){3,}([0-9]){1,3})|(PROTO=[a-z]{2,6})|((SPT|DPT)=[0-9]{1,5})' /var/log/messages
IN=wlan
SRC=103.81.76.20
DST=172.31.77.54
SPT=443
DPT=41080
$
I'd rather not use a very complicated awk method unless I can walk back into it a few months later and remember it easily. awk is incredible, but it can be difficult to digest if you drop the ball just once!
Thank you.

If I understand correctly, instead of the list produced by grep -o, you want to remove the non-matching strings and print only the matching strings in place, i.e. in the lines and order in which they appear.
Using gawk's FPAT:
gawk -v FPAT='my-regex' '$1=$1'
Replace my-regex with the regex for strings you want to see.
This will print the matches on each line, in order, delimited by a space.
Add -v OFS= to remove the space, or for example, -v OFS=', ' to change the delimiting string.
You were using grep -w to match a whole word. You can do this in a gawk regex by using \< and \> for left and right word boundary (respectively).
For example, add parentheses and word boundaries around the whole list of 'or' operators (|):
-v FPAT='\<((IN=([a-z]){4,})|((DST|SRC)=(([0-9]){1,3}\.){3,}([0-9]){1,3})|(PROTO=[a-z]{2,6})|((SPT|DPT)=[0-9]{1,5}))\>'
Note the bugs in your regex, as tshiono commented: for example, PROTO=[a-z]{2,6} won't match PROTO=TCP because the value is uppercase, and IN=([a-z]){4,} stops short of the 0 in wlan0.
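Putting it together for this log, here is a minimal sketch, assuming /var/log/messages as before, widening the character classes so that wlan0 and PROTO=TCP actually match, and dropping the word boundaries for brevity (the KEY= prefixes are specific enough for this data):
gawk -v OFS=', ' -v FPAT='IN=[a-z0-9]{4,}|(DST|SRC)=([0-9]{1,3}[.]){3}[0-9]{1,3}|PROTO=[A-Za-z]{2,6}|(SPT|DPT)=[0-9]{1,5}' '$1=$1' /var/log/messages
For the sample entry above, each matching line should come out as something like:
IN=wlan0, SRC=10.10.123.23, DST=192.168.41.23, PROTO=TCP, SPT=443, DPT=41080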

Related

Grepping for exact string while ignoring regex for dot character

So here's my issue. I need to develop a small bash script that can grep a file containing account names (let's call it file.txt). The contents would be something like this:
accounttest
account2
account
accountbtest
account.test
Matching an exact line SHOULD be easy but apparently it's really not.
I tried:
grep "^account$" file.txt
The output is:
account
So in this situation the output is OK, only "account" is displayed.
But if I try:
grep "^account.test$" file.txt
The output is:
accountbtest
account.test
So the next obvious solution that comes to mind, in order to stop interpreting the dot character as "any character", is using fgrep, right?
fgrep account.test file.txt
The output, as expected, is correct this time:
account.test
But what if I try now:
fgrep account file.txt
Output:
accounttest
account2
account
accountbtest
account.test
This time the output is completely wrong, because I can't use the beginning/end line characters with fgrep.
So my question is, how can I properly grep a whole line, including the beginning and end of line special characters, while also matching exactly the "." character?
EDIT: Please note that I do know that the "." character needs to be escaped, but in my situation, escaping is not an option, because of further processing that needs to be done to the account name, which would make things too complicated.
The . is a special character in regex notation; it needs to be escaped to be matched as a literal when passed to grep, so do:
grep "^account\.test$" file.txt
Or, if you cannot afford to modify the search string, use the -F flag so that grep treats it as a literal string and does no regex processing on it:
grep -Fx 'account.test' file.txt
From man grep
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
-x, --line-regexp
Select only those matches that exactly match the whole line. For a regular expression pattern, this is like parenthesizing the pattern and then surrounding it with ^ and $.
fgrep is the same as grep -F. grep also has the -x option which matches against whole lines only. You can combine these to get what you want:
grep -Fx account.test file.txt
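Since the question mentions further processing of the account name, here is a quick sketch with the name held in a shell variable (the variable name is just for illustration); -F keeps the dot literal and -x anchors the whole line:
$ name='account.test'   # hypothetical variable produced by your further processing
$ grep -Fx "$name" file.txt
account.test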

How to detect some pattern with grep -f on a file in terminal, and extract those lines without the pattern

I'm on mac terminal.
I have a txt file, allofthem.txt, with one column of 9 IDs, where every ID starts with "rs":
rs382216
rs11168036
rs9296559
rs9349407
rs10948363
rs9271192
rs11771145
rs11767557
rs11
Also, I have another txt file, useful.txt, with the IDs that were useful in an analysis I did. It looks the same, one column with several rows of IDs, but with fewer of them:
rs9349407
rs10948363
rs9271192
rs11
Problem: I want to generate a new txt file with the non-useful IDs (the ones that appear in allofthem.txt but not in useful.txt).
I want to do the inverse of:
grep -f useful.txt allofthem.txt
I want to use some systematic way of deleting all the IDs in useful.txt and obtaining a file with the remaining ones. Maybe with awk or sed, but I can't see it. Can you help me, please? Thanks in advance!
Desired output:
rs382216
rs11168036
rs9296559
rs11771145
rs11767557
-v option does the inverse for you:
grep -vxf useful.txt allofthem.txt > remaining.txt
-x option matches the whole line in allofthem.txt, not parts.
As @hek2mgl rightly pointed out, you need -F if you want to treat the content of useful.txt as fixed strings and not patterns:
grep -vxFf useful.txt allofthem.txt > remaining.txt
Make sure your files have no leading or trailing white spaces - they could affect the results.
I recommend using awk:
awk 'FNR==NR{patterns[$0];next} !($0 in patterns)' useful.txt allofthem.txt
Explanation:
FNR==NR is true as long as we are reading useful.txt. We create an entry in the array patterns for every line of useful.txt. next stops further processing of that line.
!($0 in patterns) runs, because of the previous next statement, only on the lines of allofthem.txt. For every line of that file it checks whether the line is a key in patterns; if it is not, the condition is true and awk prints the line.
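As a quick check against the two sample files (assuming they contain exactly the IDs shown above), the corrected command should print the desired leftovers; redirect to remaining.txt if you want them in a file:
$ awk 'FNR==NR{patterns[$0];next} !($0 in patterns)' useful.txt allofthem.txt
rs382216
rs11168036
rs9296559
rs11771145
rs11767557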

Cutting a string using multiple delimiters using the awk or sed commands

I am using a SIPP server simulator to verify incoming calls.
What I need to verify is the caller ID and the dialed digits. I've logged this information to a file, which now contains, for example, the following:
From: <sip:972526134661#server>;tag=60=.To: <sip:972526134662#server>}
in each line.
What I want is to modify it to a csv file containing simply the two phone numbers, such as follows:
972526134661,972526134662
and etc.
I've tried using the awk -F option, but then I can only use sip: as the delimiter, or # or / as the delimiter.
Basically, what I want to do is take all the strings which begin with a < and end with >, and then take the strings that follow the sip: delimiter.
Using the cut command is also not an option, as I understand it cannot use strings as delimiters.
I guess it should be really simple, but I haven't found quite the right thing to use. Would appreciate the help, thanks!
OK, for fun, picking some random data (from your original post) and using awk -F as you originally wanted.
Note that, because your file is "generated", we can assume a regular format for the data and not expect the "short" patterns to cause mis-hits.
[g]awk -F'sip:|#' -v OFS="," '{print $2,$4}' yourlogfile
It uses both sip: and # as the Field Separator, by means of the alternation operator |. It can easily be extended to allow further characters or strings to also be used to separate fields in the input if required. The built-in variable FS can contain a regular expression/regexp like this.
For that first sample in your question, it yields this:
972526134661,972526134662
For the latest (revision 8) version, and guessing what you want:
[g]awk -F'sip:|#|to_number:' -v OFS="," '{print $2,$5}' yourlogfile
Yields this:
from_number,972526134662
The [g]awk is because I used gawk on my machine, and got the same behaviour with plain awk.
A slight amendment in style, suggested by @fedorqui: use the command-line option -v to set the Output Field Separator (an AWK built-in variable which, like any other variable, can be set with -v), and separate the print fields with a comma so that they are treated in the output as separate fields, rather than building a string with a hard-coded "," and treating it as one field.
I would suggest using sed to extract the two numbers:
$ sed -n 's/^From: <sip:\([0-9]*\).*To: <sip:\([0-9]*\).*/\1,\2/p' file
972526134661,972526134662
The regular expression matches a line beginning with From and captures the two numbers after <sip:. If the amount of whitespace varies, you may want to add a space followed by * in those places.
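For instance, a sketch of that variant which tolerates any amount of whitespace before each <sip: (assuming the rest of the format stays the same):
$ sed -n 's/^From: *<sip:\([0-9]*\).*To: *<sip:\([0-9]*\).*/\1,\2/p' file
972526134661,972526134662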
You can use a regex replace, as long as the format stays the same (order is always From/To):
sed -E "s/^.*sip:([0-9]+)#.*sip:([0-9]+)#.*$/\1,\2/"
It's not a very specific or perfect solution, but in most cases an approach like this is enough.
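A quick check against the sample line from the question:
$ echo 'From: <sip:972526134661#server>;tag=60=.To: <sip:972526134662#server>}' | sed -E "s/^.*sip:([0-9]+)#.*sip:([0-9]+)#.*$/\1,\2/"
972526134661,972526134662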

How to pull a value from between 2 strings which occur several times in a file

I am trying to pull the value from between 2 strings and line-break each result. I am then hoping to combine this with another value from the same document pulled the same way. The problem is there are NO line breaks in this file and it is quite large. Here is an example of the file.
<ID>47</ID><DATACENTER_ID>36</DATACENTER_ID><DNS_NAME>myhost.domain.local</DNS_NAME> <IP_ADDRESS>10.0.0.1</IP_ADDRESS><ID>60</ID><DATACENTER_ID>36</DATACENTER_ID><DNS_NAME>yourhost.domain.local</DNS_NAME><IP_ADDRESS>10.0.0.2</IP_ADDRESS>
My end result would ideally look something like this.
ID-----DNS_NAME
47-----myhost.domain.local
60-----yourhost.domain.local
My closest attempts so far have been creating variables with grep, but I can't seem to format them into a table. I'm also very new to scripting, so forgive my ignorance.
If your grep supports -P (--perl-regexp), then you're free to use the regex below.
$ grep -oP '<ID>\K[^<>]*(?=</ID>)|<DNS_NAME>\K[^<>]*(?=</DNS_NAME>)' file | sed 'N;s/\n/-----/g'
47-----myhost.domain.local
60-----yourhost.domain.local
\K Discards the previously matched characters from printing.
(?=...) positive lookahead assertion, which asserts where the match must occur without consuming any characters.
The trailing sed 'N;s/\n/-----/g' joins each pair of grep output lines (an ID followed by its DNS_NAME) with ----- to form one row.
Here is a GNU awk solution (gawk is needed because RS has more than one character) to get your data:
awk -v RS="<ID>" -F"<|>" 'NR>1 {print $1"-----"$9}' file
47-----myhost.domain.local
60-----yourhost.domain.local
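If you want to see why it is fields 1 and 9, dumping the field layout of one record makes it clear (just a diagnostic, assuming the sample data is saved as file):
$ awk -v RS="<ID>" -F"<|>" 'NR==2 {for (i=1; i<=9; i++) print i, $i}' file
1 47
2 /ID
3 
4 DATACENTER_ID
5 36
6 /DATACENTER_ID
7 
8 DNS_NAME
9 myhost.domain.local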

Using sed/awk to process a pattern in bash

I have a command whose output is of the form:
[{"foo1":<some value>,"foo2":<some value>,"foo3":<some value>}]
I want to take the output of this command and just get the value corresponding to foo2
How do I use sed/awk or any other shell utility readily available in a bash script to do this?
Assuming that the values do not contain commas, this sed rune will do it:
sed -n 's/.*"foo2":\([^,]*\),.*/\1/'p
sed -n tells sed not to print lines by default.
The s ("substitute") command uses a regexp group delimited by \( and \) to pick out just the bit you want.
"foo2": provides the context needed to find the right value.
[^,]* means "a character that is not a comma, any number of times". This is your <some value>. If values are not delimited by commas, change this (and the comma after the grouping parens) to match correctly.
.* means "any character, any number of times", and it is used to match all the characters before and after the bit you want. Now the regexp will match the entire line.
\1 means the contents of the grouping parentheses. sed will substitute the string that matches the pattern (which is the whole line, because we used .* at the beginning and end) with the contents of the parens, i.e. <some value>.
Finally, the p on the end means "print the resulting line".
With this awk for example:
$ awk -F[:,] '{print $4}' file
<some value2>
-F[:,] sets the possible field separators to : or ,. Then it is a matter of counting the position in which the <some value> of foo2 appears; it happens to be the 4th field.
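With concrete values it is easy to see (borrowing the sample data used further down in this thread):
$ echo '[{"foo1":11,"foo2":222,"foo3":3333}]' | awk -F'[:,]' '{print $4}'
222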
With sed:
$ sed 's/.*"foo2":\([^,]*\).*/\1/g' file
<some value2>
.*"foo2":\([^,]*\).* gets the string coming after foo2: and until the comma appears. Then it prints it back with \1.
Your block of data looks like JSON. There is no native JSON parsing in bash, sed or awk, so ALL the answers here will either suggest that you use a different, more appropriate tool, or they will be hackish and might easily fail if your real data looks different from the example you've provided here.
That said, if you are confident that your variable:value blocks and line structure are always in the same format as this example, you may be able to get away with writing your own (very) basic parser that will work for just your use case.
Note that you can't really parse things in sed, it's just not designed for that. If your data always looks the same, a sed solution may be sufficient ... but remember that you are simply pattern matching, not parsing the input data. There are other answers already which cover this.
For very simple matching of the string that appears after the colon after "foo2", as Peter suggested, you could use the following:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | sed -ne 's/.*"foo2":\([^,]*\),.*/\1/p'
As I say, this should in no way be confused with parsing of your JSON. It would work equally well (or badly) with an input string of abcde"foo2":bar,abcde.
In awk, you can make things that are a bit more advanced, but you still have serious limitations when it comes to JSON. For example, if you choose to separate fields with commas, but then you put a comma inside the <some value> in your data, awk doesn't know how to distinguish it from a field separator.
That said, if your JSON is only one level deep (i.e. matches your sample data), the following might work for you:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | awk -F: -vRS=, '{gsub(/[^[:alnum:]]/,"",$1)} $1=="foo2" {print $2}'
This awk script considers commas as record separators and colons as field separators; with the sample data above it should print 222. It does not support any level of depth in your JSON, and it depends on alphanumeric variable names. But it should handle JSON split across multiple lines.
Alternatively, if you want to avoid ugly hacks, and perl or python solutions don't work for you, you might want to try out jsawk. With it, you might use something like this:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | jsawk -a 'return this.foo2'
[222]
SEE ALSO: Parsing json with awk/sed in bash to get key value pair
This worked for me. You can try this one (sample values substituted for <some value> so the line actually runs):
echo '[{"foo1":11,"foo2":222,"foo3":3333}]' | awk -F'[:,]+' '{ if ($3 == "\"foo2\"") print $4 }'
The awk above uses multiple field separators; I have used colon and comma here. Note that the third field still carries its double quotes, hence the escaped quotes in the comparison. For these sample values it prints 222.
Since this looks like JSON, let's parse it like JSON:
perl -MJSON -ne '$json = decode_json($_); print $json->[0]{foo2}, "\n"' <<END
[{"foo1":"some value","foo2":"some, value","foo3":"some value"}]
END
some, value
