Cut Mod_Security ID with sed/awk

I would like to extract the number between the quotes of a Mod_Security ID: [id "31231"]. In general this is not difficult, but I am having trouble extracting all IDs from multiple reports such as:
[Wed Oct 19 15:31:33.460342 2016] [:error] [pid 16526] [client 67.22.202.121] ModSecurity: Access denied with code 400 (phase 2). Operator EQ matched 0 at REQUEST_HEADERS. [file "/usr/local/apache/conf/includes/mod_security2.conf"] [line "4968"] [id "000784"] [hostname "example.org"] [uri "/"] [unique_id "WAfYJU1ol#MAAECO#HQAAAAI"]
[Wed Mar 19 15:31:33.460342 2016] [:error] [pid 16526] [client 67.22.202.121] ModSecurity: Access denied with code 400 (phase 2). Operator EQ matched 0 at REQUEST_HEADERS. [file "/usr/local/apache/conf/includes/mod_security2.conf"] [line "4968"] [id "9"] [hostname "example.org"] [uri "/"] [unique_id "WAfYJU1ol#MAAECO#HQAAAAI"]
[Wed Mar 19 15:31:33.460342 2016] [:error] [pid 16526] [client 67.22.202.121] ModSecurity: Access denied with code 400 (phase 2). Operator EQ matched 0 at REQUEST_HEADERS. [file "/usr/local/apache/conf/includes/mod_security2.conf"] [line "4968"] [id "00263"] [hostname "example.org"] [uri "/"] [unique_id "WAfYJU1ol#MAAECO#HQAAAAI"]
I have attempted several commands such as:
cat asd | awk '/\[id\ "/,/"]/{print}'
cat asd | sed -n '/[id "/,/"]/p'
and many others, but instead of printing just the IDs they print additional output, because the pattern matches several times. I can do something like:
cat asd | egrep -o "\"[0-9][0-9][0-9][0-9][0-9][0-9]\""
and then cut the output again, but this does not work when the ID does not contain exactly 6 digits.
I am not familiar with all the options of awk, sed and egrep and cannot find a solution.
What I would like to be printed from the log above is:
000784
9
00263
Could someone please help. Thank you in advance.

With sed:
sed -n 's/.*\[id "\([^"]*\)"].*/\1/p'
You need to consume everything before [id and after the token, so that the whole line is replaced by the captured ID.
You need to escape the opening square bracket.

With grep if pcre option is available:
$ grep -oP 'id "\K\d+' asd
000784
9
00263
id "\K matches id " without making it part of the output (\K discards what was matched so far, much like a lookbehind)
\d+ the digits following id "
With sed
$ sed -nE 's/.*id "([0-9]+).*/\1/p' asd
000784
9
00263
.*id " match up to id "
([0-9]+) capture group to save digits needed
.* rest of line
\1 entire line replaced only with required string

The IDs are in the 6th awk field when the double quote is used as the field separator (this relies on [id "..."] being the third quoted value on every line):
$ awk -F'"' '{print $6}' file
000784
9
00263
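If the position of the [id "..."] block can vary from line to line, counting fields is fragile. A minimal sketch that anchors on the [id " text instead, using only POSIX awk's match() and substr(); the function name extract_ids is made up for this example:

```shell
# extract_ids [FILE...]: print the value inside [id "..."] for every
# line that contains one (hypothetical helper name).
extract_ids() {
  awk 'match($0, /\[id "[^"]*"\]/) {
         # RSTART points at the opening bracket: skip the 5 characters
         # of the prefix and drop the 2 closing characters at the end.
         print substr($0, RSTART + 5, RLENGTH - 7)
       }' "$@"
}
```

Called as extract_ids asd, this should print one ID per log line, whatever the number of digits.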

Related

Parsing java logs for multiline entries using bash

I have loads of java logs on a Linux machine and I'm trying to find a grep expression or something else (perl, awk) that gives me the entire log entry on a match somewhere in its body. Logstash looks like it could do the job, but something with onboard tools would be way better.
An example should help best. Here is an exemplary log with 5 different entries:
25 Aug 2016 14:00:46,435 DEBUG [User][IP][rsc] An error occurred
java.Exception: Foo1
at xyz
25 Aug 2016 14:00:46,436 Foo2 [User][IP][rsc] Some error occured
25 Aug 2016 14:00:46,436 DEBUG [User][IP][rsc] Somethin occured Foo3
25 Aug 2016 14:18:18,224 XYZ [User][IP][rsc] Some problems
More: bla1
More: bla2
USER.bla.bla: Blala::123 - 456
More: Could not open something
at 567
at 890
Caused by: Foo4: Could not open another thing
at 123
at 456
... 127 more
Caused by: gaga
at a1a2a3
at b3b3b3
... 146 more
25 Aug 2016 14:18:20,118 SSO [User][IP][rsc] Process: error -
Could not Foo5
<here is a blank line>
When I search for "Foo1", I need:
25 Aug 2016 14:00:46,435 DEBUG [User][IP][rsc] An error occurred
java.Exception: Foo1
at xyz
When I search for "Foo2":
25 Aug 2016 14:00:46,436 Foo2 [User][IP][rsc] Some error occured
For "Foo3":
25 Aug 2016 14:00:46,436 DEBUG [User][IP][rsc] Somethin occured Foo3
For "Foo4":
25 Aug 2016 14:18:18,224 XYZ [User][IP][rsc] Some problems
More: bla1
More: bla2
USER.bla.bla: Blala::123 - 456
More: Could not open something
at 567
at 890
Caused by: Foo4: Could not open another thing
at 123
at 456
... 127 more
Caused by: gaga
at a1a2a3
at b3b3b3
... 146 more
And finally for "Foo5":
25 Aug 2016 14:18:20,118 SSO [User][IP][rsc] Process: error -
Could not Foo5
When I search for "Foo", everything should be returned.
Is something like this possible? Maybe even as a one liner?
I would like to use it in a Webmin Custom Commands module where I supply the expression via variable.
The only basic idea I have at the moment is search for the expression and use the "[" as pattern to identify where a new entry begins.
Thanks in advance for anybody who has an idea!

A sed solution, good for environments where awk is not allowed. The same sed command is shown in one-liner and multiline forms (-r is GNU sed's flag for extended regular expressions):
pat=$1
# oneliner form
#sed -nr '/^[0-9]{2} [a-zA-Z]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} /!{H; $!b}; x; /'"$pat"'/p; ${g; /^[0-9]{2} [a-zA-Z]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} /!q; /'"$pat"'/p }'
# multiline form
sed -nr '
/^[0-9]{2} [a-zA-Z]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} /!{H; $!b}
x
/'"$pat"'/p
${
g
/^[0-9]{2} [a-zA-Z]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} /!q
/'"$pat"'/p
}'
Uses the timestamp at the beginning of a line as the record start.
Accumulates non-timestamp lines (the record body) in the hold space.
Swaps hold space and pattern space on a record start.
Prints the record if the pattern is matched.
Special case for a record start on the last line: it has to be fetched again from the hold space and tested separately for a pattern match.
Shell quoting is needed to build the sed command from the pat shell variable.

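I set awk's RS to the timestamp pattern to get multiline records (this needs GNU awk, which accepts a regular expression as RS and stores the text that matched it in RT):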
pat=$1
awk -vpat="$pat" '
BEGIN{
RS="[0-9]{2} [a-zA-Z]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} "
}
$0 ~ pat {printf("%s%s", prt, $0)}
{prt=RT}
'
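Where GNU awk is not available, the same record logic can be sketched with a line buffer in plain awk. This is only an illustration under the assumption that every entry starts with a timestamp in the format shown in the question; the function name print_matching_entries is hypothetical, and the pattern is tested against each line of an entry rather than against the entry as a whole:

```shell
# print_matching_entries PATTERN [FILE...]
# Prints every log entry (timestamp line plus continuation lines)
# in which at least one line matches PATTERN.
print_matching_entries() {
  pat=$1
  shift
  awk -v pat="$pat" '
    # print the buffered entry if one of its lines matched, then reset
    function emit() { if (hit) printf "%s", buf; buf = ""; hit = 0 }
    # a line that starts with "DD Mon YYYY HH:MM:SS,mmm " opens a new entry
    /^[0-9][0-9] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] / { emit() }
    { buf = buf $0 "\n"; if ($0 ~ pat) hit = 1 }
    END { emit() }
  ' "$@"
}
```

With no file argument it reads standard input, so it can sit at the end of a pipeline, for example print_matching_entries Foo4 < app.log (app.log being a placeholder name).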

Collect info from multiple lines

I need to extract certain info from multiple lines (5 lines per transaction) and write the output as a CSV file. These lines come from a maillog in which every transaction has its own transaction id. Here's one sample transaction:
Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender#domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107#server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107#server01>, from=<sender#domain>, size=2488, to=<recipient#domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient#domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)
What I tried is, I made these 5 lines into 1 line and used awk to parse each column - unfortunately, the column count is not uniform.
I'm looking into getting the date/time (line 1, columns 1-3), sender, recipient, and subject (line 3, words after "CLEAN -" to the end of line)
Preferably sed or awk in bash.
Thanks!

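Explanation: file is your file.
The script keeps the current transaction id in id and the lines collected so far in block. On the first line, id takes the value of field 6 (with the trailing colon removed). The following lines are appended to block as long as they contain id; when a line does not, block is printed and block and id are reinitialized from that line. The END rule prints the last block.
awk '{cur=$6; sub(/:$/,"",cur)} id=="" || !index($0,id) {if (block!="") print block; block=$0; id=cur; next} {block=block " " $0} END{if (block!="") print block}' file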
Then you're going to have to process each line of the output.

There are many ways to approach this. Here is one example calling a simple script and passing the log filename as the first argument. It will parse the requested data and save the data separated into individual variables. It simply prints the results at the end.
#!/bin/bash
[ -r "$1" ] || { ## validate input file readable
printf "error: invalid argument, file not readable '%s'\n" "$1"
exit 1
}
while read -r line; do
## set date from line containing from/sender
if grep -q -o 'from=<' <<<"$line" &>/dev/null; then
dt=$(cut -c -15 <<<"$line")
from=$(grep -o 'from=<[a-zA-Z0-9]*#[a-zA-Z0-9]*>' <<<"$line")
sender=${from##*<}
sender=${sender%>*}
fi
## search each line for CLEAN
if grep -q -o 'CLEAN.*$' <<<"$line" &>/dev/null; then
subject=$(grep -o 'CLEAN.*$' <<<"$line")
subject="${subject#*CLEAN - }"
fi
## search line for to
if grep -q -o 'to=<' <<<"$line" &>/dev/null; then
to=$(grep -o 'to=<[a-zA-Z0-9]*#[a-zA-Z0-9]*>' <<<"$line")
to=${to##*<}
to=${to%>*}
fi
done < "$1"
printf " date : %s\n from : %s\n to : %s\n subject: \"%s\"\n" \
"$dt" "$sender" "$to" "$subject"
Input
$ cat dat/mail.log
Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender#domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107#server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107#server01>, from=<sender#domain>, size=2488, to=<recipient#domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient#domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)
Output
$ bash parsemail.sh dat/mail.log
date : Nov 17 00:15:19
from : sender#domain
to : recipient#domain
subject: "Declaration for Shared Parental Leave Allocation System"
Note: if your from/sender is not always going to be in the first line, you can simply move those lines out from under the test clause. Let me know if you have any questions.
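Since the question asks for CSV output covering many transactions, here is a rough awk sketch that groups lines by queue id and prints one row per transaction. It assumes the line layout of the sample above (the sm-mta lines carry the queue id in field 6, the CLEAN line carries it just before the closing bracket) and that the delivery line with to=< comes last; the function name maillog_to_csv and the array names are invented for this example:

```shell
# maillog_to_csv [FILE...]: print "date","sender","recipient","subject"
# for each transaction (hypothetical helper; layout assumed from the sample).
maillog_to_csv() {
  awk '
    # return the text between "key=<" and the next ">" (empty if absent)
    function angle(key,    s, i) {
      i = index($0, key "=<")
      if (!i) return ""
      s = substr($0, i + length(key) + 2)
      return substr(s, 1, index(s, ">") - 1)
    }
    # quote a CSV field, doubling any embedded double quotes
    function csv(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }

    $5 ~ /^sm-mta/ {
      id = $6; sub(/:$/, "", id)
      if (index($0, " from=<")) { when[id] = $1 " " $2 " " $3; from[id] = angle("from") }
      if (index($0, " to=<")) {
        # the delivery line closes the transaction: emit one CSV row
        print csv(when[id]) "," csv(from[id]) "," csv(angle("to")) "," csv(subj[id])
        delete when[id]; delete from[id]; delete subj[id]
      }
      next
    }
    # scanner line: the queue id sits between the last "-" and "] "
    / CLEAN - / && match($0, /-[A-Za-z0-9]+\] /) {
      id = substr($0, RSTART + 1, RLENGTH - 3)
      subj[id] = substr($0, index($0, " CLEAN - ") + 9)
    }
  ' "$@"
}
```

Because rows are keyed by queue id, interleaved transactions should not get mixed up, but messages without a CLEAN line would come out with an empty subject.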

Bash script assistance with renaming file using existing parts of filename

I'm looking for help with a bash script to do some renaming of files for me. I don't know much about bash scripting, and what I have read is overwhelming. It's a lot to know/understand for the limited applications I will probably have.
In Dropbox, my media files are named something like:
Photo Jul 04, 5 49 44 PM.jpg
Video Jun 22, 11 21 00 AM.mov
I'd like them to be renamed in the following format: 2015-07-04 1749.ext
Some difficulties:
The script has to determine if AM or PM to put in the correct 24-hour format
The year is not specified; it is safe to assume the current year
The date, minute and second have a leading zero, but the hour does not; therefore the position after the hour is not absolute
Any assistance would be appreciated. FWIW, I'm running MacOS.

Mac OSX
This uses awk to reformat the date string:
for f in *.*
do
new=$(echo "$f" | awk -F'[ .]' '
BEGIN {
split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",month)
for (i in month) {
nums[month[i]]=i
}
}
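$(NF-1)=="PM" && $4<12 {$4+=12}  # 1 PM to 11 PM become 13 to 23; 12 PM stays 12
$(NF-1)=="AM" && $4==12 {$4=0}   # 12 AM becomes 00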
{printf "%s 2015-%02i-%02i %02i%02i.%s",$1,nums[$2],$3,$4,$5,$8;}
')
mv "$f" "$new"
done
After the above was run, the files are now named:
$ ls -1 *.*
Photo 2015-07-04 1749.jpg
Video 2015-06-22 1121.mov
The above was tested on GNU awk but I don't believe that I have used any GNU-specific features.
GNU/Linux
GNU date has a handy feature for interpreting human-style date strings:
for f in *.*
do
prefix=${f%% *}
ext=${f##*.}
datestr=$(date -d "$(echo "$f" | sed 's/[^ ]* //; s/[.].*//; s/ /:/3; s/ /:/3; s/,//')" '+%F %H%M')
mv "$f" "$prefix $datestr.$ext"
done
Here is an example of the script in operation:
$ ls -1 *.*
Photo Jul 04, 5 49 44 PM.jpg
Video Jun 22, 11 21 00 AM.mov
$ bash script
$ ls -1 *.*
Photo 2015-07-04 1749.jpg
Video 2015-06-22 1121.mov
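For comparison, the conversion can also be sketched without awk or date -d, using only shell built-ins, which sidesteps the differences between the macOS and GNU tools. This assumes names exactly in the format from the question; new_name is a made-up helper, and it prints the asker's target format (no Photo/Video prefix):

```shell
# new_name FILE [YEAR]: print the new name for a Dropbox-style file name.
# The body runs in a subshell, ( ... ), so its variables do not leak.
new_name() (
  file=$1
  year=${2:-$(date +%Y)}      # the year is not in the name: default to the current one
  ext=${file##*.}
  base=${file%.*}
  # split "Photo Jul 04, 5 49 44 PM" into its space-separated parts
  read -r prefix mon day hour min sec ampm <<EOF
$base
EOF
  day=${day%,}                # "04," becomes "04"
  case $mon in
    Jan) m=01 ;; Feb) m=02 ;; Mar) m=03 ;; Apr) m=04 ;;
    May) m=05 ;; Jun) m=06 ;; Jul) m=07 ;; Aug) m=08 ;;
    Sep) m=09 ;; Oct) m=10 ;; Nov) m=11 ;; Dec) m=12 ;;
    *) echo "unrecognized month in: $file" >&2; exit 1 ;;
  esac
  hour=$(( ${hour#0} % 12 ))  # 12 AM and 12 PM both become 0 here
  if [ "$ampm" = PM ]; then hour=$(( hour + 12 )); fi
  printf '%s-%s-%s %02d%s.%s\n' "$year" "$m" "$day" "$hour" "$min" "$ext"
)

# Possible use, once the printed names look right:
# for f in Photo*.jpg Video*.mov; do mv -- "$f" "$(new_name "$f")"; done
```

Two files taken in the same minute would map to the same name, so a real script would need to check for collisions before calling mv.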

This is not a simple parse-and-reformat of a date, but it isn't that difficult either. Bash parameter expansion (substring removal) is all you need to split the filename into pieces that date can read; date -d (a GNU date option) then generates a new date string, in a format suitable for a filename, from the contents of the original filename.
Note: the following presumes the Dropbox filenames are in the format you have specified. It doesn't care what the first part of the name or the extension is, as long as the rest matches that format. Here is an example of isolating the pieces of the filename needed to generate the date.
Further, all spaces have been removed from the filename. While you originally showed a space between the day and the hours, I will not provide an example of poor practice by inserting a space in a filename. As such, the spaces have been replaced with '_' and '-':
#!/bin/bash
# Photo Jul 04, 5 49 44 PM.jpg
# Video Jun 22, 11 21 00 AM.mov
# fn="Photo Jul 04, 5 49 44 PM.jpg"
fn="Video Jun 22, 11 21 00 AM.mov"
ext=${fn##*.} # determine extension
prefix=${fn%% *} # determine prefix (Photo or Video)
datestr=${fn%.${ext}} # remove extension from filename
datestr=${datestr#${prefix} } # remove prefix from datestr
day=${datestr%%,*} # isolate Month and date in day
ampm=${datestr##* } # isolate AM/PM in ampm
datestr=${datestr% ${ampm}} # remove ampm from datestr
timestr=${datestr##*, } # isolate time in timestr
timestr=$(tr ' ' ':' <<<"$timestr") # translate spaces to ':' using herestring
cmb="$day $timestr $ampm" # create combined date/proper format
## create date/time string for filename
datetm=$(date -d "$cmb" '+%Y%m%d-%H%M')
newfn="${prefix}_${datetm}.${ext}"
## example moving of file to new name
# (assumes you handle the path correctly)
printf "mv '%s' %s\n" "$fn" "$newfn"
# mv "$fn" "$newfn" # uncomment to actually use
exit 0
Example/Output
$ bash dateinfname.sh
mv 'Video Jun 22, 11 21 00 AM.mov' Video_20150622-1121.mov

Using sed to extract a substring in curly brackets

I've currently got a string as below:
integration#{Wed Nov 19 14:17:32 2014} branch: thebranch
This is contained in a file, and I parse the string. However I want the value between the brackets {Wed Nov 19 14:17:32 2014}
I have zero experience with Sed, and to be honest I find it a little cryptic.
So far I've managed to use the following command, however the output is still the entire string.
What am I doing wrong?
sed -e 's/[^/{]*"\([^/}]*\).*/\1/'

To get the value between { and }:
$ sed 's/^[^{]*{\([^{}]*\)}.*/\1/' file
Wed Nov 19 14:17:32 2014

This is very simple to do with awk, without a complicated regex:
awk -F"{|}" '{print $2}' file
Wed Nov 19 14:17:32 2014
It sets the field separator to { or }, so your data will be in the second field.
FS could also be set like this:
awk -F"[{}]" '{print $2}' file
To see all fields:
awk -F"{|}" '{print "field#1="$1"\nfield#2="$2"\nfield#3="$3}' file
field#1=integration#
field#2=Wed Nov 19 14:17:32 2014
field#3= branch: thebranch

This might work:
sed -e 's/[^{]*{\([^}]*\)}.*/\1/g'
Test
$ echo "integration#{Wed Nov 19 14:17:32 2014} branch: thebranch" | sed -e 's/[^{]*{\([^}]*\)}.*/\1/g'
Wed Nov 19 14:17:32 2014
Regex
[^{]* matches anything other than {, that is, integration#
{ matches the opening {
\([^}]*\) capture group 1: anything other than }, that is, Wed Nov 19 14:17:32 2014
} matches the closing }
.* matches the rest of the line

More simply, the command below also gets the data (the greedy .* assumes a single {...} pair per line):
echo "integration#{Wed Nov 19 14:17:32 2014} branch: thebranch" | sed 's/.*{\(.*\)}.*/\1/g'
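When the string is already in a shell variable, the same extraction can be sketched with parameter expansion alone, without starting sed or awk. The helper name between_braces is made up here, and the sketch assumes the first { and the } that follows it delimit the value:

```shell
# between_braces STRING: print the text between the first { and the next }.
# The subshell body, ( ... ), keeps the helper variables local.
between_braces() (
  open='{'
  close='}'
  s=${1#*"$open"}      # drop everything up to and including the first {
  s=${s%%"$close"*}    # drop the first } and everything after it
  printf '%s\n' "$s"
)
```

Keeping the braces in quoted variables avoids having to escape them inside the expansion patterns.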

BASH grep with multiple parameters + n lines after one of the matches

I have a bunch of text as output from a command, and I need to display only specific matching lines, plus some additional lines after the "Message:" match (the message text is obviously longer than 1 line).
What I tried was:
grep -e 'Subject:' -e 'Date:' -A50 -e 'Message:'
but it included 50 lines after EACH match, and I need to apply that to only a single pattern. How would I do that?
code with output command:
(<...> | telnet <mailserver> 110 | grep -e 'Subject:' -e 'Date:' -A50 -e 'Message:'
Part of the telnet output:
Date: Tue, 10 Sep 2013 16
Message-ID: <00fb01ceae25$
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_00FC_01CEAE3E.DE32CE40"
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: Ac6uJWYdA3lUzs1cT8....
Content-Language: lt
X-Mailman-Approved-At: Tue, 10 Sep 2013 16:0 ....
Subject: ...
X-BeenThere: ...
Precedence: list

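Try the following:
... | telnet ... > <file>
grep -e 'Subject:' -e 'Date:' <file>; grep -A50 -e 'Message:' <file>
You will need to dump the output to a file first. (With && in place of ;, the second grep would run only if the first one found a match.)
This can be done with awk as well, without dumping the output to a file (here the Message: line itself counts as one of the 50 printed lines):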
... | telnet ... | awk '/Date:/ {print}; /Subject:/ {print}; /Message:/ {c=50} c && c--'

With grep this would be hard to do. Better to use awk for it:
awk '/Subject:|Date:/; /Message:/ {print; for (l = 0; l < 50 && (getline) > 0; l++) print}'
Here awk prints the Message: line and the 50 lines below it (stopping early at end of input), while only the matching line itself is printed for the other patterns.
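A variant of the same idea with the line count as a parameter, sketched as a small shell function (the name headers_and_body is hypothetical). It uses a countdown instead of getline, so a Date: or Subject: line that falls inside the message window is printed only once:

```shell
# headers_and_body [N]: print Date:/Subject: lines, and the Message: line
# together with the N lines that follow it (N defaults to 50).
headers_and_body() {
  awk -v n="${1:-50}" '
    /Message:/            { left = n + 1 }          # the match itself plus n more lines
    left > 0              { print; left--; next }   # still inside the message window
    /Subject:/ || /Date:/ { print }
  '
}
```

It reads standard input, so it can replace the grep at the end of the telnet pipeline, for example ... | telnet ... | headers_and_body 50.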
