Logstash custom date format and irregular spaces

I'm receiving a parsing failure with my grok match, and I can't seem to find a pattern that will match my log.
Here is my log:
2016-06-14 14:03:42 1.1.1.1 GET /origin-www.site.com/ScriptResource.axd?d= jEHA4v5Z26oA-nbsKDVsBINPydW0esbNCScJdD-RX5iFGr6qqeyJ69OnKDoJgTsDcnI1&t=5f9d5645 200 26222 0 "http://site/ layouts/CategoryPage.aspx?dsNav=N:10014" "Mozilla/5.0 (Linux; Android 4.4.4; SM-G318HZ Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.95 Mobile Safari/537.36" "cookie"
Here is my grok match. It works fine in the grok debugger.
filter {
  grok {
    match => { 'message' => '%{DATE:date} %{TIME:time} %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:status} %{NUMBER:bytes} %{NUMBER:time_taken} %{QUOTEDSTRING:referrer} %{QUOTEDSTRING:user_agent} %{QUOTEDSTRING:cookie}' }
  }
}
EDIT: I took a screenshot of what my log file looks like, since the spaces don't come over when copying and pasting. They appear to be single spaces when I copy/paste.

Aside from the stray space in the log line you posted, which I assume won't exist in your actual logs, your pattern is incorrect on the date parsing. Logstash's DATE follows this pattern:
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
DATE %{DATE_US}|%{DATE_EU}
That doesn't match your YYYY-MM-dd format. I recommend using a patterns file and defining a custom date pattern:
CUST_DATE %{YEAR}-%{MONTHNUM2}-%{MONTHDAY}
then your pattern can be
%{CUST_DATE:date} %{TIME:time} %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:status} %{NUMBER:bytes} %{NUMBER:time_taken} %{QUOTEDSTRING:referrer} %{QUOTEDSTRING:user_agent} %{QUOTEDSTRING:cookie}
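A minimal sketch of how that fits together (the patterns file location is an assumption; point patterns_dir wherever you keep your custom patterns):
# /etc/logstash/patterns/custom  (assumed location)
# contains the line: CUST_DATE %{YEAR}-%{MONTHNUM2}-%{MONTHDAY}
filter {
  grok {
    patterns_dir => ["/etc/logstash/patterns"]
    match => { 'message' => '%{CUST_DATE:date} %{TIME:time} %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:status} %{NUMBER:bytes} %{NUMBER:time_taken} %{QUOTEDSTRING:referrer} %{QUOTEDSTRING:user_agent} %{QUOTEDSTRING:cookie}' }
  }
}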
EDIT:
You may be able to handle the weird whitespace with a gsub. This won't remove whitespace entirely, but it will normalize every run of whitespace to a single space:
mutate {
  gsub => [
    # replace any whitespace character, or run of adjacent whitespace characters, with one space
    "message", "\s+", " "
  ]
}
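Note that filters run in the order they appear in the config, so this mutate has to come before the grok for the normalized message to be what grok matches.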

Related

How can I exclude events from kv filter parsing when they don't match a pattern?

I am parsing logs from many daemons of a UTM solution.
The grok and kv config looks like this:
grok {
  match => [ "message", "%{SYSLOGPROG} %{NOTSPACE:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" ]
}
kv {
  id => "syslogkv"
  source => "syslog_message"
  trim_key => " "
  trim_value => " "
  value_split => "="
  field_split => " "
}
Usually events look like this
<30>2019:04:23-20:13:38 hostname ulogd[5354]: id="2001" severity="info" sys="SecureNet" sub="packetfilter" name="Packet dropped" action="drop" fwrule="60002" initf="eth3.5" outitf="eth5" srcmac="c8:9c:1d:af:68:7f" dstmac="00:1a:8c:f0:f5:23" srcip="x.x.x.x" dstip="y.y.y.y" proto="17" length="56" tos="0x00" prec="0x00" ttl="63" srcport="5892" dstport="53"
and are parsed without any problem.
But when some daemons generate events that look like this (a WAF, in this example):
<139>2019:04:23-16:21:38 hostname httpd[1475]: [security2:error] [pid 1475:tid 3743300464] [client x.x.x.x] ModSecurity: Warning. Pattern match "([\\\\~\\\\!\\\\#\\\\#\\\\$\\\\%\\\\^\\\\&\\\\*\\\\(\\\\)\\\\-\\\\+\\\\=\\\\{\\\\}\\\\[\\\\]\\\\|\\\\:\\\\;\\"\\\\'\\\\\\xc2\\xb4\\\\\\xe2\\x80\\x99\\\\\\xe2\\x80\\x98\\\\`\\\\<\\\\>].*?){8,}"
my output breaks and Logstash stops processing any logs.
How can I exclude events from kv parsing by regexp or any other pattern?
In simple terms: do not apply kv if syslog_message begins with "[", or matches any other regexp.
Wrap your kv filter in a conditional on the field:
if [syslog_message] !~ /^\[/ {
  kv { }
}
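Putting it together with the options from your original config (same kv settings as above):
if [syslog_message] !~ /^\[/ {
  kv {
    id => "syslogkv"
    source => "syslog_message"
    trim_key => " "
    trim_value => " "
    value_split => "="
    field_split => " "
  }
}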

Is there a way parse log messages using rsyslog config and transform them to structured messages?

I am trying to parse log messages and transform them into structured messages using rsyslog. Is there a way to support such an operation with the rsyslog config? I have not yet explored the option of writing a custom parser or message modification plugin for this.
I found template list properties which can do some of it. Is there a way to do the following?
Map 2 fields to a single output name, e.g. "__ts": "2018-09-20 10:18:56.363" (the first 2 fields in the example below). I would rather not use a regex here, as I am looking for a solution that does not depend on the value of the fields; e.g. the two fields could be two strings or some other values, not just dates.
Extract what is left in msg after extracting all known fields based on position. Ex: "msg": "Unregistering application nameOfAnApiHere with someOtherName with status DOWN".
Is there a way to use local variables to hold the values of fields from msg and use the variables in templates?
Example Log message:
2018-09-20 10:18:56.363 INFO --- [Thread-68] x.y.z.key1Value Unregistering application nameOfAnApiHere with someOtherName with status DOWN
1. rsyslog config template definition
template(name="structure-log-format" type="list") {
  constant(value="{")
  # This only extracts the first field with value 2018-09-20.
  # TODO: What is a way to map the first 2 fields to the __ts field?
  property(outname="__ts" name="msg" field.number="1" field.delimiter="32" format="jsonf") constant(value=", ")
  constant(value="\"event\":[{")
  constant(value="\"payload\":{")
  property(outname="_log_" name="syslogtag" format="jsonf") constant(value=", ")
  property(outname="__loglvl" name="msg" field.number="4" field.delimiter="32" format="jsonf") constant(value=", ")
  property(outname="__thread" name="msg" field.number="7" field.delimiter="32" format="jsonf") constant(value=", ")
  property(outname="__key1" name="msg" field.number="8" field.delimiter="32" format="jsonf") constant(value=", ")
  # The following setting will include the full message value, starting from "2018-09-20 ... DOWN".
  # TODO: What is a way to include only the message starting from "Unregistering ... DOWN"?
  property(name="msg" format="jsonf" droplastlf="on")
  constant(value="}")
  constant(value="}]} \n")
}
2. Expected result:
{
  "__ts": "2018-09-20 10:18:56.363",
  "event": [
    {
      "payload": {
        "_log_": "catalina",
        "__loglvl": "INFO",
        "__thread": "Thread-68",
        "__key1": "x.y.z.key1Value",
        "msg": "Unregistering application nameOfAnApiHere with someOtherName with status DOWN"
      }
    }
  ]
}
3. Actual result:
{
  "__ts": "2018-09-20",
  "event": [
    {
      "payload": {
        "_log_": "catalina",
        "__loglvl": "INFO",
        "__thread": "Thread-68",
        "__key1": "x.y.z.key1Value",
        "msg": "2018-09-20 10:18:56.363 INFO 2144 --- [Thread-68] x.y.z.key1Value Unregistering application nameOfAnApiHere with someOtherName with status DOWN"
      }
    }
  ]
}
Thank you.
You can also use regular expressions to match parts of a message. For example, replace your outname="__ts" property with:
property(outname="__ts" name="msg"
  regex.expression="([^ ]+ +[^ ]+)"
  regex.type="ERE"
  regex.submatch="1" format="jsonf")
Here the extended regular expression (ERE) looks for a run of one or more not-a-space characters ([^ ]+), followed by one or more spaces, and then another run of not-a-space characters. These 2 "words" are captured as a submatch by the parentheses, and you select that submatch, counting from 1. The result should be what you want.
You can similarly use a regex for the second requirement, either by counting "words" and spaces again, or with some other, more precise match. Here the regex skips 6 words by putting a repeat count {6} after the word-and-spaces pattern, then captures the rest (.*). Since there are now 2 sets of parentheses, the submatch to keep is 2, not 1:
property(name="msg"
  regex.expression="([^ ]+ +){6}(.*)"
  regex.type="ERE"
  regex.submatch="2" format="jsonf" droplastlf="on")
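Applied to the example message, the 6 skipped words are "2018-09-20", "10:18:56.363", "INFO", "---", "[Thread-68]" and "x.y.z.key1Value", so submatch 2 captures "Unregistering application nameOfAnApiHere with someOtherName with status DOWN", as required.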

Grok Pattern for multiline is not working

1st| 2nd|3rd |4th |5th |6th |7th |8th |2012.07.12 05:31:04 |10th |ProductDir: C:\samplefiles\test\storage\4.0 (LF)
C:\samplefiles\test\storage\5.0 (LF)
SampleDir: (LF)
Note: LF -> Line Feed is getting appended
I have tried the following options; nothing seems to be working:
match => [ "message", "(?m)....
(?<message>(.|\r|\n)*)
GREEDYDATA is also not working, since it does not match across newlines.
mutate {gsub => ["message", "\n", "LINE_BREAK"] }
codec => multiline { pattern => "^\s" negate => true what => previous }
(?m)%{GREEDYDATA} will match any multiline log, including yours.
You can test it in an online grok debugger.
The one below worked for me:
codec => multiline {
  pattern => "^\s*\d{1,}\|"
  negate => "true"
  what => "previous"
}
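For context, the multiline codec goes on the input. A minimal sketch, assuming a file input (the path is a placeholder):
input {
  file {
    path => "/var/log/sample/app.log"
    codec => multiline {
      # a new event starts on lines beginning with a record number and "|";
      # every other line is appended to the previous event
      pattern => "^\s*\d{1,}\|"
      negate => "true"
      what => "previous"
    }
  }
}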

Logstash overwrite doesn't happen when message is empty

logstash 5.3.0
filter {
  grok {
    patterns_dir => ["/etc/logstash/patterns"]
    match => [
      "message", "%{NGINXACCESS} %{GREEDYDATA:message}",
      "message", "%{NGINXACCESSAUTH}%{GREEDYDATA:message}",
      "message", "%{NGINXERROR}",
      "message", "%{PHPLOG}%{GREEDYDATA:message}",
      "message", "%{FPMERROR}%{GREEDYDATA:message}",
      "message", "%{SYSLOG5424PRI}%{SYSLOGBASE2} %{GREEDYDATA:message}"
    ]
    overwrite => [ "message" ]
  }
}
The issue: NGINXACCESSAUTH parses the complete line, which leaves %{GREEDYDATA:message} with an empty result, and an empty capture does not overwrite the message field. That leaves me with a messy outcome: the message field still holds the full rsyslog source message alongside all the parsed fields.
program:nginx
logsource:ppdlweb005
nginx_client:10.175.37.27
nginx_auth:-
nginx_time:08/Mar/2018:14:16:24 +0000
nginx_ident:-
nginx_response:200
message:<141>Mar 8 14:16:33 ppdlweb005 nginx 10.175.37.27 - - - [08/Mar/2018:14:16:24 +0000] "HEAD /?_=havemercy11 HTTP/1.1" 200 0 "-" "AppleWebkit/534.1 (KHTML) HbbTV/1.4.1 (+DRM;SureSoft-Browser-3.0;T3;0010;1.0;Manhattan-FVPlay;) FVC/2.0(SureSoft-Browser-3.0;Manhattan-FVPlay;)" SUCCESS 0.001
nginx_bytes:0
http_user_agent:AppleWebkit/534.1 (KHTML) HbbTV/1.4.1 (+DRM;SureSoft-Browser-3.0;T3;0010;1.0;Manhattan-FVPlay;) FVC/2.0(SureSoft-Browser-3.0;Manhattan-FVPlay;)
nginx_httpversion:1.1
#timestamp:March 8th 2018, 14:16:33.000
nginx_verb:HEAD
nginx_processing_time:0.001
fvc_role:auth
http_referer:-
fvc_env:staging
syslog5424_pri:141
#version:1
host:ppdlweb005
nginx_ssl_verify:SUCCESS
nginx_request:/?_=havemercy11
timestamp:Mar 8 14:16:33
_id:AWIF-Hov00VaJHdB36R2
_type:logs
_index:logstash-2018.03.08
_score: -
Any idea how to go about this apart from removing part of the pattern so there is something for GREEDYDATA to parse?
Use keep_empty_captures => true so the empty %{GREEDYDATA:message} capture is kept, which lets overwrite replace the message field.
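A sketch of the grok filter with that option applied (only the NGINXACCESSAUTH pattern shown; add the rest of your match list as before):
filter {
  grok {
    patterns_dir => ["/etc/logstash/patterns"]
    match => [ "message", "%{NGINXACCESSAUTH}%{GREEDYDATA:message}" ]
    overwrite => [ "message" ]
    # keep the empty %{GREEDYDATA:message} capture so that
    # overwrite replaces message with an empty string
    keep_empty_captures => true
  }
}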

Grok match: parse a log file for time only using a pattern or match

I want to parse the following line from a log file.
03:34:19,491 INFO [:sm-secondary-17]: DBBackup:106 - The max_allowed_packet value defined in [16M] does not match the value from /etc/mysql/my.cnf [24M]. The value will be used.
After parsing, the output should be:
Time : 03:34:19
LogType : INFO
Message : [:sm-secondary-17]: DBBackup:106 - The max_allowed_packet value defined in [16M] does not match the value from /etc/mysql/my.cnf [24M]. The value will be used.
Ignore : ,491 (the comma and 3-digit number).
The grok filter config should look like this to parse the mentioned log:
...
filter {
  grok {
    match => { "message" => "%{TIME:Time},%{NUMBER:ms} %{WORD:LogType} %{GREEDYDATA:Message}" }
    remove_field => [ "ms" ]
  }
}
...
