"tags":["_dateparsefailure"] with logstash - elasticsearch

12-Apr-2021 17:12:45.289 FINE [https-jsse-nio2-8443-exec-5] org.apache.catalina.authenticator.FormAuthenticator.doAuthenticate Authentication of 'user1' was successful
I am parsing the log message above with the Logstash configuration below and unfortunately get "tags":["_dateparsefailure"].
%{MY_DATE_PATTERN:timestamp} is a custom pattern, defined as follows: MY_DATE_PATTERN %{MONTHDAY}-%{MONTH}-%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND})
I have also checked with https://grokdebug.herokuapp.com/ that the pattern parses the line correctly.
Can you see what I am doing wrong?
filter {
grok{
patterns_dir => "/etc/logstash/patterns"
match => { "message" => "%{MY_DATE_PATTERN:timestamp}\s+%{WORD:severity}\s+\[%{DATA:thread}\]\s+%{NOTSPACE:type_log}\s+(?<action>\w(?:[\w\s]*\w)?)(?:\s+['\[](?<user>[^\]']+))?" }
}
# Converting timestamp
date {
locale => "nl"
match => ["timestamp", "dd-MM-YYYY HH:mm:ss"]
timezone => "Europe/Amsterdam"
target => "timestampconverted"
}
ruby {
code => "event.set('timestamp', (event.get('timestampconverted').to_f*1000).to_i)"
}
}
The output (I had to remove a couple of things so that I could post it here):
user":"user1,"type_log":"org.apache.catalina.authenticator.FormAuthenticator.doAuthenticate","logSource":{"environment,"tags":["_dateparsefailure"],"thread":"https-jsse-nio2-8443-exec-6","action":"Authentication of
Thanks in advance!
Update
I also tried the following and still get the error:
date {
locale => "nl"
match => ["timestamp", "dd-MMM-YYYY HH:mm:ss.SSS"]
timezone => "Europe/Amsterdam"
target => "timestampconverted"
}

It should definitely be "dd-MMM-YYYY HH:mm:ss.SSS": the pattern has to consume the entire field. Can you try removing the locale => "nl" option (just for debugging purposes)? We are currently in a month where the Dutch and English month abbreviations match. If it starts working, then the month abbreviations are not what you think they are. Some locales expect a "." at the end of the abbreviation. Looking at the CLDR charts, locale nl definitely appears to be one of them, so you will have to gsub it in. In the CLDR data, scroll down to "Months - Abbreviated - Formatting". You could try
mutate { gsub => [ "timestamp", "(jan|feb|mrt|apr|jun|jul|aug|sep|okt|nov|dec)", "\1." ] }
My original suggestion of
mutate { gsub => [ "timestamp", "(jan|feb|apr|aug|sept|oct|okt|nov|dec)", "\1." ] }
was based on the abbreviations given here but that is not what Java uses.
The issue is definitely in the date filter, not the grok. If the grok filter were not parsing the timestamp field then the date filter would be a no-op and would not add the tag.
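The two points above (a numeric-month pattern cannot match "Apr", and the gsub appends a dot to Dutch abbreviations) can be illustrated in plain Ruby. This is only an analogy: Ruby's strptime directives are not the Joda-style patterns the date filter uses, and the helper names parses? and add_abbreviation_dot are made up for this sketch.

```ruby
require "time"

# Sample value taken from the log line in the question.
line_timestamp = "12-Apr-2021 17:12:45.289"

# Hypothetical helper: true when Time.strptime accepts the value.
def parses?(value, format)
  Time.strptime(value, format)
  true
rescue ArgumentError
  false
end

# %m expects a month number, so "Apr" cannot match (compare MM vs MMM).
numeric_month = parses?(line_timestamp, "%d-%m-%Y %H:%M:%S")
# %b accepts an abbreviated month name and %L the milliseconds.
named_month = parses?(line_timestamp, "%d-%b-%Y %H:%M:%S.%L")

# Hypothetical helper mirroring the suggested mutate/gsub.
# The pattern is case-sensitive in plain Ruby, so a capitalized
# "Apr" is left unchanged.
def add_abbreviation_dot(value)
  value.gsub(/(jan|feb|mrt|apr|jun|jul|aug|sep|okt|nov|dec)/, '\1.')
end

puts numeric_month
puts named_month
puts add_abbreviation_dot("12-apr-2021 17:12:45.289")
puts add_abbreviation_dot(line_timestamp)
```

Whether the capitalization matters to the date filter with locale nl is not checked here; the sketch only shows what the regular expression itself does.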

I figured out that the custom pattern was causing the issue. Instead of loading it from another location, I added it to my conf file as an inline regex, as follows: (?<logstamp>%{MONTHDAY}-%{MONTH}-%{YEAR} %{HOUR}:%{MINUTE}:%{SECOND})

Related

logstash-input-mongodb: controlling the output?

I'm trying to setup the logstash-input-mongodb plugin to read audits from my database, but all the parsing strategies seem to have issues and I don't see how to customize anything.
The "flatten" parse_method works quite nicely, but it ignores mongodb object IDs and does not output them anywhere except in the log_entry field.
The "simple" parse_method includes object IDs but outputs dates in a way that I cannot figure out how to parse with the date filter (e.g., "2017-02-12 16:30:00 UTC"). Then, in the absence of a proper timestamp, the plugin seems to generate timestamps on its own which have no relation to the current time (e.g., in 2022).
The "dig" method I haven't quite figured out yet.
So my questions:
Is there a way to parse data from the log_entry field (see example below) that the plugin outputs? I've tried the json filter, but the field is not JSON because it has been Ruby-formatted.
Or, is there any way to get the "flatten" method to include object IDs?
Or, is there any way to get the "simple" method to properly format mongodb ISODate fields?
Is there any way to prevent the plugin from reading data from the beginning of time (I only want to push the last day or so into logstash)?
Can be reproduced with any configuration, here's my basic one:
input {
mongodb {
uri => 'mongodb://localhost:27017/test'
placeholder_db_dir => '/elk/logstash-mongodb/'
placeholder_db_name => 'logstash_sqlite.db'
collection => 'auditcommunications'
batch_size => 1000
parse_method => "flatten"
}
}
filter {
date {
match => [ "timestamp", "ISO8601" ]
}
}
output {
stdout { codec => rubydebug }
}
Example data including log_entry:
{
"audit-id" => "58a2edc916e057270065fa74",
"created" => "2017-02-14T11:45:13Z",
"type" => "mongodb-audit",
"audit-type" => "PaymentAudit",
"mongo_id" => "58a2edc916e057270065fa74",
"expiresAt" => "2017-05-15T11:45:13Z",
"lastUpdated" => "2017-02-14T11:45:13Z",
"@timestamp" => 2017-02-14T11:45:13.000Z,
"log_entry" => "{\"_id\"=>BSON::ObjectId('58a2edc916e057270065fa74'), \"order\"=>BSON::ObjectId('a8a2f205790858970046aa59'), \"_type\"=>\"PaymentAudit\", \"lastUpdated\"=>2017-02-14 11:45:13 UTC, \"created\"=>2017-02-14 11:45:13 UTC, \"payment\"=>BSON::ObjectId('58a2edc02eafcd560101ee5f'), \"organization\"=>BSON::ObjectId('56edde0ba33e1c03ff54a5ec'), \"status\"=>\"succeeded\", \"context\"=>{\"type\"=>\"order\", \"id\"=>BSON::ObjectId('58a2e205790852270046ab59')}, \"expiresAt\"=>2017-05-15 11:45:13 UTC, \"__v\"=>0}",
"logdate" => "2017-02-14T11:45:13+00:00",
"__v" => 0,
"@version" => "1",
"context_type" => "order",
"status" => "succeeded",
"timestamp" => "2017-02-14T11:45:13Z"
}
How can I extract the organization from the log_entry field above?
I've tried the following:
filter {
ruby {
code => "event.set('organization', eval(event.get('[log_entry]')))"
}
}
but this throws a rubyexception: ERROR logstash.filters.ruby - Ruby exception occurred: (eval):1: syntax error, unexpected tINTEGER
If you use the simple parse_method, you can parse the timestamp easily by adding the pattern yyyy-MM-dd HH:mm:ss ZZZ to your date filter:
filter {
date {
match => [ "timestamp", "yyyy-MM-dd HH:mm:ss ZZZ" ]
}
}
Regarding the last point, I suggest checking the since_* settings which allow you to keep a cursor of what's been already processed and only start from that cursor on the next logstash restart.
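On the log_entry question: the eval attempt fails because the field is Ruby inspect output, not valid Ruby source (bare values such as 2017-02-14 11:45:13 UTC do not parse). One alternative is a regular expression. The following is a minimal sketch, assuming the field always has the shape shown in the example. The helper name extract_object_id is hypothetical, and the sample string is a shortened copy of the example above.

```ruby
# Hypothetical helper: pull the 24-hex-digit ObjectId stored under
# a given key out of the inspect-style log_entry string.
def extract_object_id(log_entry, field)
  pattern = /"#{Regexp.escape(field)}"=>BSON::ObjectId\('([0-9a-f]{24})'\)/
  log_entry[pattern, 1] # returns nil when the key is absent
end

# Shortened copy of the log_entry value from the example event.
log_entry = %q|{"_id"=>BSON::ObjectId('58a2edc916e057270065fa74'), "_type"=>"PaymentAudit", "created"=>2017-02-14 11:45:13 UTC, "organization"=>BSON::ObjectId('56edde0ba33e1c03ff54a5ec'), "status"=>"succeeded"}|

puts extract_object_id(log_entry, "organization")
```

Inside a ruby filter, the same expression could presumably be applied to event.get('log_entry') and stored with event.set, as in the attempt above. That wiring is not tested here.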

Timezone causing different results when doing a search query to an index in Elastic Search

I'm trying to get the results of a search query (i.e. the results for a given date range) against a particular index, so that I can retrieve the results on a daily basis.
This is the query : http://localhost:9200/dialog_test/_search?q=timestamp:[2016-08-03T00:00:00.128%20TO%202016-08-03T23:59:59.128]
In the above, timestamp is a field which I added in my logstash.conf in order to get the actual log time. When I ran this query, I surprisingly got a number of hits (total hits: 24), although it should have been 0, since I don't have any log records for 2016-08-03. It actually returns the count for the next day (2016-08-04), which has 24 records in the log file. I'm sure something has gone wrong with the timezone.
My timezone is GMT+5:30.
Here is my filtering part of logstash conf:
filter {
grok {
patterns_dir => ["D:/ELK Stack/logstash/logstash-2.3.4/bin/patterns"]
match => { "message" => "^%{LOGTIMESTAMP:logtimestamp}%{GREEDYDATA}" }
}
mutate {
add_field => { "timestamp" => "%{logtimestamp}" }
remove_field => ["logtimestamp"]
}
date {
match => [ "timestamp" , "ISO8601" , "yyyyMMdd HH:mm:ss.SSS" ]
target => "timestamp"
locale => "en"
}}
EDIT:
This is a snap of the first 24 records which has the date of (2016-08-04) from the log file:
And this is a snap of the JSON response I got when I searched for the date of 2016-08-03:
Where am I going wrong? Any help would be appreciated.
In your date filter you need to add a timezone:
date {
match => [ "timestamp" , "ISO8601" , "yyyyMMdd HH:mm:ss.SSS" ]
target => "timestamp"
locale => "en"
timezone => "Asia/Calcutta" # add this
}
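To see how a local calendar day and a UTC calendar day can disagree, here is a small Ruby sketch. It assumes a fixed +05:30 offset for the asker's timezone, and the helper name utc_range_for_local_day is made up for illustration. A record logged shortly after midnight local time on 2016-08-04 falls on 2016-08-03 in UTC, so a range query written in UTC for 2016-08-03 can match it.

```ruby
# Hypothetical helper: the UTC instants that bound one local calendar day.
def utc_range_for_local_day(year, month, day, utc_offset)
  start_local = Time.new(year, month, day, 0, 0, 0, utc_offset)
  end_local = Time.new(year, month, day, 23, 59, 59, utc_offset)
  [start_local, end_local].map { |t| t.getutc.strftime("%Y-%m-%dT%H:%M:%SZ") }
end

# A record logged at 00:30 local time on 2016-08-04 (+05:30).
early_record = Time.new(2016, 8, 4, 0, 30, 0, "+05:30")
puts early_record.getutc.strftime("%Y-%m-%dT%H:%M:%SZ") # 2016-08-03T19:00:00Z

# Bounds to use when querying "all of 2016-08-04 local time" in UTC.
puts utc_range_for_local_day(2016, 8, 4, "+05:30").inspect
```

If this is what is happening, the query bounds for a local day would need to be shifted by the offset, as in the second output line, rather than written as midnight to midnight in UTC.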

Logstash date parsing as timestamp using the date filter

Well, after looking around quite a lot, I could not find a solution to my problem, as it "should" work, but obviously doesn't.
I'm using Logstash 1.4.2-1-2-2c0f5a1 on an Ubuntu 14.04 LTS machine, and I am receiving messages such as the following one:
2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
BTW, I am also multiline
In the input configuration, I do have a multiline codec and the event is parsed correctly. I also separate the event text in several parts so that it is easier to read.
In the end, I obtain, as seen in Kibana, something like the following (JSON view):
{
"_index": "logstash-2014.08.06",
"_type": "customType",
"_id": "PRtj-EiUTZK3HWAm5RiMwA",
"_score": null,
"_source": {
"@timestamp": "2014-08-06T08:51:21.160Z",
"@version": "1",
"tags": [
"multiline"
],
"type": "utg-su",
"host": "ubuntu-14",
"path": "/mnt/folder/thisIsTheLogFile.log",
"logTimestamp": "2014-08-05;10:21:13.618",
"logThreadId": "17",
"logLevel": "INFO",
"logMessage": "Class.Type - This is a log message from the class:\r\n BTW, I am also multiline\r"
},
"sort": [
"21",
1407315081160
]
}
You may have noticed that I put a ";" in the timestamp. The reason is that I want to be able to sort the logs using the timestamp string, and apparently logstash is not that good at that (e.g.: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html).
I have unsuccessfully tried to use the date filter in multiple ways, and it apparently did not work.
date {
locale => "en"
match => ["logTimestamp", "YYYY-MM-dd;HH:mm:ss.SSS", "ISO8601"]
timezone => "Europe/Vienna"
target => "@timestamp"
add_field => { "debug" => "timestampMatched"}
}
Since I read that the Joda library may have problems if the string is not strictly ISO 8601-compliant (very picky and expects a T, see https://logstash.jira.com/browse/LOGSTASH-180), I also tried to use mutate to convert the string to something like 2014-08-05T10:21:13.618 and then use "YYYY-MM-dd'T'HH:mm:ss.SSS". That also did not work.
I do not want to have to manually put a +02:00 on the time because that would give problems with daylight saving.
In any of these cases, the event goes to elasticsearch, but the date filter apparently does nothing, as @timestamp and logTimestamp are different and no debug field is added.
Any idea how I could make the logTime strings properly sortable? I focused on converting them to a proper timestamp, but any other solution would also be welcome.
As you can see below:
When sorting over @timestamp, elasticsearch can do it properly, but since this is not the "real" log timestamp, but rather when the logstash event was read, I need (obviously) to be able to sort also over logTimestamp. This is what then is output. Obviously not that useful:
Any help is welcome! Just let me know if I forgot some information that may be useful.
Update:
Here is the filter config file that finally worked:
# Filters messages like this:
# 2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
# BTW, I am also multiline
# Take only type- events (type-componentA, type-componentB, etc)
filter {
# You cannot write an "if" outside of the filter!
if "type-" in [type] {
grok {
# Parse timestamp data. We need the "(?m)" so that grok (Oniguruma internally) correctly parses multi-line events
patterns_dir => "./patterns"
match => [ "message", "(?m)%{TIMESTAMP_ISO8601:logTimestampString}[ ;]\[%{DATA:logThreadId}\][ ;]%{LOGLEVEL:logLevel}[ ;]*%{GREEDYDATA:logMessage}" ]
}
# The timestamp may have commas instead of dots. Convert so as to store everything in the same way
mutate {
gsub => [
# replace all commas with dots
"logTimestampString", ",", "."
]
}
mutate {
gsub => [
# make the logTimestamp sortable. With a space, it is not! This does not work that well, in the end
# but somehow apparently makes things easier for the date filter
"logTimestampString", " ", ";"
]
}
date {
locale => "en"
match => ["logTimestampString", "YYYY-MM-dd;HH:mm:ss.SSS"]
timezone => "Europe/Vienna"
target => "logTimestamp"
}
}
}
filter {
if "type-" in [type] {
# Remove already-parsed data
mutate {
remove_field => [ "message" ]
}
}
}
I have tested your date filter, and it works for me!
Here is my configuration:
input {
stdin{}
}
filter {
date {
locale => "en"
match => ["message", "YYYY-MM-dd;HH:mm:ss.SSS"]
timezone => "Europe/Vienna"
target => "@timestamp"
add_field => { "debug" => "timestampMatched"}
}
}
output {
stdout {
codec => "rubydebug"
}
}
And I use this input:
2014-08-01;11:00:22.123
The output is:
{
"message" => "2014-08-01;11:00:22.123",
"@version" => "1",
"@timestamp" => "2014-08-01T09:00:22.123Z",
"host" => "ABCDE",
"debug" => "timestampMatched"
}
So, please make sure that your logTimestamp has the correct value.
It is probably a different problem. Otherwise, can you provide your log event and Logstash configuration for further discussion? Thank you.
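The normalization steps from the working config (comma to dot, space to semicolon) and the UTC conversion seen in the test output can be mimicked in plain Ruby. This is a rough sketch, not what the date filter does internally. It assumes a fixed +02:00 offset as a stand-in for Europe/Vienna in summer, and the helper names normalize and to_utc_iso are hypothetical.

```ruby
# Hypothetical helper: same rewrites as the two mutate/gsub blocks.
def normalize(raw)
  raw.tr(",", ".").sub(" ", ";")
end

# Hypothetical helper: parse "YYYY-MM-dd;HH:mm:ss.SSS" at a fixed
# offset and render it as a UTC ISO 8601 string with milliseconds.
def to_utc_iso(normalized, utc_offset)
  m = /\A(\d{4})-(\d{2})-(\d{2});(\d{2}):(\d{2}):(\d{2})\.(\d{3})\z/.match(normalized)
  return nil unless m # the whole string must match, otherwise give up
  year, month, day, hour, minute, second, millis = m.captures.map(&:to_i)
  local = Time.new(year, month, day, hour, minute, second + Rational(millis, 1000), utc_offset)
  local.getutc.strftime("%Y-%m-%dT%H:%M:%S.%LZ")
end

puts normalize("2014-08-05 10:21:13,618")
puts to_utc_iso("2014-08-01;11:00:22.123", "+02:00")
```

The second line of output corresponds to the converted timestamp in the test run above. A real timezone such as Europe/Vienna also handles daylight saving changes, which a fixed offset does not.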
This worked for me - with a slightly different datetime format:
# 2017-11-22 13:00:01,621 INFO [AtlassianEvent::0-BAM::EVENTS:pool-2-thread-2] [BuildQueueManagerImpl] Sent ExecutableQueueUpdate: addToQueue, agents known to be affected: []
input {
file {
path => "/data/atlassian-bamboo.log"
start_position => "beginning"
type => "logs"
codec => multiline {
pattern => "^%{TIMESTAMP_ISO8601} "
charset => "ISO-8859-1"
negate => true
what => "previous"
}
}
}
filter {
grok {
match => [ "message", "(?m)^%{TIMESTAMP_ISO8601:logtime}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}\[%{DATA:thread_id}\]%{SPACE}\[%{WORD:classname}\]%{SPACE}%{GREEDYDATA:logmessage}" ]
}
date {
match => ["logtime", "yyyy-MM-dd HH:mm:ss,SSS", "yyyy-MM-dd HH:mm:ss,SSS Z", "MMM dd, yyyy HH:mm:ss a" ]
timezone => "Europe/Berlin"
}
}
output {
elasticsearch { hosts => ["localhost:9200"] }
stdout { codec => rubydebug }
}

Logstash change time format

My log statement looks like this.
2014-04-23 06:40:29 INFO [1605853264] [ModuleName] - [ModuleName] -
Blah blah
I am able to parse it fine and it gets logged to ES correctly with following ES field
"LogTimestamp": "2014-04-23T13:40:29.000Z"
But my requirement is to log this statement as follows; note that the 'Z' is replaced with +0000. I tried replace and gsub, but neither changes the output.
"LogTimestamp": "2014-04-23T13:40:29.000+0000"
Can somebody help?
Here is my pattern
TEMP_TIMESTAMP %{YEAR}-%{MONTHNUM}-%{MONTHDAY}\s%{HOUR}:%{MINUTE}:%{SECOND}
TEMP_LOG %{TEMP_TIMESTAMP:logdate}\s*?%{LOGLEVEL:TempLogLevel}\s*?\[\s?*%{BASE10NUM:TempThreadId}\]%{GREEDYDATA}
This is the filter config:
grok{
patterns_dir => ["patterns"]
match=> ["message", "%{TEMP_LOG}"]
}
date{
match => [ "logdate", "yyyy-MM-dd HH:mm:ss" ]
target => "LogTimestamp"
timezone => "PST8PDT"
}
mutate {
gsub => ["logdate", ".000Z", ".000+0000"]
}
I haven't quite understood the meaning of fields in Logstash and how they map to Elasticsearch; that confusion is probably what is leading me astray here.
You can use the ruby plugin to do what you want!
As per your requirement, you want to change this
"LogTimestamp": "2014-04-23T13:40:29.000Z"
to
"LogTimestamp": "2014-04-23T13:40:29.000+0000"
Try to use this filter
filter {
ruby {
code => "
event['LogTimestamp'] = event['LogTimestamp'].localtime('+00:00')
"
}
}
Hope this can help you.
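For reference, the target string format itself can be produced in plain Ruby with strftime, where %z renders the offset as +0000 for a UTC time. This is a minimal sketch. The helper name with_numeric_utc_offset is made up, and the -07:00 offset is an assumption standing in for PST8PDT in April (daylight time). How to wire this into a ruby filter depends on the Logstash version and is not shown.

```ruby
# Hypothetical helper: ISO-like timestamp with milliseconds and a
# numeric "+0000" offset instead of the "Z" suffix.
def with_numeric_utc_offset(time)
  time.getutc.strftime("%Y-%m-%dT%H:%M:%S.%L%z")
end

# 06:40:29 local time from the log line, assumed to be at -07:00.
log_time = Time.new(2014, 4, 23, 6, 40, 29, "-07:00")

puts with_numeric_utc_offset(log_time) # 2014-04-23T13:40:29.000+0000
```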

How to handle non-matching Logstash grok filters

I am wondering what the best approach is to take with my Logstash grok filters. I have some filters that are for specific log entries, and won't apply to all entries. The ones that don't apply always generate _grokparsefailure tags. For example, I have one grok filter that's for every log entry and it works fine. Then I have another filter that's for error messages with tracebacks. The traceback filter throws a grokparsefailure for every single log entry that doesn't have a traceback.
I'd prefer to have it just pass the rule if there isn't a match instead of adding the parsefailure tag. I use the parsefailure tag to find things that aren't parsing properly, not things that simply didn't match a particular filter. Maybe it's just the nomenclature "parse failure" that gets me. To me that means there's something wrong with the filter (e.g. badly formatted), not that it didn't match.
So the question is, how should I handle this?
Make the filter pattern optional using ?
(ab)use the tag_on_failure option by setting it to nothing []
make the filter conditional using something like "if traceback in message"
something else I'm not considering?
Thanks in advance.
EDIT
I took the path of adding a conditional around the filter:
if [message] =~ /took\s\d+/ {
grok {
patterns_dir => "/etc/logstash/patterns"
match => ["message", "took\s+(?<servicetime>[\d\.]+)"]
add_tag => [ "stats", "servicetime" ]
}
}
Still interested in feedback though. What is considered "best practice" here?
When possible, I'd go with a conditional wrapper just like the one you're using. Feel free to post that as an answer!
If your application produces only a few different line formats, you can use multiple match patterns with the grok filter. By default, the filter will process up to the first successful match:
grok {
patterns_dir => "./patterns"
match => {
"message" => [
"%{BASE_PATTERN} %{EXTRA_PATTERN}",
"%{BASE_PATTERN}",
"%{SOME_OTHER_PATTERN}"
]
}
}
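The "stop at the first successful match" behaviour described above can be pictured with ordinary Ruby regular expressions. This is only a sketch of the idea, not grok's implementation. The two patterns are invented stand-ins for BASE_PATTERN plus EXTRA_PATTERN and for BASE_PATTERN alone, and first_match is a hypothetical helper.

```ruby
# Invented stand-ins for the grok patterns, most specific first.
PATTERNS = [
  /\A(?<level>[A-Z]+) (?<text>.*?) took\s+(?<servicetime>[\d.]+)\z/,
  /\A(?<level>[A-Z]+) (?<text>.*)\z/
].freeze

# Hypothetical helper: try each pattern in order and return the named
# captures of the first one that matches, or nil if none does.
def first_match(line, patterns = PATTERNS)
  patterns.each do |pattern|
    match = pattern.match(line)
    return match.named_captures if match
  end
  nil
end

puts first_match("INFO request done took 12.5").inspect
puts first_match("ERROR something broke").inspect
puts first_match("lowercase line").inspect
```

Ordering matters in this sketch: the more specific pattern has to come first, otherwise the general one would match every line and the extra field would never be extracted.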
If your logic is less straightforward (maybe you need to check the same condition more than once), the grep filter can be useful to add a tag. Something like this:
grep {
drop => false #grep normally drops non-matching events
match => ["message", "/took\s\d+/"]
add_tag => "has_traceback"
}
...
if "has_traceback" in [tags] {
...
}
You can also add tag_on_failure => [] to your grok stanza like so:
grok {
match => ["context", "\"tags\":\[%{DATA:apptags}\]"]
tag_on_failure => [ ]
}
grok will still fail, but will do so without adding to the tags array.
This is the most efficient way of doing this. Ignore the filter
filter {
grok {
match => [ "message", "something"]
}
if "_grokparsefailure" in [tags] {
drop { }
}
}
You can also do this
remove_tag => [ "_grokparsefailure" ]
whenever you have a match.
