logstash-input-mongodb: controlling the output? - ruby

I'm trying to set up the logstash-input-mongodb plugin to read audits from my database, but all the parsing strategies seem to have issues and I don't see how to customize anything.
The "flatten" parse_method works quite nicely, but it ignores mongodb object IDs and does not output them anywhere except in the log_entry field.
The "simple" parse_method includes object IDs but outputs dates in a way that I cannot figure out how to parse with the date filter (e.g., "2017-02-12 16:30:00 UTC"). Then, in the absence of a proper timestamp, the plugin seems to generate timestamps on its own which have no relation to the current time (e.g., in 2022).
The "dig" method I haven't quite figured out yet.
So my questions:
Is there a way to parse data from the log_entry (see example below) field that the plugin outputs? I've tried the json filter but it is not json because it's been ruby-formatted.
Or, is there any way to get the "flatten" method to include object IDs?
Or, is there any way to get the "simple" method to properly format mongodb ISODate fields?
Is there any way to prevent the plugin from reading data from the beginning of time (I only want to push the last day or so into logstash)?
Can be reproduced with any configuration, here's my basic one:
input {
mongodb {
uri => 'mongodb://localhost:27017/test'
placeholder_db_dir => '/elk/logstash-mongodb/'
placeholder_db_name => 'logstash_sqlite.db'
collection => 'auditcommunications'
batch_size => 1000
parse_method => "flatten"
}
}
filter {
date {
match => [ "timestamp", "ISO8601" ]
}
}
output {
stdout { codec => rubydebug }
}
Example data including log_entry:
{
"audit-id" => "58a2edc916e057270065fa74",
"created" => "2017-02-14T11:45:13Z",
"type" => "mongodb-audit",
"audit-type" => "PaymentAudit",
"mongo_id" => "58a2edc916e057270065fa74",
"expiresAt" => "2017-05-15T11:45:13Z",
"lastUpdated" => "2017-02-14T11:45:13Z",
"#timestamp" => 2017-02-14T11:45:13.000Z,
"log_entry" => "{\"_id\"=>BSON::ObjectId('58a2edc916e057270065fa74'), \"order\"=>BSON::ObjectId('a8a2f205790858970046aa59'), \"_type\"=>\"PaymentAudit\", \"lastUpdated\"=>2017-02-14 11:45:13 UTC, \"created\"=>2017-02-14 11:45:13 UTC, \"payment\"=>BSON::ObjectId('58a2edc02eafcd560101ee5f'), \"organization\"=>BSON::ObjectId('56edde0ba33e1c03ff54a5ec'), \"status\"=>\"succeeded\", \"context\"=>{\"type\"=>\"order\", \"id\"=>BSON::ObjectId('58a2e205790852270046ab59')}, \"expiresAt\"=>2017-05-15 11:45:13 UTC, \"__v\"=>0}",
"logdate" => "2017-02-14T11:45:13+00:00",
"__v" => 0,
"#version" => "1",
"context_type" => "order",
"status" => "succeeded",
"timestamp" => "2017-02-14T11:45:13Z"
}
How can I extract the organization from the log_entry field above?
I've tried the following:
filter {
ruby {
code => "event.set('organization', eval(event.get('[log_entry]')))"
}
}
but this throws a Ruby exception: ERROR logstash.filters.ruby - Ruby exception occurred: (eval):1: syntax error, unexpected tINTEGER

If you use the simple parse_method, then you can parse the timestamp easily by adding the pattern yyyy-MM-dd HH:mm:ss ZZZ to your date filter:
filter {
date {
match => [ "timestamp", "yyyy-MM-dd HH:mm:ss ZZZ" ]
}
}
Regarding the last point, I suggest checking the since_* settings which allow you to keep a cursor of what's been already processed and only start from that cursor on the next logstash restart.
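As for extracting organization (or any other field) from log_entry: the eval approach fails because the Ruby-hash string contains bare values such as 2017-02-14 11:45:13 UTC, which are not valid Ruby literals (hence the unexpected tINTEGER). One rough, untested workaround is to rewrite the string into JSON with a few gsub substitutions and then run the json filter over it; the regexes below and the log_entry_parsed target name are mine, not something the plugin provides:
filter {
  mutate {
    # turn the Ruby-hash notation into JSON: unwrap the ObjectIds, quote the bare dates,
    # then replace the hash rockets (any nil values would also need a gsub to null)
    gsub => [
      "log_entry", "BSON::ObjectId\('([0-9a-f]+)'\)", '"\1"',
      "log_entry", "(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC)", '"\1"',
      "log_entry", "=>", ":"
    ]
  }
  json {
    source => "log_entry"
    target => "log_entry_parsed"
  }
  # the organization ObjectId should then end up in [log_entry_parsed][organization]
}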

Related

"tags":["_dateparsefailure"]," with logstash

12-Apr-2021 17:12:45.289 FINE [https-jsse-nio2-8443-exec-5] org.apache.catalina.authenticator.FormAuthenticator.doAuthenticate Authentication of 'user1' was successful
I am parsing above log message with the below code in logstash and unfortunately getting a "tags":["_dateparsefailure"], .
%{MY_DATE_PATTERN:timestamp} is a custom pattern, defined as follows: MY_DATE_PATTERN %{MONTHDAY}-%{MONTH}-%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND})
I have also checked with https://grokdebug.herokuapp.com/ that it parses perfectly fine.
I was wondering whether you might be able to see where I am going wrong.
filter {
grok{
patterns_dir => "/etc/logstash/patterns"
match => { "message" => "%{MY_DATE_PATTERN:timestamp}\s+%{WORD:severity}\s+\[%{DATA:thread}\]\s+%{NOTSPACE:type_log}\s+(?<action>\w(?:[\w\s]*\w)?)(?:\s+['\[](?<user>[^\]']+))?" }
}
# Converting timestamp
date {
locale => "nl"
match => ["timestamp", "dd-MM-YYYY HH:mm:ss"]
timezone => "Europe/Amsterdam"
target => "timestampconverted"
}
ruby {
code => "event.set('timestamp', (event.get('timestampconverted').to_f*1000).to_i)"
}
}
The output (I had to remove a couple of things so that I could post it here):
user":"user1,"type_log":"org.apache.catalina.authenticator.FormAuthenticator.doAuthenticate","logSource":{"environment,"tags":["_dateparsefailure"],"thread":"https-jsse-nio2-8443-exec-6","action":"Authentication of
thanks in advance!
Update
I also tried the below and am still getting the error:
date {
locale => "nl"
match => ["timestamp", "dd-MMM-YYYY HH:mm:ss.SSS"]
timezone => "Europe/Amsterdam"
target => "timestampconverted"
}
It should definitely be "dd-MMM-YYYY HH:mm:ss.SSS" -- you have to consume the entire field. Can you try removing the 'locale => "nl"' option (just for debugging purposes)? We are currently in a month where the Dutch and English month abbreviations match; if it starts working, then the month abbreviations are not what you think they are. Some locales expect a . at the end of the abbreviation, and looking at the CLDR charts it definitely appears that locale nl is one of them, so you will have to gsub it in. The CLDR data is here; scroll down to "Months - Abbreviated - Formatting". You could try
mutate { gsub => [ "timestamp", "(jan|feb|mrt|apr|jun|jul|aug|sep|okt|nov|dec)", "\1." ] }
My original suggestion of
mutate { gsub => [ "timestamp", "(jan|feb|apr|aug|sept|oct|okt|nov|dec)", "\1." ] }
was based on the abbreviations given here but that is not what Java uses.
The issue is definitely in the date filter, not the grok. If the grok filter were not parsing the timestamp field then the date filter would be a no-op and would not add the tag.
I figured out that the custom pattern was causing the issue; instead of loading it from another location, I added it to my conf file inline as a regex, as follows: (?<logstamp>%{MONTHDAY}-%{MONTH}-%{YEAR} %{HOUR}:%{MINUTE}:%{SECOND})
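For reference, a sketch of how that inline definition might slot into the original grok (untested; the other field names are unchanged from the earlier filter, and the date filter would then have to match on logstamp instead of timestamp):
filter {
  grok {
    match => { "message" => "(?<logstamp>%{MONTHDAY}-%{MONTH}-%{YEAR} %{HOUR}:%{MINUTE}:%{SECOND})\s+%{WORD:severity}\s+\[%{DATA:thread}\]\s+%{NOTSPACE:type_log}\s+(?<action>\w(?:[\w\s]*\w)?)(?:\s+['\[](?<user>[^\]']+))?" }
  }
}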

ElasticSearch - not setting the date type

I am trying the ELK stack, and so far so good :)
I have run into a strange situation regarding parsing the date field and sending it to Elasticsearch. I manage to parse the field, and it really gets created in Elasticsearch, but it always ends up as a string.
I have tried many different combinations, and many different things that people suggested, but I still fail.
This is my setup:
The strings that comes from Filebeat:
[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {"key":"securitysecured_area"} []
[2017-04-26 09:50:42] request.INFO: Matched route "home_logged_in". {"route_parameters":{"controller":"AppBundle\Controller\HomeLoggedInController::showAction","locale":"de","route":"homelogged_in"},"request_uri":"https://qa.someserver.de/de/home"} []
The logstash parsing section:
if [@metadata][type] == "feprod" or [@metadata][type] == "feqa" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:logdate}" }
}
date {
#timezone => "Europe/Berlin"
match => [ "logdate", "yyyy-MM-dd HH:mm:ss"]
}
}
According to the documentation, my @timestamp field should be overwritten with the logdate value. But that is not happening.
In Elasticsearch I can see that the logdate field is being created and has a value of 2017-04-26 09:40:33, but its type is string.
I always create the index from zero: I delete it first and let logstash populate it.
I need either @timestamp overwritten with the actual log date (not the date when it was indexed), or the logdate field created with a date type. Either is fine.
Unless you are explicitly adding [@metadata][type] somewhere that you aren't showing, that is your problem. It's not set by default; [type] is what is set by default from the 'type =>' parameter on your input.
You can validate this with a minimal complete example:
input {
stdin {
type=>'feprod'
}
}
filter {
if [@metadata][type] == "feprod" or [@metadata][type] == "feqa" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:logdate}" }
}
date {
match => [ "logdate", "yyyy-MM-dd HH:mm:ss"]
}
}
}
output {
stdout { codec => "rubydebug" }
}
And running it:
echo '[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {"key":"securitysecured_area"} []' | bin/logstash -f test.conf
And getting the output:
{
"#timestamp" => 2017-05-02T15:15:05.875Z,
"#version" => "1",
"host" => "xxxxxxxxx",
"message" => "[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {\"key\":\"securitysecured_area\"} []",
"type" => "feprod",
"tags" => []
}
If you use just if [type] == ... instead, it will work fine:
{
"#timestamp" => 2017-04-26T14:40:33.000Z,
"logdate" => "2017-04-26 09:40:33",
"#version" => "1",
"host" => "xxxxxxxxx",
"message" => "[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {\"key\":\"securitysecured_area\"} []",
"type" => "feprod",
"tags" => []
}
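For completeness, here is the question's filter with the conditional changed from [@metadata][type] to [type] as suggested, everything else untouched (a sketch, not tested against the original pipeline):
filter {
  if [type] == "feprod" or [type] == "feqa" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:logdate}" }
    }
    date {
      match => [ "logdate", "yyyy-MM-dd HH:mm:ss" ]
    }
  }
}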

Timezone causing different results when doing a search query to an index in Elastic Search

I'm trying to get the results of a search query (i.e., searching for results in a given date range) against a particular index, so that I can get results on a daily basis.
This is the query : http://localhost:9200/dialog_test/_search?q=timestamp:[2016-08-03T00:00:00.128%20TO%202016-08-03T23:59:59.128]
In the above, timestamp is a field which I added in my logstash.conf in order to capture the actual log time. When I tried querying this, surprisingly I got a number of hits (total hits: 24) which should have been 0, since I didn't have any log records from the date 2016-08-03. It actually displays the count for the next day (i.e., 2016-08-04), which has 24 records in the log file. I'm sure something has gone wrong with the timezone.
My timezone is GMT+5:30.
Here is my filtering part of logstash conf:
filter {
grok {
patterns_dir => ["D:/ELK Stack/logstash/logstash-2.3.4/bin/patterns"]
match => { "message" => "^%{LOGTIMESTAMP:logtimestamp}%{GREEDYDATA}" }
}
mutate {
add_field => { "timestamp" => "%{logtimestamp}" }
remove_field => ["logtimestamp"]
}
date {
match => [ "timestamp" , "ISO8601" , "yyyyMMdd HH:mm:ss.SSS" ]
target => "timestamp"
locale => "en"
}}
EDIT:
This is a snap of the first 24 records, which have the date 2016-08-04, from the log file:
And this is a snap of the JSON response I got when I searched for the date 2016-08-03:
Where am I going wrong? Any help would be appreciated.
In your date filter you need to add a timezone:
date {
match => [ "timestamp" , "ISO8601" , "yyyyMMdd HH:mm:ss.SSS" ]
target => "timestamp"
locale => "en"
timezone => "Asia/Calcutta" <--- add this
}
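Once the timezone is set, the parsed timestamp is stored in UTC, so a "day" in IST spans two UTC calendar dates. If you want the range query to follow local days, you can also give the range an explicit offset, along these lines (a hypothetical URL, not from the original answer; the + of the offset has to be URL-encoded as %2B):
http://localhost:9200/dialog_test/_search?q=timestamp:[2016-08-03T00:00:00%2B05:30%20TO%202016-08-03T23:59:59%2B05:30]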

reparsing a logstash record? fix extracts?

I'm taking a JSON message (Cloudtrail, many objects concatenated together) and by the time I'm done filtering it, Logstash doesn't seem to be parsing the message correctly. It's as if the hash was simply dumped into a string.
Anyhow, here's the input and filter.
input {
s3 {
bucket => "stanson-ops"
delete => false
#snipped unimportant bits
type => "cloudtrail"
}
}
filter {
if [type] == "cloudtrail" {
json { # http://logstash.net/docs/1.4.2/filters/json
source => "message"
}
ruby {
code => "event['RecordStr'] = event['Records'].join('~~~')"
}
split {
field => "RecordStr"
terminator => "~~~"
remove_field => [ "message", "Records" ]
}
}
}
By the time I'm done, elasticsearch entries include a RecordStr key with the following data. It doesn't have a message field, nor does it have a Records field.
{"eventVersion"=>"1.01", "userIdentity"=>{"type"=>"IAMUser", "principalId"=>"xxx"}}
Note that this is not JSON; it's been parsed into Ruby-hash notation (which is important for the concat->split thing to work).
So, the RecordStr key looks not quite right as one value. Further, in Kibana, filterable fields include RecordStr (no subfields). It includes some entries that aren't there anymore: Records.eventVersion, Records.userIdentity.type.
Why is that? How can I get the proper fields?
Edit 1: here's part of the input.
{"Records":[{"eventVersion":"1.01","userIdentity":{"type":"IAMUser",
It's unprettified JSON. It appears the body of the file (the above) ends up in the message field; the json filter extracts it, and I end up with an array of records in the Records field. That's why I join and split it: I then end up with individual documents, each with a single RecordStr entry. However, the template(?) doesn't seem to understand the new structure.
I've worked out a method that allows for indexing the appropriate CloudTrail fields as you requested. Here are the modified input and filter configs:
input {
  s3 {
    backup_add_prefix => "processed-logs/"
    backup_to_bucket => "test-bucket"
    bucket => "test-bucket"
    delete => true
    interval => 30
    prefix => "AWSLogs/<account-id>/CloudTrail/"
    type => "cloudtrail"
  }
}
filter {
  if [type] == "cloudtrail" {
    json {
      source => "message"
    }
    ruby {
      code => "event.set('RecordStr', event.get('Records').join('~~~'))"
    }
    split {
      field => "RecordStr"
      terminator => "~~~"
      remove_field => [ "message", "Records" ]
    }
    mutate {
      gsub => [
        "RecordStr", "=>", ":"
      ]
    }
    mutate {
      gsub => [
        "RecordStr", "nil", "null"
      ]
    }
    json {
      skip_on_invalid_json => true
      source => "RecordStr"
      target => "cloudtrail"
    }
    mutate {
      add_tag => ["cloudtrail"]
      remove_field => ["RecordStr", "@version"]
    }
    date {
      match => ["[cloudtrail][eventTime]", "ISO8601"]
    }
  }
}
The key observation here is that once the split is done we no longer have valid JSON in the event, and are therefore required to execute the mutate replacements ('=>' to ':' and 'nil' to 'null'). Additionally, I found it useful to get the timestamp out of the CloudTrail eventTime and do some cleanup of unnecessary fields.
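For illustration (values shortened from the sample record earlier in the question, with a hypothetical nil-valued key added only to show the second gsub), after the split each event's RecordStr holds Ruby-hash notation like:
{"eventVersion"=>"1.01", "userIdentity"=>{"type"=>"IAMUser", "principalId"=>"xxx"}, "responseElements"=>nil}
and after the two gsub passes it is valid JSON again, ready for the json filter:
{"eventVersion":"1.01", "userIdentity":{"type":"IAMUser", "principalId":"xxx"}, "responseElements":null}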

Logstash date parsing as timestamp using the date filter

Well, after looking around quite a lot, I could not find a solution to my problem, as it "should" work, but obviously doesn't.
I'm using Logstash 1.4.2-1-2-2c0f5a1 on an Ubuntu 14.04 LTS machine, and I am receiving messages such as the following one:
2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
BTW, I am also multiline
In the input configuration, I do have a multiline codec and the event is parsed correctly. I also separate the event text in several parts so that it is easier to read.
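The input configuration itself isn't shown; purely for context, a multiline input for this kind of log might look something like the sketch below (illustrative only, not the asker's actual config; path and type are taken from the Kibana document further down):
input {
  file {
    path => "/mnt/folder/thisIsTheLogFile.log"
    type => "utg-su"
    codec => multiline {
      # any line that does not start with a timestamp is glued onto the previous event
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}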
In the end, I obtain, as seen in Kibana, something like the following (JSON view):
{
"_index": "logstash-2014.08.06",
"_type": "customType",
"_id": "PRtj-EiUTZK3HWAm5RiMwA",
"_score": null,
"_source": {
"#timestamp": "2014-08-06T08:51:21.160Z",
"#version": "1",
"tags": [
"multiline"
],
"type": "utg-su",
"host": "ubuntu-14",
"path": "/mnt/folder/thisIsTheLogFile.log",
"logTimestamp": "2014-08-05;10:21:13.618",
"logThreadId": "17",
"logLevel": "INFO",
"logMessage": "Class.Type - This is a log message from the class:\r\n BTW, I am also multiline\r"
},
"sort": [
"21",
1407315081160
]
}
You may have noticed that I put a ";" in the timestamp. The reason is that I want to be able to sort the logs using the timestamp string, and apparently logstash is not that good at that (e.g.: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html).
I have unsuccessfully tried to use the date filter in multiple ways, and it apparently did not work.
date {
locale => "en"
match => ["logTimestamp", "YYYY-MM-dd;HH:mm:ss.SSS", "ISO8601"]
timezone => "Europe/Vienna"
target => "#timestamp"
add_field => { "debug" => "timestampMatched"}
}
Since I read that the Joda library may have problems if the string is not strictly ISO 8601-compliant (very picky and expects a T, see https://logstash.jira.com/browse/LOGSTASH-180), I also tried to use mutate to convert the string to something like 2014-08-05T10:21:13.618 and then use "YYYY-MM-dd'T'HH:mm:ss.SSS". That also did not work.
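For what it's worth, that attempted conversion probably looked something like the following reconstruction (my guess, not the original config; the field name logTimestamp and the pattern are taken from the question):
mutate {
  gsub => [
    # comma to dot in the milliseconds, space to a literal T
    "logTimestamp", ",", ".",
    "logTimestamp", " ", "T"
  ]
}
date {
  locale => "en"
  match => ["logTimestamp", "YYYY-MM-dd'T'HH:mm:ss.SSS"]
  timezone => "Europe/Vienna"
  target => "@timestamp"
}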
I do not want to have to manually put a +02:00 on the time because that would give problems with daylight saving.
In any of these cases, the event goes to Elasticsearch, but the date filter apparently does nothing, as @timestamp and logTimestamp are different and no debug field is added.
Any idea how I could make the logTime strings properly sortable? I focused on converting them to a proper timestamp, but any other solution would also be welcome.
As you can see below:
When sorting over @timestamp, elasticsearch can do it properly, but since this is not the "real" log timestamp, but rather when the logstash event was read, I need (obviously) to be able to sort also over logTimestamp. This is what then is output. Obviously not that useful:
Any help is welcome! Just let me know if I forgot some information that may be useful.
Update:
Here is the filter config file that finally worked:
# Filters messages like this:
# 2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
# BTW, I am also multiline
# Take only type- events (type-componentA, type-componentB, etc)
filter {
# You cannot write an "if" outside of the filter!
if "type-" in [type] {
grok {
# Parse timestamp data. We need the "(?m)" so that grok (Oniguruma internally) correctly parses multi-line events
patterns_dir => "./patterns"
match => [ "message", "(?m)%{TIMESTAMP_ISO8601:logTimestampString}[ ;]\[%{DATA:logThreadId}\][ ;]%{LOGLEVEL:logLevel}[ ;]*%{GREEDYDATA:logMessage}" ]
}
# The timestamp may have commas instead of dots. Convert so as to store everything in the same way
mutate {
gsub => [
# replace all commas with dots
"logTimestampString", ",", "."
]
}
mutate {
gsub => [
# make the logTimestamp sortable. With a space, it is not! This does not work that well, in the end
# but somehow apparently makes things easier for the date filter
"logTimestampString", " ", ";"
]
}
date {
locale => "en"
match => ["logTimestampString", "YYYY-MM-dd;HH:mm:ss.SSS"]
timezone => "Europe/Vienna"
target => "logTimestamp"
}
}
}
filter {
if "type-" in [type] {
# Remove already-parsed data
mutate {
remove_field => [ "message" ]
}
}
}
I have tested your date filter. It works for me!
Here is my configuration
input {
stdin{}
}
filter {
date {
locale => "en"
match => ["message", "YYYY-MM-dd;HH:mm:ss.SSS"]
timezone => "Europe/Vienna"
target => "#timestamp"
add_field => { "debug" => "timestampMatched"}
}
}
output {
stdout {
codec => "rubydebug"
}
}
And I use this input:
2014-08-01;11:00:22.123
The output is:
{
"message" => "2014-08-01;11:00:22.123",
"#version" => "1",
"#timestamp" => "2014-08-01T09:00:22.123Z",
"host" => "ABCDE",
"debug" => "timestampMatched"
}
So, please make sure that your logTimestamp has the correct value.
It is probably some other problem. Or can you provide your log event and logstash configuration for more discussion? Thank you.
This worked for me - with a slightly different datetime format:
# 2017-11-22 13:00:01,621 INFO [AtlassianEvent::0-BAM::EVENTS:pool-2-thread-2] [BuildQueueManagerImpl] Sent ExecutableQueueUpdate: addToQueue, agents known to be affected: []
input {
file {
path => "/data/atlassian-bamboo.log"
start_position => "beginning"
type => "logs"
codec => multiline {
pattern => "^%{TIMESTAMP_ISO8601} "
charset => "ISO-8859-1"
negate => true
what => "previous"
}
}
}
filter {
grok {
match => [ "message", "(?m)^%{TIMESTAMP_ISO8601:logtime}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}\[%{DATA:thread_id}\]%{SPACE}\[%{WORD:classname}\]%{SPACE}%{GREEDYDATA:logmessage}" ]
}
date {
match => ["logtime", "yyyy-MM-dd HH:mm:ss,SSS", "yyyy-MM-dd HH:mm:ss,SSS Z", "MMM dd, yyyy HH:mm:ss a" ]
timezone => "Europe/Berlin"
}
}
output {
elasticsearch { hosts => ["localhost:9200"] }
stdout { codec => rubydebug }
}
