Parse date string in Elasticsearch using a custom Joda format string

Trying to figure out why this custom Joda format is causing an error. I'm trying to match this date string:
Wed May 23 2018 13:45:04 GMT-0700 (Pacific Daylight Time)
with this joda custom format string:
E MMM dd yyyy HH:mm:ss z (zzzz)||epoch_millis
I'm doing this in the dev console to test a mapping that uses the format. Elasticsearch doesn't like it:
PUT /twitter
{}

PUT /twitter/_mapping/_doc
{
  "properties": {
    "TxnDate": {
      "type": "date",
      "format": "E MMM dd yyyy HH:mm:ss z (zzzz)||epoch_millis"
    }
  }
}
Elasticsearch is returning:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Incomplete parser array"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Incomplete parser array"
  },
  "status": 400
}

In order to get the mapping to save, the correct format is the following one, i.e. you need to escape the literal GMT and the parentheses:
E MMM dd yyyy HH:mm:ss 'GMT'Z '('ZZZZ')'||epoch_millis
However, this is not the end of the story, unfortunately... You'll then get a parsing error at indexing time when saving a document with a date such as Wed May 23 2018 13:45:04 GMT-0700 (Pacific Daylight Time). The problem is that Joda-Time doesn't parse timezone names, as "explained" in its documentation:
Zone names: Time zone names ('z') cannot be parsed.
So your only option is to remove the timezone name in parentheses before indexing your document, after which the pattern E MMM dd yyyy HH:mm:ss 'GMT'Z||epoch_millis will work fine. The name in parentheses is redundant anyway, since the numeric offset already carries the timezone information.
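To sanity-check that pattern outside Elasticsearch, here is a minimal Joda-Time sketch (my own illustration, not from the original answer; the class name is arbitrary) parsing the trimmed date string:

import java.util.Locale;
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class JodaPatternCheck {
    public static void main(String[] args) {
        // Same pattern as the mapping, minus the epoch_millis fallback:
        // 'GMT' is a quoted literal and Z parses the numeric offset.
        DateTimeFormatter fmt = DateTimeFormat
                .forPattern("E MMM dd yyyy HH:mm:ss 'GMT'Z")
                .withLocale(Locale.ENGLISH)
                .withOffsetParsed();
        // The "(Pacific Daylight Time)" suffix has already been stripped.
        DateTime parsed = fmt.parseDateTime("Wed May 23 2018 13:45:04 GMT-0700");
        System.out.println(parsed); // 2018-05-23T13:45:04.000-07:00
    }
}

Elasticsearch uses the same Joda pattern syntax, so a string that parses here should also be accepted by the mapping format.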
On another note, you should get into the habit of storing all your dates in the GMT timezone, but that's another story.

Related

Set year on syslog events read into Logstash after New Year

Question
When reading syslog events with Logstash, how can one set the proper year, given that:
Syslog events by default still lack the year
Logstash processing can be delayed: logs arriving late, Logstash down for maintenance, the syslog queue backing up
In short: events can arrive in uneven order, and all or most lack the year.
The Logstash date filter will successfully parse a Syslog date, and use the current year by default. This can be wrong.
One constraint: Logs will never be from the future, not counting TimeZone +/- 1 day.
How can logic be applied to logstash to:
Check if a parsed date appears to be in the future?
Handle "Feb 29" if parsed in the year after the actual leap year.
Date extraction and parsing
I've used the grok filter plugin to extract the SYSLOGTIMESTAMP from the message into a syslog_timestamp field.
Then the Logstash date filter plugin to parse syslog_timestamp into the @timestamp field.
#
# Map the syslog date into the Elasticsearch @timestamp field
#
date {
  match => [ "syslog_timestamp",
             "MMM dd HH:mm:ss",
             "MMM d HH:mm:ss",
             "MMM dd yyyy HH:mm:ss",
             "MMM d yyyy HH:mm:ss" ]
  timezone => "Europe/Oslo"
  target => "@timestamp"
  add_tag => [ "dated" ]
  tag_on_failure => [ "_dateparsefailure" ]
}
# Check if a localized date filter can read the date.
if "_dateparsefailure" in [tags] {
  date {
    match => [ "syslog_timestamp",
               "MMM dd HH:mm:ss",
               "MMM d HH:mm:ss",
               "MMM dd yyyy HH:mm:ss",
               "MMM d yyyy HH:mm:ss" ]
    locale => "no_NO"
    timezone => "Europe/Oslo"
    target => "@timestamp"
    add_tag => [ "dated" ]
    tag_on_failure => [ "_dateparsefailure_locale" ]
  }
}
Background
We are storing syslog events in Elasticsearch using Logstash. The input comes from a wide variety of servers of different OSes and OS versions, several hundred in total.
On the Logstash server the logs are read from file. Servers ship their logs using the standard syslog forwarding protocol.
A standard syslog event still only carries the month and day in each log line, and configuring all servers to also add the year is out of scope for this question.
Problem
From time to time a server's syslog queue backs up. The queue is then (mostly) released after a syslog daemon or server restart. The patching regime ensures that all servers are rebooted several times a year, so (most likely) any received events will be at most a year old.
In addition, any delay in processing that spans New Year, such as between 31 December and 1 January, makes an event belong to a different year than the one in which it is processed.
From time to time you will also need to re-read some logs, and then there's the leap-year issue of February 29th, i.e. "Feb 29".
Examples:
May 25 HH:MM:SS
May 27 HH:MM:SS
May 30 HH:MM:SS
May 31 HH:MM:SS
Mai 31 HH:MM:SS # Localized
In sum: Logs may be late, and we need to handle it.
More advanced DateTime logic can be done with the Logstash Ruby filter plugin.
Leap year
The 29th of February comes around every four years, making "Feb 29" a valid date in 2020 but not in 2021.
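As a quick illustration (my own sketch in Joda-Time, the library underlying the Logstash date filter, not part of the original answer), constructing the invalid date simply throws:

import org.joda.time.IllegalFieldValueException;
import org.joda.time.LocalDate;

public class LeapDayCheck {
    public static void main(String[] args) {
        System.out.println(new LocalDate(2020, 2, 29)); // 2020-02-29, a valid leap day
        try {
            new LocalDate(2021, 2, 29); // 2021 is not a leap year
        } catch (IllegalFieldValueException e) {
            // e.g. "Value 29 for dayOfMonth must be in the range [1,28]"
            System.out.println(e.getMessage());
        }
    }
}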
The date is saved in syslog_timestamp and run through the date filters shown in the question.
The following Ruby code will:
Check if this year is a leap year (probably not since parsing failed)
Check if last year was a leap year.
If the date falls outside these checks we can't rightly assume anything else, so this check falls into the "I know and accept the risk."
#
# Handle old leap-day syslog messages, typically from the previous year, while in a non-leap year.
# Ruby comes with a price, so don't run it unless the date filter has failed and the date is "Feb 29".
#
if "_dateparsefailure" in [tags] and "_dateparsefailure_locale" in [tags] and [syslog_timestamp] =~ /^Feb 29/ {
  ruby {
    code => "
      today = DateTime.now
      last_year = DateTime.now().prev_year
      if not today.leap? then
        if last_year.leap? then
          # Prepend last year's year, e.g. 'Feb 29 12:00:00' => '2020 Feb 29 12:00:00'
          timestamp = last_year.strftime('%Y') + ' ' + event.get('syslog_timestamp')
          event.set('[@metadata][fix_leapyear]', LogStash::Timestamp.new(Time.parse(timestamp)))
        end
      end
    "
  }
  #
  # Overwrite the @timestamp field if successful and remove the failure tags
  #
  if [@metadata][fix_leapyear] {
    mutate {
      copy => { "[@metadata][fix_leapyear]" => "@timestamp" }
      remove_tag => ["_dateparsefailure", "_dateparsefailure_locale"]
      add_tag => ["dated"]
    }
  }
}
Date in the future
Dates "in the future" occurs if you get i.e. Nov 11 in a log parsed after New Year.
This Ruby filter will:
Set a tomorrow date variable two days in the future (YMMV)
Check if the parsed event date @timestamp is after (i.e. in the future of) tomorrow
When reading syslog we assume that logs from the future do not exist. If you run test servers that simulate later dates you must of course adapt to that, but that is outside the scope.
#
# Fix syslog date without YEAR.
# If the date is "in the future" we assume it is really in the past by one year.
#
if ![@metadata][fix_leapyear] {
  ruby {
    code => "
      #
      # Create a Time object two days from the current time by adding 172800 seconds.
      # Depends on [event][timestamp] being set before any 'date' filter; otherwise use Ruby's Time.now.
      #
      tomorrow = event.get('[event][timestamp]').time.localtime() + 172800
      #
      # Read the @timestamp set by the 'date' filter
      #
      timestamp = event.get('@timestamp').time.localtime()
      #
      # If the event timestamp is _newer_ than two days from now,
      # we assume this is syslog and a really old message, really from
      # last year. We cannot be sure it is not even older, hence the 'assume'.
      #
      if timestamp > tomorrow then
        # Rebuild the timestamp with the year decremented by one, keeping
        # fractional seconds (subsec is a Rational) and the original UTC offset.
        new_date = LogStash::Timestamp.new(
          Time.new(timestamp.year - 1, timestamp.month, timestamp.day,
                   timestamp.hour, timestamp.min,
                   timestamp.sec + timestamp.subsec,
                   timestamp.utc_offset))
        event.set('@timestamp', new_date)
        event.set('[event][timestamp_datefilter]', timestamp)
      end
    "
  }
}
Caveat: I'm by no means a Ruby expert, so other answers or comments on how to improve on the Ruby code or logic will be greatly appreciated.
In the hope that this can help or inspire others.

Date pattern doesn't work as expected in Logstash

I am trying to use the following date filter to convert a string to a date, but it doesn't seem to be working.
Sample input data (string): Mon Jan 20 09:20:35 GMT 2020
I am first using a mutate gsub to remove GMT, which renders the following string output (note the double space left where GMT used to be):
Mon Jan 20 09:20:35  2020
My gsub mutate filter looks like this:
mutate { gsub => [ "TimeStamp", "GMT", "" ] }
Now, I am using a date filter to convert the gsub output to a date, but it doesn't seem to be working:
date {
  match => [ "TimeStamp", "EEE MMM dd HH:mm:ss yyyy" ]
  target => "TimeStamp"
  locale => "en"
}
I have also tried the following with no success:
date {
  match => [ "TimeStamp", "EEE\sMMM\sdd\sHH:mm:ss\s+yyyy" ]
  target => "TimeStamp"
  timezone => "Etc/GMT"
  locale => "en"
}
The date pattern should be
MMM dd HH:mm:ss yyyy
Maybe you have to add some extra spaces before the year (it looks like you have them in your logs).
Instead of EEE (abbreviated weekday name) you need to use MMM (abbreviated month name).
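Alternatively (my suggestion, not from the original answer), you could skip the gsub entirely and quote GMT as literal text, the same trick used in the Elasticsearch answer at the top of this page. A Joda-Time sketch of the idea, which should carry over to the date filter's match pattern since the filter uses Joda syntax:

import java.util.Locale;
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class LiteralGmtCheck {
    public static void main(String[] args) {
        // 'GMT' is matched as literal text, so no mutate/gsub step is needed
        // and no double space is left behind.
        DateTimeFormatter fmt = DateTimeFormat
                .forPattern("EEE MMM dd HH:mm:ss 'GMT' yyyy")
                .withLocale(Locale.ENGLISH);
        DateTime parsed = fmt.parseDateTime("Mon Jan 20 09:20:35 GMT 2020");
        System.out.println(parsed); // 2020-01-20T09:20:35.000 in the default zone
    }
}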

Overriding @timestamp via Logstash's date filter with a grok-extracted value

I am trying to mutate a string value to a datetime in Logstash. Although the format is correct, in Kibana/Elasticsearch the field shows up as string and not date.
As part of the analysis I tried to mutate the date in multiple ways, but none of them worked. I tried some filters for milliseconds and half-day, as the date format in my log uses AM/PM.
Grok
match => { message => [
  "\"%{WORD:status}\"\,\"(?<monitortime>%{MONTH:month}%{SPACE}%{MONTHDAY:day}\,%{SPACE}%{YEAR:year}%{SPACE}%{TIME:t1}%{SPACE}%{WORD:t2})\"\,\"%{WORD:monitor}\"\,%{INT:loadtime}\,%{INT:totalbytes}\,\"%{WORD:location}\"\,(?m)%{GREEDYDATA:error}"
] }
Date Conversion
date {
  locale => "en"
  match => [ "monitortime", "MMM dd, yyyy kk:mm:ss.SSS aa ZZZ", "YYYY-MM-dd kk:mm:ss.SSS aa ZZZ" ]
  timezone => "Etc/UCT"
}
output in kibana
message "Error","Jun 14, 2019 02:47:33 pm","xxxxxxxxxx",0,0,"stage_1","HomePage: Sign in link is not visible!"
monitortime Jun 14, 2019 02:47:33 pm
monitortime string
Timestamp recorded by Elasticsearch:
@timestamp Sep 10, 2019 @ 20:06:48.525
The expected result is to get monitortime with the datatype date.
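No accepted fix is quoted here, but for reference, a hedged Joda-Time sketch (my own; the pattern is adjusted to the sample value, which has no milliseconds or zone and uses a 1-12 clock hour) that does parse the monitortime string:

import java.util.Locale;
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class MonitorTimeCheck {
    public static void main(String[] args) {
        // hh is the 1-12 clockhour and aa the AM/PM marker; Joda's text
        // parsing is case-insensitive, so lowercase "pm" is accepted.
        DateTimeFormatter fmt = DateTimeFormat
                .forPattern("MMM dd, yyyy hh:mm:ss aa")
                .withLocale(Locale.ENGLISH);
        DateTime parsed = fmt.parseDateTime("Jun 14, 2019 02:47:33 pm");
        System.out.println(parsed); // 2019-06-14T14:47:33.000 in the default zone
    }
}

Note also that the date filter writes to @timestamp unless target is set, so even a successful parse would leave monitortime itself a string; pointing target at monitortime (and mapping that field as date in Elasticsearch) would be needed for the expected result.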

How to convert datetime in JMeter using Beanshell sampler

I have a timestamp for one of my HTTP samplers in the following format:
Tue Nov 07 10:28:10 PST 2017
and I need to convert it to the following format:
11/07/2017 10:28:10
I tried different approaches but don't know what I am doing wrong. Can anyone help me with that? Thanks.
It's very similar to how you'd do it in Java.
Here's an example:
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

String string = "Tue Nov 07 10:28:10 PST 2017";
// Original format to convert from
DateFormat formatFrom = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ENGLISH);
// Target format to convert to
DateFormat formatTo = new SimpleDateFormat("dd/MM/yyyy HH:mm:ss", Locale.ENGLISH);
// Parse original string, using original format
Date date = formatFrom.parse(string);
// Convert to a target format
String result = formatTo.format(date);
// Just to show the output, not really necessary
log.info(result);
One catch: since the target format omits the zone, the local zone of the computer will be used. So, for example, the original time 10:28:10 PST will be converted to 10:28:10 on a computer in the PST zone, but to 13:28:10 on a computer in the EST zone.
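If you would rather pin the output to a fixed zone regardless of where the script runs (my addition, not part of the original answer), SimpleDateFormat lets you set it explicitly:

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Pin the target format to Pacific time so 10:28:10 PST always
// converts to 10:28:10, independent of the machine's local zone.
DateFormat formatTo = new SimpleDateFormat("dd/MM/yyyy HH:mm:ss", Locale.ENGLISH);
formatTo.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));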
I heard Groovy is the new black, so given that:
the Date class in the Groovy SDK has format() and parse() methods
it is recommended to use JSR223 test elements and the Groovy language since JMeter 3.1
you can get the date converted in a single line of Groovy code:
Date.parse("EEE MMM dd HH:mm:ss zzz yyyy", 'Tue Nov 07 10:28:10 PST 2017').format("dd/MM/yyyy HH:mm:ss", TimeZone.getTimeZone('PST'))

ElasticSearch not mapping JODA time format

I am indexing tweets, and would like to map the created_at field to a date. An example date looks like this:
'created_at': 'Wed Sep 21 05:19:16 +0000 2011'
which, using the Joda time format, I figured out to be:
"format" : "EEE MMM dd HH:mm:ss +SSSS yyyy",
However, when trying to index a new tweet I get the following error:
{u'status': 400, u'error': u'RemoteTransportException[[Rattler][inet[/192.155.85.243:9301]][index]]; nested: MapperParsingException[Failed to parse [created_at]]; nested: MapperParsingException[failed to parse date field [2013-04-30 20:34:43], tried both date format [yyyyMMdd HH:mm:ss], and timestamp number]; nested: IllegalArgumentException[Invalid format: "2013-04-30 20:34:43" is malformed at "-04-30 20:34:43"]; '}
I've tried changing the date format to use
yyyy-MM-dd HH:mm:ss
EEE, dd MMM yyyy HH:mm:ss Z
EEE dd MMM yyyy HH:mm:ss Z
EEE MMM dd HH:mm:ss +0000 yyyy
, and several other variations just to see, with no luck. I'm using the following call to create an initial tweet document:
curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
  "tweet" : {
    "properties" : {
      "created_at" : {"type" : "date", "format" : "EEE dd MMM yyyy HH:mm:ss Z"}
    }
  }
}'
Any help is greatly appreciated!
The Joda time format you specified is not completely correct.
S is for fraction of second, not the timezone as you wanted. Also, the "+" sign is handled by the timezone parser.
I managed to parse the Twitter date format in Elasticsearch with this format specifier:
"format": "EE MMM d HH:mm:ss Z yyyy"
