Logstash unable to index into Elasticsearch because it can't parse date

I am getting a lot of the following errors when I am running logstash to index documents into Elasticsearch
[2019-11-02T18:48:13,812][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"my-index-2019-09-28", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x729fc561>], :response=>{"index"=>{"_index"=>"my-index-2019-09-28", "_type"=>"doc", "_id"=>"BhlNLm4Ba4O_5bsE_PxF", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [timestamp] of type [date] in document with id 'BhlNLm4Ba4O_5bsE_PxF'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2019-09-28 23:32:10.586\" is malformed at \" 23:32:10.586\""}}}}}
It clearly has a problem with the date format, but I don't see what that problem could be. Below are excerpts from my Logstash config and the Elasticsearch template. I include these because I'm trying to use the timestamp field to build the index name: I copy timestamp into @timestamp, format that into a YYYY-MM-dd string, and use the stored metadata to name my index.
Logstash config:
input {
  stdin { type => stdin }
}
filter {
  csv {
    separator => " " # this is a tab (\t), not just whitespace
    columns => ["timestamp", "field1", "field2", ...]
    convert => {
      "timestamp" => "date_time"
      ...
    }
  }
}
filter {
  date {
    match => ["timestamp", "yyyy-MM-dd' 'HH:mm:ss'.'SSS'"]
    target => "@timestamp"
  }
}
filter {
  date_formatter {
    source => "@timestamp"
    target => "[@metadata][date]"
    pattern => "YYYY-MM-dd"
  }
}
filter {
  mutate {
    remove_field => [
      "@timestamp",
      ...
    ]
  }
}
output {
  amazon_es {
    hosts => ["my-es-cluster.us-east-1.es.amazonaws.com"]
    index => "my-index-%{[@metadata][date]}"
    template => "my-config.json"
    template_name => "my-index-*"
    region => "us-east-1"
  }
}
Template:
{
  "template" : "my-index-*",
  "mappings" : {
    "doc" : {
      "dynamic" : "false",
      "properties" : {
        "timestamp" : {
          "type" : "date"
        },
        ...
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : "12",
      "number_of_replicas" : "0"
    }
  }
}
When I inspect the raw data it looks like what the error is showing me, and that appears to be well formed, so I'm not sure what my issue is.
Here is an example row; it's been redacted, but the problem field is untouched and is the first one:
2019-09-28 07:29:46.454 NA 2019-09-28 07:29:00 someApp 62847957802 62847957802

It turns out the source of the problem was the convert block: Logstash is unable to understand the time format specified in the file. To address this I renamed the original timestamp field to unformatted_timestamp and applied the date formatter I was already using:
filter {
  date {
    match => ["unformatted_timestamp", "yyyy-MM-dd' 'HH:mm:ss'.'SSS'"]
    target => "timestamp"
  }
}
filter {
  date_formatter {
    source => "timestamp"
    target => "[@metadata][date]"
    pattern => "YYYY-MM-dd"
  }
}
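For completeness, the matching change on the csv side is just renaming the first column and dropping its convert entry. A minimal sketch, where the trailing column names mirror the placeholders in the question rather than real field names:
filter {
  csv {
    separator => " " # a literal tab character (\t), as in the original config
    # first column renamed; remaining names are illustrative placeholders
    columns => ["unformatted_timestamp", "field1", "field2"]
    # no convert entry for the timestamp column; the date filter shown above handles the parsing
  }
}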

You are parsing your lines using the csv filter and setting the separator to a space, but your date is also split by a space. This way your first field, named timestamp, only gets the date 2019-09-28, and the time ends up in the field named field1.
You can solve your problem by creating a new field, named date_and_time for example, with the contents of the fields holding the date and the time.
csv {
  separator => " "
  columns => ["date","time","field1","field2","field3","field4","field5","field6"]
}
mutate {
  add_field => { "date_and_time" => "%{date} %{time}" }
}
mutate {
  remove_field => ["date","time"]
}
This will create a field named date_and_time with the value 2019-09-28 07:29:46.454. You can now use the date filter to parse this value into the @timestamp field, the default for Logstash.
date {
  match => ["date_and_time", "YYYY-MM-dd HH:mm:ss.SSS"]
}
This will leave you with two fields holding the same value, date_and_time and @timestamp. Since @timestamp is the default for Logstash, I would suggest keeping it and removing the date_and_time field created before.
mutate {
  remove_field => ["date_and_time"]
}
Now you can create your date-based index using the format YYYY-MM-dd, and Logstash will extract the date from the @timestamp field; just change the index line in your output to this one:
index => "my-index-%{+YYYY-MM-dd}"
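Putting the pieces of this answer together, a minimal end-to-end filter section might look like the sketch below; the separator and column names are carried over from the example above, so adjust them (for instance to a tab) to match the actual file:
filter {
  csv {
    separator => " "
    columns => ["date","time","field1","field2","field3","field4","field5","field6"]
  }
  mutate {
    add_field => { "date_and_time" => "%{date} %{time}" }
  }
  date {
    match => ["date_and_time", "YYYY-MM-dd HH:mm:ss.SSS"]
  }
  mutate {
    remove_field => ["date", "time", "date_and_time"]
  }
}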

Related

Remove nested field with NULL value before insert to Elasticsearch

I'm using Metricbeat to query my database and ship the result to Logstash. One of the queries returns a date column that might contain a NULL value; let's call it "records_last_failure".
I'm getting the following error in Logstash:
[2020-12-29T15:45:59,077][WARN ][logstash.outputs.elasticsearch][my_pipeline] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"my-index", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x4f7d2f63>], :response=>{"index"=>{"_index"=>"my-index", "_type"=>"_doc", "_id"=>"wRItr3YBUsMlz2Mnbnlz", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [sql.metrics.string.records_last_failure] of type [date] in document with id 'wRItr3YBUsMlz2Mnbnlz'. Preview of field's value: 'NULL'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [NULL] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"date_time_parse_exception: Failed to parse with all enclosed parsers"}}}}}}
My goal is to remove this field before trying to save the document into Elasticsearch. I tried to use the following two filters in Logstash, but neither of them helped and I still keep getting the error mentioned above. The filters:
filter {
  ......
  ......
  if "[sql.metrics.string.records_last_failure]" == "NULL" {
    mutate {
      remove_field => [ "[sql.metrics.string.records_last_failure]" ]
    }
  }
  if "[sql][metrics][string][records_last_failure]" == "NULL" {
    mutate {
      remove_field => [ "[sql][metrics][string][records_last_failure]" ]
    }
  }
}
Not sure why or how, but after a few restarts Logstash started working well.
My JSON looked like the following example:
"a" : 1,
"b" : 2,
"sql" : {
"metrics" : {
"string" : {
"record_last_failure" : "NULL",
"other_field" : 5
}
}
}
I wanted to remove the field sql.metrics.string.record_last_failure if it was set to null. The syntax that I used in my filter:
if "[sql][metrics][string][records_last_failure]" == "NULL" {
mutate {
remove_field => [ "[sql][metrics][string][records_last_failure]" ]
}
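As a side note, in Logstash conditionals a field reference is normally written without surrounding quotes; quoting it compares a literal string rather than the field's value. A variant like the sketch below may be what actually matches (this is an editorial suggestion, not part of the original answer):
filter {
  # unquoted bracket syntax compares the value of the nested field, not a literal string
  if [sql][metrics][string][records_last_failure] == "NULL" {
    mutate {
      remove_field => [ "[sql][metrics][string][records_last_failure]" ]
    }
  }
}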

Logstash - Setting a timestamp from a JSON parsed object

I am having an issue with setting a timestamp from a JSON parse.
I have this string:
[{"orderNumber":"423523-4325-3212-4235-463a72e76fe8","externalOrderNumber":"reactivate_22d6ff0d8f55eb821be14df9d35505a6","operation":{"name":"CAPTURE","amount":134,"status":"SUCCESS","createdAt":"2015-05-11T09:14:30.969Z","updatedAt":{}}}]
I parse it as JSON using this Logstash filter:
grok {
  match => { "message" => "\[%{GREEDYDATA:firstjson}\]%{SPACE} \[%{GREEDYDATA:secondjson}\}]}]"}
}
json {
  source => "firstjson"
}
date {
  match => [ "operation.createdAt", "ISO8601"]
}
mutate {
  remove_field => [ "firstjson", "secondjson" ]
}
This creates a document inside Elasticsearch. I have a field named operation.createdAt which is properly recognised as a date field. But for some reason, this line:
date {
  match => [ "operation.createdAt", "ISO8601"]
}
is not setting the @timestamp field. The current @timestamp field is set at the moment of document insertion. What am I doing wrong?
Thanks to the nice people at the ES Logstash Community, I have found the answer.
Instead of:
date {
  match => [ "operation.createdAt", "ISO8601"]
}
I use this:
date {
  match => [ "[operation][createdAt]", "ISO8601"]
}
and that properly extracts and parses the JSON time object.
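In other words, once the json filter has built a nested object, the date filter has to address the inner field with bracket syntax; a minimal sketch of the relevant part of the chain:
filter {
  json {
    source => "firstjson"   # produces a nested field reachable as [operation][createdAt]
  }
  date {
    match => [ "[operation][createdAt]", "ISO8601" ]   # bracket syntax addresses the nested field
  }
}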

Drop log messages containing a specific string

So I have log messages of the format:
[INFO] <blah.blah> 2016-06-27 21:41:38,263 some text
[INFO] <blah.blah> 2016-06-28 18:41:38,262 some other text
Now I want to drop all logs that do not contain a specific string "xyz" and keep all the rest. I also want to index the timestamp.
grokdebug is not helping much.
This is my attempt:
input {
  file {
    path => "/Users/username/Desktop/validateLogconf/logs/*"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => {
      "message" => '%{SYSLOG5424SD:loglevel} <%{JAVACLASS:job}> %{GREEDYDATA:content}'
    }
  }
  date {
    match => [ "Date", "YYYY-mm-dd HH:mm:ss" ]
    locale => en
  }
}
output {
  stdout {
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
  elasticsearch {
    hosts => "http://localhost:9201"
    index => "hello"
  }
}
I am new to grok, so the patterns above might not make sense. Please help.
To drop messages that do not contain the string xyz:
if ([message] !~ "xyz") {
  drop { }
}
Your grok pattern is not grabbing the date part of your logs.
Once you have a field from your grok pattern containing the date, you can invoke the date filter on this field.
So your grok filter should look like this:
grok {
  match => {
    "message" => '%{SYSLOG5424SD:loglevel} <%{JAVACLASS:job}> %{TIMESTAMP_ISO8601:Date} %{GREEDYDATA:content}'
  }
}
I added a part to grab the date, which will be in the field Date. Then you can use the date filter:
date {
  match => [ "Date", "YYYY-MM-dd HH:mm:ss,SSS" ]
  locale => en
}
I added the ,SSS so that the format matches the one from the Date field.
The parsed date will be stored in the @timestamp field, unless specified differently with the target parameter.
To check if your message contains a substring, you can do:
if [message] =~ "a" {
  mutate {
    add_field => { "hello" => "world" }
  }
}
So in your case you can use the if to invoke the drop{} filter, or you can wrap your output plugin in it.
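For example, a sketch of the second option, wrapping the output in the conditional so only events containing xyz are shipped (the elasticsearch settings are carried over from the question):
output {
  if [message] =~ "xyz" {
    elasticsearch {
      hosts => "http://localhost:9201"
      index => "hello"
    }
  }
}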
To parse a date and write it back to your timestamp field, you can use something like this:
date {
  locale => "en"
  match => ["timestamp", "ISO8601"]
  timezone => "UTC"
  target => "@timestamp"
  add_field => { "debug" => "timestampMatched"}
}
This matches my timestamp as follows:
Source field: "timestamp" (see match)
Format is "ISO...", you can use a custom format that matches your timestamp
timezone - self explanatory
target - write it back into the event's "@timestamp" field
Add a debug field to check that it has been matched correctly
Hope that helps,
Artur

Convert log message timestamp to UTC before storing it in Elasticsearch

I am collecting and parsing Tomcat access-log messages using Logstash, and am storing the parsed messages in Elasticsearch.
I am using Kibana to display the log messages in Elasticsearch.
Currently I am using Elasticsearch 2.0.0, Logstash 2.0.0, and Kibana 4.2.1.
An access-log line looks something like the following:
02-08-2016 19:49:30.669 ip=11.22.333.444 status=200 tenant=908663983 user=0a4ac75477ed42cfb37dbc4e3f51b4d2 correlationId=RID-54082b02-4955-4ce9-866a-a92058297d81 request="GET /pwa/rest/908663983/rms/SampleDataDeployment HTTP/1.1" userType=Apache-HttpClient requestInfo=- duration=4 bytes=2548 thread=http-nio-8080-exec-5 service=rms itemType=SampleDataDeployment itemOperation=READ dataLayer=MongoDB incomingItemCnt=0 outgoingItemCnt=7
The time displayed in the log file (ex. 02-08-2016 19:49:30.669) is in local time (not UTC!)
Here is how I parse the message line:
filter {
  grok {
    match => { "message" => "%{DATESTAMP:logTimestamp}\s+" }
  }
  kv {}
  mutate {
    convert => { "duration" => "integer" }
    convert => { "bytes" => "integer" }
    convert => { "status" => "integer" }
    convert => { "incomingItemCnt" => "integer" }
    convert => { "outgoingItemCnt" => "integer" }
    gsub => [ "message", "\r", "" ]
  }
  grok {
    match => { "request" => [ "(?:%{WORD:method} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpVersion})?)" ] }
    overwrite => [ "request" ]
  }
}
I would like Logstash to convert the time read from the log message ('logTimestamp' field) into UTC before storing it in Elasticsearch.
Can someone assist me with that please?
--
I have added the date filter to my processing, but I had to add a timezone.
filter {
  grok {
    match => { "message" => "%{DATESTAMP:logTimestamp}\s+" }
  }
  date {
    match => [ "logTimestamp" , "MM-dd-yyyy HH:mm:ss.SSS" ]
    timezone => "Asia/Jerusalem"
    target => "logTimestamp"
  }
  ...
}
Is there a way to convert the date to UTC without supplying the local timezone, such that Logstash takes the timezone of the machine it is running on?
The motivation behind this question is that I would like to use the same configuration file in all my deployments, in various timezones.
That's what the date{} filter is for: it parses a string field containing a date and replaces the [@timestamp] field with that value in UTC.
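A minimal sketch of that approach for this log format follows; the date pattern is an assumption based on the sample line, and @timestamp is always stored as UTC by Logstash:
filter {
  date {
    # pattern assumed from the sample "02-08-2016 19:49:30.669"; swap MM-dd and dd-MM if needed
    match => [ "logTimestamp", "MM-dd-yyyy HH:mm:ss.SSS" ]
    timezone => "Asia/Jerusalem"   # timezone of the log source, taken from the question
    # no target: the parsed value replaces @timestamp, which is stored in UTC
  }
}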
This can also be done in an ingest processor as follows:
PUT _ingest/pipeline/chage_local_time_to_iso
{
  "processors": [
    {
      "date" : {
        "field" : "my_time",
        "target_field": "my_time",
        "formats" : ["dd/MM/yyyy HH:mm:ss"],
        "timezone" : "Europe/Madrid"
      }
    }
  ]
}
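To check what such a pipeline produces before using it at index time, the ingest simulate API can be called against it; the sample value below is made up to match the declared format:
POST _ingest/pipeline/chage_local_time_to_iso/_simulate
{
  "docs": [
    { "_source": { "my_time": "08/02/2016 19:49:30" } }
  ]
}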

logstash elastic search date output is different

My system audit log contains dates in a format like created_at":1422765535789, so the Elasticsearch output also displays the date in the same style. However, I would like to convert this 1422765535789 and print it in a Unix-style date format.
I've used this in my syslog config (as suggested by another question thread), but I am not getting the above value in a Unix-style date format:
date {
  match => ["created_at", "UNIX_MS"]
}
Hi, I've updated the code in the syslog config; however, created_at is still output to Elasticsearch in the same format, like 1422765535789. Please find the modified code:
input {
  stdin {
  }
}
filter {
  grok {
    match => [ "message", "%{NUMBER:created_at}" ]
  }
  if [message] =~ /^created_at/ {
    date {
      match => [ "created_at" , "UNIX_MS" ]
    }
    ruby {
      code => "
        event['created_at'] = Time.at(event['created_at']/1000);
      "
    }
  }
}
output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}
The date filter is used to update the @timestamp field value.
input {
  stdin {
  }
}
filter {
  grok {
    match => [ "message", "%{NUMBER:created_at:int}" ]
  }
  if "_grokparsefailure" not in [tags] {
    date {
      match => [ "created_at" , "UNIX_MS" ]
    }
    ruby {
      code => "
        event['created_at'] = Time.at(event['created_at']/1000);
      "
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
Here is my config. When I input 1422765535789, it can parse the value and update the @timestamp field value.
The output is
{
       "message" => "1422765535789",
      "@version" => "1",
    "@timestamp" => "2015-02-01T04:38:55.789Z",
          "host" => "ABC",
    "created_at" => "2015-02-01T12:38:55.000+08:00"
}
You can see that the value of @timestamp is the same instant as created_at.
And the ruby filter is used to convert created_at into a date value.
FYI.
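As a side note, on newer Logstash versions (5.x and later) the ruby filter has to go through the event API instead of hash-style access; a sketch of an equivalent conversion, with an explicit .utc call added (this is an adaptation, not part of the original answer):
ruby {
  code => "
    # read the epoch-millis value, convert it to a UTC Time, and write it back
    ms = event.get('created_at')
    event.set('created_at', Time.at(ms / 1000).utc) if ms
  "
}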
