Remove nested field with NULL value before inserting into Elasticsearch

I'm using Metricbeat to query my database and ship the result to Logstash. One of the queries returns a date column that might contain a NULL value; let's call it "records_last_failure".
I'm getting the following error in Logstash:
[2020-12-29T15:45:59,077][WARN ][logstash.outputs.elasticsearch][my_pipeline] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"my-index", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x4f7d2f63>], :response=>{"index"=>{"_index"=>"my-index", "_type"=>"_doc", "_id"=>"wRItr3YBUsMlz2Mnbnlz", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [sql.metrics.string.records_last_failure] of type [date] in document with id 'wRItr3YBUsMlz2Mnbnlz'. Preview of field's value: 'NULL'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [NULL] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"date_time_parse_exception: Failed to parse with all enclosed parsers"}}}}}}
My goal is to remove this field before trying to save the document into Elasticsearch. I tried the following two filters in Logstash, but neither of them helped and I kept getting the error mentioned above. The filters:
filter {
  ......
  ......
  if "[sql.metrics.string.records_last_failure]" == "NULL" {
    mutate {
      remove_field => [ "[sql.metrics.string.records_last_failure]" ]
    }
  }
  if "[sql][metrics][string][records_last_failure]" == "NULL" {
    mutate {
      remove_field => [ "[sql][metrics][string][records_last_failure]" ]
    }
  }
}

Not sure why or how, but after a few restarts Logstash started working well.
My JSON looked like the following example:
"a" : 1,
"b" : 2,
"sql" : {
"metrics" : {
"string" : {
"record_last_failure" : "NULL",
"other_field" : 5
}
}
}
I wanted to remove the field sql.metrics.string.records_last_failure if it was set to "NULL". The syntax that I used in my filter:
if "[sql][metrics][string][records_last_failure]" == "NULL" {
mutate {
remove_field => [ "[sql][metrics][string][records_last_failure]" ]
}
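One detail worth flagging, as an aside not from the original post: in Logstash conditionals a field reference is normally written without surrounding quotes; a quoted value like "[sql][metrics][string][records_last_failure]" is compared as a literal string rather than as the field's value. A minimal sketch of the same conditional written with an unquoted field reference:

if [sql][metrics][string][records_last_failure] == "NULL" {
  mutate {
    # inside remove_field the quotes stay, because here the
    # field reference is passed as a string
    remove_field => [ "[sql][metrics][string][records_last_failure]" ]
  }
}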

Related

Logstash unable to index into elasticsearch because it can't parse date

I am getting a lot of the following errors when I run Logstash to index documents into Elasticsearch:
[2019-11-02T18:48:13,812][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"my-index-2019-09-28", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x729fc561>], :response=>{"index"=>{"_index"=>"my-index-2019-09-28", "_type"=>"doc", "_id"=>"BhlNLm4Ba4O_5bsE_PxF", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [timestamp] of type [date] in document with id 'BhlNLm4Ba4O_5bsE_PxF'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2019-09-28 23:32:10.586\" is malformed at \" 23:32:10.586\""}}}}}
It clearly has a problem with the date format, but I don't see what that problem could be. Below are excerpts from my Logstash config and the Elasticsearch template. I include these because I'm trying to use the timestamp field to build the index name in my Logstash config: I copy timestamp into @timestamp, format that as YYYY-MM-dd, and use that stored metadata to name my index.
Logstash config:
input {
  stdin { type => stdin }
}
filter {
  csv {
    separator => " " # this is a tab (\t) not just whitespace
    columns => ["timestamp","field1", "field2", ...]
    convert => {
      "timestamp" => "date_time"
      ...
    }
  }
}
filter {
  date {
    match => ["timestamp", "yyyy-MM-dd' 'HH:mm:ss'.'SSS'"]
    target => "@timestamp"
  }
}
filter {
  date_formatter {
    source => "@timestamp"
    target => "[@metadata][date]"
    pattern => "YYYY-MM-dd"
  }
}
filter {
  mutate {
    remove_field => [
      "@timestamp",
      ...
    ]
  }
}
output {
  amazon_es {
    hosts => ["my-es-cluster.us-east-1.es.amazonaws.com"]
    index => "my-index-%{[@metadata][date]}"
    template => "my-config.json"
    template_name => "my-index-*"
    region => "us-east-1"
  }
}
Template:
{
  "template" : "my-index-*",
  "mappings" : {
    "doc" : {
      "dynamic" : "false",
      "properties" : {
        "timestamp" : {
          "type" : "date"
        }, ...
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : "12",
      "number_of_replicas" : "0"
    }
  }
}
When I inspect the raw data, it looks like what the error is showing me, and that appears to be well formed, so I'm not sure what my issue is.
Here is an example row; it's been redacted, but the problem field is untouched and is the first one:
2019-09-28 07:29:46.454 NA 2019-09-28 07:29:00 someApp 62847957802 62847957802
It turns out the source of the problem was the convert block: Logstash is unable to understand the time format specified in the file. To address this, I renamed the original timestamp field to unformatted_timestamp and applied the date filters I was already using:
filter {
  date {
    match => ["unformatted_timestamp", "yyyy-MM-dd' 'HH:mm:ss'.'SSS'"]
    target => "timestamp"
  }
}
filter {
  date_formatter {
    source => "timestamp"
    target => "[@metadata][date]"
    pattern => "YYYY-MM-dd"
  }
}
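For context, a minimal sketch of where that rename lands in the csv filter; the column names after the first one are placeholders, not from the original config:

filter {
  csv {
    separator => " " # still a literal tab, as in the original config
    # renaming the first column is what lets the date filter above apply;
    # there is no longer a convert entry for the timestamp column
    columns => ["unformatted_timestamp", "field1", "field2"]
  }
}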
You are parsing your lines using the csv filter and setting the separator to a space, but your date also contains a space. This way your first field, named timestamp, only gets the date 2019-09-28, and the time ends up in the field named field1.
You can solve your problem by creating a new field named date_and_time from the fields holding the date and the time, for example:
csv {
  separator => " "
  columns => ["date","time","field1","field2","field3","field4","field5","field6"]
}
mutate {
  add_field => { "date_and_time" => "%{date} %{time}" }
}
mutate {
  remove_field => ["date","time"]
}
This will create a field named date_and_time with the value 2019-09-28 07:29:46.454. You can now use the date filter to parse this value into the @timestamp field, the default for Logstash.
date {
  match => ["date_and_time", "YYYY-MM-dd HH:mm:ss.SSS"]
}
This will leave you with two fields holding the same value, date_and_time and @timestamp. Since @timestamp is the default for Logstash, I would suggest keeping it and removing the date_and_time field that was created before.
mutate {
  remove_field => ["date_and_time"]
}
Now you can create your date-based index using the format YYYY-MM-dd, and Logstash will extract the date from the @timestamp field. Just change the index line in your output to this one:
index => "my-index-%{+YYYY-MM-dd}"

How can I configure a custom field to be aggregatable in Kibana?

I am new to running the ELK stack. I have Logstash configured to feed my webapp log into Elasticsearch. I am trying to set up a visualization in Kibana that will show the count of unique users, given by the user_email field, which is parsed out of certain log lines.
I am fairly sure that I want to use the Unique Count aggregation, but I can't seem to get Kibana to include user_email in the list of fields which I can aggregate.
Here is my Logstash configuration:
filter {
  if [type] == "wl-proxy-log" {
    grok {
      match => {
        "message" => [
          "(?<syslog_datetime>%{SYSLOGTIMESTAMP}\s+%{YEAR})\s+<%{INT:session_id}>\s+%{DATA:log_message}\s+license=%{WORD:license}\&user=(?<user_email>%{USERNAME}\#%{URIHOST})\&files=%{WORD:files}",
        ]
      }
      break_on_match => true
    }
    date {
      match => [ "syslog_datetime", "MMM dd HH:mm:ss yyyy", "MMM d HH:mm:ss yyyy" ]
      target => "@timestamp"
      locale => "en_US"
      timezone => "America/Los_Angeles"
    }
    kv {
      source => "uri_params"
      field_split => "&?"
    }
  }
}
output {
  elasticsearch {
    ssl => false
    index => "wl-proxy"
    manage_template => false
  }
}
Here is the relevant mapping in Elasticsearch:
{
  "wl-proxy" : {
    "mappings" : {
      "wl-proxy-log" : {
        "user_email" : {
          "full_name" : "user_email",
          "mapping" : {
            "user_email" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        }
      }
    }
  }
}
Can anyone tell me what I am missing?
BTW, I am running CentOS with the following versions:
Elasticsearch Version: 6.0.0, Build: 8f0685b/2017-11-10T18:41:22.859Z, JVM: 1.8.0_151
Logstash v.6.0.0
Kibana v.6.0.0
Thanks!
I figured it out. The configuration was correct, AFAICT. The issue was that I simply hadn't refreshed the list of fields in the index in the Kibana UI.
Management -> Index Patterns -> Refresh Field List (the refresh icon)
After doing that, the field began appearing in the list of aggregatable terms, and I was able to create the necessary visualizations.

Logstash Mutate Filter to Convert Field type is not working

I have a field traceinfo.duration in my webapp log. ES maps it as a string, but I want to change its field type to integer. My logstash.conf contains the following filter section:
filter {
  if "webapp-log" in [tags] {
    json { source => "message" }
    mutate {
      convert => {"[traceinfo][duration]" => "integer"}
    }
    mutate {
      remove_field => ["[beat][hostname]","[beat][name]"]
    }
  }
}
I am creating a new index with this configuration to test it, but my field type in Kibana is still string for the traceinfo.duration field. My Logstash version is 5.3.0. Please help.

Creating/updating array of objects in elasticsearch logstash output

I am facing an issue using the elasticsearch output with Logstash. Here is my sample event:
{
  "guid":"someguid",
  "nestedObject":{
    "field1":"val1",
    "field2":"val2"
  }
}
I expect the document with this id to already be present in Elasticsearch when this update happens.
Here is what I want to have in my Elasticsearch document after 2 upserts:
{
  "oldField":"Some old field from original document before upserts.",
  "nestedObjects":[{
    "field1":"val1",
    "field2":"val2"
  },
  {
    "field3":"val3",
    "field4":"val4"
  }]
}
Here is my current elasticsearch output configuration:
elasticsearch {
  index => "elastictest"
  action => "update"
  document_type => "summary"
  document_id => "%{guid}"
  doc_as_upsert => true
  script_lang => "groovy"
  script_type => "inline"
  retry_on_conflict => 3
  script => "
    if (ctx._source.nestedObjects) {
      ctx._source.nestedObjects += event.nestedObject
    } else {
      ctx._source.nestedObjects = [event.nestedObject]
    }
  "
}
Here is the error I am getting:
response=>{"update"=>{"_index"=>"elastictest", "_type"=>"summary",
"_id"=>"64648dd3-c1e9-45fd-a00b-5a4332c91ee9", "status"=>400,
"error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [event.nestedObject]",
"caused_by"=>{"type"=>"illegal_argument_exception",
"reason"=>"unknown property [field1]"}}}}
The issue turned out to be the mapping that Elasticsearch had already generated from other documents of the same document_type, which had a conflicting type for the field nestedObject. This caused Elasticsearch to throw a mapper parsing exception. Fixing the mapping conflict fixed this issue.

Logstash output to ElasticSearch With Valid Types

The ELK stack has been successfully set up.
Using grokdebug.herokuapp.com, my grok patterns are valid and the data is getting dumped into Elasticsearch.
filter {
  if [type] == "some_log" {
    grok {
      match => { "message" => '%{WORD:word_1} %{TIME:time_1} %{DATE:date_1} %{NUMBER:number_1}' }
      overwrite => "message"
    }
  }
}
This grok parsing of the input is completely correct,
and the output is:
output {
  elasticsearch {
    protocol => "http"
  }
}
The problem is that all the dumped fields are of string type.
How do I get them indexed with their respective types in Elasticsearch (the correct mapping types)?
time_1, date_1 and number_1 all have the same mapping, which is:
"time_1":{
"type":"string",
"norms":{
"enabled":false
},
"fields":{
"raw":{
"type":"string",
"index":"not_analyzed",
"ignore_above":256
}
}
}
I want date_1 to be indexed as a date type and number_1 to be indexed as a number type in Elasticsearch.
PS: Is it possible to do that, i.e. determine the type of an Elasticsearch field from Logstash?
Or: how do I send those fields with the proper types to Elasticsearch?
Thanks
In your grok pattern, use the form %{PATTERN:field:datatype} to turn the captured fields into something other than strings. Valid data types are "int" and "float". In your case you'd e.g. use %{NUMBER:number_1:int} to turn your number_1 field into an integer.
See the grok filter documentation under Grok Basics.
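Applied to the pattern from the question, a minimal sketch; since grok only supports int and float as data types, date_1 is left as a string here and would still need a date filter or an explicit mapping:

grok {
  # :int makes number_1 an integer instead of a string
  match => { "message" => '%{WORD:word_1} %{TIME:time_1} %{DATE:date_1} %{NUMBER:number_1:int}' }
  overwrite => "message"
}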
Another option is to use the mutate filter to convert the type of existing fields:
mutate {
  convert => ["name-of-field", "integer"]
}
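Applied to the field from the question, that could look like the following, using the same array-style convert syntax as above:

mutate {
  # converts the number_1 field of the event from string to integer
  convert => ["number_1", "integer"]
}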
Related:
Data type conversion using logstash grok
Elasticsearch converting a string to number.
You can try to convert the fields with the ruby plugin.
In this example we combine time_1 and date_1 and convert them to a date:
input {
  stdin{}
}
filter {
  grok {
    match => [ "message" , "%{WORD:word_1} %{TIME:time_1} %{DATE:date_1} %{NUMBER:number_1}"]
    overwrite => "message"
  }
  ruby {
    code => "
      datetime = event['time_1'] + ' ' + event['date_1']
      event['datetime'] = Time.strptime(datetime,'%H:%M:%S %d-%m-%Y')
      event['number_1'] = event['number_1'].to_i
    "
  }
}
output {
  stdout { codec => rubydebug }
}
If you have another type that needs converting, you can try to find a Ruby API that converts it. Hope this can help you.
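One caveat, as an aside not from the original answer: on Logstash 5.0 and later the ruby filter accesses events through the get/set API rather than hash-style indexing. A minimal sketch of the same conversion under that API:

ruby {
  code => "
    # read the two string fields and combine them
    datetime = event.get('time_1') + ' ' + event.get('date_1')
    # parse into a Time object and store it on the event
    event.set('datetime', Time.strptime(datetime, '%H:%M:%S %d-%m-%Y'))
    # convert the numeric string to an integer in place
    event.set('number_1', event.get('number_1').to_i)
  "
}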
