Creating/updating array of objects in Elasticsearch Logstash output

I am facing an issue using the Elasticsearch output with Logstash. Here is my sample event:
{
  "guid": "someguid",
  "nestedObject": {
    "field1": "val1",
    "field2": "val2"
  }
}
I expect the document with this id to already be present in Elasticsearch when this update happens.
Here is what I want to have in my Elasticsearch document after 2 upserts:
{
  "oldField": "Some old field from original document before upserts.",
  "nestedObjects": [
    {
      "field1": "val1",
      "field2": "val2"
    },
    {
      "field3": "val3",
      "field4": "val4"
    }
  ]
}
Here is my current Elasticsearch output setting:
elasticsearch {
  index => "elastictest"
  action => "update"
  document_type => "summary"
  document_id => "%{guid}"
  doc_as_upsert => true
  script_lang => "groovy"
  script_type => "inline"
  retry_on_conflict => 3
  script => "
    if (ctx._source.nestedObjects) {
      ctx._source.nestedObjects += event.nestedObject
    } else {
      ctx._source.nestedObjects = [event.nestedObject]
    }
  "
}
Here is the error I am getting:
response=>{"update"=>{"_index"=>"elastictest", "_type"=>"summary",
"_id"=>"64648dd3-c1e9-45fd-a00b-5a4332c91ee9", "status"=>400,
"error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [event.nestedObject]",
"caused_by"=>{"type"=>"illegal_argument_exception",
"reason"=>"unknown property [field1]"}}}}

The issue turned out to be the internally generated mapping in Elasticsearch: other documents with the same document_type had a conflicting type on the field nestedObject, which caused Elasticsearch to throw the mapper_parsing_exception. Fixing the mapping conflict resolved the issue.
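Since the error comes from the index mapping rather than from the Logstash script itself, the quickest way to confirm a conflict like this is to look at what Elasticsearch has already inferred for the field. A minimal check (the host is an assumption; the question does not show the hosts setting):
curl -XGET 'http://localhost:9200/elastictest/_mapping?pretty'
If nestedObjects (or a field named like event.nestedObject) is already mapped with an incompatible type from earlier documents of the same document_type, new events will keep being rejected with this mapper_parsing_exception until the mapping conflict is corrected or the data is reindexed.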

Related

Delete data or document from elastic search using logstash

I am trying to delete Elasticsearch data or documents using a Logstash configuration, but the delete does not seem to be working.
I am using Logstash version 5.6.8.
Below is the Logstash configuration file:
```
input {
  jdbc {
    # db configuration
    '''
    statement => "select * from table"
  }
}
output {
  elasticsearch {
    action => "delete"
    hosts => "localhost"
    index => "myindex"
    document_type => "doctype"
    document_id => "%{id}"
  }
  stdout { codec => json_lines }
}
```
But the above configuration deletes the ids that are still present in my db table, not the ids that are no longer present.
When I sync from the db to Elasticsearch using Logstash, I expect that rows deleted in the db are also synced, so the two stay consistent.
I also tried the configuration below, but I get an error:
```
input {
  jdbc {
    # db configuration
    '''
    statement => "select * from table"
  }
}
output {
  elasticsearch {
    action => "delete"
    hosts => "localhost"
    index => "myindex"
    document_type => "doctype"
  }
  stdout { codec => json_lines }
}
```
Error in logstash console:
"current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/interval.rb:89:in sleep'"}]}}
[2019-12-27T16:30:16,087][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>9, "stalling_thread_info"=>{"other"=>[{"thread_id"=>22, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/interval.rb:89:in `sleep'"}]}}
[2019-12-27T16:30:18,623][ERROR][logstash.outputs.elasticsearch] Encountered a retryable error. Will Retry with exponential backoff {:code=>400, :url=>"http://localhost:9200/_bulk"}
[2019-12-27T16:30:21,086][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>9, "stalling_thread_info"=>{"other"=>[{"thread_id"=>22, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/interval.rb:89:in `sleep'"}]}}
Can someone tell me how to delete documents and sync the db data, or how to handle deleted records in Elasticsearch?
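The thread does not include an accepted answer, but a pattern that is often used for this (not from the original post, so treat it as an assumption) is to expose a soft-delete marker in the SQL result and route events to different actions based on it; note also that the delete action needs a document_id, which is most likely why the second configuration fails with a 400 on the _bulk request. A rough sketch, assuming a hypothetical deleted column in the table:
input {
  jdbc {
    # db configuration as before; the query is assumed to also return a
    # soft-delete marker, e.g. "select id, deleted, ... from table"
    statement => "select * from table"
  }
}
output {
  # Adjust the comparison to however the column comes back (true, "t", 1, ...).
  if [deleted] == 1 {
    elasticsearch {
      action => "delete"
      hosts => "localhost"
      index => "myindex"
      document_type => "doctype"
      document_id => "%{id}"
    }
  } else {
    elasticsearch {
      action => "index"
      hosts => "localhost"
      index => "myindex"
      document_type => "doctype"
      document_id => "%{id}"
    }
  }
}
Rows that are physically deleted from the table never appear in the query result at all, so Logstash has no event to act on for them; with hard deletes you would need a separate cleanup step (for example a delete-by-query against ids that are no longer in the table) rather than the jdbc input alone.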

Rewriting _version metadata from elasticsearch field using logstash

I'm using the ELK stack to import CSV files. Each time the CSV files are imported, the _version field of a document increases, which is as expected. However, because _version is a metadata field, it is not indexed, so it is not searchable and cannot be used in the dashboard.
I've created a second Logstash configuration where both the input and the output are Elasticsearch.
Filter configuration:
filter {
  mutate {
    add_field => {"Version" => "{[@metadata][_version]}"}
  }
}
Input configuration:
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test_csv"
    query => '{"query":{"match_all" : {}}}'
    size => 1000
    scroll => "1s"
    docinfo => true
    docinfo_fields => ["_index", "_type", "_id", "_version"]
    schedule => "/1 * * * *"
  }
}
I cannot get the value from the _version field. The output in Kibana looks like:
Version {[@metadata][_version]}
If I replace the _version field in the filter with _id or _index, I do get information back.
Any ideas on how to get the value out of the _version field? Any thoughts on the matter are highly appreciated.
Chloe
For version 6.4.2, the following works for me:
filter {
  mutate {
    add_field => {"Version" => "%{[@version]}"}
  }
}
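If the goal really is the Elasticsearch _version of each source document (rather than Logstash's own @version field), one thing worth trying is asking the search itself to return versions, since _version is only included on hits when the request sets "version": true. This is untested and an assumption about how the elasticsearch input passes hit metadata through docinfo_fields, not something confirmed in this thread:
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test_csv"
    # "version": true asks Elasticsearch to include _version on every hit
    query => '{ "version": true, "query": { "match_all": {} } }'
    docinfo => true
    docinfo_fields => ["_index", "_type", "_id", "_version"]
  }
}
filter {
  mutate {
    add_field => { "Version" => "%{[@metadata][_version]}" }
  }
}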

Logstash Update a document in elasticsearch

I am trying to update a specific field in Elasticsearch through Logstash. Is it possible to update only a set of fields through Logstash?
Please find the code below:
input {
  file {
    path => "/**/**/logstash/bin/*.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    type => "multi"
  }
}
filter {
  csv {
    separator => "|"
    columns => ["GEOREFID", "COUNTRYNAME", "G_COUNTRY", "G_UPDATE", "G_DELETE", "D_COUNTRY", "D_UPDATE", "D_DELETE"]
  }
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-data-monitor"
    query => "GEOREFID:%{GEOREFID}"
    fields => [["JSON_COUNTRY", "G_COUNTRY"],
               ["XML_COUNTRY", "D_COUNTRY"]]
  }
  if [G_COUNTRY] {
    mutate {
      update => { "D_COUNTRY" => "%{D_COUNTRY}" }
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-data-monitor"
    document_id => "%{GEOREFID}"
  }
}
We are using the above configuration; with it, fields with a null value get removed instead of the null-value update being skipped.
The data comes from two different sources: one is an XML file and the other is a JSON file.
XML log format: GEO-1|CD|23|John|892|Canada|31-01-2017|QC|-|-|-|-|-
JSON log format: GEO-1|AS|33|-|-|-|-|-|Mike|123|US|31-01-2017|QC
When the first log is added, a new document is created in the index. When the second log file is read, the existing document should be updated. The update should happen only in the first 5 fields if the log file is XML, and only in the last 5 fields if the log file is JSON. Please suggest how to do this in Logstash.
I tried the code above. Can anyone help with how to fix this?
For the Elasticsearch output to do any action other than index, you need to tell it to do something else.
elasticsearch {
  hosts => ["localhost:9200"]
  index => "logstash-data-monitor"
  action => "update"
  document_id => "%{GEOREFID}"
}
This should probably be wrapped in a conditional to ensure you're only updating records that need updating. There is another option, though: doc_as_upsert.
elasticsearch {
  hosts => ["localhost:9200"]
  index => "logstash-data-monitor"
  action => "update"
  doc_as_upsert => true
  document_id => "%{GEOREFID}"
}
This tells the plugin to insert the document if it is new, and update it if it is not.
However, you're attempting to use two inputs to define a document. This makes things complicated. Also, you're not providing both inputs, so I'll improvise. To provide different output behavior, you will need to define two outputs.
input {
  file {
    path => "/var/log/xmlhome.log"
    [other details]
  }
  file {
    path => "/var/log/jsonhome.log"
    [other details]
  }
}
filter { [some stuff] }
output {
  if [path] == '/var/log/xmlhome.log' {
    elasticsearch {
      [XML file case]
    }
  } else if [path] == '/var/log/jsonhome.log' {
    elasticsearch {
      [JSON file case]
      action => "update"
    }
  }
}
Setting it up like this will allow you to change the Elasticsearch behavior based on where the event originated.
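To make this concrete for the partial-update case in the question, here is a minimal, untested sketch (the cleanup step is my assumption, not part of the answer above): since an update with doc_as_upsert only touches the fields present in the event, dropping the "-" placeholder columns before output keeps each source from overwriting the other source's fields.
filter {
  csv {
    separator => "|"
    columns => ["GEOREFID", "COUNTRYNAME", "G_COUNTRY", "G_UPDATE", "G_DELETE", "D_COUNTRY", "D_UPDATE", "D_DELETE"]
  }
  # Drop the "-" placeholder columns so a partial update from one source
  # does not overwrite fields that belong to the other source.
  ruby {
    code => "event.to_hash.each { |k, v| event.remove(k) if v == '-' }"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-data-monitor"
    action => "update"
    doc_as_upsert => true
    document_id => "%{GEOREFID}"
  }
}
With the placeholders removed, both file inputs can share this one output; the conditional outputs shown above are still the way to go if the XML and JSON paths need genuinely different Elasticsearch settings.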

Can't access Elasticsearch index name metadata in Logstash filter

I want to add the Elasticsearch index name as a field in the event when processing in Logstash. This is supposed to be pretty straightforward, but the index name does not get printed out. Here is the complete Logstash config:
input {
  elasticsearch {
    hosts => "elasticsearch.example.com"
    index => "*-logs"
  }
}
filter {
  mutate {
    add_field => {
      "log_source" => "%{[@metadata][_index]}"
    }
  }
}
output {
  elasticsearch {
    index => "logstash-%{+YYYY.MM}"
  }
}
This results in log_source being set to the literal %{[@metadata][_index]} and not the actual name of the index. I have tried this with _id and without the underscores, but it always just outputs the reference and not the value.
Using just %{[@metadata]} crashes Logstash with an error that it's trying to access the list incorrectly, so [@metadata] is being set, but it seems like index (or any other value) is missing.
Does anyone have another way of assigning the index name to the event?
I am using 5.0.1 of both Logstash and Elasticsearch.
You're almost there; you're simply missing the docinfo setting, which is false by default:
input {
  elasticsearch {
    hosts => "elasticsearch.example.com"
    index => "*-logs"
    docinfo => true
  }
}
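Putting the answer together with the filter from the question, a minimal end-to-end config looks like this (same hosts and index values as in the question):
input {
  elasticsearch {
    hosts => "elasticsearch.example.com"
    index => "*-logs"
    docinfo => true
  }
}
filter {
  mutate {
    add_field => {
      "log_source" => "%{[@metadata][_index]}"
    }
  }
}
output {
  elasticsearch {
    index => "logstash-%{+YYYY.MM}"
  }
}
With docinfo enabled, the plugin stores the source _index, _type and _id under [@metadata], so the sprintf reference in the mutate filter now resolves to the real index name.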

Logstash -> Elasticsearch: update document @timestamp if newer, discard if older

Using the elasticsearch output in Logstash, how can I update only the @timestamp for a log message if it is newer?
I don't want to reindex the whole document, nor have the same log message indexed twice.
Also, if the @timestamp is older, it must not update/replace the current version.
Currently, I'm doing this:
filter {
  if ("cloned" in [tags]) {
    fingerprint {
      add_tag => [ "lastlogin" ]
      key => "lastlogin"
      method => "SHA1"
    }
  }
}
output {
  if ("cloned" in [tags]) {
    elasticsearch {
      action => "update"
      doc_as_upsert => true
      document_id => "%{fingerprint}"
      index => "lastlogin-%{+YYYY.MM}"
      sniffing => true
      template_overwrite => true
    }
  }
}
It is similar to How to deduplicate documents while indexing into elasticsearch from logstash, but I do not want to always update the message field; only if the @timestamp field is more recent.
You can't decide at the Logstash level whether a document needs to be updated or nothing should be done; this needs to be decided at the Elasticsearch level, which means you need to experiment and test with the _update API.
I suggest looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts. Meaning: if the document exists, the script is executed (where you can check the @timestamp if you want); otherwise, the content of upsert is considered as a new document.
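As a concrete illustration of that upsert approach, here is a rough, untested sketch of how the comparison could be pushed into a script from the Logstash output itself. It assumes a recent Elasticsearch with Painless, that the plugin's scripted_upsert option is available, and that @timestamp reaches the script as an ISO8601 string; the event is exposed to the script as params.event (the plugin's default script_var_name).
output {
  if ("cloned" in [tags]) {
    elasticsearch {
      index => "lastlogin-%{+YYYY.MM}"
      document_id => "%{fingerprint}"
      action => "update"
      scripted_upsert => true
      script_lang => "painless"
      script_type => "inline"
      script => "
        if (ctx._source['@timestamp'] == null) {
          // first time this fingerprint is seen: store the whole event
          ctx._source.putAll(params.event);
        } else if (Instant.parse(params.event['@timestamp']).isAfter(Instant.parse(ctx._source['@timestamp']))) {
          // event is newer: only bump the timestamp
          ctx._source['@timestamp'] = params.event['@timestamp'];
        } else {
          // event is older or the same: leave the stored document untouched
          ctx.op = 'none';
        }
      "
    }
  }
}
Whether 'none' or 'noop' is the accepted no-op value for ctx.op depends on the Elasticsearch version, so check the _update documentation for the version in use.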
