Rewriting _version metadata from elasticsearch field using logstash

I'm using the ELK Stack to import CSV files. Each time the CSV files are imported, the "_version" field of a document increases, which is expected. However, because _version is a metadata field, it is not indexed, so it is not searchable and cannot be used in the dashboard.
I've created a second Logstash configuration where both the input and the output are Elasticsearch.
Filter configuration:
filter {
mutate {
add_field => {"Version" => "{[#metadata][_version]}"}
}
}
Input configuration:
input {
elasticsearch {
hosts => ["localhost:9200"]
index => "test_csv"
query => '{"query":{"match_all" : {}}}'
size => 1000
scroll => "1s"
docinfo => true
docinfo_fields => ["_index", "_type", "_id", "_version"]
schedule => "/1 * * * *"
}
}
I cannot get the value of the _version field. The output in Kibana looks like:
Version {[@metadata][_version]}
If I replace the _version field in the filter with _id or _index I get information back.
Any ideas on how to get the value of the _version field? Any thoughts on the matter are highly appreciated.
Chloe

For version 6.4.2, the following works for me:
filter {
mutate {
add_field => {"Version" => "%{[#version]}"}
}
}
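As a side note, when docinfo => true is set on the elasticsearch input (as in the question above), the document metadata ends up under [@metadata] by default, and a sprintf reference needs a leading %. Elasticsearch also only returns _version with search hits when the query asks for it. A minimal, untested sketch along those lines, reusing the hosts and index from the question:
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test_csv"
    # ask Elasticsearch to return _version with every hit
    query => '{"version": true, "query": {"match_all": {}}}'
    docinfo => true
    docinfo_fields => ["_index", "_type", "_id", "_version"]
  }
}
filter {
  mutate {
    # note the leading % before the curly brace
    add_field => {"Version" => "%{[@metadata][_version]}"}
  }
}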

Related

Logstash Elasticsearch plugin. Compare results from two sources

I have two deployed Elasticsearch clusters. The data should supposedly be the same in both clusters. My main aim is to compare the _source field of each Elasticsearch document between the source and target ES clusters.
I created a Logstash config in which the Elasticsearch input plugin runs over each document in the source cluster; next, an elasticsearch filter looks up the target Elasticsearch cluster, queries it for the document by the _id taken from the source cluster, and matches the results of the _source field for both documents.
Could you please help me implement such a config?
input {
elasticsearch {
hosts => ["source_cluster:9200"]
ssl => true
user => "user"
password => "password"
index => "my_index_pattern"
}
}
filter {
mutate {
remove_field => ["#version", "#timestamp"]
}
elasticsearch {
hosts => ["target_custer:9200"]
ssl => true
user => "user"
password => "password"
query => ???????
match _source field ????
}
}
output {
stdout { codec => rubydebug }
}
Maybe print some results of the comparison...
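No answer is recorded here, but as a rough starting point, the lookup could be done with the elasticsearch filter plugin and the comparison with a ruby filter. A minimal, untested sketch, assuming docinfo => true on the input so the source document's _id and _index are available under [@metadata]; field_to_compare is a placeholder for a real field name, since this sketch compares named fields rather than the whole _source:
input {
  elasticsearch {
    hosts => ["source_cluster:9200"]
    ssl => true
    user => "user"
    password => "password"
    index => "my_index_pattern"
    # expose _index and _id of each source document under [@metadata]
    docinfo => true
  }
}
filter {
  # look up the same document in the target cluster by its _id
  elasticsearch {
    hosts => ["target_cluster:9200"]
    ssl => true
    user => "user"
    password => "password"
    index => "%{[@metadata][_index]}"
    query => "_id:%{[@metadata][_id]}"
    # copy the looked-up value into the event under a different name
    fields => { "field_to_compare" => "target_field_to_compare" }
    tag_on_failure => ["target_lookup_failed"]
  }
  # compare the two values and record the result
  ruby {
    code => "event.set('fields_match', event.get('field_to_compare') == event.get('target_field_to_compare'))"
  }
}
output {
  # only print documents that differ between the clusters
  if [fields_match] != true {
    stdout { codec => rubydebug }
  }
}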

Logstash to Elasticsearch adding new data in the fields instead of overwriting the existing data?

My pipeline is like this: CouchDB -> Logstash -> Elasticsearch. Every time I update a field value in CouchDB, the data in Elasticsearch is overwritten. My requirement is that when data in a field is updated in CouchDB, a new document should be created in Elasticsearch instead of overwriting the existing one.
My current logstash.conf is like this:
input {
couchdb_changes {
host => "<ip>"
port => <port>
db => "test_database"
keep_id => false
keep_revision => true
initial_sequence => 0
always_reconnect => true
#sequence_path => "/usr/share/logstash/config/seqfile"
}
}
output {
if([doc][doc_type] == "HR") {
elasticsearch {
hosts => ["http://elasticsearch:9200"]
index => "hrindex_new_1"
document_id => "%{[doc][_id]}"
user => elastic
password => changeme
}
}
if([doc][doc_type] == "SoftwareEngg") {
elasticsearch {
hosts => ["http://elasticsearch:9200"]
index => "softwareenggindex_new"
document_id => "%{[doc][_id]}"
user => elastic
password => changeme
}
}
}
How to do this?
You are using the document_id option in your elasticsearch output. This option tells Elasticsearch to index the document using that value as the document id, which must be unique.
document_id => "%{[doc][_id]}"
So, if the field [doc][_id] in your source document has, for example, the value 1000, the _id field in Elasticsearch will have the same value.
When you change something in the source document whose [doc][_id] equals 1000, it will replace the document with _id 1000 in Elasticsearch, because the _id is unique.
To achieve what you want, you need to remove the document_id option from your outputs; that way Elasticsearch will generate a unique value for the _id field of each document.
elasticsearch {
hosts => ["http://elasticsearch:9200"]
index => "softwareenggindex_new"
user => elastic
password => changeme
}
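If you still want a deterministic id per change (rather than a random one), another option would be to build the document_id from the CouchDB id plus something that changes with each update. A hypothetical sketch, assuming the revision kept by keep_revision => true is available on the event as [doc][_rev]:
elasticsearch {
  hosts => ["http://elasticsearch:9200"]
  index => "softwareenggindex_new"
  # one Elasticsearch document per CouchDB revision instead of one per CouchDB id
  document_id => "%{[doc][_id]}-%{[doc][_rev]}"
  user => elastic
  password => changeme
}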

Can't access Elasticsearch index name metadata in Logstash filter

I want to add the Elasticsearch index name as a field in the event when processing in Logstash. This is supposed to be pretty straightforward, but the index name does not get printed out. Here is the complete Logstash config.
input {
elasticsearch {
hosts => "elasticsearch.example.com"
index => "*-logs"
}
}
filter {
mutate {
add_field => {
"log_source" => "%{[#metadata][_index]}"
}
}
}
output {
elasticsearch {
index => "logstash-%{+YYYY.MM}"
}
}
This results in log_source being set to the literal string %{[@metadata][_index]} and not the actual name of the index. I have tried this with _id and without the underscores, but it always just outputs the reference and not the value.
Doing just %{[@metadata]} crashes Logstash with an error that it is accessing the list incorrectly, so [@metadata] is being set, but it seems like the index and other values are missing.
Does anyone have a another way of assigning the index name to the event?
I am using 5.0.1 of both Logstash and Elasticsearch.
You're almost there, you're simply missing the docinfo setting, which is false by default:
input {
elasticsearch {
hosts => "elasticsearch.example.com"
index => "*-logs"
docinfo => true
}
}

Creating/updating array of objects in elasticsearch logstash output

I am facing an issue using the Elasticsearch output with Logstash. Here is my sample event:
{
"guid":"someguid",
"nestedObject":{
"field1":"val1",
"field2":"val2"
}
}
I expect the document with this id to already be present in Elasticsearch when this update happens.
Here is what I want to have in my Elasticsearch document after 2 upserts:
{
"oldField":"Some old field from original document before upserts."
"nestedObjects":[{
"field1":"val1",
"field2":"val2"
},
{
"field3":"val3",
"field4":"val4"
}]
}
Here is my current elastic search output setting:
elasticsearch {
index => "elastictest"
action => "update"
document_type => "summary"
document_id => "%{guid}"
doc_as_upsert => true
script_lang => "groovy"
script_type => "inline"
retry_on_conflict => 3
script => "
if (ctx._source.nestedObjects) {
ctx._source.nestedObjects += event.nestedObject
} else {
ctx._source.nestedObjects = [event.nestedObject]
}
"
}
Here is the error I am getting:
response=>{"update"=>{"_index"=>"elastictest", "_type"=>"summary",
"_id"=>"64648dd3-c1e9-45fd-a00b-5a4332c91ee9", "status"=>400,
"error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [event.nestedObject]",
"caused_by"=>{"type"=>"illegal_argument_exception",
"reason"=>"unknown property [field1]"}}}}
The issue turned out to be a mapping that Elasticsearch had generated internally from other documents of the same document_type, which had a conflicting type for the field nestedObject. This caused Elasticsearch to throw a mapper parsing exception. Fixing the mapping conflict resolved the issue.
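For anyone hitting the same mapper_parsing_exception, one way to avoid such conflicts is to declare the mapping for the field explicitly instead of relying on dynamic mapping from whatever document arrives first. A rough, hypothetical sketch of a mapping for the index and type from the question (choosing the nested type here is an assumption):
{
  "mappings": {
    "summary": {
      "properties": {
        "nestedObjects": {
          "type": "nested"
        }
      }
    }
  }
}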

Logstash -> Elasticsearch : update document #timestamp if newer, discard if older

Using the elasticsearch output in Logstash, how can I update only the @timestamp of a log message if it is newer?
I don't want to reindex the whole document, nor have the same log message indexed twice.
Also, if the @timestamp is older, it must not update/replace the current version.
Currently, I'm doing this:
filter {
if ("cloned" in [tags]) {
fingerprint {
add_tag => [ "lastlogin" ]
key => "lastlogin"
method => "SHA1"
}
}
}
output {
if ("cloned" in [tags]) {
elasticsearch {
action => "update"
doc_as_upsert => true
document_id => "%{fingerprint}"
index => "lastlogin-%{+YYYY.MM}"
sniffing => true
template_overwrite => true
}
}
}
It is similar to How to deduplicate documents while indexing into elasticsearch from logstash, but I do not want to always update the message field; only if the @timestamp field is more recent.
You can't decide at the Logstash level whether a document should be updated or left alone; this has to be decided at the Elasticsearch level, which means you need to experiment and test with the _update API.
I suggest looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts. If the document exists, the script is executed (where you can, if you want, check the @timestamp); otherwise the content of upsert is indexed as a new document.
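Building on that, here is a rough, untested sketch of what the output could look like with a scripted update. It assumes a Painless-capable Elasticsearch, that the Logstash elasticsearch output exposes the event to the script as params.event (the default script_var_name is event), and that @timestamp is serialized as an ISO 8601 UTC string, so lexicographic comparison matches chronological order:
output {
  if ("cloned" in [tags]) {
    elasticsearch {
      action          => "update"
      document_id     => "%{fingerprint}"
      index           => "lastlogin-%{+YYYY.MM}"
      scripted_upsert => true
      script_lang     => "painless"
      script_type     => "inline"
      script => "
        def incoming = params.event.get('@timestamp');
        def current  = ctx._source.get('@timestamp');
        if (current == null) {
          // first time we see this fingerprint: take the whole event
          ctx._source.putAll(params.event);
        } else if (incoming.compareTo(current) > 0) {
          // only move @timestamp forward, leave the rest of the document alone
          ctx._source.put('@timestamp', incoming);
        } else {
          // older or equal timestamp: do nothing
          ctx.op = 'none';
        }
      "
    }
  }
}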
