Elastic Document's @version not incrementing when updating via Logstash - elasticsearch

I want to load issue data from a JIRA instance into my Elastic Stack on a regular basis. I don't want to create a new Elasticsearch document every time I pull the data from the JIRA API, but instead update the existing document, meaning there should only be one document per JIRA issue. When updating, I would expect the @version field to increment automatically once the document_id option of the elasticsearch output plugin is set.
Currently working setup
Elastic Stack: Version 7.4.0 running on Ubuntu in Docker containers
Logstash Input stage: get the JIRA issue data via http_poller input plugin
Logstash Filter stage: use the split filter plugin to modify the JSON data as needed
Logstash Output stage: pipe the data to Elasticsearch and make it visible in Kibana
Where I am struggling
The data is correctly registered in Elasticsearch and shown in Kibana. As expected, there is one document per issue. However, the document is being overwritten while @version stays at value 1. I assumed that using action => "update", doc_as_upsert => true and document_id => "%{[@metadata][id]}" would be enough to make Elasticsearch realize that it needs to increment the version of the document.
I am also wondering in general whether this is the correct approach to make the JIRA issue data searchable over time. For example, will I be able to find the state of a JIRA ticket at a past @version? Or does the @version value only tell me how often the document was updated, without giving me access to the individual versions' contents?
logstash.conf (certain data was removed and replaced with <> tags)
input {
  http_poller {
    urls => {
      data => {
        method => get
        url => "https://<myjira>.com/jira/rest/api/2/search?<searchJQL>"
        headers => {
          Authorization => "Basic <censored>"
          Accept => "application/json"
          "Content-Type" => "application/json"
        }
      }
    }
    request_timeout => 60
    schedule => { every => "10s" } # low value for debugging
    codec => "json"
  }
}
filter {
  split {
    field => "issues"
    add_field => {
      "key" => "%{[issues][key]}"
      "Summary" => "%{[issues][fields][summary]}"
      "[@metadata][id]" => "%{[issues][id]}" # unique ID of a JIRA issue, the JIRA issue key could also be used
    }
    remove_field => [ "startAt", "total", "maxResults", "expand", "issues" ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    index => "gsep"
    user => "<usr>"
    password => "<pw>"
    hosts => ["elasticsearch:9200"]
    action => "update"
    document_id => "%{[@metadata][id]}"
    doc_as_upsert => true
  }
}
Screenshots from Document Data in Kibana
I had to censor some information, but the missing parts should not be relevant. In the screenshot you can see that the same _id is correctly set, but @version stays at 1. In Elasticsearch/Kibana there exists exactly one document for the respective issue/_id.

The @version field comes from Logstash and is just an indicator for the version of your log message format. There is no auto-increment functionality behind it.
Please note that there is also a _version field in Elasticsearch documents.
_version is an automatically incremented value used for optimistic locking in concurrency scenarios.
Just to be clear: Elasticsearch can't give you what you are expecting in terms of versioning out of the box. You can't access a different version of the same document by relying on _version. There are design patterns for how to implement such a document history in Elasticsearch, but that's a broad question with many answers and out of scope for this question.
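Purely as an illustration of one such pattern (this is an assumption on my part, not something your current setup does, and the gsep-history index name is hypothetical): you could keep the upsert for a "current state" index and additionally write every poll as its own document into a time-based index, so past states of each issue remain searchable:
output {
  # current state: exactly one document per issue, overwritten on every poll
  elasticsearch {
    hosts         => ["elasticsearch:9200"]
    index         => "gsep"
    action        => "update"
    document_id   => "%{[@metadata][id]}"
    doc_as_upsert => true
  }
  # history: one new document per poll, kept in a daily index (hypothetical name)
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "gsep-history-%{+YYYY.MM.dd}"
  }
}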

Related

Update existing document of Elasticsearch and insert current record through logstash

I am trying to insert a record into Elasticsearch and also update a field of an existing document whose _id I'll be getting from the current record. After searching online, I found that we can use the _update_by_query API with the http plugin in Logstash. Below is the configuration.
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index_*"
    document_id => "%{id_field}"
  }
  http {
    url => "http://localhost:9200/my_index_*/_update_by_query"
    http_method => "post"
    content_type => "application/json"
    format => "message"
    message => '{"query":{"match":{"_id":"%{previous_record_id}"}},"script":{"source":"ctx._source.field_to_be_updated=xyz","lang":"painless"}}'
  }
}
Elasticsearch has no password protection, so I haven't added an authorization header.
But when I start Logstash, the current record gets inserted, yet I always get the error below for the http plugin.
2022-05-05T11:31:51,916][ERROR][logstash.outputs.http ][logstash_txe] [HTTP Output Failure] Encountered non-2xx HTTP code 400 {:response_code=>400, :url=>"http://localhost:9200/my_index_*/_update_by_query", :event=>#<LogStash::Event:0x192606f8>}
That's not how you're supposed to do it; you can simply use the elasticsearch output for both use cases.
The first output indexes the new record, and the following one partially updates the other record whose id is previous_record_id. The event data can be accessed via params.event within the script:
elasticsearch {
  hosts => ["localhost:9200"]
  index => "my_index_xyz"
  document_id => "%{previous_record_id}"
  action => "update"
  script => "ctx._source.field_to_be_updated = params.event.xyz"
  script_lang => "painless"
  script_type => "inline"
}
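For context, a minimal sketch of the complete output section combining both elasticsearch outputs could look like this (index and field names are carried over from the question and are assumptions, not verified against your setup):
output {
  # index the current record as a new document
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "my_index_xyz"
    document_id => "%{id_field}"
  }
  # partially update the previously indexed record via a painless script
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "my_index_xyz"
    document_id => "%{previous_record_id}"
    action      => "update"
    script      => "ctx._source.field_to_be_updated = params.event.xyz"
    script_lang => "painless"
    script_type => "inline"
  }
}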

_id like variable logstash output email

I would like to use _id (metadata) as a variable in my output email,
but it doesn't work because it can't treat _id as a variable.
Does someone have an idea?
I built my output this way:
output {
  elasticsearch {
    hosts => [ "https://xxx:9200" ]
    ssl => true
    ssl_certificate_verification => false
    user => "admin"
    password => "admin"
    index => "apache"
  }
  stdout { codec => rubydebug }
  if [tags] {
    email {
      to => "xxx"
      address => "smtp.gmail.com"
      port => 587
      username => "xxx"
      password => "xxx"
      use_tls => true
      body => "something happened: %{message} http://xxx/5601/app/discover#/doc/82de0080-acd9-11eb-a4b8-614232a13000/indexname?id=%{id}"
    }
  }
}
I would proceed differently and leverage the Alerting & Actions feature in Kibana.
You can set an alert on a custom query (e.g. tags exist) and decide to send the alert via email.
UPDATE:
When using OpenDistro, you have access to their Alerting plugin that works in a similar way and that you can use to send your alerts.
I think you're trying to create an email containing a direct link to the document in question.
You can achieve this with three small changes to your existing configuration.
Generate an 'id' string during the filter stage of the pipeline
Use that string as document_id in the elasticsearch output
Use that string in your email output template
By default, Elasticsearch will generate a random _id which is not shared with Logstash. That's why we need to do all three of these steps.
Generate an ID string
The UUID and Fingerprint filter plugins can help here. I'll use UUID because it's simpler:
filter {
  uuid {
    target => "[@metadata][uuid]"
  }
}
This generates a random UUID, which should be adequate for your purposes. If you'd prefer to use a consistent hash (e.g. for deduplication), then use Fingerprint.
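If you go the Fingerprint route instead, a minimal sketch might look like the following (the source field "message" and the SHA1 method are assumptions; pick whatever uniquely identifies your events):
filter {
  fingerprint {
    source => "message"            # field(s) to hash; adjust as needed
    method => "SHA1"
    target => "[@metadata][uuid]"  # reuse the same metadata field as above
  }
}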
Set document_id to the ID string
Use the UUID as _id by adding document_id => "%{[@metadata][uuid]}" to your elasticsearch output.
elasticsearch {
  hosts => [ "https://xxx:9200" ]
  ssl => true
  ssl_certificate_verification => false
  user => "******"
  password => "******"
  index => "apache"
  document_id => "%{[@metadata][uuid]}"
}
More detail can be found in the Elasticsearch output plugin docs.
Include the ID string in email output body template
Your body line should be updated to include %{[@metadata][uuid]}.
body => "something happened: %{message} http://xxx/5601/app/discover#/doc/82de0080-acd9-11eb-a4b8-614232a13000/indexname?id=%{[@metadata][uuid]}"
Note regarding Kibana index pattern reference
I assume 82de0080-acd9-11eb-a4b8-614232a13000 is the object ID of the Kibana index pattern relevant to the ES indices here. For other index patterns, or for others attempting the same thing, the simplest way to determine the appropriate string is to navigate in Kibana to a single document and then replace the document ID in that URL with the ID variable above.
Alternatively, at the time of writing (May 2021) you can replace that string with the word mixed, like so:
http://xxx/5601/app/discover#/doc/mixed/indexname?id=%{[@metadata][uuid]}
This may break in future, and you'll still need to get indexname right.

Logstash elasticsearch output plugin - Populating api_key from metadata field does not work

I am using the elasticsearch output plugin of Logstash to post my events to Elasticsearch, with the api_key authentication method. It all works fine as long as the api_key parameter value is hardcoded. For example:
api_key => "xxxxxxxxxxxx:yyyyyyyyyyyyyyyy"
where the Xs represent the id and the Ys the api_key generated using the create API key security API.
But in my filter I am adding the value to be passed to the api_key parameter into a metadata field [@metadata][myapikey]. The idea is to use that in the output plugin as shown below:
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    cacert => 'path-to-ca.crt'
    index => "my-index-name"
    api_key => "%{[@metadata][myapikey]}"
    ssl => true
  }
}
As per my understanding, this should have worked just like providing the index from a metadata field, e.g. index => "%{[@metadata][some-index-name]}". I have used this for index names successfully before.
I am not sure why the same approach does not work for the api_key parameter. I have made sure, using the stdout plugin, that the metadata carries the right value, but I still see an invalid api_key value message when I run this.
Please help here.
Adding full pipeline config
input {
  generator {
    lines => [
      '{"timestamp" : "26/01/2021", "fruit-ID" : "t6789", "vegetable-ID" : "Veg1-1002", "Status" : "OK", "myapikey" : "3p4oIUr-Qxxxxxxx-rA"}'
    ]
    count => 1
    codec => "json"
  }
}
filter {
  mutate {
    add_field => { "[@metadata][myapikey]" => "xxxxxxxxxxx-%{myapikey}" }
    remove_field => ["myapikey"]
  }
}
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    cacert => 'path-to-ca.crt'
    index => "my-index-name"
    api_key => "%{[@metadata][myapikey]}"
    ssl => true
  }
}
I think the reason is that the api_key setting doesn't support the sprintf format.
Contrary to the index setting, which supports that format, api_key doesn't, so what happens is that Logstash sends the raw value %{[@metadata][myapikey]} (without resolving it) as the API key, and that obviously fails.
I think the main reason behind this design decision is that an API key, much like a password, is not supposed to be a field that travels in each document.
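If the goal is only to keep the key out of the pipeline file (rather than to vary it per event), one option worth trying is Logstash's environment variable substitution, which is resolved once when the pipeline is loaded; a minimal sketch, assuming the key has been exported as ES_API_KEY before starting Logstash:
output {
  elasticsearch {
    hosts   => ["https://localhost:9200"]
    cacert  => 'path-to-ca.crt'
    index   => "my-index-name"
    api_key => "${ES_API_KEY}"   # resolved from the environment at startup, not per event
    ssl     => true
  }
}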

Logstash -> Elasticsearch : update document @timestamp if newer, discard if older

Using the elasticsearch output in Logstash, how can I update only the @timestamp for a log message if it is newer?
I don't want to reindex the whole document, nor have the same log message indexed twice.
Also, if the @timestamp is older, it must not update/replace the current version.
Currently, I'm doing this:
filter {
  if ("cloned" in [tags]) {
    fingerprint {
      add_tag => [ "lastlogin" ]
      key => "lastlogin"
      method => "SHA1"
    }
  }
}
output {
  if ("cloned" in [tags]) {
    elasticsearch {
      action => "update"
      doc_as_upsert => true
      document_id => "%{fingerprint}"
      index => "lastlogin-%{+YYYY.MM}"
      sniffing => true
      template_overwrite => true
    }
  }
}
It is similar to How to deduplicate documents while indexing into elasticsearch from logstash, but I do not want to always update the message field; only if the @timestamp field is more recent.
You can't decide at the Logstash level whether a document needs to be updated or nothing should be done; this needs to be decided at the Elasticsearch level, which means you need to experiment and test with the _update API.
I suggest looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts. Meaning: if the document exists, the script is executed (where you can check, if you want, the @timestamp); otherwise the content of upsert is considered as a new document.
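As a rough sketch only (the availability of @timestamp inside params.event and the string comparison of ISO8601 timestamps are assumptions you would need to verify), the scripting options of the elasticsearch output could express that check roughly like this:
output {
  if ("cloned" in [tags]) {
    elasticsearch {
      action      => "update"
      document_id => "%{fingerprint}"
      index       => "lastlogin-%{+YYYY.MM}"
      script_lang => "painless"
      script_type => "inline"
      # only touch @timestamp when the incoming event is newer; ISO8601 strings
      # compare chronologically, and ctx.op = 'none' leaves older data untouched
      script      => "if (params.event.get('@timestamp').toString().compareTo(ctx._source['@timestamp'].toString()) > 0) { ctx._source['@timestamp'] = params.event.get('@timestamp') } else { ctx.op = 'none' }"
      # creating the document when it does not exist yet (upsert) still needs to be
      # handled, e.g. via the upsert / scripted_upsert options described in the docs
    }
  }
}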

Copy ElasticSearch-Index with Logstash

I have a ready-built Apache index on one machine that I would like to clone to another machine using Logstash. Fairly easy, I thought:
input {
  elasticsearch {
    host => "xxx.xxx.xxx.xxx"
    index => "logs"
  }
}
filter {
}
output {
  elasticsearch {
    cluster => "Loa"
    host => "127.0.0.1"
    protocol => http
    index => "logs"
    index_type => "apache_access"
  }
}
That pulls over the docs, but doesn't stop, as it uses the default query "*" (the original index has ~50,000 docs and I killed the former run when the new index was over 600,000 docs and rising).
Next I tried to make sure the docs would get updated instead of duplicated, but this commit hasn't made it yet, so I don't have a primary.
Then I remembered sincedb, but I don't seem to be able to use that in the query (or is that possible?).
Any advice? Maybe a completely different approach? Thanks a lot!
Assuming that the elasticsearch input creates a Logstash event with the document id (I assume it will be _id or something similar), try setting the elasticsearch output the following way:
output {
  elasticsearch {
    cluster => "Loa"
    host => "127.0.0.1"
    protocol => http
    index => "logs"
    index_type => "apache_access"
    document_id => "%{_id}"
  }
}
That way, even if the elasticsearch input, for whatever reason, continues to push the same documents indefinitely, Elasticsearch will merely update the existing documents instead of creating new documents with new ids.
Once you reach 50,000, you can stop.
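On more recent plugin versions, a sketch of the same idea (option names are assumptions and depend on your plugin versions) would enable docinfo on the elasticsearch input so that the original _id lands in [@metadata] and can be reused as the document_id:
input {
  elasticsearch {
    hosts   => ["xxx.xxx.xxx.xxx"]
    index   => "logs"
    docinfo => true          # copies _index, _type and _id into [@metadata]
  }
}
output {
  elasticsearch {
    hosts       => ["127.0.0.1"]
    index       => "logs"
    document_id => "%{[@metadata][_id]}"
  }
}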
