Using the elasticsearch output plugin, I have doc_as_upsert set to true for many update actions. I'm using the @metadata capability to store the fields which I don't want upserted into ES.
However, I'm having trouble accessing fields inside @metadata from within the Elasticsearch script. The script below makes sure the urls array stays below 1,001 entries and that the new URL being added is unique within the array:
output {
elasticsearch{
hosts => "***************"
user => "****"
index => "****"
password => "*********"
document_type => "document"
document_id => "%{[@metadata][domain]}"
action => "update"
script => 'if(ctx._source.urls.length < 1001){ boolean match = false; for (url in ctx._source.urls){if (url == params.event.get("[@metadata][url]")){match = true;}} if(match==false){ctx._source.urls.add(params.event.get("[@metadata][url]"));}}'
doc_as_upsert => true
}
}
Inside ES, the URL value simply gets appended as NULL. I viewed the url metadata field with rubydebug and it is definitely being set. I can't reference the metadata directly as %{[@metadata][domain]} because I run into script compilation errors for too many generated scripts.
Is there any way to access the [@metadata][url] field inside the script option of the elasticsearch output plugin?
Thank you!
You can't use @metadata fields in the script of output plugins; they are not sent along with the event (that's the whole point of metadata).
So you could add a mutate filter or a ruby filter to copy the metadata fields into regular event data:
ruby {
code => 'event.get("[@metadata][rabbitmq_headers]").each {|k,v| event.set(k, v)}'
}
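Applied to the original question, a mutate-based sketch along these lines could copy just the one metadata field into a regular event field that the update script can then read via params.event (the target field name url_to_add is an assumption, not from the original config):
filter {
  mutate {
    # make the metadata value part of the event so the elasticsearch
    # output script can read it through params.event
    add_field => { "url_to_add" => "%{[@metadata][url]}" }
  }
}
The script would then reference params.event.url_to_add instead of the metadata path.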
Related
I am trying to insert a record into Elasticsearch and also update a field of an existing document whose _id I get from the current record. After searching online, I found that we can use the _update_by_query API with the http plugin in Logstash. Below is the configuration:
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "my_index_*"
document_id => "%{id_field}"
}
http {
url => "http://localhost:9200/my_index_*/_update_by_query"
http_method => "post"
content_type => "application/json"
format => "message"
message => '{"query":{"match":{"_id":"%{previous_record_id}"}},"script":{"source":"ctx._source.field_to_be_updated=xyz","lang":"painless"}}'
}
}
Elasticsearch has no password protection, so I haven't added an authorization header.
But when I start Logstash, the current record gets inserted, yet I always get the error below from the http plugin:
[2022-05-05T11:31:51,916][ERROR][logstash.outputs.http ][logstash_txe] [HTTP Output Failure] Encountered non-2xx HTTP code 400 {:response_code=>400, :url=>"http://localhost:9200/my_index_*/_update_by_query", :event=>#<LogStash::Event:0x192606f8>}
That's not how you're supposed to do it; you can simply use the elasticsearch output for both use cases: one output for indexing the new record (as you already do) and a second one, shown below, for partially updating the other record whose id is previous_record_id. The event data can be accessed through params.event within the script:
elasticsearch {
hosts => ["localhost:9200"]
index => "my_index_xyz"
document_id => "%{previous_record_id}"
action => "update"
script => "ctx._source.field_to_be_updated = params.event.xyz"
script_lang => "painless"
script_type => "inline"
}
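For completeness, a minimal sketch of what the combined output section could look like under these assumptions (host, index name, id_field and the xyz field are carried over from the question and the snippet above):
output {
  # index the incoming record as a new document
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index_xyz"
    document_id => "%{id_field}"
  }
  # partially update the previously indexed document
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index_xyz"
    document_id => "%{previous_record_id}"
    action => "update"
    script => "ctx._source.field_to_be_updated = params.event.xyz"
    script_lang => "painless"
    script_type => "inline"
  }
}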
I would like to use the _id (metadata) as a variable in my email output,
but it doesn't work because _id is not treated as a variable.
Does someone have an idea?
I build my output this way:
elasticsearch {
hosts => [ "https://xxx:9200" ]
ssl => true
ssl_certificate_verification => false
user => "admin"
password => "admin"
index => "apache"
}
stdout { codec => rubydebug }
if [tags] {
email {
to => "xxx"
address => "smtp.gmail.com"
port => 587
username => "xxx"
password => "xxx"
use_tls => true
body => "something happened: %{message} http://xxx/5601/app/discover#/doc/82de0080-acd9-11eb-a4b8-614232a13000/indexname?id=%{id}"
}}}
I would proceed differently and leverage the Alerting & Actions feature in Kibana.
You can set an alert on a custom query (e.g. tags exist) and decide to send the alert via email.
UPDATE:
When using OpenDistro, you have access to their Alerting plugin that works in a similar way and that you can use to send your alerts.
I think you're trying to create an email containing a direct link to the document in question.
You can achieve this with three small changes to your existing configuration.
1. Generate an 'id' string during the filter stage of the pipeline
2. Use that string as document_id in the elasticsearch output
3. Use that string in your email output template
By default, Elasticsearch will generate a random _id which is not shared with Logstash. That's why we need to do all three of these steps.
Generate an ID string
The UUID and Fingerprint filter plugins can help here. I'll use UUID because it's simpler:
filter {
uuid {
target => "[@metadata][uuid]"
}
}
This generates a random UUID, which should be adequate for your purposes. If you'd prefer to use a consistent hash (e.g. for deduplication), then use Fingerprint.
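If you go the Fingerprint route instead, a sketch along these lines could produce a deterministic ID (the source field and HMAC key here are assumptions, not taken from the original configuration):
filter {
  fingerprint {
    source => ["message"]       # assumed field(s) to hash; adjust to your data
    method => "SHA256"
    key => "my-hmac-key"        # hypothetical key used for the HMAC
    target => "[@metadata][uuid]"
  }
}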
Set document_id to the ID string
Use the UUID as _id by adding document_id => "%{[@metadata][uuid]}" to your elasticsearch output.
elasticsearch {
hosts => [ "https://xxx:9200" ]
ssl => true
ssl_certificate_verification => false
user => "******"
password => "******"
index => "apache"
document_id => "%{[@metadata][uuid]}"
}
More detail can be found in the Elasticsearch output plugin docs.
Include the ID string in email output body template
Your body line should be updated to include %{[@metadata][uuid]}.
body => "something happened: %{message} http://xxx/5601/app/discover#/doc/82de0080-acd9-11eb-a4b8-614232a13000/indexname?id=%{[@metadata][uuid]}"
Note regarding Kibana index pattern reference
I assume 82de0080-acd9-11eb-a4b8-614232a13000 is the object ID of the Kibana index pattern relevant to the ES indices here. For other index patterns, or for others attempting the same thing, the simplest way to determine the appropriate string is to navigate in Kibana to a single document and then replace the document ID in the URL with the variable above.
Alternatively, at the time of writing (May 2021) you can replace that string with the word mixed, like so:
http://xxx/5601/app/discover#/doc/mixed/indexname?id=%{[@metadata][uuid]}
This may break in future, and you'll still need to get indexname right.
I am using the elasticsearch output plugin of Logstash to post my events to Elasticsearch. I am using the api_key authentication method. It all works fine as long as I have the api_key parameter value hardcoded. For example:
api_key => "xxxxxxxxxxxx:yyyyyyyyyyyyyyyy"
where the Xs represent the id and the Ys the api_key generated using the create API key security API.
But in my filter I am adding the value to be passed to the api_key parameter into a metadata field, [@metadata][myapikey]. The idea is to use that in the output plugin as shown below:
output {
elasticsearch {
hosts => ["https://localhost:9200"]
cacert => 'path-to-ca.crt'
index => "my-index-name"
api_key => "%{[@metadata][myapikey]}"
ssl => true
}
}
As I understand it, this should work the same way as providing the index from a metadata field, e.g. index => "%{[@metadata][some-index-name]}", which I have used successfully before.
I'm not sure why the same approach does not work for the api_key parameter. I have confirmed with the stdout plugin that the metadata carries the right value, but I still see an invalid api_key value message when I run this.
Please help.
Adding full pipeline config
input {
generator {
lines => [
'{"timestamp" : "26/01/2021", "fruit-ID" : "t6789", "vegetable-ID" : "Veg1-1002", "Status" : "OK", "myapikey" : "3p4oIUr-Qxxxxxxx-rA"}'
]
count => 1
codec => "json"
}
}
filter {
mutate {
add_field => { "[@metadata][myapikey]" => "xxxxxxxxxxx-%{myapikey}" }
remove_field => ["myapikey"]
}
}
output {
elasticsearch {
hosts => ["https://localhost:9200"]
cacert => 'path-to-ca.crt'
index => "my-index-name"
api_key => "%{[@metadata][myapikey]}"
ssl => true
}
}
I think the reason is that the api_key setting doesn't support the sprintf format.
In contrast to the index setting, which supports that format, api_key doesn't, so Logstash sends the raw value %{[@metadata][myapikey]} (without resolving it) as the API key, and that obviously fails.
I think the main reason behind this design decision is that an API key, much like a password, is not supposed to be a value that travels with each document.
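If the goal is simply to avoid hardcoding the key in the pipeline file, one alternative is Logstash's environment-variable (or keystore) substitution, which is resolved when the pipeline is loaded rather than per event. A minimal sketch, assuming the key is exported as ES_API_KEY:
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    cacert => 'path-to-ca.crt'
    index => "my-index-name"
    api_key => "${ES_API_KEY}"   # substituted from the environment or the Logstash keystore at load time
    ssl => true
  }
}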
I want to load the issue data from a JIRA instance into my Elastic Stack on a regular basis. I don't want to create a new Elasticsearch document every time I pull the data from the JIRA API, but instead update the existing document, meaning there should only be one document per JIRA issue. When updating, I would expect the @version field to increment automatically when the document_id field of the elasticsearch output plugin is set.
Currently working setup
Elastic Stack: Version 7.4.0 running on Ubuntu in Docker containers
Logstash Input stage: get the JIRA issue data via http_poller input plugin
Logstash Filter stage: use the split filter plugin to modify the JSON data as needed
Logstash Output stage: pipe the data to Elasticsearch and make it visible in Kibana
Where I am struggling
The data is correctly registered in Elastic and shown in Kibana. As expected, there is one document per issue. However, the document is being overwritten while @version stays at value 1. I assumed that using action => "update", doc_as_upsert => true and document_id => "%{[@metadata][id]}" would be enough to make Elasticsearch realize it needs to increment the version of the document.
I am also wondering in general whether this is the correct approach to make the JIRA issue data searchable over time. For example, will I be able to find the status quo of a JIRA ticket at a past @version? Or will the @version value only tell me how often the document was updated, without giving me the individual versions' values?
logstash.conf (certain data was removed and replaced with <> tags)
input {
http_poller {
urls => {
data => {
method => get
url => "https://<myjira>.com/jira/rest/api/2/search?<searchJQL>"
headers => {
Authorization => "Basic <censored>"
Accept => "application/json"
"Content-Type" => "application/json"
}
}
}
request_timeout => 60
schedule => { every => "10s" } # low value for debugging
codec => "json"
}
}
filter {
split {
field => "issues"
add_field => {
"key" => "%{[issues][key]}"
"Summary" => "%{[issues][fields][summary]}"
[#metadata]["id"] => "%{[issues][id]}" # unique ID of a JIRA issue, the JIRA issue key could also be used
}
remove_field => [ "startAt", "total", "maxResults", "expand", "issues"]
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
index => "gsep"
user => ["<usr>"]
password => ["<pw>"]
hosts => ["elasticsearch:9200"]
action => "update"
document_id => "%{[@metadata][id]}"
doc_as_upsert => true
}
}
Screenshots from Document Data in Kibana
I had to censor some information, but the missing parts should not be relevant. On the screenshot you can see that the _id is correctly set, but @version stays at 1. In Elasticsearch/Kibana there exists exactly one such document for the respective issue/_id.
The @version field comes from Logstash and is just an indicator of the version of your log message format. There is no auto-increment functionality.
Please note, there is also a _version field in elasticsearch documents.
_version is an automatically incremented value used for optimistic locking in a concurrency scenario.
Just to be clear, Elasticsearch can't give you what you are expecting in terms of versioning out of the box. You can't access a different version of the same document by relying on _version. There are design patterns for how to implement such a document history in Elasticsearch, but that's a broad question with many answers and out of scope here.
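Purely as an illustration of one such pattern, you could keep the upserted "current state" document and additionally append every poll to a separate history index with a timestamped document_id (the gsep-history index name is an assumption):
output {
  # current state, upserted on every poll
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "gsep"
    action => "update"
    doc_as_upsert => true
    document_id => "%{[@metadata][id]}"
  }
  # append-only history, one new document per poll
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "gsep-history"
    document_id => "%{[@metadata][id]}-%{+YYYYMMddHHmmss}"
  }
}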
Is there another way to tell Logstash to supply a value to an output variable without pulling it from a Logstash input? For example, in my case I'd like to create an Elasticsearch index based on a performance run ID (which I'd set from an external script) and then have Logstash send to that index. For now I was thinking of creating a tcp input just for receiving perf run info and then having a filter to match on the run id. That seems like a convoluted way to do it, though. For example:
input {
tcp {
type => "perfinfo"
port => 8888
}
}
filter {
  if [type] == "perfinfo" {
    # do some matching to extract the id
  }
}
output {
elasticsearch {
cluster => "mycluster"
manage_template => false
index => "%{id}-perftest"
}
}
I'm not sure if setting manage_template to false would actually be necessary. I've read that it is.
Update
Thanks Nirdesh for that. Using Ruby might be very handy.
While I was waiting I tried using a grok filter like so:
grok {
match => { "message" => "%{WORD:perftype}-%{POSINT:perfid}" }
}
Which produced this stdout during debugging:
{
"message" => "awperf-14",
"#version" => "1",
"#timestamp" => "2014-10-17T20:01:19.758Z",
"host" => "0:0:0:0:0:0:0:1:33361",
"type" => "perfinfo",
"perftype" => "awperf",
"perfid" => "14"
}
I then tried creating an index based on this, like so:
index => "%{perftype}-%{perfid}"
So when I passed 'awperf-14' to the input, I ended up creating these indexes
%{perftype}-%{perfid}
awperf-14
Which is not what I was expecting. Also, it's the %{perftype}-%{perfid} index that starts to be populated, not awperf-14, the one I actually wanted.
Yes.
You can add any number of your own fields, either for intermediate results or for permanent use, via a property called add_field. Almost all filters in Logstash support this property.
So, for your solution, you can use a ruby filter to find out the id dynamically and store it in a new field called id, which you can then use in the output.
For Example :
input {
tcp {
type => "perfinfo"
port => 8888
}
}
filter{
if [type] == "perfinfo" {
ruby {
  # determine the id dynamically inside the code option
  code => '# ... your processing here ...'
  add_field => { "id" => "Some value" }
}
}
}
output {
elasticsearch {
cluster => "mycluster"
manage_template => false
index => "%{id}-perftest"
}
}
I'm not sure I can do what I was trying to do via Logstash. To be clearer, I simply wanted to change the index based on the performance run ID I'm executing. There's nothing in the data that carries this information (I have to pull it from a DB). So instead of trying to have Logstash listen for a performance run ID, I scripted it externally: the script uses the Elasticsearch API to create a new index, does a string replace for the index in the Logstash config file, and then restarts Logstash, which normally happens between performance runs anyway. This approach was much easier and seems cleaner.