How to dynamically change destination index in continuous Elasticsearch transforms?

I am trying to extract some high-level metrics from the log data we store in Elasticsearch. To achieve this, I am running a number of continuous transforms to generate more meaningful high-level logs.
I have included a dest block in my transform definition JSON, as follows:
"dest": {
"index": "transform_index" + date
}
But the aforementioned expression is evaluated only once, at transform creation time, and is not re-evaluated in future sync cycles.
I am looking for a way to change the destination index on a monthly basis, and I think it is doable using an ingest pipeline; however, I am not sure how.
Any pointers are appreciated.

I've read through the documentation and found my answer. I managed to achieve what I needed using pipelines. I created a pipeline as follows:
PUT /_ingest/pipeline/add_timestamp_pipeline
{
  "processors" : [
    {
      // copy the timestamp field from the transform source
      "set" : {
        "field" : "@timestamp",
        "value" : "{{@timestamp}}"
      }
    },
    {
      // route documents to indices based on @timestamp rounded to the month
      "date_index_name" : {
        "field" : "@timestamp",
        "index_name_prefix" : "hourly-activity-index-",
        "date_rounding" : "M",
        "date_formats" : ["UNIX_MS"]
      }
    }
  ]
}
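Before wiring the pipeline into the transform, you can sanity-check the month rounding with the simulate API; the epoch-milliseconds value below is an arbitrary example, and the response shows the _index each document would be routed to:
POST /_ingest/pipeline/add_timestamp_pipeline/_simulate
{
  "docs" : [
    { "_source" : { "@timestamp" : 1704067200000 } }
  ]
}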
Then you use the created pipeline in your transform:
PUT /_transform/hourly_transform
{
  "dest" : {
    "index" : "hourly_activity_index",
    "pipeline" : "add_timestamp_pipeline"
  }
  // ... rest of the transform definition
}
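Note that creating the transform is not enough on its own; the continuous sync cycles only begin once you start it:
POST /_transform/hourly_transform/_start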

Related

Conditional indexing in metricbeat using Ingest node pipeline creates a datastream

I am trying to achieve conditional indexing for namespaces in Elastic using ingest node pipelines. I used the pipeline below, but when I add it in metricbeat.yml, the index that gets created is in the form of a data stream.
PUT _ingest/pipeline/sample-pipeline
{
  "processors": [
    {
      "set": {
        "field": "_index",
        "copy_from": "metricbeat-dev",
        "if": "ctx.kubernetes?.namespace==\"dev\"",
        "ignore_failure": true
      }
    }
  ]
}
The expected index name is metricbeat-dev, but the value I am getting in _index is .ds-metricbeat-dev.
This works fine when I test with one document, but when I implement it in the yml file I get the index name starting with .ds-. Why is this happening?
Update with the template:
{
  "metricbeat" : {
    "order" : 1,
    "index_patterns" : [
      "metricbeat-*"
    ],
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "metricbeat",
          "rollover_alias" : "metricbeat-metrics"
        },
        ...
If you have data streams enabled in an index template, it has the potential to create a data stream. This depends on how you configure the priority: if no priority is mentioned, a legacy index is created, but if the index template has data streams enabled and specifies a priority higher than 100, a data stream is created (the legacy behaviour sits at priority 100, so use a priority above 100 if you want the index created as a data stream).
If it creates a data stream and that is not expected, check whether there is a template with data streams enabled pointing at the index you are writing to! That was the reason in my case.
I have been working with this for a few months and this is what I have observed.
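To illustrate the difference (the template name and pattern below are assumptions for illustration, not taken from the setup above): a composable index template with a data_stream block and a priority above 100 causes any matching write to create a data stream, while dropping the data_stream block gives you a regular index again.
PUT _index_template/metricbeat-dev-template
{
  "index_patterns" : ["metricbeat-dev*"],
  "priority" : 200,
  "data_stream" : { },
  "template" : {
    "settings" : {
      "index.lifecycle.name" : "metricbeat"
    }
  }
}
You can also check which template wins for a given index name with POST /_index_template/_simulate_index/metricbeat-dev.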

Trigger an action for each hit of Elasticsearch query in Kibana Monitor

Is it possible to trigger an action for each hit of a given query in a Kibana Monitor? I would like to use a foreach loop to do this as demonstrated here. However, it's unclear how to implement this on the Kibana Monitor page. On the page there is an input field for Trigger Conditions but I'm unsure how to format the foreach within it or if this is supported.
Consider using Elasticsearch Watcher (requires at least a Gold license): https://www.elastic.co/guide/en/elasticsearch/reference/current/how-watcher-works.html
Watcher runs on a set interval and performs a query against your indices (according to your configuration). You need to define a condition (e.g. the number of hits is greater than 5); when it evaluates to true, an action is performed. Elasticsearch allows you to use multiple actions. For example, you can use a webhook and receive the data from the last Watcher run (you can also use the Watcher API to transform the data). If you don't have a Gold license, you can mimic Watcher's behavior with a script/program that uses the Elasticsearch Search API.
Below is a simple example of a watch that checks an index named test every minute and sends a webhook with the entire search context when there is at least one matching document.
{
  "trigger" : {
    "schedule" : { "interval" : "1m" }
  },
  "input" : {
    "search" : {
      "request" : {
        "indices" : [ "test" ],
        "body" : {
          "query" : {
            "bool" : {
              "must" : {
                "range" : {
                  "updatedAt" : {
                    "gte" : "now-1m"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition" : {
    "compare" : { "ctx.payload.hits.total" : { "gt" : 0 } }
  },
  "actions" : {
    "sample_webhook" : {
      "webhook" : {
        "method" : "POST",
        "url" : "http://b4022015b928.ngrok.io/UtilsService/api/elasticHandler/watcher",
        "body" : "{{#toJson}}ctx.payload{{/toJson}}",
        "auth" : {
          "basic" : {
            "user" : "user",
            "password" : "pass"
          }
        }
      }
    }
  }
}
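To try this out, register the watch under an id of your choosing (sample_watch here is just an example name) by sending the definition above as the body of:
PUT _watcher/watch/sample_watch
and then force a run, instead of waiting for the one-minute schedule, with:
POST _watcher/watch/sample_watch/_execute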
An alternative way would be to use Kibana Alerts and Actions.
https://www.elastic.co/guide/en/kibana/current/alerting-getting-started.html
This feature is slightly different from Watcher, but it basically allows you to perform actions upon a query against Elasticsearch. It is part of Kibana, as opposed to Watcher, which is part of Elasticsearch (though Watcher is accessible from Kibana Stack Management).

Is there a way to update a document with a Painless script without changing the order of unaffected fields?

I'm using Elasticsearch's Update by Query API to update some documents with a Painless script like this (the actual query is more complicated):
POST ts-scenarios/_update_by_query?routing=test
{
  "query": {
    "term": { "routing": { "value": "test" } }
  },
  "script": {
    "source": """ctx._source.tagIDs = ["5T8QLHIBB_kDC9Ugho68"]"""
  }
}
This works, except that upon reindexing, other fields get reordered, including some classes which are automatically (de)serialized using JSON.NET's type handling. That means a document with the following source before the update:
{
  "routing" : "testsuite",
  "activities" : [
    {
      "$type" : "Test.Models.SomeActivity, Test"
    },
    {
      "$type" : "Test.Models.AnotherActivity, Test",
      "CustomParameter" : 1,
      "CustomSetting" : false
    }
  ]
}
ends up as
{
  "routing" : "testsuite",
  "activities" : [
    {
      "$type" : "Test.Models.SomeActivity, Test"
    },
    {
      "CustomParameter" : 1,
      "CustomSetting" : false,
      "$type" : "Test.Models.AnotherActivity, Test"
    }
  ],
  "tagIDs" : [
    "5T8QLHIBB_kDC9Ugho68"
  ]
}
which JSON.NET can't deserialize. Is there a way I can tell the script (or the Update by Query API) not to change the order of those other fields?
In case it matters, I'm using Elasticsearch OSS version 7.6.1 on macOS. I haven't checked whether an Ingest pipeline would work here, as I'm not familiar with them.
(It turns out I can make the deserialization more flexible by setting the MetadataPropertyHandling property to ReadAhead, as mentioned here. That works, but as mentioned it may hurt performance and there might be other situations where field order matters. Technically, it shouldn't; JSON isn't XML, but there are always edge cases where it does matter.)

Elasticsearch. Painless script to search based on the last result

Let's see if someone can shed some light on this one, which seems to be a little hard.
We need to correlate data from multiple indices and various fields. We are trying a Painless script.
Example:
We make a search in an index to gather data about the queueid of mails sent by someone@domain.
Once we have the queueids, we need to store them in an array and iterate over it to make new searches to gather data like email receivers, spam checks, postfix results and so on.
Problem: how can we store the data from one search and use it later in the second search?
We are testing something like:
GET here_an_index/_search
{
  "query": {
    "bool" : {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-15m",
              "lte": "now"
            }
          }
        }
      ],
      "filter" : {
        "script" : {
          "script" : {
            "source" : "doc['postfix_from'].value == params.from; qu = doc['postfix_queueid'].value; return qu",
            "params" : {
              "from" : "someona@mdomain"
            }
          }
        }
      }
    }
  }
}
And, of course, it throws an error.
"doc['postfix_from'].value ...",
"^---- HERE"
So, in a nutshell: is there any way to execute a search looking for some field value based on a filter (like from:someone@domain) and use those values in later searches?
We have evaluated using script fields or nested fields, but due to some architectural reasons and what those changes would entail, they cannot be used right now.
Thank you very much!

Sorting results based on location using elastica

I am trying to learn Elasticsearch, using Elastica to connect to it, and I am finding it hard to get the information I need to understand how to query it.
Basically, what I am trying to do is this: I have inserted data into Elasticsearch with geo coordinates added, and now I need to run a query that sorts the results from closest to farthest.
I want to find all the stores in my state, then order them by which one is closest to my current location.
So, given a field called "state" and a field called "point" (an array holding lon/lat), what would the query be using Elastica?
Thanks for any help that you can give me.
First, you need to map your location field as type geo_point (this needs to be done before inserting any data)
{
  "stores" : {
    "properties" : {
      "point" : {
        "type" : "geo_point"
      }
    }
  }
}
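For example (index name and coordinates are made up for illustration, using the modern single-type document API), a store document would then be indexed like so; note that when a geo_point is written as an array, Elasticsearch expects [lon, lat] order:
PUT /stores/_doc/1
{
  "state" : "NY",
  "point" : [-73.98, 40.75]
}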
After that, you can simply sort your search by _geo_distance
{
  "sort" : [
    {
      "_geo_distance" : {
        "stores.point" : [-70, 40],   // <- reference starting position, as [lon, lat]
        "order" : "asc",
        "unit" : "km"
      }
    }
  ],
  "query" : {
    "match_all" : {}
  }
}
For Elastica, have a look at their docs regarding mapping and query building, and read the unit tests.
For those wanting to sort by distance, this cut-down snippet details how to use a custom score:
$q = new BoolQuery();
$subQ = new MultiMatchQuery();
$subQ->setQuery('find me')->setFields(array('foo', 'bar'));
$q->addShould($subQ);

// Score by negative distance from the given point, so that
// closer documents receive a higher score.
$cs = new CustomScore();
$cs->setScript('-doc["location"].distanceInKm(lat,lon)');
$cs->addParams(array(
    'lat' => -33.882583,
    'lon' => 151.209737
));
$cs->setQuery($q);
Hope it helps.
