Trend metric on a Kibana dashboard: is it possible? - elasticsearch

I want to create a metric in a Kibana dashboard that uses a ratio of multiple metrics and an offset period.
Example:
Date Budget
YYYY-MM-DD $
2019-01-01 15
2019-01-02 10
2019-01-03 5
2019-01-04 10
2019-01-05 12
2019-01-06 4
If I select the time range 2019-01-04 to 2019-01-06, I want to compute the ratio against the offset period 2019-01-01 to 2019-01-03.
To summarize: (sum(10+12+4) - sum(15+10+5)) / sum(10+12+4) = (26 - 30) / 26 ≈ -0.15
So the evolution of my budget is -15% (and this is what I want to display in the dashboard).
But with the Metric visualization this is not possible (no offset); with Visual Builder, different metric aggregations cannot have different offsets (too bad, because bucket script would allow computing the ratio); and with Vega I have not found a solution either.
Any ideas? Thanks a lot
Aurélien
NB: I use kibana version > 6.X

Please check the sample mapping below, which I built from the data in your question, and the aggregation query that follows it.
Mapping:
PUT <your_index_name>
{
  "mappings": {
    "mydocs": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd"
        },
        "budget": {
          "type": "float"
        }
      }
    }
  }
}
Aggregation
I've used the following aggregations:
Date Histogram, with an interval of 4d chosen to fit the sample data in the question
Sum
Derivative
Bucket Script, which computes the budget evolution figure you are after
I'm also assuming that the date format is yyyy-MM-dd and that budget is of the float data type.
Here is the aggregation query:
POST <your_index_name>/_search
{
  "size": 0,
  "query": {
    "range": {
      "date": {
        "gte": "2019-01-01",
        "lte": "2019-01-06"
      }
    }
  },
  "aggs": {
    "my_date": {
      "date_histogram": {
        "field": "date",
        "interval": "4d",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "sum_budget": {
          "sum": {
            "field": "budget"
          }
        },
        "budget_derivative": {
          "derivative": {
            "buckets_path": "sum_budget"
          }
        },
        "budget_evolution": {
          "bucket_script": {
            "buckets_path": {
              "input_1": "sum_budget",
              "input_2": "budget_derivative"
            },
            "script": "(params.input_2/params.input_1)*(100)"
          }
        }
      }
    }
  }
}
Note that the result you are looking for is in the budget_evolution part of the response.
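To check the numbers without a cluster, the same sum, derivative and bucket script steps can be sketched in plain Python on the sample data. This is only an illustration: it assumes the histogram rounds each date down to a multiple of the interval counted from 1970-01-01 in UTC, and the names `bucket_start_day`, `ordered` and `evolution` are made up for the sketch.

```python
from datetime import date

# Sample data from the question: (date, budget)
docs = [
    (date(2019, 1, 1), 15.0),
    (date(2019, 1, 2), 10.0),
    (date(2019, 1, 3), 5.0),
    (date(2019, 1, 4), 10.0),
    (date(2019, 1, 5), 12.0),
    (date(2019, 1, 6), 4.0),
]

EPOCH = date(1970, 1, 1)
INTERVAL_DAYS = 4  # mirrors "interval": "4d"


def bucket_start_day(d):
    """Round a date down to a multiple of INTERVAL_DAYS since the epoch (assumed alignment)."""
    days = (d - EPOCH).days
    return days - days % INTERVAL_DAYS


# sum_budget: total budget per bucket
sums = {}
for d, budget in docs:
    key = bucket_start_day(d)
    sums[key] = sums.get(key, 0.0) + budget

ordered = [sums[k] for k in sorted(sums)]

# budget_derivative is the difference between consecutive bucket sums;
# budget_evolution is (derivative / sum) * 100, as in the bucket_script
evolution = [
    (curr - prev) / curr * 100
    for prev, curr in zip(ordered, ordered[1:])
]

print(ordered)    # [30.0, 26.0]
print(evolution)  # a single value, roughly -15.38
```

Under that alignment assumption the six sample days fall into two buckets (01-01 to 01-03 and 01-04 to 01-06), which is why the 4d interval reproduces the -15% from the question. For other date ranges the bucket boundaries may not line up with the selected period.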
Hope this helps!

Related

Bucket aggregation that doesn't depend on the time range in Elasticsearch

I'm using Elasticsearch 7.9.3 to query time series data metrics which are stored in a form of:
{
  "timestamp": <long>,
  "name": <string - metric name>,
  "value": <float>
}
I want to show this data in our UI widgets. However, the query might return far too much data for a widget, so I went with a bucket aggregation that calculates the average value per bucket and returns these "calculated" representatives of the time series. Here is a slightly simplified version of the query:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "name": "METRICS_NAME_COMES_HERE"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": {{from}},
              "lt": {{to}}
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "primary-agg": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "{{bucket_size}}ms",
        "min_doc_count": 1,
        "offset": "{{offset_in_ms}}ms"
      },
      "aggs": {
        "average-value": {
          "avg": {
            "field": "value"
          }
        }
      }
    }
  }
}
When the time range changes (our UI widget has a Kibana-like time picker whose selection is translated into 'from'/'to' in the query), the buckets are recalculated, which can lead to significant discrepancies in the data shown in the UI.
For example, if I see a "spike" in the UI and zoom in (thus narrowing the search period), the spike is preserved but the actual values of the "representatives" change significantly.
So my question is: what are the best practices for building a query that returns a fixed number of results (which is why I understand I need some kind of aggregation) but whose values are not affected by changes to the range?
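The effect described above can be reproduced with a small Python sketch on made-up data. It assumes bucket keys are multiples of the bucket size shifted by the offset; `bucket_averages` and the sample series are hypothetical and only illustrate why the representatives move when the bucket size changes with the range.

```python
def bucket_averages(points, start, end, bucket_ms, offset_ms=0):
    """Average value per bucket for points with start <= timestamp < end.

    Bucket keys are multiples of bucket_ms shifted by offset_ms (assumed alignment).
    """
    totals = {}
    for ts, value in points:
        if not (start <= ts < end):
            continue
        key = (ts - offset_ms) // bucket_ms * bucket_ms + offset_ms
        total, count = totals.get(key, (0.0, 0))
        totals[key] = (total + value, count + 1)
    return {key: total / count for key, (total, count) in sorted(totals.items())}


# Hypothetical series: one point every 1000 ms, with a spike at t=5000
points = [(t, 100.0 if t == 5000 else 10.0) for t in range(0, 10000, 1000)]

wide = bucket_averages(points, 0, 10000, bucket_ms=5000)
zoomed = bucket_averages(points, 4000, 8000, bucket_ms=2000)
same_size = bucket_averages(points, 5000, 10000, bucket_ms=5000)

print(wide)       # {0: 10.0, 5000: 28.0}
print(zoomed)     # {4000: 55.0, 6000: 10.0}
print(same_size)  # {5000: 28.0}
```

In this toy example the spike is reported as 28.0 with 5000 ms buckets and as 55.0 after zooming to 2000 ms buckets, while a narrower range that keeps the same bucket size and alignment still yields 28.0 for the shared bucket. Whether fixing the bucket size and offset is acceptable for a given widget is a separate design decision.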

Elasticsearch transform does not update aggregated data

I am working on a scenario that aggregates daily data per user. The data is processed in real time and stored in Elasticsearch. Now I want to use an Elasticsearch feature to aggregate the data in real time. I've read about transforms in Elasticsearch and found that they fit our case.
The problem is that when the source index is updated, the destination index, which is supposed to hold the aggregated data, is not updated. This is the case I have tested:
source_index data model:
{
  "my_datetime": "2021-06-26T08:50:59",
  "client_no": "1",
  "my_date": "2021-06-26",
  "amount": 1000
}
and the transform I defined:
PUT _transform/my_transform
{
  "source": {
    "index": "dest_index"
  },
  "pivot": {
    "group_by": {
      "client_no": {
        "terms": {
          "field": "client_no"
        }
      },
      "my_date": {
        "terms": {
          "field": "my_date"
        }
      }
    },
    "aggregations": {
      "sum_amount": {
        "sum": {
          "field": "amount"
        }
      },
      "count_amount": {
        "value_count": {
          "field": "amount"
        }
      }
    }
  },
  "description": "total amount sum per client",
  "dest": {
    "index": "my_analytic"
  },
  "frequency": "60s",
  "sync": {
    "time": {
      "field": "my_datetime",
      "delay": "10s"
    }
  }
}
Now when I add another document or update existing documents in the source index, the destination index is not updated and does not reflect the new documents.
Also note that the Elasticsearch version I used is 7.13.
I also changed the date field to a timestamp (epoch format, like 1624740659000) but still have the same problem.
What am I doing wrong here?
Could it be that your "my_datetime" is further in the past than the "delay": "10s" (plus the "frequency": "60s")?
The docs for sync.time.field note:
In general, it’s a good idea to use a field that contains the ingest timestamp. If you use a different field, you might need to set the delay such that it accounts for data transmission delays.
You might just need a higher delay.
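As a rough illustration of why an old my_datetime can be missed, here is a simplified Python model of the change detection. It assumes each checkpoint only looks at documents whose sync field lies between the previous checkpoint's upper bound and now minus the delay. The function `is_picked_up`, the `now` value and the 30-second example are hypothetical, and the real transform logic is more involved.

```python
from datetime import datetime, timedelta


def is_picked_up(doc_time, previous_upper_bound, now, delay):
    """Simplified model of a continuous transform checkpoint.

    Assumes only documents with previous_upper_bound <= doc_time < now - delay
    are seen as new or changed.
    """
    upper_bound = now - delay
    return previous_upper_bound <= doc_time < upper_bound


now = datetime(2021, 6, 27, 12, 0, 0)   # hypothetical time of the checkpoint
delay = timedelta(seconds=10)           # "delay": "10s"
frequency = timedelta(seconds=60)       # "frequency": "60s"
previous_upper_bound = now - frequency - delay

# Document from the question, indexed long after its my_datetime value
old_doc = datetime(2021, 6, 26, 8, 50, 59)
# Document whose my_datetime is close to its ingest time
fresh_doc = now - timedelta(seconds=30)

print(is_picked_up(old_doc, previous_upper_bound, now, delay))    # False
print(is_picked_up(fresh_doc, previous_upper_bound, now, delay))  # True
```

Under this model, a document only counts as a change if its sync field is recent relative to the checkpoint, which is why an ingest timestamp is the recommended sync field.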

Aggregation Median/Mean Queries

I have an index with a type that can be reduced to:
{
  'date': DATE_STRING,
  'owner': INT,
  'color': 'red' | 'purple' | 'blue'
}
and I am looking to write queries that present the following data, where an owner's value is the number of 'blue' items they own minus the number of 'red' items they own over a requested time (don't ask why):
minimum value of any owner (within requested time)
maximum value of any owner (within requested time)
mean value of all owners (within requested time)
median value of all owners (within requested time)
a particular owner's value (within requested time)
Set up the index:
PUT colorful
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      },
      "owner": {
        "type": "integer"
      },
      "color": {
        "type": "keyword"
      }
    }
  }
}
Insert a few docs
POST colorful/_doc
{"date":"2020-05-28T19:56:12.237Z","owner":131351351,"color":"red"}
POST colorful/_doc
{"date":"2020-04-28T19:58:02.110Z","owner":35135125,"color":"purple"}
POST colorful/_doc
{"date":"2020-05-15T19:58:15.966Z","owner":997654341,"color":"blue"}
POST colorful/_doc
{"date":"2020-05-21T19:58:35.766Z","owner":366449,"color":"red"}
Filter by a date range & aggregate. Min, Max, Avg(=Mean) can be calculated using stats, for median there's percentiles[50]. Not sure what you meant by a particular owner's value but the actual range-filtered docs can be fetched using top_hits plus you could add a filter for a specific doc.
GET colorful/_search
{
  "size": 0,
  "query": {
    "range": {
      "date": {
        "gte": "now-3M",
        "lte": "now-1h"
      }
    }
  },
  "aggs": {
    "1)general_stats": {
      "stats": {
        "field": "owner"
      }
    },
    "2)median": {
      "percentiles": {
        "field": "owner",
        "percents": [
          50
        ]
      }
    },
    "3)top_hits": {
      "top_hits": {
        "size": 10
      }
    }
  }
}
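Note that the stats and percentiles in the query above are computed over the owner field itself. If the intended "value" is the one defined in the question (number of blue items minus number of red items per owner), the five figures can be sketched client-side in Python as below. The documents and owner ids are hypothetical and assumed to be already filtered to the requested time range.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical documents, already filtered to the requested time range
docs = [
    {"owner": 1, "color": "blue"},
    {"owner": 1, "color": "blue"},
    {"owner": 1, "color": "red"},
    {"owner": 2, "color": "red"},
    {"owner": 2, "color": "purple"},
    {"owner": 3, "color": "blue"},
]

# Value per owner = number of blue items minus number of red items;
# other colors contribute 0 but still register the owner
weights = {"blue": 1, "red": -1}
values = Counter()
for doc in docs:
    values[doc["owner"]] += weights.get(doc["color"], 0)

per_owner = dict(values)
print(per_owner)                   # {1: 1, 2: -1, 3: 1}
print(min(per_owner.values()))     # minimum value of any owner
print(max(per_owner.values()))     # maximum value of any owner
print(mean(per_owner.values()))    # mean value of all owners
print(median(per_owner.values()))  # median value of all owners
print(per_owner[2])                # a particular owner's value
```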

Elasticsearch: need the average per week of some value

I have simple data:
sales, date_of_sales
What I need is the average per week, i.e. sum(sales) / number of weeks.
Please help.
What I have so far is:
{
  "size": 0,
  "aggs": {
    "WeekAggergation": {
      "date_histogram": {
        "field": "date_of_sales",
        "interval": "week"
      }
    },
    "TotalSales": {
      "sum": {
        "field": "sales"
      }
    },
    "myValue": {
      "bucket_script": {
        "buckets_path": {
          "myGP": "TotalSales",
          "myCount": "WeekAggergation._bucket_count"
        },
        "script": "params.myGP/params.myCount"
      }
    }
  }
}
I get the error
Invalid pipeline aggregation named [myValue] of type [bucket_script].
Only sibling pipeline aggregations are allowed at the top level.
I think this may help:
{
  "size": 0,
  "aggs": {
    "WeekAggergation": {
      "date_histogram": {
        "field": "date_of_sale",
        "interval": "week",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "TotalSales": {
          "sum": {
            "field": "sales"
          }
        },
        "AvgSales": {
          "avg": {
            "field": "sales"
          }
        }
      }
    },
    "avg_all_weekly_sales": {
      "avg_bucket": {
        "buckets_path": "WeekAggergation>TotalSales"
      }
    }
  }
}
Note that the TotalSales aggregation is now nested under the weekly histogram aggregation, which gives you the total of all sales in each weekly bucket. (I believe there was a typo in the code provided: the simple schema indicated the field name date_of_sale, while the aggregation uses the plural form date_of_sales.)
Additionally, AvgSales is a similar nested aggregation under the weekly histogram, so you can see the average sale within each specific week.
Finally, the pipeline aggregation avg_all_weekly_sales gives the average of the weekly sales, based on the TotalSales values and the number of non-empty buckets. If you want to include empty buckets, add the gap_policy parameter like so:
...
  "avg_all_weekly_sales": {
    "avg_bucket": {
      "buckets_path": "WeekAggergation>TotalSales",
      "gap_policy": "insert_zeros"
    }
  }
...
(see: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-avg-bucket-aggregation.html).
This pipeline aggregation may or may not be what you're actually looking for, so please check the math to make sure the result is what you expect, but it should provide the correct output based on the original script.
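To help with checking the math, here is a small Python sketch of the two variants on made-up rows: averaging only the non-empty weekly totals versus counting empty weeks as zero. The rows are hypothetical, and weeks are assumed to start on Monday.

```python
from datetime import date, timedelta

# Hypothetical (date_of_sale, sales) rows
rows = [
    (date(2021, 1, 4), 100.0),   # week starting 2021-01-04
    (date(2021, 1, 6), 50.0),    # same week
    (date(2021, 1, 19), 30.0),   # week starting 2021-01-18; the week in between is empty
]


def week_start(d):
    """Monday of the week containing d (assumed week boundary)."""
    return d - timedelta(days=d.weekday())


# TotalSales per weekly bucket
totals = {}
for d, sales in rows:
    key = week_start(d)
    totals[key] = totals.get(key, 0.0) + sales

# Only non-empty weekly buckets are counted
avg_skip = sum(totals.values()) / len(totals)

# Like "gap_policy": "insert_zeros": empty weeks in between count as 0
first, last = min(totals), max(totals)
n_weeks = (last - first).days // 7 + 1
avg_zeros = sum(totals.values()) / n_weeks

print(avg_skip)   # 90.0
print(avg_zeros)  # 60.0
```

The second figure matches the sum(sales) / number of weeks formula from the question when the weeks without sales should also count.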

Elasticsearch: get top nested doc per month without top level duplicates

I have some time-based, nested data from which I would like to get the biggest changes, positive and negative, in plugins per month. I work with Elasticsearch 5.3 (and Kibana 5.3).
A document is structured as follows:
{
  _id: "xxx",
  #timestamp: 1508244365987,
  siteURL: "www.foo.bar",
  plugins: [
    {
      name: "foo",
      version: "3.1.4"
    },
    {
      name: "baz",
      version: "13.37"
    }
  ]
}
However, per id (siteURL), I have multiple entries per month, and I would like to use only the latest one per time bucket to avoid unfair weighting.
I tried to solve this with the following aggregation:
{
  "aggs": {
    "normal_dates": {
      "date_range": {
        "field": "#timestamp",
        "ranges": [
          {
            "from": "now-1y/d",
            "to": "now"
          }
        ]
      },
      "aggs": {
        "date_histo": {
          "date_histogram": {
            "field": "#timestamp",
            "interval": "month"
          },
          "aggs": {
            "top_sites": {
              "terms": {
                "field": "siteURL.keyword",
                "size": 50000
              },
              "aggs": {
                "top_plugin_hits": {
                  "top_hits": {
                    "sort": [
                      {
                        "#timestamp": {
                          "order": "desc"
                        }
                      }
                    ],
                    "_source": {
                      "includes": [
                        "plugins.name"
                      ]
                    },
                    "size": 1
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Now, per month, I get the latest document for each site and its plugins. Next I would like to turn the data inside out and get the plugins present per month along with a count of their occurrences. Then I would use a serial_diff to compare months.
However, I don't know how to go from my aggregation to the serial diff, i.e. turn the data inside out.
Any help would be most welcome
PS: extra kudos if I can get it in a Kibana 5.3 table...
It turns out it is not possible to aggregate further on the results of a top_hits aggregation.
I ended up loading the results of the posted query into Python and used Python for further processing and visualization.
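The original post-processing code is not shown, so the following is only a sketch of what such a step might look like. It assumes the aggregation response has already been flattened into a hypothetical `latest_per_site` mapping of month to site to plugin names, and it computes the month-over-month difference by hand.

```python
from collections import Counter

# Hypothetical shape: month -> {siteURL: [plugin names from the latest doc]}
latest_per_site = {
    "2017-09": {"www.foo.bar": ["foo", "baz"], "www.example.org": ["foo"]},
    "2017-10": {"www.foo.bar": ["foo"], "www.example.org": ["foo", "qux"]},
}

# Turn the data inside out: month -> plugin -> number of sites using it
counts = {
    month: Counter(p for plugins in sites.values() for p in plugins)
    for month, sites in latest_per_site.items()
}

# Month-over-month change per plugin (a serial difference with lag 1)
months = sorted(counts)
changes = {}
for prev, curr in zip(months, months[1:]):
    names = set(counts[prev]) | set(counts[curr])
    changes[curr] = {n: counts[curr][n] - counts[prev][n] for n in sorted(names)}

print(changes)
# {'2017-10': {'baz': -1, 'foo': 0, 'qux': 1}}
```

Sorting each month's changes by value would then give the biggest positive and negative movers.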
