How to retrieve timestamp along with minimum value within an elastic aggregation - elasticsearch

I am executing aggregations query on an elastic dataset
I want to retrieve the maximum and minimum values over a timespan
I achieve this with
"aggs": {
"DateRangeFilter": {
"filter": {
"range": {
"#timestamp": {
"gte": "2015-10-16T11:13:17.000",
"lte": "2015-10-16T12:29:47.000"
}
}
},
"aggs": {
"min_chan4": {
"min": {
"field": "ch_004"
}
},
"max_chan4": {
"max": {
"field": "ch_004"
}
}
and this gives me:
"DateRangeFilter": {
"doc_count": 153,
"min_chan4": {
"value": 0.7463656663894653
},
"max_chan4": {
"value": 5.170884132385254
}
which is great.
What i need is also to retrieve the time that each of these occurred
My documents look like this:
"_source": {
"GroupId": "blahblah",
"#timestamp": "2015-10-14T12:41:30Z",
"ch_004": 1.5608633995056154
}
so I would expect to be able to retrieve not only the min and max values over the given date range but also at what time (#timestamp) that min or max value was recorded
e.g. min value minX occurred at time t1, max value maxX occurred at time t2

You can achieve it by using Top Hits Aggregation. example can be found here:
{
"aggs": {
"DateRangeFilter": {
"filter": {
"range": {
"#timestamp": {
"gte": "2015-10-16T11:13:17.000",
"lte": "2015-10-16T12:29:47.000"
}
}
},
"aggs": {
"ch_004_min": {
"top_hits": {
"size": 1,
"sort": [
{
"timestamp": {
"order": "asc"
}
}
]
}
},
"ch_004_max": {
"top_hits": {
"size": 1,
"sort": [
{
"timestamp": {
"order": "desc"
}
}
]
}
}
}
}
}
}

Related

Bucket sort on dynamic aggregation name

I would like to sort my aggregations value from quantity.
But my problem is that each aggregation have a name that couldn't be know in advance :
Given this query :
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"sorting": {
"bucket_sort": {
"sort": [
{
"year>quantity": {
"order": "desc"
}
}
]
}
},
"UNKNOWN_1": {
"aggs": {
"year": {
"filter": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
"UNKNOWN_2": {
"aggs": {
"year": {
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
....
}
}
it miss one level on my bucket_sort aggregation to reach that quantity value.
Here is one elastic record :
{
datetime: '2021-12-01',
item.quantity: 5
}
Note that I have remove the biggest part of the request for comprehension, like filter aggregation, ect....
I tried something with wildcard :
"sorting": {
"bucket_sort": {
"sort": [
{
"*>year>quantity": {
"order": "desc"
}
}
]
}
},
But got the same error....
Is it possible to achieve this behaviour ?
I think you misunderstood the "bucket_sort" aggregation: it won't sort your aggregations but it sorts the buckets coming from one multi-bucket aggregation. Also the bucket_sort aggregation has to be subordinate to that multi-bucket aggregation.
From the docs:
[The bucket sort aggregation is] "a parent pipeline aggregation which sorts the buckets of its parent multi-bucket aggregation"
If I get it correct, you try to create "buckets" with specific filter aggregations and you can't know in advance how many of those filter aggregations you create.
For that you can use the "multi filters" aggregation where you can specify as many filters as you want and each of them creates a bucket.
Subordinated to that filters-aggregation you can create one single sum aggregation on item.quantity.
Also subordinated to the filters-aggregations you then add your buckets_sort aggregation, where you also just have to name the sibling "sum" aggregation.
All in all it might look like that:
{
"aggs": {
"your_filters": {
"filters": {
"filters": {
"unknown_1": {
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
},
"unknown_2": {
/** more filters here... **/
}
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
},
"sorting": {
"bucket_sort": {
"sort": [
{ "quantity": { "order": "desc" } }
]
}
}
}
}
}
}

Is it possible to fetch count of total number of docs that contain a qualifying aggregation condition in elasticsearch?

I use ES v7.3 and as per my requirements I am aggregating some fields to fetch the required docs in response, further their is a requirement to fetch the count of total number of all such docs also that contain the nested field which qualifies the aggregation condition as described below but I did not find a way where I am able to do that.
Current aggregation query that I am using to fetch the documents is,
"aggs": {
"users": {
"composite": {
"sources": [
{
"users": {
"terms": {
"field": "co_profileId.keyword"
}
}
}
],
"size": 5000
},
"aggs": {
"sessions": {
"nested": {
"path": "co_score"
},
"aggs": {
"last_4_days": {
"filter": {
"range": {
"co_score.sessionTime": {
"gte": "2021-01-10T00:00:31.399Z",
"lte": "2021-01-14T01:37:31.399Z"
}
}
},
"aggs": {
"score_count": {
"sum": {
"field": "co_score.value"
}
}
}
}
}
},
"page_view_count_filter": {
"bucket_selector": {
"buckets_path": {
"sessionCount": "sessions > last_4_days > score_count"
},
"script": "params.sessionCount > 100"
}
},
"filtered_users": {
"top_hits": {
"size": 1,
"_source": {
"includes": [
"co_profileId",
"co_type",
"co_score"
]
}
}
}
}
}
}
Sample doc:
{
"co_profileId": "14654325",
"co_type": "identify",
"co_updatedAt": "2021-01-11T11:37:33.499Z",
"co_score": [
{
"value": 3,
"sessionTime": "2021-01-09T01:37:31.399Z"
},
{
"value": 3,
"sessionTime": "2021-01-10T10:47:33.419Z"
},
{
"value": 6,
"sessionTime": "2021-01-11T11:37:33.499Z"
}
]
}

Need aggregation on document inner array object - ElasticSearch

I am trying to do aggregation over the following document
{
"pid": 900000,
"mid": 9000,
"cid": 90,
"bid": 1000,
"gmv": 1000000,
"vol": 200,
"data": [
{
"date": "25-11-2018",
"gmv": 100000,
"vol": 20
},
{
"date": "24-11-2018",
"gmv": 100000,
"vol": 20
},
{
"date": "23-11-2018",
"gmv": 100000,
"vol": 20
}
]
}
The analysis which needs to be done here is:
Filter on mid or/and cid on all documents
Filter range on data.date for last 7 days and sum data.vol over that range for each pid
sort the documents over the sum obtained in previous step in desc order
Group these results by pid.
This means we are trying to get top products by sum of the volume (quantity sold) within a date range for specific cid/mid.
PID here refers product ID,
MID refers here merchant ID,
CID refers here category ID
Firstly you need to change your mapping to run the query on nested fields.
change the type for field 'data' as 'nested'.
Then you can use the range query in filter along with the terms filter on mid/cid to filter on the data. Once you get the correct data set, then you can aggregate on the pid following the sub aggregation on sum of vol.
Here is the below query.
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"range": {
"data.date": {
"gte": "28-11-2018",
"lte": "25-11-2018"
}
}
},
{
"must": [
{
"terms": {
"mid": [
"9000"
]
}
}
]
}
]
}
}
]
}
},
"aggs": {
"AGG_PID": {
"terms": {
"field": "pid",
"size": 0,
"order": {
"TOTAL_SUM": "desc"
},
"min_doc_count": 1
},
"aggs": {
"TOTAL_SUM": {
"sum": {
"field": "data.vol"
}
}
}
}
}
}
You can modify the query accordingly. Hope this will be helpful.
Please find nested aggregation query which sorts by "vol" for each bucket of "pid". You can add any number of filters in the query part.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"mid": "2"
}
}
]
}
},
"aggs": {
"top_products_sorted_by_order_volume": {
"terms": {
"field": "pid",
"order": {
"nested_data_object>order_volume_by_range>order_volume_sum": "desc"
}
},
"aggs": {
"nested_data_object": {
"nested": {
"path": "data"
},
"aggs": {
"order_volume_by_range": {
"filter": {
"range": {
"data.date": {
"gte": "2018-11-26",
"lte": "2018-11-27"
}
}
},
"aggs": {
"order_volume_sum": {
"sum": {
"field": "data.ord_vol"
}
}
}
}
}
}
}
}
}
}

Elastic aggregation to identify period A vs B percentage increases

I have some daily sales data indexed into Elasticsearch. I successfully run a number of aggregations to identify top sellers across a date range etc.
I am now trying to write a single query to do the following:
Identify Top n sellers over a date range (Period A)
Take the results of Period A and sum sales for these products over second date range (Period B)
Compare sales in period A to Period B and identify those with percentage increases above X%.
My attempt so far:
{
"query": {
"bool": {
"filter": [
{
"range": {
"date": {
"gte": "2017-10-01",
"lte": "2017-10-14"
}
}
}
]
}
},
"size": 0,
"aggs": {
"data_split": {
"terms": {
"size": 10,
"field": "product_id"
},
"aggs": {
"date_periods": {
"date_range": {
"field": "date",
"format": "YYYY-MM-dd",
"ranges": [
{
"from": "2017-10-01",
"to": "2017-10-07"
},
{
"from": "2017-10-08",
"to": "2017-10-14"
}
]
},
"aggs": {
"product_id_split": {
"terms": {
"field": "product_id"
},
"aggs": {
"unit_sum": {
"sum": {
"field": "units"
}
}
}
}
}
}
}
}
}
}
Although this outputs results for two periods, I don't think this is quite what I want as the initial filter is running from Period A start date to Period B end date and I think summing results for that range instead of Period A only. I also don't get the % comparison, I would probably do this at my application level, but I understand could be handled with a scripted Elastic query?
It would be especially awesome if instead of top n results in period A, I could set a sales threshold of say 1,000 sales.
Any pointers would be much appreciated. Thanks in advance!
Currently running Elastic 5.6
{
"query": {
"bool": {
"filter": [
{
"range": {
"date": {
"gte": "2017-10-01",
"lte": "2017-10-14"
}
}
}
]
}
},
"size": 0,
"aggs": {
"data_split": {
"terms": {
"size": 10,
"field": "product_id"
},
"aggs": {
"date_period1": {
"filter": {
"range": {
"date": {
"gte": "2017-10-01",
"lte": "2017-10-07"
}
}
},
"aggs": {
"unit_sum": {
"sum": {
"field": "units"
}
}
}
},
"date_period2": {
"filter": {
"range": {
"date": {
"gte": "2017-10-08",
"lte": "2017-10-14"
}
}
},
"aggs": {
"unit_sum": {
"sum": {
"field": "units"
}
}
}
},
"percentage_increase": {
"bucket_script": {
"buckets_path": {
"firstPeriod": "date_period1>unit_sum",
"secondPeriod": "date_period2>unit_sum"
},
"script": "(params.secondPeriod-params.firstPeriod)*100/params.firstPeriod"
}
},
"retain_buckets": {
"bucket_selector": {
"buckets_path": {
"percentage": "percentage_increase"
},
"script": "params.percentage > 5"
}
}
}
}
}
}
And a full test data in this gist.
The result of this aggregation is giving you this:
"aggregations": {
"data_split": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "A",
"doc_count": 6,
"date_period1": {
"doc_count": 3,
"unit_sum": {
"value": 150
}
},
"date_period2": {
"doc_count": 3,
"unit_sum": {
"value": 160
}
},
"percentage_increase": {
"value": 6.666666666666667
}
},
{
"key": "C",
"doc_count": 2,
"date_period1": {
"doc_count": 1,
"unit_sum": {
"value": 50
}
},
"date_period2": {
"doc_count": 1,
"unit_sum": {
"value": 70
}
},
"percentage_increase": {
"value": 40
}
}
]
}
}
The idea is that you use two filter type of aggregations for the two date intervals. And for each you calculate a sum. Then, using a third aggregation of type bucket_script you calculate the percentage increase (note, though, that it will be a negative number of there is a decrease in sales for example).
Then, using yet another aggregation - of type bucket_selector - you keep the product_ids where the percentage is larger than 5%.

Query elasticsearch with multiple numeric ranges

{
"query": {
"filtered": {
"query": {
"match": {
"log_path": "message_notification.log"
}
},
"filter": {
"numeric_range": {
"time_taken": {
"gte": 10
}
}
}
}
},
"aggs": {
"distinct_user_ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
I have to run this query 20 times as i want to know notification times above each of the following thresholds- [10,30,60,120,240,300,600,1200..]. Right now, i am running a loop and making 20 queries for fetching this.
Is there a more sane way to query elasticsearch once and get ranges that fall into these thresholds respectively?
What you probably want is a "range aggregation".
Here is the possible query where you can add more range or alter them -
{
"size": 0,
"query": {
"match": {
"log_path": "message_notification.log"
}
},
"aggs": {
"intervals": {
"range": {
"field": "time_taken",
"ranges": [
{
"to": 50
},
{
"from": 50,
"to": 100
},
{
"from": 100
}
]
},
"aggs": {
"distinct_user_ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
}
}

Resources