Need aggregation on document inner array object - ElasticSearch

I am trying to run an aggregation over the following document:
{
"pid": 900000,
"mid": 9000,
"cid": 90,
"bid": 1000,
"gmv": 1000000,
"vol": 200,
"data": [
{
"date": "25-11-2018",
"gmv": 100000,
"vol": 20
},
{
"date": "24-11-2018",
"gmv": 100000,
"vol": 20
},
{
"date": "23-11-2018",
"gmv": 100000,
"vol": 20
}
]
}
The analysis that needs to be done is:
Filter all documents on mid and/or cid.
Filter data.date to the last 7 days and sum data.vol over that range for each pid.
Sort the documents by the sum obtained in the previous step, in descending order.
Group these results by pid.
In other words, we want the top products by total volume (quantity sold) within a date range for a specific cid/mid.
Here pid is the product ID, mid is the merchant ID, and cid is the category ID.

First, change your mapping so that the query can run on nested fields: set the type of the 'data' field to 'nested'.
Then use a range query in the filter, together with a terms filter on mid/cid, to select the documents. Once you have the right document set, aggregate on pid with a sub-aggregation that sums vol.
Here is the query:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"nested": {
"path": "data",
"query": {
"range": {
"data.date": {
"gte": "25-11-2018",
"lte": "28-11-2018"
}
}
}
}
},
{
"terms": {
"mid": [
"9000"
]
}
}
]
}
}
]
}
},
"aggs": {
"AGG_PID": {
"terms": {
"field": "pid",
"size": 10,
"order": {
"DATA>TOTAL_SUM": "desc"
},
"min_doc_count": 1
},
"aggs": {
"DATA": {
"nested": {
"path": "data"
},
"aggs": {
"TOTAL_SUM": {
"sum": {
"field": "data.vol"
}
}
}
}
}
}
}
}
You can modify the query accordingly. Hope this will be helpful.
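For reference, the mapping change described in this answer could look like the sketch below. It is written as a Python dict so it can be handed to any client. The typeless mapping layout (as in Elasticsearch 7 and later), the dd-MM-yyyy date format and the long field types are assumptions inferred from the sample document, not something the question states.

```python
import json

# Hypothetical index body: "data" is declared as nested so that each
# array element is indexed as its own hidden document.
index_body = {
    "mappings": {
        "properties": {
            "pid": {"type": "long"},
            "mid": {"type": "long"},
            "cid": {"type": "long"},
            "data": {
                "type": "nested",
                "properties": {
                    # Assumed format, matching dates such as "25-11-2018".
                    "date": {"type": "date", "format": "dd-MM-yyyy"},
                    "gmv": {"type": "long"},
                    "vol": {"type": "long"},
                },
            },
        }
    }
}

# Serialise to the JSON that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```

An existing field cannot be switched to nested in place, so the documents would have to be reindexed into an index created with this mapping.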

Here is a nested aggregation query that sorts the "pid" buckets by their summed "vol". You can add any number of filters in the query part.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"mid": "2"
}
}
]
}
},
"aggs": {
"top_products_sorted_by_order_volume": {
"terms": {
"field": "pid",
"order": {
"nested_data_object>order_volume_by_range>order_volume_sum": "desc"
}
},
"aggs": {
"nested_data_object": {
"nested": {
"path": "data"
},
"aggs": {
"order_volume_by_range": {
"filter": {
"range": {
"data.date": {
"gte": "2018-11-26",
"lte": "2018-11-27"
}
}
},
"aggs": {
"order_volume_sum": {
"sum": {
"field": "data.vol"
}
}
}
}
}
}
}
}
}
}
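To make clear what this aggregation computes, here is a rough pure-Python equivalent run over in-memory documents. It is only an illustration of the logic (filter on mid, keep the data entries inside the date range, sum vol per pid, sort descending), not how Elasticsearch executes it. The function name top_products and the second sample product are made up for the example.

```python
from datetime import date, datetime

def top_products(docs, mid, start, end):
    """Sum data.vol per pid for entries dated within [start, end], largest first."""
    totals = {}
    for doc in docs:
        if doc["mid"] != mid:  # the query part: filter on merchant
            continue
        totals.setdefault(doc["pid"], 0)  # every matching doc gets a bucket
        for entry in doc["data"]:  # the nested part: one entry per day
            day = datetime.strptime(entry["date"], "%d-%m-%Y").date()
            if start <= day <= end:  # the range filter inside the nested agg
                totals[doc["pid"]] += entry["vol"]
    # The terms "order" clause: sort buckets by the summed volume, descending.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

docs = [
    {"pid": 900000, "mid": 9000, "data": [
        {"date": "25-11-2018", "vol": 20},
        {"date": "24-11-2018", "vol": 20},
        {"date": "23-11-2018", "vol": 20},
    ]},
    # Second product is invented for illustration.
    {"pid": 900001, "mid": 9000, "data": [
        {"date": "25-11-2018", "vol": 50},
        {"date": "10-11-2018", "vol": 500},
    ]},
]

result = top_products(docs, 9000, date(2018, 11, 24), date(2018, 11, 25))
print(result)  # [(900001, 50), (900000, 40)]
```

Note that the entry from 10-11-2018 is ignored even though its document matches, which is the reason the range filter sits inside the nested aggregation rather than only in the query.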

Related

Bucket sort on dynamic aggregation name

I would like to sort my aggregations by quantity.
My problem is that each aggregation has a name that cannot be known in advance.
Given this query:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"sorting": {
"bucket_sort": {
"sort": [
{
"year>quantity": {
"order": "desc"
}
}
]
}
},
"UNKNOWN_1": {
"aggs": {
"year": {
"filter": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
"UNKNOWN_2": {
"aggs": {
"year": {
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
....
}
}
My bucket_sort aggregation is one level short of reaching the quantity value.
Here is one Elasticsearch record:
{
"datetime": "2021-12-01",
"item.quantity": 5
}
Note that I have removed most of the request for readability (filter aggregations, etc.).
I tried using a wildcard:
"sorting": {
"bucket_sort": {
"sort": [
{
"*>year>quantity": {
"order": "desc"
}
}
]
}
},
But I got the same error.
Is it possible to achieve this behaviour?
I think you misunderstood the "bucket_sort" aggregation: it does not sort your aggregations, it sorts the buckets produced by one multi-bucket aggregation. The bucket_sort aggregation also has to be placed under that multi-bucket aggregation.
From the docs:
[The bucket sort aggregation is] "a parent pipeline aggregation which sorts the buckets of its parent multi-bucket aggregation"
If I understand correctly, you are trying to create "buckets" with specific filter aggregations, and you cannot know in advance how many of those filter aggregations you will create.
For that you can use the "filters" aggregation, where you can specify as many filters as you want and each of them creates a bucket.
Under that filters aggregation you can create one single sum aggregation on item.quantity.
Also under the filters aggregation, you then add your bucket_sort aggregation, where you just have to name the sibling "sum" aggregation.
All in all it might look like this:
{
"aggs": {
"your_filters": {
"filters": {
"filters": {
"unknown_1": {
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
},
"unknown_2": {
/** more filters here... **/
}
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
},
"sorting": {
"bucket_sort": {
"sort": [
{ "quantity": { "order": "desc" } }
]
}
}
}
}
}
}
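Since the bucket names are only known at runtime, the request body would typically be generated. The sketch below shows one possible way to do that in Python. The helper name sorted_filters_agg and the second date range are assumptions made for the example; the aggregation layout is the one from the answer above.

```python
def sorted_filters_agg(named_filters, field="item.quantity"):
    """Build a filters aggregation with a sum per bucket, sorted by that sum."""
    if not named_filters:
        raise ValueError("at least one filter is required")
    return {
        "your_filters": {
            # One bucket per named filter; names can be anything.
            "filters": {"filters": dict(named_filters)},
            "aggs": {
                "quantity": {"sum": {"field": field}},
                # Sibling of "quantity", so it can refer to it by name.
                "sorting": {
                    "bucket_sort": {
                        "sort": [{"quantity": {"order": "desc"}}]
                    }
                },
            },
        }
    }

# Bucket names are only known at runtime; these two are placeholders.
runtime_filters = {
    "unknown_1": {"range": {"datetime": {"gte": "2021-01-01", "lte": "2021-12-09"}}},
    "unknown_2": {"range": {"datetime": {"gte": "2020-01-01", "lte": "2020-12-31"}}},
}
body = {"size": 0, "aggs": sorted_filters_agg(runtime_filters)}
```

The sort path stays a plain "quantity" no matter how many filters are passed in, which is what removes the need for a wildcard.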

Need aggregation of only the query results

I need to run an aggregation over only the limited results I get from the query, but it is not working: it includes results outside the size limit of the query. Here is the query I am running:
{
"size": 500,
"query": {
"bool": {
"must": [
{
"term": {
"tags.keyword": "possiblePurchase"
}
},
{
"term": {
"clientName": "Ci"
}
},
{
"range": {
"firstSeenDate": {
"gte": "now-30d"
}
}
}
],
"must_not": [
{
"term": {
"tags.keyword": "skipPurchase"
}
}
]
}
},
"sort": [
{
"firstSeenDate": {
"order": "desc"
}
}
],
"aggs": {
"byClient": {
"terms": {
"field": "clientName",
"size": 25
},
"aggs": {
"byTarget": {
"terms": {
"field": "targetName",
"size": 6
},
"aggs": {
"byId": {
"terms": {
"field": "id",
"size": 5
}
}
}
}
}
}
}
}
I need the aggregations to consider only the first 500 results of the query, sorted by the field I request in the query. I am completely lost. Thanks for the help.
The scope of an aggregation is every document matching your query; the size parameter only specifies how many hits to fetch and display.
If you want to restrict the scope of the aggregation to the first n hits of a query, I would suggest the sampler aggregation in combination with your query.
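A minimal sketch of that suggestion, built as a Python dict: the existing aggregations are wrapped in a sampler aggregation. The helper name sample_then_aggregate and the aggregation name "sample" are made up for the example, and only the outermost terms aggregation from the question is repeated here. Keep in mind that the sampler picks the top-scoring documents on each shard, so it follows relevance score rather than the sort clause of the request and only approximates "the first 500 hits".

```python
def sample_then_aggregate(query, aggs, shard_size=500):
    """Wrap aggs in a sampler so they only see a per-shard sample of the hits."""
    return {
        "size": 0,
        "query": query,
        "aggs": {
            # "sample" is an arbitrary name chosen for this sketch.
            "sample": {
                "sampler": {"shard_size": shard_size},
                "aggs": aggs,
            }
        },
    }

query = {
    "bool": {
        "must": [
            {"term": {"tags.keyword": "possiblePurchase"}},
            {"term": {"clientName": "Ci"}},
            {"range": {"firstSeenDate": {"gte": "now-30d"}}},
        ],
        "must_not": [{"term": {"tags.keyword": "skipPurchase"}}],
    }
}
aggs = {"byClient": {"terms": {"field": "clientName", "size": 25}}}
body = sample_then_aggregate(query, aggs)
```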

Aggregation not taking place on basis of size parameter passed in ES query

My ES query looks like this. I am trying to get the average rating for the first 9 hits (from 0, size 9), but ES is taking the average of all the records.
GET review/analytics/_search
{
"_source": "r_id",
"from": 0,
"size": 9,
"query": {
"bool": {
"filter": [
{
"terms": {
"b_id": [
236611
]
}
},
{
"range": {
"r_date": {
"gte": "1970-01-01 05:30:00",
"lte": "2019-08-13 17:13:17",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
},
{
"terms": {
"s_type": [
"aggregation",
"organic",
"survey"
]
}
},
{
"bool": {
"must_not": [
{
"terms": {
"s_id": [
392
]
}
}
]
}
},
{
"term": {
"status": 2
}
},
{
"bool": {
"must_not": [
{
"terms": {
"ba_id": []
}
}
]
}
}
]
}
},
"sort": [
{
"featured": {
"order": "desc"
}
},
{
"r_date": {
"order": "desc"
}
}
],
"aggs": {
"avg_rating": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"rtng": 0
}
}
]
}
},
"aggs": {
"rtng": {
"avg": {
"field": "rtng"
}
}
}
},
"avg_rating1": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"rtng": 0
}
}
]
}
},
"aggs": {
"rtng": {
"avg": {
"field": "rtng"
}
}
}
}
}
}
The query result shows a doc_count of 43, whereas I want it to be 9 so that I can calculate the average correctly. I have specified the size above. The hits seem to be returned correctly, but the aggregation result is not what I expect.
from and size have no impact on the aggregations. They only define how many documents will be returned in the hits.hits array.
Aggregations always run on the whole document set selected by whatever query is in your query section.
If you know the IDs of the "first" nine documents, you can add a terms query in your query so that only those 9 documents are selected and so that the average rating is only computed on those 9 documents.
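One way to apply this is a two-step approach: run the original search to get the first page, then build a second request restricted to those documents. The sketch below only constructs the second request body. It assumes that r_id uniquely identifies a review and that the first response has the usual hits.hits layout; the helper name average_request and the sample ids are invented for the example.

```python
def average_request(first_page, id_field="r_id", rating_field="rtng"):
    """Build a follow-up search that averages ratings over the given page only."""
    ids = [hit["_source"][id_field] for hit in first_page["hits"]["hits"]]
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {
            "bool": {
                "filter": [{"terms": {id_field: ids}}],
                # Same exclusion as the avg_rating filter in the question.
                "must_not": [{"term": {rating_field: 0}}],
            }
        },
        "aggs": {"avg_rating": {"avg": {"field": rating_field}}},
    }

# Stand-in for the response of the original search (from=0, size=9).
first_page = {"hits": {"hits": [{"_source": {"r_id": n}} for n in (11, 12, 13)]}}
followup = average_request(first_page)
```

With only nine documents involved, computing the average on the client from the returned hits would also work, provided rtng is included in _source.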

How to get aggregations and sort based on child document hits in elasticsearch

I have documents indexed like so:
{
"attrib": "value", // etc
"prices": [
{
"p": 10,
"d": "2016-01-01"
},
{
"p": 20,
"d": "2016-01-02"
},
{
"p": 30,
"d": "2016-01-03"
},
{
"p": 40,
"d": "2016-01-04"
}
]
}
I would like to get aggregation buckets to tell me something like this:
Price Buckets
prices.p between 1 and 10 (20)
prices.p between 11 and 20 (22)
prices.p between 21 and 30 (2)
Date Buckets
prices.d between Jan 1 and Jan 30 (20)
prices.d between Feb 1 and Feb 28 (22)
prices.d between Mar 1 and Mar 30 (2)
Where the count would show the number of parent documents that have prices.X between X and Y, NOT the number of prices in total.
Secondary to this, if I wanted to perform a filter to only get documents with prices.p between 1 and 30, I'd need the aggregation to reflect this.
Thirdly, I'd like to be able to order my results by the top child hit of the result.
So in plain English, my query would be:
"Find me all documents with at least one price between X and Y having a date between A and B, order the results by price (or date as required)"
My query so far:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"nested": {
"filter": {
"bool": {
"must": [
{
"range": {
"prices.p": {
"gte": 1,
"lte": 30
}
}
}
]
}
},
"path": "prices",
"inner_hits": {
"sort": [
"p"
]
}
}
}
]
}
},
"query": {
"match_all": {}
}
}
}
}
This returns documents in the default sort order, but with the inner_hits sorted by prices.p, so I can display the lowest price for an item along with the date for that price (prices.d).
Similarly, I'd like to be able to filter where prices.d is between two dates - also aggregating the dates.
Lastly, I'd like to be able to order my full document hits by the first inner hit (p or d)
You have to use nested aggregations. To build the two sets of buckets, you have to run two parallel nested aggregations.
To filter your buckets further, you can add a parent query, which will filter your document set as well as your buckets.
The query follows. I changed the type of the nested d field to integer for simplicity, but this will work for a date range as well.
{
"aggs": {
"p_range": {
"nested": {
"path": "prices"
},
"aggs": {
"p_nested_range": {
"range": {
"field": "prices.p",
"ranges": [
{
"from": 0,
"to": 1000
},
{
"from": 1000,
"to": 2000
}
]
}
}
}
},
"d_range" :{
"nested": {
"path": "prices"
},
"aggs": {
"d_nested_range": {
"range": {
"field": "prices.d",
"ranges": [
{
"from": 0,
"to": 500
},
{
"from": 500,
"to": 1000
}
]
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"nested": {
"filter": {
"bool": {
"must": [
{
"range": {
"prices.p": {
"gte": 200,
"lte": 1400
}
}
}
]
}
},
"path": "prices",
"inner_hits": {
"sort": [
"prices.p"
]
}
}
}
]
}
},
"query": {
"match_all": {}
}
}
}
}
Furthermore, if you want to filter only the document set without the filter affecting your buckets, you can take a look at post_filter.
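As a rough illustration of post_filter, the request below keeps the main query broad so the aggregation sees every document, and moves the price restriction into post_filter so that it only narrows the returned hits. It is written as a Python dict and uses the current nested query syntax (a "query" key instead of the older "filter" key); only the price aggregation is repeated to keep the sketch short.

```python
# Nested range filter, written in the current nested-query syntax.
price_filter = {
    "nested": {
        "path": "prices",
        "query": {"range": {"prices.p": {"gte": 200, "lte": 1400}}},
    }
}

body = {
    # Broad query: the aggregation below is computed over all documents.
    "query": {"match_all": {}},
    "aggs": {
        "p_range": {
            "nested": {"path": "prices"},
            "aggs": {
                "p_nested_range": {
                    "range": {
                        "field": "prices.p",
                        "ranges": [
                            {"from": 0, "to": 1000},
                            {"from": 1000, "to": 2000},
                        ],
                    }
                }
            },
        }
    },
    # Applied after aggregations are computed, so it only affects the hits.
    "post_filter": price_filter,
}
```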
Edit: to sort parent documents based on the first inner_hit inside the prices nested type, use the following query.
You don't need a sort clause inside inner_hits, because a sort inside inner_hits only orders the nested hits, not the parent documents.
{
"aggs": {
"p_range": {
"nested": {
"path": "prices"
},
"aggs": {
"p_nested_range": {
"range": {
"field": "prices.p",
"ranges": [
{
"from": 0,
"to": 1000
},
{
"from": 1000,
"to": 2000
}
]
}
}
}
},
"d_range" :{
"nested": {
"path": "prices"
},
"aggs": {
"d_nested_range": {
"range": {
"field": "prices.d",
"ranges": [
{
"from": 0,
"to": 500
},
{
"from": 500,
"to": 1000
}
]
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"nested": {
"filter": {
"bool": {
"must": [
{
"range": {
"prices.p": {
"gte": 200,
"lte": 1400
}
}
}
]
}
},
"path": "prices",
"inner_hits": {
}
}
}
]
}
},
"query": {
"match_all": {}
}
}
},
"sort": {
"_script" : {
"type" : "number",
"script" : {
"inline": "_source.prices[0].p"
},
"order" : "asc"
}
}
}

sorting elasticsearch top hits results

I am trying to execute a query in Elasticsearch to get results for specific users within a certain date range. The results should be grouped by userId and sorted on the trackTime field. I am able to group using an aggregation, but I am not able to sort the aggregation buckets on trackTime. I wrote the following query:
GET _search
{
"size": 0,
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"range": {
"trackTime": {
"from": "2016-02-08T05:51:02.000Z"
}
}
}
]
}
},
"filter": {
"terms": {
"userId": [
9,
10,
3
]
}
}
}
},
"aggs": {
"by_district": {
"terms": {
"field": "userId"
},
"aggs": {
"tops": {
"top_hits": {
"size": 2
}
}
}
}
}
}
What more do I need to sort the top hits results? Thanks in advance.
You can add a sort to the top_hits aggregation like this, replacing fieldName with the field to sort on (trackTime in your case):
"aggs": {
"by_district": {
"terms": {
"field": "userId"
},
"aggs": {
"tops": {
"top_hits": {
"sort": [
{
"fieldName": {
"order": "desc"
}
}
],
"size": 2
}
}
}
}
}
Hope it helps
