How to get aggregations and sort based on child document hits in Elasticsearch

I have documents indexed like so:
{
"attrib": "value", // etc
"prices": [
{
"p": 10,
"d": "2016-01-01"
},
{
"p": 20,
"d": "2016-01-02"
},
{
"p": 30,
"d": "2016-01-03"
},
{
"p": 40,
"d": "2016-01-04"
}
]
}
I would like to get aggregation buckets to tell me something like this:
Price Buckets
prices.p between 1 and 10 (20)
prices.p between 11 and 20 (22)
prices.p between 21 and 30 (2)
Date Buckets
prices.d between Jan 1 and Jan 30 (20)
prices.d between Feb 1 and Feb 28 (22)
prices.d between Mar 1 and Mar 30 (2)
Where the count would show the number of parent documents that have prices.X between X and Y, NOT the number of prices in total.
Secondary to this, if I wanted to perform a filter to only get documents with prices.p between 1 and 30, I'd need the aggregation to reflect this.
Thirdly, I'd like to be able to order my results by the top child hit of the result.
So in plain English, my query would be:
"Find me all documents with at least one price between X and Y having a date between A and B, order the results by price (or date as required)"
My query so far:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"nested": {
"filter": {
"bool": {
"must": [
{
"range": {
"prices.p": {
"gte": 1,
"lte": 30
}
}
}
]
}
},
"path": "prices",
"inner_hits": {
"sort": [
"p"
]
}
}
}
]
}
},
"query": {
"match_all": {}
}
}
}
}
This returns documents in the default sort order, but with the inner_hits sorted by prices.p, so I can display the lowest price for an item along with the date for that price (prices.d).
Similarly, I'd like to be able to filter where prices.d is between two dates, and to aggregate on the dates as well.
Lastly, I'd like to be able to order the full document hits by the first inner hit (p or d).

You have to use nested aggregations. To build the two sets of buckets (price and date), run two nested aggregations side by side.
To narrow the buckets further, add a top-level query: it filters the document set and, with it, the documents the aggregations see.
The query follows. For simplicity I changed the type of the nested d field to integer, but the same approach works with a date range.
{
"aggs": {
"p_range": {
"nested": {
"path": "prices"
},
"aggs": {
"p_nested_range": {
"range": {
"field": "prices.p",
"ranges": [
{
"from": 0,
"to": 1000
},
{
"from": 1000,
"to": 2000
}
]
}
}
}
},
"d_range" :{
"nested": {
"path": "prices"
},
"aggs": {
"d_nested_range": {
"range": {
"field": "prices.d",
"ranges": [
{
"from": 0,
"to": 500
},
{
"from": 500,
"to": 1000
}
]
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"nested": {
"filter": {
"bool": {
"must": [
{
"range": {
"prices.p": {
"gte": 200,
"lte": 1400
}
}
}
]
}
},
"path": "prices",
"inner_hits": {
"sort": [
"prices.p"
]
}
}
}
]
}
},
"query": {
"match_all": {}
}
}
}
}
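Note that the range buckets in the query above count matching nested price entries, not parent documents, while the question asks for parent counts. A minimal sketch of one way to get them, assuming the `reverse_nested` aggregation is available in your Elasticsearch version: the helper name `parent_count_aggs` and the sub-aggregation name `parents` are illustrative and not part of the original answer.

```python
import json


def parent_count_aggs(path, field, ranges):
    """Build a nested range aggregation whose buckets also report parent counts."""
    return {
        "p_range": {
            "nested": {"path": path},
            "aggs": {
                "p_nested_range": {
                    "range": {"field": field, "ranges": ranges},
                    "aggs": {
                        # reverse_nested joins back to the parent document,
                        # so its doc_count is the number of parents per bucket.
                        "parents": {"reverse_nested": {}}
                    },
                }
            },
        }
    }


aggs = parent_count_aggs(
    "prices",
    "prices.p",
    [{"from": 0, "to": 1000}, {"from": 1000, "to": 2000}],
)
# Request body that could be sent to the _search endpoint.
print(json.dumps({"size": 0, "aggs": aggs}, indent=2))
```

In the response, each range bucket would then carry its own `doc_count` (nested entries) plus `parents.doc_count` (parent documents).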
Furthermore, if you want to filter only the returned documents without the filter affecting your buckets, take a look at post_filter.
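A minimal sketch of how a request using `post_filter` could be assembled, assuming the same `prices` nested field as above. The helper name `search_with_post_filter` is hypothetical, and the nested clause uses the `query` form, which may need adjusting for very old Elasticsearch releases.

```python
def search_with_post_filter(query, aggs, post_filter):
    """Aggregations run on everything matched by `query`;
    `post_filter` narrows only the returned hits afterwards."""
    return {"query": query, "aggs": aggs, "post_filter": post_filter}


body = search_with_post_filter(
    query={"match_all": {}},
    aggs={
        "p_range": {
            "nested": {"path": "prices"},
            "aggs": {
                "p_nested_range": {
                    "range": {
                        "field": "prices.p",
                        "ranges": [{"from": 0, "to": 1000}, {"from": 1000, "to": 2000}],
                    }
                }
            },
        }
    },
    # Applied after the aggregations are computed, so bucket counts are unaffected.
    post_filter={
        "nested": {
            "path": "prices",
            "query": {"range": {"prices.p": {"gte": 200, "lte": 1400}}},
        }
    },
)
```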
Edit - To sort the parent documents by the first entry in the prices nested field, use the following query.
You don't need a sort clause inside inner_hits; that sort only orders the nested entries within each hit, not the parent documents.
{
"aggs": {
"p_range": {
"nested": {
"path": "prices"
},
"aggs": {
"p_nested_range": {
"range": {
"field": "prices.p",
"ranges": [
{
"from": 0,
"to": 1000
},
{
"from": 1000,
"to": 2000
}
]
}
}
}
},
"d_range" :{
"nested": {
"path": "prices"
},
"aggs": {
"d_nested_range": {
"range": {
"field": "prices.d",
"ranges": [
{
"from": 0,
"to": 500
},
{
"from": 500,
"to": 1000
}
]
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"nested": {
"filter": {
"bool": {
"must": [
{
"range": {
"prices.p": {
"gte": 200,
"lte": 1400
}
}
}
]
}
},
"path": "prices",
"inner_hits": {
}
}
}
]
}
},
"query": {
"match_all": {}
}
}
},
"sort": {
"_script" : {
"type" : "number",
"script" : {
"inline": "_source.prices[0].p"
},
"order" : "asc"
}
}
}
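The script above reads `prices[0]` from the stored source, which is the first array entry and not necessarily the lowest or the matching price. A hedged alternative is a nested sort with a filter. The sketch below builds such a sort clause; the helper name `nested_price_sort` is hypothetical, and the option names are an assumption about your version: older releases use `nested_path` and `nested_filter`, while newer ones (6.1 onwards) use a `nested` object with `path` and `filter`.

```python
def nested_price_sort(low, high, order="asc", legacy=True):
    """Sort parents by the min (asc) or max (desc) prices.p among entries in [low, high]."""
    price_filter = {"range": {"prices.p": {"gte": low, "lte": high}}}
    options = {"order": order, "mode": "min" if order == "asc" else "max"}
    if legacy:
        # Assumed syntax for older Elasticsearch releases.
        options["nested_path"] = "prices"
        options["nested_filter"] = price_filter
    else:
        # Assumed syntax for Elasticsearch 6.1 and later.
        options["nested"] = {"path": "prices", "filter": price_filter}
    return [{"prices.p": options}]


sort_clause = nested_price_sort(200, 1400)
```

The returned list would replace the `"sort"` section of the request, so only prices inside the filtered range take part in the ordering.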

Related

Aggregation not taking place on basis of size parameter passed in ES query

My ES query looks like this. I am trying to get average rating for indexes starting from 0 to 9. But ES is taking the average of all the records.
GET review/analytics/_search
{
"_source": "r_id",
"from": 0,
"size": 9,
"query": {
"bool": {
"filter": [
{
"terms": {
"b_id": [
236611
]
}
},
{
"range": {
"r_date": {
"gte": "1970-01-01 05:30:00",
"lte": "2019-08-13 17:13:17",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
},
{
"terms": {
"s_type": [
"aggregation",
"organic",
"survey"
]
}
},
{
"bool": {
"must_not": [
{
"terms": {
"s_id": [
392
]
}
}
]
}
},
{
"term": {
"status": 2
}
},
{
"bool": {
"must_not": [
{
"terms": {
"ba_id": []
}
}
]
}
}
]
}
},
"sort": [
{
"featured": {
"order": "desc"
}
},
{
"r_date": {
"order": "desc"
}
}
],
"aggs": {
"avg_rating": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"rtng": 0
}
}
]
}
},
"aggs": {
"rtng": {
"avg": {
"field": "rtng"
}
}
}
},
"avg_rating1": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"rtng": 0
}
}
]
}
},
"aggs": {
"rtng": {
"avg": {
"field": "rtng"
}
}
}
}
}
}
The query result shows a doc_count of 43, whereas I want it to be 9 so that the average is calculated over those nine documents only. I have specified the size above. The hits are returned correctly, but the aggregation result is not what I expect.
from and size have no impact on the aggregations. They only define how many documents will be returned in the hits.hits array.
Aggregations always run on the whole document set selected by whatever query is in your query section.
If you know the IDs of the "first" nine documents, you can add a terms query in your query so that only those 9 documents are selected and so that the average rating is only computed on those 9 documents.
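A minimal sketch of that two-step approach, assuming the first search has already returned the nine hits: the helper name `avg_rating_for_hits` and the sample response are made up for illustration, and the field names (`rtng`) are taken from the question.

```python
def avg_rating_for_hits(first_response):
    """Build a follow-up query that averages rtng only over the hits already returned."""
    ids = [hit["_id"] for hit in first_response["hits"]["hits"]]
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {"bool": {"filter": [{"terms": {"_id": ids}}]}},
        "aggs": {
            "avg_rating": {
                # Same exclusion of zero ratings as in the original aggregation.
                "filter": {"bool": {"must_not": [{"term": {"rtng": 0}}]}},
                "aggs": {"rtng": {"avg": {"field": "rtng"}}},
            }
        },
    }


# Made-up response fragment standing in for the first search result.
sample_response = {"hits": {"hits": [{"_id": "a1"}, {"_id": "b2"}]}}
follow_up = avg_rating_for_hits(sample_response)
```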

Elastic search find difference in a field using range query

I need to find out how many KWH were consumed between two given times. At the moment I run two queries, sorted asc and desc, to get the first and last records in the time range, and subtract one value from the other to get the KWH for that period. Is there a way to get the KWH value without two queries?
Range query:
{
"query": {
"bool": {
"must": [
{
"range": {
"createdtime": {
"gte": "1566757800000",
"lte": "1566844199000",
"boost": 2.0
}
}
},
{
"match": {
"meter_id": 101
}
}
]
}
},
"size" : 1,
"from": 0,
"sort": { "createdtime" : {"order" : "desc"} }
}
The other query is almost the same, except that the order is asc.
Each of the two queries returns one record, and I subtract one value from the other in the result set to find the difference.
You could run one query only and use top_hits aggregation to extract the "first" and "last" value, but it won't calculate the difference. You'd still have to do it outside Elasticsearch.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"createdtime": {
"gte": "1566757800000",
"lte": "1566844199000",
"boost": 2.0
}
}
},
{
"match": {
"meter_id": 101
}
}
]
}
},
"aggs": {
"range": {
"filter": {
"range": {
"createdtime": {
"gte": "1566757800000",
"lte": "1566844199000"
}
}
},
"aggs": {
"min": {
"top_hits": {
"sort": [{"createdtime": {"order": "asc"}}],
"_source": {"includes": [ "kwh_value" ]},
"size" : 1
}
},
"max": {
"top_hits": {
"sort": [{"createdtime": {"order": "desc"}}],
"_source": {"includes": [ "kwh_value" ]},
"size" : 1
}
}
}
}
}
}
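Since the subtraction still has to happen on the client, here is a minimal sketch of that step. It assumes the response follows the usual top_hits layout and the aggregation names used above (`range`, `min`, `max`); the helper name `kwh_difference` and the sample values are made up for illustration.

```python
def kwh_difference(response):
    """Return last kwh_value minus first kwh_value, or None if no records matched."""
    bucket = response["aggregations"]["range"]
    first_hits = bucket["min"]["hits"]["hits"]
    last_hits = bucket["max"]["hits"]["hits"]
    if not first_hits or not last_hits:
        return None
    first = first_hits[0]["_source"]["kwh_value"]
    last = last_hits[0]["_source"]["kwh_value"]
    return last - first


# Made-up response fragment with the same shape as a top_hits result.
sample = {
    "aggregations": {
        "range": {
            "doc_count": 2,
            "min": {"hits": {"hits": [{"_source": {"kwh_value": 120.5}}]}},
            "max": {"hits": {"hits": [{"_source": {"kwh_value": 150.0}}]}},
        }
    }
}
print(kwh_difference(sample))  # 29.5
```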

Need aggregation on document inner array object - ElasticSearch

I am trying to do aggregation over the following document
{
"pid": 900000,
"mid": 9000,
"cid": 90,
"bid": 1000,
"gmv": 1000000,
"vol": 200,
"data": [
{
"date": "25-11-2018",
"gmv": 100000,
"vol": 20
},
{
"date": "24-11-2018",
"gmv": 100000,
"vol": 20
},
{
"date": "23-11-2018",
"gmv": 100000,
"vol": 20
}
]
}
The analysis which needs to be done here is:
Filter on mid or/and cid on all documents
Filter range on data.date for last 7 days and sum data.vol over that range for each pid
sort the documents over the sum obtained in previous step in desc order
Group these results by pid.
This means we are trying to get top products by sum of the volume (quantity sold) within a date range for specific cid/mid.
Here PID refers to the product ID, MID to the merchant ID, and CID to the category ID.
First, you need to change your mapping so that the query can run on nested fields:
change the type of the field 'data' to 'nested'.
Then you can use a range query in the filter, along with a terms filter on mid/cid, to select the data. Once you have the correct data set, aggregate on pid with a sub-aggregation that sums vol.
Here is the query.
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"nested": {
"path": "data",
"query": {
"range": {
"data.date": {
"gte": "25-11-2018",
"lte": "28-11-2018"
}
}
}
}
},
{
"terms": {
"mid": [
"9000"
]
}
}
]
}
}
]
}
},
"aggs": {
"AGG_PID": {
"terms": {
"field": "pid",
"order": {
"NESTED_DATA>TOTAL_SUM": "desc"
},
"min_doc_count": 1
},
"aggs": {
"NESTED_DATA": {
"nested": {
"path": "data"
},
"aggs": {
"TOTAL_SUM": {
"sum": {
"field": "data.vol"
}
}
}
}
}
}
}
}
You can modify the query as needed. Hope this helps.
Here is a nested aggregation query that sorts the "pid" buckets by the sum of "vol". You can add any number of filters in the query part.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"mid": "2"
}
}
]
}
},
"aggs": {
"top_products_sorted_by_order_volume": {
"terms": {
"field": "pid",
"order": {
"nested_data_object>order_volume_by_range>order_volume_sum": "desc"
}
},
"aggs": {
"nested_data_object": {
"nested": {
"path": "data"
},
"aggs": {
"order_volume_by_range": {
"filter": {
"range": {
"data.date": {
"gte": "2018-11-26",
"lte": "2018-11-27"
}
}
},
"aggs": {
"order_volume_sum": {
"sum": {
"field": "data.vol"
}
}
}
}
}
}
}
}
}
}
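The question asks for the last 7 days, while the query above hard-codes its dates. A small sketch of how the range bounds could be computed, assuming an inclusive 7-day window ending today: the helper name `last_n_days_range` is hypothetical, and the date format must match whatever format the `data.date` field is mapped with.

```python
from datetime import date, timedelta


def last_n_days_range(today, days=7, fmt="%Y-%m-%d"):
    """Return gte/lte bounds for an inclusive window of `days` days ending on `today`."""
    start = today - timedelta(days=days - 1)
    return {"gte": start.strftime(fmt), "lte": today.strftime(fmt)}


# Bounds in the ISO format used by the query above.
bounds = last_n_days_range(date(2018, 11, 25))
# Bounds in the dd-MM-yyyy format used by the sample document.
bounds_dmy = last_n_days_range(date(2018, 11, 25), fmt="%d-%m-%Y")
```

The resulting dict can be dropped in as the value of `"data.date"` inside the range filter.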

Elasticsearch - adding a separate query for aggregation

Below is the Elasticsearch query I am using to get the results, plus the filter options for those results from the aggregation. The problem is that whenever someone applies a filter, the overall result changes, and hence the filter options change too. I do not want the filter options to change unless the query parameter changes. For now I am making two calls:
Get all results without aggregation
Get all filters by using aggregation and setting the size parameter to 0
This approach uses two API requests and hence doubles the time. Can this be done in one request only?
First call: All results without aggregation
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "cooking",
"boost": 2,
"slop": 10
}
}
},
{
"match": {
"title": {
"query": "cooking",
"boost": 1
}
}
}
],
"minimum_should_match": 1,
"filter": [
{
"match": {
"is_paid": false
}
}
]
}
},
"sort": [],
"from": 0,
"size": 15
}
Second call: getting filters
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "cooking",
"boost": 2,
"slop": 10
}
}
},
{
"match": {
"title": {
"query": "cooking",
"boost": 1
}
}
}
],
"minimum_should_match": 1
}
},
"size": 0,
"aggs": {
"courseCount": {
"terms": {
"field": "provider",
"size": 100
}
},
"paidCount": {
"terms": {
"field": "is_paid",
"size": 3
}
},
"subjectCount": {
"terms": {
"field": "subject",
"size": 30
}
},
"levelCount": {
"terms": {
"field": "level",
"size": 4
}
},
"pacingCount": {
"terms": {
"field": "pacing_type",
"size": 4
}
}
}
}

ElasticSearch - significant term aggregation with range

I would like to know how to add a range to a significant terms aggregation query. For example:
{
"query": {
"terms": {
"text_content": [
"searchTerm"
]
},
"range": {
"dateField": {
"from": "date1",
"to": "date2"
}
}
},
"aggregations": {
"significantQTypes": {
"significant_terms": {
"field": "field1",
"size": 10
}
}
},
"size": 0
}
This will not work. Any suggestions on how to specify the range?
Instead of a range query, use a range filter, since relevance/score doesn't seem to matter in your case.
Then, to combine your query with the range filter, use a filtered query (see the documentation).
Try something like this:
{
"query": {
"filtered": {
"query": {
"terms": {
"text_content": [
"searchTerm"
]
}
},
"filter": {
"range": {
"dateField": {
"from": "date1",
"to": "date2"
}
}
}
}
},
"aggs": {
"significantQTypes": {
"significant_terms": {
"field": "field1",
"size": 10
}
}
},
"size": 0
}
Hope this helps!
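The `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions, a `bool` query with a `filter` clause plays the same role. A minimal sketch under that assumption: the helper name `terms_with_date_range` is hypothetical, and `gte`/`lte` are used in place of `from`/`to`.

```python
def terms_with_date_range(search_terms, date_field, start, end):
    """Combine a terms query with a non-scoring date range and a significant_terms aggregation."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [{"terms": {"text_content": search_terms}}],
                # Clauses under "filter" restrict the documents without affecting the score.
                "filter": [{"range": {date_field: {"gte": start, "lte": end}}}],
            }
        },
        "aggs": {
            "significantQTypes": {
                "significant_terms": {"field": "field1", "size": 10}
            }
        },
    }


body = terms_with_date_range(["searchTerm"], "dateField", "date1", "date2")
```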
