Elasticsearch: aggregate top 3 most common results

My indexed data has the structure below. I want to aggregate the top 3 most repeated productProperty values, so that the 3 most repeated productProperty values appear in the aggregation result:
[
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "type",
productValue:[{value: 26A},{value: 23A}] ,
},
{
productProperty: "type",
productValue:[{value: 22B},{value: 90C}] ,
},
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "age_rating",
productValue:[{value: 18},{value: 13}] ,
}
]
The query below aggregates everything based on productProperty, but how can I get the top 3 results out of that?
{
"query": {},
"aggs": {
"filtered_product_property": {
"filter": {
"bool": {
"must": []
}
},
"aggs": {
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty"
}
}
}
}
}
}
}

You can use the size parameter in your terms aggregation.
{
"query": {},
"aggs": {
"filtered_product_property": {
"filter": {
"bool": {
"must": []
}
},
"aggs": {
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty",
"size" : 3
}
}
}
}
}
}
}
It is important to point out that terms aggregations are not perfectly accurate in some cases, because counts are computed per shard and then merged.
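If accuracy matters, one option is the shard_size parameter of the terms aggregation: each shard then returns more candidate terms before results are merged, which reduces the counting error. A minimal sketch (the shard_size value is illustrative):
{
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty",
"size": 3,
"shard_size": 50 <-- illustrative value
}
}
}
}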

As mentioned by @Tushar, you can use the size param. According to the official ES documentation:
when there are lots of unique terms, Elasticsearch only returns the
top terms; this number is the sum of the document counts for all
buckets that are not part of the response
However, you can define how the aggregation response is sorted using the order param.
By default, buckets are sorted by doc count in descending order.
The search query will be:
{
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty.keyword",
"size": 3
}
}
}
}
And the search result would be:
"aggregations": {
"productProperty": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "material",
"doc_count": 3
},
{
"key": "type",
"doc_count": 2
},
{
"key": "age_rating",
"doc_count": 1
}
]
}
}
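For example, to sort the buckets by term instead of doc count, the order param could be set like this (an illustrative sketch; _key ordering assumes a reasonably recent ES version):
{
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty.keyword",
"size": 3,
"order": { "_key": "asc" }
}
}
}
}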

Related

ElasticSearch cardinality aggregation with multiple query

I have a document with merchant and item. My document will look like:
{
"merchant": "M1",
"item": "I1"
}
For a given list of merchant names, I want to get the number of unique items for each merchant.
I was able to get the number of unique items for a given merchant with the following query:
{
"size": 0,
"query": {
"match": {
"merchant": "M1"
}
},
"aggs": {
"count_unique_items": {
"cardinality": {
"field": "I1"
}
}
}
}
Is there a way to expand this query so that instead of 1 merchant, I can search for N merchants with one query?
You need to use a terms query to match multiple merchants, and a multi-level aggregation to find the unique count per merchant. So create a terms aggregation for merchant and then add a cardinality aggregation as a sub-aggregation of the terms aggregation. The query will look like below:
{
"size": 0,
"query": {
"terms": {
"merchant": [
"M1",
"M2"
]
}
},
"aggs": {
"merchent": {
"terms": {
"field": "merchant"
},
"aggs": {
"item_count": {
"cardinality": {
"field": "item"
}
}
}
}
}
}
As suggested by @Opster ES Ninja Nishant, you need to use a multi-level aggregation.
Adding a working example with index data, search query, and search result.
Index Data:
{
"merchant": "M3",
"item": ["I3","I2"]
}
{
"merchant": "M2",
"item": ["I2","I2"]
}
{
"merchant": "M1",
"item": "I1"
}
Search Query:
To count the unique number of items for a given merchant, you should use the item field in the cardinality aggregation instead of I1:
{
"size":0,
"query": {
"terms": {
"merchant.keyword": [
"M1",
"M2",
"M3"
]
}
},
"aggs": {
"merchent": {
"terms": {
"field": "merchant.keyword"
},
"aggs": {
"item_count": {
"cardinality": {
"field": "item.keyword" <-- note this
}
}
}
}
}
}
Search Result:
"aggregations": {
"merchent": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "M1",
"doc_count": 1,
"item_count": {
"value": 1
}
},
{
"key": "M2",
"doc_count": 1,
"item_count": {
"value": 1
}
},
{
"key": "M3",
"doc_count": 1,
"item_count": {
"value": 2
}
}
]
}
}
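Note that cardinality is an approximate metric. If near-exact counts matter for low-cardinality fields, the cardinality aggregation's precision_threshold parameter can be raised, e.g. (a sketch; 1000 is an illustrative value):
{
"size": 0,
"aggs": {
"merchent": {
"terms": {
"field": "merchant.keyword"
},
"aggs": {
"item_count": {
"cardinality": {
"field": "item.keyword",
"precision_threshold": 1000 <-- illustrative value
}
}
}
}
}
}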

Filtering aggregation results

This question is a subquestion of this question. Posting as a separate question for attention.
Sample Docs:
{
"id":1,
"product":"p1",
"cat_ids":[1,2,3]
}
{
"id":2,
"product":"p2",
"cat_ids":[3,4,5]
}
{
"id":3,
"product":"p3",
"cat_ids":[4,5,6]
}
Ask: to get products belonging to a particular category, e.g. cat_id = 3.
Query:
GET product/_search
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cats",
"size": 10
},"aggs": {
"products": {
"terms": {
"field": "name.keyword",
"size": 10
}
}
}
}
}
}
Question:
How can I filter the aggregated result for cat_id = 3 here? I tried bucket_selector as well, but it is not working.
Note: because cat_ids is multi-valued, filtering first and then aggregating doesn't work.
You can filter which buckets get created. From the ES documentation:
It is possible to filter the values for which buckets will be created.
This can be done using the include and exclude parameters which are
based on regular expression strings or arrays of exact values.
Additionally, include clauses can filter using partition expressions.
Adding a working example with index data, search query, and search result
Index Data:
{
"id":1,
"product":"p1",
"cat_ids":[1,2,3]
}
{
"id":2,
"product":"p2",
"cat_ids":[3,4,5]
}
{
"id":3,
"product":"p3",
"cat_ids":[4,5,6]
}
Search Query:
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cat_ids",
"include": [ <-- note this
3
]
},
"aggs": {
"products": {
"terms": {
"field": "product.keyword",
"size": 10
}
}
}
}
}
}
Search Result:
"aggregations": {
"cats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "p1",
"doc_count": 1
},
{
"key": "p2",
"doc_count": 1
}
]
}
}
]
}
}
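Besides arrays of exact values, include also accepts regular expressions and, for very large keysets, partition expressions. A sketch of the partition form (the values are illustrative):
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cat_ids",
"size": 10,
"include": {
"partition": 0,
"num_partitions": 20
}
}
}
}
}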

Filter on bucket key and doc_count in Elasticsearch

I have an index which has multiple documents. Now I want to write a query in Elasticsearch which will allow me to filter on bucket key and doc_count.
{
"aggs": {
"genres": {
"terms": {
"field": "event.keyword"
}
}
}
}
"aggregations": {
"genres": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33,
"buckets": [
{
"key": "eone",
"doc_count": 5
}
,
{
"key": "etwo",
"doc_count": 2
}
]
}
}
I want to write a query by which I can apply a filter on key name and doc count. Suppose I want to get the result for which the key is eone and the doc count is 5; then I should only get the result matching this criteria.
You can try min_doc_count like below:
{
"aggs": {
"genres": {
"terms": {
"field": "event.keyword",
"min_doc_count": 5
}
}
}
}
By using a filter query together with min_doc_count:
GET index_name/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"match": {
"event.keyword": "eone"
}
}
]
}
}
}
},
"aggs": {
"genres": {
"terms": {
"field": "event.keyword",
"min_doc_count": 5
}
}
}
}
Or by using include along with min_doc_count like below:
GET index_name/_search
{
"size": 0,
"aggs": {
"genres": {
"terms": {
"field": "event.keyword",
"min_doc_count": 5,
"include" : "eone"
}
}
}
}
See more: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_minimum_document_count_4
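If the doc count must match exactly (not just a minimum), a bucket_selector pipeline aggregation is another option. This is a sketch, not from the original answers:
{
"size": 0,
"aggs": {
"genres": {
"terms": {
"field": "event.keyword",
"include": "eone"
},
"aggs": {
"exact_count": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": "params.count == 5"
}
}
}
}
}
}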

Can ElasticSearch aggregate over top N items in each sorted bucket

I have this query that buckets the records by data source code, and computes an average over all records in each bucket.
How could I modify it so that each bucket is limited to (at most) the top N records when ordered by record.timestamp desc (or any other record field, for that matter)?
The end effect I want is an average per bucket over only the most recent N records rather than all records (so the doc_count in each bucket would have an upper limit of N).
I've searched and experimented extensively with no success.
Current query:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"jobType": "LiveEventScoring"
}
},
{
"term": {
"host": "MTVMDANS"
}
},
{
"term": {
"measurement": "EventDataLoadFromCacheDuration"
}
}
]
}
}
}
},
"aggs": {
"data-sources": {
"terms": {
"field": "dataSourceCode"
},
"aggs": {
"avgDuration": {
"avg": {
"field": "elapsedMs"
}
}
}
}
}
}
Results in:
"aggregations": {
"data-sources": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "AU_VIRT",
"doc_count": 6259,
"avgDuration": {
"value": 3525.683176226234
}
},
{
"key": "AU_HN_VIRT",
"doc_count": 2812,
"avgDuration": {
"value": 3032.0771692745375
}
},
{
"key": "GB_VIRT",
"doc_count": 1845,
"avgDuration": {
"value": 1432.39945799458
}
}
]
}
}
}
Alternatively, if grabbing the top N from a sorted bucket is not possible, I could do multiple queries, one for each dataSourceCode, e.g. for AU_VIRT:
{
"size":0,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"jobType": "LiveEventScoring"
}
},
{
"term": {
"host": "MTVMDANS"
}
},
{
"term": {
"dataSourceCode": "AU_VIRT"
}
},
{
"term": {
"measurement": "EventDataLoadFromCacheDuration"
}
}
]
}
}
}
},
"aggs": {
"avgDuration": {
"avg": {
"field": "elapsedMs"
}
}
}
}
}
but I am now challenged with how to make avgDuration work on only the top N results sorted by timestamp desc.
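One partial workaround (an assumption on my part, not from this thread) is a top_hits sub-aggregation: it can return the N most recent records per bucket, but metric aggregations such as avg cannot run over its output, so the average over those records would have to be computed client-side. A sketch, assuming the timestamp field is named timestamp:
{
"size": 0,
"aggs": {
"data-sources": {
"terms": {
"field": "dataSourceCode"
},
"aggs": {
"most_recent": {
"top_hits": {
"sort": [ { "timestamp": { "order": "desc" } } ], <-- assumed field name
"_source": [ "elapsedMs" ],
"size": 100
}
}
}
}
}
}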

How to use ElasticSearch to bucket historical data from midnight to now?

So I have an index with timestamps in the following format:
2015-03-20T12:00:00+0500
What I would like to do in the SQL equivalent is the following:
select date(timestamp), sum(orders)
from data
where time(timestamp) < time(now)
group by date(timestamp)
I know I need an aggregation, but for now I've tried the basic search query below and I'm getting a malformed error:
{
"size": 0,
"query":
{
"filtered":
{
"query":
{
"match_all" : {}
},
"filter":
{
"range":
{
"#timestamp":
{
"from": "00:00:01.000",
"to": "15:00:00.000"
}
}
}
}
}
}
You do indeed want an aggregation, specifically the date histogram aggregation. Something like:
{
"query": {"match_all": {}},
"aggs": {
"by_date": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
},
"aggs": {
"order_sum": {
"sum": {"field": "foo"}
}
}
}
}
}
First you have a bucketing aggregation that groups your documents by date; then, inside that, a metric aggregation computes a value (in this case a sum) for each bucket.
This would return data of the form:
{
...
"aggregations": {
"by_date": {
"buckets": [
{
"key_as_string": "2015-03-01T00:00:00.000Z",
"key": 1425168000000,
"doc_count": 8644,
"order_sum": {
"value": 1234
}
},
{
"key_as_string": "2015-03-02T00:00:00.000Z",
"key": 1425254400000,
"doc_count": 8819,
"order_sum": {
"value": 45678
}
},
...
]
}
}
}
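To also restrict the documents being aggregated, as in the SQL's where clause, the date histogram can be combined with a query filter in the same request. A sketch (the SQL's per-day time-of-day comparison would need a script; this uses a plain range on the timestamp as an approximation):
{
"size": 0,
"query": {
"range": {
"timestamp": { "lt": "now" }
}
},
"aggs": {
"by_date": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
},
"aggs": {
"order_sum": {
"sum": { "field": "orders" }
}
}
}
}
}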
There is a good intro to aggregations on the elasticsearch blog (part 1 and part 2) if you want to do some more reading.
