Filtering aggregation results - elasticsearch

This is a sub-question of an earlier question. I am posting it separately for visibility.
Sample Docs:
{
"id":1,
"product":"p1",
"cat_ids":[1,2,3]
}
{
"id":2,
"product":"p2",
"cat_ids":[3,4,5]
}
{
"id":3,
"product":"p3",
"cat_ids":[4,5,6]
}
Ask: Get the products belonging to a particular category, e.g. cat_id = 3.
Query:
GET product/_search
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cat_ids",
"size": 10
},
"aggs": {
"products": {
"terms": {
"field": "product.keyword",
"size": 10
}
}
}
}
}
}
Question:
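How can I filter the aggregated result for cat_id = 3 here? I tried bucket_selector as well, but it did not work.
Note: Because cat_ids is multi-valued, filtering first and then aggregating does not work: a document that matches cat_id = 3 still contributes its other category values as buckets.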

The terms aggregation lets you restrict which values get a bucket. From the Elasticsearch documentation:
It is possible to filter the values for which buckets will be created.
This can be done using the include and exclude parameters which are
based on regular expression strings or arrays of exact values.
Additionally, include clauses can filter using partition expressions.
Adding a working example with index data, search query, and search result
Index Data:
{
"id":1,
"product":"p1",
"cat_ids":[1,2,3]
}
{
"id":2,
"product":"p2",
"cat_ids":[3,4,5]
}
{
"id":3,
"product":"p3",
"cat_ids":[4,5,6]
}
Search Query:
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cat_ids",
"include": [ <-- note this
3
]
},
"aggs": {
"products": {
"terms": {
"field": "product.keyword",
"size": 10
}
}
}
}
}
}
Search Result:
"aggregations": {
"cats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "p1",
"doc_count": 1
},
{
"key": "p2",
"doc_count": 1
}
]
}
}
]
}
}
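To see why include helps with a multi-valued field, the bucketing can be mimicked in plain Python. This is only a sketch of the logic over the three sample documents, not how Elasticsearch implements it; the function name terms_agg is made up for illustration.

```python
from collections import defaultdict

docs = [
    {"id": 1, "product": "p1", "cat_ids": [1, 2, 3]},
    {"id": 2, "product": "p2", "cat_ids": [3, 4, 5]},
    {"id": 3, "product": "p3", "cat_ids": [4, 5, 6]},
]

def terms_agg(docs, include=None):
    """Group products under every cat_ids value; keep only keys listed in include."""
    buckets = defaultdict(list)
    for doc in docs:
        for cat in doc["cat_ids"]:
            if include is None or cat in include:
                buckets[cat].append(doc["product"])
    return dict(buckets)

# Filtering documents first still leaves their other category values as buckets.
matching = [d for d in docs if 3 in d["cat_ids"]]
query_only = terms_agg(matching)

# Restricting the bucket keys, as include does, leaves only category 3.
with_include = terms_agg(docs, include=[3])

print(sorted(query_only))  # [1, 2, 3, 4, 5]
print(with_include)        # {3: ['p1', 'p2']}
```

The second print matches the search result above: one bucket for key 3 containing p1 and p2.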

Join Query in Kibana KQL

I have three logs in ES like
{"@timestamp":"2022-07-19T11:24:16.274073+05:30","log":{"level":200,"logger":"production","message":"BUY_ITEM1","context":{"user_id":31312},"datetime":"2022-07-19T11:24:16.274073+05:30","extra":{"ip":"127.0.0.1"}}}
{"@timestamp":"2022-07-19T11:24:16.274073+05:30","log":{"level":200,"logger":"production","message":"BUY_ITEM2","context":{"user_id":31312},"datetime":"2022-07-19T11:24:16.274073+05:30","extra":{"ip":"127.0.0.1"}}}
{"@timestamp":"2022-07-19T11:24:16.274073+05:30","log":{"level":200,"logger":"production","message":"CLICK_ITEM3","context":{"user_id":31312},"datetime":"2022-07-19T11:24:16.274073+05:30","extra":{"ip":"127.0.0.1"}}}
I can get the users who bought Item1 by querying log.message: "BUY_ITEM1" in KQL in Kibana.
How can I get the user_ids that have both BUY_ITEM1 and BUY_ITEM2?
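TL;DR
Joins as they exist in SQL are not really possible in Elasticsearch; the joining queries it does offer are very limited (https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html).
You will need to work around this.
Workaround
You could run a terms aggregation on user_id, restricted to the purchase events you care about, with a sub-aggregation on the message. A user bought both items when their bucket contains both BUY_ITEM1 and BUY_ITEM2.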
GET /73031860/_search
{
"query": {
"terms": {
"log.message.keyword": [
"BUY_ITEM1",
"BUY_ITEM2"
]
}
},
"size": 0,
"aggs": {
"users": {
"terms": {
"field": "log.context.user_id",
"size": 10
},
"aggs": {
"products": {
"terms": {
"field": "log.message.keyword",
"size": 10
}
}
}
}
}
}
This will give you the following result:
{
...
},
"aggregations": {
"users": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 31312,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BUY_ITEM1",
"doc_count": 1
},
{
"key": "BUY_ITEM2",
"doc_count": 1
}
]
}
}
]
}
}
}
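The aggregation returns every user who bought either item, so the "both" condition still has to be checked on the buckets. A minimal client-side sketch, assuming the response has been parsed into a Python dict shaped like the result above; the helper name users_with_all and the second user (99999) are hypothetical and only there for illustration.

```python
def users_with_all(response, required):
    """Return user_ids whose products sub-buckets cover every required message."""
    required = set(required)
    matched = []
    for user in response["aggregations"]["users"]["buckets"]:
        bought = {b["key"] for b in user["products"]["buckets"]}
        if required <= bought:  # subset test: all required messages were seen
            matched.append(user["key"])
    return matched

response = {
    "aggregations": {
        "users": {
            "buckets": [
                {
                    "key": 31312,
                    "doc_count": 2,
                    "products": {
                        "buckets": [
                            {"key": "BUY_ITEM1", "doc_count": 1},
                            {"key": "BUY_ITEM2", "doc_count": 1},
                        ]
                    },
                },
                # Hypothetical second user who bought only one of the items.
                {
                    "key": 99999,
                    "doc_count": 1,
                    "products": {"buckets": [{"key": "BUY_ITEM1", "doc_count": 1}]},
                },
            ]
        }
    }
}

print(users_with_all(response, ["BUY_ITEM1", "BUY_ITEM2"]))  # [31312]
```

Keep in mind that the terms aggregation only returns the top size users (10 in the query above), so a larger size or pagination may be needed when there are many users.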

Elastic-search aggregate top 3 common result

My indexed data has the structure below. I want the aggregation result to contain the 3 most frequently repeated productProperty values.
[
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "type",
productValue:[{value: 26A},{value: 23A}] ,
},
{
productProperty: "type",
productValue:[{value: 22B},{value: 90C}] ,
},
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "age_rating",
productValue:[{value: 18},{value: 13}] ,
}
]
The query below aggregates everything by productProperty, but how can I get only the top 3 results?
{
"query": {},
"aggs": {
"filtered_product_property": {
"filter": {
"bool": {
"must": []
}
},
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty"
}
}
}
}
}
}
You can use the size parameter in your terms aggregation.
{
"query": {},
"aggs": {
"filtered_product_property": {
"filter": {
"bool": {
"must": []
}
},
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty",
"size": 3
}
}
}
}
}
}
It is important to point out that terms aggregation counts can be approximate in some cases.
As mentioned by @Tushar, you can use the size param. According to the official ES documentation (describing sum_other_doc_count in the response):
when there are lots of unique terms, Elasticsearch only returns the
top terms; this number is the sum of the document counts for all
buckets that are not part of the response
You can also control how the buckets in the aggregation response are sorted by using the order param.
By default, buckets are sorted by doc count in descending order, so size 3 returns the three most frequent values.
The search query will be:
{
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty.keyword",
"size": 3
}
}
}
}
And the search result would be:
"aggregations": {
"productProperty": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "material",
"doc_count": 3
},
{
"key": "type",
"doc_count": 2
},
{
"key": "age_rating",
"doc_count": 1
}
]
}
}
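As a sanity check on the expected buckets, the same top-3 count can be reproduced in plain Python over the six sample documents. This is only an illustration of what size 3 with the default ordering returns, not something Elasticsearch runs.

```python
from collections import Counter

# productProperty values of the six sample documents, in index order.
docs = [
    {"productProperty": "material"},
    {"productProperty": "material"},
    {"productProperty": "type"},
    {"productProperty": "type"},
    {"productProperty": "material"},
    {"productProperty": "age_rating"},
]

counts = Counter(d["productProperty"] for d in docs)

# most_common(3) sorts by count descending, like the default terms ordering.
top3 = counts.most_common(3)
print(top3)  # [('material', 3), ('type', 2), ('age_rating', 1)]
```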

ElasticSearch cardinality aggregation with multiple query

I have documents with a merchant and an item. A document looks like this:
{
"merchant": "M1",
"item": "I1"
}
For the given list of merchant names, I want to get number of unique items on each merchant.
I was able to get number of unique items on a given merchant by following query:
{
"size": 0,
"query": {
"match": {
"merchant": "M1"
}
},
"aggs": {
"count_unique_items": {
"cardinality": {
"field": "I1"
}
}
}
}
Is there a way to expand this query so instead of 1 merchant, I can do search for N merchants with one query?
You need to use a terms query to match multiple merchants and a multilevel aggregation to get the unique count per merchant: create a terms aggregation on merchant, then add a cardinality aggregation as a sub-aggregation of it. The query will look like this:
{
"size": 0,
"query": {
"terms": {
"merchant": [
"M1",
"M2"
]
}
},
"aggs": {
"merchent": {
"terms": {
"field": "merchant"
},
"aggs": {
"item_count": {
"cardinality": {
"field": "item"
}
}
}
}
}
}
As suggested by @Opster ES Ninja Nishant, you need to use a multilevel aggregation.
Adding a working example with index data, search query, and search result.
Index Data:
{
"merchant": "M3",
"item": ["I3","I2"]
}
{
"merchant": "M2",
"item": ["I2","I2"]
}
{
"merchant": "M1",
"item": "I1"
}
Search Query:
To count the unique items for a given merchant, use the item field in the cardinality aggregation instead of I1 (I1 is a value, not a field name):
{
"size":0,
"query": {
"terms": {
"merchant.keyword": [
"M1",
"M2",
"M3"
]
}
},
"aggs": {
"merchent": {
"terms": {
"field": "merchant.keyword"
},
"aggs": {
"item_count": {
"cardinality": {
"field": "item.keyword" <-- note this
}
}
}
}
}
}
Search Result:
"aggregations": {
"merchent": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "M1",
"doc_count": 1,
"item_count": {
"value": 1
}
},
{
"key": "M2",
"doc_count": 1,
"item_count": {
"value": 1
}
},
{
"key": "M3",
"doc_count": 1,
"item_count": {
"value": 2
}
}
]
}
}
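The per-merchant counts in this result can be cross-checked with a small Python sketch over the same three documents. It computes exact counts with sets, whereas the cardinality aggregation is approximate for large sets; the function name unique_items_per_merchant is hypothetical.

```python
from collections import defaultdict

docs = [
    {"merchant": "M3", "item": ["I3", "I2"]},
    {"merchant": "M2", "item": ["I2", "I2"]},
    {"merchant": "M1", "item": "I1"},
]

def unique_items_per_merchant(docs, merchants):
    """Count distinct item values per merchant, for the requested merchants only."""
    seen = defaultdict(set)
    for doc in docs:
        if doc["merchant"] not in merchants:
            continue  # plays the role of the terms query
        items = doc["item"]
        if isinstance(items, str):
            items = [items]  # a field may hold one value or an array
        seen[doc["merchant"]].update(items)
    return {m: len(items) for m, items in sorted(seen.items())}

result = unique_items_per_merchant(docs, {"M1", "M2", "M3"})
print(result)  # {'M1': 1, 'M2': 1, 'M3': 2}
```

Note that M2 counts as 1 even though its array holds two entries, because both entries are the same value.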

Filter in bucket key and doc_count in elastic search

I have an index that contains multiple documents. I want to write an Elasticsearch query that lets me filter on both the bucket key and doc_count.
{
"aggs": {
"genres": {
"terms": {
"field": "event.keyword"
}
}
}
}
"aggregations": {
"genres": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33,
"buckets": [
{
"key": "eone",
"doc_count": 5
}
,
{
"key": "etwo",
"doc_count": 2
}
]
}
}
I want to write a query that filters on both the key name and the doc count. For example, if I ask for key eone with a doc count of 5, I should get back only the bucket matching those criteria.
You can try min_doc_count, as shown below:
{
"aggs": {
"genres": {
"terms": {
"field": "event.keyword",
"min_doc_count": 5
}
}
}
}
Alternatively, combine a query filter with min_doc_count:
GET index_name/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"match": {
"event.keyword": "eone"
}
}
]
}
}
}
},
"aggs": {
"genres": {
"terms": {
"field": "event.keyword",
"min_doc_count": 5
}
}
}
}
Or use include together with min_doc_count, as shown below:
GET index_name/_search
{
"size": 0,
"aggs": {
"genres": {
"terms": {
"field": "event.keyword",
"min_doc_count": 5,
"include" : "eone"
}
}
}
}
See more: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_minimum_document_count_4
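Note that min_doc_count is a lower bound, so a bucket with more than 5 documents would also be returned. If an exact match on doc_count is required, one option is to check the returned buckets on the client. A minimal sketch, assuming the buckets list has been taken from the parsed response; the helper name select_buckets is hypothetical.

```python
def select_buckets(buckets, key, doc_count):
    """Keep buckets whose key and doc_count both match exactly."""
    return [b for b in buckets if b["key"] == key and b["doc_count"] == doc_count]

# Buckets as returned by the terms aggregation in the question.
buckets = [
    {"key": "eone", "doc_count": 5},
    {"key": "etwo", "doc_count": 2},
]

print(select_buckets(buckets, "eone", 5))  # [{'key': 'eone', 'doc_count': 5}]
print(select_buckets(buckets, "eone", 4))  # []
```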

Getting description when aggregating with Elasticsearch

When we use the aggregation feature in Elasticsearch, we get back the value of the field we aggregate on, but we also want the description that goes with it. We have to aggregate on sectors.id because other parts of our API use it later on.
For example, our data looks like this:
[{
"id":"123",
"sectors":[{
"id":"sector-1",
"name":"Automotive"
}]
},
{
"id":"123",
"sectors":[{
"id":"sector-2",
"name":"Biology"
}]
}]
When we aggregate over sectors.id our response looks like:
"aggregations": {
"sector": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "sector-2",
"doc_count": 19672
},
{
"key": "sector-1",
"doc_count": 11699
}]
}
}
Is there any way to get sectors.name as well as the key in the results?
It seems that sectors should be a nested field. The following assumes that the sector name is unique per sector id.
You can use a sub-aggregation to retrieve the related name for each key:
GET _search
{
"size": 0,
"aggs": {
"sectors": {
"nested": {
"path": "sectors"
},
"aggs": {
"sector_id": {
"terms": {
"field": "sectors.id"
},
"aggs": {
"sector_name": {
"terms": {
"field": "sectors.name"
}
}
}
}
}
}
}
}
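With this query, each sector_id bucket carries a sector_name sub-aggregation, so the id-to-name pairs can be read off the response. A minimal sketch, assuming the response is parsed into a Python dict; the response below is a hypothetical one shaped like the output of a nested aggregation (the doc counts are borrowed from the question), and the helper name sector_names is made up.

```python
def sector_names(response):
    """Map each sector id to the first (most frequent) name found under it."""
    result = {}
    for bucket in response["aggregations"]["sectors"]["sector_id"]["buckets"]:
        names = bucket["sector_name"]["buckets"]
        result[bucket["key"]] = names[0]["key"] if names else None
    return result

# Hypothetical response for the nested query above.
response = {
    "aggregations": {
        "sectors": {
            "doc_count": 31371,
            "sector_id": {
                "buckets": [
                    {
                        "key": "sector-2",
                        "doc_count": 19672,
                        "sector_name": {
                            "buckets": [{"key": "Biology", "doc_count": 19672}]
                        },
                    },
                    {
                        "key": "sector-1",
                        "doc_count": 11699,
                        "sector_name": {
                            "buckets": [{"key": "Automotive", "doc_count": 11699}]
                        },
                    },
                ]
            },
        }
    }
}

print(sector_names(response))  # {'sector-2': 'Biology', 'sector-1': 'Automotive'}
```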
