ElasticSearch cardinality aggregation with multiple query - elasticsearch

I have documents with merchant and item fields. A document looks like this:
{
"merchant": "M1",
"item": "I1"
}
For a given list of merchant names, I want to get the number of unique items for each merchant.
I was able to get the number of unique items for a single merchant with the following query:
{
"size": 0,
"query": {
"match": {
"merchant": "M1"
}
},
"aggs": {
"count_unique_items": {
"cardinality": {
"field": "I1"
}
}
}
}
Is there a way to expand this query so that, instead of 1 merchant, I can search for N merchants with one query?

You need to use a terms query to match multiple merchants and a multilevel aggregation to find the unique count per merchant. Create a terms aggregation on merchant, then add a cardinality aggregation as a sub-aggregation of that terms aggregation. The query will look like this:
{
"size": 0,
"query": {
"terms": {
"merchant": [
"M1",
"M2"
]
}
},
"aggs": {
"merchent": {
"terms": {
"field": "merchant"
},
"aggs": {
"item_count": {
"cardinality": {
"field": "item"
}
}
}
}
}
}
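If the list of merchants changes per request, the request body above could be generated rather than written by hand. Here is a minimal Python sketch; the helper name build_unique_items_query and the aggregation name per_merchant are made up for illustration, and the field names assume the mapping from the question. It also sets the terms size to the number of merchants, on the assumption that you want one bucket per requested merchant (a terms aggregation returns only the top 10 buckets by default).

```python
import json


def build_unique_items_query(merchants, merchant_field="merchant", item_field="item"):
    """Build a search body that counts distinct items per merchant.

    Hypothetical helper: the function and aggregation names are illustrative,
    not part of any Elasticsearch client library.
    """
    merchants = list(merchants)
    return {
        "size": 0,  # return only aggregations, no hits
        "query": {"terms": {merchant_field: merchants}},
        "aggs": {
            "per_merchant": {
                # one bucket per requested merchant (size must be at least 1)
                "terms": {"field": merchant_field, "size": max(1, len(merchants))},
                "aggs": {
                    "item_count": {"cardinality": {"field": item_field}}
                },
            }
        },
    }


body = build_unique_items_query(["M1", "M2"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. If the fields are mapped as text with a keyword sub-field, pass "merchant.keyword" and "item.keyword" instead.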

As suggested by @Opster ES Ninja Nishant, you need to use a multilevel aggregation.
Adding a working example with index data, search query, and search result.
Index Data:
{
"merchant": "M3",
"item": ["I3","I2"]
}
{
"merchant": "M2",
"item": ["I2","I2"]
}
{
"merchant": "M1",
"item": "I1"
}
Search Query:
To count the number of unique items for a given merchant, the cardinality aggregation must point at the item field, not at the value I1:
{
"size":0,
"query": {
"terms": {
"merchant.keyword": [
"M1",
"M2",
"M3"
]
}
},
"aggs": {
"merchent": {
"terms": {
"field": "merchant.keyword"
},
"aggs": {
"item_count": {
"cardinality": {
"field": "item.keyword" <-- note this
}
}
}
}
}
}
Search Result:
"aggregations": {
"merchent": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "M1",
"doc_count": 1,
"item_count": {
"value": 1
}
},
{
"key": "M2",
"doc_count": 1,
"item_count": {
"value": 1
}
},
{
"key": "M3",
"doc_count": 1,
"item_count": {
"value": 2
}
}
]
}
}
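To sanity-check the numbers in the search result, the same grouping can be reproduced in plain Python over the three sample documents. This is only an in-memory sketch of what terms plus cardinality computes; the function name is made up, and it counts exactly, whereas the Elasticsearch cardinality aggregation is approximate on large data sets.

```python
from collections import defaultdict

# The three sample documents from the index data above.
docs = [
    {"merchant": "M3", "item": ["I3", "I2"]},
    {"merchant": "M2", "item": ["I2", "I2"]},
    {"merchant": "M1", "item": "I1"},
]


def unique_items_per_merchant(docs, merchants):
    """Count distinct item values per merchant (hypothetical helper)."""
    wanted = set(merchants)            # plays the role of the terms query
    items = defaultdict(set)           # one set of items per merchant bucket
    for doc in docs:
        if doc["merchant"] not in wanted:
            continue
        values = doc["item"]
        if not isinstance(values, list):
            values = [values]          # a field may hold one value or an array
        items[doc["merchant"]].update(values)
    return {merchant: len(seen) for merchant, seen in sorted(items.items())}


counts = unique_items_per_merchant(docs, ["M1", "M2", "M3"])
print(counts)  # {'M1': 1, 'M2': 1, 'M3': 2}
```

M2 ends up with 1 because its two item values are identical, which matches the item_count of 1 in the response.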

Related

Order by doc_count in composite aggregation (or suitable alternatives)

I have a search like the following
{
"size": 0,
"query": { "...": "..." },
"_source": false,
"aggregations": {
"agg1": { "...": "..." },
"agg2": { "...": "..." }
}
}
where each agg* is a composite aggregation of the form
"agg1" : {
"composite": {
"size": 300,
"sources": [
{
"field1": {
"terms": {
"field": "field1.keyword",
"missing_bucket": true
}
}
},
{
"field2": {
"terms": {
"field": "field2.keyword",
"missing_bucket": true,
"order": "asc"
}
}
}
]
},
"aggregations": {
"field3": {
"filter": { "term": { "field3.keyword": "xyz" } }
}
}
}
I want to order the buckets by doc_count because I don't need all of them, only the top n, like in some Kibana visualizations. From the documentation of composite aggregations, it doesn't seem possible to order the results the way terms aggregations do. Is there a workaround or an alternative query that does this?

Elastic-search aggregate top 3 common result

My indexed data has the structure below. I want to aggregate the top 3 most repeated productProperty values, so that only those 3 appear in the aggregation result.
[
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "type",
productValue:[{value: 26A},{value: 23A}] ,
},
{
productProperty: "type",
productValue:[{value: 22B},{value: 90C}] ,
},
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "age_rating",
productValue:[{value: 18},{value: 13}] ,
}
]
The query below aggregates everything by productProperty, but how can I get only the top 3 results out of it?
{
"query": {},
"aggs": {
"filtered_product_property": {
"filter": {
"bool": {
"must": []
}
},
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty"
}
}
}
}
}
}
You can use the size parameter in your terms aggregation.
{
"query": {},
"aggs": {
"filtered_product_property": {
"filter": {
"bool": {
"must": []
}
},
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty",
"size": 3
}
}
}
}
}
}
Note that terms aggregation counts can be approximate, because each shard returns only its own top terms.
As mentioned by @Tushar, you can use the size param. The official ES documentation says the following about the sum_other_doc_count field in the response:
when there are lots of unique terms, Elasticsearch only returns the
top terms; this number is the sum of the document counts for all
buckets that are not part of the response
You can also control how the buckets in the aggregation response are sorted by using the order param.
By default, the buckets are sorted by doc count in descending order.
The search query will be:
{
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty.keyword",
"size": 3
}
}
}
}
And the search result would be:
"aggregations": {
"productProperty": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "material",
"doc_count": 3
},
{
"key": "type",
"doc_count": 2
},
{
"key": "age_rating",
"doc_count": 1
}
]
}
}
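As a cross-check of the buckets above, here is a small Python sketch that mimics a terms aggregation with size 3 on the six sample documents. It is an in-memory illustration only; the helper name top_terms is made up, and unlike a sharded index it counts exactly.

```python
from collections import Counter

# productProperty values of the six sample documents, in index order.
product_properties = [
    "material", "material", "type", "type", "material", "age_rating",
]


def top_terms(values, size=3):
    """Return the `size` most frequent values, ordered by count descending."""
    return Counter(values).most_common(size)


buckets = top_terms(product_properties)
print(buckets)  # [('material', 3), ('type', 2), ('age_rating', 1)]
```

With size=2 only material and type would be returned, and the remaining document would show up in sum_other_doc_count in the real response.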

Stats Aggregation with Min Mode in ElasticSearch

I have the below mapping in ElasticSearch
{
"properties":{
"Costs":{
"type":"nested",
"properties":{
"price":{
"type":"integer"
}
}
}
}
}
So every document has an array field Costs, which contains many elements, each with a price. I want to find the min and max price, with the condition that only the element with the minimum price from each array is considered. So it is basically the min/max among the per-array minimums.
Let's say I have 2 documents with the Costs field as
"Costs": [
{
"price": 100
},
{
"price": 200
}
]
and
"Costs": [
{
"price": 300
},
{
"price": 400
}
]
So I need to find the stats
This is the query I am currently using
{
"costs_stats":{
"nested":{
"path":"Costs"
},
"aggs":{
"price_stats_new":{
"stats":{
"field":"Costs.price"
}
}
}
}
}
And it gives me this:
"min" : 100,
"max" : 400
But I need the stats computed after taking only the minimum element of each array.
So this is what I need:
"min" : 100,
"max" : 300
Sort has a "mode" option; is there something similar in the stats aggregation, or any other way of achieving this, maybe using a script? Please suggest. I am really stuck here.
Let me know if any more information is required.
Update 1:
Query for finding min/max among minimums
{
"_source":false,
"timeout":"5s",
"from":0,
"size":0,
"aggs":{
"price_1":{
"terms":{
"field":"id"
},
"aggs":{
"price_2":{
"nested":{
"path":"Costs"
},
"aggs":{
"filtered":{
"aggs":{
"price_3":{
"min":{
"field":"Costs.price"
}
}
},
"filter":{
"bool":{
"filter":{
"range":{
"Costs.price":{
"gte":100
}
}
}
}
}
}
}
}
}
},
"minValue":{
"min_bucket":{
"buckets_path":"price_1>price_2>filtered>price_3"
}
}
}
}
Only a few buckets are returned, so the min/max is computed over just those, which is not correct. Is there a size limit?
One way to achieve your use case is to add one more field, id, to each document. A terms aggregation on the id field then builds the buckets dynamically, one per unique value. By default a terms aggregation returns only the top 10 buckets, so set size high enough to cover every id.
Then we can apply a min aggregation inside each bucket, which returns the minimum of the numeric values extracted from the aggregated documents.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"Costs": {
"type": "nested"
}
}
}
}
Index Data:
{
"id":1,
"Costs": [
{
"price": 100
},
{
"price": 200
}
]
}
{
"id":2,
"Costs": [
{
"price": 300
},
{
"price": 400
}
]
}
Search Query:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15 <-- note this
},
"aggs": {
"nested_entries": {
"nested": {
"path": "Costs"
},
"aggs": {
"min_position": {
"min": {
"field": "Costs.price"
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
}
}
The same can be achieved with a stats aggregation (again, provided you add an id field that uniquely identifies each document):
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15 <-- note this
},
"aggs": {
"costs_stats": {
"nested": {
"path": "Costs"
},
"aggs": {
"price_stats_new": {
"stats": {
"field": "Costs.price"
}
}
}
}
}
}
}
}
Update 1:
To find the maximum value among those minimums (as seen in the above query), you can use max bucket aggregation
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id",
"size": 15 <-- note this
},
"aggs": {
"nested_entries": {
"nested": {
"path": "Costs"
},
"aggs": {
"min_position": {
"min": {
"field": "Costs.price"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
},
"maxValue": {
"value": 300.0,
"keys": [
"2"
]
}
}
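The logic of the query above (a min per document, then a min or max across those minimums) can be checked with a few lines of Python on the two sample documents. This is an in-memory sketch, not an Elasticsearch call, and the function name stats_of_minimums is made up for illustration.

```python
# The two sample documents from the index data above.
docs = [
    {"id": 1, "Costs": [{"price": 100}, {"price": 200}]},
    {"id": 2, "Costs": [{"price": 300}, {"price": 400}]},
]


def stats_of_minimums(docs):
    """Take the cheapest Costs entry per document, then min/max over those."""
    # Step 1: one minimum per document (terms on id + nested min aggregation).
    per_doc_min = {
        doc["id"]: min(cost["price"] for cost in doc["Costs"])
        for doc in docs
        if doc["Costs"]  # skip documents without any cost entries
    }
    # Step 2: min_bucket / max_bucket across the per-document minimums.
    return {
        "per_doc_min": per_doc_min,
        "min": min(per_doc_min.values()),
        "max": max(per_doc_min.values()),
    }


result = stats_of_minimums(docs)
print(result)  # {'per_doc_min': {1: 100, 2: 300}, 'min': 100, 'max': 300}
```

The max of 300 matches the maxValue in the search result, and the min of 100 is what a min_bucket over the same buckets_path would give.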

Filtering aggregation results

This question is a subquestion of this question. Posting as a separate question for attention.
Sample Docs:
{
"id":1,
"product":"p1",
"cat_ids":[1,2,3]
}
{
"id":2,
"product":"p2",
"cat_ids":[3,4,5]
}
{
"id":3,
"product":"p3",
"cat_ids":[4,5,6]
}
Ask: To get products belonging to a particular category. e.g cat_id = 3
Query:
GET product/_search
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cat_ids",
"size": 10
},
"aggs": {
"products": {
"terms": {
"field": "product.keyword",
"size": 10
}
}
}
}
}
}
Question:
How can I filter the aggregated result for cat_id = 3 here? I tried bucket_selector as well, but it is not working.
Note: because cat_ids is multi-valued, filtering first and then aggregating does not work; documents that match cat_id = 3 still produce buckets for their other category values.
You can restrict which values get a bucket. From the documentation:
It is possible to filter the values for which buckets will be created.
This can be done using the include and exclude parameters which are
based on regular expression strings or arrays of exact values.
Additionally, include clauses can filter using partition expressions.
Adding a working example with index data, search query, and search result
Index Data:
{
"id":1,
"product":"p1",
"cat_ids":[1,2,3]
}
{
"id":2,
"product":"p2",
"cat_ids":[3,4,5]
}
{
"id":3,
"product":"p3",
"cat_ids":[4,5,6]
}
Search Query:
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cat_ids",
"include": [ <-- note this
3
]
},
"aggs": {
"products": {
"terms": {
"field": "product.keyword",
"size": 10
}
}
}
}
}
}
Search Result:
"aggregations": {
"cats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "p1",
"doc_count": 1
},
{
"key": "p2",
"doc_count": 1
}
]
}
}
]
}
}
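To see why the include parameter helps where a plain query filter does not, here is an in-memory Python sketch over the three sample documents. Both function names are made up for illustration; the first mimics "filter the documents, then run a terms aggregation", the second mimics a terms aggregation with include.

```python
from collections import defaultdict

docs = [
    {"id": 1, "product": "p1", "cat_ids": [1, 2, 3]},
    {"id": 2, "product": "p2", "cat_ids": [3, 4, 5]},
    {"id": 3, "product": "p3", "cat_ids": [4, 5, 6]},
]


def filter_then_aggregate(docs, cat_id):
    """Keep documents containing cat_id, then bucket on every cat_ids value."""
    buckets = defaultdict(list)
    for doc in docs:
        if cat_id in doc["cat_ids"]:
            for cat in doc["cat_ids"]:   # all values of a matching doc get a bucket
                buckets[cat].append(doc["product"])
    return dict(buckets)


def aggregate_with_include(docs, include):
    """Bucket on cat_ids, but only create buckets for values in `include`."""
    buckets = defaultdict(list)
    for doc in docs:
        for cat in doc["cat_ids"]:
            if cat in include:
                buckets[cat].append(doc["product"])
    return dict(buckets)


unfiltered = filter_then_aggregate(docs, 3)
filtered = aggregate_with_include(docs, {3})
print(sorted(unfiltered))  # [1, 2, 3, 4, 5]
print(filtered)            # {3: ['p1', 'p2']}
```

Filtering the documents still leaves buckets for categories 1, 2, 4 and 5, while restricting the bucket values leaves only category 3 with p1 and p2, as in the search result.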

Elasticsearch nested cardinality aggregation

I have a mapping with a nested schema. I am trying to aggregate on a nested field and order the buckets by the distinct docid count.
select name, count(distinct docid) as uniqueid from table
group by name
order by uniqueid desc
The SQL above is what I am trying to do.
{
"size": 0,
"aggs": {
"samples": {
"nested": {
"path": "sample"
},
"aggs": {
"sample": {
"terms": {
"field": "sample.name",
"order": {
"DocCounts": "desc"
}
},
"aggs": {
"DocCounts": {
"cardinality": {
"field": "docid"
}
}
}
}
}
}
}
}
But the result is not what I expect.
Result:
"buckets": [
{
"key": "xxxxx",
"doc_count": 173256,
"DocCounts": {
"value": 0
}
},
{
"key": "yyyyy",
"doc_count": 63,
"DocCounts": {
"value": 0
}
}
]
I am getting DocCounts = 0, which is not expected. What is wrong with my query?
I think your last nested aggregation is unnecessary. Try to get rid of it:
{
"size": 0,
"aggs": {
"samples": {
"nested": {
"path": "sample"
},
"aggs": {
"sample": {
"terms": {
"field": "sample.name",
"order": {
"DocCounts": "desc"
}
},
"DocCounts": {
"cardinality": {
"field": "docid"
}
}
}
}
}
}
}
In general, when aggregating on a nested type by a value from the parent scope, we observed that the value has to be copied from the parent scope onto the nested objects when the document is stored.
In your case the aggregation would then look like:
"aggs": {
"DocCounts": {
"cardinality": {
"field": "sample.docid"
}
}
}
This works at least on version 1.7 of Elasticsearch.
You can wrap the cardinality aggregation on DocCounts in a reverse nested aggregation. When a nested aggregation is applied, the sub-aggregations run against the nested documents, so to access a field of the parent document from inside the nested context you need a reverse nested aggregation. Check the ES Reference for more info on this.
Your cardinality query will look like:
"aggs": {
"internal_DocCounts": {
"reverse_nested": {},
"aggs": {
"DocCounts": {
"cardinality": {
"field": "docid"
}
}
}
}
}
The response will look like:
"buckets": [
{
"key": "xxxxx",
"doc_count": 173256,
"internal_DocCounts": {
"doc_count": 173256,
"DocCounts": {
"value": <some_value>
}
}
},
{
"key": "yyyyy",
"doc_count": 63,
"internal_DocCounts": {
"doc_count": 63,
"DocCounts": {
"value": <some_value>
}
}
},
.....
Check this similar thread
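Putting the pieces of this answer together, the full request body might look like the Python sketch below. It is an assumption-laden illustration rather than a tested query: the helper name is made up, the field names come from the question, and the order key uses the path internal_DocCounts>DocCounts on the assumption that the terms aggregation may order by a metric reached through a single-bucket aggregation such as reverse_nested.

```python
import json


def build_distinct_docid_query(path="sample", name_field="sample.name", id_field="docid"):
    """Build a nested terms query ordered by distinct parent docid (hypothetical helper)."""
    return {
        "size": 0,
        "aggs": {
            "samples": {
                "nested": {"path": path},  # step into the nested documents
                "aggs": {
                    "sample": {
                        "terms": {
                            "field": name_field,
                            # assumed: order by the metric inside reverse_nested
                            "order": {"internal_DocCounts>DocCounts": "desc"},
                        },
                        "aggs": {
                            "internal_DocCounts": {
                                "reverse_nested": {},  # step back to the parent document
                                "aggs": {
                                    "DocCounts": {"cardinality": {"field": id_field}}
                                },
                            }
                        },
                    }
                },
            }
        },
    }


query = build_distinct_docid_query()
print(json.dumps(query, indent=2))
```

The key point is that DocCounts sits under an "aggs" key inside internal_DocCounts, so the cardinality is computed on the parent documents, where docid lives.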
