Elasticsearch - Sum with group by

I couldn't work out how to apply the concept of chained aggregations... I need help with this scenario:
My documents look like this:
{
"date":"2019-01-30",
"value":1234.56,
"partnerId":9876
}
and I would like to filter by date (month), sum the values by partnerId, and count the documents per partner, obtaining a result like:
{
"partnerId": 9876,
"totalValue": 12345567.87,
"count": 6574
}
What would this query look like?

What you are trying to achieve can be done with a sub-aggregation, in other words an aggregation inside another aggregation.
In your case you first want to group by partnerId, so you need a terms aggregation on the partnerId field. Let's call this aggregation partner. This already gives you two values of your expected result: partnerId and count.
Now, for each partnerId group (bucket), you need totalValue, i.e. the sum of value per partnerId. This is done by adding a sum aggregation inside the terms aggregation partner. The final query, including the filter for the date (month), is:
{
"query": {
"bool": {
"filter": {
"range": {
"date": {
"gte": "2019-01-01",
"lte": "2019-01-31"
}
}
}
}
},
"aggs": {
"partner": {
"terms": {
"field": "partnerId"
},
"aggs": {
"totalValue": {
"sum": {
"field": "value"
}
}
}
}
}
}
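As a side note, the month filter does not have to hard-code the last day of the month; a range clause with date math rounding should give the same result. A sketch of just the query part, assuming the date field is mapped as a date (the aggs block stays exactly the same):
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "date": {
            "gte": "2019-01-01||/M",
            "lte": "2019-01-01||/M"
          }
        }
      }
    }
  }
}
With gte the date-math expression rounds down to the first millisecond of January, and with lte it rounds up to the last millisecond of the month.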
Sample Result (agg only):
"aggregations": {
"partner": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 9876,
"doc_count": 3,
"totalValue": {
"value": 3704.68017578125
}
},
{
"key": 9878,
"doc_count": 2,
"totalValue": {
"value": 2454.1201171875
}
}
]
}
}
In the result above, key is partnerId, doc_count is count, and totalValue.value is the totalValue of your expected result.
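If you prefer the count to come back as an explicit sub-aggregation instead of reading it off doc_count (for example so the response mirrors the exact shape you listed), a value_count aggregation can sit next to the sum. A minimal sketch, assuming every document carries exactly one value field:
{
  "aggs": {
    "partner": {
      "terms": {
        "field": "partnerId"
      },
      "aggs": {
        "totalValue": {
          "sum": { "field": "value" }
        },
        "count": {
          "value_count": { "field": "value" }
        }
      }
    }
  }
}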

Related

Elasticsearch: aggregate top 3 most common results

My indexed data has the structure below. I want to aggregate the top 3 most repeated productProperty values, so that only those top 3 appear in the aggregation result.
[
{
"productProperty": "material",
"productValue": [{"value": "wood"}, {"value": "plastic"}]
},
{
"productProperty": "material",
"productValue": [{"value": "wood"}, {"value": "plastic"}]
},
{
"productProperty": "type",
"productValue": [{"value": "26A"}, {"value": "23A"}]
},
{
"productProperty": "type",
"productValue": [{"value": "22B"}, {"value": "90C"}]
},
{
"productProperty": "material",
"productValue": [{"value": "wood"}, {"value": "plastic"}]
},
{
"productProperty": "age_rating",
"productValue": [{"value": 18}, {"value": 13}]
}
]
The query below aggregates everything by productProperty, but how can I get only the top 3 results out of that?
{
"query": {},
"aggs": {
"filtered_product_property": {
"filter": {
"bool": {
"must": []
}
},
"aggs": {
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty"
}
}
}
}
}
}
}
You can use the size parameter in your terms aggregation.
{
"query": {},
"aggs": {
"filtered_product_property": {
"filter": {
"bool": {
"must": []
}
},
"aggs": {
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty",
"size" : 3
}
}
}
}
}
}
}
It is important to point out that terms aggregations are not perfectly accurate in some cases.
As mentioned by @Tushar, you can use the size param. According to the official ES documentation:
when there are lots of unique terms, Elasticsearch only returns the
top terms; this number is the sum of the document counts for all
buckets that are not part of the response
However, you can control the order in which the buckets of the aggregation response are sorted, using the order param.
By default, the buckets are sorted by doc count in descending order.
The search query will be:
{
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty.keyword",
"size": 3
}
}
}
}
And the search result would be:
"aggregations": {
"productProperty": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "material",
"doc_count": 3
},
{
"key": "type",
"doc_count": 2
},
{
"key": "age_rating",
"doc_count": 1
}
]
}
}
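If you want the ordering to be explicit rather than implicit, the order parameter can be added to the same terms aggregation. A sketch with the default behaviour spelled out, still keeping only the three most frequent values:
{
  "aggs": {
    "productProperty": {
      "terms": {
        "field": "productProperty.keyword",
        "size": 3,
        "order": { "_count": "desc" }
      }
    }
  }
}
When the counts really matter, raising the shard_size parameter on the same terms aggregation is the usual way to trade a bit of extra work for better accuracy.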

Filtering aggregation results

This question is a subquestion of an earlier question; posting it separately for attention.
Sample Docs:
{
"id":1,
"product":"p1",
"cat_ids":[1,2,3]
}
{
"id":2,
"product":"p2",
"cat_ids":[3,4,5]
}
{
"id":3,
"product":"p3",
"cat_ids":[4,5,6]
}
Ask: to get the products belonging to a particular category, e.g. cat_id = 3.
Query:
GET product/_search
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cats",
"size": 10
},"aggs": {
"products": {
"terms": {
"field": "name.keyword",
"size": 10
}
}
}
}
}
}
Question:
How can I filter the aggregated result for cat_id = 3 here? I tried bucket_selector as well, but it is not working.
Note: because cat_ids is multi-valued, filtering first and then aggregating does not work.
You can restrict the values for which buckets will be created. From the ES documentation:
It is possible to filter the values for which buckets will be created.
This can be done using the include and exclude parameters which are
based on regular expression strings or arrays of exact values.
Additionally, include clauses can filter using partition expressions.
Below is a working example with index data, search query, and search result.
Index Data:
{
"id":1,
"product":"p1",
"cat_ids":[1,2,3]
}
{
"id":2,
"product":"p2",
"cat_ids":[3,4,5]
}
{
"id":3,
"product":"p3",
"cat_ids":[4,5,6]
}
Search Query:
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cat_ids",
"include": [ <-- note this
3
]
},
"aggs": {
"products": {
"terms": {
"field": "product.keyword",
"size": 10
}
}
}
}
}
}
Search Result:
"aggregations": {
"cats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "p1",
"doc_count": 1
},
{
"key": "p2",
"doc_count": 1
}
]
}
}
]
}
}
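If the index is large, it can also help to add a query-level filter so the aggregation only visits documents that contain the category at all; the include is still needed because a matching document can carry other cat_ids values besides 3. A sketch combining the two, using the same field names as above:
{
  "size": 0,
  "query": {
    "term": { "cat_ids": 3 }
  },
  "aggs": {
    "cats": {
      "terms": {
        "field": "cat_ids",
        "include": [ 3 ]
      },
      "aggs": {
        "products": {
          "terms": {
            "field": "product.keyword",
            "size": 10
          }
        }
      }
    }
  }
}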

Elasticsearch filter on aggregated results (SQL HAVING)

I have an ES query that aggregates data from a monitoring tool.
Currently, I've found the number of documents in each relevant group (by "externalId").
Now, I wish to filter the results by the number of records in each group.
(Similar to "HAVING" clause in SQL, doc_count > 0)
For instance, to find the "externalId" values that were stored more than once.
This is my ES query:
{
"query":
{
"match" :
{
"method" : "METHOD_NAME"
}
},
"size":0,
"aggs":
{
"group_by_external_id":
{
"terms":
{
"field": "externalId"
}
}
}
}
The results look like this:
"aggregations": {
"group_by_external_id": {
"doc_count_error_upper_bound": 5,
"sum_other_doc_count": 53056,
"buckets": [
{
"key": "6088417651626873",
"doc_count": 1
},
{
"key": "6088417688232882",
"doc_count": 1
},
...
]
}
}
Terms aggregations have a min_doc_count option you can use. For example,
"aggs":
{
"group_by_external_id":
{
"terms":
{
"field": "externalId",
"min_doc_count": 2
}
}
}
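min_doc_count covers a HAVING condition on the document count itself. If the condition is on some other computed metric (a sum, an average, etc.), a bucket_selector pipeline aggregation is the closer analogue of SQL's HAVING. A minimal sketch, assuming a recent ES version with Painless scripting and a hypothetical numeric field amount:
"aggs": {
  "group_by_external_id": {
    "terms": { "field": "externalId" },
    "aggs": {
      "total_amount": {
        "sum": { "field": "amount" }
      },
      "having_total": {
        "bucket_selector": {
          "buckets_path": { "total": "total_amount" },
          "script": "params.total > 1000"
        }
      }
    }
  }
}
Buckets for which the script returns false are dropped from the response, much like rows filtered out by HAVING.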

ElasticSearch 2.1.0 - Deep 'children' aggregation with 'sum' metric returning empty results

I have a hierarchy of document types two levels deep. The documents are related by parent-child relationships as follows: category > sub_category > item, i.e. each sub_category has a _parent field referring to a category id, and each item has a _parent field referring to a sub_category id.
Each item has a price field. Given a query for categories, which includes conditions for sub-categories and items, I want to calculate a total price for each sub_category.
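For reference, this kind of parent-child hierarchy is declared in the type mappings. A rough sketch of what that could look like with the ES 2.x mapping syntax, assuming a hypothetical index named catalog:
PUT /catalog
{
  "mappings": {
    "category": {},
    "sub_category": {
      "_parent": { "type": "category" }
    },
    "item": {
      "_parent": { "type": "sub_category" },
      "properties": {
        "price": { "type": "double" }
      }
    }
  }
}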
My query looks something like this:
{
"query": {
"has_child": {
"child_type": "sub_category",
"query": {
"has_child": {
"child_type": "item",
"query": {
"range": {
"price": {
"gte": 100,
"lte": 150
}
}
}
}
}
}
}
}
My aggregation to calculate the price for each sub-category looks like this:
{
"aggs": {
"categories": {
"terms": {
"field": "id"
},
"aggs": {
"sub_categories": {
"children": {
"type": "sub_category"
},
"aggs": {
"sub_category_ids": {
"terms": {
"field": "id"
},
"aggs": {
"items": {
"children": {
"type": "item"
},
"aggs": {
"price": {
"sum": {
"field": "price"
}
}
}
}
}
}
}
}
}
}
}
}
Despite the query response listing matching results, the aggregation response doesn't match any items:
{
"aggregations": {
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category1",
"doc_count": 1,
"sub_categories": {
"doc_count": 3,
"sub_category_ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "subcat1",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
},
{
"key": "subcat2",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
},
{
"key": "subcat3",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
}
]
}
}
}]
}
}
}
However, omitting the sub_category_ids aggregation does cause the items to appear and the prices to be summed at the level of the categories aggregation. I would expect including the sub_category_ids aggregation to simply change the level at which the prices are summed.
Am I misunderstanding how the aggregation is evaluated, and if so how could I modify it to display the summed prices for each sub-category?
I opened issue #15413 regarding the children aggregation, as I and other folks were facing similar issues in ES 2.0.
Apparently, according to ES developer @martijnvg, the problem was that
The children agg makes an assumption (that all segments are being seen by children agg) that was true in 1.x but not in 2.x
PR #15457 fixed this issue; again from @martijnvg:
Before, we only evaluated segments that yielded matches in parent aggs, which caused us to miss evaluating child docs in segments where we didn't have parent matches.
The fix for this is to stop remembering in which segments we have matches and simply evaluate all segments. This makes the code simpler and we can still quickly see if a segment doesn't hold child docs, like we did before.
This pull request has been merged and it has also been backported to the 2.x, 2.1 and 2.0 branches.

How to use ElasticSearch to bucket historical data from midnight to now?

So I have an index with timestamps in the following format:
2015-03-20T12:00:00+0500
What I would like to do in the SQL equivalent is the following:
select date(timestamp), sum(orders)
from data
where time(timestamp) < time(now)
group by date(timestamp)
I know I need an aggregation, but for now I've tried the basic search query below and I'm getting a malformed error:
{
"size": 0,
"query":
{
"filtered":
{
"query":
{
"match_all" : {}
},
"filter":
{
"range":
{
"#timestamp":
{
"from": "00:00:01.000",
"to": "15:00:00.000"
}
}
}
}
}
}
You do indeed want an aggregation, specifically the date histogram aggregation. Something like
{
"query": {"match_all": {}},
"aggs": {
"by_date": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
},
"aggs": {
"order_sum": {
"sum": {"field": "foo"}
}
}
}
}
}
First you have a bucketing aggregation that groups your documents by date, then inside that a metric aggregation that computes a value (in this case a sum) for each bucket. This would return data of the form:
{
...
"aggregations": {
"by_date": {
"buckets": [
{
"key_as_string": "2015-03-01T00:00:00.000Z",
"key": 1425168000000,
"doc_count": 8644,
"order_sum": {
"value": 1234
}
},
{
"key_as_string": "2015-03-02T00:00:00.000Z",
"key": 1425254400000,
"doc_count": 8819,
"order_sum": {
"value": 45678
}
},
...
]
}
}
}
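If the goal is literally "from midnight to now", the same aggregation can be combined with a range query that uses date math, so only today's documents are bucketed. A minimal sketch, assuming the field is named timestamp and the orders value lives in an orders field; note that replicating the SQL time(timestamp) < time(now) comparison across every historical day would need a script-based filter instead:
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": "now/d",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day"
      },
      "aggs": {
        "order_sum": {
          "sum": { "field": "orders" }
        }
      }
    }
  }
}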
There is a good intro to aggregations on the elasticsearch blog (part 1 and part 2) if you want to do some more reading.
