ElasticSearch 2.1.0 - Deep 'children' aggregation with 'sum' metric returning empty results

I have a hierarchy of document types two levels deep. The documents are related by parent-child relationships as follows: category > sub_category > item. That is, each sub_category has a _parent field referring to a category id, and each item has a _parent field referring to a sub_category id.
Each item has a price field. Given a query for categories, which includes conditions for sub-categories and items, I want to calculate a total price for each sub_category.
My query looks something like this:
{
"query": {
"has_child": {
"child_type": "sub_category",
"query": {
"has_child": {
"child_type": "item",
"query": {
"range": {
"price": {
"gte": 100,
"lte": 150
}
}
}
}
}
}
}
}
My aggregation to calculate the price for each sub-category looks like this:
{
"aggs": {
"categories": {
"terms": {
"field": "id"
},
"aggs": {
"sub_categories": {
"children": {
"type": "sub_category"
},
"aggs": {
"sub_category_ids": {
"terms": {
"field": "id"
},
"aggs": {
"items": {
"children": {
"type": "item"
},
"aggs": {
"price": {
"sum": {
"field": "price"
}
}
}
}
}
}
}
}
}
}
}
}
Despite the query response listing matching results, the aggregation response doesn't match any items:
{
"aggregations": {
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category1",
"doc_count": 1,
"sub_categories": {
"doc_count": 3,
"sub_category_ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "subcat1",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
},
{
"key": "subcat2",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
},
{
"key": "subcat3",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
}
]
}
}
}]
}
}
}
However, omitting the sub_category_ids aggregation does cause the items to appear and the prices to be summed at the level of the categories aggregation. I would expect including the sub_category_ids aggregation to simply change the level at which the prices are summed.
Am I misunderstanding how the aggregation is evaluated, and if so how could I modify it to display the summed prices for each sub-category?

I opened issue #15413 about the children aggregation, as I and other folks were facing similar issues in ES 2.0.
According to ES developer @martijnvg, the problem was that:
"The children agg makes an assumption (that all segments are being seen by children agg) that was true in 1.x but not in 2.x"
PR #15457, also from @martijnvg, fixed this issue:
"Before we only evaluated segments that yielded matches in parent aggs, which caused us to miss to evaluate child docs in segments we didn't have parent matches for. The fix for this is stop remember in what segments we have matches for and simply evaluate all segments. This makes the code simpler and we can still quickly see if a segment doesn't hold child docs like we did before"
This pull request has been merged and has also been back-ported to the 2.x, 2.1 and 2.0 branches.
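On a release that contains the fix, the per-sub-category totals can be read straight out of the aggregation response. The following is only a minimal sketch: it assumes the aggregation names used in the question (categories, sub_categories, sub_category_ids, items, price) and that the response has already been parsed into a Python dict. The function name sub_category_totals is made up for illustration.

```python
from typing import Any, Dict


def sub_category_totals(response: Dict[str, Any]) -> Dict[str, Dict[str, float]]:
    """Map each category key to {sub_category key: summed item price}.

    Assumes the aggregation names from the question; adjust the keys if
    your aggregations are named differently.
    """
    totals: Dict[str, Dict[str, float]] = {}
    for category in response["aggregations"]["categories"]["buckets"]:
        per_sub: Dict[str, float] = {}
        sub_buckets = category["sub_categories"]["sub_category_ids"]["buckets"]
        for sub in sub_buckets:
            # "items" is the children aggregation, "price" the sum metric.
            per_sub[sub["key"]] = sub["items"]["price"]["value"]
        totals[category["key"]] = per_sub
    return totals
```

If every value in the returned mapping is 0 while the query itself returns hits, that is consistent with the segment problem described above rather than with a mistake in the aggregation definition.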

Related

Elasticsearch - Sum with group by

I couldn't apply the concept of chained aggregations and need help with this scenario:
My documents look like this:
{
"date":"2019-01-30",
"value":1234.56,
"partnerId":9876
}
I would like to filter by date (month), sum the values by partner ID, and count the documents, obtaining a result like:
{
"partnerId": 9876,
"totalValue": 12345567.87,
"count": 6574
}
What would this query look like?
What you are trying to achieve can be done with a sub-aggregation, in other words an aggregation inside an aggregation.
For your case, you first want to group by partnerId, so you will need a terms aggregation on the partnerId field. Let's call this aggregation partner. This will give you two values of your expected result, partnerId and count.
Now, for each of the groups (buckets) of partnerId, totalValue is required, i.e. the sum of value for each partnerId. This can be done by adding a sum aggregation inside the terms aggregation partner. So the final query, along with the filter for date (month), will be:
{
"query": {
"bool": {
"filter": {
"range": {
"date": {
"gte": "2019-01-01",
"lte": "2019-01-31"
}
}
}
}
},
"aggs": {
"partner": {
"terms": {
"field": "partnerId"
},
"aggs": {
"totalValue": {
"sum": {
"field": "value"
}
}
}
}
}
}
Sample Result (agg only):
"aggregations": {
"partner": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 9876,
"doc_count": 3,
"totalValue": {
"value": 3704.68017578125
}
},
{
"key": 9878,
"doc_count": 2,
"totalValue": {
"value": 2454.1201171875
}
}
]
}
In the result above, key is partnerId, doc_count is count, and totalValue.value is totalValue from your expected result.
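The mapping from buckets to the requested shape can be done on the client. Here is a minimal sketch, assuming the "aggregations" object has been parsed into a Python dict and the aggregation names match the query above (partner, totalValue); the function name summarize_partners is hypothetical.

```python
from typing import Any, Dict, List


def summarize_partners(aggregations: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn terms + sum buckets into {partnerId, totalValue, count} rows."""
    return [
        {
            "partnerId": bucket["key"],                  # terms bucket key
            "totalValue": bucket["totalValue"]["value"],  # sum sub-aggregation
            "count": bucket["doc_count"],                 # documents in the bucket
        }
        for bucket in aggregations["partner"]["buckets"]
    ]
```

Note that a terms aggregation only returns a limited number of buckets by default, so a larger "size" may be needed if there are many partners.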

Filter elasticsearch bucket aggregation based on term field

I have a list of products (deal entities) and I'm attempting to create a bucket aggregation by categories, ordered by the sum of available_stock.
This all works fine, but I want to exclude categories from the resulting aggregation that don't have level set to 1 (in other words, I only want to keep buckets for categories where level is 1).
I am aware that elasticsearch provides "exclude" and "include" parameters, but these only work on the same field I'm aggregating on (deal.category.id in this case).
This is my sample deal document:
{
"_source": {
"id": 392745,
"category": [
{
"id": 17575,
"level": 2
},
{
"id": 17574,
"level": 1
},
{
"id": 17572,
"level": 0
}
],
"stats": {
"available_stock": 500
}
}
}
And this would be the query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
}
}
},
"aggs": {
"mainAggregation": {
"terms": {
"field": "deal.category.id",
"order": {
"available_stock": "desc"
},
"size": 3
},
"aggs": {
"available_stock": {
"sum": {
"field": "deal.stats.available_stock"
}
}
}
}
},
"size": 0
}
And my resulting aggregation, sadly including category 17572 with level 0.
{
"aggregations": {
"mainAggregation": {
"buckets": [
{
"key": 17572,
"doc_count": 30,
"available_stock": {
"value": 24000
}
},
{
"key": 17598,
"doc_count": 10,
"available_stock": {
"value": 12000
}
},
{
"key": 17602,
"doc_count": 8,
"available_stock": {
"value": 6000
}
}
]
}
}
}
P.S.: Currently on ElasticSearch 1.6
Update 1: Still stuck on the problem after experimenting with various combinations of sub-aggregations.
I have found this impossible to solve in a single query and decided to go with two separate queries.
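The post does not show how the two queries were combined, so the following is only one possible sketch of that approach. It assumes a first query (not shown) has already produced the set of category ids whose level is 1, and that the aggregation was requested with a "size" large enough to survive the filtering. The function name keep_level_one and the ids in the usage note are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def keep_level_one(
    buckets: List[Dict[str, Any]],
    level_one_ids: Iterable[int],
    limit: int = 3,
) -> List[Dict[str, Any]]:
    """Keep only buckets whose key is a level-1 category id.

    The buckets are assumed to be already ordered by available_stock
    (descending), as requested in the aggregation, so the original
    order is preserved and only the first `limit` matches are returned.
    """
    allowed = set(level_one_ids)
    kept = [bucket for bucket in buckets if bucket["key"] in allowed]
    return kept[:limit]
```

For example, with the buckets from the response above and an assumed level-1 set of {17598, 17602}, the bucket for 17572 would be dropped and the other two kept in their original order.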

ElasticSearch calculate percentage for each bucket from total

I'm using ElasticSearch v5. I'm trying to do something similar to what is described in "Elasticsearch analytics percent": I have a terms aggregation and I want to calculate a percentage, which is the value from each bucket over the total of all buckets. This is my request:
{
"query": {
"match_all": {}
},
"aggs": {
"periods": {
"terms": {
"field": "periods",
"size": 3
},
"aggs": {
"balance": {
"sum": {
"field": "balance"
}
}
}
},
"total_balance": {
"sum_bucket": {
"buckets_path": "periods>balance"
}
}
}
}
The result I get back looks like this:
{
"aggregations": {
"periods": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 1018940846,
"buckets": [
{
"key": 1177977600000,
"doc_count": 11615418,
"balance": {
"value": 2492032741768.1616
}
},
{
"key": 1185926400000,
"doc_count": 11592425,
"balance": {
"value": 2575365325406.6533
}
},
{
"key": 1175385600000,
"doc_count": 11477402,
"balance": {
"value": 2456256695380.8306
}
}
]
},
"total_balance": {
"value": 7523654762555.645
}
}
}
How do I calculate "balance"/"total_balance" for each item in the bucket from ElasticSearch? I tried a bucket script aggregation at the bucket (periods) level, but I cannot set my buckets_path to total_balance. This post https://discuss.elastic.co/t/combining-two-aggregations-to-get-term-percentage/22201 talks about using the Significant Terms Aggregation, but I need the calculation to use specific fields, not doc_count. I know I can do this as a simple calculation on the client side, but I would like to do it all in ElasticSearch if possible.
No, you can't do that. At the time of writing, the current version is 6.1.
According to
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html#buckets-path-syntax,
there are only two major types of pipeline aggregations: parent and sibling.
So, in order to reference the total_balance aggregation from within the periods buckets, we would need to reference an "uncle" aggregation from the buckets_path attribute, which is not possible.
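That leaves the client-side calculation mentioned in the question. A minimal sketch, assuming the "aggregations" object from the response has been parsed into a Python dict and uses the names from the request (periods, balance, total_balance); the function name balance_shares is hypothetical.

```python
from typing import Any, Dict


def balance_shares(aggregations: Dict[str, Any]) -> Dict[Any, float]:
    """Return each bucket's balance as a fraction of total_balance."""
    total = aggregations["total_balance"]["value"]
    shares: Dict[Any, float] = {}
    for bucket in aggregations["periods"]["buckets"]:
        # Guard against a zero or missing total to avoid dividing by zero.
        shares[bucket["key"]] = bucket["balance"]["value"] / total if total else 0.0
    return shares
```

Keep in mind that in the sample response total_balance equals the sum of the three returned buckets, so the fractions are relative to the returned buckets only, not to the documents counted in sum_other_doc_count.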

How to get an Elasticsearch aggregation with multiple fields

I'm attempting to find related tags to the one currently being viewed. Every document in our index is tagged. Each tag is formed of two parts - an ID and text name:
{
...
meta: {
...
tags: [
{
id: 123,
name: 'Biscuits'
},
{
id: 456,
name: 'Cakes'
},
{
id: 789,
name: 'Breads'
}
]
}
}
To fetch the related tags I am simply querying the documents and getting an aggregate of their tags:
{
"query": {
"bool": {
"must": [
{
"match": {
"item.meta.tags.id": "123"
}
},
{
...
}
]
}
},
"aggs": {
"baked_goods": {
"terms": {
"field": "item.meta.tags.id",
"min_doc_count": 2
}
}
}
}
This works perfectly, I am getting the results I want. However, I require both the tag ID and name to do anything useful. I have explored how to accomplish this, the solutions seem to be:
1. Combine the fields when indexing
2. A script to munge together the fields
3. A nested aggregation
Options 1 and 2 are not available to me, so I have been going with option 3, but it's not responding as expected. Given the following query (still searching for documents also tagged with 'Biscuits'):
{
...
"aggs": {
"baked_goods": {
"terms": {
"field": "item.meta.tags.id",
"min_doc_count": 2
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.name"
}
}
}
}
}
}
I will get this result:
{
...
"aggregations": {
"baked_goods": {
"buckets": [
{
"key": "456",
"doc_count": 11,
"name": {
"buckets": [
{
"key": "Biscuits",
"doc_count": 11
},
{
"key": "Cakes",
"doc_count": 11
}
]
}
}
]
}
}
}
The nested aggregation includes both the search term and the tag I'm after (returned in alphabetical order).
I have tried to mitigate this by adding an exclude to the nested aggregation but this slowed the query down far too much (around 100 times for 500000 docs). So far the fastest solution is to de-dupe the result manually.
What is the best way to get an aggregation of tags with both the tag ID and tag name in the response?
Thanks for making it this far!
By the looks of it, your tags field is not mapped as nested.
For this aggregation to work, it needs to be nested so that there is an association between an id and a name. Without nested, the list of ids is just one array and the list of names is another:
"item": {
"properties": {
"meta": {
"properties": {
"tags": {
"type": "nested", <-- nested field
"include_in_parent": true, <-- to, also, keep the flat array-like structure
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
}
}
}
}
}
}
}
Also, note that I've added "include_in_parent": true to the mapping, which means that your nested tags will also behave like a "flat" array-like structure.
So, everything you had so far in your queries will still work without any changes to the queries.
But, for this particular query of yours, the aggregation needs to change to something like this:
{
"aggs": {
"baked_goods": {
"nested": {
"path": "item.meta.tags"
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.id"
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.name"
}
}
}
}
}
}
}
}
And the result is like this:
"aggregations": {
"baked_goods": {
"doc_count": 9,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 123,
"doc_count": 3,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "biscuits",
"doc_count": 3
}
]
}
},
{
"key": 456,
"doc_count": 2,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cakes",
"doc_count": 2
}
]
}
},
.....
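Once the response has this shape, pairing each id with its name is a small client-side step. The sketch below assumes the "aggregations" object is parsed into a Python dict and uses the aggregation names from the answer (baked_goods, and "name" at both levels). The function name tag_pairs is hypothetical, and it takes the first name bucket for each id, on the assumption that each id has exactly one name.

```python
from typing import Any, Dict, List


def tag_pairs(aggregations: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the nested id -> name buckets into id/name/doc_count rows."""
    pairs: List[Dict[str, Any]] = []
    for id_bucket in aggregations["baked_goods"]["name"]["buckets"]:
        name_buckets = id_bucket["name"]["buckets"]
        pairs.append(
            {
                "id": id_bucket["key"],
                # Assumes one name per id; None if no name bucket came back.
                "name": name_buckets[0]["key"] if name_buckets else None,
                "doc_count": id_bucket["doc_count"],
            }
        )
    return pairs
```

The names in the sample response are lowercased ("biscuits", "cakes"), presumably because the name field is analyzed; if the original casing matters, that is something to check in the mapping.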

ElasticSearch aggregation function

Is it possible to define a custom aggregation function in Elasticsearch?
E.g. for data:
author weekday status
me monday ok
me tuesday ok
me monday bad
I want to get an aggregation based on author and weekday, and as a value I want to get concatenation of status field:
agg1 agg2 value
me monday ok,bad
me tuesday ok
I know you can do count, but is it possible to define another function to use for aggregation?
EDIT/ANSWER: It looks like there is no multi-row aggregation support in ES, so we had to use sub-aggregations on the last field (see Akshay's example). If you need a more complex aggregation function, then aggregate by id (note that you won't be able to use _id, so you'll have to duplicate it in another field); that way you'll be able to do advanced aggregation on the individual items in each bucket.
You can get roughly what you want by using sub-aggregations, available in 1.0. Assuming the documents are structured as author, weekday and status, you could use the aggregation below:
{
"size": 0,
"aggs": {
"author": {
"terms": {
"field": "author"
},
"aggs": {
"days": {
"terms": {
"field": "weekday"
},
"aggs": {
"status": {
"terms": {
"field": "status"
}
}
}
}
}
}
}
}
Which gives you the following result:
{
...
"aggregations": {
"author": {
"buckets": [
{
"key": "me",
"doc_count": 3,
"days": {
"buckets": [
{
"key": "monday",
"doc_count": 2,
"status": {
"buckets": [
{
"key": "bad",
"doc_count": 1
},
{
"key": "ok",
"doc_count": 1
}
]
}
},
{
"key": "tuesday",
"doc_count": 1,
"status": {
"buckets": [
{
"key": "ok",
"doc_count": 1
}
]
}
}
]
}
}
]
}
}
}
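The concatenated value from the question can then be assembled on the client. This is a minimal sketch, assuming the "aggregations" object is parsed into a Python dict with the aggregation names used above (author, days, status); the function name concatenated_statuses is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def concatenated_statuses(
    aggregations: Dict[str, Any], sep: str = ","
) -> List[Tuple[str, str, str]]:
    """Return (author, weekday, joined statuses) rows from the nested buckets."""
    rows: List[Tuple[str, str, str]] = []
    for author in aggregations["author"]["buckets"]:
        for day in author["days"]["buckets"]:
            # Statuses are joined in the order the buckets were returned.
            statuses = sep.join(s["key"] for s in day["status"]["buckets"])
            rows.append((author["key"], day["key"], statuses))
    return rows
```

Because the statuses come from terms buckets, each distinct status appears once per group and in bucket order, so the monday row for the sample response reads "bad,ok" rather than "ok,bad".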
