ElasticSearch calculate percentage for each bucket from total - elasticsearch

I'm using ElasticSearch v5. I'm trying to do something similar described in Elasticsearch analytics percent where I have a terms aggregation and I want to calculate a percentage which is a value from each bucket over the total of all buckets. This is my request:
{
"query": {
"match_all": {}
},
"aggs": {
"periods": {
"terms": {
"field": "periods",
"size": 3
},
"aggs": {
"balance": {
"sum": {
"field": "balance"
}
}
}
},
"total_balance": {
"sum_bucket": {
"buckets_path": "periods>balance"
}
}
}
}
The result I get back this like this:
{
"aggregations": {
"periods": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 1018940846,
"buckets": [
{
"key": 1177977600000,
"doc_count": 11615418,
"balance": {
"value": 2492032741768.1616
}
},
{
"key": 1185926400000,
"doc_count": 11592425,
"balance": {
"value": 2575365325406.6533
}
},
{
"key": 1175385600000,
"doc_count": 11477402,
"balance": {
"value": 2456256695380.8306
}
}
]
},
"total_balance": {
"value": 7523654762555.645
}
}
}
How do I calculate "balance"/"total_balance" for each item in the bucket from ElasticSearch? I tried bucket script aggregation at the bucket (periods) level, but I cannot set my buckets_path to total_balance. This post https://discuss.elastic.co/t/combining-two-aggregations-to-get-term-percentage/22201 talks about using Significant Terms Aggregation, but I need calculation of using specific fields, not doc_count. I know I can do this as a simple calculation on the client side, but I would like to do this all together in ElasticSearch if possible.

No, you can't do that. By the time I'm writing this post, we're in version 6.1.
According to
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html#buckets-path-syntax,
there's only two major types of aggregations pipelines: parent and siblings.
So, in order to reference the total_balance aggregation from within the periods buckets, we should be able to reference an "uncle" aggregation from the buckets_path attribute, which is not possible.

Related

Filter elasticsearch bucket aggregation based on term field

I have a list of products (deal entities) and I'm attempting to create a bucket aggregation by categories, ordered by the sum of available_stock.
This all works fine, but I want to exclude such categories from the resulting aggregation that don't have level set to 1 (In other words, I only want to keep aggregations on category where level IS 1).
I am aware that elasticsearch provides "exclude" and "include" parameters, but these only work on the same field I'm aggregating on (deal.category.id in this case)
This is my sample deal document:
{
"_source": {
"id": 392745,
"category": [
{
"id": 17575,
"level": 2
},
{
"id": 17574,
"level": 1
},
{
"id": 17572,
"level": 0
}
],
"stats": {
"available_stock": 500
}
}
}
And this would be the query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
}
},
"aggs": {
"mainAggregation": {
"terms": {
"field": "deal.category.id",
"order": {
"available_stock": "desc"
},
"size": 3
},
"aggs": {
"available_stock": {
"sum": {
"field": "deal.stats.available_stock"
}
}
}
}
},
"size": 0
}
And my resulting aggregation, sadly including category 17572 with level 0.
{
"aggregations": {
"mainAggregation": {
"buckets": [
{
"key": 17572,
"doc_count": 30,
"available_stock": {
"value": 24000
}
},
{
"key": 17598,
"doc_count": 10,
"available_stock": {
"value": 12000
}
},
{
"key": 17602,
"doc_count": 8,
"available_stock": {
"value": 6000
}
}
]
}
}
}
P.S.: Currently on ElasticSearch 1.6
Update 1: Still stuck on the problem after various experiments with various combimation of subaggregations.
I have found this impossible to solve and decided to go with two separate queries.

Getting description when aggregating with Elasticsearch

When we use the aggregation feature on elastic, we get a value of the field we aggregating back but we also want to get the description of that field. We have to use the sector.id as other parts of our api uses it later on.
For ex: our data looks like this:
[{
"id":"123"
"sectors":[{
"id":"sector-1",
"name":"Automotive"
}]
},
{
"id":"123"
"sectors":[{
"id":"sector-2",
"name":"Biology"
}]
}]
When we aggregate over sectors.id our response looks like:
"aggregations": {
"sector": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "sector-2",
"doc_count": 19672
},
{
"key": "sector-1",
"doc_count": 11699
}]
}
}
Is there any way to get sectors.name as well as the key in the results?
It seems like that sectors should a nested field. Now assuming that sector name is unique per sector-id.
You may use sub-aggregations to figure out the related keys
GET _search
{
"size": 0,
"aggs": {
"sectors": {
"nested": {
"path": "sectors"
},
"aggs": {
"sector_id": {
"terms": {
"field": "sectors.id"
},
"aggs": {
"sector_name": {
"terms": {
"field": "sectors.name"
}
}
}
}
}
}
}
}

ElasticSearch 2.1.0 - Deep 'children' aggregation with 'sum' metric returning empty results

I have a hierarchy of document types two levels deep. The documents are related by parent-child relationships as follows: category > sub_category > item i.e. each sub_category has a _parent field referring to a category id, and each item has a _parent field referring to a sub_category id.
Each item has a price field. Given a query for categories, which includes conditions for sub-categories and items, I want to calculate a total price for each sub_category.
My query looks something like this:
{
"query": {
"has_child": {
"child_type": "sub_category",
"query": {
"has_child": {
"child_type": "item",
"query": {
"range": {
"price": {
"gte": 100,
"lte": 150
}
}
}
}
}
}
}
}
My aggregation to calculate the price for each sub-category looks like this:
{
"aggs": {
"categories": {
"terms": {
"field": "id"
},
"aggs": {
"sub_categories": {
"children": {
"type": "sub_category"
},
"aggs": {
"sub_category_ids": {
"terms": {
"field": "id"
},
"aggs": {
"items": {
"children": {
"type": "item"
},
"aggs": {
"price": {
"sum": {
"field": "price"
}
}
}
}
}
}
}
}
}
}
}
}
Despite the query response listing matching results, the aggregation response doesn't match any items:
{
"aggregations": {
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category1",
"doc_count": 1,
"sub_categories": {
"doc_count": 3,
"sub_category_ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "subcat1",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
},
{
"key": "subcat2",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
},
{
"key": "subcat3",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
}
]
}
}
}]
}
}
}
However, omitting the sub_category_ids aggregation does cause the items to appear and for prices to be summed at the level of the categories aggregation. I would expect including the sub_category_ids aggregation to simply change the level at which the prices are summed.
Am I misunderstanding how the aggregation is evaluated, and if so how could I modify it to display the summed prices for each sub-category?
I opened an issue #15413, regarding children aggregation as I and other folks were facing similar issues in ES 2.0
Apparently the problem according to ES developer #martijnvg was that
The children agg makes an assumption (that all segments are being seen by children agg) that was true in 1.x but not in 2.x
PR #15457 fixed this issue, again from #martijnvg
Before we only evaluated segments that yielded matches in parent aggs, which caused us to miss to evaluate child docs in segments we didn't have parent matches for.
The fix for this is stop remember in what segments we have matches for
and simply evaluate all segments. This makes the code simpler and we
can still quickly see if a segment doesn't hold child docs like we did
before
This pull request has been merged and it has also been back ported to the 2.x, 2.1 and 2.0 branches.

Elasticsearch: how to scope aggregations to your query and filter?

I have been playing around with elasticsearch query and filter for some time now but never worked with aggregations before. The idea that we can scope the aggregations with our query seems quite amazing to me but I want to understand how to do it properly so that I do not make any mistakes. Currently all my search queries are designed this way:
{
"query": {
},
"filter": {
},
"from": 0,
"size": 60
}
Now, when I added some aggregation buckets, the structure became this:
{
"aggs": {
"all_colors": {
"terms": {
"field": "color.name"
}
},
"all_brands": {
"terms": {
"field": "brand_slug"
}
},
"all_sizes": {
"terms": {
"field": "sizes"
}
}
},
"query": {
},
"filter": {
},
"from": 0,
"size": 60
}
However, the results of the aggregation are always the same irrespective of what info I provide in filter.
Now, when I changed the query structure to something like this, it started showing different results:
{
"aggs": {
"all_colors": {
"terms": {
"field": "color.name"
}
},
"all_brands": {
"terms": {
"field": "brand_slug"
}
},
"all_sizes": {
"terms": {
"field": "sizes"
}
}
},
"query": {
"filtered": {
"query": {
},
"filter": {
}
}
},
"from": 0,
"size": 60
}
Does it mean I will have to change the structure of my search queries everywhere to this new filtered type of structure ? Is there any other workaround which allows me to achieve desired results without having to change that much of code ?
Also, another thing I observed is that if my brand_slug field contains multiple keywords like "peter england", then both of these are returned in separate buckets like this:
{
"buckets": [
{
"key": "england",
"doc_count": 368
},
{
"key": "peter",
"doc_count": 368
}
]
}
How can I ensure that both these end up in a same bucket like this:
{
"buckets": [
{
"key": "peter england",
"doc_count": 368
}
]
}
UPDATE: This second part I have been able to accomplish by indexing brand, color and sizes differently like this:
"sizes": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
What you've noticed is by design. Have a look at my answer to a similar question on SO. Basically, input to both aggregation and filter sections is the output of query section. Filtered Query as you've suggested would be the best way to achieve the results you desire. There is another way too. You can use Filter Aggregation. Then you would not need to change your query and filter sections but simply copy the filter section inside the aggregation sections but that in my opinion would be an overkill and a violation of the DRY principle in general.

ElasticSearch aggregation function

Is that a possible to define an aggregation function in elastic search?
E.g. for data:
author weekday status
me monday ok
me tuesday ok
me moday bad
I want to get an aggregation based on author and weekday, and as a value I want to get concatenation of status field:
agg1 agg2 value
me monday ok,bad
me tuesday ok
I know you can do count, but is that possible to define another function used for aggregation?
EDIT/ANSWER: Looks like there is no multirow aggregation support in ES, thus we had to use subaggregations on last field (see Akshay's example). If you need to have more complex aggregation function, then aggregate by id (note, you won't be able to use _id, so you'll have to duplicate it in other field) - that way you'll be able to do advanced aggregation on individual items in each bucket.
You can get get roughly what you want by using sub aggregations available in 1.0. Assuming the documents are structured as author, weekday and status, you could using the aggregation below:
{
"size": 0,
"aggs": {
"author": {
"terms": {
"field": "author"
},
"aggs": {
"days": {
"terms": {
"field": "weekday"
},
"aggs": {
"status": {
"terms": {
"field": "status"
}
}
}
}
}
}
}
}
Which gives you the following result:
{
...
"aggregations": {
"author": {
"buckets": [
{
"key": "me",
"doc_count": 3,
"days": {
"buckets": [
{
"key": "monday",
"doc_count": 2,
"status": {
"buckets": [
{
"key": "bad",
"doc_count": 1
},
{
"key": "ok",
"doc_count": 1
}
]
}
},
{
"key": "tuesday",
"doc_count": 1,
"status": {
"buckets": [
{
"key": "ok",
"doc_count": 1
}
]
}
}
]
}
}
]
}
}
}

Resources