What should be the bucket path for nested term aggregation? - elasticsearch

I want to run a pipeline aggregation on top of my Elasticsearch aggregation. Here is my query body:
{
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
}
}
},
"price_percentile": {
"percentiles_bucket": {
"buckets_path": "user_info.product_info.total_item_price"
}
}
}
}
This gives me the error:
No aggregation found for path [user_info.product_info.total_item_price]
What should the buckets_path be when the aggregations are nested like this? Or is it not possible to compute percentiles for such a bucket arrangement in Elasticsearch?
P.S. I am using Elasticsearch 6.5.

The answer by jzzfs is also partly right. I took a different approach: I reversed the nesting of my aggregations, and that fulfilled my use case. In general, though, you can't compute percentiles across nested buckets for now.
{
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "user_info>total_item_price"
}
}
}
}
}
}
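Because percentiles_bucket cannot follow a path through two levels of buckets, another fallback is to compute the percentiles client-side from the response of the original nested query (with the price_percentile aggregation removed). The following is a minimal Python sketch, not a definitive implementation: the helper names and the sample response are hypothetical, and it uses a simple nearest-rank rule, which may differ from how Elasticsearch selects percentile values.

```python
import math


def flatten_sums(response):
    """Collect total_item_price from every product bucket of every user bucket."""
    values = []
    for user_bucket in response["aggregations"]["user_info"]["buckets"]:
        for product_bucket in user_bucket["product_info"]["buckets"]:
            values.append(product_bucket["total_item_price"]["value"])
    return values


def nearest_rank_percentile(values, percent):
    """Return the smallest value with at least `percent`% of the data at or below it."""
    if not values:
        raise ValueError("cannot take a percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(percent / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response fragment, shaped like the output of the nested terms query.
sample_response = {
    "aggregations": {
        "user_info": {
            "buckets": [
                {"key": 1, "product_info": {"buckets": [
                    {"key": "a", "total_item_price": {"value": 10.0}},
                    {"key": "b", "total_item_price": {"value": 40.0}},
                ]}},
                {"key": 2, "product_info": {"buckets": [
                    {"key": "a", "total_item_price": {"value": 20.0}},
                    {"key": "c", "total_item_price": {"value": 30.0}},
                ]}},
            ]
        }
    }
}

sums = flatten_sums(sample_response)
median = nearest_rank_percentile(sums, 50)
```

Keep in mind that a terms aggregation returns only its top buckets (10 by default), so the size of both terms aggregations would need to be raised for the client-side result to cover all users and products.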

First, don't use dots in the path -- use > instead:
GET stack/_search
{
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "user_info>product_info>total_item_price"
}
}
}
}
This yields "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [Object[]] at aggregation [product_info]", so it won't work: the path crosses two levels of buckets, so each user_info bucket resolves to an array of values instead of a single number.
Here are our options:
Aggregate globally, bucketing only by product (without the users):
GET stack/_search
{
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "product_info>total_item_price"
}
}
}
}
Use a filter aggregation per user to mimic the original intent. The aggregation name (user_123) is kept consistent with the user_id value that the term filter matches:
GET stack/_search
{
"aggs": {
"user_123": {
"filter": {
"term": {
"user_id": 123
}
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "product_info>total_item_price"
}
}
}
}
}
}
You can then add as many user_xyz filter aggregations as you like, provided you gather the user IDs beforehand.
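If the user IDs are known up front, the request body can be generated rather than written by hand. Here is a minimal Python sketch under that assumption; the function name is hypothetical, and "size": 0 is an optional addition that merely suppresses the search hits.

```python
import json


def build_user_percentile_body(user_ids):
    """Build one filter aggregation per user, each with its own percentiles_bucket."""
    aggs = {}
    for user_id in user_ids:
        aggs[f"user_{user_id}"] = {
            "filter": {"term": {"user_id": user_id}},
            "aggs": {
                "product_info": {
                    "terms": {"field": "product_id"},
                    "aggs": {
                        "total_item_price": {"sum": {"field": "selling_price"}}
                    },
                },
                # Sibling pipeline aggregation over this user's product buckets.
                "pb": {
                    "percentiles_bucket": {
                        "buckets_path": "product_info>total_item_price"
                    }
                },
            },
        }
    # "size": 0 is an assumption: it only skips returning the matching documents.
    return {"size": 0, "aggs": aggs}


body = build_user_percentile_body([123, 456])
request_json = json.dumps(body, indent=2)
```

The resulting request_json string can be sent as the body of the same GET stack/_search request shown above.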

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I suggest using a bucket_script aggregation. As far as I know, it has to run as a sub-aggregation of a multi-bucket aggregation such as a histogram. Since you have a suitable field in your mapping, I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it doesn't work for you.
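To make the two scripts concrete, here is the same arithmetic as a small Python sketch. The function name and the sample counts are hypothetical, and the zero-total guard is an addition that the scripts above do not perform.

```python
def role_shares(manager_count, lead_count):
    """Return (manager %, lead %) of the combined count, or None when both are zero."""
    total = manager_count + lead_count
    if total == 0:
        # Guard against division by zero; not part of the original scripts.
        return None
    return (manager_count / total * 100, lead_count / total * 100)


# Hypothetical counts for one date_histogram bucket.
manager_share, lead_share = role_shares(3, 1)
```

Note that this only measures the share of each type per bucket; it does not change which documents the search returns.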

How can I use the parent's sibling bucket path in bucket_script?

I want to use the parent's sibling bucket path in a bucket_script. The DSL looks like this:
{
"size": 0,
"aggs": {
"car_type": {
"terms": {
"field": "screenName",
"size": 10
},
"aggs": {
"active_num": {
"terms": {
"field": "activeNum",
"size": 10
},
"aggs": {
"active_count": {
"value_count": {
"field": "activeNum"
}
},
"result" : {
"bucket_script": {
"buckets_path": {
"count1" : "car_type>all_count",
"count2" : "active_count"
},
"script": "params.count2/params.count1"
}
}
}
},
"all_count": {
"value_count": {
"field": "activeNum"
}
}
}
}
}
}
I want to use all_count in result, but Elasticsearch throws an exception:
No aggregation found for path [car_type>all_count]
Then I moved the bucket_script up one level, like this:
{
"size": 0,
"aggs": {
"car_type": {
"terms": {
"field": "screenName",
"size": 10
},
"aggs": {
"active_num": {
"terms": {
"field": "activeNum",
"size": 10
},
"aggs": {
"active_count": {
"value_count": {
"field": "activeNum"
}
}
}
},
"all_count": {
"value_count": {
"field": "activeNum"
}
},
"result" : {
"bucket_script": {
"buckets_path": {
"count1" : "all_count",
"count2" : "active_num>active_count"
},
"script": "params.count2/params.count1"
}
}
}
}
}
}
But I get another exception:
buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.Object[]
I have read the official documentation page, but it did not help.
How should I write this buckets_path?
The following approach solves part of the problem:
{
"size": 0,
"aggs": {
"car_type": {
"terms": {
"field": "screenName",
"size": 10
},
"aggs": {
"active_num": {
"filter": {
"term": {
"activeNum": "1"
}
},
"aggs": {
"active_count": {
"value_count": {
"field": "activeNum"
}
}
}
},
"all_count": {
"value_count": {
"field": "activeNum"
}
},
"result" : {
"bucket_script": {
"buckets_path": {
"count1" : "all_count",
"count2" : "active_num>active_count"
},
"script": "params.count2/params.count1"
}
}
}
}
}
}
But a problem remains: this approach only produces the result for a single value of the field, not for every value.
So, does anyone have other ideas?
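One possible fallback, if no pipeline path can express this, is to run the terms-based query without the bucket_script and do the division client-side. The Python sketch below assumes a response shaped like the output of that query (car_type buckets containing active_num buckets and an all_count metric); the function name and sample data are hypothetical.

```python
def active_ratios(response):
    """Map each car_type key to {activeNum value: active_count / all_count}."""
    ratios = {}
    for car_bucket in response["aggregations"]["car_type"]["buckets"]:
        total = car_bucket["all_count"]["value"]
        per_value = {}
        for active_bucket in car_bucket["active_num"]["buckets"]:
            count = active_bucket["active_count"]["value"]
            # Avoid dividing by zero when a car_type bucket has no counted values.
            per_value[active_bucket["key"]] = count / total if total else None
        ratios[car_bucket["key"]] = per_value
    return ratios


# Hypothetical response fragment for illustration only.
sample_response = {
    "aggregations": {
        "car_type": {
            "buckets": [
                {
                    "key": "sedan",
                    "all_count": {"value": 4},
                    "active_num": {"buckets": [
                        {"key": 0, "active_count": {"value": 1}},
                        {"key": 1, "active_count": {"value": 3}},
                    ]},
                }
            ]
        }
    }
}

ratios = active_ratios(sample_response)
```

This yields a ratio for every value of activeNum within each car_type bucket, at the cost of moving the computation out of Elasticsearch.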

How to display only the key from the bucket

I have an index with millions of documents. Suppose each of my documents has some code, and I need to find the list of codes matching some criteria. The only way I found to do that is to use a whole lot of aggregations, so I created an ugly query that does exactly what I want:
POST my-index/_search
{
"query": {
"range": {
"timestamp": {
"gte": "2017-08-01T00:00:00.000",
"lt": "2017-08-08T00:00:00.000"
}
}
},
"size": 0,
"aggs": {
"codes": {
"terms": {
"field": "code",
"size": 10000
},
"aggs": {
"days": {
"date_histogram": {
"field": "timestamp",
"interval": "day",
"format": "dd"
},
"aggs": {
"hours": {
"date_histogram": {
"field": "timestamp",
"interval": "hour",
"format": "yyyy-MM-dd:HH"
},
"aggs": {
"hour_income": {
"sum": {
"field": "price"
}
}
}
},
"max_income": {
"max_bucket": {
"buckets_path": "hours>hour_income"
}
},
"day_income": {
"sum_bucket": {
"buckets_path": "hours.hour_income"
}
},
"more_than_sixty_percent": {
"bucket_script": {
"buckets_path": {
"dayIncome": "day_income",
"maxIncome": "max_income"
},
"script": "params.maxIncome - params.dayIncome * 60 / 100 > 0 ? 1 : 0"
}
}
}
},
"amount_of_days": {
"sum_bucket": {
"buckets_path": "days.more_than_sixty_percent"
}
},
"bucket_filter": {
"bucket_selector": {
"buckets_path": {
"amountOfDays": "amount_of_days"
},
"script": "params.amountOfDays >= 3"
}
}
}
}
}
}
The response I get is a few million lines of JSON, consisting of buckets. Each bucket has more than 700 lines (and buckets of its own), but all I need is its key, so that I have my list of codes. I guess it's not good to have a response a few thousand times larger than necessary, and there might be problems with parsing. So I wanted to ask: is there any way to hide the other info in the bucket and get only the keys?
Thanks.
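Whatever is done on the server side, extracting the keys on the client is straightforward. Here is a minimal Python sketch with a hypothetical helper name and sample response. Separately, Elasticsearch's response filtering (the filter_path query parameter, for example filter_path=aggregations.codes.buckets.key) may be able to trim the response before it is sent; treat that as a suggestion to verify against your version rather than a tested answer.

```python
def bucket_keys(response, agg_name="codes"):
    """Return only the bucket keys of a top-level multi-bucket aggregation."""
    return [bucket["key"] for bucket in response["aggregations"][agg_name]["buckets"]]


# Hypothetical, heavily shortened response: real buckets also carry the
# days/hours sub-aggregations, which are simply ignored here.
sample_response = {
    "aggregations": {
        "codes": {
            "buckets": [
                {"key": "A1", "doc_count": 42, "amount_of_days": {"value": 3.0}},
                {"key": "B2", "doc_count": 17, "amount_of_days": {"value": 5.0}},
            ]
        }
    }
}

codes = bucket_keys(sample_response)
```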

Is it possible to build an Elasticsearch 'bucket_selector' query with bodybuilder?

Using 'bodybuilder', is it possible to build a 'bucket_selector' aggregation?
I'm trying to get something like this:
"aggs": {
"aggname1": {
"terms": {
"field": "field1",
"size": 5
},
"aggs": {
"aggname2": {
"avg": {
"field": "field2"
}
},
"a_bucker_filter": {
"bucket_selector": {
"buckets_path": {
"bucker_selector_name": "aggname2"
},
"script": [script]
}
}
}
}
}

For each country/colour/brand combination, find the sum of the number of items in Elasticsearch

This is a portion of the data I have indexed in elasticsearch:
{
"country" : "India",
"colour" : "white",
"brand" : "sony",
"numberOfItems" : 3
}
I want to get the total sum of numberOfItems on a per country basis, per colour basis and per brand basis. Is there any way to do this in elasticsearch?
The following should get you straight to the answer.
Make sure scripting is enabled before using it.
{
"aggs": {
"keys": {
"terms": {
"script": "doc['country'].value + doc['colour'].value + doc['brand'].value"
},
"aggs": {
"keySum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
To get a single result, you may apply a sum aggregation to a filtered query with a term (or terms) filter, e.g.:
{
"query": {
"filtered": {
"filter": {
"term": {
"country": "India"
}
}
}
},
"aggs": {
"total_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
To get statistics for all countries/colours/brands in a single pass over the data, you may use the following query with 3 multi-bucket aggregations, each of them containing a single-value sum metric sub-aggregation:
{
"query": {
"match_all": {}
},
"aggs": {
"countries": {
"terms": {
"field": "country"
},
"aggs": {
"country_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"colours": {
"terms": {
"field": "colour"
},
"aggs": {
"colour_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"brands": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
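To illustrate what these aggregations compute, here is a small Python sketch that performs the same grouping in memory. The function name and the sample documents are hypothetical; grouping by one field mirrors the three separate terms aggregations, while grouping by all three fields mirrors the combined-key approach from the scripted answer.

```python
from collections import defaultdict


def sum_items_by(docs, fields):
    """Sum numberOfItems per distinct combination of the given fields."""
    totals = defaultdict(int)
    for doc in docs:
        key = tuple(doc[field] for field in fields)
        totals[key] += doc["numberOfItems"]
    return dict(totals)


# Hypothetical documents shaped like the indexed sample above.
docs = [
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 3},
    {"country": "India", "colour": "black", "brand": "sony", "numberOfItems": 2},
    {"country": "Japan", "colour": "white", "brand": "lg", "numberOfItems": 4},
]

per_country = sum_items_by(docs, ("country",))
per_combination = sum_items_by(docs, ("country", "colour", "brand"))
```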

Resources