For each country/colour/brand combination , find sum of number of items in elasticsearch - elasticsearch

This is a portion of the data I have indexed in elasticsearch:
{
"country" : "India",
"colour" : "white",
"brand" : "sony"
"numberOfItems" : 3
}
I want to get the total sum of numberOfItems on a per country basis, per colour basis and per brand basis. Is there any way to do this in elasticsearch?

The following should land you straight to the answer.
Make sure you enable scripting before using it.
{
"aggs": {
"keys": {
"terms": {
"script": "doc['country'].value + doc['color'].value + doc['brand'].value"
},
"aggs": {
"keySum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}

To get a single result you may use sum aggregation applied to a filtered query with term (terms) filter, e.g.:
{
"query": {
"filtered": {
"filter": {
"term": {
"country": "India"
}
}
}
},
"aggs": {
"total_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
To get statistics for all countries/colours/brands in a single pass over the data you may use the following query with 3 multi-bucket aggregations, each of them containing a single-bucket sum sub-aggregation:
{
"query": {
"match_all": {}
},
"aggs": {
"countries": {
"terms": {
"field": "country"
},
"aggs": {
"country_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"colours": {
"terms": {
"field": "colour"
},
"aggs": {
"colour_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"brands": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I want to make a suggestion to use bucket_path aggregation. As I know this aggregation needs to be run in sub-aggs of a histogram aggregation. As you have such field in your mapping so I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it didn't work well for you.

What should be the bucket path for nested term aggregation?

I want to do pipeline aggregation on my elasticsearch aggregation. Here is my query body
{
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
}
}
},
"price_percentile": {
"percentiles_bucket": {
"buckets_path": "user_info.product_info.total_item_price"
}
}
}
}
This is giving me error that
No aggregation found for path [user_info.product_info.total_item_price]
What should be the path for bucket if such nested aggregation is there? Or is it not possible to find percentiles for such bucket arrangement in elasticsearch.
P.S I am using elasticsearch 6.5
#jzzfs answer is also somewhat right. I approached it in a different way. I reversed my aggregations and it fulfilled my use case. But in general, you can't do nested bucket percentiles for now.
{
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "user_info>total_item_price"
}
}
}
}
}
}
First, don't use dots in the path -- use > instead:
GET stack/_search
{
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "user_info>product_info>total_item_price"
}
}
}
}
which yields "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [Object[]] at aggregation [product_info]" so it's not gonna work.
Here are our options:
Aggregate globally but just under the bucketed product info (without the users):
GET stack/_search
{
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "product_info>total_item_price"
}
}
}
}
Use filtered aggregations in order to mimic the original intent:
GET stack/_search
{
"aggs": {
"user_123": { <-- keeping the agg name consistent w/ the filter
"filter": {
"term": {
"user_id": 123 <-- actual filter
}
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "product_info>total_item_price"
}
}
}
}
}
}
You can then have as many user_xyz subaggregations as you like -- provided you gather their IDs beforehand.

How can I use the parent's sibling bucket path in bucket_script?

I want to use the parent's sibling bucket path in bucket-script,the DSL like this:
{
"size": 0,
"aggs": {
"car_type": {
"terms": {
"field": "screenName",
"size": 10
},
"aggs": {
"active_num": {
"terms": {
"field": "activeNum",
"size": 10
},
"aggs": {
"active_count": {
"value_count": {
"field": "activeNum"
}
},
"result" : {
"bucket_script": {
"buckets_path": {
"count1" : "car_type>all_count",
"count2" : "active_count"
},
"script": "params.count2/params.count1"
}
}
}
},
"all_count": {
"value_count": {
"field": "activeNum"
}
}
}
}
}
}
I want to use all_count in result, but es will throw Exception:
No aggregation found for path [car_type>all_count]
Then,I change the place to use bucket_script,like this:
{
"size": 0,
"aggs": {
"car_type": {
"terms": {
"field": "screenName",
"size": 10
},
"aggs": {
"active_num": {
"terms": {
"field": "activeNum",
"size": 10
},
"aggs": {
"active_count": {
"value_count": {
"field": "activeNum"
}
}
}
},
"all_count": {
"value_count": {
"field": "activeNum"
}
},
"result" : {
"bucket_script": {
"buckets_path": {
"count1" : "all_count",
"count2" : "active_num>active_count"
},
"script": "params.count2/params.count1"
}
}
}
}
}
}
but I get another Exception:
buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.Object[]
I have found the official website page, but I get nothing.
How can I use this bucket_path?
Using the following method can resolve some problems:
{
"size": 0,
"aggs": {
"car_type": {
"terms": {
"field": "screenName",
"size": 10
},
"aggs": {
"active_num": {
"filter": {
"term": {
"activeNum": "1"
}
},
"aggs": {
"active_count": {
"value_count": {
"field": "activeNum"
}
}
}
},
"all_count": {
"value_count": {
"field": "activeNum"
}
},
"result" : {
"bucket_script": {
"buckets_path": {
"count1" : "all_count",
"count2" : "active_num>active_count"
},
"script": "params.count2/params.count1"
}
}
}
}
}
}
But there still has some problems,this method can only get one value's result,can not get every value of this field.
So, who has any other ideas?

Elasticsearch: Aggregation on filtered nested objects to find unique values

I have an array of objects (tags) in each document in Elasticsearch 5:
{
"tags": [
{ "key": "tag1", "value": "val1" },
{ "key": "tag2", "value": "val2" },
...
]
}
Now I want to find unique tag values for a certain tag key. Something similiar to this SQL query:
SELECT DISTINCT(tags.value) FROM tags WHERE tags.key='some-key'
I have came to this DSL so far:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"filter" : { "terms": { "tags.key": "tag1" } },
"aggs": {
"my_tags_values": {
"terms" : {
"field" : "tags.value",
"size": 9999
}
}
}
}
}
}
}
But It is showing me this error:
[terms] unknown field [tags.key], parser not found.
Is this the right approach to solve the problem? Thanks for your help.
Note: I have declared the tags field as a nested field in my mapping.
You mixed up things there. You wanted probably to add a filter aggregation, but you didn't give it any name:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"terms": {
"tags.key": [
"tag1"
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 9999
}
}
}
}
}
}
}
}
Try Bool Query inside the Filter-Aggregation:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"filter": {
"bool": {
"must": [
{
"term": {
"tags.key": "tag1"
}
}
]
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 0
}
}
}
}
}
}
}
}
BTW: if you want to retrieve all buckets, you can write 0 instead of 9999 in aggregation size.

Elasticsearch - get terms aggregation for specified fields

I am using terms aggregations to get all the no of users from each city
{
"aggs" : {
"cities" : {
"terms" : { "field" : "city.name" }
}
}
}
This is giving results. But I always want to get some specific cities in results of aggregation irrespective of whether they are in top 10 or not. Do I need to use filter aggregation for each of the city separately to get its result?
You have three solutions:
A. You can specify a filter in the query:
{
"query": {
"terms": {
"city.name": [ "city1", "city2", "city3" ]
}
},
"aggs": {
"cities": {
"terms": {
"field": "city.name"
}
}
}
}
B. You can specify a filter in the aggregations:
{
"aggs": {
"city_filter": {
"filter": {
"terms": {
"city.name": [
"city1",
"city2",
"city3"
]
}
},
"aggs": {
"cities": {
"terms": {
"field": "city.name"
}
}
}
}
}
}
C. You can filter values in the terms aggregation:
{
"aggs": {
"cities": {
"terms": {
"field": "city.name",
"include": "city1*",
"exclude": "city2*"
}
}
}
}

Resources