bucket_script not working in Elasticsearch 2.4.2

I am trying to subtract the result of one aggregation from another:
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"total_query_id": {
"sum": {
"field": "query_id"
}
},
"total_num_results": {
"sum": {
"field": "num_results"
}
},
"minus_value": {
"bucket_script": {
"buckets_path": {
"qid": "total_query_id",
"nrs": "total_num_results"
},
"script": "qid - nrs"
}
}
}
}
It throws the error below:
"reason": "Invalid pipeline aggregation named [minus_value] of type [bucket_script]. Only sibling pipeline aggregations are allowed at the top level"
I have tried moving the minus_value node in and out of the aggs node, but that does not solve my problem.
Can anyone help me with this?

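The idea is that bucket_script is a parent pipeline aggregation, so it must be nested inside a parent bucket aggregation. As the error message says, only sibling pipeline aggregations are allowed at the top level.
That is not the case in your example, where all three aggregations sit at the top level, so you need to add a parent aggregation. Since you have a match_all query, you could try using a global bucket aggregation and embed your three aggregations inside it, like this: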
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"all": {
"global": {},
"aggs": {
"total_query_id": {
"sum": {
"field": "query_id"
}
},
"total_num_results": {
"sum": {
"field": "num_results"
}
},
"minus_value": {
"bucket_script": {
"buckets_path": {
"qid": "total_query_id",
"nrs": "total_num_results"
},
"script": "qid - nrs"
}
}
}
}
}
}
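If you build the request from code, the same nesting can be sketched as a small helper. This is only an illustration of the structure above: the function name build_difference_query and the parent_name parameter are made up for this example, and the script uses the 2.x style from the question (variables referenced directly).

```python
import json


def build_difference_query(minuend_field, subtrahend_field, parent_name="all"):
    """Nest two sum metrics and a bucket_script under one parent aggregation.

    Hypothetical helper: it only assembles the request body, it does not
    talk to Elasticsearch.
    """
    inner_aggs = {
        "total_query_id": {"sum": {"field": minuend_field}},
        "total_num_results": {"sum": {"field": subtrahend_field}},
        "minus_value": {
            "bucket_script": {
                # Paths are relative to the parent aggregation, so plain
                # sibling names are enough here.
                "buckets_path": {
                    "qid": "total_query_id",
                    "nrs": "total_num_results",
                },
                "script": "qid - nrs",
            }
        },
    }
    return {
        "query": {"match_all": {}},
        "size": 0,
        "aggs": {parent_name: {"global": {}, "aggs": inner_aggs}},
    }


body = build_difference_query("query_id", "num_results")
print(json.dumps(body, indent=2))
```

The point to notice is that minus_value no longer appears directly under the top-level aggs key, which is what the error message complains about.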

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I suggest using a bucket_script pipeline aggregation with a buckets_path. As far as I know, this aggregation has to run as a sub-aggregation of a multi-bucket aggregation such as a histogram. Since you have a date field in your mapping, I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it doesn't work for you.
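To make the arithmetic in the two scripts concrete, here is the same calculation as a rough Python sketch. The function name role_ratios and the zero-total guard are my own additions for illustration; the scripts in the query do not guard against an empty month.

```python
def role_ratios(manager_count, lead_count):
    """Return (manager %, lead %) for one histogram bucket.

    Mirrors the two bucket_script expressions, with the multiplication
    done first so that whole-number inputs give exact results.
    """
    total = manager_count + lead_count
    if total == 0:
        # Assumption: report "no data" instead of dividing by zero.
        return None
    return (100 * manager_count / total, 100 * lead_count / total)


print(role_ratios(3, 1))
```

Note that this only reports the share of each type per bucket. It does not change which documents the search returns.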

What should be the bucket path for nested term aggregation?

I want to run a pipeline aggregation on my Elasticsearch aggregation. Here is my query body:
{
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
}
}
},
"price_percentile": {
"percentiles_bucket": {
"buckets_path": "user_info.product_info.total_item_price"
}
}
}
}
This gives me the following error:
No aggregation found for path [user_info.product_info.total_item_price]
What should the buckets path be for nested aggregations like this? Or is it not possible to compute percentiles for such a bucket arrangement in Elasticsearch?
P.S. I am using Elasticsearch 6.5.
@jzzfs's answer is also somewhat right. I approached it in a different way: I reversed the order of my aggregations, and that fulfilled my use case. But in general, you can't do nested bucket percentiles for now.
{
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "user_info>total_item_price"
}
}
}
}
}
}
First, don't use dots in the path -- use > instead:
GET stack/_search
{
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "user_info>product_info>total_item_price"
}
}
}
}
This yields the error "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [Object[]] at aggregation [product_info]", so it is not going to work.
Here are our options:
Aggregate globally but just under the bucketed product info (without the users):
GET stack/_search
{
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "product_info>total_item_price"
}
}
}
}
Use filtered aggregations in order to mimic the original intent (the aggregation name user_123 is kept consistent with the filter, which matches user_id 123):
GET stack/_search
{
"aggs": {
"user_123": {
"filter": {
"term": {
"user_id": 123
}
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "product_info>total_item_price"
}
}
}
}
}
}
You can then have as many user_xyz subaggregations as you like -- provided you gather their IDs beforehand.
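If the user IDs are already known, the repeated user_xyz blocks can be generated rather than written by hand. A minimal sketch, assuming the field names from the question; the function name per_user_percentiles is hypothetical and the code only builds the request body.

```python
def per_user_percentiles(user_ids):
    """Build one filter aggregation per user ID.

    Each filter contains a terms aggregation on product_id plus a
    percentiles_bucket that reads the per-product sums.
    """
    aggs = {}
    for user_id in user_ids:
        aggs[f"user_{user_id}"] = {
            "filter": {"term": {"user_id": user_id}},
            "aggs": {
                "product_info": {
                    "terms": {"field": "product_id"},
                    "aggs": {
                        "total_item_price": {"sum": {"field": "selling_price"}}
                    },
                },
                # Sibling of product_info, so the path starts there and
                # uses '>' rather than '.' as the separator.
                "pb": {
                    "percentiles_bucket": {
                        "buckets_path": "product_info>total_item_price"
                    }
                },
            },
        }
    return {"aggs": aggs}


body = per_user_percentiles([123, 456])
print(sorted(body["aggs"]))
```

Keep in mind that the request grows with the number of IDs, so this is probably only practical for a modest list of users.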

How to diversify the result of top-hits aggregation?

Let's start with a concrete example. I have an index with this mapping:
{
"template": {
"mappings": {
"template": {
"properties": {
"tid": {
"type": "long"
},
"folder_id": {
"type": "long"
},
"status": {
"type": "integer"
},
"major_num": {
"type": "integer"
}
}
}
}
}
}
I want to group the query results by the folder_id field and, for each folder_id group, retrieve the _source of the top N documents. So I wrote a query like this:
GET /template/template/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"status": 1
}
}
]
}
},
"aggs": {
"folder": {
"terms": {
"field": "folder_id",
"size": 10
},
"aggs": {
"top_hit":{
"top_hits": {
"size": 5,
"_source": ["major_num"]
}
}
}
}
}
}
However, there is now a requirement that the top hits for each folder_id must be diversified on the major_num field. That is, within each folder_id bucket, the documents returned by the top_hits sub-aggregation must be unique on major_num, with at most one document per major_num value.
The top_hits aggregation cannot accept sub-aggregations, so how should I solve this?
Why not simply add another terms aggregation on the major_num field?
GET /template/template/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"status": 1
}
}
]
}
},
"aggs": {
"folder": {
"terms": {
"field": "folder_id",
"size": 10
},
"aggs": {
"majornum": {
"terms": {
"field": "major_num",
"size": 10
},
"aggs": {
"top_hit": {
"top_hits": {
"size": 1
}
}
}
}
}
}
}
}
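On the client side, the nested buckets can then be flattened back into one list per folder. The sketch below assumes the usual response layout for the query above (aggregations > folder > buckets > majornum > buckets > top_hit > hits > hits); the function name and the sample response are made up for illustration.

```python
def diversified_hits(response, per_folder=5):
    """Collect at most one document per major_num for every folder_id."""
    result = {}
    for folder in response["aggregations"]["folder"]["buckets"]:
        docs = []
        for major in folder["majornum"]["buckets"]:
            hits = major["top_hit"]["hits"]["hits"]
            if hits:
                # top_hits was requested with size 1, so take the first hit.
                docs.append(hits[0]["_source"])
        # Truncate to the number of documents wanted per folder.
        result[folder["key"]] = docs[:per_folder]
    return result


# Invented sample response, trimmed to the fields the function reads.
sample = {
    "aggregations": {
        "folder": {
            "buckets": [
                {
                    "key": 7,
                    "doc_count": 3,
                    "majornum": {
                        "buckets": [
                            {"key": 1, "doc_count": 2,
                             "top_hit": {"hits": {"hits": [{"_source": {"major_num": 1}}]}}},
                            {"key": 2, "doc_count": 1,
                             "top_hit": {"hits": {"hits": [{"_source": {"major_num": 2}}]}}},
                        ]
                    },
                }
            ]
        }
    }
}

print(diversified_hits(sample))
```

One caveat: by default the inner terms buckets come back ordered by document count, not by relevance, so truncating the list keeps the most frequent major_num values rather than the best-scoring ones.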

Can We Apply Bucket Selector Aggregation on Nested Aggregation in ElasticSearch?

I want to apply a pipeline aggregation (bucket selector aggregation) to a nested field aggregation in Elasticsearch 2.4. I am trying to do something like the query below, but without success. Is it possible to run a pipeline aggregation on a nested field?
{
"size": 0,
"aggregations": {
"totalPaidAmount": {
"nested": {
"path": "count"
},
"aggregations": {
"paidAmountTotal": {
"sum": {
"field": "count.totalPaidAmount"
}
},
"paidAmount_filter": {
"bucket_selector": {
"script": {
"inline": "amount > 5000000"
},
"buckets_path": {
"amount": "paidAmountTotal"
}
}
}
}
}
}
}
I found the solution. The bucket selector aggregation should be a sibling of the nested aggregation, both placed under a parent terms aggregation, and the path to the metric should use '>' as the separator, as shown below:
{
"size": 0,
"aggregations": {
"amount": {
"terms": {
"field": "countId",
"size": 0
},
"aggregations": {
"totalPaidAmount": {
"nested": {
"path": "count"
},
"aggregations": {
"paidAmountTotal": {
"sum": {
"field": "count.totalPaidAmount"
}
}
}
},
"paidAmount_filter": {
"bucket_selector": {
"script": {
"inline": "amount > 1000"
},
"buckets_path": {
"amount": "totalPaidAmount>paidAmountTotal"
}
}
}
}
}
}
}
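To show how a path such as totalPaidAmount>paidAmountTotal lines up with the buckets, here is a rough Python imitation of what the selector does to each terms bucket. It is not how Elasticsearch implements it, only a model of the behaviour; the function names and the sample buckets are invented.

```python
def resolve_path(bucket, path):
    """Follow a '>'-separated buckets_path inside one bucket and return the metric value."""
    node = bucket
    for name in path.split(">"):
        node = node[name]
    return node["value"]


def select_buckets(buckets, path, threshold):
    """Keep only the buckets whose metric is above the threshold."""
    return [b for b in buckets if resolve_path(b, path) > threshold]


# Invented buckets shaped like the terms > nested > sum response above.
buckets = [
    {"key": "A", "doc_count": 4,
     "totalPaidAmount": {"doc_count": 9, "paidAmountTotal": {"value": 2500.0}}},
    {"key": "B", "doc_count": 2,
     "totalPaidAmount": {"doc_count": 3, "paidAmountTotal": {"value": 400.0}}},
]

kept = select_buckets(buckets, "totalPaidAmount>paidAmountTotal", 1000)
print([b["key"] for b in kept])
```

The selector needs a set of buckets to choose from, which is why it sits under the terms aggregation next to the nested aggregation instead of inside it.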
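You are missing params in the script value, so paidAmount_filter should look like this (the params. prefix applies to Painless scripts in Elasticsearch 5.x and later; in 2.x the variable is referenced directly, as in the question):
"paidAmount_filter": {
"bucket_selector": {
"buckets_path": {
"amount": "paidAmountTotal"
},
"script": "params.amount > 5000000"
}
}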

For each country/colour/brand combination , find sum of number of items in elasticsearch

This is a portion of the data I have indexed in elasticsearch:
{
"country" : "India",
"colour" : "white",
"brand" : "sony",
"numberOfItems" : 3
}
I want to get the total sum of numberOfItems on a per country basis, per colour basis and per brand basis. Is there any way to do this in elasticsearch?
The following should take you straight to the answer.
Make sure scripting is enabled before using it.
{
"aggs": {
"keys": {
"terms": {
"script": "doc['country'].value + doc['colour'].value + doc['brand'].value"
},
"aggs": {
"keySum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
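For comparison, this is roughly what that aggregation computes, written as plain Python over a handful of made-up documents. The tuple key is my own choice for the sketch: concatenating the three strings without a separator, as the script does, could in principle merge two different combinations into one key.

```python
from collections import defaultdict

# Made-up sample documents using the field names from the question.
docs = [
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 3},
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 2},
    {"country": "India", "colour": "black", "brand": "sony", "numberOfItems": 4},
]


def sum_by_combination(documents):
    """Sum numberOfItems for every (country, colour, brand) combination."""
    totals = defaultdict(int)
    for doc in documents:
        key = (doc["country"], doc["colour"], doc["brand"])
        totals[key] += doc["numberOfItems"]
    return dict(totals)


totals = sum_by_combination(docs)
print(totals)
```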
To get a single result you may use sum aggregation applied to a filtered query with term (terms) filter, e.g.:
{
"query": {
"filtered": {
"filter": {
"term": {
"country": "India"
}
}
}
},
"aggs": {
"total_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
To get statistics for all countries, colours and brands in a single pass over the data, you may use the following query with three multi-bucket terms aggregations, each containing a single-value sum metric sub-aggregation:
{
"query": {
"match_all": {}
},
"aggs": {
"countries": {
"terms": {
"field": "country"
},
"aggs": {
"country_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"colours": {
"terms": {
"field": "colour"
},
"aggs": {
"colour_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"brands": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
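Reading the totals out of the response is then a matter of walking the buckets of each terms aggregation. A small sketch, assuming the standard terms and sum response layout; the function name sums_by_key and the sample response are invented for illustration.

```python
def sums_by_key(response, agg_name, sum_name):
    """Map each bucket key of a terms aggregation to the value of its sum sub-aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket[sum_name]["value"] for bucket in buckets}


# Invented response, trimmed to the parts the function reads.
sample_response = {
    "aggregations": {
        "countries": {"buckets": [
            {"key": "India", "doc_count": 2, "country_sum": {"value": 5.0}},
        ]},
        "colours": {"buckets": [
            {"key": "white", "doc_count": 1, "colour_sum": {"value": 3.0}},
            {"key": "black", "doc_count": 1, "colour_sum": {"value": 2.0}},
        ]},
        "brands": {"buckets": [
            {"key": "sony", "doc_count": 2, "brand_sum": {"value": 5.0}},
        ]},
    }
}

per_country = sums_by_key(sample_response, "countries", "country_sum")
per_colour = sums_by_key(sample_response, "colours", "colour_sum")
per_brand = sums_by_key(sample_response, "brands", "brand_sum")
print(per_country, per_colour, per_brand)
```

Since a terms aggregation returns only the top buckets by default, you may need to raise its size if a field has many distinct values.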

Resources