Can We Apply Bucket Selector Aggregation on Nested Aggregation in ElasticSearch? - elasticsearch

I want to apply a pipeline aggregation (the bucket selector aggregation) to a nested field aggregation in Elasticsearch 2.4. I tried something like the query below, without success. Is it possible to use a pipeline aggregation on a nested field?
{
"size": 0,
"aggregations": {
"totalPaidAmount": {
"nested": {
"path": "count"
},
"aggregations": {
"paidAmountTotal": {
"sum": {
"field": "count.totalPaidAmount"
}
},
"paidAmount_filter": {
"bucket_selector": {
"script": {
"inline": "amount > 5000000"
},
"buckets_path": {
"amount": "paidAmountTotal"
}
}
}
}
}
}
}

I found the solution. The bucket selector aggregation should be a sibling of the nested aggregation, with both placed under a parent terms aggregation, and the path to the metric should be written with '>' as shown below:
{
"size": 0,
"aggregations": {
"amount": {
"terms": {
"field": "countId",
"size": 0
},
"aggregations": {
"totalPaidAmount": {
"nested": {
"path": "count"
},
"aggregations": {
"paidAmountTotal": {
"sum": {
"field": "count.totalPaidAmount"
}
}
}
},
"paidAmount_filter": {
"bucket_selector": {
"script": {
"inline": "amount > 1000"
},
"buckets_path": {
"amount": "totalPaidAmount>paidAmountTotal"
}
}
}
}
}
}
}
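For anyone building this request from code, here is a minimal Python sketch of the same query. The helper name `buckets_path` and the use of a plain dict for the request body are my own assumptions for illustration; only the aggregation and field names come from the query above.

```python
import json


def buckets_path(*agg_names):
    """Join aggregation names into a buckets_path string, parent first."""
    return ">".join(agg_names)


# Request body mirroring the working query above, built as a plain dict.
query = {
    "size": 0,
    "aggregations": {
        "amount": {
            "terms": {"field": "countId", "size": 0},
            "aggregations": {
                "totalPaidAmount": {
                    "nested": {"path": "count"},
                    "aggregations": {
                        "paidAmountTotal": {
                            "sum": {"field": "count.totalPaidAmount"}
                        }
                    },
                },
                # The selector sits next to the nested aggregation, not inside it.
                "paidAmount_filter": {
                    "bucket_selector": {
                        "script": {"inline": "amount > 1000"},
                        "buckets_path": {
                            "amount": buckets_path(
                                "totalPaidAmount", "paidAmountTotal"
                            )
                        },
                    }
                },
            },
        }
    },
}

print(json.dumps(query, indent=2))
```

The point of the helper is only to make the parent-first order of the path explicit; writing the string by hand works just as well.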

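You are missing params in the script value. The params. prefix applies to Painless scripts in Elasticsearch 5.x and later; in 2.x the variable is referenced directly. On a recent version, paidAmount_filter should look like:
"bucket_filter": {
"bucket_selector": {
"buckets_path": {
"amount": "paidAmountTotal"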
},
"script": "params.amount > 5000000"
}
}

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I would suggest using a bucket_script pipeline aggregation. As far as I know, it has to run as a sub-aggregation of a multi-bucket aggregation such as a histogram. Assuming you have a suitable date field in your mapping, I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it doesn't work for you.
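To check the arithmetic of the two bucket_script expressions, here is a small Python sketch of the same formula. The function name `role_ratios` and the zero-total guard are my additions for illustration and are not part of the query above.

```python
def role_ratios(role_1, role_2):
    """Return (role_1 %, role_2 %) using the same formula as the scripts above.

    Returns None when both counts are zero, since the ratio is undefined.
    """
    total = role_1 + role_2
    if total == 0:
        return None
    return (role_1 / total * 100, role_2 / total * 100)


# Example with 3 managers and 1 lead in a monthly bucket.
print(role_ratios(3, 1))
```

Note that this only reports the share of each type per bucket; it does not change which documents a search returns.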

What should be the bucket path for nested term aggregation?

I want to run a pipeline aggregation on my Elasticsearch aggregation. Here is my query body:
{
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
}
}
},
"price_percentile": {
"percentiles_bucket": {
"buckets_path": "user_info.product_info.total_item_price"
}
}
}
}
This gives me the following error:
No aggregation found for path [user_info.product_info.total_item_price]
What should the buckets path be for aggregations nested like this? Or is it not possible to compute percentiles for such a bucket arrangement in Elasticsearch?
P.S. I am using Elasticsearch 6.5.
@jzzfs's answer is also partly right. I approached it differently: I reversed the order of my aggregations, and that satisfied my use case. In general, though, you can't compute bucket percentiles across two levels of nested buckets for now.
{
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "user_info>total_item_price"
}
}
}
}
}
}
First, don't use dots in the path -- use > instead:
GET stack/_search
{
"aggs": {
"user_info": {
"terms": {
"field": "user_id"
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "user_info>product_info>total_item_price"
}
}
}
}
which yields "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [Object[]] at aggregation [product_info]", so that won't work either.
Here are our options:
Aggregate globally but just under the bucketed product info (without the users):
GET stack/_search
{
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "product_info>total_item_price"
}
}
}
}
Use filtered aggregations to mimic the original intent. The aggregation name (user_123) is kept consistent with the filter, and the term clause is the actual filter:
GET stack/_search
{
"aggs": {
"user_123": {
"filter": {
"term": {
"user_id": 123
}
},
"aggs": {
"product_info": {
"terms": {
"field": "product_id"
},
"aggs": {
"total_item_price": {
"sum": {
"field": "selling_price"
}
}
}
},
"pb": {
"percentiles_bucket": {
"buckets_path": "product_info>total_item_price"
}
}
}
}
}
}
You can then have as many user_xyz subaggregations as you like -- provided you gather their IDs beforehand.
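If the user IDs are gathered beforehand, the per-user filter aggregations can be generated rather than written by hand. Below is a minimal Python sketch; the function name `user_percentile_aggs` is hypothetical, and the field and aggregation names are taken from the query above.

```python
import json


def user_percentile_aggs(user_ids):
    """Build one filter aggregation per user, each with its own percentiles_bucket."""
    aggs = {}
    for user_id in user_ids:
        # Keep the aggregation name consistent with the filtered user ID.
        aggs[f"user_{user_id}"] = {
            "filter": {"term": {"user_id": user_id}},
            "aggs": {
                "product_info": {
                    "terms": {"field": "product_id"},
                    "aggs": {
                        "total_item_price": {"sum": {"field": "selling_price"}}
                    },
                },
                # Sibling pipeline aggregation over the product buckets of this user.
                "pb": {
                    "percentiles_bucket": {
                        "buckets_path": "product_info>total_item_price"
                    }
                },
            },
        }
    return {"aggs": aggs}


print(json.dumps(user_percentile_aggs([123, 456]), indent=2))
```

The request body grows with the number of users, so this is probably only practical for a modest list of IDs.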

Bucket_selector aggregation and size optimizations

I have a question about the bucket_selector aggregation.
(Environments tested: ES 6.8 and ES 7 basic on CentOS 7.)
In my use case I need to drop documents that are duplicates by a selected property. The index is not big: about 2 million records.
The query to find those records looks like this:
GET index_id1/_search
{
"size": 0,
"aggs": {
"byNested": {
"nested": {
"path": "nestedObjects"
},
"aggs": {
"sameIds": {
"terms": {
"script": {
"lang": "painless",
"source": "return doc['nestedObjects.id'].value"
},
"size": 1000
},
"aggs": {
"byId": {
"reverse_nested": {}
},
"byId_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"totalCount": "byId._count"
},
"script": {
"source": "params.totalCount > 1"
}
}
}
}
}
}
}
}
}
I get the buckets back. To reduce the load of the query, I run it with size: 1000 and reissue it to get more duplicates until none come back.
The problem, however, is that it returns too few duplicates. I checked the result of the query by setting size: 2000000:
GET index_id1/_search
{
"size": 0,
"aggs": {
"byNested": {
"nested": {
"path": "nestedObjects"
},
"aggs": {
"sameIds": {
"terms": {
"script": {
"lang": "painless",
"source": "return doc['nestedObjects.id'].value"
},
"size": 2000000 <-- too big
},
"aggs": {
"byId": {
"reverse_nested": {}
},
"byId_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"totalCount": "byId._count"
},
"script": {
"source": "params.totalCount > 1"
}
}
}
}
}
}
}
}
}
As I understand it, Elasticsearch first creates the buckets requested by the terms aggregation, and only then does bucket_selector keep the ones I need. That's why I see this behavior: with size: 1000, only the first 1000 buckets are ever checked for duplicates. To get all the buckets I have to raise "search.max_buckets" to 2000000.
I then converted the query to use a composite aggregation:
GET index_id1/_search
{
"aggs": {
"byNested": {
"nested": {
"path": "nestedObjects"
},
"aggs": {
"compositeAgg": {
"composite": {
"after": {
"termsAgg": "03f10a7d-0162-4409-8647-c643274d6727"
},
"size": 1000,
"sources": [
{
"termsAgg": {
"terms": {
"script": {
"lang": "painless",
"source": "return doc['nestedObjects.id'].value"
}
}
}
}
]
},
"aggs": {
"byId": {
"reverse_nested": {}
},
"byId_bucket_filter": {
"bucket_selector": {
"script": {
"source": "params.totalCount > 1"
},
"buckets_path": {
"totalCount": "byId._count"
}
}
}
}
}
}
}
},
"size": 0
}
As I understand it, this does the same thing, except that I need to make 2000 calls (size: 1000 each) to go over the whole index.
Does the composite aggregation cache its results, or why else would it be better?
Is there perhaps a better approach for this case?
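For reference, the paging loop described above could look roughly like the Python sketch below. It assumes a `search` callable (hypothetical, supplied by the caller) that takes a request body as a dict and returns the parsed response as a dict; `iter_composite_buckets` is likewise a name made up for this example. The only response field it relies on is `after_key`, which the composite aggregation returns for requesting the next page.

```python
import copy


def iter_composite_buckets(search, body, nested_agg="byNested",
                           composite_agg="compositeAgg"):
    """Yield the buckets of every page, following after_key until it is absent."""
    body = copy.deepcopy(body)  # leave the caller's request untouched
    composite = body["aggs"][nested_agg]["aggs"][composite_agg]["composite"]
    composite.pop("after", None)  # always start from the first page
    while True:
        response = search(body)
        result = response["aggregations"][nested_agg][composite_agg]
        # A page may be empty if bucket_selector dropped every bucket on it,
        # so an empty page is not treated as the end of the results here.
        yield from result["buckets"]
        after_key = result.get("after_key")
        if after_key is None:
            break
        composite["after"] = after_key
```

Usage would be something like `for bucket in iter_composite_buckets(my_search, request_body): ...`, where `my_search` wraps whatever client is in use. Whether this is cheaper overall than one large terms request is exactly the open question above; the sketch only shows the mechanics.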

Elasticsearch. Using term aggregation, return values where doc count is less than some value

I want to group values by a field (account id in my case) using a terms aggregation and return only the buckets where doc_count is less than some value.
I can specify the min_doc_count parameter, but there is no max_doc_count, so I'm looking for a way to simulate this behavior. One of my many attempts is below, but it doesn't work.
{
"size": 0,
"aggs": {
"by_account": {
"terms": {
"field": "accountId"
},
"aggs": {
"by_account_filtered": {
"bucket_selector": {
"buckets_path": {
"totalDocs": "_count"
},
"script": "params.totalDocs < 10000"
}
}
}
}
}
}
What am I doing wrong?
The bucket_selector aggregation needs to be nested (since it's a parent-type pipeline aggregation) and placed as a sibling of the metric aggregation it uses to filter buckets.
So we use a top-level terms aggregation, then a nested value_count aggregation to expose the bucket doc_count to the sibling bucket_selector aggregation.
Try this:
{
"size": 0,
"aggs": {
"by_account": {
"terms": {
"field": "accountId"
},
"aggs": {
"by_account_number": {
"value_count" : {
"field" : "accountId"
}
},
"by_account_filtered": {
"bucket_selector": {
"buckets_path": {
"totalDocs": "by_account_number"
},
"script": "params.totalDocs < 10000"
}
}
}
}
}
}
EDIT: If you want to get the accounts with the lowest doc_count:
{
"size": 0,
"aggs": {
"by_account": {
"terms": {
"field": "accountId",
"order" : { "_count" : "asc" },
"size": 100
},
"aggs": {
"by_account_number": {
"value_count" : {
"field" : "accountId"
}
},
"by_account_filtered": {
"bucket_selector": {
"buckets_path": {
"totalDocs": "by_account_number"
},
"script": "params.totalDocs < 10000"
}
}
}
}
}
}
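As a cross-check, the same "max_doc_count" condition can be applied on the client to a plain terms aggregation response. This is a different technique from the pipeline aggregation above, and it only sees the buckets the terms aggregation actually returned (limited by its size). The function name `buckets_below` and the sample response are made up for illustration.

```python
def buckets_below(response, agg_name, max_doc_count):
    """Keep only the buckets whose doc_count is below max_doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [b for b in buckets if b["doc_count"] < max_doc_count]


# Illustrative response shape for the by_account terms aggregation.
sample_response = {
    "aggregations": {
        "by_account": {
            "buckets": [
                {"key": "acc-1", "doc_count": 25000},
                {"key": "acc-2", "doc_count": 9000},
                {"key": "acc-3", "doc_count": 12},
            ]
        }
    }
}

for bucket in buckets_below(sample_response, "by_account", 10000):
    print(bucket["key"], bucket["doc_count"])
```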

bucket script not working - elasticsearch 2.4.2

I tried to subtract one aggregation from another:
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"total_query_id": {
"sum": {
"field": "query_id"
}
},
"total_num_results": {
"sum": {
"field": "num_results"
}
},
"minus_value": {
"bucket_script": {
"buckets_path": {
"qid": "total_query_id",
"nrs": "total_num_results"
},
"script": "qid - nrs"
}
}
}
}
It throws the error below:
"reason": "Invalid pipeline aggregation named [minus_value] of type [bucket_script]. Only sibling pipeline aggregations are allowed at the top level"
I have moved the minus_value node in and out of the aggs node, but that does not solve my problem.
Can anyone help me with this?
The idea is that parent pipeline aggregations such as bucket_script must be placed inside a parent bucket aggregation.
That is not the case in your example, so you need to add one parent aggregation. Since you have a match_all query, you could try using a global bucket aggregation and then embed your 3 aggregations inside it, like this:
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"all": {
"global": {},
"aggs": {
"total_query_id": {
"sum": {
"field": "query_id"
}
},
"total_num_results": {
"sum": {
"field": "num_results"
}
},
"minus_value": {
"bucket_script": {
"buckets_path": {
"qid": "total_query_id",
"nrs": "total_num_results"
},
"script": "qid - nrs"
}
}
}
}
}
}
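Since both sums are single-value metrics, the subtraction can also be done on the client from a request that contains only the two sum aggregations. The Python sketch below shows that alternative; it is not the pipeline approach above, and the function name `minus_value` and the sample numbers are made up for illustration.

```python
def minus_value(response):
    """Subtract total_num_results from total_query_id using the two sum results."""
    aggs = response["aggregations"]
    return aggs["total_query_id"]["value"] - aggs["total_num_results"]["value"]


# Illustrative response containing only the two top-level sum aggregations.
sample_response = {
    "aggregations": {
        "total_query_id": {"value": 1500.0},
        "total_num_results": {"value": 420.0},
    }
}

print(minus_value(sample_response))
```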
