ElasticSearch 6.6.0 Aggregation Avg with script not working

I am trying to run the following query on my cluster:
GET _search
{
"aggs": {
"buckets": {
"terms": {
"field": "main_feature_id.keyword",
"size": 10
},
"aggs": {
"average_dwell": {
"avg": {
"field": "dwell.dwell_ms",
"script": {
"lang": "painless",
"source": "long x = Math.round(_value*100)/100000; return x;"
}
}
}
}
}
}
}
But no matter what I try I cannot get it to round the result.
Here is what the result looks like:
"doc_count" : 26032,
"average_dwell" : {
"value" : 44.87277178006528
}
Can someone please tell me what I am doing wrong? I am sure it is something obvious.
Thank you!

A _value script is applied to each value of the document, and the average is then calculated over the modified values. What you seem to want is to reduce the precision of the result to two decimal places. You can do that with a bucket_script aggregation that rounds the computed average:
{
"aggs": {
"buckets": {
"terms": {
"field": "main_feature_id.keyword",
"size": 10
},
"aggs": {
"average_dwell": {
"avg": {
"field": "dwell.dwell_ms"
}
},
"rounded_avg": {
"bucket_script": {
"buckets_path": {
"curr_avg": "average_dwell"
},
"script": "Math.round(params.curr_avg * 100)/100.0;"
}
}
}
}
}
}
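To see why the per-value script in the question cannot round the final average, here is a minimal Python sketch of the two orders of operation. It is only a simulation, not Elasticsearch code: the sample dwell values and the function names are made up for illustration, and java_round only approximates Java's Math.round for non-negative inputs.

```python
import math

def java_round(x):
    # Approximates Java's Math.round(double): round half up, returns an integer.
    return math.floor(x + 0.5)

def per_value_script(value_ms):
    # Mirrors "long x = Math.round(_value*100)/100000": a long divided by an int
    # is integer division, so the fractional part is dropped (non-negative input assumed).
    return java_round(value_ms * 100) // 100000

def rounded_average(values_ms):
    # Mirrors the bucket_script: average first, then round the single result.
    avg = sum(values_ms) / len(values_ms)
    return java_round(avg * 100) / 100.0

# Hypothetical dwell times in milliseconds for one bucket.
dwell_ms = [44100.0, 45900.0, 44650.0]

scripted = [per_value_script(v) for v in dwell_ms]   # each value truncated on its own
avg_of_scripted = sum(scripted) / len(scripted)      # the average still has many decimals
avg_then_rounded = rounded_average(dwell_ms)         # two decimal places

print(scripted, avg_of_scripted, avg_then_rounded)
```

The per-value script only changes the inputs to the average, so the average itself is never rounded. Rounding has to happen after the avg aggregation has produced its value, which is the step the bucket_script performs.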

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I would suggest using a bucket_script aggregation. As far as I know, this aggregation has to run as a sub-aggregation of a multi-bucket aggregation such as a histogram. Since you have a suitable field in your mapping, I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it didn't work well for you.
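The arithmetic in the two bucket_script expressions can be sketched in plain Python. This is a rough illustration only: the month keys and counts are invented, and returning None for an empty bucket is an assumption of this sketch, not documented Elasticsearch behavior.

```python
def role_ratios(manager_count, lead_count):
    # Same arithmetic as the two bucket_script expressions:
    # params.role_N / (params.role_1 + params.role_2) * 100
    total = manager_count + lead_count
    if total == 0:
        # Assumption for this sketch: report no ratio for an empty bucket.
        return None
    return (manager_count / total * 100, lead_count / total * 100)

# Hypothetical per-month counts taken from the role_type terms buckets.
monthly_counts = {"2019-01": (3, 1), "2019-02": (1, 1)}
for month, (managers, leads) in monthly_counts.items():
    print(month, role_ratios(managers, leads))
```

Note that this reports the observed manager/lead split per month. It does not change which hits the search returns.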

Bucket_selector aggregation and size optimizations

I have a question about the bucket_selector aggregation.
(Environment tested: ES 6.8 and ES 7 basic on CentOS 7)
In my use case I need to drop documents that are duplicates by a selected property. The index is not big, about 2 million records.
Query to find those records looks like this:
GET index_id1/_search
{
"size": 0,
"aggs": {
"byNested": {
"nested": {
"path": "nestedObjects"
},
"aggs": {
"sameIds": {
"terms": {
"script": {
"lang": "painless",
"source": "return doc['nestedObjects.id'].value"
},
"size": 1000
},
"aggs": {
"byId": {
"reverse_nested": {}
},
"byId_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"totalCount": "byId._count"
},
"script": {
"source": "params.totalCount > 1"
}
}
}
}
}
}
}
}
}
I get the buckets back. To keep the query light, I limit the terms aggregation to size: 1000 and then issue the same query again to fetch more duplicates, until nothing comes back.
The problem is that this returns too few duplicates. I checked by running the query with size: 2000000:
GET index_id1/_search
{
"size": 0,
"aggs": {
"byNested": {
"nested": {
"path": "nestedObjects"
},
"aggs": {
"sameIds": {
"terms": {
"script": {
"lang": "painless",
"source": "return doc['nestedObjects.id'].value"
},
"size": 2000000 <-- too big
},
"aggs": {
"byId": {
"reverse_nested": {}
},
"byId_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"totalCount": "byId._count"
},
"script": {
"source": "params.totalCount > 1"
}
}
}
}
}
}
}
}
}
As I understand it, the terms aggregation first creates the buckets as stated in the query, and only then does bucket_selector filter them down to what I need. That is why I see this behavior. To get all the buckets I have to raise "search.max_buckets" to 2000000.
Converted to query with composite aggregation:
GET index_id1/_search
{
"aggs": {
"byNested": {
"nested": {
"path": "nestedObjects"
},
"aggs": {
"compositeAgg": {
"composite": {
"after": {
"termsAgg": "03f10a7d-0162-4409-8647-c643274d6727"
},
"size": 1000,
"sources": [
{
"termsAgg": {
"terms": {
"script": {
"lang": "painless",
"source": "return doc['nestedObjects.id'].value"
}
}
}
}
]
},
"aggs": {
"byId": {
"reverse_nested": {}
},
"byId_bucket_filter": {
"bucket_selector": {
"script": {
"source": "params.totalCount > 1"
},
"buckets_path": {
"totalCount": "byId._count"
}
}
}
}
}
}
}
},
"size": 0
}
As I understand it, this does the same thing, except that I need to make 2000 calls (size: 1000 each) to go over the whole index.
Does the composite aggregation cache the results? If not, why is it better?
Maybe there is a better approach in this case?
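One way to picture the composite approach is as keyset pagination: each request walks the keys in sorted order starting after the last key seen, so every request handles at most size buckets. The Python sketch below simulates that in memory. It does not call Elasticsearch, and composite_page, find_duplicates and the sample counts are all hypothetical names and data.

```python
def composite_page(counts, size, after=None):
    """Return one page of (key, count) buckets in key order, plus the next after key."""
    keys = sorted(k for k in counts if after is None or k > after)
    page = [(k, counts[k]) for k in keys[:size]]
    after_key = page[-1][0] if page else None
    return page, after_key

def find_duplicates(counts, size):
    """Walk every page and keep buckets whose count is above 1 (the bucket_selector step)."""
    dupes, after = [], None
    while True:
        page, after = composite_page(counts, size, after)
        if not page:
            break
        dupes.extend(key for key, count in page if count > 1)
    return dupes

# Hypothetical nested id -> number of parent documents.
id_counts = {"a": 1, "b": 2, "c": 1, "d": 3, "e": 1}
print(find_duplicates(id_counts, size=2))
```

In this sketch the loop stops when the unfiltered page is empty, not when the filtered result is empty, because a page can legitimately contain no duplicates while later pages still do.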

Can We Apply Bucket Selector Aggregation on Nested Aggregation in ElasticSearch?

I want to apply a pipeline aggregation (bucket selector aggregation) to a nested field aggregation in ElasticSearch 2.4. I tried something like the query below, but without success. Is it possible to use a pipeline aggregation on a nested field?
{
"size": 0,
"aggregations": {
"totalPaidAmount": {
"nested": {
"path": "count"
},
"aggregations": {
"paidAmountTotal": {
"sum": {
"field": "count.totalPaidAmount"
}
},
"paidAmount_filter": {
"bucket_selector": {
"script": {
"inline": "amount > 5000000"
},
"buckets_path": {
"amount": "paidAmountTotal"
}
}
}
}
}
}
}
I found the solution. The bucket selector aggregation should be a sibling of the nested aggregation, and the path into the nested aggregation should be referenced with '>', as shown below:
{
"size": 0,
"aggregations": {
"amount": {
"terms": {
"field": "countId",
"size": 0
},
"aggregations": {
"totalPaidAmount": {
"nested": {
"path": "count"
},
"aggregations": {
"paidAmountTotal": {
"sum": {
"field": "count.totalPaidAmount"
}
}
}
},
"paidAmount_filter": {
"bucket_selector": {
"script": {
"inline": "amount > 1000"
},
"buckets_path": {
"amount": "totalPaidAmount>paidAmountTotal"
}
}
}
}
}
}
}
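How a buckets_path such as totalPaidAmount>paidAmountTotal is followed from one terms bucket can be sketched in Python. This is a simplified illustration: resolve_buckets_path is a hypothetical helper, the bucket contents are invented, and the sketch ignores the other buckets_path forms (metric names after a dot, bucket keys in brackets).

```python
def resolve_buckets_path(bucket, path):
    # Walk "agg1>agg2" from the current bucket down to a single-value metric.
    node = bucket
    for name in path.split(">"):
        node = node[name]
    return node["value"]

# Hypothetical shape of one "amount" terms bucket from the query above.
bucket = {
    "key": "c-1",
    "doc_count": 4,
    "totalPaidAmount": {"doc_count": 7, "paidAmountTotal": {"value": 1500.0}},
}

amount = resolve_buckets_path(bucket, "totalPaidAmount>paidAmountTotal")
keep_bucket = amount > 1000   # the bucket_selector condition
print(amount, keep_bucket)
```

The path is resolved relative to each bucket of the parent terms aggregation, which is why the selector sits next to the nested aggregation and not inside it.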
You are missing params in the script. paidAmount_filter should look like this:
"bucket_filter": {
"bucket_selector": {
"buckets_path": {
"amount": "paidAmountTotal"
},
"script": "params.amount > 5000000"
}
}

How to use cumulative_sum with a previous aggregation?

I would like to plot a cumulative sum of some events, per day. The cumulative sum aggregation seems to be the way to go so I tried to reuse the example given in the docs.
The first aggregation works fine, the following query
{
"aggs": {
"vulns_day" : {
"date_histogram" :{
"field": "HOST_START_iso",
"interval": "day"
}
}
}
}
gives replies such as
(...)
{
"key_as_string": "2016-09-08T00:00:00.000Z",
"key": 1473292800000,
"doc_count": 76330
},
{
"key_as_string": "2016-09-09T00:00:00.000Z",
"key": 1473379200000,
"doc_count": 37712
},
(...)
I then wanted to query the cumulative sum of doc_count above via
{
"aggs": {
"vulns_day" : {
"date_histogram" :{
"field": "HOST_START_iso",
"interval": "day"
}
},
"aggs": {
"vulns_cumulated": {
"cumulative_sum": {
"buckets_path": "doc_count"
}
}
}
}
}
but it gives an error:
"reason": {
"type": "search_parse_exception",
"reason": "Could not find aggregator type [vulns_cumulated] in [aggs]",
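I see that buckets_path should point to the values to be summed, and the cumulative sum example in the docs creates a specific intermediate sum, but I do not have anything to sum (besides doc_count).
I guess the cumulative_sum has to be nested inside the date_histogram (in your query the inner "aggs" sits next to vulns_day, so it is parsed as an aggregation named aggs), and it needs a metric to sum. Try changing your query like this: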
{
"aggs": {
"vulns_day": {
"date_histogram": {
"field": "HOST_START_iso",
"interval": "day"
},
"aggs": {
"document_count": {
"value_count": {
"field": "HOST_START_iso"
}
},
"vulns_cumulated": {
"cumulative_sum": {
"buckets_path": "document_count"
}
}
}
}
}
}
I found the solution. Since doc_count did not seem to be available, I tried to retrieve stats for the time parameter, and use its count value. It worked:
{
"size": 0,
"aggs": {
"vulns_day": {
"date_histogram": {
"field": "HOST_START_iso",
"interval": "day"
},
"aggs": {
"dates_stats": {
"stats": {
"field": "HOST_START_iso"
}
},
"vulns_cumulated": {
"cumulative_sum": {
"buckets_path": "dates_stats.count"
}
}
}
}
}
}
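For reference, what cumulative_sum is expected to produce per bucket is just a running total of the chosen metric. A minimal Python sketch, using the two daily doc_count values from the sample response in the question:

```python
from itertools import accumulate

# (key_as_string, doc_count) pairs copied from the sample response above.
daily_doc_counts = [
    ("2016-09-08T00:00:00.000Z", 76330),
    ("2016-09-09T00:00:00.000Z", 37712),
]

# Running total over the per-day counts, in bucket order.
running = list(accumulate(count for _, count in daily_doc_counts))
for (day, count), total in zip(daily_doc_counts, running):
    print(day, count, total)
```

The pipeline aggregation documentation also describes a special _count buckets_path that feeds the bucket's document count in directly. Whether it is available depends on your Elasticsearch version, so treat that as something to verify before dropping the helper metric.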

How to hide buckets in ElasticSearch result?

In my query, I aggregate the buckets into a single scalar. Since I'm not interested in the individual buckets (which, in my case, number in the tens of millions), I'd like to remove them from the returned result, much like "size": 0 hides all the hits. Is it possible?
E.g.:
{
"size": 0,
"aggs": {
"pop": {
"terms": {
"field": "account_number",
"size": 0
},
"aggs": {
"average": {
"avg": {
"field": "price"
}
}
}
},
"sum_of_avg": {
"sum_bucket": {
"buckets_path": "pop>average"
}
}
}
}
Result:
[...]
"aggregations": {
"pop": {
"doc_count_error_upper_bound": 40851,
"sum_other_doc_count": 93441329,
"buckets": [...] <== i don't want this
},
"sum_of_avg": {
"value": 128.0768325884469
}
I just posted an answer in the related question.
In this case the request should look like this:
curl -XPOST 'http://localhost:9200/<index>/_search?filter_path=aggregations.sum_of_avg' -d '
{
"size": 0,
"aggs": {
"pop": {
"terms": {
"field": "account_number",
"size": 0
},
"aggs": {
"average": {
"avg": {
"field": "price"
}
}
}
},
"sum_of_avg": {
"sum_bucket": {
"buckets_path": "pop>average"
}
}
}
}'
PS: if you found another solution, please share it here. Thanks!
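The effect of filter_path=aggregations.sum_of_avg on the response can be sketched in Python. This is a simplified stand-in for illustration: the filter_path function here is hypothetical, it handles a single dotted path only (no wildcards, no comma-separated lists), and the took value and bucket contents are invented.

```python
def filter_path(response, path):
    # Keep only the branch named by a dotted path such as "aggregations.sum_of_avg".
    keys = path.split(".")
    node = response
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return {}
        node = node[key]
    # Rebuild the nesting around the selected branch.
    result = node
    for key in reversed(keys):
        result = {key: result}
    return result

response = {
    "took": 12,   # hypothetical value
    "aggregations": {
        "pop": {"buckets": [{"key": 1, "doc_count": 3}]},   # hypothetical bucket
        "sum_of_avg": {"value": 128.0768325884469},
    },
}
print(filter_path(response, "aggregations.sum_of_avg"))
```

The buckets are still computed on the server. Only the response body shrinks.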
I think what you want is the "Cardinality" aggregation. It returns the approximate number of distinct values.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
Example:
GET devdev/alert/_search
{
"size": 0,
"aggs": {
"agg1": {
"cardinality": {
"field": "price"
}
}
}
}
