Elasticsearch - Bucket_script and buckets_paths return "Could not find aggregator type"

I'm trying to calculate some percentages with Elasticsearch but I have a (small) problem. I want ES to calculate the following: "(wins / Total) * 100".
So I added:
"bucket_script": {
"buckets_paths": {
"total": "TotalStatus",
"wins": "TotalWins"
},
"script": " (total/ wins) * 100"
}
To my ES request, which looks like:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true
}
}
],
"must_not": []
}
},
"aggs": {
"status": {
"terms": {
"field": "status.raw"
}
},
"wins": {
"terms": {
"field": "status.raw",
"include": {
"pattern": "Accepted|Released|Closed"
}
}
},
"losses": {
"terms": {
"field": "status.raw",
"include": {
"pattern": "Rejected"
}
}
},
"TotalStatus": {
"sum_bucket": {
"buckets_path": "status._count"
}
},
"TotalWins": {
"sum_bucket": {
"buckets_path": "wins._count"
}
},
"TotalLosses": {
"sum_bucket": {
"buckets_path": "losses._count"
}
}
}
}
This however returns the following error:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "Could not find aggregator type [buckets_paths] in [bucket_script]",
"line": 54,
"col": 28
}
],
"type": "parsing_exception",
"reason": "Could not find aggregator type [buckets_paths] in [bucket_script]",
"line": 54,
"col": 28
},
"status": 400
}
Any ideas?

The immediate parse error is because the parameter is named buckets_path, not buckets_paths. Beyond that, I played a lot with bucket_script and it still won't fit here: it can't be a top-level aggregation, and it would need both TotalWins and TotalStatus to come from the same multi-bucket parent aggregation as single numeric values, which isn't the case in this query.
It can, however, be solved with a scripted_metric aggregation:
{
"size": 0,
"aggs": {
"win_loss_ratio": {
"scripted_metric": {
"init_script": "_agg['win_count'] = 0; _agg['total_count'] = 0; _agg['win_status']=['Accepted','Released','Closed'];",
"map_script": "if (doc['status.raw'].value in _agg['win_status']) { _agg['win_count']+=1};if (doc['status.raw'].value) { _agg['total_count']+=1}",
"combine_script": "return [_agg['win_count'],_agg['total_count']];",
"reduce_script": "total_win = 0; total_status_count = 0; for (a in _aggs) { total_win += a[0]; total_status_count += a[1] }; if (total_status_count == 0) {return 0} else {return (total_win/total_status_count) * 100}"
}
}
}
}
init_script initializes three variables; the win_status array holds all the values that count as a win.
map_script iterates over every document: if the status.raw value is in win_status, win_count is incremented, and if the field has any value at all, total_count is incremented (you could remove this condition if you also want to include null values).
combine_script returns the per-shard counts.
reduce_script sums the per-shard counts and then computes the percentage. There is also a check so that we don't divide by zero, or the script would throw an exception.
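The shard-level arithmetic of the scripted_metric can be sanity-checked outside Elasticsearch. This minimal Python sketch (the statuses and the shard split are made-up sample data) mirrors the map, combine, and reduce phases and computes the win percentage:

```python
WIN_STATUSES = {"Accepted", "Released", "Closed"}

def map_and_combine(shard_docs):
    # map phase per shard: count wins and total docs; combine returns the pair
    wins = sum(1 for status in shard_docs if status in WIN_STATUSES)
    return [wins, len(shard_docs)]

def reduce_phase(per_shard):
    # reduce phase: sum the shard partials, then compute (wins / total) * 100
    total_win = sum(p[0] for p in per_shard)
    total_status = sum(p[1] for p in per_shard)
    if total_status == 0:
        return 0
    return (total_win / total_status) * 100

shards = [["Accepted", "Rejected"], ["Closed", "Released", "Rejected"]]
print(reduce_phase([map_and_combine(s) for s in shards]))  # 3 wins of 5 docs -> 60.0
```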

Related

Bucket_selector on nested agg bucket doesn't work

I'm trying to make an aggregation on a sibling's children aggregation to filter buckets based on a requested quantity condition, so here is my query:
GET _search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"terms": {
"product.id": [20,21,22,23,24]
}
}
]
}
},
"aggs": {
"carts": {
"terms": {
"field": "item.cart_key"
},
"aggs": {
"unique_product": {
"terms": {
"field": "product.id"
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
},
"filtered_product_quantity": {
"bucket_selector": {
"buckets_path": {
"productId": "unique_product.key",
"productQuantity": "unique_product>quantity"
},
"script": {
"params": {
"requiredQuantities": {
"20": null,
"21": null,
"22": null,
"23": 3,
"24": null
}
},
"lang": "painless",
"source": "params.requiredQuantities[params.productId] <= params.productQuantity"
}
}
}
}
}
}
}
And the error :
{
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "",
"phase": "fetch",
"grouped": true,
"failed_shards": [],
"caused_by": {
"type": "aggregation_execution_exception",
"reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [Object[]] at aggregation [unique_product]"
}
},
"status": 500
}
Here is a sample document set :
[
{
product.id: 12,
item.cart_key: abc_123,
item.quantity: 2
},
{
product.id: 11,
item.cart_key: abc_123,
item.quantity: 1
},
{
product.id: 23,
item.cart_key: def_456,
item.quantity: 1
}
]
Is this the appropriate aggregation to use?
Put another way, I would like to:
Aggregate my documents by cart_key.
Per product.id, sum the quantity.
Filter aggregations that have a quantity higher than a given Record object {[product.id]: minimum_quantity} (this is the requiredQuantities param).
I don't know if the source script will work, as Elasticsearch never reaches it.
I don't think you're handling the problem correctly, but here is a hacky working solution:
{
"size": 0,
"aggs": {
"carts": {
"terms": {
"field": "item.cart_key"
},
"aggs": {
"unique_product": {
"terms": {
"field": "product.id"
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
},
"hackProductId": {
"max": {
"field": "product.id"
}
},
"filtered_product_quantity": {
"bucket_selector": {
"buckets_path": {
"productQuantity": "quantity",
"productId": "hackProductId"
},
"script": {
"params": {
"requiredQuantities": {
"12": 0,
"11": 0,
"22": 0,
"23": 0,
"24": 0
}
},
"lang": "painless",
"source": "params.requiredQuantities[((int)params.productId).toString()] <= params.productQuantity"
}
}
}
}
}
}
}
}
}
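A note on why the hackProductId max aggregation is needed: buckets_path can only reference numeric metric values, not a bucket's key (hence the original error). The per-bucket predicate the bucket_selector evaluates can be sketched in Python; the sample buckets are invented, and the `.get(key, 0)` default stands in for missing params entries (Painless would return null there):

```python
def keep_bucket(required_quantities, product_id, product_quantity):
    # Mirrors the Painless source: the numeric productId metric comes back as a
    # double, so it is cast to an int and used as a string key into the params map.
    key = str(int(product_id))
    return required_quantities.get(key, 0) <= product_quantity

required = {"23": 3, "24": 0}
buckets = [
    {"productId": 23.0, "quantity": 2},  # dropped: needs at least 3
    {"productId": 23.0, "quantity": 5},  # kept
]
kept = [b for b in buckets if keep_bucket(required, b["productId"], b["quantity"])]
print(len(kept))  # 1
```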

Is it possible to fetch count of total number of docs that contain a qualifying aggregation condition in elasticsearch?

I use ES v7.3. As per my requirements, I am aggregating some fields to fetch the required docs in the response. Further, there is a requirement to also fetch the total count of all such docs that contain the nested field qualifying the aggregation condition described below, but I have not found a way to do that.
Current aggregation query that I am using to fetch the documents is,
"aggs": {
"users": {
"composite": {
"sources": [
{
"users": {
"terms": {
"field": "co_profileId.keyword"
}
}
}
],
"size": 5000
},
"aggs": {
"sessions": {
"nested": {
"path": "co_score"
},
"aggs": {
"last_4_days": {
"filter": {
"range": {
"co_score.sessionTime": {
"gte": "2021-01-10T00:00:31.399Z",
"lte": "2021-01-14T01:37:31.399Z"
}
}
},
"aggs": {
"score_count": {
"sum": {
"field": "co_score.value"
}
}
}
}
}
},
"page_view_count_filter": {
"bucket_selector": {
"buckets_path": {
"sessionCount": "sessions > last_4_days > score_count"
},
"script": "params.sessionCount > 100"
}
},
"filtered_users": {
"top_hits": {
"size": 1,
"_source": {
"includes": [
"co_profileId",
"co_type",
"co_score"
]
}
}
}
}
}
}
Sample doc:
{
"co_profileId": "14654325",
"co_type": "identify",
"co_updatedAt": "2021-01-11T11:37:33.499Z",
"co_score": [
{
"value": 3,
"sessionTime": "2021-01-09T01:37:31.399Z"
},
{
"value": 3,
"sessionTime": "2021-01-10T10:47:33.419Z"
},
{
"value": 6,
"sessionTime": "2021-01-11T11:37:33.499Z"
}
]
}
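For clarity on what the bucket_selector in the query above does: it runs once per composite user bucket and keeps only the buckets whose summed score exceeds 100. In Python terms (the bucket values are invented):

```python
# Each entry stands for one composite "users" bucket, with score_count being
# the value resolved through the "sessions>last_4_days>score_count" path.
buckets = [
    {"key": {"users": "14654325"}, "score_count": 12},
    {"key": {"users": "14654326"}, "score_count": 240},
    {"key": {"users": "14654327"}, "score_count": 101},
]

# bucket_selector keeps a bucket when "params.sessionCount > 100" is true
qualifying = [b for b in buckets if b["score_count"] > 100]
print(len(qualifying))  # 2
```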

access nested variable from sub-aggregation on elasticsearch

I have an index with documents that look like:
{
"id": 1,
"timeline": [{
"amount": {
"mpe": 30,
"drawn": 20
},
"interval": {
"gte": "2020-03-01",
"lte": "2020-04-01"
}
}, {
"amount": {
"mpe": 40,
"drawn": 10
},
"interval": {
"gte": "2020-04-01",
"lte": "2020-06-01"
}
}]
}
Then I have the following query that produces a time bucketed sum of the values from the original intervals:
{
"aggs": {
"cp-timeline": {
"nested": {
"path": "timeline"
},
"aggs": {
"mpes": {
"date_histogram": {
"field": "timeline.interval",
"calendar_interval": "day"
},
"aggs": {
"sum_mpe": {
"sum": {
"field": "timeline.amount.mpe"
}
},
"sum_drawn": {
"sum": {
"field": "timeline.amount.drawn"
}
}
}
}
}
}
}
}
The above works like a charm yielding the correct sum for each day. Now I want to improve it so I can dynamically multiply the values by a given number that may vary between query executions, although for simplicity I will just use a fixed number 2. I've tried the following:
{
"aggs": {
"cp-timeline": {
"nested": {
"path": "timeline"
},
"aggs": {
"mpes": {
"date_histogram": {
"field": "timeline.interval",
"calendar_interval": "day"
},
"aggs": {
"sum_mpe": {
"sum": {
"script": "timeline.amount.mpe * 2"
}
},
"sum_drawn": {
"sum": {
"script": "timeline.amount.drawn * 2"
}
}
}
}
}
}
}
}
But I get the following error:
{
"reason": {
"type": "script_exception",
"reason": "compile error",
"script_stack": [
"timeline.amount.mpe * 2",
"^---- HERE"
],
"script": "timeline.amount.mpe * 2",
"lang": "painless",
"position": {
"offset": 0,
"start": 0,
"end": 23
},
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Variable [timeline] is not defined."
}
}
}
Is there a way to make the nested variable declared above available in the script?
This link shows how to access fields via a script. Note that doc['field'].value relies on doc values, so it works for numeric, date, and keyword fields, but not for analyzed text fields.
The below should help:
POST <your_index_name>/_search
{
"size": 0,
"aggs": {
"cp-timeline": {
"nested": {
"path": "timeline"
},
"aggs": {
"mpes": {
"date_histogram": {
"field": "timeline.interval.gte",
"calendar_interval": "day",
"min_doc_count": 1 <---- Note this
},
"aggs": {
"sum_mpe": {
"sum": {
"script": "doc['timeline.amount.mpe'].value * 2" <---- Note this
}
},
"sum_drawn": {
"sum": {
"script": "doc['timeline.amount.drawn'].value * 2" <---- Note this
}
}
}
}
}
}
}
}
Also note that I've made use of min_doc_count so that your histogram would only show you the valid dates.

Elasticsearch: Aggregating specific nested documents only

I want to aggregate only the specific nested documents which satisfy the given query.
Let me explain through an example. I have inserted two records into my index:
First document is,
{
"project": [
{
"subject": "maths",
"marks": 47
},
{
"subject": "computers",
"marks": 22
}
]
}
second document is,
{
"project": [
{
"subject": "maths",
"marks": 65
},
{
"subject": "networks",
"marks": 72
}
]
}
Each record contains subjects along with their marks. From these documents, I need the average for the maths subject alone.
The query I tried is:
{
"size": 0,
"aggs": {
"avg_marks": {
"avg": {
"field": "project.marks"
}
}
},
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "project.subject:maths",
"analyze_wildcard": true,
"default_field": "*"
}
}
]
}
}
}
This returns the average over all marks, which is not what I want:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"avg_marks": {
"value": 51.5
}
}
}
I just need the average of the maths subject from the given documents; the expected result is 56.00.
Any help with the query or an idea would be appreciated.
Thanks in advance.
First you need to specify in your mapping that the index has a nested field, like the following:
PUT /nested-index
{
"mappings": {
"document": {
"properties": {
"project": {
"type": "nested",
"properties": {
"subject": {
"type": "keyword"
},
"marks": {
"type": "long"
}
}
}
}
}
}
}
then you insert your docs:
PUT nested-index/document/1
{
"project": [
{
"subject": "maths",
"marks": 47
},
{
"subject": "computers",
"marks": 22
}
]
}
then insert second doc:
PUT nested-index/document/2
{
"project": [
{
"subject": "maths",
"marks": 65
},
{
"subject": "networks",
"marks": 72
}
]
}
and then you run the aggregation, specifying that you have a nested structure, like this:
GET nested-index/_search
{
"size": 0,
"aggs": {
"subjects": {
"nested": {
"path": "project"
},
"aggs": {
"subjects": {
"terms": {
"field": "project.subject",
"size": 10
},
"aggs": {
"average": {
"avg": {
"field": "project.marks"
}
}
}
}
}
}
}
}
As for why your query is not working and gives that result: without a nested mapping, the avg aggregation sums every marks value in each matching document's array; the subject you queried on doesn't restrict which array entries get averaged.
So for those two docs, because both contain the maths subject, the avg is calculated like this:
(47 + 22 + 65 + 72) / 4 = 51.5
If you ask for networks, it will return (because only one document contains networks, but the avg still runs over all values in that document's array):
(65 + 72) / 2 = 68.5
So you need to use a nested structure in this case.
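The difference between the flat average and the nested, filtered average can be reproduced in plain Python, using the two example documents from the question:

```python
docs = [
    {"project": [{"subject": "maths", "marks": 47}, {"subject": "computers", "marks": 22}]},
    {"project": [{"subject": "maths", "marks": 65}, {"subject": "networks", "marks": 72}]},
]

# Without a nested mapping: every marks value in a matching document is averaged.
flat = [p["marks"] for d in docs for p in d["project"]]
print(sum(flat) / len(flat))  # (47 + 22 + 65 + 72) / 4 = 51.5

# With nested + filter aggregation: only the maths entries are averaged.
maths = [p["marks"] for d in docs for p in d["project"] if p["subject"] == "maths"]
print(sum(maths) / len(maths))  # (47 + 65) / 2 = 56.0
```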
If you are only interested in one subject, you can then filter the aggregation to that subject (here, subject equal to "maths"):
GET nested-index/_search
{
"size": 0,
"aggs": {
"project": {
"nested": {
"path": "project"
},
"aggs": {
"subjects": {
"filter": {
"term": {
"project.subject": "maths"
}
},
"aggs": {
"average": {
"avg": {
"field": "project.marks"
}
}
}
}
}
}
}
}

How do I compute for the fields of matching documents in Elasticsearch?

Here is my sample document:
{
"jobID": "ace4c888-1907-4021-a808-4a816e99aa2e",
"startTime": 1415255164835,
"endTime": 1415255164898,
"moduleCode": "STARTING_MODULE"
}
I have thousands of documents.
I have pairs of documents with the same jobID, where the moduleCode is STARTING_MODULE and ENDING_MODULE respectively.
My formula: the ENDING_MODULE endTime minus the STARTING_MODULE startTime equals the elapsed time the module took to process.
My question is: how do I get the total of all results whose elapsed time is less than, say, 28800000?
Is such results possible with Elasticsearch? I'd like to display my results in Kibana too.
Please let me know if this needs more clarification. Thanks!
Try the following; it might not be ideal, but it returns a jobID and the elapsed time. First, I'm assuming jobID and moduleCode are not_analyzed:
{
"mappings": {
"jobs": {
"properties": {
"jobID":{
"type": "string",
"index": "not_analyzed"
},
"startTime":{
"type": "date"
},
"endTime":{
"type": "date"
},
"moduleCode":{
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
I used the scripted_metric aggregation, available since ES 1.4.0, to compute the difference between those two values. I haven't looked into how to add the filtering for "less than 28800000", but something could probably be done in the script to limit this:
{
"query": {
"match_all": {}
},
"aggs": {
"jobIds": {
"terms": {
"field": "jobID"
},
"aggs": {
"executionTimes": {
"scripted_metric": {
"init_script": "_agg['time'] = 0L",
"map_script": "if (doc['moduleCode'].value == \"STARTING_MODULE\") { _agg['time']=-1*doc['startTime'].value } else { _agg['time']=doc['endTime'].value}",
"combine_script": "execution = 0; for (t in _agg.time) { execution += t };return execution",
"reduce_script": "execution = 0; for (a in _aggs) { execution += a }; return execution"
}
}
}
}
}
}
And the result should be something like this:
"aggregations": {
"jobIds": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "ace4c888-1907-4021-a808-4a816e99aa1e",
"doc_count": 2,
"executionTimes": {
"value": 1
}
},
{
"key": "ace4c888-1907-4021-a808-4a816e99aa2e",
"doc_count": 2,
"executionTimes": {
"value": 1000201063
}
},
{
"key": "ace4c888-1907-4021-a808-4a816e99aa3e",
"doc_count": 2,
"executionTimes": {
"value": 10000
}
}
]
}
}
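The sign trick in the map_script (negate startTime for the starting document, add endTime for the ending one, then sum) can be checked with a quick Python emulation of the phases; the document values below are invented:

```python
def map_doc(doc):
    # STARTING_MODULE contributes -startTime, ENDING_MODULE contributes +endTime
    if doc["moduleCode"] == "STARTING_MODULE":
        return -doc["startTime"]
    return doc["endTime"]

def elapsed(job_docs):
    # combine/reduce: summing the signed values yields endTime - startTime
    return sum(map_doc(d) for d in job_docs)

job = [
    {"moduleCode": "STARTING_MODULE", "startTime": 1415255164835, "endTime": 1415255164898},
    {"moduleCode": "ENDING_MODULE", "startTime": 1415255170000, "endTime": 1415255170063},
]
print(elapsed(job))  # 1415255170063 - 1415255164835 = 5228
```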
