Why is my aggregation script not working in Elasticsearch?

I have a problem in Elasticsearch.
I want to divide one aggregated value by another.
This query works:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"sumPageview": {
"sum": {
"field": "pageview",
"missing": 0
}
},
"sumVisit": {
"sum": {
"field": "visit",
"missing": 0
}
}
}
}
But this query does not work:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"sumPageview": {
"sum": {
"field": "pageview",
"missing": 0
}
},
"sumVisit": {
"sum": {
"field": "visit",
"missing": 0
}
},
"totalPageviewPerVisit": {
"bucket_script": {
"buckets_path": {
"sumPageview": "sumPageview",
"sumVisit": "sumVisit"
},
"script": {
"source": "params.sumPageview / params.sumVisit",
"lang": "painless"
},
"gap_policy": "skip"
}
}
}
}
I think the reason is that the sum values are not inside a bucket.
Is that right? Please help.

Sum aggregation is a single-value metrics aggregation that sums
up numeric values that are extracted from the aggregated documents.
Bucket script aggregation is a parent pipeline aggregation that
executes a script that can perform per bucket computations on
specified metrics in the parent multi-bucket aggregation.
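Because the sum aggregation does not create any buckets, you cannot use a bucket_script aggregation on it directly: bucket_script has to sit inside a multi-bucket parent aggregation (such as terms) whose buckets contain the metrics it reads.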
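If you only need the overall ratio across all matching documents, one option is to keep the working query (the two top-level sums) and do the division on the client. The sketch below assumes a response body shaped like that query's output; the values 6.0 and 9.0 are hypothetical, chosen to match what the three sample documents in the example below would sum to.

```python
from typing import Optional

# Hypothetical response of the working query: top-level sumPageview and sumVisit.
# Values assume the sample documents below (pageview 1+2+3, visit 2+3+4).
response = {
    "aggregations": {
        "sumPageview": {"value": 6.0},
        "sumVisit": {"value": 9.0},
    }
}


def pageview_per_visit(resp: dict) -> Optional[float]:
    """Divide the two top-level sums on the client side."""
    aggs = resp["aggregations"]
    pageviews = aggs["sumPageview"]["value"]
    visits = aggs["sumVisit"]["value"]
    # Return None instead of raising ZeroDivisionError when nothing was summed.
    if not visits:
        return None
    return pageviews / visits


print(pageview_per_visit(response))
```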
Here is a working example with index data, a search query, and the search result:
Index Data:
{
"user_id":1,
"pageview": 1,
"visit": 2
}
{
"user_id":2,
"pageview": 2,
"visit": 3
}
{
"user_id":3,
"pageview": 3,
"visit": 4
}
Search Query:
{
"size": 0,
"aggs": {
"all": {
"terms": {
"field": "user_id"
},
"aggs": {
"sum_1": {
"sum": {
"field": "pageview"
}
},
"sum_2": {
"sum": {
"field": "visit"
}
},
"division": {
"bucket_script": {
"buckets_path": {
"my_var1": "sum_1",
"my_var2": "sum_2"
},
"script": "params.my_var1 / params.my_var2"
}
}
}
}
}
}
Search Result:
"aggregations": {
"all": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"sum_2": {
"value": 2.0
},
"sum_1": {
"value": 1.0
},
"division": {
"value": 0.5
}
},
{
"key": 2,
"doc_count": 1,
"sum_2": {
"value": 3.0
},
"sum_1": {
"value": 2.0
},
"division": {
"value": 0.6666666666666666
}
},
{
"key": 3,
"doc_count": 1,
"sum_2": {
"value": 4.0
},
"sum_1": {
"value": 3.0
},
"division": {
"value": 0.75
}
}
]
}
}
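To make the per-bucket semantics concrete, here is a minimal Python sketch of what the query computes for the three sample documents: group by user_id (terms), sum pageview and visit per group (sum), then divide inside each bucket (bucket_script). It is an illustration of the logic only, not how Elasticsearch executes it; the tie-break by key is an assumption of this sketch.

```python
from collections import defaultdict

# The three sample documents from the "Index Data" above.
docs = [
    {"user_id": 1, "pageview": 1, "visit": 2},
    {"user_id": 2, "pageview": 2, "visit": 3},
    {"user_id": 3, "pageview": 3, "visit": 4},
]


def terms_sum_divide(docs, key_field, num_field, den_field):
    """Group by key_field (like terms), sum two fields per group (like sum),
    then divide the two sums inside each bucket (like bucket_script)."""
    groups = defaultdict(lambda: {"doc_count": 0, "sum_1": 0.0, "sum_2": 0.0})
    for doc in docs:
        g = groups[doc[key_field]]
        g["doc_count"] += 1
        g["sum_1"] += doc.get(num_field, 0)
        g["sum_2"] += doc.get(den_field, 0)

    buckets = []
    for key, g in groups.items():
        division = g["sum_1"] / g["sum_2"] if g["sum_2"] else None
        buckets.append({"key": key, **g, "division": division})
    # Order by doc_count descending; ties are broken by key in this sketch.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


for bucket in terms_sum_divide(docs, "user_id", "pageview", "visit"):
    print(bucket["key"], bucket["division"])
```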

Related

max from aggregated results elasticsearch

I am using Elasticsearch to perform a simple aggregation. This aggregation gives the total transactions for each country:
{
"query": {
"match_all": {}
},
"aggs": {
"distribution": {
"terms": {
"field": "country"
},
"aggs": {
"Transactions": {
"sum": {
"field": "transactions"
}
}
}
}
}
}
This gives me the correct result:
"buckets": [
{
"key": "Australia",
"doc_count": 31,
"Transactions": {
"value": 9
}
},
{
"key": "Canada",
"doc_count": 31,
"Transactions": {
"value": 1
}
}
]
How do I modify the query to return the maximum transaction value from the aggregated results? In this case, the output should be Australia with value 9.
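Elasticsearch 2.0 and later also provide a max_bucket sibling pipeline aggregation, which can be placed next to distribution with "buckets_path": "distribution>Transactions" to do this on the server. As a client-side alternative, here is a minimal Python sketch that picks the bucket with the largest Transactions value, using the two buckets shown above; the helper name top_bucket is hypothetical.

```python
from typing import Optional, Tuple

# The two buckets from the response above.
buckets = [
    {"key": "Australia", "doc_count": 31, "Transactions": {"value": 9}},
    {"key": "Canada", "doc_count": 31, "Transactions": {"value": 1}},
]


def top_bucket(buckets, metric: str) -> Optional[Tuple[str, float]]:
    """Return (key, value) of the bucket whose metric value is largest."""
    if not buckets:
        return None
    best = max(buckets, key=lambda b: b[metric]["value"])
    return best["key"], best[metric]["value"]


print(top_bucket(buckets, "Transactions"))  # ('Australia', 9)
```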

Elasticsearch: filter aggregation using bucket value

Not sure how to formulate the question.
I'm using Elasticsearch 2.2.
Let's start with an example dataset made of 4 documents:
[
{
"header": {
"called_entity": { "uuid": "a" },
"coverage_entity": {},
"sucessful_transfers": 1
}
},
{
"header": {
"called_entity": { "uuid": "a" },
"coverage_entity": { "uuid": "b" },
"sucessful_transfers": 1
}
},
{
"header": {
"called_entity": { "uuid": "b" },
"coverage_entity": { "uuid": "a" },
"sucessful_transfers": 1
}
},
{
"header": {
"called_entity": { "uuid": "b" },
"coverage_entity": { "uuid": "a" },
"sucessful_transfers": 0
}
}
]
called_entity always has a uuid.
coverage_entity can be empty or have a uuid.
I use a script to aggregate on either called_entity.uuid or coverage_entity.uuid:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"dim1": {
"terms": {
"script" : "return doc['header.called_entity.uuid'] + doc['header.coverage_entity.uuid']",
"size": 10
},
"aggs": {
"successful_transfers": {
"sum": {
"field": "header.successful_transfers"
}
}
}
}
}
}
So now the aggregation has generated terms from either header.called_entity.uuid or header.coverage_entity.uuid.
How can I filter my aggregation using the value of the aggregation key? For example, if I want to count, for each bucket, how many documents have their uuid taken from header.called_entity.uuid only. Something like that:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"dim1": {
"terms": {
"script" : "return doc['header.called_entity.uuid'] + doc['header.coverage_entity.uuid']",
"size": 10
},
"aggs": {
"successful_transfers": {
"sum": {
"field": "header.successful_transfers"
}
},
"from_called_entity": {
"filter": {
"term": { "header.called_entity.uuid": BUCKET_KEY }
}
}
}
}
}
}
Not sure this is possible. The key itself is only available as a sorting option.
Could you use something like this?
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"dim1": {
"terms": {
"script": "return doc['header.called_entity.uuid'] + doc['header.coverage_entity.uuid']",
"size": 10
},
"aggs": {
"successful_transfers": {
"sum": {
"field": "header.sucessful_transfers"
}
}
}
},
"called_entity_source": {
"terms": {
"field": "header.called_entity.uuid",
"size": 10
}
},
"coverage_entity_source": {
"terms": {
"field": "header.coverage_entity.uuid",
"size": 10
}
}
}
}
And the output will be something like this:
"called_entity_source": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 2
},
{
"key": "b",
"doc_count": 2
}
]
},
"coverage_entity_source": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 2
},
{
"key": "b",
"doc_count": 1
}
]
},
"dim1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 4,
"successful_transfers": {
"value": 3
}
},
{
"key": "b",
"doc_count": 3,
"successful_transfers": {
"value": 2
}
}
]
}
If you really need the JSON in that exact shape, add a final post-processing step in your application. The result above does contain the information you need, but the keys from coverage_entity_source and called_entity_source are not nested under the dim1 aggregation.
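A minimal sketch of that post-processing step in Python, assuming the aggregation output shown above has already been parsed into a dict. It copies the per-key counts from called_entity_source and coverage_entity_source into each dim1 bucket; the from_coverage_entity name is hypothetical (from_called_entity follows the question), and keys missing from the top-level terms results (which are capped at size 10) default to 0 as an assumption of this sketch.

```python
# Aggregation output from above, parsed into a dict.
aggs = {
    "called_entity_source": {"buckets": [{"key": "a", "doc_count": 2}, {"key": "b", "doc_count": 2}]},
    "coverage_entity_source": {"buckets": [{"key": "a", "doc_count": 2}, {"key": "b", "doc_count": 1}]},
    "dim1": {"buckets": [
        {"key": "a", "doc_count": 4, "successful_transfers": {"value": 3}},
        {"key": "b", "doc_count": 3, "successful_transfers": {"value": 2}},
    ]},
}


def merge_sources(aggs):
    """Nest the per-source counts under each dim1 bucket, keyed by bucket key."""
    called = {b["key"]: b["doc_count"] for b in aggs["called_entity_source"]["buckets"]}
    coverage = {b["key"]: b["doc_count"] for b in aggs["coverage_entity_source"]["buckets"]}
    merged = []
    for bucket in aggs["dim1"]["buckets"]:
        merged.append({
            **bucket,
            "from_called_entity": {"doc_count": called.get(bucket["key"], 0)},
            "from_coverage_entity": {"doc_count": coverage.get(bucket["key"], 0)},
        })
    return merged


for b in merge_sources(aggs):
    print(b["key"], b["from_called_entity"], b["from_coverage_entity"])
```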

Elasticsearch range bucket aggregation based on doc_count

I have an elasticsearch aggregation query like this.
{
"aggs": {
"customer": {
"aggs": {
"Total_Sale": {
"sum": {
"field": "amount"
}
}
},
"terms": {
"field": "org",
"size": 50000
}
}
}
}
It returns buckets like the following:
{
"aggregations": {
"customer": {
"buckets": [
{
"Total_Sale": { "value": 9999 },
"doc_count": 8,
"key": "cats"
},
{
"Total_Sale": { "value": 8888 },
"doc_count": 6,
"key": "tigers"
},
{
"Total_Sale": { "value": 444},
"doc_count": 5,
"key": "lions"
},
{
"Total_Sale": { "value": 555 },
"doc_count": 2,
"key": "wolves"
}
]
}
}
}
I want another range bucket aggregation based on doc_count. The final result I need is:
{
"buckets": [
{
"Sum_of_Total_Sale": 555, // If I can form the buckets, I can get this using sum_bucket, so forming the buckets is what matters.
"Sum_of_doc_count": 2,
"doc_count": 1,
"key": "*-3",
"to": 3.0
},
{
"Sum_of_Total_Sale": 9332,
"Sum_of_doc_count": 11,
"doc_count": 2,
"from": 4.0,
"key": "4-6",
"to": 6.0
},
{
"Sum_of_Total_Sale": 9999,
"Sum_of_doc_count": 8,
"doc_count": 1,
"from": 7.0,
"key": "7-*"
}
]
}
Using a bucket_selector aggregation followed by a sum_bucket aggregation will not work, because there is more than one range key to compute.
A bucket_script aggregation only performs calculations within a single bucket.
Can I add a scripted field to each document to help me create these buckets?
There's no aggregation that I know of that can do this in one shot. However, there is one technique I use from time to time to overcome this limitation: repeat the same terms/sum aggregation once per range, add a bucket_selector pipeline aggregation to each copy so that only the buckets whose doc_count falls in that range survive, and then total the surviving buckets with sum_bucket aggregations placed next to each copy.
POST index/_search
{
"size": 0,
"aggs": {
"*-3": {
"terms": {
"field": "org",
"size": 1000
},
"aggs": {
"Total_Sale": {
"sum": {
"field": "amount"
}
},
"*-3": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount <= 3"
}
}
}
},
"*-3_Total_Sales": {
"sum_bucket": {
"buckets_path": "*-3>Total_Sale"
}
},
"*-3_Total_Docs": {
"sum_bucket": {
"buckets_path": "*-3>_count"
}
},
"4-6": {
"terms": {
"field": "org",
"size": 1000
},
"aggs": {
"Total_Sale": {
"sum": {
"field": "amount"
}
},
"4-6": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount >= 4 && params.docCount <= 6"
}
}
}
},
"4-6_Total_Sales": {
"sum_bucket": {
"buckets_path": "4-6>Total_Sale"
}
},
"4-6_Total_Docs": {
"sum_bucket": {
"buckets_path": "4-6>_count"
}
},
"7-*": {
"terms": {
"field": "org",
"size": 1000
},
"aggs": {
"Total_Sale": {
"sum": {
"field": "amount"
}
},
"7-*": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount >= 7"
}
}
}
},
"7-*_Total_Sales": {
"sum_bucket": {
"buckets_path": "7-*>Total_Sale"
}
},
"7_*_Total_Docs": {
"sum_bucket": {
"buckets_path": "7-*>_count"
}
}
}
}
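Because the query repeats the same block per range, it can be generated instead of written by hand. Below is a minimal Python sketch that builds the same request body for a list of (key, lower, upper) doc_count bounds, both inclusive as in the scripts above. The helper name range_by_doc_count is hypothetical, and it names every total consistently as <key>_Total_Docs.

```python
import json


def range_by_doc_count(field, metric_field, ranges, size=1000):
    """Build one terms + sum + bucket_selector block per range, plus
    sum_bucket siblings for the total sale and total doc count."""
    aggs = {}
    for key, lower, upper in ranges:
        conditions = []
        if lower is not None:
            conditions.append(f"params.docCount >= {lower}")
        if upper is not None:
            conditions.append(f"params.docCount <= {upper}")
        script = " && ".join(conditions) or "true"
        aggs[key] = {
            "terms": {"field": field, "size": size},
            "aggs": {
                "Total_Sale": {"sum": {"field": metric_field}},
                key: {
                    "bucket_selector": {
                        "buckets_path": {"docCount": "_count"},
                        "script": script,
                    }
                },
            },
        }
        aggs[f"{key}_Total_Sales"] = {"sum_bucket": {"buckets_path": f"{key}>Total_Sale"}}
        aggs[f"{key}_Total_Docs"] = {"sum_bucket": {"buckets_path": f"{key}>_count"}}
    return {"size": 0, "aggs": aggs}


body = range_by_doc_count("org", "amount", [("*-3", None, 3), ("4-6", 4, 6), ("7-*", 7, None)])
print(json.dumps(body, indent=2))
```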
You'll get a response that looks like this; the figures you're looking for are in the xyz_Total_Sales and xyz_Total_Docs results:
"aggregations": {
"*-3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "wolves",
"doc_count": 2,
"Total_Sale": {
"value": 555
}
}
]
},
"7-*": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cats",
"doc_count": 8,
"Total_Sale": {
"value": 9999
}
}
]
},
"4-6": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "tigers",
"doc_count": 6,
"Total_Sale": {
"value": 8888
}
},
{
"key": "lions",
"doc_count": 5,
"Total_Sale": {
"value": 444
}
}
]
},
"*-3_Total_Sales": {
"value": 555
},
"*-3_Total_Docs": {
"value": 2
},
"4-6_Total_Sales": {
"value": 9332
},
"4-6_Total_Docs": {
"value": 11
},
"7-*_Total_Sales": {
"value": 9999
},
"7_*_Total_Docs": {
"value": 8
}
}
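If the single terms aggregation from the question already returns every org (it uses size 50000), the same grouping can also be done on the client. This minimal Python sketch regroups the four buckets from the question by doc_count range and reproduces the figures above (555/2, 9332/11, 9999/8); the helper name group_by_doc_count is hypothetical.

```python
# Buckets returned by the original terms + sum query.
org_buckets = [
    {"key": "cats", "doc_count": 8, "Total_Sale": {"value": 9999}},
    {"key": "tigers", "doc_count": 6, "Total_Sale": {"value": 8888}},
    {"key": "lions", "doc_count": 5, "Total_Sale": {"value": 444}},
    {"key": "wolves", "doc_count": 2, "Total_Sale": {"value": 555}},
]

# (key, lower, upper) with inclusive bounds; None means unbounded.
RANGES = [("*-3", None, 3), ("4-6", 4, 6), ("7-*", 7, None)]


def group_by_doc_count(buckets, ranges):
    result = []
    for key, lower, upper in ranges:
        members = [
            b for b in buckets
            if (lower is None or b["doc_count"] >= lower)
            and (upper is None or b["doc_count"] <= upper)
        ]
        result.append({
            "key": key,
            "doc_count": len(members),  # number of orgs in this range
            "Sum_of_doc_count": sum(b["doc_count"] for b in members),
            "Sum_of_Total_Sale": sum(b["Total_Sale"]["value"] for b in members),
        })
    return result


for row in group_by_doc_count(org_buckets, RANGES):
    print(row)
```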

ElasticSearch - calculate ratio between aggregation buckets

I'm using the following terms aggregations to get the views and clicks of each campaign (by campaign_id):
"aggregations": {
"campaigns": {
"terms": {
"field": "campaign_id",
"size": 10,
"order": {
"_term": "asc"
}
},
"aggregations": {
"actions": {
"terms": {
"field": "action",
"size": 10
}
}
}
}}
This is the response I get:
"aggregations": {
"campaigns": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "someId",
"doc_count": 12,
"actions": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "click",
"doc_count": 3
},
{
"key": "view",
"doc_count": 9
}
]
}
}
]
}
}
EDIT:
Here is an example of a document (only the relevant parts):
{
"_index": "action",
"_type": "click",
"_id": "AVI2XOTl8otXlszOjypT",
"_score": 1,
"_source": {
"ip": "127.0.0.1",
"timestamp": "2016-01-12T15:03:23.622743524Z",
"action": "click",
"campaign_id": "IypmiroC"
}}
I need to retrieve the conversion rate of each campaign (clicks / views), and I can't do it on the client side because I need to sort by conversion rate.
Any help would be much appreciated.
This requires several aggregations and ES 2.x. First, I get all unique campaign_id values with a terms aggregation. Then I filter by action and count the documents with that particular action. Finally, you need a pipeline aggregation (introduced in ES 2.0), specifically the bucket_script aggregation, to compute the ratio. This is how it looks:
{
"size": 0,
"aggs": {
"unique_campaign": {
"terms": {
"field": "campaign_id",
"size": 10
},
"aggs": {
"click_bucket": {
"filter": {
"term": {
"action": "click"
}
},
"aggs": {
"click_count": {
"value_count": {
"field": "action"
}
}
}
},
"view_bucket": {
"filter": {
"term": {
"action": "view"
}
},
"aggs": {
"view_count": {
"value_count": {
"field": "action"
}
}
}
},
"conversion_ratio": {
"bucket_script": {
"buckets_path": {
"total_clicks": "click_bucket>click_count",
"total_views": "view_bucket>view_count"
},
"script": "total_clicks/total_views"
}
}
}
}
}
}
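Also, action should be mapped as not_analyzed so that the filters match the exact stored value (such matching is case-sensitive, so Click won't match click). Note that this example targets ES 2.x, where the script can refer to the buckets_path variables by name; from 5.x on (Painless), write params.total_clicks / params.total_views instead.
Hope this helps!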
As of 7.x, sorting by the ratio can be achieved with a bucket_script followed by a bucket_sort. Here is a demo for reference:
{
"size": 0,
"aggs": {
"mallBucket": {
"terms": {
"field": "mallId",
"size": 20,
"min_doc_count": 3,
"shard_size": 10000
},
"aggs": {
"totalOrderCount": {
"value_count": {
"field": "orderSn"
}
},
"filteredCoupon": {
"filter": {
"terms": {
"tags": [
"hello",
"cool"
]
}
},
"aggs": {
"couponCount": {
"value_count": {
"field": "orderSn"
}
}
}
},
"countRatio": {
"bucket_script": {
"buckets_path": {
"orderCount": "totalOrderCount",
"couponCount": "filteredCoupon>couponCount"
},
"script": "params.couponCount/params.orderCount"
}
},
"ratio_bucket_sort": {
"bucket_sort": {
"sort": [
{
"countRatio": {
"order": "desc"
}
}
],
"size": 20
}
}
}
}
}
}
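To illustrate what the bucket_script and bucket_sort steps compute per campaign, here is a minimal Python sketch over response buckets shaped like the one in the question. The second campaign ("otherId") is hypothetical, and returning None (sorted last) for campaigns with no views is a choice of this sketch, not documented Elasticsearch behaviour. It only clarifies the logic; as the question notes, sorting on the server is what makes the result usable with paging.

```python
# Campaign buckets shaped like the question's response; "otherId" is made up.
campaign_buckets = [
    {"key": "someId", "actions": {"buckets": [
        {"key": "click", "doc_count": 3},
        {"key": "view", "doc_count": 9},
    ]}},
    {"key": "otherId", "actions": {"buckets": [
        {"key": "click", "doc_count": 1},
        {"key": "view", "doc_count": 4},
    ]}},
]


def conversion_ratios(buckets):
    """clicks / views per campaign (like bucket_script), sorted descending
    (like bucket_sort with order desc)."""
    rows = []
    for bucket in buckets:
        counts = {a["key"]: a["doc_count"] for a in bucket["actions"]["buckets"]}
        clicks = counts.get("click", 0)
        views = counts.get("view", 0)
        rows.append((bucket["key"], clicks / views if views else None))
    # Highest ratio first; campaigns without views go last.
    rows.sort(key=lambda r: (r[1] is None, -(r[1] or 0.0)))
    return rows


print(conversion_ratios(campaign_buckets))
```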

How to calculate difference between metrics in different aggregations in elasticsearch

I want to calculate the difference of nested aggregations between two dates.
To be more concrete: is it possible to calculate the difference date_1.buckets.field_1.buckets.field_2.buckets.field_3.value - date_2.buckets.field_1.buckets.field_2.buckets.field_3.value, given the request and response below? Is that possible with Elasticsearch v1.0.1?
The aggregation query request looks like this:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"terms": {
"date": [
"2014-08-18 00:00:00.0",
"2014-08-15 00:00:00.0"
]
}
}
]
}
}
}
},
"aggs": {
"date_1": {
"filter": {
"terms": {
"date": [
"2014-08-18 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_2": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
},
"date_2": {
"filter": {
"terms": {
"date": [
"2014-08-15 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_2": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
}
}
}
And the response looks like this:
{
"took": 236,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1646,
"max_score": 0,
"hits": []
},
"aggregations": {
"date_1": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 100
}
}
]
}
}
]
}
},
"date_2": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 80
}
}
]
}
}
]
}
}
}
}
Thank you.
With newer Elasticsearch versions (e.g., 5.6.9), this is possible:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"filter": [
{
"range": {
"date_created": {
"gte": "2018-06-16T00:00:00+02:00",
"lte": "2018-06-16T23:59:59+02:00"
}
}
}
]
}
}
}
},
"aggs": {
"by_millisec": {
"range" : {
"script" : {
"lang": "painless",
"source": "doc['date_delivered'][0] - doc['date_created'][0]"
},
"ranges" : [
{ "key": "<1sec", "to": 1000.0 },
{ "key": "1-5sec", "from": 1000.0, "to": 5000.0 },
{ "key": "5-30sec", "from": 5000.0, "to": 30000.0 },
{ "key": "30-60sec", "from": 30000.0, "to": 60000.0 },
{ "key": "1-2min", "from": 60000.0, "to": 120000.0 },
{ "key": "2-5min", "from": 120000.0, "to": 300000.0 },
{ "key": "5-10min", "from": 300000.0, "to": 600000.0 },
{ "key": ">10min", "from": 600000.0 }
]
}
}
}
}
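The range aggregation above buckets each document by the value its script returns (delivery time minus creation time, in milliseconds), with each range's from bound inclusive and to bound exclusive. A minimal Python sketch of that bucketing, assuming hypothetical documents whose two dates are already epoch milliseconds:

```python
# The ranges from the query above: (key, from, to) in milliseconds.
LATENCY_RANGES = [
    ("<1sec", None, 1000.0),
    ("1-5sec", 1000.0, 5000.0),
    ("5-30sec", 5000.0, 30000.0),
    ("30-60sec", 30000.0, 60000.0),
    ("1-2min", 60000.0, 120000.0),
    ("2-5min", 120000.0, 300000.0),
    ("5-10min", 300000.0, 600000.0),
    (">10min", 600000.0, None),
]


def bucket_latencies(docs):
    counts = {key: 0 for key, _, _ in LATENCY_RANGES}
    for doc in docs:
        # Same value the script computes: delivered minus created.
        delta = doc["date_delivered"] - doc["date_created"]
        for key, low, high in LATENCY_RANGES:
            # from is inclusive, to is exclusive
            if (low is None or delta >= low) and (high is None or delta < high):
                counts[key] += 1
    return counts


# Hypothetical documents (epoch milliseconds).
sample = [
    {"date_created": 0, "date_delivered": 500},
    {"date_created": 0, "date_delivered": 1000},
    {"date_created": 0, "date_delivered": 45000},
    {"date_created": 0, "date_delivered": 700000},
]
print(bucket_latencies(sample))
```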
Arithmetic operations between the results of two aggregations are not possible in the Elasticsearch DSL, not even with scripts (up to version 1.1.1, as far as I know).
Such operations need to be handled on the client side after processing the aggregation results.
Reference: elasticsearch aggregation to sort by ratio of aggregations
In 1.0.1 I couldn't find anything, but in 1.4.2 you could try the scripted_metric aggregation (still experimental).
Here is the scripted_metric documentation page.
I am not fluent in the Elasticsearch syntax, but I think your metric inputs would be:
init_script: initialize an accumulator for each date:
"init_script": "_agg.d1Val = 0; _agg.d2Val = 0;"
map_script: test the date of the document and add its field_3 value to the matching accumulator (firstDate is assumed to be supplied through params):
"map_script": "if (doc['date'].value == firstDate) { _agg.d1Val += doc['field_3'].value; } else { _agg.d2Val += doc['field_3'].value; };",
reduce_script: accumulate the intermediate data from the shards and return the final result:
"reduce_script": "totalD1 = 0; totalD2 = 0; for (agg in _aggs) { totalD1 += agg.d1Val; totalD2 += agg.d2Val; }; return totalD1 - totalD2"
I don't think that in this case you need a combine_script.
Of course, if you can't use 1.4.2, then this is no help :-)
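To show how the init, map, and reduce phases fit together, here is a minimal Python simulation of that flow (not Elasticsearch code). The two "shards" and their documents are hypothetical, chosen so the per-date totals match the question's field_3 values (100 and 80). Both branches of the map step accumulate with +=, and with no combine step each shard's state is passed to the reduce step as-is.

```python
# Hypothetical documents split across two shards.
shards = [
    [{"date": "2014-08-18", "field_3": 60}, {"date": "2014-08-15", "field_3": 50}],
    [{"date": "2014-08-18", "field_3": 40}, {"date": "2014-08-15", "field_3": 30}],
]
FIRST_DATE = "2014-08-18"


def init_script():
    # One accumulator per date.
    return {"d1Val": 0, "d2Val": 0}


def map_script(agg, doc, first_date):
    # Add the document's field_3 to the accumulator for its date.
    if doc["date"] == first_date:
        agg["d1Val"] += doc["field_3"]
    else:
        agg["d2Val"] += doc["field_3"]


def reduce_script(aggs):
    # Sum the per-shard states and return the difference.
    total_d1 = sum(a["d1Val"] for a in aggs)
    total_d2 = sum(a["d2Val"] for a in aggs)
    return total_d1 - total_d2


def scripted_metric(shards, first_date):
    per_shard = []
    for docs in shards:
        agg = init_script()
        for doc in docs:
            map_script(agg, doc, first_date)
        per_shard.append(agg)  # no combine step: the shard state goes straight to reduce
    return reduce_script(per_shard)


print(scripted_metric(shards, FIRST_DATE))  # 20
```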
