I'm using the following terms aggregation to get the views and clicks of each campaign (by campaign_id):
"aggregations": {
"campaigns": {
"terms": {
"field": "campaign_id",
"size": 10,
"order": {
"_term": "asc"
}
},
"aggregations": {
"actions": {
"terms": {
"field": "action",
"size": 10
}
}
}
}}
This is the response I get:
"aggregations": {
"campaigns": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "someId",
"doc_count": 12,
"actions": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "click",
"doc_count": 3
},
{
"key": "view",
"doc_count": 9
}
]
}
}
]
}
}
EDIT:
Here is an example document (only the relevant parts):
{
"_index": "action",
"_type": "click",
"_id": "AVI2XOTl8otXlszOjypT",
"_score": 1,
"_source": {
"ip": "127.0.0.1",
"timestamp": "2016-01-12T15:03:23.622743524Z",
"action": "click",
"campaign_id": "IypmiroC"
}}
I need to retrieve the conversion rate of each campaign (clicks / views), and I can't compute it on the client side because I need to sort by conversion rate.
Any help would be much appreciated.
This requires several aggregations and ES 2.x. First, a terms aggregation collects the unique campaign_id values. Then a filter aggregation per action counts the documents with that action. Finally, a pipeline aggregation (introduced in ES 2.0), specifically the bucket script aggregation, computes the ratio. The query looks like this:
{
"size": 0,
"aggs": {
"unique_campaign": {
"terms": {
"field": "campaign_id",
"size": 10
},
"aggs": {
"click_bucket": {
"filter": {
"term": {
"action": "click"
}
},
"aggs": {
"click_count": {
"value_count": {
"field": "action"
}
}
}
},
"view_bucket": {
"filter": {
"term": {
"action": "view"
}
},
"aggs": {
"view_count": {
"value_count": {
"field": "action"
}
}
}
},
"conversion_ratio": {
"bucket_script": {
"buckets_path": {
"total_clicks": "click_bucket>click_count",
"total_views": "view_bucket>view_count"
},
"script": "total_clicks/total_views"
}
}
}
}
}
}
Also, the action field needs a not_analyzed mapping; the term filter matches exactly, so Click won't match click.
Hope this helps!!
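To sanity-check what the bucket_script above should return, the same ratio can be reproduced offline from the per-action counts in the question's response. This is a minimal Python sketch, not Elasticsearch code; the function name conversion_ratio and the choice to return None when a campaign has no views are assumptions made for illustration.

```python
def conversion_ratio(campaign_bucket):
    """Return clicks / views for one campaign bucket, or None if there are no views."""
    # Index the nested "actions" buckets by their key ("click", "view").
    counts = {b["key"]: b["doc_count"] for b in campaign_bucket["actions"]["buckets"]}
    views = counts.get("view", 0)
    if views == 0:
        return None  # assumption: treat "no views" as "no ratio" instead of dividing by zero
    return counts.get("click", 0) / views


# Bucket copied from the response in the question.
sample_bucket = {
    "key": "someId",
    "doc_count": 12,
    "actions": {"buckets": [{"key": "click", "doc_count": 3},
                            {"key": "view", "doc_count": 9}]},
}
ratio = conversion_ratio(sample_bucket)
```

For the sample bucket this is 3 clicks over 9 views, which is the value conversion_ratio should hold for that campaign in the aggregation result.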
As of 7.x, sorting can be achieved with bucket_script and bucket_sort as follows (just a demo for reference):
{
"size": 0,
"aggs": {
"mallBucket": {
"terms": {
"field": "mallId",
"size": 20,
"min_doc_count": 3,
"shard_size": 10000
},
"aggs": {
"totalOrderCount": {
"value_count": {
"field": "orderSn"
}
},
"filteredCoupon": {
"filter": {
"terms": {
"tags": [
"hello",
"cool"
]
}
},
"aggs": {
"couponCount": {
"value_count": {
"field": "orderSn"
}
}
}
},
"countRatio": {
"bucket_script": {
"buckets_path": {
"orderCount": "totalOrderCount",
"couponCount": "filteredCoupon>couponCount"
},
"script": "params.couponCount/params.orderCount"
}
},
"ratio_bucket_sort": {
"bucket_sort": {
"sort": [
{
"countRatio": {
"order": "desc"
}
}
],
"size": 20
}
}
}
}
}
}
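The effect of combining bucket_script with bucket_sort can be illustrated outside Elasticsearch: compute a ratio per bucket, sort descending, and keep the first size buckets. The following Python sketch uses invented mall counts and a hypothetical helper name (sort_buckets_by_ratio); skipping buckets with a zero denominator is also an assumption. Note that bucket_sort only reorders the buckets its parent terms aggregation has already returned.

```python
def sort_buckets_by_ratio(buckets, size):
    """Mimic bucket_script + bucket_sort: add a ratio, sort descending, keep `size` buckets."""
    scored = []
    for b in buckets:
        if b["orderCount"] == 0:
            continue  # assumption: skip buckets where the ratio is undefined
        scored.append({**b, "countRatio": b["couponCount"] / b["orderCount"]})
    scored.sort(key=lambda b: b["countRatio"], reverse=True)
    return scored[:size]


# Hypothetical per-mall counts, invented for illustration only.
malls = [
    {"key": "mall-a", "orderCount": 10, "couponCount": 2},
    {"key": "mall-b", "orderCount": 4, "couponCount": 3},
    {"key": "mall-c", "orderCount": 8, "couponCount": 4},
]
top = sort_buckets_by_ratio(malls, size=2)
```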
I have a problem in Elasticsearch.
I want to divide one aggregated value by another.
This query works:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"sumPageview": {
"sum": {
"field": "pageview",
"missing": 0
}
},
"sumVisit": {
"sum": {
"field": "visit",
"missing": 0
}
}
}
}
But this query does not work:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"sumPageview": {
"sum": {
"field": "pageview",
"missing": 0
}
},
"sumVisit": {
"sum": {
"field": "visit",
"missing": 0
}
},
"totalPageviewPerVisit": {
"bucket_script": {
"buckets_path": {
"sumPageview": "sumPageview",
"sumVisit": "sumVisit"
},
"script": {
"source": "params.sumPageview / params.sumVisit",
"lang": "painless"
},
"gap_policy": "skip"
}
}
}
}
I think the reason is that the sum values are not inside a bucket.
Is that right? Please help.
Sum aggregation is a single-value metrics aggregation that sums
up numeric values that are extracted from the aggregated documents.
Bucket script aggregation is a parent pipeline aggregation that
executes a script that can perform per bucket computations on
specified metrics in the parent multi-bucket aggregation.
Because a sum aggregation does not create any buckets, you cannot apply a bucket script aggregation to it directly.
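One way around this, sketched here under assumptions, is to wrap both sums in a multi-bucket aggregation that produces exactly one bucket (a filters aggregation with a single match_all filter), so that bucket_script has a parent bucket to run in. The Python helper only builds the request body; the names ratio_of_sums_body and all_docs are hypothetical, and the body has not been verified against a particular Elasticsearch version.

```python
import json


def ratio_of_sums_body(numerator_field, denominator_field):
    """Build a search body that divides two sums inside a single-bucket `filters` aggregation."""
    return {
        "size": 0,
        "aggs": {
            "all_docs": {  # hypothetical name for the wrapping aggregation
                # One named filter that matches everything yields exactly one bucket.
                "filters": {"filters": {"all": {"match_all": {}}}},
                "aggs": {
                    "sumPageview": {"sum": {"field": numerator_field, "missing": 0}},
                    "sumVisit": {"sum": {"field": denominator_field, "missing": 0}},
                    "totalPageviewPerVisit": {
                        "bucket_script": {
                            "buckets_path": {
                                "sumPageview": "sumPageview",
                                "sumVisit": "sumVisit",
                            },
                            "script": "params.sumPageview / params.sumVisit",
                        }
                    },
                },
            }
        },
    }


body = ratio_of_sums_body("pageview", "visit")
print(json.dumps(body, indent=2))
```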
Adding a working example with index data, search query, and search result
Index Data:
{
"user_id":1,
"pageview": 1,
"visit": 2
}
{
"user_id":2,
"pageview": 2,
"visit": 3
}
{
"user_id":3,
"pageview": 3,
"visit": 4
}
Search Query:
{
"size": 0,
"aggs": {
"all": {
"terms": {
"field": "user_id"
},
"aggs": {
"sum_1": {
"sum": {
"field": "pageview"
}
},
"sum_2": {
"sum": {
"field": "visit"
}
},
"division": {
"bucket_script": {
"buckets_path": {
"my_var1": "sum_1",
"my_var2": "sum_2"
},
"script": "params.my_var1 / params.my_var2"
}
}
}
}
}
}
Search Result:
"aggregations": {
"all": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"sum_2": {
"value": 2.0
},
"sum_1": {
"value": 1.0
},
"division": {
"value": 0.5
}
},
{
"key": 2,
"doc_count": 1,
"sum_2": {
"value": 3.0
},
"sum_1": {
"value": 2.0
},
"division": {
"value": 0.6666666666666666
}
},
{
"key": 3,
"doc_count": 1,
"sum_2": {
"value": 4.0
},
"sum_1": {
"value": 3.0
},
"division": {
"value": 0.75
}
}
]
}
}
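The division values in the search result can be cross-checked by hand. The short Python sketch below repeats the same grouping and division on the three indexed documents; the helper name division_per_user is made up for this illustration.

```python
# Documents copied from the "Index Data" listing above.
docs = [
    {"user_id": 1, "pageview": 1, "visit": 2},
    {"user_id": 2, "pageview": 2, "visit": 3},
    {"user_id": 3, "pageview": 3, "visit": 4},
]


def division_per_user(documents):
    """Group by user_id, sum both fields, then divide, as the query above does per bucket."""
    sums = {}
    for d in documents:
        s = sums.setdefault(d["user_id"], {"sum_1": 0.0, "sum_2": 0.0})
        s["sum_1"] += d["pageview"]
        s["sum_2"] += d["visit"]
    return {uid: s["sum_1"] / s["sum_2"] for uid, s in sums.items()}


divisions = division_per_user(docs)
# divisions == {1: 0.5, 2: 0.6666666666666666, 3: 0.75}
```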
I am new to Elasticsearch, so please forgive me if the answer is obvious.
I have modified a query to use aggs to show 'distinct' results. However, after adding the aggs, size no longer seems to work: it always returns 10 results no matter what I set size to.
Does anyone know how I can use both aggs and size together?
My query is:
{
"size": "15",
"from": "0",
"query": {
"bool": {
"filter": [
{
"term": {
"category": "Cars"
}
},
{
"term": {
"location": "Sydney"
}
},
{
"term": {
"status": true
}
}
]
}
},
"sort": [
{
"_score": "desc"
},
{
"brand": "asc"
}
],
"aggs": {
"brand": {
"terms": {
"field": "brand",
"order": {
"price": "asc"
}
},
"aggs": {
"brand": {
"top_hits": {
"size": 1,
"sort": [
{
"price": {
"order": "asc"
}
}
]
}
},
"price": {
"min": {
"field": "price"
}
}
}
}
}
}
The size parameter at the top level of the request sets the number of query hits; it does not affect the number of aggregation buckets.
Set the size parameter inside the parent terms aggregation, just as you did with "size": 1 in the top_hits sub-aggregation.
The modified query to get the top 10 buckets is:
{
"size": "15",
"from": "0",
"query": {
"bool": {
"filter": [
{
"term": {
"category": "Cars"
}
},
{
"term": {
"location": "Sydney"
}
},
{
"term": {
"status": true
}
}
]
}
},
"sort": [
{
"_score": "desc"
},
{
"brand": "asc"
}
],
"aggs": {
"brand": {
"terms": {
"field": "brand",
"size": 10,
"order": {
"price": "asc"
}
},
"aggs": {
"brand": {
"top_hits": {
"size": 1,
"sort": [
{
"price": {
"order": "asc"
}
}
]
}
},
"price": {
"min": {
"field": "price"
}
}
}
}
}
}
Hope this helps.
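The distinction between the two size settings can be made concrete with a small helper that changes only the bucket count of a named terms aggregation and leaves the hit count alone. This is a hedged Python sketch working on a plain dict; with_bucket_size is a hypothetical name and the query is a trimmed-down version of the one in the question.

```python
import copy


def with_bucket_size(body, agg_name, bucket_size):
    """Return a copy of `body` whose named terms aggregation returns `bucket_size` buckets."""
    updated = copy.deepcopy(body)
    updated["aggs"][agg_name]["terms"]["size"] = bucket_size
    return updated


# Trimmed-down version of the query from the question.
query = {
    "size": 15,  # number of search hits
    "from": 0,
    "aggs": {
        "brand": {
            "terms": {"field": "brand", "order": {"price": "asc"}},
            "aggs": {"price": {"min": {"field": "price"}}},
        }
    },
}
resized = with_bucket_size(query, "brand", 15)
```

Here resized asks for 15 hits and 15 brand buckets, while the original query dict is left unchanged.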
My index consists of documents like this one:
{
"clientPorttopKByCount": [
{
"value": 1,
"key": "41770"
},
{
"value": 1,
"key": "41791"
}
],
"timestamp": 1574335260000
}
Requirement: group by clientPorttopKByCount.key and sum clientPorttopKByCount.value for every 60-second histogram bucket.
My current ES query (it gives the wrong sum):
"aggregations": {
"clientPorttopKByCount.key": {
"nested": {
"path": "clientPorttopKByCount"
},
"aggregations": {
"orders": {
"terms": {
"field": "clientPorttopKByCount.key",
"size": 5000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"records": {
"reverse_nested": {
},
"aggregations": {
"histogram": {
"histogram": {
"field": "timestamp",
"interval": 60000.0,
"offset": 0.0,
"order": {
"_key": "asc"
},
"keyed": false,
"min_doc_count": 0
},
"aggregations": {
"clientPorttopKByCount.key": {
"nested": {
"path": "clientPorttopKByCount"
},
"aggregations": {
"clientPorttopKByCount.value_sum": {
"sum": {
"field": "clientPorttopKByCount.value"
}
}
}
}
}
}
}
}
}
}
}
}
}
The problem: for a single key, it returns the sum over all histogram minutes instead of the sum per minute.
Please help me solve this.
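The stated requirement (group by key, sum value per 60-second bucket) can be pinned down with a small reference computation to compare query output against. The Python sketch below is not an Elasticsearch fix; sum_per_minute_and_key is a hypothetical helper, and the second sample document is invented for illustration.

```python
from collections import defaultdict


def sum_per_minute_and_key(documents, interval_ms=60_000):
    """Sum `value` per (minute bucket, key) pair across all documents."""
    totals = defaultdict(int)
    for doc in documents:
        # Floor the timestamp to the start of its interval, like a histogram with offset 0.
        bucket_start = doc["timestamp"] - doc["timestamp"] % interval_ms
        for entry in doc["clientPorttopKByCount"]:
            totals[(bucket_start, entry["key"])] += entry["value"]
    return dict(totals)


# First document copied from the question; the second is hypothetical.
docs = [
    {"timestamp": 1574335260000,
     "clientPorttopKByCount": [{"value": 1, "key": "41770"}, {"value": 1, "key": "41791"}]},
    {"timestamp": 1574335320000,
     "clientPorttopKByCount": [{"value": 4, "key": "41770"}]},
]
totals = sum_per_minute_and_key(docs)
```

Each (minute, key) pair gets its own total, so key 41770 contributes separately to the two minutes rather than being summed across them.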
I have a query as follows:
{
"size": 0,
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"_type": "grx-ipx"
}
},
{
"range": {
"#timestamp": {
"gte": "2015-09-08T15:00:00.000Z",
"lte": "2015-09-08T15:10:00.000Z"
}
}
}
]
}
},
"filter": {
"and": [
{
"terms": {
"inSightCustID": [
"ASD001",
"ZXC049"
]
}
},
{
"terms": {
"reportFamily": [
"GRXoIPX",
"LTEoIPX"
]
}
}
]
}
}
},
"_source": [
"inSightCustID",
"fiveMinuteIn",
"reportFamily",
"#timestamp"
],
"aggs": {
"timestamp": {
"terms": {
"field": "#timestamp",
"size": 5
},
"aggs": {
"reportFamily": {
"terms": {
"field": "reportFamily"
},
"aggs": {
"averageFiveMinute": {
"avg": {
"field": "fiveMinuteIn"
}
}
}
}
}
},
"distinct_timestamps": {
"cardinality": {
"field": "#timestamp"
}
}
}
}
The result of this query looks like:
...
"aggregations": {
"distinct_timestamps": {
"value": 3,
"value_as_string": "1970-01-01T00:00:00.003Z"
},
"timestamp": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1441724700000,
"key_as_string": "2015-09-08T15:05:00.000Z",
"doc_count": 10,
"reportFamily": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "GRXoIPX",
"doc_count": 5,
"averageFiveMinute": {
"value": 1687.6
}
},
{
"key": "LTEoIPX",
"doc_count": 5,
"averageFiveMinute": {
"value": 56710.6
}
}
]
}
},
...
What I want is, for each reportFamily aggregation, to show the sum of the averageFiveMinute values of its buckets. For instance, in the example above, I would also like to show the sum of 1687.6 and 56710.6. I want to do this for all reportFamily aggregations.
Here is what I have tried:
{
"size": 0,
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"_type": "grx-ipx"
}
},
{
"range": {
"#timestamp": {
"gte": "2015-09-08T15:00:00.000Z",
"lte": "2015-09-08T15:10:00.000Z"
}
}
}
]
}
},
"filter": {
"and": [
{
"terms": {
"inSightCustID": [
"ASD001",
"ZXC049"
]
}
},
{
"terms": {
"reportFamily": [
"GRXoIPX",
"LTEoIPX"
]
}
}
]
}
}
},
"_source": [
"inSightCustID",
"fiveMinuteIn",
"reportFamily",
"#timestamp"
],
"aggs": {
"timestamp": {
"terms": {
"field": "#timestamp",
"size": 5
},
"aggs": {
"reportFamily": {
"terms": {
"field": "reportFamily"
},
"aggs": {
"averageFiveMinute": {
"avg": {
"field": "fiveMinuteIn"
}
}
}
},
"sum_AvgFiveMinute": {
"sum_bucket": {
"buckets_path": "reportFamily>averageFiveMinute"
}
}
}
},
"distinct_timestamps": {
"cardinality": {
"field": "#timestamp"
}
}
}
}
I have added:
"sum_AvgFiveMinute": {
"sum_bucket": {
"buckets_path": "reportFamily>averageFiveMinute"
}
}
Unfortunately, this triggers an exception: Parse Failure [Could not find aggregator type [sum_bucket] in [sum_AvgFiveMinute]]
I expected the results to be something like:
...
"aggregations": {
"distinct_timestamps": {
"value": 3,
"value_as_string": "1970-01-01T00:00:00.003Z"
},
"timestamp": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1441724700000,
"key_as_string": "2015-09-08T15:05:00.000Z",
"doc_count": 10,
"reportFamily": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "GRXoIPX",
"doc_count": 5,
"averageFiveMinute": {
"value": 1687.6
}
},
{
"key": "LTEoIPX",
"doc_count": 5,
"averageFiveMinute": {
"value": 56710.6
}
}
]
},
"sum_AvgFiveMinute": {
"value": 58398.2
}
},
...
What is wrong with this query and how can I achieve the expected result?
Here is a link to the sum bucket aggregation docs.
Many thanks for the help.
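If sum_bucket is not available on the cluster (an earlier answer in this document notes that pipeline aggregations were introduced in ES 2.0), the expected sum_AvgFiveMinute value can be added after the fact from the response that the first query already returns. The following Python sketch assumes the response shape shown above; add_sum_of_averages is a hypothetical helper name.

```python
def add_sum_of_averages(timestamp_buckets):
    """Attach sum_AvgFiveMinute to each timestamp bucket, computed from its reportFamily buckets."""
    for bucket in timestamp_buckets:
        total = sum(rf["averageFiveMinute"]["value"]
                    for rf in bucket["reportFamily"]["buckets"])
        bucket["sum_AvgFiveMinute"] = {"value": total}
    return timestamp_buckets


# Bucket values copied from the response in the question.
buckets = [{
    "key": 1441724700000,
    "doc_count": 10,
    "reportFamily": {"buckets": [
        {"key": "GRXoIPX", "doc_count": 5, "averageFiveMinute": {"value": 1687.6}},
        {"key": "LTEoIPX", "doc_count": 5, "averageFiveMinute": {"value": 56710.6}},
    ]},
}]
result = add_sum_of_averages(buckets)
```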
I want to calculate the difference of nested aggregations between two dates.
To be more concrete: given the request/response below, is it possible to calculate date_1.buckets.field_1.buckets.field_2.buckets.field_3.value - date_2.buckets.field_1.buckets.field_2.buckets.field_3.value? Is that possible with Elasticsearch v1.0.1?
The aggregation query request looks like this:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"terms": {
"date": [
"2014-08-18 00:00:00.0",
"2014-08-15 00:00:00.0"
]
}
}
]
}
}
}
},
"aggs": {
"date_1": {
"filter": {
"terms": {
"date": [
"2014-08-18 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_2": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
},
"date_2": {
"filter": {
"terms": {
"date": [
"2014-08-15 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_2": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
}
}
}
And the response looks like this:
{
"took": 236,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1646,
"max_score": 0,
"hits": []
},
"aggregations": {
"date_1": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 100
}
}
]
}
}
]
}
},
"date_2": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 80
}
}
]
}
}
]
}
}
}
}
Thank you.
With a newer Elasticsearch version (e.g. 5.6.9), it is possible:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"filter": [
{
"range": {
"date_created": {
"gte": "2018-06-16T00:00:00+02:00",
"lte": "2018-06-16T23:59:59+02:00"
}
}
}
]
}
}
}
},
"aggs": {
"by_millisec": {
"range" : {
"script" : {
"lang": "painless",
"source": "doc['date_delivered'][0] - doc['date_created'][0]"
},
"ranges" : [
{ "key": "<1sec", "to": 1000.0 },
{ "key": "1-5sec", "from": 1000.0, "to": 5000.0 },
{ "key": "5-30sec", "from": 5000.0, "to": 30000.0 },
{ "key": "30-60sec", "from": 30000.0, "to": 60000.0 },
{ "key": "1-2min", "from": 60000.0, "to": 120000.0 },
{ "key": "2-5min", "from": 120000.0, "to": 300000.0 },
{ "key": "5-10min", "from": 300000.0, "to": 600000.0 },
{ "key": ">10min", "from": 600000.0 }
]
}
}
}
}
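The bucketing done by the scripted range aggregation above can be mirrored in a few lines to see which label a given pair of timestamps would receive. This Python sketch assumes both dates are available as epoch milliseconds; delivery_bucket and the sample timestamps are made up for illustration, while the bounds and labels are taken from the query.

```python
from bisect import bisect_right

# Upper bounds (exclusive) and labels copied from the ranges in the query above.
BOUNDS_MS = [1000, 5000, 30000, 60000, 120000, 300000, 600000]
LABELS = ["<1sec", "1-5sec", "5-30sec", "30-60sec", "1-2min", "2-5min", "5-10min", ">10min"]


def delivery_bucket(created_ms, delivered_ms):
    """Return the range label for delivered - created, both in epoch milliseconds."""
    elapsed = delivered_ms - created_ms
    # "from" is inclusive and "to" is exclusive, so a value equal to a bound goes right.
    return LABELS[bisect_right(BOUNDS_MS, elapsed)]


label = delivery_bucket(1_529_100_000_000, 1_529_100_004_500)
```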
Elasticsearch's DSL does not allow arithmetic between the results of two aggregations, not even with scripts (at least up to version 1.1.1, as far as I know).
Such operations need to be handled on the client side after processing the aggregation results.
Reference: elasticsearch aggregation to sort by ratio of aggregations
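A client-side version of the difference could look like the following Python sketch. It assumes the response shape shown in the question (keys field_1, field_2, field_3), and the helper names sums_by_keys and difference are hypothetical; treating a key pair that is missing on one date as 0 is also an assumption.

```python
def sums_by_keys(date_agg):
    """Flatten one date_N aggregation into {(field_1 key, field_2 key): field_3 sum}."""
    flat = {}
    for b1 in date_agg["field_1"]["buckets"]:
        for b2 in b1["field_2"]["buckets"]:
            flat[(b1["key"], b2["key"])] = b2["field_3"]["value"]
    return flat


def difference(aggregations):
    """Compute date_1 - date_2 for every key pair; a side with no bucket counts as 0."""
    first = sums_by_keys(aggregations["date_1"])
    second = sums_by_keys(aggregations["date_2"])
    return {k: first.get(k, 0) - second.get(k, 0) for k in first.keys() | second.keys()}


# Aggregations section copied (without doc_count) from the response in the question.
aggregations = {
    "date_1": {"field_1": {"buckets": [{"key": "field_1_key_1", "field_2": {"buckets": [
        {"key": "key_1", "field_3": {"value": 100}}]}}]}},
    "date_2": {"field_1": {"buckets": [{"key": "field_1_key_1", "field_2": {"buckets": [
        {"key": "key_1", "field_3": {"value": 80}}]}}]}},
}
diffs = difference(aggregations)
```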
In 1.0.1 I couldn't find anything but in 1.4.2 you could try scripted_metric aggregation (still experimental).
Here is the scripted_metric documentation page.
I am not good with the Elasticsearch syntax, but I think your metric inputs would be:
init_script - just initialize an accumulator for each date:
"init_script": "_agg.d1Val = 0; _agg.d2Val = 0;"
map_script- test the date of the document and add to the right accumulator:
"map_script": "if (doc.date == firstDate) { _agg.d1Val += doc.field_3; } else { _agg.d2Val += doc.field_3; }"
reduce_script - accumulate intermediate data from various shards and return the final results:
"reduce_script": "totalD1 = 0; totalD2 = 0; for (agg in _aggs) { totalD1 += agg.d1Val ; totalD2 += agg.d2Val ;}; return totalD1 - totalD2"
I don't think that in this case you need a combine_script.
Of course, if you can't use 1.4.2 then this is no help :-)
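The flow of the three scripts can be walked through in plain Python to see how per-shard accumulators end up as a single difference. This is only an emulation of the init / map / reduce phases with accumulation into both accumulators, not Elasticsearch script syntax; the shard contents and function names are invented for illustration.

```python
def init_state():
    # Mirrors init_script: one accumulator per date.
    return {"d1Val": 0, "d2Val": 0}


def map_doc(state, doc, first_date):
    # Mirrors map_script: route field_3 to the accumulator for the document's date.
    if doc["date"] == first_date:
        state["d1Val"] += doc["field_3"]
    else:
        state["d2Val"] += doc["field_3"]


def reduce_states(states):
    # Mirrors reduce_script: combine the per-shard accumulators and take the difference.
    total_d1 = sum(s["d1Val"] for s in states)
    total_d2 = sum(s["d2Val"] for s in states)
    return total_d1 - total_d2


# Two hypothetical shards holding hypothetical documents.
shards = [
    [{"date": "2014-08-18", "field_3": 60}, {"date": "2014-08-15", "field_3": 50}],
    [{"date": "2014-08-18", "field_3": 40}, {"date": "2014-08-15", "field_3": 30}],
]
states = []
for shard_docs in shards:
    state = init_state()
    for doc in shard_docs:
        map_doc(state, doc, first_date="2014-08-18")
    states.append(state)
difference = reduce_states(states)
```

With these sample shards the first date totals 100 and the second 80, so the reduced result is 20.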