Filter hits from aggregation buckets - elasticsearch

I want to restrict the hits so that only documents belonging to my aggregation buckets are returned.
{
"from": 0,
"aggs": {
"id.raw": {
"terms": {
"field": "id.raw",
"size": 0
},
"aggs": {
"id_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": {
"inline": "count == 1"
}
}
}
}
}
}
}
Result aggregation:
"id.raw": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1726200CFABY",
"doc_count": 1
}
]
}
But I get 66 hits. I want only 1 hit: the single document for key 1726200CFABY.
"hits": {
"total": 66,
"max_score": 1,
"hits": [
{
How can I get back only the rows for the ids that match my aggregation buckets?
EDIT: Following Val's comment, I tried:
{
"size": 0,
"aggs": {
"id.raw": {
"terms": {
"field": "id.raw",
"size": 0
},
"aggs": {
"top_hits": {
"top_hits": {
"size": 1
}
},
"id_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": {
"inline": "count == 1"
}
}
}
}
}
}
}
I think I'm good now.
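With "size": 0 the top-level hits array stays empty, so the matching documents have to be read from the top_hits sub-aggregation of each bucket that survives the bucket_selector. A minimal Python sketch of that extraction follows; the response dictionary is a hypothetical, trimmed example shaped like the query above, and the function name is made up for illustration.

```python
def hits_from_buckets(response, agg_name="id.raw", top_hits_name="top_hits"):
    """Collect the _source of every document returned by the top_hits
    sub-aggregation of each bucket in the terms aggregation."""
    docs = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        for hit in bucket[top_hits_name]["hits"]["hits"]:
            docs.append(hit["_source"])
    return docs


# Hypothetical, trimmed response: one bucket survived the bucket_selector.
sample_response = {
    "hits": {"total": 66, "hits": []},
    "aggregations": {
        "id.raw": {
            "buckets": [
                {
                    "key": "1726200CFABY",
                    "doc_count": 1,
                    "top_hits": {
                        "hits": {
                            "total": 1,
                            "hits": [
                                {"_id": "hypothetical-id",
                                 "_source": {"id": "1726200CFABY"}}
                            ],
                        }
                    },
                }
            ]
        }
    },
}

print(hits_from_buckets(sample_response))
```

The total under hits still reports all 66 matching documents; only the buckets carry the filtered rows.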

Related

why aggregation script is not working in elasticsearch?

I have a problem in Elasticsearch.
I want to divide one aggregated value by another.
This query works:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"sumPageview": {
"sum": {
"field": "pageview",
"missing": 0
}
},
"sumVisit": {
"sum": {
"field": "visit",
"missing": 0
}
}
}
}
But this query does not work:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"sumPageview": {
"sum": {
"field": "pageview",
"missing": 0
}
},
"sumVisit": {
"sum": {
"field": "visit",
"missing": 0
}
},
"totalPageviewPerVisit": {
"bucket_script": {
"buckets_path": {
"sumPageview": "sumPageview",
"sumVisit": "sumVisit"
},
"script": {
"source": "params.sumPageview / params.sumVisit",
"lang": "painless"
},
"gap_policy": "skip"
}
}
}
}
I think the reason is that the sum values are not inside a bucket.
Is that right? Please help.
A sum aggregation is a single-value metrics aggregation that sums
up numeric values that are extracted from the aggregated documents.
A bucket script aggregation is a parent pipeline aggregation that
executes a script that can perform per-bucket computations on
specified metrics in the parent multi-bucket aggregation.
Because a sum aggregation does not create any buckets, you cannot apply a bucket script aggregation directly to it. The bucket script has to be nested inside a multi-bucket aggregation (such as terms), next to the metrics it reads.
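If only a single overall ratio is needed, one option is to keep the two top-level sum aggregations from the working query and do the division on the client. A minimal Python sketch, assuming a response that contains sumPageview and sumVisit as in that query; the sample values and the function name are hypothetical.

```python
def pageviews_per_visit(response):
    """Divide the two top-level sums, returning None when there are no visits."""
    aggs = response["aggregations"]
    total_pageviews = aggs["sumPageview"]["value"]
    total_visits = aggs["sumVisit"]["value"]
    if not total_visits:
        # Avoid a ZeroDivisionError when the visit sum is 0 or missing.
        return None
    return total_pageviews / total_visits


# Hypothetical aggregation results.
sample_response = {
    "aggregations": {
        "sumPageview": {"value": 6.0},
        "sumVisit": {"value": 9.0},
    }
}

print(pageviews_per_visit(sample_response))
```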
Here is a working example with index data, search query, and search result:
Index Data:
{
"user_id":1,
"pageview": 1,
"visit": 2
}
{
"user_id":2,
"pageview": 2,
"visit": 3
}
{
"user_id":3,
"pageview": 3,
"visit": 4
}
Search Query:
{
"size": 0,
"aggs": {
"all": {
"terms": {
"field": "user_id"
},
"aggs": {
"sum_1": {
"sum": {
"field": "pageview"
}
},
"sum_2": {
"sum": {
"field": "visit"
}
},
"division": {
"bucket_script": {
"buckets_path": {
"my_var1": "sum_1",
"my_var2": "sum_2"
},
"script": "params.my_var1 / params.my_var2"
}
}
}
}
}
}
Search Result:
"aggregations": {
"all": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"sum_2": {
"value": 2.0
},
"sum_1": {
"value": 1.0
},
"division": {
"value": 0.5
}
},
{
"key": 2,
"doc_count": 1,
"sum_2": {
"value": 3.0
},
"sum_1": {
"value": 2.0
},
"division": {
"value": 0.6666666666666666
}
},
{
"key": 3,
"doc_count": 1,
"sum_2": {
"value": 4.0
},
"sum_1": {
"value": 3.0
},
"division": {
"value": 0.75
}
}
]
}
}

Is there a way to compare string alphabetically in painless

I would like to execute this kind of operation in painless :
if (_value >= 'c') {
return _value
} else {
return '__BAD__'
}
The value is a string, and I would like the following behaviour:
if the value is foo, I want to replace it with __BAD__; if the value is bar, I want to keep bar. Only values alphabetically after 'c' should be set to __BAD__.
I got this exception:
"lang": "painless",
"caused_by": {
"type": "class_cast_exception",
"reason": "Cannot apply [>] operation to types [java.lang.String] and [java.lang.String]."
}
Is there a way to perform an alphabetical comparison between strings in Painless?
My documents look like this:
{
"id": "doca",
"categoryId": "aaa",
"parentNames": "a$aa$aaa"
},
{
"id": "docb",
"categoryId": "bbb",
"parentNames": "a$aa$bbb"
},
{
"id": "docz",
"categoryId": "zzz",
"parentNames": "a$aa$zzz"
}
and my query looks like this:
{
"query": {
"bool": {
"filter": []
}
},
"size": 0,
"aggs": {
"catNames": {
"terms": {
"size": 10000,
"order": {
"_key": "asc"
},
"script": {
"source": "if(doc['parentNames'].value < 'a$aa$ccc') {return doc['parentNames'].value} return '__BAD__'",
"lang": "painless"
}
},
"aggs": {
"sort": {
"bucket_sort": {
"size": 2
}
},
"catId": {
"terms": {
"field": "categoryId",
"size": 1
}
}
}
}
}
}
I am expecting this result:
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"catNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "__BAD__",
"doc_count": 1,
"catId": {
"buckets": [
{
"key": "aaa",
"doc_count": 1
}
]
}
},
{
"key": "a$aa$bbb",
"doc_count": 1,
"catId": {
"buckets": [
{
"key": "bbb",
"doc_count": 1
}
]
}
},
{
"key": "a$aa$zzz",
"doc_count": 1,
"catId": {
"buckets": [
{
"key": "zzz",
"doc_count": 1
}
]
}
}
]
}
}
}
In fact, I can use the compareTo method of java.lang.String. It returns a negative, zero, or positive int, so the result has to be compared with 0:
if (_value.compareTo('c') > 0) {
return _value
} else {
return '__BAD__'
}
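To see which values the snippet above keeps, the sign convention of compareTo can be emulated in a few lines of Python. This is only a sketch: compare_to and label are hypothetical helpers, and the emulation assumes characters in the Basic Multilingual Plane, where Python code points and Java UTF-16 code units coincide.

```python
def compare_to(a, b):
    """Emulate the sign of java.lang.String.compareTo: difference of the
    first mismatching characters, else difference of the lengths."""
    for x, y in zip(a, b):
        if x != y:
            return ord(x) - ord(y)
    return len(a) - len(b)


def label(value, pivot="c"):
    """Mirror the snippet above: keep values that sort after the pivot."""
    if compare_to(value, pivot) > 0:
        return value
    return "__BAD__"


for candidate in ["foo", "bar", "c"]:
    print(candidate, compare_to(candidate, "c"), label(candidate))
```

Flipping the test to compare_to(value, pivot) < 0 keeps the values that sort before the pivot instead.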
My query becomes:
{
"query": {
"bool": {
"filter": []
}
},
"size": 0,
"aggs": {
"catNames": {
"terms": {
"size": 10000,
"order": {
"_key": "asc"
},
"script": {
"source": "if(doc['parentNames'].value.compareTo('a$aa$ccc') < 0) {return doc['parentNames'].value} return '__BAD__'",
"lang": "painless"
}
},
"aggs": {
"sort": {
"bucket_sort": {
"size": 2
}
},
"catId": {
"terms": {
"field": "categoryId",
"size": 1
}
}
}
}
}
}

Elasticsearch sort on formula with values from nested aggregations

I would like to sort on a formula that takes a value from a nested aggregation and combines it with the parent aggregation's document count.
I want to sort by this formula result:
national_averages_9_10.avg * national_averages_9_10.count / key.doc_count
where key.doc_count = root bucket document count
More specifically, for the first result bucket, the formula is:
9.543799991607665 * 100 / 194 = 4.919484
I have the following search:
GET student-grade/_search
{
"size": 0,
"aggs": {
"schools": {
"terms": {
"field": "SCHOOL_NAME.keyword",
"size": 2,
"shard_size": 250,
"min_doc_count": 20
},
"aggs": {
"national_averages_9_10": {
"filter": {
"range": {
"STUDENT_NATIONAL_AVERAGE_GRADE": {
"gte": 9,
"lte": 10
}
}
},
"aggs": {
"range_stats": {
"stats": {
"field": "STUDENT_NATIONAL_AVERAGE_GRADE"
}
}
}
}
}
}
}
}
This produces the following sample output:
"aggregations": {
"schools": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 10790,
"buckets": [
{
"key": "Școala Gimnazială nr. 195",
"doc_count": 194,
"national_averages_9_10": {
"doc_count": 100,
"range_stats": {
"count": 100,
"min": 9.020000457763672,
"max": 10,
"avg": 9.543799991607665,
"sum": 954.3799991607666
}
}
},
{
"key": "Școala Gimnazială nr. 56",
"doc_count": 178,
"national_averages_9_10": {
"doc_count": 110,
"range_stats": {
"count": 110,
"min": 9,
"max": 10,
"avg": 9.566909139806574,
"sum": 1052.3600053787231
}
}
}
]
}
}
{
"size": 0,
"aggs": {
"schools": {
"terms": {
"field": "SCHOOL_NAME.keyword"
},
"aggs": {
"national_averages_9_10": {
"filter": {
"range": {
"STUDENT_NATIONAL_AVERAGE_GRADE": {
"gte": 9,
"lte": 10
}
}
},
"aggs": {
"range_stats": {
"stats": {
"field": "STUDENT_NATIONAL_AVERAGE_GRADE"
}
}
}
},
"my_formula": {
"bucket_script": {
"buckets_path": {
"school_total_count": "_count",
"average_9_10_total_count": "national_averages_9_10._count",
"average": "national_averages_9_10>range_stats.avg"
},
"script": "params.average * params.average_9_10_total_count / params.school_total_count"
}
}
}
}
}
}
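As a sanity check on the formula, the same computation can be run in Python over the two sample buckets from the question. This is a client-side sketch only; the bucket dictionaries are trimmed copies of the sample output and the function name is hypothetical.

```python
def formula(bucket):
    """avg of the filtered grades * filtered doc count / school doc count."""
    filtered = bucket["national_averages_9_10"]
    return filtered["range_stats"]["avg"] * filtered["doc_count"] / bucket["doc_count"]


# Trimmed copies of the two buckets shown in the sample output.
buckets = [
    {
        "key": "Școala Gimnazială nr. 195",
        "doc_count": 194,
        "national_averages_9_10": {
            "doc_count": 100,
            "range_stats": {"avg": 9.543799991607665},
        },
    },
    {
        "key": "Școala Gimnazială nr. 56",
        "doc_count": 178,
        "national_averages_9_10": {
            "doc_count": 110,
            "range_stats": {"avg": 9.566909139806574},
        },
    },
]

# Sort descending by the formula result, as the question asks for.
ranked = sorted(buckets, key=formula, reverse=True)
for bucket in ranked:
    print(bucket["key"], round(formula(bucket), 6))
```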

ElasticSearch - calculate ratio between aggregation buckets

I'm using the following terms aggregations to get views and clicks of each campaign ( by campaign_id ) :
"aggregations": {
"campaigns": {
"terms": {
"field": "campaign_id",
"size": 10,
"order": {
"_term": "asc"
}
},
"aggregations": {
"actions": {
"terms": {
"field": "action",
"size": 10
}
}
}
}}
This is the response I get:
"aggregations": {
"campaigns": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "someId",
"doc_count": 12,
"actions": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "click",
"doc_count": 3
},
{
"key": "view",
"doc_count": 9
}
]
}
}
]
}
}
EDIT:
Here is an example of a document ( only the relevant parts of it..):
{
"_index": "action",
"_type": "click",
"_id": "AVI2XOTl8otXlszOjypT",
"_score": 1,
"_source": {
"ip": "127.0.0.1",
"timestamp": "2016-01-12T15:03:23.622743524Z",
"action": "click",
"campaign_id": "IypmiroC"
}}
I need to be able to retrieve the conversion rate of each campaign ( clicks / views ) , and I can't do it on the client side since I need to be able to sort by conversion rate.
Any help would be much appreciated.
This requires several aggregations and ES 2.x. First, I get all unique campaign_id values with a terms aggregation. Then I filter by action and count the documents with that particular action. Finally, you need a pipeline aggregation (introduced in ES 2.0), specifically the bucket script aggregation, to compute the ratio. This is how it looks:
{
"size": 0,
"aggs": {
"unique_campaign": {
"terms": {
"field": "campaign_id",
"size": 10
},
"aggs": {
"click_bucket": {
"filter": {
"term": {
"action": "click"
}
},
"aggs": {
"click_count": {
"value_count": {
"field": "action"
}
}
}
},
"view_bucket": {
"filter": {
"term": {
"action": "view"
}
},
"aggs": {
"view_count": {
"value_count": {
"field": "action"
}
}
}
},
"conversion_ratio": {
"bucket_script": {
"buckets_path": {
"total_clicks": "click_bucket>click_count",
"total_views": "view_bucket>view_count"
},
"script": "total_clicks/total_views"
}
}
}
}
}
}
Also, you need a not_analyzed mapping for action, as Click won't match click.
Hope this helps!!
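To check the value that conversion_ratio should produce, the same ratio can be computed by hand from the nested terms response in the question. A small Python sketch, assuming the response shape shown there; the function name is hypothetical and this is not a replacement for server-side sorting.

```python
def conversion_rates(response):
    """Return clicks / views per campaign, or None when a campaign has no views."""
    rates = {}
    for campaign in response["aggregations"]["campaigns"]["buckets"]:
        counts = {a["key"]: a["doc_count"] for a in campaign["actions"]["buckets"]}
        views = counts.get("view", 0)
        clicks = counts.get("click", 0)
        rates[campaign["key"]] = clicks / views if views else None
    return rates


# Trimmed copy of the response from the question.
sample_response = {
    "aggregations": {
        "campaigns": {
            "buckets": [
                {
                    "key": "someId",
                    "doc_count": 12,
                    "actions": {
                        "buckets": [
                            {"key": "click", "doc_count": 3},
                            {"key": "view", "doc_count": 9},
                        ]
                    },
                }
            ]
        }
    }
}

print(conversion_rates(sample_response))
```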
As of 7.x, sorting can be achieved by combining bucket_script with bucket_sort. Here is a demo for reference:
{
"size": 0,
"aggs": {
"mallBucket": {
"terms": {
"field": "mallId",
"size": 20,
"min_doc_count": 3,
"shard_size": 10000
},
"aggs": {
"totalOrderCount": {
"value_count": {
"field": "orderSn"
}
},
"filteredCoupon": {
"filter": {
"terms": {
"tags": [
"hello",
"cool"
]
}
},
"aggs": {
"couponCount": {
"value_count": {
"field": "orderSn"
}
}
}
},
"countRatio": {
"bucket_script": {
"buckets_path": {
"orderCount": "totalOrderCount",
"couponCount": "filteredCoupon>couponCount"
},
"script": "params.couponCount/params.orderCount"
}
},
"ratio_bucket_sort": {
"bucket_sort": {
"sort": [
{
"countRatio": {
"order": "desc"
}
}
],
"size": 20
}
}
}
}
}
}

How to calculate difference between metrics in different aggregations in elasticsearch

I want to calculate the difference of nested aggregations between two dates.
To be more concrete, is it possible to calculate date_1.buckets.field_1.buckets.field_2.buckets.field_3.value - date_2.buckets.field_1.buckets.field_2.buckets.field_3.value given the request/response below? Is that possible with Elasticsearch v1.0.1?
The aggregation query request looks like this:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"terms": {
"date": [
"2014-08-18 00:00:00.0",
"2014-08-15 00:00:00.0"
]
}
}
]
}
}
}
},
"aggs": {
"date_1": {
"filter": {
"terms": {
"date": [
"2014-08-18 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_2": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
},
"date_2": {
"filter": {
"terms": {
"date": [
"2014-08-15 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_2": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
}
}
}
And the response looks like this:
{
"took": 236,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1646,
"max_score": 0,
"hits": []
},
"aggregations": {
"date_1": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 100
}
}
]
}
}
]
}
},
"date_2": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 80
}
}
]
}
}
]
}
}
}
}
Thank you.
With a newer Elasticsearch version (e.g. 5.6.9) this is possible:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"filter": [
{
"range": {
"date_created": {
"gte": "2018-06-16T00:00:00+02:00",
"lte": "2018-06-16T23:59:59+02:00"
}
}
}
]
}
}
}
},
"aggs": {
"by_millisec": {
"range" : {
"script" : {
"lang": "painless",
"source": "doc['date_delivered'][0] - doc['date_created'][0]"
},
"ranges" : [
{ "key": "<1sec", "to": 1000.0 },
{ "key": "1-5sec", "from": 1000.0, "to": 5000.0 },
{ "key": "5-30sec", "from": 5000.0, "to": 30000.0 },
{ "key": "30-60sec", "from": 30000.0, "to": 60000.0 },
{ "key": "1-2min", "from": 60000.0, "to": 120000.0 },
{ "key": "2-5min", "from": 120000.0, "to": 300000.0 },
{ "key": "5-10min", "from": 300000.0, "to": 600000.0 },
{ "key": ">10min", "from": 600000.0 }
]
}
}
}
}
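In a range aggregation each range includes its from value and excludes its to value, so a difference of exactly 1000 ms lands in 1-5sec rather than <1sec. The bucketing of the query above can be sketched in Python as follows; RANGES copies the keys and bounds from the query, and bucket_for is a hypothetical helper.

```python
# (key, from, to) triples copied from the query; None means unbounded.
RANGES = [
    ("<1sec", None, 1000.0),
    ("1-5sec", 1000.0, 5000.0),
    ("5-30sec", 5000.0, 30000.0),
    ("30-60sec", 30000.0, 60000.0),
    ("1-2min", 60000.0, 120000.0),
    ("2-5min", 120000.0, 300000.0),
    ("5-10min", 300000.0, 600000.0),
    (">10min", 600000.0, None),
]


def bucket_for(delta_ms):
    """Return the key of the range containing delta_ms (from inclusive, to exclusive)."""
    for key, lower, upper in RANGES:
        above = lower is None or delta_ms >= lower
        below = upper is None or delta_ms < upper
        if above and below:
            return key
    return None


for delta in (999, 1000, 45000, 600000):
    print(delta, bucket_for(delta))
```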
Arithmetic operations between the results of two aggregations are not allowed in the Elasticsearch DSL, not even using scripts (up to version 1.1.1, as far as I know).
Such operations need to be handled on the client side after processing the aggs result.
Reference
elasticsearch aggregation to sort by ratio of aggregations
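A client-side version of the subtraction could look like the Python sketch below. It assumes the response shape shown in the question (date_1 and date_2, each containing field_1 and field_2 buckets with a field_3 value); the helper names are hypothetical, and a key pair missing on one date is treated as 0.

```python
def sums_by_keys(date_agg):
    """Flatten one date aggregation into {(field_1 key, field_2 key): field_3 sum}."""
    result = {}
    for outer in date_agg["field_1"]["buckets"]:
        for inner in outer["field_2"]["buckets"]:
            result[(outer["key"], inner["key"])] = inner["field_3"]["value"]
    return result


def differences(response):
    """Compute date_1 minus date_2 for every key pair seen on either date."""
    aggs = response["aggregations"]
    first = sums_by_keys(aggs["date_1"])
    second = sums_by_keys(aggs["date_2"])
    keys = sorted(set(first) | set(second))
    return {key: first.get(key, 0) - second.get(key, 0) for key in keys}


def one_date(value):
    # Trimmed copy of one date section of the response in the question.
    return {
        "field_1": {"buckets": [{
            "key": "field_1_key_1",
            "field_2": {"buckets": [{"key": "key_1", "field_3": {"value": value}}]},
        }]}
    }


sample_response = {"aggregations": {"date_1": one_date(100), "date_2": one_date(80)}}
print(differences(sample_response))
```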
In 1.0.1 I couldn't find anything, but in 1.4.2 you could try the scripted_metric aggregation (still experimental).
See the scripted_metric documentation page.
I am not good with the Elasticsearch syntax, but I think your metric inputs would be:
init_script - just initialize an accumulator for each date:
"init_script": "_agg.d1Val = 0; _agg.d2Val = 0;"
map_script - test the date of the document and add to the right accumulator:
"map_script": "if (doc.date == firstDate) { _agg.d1Val += doc.field_3; } else { _agg.d2Val += doc.field_3;};",
reduce_script - accumulate intermediate data from various shards and return the final results:
"reduce_script": "totalD1 = 0; totalD2 = 0; for (agg in _aggs) { totalD1 += agg.d1Val ; totalD2 += agg.d2Val ;}; return totalD1 - totalD2"
I don't think you need a combine_script in this case.
Of course, if you can't use 1.4.2 then this is no help :-)
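The init/map/reduce flow described above can be simulated in plain Python to check the logic before writing the actual scripts. This is only a sketch of the data flow, not Elasticsearch code: each inner list stands for the documents on one shard, and the dates and values are hypothetical.

```python
def init_state():
    # init_script: one accumulator per date, created once per shard.
    return {"d1Val": 0, "d2Val": 0}


def map_doc(state, doc, first_date):
    # map_script: add field_3 to the accumulator matching the document's date.
    if doc["date"] == first_date:
        state["d1Val"] += doc["field_3"]
    else:
        state["d2Val"] += doc["field_3"]


def reduce_states(states):
    # reduce_script: merge the per-shard accumulators and return the difference.
    total_d1 = sum(state["d1Val"] for state in states)
    total_d2 = sum(state["d2Val"] for state in states)
    return total_d1 - total_d2


def run(shards, first_date):
    states = []
    for docs in shards:
        state = init_state()
        for doc in docs:
            map_doc(state, doc, first_date)
        states.append(state)
    return reduce_states(states)


# Hypothetical documents spread over two shards.
shards = [
    [{"date": "2014-08-18", "field_3": 60}, {"date": "2014-08-15", "field_3": 50}],
    [{"date": "2014-08-18", "field_3": 40}, {"date": "2014-08-15", "field_3": 30}],
]
print(run(shards, "2014-08-18"))
```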
