When aggregating by the field userGuid
{
"_source": false,
"aggregations": {
"range_userGuid": {
"terms": {
"field": "userGuid"
}
}
}
}
I get the result
"aggregations" : {
"range_userGuid" : {
"doc_count_error_upper_bound" : 151,
"sum_other_doc_count" : 2424145,
"buckets" : [
{
"key" : 803100110976,
"doc_count" : 1
},
{
"key" : 813110447915,
"doc_count" : 10
},
{
"key" : 803100110306,
"doc_count" : 101
},
{
"key" : 2123312,
"doc_count" : 300
},
{
"key" : 3452342,
"doc_count" : 9999
}
]
}
}
Now I want to derive ranges from the aggs result, for example (0-100], (100-1000], >1000, and get the count of users in each range. The expected result:
[
{
"from": 0,
"to": 100,
"count": 2 <---- 2 users, 803100110976 and 813110447915
},
{
"from": 100,
"to": 1000,
"count": 2 <---- 803100110306 and 2123312
},
{
"from": 1001,
"count": 1 <---- 3452342
}
]
The bucket size of the aggs is about 150,000. How do I write such a query?
You can use the range aggregation to achieve what you expect:
POST /test/_search
{
"size": 0,
"aggs": {
"range_userGuid": {
"range": {
"field": "userGuid",
"ranges": [
{
"from": 0,
"to": 100
},
{
"from": 100,
"to": 200
},
{
"from": 200,
"to": 1000
},
{
"from": 1000
}
]
}
}
}
}
UPDATE: Adapting this answer to your need:
POST index/_search
{
"size": 0,
"aggs": {
"users_0_100": {
"terms": {
"field": "userGuid",
"size": 1000
},
"aggs": {
"0_100": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount < 100"
}
}
}
},
"users_100_200": {
"terms": {
"field": "userGuid",
"size": 1000
},
"aggs": {
"100_200": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount >= 100 && params.docCount < 200"
}
}
}
},
"users_200_1000": {
"terms": {
"field": "userGuid",
"size": 1000
},
"aggs": {
"200_1000": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount >= 200 && params.docCount < 1000"
}
}
}
},
"users_1000": {
"terms": {
"field": "userGuid",
"size": 1000
},
"aggs": {
"1000": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount >= 1000"
}
}
}
}
}
}
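If the server-side options don't scale well to ~150,000 buckets, the doc-count binning can also be done client-side over the terms buckets. A minimal Python sketch over the sample response above (names are illustrative; the half-open (lo, hi] convention follows the question):

```python
# Bin terms-aggregation buckets into doc_count ranges client-side.
# `buckets` mirrors the sample terms response from the question.

def count_users_by_range(buckets, ranges):
    """Count how many terms buckets fall into each (lo, hi] doc_count range."""
    result = [dict(r, count=0) for r in ranges]
    for bucket in buckets:
        doc_count = bucket["doc_count"]
        for r in result:
            lo = r.get("from", float("-inf"))
            hi = r.get("to", float("inf"))
            if lo < doc_count <= hi:  # half-open (lo, hi] as in the question
                r["count"] += 1
                break
    return result

buckets = [
    {"key": 803100110976, "doc_count": 1},
    {"key": 813110447915, "doc_count": 10},
    {"key": 803100110306, "doc_count": 101},
    {"key": 2123312, "doc_count": 300},
    {"key": 3452342, "doc_count": 9999},
]
ranges = [{"from": 0, "to": 100}, {"from": 100, "to": 1000}, {"from": 1000}]
print(count_users_by_range(buckets, ranges))
```

This reproduces the expected counts from the question (2, 2, 1) for the sample buckets.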
I have an index which is as follows:
{
"_index" : "r2332",
"_type" : "_doc",
"_id" : "Vl81o3oBs8vUbHSMCZVQ",
"_score" : 1.0,
"_source" : {
"maid" : "d8ee3c5e-babb-4777-9cba-17fb0cd8e8a9",
"date" : "2021-06-09",
"hour" : 5,
"site_id" : 1035
}
},
{
"_index" : "r2332",
"_type" : "_doc",
"_id" : "Xl81o3oBs8vUbHSMCZVQ",
"_score" : 1.0,
"_source" : {
"maid" : "d8ee3c5e-babb-4777-9cba-17fb0cd8e8a9",
"date" : "2021-06-09",
"hour" : 5,
"site_id" : 1897
}
}
I am trying to get the unique count across maid and date combined. I am able to aggregate on the single field maid, but not on both. Here are the queries I tried.
Trial 1:
{
"size": 0,
"query": {
"bool": {
"filter": [{
"terms": {
"site_id": [7560, 7566]
}
}, {
"range": {
"date": {
"gte": "2021-09-01",
"lte": "2021-09-15"
}
}
}]
}
},
"runtime_mappings": {
"type_and_promoted": {
"type": "keyword",
"script": "emit(doc['maid'].value + ' ' + doc['date'].value)"
}
},
"aggs": {
"group_by": {
"terms": {
"field": "site_id",
"size": 100
},
"aggs": {
"bydate": {
"terms": {
"field": "date",
"size": 100
},
"aggs": {
"byhour": {
"terms": {
"field": "hour",
"size": 24
},
"aggs": {
"reverse_nested": {},
"uv": {
"cardinality": {
"field": "runtime_mappings"
}
}
}
}
}
}
}
}
}
}
This is giving an empty output.
Trial 2:
{
"size": 0,
"query": {
"bool": {
"filter": [{
"terms": {
"site_id": [7560, 7566]
}
}, {
"range": {
"date": {
"gte": "2021-09-01",
"lte": "2021-09-15"
}
}
}]
}
},
"aggs": {
"group_by": {
"terms": {
"field": "site_id",
"size": 100
},
"aggs": {
"bydate": {
"terms": {
"field": "date",
"size": 100
},
"aggs": {
"byhour": {
"terms": {
"field": "hour",
"size": 24
},
"aggs": {
"uv": {
"cardinality": {
"script": "doc['maid'].value + '#' +doc'date'].value"
}
}
}
}
}
}
}
}
}
}
This gives me a syntax error at doc['maid'].value. How do I effectively combine two fields for cardinality? I am using Elasticsearch 7.13.2.
The mapping of my index is as follows:
"r2332" : {
"mappings" : {
"dynamic" : "false",
"properties" : {
"date" : {
"type" : "date"
},
"hour" : {
"type" : "integer"
},
"maid" : {
"type" : "keyword"
},
"reach" : {
"type" : "integer"
},
"site_id" : {
"type" : "keyword"
}
}
}
}
}
Modify your second query as shown below. You need to use maid.keyword instead of the maid field in the cardinality aggregation to avoid the search_phase_execution_exception.
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"terms": {
"site_id": [
7560,
7566
]
}
},
{
"range": {
"date": {
"gte": "2021-09-01",
"lte": "2021-09-15"
}
}
}
]
}
},
"aggs": {
"group_by": {
"terms": {
"field": "site_id",
"size": 100
},
"aggs": {
"bydate": {
"terms": {
"field": "date",
"size": 100
},
"aggs": {
"byhour": {
"terms": {
"field": "hour",
"size": 24
},
"aggs": {
"uv": {
"cardinality": {
"script": "doc['maid.keyword'].value + '#' +doc['date'].value"
}
}
}
}
}
}
}
}
}
}
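The maid#date concatenation the script builds is equivalent to counting unique field pairs, which can be checked client-side against raw hits when validating the cardinality result. A small sketch with made-up hit data:

```python
# Reproduce the script's maid + '#' + date combination over raw hits
# and count the distinct pairs (sample data, not from a real index).
hits = [
    {"maid": "d8ee3c5e-babb-4777-9cba-17fb0cd8e8a9", "date": "2021-06-09"},
    {"maid": "d8ee3c5e-babb-4777-9cba-17fb0cd8e8a9", "date": "2021-06-09"},
    {"maid": "d8ee3c5e-babb-4777-9cba-17fb0cd8e8a9", "date": "2021-06-10"},
]

unique_pairs = {f"{h['maid']}#{h['date']}" for h in hits}
print(len(unique_pairs))  # 2
```

Note that the server-side cardinality aggregation is approximate for high cardinalities, so an exact client-side count like this may differ slightly on large data sets.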
I have a collection of documents describing the scores of users. The same user will have multiple scores.
My data is structured like so:
[
{ "user_id" : 3, "score" : 10 },
{ "user_id" : 1, "score" : 20 },
{ "user_id" : 2, "score" : 60 },
{ "user_id" : 1, "score" : 10 },
...
]
I am trying to determine each user's max score. The elastic search query that I am using looks like this:
{
"size": 0,
"aggs": {
"users": {
"terms": {
"field": "user_id",
"size": 9999
},
"aggs": {
"max_score": {
"max": {
"field": "score"
}
}
}
}
}
}
The response looks like this:
"aggregations": {
"users": {
"buckets": [
{
"key": "1",
"doc_count": 10,
"max_score": {
"value": 10
}
},
{
"key": "2",
"doc_count": 10,
"max_score": {
"value": 20
}
},
...
]
}
}
}
How can I find the number of buckets where max_score > 20, max_score > 50, and max_score > 100?
Is there any way to make the response look like below?
"aggregations": {
"users": {
"buckets": [
{
"key": "1",
"doc_count": 10,
"max_score": {
"value": 10
}
},
...
],
"scoresGreaterThan20": {
"value": 10
},
"scoresGreaterThan50": {
"value": 5
},
"scoresGreaterThan100": {
"value": 2
},
}
}
}
You can achieve your use case by repeating the same terms and max aggregations, along with a bucket selector aggregation, once per condition you need. Adding a working example:
Index Data:
{ "user_id" : 3, "score" : 10 }
{ "user_id" : 1, "score" : 20 }
{ "user_id" : 2, "score" : 60 }
{ "user_id" : 1, "score" : 10 }
Search Query:
You can use a stats bucket aggregation to get the count of buckets remaining after the bucket selector aggregation.
{
"size": 0,
"aggs": {
"user_gt20": {
"terms": {
"field": "user_id",
"size": 9999
},
"aggs": {
"max_score": {
"max": {
"field": "score"
}
},
"scoresGreaterThan20": {
"bucket_selector": {
"buckets_path": {
"values": "max_score"
},
"script": "params.values > 20"
}
}
}
},
"user_gt20_count": {
"stats_bucket": {
"buckets_path": "user_gt20._count"
}
},
"user_gt50": {
"terms": {
"field": "user_id",
"size": 9999
},
"aggs": {
"max_score": {
"max": {
"field": "score"
}
},
"scoresGreaterThan50": {
"bucket_selector": {
"buckets_path": {
"values": "max_score"
},
"script": "params.values > 50"
}
}
}
},
"user_gt50_count": {
"stats_bucket": {
"buckets_path": "user_gt50._count"
}
},
"user_gt100": {
"terms": {
"field": "user_id",
"size": 9999
},
"aggs": {
"max_score": {
"max": {
"field": "score"
}
},
"scoresGreaterThan100": {
"bucket_selector": {
"buckets_path": {
"values": "max_score"
},
"script": "params.values > 100"
}
}
}
},
"user_gt100_count": {
"stats_bucket": {
"buckets_path": "user_gt100._count"
}
}
}
}
Search Result:
"aggregations": {
"user_gt100": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
},
"user_gt20": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2,
"doc_count": 1,
"max_score": {
"value": 60.0
}
}
]
},
"user_gt50": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2,
"doc_count": 1,
"max_score": {
"value": 60.0
}
}
]
},
"user_gt20_count": {
"count": 1, // note this
"min": 1.0,
"max": 1.0,
"avg": 1.0,
"sum": 1.0
},
"user_gt50_count": {
"count": 1, // note this
"min": 1.0,
"max": 1.0,
"avg": 1.0,
"sum": 1.0
},
"user_gt100_count": {
"count": 0, // note this
"min": null,
"max": null,
"avg": null,
"sum": 0.0
}
}
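The stats_bucket counts above can also be sanity-checked client-side from a plain terms + max response; a minimal sketch using the sample data:

```python
# Count user buckets whose max_score exceeds a threshold, given the
# buckets from a plain terms + max aggregation response.

def count_above(buckets, threshold):
    """Number of buckets with max_score strictly greater than threshold."""
    return sum(1 for b in buckets if b["max_score"]["value"] > threshold)

buckets = [
    {"key": 3, "doc_count": 1, "max_score": {"value": 10.0}},
    {"key": 1, "doc_count": 2, "max_score": {"value": 20.0}},
    {"key": 2, "doc_count": 1, "max_score": {"value": 60.0}},
]
for t in (20, 50, 100):
    print(t, count_above(buckets, t))  # 1, 1, 0 for the sample data
```

This matches the search result above: one user over 20, one over 50, none over 100.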
Hi, I am trying to group by nested and non-nested fields. I want to group by one non-nested field (from_district) and one nested field (truck_number), and take the max of a nested field (truck_number.score).
Requirement: get the max score of each truck in every district where the truck is present, for a given sp_id, e.g.:
District1 ,truck1, 0.9,
District2 ,truck1, 0.8,
District1 ,truck2, 1.8,
District2 ,truck3, 0.7,
District3 ,truck4, 1.7
Below is my mapping
{
"sp_ranked_indent" : {
"mappings" : {
"properties" : {
"from_district" : {
"type" : "keyword"
},
"sp_id" : {
"type" : "long"
},
"to_district" : {
"type" : "keyword"
},
"truck_ranking_document" : {
"type" : "nested",
"properties" : {
"score" : {
"type" : "float"
},
"truck_number" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
Below is the query I tried, but it does not group by the nested and non-nested fields together, and the max truck score is also incorrect:
{
"size": 0,
"query": {
"terms": {
"sp_id": [650128],
"boost": 1.0
}
},
"aggregations": {
"NESTED_AGG": {
"nested": {
"path": "truck_ranking_document"
},
"aggregations": {
"max_score": {
"max": {
"field": "truck_ranking_document.score"
}
},
"truck_numer": {
"terms": {
"field": "truck_ranking_document.truck_number.keyword",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
}
},
"fromdistrictagg": {
"reverse_nested": {},
"aggregations": {
"fromDistrict": {
"terms": {
"field": "from_district",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
}
}
}
}
}
}
}
}
I think this can be done using terms and nested aggregations. The query below will produce output in the following format:
District1
Truck1
Max score
Truck2
Max score
Truck3
Max score
District2
Truck1
Max score
Truck2
Max score
Truck3
Max score
Query:
{
"query": {
"terms": {
"sp_id": [
1
]
}
},
"aggs": {
"district": {
"terms": {
"field": "from_district",
"size": 10
},
"aggs": {
"trucks": {
"nested": {
"path": "truck_ranking_document"
},
"aggs": {
"truck_no": {
"terms": {
"field": "truck_ranking_document.truck_number.keyword",
"size": 10
},
"aggs": {
"max_score": {
"max": {
"field": "truck_ranking_document.score"
}
},
"select": {
"bucket_selector": {
"buckets_path": {
"score": "max_score"
},
"script": "if(params.score>0) return true;"
}
}
}
}
}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"count": "trucks>truck_no._bucket_count"
},
"script": {
"inline": "params.count != 0"
}
}
}
}
}
}
}
Result:
"aggregations" : {
"district" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "District1",
"doc_count" : 1,
"trucks" : {
"doc_count" : 2,
"truck_no" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 1,
"max_score" : {
"value" : 2.0
}
},
{
"key" : "3",
"doc_count" : 1,
"max_score" : {
"value" : 3.0
}
}
]
}
}
}
]
}
Composite Aggregation
A composite aggregation response contains an after_key:
"after_key" : {
"district" : "District4"
}
You need to pass it back via the after parameter to retrieve the next page of results:
{
"aggs": {
"my_buckets": {
"composite": {
"size": 100,
"sources": [
{
"district": {
"terms": {
"field": "from_district"
}
}
}
]
},
"aggs": {
"trucks": {
"nested": {
"path": "truck_ranking_document"
},
"aggs": {
"truck_no": {
"terms": {
"field": "truck_ranking_document.truck_number.keyword",
"size": 10
},
"aggs": {
"max_score": {
"max": {
"field": "truck_ranking_document.score"
}
},
"select": {
"bucket_selector": {
"buckets_path": {
"score": "max_score"
},
"script": "if(params.score>0) return true;"
}
}
}
}
}
}
}
}
}
}
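The after pagination can be wrapped in a small client-side loop. A sketch assuming a `search(body)` callable that wraps your Elasticsearch client call (the two-page fake response here is only for illustration):

```python
# Page through a composite aggregation by following after_key.
# `search` is assumed to be a callable that executes the request body
# against your index (e.g. wrapping es.search) and returns the JSON.

def all_composite_buckets(search, body):
    """Collect every composite bucket by repeatedly passing after_key back."""
    buckets = []
    while True:
        resp = search(body)
        agg = resp["aggregations"]["my_buckets"]
        buckets.extend(agg["buckets"])
        after_key = agg.get("after_key")
        if after_key is None:  # last page has no after_key
            break
        body["aggs"]["my_buckets"]["composite"]["after"] = after_key
    return buckets

# Fake two-page response to demonstrate the loop.
pages = iter([
    {"aggregations": {"my_buckets": {
        "buckets": [{"key": {"district": "District1"}}],
        "after_key": {"district": "District1"}}}},
    {"aggregations": {"my_buckets": {
        "buckets": [{"key": {"district": "District2"}}]}}},
])
body = {"aggs": {"my_buckets": {"composite": {"size": 100, "sources": []}}}}
result = all_composite_buckets(lambda b: next(pages), body)
print(len(result))  # 2
```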
I want to filter the hits so that only those in my aggregation buckets are returned.
{
"from": 0,
"aggs": {
"id.raw": {
"terms": {
"field": "id.raw",
"size": 0
},
"aggs": {
"id_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": {
"inline": "count == 1"
}
}
}
}
}
}
}
Result aggregation:
"id.raw": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1726200CFABY",
"doc_count": 1
}
]
}
But I get 66 hits, and I want only 1 hit: the document for key 1726200CFABY.
"hits": {
"total": 66,
"max_score": 1,
"hits": [
{
How can I get back only the documents whose ids match my aggregation buckets?
EDIT: Following Val's comment, I tried:
{
"size": 0,
"aggs": {
"id.raw": {
"terms": {
"field": "id.raw",
"size": 0
},
"aggs": {
"top_hits": {
"top_hits": {
"size": 1
}
},
"id_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": {
"inline": "count == 1"
}
}
}
}
}
}
}
I think I'm good now.
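For completeness: once the bucket_selector has filtered the terms buckets, the matching documents can be collected from each surviving bucket's top_hits client-side. A minimal sketch over a response shaped like the one above:

```python
# Extract the documents embedded in each bucket's top_hits sub-aggregation.
# `response` is a hand-written sample shaped like the query's output.
response = {
    "aggregations": {
        "id.raw": {
            "buckets": [
                {"key": "1726200CFABY", "doc_count": 1,
                 "top_hits": {"hits": {"hits": [
                     {"_id": "a1", "_source": {"id": "1726200CFABY"}}]}}},
            ]
        }
    }
}

docs = [
    hit
    for bucket in response["aggregations"]["id.raw"]["buckets"]
    for hit in bucket["top_hits"]["hits"]["hits"]
]
print(len(docs))  # 1
```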
I want to calculate the difference of nested aggregations between two dates.
To be more concrete: is it possible to calculate the difference date_1.buckets.field_1.buckets.field_2.buckets.field_3.value - date_2.buckets.field_1.buckets.field_2.buckets.field_3.value, given the request/response below? Is that possible with Elasticsearch v1.0.1?
The aggregation query request looks like this:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"terms": {
"date": [
"2014-08-18 00:00:00.0",
"2014-08-15 00:00:00.0"
]
}
}
]
}
}
}
},
"aggs": {
"date_1": {
"filter": {
"terms": {
"date": [
"2014-08-18 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_2": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
},
"date_2": {
"filter": {
"terms": {
"date": [
"2014-08-15 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
}
}
}
And the response looks like this:
{
"took": 236,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1646,
"max_score": 0,
"hits": []
},
"aggregations": {
"date_1": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 100
}
}
]
}
}
]
}
},
"date_2": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 80
}
}
]
}
}
]
}
}
}
}
Thank you.
With a newer Elasticsearch version (e.g. 5.6.9) this is possible:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"filter": [
{
"range": {
"date_created": {
"gte": "2018-06-16T00:00:00+02:00",
"lte": "2018-06-16T23:59:59+02:00"
}
}
}
]
}
}
}
},
"aggs": {
"by_millisec": {
"range" : {
"script" : {
"lang": "painless",
"source": "doc['date_delivered'][0] - doc['date_created'][0]"
},
"ranges" : [
{ "key": "<1sec", "to": 1000.0 },
{ "key": "1-5sec", "from": 1000.0, "to": 5000.0 },
{ "key": "5-30sec", "from": 5000.0, "to": 30000.0 },
{ "key": "30-60sec", "from": 30000.0, "to": 60000.0 },
{ "key": "1-2min", "from": 60000.0, "to": 120000.0 },
{ "key": "2-5min", "from": 120000.0, "to": 300000.0 },
{ "key": "5-10min", "from": 300000.0, "to": 600000.0 },
{ "key": ">10min", "from": 600000.0 }
]
}
}
}
}
No arithmetic operations are allowed between two aggregations' results in the Elasticsearch DSL, not even using scripts (up to version 1.1.1, at least as far as I know).
Such operations need to be handled client-side after processing the aggs result.
Reference
elasticsearch aggregation to sort by ratio of aggregations
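A client-side subtraction over the two filter aggregations could look like the sketch below (keyed on the bucket keys, following the shape of the sample response above; the data is the sample from the question):

```python
# Walk the nested buckets of each filter aggregation, flatten them to
# {(field_1_key, field_2_key): sum}, then subtract matching leaves.

def leaf_sums(agg):
    """Flatten field_1 -> field_2 buckets into {(k1, k2): field_3 sum}."""
    out = {}
    for b1 in agg["field_1"]["buckets"]:
        for b2 in b1["field_2"]["buckets"]:
            out[(b1["key"], b2["key"])] = b2["field_3"]["value"]
    return out

aggs = {
    "date_1": {"doc_count": 823, "field_1": {"buckets": [
        {"key": "field_1_key_1", "doc_count": 719, "field_2": {"buckets": [
            {"key": "key_1", "doc_count": 275, "field_3": {"value": 100}}]}}]}},
    "date_2": {"doc_count": 823, "field_1": {"buckets": [
        {"key": "field_1_key_1", "doc_count": 719, "field_2": {"buckets": [
            {"key": "key_1", "doc_count": 275, "field_3": {"value": 80}}]}}]}},
}

d1, d2 = leaf_sums(aggs["date_1"]), leaf_sums(aggs["date_2"])
diff = {k: d1[k] - d2.get(k, 0) for k in d1}
print(diff)  # {('field_1_key_1', 'key_1'): 20}
```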
In 1.0.1 I couldn't find anything but in 1.4.2 you could try scripted_metric aggregation (still experimental).
Here is the scripted_metric documentation page.
I am not great with the Elasticsearch syntax, but I think your metric inputs would be:
init_script - just initialize an accumulator for each date:
"init_script": "_agg.d1Val = 0; _agg.d2Val = 0;"
map_script - test the date of the document and add to the right accumulator:
"map_script": "if (doc.date == firstDate) { _agg.d1Val += doc.field_3; } else { _agg.d2Val += doc.field_3; };",
reduce_script - accumulate intermediate data from various shards and return the final results:
"reduce_script": "totalD1 = 0; totalD2 = 0; for (agg in _aggs) { totalD1 += agg.d1Val ; totalD2 += agg.d2Val ;}; return totalD1 - totalD2"
I don't think you need a combine_script in this case.
Of course, if you can't use 1.4.2 then this is no help :-)