How to sort bucket keys as integers in Elasticsearch?

I have a problem with sorting by bucket key.
How do I sort the bucket keys as integers?
This is my query:
{
"aggregations": {
"by_time": {
"terms": {
"script": {
"source": "Instant.ofEpochMilli(doc['statdate'].date.millis).atZone(ZoneId.of(params.tz)).hour",
"lang": "painless",
"params": {
"tz": "Asia/Seoul"
}
},
"size": 10,
"min_doc_count": 0,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": {
"_key": "asc"
}
}
}
}
}
And this is the result:
{
"aggregations": {
"sterms#by_time": {
"buckets": [
{
"key": "11",
"doc_count": 1
},
{
"key": "19",
"doc_count": 1
},
{
"key": "22",
"doc_count": 1
},
{
"key": "7",
"doc_count": 1
},
{
"key": "9",
"doc_count": 7
}
]
}
}
}
But I don't want this result.
I think the key type is string.
How can I sort by integer key?

You can use the bucket sort aggregation, a parent pipeline
aggregation that sorts the buckets of its parent multi-bucket
aggregation. Zero or more sort fields may be specified together with
the corresponding sort order. Each bucket may be sorted based on its
_key, _count or its sub-aggregations.
Try out this search query:
{
"aggregations": {
"by_time": {
"terms": {
"script": {
"source": "Instant.ofEpochMilli(doc['statdate'].date.millis).atZone(ZoneId.of(params.tz)).hour",
"lang": "painless",
"params": {
"tz": "Asia/Seoul"
}
},
"size": 20, <-- note this
"min_doc_count": 0,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": {
"_key": "asc"
}
},
"aggs": {
"bucket_truncate": {
"bucket_sort": { <-- note this
"sort": [
{
"_key": {
"order": "asc"
}
}
],
"size": 20 <-- note this
}
}
}
}
}
}
Adding a working example with index data, search query, and search result
Index Data:
{
"id":"1",
"title":"a"
}
{
"id":"3",
"title":"c"
}
{
"id":"2",
"title":"b"
}
{
"id":"2",
"title":"c"
}
Search Query:
{
"size": 0,
"aggs": {
"unique_id": {
"terms": {
"field": "id.keyword"
},
"aggs": {
"bucket_truncate": {
"bucket_sort": {
"sort": [
{
"_key": {
"order": "asc"
}
}
]
}
}
}
}
}
}
Search Result:
"aggregations": {
"unique_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1
},
{
"key": "2",
"doc_count": 2
},
{
"key": "3",
"doc_count": 1
}
]
}
}
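If the keys still come back as strings (the sterms# prefix in the question's result suggests a string terms aggregation), one fallback is to re-sort the buckets in the application. The following is a minimal sketch in Python, assuming the response has already been parsed into a dict; sort_buckets_numerically is a hypothetical helper name and not part of any Elasticsearch client.

```python
def sort_buckets_numerically(buckets):
    """Return a new list of buckets ordered by the integer value of each key."""
    return sorted(buckets, key=lambda bucket: int(bucket["key"]))


# Buckets copied from the result in the question (string keys, lexicographic order).
buckets = [
    {"key": "11", "doc_count": 1},
    {"key": "19", "doc_count": 1},
    {"key": "22", "doc_count": 1},
    {"key": "7", "doc_count": 1},
    {"key": "9", "doc_count": 7},
]

sorted_buckets = sort_buckets_numerically(buckets)
print([bucket["key"] for bucket in sorted_buckets])  # ['7', '9', '11', '19', '22']
```

This only reorders the buckets that were returned, so the terms size still has to be large enough to cover every hour (at most 24 values here).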

Related

Getting the count of customers for each version, only including the highest version a customer has

I have an aggregation to get the count of customers for each version:
{
"aggs": {
"2": {
"terms": {
"field": "version.string.keyword",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"1": {
"cardinality": {
"field": "orgId.keyword"
}
}
}
}
}
}
The problem with this is that if a customer has two versions running at the same time, the customer will be included in both versions. What I need is for the customer to be included only in the highest version. For example, if I've got documents:
{
"orgId": "A",
"version": {
"string": "1.1",
"major": 1,
"minor": 1
}
}
{
"orgId": "A",
"version": {
"string": "1.2",
"major": 1,
"minor": 2
}
}
{
"orgId": "B",
"version": {
"string": "1.1",
"major": 1,
"minor": 2
}
}
The response should be:
[
{
"1": {
"value": 1
},
"key": "1.1"
},
{
"1": {
"value": 1
},
"key": "1.2"
}
]
instead of:
[
{
"1": {
"value": 2
},
"key": "1.1"
},
{
"1": {
"value": 1
},
"key": "1.2"
}
]
I've tried this query which correctly returns highest version for each customer:
{
"aggs": {
"2": {
"terms": {
"field": "orgId.keyword",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"sorted_version": {
"top_hits": {
"sort": [
{
"version.major": {
"order": "desc"
},
"version.minor": {
"order": "desc"
}
}
],
"_source": {
"includes": [
"version.string"
]
},
"size": 1
}
}
}
}
}
}
I'm kinda lost now on how to combine these two queries, any help would be appreciated.
Does this result help you?
{
"size": 0,
"aggs": {
"group_by_version_string": {
"terms": {
"field": "version.string.keyword",
"order": {
"_key": "desc"
}
},
"aggs": {
"group_by_orgId": {
"terms": {
"field": "orgId.keyword",
"order": {
"_key": "desc"
}
}
}
}
}
}
}
Response
"buckets": [
{
"key": "1.2",
"doc_count": 1,
"group_by_orgId": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "A",
"doc_count": 1
}
]
}
},
{
"key": "1.1",
"doc_count": 2,
"group_by_orgId": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "B",
"doc_count": 1
},
{
"key": "A",
"doc_count": 1
}
]
}
}
]
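Note that in the response above, customer A still appears under both 1.1 and 1.2. One possible way to get the count the question asks for is to run the per-customer top_hits query from the question and do the final grouping in the application. Below is a minimal sketch in Python; count_orgs_by_highest_version is a hypothetical helper, and the sample org_buckets are an assumed, trimmed-down shape of that query's response for the three example documents, not output copied from a real cluster.

```python
from collections import Counter


def count_orgs_by_highest_version(org_buckets):
    """Count customers per version, using each customer's top (highest) hit only."""
    counts = Counter()
    for bucket in org_buckets:
        hits = bucket["sorted_version"]["hits"]["hits"]
        if hits:  # skip buckets whose top_hits came back empty
            counts[hits[0]["_source"]["version"]["string"]] += 1
    return dict(counts)


# Assumed response buckets: one per orgId, each holding its highest version.
org_buckets = [
    {"key": "B", "doc_count": 1, "sorted_version": {"hits": {"hits": [
        {"_source": {"version": {"string": "1.1"}}}]}}},
    {"key": "A", "doc_count": 2, "sorted_version": {"hits": {"hits": [
        {"_source": {"version": {"string": "1.2"}}}]}}},
]

result = count_orgs_by_highest_version(org_buckets)
print(result)  # {'1.1': 1, '1.2': 1}
```

The terms size in that query (50 in the question) caps how many customers are returned, so it would need to be raised or paginated for larger data sets.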

Why is the aggregation script not working in Elasticsearch?

I have a problem in Elasticsearch.
I want to divide one aggregated value by another.
This query works:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"sumPageview": {
"sum": {
"field": "pageview",
"missing": 0
}
},
"sumVisit": {
"sum": {
"field": "visit",
"missing": 0
}
}
}
}
But this query does not work:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"sumPageview": {
"sum": {
"field": "pageview",
"missing": 0
}
},
"sumVisit": {
"sum": {
"field": "visit",
"missing": 0
}
},
"totalPageviewPerVisit": {
"bucket_script": {
"buckets_path": {
"sumPageview": "sumPageview",
"sumVisit": "sumVisit"
},
"script": {
"source": "params.sumPageview / params.sumVisit",
"lang": "painless"
},
"gap_policy": "skip"
}
}
}
}
I think the reason is that the sum values are not inside a bucket.
Is that right? Help me, please.
The sum aggregation is a single-value metrics aggregation that sums
up numeric values that are extracted from the aggregated documents.
The bucket script aggregation is a parent pipeline aggregation that
executes a script that can perform per-bucket computations on
specified metrics in the parent multi-bucket aggregation.
Because the sum aggregation does not create any buckets, you cannot apply a bucket script aggregation to it at the top level; the bucket script must be nested inside a multi-bucket aggregation.
Adding a working example with index data, search query, and search result
Index Data:
{
"user_id":1,
"pageview": 1,
"visit": 2
}
{
"user_id":2,
"pageview": 2,
"visit": 3
}
{
"user_id":3,
"pageview": 3,
"visit": 4
}
Search Query:
{
"size": 0,
"aggs": {
"all": {
"terms": {
"field": "user_id"
},
"aggs": {
"sum_1": {
"sum": {
"field": "pageview"
}
},
"sum_2": {
"sum": {
"field": "visit"
}
},
"division": {
"bucket_script": {
"buckets_path": {
"my_var1": "sum_1",
"my_var2": "sum_2"
},
"script": "params.my_var1 / params.my_var2"
}
}
}
}
}
}
Search Result:
"aggregations": {
"all": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"sum_2": {
"value": 2.0
},
"sum_1": {
"value": 1.0
},
"division": {
"value": 0.5
}
},
{
"key": 2,
"doc_count": 1,
"sum_2": {
"value": 3.0
},
"sum_1": {
"value": 2.0
},
"division": {
"value": 0.6666666666666666
}
},
{
"key": 3,
"doc_count": 1,
"sum_2": {
"value": 4.0
},
"sum_1": {
"value": 3.0
},
"division": {
"value": 0.75
}
}
]
}
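If a single overall ratio is all that is needed, another option is to keep the first (working) query from the question and divide the two sums in the application. A minimal sketch in Python follows; pageviews_per_visit is a hypothetical helper, and the sample values are simply the totals of the three example documents above (6 pageviews, 9 visits), not output copied from a cluster.

```python
def pageviews_per_visit(aggregations):
    """Divide the sumPageview value by the sumVisit value from a parsed response."""
    pageviews = aggregations["sumPageview"]["value"]
    visits = aggregations["sumVisit"]["value"]
    if not visits:  # avoid dividing by zero when there are no visits
        return None
    return pageviews / visits


# Assumed "aggregations" section of the response to the first query.
aggregations = {
    "sumPageview": {"value": 6.0},
    "sumVisit": {"value": 9.0},
}

ratio = pageviews_per_visit(aggregations)
```

Returning None for zero visits is a design choice of this sketch; the application could equally report 0 or raise an error.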

Is there a way to compare strings alphabetically in Painless?

I would like to execute this kind of operation in Painless:
if (_value >= 'c') {
return _value
} else {
return '__BAD__'
}
_value is a string, and I would like the following behaviour:
if the value is foo, I want to replace it with __BAD__; if the value is bar, I want to keep bar. Only values alphabetically after 'c' should be set to __BAD__.
I got this exception:
"lang": "painless",
"caused_by": {
"type": "class_cast_exception",
"reason": "Cannot apply [>] operation to types [java.lang.String] and [java.lang.String]."
}
Is there a way to perform an alphabetical comparison between strings in Painless?
My documents look like this:
{
"id": "doca",
"categoryId": "aaa",
"parentNames": "a$aa$aaa"
},
{
"id": "docb",
"categoryId": "bbb",
"parentNames": "a$aa$bbb"
},
{
"id": "docz",
"categoryId": "zzz",
"parentNames": "a$aa$zzz"
}
And my query looks like this:
{
"query": {
"bool": {
"filter": []
}
},
"size": 0,
"aggs": {
"catNames": {
"terms": {
"size": 10000,
"order": {
"_key": "asc"
},
"script": {
"source": "if(doc['parentNames'].value < 'a$aa$ccc') {return doc['parentNames'].value} return '__BAD__'",
"lang": "painless"
}
},
"aggs": {
"sort": {
"bucket_sort": {
"size": 2
}
},
"catId": {
"terms": {
"field": "categoryId",
"size": 1
}
}
}
}
}
}
I am expecting this result:
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"catNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "__BAD__",
"doc_count": 1,
"catId": {
"buckets": [
{
"key": "aaa",
"doc_count": 1
}
]
}
},
{
"key": "a$aa$bbb",
"doc_count": 1,
"catId": {
"buckets": [
{
"key": "bbb",
"doc_count": 1
}
]
}
},
{
"key": "a$aa$zzz",
"doc_count": 1,
"catId": {
"buckets": [
{
"key": "zzz",
"doc_count": 1
}
]
}
}
]
}
}
}
In fact, I can use the compareTo function of java.lang.String.
if (_value.compareTo('c') > 0) {
return _value
} else {
return '__BAD__'
}
My query becomes:
{
"query": {
"bool": {
"filter": []
}
},
"size": 0,
"aggs": {
"catNames": {
"terms": {
"size": 10000,
"order": {
"_key": "asc"
},
"script": {
"source": "if(doc['parentNames'].value.compareTo('a$aa$ccc') < 0) {return doc['parentNames'].value} return '__BAD__'",
"lang": "painless"
}
},
"aggs": {
"sort": {
"bucket_sort": {
"size": 2
}
},
"catId": {
"terms": {
"field": "categoryId",
"size": 1
}
}
}
}
}
}
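To sanity-check which bucket key each document should end up with, the comparison can be mirrored outside Elasticsearch. The following is a rough sketch in Python, assuming the intended rule is "keep values that sort before a$aa$ccc, otherwise use __BAD__" (the same rule as the `<` in the original query); bucket_key is a hypothetical helper. Python's `<` on strings compares by code point, which gives the same ordering as Java's String.compareTo for plain ASCII values such as these.

```python
THRESHOLD = "a$aa$ccc"  # same literal as in the script above


def bucket_key(parent_names, threshold=THRESHOLD):
    """Mirror `value.compareTo(threshold) < 0 ? value : '__BAD__'`."""
    return parent_names if parent_names < threshold else "__BAD__"


# parentNames values from the three example documents.
docs = ["a$aa$aaa", "a$aa$bbb", "a$aa$zzz"]
keys = [bucket_key(value) for value in docs]
print(keys)  # ['a$aa$aaa', 'a$aa$bbb', '__BAD__']
```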

Elasticsearch: filter aggregation using bucket value

Not sure how to formulate the question.
I'm using Elasticsearch 2.2.
Let's start with an example of the dataset, made of 4 documents:
[
{
"header": {
"called_entity": { "uuid": "a" },
"coverage_entity": {},
"sucessful_transfers": 1
}
},
{
"header": {
"called_entity": { "uuid": "a" },
"coverage_entity": { "uuid": "b" },
"sucessful_transfers": 1
}
},
{
"header": {
"called_entity": { "uuid": "b" },
"coverage_entity": { "uuid": "a" },
"sucessful_transfers": 1
}
},
{
"header": {
"called_entity": { "uuid": "b" },
"coverage_entity": { "uuid": "a" },
"sucessful_transfers": 0
}
}
]
called_entity always has a uuid.
coverage_entity can be empty, or have a uuid.
I use a script to aggregate on either called_entity.uuid or coverage_entity.uuid:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"dim1": {
"terms": {
"script" : "return doc['header.called_entity.uuid'] + doc['header.coverage_entity.uuid']",
"size": 10
},
"aggs": {
"successful_transfers": {
"sum": {
"field": "header.successful_transfers"
}
}
}
}
}
}
So now, the aggregation has generated terms from either header.called_entity.uuid, or header.coverage_entity.uuid.
How can I filter my aggregation using the value of the aggregation key? For example, if I want to count, for each bucket, how many documents have their uuid taken from header.called_entity.uuid only. Something like that:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"dim1": {
"terms": {
"script" : "return doc['header.called_entity.uuid'] + doc['header.coverage_entity.uuid']",
"size": 10
},
"aggs": {
"successful_transfers": {
"sum": {
"field": "header.successful_transfers"
}
},
"from_called_entity": {
"filter": {
"term": { "header.called_entity.uuid": BUCKET_KEY }
}
}
}
}
}
}
Not sure this is possible. The key itself is only available as a sorting option.
Could you use something like this:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"dim1": {
"terms": {
"script": "return doc['header.called_entity.uuid'] + doc['header.coverage_entity.uuid']",
"size": 10
},
"aggs": {
"successful_transfers": {
"sum": {
"field": "header.sucessful_transfers"
}
}
}
},
"called_entity_source": {
"terms": {
"field": "header.called_entity.uuid",
"size": 10
}
},
"coverage_entity_source": {
"terms": {
"field": "header.coverage_entity.uuid",
"size": 10
}
}
}
}
And the output will be something like this:
"called_entity_source": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 2
},
{
"key": "b",
"doc_count": 2
}
]
},
"coverage_entity_source": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 2
},
{
"key": "b",
"doc_count": 1
}
]
},
"dim1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 4,
"successful_transfers": {
"value": 3
}
},
{
"key": "b",
"doc_count": 3,
"successful_transfers": {
"value": 2
}
}
]
}
If you really need the JSON in that specific shape, add a final step in your application that post-processes the result. The result above does contain the information you need, but the buckets from coverage_entity_source and called_entity_source are not nested under the dim1 aggregation.
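The post-processing step could look roughly like the sketch below, written in Python and assuming the response has been parsed into a dict. annotate_with_called_entity is a hypothetical helper, and the from_called_entity name is borrowed from the question; the sample data is the output shown above.

```python
def annotate_with_called_entity(aggregations):
    """Attach the called_entity_source doc_count to the matching dim1 bucket."""
    called_counts = {
        bucket["key"]: bucket["doc_count"]
        for bucket in aggregations["called_entity_source"]["buckets"]
    }
    merged = []
    for bucket in aggregations["dim1"]["buckets"]:
        enriched = dict(bucket)  # shallow copy, leaves the response untouched
        enriched["from_called_entity"] = {
            "doc_count": called_counts.get(bucket["key"], 0)
        }
        merged.append(enriched)
    return merged


aggregations = {
    "called_entity_source": {"buckets": [
        {"key": "a", "doc_count": 2},
        {"key": "b", "doc_count": 2},
    ]},
    "dim1": {"buckets": [
        {"key": "a", "doc_count": 4, "successful_transfers": {"value": 3}},
        {"key": "b", "doc_count": 3, "successful_transfers": {"value": 2}},
    ]},
}

merged = annotate_with_called_entity(aggregations)
```

Keys missing from called_entity_source get a count of 0 here, which is an assumption about the desired behaviour.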

elasticsearch sort aggregation on categorical value

In elasticsearch, I can aggregate and sort the aggregation on a second aggregation's numeric field.
e.g.
GET myindex/_search
{
"size":0,
"aggs": {
"a1": {
"terms": {
"field": "FIELD1",
"size":0,
"order": {"a2": "desc"}
},
"aggs":{
"a2":{
"sum":{
"field":"FIELD2"
}
}
}
}
}
}
However, I want to sort the aggregation on a categorical field value, i.e. let's say the value of FIELD2 is one of ("a", "b", "c"): I want to sort a1 so that all documents with FIELD2: "a" come first, then FIELD2: "b", then FIELD2: "c".
In my case, every FIELD1 has a unique FIELD2. So I really just want a way to sort the a1 results by FIELD2.
I am not sure what exactly you want, but I tried the following.
I created an index with this mapping:
PUT your_index
{
"mappings": {
"your_type": {
"properties": {
"name": {
"type": "string"
},
"fruit" : {"type" : "string", "index": "not_analyzed"}
}
}
}
}
Then I indexed a few documents like this:
PUT your_index/your_type/1
{
"name" : "federer",
"fruit" : "orange"
}
Then I sorted all players and their fruits with the following aggregation:
{
"size": 0,
"aggs": {
"a1": {
"terms": {
"field": "name",
"order": {
"_term": "asc"
}
},
"aggs": {
"a2": {
"terms": {
"field": "fruit",
"order": {
"_term": "asc"
}
}
}
}
}
}
}
The result I got is:
"aggregations": {
"a1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "federer",
"doc_count": 3,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "green apple",
"doc_count": 1
},
{
"key": "orange",
"doc_count": 2
}
]
}
},
{
"key": "messi",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "apple",
"doc_count": 1
},
{
"key": "banana",
"doc_count": 1
}
]
}
},
{
"key": "nadal",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "blueberry",
"doc_count": 1
},
{
"key": "watermelon",
"doc_count": 1
}
]
}
},
{
"key": "ronaldo",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "banana",
"doc_count": 1
},
{
"key": "watermelon",
"doc_count": 1
}
]
}
}
]
}
}
Make sure your FIELD2 is not_analyzed, or you will get unexpected results.
Does this help?
I found a way that works. You must aggregate on FIELD2 first, then on FIELD1:
{
"size": 0,
"aggs": {
"a2": {
"terms": {
"size": 0,
"field": "FIELD2",
"order": {
"_term": "asc"
}
},
"aggs": {
"a1": {
"terms": {
"size": 0,
"field": "FIELD1",
"order": {
"_term": "asc"
}
}
}
}
}
}
}
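Because the result is now nested (FIELD2 buckets containing FIELD1 buckets), the application may want to flatten it back into a single list of FIELD1 values ordered by FIELD2. Here is a minimal sketch in Python; field1_sorted_by_field2 is a hypothetical helper, and the sample response with its item1 to item3 values is made up for illustration.

```python
def field1_sorted_by_field2(aggregations):
    """Flatten a2 -> a1 buckets into (FIELD2, FIELD1) pairs, keeping bucket order."""
    ordered = []
    for outer in aggregations["a2"]["buckets"]:
        for inner in outer["a1"]["buckets"]:
            ordered.append((outer["key"], inner["key"]))
    return ordered


# Hypothetical parsed response: each FIELD1 value has exactly one FIELD2 value.
aggregations = {
    "a2": {"buckets": [
        {"key": "a", "doc_count": 1, "a1": {"buckets": [{"key": "item3", "doc_count": 1}]}},
        {"key": "b", "doc_count": 1, "a1": {"buckets": [{"key": "item1", "doc_count": 1}]}},
        {"key": "c", "doc_count": 1, "a1": {"buckets": [{"key": "item2", "doc_count": 1}]}},
    ]}
}

pairs = field1_sorted_by_field2(aggregations)
print(pairs)  # [('a', 'item3'), ('b', 'item1'), ('c', 'item2')]
```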

Resources