Elasticsearch - Merge date histogram aggregation

Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"date_histogram" : {
"script" : {
"inline" : "doc['date'].values"
},
"interval" : "1y",
"format" : "yyyy"
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key_as_string": "2006",
"key": 1136073600000,
"doc_count": 1
},
{
"key_as_string": "2007",
"key": 1167609600000,
"doc_count": 2
},
{
"key_as_string": "2008",
"key": 1199145600000,
"doc_count": 3
}
]
}
}
}
I want to merge the "2006" and "2007" buckets and change the key name to "TMP".
So the result should look like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key_as_string": "TMP",
"key": 1136073600000,
"doc_count": 3 <----------- 2006(1) + 2007(2)
},
{
"key_as_string": "2008",
"key": 1199145600000,
"doc_count": 3
}
]
}
}
}
With a terms aggregation, a script like the one below works as expected, but with the date histogram aggregation the key value does not change.
The script below is used in a terms aggregation: it maps the keys SOMETHING1 and SOMETHING2 to TMP, so their buckets are merged.
def param = new groovy.json.JsonSlurper().parseText(
'{\"SOMETHING1\": \"TMP\", \"SOMETHING2\": \"TMP\"}'
);
def data = doc['my_field'].values;
def list = [];
if (!doc['my_field'].empty) {
for (x in data) {
if (param[x] != null) {
list.add(param[x]);
} else {
list.add(x);
}
}
};
return list;
How can I write a script that changes the key to the value I want and combines the doc_count values in a date histogram aggregation?
Thanks for your help and comments!
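For comparison, when the histogram only returns a handful of buckets, the merge can be done on the response instead of in the query. The following is a minimal Python sketch of that idea, not a query-side solution; `merge_date_buckets` and the `labels` mapping are hypothetical names introduced here. It assumes each document falls into exactly one bucket, since summing doc_count would double-count documents with several date values.

```python
from typing import Dict, List


def merge_date_buckets(buckets: List[dict], labels: Dict[str, str]) -> List[dict]:
    """Merge date_histogram buckets whose key_as_string maps to the same label."""
    merged: Dict[str, dict] = {}
    for bucket in buckets:
        # Buckets without an entry in `labels` keep their own key_as_string.
        label = labels.get(bucket["key_as_string"], bucket["key_as_string"])
        if label not in merged:
            merged[label] = {"key_as_string": label, "key": bucket["key"], "doc_count": 0}
        target = merged[label]
        # Keep the earliest epoch key of the merged group and sum the counts.
        target["key"] = min(target["key"], bucket["key"])
        target["doc_count"] += bucket["doc_count"]
    # Return the buckets in chronological order, like the original response.
    return sorted(merged.values(), key=lambda b: b["key"])


buckets = [
    {"key_as_string": "2006", "key": 1136073600000, "doc_count": 1},
    {"key_as_string": "2007", "key": 1167609600000, "doc_count": 2},
    {"key_as_string": "2008", "key": 1199145600000, "doc_count": 3},
]
result = merge_date_buckets(buckets, {"2006": "TMP", "2007": "TMP"})
```

With the sample response from the question, `result` contains a "TMP" bucket with doc_count 3 followed by the unchanged "2008" bucket.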

Related

Elasticsearch filters aggregation does not return array format

The filters aggregation returns its buckets as an object:
"buckets": {
"errors": {
"doc_count": 1
},
"warnings": {
"doc_count": 2
}
}
But I would like it to return a buckets array, like the terms aggregation does:
"buckets": [
{
"key": "errors",
"doc_count": 1
},
{
"key": "warnings",
"doc_count": 2
}
]
Is this possible, or can some data transformation be done in the query to make it so?
You can do it by providing an array of filters, but in that case your buckets will be anonymous:
GET logs/_search
{
"size": 0,
"aggs" : {
"messages" : {
"filters" : {
"filters" : [ <--- specify array
{ "match" : { "body" : "error" }},
{ "match" : { "body" : "warning" }}
]
}
}
}
}
The response will provide an array of the resulting buckets, in the same order as the filters:
"buckets": [
{
"doc_count": 1
},
{
"doc_count": 2
}
]
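If the bucket names are still needed, another option is to keep the named filters and reshape the keyed response on the client. This is a minimal Python sketch under that assumption; `keyed_buckets_to_list` is a hypothetical helper, not part of any Elasticsearch client.

```python
def keyed_buckets_to_list(buckets: dict) -> list:
    """Turn {"name": {...}} buckets into [{"key": "name", ...}] form."""
    # Python dicts preserve insertion order, so the output follows the
    # order in which the parsed response listed the buckets.
    return [{"key": name, **body} for name, body in buckets.items()]


keyed = {
    "errors": {"doc_count": 1},
    "warnings": {"doc_count": 2},
}
as_list = keyed_buckets_to_list(keyed)
```

Any sub-aggregation results inside a bucket are carried over unchanged, because the whole bucket body is copied next to the new "key" entry.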

How to merge aggregation bucket in Elasticsearch?

Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"terms" : {
"field" : "country",
"size" : 50
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "KR",
"doc_count": 90
},
{
"key": "JP",
"doc_count": 83
},
{
"key": "US",
"doc_count": 50
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
I want to merge the "KR", "JP" and "US" buckets and change the key name to "NEW_RESULT".
So the result should look like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NEW_RESULT",
"doc_count": 223
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
Is this possible in an Elasticsearch query?
I cannot use a client-side solution since there are too many entities, and retrieving and merging all of them would probably be too slow for my application.
Thanks for your help and comments!
You can try writing a script for that, though I would recommend benchmarking this approach against client-side processing, since the script might be quite slow.
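As a baseline for that benchmark, the client-side version could look like the following minimal Python sketch. The function name `merge_term_buckets` and the `groups` mapping are assumptions made for illustration. Note that it only sees the buckets the query returned, so with "size": 50 any key outside the top 50 is missing from the merged totals.

```python
from collections import Counter
from typing import Dict, List


def merge_term_buckets(buckets: List[dict], groups: Dict[str, str]) -> List[dict]:
    """Sum doc_count for keys mapped to the same group name."""
    counts: Counter = Counter()
    for bucket in buckets:
        # Keys without a mapping keep their original name.
        counts[groups.get(bucket["key"], bucket["key"])] += bucket["doc_count"]
    # most_common() orders by count, descending, like a default terms aggregation.
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


buckets = [
    {"key": "KR", "doc_count": 90},
    {"key": "JP", "doc_count": 83},
    {"key": "US", "doc_count": 50},
    {"key": "BE", "doc_count": 9},
]
groups = {"KR": "NEW_RESULT", "JP": "NEW_RESULT", "US": "NEW_RESULT"}
merged = merge_term_buckets(buckets, groups)
```

The work here is proportional to the number of returned buckets, not to the number of documents, which is worth keeping in mind when comparing it with a scripted aggregation.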

Elasticsearch aggregation by full array

How can I aggregate by the full array stored in each document, rather than by each individual value in the array? For example, I have several documents like this:
{'some_field': [1,2]}
{'some_field': [1]}
{'some_field': [1]}
{'some_field': [7,2]}
Now, with a simple aggregation query like this
{
"aggs" : {
"agg_name" : {
"terms" : {
"field" : "some_field"
}
}
},
"size": 0
}
I get a result like this
"buckets": [
{
"key": "1",
"doc_count": 3
},
{
"key": "2",
"doc_count": 2
},
...
]
but I want the buckets keyed by the full array, like this
"buckets": [
{
"key": [1],
"doc_count": 2
},
{
"key": [1,2],
"doc_count": 1
},
{
"key": [7,2],
"doc_count": 1
}
]
I was looking for the same aggregation; it still doesn't exist.
So I worked around it with a Painless script that joins all values of the field into a single string key:
POST some_index/_search
{
"size": 0,
"aggs": {
"myaggs": {
"terms": {
"size": 100,
"script": {
"lang": "painless",
"source": """
def myString = "";
for (int i = 0; i < doc['data.some_field.keyword'].length; ++i) {
myString += doc['data.some_field.keyword'][i] + ", ";
}
}
return myString;
"""
}
}
}
}
}
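To make the effect of that script easier to see, here is a minimal Python sketch that groups the sample documents from the question by a joined string key. It only emulates the idea and is not what Elasticsearch runs. The sort step is an assumption: as far as I know, doc values come back in sorted order rather than in the order stored in _source, so [7,2] would likely end up under a key that starts with 2. `array_key` is a hypothetical helper.

```python
from collections import Counter

# Sample documents from the question.
docs = [
    {"some_field": [1, 2]},
    {"some_field": [1]},
    {"some_field": [1]},
    {"some_field": [7, 2]},
]


def array_key(values: list) -> str:
    """Join all values of one document into a single bucket key."""
    # Sorting mimics the assumed doc-values ordering (see the note above).
    return ", ".join(str(v) for v in sorted(values))


# One bucket per distinct full array instead of one bucket per value.
counts = Counter(array_key(doc["some_field"]) for doc in docs)
```

The keys are strings, not arrays, which matches what the script-based terms aggregation can return.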

Grouping values in an aggregation

I'm quite new to Elasticsearch.
I have a query that looks like this:
GET animals/_search
{
"aggregations" : {
"top_animals" : {
"terms" : {"field" : "animals", "size" : 10}
}
},
"size" : 0
}
This returns something like:
{
(...)
"aggregations": {
"top_animals": {
(...)
"buckets": [
{
"key": "dogs",
"doc_count": 100
},
{
"key": "whales",
"doc_count": 70
},
{
"key": "dolphins",
"doc_count": 50
},
{
"key": "cats",
"doc_count": 10
}
]
}
}
}
Now I've been given a list of animals that are equivalent and should be counted together.
So "dogs" and "cats" are "pets", and "dolphins" and "whales" are "aquatic_mammals".
I'd like a result like this (note that the results are ordered):
{
(...)
"aggregations": {
"top_animals": {
(...)
"buckets": [
{
"key": "aquatic_mammals",
"doc_count": 120
},
{
"key": "pets",
"doc_count": 110
}
]
}
}
}
How should I modify my query?
Thanks!
If I understand you correctly, the values pets and aquatic_mammals are not part of the stored data?
There's probably a way with a script (which I can't test, so... good luck!), something like:
GET animals/_search
{
"aggregations" : {
"top_animals" : {
"terms" : {
"field": "animals",
"script" : {
"source": """
if (_value == 'cats' || _value == 'dogs') {
return 'pets';
} else if (_value == 'whales' || _value == 'dolphins') {
return 'aquatic_mammals';
} else {
return 'alien';
}
""",
"lang": "painless"
},
"size" : 10
}
}
},
"size" : 0
}
Here, _value is set because a "field" is targeted. Check the Terms Aggregation documentation.
It's tedious to write because Painless doesn't seem to have a switch statement, but it should do the trick. A more skilled programmer might also find a shorter or better way to write this script: I've never had much use for Painless scripting.
Hope this helps. And works. ;)
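One way to avoid the long if/else chain might be to pass the equivalence list as script params and look each value up in a map. The Python sketch below only builds the request body; the Painless one-liner inside it is untested against a cluster and assumes a version where `_value` is available to terms aggregation scripts. `grouped_terms_body`, `groups` and `fallback` are hypothetical names.

```python
import json


def grouped_terms_body(field: str, groups: dict, fallback: str = "alien", size: int = 10) -> dict:
    """Build a terms aggregation request that maps values to group names."""
    # Untested Painless: return the mapped group, or the fallback label.
    source = (
        "return params.groups.containsKey(_value)"
        " ? params.groups.get(_value) : params.fallback;"
    )
    return {
        "size": 0,
        "aggregations": {
            "top_animals": {
                "terms": {
                    "field": field,
                    "size": size,
                    "script": {
                        "lang": "painless",
                        "source": source,
                        "params": {"groups": groups, "fallback": fallback},
                    },
                }
            }
        },
    }


body = grouped_terms_body(
    "animals",
    {"dogs": "pets", "cats": "pets", "whales": "aquatic_mammals", "dolphins": "aquatic_mammals"},
)
request_json = json.dumps(body, indent=2)  # JSON text for GET animals/_search
```

Keeping the mapping in params means the list of equivalent animals can change without editing the script source.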

Elasticsearch sub-aggregation excluding key from parent

I am currently running an aggregation to get the top 20 terms in a given field, along with the top 5 co-occurring terms for each.
{
"aggs": {
"descTerms" : {
"terms" : {
"field" : "Desc as Marketed",
"exclude": "[a-z]{1}|and|the|with",
"size" : 20
},
"aggs" : {
"innerTerms" : {
"terms" : {
"field" : "Desc as Marketed",
"size" : 5
}
}
}
}
}
}
Which results in something like this:
"key": "bluetooth",
"doc_count": 11172,
"innerTerms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33700,
"buckets": [
{
"key": "bluetooth",
"doc_count": 11172
},
{
"key": "with",
"doc_count": 3827
}
I would like to exclude the parent key from the sub-aggregation, since it always comes back as the top result (obviously), but I can't figure out how to do so.
In other words, I want the previous result to look like this:
"key": "bluetooth",
"doc_count": 11172,
"innerTerms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33700,
"buckets": [
{
"key": "with",
"doc_count": 3827
}
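Failing a query-side option, the parent key can be filtered out of each inner bucket list after the response arrives. This is a minimal Python sketch of that workaround, assuming the aggregation names from the query above; `drop_parent_key` is a hypothetical helper. Requesting one extra inner term (size 6 instead of 5) would leave five co-occurring terms after the filter.

```python
def drop_parent_key(outer_buckets: list, inner_name: str = "innerTerms") -> list:
    """Remove each outer bucket's own key from its inner terms buckets."""
    cleaned = []
    for outer in outer_buckets:
        inner = outer[inner_name]
        kept = [b for b in inner["buckets"] if b["key"] != outer["key"]]
        # Copy instead of mutating, so the original response stays intact.
        cleaned.append({**outer, inner_name: {**inner, "buckets": kept}})
    return cleaned


outer_buckets = [
    {
        "key": "bluetooth",
        "doc_count": 11172,
        "innerTerms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 33700,
            "buckets": [
                {"key": "bluetooth", "doc_count": 11172},
                {"key": "with", "doc_count": 3827},
            ],
        },
    }
]
cleaned = drop_parent_key(outer_buckets)
```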
