Grouping values in an aggregation - elasticsearch

I'm quite new to Elasticsearch.
I have a query that looks like this:
GET animals/_search
{
"aggregations" : {
"top_animals" : {
"terms" : {"field" : "animals", "size" : 10}
}
},
"size" : 0
}
This returns something like:
{
(...)
"aggregations": {
"top_animals": {
(...)
"buckets": [
{
"key": "dogs",
"doc_count": 100
},
{
"key": "whales",
"doc_count": 70
},
{
"key": "dolphins",
"doc_count": 50
},
{
"key": "cats",
"doc_count": 10
}
]
}
}
}
Now I've been given a list of animals that are equivalent and should be counted together.
So "dogs" and "cats" are "pets", and "dolphins" and "whales" are "aquatic_mammals".
I'd like a result like this (note that the results are ordered):
{
(...)
"aggregations": {
"top_animals": {
(...)
"buckets": [
{
"key": "aquatic_mammals",
"doc_count": 120
},
{
"key": "pets",
"doc_count": 110
}
]
}
}
}
How should I modify my query?
Thanks!

If I understand correctly, the values pets and aquatic are not part of the stored data?
There's probably a way with a script (which I can't test, so... good luck!), something like:
GET animals/_search
{
"aggregations" : {
"top_animals" : {
"terms" : {
"field": "animals",
"script" : {
"source": """
if (_value == 'cats' || _value == 'dogs') {
return 'pets';
} else if (_value == 'whales' || _value == 'dolphins') {
return 'aquatic';
} else {
return 'alien';
}
""",
"lang": "painless"
},
"size" : 10
}
}
},
"size" : 0
}
Here, _value is set because a "field" is targeted. Check the Terms Aggregation documentation.
It's tedious to write because Painless doesn't seem to have a switch statement, but it should do the trick. A more experienced programmer might know a shorter or better way to write this script; I've never had much use for Painless scripting.
Hope this helps. And works. ;)
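One possibly shorter way to write the script is to pass the value-to-group mapping in as script params and do a map lookup instead of chaining if/else. Below is a minimal Python sketch that only builds the request body. build_grouping_request is a hypothetical helper, the group names come from the question, and, like the script above, the Painless one-liner has not been run against a cluster, so treat it as an assumption:

```python
import json


def build_grouping_request(field, groups, fallback="alien", size=10):
    """Build a terms-aggregation body whose script maps each value to a group."""
    return {
        "size": 0,
        "aggregations": {
            "top_animals": {
                "terms": {
                    "field": field,
                    "script": {
                        "lang": "painless",
                        # Map lookup instead of a chain of if/else branches.
                        "source": (
                            "return params.groups.containsKey(_value)"
                            " ? params.groups[_value] : params.fallback;"
                        ),
                        "params": {"groups": groups, "fallback": fallback},
                    },
                    "size": size,
                }
            }
        },
    }


# Value-to-group mapping taken from the question.
groups = {
    "dogs": "pets",
    "cats": "pets",
    "dolphins": "aquatic_mammals",
    "whales": "aquatic_mammals",
}
request_body = build_grouping_request("animals", groups)
print(json.dumps(request_body, indent=2))
```

The printed JSON can be sent as the body of GET animals/_search. Adding a new group then only means adding entries to the groups dict, not editing the script source.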

Related

elasticsearch filters aggregation does not return array format

The filters aggregation returns its buckets as an object:
"buckets": {
"errors": {
"doc_count": 1
},
"warnings": {
"doc_count": 2
}
}
But I would like it to return a buckets array, like the terms aggregation does:
"buckets": [
{
"key": "errors",
"doc_count": 1
},
{
"key": "warnings",
"doc_count": 2
}
]
Is this possible, or can some sort of data transformation be done in the query to make it so?
You can do it by providing an array of filters, but in this case your buckets will be anonymous:
GET logs/_search
{
"size": 0,
"aggs" : {
"messages" : {
"filters" : {
"filters" : [ <--- specify array
{ "match" : { "body" : "error" }},
{ "match" : { "body" : "warning" }}
]
}
}
}
}
The response will provide an array of resulting buckets in the same order as the filters:
"buckets": [
{
"doc_count": 1
},
{
"doc_count": 2
}
]
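If you want to keep the filter names, another option is to leave the query as it was and convert the keyed object into an array once the response arrives. A minimal Python sketch, assuming the response has already been parsed into a dict (keyed_buckets_to_list is a hypothetical helper, not part of any client library):

```python
def keyed_buckets_to_list(keyed_buckets):
    """Turn {"name": {...}} buckets into [{"key": "name", ...}] buckets."""
    return [{"key": name, **body} for name, body in keyed_buckets.items()]


# The "buckets" object from the question.
keyed = {
    "errors": {"doc_count": 1},
    "warnings": {"doc_count": 2},
}
bucket_list = keyed_buckets_to_list(keyed)
print(bucket_list)
```

Python dicts keep insertion order, so the array follows the order in which the buckets appear in the parsed response.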

Elasticsearch - Merge date histogram aggregation

Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"date_histogram" : {
"script" : {
"inline" : "doc['date'].values"
},
"interval" : "1y",
"format" : "yyyy"
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key_as_string": "2006",
"key": 1136073600000,
"doc_count": 1
},
{
"key_as_string": "2007",
"key": 1167609600000,
"doc_count": 2
},
{
"key_as_string": "2008",
"key": 1199145600000,
"doc_count": 3
}
]
}
}
}
I want to merge "2006" and "2007"
and change the key name to "TMP".
So the result must look like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key_as_string": "TMP",
"key": 1136073600000,
"doc_count": 3 <----------- 2006(1) + 2007(2)
},
{
"key_as_string": "2008",
"key": 1199145600000,
"doc_count": 3
}
]
}
}
}
With a terms aggregation, a script like the one below works as expected, but with a date histogram aggregation the key value does not change.
The script below, used in a terms aggregation, changes the keys SOMETHING1 and SOMETHING2 to TMP and merges them.
def param = new groovy.json.JsonSlurper().parseText(
'{\"SOMETHING1\": \"TMP\", \"SOMETHING2\": \"TMP\"}'
);
def data = doc['my_field'].values;
def list = [];
if (!doc['my_field'].empty) {
for (x in data) {
if (param[x] != null) {
list.add(param[x]);
} else {
list.add(x);
}
}
};
return list;
How can I write a script that changes the key to the value I want and combines the doc_count values in a date histogram aggregation?
Thanks for your help and comments!
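For reference, the target shape can also be produced by merging the buckets after the response comes back. This is a minimal client-side Python sketch, not a script-based solution; merge_histogram_buckets is a hypothetical helper, and it assumes the merged bucket should keep the numeric key of the first bucket it absorbs, as in the desired result above:

```python
def merge_histogram_buckets(buckets, labels_to_merge, new_label):
    """Fold the buckets whose key_as_string is in labels_to_merge into one bucket."""
    merged = None
    result = []
    for bucket in buckets:
        if bucket["key_as_string"] not in labels_to_merge:
            result.append(dict(bucket))
            continue
        if merged is None:
            # The first merged bucket fixes the position and the numeric key.
            merged = {"key_as_string": new_label, "key": bucket["key"], "doc_count": 0}
            result.append(merged)
        merged["doc_count"] += bucket["doc_count"]
    return result


buckets = [
    {"key_as_string": "2006", "key": 1136073600000, "doc_count": 1},
    {"key_as_string": "2007", "key": 1167609600000, "doc_count": 2},
    {"key_as_string": "2008", "key": 1199145600000, "doc_count": 3},
]
merged_buckets = merge_histogram_buckets(buckets, {"2006", "2007"}, "TMP")
print(merged_buckets)
```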

How to merge aggregation bucket in Elasticsearch?

Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"terms" : {
"field" : "country",
"size" : 50
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "KR",
"doc_count": 90
},
{
"key": "JP",
"doc_count": 83
},
{
"key": "US",
"doc_count": 50
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
I want to merge "KR", "JP", and "US"
and change the key name to "NEW_RESULT".
So the result must look like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NEW_RESULT",
"doc_count": 223
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
Is this possible in an Elasticsearch query?
I cannot use a client-side solution since there are too many entities, and retrieving all of them and merging would probably be too slow for my application.
Thanks for your help and comments!
You can try writing a script for that, though I would recommend benchmarking this approach against client-side processing, since the script might be quite slow.
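For the client-side half of that comparison, the merge itself is a single pass over the returned buckets. A minimal Python sketch using the sample response from the question; regroup_buckets and the groups mapping are hypothetical names, and the sketch assumes the terms aggregation returned every bucket that belongs to a group (a size that is too small would silently drop some of them):

```python
from collections import Counter


def regroup_buckets(buckets, groups):
    """Sum doc_count per group and return buckets ordered by count, descending."""
    totals = Counter()
    for bucket in buckets:
        # Keys without a group keep their own name.
        totals[groups.get(bucket["key"], bucket["key"])] += bucket["doc_count"]
    return [{"key": key, "doc_count": count} for key, count in totals.most_common()]


groups = {"KR": "NEW_RESULT", "JP": "NEW_RESULT", "US": "NEW_RESULT"}
buckets = [
    {"key": "KR", "doc_count": 90},
    {"key": "JP", "doc_count": 83},
    {"key": "US", "doc_count": 50},
    {"key": "BE", "doc_count": 9},
]
merged = regroup_buckets(buckets, groups)
print(merged)
```

Note that summing doc_count values counts a document twice if the field is multi-valued and the document holds two values from the same group.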

Combine results of multiple aggregations

I have a movies index in which each document has this structure:
Document:
{
"color": "Color",
"director_name": "Sam Raimi",
"actor_2_name": "James Franco",
"movie_title": "Spider-Man 2",
"actor_3_name" : "Brad Pitt",
"actor_1_name": "J.K. Simmons"
}
I need to calculate the number of movies for each actor (an actor can appear in any of the actor_1_name, actor_2_name, or actor_3_name fields).
The mapping of these 3 fields is:
"mappings": {
"properties": {
"actor_1_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"actor_2_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"actor_3_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
Is there a way to get an aggregated result that combines the terms from all 3 actor fields into a single aggregation?
Currently I create a separate aggregation for each actor field and combine the results in my Java code.
Search query using separate aggregations:
{
"aggs" : {
"actor1_count" : {
"terms" : {
"field" : "actor_1_name.keyword"
}
},
"actor2_count" : {
"terms" : {
"field" : "actor_2_name.keyword"
}
},
"actor3_count" : {
"terms" : {
"field" : "actor_3_name.keyword"
}
}
}
}
Sample result:
"aggregations": {
"actor1_count": {
"buckets": [
{
"key": "Johnny Depp",
"doc_count": 2
}
]
},
"actor2_count": {
"buckets": [
{
"key": "Johnny Depp",
"doc_count": 1
}
]
},
"actor3_count": {
"buckets": [
{
"key": "Johnny Depp",
"doc_count": 3
}
]
}
}
So, instead of creating separate aggregations, is it possible to combine the results of all 3 into one aggregation within Elasticsearch?
Basically, this is what I want:
"aggregations": {
"actor_count": {
"buckets": [
{
"key": "Johnny Depp",
"doc_count": 6
}
]
}
}
(The doc_count for Johnny Depp should be the sum across all 3 fields, actor_1_name, actor_2_name, and actor_3_name, wherever the name is present.)
I tried a script, but it didn't work correctly.
Script query:
{
"aggregations": {
"name": {
"terms": {
"script": "doc['actor_1_name.keyword'].value + ' ' + doc['actor_2_name.keyword'].value + ' ' + doc['actor_2_name.keyword'].value"
}
}
}
}
It concatenates the actor names and then aggregates on the combined string.
Result:
"buckets": [
{
"key": "Steve Buscemi Adam Sandler Adam Sandler",
"doc_count": 6
},
{
"key": "Leonard Nimoy Nichelle Nichols Nichelle Nichols",
"doc_count": 4
}
]
This is not going to work w/ terms. Gotta resort to scripted_metric, I think:
GET actors/_search
{
"size": 0,
"aggs": {
"merged_actors": {
"scripted_metric": {
"init_script": "state.actors_map=[:]",
"map_script": """
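def actor_keys = ['actor_1_name', 'actor_2_name', 'actor_3_name'];
for (def key : actor_keys) {
def field = key + '.keyword';
// Skip documents with no value for this field; calling .value on them throws.
if (doc[field].size() == 0) {
continue;
}
def actor_name = doc[field].value;
if (state.actors_map.containsKey(actor_name)) {
state.actors_map[actor_name] += 1;
} else {
state.actors_map[actor_name] = 1;
}
}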
""",
"combine_script": "return state",
"reduce_script": "return states"
}
}
}
}
yielding
...
"aggregations" : {
"merged_actors" : {
"value" : [
{
"actors_map" : {
"Brad Pitt" : 5,
"J.K. Simmons" : 1,
"James Franco" : 3
}
}
]
}
}
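Because the reduce_script returns states unchanged, value holds one actors_map per shard, so on an index with more than one shard the per-shard counts still have to be added up. A minimal Python sketch of that last step, assuming the response is already parsed; merge_shard_states is a hypothetical helper and the second shard state is invented sample data:

```python
from collections import Counter


def merge_shard_states(states):
    """Add up the actors_map counters returned by each shard."""
    totals = Counter()
    for state in states:
        totals.update(state["actors_map"])
    return totals


# Two hypothetical per-shard states, shaped like the "value" array above.
states = [
    {"actors_map": {"Brad Pitt": 5, "J.K. Simmons": 1, "James Franco": 3}},
    {"actors_map": {"Brad Pitt": 2, "Johnny Depp": 6}},
]
totals = merge_shard_states(states)
buckets = [{"key": name, "doc_count": count} for name, count in totals.most_common()]
print(buckets)
```

The same summation could instead be moved into the reduce_script so that Elasticsearch returns a single map.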

Elasticsearch aggregation by full array

How can I aggregate on the whole array inside each document, rather than on each value of the array? For example, I have several documents like this:
{'some_field': [1,2]}
{'some_field': [1]}
{'some_field': [1]}
{'some_field': [7,2]}
Now, with a simple aggregation query like this
{
"aggs" : {
"agg_name" : {
"terms" : {
"field" : "some_field"
}
}
},
"size": 0
}
I get a result like this
"buckets": [
{
"key": "1",
"doc_count": 3
},
{
"key": "2",
"doc_count": 2
},
...
]
but I want the full array as the key, like this
"buckets": [
{
"key": [1],
"doc_count": 2
},
{
"key": [1,2],
"doc_count": 1
},
{
"key": [7,2],
"doc_count": 1
}
]
I was looking for the same aggregation; it still doesn't exist.
So I worked around it with a Painless script:
POST some_index/_search
{
"size": 0,
"aggs": {
"myaggs": {
"terms": {
"size": 100,
"script": {
"lang": "painless",
"source": """
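// Build a single key from all values of the document.
// Note: doc values come back sorted, not in the original array order.
def myString = "";
for (int i = 0; i < doc['data.some_field.keyword'].length; ++i) {
myString += doc['data.some_field.keyword'][i] + ", ";
}
return myString;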
"""
}
}
}
}
}
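To make the idea concrete: the script collapses each document's values into one string, and the terms aggregation then counts those strings. The same keying, sketched in plain Python over the question's sample documents (array_key is a hypothetical helper; this runs client-side and only illustrates the grouping, it is not what Elasticsearch executes):

```python
from collections import Counter


def array_key(values):
    """Build one key per document from all of its values, in source order."""
    return ", ".join(str(value) for value in values)


docs = [
    {"some_field": [1, 2]},
    {"some_field": [1]},
    {"some_field": [1]},
    {"some_field": [7, 2]},
]
counts = Counter(array_key(doc["some_field"]) for doc in docs)
print(counts.most_common())
```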
