elasticsearch filters aggregation does not return array format - elasticsearch

The filters aggregation returns bucket as object
"buckets": {
"errors": {
"doc_count": 1
},
"warnings": {
"doc_count": 2
}
}
But i would like to return a buckets array, like the terms aggregation
"buckets": [
{
"key": "errors",
"doc_count": 1
},
{
"key": "warnings",
"doc_count": 2
}
]
Is this possible or any sort of data transformation can be done in the query to make it so?

You can do it by providing an array of filters, but in this case your buckets will be anonymous:
GET logs/_search
{
"size": 0,
"aggs" : {
"messages" : {
"filters" : {
"filters" : [ <--- specify array
{ "match" : { "body" : "error" }},
{ "match" : { "body" : "warning" }}
]
}
}
}
}
The response will provide an array of resulting buckets in the same order
"buckets": [
{
"doc_count": 1
},
{
"doc_count": 2
}
]

Related

Elasticsearch - Merge date histogram aggregation

Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"date_histogram" : {
"script" : {
"inline" : "doc['date'].values"
},
"interval" : "1y",
"format" : "yyyy"
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key_as_string": "2006",
"key": 1136073600000,
"doc_count": 1
},
{
"key_as_string": "2007",
"key": 1167609600000,
"doc_count": 2
},
{
"key_as_string": "2008",
"key": 1199145600000,
"doc_count": 3
}
]
}
}
}
I want to merge "2006" and "2007"
And change key name to "TMP"
So result must like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key_as_string": "TMP",
"key": 1136073600000,
"doc_count": 3 <----------- 2006(1) + 2007(2)
},
{
"key_as_string": "2008",
"key": 1199145600000,
"doc_count": 3
}
]
}
}
}
In the case of term queries, if I write a script query as shown below, it works normally, but the date histogram query does not change the key value.
The query below is a script query in which the key value in term aggregation changes SOMETHING1 and SOMETHING2 to TMP and merges.
def param = new groovy.json.JsonSlurper().parseText(
'{\"SOMETHING1\": \"TMP\", \"SOMETHING2\": \"TMP\"}'
);
def data = doc['my_field'].values;
def list = [];
if (!doc['my_field'].empty) {
for (x in data) {
if (param[x] != null) {
list.add(param[x]);
} else {
list.add(x);
}
}
};
return list;
How can I write a script query when I want to change the key value to the value I want and combine doc_count in the data histogram query?
Thanks for your help and comments!

How to merge aggregation bucket in Elasticsearch?

Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"terms" : {
"field" : "country",
"size" : 50
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "KR",
"doc_count": 90
},
{
"key": "JP",
"doc_count": 83
},
{
"key": "US",
"doc_count": 50
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
I want to merge "KR" and "JP" and "US"
And change key name to "NEW_RESULT"
So result must like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NEW_RESULT",
"doc_count": 223
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
Is it possible in elasticsearch query?
I cannot use a client-side solution since there are too many entities and retrieving all of them and merging would be probably too slow for my application.
Thanks for your help and comments!
You can try writing a script for that though I would recommend benchmarking this approach against the client-side processing since it might be quite slow.

ElasticSearch Max Agg on lowest value inside a list property of the document

I'm looking to do a Max aggregation on a value of the property under my document, the property is a list of complex object (key and value). Here's my data:
[{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
},
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}]
When I do the Nested Max Aggregation on "listItems.value", I'm expecting the max value returned to be 200 (and not 5000), reason being I want the logic to first figure the MIN value under listItems for each document, then doing the Max Aggregation on that. Is it possible to do something like this?
Thanks.
The search query performs the following aggregation :
Terms aggregation on the id field
Min aggregation on listItems.value
Max bucket aggregation that is a sibling pipeline aggregation which identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of the bucket(s).
Please refer to nested aggregation, to get a detailed explanation on it.
Adding a working example with index data, index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"listItems": {
"type": "nested"
},
"id":{
"type":"text",
"fielddata":"true"
}
}
}
}
Index Data:
{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
}
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}
Search Query:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id"
},
"aggs": {
"nested_entries": {
"nested": {
"path": "listItems"
},
"aggs": {
"min_position": {
"min": {
"field": "listItems.value"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": "2",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 200.0
}
}
}
]
},
"maxValue": {
"value": 200.0,
"keys": [
"2"
]
}
}
Initial post was mentioning nested aggregation, thus i was sure question is about nested documents. Since i've come to solution before seeing another answer, i'm keeping the whole thing for history, but actually it differs only in adding nested aggregation.
The whole process can be explained like that:
Bucket each document into single bucket.
Use nested aggregation to be able to aggregate on nested documents.
Use min aggregation to find minimum value within all document nested documents, and by that, for document itself.
Finally, use another aggregation to calculate maximum value among results of previous aggregation.
Given this setup:
// PUT /index
{
"mappings": {
"properties": {
"children": {
"type": "nested",
"properties": {
"value": {
"type": "integer"
}
}
}
}
}
}
// POST /index/_doc
{
"children": [
{ "value": 12 },
{ "value": 45 }
]
}
// POST /index/_doc
{
"children": [
{ "value": 7 },
{ "value": 35 }
]
}
I can use those aggregations in request to get required value:
{
"size": 0,
"aggs": {
"document": {
"terms": {"field": "_id"},
"aggs": {
"children": {
"nested": {
"path": "children"
},
"aggs": {
"minimum": {
"min": {
"field": "children.value"
}
}
}
}
}
},
"result": {
"max_bucket": {
"buckets_path": "document>children>minimum"
}
}
}
}
{
"aggregations": {
"document": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O4QxyHQBK5VO9CW5xJGl",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 7.0
}
}
},
{
"key": "OoQxyHQBK5VO9CW5kpEc",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 12.0
}
}
}
]
},
"result": {
"value": 12.0,
"keys": [
"OoQxyHQBK5VO9CW5kpEc"
]
}
}
}
There also should be a workaround using script for calculating max - all that you will need to do is just find and return smallest value in document in such script.

Grouping values in an aggregation

I'm quite new to Elasticsearch.
I have a query that looks like this:
GET animals/_search
{
"aggregations" : {
"top_animals" : {
"terms" : {"field" : "animals", "size" : 10}
}
},
"size" : 0
}
This returns something like:
{
(...)
"aggregations": {
"top_animals": {
(...)
"buckets": [
{
"key": "dogs",
"doc_count": 100
},
{
"key": "whales",
"doc_count": 70
},
{
"key": "dolphins",
"doc_count": 50
},
{
"key": "cats",
"doc_count": 10
}
]
}
}
}
Now I've been given a list of animals that are equivalent and should be counted together.
So "dogs" and "cats" are "pets", and "dolphins" and "whales" are "aquatic_mammals".
I'd like a result like this (note that the results are ordered):
{
(...)
"aggregations": {
"top_animals": {
(...)
"buckets": [
{
"key": "aquatic_mammals",
"doc_count": 120
},
{
"key": "pets",
"doc_count": 110
}
]
}
}
}
How should I modify my query?
Thanks!
If I understand you well, the values pets and aquatic are not part of the stored data?
There's probably a way with a script (which I can't test, so... good luck!), something like:
GET animals/_search
{
"aggregations" : {
"top_animals" : {
"terms" : {
"field": "animals",
"script" : {
"source": """
if (_value == 'cats' || _value == 'dogs') {
return 'pets';
} else if (_value == 'whales' || _value == 'dolphins') {
return 'aquatic';
} else {
return 'alien';
}
""",
"lang": "painless"
},
"size" : 10
}
}
},
"size" : 0
}
Here, _value is set because a "field" is targeted. Check the Terms Aggregation documentation.
It's quite boring to write because switch doesn't seem to exist in their language, but it should do the trick. Also, a more skilled programmer might have shorter/better ways of writing this script: I've never had much use of this "painless" scripting.
Hope this helps. And works. ;)

Incorrect unique values from field in elasticsearch

I am trying to get unique values from the field in elastic search. For doing that first of all I did next:
PUT tv-programs/_mapping/text?update_all_types
{
"properties": {
"channelName": {
"type": "text",
"fielddata": true
}
}
}
After that I executed this :
GET _search
{
"size": 0,
"aggs" : {
"channels" : {
"terms" : { "field" : "channelName" ,
"size": 1000
}
}
}}
And saw next response:
...
"buckets": [
{
"key": "tv",
"doc_count": 4582
},
{
"key": "baby",
"doc_count": 2424
},
{
"key": "24",
"doc_count": 1547
},
{
"key": "channel",
"doc_count": 1192
},..
The problem is that in original entries there are not 4 different records. Correct output should be next:
"buckets": [
{
"key": "baby tv",
"doc_count": 4582
}
{
"key": "channel 24",
"doc_count": 1547
},..
Why that's happening? How can I see the correct output?
I've found the solution.
I just added .keyword after field name:
GET _search
{
"size": 0,
"aggs" : {
"channels" : {
"terms" : { "field" : "channelName.keyword" ,
"size": 1000
}
}
}}

Resources