How to merge aggregation buckets in Elasticsearch?

Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"terms" : {
"field" : "country",
"size" : 50
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "KR",
"doc_count": 90
},
{
"key": "JP",
"doc_count": 83
},
{
"key": "US",
"doc_count": 50
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
I want to merge the "KR", "JP", and "US" buckets into one
and change its key to "NEW_RESULT",
so the result should look like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NEW_RESULT",
"doc_count": 223
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
Is this possible in an Elasticsearch query?
I cannot use a client-side solution, since there are too many entities, and retrieving and merging all of them would probably be too slow for my application.
Thanks for your help and comments!

You can try writing a script for that, though I would recommend benchmarking this approach against client-side processing, since a script might be quite slow.
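For the client-side baseline mentioned above, a minimal Python sketch of the merge step might look like this. It assumes the `buckets` array of the terms aggregation has already been parsed into a list of dicts. The function name `merge_buckets` and its parameters are hypothetical and not part of any Elasticsearch client.

```python
def merge_buckets(buckets, keys_to_merge, new_key):
    """Collapse the buckets whose key is in keys_to_merge into one bucket."""
    keys_to_merge = set(keys_to_merge)
    merged_count = 0
    insert_at = None  # index where the merged bucket should be placed
    result = []
    for bucket in buckets:
        if bucket["key"] in keys_to_merge:
            merged_count += bucket["doc_count"]
            if insert_at is None:
                insert_at = len(result)  # position of the first merged bucket
        else:
            result.append(dict(bucket))
    if insert_at is not None:
        result.insert(insert_at, {"key": new_key, "doc_count": merged_count})
    return result


# Buckets copied from the result shown in the question.
buckets = [
    {"key": "KR", "doc_count": 90},
    {"key": "JP", "doc_count": 83},
    {"key": "US", "doc_count": 50},
    {"key": "BE", "doc_count": 9},
]
merged = merge_buckets(buckets, ["KR", "JP", "US"], "NEW_RESULT")
print(merged)
# [{'key': 'NEW_RESULT', 'doc_count': 223}, {'key': 'BE', 'doc_count': 9}]
```

This only touches the buckets the aggregation returned (at most 50 here), not the underlying documents, so the merged count is complete only when sum_other_doc_count is 0. The sketch does not re-sort the buckets after merging.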

Related

Elasticsearch - Merge date histogram aggregation

Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"date_histogram" : {
"script" : {
"inline" : "doc['date'].values"
},
"interval" : "1y",
"format" : "yyyy"
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key_as_string": "2006",
"key": 1136073600000,
"doc_count": 1
},
{
"key_as_string": "2007",
"key": 1167609600000,
"doc_count": 2
},
{
"key_as_string": "2008",
"key": 1199145600000,
"doc_count": 3
}
]
}
}
}
I want to merge the "2006" and "2007" buckets
and change the key name to "TMP",
so the result should look like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key_as_string": "TMP",
"key": 1136073600000,
"doc_count": 3 <----------- 2006(1) + 2007(2)
},
{
"key_as_string": "2008",
"key": 1199145600000,
"doc_count": 3
}
]
}
}
}
With a terms aggregation, a script like the one below works as expected, but with a date histogram the key value does not change.
The script below is used in a terms aggregation: it maps the keys SOMETHING1 and SOMETHING2 to TMP, which merges their buckets.
def param = new groovy.json.JsonSlurper().parseText(
'{\"SOMETHING1\": \"TMP\", \"SOMETHING2\": \"TMP\"}'
);
def data = doc['my_field'].values;
def list = [];
if (!doc['my_field'].empty) {
for (x in data) {
if (param[x] != null) {
list.add(param[x]);
} else {
list.add(x);
}
}
};
return list;
How can I write a script that changes the key to a value of my choice and combines the doc_count values in a date histogram aggregation?
Thanks for your help and comments!
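One possible alternative to scripting the histogram key is a `date_range` aggregation, which lets each range carry its own `key`. The Python sketch below only builds the request body and does not call Elasticsearch. It assumes the field is named `date` and is mapped as a date (the original query reads it through a script), and that the Elasticsearch version in use accepts a `key` on each range. The helper name `merged_year_ranges` is hypothetical.

```python
import json


def merged_year_ranges(field, ranges):
    """Build a date_range aggregation with one custom-keyed bucket per range.

    ranges is a list of (key, from_year, to_year) tuples; the upper bound of
    each range is exclusive.
    """
    return {
        "date_range": {
            "field": field,
            "format": "yyyy",
            "ranges": [
                {"key": key, "from": start, "to": end}
                for key, start, end in ranges
            ],
        }
    }


body = {
    "size": 0,
    "aggregations": {
        "how_to_merge": merged_year_ranges(
            "date",  # assumed field name
            [("TMP", "2006", "2008"), ("2008", "2008", "2009")],
        )
    },
}
print(json.dumps(body, indent=2))
```

The buckets of a `date_range` response are shaped differently from `date_histogram` buckets (they carry `from` and `to` values), and the ranges have to be listed explicitly, so this fits best when the set of years to merge is known in advance.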

Return just buckets size of aggregation query - Elasticsearch

I'm using an aggregation query on elasticsearch 2.1, here is my query:
"aggs": {
"atendimentos": {
"terms": {
"field": "_parent",
"size" : 0
}
}
}
The response looks like this:
"aggregations": {
"atendimentos": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1a92d5c0-d542-4f69-aeb0-42a467f6a703",
"doc_count": 12
},
{
"key": "4e30bf6d-730d-4217-a6ef-e7b2450a012f",
"doc_count": 12
}.......
It returns 40000 buckets, which is a lot. I don't need the buckets themselves, only their count, something like this:
buckets_size: 40000
How can I return just the number of buckets?
Thank you all.
Try this query:
POST index/_search
{
"size": 0,
"aggs": {
"atendimentos": {
"terms": {
"field": "_parent"
}
},
"count":{
"cardinality": {
"field": "_parent"
}
}
}
}
It may return something like this:
"aggregations": {
"aads": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "aa",
"doc_count": 1
},
{
"key": "bb",
"doc_count": 1
}
]
},
"count": {
"value": 2
}
}
EDIT: More info here - https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-aggregations-metrics-cardinality-aggregation.html
{
"aggs" : {
"type_count" : {
"cardinality" : {
"field" : "type"
}
}
}
}
Read more about Cardinality Aggregation
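If only the number of buckets is needed, the terms aggregation can be left out and the cardinality aggregation requested on its own. The Python sketch below builds such a request body and reads the value from a response dict. The helper names are hypothetical, and the sample response is modelled on the one shown above rather than taken from a real cluster. Note that cardinality is an approximate count, so for a large number of distinct values it may differ slightly from the exact bucket count.

```python
def bucket_count_request(field):
    """Request body asking only for the (approximate) number of distinct values."""
    return {
        "size": 0,  # no hits, aggregations only
        "aggs": {"count": {"cardinality": {"field": field}}},
    }


def bucket_count(response):
    """Extract the cardinality value from a parsed search response."""
    return response["aggregations"]["count"]["value"]


request_body = bucket_count_request("_parent")

# Sample response shaped like the one in the answer above.
sample_response = {"aggregations": {"count": {"value": 2}}}
print(bucket_count(sample_response))
```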

Incorrect unique values from field in elasticsearch

I am trying to get the unique values of a field in Elasticsearch. To do that, I first updated the mapping:
PUT tv-programs/_mapping/text?update_all_types
{
"properties": {
"channelName": {
"type": "text",
"fielddata": true
}
}
}
After that, I ran this query:
GET _search
{
"size": 0,
"aggs" : {
"channels" : {
"terms" : { "field" : "channelName" ,
"size": 1000
}
}
}}
and got the following response:
...
"buckets": [
{
"key": "tv",
"doc_count": 4582
},
{
"key": "baby",
"doc_count": 2424
},
{
"key": "24",
"doc_count": 1547
},
{
"key": "channel",
"doc_count": 1192
},..
The problem is that the original entries do not contain four different values. The correct output should be:
"buckets": [
{
"key": "baby tv",
"doc_count": 4582
},
{
"key": "channel 24",
"doc_count": 1547
},..
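Why is this happening? How can I get the correct output?
I've found the solution.
The text field is analyzed into separate tokens, so the terms aggregation was bucketing on individual words. Aggregating on the unanalyzed .keyword sub-field (which the default dynamic mapping creates for strings) returns whole values, so I just added .keyword after the field name: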
GET _search
{
"size": 0,
"aggs" : {
"channels" : {
"terms" : { "field" : "channelName.keyword" ,
"size": 1000
}
}
}}
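The difference between the two fields can be illustrated without Elasticsearch. The Python sketch below is only a rough simulation: it stands in for the analyzer with a lowercase-and-split-on-whitespace function, which is a simplification, and the three sample values are made up for illustration.

```python
from collections import Counter


def analyzed_terms(value):
    """Rough stand-in for a text analyzer: lowercase, split on whitespace."""
    return set(value.lower().split())


# Hypothetical channelName values, one per document.
docs = ["baby tv", "baby tv", "channel 24"]

# Bucketing on the analyzed text field: one bucket per token.
text_buckets = Counter(term for value in docs for term in analyzed_terms(value))

# Bucketing on the keyword sub-field: one bucket per whole value.
keyword_buckets = Counter(docs)

print(dict(text_buckets))
print(dict(keyword_buckets))
```

Each document is counted once per distinct token in the first case, which is why "tv" and "baby" showed up as separate keys in the original response.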

How to use copy_to do multi field aggregation?

I put some data into ES and used the copy_to feature to copy two fields into one combined field, so that I can aggregate on multiple fields at once. Below are my steps.
Create index
curl -XPOST "localhost:9200/test?pretty" -d '{
"mappings" : {
"type9k" : {
"properties" : {
"SRC" : { "type" : "string", "index" : "not_analyzed" ,"copy_to": "SRC_AND_DST"},
"DST" : { "type" : "string", "index" : "not_analyzed" ,"copy_to": "SRC_AND_DST"},
"BITS" : { "type" : "long", "index" : "not_analyzed" },
"TIME" : { "type" : "long", "index" : "not_analyzed" }
}
}
}
}'
Put data into ES
curl -X POST "http://localhost:9200/test/type9k/_bulk?pretty" -d '
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"tcp","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"BJ","DST":"SH","PROTOCOL":"tcp","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"BJ","DST":"SH","PROTOCOL":"tcp","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":30,"TIME":1453360600}
'
Question
I want to group by SRC and DST, sum BITS for each group, and return the top 3 results. In SQL, my requirement would look like this:
SELECT sum(BITS) FROM table GROUP BY src,dst ORDER BY sum(BITS) DESC LIMIT 3.
I know that I can do this with a script, like below:
curl -XPOST "localhost:9200/_all/_search?pretty" -d '
{
"_source": [ "SRC", "DST","BITS"],
"size":0,
"query": { "match_all": {} },
"aggs":
{
"SRC_DST":
{
"terms": {"script": "[doc.SRC.value, doc.DST.value].join(\"-\")","size": 2,"shard_size":0, "order": {"sum_bits": "desc"}},
"aggs": { "sum_bits": { "sum": {"field": "BITS"} } }
}
}
}
'
The result I get with the script looks like this:
"aggregations" : {
"SRC_DST" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 10,
"buckets" : [ {
"key" : "BJ-DL",
"doc_count" : 8,
"sum_bits" : {
"value" : 140.0
}
}, {
"key" : "DL-SH",
"doc_count" : 6,
"sum_bits" : {
"value" : 120.0
}
} ]
But I would like to do this with the copy_to feature, because I think scripting may be too slow.
I am not sure, but I don't think you need copy_to here. Going by the SQL query, what you are asking for can be done with a terms aggregation and a nested sum aggregation, like this:
{
"size": 0,
"aggs": {
"unique_src": {
"terms": {
"field": "SRC",
"size": 10
},
"aggs": {
"unique_dst": {
"terms": {
"field": "DST",
"size": 3,
"order": {
"bits_sum": "desc"
}
},
"aggs": {
"bits_sum": {
"sum": {
"field": "BITS"
}
}
}
}
}
}
}
}
The query above gives me output like this:
"aggregations": {
"unique_src": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BJ",
"doc_count": 6,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "DL",
"doc_count": 4,
"bits_sum": {
"value": 70
}
},
{
"key": "SH",
"doc_count": 2,
"bits_sum": {
"value": 50
}
}
]
}
},
{
"key": "DL",
"doc_count": 3,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "SH",
"doc_count": 3,
"bits_sum": {
"value": 60
}
}
]
}
},
{
"key": "SH",
"doc_count": 3,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BJ",
"doc_count": 3,
"bits_sum": {
"value": 60
}
}
]
}
}
]
}
}
Hope this helps!
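The nested response orders DST buckets within each SRC bucket, so it does not directly give the overall top 3 pairs that the SQL asks for. A minimal Python sketch of the remaining client-side step is shown below. It flattens the nested buckets and sorts them by the sum. The function name `top_pairs` is hypothetical, and the sample data is a trimmed copy of the response above.

```python
def top_pairs(response, n=3):
    """Flatten SRC -> DST buckets and return the n pairs with the largest sum."""
    rows = []
    for src in response["aggregations"]["unique_src"]["buckets"]:
        for dst in src["unique_dst"]["buckets"]:
            rows.append((src["key"], dst["key"], dst["bits_sum"]["value"]))
    # Stable sort: pairs with equal sums keep their original order.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]


# Trimmed copy of the response shown above (only the fields used here).
response = {
    "aggregations": {
        "unique_src": {
            "buckets": [
                {"key": "BJ", "unique_dst": {"buckets": [
                    {"key": "DL", "bits_sum": {"value": 70}},
                    {"key": "SH", "bits_sum": {"value": 50}},
                ]}},
                {"key": "DL", "unique_dst": {"buckets": [
                    {"key": "SH", "bits_sum": {"value": 60}},
                ]}},
                {"key": "SH", "unique_dst": {"buckets": [
                    {"key": "BJ", "bits_sum": {"value": 60}},
                ]}},
            ]
        }
    }
}
print(top_pairs(response))
# [('BJ', 'DL', 70), ('DL', 'SH', 60), ('SH', 'BJ', 60)]
```

The result is only as complete as the buckets returned: if either terms aggregation truncates its buckets through `size`, a pair that belongs in the overall top 3 could be missing.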

Elasticsearch sub-aggregation excluding key from parent

I am currently doing an aggregation to get the top 20 terms in a given field and the top 5 co-occurring terms.
{
"aggs": {
"descTerms" : {
"terms" : {
"field" : "Desc as Marketed",
"exclude": "[a-z]{1}|and|the|with",
"size" : 20
},
"aggs" : {
"innerTerms" : {
"terms" : {
"field" : "Desc as Marketed",
"size" : 5
}
}
}
}
}
}
Which results in something like this:
"key": "bluetooth",
"doc_count": 11172,
"innerTerms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33700,
"buckets": [
{
"key": "bluetooth",
"doc_count": 11172
},
{
"key": "with",
"doc_count": 3827
}
I would like to exclude the parent key from the sub-aggregation, since it always comes back as the top result (obviously), but I can't figure out how to do so.
In other words, I want the previous result to look like this:
"key": "bluetooth",
"doc_count": 11172,
"innerTerms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33700,
"buckets": [
{
"key": "with",
"doc_count": 3827
}
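One way to get this shape without changing the query is to drop the parent key from each inner bucket list after the response arrives. The Python sketch below does that. It is a client-side workaround rather than a query-level exclusion, and the function name `drop_parent_key` and its parameters are hypothetical. To still end up with 5 co-occurring terms, the inner `size` would presumably need to be raised to 6.

```python
def drop_parent_key(outer_buckets, inner_name="innerTerms", limit=5):
    """Remove from each inner bucket list the bucket whose key equals the parent's."""
    cleaned = []
    for outer in outer_buckets:
        inner = [
            bucket
            for bucket in outer[inner_name]["buckets"]
            if bucket["key"] != outer["key"]
        ]
        new_outer = dict(outer)
        # Copy the inner aggregation, replacing only its bucket list.
        new_outer[inner_name] = dict(outer[inner_name], buckets=inner[:limit])
        cleaned.append(new_outer)
    return cleaned


# The outer bucket from the result shown in the question.
outer_buckets = [
    {
        "key": "bluetooth",
        "doc_count": 11172,
        "innerTerms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 33700,
            "buckets": [
                {"key": "bluetooth", "doc_count": 11172},
                {"key": "with", "doc_count": 3827},
            ],
        },
    }
]
cleaned = drop_parent_key(outer_buckets)
print(cleaned[0]["innerTerms"]["buckets"])
# [{'key': 'with', 'doc_count': 3827}]
```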
