Query
GET /_search
{
  "size": 0,
  "query": {
    "ids": {
      "types": [ ],
      "values": [ "someId1", "someId2", "someId3" ... ]
    }
  },
  "aggregations": {
    "how_to_merge": {
      "date_histogram": {
        "script": {
          "inline": "doc['date'].values"
        },
        "interval": "1y",
        "format": "yyyy"
      }
    }
  }
}
Result
{
  ...
  "aggregations": {
    "how_to_merge": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key_as_string": "2006",
          "key": 1136073600000,
          "doc_count": 1
        },
        {
          "key_as_string": "2007",
          "key": 1167609600000,
          "doc_count": 2
        },
        {
          "key_as_string": "2008",
          "key": 1199145600000,
          "doc_count": 3
        }
      ]
    }
  }
}
I want to merge "2006" and "2007"
And change key name to "TMP"
So result must like this:
{
  ...
  "aggregations": {
    "how_to_merge": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key_as_string": "TMP",
          "key": 1136073600000,
          "doc_count": 3       <----------- 2006(1) + 2007(2)
        },
        {
          "key_as_string": "2008",
          "key": 1199145600000,
          "doc_count": 3
        }
      ]
    }
  }
}
For a terms aggregation, a script like the one below works as expected: it changes the keys SOMETHING1 and SOMETHING2 to TMP and merges their buckets. In a date histogram aggregation, however, the same approach does not change the key value.
// Map of original keys to the replacement key.
def param = new groovy.json.JsonSlurper().parseText(
    '{"SOMETHING1": "TMP", "SOMETHING2": "TMP"}'
);
def data = doc['my_field'].values;
def list = [];
if (!doc['my_field'].empty) {
    for (x in data) {
        if (param[x] != null) {
            list.add(param[x]);  // mapped key: emit "TMP" so the buckets merge
        } else {
            list.add(x);         // unmapped key: keep the original value
        }
    }
}
return list;
How can I write a script that changes the key value to the value I want and combines the doc_count in a date histogram aggregation?
Thanks for your help and comments!
Query
GET /_search
{
  "size": 0,
  "query": {
    "ids": {
      "types": [ ],
      "values": [ "someId1", "someId2", "someId3" ... ]
    }
  },
  "aggregations": {
    "how_to_merge": {
      "terms": {
        "field": "country",
        "size": 50
      }
    }
  }
}
Result
{
  ...
  "aggregations": {
    "how_to_merge": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "KR",
          "doc_count": 90
        },
        {
          "key": "JP",
          "doc_count": 83
        },
        {
          "key": "US",
          "doc_count": 50
        },
        {
          "key": "BE",
          "doc_count": 9
        }
      ]
    }
  }
}
I want to merge "KR" and "JP" and "US"
And change key name to "NEW_RESULT"
So result must like this:
{
  ...
  "aggregations": {
    "how_to_merge": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "NEW_RESULT",
          "doc_count": 223
        },
        {
          "key": "BE",
          "doc_count": 9
        }
      ]
    }
  }
}
Is this possible in an Elasticsearch query? I cannot use a client-side solution: there are too many entities, and retrieving all of them and merging them would probably be too slow for my application.
Thanks for your help and comments!
You can try writing a script for that, though I would recommend benchmarking this approach against client-side processing, since it might be quite slow.
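For the terms case, a minimal sketch could look like the query below. This is an untested illustration: it assumes the country field is not_analyzed and that inline Groovy scripting is enabled (as in the script shown earlier); the map literal inside the script is made up for this example.
GET /_search
{
  "size": 0,
  "aggregations": {
    "how_to_merge": {
      "terms": {
        "script": "def m = ['KR': 'NEW_RESULT', 'JP': 'NEW_RESULT', 'US': 'NEW_RESULT']; def v = doc['country'].value; m[v] != null ? m[v] : v",
        "size": 50
      }
    }
  }
}
Every document whose country is KR, JP, or US should then fall into a single NEW_RESULT bucket (doc_count 223 in the example above), while other values keep their own buckets.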
I'm using an aggregation query on Elasticsearch 2.1. Here is my query:
"aggs": {
"atendimentos": {
"terms": {
"field": "_parent",
"size" : 0
}
}
}
The response looks like this:
"aggregations": {
"atendimentos": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1a92d5c0-d542-4f69-aeb0-42a467f6a703",
"doc_count": 12
},
{
"key": "4e30bf6d-730d-4217-a6ef-e7b2450a012f",
"doc_count": 12
}.......
It returns 40000 buckets, so I have a lot of buckets in this aggregation. I just want the number of buckets, something like this:
buckets_size: 40000
How can I return just the bucket count?
Well, thank you all.
Try this query:
POST index/_search
{
  "size": 0,
  "aggs": {
    "atendimentos": {
      "terms": {
        "field": "_parent"
      }
    },
    "count": {
      "cardinality": {
        "field": "_parent"
      }
    }
  }
}
It may return something like this:
"aggregations": {
"aads": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "aa",
"doc_count": 1
},
{
"key": "bb",
"doc_count": 1
}
]
},
"count": {
"value": 2
}
}
EDIT: More info here - https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-aggregations-metrics-cardinality-aggregation.html
{
  "aggs": {
    "type_count": {
      "cardinality": {
        "field": "type"
      }
    }
  }
}
Read more about Cardinality Aggregation
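Note that the cardinality aggregation is an approximate count. If the exact bucket count matters, you can trade memory for accuracy with the precision_threshold option (counts below the threshold are close to exact). A sketch, with the threshold sized to the 40000 buckets mentioned above purely for illustration:
{
  "aggs": {
    "type_count": {
      "cardinality": {
        "field": "type",
        "precision_threshold": 40000
      }
    }
  }
}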
I am trying to get the unique values of a field in Elasticsearch. To do that, first of all I did this:
PUT tv-programs/_mapping/text?update_all_types
{
  "properties": {
    "channelName": {
      "type": "text",
      "fielddata": true
    }
  }
}
After that I executed this:
GET _search
{
  "size": 0,
  "aggs": {
    "channels": {
      "terms": {
        "field": "channelName",
        "size": 1000
      }
    }
  }
}
And saw this response:
...
"buckets": [
  {
    "key": "tv",
    "doc_count": 4582
  },
  {
    "key": "baby",
    "doc_count": 2424
  },
  {
    "key": "24",
    "doc_count": 1547
  },
  {
    "key": "channel",
    "doc_count": 1192
  },
  ...
The problem is that the original entries do not contain four different values. The correct output should be:
"buckets": [
{
"key": "baby tv",
"doc_count": 4582
}
{
"key": "channel 24",
"doc_count": 1547
},..
Why is that happening? How can I get the correct output?
I've found the solution. The channelName field is analyzed, so values like "baby tv" are split into separate terms ("baby", "tv") at index time, and the aggregation counts those tokens. I just added .keyword after the field name to aggregate on the whole, unanalyzed value instead:
GET _search
{
  "size": 0,
  "aggs": {
    "channels": {
      "terms": {
        "field": "channelName.keyword",
        "size": 1000
      }
    }
  }
}
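For reference, this works because the default dynamic mapping for strings creates an unanalyzed keyword sub-field next to the analyzed text field, roughly like this (a sketch of the default, not the actual mapping of this index):
"channelName": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}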
I put some data into ES, then used the copy_to feature to combine two fields into one. The reason for doing this is to aggregate on multiple fields. Below are my steps.
Create index
curl -XPOST "localhost:9200/test?pretty" -d '{
  "mappings": {
    "type9k": {
      "properties": {
        "SRC":  { "type": "string", "index": "not_analyzed", "copy_to": "SRC_AND_DST" },
        "DST":  { "type": "string", "index": "not_analyzed", "copy_to": "SRC_AND_DST" },
        "BITS": { "type": "long",   "index": "not_analyzed" },
        "TIME": { "type": "long",   "index": "not_analyzed" }
      }
    }
  }
}'
Put data into ES
curl -X POST "http://localhost:9200/test/type9k/_bulk?pretty" -d '
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"tcp","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":10,"TIME":1453360000}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"BJ","DST":"SH","PROTOCOL":"tcp","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":20,"TIME":1453360300}
{"index":{}}
{"SRC":"BJ","DST":"DL","PROTOCOL":"ip","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"BJ","DST":"SH","PROTOCOL":"tcp","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"DL","DST":"SH","PROTOCOL":"UDP","BITS":30,"TIME":1453360600}
{"index":{}}
{"SRC":"SH","DST":"BJ","PROTOCOL":"ip","BITS":30,"TIME":1453360600}
'
Question
I want to aggregate on SRC and DST with a sum aggregator, then return the top 3 results. Translated to SQL, my requirement is:
SELECT sum(BITS) FROM table GROUP BY SRC, DST ORDER BY sum(BITS) DESC LIMIT 3
I know that I can do this with the script feature, like below:
curl -XPOST "localhost:9200/_all/_search?pretty" -d '
{
  "_source": [ "SRC", "DST", "BITS" ],
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "SRC_DST": {
      "terms": {
        "script": "[doc.SRC.value, doc.DST.value].join(\"-\")",
        "size": 2,
        "shard_size": 0,
        "order": { "sum_bits": "desc" }
      },
      "aggs": { "sum_bits": { "sum": { "field": "BITS" } } }
    }
  }
}'
The result I get with the script looks like this:
"aggregations" : {
"SRC_DST" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 10,
"buckets" : [ {
"key" : "BJ-DL",
"doc_count" : 8,
"sum_bits" : {
"value" : 140.0
}
}, {
"key" : "DL-SH",
"doc_count" : 6,
"sum_bits" : {
"value" : 120.0
}
} ]
But I'm expecting to do it with the copy_to feature, because I think scripting may cost too much time.
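What I had in mind with copy_to is something like the sketch below (the aggregation name is made up). One caveat: copy_to indexes the SRC and DST values as separate terms in SRC_AND_DST rather than concatenating them, so a plain terms aggregation on it buckets per value, not per SRC-DST pair:
curl -XPOST "localhost:9200/test/_search?pretty" -d '
{
  "size": 0,
  "aggs": {
    "src_and_dst": {
      "terms": {
        "field": "SRC_AND_DST",
        "order": { "sum_bits": "desc" }
      },
      "aggs": { "sum_bits": { "sum": { "field": "BITS" } } }
    }
  }
}'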
I am not sure, but I guess you do not need the copy_to functionality. Going by your SQL query, what you are asking for can be done with a nested terms aggregation plus a sum aggregation, like this:
{
  "size": 0,
  "aggs": {
    "unique_src": {
      "terms": {
        "field": "SRC",
        "size": 10
      },
      "aggs": {
        "unique_dst": {
          "terms": {
            "field": "DST",
            "size": 3,
            "order": {
              "bits_sum": "desc"
            }
          },
          "aggs": {
            "bits_sum": {
              "sum": {
                "field": "BITS"
              }
            }
          }
        }
      }
    }
  }
}
The above query gives me output like this:
"aggregations": {
"unique_src": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BJ",
"doc_count": 6,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "DL",
"doc_count": 4,
"bits_sum": {
"value": 70
}
},
{
"key": "SH",
"doc_count": 2,
"bits_sum": {
"value": 50
}
}
]
}
},
{
"key": "DL",
"doc_count": 3,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "SH",
"doc_count": 3,
"bits_sum": {
"value": 60
}
}
]
}
},
{
"key": "SH",
"doc_count": 3,
"unique_dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BJ",
"doc_count": 3,
"bits_sum": {
"value": 60
}
}
]
}
}
]
}
}
Hope this helps!