ElasticSearch Multi Aggregation - elasticsearch

I was wondering if it was possible to retrieve the aggregation keys/counters for the documents that are not part of the response. I mean the documents which have been put in the sum_other_doc_count field.
My code for the aggregation is as follows:
AggregationBuilder agg = AggregationBuilders.terms("AGG_1").field("field1")
.subAggregation(AggregationBuilders.terms("AGG_2").field("field2")
.subAggregation(AggregationBuilders.terms("AGG_3").field("field3")
.subAggregation(AggregationBuilders.terms("AGG_4").field("field4"))));
I've got 5 documents on the AGG_2 that are not part of the response but I need them as much as the others.
"AGG_1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "404",
"doc_count": 3506,
"AGG_2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "OK",
"doc_count": 1206,
"AGG_3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 5,
"buckets": [ ...
Thanks for your help!

You can set a different "size" value on each terms aggregation to specify how many buckets you want it to return:
{
"aggs" : {
"AGG_1" : {
"terms" : {
"field" : "field1",
"size" : 20 // override the number of buckets to return
}
}
}
}
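The snippet above sets "size" only on the top-level aggregation, while the missing documents in the question sit in a nested one. As a rough sketch, the same override can be applied at every level. The helper name nested_terms and the value 20 are illustrative assumptions, and the function only builds the request body as a Python dict without contacting a cluster.

```python
import json


def nested_terms(levels, size):
    """Build a nested terms aggregation body with the same size at each level.

    levels is a list of (aggregation_name, field_name) pairs, outermost first.
    nested_terms is a hypothetical helper, not part of any Elasticsearch client.
    """
    body = {}
    # Build from the innermost level outwards so each level wraps the next one.
    for name, field in reversed(levels):
        agg = {"terms": {"field": field, "size": size}}
        if body:
            agg["aggs"] = body
        body = {name: agg}
    return {"aggs": body}


levels = [("AGG_1", "field1"), ("AGG_2", "field2"),
          ("AGG_3", "field3"), ("AGG_4", "field4")]
request_body = nested_terms(levels, size=20)
print(json.dumps(request_body, indent=2))
```

A larger "size" makes every level return more buckets, so the response grows quickly with nesting depth; it is worth raising it only as far as needed to bring sum_other_doc_count down to 0.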

Related

How to merge aggregation bucket in Elasticsearch?

Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"terms" : {
"field" : "country",
"size" : 50
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "KR",
"doc_count": 90
},
{
"key": "JP",
"doc_count": 83
},
{
"key": "US",
"doc_count": 50
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
I want to merge "KR", "JP" and "US" into a single bucket
and change its key name to "NEW_RESULT".
So the result must look like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NEW_RESULT",
"doc_count": 223
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
Is this possible in an Elasticsearch query?
I cannot use a client-side solution since there are too many entities and retrieving all of them and merging would be probably too slow for my application.
Thanks for your help and comments!

You can try writing a script for that, though I would recommend benchmarking this approach against client-side processing, since it might be quite slow.
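As a hedged illustration of the two options being compared: the first function builds a terms aggregation whose script maps the selected countries onto one key, and the second merges the buckets of an already returned response on the client. The function names, the Painless source and the params names (field, merge_keys, new_key) are assumptions for this sketch, and it assumes an Elasticsearch version that accepts a script in a terms aggregation.

```python
def scripted_merge_request(field, merge_keys, new_key, size=50):
    """Request body sketch: a terms aggregation keyed by a Painless script.

    The "ids" query from the question is omitted here for brevity.
    The script assumes every matching document has a value for the field.
    """
    source = (
        "def v = doc[params.field].value; "
        "return params.merge_keys.contains(v) ? params.new_key : v;"
    )
    return {
        "size": 0,
        "aggregations": {
            "how_to_merge": {
                "terms": {
                    "script": {
                        "lang": "painless",
                        "source": source,
                        "params": {
                            "field": field,
                            "merge_keys": list(merge_keys),
                            "new_key": new_key,
                        },
                    },
                    "size": size,
                }
            }
        },
    }


def merge_buckets(buckets, merge_keys, new_key):
    """Client-side alternative: fold the selected buckets into one bucket."""
    merged_count = 0
    found = False
    kept = []
    for bucket in buckets:
        if bucket["key"] in merge_keys:
            merged_count += bucket["doc_count"]
            found = True
        else:
            kept.append(bucket)
    if found:
        kept.append({"key": new_key, "doc_count": merged_count})
    # Re-sort by doc_count descending, the default order of terms buckets.
    return sorted(kept, key=lambda b: b["doc_count"], reverse=True)


buckets = [
    {"key": "KR", "doc_count": 90},
    {"key": "JP", "doc_count": 83},
    {"key": "US", "doc_count": 50},
    {"key": "BE", "doc_count": 9},
]
merged = merge_buckets(buckets, {"KR", "JP", "US"}, "NEW_RESULT")
request_body = scripted_merge_request("country", ["KR", "JP", "US"], "NEW_RESULT")
print(merged)
```

The client-side merge works on the aggregation buckets only, not on the underlying documents, so its cost depends on the number of distinct keys rather than on the number of matching documents.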

Elasticsearch determine bucket length in aggregation

Please help me understand nested bucket aggregations in Elasticsearch. I have the following aggregation results:
[...]
{
"key": "key1",
"doc_count": 1166,
"range_keys": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "happy",
"doc_count": 1166
}
]
}
},
{
"key": "key2",
"doc_count": 1123,
"range_keys": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cookies",
"doc_count": 1122
},
{
"key": "happy",
"doc_count": 1
}
]
}
},
[...]
As you can see, some results contain only "happy", but I need only the results that contain both "happy" and "cookies".
To achieve this I tried the "size" argument, but it returned the results that have that many buckets as well as those that have fewer.
How can I filter on the number of buckets in a nested aggregation?
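If the goal is to keep only the outer keys whose nested aggregation has at least two buckets, one possible approach is a bucket_selector pipeline aggregation that reads the special _bucket_count path; a client-side filter does the same on a response that was already fetched. This is a sketch under assumptions: the outer aggregation name outer_keys, the selector name enough_buckets, the field names and the threshold of 2 are hypothetical, since the original query is not shown.

```python
def min_bucket_count_request(outer_field, inner_field, min_buckets):
    """Request body sketch: drop outer buckets with too few inner buckets."""
    return {
        "size": 0,
        "aggs": {
            "outer_keys": {  # hypothetical name for the outer aggregation
                "terms": {"field": outer_field},
                "aggs": {
                    "range_keys": {"terms": {"field": inner_field}},
                    "enough_buckets": {
                        "bucket_selector": {
                            # _bucket_count is the number of buckets in range_keys
                            "buckets_path": {"n": "range_keys._bucket_count"},
                            "script": f"params.n >= {int(min_buckets)}",
                        }
                    },
                },
            }
        },
    }


def keep_with_min_buckets(outer_buckets, inner_name, min_buckets):
    """Client-side alternative: filter an already returned list of buckets."""
    return [
        bucket for bucket in outer_buckets
        if len(bucket[inner_name]["buckets"]) >= min_buckets
    ]


outer_buckets = [
    {"key": "key1", "doc_count": 1166, "range_keys": {"buckets": [
        {"key": "happy", "doc_count": 1166}]}},
    {"key": "key2", "doc_count": 1123, "range_keys": {"buckets": [
        {"key": "cookies", "doc_count": 1122},
        {"key": "happy", "doc_count": 1}]}},
]
kept = keep_with_min_buckets(outer_buckets, "range_keys", 2)
request_body = min_bucket_count_request("outer_field", "inner_field", 2)
print([bucket["key"] for bucket in kept])
```

Both variants test only the number of inner buckets, not which keys they hold; checking for "happy" and "cookies" specifically would need an extra condition on the inner keys.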

When are "buckets": [] in an aggregation?

My query is a nested aggregation
aggs: {
src: {
terms: {
field: "dst_ip",
size: 1000,
},
aggs: {
dst: {
terms: {
field: "a_field_which_changes",
size: 2000,
},
},
},
},
A typical doc the query is run against is shown below (all fields are mapped as type keyword):
{
"_index": "honey",
"_type": "event",
"_id": "AWHzRjHrjNgIX_EoDcfV",
"_score": 1,
"_source": {
"dst_ip": "10.101.146.166",
"src_ip": "10.10.16.1",
"src_port": "38",
}
},
There are actually two queries I make, one after the other. They differ by the value of a_field_which_changes, which is "src_ip" in one query and "src_port" in the other.
In the first query all the results are fine. The aggregation is 1 element large and the buckets specify what that element matched with
{
"key": "10.6.17.218", <--- "dst_ip" field
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "-1", <--- "src_port" field
"doc_count": 1
}
]
}
},
The other query yields two different kinds of results:
{
"key": "10.6.17.218",
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
},
{
"key": "10.237.78.19",
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "10.12.67.89",
"doc_count": 1
}
]
}
},
The first result is problematic: it does not give the details of the buckets. It is no different from the other one but somehow the details are missing.
Why is that, and most importantly, how can I force Elasticsearch to display the details of the buckets?
The documentation goes into details on how to interfere with the aggregation but I could not find anything relevant there.
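One possible explanation, which the thread itself does not confirm, is that the document behind the empty result has no value for the sub-aggregated field: a terms aggregation skips documents without a value, so the parent bucket still counts the document while the nested "buckets" list stays empty. Under that assumption, the "missing" parameter of the terms aggregation gives such documents a placeholder bucket. The helper name and the placeholder "N/A" below are illustrative, and the function only builds the request body.

```python
def nested_terms_with_missing(inner_field, placeholder="N/A"):
    """Request body sketch: count documents lacking inner_field under a placeholder key."""
    return {
        "aggs": {
            "src": {
                "terms": {"field": "dst_ip", "size": 1000},
                "aggs": {
                    "dst": {
                        "terms": {
                            "field": inner_field,
                            "size": 2000,
                            # Documents without a value for inner_field are
                            # treated as if they had this value.
                            "missing": placeholder,
                        }
                    }
                },
            }
        }
    }


request_body = nested_terms_with_missing("src_ip")
print(request_body["aggs"]["src"]["aggs"]["dst"]["terms"])
```

If the placeholder key then shows up in the previously empty result, the document simply lacks that field; if the list is still empty, the cause lies elsewhere.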

doc_count sub aggregation script

I have an aggregation which gives back a number of buckets. I'd like to further run a script which analyzes all the doc_counts and groups them into categories. Can someone give me an example of how to do this?
For example....
"aggregations": {
"updates_by_account": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "fbf60d008f0b2f9b3d1d8f1f7fe6e4262662a04c9bcbcc20d92316daade3c25c",
"doc_count": 2
},
{
"key": "916129338fb099792f7b1f414868d45c3fd1a0feb89e1bbeafe24bdb496bec0b",
"doc_count": 1
},
{
"key": "f1b256be780d983549e968f187daef882999fd05889dcab7f1c8c4769ed0996b",
"doc_count": 1
}
]
}
My query looks like this:
{
"aggs" : {
"updates_by_account": {
"terms": {
"script" : "doc['account_number'].value"
}
}
}
}
I'd like to do something like:
Number of users with 0-5 updates: 4
Number of users with 6-10 updates: 7
Number of users with 11 or more updates: 12
etc
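A minimal sketch of the grouping described above, done on the client over the returned buckets rather than inside Elasticsearch. The function name count_by_range, the range labels and the shortened account keys are assumptions made for illustration.

```python
def count_by_range(buckets, ranges):
    """Count how many buckets have a doc_count inside each range.

    ranges is a list of (label, low, high) tuples with inclusive bounds;
    a high of None means the range is open-ended.
    """
    totals = {label: 0 for label, _, _ in ranges}
    for bucket in buckets:
        count = bucket["doc_count"]
        for label, low, high in ranges:
            if count >= low and (high is None or count <= high):
                totals[label] += 1
                break  # each bucket belongs to at most one range
    return totals


# Placeholder keys standing in for the hashed account numbers.
buckets = [
    {"key": "account_a", "doc_count": 2},
    {"key": "account_b", "doc_count": 1},
    {"key": "account_c", "doc_count": 1},
]
ranges = [("0-5", 0, 5), ("6-10", 6, 10), ("11 or more", 11, None)]
totals = count_by_range(buckets, ranges)
for label, users in totals.items():
    print(f"Number of users with {label} updates: {users}")
```

An account with no updates never appears as a terms bucket, so the first range effectively counts accounts with 1 to 5 updates, and the terms aggregation needs a "size" large enough to return every account for the totals to be complete.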

Elasticsearch sub-aggregation excluding key from parent

I am currently doing an aggregation to get the top 20 terms in a given field and the top 5 co-occurring terms.
{
"aggs": {
"descTerms" : {
"terms" : {
"field" : "Desc as Marketed",
"exclude": "[a-z]{1}|and|the|with",
"size" : 20
},
"aggs" : {
"innerTerms" : {
"terms" : {
"field" : "Desc as Marketed",
"size" : 5
}
}
}
}
}
}
Which results in something like this:
"key": "bluetooth",
"doc_count": 11172,
"innerTerms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33700,
"buckets": [
{
"key": "bluetooth",
"doc_count": 11172
},
{
"key": "with",
"doc_count": 3827
}
I would like to exclude the parent key from the sub-aggregation, as it (obviously) always comes back as the top result, but I can't figure out how to do so.
In other words, I want the previous result to look like this:
"key": "bluetooth",
"doc_count": 11172,
"innerTerms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33700,
"buckets": [
{
"key": "with",
"doc_count": 3827
}
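One way to get this shape, sketched here as client-side post-processing rather than as a query feature, is to drop from each inner bucket list the entry whose key equals the parent key. The function name drop_parent_key is hypothetical, and the sample data is reduced to the single parent bucket shown above.

```python
def drop_parent_key(outer_buckets, inner_name):
    """Return copies of the outer buckets without the self-matching inner bucket."""
    cleaned = []
    for outer in outer_buckets:
        inner = [
            bucket for bucket in outer[inner_name]["buckets"]
            if bucket["key"] != outer["key"]
        ]
        updated = dict(outer)
        # Copy the inner aggregation and swap in the filtered bucket list.
        updated[inner_name] = dict(outer[inner_name], buckets=inner)
        cleaned.append(updated)
    return cleaned


outer_buckets = [{
    "key": "bluetooth",
    "doc_count": 11172,
    "innerTerms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 33700,
        "buckets": [
            {"key": "bluetooth", "doc_count": 11172},
            {"key": "with", "doc_count": 3827},
        ],
    },
}]
cleaned = drop_parent_key(outer_buckets, "innerTerms")
print(cleaned[0]["innerTerms"]["buckets"])
```

Requesting one extra inner bucket ("size" 6 instead of 5) keeps five co-occurring terms after the parent key is removed.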
