Histogram aggregation OR something else? - elasticsearch

Which aggregation should I use when I want the same functionality as histogram, but where I specify only the number of buckets instead of the interval?
Something like: give me aggs for price, split into 5 buckets...
I don't want to run a min+max aggregation first and compute the 5 intervals myself before sending my query, because that means one extra round trip to the server: first ask for min+max, then send the actual query.
STANDARD HISTOGRAM AGGS QUERY:
"aggs": {
  "prices": {
    "histogram": {
      "field": "variants.priceVat.d1",
      "interval": 500
    }
  }
}
STANDARD RESULT (min 10, max 850 = 2 buckets, because interval is 500):
"prices": {
  "doc_count": 67,
  "prices": {
    "buckets": [
      {
        "key": 10,
        "doc_count": 56
      },
      {
        "key": 500,
        "doc_count": 13
      }
    ]
  }
}
WHAT I WANT (five buckets with an automatic range: min 10, max 850, so each bucket interval is 168):
"prices": {
  "doc_count": 67,
  "prices": {
    "buckets": [
      {
        "key": 10,
        "doc_count": 42
      },
      {
        "key": 178,
        "doc_count": 10
      },
      {
        "key": 346,
        "doc_count": 4
      },
      {
        "key": 514,
        "doc_count": 7
      },
      {
        "key": 682,
        "doc_count": 2
      }
    ]
  }
}
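One possible answer, assuming a reasonably recent Elasticsearch (7.9 or later): the variable_width_histogram aggregation takes a target number of buckets instead of an interval, so the server chooses the boundaries in a single request. A sketch against the same field (unlike a plain histogram, the bucket widths are not guaranteed to be exactly equal):
"aggs": {
  "prices": {
    "variable_width_histogram": {
      "field": "variants.priceVat.d1",
      "buckets": 5
    }
  }
}
Each returned bucket carries min, key and max, so the chosen ranges can be read straight from the response.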

Related

Get count of distinct values for a field across all documents in elastic search

I have a field slices.code in my Elasticsearch mapping. slices is an array element and slices.code has various values like "ATF", "ORW", "HKL". slices is not a nested field, and I want to avoid changing it to nested. In each document there can be multiple occurrences of slices.code = ATF/ORW. I want to get all possible values of slices.code along with the total number of occurrences of each value across all documents. Something like this, where HKL appears in 2 documents but 3 times in total:
{
  "key": "HKL",
  "doc_count": 2,
  "total": {
    "value": 3
  }
},
{
  "key": "ATF",
  "doc_count": 3,
  "total": {
    "value": 7
  }
},
{
  "key": "ORW",
  "doc_count": 2,
  "total": {
    "value": 5
  }
}
I tried using a terms aggregation, but with that I only get doc_count; I don't get the total number of occurrences of each value. Below is the terms query that I tried:
{
  "size": 0,
  "aggs": {
    "distinct_colors": {
      "terms": {
        "field": "slices.code.keyword",
        "size": 65535
      }
    }
  }
}
Output that I received:
"buckets": [
  {
    "key": "HKG",
    "doc_count": 1
  },
  {
    "key": "MNL",
    "doc_count": 1
  },
  {
    "key": "PVG",
    "doc_count": 1
  },
  {
    "key": "TPE",
    "doc_count": 1
  }
]
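One workaround that keeps the non-nested mapping is to add a value_count sub-aggregation on the same field, as in the sketch below. Caveat: value_count counts every slices.code value in the documents of a bucket, not only occurrences of the bucket's own term, so the number only equals the true per-term total when a document's array does not mix different codes; exact per-term counts generally need a nested mapping or scripting.
{
  "size": 0,
  "aggs": {
    "distinct_colors": {
      "terms": {
        "field": "slices.code.keyword",
        "size": 65535
      },
      "aggs": {
        "total": {
          "value_count": {
            "field": "slices.code.keyword"
          }
        }
      }
    }
  }
}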

query on result aggregation elasticsearch

I have imported millions of documents into Elasticsearch. The documents look like the following:
"_source": {
  "mt": "w",
  "hour": 1
}
I want to find the number of hour values that occur more than 5 times.
For example, using a terms aggregation I get the following result:
"aggregations": {
  "hours": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": 1,
        "doc_count": 7
      },
      {
        "key": 4,
        "doc_count": 5
      },
      {
        "key": 5,
        "doc_count": 2
      }
    ]
  }
}
How do I find the count of hour values that occur more than 5 times?
Here it would be 1, because only hour=1 occurs more than 5 times.
You can use "min_doc_count" in the terms aggregation; it is inclusive, so "min_doc_count": 6 keeps only the buckets that occur more than 5 times. See the Elastic docs.
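A minimal sketch of that suggestion, with the hour field and a size of 24 assumed from the document above; since only buckets seen more than 5 times come back, the answer is simply the number of buckets in the response:
{
  "size": 0,
  "aggs": {
    "hours": {
      "terms": {
        "field": "hour",
        "size": 24,
        "min_doc_count": 6
      }
    }
  }
}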

Using Elasticsearch Date Histogram Aggregations to Count Dates in Array Properties

I have an Elasticsearch index with the following document:
{
  "dates": ["2014-01-31", "2014-02-01"]
}
I want to count all the instances of all the days in my index separated by year and month. I hoped to do this using a date histogram aggregation (which is successful for counting non-array properties):
{
  "from": 0,
  "size": 0,
  "aggregations": {
    "year": {
      "date_histogram": {
        "field": "dates",
        "interval": "1y",
        "format": "yyyy"
      },
      "aggregations": {
        "month": {
          "date_histogram": {
            "field": "dates",
            "interval": "1M",
            "format": "M"
          },
          "aggregations": {
            "day": {
              "date_histogram": {
                "field": "dates",
                "interval": "1d",
                "format": "d"
              }
            }
          }
        }
      }
    }
  }
}
However, I get the following aggregation results:
"aggregations": {
  "year": {
    "buckets": [
      {
        "key_as_string": "2014",
        "key": 1388534400000,
        "doc_count": 1,
        "month": {
          "buckets": [
            {
              "key_as_string": "1",
              "key": 1388534400000,
              "doc_count": 1,
              "day": {
                "buckets": [
                  {
                    "key_as_string": "31",
                    "key": 1391126400000,
                    "doc_count": 1
                  },
                  {
                    "key_as_string": "1",
                    "key": 1391212800000,
                    "doc_count": 1
                  }
                ]
              }
            },
            {
              "key_as_string": "2",
              "key": 1391212800000,
              "doc_count": 1,
              "day": {
                "buckets": [
                  {
                    "key_as_string": "31",
                    "key": 1391126400000,
                    "doc_count": 1
                  },
                  {
                    "key_as_string": "1",
                    "key": 1391212800000,
                    "doc_count": 1
                  }
                ]
              }
            }
          ]
        }
      }
    ]
  }
}
The "day" aggregation ignores the bucket of its parent "month" aggregation, so it processes both elements of the array in each bucket, counting each date twice. The results indicate that two dates appear in each month (and four total), which is obviously incorrect.
I've tried reducing my aggregation to a single date histogram (and bucketing the results in Java based on the key), but the doc_count comes back as one instead of the number of elements in the array (two in my example). Adding a value_count brings me back to my original issue, in which documents that overlap multiple buckets have their dates double-counted.
Is there a way to add a filter to the date histogram aggregations or otherwise modify them in order to count the elements in my date arrays correctly? Alternatively, does Elasticsearch have an option to unwind arrays like in MongoDB? I want to avoid using scripting due to security concerns.
Thanks,
Thomas
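One script-free workaround sometimes suggested: keep a single date_histogram bucketed by day with a full yyyy-MM-dd key, and regroup into year/month on the client. Each array element then falls into its own day bucket, so dates are no longer double-counted across parent buckets; the remaining caveat is that duplicate dates inside one document's array still count that document only once per bucket. The aggregation name per_day is a placeholder:
{
  "size": 0,
  "aggregations": {
    "per_day": {
      "date_histogram": {
        "field": "dates",
        "interval": "1d",
        "format": "yyyy-MM-dd"
      }
    }
  }
}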

ElasticSearch aggregation query customized field

I am just wondering whether, for an aggregation query in ES, it is possible to reuse the returned bucket values for my own purposes. For example, if I have a response like this:
{
  "key": "test",
  "doc_count": 2000,
  "child": {
    "value": 1000
  }
}
I want to get the ratio of doc_count to value, so I am looking for a way to generate another field/aggregation that does the math on those two values, like this:
{
  "key": "test",
  "doc_count": 2000,
  "child": {
    "value": 1000
  },
  "ratio": 2
}
or
{
  "key": "test",
  "doc_count": 1997,
  "child": {
    "value": 817
  },
  "buckets": [
    {
      "key": "ratio",
      "value": 2
    }
  ]
}
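If the parent is a multi-bucket aggregation (for example a terms aggregation producing the "test" bucket above), one way to get the ratio computed server-side is a bucket_script pipeline aggregation (Elasticsearch 2.0+). The field and aggregation names below (by_key, someKeyword, someChildField) are hypothetical placeholders:
{
  "size": 0,
  "aggs": {
    "by_key": {
      "terms": { "field": "someKeyword" },
      "aggs": {
        "child": {
          "value_count": { "field": "someChildField" }
        },
        "ratio": {
          "bucket_script": {
            "buckets_path": {
              "docs": "_count",
              "child": "child"
            },
            "script": "params.docs / params.child"
          }
        }
      }
    }
  }
}
Each terms bucket then carries an extra ratio value computed from its own doc_count and child metric.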

ElasticSearch - Get Statistics on Aggregation results

I have the following simple aggregation:
GET index1/type1/_search
{
  "size": 0,
  "aggs": {
    "incidentID": {
      "terms": {
        "field": "incidentID",
        "size": 5
      }
    }
  }
}
Results are:
"aggregations": {
  "incidentID": {
    "buckets": [
      {
        "key": "0A631EB1-01EF-DC28-9503-FC28FE695C6D",
        "doc_count": 233
      },
      {
        "key": "DF107D2B-CA1E-85C9-E01A-C966DC6F7051",
        "doc_count": 226
      },
      {
        "key": "60B8955F-38FD-8DFE-D374-4387668C8368",
        "doc_count": 220
      },
      {
        "key": "B787868A-F72E-63DC-D837-B3A864D9FFC6",
        "doc_count": 174
      },
      {
        "key": "C597EC5F-C60F-F3BA-61CB-4990F12C1893",
        "doc_count": 174
      }
    ]
  }
}
What I want to do is get the "statistics" of the "doc_count" returned. I want:
Min Value
Max Value
Average
Standard Deviation
No, this is not currently possible; here is the issue tracking support for it:
https://github.com/elasticsearch/elasticsearch/issues/8110
Obviously, it is possible to do this client side if you are able to pull the full list of all buckets into memory.
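For what it is worth, the pipeline aggregations that grew out of that issue make this possible server-side in later releases (2.0+). A sketch using extended_stats_bucket over the bucket doc counts; the sibling aggregation name doc_count_stats is a placeholder:
{
  "size": 0,
  "aggs": {
    "incidentID": {
      "terms": {
        "field": "incidentID",
        "size": 5
      }
    },
    "doc_count_stats": {
      "extended_stats_bucket": {
        "buckets_path": "incidentID>_count"
      }
    }
  }
}
The response then includes min, max, avg and std_deviation of the five doc_count values, which covers the list above.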
