Date_Histogram in elastic search - elasticsearch

Today I had a task that required aggregating data into 1-hour buckets, so I used the date_histogram aggregation in Elasticsearch. Below is the query:
GET test-2017.02.01/_search
{
"size" : 0,
"aggs": {
"range_aggs": {
"date_histogram": {
"field": "#timestamp",
"interval": "hour",
"format": "yyyy-MM-dd HH:mm"
}
}
}
}
I got the following result:
"aggregations": {
"range_aggs": {
"buckets": [
{
"key_as_string": "2017-02-01 12:00",
"key": 1485950400000,
"doc_count": 4027
},
{
"key_as_string": "2017-02-01 13:00",
"key": 1485954000000,
"doc_count": 0
}
]
}
}
Everything works so far because I ran this query against a single day. But when I run it across multiple days, I get separate keys for each day.
My question is: how can I get the data for the hour-of-day intervals (e.g. 9am to 10am, 10am to 11am, etc.) across all the days?

{
"aggs": {
"range_aggs": {
"date_histogram": {
"field": "#timestamp",
"interval": "day",
"min_doc_count": 1
},
"aggs": {
"range_aggs": {
"date_histogram": {
"field": "#timestamp",
"interval": "hour"
}
}
}
}
}
}
If you need the response grouped by hour across multiple days, try the query above: it nests an hourly date_histogram inside a daily one.
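The nested query still returns one hourly bucket per calendar day. If the goal is a single total per hour of day (9am to 10am over all days, and so on), one option is to fold the hourly buckets together on the client. The following is a minimal sketch, assuming the bucket shape shown in the question's response and UTC bucket keys; the function name `hour_of_day_totals` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_of_day_totals(buckets):
    """Sum doc_count per hour of day (0-23) over hourly date_histogram buckets.

    Assumes each bucket has an epoch-millisecond "key" and a "doc_count",
    and that hours are interpreted in UTC.
    """
    totals = defaultdict(int)
    for bucket in buckets:
        hour = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc).hour
        totals[hour] += bucket["doc_count"]
    return dict(sorted(totals.items()))


# The first two buckets come from the question; the third is a made-up
# bucket for the same hour on the following day.
buckets = [
    {"key": 1485950400000, "doc_count": 4027},
    {"key": 1485954000000, "doc_count": 0},
    {"key": 1486036800000, "doc_count": 10},
]
print(hour_of_day_totals(buckets))
```

If the data uses a different time zone, the hour extraction would need to use that zone instead of UTC.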

Related

elasticSearch - SUM aggregation returns double instead of long

I am running a multi-level aggregation on a LONG field (eventSize).
Is there any way to request a different output format for the sum aggregation without losing precision?
Below are the parts of the request used and response I got.
Query:
"aggs": {
"eventTermsAgg": {
"terms": {
"field": "eventType"
},
"aggs": {
"splitPerDayAgg": {
"date_histogram": {
"field": "date",
"fixed_interval": "1d",
"format": "yyyy-MM-dd"
},
"aggs": {
"eventSizeAgg": {
"sum": {
"field": "eventSize",
"format": "##.00"
}
}
}
}
}
}
}
Response:
"key": "112233",
"doc_count": 123,
"splitPerDayAgg": {
"buckets": [
{
"key_as_string": "2022-12-15",
"key": 123456789,
"doc_count": 3456,
"eventSizeAgg": {
"value": 1.01724077E8,
"value_as_string": "101724077.00"
}
}
I tried using "format": "##.00" in the sum aggregation parameters, but it only returns the same value as a string, losing the precision of the actual sum.
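For context, the sum aggregation reports its result as a double, and 1.01724077E8 is just scientific notation for 101724077. A double represents every integer up to 2^53 in magnitude exactly, so a sum in that range can be converted back to an integer on the client without loss. The following is a minimal sketch of that check; the helper name `sum_as_int` is hypothetical.

```python
MAX_EXACT_INT = 2 ** 53  # doubles represent all integers up to this magnitude exactly


def sum_as_int(value: float) -> int:
    """Convert a double-valued sum back to an int, refusing unsafe cases."""
    if abs(value) > MAX_EXACT_INT:
        raise ValueError("sum exceeds 2**53; the double may already be rounded")
    if not value.is_integer():
        raise ValueError("sum has a fractional part")
    return int(value)


print(sum_as_int(1.01724077E8))
```

Sums larger than 2^53 may already be rounded by the time they appear in the response, so this conversion cannot recover them.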

Elasticsearch: How set 'doc_count' of a FILTER-Aggregation in relation to total 'doc_count'

A seemingly trivial problem prompted me today to read the Elasticsearch documentation again diligently. So far, however, I have not come across a solution.
Question:
Is there a simple way to relate the doc_count of a filter aggregation to the total doc_count?
Here's a snippet from my search request JSON.
In the feature_occurrences aggregation I filter documents.
Now I want to calculate the ratio of filtered docs to all docs in each time bucket.
GET my_index/_search
{
"aggs": {
"time_buckets": {
"date_histogram": {
"field": "date",
"calendar_interval": "1d",
"min_doc_count": 0
},
"aggs": {
"feature_occurrences": {
"filter": {
"term": {
"x": "y"
}
}
},
"feature_occurrences_per_doc" : {
// feature_occurrences.doc_count / doc_count
}
Any ideas?
You can use bucket_script to calculate the ratio:
{
"aggs": {
"date": {
"date_histogram": {
"field": "#timestamp",
"interval": "hour"
},
"aggs": {
"feature_occurrences": {
"filter": {
"term": {
"cloud.region": "westeurope"
}
}
},
"ratio": {
"bucket_script": {
"buckets_path": {
"doc_count": "_count",
"features_count": "feature_occurrences._count"
},
"script": "params.features_count / params.doc_count"
}
}
}
}
}
}
Elastic bucket script doc:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html
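As an alternative, the same ratio can be computed on the client from a response that only contains the filter sub-aggregation. The following is a minimal sketch, assuming each bucket carries a `doc_count` and a `feature_occurrences` filter result as in the question; the function name `filter_ratios` and the sample buckets are hypothetical. It returns `None` for empty buckets, which matter here because the question uses "min_doc_count": 0.

```python
def filter_ratios(buckets, filter_name="feature_occurrences"):
    """Map each bucket key to filtered doc_count / total doc_count."""
    ratios = {}
    for bucket in buckets:
        total = bucket["doc_count"]
        matched = bucket[filter_name]["doc_count"]
        # Empty buckets have no meaningful ratio, so report None instead of dividing by zero.
        ratios[bucket["key"]] = matched / total if total else None
    return ratios


# Made-up buckets in the shape of a date_histogram response.
sample = [
    {"key": 1, "doc_count": 4, "feature_occurrences": {"doc_count": 1}},
    {"key": 2, "doc_count": 0, "feature_occurrences": {"doc_count": 0}},
]
print(filter_ratios(sample))
```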

How can I add additional terms in the Elasticsearch aggregation with datetime buckets?

I'm using the Elasticsearch 5.3 aggregation API and am unable to write a query that calculates a measure on a weekly date bucket split by a dimension (a term or field). I can build the date buckets and calculate the measure for each bucket, but I can't break it down by a term such as application or transaction. Elasticsearch 5+ has deprecated a lot of APIs from previous versions. Here is what I have so far; right now it aggregates the measure across all terms for each date bucket. I need to split it by some fields or terms. How do I go about doing that?
POST /index_name/_search?size=0
{
"aggs": {
"myname_Summary": {
"date_histogram": {
"field": "#timestamp",
"interval": "week",
"format": "yyyy-MM-dd",
"time_zone": "-04:00"
},
"aggs": {
"total_volume": {
"sum": {
"field": "volume"
}
}
}
}
}
}
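You can try this, which nests two terms aggregations (application, then transaction) inside the weekly date_histogram: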
{
"size": 0,
"aggs": {
"myname_Summary": {
"date_histogram": {
"field": "#timestamp",
"interval": "week",
"format": "yyyy-MM-dd",
"time_zone": "-04:00"
},
"aggs": {
"split": {
"terms": {
"field": "application",
"size": 10
},
"aggs": {
"transaction": {
"terms": {
"field": "transaction",
"size": 10
},
"aggs": {
"total_volume": {
"sum": {
"field": "volume"
}
}
}
}
}
}
}
}
}
}
Hope this helps
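The response to this query is nested three levels deep (week, application, transaction). A short client-side sketch for flattening it into rows may help when feeding the result to a table or chart. It assumes the aggregation names used in the answer (`myname_Summary`, `split`, `transaction`, `total_volume`); the function name `flatten_volume` and the sample response are made up for illustration.

```python
def flatten_volume(response):
    """Return (week, application, transaction, total_volume) rows."""
    rows = []
    for week in response["aggregations"]["myname_Summary"]["buckets"]:
        for app in week["split"]["buckets"]:
            for txn in app["transaction"]["buckets"]:
                rows.append(
                    (
                        week["key_as_string"],
                        app["key"],
                        txn["key"],
                        txn["total_volume"]["value"],
                    )
                )
    return rows


# Hypothetical response fragment in the shape the nested query would produce.
sample_response = {
    "aggregations": {
        "myname_Summary": {
            "buckets": [
                {
                    "key_as_string": "2017-05-01",
                    "split": {
                        "buckets": [
                            {
                                "key": "app-a",
                                "transaction": {
                                    "buckets": [
                                        {"key": "login", "total_volume": {"value": 12.0}},
                                        {"key": "search", "total_volume": {"value": 30.0}},
                                    ]
                                },
                            }
                        ]
                    },
                }
            ]
        }
    }
}
print(flatten_volume(sample_response))
```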

ElasticSearch Date Histogram Aggregation considering dates within a Document range

I'm working with documents in Elasticsearch that represent Alerts. These Alerts are activated for a time and then deactivated. They are similar to this schema.
{
"id": 189393,
"sensorId": "1111111",
"activationTime": 1462569310000,
"deactivationTime": 1462785524876
}
I would like to know the number of active alerts per day. To achieve this I want to perform a Date Histogram Aggregation that returns the days between activation and deactivation and the number of active alerts per day.
What I've tried so far is this query.
{
"query" : {
...
},
"aggs": {
"active_alerts": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
}
}
}
}
However, it returns only the day the alert was activated.
"aggregations": {
"active_alerts": {
"buckets": [
{
"key_as_string": "2016-05-06T00:00:00.000Z",
"key": 1462492800000,
"doc_count": 1
}
]
}
}
What I'd like it to return are the days between the activation and deactivation times and the number of active alerts per day, as shown below.
"aggregations": {
"active_alerts": {
"buckets": [
{
"key_as_string": "2016-05-06T00:00:00.000Z",
"key": 1462492800000,
"doc_count": 1
},
{
"key_as_string": "2016-05-07T00:00:00.000Z",
"key": 1462579200000,
"doc_count": 1
},
{
"key_as_string": "2016-05-08T00:00:00.000Z",
"key": 1462665600000,
"doc_count": 1
}
]
}
}
Thanks.
I finally found a solution: a script that emits an array of dates from the activation date through the deactivation date.
"aggs": {
"active_alerts": {
"date_histogram": {
"interval": "day",
"script": "Date d1 = new Date(doc['activationTime'].value); Date d2 = new Date(doc['deactivationTime'].value); List<Date> dates = new ArrayList<Date>(); (d1..d2).each { date-> dates.add(date.toTimestamp().getTime())}; return dates;"
}
}
}
Thanks.
I think you can only do it with a scripted date_histogram, where you programmatically add the "missing" days of the interval:
"aggs": {
"active_alerts": {
"date_histogram": {
"interval": "day",
"script": "counter=0;combinedDates=[];currentDate=doc.activationTime.date;while(currentDate.isBefore(doc.deactivationTime.date.getMillis())){combinedDates[counter++]=currentDate.getMillis();currentDate.addDays(1)};combinedDates[counter]=doc.deactivationTime.date.getMillis();return combinedDates"
}
}
}
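The logic both scripts implement (emit one value per day between activation and deactivation, then count per day) can be sketched outside Elasticsearch as follows. This is only an illustration of the technique, not a replacement for the scripts above; it assumes epoch-millisecond fields named as in the question and UTC day boundaries, and the function names are hypothetical.

```python
from collections import Counter

DAY_MS = 24 * 60 * 60 * 1000


def active_days(activation_ms, deactivation_ms):
    """Return the UTC day-start keys (epoch millis) an alert was active on."""
    first = activation_ms // DAY_MS * DAY_MS
    last = deactivation_ms // DAY_MS * DAY_MS
    return list(range(first, last + 1, DAY_MS))


def active_alerts_per_day(alerts):
    """Count active alerts per day across all alert documents."""
    counts = Counter()
    for alert in alerts:
        counts.update(active_days(alert["activationTime"], alert["deactivationTime"]))
    return dict(sorted(counts.items()))


alert = {"activationTime": 1462569310000, "deactivationTime": 1462785524876}
print(active_alerts_per_day([alert]))
```

Like the second script, this sketch counts the deactivation day itself, so the sample alert produces a fourth bucket (1462752000000) beyond the three listed in the question's desired output.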

Elasticsearch Date_Histogram does not cover entire filter

I'm using the ES date histogram and a weird behavior started happening; I'm wondering why.
This is the request I'm sending to Elasticsearch:
{
"from": 0,
"size": 0,
"query": {
"filtered": {
"filter": {
"and": [
{
"bool": {
"must": [
{
"range": {
"publishTime": {
"from": "2010-07-02T12:15:20.000Z",
"to": "2015-07-08T12:43:59.000Z"
}
}
}
]
}
}
]
}
}
},
"aggs": {
"agg|date_histogram|publishTime": {
"date_histogram": {
"field": "publishTime",
"interval": "1d",
"min_doc_count": 0
}
}
}
}
The result I'm getting is a list of buckets, and the first bucket is:
{
"key_as_string": "2010-08-24T00:00:00.000Z",
"key": 1282608000000,
"doc_count": 1
}
So I'm filtering from 2010-07-02 but getting results only from 2010-08-24 onward.
This is just an example; I have also seen this behavior with many more missing buckets (several months).
[edit]
This seems to correlate with the date of the first result: the first result in that time range is from 2010-08-24. But since I included "min_doc_count": 0, I expected to get buckets for the entire range.
min_doc_count is only sufficient for returning empty buckets between the first and last documents matched by your filter. If you want to get results for the entire range you need to use extended_bounds as well:
"aggs": {
"agg|date_histogram|publishTime": {
"date_histogram": {
"field": "publishTime",
"interval": "1d",
"min_doc_count": 0,
"extended_bounds": {
"min": 1278072920000,
"max": 1436359439000
}
}
}
}
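The min and max values above are the range filter's from and to timestamps expressed as epoch milliseconds. A small sketch for deriving them from the ISO strings, so the bounds stay in sync with the filter, is shown below. It assumes UTC timestamps in the exact format used in the question; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_utc):
    """Convert a 'YYYY-MM-DDTHH:MM:SS.mmmZ' UTC timestamp to epoch milliseconds."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def extended_bounds(range_from, range_to):
    """Build the extended_bounds object matching a range filter."""
    return {"min": to_epoch_millis(range_from), "max": to_epoch_millis(range_to)}


print(extended_bounds("2010-07-02T12:15:20.000Z", "2015-07-08T12:43:59.000Z"))
```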

Resources