ElasticSearch Date Histogram Aggregation considering dates within a Document range

I'm working with documents in Elasticsearch that represent Alerts. These Alerts are activated for a time and then deactivated. They follow a schema similar to this:
{
"id": 189393,
"sensorId": "1111111",
"activationTime": 1462569310000,
"deactivationTime": 1462785524876
}
I would like to know the number of active alerts per day. To achieve this I want to perform a Date Histogram Aggregation that returns the days between activation and deactivation and the number of active alerts per day.
What I've tried so far is this query.
{
"query" : {
...
},
"aggs": {
"active_alerts": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
}
}
}
}
However, it returns only the day the alert was activated.
"aggregations": {
"active_alerts": {
"buckets": [
{
"key_as_string": "2016-05-06T00:00:00.000Z",
"key": 1462492800000,
"doc_count": 1
}
]
}
}
What I'd like it to return are the days between activation and deactivation time and the number of active alerts per day, as shown below.
"aggregations": {
"active_alerts": {
"buckets": [
{
"key_as_string": "2016-05-06T00:00:00.000Z",
"key": 1462492800000,
"doc_count": 1
},
{
"key_as_string": "2016-05-07T00:00:00.000Z",
"key": 1462579200000,
"doc_count": 1
},
{
"key_as_string": "2016-05-08T00:00:00.000Z",
"key": 1462665600000,
"doc_count": 1
}
]
}
}
Thanks.

I finally found a solution using a script that emits an array of dates from the activation date to the deactivation date.
"aggs": {
"active_alerts": {
"date_histogram": {
"interval": "day",
"script": "Date d1 = new Date(doc['activationTime'].value); Date d2 = new Date(doc['deactivationTime'].value); List<Date> dates = new ArrayList<Date>(); (d1..d2).each { date-> dates.add(date.toTimestamp().getTime())}; return dates;"
}
}
}
Thanks.

I think you can only do it with a scripted date_histogram, where the script programmatically adds the "missing" days between activation and deactivation:
"aggs": {
"active_alerts": {
"date_histogram": {
"interval": "day",
"script": "counter=0;combinedDates=[];currentDate=doc.activationTime.date;while(currentDate.isBefore(doc.deactivationTime.date.getMillis())){combinedDates[counter++]=currentDate.getMillis();currentDate.addDays(1)};combinedDates[counter]=doc.deactivationTime.date.getMillis();return combinedDates"
}
}
}
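The idea behind both scripts, emitting one timestamp per day between activation and deactivation so that the document lands in every daily bucket, can be sketched outside Elasticsearch. The Python below is only an illustration under stated assumptions: the helper names `active_day_keys` and `active_alerts_per_day` are hypothetical, days are cut at UTC midnight, and the deactivation day is counted as active.

```python
from collections import Counter

DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds


def active_day_keys(activation_ms, deactivation_ms):
    """Return the UTC-midnight key of every day the alert was active."""
    first = activation_ms - activation_ms % DAY_MS      # floor to start of day
    last = deactivation_ms - deactivation_ms % DAY_MS   # floor to start of day
    return list(range(first, last + 1, DAY_MS))


def active_alerts_per_day(alerts):
    """Count active alerts per day, like doc_count in a daily histogram."""
    counts = Counter()
    for alert in alerts:
        counts.update(
            active_day_keys(alert["activationTime"], alert["deactivationTime"])
        )
    return dict(sorted(counts.items()))


alert = {"activationTime": 1462569310000, "deactivationTime": 1462785524876}
print(active_alerts_per_day([alert]))
```

For the sample alert from the question this yields one count for each day from 2016-05-06 through 2016-05-09, because the deactivation timestamp falls on 9 May (UTC).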

Related

Date_Histogram in elastic search

Today I had a task where I had to aggregate data bucketed by 1-hour intervals, so I used the date_histogram aggregation in Elasticsearch. Below is the query:
GET test-2017.02.01/_search
{
"size" : 0,
"aggs": {
"range_aggs": {
"date_histogram": {
"field": "#timestamp",
"interval": "hour",
"format": "yyyy-MM-dd HH:mm"
}
}
}
}
I got the below result:
"aggregations": {
"range_aggs": {
"buckets": [
{
"key_as_string": "2017-02-01 12:00",
"key": 1485950400000,
"doc_count": 4027
},
{
"key_as_string": "2017-02-01 13:00",
"key": 1485954000000,
"doc_count": 0
}
]
}
}
Everything is fine so far, since I ran this query for a single day. But when I run the query over multiple days, I get the keys per day.
My question is: how can I get the data for the hour intervals (e.g. 9am to 10am, 10am to 11am, etc.) across all the days?
{
"aggs": {
"range_aggs": {
"date_histogram": {
"field": "#timestamp",
"interval": "day",
"min_doc_count": 1
},
"aggs": {
"range_aggs": {
"date_histogram": {
"field": "#timestamp",
"interval": "hour"
}
}
}
}
}
}
If you need the response grouped by hour across days, try the query above.
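The nested query returns hourly buckets inside each daily bucket. If what is needed is one total per hour of day (9am to 10am summed over all days), the nested buckets can be folded together on the client. This is a minimal sketch, assuming the response has the nested `range_aggs` shape produced by that query and that hours are read in UTC; the function name `hour_of_day_counts` is hypothetical.

```python
from collections import Counter

HOUR_MS = 60 * 60 * 1000  # one hour in milliseconds


def hour_of_day_counts(aggregations, name="range_aggs"):
    """Sum doc_count per hour of day (0-23, UTC) across all daily buckets."""
    totals = Counter()
    for day in aggregations[name]["buckets"]:
        for hour in day[name]["buckets"]:
            hour_of_day = (hour["key"] // HOUR_MS) % 24
            totals[hour_of_day] += hour["doc_count"]
    return dict(sorted(totals.items()))


# Hand-written sample in the shape of the nested response (values are illustrative).
sample = {
    "range_aggs": {
        "buckets": [
            {"key": 1485907200000, "range_aggs": {"buckets": [
                {"key": 1485950400000, "doc_count": 4027},
                {"key": 1485954000000, "doc_count": 0},
            ]}},
            {"key": 1485993600000, "range_aggs": {"buckets": [
                {"key": 1486036800000, "doc_count": 10},
            ]}},
        ]
    }
}
print(hour_of_day_counts(sample))  # {12: 4037, 13: 0}
```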

Date_histogram aggregation returns bad results

I had to create an aggregation that counts the number of documents falling within date ranges.
My query looks like this:
{
"query":{
"range":{
"doc.createdTime":{
"gte":1483228800000,
"lte":1485907199999
}
}
},
"size":0,
"aggs":{
"by_day":{
"histogram":{
"field":"doc.createdTime",
"interval":"604800000ms",
"format":"yyyy-MM-dd'T'HH:mm:ssZZ",
"min_doc_count":0,
"extended_bounds":{
"min":1483228800000,
"max":1485907199999
}
}
}
}
}
The interval of 604800000 ms equals 7 days.
As a result, I receive these buckets:
"aggregations": {
"by_day": {
"buckets": [
{
"key_as_string": "2016-12-29T00:00:00+00:00",
"key": 1482969600000,
"doc_count": 0
},
{
"key_as_string": "2017-01-05T00:00:00+00:00",
"key": 1483574400000,
"doc_count": 603
},
{
"key_as_string": "2017-01-12T00:00:00+00:00",
"key": 1484179200000,
"doc_count": 3414
},
{
"key_as_string": "2017-01-19T00:00:00+00:00",
"key": 1484784000000,
"doc_count": 71551
},
{
"key_as_string": "2017-01-26T00:00:00+00:00",
"key": 1485388800000,
"doc_count": 105652
}
]
}
}
As you can see, my buckets start at 29/12/2016, even though the range query does not cover that date. I expected the buckets to start at 01/01/2017, as specified in the range query. The problem occurs only when the interval is a number of days greater than 1. With other intervals it works fine: I've tried day, month and hour.
I've also tried using a filter aggregation and only then applying date_histogram. The result is the same.
I'm using Elasticsearch version 2.2.0.
So the question is: how can I force date_histogram to start from the date I need?
Try adding the offset parameter, with its value calculated from the following formula:
value = start_date_in_ms % week_in_ms = 1483228800000 % 604800000 = 259200000
{
"query": {
"range": {
"doc.createdTime": {
"gte": 1483228800000,
"lte": 1485907199999
}
}
},
"size": 0,
"aggs": {
"by_day": {
"date_histogram": {
"field": "doc.createdTime",
"interval": "604800000ms",
"offset": "259200000ms",
"format": "yyyy-MM-dd'T'HH:mm:ssZZ",
"min_doc_count": 0,
"extended_bounds": {
"min": 1483228800000,
"max": 1485907199999
}
}
}
}
}
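The arithmetic behind the offset can be checked with a few lines of Python. This is a sketch, not Elasticsearch code: `offset_for_start` and `bucket_key` are hypothetical helpers, and `bucket_key` assumes that fixed-interval buckets are aligned to multiples of the interval counted from the epoch, shifted by the offset, which is consistent with the bucket keys shown in the question.

```python
WEEK_MS = 7 * 24 * 60 * 60 * 1000  # 604800000, the interval used above


def offset_for_start(start_ms, interval_ms):
    """Offset that makes a bucket boundary fall exactly on start_ms."""
    return start_ms % interval_ms


def bucket_key(value_ms, interval_ms, offset_ms=0):
    """Key of the bucket that value_ms falls into (assumed alignment rule)."""
    return (value_ms - offset_ms) // interval_ms * interval_ms + offset_ms


start = 1483228800000  # 2017-01-01T00:00:00Z, the "gte" bound of the range query
offset = offset_for_start(start, WEEK_MS)

print(offset)                              # 259200000
print(bucket_key(start, WEEK_MS))          # 1482969600000 (2016-12-29, no offset)
print(bucket_key(start, WEEK_MS, offset))  # 1483228800000 (2017-01-01, with offset)
```

Without the offset, the first bucket key is the 2016-12-29 value seen in the question's response; with the offset of 259200000 ms (3 days), the first bucket starts on the requested date.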

Dates not sorted for date histogram in elasticsearch

I'm running a query with only a date_histogram aggregation, like the one below:
{
"aggs": {
"mentionsAnalytics": {
"date_histogram": {
"field": "created_at",
"interval": "day"
}
}
}
}
The response I'm getting is not sorted by date. This is my response:
"aggregations": {
"mentionsAnalytics": {
"buckets": [
{
"key_as_string": "2014-10-17T00:00:00.000Z",
"key": 1413504000000,
"doc_count": 2
}
,
{
"key_as_string": "2015-09-07T00:00:00.000Z",
"key": 1441584000000,
"doc_count": 2
}
,
{
"key_as_string": "2015-09-29T00:00:00.000Z",
"key": 1443484800000,
"doc_count": 2
}
,
{
"key_as_string": "2015-11-09T00:00:00.000Z",
"key": 1447027200000,
"doc_count": 4
}
]
}
}
As you can see, the dates appear in random order. Can I get them sorted by modifying a parameter inside the date_histogram aggregation? I'd prefer not to issue a separate sort query. Is that possible?
{
"aggs": {
"mentionsAnalytics": {
"date_histogram": {
"field": "created_at",
"interval": "day",
"order": {
"_key": "desc"
}
}
}
}
}
The field of the aggregation response to sort on, together with the sort direction, should be given inside order.
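If changing the request is not an option, the same ordering can be applied to the returned buckets on the client. A minimal sketch, assuming the response shape shown in the question; `sort_buckets` is a hypothetical helper, not part of any client library.

```python
def sort_buckets(buckets, descending=True):
    """Return the buckets ordered by their epoch-millisecond key."""
    return sorted(buckets, key=lambda bucket: bucket["key"], reverse=descending)


buckets = [
    {"key_as_string": "2015-09-07T00:00:00.000Z", "key": 1441584000000, "doc_count": 2},
    {"key_as_string": "2014-10-17T00:00:00.000Z", "key": 1413504000000, "doc_count": 2},
    {"key_as_string": "2015-11-09T00:00:00.000Z", "key": 1447027200000, "doc_count": 4},
]

for bucket in sort_buckets(buckets):
    print(bucket["key_as_string"], bucket["doc_count"])
```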

How to use ElasticSearch to bucket historical data from midnight to now?

So I have an index with timestamps in the following format:
2015-03-20T12:00:00+0500
What I would like to do in the SQL equivalent is the following:
select date(timestamp), sum(orders)
from data
where time(timestamp) < time(now)
group by date(timestamp)
I know I need an aggregation, but for now I've tried the basic search query below, and I'm getting an error saying the query is malformed:
{
"size": 0,
"query":
{
"filtered":
{
"query":
{
"match_all" : {}
},
"filter":
{
"range":
{
"#timestamp":
{
"from": "00:00:01.000",
"to": "15:00:00.000"
}
}
}
}
}
}
You do indeed want an aggregation, specifically the date histogram aggregation. Something like:
{
"query": {"match_all": {}},
"aggs": {
"by_date": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
},
"aggs": {
"order_sum": {
"sum": {"field": "foo"}
}
}
}
}
}
First you have a bucketing aggregation that groups your documents by date, then inside it a metric aggregation that computes a value (in this case a sum) for each bucket.
This would return data of the form:
{
...
"aggregations": {
"by_date": {
"buckets": [
{
"key_as_string": "2015-03-01T00:00:00.000Z",
"key": 1425168000000,
"doc_count": 8644,
"order_sum": {
"value": 1234
}
},
{
"key_as_string": "2015-03-02T00:00:00.000Z",
"key": 1425254400000,
"doc_count": 8819,
"order_sum": {
"value": 45678
}
},
...
]
}
}
}
There is a good intro to aggregations on the elasticsearch blog (part 1 and part 2) if you want to do some more reading.
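To make the intended result concrete, here is a plain-Python sketch of what the SQL in the question computes: a per-date sum of orders, restricted to records whose time of day is before a cutoff. It does not query Elasticsearch. The record layout (a `timestamp` string in the format from the question and an integer `orders` field) and the function name `sum_orders_before` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, time


def sum_orders_before(records, cutoff):
    """Sum orders per calendar date for records earlier in the day than cutoff."""
    totals = defaultdict(int)
    for record in records:
        # Parses timestamps such as 2015-03-20T12:00:00+0500.
        ts = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%S%z")
        # Compare the wall-clock time in the record's own UTC offset.
        if ts.time() < cutoff:
            totals[ts.date().isoformat()] += record["orders"]
    return dict(totals)


records = [
    {"timestamp": "2015-03-20T12:00:00+0500", "orders": 3},
    {"timestamp": "2015-03-20T16:00:00+0500", "orders": 5},
    {"timestamp": "2015-03-21T09:30:00+0500", "orders": 2},
]
print(sum_orders_before(records, time(15, 0)))  # {'2015-03-20': 3, '2015-03-21': 2}
```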

ElasticSearch aggregation function

Is it possible to define an aggregation function in Elasticsearch?
E.g. for this data:
author  weekday  status
me      monday   ok
me      tuesday  ok
me      monday   bad
I want to get an aggregation based on author and weekday, with the concatenation of the status field as the value:
agg1  agg2     value
me    monday   ok,bad
me    tuesday  ok
I know you can do a count, but is it possible to define another function to use for aggregation?
EDIT/ANSWER: It looks like there is no multi-row aggregation support in ES, so we had to use sub-aggregations on the last field (see Akshay's example). If you need a more complex aggregation function, aggregate by id (note that you won't be able to use _id, so you'll have to duplicate it in another field). That way you'll be able to do advanced aggregation on the individual items in each bucket.
You can get roughly what you want by using the sub-aggregations available in 1.0. Assuming the documents are structured as author, weekday and status, you could use the aggregation below:
{
"size": 0,
"aggs": {
"author": {
"terms": {
"field": "author"
},
"aggs": {
"days": {
"terms": {
"field": "weekday"
},
"aggs": {
"status": {
"terms": {
"field": "status"
}
}
}
}
}
}
}
}
Which gives you the following result:
{
...
"aggregations": {
"author": {
"buckets": [
{
"key": "me",
"doc_count": 3,
"days": {
"buckets": [
{
"key": "monday",
"doc_count": 2,
"status": {
"buckets": [
{
"key": "bad",
"doc_count": 1
},
{
"key": "ok",
"doc_count": 1
}
]
}
},
{
"key": "tuesday",
"doc_count": 1,
"status": {
"buckets": [
{
"key": "ok",
"doc_count": 1
}
]
}
}
]
}
}
]
}
}
}
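The concatenated value from the question can then be assembled on the client from the nested buckets. A minimal sketch, assuming the response shape shown above; `concat_status` is a hypothetical helper, and the statuses come out in the order the buckets are returned.

```python
def concat_status(aggregations):
    """Flatten author -> days -> status buckets into (author, weekday, statuses) rows."""
    rows = []
    for author in aggregations["author"]["buckets"]:
        for day in author["days"]["buckets"]:
            statuses = ",".join(s["key"] for s in day["status"]["buckets"])
            rows.append((author["key"], day["key"], statuses))
    return rows


# The "aggregations" part of the response shown above.
aggregations = {
    "author": {"buckets": [{
        "key": "me", "doc_count": 3,
        "days": {"buckets": [
            {"key": "monday", "doc_count": 2, "status": {"buckets": [
                {"key": "bad", "doc_count": 1}, {"key": "ok", "doc_count": 1}]}},
            {"key": "tuesday", "doc_count": 1, "status": {"buckets": [
                {"key": "ok", "doc_count": 1}]}},
        ]},
    }]}
}
print(concat_status(aggregations))  # [('me', 'monday', 'bad,ok'), ('me', 'tuesday', 'ok')]
```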
