Dates not sorted for date histogram in Elasticsearch

I'm firing a query with only a date_histogram aggregation, like below:
{
"aggs": {
"mentionsAnalytics": {
"date_histogram": {
"field": "created_at",
"interval": "day"
}
}
}
}
Now the response I'm getting is not sorted according to date. Following is my response:
"aggregations": {
"mentionsAnalytics": {
"buckets": [
{
"key_as_string": "2014-10-17T00:00:00.000Z",
"key": 1413504000000,
"doc_count": 2
},
{
"key_as_string": "2015-09-07T00:00:00.000Z",
"key": 1441584000000,
"doc_count": 2
},
{
"key_as_string": "2015-09-29T00:00:00.000Z",
"key": 1443484800000,
"doc_count": 2
},
{
"key_as_string": "2015-11-09T00:00:00.000Z",
"key": 1447027200000,
"doc_count": 4
}
]
}
}
As you can see, the dates are occurring in random order. Can I make them sorted by modifying any parameters inside the date_histogram aggregation? I don't want to give a sort query as a separate query. Is that possible?

{
"aggs": {
"mentionsAnalytics": {
"date_histogram": {
"field": "created_at",
"interval": "day",
"order": {
"_key": "desc"
}
}
}
}
}
The field to sort on is specified inside order, along with the sort direction; _key refers to the bucket's date key, so "desc" returns the newest buckets first (use "asc" for oldest first).
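For completeness, here is a minimal sketch of sending that ordered request with the Python Elasticsearch client. The client address and the index name mentions_index are assumptions, not part of the original question. Note that date_histogram buckets already come back in ascending key order by default, so order is only needed to change the direction.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Same aggregation as above, buckets ordered newest-first by their date key.
query = {
    "size": 0,
    "aggs": {
        "mentionsAnalytics": {
            "date_histogram": {
                "field": "created_at",
                "interval": "day",  # "calendar_interval" on ES 7.x and later
                "order": {"_key": "desc"}
            }
        }
    }
}

resp = es.search(index="mentions_index", body=query)  # hypothetical index name
for bucket in resp["aggregations"]["mentionsAnalytics"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])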

Related

Elasticsearch - SUM aggregation returns double instead of long

I am doing a multi-level aggregation on a long field (eventSize).
Is there any way to request another format without losing precision in the output of the sum aggregation?
Below are the relevant parts of the request and the response I got.
Query:
"aggs": {
"eventTermsAgg": {
"terms": {
"field": "eventType"
},
"aggs": {
"splitPerDayAgg": {
"date_histogram": {
"field": "date",
"fixed_interval": "1d",
"format": "yyyy-MM-dd"
},
"aggs": {
"eventSizeAgg": {
"sum": {
"field": "eventSize",
"format": "##.00"
}
}
}
}
}
}
}
Response:
"key": "112233",
"doc_count": 123,
"splitPerDayAgg": {
"buckets": [
{
"key_as_string": "2022-12-15",
"key": 123456789,
"doc_count": 3456,
"eventSizeAgg": {
"value": 1.01724077E8,
"value_as_string": "101724077.00"
}
}
I tried using "format": "##.00" in the sum aggregation's parameters, but it only returns the same value as a string, losing the precision of the actual sum.
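As a side note (not part of the original question): the scientific-notation double in this response still represents the exact integer 101724077, since IEEE-754 doubles represent every integer exactly up to 2^53. A quick check in Python:

raw = 1.01724077E8            # the "value" field from the response above
print(raw == 101724077)       # True - no precision lost for this sum
print(int(raw))               # 101724077

# Doubles only start losing integer precision above 2**53 (about 9.0e15),
# so sums of long values below that can be cast back safely on the client.
print(float(2**53) == 2**53)          # True
print(float(2**53 + 1) == 2**53 + 1)  # False - first integer a double cannot represent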

Date_histogram aggregation returns bad results

I had to create an aggregation that counts the number of documents falling into date ranges.
My query looks like:
{
"query":{
"range":{
"doc.createdTime":{
"gte":1483228800000,
"lte":1485907199999
}
}
},
"size":0,
"aggs":{
"by_day":{
"histogram":{
"field":"doc.createdTime",
"interval":"604800000ms",
"format":"yyyy-MM-dd'T'HH:mm:ssZZ",
"min_doc_count":0,
"extended_bounds":{
"min":1483228800000,
"max":1485907199999
}
}
}
}
}
The interval of 604800000 ms equals 7 days.
As a result, I receive these buckets:
"aggregations": {
"by_day": {
"buckets": [
{
"key_as_string": "2016-12-29T00:00:00+00:00",
"key": 1482969600000,
"doc_count": 0
},
{
"key_as_string": "2017-01-05T00:00:00+00:00",
"key": 1483574400000,
"doc_count": 603
},
{
"key_as_string": "2017-01-12T00:00:00+00:00",
"key": 1484179200000,
"doc_count": 3414
},
{
"key_as_string": "2017-01-19T00:00:00+00:00",
"key": 1484784000000,
"doc_count": 71551
},
{
"key_as_string": "2017-01-26T00:00:00+00:00",
"key": 1485388800000,
"doc_count": 105652
}
]
}
}
As you can see, my buckets start from 29/12/2016, even though the range query does not cover that date. I expect my buckets to start from 01/01/2017, as specified in the range query. This problem only occurs when the interval spans more than one day; I've tried day, month and hour intervals and those work fine.
I've also tried a filter aggregation followed by the date_histogram. The result is the same.
I'm using Elasticsearch 2.2.0 version.
So the question is: how can I force the date_histogram to start from the date I need?
Try adding the offset param, with its value calculated from this formula:
offset = start_date_in_ms % week_in_ms = 1483228800000 % 604800000 = 259200000
{
"query": {
"range": {
"doc.createdTime": {
"gte": 1483228800000,
"lte": 1485907199999
}
}
},
"size": 0,
"aggs": {
"by_day": {
"date_histogram": {
"field": "doc.createdTime",
"interval": "604800000ms",
"offset": "259200000ms",
"format": "yyyy-MM-dd'T'HH:mm:ssZZ",
"min_doc_count": 0,
"extended_bounds": {
"min": 1483228800000,
"max": 1485907199999
}
}
}
}
}
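For reference, a quick way to compute that offset programmatically (a small sketch, not from the original answer; it assumes the range start is given in epoch milliseconds):

WEEK_MS = 7 * 24 * 60 * 60 * 1000      # 604800000 ms, the interval above

range_start_ms = 1483228800000          # 2017-01-01T00:00:00Z
offset_ms = range_start_ms % WEEK_MS    # 259200000

# Pass this as the date_histogram "offset" parameter, e.g. "259200000ms".
print(f"{offset_ms}ms")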

Is it possible to return other fields when you aggregate results in Elasticsearch?

Here is the mapping of my index PublicationsLikes:
id : String
account : String
api : String
date : Date
I'm currently running an aggregation in ES where I group the result counts by the id (of the publication).
{
"key": "<publicationId-1>",
"doc_count": 25
},
{
"key": "<publicationId-2>",
"doc_count": 387
},
{
"key": "<publicationId-3>",
"doc_count": 7831
}
The returned "key" (the id) is an information but I also need to select another fields of the publication like account and api. A bit like that:
{
"key": "<publicationId-1>",
"api": "Facebook",
"accountId": "65465z4fe6ezf456ezdf",
"doc_count": 25
},
{
"key": "<publicationId-2>",
"api": "Twitter",
"accountId": "afaez5f4eaz",
"doc_count": 387
}
How can I manage this?
Thanks.
This requirement is best achieved with a top_hits sub-aggregation, where you can sort the documents in each bucket, pick the first one, and control which fields are returned:
{
"size": 0,
"aggs": {
"publications": {
"terms": {
"field": "id"
},
"aggs": {
"sample": {
"top_hits": {
"size": 1,
"_source": ["api","accountId"]
}
}
}
}
}
}
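A sketch of reading the extra fields back out of each bucket on the client side, using the Python client. The index name publicationslikes and the client setup are assumptions; the path into the top_hits result (sample.hits.hits[0]._source) follows the standard response shape:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

query = {
    "size": 0,
    "aggs": {
        "publications": {
            "terms": {"field": "id"},
            "aggs": {
                "sample": {
                    "top_hits": {"size": 1, "_source": ["api", "accountId"]}
                }
            }
        }
    }
}

resp = es.search(index="publicationslikes", body=query)  # hypothetical index name

for bucket in resp["aggregations"]["publications"]["buckets"]:
    # top_hits places its single sample document under sample.hits.hits
    doc = bucket["sample"]["hits"]["hits"][0]["_source"]
    print(bucket["key"], bucket["doc_count"], doc.get("api"), doc.get("accountId"))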
You can use a sub-aggregation for this.
GET /PublicationsLikes/_search
{
"aggs" : {
"ids": {
"terms": {
"field": "id"
},
"aggs": {
"accounts": {
"terms": {
"field": "account",
"size": 1
}
}
}
}
}
}
Your result will not be exactly what you want, but it will be similar:
{
"key": "<publicationId-1>",
"doc_count": 25,
"accounts": {
"buckets": [
{
"key": "<account-1>",
"doc_count": 25
}
]
}
},
{
"key": "<publicationId-2>",
"doc_count": 387,
"accounts": {
"buckets": [
{
"key": "<account-2>",
"doc_count": 387
}
]
}
},
{
"key": "<publicationId-3>",
"doc_count": 7831,
"accounts": {
"buckets": [
{
"key": "<account-3>",
"doc_count": 7831
}
]
}
}
You can also check the Elasticsearch documentation to find more information.
Thanks both for your quick replies. I think the first solution is the most "beautiful" (in terms of the request, but also for retrieving the results), though both seem to be sub-aggregation queries.
{
"size": 0,
"aggs": {
"publications": {
"terms": {
"size": 0,
"field": "publicationId"
},
"aggs": {
"sample": {
"top_hits": {
"size": 1,
"_source": ["accountId", "api"]
}
}
}
}
}
}
I think I must be careful with the size=0 parameter, so, since I work with the Java API, I decided to use Integer.MAX_VALUE instead of 0.
Thanks a lot guys.

ElasticSearch Date Histogram Aggregation considering dates within a Document range

I'm working with documents in Elasticsearch that represent Alerts. These Alerts are activated for a time and then deactivated. They are similar to this schema.
{
"id": 189393,
"sensorId": "1111111",
"activationTime": 1462569310000,
"deactivationTime": 1462785524876,
}
I would like to know the number of active alerts per day. To achieve this I want to perform a Date Histogram Aggregation that returns the days between activation and deactivation and the number of active alerts per day.
What I've tried so far is this query.
{
"query" : {
...
},
"aggs": {
"active_alerts": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
}
}
}
}
However, it returns just the day it was activated.
"aggregations": {
"active_alerts": {
"buckets": [
{
"key_as_string": "2016-05-06T00:00:00.000Z",
"key": 1462492800000,
"doc_count": 1
}
]
}
}
What I'd like to return are the days between activation and deactivation time and the number of active alerts per day, as shown below:
"aggregations": {
"active_alerts": {
"buckets": [
{
"key_as_string": "2016-05-06T00:00:00.000Z",
"key": 1462492800000,
"doc_count": 1
},
{
"key_as_string": "2016-05-07T00:00:00.000Z",
"key": 1462579200000,
"doc_count": 1
},
{
"key_as_string": "2016-05-08T00:00:00.000Z",
"key": 1462665600000,
"doc_count": 1
}
]
}
}
Thanks.
Finally I've found a solution via a script: one that emits an array of dates from the activation date until the deactivation date.
"aggs": {
"active_alerts": {
"date_histogram": {
"interval": "day",
"script": "Date d1 = new Date(doc['activationTime'].value); Date d2 = new Date(doc['deactivationTime'].value); List<Date> dates = new ArrayList<Date>(); (d1..d2).each { date-> dates.add(date.toTimestamp().getTime())}; return dates;"
}
}
}
Thanks.
I think you can only do it with a scripted date_histogram, where you programmatically add the "missing" days from the interval:
"aggs": {
"active_alerts": {
"date_histogram": {
"interval": "day",
"script": "counter=0;combinedDates=[];currentDate=doc.activationTime.date;while(currentDate.isBefore(doc.deactivationTime.date.getMillis())){combinedDates[counter++]=currentDate.getMillis();currentDate.addDays(1)};combinedDates[counter]=doc.deactivationTime.date.getMillis();return combinedDates"
}
}
}
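Both scripts boil down to the same idea: expand each alert into one timestamp per day between activation and deactivation, so the histogram counts the alert once for every day it was active. Below is a client-side illustration of that expansion in Python (purely for clarity, not part of either answer; whether the deactivation day itself is counted is a boundary choice):

from datetime import datetime, timedelta, timezone

def active_day_keys(activation_ms: int, deactivation_ms: int) -> list[int]:
    """One epoch-millisecond key per UTC day touched by the alert's active window."""
    day = datetime.fromtimestamp(activation_ms / 1000, tz=timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    end = datetime.fromtimestamp(deactivation_ms / 1000, tz=timezone.utc)
    keys = []
    while day <= end:
        keys.append(int(day.timestamp() * 1000))
        day += timedelta(days=1)
    return keys

# Example alert from the question
print(active_day_keys(1462569310000, 1462785524876))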

Using Elasticsearch Date Histogram Aggregations to Count Dates in Array Properties

I have an elasticsearch index with the following document:
{
dates: ["2014-01-31","2014-02-01"]
}
I want to count all the instances of all the days in my index separated by year and month. I hoped to do this using a date histogram aggregation (which is successful for counting non-array properties):
{
"from": 0,
"size": 0,
"aggregations": {
"year": {
"date_histogram": {
"field": "dates",
"interval": "1y",
"format": "yyyy"
},
"aggregations": {
"month": {
"date_histogram": {
"field": "dates",
"interval": "1M",
"format": "M"
},
"aggregations": {
"day": {
"date_histogram": {
"field": "dates",
"interval": "1d",
"format": "d"
}
}
}
}
}
}
}
}
However, I get the following aggregation results:
"aggregations": {
"year": {
"buckets": [
{
"key_as_string": "2014",
"key": 1388534400000,
"doc_count": 1,
"month": {
"buckets": [
{
"key_as_string": "1",
"key": 1388534400000,
"doc_count": 1,
"day": {
"buckets": [
{
"key_as_string": "31",
"key": 1391126400000,
"doc_count": 1
},
{
"key_as_string": "1",
"key": 1391212800000,
"doc_count": 1
}
]
}
},
{
"key_as_string": "2",
"key": 1391212800000,
"doc_count": 1,
"day": {
"buckets": [
{
"key_as_string": "31",
"key": 1391126400000,
"doc_count": 1
},
{
"key_as_string": "1",
"key": 1391212800000,
"doc_count": 1
}
]
}
}
]
}
}
]
}
}
The "day" aggregation ignores the bucket of its parent "month" aggregation, so it processes both elements of the array in each bucket, counting each date twice. The results indicate that two dates appear in each month (and four total), which is obviously incorrect.
I've tried reducing my aggregation to a single date histogram (and bucketing the results in Java based on the key), but the doc_count comes back as one instead of the number of elements in the array (two in my example). Adding a value_count brings me back to my original issue, in which documents that span multiple buckets have their dates double-counted.
Is there a way to add a filter to the date histogram aggregations or otherwise modify them in order to count the elements in my date arrays correctly? Alternatively, does Elasticsearch have an option to unwind arrays like in MongoDB? I want to avoid using scripting due to security concerns.
Thanks,
Thomas
