Elastic search date_histogram extended_bounds - elasticsearch

I want to get a date_histogram for a specific period. How do I restrict the date range? Should I use the extended_bounds parameter? For example, I want a date_histogram between '2016-08-01' and '2016-08-31' with an interval of one day. I query with this expression:
{
"aggs": {
"cf_loan": {
"date_histogram": {
"field": "createDate",
"interval": "day",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-08-01",
"max": "2016-08-31"
}
}
}
}
}
But the date_histogram I get back is not restricted to that range.

You're almost there. You need to add a range query so that only documents whose createDate field falls within the desired range are selected:
{
"query": {
"range": {
"createDate": {
"gte": "2016-08-01T00:00:00.000Z",
"lt": "2016-09-01T00:00:00.000Z"
}
}
},
"aggs": {
"cf_loan": {
"date_histogram": {
"field": "createDate",
"interval": "day",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-08-01",
"max": "2016-08-31"
}
}
}
}
}
The role of the extended_bounds parameter is to make sure you get daily buckets from min to max even if there are no documents in them. It only widens the range of buckets; it does not filter out documents that fall outside min and max, which is why the range query is needed. For instance, say you have 1 document each day between 2016-08-04 and 2016-08-28. Without the extended_bounds parameter, you'd get 25 buckets (2016-08-04, 2016-08-05, 2016-08-06, ..., 2016-08-28).
With the extended_bounds parameter, you'll also get the following buckets but with 0 documents:
2016-08-01
2016-08-02
2016-08-03
2016-08-29
2016-08-30
2016-08-31
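The effect described above can be reproduced locally. The following is only a rough Python sketch of the bucketing behaviour, not Elasticsearch code: the helper name daily_buckets and the sample dates are assumptions made for illustration.

```python
from datetime import date, timedelta


def daily_buckets(doc_dates, bounds=None):
    """Mimic a daily histogram with min_doc_count=0 (zero-filled buckets).

    `bounds` plays the role of extended_bounds: it can only widen the
    bucket range, it never drops documents outside of it.
    """
    counts = {}
    for d in doc_dates:
        counts[d] = counts.get(d, 0) + 1
    if not counts and bounds is None:
        return []
    days = list(counts)
    if bounds:
        days += list(bounds)
    first, last = min(days), max(days)
    buckets = []
    day = first
    while day <= last:
        buckets.append((day.isoformat(), counts.get(day, 0)))
        day += timedelta(days=1)
    return buckets


# One document per day from 2016-08-04 to 2016-08-28 (25 days).
docs = [date(2016, 8, 4) + timedelta(days=i) for i in range(25)]
without_bounds = daily_buckets(docs)
with_bounds = daily_buckets(docs, bounds=(date(2016, 8, 1), date(2016, 8, 31)))
print(len(without_bounds), len(with_bounds))
```

With the bounds, the six extra days show up as buckets with a count of 0, while the 25 populated days are unchanged.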

Related

Find lowest price each day in list of documents

we store in elasticsearch documents with following fields
a (keyword)
b (keyword)
c (keyword)
date (date-time)
p (long)
How can we find the lowest value of p for each date between 12/1/1920 and 12/31/1920 (Pacific time zone)?
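You need to combine two elements: a range query and a sub-aggregation. First you create one bucket per day with a date_histogram aggregation, then a min metric aggregation nested inside it takes the minimum of p in each bucket.
Here's what the query should look like (if the day boundaries should follow Pacific time, add "time_zone": "America/Los_Angeles" to both the range query and the date_histogram):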
{
"query": {
"range": {
"date": {
"gte": "1920-12-01",
"lte": "1920-12-31"
}
}
},
"aggs": {
"daily": {
"date_histogram": {
"field": "date",
"calendar_interval": "day"
},
"aggs": {
"min_price": {
"min": {"field": "p"}
}
}
}
}
}
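Once the search returns, the per-day minimum sits under each bucket of the daily aggregation. Here is a minimal Python sketch of reading it out, assuming the response has already been parsed into a dict; the sample response is hand-written for illustration and the helper name min_per_day is hypothetical.

```python
def min_per_day(response):
    """Map each daily bucket key to its minimum p (None for empty buckets)."""
    buckets = response["aggregations"]["daily"]["buckets"]
    return {b["key_as_string"]: b["min_price"]["value"] for b in buckets}


# Hypothetical response fragment, written by hand for illustration only.
sample_response = {
    "aggregations": {
        "daily": {
            "buckets": [
                {
                    "key_as_string": "1920-12-01T00:00:00.000Z",
                    "doc_count": 3,
                    "min_price": {"value": 12.0},
                },
                {
                    "key_as_string": "1920-12-02T00:00:00.000Z",
                    "doc_count": 0,
                    "min_price": {"value": None},
                },
            ]
        }
    }
}

lowest = min_per_day(sample_response)
print(lowest)
```

Days without documents still produce a bucket, and their minimum comes back as null, so callers should be prepared for None.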

Date_histogram and top_hits from unique values only

I am trying to do a date_histogram aggregation to show a sum of Duration for each hour.
I have the following documents:
{
"EntryTimestamp": 1567029600000,
"Username": "johndoe",
"UpdateTimestamp": 1567029600000,
"Duration": 10,
"EntryID": "ASDF1234"
}
The following works very well, but my problem is that sometimes multiple documents appear with the same EntryID. Ideally I would add a top_hits somehow and order by UpdateTimestamp, because I need only the last updated document for each unique EntryID. I'm not sure how to add this to my query.
{
"size": 0,
"query": {
"bool": {
"filter": [{
"range": {
"EntryTimestamp": {
"gte": "1567029600000",
"lte": "1567065599999",
"format": "epoch_millis"
}
}
}, {
"query_string": {
"analyze_wildcard": true,
"query": "Username.keyword=johndoe"
}
}
]
}
},
"aggs": {
"2": {
"date_histogram": {
"interval": "1h",
"field": "EntryTimestamp",
"min_doc_count": 0,
"extended_bounds": {
"min": "1567029600000",
"max": "1567065599999"
},
"format": "epoch_millis"
},
"aggs": {
"1": {
"sum": {
"field": "Duration"
}
}
}
}
}
}
I think you'll need a top_hits aggregation inside a terms aggregation.
The terms aggregation will get the distinct EntryIDs and the top hit aggregation inside of it will get only the most recent document (based on UpdateTimestamp) for each bucket (each distinct value) of the terms aggregation.
I don't have the exact syntax for your case, and I believe you might run into some limits on the number of sub-aggregations (I ran into some limitations with advanced aggregations in the past).
You can see this post for more info on that; I hope it proves helpful to you.
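If the terms/top_hits combination turns out to be awkward, the same result can be computed on the client after fetching the matching documents. The sketch below is that alternative, not the aggregation suggested above: it keeps the most recently updated document per EntryID and then sums Duration per hour. The function name hourly_duration and the second and third sample documents are assumptions for illustration.

```python
from collections import defaultdict

HOUR_MS = 3_600_000  # one hour in epoch milliseconds


def hourly_duration(docs):
    """Sum Duration per hour, counting only the latest version of each EntryID."""
    latest = {}
    for doc in docs:
        seen = latest.get(doc["EntryID"])
        if seen is None or doc["UpdateTimestamp"] > seen["UpdateTimestamp"]:
            latest[doc["EntryID"]] = doc

    totals = defaultdict(int)
    for doc in latest.values():
        # Round the entry time down to the start of its hour.
        hour_start = doc["EntryTimestamp"] - doc["EntryTimestamp"] % HOUR_MS
        totals[hour_start] += doc["Duration"]
    return dict(totals)


docs = [
    {"EntryID": "ASDF1234", "EntryTimestamp": 1567029600000,
     "UpdateTimestamp": 1567029600000, "Duration": 10},
    # Hypothetical later update of the same entry.
    {"EntryID": "ASDF1234", "EntryTimestamp": 1567029600000,
     "UpdateTimestamp": 1567029660000, "Duration": 15},
    # Hypothetical second entry in the following hour.
    {"EntryID": "QWER5678", "EntryTimestamp": 1567033200000,
     "UpdateTimestamp": 1567033200000, "Duration": 7},
]

totals = hourly_duration(docs)
print(totals)
```

This trades network transfer for simplicity, so it is only reasonable when the filtered result set is small enough to page through.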

elasticsearch aggregation and get columns value

I want to get the values of the bps and pps columns over time through an aggregation. With my current query I only get a document count per bucket. Is there a way to get the value of a specific column for each interval?
This is my code:
{
"size": 0,
"aggs": {
"group_by_state": {
"date_histogram": {
"field": "reg-date",
"interval": "day",
"min_doc_count": 0,
"extended_bounds": {
"min": "2018-10-01T00:00:00",
"max": "2018-10-07T23:59:59"
}
}
}
}
}
Is there a way to get a value other than the number of documents that match this query?
Getting the "value" of a single field once aggregated doesn't make sense anymore - if your one-day aggregation has three documents, then a single field doesn't have one "value" anymore over that one-day period.
Instead you can use a sub-aggregation to compute an aggregate value for the day, such as a sum or average:
{
"aggs": {
"group_by_state": {
"date_histogram": {
"field": "reg-date",
"interval": "day",
"min_doc_count": 0,
"extended_bounds": {
"min": "2018-10-01T00:00:00",
"max": "2018-10-07T23:59:59"
}
},
# sub-aggregations
"aggs": {
"bps_average": {
"avg": {
"field": "bps"
}
},
"pps_average": {
"avg": {
"field": "pps"
}
}
}
}
}
}
Then each bucket will have fields bps_average and pps_average. If you replace avg with sum you'll get a sum instead, and there are many other metrics aggregations.
The Elasticsearch guide has a good section on aggregations and nesting.

How to calculate the number of empty bucket when aggregating by days?

I want to get the number of days that a person stayed in a town in May (Month equal to 5).
This is my query, but it gives me the number of entries in myindex that have PersonID equal to 111 and Month equal to 5. For example, this query may return something like 90, even though a month has at most 31 days.
GET myindex/_search
{
"size":0,
"query": {
"bool": {
"must": [
{ "match": {
"PersonID": "111"
}},
{ "match": {
"Month": "5"
}}
]
} },
"aggs": {
"stay_days": {
"terms" : {
"field": "Month"
}
}
}
}
In myindex I have a DateTime field holding the date and time at which a person was registered by a camera, e.g. "2017-05-01T00:30:08". During a single day the same person may pass the camera several times, but that day should be counted only once.
How can I update my query to calculate the number of days per month instead of the number of captures by the camera?
Assuming your DateTime field is called datetime, one option to consider is a date_histogram aggregation:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"PersonID": "111"
}
},
{
"range": {
"datetime": {
"gte": "2017-05-01",
"lt": "2017-06-01"
}
}
}
]
}
},
"aggregations": {
"my_day_histogram": {
"date_histogram": {
"field": "datetime",
"interval": "1d",
"min_doc_count": 1
}
}
}
}
Note that in the must clause I used a range query on the datetime field instead of the match on Month (this is not strictly necessary, but you may consider the Month field redundant). You may also need to adjust the date format in the range query to match your mapping.
my_day_histogram: divides the data into one bucket per day by setting "interval": "1d".
"min_doc_count": 1 removes buckets that contain zero documents.
Another approach is to remove the range/match on month 5 and extend the histogram to every day of the year.
The daily histogram can also be nested inside a monthly histogram, like so:
"aggregations": {
"my_month_histogram": {
"date_histogram": {
"field": "datetime",
"interval": "1M",
"min_doc_count": 1
},
"aggregations": {
"my_day_histogram": {
"date_histogram": {
"field": "datetime",
"interval": "1d"
}
}
}
}
}
Either way, you then need to count the buckets in the response: the number of non-empty daily buckets is the number of days.
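Counting the buckets is a one-liner once the response is parsed. This is a small Python sketch under the assumption that the response is a dict shaped like a standard date_histogram result; the helper name count_days and the sample buckets are made up for illustration.

```python
def count_days(response, agg_name="my_day_histogram"):
    """Count daily buckets that contain at least one document."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return sum(1 for b in buckets if b["doc_count"] > 0)


# Hypothetical response fragment: the person was seen on two distinct days.
sample_response = {
    "aggregations": {
        "my_day_histogram": {
            "buckets": [
                {"key_as_string": "2017-05-01T00:00:00.000Z", "doc_count": 4},
                {"key_as_string": "2017-05-02T00:00:00.000Z", "doc_count": 0},
                {"key_as_string": "2017-05-03T00:00:00.000Z", "doc_count": 1},
            ]
        }
    }
}

days_in_town = count_days(sample_response)
print(days_in_town)
```

Filtering on doc_count keeps the count correct even when the histogram was requested without "min_doc_count": 1 and therefore contains empty buckets.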

How to limit a date histogram aggregation of nested documents to a specific date range?

Version
Using Elasticsearch 1.7.2
Objective
I would like to create a graph of the number of predictions made by users per day for the last n days. In this case, 10 days.
Current query
{
"size": 0,
"aggs": {
"predictions": {
"nested": {
"path": "user_answers"
},
"aggs": {
"predictions_over_time": {
"date_histogram": {
"field": "user_answers.created",
"interval": "day",
"format": "yyyy-MM-dd",
"min_doc_count": 0
}
}
}
}
}
}
Issue
This query will return a histogram but will return buckets for all available dates across all documents. It doesn't restrict to a specific date range.
What have I tried?
I've tried a number of approaches to solving this, all of which have failed.
* Range filter, then histogram that
* Date range aggregation, then histogram the buckets
* Using extended_bounds with full dates, now-10d, and timestamps
* Trying a range filter inside the histogram aggregation
Any guidance would be appreciated! Thanks.
A query didn't work for me in that situation. What I used instead is a third level of aggs:
{
"size": 0,
"aggs": {
"user_answers": {
"nested": { "path": "user_answers" },
"aggs": {
"timed_user_answers": {
"filter": {
"range": {
"user_answers.created": {
"gte": "now-10d",
"lte": "now"
}
}
},
"aggs": {
"predictions_over_time": {
"date_histogram": {
"field": "user_answers.created",
"interval": "day",
"format": "yyyy-MM-dd",
"min_doc_count": 0
}
}
}
}
}
}
}
}
One aggs specifies nested, one specifies the filter, and the last specifies the actual aggregation. I don't know why the syntax works this way, but you don't seem to be able to use two aggregation types in the same aggs entry.
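The three levels are easier to keep straight when the body is built programmatically. Below is a hedged Python sketch that assembles such a request body for the last n days, assuming the window should run from now-{n}d to now; the function name predictions_histogram is hypothetical, and the field and aggregation names are taken from the question.

```python
import json


def predictions_histogram(days):
    """Build a nested -> filter -> date_histogram body for the last `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return {
        "size": 0,
        "aggs": {
            "user_answers": {
                # Level 1: step into the nested documents.
                "nested": {"path": "user_answers"},
                "aggs": {
                    "timed_user_answers": {
                        # Level 2: keep only nested documents in the window.
                        "filter": {
                            "range": {
                                "user_answers.created": {
                                    "gte": f"now-{days}d",
                                    "lte": "now",
                                }
                            }
                        },
                        "aggs": {
                            # Level 3: the histogram itself.
                            "predictions_over_time": {
                                "date_histogram": {
                                    "field": "user_answers.created",
                                    "interval": "day",
                                    "format": "yyyy-MM-dd",
                                    "min_doc_count": 0,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


body = predictions_histogram(10)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search request body; only the lower bound of the range changes with n.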
You need to add a query. It can be any kind of query, as long as it is not a post_filter (which is applied only after the aggregations have been computed). It should be nested and contain a date range. One way is to define a constant_score query and, inside it, use a nested filter that wraps a range filter.
{
"query": {
"constant_score": {
"filter": {
"nested": {
"path": "user_answers",
"filter": {
"range": {
"user_answers.created": {
"gte": "now-10d",
"lte": "now"
}
}
}
}
}
}
}
}
Confirm if this works for you.
