Does the Elasticsearch date_histogram check whether a date inside the interval exists? If so, what happens, and is there any error handling?

I am working with the ES date histogram to get monthly results, and my query looks like this:
{
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "1M",
        "offset": Cutoff
      }
    }
  }
}
and the return looks like this:
          date
1  10978.521  2020-11-20  5995.69
2  11177.911  2020-12-20   199.39
3  11177.911  2021-01-20     0.00
So my question is: what if no document exists on the 20th of a month? Does ES do any error handling for that?
Thanks,
Jeff

Since it's a monthly date histogram, each bucket has a date key, which is the start date of that monthly bucket. For instance, 2020-11-20 is the key of the bucket that starts on that date; it contains all documents whose date falls between 2020-11-20 (inclusive) and 2020-12-20 (exclusive).
The same goes for the last bucket, which starts on 2021-01-20 and runs through 2021-02-20. It doesn't matter whether you have documents whose date field falls exactly on those bucket key dates; the keys are just interval bounds.
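The bucketing described above can be sketched in plain Python. This is only an illustration of the half-open interval semantics, not how Elasticsearch implements it; `cutoff_day` here stands in for the day-of-month produced by the question's offset:

```python
from datetime import date

def monthly_bucket_key(d: date, cutoff_day: int = 20) -> date:
    """Return the key of the monthly bucket containing `d`, assuming
    buckets start on `cutoff_day` of each month (half-open intervals)."""
    if d.day >= cutoff_day:
        return d.replace(day=cutoff_day)
    # otherwise the document belongs to the bucket starting the month before
    prev_month = d.month - 1 or 12
    prev_year = d.year - 1 if d.month == 1 else d.year
    return date(prev_year, prev_month, cutoff_day)

# 2020-12-19 still belongs to the bucket keyed 2020-11-20,
# while 2020-12-20 opens the next bucket:
print(monthly_bucket_key(date(2020, 12, 19)))  # 2020-11-20
print(monthly_bucket_key(date(2020, 12, 20)))  # 2020-12-20
```

Whether any document is dated exactly on the 20th never matters here; the key is computed from the interval bounds alone.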

Related

Using date_histogram with fixed_interval (30d) unexpected bucket start

I have a requirement to get data aggregated per 30 days (not month) so I'm using a date_histogram with "fixed_interval": "30d" to get that data. For example, if the user wants the last 90 days aggregations, there should be 3 buckets: [90-60, 60-30, 30-0]. Taking today's date (18-Mar-2021), I would want buckets [18-Dec,17-Jan,16-Feb].
However, what I actually get is [4-Dec, 3-Jan, 2-Feb, 4-Mar]. The first bucket starts well before any data is available, which also means there is one more bucket at the end than expected.
I found out that you can't easily tell when your buckets are meant to start (e.g. I want my first bucket to start at today-90 days). Buckets seem to start from 1970-01-01 according to what I could find (e.g. this) and the documentation kinda says this as well (this link, though it doesn't go into depth of the impact).
With this in mind, I worked out that I could use offset with an "interesting formula" so that I get the correct buckets that I need. E.g.:
GET /my_index/_search?filter_path=aggregations
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "#timestamp": {
              "gte": "TODAY - 90/60/30",
              "lt": "TODAY"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "discussion_interactions_chart": {
      "date_histogram": {
        "field": "#timestamp",
        "fixed_interval": "30d",
        "format": "yyyy-MM-dd",
        "offset": "(DAYS(#timestamp.gte, 1970-01-01) % 30)d"
      }
    }
  }
}
(Obviously this query doesn't work as-is; I build the variables in code. For the 18-Mar-2021 example, the offset is 14.)
So basically the offset is calculated as the number of days between my lower-bound date and the epoch, mod 30. This seems to work, but it's hard to justify this logic in a code review. Is there a nicer solution?
Here's a Python implementation of the answer in your question (which you really deserve upvotes for, it's clever and helped me):
import datetime

fixed_interval_days = 90
# offset needed to make the fixed_interval histogram end on today's date
# (Elasticsearch starts the intervals at 1970-01-01)
offset_days = (datetime.datetime.utcnow() - datetime.datetime(1970, 1, 1)).days % fixed_interval_days
...
A(
    "date_histogram",
    fixed_interval=f"{fixed_interval_days}d",
    offset=f"{offset_days}d",
)
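As a sanity check on the formula, the 18-Mar-2021 / 90-day example from the question can be reproduced with dates hard-coded (so the result is stable regardless of when you run it):

```python
import datetime

epoch = datetime.datetime(1970, 1, 1)
today = datetime.datetime(2021, 3, 18)             # date used in the question
lower_bound = today - datetime.timedelta(days=90)  # 2020-12-18

# days between the lower bound and the epoch, mod the 30-day interval
offset_days = (lower_bound - epoch).days % 30
print(offset_days)  # → 14, matching the offset stated in the question
```

Note that using today's date instead of the lower bound gives the same answer whenever the query window (90 days here) is a multiple of the interval (30 days), which is why both variants of the formula work.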

What does now/d mean in Elasticsearch?

What exactly do now-1d/d and now/d mean in Elasticsearch? Below is an example query:
GET /_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  }
}
It takes the current timestamp (the time your query reaches Elasticsearch), rounds it down to the start of the day (/d), subtracts one day for the lower bound, and returns the documents in that range.
These types of queries are useful when you don't want to specify an exact time and just want the data of the last 1 day, 3 days, 7 days, 1 month, etc.
As mentioned in official doc of range query
now is always the current system time in UTC.
Taken example from official doc of datemath
Assuming now is 2001-01-01 12:00:00, some examples are:
now+1h              now in milliseconds plus one hour. Resolves to: 2001-01-01 13:00:00
now-1h              now in milliseconds minus one hour. Resolves to: 2001-01-01 11:00:00
now-1h/d            now in milliseconds minus one hour, rounded down to UTC 00:00. Resolves to: 2001-01-01 00:00:00
2001.02.01||+1M/d   2001-02-01 in milliseconds plus one month, rounded down to the nearest day. Resolves to: 2001-03-01 00:00:00
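The /d rounding semantics can be mimicked in plain Python. This is a sketch of what the date math resolves to, not Elasticsearch's actual parser, using the doc's assumption that now is 2001-01-01 12:00:00:

```python
from datetime import datetime, timedelta

now = datetime(2001, 1, 1, 12, 0, 0)  # the "now" assumed in the docs example

# now-1h: subtract one hour
minus_1h = now - timedelta(hours=1)
print(minus_1h)       # 2001-01-01 11:00:00

# now-1h/d: subtract one hour, then round down to midnight (UTC 00:00)
minus_1h_day = (now - timedelta(hours=1)).replace(hour=0, minute=0, second=0, microsecond=0)
print(minus_1h_day)   # 2001-01-01 00:00:00
```

The key point is the order of operations: the arithmetic (-1h) is applied first, and the rounding (/d) is applied to the result.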

Elasticsearch scoring based on date fields

How do I write a custom scoring function in Elasticsearch based on a date field? Can anyone help me with this?
When the date sort is asc, the score (combined with the other scoring functions) should favor documents with the least recent dates; when it is desc, the score should favor the most recent dates.
I bet what you are looking for is the function score query.
For a date field you can use field_value_factor. It takes your date value as milliseconds since the epoch (Unix timestamp). So you would supply something like:
"field_value_factor": {
"field": "your_date_field",
"factor": 1,
"modifier": "none",
"missing": 1
}
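With modifier "none" and factor 1, the score contribution is simply the date's epoch-milliseconds value, so more recent documents score higher. A minimal sketch of that arithmetic (the field name and factor are illustrative, and this only mirrors the formula, not Elasticsearch itself):

```python
from datetime import datetime, timezone

def date_score(d: datetime, factor: float = 1.0) -> float:
    """Mimic field_value_factor with modifier "none": score = factor * epoch millis."""
    millis = d.replace(tzinfo=timezone.utc).timestamp() * 1000
    return factor * millis

older = date_score(datetime(2020, 1, 1))
newer = date_score(datetime(2021, 1, 1))
print(newer > older)  # True: more recent dates get higher scores
```

To boost least recent documents instead (the asc case from the question), you would need a decreasing transformation, e.g. a "reciprocal" modifier or a script_score, rather than the raw millisecond value.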

ElasticSearch 2.4 date range histogram using the difference between two date fields

I haven't been able to find anything about this for ES 2.* here or in the docs, so sorry if this is a duplicate.
What I am trying to do is create an aggregation in an ElasticSearch query that will allow me to create buckets based on the difference in a record between 2 date fields.
I.e. If I had data in ES for a shop, I might like to see the time difference between a purchase_date field and shipped_date field.
So in that instance I'd want to create an aggregate that had buckets to give me the hits for when shipped_date - purchase_date is < 1 day, 1-2 days, 3-4 days or 5+ days.
Ideally I was hoping this was possible in an ES query. Is that the case or would the best approach be to process the results into my own array based on the time difference for each hit?
I was able to achieve this using the built-in expression scripting language, which is enabled by default in ES 2.4. The functionality I wanted was to group my results by the difference between EndDate and DateProcessed in increments of 15 days. The relevant part of the query is:
{
  ...,
  "aggs": {
    "reason": {
      "date_histogram": {
        "min_doc_count": 1,
        "interval": "1296000000ms", // 15 days
        "format": "epoch_millis",
        "script": {
          "lang": "expression",
          "inline": "doc['DateProcessed'] > doc['EndDate'] ? doc['DateProcessed'] - doc['EndDate'] : -1"
        }
      }
      ...
    }
  }
}
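A sketch of what this aggregation effectively does to each document: compute the millisecond difference between the two date fields and floor it to a 15-day bucket key. Elasticsearch evaluates the expression per document and then does the bucketing via the interval; this Python version just combines the two steps for illustration:

```python
from datetime import datetime

FIFTEEN_DAYS_MS = 1_296_000_000  # 15 * 24 * 3600 * 1000 ms, as in the query

def diff_bucket_key(processed: datetime, end: datetime) -> int:
    """Millisecond difference floored to a 15-day bucket key, or -1
    when DateProcessed is not after EndDate (as in the script)."""
    if processed <= end:
        return -1
    diff_ms = int((processed - end).total_seconds() * 1000)
    return (diff_ms // FIFTEEN_DAYS_MS) * FIFTEEN_DAYS_MS

# a 20-day gap lands in the second bucket (key = 15 days in ms)
key = diff_bucket_key(datetime(2021, 1, 21), datetime(2021, 1, 1))
print(key)  # 1296000000
```

The -1 sentinel collects all documents where the difference is not positive into one bucket, mirroring the ternary in the inline expression.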

Dynamic time zone offset in elasticsearch aggregation?

I'm aggregating documents that each have a timestamp. The timestamp is UTC, but the documents each also have a local time zone ("timezone": "America/Los_Angeles") that can be different across documents.
I'm trying to do a date_histogram aggregation based on local time, not UTC or a fixed time zone (e.g., using the option "time_zone": "America/Los_Angeles").
How can I convert the timezone for each document to its local time before the aggregation?
Here's the simple aggregation:
{
"aggs": {
"date": {
"date_histogram": {
"field": "created_timestamp",
"interval": "day"
}
}
}
}
I'm not sure if I fully understand it, but it seems like the time_zone property would be for that:
The zone value accepts either a numeric value for the hours offset, for example: "time_zone" : -2. It also accepts a format of hours and minutes, like "time_zone" : "-02:30". Another option is to provide a time zone accepted as one of the values listed here.
If you store another field that's the local time without timezone information it should work.
Take every timestamp you have (which is in UTC), convert it to a date in the local timezone (this will contain the timezone information). Now simply drop the timezone information from this datetime. Now you can perform actions on this new field.
Suppose you start with this time in UTC:
'2016-07-17T01:33:52.412Z'
Now, suppose you're in PDT you can convert it to:
'2016-07-16T18:33:52.412-07:00'
Now, hack off the end so you end up with:
'2016-07-16T18:33:52.412Z'
Now you can operate on this field.
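The convert-then-drop-the-zone trick above can be sketched with the standard library, using the example timestamp and assuming the document's zone field holds an IANA name like "America/Los_Angeles" (requires Python 3.9+ for zoneinfo):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# start with the UTC timestamp from the example: '2016-07-17T01:33:52.412Z'
utc_ts = datetime(2016, 7, 17, 1, 33, 52, 412000, tzinfo=timezone.utc)

# convert to the document's local zone (PDT is UTC-7 in July) ...
local_ts = utc_ts.astimezone(ZoneInfo("America/Los_Angeles"))

# ... then drop the timezone information, keeping the local wall-clock time
naive_local = local_ts.replace(tzinfo=None)
print(naive_local.isoformat())  # 2016-07-16T18:33:52.412000
```

Indexing this naive value (as if it were UTC) is exactly the "hack off the end" step: the date_histogram then buckets every document by its own local day.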
