Range query for selecting "today" - Elasticsearch

I am using Watcher to select today's records. The format of the timestamp is yyyy-MM-dd HH:mm:ss. now/d rounds the time to the beginning of the day, but the following from/to values don't seem to work. What's wrong with the following query?
"filter": {
"range": {
"ingestion_timestamp": {
"from": "now/d",
"to": "now"
}
}
}

The range query has no from/to; it uses gt, gte, lt, and lte to specify the range. See here for more info: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
The date range aggregation, on the other hand, does use from/to; see here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
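As a concrete fix for the query in the question, the same filter can be rewritten with gte/lt. Here it is sketched as a Python dict for readability (the field name ingestion_timestamp is taken from the question):

```python
# The range filter from the question, rewritten with gte/lt instead of
# the unsupported from/to keys. "now/d" is the start of today and "now"
# is the current moment (Elasticsearch date math).
todays_records_filter = {
    "range": {
        "ingestion_timestamp": {
            "gte": "now/d",
            "lt": "now",
        }
    }
}
```

The same structure drops straight into the watch's input search body in place of the original filter.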

Related

Using date_histogram with fixed_interval (30d) unexpected bucket start

I have a requirement to aggregate data per 30 days (not per month), so I'm using a date_histogram with "fixed_interval": "30d". For example, if the user wants the last 90 days of aggregations, there should be 3 buckets: [90-60, 60-30, 30-0]. Taking today's date (18-Mar-2021), I would want the buckets [18-Dec, 17-Jan, 16-Feb].
However, what I actually get is [4-Dec, 3-Jan, 2-Feb, 4-Mar]. The first bucket starts well before any data is available, which also means an extra bucket is produced at the end.
I found out that you can't easily control when your buckets start (e.g. I want my first bucket to start at today minus 90 days). From what I could find, buckets are aligned to 1970-01-01, and the documentation says this as well, though it doesn't go into depth about the impact.
With this in mind, I worked out that I could use offset with an "interesting formula" so that I get the correct buckets that I need. E.g.:
GET /my_index/_search?filter_path=aggregations
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "#timestamp": {
              "gte": "TODAY - 90/60/30",
              "lt": "TODAY"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "discussion_interactions_chart": {
      "date_histogram": {
        "field": "#timestamp",
        "fixed_interval": "30d",
        "format": "yyyy-MM-dd",
        "offset": "(DAYS(#timestamp.gte, 1970-01-01) % 30)d"
      }
    }
  }
}
(Obviously this query doesn't work as-is; I build the variables in code. For the example of 18-Mar-2021 the offset is 14.)
So basically the offset is calculated as the number of days between my lower-bound date and the epoch, mod 30. This seems to work, but it's hard to justify this logic in a code review. Is there a nicer solution?
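To make the logic easier to review, here is a minimal sketch of the formula described above (days between the lower bound and the Unix epoch, mod the bucket width); for 18-Mar-2021 with 30-day buckets it reproduces the offset of 14 mentioned above:

```python
import datetime

def histogram_offset_days(lower_bound: datetime.date, interval_days: int) -> int:
    # Days from the Unix epoch to the lower bound, mod the bucket width:
    # this is the offset that shifts 1970-aligned buckets onto the bound.
    epoch = datetime.date(1970, 1, 1)
    return (lower_bound - epoch).days % interval_days

print(histogram_offset_days(datetime.date(2021, 3, 18), 30))  # → 14
```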
Here's a Python implementation of the answer in your question (which really deserves upvotes; it's clever and helped me):
import datetime
from elasticsearch_dsl import A  # assumes the elasticsearch-dsl package

fixed_interval_days = 90
# offset needed to make the fixed_interval histogram end on today's date
# (it starts the intervals at 1970-01-01)
offset_days = (datetime.datetime.utcnow() - datetime.datetime(1970, 1, 1)).days % fixed_interval_days
...
A(
    "date_histogram",
    fixed_interval=f"{fixed_interval_days}d",
    offset=f"{offset_days}d",
)

Elasticsearch 2.4 post_filter date math

When using a post_filter with date math in an Elasticsearch 2.4 query such as the following:
"post_filter": {
  "bool": {
    "must": [
      {
        "range": {
          "facets.due_date": {
            "gte": "now+2d/d",
            "lte": "now+3d/d-1s"
          }
        }
      }
    ]
  }
}
The results include documents with dates outside the range by one day. The exact same values are used in the aggregations, which report the correct counts for the buckets (2 documents for Saturday in this case). However, when I apply the above post_filter, 3 documents are returned (the extra document being for Sunday at 9am). The dates are arbitrary; I can move them a few days out and the same thing happens. I'm also on UTC time and have allowed for this in my testing by adding/removing a few hours in the values to rule out timezone errors.
If I use a set of concrete dates it works as expected, so my question is: does post_filter have a problem/bug with date math, or is there a way to use explain to show me the dates the post_filter sends to the ES server?
Thanks in advance; I've been banging my head against a brick wall for 3 days on this!
So it turns out that, for some very strange reason, using lte on a post_filter captures surrounding documents, whereas if I use lt it works as expected. I don't have a clue why; I can only assume some rounding takes place when the post_filter is applied but not when the aggregations are calculated!
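For anyone trying to see what the date math resolves to, here is a rough Python sketch of how "now+2d/d" and "now+3d/d-1s" evaluate (left to right, with /d rounding down to the start of the day); the "now" value is an arbitrary assumption for illustration:

```python
import datetime

def start_of_day(dt: datetime.datetime) -> datetime.datetime:
    # Equivalent of Elasticsearch's /d rounding on the lower bound.
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)

now = datetime.datetime(2016, 7, 1, 9, 30)  # hypothetical "now"

gte = start_of_day(now + datetime.timedelta(days=2))  # now+2d/d
lte = start_of_day(now + datetime.timedelta(days=3)) - datetime.timedelta(seconds=1)  # now+3d/d-1s

print(gte, lte)  # the two bounds span exactly one full day
```

Note this only models the literal arithmetic; it does not capture any extra rounding Elasticsearch may apply to lte bounds, which is what the accepted observation about lt vs lte points at.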

Is there any way to set default date range in elasticsearch

Is there any way in Elasticsearch to set a default date range when the to and from fields are null? That is, whenever to and from are empty, Elasticsearch should search using a defined default range. I have written a query, but it only works when to and from are provided:
"range": {
"time": {
"from": "2018-01-16T07:05:00",
"to": "2018-01-16T10:59:09",
"include_lower": true,
"include_upper": true
}
}
There is no default date range in Elasticsearch right now.
If you don't provide a date range filter, Elasticsearch will search all records matching your query across the entire index or alias the search points at.
My suggestion: if you want a default time frame in your filter, you have to set it in your code (i.e. on the client side). Your program should then apply a default time frame, for example the last 30 days.
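A minimal client-side sketch of that suggestion (the field name time comes from the question; the 30-day default window is an arbitrary choice):

```python
def build_time_filter(time_from=None, time_to=None):
    # Fall back to a default window when the caller supplies no bounds.
    if time_from is None and time_to is None:
        time_from, time_to = "now-30d/d", "now"
    return {
        "range": {
            "time": {
                "gte": time_from,
                "lte": time_to,
            }
        }
    }

print(build_time_filter())  # defaulted range
print(build_time_filter("2018-01-16T07:05:00", "2018-01-16T10:59:09"))
```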

ElasticSearch 2.4 date range histogram using the difference between two date fields

I haven't been able to find anything on this for ES 2.x, either here or in the docs, so sorry if this is a duplicate.
What I am trying to do is create an aggregation in an Elasticsearch query that buckets records based on the difference between two date fields in each record.
I.e. if I had data in ES for a shop, I might like to see the time difference between a purchase_date field and a shipped_date field.
So in that instance I'd want an aggregation with buckets giving me the hits where shipped_date - purchase_date is < 1 day, 1-2 days, 3-4 days, or 5+ days.
Ideally I was hoping this was possible in an ES query. Is that the case, or would the best approach be to process the results into my own array based on the time difference for each hit?
I was able to achieve this using the built-in expression scripting language, which is enabled by default in ES 2.4. The functionality I wanted was to group my results by the difference between EndDate and DateProcessed in increments of 15 days. The relevant part of the query is:
{
  ...,
  "aggs": {
    "reason": {
      "date_histogram": {
        "min_doc_count": 1,
        "interval": "1296000000ms", // 15 days
        "format": "epoch_millis",
        "script": {
          "lang": "expression",
          "inline": "doc['DateProcessed'] > doc['EndDate'] ? doc['DateProcessed'] - doc['EndDate'] : -1"
        }
      }
    }
    ...
  }
}
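A quick sanity check on the interval used above, since a raw millisecond value is easy to get wrong:

```python
# Confirm that the "1296000000ms" interval in the aggregation
# really corresponds to 15 days.
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000
interval_ms = 15 * MS_PER_DAY
print(interval_ms)  # → 1296000000
```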

String range query in Elasticsearch

I'm trying to query data in an Elasticsearch cluster (2.3) using the following range query. To clarify, I'm searching on a field that contains an array of values derived by concatenating two ids together with a count. For example:
Schema:
{
  id1: 111,
  id2: 222,
  count: 5
}
The query I'm using looks like the following:
Query:
{
  "query": {
    "bool": {
      "must": {
        "range": {
          "myfield": {
            "from": "111_222_1",
            "to": "111_222_2147483647",
            "include_lower": true,
            "include_upper": true
          }
        }
      }
    }
  }
}
The to field uses Integer.MAX_VALUE
This works alright but doesn't exactly match the underlying data. Querying through other means produces more results than this method.
More strangely, putting 111_222_5 in the from field produces 0 results, while 111_222_10 does produce results.
How is ES (and/or Lucene) interpreting this range query and why is it producing such strange results? My initial guess is that it's not looking at the full value of the last portion of the String and possibly only looking at the first digit.
Is there a way to specify a format for the TermRange? I understand date ranging allows formatting.
A look here provides the answer.
The range is evaluated lexicographically: "5" comes before "50", which comes before "6", and so on.
To get around this, I reindexed using a fixed-length, zero-padded string for the count:
0000000001
0000000100
0001000101
...
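The lexicographic behaviour and the zero-padding fix can be seen directly with plain Python string comparison (the pad_count helper is only an illustration, not from the answer):

```python
# Term ranges compare strings lexicographically, so "5" sorts after "10":
# this is why 111_222_5 fell outside the range while 111_222_10 did not.
print("111_222_5" > "111_222_10")  # → True (wrong numeric order)

def pad_count(id1: int, id2: int, count: int, width: int = 10) -> str:
    # Hypothetical helper: a fixed-width, zero-padded count makes
    # lexicographic order coincide with numeric order.
    return f"{id1}_{id2}_{count:0{width}d}"

print(pad_count(111, 222, 5))  # → 111_222_0000000005
print(pad_count(111, 222, 5) < pad_count(111, 222, 2147483647))  # → True
```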