Need an Elasticsearch range filter which starts 5 minutes before the scheduled time

I'm using Elasticsearch 6.5.4 and a Kibana watcher to alert.
I have a range filter like so:
"filter": [
  {
    "range": {
      "@timestamp": {
        "gte": "{{ctx.trigger.scheduled_time}}||-{{ctx.metadata.triggered_interval}}m"
      }
    }
  }
]
The scheduled_time is every hour at the 5th minute (1:05, 2:05, etc.), and the triggered_interval is 60.
I want to gather a range of @timestamps while ignoring the most recent 5 minutes. Basically, certain status messages might be too new to be true errors, so I want to ignore them.
I'm trying to craft this so it reads: the begin time is trigger.scheduled_time - 5m, and the window extends back triggered_interval minutes from there.
The range format is time1-time2, so scheduled_time-5m-triggered_interval is invalid syntax.
I've tried a few iterations, but nothing seems to work; the watcher just returns a null pointer exception.
"gte": "<{{{ctx.trigger.scheduled_time}}||-5m}>-{{ctx.metadata.triggered_interval}}m"
"gte": "<{{ctx.trigger.scheduled_time}}||-5m>-{{ctx.metadata.triggered_interval}}m"
"gte": "{{ctx.trigger.scheduled_time}}||-5m-{{ctx.metadata.triggered_interval}}m"
"gte": "({{ctx.trigger.scheduled_time}}||-5m)-{{ctx.metadata.triggered_interval}}m"
Is this possible to do in the range filter?

The Elasticsearch date math functionality together with a range query should do the trick.
If you want to select all events older than 5 minutes and younger than 60 minutes, relative to the execution time, I'd go with this:
"filter": [
  {
    "range": {
      "@timestamp": {
        "lte": "now-5m/m",
        "gte": "now-60m/m"
      }
    }
  }
]
In other words: get all events where the @timestamp is older than 5 minutes but not older than 60 minutes, with all @timestamps rounded to the full minute. If you don't need the rounding, just remove the /m.
Cheers!
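For reference, what the now-60m/m .. now-5m/m window selects can be sketched in Python. This is only an illustration of the date math, not how Elasticsearch implements it (and note that Elasticsearch rounds an lte bound up to the end of the unit, while this sketch floors both bounds for simplicity):

```python
from datetime import datetime, timedelta

def watch_window(now):
    """Approximate the [now-60m/m, now-5m/m] date-math window."""
    def floor_minute(dt):
        # the "/m" rounding: truncate to the full minute
        return dt.replace(second=0, microsecond=0)
    lower = floor_minute(now - timedelta(minutes=60))  # "now-60m/m"
    upper = floor_minute(now - timedelta(minutes=5))   # "now-5m/m"
    return lower, upper
```

For a trigger at 2:05:30, this yields the window 1:05 to 2:00, which matches the asker's goal of ignoring the most recent 5 minutes.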

Related

Elasticsearch Datehistogram Interval

I am creating a date histogram aggregation like this, where the min and max of extended_bounds are Unix epoch values in milliseconds:
"aggs": {
  "0": {
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "30s",
      "time_zone": "Asia/Kolkata",
      "extended_bounds": {
        "min": 1656419435318,
        "max": 1656420335318
      }
    }
  }
}
Right now "30s" is hard-coded as the fixed_interval value.
How can this value be generated dynamically from the duration of the bounds (the min and max of extended_bounds), if I want the same number of buckets for each duration? Is there a function available from any Kibana plugin for this purpose?
For example, if I want 30 buckets:
(a) for a 1-hour duration, fixed_interval will be 2 mins
(b) for 24 hours, fixed_interval will be 45 mins
I can write my own code to do this calculation, but an existing API would be helpful.
Also, when should calendar_interval be used in place of fixed_interval? I have checked the queries generated by Kibana Lens, where either fixed_interval or calendar_interval is used depending on the search duration.
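As far as I know there is no server-side API in the query DSL for this, but the calculation can be approximated client-side. A minimal sketch, where the list of "nice" intervals and the snap-up rule are my own assumptions, not Kibana's exact algorithm:

```python
# Candidate bucket widths in seconds (an assumed "nice" list, not Kibana's own)
NICE_SECONDS = [1, 5, 10, 30, 60, 120, 300, 600, 900, 1800,
                3600, 7200, 10800, 21600, 43200, 86400]

def pick_fixed_interval(min_ms, max_ms, target_buckets=30):
    """Pick a fixed_interval string giving at most `target_buckets` buckets."""
    raw = (max_ms - min_ms) / 1000 / target_buckets  # ideal width in seconds
    for nice in NICE_SECONDS:
        if nice >= raw:  # snap up so we never exceed the bucket budget
            return f"{nice}s" if nice < 60 else f"{nice // 60}m"
    return f"{int(raw)}s"  # fall back for very long durations
```

For a 1-hour span and 30 buckets this yields "2m"; for the epoch bounds in the question (a 15-minute span) it yields "30s". As for the second question: calendar_interval is for buckets that must align with calendar units of varying length (months, years, DST-aware days), while fixed_interval keeps every bucket exactly the same width, which is why Lens switches between them depending on the search duration.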

Using date_histogram with fixed_interval (30d) unexpected bucket start

I have a requirement to aggregate data per 30 days (not per month), so I'm using a date_histogram with "fixed_interval": "30d". For example, if the user wants the last 90 days of aggregations, there should be 3 buckets: [90-60, 60-30, 30-0]. Taking today's date (18-Mar-2021), I would want the buckets [18-Dec, 17-Jan, 16-Feb].
However, what I actually get is [4-Dec, 3-Jan, 2-Feb, 4-Mar]. The first bucket starts well before any data is available, which also means there is one more bucket than expected at the end.
I found out that you can't easily control when your buckets start (e.g. I want my first bucket to start at today minus 90 days). Buckets seem to start from 1970-01-01 according to what I could find (e.g. this), and the documentation says as much as well (this link, though it doesn't go into the impact in depth).
With this in mind, I worked out that I could use offset with an "interesting formula" to get the buckets I need. E.g.:
GET /my_index/_search?filter_path=aggregations
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "TODAY - 90/60/30",
              "lt": "TODAY"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "discussion_interactions_chart": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "30d",
        "format": "yyyy-MM-dd",
        "offset": "(DAYS(@timestamp.gte, 1970-01-01) % 30)d"
      }
    }
  }
}
(Obviously this query doesn't run as-is; I build the variables in code. For the 18-Mar-2021 example, the offset is 14.)
So basically the offset is calculated as the number of days between my lower-bound date and the epoch, modulo 30. This seems to work, but it's hard to justify this logic in a code review. Is there a nicer solution?
Here's a Python implementation of the answer in your question (which you really deserve upvotes for; it's clever and helped me). It uses the A aggregation factory from the elasticsearch-dsl package:
import datetime
from elasticsearch_dsl import A

fixed_interval_days = 90
# offset needed to make the fixed_interval histogram end on today's date
# (Elasticsearch starts the intervals at 1970-01-01)
offset_days = (datetime.datetime.utcnow() - datetime.datetime(1970, 1, 1)).days % fixed_interval_days
...
agg = A(
    "date_histogram",
    fixed_interval=f"{fixed_interval_days}d",
    offset=f"{offset_days}d",
)

elasticsearch get date range of most recent ingestion

I have an Elasticsearch index that gets new data in large dumps, so from looking at the graph it's very obvious when new data is added.
If I only want to get data from the most recent ingestion (in this case, data from 2020-08-06), what's the best way of doing this?
I can use this query to get the most recent document:
GET /indexname/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": queryString
          }
        }
      ]
    }
  },
  "sort": {
    "@timestamp": "desc"
  },
  "size": 1
}
This returns the most recent document, in this case one with a timestamp of 2020-08-06. I can set that as my endDate and set my startDate to that date minus one day, but I'm worried about cases where the data was ingested overnight and spanned two days.
I could keep making requests going back in time 5 hours at a time to find the most recent large gap, but I'm worried that making requests in a loop could be time-consuming. Is there a smarter way of getting the date range of my most recent ingestion?
When your data comes in batches, it's best to attach an identifier to each batch. That way, no date math is required.
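A sketch of that approach (the field name batch_id and the two-step flow are assumptions, not part of the question): stamp every document in a bulk load with the same batch_id, then find the latest id with a max aggregation and filter on it. Shown here as Python dicts as you might build the request bodies client-side:

```python
# Step 1: query body to find the most recent batch_id
def latest_batch_request():
    return {
        "size": 0,
        "aggs": {"latest_batch": {"max": {"field": "batch_id"}}},
    }

# Step 2: query body to fetch only the documents from that batch
def batch_filter(batch_id):
    return {"query": {"term": {"batch_id": batch_id}}}
```

Two cheap requests replace any amount of gap-hunting through timestamps.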

Range query for selecting the "today"

I am using a watcher to select today's records. The format of the timestamp is yyyy-MM-dd HH:mm:ss. now/d rounds the time to the beginning of the day, but the following from/to values don't seem to work. What's wrong with the following query?
"filter": {
  "range": {
    "ingestion_timestamp": {
      "from": "now/d",
      "to": "now"
    }
  }
}
The range query doesn't use from/to; it uses gt, gte, lt, and lte to specify the range. See here for more info: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
The date range aggregation, on the other hand, does use from/to; see here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
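For completeness, the filter from the question rewritten with gte/lte, shown as a Python dict as you might build it client-side:

```python
# "Today so far" filter for the range query, using gte/lte instead of from/to
todays_filter = {
    "range": {
        "ingestion_timestamp": {
            "gte": "now/d",  # rounded down to the start of today
            "lte": "now",    # up to the current moment
        }
    }
}
```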

ElasticSearch 2.4 date range histogram using the difference between two date fields

I haven't been able to find anything regarding this for ES 2.x, either for this problem or in the docs, so sorry if this is a duplicate.
What I am trying to do is create an aggregation in an Elasticsearch query that buckets records based on the difference between two date fields.
I.e., if I had data in ES for a shop, I might like to see the time difference between a purchase_date field and a shipped_date field.
So in that instance I'd want an aggregation with buckets giving me the hits for when shipped_date - purchase_date is < 1 day, 1-2 days, 3-4 days, or 5+ days.
Ideally I was hoping this was possible in an ES query. Is that the case, or would the best approach be to process the results into my own array based on the time difference for each hit?
I was able to achieve this using the built-in expression language, which is enabled by default in ES 2.4. The functionality I wanted was to group my results by the difference between EndDate and DateProcessed in increments of 15 days. The relevant part of the query is:
{
  ...,
  "aggs": {
    "reason": {
      "date_histogram": {
        "min_doc_count": 1,
        "interval": "1296000000ms", // 15 days
        "format": "epoch_millis",
        "script": {
          "lang": "expression",
          "inline": "doc['DateProcessed'] > doc['EndDate'] ? doc['DateProcessed'] - doc['EndDate'] : -1"
        }
      },
      ...
    }
  }
}
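If the expression-script route isn't available, the client-side alternative the question mentions is also straightforward. A sketch using the buckets from the example (the field names purchase_date and shipped_date come from the question; the bucketing itself runs over the returned hits):

```python
from datetime import datetime

def shipping_bucket(purchase_date, shipped_date):
    """Bucket a hit by the number of whole days between the two date fields."""
    days = (shipped_date - purchase_date).days
    if days < 1:
        return "<1 day"
    if days <= 2:
        return "1-2 days"
    if days <= 4:
        return "3-4 days"
    return "5+ days"
```

Counting results per bucket is then a one-liner with collections.Counter over the hits.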
