get documents irrespective of years in elasticsearch - elasticsearch

I want to get all documents dated 8 Dec, irrespective of the year. I have tried two queries, but both fail. Is there any way to do this?
First Query
GET /my_index/my_type/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"myDate": {
"gte": "12-08",
"lte": "12-08",
"format": "MM-dd"
}
}
}
]
}
}
}
Second Query
GET /my_index/my_type/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"mydate": "12-08"
}
}
]
}
}
}

Unfortunately, I don't think that will be easily possible. DateTime datatypes are actually just long numbers. The range query will also transform the defined input into a number. Example: now -> 1497541939892. See https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html for more information - specifically this:
Internally, dates are converted to UTC (if the time-zone is specified) and stored as a long number representing milliseconds-since-the-epoch.
With that in mind, you would have to subtract 1 (or x) years (in milliseconds) for every subquery. That doesn't sound practical.
I think your best bet would be to additionally index the day and month (and maybe the year as well) as separate fields. Then you would be able to query by month and day alone, which would be integer values. I don't know whether that is easily done in your case, but I have no other idea right now.
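A minimal sketch of that suggestion, assuming the date arrives as a yyyy-MM-dd string before indexing. The helper names and the derived field names (myDate_year, myDate_month, myDate_day) are hypothetical and not part of the original question:

```python
from datetime import datetime


def with_date_parts(doc, field="myDate"):
    """Return a copy of doc with integer year/month/day fields added.

    Assumes doc[field] is a 'yyyy-MM-dd' string; the derived field names
    are illustrative only.
    """
    parsed = datetime.strptime(doc[field], "%Y-%m-%d")
    enriched = dict(doc)
    enriched[field + "_year"] = parsed.year
    enriched[field + "_month"] = parsed.month
    enriched[field + "_day"] = parsed.day
    return enriched


def month_day_query(month, day, field="myDate"):
    """Build a search body that matches a month/day in any year."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {field + "_month": month}},
                    {"term": {field + "_day": day}},
                ]
            }
        }
    }


doc = with_date_parts({"myDate": "2015-12-08"})
body = month_day_query(12, 8)
```

The enriched document would be indexed in place of the original, and the body sent to the _search endpoint.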

Related

Elastic relative data math - finding all things today

I'm trying to do a fairly simple query with Elasticsearch, but I don't understand what I'm doing wrong, so I'm posting here for some pointers.
I have an elastic index where each document has a date like so:
{
// edited for brevity
"releasedate": "2020-10-03T15:55:03+00:00",
}
and I am using django DRF to make queries like so, where I pass this value along &releasedate__gt=now-3d/d
Which ends up with an elastic range query like this.
{
"from": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"releasedate": {
"gt": "now/d-3d"
}
}
}
]
}
},
"size": 10,
"sort": [
"_score"
]
}
If I want to see all "documents since yesterday", I think of it as all documents with a releasedate later than midnight yesterday. I figured the key part of the query would need to look like this:
{
"query": {
"bool": {
"filter": [
{
"range": {
"releasedate": {
"gt": "now/d-1d"
}
}
}
]
}
}
}
So I expect this would round the time now, to 00:00 today, then go back one day.
So if I ran this on 2020-10-04. I'd assume this would catch a document with the release date of 2020-10-03T15:55:03+00:00.
Here's my reasoning
Rounding down with now/d would take us to 2020-10-04T00:00.
And then going back one day with -1d would take us to 2020-10-03T00:00.
This ought to include the document, but I'm not seeing it. I need to look back more than one day to find the documents, so I need to use now/d-2d to find matching documents.
Any idea why this might be? I'm unsure of how to see what now/d-1d evaluates in terms of a timezone aware object, to check - that's what I might reach for, but I don't know how with Elastic.
FWIW, this is using Elastic 5.6. We'll be updating soon.
Once you round to the nearest day (with either now-2d/d or now/d-2d, as you did), the bound that gt uses is day-based: for gt, Elasticsearch rounds the date up to the last millisecond of the rounded day rather than down to its start.
In other words, on 2020-10-04, gt: now/d-1d behaves like > 2020-10-03T23:59:59.999, which is effectively >= 2020-10-04T00:00. Use gte instead of gt; gte rounds down, so it works as >= 2020-10-03T00:00.
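The effect can be modelled outside Elasticsearch. This is only a sketch, assuming the rounding rule from the range query documentation (gt rounds up to the end of the day, gte rounds down to its start) and UTC throughout; effective_bound and matches are hypothetical helpers, not Elasticsearch APIs:

```python
from datetime import datetime, timedelta, timezone


def effective_bound(now, op, days_back):
    """Model the bound that `now/d-<days_back>d` yields for gt or gte."""
    start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if op == "gte":
        # gte rounds down to the first millisecond of the day.
        rounded = start_of_day
    elif op == "gt":
        # gt rounds up to the last millisecond of the day.
        rounded = start_of_day + timedelta(days=1) - timedelta(milliseconds=1)
    else:
        raise ValueError("op must be 'gt' or 'gte'")
    return rounded - timedelta(days=days_back)


def matches(doc_date, now, op, days_back):
    """Return True if doc_date satisfies the modelled range clause."""
    bound = effective_bound(now, op, days_back)
    return doc_date > bound if op == "gt" else doc_date >= bound


now = datetime(2020, 10, 4, 12, 0, tzinfo=timezone.utc)
doc = datetime(2020, 10, 3, 15, 55, 3, tzinfo=timezone.utc)

gt_one_day = matches(doc, now, "gt", 1)    # the query that found nothing
gt_two_days = matches(doc, now, "gt", 2)   # the workaround from the question
gte_one_day = matches(doc, now, "gte", 1)  # the suggested fix
```

Under these assumptions the model reproduces what the question describes: gt with one day misses the document, while gt with two days or gte with one day finds it.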

Elasticsearch must clause faster than filter

We use Elasticsearch 7.2 and we've been observing something weird lately.
We tried executing the following two queries:
{
"query": {
"bool": {
"must": [
{
"term": {
"customer(keyword_field)": "big_customer"
}
}
]
}
}
}
{
"query": {
"bool": {
"filter": [
{
"term": {
"customer(keyword_field)": "big_customer"
}
}
]
}
}
}
This matches around 1 million documents. The first query was 10 times faster than the second. I expected the first to be slower because of scoring.
Also, when I added sorting, the second stayed the same and the first became as slow as the second.
I suspect that 'filter' looks through all documents, whereas 'term' (or range for dates, match, etc.) looks at the indexed values. I spotted something similar at a new client and was baffled why they were using 'filter' at the top level rather than range or match.
I could be wrong here, so try it on your systems first.

How to calculate the overlap / elapsed time range in elasticsearch?

I have some records in ES. They are online meeting records of people who join and leave at different times.
{name:"p1", join:'2017-11-17T00:01:00.293Z', leave: "2017-11-17T00:06:00.293Z"}
{name:"p2", join:'2017-11-17T00:02:00.293Z', leave: "2017-11-17T00:04:00.293Z"}
{name:"p3", join:'2017-11-17T00:03:00.293Z', leave: "2017-11-17T00:05:00.293Z"}
Time range could be something like this:
p1: [============================================]
p2: [=================]
p3: [==================]
The question is how to calculate the overlapping time range (the common/shared meeting time), which should be 3 min.
A further question: is it possible to know from when to when there are 1/2/3 people present? Here: 2 min with 2 persons; 1 min with 3 persons.
I don't think it's possible to do this with ES alone, simply because the search would need to go through all matching documents and calculate across them.
I would do it in the following steps.
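1. Before indexing a new document, search for existing documents that overlap it. An existing record overlaps the new one when it starts no later than the new document's leave time and ends no earlier than its join time (the values below are p3's leave and join times from the question):
GET /meetings/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"join": {
"lte": "2017-11-17T00:05:00.293Z"
}
}
},
{
"range": {
"leave": {
"gte": "2017-11-17T00:03:00.293Z"
}
}
}
]
}
}
}
2. Compute the overlap on the back end for all documents returned.
3. Save the overlap metadata you need to the documents as a nested object.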
You can do the first part easily using max(join) and min(leave):
GET your_index/your_type/_search
{
"size": 0,
"aggs": {
"startTime": {
"max": {
"field": "join"
}
},
"endTime": {
"min": {
"field": "leave"
}
}
}
}
And then you can compute endTime-startTime either when you process Elasticsearch response or using a bucket script aggregation. It may be negative in which case there is no overlap.
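As a rough client-side illustration of the same arithmetic, here is the calculation on the three records from the question. The parse helper and function name are hypothetical; in practice the two values would come from the aggregation response:

```python
from datetime import datetime

records = [
    {"name": "p1", "join": "2017-11-17T00:01:00.293Z", "leave": "2017-11-17T00:06:00.293Z"},
    {"name": "p2", "join": "2017-11-17T00:02:00.293Z", "leave": "2017-11-17T00:04:00.293Z"},
    {"name": "p3", "join": "2017-11-17T00:03:00.293Z", "leave": "2017-11-17T00:05:00.293Z"},
]


def parse(timestamp):
    """Parse the UTC timestamp format used in the question's records."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def common_overlap_seconds(items):
    """Return min(leave) - max(join) in seconds; negative means no overlap."""
    start_time = max(parse(item["join"]) for item in items)
    end_time = min(parse(item["leave"]) for item in items)
    return (end_time - start_time).total_seconds()


overlap = common_overlap_seconds(records)
```

Note that this measures the time during which all participants are present at once, which for these records is the one minute from 00:03 to 00:04.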
For the second one, it depends on what you want:
If you want the exact boundaries, which may be hard to read, you can do it using a Scripted Metric Aggregation.
If you want to have the number per slot (hour for instance) it may be easier to use a Date Histogram Aggregation.
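For the exact boundaries, the logic that a scripted metric (or back-end code) would need is a sweep over join and leave events. The following is a sketch of that logic in plain Python, not an Elasticsearch script; the function names are hypothetical and the records are the ones from the question:

```python
from collections import defaultdict
from datetime import datetime

records = [
    {"name": "p1", "join": "2017-11-17T00:01:00.293Z", "leave": "2017-11-17T00:06:00.293Z"},
    {"name": "p2", "join": "2017-11-17T00:02:00.293Z", "leave": "2017-11-17T00:04:00.293Z"},
    {"name": "p3", "join": "2017-11-17T00:03:00.293Z", "leave": "2017-11-17T00:05:00.293Z"},
]


def parse(timestamp):
    """Parse the UTC timestamp format used in the question's records."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def headcount_durations(items):
    """Return {number of people present: total seconds at that headcount}."""
    events = []
    for item in items:
        events.append((parse(item["join"]), 1))    # someone joins
        events.append((parse(item["leave"]), -1))  # someone leaves
    events.sort()  # at equal timestamps, leaves (-1) sort before joins (+1)

    durations = defaultdict(float)
    present = 0
    previous_time = None
    for time, delta in events:
        if previous_time is not None and present > 0:
            durations[present] += (time - previous_time).total_seconds()
        present += delta
        previous_time = time
    return dict(durations)


durations = headcount_durations(records)
```

For these records the result is two minutes with exactly two people and one minute with all three, matching the figures in the question.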

How to search for exact date match in Elasticsearch

I have a couple of items in my ES database with fields containing 2020-02-26T05:24:55.757Z for example. Is it possible to (with the URI Search, _search?q=...) search for exact dates? For example, in this case, I would like to find items from 2020-02-26. Is that possible?
Yes, it is possible. You can refer to the query string documentation for more info.
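curl 'localhost:9200/your_index_name/_search?q=your_date_field:%5B2020-02-26%20TO%20*%5D'
You need to URL-encode the query. Decoded, the query part is q=your_date_field:[2020-02-26 TO *]. Square brackets make the lower bound inclusive; curly braces would make it exclusive.
As a request body, the same query would look like this: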
{
"query": {
"range": {
"your_date_field": {
"gte": "2020-02-26"
}
}
}
}
For an exact date, the following would work:
curl localhost:9200/your_index_name/_search?q=your_date_field:2020-02-26
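If the URL is built in code, the encoding can be left to a library. A small sketch using Python's standard urllib.parse.quote; the uri_search_path helper is hypothetical, and the index and field names are the placeholders used above. Square brackets are used here for an inclusive range, matching gte:

```python
from urllib.parse import quote


def uri_search_path(index, field, lucene_expr):
    """Build a URI-search path with a percent-encoded q parameter."""
    query = f"{field}:{lucene_expr}"
    # Keep ':' and '*' readable; brackets and spaces are percent-encoded.
    return f"/{index}/_search?q=" + quote(query, safe=":*")


range_path = uri_search_path("your_index_name", "your_date_field", "[2020-02-26 TO *]")
exact_path = uri_search_path("your_index_name", "your_date_field", "2020-02-26")
```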
Although this question is old, I came across it, so maybe others will do so too.
If you want to only work in UTC, you can use a match query, like:
{
"query": {
"match": {
"your_date_field": {
"query": "2020-02-26"
}
}
}
}
If you need to consider things matching on a particular date in a different timezone, you have to use a range query, like:
{
"query": {
"range": {
"your_date_field": {
"gte": "2020-02-26",
"lte": "2020-02-26",
"time_zone": "-08:00"
}
}
}
}
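To see which instants such a query is expected to cover, the local day can be converted to UTC by hand. This is a sketch under the assumption that gte rounds the date down to the start of the day and lte rounds it up to the last millisecond, both in the given time zone; local_day_bounds_utc is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone


def local_day_bounds_utc(day, utc_offset_hours):
    """Return (start, end) in UTC for a calendar day at a fixed UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    start_local = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=zone)
    # Last millisecond of the same local day.
    end_local = start_local + timedelta(days=1) - timedelta(milliseconds=1)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


start_utc, end_utc = local_day_bounds_utc("2020-02-26", -8)
```

With a -08:00 offset, the day 2020-02-26 runs from 08:00 UTC on the 26th to just before 08:00 UTC on the 27th, so a document stamped 2020-02-26T05:24:55.757Z would fall outside it.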

Elastic Search Date Range

I have a query that properly parses date ranges. However, all dates in my database have a default timestamp of 00:00:00. This means that items that are still valid today are shown as expired. How can I adjust the following to look at just the date, and not the time, of the item (expirationDate)?
{
"range": {
"expirationDate": {
"gte": "now"
}
}
}
An example of the data is:
"expirationDate": "2014-06-24T00:00:00.000Z",
Did you look into the different format options for dates stored in Elasticsearch? If that does not work for you, or you don't want to store dates without the time, you can try this query, which should work for your use case:
{
"range": {
"expirationDate": {
"gt": "now-1d"
}
}
}
You can also round down the time so that your query returns anything that occurred since the beginning of the day:
Assuming that
now is 2017-03-07T07:00:00.000,
now/d is 2017-03-07T00:00:00.000
Your query would be:
{
"range": {
"expirationDate": {
"gte": "now/d"
}
}
}
See the Elasticsearch documentation on rounding times.
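The difference between "now" and "now/d" for this data can be checked with a short model. This sketch assumes UTC and uses hypothetical helper names; it mimics the comparison rather than calling Elasticsearch:

```python
from datetime import datetime, timezone


def round_down_to_day(moment):
    """Mimic now/d for a lower bound: truncate to 00:00:00.000."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def is_still_valid(expiration, now, round_to_day=True):
    """Model expirationDate >= now/d (or >= now when round_to_day is False)."""
    lower_bound = round_down_to_day(now) if round_to_day else now
    return expiration >= lower_bound


now = datetime(2017, 3, 7, 7, 0, tzinfo=timezone.utc)
expires_today = datetime(2017, 3, 7, 0, 0, tzinfo=timezone.utc)

with_rounding = is_still_valid(expires_today, now)
without_rounding = is_still_valid(expires_today, now, round_to_day=False)
```

An item expiring at midnight today counts as valid only when the bound is rounded down to the start of the day.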
