Conditional Sorting in ElasticSearch - sorting

I have some documents that I would like to sort on a date field. Documents whose date is equal to or after a specified date (for example, today) should be sorted in ascending order. Documents whose date is before the specified date should be sorted in descending order.
Is this possible in ElasticSearch? If so, could you suggest any literature or an approach?
date is of type "date" and format "dateOptionalTime".
Thanks

Yes, this is possible in ElasticSearch using a script, either for sorting or for scoring.
My preference would be a scoring script, because script-based scoring is going to be quicker (according to the documentation).
Using a scoring script, you could store the date as a Unix timestamp in a field of type int/long and use an mvel scoring script in the custom_score query. You might need to re-index your documents. You would also need to convert the searched-for time into a Unix timestamp before passing it to ElasticSearch.
The scoring script would then subtract the requested timestamp from each document's timestamp and take the absolute value. The results are then sorted by score in ascending order: the lowest 'distance' is the best.
So when looking for documents dated about a year ago, it would look something like:
{
  "query": {
    "custom_score": {
      "query": {
        ....
      },
      "params": {
        "req_date_stamp": 1348438345
      },
      "script": "abs(doc['timestamp'].value - req_date_stamp)"
    }
  },
  "sort": {
    "_score": {
      "order": "asc"
    }
  }
}
(Apologies for any mistakes in my JSON - I tested this idea in pyes)
You might need to tweak this to get the rounding right - for example your question mentions matching days, so you might want to round the timestamp generator to the nearest day.
For "full" info you can check out the Custom Score Query docs and follow the link to MVEL scripting.
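To see what ordering this distance-based score produces, here is a minimal Python sketch that mimics the script outside ElasticSearch. It is only an illustration: the sample documents and the helper names (day_floor, distance_score, to_ts) are made up, and rounding down to midnight UTC is just one possible way to handle the day rounding mentioned above.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def day_floor(ts: int) -> int:
    """Round a Unix timestamp (seconds) down to midnight UTC of its day."""
    return ts - ts % SECONDS_PER_DAY


def distance_score(doc_ts: int, req_ts: int) -> int:
    """Absolute distance in seconds between the two day-rounded timestamps."""
    return abs(day_floor(doc_ts) - day_floor(req_ts))


def to_ts(iso_day: str) -> int:
    """Convert a YYYY-MM-DD string to a Unix timestamp at midnight UTC."""
    parsed = datetime.strptime(iso_day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


# The requested timestamp from the query above (it falls on 2012-09-23 UTC).
req = 1348438345

# Hypothetical documents: name -> timestamp.
docs = {
    "a": to_ts("2012-09-20"),
    "b": to_ts("2012-09-23"),
    "c": to_ts("2012-09-30"),
}

# Ascending by distance: the closest day comes first.
ranked = sorted(docs, key=lambda name: distance_score(docs[name], req))
print(ranked)
```

Note that this ranks by closeness to the requested day, so dates before and after it are interleaved.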

For this kind of specific use case, you should use a sorting script.
See the "script based sorting" section in the Sort documentation page.
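The ordering asked for in the question (the specified date and later dates ascending, then earlier dates descending) boils down to a two-part sort key. Here is a rough Python sketch of that logic only; it is not ElasticSearch script code, and conditional_key and the sample dates are invented for illustration. A real sorting script has to return a single value, so the two parts would need to be folded into one number there.

```python
from datetime import date
from typing import List, Tuple


def conditional_key(doc_date: date, ref: date) -> Tuple[int, int]:
    """Sort key: dates on/after ref first (ascending), then earlier dates (descending)."""
    delta = (doc_date - ref).days
    if delta >= 0:
        # Group 0: the reference date and everything after it, nearest first.
        return (0, delta)
    # Group 1: dates before the reference date, most recent first.
    return (1, -delta)


ref = date(2024, 1, 10)  # hypothetical "specified date"
days: List[date] = [
    date(2024, 1, 8),
    date(2024, 1, 10),
    date(2024, 1, 15),
    date(2024, 1, 5),
    date(2024, 1, 12),
]

ordered = sorted(days, key=lambda d: conditional_key(d, ref))
for d in ordered:
    print(d.isoformat())
```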

My English is poor.
My solution is to use boost.
My data is {"terms_id": [20211011,20211012,20211013,20211014],"sort_value":1} {"terms_id": [20211012,20211013,20211014],"sort_value":2} {"terms_id": [20211013,20211014,20211015],"sort_value":1}
My query is {"bool":{"must":[],"should":[{"bool":{"must":[{"terms":{"terms_id":[20211012],"boost":5}}],"must_not":[]}},{"bool":{"must_not":[{"terms":{"terms_id":[20211012]}}]}}],"minimum_should_match":1}}
My sort is {"_score":{"order":"desc"},"sort_value":{"order":"desc"}}
Result is {"terms_id": [20211012,20211013,20211014],"sort_value":2} {"terms_id": [20211011,20211012,20211013,20211014],"sort_value":1} {"terms_id": [20211013,20211014,20211015],"sort_value":1}
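The effect of that boost plus the two-level sort can be reproduced with a small Python sketch. The scores here are stand-ins (5.0 for a match, 0.0 otherwise), not the values ElasticSearch would actually compute; only the relative order matters for the illustration.

```python
from typing import Dict, List

docs: List[Dict] = [
    {"terms_id": [20211011, 20211012, 20211013, 20211014], "sort_value": 1},
    {"terms_id": [20211012, 20211013, 20211014], "sort_value": 2},
    {"terms_id": [20211013, 20211014, 20211015], "sort_value": 1},
]


def stand_in_score(doc: Dict, wanted: int, boost: float = 5.0) -> float:
    """Stand-in for _score: boosted when the wanted id is present, else 0."""
    return boost if wanted in doc["terms_id"] else 0.0


# Sort by score descending, then by sort_value descending.
ranked = sorted(
    docs,
    key=lambda d: (-stand_in_score(d, 20211012), -d["sort_value"]),
)
for d in ranked:
    print(d)
```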

Related

Elasticsearch - scripted sorting by date(s) field ascending with specific input date

Let's assume I have this kind of document (events):
[
{
"title": "Foo",
"dates": [
"2019-07-01",
"2019-07-15",
"2019-08-01"
]
},
{
"title": "Bar",
"dates": [
"2019-07-18"
]
}
]
I want to perform a search that finds all events happening in a given date span, for example all events that take place between the 10th of July (2019-07-10) and the 10th of August (2019-08-10).
The query itself won't be the problem, but I want to display the events in ascending order (so the "Foo" event is ranked first, because the 15th of July comes before the 18th). How do I sort that correctly?
I thought about using a script to sort my documents, maybe by filtering the dates so only valid dates remain and then using the timestamp of the first valid date for a simple numeric ordering. But how would I filter the dates in the script?
Could I use a "script field" and pass my fromDate and toDate to copy the filtered dates onto a new field (let's say validDates) and use THAT field for sorting?
BTW: the index will only contain a few thousand events.
UPDATE:
After some research it looks like this cannot be done without using nested objects instead of arrays. Please correct me if I am wrong; I would have preferred to use a simple array of dates over the nested type...
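Independent of how it is expressed in Elasticsearch, the ordering described here (keep only the dates inside the span, then order by the earliest remaining one) can be sketched in plain Python. This is only an illustration of the logic with made-up helper names; with just a few thousand events, applying something like it on the client side might also be an option.

```python
from datetime import date
from typing import Dict, List, Optional

events: List[Dict] = [
    {"title": "Foo", "dates": ["2019-07-01", "2019-07-15", "2019-08-01"]},
    {"title": "Bar", "dates": ["2019-07-18"]},
]


def first_date_in_span(dates: List[str], start: date, end: date) -> Optional[date]:
    """Earliest date inside [start, end], or None if no date falls in the span."""
    valid = [d for d in map(date.fromisoformat, dates) if start <= d <= end]
    return min(valid) if valid else None


def sort_events(items: List[Dict], start: date, end: date) -> List[Dict]:
    """Drop events without a date in the span; order the rest by their first valid date."""
    keyed = [(first_date_in_span(e["dates"], start, end), e) for e in items]
    matching = [(k, e) for k, e in keyed if k is not None]
    matching.sort(key=lambda pair: pair[0])
    return [e for _, e in matching]


ordered = sort_events(events, date(2019, 7, 10), date(2019, 8, 10))
print([e["title"] for e in ordered])
```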

Elasticsearch simultaneous sort

I have multiple ES indices: each table from the database is kept in a separate one.
Each of those indices has a different mapping, although they hold similar data. For dates I have:
In index1 - First_date which is datetype (2018-05-22).
In index2 - First_date which is integer (2018)
In index3 - justDate - integer (2017)
In index4 - date - string ("May 2018")
Is there a way of sorting by all of these fields simultaneously? I guess the answer might be script sorting, but I'm interested in whether this can be achieved in any other way.
If not, maybe the same can at least be done for fields with the same field type.
It could look like this:
POST index1,index2,index3,index4/_search
{
"sort": [
{"First_date": {"order": "desc"}},
{"justDate": {"order": "desc"}},
{"date": {"order": "desc"}}
]
}
But I assume you want to sort by all of the fields in date order, which this query will not give you.
Solving this with a script would add unnecessary calculations at query time.
I would suggest creating a date-typed field in each index and populating it at index time. In that case the query above will work as is.
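Populating such a field at index time means converting each of the four representations into one date. A minimal sketch of that conversion in Python follows; the function name to_sort_date is hypothetical, and mapping a bare year to January 1st and "May 2018" to the first of the month are assumptions you may want to change.

```python
from datetime import date, datetime
from typing import Union


def to_sort_date(value: Union[str, int]) -> date:
    """Normalize a full date, a bare year, or a 'Month YYYY' string to a date."""
    if isinstance(value, int):
        # Year only (index2, index3): assume the first day of that year.
        return date(value, 1, 1)
    for fmt in ("%Y-%m-%d", "%B %Y"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unsupported date value: {value!r}")


# One sample value per index from the question.
samples = ["2018-05-22", 2018, 2017, "May 2018"]
normalized = sorted((to_sort_date(v) for v in samples), reverse=True)
print([d.isoformat() for d in normalized])
```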

Elasticsearch: Aggregate documents based on date range

I have a set of documents in ElasticSearch 5.5 with two date fields: start_date and end_date.
I want to aggregate them into date histogram buckets (ex: weekly) such that if the start_date < week X < end_date, then document would be in "week X" bucket.
This means that a single document might be in multiple buckets.
Consider the following concrete example: I have a set of documents describing company employees, and for each employee you have hire date and (optionally) termination date. I want to build date histogram of number of active employees for trailing twelve months.
Sample doc content:
{
"start_date": "2013-01-12T00:00:00.000Z",
"end_date": "2016-12-08T00:00:00.000Z",
"id": "123123123"
}
Is there a way to do this in ES?
I have found one way to do this, using filter aggregations (https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-filter-aggregation.html). If I need, say, a report for the trailing 12 months, then I would create 12 buckets, where each bucket defines filter conditions such as:
"bool": {
  "must": [
    {
      "range": {
        "start_date": {
          "lte": "2016-01-01T00:00:00.000Z"
        }
      }
    },
    {
      "range": {
        "end_date": {
          "gt": "2016-02-01T00:00:00.000Z"
        }
      }
    }
  ]
}
However, I feel it would be nice if there were an easier way to do this: if I want, say, the trailing 365 days, I have to create 365 bucket filters, which makes the resulting query very large.
I know this question is quite old, but as it's still open I am sharing what I know. The question does not clearly explain what kind of output is expected, but I still think this can be achieved using the "Date Histogram Aggregation" and the "Bucket Script Aggregation".
Here are the documentation links for both of these aggregations.
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-bucket-datehistogram-aggregation.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-pipeline-bucket-script-aggregation.html
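If you stay with the per-bucket filters from the question, the query does not have to be written by hand; it can be generated. Below is a rough Python sketch that builds one filter per trailing month and wraps them in a filters aggregation. The helper names and the aggregation name active_by_month are made up, the dates are emitted as plain YYYY-MM-DD strings, and, like the filter in the question, it does not handle documents without an end_date.

```python
from datetime import date
from typing import Dict


def month_start(year: int, month: int, offset: int) -> date:
    """First day of the month that lies `offset` months from (year, month)."""
    index = year * 12 + (month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)


def trailing_month_filters(year: int, month: int, count: int = 12) -> Dict[str, dict]:
    """One named bool filter per month, ending with (year, month)."""
    filters: Dict[str, dict] = {}
    for offset in range(-(count - 1), 1):
        begin = month_start(year, month, offset)
        end = month_start(year, month, offset + 1)
        # Same shape as the filter in the question: started on or before the
        # first day of the month and ended after the start of the next month.
        filters[begin.strftime("%Y-%m")] = {
            "bool": {
                "must": [
                    {"range": {"start_date": {"lte": begin.isoformat()}}},
                    {"range": {"end_date": {"gt": end.isoformat()}}},
                ]
            }
        }
    return filters


month_filters = trailing_month_filters(2016, 12)
body = {
    "size": 0,
    "aggs": {"active_by_month": {"filters": {"filters": month_filters}}},
}
print(list(month_filters))
```

The same loop would work for days instead of months, although the resulting request still grows with the number of buckets.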

Elastic Search Scoring based on the date time fields

How do I write a custom scoring function in Elasticsearch based on a date field?
Can anyone help me write a custom scoring function in Elasticsearch based on a date field?
Other scoring functions are also used to calculate the score. On top of that, if I specify asc for the date field, I need to add score to the documents with the least recent dates, and if I specify desc, the score should be based on the most recent dates.
I bet what you are looking for is the so-called function score query (function_score).
For a date you could use field_value_factor. It will take your date value as milliseconds since the Unix epoch. So you should supply something like:
"field_value_factor": {
"field": "your_date_field",
"factor": 1,
"modifier": "none",
"missing": 1
}
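For context, here is one way the snippet above might be embedded in a full request body, written as a small Python helper. Treat it as a sketch: the wrapper function, the match_all inner query, the boost_mode of "replace" and the explicit sort on _score are my assumptions about what the asker wants, not something stated in the question.

```python
import json


def date_score_query(field: str, inner_query: dict, order: str = "desc") -> dict:
    """Build a function_score request body that scores documents by a date field."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                "field_value_factor": {
                    "field": field,
                    "factor": 1,
                    "modifier": "none",
                    "missing": 1,
                },
                # Assumption: use the date value as the score instead of
                # combining it with the inner query's score.
                "boost_mode": "replace",
            }
        },
        # desc puts the most recent dates first, asc the least recent.
        "sort": [{"_score": {"order": order}}],
    }


body = date_score_query("your_date_field", {"match_all": {}}, order="asc")
print(json.dumps(body, indent=2))
```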

Elastic Search - Sort by multiple fields with the missing parameter

I am trying to apply a sort to an Elastic Search query by two different fields:
price_sold and price_list
I would like to first sort on price_sold, but if that value is null, I would like to then sort by price_list
Would the query be correct if I just set the sorts to:
"sort": [
{ "price_sold": { "order": "desc"}},
{ "price_list": { "order": "desc"}}
]
I have executed the query, and I do not get any errors, and it seems like the results are correct, however I am curious if I have overlooked something.
I have been reading about the missing filter, along with possibly using a custom value. This may not be required, but I am not quite sure.
Would there be a way to define a second field to sort on if the first field is missing, or is that not necessary? Something like:
"sort": [{"price_sold": {"order": "desc", "missing": "doc['field_name']"}}]
Would simply adding these two sorts give me the desired result?
Thanks.
I think I understand what you're asking. In SQL terms, you'd like to ORDER BY COALESCE(price_sold, price_list) DESC.
The first sort you listed is a little different. It's similar to ORDER BY price_sold DESC, price_list DESC - in other words, primary sort is by price_sold, and for entries where price_sold is equal, secondary sort is by price_list.
Your second sort attempt would be great if "missing" worked that way. Unfortunately, missing's "custom" option appears to allow you to specify a constant value only.
If you don't need to limit your search using from and size, you should be able to use sort's _script option to write some logic that works for you. I ended up here because I do use from and size to retrieve batches, and when I sort by _script, the items I'm getting don't make sense - the items are sorted correctly, but I'm not getting the right set of items. So, I added a new analyzer and expanded my fields to use the new analyzer, and I was hoping to be able to sort using the new field or, if the new field doesn't exist (for previously-indexed items), use the old field's value instead. But that doesn't seem to be possible. I think I'm going to have to reindex my items so my new field is populated.
In case someone is still looking I ended up creating a script similar to this:
curl -XGET 'localhost:9200/_search?pretty&size=10&from=0' -H 'Content-Type: application/json' -d'
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "inline": "doc[\u0027price_sold\u0027].size() == 0 ? doc[\u0027price_list\u0027].value : doc[\u0027price_sold\u0027].value"
      },
      "order": "desc"
    }
  }
}
'
For sorting dates, the type still has to remain number but you replace .value with .date.getMillisOfDay() as discussed here.
The from and size worked fine in my version of ElasticSearch (5.1.1).
To make sure your algorithm is working fine check the generated value in the response, e.g.: "sort" : [ 5.0622E7 ].
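To check what the script is expected to return, the COALESCE-style ordering with from/size paging can be mimicked in plain Python. This is just a sanity-check sketch with invented sample documents, not ElasticSearch code.

```python
from typing import Dict, List

docs: List[Dict] = [
    {"id": 1, "price_sold": None, "price_list": 300.0},
    {"id": 2, "price_sold": 250.0, "price_list": 400.0},
    {"id": 3, "price_sold": None, "price_list": 100.0},
    {"id": 4, "price_sold": 500.0, "price_list": 450.0},
]


def coalesce_price(doc: Dict) -> float:
    """price_sold when present, otherwise price_list (like COALESCE in SQL)."""
    sold = doc.get("price_sold")
    return sold if sold is not None else doc["price_list"]


def page(items: List[Dict], from_: int = 0, size: int = 10) -> List[Dict]:
    """Sort descending by the coalesced price, then slice like from/size."""
    ranked = sorted(items, key=coalesce_price, reverse=True)
    return ranked[from_:from_ + size]


first_page = page(docs, from_=0, size=2)
second_page = page(docs, from_=2, size=2)
print([d["id"] for d in first_page], [d["id"] for d in second_page])
```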
