Ordering a string with 'leading zeros' in Elastic - sorting

Is there a way to order the following items by their text and numeric value together, instead of purely as strings?
ABC0001
BCA0001
ABC0003
ABC00002
Currently I get this order:
ABC00002
ABC0001
ABC0003
BCA0001
But I want it like this:
ABC0001
ABC00002
ABC0003
BCA0001
The config of the field is this:
'type' => 'keyword',
'normalizer' => 'lowercase_normalizer',
I didn't see anything in the docs about it.

If your strings always consist of three letters followed by a zero-padded number, you can resort to script-based sorting. The following script sorts them the way you want:
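POST test/_search
{
  "sort": {
    "_script": {
      "type": "string",
      "script": {
        "lang": "painless",
        "source": """
          String letters = doc['field'].value.substring(0, 3);
          String digits = String.valueOf(Integer.parseInt(doc['field'].value.substring(3)));
          // Left-pad the number to a fixed width so that e.g. abc2 sorts before abc10
          while (digits.length() < 10) { digits = '0' + digits; }
          return letters + digits;
        """
      },
      "order": "asc"
    }
  }
}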
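To check the sort key outside Elasticsearch, here is a minimal Python sketch of the same logic. It assumes a fixed three-letter prefix and a 10-digit pad width (both hypothetical parameters), and it also shows why the number has to be padded back to a fixed width before a string comparison: without padding, "ABC10" sorts before "ABC2".

```python
def sort_key(value: str, prefix_len: int = 3, width: int = 10) -> str:
    """Letters followed by the numeric part, re-padded to a fixed width."""
    letters = value[:prefix_len]
    number = int(value[prefix_len:])  # "00002" -> 2, drops the original padding
    return letters + str(number).zfill(width)


def unpadded_key(value: str, prefix_len: int = 3) -> str:
    """Letters plus the bare integer: breaks once numbers reach two digits."""
    return value[:prefix_len] + str(int(value[prefix_len:]))


if __name__ == "__main__":
    items = ["ABC0001", "BCA0001", "ABC0003", "ABC00002"]
    print(sorted(items, key=sort_key))
    # ['ABC0001', 'ABC00002', 'ABC0003', 'BCA0001']
    print(sorted(["ABC0010", "ABC0002"], key=unpadded_key))
    # ['ABC0010', 'ABC0002']  (wrong: "ABC10" < "ABC2" as strings)
    print(sorted(["ABC0010", "ABC0002"], key=sort_key))
    # ['ABC0002', 'ABC0010']
```

The same reasoning applies to the lowercased values the lowercase_normalizer produces, since only the letter case changes.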
However, a sort script runs for every matching document at search time, so it can hurt query performance on large volumes of data. A better way is to compute this sort key at indexing time instead, for example in a separate field or sub-field, and sort on that directly. It should be fairly easy to do.
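One way to move the work to indexing time, sketched in Python under the assumption that documents are prepared client-side before they are indexed: compute the padded key once and store it in an extra keyword field. The names `with_sort_key` and `field_sort` are hypothetical, and an ingest pipeline would be another place to do the same computation.

```python
def with_sort_key(source: dict, field: str = "field", target: str = "field_sort",
                  prefix_len: int = 3, width: int = 10) -> dict:
    """Return a copy of the document with a precomputed sort key added.

    The key is lowercased (like lowercase_normalizer) and the numeric part is
    zero-padded to a fixed width, so a plain keyword sort orders it correctly.
    """
    value = source[field].lower()
    key = value[:prefix_len] + str(int(value[prefix_len:])).zfill(width)
    return {**source, target: key}


# Hypothetical mapping: the original field plus a keyword field holding the key.
MAPPING = {
    "properties": {
        "field": {"type": "keyword", "normalizer": "lowercase_normalizer"},
        "field_sort": {"type": "keyword"},
    }
}

if __name__ == "__main__":
    docs = [{"field": v} for v in ["ABC0001", "BCA0001", "ABC0003", "ABC00002"]]
    enriched = [with_sort_key(d) for d in docs]
    print([d["field"] for d in sorted(enriched, key=lambda d: d["field_sort"])])
```

The search request would then use a plain field sort such as "sort": [{"field_sort": "asc"}], with no script involved.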

Related

Custom sort lexicographically as int

I have some Elasticsearch documents with a string property that looks like 10/2021, and it needs to be sorted as an int, but when I perform this query
"sort": [
  {
    "myProperty": {
      "order": "asc"
    }
  }
]
I get the lexicographic order:
1/2021
10/2021
100/2021
101/2021
102/2021
But I need it to sort by the first number and the year like this:
1/2020
2/2020
...
1/2021
2/2021
I can't figure out how to custom sort, is it even possible?
Solution 1:
Use a script-based sort.
Not recommended with large datasets: the script performs a computation for every matching document, which takes time.
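GET <>/_search
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": """
          String v = doc['myProperty.keyword'].value;
          int slash = v.indexOf('/');
          // Sort by year first, then by the number before the slash
          long year = Long.parseLong(v.substring(slash + 1));
          long number = Long.parseLong(v.substring(0, slash));
          return year * 1000000L + number;
        """
      }
    }
  }
}
Replace myProperty.keyword with your keyword field (or a text field with fielddata enabled).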
Note: I haven't added null checks to the script; you will need them if any document doesn't have this field.
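Why the key has to put the year first, shown with a small Python sketch on made-up values: simply removing the "/" turns "1/2021" into 12021 and "10/2020" into 102020, which mixes the years together. Combining year * 1,000000 + number orders by year and then by number, assuming the number before the slash always stays below one million.

```python
def concat_key(value: str) -> int:
    """Key produced by stripping the slash: "10/2021" -> 102021."""
    return int(value.replace("/", ""))


def year_then_number_key(value: str) -> int:
    """Key that orders by year first, then by the number before the slash."""
    number, year = value.split("/")
    return int(year) * 1_000_000 + int(number)


if __name__ == "__main__":
    values = ["1/2021", "10/2020", "2/2020", "1/2020", "2/2021"]
    print(sorted(values, key=concat_key))
    # ['1/2020', '1/2021', '2/2020', '2/2021', '10/2020']  (years interleaved)
    print(sorted(values, key=year_then_number_key))
    # ['1/2020', '2/2020', '10/2020', '1/2021', '2/2021']
```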
Solution 2:
Store another numeric field in Elasticsearch that encodes the value without the "/" (for example, the year and the number combined so that it sorts by year first).
Sort on that field.
Migrate the data of existing documents into the new field using the update_by_query API.
This is the recommended approach.
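A sketch of the migration step, written in Python that only builds the request bodies (sending them, for example from the Kibana console or an HTTP client, is left out). The target field name myProperty_sort is hypothetical, the key uses the year * 1000000 + number encoding, and every value is assumed to look like number/year. The mapping body is meant for PUT <index>/_mapping and the backfill body for POST <index>/_update_by_query; the Painless inside the string is untested.

```python
import json


def sort_field_mapping(target_field: str = "myProperty_sort") -> dict:
    """Body for PUT <index>/_mapping: declare the new field as a long."""
    return {"properties": {target_field: {"type": "long"}}}


def backfill_request(source_field: str = "myProperty",
                     target_field: str = "myProperty_sort") -> dict:
    """Body for POST <index>/_update_by_query that fills the numeric sort field.

    Assumes every source value looks like "<number>/<year>", e.g. "10/2021".
    """
    painless = (
        f"String v = ctx._source.{source_field}; "
        "int slash = v.indexOf('/'); "
        f"ctx._source.{target_field} = "
        "Long.parseLong(v.substring(slash + 1)) * 1000000L "
        "+ Long.parseLong(v.substring(0, slash));"
    )
    return {
        # Only touch documents that have a value but no sort key yet.
        "query": {
            "bool": {
                "filter": {"exists": {"field": source_field}},
                "must_not": {"exists": {"field": target_field}},
            }
        },
        "script": {"lang": "painless", "source": painless},
    }


if __name__ == "__main__":
    print(json.dumps(sort_field_mapping(), indent=2))
    print(json.dumps(backfill_request(), indent=2))
```

After the backfill, searches can sort on myProperty_sort directly; new documents would need the same value set when they are indexed.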

Sorting on a text field that stores integer values in Elasticsearch

We have a field in our index, TempNo, which has to be of type text, but all values in this field are integers.
When I sort (desc) on this field, the sort does not happen correctly: I am not getting results in descending order of TempNo.
It seems this is because of the text type. How can I sort it correctly? (The type is text, but the sorting should be numeric.)
Thanks,
Gopal
By default, Elasticsearch does not sort or aggregate on text fields, because fielddata is disabled for them.
There are two ways to fix this:
1. Change TempNo from text to integer. It will then sort correctly (changing the type of an existing field requires reindexing).
2. If you must keep the text type, add a keyword ("raw") sub-field for TempNo (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) and use Painless to sort by its numeric value:
GET my_index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "type": "number",
      "order": "desc",
      "script": {
        "lang": "painless",
        "source": """
          // Read the keyword sub-field (assumed here to be named TempNo.raw);
          // the text field itself has no doc values to read from.
          String s = doc['TempNo.raw'].value;
          return Integer.parseInt(s);
        """
      }
    }
  }
}
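Why the parse matters, in a small Python sketch with made-up TempNo values: sorting numeric strings as strings puts "9" ahead of "100" in descending order, while parsing them to integers first gives the expected order. This only models the comparison, not Elasticsearch itself.

```python
def lexicographic_desc(values: list) -> list:
    """What a string sort (e.g. on a keyword field) gives for numeric text."""
    return sorted(values, reverse=True)


def numeric_desc(values: list) -> list:
    """What the script sort does: compare the parsed integers instead."""
    return sorted(values, key=int, reverse=True)


if __name__ == "__main__":
    temp_nos = ["9", "10", "100", "2"]
    print(lexicographic_desc(temp_nos))  # ['9', '2', '100', '10']
    print(numeric_desc(temp_nos))        # ['100', '10', '9', '2']
```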

ElasticSearch - Access array of date_range in painless script filter

Is there any way to access an array of date_range in a painless script filter?
My mapping for the "blocked_dates" field is as follows:
"blocked_dates": {
  "type": "date_range",
  "format": "strict_date"
},
Data looks like this:
"blocked_dates": [
  {
    "gte": "2019-07-12",
    "lte": "2019-07-14"
  },
  {
    "gte": "2019-07-16",
    "lte": "2019-07-18"
  }
],
I am using Amazon Elasticsearch v6.7, so I cannot use params._source in a script filter, and if I try to access it via doc, I get an illegal_argument_exception:
"blocked_dates = doc['blocked_dates'].value; ",
" ^---- HERE"
Fielddata is not supported on field [blocked_dates] of type [date_range]
I have a complex booking-window requirement that checks whether the chosen move-in and move-out dates are within x days of another booking (a blocked date), and this has to be done in a script.
I could do something hacky, like storing a copy of the date_range array as a string array of comma-delimited pairs (i.e. "2019-07-20,2019-09-12"), then reading the string array in the painless script filter and parsing the dates out of it.
But that is my last resort.
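Try reading gte (or lte, depending on your needs) from params._source.blocked_dates. Since blocked_dates is an array, you need to iterate over it or index into it (e.g. params._source.blocked_dates[0].gte). Keep in mind that in your case the value is returned as a string, not a date.
In my case (a float_range field), the solution was: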
"script": {
  "lang": "painless",
  "source": "Float.parseFloat(params._source.price.gte)"
}
I think the idea is clear.
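Once the bounds come back as strings, they still have to be parsed before any date arithmetic. The Python sketch below shows that parse-then-compare step for the booking rule. Interpreting "within x days of another booking" as "widen the requested stay by x days on each side and check for overlap", as well as the function names and the inclusive bounds, are assumptions.

```python
from datetime import date, timedelta


def parse_ranges(blocked: list) -> list:
    """Turn the string bounds from _source into (start, end) date pairs."""
    return [(date.fromisoformat(r["gte"]), date.fromisoformat(r["lte"]))
            for r in blocked]


def too_close(move_in: date, move_out: date, blocked: list, buffer_days: int) -> bool:
    """True if the stay, widened by buffer_days on each side, overlaps any
    blocked range. Bounds are treated as inclusive."""
    start = move_in - timedelta(days=buffer_days)
    end = move_out + timedelta(days=buffer_days)
    return any(start <= hi and lo <= end for lo, hi in parse_ranges(blocked))


if __name__ == "__main__":
    blocked_dates = [
        {"gte": "2019-07-12", "lte": "2019-07-14"},
        {"gte": "2019-07-16", "lte": "2019-07-18"},
    ]
    print(too_close(date(2019, 7, 20), date(2019, 7, 25), blocked_dates, 1))  # False
    print(too_close(date(2019, 7, 20), date(2019, 7, 25), blocked_dates, 2))  # True
```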

How can you sort by the lesser of two dates in Elasticsearch?

Scenario:
I have data that has two date fields (created_at and published_on). In some cases the dates are not right, and I need to sort by the lesser (older) of the two dates.
In other words, for each item, figure out the older date, and then use that to sort on.
I see from the documentation that you can sort by fields sequentially (e.g. sort by name and then by city), and I see some numeric sort options (min/max, etc.), but I'm struggling with how to achieve this.
This post describes how this works in something like PostgreSQL but I know that Elastic is a completely different animal, so I'm hoping there's some way to do this.
Thanks!
You need to extract one value from the two fields and then sort on that value, which requires script-based sorting. In your query DSL, add the sort parameter as follows:
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "source": "if(doc['created_at'].value.getMillis() < doc['published_on'].value.getMillis()) { return doc['created_at'].value.getMillis(); } else { return doc['published_on'].value.getMillis(); }",
        "lang": "painless"
      },
      "order": "desc"
    }
  }
}
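The same "sort by the older of two dates" logic, as a Python sketch on made-up items, to show what the script computes per document and what "order": "desc" does with it: the document whose older date is most recent comes first.

```python
from datetime import datetime


def older_date(item: dict) -> datetime:
    """The value the script returns: the lesser (older) of the two dates."""
    return min(item["created_at"], item["published_on"])


items = [
    {"id": "a", "created_at": datetime(2020, 5, 1), "published_on": datetime(2020, 1, 1)},
    {"id": "b", "created_at": datetime(2020, 3, 1), "published_on": datetime(2020, 4, 1)},
    {"id": "c", "created_at": datetime(2019, 12, 1), "published_on": datetime(2021, 1, 1)},
]

if __name__ == "__main__":
    # Descending, like "order": "desc" in the query above.
    print([i["id"] for i in sorted(items, key=older_date, reverse=True)])  # ['b', 'a', 'c']
```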

Query DSL terms filtering with script for day by numeric value

Within aggs I am able to get buckets by day of the week that are represented in numeric (1-7) keys using something like this:
"aggs": {
  "group_by_day": {
    "terms": {
      "script": "doc['#timestamp'].date.dayOfWeek",
      "order": {
        "_key": "asc"
      }
    }
  }
}
However, I am looking for a way to add something like this to the query's filtering terms clause, so that only results for a Monday or Tuesday are shown, and I haven't been able to get it working.
I have tried:
{
  "terms": {
    "script": "doc['#timestamp'].date.dayOfWeek"
  }
}
The script tag doesn't seem to be supported in a terms query, at least not the way I am attempting to use it. Is there another way to filter with a script, or another (better) approach to get what I am trying to achieve? I am using 6.2. Thanks!
Here it is:
"script": {
  "script": {
    "source": "doc['#timestamp'].date.dayOfWeek == 1"
  }
}
I handle the string-to-number conversion outside of this query; this clause sits within a query.bool.must clause.
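To match more than one day (Monday or Tuesday, as the question asks), the day numbers can be passed as script params instead of hard-coding == 1. The Python sketch below models the filter locally and builds such a clause; `on_days` and `day_filter` are hypothetical helpers, and the params-based Painless source is an untested assumption. It relies on Joda's dayOfWeek (behind the question's 1-7 buckets) and Python's isoweekday() both numbering Monday as 1.

```python
from datetime import datetime

# isoweekday(): Monday=1 ... Sunday=7, the same numbering as Joda's dayOfWeek.
MONDAY, TUESDAY = 1, 2


def on_days(timestamps: list, days: set) -> list:
    """Keep only timestamps whose ISO day of week is in `days`."""
    return [ts for ts in timestamps if ts.isoweekday() in days]


def day_filter(days: set) -> dict:
    """Script query clause testing the same condition inside Elasticsearch."""
    return {
        "script": {
            "script": {
                "source": "params.days.contains(doc['#timestamp'].date.dayOfWeek)",
                "params": {"days": sorted(days)},
            }
        }
    }


if __name__ == "__main__":
    ts = [datetime(2024, 1, 1), datetime(2024, 1, 2), datetime(2024, 1, 3)]
    print(on_days(ts, {MONDAY, TUESDAY}))  # 2024-01-01 (Mon) and 2024-01-02 (Tue)
    print(day_filter({MONDAY, TUESDAY}))
```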
