Elasticsearch: sort by field with null values last

I want to sort by name, but keep entries with null or missing price data at the end of the results.
I have tried:
sort: {
sorting.name: {
order: "asc"
},
prices.minprice.price: {
missing: "_last"
}
}
but this only sorts by name.

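You can use the missing option of the sort clause, which accepts _last, _first, or a custom value. Note that missing only affects the ordering on the field it is attached to, so prices.minprice.price has to be the first sort key for entries without a price to end up last.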
https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html#_missing_values

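To do this we can use a function score query with a script score (which, according to the Elasticsearch documentation, is faster than script-based sorting). The script assigns a score of 0 to documents whose price is missing and 1 to all others. It relies on doc['price'].value returning 0 for a missing numeric field, which holds for older Elasticsearch versions; on newer versions, check doc['price'].size() == 0 instead.
We then sort on _score first, which splits the results into a section with price data followed by a section without it, and sort each of these two sections by name.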
{
"query": {
"function_score": {
"boost_mode": "replace",
"query": {"match_all": {}},
"script_score": {
"script": "doc['price'].value == 0 ? 0 : 1"
}
}
},
"sort": [
"_score",
"name"
]
}
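The two-tier ordering described above can be checked outside Elasticsearch. The following is only a sketch in Python over in-memory dictionaries: the sample documents and the two_tier_sort helper are hypothetical, and it treats an absent or None price as missing instead of relying on the value 0.

```python
def two_tier_sort(docs):
    """Put docs that have a price first, and sort each tier by name."""
    def key(doc):
        has_price = doc.get("price") is not None
        # Mirrors sort ["_score", "name"]: score 1 (has price) comes before
        # score 0 (no price), then name ascending within each tier.
        return (0 if has_price else 1, doc["name"])
    return sorted(docs, key=key)


# Hypothetical sample documents, not taken from the question.
docs = [
    {"name": "delta", "price": None},
    {"name": "alpha"},
    {"name": "charlie", "price": 12.5},
    {"name": "bravo", "price": 3.0},
]

ordered_names = [d["name"] for d in two_tier_sort(docs)]
print(ordered_names)
```

Documents with a price are listed alphabetically first, and the ones without a price follow, also alphabetically.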

This has worked for me, using script-based sorting:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#_script_based_sorting
"sort":[
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"source":"return params._source.store[0].sku_available_unit == 0 ? 0 : 1;"
},
"order":"desc"
}
}
]

Related

Elasticsearch ordering by field value which is not in the filter

Can somebody please help me write a query that orders the result items by the value of a field that is not part of the query in the request? I have this query:
{
"_source": [
"ico",
"name",
"city",
"status"
],
"sort": {
"_score": "desc",
"status": "asc"
},
"size": 20,
"query": {
"bool": {
"should": [
{
"match": {
"normalized": {
"query": "idona",
"analyzer": "standard",
"boost": 3
}
}
},
{
"term": {
"normalized2": {
"value": "idona",
"boost": 2
}
}
},
{
"match": {
"normalized": "idona"
}
}
]
}
}
}
The result is sorted alphabetically ascending by the status field. Status contains a few values such as [active, canceled, old, ...], and I need something like a boost for every possible value in the query, e.g. active boost 5, canceled boost 4, old boost 3, and so on. Is it possible to do this? Thanks.
You need a custom script-based sort to achieve this.
I've used a generic match_all query here; you can replace it with your own query logic. The part you are looking for is the sort section of the query below.
Make sure that status is a keyword field.
Custom Sorting Based on Values
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{ "_score": "desc" },
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"if(params.scores.containsKey(doc['status'].value)) { return params.scores[doc['status'].value];} return 100000;",
"params":{
"scores":{
"active":5,
"old":4,
"cancelled":3
}
}
},
"order":"desc"
}
}
]
}
In the above query, add your status values to the scores section. The keys must match the indexed status values exactly (note that the question spells it canceled). For example, if you also have the value new and want it ranked above active, give it a higher number such as 6:
{
"scores":{
"active":5,
"old":4,
"cancelled":3,
"new":6
}
}
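So the documents are first sorted by _score, and documents with an equal _score are then ordered by the script sort.
Note that the script sort uses desc order, since I understand you want active documents at the top, followed by the other values. Because the order is desc, the fallback value 100000 puts documents with an unlisted status ahead of active ones among equal scores; return a small value such as 0 instead if they should come last. Feel free to play around with it.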
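To see how the two sort keys interact, here is a small Python emulation of the ordering. It is a sketch only: the hits are hypothetical, and STATUS_SCORES and UNKNOWN_STATUS_SCORE simply copy the values from the script's params and its fallback.

```python
# Values copied from the "scores" params of the script sort.
STATUS_SCORES = {"active": 5, "old": 4, "cancelled": 3}
# Same fallback the script returns for a status that is not listed.
UNKNOWN_STATUS_SCORE = 100000


def status_rank(status, scores=STATUS_SCORES):
    """Return the configured rank for a status, or the fallback."""
    return scores.get(status, UNKNOWN_STATUS_SCORE)


def order_hits(hits):
    """Sort by _score descending, then by status rank descending."""
    return sorted(hits, key=lambda h: (-h["_score"], -status_rank(h["status"])))


# Hypothetical hits for illustration.
hits = [
    {"id": "a", "_score": 2.0, "status": "old"},
    {"id": "b", "_score": 2.0, "status": "active"},
    {"id": "c", "_score": 3.0, "status": "cancelled"},
    {"id": "d", "_score": 2.0, "status": "archived"},  # not in STATUS_SCORES
]

ordered_ids = [h["id"] for h in order_hits(hits)]
print(ordered_ids)
```

The hit with the highest _score stays first regardless of status; the status rank only decides the order among hits whose _score is equal, and the unlisted status sorts ahead of active because of the large fallback.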
Hope this helps!

Is it possible to sort by a range in Elasticsearch?

When I execute the following query:
{
"query": {
"bool": {
"filter": [
{
"match": {
"my_value": "hi"
}
},
{
"range": {
"my_range": {
"gt": 0,
"lte": 200
}
}
}
]
}
},
"sort": {
"my_range": {
"order": "asc",
"mode": "min"
}
}
}
I get the error:
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is not supported on field [my_range] of type [long_range]"
}
How can I enable a range datatype to be sortable? Is this possible?
Elasticsearch version: 5.4, but I am wondering if this is possible with ANY version.
More context
Not all documents in the alias/index have the range field. However, the query filters to only include documents with that field.
Sorting on a field of a range data type is not straightforward. You can still use script-based sorting to get the expected result to some extent.
For simplicity, the script below assumes that every document's my_range field is indexed with gt and lte only, and that you want to sort on the smaller of the two values. In that case you can add the following sort:
{
"query": {
"bool": {
"filter": [
{
"match": {
"my_value": "hi"
}
},
{
"range": {
"my_range": {
"gt": 0,
"lte": 200
}
}
}
]
}
},
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"inline": "Math.min(params['_source']['my_range']['gt'], params['_source']['my_range']['lte'])"
},
"order": "asc"
}
}
}
You can modify the script as per your needs for complex data involving combination of all lt, gt, lte, gte.
Updates (Scripts for other different use cases):
1. Sort by difference
"Math.abs(params['_source']['my_range']['gt'] - params['_source']['my_range']['lte'])"
2. Sort by gt
"params['_source']['my_range']['gt']"
3. Sort by lte
"params['_source']['my_range']['lte']"
4. Sorting when the query returns some docs that don't have the range field
"if(params['_source']['my_range'] != null) { <sorting logic> } else { return 0; }"
Replace <sorting logic> with the required sorting logic (one of the three scripts above or the one in the query).
return 0 can be replaced by return -1 or any other number, depending on the sorting needs.
I think what you are looking for is sorting based on the width of the range, because I'm not sure that sorting on just one of the range values makes sense.
For example, if the range for one document is 100 to 300 and for another 200 to 600, you would want the narrower range (300 - 100 = 200) to appear at the top.
If so, the Painless script below implements this with script-based sorting.
Sorting based on difference in Range
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"params._source.my_range.lte-params._source.my_range.gte"
},
"order":"asc"
}
}
}
Note that in this case the sort is not based on any individual value of my_range but only on the difference between them. If you want to sort further on a bound such as lte, lt, gte or gt, you can add multiple script sorts as below:
Sorting based on difference in Range + Range Field (my_range.lte)
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"params._source.my_range.lte - params._source.my_range.gte"
},
"order":"asc"
}
},
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"params._source.my_range.lte"
},
"order":"asc"
}
}
]
}
So in this case, even if two documents have ranges of the same width, the one with the smaller my_range.lte shows up first.
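The width-then-upper-bound ordering can be emulated in a few lines of Python. This is a sketch under the assumption that every document has both gte and lte; the documents and the helper name are hypothetical.

```python
def width_then_upper(doc):
    """Sort key: width of the range first, then its upper bound."""
    bounds = doc["my_range"]
    return (bounds["lte"] - bounds["gte"], bounds["lte"])


# Hypothetical documents; ids 1 and 3 have the same width (200).
docs = [
    {"id": 1, "my_range": {"gte": 100, "lte": 300}},
    {"id": 2, "my_range": {"gte": 200, "lte": 600}},
    {"id": 3, "my_range": {"gte": 50, "lte": 250}},
]

ordered_ids = [d["id"] for d in sorted(docs, key=width_then_upper)]
print(ordered_ids)
```

The two ranges of width 200 come before the range of width 400, and between them the one with the smaller upper bound is listed first.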
Sort based on range field
However, if you simply want to sort on one of the range values, you can use the query below.
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"params._source.my_range.lte"
},
"order":"asc"
}
}
}
Updated Answer to manage documents without range
This covers the scenario: sort based on the difference in range, then on Range.lte or Range.lt, whichever is present.
The code below does the following:
Checks whether the document has the my_range field.
If it does not, it returns Long.MAX_VALUE by default, so when sorting asc this document is returned last.
Otherwise it checks whether the document has lte or lt and uses that value as high. The default value of high is Long.MAX_VALUE.
Similarly, it checks whether the document has gte or gt and uses that value as low. The default value of low is 0.
It then calculates high - low, which is the value the sort is applied to.
Updated Query
POST <your_index_name>/_search
{
"size":100,
"query":{
"match_all":{
}
},
"sort":[
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"""
if(params._source.my_range==null){
return Long.MAX_VALUE;
} else {
long high = Long.MAX_VALUE;
long low = 0L;
if(params._source.my_range.lte!=null){
high = params._source.my_range.lte;
} else if(params._source.my_range.lt!=null){
high = params._source.my_range.lt;
}
if(params._source.my_range.gte!=null){
low = params._source.my_range.gte;
} else if (params._source.my_range.gt!=null){
low = params._source.my_range.gt;
}
return high - low;
}
"""
},
"order":"asc"
}
},
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"""
if(params._source.my_range==null){
return Long.MAX_VALUE;
}
long high = Long.MAX_VALUE;
if(params._source.my_range.lte!=null){
high = params._source.my_range.lte;
} else if(params._source.my_range.lt!=null){
high = params._source.my_range.lt;
}
return high;"""
},
"order":"asc"
}
}
]
}
This should work with ES 5.4. Hope it helps!
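The steps listed for the first script can also be written as a plain Python function, which makes the defaults easy to test in isolation. This is a sketch, not the Painless script itself: LONG_MAX stands in for Long.MAX_VALUE, the function name is hypothetical, and the lower bound falls back to gt when gte is absent, as the description states.

```python
LONG_MAX = 2**63 - 1  # stand-in for Java's Long.MAX_VALUE


def width_sort_value(source):
    """Return high - low for a document, using the defaults described above."""
    my_range = source.get("my_range")
    if my_range is None:
        # No range at all: sort last when the order is ascending.
        return LONG_MAX

    high = LONG_MAX
    low = 0
    # Upper bound: prefer lte, fall back to lt.
    if my_range.get("lte") is not None:
        high = my_range["lte"]
    elif my_range.get("lt") is not None:
        high = my_range["lt"]
    # Lower bound: prefer gte, fall back to gt.
    if my_range.get("gte") is not None:
        low = my_range["gte"]
    elif my_range.get("gt") is not None:
        low = my_range["gt"]
    return high - low


print(width_sort_value({"my_range": {"gt": 10, "lt": 50}}))
print(width_sort_value({"name": "no range here"}) == LONG_MAX)
```

A document without my_range gets the largest value, and a range with no upper bound stays close to that value, so both end up near the end of an ascending sort.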
This can be resolved easily by using the interval operator of the regexp query:
Interval The interval option enables the use of numeric ranges,
enclosed by angle brackets "<>". For string: "foo80":
foo<1-100> # match
foo<01-100> # match
foo<001-100> # no match
Enabled with the INTERVAL or ALL flags.
Elastic docs
{
"query": {
"bool": {
"filter": [
{
"match": {
"my_value": "hi"
}
},
{
"regexp": {
"my_range": {
"value": "<0-200>"
}
}
}
]
}
},
"sort": {
"my_range": {
"order": "asc",
"mode": "min"
}
}
}

Return distinct values in Elasticsearch

I am trying to solve an issue where I need distinct results in a search.
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "GEORGE",
"favorite_cars" : [ "honda","Hyundae" ]
}
When I perform a term query on favorite_cars for "ferrari", I get two results whose name is ABC. I want only one result to be returned in this case. So my requirement is to apply a distinct on the name field so that I receive just one result.
Thanks
One way to achieve what you want is to use a terms aggregation on the name field and then a top_hits sub-aggregation with size 1, like this:
{
"size": 0,
"query": {
"term": {
"favorite_cars": "ferrari"
}
},
"aggs": {
"names": {
"terms": {
"field": "name"
},
"aggs": {
"single_result": {
"top_hits": {
"size": 1
}
}
}
}
}
}
That way, you'll get a single term ABC with a single matching document nested inside it.
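Reading the distinct documents back out of such a response could look like the Python sketch below. The distinct_hits helper and the sample_response are hypothetical; the sample is hand-written and trimmed to the keys the helper reads, using the aggregation names names and single_result from the query above.

```python
def distinct_hits(response, agg_name="names", sub_agg_name="single_result"):
    """Collect the _source of the single top hit from every terms bucket."""
    buckets = response["aggregations"][agg_name]["buckets"]
    results = []
    for bucket in buckets:
        hits = bucket[sub_agg_name]["hits"]["hits"]
        if hits:  # top_hits with size 1 yields at most one hit per bucket
            results.append(hits[0]["_source"])
    return results


# Hand-written, trimmed sample response for illustration only.
sample_response = {
    "aggregations": {
        "names": {
            "buckets": [
                {
                    "key": "ABC",
                    "doc_count": 2,
                    "single_result": {
                        "hits": {
                            "hits": [
                                {"_source": {"name": "ABC",
                                             "favorite_cars": ["ferrari", "toyota"]}}
                            ]
                        }
                    },
                }
            ]
        }
    }
}

unique_docs = distinct_hits(sample_response)
print(unique_docs)
```

Although the bucket reports a doc_count of 2, only one document per name is returned.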

How to check field data is numeric when using inline Script in ElasticSearch

Per our requirement, we need to find the max ID of the documents before adding a new document. The problem is that a doc may also contain string data, so I had to use an inline script in the Elasticsearch query to find the max ID only for documents that have integer data, returning 0 otherwise. I am using the following inline script query to find the max key, but it is not working. Can you help me with this?
{
"size":0,
"query":
{"bool":
{"filter":[
{"term":
{"Name":
{
"value":"Test2"
}
}}
]
}},
"aggs":{
"MaxId":{
"max":{
"field":"Key","script":{
"inline":"((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"}}
}
}
}
The error occurs because the max aggregation only supports numeric fields, i.e. you cannot specify a string field (here, Key) in a max aggregation.
Simply remove the "field":"Key" part and keep only the script part:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"Name": "Test2"
}
}
]
}
},
"aggs": {
"MaxId": {
"max": {
"script": {
"source": "((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"
}
}
}
}
}
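The intent of the script, counting numeric keys at face value and everything else as 0, can be expressed in Python for quick verification. This is a sketch with hypothetical helper names and sample keys; it only treats plain ASCII digit strings as numeric, which may be narrower or wider than what the script's isNumber() accepts.

```python
def numeric_key_or_zero(key):
    """Return the key as an int if it consists of ASCII digits only, else 0."""
    text = str(key).strip()
    return int(text) if text.isascii() and text.isdigit() else 0


def max_id(keys):
    """Largest numeric key, or 0 when there are no numeric keys at all."""
    return max((numeric_key_or_zero(k) for k in keys), default=0)


# Hypothetical Key values mixing numeric and non-numeric strings.
sample_keys = ["12", "abc", "7", "105x", "40"]
print(max_id(sample_keys))
```

Non-numeric keys such as "abc" and "105x" contribute 0, so they never win over a real numeric ID.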

In a should bool clause in elasticsearch what is default score mode?

I am writing three queries inside the should clause of my bool query in ES, and I was wondering what the default score_mode of the should clause is. I want to use the maximum of the three query scores; how can I achieve that? My query is given below. Where do I define score_mode?
bool: {
"disable_coord": true,
"should": [
{
term : { 'address.area2' : search_area2 }
},
{
term : { "address.area1" : search_area1 }
},
{
term : { 'address.city' : search_city }
}
], "boost": 2.0
}
From the Bool Query docs:
The bool query takes a more-matches-is-better approach, so the score
from each matching must or should clause will be added together to
provide the final _score for each document.
To override that behavior, wrap your bool query in a Function Score Query. You can define a Field Value Factor function for address.area2, one for address.area1 and one for address.city, then use max as score_mode. Note that field_value_factor reads a numeric field value, so this only applies if these fields are numeric.
The resulting function score query should look something like the following (I did not try it, so you may have to modify it a bit):
"function_score": {
"query": YOUR_BOOL_QUERY,
"boost": 2,
"functions": [
{
"field_value_factor": {
"field": "address.area2",
"factor": 1
}
},
{
"field_value_factor": {
"field": "address.area1",
"factor": 1
}
},
{
"field_value_factor": {
"field": "address.city",
"factor": 1
}
}
],
"score_mode": "max",
"boost_mode": "replace"
}
UPDATE:
Added "boost_mode": "replace" according to the docs, because we want to ignore the query score and use only the function score.
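The difference between the bool query's summing behavior and a max score_mode combined with boost_mode replace can be illustrated with a little arithmetic in Python. This is a sketch with hypothetical numbers and helper names, and it emulates only the sum and max score modes and the replace and multiply boost modes.

```python
def combine(values, score_mode):
    """Combine several scores the way the named mode does (subset only)."""
    if score_mode == "sum":
        return sum(values)
    if score_mode == "max":
        return max(values)
    raise ValueError(f"unsupported score_mode: {score_mode}")


def apply_boost_mode(query_score, function_score, boost_mode):
    """Merge the query score with the combined function score (subset only)."""
    if boost_mode == "replace":
        return function_score  # the query score is ignored
    if boost_mode == "multiply":
        return query_score * function_score
    raise ValueError(f"unsupported boost_mode: {boost_mode}")


# Hypothetical scores of the three matching should clauses.
clause_scores = [1.0, 0.5, 2.0]
# Hypothetical outputs of the three functions in the function_score query.
function_values = [1.0, 0.5, 2.0]

bool_score = combine(clause_scores, "sum")        # what the bool query produces
max_score = combine(function_values, "max")       # score_mode: max
final_score = apply_boost_mode(bool_score, max_score, "replace")
print(bool_score, final_score)
```

With replace, the summed bool score no longer influences the result; only the largest function value remains.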

Resources