ElasticSearch - sort search results by relevance and custom field (Date) - elasticsearch

For example, I have entities with two fields, Text and Date. I want to search these entities and get the results sorted by Date, but when I simply sort by Date the result is unexpected.
For the search query "Iphone 6", the top of the results contains the newest texts that match only "6", not "iphone 6". Without sorting, the results look good, but they are not ordered by Date as I want.
How can I write a custom sort function that considers both relevance and Date? Or is there a way to give the Date field a weight that is taken into account in scoring?
In addition, I may want to suppress results that match only "6". How can I customize the search to match only on bigrams, for example?

Have you tried a bool query like this?
{
"query": {
"bool": {
"must": {
"match": {
"field": "iphone 6"
}
}
}
},
"sort": {
"date": {
"order": "desc"
}
}
}
Or, with your existing query, you can do the following, which I think is the more appropriate way.
Just add this as the sort:
"sort": [
{ "date": { "order": "desc" }},
{ "_score": { "order": "desc" }}
]
All matching results are sorted first by date, then by relevance.

The solution is to use both _score and the date field in sort, with _score as the primary sort key and the date field as the secondary one.
You can use a simple match query for the relevance match.
Try it out.
Data setup:
POST ecom/prod
{
"name":"iphone 6",
"date":"2019-02-10"
}
POST ecom/prod
{
"name":"iphone 5",
"date":"2019-01-10"
}
POST ecom/prod
{
"name":"iphone 6",
"date":"2019-02-28"
}
POST ecom/prod
{
"name":"6",
"date":"2019-03-01"
}
Query for relevance and date based sorting:
POST ecom/prod/_search
{
"query": {
"match": {
"name": "iphone 6"
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"date": {
"order": "desc"
}
}
]
}
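To see why the order of the sort keys matters, here is a minimal sketch in plain Python that mimics the two orderings on the sample data above. It does not talk to Elasticsearch; the score values are made-up stand-ins for _score, and sort_hits is a hypothetical helper used only for illustration.

```python
from datetime import date

# Hypothetical hits: "score" is a made-up stand-in for Elasticsearch's _score.
hits = [
    {"name": "iphone 6", "date": date(2019, 2, 10), "score": 2.0},
    {"name": "iphone 5", "date": date(2019, 1, 10), "score": 1.0},
    {"name": "iphone 6", "date": date(2019, 2, 28), "score": 2.0},
    {"name": "6", "date": date(2019, 3, 1), "score": 1.0},
]


def sort_hits(hits, keys):
    """Sort descending on each key in turn; earlier keys take priority."""
    return sorted(hits, key=lambda h: tuple(h[k] for k in keys), reverse=True)


# Date first: the newest document wins, even if it only matches "6".
by_date_first = sort_hits(hits, ["date", "score"])

# Score first: the best matches win, and date only breaks ties between equal scores.
by_score_first = sort_hits(hits, ["score", "date"])

for hit in by_score_first:
    print(hit["name"], hit["date"], hit["score"])
```

Keep in mind that real scores are floats that often differ slightly between documents, so the secondary key only comes into play when two scores are exactly equal.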

You could definitely use a phrase matching query for this.
It does position-aware matching, so a document is considered a match only if both "iphone" and "6" occur in the searched field and their occurrences respect this order, with "iphone" appearing directly before "6".
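As a rough sketch of what such a request body could look like, the snippet below builds a match_phrase query as a Python dict and prints it as JSON. The field name "name" is an assumption borrowed from the sample data in the other answer, and phrase_query is a hypothetical helper, not part of any client library.

```python
import json


def phrase_query(field, text, slop=0):
    """Build a search body that matches `text` only as an ordered phrase.

    slop=0 means the terms must be adjacent; larger values tolerate gaps.
    """
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


# Assumed field name "name"; replace it with the field you actually search on.
body = phrase_query("name", "iphone 6")
print(json.dumps(body, indent=2))
```

With this body, a document containing only "6" should no longer match, which also addresses the second part of the question.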

It looks like you want to sort first by relevance and then by date. This query will do it.
{ "query" : {
"match" : {
"my_field" : "my query"
}
},
"sort": [
{ "_score": { "order": "desc" }},
{
"pubDate": {
"order": "desc",
"mode": "min"
}
}
]
}
When sorting on fields with more than one value, remember that the
values do not have any intrinsic order; a multivalue field is just a
bag of values. Which one do you choose to sort on? For numbers and
dates, you can reduce a multivalue field to a single value by using
the min, max, avg, or sum sort modes. For instance, you could sort on
the earliest date in each dates field by using the above query.
elasticsearch guide sorting

I think your relevance is broken. You should use two different analyzers, one at index time and another at search time, like this:
PUT /my_index/my_type/_mapping
{
"my_type": {
"properties": {
"name": {
"type": "string",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
You can also read more about this here: https://www.elastic.co/guide/en/elasticsearch/guide/master/_index_time_search_as_you_type.html
Once you fix the relevance then sorting should work correctly.

Related

How to rank ElasticSearch documents based on scores

I have an Elasticsearch index that contains thousands of documents, each of which represents a user.
Each document has a set of fields (is_verified: boolean, country: string, is_creator: boolean). I also have another service that calls ES search to look up documents. How can I rank the retrieved documents based on those fields? For example, a matching verified user should come before an unverified one.
Is there some kind of document scoring while indexing the documents? If yes, can I modify it based on my criteria?
What should I read to understand how ranking works in Elasticsearch?
Thanks
I guess the sorting function mentioned by Mikael is pretty straightforward and should cover your use cases. Check the Elastic docs for more information on that.
But in case you want to do really fancy sorting, maybe you could use a bool query and different boost values to set your desired relevancy for each matched field. I tried to come up with a real-life example, but honestly didn't find one. For the sake of completeness, the following snippet should give you an idea of how to achieve similar results to the sort API (but still, I would prefer using sort).
GET /yourindexname/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "Monica"
}
}
],
"should": [
{
"term": {
"is_verified": {
"value": true,
"boost": 2
}
}
},
{
"term": {
"is_creator": {
"value": true,
"boost": 2
}
}
}
]
}
}
}
is there some kind of document scoring while indexing the documents ? if yes can i modify it based on my criteria ?
I wouldn't assign a fixed score to a document while indexing, as the score should depend on the query. However, if you insist on having a predefined relevancy for each document, theoretically you could add a field relevancy holding that value and use it later for ordering in the query:
GET /yourindexname/_search
{
"query" : {
"match" : {
"name": "Monica"
}
},
"sort" : [
{
"relevancy": {
"order": "desc"
}
},
"_score"
]
}
You can consider using the sort API inside your search queries. In the example below, we search on the country field and sort the results by the Boolean field is_verified. You can also add the other Boolean field inside the sort brackets.
GET /yourindexname/_search
{
"query" : {
"match" : {
"country": "Iceland"
}
},
"sort" : [
{
"is_verified": {
"order": "desc"
}
}
]
}

Elasticsearch ordering by field value which is not in the filter

Can somebody please help me write a query that orders the result items by the value of a field that is not part of the query in the request? I have this query:
{
"_source": [
"ico",
"name",
"city",
"status"
],
"sort": {
"_score": "desc",
"status": "asc"
},
"size": 20,
"query": {
"bool": {
"should": [
{
"match": {
"normalized": {
"query": "idona",
"analyzer": "standard",
"boost": 3
}
}
},
{
"term": {
"normalized2": {
"value": "idona",
"boost": 2
}
}
},
{
"match": {
"normalized": "idona"
}
}
]
}
}
}
The result is sorted alphabetically ascending by the status field. Status contains a few values such as [active, canceled, old, ...], and I need something like a boost for each possible value in the query, e.g. active boost 5, canceled boost 4, old boost 3, and so on. Is it possible to do this? Thanks.
You would need a custom sort using a script to achieve what you want.
I've used a generic match_all query here; you can add your own query logic there. The solution you are looking for is in the sort section of the query below.
Make sure that status is a keyword type.
Custom Sorting Based on Values
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{ "_score": "desc" },
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"if(params.scores.containsKey(doc['status'].value)) { return params.scores[doc['status'].value];} return 100000;",
"params":{
"scores":{
"active":5,
"old":4,
"cancelled":3
}
}
},
"order":"desc"
}
}
]
}
In the above query, add your own values to the scores section. For example, if you have a status new and want it to rank at value 6, your scores would look like this:
{
"scores":{
"active":5,
"old":4,
"cancelled":3,
"new":6
}
}
So basically the documents are first sorted by _score, and documents with the same _score are then ordered by the script sort.
Note that the script sort is desc because, as I understand it, you want active documents at the top, followed by the other values. Feel free to play around with it.
Hope this helps!
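To make the behaviour of the script easier to reason about, here is a small Python sketch that mirrors its logic outside Elasticsearch. It assumes every document has the same _score (as with match_all), so only the script key decides the order. status_rank and the sample documents are hypothetical.

```python
def status_rank(status, scores, default=100000):
    """Mirror the Painless script: look up the status, else use the fallback."""
    return scores.get(status, default)


scores = {"active": 5, "old": 4, "cancelled": 3}

# Hypothetical documents; "c" uses a spelling that is not a key in `scores`.
docs = [
    {"name": "a", "status": "old"},
    {"name": "b", "status": "active"},
    {"name": "c", "status": "canceled"},
    {"name": "d", "status": "cancelled"},
]

# "order": "desc" in the query corresponds to reverse=True here.
ranked = sorted(docs, key=lambda d: status_rank(d["status"], scores), reverse=True)

for doc in ranked:
    print(doc["name"], doc["status"], status_rank(doc["status"], scores))
```

Two things stand out in this sketch: the keys in scores must match the stored status values exactly (including spelling), and with a fallback of 100000 and descending order, any status that is not listed sorts ahead of active. If unlisted statuses should go last instead, a fallback lower than every listed value would do that.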

How are the documents ordered in Elasticsearch if the sort value for two documents is same?

I was working with products data, here: link
The search query that sorts by the keyword field tags using max mode is as follows.
GET product/_doc/_search
{
"size":100,"from":20,"_source":["tags", "name"],
"query": {
"match_all": {}
},
"sort": [
{"tags":{
"order":"desc",
"mode":"max"
}}
]
}
Some documents have the same sort value. I had read somewhere that if the sort value is the same, the results are arranged by internal doc id (_id). However, that does not seem to be the case. See screenshot below:
First comes _id: 961, followed by _id: 972 (fine). However, _id: 114 comes next. I do not understand why the order appears random.
Help will be appreciated.
As you have already seen, it's random. To overcome this, you can add another field to sort on when the value of the first sort field is the same. As you want to use _id, the query will be as follows:
{
"size": 100,
"from": 20,
"_source": [
"tags",
"name"
],
"query": {
"match_all": {}
},
"sort": [
{
"tags": {
"order": "desc",
"mode": "max"
}
},
{
"_id": "asc"
}
]
}

Sort documents by size of a field

I have documents like below indexed,
1.
{
"name": "Gilly",
"hobbyName" : "coin collection",
"countries": ["US","France","Georgia"]
}
2.
{
"name": "Billy",
"hobbyName":"coin collection",
"countries":["UK","Ghana","China","France"]
}
Now I need to sort these documents by the array length of the field "countries", so that the order after sorting is document 2, document 1. How can I achieve this using Elasticsearch?
You can use script based sorting to achieve this.
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"type": "number",
"script": "doc['countries'].values.size()",
"order": "desc"
}
}
}
I would suggest using the token count type in Elasticsearch.
It can be done using scripts (you can check here for how to do it using scripts), but then the results won't be perfect.
Scripts mostly use the field data cache, and duplicates are removed in it.
You can read more on how to use the token count type here.
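The point about duplicates can be illustrated with a short Python sketch. It assumes, as stated above, that a script sees each distinct value of the field only once; doc_values_size is a hypothetical stand-in for what the script's size() would return, and the sample array is made up.

```python
def doc_values_size(values):
    """Approximate the size a script would see if duplicates are collapsed."""
    return len(set(values))


# Hypothetical document in which "France" appears twice in the source array.
countries = ["US", "France", "Georgia", "France"]

raw_length = len(countries)                 # length of the array as indexed
script_length = doc_values_size(countries)  # length after duplicates collapse

print(raw_length, script_length)
```

If the arrays never contain repeats, the two numbers agree and the script sort is fine. Otherwise, storing the length at index time avoids the discrepancy.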

elasticsearch sort by document id

I have a simple index in elasticsearch and all my ids are manually added, i.e. I do not add documents with automatic string ids.
Now the requirement is to get list of all documents page by page and sorted by the document id (i.e. _id)
When I tried this with _id, it did not work. Then I looked for it on forums and found out that I have to use _uid for that. This actually works, although I have no clue how. But another problem is that the sorting is done as if the _id is a string. And it actually is a string. But I want the results as if the _id were a number.
So there are two issues here:
Why does sorting not work with _id when it does work with _uid?
Is there a way to get document ids sorted as numbers and not as strings?
For e.g. if my doc ids are 1, 2, 3, ..... , 55
I am getting results in this order:
1, 10, 11, 12, ... , 19, 2, 20, ... so on
While I would like to get the results in this order:
1, 2, 3, ... so on
Any help is highly appreciated!
Have the _id indexed:
{
"mappings": {
"some_type": {
"_id": {
"index": "not_analyzed"
}
}
}
}
And use a script:
{
"sort": {
"_script": {
"type": "number",
"script": "doc['_id'].value?.isInteger()?doc['_id'].value.toFloat():null",
"order": "asc"
}
}
}
That said, I strongly recommend, if possible, changing the id to an integer rather than keeping it as a string that contains numbers.
And I kind of doubt that it worked with _uid, because _uid is a combination of type and id.
For some reason the code above didn't work for me ("dynamic method [java.lang.String, isInteger/0] not found").
However, the script below works (only if your _id values can be converted into integers):
GET ENDPOINT/INDEX/_search
{
"sort": {
"_script": {
"type": "number",
"script": "return Integer.parseInt(doc['_id'].value)",
"order": "desc" // I personally needed descending
}
}
}
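The ordering problem from the question is easy to reproduce outside Elasticsearch. The sketch below is plain Python and only illustrates the difference between comparing ids as strings and as numbers; it assumes ids 1 to 55 as in the question.

```python
# Ids "1" .. "55" stored as strings, as in the question.
ids = [str(n) for n in range(1, 56)]

# Lexicographic comparison: "10" sorts before "2" because "1" < "2".
as_strings = sorted(ids)

# Numeric comparison: convert each id to an int before comparing,
# which is what the Integer.parseInt script sort does per document.
as_numbers = sorted(ids, key=int)

print(as_strings[:13])
print(as_numbers[:13])
```

Like the script, the numeric variant fails as soon as one id is not a valid integer, so it only fits indices where every _id is numeric.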
Instead of id, I used id.keyword and it worked. Sample code below:
GET index_name/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"id.keyword": {
"order": "desc"
}
}
]
}

Resources