How to rank ElasticSearch documents based on scores - elasticsearch

I have an Elasticsearch index that contains thousands of documents; each document represents a user.
Each document has a set of fields (is_verified: boolean, country: string, is_creator: boolean). Another service calls the Elasticsearch search API to look up documents. How can I rank the retrieved documents based on those fields? For example, a matching verified user should come before an unverified one.
Is there some kind of document scoring while indexing the documents? If yes, can I modify it based on my criteria?
What should I read to understand how ranking works in Elasticsearch?
Thanks

I guess the sort option mentioned by Mikael is pretty straightforward and should cover your use cases. Check the Elasticsearch documentation for more information on that.
But in case you want to do really fancy sorting, you could use a bool query with different boost values to set the desired relevance for each matched field. I tried to come up with a real-life example, but honestly didn't find one. For the sake of completeness, the following snippet should give you an idea of how to achieve results similar to the sort API (but still, I would prefer using sort).
GET /yourindexname/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "Monica"
}
}
],
"should": [
{
"term": {
"is_verified": {
"value": true,
"boost": 2
}
}
},
{
"term": {
"is_creator": {
"value": true,
"boost": 2
}
}
}
]
}
}
}
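If a separate service builds this request, the bool query above can be assembled programmatically. The following is a minimal Python sketch, not a definitive implementation: the function name build_boosted_query and its parameters are hypothetical, and it only builds the request body as a dictionary (sending it to Elasticsearch is left to whichever client you use).

```python
import json


def build_boosted_query(name, boosts):
    """Build a bool query: `name` must match, flagged users score higher.

    `boosts` maps a boolean field name to the boost used when it is true.
    (Hypothetical helper, for illustration only.)
    """
    should = [
        {"term": {field: {"value": True, "boost": boost}}}
        for field, boost in boosts.items()
    ]
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": name}}],
                "should": should,
            }
        }
    }


body = build_boosted_query("Monica", {"is_verified": 2, "is_creator": 2})
print(json.dumps(body, indent=2))
```

Because the flags sit in should, a user who is neither verified nor a creator still matches; they just get no extra score from those clauses.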
"Is there some kind of document scoring while indexing the documents? If yes, can I modify it based on my criteria?"
I wouldn't assign a fixed score to a document while indexing, as the score should depend on the query. However, if you insist on having a predefined relevance for each document, you could in theory add a numeric field, relevancy, that holds that value and sort on it in the query:
GET /yourindexname/_search
{
"query" : {
"match" : {
"name": "Monica"
}
},
"sort" : [
{
"relevancy": {
"order": "desc"
}
},
"_score"
]
}
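To see what such a two-level sort means, here is a small Python sketch that mimics the ordering on made-up hits: relevancy descending first, with _score only breaking ties. The names and values are invented for illustration; Elasticsearch does the real sorting server-side.

```python
# Made-up hits: a stored `relevancy` field plus the query-time `_score`.
hits = [
    {"name": "Monica A", "relevancy": 1, "_score": 3.2},
    {"name": "Monica B", "relevancy": 5, "_score": 1.1},
    {"name": "Monica C", "relevancy": 5, "_score": 2.7},
]

# Sort by relevancy descending, then by _score descending (the tie-breaker).
ordered = sorted(hits, key=lambda h: (h["relevancy"], h["_score"]), reverse=True)

for hit in ordered:
    print(hit["name"], hit["relevancy"], hit["_score"])
```

Note that a hit with a high _score but low relevancy still ends up last, which is why a fixed per-document value should be used with care.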

You can use the sort option inside your search queries. In the example below we search on the field country and sort the results by the Boolean field is_verified. You can also add the other Boolean field inside the sort array.
GET /yourindexname/_search
{
"query" : {
"match" : {
"country": "Iceland"
}
},
"sort" : [
{
"is_verified": {
"order": "desc"
}
}
]
}
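As a rough sketch of adding the other Boolean field, the request body could be built like this in Python. The helper name build_sorted_search is hypothetical, and appending _score as a final tie-breaker is an assumption of this sketch rather than part of the answer above.

```python
def build_sorted_search(country, flag_fields):
    """Match on country; sort flagged users first, then by relevance.

    Hypothetical helper: it only builds the request body as a dict.
    """
    # Descending order on a boolean field puts true before false.
    sort = [{field: {"order": "desc"}} for field in flag_fields]
    # Assumption of this sketch: fall back to relevance for ties.
    sort.append("_score")
    return {"query": {"match": {"country": country}}, "sort": sort}


body = build_sorted_search("Iceland", ["is_verified", "is_creator"])
print(body)
```

The order of the entries matters: is_creator is only consulted for users that tie on is_verified.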

Related

Elasticsearch collapse not working with search_after with single sort field and PIT

I have an Elastic query that initially returns results. When I attempt the query again using search_after for paging, I am getting the error: Cannot use [collapse] in conjunction with [search_after] unless the search is sorted on the same field. Multiple sort fields are not allowed. So far as I can tell, I am sorting and collapsing using just a single field per_id. Is my query structured incorrectly or is there something else I need to do to get this query to run?
GET /_search
{
"query": {
"bool": {
"must": [{
"term": {
"pform": "iphone"
}
}]
}
},
"collapse": {
"field": "per_id"
},
"pit": {
"id": "g-ABCDDEFG12345678ABCDDEFG12345678==",
"keep_alive": "5m"
},
"sort": [
{"per_id": "asc"}
],
"search_after" : [
"ABCDDEFG12345678",
123456
]
}
I needed to exclude the tiebreaker from my search_after. It shouldn't cause duplicates because I am using a PIT and sorting on the collapse field, meaning duplicates shouldn't exist in my result set.
"search_after" : [
"ABCDDEFG12345678"
]
So I needed to remove the tiebreaker returned with the previous result before passing the sort values into the next request.
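In client code, that trimming step might look like the Python sketch below. The function name next_search_after is hypothetical; the sketch assumes the last hit's sort array holds the collapse field's value first, followed by the tiebreaker that a PIT search appends.

```python
def next_search_after(last_hit_sort, collapsing):
    """Return the search_after values for the next page.

    With collapse, only the value of the single collapse/sort field is
    kept and the trailing tiebreaker is dropped. (Hypothetical helper.)
    """
    if not last_hit_sort:
        raise ValueError("last hit has no sort values")
    return list(last_hit_sort[:1]) if collapsing else list(last_hit_sort)


# Sort values as returned with the last hit of the previous page.
last_sort = ["ABCDDEFG12345678", 123456]

print(next_search_after(last_sort, collapsing=True))
print(next_search_after(last_sort, collapsing=False))
```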

Elasticsearch Boolean query with Constant score wrapper

When using Elasticsearch 7, I'm confused by the compound query syntax.
I have read the Elasticsearch documentation repeatedly, but I only find the standard syntax of the bool and constant_score queries described separately.
From the documentation I understand what 'query context' and 'filter context' are. But when these two query types are combined in a single query, I don't know what that means.
Let's look at an example:
GET /classes_test/_search
{
"size": "21",
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"match": {
"class_name": "29386556"
}
}
],
"should": [
{
"term": {
"master": "7033560"
}
},
{
"term": {
"assistant": "7033560"
}
},
{
"term": {
"students": "7033560"
}
}
],
"minimum_should_match": 1,
"must_not": [
{
"term": {
"class_id": 0
}
}
],
"filter": [
{
"term": {
"class_status": "1"
}
}
]
}
}
}
}
}
This query executes and returns a valid response. Each hit in the response has a '_score' of 1.0.
So does this mean that the inner bool query as a whole runs in filter context, even though it has 'must' and 'should' clauses?
I also found that a bool query can contain a constant_score sub-query.
Why does Elasticsearch allow this syntax without explaining it further?
If you use a constant_score query, you'll never get scores different from 1.0, unless you specify a boost parameter, in which case the score will match it.
If you need scoring, you obviously need to ditch constant_score.
In your case, the match query on class_name cannot yield any score other than 1 or 0, since it is basically a yes/no filter, not full-text matching.
To sum up, your whole query executes in filter context (hence a score of 0 or 1) since you don't rely on full-text search. You get scoring when you use full-text search in query context, not merely because you use a match query. In your case, you can move all must constraints into filter; it won't make any difference, since you only have filters (yes/no matches) and no full-text search.
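The suggestion to move the must constraints into filter can be sketched as a small transformation on the bool query body. This is an illustrative Python helper (must_to_filter is a hypothetical name), shown on a shortened version of the query from the question.

```python
import copy


def must_to_filter(bool_query):
    """Return a copy of a bool query body with must clauses moved to filter."""
    result = copy.deepcopy(bool_query)
    must = result.pop("must", [])
    if isinstance(must, dict):  # a single clause may be given as an object
        must = [must]
    existing = result.get("filter", [])
    if isinstance(existing, dict):
        existing = [existing]
    if must or existing:
        result["filter"] = must + existing
    return result


original = {
    "must": [{"match": {"class_name": "29386556"}}],
    "must_not": [{"term": {"class_id": 0}}],
    "filter": [{"term": {"class_status": "1"}}],
}
merged = must_to_filter(original)
print(merged)
```

Inside constant_score the set of matching documents stays the same either way; the rewrite mainly makes the intent (pure filtering) explicit.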

ElasticSearch Ignoring words having one single letter

I'm a beginner in Elasticsearch. I have an application that uses Elasticsearch to look for ingredients in a given food or fruit.
I'm facing a problem with scoring when the user types, for example, "Vitamine d".
Elasticsearch gives "vitamine" the best score even though the phrase "Vitamine D" exists and should normally have the highest score.
It seems that if the second word ("d" in my case) is just one letter, Elasticsearch ignores it.
I tried another example, "vitamine b12", and got the correct score.
Here is the query that the application sends to the server:
{
"from": 0,
"size": 5,
"query": {
"bool": {
"must": [
{
"match": {
"constNomFr": {
"query": "vitamine d"
}
}
}
],
"should": [
{
"prefix": {
"constNomFr": {
"value": "vitamine d",
"boost": 2
}
}
}
]
}
},
"_source": {
"excludes": [
"alimentDtos"
]
}
}
What could I modify to make it work?
Thank you so much.
If you can identify your ingredients, I recommend indexing them in a separate field, "ingredients", with its type set to keyword. This way you can use a term filter and you can even run aggregations.
You may already have your documents indexed that way; in that case, if you are using the default mapping, just run your query against your_field_name.keyword.
If you don't have your ingredients indexed as an array, then you should take a look at the Elasticsearch analyzers to choose or build the right one.
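A minimal sketch of the keyword-based lookup, written as a Python helper that builds the request body. The name exact_term_query is hypothetical, and the sketch assumes the field from the question, constNomFr, has a .keyword sub-field, which is only the case if the default dynamic mapping was used.

```python
def exact_term_query(field, value, size=5):
    """Exact-match lookup on the keyword sub-field of `field`.

    Assumes `<field>.keyword` exists (default dynamic mapping).
    """
    return {
        "size": size,
        "query": {"term": {field + ".keyword": {"value": value}}},
    }


body = exact_term_query("constNomFr", "Vitamine D")
print(body)
```

A term query on a keyword field compares the whole stored value exactly, including case, so "vitamine d" and "Vitamine D" are different values unless the field is normalized at index time.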

How to boost individual documents

I have a pretty complex query and now I want to boost some documents that fulfill some criteria. I have the following simplified document structure and I try to give some documents a boost based on the id, genre, tag.
{
"id": 123,
"genres": ["ACTION", "DRAMA"],
"tags": ["For kids", "Romantic", "Nature"]
}
What I want to do is for example
id: 123 boost: 5
genres: ACTION boost: 3
tags: Romantic boost: 0.2
and boost all documents that match my query and fit the criteria, but I don't want to filter any documents out. So query clause boosting is of no help, I guess.
Edit: To make it easier to understand what I want to achieve (I'm not sure it is possible with Elasticsearch; "no" is also a valid answer):
I want to search with a query and get a result set. Within this set I want to boost some documents, but I don't want to enlarge or filter the result set. The boost should be independent of the query.
For example, I search for a specific tag and want to boost all documents with category 'ACTION' in the result set. But I don't want all documents with category 'ACTION' added to the result set, and I don't want only the documents that have both the specific tag AND category 'ACTION'.
I think you need dynamic boosting at query time.
In the query below, the first clause matches the title field with a boost of 5, and the second one matches the content field against "Action".
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "id",
"boost": 5
}
}
},
{
"match": {
"content": "Action"
}
}
]
}
}
}
If you want a multi_match query with per-field boosts based on your query:
{
"multi_match" : {
"query": "some query terms here",
"fields": [ "id^5", "genres^3", "tags^0.2" ]
}
}
Note: the ^5 suffix boosts matches on that field (here id) by a factor of 5.
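The field^boost strings can also be generated from a mapping of field names to boost factors. Below is a small Python sketch using the field names and boosts from the question; the helper name multi_match_with_boosts is hypothetical.

```python
def multi_match_with_boosts(query, field_boosts):
    """Build a multi_match clause with per-field boosts.

    `field_boosts` maps a field name to its boost factor.
    """
    # The "g" format drops trailing zeros, so 5 becomes "5" and 0.2 stays "0.2".
    fields = [f"{field}^{boost:g}" for field, boost in field_boosts.items()]
    return {"multi_match": {"query": query, "fields": fields}}


clause = multi_match_with_boosts(
    "some query terms here",
    {"id": 5, "genres": 3, "tags": 0.2},
)
print(clause)
```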
Edit:
Maybe you are asking about the different types of multi_match query (at least for ES 5.x). From the ES reference guide:
best_fields: (default) Finds documents which match any field, but uses the _score from the best field.
most_fields: Finds documents which match any field and combines the _score from each field.
cross_fields: Treats fields with the same analyzer as though they were one big field. Looks for each word in any field.
phrase: Runs a match_phrase query on each field and combines the _score from each field.
phrase_prefix: Runs a match_phrase_prefix query on each field and combines the _score from each field.
More at: ES 5.4 Elasticsearch reference
I found a solution and it was pretty simple: I use a boosting query. I nest the different boosting criteria, and my original query becomes the base (positive) query.
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-boosting-query.html
For example:
{
"query": {
"boosting": {
"positive": {
"boosting": {
"positive": {
"match": {
"director": "Spielberg"
}
},
"negative": {
"term": {
"genres": "DRAMA"
}
},
"negative_boost": 1.3
}
},
"negative": {
"term": {
"tags": "Romantic"
}
},
"negative_boost": 1.2
}
}
}
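The nesting in the example above can be generated from a list of criteria. This Python sketch (nest_boosting is a hypothetical helper name) wraps the base query once per criterion and reproduces the structure shown, with the factors copied as-is from the example. As far as I know, current Elasticsearch documentation describes negative_boost as a value between 0 and 1.0 used to decrease scores, so treat factors above 1 as something to verify on your version.

```python
def nest_boosting(base_query, criteria):
    """Wrap base_query in one boosting query per (negative_query, factor) pair."""
    query = base_query
    for negative, factor in criteria:
        query = {
            "boosting": {
                "positive": query,  # everything built so far is the base
                "negative": negative,
                "negative_boost": factor,
            }
        }
    return query


base = {"match": {"director": "Spielberg"}}
criteria = [
    ({"term": {"genres": "DRAMA"}}, 1.3),
    ({"term": {"tags": "Romantic"}}, 1.2),
]
body = {"query": nest_boosting(base, criteria)}
print(body)
```

Since the criteria only appear in the negative part, they change scores without adding documents to or removing documents from the result set of the base query.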

ElasticSearch - sort search results by relevance and custom field (Date)

For example, I have entities with two fields, Text and Date. I want to search these entities and get results sorted by Date. But if I simply sort by Date, the result is unexpected.
For the search query "Iphone 6", the newest texts containing only "6" appear at the top of the results, not those containing "iphone 6". Without sorting the results look fine, but they are not ordered by Date as I want.
How do I write a custom sort function that considers both relevance and Date? Or is there a way to give the Date field a weight that is taken into account in scoring?
In addition, I may want to suppress results that match only "6". How can I customize the search to match only on bigrams, for example?
Did you try a bool query like this?
{
"query": {
"bool": {
"must": {
"match": {
"field": "iphone 6"
}
}
}
},
"sort": {
"date": {
"order": "desc"
}
}
}
Or, with your existing query, you can do this, which I guess is the more appropriate way:
just add this as the sort.
"sort": [
{ "date": { "order": "desc" }},
{ "_score": { "order": "desc" }}
]
All matching results are sorted first by date, then by relevance.
The solution is to use both _score and the date field in sort, with _score as the primary sort key and the date field as the secondary sort key.
You can use a simple match query for the relevance matching.
Try it out.
Data setup:
POST ecom/prod
{
"name":"iphone 6",
"date":"2019-02-10"
}
POST ecom/prod
{
"name":"iphone 5",
"date":"2019-01-10"
}
POST ecom/prod
{
"name":"iphone 6",
"date":"2019-02-28"
}
POST ecom/prod
{
"name":"6",
"date":"2019-03-01"
}
Query for relevance and date based sorting:
POST ecom/prod/_search
{
"query": {
"match": {
"name": "iphone 6"
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"date": {
"order": "desc"
}
}
]
}
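To make the effect of the sort order concrete, here is a small Python simulation on the four sample documents. It uses a toy score (the number of query terms found in the name), which is an assumption of this sketch and not how Elasticsearch computes _score; only the ordering logic is the point.

```python
docs = [
    {"name": "iphone 6", "date": "2019-02-10"},
    {"name": "iphone 5", "date": "2019-01-10"},
    {"name": "iphone 6", "date": "2019-02-28"},
    {"name": "6", "date": "2019-03-01"},
]


def toy_score(name, query):
    """Toy relevance: number of query terms found in the name (not BM25)."""
    terms = name.lower().split()
    return sum(1 for term in query.lower().split() if term in terms)


query = "iphone 6"
scored = [dict(doc, score=toy_score(doc["name"], query)) for doc in docs]

# Relevance first, date as tie-breaker (both descending).
by_score_then_date = sorted(
    scored, key=lambda d: (d["score"], d["date"]), reverse=True
)
# Date first, relevance as tie-breaker (both descending).
by_date_then_score = sorted(
    scored, key=lambda d: (d["date"], d["score"]), reverse=True
)

for doc in by_score_then_date:
    print(doc["score"], doc["date"], doc["name"])
```

With relevance first, both "iphone 6" documents lead and the newer one comes first; with date first, the document named just "6" leads because it is the newest.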
You could definitely use a phrase matching query for this.
It does position-aware matching, so a document is considered a match only if both "iphone" and "6" occur in the searched field AND their occurrences respect this order, with "iphone" showing up immediately before "6" by default.
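As a rough illustration of position-aware matching, the Python sketch below builds a match_phrase request body and includes a toy matcher that mimics the idea with whitespace tokenization. The field name name is taken from the sample data earlier in this thread; the toy matcher is an approximation, since real analyzers tokenize differently.

```python
# Request body for a phrase query on the `name` field.
phrase_query = {"query": {"match_phrase": {"name": "iphone 6"}}}


def contains_phrase(text, phrase):
    """True if the phrase tokens occur in text contiguously and in order."""
    tokens = text.lower().split()
    wanted = phrase.lower().split()
    if not wanted:
        return False
    return any(
        tokens[i:i + len(wanted)] == wanted
        for i in range(len(tokens) - len(wanted) + 1)
    )


for text in ["Apple iPhone 6 Plus", "6 iphone", "6", "iphone case 6"]:
    print(text, contains_phrase(text, "iphone 6"))
```

Documents containing only "6" no longer match at all, which addresses the second part of the question.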
Looks like you want to sort first by relevance and then by date. This query will do it:
{
"query": {
"match": {
"my_field": "my query"
}
},
"sort": [
"_score",
{
"pubDate": {
"order": "desc",
"mode": "min"
}
}
]
}
When sorting on fields with more than one value, remember that the
values do not have any intrinsic order; a multivalue field is just a
bag of values. Which one do you choose to sort on? For numbers and
dates, you can reduce a multivalue field to a single value by using
the min, max, avg, or sum sort modes. For instance, you could sort on
the earliest date in each dates field by using the above query.
elasticsearch guide sorting
I think your relevance is broken. You should use two different analyzers: one at index time and another at search time, like this:
PUT /my_index/my_type/_mapping
{
"my_type": {
"properties": {
"name": {
"type": "string",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
You can also read more about this here: https://www.elastic.co/guide/en/elasticsearch/guide/master/_index_time_search_as_you_type.html
Once you fix the relevance then sorting should work correctly.
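Why the two analyzers should differ can be sketched with a toy edge n-gram function in Python. This assumes the autocomplete analyzer in the mapping above is built on an edge n-gram token filter (as in the linked guide page); the function edge_ngrams and its default limits are illustrative only.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of token, roughly what an edge n-gram filter would emit."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# Index time: "iphone" is stored under all of its prefixes.
index_terms = edge_ngrams("iphone")
print(index_terms)

# Search time with a standard analyzer: the query stays one whole term.
print("ip" in index_terms)

# If the query "ip" were n-grammed too, it would also emit "i", which is
# a prefix of unrelated words such as "ice".
print("i" in edge_ngrams("ice"))
```

Keeping the search side on the standard analyzer means a query only matches documents that actually start with what the user typed.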

Resources