How to change score for the record - elasticsearch

I'm using Elasticsearch 6.4.0 and want to change the score of a specific record in the index.
What exactly does the boost do when I send the request below? I can see the score values changing, but they are not updated in the index; they only appear at query time. I am a bit confused by boost.
GET index/_search
{
  "query": {
    "multi_match": {
      "query": "foo bar",
      "fields": ["title^5", "content"]
    }
  }
}

The score is not saved in the index, it's calculated with each query. Your boost is saying a match of foo bar in the title field is 5 times more valuable than a match in the content field. This doesn't get persisted anywhere, it's just reflected in the score of your query results as you saw.
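To make this concrete, here is a minimal Python sketch that builds the same request body. The helper name build_multi_match is hypothetical (it is not part of any Elasticsearch client); it only illustrates that the boost is written into the request through the field^boost syntax and is not stored anywhere in the index.

```python
def build_multi_match(text, field_boosts):
    """Build a multi_match search body from a {field: boost} mapping.

    Hypothetical helper for illustration only. The boost is encoded in the
    request using the field^boost syntax; nothing is written to the index.
    """
    fields = []
    for field, boost in field_boosts.items():
        # A boost of 1 is the default, so the plain field name is enough.
        fields.append(field if boost == 1 else f"{field}^{boost}")
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


body = build_multi_match("foo bar", {"title": 5, "content": 1})
print(body["query"]["multi_match"]["fields"])
```

Changing the boost means changing the dictionary passed to the helper and sending a new request; no reindexing is involved.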


Less restrictive search doesn't return any hits in ElasticSearch

The query below returns hits, for example where name is "Balances by bank":
GET /_search
{
  "query": {
    "multi_match": {
      "query": "Balances",
      "fields": ["name", "descrip", "notes"]
    }
  }
}
So why doesn't this return anything? Note that the query is less restrictive: the word is "Balance", not "Balances" with an s.
GET /_search
{
  "query": {
    "multi_match": {
      "query": "Balance",
      "fields": ["name", "descrip", "notes"]
    }
  }
}
What search would return both?
You need to change your mapping to be able to do that.
If you didn't specify a mapping with specific analyzers when creating your index, Elasticsearch uses the default mapping and analyzer.
The default mapping maps each string field as both text and keyword, so you can perform full-text search (matching individual words in the string) and keyword search (matching the whole string), but the text field uses the standard analyzer.
With the standard analyzer, your example Balances by bank becomes the following list of lowercased tokens: [balances, by, bank]. These tokens are added to the inverted index, and Elasticsearch can find the document when you search for any of them.
When you search for just Balance, the term balance does not exist in the inverted index, so Elasticsearch returns nothing.
To match both Balance and Balances, change your mapping to use the analyzer for the English language. This analyzer reduces terms to their stem, so Balance and Balances match each other, as do related forms such as Balancing, Balanced, Balancer, etc.
Look at this part of the documentation to see how the analysis process works.
And of course, you can also search for Balance* and it will return both Balance and Balances, but that is a different kind of query (for example query_string or wildcard), since multi_match does not expand wildcards.
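The mapping change described above could look roughly like the following Python sketch, which builds an index-creation body that assigns the built-in english analyzer to the three fields from the question. The helper name english_text_mapping is hypothetical, and the body uses the typeless mapping form of Elasticsearch 7 and later, which is an assumption; older versions nest properties under a mapping type name.

```python
import json


def english_text_mapping(field_names):
    """Return an index-creation body mapping each field as English-analyzed text.

    Hypothetical helper. Assumes the typeless mapping layout (Elasticsearch 7+).
    """
    properties = {
        name: {"type": "text", "analyzer": "english"} for name in field_names
    }
    return {"mappings": {"properties": properties}}


index_body = english_text_mapping(["name", "descrip", "notes"])
print(json.dumps(index_body, indent=2))
```

Because analysis happens at index time, existing documents have to be reindexed into an index created with this mapping before the stemmed terms become searchable.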

Boosting the relevance score based on the unique keyword found

I have a scenario where I need to give more relevance to a document in the index if it contains a unique keyword. Let me give an example.
Say I search for the term znkdref unsuccessfull. The results will include content containing znkdref, unsuccessfull, or both. I want content containing znkdref unsuccessfull to have the highest relevance, content containing only znkdref to have lower relevance, and content containing only unsuccessfull to have the lowest relevance.
Is there a way to achieve this? I would be glad to get any help.
You want to use Query Time Boosting, in particular Prioritized Clauses.
In short you need to extract the keywords that you want boosted and build a query that boosts the parts that you want.
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "content": {
              "query": "znkdref",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "content": {
              "query": "unsuccessfull"
            }
          }
        }
      ]
    }
  }
}
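If the keywords are extracted in application code, the query above can be generated rather than written by hand. The following is a minimal Python sketch under that assumption; build_prioritized_query is a hypothetical helper name, and the boost values are only examples.

```python
def build_prioritized_query(field, boosted_terms):
    """Build a bool/should query with one boosted match clause per term.

    Hypothetical helper. boosted_terms is a list of (term, boost) pairs.
    """
    should = []
    for term, boost in boosted_terms:
        should.append({"match": {field: {"query": term, "boost": boost}}})
    return {"query": {"bool": {"should": should}}}


body = build_prioritized_query(
    "content", [("znkdref", 2), ("unsuccessfull", 1)]
)
print(body)
```

A document that matches both should clauses receives the contributions of both, so content containing both terms tends to rank first, followed by content matching only the boosted term.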
Update based on comment:
If you want to know why a document got the score that it did (maybe to identify "keywords") then you can pass in "explain" as a query parameter or set it in the root POST payload. The result will now have document frequency counts and sub scores.
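As a small sketch of the second option (setting explain in the request body), the Python helper below returns a copy of a search body with the flag switched on. The name with_explain is hypothetical; only the top-level "explain" key comes from the search API.

```python
import copy


def with_explain(search_body):
    """Return a copy of the search body with per-hit score explanations enabled."""
    body = copy.deepcopy(search_body)  # leave the caller's dict untouched
    body["explain"] = True
    return body


original = {"query": {"match": {"content": "znkdref unsuccessfull"}}}
explained = with_explain(original)
print(explained)
```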
Do you mean that "znkdref" is a unique keyword, for example a special name for something? If so:
Documents that match the whole query string "znkdref unsuccessfull" will, in general, have the highest relevance score.
Documents that contain "znkdref" will usually have a higher relevance score than documents that contain "unsuccessfull", because the TF-IDF score of the rarer term "znkdref" is higher than that of "unsuccessfull".
The relevance score function is described at https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html
I hope that my answer is helpful for you.

Is it possible to chain fquery filters in elastic search with exact matches?

I have been having trouble writing a method that will take in various search parameters in elasticsearch. I was working with queries that looked like this:
body:
  {query:
    {filtered:
      {filter:
        {and: [
          {term: {some_term: "foo"}},
          {term: {is_visible: true}},
          {term: {"term_two": "something"}}
        ]}
      }
    }
  }
Using this syntax I thought I could chain these terms together and programmatically generate these queries. I was using simple strings, and if there was a term like "person_name" I could split the query into two and say "where person_name matches 'JOHN'" and "where person_name matches 'SMITH'", getting accurate results.
However, I just came across the "fquery" upon asking this question:
Escaping slash in elasticsearch
I was not able to use this "and"/"term" filter to search for a value with slashes in it, so I learned that I can use fquery to search for the full value, like this:
"fquery": {
  "query": {
    "match": {
      "by_line": "John Smith"
    }
  }
}
But how can I search like this for multiple items? It seems that when I combine fquery and my filtered/filter/and/term queries, my "and" term queries are ignored. What is the best practice for making nested or chained queries using Elasticsearch?
As in the comment below, yes I can just add fquery to the "and" block like so
{:filtered=>
  {:filter=>
    {:and=>[
      {:term=>{:is_visible=>true}},
      {:term=>{:is_private=>false}},
      {:fquery=>
        {:query=>{:match=>{:sub_location=>"New JErsey"}}}}]}}}
Why would elasticsearch also return results with "sub_location" = "new York"? I would like to only return "new jersey" here.
A match query analyzes the input and by default it is a boolean OR query if there are multiple terms after the analysis. In your case, "New JErsey" gets analyzed into the terms "new" and "jersey". The match query that you are using will search for documents in which the indexed value of field "sub_location" is either "new" or "jersey". That is why your query also matches documents where the value of field "sub_location" is "new York" because of the common term "new".
To only match for "new jersey", you can use the following version of the match query:
{
  "query": {
    "match": {
      "sub_location": {
        "query": "New JErsey",
        "operator": "and"
      }
    }
  }
}
This will not match documents where the value of field "sub_location" is "New York". However, it will match documents where the value of field "sub_location" is, say, "Jersey New", because the query finally translates into a boolean query like "new" AND "jersey", which ignores term order. If you are fine with this behaviour, well and good; otherwise read on.
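The difference between the two operators can be illustrated with a toy simulation in Python. This is not how Elasticsearch is implemented; the analyze function is a deliberately crude stand-in for the default analyzer (lowercase, then split on whitespace), and both function names are made up for this sketch.

```python
def analyze(text):
    """Crude stand-in for the default analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def matches(query, indexed_value, operator="or"):
    """Return True if the analyzed query matches the analyzed field value."""
    indexed_terms = set(analyze(indexed_value))
    hits = [term in indexed_terms for term in analyze(query)]
    # "and" requires every query term to be present, "or" only one of them.
    return all(hits) if operator == "and" else any(hits)


print(matches("New JErsey", "new York"))                    # default OR
print(matches("New JErsey", "new York", operator="and"))    # AND
print(matches("New JErsey", "Jersey New", operator="and"))  # AND, order ignored
```

The first call matches through the shared term "new", the second does not because "jersey" is missing, and the third matches even though the words are in the opposite order.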
All these issues arise because you are using the default analyzer for the field "sub_location" which breaks tokens at word boundaries and indexes them. If you really do not care about partial matches and want to always match the entire string, you can make use of custom analyzers to use Keyword Tokenizer and Lowercase Token Filter. Mind you, going ahead with this approach will need you to re-index all your documents again.
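A rough sketch of such a custom analyzer, expressed as a Python dict for the index-creation body, might look like this. The helper name and the analyzer name lowercase_keyword are hypothetical. The body is written in the typeless mapping form of Elasticsearch 7 and later, which is an assumption; the filtered/fquery syntax in the question belongs to much older versions, where mappings sit under a type name and the field type is string.

```python
def exact_lowercase_index_body(field_name, analyzer_name="lowercase_keyword"):
    """Index body whose field is indexed as one lowercased token per value.

    Hypothetical helper. Assumes the typeless mapping layout (Elasticsearch 7+).
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "keyword",   # whole value becomes one token
                        "filter": ["lowercase"],  # so matching ignores case
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                field_name: {"type": "text", "analyzer": analyzer_name}
            }
        },
    }


index_body = exact_lowercase_index_body("sub_location")
print(index_body["settings"]["analysis"]["analyzer"])
```

With this analysis chain, "New JErsey" is indexed and searched as the single token "new jersey", so neither "new York" nor "Jersey New" would match.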

Constant Score Query elasticsearch boosting

My understanding of the Constant Score Query in Elasticsearch is that the boost factor is assigned as the score of every matching document. The documentation says:
A query that wraps a filter or another query and simply returns a constant score equal to the query boost for every document in the filter.
However, when I send this query:
"query": {
  "constant_score": {
    "filter": {
      "term": {
        "source": "BBC"
      }
    },
    "boost": 3
  }
},
"fields": ["title", "source"]
all the matching documents are given a score of 1?! I cannot figure out what I am doing wrong, and had also tried with query instead of filter in constant_score.
Scores are only meant to be relative to all other scores in a given result set, so a result set where everything has the score of 3 is the same as a result set where everything has the score of 1.
Really, the only purpose of the relevance _score is to sort the results of the current query in the correct order. You should not try to compare the relevance scores from different queries. - Elasticsearch Guide
Either the constant score is being ignored because it's not being combined with another query, or it's being normalized. As @keety said, check the output of explain to see exactly what's going on.
A constant score query gives an equal score to every matching document, irrespective of scoring factors like TF and IDF. Use it when you don't care how well a document matched, only whether it matched, but still want a score assigned, unlike a plain filter.
If you want the score to be literally 3 for all the documents matching a particular query, then you should use a function score query, something like:
"query": {
  "function_score": {
    "functions": [
      {
        "filter": { "term": { "source": "BBC" } },
        "weight": 3
      }
    ]
  }
  ...
}
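One thing to watch with this form, as far as I understand it: without a query of its own, function_score matches every document, and the function's filter only decides which documents receive the weight. A hedged variant that also restricts the hits to the BBC documents is sketched below in Python. The helper name is hypothetical, and the use of boost_mode "replace" is my own choice to make the weight the final score, not something taken from the answer above.

```python
def fixed_score_query(field, value, score):
    """Match documents by an exact term and give each the same fixed score.

    Hypothetical helper. The bool/filter clause selects the documents without
    scoring them; boost_mode "replace" makes the weight the final score.
    """
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"filter": {"term": {field: value}}}},
                "functions": [{"weight": score}],
                "boost_mode": "replace",
            }
        }
    }


body = fixed_score_query("source", "BBC", 3)
print(body)
```

Running the request with explain enabled is a good way to confirm what score each hit actually receives on your version.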

Boosting in Elasticsearch

I am new to Elasticsearch. We can use the boost parameter in almost all queries. I understand it is used to modify the score of documents, but I can't find the actual use of it. My question is: if I use boost values in some queries, will it affect the final score of the search, or the boost rank of the docs in the index itself?
And what is the main difference between boost at index time and boost at query time?
Thanks in advance!
Query time boost allows you to give more weight to one query than to another. For instance, let's say you are querying the title and body fields for "Quick Brown Fox", you could write it as:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "Quick Brown Fox"
          }
        },
        {
          "match": {
            "body": "Quick Brown Fox"
          }
        }
      ]
    }
  }
}
But you decide that you want the title field to be more important than the body field, which means you need to boost the query on the title field by (e.g.) 2:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "Quick Brown Fox",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "body": "Quick Brown Fox"
          }
        }
      ]
    }
  }
}
(Note how the structure of the match clause changed to accommodate the boost parameter).
The boost value of 2 doesn't double the _score exactly - the scores go through a normalization process. So you should think of boost as make this query clause relatively more important than the other query clauses.
My doubt is if I use boost values in some queries, will it affect the final score of the search?
Yes it does, but you shouldn't rely on the actual value of _score anyway. Its only purpose is to allow Elasticsearch to decide which documents are most relevant to this query. If the query changes, the scores change.
Re index time boosting: don't use it. It's inflexible and error prone.
Boost at query time won't modify your index. It only applies boost factor on fields when searching.
I prefer boost at query time as it's more flexible. If you need to change your boost rules and you had set it at index time, you will probably need to reindex.
Use case for boosting: suppose you are building an e-commerce web app and your product data is in Elasticsearch. Whenever a customer uses the search bar, you query Elasticsearch and display the results in the web app.
Elasticsearch computes a relevance score for every matching document at query time and returns the results sorted by that score.
Now assume a user searches for "samsung phones". Should your web app show only Samsung phones? The answer is no.
Your web app should show other phones as well (the user may like those too), but it should show Samsung phones first, since that is what he/she is looking for, and the other phones after them.
So the question is: how do you query so that Samsung phones come up first in the results? The answer is the relevance score.
Say you run a query that matches all mobile phones and boosts the ones that are Samsung phones, so that Samsung phones get a higher relevance score.
The results will then contain Samsung phones first, followed by the other phones.
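The scenario above could be sketched in Python as follows. The field names category and brand, the helper name, and the boost value are all assumptions made for illustration; a real product index would have its own fields.

```python
def build_brand_preference_query(category, preferred_brand, boost=3):
    """Match every product in a category and rank one brand higher.

    Hypothetical helper. "category" and "brand" are assumed field names.
    """
    return {
        "query": {
            "bool": {
                # Required: every hit must be in the requested category.
                "must": [{"match": {"category": category}}],
                # Optional: hits from the preferred brand score higher.
                "should": [
                    {"match": {"brand": {"query": preferred_brand, "boost": boost}}}
                ],
            }
        }
    }


body = build_brand_preference_query("phones", "samsung")
print(body)
```

Because the should clause is optional when a must clause is present, phones from other brands still match; they simply receive no extra score from the brand clause and sort after the Samsung ones.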
