Elasticsearch - Impact of adding Boost to query

I have a very simple Elasticsearch query, shown below.
{
"query": {
"bool": {
"must": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"tag": {
"query": "Audience: PRO Brand: Samsung",
"boost": 3,
"operator": "and"
}
}
},
{
"match": {
"tag": {
"query": "audience: PRO brand samsung",
"boost": 2,
"operator": "or"
}
}
}
]
}
}
]
}
}
}
If I add a boost to the query, will it have any performance impact? Also, will boosting help on a very large data set where the search words occur frequently?

Elasticsearch already adds a boost parameter with a default value to every query, so in my opinion setting a different value won't make much difference to performance, but you should measure it yourself.
Regarding your second question, adding a boost definitely makes sense when your search words are common, because it helps you find the most relevant documents. For example, suppose you are searching for "query" in an index containing Elasticsearch posts (where the word "query" is very common), but you want to give more weight to documents that have the tag elasticsearch-query. Adding a boost in this case will give you more relevant results.
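A minimal sketch of that idea in Python, building the request body as a plain dict. The field names `body` and `tags`, the helper name, and the boost value of 3 are assumptions for illustration; they are not taken from the question's mapping.

```python
import json


def tagged_boost_query(text, tag, tag_boost=3.0):
    """Require `text` to match, and rank documents carrying `tag` higher."""
    return {
        "query": {
            "bool": {
                # Every hit must match the search text (hypothetical field "body").
                "must": [{"match": {"body": {"query": text}}}],
                # With a must clause present, this clause is optional:
                # it raises the score of matching documents but filters nothing out.
                "should": [
                    {"match": {"tags": {"query": tag, "boost": tag_boost}}}
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(tagged_boost_query("query", "elasticsearch-query"), indent=2))
```

The printed JSON can be sent as the body of a `_search` request. Whether the boost value is worth tuning is something to measure against your own data.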

Related

What is the difference between should and boost final score calculation?

I'm a little confused about the difference between should and boost in the final score calculation.
When a bool query has a must clause, the should clauses act as a boost factor: none of them has to match, but if they do, the relevancy score of that document is boosted and it appears higher in the results.
So, if we have:
one query which contains must and should clauses
vs
a second query which contains a must clause and a boosting clause
is there a difference?
When would you recommend must and should versus must and boosting clauses in a query?
You can read the documentation of the boolean query here; there is a huge difference between should and boost.
Both should and must contribute to the _score of the document. As the documentation mentioned above puts it:
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
Boost, on the other hand, is a parameter that lets you increase the weight of a clause by a value you choose. Let me explain with an example.
Index sample docs
POST _doc/1
{
"brand" : "samsung",
"name" : "samsung phone"
}
POST _doc/2
{
"brand" : "apple",
"name" : "apple phone"
}
Boolean Query using should without boost
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple"
}
}
}
]
}
}
}
Search result showing score
"max_score": 1.3862942,
Now use a boost of 10 on the brand clause in the same query
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple",
"boost": 10
}
}
}
]
}
}
}
Query result showing boost
"max_score": 7.624619, (note the considerably higher score)
In short, when you want documents that match a particular clause to rank higher, pass the boost parameter on that clause; it multiplies the score that clause would otherwise contribute through should or must.
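The two max_score values above are consistent with boost acting as a multiplier on one clause's score before the bool query adds the clauses together. The sketch below reproduces that arithmetic. It assumes both clauses score the same for the apple document (half of 1.3862942 each); that split is inferred from the numbers, not reported by the search response, and the helper name is made up for this example.

```python
def bool_should_score(clause_scores, boosts):
    """Sum the clause scores, each multiplied by its boost (default boost is 1)."""
    return sum(score * boost for score, boost in zip(clause_scores, boosts))


# Assumed per-clause score: half of the unboosted max_score from the example.
per_clause = 1.3862942 / 2

# name clause + brand clause, both with the default boost of 1
unboosted = bool_should_score([per_clause, per_clause], [1, 1])

# same clauses, with boost 10 on the brand clause
boosted = bool_should_score([per_clause, per_clause], [1, 10])

print(unboosted, boosted)
```

Under that assumption the boosted total is eleven times the per-clause score, which agrees with the reported 7.624619 up to rounding.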

ElasticSearch: obtaining individual scores from each query inside of a bool query

Assume I have a compound bool query with various "must" and "should" statements, each of which may include different leaf queries, including "multi_match" and "match_phrase" queries, such as the one below.
How can I get the score from individual queries packed into a single query?
I know one way could be to break it down into multiple queries, execute each, and then aggregate the results in code-level (not query-level). However, I suppose that is less efficient, plus, I lose sorting/pagination/.... features from ElasticSearch.
I think the "Explanation API" is also not useful for me, since it provides very low-level scoring details (inefficient and hard to parse) while I just need the score for each specific leaf query (which I've already named).
If I'm wrong on any terminology (e.g. compound, leaf), please correct me. The big picture is how to obtain individual scores from each sub-query inside of a bool query.
PS: I came across Different score functions in bool query. However, it does not return the scores. If I wrap my queries in "function_score", I want the scoring to be default but obtain the individual scores in response to the query.
Please see the snippet below:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "...",
"fields": [
"field1^3",
"field2^5"
],
"_name": "must1_mm",
"boost": 3
}
}
],
"should": [
{
"multi_match": {
"query": "...",
"fields": [
"field3^2",
"field4^5"
],
"_name": "should1_mm",
"boost": 2
}
},
{
"match_phrase": {
"field5": {
"_name": "phrase1",
"boost": 1.5,
"query": "..."
}
}
},
{
"match_phrase": {
"field6": {
"_name": "phrase2",
"boost": 1,
"query": "..."
}
}
}
]
}
}
}

Add query time weight/boost based on field value

I'm currently using elasticsearch version 2.4 and wish to fine tune my result set based on a field I have called 'type' using query time boosting or weighting.
For example
If the value of the field 'type' is "Boats" add a weighting or boost of 4
If the value of the field 'type' is "Caravans" add a weighting or boost of 3
Boats that match the query string would therefore appear before caravans in the search results.
I've found the documentation I've read so far very convoluted with regard to filters, functions and function scores. I'd appreciate it if someone could provide an example for my scenario to get me going.
You should use the constant_score query and the boost option to prioritize.
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"boost": 4,
"query": {
"match": {
"description": "Boats"
}
}
}
},
{
"constant_score": {
"boost": 3,
"query": {
"match": {
"description": "Caravans"
}
}
}
}
]
}
}
}
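The answer's example matches on `description`. A variant that targets the `type` field described in the question could be generated as in the sketch below. It assumes `type` is indexed as an exact, non-analyzed value so that a `term` filter matches "Boats" verbatim, and it uses a hypothetical `description` field for the user's search text; the helper name is also made up for this example.

```python
def type_weighted_query(text, type_boosts):
    """Match `text`, then add a fixed score per matching value of `type`."""
    should = [
        # constant_score ignores term statistics: a matching document
        # simply receives `boost` as this clause's score.
        {"constant_score": {"filter": {"term": {"type": value}}, "boost": boost}}
        for value, boost in type_boosts.items()
    ]
    return {
        "query": {
            "bool": {
                # Hypothetical field holding the text the user searches.
                "must": [{"match": {"description": text}}],
                "should": should,
            }
        }
    }


body = type_weighted_query("sleeps four", {"Boats": 4, "Caravans": 3})
```

Because the should scores are added to the must score, a very strong text match on a caravan can still outrank a weak match on a boat; larger boost values make the type ordering stricter.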

Elasticsearch fuzzy matching: How can I get direct hits first?

I'm using Elasticsearch to search names in a database, and I want it to be fuzzy to allow for minor spelling errors. Based on the advice I've found on the matter, I'm using "match" and "fuzziness" instead of "fuzzy", which definitely seems to be more accurate. This is my query:
{ "query":
{ "match":
{ "last_name":
{ "query": "Beach",
"type": "phrase",
"fuzziness": 2
}
}
}
}
However, even though I have numerous results with last_name "Beach" (I know there's at least 100), I also get results with last_name "Beech" and "Berch" in the first 10 hits returned by my query. Can someone help me figure out how to get the exact matches first?
Try changing your query to a bool query with two should clauses.
The first is your current query; the second is a query that only gives exact matches, with a big boost (like 10.0).
That should get your exact matches on top while still listing your partial matches.
Here is a sample based on Constantijn's answer above:
{
"query": {
"bool": {
"should": [
{
"match": {
"last_name": {
"query": "Beach",
"fuzziness": 2,
"boost": 1
}
}
},
{
"match": {
"last_name": {
"query": "Beach",
"boost": 10
}
}
}
]
}
}
}
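The same request body can be built from a small helper so the field, fuzziness and boost are easy to vary. This is a sketch only; the function name and default values are assumptions that mirror the sample above.

```python
def exact_first_query(field, text, fuzziness=2, exact_boost=10):
    """Fuzzy match for recall, plus a boosted non-fuzzy match for ranking."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Tolerates spelling errors ("Beech", "Berch").
                    {"match": {field: {"query": text, "fuzziness": fuzziness, "boost": 1}}},
                    # Matches only the term as typed; its boosted score is added
                    # on top of the fuzzy score, lifting exact hits.
                    {"match": {field: {"query": text, "boost": exact_boost}}},
                ]
            }
        }
    }


body = exact_first_query("last_name", "Beach")
```

An exact hit matches both clauses and a misspelled one matches only the first, which is why the exact hits sort first.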

elasticsearch boost importance of exact phrase match

Is there a way in Elasticsearch to boost the importance of the exact phrase appearing in the document?
For example, if I was searching for the phrase "web developer" and the words "web developer" appeared together, they would be boosted by 5 compared to "web" and "developer" appearing separately throughout the document. Thereby any document that contained "web developer" together would appear first in the results.
You can combine different queries together using a bool query, and you can assign a different boost to each of them as well. Let's say you have a regular match query for both terms, regardless of their positions, and then a phrase query with a higher boost.
Something like the following:
{
"query": {
"bool": {
"should": [
{
"match": {
"field": "web developer"
}
},
{
"match_phrase": {
"field": {
"query": "web developer",
"boost": 5
}
}
}
],
"minimum_should_match": 1
}
}
}
As an alternative to javanna's answer, you could do something similar with must and should clauses within a bool query:
{
"query": {
"bool": {
"must": {
"match": {
"field": {
"query": "web developer",
"operator": "and"
}
}
},
"should": {
"match_phrase": {
"field": "web developer"
}
}
}
}
}
Untested, but I believe the must clause here will match results containing both 'web' and 'developer' and the should clause will score phrases matching 'web developer' higher.
You could try using rescore to run an exact phrase match on your initial results. From the docs:
"Rescoring can help to improve precision by reordering just the top (eg 100 - 500) documents returned by the query and post_filter phases, using a secondary (usually more costly) algorithm, instead of applying the costly algorithm to all documents in the index."
https://www.elastic.co/guide/en/elasticsearch/reference/current/filter-search-results.html#rescore
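A hedged sketch of what such a rescore request body might look like, built as a Python dict. The window size and the two weights are illustrative assumptions (the weight of 5 echoes the question's "boosted by 5"), and `field` is the same placeholder field name used in the answers above.

```python
def phrase_rescore_request(field, text, window_size=100, phrase_weight=5.0):
    """Cheap match query first, then a phrase rescore over the top hits only."""
    return {
        # First pass: documents containing either word.
        "query": {"match": {field: {"query": text, "operator": "or"}}},
        "rescore": {
            # Only the top `window_size` hits per shard are rescored.
            "window_size": window_size,
            "query": {
                # Second pass: the more expensive exact-phrase check.
                "rescore_query": {"match_phrase": {field: {"query": text}}},
                # Final score = query_weight * original score
                #             + rescore_query_weight * phrase score.
                "query_weight": 1.0,
                "rescore_query_weight": phrase_weight,
            },
        },
    }


body = phrase_rescore_request("field", "web developer")
```

Documents outside the rescore window keep their original order, so a phrase match that ranks very low in the first pass will not be promoted.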
I used the sample query below in my case, and it works. It returns exact and fuzzy results, with the exact ones boosted.
{ "query": {
"bool": {
"should": [
{
"match": {
"name": "pala"
}
},
{
"fuzzy": {
"name": "pala"
}
}
]
}}}
I do not have enough reputation to comment on James Adison's answer, which I agree with.
What is still missing is the boost factor, which can be done using the following syntax:
{
"match_phrase":
{
"fieldName": {
"query": "query string for exact match",
"boost": 10
}
}
}
I think this is already the default behaviour of a match query with the "or" operator: documents containing the phrase "web developer" should score first, followed by those containing only "web" or "developer". You can still boost your query using the answers above. Correct me if I'm wrong.
