Add query time weight/boost based on field value - elasticsearch

I'm currently using elasticsearch version 2.4 and wish to fine tune my result set based on a field I have called 'type' using query time boosting or weighting.
For example
If the value of the field 'type' is "Boats" add a weighting or boost of 4
If the value of the field 'type' is "Caravans" add a weighting or boost of 3
Thereforfor making boats that matched the query string appear before caravans in the search results.
I've found the documentation I've read so far very convoluted in regards to filters, functions and function scores. I'd appreciate if someone could provide an example to my scenario to get me going.

You should use the constant_score query and the boost option to prioritize.
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"boost": 4,
"query": {
"match": {
"description": "Boats"
}
}
}
},
{
"constant_score": {
"boost": 3,
"query": {
"match": {
"description": "Caravans"
}
}
}
}
]
}
}
}

Related

Elasticsearch - Impact of adding Boost to query

I have a very simple Elastic query mentioned below.
{
"query": {
"bool": {
"must": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"tag": {
"query": "Audience: PRO Brand: Samsung",
"boost": 3,
"operator": "and"
}
}
},
{
"match": {
"tag": {
"query": "audience: PRO brand samsung",
"boost": 2,
"operator": "or"
}
}
}
]
}
}
]
}
}
}
I want to know if I add a boost in the query, will there be any performance impact because of this, and also will boosting help if you have a very large data set, where the occurrence of a search word is common.
Elasticsearch adds boost param with default value, IMO giving different value won't make much difference in the performance, but you should be able to measure it yourself.
Reg. your second question, adding boost definitely makes sense where the occurrence of your search words are common, this will help you to find the relevant document. for example: suppose you are searching for query in a index containing Elasticsearch posts(query will be very common on Elasticsearch posts), but you want the give more weight to documents which have tag elasticsearch-query. Adding boosts in this case, will provide you more relevant results.

What is the difference between should and boost final score calculation?

I'm a little confused about what is the difference between should and boost final score calculation
when a bool query has a must clause, the should clauses act as a boost factor, meaning none of them have to match but if they do, the relevancy score for that document will be boosted and thus appear higher in the result.
so,if we have:
one query which contains must and should clauses
vs
second query which contains must clause and boosting clause
Is there a difference ?
when you recommend to use must and should vs must and boosting clauses in a query ?
You can read the documentation of boolean query here, there is huge difference in the should and boost.
Should and must both contributes to the _score of the document, and as mentioned in the above documentation, follows the
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
While boost is a parameter, using which you can increase the weight according to your value, let me explain that using an example.
Index sample docs
POST _doc/1
{
"brand" : "samsung",
"name" : "samsung phone"
}
POST _doc/2
{
"brand" : "apple",
"name" : "apple phone"
}
Boolean Query using should without boost
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple"
}
}
}
]
}
}
}
Search result showing score
"max_score": 1.3862942,
Now in same query use boost of factor 10
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple",
"boost": 10 --> Note additional boost
}
}
}
]
}
}
}
Query result showing boost
"max_score": 7.624619, (Note considerable high score)
In short, when you want to boost a particular document containing your query term, you can additionally pass the boost param and it will be on top of the normal score calculated by should or must.

Elasticsearch Boolean query with Constant score wrapper

When using elasticsearch-7 I'm confused by es compound queries syntax.
Though reading es documents repeatedly but i just find standard syntax of Boolean or Constant score seperately.
As it illuminate,i understand what is 'query context' and what is 'filter context'.But when combining these two query type in a single query i don't know what it mean.
Let's see a example:
GET /classes_test/_search
{
"size": "21",
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"match": {
"class_name": "29386556"
}
}
],
"should": [
{
"term": {
"master": "7033560"
}
},
{
"term": {
"assistant": "7033560"
}
},
{
"term": {
"students": "7033560"
}
}
],
"minimum_should_match": 1,
"must_not": [
{
"term": {
"class_id": 0
}
}
],
"filter": [
{
"term": {
"class_status": "1"
}
}
]
}
}
}
}
}
This query can be executed and response well.Each item in response content has a '_score' value with 1.0.
So,is it mean that the sub bool query as a entirety is in a filter context though it has a 'must' and 'should'?
Also i found boolean query can have a constant score sub query.
Why es allow these syntax but has no more words to explain?
If you use a constant_score query, you'll never get scores different than 1.0, unless you specify boost parameters in which case the score will match those.
If you need scoring you obviously need to ditch constant_score.
In your case, your match query on class_name cannot yield any other score than 1 or 0 since this is basically a yes/no filter, not a matching based on full-text search.
To sum up, all your query executes in a filter context (hence score 0 or 1) since you don't rely on full-text search. So you get scoring whenever you use full-text search, not because you use a match query. In your case, you can merge all must constraints into filter, it won't make any difference since you only have filters (yes/no matches) and no full-text search.

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow
I will get both if I have 1 and 2 in the must clause.
But if i have just 1 I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven´t found a way to search with two values with must.
is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
term or terms query is the perfect way to fetch the exact text or id, using match query result in search inside the id or text
Ex:
id = '4'
id = '44'
Search using match query with id = 4 return both 4 & 44 since it matches 4 in both. This is where terms query come into play.
same search using terms query will return 4 only.
So the accepted is absolutely wrong. Use the #Rahul answer. Just one more thing you need to do, Instead of text you need to analyse the field as a keyword
Example for indexing a field both as a text and keyword (mapping is for flat level for nested change it accordingly).
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
using #Rahul's answer doesn't worked because you might be analysed as a text.
id - access a text field
id.keyword - access a keyword field
it would be
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say accepted answer will return falsy results Please use #Rahul's answer with the corresponding mapping.

Function_score using score_mode inside a bool query as max but working like sum?

I am using function_score so that i can use its score_mode as maximum score of the bool query i am using actually i have two boolean query inside should now i want the score of the document to be the maximum score among both queries my code is given below but when i am passing a string for matching both then scores are being added not be taken maximum can anyone please tell me how can i acheive that.
"function_score": {
"boost_mode": "max",
"score_mode": "max",
"query": {
bool: {
"disable_coord": true,
"should": [
{
bool: {
"disable_coord": true,
"must": [
{
"constant_score": { // here i am using this because to remove tf/idf factors from my scoring
boost: 1.04,
"query": {
query_string: {
query: location_search,
fields: ['places_city.city'],
// boost: 1.04
}
}
}
}
]
}
},
{
"constant_score": { // here i am using this because to remove tf/idf factors from my scoring
boost: 1,
"query": {
"fuzzy_like_this" : {
"fields" : ["places_city.city"],
"like_text" : "bangaloremn",
"prefix_length": 3,
"fuzziness": 2
}
}
}
}
], "minimum_should_match": 1
}
}
}
Yes boolean query takes a sum by design. If you want the maximum score of two queries, you ought to look at the dismax query. Dismax is designed to pick a "winner".
Roughly speaking, this would look like
{"query":
"dismax": {
"queries": [
{ /* your first constant_score query above */},
{/* your second constant_score query from above */}
]
}
}
Unfortunately, function score query doesn't have a great way of operating on more than one text query at a time. See this question. If you want to do any complex math with the scores of multiple queries, Solr actually has a lot more flexibility in this area.

Resources