Elasticsearch - Nested Query Boost in function_score?

My question is about boosting in Elasticsearch's function_score (I've read the docs, and it's still quite unclear). Will the "boost_mode": "sum" below apply to the boosts within the match queries? Or, since it sits outside the inner query, does it just sum up the final result, which would be the same as the default? I've got many fields and a vector of values, and I want the scoring to be additive, not multiplicative. If the following does not work, any suggestions or pointers would be appreciated. Thanks!
"""
{
  "query": {
    "function_score": {
      "boost_mode": "sum",
      "query": {
        "bool": {
          "should": [
            { "match": { "someField": { "query": "someValue", "boost": 2 } } },
            { "match": { "someOtherField": { "query": "someOtherValue", "boost": 3 } } }
          ]
        }
      }
    }
  }
}
"""

The sum boost mode computes the score with the following formula:
queryBoost * (queryScore + Math.min(funcScore, maxBoost))
where:
queryBoost is the value of the boost parameter inside your function_score; since there is none, it defaults to 1.0f
queryScore is the normal score of the query; in your case it varies with the searched terms and with the additional boosts you set in your match queries
funcScore is the product of the scores of your filter functions; it defaults to 1.0f
maxBoost is the value of the max_boost parameter inside your function_score; since there is none, it defaults to Float.MAX_VALUE
It is also worth noting that since you have no filter functions, there is no funcScore to compute and the overall score is simply queryScore. So, based on what precedes, the formula can be simplified to
queryScore
which means that, in the end, your overall score is simply your query score.
It is also a good idea to pass ?explain=true in your query to get more insight into how the score was computed. In your case, since you have no filter functions, the boost_mode is not used at all and the query score is returned as is.
If you were to add a functions parameter with one or more score functions, the result would be different, because a funcScore could then be computed.
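To make the formula above concrete, here is a small Python model of how function_score combines the query score with funcScore, under the answer's assumptions (the ES 1.x-era formula, with funcScore as the product of the function scores). The function name and signature are hypothetical, the multiply branch is assumed by analogy with the sum formula, and this is not Elasticsearch's actual code. It also shows why the boosts inside the match clauses are untouched by boost_mode: they are already baked into queryScore.

```python
import math
from typing import Sequence

# Java's Float.MAX_VALUE: the max_boost default described in the answer.
FLOAT_MAX = 3.4028234663852886e38


def function_score(query_score: float,
                   func_scores: Sequence[float] = (),
                   boost_mode: str = "multiply",
                   query_boost: float = 1.0,
                   max_boost: float = FLOAT_MAX) -> float:
    # query_score already includes the "boost" values set inside the match clauses;
    # boost_mode only decides how it is combined with funcScore.
    if not func_scores:
        # No score functions: boost_mode is never applied.
        return query_score
    # funcScore: product of the individual function scores, capped at max_boost.
    func_score = min(math.prod(func_scores), max_boost)
    if boost_mode == "sum":
        # queryBoost * (queryScore + Math.min(funcScore, maxBoost))
        return query_boost * (query_score + func_score)
    if boost_mode == "multiply":
        # Assumed by analogy: queryBoost * queryScore * Math.min(funcScore, maxBoost)
        return query_boost * query_score * func_score
    raise ValueError(f"boost_mode not modelled in this sketch: {boost_mode}")
```

For example, function_score(2.5, boost_mode="sum") gives back 2.5 unchanged, while function_score(2.5, [2.0], boost_mode="sum") adds the function score and gives 4.5.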

Related

What is the difference between must and filter in Query DSL in elasticsearch?

I am new to Elasticsearch and I am confused between must and filter. I want to perform an AND operation between my terms, so I did this
POST /xyz/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"city": "city1"
}
},
{
"term": {
"saleType": "sale_type1"
}
}
]
}
}
}
which gave me the required results matching both terms. When I used filter like this
POST /xyz/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"city": "city1"
}
}
],
"filter": {
"term": {
"saleType": "sale_type1"
}
}
}
}
}
I get the same result, so when should I use must and when should I use filter? What is the difference?
must contributes to the score. In filter, the score of the query is ignored.
In both must and filter, the clause (query) must match for a document to be returned. This is why you get the same results.
You may check this link
Score
The relevance score of each document is represented by a positive floating-point number called the _score. The higher the _score, the more relevant the document.
A query clause generates a _score for each document.
To learn how the score is calculated, refer to this link
must returns a score for every matching document. This score helps you rank the matching documents, and compare the relative relevance between documents (using the magnitude of the score of each document).
With this, one can say how many times more relevant Doc 1 is than Doc 2, or that Docs 1 to 7 are much more relevant than Doc 8 onward.
For how the relative score is determined, you can refer to the references below.
Briefly, it depends on how often the term occurs in the document, on the document's length relative to the average document length in the index, and on how rare the term is across the index.
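Those ingredients can be seen in a textbook BM25 term score, which has been Elasticsearch's default similarity since 5.0. The Python sketch below uses the usual defaults k1 = 1.2 and b = 0.75; the function name is hypothetical, and newer Lucene versions drop the constant (k1 + 1) factor, which changes the absolute numbers but not the ranking.

```python
import math


def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    doc_count: int, docs_with_term: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    # Inverse document frequency: the rarer the term across the index, the higher.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Length normalisation: documents longer than average are penalised.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the previous one.
    return idf * tf * (k1 + 1) / (tf + length_norm)
```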
filter doesn't return a score. All one can say is, all matching documents are of relevance. But it won't help in evaluating if one is more relevant than the other. You can think of filter as a must with only 2 scores: zero or non-zero, and where all zero-scored documents are dropped.
filter is helpful if you just want to whitelist or blacklist documents, e.g. all documents belonging to the topic "pets".
In summary, there are 3 points that will help you in deciding when to use what:
must is your only choice when comparing/ranking documents by relevance
filter excludes all documents that don't match
filter is a lot faster because Elasticsearch doesn't need to compute the relative score
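The difference between the two queries in the question can be reproduced with a toy Python model of a bool query. Everything here is hypothetical (the term helper returns a fixed score of 1.0 per matching clause instead of a real relevance score), but it shows the key point: both versions return the same documents, and only must clauses add to the score.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Doc = Dict[str, str]
Clause = Callable[[Doc], float]  # returns 0.0 when the clause does not match


def term(field: str, value: str, weight: float = 1.0) -> Clause:
    # Hypothetical stand-in for a term query: a fixed score on a match, 0.0 otherwise.
    return lambda doc: weight if doc.get(field) == value else 0.0


def bool_query(docs: Sequence[Doc],
               must: Sequence[Clause] = (),
               filters: Sequence[Clause] = ()) -> List[Tuple[float, Doc]]:
    hits = []
    for doc in docs:
        must_scores = [clause(doc) for clause in must]
        # Every must and every filter clause has to match, or the document is dropped.
        if any(s == 0.0 for s in must_scores) or any(f(doc) == 0.0 for f in filters):
            continue
        # Only must clauses add to the score; filter clauses contribute nothing.
        hits.append((sum(must_scores), doc))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)
```

With the question's two queries, the must-only version scores the matching document 2.0 and the must-plus-filter version scores it 1.0, yet the hit list is identical.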
References:
Query vs Filter: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html
Computation of Relevance: https://www.infoq.com/articles/similarity-scoring-elasticsearch/

Elasticsearch, sorting by exact string match

I want to sort results, such that if one specific field (let's say 'first_name') is equal to an exact value (let's say 'Bob'), then those documents are returned first.
That would result in all documents where first_name is exactly 'Bob', would be returned first, and then all the other documents afterwards. Note that I don't intend to exclude documents where first_name is not 'Bob', merely sort them such that they're returned after all the Bobs.
I understand how numeric or alphabetical sorting works in Elasticsearch, but I can't find any part of the documentation covering this type of sorting.
Is this possible, and if so, how?
One solution is to manipulate the score of the results that contain Bob in the first name field.
For example:
POST /test/users
{
"name": "Bob"
}
POST /test/users
{
"name": "Alice"
}
GET /test/users/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "Bob",
"boost" : 2
}
}
},
{
"match_all": {}
}
]
}
}
}
This would return both Bob and Alice, in that order (with approximate scores of 1 and 0.2 respectively).
From the book:
Query-time boosting is the main tool that you can use to tune relevance. Any type of query accepts a boost parameter. Setting a boost of 2 doesn’t simply double the final _score; the actual boost value that is applied goes through normalization and some internal optimization. However, it does imply that a clause with a boost of 2 is twice as important as a clause with a boost of 1.
This means that if you also wanted "Fred" to come ahead of Bob, you could add another should clause matching "Fred" with a boost of 3 in the example above.
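Why the should trick sorts Bob first can be illustrated with a toy Python model: a bool query made only of should clauses adds up the scores of the clauses each document matches, and match_all gives every document the same constant score. The helpers and the fixed base score of 1.0 are hypothetical; as the quoted book points out, real boosts go through normalization and are not applied this literally.

```python
from typing import Callable, Dict, List, Sequence

Doc = Dict[str, str]
Clause = Callable[[Doc], float]


def match(field: str, value: str, boost: float = 1.0, base_score: float = 1.0) -> Clause:
    # Hypothetical match clause: base_score * boost on an exact value, 0.0 otherwise.
    return lambda doc: base_score * boost if doc.get(field) == value else 0.0


def match_all() -> Clause:
    # match_all gives every document the same constant score.
    return lambda doc: 1.0


def bool_should(docs: Sequence[Doc], should: Sequence[Clause]) -> List[Doc]:
    # A bool query with only should clauses adds up the scores of the clauses that match.
    scored = [(sum(clause(doc) for clause in should), doc) for doc in docs]
    # sorted() is stable, so equally scored documents keep their original order.
    return [doc for _, doc in sorted(scored, key=lambda s: s[0], reverse=True)]
```

With a Bob clause boosted by 2 plus match_all, Bob ranks first and everyone else follows; adding a Fred clause boosted by 3 puts Fred ahead of Bob.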

Can Elasticsearch do a decay search on the log of a value?

I store a number, views, in Elasticsearch. I want to find documents "closest" to it on a logarithmic scale, so that 10k and 1MM are the same distance (and get scored the same) from 100k views. Is that possible?
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#exp-decay describes field value factor and decay functions but can they be "stacked"? Is there another approach?
I'm not sure if you can achieve this directly with decay, but you could easily do it with the script_score function. The example below uses dynamic scripting, but please be aware that using file-based scripts is the recommended, far more secure approach.
In the query below, the offset parameter is set to 100,000, and documents with that value in their views field score the highest. The score decreases as the logarithm of views moves away from the logarithm of offset. Per your example, documents with 1,000,000 or 10,000 views have identical scores (0.30279312 with this formula).
You can invert the order of these results by changing the script to multiply _score by the (1 + ...) term instead of dividing by it.
$ curl -XPOST localhost:9200/somestuff/_search -H 'Content-Type: application/json' -d '{
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "functions": [
              {
                "script_score": {
                  "params": {
                    "offset": 100000
                  },
                  "script": "_score / (1 + ((log(offset) - log(doc[\"views\"].value)).abs()))"
                }
              }
            ]
          }
        }
      ]
    }
  }
}'
Note: you may want to account for the possibility of 'views' being null, depending on your data.
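The arithmetic of that script can be checked outside Elasticsearch with a few lines of Python. This is a sketch of the formula only, not of the Groovy script engine; the function name is hypothetical, _score is assumed to be 1.0 (the function_score wraps the default match_all query), and returning 0.0 for missing or non-positive views is an assumption in the spirit of the note above, since the logarithm is undefined there.

```python
import math
from typing import Optional


def log_distance_score(views: Optional[float], offset: float = 100_000,
                       query_score: float = 1.0) -> float:
    # Same arithmetic as the script: _score / (1 + |ln(offset) - ln(views)|).
    # query_score = 1.0 assumes the default match_all query underneath.
    # Assumption: missing or non-positive views score 0.0 (log is undefined there).
    if views is None or views <= 0:
        return 0.0
    return query_score / (1 + abs(math.log(offset) - math.log(views)))
```

Because the distance is measured in log space, 10,000 and 1,000,000 views are each one factor of ten away from 100,000, so both get 1 / (1 + ln 10), roughly 0.3028.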

change _score in elasticsearch to make equal to doc's score field

I have an integer score field in my data. I'm getting the data from an API and posting it directly to localhost:9200//listings/
and I want each item's _score to be equal to the score field in the data.
For now, my workaround is to add ?sort=score:desc to the URL.
One solution is to use a function_score query, where you replace the default _score using a field_value_factor score function. It goes like this:
curl -XPOST localhost:9200/listings/_search -H 'Content-Type: application/json' -d '{
  "query": {
    "function_score": {
      "functions": [
        {
          "field_value_factor": {
            "field": "score",
            "factor": 1,
            "missing": 1
          }
        }
      ],
      "query": {
        "match_all": {}
      },
      "boost_mode": "replace"
    }
  }
}'
Here "field": "score" tells the function to use the score field, "factor": 1 takes its value unchanged, "missing": 1 is used as the score when a document has no score field, and "boost_mode": "replace" replaces the default _score with the function's result. So we're basically computing the score as the score field multiplied by 1, and if a document doesn't have the score field we just assume a score of 1 (you can change that to whatever makes more sense in your case).
UPDATE
According to your comment, you need the _score to be multiplied by the document's score field. You can achieve that simply by removing the boost_mode parameter: the default boost_mode, multiply, multiplies the _score by whatever value comes out of the field_value_factor function.
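The difference between the two boost modes discussed here can be modelled in a few lines of Python. The helper names are hypothetical, the model ignores max_boost and the optional modifier of field_value_factor, and it only covers the replace and multiply modes mentioned in this answer.

```python
from typing import Dict


def field_value_factor(doc: Dict[str, float], field: str = "score",
                       factor: float = 1.0, missing: float = 1.0) -> float:
    # No "modifier" is set, so the function is just factor * field value;
    # documents without the field fall back to the missing value.
    return factor * doc.get(field, missing)


def combine(query_score: float, func_score: float, boost_mode: str = "multiply") -> float:
    if boost_mode == "replace":
        return func_score  # the query's own _score is discarded
    if boost_mode == "multiply":
        return query_score * func_score  # the default boost_mode
    raise ValueError(f"boost_mode not modelled in this sketch: {boost_mode}")
```

With a query score of 2.5 and a document whose score field is 7, replace yields 7.0, while the default multiply yields 17.5.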
If you need to completely replace the default scoring mechanism to be based on your score field instead, there's a more complex way using the similarity module, where you can define another similarity algorithm solely for your score field. There is a great blog post explaining the nitty gritty details of the similarity module.

Constant Score Query elasticsearch boosting

My understanding of the Constant Score Query in Elasticsearch is that the boost factor would be assigned as the score of every matching document. The documentation says:
A query that wraps a filter or another query and simply returns a constant score equal to the query boost for every document in the filter.
However when I send this query:
"query": {
"constant_score": {
"filter": {
"term": {
"source": "BBC"
}
},
"boost": 3
}
},
"fields": ["title", "source"]
all the matching documents are given a score of 1?! I cannot figure out what I am doing wrong, and had also tried with query instead of filter in constant_score.
Scores are only meant to be relative to all other scores in a given result set, so a result set where everything has the score of 3 is the same as a result set where everything has the score of 1.
Really, the only purpose of the relevance _score is to sort the results of the current query in the correct order. You should not try to compare the relevance scores from different queries. - Elasticsearch Guide
Either the constant score is being ignored because it's not being combined with another query, or it's being normalized. As @keety said, check the output of explain to see exactly what's going on.
Constant score query gives an equal score to every matching document, irrespective of scoring factors such as TF and IDF. Use it when you don't care how well a document matched, only whether it matched, but you still want it to receive a score, unlike a filter.
If you want score as 3 literally for all the matching documents for a particular query, then you should be using function score query, something like
"query": {
"function_score": {
"functions": [
{
"filter": { "term": { "source": "BBC" } },
"weight": 3
}
]
}
...
}
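The two ideas in these answers, that only the relative order of scores matters and that a filter plus weight function sets the score directly, can be checked with a tiny Python model. It assumes the elided part of the query above is match_all (so documents not from BBC are still returned, with a score of 1.0) and that funcScore stays 1.0 when no function's filter matches; the helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def weighted_score(doc: Dict[str, str], weight: float = 3.0, query_score: float = 1.0) -> float:
    # Assumes the elided query is match_all, so every document starts with a score of 1.0.
    # The filter+weight function yields `weight` when its filter matches; when no
    # function matches, funcScore stays 1.0.
    func_score = weight if doc.get("source") == "BBC" else 1.0
    return query_score * func_score  # default boost_mode "multiply"


def ranking(scores: Sequence[float]) -> List[int]:
    # Document indices ordered best first; only this order matters to the search results.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Scaling every score by the same factor, for example all 3s versus all 1s, leaves ranking() unchanged, which is why a uniform score of 1 and a uniform score of 3 produce the same result set.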
