Elasticsearch: unexpected relevancy score for optional fields in documents

I'm probably missing something trivial here, but I'm having issues with the relevancy score of the search results when it comes to optional fields in documents. Consider the following example:
Test data:
DELETE /my-index
PUT /my-index
POST /my-index/_bulk
{"index":{"_id":"1"}}
{"required_field":"RareWord"}
{"index":{"_id":"2"}}
{"required_field":"RareWord"}
{"index":{"_id":"3"}}
{"required_field":"CommonWord"}
{"index":{"_id":"4"}}
{"required_field":"CommonWord"}
{"index":{"_id":"5"}}
{"required_field":"CommonWord"}
{"index":{"_id":"6"}}
{"required_field":"CommonWord"}
{"index":{"_id":"7"}}
{"required_field":"CommonWord"}
{"index":{"_id":"8"}}
{"required_field":"CommonWord"}
{"index":{"_id":"9"}}
{"required_field":"CommonWord","optional_field":"RareWord AnotherRareWord"}
{"index":{"_id":"10"}}
{"required_field":"CommonWord","optional_field":"RareWord AnotherRareWord"}
Search Query:
If I run a search query similar to the one below:
GET /my-index/_search
{
  "query": {
    "multi_match": {
      "query": "RareWord AnotherRareWord",
      "fields": ["required_field", "optional_field"]
    }
  }
}
Expectation
The end user would expect Documents #9 and #10 to score higher than the others, because their optional_field contains exactly the two words of the search query.
Reality
Document #1 scores better than #10, even though it contains only one of the two words of the search query; this is the opposite of what end users most likely expect.
A closer look at _explain
Here are the _explain results of running the same search query for Document #1:
{
"_index" : "my-index",
"_type" : "_doc",
"_id" : "1",
"matched" : true,
"explanation" : {
"value" : 1.4816045,
"description" : "max of:",
"details" : [
{
"value" : 1.4816045,
"description" : "sum of:",
"details" : [
{
"value" : 1.4816045,
"description" : "weight(required_field:rareword in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 1.4816045,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 1.4816046,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 10,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
]
}
}
And here are the _explain results of running the same search query for Document #10:
{
"_index" : "my-index",
"_type" : "_doc",
"_id" : "10",
"matched" : true,
"explanation" : {
"value" : 0.36464313,
"description" : "max of:",
"details" : [
{
"value" : 0.36464313,
"description" : "sum of:",
"details" : [
{
"value" : 0.18232156,
"description" : "weight(optional_field:rareword in 9) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.18232156,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.18232156,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 2,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 2.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 2.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 0.18232156,
"description" : "weight(optional_field:anotherrareword in 9) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.18232156,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.18232156,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 2,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 2.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 2.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
]
}
}
As you can see, Document #10 scores lower, mainly due to its smaller IDF value (0.18232156). Looking closely, this is because IDF uses N, the total number of documents that have the field (2), instead of simply the total number of documents in the index (10).
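To make the gap concrete, here is the arithmetic with the values from the two explain outputs above:
idf for required_field:rareword = log(1 + (10 - 2 + 0.5) / (2 + 0.5)) = log(4.4) ≈ 1.4816
idf for optional_field:rareword = log(1 + (2 - 2 + 0.5) / (2 + 0.5)) = log(1.2) ≈ 0.1823
Because only 2 of the 10 documents contain optional_field at all, a term that appears in both of them is treated as maximally common within that field, even though it is rare across the index as a whole.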
Question
My question is: is there any way to force the multi_match query to consider all documents (instead of only those that contain the field) when computing the IDF value for an optional field, so that the relevance score is closer to end users' expectations?
Or alternatively, is there a better way to write the search query, so I get the expected results?
Any help would be greatly appreciated. Thanks.

Your situation seems to be similar to the one described for the cross_fields query type, so you should probably try it:
{
  "multi_match": {
    "query": "RareWord AnotherRareWord",
    "fields": ["required_field", "optional_field"],
    "type": "cross_fields",
    "operator": "and"
  }
}
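For completeness, a full request against the sample index above would look like this (a sketch; cross_fields blends the per-field term statistics, so a term is no longer penalized just because the field it appears in is sparse):
GET /my-index/_search
{
  "query": {
    "multi_match": {
      "query": "RareWord AnotherRareWord",
      "fields": ["required_field", "optional_field"],
      "type": "cross_fields",
      "operator": "and"
    }
  }
}
Note that with "operator": "and", every search term must appear in at least one of the listed fields for a document to match.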

Related

Elasticsearch re-indexing same document causing score changes

We have created an index with the following document:
POST sample-index-test/_doc/1
{
  "first_name": "James",
  "last_name": "Osaka"
}
There is only one document in the index. When we run the _explain API with a match query on the index:
GET sample-index-test/_explain/1
{
  "query": {
    "match": {
      "first_name": "James"
    }
  }
}
The _explain API returns the details below:
score : 0.2876821
number of documents containing term : 1
total number of documents with field : 1
{
"_index" : "sample-index-test",
"_type" : "_doc",
"_id" : "1",
"matched" : true,
"explanation" : {
"value" : 0.2876821,
"description" : "weight(first_name:james in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.2876821,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.2876821,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 1,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 1,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
}
Now, run the same index request multiple times within a few seconds:
POST sample-index-test/_doc/1
{
  "first_name": "James",
  "last_name": "Cena"
}
Running the same _explain API again returns a different score, along with changed values for number of documents containing term and total number of documents with field.
score : 0.046520013
number of documents containing term : 10
total number of documents with field : 10
{
"_index" : "sample-index-test",
"_type" : "_doc",
"_id" : "1",
"matched" : true,
"explanation" : {
"value" : 0.046520013,
"description" : "weight(first_name:james in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.046520013,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.046520017,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 10,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 10,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
}
Why does Elasticsearch increase the total number of documents with field and the number of documents containing term, when the index contains only a single document?
Elasticsearch is built on Lucene, and all documents are stored in segments. Segments are immutable, so a document update is a two-step process: a new version of the document is created, and the old one is marked as deleted. When you create the first document, there is just one document in the segments. After you update the same document repeatedly, the deleted-but-not-yet-purged copies still count toward the index statistics, which is why "the number of documents with field" and "number of documents containing term" keep changing.
You can verify this using the _forcemerge endpoint. Force merge merges the segments and purges the deleted documents from them.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html
## 1. Create the document
POST sample-index-test/_doc/1
{
"first_name": "James",
"last_name" : "Osaka"
}
## 2. Get the explain score
GET sample-index-test/_explain/1
{
"query": {
"match": {
"first_name": "James"
}
}
}
## "value": 0.2876821,
## n, number of documents containing term => 1
## N, total number of documents with field => 1
## 3.1. Execute this 10 times
POST sample-index-test/_doc/1
{
"first_name": "James",
"last_name" : "Cena"
}
## 3.2 You can execute this one also
POST sample-index-test/_update/1
{
"script" : "ctx._source.first_name = 'James'; ctx._source.last_name = 'Cena';"
}
## 3.3 You can even use _update_by_query
POST sample-index-test/_update_by_query
{
"query": {
"match": {
"first_name": "James"
}
},
"script": {
"source": "ctx._source.first_name = 'James'; ctx._source.last_name = 'Cena';",
"lang": "painless"
}
}
## 4. Get the explain score
GET sample-index-test/_explain/1
{
"query": {
"match": {
"first_name": "James"
}
}
}
## "value": 0.046520013,
## n, number of documents containing term => 10
## N, total number of documents with field => 10
## 5. Execute the force merge.
POST sample-index-test/_forcemerge
## 6. Force merge runs in the background, so you may need to wait a couple of seconds.
GET sample-index-test/_explain/1
{
"query": {
"match": {
"first_name": "James"
}
}
}
## "value": 0.2876821,
## n, number of documents containing term => 1
## N, total number of documents with field => 1
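As a side note (parameters worth double-checking against your version's docs), _forcemerge also accepts options that make the cleanup explicit:
## Merge down to a single segment:
POST sample-index-test/_forcemerge?max_num_segments=1
## Or only reclaim space from deleted documents:
POST sample-index-test/_forcemerge?only_expunge_deletes=true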

Normalization of term frequency in elasticsearch

I recently started working with Elasticsearch (version 7.17.2), and there is something related to term frequency normalization and boosting that I don't quite understand.
To keep it simple, suppose I just create an index with
PUT test
and add a couple of documents
POST test/_doc/1
{
  "firstname": "foo",
  "lastname": "bar"
}
POST test/_doc/2
{
  "firstname": "foo",
  "lastname": "baz"
}
Now I want to perform the following search:
POST test/_search
{
  "explain": true,
  "query": {
    "bool": {
      "should": {
        "multi_match": {
          "fields": [
            "firstname^3",
            "lastname^5"
          ],
          "query": "foo bar"
        }
      }
    }
  }
}
which returns
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 3.465736,
"hits" : [
{
"_shard" : "[test][0]",
"_node" : "Or9Q1aPLTi-liJvA8NJW6g",
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.465736,
"_source" : {
"firstname" : "foo",
"lastname" : "bar"
},
"_explanation" : {
"value" : 3.465736,
"description" : "max of:",
"details" : [
{
"value" : 0.5469647,
"description" : "sum of:",
"details" : [
{
"value" : 0.5469647,
"description" : "weight(firstname:foo in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.5469647,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
"details" : [
{
"value" : 6.6000004,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.18232156,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 2,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
},
{
"value" : 3.465736,
"description" : "sum of:",
"details" : [
{
"value" : 3.465736,
"description" : "weight(lastname:bar in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 3.465736,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
"details" : [
{
"value" : 11.0,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.6931472,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 1,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 2,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
]
}
},
{
"_shard" : "[test][0]",
"_node" : "Or9Q1aPLTi-liJvA8NJW6g",
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.5469647,
"_source" : {
"firstname" : "foo",
"lastname" : "baz"
},
"_explanation" : {
"value" : 0.5469647,
"description" : "max of:",
"details" : [
{
"value" : 0.5469647,
"description" : "sum of:",
"details" : [
{
"value" : 0.5469647,
"description" : "weight(firstname:foo in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.5469647,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
"details" : [
{
"value" : 6.6000004,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.18232156,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 2,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
]
}
}
]
}
}
I purposely gave more relevance to lastname than to firstname (5 vs. 3). In the explanation, for instance for the contribution of firstname:foo, the score is computed as boost * idf * tf.
While I gave the field firstname a relevance boost of 3, its actual boost according to the explanation is 6.6. After some investigation, I figured out that this value corresponds to 3 * (1.2 + 1), that is, my boost of 3 multiplied by (k_1 + 1), where k_1 is a parameter of the default BM25 similarity function whose default value is 1.2.
I know this might be related to some normalization that elasticsearch performs behind the scenes (whose documentation is rather poor), but I have seen this happening in two ways:
Exactly as in this example, with tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)).
Like they do it on Wikipedia, with tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)). Notice that the value is called tfNorm instead of just tf, and that the (k1 + 1) factor appears explicitly in tfNorm rather than "hidden" in the boost (see the note after this list). Here are the wikipedia elasticsearch settings and mappings, in case they help.
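For what it's worth, the two forms appear to differ only in where the constant (k1 + 1) factor is placed. Multiplying the first form's tf by the reported boost of 6.6 = 3 * (k1 + 1) gives
boost * tf = 3 * (k1 + 1) * freq / (freq + k1 * (1 - b + b * dl / avgdl)) = 3 * tfNorm
so both variants produce identical scores and rankings; newer Lucene/Elasticsearch versions simply fold (k1 + 1) into the reported boost instead of showing it inside tfNorm.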
What I would like to clarify is the difference between these two behaviors and how to switch between them, perhaps by updating the mapping.
BONUS QUESTION: Actually, there is a third option, which we can see in the same wikipedia example by searching on the field all_near_match. There, tfNorm = (freq * (k1 + 1)) / (freq + k1), and there is an annotation saying that the b parameter of the BM25 similarity function is 0 because norms are omitted for the field. How does this third approach relate to the other two I described above?
Thank you very much!
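As an aside, here is a minimal sketch (assuming Elasticsearch 7.x; "my_bm25" is a hypothetical similarity name) of how BM25 parameters can be configured per field via index settings and mappings, including the norms-omitted case from the bonus question:
PUT test
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": { "type": "BM25", "k1": 1.2, "b": 0.75 }
      }
    }
  },
  "mappings": {
    "properties": {
      "firstname": { "type": "text", "similarity": "my_bm25" },
      "all_near_match": { "type": "text", "norms": false }
    }
  }
}
With "norms": false, Lucene stores no field length, so the length normalization drops out of the formula; this matches the tfNorm = (freq * (k1 + 1)) / (freq + k1) seen for all_near_match, which behaves as if b were 0.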

Why is Elasticsearch returning the wrong relevance score?

I am learning Elasticsearch. I inserted the following data into the megacorp index under the employee type (shown here as a search response):
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.6931472,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
}
]
}
}
Then I ran the following request:
GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "about": "rock climbing"
    }
  }
}
However, the result I got is as follows:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.6682933,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.6682933,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
}
]
}
}
My doubt is why the relevance score for the following record:
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
]
}
}
is lower than the previous one's. I ran the query with
explain: true
and got the following result:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.6682933,
"hits" : [
{
"_shard" : "[megacorp][2]",
"_node" : "pGtCz_FvSTmteJwQKvn_lg",
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.6682933,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
],
"fielddata" : true
},
"_explanation" : {
"value" : 0.6682933,
"description" : "sum of:",
"details" : [
{
"value" : 0.6682933,
"description" : "weight(about:rock in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.6682933,
"description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [
{
"value" : 0.6931472,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [
{
"value" : 1.0,
"description" : "docFreq",
"details" : [ ]
},
{
"value" : 2.0,
"description" : "docCount",
"details" : [ ]
}
]
},
{
"value" : 0.96414346,
"description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details" : [
{
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "parameter k1",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "parameter b",
"details" : [ ]
},
{
"value" : 5.5,
"description" : "avgFieldLength",
"details" : [ ]
},
{
"value" : 6.0,
"description" : "fieldLength",
"details" : [ ]
}
]
}
]
}
]
}
]
}
},
{
"_shard" : "[megacorp][3]",
"_node" : "pGtCz_FvSTmteJwQKvn_lg",
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [
"sports",
"music"
],
"fielddata" : true
},
"_explanation" : {
"value" : 0.5753642,
"description" : "sum of:",
"details" : [
{
"value" : 0.2876821,
"description" : "weight(about:rock in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.2876821,
"description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [
{
"value" : 0.2876821,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [
{
"value" : 1.0,
"description" : "docFreq",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "docCount",
"details" : [ ]
}
]
},
{
"value" : 1.0,
"description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details" : [
{
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "parameter k1",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "parameter b",
"details" : [ ]
},
{
"value" : 6.0,
"description" : "avgFieldLength",
"details" : [ ]
},
{
"value" : 6.0,
"description" : "fieldLength",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 0.2876821,
"description" : "weight(about:climbing in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.2876821,
"description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [
{
"value" : 0.2876821,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [
{
"value" : 1.0,
"description" : "docFreq",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "docCount",
"details" : [ ]
}
]
},
{
"value" : 1.0,
"description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details" : [
{
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "parameter k1",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "parameter b",
"details" : [ ]
},
{
"value" : 6.0,
"description" : "avgFieldLength",
"details" : [ ]
},
{
"value" : 6.0,
"description" : "fieldLength",
"details" : [ ]
}
]
}
]
}
]
}
]
}
}
]
}
}
Can you please tell me the reason behind this?
Short answer: Relevance in Elasticsearch is not a simple topic :) Details below.
I was trying to reproduce your case...
First I've put the two documents:
POST /megacorp/employee/1
{
  "first_name": "John",
  "last_name": "Smith",
  "age": 25,
  "about": "I love to go rock climbing",
  "interests": [
    "sports",
    "music"
  ]
}
POST /megacorp/employee/2
{
  "first_name": "Jane",
  "last_name": "Smith",
  "age": 32,
  "about": "I like to collect rock albums",
  "interests": [
    "music"
  ]
}
and later I used your query:
GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "about": "rock climbing"
    }
  }
}
My results were totally different:
{
"took": 89,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.5753642,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 0.5753642,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.2876821,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
}
]
}
}
As you can see, the results are in the "expected" order. Please note that the _score values are totally different from yours.
The question is: Why? What happened?
The detailed answer for this situation was described in the Practical BM25 - Part 1: How Shards Affect Relevance Scoring in Elasticsearch article.
In short: as you have probably noticed, Elasticsearch stores documents split among shards. To be faster, by default it uses the query_then_fetch strategy: Elasticsearch first asks every shard for its results, then fetches and merges them for the user. The score calculation happens the same way, per shard.
As you can see, in our results 5 shards were queried. Elasticsearch uses 5 shards by default if not specified at index creation (configurable with the number_of_shards parameter). That is why our scores differ. Moreover, if you try this again yourself, there is a good chance you will get different results once more. Everything depends on how the documents are distributed among the shards. If you set number_of_shards to 1 for this index, you will get the same scores each time.
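For example, recreating the test index with a single shard (a sketch):
PUT /megacorp
{
  "settings": {
    "number_of_shards": 1
  }
}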
An additional thing, also mentioned in the article is:
People start loading just a few documents into their index and ask
“why does document A have a higher/lower score than document B” and
sometimes the answer is that the user has a relatively high ratio of
shards to documents so that the scores are skewed across different
shards.
Elasticsearch was designed to handle large amounts of data, and the more data you put into an index, the more accurate its scores become.
I hope my answer explains your doubts.

Elasticsearch: different queryNorm across shards

I'm rather new to ES and have been studying scoring in ES in an attempt to improve the quality of search results. I have come across a situation in which the queryNorm value is very different (5x as large) across shards. I can see its dependency on the idf of the terms in the query, which can differ across shards. However, in my case I have a single search term, and the idf measures across shards are close to each other (definitely not close enough to cause the 5x difference). I will briefly describe my setup below, including my query and the result from the explain endpoint.
Setup
I have an index with ~6500 docs distributed across 5 shards. There are no index-time boosts on the fields that appear in the query below. My setup uses ES 2.4 with query_then_fetch. My query:
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [],
            "must_not": [],
            "should": [
              {
                "multi_match": {
                  "query": "pds",
                  "fields": [ "field1" ],
                  "lenient": true,
                  "fuzziness": "0"
                }
              },
              {
                "multi_match": {
                  "query": "pds",
                  "fields": [ "field2" ],
                  "lenient": true,
                  "fuzziness": "0",
                  "boost": 1000.0
                }
              },
              {
                "multi_match": {
                  "query": "pds",
                  "fields": [ "field3" ],
                  "lenient": true,
                  "fuzziness": "0",
                  "boost": 500.0
                }
              },
              {
                "multi_match": {
                  "query": "pds",
                  "fields": [ "field4" ],
                  "lenient": true,
                  "fuzziness": "0",
                  "boost": 100.0
                }
              }
            ]
          }
        }
      ],
      "must_not": [],
      "should": [],
      "filter": []
    }
  },
  "size": 1000,
  "min_score": 0.0
}
Explain output for 2 of the documents (one having a queryNorm 5x as large as the other's):
{
"_shard" : 4,
"_explanation" : {
"value" : 2.046937,
"description" : "product of:",
"details" : [ {
"value" : 4.093874,
"description" : "sum of:",
"details" : [ {
"value" : 0.112607226,
"description" : "weight(field1:pds in 93) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.112607226,
"description" : "score(doc=93,freq=1.0), product of:",
"details" : [ {
"value" : 0.019996,
"description" : "queryWeight, product of:",
"details" : [ {
"value" : 2.0,
"description" : "boost",
"details" : [ ]
}, {
"value" : 5.6314874,
"description" : "idf(docFreq=11, maxDocs=1232)",
"details" : [ ]
}, {
"value" : 0.0017753748,
"description" : "queryNorm",
"details" : [ ]
} ]
}, {
"value" : 5.6314874,
"description" : "fieldWeight in 93, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
} ]
}, {
"value" : 5.6314874,
"description" : "idf(docFreq=11, maxDocs=1232)",
"details" : [ ]
}, {
"value" : 1.0,
"description" : "fieldNorm(doc=93)",
"details" : [ ]
} ]
} ]
} ]
}, {
"value" : 3.9812667,
"description" : "weight(field4:pds in 93) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 3.9812667,
"description" : "score(doc=93,freq=2.0), product of:",
"details" : [ {
"value" : 0.9998001,
"description" : "queryWeight, product of:",
"details" : [ {
"value" : 100.0,
"description" : "boost",
"details" : [ ]
}, {
"value" : 5.6314874,
"description" : "idf(docFreq=11, maxDocs=1232)",
"details" : [ ]
}, {
"value" : 0.0017753748,
"description" : "queryNorm",
"details" : [ ]
} ]
}, {
"value" : 3.9820628,
"description" : "fieldWeight in 93, product of:",
"details" : [ {
"value" : 1.4142135,
"description" : "tf(freq=2.0), with freq of:",
"details" : [ {
"value" : 2.0,
"description" : "termFreq=2.0",
"details" : [ ]
} ]
}, {
"value" : 5.6314874,
"description" : "idf(docFreq=11, maxDocs=1232)",
"details" : [ ]
}, {
"value" : 0.5,
"description" : "fieldNorm(doc=93)",
"details" : [ ]
} ]
} ]
} ]
} ]
}, {
"value" : 0.5,
"description" : "coord(2/4)",
"details" : [ ]
} ]
}
},
{
"_shard" : 2,
"_explanation" : {
"value" : 0.4143453,
"description" : "product of:",
"details" : [ {
"value" : 0.8286906,
"description" : "sum of:",
"details" : [ {
"value" : 0.018336227,
"description" : "weight(field1:pds in 58) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.018336227,
"description" : "score(doc=58,freq=1.0), product of:",
"details" : [ {
"value" : 0.0030464241,
"description" : "queryWeight, product of:",
"details" : [ {
"value" : 2.0,
"description" : "boost",
"details" : [ ]
}, {
"value" : 6.0189342,
"description" : "idf(docFreq=11, maxDocs=1815)",
"details" : [ ]
}, {
"value" : 2.5307006E-4,
"description" : "queryNorm",
"details" : [ ]
} ]
}, {
"value" : 6.0189342,
"description" : "fieldWeight in 58, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
} ]
}, {
"value" : 6.0189342,
"description" : "idf(docFreq=11, maxDocs=1815)",
"details" : [ ]
}, {
"value" : 1.0,
"description" : "fieldNorm(doc=58)",
"details" : [ ]
} ]
} ]
} ]
}, {
"value" : 0.81035435,
"description" : "weight(field4:pds in 58) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.81035435,
"description" : "score(doc=58,freq=2.0), product of:",
"details" : [ {
"value" : 0.1523212,
"description" : "queryWeight, product of:",
"details" : [ {
"value" : 100.0,
"description" : "boost",
"details" : [ ]
}, {
"value" : 6.0189342,
"description" : "idf(docFreq=11, maxDocs=1815)",
"details" : [ ]
}, {
"value" : 2.5307006E-4,
"description" : "queryNorm",
"details" : [ ]
} ]
}, {
"value" : 5.3200364,
"description" : "fieldWeight in 58, product of:",
"details" : [ {
"value" : 1.4142135,
"description" : "tf(freq=2.0), with freq of:",
"details" : [ {
"value" : 2.0,
"description" : "termFreq=2.0",
"details" : [ ]
} ]
}, {
"value" : 6.0189342,
"description" : "idf(docFreq=11, maxDocs=1815)",
"details" : [ ]
}, {
"value" : 0.625,
"description" : "fieldNorm(doc=58)",
"details" : [ ]
} ]
} ]
} ]
} ]
}, {
"value" : 0.5,
"description" : "coord(2/4)",
"details" : [ ]
} ]
}
}
Notice how the queryNorm on field1 for the document in shard 4 is 0.0017753748 (with idf 5.6314874), while the queryNorm for the same field for the doc in shard 2 is 2.5307006E-4 (with idf 6.0189342). I've tried to follow the queryNorm calculation by hand using the formula at http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html, but failed to arrive at the same numbers.
I haven't seen many threads/posts about calculating queryNorm; one I found useful is http://www.openjems.com/tag/querynorm/ (this is actually Solr, but since the query is query_then_fetch, the Lucene calculations should be the only thing that matters, so I expect it to behave similarly). However, I couldn't derive the right queryNorm values using the same approach (as far as I understand, t.getBoost() should be 1 in my case, since there are no index-time field boosts and no special field boost in the query above).
Does anyone have any suggestion as to what might be going on here?
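For reference, the classic Lucene formula at play here (from the TFIDFSimilarity docs linked above) is:
queryNorm = 1 / sqrt(sumOfSquaredWeights)
where sumOfSquaredWeights sums (boost * idf)^2 over every clause in the query. You can check it against the explain output: queryWeight = boost * idf * queryNorm, e.g. 2.0 * 5.6314874 * 0.0017753748 = 0.019996 on shard 4. Since the sum also includes the field2 and field3 clauses (boosts 1000 and 500), whose per-shard idf values are not visible above, a shard where those terms are rarer gets a larger sum and hence a much smaller queryNorm; that is the likely source of the 5x difference.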
You can set search_type to dfs_query_then_fetch:
{
"search_type": "dfs_query_then_fetch",
"query": {
"bool": {
"must": [
{
"bool": {
"must": [],
"must_not": [],
"should": [
{
"multi_match": {
"query": "pds",
"fields": [
"field1"
],
"lenient": true,
"fuzziness": "0"
}
},
{
"multi_match": {
"query": "pds",
"fields": [
"field2"
],
"lenient": true,
"fuzziness": "0",
"boost": 1000.0
}
}
]
}
},
{
"multi_match": {
"query": "pds",
"fields": [
"field3"
],
"lenient": true,
"fuzziness": "0",
"boost": 500.0
}
},
{
"multi_match": {
"query": "pds",
"fields": [
"field4"
],
"lenient": true,
"fuzziness": "0",
"boost": 100.0
}
}
],
"must_not": [],
"should": [],
"filter": []
}
},
"size": 1000,
"min_score": 0.0
}
In this case all norm values will be global, though it may impact query performance. If your index is small, you can also create the index with a single shard. And if you had many more documents, these values wouldn't differ that much in the first place.
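One caveat worth checking for your version: search_type is normally passed as a URL parameter rather than in the request body, e.g. (my_index is a placeholder here):
GET /my_index/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "multi_match": {
      "query": "pds",
      "fields": [ "field1" ]
    }
  }
}
With dfs_query_then_fetch, Elasticsearch first collects term statistics from all shards and then scores documents using those global statistics, which removes the per-shard idf and queryNorm differences.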

Why is queryWeight included for some result scores, but not others, in the same query?

I'm executing a query_string query with one term on multiple fields, _all and tags.name, and trying to understand the scoring. Query: {"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}. Here are the documents returned by the query:
Document 1 has an exact match on tags.name, but not on _all.
Document 8 has an exact match on both tags.name and on _all.
Document 8 should win, and it does, but I'm confused by how the scoring works out. It seems like Document 1 is getting penalized by having its tags.name score multiplied by the IDF twice, whereas Document 8's tags.name score is only multiplied by the IDF once. In short:
They both have a component weight(tags.name:animal in 0) [PerFieldSimilarity].
In Document 1, we have weight = score = queryWeight x fieldWeight.
In Document 8, we have weight = fieldWeight!
Since queryWeight contains idf, this results in Document 1 getting penalized by its idf twice.
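Concretely, with the numbers from the explain outputs below:
Document 1: weight = queryWeight * fieldWeight = (0.30685282 * 1.0) * (1.0 * 0.30685282 * 0.625) = 0.30685282 * 0.19178301 = 0.058849156
Document 8: weight = fieldWeight = 1.0 * 0.30685282 * 0.5 = 0.15342641
so idf enters Document 1's score twice (once via queryWeight, once via fieldWeight) but Document 8's only once.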
Can anyone make sense of this?
Additional information
If I remove _all from the fields of the query, queryWeight is completely gone from the explain.
Adding "use_dis_max":true as an option has no effect.
However, additionally adding "tie_breaker":0.7 (or any value) does affect Document 8 by giving it the more-complicated formula we see in Document 1.
Thoughts: It's plausible that a boolean query (which this is) might do this on purpose to give more weight to queries that match more than one sub-query. However, this doesn't make any sense for a dis_max query, which is supposed to just return the maximum of the sub-queries.
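For concreteness, the variant with use_dis_max and tie_breaker described above looks like this (values are just for illustration):
{
  "query": {
    "query_string": {
      "query": "animal",
      "fields": ["_all", "tags.name"],
      "use_dis_max": true,
      "tie_breaker": 0.7
    }
  }
}
With a non-zero tie_breaker, the dis_max score becomes the best sub-query's score plus tie_breaker times each other matching sub-query's score, which is consistent with the more complicated formula appearing once tie_breaker is set.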
Here are the relevant explain requests. Look for embedded comments.
Document 1 (match only on tags.name):
curl -XGET 'http://localhost:9200/questions/question/1/_explain?pretty' -d '{"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}':
{
"ok" : true,
"_index" : "questions_1390104463",
"_type" : "question",
"_id" : "1",
"matched" : true,
"explanation" : {
"value" : 0.058849156,
"description" : "max of:",
"details" : [ {
"value" : 0.058849156,
"description" : "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
// weight = score = queryWeight x fieldWeight
"details" : [ {
// score and queryWeight are NOT a part of the other explain!
"value" : 0.058849156,
"description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [ {
"value" : 0.30685282,
"description" : "queryWeight, product of:",
"details" : [ {
// This idf is NOT a part of the other explain!
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 1.0,
"description" : "queryNorm"
} ]
}, {
"value" : 0.19178301,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.625,
"description" : "fieldNorm(doc=0)"
} ]
} ]
} ]
} ]
}
Document 8 (match on both _all and tags.name):
curl -XGET 'http://localhost:9200/questions/question/8/_explain?pretty' -d '{"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}':
{
"ok" : true,
"_index" : "questions_1390104463",
"_type" : "question",
"_id" : "8",
"matched" : true,
"explanation" : {
"value" : 0.15342641,
"description" : "max of:",
"details" : [ {
"value" : 0.033902764,
"description" : "btq, product of:",
"details" : [ {
"value" : 0.033902764,
"description" : "weight(_all:anim in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.033902764,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 0.70710677,
"description" : "tf(freq=0.5), with freq of:",
"details" : [ {
"value" : 0.5,
"description" : "phraseFreq=0.5"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.15625,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}, {
"value" : 1.0,
"description" : "allPayload(...)"
} ]
}, {
"value" : 0.15342641,
"description" : "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
// weight = fieldWeight
// No score or queryWeight in sight!
"details" : [ {
"value" : 0.15342641,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.5,
"description" : "fieldNorm(doc=0)"
} ]
} ]
} ]
}
}
I have no answer yet. I just want to mention that I posted the question to the Elasticsearch forum: https://groups.google.com/forum/#!topic/elasticsearch/xBKlFkq0SP0
I'll post an update here when I get an answer.
