Score_mode avg returns 1 for all documents - elasticsearch

I'm using function_score with score_mode avg and boost_mode replace.
According to the documentation, I expect the query score to be overridden by the function scores (because I'm using boost_mode replace).
This works as expected for sum and multiply, but not for avg (I'm aware that the average in function_score is a weighted average).
When I apply this function_score, all the documents get a score of 1.
How can this happen?
GET kibana_sample_data_ecommerce/_search
{
"_source": {
"includes": ["customer_last_name", "customer_first_name", "customer_gender"]
},
"size": 10,
"query": {
"function_score": {
"functions": [
{
"filter": { "match": { "customer_last_name": "Cook" } },
"weight": 2
},
{
"filter": { "match": { "customer_first_name": "Jackson" } },
"weight": 4
},
{
"filter": { "match": { "customer_gender" : "MALE"} },
"weight": 8
}
],
"score_mode": "avg",
"boost_mode": "replace"
}
}
}

So this is a bit weird, but the link provided by @jzzfs is already pretty close. The avg mode of the function score query computes a weighted average, which causes this effect:
In case score_mode is set to avg the individual scores will be combined by a weighted average. For example, if two functions return score 1 and 2 and their respective weights are 3 and 4, then their scores will be combined as (1*3+2*4)/(3+4) and not (1*3+2*4)/2.
In addition, it's important to note that because of this, functions whose filters don't match the current document have no effect on the average score, rather than reducing it. In your example, this means that if a document only matches by having a MALE customer, that function contributes 1 with weight 8, so the weighted average is (1*8)/8 = 1. If it's a MALE with the first name Jackson, the score will again be (1*8 + 1*4)/(8+4) = 1. This can easily be seen by using the explain API:
GET kibana_sample_data_ecommerce/_explain/ER5Bv3ABEiTwEf3FhKws
{
"query": {
"function_score": {
"functions": [
{
"filter": { "match": { "customer_last_name": "Cook" } },
"weight": 2
},
{
"filter": { "match": { "customer_first_name": "Jackson" } },
"weight": 4
},
{
"filter": { "match": { "customer_gender" : "MALE"} },
"weight": 8
}
],
"score_mode": "avg",
"boost_mode": "replace"
}
}
}
returns
{
"_index" : "kibana_sample_data_ecommerce",
"_type" : "_doc",
"_id" : "ER5Bv3ABEiTwEf3FhKws",
"matched" : true,
"explanation" : {
"value" : 1.0,
"description" : "min of:",
"details" : [
{
"value" : 1.0,
"description" : "function score, score mode [avg]",
"details" : [
{
"value" : 8.0,
"description" : "function score, product of:",
"details" : [
{
"value" : 1.0,
"description" : "match filter: customer_gender:MALE",
"details" : [ ]
},
{
"value" : 8.0,
"description" : "product of:",
"details" : [
{
"value" : 1.0,
"description" : "constant score 1.0 - no function provided",
"details" : [ ]
},
{
"value" : 8.0,
"description" : "weight",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 3.4028235E38,
"description" : "maxBoost",
"details" : [ ]
}
]
}
}
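The arithmetic behind that explain output can be checked by hand. The following is a minimal Python sketch, not Elasticsearch code: the helper name weighted_avg is made up for illustration, and it assumes that a filter-only function contributes a constant score of 1 when its filter matches, as the explain output suggests.

```python
def weighted_avg(matched):
    """Combine (score, weight) pairs as a weighted average (hypothetical helper)."""
    total_weight = sum(weight for _, weight in matched)
    return sum(score * weight for score, weight in matched) / total_weight

# Weights taken from the query; each matching filter-only function scores 1.
weights = {"Cook": 2, "Jackson": 4, "MALE": 8}

male_only = weighted_avg([(1, weights["MALE"])])
male_jackson = weighted_avg([(1, weights["MALE"]), (1, weights["Jackson"])])
all_three = weighted_avg([(1, w) for w in weights.values()])

# The example from the documentation quote: scores 1 and 2, weights 3 and 4.
docs_example = weighted_avg([(1, 3), (2, 4)])

print(male_only, male_jackson, all_three)
print(docs_example)
```

Whichever filters match, the numerator and the denominator are the same sum of weights, so the result is 1. The average only moves away from 1 when the functions themselves return different scores, as in the documentation example.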

It's already answered here. Since you used boost_mode: replace, only the function scores are used and the query score gets ignored.
Based on that, since each weight appears in both the numerator and the denominator of the weighted average, the weights "cancel each other out" and the result is 1.

Related

Elasticsearch match one array with another array

Let's say I have two indexes kids and outings_for_kids with the following data
kids
[
{
"name": "little kid 1",
"i_like":["drawing","teddybears"]
}
]
outings_for_kids
[
{
"name": "Teddybear drawing fights with apples!",
"for_kids_that_like":["apples","teddybears","drawing", "play outside games"]
},
{
"name": "drawing and teddies!",
"for_kids_that_like":["teddybears","drawing"]
}
]
I want to find outings that cover the same things little kid 1 likes, with a lower score if the outing lists more.
Little kid 1 should not match 100% with the first outing. It has what little kid 1 wants, but it has more (e.g. apples), so it should match 50%.
It should match 100% with the second outing.
This will be a two-step process:
1. Get the i_like values from the kids index.
2. Use the i_like values from step 1 to query the outings index:
   - Use a term query to match each value.
   - Use a script to compare the array size with the number of values searched.
   - Use constant_score to give a fixed score depending on the array size.
Query
GET outings/_search
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"for_kids_that_like": {
"value": "teddybears"
}
}
},
{
"term": {
"for_kids_that_like": {
"value": "drawing"
}
}
},
{
"script": {
"script": "doc['for_kids_that_like.keyword'].size()==2"
}
}
]
}
},
"boost": 100
}
},
{
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"for_kids_that_like": {
"value": "teddybears"
}
}
},
{
"term": {
"for_kids_that_like": {
"value": "drawing"
}
}
},
{
"script": {
"script": "doc['for_kids_that_like.keyword'].size()>2"
}
}
]
}
},
"boost": 50
}
}
]
}
}
}
Result:
"hits" : [
{
"_index" : "outings",
"_type" : "_doc",
"_id" : "IH7tVHEBbLcSRUWr6wPj",
"_score" : 100.0,
"_source" : {
"name" : "Teddybear drawing fights with apples!",
"for_kids_that_like" : [
"teddybears",
"drawing"
]
}
},
{
"_index" : "outings",
"_type" : "_doc",
"_id" : "IX7zVHEBbLcSRUWrhgM9",
"_score" : 50.0,
"_source" : {
"name" : "Teddybear drawing fights with apples!",
"for_kids_that_like" : [
"teddybears",
"drawing",
"apples"
]
}
}
]
If you just want to show exact-match documents on top, followed by partial matches, then you don't need constant_score (a must query with term searches will work). By default, exact matches are given a higher score.
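Step 2 above can be generated from the kid's i_like list instead of being written by hand. This is a minimal Python sketch that only builds the request body as a dict. The function name build_outings_query is hypothetical, the boosts 100 and 50 and the .keyword subfield are copied from the query above, and sending the body to Elasticsearch is left out.

```python
import json

def build_outings_query(likes, field="for_kids_that_like"):
    """Build the two-clause constant_score query for a list of likes (hypothetical helper)."""
    n = len(likes)

    def clause(size_check, boost):
        # One term filter per liked value, plus a script comparing the array size.
        must = [{"term": {field: {"value": value}}} for value in likes]
        must.append(
            {"script": {"script": f"doc['{field}.keyword'].size(){size_check}{n}"}}
        )
        return {"constant_score": {"filter": {"bool": {"must": must}}, "boost": boost}}

    # Exact-size matches score 100, outings with extra entries score 50.
    return {"query": {"bool": {"should": [clause("==", 100), clause(">", 50)]}}}

query = build_outings_query(["teddybears", "drawing"])
print(json.dumps(query, indent=2))
```

Building the body this way keeps the size in the script in sync with the number of values searched.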

Is it possible to set the TYPE parameter using simple query string query in Elastic Search

When using a query string query in ES and matching multiple fields, I can set a type parameter to configure how ES combines scores when matching on multiple fields.
e.g. I want to match two fields in my index, and combine scores from both fields
GET /_search
{
"query": {
"query_string" : {
"query" : "test",
"fields": ["titel", "content"],
"type": "most_fields"
}
}
}
The parameter seems to be missing in the simple query string query. What is the default mode for simple query string? How are scores chosen/combined? Is it possible to set type?
Simple query string doesn't have a type parameter. It sums the scores from each field.
Consider the index below, and let's see how different queries calculate the score using the explain API.
Mapping:
PUT testindex6
{
"mappings": {
"properties": {
"title":{
"type": "text"
},
"description":{
"type": "text"
}
}
}
}
Data:
POST testindex6/_doc
{
"title": "dog",
"description":"dog is brown"
}
1. Query_string best_fields(default)
Finds documents which match any field, but uses the _score from the best field
GET testindex6/_search?explain=true
{
"query": {
"query_string": {
"default_field": "*",
"query": "dog brown",
"type":"best_fields"
}
}
}
Result:
"_explanation" : {
"value" : 0.5753642,
"description" : "max of:",
"details" : [
{
"value" : 0.5753642,
"description" : "sum of:",
},
{
"value" : 0.2876821,
"description" : "sum of:",
}
]
}
best_fields takes the max score from the matched fields.
2. Query_string most_fields
Sums the scores from the matched fields
GET testindex6/_search?explain=true
{
"query": {
"query_string": {
"default_field": "*",
"query": "dog brown",
"type":"most_fields"
}
}
}
Result
"_explanation" : {
"value" : 0.8630463,
"description" : "sum of:",
"details" : [
{
"value" : 0.5753642,
"description" : "sum of:"
....
},
{
"value" : 0.2876821,
"description" : "sum of:"
....
}
]
}
}
3. Simple_Query_String
Query
GET testindex6/_search?explain=true
{
"query": {
"simple_query_string": {
"query": "dog brown",
"fields": ["*"]
}
}
}
Result:
"_explanation" : {
"value" : 0.8630463,
"description" : "sum of:",
"details" : [
{
"value" : 0.5753642,
"description" : "sum of:",
},
{
"value" : 0.2876821,
"description" : "sum of:"
}
]
}
}
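The two ways of combining per-field scores in the explain outputs above can be reproduced by hand. This is a small Python sketch using only the two values shown in those outputs. It illustrates "max of" versus "sum of" and is not how Elasticsearch computes the per-field scores themselves.

```python
import math

# Per-field scores copied from the explain outputs above.
field_scores = [0.5753642, 0.2876821]

best_fields = max(field_scores)  # "max of:" in the best_fields explanation
most_fields = sum(field_scores)  # "sum of:" in the most_fields explanation

print(best_fields)
print(round(most_fields, 7))

# The sum agrees with the 0.8630463 reported for most_fields and simple_query_string.
print(math.isclose(most_fields, 0.8630463, abs_tol=1e-7))
```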
So you can see the score is the same for most_fields and simple_query_string (both do a "sum of"). But there is a difference between them. Consider the index below.
I have created a field title with type text and a subfield shingles with a shingle analyzer.
PUT index_2
{
"settings": {
"analysis": {
"analyzer": {
"analyzer_shingle": {
"tokenizer": "standard",
"filter": [
"lowercase",
"shingle"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"shingles": {
"search_analyzer": "analyzer_shingle",
"analyzer": "analyzer_shingle",
"type": "text"
}
}
}
}
}
}
Data:
POST index_2/_doc
{
"title":"the brown fox"
}
1. Most_fields
Query:
GET index_2/_search?explain=true
{
"query": {
"query_string": {
"query": "brown fox",
"fields": ["*"],
"type":"most_fields"
}
}
}
Result:
"_explanation" : {
"value" : 1.3650365,
"description" : "sum of:",
"details" : [
{
"value" : 0.7896724,
"description" : "sum of:",
},
{
"value" : 0.5753642,
"description" : "sum of:",
}
]
}
2. Simple_Query_string
Query
GET index_2/_search?explain=true
{
"query": {
"simple_query_string": {
"query": "brown fox",
"fields": ["*"]
}
}
}
Result:
"_explanation" : {
"value" : 1.2632996,
"description" : "sum of:",
"details" : [
{
"value" : 0.6316498,
"description" : "sum of:",
},
{
"value" : 0.6316498,
"description" : "sum of:"
}
]
}
}
As you can see, the score is different for most_fields and simple_query_string even though both sum the scores.
The reason is that most_fields uses each field's own analyzer while querying (remember that title (standard) and title.shingles (analyzer_shingle) have different analyzers), while simple_query_string uses the default analyzer of the index (standard) for all fields.
If we query with most_fields and force it to use the standard analyzer, you will see that the score is the same.
Query:
GET index_2/_search?explain=true
{
"query": {
"query_string": {
"query": "brown fox",
"fields": ["*"],
"type":"most_fields",
"analyzer": "standard"
}
}
}
Result:
"_explanation" : {
"value" : 1.2632996,
"description" : "sum of:",
"details" : [
{
"value" : 0.6879354,
"description" : "sum of:"
},
{
"value" : 0.5753642,
"description" : "sum of:"
}
]
}
I think simple_query_string is for simple scenarios; if you are using different analyzers for different fields, use query_string or bool/match queries.

Elasticsearch is not returning a document I expect in the search results

I have a collection of customers that have a first name, last name, email, description and owner id. I want to take a character string from the app and search on all the fields, with a priority order. I'm using boost to achieve that.
Currently I have a lot of test customers with the name Sean in various fields within the documents. I have 2 documents that contain an email with sean.jones@email.com. One document contains the same email in the description.
When I perform the following search, I'm missing the document in the search results that does not contain the email in the description.
Here is my query:
{
"query" : {
"bool" : {
"filter" : {
"match" : {
"ownerId" : "acct_123"
}
},
"must" : [
{
"bool" : {
"should" : [
{
"prefix" : {
"firstName" : {
"value" : "sean",
"boost" : 3
}
}
},
{
"prefix" : {
"lastName" : {
"value" : "sean",
"boost" : 3
}
}
},
{
"terms" : {
"boost" : 2,
"description" : [
"sean"
]
}
},
{
"prefix" : {
"email" : {
"value" : "sean",
"boost" : 1
}
}
}
]
}
}
]
}
}
}
Here is the document that I'm missing:
{
"_index" : "xxx",
"_id" : "cus_123",
"_version" : 1,
"_type" : "customers",
"_seq_no" : 9096,
"_primary_term" : 1,
"found" : true,
"_source" : {
"firstName" : null,
"id" : "cus_123",
"lastName" : null,
"email" : "sean.jones@email.com",
"ownerId" : "acct_123",
"description" : null
}
}
When I look at the current results, all of the documents have a score of 3.0. They have "Sean" in the name as well, so they score higher. When I do an _explain on the document I'm missing, with the query above, I get the following:
{
"_index": "xxx",
"_type": "customers",
"_id": "cus_123",
"matched": true,
"explanation": {
"value": 1.0,
"description": "sum of:",
"details": [
{
"value": 1.0,
"description": "sum of:",
"details": [
{
"value": 1.0,
"description": "ConstantScore(email._index_prefix:sean)",
"details": []
}
]
},
{
"value": 0.0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0.0,
"description": "# clause",
"details": []
},
{
"value": 1.0,
"description": "ownerId:acct_123",
"details": []
}
]
}
]
}
}
Here are my mappings:
{
"properties": {
"firstName": {
"type": "text",
"index_prefixes": {
"max_chars": 10,
"min_chars": 1
}
},
"email": {
"analyzer": "my_email_analyzer",
"type": "text",
"index_prefixes": {
"max_chars": 10,
"min_chars": 1
}
},
"lastName": {
"type": "text",
"index_prefixes": {
"max_chars": 10,
"min_chars": 1
}
},
"description": {
"type": "text"
},
"ownerId": {
"type": "text"
}
}
}
"my_email_analyzer": {
"type": "custom",
"tokenizer": "uax_url_email"
}
If I'm understanding this correctly, because this document is only scoring a 1, it's not meeting a particular threshold. I've tried adjusting the min_score but I had no luck. Any thoughts on how I can get this document to be included in the search results?
Thanks so much
It depends on what you mean by "missing":
1. Is it that the document does not make it into the number of hits (the "total")?
2. Or is it that the document itself does not show up as a hit in the hits list?
If it's #2, you may want to increase the number of documents Elasticsearch fetches and returns by adding a size parameter to your search request (the default size is 10):
Example
"size": 50
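The effect of the default size can be pictured with a small simulation. This Python sketch does not talk to Elasticsearch: the function top_hits and the twelve other_N documents are made up for illustration, and only the scores 3.0 and 1.0 and the id cus_123 come from the question.

```python
def top_hits(scored_docs, size=10):
    """Return the highest-scoring documents, truncated to size (hypothetical helper)."""
    return sorted(scored_docs, key=lambda doc: doc["score"], reverse=True)[:size]

# Hypothetical data: twelve customers matching on the name (score 3.0)...
docs = [{"id": f"other_{i}", "score": 3.0} for i in range(12)]
# ...and the document from the question, matching only on the email prefix.
docs.append({"id": "cus_123", "score": 1.0})

default_page = [doc["id"] for doc in top_hits(docs)]
larger_page = [doc["id"] for doc in top_hits(docs, size=50)]

print("cus_123" in default_page)
print("cus_123" in larger_page)
```

The document still matches and is counted in the total. It only falls off the first page because more than ten other documents score higher.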

Elasticsearch function score query stuck at zero score

I have a query which I've simplified down to this:
GET /foos-33/_search
{
"from" : 0,
"size" : 25,
"query" : {
"function_score" : {
"query" : {
"bool" : {
"filter" : {
"bool" : {
"must" : [ {
"bool" : {
"must_not" : {
"terms" : {
"foo.id" : [ ]
}
}
}
} ]
}
}
}
},
"functions" : [ {
"field_value_factor" : {
"field" : "foo.strategicBoost",
"missing" : 1.0
}
} ],
"score_mode" : "sum"
}
},
"explain" : true,
"sort" : [ {
"counts.barsPerDay" : {
"order" : "desc"
}
} ]
}
The scores of the hits are always zero. The explain output sort of shows why this is happening, but I don't completely understand what's going on:
"_explanation": {
"value": 0,
"description": "function score, product of:",
"details": [
{
"value": 0,
"description": "ConstantScore(-() +*:*), product of:",
"details": [
{
"value": 0,
"description": "boost",
"details": []
},
{
"value": 1,
"description": "queryNorm",
"details": []
}
]
},
{
"value": 10,
"description": "min of:",
"details": [
{
"value": 10,
"description": "field value function: none(doc['foo.strategicBoost'].value?:1.0 * factor=1.0)",
"details": []
},
{
"value": 3.4028235e+38,
"description": "maxBoost",
"details": []
}
]
}
]
}
},
I tried to wrap it in a constant_score to change the constant score from 0 to 1, like this:
GET /foos-33/_search
{
"from" : 0,
"size" : 25,
"query" : {
"function_score" : {
"query" : {
"bool" : {
"constant_score": {
"boost": 1,
"filter" : {
"bool" : {
"must" : [ {
"bool" : {
"must_not" : {
"terms" : {
"foo.id" : [ ]
}
}
}
} ]
}
}
}
}
},
"functions" : [ {
"field_value_factor" : {
"field" : "foo.strategicBoost",
"missing" : 1.0
}
} ],
"score_mode" : "sum"
}
},
"explain" : true,
"sort" : [ {
"counts.barsPerDay" : {
"order" : "desc"
}
} ]
}
but that gave me an error message:
"failed_shards": [
{
"shard": 0,
"index": "foos-33",
"node": "A9s2Ui3mQE2SBZhY2VkZGw",
"reason": {
"type": "query_parsing_exception",
"reason": "[bool] query does not support [constant_score]",
"index": "foos-33",
"line": 8,
"col": 29
}
}
]
There is another way I could try to solve this problem - I could try to change the product to a sum or something - but I can't figure out where the product is coming from.
The top-level "product of" comes from the boost_mode, which defaults to multiply: the query score (0) is multiplied by the function score (10), which gives 0. Setting boost_mode to replace is the right fix in this case - the query score is always zero, so we don't care about it. Setting boost_mode to sum would be an equally valid fix in this case, too.
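To make the difference between the modes concrete, here is a minimal Python sketch of how boost_mode combines the query score with the function score. The function combine is a made-up illustration, not Elasticsearch code. The values 0 and 10 are taken from the explain output in the question.

```python
def combine(query_score, function_score, boost_mode="multiply"):
    """Combine the two scores according to boost_mode (illustrative sketch)."""
    modes = {
        "multiply": lambda q, f: q * f,  # the default
        "replace": lambda q, f: f,       # ignore the query score
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    return modes[boost_mode](query_score, function_score)

query_score = 0.0      # ConstantScore with boost 0 from the filter-only bool query
function_score = 10.0  # field_value_factor on foo.strategicBoost

for mode in ("multiply", "replace", "sum"):
    print(mode, combine(query_score, function_score, mode))
```

With a query score of zero, multiply can never produce anything other than zero, while replace and sum both pass the function score through unchanged.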

When using gauss decay score function, it always scores 1 on nested elements

For documents like
{
"_id" : "abc123",
"_score" : 3.7613528,
"_source" : {
"id" : "abc123",
"pricePeriods" : [{
"periodTo" : "2016-01-02",
"eur" : 1036,
"gbp" : 782,
"dkk" : 6880,
"sek" : 9025,
"periodFrom" : "2015-12-26",
"nok" : 8065
}, {
"periodTo" : "2016-06-18",
"eur" : 671,
"gbp" : 457,
"dkk" : 4625,
"sek" : 5725,
"periodFrom" : "2016-01-02",
"nok" : 5430
} ]
}
}
I would like to have a gauss decay function score on the prices.
I have tried like this
"query" : {
"function_score" : {
"functions" : [{
"gauss" : {
"pricePeriods.dkk" : {
"origin" : "2500",
"scale" : "2500",
"decay" : 0.8
}
},
"filter" : {
"nested" : {
"filter" : {
"range" : {
"pricePeriods.periodTo" : {
"gte" : "2016-03-17T00:00:00.000"
}
}
},
"path" : "pricePeriods"
}
}
}
]
and it seems that the filter finds the prices I want to make a gauss on, but the resulting score is always 1.
Explain says
{ "value": 1,
"description": "min of:",
"details": [
{
"value": 1,
"description": "function score, score mode [multiply]",
"details": [
{
"value": 1,
"description": "function score, product of:",
"details": [
{
"value": 1,
"description": "match filter: ToParentBlockJoinQuery (+ConstantScore(pricePeriods.periodTo:[[32 30 31 36 2d 30 33 2d 31 37 54 30 30 3a 30 30 3a 30 30 2e 30 30 30] TO *]) #QueryWrapperFilter(_type:__pricePeriods))",
"details": []
},
{
"value": 1,
"description": "Function for field pricePeriods.dkk:",
"details": [
{
"value": 1,
"description": "exp(-0.5*pow(MIN[0.0],2.0)/1.4004437867889222E7)",
"details": []
}
]
}
]
}
]
}
I can see here that gauss apparently returns 1 when it can't find the field.
But the question is why it can't find the field in nested docs and how to fix that.
The reason the gauss function returns 1 is, as you said, that it can't find the field because it is nested. You basically need to wrap your whole function_score query in a nested query:
{
"query": {
"nested": {
"path": "pricePeriods",
"query": {
"function_score": {
"functions": [
{
"gauss": {
"pricePeriods.dkk": {
"origin": "2500",
"scale": "2500",
"decay": 0.8
}
},
"filter": {
"range": {
"pricePeriods.periodTo": {
"gte": "2016-03-17T00:00:00.000"
}
}
}
}
]
}
}
}
}
}
Does this help?
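For reference, the expression in the explain output from the question, exp(-0.5*pow(distance,2.0)/sigma²), can be evaluated by hand. This is a minimal Python sketch, assuming sigma² is derived from scale and decay as -scale² / (2·ln(decay)), which reproduces the 1.4004437867889222E7 shown in that output. The function names are made up for illustration.

```python
import math

def gauss_sigma_sq(scale, decay):
    """Variance term assumed here: -scale^2 / (2 * ln(decay))."""
    return -scale ** 2 / (2.0 * math.log(decay))

def gauss_decay(value, origin, scale, decay, offset=0.0):
    """Gaussian decay score for one field value (illustrative sketch)."""
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-0.5 * distance ** 2 / gauss_sigma_sq(scale, decay))

# Parameters from the question: origin 2500, scale 2500, decay 0.8.
print(gauss_sigma_sq(2500, 0.8))

# A distance of 0, as in the explain output (MIN[0.0]), yields a score of 1.
print(gauss_decay(2500, 2500, 2500, 0.8))

# A real nested value, e.g. dkk = 4625, gives a score below 1.
print(gauss_decay(4625, 2500, 2500, 0.8))
```

A score of exactly 1 therefore means the function saw a distance of 0, which fits the explanation that the nested field value was not visible to it.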
