When using the gauss decay score function, it always scores 1 on nested elements - elasticsearch

For documents like
{
"_id" : "abc123",
"_score" : 3.7613528,
"_source" : {
"id" : "abc123",
"pricePeriods" : [{
"periodTo" : "2016-01-02",
"eur" : 1036,
"gbp" : 782,
"dkk" : 6880,
"sek" : 9025,
"periodFrom" : "2015-12-26",
"nok" : 8065
}, {
"periodTo" : "2016-06-18",
"eur" : 671,
"gbp" : 457,
"dkk" : 4625,
"sek" : 5725,
"periodFrom" : "2016-01-02",
"nok" : 5430
} ]
}
}
I would like to apply a gauss decay function score to the prices.
I have tried this:
"query" : {
"function_score" : {
"functions" : [{
"gauss" : {
"pricePeriods.dkk" : {
"origin" : "2500",
"scale" : "2500",
"decay" : 0.8
}
},
"filter" : {
"nested" : {
"filter" : {
"range" : {
"pricePeriods.periodTo" : {
"gte" : "2016-03-17T00:00:00.000"
}
}
},
"path" : "pricePeriods"
}
}
}
]
It seems that the filter finds the prices I want to apply the gauss function to, but the resulting score is always 1.
Explain says:
{ "value": 1,
"description": "min of:",
"details": [
{
"value": 1,
"description": "function score, score mode [multiply]",
"details": [
{
"value": 1,
"description": "function score, product of:",
"details": [
{
"value": 1,
"description": "match filter: ToParentBlockJoinQuery (+ConstantScore(pricePeriods.periodTo:[[32 30 31 36 2d 30 33 2d 31 37 54 30 30 3a 30 30 3a 30 30 2e 30 30 30] TO *]) #QueryWrapperFilter(_type:__pricePeriods))",
"details": []
},
{
"value": 1,
"description": "Function for field pricePeriods.dkk:",
"details": [
{
"value": 1,
"description": "exp(-0.5*pow(MIN[0.0],2.0)/1.4004437867889222E7)",
"details": []
}
]
}
]
}
]
}
I can see here that gauss apparently returns 1 when it can't find the field.
But the question is why it can't find the field in nested docs, and how to fix that.

The gauss function returns 1 because, as you said, it can't find the field: the field is nested. You basically need to wrap your whole function_score query in a nested query:
{
"query": {
"nested": {
"path": "pricePeriods",
"query": {
"function_score": {
"functions": [
{
"gauss": {
"pricePeriods.dkk": {
"origin": "2500",
"scale": "2500",
"decay": 0.8
}
},
"filter": {
"range": {
"pricePeriods.periodTo": {
"gte": "2016-03-17T00:00:00.000"
}
}
}
}
]
}
}
}
}
}
Does this help?
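To make the explain output above easier to read, here is a minimal Python sketch of the Gaussian decay formula as documented for function_score. The function name gauss_decay and the offset default are illustrative assumptions, not Elasticsearch code. It reproduces the denominator 1.4004437867889222E7 from the explain output and shows why a distance of 0 (the MIN[0.0] term) always yields a score of 1.

```python
import math

def gauss_decay(value, origin, scale, decay, offset=0.0):
    """Gaussian decay: 1.0 at the origin, `decay` at a distance of `scale`."""
    # sigma^2 is derived so that the score equals `decay` at distance `scale`.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-0.5 * distance ** 2 / sigma_sq)

# Parameters from the question: origin 2500, scale 2500, decay 0.8.
sigma_sq = -(2500.0 ** 2) / (2.0 * math.log(0.8))
print(sigma_sq)  # roughly 1.4004e7, the denominator in the explain output

# A distance of 0 gives exactly 1.0, which is what the explain output shows
# when the function cannot see the nested field.
print(gauss_decay(2500, 2500, 2500, 0.8))

# Prices from the sample document, once the field is actually visible.
for dkk in (6880, 4625):
    print(dkk, gauss_decay(dkk, 2500, 2500, 0.8))
```

Once the function_score runs inside the nested query, the distance is computed from the real pricePeriods.dkk value of each nested document, so the score drops below 1 as the price moves away from the origin.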

Related

Elasticsearch Sorting on Multiple Nested Fields with Nested Filtering

Given the following data with nested objects (members within teams), I need to sort objects on multiple nested fields, first by height, then by weight. This all needs to respect the filtering that is done on other nested fields (just position in this example).
Data
PUT sample
{
"mappings": {
"dynamic": "strict",
"properties": {
"teamId": { "type": "keyword", "index": true, "doc_values": true },
"members": {
"type": "nested",
"properties": {
"memberId": { "type": "keyword", "index": true, "doc_values": true },
"position": { "type": "keyword", "index": true, "doc_values": true},
"height": { "type": "integer", "index": true, "doc_values": true},
"weight": { "type": "integer", "index": true, "doc_values": true}
}
}
}
}
}
PUT sample/_doc/1
{
"teamId" : "A"
, "members" :
[
{ "memberId" : "A1_X" , "position": "X", "height": 70, "weight": 195}
, { "memberId" : "A2_Y" , "position": "Y", "height": 70, "weight": 170}
, { "memberId" : "A3_Z" , "position": "Z", "height": 75, "weight": 210}
]
}
PUT sample/_doc/2
{
"teamId" : "B"
, "members" :
[
{ "memberId" : "B1_Z" , "position": "Z", "height": 80, "weight": 220 }
, { "memberId" : "B2_X" , "position": "X", "height": 75, "weight": 190 }
, { "memberId" : "B3_X" , "position": "X", "height": 70, "weight": 200 }
, { "memberId" : "B4_Y" , "position": "Y", "height": 70, "weight": 170 }
]
}
PUT sample/_doc/3
{
"teamId" : "C"
, "members" :
[
{ "memberId" : "C1_Y" , "position": "Y", "height": 70, "weight": 190 }
, { "memberId" : "C2_X" , "position": "X", "height": 75, "weight": 180 }
, { "memberId" : "C3_Z" , "position": "Z", "height": 75, "weight": 225 }
]
}
Query
POST sample/_search?filter_path=hits.hits.inner_hits.members.hits.hits._source.height,hits.hits.inner_hits.members.hits.hits._source.weight,hits.hits.sort,hits.hits._source.teamId
{
"size" : 3
, "track_total_hits" : true
, "query" : { "bool" : { "filter" : [
{ "match_all" : { } }
, { "nested": { "path": "members" , "query": { "bool": { "must": [
//nested filters
{ "term" : { "members.position" : "X" } }
] } }
, "inner_hits" : { "size" : 3 }
} }
]}}
, "sort": [
{ "members.height": { "order": "asc" , "nested": { "path": "members", "filter": { "bool": { "must": [
//copy all nested filters below
{ "term" : { "members.position" : "X" } }
] } } } } }
, { "members.weight": { "order": "asc" , "nested": { "path": "members", "filter": { "bool": { "must": [
//copy all nested filters below
{ "term" : { "members.position" : "X" } }
] } } } } }
, { "teamId": { "order": "asc" } }
]
}
Results
{
"hits" : {
"hits" : [
{
"_source" : {
"teamId" : "B"
},
"sort" : [
70,
190,
"B"
],
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"weight" : 190,
"height" : 75
}
},
{
"_source" : {
"weight" : 200,
"height" : 70
}
}
]
}
}
}
},
{
"_source" : {
"teamId" : "A"
},
"sort" : [
70,
195,
"A"
],
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"weight" : 195,
"height" : 70
}
}
]
}
}
}
},
{
"_source" : {
"teamId" : "C"
},
"sort" : [
75,
180,
"C"
],
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"weight" : 180,
"height" : 75
}
}
]
}
}
}
}
]
}
}
So in the query above, I'm trying to:
1. Find teams that have members with position X.
2. Sort the teams based on height then weight (both ascending) of such members, where the height and weight used must be from the same members.
The query is close, but has 2 problems I'd like to fix:
1. It requires that I copy all the nested filters for each sorting field. Is there a syntax that won't require me to do this?
2. The height it uses to sort by may not be from the same member as the weight that it uses to sort by. In the results above, you can see that it puts team B first because it uses the height of 70 from B3_X and the weight of 190 from B2_X, which is not what I want. I want it to use the height of 70 from B3_X and the weight of 200 from the same member.
(Note that a hack I want to avoid is concatenating the 2 sorting fields into a single string or integer. Although this would solve the problem as stated, it creates other problems in the real world scenario I'm basing this example on.)
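The difference between the two behaviours can be illustrated outside Elasticsearch. The Python sketch below is only a client-side model of the sample data, not an Elasticsearch feature; the helper names independent_key and same_member_key are hypothetical. It assumes each nested sort clause picks its minimum independently, which matches the sort values in the results above.

```python
# (position, height, weight) per member, copied from the sample documents.
teams = {
    "A": [("X", 70, 195), ("Y", 70, 170), ("Z", 75, 210)],
    "B": [("Z", 80, 220), ("X", 75, 190), ("X", 70, 200), ("Y", 70, 170)],
    "C": [("Y", 70, 190), ("X", 75, 180), ("Z", 75, 225)],
}

def independent_key(members):
    """Model of the current query: min height and min weight chosen separately."""
    xs = [m for m in members if m[0] == "X"]
    return (min(h for _, h, _ in xs), min(w for _, _, w in xs))

def same_member_key(members):
    """Desired behaviour: the smallest (height, weight) pair of a single member."""
    return min((h, w) for p, h, w in members if p == "X")

independent_order = sorted(teams, key=lambda t: (independent_key(teams[t]), t))
same_member_order = sorted(teams, key=lambda t: (same_member_key(teams[t]), t))

print(independent_order)  # team B sorts first with the mixed key (70, 190)
print(same_member_order)  # team A sorts first; B uses (70, 200) from B3_X
```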

ElasticSearch: Query to find max of count of objects based on field value

For the example document below, I want to find the maximum count of actions per component name across all documents in the index. Could you please help me find a way to do this?
Expected result assuming only one document present in the Index:
comp1 -> action1 -> max 2 times
comp1 -> action2 -> max 1 time
comp2 -> action2 -> max 1 time
comp2 -> action3 -> max 1 time
Sample Document:
{
"id": "AC103902:A13A_AC140008:01BB_5FA2E8FA_1C08:0007",
"tokens": [
{
"name": "comp1",
"items": [
{
"action": "action1",
"attr": "value"
},
{
"action": "action1",
"attr": "value"
},
{
"action": "action2",
"attr": "value"
}
]
},
{
"name": "comp2",
"items": [
{
"action": "action2",
"attr": "value"
},
{
"action": "action3",
"attr": "value"
}
]
}
]
}
ElasticSearch Version: 7.9
I can loop through each document and calculate this on the client side, but I am curious to know whether there is already an ES query that can produce this kind of summary from the documents in the index.
You'll need to define both the tokens array and the tokens.items array as nested in order to get the correct stats.
Then, assuming your mapping looks something along the lines of
{
"mappings": {
"properties": {
"tokens": {
"type": "nested",
"properties": {
"items": {
"type": "nested"
}
}
}
}
}
}
the following query can be executed:
GET index_name/_search
{
"size": 0,
"aggs": {
"by_token_name": {
"nested": {
"path": "tokens"
},
"aggs": {
"token_name": {
"terms": {
"field": "tokens.name.keyword"
},
"aggs": {
"by_max_actions": {
"nested": {
"path": "tokens.items"
},
"aggs": {
"max_actions": {
"terms": {
"field": "tokens.items.action.keyword"
}
}
}
}
}
}
}
}
}
}
yielding these buckets:
[
{
"key" : "comp1", <--
"doc_count" : 1,
"by_max_actions" : {
"doc_count" : 3,
"max_actions" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "action1", <--
"doc_count" : 2
},
{
"key" : "action2", <--
"doc_count" : 1
}
]
}
}
},
{
"key" : "comp2", <--
"doc_count" : 1,
"by_max_actions" : {
"doc_count" : 2,
"max_actions" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "action2", <--
"doc_count" : 1
},
{
"key" : "action3", <--
"doc_count" : 1
}
]
}
}
}
]
which can be easily post-processed at client side.
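As a rough illustration of that post-processing, the Python sketch below flattens the buckets shown above into (component, action, count) rows. The function name flatten_buckets is hypothetical, and the input is trimmed to the keys the sketch reads. Note that these counts are totals over all matching documents; they equal the per-document maximum only in the single-document case used in the question.

```python
# Buckets copied from the response above, reduced to the fields used here.
buckets = [
    {
        "key": "comp1",
        "by_max_actions": {"max_actions": {"buckets": [
            {"key": "action1", "doc_count": 2},
            {"key": "action2", "doc_count": 1},
        ]}},
    },
    {
        "key": "comp2",
        "by_max_actions": {"max_actions": {"buckets": [
            {"key": "action2", "doc_count": 1},
            {"key": "action3", "doc_count": 1},
        ]}},
    },
]

def flatten_buckets(token_buckets):
    """Turn the nested terms buckets into flat (component, action, count) rows."""
    rows = []
    for token in token_buckets:
        for action in token["by_max_actions"]["max_actions"]["buckets"]:
            rows.append((token["key"], action["key"], action["doc_count"]))
    return rows

rows = flatten_buckets(buckets)
for component, action, count in rows:
    print(f"{component} -> {action} -> {count}")
```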

Score_mode avg returns 1 for all documents

I'm using function score with score_mode avg and boost_mode replace.
According to the documentation, I'm expecting the query score to be overridden by the function score filters (because I'm using boost_mode replace).
This works as expected for sum and multiply, but not for avg (I'm aware that the average in function score is a weighted average).
When I apply this function_score all the documents get a score of 1.
How can this happen?
GET kibana_sample_data_ecommerce/_search
{
"_source": {
"includes": ["customer_last_name", "customer_first_name", "customer_gender"]
},
"size": 10,
"query": {
"function_score": {
"functions": [
{
"filter": { "match": { "customer_last_name": "Cook" } },
"weight": 2
},
{
"filter": { "match": { "customer_first_name": "Jackson" } },
"weight": 4
},
{
"filter": { "match": { "customer_gender" : "MALE"} },
"weight": 8
}
],
"score_mode": "avg",
"boost_mode": "replace"
}
}
}
So this is a bit weird, but the link provided by @jzzfs is already pretty close. The average mode of the function score query provides a weighted average, which causes this effect:
In case score_mode is set to avg the individual scores will be combined by a weighted average. For example, if two functions return score 1 and 2 and their respective weights are 3 and 4, then their scores will be combined as (1*3+2*4)/(3+4) and not (1*3+2*4)/2.
In addition, it's important to note that, because of this, functions whose filters don't match the current document have no effect on the average score, rather than reducing it. In your example, this means that if a document only matches by having a MALE customer, its single matching function scores 8, but since the average is weighted the final score is (1*8)/8 = 1. If it's a MALE with the first name Jackson, the score is again (1*8 + 1*4)/(8+4) = 1. This can easily be seen by using the explain API:
GET kibana_sample_data_ecommerce/_explain/ER5Bv3ABEiTwEf3FhKws
{
"query": {
"function_score": {
"functions": [
{
"filter": { "match": { "customer_last_name": "Cook" } },
"weight": 2
},
{
"filter": { "match": { "customer_first_name": "Jackson" } },
"weight": 4
},
{
"filter": { "match": { "customer_gender" : "MALE"} },
"weight": 8
}
],
"score_mode": "avg",
"boost_mode": "replace"
}
}
}
returns
{
"_index" : "kibana_sample_data_ecommerce",
"_type" : "_doc",
"_id" : "ER5Bv3ABEiTwEf3FhKws",
"matched" : true,
"explanation" : {
"value" : 1.0,
"description" : "min of:",
"details" : [
{
"value" : 1.0,
"description" : "function score, score mode [avg]",
"details" : [
{
"value" : 8.0,
"description" : "function score, product of:",
"details" : [
{
"value" : 1.0,
"description" : "match filter: customer_gender:MALE",
"details" : [ ]
},
{
"value" : 8.0,
"description" : "product of:",
"details" : [
{
"value" : 1.0,
"description" : "constant score 1.0 - no function provided",
"details" : [ ]
},
{
"value" : 8.0,
"description" : "weight",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 3.4028235E38,
"description" : "maxBoost",
"details" : [ ]
}
]
}
}
It's already answered here. Since you used boost_mode:replace, only the function scores are used & the query score gets ignored.
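Based on that, since every function contributes a constant score of 1, each weight appears in both the numerator and the denominator of the weighted average, so the weights "cancel each other out" to result in 1.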
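The arithmetic behind score_mode avg can be checked with a few lines of Python. This is a sketch of the weighted average described in the quoted documentation, not Elasticsearch code; the function name weighted_avg is an assumption made for illustration.

```python
def weighted_avg(matched):
    """matched: list of (score, weight) pairs for functions whose filter matched."""
    total_weight = sum(weight for _, weight in matched)
    return sum(score * weight for score, weight in matched) / total_weight

# Example from the documentation quote: (1*3 + 2*4) / (3 + 4)
doc_example = weighted_avg([(1, 3), (2, 4)])

# Filter-only functions score a constant 1, so any set of weights averages to 1.
male_only = weighted_avg([(1, 8)])
male_jackson = weighted_avg([(1, 8), (1, 4)])
all_three = weighted_avg([(1, 2), (1, 4), (1, 8)])

print(doc_example, male_only, male_jackson, all_three)
```

Because each filter-only function scores 1, the weights only shift the balance between functions and can never move the average away from 1. With sum or multiply the weights are not divided out, which is why those modes behave as expected.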

Elasticsearch is not returning a document I expect in the search results

I have a collection of customers that have a first name, last name, email, description and owner id. I want to take a character string from the app and search on all the fields, with a priority order. I'm using boost to achieve that.
Currently I have a lot of test customers with the name Sean in various fields within the documents. I have 2 documents that contain an email with sean.jones@email.com. One document contains the same email in the description.
When I perform the following search, I'm missing the document in the search results that does not contain the email in the description.
Here is my query:
{
"query" : {
"bool" : {
"filter" : {
"match" : {
"ownerId" : "acct_123"
}
},
"must" : [
{
"bool" : {
"should" : [
{
"prefix" : {
"firstName" : {
"value" : "sean",
"boost" : 3
}
}
},
{
"prefix" : {
"lastName" : {
"value" : "sean",
"boost" : 3
}
}
},
{
"terms" : {
"boost" : 2,
"description" : [
"sean"
]
}
},
{
"prefix" : {
"email" : {
"value" : "sean",
"boost" : 1
}
}
}
]
}
}
]
}
}
}
Here is the document that I'm missing:
{
"_index" : "xxx",
"_id" : "cus_123",
"_version" : 1,
"_type" : "customers",
"_seq_no" : 9096,
"_primary_term" : 1,
"found" : true,
"_source" : {
"firstName" : null,
"id" : "cus_123",
"lastName" : null,
"email" : "sean.jones@email.com",
"ownerId" : "acct_123",
"description" : null
}
}
When I look at the current results, all of the documents have a score of 3.0. They have "Sean" in the name as well, so they score higher. When I do an _explain on the document I'm missing, with the query above, I get the following:
{
"_index": "xxx",
"_type": "customers",
"_id": "cus_123",
"matched": true,
"explanation": {
"value": 1.0,
"description": "sum of:",
"details": [
{
"value": 1.0,
"description": "sum of:",
"details": [
{
"value": 1.0,
"description": "ConstantScore(email._index_prefix:sean)",
"details": []
}
]
},
{
"value": 0.0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0.0,
"description": "# clause",
"details": []
},
{
"value": 1.0,
"description": "ownerId:acct_123",
"details": []
}
]
}
]
}
}
Here are my mappings:
{
"properties": {
"firstName": {
"type": "text",
"index_prefixes": {
"max_chars": 10,
"min_chars": 1
}
},
"email": {
"analyzer": "my_email_analyzer",
"type": "text",
"index_prefixes": {
"max_chars": 10,
"min_chars": 1
}
},
"lastName": {
"type": "text",
"index_prefixes": {
"max_chars": 10,
"min_chars": 1
}
},
"description": {
"type": "text"
},
"ownerId": {
"type": "text"
}
}
}
"my_email_analyzer": {
"type": "custom",
"tokenizer": "uax_url_email"
}
If I'm understanding this correctly, because this document only scores 1, it's not meeting a particular threshold. I've tried adjusting min_score but had no luck. Any thoughts on how I can get this document to be included in the search results?
thanks so much
It depends on what you mean by "missing":
1. Is it that the document does not count towards the number of hits (the "total")?
2. Or is it that the document itself does not show up as a hit in the hits list?
If it's #2, you may want to increase the number of documents Elasticsearch fetches and returns by adding a size clause to your search request (the default size is 10):
Example
"size": 50
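A small Python sketch of why case #2 happens: the document matches (the explain output says "matched": true), but it ranks below the default cut-off. The document ids other than cus_123 and the count of twelve higher-scoring documents are made up for illustration; the question only says that the visible hits all score 3.0.

```python
def top_hits(scores, size=10):
    """Return the ids of the `size` best-scoring documents, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:size]]

# Hypothetical: twelve customers named Sean scoring 3.0, plus the email-only match.
scores = {f"cus_sean_{i}": 3.0 for i in range(12)}
scores["cus_123"] = 1.0

print("cus_123" in top_hits(scores))           # default size of 10
print("cus_123" in top_hits(scores, size=50))  # larger page
```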

Elasticsearch function score query stuck at zero score

I have a query which I've simplified down to this:
GET /foos-33/_search
{
"from" : 0,
"size" : 25,
"query" : {
"function_score" : {
"query" : {
"bool" : {
"filter" : {
"bool" : {
"must" : [ {
"bool" : {
"must_not" : {
"terms" : {
"foo.id" : [ ]
}
}
}
} ]
}
}
}
},
"functions" : [ {
"field_value_factor" : {
"field" : "foo.strategicBoost",
"missing" : 1.0
}
} ],
"score_mode" : "sum"
}
},
"explain" : true,
"sort" : [ {
"counts.barsPerDay" : {
"order" : "desc"
}
} ]
}
The scores of the hits are always zero. The explain output sort of shows why this is happening, but I don't completely understand what's going on:
"_explanation": {
"value": 0,
"description": "function score, product of:",
"details": [
{
"value": 0,
"description": "ConstantScore(-() +*:*), product of:",
"details": [
{
"value": 0,
"description": "boost",
"details": []
},
{
"value": 1,
"description": "queryNorm",
"details": []
}
]
},
{
"value": 10,
"description": "min of:",
"details": [
{
"value": 10,
"description": "field value function: none(doc['foo.strategicBoost'].value?:1.0 * factor=1.0)",
"details": []
},
{
"value": 3.4028235e+38,
"description": "maxBoost",
"details": []
}
]
}
]
}
},
I tried to wrap it in a constant_score to change the constant score from 0 to 1, like this:
GET /foos-33/_search
{
"from" : 0,
"size" : 25,
"query" : {
"function_score" : {
"query" : {
"bool" : {
"constant_score": {
"boost": 1,
"filter" : {
"bool" : {
"must" : [ {
"bool" : {
"must_not" : {
"terms" : {
"foo.id" : [ ]
}
}
}
} ]
}
}
}
}
},
"functions" : [ {
"field_value_factor" : {
"field" : "foo.strategicBoost",
"missing" : 1.0
}
} ],
"score_mode" : "sum"
}
},
"explain" : true,
"sort" : [ {
"counts.barsPerDay" : {
"order" : "desc"
}
} ]
}
but that gave me an error message:
"failed_shards": [
{
"shard": 0,
"index": "foos-33",
"node": "A9s2Ui3mQE2SBZhY2VkZGw",
"reason": {
"type": "query_parsing_exception",
"reason": "[bool] query does not support [constant_score]",
"index": "foos-33",
"line": 8,
"col": 29
}
}
]
There is another way I could try to solve this problem - I could try to change the product to a sum or something - but I can't figure out where the product is coming from.
The top-level "product of" comes from the boost_mode, which defaults to multiply. Setting boost_mode to replace is the right fix in this case - the query score is always zero, so we don't care about it. Setting boost_mode to sum would be an equally valid fix in this case, too.
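The effect of each boost_mode on this query can be sketched in Python. This models only how the query score and the function score are combined, using the values 0 and 10 from the explain output; the function name combine is hypothetical and this is not Elasticsearch code.

```python
def combine(query_score, function_score, boost_mode="multiply"):
    """Combine the query score with the function score according to boost_mode."""
    modes = {
        "multiply": lambda q, f: q * f,  # the default
        "replace": lambda q, f: f,       # ignore the query score entirely
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    return modes[boost_mode](query_score, function_score)

# Values from the explain output: filter-only query scores 0, function scores 10.
for mode in ("multiply", "replace", "sum"):
    print(mode, combine(0, 10, mode))
```

With the default multiply, a query score of 0 forces the final score to 0 regardless of the function value, which is the behaviour seen in the question.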
