Elasticsearch query: Multiply final score using nested object and function score - elasticsearch

I have documents with some data and a specific omit list in each of them (see the mapping and example data):
I would like to write an ES query which does the following:
Calculate some "basic" score for the documents (Query 1):
{
"explain": true,
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"term": {
"type": "TYPE1"
}
}
}
},
{
"function_score": {
"linear": {
"number": {
"origin": 30,
"scale": 20
}
}
}
}
]
}
}
}
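To see what the linear clause in Query 1 contributes on its own, here is a small Python sketch of the linear decay formula described in the function_score documentation, S = max((s - max(0, |value - origin| - offset)) / s, 0) with s = scale / (1 - decay). Query 1 sets only origin and scale, so the sketch assumes the defaults offset = 0 and decay = 0.5; the name linear_decay is ours, not an Elasticsearch API.

```python
def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Linear decay as documented for function_score.

    s is the distance from origin (beyond offset) at which the score hits 0.
    """
    s = scale / (1.0 - decay)
    distance = max(0.0, abs(value - origin) - offset)
    return max((s - distance) / s, 0.0)


# Query 1: origin=30, scale=20, defaults offset=0 and decay=0.5, so s = 40.
for number in (10, 30, 50, 70, 90):
    print(number, linear_decay(number, origin=30, scale=20))
```

With these parameters a document whose number is 20 away from 30 (10 or 50) gets 0.5, and anything 40 or more away gets 0.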
Finally, multiply the score by a factor derived from the omit percent of a specific id (in the example I used the omit value for A, i.e. "omit.id": "A"). As a demonstration, Query 2 calculates this multiplier:
{
"query": {
"nested": {
"path": "omit",
"query": {
"function_score": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"omit.id": "A"
}
}
}
},
"functions": [
{
"linear": {
"omit.percent": {
"origin": 0,
"scale": 50,
"offset": 0,
"decay": 0.5
}
}
}
],
"score_mode": "multiply"
}
}
}
}
}
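With the parameters in Query 2 (origin 0, scale 50, offset 0, decay 0.5), the linear decay reaches zero at s = 50 / (1 - 0.5) = 100, so for percents between 0 and 100 the multiplier is simply (100 - percent) / 100. The Python sketch below checks this arithmetic; linear_decay and omit_multiplier are illustrative names, and the formula is the one from the function_score decay documentation.

```python
def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    # s is the distance from origin (beyond offset) at which the score reaches 0
    s = scale / (1.0 - decay)
    distance = max(0.0, abs(value - origin) - offset)
    return max((s - distance) / s, 0.0)


def omit_multiplier(percent):
    # Query 2 parameters: origin 0, scale 50, offset 0, decay 0.5 -> s = 100
    return linear_decay(percent, origin=0, scale=50, offset=0, decay=0.5)


for percent in (0, 10, 50, 100):
    print(percent, omit_multiplier(percent), (100 - percent) / 100)
```

This is the same factor that the Groovy script in the accepted solution computes directly from _source.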
To achieve this final multiplication, I ran into the following problems:
If I calculate the linear function score inside a nested query, then (as I understand it) I cannot use any other field in that function_score query.
I cannot multiply the calculated score by another function_score that is wrapped in a nested query.
I would appreciate any advice on how to resolve this.
Maybe I should get rid of the nested type and use key-value pairs instead, for example:
{
"omit": {
"A": {
"percent": 10
},
"B": {
"percent": 100
}
}
}
but unfortunately there will be a lot of keys, which would result in a huge (continuously growing) mapping, so I would prefer to avoid this option.

At last I figured out a possible solution based on a "non-nested" approach. The complete script can be found here.
I modified the omit list as described in the question:
"omit": {
"A": {
"percent": 10
},
"B": {
"percent": 100
}
}
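If existing documents still use the nested layout, they would need to be rewritten into this keyed form before reindexing. A minimal Python sketch, assuming each nested entry has exactly the fields id and percent (as the omit.id and omit.percent paths in Query 2 suggest; the original mapping is not shown), with to_keyed_omit as a hypothetical helper name:

```python
def to_keyed_omit(nested_omit):
    """Convert [{"id": "A", "percent": 10}, ...] into {"A": {"percent": 10}, ...}.

    Assumes each nested entry has exactly the fields id and percent.
    """
    return {entry["id"]: {"percent": entry["percent"]} for entry in nested_omit}


doc = {"omit": [{"id": "A", "percent": 10}, {"id": "B", "percent": 100}]}
doc["omit"] = to_keyed_omit(doc["omit"])
print(doc)  # {'omit': {'A': {'percent': 10}, 'B': {'percent': 100}}}
```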
In addition, I set the enabled flag to false so that these keys are not added to the mapping:
"omit": {
"type" : "object",
"enabled" : false
}
The last trick was to use script_score as the function of the function_score query, because that was the only place where I could read the percent value via the _source.omit.A.percent script:
{
"query": {
"function_score": {
"query": {
...
},
"script_score": {
"lang": "groovy",
"script": "if (_source.omit.A){(100-_source.omit.A.percent)/100} else {1}"
},
"score_mode": "multiply"
}
}
}
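The Python sketch below mirrors what the Groovy script computes per document, so the resulting scores can be checked by hand. Note that with a single function, "score_mode" has nothing to combine; it is boost_mode (default "multiply") that multiplies the query score by the script result. omit_factor and final_score are illustrative names, and base_score stands in for whatever the elided inner query returns.

```python
def omit_factor(source, omit_id):
    # Mirrors the Groovy script: (100 - percent) / 100 if the id is present
    # under _source.omit, otherwise 1. (Unlike the Groovy version, a missing
    # "omit" object is treated as "no entry" instead of raising an error.)
    entry = source.get("omit", {}).get(omit_id)
    if entry:
        return (100 - entry["percent"]) / 100
    return 1


def final_score(base_score, source, omit_id):
    # boost_mode "multiply": query score * function result
    return base_score * omit_factor(source, omit_id)


src = {"omit": {"A": {"percent": 10}, "B": {"percent": 100}}}
for omit_id in ("A", "B", "C"):
    print(omit_id, final_score(2.0, src, omit_id))
```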

Related

ElasticSearch: Combining bool and script_score in a single query

I have an existing elastic bool query. I've added a dense vector field to the index and would like to search it all in one query. The compound query part of the Elastic docs seems to imply you can do this, but I can't make it work (I get a runtime error) and haven't been able to find any examples. Here's a simplified version of what I'm trying.
localQuery = {
'bool': {
'should': [
{
"match_phrase": {
"field1": {
"query": query,
"boost": 10
}
}
},
{
"match_phrase": {
"field2": {
"query": query,
"boost": 6
}
}
},
{
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.element_desc_vector, 'description_vec') + 1.0",
"params": {"element_desc_vector": queryList}
}
}
}
]
}
}
I'd appreciate any suggestions, pointers to examples or even a flat "no you can't do that".
Thanks
Howard
I was trying to do the same and eventually found that you can access the query score (_score) from within the script, so you can add the score returned by the "should" clauses to the cosine similarity.
Note that I put the bool query inside the script_score, not the other way around:
local_query = {
"script_score": {
"query": {
"bool": {
"should": [
{
"match_phrase": {
"field1": {
"query": query,
"boost": 10
}
}
},
{
"match_phrase": {
"field2": {
"query": query,
"boost": 6
}
}
}
]
}
},
"script": {
"source": "(cosineSimilarity(params.element_desc_vector, 'description_vec') + 1.0) + _score",
"params": {
"element_desc_vector": queryList
}
}
}
}
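The same body can be wrapped in a small Python helper so the text and vector inputs are passed in explicitly. This is a sketch: build_hybrid_query is a hypothetical name, and field1, field2 and description_vec are the field names from the question. Adding 1.0 keeps the vector part non-negative, which matters because script_score does not accept negative scores. Also note that a bool query with only should clauses requires at least one of them to match, so documents that match neither phrase are not returned, however similar their vectors are.

```python
def build_hybrid_query(query_text, query_vector):
    """script_score wrapping a bool/should text query; the script adds the
    shifted cosine similarity of a dense_vector field to the text _score."""
    return {
        "script_score": {
            "query": {
                "bool": {
                    "should": [
                        {"match_phrase": {"field1": {"query": query_text, "boost": 10}}},
                        {"match_phrase": {"field2": {"query": query_text, "boost": 6}}},
                    ]
                }
            },
            "script": {
                # cosineSimilarity is in [-1, 1]; +1.0 shifts it into [0, 2]
                "source": "(cosineSimilarity(params.element_desc_vector, "
                          "'description_vec') + 1.0) + _score",
                "params": {"element_desc_vector": query_vector},
            },
        }
    }


local_query = build_hybrid_query("red shoes", [0.1, 0.2, 0.3])
print(local_query["script_score"]["script"]["source"])
```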

Elasticsearch: should + minimum_should_match vs must

I tested these two queries:
Query with must
{
"size": 200,
"from": 0,
"query": {
"bool": {
"must": [ {
"match": {
"_all": "science"
}
},
{
"match": {
"category": "fiction"
}
},
{
"match": {
"country": "us"
}
}
]
}
}
}
Query with should + minimum_should_match
{
"size": 200,
"from": 0,
"query": {
"bool": {
"should": [ {
"match": {
"_all": "science"
}
},
{
"match": {
"category": "fiction"
}
},
{
"match": {
"country": "us"
}
}
],
"minimum_should_match": 3
}
}
}
Both queries give me the same results. What is the difference between the two, and when should we use minimum_should_match?
I guess you mean minimum_number_should_match, right?
In your case both queries behave the same, because you have as many clauses in should as the number you specify. minimum_number_should_match is usually used when you have more should clauses than the number you specify there.
For example, if you have 5 should clauses but for some reason you only need three of them to match, you would do something like this:
{
"query": {
"bool": {
"should": [
{
"term": {
"tag": "wow"
}
},
{
"term": {
"tag": "elasticsearch"
}
},
{
"term": {
"tag": "tech"
}
},
{
"term": {
"user": "plchia"
}
},
{
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
],
"minimum_should_match": 3
}
}
}
That's correct and desired behavior. Let's decipher it a little bit:
A bool query with must clauses means that all clauses under the must section are required to match. Just as in English, "must" expresses a strong obligation.
A bool query with should clauses means that some of the clauses are required to match, whereas the others are not (i.e. a soft obligation). When the bool query has no must or filter clauses, the default number of should clauses that must match is simply 1. To override this behavior, the minimum_should_match parameter comes into play. If you specify minimum_should_match=3, at least 3 clauses under should must match. When that number equals the total number of should clauses, as in your example, it is in practice exactly the same as putting those clauses under must.
Hope this explains it in detail.
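The matching rule described above can be sketched in a few lines of Python, given the true/false outcome of each clause for one document. This is a simplified model for illustration only (bool_matches is a hypothetical helper, and filter and must_not clauses are ignored); it covers matching, not scoring.

```python
def bool_matches(must, should, minimum_should_match=None):
    """Return True if a document matches a bool query, given the boolean
    outcome of each must and should clause for that document."""
    if not all(must):
        return False
    if minimum_should_match is None:
        # Default: 1 if there are should clauses but no must clauses, else 0.
        minimum_should_match = 1 if should and not must else 0
    return sum(should) >= minimum_should_match


# A document matching "science" and "fiction" but not "us":
clauses = [True, True, False]
print(bool_matches(must=clauses, should=[]))                           # must: no match
print(bool_matches(must=[], should=clauses, minimum_should_match=3))   # should + 3: no match
print(bool_matches(must=[], should=clauses))                           # should, default 1: match
```

With the five-clause example, a document satisfying any three of the clauses matches when minimum_should_match is 3.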

How can we use exists query in tandem with the search query?

I have a scenario in Elasticsearch where my indexed docs look like this:
{"id":1,"name":"xyz", "address": "xyz123"}
{"id":1,"name":"xyz", "address": "xyz123"}
{"id":1,"name":"xyz", "address": "xyz123", "note": "imp"}
The requirement is to run a term match query and give the results a relevance score, which is straightforward. The additional aspect is that if any doc in the search results has a note field, it should be given a higher relevance score. How can we achieve this with the query DSL? Using exists we can check which docs contain a note, but how do we integrate it with the match query? I have tried a lot of ways but none worked.
With ES 5, you could boost your exists query to give a higher score to documents with a note field. For example,
{
"query": {
"bool": {
"must": {
"match": {
"name": {
"query": "your term"
}
}
},
"should": {
"exists": {
"field": "note",
"boost": 4
}
}
}
}
}
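If the search term and boost come from application code, the ES 5 body above can be produced by a small Python helper. This is only a sketch (build_note_boost_query is a hypothetical name); because the bool query has a must clause, the should clause is optional, so documents without a note still match and only those with one receive the extra score.

```python
import json


def build_note_boost_query(term, note_boost=4):
    """ES 5-style body: match on name is required; exists on note is an
    optional clause that only adds score when the field is present."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"name": {"query": term}}},
                "should": {"exists": {"field": "note", "boost": note_boost}},
            }
        }
    }


print(json.dumps(build_note_boost_query("xyz"), indent=2))
```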
With ES 2, you could try a function_score whose weighted function applies only to the filtered subset of documents that have a note:
{
"query": {
"function_score": {
"query": {
"match": { "name": "your term" }
},
"functions": [
{
"filter": { "exists" : { "field" : "note" }},
"weight": 4
}
],
"score_mode": "sum"
}
}
}
I believe you are looking for the boosting query feature:
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/query-dsl-boosting-query.html
{
"query": {
"boosting": {
"positive": {
<put your original query here>
},
"negative": {
"bool": {
"filter": {
"exists": {
"field": "note"
}
}
}
},
"negative_boost": 4
}
}
}

Elasticsearch: how can I query either multi match or functions

I have the following three parameters that I pass to run the query:
query - Either a place name, description or empty,
lat - Either latitude of a place or empty,
lon - Either longitude of a place or empty
Based on these parameters, I query a list of items ranked by their query scores, then calculate the distance between each result and lat, lon.
Currently I have the following query to get the items based on query and distance:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [{
"multi_match" : {
"query": "Lippo",
"fields": [ "name^6", "city^5", "country^4", "position^3", "address_line^2", "description"]
}
}]
}
},
"functions": [
{
"gauss": {
"position": {
"origin": "-6.184652, 106.7518749",
"offset": "2km",
"scale": "10km",
"decay": 0.33
}
}
}
]
}
}
}
The problem is that if query is empty, there are no results at all. What I want is for the results to be based on either the query or the distance.
Is there any way to achieve this? Any suggestion is appreciated.
Setting the zero_terms_query option of multi_match to "all" should allow you to get results when query is empty.
Example:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [{
"multi_match" : {
"query": "Lippo",
"fields": [ "name^6", "city^5", "country^4", "position^3", "address_line^2", "description"],
"zero_terms_query" : "all"
}
}]
}
},
"functions": [
{
"gauss": {
"position": {
"origin": "-6.184652, 106.7518749",
"offset": "2km",
"scale": "10km",
"decay": 0.33
}
}
}
]
}
}
}
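Since lat and lon can also be empty, the body can be built conditionally: keep the multi_match with zero_terms_query set to "all" so an empty query should still match everything, and add the gauss function only when both coordinates are present. A Python sketch under those assumptions; build_place_query is a hypothetical helper and the field list is copied from the question.

```python
def build_place_query(query="", lat=None, lon=None):
    """function_score body for the three optional inputs."""
    function_score = {
        "query": {
            "bool": {
                "must": [{
                    "multi_match": {
                        "query": query or "",
                        "fields": ["name^6", "city^5", "country^4",
                                   "position^3", "address_line^2", "description"],
                        # an empty query analyzes to zero terms -> match all
                        "zero_terms_query": "all",
                    }
                }]
            }
        }
    }
    if lat is not None and lon is not None:
        # distance decay only makes sense when an origin is known
        function_score["functions"] = [{
            "gauss": {
                "position": {
                    "origin": f"{lat}, {lon}",
                    "offset": "2km",
                    "scale": "10km",
                    "decay": 0.33,
                }
            }
        }]
    return {"query": {"function_score": function_score}}


print(build_place_query("", -6.184652, 106.7518749))
```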

Why script in custom_filters_score behaves as boost?

{
"query": {
"custom_filters_score": {
"query": {
"term": {
"name": "user1234"
}
},
"filters": [
{
"filter": {
"term": {
"subject": "math"
}
},
"script": "_score + doc['subject_score'].value"
}
]
}
}
}
With the script above, I get the error: unresolvable property or identifier: _score
If the script is just "script": "doc['subject_score'].value", it multiplies _score the same way boost does. I want to replace the Elasticsearch _score with a custom score.
If I understood you correctly, you would like to use Elasticsearch scoring when subject is not math and custom scoring when subject is math. If you are using Elasticsearch v0.90.4 or higher, this can be achieved with the new function_score query:
{
"query": {
"function_score": {
"query": {
"term": {
"name": "user1234"
}
},
"functions": [{
"filter": {
"term": {
"subject": "math"
}
},
"script_score": {
"script": "doc[\"subject_score\"].value"
}
}, {
"boost_factor": 0
}],
"score_mode": "first",
"boost_mode": "sum"
}
}
}
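To see why this works, here is a Python sketch of how "score_mode": "first" and "boost_mode": "sum" combine the two functions: the first function whose filter matches supplies the function score, which is then added to the query score. The catch-all boost_factor 0 makes non-math documents keep their original score. combine_first_sum is an illustrative name and only models these two modes.

```python
def combine_first_sum(query_score, doc, functions):
    """Emulate score_mode "first" + boost_mode "sum" for a list of
    (filter, score_fn) pairs; a filter of None matches every document."""
    for doc_filter, score_fn in functions:
        if doc_filter is None or doc_filter(doc):
            return query_score + score_fn(doc)  # first matching function wins
    # Not reached with the list below, because the last function has no filter.
    return query_score


functions = [
    (lambda d: d["subject"] == "math", lambda d: d["subject_score"]),  # script_score
    (None, lambda d: 0),                                                # boost_factor 0
]
print(combine_first_sum(1.5, {"subject": "math", "subject_score": 7}, functions))
print(combine_first_sum(1.5, {"subject": "art", "subject_score": 7}, functions))
```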
Prior to v0.90.4 you would have to resort to a combination of custom_score and custom_filters_score:
{
"query": {
"custom_score": {
"query": {
"custom_filters_score": {
"query": {
"term": {
"name": "user1234"
}
},
"filters": [{
"filter": {
"term": {
"subject": "math"
}
},
"script": "-1.0"
}]
}
},
"script": "_score < 0.0 ? _score * -1.0 + doc[\"subject_score\"].value : _score"
}
}
}
Or, as @javanna suggested, use multiple custom_score queries combined by a bool query:
{
"query": {
"bool": {
"disable_coord": true,
"should": [{
"filtered": {
"query": {
"term": {
"name": "user1234"
}
},
"filter": {
"bool": {
"must_not": [{
"term": {
"subject": "math"
}
}]
}
}
}
}, {
"filtered": {
"query": {
"custom_score": {
"query": {
"term": {
"name": "user1234"
}
},
"script": "doc['subject_score'].value"
}
},
"filter": {
"term": {
"subject": "math"
}
}
}
}]
}
}
}
Firstly, I'd like to say that there are many ways of customising the scoring in Elasticsearch, and it seems like you may have accidentally picked the wrong one. I will summarize two of them and you will see what the problem is:
Custom Filters Score
If you read the docs on custom_filters_score (carefully), you will see that it exists for performance reasons: it lets you use Elasticsearch's faster filter machinery for scoring. (Filters are faster because no score is calculated when computing the hit set, and they are cached between requests.)
At the end of the docs, it mentions that custom_filters_score can take a "script" parameter instead of a "boost" parameter for each filter. The best way to think of this is that the script calculates a number which is passed up to the parent query, where it is combined with the other sibling queries to calculate the total score for the document.
Custom Score Query
According to the docs, this is used when you want to take the score from the query and change it however you wish. A _score variable is available in your "script"; it holds the score of the query inside the custom_score query.
Try this:
{
"query": {
"filtered": {
"query": {
"custom_score": {
"query": {
"match_all": {}
},
"script": "doc['subject_score'].value" //*see note below
}
},
"filter": {
"and": [
{
"term": {
"subject": "math"
}
},
{
"term": {
"name": "user1234"
}
}
]
}
}
}
}
*NOTE: If you want, you could use _score here. I also moved both of your "term" parts into filters, because any match of a term would get the same score and filters are faster.
Good luck!
