Why script in custom_filters_score behaves as boost? - elasticsearch

{
"query": {
"custom_filters_score": {
"query": {
"term": {
"name": "user1234"
}
},
"filters": [
{
"filter": {
"term": {
"subject": "math"
}
},
"script": "_score + doc['subject_score'].value"
}
]
}
}
}
If the script is written as above, it gives Error: unresolvable property or identifier: _score
If the script is "script": "doc['subject_score'].value", it multiplies the _score the same way a boost does. I want to replace the Elasticsearch _score with a custom score.

If I understood you correctly, you would like to use Elasticsearch scoring if the subject is not math, and custom scoring if the subject is math. If you are using Elasticsearch v0.90.4 or higher, this can be achieved with the new function_score query:
{
"query": {
"function_score": {
"query": {
"term": {
"name": "user1234"
}
},
"functions": [{
"filter": {
"term": {
"subject": "math"
}
},
"script_score": {
"script": "doc[\"subject_score\"].value"
}
}, {
"boost_factor": 0
}],
"score_mode": "first",
"boost_mode": "sum"
}
}
}
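To make the combination rules concrete, here is a plain-Python model of how the function_score query above combines scores. It does not call Elasticsearch: documents are assumed to be dicts with hypothetical "subject" and "subject_score" keys, and the term query's score is passed in as query_score. It only mimics score_mode "first" and boost_mode "sum".

```python
# Plain-Python model of the function_score query above; it does not call Elasticsearch.
# Documents are hypothetical dicts with "subject" and "subject_score" keys.

def function_score(query_score, doc):
    functions = [
        # filter: term subject == "math"  ->  script_score: doc["subject_score"].value
        (lambda d: d["subject"] == "math", lambda d: d["subject_score"]),
        # no filter, so it matches every document  ->  "boost_factor": 0
        (lambda d: True, lambda d: 0.0),
    ]
    # "score_mode": "first": take the value of the first function whose filter matches
    function_value = next(value(doc) for matches, value in functions if matches(doc))
    # "boost_mode": "sum": add the function value to the term query's score
    return query_score + function_value


math_doc = {"subject": "math", "subject_score": 3.0}
other_doc = {"subject": "history", "subject_score": 3.0}
print(function_score(1.5, math_doc))   # 4.5
print(function_score(1.5, other_doc))  # 1.5
```

The unfiltered boost_factor 0 function is what keeps non-math documents at their original score: it always matches, but only after the math filter has had its chance.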
Prior to v0.90.4 you would have to resort to a combination of custom_score and custom_filters_score:
{
"query": {
"custom_score": {
"query": {
"custom_filters_score": {
"query": {
"term": {
"name": "user1234"
}
},
"filters": [{
"filter": {
"term": {
"subject": "math"
}
},
"script": "-1.0"
}]
}
},
"script": "_score < 0.0 ? _score * -1.0 + doc[\"subject_score\"].value : _score"
}
}
}
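The trick above uses a negative score as a marker: the inner custom_filters_score flips math documents to a negative score, and the outer custom_score script recognises the sign, flips it back and adds subject_score. A plain-Python sketch of that flow, assuming hypothetical dict documents and a strictly positive term-query score:

```python
# Plain-Python model of the nested custom_score / custom_filters_score trick above.
# Assumes (hypothetically) dict documents and a positive score from the term query.

def custom_filters_score(term_score, doc):
    # The per-filter "script": "-1.0" is applied like a boost, i.e. multiplied in,
    # so math documents come out with a negative score that acts as a marker.
    if doc["subject"] == "math":
        return term_score * -1.0
    return term_score


def custom_score(inner_score, doc):
    # "_score < 0.0 ? _score * -1.0 + doc['subject_score'].value : _score"
    if inner_score < 0.0:
        return inner_score * -1.0 + doc["subject_score"]
    return inner_score


math_doc = {"subject": "math", "subject_score": 0.5}
other_doc = {"subject": "history", "subject_score": 0.5}
print(custom_score(custom_filters_score(2.0, math_doc), math_doc))    # 2.5
print(custom_score(custom_filters_score(2.0, other_doc), other_doc))  # 2.0
```

The design relies on the term query never producing a score of exactly zero; a zero score would not become negative and so would not be recognised as a math document.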
Or, as @javanna suggested, use multiple custom_score queries combined by a bool query:
{
"query": {
"bool": {
"disable_coord": true,
"should": [{
"filtered": {
"query": {
"term": {
"name": "user1234"
}
},
"filter": {
"bool": {
"must_not": [{
"term": {
"subject": "math"
}
}]
}
}
}
}, {
"filtered": {
"query": {
"custom_score": {
"query": {
"term": {
"name": "user1234"
}
},
"script": "doc['subject_score'].value"
}
},
"filter": {
"term": {
"subject": "math"
}
}
}
}]
}
}
}
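In this bool version each document matches exactly one of the two filtered should clauses, so its score is whatever that clause produces; disable_coord stops Lucene from scaling it by the fraction of clauses that matched. A simplified plain-Python model (ignoring query normalization, with hypothetical dict documents):

```python
# Simplified model of the bool query above (ignores query normalization).
# Dict documents with "subject" and "subject_score" are a hypothetical stand-in.

def bool_should_score(term_score, doc, disable_coord=True):
    matched = []
    # Clause 1: plain term query, filtered to subject != "math"
    if doc["subject"] != "math":
        matched.append(term_score)
    # Clause 2: custom_score replaces the term score with subject_score, filtered to math
    if doc["subject"] == "math":
        matched.append(doc["subject_score"])
    total = sum(matched)
    if not disable_coord:
        # Lucene's classic coord factor: matching clauses / total clauses (2 here)
        total *= len(matched) / 2
    return total


print(bool_should_score(1.5, {"subject": "history", "subject_score": 3.0}))  # 1.5
print(bool_should_score(1.5, {"subject": "math", "subject_score": 3.0}))     # 3.0
```

Unlike the two previous queries, a math document here gets subject_score alone rather than the term score plus subject_score, because custom_score replaces the score instead of adding to it.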

Firstly, I'd like to say that there are many ways of customising the scoring in Elasticsearch, and it seems you may have accidentally picked the wrong one. I will summarize two of them, and you will see what the problem is:
Custom Filters Score
If you read the docs on custom_filters_score carefully, you will see that it is there for performance reasons: it lets you use the faster filter machinery of Elasticsearch for scoring. (Filters are faster because no score is calculated when computing the hit set, and they are cached between requests.)
At the end of the docs, it mentions that custom_filters_score can take a "script" parameter per filter instead of a "boost" parameter. The best way to think of this is that it calculates a number, which is passed up to the parent query to be combined with the other sibling queries to calculate the total score for the document.
Custom Score Query
Reading the docs, this is the one to use when you want to take the score from the query and change it however you wish. A _score variable is available in your "script"; it holds the score of the query inside the custom_score query.
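The difference between the two can be sketched in plain Python. This is a simplified model, not Elasticsearch code: it encodes the behaviour observed in the question (a custom_filters_score script result is multiplied into the score, like a boost) and the custom_score behaviour described above (the script sees _score and its result becomes the score).

```python
# Simplified model of the two behaviours described above; not Elasticsearch code.

def custom_filters_score_script(query_score, script_value):
    # The per-filter script result is used in place of a boost, so it is multiplied
    # into the query score (the behaviour observed in the question).
    return query_score * script_value


def custom_score_script(query_score, script):
    # The custom_score script receives the inner query's score as _score,
    # and whatever it returns becomes the document's score.
    return script(query_score)


print(custom_filters_score_script(2.0, 3.0))                   # 6.0
print(custom_score_script(2.0, lambda _score: 3.0))            # 3.0
print(custom_score_script(2.0, lambda _score: _score + 3.0))   # 5.0
```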
Try this:
{
"query": {
"filtered": {
"query": {
"custom_score": {
"query": {
"match_all": {}
},
"script": "doc['subject_score'].value"
}
},
"filter": {
"and": [
{
"term": {
"subject": "math"
}
},
{
"term": {
"name": "user1234"
}
}
]
}
}
}
}
NOTE: If you wanted to, you could also use _score in the "script" above. Also, I moved both of your "term" parts into filters, because any match of a term would get the same score and filters are faster.
Good luck!

Related

ElasticSearch: Combining bool and script_score in a single query

I have an existing elastic bool query. I've added a dense vector field to the index and would like to search it all in one query. The compound query part of the Elastic docs seems to imply you can do this, but I can't make it work (I get a runtime error) and haven't been able to find any examples. Here's a simplified version of what I'm trying.
localQuery = {
'bool': {
'should': [
{
"match_phrase": {
"field1": {
"query": query,
"boost": 10
}
}
},
{
"match_phrase": {
"field2": {
"query": query,
"boost": 6
}
}
},
{
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.element_desc_vector, 'description_vec') + 1.0",
"params": {"element_desc_vector": queryList}
}
}
}
]
}
}
I'd appreciate any suggestions, pointers to examples or even a flat "no you can't do that".
Thanks
Howard
I was trying to do the same, and eventually found that you can access the score from within the script, so you can add the score returned by the "should" clauses to the cosine similarity.
Also, I put the bool clause inside the script_score rather than vice versa.
local_query = {
"script_score": {
"query": {
"bool": {
"should": [
{
"match_phrase": {
"field1": {
"query": query,
"boost": 10
}
}
},
{
"match_phrase": {
"field2": {
"query": query,
"boost": 6
}
}
}
]
}
},
"script": {
"source": "(cosineSimilarity(params.element_desc_vector, 'description_vec') + 1.0) + _score",
"params": {
"element_desc_vector": queryList
}
}
}
}
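Here is a plain-Python sketch of what the script computes for one document, useful for sanity-checking expected scores offline. It assumes the bool query's score is passed in as bool_score and that vectors are plain lists of floats. The +1.0 shifts cosine similarity from [-1, 1] to [0, 2]; as far as I know, recent versions reject negative script_score results, which is why the shift matters.

```python
import math


def cosine_similarity(a, b):
    # Same quantity as Painless cosineSimilarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def combined_score(query_vector, doc_vector, bool_score):
    # Mirrors the script source: (cosineSimilarity(...) + 1.0) + _score
    return (cosine_similarity(query_vector, doc_vector) + 1.0) + bool_score


print(combined_score([1.0, 0.0], [1.0, 0.0], 3.0))   # 5.0
print(combined_score([1.0, 0.0], [-1.0, 0.0], 3.0))  # 3.0
```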

How can we use exists query in tandem with the search query?

I have a scenario in Elasticsearch where my indexed docs look like this:
{"id":1,"name":"xyz", "address": "xyz123"}
{"id":1,"name":"xyz", "address": "xyz123"}
{"id":1,"name":"xyz", "address": "xyz123", "note": "imp"}
The requirement is that we do a term match query and provide a relevance score for the results, which is straightforward; the additional aspect is that any doc in the search results that has a note field should be given higher relevance. How can we achieve this with the query DSL? Using exists we can check which docs contain notes, but how do we integrate it with the match query? I have tried a lot of ways, but none worked.
With ES 5, you could boost your exists query to give a higher score to documents with a note field. For example,
{
"query": {
"bool": {
"must": {
"match": {
"name": {
"query": "your term"
}
}
},
"should": {
"exists": {
"field": "note",
"boost": 4
}
}
}
}
}
With ES 2, you could try a boosted filtered subset
{
"query": {
"function_score": {
"query": {
"match": { "name": "your term" }
},
"functions": [
{
"filter": { "exists" : { "field" : "note" }},
"weight": 4
}
],
"score_mode": "sum"
}
}
}
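The two answers above boost in different ways: the ES 5 bool query adds the exists clause's score to the match score, while the ES 2 function_score (default boost_mode multiply) scales the match score by the weight. A simplified plain-Python model of both, ignoring normalization and assuming a document that matches no function keeps its match score:

```python
# Simplified models (ignoring normalization) of the two ES answers above.
# Dict documents are a hypothetical stand-in for hits.

def bool_with_exists(match_score, doc, boost=4.0):
    # ES 5 bool query: the exists should clause contributes a constant score times
    # its boost, and matching should clauses are added to the must clause's score.
    return match_score + (boost if "note" in doc else 0.0)


def function_score_with_weight(match_score, doc, weight=4.0):
    # ES 2 function_score: default boost_mode is multiply, so the weight scales
    # the match score; a doc that matches no function keeps its match score.
    return match_score * (weight if "note" in doc else 1.0)


with_note = {"id": 1, "name": "xyz", "address": "xyz123", "note": "imp"}
without_note = {"id": 1, "name": "xyz", "address": "xyz123"}
print(bool_with_exists(2.0, with_note), bool_with_exists(2.0, without_note))  # 6.0 2.0
print(function_score_with_weight(2.0, with_note))                             # 8.0
```

The additive form gives every noted document the same fixed bump regardless of how well the name matched; the multiplicative form preserves the ordering among noted documents by relevance.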
I believe you are looking for the boosting query feature:
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/query-dsl-boosting-query.html
{
"query": {
"boosting": {
"positive": {
<put your original query here>
},
"negative": {
"exists": {
"field": "note"
}
},
"negative_boost": 4
}
}
}

terms query does not support minimum_match in Elasticsearch 2.3.3

I want to search documents which match at least two key words using terms query like this:
{
"query": {
"terms": {
"title": ["java","编程","思想"],
"minimum_match": 2
}
},
"highlight": {
"fields": {
"title": {}
}
}
}
It returns "terms query does not support minimum_match". What's wrong with my query?
The correct parameter name is minimum_should_match, and that setting was deprecated for the terms query in ES 2.0.
What you can do instead is to use a bool/should query with three term queries and the minimum_should_match setting for bool/should queries:
{
"query": {
"bool": {
"minimum_should_match": 2,
"should": [
{
"term": {
"title": "java"
}
},
{
"term": {
"title": "编程"
}
},
{
"term": {
"title": "思想"
}
}
]
}
},
"highlight": {
"fields": {
"title": {}
}
}
}
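If the keyword list comes from user input, the bool/should body above can be generated rather than written by hand. A minimal sketch; min_match_query and would_match are hypothetical helpers, and would_match assumes the keywords are already analyzed tokens:

```python
def min_match_query(field, terms, minimum):
    # One term clause per keyword; bool/should enforces the minimum number of matches
    return {
        "query": {
            "bool": {
                "minimum_should_match": minimum,
                "should": [{"term": {field: term}} for term in terms],
            }
        },
        "highlight": {"fields": {field: {}}},
    }


def would_match(doc_terms, terms, minimum):
    # Local check of the same rule, e.g. for unit-testing expectations
    return sum(1 for term in terms if term in doc_terms) >= minimum


body = min_match_query("title", ["java", "编程", "思想"], 2)
```

The generated body is identical to the JSON above, so it can be passed to whatever client you use to send search requests.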

Terrible has_child query performance

The following query has terrible performance.
I'm 100% sure it is the has_child part: the query runs in under 300ms without it and takes 9 seconds with it.
Is there a better way to use the has_child query? It seems like I could query the parents, then query the children by id, and join client-side to do the has_child check faster than the ES engine does it...
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"has_child": {
"type": "status",
"query": {
"term": {
"stage": "s3"
}
}
}
},
{
"has_child": {
"type": "status",
"query": {
"term": {
"stage": "es"
}
}
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"source": "IntegrationTest-2016-03-01T23:31:15.023Z"
}
},
{
"range": {
"eventTimestamp": {
"from": "2016-03-01T20:28:15.028Z",
"to": "2016-03-01T23:33:15.028Z"
}
}
}
]
}
}
}
},
"aggs": {
"digests": {
"terms": {
"field": "digest",
"size": 0
}
}
},
"size": 0
}
Cluster info:
CPU and memory usage is low. It is an AWS Elasticsearch Service cluster (v1.5.2) with many small documents. Since the version AWS is running is old, doc values aren't enabled by default; I'm not sure whether that is helping or hurting.
Since "stage" is not analyzed (based on your comment) and, therefore, you are not interested in scoring the documents that match on that field, you might see slight performance gains by using the has_child filter instead of the has_child query, and a term filter instead of a term query.
In the documentation for has_child, you'll notice:
The has_child filter also accepts a filter instead of a query:
The main performance benefits of using a filter come from the fact that Elasticsearch can skip the scoring phase of the query. Also, filters can be cached which should improve the performance of future searches that use the same filters. Queries, on the other hand, cannot be cached.
Try this instead:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"source": "IntegrationTest-2016-03-01T23:31:15.023Z"
}
},
{
"range": {
"eventTimestamp": {
"from": "2016-03-01T20:28:15.028Z",
"to": "2016-03-01T23:33:15.028Z"
}
}
},
{
"has_child": {
"type": "status",
"filter": {
"term": {
"stage": "s3"
}
}
}
},
{
"has_child": {
"type": "status",
"filter": {
"term": {
"stage": "es"
}
}
}
}
]
}
}
}
},
"aggs": {
"digests": {
"terms": {
"field": "digest",
"size": 0
}
}
},
"size": 0
}
I bit the bullet and just performed the parent:child join in my application. Instead of waiting 7 seconds for the has_child query, I fire off two consecutive term queries and do some post processing: 200ms.
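The application-side join isn't shown, so here is a hypothetical sketch of the post-processing step. It assumes the first query returns parent hits with "id" and "digest", the second returns child hits that each carry their parent id under a hypothetical "parent" key, and that a parent qualifies only if its children cover both stages (mirroring the two has_child musts). The Counter stands in for the "digests" terms aggregation.

```python
from collections import Counter


def parents_with_all_stages(parent_ids, child_hits, required_stages):
    # Collect the stages seen for each parent id in the child hits
    stages_by_parent = {}
    for child in child_hits:
        stages_by_parent.setdefault(child["parent"], set()).add(child["stage"])
    required = set(required_stages)
    # Keep parents whose children cover every required stage (like two has_child musts)
    return [pid for pid in parent_ids if required <= stages_by_parent.get(pid, set())]


def digest_counts(parent_hits, kept_ids):
    # Client-side replacement for the "digests" terms aggregation
    kept = set(kept_ids)
    return Counter(hit["digest"] for hit in parent_hits if hit["id"] in kept)


parents = [
    {"id": "p1", "digest": "d1"},
    {"id": "p2", "digest": "d1"},
    {"id": "p3", "digest": "d2"},
]
children = [
    {"parent": "p1", "stage": "s3"},
    {"parent": "p1", "stage": "es"},
    {"parent": "p2", "stage": "s3"},
    {"parent": "p3", "stage": "es"},
]
kept = parents_with_all_stages([p["id"] for p in parents], children, ["s3", "es"])
print(kept)                          # ['p1']
print(digest_counts(parents, kept))  # Counter({'d1': 1})
```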

Elasticsearch query: Multiply final score using nested object and function score

I have documents with some data and a specific omit list in it (see mapping and example data):
I would like to write an ES query which does the following:
Calculate some "basic" score for the documents (Query 1):
{
"explain": true,
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"term": {
"type": "TYPE1"
}
}
}
},
{
"function_score": {
"linear": {
"number": {
"origin": 30,
"scale": 20
}
}
}
}
]
}
}
}
Finally, multiply the score according to the omit percent of a specific id (in the example I used the omit value for A: "omit.id": "A"). As a demonstration, I calculated this multiplier in Query 2:
{
"query": {
"nested": {
"path": "omit",
"query": {
"function_score": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"omit.id": "A"
}
}
}
},
"functions": [
{
"linear": {
"omit.percent": {
"origin": 0,
"scale": 50,
"offset": 0,
"decay": 0.5
}
}
}
],
"score_mode": "multiply"
}
}
}
}
}
To achieve this final multiplication, I ran into the following problems:
If I calculate the linear function score inside a nested query, then (according to my interpretation) I cannot use any other field in the function_score query.
I cannot multiply the calculated score with any other function_score that is encapsulated in a nested query.
I would like to ask for any advice to resolve this issue.
Note that maybe I should get rid of this nested type and use key-value pairs instead. For example:
{
"omit": {
"A": {
"percent": 10
},
"B": {
"percent": 100
}
}
}
but unfortunately there will be a lot of keys, which would result in a huge (continuously growing) mapping, so I would prefer not to use this option.
At last I figured out a possible solution based on a "non-nested" approach. The complete script can be found here.
I modified the omit list as described in the question:
"omit": {
"A": {
"percent": 10
},
"B": {
"percent": 100
}
}
In addition, I set the enabled flag to false so that these elements are not added to the mapping:
"omit": {
"type" : "object",
"enabled" : false
}
The last trick was to use script_score as the function_score's function, because only there could I read the percent value with the _source.omit.A.percent script:
{
"query": {
"function_score": {
"query": {
...
},
"script_score": {
"lang": "groovy",
"script": "if (_source.omit.A){(100-_source.omit.A.percent)/100} else {1}"
},
"score_mode": "multiply"
}
}
}
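The script's multiplier is the same one Query 2 computed with the linear decay function. Elasticsearch documents the linear decay as max((s - max(0, |value - origin| - offset)) / s, 0) with s = scale / (1 - decay); with origin 0, scale 50, offset 0 and decay 0.5, s is 100 and the result is (100 - percent) / 100 for percents between 0 and 100. A plain-Python check of that equivalence (the omit_multiplier helper is a hypothetical mirror of the Groovy script; Groovy's / on integers yields a decimal rather than truncating):

```python
def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    # Elasticsearch's documented linear decay: s = scale / (1 - decay)
    s = scale / (1.0 - decay)
    distance = max(0.0, abs(value - origin) - offset)
    return max((s - distance) / s, 0.0)


def omit_multiplier(source, omit_id="A"):
    # Mirrors the Groovy script: (100 - percent) / 100 if an entry exists, else 1
    entry = source.get("omit", {}).get(omit_id)
    if entry:
        return (100 - entry["percent"]) / 100
    return 1.0


source = {"omit": {"A": {"percent": 10}, "B": {"percent": 100}}}
print(omit_multiplier(source))               # 0.9
print(linear_decay(10, origin=0, scale=50))  # 0.9
print(omit_multiplier(source, "C"))          # 1.0
```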
