Combine queries and order results by score - elasticsearch

I want Elastic to execute multiple (multi_match) queries and sort the combined results by score. The score of each query should be calculated independently of the other queries, which I think is different from the bool/should approach I have found while googling so far.
Example:
Query 1:
"multi_match" : {
  "query": "test",
  "fields": ["a", "b", "c"],
  "tie_breaker": 0.2,
  "minimum_should_match": "50%"
}
Query 2:
"multi_match" : {
  "query": "test2",
  "fields": ["a", "b", "c"],
  "tie_breaker": 0.2,
  "minimum_should_match": "50%"
}
Combine both results and order by score. How can I do that with Elastic?

I believe the Dis Max query is what you are looking for:
A query that generates the union of documents produced by its
subqueries, and that scores each document with the maximum score for
that document as produced by any subquery, plus a tie breaking
increment for any additional matching subqueries.
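A minimal Python sketch of this, assuming the two multi_match queries from the question are wrapped unchanged: it builds a dis_max request body and mimics the scoring rule quoted above (the best subquery score, plus tie_breaker times the score of every other matching subquery). The helper names and the dis_max tie_breaker of 0.3 are illustrative assumptions; note that dis_max has its own tie_breaker, separate from the one inside each multi_match.

```python
import json


def multi_match(text):
    # Same multi_match settings as in the question; only the query text differs.
    return {
        "multi_match": {
            "query": text,
            "fields": ["a", "b", "c"],
            "tie_breaker": 0.2,
            "minimum_should_match": "50%",
        }
    }


def dis_max_body(texts, tie_breaker=0.3):
    # Wrap each multi_match in one dis_max query (the tie_breaker value is an assumption).
    return {
        "query": {
            "dis_max": {
                "queries": [multi_match(t) for t in texts],
                "tie_breaker": tie_breaker,
            }
        }
    }


def dis_max_score(subquery_scores, tie_breaker=0.3):
    # The best matching subquery wins; every other matching subquery adds
    # tie_breaker * its own score. Non-matching subqueries are simply absent.
    best = max(subquery_scores)
    rest = sum(subquery_scores) - best
    return best + tie_breaker * rest


print(json.dumps(dis_max_body(["test", "test2"]), indent=2))
print(dis_max_score([2.0, 1.0]))  # a document matched by both queries
```

With tie_breaker left at its default of 0, a document's dis_max score is just its best subquery score, which is the closest to scoring each query independently and ranking by the highest.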

Related

How does Elasticsearch aggregate or weight scores from two sub queries ("bool query" and "decay function")

I have a complicated Elasticsearch query like the following example. This query has two subqueries: a weighted bool query and a decay function. I am trying to understand how Elasticsearch aggregates the scores from each subquery. If I run the first subquery alone (the weighted bool query), my top score is 20. If I run the second subquery alone (the decay function), my score is 1. However, if I run both subqueries together, my top score is 15. Can someone explain this?
My second, related question: how do I weight the scores from the two subqueries?
query = {
    "function_score": {
        "query": {
            "bool": {
                "should": [
                    {'match': {'title': {'query': 'Quantum computing', 'boost': 1}}},
                    {'match': {'author': {'query': 'Richard Feynman', 'boost': 2}}}
                ]
            },
        },
        "functions": [
            {
                "exp": {  # a built-in exponential decay function
                    "publication_date": {
                        "origin": "2000-01-01",
                        "offset": "7d",
                        "scale": "180d",
                        "decay": 0.5
                    },
                },
            }
        ]
    }
}
I found the answer myself by reading the Elasticsearch documentation on function_score. function_score has a parameter, boost_mode, that specifies how the query score and the function score are combined. By default, boost_mode is set to multiply.
Besides the default multiply method, we could also set boost_mode to avg and add a weight parameter to the exp decay function above; the combined score will then be: ( the_bool_query_score + the_decay_function_score * weight ) / ( 1 + weight ).
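To see how the default multiply mode plays out for the numbers in the question, here is a minimal Python sketch. It assumes the exponential decay formula given in the function_score documentation, exp(lam * max(0, |value - origin| - offset)) with lam = ln(decay) / scale, measured in days here; the helper names are hypothetical. Because the decay value never exceeds 1, multiplying it into the bool score can only lower that score, which is consistent with a top score of 20 dropping to 15 when both parts are combined.

```python
import math
from datetime import date


def exp_decay(value, origin, offset_days, scale_days, decay=0.5):
    # Exponential decay: exp(lam * max(0, |value - origin| - offset)),
    # with lam = ln(decay) / scale, so the result equals `decay` (up to rounding)
    # at a distance of offset + scale.
    distance = abs((value - origin).days)
    lam = math.log(decay) / scale_days
    return math.exp(lam * max(0, distance - offset_days))


def combine_multiply(query_score, function_score, weight=1.0):
    # boost_mode "multiply" (the default): query score times the function score.
    # A "weight" on the function multiplies that function's output.
    return query_score * function_score * weight


origin = date(2000, 1, 1)
# 2000-07-06 is 187 days after the origin: 180 days past the 7-day offset.
d = exp_decay(date(2000, 7, 6), origin, offset_days=7, scale_days=180)
print(d)                          # about 0.5
print(combine_multiply(20.0, d))  # about 10.0
```

Under multiply, a weight on the single exp function scales every document's score by the same factor, so on its own it does not change the balance between the bool query and the decay function.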

Elasticsearch filtering with input array where

Our requirement is to filter objects on an array field by passing an input array to Elasticsearch. An object should match when its mentions array is made up of any combination of the input array's elements.
A small example:
data:[
{"name": "xxxx", "mentions": ["X", "Y"]},
{"name": "yyyy", "mentions": ["K", "L", "M"]},
{"name": "zzz", "mentions": ["X", "L"]},
]
Input: [X, Y, K, L]
Output:[
{"name": "xxxx", "mentions": ["X", "Y"]},
{"name": "zzz", "mentions": ["X", "L"]}
]
Objects must be filtered on the mentions field: every member of an object's mentions array must be in the given input array, and if any member is not, the object should be ignored.
Neither a terms query nor a bool query with a must clause solves our problem.
A very simple solution is to use a regular expression in a regexp query.
Your query would look like this:
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "regexp": {
            "mentions": "[^XYKL]"
          }
        }
      ]
    }
  }
}
Square brackets [...] match any one of the characters they contain.
What I've done is place the negation character ^ inside the brackets, so the pattern matches any single character other than X, Y, K or L, and wrap that regexp in the must_not clause of a bool query. That should give you what you are looking for.
The query returns only documents whose mentions values are all among X, Y, K and L; any document containing another value is not returned.
Note that I'm assuming the field mentions is of type keyword.
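The filtering logic can be checked against the example data with a short Python sketch. It uses re.fullmatch because Lucene regular expressions are anchored and must match the whole term, and it also shows the plain subset check that the requirement describes. One consequence worth knowing: [^XYKL] matches exactly one character, so this trick relies on every mentions value being a single character; a multi-character value such as MM would not be excluded.

```python
import re

# Mirrors the must_not + regexp idea. Lucene regexps must match the WHOLE term,
# so re.fullmatch is the closer Python analogue than re.search.
EXCLUDE = re.compile(r"[^XYKL]")


def keep(doc):
    # Drop the document if any mentions value matches the excluding pattern.
    return not any(EXCLUDE.fullmatch(m) for m in doc["mentions"])


def keep_subset(doc, allowed):
    # The same requirement stated directly: every mention is in the input array.
    return set(doc["mentions"]) <= set(allowed)


data = [
    {"name": "xxxx", "mentions": ["X", "Y"]},
    {"name": "yyyy", "mentions": ["K", "L", "M"]},
    {"name": "zzz", "mentions": ["X", "L"]},
]
print([d["name"] for d in data if keep(d)])
print([d["name"] for d in data if keep_subset(d, ["X", "Y", "K", "L"])])
```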

How do I return just the fields from a query?

If I run a search query, I want to be able to get just the distinct field names from the hits/sources. To be clear, I don't want to limit the fields that are returned; I want to get the available fields as a list. Is this possible?
For example, if I ran a typical search and there were three results, each with different fields:
R1
"Field1": "foo"
"Field2": "bar"
R2
"Field2": "bara"
R3
"Field1": "fooa"
"Field3": "baz"
The result would be ["Field1", "Field2", "Field3"]
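One option is to compute the list on the client. A minimal Python sketch, assuming the standard search response shape (hits.hits[]._source); it only sees the hits actually returned (a single page, limited by size) and only top-level keys, so fields inside nested objects are not flattened.

```python
def distinct_fields(response):
    # Collect top-level _source keys across all returned hits, keeping
    # first-seen order. This runs on the client, after the search.
    seen = {}
    for hit in response["hits"]["hits"]:
        for field in hit.get("_source", {}):
            seen.setdefault(field, None)
    return list(seen)


response = {
    "hits": {
        "hits": [
            {"_source": {"Field1": "foo", "Field2": "bar"}},
            {"_source": {"Field2": "bara"}},
            {"_source": {"Field1": "fooa", "Field3": "baz"}},
        ]
    }
}
print(distinct_fields(response))
```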

Boosting only results with a near-identical score in Elasticsearch

I'm using the following query to search through a database of names, allowing fuzzy matching but giving preference to exact matches.
"query": {
  "bool": {
    "should": [
      {
        "match": {
          "name": {
            "query": "x",
            "operator": "and",
            "boost": 10
          }
        }
      },
      {
        "match": {
          "name": {
            "query": "x",
            "fuzziness": "AUTO",
            "operator": "and"
          }
        }
      },
      {
        "match": {
          "altname": {
            "query": "x",
            "fuzziness": "AUTO",
            "operator": "and"
          }
        }
      }
    ]
  }
}
The database contains entries with identical names. When that happens, I would like to boost those entries by a second field; let's call it weight. However, I only want the boost to apply within the subset of results with a (near-)identical score, not to all of the results.
This is further complicated by the fact that results with an identical name may receive slightly different scores, because they are influenced by their relevance on the altname field.
For example, querying for dog could give 3 results:
Dog [id 1, score 2.3, weight 10]
Dog [id 2, score 2.2, weight 20]
Doge [id 3, score 1, weight 100]
I'm looking for a query that would boost the result with id 2 to the top. The result with id 3 should always stay at the bottom due to its poor relevance, regardless of its weight. Ideally, the query would have tunable parameters to balance the influence of the score against that of the weight.
Is there any way to do this in a single pass in Elasticsearch, without ruining performance, of course?
Looks like I figured it out.
First, I realised that the example in my original question was more complex than necessary. I narrowed it down to: "How to compose a query for 'blub' that returns the following documents in the order 2, 3, 1"
id: 1
name: blub
weight: 0.01
---
id: 2
name: blub
weight: 0.1
---
id: 3
name: blub stuff
weight: 1
Thus: for the two documents with an identical (or very similar) score, the weight should be used as a tie-breaker. But documents with a significantly lower score should never be allowed to trump other results, regardless of their weight.
I loaded the data into the excellent Play tool: https://www.found.no/play/gist/edd93c69c015d4c62366#search and started experimenting.
It turned out that the log2p modifier did exactly what I expected. I repeated it on a real-world dataset, and everything looks exactly as expected.
function_score:
  query:
    match:
      name: blub
  field_value_factor:
    field: weight
    modifier: log2p
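A rough Python model of the function_score above, assuming the documented definition of the log2p modifier (add 2 to the field value, then take the base-10 logarithm), the default field_value_factor factor of 1, and the default boost_mode of multiply. The relevance scores are made-up values chosen so that the two exact "blub" documents tie.

```python
import math


def log2p(value):
    # field_value_factor modifier "log2p": add 2, then take the common (base-10) log.
    return math.log10(2 + value)


def final_score(query_score, weight, factor=1.0):
    # field_value_factor computes modifier(factor * field value); with the
    # default boost_mode "multiply" that is multiplied into the query score.
    return query_score * log2p(factor * weight)


# Hypothetical relevance scores: the two exact "blub" documents tie.
docs = [
    {"id": 1, "score": 1.0, "weight": 0.01},
    {"id": 2, "score": 1.0, "weight": 0.1},
]
ranked = sorted(docs, key=lambda d: final_score(d["score"], d["weight"]), reverse=True)
print([d["id"] for d in ranked])
```

Because the logarithm compresses the weight, a hundredfold difference (0.01 versus 1) changes the multiplier by less than a factor of 2 (log10(3) / log10(2.01) is about 1.57), which is why the weight tends to act as a tie-breaker rather than overriding relevance. The factor parameter is one knob for tuning that balance.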

Constant Score Query elasticsearch boosting

My understanding of the Constant Score Query in Elasticsearch is that the boost factor is assigned as the score of every matching document. The documentation says:
A query that wraps a filter or another query and simply returns a constant score equal to the query boost for every document in the filter.
However, when I send this query:
"query": {
  "constant_score": {
    "filter": {
      "term": {
        "source": "BBC"
      }
    },
    "boost": 3
  }
},
"fields": ["title", "source"]
All the matching documents are given a score of 1! I cannot figure out what I am doing wrong; I have also tried using query instead of filter inside constant_score.
Scores are only meant to be relative to all other scores in a given result set, so a result set where everything has the score of 3 is the same as a result set where everything has the score of 1.
Really, the only purpose of the relevance _score is to sort the results of the current query in the correct order. You should not try to compare the relevance scores from different queries. - Elasticsearch Guide
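The point about relative scores can be illustrated with a small Python sketch: multiplying every score by the same constant, such as a constant_score boost of 3, leaves the ranking unchanged. The document ids and scores are hypothetical.

```python
def ranking(scores):
    # Order document ids by descending score; the order is what the score is for.
    return sorted(scores, key=scores.get, reverse=True)


def scaled(scores, factor):
    # Multiply every score by the same constant (e.g. a boost of 3).
    return {doc: factor * s for doc, s in scores.items()}


constant = {"doc_a": 1.0, "doc_b": 1.0, "doc_c": 1.0}  # every hit ties
varied = {"doc_a": 0.4, "doc_b": 2.5, "doc_c": 1.1}

print(ranking(constant) == ranking(scaled(constant, 3)))
print(ranking(varied), ranking(scaled(varied, 3)))
```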
Either the constant score is being ignored because it's not combined with another query, or it's being normalized. As @keety said, check the output of explain to see exactly what's going on.
The constant score query gives an equal score to every matching document, irrespective of scoring factors such as TF and IDF. Use it when you don't care how well a document matched, only whether it matched, but still want it to receive a score, unlike a filter.
If you literally want a score of 3 for all the matching documents of a particular query, you should use a function score query, something like:
"query": {
  "function_score": {
    "functions": [
      {
        "filter": { "term": { "source": "BBC" } },
        "weight": 3
      }
    ]
  }
  ...
}
