I'm using Elasticsearch 7 and I'm confused by its compound query syntax.
I have read the Elasticsearch documentation repeatedly, but I only find the standard syntax of the bool and constant_score queries described separately.
From the documentation I understand what 'query context' and 'filter context' are. But when the two query types are combined in a single query, I don't know what it means.
Here is an example:
GET /classes_test/_search
{
"size": "21",
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"match": {
"class_name": "29386556"
}
}
],
"should": [
{
"term": {
"master": "7033560"
}
},
{
"term": {
"assistant": "7033560"
}
},
{
"term": {
"students": "7033560"
}
}
],
"minimum_should_match": 1,
"must_not": [
{
"term": {
"class_id": 0
}
}
],
"filter": [
{
"term": {
"class_status": "1"
}
}
]
}
}
}
}
}
This query executes and returns results. Each hit in the response has a '_score' of 1.0.
So does this mean that the inner bool query as a whole runs in filter context, even though it has 'must' and 'should' clauses?
I also found that a bool query can contain a constant_score sub-query.
Why does Elasticsearch allow this syntax without explaining it further?
If you use a constant_score query, you'll never get scores other than 1.0, unless you specify a boost parameter, in which case the score will equal that boost.
If you need scoring, you obviously need to ditch constant_score.
In your case, your match query on class_name cannot yield any score other than 1 or 0, since it is basically a yes/no filter rather than a full-text relevance match.
To sum up, your whole query executes in a filter context (hence a score of 0 or 1), since you don't rely on full-text search. You get meaningful scoring when you rely on full-text relevance, not merely because you use a match query. In your case, you can merge all must constraints into filter; it won't make any difference, since you only have filters (yes/no matches) and no full-text search.
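The merge suggested above can be sketched in Python as a plain dictionary transformation. This is only a minimal sketch, assuming the bool body is held as a dict whose clauses are lists; the helper name `merge_must_into_filter` is hypothetical and not part of any Elasticsearch client, and the sample body is a shortened version of the query from the question.

```python
import copy


def merge_must_into_filter(bool_body):
    """Return a copy of a bool body with its must clauses moved into filter.

    Hypothetical helper. Only appropriate when scores do not matter,
    for example inside a constant_score query.
    """
    merged = copy.deepcopy(bool_body)
    must_clauses = merged.pop("must", [])
    if must_clauses:
        # filter selects the same documents as must, but is never scored
        merged["filter"] = must_clauses + merged.get("filter", [])
    return merged


original = {
    "must": [{"match": {"class_name": "29386556"}}],
    "must_not": [{"term": {"class_id": 0}}],
    "filter": [{"term": {"class_status": "1"}}],
}
merged = merge_must_into_filter(original)

# The request body that would be sent to _search
query = {"query": {"constant_score": {"filter": {"bool": merged}}}}
```

The should clauses and minimum_should_match from the question are left out here for brevity; the helper leaves any such keys untouched.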
Related
I'm a little confused about the difference between should and boost in the final score calculation.
When a bool query has a must clause, the should clauses act as a boost factor: none of them has to match, but if they do, the relevancy score for that document is boosted, so it appears higher in the results.
So, if we have:
one query that contains must and should clauses
vs
a second query that contains a must clause and a boosting clause
Is there a difference?
When would you recommend using must and should versus must and boosting clauses in a query?
You can read the documentation of the boolean query here; there is a huge difference between should and boost.
Should and must both contribute to the _score of the document and, as the above documentation puts it:
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
Boost, on the other hand, is a parameter with which you can increase the weight of a clause by a value you choose. Let me explain that with an example.
Index sample docs
POST _doc/1
{
"brand" : "samsung",
"name" : "samsung phone"
}
POST _doc/2
{
"brand" : "apple",
"name" : "apple phone"
}
Boolean Query using should without boost
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple"
}
}
}
]
}
}
}
Search result showing score
"max_score": 1.3862942,
Now in same query use boost of factor 10
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple",
"boost": 10
}
}
}
]
}
}
}
Query result showing boost
"max_score": 7.624619, (note the considerably higher score)
In short, when you want to boost documents that contain your query term in a particular field, you can additionally pass the boost param; it is applied on top of the normal score calculated by should or must.
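The two reported scores can be checked with a little arithmetic. The sketch below is a simplified model, not Lucene's actual scoring code: it assumes that the top document matches both should clauses with equal per-clause scores (inferred from the reported max_score of 1.3862942, not taken from an explain output), and that boost multiplies a clause's score before the bool query sums the clauses.

```python
# Assumed per-clause score, inferred from the unboosted max_score above
CLAUSE_SCORE = 1.3862942 / 2


def bool_score(clauses):
    """Sum boost * score over matching clauses (simplified model of bool scoring)."""
    return sum(boost * score for score, boost in clauses)


# Each tuple is (clause score, boost)
without_boost = bool_score([(CLAUSE_SCORE, 1), (CLAUSE_SCORE, 1)])
with_boost = bool_score([(CLAUSE_SCORE, 1), (CLAUSE_SCORE, 10)])

print(round(without_boost, 4))  # 1.3863
print(round(with_boost, 4))     # 7.6246
```

Under these assumptions the boosted total lands on the reported 7.624619 up to floating-point rounding, which is consistent with the boost scaling only the brand clause.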
I'm trying to boost matches on a certain field over another.
This works fine:
{
"query": {
"bool": {
"should": [
{
"terms": {
"boost": 2,
"mainField": "foo"
}
},
{
"terms": {
"otherField": "foo"
}
}
]
}
}
}
When I look at the documents matched on mainField, I see they have a _score of 2.0, as expected.
But when I wrap this same query in a filter:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"terms": {
"boost": 2,
"mainField": "foo"
}
},
{
"terms": {
"otherField": "foo"
}
}
]
}
}
]
}
}
}
The _score for all documents is 0.0.
The same thing happens for multi_match. By itself (e.g. directly inside a query) it works fine, but inside a bool + filter, it doesn't work.
Can someone explain why this is the case? I need to wrap it in a filter because of the way my app composes queries.
Some context might also help: I'm trying to return documents that match on either mainField or otherField, but sort the ones matching on mainField first, so I figured boost would be the most appropriate choice here. But let me know if there is a better way.
Filter clauses are always executed in filter context. They always return a score of zero and only contribute to the filtering of documents.
Refer to this documentation to learn more about filter context.
Because of this, you do not get a _score of 2.0 in the second query, even after applying boost.
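One possible way to keep the filter wrapper and still rank mainField matches first is to repeat the clauses in a should array next to the filter, so that the filter decides what matches and the should clauses only add score. The sketch below is an assumption about how that could be composed, not the approach from the answer above: the helper name `filter_and_score` is hypothetical, and it uses term clauses wrapped in constant_score so that each match contributes a fixed score.

```python
def filter_and_score(value, main_field="mainField", other_field="otherField", boost=2):
    """Build a search body that filters on either field and scores main_field higher.

    Hypothetical helper; field names default to those used in the question.
    """
    main = {"term": {main_field: value}}
    other = {"term": {other_field: value}}
    return {
        "query": {
            "bool": {
                # filter context: decides which documents match, adds no score
                "filter": [{"bool": {"should": [main, other]}}],
                # query context: optional because a filter is present,
                # but every matching clause adds its fixed boost to _score
                "should": [
                    {"constant_score": {"filter": main, "boost": boost}},
                    {"constant_score": {"filter": other, "boost": 1}},
                ],
            }
        }
    }


body = filter_and_score("foo")
```

With this shape, a document matching only mainField would score 2.0 and one matching only otherField 1.0, so sorting by _score puts mainField matches first.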
Assume I have a compound bool query with various "must" and "should" clauses, each of which may include different leaf queries such as "multi_match" and "match_phrase", as shown below.
How can I get the score from individual queries packed into a single query?
I know one way could be to break it down into multiple queries, execute each, and then aggregate the results at the code level (not the query level). However, I suppose that is less efficient, and I would also lose sorting, pagination and similar features from Elasticsearch.
I think the "Explanation API" is also not useful for me, since it provides very low-level details of scoring (inefficient and hard to parse), while I just need to know the score for each specific leaf query (which I have already named).
If I'm wrong on any terminology (e.g. compound, leaf), please correct me. The big picture is how to obtain individual scores from each sub-query inside of a bool query.
PS: I came across Different score functions in bool query. However, it does not return the scores. If I wrap my queries in "function_score", I want the scoring to be default but obtain the individual scores in response to the query.
Please see the snippet below:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "...",
"fields": [
"field1^3",
"field2^5"
],
"_name": "must1_mm",
"boost": 3
}
}
],
"should": [
{
"multi_match": {
"query": "...",
"fields": [
"field3^2",
"field4^5"
],
"_name": "should1_mm",
"boost": 2
}
},
{
"match_phrase": {
"field5": {
"_name": "phrase1",
"boost": 1.5,
"query": "..."
}
}
},
{
"match_phrase": {
"field6": {
"_name": "phrase2",
"boost": 1,
"query": "..."
}
}
}
]
}
}
}
I make several queries to Elasticsearch to retrieve documents by keywords (I match them by code or internal IDs). I don't really care about scoring in those queries, just retrieving the documents.
Would wrapping the bool queries I use in a constant_score filter increase performance, or make sense whatsoever?
It makes no sense. If you are using a bool query, you can apply a filter within it.
GET /_search
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
filter - The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching.
Moreover, constant_score should be used when you need a score: if there is a match, the "boost" value is applied as the score.
To sum up: use filter for filtering, and constant_score when you need a score.
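The summary above can be illustrated with two small builders. This is a minimal sketch: the helper names `filter_only` and `fixed_score` are hypothetical, the clauses are taken from the example query above, and the boost value of 1.5 is an arbitrary illustration.

```python
def filter_only(clauses):
    """Pure filtering: documents either match or not, and _score is 0.0."""
    return {"bool": {"filter": list(clauses)}}


def fixed_score(clause, boost=1.0):
    """Filtering plus a fixed _score equal to boost for every matching document."""
    return {"constant_score": {"filter": clause, "boost": boost}}


clauses = [
    {"term": {"status": "published"}},
    {"range": {"publish_date": {"gte": "2015-01-01"}}},
]

# No scoring needed: a plain bool filter is enough
retrieval_body = {"query": filter_only(clauses)}

# A fixed score is wanted: wrap the same filter in constant_score
scored_body = {"query": fixed_score(filter_only(clauses), boost=1.5)}
```

Both bodies select the same documents; they differ only in the _score attached to each hit.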
I am using function_score so that I can use its score_mode to take the maximum score of the bool queries I am using. I have two queries inside should, and I want the score of the document to be the maximum of the two. My code is given below, but when I pass a string that matches both, the scores are added instead of the maximum being taken. Can anyone please tell me how I can achieve that?
"function_score": {
"boost_mode": "max",
"score_mode": "max",
"query": {
bool: {
"disable_coord": true,
"should": [
{
bool: {
"disable_coord": true,
"must": [
{
"constant_score": { // here i am using this because to remove tf/idf factors from my scoring
boost: 1.04,
"query": {
query_string: {
query: location_search,
fields: ['places_city.city'],
// boost: 1.04
}
}
}
}
]
}
},
{
"constant_score": { // here i am using this because to remove tf/idf factors from my scoring
boost: 1,
"query": {
"fuzzy_like_this" : {
"fields" : ["places_city.city"],
"like_text" : "bangaloremn",
"prefix_length": 3,
"fuzziness": 2
}
}
}
}
], "minimum_should_match": 1
}
}
}
Yes, the bool query takes a sum by design. If you want the maximum score of two queries, you ought to look at the dis_max query. dis_max is designed to pick a "winner".
Roughly speaking, this would look like
{"query": {
"dis_max": {
"queries": [
{ /* your first constant_score query above */},
{/* your second constant_score query from above */}
]
}
}
}
Unfortunately, function score query doesn't have a great way of operating on more than one text query at a time. See this question. If you want to do any complex math with the scores of multiple queries, Solr actually has a lot more flexibility in this area.
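The difference between the two combination rules can be seen on the constant scores from the question (1.04 and 1). The sketch below is a simplified model of how the final score is combined, not Elasticsearch code; the function names are hypothetical, and tie_breaker mirrors the dis_max parameter of the same name, which defaults to 0.

```python
def bool_should_score(scores):
    """bool combines matching clauses by adding their scores."""
    return sum(scores)


def dis_max_score(scores, tie_breaker=0.0):
    """dis_max takes the best score, plus tie_breaker times the remaining scores."""
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


# Both constant_score queries from the question match the same document
matching = [1.04, 1.0]

summed = bool_should_score(matching)
winner = dis_max_score(matching)
```

With the default tie_breaker of 0 only the 1.04 "winner" survives, whereas the bool query adds the two values together, which matches the behaviour described in the question.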