Is it true that wrapping a query inside an Elasticsearch bool filter puts it in constant-score mode, no matter what is wrapped inside? - elasticsearch

I tested it with many levels of nesting, and it seems that a filter like the following:
{"query": {"bool": {"filter": [{"bool": {"must": [{"bool": {"must": [{"bool": {"must": [{"term": {"stringField": "kimchy"}}]}}]}}]}}]}}}
runs in constant score mode, as confirmed by _search?explain=true:
"_explanation":{"value":0.0,"description":"ConstantScore(stringField:kimchy)^0.0"
It seems to me that as long as the top level is wrapped in a bool filter or constant_score, the query runs without scoring. But I found no documentation on the ES website that says so. Does anyone know of a definitive answer?

That's absolutely correct and that's mentioned here in the official documentation:
In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated
[...]
Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation.
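As a minimal sketch of the nesting test from the question, the following Python rebuilds the same request body as a plain dict, without contacting a cluster. The helper names wrap_in_must and filter_context_body are made up for illustration; only the query DSL keys (bool, filter, must, term) come from the question.

```python
import json


def wrap_in_must(query, levels):
    """Nest `query` inside `levels` layers of {"bool": {"must": [...]}}."""
    for _ in range(levels):
        query = {"bool": {"must": [query]}}
    return query


def filter_context_body(query):
    """Put `query` under a top-level bool filter, i.e. in filter context."""
    return {"query": {"bool": {"filter": [query]}}}


# Hypothetical helper names; the term clause is the one from the question.
term = {"term": {"stringField": "kimchy"}}
body = filter_context_body(wrap_in_must(term, 3))
print(json.dumps(body))
```

Sending the printed body to _search with explain=true should let you check whether the _explanation still reports a ConstantScore wrapper at any nesting depth you choose.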

Related

Is there performance difference between constant_score and bool query using filter?

As for the "performance difference", I have not read anything reliable so far.
According to the official docs, for the filter clause of the bool query:
The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching.
And for the constant_score query:
Filter queries do not calculate relevance scores. To speed up performance, Elasticsearch automatically caches frequently used filter queries.
My first guess was:
The constant_score query does not calculate a score at all (TF-IDF or more advanced algorithms), while the bool query does the calculation but returns 0 (ignoring it), so the constant_score query would be more performant.
Besides, if you need a specific score, you have to use the constant_score query, since a filter-only bool query always returns a score of 0.
Then I read this Q&A: Elasticsearch : constant_score query vs bool.filter query
NO, there is no performance difference, since the two are the same.
This again follows from the official docs on filter context:
In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated.
And
Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation.
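To make the comparison concrete, here is a small sketch that builds both request bodies as Python dicts. The function names and the boost value 1.2 are assumptions for illustration; filter and boost are the documented parameters of constant_score. It says nothing about speed by itself, it only shows that both forms pass the same clause to a filter parameter and that only constant_score lets you choose the score.

```python
import json


def bool_filter_body(clause):
    """Filter-only bool query: matching documents get a score of 0."""
    return {"query": {"bool": {"filter": [clause]}}}


def constant_score_body(clause, boost=None):
    """constant_score query: matching documents get `boost` as their score."""
    inner = {"filter": clause}
    if boost is not None:
        inner["boost"] = boost  # arbitrary value chosen by the caller
    return {"query": {"constant_score": inner}}


clause = {"term": {"stringField": "kimchy"}}
print(json.dumps(bool_filter_body(clause)))
print(json.dumps(constant_score_body(clause, boost=1.2)))
```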

Filter vs query behaviour in constant_score

I am confused about the difference in behaviour between filter and query when wrapped in the constant_score compound query. Both give me a score of 1 for all documents when no boost is set. But the docs say that filter context is activated when we use the filter clause inside constant_score. If I get a constant score for all documents with the query parameter under constant_score, that means the query is running in filter context anyway. So why do the docs specifically mention the filter parameter inside constant_score? What am I missing?
You should read this part of documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/query-filter-context.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/query-dsl-constant-score-query.html
The Elasticsearch documentation gives a good explanation of your question.
However, filters outside constant_score return all values that match them. Filters inside constant_score return the result of the filter, limited by constant_score.
Regards.

Elastic Search: Use Query or Filtered Query for term, prefix or range queries?

I wonder if I should translate my Term Queries into a filtered Query to cache the results and to keep the score?
From the documentation of filtered query:
Filters are usually faster than queries because:
- they don’t have to calculate the relevance _score for each document: the answer is just a boolean “Yes, the document matches the filter” or “No, the document does not match the filter”.
- the results from most filters can be cached in memory, making subsequent executions faster
Also, from the filters:
Some filters already produce a result that is easily cacheable, and
the difference between caching and not caching them is the act of
placing the result in the cache or not. These filters, which include
the term, terms, prefix, and range filters, are by default cached and
are recommended to use (compared to the equivalent query version) when
the same filter (same parameters) will be used across multiple
different queries (for example, a range filter with age higher than
10).
My solution is that I would then put the term query inside a filtered query, where the "query" and "filter" predicate are the same.
I guess that I would then have the best of both worlds, scoring and caching. Does this make sense?
Sample Query:
{
  "_source": true,
  "query": {"term": {"displayName": "example name"}}
}
Optimized Filtered Query:
{
  "_source": true,
  "query": {
    "filtered": {
      "query": {"term": {"displayName": "example name"}},
      "filter": {"term": {"displayName": "example name"}}
    }
  }
}
I have tested this, but did not notice any performance gain. As I have to write a lot of search terms beforehand, I would like to know what the best solution is and why.

Elasticsearch Filtered query vs Filter [duplicate]

Is there any difference between "query and filter inside filtered" and "query and filter at the root"? For example:
Case 1:
{
  "query": {
    "filtered": {
      "query": {
        "term": {"title": "kitchen3"}
      },
      "filter": {
        "term": {"price": 1000}
      }
    }
  }
}
Case 2:
{
  "query": {
    "term": {"title": "kitchen3"}
  },
  "filter": {
    "term": {"price": 1000}
  }
}
I found this discussion http://elasticsearch-users.115913.n3.nabble.com/Filtered-query-vs-using-filter-outside-td3960119.html, but the URL it references returns a 404 and the explanation is a bit too concise for me.
Please explain the difference between these, or point me to any document that does. Thank you.
The difference is related to performance. "filter" on top level is always executed after the query. This means the query is executed on all documents, score is computed for all documents etc. - and only then documents not matching filter are excluded.
With "filtered" query there is a possibility that ES will optimize this computation, e.g. first executing the filter, then executing query on a limited set of documents, saving time on testing the documents that don't match the filter against the query, and on computing scores for them if they do match the query.
If you are performing multiple queries with same filter, then there are even more advantages: the filter may be cached, improving performance of each query even further. This applies to your example: "term" filters are cached by default.
You also can explicitly control the execution of "filtered" query (see the documentation) to optimize it for your particular use case.
The two kinds of filter can also be thought of as pre-filters and post-filters. As @alexey explained, the root-level filter is applied after the query, and the filter in the filtered query is applied before the query.
Beyond the order of execution, you also need to understand how the two affect aggregations. The filter in a "filtered" query is part of the query scope, so aggregations are calculated on the filtered output. With the root-level filter, aggregations are calculated on the results of the query alone, ignoring the filter. In both cases the returned documents are the same.
For example, the two queries you posted return the same results, but if you also run aggregations, the first query calculates aggregation counts from documents matching title kitchen3 and price 1000, while the second calculates them from documents matching title kitchen3 only, without the price 1000 filter.
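For readers on newer versions: the filtered query was removed in Elasticsearch 5.0 in favour of bool with must and filter, and the root-level filter has been called post_filter since 1.0. Under that assumption, Case 1 and Case 2 can be sketched as below. The helper names and the "prices" terms aggregation are made up for illustration; the code only builds the request bodies.

```python
import json


def with_query_filter(query, flt, aggs=None):
    """Case 1 style: the filter is part of the query, so aggregations see it."""
    body = {"query": {"bool": {"must": [query], "filter": [flt]}}}
    if aggs is not None:
        body["aggs"] = aggs
    return body


def with_post_filter(query, flt, aggs=None):
    """Case 2 style: the filter only narrows the hits, not the aggregations."""
    body = {"query": query, "post_filter": flt}
    if aggs is not None:
        body["aggs"] = aggs
    return body


title = {"term": {"title": "kitchen3"}}
price = {"term": {"price": 1000}}
aggs = {"prices": {"terms": {"field": "price"}}}  # hypothetical aggregation

print(json.dumps(with_query_filter(title, price, aggs), indent=2))
print(json.dumps(with_post_filter(title, price, aggs), indent=2))
```

With the first body the "prices" buckets would be computed over documents matching both title and price. With the second they would be computed over every document matching the title, while the hits are still restricted by price.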

Elasticsearch query too many results

I'm trying to set up a simple search that returns simple results with a custom ordering. The ordering I get back is fine, based on a custom score.
The problem is that for this query
"query": {
    "query_string": {
        "query": query_term,
        "fields": ["name_auto"],
    }
}
NOTE: name_auto is an edge n-gram field in Elasticsearch.
I always get a result set, even if the query does not make any sense.
Example:
I have an Elasticsearch index populated with the names of all the Android applications.
If I search for face, I get back all the results related to it, ordered by number of comments on the Play Store, meaning [facebook, facebook messenger, ...]
The problem is that when I query for something like facesomeuselesschars, I still get the same results as before, but for sure there is nothing that matches "someuselesschars".
Can anybody help with this?
Elasticsearch will always return results that match your query, even if the score of those results is poor. Your query for 'facesomeuselesschars' will match anything that has 'face' in it because of your ngrams (e.g. the first four characters of your query will match multiple tokens in your index).
The rest of the characters in your query will simply lower the score of the returned match, but not prevent it from being returned.
If you want to set a minimum score that a result must reach, you can use the min_score parameter.
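A minimal sketch of what that could look like, reusing the query from the question. The function name and the threshold 0.5 are assumptions for illustration. Scores are not normalized, so a useful cutoff has to be found by looking at the scores your own index produces.

```python
import json


def search_body(query_term, min_score):
    """Build a search body that drops hits scoring below `min_score`."""
    return {
        "min_score": min_score,  # top-level search parameter
        "query": {
            "query_string": {
                "query": query_term,
                "fields": ["name_auto"],
            }
        },
    }


# 0.5 is an arbitrary placeholder threshold, not a recommended value.
body = search_body("facesomeuselesschars", 0.5)
print(json.dumps(body, indent=2))
```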
