How do nested bool queries work? Is there a single wrapper that can disable scoring on all the nested levels? - elasticsearch

If you wrap a bool query in a constant_score query, does it still calculate scores for the internal queries? Is there another easy way to disable scoring?
Hi, I have an update. I have a query where no scoring is required, so I wrote it in two forms and load-tested both against 10,000 documents.
These are the two query structures I load-tested:
{
  "query": {
    "bool": {
      "filter": [
        {
          "must": {
            "bool": {
              "must": [
                {
                  bool:.......
                }
              ]
            }
          }
        }
      ]
    }
  }
}
And the second one is:
{
  "query": {
    "bool": {
      "filter": [
        {
          "filter": {
            "bool": {
              "filter": [
                {
                  bool:.......
                }
              ]
            }
          }
        }
      ]
    }
  }
}
I found that the first query took almost twice as long as the second. Why did this happen?
Also, do the internal bool queries inside the filter in the first example run in query context or filter context?
I have read the Elasticsearch documentation and cannot find details on how this works internally.
Thanks in advance!

A query clause in Elasticsearch runs in one of two contexts: query context or filter context. Query context answers how well a document matches the clause, so it calculates a score. Filter context only answers whether a document matches, and no scoring is done.
To answer your question: if you don't want scoring for a bool query, simply put it in filter context. More information is in the Elasticsearch documentation on query and filter context.
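A minimal sketch of that advice, written as Python dictionaries that mirror the query DSL. The helper name in_filter_context and the field status are hypothetical and do not come from the question. The point is only that a query placed under bool.filter is used for matching and its score is ignored, including the scores of any bool queries nested inside it.

```python
import json


def in_filter_context(query):
    """Wrap a query clause so it is evaluated in filter context (no scoring)."""
    return {"bool": {"filter": [query]}}


# Hypothetical inner query; replace it with the real nested bool.
inner = {"bool": {"must": [{"term": {"status": "active"}}]}}

body = {"query": in_filter_context(inner)}
print(json.dumps(body, indent=2))
```

Wrapping the same clause in constant_score.filter should behave the same way for matching; the difference is only in the constant score assigned to each hit.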

Caching is probably why. See the documentation: "Frequently used filters will be cached automatically by Elasticsearch, to speed up performance."
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html

Related

In Elasticsearch, how do I combine multiple filters with OR without affecting the score?

In Elasticsearch, I want to filter my results with two different clauses aggregated with OR e.g. return documents with PropertyA=true OR PropertyB=true.
I've been trying to do this using a bool query. My base query is just a text search in must. If I put both clauses in the filter occurrence type, it aggregates them with an AND. If I put both clauses in the should occurrence type with minimum_should_match set to 1, then I get the right results. But then, documents matching both conditions get a higher score because "should" runs in a query context.
How do I filter to only documents matching either of two conditions, without increasing the score of documents matching both conditions?
Thanks in advance
You need to leverage the constant_score query, so everything runs in the filter context:
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "minimum_should_match": 1,
          "should": [
            {
              "term": {
                "PropertyA": true
              }
            },
            {
              "term": {
                "PropertyB": true
              }
            }
          ]
        }
      }
    }
  }
}
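If the text search from the question should still drive the score, one option is to keep it in must and move the OR condition into the filter clause of the outer bool. This is a sketch in Python, not the answerer's exact query: the function name, the text field title, and the search text are hypothetical placeholders.

```python
def text_search_with_or_filter(field, text):
    """Score on the text match only; require PropertyA OR PropertyB as a filter."""
    either_property = {
        "bool": {
            "should": [
                {"term": {"PropertyA": True}},
                {"term": {"PropertyB": True}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "query": {
            "bool": {
                # Query context: contributes to the score.
                "must": [{"match": {field: text}}],
                # Filter context: yes/no only, so matching both
                # properties does not raise the score.
                "filter": [either_property],
            }
        }
    }


body = text_search_with_or_filter("title", "example text")
```

Because the nested bool sits under filter, its should clauses only decide whether a document is included.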

What is the need for "bool" in queries with single must/should/must_not?

In Elasticsearch I can use a bool query with a single must/should clause. Is there any point to that?
This example, with only one must inside bool, works:
GET /logstash-2021.02.25/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "level": "Error"
          }
        }
      ]
    }
  }
}
This other example, without the bool (since there is only one must), doesn't work:
GET /logstash-2021.02.25/_search
{
  "query": {
    "must": [
      {
        "match": {
          "level": "Error"
        }
      }
    ]
  }
}
A bool query matches documents using boolean combinations of other queries. It maps to Lucene's BooleanQuery and is built from one or more boolean clauses, each with a typed occurrence (must, should, must_not, filter). must is one of those occurrence types, not a query in its own right, so it cannot appear directly under query; that is why the second example fails. With a single clause you can drop the bool and put the match query directly under query.
The bool query is most useful when you need to combine several queries and filters in one request.
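To make the relationship concrete, here is a small Python sketch that unwraps a bool containing exactly one must clause and nothing else. The helper name unwrap_single_must is hypothetical; it illustrates that the valid single-clause form is the inner query itself, not a bare must.

```python
def unwrap_single_must(query):
    """Return the inner query when `query` is a bool with only one must clause.

    Any other query is returned unchanged.
    """
    bool_body = query.get("bool")
    if not isinstance(bool_body, dict) or set(bool_body) != {"must"}:
        return query
    must = bool_body["must"]
    # must may be a single object or a list of objects.
    clauses = must if isinstance(must, list) else [must]
    return clauses[0] if len(clauses) == 1 else query


wrapped = {"bool": {"must": [{"match": {"level": "Error"}}]}}
body = {"query": unwrap_single_must(wrapped)}
print(body)
```

The resulting body places match directly under query, which is the form to use if you want to avoid the bool wrapper for a single clause.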

Elasticsearch wildcard query crashes cluster

I run the query below on a large Elasticsearch cluster. The cluster becomes unresponsive.
{
  "size": 10000,
  "query": {
    "bool": {
      "must": [
        {
          "regexp": {
            "message": {
              "value": ".*exception.*"
            }
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "beat.hostname": "ip-xxx-xx-xx-xx"
                }
              }
            ]
          }
        },
        {
          "range": {
            "@timestamp": {
              "lt": 1518459660000,
              "format": "epoch_millis",
              "gte": 1518459600000
            }
          }
        }
      ]
    }
  }
}
When I remove the wildcarded .*exception.* and replace it with any non-wildcarded string like xyz, it returns quickly. Although the query uses a wildcarded expression, it also restricts the search to a small time range and a specific host, so I would expect it to be a very simple query. Is there any reason the Elasticsearch server can't handle it? The cluster has 10 nodes and 20 TB of data.
See the documentation for Regexp Query. It clearly states the following:
Note: The performance of a regexp query heavily depends on the regular
expression chosen. Matching everything like .* is very slow
Ideally, change the text analysis on the message field to use a word delimiter token filter with split_on_case_change set to true. Then something like NullPointerException is indexed as three separate tokens [Null, Pointer, Exception], which lets you search for exception without a regex. The caveat is that you need to reindex all your documents.
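A sketch of what such index settings might look like, written as a Python dictionary that mirrors the create-index request body. Several things here are assumptions rather than part of the answer: the names split_camel_case and message_analyzer are made up, the whitespace tokenizer and the extra lowercase filter are one reasonable choice among several, the filter type is word_delimiter_graph (the graph variant of the word delimiter filter), and the typeless mappings block assumes Elasticsearch 7 or later.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Hypothetical filter name; splits NullPointerException
                # into Null / Pointer / Exception at case changes.
                "split_camel_case": {
                    "type": "word_delimiter_graph",
                    "split_on_case_change": True,
                }
            },
            "analyzer": {
                # Hypothetical analyzer name.
                "message_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["split_camel_case", "lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "message": {"type": "text", "analyzer": "message_analyzer"}
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With the lowercase filter in place, a plain match query for exception on the message field should find the token without any regular expression.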
Another quick thing to try might be to keep your filter conditions on the hostname and timestamp in a filter context, which will prefilter documents before running your regexp query. This may be a short-term solution for you until you fix the text analysis.
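The short-term suggestion could look roughly like this in Python: the regexp stays in must, while the hostname and time range move to filter. The function name build_query is hypothetical, and the time field is assumed to be the usual Beats field @timestamp.

```python
def build_query(pattern, hostname, start_ms, end_ms):
    """Regexp in query context; hostname and time range in filter context."""
    return {
        "size": 10000,
        "query": {
            "bool": {
                "must": [
                    {"regexp": {"message": {"value": pattern}}}
                ],
                "filter": [
                    {"term": {"beat.hostname": hostname}},
                    {
                        "range": {
                            # Assumed field name, see the note above.
                            "@timestamp": {
                                "gte": start_ms,
                                "lt": end_ms,
                                "format": "epoch_millis",
                            }
                        }
                    },
                ],
            }
        },
    }


body = build_query(".*exception.*", "ip-xxx-xx-xx-xx", 1518459600000, 1518459660000)
```

This does not make the leading-wildcard pattern itself any cheaper, so it is a stopgap rather than a replacement for fixing the analysis.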

How to specify the execution order of filter and query in an Elasticsearch query

Consider the following query in Elasticsearch:
GET nyc_visionzero/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy": {
            "on_street_name": "AVENUE"
          }
        }
      ],
      "filter": {
        "term": {
          "borough": "MANHATTAN"
        }
      }
    }
  }
}
Is the filter part executed first and then the fuzzy query, or is it the other way around? What if I want to change the order of execution? How can I do that?
This question relates to the query vs. filter context topic. Everything in the query context (here query.bool.must) contributes to the score of a document, whereas the conditions in the filter context (here query.bool.filter) are a yes/no decision.
So from a performance perspective, filters are faster and can be cached. Queries, on the other hand, allow for some fuzziness.
There is a much more detailed explanation in the Elasticsearch docs on query and filter context.

performance query in elasticsearch

I have 2 queries:
GET _search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "idpays": 250
        }
      }
    }
  }
}
and
GET _search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": {
            "term": {
              "idpays": 250
            }
          }
        }
      }
    }
  }
}
These 2 queries return the same results.
Which one performs better: the first, or the second with bool and must?
Regards
Since Elasticsearch uses Lucene under the hood, all queries are rewritten into simpler Lucene queries before they are executed. If you use an overly complicated query for a simple task, Elasticsearch spends more time rewriting it into a simpler one.
Add "profile": true at the root of your request body to get a detailed breakdown of the query's performance, and look at the rewrite time.
The larger the rewrite time, the more complex the query. At a glance I would expect the second one to be slower, but you should analyze the results yourself.
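A small Python sketch of how the rewrite time could be read out of a profiled response. It assumes the response has already been parsed into a dictionary, with rewrite_time (in nanoseconds) reported per search under profile.shards[].searches[]. The function name is hypothetical, and the sample response is a hand-written stand-in, not a real measurement.

```python
def total_rewrite_time(response):
    """Sum rewrite_time (nanoseconds) across all shards and searches."""
    total = 0
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            total += search.get("rewrite_time", 0)
    return total


# Request body for the first query, with profiling switched on.
request = {
    "profile": True,
    "query": {"constant_score": {"filter": {"term": {"idpays": 250}}}},
}

# Illustrative response fragment only; the numbers are invented.
sample_response = {
    "profile": {
        "shards": [
            {"searches": [{"rewrite_time": 1200}]},
            {"searches": [{"rewrite_time": 800}]},
        ]
    }
}

print(total_rewrite_time(sample_response))  # 2000
```

Running both queries several times with profiling enabled and comparing these totals gives a fairer picture than a single run, since caching and warm-up affect the timings.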
