How to specify the execution order of filter and query in an Elasticsearch query

Consider the following query in Elasticsearch:
GET nyc_visionzero/_search
{
"query": {
"bool": {
"must": [{
"fuzzy": {
"on_street_name": "AVENUE"
}
}
],
"filter": {
"term": {
"borough": "MANHATTAN"
}
}
}
}
}
Is the filter part executed first and then the fuzzy query, or is it the other way around? And if I want to change the order of execution, how can I do that?

This question relates to the query vs. filter context topic. Everything in the query context (here query.bool.must) contributes to the score of a document, whereas the conditions in the filter context (here query.bool.filter) are a yes/no decision.
So from a performance perspective, filters are faster and can be cached. On the other hand, queries allow for some fuzziness.
There is a much more detailed explanation of this in the Elasticsearch docs on query and filter context.

Related

In Elasticsearch, how do I combine multiple filters with OR without affecting the score?

In Elasticsearch, I want to filter my results with two different clauses combined with OR, e.g. return documents with PropertyA=true OR PropertyB=true.
I've been trying to do this using a bool query. My base query is just a text search in must. If I put both clauses in the filter occurrence type, it combines them with AND. If I put both clauses in the should occurrence type with minimum_should_match set to 1, then I get the right results. But then documents matching both conditions get a higher score, because "should" runs in query context.
How do I filter to only documents matching either of two conditions, without increasing the score of documents matching both conditions?
Thanks in advance
You need to leverage the constant_score query, so everything runs in the filter context:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"PropertyA": true
}
},
{
"term": {
"PropertyB": true
}
}
]
}
}
}
}
}
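If the filter conditions are assembled in application code, the same structure can be built programmatically. This is only a minimal Python sketch: the helper name any_of is hypothetical, the bodies are plain dicts to pass as the search body, and the field name "title" and its search text in the second variant are placeholders that do not come from the question. The second variant shows one way to keep the scored text search in must while the OR conditions stay in filter context.

```python
def any_of(*clauses):
    """Return a bool clause matching documents that satisfy at least one of `clauses`.

    any_of is a hypothetical helper, not part of any Elasticsearch client.
    """
    if not clauses:
        raise ValueError("at least one clause is required")
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}


either_property = any_of(
    {"term": {"PropertyA": True}},
    {"term": {"PropertyB": True}},
)

# Same structure as the answer above: everything runs in filter context.
filter_only_body = {"query": {"constant_score": {"filter": either_property}}}

# Variant that keeps a scored text search; "title" and the search text are
# placeholders, not taken from the question.
scored_body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "some text"}}],
            "filter": [either_property],
        }
    }
}
```

In the second body, matching one or both properties should make no difference to the score, because the should block sits inside the outer filter clause.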

What is the need for "bool" in queries with single must/should/must_not?

In Elasticsearch I can use a bool query with a single must/should clause. Is there any point to that?
This example with only one must inside bool works:
GET /logstash-2021.02.25/_search
{
"query":
{
"bool":
{
"must":
[{
"match":
{
"level": "Error"
}
}]
}
}
}
In this other example, without the "bool" (since there is only one must), it doesn't work:
GET /logstash-2021.02.25/_search
{
"query":
{
"must":
[{
"match":
{
"level": "Error"
}
}]
}
}
A bool query is a query that matches documents matching boolean combinations of other queries. It maps to Lucene's BooleanQuery and is built from one or more boolean clauses, each with a typed occurrence (must, filter, should, must_not). These occurrence types are clauses of the bool query, not queries in their own right, which is why the second example fails: "query" must contain a query type such as bool or match, and must is not one. With a single condition you can drop the bool wrapper only by placing the inner query (here match) directly under "query".
The bool query is widely used for complex combinations of queries and filters, because it lets Elasticsearch evaluate all the conditions together in a single request.
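Since must is only valid as a clause inside bool, a single condition can either stay wrapped in bool or be sent as the inner query itself. The sketch below illustrates that in Python; simplify_single_must is a hypothetical helper written for this example, not an Elasticsearch API.

```python
def simplify_single_must(query):
    """Unwrap a bool query whose only content is a single must clause.

    Hypothetical helper for illustration; any other query is returned unchanged.
    """
    bool_body = query.get("bool")
    if not isinstance(bool_body, dict) or set(bool_body) != {"must"}:
        return query
    must = bool_body["must"]
    if isinstance(must, dict):
        return must
    if isinstance(must, list) and len(must) == 1:
        return must[0]
    return query


wrapped = {"bool": {"must": [{"match": {"level": "Error"}}]}}
body = {"query": simplify_single_must(wrapped)}
# body == {"query": {"match": {"level": "Error"}}}
```

Both forms should match the same documents; keeping the bool wrapper mainly pays off when more clauses are likely to be added later.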

How can we treat a few tokens as a phrase in an Elasticsearch query

I want part of a query to be treated as a phrase. For example, I want to search for "Can you show me documents for Hospitality and Airline Industry".
Here I want Airline Industry to be treated as a phrase. I can't find any such setting in multi_match.
Even when we use a multi_match query with "Can you show me documents for Hospitality and \"Airline Industry\"", the default analyzer breaks it into separate tokens. I don't want to change my analyzer settings. I have also found that we can do this with simple_query_string, but then we cannot apply a filter the way we can with multi_match inside a bool query, and I want to filter on certain fields as well.
search_text="Can you show me documents for Hospitality and Airline Industry". Now I want to pass Airline Industry as a phrase and search my indexed documents against 2 fields.
Say I have existing code like this:
if filter:
    qry = {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": search_text,
                        "type": "best_fields",
                        "fields": ["TITLE1", "TEXT"],
                        "tie_breaker": 0.3,
                    }
                },
                "filter": {"terms": {"GRP_CD": ["1234", "5678"]}},
            }
        }
    }
else:
    qry = {
        "query": {
            "multi_match": {
                "query": search_text,
                "type": "best_fields",
                "fields": ["TITLE1", "TEXT"],
                "tie_breaker": 0.3,
            }
        }
    }
But then I realised this code does not handle Airline Industry as a phrase, even though I am passing a search string like this:
"Can you show me documents for Hospitality and \"Airline Industry\""
From the Elasticsearch documentation I learned that there is a query that might handle this:
qry={"query":{
"simple_query_string":{
"query":"Can you show me documents for Hospitality and \"Airline Industry\"",
"fields":["TITLE1","TEXT"] }
} }
But now my issue is: what if the user wants to apply a filter? With the filter query above I cannot pass a phrase, and a bool query does not seem possible with simple_query_string.
You can always combine queries using a bool query. Let's go through this case by case. Before going to the cases, I would like to clarify one thing about filter. The filter clause of a bool query behaves just like a must clause, with the difference that any query inside the filter clause (even another bool query with must/should clauses) runs in filter context. Filter context means that this part of the query is not considered for score calculation.
Now let's move on to the cases:
Case 1: Only query and no filters.
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Can you show me documents for Hospitality and \"Airline Industry\"",
"fields": [
"TITLE1",
"TEXT"
]
}
}
]
}
}
}
Notice that the query is the same as the one in your question. All I have done is wrap it in a bool query. This doesn't change the logic of the query, but it makes it easier to add queries to the filter clause programmatically.
Case 2: Phrase query with filter.
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Can you show me documents for Hospitality and \"Airline Industry\"",
"fields": [
"TITLE1",
"TEXT"
]
}
}
],
"filter": [
{
"terms": {
"GRP_CD": [
"1234",
"5678"
]
}
}
]
}
}
}
This way you can combine a query (query context) with filters.
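The two cases can be folded into one builder so that the filter is attached only when the user supplies one. This is a minimal Python sketch, assuming the field names from the question (TITLE1, TEXT, GRP_CD); the function name build_query and its parameters are hypothetical.

```python
def build_query(search_text, group_codes=None):
    """Wrap simple_query_string in a bool query and add a terms filter if given.

    build_query and group_codes are illustrative names, not an existing API.
    """
    bool_query = {
        "must": [
            {
                "simple_query_string": {
                    "query": search_text,
                    "fields": ["TITLE1", "TEXT"],
                }
            }
        ]
    }
    if group_codes:
        # Filter context: restricts the result set without affecting the score.
        bool_query["filter"] = [{"terms": {"GRP_CD": list(group_codes)}}]
    return {"query": {"bool": bool_query}}


search_text = 'Can you show me documents for Hospitality and "Airline Industry"'
qry_without_filter = build_query(search_text)                  # Case 1
qry_with_filter = build_query(search_text, ["1234", "5678"])   # Case 2
```

Because the query is always wrapped in bool, the calling code no longer needs separate if/else branches for the filtered and unfiltered cases.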

How do nested bool queries work? Is there a single wrapper that can disable scoring on all the nested levels?

If you wrap a bool query in a constant_score query, does it still calculate scores for the internal queries? Is there another easy way to disable scoring?
Update: I have a query where no scoring is required, so I wrote it in two forms and load tested both with 10000 documents.
These are the two query structures I load tested:
{
"query": {
"bool": {
"filter": [
{
"must": {
"bool": {
"must": [
{
bool:.......
}
]
}
}
}
]
}
}
}
And the second one is:
{
"query": {
"bool": {
"filter": [
{
"filter": {
"bool": {
"filter": [
{
bool:.......
}
]
}
}
}
]
}
}
}
What I found was that the first query took almost twice as long as the second. I would like to know why this happened.
Also, do the internal bool queries inside filter in the first example run in query context or filter context?
I have read the Elasticsearch documentation and cannot find references or details on how this works internally.
Thanks in advance!
A query can run in one of two contexts in Elasticsearch: query context and filter context. Query context tells how well a document matches the query, i.e. it calculates a score, whereas filter context only tells whether a document matches the query, and no scoring is done.
To answer your question: if you don't want scoring for a bool query, simply put it in filter context. More information can be found in the Elasticsearch documentation on query and filter context.
Caching is probably why. See the documentation: "Frequently used filters will be cached automatically by Elasticsearch, to speed up performance."
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html
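Putting a whole query tree into filter context can be done with a single outer wrapper, since everything nested under a filter clause runs in filter context. A minimal Python sketch under that assumption follows; without_scoring is a hypothetical helper name, and the inner query with its "status" field is a placeholder rather than the query from the question.

```python
def without_scoring(query):
    """Wrap `query` in bool.filter so that it runs in filter context.

    Hypothetical helper for illustration, not an Elasticsearch API.
    """
    return {"bool": {"filter": [query]}}


# Placeholder nested query; the field and value are illustrative only.
inner = {"bool": {"must": [{"bool": {"must": [{"term": {"status": "active"}}]}}]}}
body = {"query": without_scoring(inner)}
```

Wrapping the query in constant_score with a filter is the other common option; the two should differ only in the constant score assigned to every hit.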

Elastic search wildcard query crashes cluster

I run the query below on a large Elasticsearch cluster. The cluster becomes unresponsive.
{
"size": 10000,
"query": {
"bool": {
"must": [
{
"regexp": {
"message": {
"value": ".*exception.*"
}
}
},
{
"bool": {
"should": [
{
"term": {
"beat.hostname": "ip-xxx-xx-xx-xx"
}
}
]
}
},
{
"range": {
"@timestamp": {
"lt": 1518459660000,
"format": "epoch_millis",
"gte": 1518459600000
}
}
}
]
}
}
}
When I remove the wildcarded .*exception.* and replace it with any non-wildcarded string like xyz, it returns fast. Though the query uses a wildcarded expression, it also looks for a small time range and a specific host. I would think this is a very simple query. Is there any reason why the Elasticsearch server can't handle this query? The cluster has 10 nodes and 20 TB of data.
See the documentation for the regexp query. It clearly states the following:
Note: The performance of a regexp query heavily depends on the regular expression chosen. Matching everything like .* is very slow.
Ideally, you would change the text analysis on the message field to use a word delimiter token filter (word_delimiter) with split_on_case_change set to true. Then something like NullPointerException will be indexed as three separate tokens [Null, Pointer, Exception]. This lets you search for exception without using a regex. The caveat is that you need to reindex all your documents.
Another quick thing to try is to move your conditions on the hostname and timestamp into filter context, which can help prefilter documents before the regexp query runs. This may be a short-term solution until you fix the text analysis.
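The short-term suggestion could look roughly like the Python sketch below, which keeps the regexp in must and moves the hostname and time range into filter. The function name build_exception_query and its parameters are hypothetical, and the timestamp field name is passed in because it depends on your index mapping. Whether this actually changes how Elasticsearch orders the work internally is an assumption worth verifying on your own cluster.

```python
def build_exception_query(hostname, start_ms, end_ms, timestamp_field):
    """Build the search body with hostname and time range in filter context.

    Hypothetical helper; only the regexp clause remains in query context.
    """
    return {
        "size": 10000,
        "query": {
            "bool": {
                "must": [
                    {"regexp": {"message": {"value": ".*exception.*"}}}
                ],
                "filter": [
                    {"term": {"beat.hostname": hostname}},
                    {
                        "range": {
                            timestamp_field: {
                                "gte": start_ms,
                                "lt": end_ms,
                                "format": "epoch_millis",
                            }
                        }
                    },
                ],
            }
        },
    }
```

Calling it with the hostname, the two epoch-millisecond bounds and the name of the timestamp field gives a body that selects the same documents as the original query, with the single-clause should wrapper around the hostname term dropped.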
