Elasticsearch difference between MUST and SHOULD bool query - elasticsearch

What is the difference between MUST and SHOULD bool query in ES?
If I ONLY want results that contain my terms should I then use must ?
I have a query that should only contain certain values, and also no results that has a lower date/timestamp than todays time/date - NOW
Also
Can i use multiple filters inside a must like the code bellow:
"filtered": {
"filter": {
"bool" : {
"must" : {
"term" : { "type" : 1 }
"term" : { "totals" : 14 }
"term" : { "groupId" : 3 }
"range" : {
"expires" : {
"gte": "now"
}
}
},

must means: The clause (query) must appear in matching documents. These clauses must match, like logical AND.
should means: At least one of these clauses must match, like logical OR.
Basically they are used like logical operators AND and OR. See this.
Now in a bool query:
must means: Clauses that must match for the document to be included.
should means: If these clauses match, they increase the _score; otherwise, they have no effect. They are simply used to refine the relevance score for each document.
Yes you can use multiple filters inside must.

Since this is a popular question, I would like to add that in Elasticsearch version 2 things changed a bit.
Instead of filtered query, one should use bool query in the top level.
If you don't care about the score of must parts, then put those parts into filter key. No scoring means faster search. Also, Elasticsearch will automatically figure out, whether to cache them, etc. must_not is equally valid for caching.
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
Also, mind that "gte": "now" cannot be cached, because of millisecond granularity. Use two ranges in a must clause: one with now/1h and another with now so that the first can be cached for a while and the second for precise filtering accelerated on a smaller result set.

As said in the documentation:
Must: The clause (query) must appear in matching documents.
Should: The clause (query) should appear in the matching document. In a boolean query with no must clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.
In other words, results will have to be matched by all the queries present in the must clause ( or match at least one of the should clauses if there is no must clause.
Since you want your results to satisfy all the queries, you should use must.
You can indeed use filters inside a boolean query.

Related

Elastic Search Must vs Should vs ExcludeFilters

Can someone elaborate on these three filters in elastic search namely must, should, excludeFilters.
Are these parts of bool queries which come under compound category of queries in elastic search.
must is the same as logical AND operator and should is the same as logical OR operator
These clauses are used to combine multiple conditions (using bool query) when you are creating your DSL query
All these queries are included inside the bool query. For example
{
"query": {
"bool": {
"must": {},
"should": {},
"filter": {}
}
}
}
Update 1:
should (contribute to relevancy score) this means that when you use a should clause, then the search results are returned based on a number of factors like doc count, length of the field, frequency, total term frequency etc.
Whereas in the case of filter (doesn't contribute to relevancy score), this means that it just gives an answer of yes/no i.e whether a document matches or not. (It does not consider other factors as must and should clause considers)

Elasticsearch: What is the difference between a match and a term in a filter?

I was following an ES tutorial, and at some point I wrote a query using term in the filter instead the recommended solution using match. My understanding is that match was used in the query part to get scoring, while term was used in the filter part to just remove hits before enter the query part. To my surprise match also works in the filter part.
What is the difference between:
GET blogs/_search
{
"query": {
"bool": {
"filter": {
"match": {
"category.keyword": "News"
}
}
}
}
}
and:
GET blogs/_search
{
"query": {
"bool": {
"filter": {
"term": {
"category.keyword": "News"
}
}
}
}
}
Both returns the same hits, and the score is 0 for all hits.
What is the behaviour or match in a filter clause? I would expect it to yield some score, but it does not.
What I thought:
term : does not analyze either the parameter or the field, and it is a yes/no scenario.
match : analyzes parameter and field and calculates a score of how good they match.
But when using match against a keyword in the filter part of the query, how does it behave?
The match query is a high-level query that resorts to using a term query if it needs to.
Scoring has nothing to do with using match instead of term. Scoring kicks in when you use bool/must/should instead of bool/filter.
Here is how the match query works:
First, it checks the type of the field.
If it's a text field then the value will be analyzed, either with the analyzer specified in the query (if any), or with the search- or index-time analyzer specified in the mapping.
If it's a keyword field (like in your case), then the input is not analyzed and taken "as is"
Since you're using the match query on a keyword field and your input is a single term, nothing is analyzed and the match query resorts to using a term query underneath. This is why you're seeing the same results.
In general, it's always best to use a match query as it is smart enough to know what to do given the field you're querying and the input data you're searching for.
You can read more about the difference between the two here.

What is the difference between `constant_score + filter` and `term` query?

I have two queries in Elasticsearch:
{
"term" : {
"price" : 20
}
}
and
"constant_score" : {
"filter" : {
"term" : {
"price" : 20
}
}
}
They are returning the same query result. I wonder what the main difference between them. I read some articles about scoring document. And I believe both queries are scoring document. The constant_score will use default score 1.0 to match the document's score. So I don't see much difference between these two.
The results would be exactly the exact.
However, the biggest difference is that the constant_score/filter version will cache the results of the term query since it's run in a filter context. All future executions will leverage that cache. Also, one feature of the constant_score query is that the returned score is always equal to the given boost value (which defaults to 1)
The first query will be run outside of the filter context and hence not benefit from the filter cache.

Using term or terms with one value in Elasticsearch queries

I am querying an Elasticsearch index using the values of a field. Sometimes, I have to extract all the documents having a field set to exactly one value; Some other times I have to retrieve all the documents having a field, set with one of the values in a list of values.
The latter use case contains the former. Can I use a single query using the terms construct?
POST /_search
{
"query": {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
}
}
Or, in cases I know I need to search only for a unique value, it is better to use the term construct?
POST _search
{
"query": {
"term" : { "user" : "kimchy" }
}
}
Which approach is better regarding performance? Does Elasticsearch perform any optimization if the value in the terms construct is unique?
Thanks to all.
See this link. Terms query is automatically cached while term query is not . So, the next you run the same query, the took time for query for execution will be faster. So if you have a case where you need to run the same query again and again, terms query is a good choice. If not, there is not much of difference between the two.

How to enable fuzziness for phrase queries in ElasticSearch

We're using ElasticSearch for searching through millions of tags. Our users should be able to include boolean operators (+, -, "xy", AND, OR, brackets). If no hits are returned, we fall back to a spelling suggestion provided by ES and search again. That's our query:
$ curl -XGET 'http://127.0.0.1:9200/my_index/my_type/_search' -d '
{
"query" : {
"query_string" : {
"query" : "some test query +bools -included",
"default_operator" : "AND"
}
},
"suggest" : {
"text" : "some test query +bools -included",
"simple_phrase" : {
"phrase" : {
"field" : "my_tags_field",
"size" : 1
}
}
}
}
Instead of only providing a fallback to spelling suggestions, we'd like to enable fuzzy matching. If, for example, a user searches for "stackoverfolw", ES should return matches for "stackoverflow".
Additional question: What's the better performing method for "correcting" spelling errors? As it is now, we have to perform two subsequent requests, first with the original search term, then with the by ES suggested term.
The query_string does support some fuzziness but only when using the ~ operator, which I think doesn't your usecase. I would add a fuzzy query then and put it in or with the existing query_string. For instance you can use a bool query and add the fuzzy query as a should clause, keeping the original query_string as a must clause.
As for your additional question about how to correct spelling mistakes: I would use fuzzy queries to automatically correct them and two subsequent requests if you want the user to select the right correction from a list (e.g. Did you mean), but your approach sounds good too.

Resources