Elasticsearch: should + minimum_should_match vs must - elasticsearch

I test with these 2 queries
Query with must
{
"size": 200,
"from": 0,
"query": {
"bool": {
"must": [ {
"match": {
"_all": "science"
}
},
{
"match": {
"category": "fiction"
}
},
{
"match": {
"country": "us"
}
}
]
}
}
}
Query with should + minimum_should_match
{
"size": 200,
"from": 0,
"query": {
"bool": {
"should": [ {
"match": {
"_all": "science"
}
},
{
"match": {
"category": "fiction"
}
},
{
"match": {
"country": "us"
}
}
],
minimum_should_match: 3
}
}
}
Both queries give me same result, I don't know the difference between these 2, when we should use minimum_should_match?

I guess you mean minimum_number_should_match, right?
In both cases it would be the same because you have the same number of clauses in should. minimum_number_should_match usually is used when you have more clauses than the number you specify there.
For example if you have 5 should clauses, but for some reason you only need three of them to be fulfilled you would do something like this:
{
"query": {
"bool": {
"should": [
{
"term": {
"tag": "wow"
}
},
{
"term": {
"tag": "elasticsearch"
}
},
{
"term": {
"tag": "tech"
}
},
{
"term": {
"user": "plchia"
}
},
{
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
],
"minimum_should_match": 3
}
}
}

That's correct and desired behavior. Let's decipher it a little bit:
Boolean query with must clauses means that all clauses under must section are required to match. Just like in English - it means strong obligation.
Boolean query with should clauses means that some clauses are required to match, whereas the others are not (i.e. soft obligation). The default number of clauses that must match here is simply 1. And to override this behavior the minimum_should_match parameter is coming into play. If you specify minimum_should_match=3 it will mean 3 clauses under should must match. From the practical perspective it exactly the same as specifying those clauses with must.
Hope it explains it in details.

Related

ElasticSearch lucene query with subclauses conversion to ES syntax

I've been trying to convert a lucene style query to ES query syntax but I'm getting stuck on sub-clauses. e.g.
(title:history^10 or series:history) and (NOT(language:eng) OR language:eng^5) and (isfree eq 'true' OR (isfree eq 'false' AND owned eq 'abc^5'))
This states that "get me a match for history in 'title' or 'series' but boost the title match AND where the language doesn't have to be english, but if if is then boost it AND where the match is free or where it isn't free then make sure it's owned by customer abc".
I feel this is a tricky query but it seems to work correctly. Converting the clauses to ES syntax is confusing me as I don't really have the concept of brackets. I think I need to use bool queries... I have the following which I know doesn't apply the criteria correctly - it says you should have (language:eng OR isFree eq 'true' OR owned:abc). I can't seem to make the mental leap to build the must/should with NOT's in it.
Help please?
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "history",
"fields": [
"title^10.0",
"series"
]
}
}
],
"should": [
{
"term": {
"language": {
"value": "eng",
"boost": 5
}
}
},
{
"term": {
"isFree": {
"value": true
}
}
},
{
"term": {
"owned": {
"value": "abc",
"boost": 5
}
}
}
]
}
},
Your query is almost correct, the only thing that wasn't translated correctly was this part of the query:
(isfree eq 'true' OR (isfree eq 'false' AND owned eq 'abc^5'))
If I understand your post correctly, this is basically saying boost the 'owned' field by a factor of five when it's value is 'abc' and the price is free. To implement this, you need to use an additional bool query that:
Filters results by isFree: true
Boosts the owned field of any documents matching abc
"bool": {
"filter": [
{
"term": {
"isFree": {
"value": false
}
}
}
],
"must": [
{
"term": {
"owned": {
"value": "abc",
"boost": 5
}
}
}
]
}
Since this is not intended to limit the result set and only boost results that meet this criteria, the bool query above should be placed inside your parent bool's should section. The final query looks like:
POST /myindex/_search
{
"explain": true,
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "history",
"fields": [
"title^10",
"series"
]
}
}
],
"should": [
{
"term": {
"language": {
"value": "eng",
"boost": 5
}
}
},
{
"bool": {
"filter": [
{
"term": {
"isFree": {
"value": false
}
}
}
],
"must": [
{
"term": {
"owned": {
"value": "abc",
"boost": 5
}
}
}
]
}
}
]
}
}
}
Note: Using should and must yield the same results for that inner bool, I honestly am not sure which would be better to use so I just arbitrarily used must.

Select distinct values of bool query elastic search

I have a query that gets me some user post data from an elastic index. I am happy with that query, though I need to make it return rows with unique usernames. Current, it displays relevant posts by users, but it may display one user twice..
{
"query": {
"bool": {
"should": [
{ "match_phrase": { "gtitle": {"query": "voice","boost": 1}}},
{ "match_phrase": { "gdesc": {"query": "voice","boost": 1}}},
{ "match": { "city": {"query": "voice","boost": 2}}},
{ "match": { "gtags": {"query": "voice","boost": 1} }}
],"must_not": [
{ "term": { "profilepicture": ""}}
],"minimum_should_match" : 1
}
}
}
I have read about aggregations but didn't understand much (also tried to use aggs but didn't work either).... any help is appreciated
You would need to use terms aggregation to get all unique users and then use top hits aggregation to get only one result for each user. This is how it looks.
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"gtitle": {
"query": "voice",
"boost": 1
}
}
},
{
"match_phrase": {
"gdesc": {
"query": "voice",
"boost": 1
}
}
},
{
"match": {
"city": {
"query": "voice",
"boost": 2
}
}
},
{
"match": {
"gtags": {
"query": "voice",
"boost": 1
}
}
}
],
"must_not": [
{
"term": {
"profilepicture": ""
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"unique_user": {
"terms": {
"field": "userid",
"size": 100
},
"aggs": {
"only_one_post": {
"top_hits": {
"size": 1
}
}
}
}
},
"size": 0
}
Here size inside user aggregation is 100, you can increase that if you have more unique users(default is 10), also the outermost size is zero to get only aggregation results. One important thing to remember is your user ids have to be unique, i.e ABC and abc will be considered different users, you might have to make your userid not_analyzed to be sure about that. More on that.
Hope this helps!!

Constant weight for each part in a bool query in Elasticsearch

Is it possible to get number of "shoulds" matched or return constant weight per each should matched. For example, if I have a bool query like this:
"query": {
"bool": {
"should": [
{ "match": { "last_name": "Shcheklein" }},
{ "match": { "first_name": "Bart" }}
]
}
}
I would like to get:
score == 1 if one of the fields (last_name, first_name) matches (even in a fuzzy sense)
score == 2 if both fields match
And is it possible to get a list of fields matched?
you could probably use constant score to achieve this and
use highlighting to figure out the fields that matched.
Example :
{
"query": {
"bool": {
"disable_coord": true,
"should": [
{
"constant_score": {
"query": {
"match": {
"last_name": "Shcheklein"
}
},
"boost": 1
}
},
{
"constant_score": {
"query": {
"match": {
"first_name": "bart"
}
},
"boost": 1
}
}
]
}
}
}

Elasticsearch query: Multiply final score using nested object and function score

I have documents with some data and a specific omit list in it (see mapping and example data):
I would like to write an ES query which does the following:
Calculate some "basic" score for the documents (Query 1):
{
"explain": true,
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"term": {
"type": "TYPE1"
}
}
}
},
{
"function_score": {
"linear": {
"number": {
"origin": 30,
"scale": 20
}
}
}
}
]
}
}
}
At the end multiply the score according to the omit percent of a specific id (In the example I used omit valut for A"omit.id": "A"). As a demonstration in Query 2 I calculated this multiplier.
{
"query": {
"nested": {
"path": "omit",
"query": {
"function_score": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"omit.id": "A"
}
}
}
},
"functions": [
{
"linear": {
"omit.percent": {
"origin": 0,
"scale": 50,
"offset": 0,
"decay": 0.5
}
}
}
],
"score_mode": "multiply"
}
}
}
}
}
To achieve this final multiplication I faced with the following problems:
If I calculate linear function score inside of a nested query, (according to my interpretation) I cannot use any other field in function_score query.
I cannot multiply the calculated score with any other function_score which is encapsulated into a nested query.
I would like to ask for any advice to resolve this issue.
Note that maybe I should get rid of this nested type and use key-value pairs instead. For example:
{
"omit": {
"A": {
"percent": 10
},
"B": {
"percent": 100
}
}
}
but unfortunately there will be a lot of keys, which would result a huge (continuously growing) mapping, so I not prefer this option.
At least I figured out a possible solution based on a "non-nested way". The complete script can be found here.
I modified the omit list as described in the question:
"omit": {
"A": {
"percent": 10
},
"B": {
"percent": 100
}
}
In addition I set the enabled flag to false to not have these elements in the mapping:
"omit": {
"type" : "object",
"enabled" : false
}
The last trick was to use script_score as a function_score's function, because only there I could use the value of percent by _source.omit.A.percent script:
{
"query": {
"function_score": {
"query": {
...
},
"script_score": {
"lang": "groovy",
"script": "if (_source.omit.A){(100-_source.omit.A.percent)/100} else {1}"
},
"score_mode": "multiply"
}
}
}

How to do nested AND and OR filters in ElasticSearch?

My filters are grouped together into categories.
I would like to retrieve documents where a document can match any filter in a category, but if two (or more) categories are set, then the document must match any of the filters in ALL categories.
If written in pseudo-SQL it would be:
SELECT * FROM Documents WHERE (CategoryA = 'A') AND (CategoryB = 'B' OR CategoryB = 'C')
I've tried Nested filters like so:
{
"sort": [{
"orderDate": "desc"
}],
"size": 25,
"query": {
"match_all": {}
},
"filter": {
"and": [{
"nested": {
"path":"hits._source",
"filter": {
"or": [{
"term": {
"progress": "incomplete"
}
}, {
"term": {
"progress": "completed"
}
}]
}
}
}, {
"nested": {
"path":"hits._source",
"filter": {
"or": [{
"term": {
"paid": "yes"
}
}, {
"term": {
"paid": "no"
}
}]
}
}
}]
}
}
But evidently I don't quite understand the ES syntax. Is this on the right track or do I need to use another filter?
This should be it (translated from given pseudo-SQL)
{
"sort": [
{
"orderDate": "desc"
}
],
"size": 25,
"query":
{
"filtered":
{
"filter":
{
"and":
[
{ "term": { "CategoryA":"A" } },
{
"or":
[
{ "term": { "CategoryB":"B" } },
{ "term": { "CategoryB":"C" } }
]
}
]
}
}
}
}
I realize you're not mentioning facets but just for the sake of completeness:
You could also use a filter as the basis (like you did) instead of a filtered query (like I did). The resulting json is almost identical with the difference being:
a filtered query will filter both the main results as well as facets
a filter will only filter the main results NOT the facets.
Lastly, Nested filters (which you tried using) don't relate to 'nesting filters' like you seemed to believe, but related to filtering on nested-documents (parent-child)
Although I have not understand completely your structure this might be what you need.
You have to think tree-wise. You create a bool where you must (=and) fulfill the embedded bools. Each embedded checks if the field does not exist or else (using should here instead of must) the field must (terms here) be one of the values in the list.
Not sure if there is a better way, and do not know the performance.
{
"sort": [
{
"orderDate": "desc"
}
],
"size": 25,
"query": {
"query": { #
"match_all": {} # These three lines are not necessary
}, #
"filtered": {
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"not": {
"exists": {
"field": "progress"
}
}
},
{
"terms": {
"progress": [
"incomplete",
"complete"
]
}
}
]
}
},
{
"bool": {
"should": [
{
"not": {
"exists": {
"field": "paid"
}
}
},
{
"terms": {
"paid": [
"yes",
"no"
]
}
}
]
}
}
]
}
}
}
}
}

Resources