How does the Elasticsearch bool query work? - elasticsearch

Could you please explain how the Elasticsearch bool query works?
I've read the documentation here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
but it is too terse for me to understand. Look at this query:
{
"bool" : {
"must" : {
"term" : { "user" : "kimchy" }
},
"must_not" : {
"range" : {
"age" : { "from" : 10, "to" : 20 }
}
},
"should" : [
{
"term" : { "tag" : "wow" }
},
{
"term" : { "tag" : "elasticsearch" }
}
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
I cannot understand how 'should' and 'minimum_should_match' are used. Could you please explain them to me?

In the query you provided, a document is returned only if it satisfies the must clause and does not satisfy the must_not clause. The should clauses are combined with OR: because minimum_should_match is 1, at least one of them has to match, and every should clause that matches adds to the score, so documents matching more of them come first.
Now consider this case:
{
"bool": {
"should": [
{
"term": {
"tag": "wow"
}
},
{
"term": {
"tag": "elasticsearch"
}
}
],
"minimum_should_match": 1,
"boost": 1
}
}
Here there is no must or must_not, but the should clauses are still combined with OR, not AND: the query returns documents that contain either the tag wow or the tag elasticsearch, and documents that contain both score higher. When a bool query has should clauses but no must or filter clause, minimum_should_match defaults to 1; when a must or filter clause is present, it defaults to 0 and the should clauses only influence the score. To require both tags, set minimum_should_match to 2 or move both clauses into must.
For more detail on minimum_should_match, see:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html
Please let me know whether this clarifies the difference.
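The matching rules described above can be modelled in a few lines of plain Python. This is only a sketch of the documented semantics, not Elasticsearch code: the names bool_matches and term are made up for illustration, clauses are ordinary Python predicates, scoring is ignored, and minimum_should_match is assumed to be an integer.

```python
from typing import Any, Callable

Doc = dict
Clause = Callable[[Doc], bool]  # hypothetical stand-in for a leaf query


def term(field: str, value: Any) -> Clause:
    """Exact match; a list-valued field matches if it contains the value."""
    def check(doc: Doc) -> bool:
        actual = doc.get(field)
        if isinstance(actual, list):
            return value in actual
        return actual == value
    return check


def bool_matches(doc, must=(), filter_=(), must_not=(), should=(),
                 minimum_should_match=None) -> bool:
    """Return True if doc would be matched by a bool query with these clauses."""
    if minimum_should_match is None:
        # Default: 1 when there are should clauses but no must or filter
        # clauses, otherwise 0 (should then only influences the score).
        minimum_should_match = 1 if should and not (must or filter_) else 0
    required_ok = all(clause(doc) for clause in (*must, *filter_))
    excluded = any(clause(doc) for clause in must_not)
    matched_should = sum(1 for clause in should if clause(doc))
    return required_ok and not excluded and matched_should >= minimum_should_match


tags = [term("tag", "wow"), term("tag", "elasticsearch")]
# A document carrying only one of the two tags still matches a should-only query.
print(bool_matches({"tag": ["wow"]}, should=tags))
```

Passing minimum_should_match=2 to the same call is what turns the OR into "both tags required".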

Related

Elasticsearch bool query join order

I'd like to know the order in which ES executes the query clauses (must, should, filter, must_not) that are part of a bool query. Here is the sample query from the ES docs:
{ "query": {
"bool" : {
"must" : {
"term" : { "user.id" : "kimchy" }
},
"filter": {
"term" : { "tags" : "production" }
},
"must_not" : {
"range" : {
"age" : { "gte" : 10, "lte" : 20 }
}
},
"should" : [
{ "term" : { "tags" : "env1" } },
{ "term" : { "tags" : "deployed" } }
],
"minimum_should_match" : 1,
"boost" : 1.0
} } }
From the documentation it looks like the query clauses are joined with AND. For example, a rough SQL translation of the search DSL above would be:
select * from user where user_id = 'kimchy' and tags in ('production') and not (age >= 10 and age <= 20) and tags in ('env1', 'deployed');
I wasn't able to find official documentation on this, but I did see some texts saying that ES query evaluation depends heavily on cost approximations. I'm wondering how to map the ordering to SQL-like syntax so that we can develop a clear mental picture when authoring ES queries. It also feels like ordering might have some effect on deeply nested boolean AND/OR queries.
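One way to build that mental picture is to render a bool body as a SQL-like WHERE expression. The sketch below is a simplification under stated assumptions: bool_to_sql, leaf_to_sql and as_list are hypothetical helpers, only term and range leaves are handled, and minimum_should_match is limited to 0 or 1. It reads each clause type by name, so the logical result does not depend on the order in which the clauses are written; it says nothing about the order in which Elasticsearch actually executes them.

```python
RANGE_OPS = {"gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def as_list(clauses):
    """The DSL accepts either a single clause object or an array of clauses."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def leaf_to_sql(leaf: dict) -> str:
    """Render a term or range leaf query as a SQL-like predicate."""
    (kind, body), = leaf.items()
    (field, spec), = body.items()
    if kind == "term":
        if isinstance(spec, dict):  # long form: {"value": ...}
            spec = spec["value"]
        return f"{field} = {spec!r}"
    if kind == "range":
        bounds = [f"{field} {RANGE_OPS[op]} {value!r}" for op, value in spec.items()]
        return "(" + " AND ".join(bounds) + ")"
    raise ValueError(f"unsupported leaf query: {kind}")


def bool_to_sql(bool_body: dict) -> str:
    must = as_list(bool_body.get("must"))
    filters = as_list(bool_body.get("filter"))
    must_not = as_list(bool_body.get("must_not"))
    should = as_list(bool_body.get("should"))
    default_msm = 1 if should and not (must or filters) else 0
    msm = bool_body.get("minimum_should_match", default_msm)
    if msm not in (0, 1):
        raise ValueError("this sketch only handles minimum_should_match of 0 or 1")

    parts = [leaf_to_sql(c) for c in must + filters]       # joined with AND
    parts += ["NOT " + leaf_to_sql(c) for c in must_not]   # negated, still ANDed
    if should and msm == 1:                                # one OR group
        parts.append("(" + " OR ".join(leaf_to_sql(c) for c in should) + ")")
    # With msm == 0 the should clauses only affect ranking, so they are omitted.
    return " AND ".join(parts)


sample = {
    "must": {"term": {"user.id": "kimchy"}},
    "filter": {"term": {"tags": "production"}},
    "must_not": {"range": {"age": {"gte": 10, "lte": 20}}},
    "should": [{"term": {"tags": "env1"}}, {"term": {"tags": "deployed"}}],
    "minimum_should_match": 1,
    "boost": 1.0,
}
print(bool_to_sql(sample))
```

The difference between must and filter (scoring versus no scoring) has no SQL counterpart, which is why both end up as plain AND terms here.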

Elasticsearch - use a field match to boost only and not to fetch the document

I have a query phrase that needs to match in any of the fields name, summary or description, or to match the name field exactly.
Now I have one more field, brand. A match in this field should only be used to boost results: if a document matches only in the brand field, it should not be in the result set.
Without brand, I solve this with the query below:
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"multi_match": {
"query": "Cadbury chocklate milk",
"fields": ["name", "summary", "description"]
}
},
{
"term": {
"name_keyword": {
"value": "Cadbury chocklate milk"
}
}
}
]
}
}
}
This works fine for me.
How do I fetch the data using the same query but boost docs that have brand:cadbury, without enlarging the recall set (that is, without matching on brand:cadbury alone)?
Thanks!
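Wrapping your existing query in a bool inside must, with the brand clause as a should next to it, should work for you: the must part decides which documents match, and the should part only adds to the score.
multi_match has several types; for phrase matching you have to use type: phrase.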
{
"query": {
"bool": {
"must": [
{ "bool" :
{ "should" : [ {
"multi_match" :{
"type" : "phrase",
"query" : "Cadbury chocklate milk",
"fields" : ["name", "summary", "description"]
} }, {
"term": {
"name_keyword": {
"value": "Cadbury chocklate milk"
} }
}
]
}
}
],
"should" : {
"term" : {
"brand" : {
"value" : "cadbury"
}
}
}
}
}
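The pattern in the query above generalises to any base query. Below is a minimal sketch that builds the same request body as a Python dict; boost_only is a hypothetical helper name, and the code only constructs JSON without sending it to a cluster.

```python
import json


def boost_only(base_query: dict, boost_clauses: list) -> dict:
    """Wrap base_query so boost_clauses affect ranking but not which docs match."""
    return {
        "bool": {
            "must": [base_query],     # decides the result set
            "should": boost_clauses,  # optional next to must, so it only adds score
        }
    }


base = {
    "bool": {
        "minimum_should_match": 1,
        "should": [
            {"multi_match": {
                "type": "phrase",
                "query": "Cadbury chocklate milk",
                "fields": ["name", "summary", "description"],
            }},
            {"term": {"name_keyword": {"value": "Cadbury chocklate milk"}}},
        ],
    }
}

body = {"query": boost_only(base, [{"term": {"brand": {"value": "cadbury"}}}])}
print(json.dumps(body, indent=2))
```

The outer bool deliberately has no minimum_should_match: setting it to 1 there would make the brand match mandatory and shrink the result set.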

Query documents with access control filter

Each document in my Elasticsearch index has two access control lists containing user ids. One is an allow list, the other is a deny list. I am trying to add a filter to a given query that considers these ACLs. I thought I could use a bool query with a must clause for the given query, a filter clause for the allow list, and a must_not clause for the deny list. What I have so far (example for user 1):
{
"bool" : {
"must" : {
[given query]
},
"filter" : [ {
"match" : {
"acl.allow" : {
"query" : "/user/1",
"type" : "boolean"
}
}
}],
"must_not" : [ {
"match" : {
"acl.deny" : {
"query" : "/user/1",
"type" : "boolean"
}
}
}]
}
}
Unfortunately, this query does not return the desired result. It returns objects that have not listed user 1 in their allow list (a behavior I don't understand). Also, it (obviously) ignores objects with empty access control lists (which should be visible to anyone). Any suggestions to fix that?
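I figured it out. First of all, match isn't a good fit for this kind of query, because the query text is run through the analyzer. Using term instead left me puzzled at first, because I got no results: a term query looks up the exact value in the index, and an analyzed string field stores only the tokens the analyzer produced, not the whole string /user/1. Term queries on these fields only return results once the fields are set to not_analyzed (from Elasticsearch 5.0 on, the keyword field type plays that role). Thus I changed my mapping: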
"acl": {
"properties": {
"allow": {
"type": "string",
"index": "not_analyzed"
},
"deny": {
"type": "string",
"index": "not_analyzed"
}
}
}
My second problem, treating objects with empty ACLs as visible to anyone, was solved using exists nested in must_not nested in bool. This is the recommended substitute for the deprecated missing query. My final query looks like this and passed all ACL-related tests I could think of.
{
"bool" : {
"must" : {
[given query]
},
"filter" : {
"bool" : {
"should" : [ {
"terms" : {
"acl.allow" : [ "/user/1" ]
}
}, {
"bool" : {
"must_not" : {
"exists" : {
"field" : "acl.allow"
}
}
}
} ]
}
},
"must_not" : {
"terms" : {
"acl.deny" : [ "/user/1" ]
}
}
}
}
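To check the logic of the final query without a cluster, the sketch below pairs a builder for the two ACL clauses with a plain-Python model of what they select. Both acl_clauses and is_visible are hypothetical names, and the model assumes that a missing, null or empty acl.allow counts as "field does not exist", which is how the exists query treats such values.

```python
def acl_clauses(user: str) -> dict:
    """Build the filter and must_not parts for one user path such as "/user/1"."""
    return {
        "filter": {
            "bool": {
                "should": [
                    {"terms": {"acl.allow": [user]}},
                    {"bool": {"must_not": {"exists": {"field": "acl.allow"}}}},
                ]
            }
        },
        "must_not": {"terms": {"acl.deny": [user]}},
    }


def is_visible(doc: dict, user: str) -> bool:
    """Plain-Python model of which documents those clauses let through."""
    acl = doc.get("acl") or {}
    allow = acl.get("allow") or []  # missing, null and [] all mean "no allow list"
    deny = acl.get("deny") or []
    allowed = user in allow or not allow
    return allowed and user not in deny


docs = [
    {"id": 1, "acl": {"allow": ["/user/1"], "deny": []}},
    {"id": 2, "acl": {"allow": ["/user/2"]}},
    {"id": 3},                               # empty ACLs: visible to anyone
    {"id": 4, "acl": {"deny": ["/user/1"]}},
]
print([d["id"] for d in docs if is_visible(d, "/user/1")])
```

The inner bool has only should clauses, so at least one of them must match by default; that is what makes "listed in allow OR no allow list" work inside the filter.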

Elastic(search): How to structure nested queries correctly?

I'm currently quite confused about how queries are structured in Elasticsearch. Let me explain what I mean with the following template, which works fine for me:
{
"template" : {
"query" : {
"filtered" : {
"query" : {
"bool" : {
"must" : [
{ "match" : {
"user" : "{{param_user}}"
} },
{ "match" : {
"session" : "{{param_session}}"
} },
{ "range" : {
"date" : {
"gte" : "{{param_from}}",
"lte" : "{{param_to}}"
}
} }
]
}
}
}
}
}
}
OK, so I want to get the entries of a specific session of a user in a certain time period. Now if you take a look at this link http://www.elastic.co/guide/en/elasticsearch/guide/current/combining-filters.html you can find the following query:
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"should" : [
{ "term" : {"price" : 20}},
{ "term" : {"productID" : "XHDK-A-1293-#fJ3"}}
],
"must_not" : {
"term" : {"price" : 30}
}
}
}
}
}
}
In this example the "filter" keyword comes right after "filtered". However, if I replace my second "query" with "filter" as in the example, my template no longer works. This is really counterintuitive and I spent a lot of time figuring it out.
Another issue: I assumed that to match several fields I could just write something like:
{
"query" : {
"match" : {
"user" : "{{param_user}}",
"session" : "{{param_session}}"
}
}
}
but it seems that I have to use a bool query, which I didn't know of, so I searched for 'elastic multi match' but got something completely different.
My question: where can I find how to structure a query properly (something like a PEG grammar)? The documentation only gives basic examples but doesn't state what we can actually do and how.
Best regards,
Jan
Edit: OK, I just found out by accident that I cannot replace "query" with "filter", because "match" is a query and not a filter. But then what about "range"? It seems to be a query as well as a filter... Is there a summary of keywords specifying in which context they can be used?
Is there a summary of keywords specifying in which context they can be used?
I wouldn't call them keywords. There are simply queries and filters that share the same name (though not all of them do).
Here is everything you need. For example, range exists both as a query and as a filter. All you need is to understand the difference between filters and queries.
For example, if you want to move the range section from the query to the filter, you can do it as shown in the code below (not tested). Since your code already uses the filtered query, you can just add a filter section right after the query section.
{
"template": {
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"user": "{{param_user}}"
}
},
{
"match": {
"session": "{{param_session}}"
}
}
]
}
},
"filter": {
"range": {
"date": {
"gte": "{{param_from}}",
"lte": "{{param_to}}"
}
}
}
}
}
}
}
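Just remember that filters such as term compare against the exact values stored in the index, so on string fields they behave predictably only when the field is not analyzed.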
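The filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0; on those versions the same split between scoring queries and non-scoring filters is written as a bool query with a filter clause. Here is a minimal sketch that builds such a body in Python, where session_query is a hypothetical helper and the parameter values are made up:

```python
import json


def session_query(user: str, session: str, date_from: str, date_to: str) -> dict:
    """bool/filter equivalent of the filtered template: match scores, range filters."""
    return {
        "query": {
            "bool": {
                # Query context: these clauses contribute to the relevance score.
                "must": [
                    {"match": {"user": user}},
                    {"match": {"session": session}},
                ],
                # Filter context: a yes/no check with no effect on the score.
                "filter": [
                    {"range": {"date": {"gte": date_from, "lte": date_to}}},
                ],
            }
        }
    }


body = session_query("jan", "abc123", "2015-01-01", "2015-01-31")
print(json.dumps(body, indent=2))
```

Because every clause is a separate object in an array, this also shows why two fields cannot share a single match object: each match clause names exactly one field.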

Elasticsearch DSL query from an SQL statement

I'm new to Elasticsearch. I don't think I fully understand the concept of queries and filters. In my case I just want to use filters, as I don't want advanced features like scoring.
How would I convert the following SQL statement into an Elasticsearch query?
SELECT * FROM advertiser
WHERE company like '%com%'
AND sales_rep IN (1,2)
What I have so far:
curl -XGET 'localhost:9200/advertisers/advertiser/_search?pretty=true' -d '
{
"query" : {
"bool" : {
"must" : {
"wildcard" : { "company" : "*com*" }
}
}
},
"size":1000000
}'
How do I add the OR filters on the sales_rep field?
Thanks
Add a "should" clause after your must clause. Because your bool query has a must clause, the should clauses are optional by default and only affect the score, so set "minimum_should_match" (spelled "minimum_number_should_match" in some older releases) to 1 to require at least one of them. Check out the bool query docs.
For your case, this should work.
"should" : [
{
"term" : { "sales_rep_id" : "1" }
},
{
"term" : { "sales_rep_id" : "2" }
}
],
"minimum_should_match" : 1
The same concept works for bool filters. Just change "query" to "filter". The bool filter docs are here.
I come across this post 4 years too late...
Anyways, perhaps the following code could be useful...
{
"query": {
"filtered": {
"query": {
"wildcard": {
"company": "*com*"
}
},
"filter": {
"bool": {
"should": [
{
"terms": {
"sales_rep_id": [ "1", "2" ]
}
}
]
}
}
}
}
}
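Since the question asks for filters only, both conditions can also go into the filter clause of a bool query, which is available from Elasticsearch 2.0 on. A minimal sketch follows; advertiser_filter is a hypothetical helper, and it uses the field name sales_rep from the SQL statement, whereas the answers above assume sales_rep_id.

```python
import json


def advertiser_filter(company_substring: str, sales_reps: list) -> dict:
    """WHERE company LIKE '%x%' AND sales_rep IN (...) as non-scoring filters."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # LIKE '%com%' becomes a wildcard on both sides.
                    {"wildcard": {"company": f"*{company_substring}*"}},
                    # IN (1, 2) becomes a single terms clause, an implicit OR.
                    {"terms": {"sales_rep": sales_reps}},
                ]
            }
        }
    }


body = advertiser_filter("com", [1, 2])
print(json.dumps(body, indent=2))
```

All clauses in filter are ANDed together, so no should or minimum_should_match is needed for this statement.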
