In the Elasticsearch documentation for the bool query at this link:
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-bool-query.html
It doesn't say what the containing structure should be. If I just use bool the way they show it, it's totally wrong. I need to surround it with some silly combination of query / filter / filtered query. I'm not sure what the correct way to form a JSON query in Elasticsearch is. The docs seem to be completely contradictory in many places about what goes where and how. Are there any Elasticsearch experts out there who know how to properly form a query?
First of all, there is a "bool" query and a "bool" filter, and they go in different places and do slightly different things. As a general rule, if you can use a filter, do it: many filters can be cached, and even when they aren't, they are a little faster because no relevance score is computed. If you need a "match", then you need a query.
The example on the page you referenced could actually be used either way:
As a query:
POST /test_index/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "user": "kimchy"
        }
      },
      "must_not": {
        "range": {
          "age": {
            "from": 10,
            "to": 20
          }
        }
      },
      "should": [
        {
          "term": {
            "tag": "wow"
          }
        },
        {
          "term": {
            "tag": "elasticsearch"
          }
        }
      ],
      "minimum_should_match": 1,
      "boost": 1
    }
  }
}
Or as a filter (in a filtered query):
POST /test_index/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": {
            "term": {
              "user": "kimchy"
            }
          },
          "must_not": {
            "range": {
              "age": {
                "from": 10,
                "to": 20
              }
            }
          },
          "should": [
            {
              "term": {
                "tag": "wow"
              }
            },
            {
              "term": {
                "tag": "elasticsearch"
              }
            }
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}
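To see the two placements side by side, here is a small Python sketch. It builds both request bodies as plain dicts from one shared set of bool clauses. The helper names are hypothetical, and nothing is sent to a cluster. The filtered form only applies to older clusters: the filtered query was deprecated in 2.0 and removed in 5.0.

```python
import json


def bool_body():
    """The bool clauses from the example above, shared by both placements."""
    return {
        "must": {"term": {"user": "kimchy"}},
        "must_not": {"range": {"age": {"from": 10, "to": 20}}},
        "should": [
            {"term": {"tag": "wow"}},
            {"term": {"tag": "elasticsearch"}},
        ],
        "minimum_should_match": 1,
    }


def as_query(bool_clauses, boost=1):
    """Query context: bool sits directly under "query" and takes part in scoring."""
    return {"query": {"bool": dict(bool_clauses, boost=boost)}}


def as_filtered_filter(bool_clauses):
    """1.x filter context: bool sits under query.filtered.filter; no scores are computed."""
    return {"query": {"filtered": {"filter": {"bool": dict(bool_clauses)}}}}


if __name__ == "__main__":
    print(json.dumps(as_query(bool_body()), indent=2))
    print(json.dumps(as_filtered_filter(bool_body()), indent=2))
```

The boost is only added in the query version, because filters do not compute scores.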
Also, I totally get the frustration with the ES documentation. I've been working with it for a couple of years now, and it doesn't seem to be getting any better. Maybe the people in charge of documentation just don't care all that much. The conspiracy-theory view would be that bad documentation helps the company sell professional services.
I'm writing some code to generate queries, and I wondered whether one way of generating them is kinder to the server than another.
So this query:
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Text": {
              "query": "Scooby Shaggy corridor",
              "fuzziness": 1,
              "operator": "AND"
            }
          }
        }
      ]
    }
  }
}
is logically equivalent to this:
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Text": {
              "query": "Scooby",
              "fuzziness": 1
            }
          }
        },
        {
          "match": {
            "Text": {
              "query": "Shaggy",
              "fuzziness": 1
            }
          }
        },
        {
          "match": {
            "Text": {
              "query": "corridor",
              "fuzziness": 1
            }
          }
        }
      ]
    }
  }
}
but is either one easier for the server to process?
Or does it make no difference?
I realise this is a trivial example but could it make a difference with more complex queries?
If someone who knows a bit about how ElasticSearch behaves under the hood could make an observation I'd be grateful.
Thanks,
Adam.
Elasticsearch will itself rewrite your multi-term match query into the logical equivalent. From the match query documentation:
The match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text.
But you should keep the multi-term match query and let Elasticsearch do the job. It's more maintainable, and you can control the rewriting with the rewrite parameter.
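As a rough illustration of that rewrite, the Python sketch below expands a single match with operator AND into a bool/must of one match clause per term. That is the shape of the second query in the question. The helper `expand_and_match` is hypothetical. Splitting on whitespace is only a stand-in for real analysis: Elasticsearch runs the field's analyzer (tokenizer, lowercasing and so on), so the actual terms can differ.

```python
def expand_and_match(field, text, fuzziness):
    """Turn one AND match into a bool/must of per-term matches.

    Whitespace splitting is an assumption standing in for the field's analyzer.
    """
    clauses = [
        {"match": {field: {"query": term, "fuzziness": fuzziness}}}
        for term in text.split()
    ]
    return {"bool": {"must": clauses}}


def search_body(query, from_=0, size=10):
    """Wrap a query in the from/size envelope used in the question."""
    return {"from": from_, "size": size, "query": query}


single = search_body({"bool": {"must": [{"match": {"Text": {
    "query": "Scooby Shaggy corridor", "fuzziness": 1, "operator": "AND"}}}]}})
expanded = search_body(expand_and_match("Text", "Scooby Shaggy corridor", 1))
```

Since Elasticsearch performs the equivalent expansion on its own, generating the single match is usually the simpler choice for query-building code.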
I tested with these two queries.
Query with must:
{
  "size": 200,
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": "science"
          }
        },
        {
          "match": {
            "category": "fiction"
          }
        },
        {
          "match": {
            "country": "us"
          }
        }
      ]
    }
  }
}
Query with should + minimum_should_match:
{
  "size": 200,
  "from": 0,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "_all": "science"
          }
        },
        {
          "match": {
            "category": "fiction"
          }
        },
        {
          "match": {
            "country": "us"
          }
        }
      ],
      "minimum_should_match": 3
    }
  }
}
Both queries give me the same results. What is the difference between them, and when should we use minimum_should_match?
minimum_should_match is the right name. You may also see minimum_number_should_match, which is an older alias for the same setting.
In your case both queries behave the same because minimum_should_match equals the number of should clauses, so every clause has to match, just as with must. minimum_should_match is normally used when you have more should clauses than the number you require.
For example, if you have five should clauses but only need three of them to match, you would do something like this:
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "tag": "wow"
          }
        },
        {
          "term": {
            "tag": "elasticsearch"
          }
        },
        {
          "term": {
            "tag": "tech"
          }
        },
        {
          "term": {
            "user": "plchia"
          }
        },
        {
          "range": {
            "age": {
              "gte": 10,
              "lte": 20
            }
          }
        }
      ],
      "minimum_should_match": 3
    }
  }
}
That's correct and desired behavior. Let's decipher it a little bit:
In a bool query, every clause under must is required to match. Just as in English, "must" expresses a strong obligation.
Clauses under should express a soft obligation: only some of them are required to match. When the bool query contains only should clauses, the default number that must match is 1. When it also has must clauses, the should clauses become optional and only affect scoring. The minimum_should_match parameter overrides this default. If you specify minimum_should_match=3, three of the should clauses must match. With exactly three should clauses, as in your queries, that is in practice the same as putting all three under must.
Hope this explains it.
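The matching rules described above can be modelled in a few lines of Python. This is a simplified sketch of query-context semantics only. It takes the per-clause match results as booleans, a hypothetical input since Elasticsearch evaluates the clauses itself, and it ignores scoring entirely.

```python
from itertools import product


def bool_matches(must, should, minimum_should_match=None):
    """Return True if a document would match a bool query (query context only).

    must   -- list of booleans, one per must clause (True = clause matched)
    should -- list of booleans, one per should clause
    minimum_should_match -- int, or None to use the default
    """
    if not all(must):
        return False  # every must clause is required
    if minimum_should_match is None:
        # Default: with only should clauses, at least one has to match;
        # with must clauses present, should clauses are optional.
        minimum_should_match = 1 if should and not must else 0
    return sum(should) >= minimum_should_match


def must_equals_should_all():
    """Check that three must clauses behave like three should clauses with minimum 3."""
    return all(
        bool_matches(list(c), []) == bool_matches([], list(c), 3)
        for c in product([True, False], repeat=3)
    )


# Five should clauses, three required: a document matching clauses 1, 2 and 4 passes.
print(bool_matches([], [True, True, False, True, False], 3))  # True
```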
I have a scenario in Elasticsearch where my indexed docs look like this:
{"id":1,"name":"xyz", "address": "xyz123"}
{"id":1,"name":"xyz", "address": "xyz123"}
{"id":1,"name":"xyz", "address": "xyz123", "note": "imp"}
The requirement is to run a term match query and score the results, which is straightforward. The additional aspect is that if a doc in the search results has a note field, it should be given higher relevance. How can we achieve this with the query DSL? Using exists we can check which docs contain a note, but how do we integrate that with the match query? I have tried a lot of ways, but none worked.
With ES 5, you could boost your exists query to give a higher score to documents with a note field. For example:
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": {
            "query": "your term"
          }
        }
      },
      "should": {
        "exists": {
          "field": "note",
          "boost": 4
        }
      }
    }
  }
}
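Here is a hypothetical Python helper that builds the ES 5 body above for any pair of fields. Because the bool query has a must clause, the should clause is optional. Documents without a note are still returned; they just score lower.

```python
import json


def match_preferring_field(match_field, text, preferred_field, boost=4):
    """Build a bool query: required match, plus an optional boosted exists clause.

    Documents that lack preferred_field still match through the must clause;
    documents that have it get an extra score contribution from the should clause.
    """
    return {
        "query": {
            "bool": {
                "must": {"match": {match_field: {"query": text}}},
                "should": {"exists": {"field": preferred_field, "boost": boost}},
            }
        }
    }


body = match_preferring_field("name", "your term", "note")
print(json.dumps(body, indent=2))
```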
With ES 2, you could use a function_score query that adds weight to the filtered subset of documents that have a note:
{
  "query": {
    "function_score": {
      "query": {
        "match": { "name": "your term" }
      },
      "functions": [
        {
          "filter": { "exists" : { "field" : "note" }},
          "weight": 4
        }
      ],
      "score_mode": "sum"
    }
  }
}
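I believe you are looking for the boosting query feature:
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/query-dsl-boosting-query.html
The boosting query is designed to demote results. Documents that match the negative query have their score multiplied by negative_boost, so a value below 1 lowers them. To favor documents that have a note, put the documents without a note under negative. A negative_boost of 0.25 gives a 4:1 ratio between the two groups. The filtered query no longer exists in 5.x, so the exists query is used directly:
{
  "query": {
    "boosting": {
      "positive": {
        <put your original query here>
      },
      "negative": {
        "bool": {
          "must_not": {
            "exists": {
              "field": "note"
            }
          }
        }
      },
      "negative_boost": 0.25
    }
  }
}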
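The scoring rule of the boosting query can be sketched in a few lines of Python. This is a simplified model: it ignores how the positive score itself is computed. It shows why a negative_boost below 1 demotes the documents that match the negative clause. To favor documents with a note, the negative clause therefore has to select the documents without one.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Simplified boosting-query score for a document that matched the positive query.

    If the document also matches the negative query, its score is multiplied
    by negative_boost; otherwise the positive score is kept unchanged.
    """
    return positive_score * negative_boost if matches_negative else positive_score


# Negative query = "documents without a note", negative_boost = 0.25
with_note = boosting_score(2.0, matches_negative=False, negative_boost=0.25)
without_note = boosting_score(2.0, matches_negative=True, negative_boost=0.25)
print(with_note, without_note)  # 2.0 0.5
```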
The following query has terrible performance.
I'm 100% sure it is the has_child clauses: the query runs in under 300 ms without them and takes 9 seconds with them.
Is there a better way to use the has_child query? It seems like I could query the parents, then query the children by id, and join them client side to do the has_child check faster than the Elasticsearch engine does it...
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "has_child": {
                "type": "status",
                "query": {
                  "term": {
                    "stage": "s3"
                  }
                }
              }
            },
            {
              "has_child": {
                "type": "status",
                "query": {
                  "term": {
                    "stage": "es"
                  }
                }
              }
            }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "source": "IntegrationTest-2016-03-01T23:31:15.023Z"
              }
            },
            {
              "range": {
                "eventTimestamp": {
                  "from": "2016-03-01T20:28:15.028Z",
                  "to": "2016-03-01T23:33:15.028Z"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "digests": {
      "terms": {
        "field": "digest",
        "size": 0
      }
    }
  },
  "size": 0
}
Cluster info:
CPU and memory usage are low. It is an AWS ES Service cluster (v1.5.2) with many small documents. Since the version AWS runs is old, doc values aren't on by default. I'm not sure whether that is helping or hurting.
Since "stage" is not analyzed (based on your comment), and you are therefore not interested in scoring the documents that match on that field, you might realize slight performance gains by using the has_child filter instead of the has_child query, and a term filter instead of a term query.
In the documentation for has_child, you'll notice:
The has_child filter also accepts a filter instead of a query:
The main performance benefit of using a filter comes from the fact that Elasticsearch can skip the scoring phase of the query. Also, filters can be cached, which should improve the performance of future searches that use the same filters. Queries, on the other hand, cannot be cached.
Try this instead:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "source": "IntegrationTest-2016-03-01T23:31:15.023Z"
              }
            },
            {
              "range": {
                "eventTimestamp": {
                  "from": "2016-03-01T20:28:15.028Z",
                  "to": "2016-03-01T23:33:15.028Z"
                }
              }
            },
            {
              "has_child": {
                "type": "status",
                "filter": {
                  "term": {
                    "stage": "s3"
                  }
                }
              }
            },
            {
              "has_child": {
                "type": "status",
                "filter": {
                  "term": {
                    "stage": "es"
                  }
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "digests": {
      "terms": {
        "field": "digest",
        "size": 0
      }
    }
  },
  "size": 0
}
I bit the bullet and just performed the parent/child join in my application. Instead of waiting 7 seconds for the has_child query, I fire off two consecutive term queries and do some post-processing: 200 ms.
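Here is a minimal sketch of such an application-side join. It assumes each child hit carries its parent's id under a field called parent_id. That name is hypothetical, so adapt it to however your child documents expose the parent. It also assumes both queries return all relevant hits. The function keeps only the parents that have children in every required stage, which is what the two has_child clauses under must express.

```python
def parents_with_all_stages(parents, children, required_stages, parent_key="parent_id"):
    """Client-side stand-in for two has_child clauses combined with must.

    parents  -- list of dicts with an "id" key (hits from the parent query)
    children -- list of dicts with parent_key and "stage" keys (hits from the child query)
    Returns the parents that have at least one child in every required stage.
    """
    stages_by_parent = {}
    for child in children:
        stages_by_parent.setdefault(child[parent_key], set()).add(child["stage"])
    required = set(required_stages)
    return [p for p in parents if required <= stages_by_parent.get(p["id"], set())]


parents = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
children = [
    {"parent_id": "a", "stage": "s3"},
    {"parent_id": "a", "stage": "es"},
    {"parent_id": "b", "stage": "s3"},
]
print(parents_with_all_stages(parents, children, ["s3", "es"]))  # [{'id': 'a'}]
```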
Is it possible to have a query like this
"query": {
"filtered": {
"filter": {
"terms": {
"names": [
"Anna",
"Mark",
"Joe"
],
"execution" : "and"
}
}
}
}
With the "minimum_should_match": "2" statement?
I know that I can use a plain query (I've tried it, and it works), but I don't need the score to be computed. My goal is just to filter documents that contain 2 of the values.
Does scoring generally have a big impact on the time needed to retrieve documents?
Using this query:
"query": {
"filtered": {
"filter": {
"terms": {
"names": [
"Anna",
"Mark",
"Joe"
],
"execution" : "and",
"minimum_should_match": "2"
}
}
}
}
I got this error:
QueryParsingException[[my_db] [terms] filter does not support [minimum_should_match]]
minimum_should_match is not a parameter of the terms filter. If that is the functionality you are looking for, I would rewrite your query to use a bool query wrapped in a query filter, like this:
{
  "query": {
    "filtered": {
      "filter": {
        "query": {
          "bool": {
            "should": [
              {
                "term": {
                  "names": "Anna"
                }
              },
              {
                "term": {
                  "names": "Mark"
                }
              },
              {
                "term": {
                  "names": "Joe"
                }
              }
            ],
            "minimum_should_match": 2
          }
        }
      }
    }
  }
}
This matches documents that contain at least two of the three names, so documents with all three are returned too. The should clauses are combined with an implicit OR, and minimum_should_match raises that to "at least two". No score is computed because the query runs as a filter, so documents with all three names are not ranked above those with two.
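If you generate such filters from code, a small Python builder keeps the field name consistent across all clauses. The sketch below uses hypothetical helper names. It builds the 1.x-style body and also includes a local check of the same "at least N of these values" rule. The check ignores analysis, so it assumes the names field is not analyzed and values are compared exactly.

```python
import json


def at_least_n_terms(field, values, minimum):
    """1.x-style body: bool/should of term clauses inside a query filter."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "query": {
                        "bool": {
                            "should": [{"term": {field: v}} for v in values],
                            "minimum_should_match": minimum,
                        }
                    }
                }
            }
        }
    }


def matches_at_least(doc_values, values, minimum):
    """Local check of the same rule: how many of the values does the document contain?"""
    return len(set(doc_values) & set(values)) >= minimum


body = at_least_n_terms("names", ["Anna", "Mark", "Joe"], 2)
print(json.dumps(body, indent=2))
print(matches_at_least(["Anna", "Joe", "Bob"], ["Anna", "Mark", "Joe"], 2))  # True
```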