Elasticsearch - filter conditions order

Can you please tell me whether the conditions in an Elasticsearch filter are evaluated in the order in which they appear in the request JSON, or whether Elasticsearch optimizes the order?
I have a query like:
{
"sort": {
"publishDate": "desc"
},
"query": {
"bool": {
"filter": [
{
"range": {
"publishDate": {
"lte": "2018-10-26",
"gt": "2018-08-31"
}
}
},
{
"terms": {
"ico": [
31322832,
34444444
]
}
}
]
}
}
}
I think the optimal evaluation order is the terms filter first and the range filter second. So what happens in Elasticsearch: are filters evaluated in request order, or is the order optimized? Also, does anybody know how this works in Elasticsearch 2?
Thanks.

Check out this article about the execution order of filters and queries, titled "ES execution order". It is really great, and I hope it helps you.
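As far as I know, since Elasticsearch 2.0 the clauses of a bool filter are not simply run in request order: Lucene intersects the per-clause document iterators and lets the cheapest (sparsest) one lead. The sketch below is a simplified Python model of that idea, not Elasticsearch's actual code. The name `intersect_by_cost` and the sample doc-ID lists are made up for illustration.

```python
from bisect import bisect_left

def intersect_by_cost(postings):
    """Intersect sorted doc-ID lists, letting the shortest list lead.

    `postings` maps a clause name to a sorted list of matching doc IDs.
    Returns (clause order used, doc IDs matching every clause).
    """
    # Cheapest clause first: it yields the fewest candidates to verify.
    order = sorted(postings, key=lambda name: len(postings[name]))
    if not order:
        return order, []
    lead, rest = order[0], order[1:]
    matches = []
    for doc_id in postings[lead]:
        # Binary search stands in for an iterator's "advance" operation.
        if all(_contains(postings[name], doc_id) for name in rest):
            matches.append(doc_id)
    return order, matches

def _contains(sorted_ids, doc_id):
    i = bisect_left(sorted_ids, doc_id)
    return i < len(sorted_ids) and sorted_ids[i] == doc_id

# Hypothetical postings: the range clause matches many documents,
# the terms clause only a few.
postings = {
    "range(publishDate)": list(range(0, 1000, 2)),
    "terms(ico)": [4, 7, 250, 998],
}
order, matches = intersect_by_cost(postings)
print(order)
print(matches)
```

In this model the terms clause leads regardless of where it appears in the request, which is the behavior the question hopes for.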

Related

Elasticsearch: How to write an 'OR' clause in filter context?

I'm looking for a syntax/example compatible with ES version 6.7.
I have seen the docs, but I don't see any examples for this, and the explanation isn't clear enough to me. I have tried writing a query according to the docs, but I keep getting syntax errors. I have already seen the questions below on SO, but they don't help me:
Filter context for should in bool query (Elasticsearch)
It doesn't have any example.
Multiple OR filter in Elasticsearch
I get a syntax error
"type": "parsing_exception",
"reason": "no [query] registered for [filtered]",
"line": 1,
"col": 31
Maybe it's for a different version of ES.
All I need is a simple example with two 'or'ed conditions (mine is one range and one term but I guess that shouldn't matter much), both I would like to have in filter context (I don't care about scores, nor text search).
If you really need it, I can show my attempts (need to remove some 'sensitive'(duh) parts from it before posting), but they give parsing/syntax errors so I don't think there is any sense in them. I am aware that questions which don't show any efforts are considered bad for SO but I don't see any logic in showing attempts that aren't even parsed successfully, and any example would help me understand the syntax.
You need to wrap your should query in a filter query.
{
"query":{
"bool":{
"filter":[{
"bool":{
"should":[
{ // Query 1 },
{ // Query 2 }
]
}
}]
}
}
}
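A minimal sketch of building that body programmatically, assuming Python on the client side. `or_filter` is a hypothetical helper, and the field names `created` and `status` are placeholders rather than anything from the question. The explicit `minimum_should_match` is there only to make the "at least one clause" requirement visible.

```python
import json

def or_filter(*clauses):
    """Build a search body whose clauses are OR-ed together in filter context."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "bool": {
                            "should": list(clauses),
                            # Explicit, so at least one clause has to match.
                            "minimum_should_match": 1,
                        }
                    }
                ]
            }
        }
    }

# Hypothetical clauses: one range and one term, as in the question.
body = or_filter(
    {"range": {"created": {"gte": "2019-01-01"}}},
    {"term": {"status": "active"}},
)
print(json.dumps(body, indent=2))
```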
I had a similar scenario (even the range and match filters), with one more nested level: two conditions to be 'or'ed (as in your case) and another condition to be logically 'and'ed with their result. As @Pierre-Nicolas Mougel suggested in another answer, I used nested bool clauses with one more level around the should clause.
{
"_source": [
"my_field"
],
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"range": {
"start": {
"gt": "1558878457851",
"lt": "1557998559147"
}
}
},
{
"range": {
"stop": {
"gt": "1558898457851",
"lt": "1558899559147"
}
}
}
]
}
},
{
"match": {
"my_id": "<My_Id>"
}
}
],
"must_not": []
}
}
}
},
"from": 0,
"size": -1,
"sort": [],
"aggs": {}
}
I read in the docs that minimum_should_match can be used too for forcing filter context. This might help you if this query doesn't work.

Compare query with and without score calculation

I would like to know if it is possible to disable score calculation for should types of queries or maybe it is possible to have an OR for filter context?
ES version: 6+
For example:
This query searches for matches in either records OR voIds and has score calculation:
POST customers/_search
{
"size": 10000,
"version": true,
"query": {
"bool": {
"should": [
{
"terms": {
"voIds": [
78031203, ...
]
}
},
{
"terms": {
"records.keyword": [
"S3G82U", ....
]
}
}
]
}
}
}
This query filters documents that match in both records AND voIds and has no score calculation. It is not what I need, because it uses AND:
POST customers/_search
{
"size": 10000,
"version": true,
"query": {
"bool": {
"filter": [
{
"terms": {
"voIds": [
78031203
]
}
},
{
"terms": {
"records.keyword": [
"S3G82U"
]
}
}
]
}
}
}
My goal is to troubleshoot the performance of the same query with and without scoring. The first query has scoring; how do I write the second one without it?
Thanks.
This is not possible, and I don't see much of a use case for it functionality-wise. Are you seeing slowness in Elasticsearch in general or in this query itself?
You can't disable scoring completely, but you can disable query coordination. I'm not sure how much that helps performance-wise, if at all.
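For comparison purposes, the nested bool pattern from the earlier answer (a should clause wrapped in a filter) may still be worth trying here: as far as I can tell, moving the whole scoring query into a filter clause keeps the OR semantics while skipping relevance scoring. The sketch below only builds the two request bodies. `without_scoring` is a hypothetical helper, and the values are the ones from the question.

```python
import copy
import json

def without_scoring(body):
    """Return a copy of a search body with its query moved into filter context."""
    unscored = copy.deepcopy(body)
    unscored["query"] = {"bool": {"filter": [unscored["query"]]}}
    return unscored

# The scoring query from the question (OR over two terms clauses).
scored = {
    "size": 10000,
    "query": {
        "bool": {
            "should": [
                {"terms": {"voIds": [78031203]}},
                {"terms": {"records.keyword": ["S3G82U"]}},
            ]
        }
    },
}
unscored = without_scoring(scored)
print(json.dumps(unscored, indent=2))
```

Both bodies can then be sent to the same index and timed against each other.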

Elasticsearch query for matching two parameters at the same time

I have to search two fields in a DB using Elasticsearch, where the total number of hits should be equal to the sum of the individual field searches. I did it on port 9200 like this and it's working. How do I write a must-match query for this?
http://localhost:9200/indexname/typename/_search?q=Both:Yes++Type:Comm
where Both is one field and Type is another.
Thank you
You need to use an "AND" query.
GET hilden1/type1/_search
{
"query": {
"filtered": {
"filter": {
"and": {
"filters": [
{
"term": {
"both": "yes"
}
},
{
"term": {
"type": "comm"
}
}
]
}
}
}
}
}
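Note that the `filtered` query and the `and` filter were removed in Elasticsearch 5.0, which matches the `no [query] registered for [filtered]` error quoted in one of the questions above. On newer versions the same AND can be sketched with a `bool` query whose `filter` array holds both term clauses. `and_terms` is a hypothetical helper name.

```python
import json

def and_terms(**fields):
    """Build a search body matching documents where every field equals its value."""
    return {
        "query": {
            "bool": {
                # Every clause in a filter array has to match (logical AND).
                "filter": [{"term": {name: value}} for name, value in fields.items()]
            }
        }
    }

body = and_terms(both="yes", type="comm")
print(json.dumps(body))
```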
I think this is what you need:
Elasticsearch URI based query with AND operator
_search?q=%2Bboth:yes%20%2Btype:comm
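A literal `+` in a URL query string is typically decoded as a space, which is presumably why the answer percent-encodes the required-term operator as `%2B`. A small sketch of producing that string with Python's standard library follows. `uri_search` is a hypothetical helper, and the host and index path are the ones from the question.

```python
from urllib.parse import quote

def uri_search(base, query_string):
    """Return a _search URL with the Lucene query string percent-encoded."""
    # Keep ':' readable; '+' and spaces are escaped as %2B and %20.
    return f"{base}/_search?q={quote(query_string, safe=':')}"

url = uri_search("http://localhost:9200/indexname/typename", "+both:yes +type:comm")
print(url)
```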

Can _score from different queries be compared?

In my application, I issue multiple queries, each to a different index. Then I merge the results from these queries and sort them by the _score attribute, in order to rank them according to their relevance. But I wonder if this makes sense at all, since the results came from different queries.
I guess my question is: can _scores from different queries be compared?
Instead of issuing multiple queries, it would be a good idea to combine them into a single query.
You can use the indices query to run index-specific operations.
So something like:
{
"bool": {
"should": [
{
"indices": {
"indices": [
"index1"
],
"query": {
"term": {
"tag": "wow"
}
}
}
},
{
"indices": {
"indices": [
"index2"
],
"query": {
"term": {
"name": "laptop"
}
}
}
}
]
}
}
Once this is done, the results will be sorted by _score.
Hope that helps.
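Note that the `indices` query was, to my knowledge, deprecated in Elasticsearch 5.0 and removed in 6.0. On newer versions a similar per-index restriction can be sketched by filtering on the `_index` metadata field. `per_index_query` is a hypothetical helper, and the index names and term clauses are those from the answer above.

```python
import json

def per_index_query(queries_by_index):
    """Combine one query per index into a single bool/should search body."""
    should = [
        {
            "bool": {
                "must": [query],                          # scored part
                "filter": [{"term": {"_index": index}}],  # restrict to one index
            }
        }
        for index, query in queries_by_index.items()
    ]
    return {"query": {"bool": {"should": should}}}

body = per_index_query({
    "index1": {"term": {"tag": "wow"}},
    "index2": {"term": {"name": "laptop"}},
})
print(json.dumps(body, indent=2))
```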

Is there a way to have elasticsearch return a hit per generated bucket during an aggregation?

Right now I have a query like this:
{
"query": {
"bool": {
"must": [
{
"match": {
"uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"
}
},
{
"range": {
"date": {
"from": "now-12h",
"to": "now"
}
}
}
]
}
},
"aggs": {
"query": {
"terms": [
{
"field": "query",
"size": 3
}
]
}
}
}
The aggregation works perfectly well, but I can't find a way to control the hit data that is returned. I can use the size parameter at the top of the DSL, but the hits are not returned in the same order as the buckets, so the bucket results do not line up with the hit results. Is there any way to correct this, or do I have to issue two separate queries?
To expand on Filipe's answer, it seems like the top_hits aggregation is what you are looking for, e.g.
{
"query": {
... snip ...
},
"aggs": {
"query": {
"terms": {
"field": "query",
"size": 3
},
"aggs": {
"top": {
"top_hits": {
"size": 42
}
}
}
}
}
}
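With that aggregation, each bucket in the response carries its own hits under the sub-aggregation name. A minimal sketch of reading them on the client side, assuming Python: `hits_per_bucket` is a hypothetical helper, the aggregation names `query` and `top` match the request above, and the sample response is hand-written for illustration, not real output.

```python
def hits_per_bucket(response, terms_agg="query", top_agg="top"):
    """Map each terms bucket key to the _source of its top hits."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    return {
        bucket["key"]: [hit["_source"] for hit in bucket[top_agg]["hits"]["hits"]]
        for bucket in buckets
    }

# Hand-written sample in the shape of a search response; not real data.
sample = {
    "aggregations": {
        "query": {
            "buckets": [
                {"key": "a", "doc_count": 2,
                 "top": {"hits": {"hits": [{"_source": {"query": "a", "n": 1}},
                                           {"_source": {"query": "a", "n": 2}}]}}},
                {"key": "b", "doc_count": 1,
                 "top": {"hits": {"hits": [{"_source": {"query": "b", "n": 3}}]}}},
            ]
        }
    }
}
grouped = hits_per_bucket(sample)
print(grouped)
```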
Your query uses exact matches (match and range) and binary logic (must, bool) and thus should probably be converted to use filters instead:
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"
}
},
{
"range": {
"date": {
"from": "now-12h",
"to": "now"
}
}
}
]
}
}
}
As for the aggregations,
The hits that are returned do not represent all the buckets that were returned. So if I have buckets for terms 'a', 'b', and 'c', I want to have hits that represent those buckets as well.
Perhaps you are looking to control the scope of the buckets? You can make an aggregation bucket global so that it will not be influenced by the query or filter.
Keep in mind that Elasticsearch will not "group" hits in any way -- it is always a flat list ordered according to score and additional sorting options.
Aggregations can be organized in a nested structure and return computed or extracted values, in a specific order. In the case of terms aggregation, it is in descending count (highest number of hits first). The hits section of the response is never influenced by your choice of aggregations. Similarly, you cannot find hits in the aggregation sections.
If your goal is to group documents by a certain field, yes, you will need to run multiple queries in the current Elasticsearch release.
I'm not 100% sure, but I think there's no way to do that in the current version of Elasticsearch (1.2.x). The good news is that there will be when version 1.3.x gets released:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
