ElasticSearch ignoring sort when filtered - elasticsearch

ElasticSearch Version: 0.90.1, JVM: 1.6.0_51(20.51-b01-457)
I'm trying to do two things with my ElasticSearch query: 1) filter the results based on a boolean (searchable) and "open_date < tomorrow" and 2) two sort by the field "open_date" DESC
This produces the following query:
{
"query": {
"bool": {
"should": [
{
"prefix": {
"name": "foobar"
}
},
{
"query_string": {
"query": "foobar"
}
},
{
"match": {
"name": {
"query": "foobar"
}
}
}
],
"minimum_number_should_match": 1
},
"filtered": {
"filter": {
"and": [
{
"term": {
"searchable": true
}
},
{
"range": {
"open_date": {
"lt": "2013-07-16"
}
}
}
]
}
}
},
"sort": [
{
"open_date": "desc"
}
]
}
However, the results that come back are not being sorted by "open_date". If I remove the filter:
{
"query": {
"bool": {
"should": [
{
"prefix": {
"name": "foobar"
}
},
{
"query_string": {
"query": "foobar"
}
},
{
"match": {
"name": {
"query": "foobar"
}
}
}
],
"minimum_number_should_match": 1
}
},
"sort": [
{
"open_date": "desc"
}
]
}
... the results come back as expected.
Any ideas?

I'm not sure about the Tire code, but the JSON does not correctly construct a filtered query. My guess is that this overflows and causes the sort element to also not be correctly parsed.
A filtered query should be constructed like this (see http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/ ):
{
"query": {
"filtered": { // Note: this contains both query and filter
"query": {
"bool": {
"should": [
{
"prefix": {
"name": "foobar"
}
},
{
"query_string": {
"query": "foobar"
}
},
{
"match": {
"name": {
"query": "foobar"
}
}
}
],
"minimum_number_should_match": 1
}
},
"filter": {
"and": [
{
"term": {
"searchable": true
}
},
{
"range": {
"open_date": {
"lt": "2013-07-16"
}
}
}
]
}
}
},
"sort": [
{
"open_date": "desc"
}
]
}
Cheers,
Boaz

Related

ElasticSearch should with nested and bool must_not exists

With the following mapping:
"categories": {
"type": "nested",
"properties": {
"category": {
"type": "integer"
},
"score": {
"type": "float"
}
}
},
I want to use the categories field to return documents that either:
have a score above a threshold in a given category, or
do not have the categories field
This is my query:
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
<id>
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "categories"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
It correctly returns documents both with and without the categories field, and orders the results so the ones I want are first, but it doesn't filter the results having score below the 0.5 threshold.
Great question.
That is because categories is not exactly a field from the elasticsearch point of view[a field on which inverted index is created and used for querying/searching] but categories.category and categories.score is.
As a result categories being not found in any document, which is actually true for all the documents, you observe the result what you see.
Modify the query to the below and you'd see your use-case working correctly.
POST <your_index_name>/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
"100"
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [ <----- Note this
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "categories.category"
}
},
{
"exists": {
"field": "categories.score"
}
}
]
}
}
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}

Elasticsearch query error - [or] query malformed, no start_object after query name

Can somebody explain me please what is wrong with this query? I need to convert this generated query from Elasticsearch 2 to Elasticsearch 6. In ES2 this one works well, but in ES6 it throws me an error: [or] query malformed, no start_object after query name. I am lost in it. OR is necessary cause there could be more conditions than this one.
{
"query": {
"bool": {
"filter": {
"or": [
{
"nested": {
"path": "zalozcovia",
"query": {
"bool": {
"filter": [
{
"match": {
"zalozcovia.meno": "\u013dubo\u0161"
}
},
{
"match": {
"zalozcovia.priezvisko": "Majgot"
}
},
{
"match": {
"zalozcovia.mesto": "Trnava"
}
}
]
}
}
}
}
]
}
}
},
"size": 20,
"sort": [
{
"rok": "desc"
},
{
"cislo": "desc"
}
]
}
Thanks.
In ES6 there is afaik no "OR" Query (https://www.elastic.co/guide/en/elasticsearch/reference/6.4/query-dsl-or-query.html). You should use a bool query and use there the "should" Part (https://www.elastic.co/guide/en/elasticsearch/reference/6.4/query-dsl-bool-query.html).
{
"query": {
"bool": {
"filter": [{
"bool": {
"should": [{
"nested": {
"path": "zalozcovia",
"query": {
"bool": {
"filter": [{
"match": {
"zalozcovia.meno": "\u013dubo\u0161"
}
},
{
"match": {
"zalozcovia.priezvisko": "Majgot"
}
},
{
"match": {
"zalozcovia.mesto": "Trnava"
}
}
]
}
}
}
}]
}
}]
}
},
"size": 20,
"sort": [{
"rok": "desc"
},
{
"cislo": "desc"
}
]
}
Try changing "filter-or" with should
{
"query": {
"bool": {
"should" : [
{
"nested": {
"path": "zalozcovia",
"query": {
"bool": {
"filter": [
{
"match": {
"zalozcovia.meno": "\u013dubo\u0161"
}
},
{
"match": {
"zalozcovia.priezvisko": "Majgot"
}
},
{
"match": {
"zalozcovia.mesto": "Trnava"
}
}
]
}
}
}
}
]
}
},
"size": 20,
"sort": [
{
"rok": "desc"
},
{
"cislo": "desc"
}
]
}

Using multiple Should queries

I want to get docs that are similar to multiple "groups" but separately. Each group has it's own rules (terms).
When I try to use more than one Should query inside a "bool" I get items that are a mix of both Should's terms.
I want to use 1 query total and not msearch for example.
Can someone please help me with that?
{
"explain": true,
"query": {
"filtered": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"p_id": "123"
}
},
{
"term": {
"p_id": "124"
}
}
]
}
},
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"cat": "1"
}
},
{
"term": {
"cat": "2"
}
},
{
"term": {
"keys": "a"
}
},
{
"term": {
"keys": "b"
}
}
]
}
},
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"cat": "6"
}
},
{
"term": {
"cat": "7"
}
},
{
"term": {
"keys": "r"
}
},
{
"term": {
"keys": "u"
}
}
]
}
}
]
}
}
}
},
"from": 0,
"size": 3
}
You can try using a terms aggregation on multiple fields with scripting and add a top hits aggregation as a sub-aggregation. Be warned this will be pretty slow. Add this after the query/filter and adjust the size parameter as needed
"aggs": {
"Cat_and_Keys": {
"terms": {
"script": "doc['cat'].values + doc['keys'].values"
},
"aggs":{ "separate_docs": {"top_hits":{"size":1 }} }
}
}

Match multiple properties on the same nested document in ElasticSearch

I'm trying to accomplish what boils down to a boolean AND on nested documents in ElasticSearch. Let's say I have the following two documents.
{
"id": 1,
"secondLevels": [
{
"thirdLevels": [
{
"isActive": true,
"user": "anotheruser#domain.com"
}
]
},
{
"thirdLevels": [
{
"isActive": false,
"user": "user#domain.com"
}
]
}
]
}
{
"id": 2,
"secondLevels": [
{
"thirdLevels": [
{
"isActive": true,
"user": "user#domain.com"
}
]
}
]
}
In this case, I want to only match documents (in this case ID: 2) that have a nested document with both isActive: true AND user: user#domain.com.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "secondLevels.thirdLevels",
"query": {
"bool": {
"must": [
{
"term": {
"secondLevels.thirdLevels.isActive": true
}
},
{
"term": {
"secondLevels.thirdLevels.user": "user#domain.com"
}
}
]
}
}
}
}
]
}
}
}
However, what seems to be happening is that my query turns up both documents because the first document has one thirdLevel that has isActive: true and another thirdLevel that has the appropriate user.
Is there any way to enforce this strictly at query/filter time or do I have to do this in a script?
With nested-objects and nested-query, you have made most of the way.
All you have to do now is to add the inner hits flag and also use source filtering for move entire secondLevels documents out of the way:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "secondLevels.thirdLevels",
"query": {
"bool": {
"must": [
{
"term": {
"secondLevels.thirdLevels.isActive": true
}
},
{
"term": {
"secondLevels.thirdLevels.user": "user#domain.com"
}
}
]
}
},
"inner_hits": {
"size": 100
}
}
}
]
}
}
}

Filtered bool vs Bool query : elasticsearch

I have two queries in ES. Both have different turnaround time on the same set of documents. Both are doing the same thing conceptually. I have few doubts
1- What is the difference between these two?
2- Which one is better to use?
3- If both are same why they are performing differently?
1. Filtered bool
{
"from": 0,
"size": 5,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1987112602"
}
},
{
"term": {
"original_sender_address_number": "6870340319"
}
},
{
"range": {
"x_event_timestamp": {
"gte": "2016-07-01T00:00:00.000Z",
"lte": "2016-07-30T00:00:00.000Z"
}
}
}
]
}
}
}
},
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
2. Simple Bool
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}
Mapping:
{
"ccp": {
"mappings": {
"type1": {
"properties": {
"original_sender_address_number": {
"type": "string"
},
"called_party_address_number": {
"type": "string"
},
"cause_code": {
"type": "string"
},
"x_event_timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
.
.
.
}
}
}
}
}
Update 1:
I tried bool/must query and bool/filter query on same set of data,but I found the strange behaviour
1-
bool/must query is able to search the desired document
{
"query": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
2-
While bool/filter is not able to search the document. If I remove the second field condition it searches the same record with field2's value as 401.
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "8701662243"
}
},
{
"term": {
"cause_code": "401"
}
}
]
}
}
}
Update2:
Found a solution of suppressing scoring phase with bool/must query by wrapping it within "constant_score".
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"called_party_address_number": "1235235757"
}
},
{
"term": {
"cause_code": "304"
}
}
]
}
}
}
}
}
Record we are trying to match have "called_party_address_number": "1235235757" and "cause_code": "304".
The first one uses the old 1.x query/filter syntax (i.e. filtered queries have been deprecated in favor of bool/filter).
The second one uses the new 2.x syntax but not in a filter context (i.e. you're using bool/must instead of bool/filter). The query with 2.x syntax which is equivalent to your first query (i.e. which runs in a filter context without score calculation = faster) would be this one:
{
"query": {
"bool": {
"filter": [
{
"term": {
"called_party_address_number": "1277478699"
}
},
{
"term": {
"original_sender_address_number": "8020564722"
}
},
{
"term": {
"cause_code": "573"
}
},
{
"range": {
"x_event_timestamp": {
"gt": "2016-07-13T13:51:03.749Z",
"lt": "2016-07-16T13:51:03.749Z"
}
}
}
]
}
},
"from": 0,
"size": 10,
"sort": [
{
"x_event_timestamp": {
"order": "desc",
"ignore_unmapped": true
}
}
]
}

Resources