ElasticSearch should with nested and bool must_not exists

With the following mapping:
"categories": {
"type": "nested",
"properties": {
"category": {
"type": "integer"
},
"score": {
"type": "float"
}
}
},
I want to use the categories field to return documents that either:
have a score above a threshold in a given category, or
do not have the categories field
This is my query:
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
<id>
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "categories"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
It correctly returns documents both with and without the categories field, and orders the results so the ones I want come first, but it doesn't filter out the results whose score is below the 0.5 threshold.

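That is because, from Elasticsearch's point of view, categories is not really a field (something on which an inverted index is built and used for querying/searching), whereas categories.category and categories.score are. Because categories is nested, those sub-fields are indexed in separate hidden nested documents, so a top-level exists check never sees them.
As a result, the exists check on categories finds nothing in any document, which is in fact true for all the documents, so the must_not clause matches everything. Since either should clause is enough to satisfy "minimum_should_match": 1, documents whose score is below the threshold still match through that second clause, and that is the result you see.
Modify the query as below, wrapping the exists checks inside the must_not in a nested query, and you should see your use case working correctly.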
POST <your_index_name>/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
"100"
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "categories.category"
}
},
{
"exists": {
"field": "categories.score"
}
}
]
}
}
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
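To see why the nested wrapper matters, here is a toy Python model (not Elasticsearch code) of how the two queries evaluate. It assumes the simplification described above: nested objects live in separate hidden documents, so a top-level exists check never sees the categories fields. The document shapes and helper names (nested_any, root_exists, original_query, fixed_query) are hypothetical.

```python
# Toy model: each indexed doc has top-level fields ("root") and nested objects
# kept apart from it ("nested"), mirroring how nested objects are stored as
# separate hidden documents. Illustration only, not Elasticsearch code.


def nested_any(doc, path, pred):
    """A nested query: true if at least one nested object under `path` matches."""
    return any(pred(obj) for obj in doc["nested"].get(path, []))


def root_exists(doc, field):
    """A top-level exists query: only sees fields stored on the root document."""
    return field in doc["root"]


def score_ok(cat_id, threshold):
    # terms categories.category + range categories.score gte threshold,
    # both evaluated on the SAME nested object
    return lambda obj: obj.get("category") == cat_id and obj.get("score", 0) >= threshold


def original_query(doc, cat_id, threshold=0.5):
    # should: [nested(category AND score), bool must_not exists "categories"]
    return nested_any(doc, "categories", score_ok(cat_id, threshold)) or not root_exists(doc, "categories")


def fixed_query(doc, cat_id, threshold=0.5):
    # should: [nested(category AND score), bool must_not nested(exists category AND exists score)]
    has_categories = nested_any(doc, "categories", lambda o: "category" in o and "score" in o)
    return nested_any(doc, "categories", score_ok(cat_id, threshold)) or not has_categories


docs = {
    "high": {"root": {}, "nested": {"categories": [{"category": 100, "score": 0.9}]}},
    "low": {"root": {}, "nested": {"categories": [{"category": 100, "score": 0.2}]}},
    "none": {"root": {}, "nested": {}},
}

print(sorted(k for k, d in docs.items() if original_query(d, 100)))  # ['high', 'low', 'none']
print(sorted(k for k, d in docs.items() if fixed_query(d, 100)))     # ['high', 'none']
```

In the original query the must_not branch is true for every document, so "low" slips through; once the exists checks are nested, only documents that really lack category objects take that branch.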

Related

Return parent records even if no child records match the query

(this is actually AWS OpenSearch, which I believe is a fork of Elasticsearch 7.x)
So in this contrived example, I have a parent-child relationship between manufacturers and products. I want to return the "acme" manufacturer information and all of its products. Some of the products may be embargoed (not ready to be listed to the public). A new company like acme only has new, embargoed products, so when I run this query I do not get back the company info. I tried using "min_children": 0, but I still do not get back the manufacturer.
For this query, other manufacturers are returned if they have at least one product that is not embargoed, so the problem seems to be that has_child finds no matching products for acme.
{
"track_total_hits": true,
"query": {
"bool": {
"must": [
{
"has_child": {
"inner_hits": {
"name": "manf_products",
"size": 100
},
"min_children": 0,
"query": {
"bool": {
"should": [
{
"range": {
"embargo_date": {
"lt": "now/s"
}
}
}
]
}
},
"type": "product"
}
},
{
"bool": {
"should": [
{
"term": {
"manuf": {
"value": "acme"
}
}
}
]
}
}
]
}
}
}
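The minimum value of min_children is effectively 1, so has_child cannot match a parent that has no matching children, even with "min_children": 0. This can give you more info on this.
The query below returns parents along with their non-embargoed child docs. It also returns a parent when no non-embargoed child exists.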
{
"track_total_hits": true,
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"has_child": {
"inner_hits": {
"name": "manf_products",
"size": 100
},
"query": {
"bool": {
"should": [
{
"range": {
"embargo_date": {
"lt": "now/s"
}
}
}
]
}
},
"type": "product"
}
},
{
"bool": {
"must_not": [
{
"has_child": {
"query": {
"bool": {
"should": [
{
"range": {
"embargo_date": {
"lt": "now/s"
}
}
}
]
}
},
"type": "product"
}
}
]
}
}
],
"filter": [
{
"term": {
"manuf": {
"value": "acme"
}
}
}
]
}
}
}
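A toy Python model of both queries may help show why the should/must_not pair works: exactly one of the two clauses always holds, so the term filter on the manufacturer decides which parents come back, while has_child still supplies the non-embargoed products as inner hits. The min_children behaviour is modelled on the answer's claim that its effective minimum is 1; the data and helper names are hypothetical, and this is not Elasticsearch code.

```python
from datetime import datetime

# Toy model of the manufacturer (parent) / product (child) join.
# Illustration only; field names follow the question.


def not_embargoed(now):
    # range embargo_date lt now
    return lambda child: child["embargo_date"] < now


def matching_children(parent_id, children, pred):
    """Children of parent_id that satisfy pred (what has_child / inner_hits see)."""
    return [c for c in children if c["parent"] == parent_id and pred(c)]


def original_query(parents, children, manuf, now, min_children=0):
    # must: [has_child(not embargoed, min_children), term manuf]
    # Per the answer, at least one matching child is needed even with min_children 0.
    needed = max(min_children, 1)
    return [
        p["id"]
        for p in parents
        if p["manuf"] == manuf
        and len(matching_children(p["id"], children, not_embargoed(now))) >= needed
    ]


def answer_query(parents, children, manuf, now):
    # filter: term manuf
    # should (minimum_should_match 1): has_child(...) OR must_not has_child(...)
    results = []
    for p in parents:
        if p["manuf"] != manuf:
            continue
        hits = matching_children(p["id"], children, not_embargoed(now))
        has_child_clause = len(hits) > 0
        must_not_clause = not has_child_clause
        if has_child_clause or must_not_clause:  # exactly one always holds
            results.append({"id": p["id"], "inner_hits": [c["id"] for c in hits]})
    return results


now = datetime(2024, 1, 1)
parents = [{"id": "p1", "manuf": "acme"}, {"id": "p2", "manuf": "other"}]
children = [
    {"id": "c1", "parent": "p1", "embargo_date": datetime(2030, 1, 1)},  # embargoed
    {"id": "c2", "parent": "p2", "embargo_date": datetime(2020, 1, 1)},  # public
    {"id": "c3", "parent": "p2", "embargo_date": datetime(2030, 1, 1)},  # embargoed
]
print(original_query(parents, children, "acme", now))  # []
print(answer_query(parents, children, "acme", now))    # [{'id': 'p1', 'inner_hits': []}]
print(answer_query(parents, children, "other", now))   # [{'id': 'p2', 'inner_hits': ['c2']}]
```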

Find distinct/unique people without a birthday or with a birthday earlier than 3/1/1963

We have some employees and need to find those whose birthday we haven't entered or who were born before 3/1/1963:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [{ "exists": { "field": "birthday" } }]
}
},
{
"bool": {
"filter": [{ "range": {"birthday": { "lte": 19630301 }} }]
}
}
]
}
}
}
We now need to get distinct names: we only want one Jason, one Susan, etc. How do we apply a distinct filter to the "name" field while still filtering on the birthday as above? I've tried:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"bool": {
"filter": [
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
}
]
}
},
"aggs": {
"uniq_gender": {
"terms": {
"field": "name"
}
}
},
"from": 0,
"size": 25
}
but we still get hits with duplicate Jasons and Susans. At the bottom, the aggregation shows that there are 10 Susans and 12 Jasons. I'm not sure how to get unique ones.
EDIT:
My mapping is very simple. The name field doesn't need to be keyword; it can be text or anything else, as it is just a field that gets returned in the query.
{
"mappings": {
"birthdays": {
"properties": {
"name": {
"type": "keyword"
},
"birthday": {
"type": "date",
"format": "basic_date"
}
}
}
}
}
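Without knowing your mapping, I'm assuming your name field is not analyzed (e.g. a keyword field), so it can be used properly in a terms aggregation.
I suggest using a filter aggregation:
{
"aggs": {
"filtered_employes": {
"filter": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"filtered_employes_by_name": {
"terms": {
"field": "name"
}
}
}
}
}
}
Note that your should query is already correct: a bool with only should clauses requires at least one of them to match. Combining the two conditions with must instead would require an employee to have no birthday and, at the same time, a birthday before the date, which never matches. Keep them in a should clause (with "minimum_should_match": 1, as above) so the aggregation covers employees with a (missing birthday) or (born before the date). Each bucket key of filtered_employes_by_name is then one distinct name; the hits themselves are not deduplicated by an aggregation. The terms aggregation returns 10 buckets by default, so set its size if you expect more distinct names.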
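The same logic as a small Python model (illustrative only, with made-up employees and hypothetical helper names): the filter keeps employees with no birthday or a birthday on or before 19630301, as in the asker's should query, and the terms step collapses them into one bucket per name, so the bucket keys are the distinct names.

```python
from collections import Counter

CUTOFF = 19630301  # basic_date (yyyyMMdd) written as an int, as in the query


def matches(employee):
    # should (minimum_should_match 1): no birthday OR birthday on/before the cutoff
    birthday = employee.get("birthday")
    return birthday is None or birthday <= CUTOFF


def name_buckets(employees):
    """Mimic filter agg -> terms agg on "name": one bucket per distinct name."""
    counts = Counter(e["name"] for e in employees if matches(e))
    # Order by doc count descending; ties broken by name for a stable result.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


employees = [
    {"name": "Jason"},
    {"name": "Jason", "birthday": 19600101},
    {"name": "Susan", "birthday": 19700101},
    {"name": "Susan", "birthday": 19500505},
    {"name": "Bob", "birthday": 19800101},
]
print(name_buckets(employees))                        # [('Jason', 2), ('Susan', 1)]
print([name for name, _ in name_buckets(employees)])  # ['Jason', 'Susan']
```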

Elasticsearch: use a filter on an index only when the index has the field

There are 2 indexes, categories and posts, with these fields:
categories: name, body
posts: name, body, publish_at, publish_until
I want to do a query on both indexes with a filter on publish_at and publish_until for the posts index.
http://localhost:9200/categories,posts/_search
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "keyword",
"fields": [
"name^3",
"body"
]
}
},
"filter": [{
"bool": {
"must": [
{
"range": {
"publish_at": {
"lte" : "now"
}
}
},
{
"range": {
"publish_until": {
"gt" : "now"
}
}
}
]
}
}]
}
}
}
This query only gives me posts as results. I also want categories in my results.
How do I apply the date range filters to only indexes with publish_at and publish_until fields and skip the date range filters for the other indexes?
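OK, after a day of fiddling with bool I got it working. The date ranges go into a should clause, next to an alternative that matches documents having neither publish_at nor publish_until, so the categories still come back: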
{
"query": {
"bool" : {
"must" : [
{
"multi_match": {
"query": "keyword",
"fields": [
"name^3",
"body"
]
}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"range": {
"publish_at": {
"lte" : "now"
}
}
},
{
"range": {
"publish_until": {
"gt" : "now"
}
}
}
]
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "publish_at"
}
},
{
"exists": {
"field": "publish_until"
}
}
]
}
}
]
}
}
]
}
}
}
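The should clause in the working query can be modelled in a few lines of Python (a sketch with hypothetical documents, ignoring the multi_match scoring). One consequence worth noticing: because must_not lists both exists checks, the fallback branch only admits documents that have neither date field, so a post with publish_at but no publish_until matches neither branch.

```python
from datetime import datetime


def date_filter(doc, now):
    """The should clause from the working query, evaluated on one document.

    Branch 1: publish_at <= now AND publish_until > now (both ranges in a must).
    Branch 2: must_not [exists publish_at, exists publish_until],
              i.e. NEITHER field exists.
    """
    has_at = "publish_at" in doc
    has_until = "publish_until" in doc
    in_window = has_at and has_until and doc["publish_at"] <= now < doc["publish_until"]
    no_dates = not has_at and not has_until
    return in_window or no_dates


now = datetime(2024, 6, 1)
docs = {
    "category": {"name": "News"},
    "live_post": {"publish_at": datetime(2024, 1, 1), "publish_until": datetime(2025, 1, 1)},
    "expired_post": {"publish_at": datetime(2023, 1, 1), "publish_until": datetime(2024, 1, 1)},
    "half_dated_post": {"publish_at": datetime(2024, 1, 1)},
}
print({k: date_filter(d, now) for k, d in docs.items()})
# {'category': True, 'live_post': True, 'expired_post': False, 'half_dated_post': False}
```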

Elasticsearch: execute a filter on a nested document only if it exists

I am using ES 2.3 and have a query whose filter section looks as follows:
"filter": {
"query": {
"bool": {
"must": [
{
"nested": {
"path": "employees",
"query": {
"bool": {
"must": [
{
"range": {
"employees.max_age": {
"lte": 50
}
}
},
{
"range": {
"employees.min_age": {
"gte": 20
}
}
}
]
}
}
}
},
{
"exists": {
"field": "employees"
}
},
{
#....other filter here based on root document, not on nested employee document
}
]
}
}
}
}
I have a filter where I check some conditions on the nested "employees" documents inside a bigger document called company. But I want to run this filter only if the "employees" object exists, as some of the documents may not have that nested document at all. So I added {"exists": {"field": "employees"}},
but this doesn't seem to work. Any idea what change I should make to get it to work?
You can do it like this. However, documents that don't have the employees field will not be picked up by the nested range query anyway, so I'm not sure why you want/need that exists query in the first place.
{
"filter": {
"query": {
"bool": {
"must": [
{
"nested": {
"path": "employees",
"query": {
"exists": {
"field": "employees"
}
}
}
},
{
"nested": {
"path": "employees",
"query": {
"bool": {
"must": [
{
"range": {
"employees.max_age": {
"lte": 50
}
}
},
{
"range": {
"employees.min_age": {
"gte": 20
}
}
}
]
}
}
}
}
]
}
}
}
}
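A toy Python model (hypothetical data, not Elasticsearch code) illustrates the answer's point: range_only and answer_filter agree on every company, so the extra exists clause changes nothing. If the intent is instead to let companies without employees through, the only_if_present variant, which is an assumption about what the asker wants, shows the should + must_not nested pattern used in the first answer above.

```python
# Toy model of the nested employees filter. Illustration only.


def employee_in_range(emp):
    # Both ranges sit in one nested bool, so the SAME employee must satisfy both.
    return emp["max_age"] <= 50 and emp["min_age"] >= 20


def range_only(company):
    # must: nested(range AND range) -> needs at least one matching employee
    return any(employee_in_range(e) for e in company.get("employees", []))


def answer_filter(company):
    # must: [nested(exists employees), nested(range AND range)]
    has_employees = len(company.get("employees", [])) > 0
    return has_employees and range_only(company)


def only_if_present(company):
    # Hypothetical variant: keep companies without employees, check the rest
    # (should: [nested range, must_not nested exists]).
    has_employees = len(company.get("employees", [])) > 0
    return range_only(company) or not has_employees


companies = {
    "a": {"employees": [{"max_age": 40, "min_age": 25}]},
    "b": {},
    "c": {"employees": [{"max_age": 60, "min_age": 25}]},
}
for name, company in companies.items():
    print(name, range_only(company), answer_filter(company), only_if_present(company))
# a True True True
# b False False True
# c False False False
```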

ElasticSearch ignoring sort when filtered

ElasticSearch Version: 0.90.1, JVM: 1.6.0_51(20.51-b01-457)
I'm trying to do two things with my ElasticSearch query: 1) filter the results based on a boolean (searchable) and "open_date < tomorrow", and 2) sort by the field "open_date" DESC.
This produces the following query:
{
"query": {
"bool": {
"should": [
{
"prefix": {
"name": "foobar"
}
},
{
"query_string": {
"query": "foobar"
}
},
{
"match": {
"name": {
"query": "foobar"
}
}
}
],
"minimum_number_should_match": 1
},
"filtered": {
"filter": {
"and": [
{
"term": {
"searchable": true
}
},
{
"range": {
"open_date": {
"lt": "2013-07-16"
}
}
}
]
}
}
},
"sort": [
{
"open_date": "desc"
}
]
}
However, the results that come back are not being sorted by "open_date". If I remove the filter:
{
"query": {
"bool": {
"should": [
{
"prefix": {
"name": "foobar"
}
},
{
"query_string": {
"query": "foobar"
}
},
{
"match": {
"name": {
"query": "foobar"
}
}
}
],
"minimum_number_should_match": 1
}
},
"sort": [
{
"open_date": "desc"
}
]
}
... the results come back as expected.
Any ideas?
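I'm not sure about the Tire code, but the JSON does not correctly construct a filtered query: filtered sits next to bool inside query instead of wrapping both the query and the filter. My guess is that this confuses the parser and causes the sort element to also not be parsed correctly.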
A filtered query, which contains both the query and the filter, should be constructed like this (see http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/ ):
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"prefix": {
"name": "foobar"
}
},
{
"query_string": {
"query": "foobar"
}
},
{
"match": {
"name": {
"query": "foobar"
}
}
}
],
"minimum_number_should_match": 1
}
},
"filter": {
"and": [
{
"term": {
"searchable": true
}
},
{
"range": {
"open_date": {
"lt": "2013-07-16"
}
}
}
]
}
}
},
"sort": [
{
"open_date": "desc"
}
]
}
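For readers on newer versions: the filtered query and the and filter were deprecated in Elasticsearch 2.0 and removed in 5.0, where the same intent is expressed with a bool query's filter clause. Below is a sketch of that request body built in Python; the function name is hypothetical, so check the result against the query DSL docs for your version. Because a bool query with a filter clause treats should clauses as optional, "minimum_should_match": 1 keeps the requirement that at least one text clause matches.

```python
import json


def build_search_body(text, open_before):
    """Bool-query equivalent of the filtered query above (ES 5.0+ style)."""
    return {
        "query": {
            "bool": {
                # Scoring clauses: at least one must match.
                "should": [
                    {"prefix": {"name": text}},
                    {"query_string": {"query": text}},
                    {"match": {"name": {"query": text}}},
                ],
                "minimum_should_match": 1,
                # Non-scoring clauses, replacing "filtered" + "and".
                "filter": [
                    {"term": {"searchable": True}},
                    {"range": {"open_date": {"lt": open_before}}},
                ],
            }
        },
        # sort stays a top-level sibling of "query".
        "sort": [{"open_date": "desc"}],
    }


print(json.dumps(build_search_body("foobar", "2013-07-16"), indent=2))
```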
Cheers,
Boaz
