How to do nested AND and OR filters in ElasticSearch? - elasticsearch

My filters are grouped together into categories.
I would like to retrieve documents where a document can match any filter in a category, but if two (or more) categories are set, then the document must match any of the filters in ALL categories.
If written in pseudo-SQL it would be:
SELECT * FROM Documents WHERE (CategoryA = 'A') AND (CategoryB = 'B' OR CategoryB = 'C')
I've tried Nested filters like so:
{
"sort": [{
"orderDate": "desc"
}],
"size": 25,
"query": {
"match_all": {}
},
"filter": {
"and": [{
"nested": {
"path":"hits._source",
"filter": {
"or": [{
"term": {
"progress": "incomplete"
}
}, {
"term": {
"progress": "completed"
}
}]
}
}
}, {
"nested": {
"path":"hits._source",
"filter": {
"or": [{
"term": {
"paid": "yes"
}
}, {
"term": {
"paid": "no"
}
}]
}
}
}]
}
}
But evidently I don't quite understand the ES syntax. Is this on the right track or do I need to use another filter?

This should be it (translated from given pseudo-SQL)
{
"sort": [
{
"orderDate": "desc"
}
],
"size": 25,
"query":
{
"filtered":
{
"filter":
{
"and":
[
{ "term": { "CategoryA":"A" } },
{
"or":
[
{ "term": { "CategoryB":"B" } },
{ "term": { "CategoryB":"C" } }
]
}
]
}
}
}
}
I realize you're not mentioning facets but just for the sake of completeness:
You could also use a filter as the basis (like you did) instead of a filtered query (like I did). The resulting json is almost identical with the difference being:
a filtered query will filter both the main results as well as facets
a filter will only filter the main results NOT the facets.
Lastly, Nested filters (which you tried using) don't relate to 'nesting filters' like you seemed to believe, but related to filtering on nested-documents (parent-child)

Although I have not understand completely your structure this might be what you need.
You have to think tree-wise. You create a bool where you must (=and) fulfill the embedded bools. Each embedded checks if the field does not exist or else (using should here instead of must) the field must (terms here) be one of the values in the list.
Not sure if there is a better way, and do not know the performance.
{
"sort": [
{
"orderDate": "desc"
}
],
"size": 25,
"query": {
"query": { #
"match_all": {} # These three lines are not necessary
}, #
"filtered": {
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"not": {
"exists": {
"field": "progress"
}
}
},
{
"terms": {
"progress": [
"incomplete",
"complete"
]
}
}
]
}
},
{
"bool": {
"should": [
{
"not": {
"exists": {
"field": "paid"
}
}
},
{
"terms": {
"paid": [
"yes",
"no"
]
}
}
]
}
}
]
}
}
}
}
}

Related

Find distinct/unique people without a birthday or have a birthday earlier than 3/1/1963

We have some employees and needed to find those we haven't entered their birthday or are born before 3/1/1963:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [{ "exists": { "field": "birthday" } }]
}
},
{
"bool": {
"filter": [{ "range": {"birthday": { "lte": 19630301 }} }]
}
}
]
}
}
}
We now need to get distinct names...we only want 1 Jason or 1 Susan, etc. How do we apply a distinct filter to the "name" field while still filtering for the birthday as above? I've tried:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"bool": {
"filter": [
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
}
]
}
},
"aggs": {
"uniq_gender": {
"terms": {
"field": "name"
}
}
},
"from": 0,
"size": 25
}
but just get results with duplicate Jasons and Susans. At the bottom it will show me that there are 10 Susans and 12 Jasons. Not sure how to get unique ones.
EDIT:
My mapping is very simple. The name field doesn't need to be keyword...can be text or anything else as it is just a field that just gets returned in the query.
{
"mappings": {
"birthdays": {
"properties": {
"name": {
"type": "keyword"
},
"birthday": {
"type": "date",
"format": "basic_date"
}
}
}
}
}
Without knowing your mapping, I'm guessing that your field name is not analyzed and able to be used on terms aggregation properly.
I suggest you, use filtered aggregation:
{
"aggs": {
"filtered_employes": {
"filter": {
"bool": {
"must": [
{
"bool": {
"must_not": [
{
"exists": {
"field": "birthday"
}
}
]
}
},
{
"range": {
"birthday": {
"lte": 19630301
}
}
}
]
}
},
"aggs": {
"filtered_employes_by_name": {
"terms": {
"field": "name"
}
}
}
}
}
}
In other hand your query is not correct your applying a should bool filter. Change it by must and the aggregation will return only results from employes with (missing birthday) and (born before date).

use wildcard with Terms in elasticsearch

I wanted to simulate SQL's IN so I used terms filter, but terms does not support wild cards like adding astrisck in "*egypt*".
so how can i achieve the following query?
PS: i am using elastica
{
"query": {
"bool": {
"should": [
{
"terms": {
"country_name": [
"*egypt*",
"*italy*"
]
}
}
]
}
},
"sort": [
{
"rank": {
"order": "desc"
}
}
]
}
terms query does not support wildcards. You can use match or wildcard query instead. If your problem is multiple values to filter you can combine queries inside should, so it will look like this
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"country_name": "*egypt*"
}
},
{
"wildcard": {
"country_name": "*italy*"
}
}
]
}
},
"sort": [
{
"rank": {
"order": "desc"
}
}
]
}

Match multiple properties on the same nested document in ElasticSearch

I'm trying to accomplish what boils down to a boolean AND on nested documents in ElasticSearch. Let's say I have the following two documents.
{
"id": 1,
"secondLevels": [
{
"thirdLevels": [
{
"isActive": true,
"user": "anotheruser#domain.com"
}
]
},
{
"thirdLevels": [
{
"isActive": false,
"user": "user#domain.com"
}
]
}
]
}
{
"id": 2,
"secondLevels": [
{
"thirdLevels": [
{
"isActive": true,
"user": "user#domain.com"
}
]
}
]
}
In this case, I want to only match documents (in this case ID: 2) that have a nested document with both isActive: true AND user: user#domain.com.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "secondLevels.thirdLevels",
"query": {
"bool": {
"must": [
{
"term": {
"secondLevels.thirdLevels.isActive": true
}
},
{
"term": {
"secondLevels.thirdLevels.user": "user#domain.com"
}
}
]
}
}
}
}
]
}
}
}
However, what seems to be happening is that my query turns up both documents because the first document has one thirdLevel that has isActive: true and another thirdLevel that has the appropriate user.
Is there any way to enforce this strictly at query/filter time or do I have to do this in a script?
With nested-objects and nested-query, you have made most of the way.
All you have to do now is to add the inner hits flag and also use source filtering for move entire secondLevels documents out of the way:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "secondLevels.thirdLevels",
"query": {
"bool": {
"must": [
{
"term": {
"secondLevels.thirdLevels.isActive": true
}
},
{
"term": {
"secondLevels.thirdLevels.user": "user#domain.com"
}
}
]
}
},
"inner_hits": {
"size": 100
}
}
}
]
}
}
}

Elasticsearch - generic facets structure - calculating aggregations combined with filters

In a new project of ours, we were inspired by this article http://project-a.github.io/on-site-search-design-patterns-for-e-commerce/#generic-faceted-search for doing our “facet” structure. And while I have got it working to the extent the article describes, I have run into issues in getting it to work when selecting facets. I hope someone can give a hint as to something to try, so I don’t have to redo all our aggregations into separate aggregation calculations again.
The problem is basically that we are using a single aggregation to calculate all the “facets” at once, but when I add a filter (fx. checking a brand name), then it “removes” all the other brands when returning the aggregates. What I basically want is that it should use that brand as filter when calculating the other facets, but not when calculating the brand aggregations. This is necessary so the user can, for example, choose multiple brands.
Looking at https://www.contorion.de/search/Metabo_Fein/ou1-ou2?q=Winkelschleifer&c=bovy (which is the site described in the above article), I have selected the “Metabo” and “Fein” manufacturer (Hersteller), and unfolding the Hersteller menu it shows all manufacturers and not just the ones selected. So I know it’s possible somehow and I hope some one out there has a hint as to how to write the aggregations / filters, so I get the "correct e-commerce facet behavior".
On the products in ES I have the following structure: (the same as in the original article, though “C#’ified” in naming)
"attributeStrings": [
{
"facetName": "Property",
"facetValue": "Organic"
},
{
"facetName": "Property",
"facetValue": "Without parfume"
},
{
"facetName": "Brand",
"facetValue": "Adidas"
}
]
So the above product has 2 attributes/facet groups – Property with 2 values (Organic, Without parfume) and Brand with 1 value (Adidas).
Without any filters I calculate the aggregations from the following query:
"aggs": {
"agg_attr_strings_filter": {
"filter": {},
"aggs": {
"agg_attr_strings": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"attr_name": {
"terms": {
"field": "attributeStrings.facetName"
},
"aggs": {
"attr_value": {
"terms": {
"field": "attributeStrings.facetValue",
"size": 1000,
"order": [
{
"_term": "asc"
}
]
} } } } } } } }
Now if I select Property "Organic" and Brand "Adidas" I build the same aggregation, but with a filter to apply those two constraints (which is were it kind of goes wrong...):
"aggs": {
"agg_attr_strings_filter": {
"filter": {
"bool": {
"filter": [
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Property"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Organic"
]
}
}
]
}
},
"path": "attributeStrings"
}
},
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Brand"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Adidas"
]
}
}
]
}
},
"path": "attributeStrings"
}
}
]
}
},
"aggs": {
"agg_attr_strings": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"attr_name": {
"terms": {
"field": "attributeStrings.facetName",
},
"aggs": {
"attr_value": {
"terms": {
"field": "attributeStrings.facetValue",
"size": 1000,
"order": [
{
"_term": "asc"
}
]
} } } } } } } }
The only way I can see forward with this model, is to calculate the aggregation for each selected facet and somehow merge the result. But it seems very complex and kind of defeats the point of having the model as described in the article, so I hope there's a more clean solution and someone can give a hint at something to try.
The only way I can see forward with this model, is to calculate the aggregation for each selected facet and somehow merge the result.
This is exactly right. If one facet (e.g. brand) is selected than you can not use global brand filter if you also want to fetch other brands for multi-selection. What you can do is apply all other filters on selected facets, and all filters on non-selected facets. As a results you will have n+1 separate aggregations for n selected filters - first one is for all facets and the rest are for selected facets.
In your case query might look like:
{
"aggs": {
"agg_attr_strings_filter": {
"filter": {
"bool": {
"filter": [
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Property"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Organic"
]
}
}
]
}
},
"path": "attributeStrings"
}
},
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Brand"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Adidas"
]
}
}
]
}
},
"path": "attributeStrings"
}
}
]
}
},
"aggs": {
"agg_attr_strings": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"attr_name": {
"terms": {
"field": "attributeStrings.facetName"
},
"aggs": {
"attr_value": {
"terms": {
"field": "attributeStrings.facetValue",
"size": 1000,
"order": [
{
"_term": "asc"
}
]
}
}
}
}
}
}
}
},
"special_agg_property": {
"filter": {
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Brand"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Adidas"
]
}
}
]
}
},
"path": "attributeStrings"
}
},
"aggs": {
"special_agg_property": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"agg_filtered_special": {
"filter": {
"query": {
"match": {
"attributeStrings.facetName": "Property"
}
}
},
"aggs": {
"facet_value": {
"terms": {
"size": 1000,
"field": "attributeStrings.facetValue"
}
}
}
}
}
}
}
},
"special_agg_brand": {
"filter": {
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Property"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Organic"
]
}
}
]
}
},
"path": "attributeStrings"
}
},
"aggs": {
"special_agg_brand": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"agg_filtered_special": {
"filter": {
"query": {
"match": {
"attributeStrings.facetName": "Brand"
}
}
},
"aggs": {
"facet_value": {
"terms": {
"size": 1000,
"field": "attributeStrings.facetValue"
}
}
}
}
}
}
}
}
}
}
This query looks super big and scary but generating such query can be done with few dozen lines of code.
When parsing query results, you need to first parse general aggregation (one that uses all filters) and after special facet aggregations. From the upper example, first parse results from agg_attr_strings_filter but those results will also contain aggregation values for Brand and Property that should be overwritten by aggregation values from special_agg_property and special_agg_brand
Also, this query is efficient since Elasticsearch does good job in caching separate filter clauses so applying same filters in different parts of query should be cheap.
But it seems very complex and kind of defeats the point of having the model as described in the article, so I hope there's a more clean solution and someone can give a hint at something to try.
There is really no way around the fact that you need to apply different filters to different facets and at the same time have different query filters. If you need to support "correct e-commerce facet behavior" you will have complex query :)
Disclaimer: I'm coauthor of the mentioned article.
The issue comes from the fact that you are adding a filter on Property and Organic inside your aggregation, hence the more facets you pick, the more you will restrain the terms you will get. In that article, the filter they use is in fact a post_filter, both names were allowed until recently, but filter got removed because that was causing ambiguities.
What you need to do is to move that filter outside the aggregations into the post_filter section, so that the results get correctly filtered out by whatever facets have been picked, but all your facets still get computed correctly on the whole document set.
{
"post_filter": {
"bool": {
"filter": [
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Property"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Organic"
]
}
}
]
}
},
"path": "attributeStrings"
}
},
{
"nested": {
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Brand"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Adidas"
]
}
}
]
}
},
"path": "attributeStrings"
}
}
]
}
},
"aggs": {
"agg_attr_strings_full": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"attr_name": {
"terms": {
"field": "attributeStrings.facetName"
},
"aggs": {
"attr_value": {
"terms": {
"field": "attributeStrings.facetValue",
"size": 1000,
"order": [
{
"_term": "asc"
}
]
}
}
}
}
}
},
"agg_attr_strings_filtered": {
"filter": {
"bool": {
"filter": [
{
"nested": {
"path": "attributeStrings",
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Property"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Organic"
]
}
}
]
}
}
}
},
{
"nested": {
"path": "attributeStrings",
"query": {
"bool": {
"filter": [
{
"term": {
"attributeStrings.facetName": {
"value": "Brand"
}
}
},
{
"terms": {
"attributeStrings.facetValue": [
"Adidas"
]
}
}
]
}
}
}
}
]
}
},
"aggs": {
"nested": {
"path": "attributeStrings"
},
"aggs": {
"attr_name": {
"terms": {
"field": "attributeStrings.facetName"
},
"aggs": {
"attr_value": {
"terms": {
"field": "attributeStrings.facetValue",
"size": 1000,
"order": [
{
"_term": "asc"
}
]
}
}
}
}
}
}
}
}
}

Elasticsearch must_not filter not works with a big bunch of values

I have the next query that include some filters:
{
"from": 0,
"query": {
"function_score": {
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"idpais": [
115
]
}
},
{
"term": {
"tipo": [
1
]
}
}
],
"must_not": [
{
"term": {
"idregistro": [
5912471,
3433876,
9814443,
11703069,
6333176,
8288242,
9924922,
6677850,
11852501,
12530205,
4703469,
12776479,
12287659,
11823679,
12456304,
12777457,
10977614,
...
]
}
}
]
}
},
"query": {
"bool": {
"should": [
{
"match_phrase": {
"area": "Coordinator"
}
},
{
"match_phrase": {
"company": {
"boost": 5,
"query": "IBM"
}
}
},
{
"match_phrase": {
"topic": "IT and internet stuff"
}
},
{
"match_phrase": {
"institution": {
"boost": 5,
"query": "University of my city"
}
}
}
]
}
}
}
},
"script_score": {
"params": {
"idpais": 115,
"idprovincia": 0,
"relationships": []
},
"script_id": "ScoreUsuarios"
}
}
},
"size": 24,
"sort": [
{
"_script": {
"order": "desc",
"script_id": "SortUsuarios",
"type": "number"
}
}
]
}
The must_not filter has a big bunch of values to exclude (around 200 values), but it looks like elasticsearch ignores those values and it includes on the result set. If I try to set only a few values (10 to 20 values) then elasticsearch applies the must_not filter.
Exists some restriction a bout the amount of values in the filters? Exists some way to remove a big amount of results from the query?
terms query is used for passing a list of values not term query.You have to use it like below in your must filter.
{
"query": {
"terms": {
"field_name": [
"VALUE1",
"VALUE2"
]
}
}
}

Resources