Elasticsearch Aggregations: filtering a global aggregation with nested queries

I have a 'nested' mapping like so:
"stringAttributes":{
"type":"nested",
"properties":{
"Name":{
"type":"keyword"
},
"Value":{
"type":"keyword"
}
}
},
and thus have docs such as:
"stringAttributes":[
{
"Name":"supplier",
"Value":"boohoo"
},
{
"Name":"brand",
"Value":"gucci"
},
{
"Name":"primaryColour",
"Value":"black"
},
{
"Name":"secondaryColour",
"Value":"green"
},
{
"Name":"size",
"Value":"12"
}
]
In building faceted search I believe I need a global aggregation. That is, when a user filters by a supplier, the result set will not contain docs from other suppliers, so a regular aggregation will not contain any of the other suppliers.
The query could include the following clauses:
"must": [
{
"nested": {
"path": "stringAttributes",
"query": {
"bool": {
"must": [
{
"term": {
"stringAttributes.Name": "supplier"
}
},
{
"terms": {
"stringAttributes.Value": [
"boohoo"
]
}
}
]
}
}
}
},
{
"nested": {
"path": "stringAttributes",
"query": {
"bool": {
"must": [
{
"term": {
"stringAttributes.Name": "brand"
}
},
{
"terms": {
"stringAttributes.Value": [
"warehouse"
]
}
}
]
}
}
}
}
]
So in this case I need a global aggregation that is then filtered by all OTHER filters applied (e.g. by brand) that will return the other suppliers that could be selected given these other filters.
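To make the rule concrete, here is a plain-Python model of it (not Elasticsearch code). A facet is counted over the documents that match every selection except the facet's own. The products are made up for illustration. The matches helper mimics the nested clauses by requiring Name and Value to come from the same attribute pair.

```python
from collections import Counter

# Hypothetical products, each a list of (Name, Value) attribute pairs.
products = [
    [("supplier", "boohoo"), ("brand", "warehouse")],
    [("supplier", "asos"), ("brand", "warehouse")],
    [("supplier", "asos"), ("brand", "gucci")],
    [("supplier", "zara"), ("brand", "warehouse")],
]
selected = {"supplier": {"boohoo"}, "brand": {"warehouse"}}

def matches(product, selections):
    """True if, for every selected attribute, one pair has that Name and a selected Value."""
    return all(any(n == name and v in values for n, v in product)
               for name, values in selections.items())

def facet_counts(facet, selections):
    """Filter by every selection except the facet's own, then count the facet's values."""
    others = {k: v for k, v in selections.items() if k != facet}
    return Counter(v for p in products if matches(p, others) for n, v in p if n == facet)

# A regular aggregation only sees documents matching ALL selections.
regular = Counter(v for p in products if matches(p, selected) for n, v in p if n == "supplier")
print(dict(regular))                             # {'boohoo': 1}
print(dict(facet_counts("supplier", selected)))  # {'boohoo': 1, 'asos': 1, 'zara': 1}
```

The first count is what a regular aggregation under the full query returns. The second is what the global-plus-filter aggregation is meant to produce.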
This is what I have so far. However, it returns the 'global', unfiltered results. At this point I am completely stumped.
{
"global":{},
"aggs":{
"inner":{
"filter":{
"nested":{
"query":{
"bool":{
"filter":[
{
"term":{
"stringAttributes.Name":{
"value":"brand"
}
}
},
{
"terms":{
"stringAttributes.Value":[
"warehouse"
]
}
}
]
}
},
"path":"stringAttributes"
}
}
},
"aggs":{
"nested":{
"path":"stringAttributes"
},
"aggs":{
"aggs":{
"filter":{
"match":{
"stringAttributes.Name":"supplier"
}
},
"aggs":{
"facet_value":{
"terms":{
"size":1000,
"field":"stringAttributes.Value"
}
}
}
}
}
}
}
}
Any suggestions for filtering a global aggregation with nested attributes? I have read through a lot of documentation and various other answers on SO, but I am still struggling to understand why this particular aggregation is not being filtered.

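My suggested answer after some more digging: the nested aggregation has to be a named sub-aggregation of the "inner" filter aggregation (inside its "aggs"), not a sibling of it. Otherwise it only sees the unfiltered global bucket.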
{
"global":{
},
"aggs":{
"inner":{
"filter":{
"nested":{
"query":{
"bool":{
"filter":[
{
"term":{
"stringAttributes.Name":{
"value":"brand"
}
}
},
{
"terms":{
"stringAttributes.Value":[
"warehouse"
]
}
}
]
}
},
"path":"stringAttributes"
}
},
"aggs":{
"attributes":{
"nested":{
"path":"stringAttributes"
},
"aggs":{
"agg_filtered_special":{
"filter":{
"match":{
"stringAttributes.Name":"supplier"
}
},
"aggs":{
"facet_value":{
"terms":{
"size":1000,
"field":"stringAttributes.Value"
}
}
}
}
}
}
}
}
}
}
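Below is a Python sketch (stdlib only) that generates this kind of request for any number of facets. It assumes Elasticsearch 5+ and the keyword mapping above. Each facet gets its own global aggregation, whose filter contains every selected attribute except the facet itself. The nested aggregation is a named sub-aggregation of that filter. The facet_<name>, attributes and only_facet names are hypothetical. The supplier match is written as a term, which behaves the same on a keyword field.

```python
import json

def attribute_clause(name, values):
    """Nested clause: Name and Value must match within the same stringAttributes object."""
    return {
        "nested": {
            "path": "stringAttributes",
            "query": {"bool": {"filter": [
                {"term": {"stringAttributes.Name": name}},
                {"terms": {"stringAttributes.Value": list(values)}},
            ]}},
        }
    }

def facet_agg(facet, selected, size=1000):
    """Global aggregation for one facet, filtered by every OTHER selected attribute."""
    others = [attribute_clause(n, v) for n, v in selected.items() if n != facet]
    return {
        "global": {},  # ignore the main query, which also filters on this facet
        "aggs": {
            "inner": {
                "filter": {"bool": {"filter": others}} if others else {"match_all": {}},
                # the nested aggregation is a NAMED sub-aggregation of "inner"
                "aggs": {
                    "attributes": {
                        "nested": {"path": "stringAttributes"},
                        "aggs": {
                            "only_facet": {
                                "filter": {"term": {"stringAttributes.Name": facet}},
                                "aggs": {
                                    "facet_value": {
                                        "terms": {"field": "stringAttributes.Value", "size": size}
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }

selected = {"supplier": ["boohoo"], "brand": ["warehouse"]}
body = {
    "query": {"bool": {"filter": [attribute_clause(n, v) for n, v in selected.items()]}},
    "aggs": {f"facet_{name}": facet_agg(name, selected) for name in ("supplier", "brand")},
}
print(json.dumps(body, indent=2))
```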

Related

elasticsearch nested query, more than one object should meet the conditions

I have some questions about nested queries.
Here is my example. The mapping is {"user":"nested"}. The existing data looks like this:
{
"user": [
{
"first":"John",
"last":"Smith"
},
{
"first":"Alice",
"last":"White"
}
]
}
How do I create a query that finds this document only when all of these conditions are met:
the first object of user has "first" equal to "John" and "last" equal to "Smith";
the second object of user has "first" equal to "Alice" and "last" equal to "White".
Try the query below:
{
"query":{
"bool":{
"filter":[
{
"bool":{
"must":[
{
"bool":{
"must":[
{
"nested":{
"query":{
"bool":{
"must":[
{
"match_phrase":{
"user.first":{
"query":"John"
}
}
},
{
"match_phrase":{
"user.last":{
"query":"Smith"
}
}
}
]
}
},
"path":"user"
}
},
{
"nested":{
"query":{
"bool":{
"must":[
{
"match_phrase":{
"user.first":{
"query":"Alice"
}
}
},
{
"match_phrase":{
"user.last":{
"query":"White"
}
}
}
]
}
},
"path":"user"
}
}
]
}
}
]
}
}
]
}
}
}
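The query below is what you are looking for. You need two nested queries, one for each condition you've mentioned, combined in a bool query's must clause. Each nested query is evaluated against a single nested object, so "first" and "last" must match within the same object, while the two separate nested clauses can be satisfied by different objects.
Note that I'm assuming the fields user.first and user.last are of type text and use the standard analyzer.
POST <your_index_name>/_search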
{
"query":{
"bool":{
"must":[
{
"nested":{
"path":"user",
"query":{
"bool":{
"must":[
{
"match":{
"user.first":"john"
}
},
{
"match":{
"user.last":"smith"
}
}
]
}
}
}
},
{
"nested":{
"path":"user",
"query":{
"bool":{
"must":[
{
"match":{
"user.first":"alice"
}
},
{
"match":{
"user.last":"white"
}
}
]
}
}
}
}
]
}
}
}
Hope this helps!
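The plain-Python model below shows why two separate nested clauses are needed. It is a simplification, not Elasticsearch's implementation, and the values are shown lowercased as the standard analyzer would index them. A nested clause needs one object to satisfy all of its conditions. A plain object field, by contrast, is flattened, so its conditions can be met by different objects.

```python
# In-memory stand-in for the sample document's "user" array.
doc = {"user": [{"first": "john", "last": "smith"},
                {"first": "alice", "last": "white"}]}

def nested_match(objects, conditions):
    """Nested semantics: ONE object must satisfy every condition."""
    return any(all(obj.get(f) == v for f, v in conditions.items()) for obj in objects)

def flattened_match(objects, conditions):
    """Object (non-nested) semantics: each condition may be met by a different object."""
    return all(any(obj.get(f) == v for obj in objects) for f, v in conditions.items())

john = {"first": "john", "last": "smith"}
alice = {"first": "alice", "last": "white"}
mixed = {"first": "john", "last": "white"}

# Two nested clauses in a bool "must": both must hold, each against its own object.
both = nested_match(doc["user"], john) and nested_match(doc["user"], alice)
print(both)                                 # True
print(nested_match(doc["user"], mixed))     # False: no single object is John White
print(flattened_match(doc["user"], mixed))  # True: flattening loses the pairing
```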
The answer is:
{
"query": {
"bool": {
"must": [
{
"has_parent": {
"parent_type": "doc",
"query": {
"bool": {
"must": [
{
"terms": {
"id": [
713
]
}
},
{
"range": {
"created": {
"lte": "now/d"
}
}
},
{
"range": {
"expires": {
"gte": "now/d"
}
}
}
]
}
}
}
},
{
"nested": {
"path": "prices",
"query": {
"bool": {
"filter": [
{
"term": {
"prices.id_prcknd": 167
}
}
]
}
}
}
},
{
"term": {
"doc_type": "item"
}
},
{
"bool": {
"should": [
{
"term": {
"have_prices": true
}
},
{
"term": {
"is_folder": true
}
}
]
}
}
],
"must_not": {
"exists": {
"field": "folder"
}
}
}
},
"sort": [
{
"is_folder": {
"order": "desc"
}
},
{
"title_low.order": {
"order": "asc"
}
}
],
"size": 1000
}

Filtering nested aggregations in ElasticSearch

Given the following mapping from my index (Items):
{
"title": {
"type":"string"
},
"tag_groups": {
"type":"nested",
"include_in_parent":true,
"properties": {
"name":{
"type":"string",
"index":"not_analyzed"
},
"terms": {
"type":"string",
"index":"not_analyzed"
}
}
}
}
And the following sample of data that each document in the index follows:
{
"title":"Christian Louboutin Magenta Leather Lady Peep",
"tag_groups": [
{
"name": "Color",
"terms": ["pink"]
},
{
"name":"Material/Fabric",
"terms":["leather"]
},
{
"name":"Season",
"terms":["summer", "spring"]
},
{
"name":"Occasion",
"terms":["cocktail", "night out", "wedding: for the guests", "date night"]
}
]
}
IMPORTANT: These tag_groups are variable from product to product and category to category. So pulling them out of the nested property would be tough since it would create index properties that don't apply to all documents in the index.
Here is my query that is producing the correct aggregated results across each tag_groups.name and corresponding set of values. Counts are accurate too.
{
"size":"40",
"query": {
"filtered": {
"query": {"match_all": {}}
}
},
"aggs":{
"tagGroupAgg": {
"nested": {
"path":"tag_groups"
},
"aggs":{
"tagGroupNameAgg":{
"terms":{
"field":"tag_groups.name"
},
"aggs":{
"tagGroupTermsAgg":{
"terms": {
"field":"tag_groups.terms"
}
}
}
}
}
}
}
}
NOW FOR THE QUESTION...
In order for the aggregation counts on the left to be accurate when I apply a TermsFilter to the aggregation (tag_groups.Color = ['pink']), how do I make sure that this filter isn't applied to the tag_groups.Color aggregation itself?
Currently, when I apply that filter I lose all of my tag_groups.Color values (except pink), which prevents the user from searching for other colors...
I'm hitting a wall on this one. Any help would be much appreciated!
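One common way to get this behavior is sketched below as a Python request-body builder. This swaps in a post_filter plus per-facet filter aggregations, and it is not the reverse_nested approach. The selected tag filters move into post_filter, so they only trim the hits. Each facet is wrapped in a filter aggregation containing every selection except that facet's own. The sketch assumes tag_groups.name and tag_groups.terms are not_analyzed, as in the mapping above. The aggregation names tag_groups, this_group and values are hypothetical.

```python
import json

def tag_clause(group, terms):
    """Nested clause: name and terms must match within the same tag_groups object."""
    return {
        "nested": {
            "path": "tag_groups",
            "query": {"bool": {"filter": [
                {"term": {"tag_groups.name": group}},
                {"terms": {"tag_groups.terms": list(terms)}},
            ]}},
        }
    }

def all_of(clauses):
    """Combine clauses in filter context; match_all when there are none."""
    return {"bool": {"filter": clauses}} if clauses else {"match_all": {}}

def faceted_body(selected, groups, size=40):
    aggs = {}
    for group in groups:
        # every selection EXCEPT this group's own filter narrows this facet
        others = [tag_clause(g, t) for g, t in selected.items() if g != group]
        aggs[group] = {
            "filter": all_of(others),
            "aggs": {
                "tag_groups": {
                    "nested": {"path": "tag_groups"},
                    "aggs": {
                        "this_group": {
                            "filter": {"term": {"tag_groups.name": group}},
                            "aggs": {"values": {"terms": {"field": "tag_groups.terms"}}},
                        }
                    },
                }
            },
        }
    return {
        "size": size,
        "query": {"match_all": {}},
        # post_filter narrows the hits only, after the aggregations are computed
        "post_filter": all_of([tag_clause(g, t) for g, t in selected.items()]),
        "aggs": aggs,
    }

body = faceted_body({"Color": ["pink"], "Season": ["summer"]}, ["Color", "Season"])
print(json.dumps(body["aggs"]["Color"]["filter"], indent=2))
```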
{
"size":"40",
"query":{
"filtered":{
"query":{
"match_all":{
}
}
}
},
"aggs":{
"tagGroupAgg":{
"nested":{
"path":"tag_groups"
},
"aggs":{
"tagGroupNameAgg":{
"terms":{
"field":"tag_groups.name"
},
"aggs":{
"tagGroupTermsAgg":{
"terms":{
"field":"tag_groups.terms"
},
"aggs":{
"tagGroupTermsReverseAgg":{
"reverse_nested":{
},
"aggs":{
"testingReverseFilter":{
"filter":{
"bool":{
"must":[
{
"terms":{
"tag_groups.name":[
"Color"
]
}
},
{
"terms":{
"brand_name.raw":[
"Chanel"
]
}
}
]
}
},
"aggs":{
"tagGroupTermsAgg2":{
"terms":{
"field":"tag_groups.terms"
}
}
}
}
}
}
}
}
}
}
}
}
}
}

Elastic search filtered query, query part being ignored?

I'm building up the following search in code. The idea is that the filter narrows down the set of matches and the query then runs against that set, so I can add score based on certain fields. For some reason the filter part works, but whatever I put in the query (e.g. below, where I have no field named sdfsdfsdf), it still returns everything matching the filter.
Is the syntax wrong?
{
"query":{
"filtered":{
"query":{
"bool":{
"must":{
"match":{
"sdfsdfsdf":{
"query":"4",
"boost":2.0
}
}
}
},
"filter":{
"bool":{
"must":[
{
"terms":{
"_id":[
"55f93ead5df34f1900abc20b",
"55f8ab0226ec4bb216d7c938",
"55dc4e949dcf833308c63d6b"
]
}
},
{
"range":{
"published_date":{
"lte":"now"
}
}
}
],
"must_not":{
"terms":{
"_id":[
"55f0a799acccc28204a5058c"
]
}
}
}
}
}
}
}
}
Your filter is not at the right level. It should not be inside query; it must sit at the same level as query, like this:
{
"query": {
"filtered": {
"query": { // query and filter at the same level
"bool": {
"must": {
"match": {
"sdfsdfsdf": {
"query": "4",
"boost": 2
}
}
}
}
},
"filter": { // query and filter at the same level
"bool": {
"must": [
{
"terms": {
"_id": [
"55f93ead5df34f1900abc20b",
"55f8ab0226ec4bb216d7c938",
"55dc4e949dcf833308c63d6b"
]
}
},
{
"range": {
"published_date": {
"lte": "now"
}
}
}
],
"must_not": {
"terms": {
"_id": [
"55f0a799acccc28204a5058c"
]
}
}
}
}
}
}
}
You also need to replace sdfsdfsdf with an existing field name in your type, e.g. title; a match query on a field that is not in the mapping matches no documents.
"match":{
"title":{
"query": "some text here",
"boost":2.0
}
}
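The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. For newer versions, here is a minimal Python sketch of the same request as a bool query. The scored match goes in must, while the _id and date conditions go in filter and must_not, which are siblings in the same bool object. The title field is the example field from the answer and is assumed to exist in your mapping.

```python
import json

def scored_with_filters(field, text, boost, ids, excluded_ids):
    """bool query: 'must' is scored; 'filter' and 'must_not' only include/exclude."""
    return {
        "query": {
            "bool": {
                "must": {"match": {field: {"query": text, "boost": boost}}},
                "filter": [
                    {"terms": {"_id": list(ids)}},
                    {"range": {"published_date": {"lte": "now"}}},
                ],
                "must_not": {"terms": {"_id": list(excluded_ids)}},
            }
        }
    }

body = scored_with_filters(
    "title", "some text here", 2.0,
    ["55f93ead5df34f1900abc20b", "55f8ab0226ec4bb216d7c938", "55dc4e949dcf833308c63d6b"],
    ["55f0a799acccc28204a5058c"],
)
print(json.dumps(body, indent=2))
```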

How to filter an elasticsearch global aggregation?

What I want to achieve: I want my "age" aggregation to not be filtered by the query filter and I want to be able to apply filters to it.
So if I start with this query:
{
"query":{
"filtered":{
"filter":{ "terms":{ "family_name":["Brown"] } } //filter_1
}
},
"aggs":{
"young_age":{
"filter":{
"range":{ "age":{ "lt":40, "gt":18 } } //filter_2
},
"aggs":{
"age":{
"terms":{
"field":"age"
}
}
}
}
}
}
My aggregation "young_age" will be filtered by both filter_1 and filter_2. I don't want my aggregation to be filtered by filter_1.
As I was looking through the documentation, I thought a global aggregation would solve my problem, so I wrote this query:
{
"query":{
"filtered":{
"filter":{ "terms":{ "family_name":["Brown"] } } //filter_1
}
},
"aggs":{
"young_age":{
"global":{}, //<----------- add global
"filter":{
"range":{ "age":{ "lt":40, "gt":18 } } //filter_2
},
"aggs":{
"age":{
"terms":{
"field":"age"
}
}
}
}
}
}
But then elastic search complains about my filter_2:
"""
Found two aggregation type definitions [age] in [global] and [filter]
"""
And of course if I remove the filter_2:
{
"query":{
"filtered":{
"filter":{
"terms":{
"family_name":["Brown"]
}
}
}
},
"aggs":{
"young_age":{
"global":{},
"aggs":{
"age":{
"terms":{
"field":"age"
}
}
}
}
}
}
Then my aggregation won't be filtered by filter_1 (as expected).
So how am I supposed to apply filter_2 to my global aggregation? Or how else am I supposed to achieve that? I remember writing something similar with facet filters...
In my opinion this is the typical use case of a post_filter. As the doc says:
The post_filter is applied to the search hits at the very end of a search request, after aggregations have already been calculated
Your query will look like:
{
"post_filter":{
"terms":{
"family_name":["Brown"] //filter_1
}
},
"aggs":{
"young_age":{
"filter":{
"range":{ "age":{ "lt":40, "gt":18 } } //filter_2
},
"aggs":{
"age":{
"terms":{
"field":"age"
}
}
}
}
}
}
In this case, with no query, the search matches every document in the index. The aggregation is then calculated over all of those documents, so filter_1 does not affect it. Only after that is the post_filter (filter_1) applied, and it narrows down the returned hits only.
Edit: since you said in your comment that you have many aggregations and only one of them shouldn't be affected by filter_1, I fixed your query using a global aggregation:
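The order of operations described above can be modeled in plain Python. This is an illustration with made-up documents, not how Elasticsearch executes it internally. Aggregations are computed over everything the query matched, and the post_filter only trims the hits afterwards.

```python
from collections import Counter

# Hypothetical documents; only the fields used by filter_1 and filter_2.
docs = [
    {"family_name": "Brown", "age": 25},
    {"family_name": "Brown", "age": 60},
    {"family_name": "Smith", "age": 30},
    {"family_name": "Smith", "age": 30},
]

def filter_1(d):
    return d["family_name"] == "Brown"

def filter_2(d):
    return 18 < d["age"] < 40

def search(docs, query=lambda d: True, post_filter=lambda d: True):
    matched = [d for d in docs if query(d)]
    # the "young_age" aggregation sees every document the query matched...
    young_age = Counter(d["age"] for d in matched if filter_2(d))
    # ...and the post_filter only trims the hits afterwards
    hits = [d for d in matched if post_filter(d)]
    return hits, young_age

hits, aggs = search(docs, post_filter=filter_1)
print(len(hits), dict(aggs))  # 2 {25: 1, 30: 2}
```

Passing filter_1 as the query instead (search(docs, query=filter_1)) would shrink the age counts to {25: 1}, which is the behavior the question is trying to avoid.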
{
"query": {
"filtered": {
"filter": {
"term": {
"family_name": "Brown"
}
}
}
},
"aggs": {
"young_age": {
"global": {},
"aggs": {
"filter2": {
"filter": {
"range": {
"age": {
"lt": 40,
"gt": 18
}
}
},
"aggs": {
"age": {
"terms": {
"field": "age"
}
}
}
}
}
}
}
}
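Here is a small Python sketch of the request body for this case. It uses a bool filter in place of filtered, which was removed in Elasticsearch 5.0. The other aggregations stay scoped to the query, while young_age starts from global and applies only filter_2. The by_city aggregation and the city field are hypothetical stand-ins for the other aggregations mentioned in the comment.

```python
import json

def body_with_global_facet(query_filter, facet_filter, other_aggs):
    """Query-scoped aggregations plus one 'young_age' facet that ignores the query."""
    aggs = dict(other_aggs)  # these still only see documents matched by the query
    aggs["young_age"] = {
        "global": {},  # start again from every document in the index
        "aggs": {
            "filter2": {
                "filter": facet_filter,  # then apply only the facet's own filter
                "aggs": {"age": {"terms": {"field": "age"}}},
            }
        },
    }
    return {"query": {"bool": {"filter": query_filter}}, "aggs": aggs}

body = body_with_global_facet(
    query_filter={"term": {"family_name": "Brown"}},        # filter_1
    facet_filter={"range": {"age": {"gt": 18, "lt": 40}}},  # filter_2
    other_aggs={"by_city": {"terms": {"field": "city"}}},   # hypothetical
)
print(json.dumps(body, indent=2))
```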
A global aggregation and a filter aggregation are not allowed at the same level. You have to put the filter one level inside the global aggregation, as a named sub-aggregation.
Something like this should work for you:
{
"query":{
"filtered":{
"filter":{
"terms":{
"family_name":["Brown"]
}
}
}
},
"aggs":{
"young_age":{
"global":{},
"aggs":{
"young_filter":{
"filter":{ "range":{ "age":{ "lt":40, "gt":18 } } },
"aggs":{
"age":{
"terms":{
"field":"age"
}
}
}
}
}
}
}
}

Which DSL is correct for performing a pre-filtered query?

I've looked back at some queries I have saved, and it appears I've managed to achieve essentially the same query in three different ways. They all return the same data, but which one is 'correct'? I.e., which one contains no superfluous code and is most performant?
Option 1
{
"query":{
"bool":{
"must":[
{
"match":{
"event":"eventname"
}
},
{
"range":{
"#timestamp":{
"gt":"now-70s"
}
}
}
]
}
},
"aggs":{
"myterms":{
"terms":{
"field":"fieldname"
}
}
}
}
Option 2
{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"match":{
"event":"eventname"
}
},
{
"range":{
"#timestamp":{
"gt":"now-70s"
}
}
}
]
}
}
}
},
"aggs":{
"myterms":{
"terms":{
"field":"fieldname"
}
}
}
}
Option 3
{
"query":{
"filtered":{
"query":{
"bool":{
"must":[
{
"match":{
"event":"eventname"
}
},
{
"range":{
"#timestamp":{
"gt":"now-70s"
}
}
}
]
}
}
}
},
"aggs":{
"myterms":{
"terms":{
"field":"fieldname"
}
}
}
}
If I were to guess, I'd go for Option 2, as the others appear to run match as a query. But the documentation is pretty confusing regarding the correct form that DSL queries should take.
Based on your comment, I'd go for option 2, but with a simple term filter for starters instead of match, which isn't allowed in filters (in the pre-2.0 filter DSL used here).
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"event": "eventname"
}
},
{
"range": {
"#timestamp": {
"gt": "now-70s"
}
}
}
]
}
}
}
},
"aggs": {
"myterms": {
"terms": {
"field": "event"
}
}
}
}
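Since Elasticsearch 2.0, queries and filters share one DSL, and filtered is gone as of 5.0. Option 2 would therefore be written as a bool query whose clauses sit in filter context, where they do not affect scoring. Below is a Python sketch of that body. It assumes event is a not_analyzed (keyword) field, so the term clause matches exactly. The #timestamp field is kept as spelled in the question.

```python
import json

def recent_event_terms(event, since="now-70s", agg_field="event"):
    """Option 2 in ES 2.x+ form: both clauses in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"event": event}},
                    {"range": {"#timestamp": {"gt": since}}},
                ]
            }
        },
        "aggs": {"myterms": {"terms": {"field": agg_field}}},
    }

print(json.dumps(recent_event_terms("eventname"), indent=2))
```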
