Filtering nested aggregations in Elasticsearch

Given the following mapping from my index (Items):
{
"title": {
"type":"string"
},
"tag_groups": {
"type":"nested",
"include_in_parent":true,
"properties": {
"name":{
"type":"string",
"index":"not_analyzed"
},
"terms": {
"type":"string",
"index":"not_analyzed"
}
}
}
}
And the following sample of data that each document in the index follows:
{
"title":"Christian Louboutin Magenta Leather Lady Peep",
"tag_groups": [
{
"name": "Color",
"terms": ["pink"]
},
{
"name":"Material/Fabric",
"terms":["leather"]
},
{
"name":"Season",
"terms":["summer", "spring"]
},
{
"name":"Occasion",
"terms":["cocktail", "night out", "wedding: for the guests", "date night"]
}
]
}
IMPORTANT: These tag_groups are variable from product to product and category to category. So pulling them out of the nested property would be tough since it would create index properties that don't apply to all documents in the index.
Here is my query that is producing the correct aggregated results across each tag_groups.name and corresponding set of values. Counts are accurate too.
{
"size":"40",
"query": {
"filtered": {
"query": {"match_all": {}}
}
},
"aggs":{
"tagGroupAgg": {
"nested": {
"path":"tag_groups"
},
"aggs":{
"tagGroupNameAgg":{
"terms":{
"field":"tag_groups.name"
},
"aggs":{
"tagGroupTermsAgg":{
"terms": {
"field":"tag_groups.terms"
}
}
}
}
}
}
}
}
NOW FOR THE QUESTION...
For the aggregation counts on the left to stay accurate when I apply a terms filter to the aggregation (tag_groups.Color = ['pink']), how do I make sure that filter isn't applied to the tag_groups.Color results themselves?
Currently, when I apply that filter I lose every tag_groups.Color value except pink, which prevents the user from searching for other colors.
I'm hitting a wall on this one. Any help would be much appreciated!
{
"size":"40",
"query":{
"filtered":{
"query":{
"match_all":{
}
}
}
},
"aggs":{
"tagGroupAgg":{
"nested":{
"path":"tag_groups"
},
"aggs":{
"tagGroupNameAgg":{
"terms":{
"field":"tag_groups.name"
},
"aggs":{
"tagGroupTermsAgg":{
"terms":{
"field":"tag_groups.terms"
},
"aggs":{
"tagGroupTermsReverseAgg":{
"reverse_nested":{
},
"aggs":{
"testingReverseFilter":{
"filter":{
"bool":{
"must":[
{
"terms":{
"tag_groups.name":[
"Color"
]
}
},
{
"terms":{
"brand_name.raw":[
"Chanel"
]
}
}
]
}
},
"aggs":{
"tagGroupTermsAgg2":{
"terms":{
"field":"tag_groups.terms"
}
}
}
}
}
}
}
}
}
}
}
}
}
}

Related

Elasticsearch Aggregations: filtering a global aggregation with nested queries

I have 'nested' mapping like so:
"stringAttributes":{
"type":"nested",
"properties":{
"Name":{
"type":"keyword"
},
"Value":{
"type":"keyword"
}
}
},
and thus have docs such as:
stringAttributes:[
{
Name:"supplier",
Value:"boohoo"
},
{
Name:"brand",
Value:"gucci"
},
{
Name:"primaryColour",
Value:"black"
},
{
Name:"secondaryColour",
Value:"green"
},
{
Name:"size",
Value:"12"
}
]
In building faceted search I believe I need a global aggregation: when a user filters by a supplier, the result set will not contain docs from other suppliers, so a regular aggregation will not contain any of the other suppliers.
The query could include the following clauses:
"must": [
{
"nested": {
"path": "stringAttributes",
"query": {
"bool": {
"must": [
{
"term": {
"stringAttributes.Name": "supplier"
}
},
{
"terms": {
"stringAttributes.Value": [
"boohoo"
]
}
}
]
}
}
}
},
{
"nested": {
"path": "stringAttributes",
"query": {
"bool": {
"must": [
{
"term": {
"stringAttributes.Name": "brand"
}
},
{
"terms": {
"stringAttributes.Value": [
"warehouse"
]
}
}
]
}
}
}
}
]
So in this case I need a global aggregation that is then filtered by all OTHER filters applied (e.g. by brand), so that it returns the other suppliers that could still be selected given those filters.
This is what I have so far. However, it returns the 'global', unfiltered results. At this point I am completely stumped.
{
"global":{},
"aggs":{
"inner":{
"filter":{
"nested":{
"query":{
"bool":{
"filter":[
{
"term":{
"stringAttributes.Name":{
"value":"brand"
}
}
},
{
"terms":{
"stringAttributes.Value":[
"warehouse"
]
}
}
]
}
},
"path":"stringAttributes"
}
}
},
"aggs":{
"nested":{
"path":"stringAttributes"
},
"aggs":{
"aggs":{
"filter":{
"match":{
"stringAttributes.Name":"supplier"
}
},
"aggs":{
"facet_value":{
"terms":{
"size":1000,
"field":"stringAttributes.Value"
}
}
}
}
}
}
}
}
Any suggestions for filtering a global aggregation with nested attributes? I have read through a lot of documentation and various other answers on SO, but I am still struggling to understand why this particular agg is not being filtered.
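My suggested answer after some more digging: the nested aggregation has to be a sub-aggregation of "inner" rather than a sibling of it, otherwise the filter is never applied to it.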
{
"global":{
},
"aggs":{
"inner":{
"filter":{
"nested":{
"query":{
"bool":{
"filter":[
{
"term":{
"stringAttributes.Name":{
"value":"brand"
}
}
},
{
"terms":{
"stringAttributes.Value":[
"warehouse"
]
}
}
]
}
},
"path":"stringAttributes"
}
},
"aggs":{
"attributes":{
"nested":{
"path":"stringAttributes"
},
"aggs":{
"agg_filtered_special":{
"filter":{
"match":{
"stringAttributes.Name":"supplier"
}
},
"aggs":{
"facet_value":{
"terms":{
"size":1000,
"field":"stringAttributes.Value"
}
}
}
}
}
}
}
}
}
}
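The same shape can be generated for any facet. The following is only a sketch in Python that builds the aggregation as a plain dict; the helper names (nested_attr_query, facet_agg) and the aggregation names "attributes" and "only_facet" are made up for illustration and are not part of the original answer. It assumes the stringAttributes mapping shown in the question.

```python
def nested_attr_query(name, values, path="stringAttributes"):
    """Nested query: docs having attribute `name` with any of `values`."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "filter": [
                        {"term": {f"{path}.Name": name}},
                        {"terms": {f"{path}.Value": list(values)}},
                    ]
                }
            },
        }
    }


def facet_agg(facet, selected, path="stringAttributes", size=1000):
    """Global aggregation for one facet, filtered by every OTHER selection."""
    others = [
        nested_attr_query(name, values, path)
        for name, values in selected.items()
        if name != facet
    ]
    # With no other selections, fall back to match_all.
    outer_filter = {"bool": {"filter": others}} if others else {"match_all": {}}
    return {
        "global": {},
        "aggs": {
            "inner": {
                "filter": outer_filter,  # applied to whole documents
                "aggs": {
                    "attributes": {  # hypothetical name
                        "nested": {"path": path},
                        "aggs": {
                            "only_facet": {  # hypothetical name
                                "filter": {"term": {f"{path}.Name": facet}},
                                "aggs": {
                                    "facet_value": {
                                        "terms": {
                                            "field": f"{path}.Value",
                                            "size": size,
                                        }
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


selected = {"supplier": ["boohoo"], "brand": ["warehouse"]}
supplier_facet = facet_agg("supplier", selected)
```

Here supplier_facet is filtered only by the brand selection, so the other suppliers remain visible in the facet. The dict would go under a name of your choosing inside the request's top-level "aggs".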

Can I filter a subarray in Elasticsearch?

I have orders in Elasticsearch, with the order products for each order attached as a subarray. When aggregating prices I need to be able to filter the order products within my order documents.
Example of one of my documents in Elasticsearch:
{
"OrderID":4567488,
"projectId":"4",
"Project":"direkt",
"legacy_id":null,
"supporterId":null,
"Origin":"FR",
"orderProducts":[
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"30",
"Price":"26.95"
},
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"15",
"Price":"15.22"
},
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"123",
"Price":"24.55"
}
]
}
How I'm filtering right now:
{
"index":"order_index",
"from":0,
"size":100,
"body":{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"term":{
"orderProducts.brandNo":"30"
}
}
]
}
}
}
}
}
}
What I'm expecting:
{
"OrderID":4567488,
"projectId":"4",
"Project":"direkt",
"legacy_id":null,
"supporterId":null,
"Origin":"FR",
"orderProducts":[
{
"OrderProductID":"15694898",
"OrderID":"4567488",
"brandNo":"30",
"Price":"26.95"
}
]
}
What I'm actually getting:
The whole document.
Is it possible to filter subarray data?
UPD.
Yes, these are my schema mappings:
"mappings":{
"order":{
"dynamic_templates":[
{
"strings":{
"mapping":{
"type":"string",
"fields":{
"raw":{
"index":"not_analyzed",
"type":"string"
}
}
},
"match_mapping_type":"string"
}
}
],
"properties":{
"orderProducts":{
"include_in_parent":true,
"properties":{
"OrderProductID":{
"type":"long"
},
"OrderID":{
"type":"long"
},
"brandNo":{
"type":"long"
},
"Price":{
"type":"double"
}
},
"type":"nested"
},
"OrderID":{
"type":"long"
}
}
}
},
All right, after some experiments I discovered that the aggregation can be done like this:
{
"aggs":{
"sales":{
"nested":{
"path":"orderProducts"
},
"aggs":{
"filtered_nestedobjects":{
"filter":{
"bool":{
"must":[
{
"terms":{
"orderProducts.brandNo":[
"30"
]
}
}
]
}
},
"aggs":{
"Quantity":{
"sum":{
"field":"orderProducts.Quantity"
}
}
}
}
}
}
}
}
And the answer to the main question, whether we can filter a subarray in Elasticsearch, is yes. I could only do it with inner_hits.
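The answer does not show the inner_hits request, so here is a rough sketch of what it could look like, written as a Python dict builder. The function name filter_subarray_query is hypothetical. It assumes orderProducts is mapped as nested (as in the mappings above) and an Elasticsearch version that supports inner_hits on nested queries.

```python
def filter_subarray_query(path, field, value, size=100):
    """Search body whose hits carry only the matching nested objects
    under each hit's "inner_hits" section (assumed helper, not from the post)."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "nested": {
                "path": path,
                "query": {"term": {f"{path}.{field}": value}},
                # Empty options: use the server defaults for inner hits.
                "inner_hits": {},
            }
        },
    }


body = filter_subarray_query("orderProducts", "brandNo", 30)
```

Each hit still contains the full _source, but the matching order products are listed separately in the hit's inner_hits. By default only a few inner hits are returned per order, so a "size" option inside inner_hits may be needed for larger orders.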

How to filter top terms aggregation in ElasticSearch?

I have order documents like this:
{
"customer":{
"id":1,
"Name":"Foobar"
},
"products":[
{
"id":1,
"name":"Television",
"category": 11
},
{
"id":2,
"name":"Smartphone",
"category": 12
}
]
}
And I am running a terms aggregation (top_terms_aggregation) to find the best-selling products. To do it globally, I use:
{
"size":0,
"aggs":{
"products":{
"nested":{
"path":"products"
},
"aggs":{
"top_terms_aggregation":{
"terms":{
"field":"products.id",
"size":10
}
}
}
}
}
}
But, how would I filter the products given a category_id? I tried adding this filter:
"query":{
"bool":{
"must":[
{
"match":{
"products.category":11
}
}
]
}
}
But this filters the orders themselves (any order containing some product in the given category is kept), and the aggregation results become wrong.
I want to get the best-selling products that belong to a given category.
Solved it this way:
{
"size":0,
"aggs":{
"products":{
"nested":{
"path":"products"
},
"aggs":{
"first_filter":{
"filter":{
"term":{
"products.category":11
}
},
"aggs":{
"top_terms_aggregation":{
"terms":{
"field":"products.id",
"size":10
}
}
}
}
}
}
}
}
It must be this exact sequence of aggs (nested, then filter, then terms), or "stranger things" happen.
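You may use a filter aggregation, placed as a sub-aggregation of the nested aggregation (an aggregation can only have one type, so nested and filter cannot share a level):
GET _search
{
"size":0,
"aggs":{
"products":{
"nested":{
"path":"products"
},
"aggs":{
"in_category":{
"filter":{
"term":{
"products.category":11
}
},
"aggs":{
"top_terms_aggregation":{
"terms":{
"field":"products.id",
"size":10
}
}
}
}
}
}
}
}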

'Should' bool query fetches unwanted results

I want to perform a query equivalent to the following MySQL query:
SELECT http_user, http_req_method, dst, dst_port, count(*) AS total
FROM my_table
WHERE http_req_method='GET' OR http_req_method='POST'
GROUP BY http_user, http_req_method, dst, dst_port
I built the following query:
{
"query":{
"bool":{
"should":[
{
"term":{"http_req_method":"GET"}
},
{
"term":{"http_req_method":"POST"}
}
]
}
},
"aggs":{
"suser":{
"terms":{
"field":"http_user"
},
"aggs":{
"dst":{
"terms":{
"field":"dst"
},
"aggs":{
"dst_port":{
"terms":{
"field":"dst_port"
},
"aggs":{
"http_req_method":{
"terms":{
"field":"http_req_method"
}
}
}
}
}
}
}
}
}
}
(I might be missing some braces there, but it's correct in my code.) The problem is that the results also include other methods, like CONNECT, although I only ask for GET or POST. I thought aggregations were applied to the results of the query. Am I doing something wrong here?
I would leverage "minimum_should_match", like this:
"query":{
"bool":{
"minimum_should_match": 1,
"should":[
{
"term":{"http_req_method":"GET"}
},
{
"term":{"http_req_method":"POST"}
}
]
}
},
Another way, which works better, would be to use the terms query in a bool/filter clause instead:
"query":{
"bool":{
"filter":[
{
"terms": {"http_req_method": ["GET", "POST"] }
}
]
}
},
According to the latest Elasticsearch documentation, you should move the filter part inside the aggregation. Something like this:
{
"aggs":{
"get_post_requests":{
"filter":{
"bool":{
"should":[
{ "term":{"http_req_method":"GET"} },
{ "term":{"http_req_method":"POST"} }
]
}
},
"aggs":{
"suser":{
"terms":{
"field":"http_user"
},
"aggs":{
"dst":{
"terms":{
"field":"dst"
},
"aggs":{
"dst_port":{
"terms":{
"field":"dst_port"
},
"aggs":{
"http_req_method":{
"terms":{
"field":"http_req_method"
}
}
}
}
}
}
}
}
}
}
}
}
Hope the parentheses are ok. Let me know if this gets you closer to the result :)
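Since the brace nesting is where these hand-written bodies tend to go wrong, one option is to generate the GROUP BY style nesting instead. This is only a sketch: group_by_aggs is a made-up helper, each aggregation is simply named after its field, and the "size": 0 setting is my own addition to skip the hits.

```python
def group_by_aggs(fields):
    """Nest one terms aggregation per field, outermost field first."""
    aggs = {}
    # Build from the innermost field outwards.
    for field in reversed(fields):
        node = {"terms": {"field": field}}
        if aggs:
            node["aggs"] = aggs
        aggs = {field: node}
    return aggs


body = {
    "size": 0,  # assumption: only the buckets are needed, not the hits
    "query": {
        "bool": {
            "filter": [{"terms": {"http_req_method": ["GET", "POST"]}}]
        }
    },
    "aggs": group_by_aggs(["http_user", "dst", "dst_port", "http_req_method"]),
}
```

The resulting body uses the terms query from the earlier answer and nests the four terms aggregations in the same order as the original request.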

How to filter an elasticsearch global aggregation?

What I want to achieve: I want my "age" aggregation to not be filtered by the query filter and I want to be able to apply filters to it.
So if I start with this query:
{
"query":{
"filtered":{
"filter":{ "terms":{ "family_name":"Brown" } } //filter_1
}
},
"aggs":{
"young_age":{
"filter":{
"range":{ "lt":40, "gt":18 } //filter_2
},
"aggs":{
"age":{
"terms":{
"field":"age"
}
}
}
}
}
}
My aggregation "young_age" will be filtered by both filter_1 and filter_2. I don't want my aggregation to be filtered by filter_1.
While looking into the documentation, I thought a global aggregation would solve my problem, and I wrote this query:
{
"query":{
"filtered":{
"filter":{ "terms":{ "family_name":"Brown" } } //filter_1
}
},
"aggs":{
"young_age":{
"global":{}, //<----------- add global
"filter":{
"range":{ "lt":40, "gt":18 } //filter_2
},
"aggs":{
"age":{
"terms":{
"field":"age"
}
}
}
}
}
}
But then Elasticsearch complains about my filter_2:
"""
Found two aggregation type definitions [age] in [global] and [filter]
"""
And of course if I remove the filter_2:
{
"query":{
"filtered":{
"filter":{
"terms":{
"family_name":"Brown"
}
}
}
},
"aggs":{
"young_age":{
"global":{},
"aggs":{
"age":{
"terms":{
"field":"age"
}
}
}
}
}
}
Then my aggregation won't be filtered by filter_1 (as expected).
So how am I supposed to apply filter_2 to my global aggregation? Or how else am I supposed to achieve that? I remember writing something similar with facet filters...
In my opinion this is the typical use case of a post_filter. As the doc says:
The post_filter is applied to the search hits at the very end of a search request, after aggregations have already been calculated
Your query will look like:
{
"post_filter":{
"terms":{
"family_name":["Brown"] //filter_1
}
},
"aggs":{
"young_age":{
"filter":{
"range":{ "age":{ "lt":40, "gt":18 } } //filter_2
},
"aggs":{
"age":{
"terms":{
"field":"age"
}
}
}
}
}
}
In this case the search hits are all the documents in the index. Then the aggregation is calculated (before filter_1). And after that the post_filter with the filter_1 will be executed.
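To make that ordering concrete, here is a toy in-memory simulation in Python. It is not Elasticsearch code; the sample people and the search function are invented purely to illustrate that the aggregation sees the query's matches while post_filter only trims the returned hits.

```python
from collections import Counter

# Invented sample documents.
people = [
    {"family_name": "Brown", "age": 25},
    {"family_name": "Brown", "age": 52},
    {"family_name": "Smith", "age": 25},
    {"family_name": "Smith", "age": 31},
]


def search(docs, query, agg_filter, post_filter):
    """Mimic the order of evaluation: query, then aggs, then post_filter."""
    matched = [d for d in docs if query(d)]
    # Aggregations run on everything the query matched.
    age_counts = Counter(d["age"] for d in matched if agg_filter(d))
    # post_filter is applied last and only affects the hits.
    hits = [d for d in matched if post_filter(d)]
    return hits, age_counts


hits, age_counts = search(
    people,
    query=lambda d: True,                               # no query: match all
    agg_filter=lambda d: 18 < d["age"] < 40,            # filter_2
    post_filter=lambda d: d["family_name"] == "Brown",  # filter_1
)
```

The hits contain only the two Brown documents, while the age counts still include the Smith documents that pass filter_2.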
Edit: As you said in your comment, you have many aggregations and only one that shouldn't be affected by filter_1, so I fixed your query using a global aggregation:
{
"query": {
"filtered": {
"filter": {
"term": {
"family_name": "Brown"
}
}
}
},
"aggs": {
"young_age": {
"global": {},
"aggs": {
"filter2": {
"filter": {
"range": {
"age": {
"lt": 40,
"gt": 18
}
}
},
"aggs": {
"age": {
"terms": {
"field": "age"
}
}
}
}
}
}
}
}
global and filter are not allowed at the same level. You have to put the filter one level down, inside the global aggregation.
Something like this should work for you:
{
"query":{
"filtered":{
"filter":{
"terms":{
"family_name":"Brown"
}
}
}
},
"aggs":{
"young_age":{
"global":{},
"aggs":{
"filtered_age":{
"filter": {"term": {"family_name": "Brown"}}, //or {"bool": {"filter": {"term": {"family_name": "Brown"}}}}
"aggs": {
"age":{
"terms":{
"field":"age"
}
}
}
}
}
}
}
}
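The "two aggregation type definitions" error can be caught before sending the request. The checker below is a small sketch of my own (find_multi_type_aggs is not an Elasticsearch API): it walks an "aggs" dict and reports every named aggregation that does not declare exactly one type next to its optional "aggs" and "meta" keys.

```python
def find_multi_type_aggs(aggs, path=""):
    """Return (path, types) for each aggregation without exactly one type."""
    problems = []
    for name, body in aggs.items():
        here = f"{path}/{name}"
        # Everything except sub-aggregations and metadata is a type key.
        types = [k for k in body if k not in ("aggs", "aggregations", "meta")]
        if len(types) != 1:
            problems.append((here, sorted(types)))
        sub = body.get("aggs") or body.get("aggregations") or {}
        problems.extend(find_multi_type_aggs(sub, here))
    return problems


age_terms = {"age": {"terms": {"field": "age"}}}
age_range = {"range": {"age": {"lt": 40, "gt": 18}}}

# global and filter on the same level, as in the failing request.
bad = {"young_age": {"global": {}, "filter": age_range, "aggs": age_terms}}

# filter moved one level below global.
good = {
    "young_age": {
        "global": {},
        "aggs": {"filter2": {"filter": age_range, "aggs": age_terms}},
    }
}

bad_report = find_multi_type_aggs(bad)
good_report = find_multi_type_aggs(good)
```

For the first layout the report points at young_age with both global and filter listed; for the second it is empty.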
