Elasticsearch: terms aggregation on a nested field with duplicated values

I have a problem with a nested aggregation in Elasticsearch. My mapping has a nested field:
POST my_index/my_type/_mapping
{
"properties": {
"name": {
"type": "keyword"
},
"nested_fields": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
}
}
Then I add one document to the index:
POST my_index/my_type
{
"name":"object1",
"nested_fields":[
{
"key": "key1",
"value": "value1"
},
{
"key": "key1",
"value": "value2"
}
]
}
As you can see, the nested array has two items with the same key field but different value fields. Then I want to run this query:
GET /my_index/my_type/_search
{
"query": {
"nested": {
"path": "nested_fields",
"query": {
"bool": {
"must": [
{
"term": {
"nested_fields.key": {
"value": "key1"
}
}
},
{
"terms": {
"nested_fields.value": [
"value1",
"value2"
]
}
}
]
}
}
}
},
"aggs": {
"agg_nested_fields": {
"nested": {
"path": "nested_fields"
},
"aggs": {
"agg_nested_fields_key": {
"terms": {
"field": "nested_fields.key",
"size": 10
}
}
}
}
}
}
As you can see, I want to find all documents that have at least one object in the nested_fields array whose key equals key1 and whose value is one of the provided values (value1 or value2). Then I want to group the matching documents by nested_fields.key. But I get this response:
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.87546873,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "AVuLNXxiryKmA7VEwOfV",
"_score": 0.87546873,
"_source": {
"name": "object1",
"nested_fields": [
{
"key": "key1",
"value": "value1"
},
{
"key": "key1",
"value": "value2"
}
]
}
}
]
},
"aggregations": {
"agg_nested_fields": {
"doc_count": 2,
"agg_nested_fields_key": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "key1",
"doc_count": 2
}
]
}
}
}
}
As you see from the response, I have one hit (it is correct), but the document was counted two times in aggregation (see doc_count: 2), because it has two items with 'key1' value in nested_fields array. How can I get the right count in aggregation?

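Use a reverse_nested aggregation inside the nested aggregation to return the count of root documents. The doc_count of back_to_root in each bucket is then the number of parent documents rather than the number of nested objects.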
{
"query": {
"nested": {
"path": "nested_fields",
"query": {
"bool": {
"must": [{
"term": {
"nested_fields.key": {
"value": "key1"
}
}
},
{
"terms": {
"nested_fields.value": [
"value1",
"value2"
]
}
}
]
}
}
}
},
"aggs": {
"agg_nested_fields": {
"nested": {
"path": "nested_fields"
},
"aggs": {
"agg_nested_fields_key": {
"terms": {
"field": "nested_fields.key",
"size": 10
},
"aggs": {
"back_to_root": {
"reverse_nested": {}
}
}
}
}
}
}
}
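To see why the two counts differ, here is a minimal pure-Python sketch that mimics the counting; it does not call Elasticsearch. It assumes that every nested object is counted as its own hidden document and that reverse_nested counts distinct parent documents. The helper name `count_nested_and_root` and the `_id` value are made up for illustration.

```python
from collections import defaultdict

# Sample document from the question; "_id" is a made-up identifier.
docs = [
    {
        "_id": "doc1",
        "name": "object1",
        "nested_fields": [
            {"key": "key1", "value": "value1"},
            {"key": "key1", "value": "value2"},
        ],
    }
]


def count_nested_and_root(docs, path, field):
    """Return {term: (nested_count, root_count)} for one nested field."""
    nested_counts = defaultdict(int)
    root_ids = defaultdict(set)
    for doc in docs:
        for nested in doc.get(path, []):
            term = nested[field]
            nested_counts[term] += 1        # what the terms bucket reports
            root_ids[term].add(doc["_id"])  # what reverse_nested reports
    return {t: (nested_counts[t], len(root_ids[t])) for t in nested_counts}


counts = count_nested_and_root(docs, "nested_fields", "key")
print(counts)
```

The first number in each tuple corresponds to the bucket's own doc_count, the second to the doc_count reported by the reverse_nested sub-aggregation.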

Related

Query/filter binned documents in a terms query?

I've got some data that share a property. Let's say I've got these documents:
{
session: "session-1",
status: "New",
},
{
session: "session-1",
title: "My session",
},
{
session: "session-1",
message: "hi there",
},
{
session: "session-2",
status: "Closed",
},
{
session: "session-2",
message: "hi!",
},
If I do an aggregation:
body: {
aggs: {
sessions: {
terms: { field: "session" },
},
},
},
I get two buckets, with 3 and 2 documents in:
"aggregations": {
"sessions": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "session-1",
"doc_count": 3
},
{
"key": "session-2",
"doc_count": 2
},
]
}
}
Can I run a filter or query on the buckets in some way?
body: {
aggs: {
sessions: {
field: "session",
},
aggs: {
filter_docs: { bool: must [{ match: { message: "hi" } }, { match: { status: "New" } }],
}
},
},
I know I can apply a query over all the documents, but I want to be able to do more complex filters in the sub documents (i.e. filter out buckets that contain BOTH a message: hi and a status: New)
No single document in the example above contains both message: hi and status: New.
Here is a working example that uses a filter aggregation to select the documents that contain both message: hi and session: session-1.
{
"size": 0,
"aggs": {
"filtererd": {
"filter": {
"bool": {
"must": [
{
"match": {
"message": "hi"
}
},
{
"match": {
"session.keyword": "session-1"
}
}
]
}
},
"aggs": {
"top_filter": {
"top_hits": {}
}
}
}
}
}
Search Result will be
"aggregations": {
"filtererd": {
"doc_count": 1,
"top_filter": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "66714173",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"session": "session-1",
"message": "hi there"
}
}
]
}
}
}
}
If you want to filter the result of terms aggregation
Search Query:
{
"size": 0,
"aggs": {
"genres": {
"terms": {
"field": "session"
},
"aggs": {
"filtererd": {
"filter": {
"bool": {
"must": [
{
"match": {
"message": "hi"
}
}
]
}
}
}
}
}
}
}
Search Result will be
"aggregations": {
"genres": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "session-1",
"doc_count": 3,
"filtererd": {
"doc_count": 1 // note this
}
},
{
"key": "session-2",
"doc_count": 2,
"filtererd": {
"doc_count": 1
}
}
]
}
}
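The per-bucket counts above can be reproduced with a small pure-Python sketch. It only imitates the behaviour of a terms aggregation with a filter sub-aggregation on the sample documents; the `matches` helper is a rough, assumed stand-in for a match query (lowercase, word-level comparison), not the real analyzer.

```python
import re
from collections import defaultdict

# Sample documents from the question.
docs = [
    {"session": "session-1", "status": "New"},
    {"session": "session-1", "title": "My session"},
    {"session": "session-1", "message": "hi there"},
    {"session": "session-2", "status": "Closed"},
    {"session": "session-2", "message": "hi!"},
]


def matches(doc, field, word):
    """Rough stand-in for a match query: lowercase word-level comparison."""
    tokens = re.findall(r"\w+", str(doc.get(field, "")).lower())
    return word.lower() in tokens


def terms_with_filter(docs, bucket_field, filter_field, word):
    """Group by bucket_field, then count filter matches inside each bucket."""
    buckets = defaultdict(lambda: {"doc_count": 0, "filtered": 0})
    for doc in docs:
        bucket = buckets[doc[bucket_field]]
        bucket["doc_count"] += 1
        if matches(doc, filter_field, word):
            bucket["filtered"] += 1
    return dict(buckets)


result = terms_with_filter(docs, "session", "message", "hi")
print(result)
```

Note that the filter runs on each document inside a bucket separately, so it narrows the count within a bucket but does not by itself drop whole buckets.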

How to sort nested aggregation field based on parent document field in elasticsearch?

I have an index of stores at various locations. Each store has a nested list of discount coupons.
I need a query that returns all unique coupons within a radius of x km, sorted by the distance from a given location to the nearest store where each coupon applies.
Database :: Elasticsearch
Index Mapping ::
{
"mappings": {
"car_stores": {
"properties": {
"location": {
"type": "geo_point"
},
"discount_coupons": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
}
}
}
}
}
}
}
Sample Doc ::
{
"_index": "stores",
"_type": "car_stores",
"_id": "1258c81d-b6f2-400f-a448-bd728f524b55",
"_score": 1.0,
"_source": {
"location": {
"lat": 36.053757,
"lon": 139.525482
},
"discount_coupons": [
{
"name": "c1"
},
{
"name": "c2"
}
]
}
}
Old Query to get unique discount coupon names in x km area for given location ::
{
"size": 0,
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "100km",
"location": {
"lat": 40,
"lon": -70
}
}
}
}
},
"aggs": {
"coupon": {
"nested": {
"path": "discount_coupons"
},
"aggs": {
"name": {
"terms": {
"field": "discount_coupons.name",
"order": {
"_key": "asc"
},
"size": 200
}
}
}
}
}
}
Updated Response ::
{
"took": 60,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 245328,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"coupon": {
"doc_count": 657442,
"name": {
"doc_count_error_upper_bound": -1,
"sum_other_doc_count": 641189,
"buckets": [
{
"key": "local20210211",
"doc_count": 1611,
"back_to_base": {
"doc_count": 1611,
"distance_script": {
"value": 160.61034409639765
}
}
},
{
"key": "local20210117",
"doc_count": 1621,
"back_to_base": {
"doc_count": 1621,
"distance_script": {
"value": 77.51459886447356
}
}
},
{
"key": "local20201220",
"doc_count": 1622,
"back_to_base": {
"doc_count": 1622,
"distance_script": {
"value": 84.15734462544432
}
}
},
{
"key": "kisekae1",
"doc_count": 1626,
"back_to_base": {
"doc_count": 1626,
"distance_script": {
"value": 88.23770888201268
}
}
},
{
"key": "local20210206",
"doc_count": 1626,
"back_to_base": {
"doc_count": 1626,
"distance_script": {
"value": 86.78376012847237
}
}
},
{
"key": "local20210106",
"doc_count": 1628,
"back_to_base": {
"doc_count": 1628,
"distance_script": {
"value": 384.12156408078397
}
}
},
{
"key": "local20210113",
"doc_count": 1628,
"back_to_base": {
"doc_count": 1628,
"distance_script": {
"value": 153.61681676703674
}
}
},
{
"key": "local20",
"doc_count": 1629,
"back_to_base": {
"doc_count": 1629,
"distance_script": {
"value": 168.74132991524073
}
}
},
{
"key": "local20210213",
"doc_count": 1630,
"back_to_base": {
"doc_count": 1630,
"distance_script": {
"value": 155.8335679860034
}
}
},
{
"key": "local20210208",
"doc_count": 1632,
"back_to_base": {
"doc_count": 1632,
"distance_script": {
"value": 99.58790590445102
}
}
}
]
}
}
}
}
The above query returns the first 200 discount coupons in the default order (by count), but I want the coupons sorted by distance from the given location, i.e. the coupon whose nearest applicable store is closest should come first.
Is there any way to sort nested aggregations based on a parent key, or can I solve this use case with a different data model?
Update Query ::
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"geo_distance": {
"distance": "100km",
"location": {
"lat": 35.699104,
"lon": 139.825211
}
}
},
{
"nested": {
"path": "discount_coupons",
"query": {
"bool": {
"filter": {
"exists": {
"field": "discount_coupons"
}
}
}
}
}
}
]
}
},
"aggs": {
"coupon": {
"nested": {
"path": "discount_coupons"
},
"aggs": {
"name": {
"terms": {
"field": "discount_coupons.name",
"order": {
"back_to_base": "asc"
},
"size": 10
},
"aggs": {
"back_to_base": {
"reverse_nested": {},
"aggs": {
"distance_script": {
"min": {
"script": {
"source": "doc['location'].arcDistance(35.699104, 139.825211)"
}
}
}
}
}
}
}
}
}
}
}
Interesting question. You can always order a terms aggregation by the result of a numeric sub-aggregation. The trick here is to escape the nested context via a reverse_nested aggregation and then calculate the distance from the pivot using a script:
{
"size": 0,
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "100km",
"location": {
"lat": 40,
"lon": -70
}
}
}
}
},
"aggs": {
"coupon": {
"nested": {
"path": "discount_coupons"
},
"aggs": {
"name": {
"terms": {
"field": "discount_coupons.name",
"order": {
"back_to_base>distance_script": "asc"
},
"size": 200
},
"aggs": {
"back_to_base": {
"reverse_nested": {},
"aggs": {
"distance_script": {
"min": {
"script": {
"source": "doc['location'].arcDistance(40, -70)"
}
}
}
}
}
}
}
}
}
}
}
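The intended result, coupon buckets ordered by the minimum distance of any parent store, can be sketched in plain Python. This is only an illustration of the ordering logic: it uses a haversine distance with an assumed mean Earth radius, which may differ slightly from what arcDistance returns, and the store data below is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def coupons_by_nearest_store(stores, lat, lon):
    """Return [(coupon, min_distance_m)] sorted by the nearest store offering it."""
    nearest = {}
    for store in stores:
        loc = store["location"]
        dist = haversine_m(lat, lon, loc["lat"], loc["lon"])
        for coupon in store.get("discount_coupons", []):
            name = coupon["name"]
            nearest[name] = min(dist, nearest.get(name, math.inf))
    return sorted(nearest.items(), key=lambda item: item[1])


# Hypothetical stores; only the field names come from the question's mapping.
stores = [
    {"location": {"lat": 35.70, "lon": 139.83}, "discount_coupons": [{"name": "c1"}]},
    {"location": {"lat": 36.05, "lon": 139.53}, "discount_coupons": [{"name": "c1"}, {"name": "c2"}]},
]

ranked = coupons_by_nearest_store(stores, 35.699104, 139.825211)
print([name for name, _ in ranked])
```

Each coupon keeps the smallest distance seen across all stores that carry it, which is the role the min sub-aggregation plays inside reverse_nested.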

I want to show Top 10 records and apply filter for specific fields in Elastic search

This is the query to get the top 10 records. There is a field named answer, and some records have the value "UNHANDLED" in it. I want to exclude the records whose answer is UNHANDLED.
How do I write a query that returns the top 10 and also excludes UNHANDLED?
GET /logstash-sdc-mongo-abcsearch/_search?size=0
{
"aggs": {
"top_tags": {
"terms": {
"field": "question.keyword"
},
"aggs": {
"top_faq_hits": {
"top_hits": {
"_source": {
"includes": [
"answer"
]
},
"size": 1
}
}
}
}
}
}
You can use the must_not clause to exclude the documents that contain UNHANDLED in the answer field. Try out the query below:
Index Mapping:
{
"mappings": {
"properties": {
"question": {
"type": "keyword"
},
"answer": {
"type": "keyword"
}
}
}
}
Index Data:
{
"question": "a",
"answer": "b"
}
{
"question": "c",
"answer": "UNHANDLED"
}
Search Query:
{
"query": {
"bool": {
"must_not": {
"term": {
"answer": "UNHANDLED"
}
}
}
},
"aggs": {
"top_tags": {
"terms": {
"field": "question"
},
"aggs": {
"top_faq_hits": {
"top_hits": {
"_source": {
"includes": [
"answer"
]
},
"size": 1
}
}
}
}
}
}
Search Result:
"aggregations": {
"top_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 1,
"top_faq_hits": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
"_index": "65563925",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"answer": "b"
}
}
]
}
}
}
]
}
}
Update 1:
Based on the comments below, try out the below query:
{
"query": {
"bool": {
"must_not": {
"term": {
"answer": "UNHANDLED"
}
},
"must": {
"term": {
"source": "sonax"
}
}
}
},
"aggs": {
"top_tags": {
"terms": {
"field": "question"
},
"aggs": {
"top_faq_hits": {
"top_hits": {
"_source": {
"includes": [
"answer"
]
},
"size": 1
}
}
}
}
}
}
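If the request body is assembled in application code, the two variants above can come from one helper. The following is a minimal sketch that builds the body as a plain Python dict; the function name `build_top_questions_query` and its parameters are hypothetical, and it assumes question, answer and source are keyword fields as in the mapping shown earlier.

```python
import json


def build_top_questions_query(excluded_answer, source=None, top=10):
    """Build the search body as a plain dict (no client library required)."""
    bool_query = {"must_not": [{"term": {"answer": excluded_answer}}]}
    if source is not None:
        # Optional extra restriction, as in "Update 1".
        bool_query["must"] = [{"term": {"source": source}}]
    return {
        "size": 0,
        "query": {"bool": bool_query},
        "aggs": {
            "top_tags": {
                "terms": {"field": "question", "size": top},
                "aggs": {
                    "top_faq_hits": {
                        "top_hits": {"_source": {"includes": ["answer"]}, "size": 1}
                    }
                },
            }
        },
    }


body = build_top_questions_query("UNHANDLED", source="sonax")
print(json.dumps(body, indent=2))
```

Because the exclusion lives in the query part, the terms aggregation only ever sees documents whose answer is not UNHANDLED.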

Nested object aggregation term with mixed nested/non-nested filter

We have facets showing the number of results that will be returned when clicking the filters (and combining them).
Before we introduced nested objects, the following would do the job:
GET /x_v1/_search/
{
"size": 0,
"aggs": {
"FilteredDescriptiveFeatures": {
"filter": {
"bool": {
"must": [
{
"terms": {
"breadcrumbs.categoryIds": [
"category"
]
}
},
{
"terms": {
"products.sterile": [
"0"
]
}
}
]
}
},
"aggs": {
"DescriptiveFeatures": {
"terms": {
"field": "products.descriptiveFeatures",
"size": 1000
}
}
}
}
}
}
This gives the result:
"aggregations": {
"FilteredDescriptiveFeatures": {
"doc_count": 280,
"DescriptiveFeatures": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "somekey",
"doc_count": 42
},
We needed to make products a nested object though, and I'm currently trying to rewrite the above to work with this change.
My attempt looks like the following. It doesn't give the correct result though, and doesn't seem properly connected to the filter.
GET /x_v2/_search/
{
"size": 0,
"aggs": {
"FilteredDescriptiveFeatures": {
"filter": {
"bool": {
"must": [
{
"terms": {
"breadcrumbs.categoryIds": [
"category"
]
}
},
{
"nested": {
"path": "products",
"query": {
"terms": {
"products.sterile": [
"0"
]
}
}
}
}
]
}
},
"aggs": {
"nested": {
"nested": {
"path": "products"
},
"aggregations": {
"DescriptiveFeatures": {
"terms": {
"field": "products.descriptiveFeatures",
"size": 1000
}
}
}
}
}
}
}
}
This gives the result:
"aggregations": {
"FilteredDescriptiveFeatures": {
"doc_count": 280,
"nested": {
"doc_count": 1437,
"DescriptiveFeatures": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "somekey",
"doc_count": 164
},
I've also tried to put the nested definition higher up so that it contains both the filter and the aggs, but then the filter term breadcrumbs.categoryIds, which is not in the nested object, won't work.
Is what I'm trying to do even possible?
And how can it be solved?
In your FilteredDescriptiveFeatures step, you return all documents that have at least one product with sterile = 0.
But in the nested step that follows, you don't specify this filter again. All nested products of those documents are therefore returned in this step, so your terms aggregation runs on all of their products, not only on products with sterile = 0.
You should move your sterile filter into the nested step. And as Richa points out, you need a reverse_nested aggregation in the final step to count Elasticsearch documents rather than nested product sub-documents.
Could you try this query?
{
"size": 0,
"aggs": {
"filteredCategory": {
"filter": {
"terms": {
"breadcrumbs.categoryIds": [
"category"
]
}
},
"aggs": {
"nestedProducts": {
"nested": {
"path": "products"
},
"aggs": {
"filteredByProductsAttributes": {
"filter": {
"terms": {
"products.sterile": [
"0"
]
}
},
"aggs": {
"DescriptiveFeatures": {
"terms": {
"field": "products.descriptiveFeatures",
"size": 1000
},
"aggs": {
"productCount": {
"reverse_nested": {}
}
}
}
}
}
}
}
}
}
}
}
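The difference between filtering before and after entering the nested level can be illustrated with a small pure-Python sketch. The documents and the feature value are hypothetical, the category filter is left out for brevity, and the functions only imitate the counting logic described above.

```python
from collections import Counter

# Hypothetical documents; field names follow the question.
docs = [
    {"_id": 1, "products": [
        {"sterile": "0", "descriptiveFeatures": "latex-free"},
        {"sterile": "1", "descriptiveFeatures": "latex-free"},
    ]},
    {"_id": 2, "products": [
        {"sterile": "1", "descriptiveFeatures": "latex-free"},
    ]},
]


def root_filter_then_nested_terms(docs):
    """Filter parents first, then count every nested product of the survivors."""
    kept = [d for d in docs if any(p["sterile"] == "0" for p in d["products"])]
    return Counter(p["descriptiveFeatures"] for d in kept for p in d["products"])


def nested_then_filter_terms(docs):
    """Enter the nested level first and keep only products with sterile == "0"."""
    return Counter(
        p["descriptiveFeatures"]
        for d in docs
        for p in d["products"]
        if p["sterile"] == "0"
    )


def parents_per_feature(docs):
    """Imitates reverse_nested: distinct parents per feature bucket."""
    parents = {}
    for d in docs:
        for p in d["products"]:
            if p["sterile"] == "0":
                parents.setdefault(p["descriptiveFeatures"], set()).add(d["_id"])
    return {feature: len(ids) for feature, ids in parents.items()}


print(root_filter_then_nested_terms(docs))
print(nested_then_filter_terms(docs))
print(parents_per_feature(docs))
```

In the first function the non-sterile product of document 1 is still counted, which is the kind of inflation seen in the question's second result.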
What I understand from the description is that you want to filter your results on some nested and non-nested fields and then apply aggregations on the nested field. I created a sample index and data with some nested and non-nested fields, and wrote a query against them.
Mapping
PUT stack-557722203
{
"mappings": {
"_doc": {
"properties": {
"category": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user": {
"type": "nested", // NESTED FIELD
"properties": {
"fName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Sample Data
POST _bulk
{"index":{"_index":"stack-557722203","_id":"1","_type":"_doc"}}
{"category":"X","user":[{"fName":"A","lName":"B","type":"X"},{"fName":"A","lName":"C","type":"X"},{"fName":"P","lName":"B","type":"Y"}]}
{"index":{"_index":"stack-557722203","_id":"2","_type":"_doc"}}
{"category":"X","user":[{"fName":"P","lName":"C","type":"Z"}]}
{"index":{"_index":"stack-557722203","_id":"3","_type":"_doc"}}
{"category":"X","user":[{"fName":"A","lName":"C","type":"Y"}]}
{"index":{"_index":"stack-557722203","_id":"4","_type":"_doc"}}
{"category":"Y","user":[{"fName":"A","lName":"C","type":"Y"}]}
Query
GET stack-557722203/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"nested": {
"path": "user",
"query": {
"term": {
"user.fName.keyword": {
"value": "A"
}
}
}
}
},
{
"term": {
"category.keyword": {
"value": "X"
}
}
}
]
}
},
"aggs": {
"group BylName": {
"nested": {
"path": "user"
},
"aggs": {
"group By lName": {
"terms": {
"field": "user.lName.keyword",
"size": 10
},
"aggs": {
"reverse Nested": {
"reverse_nested": {} // NOTE THIS
}
}
}
}
}
}
}
Output
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"group BylName": {
"doc_count": 4,
"group By lName": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "B",
"doc_count": 2,
"reverse Nested": {
"doc_count": 1
}
},
{
"key": "C",
"doc_count": 2,
"reverse Nested": {
"doc_count": 2
}
}
]
}
}
}
}
As for the discrepancy in your data, the higher doc_count you get after changing the mapping to nested comes from the way nested and object (non-nested) fields are stored internally: each nested object is indexed as a separate hidden document, so the aggregation counts nested objects rather than root documents. To connect them back to the root document, use a reverse_nested aggregation, and you will get the same result as before.
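As a rough illustration of that storage difference, the sketch below contrasts the two layouts in plain Python. It is a simplified model under stated assumptions, not the actual index format: `flatten_object_array` and `nested_documents` are made-up helper names.

```python
def flatten_object_array(doc, path):
    """Mimic a plain object array: one multi-valued list per leaf field."""
    flat = {}
    for item in doc.get(path, []):
        for key, value in item.items():
            flat.setdefault(f"{path}.{key}", []).append(value)
    return flat


def nested_documents(doc, path):
    """Mimic nested indexing: each array item stays a separate unit."""
    return [{f"{path}.{k}": v for k, v in item.items()} for item in doc.get(path, [])]


# Document 1 from the sample data above.
doc = {
    "category": "X",
    "user": [
        {"fName": "A", "lName": "B", "type": "X"},
        {"fName": "A", "lName": "C", "type": "X"},
        {"fName": "P", "lName": "B", "type": "Y"},
    ],
}

flat = flatten_object_array(doc, "user")
nested = nested_documents(doc, "user")
print(flat["user.lName"])
print(len(nested))
```

With the flattened layout there is one document carrying several lName values, while the nested layout yields three separate units, which is why the counts change once the field becomes nested.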
Hope this helps!!

Nested query in nested, filter aggregation fails

I am trying to use a nested query filter inside of a nested, filter aggregation. When I do so, the aggregation returns with no items. If I change the query to just a plain old match_all filter, I do get items back in the bucket.
Here is a simplified version of the mapping I'm working with:
"player": {
"properties": {
"rating": {
"type": "float"
},
"playerYears": {
"type": "nested",
"properties": {
"schoolsOfInterest": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
This query, with a match_all filter on the aggregation:
GET /players/_search
{
"size": 0,
"aggs": {
"rating": {
"nested": {
"path": "playerYears"
},
"aggs": {
"rating-filtered": {
"filter": {
"match_all": {}
},
"aggs": {
"rating": {
"histogram": {
"field": "playerYears.rating",
"interval": 1
}
}
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"match_all": {}
}
}
}
}
returns the following:
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 167316,
"max_score": 0,
"hits": []
},
"aggregations": {
"rating": {
"doc_count": 363550,
"rating-filtered": {
"doc_count": 363550,
"rating": {
"buckets": [
{
"key_as_string": "-1",
"key": -1,
"doc_count": 20978
},
{
"key_as_string": "0",
"key": 0,
"doc_count": 312374
},
{
"key_as_string": "1",
"key": 1,
"doc_count": 1162
},
{
"key_as_string": "2",
"key": 2,
"doc_count": 12104
},
{
"key_as_string": "3",
"key": 3,
"doc_count": 9558
},
{
"key_as_string": "4",
"key": 4,
"doc_count": 5549
},
{
"key_as_string": "5",
"key": 5,
"doc_count": 1825
}
]
}
}
}
}
}
But this query, which has a nested filter in the aggregation, returns an empty bucket:
GET /players/_search
{
"size": 0,
"aggs": {
"rating": {
"nested": {
"path": "playerYears"
},
"aggs": {
"rating-filtered": {
"filter": {
"nested": {
"query": {
"match_all": {}
},
"path": "playerYears.schoolsOfInterest"
}
},
"aggs": {
"rating": {
"histogram": {
"field": "playerYears.rating",
"interval": 1
}
}
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"match_all": {}
}
}
}
}
the empty bucket:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 167316,
"max_score": 0,
"hits": []
},
"aggregations": {
"rating": {
"doc_count": 363550,
"rating-filtered": {
"doc_count": 0,
"rating": {
"buckets": []
}
}
}
}
}
Is it possible to use nested filters inside of nested, filtered aggregations? Is there a known bug in elasticsearch about this? The nested filter works fine in the query context of the search, and it works fine if I don't use a nested aggregation.
Based on the information provided, and a few assumptions, I would like to provide two suggestions. I hope it helps solve your problem.
Case 1: using reverse nested aggregation:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"rating": {
"nested": {
"path": "playerYears.schoolsOfInterest"
},
"aggs": {
"rating-filtered": {
"filter": {
"match_all": {}
},
"aggs": {
"rating_nested": {
"reverse_nested": {},
"aggs": {
"rating": {
"histogram": {
"field": "rating",
"interval": 1
}
}
}
}
}
}
}
}
}
}
Case 2: changes to filtered aggregation:
{
"size": 0,
"aggs": {
"rating-filtered": {
"filter": {
"nested": {
"query": {
"match_all": {}
},
"path": "playerYears.schoolsOfInterest"
}
},
"aggs": {
"rating": {
"histogram": {
"field": "playerYears.rating",
"interval": 1
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"match_all": {}
}
}
}
}
I would suggest using case 1 and verifying that it returns the results you need.
