Elasticsearch - add normal field filter to nested field aggregation

I have a document structure like the one below in ES:
{
customer_id: 1,
is_member: true,
purchases: [
{
pur_id: 1,
pur_channel_id: 1,
pur_amount: 100.00,
pur_date: '2021-08-01'
},
{
pur_id: 2,
pur_channel_id: 2,
pur_amount: 100.00,
pur_date: '2021-08-02'
}
]
},
{
customer_id: 2,
is_member: false,
purchases: [
{
pur_id: 3,
pur_channel_id: 1,
pur_amount: 200.00,
pur_date: '2021-07-01'
},
{
pur_id: 4,
pur_channel_id: 3,
pur_amount: 300.00,
pur_date: '2021-07-02'
}
]
}
I want to sum the purchase amounts per purchases.pur_channel_id, and within each channel bucket I also want a second sum restricted to documents where is_member is false. I therefore composed the following query:
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "purchases": {
      "nested": {
        "path": "purchases"
      },
      "aggs": {
        "pur_channel_id": {
          "terms": {
            "field": "purchases.pur_channel_id",
            "size": 10
          },
          "aggs": {
            "none_member": {
              "filter": {
                "term": {
                  "is_member": false
                }
              },
              "aggs": {
                "none_member_amount": {
                  "sum": {
                    "field": "purchases.pur_amount"
                  }
                }
              }
            },
            "pur_channel_amount": {
              "sum": {
                "field": "purchases.pur_amount"
              }
            }
          }
        }
      }
    }
  }
}
The query runs successfully, but I get 0 for every "none_member_amount". I suspect that a normal (non-nested) field cannot be used inside a nested aggregation.
Please help! Thanks.

A nested aggregation runs at the level of the nested documents, so the filter in your query looks for the is_member field on the nested purchase documents, where it does not exist; that is why every none_member_amount is 0. To get back to the parent document you need to use a reverse nested aggregation, or you can move the is_member check ahead of the nested aggregation using a filter aggregation.
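The second option (filter first, then step into the nested documents) could look roughly like the sketch below. It only builds the request body as a Python dict; the helper names build_channel_sums_query and per_channel are made up for illustration, and the field names are taken from the question. It assumes is_member is mapped on the root customer document.

```python
import json


def build_channel_sums_query(size=10):
    """Build a search body with two sibling aggregations over purchases."""

    def per_channel(sum_name):
        # Step into the nested purchases, bucket by channel, sum the amounts.
        return {
            "nested": {"path": "purchases"},
            "aggs": {
                "pur_channel_id": {
                    "terms": {"field": "purchases.pur_channel_id", "size": size},
                    "aggs": {sum_name: {"sum": {"field": "purchases.pur_amount"}}},
                }
            },
        }

    return {
        "size": 0,
        "aggs": {
            # Sums over all customers.
            "purchases": per_channel("pur_channel_amount"),
            # is_member lives on the parent document, so it is filtered
            # BEFORE entering the nested context.
            "none_member": {
                "filter": {"term": {"is_member": False}},
                "aggs": {"purchases": per_channel("none_member_amount")},
            },
        },
    }


query = build_channel_sums_query()
print(json.dumps(query, indent=2))
```

The printed JSON can be sent as the body of a _search request. The per-channel totals and the non-member totals come back as two separate bucket lists, so they have to be matched up by channel key on the client.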

Related

Elastic-search aggregate top 3 common result

My indexed data has the structure below. I want to aggregate the top 3 most repeated productProperty values, so that only those 3 appear in the aggregation result.
[
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "type",
productValue:[{value: 26A},{value: 23A}] ,
},
{
productProperty: "type",
productValue:[{value: 22B},{value: 90C}] ,
},
{
productProperty: "material",
productValue:[{value: wood},{value: plastic}] ,
},
{
productProperty: "age_rating",
productValue:[{value: 18},{value: 13}] ,
}
]
The query below aggregates everything by productProperty, but how can I get only the top 3 results out of it?
{
"query": {},
"aggs": {
"filtered_product_property": {
"filter": {
"bool": {
"must": []
}
},
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty"
}
}
}
}
}
}
You can use the size parameter in your terms aggregation.
{
"query": {},
"aggs": {
"filtered_product_property": {
"filter": {
"bool": {
"must": []
}
},
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty",
"size": 3
}
}
}
}
}
}
It is important to point out that the counts returned by a terms aggregation can be approximate in some cases.
As mentioned by @Tushar, you can use the size param. According to the official ES documentation, regarding sum_other_doc_count:
when there are lots of unique terms, Elasticsearch only returns the
top terms; this number is the sum of the document counts for all
buckets that are not part of the response
You can also define the order in which the buckets of the aggregation response are sorted, using the order param.
By default, the buckets are sorted by doc count in descending order.
The search query will be:
{
"aggs": {
"productProperty": {
"terms": {
"field": "productProperty.keyword",
"size": 3
}
}
}
}
And the search result would be:
"aggregations": {
"productProperty": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "material",
"doc_count": 3
},
{
"key": "type",
"doc_count": 2
},
{
"key": "age_rating",
"doc_count": 1
}
]
}
}
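To make the effect of size and sum_other_doc_count concrete, here is a small pure-Python emulation run on the productProperty values from the question. It is only a sketch: emulate_terms_agg is a made-up helper, it runs in a single process and so does not model the per-shard approximation mentioned above, and breaking ties by key is an assumption made here for determinism.

```python
from collections import Counter


def emulate_terms_agg(values, size):
    """Count values and keep the `size` most frequent ones."""
    counts = Counter(values)
    # Doc count descending; ties broken by key (assumption for determinism).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        # Documents that fall into buckets cut off by `size`.
        "sum_other_doc_count": sum(count for _, count in ordered[size:]),
        "buckets": [{"key": key, "doc_count": count} for key, count in ordered[:size]],
    }


properties = ["material", "material", "type", "type", "material", "age_rating"]
top3 = emulate_terms_agg(properties, size=3)
top2 = emulate_terms_agg(properties, size=2)
print(top3)
print(top2["sum_other_doc_count"])
```

With size 3 all three distinct values fit, so sum_other_doc_count stays 0, matching the response above. With size 2 the single age_rating document is what gets reported in sum_other_doc_count.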

Sorting a reverse nested back to parent aggregation

I'm currently aggregating a collection by a multi-level nested field and calculating some sub-aggregation metrics from this collection, and that's working using Elasticsearch's reverse nested feature as described at Sub-aggregate a multi-level nested composite aggregation.
My current struggle is to find a way to sort the aggregations by one of the calculated metrics. For example, given the following document and my current search call, I would like to sort all the aggregations by their clicks sums.
I've tried using bucket_sort inside the inner aggs at the back_to_parent level but got the following Java exception:
class org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested cannot be cast to class org.elasticsearch.search.aggregations.InternalMultiBucketAggregation
(org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested and org.elasticsearch.search.aggregations.InternalMultiBucketAggregation are in unnamed module of loader 'app')
{
id: '32ead132eq13w21',
statistics: {
clicks: 123,
views: 456
},
categories: [{ //nested type
name: 'color',
tags: [{ //nested type
slug: 'blue'
},{
slug: 'red'
}]
}]
}
GET /acounts-123321/_search
{
size: 0,
aggs: {
categories_parent: {
nested: {
path: 'categories.tags'
},
aggs: {
filtered: {
filter: {
term: { 'categories.tags.category': 'color' }
},
aggs: {
by_slug: {
terms: {
field: 'categories.tags.slug',
size: perPage
},
aggs: {
back_to_parent: {
reverse_nested: {},
aggs: {
clicks: {
sum: {
field: 'statistics.clicks'
}
},
custom_metric: {
scripted_metric: {
init_script: 'state.accounts = []',
map_script: 'state.accounts.add(new HashMap(params["_source"]))',
combine_script: 'double result = 0;
for (acc in state.accounts) {
result += ( acc.statistics.clicks + acc.statistics.impressions);
}
return result;',
reduce_script: 'double sum = 0;
for (state in states) {
sum += state;
}
return sum;'
}
},
by_tag_sort: {
bucket_sort: {
sort: [{ 'clicks.value': { order: 'desc' } }]
}
}
}
}
}
}
}
}
}
Update:
It would also be nice to understand how to sort the buckets by a custom metric calculated through a painless scripted_metric. I have updated the search call above with a sample custom_metric that I would like to sort by.
I see that using bucket_sort directly does not work with the standard sort array we use for concrete fields, so the following does not seem to sort anything. It also won't work with a sort script, since [bucket_sort] only supports field-based sorting.
by_tag_sort: {
bucket_sort: {
sort: [{ 'custom_metric.value': { order: 'desc' } }]
}
}
bucket_sort expects to be run within a multi-bucket context but your reverse_nested aggregation is single-bucket (irrespective of the fact that it's a child of a multi-bucket terms aggregation).
The trick is to use an empty-ish filters aggregation to generate a multi-bucket context and then run the bucket sort:
{
"size": 0,
"aggs": {
"categories_parent": {
"nested": {
"path": "categories.tags"
},
"aggs": {
"filtered": {
"filter": {
"term": {
"categories.tags.category": "color"
}
},
"aggs": {
"by_slug": {
"terms": {
"field": "categories.tags.slug",
"size": 10
},
"aggs": {
"back_to_parent": {
"reverse_nested": {},
"aggs": {
"multi_bucket_emulator": {
"filters": {
"filters": {
"placeholder_match_all_query": {
"match_all": {}
}
}
},
"aggs": {
"clicks": {
"sum": {
"field": "statistics.clicks"
}
},
"by_tag_sort": {
"bucket_sort": {
"sort": [
{
"clicks.value": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}
}
}
}
}
}
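As a possible alternative to bucket_sort for the clicks case, a terms aggregation can be ordered by a metric that is reached through single-bucket sub-aggregations, and reverse_nested is single-bucket. The sketch below builds such a request body as a Python dict; build_sorted_by_clicks_query is a made-up helper name, and the order path back_to_parent>clicks is an assumption worth verifying against your Elasticsearch version. It does not cover the scripted_metric case.

```python
import json


def build_sorted_by_clicks_query(size=10):
    """Order the by_slug buckets by the parent-level clicks sum."""
    return {
        "size": 0,
        "aggs": {
            "categories_parent": {
                "nested": {"path": "categories.tags"},
                "aggs": {
                    "filtered": {
                        "filter": {"term": {"categories.tags.category": "color"}},
                        "aggs": {
                            "by_slug": {
                                "terms": {
                                    "field": "categories.tags.slug",
                                    "size": size,
                                    # Path: single-bucket agg, then the metric.
                                    "order": {"back_to_parent>clicks": "desc"},
                                },
                                "aggs": {
                                    "back_to_parent": {
                                        "reverse_nested": {},
                                        "aggs": {
                                            "clicks": {
                                                "sum": {"field": "statistics.clicks"}
                                            }
                                        },
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


sorted_query = build_sorted_by_clicks_query()
print(json.dumps(sorted_query, indent=2))
```

Because the ordering is part of the terms aggregation itself, it is applied before size cuts off the bucket list, which is different from sorting whatever buckets happen to be returned.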
Update: sorting by the result of a custom scripted metric value
{
"size": 0,
"aggs": {
"categories_parent": {
"nested": {
"path": "categories.tags"
},
"aggs": {
"filtered": {
"filter": {
"term": {
"categories.tags.category": "color"
}
},
"aggs": {
"by_slug": {
"terms": {
"field": "categories.tags.slug",
"size": 10
},
"aggs": {
"back_to_parent": {
"reverse_nested": {},
"aggs": {
"multi_bucket_emulator": {
"filters": {
"filters": {
"placeholder_match_all_query": {
"match_all": {}
}
}
},
"aggs": {
"clicks": {
"sum": {
"field": "statistics.clicks"
}
},
"custom_metric": {
"scripted_metric": {
"init_script": "state.accounts = []",
"map_script": """state.accounts.add(params["_source"])""",
"combine_script": """
double result = 0;
for (def acc : state.accounts) {
result += ( acc.statistics.clicks + acc.statistics.impressions);
}
return result;
""",
"reduce_script": """
double sum = 0;
for (def state : states) {
sum += state;
}
return sum;
"""
}
},
"by_tag_sort": {
"bucket_sort": {
"sort": [
{
"custom_metric.value": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Joe - Elasticsearch Handbook - I have an equivalent query to yours (one that sorts by the result of a custom scripted metric) and I expect the response to your query looks something like the below.
I have noticed that sorting specified by the bucket_sort does not get applied to the uppermost buckets (i.e. by_slug.buckets), which are still sorted by the default doc_count ordering. This can also be verified by changing the custom_metric.value ordering from desc to asc, which has no effect on the order of the results.
My understanding of bucket_sort suggests that sorting based on the custom_metric is applied to the aggregation one level up, which in this case would be multi_bucket_emulator.buckets (but because this is an emulator it has no actual buckets to sort).
Is it possible to sort the by_slug.buckets based on the custom_metric values?
I am using Elasticsearch v7.10.
Thanks very much.
(Sorry for posting this question as an answer; it was too long to be a comment.)
Response (approximation):
{
"aggregations": {
"categories_parent": {
"filtered": {
"by_slug": {
"buckets": [
{
"key": "xxxxxx",
"back_to_parent": {
"multi_bucket_emulator": {
"buckets": {
"placeholder_match_all_query": {
"clicks": {
"buckets": [
{
"key": 5.0,
"doc_count": 1
},
…
]
},
"custom_metric": {
"value": 20.0
}
}
}
}
}
},
…
]
}
}
}
}
}
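If the by_slug buckets cannot be ordered on the server, one workaround is to sort them on the client once the response arrives. The sketch below assumes the response shape approximated above; sort_slug_buckets and the sample response (including its keys and values) are hypothetical and only there to make the example runnable.

```python
def sort_slug_buckets(response, metric="custom_metric", descending=True):
    """Return the by_slug buckets sorted by a metric nested inside each bucket."""
    buckets = response["aggregations"]["categories_parent"]["filtered"]["by_slug"]["buckets"]

    def metric_value(bucket):
        # Walk down through the single placeholder bucket of the filters agg.
        inner = bucket["back_to_parent"]["multi_bucket_emulator"]["buckets"]
        return inner["placeholder_match_all_query"][metric]["value"]

    return sorted(buckets, key=metric_value, reverse=descending)


def make_bucket(key, value):
    # Hypothetical helper that mimics one by_slug bucket of the response.
    placeholder = {"placeholder_match_all_query": {"custom_metric": {"value": value}}}
    return {"key": key, "back_to_parent": {"multi_bucket_emulator": {"buckets": placeholder}}}


sample_response = {
    "aggregations": {
        "categories_parent": {
            "filtered": {
                "by_slug": {
                    "buckets": [
                        make_bucket("blue", 20.0),
                        make_bucket("red", 35.0),
                        make_bucket("green", 5.0),
                    ]
                }
            }
        }
    }
}

ordered = sort_slug_buckets(sample_response)
print([bucket["key"] for bucket in ordered])
```

Note that this only reorders the buckets that were returned. If the terms size is smaller than the number of distinct slugs, buckets that were cut off by doc_count are never seen by the client-side sort.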

Elasticsearch: Aggregate all unique values of a field and apply a condition or filter by another field

My documents look like this:
{
"ownID": "Val_123",
"parentID": "Val_456",
"someField": "Val_78",
"otherField": "Val_90",
...
}
I am trying to get all (unique, as in one instance) results for a list of ownID values, while filtering by a list of parentID values and vice-versa.
What I did so far is:
Get (separate!) unique values for ownID and parentID in key1 and key2
{
"size": 0,
"aggs": {
"key1": {
"terms": {
"field": "ownID",
"include": {
"partition": 0,
"num_partitions": 10
},
"size": 100
}
},
"key2": {
"terms": {
"field": "parentID",
"include": {
"partition": 0,
"num_partitions": 10
},
"size": 100
}
}
}
}
Use filter to get (some) results matching either ownID OR parentID
{
"size": 0,
"query": {
"bool": {
"should": [
{
"terms": {
"ownID": ["Val_1","Val_2","Val_3"]
}
},
{
"terms": {
"parentID": ["Val_8","Val_9"]
}
}
]
}
},
"aggs": {
"my_filter": {
"top_hits": {
"size": 30000,
"_source": {
"include": ["ownID", "parentID","otherField"]
}
}
}
}
}
However, I need to get separate results for each filter in the second query, and get:
(1) the parentID of the documents matching some value of ownID
(2) the ownID for the documents matching some value of parentID.
So far I managed to do it using two similar queries (see below for (1)), but I would ideally want to combine them and query only once.
{
"size": 0,
"query": {
"bool": {
"should": [
{
"terms": {
"ownID": [ "Val1", "Val_2", "Val_3" ]
}
}
]
}
},
"aggs": {
"my_filter": {
"top_hits": {
"size": 30000,
"_source": {
"include": "parentID"
}
}
}
}
}
I'm using Elasticsearch version 5.2
If I understood your question correctly, you want the aggregation counts to be computed irrespective of the filter, while the search hits should contain only the filtered documents. For this, Elasticsearch has another type of filter, the "post filter"; see https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-request-post-filter.html
It's really simple: it filters the search hits after the aggregations have been computed.
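A minimal sketch of what a post_filter request body could look like, built as a Python dict. The field names come from the question; build_post_filter_query, the hits size of 100 and the example ID list are assumptions made for illustration.

```python
import json


def build_post_filter_query(own_ids, hits_size=100):
    """Aggregate over all matching documents, but return only filtered hits."""
    return {
        "size": hits_size,
        "aggs": {
            # Computed BEFORE post_filter is applied, so these counts
            # are not narrowed down by the ownID list.
            "key1": {"terms": {"field": "ownID", "size": 100}},
            "key2": {"terms": {"field": "parentID", "size": 100}},
        },
        # Applied to the hits only, after the aggregations are computed.
        "post_filter": {"terms": {"ownID": own_ids}},
    }


post_filter_query = build_post_filter_query(["Val_1", "Val_2", "Val_3"])
print(json.dumps(post_filter_query, indent=2))
```

The hits section of the response would then hold only documents whose ownID is in the list, while key1 and key2 still describe everything matched by the main query (here, all documents, since no query is given).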

Elasticsearch - Query field against aggregation

I am exploring how easy it is to query and aggregate data using Elasticsearch, but I am not able to pivot and aggregate the data in a single query.
Considering the data:
Is there a way to write a query that pivots and aggregates the values as below?
Required Result:
[
  {
    "A": "a1",
    "B": "b1",
    "Value": 3
  },
  {
    "A": "a1",
    "B": "b2",
    "Value": 3
  },
  {
    "A": "a2",
    "B": "b2",
    "Value": 4
  },
  {
    "A": "a1",
    "B": "b3",
    "Value": 11
  }
]
Yes, you can nest two terms aggregations for A and B, like this, and you'll get exactly the results you expect:
{
"size": 0,
"aggs": {
"A": {
"terms": {
"field": "A"
},
"aggs": {
"B": {
"terms": {
"field": "B"
},
"aggs": {
"value_sum": {
"sum": {
"field": "Value1"
}
}
}
}
}
}
}
}
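The response to the nested terms query above comes back as buckets within buckets rather than as flat rows, so a small client-side step is needed to reach the required shape. The sketch below shows one way to flatten it; flatten_pivot is a made-up helper, and the sample response is hypothetical, constructed only to mirror the numbers in the required result.

```python
def flatten_pivot(response):
    """Turn A > B > value_sum buckets into flat {A, B, Value} rows."""
    rows = []
    for a_bucket in response["aggregations"]["A"]["buckets"]:
        for b_bucket in a_bucket["B"]["buckets"]:
            rows.append(
                {
                    "A": a_bucket["key"],
                    "B": b_bucket["key"],
                    "Value": b_bucket["value_sum"]["value"],
                }
            )
    return rows


# Hypothetical response, shaped like the output of two nested terms aggs.
sample_response = {
    "aggregations": {
        "A": {
            "buckets": [
                {
                    "key": "a1",
                    "B": {
                        "buckets": [
                            {"key": "b1", "value_sum": {"value": 3.0}},
                            {"key": "b2", "value_sum": {"value": 3.0}},
                            {"key": "b3", "value_sum": {"value": 11.0}},
                        ]
                    },
                },
                {
                    "key": "a2",
                    "B": {"buckets": [{"key": "b2", "value_sum": {"value": 4.0}}]},
                },
            ]
        }
    }
}

rows = flatten_pivot(sample_response)
for row in rows:
    print(row)
```

The rows follow the bucket order of the response, so they may be ordered differently from the listing in the question.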

ElasticSearch - Ordering aggregation by nested aggregation on nested field

{
"query": {
"match_all": {}
},
"from": 0,
"size": 0,
"aggs": {
"itineraryId": {
"terms": {
"field": "iid",
"size": 2147483647,
"order": [
{
"price>price>price.max": "desc"
}
]
},
"aggs": {
"duration": {
"stats": {
"field": "drn"
}
},
"price": {
"nested": {
"path": "prl"
},
"aggs": {
"price": {
"filter": {
"terms": {
"prl.cc.keyword": [
"USD"
]
}
},
"aggs": {
"price": {
"stats": {
"field": "prl.spl.vl"
}
}
}
}
}
}
}
}
}
}
Here, I am getting the error:
"Invalid terms aggregation order path [price>price>price.max]. Terms
buckets can only be sorted on a sub-aggregator path that is built out
of zero or more single-bucket aggregations within the path and a final
single-bucket or a metrics aggregation at the path end. Sub-path
[price] points to non single-bucket aggregation"
The query works fine if I order by the duration aggregation, like:
"order": [
{
"duration.max": "desc"
}
]
So is there any way to order an aggregation by a nested aggregation on a nested field, i.e. something like the below?
"order": [
{
"price>price>price.max": "desc"
}
]
As Val has pointed out in the comments, ES does not support this yet.
Until then, you can run the nested aggregation first and then use the reverse nested aggregation to aggregate the duration, which is present at the root of the document.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html
