Sorting a reverse nested back to parent aggregation - elasticsearch

I'm currently aggregating a collection by a multi-level nested field and calculating some sub-aggregation metrics from this collection. That part works, using Elasticsearch's reverse nested feature as described at Sub-aggregate a multi-level nested composite aggregation.
My current struggle is to find a way to sort the aggregations by one of the calculated metrics. For example, given the following document and my current search call, I would like to sort all the aggregations by their clicks sums.
I've tried using bucket_sort inside the inner aggs at the back_to_parent level, but got the following Java exception:
class org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested cannot be cast to class org.elasticsearch.search.aggregations.InternalMultiBucketAggregation
(org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested and org.elasticsearch.search.aggregations.InternalMultiBucketAggregation are in unnamed module of loader 'app')
{
  id: '32ead132eq13w21',
  statistics: {
    clicks: 123,
    views: 456
  },
  categories: [{ // nested type
    name: 'color',
    tags: [{ // nested type
      slug: 'blue'
    }, {
      slug: 'red'
    }]
  }]
}
GET /acounts-123321/_search
{
  size: 0,
  aggs: {
    categories_parent: {
      nested: { path: 'categories.tags' },
      aggs: {
        filtered: {
          filter: {
            term: { 'categories.tags.category': 'color' }
          },
          aggs: {
            by_slug: {
              terms: {
                field: 'categories.tags.slug',
                size: perPage
              },
              aggs: {
                back_to_parent: {
                  reverse_nested: {},
                  aggs: {
                    clicks: {
                      sum: { field: 'statistics.clicks' }
                    },
                    custom_metric: {
                      scripted_metric: {
                        init_script: 'state.accounts = []',
                        map_script: 'state.accounts.add(new HashMap(params["_source"]))',
                        combine_script: 'double result = 0;
                          for (acc in state.accounts) {
                            result += ( acc.statistics.clicks + acc.statistics.impressions);
                          }
                          return result;',
                        reduce_script: 'double sum = 0;
                          for (state in states) {
                            sum += state;
                          }
                          return sum;'
                      }
                    },
                    by_tag_sort: {
                      bucket_sort: {
                        sort: [{ 'clicks.value': { order: 'desc' } }]
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Update:
It would also be nice to understand how to sort the buckets by a custom metric calculated through a Painless scripted_metric. I have updated the search call above with a sample custom_metric that I would like to sort by.
Using bucket_sort directly with the standard sort array we use for concrete fields does not seem to work, so the following does not sort anything. A sort script won't work either, since [bucket_sort] only supports field-based sorting.
by_tag_sort: {
  bucket_sort: {
    sort: [{ 'custom_metric.value': { order: 'desc' } }]
  }
}

bucket_sort expects to be run within a multi-bucket context but your reverse_nested aggregation is single-bucket (irrespective of the fact that it's a child of a multi-bucket terms aggregation).
The trick is to use an empty-ish filters aggregation to generate a multi-bucket context and then run the bucket sort:
{
  "size": 0,
  "aggs": {
    "categories_parent": {
      "nested": { "path": "categories.tags" },
      "aggs": {
        "filtered": {
          "filter": { "term": { "categories.tags.category": "color" } },
          "aggs": {
            "by_slug": {
              "terms": { "field": "categories.tags.slug", "size": 10 },
              "aggs": {
                "back_to_parent": {
                  "reverse_nested": {},
                  "aggs": {
                    "multi_bucket_emulator": {
                      "filters": {
                        "filters": {
                          "placeholder_match_all_query": { "match_all": {} }
                        }
                      },
                      "aggs": {
                        "clicks": { "sum": { "field": "statistics.clicks" } },
                        "by_tag_sort": {
                          "bucket_sort": {
                            "sort": [ { "clicks.value": { "order": "desc" } } ]
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Update: sorting by the result of a custom scripted metric value
{
  "size": 0,
  "aggs": {
    "categories_parent": {
      "nested": { "path": "categories.tags" },
      "aggs": {
        "filtered": {
          "filter": { "term": { "categories.tags.category": "color" } },
          "aggs": {
            "by_slug": {
              "terms": { "field": "categories.tags.slug", "size": 10 },
              "aggs": {
                "back_to_parent": {
                  "reverse_nested": {},
                  "aggs": {
                    "multi_bucket_emulator": {
                      "filters": {
                        "filters": {
                          "placeholder_match_all_query": { "match_all": {} }
                        }
                      },
                      "aggs": {
                        "clicks": { "sum": { "field": "statistics.clicks" } },
                        "custom_metric": {
                          "scripted_metric": {
                            "init_script": "state.accounts = []",
                            "map_script": """state.accounts.add(params["_source"])""",
                            "combine_script": """
                              double result = 0;
                              for (def acc : state.accounts) {
                                result += ( acc.statistics.clicks + acc.statistics.impressions);
                              }
                              return result;
                            """,
                            "reduce_script": """
                              double sum = 0;
                              for (def state : states) {
                                sum += state;
                              }
                              return sum;
                            """
                          }
                        },
                        "by_tag_sort": {
                          "bucket_sort": {
                            "sort": [ { "custom_metric.value": { "order": "desc" } } ]
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Joe - Elasticsearch Handbook - I have an equivalent query to yours (one that sorts by the result of a custom scripted metric), and I expect the response to your query looks something like the approximation below.
I have noticed that the sorting specified by bucket_sort is not applied to the uppermost buckets (i.e. by_slug.buckets), which are still sorted by the default doc_count ordering. This can be verified by changing the custom_metric.value ordering from desc to asc, which has no effect on the order of the results.
My understanding of bucket_sort is that sorting based on custom_metric is applied to the aggregation one level up, which in this case would be multi_bucket_emulator.buckets (but because this is an emulator, it holds only the single placeholder bucket, so there is nothing to reorder).
Is it possible to sort by_slug.buckets based on the custom_metric values?
I am using Elasticsearch v7.10.
Thanks very much.
(Sorry for posting this question as an answer; it was too long to be a comment.)
Response (approximation):
{
  "aggregations": {
    "categories_parent": {
      "filtered": {
        "by_slug": {
          "buckets": [
            {
              "key": "xxxxxx",
              "back_to_parent": {
                "multi_bucket_emulator": {
                  "buckets": {
                    "placeholder_match_all_query": {
                      "clicks": {
                        "buckets": [
                          { "key": 5.0, "doc_count": 1 },
                          …
                        ]
                      },
                      "custom_metric": { "value": 20.0 }
                    }
                  }
                }
              }
            },
            …
          ]
        }
      }
    }
  }
}
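One fallback that does not depend on bucket_sort at all is to reorder the by_slug buckets client-side once the response arrives. The following is a minimal Python sketch, assuming the response has the shape approximated above; the names `sort_slug_buckets` and `make_bucket` and all sample values are hypothetical. It only reorders the buckets that the terms aggregation actually returned, so it is not a substitute for server-side ordering when the terms size cuts off buckets.

```python
from typing import Any, Dict, List


def sort_slug_buckets(response: Dict[str, Any], descending: bool = True) -> List[Dict[str, Any]]:
    """Return the by_slug buckets ordered by their custom_metric value."""
    buckets = response["aggregations"]["categories_parent"]["filtered"]["by_slug"]["buckets"]

    def metric_value(bucket: Dict[str, Any]) -> float:
        # Walk down through the single-bucket reverse_nested aggregation and
        # the one placeholder bucket of the filters "emulator".
        emulator = bucket["back_to_parent"]["multi_bucket_emulator"]
        return emulator["buckets"]["placeholder_match_all_query"]["custom_metric"]["value"]

    return sorted(buckets, key=metric_value, reverse=descending)


def make_bucket(key: str, value: float) -> Dict[str, Any]:
    """Build one hypothetical by_slug bucket in the response shape shown above."""
    return {
        "key": key,
        "back_to_parent": {
            "multi_bucket_emulator": {
                "buckets": {
                    "placeholder_match_all_query": {"custom_metric": {"value": value}}
                }
            }
        },
    }


# Made-up sample values, for illustration only.
sample_response = {
    "aggregations": {
        "categories_parent": {
            "filtered": {
                "by_slug": {
                    "buckets": [
                        make_bucket("blue", 20.0),
                        make_bucket("red", 35.0),
                        make_bucket("green", 5.0),
                    ]
                }
            }
        }
    }
}

ordered_keys = [b["key"] for b in sort_slug_buckets(sample_response)]
print(ordered_keys)
```

Passing `descending=False` flips the order, which is a quick way to confirm that the sort is actually being applied.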

Related

Elasticsearch - add normal field filter to nested field aggregation

I have document structure like below in ES:
{
  customer_id: 1,
  is_member: true,
  purchases: [
    { pur_id: 1, pur_channel_id: 1, pur_amount: 100.00, pur_date: '2021-08-01' },
    { pur_id: 2, pur_channel_id: 2, pur_amount: 100.00, pur_date: '2021-08-02' }
  ]
},
{
  customer_id: 2,
  is_member: false,
  purchases: [
    { pur_id: 3, pur_channel_id: 1, pur_amount: 200.00, pur_date: '2021-07-01' },
    { pur_id: 4, pur_channel_id: 3, pur_amount: 300.00, pur_date: '2021-07-02' }
  ]
}
I want to sum the purchase amounts per purchases.pur_channel_id, and for each channel bucket I also want a sub-aggregation that sums only the documents where is_member is false. I composed the following query:
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "purchases": {
      "nested": { "path": "purchases" },
      "aggs": {
        "pur_channel_id": {
          "terms": { "field": "purchases.pur_channel_id", "size": 10 },
          "aggs": {
            "none_member": {
              "filter": { "term": { "is_member": false } },
              "aggs": {
                "none_member_amount": { "sum": { "field": "purchases.pur_amount" } }
              }
            },
            "pur_channel_amount": { "sum": { "field": "purchases.pur_amount" } }
          }
        }
      }
    }
  }
}
The query runs successfully, but I get 0 for every "none_member_amount". I wonder whether a normal (non-nested) field perhaps cannot be used inside a nested aggregation.
Please help! Thanks.
A nested aggregation runs at the level of the nested documents, so your filter is looking for the is_member field on the nested purchases documents, where it does not exist. To join back to the parent document you need to use a reverse_nested aggregation, or you can move the is_member check before the nested aggregation using a filter aggregation.
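The second option mentioned above (moving the is_member check before the nested aggregation) could look roughly like the following. This is a sketch that only builds the search body as a Python dict, assuming the field names from the question; the function name `non_member_amount_by_channel` is hypothetical and nothing here talks to a cluster.

```python
import json


def non_member_amount_by_channel(size: int = 10) -> dict:
    """Build a search body that filters parent documents on is_member
    before entering the nested purchases context."""
    return {
        "size": 0,
        "aggs": {
            # Runs on parent documents, where is_member actually lives.
            "none_member": {
                "filter": {"term": {"is_member": False}},
                "aggs": {
                    # Only now step into the nested purchases documents.
                    "purchases": {
                        "nested": {"path": "purchases"},
                        "aggs": {
                            "pur_channel_id": {
                                "terms": {
                                    "field": "purchases.pur_channel_id",
                                    "size": size,
                                },
                                "aggs": {
                                    "none_member_amount": {
                                        "sum": {"field": "purchases.pur_amount"}
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


body = non_member_amount_by_channel()
print(json.dumps(body, indent=2))
```

This body yields only the non-member sums per channel. The overall pur_channel_amount from the question would need its own sibling nested aggregation next to none_member.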

Can I sort grouped search result by formula?

I am trying to implement a query that sorts aggregated results by a formula.
For example, we have the following entities:
{
  "price": "1000",
  "zip": "77777",
  "field1": "1",
  "field2": "5"
},
{
  "price": "2222",
  "zip": "77777",
  "field1": "2",
  "field2": "5"
},
{
  "price": "1111",
  "zip": "77777",
  "field1": "1",
  "field2": "5"
}
Now, my query without sorting looks like:
POST /entities/_search
{
  "size": 0,
  "query": {
    "term": { "zip": { "value": "77777" } }
  },
  "aggs": {
    "my composite": {
      "composite": {
        "size": 500,
        "sources": [
          { "field1_term": { "terms": { "field": "field1" } } },
          { "field2_term": { "terms": { "field": "field2" } } }
        ]
      },
      "aggs": {
        "avg_price_per_group": { "avg": { "field": "price" } },
        "results_per_group": {
          "top_hits": {
            "size": 100,
            "_source": { "include": ["entity_id", "price"] }
          }
        }
      }
    }
  }
}
First, I need to group the results by field1 and field2 and calculate the average price for each group.
Then I need to divide the price of each document by its group's average price and sort the documents by that ratio.
Is it possible to do this somehow?
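To make the intended formula concrete, the grouping and ratio steps described above can be sketched client-side in Python. This is not an Elasticsearch-side solution, just an illustration of the computation on the three sample entities; the function name `rank_by_price_ratio` and the `price_ratio` key are hypothetical, and the sketch assumes every group has a non-zero average price.

```python
from collections import defaultdict

entities = [
    {"price": "1000", "zip": "77777", "field1": "1", "field2": "5"},
    {"price": "2222", "zip": "77777", "field1": "2", "field2": "5"},
    {"price": "1111", "zip": "77777", "field1": "1", "field2": "5"},
]


def rank_by_price_ratio(docs):
    """Sort documents by price divided by the average price of their
    (field1, field2) group, ascending."""
    # Step 1: collect prices per (field1, field2) group.
    groups = defaultdict(list)
    for doc in docs:
        groups[(doc["field1"], doc["field2"])].append(float(doc["price"]))

    # Step 2: average price per group.
    averages = {key: sum(prices) / len(prices) for key, prices in groups.items()}

    # Step 3: attach the ratio to a copy of each document and sort by it.
    ranked = []
    for doc in docs:
        avg = averages[(doc["field1"], doc["field2"])]
        ranked.append({**doc, "price_ratio": float(doc["price"]) / avg})
    return sorted(ranked, key=lambda d: d["price_ratio"])


ranked = rank_by_price_ratio(entities)
for doc in ranked:
    print(doc["price"], round(doc["price_ratio"], 4))
```

Prices are stored as strings in the sample documents, so the sketch converts them with `float` before averaging.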

Elasticsearch - Query field against aggregation

I am exploring how easy it is to query and aggregate data using Elasticsearch, but I am not able to pivot and aggregate the data in a single query, as described below.
Considering the data:
Is there a way to write a query that pivots and aggregates the values as shown below?
Required Result:
{
  {
    "A":a1,
    "B":b1,
    "Value":3
  },
  {
    "A":a1,
    "B":b2,
    "Value":3
  },
  {
    "A":a2,
    "B":b2,
    "Value":4
  },
  {
    "A":a1,
    "B":b3,
    "Value":11
  }
}
Yes, you can nest two terms aggregations for A and B, like this, and you'll get exactly the results you expect:
{
  "size": 0,
  "aggs": {
    "A": {
      "terms": { "field": "A" },
      "aggs": {
        "B": {
          "terms": { "field": "B" },
          "aggs": {
            "value_sum": { "sum": { "field": "Value1" } }
          }
        }
      }
    }
  }
}
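The nested terms aggregation returns its buckets as a tree (A buckets containing B buckets), so getting the flat rows from the question takes one more step on the client. Here is a minimal Python sketch of that flattening; the function name `flatten_pivot` is hypothetical, and the sample response is hand-written with made-up doc_count values, using the sums from the required result.

```python
def flatten_pivot(response):
    """Turn the nested A > B > value_sum buckets into flat rows."""
    rows = []
    for a_bucket in response["aggregations"]["A"]["buckets"]:
        for b_bucket in a_bucket["B"]["buckets"]:
            rows.append(
                {
                    "A": a_bucket["key"],
                    "B": b_bucket["key"],
                    "Value": b_bucket["value_sum"]["value"],
                }
            )
    return rows


# Hand-written sample in the usual terms-aggregation response shape.
sample_response = {
    "aggregations": {
        "A": {
            "buckets": [
                {
                    "key": "a1",
                    "doc_count": 5,
                    "B": {
                        "buckets": [
                            {"key": "b1", "doc_count": 2, "value_sum": {"value": 3.0}},
                            {"key": "b2", "doc_count": 2, "value_sum": {"value": 3.0}},
                            {"key": "b3", "doc_count": 1, "value_sum": {"value": 11.0}},
                        ]
                    },
                },
                {
                    "key": "a2",
                    "doc_count": 1,
                    "B": {
                        "buckets": [
                            {"key": "b2", "doc_count": 1, "value_sum": {"value": 4.0}},
                        ]
                    },
                },
            ]
        }
    }
}

rows = flatten_pivot(sample_response)
for row in rows:
    print(row)
```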

Elasticsearch distinct count on nested fields

According to docs, distinct count can be achieved approximately by using cardinality.
https://www.elastic.co/guide/en/elasticsearch/guide/current/cardinality.html
I have a large store of documents like these:
{
  {
    "foo": {
      "bar": "a1"
    }
  },
  {
    "foo": {
      "bar": "a2"
    }
  }
}
and I want to do a distinct count of "foo.bar" values.
My DSL query:
{
  "size": 0,
  "aggs": {
    "number_of_bars": {
      "cardinality": {
        "field": "bar"
      }
    }
  }
}
returns "number_of_bars": 0. I also tried "field": "foo.bar", which results in an error.
Can you tell me what I am doing wrong?
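Reference the field by its full path. Assuming the default dynamic mapping, where a string is indexed as text with a keyword sub-field, aggregate on that keyword sub-field: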
{
  "size": 0,
  "aggs": {
    "number_of_bars": {
      "cardinality": {
        "field": "foo.bar.keyword"
      }
    }
  }
}

Elasticsearch aggregation order by top hit score

I want to order buckets by doc.score of top_hit. My current implementation is below.
group_by_iid: {
  terms: {
    field: 'iid',
    order: { max_score: 'desc' },
    size: 0
  },
  aggs: {
    max_score: { max: { script: 'doc.score' } },
    top_hit: {
      top_hits: {
        sort: [{ source_priority: { order: 'desc' } }],
        size: 1
      }
    }
  }
}
This is wrong because buckets are ordered by their top score, not their top source_priority document's score. Is there a way to solve this problem?
I had the same issue, and the way I resolved it was to introduce a sub-aggregation on the document score. Then, in my outer aggregation, I ordered by the name of that max_score aggregation.
GET /my-index/my-type/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "searchTerm": { "query": "style", "type": "boolean" }
          }
        },
        {
          "flt_field": {
            "searchTerm": { "like_text": "style" }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_target_url": {
      "terms": {
        "field": "targetUrl",
        "order": { "max_score": "desc" }
      },
      "aggs": {
        "max_score": { "max": { "script": "_score" } }
      }
    }
  }
}
I followed the directions on this link:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html

Resources