Elasticsearch aggregation order by top hit score

I want to order buckets by the doc.score of their top_hit. My current implementation is below.
group_by_iid: {
  terms: {
    field: 'iid',
    order: { max_score: 'desc' },
    size: 0
  },
  aggs: {
    max_score: { max: { script: 'doc.score' } },
    top_hit: {
      top_hits: {
        sort: [{ source_priority: { order: 'desc' } }],
        size: 1
      }
    }
  }
}
This is wrong because buckets are ordered by their highest score, not by the score of the document with the highest source_priority. Is there a way to solve this problem?

I had the same issue, and the way I resolved it was to introduce a sub-aggregation on the document score. Then, in my outer aggregation, I ordered by the name of that max_score aggregation.
GET /my-index/my-type/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"searchTerm": {
"query": "style",
"type": "boolean"
}
}
},
{
"flt_field": {
"searchTerm": {
"like_text": "style"
}
}
}
]
}
},
"aggs": {
"group_by_target_url": {
"terms": {
"field": "targetUrl",
"order": {
"max_score": "desc"
}
},
"aggs": {
"max_score": {
"max": {
"script": "_score"
}
}
}
}
}
}
I followed the directions on this link:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
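Note that the query above orders buckets by their maximum score, which is exactly what the question says it does not want. To make the difference concrete, here is a minimal client-side sketch in Python. It is not an Elasticsearch feature: the flattened hit shape (iid, source_priority, score) and the function name are assumptions made for illustration only.

```python
from collections import defaultdict

def order_buckets(hits):
    """Group hits by iid and rank groups by the score of their top source_priority hit."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["iid"]].append(hit)

    buckets = []
    for iid, docs in groups.items():
        # The "top hit" is the document with the highest source_priority.
        top = max(docs, key=lambda d: d["source_priority"])
        buckets.append({
            "iid": iid,
            "top_hit": top,
            # Kept only to compare against the max_score ordering.
            "max_score": max(d["score"] for d in docs),
        })
    # Order by the top hit's own score, not by the bucket's best score.
    return sorted(buckets, key=lambda b: b["top_hit"]["score"], reverse=True)

# Hypothetical sample data (assumed shape, not from a real index).
hits = [
    {"iid": "a", "source_priority": 1, "score": 9.0},
    {"iid": "a", "source_priority": 5, "score": 2.0},
    {"iid": "b", "source_priority": 3, "score": 4.0},
]
by_top_hit = [b["iid"] for b in order_buckets(hits)]
by_max_score = [b["iid"] for b in sorted(order_buckets(hits),
                                         key=lambda b: b["max_score"],
                                         reverse=True)]
```

With this sample, bucket "a" wins under max_score ordering because of its 9.0 document, but its top source_priority document only scores 2.0, so "b" comes first under the ordering the question asks for.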

Related

Order by doc_count in composite aggregation (or suitable alternatives)

I have a search like the following
{
  "size": 0,
  "query": { "...": "..." },
  "_source": false,
  "aggregations": {
    "agg1": { "...": "..." },
    "agg2": { "...": "..." }
  }
}
where each agg* is a composite aggregation of this kind:
"agg1" : {
  "composite": {
    "size": 300,
    "sources": [
      {
        "field1": {
          "terms": {
            "field": "field1.keyword",
            "missing_bucket": true
          }
        }
      },
      {
        "field2": {
          "terms": {
            "field": "field2.keyword",
            "missing_bucket": true,
            "order": "asc"
          }
        }
      }
    ]
  },
  "aggregations": {
    "field3": {
      "filter": { "term": { "field3.keyword": "xyz" } }
    }
  }
}
I want to order by the doc_count of the buckets, as I don't need all the buckets but just the top n, like what happens in some Kibana visualizations. From the documentation of composite aggregations, it doesn't seem possible to order the results the way terms aggregations can. Is there a workaround or an alternative query to do this?
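One possible workaround is to do the ordering on the client: composite buckets come back in key order and are paged with after_key, so the top n by doc_count can only be picked once every page has been read. The sketch below assumes the pages have already been fetched and are passed in as a list of dicts shaped like the "agg1" part of the response; the function name and sample data are hypothetical.

```python
import heapq

def top_buckets_by_doc_count(pages, n):
    """Return the n buckets with the highest doc_count across all composite pages."""
    # Flatten the buckets of every page lazily; nothing is sorted server-side.
    all_buckets = (bucket for page in pages for bucket in page["buckets"])
    # nlargest keeps only n candidates in memory and returns them in descending order.
    return heapq.nlargest(n, all_buckets, key=lambda b: b["doc_count"])

# Hypothetical pages, as if collected by repeating the request with "after".
pages = [
    {
        "after_key": {"field1": "a", "field2": "y"},
        "buckets": [
            {"key": {"field1": "a", "field2": "x"}, "doc_count": 3},
            {"key": {"field1": "a", "field2": "y"}, "doc_count": 10},
        ],
    },
    {
        "buckets": [
            {"key": {"field1": "b", "field2": "x"}, "doc_count": 7},
        ],
    },
]
top_two = top_buckets_by_doc_count(pages, 2)
```

This still reads every bucket once, so it trades network round trips for correctness; it only limits what is kept in memory.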

Bucket sort on dynamic aggregation name

I would like to sort my aggregations by their quantity value.
My problem is that each aggregation has a name that cannot be known in advance.
Given this query:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"sorting": {
"bucket_sort": {
"sort": [
{
"year>quantity": {
"order": "desc"
}
}
]
}
},
"UNKNOWN_1": {
"aggs": {
"year": {
"filter": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
"UNKNOWN_2": {
"aggs": {
"year": {
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
....
}
}
My bucket_sort path is missing one level to reach that quantity value.
Here is one Elasticsearch record:
{
  "datetime": "2021-12-01",
  "item.quantity": 5
}
Note that I have removed most of the request for readability, such as the filter aggregations, etc.
I tried something with a wildcard:
"sorting": {
  "bucket_sort": {
    "sort": [
      {
        "*>year>quantity": {
          "order": "desc"
        }
      }
    ]
  }
},
But I got the same error.
Is it possible to achieve this behaviour?
I think you misunderstood the "bucket_sort" aggregation: it won't sort your aggregations but it sorts the buckets coming from one multi-bucket aggregation. Also the bucket_sort aggregation has to be subordinate to that multi-bucket aggregation.
From the docs:
[The bucket sort aggregation is] "a parent pipeline aggregation which sorts the buckets of its parent multi-bucket aggregation"
If I understand correctly, you are trying to create "buckets" with specific filter aggregations, and you can't know in advance how many of those filter aggregations you will create.
For that you can use the "filters" aggregation (a multi-bucket aggregation), where you can specify as many filters as you want and each of them creates a bucket.
Subordinate to that filters aggregation you can create one single sum aggregation on item.quantity.
Also subordinate to the filters aggregation you then add your bucket_sort aggregation, where you just have to name the sibling "sum" aggregation.
All in all it might look like this:
{
  "aggs": {
    "your_filters": {
      "filters": {
        "filters": {
          "unknown_1": {
            "range": {
              "datetime": {
                "gte": "2021-01-01",
                "lte": "2021-12-09"
              }
            }
          },
          "unknown_2": {
            /** more filters here... **/
          }
        }
      },
      "aggs": {
        "quantity": {
          "sum": {
            "field": "item.quantity"
          }
        },
        "sorting": {
          "bucket_sort": {
            "sort": [
              { "quantity": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}
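Since the filter names are only known at runtime, the request body has to be assembled in code. Here is a minimal sketch in Python, assuming the named filters are available as a dict; the helper name build_sorted_filters_agg and the sample filter names are made up for illustration, and only the body is built (sending it is left out).

```python
import json

def build_sorted_filters_agg(named_filters, sum_field="item.quantity"):
    """Build a filters aggregation with one sum and one bucket_sort beneath it."""
    return {
        "size": 0,
        "aggs": {
            "your_filters": {
                # One bucket per named filter, however many there are.
                "filters": {"filters": dict(named_filters)},
                "aggs": {
                    "quantity": {"sum": {"field": sum_field}},
                    # bucket_sort refers to its sibling "quantity" by name.
                    "sorting": {
                        "bucket_sort": {
                            "sort": [{"quantity": {"order": "desc"}}]
                        }
                    },
                },
            }
        },
    }

# Hypothetical filters whose names are only known at runtime.
date_range = {"range": {"datetime": {"gte": "2021-01-01", "lte": "2021-12-09"}}}
body = build_sorted_filters_agg({
    "unknown_1": date_range,
    "unknown_2": {"term": {"item.type": "example"}},  # assumed field, for illustration
})
payload = json.dumps(body)
```

The point of the design is that the sum and the bucket_sort are declared once, no matter how many filters are passed in.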

Sorting a reverse nested back to parent aggregation

I'm currently aggregating a collection by a multi-level nested field and calculating some sub-aggregation metrics from this collection, and that's working using Elasticsearch's reverse nested feature as described at Sub-aggregate a multi-level nested composite aggregation.
My current struggle is to find a way to sort the aggregations by one of the calculated metrics. For example, considering the following document and my current search call, I would like to sort all the aggregations by their clicks sums.
I've tried using bucket_sort inside the inner aggs at the back_to_parent level but got the following Java exception.
class org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested cannot be cast to class org.elasticsearch.search.aggregations.InternalMultiBucketAggregation
(org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested and org.elasticsearch.search.aggregations.InternalMultiBucketAggregation are in unnamed module of loader 'app')
{
id: '32ead132eq13w21',
statistics: {
clicks: 123,
views: 456
},
categories: [{ //nested type
name: 'color',
tags: [{ //nested type
slug: 'blue'
},{
slug: 'red'
}]
}]
}
GET /acounts-123321/_search
{
size: 0,
aggs: {
categories_parent: {
nested: {
path: 'categories.tags'
},
aggs: {
filtered: {
filter: {
term: { 'categories.tags.category': 'color' }
},
aggs: {
by_slug: {
terms: {
field: 'categories.tags.slug',
size: perPage
},
aggs: {
back_to_parent: {
reverse_nested: {},
aggs: {
clicks: {
sum: {
field: 'statistics.clicks'
}
},
custom_metric: {
scripted_metric: {
init_script: 'state.accounts = []',
map_script: 'state.accounts.add(new HashMap(params["_source"]))',
combine_script: 'double result = 0;
for (acc in state.accounts) {
result += ( acc.statistics.clicks + acc.statistics.impressions);
}
return result;',
reduce_script: 'double sum = 0;
for (state in states) {
sum += state;
}
return sum;'
}
},
by_tag_sort: {
bucket_sort: {
sort: [{ 'clicks.value': { order: 'desc' } }]
}
}
}
}
}
}
}
}
}
Update:
It would also be nice to understand how it would be possible to sort the buckets by a custom metric calculated through a painless scripted_metric. I have updated the search call above adding a sample custom_metric that I wish to allow sorting through it.
I see that using bucket_sort directly does not work with the standard sort array we use for concrete fields, so the following does not seem to sort things. A sort script won't work either, since [bucket_sort] only supports field-based sorting.
by_tag_sort: {
  bucket_sort: {
    sort: [{ 'custom_metric.value': { order: 'desc' } }]
  }
}
bucket_sort expects to be run within a multi-bucket context but your reverse_nested aggregation is single-bucket (irrespective of the fact that it's a child of a multi-bucket terms aggregation).
The trick is to use an empty-ish filters aggregation to generate a multi-bucket context and then run the bucket sort:
{
"size": 0,
"aggs": {
"categories_parent": {
"nested": {
"path": "categories.tags"
},
"aggs": {
"filtered": {
"filter": {
"term": {
"categories.tags.category": "color"
}
},
"aggs": {
"by_slug": {
"terms": {
"field": "categories.tags.slug",
"size": 10
},
"aggs": {
"back_to_parent": {
"reverse_nested": {},
"aggs": {
"multi_bucket_emulator": {
"filters": {
"filters": {
"placeholder_match_all_query": {
"match_all": {}
}
}
},
"aggs": {
"clicks": {
"sum": {
"field": "statistics.clicks"
}
},
"by_tag_sort": {
"bucket_sort": {
"sort": [
{
"clicks.value": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Update: sorting by the result of a custom scripted metric value
{
"size": 0,
"aggs": {
"categories_parent": {
"nested": {
"path": "categories.tags"
},
"aggs": {
"filtered": {
"filter": {
"term": {
"categories.tags.category": "color"
}
},
"aggs": {
"by_slug": {
"terms": {
"field": "categories.tags.slug",
"size": 10
},
"aggs": {
"back_to_parent": {
"reverse_nested": {},
"aggs": {
"multi_bucket_emulator": {
"filters": {
"filters": {
"placeholder_match_all_query": {
"match_all": {}
}
}
},
"aggs": {
"clicks": {
"sum": {
"field": "statistics.clicks"
}
},
"custom_metric": {
"scripted_metric": {
"init_script": "state.accounts = []",
"map_script": """state.accounts.add(params["_source"])""",
"combine_script": """
double result = 0;
for (def acc : state.accounts) {
result += ( acc.statistics.clicks + acc.statistics.impressions);
}
return result;
""",
"reduce_script": """
double sum = 0;
for (def state : states) {
sum += state;
}
return sum;
"""
}
},
"by_tag_sort": {
"bucket_sort": {
"sort": [
{
"custom_metric.value": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Joe - Elasticsearch Handbook - I have an equivalent query to yours (one that sorts by the result of a custom scripted metric) and I expect the response to your query looks something like the below.
I have noticed that sorting specified by the bucket_sort does not get applied to the uppermost buckets (i.e. by_slug.buckets), which are still sorted by the default doc_count ordering. This can also be verified by changing the custom_metric.value ordering from desc to asc, which has no effect on the order of the results.
My understanding of bucket_sort suggests that sorting based on the custom_metric is applied to the aggregation one level up, which in this case would be multi_bucket_emulator.buckets (but because this is an emulator it has no actual buckets to sort).
Is it possible to sort the by_slug.buckets based on the custom_metric values?
I am using Elasticsearch v7.10.
Thanks very much.
(Sorry for posting this question as an answer; it was too long to be a comment.)
Response (approximation):
{
"aggregations": {
"categories_parent": {
"filtered": {
"by_slug": {
"buckets": [
{
"key": "xxxxxx",
"back_to_parent": {
"multi_bucket_emulator": {
"buckets": {
"placeholder_match_all_query": {
"clicks": {
"buckets": [
{
"key": 5.0,
"doc_count": 1
},
…
]
},
"custom_metric": {
"value": 20.0
}
}
}
}
}
},
…
]
}
}
}
}
}
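The follow-up question above is left open here. One fallback, if the by_slug buckets cannot be reordered server-side, is to sort them after the response arrives. This is a minimal client-side sketch in Python that assumes the response is shaped like the approximation above; the function name and sample values are hypothetical.

```python
def sort_slug_buckets(response, descending=True):
    """Sort by_slug buckets by the custom_metric value nested under each bucket."""
    buckets = (response["aggregations"]["categories_parent"]
               ["filtered"]["by_slug"]["buckets"])

    def metric(bucket):
        # Walk down through the single-bucket emulator to reach the metric.
        emulator = bucket["back_to_parent"]["multi_bucket_emulator"]["buckets"]
        return emulator["placeholder_match_all_query"]["custom_metric"]["value"]

    return sorted(buckets, key=metric, reverse=descending)

def make_bucket(key, value):
    """Build one by_slug bucket in the assumed response shape (test helper)."""
    return {
        "key": key,
        "back_to_parent": {
            "multi_bucket_emulator": {
                "buckets": {
                    "placeholder_match_all_query": {
                        "custom_metric": {"value": value}
                    }
                }
            }
        },
    }

# Hypothetical response with two slugs.
response = {
    "aggregations": {
        "categories_parent": {
            "filtered": {
                "by_slug": {
                    "buckets": [make_bucket("blue", 20.0), make_bucket("red", 35.0)]
                }
            }
        }
    }
}
sorted_keys = [b["key"] for b in sort_slug_buckets(response)]
```

Keep in mind that this only reorders the buckets the terms aggregation already returned, so its size still decides which slugs are candidates.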

Can I sort grouped search result by formula?

I am trying to implement a query that sorts aggregated results by a formula.
For example, we have the following entities:
{
"price":"1000",
"zip":"77777",
"field1":"1",
"field2":"5"
},
{
"price":"2222",
"zip":"77777",
"field1":"2",
"field2":"5"
},
{
"price":"1111",
"zip":"77777",
"field1":"1",
"field2":"5"
}
Now, my query without sorting looks like:
POST /entities/_search
{
"size": 0,
"query": {
"term": {
"zip": {
"value": "77777"
}
}
},
"aggs": {
"my composite": {
"composite": {
"size": 500,
"sources": [{
"field1_term": {
"terms": {
"field": "field1"
}
}
},
{
"field2_term": {
"terms": {
"field": "field2"
}
}
}
]
},
"aggs": {
"avg_price_per_group": {
"avg": {
"field": "price"
}
},
"results_per_group": {
"top_hits": {
"size": 100,
"_source": {
"include": ["entity_id", "price"]
}
}
}
}
}
}
}
First, I need to group the results by field1 and field2 and then calculate the average price for each group.
Then I need to divide the price of each document by its group's average price and sort the documents by that ratio.
Is it possible to do this somehow?
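To pin down the formula being asked for, here is a minimal client-side sketch in Python that applies it to the three sample entities above. It is not an Elasticsearch query: the function name, the price_ratio key, and the ascending sort direction are assumptions, and the prices are converted from strings because the sample stores them that way.

```python
from collections import defaultdict
from statistics import mean

def rank_by_price_ratio(docs):
    """Group by (field1, field2), then rank docs by price / group average price."""
    groups = defaultdict(list)
    for doc in docs:
        groups[(doc["field1"], doc["field2"])].append(doc)

    ranked = []
    for members in groups.values():
        # Average price of the group (assumes at least one non-zero price).
        avg_price = mean(float(d["price"]) for d in members)
        for d in members:
            ranked.append({**d, "price_ratio": float(d["price"]) / avg_price})
    # Ascending order is an assumption; flip with reverse=True if needed.
    return sorted(ranked, key=lambda d: d["price_ratio"])

# The three sample entities from the question.
docs = [
    {"price": "1000", "zip": "77777", "field1": "1", "field2": "5"},
    {"price": "2222", "zip": "77777", "field1": "2", "field2": "5"},
    {"price": "1111", "zip": "77777", "field1": "1", "field2": "5"},
]
ranked = rank_by_price_ratio(docs)
```

For the group (field1=1, field2=5) the average is 1055.5, so the 1000 document gets a ratio below 1 and the 1111 document a ratio above 1, while the single document in the other group has a ratio of exactly 1.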

How to build Price Comparison with Elasticsearch

I have to build a price comparison system. My idea was to build it on Elasticsearch.
Now I have this problem: how can I aggregate seller prices for each product?
Let's say I have this simple mapping:
products: {
product: {
properties: {
id: {
type: "long"
},
name: {
type: "string"
},
....
sellers: {
dynamic: "true",
properties: {
sellerId: {
type: "long"
},
price: {
type: "float"
}
}
}
}
}
}
Can I aggregate or facet the price (min,max, and sellers count) for each Product?
Or is there a way to build this thing with parent child relations?
Assuming you're using Elasticsearch 1.0 and not 0.90, you can do this quite easily using min, max and value_count aggregations.
{
"query": {
"match": {
"name": "item1"
}
},
"aggs": {
"Min": {
"min": {
"field": "sellers.price"
}
},
"Max": {
"max": {
"field": "sellers.price"
}
},
"SellerCount": {
"value_count": {
"field": "sellers.sellerId"
}
}
}
}
Or, you could use a sub-aggregation to return the information for each product and not a specific one.
{
"aggs": {
"Products": {
"terms": {
"field": "name",
"size": 10
},
"aggs": {
"Min": {
"min": {
"field": "sellers.price"
}
},
"Max": {
"max": {
"field": "sellers.price"
}
},
"SellerCount": {
"value_count": {
"field": "sellers.sellerId"
}
}
}
}
}
}
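As a rough picture of what the per-product sub-aggregations compute, here is a minimal sketch in Python that derives the same three numbers from in-memory documents. It assumes each product name forms a single bucket and that every product has at least one seller; the function name and sample data are hypothetical.

```python
def summarize_sellers(products):
    """Compute Min, Max and SellerCount per product name from in-memory docs."""
    summary = {}
    for product in products:
        prices = [seller["price"] for seller in product["sellers"]]
        summary[product["name"]] = {
            "Min": min(prices),
            "Max": max(prices),
            # Mirrors value_count on sellers.sellerId: one value per seller.
            "SellerCount": len(product["sellers"]),
        }
    return summary

# Hypothetical products following the mapping in the question.
products = [
    {"id": 1, "name": "item1", "sellers": [
        {"sellerId": 10, "price": 9.5},
        {"sellerId": 11, "price": 12.0},
    ]},
    {"id": 2, "name": "item2", "sellers": [
        {"sellerId": 10, "price": 30.0},
    ]},
]
summary = summarize_sellers(products)
```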
