Ordering Aggregation Buckets by Score - elasticsearch

Is it possible to order the aggregation bucket by score?
"aggs": {
"UnitAggregationBucket": {
"terms": {
"field": "unitId",
"size": 10,
/* "order": order by max score documents per bucket */
}
}
}
I have seen the documentation, which explains that the default order is by doc_count, but I cannot find out whether it is possible to order the buckets by score and, if so, how.

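Yes, it is possible. Add a max sub-aggregation that reads _score through a script, then reference that sub-aggregation by name in the terms order: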
{
"size": 0,
"query": {
...
},
"aggs": {
"UnitAggregationBucket": {
"terms": {
"field": "unitId",
"size": 10,
"order": {
"score": "desc"
}
},
"aggs": {
"score": {
"max": {
"script": "_score"
}
}
}
}
}
}
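The same request can be assembled in code so that the name used in "order" always matches the name of the max sub-aggregation. This is only a sketch: the helper name terms_ordered_by_max_score and the match query on a "title" field are hypothetical placeholders, not part of the original question.

```python
def terms_ordered_by_max_score(field, size=10, metric_name="score"):
    """Build a terms aggregation whose buckets are ordered by their best _score."""
    return {
        "terms": {
            "field": field,
            "size": size,
            # Must match the key of the max sub-aggregation below.
            "order": {metric_name: "desc"},
        },
        "aggs": {
            # Highest relevance score among the documents in each bucket.
            metric_name: {"max": {"script": {"source": "_score"}}},
        },
    }


body = {
    "size": 0,
    # Hypothetical query; any scoring query can go here.
    "query": {"match": {"title": "example"}},
    "aggs": {"UnitAggregationBucket": terms_ordered_by_max_score("unitId")},
}
```

With a non-scoring query such as match_all, every document gets the same score, so the ordering only becomes meaningful with a query that actually ranks documents.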

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I suggest using a bucket_script pipeline aggregation, which reads values from other aggregations through buckets_path. As far as I know, it has to run as a sub-aggregation of a multi-bucket aggregation such as a histogram. Since you have such a field in your mapping, I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it didn't work well for you.
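For reference, the arithmetic that the two bucket_script scripts perform can be checked outside Elasticsearch. The sketch below mirrors the formula from the query above; the function name role_ratios and the zero-total guard are my own additions, not something the query does.

```python
def role_ratios(manager_count, lead_count):
    """Return (manager %, lead %) using the same formula as the bucket_script."""
    total = manager_count + lead_count
    if total == 0:
        # The script in the query has no such guard; this avoids dividing by zero.
        return None
    manager_pct = manager_count / total * 100
    lead_pct = lead_count / total * 100
    return manager_pct, lead_pct
```

For example, 8 managers and 2 leads in a bucket come out at roughly 80 and 20 percent.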

How to mention from and size for the first level of elastic search aggregation in nested aggregation?

I have written a query to get the buckets based on id and then sort them. This works fine. But how can I make it return only the buckets from position 100 to 200 for the aggregation_by_id aggregation?
{
"query": {
"match_all": {}
},
"size": 0,
"aggregations": {
"aggregation_by_id": {
"terms": {
"field": "id.keyword",
"size": 200
},
"aggs": {
"sort_timestamp": {
"top_hits": {
"sort": [{
"timestamp": {
"order": "desc",
"unmapped_type": "long"
}
}],
"size": 1
}
}
}
}
}
}
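One possible approach, which is not from the original thread, is a bucket_sort pipeline aggregation: it accepts "from" and "size" and truncates the parent's buckets. A minimal sketch, assuming a hypothetical helper paged_terms_aggregation and a hypothetical sub-aggregation name bucket_page:

```python
def paged_terms_aggregation(field, offset, page_size, sub_aggs=None):
    """Build a terms aggregation that returns buckets [offset, offset + page_size)."""
    aggs = dict(sub_aggs or {})
    # bucket_sort without a "sort" clause keeps the terms order and only truncates.
    aggs["bucket_page"] = {"bucket_sort": {"from": offset, "size": page_size}}
    return {
        # The terms aggregation still has to collect every bucket up to the page end.
        "terms": {"field": field, "size": offset + page_size},
        "aggs": aggs,
    }


body = {
    "query": {"match_all": {}},
    "size": 0,
    "aggregations": {
        "aggregation_by_id": paged_terms_aggregation("id.keyword", 100, 100),
    },
}
```

Because the terms aggregation still computes the first 200 buckets, this gets more expensive as the offset grows; for deep paging a composite aggregation is usually the better fit.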

Paging the top_hits aggregation in ElasticSearch

Right now I'm doing a top_hits aggregation in Elasticsearch that groups my data by a field, sorts the groups by a date, and chooses the top 1.
I need to page these aggregation results so that I can pass in a pageSize and a pageNumber, but I don't know how.
In addition to this, I also need the total results of this aggregation so we can show it in a table in our web interface.
The aggregation looks like this:
POST my_index/_search
{
"size": 0,
"aggs": {
"top_artifacts": {
"terms": {
"field": "artifactId.keyword"
},
"aggs": {
"top_artifacts_hits": {
"top_hits": {
"size": 1,
"sort": [{
"date": {
"order": "desc"
}
}]
}
}
}
}
}
}
If I understand what you want, you should be able to paginate with a composite aggregation. You can still pass a size parameter, but instead of from you pass after, set to the key of the last bucket of the previous page (the response returns it as after_key).
POST my_index/_search
{
"size": 0,
"aggs": {
"top_artifacts": {
"composite": {
"sources": [
{
"artifact": {
"terms": {
"field": "artifactId.keyword"
}
}
}
],
"size": 1, // OPTIONAL SIZE (How many buckets)
"after": {
"artifact": "FOO_BAZ" // Buckets after this bucket key
}
},
"aggs": {
"hits": {
"top_hits": {
"size": 1,
"sort": [
{
"date": {
"order": "desc"
}
}
]
}
}
}
}
}
}
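To walk through all pages, the after_key from each response can be fed back in as "after" for the next request. The sketch below assumes a hypothetical callable named search that takes a request body and returns the parsed response as a dict (for example, a thin wrapper around whatever client you use); the function name iter_composite_pages is likewise made up.

```python
import copy


def iter_composite_pages(search, body, agg_name):
    """Yield one list of buckets per page of a composite aggregation."""
    body = copy.deepcopy(body)  # do not modify the caller's request
    composite = body["aggs"][agg_name]["composite"]
    composite.pop("after", None)  # always start from the first page
    while True:
        response = search(body)
        agg = response["aggregations"][agg_name]
        buckets = agg.get("buckets", [])
        if not buckets:
            return  # an empty page means there is nothing left
        yield buckets
        after_key = agg.get("after_key")
        if after_key is None:
            return
        # Resume right after the last bucket of the page just received.
        composite["after"] = after_key
```

Note that composite pagination only moves forward, so jumping straight to an arbitrary pageNumber means stepping through the earlier pages first.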

How do I aggregate over top_hits results in elasticsearch

Here are example documents:
{
"player": "Jim",
"score": 5,
"timestamp": 1459492890000
}
{
"player": "Jim",
"score": 7,
"timestamp": 1459492895000
}
{
"player": "Dave",
"score": 9,
"timestamp": 1459492894000
}
{
"player": "Dave",
"score": 4,
"timestamp": 1459492898000
}
I want to get the latest score for each player and then get the average of all those scores. Jim's latest score is 7 and Dave's latest score is 4, so the answer would be 5.5.
The only way I found to get the "latest" document of a player was to use the top_hits aggregation. However, it does not seem that I am able to do another aggregation after I get the latest document.
This is the best I came up with:
{
"aggs": {
"last_score": {
"terms": { "field": "player" },
"aggs": {
"last_score_hits": {
"top_hits": {
"sort": [ { "timestamp": { "order": "desc" } } ],
"size": 1
},
"aggs": {
"avg_score": {
"avg": { "field": "score" }
}
}
}
}
}
}
}
However, this gives me this error:
Aggregator [last_score_hits] of type [top_hits] cannot accept
sub-aggregations
If there is another way to accomplish this search without using top_hits as well, then I would be all for it.
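You're trying to put avg_score as a sub-aggregation of last_score_hits, but top_hits cannot have sub-aggregations.
To get rid of the error, put avg_score next to it, as a sub-aggregation of last_score. Note that avg_score then averages all of a player's scores in each bucket, not only the latest one. See the example below: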
{
"aggs": {
"last_score": {
"terms": {
"field": "player"
},
"aggs": {
"last_score_hits": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size": 1
}
},
"avg_score": {
"avg": {
"field": "score"
}
}
}
}
}
}
You can place other aggregations at the same level as top_hits, but you cannot place any sub-aggregation below top_hits. Elasticsearch does not support it; there is a GitHub issue about this.
You can have a parallel-level aggregation like this:
"aggs": {
"top_hits_agg": {
"top_hits": {
"size": 10,
"_source": {
"includes": ["score"]
}
}
},
"avg_agg": {
"avg": {
"field": "score"
}
}
}
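Since top_hits cannot feed another aggregation, one way to get the average of the latest scores is to compute it on the client from the terms + top_hits response. A minimal sketch, assuming the aggregation names from the queries above; the function name average_latest_score is hypothetical.

```python
def average_latest_score(response, terms_agg="last_score",
                         hits_agg="last_score_hits", field="score"):
    """Average the score of each player's latest document.

    Expects top_hits with size 1 sorted by timestamp descending, so the first
    hit in every bucket is that player's most recent document.
    """
    buckets = response["aggregations"][terms_agg]["buckets"]
    latest = []
    for bucket in buckets:
        hits = bucket[hits_agg]["hits"]["hits"]
        if hits:  # skip buckets that returned no hit
            latest.append(hits[0]["_source"][field])
    if not latest:
        return None
    return sum(latest) / len(latest)
```

For the example documents above (latest scores 7 for Jim and 4 for Dave), this returns 5.5.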

Union of sorted sized queries in Elasticsearch

I have docs in Elasticsearch like:
{
"key1":1,
"key2":2,
"key3":3
}
I would like to make a query that returns 30 docs which are the union of the:
the 10 docs with the highest values in key1 +
the 10 docs with the highest values in key2 +
the 10 docs with the highest values in key3
I got 2 ideas:
Using DisMaxQuery - but I couldn't use sorting. Probably missed something..
using MultiSearch - but I would like to get one result object
Any suggestions would be helpful!
Another idea would be to add three terms aggregations on key1, key2 and key3, each sorted by a max sub-aggregation (in order to get the highest value for each key), and to add a top_hits sub-aggregation to each of them. You might get fewer than 10 docs per key; if that's a problem, you can increase the size of the terms aggregations to 2 or 3 and then filter out the unneeded top hits on the client side.
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"topkey1": {
"terms": {
"field": "key1",
"size": 1,
"order": {
"max_key1": "desc"
}
},
"aggs": {
"max_key1": {
"max": {
"field": "key1"
}
},
"key1_tophits": {
"top_hits": {
"size": 10
}
}
}
},
"topkey2": {
"terms": {
"field": "key2",
"size": 1,
"order": {
"max_key2": "desc"
}
},
"aggs": {
"max_key2": {
"max": {
"field": "key2"
}
},
"key2_tophits": {
"top_hits": {
"size": 10
}
}
}
},
"topkey3": {
"terms": {
"field": "key3",
"size": 1,
"order": {
"max_key3": "desc"
}
},
"aggs": {
"max_key3": {
"max": {
"field": "key3"
}
},
"key3_tophits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
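If the MultiSearch route mentioned in the question is acceptable after all, the three result lists can be merged into one on the client. The sketch below is only an illustration under that assumption: top_n_query and union_hits are hypothetical helper names, and the merge assumes all hits come from a single index so that _id identifies a document.

```python
def top_n_query(field, n=10):
    """Search body for the n documents with the highest value in `field`."""
    return {"size": n, "sort": [{field: {"order": "desc"}}]}


def union_hits(*hit_lists):
    """Merge several hit lists, keeping the first occurrence of each _id."""
    seen = set()
    merged = []
    for hits in hit_lists:
        for hit in hits:
            if hit["_id"] not in seen:
                seen.add(hit["_id"])
                merged.append(hit)
    return merged


# One body per key; these could be sent together in a single multi-search request.
bodies = [top_n_query(key) for key in ("key1", "key2", "key3")]
```

A document that ranks in the top 10 for more than one key appears only once, so the union can contain fewer than 30 docs.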
