Elasticsearch: return aggregation results in random order

I have the following Elasticsearch query, which gets 10 documents from each "category", grouped on "cat.id":
"aggs": {
"test": {
"terms": {
"size": 10,
"field": "cat.id"
},
"aggs": {
"top_test_hits": {
"top_hits": {
"_source": {
"includes": [
"id"
]
},
"size": 10
}
}
}
}
}
This works fine. However, I cannot find a way to take 10 random results from each bucket: the results are always the same. I would like to get 10 random items from each bucket. I tried all kinds of approaches that are intended for documents, but none of them seem to work.

As was already suggested in this answer, you can try a random sort in the top_hits aggregation, using a _script sort like this:
{
"aggs": {
"test": {
"terms": {
"size": 10,
"field": "cat.id"
},
"aggs": {
"top_test_hits": {
"top_hits": {
"_source": {
"includes": [
"id"
]
},
"size": 10,
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "(System.currentTimeMillis() + doc['_id'].value).hashCode()"
},
"order": "asc"
}
}
}
}
}
}
}
}
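If the request is sent from application code, the same body can be assembled programmatically. The following is only a minimal sketch in Python: the helper name random_top_hits_body and its parameters are hypothetical, the "size": 0 setting is an assumption (only aggregation results are wanted), and the script is copied unchanged from the query above.

```python
def random_top_hits_body(field, buckets=10, hits_per_bucket=10):
    """Build a search body that sorts the top hits of each bucket by a script value."""
    random_sort = {
        "_script": {
            "type": "number",
            "script": {
                "lang": "painless",
                # Same script as in the answer above: the sort key depends on
                # the current time and the document _id.
                "source": "(System.currentTimeMillis() + doc['_id'].value).hashCode()",
            },
            "order": "asc",
        }
    }
    return {
        "size": 0,  # assumption: only the aggregation results are needed
        "aggs": {
            "test": {
                "terms": {"field": field, "size": buckets},
                "aggs": {
                    "top_test_hits": {
                        "top_hits": {
                            "_source": {"includes": ["id"]},
                            "size": hits_per_bucket,
                            "sort": random_sort,
                        }
                    }
                },
            }
        },
    }


body = random_top_hits_body("cat.id")
```

The resulting dictionary can be serialized with json.dumps and sent as the body of a _search request.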
Random sorting was broadly covered in this question.
Hope that helps!

Related

How to get the last Elasticsearch document for each unique value of a field?

I have a data structure in Elasticsearch that looks like:
{
"name": "abc",
"date": "2022-10-08T21:30:40.000Z",
"rank": 3
}
I want to get, for each unique name, the rank of the document (or the whole document) with the most recent date.
I currently have this:
"aggs": {
"group-by-name": {
"terms": {
"field": "name"
},
"aggs": {
"max-date": {
"max": {
"field": "date"
}
}
}
}
}
How can I get the rank (or the whole document) for each result, if possible in one request?
You can use either of the options below.
Collapse
"collapse": {
"field": "name"
},
"sort": [
{
"date": {
"order": "desc"
}
}
]
Top hits aggregation
{
"aggs": {
"group-by-name": {
"terms": {
"field": "name",
"size": 100
},
"aggs": {
"top_doc": {
"top_hits": {
"sort": [
{
"date": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
}
}
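With the top hits variant, the rank still has to be read out of the nested response. A minimal sketch in Python, assuming the usual aggregation response layout (aggregations, then buckets, then hits.hits[0]._source); the function name latest_rank_by_name and the sample response are made up for illustration:

```python
def latest_rank_by_name(response):
    """Map each name bucket to the rank of its most recent document."""
    buckets = response["aggregations"]["group-by-name"]["buckets"]
    result = {}
    for bucket in buckets:
        hits = bucket["top_doc"]["hits"]["hits"]
        if hits:  # the top hit is the newest document because of the date sort
            result[bucket["key"]] = hits[0]["_source"]["rank"]
    return result


# Hypothetical, trimmed-down response for two names.
sample_response = {
    "aggregations": {
        "group-by-name": {
            "buckets": [
                {
                    "key": "abc",
                    "doc_count": 2,
                    "top_doc": {"hits": {"hits": [
                        {"_source": {"name": "abc", "date": "2022-10-08T21:30:40.000Z", "rank": 3}}
                    ]}},
                },
                {
                    "key": "xyz",
                    "doc_count": 1,
                    "top_doc": {"hits": {"hits": [
                        {"_source": {"name": "xyz", "date": "2022-09-01T10:00:00.000Z", "rank": 7}}
                    ]}},
                },
            ]
        }
    }
}

ranks = latest_rank_by_name(sample_response)
```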

Elasticsearch sort data upon all buckets

I am trying to write an Elasticsearch sort, but I am struggling.
My data consists of product definitions, each of which can comprise several products. We call the definition the abstract and the products its concretes.
For example, abstract product A can consist of products B, C and D (its concretes).
Similarly, product E can have F as a concrete, and so on.
I want to aggregate the products by their abstract (to show only one concrete per abstract) and then sort all concretes based on some criteria.
I have written the following, which doesn't work as expected.
"aggs": {
"category:58": {
"aggs": {
"products": {
"aggs": {
"abstract": {
"top_hits": {
"size": 1,
"sort": [
{
"criteria1": {
"order": "desc"
}
},
{
"_score": {
"order": "desc"
}
},
{
"criteria3": {
"missing": "_last",
"order": "asc",
"unmapped_type": "integer"
}
}
]
}
}
},
"terms": {
"field": "abstract_id",
"size": 10
}
}
},
"filter": {
"term": {
"categories.id": {
"value": "58"
}
}
}
}
},
If I understand correctly, this creates 10 buckets with one product each, so my sort only orders the single product inside each bucket, whereas I need to sort the entire result. The question is: where do I place the sort that is currently in aggs->abstract?
If I remove the grouping by abstract_id and group by something unique instead, the sorting does work, but then all concretes of one abstract product can be displayed, which I don't want.
I saw that I can't sort on terms, so I'm kind of clueless now.
I ended up using multiple aggregations and then doing a bucket sort.
The query I ended up with looks like this:
"aggs": {
"abstract": {
"top_hits": {
"size": 1
}
},
"criteria3": {
"sum": {
"field": "custom_filed_foo_bar"
}
},
"criteria1": {
"sum": {
"field": "boosted_value"
}
},
"criteria2": {
"max": {
"script":{
"source": "_score"
}
}
},
"sorting": {
"bucket_sort": {
"sort": [
{
"criteria1": {
"order": "desc"
}
},
{
"criteria2": {
"order": "desc"
}
},
{
"criteria3": {
"order": "desc"
}
}
]
}
}
}
I don't know if it's the correct approach, but it seems to be working.
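To make the effect of the bucket sort concrete, here is a rough client-side equivalent in Python. It is only an illustration under assumptions: every metric is taken to have a numeric value, and the function name sort_buckets and the sample buckets are invented for this sketch.

```python
def sort_buckets(buckets):
    """Order buckets by criteria1, then criteria2, then criteria3, all descending."""
    return sorted(
        buckets,
        key=lambda b: (
            b["criteria1"]["value"],
            b["criteria2"]["value"],
            b["criteria3"]["value"],
        ),
        reverse=True,  # every criterion uses "order": "desc" in the query
    )


# Hypothetical buckets keyed by abstract_id, with the metric values
# in the shape returned for sum and max aggregations.
sample_buckets = [
    {"key": 1, "criteria1": {"value": 5.0}, "criteria2": {"value": 1.0}, "criteria3": {"value": 2.0}},
    {"key": 2, "criteria1": {"value": 9.0}, "criteria2": {"value": 0.5}, "criteria3": {"value": 1.0}},
    {"key": 3, "criteria1": {"value": 5.0}, "criteria2": {"value": 2.0}, "criteria3": {"value": 0.0}},
]

ordered_keys = [b["key"] for b in sort_buckets(sample_buckets)]
```

Bucket 2 comes first because it has the highest criteria1; buckets 3 and 1 tie on criteria1, so criteria2 decides between them.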

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I would suggest using a bucket_script aggregation (with buckets_path). As far as I know, this aggregation has to run as a sub-aggregation of a multi-bucket aggregation such as a histogram. Assuming you have a date field in your mapping (my_datetime below), I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it didn't work well for you.
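The arithmetic performed by the two bucket_script entries can be checked outside Elasticsearch. A small sketch in Python, where the function name role_ratios is hypothetical and the guard against an empty bucket is an addition that the scripts above do not have:

```python
def role_ratios(manager_count, lead_count):
    """Return (manager %, lead %) for one histogram bucket, or None if it is empty."""
    total = manager_count + lead_count
    if total == 0:
        return None  # added guard: avoids dividing by zero for empty buckets
    return (manager_count / total * 100, lead_count / total * 100)


ratios = role_ratios(3, 1)
```

Note that this only reports the mix of types per bucket; it does not change which documents the search returns.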

Elasticsearch sort terms agg by arbitrary order

I have a terms aggregation and I want some specific values to always be at the top.
Like:
POST _search
{ "size": 0,
"aggs": {
"pets": {
"terms": {
"field": "species",
"order": "Dogs, Cats"
}
}
}
}
Where the results would be like "Dog", "Cat", "Iguana".
Dog and Cat at the top and everything else below.
Is this possible without scripting?
Thanks!
One way to do it is by filtering values in the terms aggregation. You'd create two terms aggregations, one with the desired terms and another with all other terms.
{
"size": 0,
"aggs": {
"top_terms": {
"terms": {
"field": "species",
"include": ["Dogs", "Cats"],
"order": { "_key" : "desc" }
}
},
"other_terms": {
"terms": {
"field": "species",
"exclude": ["Dogs", "Cats"]
}
}
}
}
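The two bucket lists then have to be stitched together on the client. A minimal sketch in Python, assuming the standard response layout for terms aggregations; the function name merge_term_buckets and the sample counts are invented for illustration:

```python
def merge_term_buckets(response):
    """Put the pinned terms first, followed by all remaining terms."""
    aggs = response["aggregations"]
    return aggs["top_terms"]["buckets"] + aggs["other_terms"]["buckets"]


# Hypothetical response: top_terms is ordered by _key descending,
# other_terms by document count (the default).
sample_response = {
    "aggregations": {
        "top_terms": {"buckets": [
            {"key": "Dogs", "doc_count": 5},
            {"key": "Cats", "doc_count": 12},
        ]},
        "other_terms": {"buckets": [
            {"key": "Iguanas", "doc_count": 7},
            {"key": "Birds", "doc_count": 2},
        ]},
    }
}

merged_keys = [b["key"] for b in merge_term_buckets(sample_response)]
```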
A script wouldn't be too complicated though -- first boost the two species, then sort by the scores first and then by _count:
GET pets/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"terms": {
"species": [
"dog",
"cat"
],
"boost": 10
}
},
{
"match_all": {}
}
]
}
},
"aggs": {
"pets": {
"terms": {
"field": "species.keyword",
"order": [
{
"max_score": "desc"
},
{
"_count": "desc"
}
]
},
"aggs": {
"max_score": {
"max": {
"script": "_score"
}
}
}
}
}
}

How do I aggregate over top_hits results in elasticsearch

Here are example documents:
{
"player": "Jim",
"score": 5,
"timestamp": 1459492890000
}
{
"player": "Jim",
"score": 7,
"timestamp": 1459492895000
}
{
"player": "Dave",
"score": 9,
"timestamp": 1459492894000
}
{
"player": "Dave",
"score": 4,
"timestamp": 1459492898000
}
I want to get the latest score for each player and then get the average of all those scores. So the answer would be 5.5: Jim's latest score is 7 and Dave's latest score is 4, and the average of those two is 5.5.
The only way I found to get the "latest" document of a player was to use the top_hits aggregation. However, it does not seem that I am able to do another aggregation after I get the latest document.
This is the best I came up with:
{
"aggs": {
"last_score": {
"terms": { "field": "player" },
"aggs": {
"last_score_hits": {
"top_hits": {
"sort": [ { "timestamp": { "order": "desc" } } ],
"size": 1
},
"aggs": {
"avg_score": {
"avg": { "field": "score" }
}
}
}
}
}
}
}
However, this gives me this error:
Aggregator [last_score_hits] of type [top_hits] cannot accept sub-aggregations
If there is another way to accomplish this search without using top_hits as well, then I would be all for it.
You're trying to put avg_score as a sub-aggregation of last_score_hits.
To make it work, you have to put avg_score as a sub-aggregation of last_score. See the example below:
{
"aggs": {
"last_score": {
"terms": {
"field": "player"
},
"aggs": {
"last_score_hits": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size": 1
}
},
"avg_score": {
"avg": {
"field": "score"
}
}
}
}
}
}
You can have other aggregations at the same level as top_hits, but you cannot have any sub-aggregation below top_hits; Elasticsearch does not support it. Here is the link to the GitHub issue.
You can have a parallel-level aggregation like this:
"aggs": {
"top_hits_agg": {
"top_hits": {
"size": 10,
"_source": {
"includes": ["score"]
}
}
},
"avg_agg": {
"avg": {
"field": "score"
}
}
}
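Keep in mind that an avg placed next to top_hits averages every document in its scope, not just the hits that top_hits returns. One way to get the average of the latest scores is to compute it on the client from the top_hits results. The following Python sketch assumes the last_score / last_score_hits response layout from the query in the first answer; the function name average_latest_score and the sample response are made up for illustration:

```python
def average_latest_score(response):
    """Average the score of the newest document in each player bucket."""
    buckets = response["aggregations"]["last_score"]["buckets"]
    latest = []
    for bucket in buckets:
        hits = bucket["last_score_hits"]["hits"]["hits"]
        if hits:  # sorted by timestamp desc with size 1, so this is the latest
            latest.append(hits[0]["_source"]["score"])
    return sum(latest) / len(latest) if latest else None


# Hypothetical, trimmed-down response for the four example documents.
sample_response = {
    "aggregations": {
        "last_score": {
            "buckets": [
                {"key": "Jim", "doc_count": 2, "last_score_hits": {"hits": {"hits": [
                    {"_source": {"player": "Jim", "score": 7, "timestamp": 1459492895000}}
                ]}}},
                {"key": "Dave", "doc_count": 2, "last_score_hits": {"hits": {"hits": [
                    {"_source": {"player": "Dave", "score": 4, "timestamp": 1459492898000}}
                ]}}},
            ]
        }
    }
}

average = average_latest_score(sample_response)
```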
