Elasticsearch - How to order buckets using keyword field

I need to sort my buckets by a keyword field, and I have tried two approaches.
First, I tried to sort the buckets of my aggregation by a value from a top_hits sub-aggregation. My top_hits returns one document, and I only include its username:
"user_data": {
  "top_hits": {
    "_source": {
      "includes": ["username"]
    },
    "size": 1
  }
}
To sort the buckets, I tried a bucket_sort, something like this:
"sorting": {
  "bucket_sort": {
    "sort": [
      {
        "user_data>username": {    ----> This is the error
          "order": "desc"
        }
      }
    ],
    "from": 0,
    "size": 25
  }
}
But I received an error: basically, the buckets path is wrong.
My second approach was to add another aggregation over username to get its max, something like this:
"to_sort": {
  "max": {
    "field": "username"
  }
}
and then use the following bucket_sort:
"sorting": {
  "bucket_sort": {
    "sort": [
      {
        "to_sort": {
          "order": "desc"
        }
      }
    ],
    "from": 0,
    "size": 25
  }
}
But the max aggregation does not accept a keyword field.
Is there a way to sort my buckets by username, given that username is a keyword field?
The parent of my aggregation is:
"aggs": {
  "CountryId": {
    "terms": {
      "field": "countryId",
      "size": 10000
    }
  }
}
The username value is different in each bucket.
The buckets currently come back like this:
"buckets" : [
  {
    "key" : "11111",
    "doc_count" : 17,
    "user_data" : {
      "hits" : {
        "total" : 10,
        "max_score" : 11,
        "hits" : [
          {
            "_index" : "index_name",
            "_type" : "index_name",
            "_id" : "101010",
            "_score" : 0.0,
            "_source" : {
              "username" : "ccccc"
            }
          }
        ]
      }
    }
  },
  {
    "key" : "33333",
    "doc_count" : 17,
    "user_data" : {
      "hits" : {
        "total" : 10,
        "max_score" : 11,
        "hits" : [
          {
            "_index" : "index_name",
            "_type" : "index_name",
            "_id" : "101010",
            "_score" : 0.0,
            "_source" : {
              "username" : "bbbbb"
            }
          }
        ]
      }
    }
  },
  {
    "key" : "22222",
    "doc_count" : 17,
    "user_data" : {
      "hits" : {
        "total" : 10,
        "max_score" : 11,
        "hits" : [
          {
            "_index" : "index_name",
            "_type" : "index_name",
            "_id" : "101010",
            "_score" : 0.0,
            "_source" : {
              "username" : "aaaaa"
            }
          }
        ]
      }
    }
  }
]
And this is the result I would like to get:
"buckets" : [
  {
    "key" : "22222",
    "doc_count" : 17,
    "user_data" : {
      "hits" : {
        "total" : 10,
        "max_score" : 11,
        "hits" : [
          {
            "_index" : "index_name",
            "_type" : "index_name",
            "_id" : "101010",
            "_score" : 0.0,
            "_source" : {
              "username" : "aaaaa"
            }
          }
        ]
      }
    }
  },
  {
    "key" : "33333",
    "doc_count" : 17,
    "user_data" : {
      "hits" : {
        "total" : 10,
        "max_score" : 11,
        "hits" : [
          {
            "_index" : "index_name",
            "_type" : "index_name",
            "_id" : "101010",
            "_score" : 0.0,
            "_source" : {
              "username" : "bbbbb"
            }
          }
        ]
      }
    }
  },
  {
    "key" : "11111",
    "doc_count" : 17,
    "user_data" : {
      "hits" : {
        "total" : 10,
        "max_score" : 11,
        "hits" : [
          {
            "_index" : "index_name",
            "_type" : "index_name",
            "_id" : "101010",
            "_score" : 0.0,
            "_source" : {
              "username" : "ccccc"
            }
          }
        ]
      }
    }
  }
]
As you can see, the buckets are ordered by username.

I had a similar problem and didn't find any answer online, so I built my own solution; it took me almost a week :/. It won't always work, because an ordered hash code for strings has limits: you will have to pick your own charset and the number of leading characters you consider enough to sort by (6 for me). Run some tests, because you can only use the positive range of the long type, otherwise it will not work at all (with my charset length I could go up to 13 characters). Basically, I built the metric for the bucket_sort with a scripted_metric, based on finding the top_hits manually (from here), and adapted it to compute an ordered hash code of the keyword I wanted.
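The ordered hash idea above can be checked outside Elasticsearch with a small Python sketch. The helper names are hypothetical, and this assumes a lowercase a-z charset: each character becomes a digit 1..26, and a missing position (or a character outside the charset, such as an uppercase letter) becomes digit 0, so the number is written in base 27. Padding shorter names with 0 keeps a name ahead of longer names that share its prefix; an unpadded encoding lets the value grow with the name length, so short names jump ahead. Names that only differ after the prefix get the same hash and therefore tie.

```python
LONG_MAX = 2**63 - 1  # largest positive value of a signed 64-bit long
CHARSET = "abcdefghijklmnopqrstuvwxyz"


def ordered_hash(name: str, prefix_len: int = 6, charset: str = CHARSET) -> int:
    """Encode the first prefix_len characters as digits of a base-(len(charset)+1) number.

    Letters map to 1..len(charset); missing positions (and characters outside the
    charset) map to 0, so a name sorts before longer names that share its prefix.
    """
    base = len(charset) + 1
    h = 0
    for i in range(prefix_len):
        digit = charset.find(name[i]) + 1 if i < len(name) else 0
        h = h * base + digit
    return h


def unpadded_hash(name: str, prefix_len: int = 6, charset: str = CHARSET) -> int:
    """Same digits without padding: the value grows with the name length."""
    h = 0
    for ch in name[:prefix_len]:
        h = h * len(charset) + charset.find(ch) + 1
    return h


def max_prefix_len(charset: str = CHARSET) -> int:
    """Longest prefix whose largest possible hash still fits in a positive long."""
    base = len(charset) + 1
    n = 0
    while base ** (n + 1) - 1 <= LONG_MAX:
        n += 1
    return n


names = ["ccccc", "b", "aaaaa", "ab", "abc"]
print(sorted(names, key=ordered_hash))   # ['aaaaa', 'ab', 'abc', 'b', 'ccccc']
print(sorted(names, key=unpadded_hash))  # ['b', 'ab', 'abc', 'aaaaa', 'ccccc']
print(max_prefix_len())                  # 13
```

The 13-character limit matches the one mentioned above: 27^13 - 1 still fits in a signed long, 27^14 - 1 does not. A larger charset lowers that limit.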
Below is my query, where I sort the user buckets by the sso.name keyword of each user's last session (the top hit); it should be more or less easy to adapt it to your problem.
{
  "size": 0,
  "timeout": "60s",
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "user_id"
          }
        }
      ]
    }
  },
  "aggregations": {
    "by_user": {
      "terms": {
        "field": "user_id",
        "size": 10000,
        "order": [
          { "_count": "desc" },
          { "_key": "asc" }
        ]
      },
      "aggregations": {
        "my_top_hits_sso_ordered_hash": {
          "scripted_metric": {
            "init_script": "state.timestamp_latest = 0L; state.last_sso_ordered_hash = 0L",
            "map_script": """
              def current_date = doc['login_timestamp'].getValue().toInstant().toEpochMilli();
              if (current_date > state.timestamp_latest) {
                state.timestamp_latest = current_date;
                state.last_sso_ordered_hash = 0L;
                if (doc['sso.name'].size() > 0) {
                  String charset = "abcdefghijklmnopqrstuvwxyz";
                  String ssoName = doc['sso.name'].value;
                  int length = charset.length();
                  // Always encode 6 positions: a missing character (or one outside the charset)
                  // becomes digit 0, so shorter names sort before longer names sharing their prefix.
                  for (int i = 0; i < 6; i++) {
                    long digit = i < ssoName.length() ? charset.indexOf(String.valueOf(ssoName.charAt(i))) + 1 : 0;
                    state.last_sso_ordered_hash = state.last_sso_ordered_hash * (length + 1) + digit;
                  }
                }
              }
            """,
            "combine_script": "return state",
            "reduce_script": """
              // Keep the hash that belongs to the most recent login across all shards.
              def last_sso_ordered_hash = 0L;
              def timestamp_latest = 0L;
              for (s in states) {
                if (s.timestamp_latest > timestamp_latest) {
                  timestamp_latest = s.timestamp_latest;
                  last_sso_ordered_hash = s.last_sso_ordered_hash;
                }
              }
              return last_sso_ordered_hash;
            """
          }
        },
        "user_last_session": {
          "top_hits": {
            "from": 0,
            "size": 1,
            "sort": [
              {
                "login_timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        },
        "pagination": {
          "bucket_sort": {
            "sort": [
              {
                "my_top_hits_sso_ordered_hash.value": {
                  "order": "desc"
                }
              }
            ],
            "from": 0,
            "size": 100
          }
        }
      }
    }
  }
}

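To see how the scripted_metric and the bucket_sort in that query fit together, here is a plain-Python simulation (not an Elasticsearch API). It assumes the padded base-27 encoding with a 6-character prefix; the users, timestamps and names are hypothetical. Each shard keeps the hash of its newest login (init/map/combine), the reduce step keeps the hash of the newest login across shards, and the buckets are then sorted on that number in descending order and cut with from/size.

```python
from typing import Dict, List, Optional, Tuple

CHARSET = "abcdefghijklmnopqrstuvwxyz"


def ordered_hash(name: str, prefix_len: int = 6) -> int:
    """Padded base-27 encoding of the first prefix_len characters."""
    base = len(CHARSET) + 1
    h = 0
    for i in range(prefix_len):
        h = h * base + (CHARSET.find(name[i]) + 1 if i < len(name) else 0)
    return h


def map_shard(docs: List[Tuple[int, Optional[str]]]) -> Dict[str, int]:
    """init_script + map_script for one shard: remember the newest login's hash."""
    state = {"timestamp_latest": 0, "last_sso_ordered_hash": 0}
    for login_ts, sso_name in docs:
        if login_ts > state["timestamp_latest"]:
            state["timestamp_latest"] = login_ts
            state["last_sso_ordered_hash"] = ordered_hash(sso_name) if sso_name else 0
    return state  # combine_script: "return state"


def reduce_states(states: List[Dict[str, int]]) -> int:
    """reduce_script: keep the hash that belongs to the newest login across shards."""
    timestamp_latest, last_hash = 0, 0
    for s in states:
        if s["timestamp_latest"] > timestamp_latest:
            timestamp_latest, last_hash = s["timestamp_latest"], s["last_sso_ordered_hash"]
    return last_hash


def bucket_sort_desc(buckets: List[Dict], key: str, from_: int = 0, size: int = 100) -> List[Dict]:
    """bucket_sort on a numeric value, descending, then from/size truncation."""
    return sorted(buckets, key=lambda b: b[key], reverse=True)[from_:from_ + size]


# Hypothetical users: per-shard lists of (login_timestamp_ms, sso.name).
users = {
    "u1": [[(100, "bob"), (300, "carl")], [(200, "alice")]],
    "u2": [[(50, "dave")], [(400, "ann")]],
    "u3": [[(10, "zed")]],
}
buckets = [
    {"key": user, "my_top_hits_sso_ordered_hash": reduce_states([map_shard(s) for s in shards])}
    for user, shards in users.items()
]
print([b["key"] for b in bucket_sort_desc(buckets, "my_top_hits_sso_ordered_hash")])
# ['u3', 'u1', 'u2']  (latest names: zed, carl, ann)
```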

Is there a way of showing documents after a sum aggregation?

Lately I've been trying to retrieve sales information with the Query DSL in Kibana.
I've been asked to show vendor information PLUS their monthly sales.
(I'll use "kibana_sample_data_ecommerce" for this example.)
I already wrote this aggregation to group all clients by their 'customer_id':
#Aggregations (group by)
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "by user_id": {
      "terms": {
        "field": "customer_id"
      },
      "aggs": {
        "add_field_to_bucket": {
          "top_hits": {"size": 1, "_source": {"includes": ["customer_full_name"]}}
        }
      }
    }
  }
}
in which I've included customer_full_name in the result:
"aggregations" : {
  "by user_id" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 2970,
    "buckets" : [
      {
        "key" : "27",
        "doc_count" : 348,
        "add_field_to_bucket" : {
          "hits" : {
            "total" : 348,
            "max_score" : 1.0,
            "hits" : [
              {
                "_index" : "kibana_sample_data_ecommerce",
                "_type" : "_doc",
                "_id" : "fhwUR3sBpfDKGuVlpu8r",
                "_score" : 1.0,
                "_source" : {
                  "customer_full_name" : "Elyssa Underwood"
                }
              }
            ]
          }
        }
      }
    ]
  }
}
So, from this result I know that 'Elyssa Underwood', with customer_id '27', has 348 hits (related documents).
I also need to know the total spent by 'Elyssa' on those products, using the field 'products.taxful_price'.
The problem is that, as far as I know, I cannot add a sub-aggregation under top_hits. I also tried a sum aggregation, but I ended up in the same place: I got my sum, but I could not access the top_hits sub-aggregation from there.
In the end, I want a result like this:
"hits" : [
  {
    "_index" : "kibana_sample_data_ecommerce",
    "_type" : "_doc",
    "_id" : "fhwUR3sBpfDKGuVlpu8r",
    "_score" : 1.0,
    "_source" : {
      "customer_full_name" : "Elyssa Underwood",
      "total_spent": 1234.5678
    }
  }
]
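Is there something I can do to achieve this?
PS: I'm using Elasticsearch 5.x, and I also have access to the NEST client, in case there's a solution I can reach through it.
Thanks in advance.
You don't need to nest one aggregation under the other: put the sum and the top_hits side by side as sub-aggregations of the same terms aggregation, and each bucket will contain both. I used the following as sample data:
{ "customer_id": 2, "client-name": "b", "purchase": 2001 }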
GET index/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "customer_id",
        "size": 10
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "purchase"
          }
        },
        "documents": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
Result:
{
  "key" : 2,
  "doc_count" : 1,
  "documents" : {
    "hits" : {
      "total" : {
        "value" : 1,
        "relation" : "eq"
      },
      "max_score" : 1.0,
      "hits" : [
        {
          "_index" : "index1",
          "_type" : "_doc",
          "_id" : "0HPzcHsBjw4ziwrzGzrq",
          "_score" : 1.0,
          "_source" : {
            "customer_id" : 2,
            "client-name" : "b",
            "purchase" : 2001
          }
        }
      ]
    }
  },
  "total_sales" : {
    "value" : 2001.0
  }
}

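Elasticsearch returns the sum and the top hit as separate entries inside each bucket, so putting "total_spent" next to the name, as in the desired output of the question, has to be done when reading the response. A minimal Python sketch, assuming the aggregation names from the query above (NAME, total_sales, documents) and a hypothetical helper name; for the Kibana sample you would pass "by user_id", the name of your sum on products.taxful_price, "add_field_to_bucket" and "customer_full_name" instead.

```python
from typing import Any, Dict, List


def customers_with_totals(
    response: Dict[str, Any],
    terms_agg: str = "NAME",
    sum_agg: str = "total_sales",
    hits_agg: str = "documents",
    name_field: str = "client-name",
) -> List[Dict[str, Any]]:
    """Merge each bucket's sum with a _source field of its first top hit."""
    rows = []
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        hits = bucket[hits_agg]["hits"]["hits"]
        source = hits[0]["_source"] if hits else {}
        rows.append({
            "customer_id": bucket["key"],
            name_field: source.get(name_field),
            "total_spent": bucket[sum_agg]["value"],
        })
    return rows


# Trimmed-down version of the response shown above.
response = {
    "aggregations": {
        "NAME": {
            "buckets": [
                {
                    "key": 2,
                    "doc_count": 1,
                    "documents": {"hits": {"hits": [
                        {"_source": {"customer_id": 2, "client-name": "b", "purchase": 2001}}
                    ]}},
                    "total_sales": {"value": 2001.0},
                }
            ]
        }
    }
}
print(customers_with_totals(response))
# [{'customer_id': 2, 'client-name': 'b', 'total_spent': 2001.0}]
```

With NEST the same merge can be done on the typed response objects; the dictionary version only shows the shape of the data.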
Global sorting across different buckets after aggregation in Elasticsearch

A sample document looks like this:
{"rackName" : "rack005", "roomName" : "roomB", "power" : 132, "timestamp" : 1594540106208}
What I want to do is get the latest data for each rack in a given room and then sort the racks by power.
With the code below I got close to my target, but I'm losing my mind over the last step, which is sorting the data across the different buckets by the 'power' field.
GET /power/_search
{
  "query": {
    "term": {
      "roomName.keyword": {
        "value": "roomB"
      }
    }
  },
  "aggs": {
    "rk_ag": {
      "terms": {
        "field": "rackName"
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
Result:
"aggregations" : {
  "rk_ag" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "rack003",
        "doc_count" : 4,
        "latest" : {
          "hits" : {
            "total" : {
              "value" : 4,
              "relation" : "eq"
            },
            "max_score" : null,
            "hits" : [
              {
                "_index" : "power",
                "_type" : "_doc",
                "_id" : "0FXVQnMB8DPB7H9t6U0E",
                "_score" : null,
                "_source" : {
                  "rackName" : "rack003",
                  "roomName" : "roomB",
                  "power" : 115,
                  "timestamp" : 1594540117492
                },
                "sort" : [ ... ]
              }
            ]
          }
        }
      },
      {
        "key" : "rack004",
        "doc_count" : 4,
        "latest" : {
          "hits" : {
            "total" : {
              "value" : 4,
              "relation" : "eq"
            },
            "max_score" : null,
            "hits" : [
              {
                "_index" : "power",
                "_type" : "_doc",
                "_id" : "1FXVQnMB8DPB7H9t6U0E",
                "_score" : null,
                "_source" : {
                  "rackName" : "rack004",
                  "roomName" : "roomB",
                  "power" : 108,
                  "timestamp" : 1594540117492
                },
                "sort" : [ ... ]
              }
            ]
          }
        }
      },
      {
        "key" : "rack005",
        "doc_count" : 4,
        "latest" : {
          "hits" : {
            "total" : {
              "value" : 4,
              "relation" : "eq"
            },
            "max_score" : null,
            "hits" : [
              {
                "_index" : "power",
                "_type" : "_doc",
                "_id" : "2FXVQnMB8DPB7H9t6U0E",
                "_score" : null,
                "_source" : {
                  "rackName" : "rack005",
                  "roomName" : "roomB",
                  "power" : 118,
                  "timestamp" : 1594540114492
                },
                "sort" : [ ... ]
              }
            ]
          }
        }
      }
    ]
  }
}
You're sorting by timestamp instead of power. Try this instead:
GET /power/_search
{
  "query": {
    "term": {
      "roomName.keyword": {
        "value": "roomB"
      }
    }
  },
  "aggs": {
    "rk_ag": {
      "terms": {
        "field": "rackName"
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "sort": [
              {
                "power": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
You can sort by multiple fields too.
Adding to @Joe's answer: as he mentioned, you can use multiple fields in the sort.
The query below would give you what you are looking for:
POST my_rack_index/_search
{
  "size": 0,
  "query": {
    "term": {
      "roomName.keyword": {
        "value": "roomB"
      }
    }
  },
  "aggs": {
    "rk_ag": {
      "terms": {
        "field": "rackName"
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "sort": [                <---- Note this part
              {
                "timestamp": {
                  "order": "desc"
                }
              },
              {
                "power": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
So now, if a rack has two documents with exactly the same timestamp, the one with the higher power shows up in the response.
The sort works like this: documents are first sorted by timestamp, and power is only used to break ties between documents that have the same timestamp.
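A top_hits sort only orders the documents inside each rack bucket; it does not reorder the buckets themselves. The plain-Python sketch below (a simulation, not an Elasticsearch API) shows both steps: picking each rack's latest reading with the timestamp-then-power tie-break described above, and then the last step from the question, ordering the racks by the power of that latest reading, done client-side on the response. The readings come from the question, except the rack004 reading with power 100, which is hypothetical and only there to show the tie-break.

```python
from typing import Dict, List

readings = [
    {"rackName": "rack003", "power": 115, "timestamp": 1594540117492},
    {"rackName": "rack004", "power": 108, "timestamp": 1594540117492},
    {"rackName": "rack004", "power": 100, "timestamp": 1594540117492},  # hypothetical tie
    {"rackName": "rack005", "power": 118, "timestamp": 1594540114492},
    {"rackName": "rack005", "power": 132, "timestamp": 1594540106208},  # older reading
]


def latest_per_rack(docs: List[Dict]) -> Dict[str, Dict]:
    """Like terms(rackName) + top_hits(size 1, sort timestamp desc, then power desc)."""
    best: Dict[str, Dict] = {}
    for doc in docs:
        current = best.get(doc["rackName"])
        # Tuples compare element by element: power only matters when timestamps tie.
        if current is None or (doc["timestamp"], doc["power"]) > (current["timestamp"], current["power"]):
            best[doc["rackName"]] = doc
    return best


def racks_by_power(latest: Dict[str, Dict]) -> List[str]:
    """The global step: order racks by the power of their latest reading."""
    return [d["rackName"] for d in sorted(latest.values(), key=lambda d: d["power"], reverse=True)]


latest = latest_per_rack(readings)
print(latest["rack004"]["power"])  # 108 (same timestamp, higher power wins)
print(racks_by_power(latest))      # ['rack005', 'rack003', 'rack004']
```

Note that sorting the top_hits by power alone would return the 132 reading for rack005, which is not its latest one.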

How to get last and first document ids by given criteria

I have some documents indexed in Elasticsearch that look like these samples:
How can I get, in one query, the last migrated id and the first non-migrated id?
Any ideas?
A filter aggregation combined with a top_hits aggregation can be used to get the last migrated and the first non-migrated document:
{
  "size": 0,
  "aggs": {
    "migrated": {
      "filter": {                          --> filter where isMigrated: true
        "term": {
          "isMigrated": true
        }
      },
      "aggs": {
        "last_migrated": {                 --> get the first document sorted by id in descending order
          "top_hits": {
            "size": 1,
            "sort": [{"id.keyword": "desc"}]
          }
        }
      }
    },
    "not_migrated": {
      "filter": {
        "term": {
          "isMigrated": false
        }
      },
      "aggs": {
        "first_not_migrated": {
          "top_hits": {
            "size": 1,
            "sort": [{"id.keyword": "asc"}]   --> any keyword field can be used to sort
          }
        }
      }
    }
  }
}
Result:
"aggregations" : {
  "not_migrated" : {
    "doc_count" : 2,
    "first_not_migrated" : {
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "index86",
            "_type" : "_doc",
            "_id" : "TxuKUHIB8mx5yKbJ_rGH",
            "_score" : null,
            "_source" : {
              "id" : "3",
              "isMigrated" : false
            },
            "sort" : [ ... ]
          }
        ]
      }
    }
  },
  "migrated" : {
    "doc_count" : 2,
    "last_migrated" : {
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "index86",
            "_type" : "_doc",
            "_id" : "ThuKUHIB8mx5yKbJ87HF",
            "_score" : null,
            "_source" : {
              "id" : "2",
              "isMigrated" : true
            },
            "sort" : [ ... ]
          }
        ]
      }
    }
  }
}
Alternatively, you can store a timestamp with each document and query for the latest timestamp combined with the isMigrated: true condition.
As per the comment, a bool query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html) can be used to combine multiple boolean conditions.
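Both approaches can be sketched in Python. The first two functions simulate the filter + top_hits aggregations on plain dictionaries (they are not an Elasticsearch API); ids "2" and "3" come from the output above, while ids "1" and "4" are hypothetical, added so that each filter matches 2 documents as in the doc_counts. One thing worth knowing: id.keyword is sorted as a string, so "10" sorts before "2". The third function builds a request body for the timestamp-based alternative; the field name "migrated_at" is a hypothetical placeholder for whatever timestamp field you store, and the term goes in a bool filter clause because it does not need to contribute to scoring.

```python
import json
from typing import Any, Dict, List, Optional

docs = [
    {"id": "1", "isMigrated": True},   # hypothetical
    {"id": "2", "isMigrated": True},
    {"id": "3", "isMigrated": False},
    {"id": "4", "isMigrated": False},  # hypothetical
]


def last_migrated(docs: List[Dict[str, Any]]) -> Optional[str]:
    """filter isMigrated=true, then top_hits size 1 sorted on id.keyword desc."""
    ids = [d["id"] for d in docs if d["isMigrated"] is True]
    return max(ids) if ids else None


def first_not_migrated(docs: List[Dict[str, Any]]) -> Optional[str]:
    """filter isMigrated=false, then top_hits size 1 sorted on id.keyword asc."""
    ids = [d["id"] for d in docs if d["isMigrated"] is False]
    return min(ids) if ids else None


def last_migrated_query(timestamp_field: str = "migrated_at") -> Dict[str, Any]:
    """Timestamp-based alternative: newest migrated document via bool filter + sort."""
    return {
        "size": 1,
        "query": {"bool": {"filter": [{"term": {"isMigrated": True}}]}},
        "sort": [{timestamp_field: {"order": "desc"}}],
    }


print(last_migrated(docs), first_not_migrated(docs))              # 2 3
# Keyword ids compare as strings, so "10" sorts before "2":
print(last_migrated(docs + [{"id": "10", "isMigrated": True}]))  # 2
print(json.dumps(last_migrated_query(), indent=2))
```

If the ids are really numbers, mapping them as a numeric field (or sorting on a numeric copy) avoids the string ordering.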

Elasticsearch, terms aggs according to sibling nested fields

Elasticsearch v7.5
Hello and good day!
We have 2 indices, named socialmedia and influencers.
Sample contents:
socialmedia:
{
  "_id" : 1001,
  "title" : "Title 1",
  "smp_id" : 1,
  "latest" : [ { "soc_mm_score" : "5" } ]
}
{
  "_id" : 1002,
  "title" : "Title 2",
  "smp_id" : 2,
  "latest" : [ { "soc_mm_score" : "10" } ]
}
{
  "_id" : 1003,
  "title" : "Title 3",
  "smp_id" : 3,
  "latest" : [ { "soc_mm_score" : "35" } ]
}
{
  "_id" : 1004,
  "title" : "Title 4",
  "smp_id" : 2,
  "latest" : [ { "soc_mm_score" : "30" } ]
}
//omitted some other fields
influencers:
{
  "_id" : 1,
  "name" : "John",
  "smp_id" : 1
}
{
  "_id" : 2,
  "name" : "Peter",
  "smp_id" : 2
}
{
  "_id" : 3,
  "name" : "Mark",
  "smp_id" : 3
}
Now I have this simple query that finds which documents in the socialmedia index have the highest latest.soc_mm_score values, and also displays their corresponding influencers, identified by smp_id:
GET socialmedia/_search
{
  "size": 0,
  "_source": "latest",
  "query": {
    "match_all": {}
  },
  "aggs": {
    "LATEST": {
      "nested": {
        "path": "latest"
      },
      "aggs": {
        "MM_SCORE": {
          "terms": {
            "field": "latest.soc_mm_score",
            "order": {
              "_key": "desc"
            },
            "size": 3
          },
          "aggs": {
            "...": {
              "reverse_nested": {},
              "aggs": {
                "SMP_ID": {
                  "top_hits": {
                    "_source": ["smp_id"],
                    "size": 1
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Result:
"aggregations" : {
  "LATEST" : {
    "doc_count" : //omitted,
    "MM_SCORE" : {
      "doc_count_error_upper_bound" : //omitted,
      "sum_other_doc_count" : //omitted,
      "buckets" : [
        {
          "key" : 35,
          "doc_count" : 1,
          "..." : {
            "doc_count" : 1,
            "SMP_ID" : {
              "hits" : {
                "total" : {
                  "value" : 1,
                  "relation" : "eq"
                },
                "max_score" : 1.0,
                "hits" : [
                  {
                    "_index" : "socialmedia",
                    "_type" : "index",
                    "_id" : "1003",
                    "_score" : 1.0,
                    "_source" : {
                      "smp_id" : "3"
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "key" : 30,
          "doc_count" : 1,
          "..." : {
            "doc_count" : 1,
            "SMP_ID" : {
              "hits" : {
                "total" : {
                  "value" : 1,
                  "relation" : "eq"
                },
                "max_score" : 1.0,
                "hits" : [
                  {
                    "_index" : "socialmedia",
                    "_type" : "index",
                    "_id" : "1004",
                    "_score" : 1.0,
                    "_source" : {
                      "smp_id" : "2"
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "key" : 10,
          "doc_count" : 1,
          "..." : {
            "doc_count" : 1,
            "SMP_ID" : {
              "hits" : {
                "total" : {
                  "value" : 1,
                  "relation" : "eq"
                },
                "max_score" : 1.0,
                "hits" : [
                  {
                    "_index" : "socialmedia",
                    "_type" : "index",
                    "_id" : "1002",
                    "_score" : 1.0,
                    "_source" : {
                      "smp_id" : "2"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
With the query above, I was able to successfully display which documents have the highest latest.soc_mm_score values.
The sample output above only displays DOCUMENTS, implying that the influencers (a.k.a. smp_id) related to them are the TOP INFLUENCERS according to latest.soc_mm_score.
Ideally, just using this aggs query:
"terms" : {
  "field" : "smp_id"
}
conveys which influencers are the top ones according to doc_count.
Meanwhile, running the terms aggregation on latest.soc_mm_score displays the TOP DOCUMENTS:
"terms" : {
  "field" : "latest.soc_mm_score"
}
I want to display the TOP INFLUENCERS according to latest.soc_mm_score in the socialmedia index. Since Elasticsearch can count all the documents per unique smp_id, is there a way for ES to sum all latest.soc_mm_score values per smp_id and use that sum to rank the terms?
With the sample data above, my objective should output:
smp_id 2 as the Top Influencer, because he has 2 posts (with soc_mm_score of 30 and 10); adding them gives him a soc_mm_score of 40
smp_id 3 as the 2nd Top Influencer: he has 1 post with a soc_mm_score of 35
smp_id 1 as the 3rd Top Influencer: he has 1 post with a soc_mm_score of 5
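The expected ranking can be checked with a plain-Python sketch over the four sample documents, alongside a request body that expresses the idea in Elasticsearch: a terms aggregation on smp_id whose buckets are ordered by a sum computed inside a nested aggregation, using the ">" path separator to step from the single-bucket nested aggregation to the sum. The aggregation names LATEST_NESTED and TOTAL_SCORE are hypothetical, and the smp_id.keyword field is taken from the answer below; treat the request as a sketch to verify against your mapping, not a tested query.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# The four socialmedia sample documents from the question (other fields omitted).
posts = [
    {"_id": 1001, "smp_id": 1, "latest": [{"soc_mm_score": "5"}]},
    {"_id": 1002, "smp_id": 2, "latest": [{"soc_mm_score": "10"}]},
    {"_id": 1003, "smp_id": 3, "latest": [{"soc_mm_score": "35"}]},
    {"_id": 1004, "smp_id": 2, "latest": [{"soc_mm_score": "30"}]},
]


def top_influencers(posts: List[Dict[str, Any]]) -> List[Tuple[int, float]]:
    """Sum every latest.soc_mm_score per smp_id and order by that sum, descending."""
    totals: Dict[int, float] = defaultdict(float)
    for post in posts:
        for entry in post["latest"]:
            totals[post["smp_id"]] += float(entry["soc_mm_score"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def influencer_query(size: int = 3) -> Dict[str, Any]:
    """Hypothetical request: terms on smp_id ordered by a sum inside a nested agg."""
    return {
        "size": 0,
        "aggs": {
            "INFS": {
                "terms": {
                    "field": "smp_id.keyword",
                    "size": size,
                    # ">" steps from the single-bucket nested agg to the sum inside it.
                    "order": {"LATEST_NESTED>TOTAL_SCORE": "desc"},
                },
                "aggs": {
                    "LATEST_NESTED": {
                        "nested": {"path": "latest"},
                        "aggs": {"TOTAL_SCORE": {"sum": {"field": "latest.soc_mm_score"}}},
                    }
                },
            }
        },
    }


print(top_influencers(posts))  # [(2, 40.0), (3, 35.0), (1, 5.0)]
```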
Is there a proper query to meet this objective?
"aggs": {
  "INFS": {
    "terms": {
      "field": "smp_id.keyword",
      "order": {
        ...
      }
    },
    "aggs": {
      ... : {
        "nested": {
          "path": "latest"
        },
        "aggs": {
          ... : {
            "sum" : {
              "field": "latest.soc_mm_score"
            }
          }
        }
      }
    }
  }
}
Displays the following sample:

Elasticsearch: get last n distinct records

I am trying to implement a search query over records stored in elasticsearch.
The record structure looks something like this:
{
  "_index" : "box_info_store",
  "_type" : "boxes",
  "_id" : "pWjQLWkBIJk0ORjd0X2P",
  "_score" : null,
  "_source" : {
    "transactionID" : "60ab66cf24c9924f562bf1a2b5d92305d0a6",
    "boxNumber" : "Box3",
    "createDate" : "2013-09-17T00:00:00",
    "itemNumber" : "Item1",
    "address" : "Sample Address"
  }
}
One box can contain multiple items. For example, Box3 can have Item1, Item2 and Item3, so in Elasticsearch I will have 3 different documents. At the same time, the same box and the same item can also exist with a different address. The transactionID may or may not be the same for these documents.
My requirement is to fetch the last n recent, distinct transactionIDs, along with their records.
I tried the following query to fetch the last 7 distinct transactionIDs:
GET /box_info_store/boxes/_search?size=7
{
  "query": {
    "bool": {
      "must": [
        { "match": { "boxNumber": "Box3" } },
        { "match": { "itemNumber": "Item1" } }
      ]
    }
  },
  "sort": [
    {
      "createDate": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "distinct_transactions": {
      "terms": { "field": "transactionID" }
    }
  }
}
This fetched the last 7 documents where boxNumber is Box3 and itemNumber is Item1, but not 7 distinct transactionIDs: two of these seven documents have the same transactionID (each with a different address, though).
My requirement is to get 7 distinct transactionIDs, no matter how many documents are returned.
I hope I explained myself clearly.
I'd appreciate any help here.
------ Edit: @gaurav9620, I ran the first query and got a count of 32; then I ran the second query with a distinct count of 3 and got the following result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 32,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "box_info_store",
        "_type" : "boxes",
        "_id" : "RWjRLWkBIJk0ORjdEX-L",
        "_score" : null,
        "_source" : {
          "transactionID" : "3087e106244f6247a5290fb21ce64254529c",
          "boxNumber" : "Box3",
          "createDate" : "2017-11-15T00:00:00",
          "itemNumber" : "Item1",
          "address" : "sampleAddress12"
        },
        "sort" : [ ... ]
      },
      {
        "_index" : "box_info_store",
        "_type" : "boxes",
        "_id" : "MGjQLWkBIJk0ORjdwX0M",
        "_score" : null,
        "_source" : {
          "transactionID" : "60ab66cf24c9924f562bf1a2b5d92305d0a6",
          "boxNumber" : "Box3",
          "createDate" : "2016-04-03T00:00:00",
          "itemNumber" : "Item1",
          "address" : "sampleAddress321"
        },
        "sort" : [ ... ]
      },
      {
        "_index" : "box_info_store",
        "_type" : "boxes",
        "_id" : "AGjRLWkBIJk0ORjdK4CJ",
        "_score" : null,
        "_source" : {
          "transactionID" : "3087e106244f6247a5290fb21ce64254529c",
          "boxNumber" : "Box3",
          "createDate" : "1996-02-16T00:00:00",
          "itemNumber" : "Item1",
          "address" : "sampleAddress4324"
        },
        "sort" : [ ... ]
      }
    ]
  },
  "aggregations" : {
    "unique_transactions" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 16,
      "buckets" : [
        {
          "key" : "3087e106244f6247a5290fb21ce64254529c",
          "doc_count" : 6
        },
        {
          "key" : "27c5f3422f4482495d29e7b2c15c0e311743",
          "doc_count" : 5
        },
        {
          "key" : "c40e53212e74e24bf02a5bd2b134cf92bffb",
          "doc_count" : 5
        }
      ]
    }
  }
}
The size you used represents the number of raw documents that are retrieved.
In your case, what you need to do is:
Set size to 0, which returns no raw documents.
Include a size parameter in the terms aggregation, which returns 7 unique ids.
GET /box_info_store/boxes/_search?size=0
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "boxNumber": "Box3"
          }
        },
        {
          "match": {
            "itemNumber": "Item1"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "createDate": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "distinct_transactions": {
      "terms": {
        "field": "transactionID",
        "size": 7
      }
    }
  }
}
First, run this query:
GET /box_info_store/boxes/_search?size=0
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "boxNumber": "Box3"
          }
        },
        {
          "match": {
            "itemNumber": "Item1"
          }
        }
      ]
    }
  }
}
This returns the total number of documents matching your query; set n to that number.
After that, run your query as below:
GET /box_info_store/boxes/_search?size=n
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "boxNumber": "Box3"
          }
        },
        {
          "match": {
            "itemNumber": "Item1"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "createDate": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "distinct_transactions": {
      "terms": {
        "field": "transactionID",
        ...
      }
    }
  }
}

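By default a terms aggregation orders its buckets by doc_count, so asking it for 7 buckets returns the 7 most frequent transactionIDs, not necessarily the 7 most recent ones. One option, not taken from the answers above, is to order the terms buckets by a max sub-aggregation on createDate and attach a top_hits for the records. The Python sketch below simulates that ordering on the three hits from the edited question and builds the corresponding request body; the sub-aggregation names latest_create_date and records, and the top_hits size of 10, are hypothetical choices. As with any terms ordering on a sharded index, the result can be approximate when shard_size is small.

```python
from typing import Any, Dict, List

# The three hits shown in the edited question (other fields omitted).
records = [
    {"transactionID": "3087e106244f6247a5290fb21ce64254529c", "createDate": "2017-11-15T00:00:00"},
    {"transactionID": "60ab66cf24c9924f562bf1a2b5d92305d0a6", "createDate": "2016-04-03T00:00:00"},
    {"transactionID": "3087e106244f6247a5290fb21ce64254529c", "createDate": "1996-02-16T00:00:00"},
]


def latest_distinct_transactions(docs: List[Dict[str, Any]], n: int) -> List[str]:
    """n distinct transactionIDs, ordered by their most recent createDate."""
    latest: Dict[str, str] = {}
    for doc in docs:
        tid, created = doc["transactionID"], doc["createDate"]
        # ISO-8601 strings in one fixed format compare chronologically.
        if tid not in latest or created > latest[tid]:
            latest[tid] = created
    return sorted(latest, key=lambda tid: latest[tid], reverse=True)[:n]


def distinct_transactions_query(n: int) -> Dict[str, Any]:
    """Hypothetical request: terms ordered by a max(createDate) sub-aggregation."""
    return {
        "size": 0,
        "query": {"bool": {"must": [
            {"match": {"boxNumber": "Box3"}},
            {"match": {"itemNumber": "Item1"}},
        ]}},
        "aggs": {
            "distinct_transactions": {
                "terms": {"field": "transactionID", "size": n,
                          "order": {"latest_create_date": "desc"}},
                "aggs": {
                    "latest_create_date": {"max": {"field": "createDate"}},
                    # Records of each transaction, newest first.
                    "records": {"top_hits": {"size": 10, "sort": [{"createDate": {"order": "desc"}}]}},
                },
            }
        },
    }


print(latest_distinct_transactions(records, 7))
# ['3087e106244f6247a5290fb21ce64254529c', '60ab66cf24c9924f562bf1a2b5d92305d0a6']
```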