Search documents with highest fields - elasticsearch

I'm trying to get all the documents with highest field value (+ conditional term filter)
Given the Employees mapping
Name Department Salary
----------------------------
Tomcat Dev 100
Bobcat QA 90
Beast QA 100
Tom Dev 100
Bob Dev 90
In SQL it would look like
select * from Employees where Salary = select max(salary) from Employees
expected output
Name Department Salary
----------------------------
Tomcat Dev 100
Beast QA 100
Tom Dev 100
and
select * from Employees where Salary = (select max(salary) from Employees where Department ='Dev' )
expected output
Name Department Salary
----------------------------
Tomcat Dev 100
Tom Dev 100
Is it possible with Elasticsearch ?

The below should help:
Looking at your data, note that I've come up with the below mapping:
Mapping:
PUT my-salary-index
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"department":{
"type": "keyword"
},
"salary":{
"type": "float"
}
}
}
}
Sample Documents:
POST my-salary-index/_doc/1
{
"name": "Tomcat",
"department": "Dev",
"salary": 100
}
POST my-salary-index/_doc/2
{
"name": "Bobcast",
"department": "QA",
"salary": 90
}
POST my-salary-index/_doc/3
{
"name": "Beast",
"department": "QA",
"salary": 100
}
POST my-salary-index/_doc/4
{
"name": "Tom",
"department": "Dev",
"salary": 100
}
POST my-salary-index/_doc/5
{
"name": "Bob",
"department": "Dev",
"salary": 90
}
Solutions:
Scenario 1: Return all employees with max salary
POST my-salary-index/_search
{
"size": 0,
"aggs": {
"my_employees_salary":{
"terms": {
"field": "salary",
"size": 1, <--- Note this
"order": {
"_key": "desc"
}
},
"aggs": {
"my_employees": {
"top_hits": { <--- Note this. Top hits aggregation
"size": 10
}
}
}
}
}
}
Note that I've made use of Terms Aggregation with Top Hits aggregation chained to it. I'd suggest to go through the links to understand both the aggregations.
So basically you just need to retrieve the first element in the Terms Aggregation that is why I've mentioned the size: 1. Also note the order, just in case if you requirement to retrieve the lowest.
Scenario 1 Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_employees" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 2,
"buckets" : [
{
"key" : 100.0,
"doc_count" : 3,
"employees" : {
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my-salary-index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "Tomcat",
"department" : "Dev",
"salary" : 100
}
},
{
"_index" : "my-salary-index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "Beast",
"department" : "QA",
"salary" : 100
}
},
{
"_index" : "my-salary-index",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "Tom",
"department" : "Dev",
"salary" : 100
}
}
]
}
}
}
]
}
}
}
Scenario 2: Return all employee with max salary from particular department
POST my-salary-index/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"department": "Dev"
}
}
]
}
},
"aggs": {
"my_employees_salary":{
"terms": {
"field": "salary",
"size": 1,
"order": {
"_key": "desc"
}
},
"aggs": {
"my_employees": {
"top_hits": {
"size": 10
}
}
}
}
}
}
For this, there are many ways to do this, but the idea is that you basically filter the documents before you apply aggregation on top of it. That way it would be more efficient.
Note that I'v just added a bool condition to the aggregation query mentioned in solution for Scenario 1.
Scenario 2 Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_employees_salary" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 1,
"buckets" : [
{
"key" : 100.0,
"doc_count" : 2,
"my_employees" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.53899646,
"hits" : [
{
"_index" : "my-salary-index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.53899646,
"_source" : {
"name" : "Tomcat",
"department" : "Dev",
"salary" : 100
}
},
{
"_index" : "my-salary-index",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.53899646,
"_source" : {
"name" : "Tom",
"department" : "Dev",
"salary" : 100
}
}
]
}
}
}
]
}
}
}
You can also think of making use of SQL Access if you have complete xpack or rather licensed version of x-pack.
Hope this helps.

Related

is there a way of showing documents after a sum aggregation?

I've been trying lately to retrieve information about sales on Kibana DSL.
I've been told to show vendors information PLUS their monthly sales.
(I'll use the "Kibana_sample_data_ecommerce" for this example)
I already did this aggregation in order to group all clients by their 'customer_id':
#Aggregations (group by)
GET kibana_sample_data_ecommerce/_search
{
"size": 0,
"aggs": {
"by user_id": {
"terms": {
"field": "customer_id"
},
"aggs": {
"add_field_to_bucket": {
"top_hits": {"size": 1, "_source": {"includes": ["customer_full_name"]}}
}
}
}
}
}
in which i've included customer_full_name in the result:
"aggregations" : {
"by user_id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 2970,
"buckets" : [
{
"key" : "27",
"doc_count" : 348,
"add_field_to_bucket" : {
"hits" : {
"total" : 348,
"max_score" : 1.0,
"hits" : [
{
"_index" : "kibana_sample_data_ecommerce",
"_type" : "_doc",
"_id" : "fhwUR3sBpfDKGuVlpu8r",
"_score" : 1.0,
"_source" : {
"customer_full_name" : "Elyssa Underwood"
}
}
]
}
}
}
So, in this result i know that 'Elyssa Underwood' with 'customerid' '27' has 348 hits (or documents related).
Also i recquire to know the total spent by 'Elyssa' on those products, using the field 'products.taxful_price'.
The thing is that i cannot perform a subaggregation on top_hits (as far as i know); Also I've tried to do a sum_aggregation, but it ends on the same result (i got my sum, but i cannot access top_hits sub aggregation at that point).
At the end of the day i want to have a result like this:
"hits" : [
{
"_index" : "kibana_sample_data_ecommerce",
"_type" : "_doc",
"_id" : "fhwUR3sBpfDKGuVlpu8r",
"_score" : 1.0,
"_source" : {
"customer_full_name" : "Elyssa Underwood",
"total_spent": 1234.5678
}
}
]
Is there something I can do to achieve it?.
PS: I'm using ElasticSearch 5.x and also I have access to NEST client, if there's a solution I can reach through it.
Thanks In Advance.
I have used below as sample data.
Data:
{
"customer_id":2,
"client-name":"b",
"purchase": 2001
}
Query:
GET index/_search
{
"size": 0,
"aggs": {
"NAME": {
"terms": {
"field": "customer_id",
"size": 10
},
"aggs": {
"total_sales": {
"sum": {
"field": "purchase"
}
},
"documents":{
"top_hits": {
"size": 10
}
}
}
}
}
}
Result:
{
"key" : 2,
"doc_count" : 1,
"documents" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index1",
"_type" : "_doc",
"_id" : "0HPzcHsBjw4ziwrzGzrq",
"_score" : 1.0,
"_source" : {
"customer_id" : 2,
"client-name" : "b",
"purchase" : 2001
}
}
]
}
},
"total_sales" : {
"value" : 2001.0
}
}

Compare last two elasticsearch inputs and show smaller

I have nested query where I want to compare last two inputs and display smaller one.
For example:
"price_history":[
{"id":0,
"price":16.99,
"date":"2021-02-07"
},
"id":1,
"price":20.99,
"date":"2021-02-08"
},
{"id":2,
"price":16.99,
"date":"2021-02-09"
}
]
So I want only id 1 and 2 to be compared and only id 2 to be shown.
I am looking for help to build such a query and am open to any other data model suggestions.
I understand you want to compare the last 2 prices and display the lowest. That information can be calculated on index time so it should be done there instead of re-calculating on each query.
I will show you how to do it on index time assuming you have no control on the software is ingesting the data using a Pipeline that will process your data before putting it in elasticsearch.
We are going to create a new field called best_price that store this price so you can then make queries against this field instead of calculate it on each query.
Ingesting data
POST test_uzer/_doc
{
"price_history": [
{
"id": 0,
"price": 16.99,
"date": "2021-02-07"
},
{
"id": 1,
"price": 20.99,
"date": "2021-02-08"
},
{
"id": 2,
"price": 16.99,
"date": "2021-02-09"
}
]
}
POST test_uzer/_doc
{
"price_history": [
{
"id": 0,
"price": 1.99,
"date": "2021-02-07"
},
{
"id": 1,
"price": 15.99,
"date": "2021-02-08"
},
{
"id": 2,
"price": 16.99,
"date": "2021-02-09"
}
]
}
Creating the ingest pipeline
PUT _ingest/pipeline/best_price
{
"description": "return the best price between the 2 last",
"processors": [
{
"script": {
"lang": "painless",
"source": "def prices = ctx.price_history; def length = prices.length; ctx.best_price = prices[length - 1].price > prices[length - 2].price ? prices[length - 2].price : prices[length - 1].price"
}
}
]
}
Reindexing the data to have the new field
POST _reindex
{
"source": {
"index": "test_uzer"
},
"dest": {
"index": "test_uzer_new",
"pipeline": "best_price"
}
}
Add the ingest pipeline as default to apply to all the new documents
PUT test_uzer_new/_settings
{
"index": {
"default_pipeline": "best_price"
}
}
Ingest document to test
POST test_uzer_new/_doc
{
"price_history": [
{
"id": 0,
"price": 2,
"date": "2021-02-07"
},
{
"id": 1,
"price": 3,
"date": "2021-02-08"
},
{
"id": 2,
"price": 1,
"date": "2021-02-09"
}
]
}
best_price should be 1
POST test_uzer_new/_search
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "test_uzer_new",
"_type" : "_doc",
"_id" : "a8oz-ncBRP0FeAG5geN1",
"_score" : 1.0,
"_source" : {
"best_price" : 16.99,
"price_history" : [
{
"date" : "2021-02-07",
"price" : 16.99,
"id" : 0
},
{
"date" : "2021-02-08",
"price" : 20.99,
"id" : 1
},
{
"date" : "2021-02-09",
"price" : 16.99,
"id" : 2
}
]
}
},
{
"_index" : "test_uzer_new",
"_type" : "_doc",
"_id" : "bMpE-ncBRP0FeAG54ONR",
"_score" : 1.0,
"_source" : {
"best_price" : 15.99,
"price_history" : [
{
"date" : "2021-02-07",
"price" : 1.99,
"id" : 0
},
{
"date" : "2021-02-08",
"price" : 15.99,
"id" : 1
},
{
"date" : "2021-02-09",
"price" : 16.99,
"id" : 2
}
]
}
},
{
"_index" : "test_uzer_new",
"_type" : "_doc",
"_id" : "bspJ-ncBRP0FeAG59uOi",
"_score" : 1.0,
"_source" : {
"best_price" : 1,
"price_history" : [
{
"date" : "2021-02-07",
"price" : 2,
"id" : 0
},
{
"date" : "2021-02-08",
"price" : 3,
"id" : 1
},
{
"date" : "2021-02-09",
"price" : 1,
"id" : 2
}
]
}
}
]
}
}
Works!
Of course there are many ways to achieve what you want, but the point is index and calculate as much data as you can before querying. This will make your searches faster.

How to get last and first document ids by given criteria

I have some documents indexed on Elasticsearch, looking like these samples:
{"id":"1","isMigrated":true}
{"id":"2","isMigrated":true}
{"id":"3","isMigrated":false}
{"id":"4","isMigrated":false}
how can i get in one query the last migrated id and first not migrated id?
Any ideas?
Filter aggregation and top_hits aggregation can be used to get last migrated and first not migrated
{
"size": 0,
"aggs": {
"migrated": {
"filter": { --> filter where isMigrated:true
"term": {
"isMigrated": true
}
},
"aggs": {
"last_migrated": { --> get first documents sorted on id in descending order
"top_hits": {
"size": 1,
"sort": [{"id.keyword":"desc"}]
}
}
}
},
"not_migrated": {
"filter": {
"term": {
"isMigrated": false
}
},
"aggs": {
"first_not_migrated": {
"top_hits": {
"size": 1,
"sort": [{"id.keyword":"asc"}] -->any keyword field can be used to sort
}
}
}
}
}
}
Result:
"aggregations" : {
"not_migrated" : {
"doc_count" : 2,
"first_not_migrated" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "index86",
"_type" : "_doc",
"_id" : "TxuKUHIB8mx5yKbJ_rGH",
"_score" : null,
"_source" : {
"id" : "3",
"isMigrated" : false
},
"sort" : [
"3"
]
}
]
}
}
},
"migrated" : {
"doc_count" : 2,
"last_migrated" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "index86",
"_type" : "_doc",
"_id" : "ThuKUHIB8mx5yKbJ87HF",
"_score" : null,
"_source" : {
"id" : "2",
"isMigrated" : true
},
"sort" : [
"2"
]
}
]
}
}
}
}
You can store the timestamp information with each document and query based on the latest timestamp and isMigrated: true condition.
As per comment, https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html can you used to combine multiple boolean conditions.

How can i extend an elastic search date range histogram aggregation query?

Hi I have an elastic search index named mep-report.
Each document has a status field. The possible values for status fields are "ENROUTE", "SUBMITTED", "DELIVERED", "FAILED" . Below is the sample elastic search index with 6 documents.
{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 13,
"successful" : 13,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1094313,
"max_score" : 1.0,
"hits" : [
{
"_index" : "mep-reports-2019.09.11",
"_type" : "doc",
"_id" : "68e8e03f-baf8-4bfc-a920-58e26edf835c-353899837500",
"_score" : 1.0,
"_source" : {
"status" : "ENROUTE",
"#timestamp" : "2019-09-11T10:21:26.000Z"
},
{
"_index" : "mep-reports-2019.09.11",
"_type" : "doc",
"_id" : "68e8e03f-baf8-4bfc-a920-58e26edf835c-353899837501",
"_score" : 1.0,
"_source" : {
"status" : "ENROUTE",
"#timestamp" : "2019-09-11T10:21:26.000Z"
},
{
"_index" : "mep-reports-2019.09.11",
"_type" : "doc",
"_id" : "68e8e03f-baf8-4bfc-a920-58e26edf835c-353899837502",
"_score" : 1.0,
"_source" : {
"status" : "SUBMITTED",
"#timestamp" : "2019-09-11T10:21:26.000Z"
}
},
{
"_index" : "mep-reports-2019.09.11",
"_type" : "doc",
"_id" : "68e8e03f-baf8-4bfc-a920-58e26edf835c-353899837503",
"_score" : 1.0,
"_source" : {
"status" : "DELIVERED",
"#timestamp" : "2019-09-11T10:21:26.000Z"
}
},
{
"_index" : "mep-reports-2019.09.11",
"_type" : "doc",
"_id" : "68e8e03f-baf8-4bfc-a920-58e26edf835c-353899837504",
"_score" : 1.0,
"_source" : {
"status" : "FAILED",
"#timestamp" : "2019-09-11T10:21:26.000Z"
},
{
"_index" : "mep-reports-2019.09.11",
"_type" : "doc",
"_id" : "68e8e03f-baf8-4bfc-a920-58e26edf835c-353899837504",
"_score" : 1.0,
"_source" : {
"status" : "FAILED",
"#timestamp" : "2019-09-11T10:21:26.000Z"
}
}
}
I would like to find an aggregation histogram distribution something like to get messages_processed, message_delivered,messages_failed .
messages_processed : 3 ( 2 documents in status ENROUTE + 1 Document with status SUBMITTED )
message_delivered 1 ( 1 document with status DELIVERED )
messages_failed : 2 ( 2 documents with status FAILED )
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 13,
"successful" : 13,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 21300,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"performance_over_time" : {
"buckets" : [
{
"key_as_string" : "2020-02-21",
"key" : 1582243200000,
"doc_count" : 6,
"message_processed": 3,
"message_delivered": 1,
"message_failed": 2
}
]
}
}
}
So the following is my current query and i would like to modify it to get some additional statistics such as message_processed , message_delivered, message_failed. kindly let me know .
{ "size": 0, "query": { "bool": { "must": [ { "range": { "#timestamp": { "from": "2020-02-21T00:00Z", "to": "2020-02-21T23:59:59.999Z", "include_lower": true, "include_upper": true, "format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ ||yyyy-MM-dd'T'HH:mmZ", "boost": 1.0 } } } ], "adjust_pure_negative": true, "boost": 1.0 } }, "aggregations": { "performance_over_time": { "date_histogram": { "field": "#timestamp", "format": "yyyy-MM-dd", "interval": "1d", "offset": 0, "order": { "_key": "asc" }, "keyed": false, "min_doc_count": 0 } } } }
You are almost there with the query, you just need to add Terms Aggregation and looking at your request, I've come up with a Scripted Terms Aggregation.
I've also modified the date histogram aggregation field interval to calendar_interval so that you get the values as per the calendar date.
Query Request:
POST <your_index_name>/_search
{
"size": 0,
"query":{
"bool":{
"must":[
{
"range":{
"#timestamp":{
"from":"2019-09-10",
"to":"2019-09-12",
"include_lower":true,
"include_upper":true,
"boost":1.0
}
}
}
],
"adjust_pure_negative":true,
"boost":1.0
}
},
"aggs":{
"message_processed":{
"date_histogram": {
"field": "#timestamp",
"calendar_interval": "1d" <----- Note this
},
"aggs": {
"my_messages": {
"terms": {
"script": { <----- Core Logic of Terms Agg
"source": """
if(doc['status'].value=="ENROUTE" || doc['status'].value == "SUBMITTED"){
return "message_processed";
}else if(doc['status'].value=="DELIVERED"){
return "message_delivered"
}else {
return "message_failed"
}
""",
"lang": "painless"
},
"size": 10
}
}
}
}
}
}
Note that the core logic what you are looking for is inside the scripted terms aggregation. Logic is self explainable if you go through it. Feel free to modify the logic that fits you.
For the sample date you've shared, you would get the result in the below format:
Response:
{
"took" : 144,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 6,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"message_processed" : {
"buckets" : [
{
"key_as_string" : "2019-09-11T00:00:00.000Z",
"key" : 1568160000000,
"doc_count" : 6,
"my_messages" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "message_processed",
"doc_count" : 3
},
{
"key" : "message_failed",
"doc_count" : 2
},
{
"key" : "message_delivered",
"doc_count" : 1
}
]
}
}
]
}
}
}

Elasticsearch, terms aggs according to sibling nested fields

Elasticsearch v7.5
Hello and good day!
We have 2 indices named socialmedia and influencers
Sample contents:
socialmedia:
{
'_id' : 1001,
'title' : "Title 1",
'smp_id' : 1,
"latest" : [
{
"soc_mm_score" : "5",
}
]
},
{
'_id' : 1002,
'title' : "Title 2",
'smp_id' : 2,
"latest" : [
{
"soc_mm_score" : "10",
}
]
},
{
'_id' : 1003,
'title' : "Title 3",
'smp_id' : 3,
"latest" : [
{
"soc_mm_score" : "35",
}
]
},
{
'_id' : 1004,
'title' : "Title 4",
'smp_id' : 2,
"latest" : [
{
"soc_mm_score" : "30",
}
]
}
//omitted some other fields
influencers:
{
'_id' : 1,
'name' : "John",
'smp_id' : 1
},
{
'_id' : 2,
'name' : "Peter",
'smp_id' : 2
},
{
'_id' : 3,
'name' : "Mark",
'smp_id' : 3
}
Now I have this simple query that determines which documents in the socialmedia index has the most latest.soc_mm_score value, and also displaying their corresponding influencers determined by the smp_id
GET socialmedia/_search
{
"size": 0,
"_source": "latest",
"query": {
"match_all": {}
},
"aggs": {
"LATEST": {
"nested": {
"path": "latest"
},
"aggs": {
"MM_SCORE": {
"terms": {
"field": "latest.soc_mm_score",
"order": {
"_key": "desc"
},
"size": 3
},
"aggs": {
"REVERSE": {
"reverse_nested": {},
"aggs": {
"SMP_ID": {
"top_hits": {
"_source": ["smp_id"],
"size": 1
}
}
}
}
}
}
}
}
}
}
SAMPLE OUTPUT:
"aggregations" : {
"LATEST" : {
"doc_count" : //omitted,
"MM_SCORE" : {
"doc_count_error_upper_bound" : //omitted,
"sum_other_doc_count" : //omitted,
"buckets" : [
{
"key" : 35,
"doc_count" : 1,
"REVERSE" : {
"doc_count" : 1,
"SMP_ID" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "socialmedia",
"_type" : "index",
"_id" : "1003",
"_score" : 1.0,
"_source" : {
"smp_id" : "3"
}
}
]
}
}
}
},
{
"key" : 30,
"doc_count" : 1,
"REVERSE" : {
"doc_count" : 1,
"SMP_ID" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "socialmedia",
"_type" : "index",
"_id" : "1004",
"_score" : 1.0,
"_source" : {
"smp_id" : "2"
}
}
]
}
}
}
},
{
"key" : 10,
"doc_count" : 1,
"REVERSE" : {
"doc_count" : 1,
"SMP_ID" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "socialmedia",
"_type" : "index",
"_id" : "1002",
"_score" : 1.0,
"_source" : {
"smp_id" : "2"
}
}
]
}
}
}
}
]
}
}
}
with the query above, I was able to successfully display which documents have the highest latest.soc_mm_score values
The sample output above only displays DOCUMENTS, telling that the influencers (a.k.a smp_id) related to them are the TOP INFLUENCERS according to latest.soc_mm_score
Ideally just by using this aggs query,
"terms" : {
"field" : "smp_id"
}
portrays the concept of which influencers are the top according to the doc_count
Now, displaying the terms query according to latest.soc_mm_score displays TOP DOCUMENTS
"terms" : {
"field" : "latest.soc_mm_score"
}
REAL OBJECTIVE:
I want to display the TOP INFLUENCERS according to the latest.soc_mm_count in the socialmedia index. If Elasticsearch can count all the documents where according to unique smp_id, is there a way for ES to sum all latest.soc_mm_score values and use it as terms?
My objective above should output these:
smp_id 2 as the Top Influencer because he has 2 posts (with soc_mm_score of 30 and 10), adding them gets him 40 soc_mm_score
smp_id 3 as the 2nd Top Influencer, he has 1 post with 35 soc_mm_score
smp_id 1 as the 3rd Top Influencer, he has 1 post with 5 soc_mm_score
Is there a proper query to meet this objective?
FINALLY! FOUND AN ANSWER!!!
"aggs": {
"INFS": {
"terms": {
"field": "smp_id.keyword",
"order": {
"LATEST > SUM_SVALUE": "desc"
}
},
"aggs": {
"LATEST": {
"nested": {
"path": "latest"
},
"aggs": {
"SUM_SVALUE": {
"sum" : {
"field": "latest.soc_mm_score"
}
}
}
}
}
}
}
Displays the following sample:

Resources