Sorting after aggregation in Elasticsearch

I have docs with this structure:
{
FIELD1:string,
FIELD2:
[ {SUBFIELD:number}, {SUBFIELD:number}...]
}
I want to sort on the result of the sum of numbers in FIELD2.SUBFIELDs:
GET myindex/_search
{
"size":0,
"aggs": {
"a1": {
"terms": {
"field": "FIELD1",
"size":0
},
"aggs":{
"a2":{
"sum":{
"field":"FIELD2.SUBFIELD"
}
}
}
}
}
}
With this query the buckets come back unsorted, but I want them sorted by the "a2" value.
How can I do this?
Thank you!

You almost had it. You just need to add an order property to your a1 terms aggregation, like this:
GET myindex/_search
{
"size":0,
"aggs": {
"a1": {
"terms": {
"field": "FIELD1",
"size":0,
"order": {"a2": "desc"} <--- add this
},
"aggs":{
"a2":{
"sum":{
"field":"FIELD2.SUBFIELD"
}
}
}
}
}
}
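To make the effect of the order property concrete, here is a minimal Python sketch that mimics the a1 terms aggregation, the a2 sum sub-aggregation, and the descending sort on plain in-memory documents. It is only an illustration of the logic, not how Elasticsearch implements it; the function names and the sample documents are made up for this example.

```python
from collections import defaultdict


def sum_by_term(docs):
    """Group docs by FIELD1 and sum every FIELD2[].SUBFIELD (a1 with sub-agg a2)."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["FIELD1"]] += sum(item["SUBFIELD"] for item in doc["FIELD2"])
    return totals


def ordered_buckets(docs, direction="desc"):
    """Return buckets sorted by the a2 sum, like "order": {"a2": "desc"}."""
    totals = sum_by_term(docs)
    return sorted(
        ({"key": key, "a2": {"value": value}} for key, value in totals.items()),
        key=lambda bucket: bucket["a2"]["value"],
        reverse=(direction == "desc"),
    )


# Hypothetical sample documents, made up for illustration.
docs = [
    {"FIELD1": "x", "FIELD2": [{"SUBFIELD": 1}, {"SUBFIELD": 2}]},
    {"FIELD1": "y", "FIELD2": [{"SUBFIELD": 10}]},
    {"FIELD1": "x", "FIELD2": [{"SUBFIELD": 4}]},
]
buckets = ordered_buckets(docs)
```

Here "x" sums to 7 and "y" to 10, so "y" is the first bucket when sorting in descending order.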

Brilliant answer from Val (https://stackoverflow.com/users/4604579/val).
This is basically the same thing, but here's what worked for me to find the largest "size" for each "name" and show the 25 largest:
{
"size": 0,
"aggs": {
"agg1": {
"terms": {
"field": "name.keyword",
"order": {
"agg2": "desc"
},
"size": 25
},
"aggs": {
"agg2": {
"max": {
"field": "size"
}
}
}
}
}
}

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I suggest using a bucket_script aggregation, which reads its inputs through buckets_path. As far as I know, this aggregation has to run as a sub-aggregation of a multi-bucket aggregation such as a histogram. Since you have such a field in your mapping (my_datetime here), I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it doesn't work for you.
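The arithmetic inside the two bucket_script aggregations can be sketched in Python as follows. This only reproduces the formula from the scripts above for one histogram bucket; the function name and the zero-total guard are additions for illustration and are not part of the query.

```python
def role_ratios(manager_count, lead_count):
    """Percentage of each role within one bucket, mirroring the two scripts."""
    total = manager_count + lead_count
    if total == 0:
        # Assumption for this sketch: report no ratio for an empty bucket.
        return None
    return {
        "role_1_ratio": manager_count / total * 100,
        "role_2_ratio": lead_count / total * 100,
    }


# Hypothetical counts for one month: 3 managers and 1 lead.
ratios = role_ratios(3, 1)
```

Note that this reports the share of each type in the data per bucket; it does not change which documents a search returns.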

Elasticsearch sort terms agg by arbitrary order

I have a terms aggregation and I want some specific values to always be at the top.
Like:
POST _search
{ "size": 0,
"aggs": {
"pets": {
"terms": {
"field": "species",
"order": "Dogs, Cats"
}
}
}
}
Where the results would be like "Dog", "Cat", "Iguana".
Dog and Cat at the top and everything else below.
Is this possible without scripting?
Thanks!
One way to do it is by filtering values in the terms aggregation. You'd create two terms aggregations, one with the desired terms and another with all other terms.
{
"size": 0,
"aggs": {
"top_terms": {
"terms": {
"field": "species",
"include": ["Dogs", "Cats"],
"order": { "_key" : "desc" }
}
},
"other_terms": {
"terms": {
"field": "species",
"exclude": ["Dogs", "Cats"]
}
}
}
}
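Since this approach returns two separate bucket lists, the client still has to stitch them together. A minimal Python sketch of that step, assuming the usual response layout with an "aggregations" object keyed by aggregation name; the response fragment and its counts are made up for illustration:

```python
def merge_pinned(response):
    """Put the top_terms buckets ahead of the other_terms buckets."""
    aggs = response["aggregations"]
    return aggs["top_terms"]["buckets"] + aggs["other_terms"]["buckets"]


# Hypothetical response fragment; the counts are invented.
response = {
    "aggregations": {
        "top_terms": {"buckets": [
            {"key": "Dogs", "doc_count": 3},
            {"key": "Cats", "doc_count": 5},
        ]},
        "other_terms": {"buckets": [
            {"key": "Iguanas", "doc_count": 7},
        ]},
    }
}
ordered_keys = [bucket["key"] for bucket in merge_pinned(response)]
```

The pinned values stay on top regardless of their counts, because each list keeps the order Elasticsearch returned it in.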
A script wouldn't be too complicated though -- first boost the two species, then sort by the scores first and then by _count:
GET pets/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"terms": {
"species": [
"dog",
"cat"
],
"boost": 10
}
},
{
"match_all": {}
}
]
}
},
"aggs": {
"pets": {
"terms": {
"field": "species.keyword",
"order": [
{
"max_score": "desc"
},
{
"_count": "desc"
}
]
},
"aggs": {
"max_score": {
"max": {
"script": "_score"
}
}
}
}
}
}

How to use cumulative_sum with a previous aggregation?

I would like to plot a cumulative sum of some events, per day. The cumulative sum aggregation seems to be the way to go so I tried to reuse the example given in the docs.
The first aggregation works fine. The following query
{
"aggs": {
"vulns_day" : {
"date_histogram" :{
"field": "HOST_START_iso",
"interval": "day"
}
}
}
}
gives replies such as
(...)
{
"key_as_string": "2016-09-08T00:00:00.000Z",
"key": 1473292800000,
"doc_count": 76330
},
{
"key_as_string": "2016-09-09T00:00:00.000Z",
"key": 1473379200000,
"doc_count": 37712
},
(...)
I then wanted to query the cumulative sum of doc_count above via
{
"aggs": {
"vulns_day" : {
"date_histogram" :{
"field": "HOST_START_iso",
"interval": "day"
}
},
"aggs": {
"vulns_cumulated": {
"cumulative_sum": {
"buckets_path": "doc_count"
}
}
}
}
}
but it gives an error:
"reason": {
"type": "search_parse_exception",
"reason": "Could not find aggregator type [vulns_cumulated] in [aggs]",
I see that buckets_path should point to the elements to be summed, and the example for cumulative aggregations creates a specific intermediate sum, but I do not have anything to sum (besides doc_count).
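The error comes from the inner "aggs" sitting next to vulns_day instead of inside it, so it is parsed as an aggregation named "aggs". The cumulative_sum has to be nested inside the date_histogram's own aggs, alongside a metric it can point to. I guess you should change your query like this: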
{
"aggs": {
"vulns_day": {
"date_histogram": {
"field": "HOST_START_iso",
"interval": "day"
},
"aggs": {
"document_count": {
"value_count": {
"field": "HOST_START_iso"
}
},
"vulns_cumulated": {
"cumulative_sum": {
"buckets_path": "document_count"
}
}
}
}
}
}
I found the solution. Since doc_count did not seem to be available, I tried to retrieve stats for the time parameter, and use its count value. It worked:
{
"size": 0,
"aggs": {
"vulns_day": {
"date_histogram": {
"field": "HOST_START_iso",
"interval": "day"
},
"aggs": {
"dates_stats": {
"stats": {
"field": "HOST_START_iso"
}
},
"vulns_cumulated": {
"cumulative_sum": {
"buckets_path": "dates_stats.count"
}
}
}
}
}
}
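For reference, what cumulative_sum computes over the daily buckets is just a running total. A small Python sketch using the two doc_count values shown in the question; the helper name is made up for this example:

```python
from itertools import accumulate


def cumulative_counts(buckets):
    """Attach a running total of doc_count to each bucket, in bucket order."""
    running = accumulate(bucket["doc_count"] for bucket in buckets)
    return [
        {**bucket, "vulns_cumulated": {"value": total}}
        for bucket, total in zip(buckets, running)
    ]


# The two daily buckets quoted in the question.
daily = [
    {"key_as_string": "2016-09-08T00:00:00.000Z", "doc_count": 76330},
    {"key_as_string": "2016-09-09T00:00:00.000Z", "doc_count": 37712},
]
cumulated = cumulative_counts(daily)
```

The Elasticsearch documentation also describes a special "_count" value for buckets_path that feeds a bucket's document count to a pipeline aggregation; whether it is available depends on the version you run, so treat it as something to check rather than a given.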

Elasticsearch Aggregation Max Value

{
"aggs":{
"nest_exams":{
"nested":{
"path": "exams"
},
"aggs":{
"exams":{
"filter" : {
"term": {
"exams.exam_id": 96690
}
},
"aggs":{
"nested_attempts":{
"nested":{
"path": "exams.attempts"
},
"aggs":{
"user_attempts":{
"terms":{
"field": "exams.attempts.user_id",
"size": 0
},
"aggs":{
"max_score":{
"max":{
"field": "exams.attempts.order_score"
}
}
}
}
}
}
}
}
}
}
}
Hello, I have this aggregation query. The problem is that even though I can find the max_score per user, I can't add a sub-aggregation to the max aggregator to find the date of that best score.
An attempt has user_id, order_score, and date_start.
An alternative is to replace the max metric sub-aggregation with a top_hits sub-aggregation sorted by descending order_score, so you can retrieve the date_start of that document:
{
"aggs": {
"nest_exams": {
"nested": {
"path": "exams"
},
"aggs": {
"exams": {
"filter": {
"term": {
"exams.exam_id": 96690
}
},
"aggs": {
"nested_attempts": {
"nested": {
"path": "exams.attempts"
},
"aggs": {
"user_attempts": {
"terms": {
"field": "exams.attempts.user_id",
"size": 0
},
"aggs": {
"max_score": {
"top_hits": {
"sort": {
"exams.attempts.order_score": "desc"
},
"size": 1,
"_source": [
"date_start"
]
}
}
}
}
}
}
}
}
}
}
}
}
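The idea behind top_hits with size 1 and a descending sort is "keep the whole best attempt per user, not just its score". A minimal Python sketch of that selection on in-memory attempts; the function name and the sample data are hypothetical:

```python
def best_attempt_per_user(attempts):
    """Keep, for each user_id, the attempt with the highest order_score."""
    best = {}
    for attempt in attempts:
        user = attempt["user_id"]
        if user not in best or attempt["order_score"] > best[user]["order_score"]:
            best[user] = attempt
    return best


# Hypothetical attempts for one exam, made up for illustration.
attempts = [
    {"user_id": 1, "order_score": 40, "date_start": "2016-01-10"},
    {"user_id": 1, "order_score": 85, "date_start": "2016-02-03"},
    {"user_id": 2, "order_score": 70, "date_start": "2016-01-15"},
]
best = best_attempt_per_user(attempts)
best_dates = {user: attempt["date_start"] for user, attempt in best.items()}
```

Because the full attempt is kept, date_start is available alongside the score, which a plain max metric cannot provide.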

For each country/colour/brand combination, find sum of number of items in elasticsearch

This is a portion of the data I have indexed in elasticsearch:
{
"country" : "India",
"colour" : "white",
"brand" : "sony",
"numberOfItems" : 3
}
I want to get the total sum of numberOfItems on a per country basis, per colour basis and per brand basis. Is there any way to do this in elasticsearch?
The following should land you straight to the answer.
Make sure you enable scripting before using it.
{
"aggs": {
"keys": {
"terms": {
"script": "doc['country'].value + doc['colour'].value + doc['brand'].value"
},
"aggs": {
"keySum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
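What the scripted terms aggregation computes is a sum per country/colour/brand combination. A Python sketch of the same grouping, using a tuple as the key; the function name and the second and third sample documents are made up for illustration:

```python
from collections import Counter


def sum_by_combination(docs):
    """Sum numberOfItems for each (country, colour, brand) combination."""
    totals = Counter()
    for doc in docs:
        key = (doc["country"], doc["colour"], doc["brand"])
        totals[key] += doc["numberOfItems"]
    return totals


# First document is from the question; the others are hypothetical.
docs = [
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 3},
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 2},
    {"country": "India", "colour": "black", "brand": "sony", "numberOfItems": 4},
]
combination_totals = sum_by_combination(docs)
```

A tuple key keeps the three values separate. Plain string concatenation, as in the script, can in principle merge distinct combinations (for example "ab" + "c" and "a" + "bc"), so adding a separator character to the script may be worth considering.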
To get a single result you may use sum aggregation applied to a filtered query with term (terms) filter, e.g.:
{
"query": {
"filtered": {
"filter": {
"term": {
"country": "India"
}
}
}
},
"aggs": {
"total_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
To get statistics for all countries/colours/brands in a single pass over the data you may use the following query with 3 multi-bucket terms aggregations, each of them containing a single-value sum metric sub-aggregation:
{
"query": {
"match_all": {}
},
"aggs": {
"countries": {
"terms": {
"field": "country"
},
"aggs": {
"country_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"colours": {
"terms": {
"field": "colour"
},
"aggs": {
"colour_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"brands": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
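The result of those three independent terms aggregations can be pictured as three separate group-by sums over the same documents. A minimal Python sketch, with a made-up function name and partly hypothetical sample data:

```python
from collections import Counter


def sum_per_field(docs, fields=("country", "colour", "brand")):
    """For each field, sum numberOfItems per distinct value of that field."""
    totals = {field: Counter() for field in fields}
    for doc in docs:
        for field in fields:
            totals[field][doc[field]] += doc["numberOfItems"]
    return totals


# First document is from the question; the second is hypothetical.
docs = [
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 3},
    {"country": "Japan", "colour": "white", "brand": "sony", "numberOfItems": 5},
]
per_field = sum_per_field(docs)
```

Each document contributes once to every field's totals, which is why the three sums are computed side by side rather than nested.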

Resources