Elasticsearch, group by field and calculate average value for another field

Mapping:
player_id: int
stat_date: date
some_param: int
I need to calculate the average value of "some_param" across players, using only the row with the max "stat_date" when there are several rows for the same player_id. In other words, I need the average value over the latest date for all players.
This snippet is not working because of "Aggregator [average_val] of type [avg] cannot accept sub-aggregations":
GET test/test/_search
{
  "size": 0,
  "aggs": {
    "average_val": {
      "avg": {
        "field": "some_param"
      },
      "aggs": {
        "by_player": {
          "terms": { "field": "player_id" },
          "aggs": {
            "by_date": {
              "max": { "field": "stat_date" }
            }
          }
        }
      }
    }
  }
}
The simplest approach would be a plain avg:
GET test/test/_search
{
  "size": 0,
  "aggs": {
    "averages": {
      "avg": {
        "field": "some_param"
      }
    }
  }
}
But I need to calculate the average of "some_param" per player only for the last stat dates.

I think you just need to reverse the order of your aggregations. Put the avg aggregation at the deepest level and it should work fine.
There are two major types of aggregation. avg is a metrics aggregation, and it outputs metrics (numbers). You need to put bucket aggregations (like the terms aggregation) on the outside and run metrics aggregations on their output.
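A minimal sketch of the reversed query, using the same fields as above (bucket first, metric inside):
GET test/test/_search
{
  "size": 0,
  "aggs": {
    "by_player": {
      "terms": { "field": "player_id" },
      "aggs": {
        "last_date": {
          "max": { "field": "stat_date" }
        },
        "average_val": {
          "avg": { "field": "some_param" }
        }
      }
    }
  }
}
Note that this averages over all of a player's documents; restricting the metric to only the document with the max stat_date would additionally need something like a top_hits sub-aggregation or a filter on the last date.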

Related

How to also display the values within the bucket that were considered during the aggregation?

I need to aggregate records based on created_date, so for each created date there is a group of records, right? Could someone tell me how to display the created date along with each set of results?
"aggs": {
"by_created_date": {
"terms": {
"field": "createddate"
},
_source["createddate"] //Something like this. so that i can see what date it has used.
"aggs": {
....
}, //Also may need to use some aggregation on this level.
},
}
aggs":{
"by_created_date":{
"terms":{
"field":"createddate.keyword",
"size":1000
},
"aggs":{
"bucket" : {
"terms" : {
"field" : "field_name",
"size": 10
}
}
}
}
}
terms is used for grouping by a field, so for nested grouping you have to nest aggregations as in the code above. Each bucket in the response carries the grouped value in its key field, so the created date is already visible for each group of results.
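For illustration, the response then looks roughly like this (made-up values); the created date shows up as the key of each bucket:
"aggregations": {
  "by_created_date": {
    "buckets": [
      {
        "key": "2017-01-01",
        "doc_count": 42,
        "bucket": {
          "buckets": [ ... ]
        }
      },
      ...
    ]
  }
}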

Elasticsearch 5 (Searchkick) Aggregation Bucket Averages

We have an ES index holding scores given for different products. What we're trying to do is aggregate on product names and then get the average score for each of the product-name "buckets". Currently the default aggregation functionality only gives us the counts for each bucket; is it possible to extend this to give us the average score per product name?
We've looked at pipeline aggregations but the documentation is pretty dense and doesn't seem to quite match what we're trying to do.
Here's where we've got to:
{
  "aggs" => {
    "prods" => {
      "terms" => {
        "field" => "product_name"
      },
      "aggs" => {
        "avgscore" => {
          "avg" => {
            "field" => "score"
          }
        }
      }
    }
  }
}
Either this is wrong, or could it be that something in how Searchkick compiles its ES queries is breaking things?
Thanks!
I think this is the pipeline aggregation you want...
POST /_search
{
  "size": 0,
  "aggs": {
    "product_count": {
      "terms": {
        "field": "product"
      },
      "aggs": {
        "total_score": {
          "sum": {
            "field": "score"
          }
        }
      }
    },
    "avg_score": {
      "avg_bucket": {
        "buckets_path": "product_count>total_score"
      }
    }
  }
}
Hopefully I have that the right way round; if not, switch the first two buckets. Note the difference in output: your original terms-plus-avg query already returns an average score inside each product bucket, while avg_bucket adds a single average across all of the buckets.

Elasticsearch range in sum aggregation

I'm a new Elasticsearch user and I would like to apply a range to a sum aggregation.
So, I have :
{
  "query": {},
  "aggs": {
    "group_by_trainset": {
      "terms": {
        "field": "trainset",
        "order": { "sum_compteur": "desc" }
      },
      "aggs": {
        "sum_compteur": {
          "sum": {
            "field": "compteur"
          }
        }
      }
    }
  }
}
This gives me the first 10 results, and I want to paginate beyond them (or is that not possible with aggregations in Elasticsearch?). I want to return the next 10 results: that is, the 10 buckets whose "sum_compteur" is lower than the lowest value among the first 10, and I don't know how.
Thanks for your help !
As long as the input parameters do not change, you will get the same aggregation results for every request. If you want to control how many buckets are returned, you can specify a size on the terms aggregation:
"aggs": {
"sum_compteur": {
"sum": {
"field": "compteur",
"size" : 1000,
"order" : { "_count" : "asc" }
}
}
}
where 1000 is the number of buckets you need. You can also sort the buckets using "order", and then paginate over the output array on the client side.
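If you are on Elasticsearch 6.1 or later, a bucket_sort pipeline aggregation can also page through the buckets server-side. A sketch (the from/size values and the large terms size are illustrative; terms must return enough buckets for bucket_sort to trim):
{
  "size": 0,
  "aggs": {
    "group_by_trainset": {
      "terms": {
        "field": "trainset",
        "size": 10000
      },
      "aggs": {
        "sum_compteur": {
          "sum": { "field": "compteur" }
        },
        "paging": {
          "bucket_sort": {
            "sort": [{ "sum_compteur": { "order": "desc" } }],
            "from": 10,
            "size": 10
          }
        }
      }
    }
  }
}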

Aggregation on top 100 documents sorted by a field

I would like to do a terms aggregation on the top 100 documents sorted by a field (not by relevance score!).
I know how to do the aggregation:
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "mydata_agg": {
      "terms": {
        "field": "title"
      }
    }
  }
}
and I know how to get the top 100 documents sorted by a field:
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "units_sold": {
      "order": "desc"
    }
  },
  "size": 100
}
But how do I run the terms aggregation on those 100 sorted documents? I could use a range filter, but then I would have to work out myself the cutoff value of units_sold that yields the top 100 documents. I would prefer to do everything in one query. Is that possible?
I have searched for a couple of hours but was unable to find a solution.
The terms aggregation creates buckets, and the outcome of that aggregation then needs to be sorted and truncated; this can be done using bucket_sort. See the bucket_sort documentation for more information.
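A sketch of that idea, combining the two queries above (the metric name "total_units" and the bucket-sort size are illustrative). Note this keeps the top 100 buckets ranked by their summed units_sold, which is not quite the same as aggregating over the top 100 individual documents:
{
  "size": 0,
  "aggs": {
    "mydata_agg": {
      "terms": {
        "field": "title"
      },
      "aggs": {
        "total_units": {
          "sum": { "field": "units_sold" }
        },
        "top_100": {
          "bucket_sort": {
            "sort": [{ "total_units": { "order": "desc" } }],
            "size": 100
          }
        }
      }
    }
  }
}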

Elasticsearch - calculate percentage in nested aggregations in relation to parent bucket

Updated question
In my query I aggregate on date and then on sensor name. Is it possible to calculate a ratio between a nested aggregation and the total document count (or any other aggregation) of the parent bucket? Example query:
{
  "size": 0,
  "aggs": {
    "over_time": {
      "aggs": {
        "by_date": {
          "date_histogram": {
            "field": "date",
            "interval": "1d",
            "min_doc_count": 0
          },
          "aggs": {
            "measure_count": {
              "cardinality": {
                "field": "date"
              }
            },
            "all_count": {
              "value_count": {
                "field": "name"
              }
            },
            "by_name": {
              "terms": {
                "field": "name",
                "size": 0
              },
              "aggs": {
                "count_by_name": {
                  "value_count": {
                    "field": "name"
                  }
                },
                "my ratio": count_by_name / all_count * 100 <-- How to do that?
              }
            }
          }
        }
      }
    }
  }
}
I want a custom metric that gives me the ratio count_by_name / all_count * 100. Is that possible in ES, or do I have to compute that on the client?
This seems very simple to me, but I haven't found a way yet.
Old post:
Is there a way to let Elasticsearch consider the overall count of documents (or any other metric) when calculating the average for a bucket?
Example:
I have around 100,000 sensors that generate events at different times. Every event is indexed as a document with a timestamp and a value.
When I want to calculate a ratio of the value against a date histogram, and some sensors only generated values at one time, I want Elasticsearch to treat the missing values (documents) for my sensors as 0 instead of null.
So when aggregating by day and a sensor has only generated two values, at 10pm (3) and 11pm (5), the aggregate for the day should be (3+5)/24, or formally: SUM(value)/24.
Instead, Elasticsearch calculates the average as (3+5)/2, which is not correct in my case.
There was once a ticket on Github https://github.com/elastic/elasticsearch/issues/9745, but the answer was "handle it in your application". That's no answer for me, as I would have to generate zillions of zero-value documents for every sensor/time combination to get the average ratio right.
Any ideas on this?
If that is the case, simply divide the results by 24 on the application side, and when the granularity changes, change this divisor accordingly. The number of hours per day is fixed, after all.
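That division can also be pushed into the query itself with a bucket_script pipeline aggregation, if your Elasticsearch version supports it. A minimal sketch, assuming a daily date_histogram and the field names from the question ("date", "value"); the params.total syntax assumes a Painless-era version, while older versions use bare variable names as in the snippet further below:
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "date",
        "interval": "1d"
      },
      "aggs": {
        "value_sum": {
          "sum": { "field": "value" }
        },
        "hourly_ratio": {
          "bucket_script": {
            "buckets_path": { "total": "value_sum" },
            "script": "params.total / 24"
          }
        }
      }
    }
  }
}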
You can use the Bucket script aggregation to do what you want.
{
  "bucket_script": {
    "buckets_path": {
      "count_by_name": "count_by_name",
      "all_count": "all_count"
    },
    "script": "count_by_name / all_count * 100"
  }
}
It's just an example.
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-pipeline-bucket-script-aggregation.html
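For context, here is roughly where such a bucket_script could be plugged in for the query above. One caveat: buckets_path can only reference sibling aggregations within the same bucket, not metrics of a parent bucket, so the per-name ratio against the date-level all_count is not directly expressible this way. The sketch below instead computes a ratio between two same-level metrics inside each date bucket:
{
  "by_date": {
    "date_histogram": {
      "field": "date",
      "interval": "1d"
    },
    "aggs": {
      "measure_count": {
        "cardinality": { "field": "date" }
      },
      "all_count": {
        "value_count": { "field": "name" }
      },
      "measure_ratio": {
        "bucket_script": {
          "buckets_path": {
            "measures": "measure_count",
            "total": "all_count"
          },
          "script": "params.measures / params.total * 100"
        }
      }
    }
  }
}
For the ratio relative to the parent bucket, computing it on the client side remains the practical route.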
