Elasticsearch range in sum aggregation

I'm a new Elasticsearch user and I would like to apply a range to a sum aggregation.
So far, I have:
{
  "query": {},
  "aggs": {
    "group_by_trainset": {
      "terms": {
        "field": "trainset",
        "order": { "sum_compteur": "desc" }
      },
      "aggs": {
        "sum_compteur": {
          "sum": {
            "field": "compteur"
          }
        }
      }
    }
  }
}
This returns the first 10 buckets.
I'd like to paginate, unless that isn't possible with aggregations in Elasticsearch: I'm trying to return the next 10 results.
In other words, I want to display the 10 buckets whose sum_compteur is lower than the lowest sum_compteur among the first 10 results, and I don't know how.
Thanks for your help!

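Aggregations are computed over all the documents matching the query, so you get the same aggregation results with every page of hits as long as the input parameters don't change.
If you want to control how many buckets are returned, set size on the terms aggregation (size and order are not valid options of the sum aggregation itself):
"aggs": {
  "group_by_trainset": {
    "terms": {
      "field": "trainset",
      "size": 1000,
      "order": { "sum_compteur": "desc" }
    },
    "aggs": {
      "sum_compteur": {
        "sum": {
          "field": "compteur"
        }
      }
    }
  }
}
where 1000 is the number of buckets you need.
You can also sort the buckets using "order" (for example by "_count", or by a sub-aggregation such as sum_compteur as above), and then paginate the returned bucket array on the client side.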
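Here is a minimal Python sketch of two ways to get "the next 10" buckets, assuming the response JSON has already been parsed into a dict. build_paged_request and paginate_buckets are hypothetical helpers. The server-side variant swaps in a bucket_sort pipeline aggregation (available in Elasticsearch 6.1 and later) that skips the first `from` buckets; it can only page through buckets the terms aggregation actually returned, so terms size must be at least from + size. The client-side variant simply sorts and slices the bucket array, as the answer suggests.

```python
import json


def build_paged_request(page, per_page=10, max_buckets=1000):
    """Hypothetical helper: request body for one page (0-based) of
    trainset buckets, ordered by sum_compteur descending."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "group_by_trainset": {
                "terms": {
                    "field": "trainset",
                    # bucket_sort can only page through buckets that terms
                    # returned, so this must be >= from + size
                    "size": max_buckets,
                    "order": {"sum_compteur": "desc"},
                },
                "aggs": {
                    "sum_compteur": {"sum": {"field": "compteur"}},
                    "page": {
                        "bucket_sort": {
                            "sort": [{"sum_compteur": {"order": "desc"}}],
                            "from": page * per_page,
                            "size": per_page,
                        }
                    },
                },
            }
        },
    }


def paginate_buckets(buckets, page, per_page=10):
    """Hypothetical helper: client-side paging of an already returned
    bucket array, sorted by sum_compteur descending."""
    ordered = sorted(buckets, key=lambda b: b["sum_compteur"]["value"], reverse=True)
    start = page * per_page
    return ordered[start:start + per_page]


if __name__ == "__main__":
    # Second page of the server-side variant.
    print(json.dumps(build_paged_request(page=1), indent=2))

    # Made-up buckets shaped like a terms response with a sum sub-aggregation.
    buckets = [{"key": f"T{i}", "sum_compteur": {"value": float(i)}} for i in range(25)]
    print([b["key"] for b in paginate_buckets(buckets, page=1)])
```

With page=1, both variants return the 11th to 20th buckets by sum_compteur, which are exactly the buckets below the lowest value on the first page (ties aside).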

Related

Get max bucket of terms aggregation (with pipeline aggregation)

I was wondering how to get the bucket with the highest doc_count when using a terms aggregation with Elasticsearch. I'm using the Kibana sample data kibana_sample_data_flights:
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "destinations": {
      "terms": {
        "field": "DestCityName"
      }
    }
  }
}
If there were a single bucket with the max doc_count value, I could set the size of the terms aggregation to 1; however, this doesn't work if two buckets share the same max doc_count value.
Since I came across pipeline aggregations, I feel there should be an easy way to achieve this. The max bucket aggregation seems to be able to deal with multiple max buckets, since the guide says this:
[...] which identifies the bucket(s) with the maximum value of [...]
However the only way to make this work was using a work-around with a sub-aggregation using value_count:
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "destinations": {
      "terms": {
        "field": "DestCityName"
      },
      "aggs": {
        "counter": {
          "value_count": {
            "field": "_id"
          }
        }
      }
    },
    "max_destination": {
      "max_bucket": {
        "buckets_path": "destinations>counter"
      }
    }
  }
}
a) Is there a better way in general to find the terms bucket with the max value?
b) Is there a better way using pipeline aggregations?
Thanks in advance!
You can simplify this as below; you don't need the value_count aggregation, because the special _count path in buckets_path refers to each bucket's doc_count.
Unfortunately, though, using max_bucket is the only way to get what you are looking for.
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "destinations": {
      "terms": {
        "field": "DestCityName"
      }
    },
    "max_destination": {
      "max_bucket": {
        "buckets_path": "destinations>_count"
      }
    }
  }
}
Hope this helps!
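To see why max_bucket copes with ties where "size": 1 on terms does not, here is a small local Python stand-in for the shape of its result: the maximum value plus the keys of every bucket that reaches it. It is a hypothetical illustration over made-up buckets, not Elasticsearch's implementation.

```python
def max_bucket(buckets, metric=lambda b: b["doc_count"]):
    """Hypothetical local stand-in for the shape of a max_bucket result:
    the maximum metric value plus the keys of every bucket reaching it."""
    if not buckets:
        return {"value": None, "keys": []}
    best = max(metric(b) for b in buckets)
    return {"value": best, "keys": [b["key"] for b in buckets if metric(b) == best]}


if __name__ == "__main__":
    # Made-up terms buckets: B and C tie for the highest doc_count.
    buckets = [
        {"key": "A", "doc_count": 5},
        {"key": "B", "doc_count": 7},
        {"key": "C", "doc_count": 7},
    ]
    # "size": 1 on terms would return only one of B and C.
    print(max_bucket(buckets))
```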

Elasticsearch: need average per week of some value

I have simple data with the fields:
sales, date_of_sales
What I need is the average per week, i.e. sum(sales) / number of weeks.
Please help.
What I have so far is:
{
  "size": 0,
  "aggs": {
    "WeekAggergation": {
      "date_histogram": {
        "field": "date_of_sales",
        "interval": "week"
      }
    },
    "TotalSales": {
      "sum": {
        "field": "sales"
      }
    },
    "myValue": {
      "bucket_script": {
        "buckets_path": {
          "myGP": "TotalSales",
          "myCount": "WeekAggergation._bucket_count"
        },
        "script": "params.myGP/params.myCount"
      }
    }
  }
}
I get the error:
Invalid pipeline aggregation named [myValue] of type [bucket_script].
Only sibling pipeline aggregations are allowed at the top level.
I think this may help:
{
  "size": 0,
  "aggs": {
    "WeekAggergation": {
      "date_histogram": {
        "field": "date_of_sales",
        "interval": "week",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "TotalSales": {
          "sum": {
            "field": "sales"
          }
        },
        "AvgSales": {
          "avg": {
            "field": "sales"
          }
        }
      }
    },
    "avg_all_weekly_sales": {
      "avg_bucket": {
        "buckets_path": "WeekAggergation>TotalSales"
      }
    }
  }
}
Note that the TotalSales aggregation is now nested under the weekly histogram aggregation, so it gives you the total of all sales in each weekly bucket. (bucket_script is a parent pipeline aggregation, which is why it was rejected at the top level.)
Additionally, AvgSales provides a similar nested aggregation under the weekly histogram aggregation so you can see the average of all sales specific to that week.
Finally, the pipeline aggregation avg_all_weekly_sales gives the average of the weekly TotalSales values over the non-empty buckets. If you want to include empty buckets (counted as zero), add the gap_policy parameter like so:
...
"avg_all_weekly_sales": {
  "avg_bucket": {
    "buckets_path": "WeekAggergation>TotalSales",
    "gap_policy": "insert_zeros"
  }
}
...
(see: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-avg-bucket-aggregation.html).
This pipeline aggregation may or may not be exactly what you're looking for, so please check the math to make sure the result is what you expect; it should, however, produce the value the original script was trying to compute.
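One way to check the math is to compute both gap policies by hand. This Python sketch assumes you already have, for each weekly bucket in order, its doc_count and its TotalSales value (hypothetical input), and treats empty buckets as gaps: skip leaves them out of the average, insert_zeros counts them as 0.

```python
def avg_weekly_totals(weeks, gap_policy="skip"):
    """Average of weekly totals, mimicking avg_bucket's gap handling.

    weeks: list of (doc_count, total_sales) pairs, one per weekly bucket.
    Empty buckets (doc_count == 0) are treated as gaps: "skip" leaves them
    out of the average, "insert_zeros" counts them as 0.
    """
    values = []
    for doc_count, total in weeks:
        if doc_count == 0:
            if gap_policy == "insert_zeros":
                values.append(0.0)
            continue
        values.append(total)
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Made-up weekly buckets: the second week has no sales documents.
    weeks = [(3, 120.0), (0, 0.0), (2, 60.0), (1, 30.0)]
    print(avg_weekly_totals(weeks))                  # 210 / 3 non-empty weeks
    print(avg_weekly_totals(weeks, "insert_zeros"))  # 210 / 4 weeks
```

With insert_zeros the result equals sum(sales) divided by the number of weekly buckets, which is what the question asks for, provided the histogram covers exactly the weeks you want to divide by.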

Unexpected results when using min sub-aggregation in Elasticsearch

My documents include the fields name and date_year, and my goal is to find the most recently added names (e.g. the ten most recently added names with their first year of appearance and the total number of documents). I therefore have a terms aggregation on name, which is ordered by a min sub-aggregation on date_year:
{
  "aggs": {
    "group_by_name": {
      "terms": {
        "field": "name",
        "order": {
          "start_year": "desc"
        }
      },
      "aggs": {
        "start_year": {
          "min": {
            "field": "date_year"
          }
        }
      }
    }
  }
}
This returns unexpected results when I don't set size under terms. For example, the first bucket has doc_count 1 and start_year 2015, while I'm sure there are tens of documents with this name and that its earliest date_year is 1870. When I set size to a large enough value (10000 below), the results are accurate:
{
  "aggs": {
    "group_by_name": {
      "terms": {
        "field": "name",
        "size": 10000,
        "order": {
          "start_year": "desc"
        }
      },
      "aggs": {
        "start_year": {
          "min": {
            "field": "date_year"
          }
        }
      }
    }
  }
}
Can anyone explain to me what is causing this, and how I can limit the number of buckets returned? What I need would look something like this in SQL:
select name, min(year), count(*) from documents group by name order by min(year) desc limit 10
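To verify what Elasticsearch returns, it can help to compute the SQL above exactly on a sample of documents. This is a plain-Python translation of that query; newest_names is a hypothetical helper, and it assumes each document is a dict with name and date_year fields.

```python
from collections import defaultdict


def newest_names(docs, limit=10):
    """Hypothetical helper, equivalent to:
    select name, min(year), count(*) from documents
    group by name order by min(year) desc limit 10
    """
    first_year = {}
    counts = defaultdict(int)
    for doc in docs:
        name, year = doc["name"], doc["date_year"]
        counts[name] += 1
        first_year[name] = min(year, first_year.get(name, year))
    rows = [(name, first_year[name], counts[name]) for name in counts]
    rows.sort(key=lambda row: row[1], reverse=True)  # order by min(year) desc
    return rows[:limit]


if __name__ == "__main__":
    # Made-up documents.
    docs = [
        {"name": "a", "date_year": 1870},
        {"name": "a", "date_year": 2015},
        {"name": "b", "date_year": 1990},
    ]
    print(newest_names(docs))
```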

Elasticsearch by-day aggregation, sum of two properties

I'm trying to aggregate on the sum of two fields, but can't seem to get the syntax right.
Let's say I have the following aggregation:
{
  "aggregations": {
    "byDay": {
      "date_histogram": {
        "field": "#timestamp",
        "interval": "1d"
      },
      "aggregations": {
        "sum_a": {
          "sum": {
            "field": "a"
          }
        },
        "sum_b": {
          "sum": {
            "field": "b"
          }
        },
        "sum_a_and_b": {
          /* what goes here? */
        }
      }
    }
  }
}
What I really want is an aggregation that is the sum of fields a and b.
It seems like something that should be simple, but I've hit a brick wall trying to get it right. Online examples have either been too simple (summing only one field) or tried to do much more than this, so I haven't found them helpful.
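Try a sum aggregation whose per-document values are generated by a script (a terms aggregation with the same script would bucket documents by the value of a + b instead of summing it):
"aggs": {
  "sum_a_and_b": {
    "sum": {
      "script": "doc['a'].value + doc['b'].value"
    }
  }
}
To enable dynamic scripting, which older Elasticsearch versions require for this, add the following to your config file (elasticsearch.yml by default):
script.aggs: true # enable just for aggregations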
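The expected per-day values are easy to compute locally, which helps when checking whichever aggregation you end up with: summing a + b per document gives the same daily total as sum_a + sum_b. The sketch below also builds, as a hypothetical alternative to a document-level script, a request that adds the two existing sums per day with a bucket_script pipeline aggregation (params.* syntax as used by Painless in Elasticsearch 5.x and later). The field names come from the question; the day key on the sample documents is an assumption.

```python
from collections import defaultdict


def daily_sums(docs):
    """Per-day sum_a, sum_b and sum_a_and_b computed locally.
    Assumes each doc has a 'day' key plus numeric 'a' and 'b' fields."""
    out = defaultdict(lambda: {"sum_a": 0, "sum_b": 0, "sum_a_and_b": 0})
    for doc in docs:
        day = out[doc["day"]]
        day["sum_a"] += doc["a"]
        day["sum_b"] += doc["b"]
        day["sum_a_and_b"] += doc["a"] + doc["b"]  # equals sum_a + sum_b
    return dict(out)


# Hypothetical alternative request: keep sum_a and sum_b, and add them per
# bucket with a bucket_script pipeline aggregation (Painless params syntax).
BY_DAY_REQUEST = {
    "aggregations": {
        "byDay": {
            "date_histogram": {"field": "#timestamp", "interval": "1d"},
            "aggregations": {
                "sum_a": {"sum": {"field": "a"}},
                "sum_b": {"sum": {"field": "b"}},
                "sum_a_and_b": {
                    "bucket_script": {
                        "buckets_path": {"sumA": "sum_a", "sumB": "sum_b"},
                        "script": "params.sumA + params.sumB",
                    }
                },
            },
        }
    }
}

if __name__ == "__main__":
    # Made-up documents for two days.
    docs = [
        {"day": "2015-06-01", "a": 1, "b": 2},
        {"day": "2015-06-01", "a": 3, "b": 4},
        {"day": "2015-06-02", "a": 5, "b": 0},
    ]
    print(daily_sums(docs))
```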

Getting cardinality of multiple fields?

How can I get count of all unique combinations of values of 2 fields that are present in documents of my database, i.e. achieve the same functionality as the "cardinality" aggregation provides, but for more than 1 field?
You can use a script to achieve this. Assuming the character '#' does not appear in any value of either field (you can use any other character as the separator), the query you're looking for is below. Mind you, scripting comes with a performance cost.
{
  "aggs": {
    "multi_field_cardinality": {
      "cardinality": {
        "script": "doc['<field1>'].value + '#' + doc['<field2>'].value"
      }
    }
  }
}
Read more about it here.
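The '#' assumption matters because different pairs can produce the same concatenated key. This Python sketch (hypothetical helpers, made-up documents) compares counting concatenated keys with counting exact (field1, field2) tuples. Keep in mind too that the cardinality aggregation itself returns an approximate count.

```python
def distinct_pairs_via_key(docs, f1, f2, sep="#"):
    """Count distinct combinations the way the script above does:
    by concatenating both values with a separator."""
    return len({f"{d[f1]}{sep}{d[f2]}" for d in docs})


def distinct_pairs_exact(docs, f1, f2):
    """Exact count of distinct (f1, f2) tuples; no separator involved."""
    return len({(d[f1], d[f2]) for d in docs})


if __name__ == "__main__":
    # Made-up documents where '#' appears inside the values.
    docs = [
        {"x": "a#b", "y": "c"},
        {"x": "a", "y": "b#c"},  # also concatenates to "a#b#c"
        {"x": "a", "y": "c"},
    ]
    print(distinct_pairs_exact(docs, "x", "y"))    # 3
    print(distinct_pairs_via_key(docs, "x", "y"))  # 2: two pairs collide
```

A separator that cannot occur in the data avoids the undercount.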
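A better solution is to use nested aggregations and then count the resulting buckets. ("size": 0, meaning "return all terms", only works in Elasticsearch versions before 5.0; in later versions, set size high enough to cover every term.)
"aggs": {
  "Group1": {
    "terms": {
      "field": "Field1",
      "size": 0
    },
    "aggs": {
      "Group2": {
        "terms": {
          "field": "Field2",
          "size": 0
        }
      }
    }
  }
}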
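Counting the resulting buckets means adding up the inner Group2 buckets across all Group1 buckets. A minimal Python sketch over the parsed response, assuming the aggregation names used above; count_combinations is a hypothetical helper, and the count is only complete if both terms sizes cover every term.

```python
def count_combinations(response, outer="Group1", inner="Group2"):
    """Number of distinct (Field1, Field2) combinations: the total number of
    inner buckets across all outer buckets of the parsed search response."""
    outer_buckets = response["aggregations"][outer]["buckets"]
    return sum(len(bucket[inner]["buckets"]) for bucket in outer_buckets)


if __name__ == "__main__":
    # Made-up response in the shape returned for the nested terms aggregations.
    response = {
        "aggregations": {
            "Group1": {
                "buckets": [
                    {"key": "x", "doc_count": 3,
                     "Group2": {"buckets": [{"key": "p", "doc_count": 2},
                                            {"key": "q", "doc_count": 1}]}},
                    {"key": "y", "doc_count": 1,
                     "Group2": {"buckets": [{"key": "p", "doc_count": 1}]}},
                ]
            }
        }
    }
    print(count_combinations(response))  # 3
```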
