How to get the total value of a field in Kibana 4.x - elasticsearch

I need to get the total value of a field in Kibana using a scripted field.
e.g. if the id field has the values 1, 2, 3, 4, 5,
I want to sum up all the values of id, so the expected output is 15.
After getting the total of each field, I need to compute the formula below:
lifetime = a - b - c - (d - e - f) * g
where a, b, c, d, e, f and g are each the total of one field's values.
For more info, please refer to this earlier question of mine.

You could do something like this in your scripted field:
doc['id'].value
Then you could use a sum aggregation to get the total value in Kibana.
This SO question could be handy!
EDIT
If you're trying to do it using Elasticsearch, you could do something like this within your request body:
"aggs":{
"total":{
"sum":{
"script":"doc['id'].value"
}
}
}
You could follow this ref, but if you're using Painless, make sure you include it via the lang parameter. Related SO.
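For example, a minimal sketch on a recent Elasticsearch version, with the script language spelled out explicitly (the script object form is not from the original answer):

"aggs": {
  "total": {
    "sum": {
      "script": {
        "lang": "painless",
        "source": "doc['id'].value"
      }
    }
  }
}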

You can definitely use sum aggregations to get the sum of id, but to then evaluate your formula, take a look at pipeline aggregations, which let you use the sum values in further calculations.
Take a look at the bucket script aggregation; with the proper bucket paths to the sum aggregators you can achieve your solution.
For these sample documents:
{
  "a": 100,
  "b": 200,
  "c": 400,
  "d": 600
}
query
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": { "script": "'nice to have it here'" },
      "aggs": {
        "suma": {
          "sum": { "field": "a" }
        },
        "sumb": {
          "sum": { "field": "b" }
        },
        "sumc": {
          "sum": { "field": "c" }
        },
        "equation": {
          "bucket_script": {
            "buckets_path": {
              "suma": "suma",
              "sumb": "sumb",
              "sumc": "sumc"
            },
            "script": "suma + sumb + 2*sumc"
          }
        }
      }
    }
  }
}
You can also wrap each sum aggregation in a term filter to restrict which documents go into each sum.
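To map this back to the original lifetime formula, here is a minimal sketch of the final bucket_script, assuming you define sums sumd through sumg analogously to suma, sumb and sumc above, and that the cluster runs Painless (hence the params. prefix):

"equation": {
  "bucket_script": {
    "buckets_path": {
      "suma": "suma", "sumb": "sumb", "sumc": "sumc",
      "sumd": "sumd", "sume": "sume", "sumf": "sumf", "sumg": "sumg"
    },
    "script": "params.suma - params.sumb - params.sumc - (params.sumd - params.sume - params.sumf) * params.sumg"
  }
}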

Related

Get max bucket of terms aggregation (with pipeline aggregation)

I was wondering how to get the bucket with the highest doc_count when using a terms aggregation with Elasticsearch. I'm using the Kibana sample data kibana_sample_data_flights:
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "destinations": {
      "terms": {
        "field": "DestCityName"
      }
    }
  }
}
If there were a single bucket with the max doc_count value, I could set the size of the terms aggregation to 1; however, this doesn't work if there are two buckets with the same max doc_count value.
Since I came across pipeline aggregations, I feel there should be an easy way to achieve this. The max bucket aggregation seems to be able to deal with multiple max buckets, since the guide says this:
[...] which identifies the bucket(s) with the maximum value of [...]
However, the only way I could make this work was a workaround with a value_count sub-aggregation:
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "destinations": {
      "terms": {
        "field": "DestCityName"
      },
      "aggs": {
        "counter": {
          "value_count": {
            "field": "_id"
          }
        }
      }
    },
    "max_destination": {
      "max_bucket": {
        "buckets_path": "destinations>counter"
      }
    }
  }
}
a) Is there a better way, in general, to find the terms bucket with the max value?
b) Is there a better way using pipeline aggregations?
Thanks in advance!
Well, you can simplify as below; you don't need to make use of a value_count aggregation.
However, unfortunately, using max_bucket is the only way to get what you are looking for.
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "destinations": {
      "terms": {
        "field": "DestCityName"
      }
    },
    "max_destination": {
      "max_bucket": {
        "buckets_path": "destinations>_count" <---- note the usage of _count
      }
    }
  }
}
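For illustration (the city names and count here are made up), when two buckets tie for the maximum, max_bucket reports both keys in a single result:

"max_destination": {
  "keys": ["Zurich", "Vienna"],
  "value": 691
}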
Hope this helps!

How can I filter on a count less than a parameter in Kibana?

I have a question similar to this: How can I filter a field greater than a counter on Kibana? https://github.com/elastic/kibana/issues/9684
On this link there is a perfect answer: you need to use "{'min_doc_count': X}" in the JSON Input advanced bucket option. Perfect, it runs exactly like I want, except that I want the opposite, something like "max_doc_count".
To my surprise, this option doesn't exist... Does someone know what the "max_doc_count" equivalent would be?
In SQL would be something like: GROUP BY my_field HAVING COUNT(*) < 3
Thanks.
The correct way of doing this in ES is to use a bucket_selector pipeline aggregation with the special _count path.
POST /_search
{
  "size": 0,
  "aggs": {
    "my_terms": {
      "terms": {
        "field": "my_field.keyword"
      },
      "aggs": {
        "max_doc_count": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count"
            },
            "script": {
              "source": "params.count < 3"
            }
          }
        }
      }
    }
  }
}
In the results, the my_terms aggregation will only contain buckets where the document count is < 3. No need to order anything or to program your application to ignore anything.
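For illustration (the term and count are made up), the response then keeps only the surviving buckets:

"aggregations": {
  "my_terms": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      { "key": "some_term", "doc_count": 2 }
    ]
  }
}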

ElasticSearch - How to Aggregate the Geometric Mean?

I'm currently aggregating records to get the average (arithmetic average) of a field in the returned records. My use case requires me to get hold of the geometric average:
The geometric mean is defined as the nth root of the product of n numbers.
How could I go about getting this value? I don't even know where to start!
Thanks!
It is not trivial, but it can be done. The idea is to use a sum of logs and then apply the n-th root:
pow(exp((sum of logs)), 1/n)
In fact, the GeometricMean aggregation of the Elasticsearch Index Termlist Plugin does exactly that. (However, this is a third-party plugin; I can't tell whether it is stable enough.)
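The underlying identity is plain math, independent of the plugin:

$$\sqrt[n]{\prod_{i=1}^{n} x_i} \;=\; \exp\!\Big(\frac{1}{n}\sum_{i=1}^{n}\ln x_i\Big) \;=\; \Big(e^{\sum_{i=1}^{n}\ln x_i}\Big)^{1/n}$$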
Mapping and sample data
Let's assume we have the following mapping:
PUT geom_mean
{
  "mappings": {
    "nums": {
      "properties": {
        "x": {
          "type": "double"
        }
      }
    }
  }
}
And we insert the following documents:
{"x":33}
{"x":324}
{"x":134}
{"x":0.1}
Now we can try the query.
The ES query
Here is the query to calculate geometric mean:
POST geom_mean/nums/_search
{
  "size": 0,
  "aggs": {
    "aggs_root": {
      "terms": {
        "script": "'Bazinga!'"
      },
      "aggs": {
        "sum_log_x": {
          "sum": {
            "script": {
              "inline": "Math.log(doc.x.getValue())"
            }
          }
        },
        "geom_mean": {
          "bucket_script": {
            "buckets_path": {
              "sum_log_x": "sum_log_x",
              "x_cnt": "_count"
            },
            "script": "Math.pow(Math.exp(params.sum_log_x), 1 / params.x_cnt)"
          }
        }
      }
    }
  }
}
The return value will be:
"aggregations": {
"aggs_root": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Bazinga!",
"doc_count": 4,
"sum_log_x": {
"value": 11.872505784215674
},
"geom_mean": {
"value": 19.455434622111177
}
}
]
}
}
Now a bit of explanation. The sum_log_x aggregation computes the sum of the logarithms of x. The aggregation named geom_mean is a bucket_script pipeline aggregation applied to the result of its sibling sum_log_x aggregation. It uses the special bucket path _count to get the number of elements. (Here you can read about the bucket_script aggregation a bit more.)
The final trick is to wrap both of them in some aggregation because, as explained in this issue, bucket_script cannot be a top-level aggregation. Here I do a terms aggregation on a script that always returns 'Bazinga!'.
Thanks to anhzhi who proposed this hack.
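A quick sanity check against the sample data confirms the response values:

$$\ln 33 + \ln 324 + \ln 134 + \ln 0.1 \approx 11.8725, \qquad e^{11.8725/4} \approx 19.455$$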
Important considerations
Since the geometric mean is computed through logs, all x values should be greater than 0. However:
- if any of the values is < 0, the result is "NaN"
- if all values are non-negative and less than "+Infinity", but at least one value is 0, the result is "-Infinity"
- if both "+Infinity" and "-Infinity" are among the values, the result is "NaN"
The queries were tested with Elasticsearch 5.4. Performance on a large collection of documents was not tested; you might consider indexing x together with its log to make the aggregations more efficient.
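For instance, a sketch of that optimization (the log_x field name is hypothetical): store the log at index time,

POST geom_mean/nums
{ "x": 33, "log_x": 3.4965 }

and then replace the scripted sum with a plain field sum:

"sum_log_x": {
  "sum": { "field": "log_x" }
}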
Hope that helps!

Elasticsearch 5 (Searchkick) Aggregation Bucket Averages

We have an ES index holding scores given for different products. What we're trying to do is aggregate on product names and then get the average scores for each of the product name 'buckets'. Currently the default aggregation functionality only gives us the counts for each bucket - is it possible to extend this to give us the average score per product name?
We've looked at pipeline aggregations but the documentation is pretty dense and doesn't seem to quite match what we're trying to do.
Here's where we've got to:
{
  "aggs" => {
    "prods" => {
      "terms" => {
        "field" => "product_name"
      },
      "aggs" => {
        "avgscore" => {
          "avg" => {
            "field" => "score"
          }
        }
      }
    }
  }
}
Either this is wrong, or could it be that there's something in how Searchkick compiles its ES queries that is breaking things?
Thanks!
Think this is the pipeline aggregation you want...
POST /_search
{
  "size": 0,
  "aggs": {
    "product_count": {
      "terms": {
        "field": "product"
      },
      "aggs": {
        "total_score": {
          "sum": {
            "field": "score"
          }
        }
      }
    },
    "avg_score": {
      "avg_bucket": {
        "buckets_path": "product_count>total_score"
      }
    }
  }
}
Hopefully I have that the right way round; if not, switch the first two buckets.
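For illustration (the number is made up), avg_bucket adds a single top-level result to the response: the average of total_score across the product_count buckets:

"avg_score": {
  "value": 327.5
}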

Elasticsearch - calculate percentage in nested aggregations in relation to parent bucket

Updated question
In my query I aggregate on date and then on sensor name. Is it possible to calculate a ratio from a nested aggregation and the total document count (or any other aggregation) of the parent bucket? Example query:
{
  "size": 0,
  "aggs": {
    "over_time": {
      "aggs": {
        "by_date": {
          "date_histogram": {
            "field": "date",
            "interval": "1d",
            "min_doc_count": 0
          },
          "aggs": {
            "measure_count": {
              "cardinality": {
                "field": "date"
              }
            },
            "all_count": {
              "value_count": {
                "field": "name"
              }
            },
            "by_name": {
              "terms": {
                "field": "name",
                "size": 0
              },
              "aggs": {
                "count_by_name": {
                  "value_count": {
                    "field": "name"
                  }
                },
                "my ratio": count_by_name / all_count * 100 <-- How to do that?
              }
            }
          }
        }
      }
    }
  }
}
I want a custom metric that gives me the ratio count_by_name / all_count * 100. Is that possible in ES, or do I have to compute that on the client?
This seems very simple to me, but I haven't found a way yet.
Old post:
Is there a way to let Elasticsearch consider the overall count of documents (or any other metric) when calculating the average for a bucket?
Example:
I have like 100000 sensors that generate events at different times. Every event is indexed as a document that has a timestamp and a value.
When I want to calculate a ratio of the value over a date histogram, and some sensors only generated values at one time, I want Elasticsearch to treat the missing values (documents) for my sensors as 0 instead of null.
So when aggregating by day and a sensor has only generated two values, at 10pm (3) and 11pm (5), the aggregate for the day should be (3+5)/24, or formally: SUM(VALUE)/24.
Instead, Elasticsearch calculates the average as (3+5)/2, which is not correct in my case.
There was once a ticket on GitHub https://github.com/elastic/elasticsearch/issues/9745, but the answer was "handle it in your application". That's no answer for me, as I would have to generate zillions of zero-value documents for every sensor/time combination to get the average ratio right.
Any ideas on this?
If this is the case, simply divide the results by 24 on the application side, and when the granularity changes, adjust this value accordingly. The number of hours per day is fixed, right?
You can use the Bucket script aggregation to do what you want.
{
  "bucket_script": {
    "buckets_path": {
      "count_by_name": "count_by_name",
      "all_count": "all_count"
    },
    "script": "count_by_name / all_count * 100"
  }
}
It's just an example.
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-pipeline-bucket-script-aggregation.html
