Calculating sum of nested fields with date_histogram aggregation in Elasticsearch - elasticsearch

I'm having trouble getting the sum of a nested field in Elasticsearch using a date_histogram, and I'm hoping somebody can lend me a hand.
I have a mapping that looks like this:
"client" : {
// various irrelevant stuff here...
"associated_transactions" : {
"type" : "nested",
"include_in_parent" : true,
"properties" : {
"amount" : {
"type" : "double"
},
"effective_at" : {
"type" : "date",
"format" : "dateOptionalTime"
}
}
}
}
I'm trying to get a date_histogram that shows total revenue by month across all clients, i.e. a time series showing the sum of associated_transactions.amount in a histogram bucketed by associated_transactions.effective_at. I tried running this query:
{
"query": {
// ...
},
"aggregations": {
"revenue": {
"date_histogram": {
"interval": "month",
"min_doc_count": 0,
"field": "associated_transactions.effective_at"
},
"aggs": {
"monthly_revenue": {
"sum": {
"field": "associated_transactions.amount"
}
}
}
}
}
}
But the sum it's giving me isn't right. It seems that what ES is doing is finding all clients who have any transaction in a given month, then summing all of the transactions (from any time) for those clients. That is, it's a sum of the amount spent in the lifetime of a client who made a purchase in a given month, not the sum of purchases in a given month.
Is there any way to get the data I'm looking for, or is this a limitation in how ES handles nested fields?
Thanks very much in advance for your help!
David

Try this?
{
"query": {
// ...
},
"aggregations": {
"revenue": {
"date_histogram": {
"interval": "month",
"min_doc_count": 0,
"field": "associated_transactions.effective_at",
"aggs": {
"monthly_revenue": {
"sum": {
"field": "associated_transactions.amount"
}
}
}
}
}
}
}
i.e. move the "aggs" key into the "date_histogram" field.

I stumbled upon this question while trying to solve a similar problem with my implementation of ES.
It seems that Elasticsearch currently goes by the position of an aggregation in the JSON request body tree, not by the inheritance of its objects and fields. So you should not put your sum aggregation "inside" "date_histogram"; place the "aggs" key outside it, at the same level of the JSON tree.
This worked for me:
{
"size": 0,
"aggs": {
"histogram_aggregation": {
"date_histogram": {
"field": "date_field",
"calendar_interval": "day"
},
"aggs": {
"views": {
"sum": {
"field": "the_field_i_want_to_sum"
}
}
}
}
},
"query": {
// some query
}
}
The mistake to avoid is placing the sum aggregation inside the date_histogram aggregation body.
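One approach not tried in this thread is the nested aggregation, which steps into the nested documents so that each transaction is bucketed on its own rather than through its parent client. Because the mapping sets include_in_parent to true, the nested fields are also flattened onto the client document, which would likely explain the lifetime totals described in the question. Below is a minimal sketch in Python that only builds the request body, assuming the mapping from the question; the helper name monthly_revenue_query and the aggregation name transactions are made up for illustration.

```python
import json


def monthly_revenue_query(path="associated_transactions"):
    """Build a request body that sums nested amounts per month.

    `path` is the nested field from the question's mapping.
    """
    return {
        "size": 0,
        "aggs": {
            # Hypothetical name; any aggregation name works here.
            "transactions": {
                # Switch context to the nested transaction documents.
                "nested": {"path": path},
                "aggs": {
                    "revenue": {
                        "date_histogram": {
                            "field": path + ".effective_at",
                            # "interval" as in the question; newer releases
                            # use "calendar_interval" instead.
                            "interval": "month",
                            "min_doc_count": 0,
                        },
                        "aggs": {
                            "monthly_revenue": {
                                "sum": {"field": path + ".amount"}
                            }
                        },
                    }
                },
            }
        },
    }


body = monthly_revenue_query()
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search body; the monthly buckets then appear under aggregations.transactions.revenue.buckets in the response.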

Related

How to get maximum value and id using Max aggregation by country in Elasticsearch

I can get the maximum value by country, but I also want additional information: the id of the document holding that maximum. I have tried many approaches but can't work out how to fetch it.
{
"aggs" : {
"country_groups" : {
"terms" : { "field" : "country.keyword",
"size":30000
},
"aggs":{
"max_price":{
"max": { "field" : "video_count"}
}
}
}
}
}
Depending on the type of your id field (numeric or string), you have two ways of doing it.
If you look at the query below, if your id is numeric you can do the same as you did with video_count, i.e. using the max metric aggregation (see max_id_num).
However, if your id field is a string, you can leverage the top_hits aggregation and sort it in descending order (see max_id_str).
{
"aggs": {
"country_groups": {
"terms": {
"field": "country.keyword",
"size": 30000
},
"aggs": {
"max_price_and_id": {
"top_hits": {
"size": 1,
"sort": {
"video_count": "desc"
},
"_source": ["channel_id", "video_count"]
}
}
}
}
}
}
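To read the result of the top_hits query above on the client side, something like the following Python sketch could be used. It assumes the standard response layout for a terms aggregation with a top_hits sub-aggregation; the helper name top_video_per_country and the sample response values are made up for illustration.

```python
def top_video_per_country(response):
    """Map each country key to the _source of its top hit."""
    result = {}
    for bucket in response["aggregations"]["country_groups"]["buckets"]:
        hits = bucket["max_price_and_id"]["hits"]["hits"]
        if hits:  # a bucket always has documents, but stay defensive
            result[bucket["key"]] = hits[0]["_source"]
    return result


# Hypothetical response fragment, not real data.
sample_response = {
    "aggregations": {
        "country_groups": {
            "buckets": [
                {
                    "key": "US",
                    "doc_count": 3,
                    "max_price_and_id": {
                        "hits": {
                            "hits": [
                                {"_source": {"channel_id": "abc", "video_count": 120}}
                            ]
                        }
                    },
                },
                {
                    "key": "FR",
                    "doc_count": 1,
                    "max_price_and_id": {
                        "hits": {
                            "hits": [
                                {"_source": {"channel_id": "xyz", "video_count": 45}}
                            ]
                        }
                    },
                },
            ]
        }
    }
}

top = top_video_per_country(sample_response)
print(top)
```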

ElasticSearch: search in two different ranges with different aggregations for each

This is an odd question, but I'm trying to avoid calling ES twice to obtain different data from two different range of times.
Let's say that:
From 2016-10-01 to 2016-10-31, I want to sum the field "orders.total_sales" (just an example) and also sum "reviews.count".
From 2016-09-01 to 2016-09-30, I only want to sum "orders.total_sales".
(In reality I need about 50 sum aggregations on the first range, but only 2 on the second.)
I know it's possible to filter by two ranges using should instead of must. But is it possible to distinguish the results from each range in order to run separate sum aggregations on them?
I don't think it's possible, but I'm asking in case someone has run into this before.
Thanks in advance.
You can use the filter aggregation for this purpose. You would basically write two filters for the two different ranges and then add whatever sub-aggregations you want under each.
{
"size": 0,
"aggs": {
"range_one": {
"filter": {
"range": {
"your_date_field": {
"gte": "2016-01-01",
"lte": "2016-02-02"
}
}
},
"aggs": {
"sum_orders": {
"sum": {
"field": "your_sum_field1"
}
}
}
},
"range_two": {
"filter": {
"range": {
"your_date_field": {
"gte": "2016-02-01",
"lte": "2016-03-02"
}
}
},
"aggs": {
"sum_orders": {
"sum": {
"field": "your_sum_field2"
}
}
}
}
}
}
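With around 50 sums on one range and 2 on the other, writing the body by hand gets tedious. A minimal Python sketch of generating it, assuming the same filter-plus-sub-aggregation layout as above; the helper name range_sums, the range names, and the generated aggregation names are made up for illustration.

```python
def range_sums(date_field, ranges):
    """Build one filter aggregation per named range.

    `ranges` maps a name to (gte, lte, list of fields to sum).
    """
    aggs = {}
    for name, (gte, lte, fields) in ranges.items():
        aggs[name] = {
            "filter": {"range": {date_field: {"gte": gte, "lte": lte}}},
            # One sum sub-aggregation per requested field.
            "aggs": {
                "sum_" + field.replace(".", "_"): {"sum": {"field": field}}
                for field in fields
            },
        }
    return {"size": 0, "aggs": aggs}


body = range_sums(
    "your_date_field",
    {
        "october": ("2016-10-01", "2016-10-31", ["orders.total_sales", "reviews.count"]),
        "september": ("2016-09-01", "2016-09-30", ["orders.total_sales"]),
    },
)
print(sorted(body["aggs"]["october"]["aggs"]))
```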
Thank you very much! It worked, although not with filter (I kept getting ES errors until I got it working another way), but the idea is the same.
I ended up with something like this:
{
"timeout" : 1500,
"query" : {
"bool" : {
"must" : [
{
"term" : {
"businessId" : "101598"
}
} ,
{
"range" : {
"date" : {
"from" : "2016-10-15T03:00:00.000Z",
"to" : "2016-10-31T03:00:00.000Z",
"include_lower" : true,
"include_upper" : true
}
}
}]
}
},
"aggs": {
"range_one": {
"date_range": {
"field": "date",
"ranges": [
{
"from": "2016-10-15T03:00:00.000Z",
"to": "2016-10-22T03:00:00.000Z"
}
]
},
"aggs": {
"sum_orders_sales": {
"sum": {
"field": "orders.totalSales"
}
}
}
},
"range_two": {
"date_range": {
"field": "date",
"ranges": [
{
"from": "2016-10-23T03:00:00.000Z",
"to": "2016-10-31T03:00:00.000Z"
}
]
},
"aggs": {
"sum_orders_count": {
"sum": {
"field": "orders.orderCount"
}
}
}
}
}
}
In my case performance and speed are important. Since my two ranges are consecutive, I filter in the query by the business ID I need and by the full span, from the start date of the first range to the end date of the second. This assumes that aggregations run only over the documents matched by the query; otherwise they would scan all documents, and it is better to aggregate over a result set obtained just once. I'm new to ES, so I'm not sure I'm seeing it right, but it's working like a charm!
Thanks a lot!

Kibana - Calculating duration between events

I am pushing events directly into the Elasticsearch REST API in the following format:
Timestamp
RequestId
EventName
I would like to bucket by RequestId and then subtract the max and min Timestamps to calculate duration between events in Kibana.
I can quite easily bucket them in Kibana, although it's not obvious how to calculate the duration. I have been changing the JSON input to try to get it to render something sensible, without luck.
I have managed to achieve what I want using the Elasticsearch API directly:
{
"size": 0,
"query": { },
"aggs": {
"requests_field": {
"terms": {
"field": "requestId",
"size": 5
},
"aggs": {
"min_date": {
"min": {
"field": "timeStamp"
}
},
"max_date": {
"max": {
"field": "timeStamp"
}
},
"duration" : {
"bucket_script" : {
"buckets_path" : {
"minDate" : "min_date",
"maxDate" : "max_date"
},
"script" : "maxDate-minDate"
}
}
}
}
}
}
How can I "visualise" this in Kibana as a simple line graph?

Elasticsearch derivative of a deep metric

I have a web crawler that collects data and stores snapshots several times a day. My query has some aggregations that group the snapshots together per day and return the last snapshot of each day using top_hits.
The documents look like this:
"_source": {
"taken_at": "2016-02-01T11:27:09.184-03:00",
... ,
"my_metric": 113
}
I'd like to be able to calculate the derivative of a certain metric, say my_metric, of the documents returned by top_hits (i.e., the derivative of the last snapshots of each day's my_metric).
Here's what I have so far:
{
"aggs": {
"filtered_snapshots": {
"filter": {
// ...
},
"aggs" : {
"grouped_data": {
"date_histogram": {
"field": "taken_at",
"interval": "day",
"format": "YYYY-MM-dd",
"order": { "_key" : "asc" }
},
"aggs": {
"resource_by_date": {
"terms": { "field": "remote_id" },
"aggs": {
"latest_snapshots": {
"top_hits": {
"sort": { "taken_at": { "order": "desc" }},
"size" : 1
}
}
}
},
"my_metric_deriv": {
"derivative": {
"buckets_path": "resource_by_date>latest_snapshots>my_metric"
}
}
}
}
}
}
}
}
I get a "No aggregation [my_metric] found for path ..." error with the query above.
Am I using the wrong buckets_path? I've read through the buckets_path and derivative documentation and haven't found much that could help.
The documentation mentions briefly "deep metrics", stating that they can be limited in some ways, which I couldn't quite understand. I'm not sure how or if the limitations affect my case.
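To make the target computation concrete: the value being asked for is the day-over-day difference of each day's last my_metric. The Python sketch below computes that on the client side from values already pulled out of the top_hits results. It is an illustration of the arithmetic, not a fix for the buckets_path error; the helper name daily_derivative and every sample value other than 113 are made up.

```python
def daily_derivative(daily_values):
    """Return (day, delta) pairs for every day after the first.

    `daily_values` is a list of (day, value) tuples sorted by day ascending,
    one entry per day (the last snapshot of that day).
    """
    return [
        (day, value - prev_value)
        for (_, prev_value), (day, value) in zip(daily_values, daily_values[1:])
    ]


# Hypothetical last-snapshot values per day.
last_snapshots = [("2016-02-01", 113), ("2016-02-02", 120), ("2016-02-03", 118)]
deltas = daily_derivative(last_snapshots)
print(deltas)
```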

Elastic Search Aggregation and Details

I am trying to get the teacher's name in this query as well.
With it I can loop over the teachers and get the number of classes each one teaches and the amount of money she earns each year.
But I can't get the full details; I want to display the teacher's name too.
Here is my current query:
{
"aggs": {
"teacher": {
"terms": {
"field": "teacher_id",
"size": 10
},
"aggs": {
"academic_year": {
"date_histogram": {
"field": "acc_year",
"interval": "year"
},
"aggs": {
"income": {
"stats": {
"field": "teacher_hourly_fee"
}
}
}
}
}
}
},
"size": 0
}
The most straightforward approach may be to combine the teacher ID and name into a generated term using a script:
{
"aggs" : {
"teacher" : {
"terms" : {
"script" : "_source.teacher_id + '-' + _source.teacher_name",
"size": 10
}
}
}
}
Adjust script particulars per your actual schema.
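A different technique from the script above, and not one proposed in this thread, is to keep the terms aggregation on teacher_id and add a top_hits sub-aggregation that returns one document per teacher with only the name field. A minimal Python sketch that builds such a body, assuming the name lives in a teacher_name field as in the script; the helper name teacher_with_name_query and the aggregation name teacher_details are made up for illustration.

```python
def teacher_with_name_query(size=10):
    """Build the question's aggregation plus a top_hits lookup of the name."""
    return {
        "size": 0,
        "aggs": {
            "teacher": {
                "terms": {"field": "teacher_id", "size": size},
                "aggs": {
                    # Hypothetical name: one hit per teacher, name field only.
                    "teacher_details": {
                        "top_hits": {"size": 1, "_source": ["teacher_name"]}
                    },
                    # Unchanged from the question.
                    "academic_year": {
                        "date_histogram": {"field": "acc_year", "interval": "year"},
                        "aggs": {
                            "income": {"stats": {"field": "teacher_hourly_fee"}}
                        },
                    },
                },
            }
        },
    }


body = teacher_with_name_query()
print(sorted(body["aggs"]["teacher"]["aggs"]))
```

This keeps the buckets keyed by the plain ID, at the cost of one extra lookup per bucket in the response.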

Resources