Calculate & Sort Sales Differences Across 2 Date Ranges in Elasticsearch 6 - elasticsearch

I'm using Elasticsearch 6. I want to calculate sales differences (grouped by category) across 2 date ranges and sort them afterwards.
In the example query below, I use a Bucket Script aggregation to evaluate the sales differences, then a Bucket Sort aggregation to obtain the top 10 largest sales differences.
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "months_range": {
          "date_range": {
            "field": "date",
            "ranges": [
              { "from": "01-2015", "to": "03-2015", "key": "start_month" },
              { "from": "03-2015", "to": "06-2015", "key": "end_month" }
            ],
            "keyed": true
          },
          "aggs": {
            "sales": {
              "sum": {
                "field": "sales_amount"
              }
            }
          }
        },
        "sales_difference": {
          "bucket_script": {
            "buckets_path": {
              "startMonthSales": "months_range.start_month>sales", // correct syntax?
              "endMonthSales": "months_range.end_month>sales"      // correct syntax?
            },
            "script": "params.endMonthSales - params.startMonthSales"
          }
        },
        "sales_bucket_sort": {
          "bucket_sort": {
            "sort": [
              {
                "sales_difference": {
                  "order": "desc"
                }
              }
            ],
            "size": 10
          }
        }
      }
    }
  }
}
My questions:
Is my buckets_path syntax correct? Is it possible to access an individual date_range bucket, e.g. months_range.end_month, in the buckets_path?
How does the performance of executing a custom script in Elasticsearch compare to running similar business logic in the application server?

Related

Elasticsearch - Summing filtered array based on terms aggregation

I have the following data structure in my documents in Elasticsearch, which represents a purchase:
{
  ...
  "lineItems": [
    { "id": "1", "quantity": 2 },
    { "id": "2", "quantity": 1 }
  ]
  ...
}
I'm trying to work out the most popular product id in a provided date range. Using a terms aggregation I can work out the number of appearances of a product id in baskets, but I'm having trouble summing the quantities to work out how many of each item were purchased.
My current search looks like this:
{
  "query": ...,
  "size": 0,
  "aggs": {
    "basketAppearances": {
      "terms": {
        "field": "lineItems.id.keyword"
      },
      "aggs": {
        "timesPurchased": {
          "sum": {
            "field": "lineItems.quantity"
          }
        },
        "order": {
          "bucket_sort": {
            "sort": [
              {
                "timesPurchased": "desc"
              }
            ]
          }
        }
      }
    }
  }
}
The problem with the above is that it obviously takes the full lineItems array and sums all the values, so basketAppearances is correct but timesPurchased is not, i.e. I get the following result:
{
  ...
  "aggregations": {
    "basketAppearances": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 1,
          "timesPurchased": {
            "value": 3
          }
        },
        {
          "key": "2",
          "doc_count": 1,
          "timesPurchased": {
            "value": 3
          }
        }
      ]
    }
  }
  ...
}
I need to sum only the rows in the array with the same ID as the terms bucket they reside in, i.e. filter the array based upon the term of the terms aggregation.
I appreciate the best answer here is probably to change my data format to have a different document type of "line item" and add a line item per array entry, but the data structure matches my data elsewhere (and it makes sense elsewhere) and I'd ideally like to keep it the same.
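For reference, a minimal sketch of what the aggregation could look like if lineItems were remapped as a nested field (a mapping change the question prefers to avoid), so that each array entry is aggregated on its own:
{
  "size": 0,
  "aggs": {
    "lines": {
      "nested": { "path": "lineItems" },
      "aggs": {
        "products": {
          "terms": { "field": "lineItems.id.keyword" },
          "aggs": {
            "timesPurchased": {
              "sum": { "field": "lineItems.quantity" }
            }
          }
        }
      }
    }
  }
}
Note that inside a nested aggregation the bucket doc_count counts line items rather than baskets, so the basketAppearances figure would need a reverse_nested sub-aggregation to recover.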

Looking to get average speed for each ID in terms list and within a date range in elastic

I'm looking in Kibana to get the average speed for each of four road ids within a specific date range. I have this aggregation code that doesn't bring up any errors, but it also hits a Gateway Time-out every time I try it. The two fields are there, but I can't seem to build the aggregation well enough to get my intended averages. Can someone see what I'm doing incorrectly? Here is my query:
GET herev322_*/_search
{
  "aggs": {
    "ns2ids": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "nS2ID": [
                  "102-4893",
                  "102-4894",
                  "102+10103",
                  "102+10104"
                ]
              }
            },
            {
              "range": {
                "pBT": {
                  "from": "2021-01-01T07:00:00",
                  "to": "2021-02-01T19:00:00"
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "avg_speed": {
          "avg": {
            "field": "speed"
          }
        }
      }
    }
  }
}
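As an illustration only (not the original query), per-ID averages are commonly expressed by keeping the filter in the query section, setting "size": 0 so no hits are returned, and bucketing on the ID field with a terms aggregation; the exact field names (e.g. whether a nS2ID.keyword sub-field is needed) depend on the mapping:
GET herev322_*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "nS2ID": ["102-4893", "102-4894", "102+10103", "102+10104"] } },
        { "range": { "pBT": { "gte": "2021-01-01T07:00:00", "lte": "2021-02-01T19:00:00" } } }
      ]
    }
  },
  "aggs": {
    "by_road": {
      "terms": { "field": "nS2ID" },
      "aggs": {
        "avg_speed": { "avg": { "field": "speed" } }
      }
    }
  }
}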

Incorrect aggregation when using sorting

I use this query to get search hits and the count of hits across multiple indices:
/index1,index2/_search
{
  "query": {
    "query_string": {
      "query": "*"
    }
  },
  "aggs": {
    "group_by_index": {
      "terms": {
        "field": "_index",
        "min_doc_count": 0
      }
    }
  },
  "post_filter": {
    "terms": {
      "_index": "index1"
    }
  },
  "sort": {
    "my_field": "asc"
  }
}
The problem is that if I sort on a field (my_field) that only exists in index1, the aggregation will only give me the hit count of index1, and not index2.
I thought the aggregation would work regardless of what sorting I have specified?
Using Elasticsearch 6.4
Solved it by using unmapped_type
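For reference, unmapped_type tells Elasticsearch how to treat the sort field in indices where it is not mapped, so the sort no longer fails for index2; a minimal sketch (the type value here is an assumption, use whatever matches my_field):
"sort": [
  {
    "my_field": {
      "order": "asc",
      "unmapped_type": "keyword"
    }
  }
]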

Elasticsearch - Aggregation and Bucket size

I have data in ES which looks like this:
'{"Emp_ID":"12212","Emp_Name":"Jim","Emp_Sal":300,"Dep_Id":22,"Dep_Name":"IT","Dep_Cnt":40}'
'{"Emp_ID":"6874590","Emp_Name":"Joe","Emp_Sal":140,"Dep_Id":66,"Dep_Name":"Admin","Dep_Cnt":20}'
'{"Emp_ID":"32135","Emp_Name":"Jill","Emp_Sal":170,"Dep_Id":66,"Dep_Name":"Admin","Dep_Cnt":20}'
'{"Emp_ID":"43312","Emp_Name":"Andy","Emp_Sal":450,"Dep_Id":22,"Dep_Name":"IT","Dep_Cnt":40}'
'{"Emp_ID":"315609","Emp_Name":"Cody","Emp_Sal":150,"Dep_Id":22,"Dep_Name":"IT","Dep_Cnt":40}'
'{"Emp_ID":"87346","Emp_Name":"Dave","Emp_Sal":500,"Dep_Id":55,"Dep_Name":"hr","Dep_Cnt":10}'
I want to get all the unique departments ordered by Dep_Cnt, for which I wrote the following query
{
  "size": 0,
  "aggs": {
    "by_Dep_Cnt": {
      "terms": {
        "field": "Dep_Cnt",
        "order": {
          "_term": "asc"
        }
      },
      "aggs": {
        "by_unique_dep_id": {
          "terms": {
            "field": "Dep_Id"
          },
          "aggs": {
            "tops": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
And I got the expected output of 3 unique departments ordered by Dep_Cnt.
But now my requirement is to get only the top two departments.
How do I modify the query to get only 2 buckets?
What you are looking for is the size parameter of the terms aggregation.
If Dep_Cnt is the number of employees in a department, your documents are per employee, and you have all the employees in your index (from your mapping that may be the case), you can just do:
{
  "size": 0,
  "aggs": {
    "by_Dep_Id": {
      "terms": {
        "field": "Dep_Id",
        "size": 2
      }
    }
  }
}
This works since, by default, the terms aggregation sorts buckets by the number of documents with the corresponding value, i.e. the number of documents with this Dep_Id, i.e. the number of employees in this department.
If you are not in this situation:
Note that your current request does not behave the same way when two departments have the same size (you will get two Dep_Ids in the same Dep_Cnt bucket).
You can group documents by Dep_Id, compute the Dep_Cnt using the metric you want (min, max, avg, ...) and sort on this metric:
{
  "size": 0,
  "aggs": {
    "by_Dep_Id": {
      "terms": {
        "field": "Dep_Id",
        "size": 2,
        "order": {
          "avg_Dep_Cnt": "asc"
        }
      },
      "aggs": {
        "avg_Dep_Cnt": {
          "avg": {
            "field": "Dep_Cnt"
          }
        }
      }
    }
  }
}
NB: I removed the top_hits aggregation since you do not need it according to what you explained; if you have extra requirements, just add it back inside the aggregation.
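For instance, a sketch of the sub-aggregations block above with the original top_hits (size 1) from the question added back next to the metric:
"aggs": {
  "avg_Dep_Cnt": {
    "avg": { "field": "Dep_Cnt" }
  },
  "tops": {
    "top_hits": { "size": 1 }
  }
}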

Bounce Rate Query Elasticsearch

I am planning to implement a query for calculating the bounce rate using an Elasticsearch query.
Does anybody know how to use aggregation results as input to a script?
{
  "aggs": {
    "monthly": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "month",
        "script": ""
      },
      "aggs": {
        "visits_greater_than_one": {
          "terms": {
            "field": "sessionId",
            "min_doc_count": 2
          }
        }
      },
      "aggs": {
        "visitor_count": {
          "cardinality": {
            "field": "sessionId"
          }
        }
      }
    }
  }
}
Thanks,
Ankireddy Polu
I have found a little workaround for addressing this problem:
{
  "aggs": {
    "monthly": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "month"
      },
      "aggs": {
        "visits_greater_than_one": {
          "terms": {
            "field": "sessionId",
            "min_doc_count": 2
          }
        },
        "visitor_count": {
          "cardinality": {
            "field": "sessionId"
          }
        }
      }
    }
  }
}
The drawback with this approach is that we need to perform the calculation separately wherever we parse the result. We will have two different buckets: one holds the number of sessions that have more than one entry, the other holds the total number of sessions during that interval. Using those, (visitor_count - visits_greater_than_one) / visitor_count will be my bounce rate, since (visitor_count - visits_greater_than_one) gives me the sessions where the user visited only a single page.
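Depending on the Elasticsearch version, it may also be possible to push this last step into the query itself with a bucket_script pipeline aggregation that reads the number of buckets produced by the terms aggregation via the special _bucket_count path. This is only a sketch under that assumption (the terms aggregation's size setting would cap how many multi-page sessions are counted), and it would sit inside the "monthly" aggregation next to the two existing sub-aggregations:
"bounce_rate": {
  "bucket_script": {
    "buckets_path": {
      "multiPageSessions": "visits_greater_than_one._bucket_count",
      "totalSessions": "visitor_count"
    },
    "script": "(params.totalSessions - params.multiPageSessions) / params.totalSessions"
  }
}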
