How to do a year over year aggregation with Elasticsearch? - elasticsearch

Assuming I have a date field on a document, I know using the date_histogram aggregation I can get a document count by day, month, year, etc.
What I want to do is get the average document count for January, February, March, etc. over several given years. The same goes for Monday, Tuesday, Wednesday, etc. over several given weeks. Is there a way to do this with just that same date field, or what is the best way to accomplish this in Elasticsearch?
Example
Let's say we have a bunch of orders placed over three years:
2012 - Jan (10 orders), Feb (5 orders), Mar (7 orders), Apr (11 orders), etc.
2013 - Jan (13 orders), Feb (7 orders), Mar (12 orders), Apr (15 orders), etc.
2014 - Jan (10 orders), Feb (7 orders), Mar (6 orders), Apr (13 orders), etc.
What I want is the average of each month over the given years, so the output would be:
Jan ((10 + 13 + 10) / 3 = 11 orders), Feb (6.33 orders), Mar (8.33 orders), Apr (13 orders), etc.
It would be best if this can be generalized for N years (or N Januaries, etc.) so that we search over any date range.

You can use 'monthOfYear' like this:
"aggregations": {
  "timeslice": {
    "histogram": {
      "script": "doc['timestamp'].date.getMonthOfYear()",
      "interval": 1,
      "min_doc_count": 0,
      "extended_bounds": {
        "min": 1,
        "max": 12
      },
      "order": {
        "_key": "desc"
      }
    }
  }
}
The extended bounds will ensure you get a value for every month (even if it is zero).
If you want the month names, you can either do that in your own code, or do this (with the caveat that you won't get values for months that have no data):
"aggregations": {
  "monthOfYear": {
    "terms": {
      "script": "doc['timestamp'].date.monthOfYear().getAsText()",
      "order": {
        "_term": "asc"
      }
    }
  }
}
Once you've got this, you can nest your stats aggregation inside this one:
"aggregations": {
  "monthOfYear": {
    "terms": {
      ...
    },
    "aggregations": {
      "stats": ...
    }
  }
}
The question is pretty old now, but I hope this helps someone.
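If you're on a version with pipeline aggregations (2.0+), a more direct route to the asked-for cross-year average is to nest a yearly date_histogram under each month bucket and average its doc counts with avg_bucket. A sketch, assuming the field is called timestamp:

```json
"aggregations": {
  "monthOfYear": {
    "terms": {
      "script": "doc['timestamp'].date.getMonthOfYear()"
    },
    "aggregations": {
      "per_year": {
        "date_histogram": { "field": "timestamp", "interval": "year" }
      },
      "avg_per_year": {
        "avg_bucket": { "buckets_path": "per_year>_count" }
      }
    }
  }
}
```

One caveat: a year in which a month has no documents produces no bucket for that year, so it won't pull the average down unless you force empty buckets with min_doc_count and extended_bounds.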

My understanding of what you want is:
You'd like to see the average number of documents per month, in yearly buckets. Is that correct?
If so, you could count the number of documents in a year (i.e. the yearly bucket) and then divide by 12 using a script.
E.g. to show the daily average doc count in monthly buckets (assuming 30 days per month):
curl -XGET 'http://localhost:9200/index/type/_search?pretty' -d '{
  "aggs": {
    "monthly_bucket": {
      "date_histogram": { "field": "datefield", "interval": "month" },
      "aggs": {
        "daily_average": { "sum": { "script": "doc[\"datefield\"].value > 0 ? 1.0/30 : 0" } }
      }
    }
  }
}'
Note the 1.0/30: using plain integer division here could truncate each contribution to zero.

Related

Golang how to sort a struct that has a map of string structs [duplicate]

This question already has answers here:
Map in order range loop
(1 answer)
How to iterate maps in insertion order?
(2 answers)
Why are iterations over maps random?
(2 answers)
Closed 4 months ago.
I am working on a project that stores a little bit of analytics data. Everything is working okay, but I am facing one issue: I have to sort the contents of, for example, TotalsPerDay["2022-02-01"] by the number of clicks.
I cannot figure out how to properly do this. The reason I chose a map[string]struct{} is that I need the keys as dates and the values as structs, so that somewhere else in the program I can loop through all the records and sum up a total for that specific day.
dataModel := types.DataModel{
Totals: types.DataModelTotals{
Clicks: 0,
Impressions: 0,
Position: 0,
ClickThroughRate: 0,
TotalsPerDay: map[string]types.DataModelTotalsPerDay{},
},
Searches: map[string]types.Searches{},
Pages: map[string]types.Pages{},
Countries: map[string]types.Countries{},
Devices: map[string]types.Devices{},
SearchFilter: map[string]types.SearchFilter{},
Dates: map[string]types.Dates{},
}
What TotalsPerDay could look like:
TotalsPerDay {
DataModelTotalsPerDay["2022-02-01"]{ Clicks: 17 },
DataModelTotalsPerDay["2022-02-01"]{ Clicks: 9 },
DataModelTotalsPerDay["2022-02-01"]{ Clicks: 82 }
DataModelTotalsPerDay["2022-02-01"]{ Clicks: 52 }
}
What I would like it to be:
TotalsPerDay {
DataModelTotalsPerDay["2022-02-01"]{ Clicks: 82 },
DataModelTotalsPerDay["2022-02-01"]{ Clicks: 52 },
DataModelTotalsPerDay["2022-02-01"]{ Clicks: 17 },
DataModelTotalsPerDay["2022-02-01"]{ Clicks: 9 }
}
The reason I have the keys as a date (string) is because in the front-end I need to loop through the dates to display them and this way it was very easy to do.
Hope someone can help me out with this, as I have been stuck on it for the past few days.
I tried looking into various sort functions but none got me the result I wanted so far.
I also read somewhere (not sure if this is true) that the elements of a map cannot be sorted. If that is true, how am I supposed to get my data summed up in a way so that, when sent to the front-end, it can simply loop over the dates and for each date easily get the number of clicks and other statistics?
Below is a JSON version of TotalsPerDay.
{"2022-08-27 00:00:00 +0000 UTC":{"clicks":6,"impressions":7558,"position":280683.4432563555,"clickThroughRate":280669.4432563555,"totalRrows":5436},"2022-08-28 00:00:00 +0000 UTC":{"clicks":1,"impressions":8061,"position":289043.214145452,"clickThroughRate":288990.8808121187,"totalRrows":5665},"2022-08-29 00:0 0:00 +0000 UTC":{"clicks":8,"impressions":8283,"position":303046.8245871944,"clickThroughRate":302952.8245871944,"totalRrows":5905},"2022-08-30 00:00:00 +0000 UTC":{"clicks":4,"impressions":8447,"position":300142.1673121948,"clickThroughRate":300071.1673121948,"totalRrows":5904},"2022-08-31 00:00:00 +0000 UTC": {"clicks":7,"impressions":8114,"position":285296.87973927736,"clickThroughRate":285158.87973927736,"totalRrows":5648},"2022-09-01 00:00:00 +0000 UTC":{"clicks":6,"impressions":8306,"position":297513.4591694122,"clickThroughRate":297337.4591694122,"totalRrows":5932},"2022-09-02 00:00:00 +0000 UTC":{"clicks":4,"impressions":7938,"position":284144.3877642734,"clickThroughRate":284102.3877642734,"totalRrows":5590},"2022-09-03 00:00:00 +0000 UTC":{"clicks":0,"impressions":7024,"position":266604.6929695027,"clickThroughRate":266536.6929695027,"totalRrows":5205}}
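It is true that Go maps have no defined order and cannot be sorted in place. The usual approach is to copy the entries into a slice and sort that with sort.Slice. A minimal sketch; DayTotals is a hypothetical stand-in for types.DataModelTotalsPerDay, which isn't shown in the question:

```go
package main

import (
	"fmt"
	"sort"
)

// DayTotals is a hypothetical stand-in for types.DataModelTotalsPerDay.
type DayTotals struct {
	Clicks int
}

// Entry pairs a date key with its totals so the pair can live in a sortable slice.
type Entry struct {
	Date   string
	Totals DayTotals
}

// sortedByClicks copies the map into a slice and sorts it by Clicks, descending.
// Go maps have no order, so the sort has to happen on a slice copy.
func sortedByClicks(m map[string]DayTotals) []Entry {
	entries := make([]Entry, 0, len(m))
	for date, t := range m {
		entries = append(entries, Entry{Date: date, Totals: t})
	}
	sort.Slice(entries, func(i, j int) bool {
		return entries[i].Totals.Clicks > entries[j].Totals.Clicks
	})
	return entries
}

func main() {
	totals := map[string]DayTotals{
		"2022-08-27": {Clicks: 17},
		"2022-08-28": {Clicks: 9},
		"2022-08-29": {Clicks: 82},
		"2022-08-30": {Clicks: 52},
	}
	for _, e := range sortedByClicks(totals) {
		fmt.Println(e.Date, e.Totals.Clicks)
	}
}
```

The front-end concern is separate: if the order matters in JSON, send the sorted slice (JSON arrays preserve order) instead of the map (JSON object key order is not guaranteed).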

Optimistic multiIndex query to get min of maximum of all indexes

I am learning Elasticsearch and need help with a multi-index search query.
I have 7 indexes, and every document in every index has a lastUpdatedDate field. I want to query all selected indexes at once and get the minimum of the per-index maximum lastUpdatedDate.
E.g.:
index - "A" last updated on 20th Dec - max of all lastUpdatedDate records - 20th Dec
index - "B" last updated on 18th Dec - max of all lastUpdatedDate records - 18th Dec
index - "C" last updated on 19th Dec - max of all lastUpdatedDate records - 19th Dec
The min across these three indexes is the 18th.
I can query each index separately from my backend service, but I would like to optimize this into a single query from Java that hits all these indexes at once.
One more example:
Index-A {
Id:1, lastUpdatedDate: 15th Dec;
Id:2, lastUpdatedDate: 16th Dec;
Id:5, lastUpdatedDate: 15th Dec;
Id:6, lastUpdatedDate: 20th Dec;
};
Index-B{
Id:1, lastUpdatedDate: 21st Dec;
Id:2, lastUpdatedDate: 16th Dec;
Id:5, lastUpdatedDate: 15th Dec;
Id:6, lastUpdatedDate: 20th Dec;
};
Index-C{
Id:1, lastUpdatedDate: 22nd Dec;
Id:2, lastUpdatedDate: 16th Dec;
Id:5, lastUpdatedDate: 15th Dec;
Id:6, lastUpdatedDate: 20th Dec;
}
Now max of indexes are:
Index-A -> 20th Dec
Index-B -> 21st Dec
Index-C -> 22nd Dec
Then min is 20th Dec
A very simple query would be this one. Retrieve the max updated value per index and then retrieve the bucket with the minimum value in all those buckets:
GET _all/_search
{
  "size": 0,
  "aggs": {
    "all_indexes": {
      "terms": {
        "field": "_index",
        "size": 100
      },
      "aggs": {
        "max_updated": {
          "max": {
            "field": "lastUpdatedDate"
          }
        }
      }
    },
    "min_updated": {
      "min_bucket": {
        "buckets_path": "all_indexes>max_updated"
      }
    }
  }
}

Elasticsearch count overlapping timeranges in date histogram

I have events stored in Elasticsearch 6.6 that have a start and end time e.g.:
{
  "startTime": "2019-01-11T14:49:16.719Z",
  "endTime": "2019-01-11T16:31:56.483Z"
}
I want to display a date histogram which shows the number of overlapping events in each hour.
Example:
Hour of Day:
12 13 14 15 16 17 18 19
Events:
<====E1====> <===E2==>
<===E3====>
<==E4==>
Result:
0 1 1 3 2 2 1 0
Is there a way to do this with an Elasticsearch aggregation, or do I have to implement it in the application?

Date difference scripted field in Kibana

I wanted to find the difference between two fields using scripted fields. Here are the two date fields and their format:
start_time - June 17th 2018, 09:44:46.000
end_time - June 17th 2018, 09:44:49.000
Which will give proc_time.
Here's what I am trying to do in scripted fields:
doc['start_time'].date.millis - doc['end_time'].date.millis
But this returns the processing time subtracted from the epoch time.
For example, if my processing time is 2 seconds, then the output will be epoch time - 2 seconds,
which is not what I want.
This is the sample doc:
17 Jun 2018 04:14:46 INFO CSG event file generation started at: Sun Jun 17 04:14:46 CDT 2018
17 Jun 2018 04:14:46 INFO Executing CSG file generation process
Warning: Using a password on the command line interface can be insecure.
17 Jun 2018 04:15:57 INFO Finished at: Sun Jun 17 04:15:57 CDT 2018
Any help would be appreciated.
Update
I've got this working with the following painless script:
((doc['csg_proc_end_time'].date.year) * 31536000 + doc['csg_proc_end_time'].date.monthOfYear * 86400 + doc['csg_proc_end_time'].date.dayOfMonth * 3600 + doc['csg_proc_end_time'].date.secondOfDay) - ((doc['csg_proc_start_time'].date.year) * 31536000 + doc['csg_proc_start_time'].date.monthOfYear * 86400 + doc['csg_proc_start_time'].date.dayOfMonth * 3600 + doc['csg_proc_start_time'].date.secondOfDay)
However, I would welcome any other script which does this in a simpler way.
JSON format for added fields:
"fields": {
"#timestamp": [
"2018-06-20T04:45:00.258Z"
],
"zimbra_proc_time": [
0
],
"csg_proc_time": [
71
],
"perftech_proc_time": [
0
],
"csg_proc_end_time": [
"2018-06-17T04:15:57.000Z"
],
"csg_proc_start_time": [
"2018-06-17T04:14:46.000Z"
]
},
This is what I've done to reproduce your issue and it works properly:
PUT test/doc/1
{
  "csg_proc_end_time": "2018-06-17T04:15:57.000Z",
  "csg_proc_start_time": "2018-06-17T04:14:46.000Z"
}
Now compute the processing time in a script field:
GET test/_search
{
  "script_fields": {
    "proc_time": {
      "script": {
        "source": "(doc.csg_proc_end_time.value.millis - doc.csg_proc_start_time.value.millis) / 1000"
      }
    }
  }
}
Result: 71 seconds
{
"_index": "test",
"_type": "doc",
"_id": "1",
"_score": 1,
"fields": {
"proc_time": [
71
]
}
}

search with comma separated values in elasticsearch [closed]

Closed 6 years ago.
I am quite new to Elasticsearch, so I hope somebody can help me here.
Suppose I have selected
1) Category - Hollywood
2) Sub-Category - Bond Special
3) Genre - Action & Drama & Comedy ( as multiple selection will be there )
4) Language - English, Russian and Hindi ( as multiple selection will be there)
5) Release Year - 1990,1999,2000 ( as multiple selection will be there)
6) 3D Movie - True OR False (any one will be selected)
7) SortBy - "A-Z" OR "Z-A" OR "Date"
Can anyone help me build this query for Elasticsearch? I would use "match_phrase" to build the AND condition, but the issue is that each search parameter can have multiple, comma-separated values.
My index array is given below:
[_source] => Array (
[id] => 43
[value] => GREENBERG
[imageName] => Done
[date] => (1905) USA (Bengali)
[language] => (Bengali) | 1905 | 1.47hrs
[directorName] => Alejandro González Iñárritu, Ang Lee
[castForSearch] => Ben Stiller, John Turturro
[viewDetailsUrl] => /movie/greenberg
[movieType] => Animation
[rating] => 0
[cast] => Ben Stiller, John Turturro, Olivier Martinez
[synopsis] => A man from Los Angeles, who moved to New York years ago, returns to L.A. to figure out his life while he house-sits for his brother. He soon sparks with his brother's assistant.
[code] => HLR06
[type] => Non-3D
[trailer] => https://www.youtube.com/watch?v=cwdliqOGTLw
[imdb_code] => 1234654
[tags] => Array
(
[0] => Animation
)
[genre] => Adventure
[languages] => Bengali
[categories_filter] => Array
(
[0] => Category 2,Hollywood
)
[sub_categories_filter] => Array
(
[0] => Sub-Category 1,Sub-Category 4,Sub-Category 5,Sub-Category 6,Sub-Category 7
)
)
To match against one of multiple possible values, use a terms query. You don't need a match_phrase query because you're not doing any sort of free-text matching.
You'll need to split the comma-separated values into arrays before indexing your data into Elasticsearch (or use a comma-separated tokenizer).
Your use case suggests that you don't care about scoring but only about filtering, in which case your query should probably just have a filter.
Sorting is not the same as filtering; for your A-Z/Z-A/Date sorting you'll need to use a sort clause outside of the query.
The final thing would probably look like this:
GET /my_index/my_type/_search
{
"query": {
"bool": {
"filter": [
"terms": { "genre": ["Action", "Drama", "Comedy"] },
"terms": { "language": ["English", "Russian", "Hindi"] },
// more terms filters
]
}
},
"sort": { "title": "asc" }
}
