How to group by month in Elasticsearch

I am using Elasticsearch version 6.0.0.
To group by month, I am using a date histogram aggregation.
Here is what I've tried:
{
"from":0,
"size":2000,
"_source":{
"includes":[
"cost",
"date"
],
"excludes":[
]
},
"aggregations":{
"date_hist_agg":{
"date_histogram":{
"field":"date",
"interval":"month",
"format":"M",
"order":{
"_key":"asc"
},
"min_doc_count":1
},
"aggregations":{
"cost":{
"sum":{
"field":"cost"
}
}
}
}
}
}
As a result, I get 1 (January) multiple times.
My data covers January 2016, January 2017, and January 2018, so January is returned three times. I want January only once, containing the sum across all years.

Instead of using a date_histogram aggregation you could use a terms aggregation with a script that extracts the month from the date.
{
"from": 0,
"size": 2000,
"_source": {"includes": ["cost","date"],"excludes": []},
"aggregations": {
"date_hist_agg": {
"terms": {
"script": "doc['date'].date.monthOfYear",
"order": {
"_key": "asc"
},
"min_doc_count": 1
},
"aggregations": {
"cost": {
"sum": {
"field": "cost"
}
}
}
}
}
}
Note that using scripting is not optimal. If you know you'll need the month information, create another field holding it so you can run a simple terms aggregation on that field without any scripting.
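A minimal sketch of that suggestion in Python: derive a numeric month before indexing and aggregate on it with a plain terms aggregation. The field name `month`, the aggregation name `by_month`, and both helper names are assumptions for illustration, and dates are assumed to be ISO `yyyy-MM-dd` strings.

```python
from datetime import date


def add_month_field(doc):
    """Return a copy of doc with a numeric 'month' field (1-12) derived from 'date'."""
    enriched = dict(doc)
    enriched["month"] = date.fromisoformat(doc["date"]).month
    return enriched


def month_terms_query():
    """Search body that groups by the precomputed month and sums cost per bucket."""
    return {
        "size": 0,
        "aggregations": {
            "by_month": {
                # 'month' is the hypothetical field added at index time
                "terms": {"field": "month", "size": 12, "order": {"_key": "asc"}},
                "aggregations": {"cost": {"sum": {"field": "cost"}}},
            }
        },
    }


docs = [
    {"date": "2016-01-10", "cost": 5},
    {"date": "2017-01-20", "cost": 7},
    {"date": "2018-02-03", "cost": 1},
]
enriched = [add_month_field(d) for d in docs]
```

Both January documents end up with `month` equal to 1, so they fall into the same terms bucket regardless of year.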

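On Elasticsearch 7.2 and later, date_histogram takes calendar_interval instead of interval. With the value month it returns one bucket per calendar month of each year, so the same month in different years is still reported separately: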
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html#calendar_interval_examples
GET my_index/_search
{
"size": 0,
"query": {},
"aggs": {
"over_time": {
"date_histogram": {
"field": "yourDateAttribute",
"calendar_interval": "month",
"format": "yyyy-MM" // <--- control the output format
}
}
}
}
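Since a monthly date_histogram keeps January 2016 and January 2017 in separate buckets, one option is to fold the buckets by month number on the client. The sketch below assumes a `cost` sum sub-aggregation was added as in the question and that `key_as_string` uses the `yyyy-MM` format; the function name and sample data are made up for illustration.

```python
from collections import defaultdict


def fold_months(buckets):
    """Sum the 'cost' value of per-year monthly buckets into one total per month number."""
    totals = defaultdict(float)
    for bucket in buckets:
        # key_as_string looks like "2017-01" because of "format": "yyyy-MM"
        month = int(bucket["key_as_string"].split("-")[1])
        totals[month] += bucket["cost"]["value"]
    return dict(sorted(totals.items()))


# Hand-written stand-in for the "buckets" array of a real response.
sample_buckets = [
    {"key_as_string": "2016-01", "doc_count": 2, "cost": {"value": 10.0}},
    {"key_as_string": "2017-01", "doc_count": 1, "cost": {"value": 4.0}},
    {"key_as_string": "2017-02", "doc_count": 3, "cost": {"value": 6.5}},
]
monthly = fold_months(sample_buckets)
```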

Related

ElasticSearch - order with min in aggregation

I have objects in the index that are related by an id, which groups them.
The group creation time is the time between the min createdAt object in the group and the max createdAt object in the group.
I'd like to order these groups by the min or max time. How can I do this?
{
"size":0,
"aggs":{
"intervals":{
"composite":{
"size":10000,
"sources":[
{
"totalId":{
"terms":{
"field":"totalId"
}
},
"name": {
"terms":{
"field":"name"
}
}
}
]
},
"aggs": {
"createdAtStart": {
"min": {"field": "createdAt", "format": "YYYY-MM-DD'T'HH:mm:ssZ"}, "order": { "createdAtStart": "desc" }
},
"createdAtEnd": {
"max": {"field": "createdAt", "format": "YYYY-MM-DD'T'HH:mm:ssZ"}
}
}
}
}
I'm using order incorrectly and get this error:
Found two aggregation type definitions
You cannot achieve that with a composite aggregation, because its terms source cannot be ordered by the values of a sub-aggregation, as is possible with a "normal" terms aggregation. (The date formats are also wrong: DD, for example, means day of year, so the pattern should be yyyy-MM-dd rather than YYYY-MM-DD.)
So the correct query that will give you what you want is this one:
{
"size": 0,
"aggs": {
"totalId": {
"terms": {
"field": "totalId",
"order": {
"createdAtStart": "asc"
}
},
"aggs": {
"createdAtStart": {
"min": {
"field": "createdAt",
"format": "yyyy-MM-dd'T'HH:mm:ssZ"
}
},
"createdAtEnd": {
"max": {
"field": "createdAt",
"format": "yyyy-MM-dd'T'HH:mm:ssZ"
}
}
}
}
}
}
Because of the way the composite aggregation works, it's not possible to achieve what you want. The composite aggregation was created to "paginate" over a large number of buckets, and that pagination is defined by the order of the buckets. If buckets could be sorted by sub-aggregations, all of them would have to be pre-computed and pre-sorted before the first page of results could be returned, which would completely defeat the purpose of this aggregation.
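If the composite aggregation has to stay, the ordering can only happen after every page has been fetched. Here is a rough Python sketch of that idea: it feeds each page's `after_key` back as `after`, then sorts on the client. The `search` callable is a stand-in for whatever client call you use, and `fake_search` with its `PAGES` table exists only so the sketch runs without a cluster.

```python
import copy


def collect_composite_buckets(search, body, agg_name="intervals"):
    """Page through a composite aggregation by feeding after_key back as 'after'."""
    body = copy.deepcopy(body)  # do not mutate the caller's request
    buckets = []
    while True:
        agg = search(body)["aggregations"][agg_name]
        buckets.extend(agg["buckets"])
        after_key = agg.get("after_key")
        if not agg["buckets"] or after_key is None:
            return buckets
        body["aggs"][agg_name]["composite"]["after"] = after_key


def sort_by_start(buckets, descending=False):
    """Order the collected buckets by their createdAtStart (min) value."""
    return sorted(buckets, key=lambda b: b["createdAtStart"]["value"], reverse=descending)


# Hypothetical canned pages, keyed by the totalId of the previous after_key.
PAGES = {
    None: {"buckets": [{"key": {"totalId": "a"}, "createdAtStart": {"value": 300}}],
           "after_key": {"totalId": "a"}},
    "a": {"buckets": [{"key": {"totalId": "b"}, "createdAtStart": {"value": 100}}],
          "after_key": {"totalId": "b"}},
    "b": {"buckets": []},
}


def fake_search(body):
    after = body["aggs"]["intervals"]["composite"].get("after")
    return {"aggregations": {"intervals": PAGES[after["totalId"] if after else None]}}


body = {
    "size": 0,
    "aggs": {
        "intervals": {
            "composite": {"size": 1, "sources": [{"totalId": {"terms": {"field": "totalId"}}}]},
            "aggs": {"createdAtStart": {"min": {"field": "createdAt"}}},
        }
    },
}
ordered = sort_by_start(collect_composite_buckets(fake_search, body))
```

This holds every bucket in memory before sorting, which is exactly the cost the composite aggregation avoids on the server side, so it only makes sense when the total number of groups is manageable.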
You are adding an extra {
{
"size": 0,
"aggs": {
"intervals": {
"composite": {
"size": 10000,
"sources": [
{
"totalId": {
"terms": {
"field": "totalId"
}
}
}
] <-- note this
},
"aggs": {
"createdAtStart": {
"min": {
"field": "createdAt",
"format": "YYYY-MM-DD'T'HH:mm:ssZ"
},
"order": {
"createdAtStart": "desc"
}
},
"createdAtEnd": {
"max": {
"field": "createdAt",
"format": "YYYY-MM-DD'T'HH:mm:ssZ"
}
}
}
}
}
}

Elasticsearch Aggregations: Only return results of one of them?

I'm trying to find a way to only return the results of one aggregation in an Elasticsearch query. I have a max bucket aggregation (the one that I want to see) that is calculated from a sum bucket aggregation based on a date histogram aggregation. Right now, I have to go through 1,440 results to get to the one I want to see. I've already removed the results of the base query with the size: 0 modifier, but is there a way to do something similar with the aggregations as well? I've tried slipping the same thing into a few places with no luck.
Here's the query:
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "2018-11-28",
"lte": "2018-11-28"
}
}
},
"aggs": {
"hits_per_minute": {
"date_histogram": {
"field": "timestamp",
"interval": "minute"
},
"aggs": {
"total_hits": {
"sum": {
"field": "hits_count"
}
}
}
},
"max_transactions_per_minute": {
"max_bucket": {
"buckets_path": "hits_per_minute>total_hits"
}
}
}
}
Fortunately enough, you can do that with bucket_sort aggregation, which was added in Elasticsearch 6.4.
Do it with bucket_sort
POST my_index/doc/_search
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "2018-11-28",
"lte": "2018-11-28"
}
}
},
"aggs": {
"hits_per_minute": {
"date_histogram": {
"field": "timestamp",
"interval": "minute"
},
"aggs": {
"total_hits": {
"sum": {
"field": "hits_count"
}
},
"max_transactions_per_minute": {
"bucket_sort": {
"sort": [
{"total_hits": {"order": "desc"}}
],
"size": 1
}
}
}
}
}
}
This will give you a response like this:
{
...
"aggregations": {
"hits_per_minute": {
"buckets": [
{
"key_as_string": "2018-11-28T21:10:00.000Z",
"key": 1543439400000,
"doc_count": 3,
"total_hits": {
"value": 11
}
}
]
}
}
}
Note that there is no extra aggregation in the output, and the output of hits_per_minute is truncated (because we asked for exactly one bucket, the topmost).
Do it with filter_path
There is also a generic way to filter the output of Elasticsearch: Response filtering, as this answer suggests.
In this case it will be enough to just do the following query:
POST my_index/doc/_search?filter_path=aggregations.max_transactions_per_minute
{ ... (original query) ... }
That would give the response:
{
"aggregations": {
"max_transactions_per_minute": {
"value": 11,
"keys": [
"2018-11-28T21:10:00.000Z"
]
}
}
}
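For anyone calling this from code, here is a small Python sketch of the filter_path approach: one helper builds the request path and another reads the max_bucket result. Both helper names are made up, the path omits the mapping type used above, and the sample response is hand-written in the shape shown in this answer.

```python
from urllib.parse import urlencode


def search_path(index, *filter_paths):
    """Build a _search path that asks Elasticsearch to return only the given response paths."""
    query = urlencode({"filter_path": ",".join(filter_paths)}, safe=",")
    return f"/{index}/_search?{query}"


def peak_minute(response, agg_name="max_transactions_per_minute"):
    """Return (value, keys) from a max_bucket pipeline aggregation result."""
    agg = response["aggregations"][agg_name]
    return agg["value"], agg["keys"]


path = search_path("my_index", "aggregations.max_transactions_per_minute")

# Hand-written sample in the shape of the filtered response.
sample_response = {
    "aggregations": {
        "max_transactions_per_minute": {
            "value": 11,
            "keys": ["2018-11-28T21:10:00.000Z"],
        }
    }
}
value, keys = peak_minute(sample_response)
```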

Elasticsearch aggregate field between dates

I want to compare two buckets against each other and find new occurrences that appear in the second bucket. The query below returns all entries in the "query.keyword" field between the two UNIX timestamps provided, but I want the UNIX timestamps to be part of the aggregation section itself.
GET _search
{
"size": 0,
"query": {
"range" :{
"ts": {
"gte":1535155200,
"lte":1535414399
}
}
},
"aggs": {
"domains": {
"terms": {
"field":"query.keyword"
}
}
}
}
I've also tried this but received the error:
"Found two aggregation type definitions in [domains_prev]: [range] and [terms]",
GET _search
{
"size": 0,
"aggs": {
"domains_prev": {
"range" :{
"field":"ts",
"ranges": [
{"to" : 1535414399},
{"from" : 1535155200}
]
},
"terms": {
"field":"query.keyword"
}
}
}
}
The goal is to have something similar to this:
Agg1
"domains_prev"
"field":"query.keyword"
date:gte:timestamp, lte:timestamp
Agg2
"domains_today"
"field":"query.keyword"
date:today
Show all "query.keyword" values in agg2 that do not appear in agg1.
This is the SQL query that I use to achieve the intended result:
select domains FROM table WHERE date >= 20171123 and domains NOT IN (SELECT domains FROM table WHERE date < 20171123 group by domains)
You'll want to do a nested bucket aggregation starting with date range:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
From their page, start with an aggregation like this at the top level:
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyyy",
"ranges": [
{ "to": "now-10M/M" },
{ "from": "now-10M/M" }
]
}
}
}
}
Then nest your existing terms aggregation using query.keyword under that.
The end result should be something like:
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyyy",
"ranges": [
{ "to": "now-10M/M" },
{ "from": "now-10M/M" }
]
},
"aggs": {
"domains": {
"terms": {
"field":"query.keyword"
}
}
}
}
}
}
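The aggregation returns the terms for each range but does not compute the "in the second bucket but not the first" part. One way to finish the job is a set difference on the client, sketched here in Python. The function name and sample response are illustrative, and the sketch assumes the open-ended ranges from the example, where the later bucket is the one carrying a `from` value.

```python
def new_terms(response, range_agg="range", terms_agg="domains"):
    """Return terms seen in the open-ended 'from' range but not in the 'to' range."""
    previous, current = set(), set()
    for bucket in response["aggregations"][range_agg]["buckets"]:
        keys = {term["key"] for term in bucket[terms_agg]["buckets"]}
        if "from" in bucket:
            current |= keys   # bucket for {"from": ...}
        else:
            previous |= keys  # bucket for {"to": ...}
    return sorted(current - previous)


# Hand-written sample in the shape of a date_range response with a nested terms agg.
sample_response = {
    "aggregations": {
        "range": {
            "buckets": [
                {"key": "*-1535155200", "to": 1535155200, "doc_count": 3,
                 "domains": {"buckets": [{"key": "a.example", "doc_count": 2},
                                         {"key": "b.example", "doc_count": 1}]}},
                {"key": "1535155200-*", "from": 1535155200, "doc_count": 2,
                 "domains": {"buckets": [{"key": "b.example", "doc_count": 1},
                                         {"key": "c.example", "doc_count": 1}]}},
            ]
        }
    }
}
fresh = new_terms(sample_response)
```

Keep in mind that a terms aggregation only returns its top terms (10 by default), so the `size` of the terms aggregation has to be large enough for the difference to be complete.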

How can I add additional terms in the Elasticsearch aggregation with datetime buckets?

Using the Elasticsearch 5.3 aggregation API, I am unable to write a query that calculates a measure per weekly date bucket, split by a dimension (term/field). I can build the date buckets and compute the measure for each bucket, but I cannot split it by a term such as application or transaction. Elasticsearch 5+ has deprecated a lot of APIs from previous versions. Here is what I have so far; it currently aggregates the measure across all terms for each date bucket. I need to split it by some fields/terms. How do I go about doing that?
POST /index_name/_search?size=0
{
"aggs": {
"myname_Summary": {
"date_histogram": {
"field": "#timestamp",
"interval": "week",
"format": "yyyy-MM-dd",
"time_zone": "-04:00"
},
"aggs": {
"total_volume": {
"sum": {
"field": "volume"
}
}
}
}
}
}
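You can try this: nest one terms aggregation per field you want to split by under the date histogram, and put the sum at the innermost level.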
{
"size": 0,
"aggs": {
"myname_Summary": {
"date_histogram": {
"field": "#timestamp",
"interval": "week",
"format": "yyyy-MM-dd",
"time_zone": "-04:00"
},
"aggs": {
"split": {
"terms": {
"field": "application",
"size": 10
},
"aggs": {
"transaction": {
"terms": {
"field": "transaction",
"size": 10
},
"aggs": {
"total_volume": {
"sum": {
"field": "volume"
}
}
}
}
}
}
}
}
}
}
Hope this helps
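The response to a query like this is nested three levels deep (week, application, transaction). A short Python sketch for flattening it into rows, assuming the aggregation names used above; the function name and the sample response are made up for illustration.

```python
def flatten_weekly(response):
    """Turn week > application > transaction buckets into (week, app, txn, volume) rows."""
    rows = []
    for week in response["aggregations"]["myname_Summary"]["buckets"]:
        for app in week["split"]["buckets"]:
            for txn in app["transaction"]["buckets"]:
                rows.append((
                    week["key_as_string"],
                    app["key"],
                    txn["key"],
                    txn["total_volume"]["value"],
                ))
    return rows


# Hand-written sample in the shape of the nested response.
sample_response = {
    "aggregations": {
        "myname_Summary": {
            "buckets": [
                {"key_as_string": "2017-05-01", "split": {"buckets": [
                    {"key": "billing", "transaction": {"buckets": [
                        {"key": "refund", "total_volume": {"value": 12.0}},
                        {"key": "charge", "total_volume": {"value": 30.0}},
                    ]}},
                ]}},
            ]
        }
    }
}
rows = flatten_weekly(sample_response)
```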

How can I count the number of documents where a field is within a certain range?

I am trying to build an elasticsearch query that counts the number of documents where a certain field is within a certain range. This aggregation is also contained inside of a date histogram aggregation, but I don't think that matters for the purpose of this question.
Example Data:
ID: Score
01: 4
02: 5
03: 10
04: 9
I would like to count the number of documents where 'Score' is >= 9. I have tried scripts and filters within this aggregation, but I can't get it to work.
This aggregation counts all documents, not just the ones that match the script.
"aggs": {
"report_days": {
"date_histogram": {
"field": "Date",
"interval": "day"
},
"aggs": {
"value_count": {
"field": "Score",
"script": "_value >=9"
}
}
}
}
The following aggregation gives me a parse failure: Parse Failure [Expected [START_OBJECT] under [field], but got a [VALUE_STRING] in [value_count]]:
"aggs": {
"report_days": {
"date_histogram": {
"field": "Date",
"interval": "day"
},
"aggs": {
"value_count": {
"field": "Score",
"filter": {
"range": {
"Score": {
"gte": 9
}
}
}
}
}
}
}
Thanks for any suggestions!
This query will give you the number of docs with Score >= 9 (read it from hits.total in the response):
{
"query": {
"range": {
"Score": {
"gte": 9
}
}
}
}
and this aggregation will do the same:
{
"aggs": {
"my_agg": {
"range": {
"field": "Score",
"ranges": [
{
"from": 9
}
]
}
}
}
}
Run the query ("Score:>=9") and check the hits->total value. See the examples in the doc.
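Since the question needs the count inside each date histogram bucket, another option is a filter aggregation nested under the histogram; its doc_count is the number of matching documents per day. A sketch in Python that builds such a request body and reads the result. The aggregation name `high_scores` and both function names are assumptions, and the `interval` key follows the question (newer versions use `calendar_interval`).

```python
def high_score_per_day(threshold=9):
    """Search body: per-day buckets, each with a filter agg counting Score >= threshold."""
    return {
        "size": 0,
        "aggs": {
            "report_days": {
                "date_histogram": {"field": "Date", "interval": "day"},
                "aggs": {
                    # hypothetical name; the filter agg's doc_count is the count we want
                    "high_scores": {
                        "filter": {"range": {"Score": {"gte": threshold}}}
                    }
                },
            }
        },
    }


def counts_per_day(response):
    """Map each day to the number of documents that passed the filter."""
    return {
        bucket["key_as_string"]: bucket["high_scores"]["doc_count"]
        for bucket in response["aggregations"]["report_days"]["buckets"]
    }


# Hand-written sample response: 4 documents on one day, 2 of them with Score >= 9.
sample_response = {
    "aggregations": {
        "report_days": {
            "buckets": [
                {"key_as_string": "2015-03-01T00:00:00.000Z", "doc_count": 4,
                 "high_scores": {"doc_count": 2}},
            ]
        }
    }
}
body = high_score_per_day()
per_day = counts_per_day(sample_response)
```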

Resources