Getting count and grouping by date range in Elasticsearch - elasticsearch

Is there a way to get the count of rows and group them by hour, day or month?
For instance, assume I have the messages
_source{
"timestamp":"2013-10-01T12:30:25.421Z",
"amount":200
}
_source{
"timestamp":"2013-10-01T12:35:25.421Z",
"amount":300
}
_source{
"timestamp":"2013-10-02T13:53:25.421Z",
"amount":100
}
_source{
"timestamp":"2013-10-03T15:53:25.421Z",
"amount":400
}
Is there a way to get something along the lines of {date, sum}? (Not necessarily in this format; I am just wondering whether there is any way to achieve this.)
{
{"2013-10-01T12:00:00.000Z", 500},
{"2013-10-02T13:00:00.000Z", 100},
{"2013-10-03T15:00:00.000Z", 400}
}
Thank you

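Try aggregations: a date_histogram with a sum sub-aggregation. Set interval to hour, day, week or month depending on the bucket size you need.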
{
"aggs": {
"amount_per_month": {
"date_histogram": {
"field": "timestamp",
"interval": "week"
},
"aggs": {
"total_amount": {
"sum": {
"field": "amount"
}
}
}
}
}
}
In addition, if you want to count the number of documents (also available as doc_count on each bucket), replace the sum content with:
"sum": {
"script": "1"
}
Hope it helps.
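As a way to check what the aggregation should return, here is a minimal plain-Python sketch that mimics an hourly date_histogram with a sum sub-aggregation on the four sample messages from the question. It does not talk to Elasticsearch; the function name hourly_sums is made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# The four sample messages from the question.
docs = [
    {"timestamp": "2013-10-01T12:30:25.421Z", "amount": 200},
    {"timestamp": "2013-10-01T12:35:25.421Z", "amount": 300},
    {"timestamp": "2013-10-02T13:53:25.421Z", "amount": 100},
    {"timestamp": "2013-10-03T15:53:25.421Z", "amount": 400},
]


def hourly_sums(documents):
    """Group documents into hour buckets and sum 'amount' per bucket."""
    totals = defaultdict(int)
    for doc in documents:
        ts = datetime.strptime(doc["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
        # Truncate to the start of the hour, like an hourly histogram bucket key.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        totals[bucket.strftime("%Y-%m-%dT%H:%M:%S.000Z")] += doc["amount"]
    return dict(sorted(totals.items()))


for key, total in hourly_sums(docs).items():
    print(key, total)
```

The three resulting buckets carry the sums 500, 100 and 400, which matches the output the question asks for.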

I need a query to fetch from Elasticsearch the month-wise and year-wise counts of customers registered on our platform.
The queries below work and return the correct data.
Here, CustOnboardedOn is the field holding when the customer was onboarded.
Method type: POST
URL: http://SomeIP:9200/customer/_search?size=0
ES query for month-wise customer counts:
{
"aggs": {
"amount_per_month": {
"date_histogram": {
"field": "CustOnboardedOn",
"interval": "month"
}
}
}
}
ES query for year-wise aggregation:
{
"aggs": {
"amount_per_month": {
"date_histogram": {
"field": "CustOnboardedOn",
"interval": "year"
}
}
}
}
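The two bodies above differ only in the interval value, so they can be generated from one helper. The following is a rough sketch; the function name, the aggregation name customers_over_time and the use_calendar_interval flag are assumptions made for this example. The flag exists because newer Elasticsearch releases spell the key calendar_interval (as one of the queries further down this page does) instead of interval.

```python
import json


def count_histogram_body(field, interval, use_calendar_interval=False):
    """Build a date_histogram request body that only counts documents."""
    # Older releases accept "interval"; newer ones expect "calendar_interval".
    interval_key = "calendar_interval" if use_calendar_interval else "interval"
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "customers_over_time": {
                "date_histogram": {"field": field, interval_key: interval}
            }
        },
    }


monthly = count_histogram_body("CustOnboardedOn", "month")
yearly = count_histogram_body("CustOnboardedOn", "year", use_calendar_interval=True)
print(json.dumps(monthly, indent=2))
```

Each bucket in the response then carries its own doc_count, which is the per-month or per-year customer count.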

Related

elasticsearch Need average per week of some value

I have simple data:
sales, date_of_sales
What I need is the average per week, i.e. sum(sales) / number of weeks.
Please help.
What I have so far:
{
"size": 0,
"aggs": {
"WeekAggergation": {
"date_histogram": {
"field": "date_of_sales",
"interval": "week"
}
},
"TotalSales": {
"sum": {
"field": "sales"
}
},
"myValue": {
"bucket_script": {
"buckets_path": {
"myGP": "TotalSales",
"myCount": "WeekAggergation._bucket_count"
},
"script": "params.myGP/params.myCount"
}
}
}
}
I get the error
Invalid pipeline aggregation named [myValue] of type [bucket_script].
Only sibling pipeline aggregations are allowed at the top level.
I think this may help:
{
"size": 0,
"aggs": {
"WeekAggergation": {
"date_histogram": {
"field": "date_of_sale",
"interval": "week",
"format": "yyyy-MM-dd"
},
"aggs": {
"TotalSales": {
"sum": {
"field": "sales"
}
},
"AvgSales": {
"avg": {
"field": "sales"
}
}
}
},
"avg_all_weekly_sales": {
"avg_bucket": {
"buckets_path": "WeekAggergation>TotalSales"
}
}
}
}
Note that the TotalSales aggregation is now a sub-aggregation under the weekly histogram aggregation, which gives you the total of all sales in each weekly bucket. (The field here is written as date_of_sale, while the question uses date_of_sales; use whichever name matches your mapping.)
Additionally, AvgSales is a similar sub-aggregation under the weekly histogram, so you can see the average of the individual sales within each week.
Finally, the pipeline aggregation avg_all_weekly_sales gives the average of the weekly totals, i.e. the sum of the TotalSales values divided by the number of non-empty buckets. If you want to include empty buckets, add the gap_policy parameter like so:
...
"avg_all_weekly_sales": {
"avg_bucket": {
"buckets_path": "WeekAggergation>TotalSales",
"gap_policy": "insert_zeros"
}
}
...
(see: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-avg-bucket-aggregation.html).
This pipeline aggregation may or may not be what you are actually looking for, so please check the math to make sure the result is what you expect, but it should match the intent of the original script.
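To make "check the math" concrete, here is a small sketch of the arithmetic behind the two gap policies. It is a plain-Python model, not Elasticsearch code: the weekly totals are invented, and an empty week is represented as None by assumption.

```python
def avg_bucket(weekly_totals, gap_policy="skip"):
    """Average weekly totals, treating None as an empty bucket."""
    if gap_policy == "skip":
        # Empty buckets are left out of both the sum and the divisor.
        values = [v for v in weekly_totals if v is not None]
    elif gap_policy == "insert_zeros":
        # Empty buckets count as zero and still increase the divisor.
        values = [0 if v is None else v for v in weekly_totals]
    else:
        raise ValueError(f"unsupported gap_policy: {gap_policy}")
    return sum(values) / len(values) if values else None


# Hypothetical TotalSales per week; the second week had no sales at all.
weekly_totals = [100, None, 300, 200]

print(avg_bucket(weekly_totals, "skip"))          # 600 / 3 weeks
print(avg_bucket(weekly_totals, "insert_zeros"))  # 600 / 4 weeks
```

With these numbers, skipping the empty week gives 200.0 while inserting zeros gives 150.0; the latter corresponds to sum(sales) divided by the number of calendar weeks in the range.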

Unexpected results when using min sub-aggregation in Elasticsearch

My documents include the fields name and date_year, and my goal is to find the most recently added names (e.g. the ten last added names with their first year of appearance and the total number of documents). I therefore have a terms aggregation on name, which is ordered by a min sub-aggregation on date_year:
{
"aggs": {
"group_by_name": {
"terms": {
"field": "name",
"order": {
"start_year": "desc"
}
},
"aggs": {
"start_year": {
"min": {
"field": "date_year"
}
}
}
}
}
}
This returns unexpected results when size is not set under terms. For example, the first bucket has doc_count 1 and start_year 2015, while I'm sure that there are tens of documents with this name, and the earliest date_year is 1870. When I add a large enough size, the results are accurate. For example:
{
"aggs": {
"group_by_name": {
"terms": {
"field": "name",
"size": 10000, <------ large enough value
"order": {
"start_year": "desc"
}
},
"aggs": {
"start_year": {
"min": {
"field": "date_year"
}
}
}
}
}
}
Can anyone explain to me what is causing this, and how I can limit the number of buckets returned? What I need would look something like this in SQL:
select name, min(year), count(*) from documents group by name order by min(year) desc limit 10
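For reference, the exact result the SQL statement describes can be spelled out in plain Python. This is only a sketch of the intended semantics over a handful of made-up documents; it is useful for comparing against what the aggregation returns, and the function name latest_names is hypothetical.

```python
from collections import defaultdict


def latest_names(documents, limit=10):
    """Return (name, first_year, doc_count) ordered by first_year descending."""
    stats = defaultdict(lambda: {"start_year": None, "doc_count": 0})
    for doc in documents:
        entry = stats[doc["name"]]
        entry["doc_count"] += 1
        year = doc["date_year"]
        if entry["start_year"] is None or year < entry["start_year"]:
            entry["start_year"] = year
    rows = [(name, s["start_year"], s["doc_count"]) for name, s in stats.items()]
    # Newest first appearance first; ties are broken by name for stable output.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:limit]


# Invented sample documents.
docs = [
    {"name": "alpha", "date_year": 1870},
    {"name": "alpha", "date_year": 2015},
    {"name": "beta", "date_year": 1990},
    {"name": "gamma", "date_year": 2001},
    {"name": "gamma", "date_year": 2003},
]

print(latest_names(docs, limit=2))
```

Here "alpha" has a document from 2015 but first appears in 1870, so it must rank last; a bucket reporting doc_count 1 and start_year 2015 for it would be the kind of wrong result described above.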

Elasticsearch count in groups by date range

I have documents like this:
{
body: 'some text',
read_date: '2017-12-22T10:19:40.223000'
}
Is there a way to query the count of documents published in the last 10 days, grouped by date? For example:
2017-12-22, 150
2017-12-21, 79
2017-12-20, 111
2017-12-19, 27
2017-12-18, 100
Yes, you can easily achieve that using a date_histogram aggregation, like this:
{
"query": {
"range": {
"read_date": {
"gte": "now-10d"
}
}
},
"aggs": {
"byday": {
"date_histogram": {
"field": "read_date",
"interval": "day"
}
}
}
}
To get the per-day counts for the past 10 full days, you can POST the following query:
{
"query": {
"range": {
"read_date": {
"gte": "now-10d/d",
"lte": "now-1d/d"
}
}
},
"aggs" : {
"byDay" : {
"date_histogram" : {
"field" : "read_date",
"calendar_interval" : "1d",
"format" : "yyyy-MM-dd"
}
}
}
}
POST it to the following URL: http://localhost:9200/Index_Name/Index_Type/_search?size=0
Setting size to 0 avoids executing the fetch phase of the search, which makes the request more efficient. See the Elastic documentation for more information.
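The effect of the /d rounding on the range bounds can be checked with a little date arithmetic. The sketch below assumes the rounding behaviour described in the Elasticsearch date math documentation (gte rounds down to the start of the day, lte rounds up to the end of the day); the function name day_window and the fixed "now" are made up for the example.

```python
from datetime import datetime, time, timedelta


def day_window(now, gte_days_ago, lte_days_ago):
    """Resolve 'now-Xd/d' style bounds to concrete datetimes.

    Returns (start, end, number_of_calendar_days_covered).
    """
    start_day = (now - timedelta(days=gte_days_ago)).date()
    end_day = (now - timedelta(days=lte_days_ago)).date()
    start = datetime.combine(start_day, time.min)  # gte: start of that day
    # lte: end of that day (Elasticsearch rounds to the last millisecond,
    # Python's time.max goes down to the last microsecond).
    end = datetime.combine(end_day, time.max)
    return start, end, (end_day - start_day).days + 1


now = datetime(2017, 12, 22, 10, 19, 40)
print(day_window(now, 10, 1))  # now-10d/d .. now-1d/d
print(day_window(now, 11, 1))  # now-11d/d .. now-1d/d
```

Under these assumptions, now-10d/d to now-1d/d covers 10 whole days ending yesterday, while now-11d/d to now-1d/d covers 11.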

ElasticSearch range in sum aggregation

I'm a new user of Elasticsearch and I would like to apply a range on a sum aggregation.
So far, I have:
{
"query": {},
"aggs": {
"group_by_trainset" : {
"terms": {
"field": "trainset",
"order": { "sum_compteur": "desc" }
},
"aggs": {
"sum_compteur": {
"sum": {
"field": "compteur"
}
}
}
}
}
}
This gives me the first 10 results.
I want pagination, or is that not possible with aggregations in Elasticsearch? I am trying to return the next 10 results.
So I want to display the 10 results that are lower than the lowest "sum_compteur" value among the first 10 results, and I don't know how.
Thanks for your help!
Every request returns the same aggregations as long as the input parameters do not change.
If you want to control how many buckets are returned, set size (and order) on the terms aggregation rather than on the sum:
"aggs": {
"group_by_trainset": {
"terms": {
"field": "trainset",
"size": 1000,
"order": { "sum_compteur": "desc" }
},
"aggs": {
"sum_compteur": {
"sum": {
"field": "compteur"
}
}
}
}
}
Here 1000 is the number of buckets you need.
You can sort the buckets using "order" and then paginate the returned bucket array on the client side.
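Paginating the output array can be as simple as slicing the list of buckets. The following is a minimal sketch; the bucket shape (key, doc_count, and a sum_compteur object with a value) follows the usual terms-aggregation response layout, while the sample data and the function name paginate_buckets are invented.

```python
def paginate_buckets(buckets, page, page_size=10):
    """Return one page of buckets, ordered by sum_compteur descending.

    Pages are zero-based: page=0 is the first ten, page=1 the next ten.
    """
    ordered = sorted(
        buckets, key=lambda b: b["sum_compteur"]["value"], reverse=True
    )
    start = page * page_size
    return ordered[start:start + page_size]


# Invented buckets standing in for the "group_by_trainset" response.
buckets = [
    {"key": f"trainset-{i}", "doc_count": 1, "sum_compteur": {"value": float(i)}}
    for i in range(25)
]

second_page = paginate_buckets(buckets, page=1)
print([b["key"] for b in second_page])
```

The second page starts right below the lowest sum_compteur of the first page, which is the "next 10 results" the question asks for. The trade-off is that all buckets up to the requested page have to be fetched in one response.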

Using Date Histogram in Elasticsearch to count sequential activity

I am indexing Tomcat access-log data into Elasticsearch (1.7.3).
The documents that I deal with have the concept of duration, represented as an end time and a duration in milliseconds
(the start time can be calculated, though I can store it as well if it helps solve my problem).
For example:
{
ztime: "10-17-2015T04:05:00.000+02:00",
duration: 4500,
thread: "http-nio-8080-exec-14"
},
{
ztime: "10-17-2015T04:07:42.227+02:00",
duration: 3100,
thread: "http-nio-8080-exec-25"
}
My goal is to produce a histogram where I show for each second how many threads existed.
I thought of using a date_histogram that will aggregate my docs into 1 sec buckets.
GET /mindex/mtype/_search?search_type=count
{
"aggs": {
"threads_per_hr": {
"date_histogram": {
"field": "ztime",
"interval": "1s",
"min_doc_count": 1
},
"aggs": {
"per_hr_threads": {
"cardinality": {
"field": "thread"
}
}
}
}
}
}
However, this way each thread is placed in only one bucket.
What I need is for each doc to be bucketized into several buckets.
For example, I will need the first document to be bucketized into the 04:05:00.000, 04:05:01.000, 04:05:02.000, 04:05:03.000 buckets.
What kind of query (Java API and/or REST API) would help me achieve this goal?
You need to use the cardinality aggregation here. It gives the number of unique values of a field.
GET /{index}/{type}/_search?search_type=count
{
"aggs": {
"threads_per_hr": {
"date_histogram": {
"field": "ztime",
"interval": "1s",
"min_doc_count": 0
},
"aggs": {
"per_hr_threads": {
"cardinality": {
"field": "thread"
}
}
}
}
}
}
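The question asks for each document to be counted in every second it spans. One possible approach, offered here only as an assumption and not as something the answer above proposes, is to expand each document into its per-second buckets before indexing or on the client side. The sketch below does that in plain Python, treating ztime as the end time and duration as milliseconds, as described in the question; both function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def seconds_spanned(end, duration_ms):
    """Yield every whole second between the computed start time and end."""
    start = end - timedelta(milliseconds=duration_ms)
    current = start.replace(microsecond=0)  # floor to the second
    while current <= end:
        yield current
        current += timedelta(seconds=1)


def threads_per_second(documents):
    """Count distinct threads alive in each one-second bucket."""
    active = defaultdict(set)
    for doc in documents:
        for second in seconds_spanned(doc["ztime"], doc["duration"]):
            active[second].add(doc["thread"])
    return {second: len(threads) for second, threads in sorted(active.items())}


# The two sample documents, with ztime already parsed (time zone omitted).
docs = [
    {"ztime": datetime(2015, 10, 17, 4, 5, 0), "duration": 4500,
     "thread": "http-nio-8080-exec-14"},
    {"ztime": datetime(2015, 10, 17, 4, 7, 42, 227000), "duration": 3100,
     "thread": "http-nio-8080-exec-25"},
]

counts = threads_per_second(docs)
for second, count in counts.items():
    print(second.isoformat(), count)
```

If the expanded per-second records were indexed as separate documents, the date_histogram with a cardinality sub-aggregation shown above would then count a thread in every second it was active, at the cost of a larger index.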

Resources