Elasticsearch count in groups by date range - elasticsearch

I have documents like this:
{
  body: 'some text',
  read_date: '2017-12-22T10:19:40.223000'
}
Is there a way to query the count of documents published in the last 10 days, grouped by date? For example:
2017-12-22, 150
2017-12-21, 79
2017-12-20, 111
2017-12-19, 27
2017-12-18, 100

Yes, you can easily achieve that using a date_histogram aggregation, like this:
{
  "query": {
    "range": {
      "read_date": {
        "gte": "now-10d"
      }
    }
  },
  "aggs": {
    "byday": {
      "date_histogram": {
        "field": "read_date",
        "interval": "day"
      }
    }
  }
}

To receive the count per day for the past 10 days, you can POST the following query:
{
  "query": {
    "range": {
      "read_date": {
        "gte": "now-11d/d",
        "lte": "now-1d/d"
      }
    }
  },
  "aggs": {
    "byDay": {
      "date_histogram": {
        "field": "read_date",
        "calendar_interval": "1d",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
to the following URL: http://localhost:9200/Index_Name/Index_Type/_search?size=0
Setting size to 0 avoids executing the fetch phase of the search, making the request more efficient; see the Elasticsearch search documentation for more information.
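For reference, here is a minimal Python sketch of the same request: it builds the body above and turns the returned buckets into (day, count) pairs. The index name is a placeholder and the live call is commented out; only the bucket shape follows the standard date_histogram output.

```python
import json
import urllib.request

# Hypothetical endpoint -- substitute your own index name and host.
URL = "http://localhost:9200/Index_Name/_search?size=0"

# Same request body as the query above.
query = {
    "query": {"range": {"read_date": {"gte": "now-11d/d", "lte": "now-1d/d"}}},
    "aggs": {
        "byDay": {
            "date_histogram": {
                "field": "read_date",
                "calendar_interval": "1d",
                "format": "yyyy-MM-dd",
            }
        }
    },
}

def day_counts(response_body):
    """Turn a date_histogram response into (day, doc count) pairs."""
    buckets = response_body["aggregations"]["byDay"]["buckets"]
    return [(b["key_as_string"], b["doc_count"]) for b in buckets]

# Uncomment to run against a live cluster:
# req = urllib.request.Request(URL, data=json.dumps(query).encode(),
#                              headers={"Content-Type": "application/json"})
# print(day_counts(json.load(urllib.request.urlopen(req))))
```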

Related

ElasticSearch/Kibana: get values that are not found in entries more recent than a certain date

I have a fleet of devices that push to ElasticSearch at regular intervals (let's say every 10 minutes) entries of this form:
{
  "deviceId": "unique-device-id",
  "timestamp": 1586390031,
  "payload": { various data }
}
I usually look at this through Kibana by filtering for the last 7 days of data and then drilling down by device id or some other piece of data from the payload.
Now I'm trying to get a sense of the health of this fleet by finding devices that haven't reported anything in the last hour let's say. I've been messing around with all sorts of filters and visualisations and the closest I got to this is a data table with device ids and the timestamp of the last entry for each, sorted by timestamp. This is useful but is somewhat hard to work with as I have a few thousand devices.
What I dream of is getting either the above mentioned table to contain only the device ids that have not reported in the last hour, or getting only two numbers: the total count of distinct device ids seen in the last 7 days and the total count of device ids not seen in the last hour.
Can you point me in the right direction, if any one of these is even possible?
I'll skip the table and take the second approach -- only getting the counts. I think it's possible to walk your way backwards to the rows from the counts.
Note: I'll be using a human readable time format instead of timestamps but epoch_seconds will work just as fine in your real use case. Also, I've added the comment field to give each doc some background.
First, set up your index:
PUT fleet
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "epoch_second||yyyy-MM-dd HH:mm:ss"
      },
      "comment": {
        "type": "text"
      },
      "deviceId": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Index a few docs -- I'm in UTC+2 so I chose these timestamps:
POST fleet/_doc
{
  "deviceId": "asdjhfa343",
  "timestamp": "2020-04-05 10:00:00",
  "comment": "in the last week"
}
POST fleet/_doc
{
  "deviceId": "asdjhfa343",
  "timestamp": "2020-04-10 13:05:00",
  "comment": "#asdjhfa343 in the last hour"
}
POST fleet/_doc
{
  "deviceId": "asdjhfa343",
  "timestamp": "2020-04-10 12:05:00",
  "comment": "#asdjhfa343 in the 2 hours"
}
POST fleet/_doc
{
  "deviceId": "asdjhfa343sdas",
  "timestamp": "2020-04-07 09:00:00",
  "comment": "in the last week"
}
POST fleet/_doc
{
  "deviceId": "asdjhfa343sdas",
  "timestamp": "2020-04-10 12:35:00",
  "comment": "in last 2hrs"
}
In total, we've got 5 docs and 2 distinct device ids with the following conditions:
all have appeared in the last 7d,
both of them in the last 2h, and
only one of them in the last hour,
so I'm interested in finding precisely 1 deviceId which has appeared in the last 2 hrs BUT not in the last 1 hr.
We can get these numbers using a combination of filter (for range filters), cardinality (for distinct counts) and bucket_script (for count differences) aggregations:
GET fleet/_search
{
  "size": 0,
  "aggs": {
    "distinct_devices_last7d": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "now-7d"
          }
        }
      },
      "aggs": {
        "uniq_device_count": {
          "cardinality": {
            "field": "deviceId.keyword"
          }
        }
      }
    },
    "not_seen_last1h": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "now-2h"
          }
        }
      },
      "aggs": {
        "device_ids_per_hour": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "day",
            "format": "'disregard' -- yyyy-MM-dd"
          },
          "aggs": {
            "total_uniq_count": {
              "cardinality": {
                "field": "deviceId.keyword"
              }
            },
            "in_last_hour": {
              "filter": {
                "range": {
                  "timestamp": {
                    "gte": "now-1h"
                  }
                }
              },
              "aggs": {
                "uniq_count": {
                  "cardinality": {
                    "field": "deviceId.keyword"
                  }
                }
              }
            },
            "uniq_difference": {
              "bucket_script": {
                "buckets_path": {
                  "in_last_1h": "in_last_hour>uniq_count",
                  "in_last2h": "total_uniq_count"
                },
                "script": "params.in_last2h - params.in_last_1h"
              }
            }
          }
        }
      }
    }
  }
}
The date_histogram aggregation is just a placeholder that enables us to use a bucket script to get the final difference and not have to do any post-processing.
Since we passed size: 0, we're not interested in the hits section. So taking only the aggregations, here are the annotated results:
...
"aggregations" : {
  "not_seen_last1h" : {
    "doc_count" : 3,
    "device_ids_per_hour" : {
      "buckets" : [
        {
          "key_as_string" : "disregard -- 2020-04-10",
          "key" : 1586476800000,
          "doc_count" : 3,            <-- 3 device messages in the last 2hrs
          "total_uniq_count" : {
            "value" : 2               <-- 2 distinct IDs
          },
          "in_last_hour" : {
            "doc_count" : 1,
            "uniq_count" : {
              "value" : 1             <-- 1 distinct ID in the last hour
            }
          },
          "uniq_difference" : {
            "value" : 1.0             <-- 1 == final result!
          }
        }
      ]
    }
  },
  "distinct_devices_last7d" : {
    "meta" : { },
    "doc_count" : 5,                  <-- 5 device messages in the last 7d
    "uniq_device_count" : {
      "value" : 2                     <-- 2 unique IDs
    }
  }
}
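To pull the two numbers the question asks for out of that response in code, here is a small Python sketch. The helper name is mine; the dictionary shape mirrors the annotated response above.

```python
def fleet_health(aggregations):
    """Return (distinct devices in 7d, devices seen in 2h but not in 1h)."""
    total_7d = aggregations["distinct_devices_last7d"]["uniq_device_count"]["value"]
    buckets = aggregations["not_seen_last1h"]["device_ids_per_hour"]["buckets"]
    # One placeholder bucket per day; sum the bucket_script differences.
    not_last_hour = sum(b["uniq_difference"]["value"] for b in buckets)
    return total_7d, not_last_hour

# The aggregations section from the sample response above:
aggs = {
    "not_seen_last1h": {
        "doc_count": 3,
        "device_ids_per_hour": {
            "buckets": [{
                "key": 1586476800000,
                "doc_count": 3,
                "total_uniq_count": {"value": 2},
                "in_last_hour": {"doc_count": 1, "uniq_count": {"value": 1}},
                "uniq_difference": {"value": 1.0},
            }]
        },
    },
    "distinct_devices_last7d": {"doc_count": 5, "uniq_device_count": {"value": 2}},
}

print(fleet_health(aggs))  # -> (2, 1.0)
```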

Trends metric on Kibana Dashboard, it’s possible?

I want to create a metric in a Kibana dashboard which uses the ratio of multiple metrics and an offset period.
Example :
Date Budget
YYYY-MM-DD $
2019-01-01 15
2019-01-02 10
2019-01-03 5
2019-01-04 10
2019-01-05 12
2019-01-06 4
If I select the time range 2019-01-04 to 2019-01-06, I want to compute the ratio against the offset period 2019-01-01 to 2019-01-03.
To summarize: (sum(10+12+4) - sum(15+10+5)) / sum(10+12+4) = -0.15
The evolution of my budget equals -15% (and this is what I want to print in the dashboard).
But with the metric visualization it's not possible (no offset); with the visual builder, different metric aggregations cannot have different offsets (too bad, because a bucket script would allow computing the ratio); and with Vega I haven't found a solution either.
Any idea? Thanks a lot
Aurélien
NB: I use Kibana version > 6.X
Please check the sample mapping below, which I've constructed based on the data you've provided in the question, and the aggregation solution you're looking for.
Mapping:
PUT <your_index_name>
{
  "mappings": {
    "mydocs": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd"
        },
        "budget": {
          "type": "float"
        }
      }
    }
  }
}
Aggregation
I've made use of the following types of aggregation:
Date Histogram, where I've set the interval to 4d based on the data in the question
Sum
Derivative
Bucket Script which actually gives you the required budget evolution figure.
Also I'm assuming that the date format would be in yyyy-MM-dd and budget would be of float data type.
Below is how your aggregation query would be.
POST <your_index_name>/_search
{
  "size": 0,
  "query": {
    "range": {
      "date": {
        "gte": "2019-01-01",
        "lte": "2019-01-06"
      }
    }
  },
  "aggs": {
    "my_date": {
      "date_histogram": {
        "field": "date",
        "interval": "4d",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "sum_budget": {
          "sum": {
            "field": "budget"
          }
        },
        "budget_derivative": {
          "derivative": {
            "buckets_path": "sum_budget"
          }
        },
        "budget_evolution": {
          "bucket_script": {
            "buckets_path": {
              "input_1": "sum_budget",
              "input_2": "budget_derivative"
            },
            "script": "(params.input_2/params.input_1)*(100)"
          }
        }
      }
    }
  }
}
Note that the result that you are looking for would be in the budget_evolution part.
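As a sanity check, here is the bucket_script arithmetic reproduced in plain Python on the question's sample budgets, assuming the 4d interval splits the data into exactly the two windows:

```python
# Budgets from the question, split into the two 3-day windows.
first_window = [15, 10, 5]    # 2019-01-01 .. 2019-01-03
second_window = [10, 12, 4]   # 2019-01-04 .. 2019-01-06

sum_1 = sum(first_window)     # sum_budget of the first bucket  -> 30
sum_2 = sum(second_window)    # sum_budget of the second bucket -> 26

derivative = sum_2 - sum_1            # what the derivative agg produces: -4
evolution = derivative / sum_2 * 100  # what the bucket_script computes

print(round(evolution, 1))  # -> -15.4, roughly the -15% the question expects
```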
Hope this helps!

how to get derivative aggregations on the simple count

Using ES 2.3.3
I want to use a derivative aggregation, but the metric that should be used to calculate it is not something like avg or sum; it's just the raw doc_count of each bucket of the parent histogram (sales_per_month).
I got it to work like this, by using a stats agg:
"aggs": {
  "sales_per_month": {
    "date_histogram": {
      "field": "date",
      "interval": "month"
    },
    "aggs": {
      "sales": {
        "stats": {
          "field": "price"
        }
      },
      "sales_deriv": {
        "derivative": {
          "buckets_path": "sales.count"
        }
      }
    }
  }
}
Is this really the way to do this or am I missing a simpler way?
I don't think it can get any simpler than that. It looks good, simple and elegant.
There is no need to define a nested stats aggregation just for the purpose of referencing the count. There is an implicit _count property for each bucket, which corresponds to doc_count, and you can use it in buckets_path.
Since in your example you're referencing a contextual parent aggregation, you would simply reference _count (i.e. you're already in the context of the sales_per_month aggregation).
In your specific case you would use it like this:
"aggs": {
  "sales_per_month": {
    "date_histogram": {
      "field": "date",
      "interval": "month"
    },
    "aggs": {
      "sales_deriv": {
        "derivative": {
          "buckets_path": "_count"
        }
      }
    }
  }
}
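For intuition, the derivative over _count is just each bucket's doc_count minus the previous bucket's. A sketch in Python, with made-up monthly counts:

```python
# Made-up doc counts for three monthly buckets.
buckets = [
    {"key_as_string": "2016-01", "doc_count": 40},
    {"key_as_string": "2016-02", "doc_count": 55},
    {"key_as_string": "2016-03", "doc_count": 30},
]

# First-order difference, as the derivative pipeline agg computes it
# (the first bucket has no derivative value).
derivs = [cur["doc_count"] - prev["doc_count"]
          for prev, cur in zip(buckets, buckets[1:])]

print(derivs)  # -> [15, -25]
```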

Elastic search aggregation with range query

I am working to build an ES query that satisfies the condition >= avg.
Here is an example:
GET /_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "price": {
            "gte": {
              "aggs": {
                "single_avg_price": {
                  "avg": {
                    "field": "price"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
I get the following error
"type": "query_parsing_exception",
"reason": "[range] query does not support [aggs]",
I wonder how we can use an aggregated value in a range query in Elasticsearch.
You cannot embed aggregations inside a query. You need to first send an aggregation query to find out the average and then send a second range query using the obtained average value.
Query 1:
POST /_search
{
  "size": 0,
  "aggs": {
    "single_avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}
Then you get the average price -- say it was 12.3 -- and use it in your second query, like this:
Query 2:
POST /_search
{
  "size": 10,
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "price": {
            "gte": 12.3
          }
        }
      }
    }
  }
}
After trying different ES aggregations such as the bucket selector, I found that it can be done using Python.
Here is the Python code I created to solve this issue.
Please note: URL, USER_NAME and PASSWORD need to be filled in before running it.
#! /usr/bin/python
import json, requests
from requests.auth import HTTPBasicAuth

# static variables -- fill these in before running
URL = ''
USER_NAME = ''
PASSWORD = ''
HEADERS = {'Content-Type': 'application/json'}  # required by recent ES versions

# returns avg value
def getAvg():
    query = json.dumps({
        "aggs": {
            "single_avg_price": {
                "avg": {
                    "field": "price"
                }
            }
        }
    })
    response = requests.get(URL, auth=HTTPBasicAuth(USER_NAME, PASSWORD),
                            headers=HEADERS, data=query)
    results = json.loads(response.text)
    return results['aggregations']['single_avg_price']['value']

# returns rows that are greater than avg value
def rows_greater_than_avg(avg_value):
    query = json.dumps({
        "query": {
            "range": {
                "price": {
                    "gte": avg_value
                }
            }
        }
    })
    response = requests.get(URL, auth=HTTPBasicAuth(USER_NAME, PASSWORD),
                            headers=HEADERS, data=query)
    results = json.loads(response.text)
    return results

# main method
def main():
    avg_value = getAvg()
    print(rows_greater_than_avg(avg_value))

main()

Getting count and grouping by date range in elastic search

Is there a way to get the count of rows and group them by hour, day or month?
For instance, assume I have the messages
_source{
  "timestamp": "2013-10-01T12:30:25.421Z",
  "amount": 200
}
_source{
  "timestamp": "2013-10-01T12:35:25.421Z",
  "amount": 300
}
_source{
  "timestamp": "2013-10-02T13:53:25.421Z",
  "amount": 100
}
_source{
  "timestamp": "2013-10-03T15:53:25.421Z",
  "amount": 400
}
Is there a way to get something along the lines of {date, sum}? (Not necessarily in this format -- just wondering if there is any way I can achieve this.)
{
  {"2013-10-01T12:00:00.000Z", 500},
  {"2013-10-02T13:00:00.000Z", 100},
  {"2013-10-03T15:00:00.000Z", 400}
}
Thank you
Try aggregations. The interval can be set to hour, day, week or month as needed:
{
  "aggs": {
    "amount_per_month": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "week"
      },
      "aggs": {
        "total_amount": {
          "sum": {
            "field": "amount"
          }
        }
      }
    }
  }
}
In addition, if you want to count the number of documents instead, replace the sum content with:
"sum": {
  "script": "1"
}
Hope it helps.
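To turn the response into the {date, sum} pairs the question describes, here is a small Python sketch. The sample response is hand-built, but its bucket shape follows the standard date_histogram output for the question's documents:

```python
def totals_per_bucket(response_body):
    """(bucket date, summed amount) for each date_histogram bucket."""
    buckets = response_body["aggregations"]["amount_per_month"]["buckets"]
    return [(b["key_as_string"], b["total_amount"]["value"]) for b in buckets]

# Hand-built sample response for the documents in the question.
sample = {"aggregations": {"amount_per_month": {"buckets": [
    {"key_as_string": "2013-10-01T00:00:00.000Z", "doc_count": 2,
     "total_amount": {"value": 500.0}},
    {"key_as_string": "2013-10-02T00:00:00.000Z", "doc_count": 1,
     "total_amount": {"value": 100.0}},
    {"key_as_string": "2013-10-03T00:00:00.000Z", "doc_count": 1,
     "total_amount": {"value": 400.0}},
]}}}

print(totals_per_bucket(sample))
```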
I need a query to fetch from Elasticsearch the count of month-wise and year-wise registered customers on our platform.
The queries below work perfectly and return the data correctly.
Here, CustOnboardedOn is the field recording when the customer was onboarded.
Method type: POST
URL: http://SomeIP:9200/customer/_search?size=0
ES query for month-wise aggregated customers:
{
  "aggs": {
    "amount_per_month": {
      "date_histogram": {
        "field": "CustOnboardedOn",
        "interval": "month"
      }
    }
  }
}
ES query for year-wise aggregation:
{
  "aggs": {
    "amount_per_month": {
      "date_histogram": {
        "field": "CustOnboardedOn",
        "interval": "year"
      }
    }
  }
}
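Since these queries return plain doc_count values per bucket, extracting the per-period counts client-side is straightforward; a hedged Python sketch with a made-up response shape:

```python
def counts_per_bucket(response_body, agg_name="amount_per_month"):
    """(bucket date, customer count) pairs from a date_histogram response."""
    buckets = response_body["aggregations"][agg_name]["buckets"]
    return [(b["key_as_string"], b["doc_count"]) for b in buckets]

# Made-up sample response shape.
sample = {"aggregations": {"amount_per_month": {"buckets": [
    {"key_as_string": "2019-01-01T00:00:00.000Z", "doc_count": 12},
    {"key_as_string": "2019-02-01T00:00:00.000Z", "doc_count": 7},
]}}}

print(counts_per_bucket(sample))
```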