Defining a time range for aggregation in Elasticsearch

I've got an index in Elasticsearch whose documents hold info about user connections to my platform. I want to build a query with a per-day aggregation that counts all users connected on each day between two given dates.
I have 3 relevant fields for this: user_id, connection_time_start, connection_time_end. I was doing the query this way:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "connection_time_start": {
              "lte": "2017-08-04T23:59:59"
            }
          }
        },
        {
          "range": {
            "connection_time_end": {
              "gte": "2017-08-02T00:00:00"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "franja_horaria": {
      "date_histogram": {
        "field": "connection_time_start",
        "interval": "day",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "ids": {
          "cardinality": {
            "field": "user_id"
          }
        }
      }
    }
  }
}
This query returns buckets containing the number of users whose connection started on August 2, 3 and 4. The problem is that some users have connections that start on day 2 and end on day 3, or even on day 4.
These users should count toward the connected-user total for every day they were connected, but because I aggregate on connection_time_start, each one is counted only on the day the connection started.
I've tried adding a range to the aggregation, something like this (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-bucket-daterange-aggregation.html), but haven't got a good result.
Can anybody help me with this? Thanks in advance!
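One possible direction, not taken from the thread, is to replace the date_histogram with a filters aggregation that has one named filter per day, so a connection falls into every day it overlaps. The sketch below only builds the request body in Python; the helper name build_daily_overlap_query is hypothetical, and the field names are the ones from the question. Keep in mind that cardinality returns an approximate distinct count.

```python
from datetime import date, timedelta


def build_daily_overlap_query(first_day: date, last_day: date) -> dict:
    """Build a search body with one named filter bucket per calendar day.

    A connection matches a day's bucket when it started on or before the end
    of that day and ended on or after the start of that day.
    """
    day_filters = {}
    day = first_day
    while day <= last_day:
        key = day.isoformat()
        day_filters[key] = {
            "bool": {
                "must": [
                    {"range": {"connection_time_start": {"lte": f"{key}T23:59:59"}}},
                    {"range": {"connection_time_end": {"gte": f"{key}T00:00:00"}}},
                ]
            }
        }
        day += timedelta(days=1)

    return {
        "size": 0,
        "aggs": {
            "franja_horaria": {
                # One bucket per day, keyed by the ISO date string.
                "filters": {"filters": day_filters},
                # Distinct users inside each day bucket (approximate count).
                "aggs": {"ids": {"cardinality": {"field": "user_id"}}},
            }
        },
    }


body = build_daily_overlap_query(date(2017, 8, 2), date(2017, 8, 4))
print(sorted(body["aggs"]["franja_horaria"]["filters"]["filters"]))
```

Because each day is an independent filter, a user connected from day 2 to day 4 is counted in all three buckets. The request grows by one filter per day, so this is only a reasonable fit for short date ranges.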

Related

Elasticsearch sum with more than 10000 results

I would like to know how to do a sum aggregation over more than 10000 results.
I can't find it in the docs.
Thank you.
GET index/_search?pretty
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "created_at": {
              "gte": "2022-01-01 00:00",
              "format": "yyyy-MM-dd HH:mm"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "nb_sms": {
      "sum": {
        "field": "sms_count"
      }
    }
  },
  "size": 0
}
You can split the terms into partitions and then sum the results of the partitions.
See this link: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions
Partitioning does not split your data evenly, but it does not duplicate anything either: each term lands in exactly one partition.
So you can run a terms aggregation per partition, with a sum sub-aggregation under it, and add a sum_bucket pipeline aggregation over that sub-aggregation to get the partition total.
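As a rough sketch of the partition idea, the Python below builds one request body per partition and adds up the per-partition totals on the client. The terms field sender_id, the aggregation names by_partition and nb_sms_total, and the terms size of 10000 are assumptions made for illustration; the question does not say which field would be partitioned, and the size must be at least the number of distinct terms in a partition.

```python
def build_partition_body(partition: int, num_partitions: int,
                         field: str = "sender_id") -> dict:
    """Request body for one partition of a terms aggregation.

    `field` is a hypothetical keyword field used only to split the documents.
    """
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [
                    {"range": {"created_at": {"gte": "2022-01-01 00:00",
                                              "format": "yyyy-MM-dd HH:mm"}}}
                ]
            }
        },
        "aggs": {
            "by_partition": {
                "terms": {
                    "field": field,
                    # Only terms hashed into this partition are returned.
                    "include": {"partition": partition,
                                "num_partitions": num_partitions},
                    "size": 10000,
                },
                # Per-term sum of sms_count.
                "aggs": {"nb_sms": {"sum": {"field": "sms_count"}}},
            },
            # Sibling pipeline aggregation: total over this partition's buckets.
            "nb_sms_total": {"sum_bucket": {"buckets_path": "by_partition>nb_sms"}},
        },
    }


def total_from_responses(responses) -> float:
    """Add up the sum_bucket value returned for each partition."""
    return sum(r["aggregations"]["nb_sms_total"]["value"] for r in responses)


bodies = [build_partition_body(p, 2) for p in range(2)]
# Hand-written stand-ins for the two search responses.
fake_responses = [
    {"aggregations": {"nb_sms_total": {"value": 120.0}}},
    {"aggregations": {"nb_sms_total": {"value": 80.0}}},
]
print(total_from_responses(fake_responses))
```

Each body would be sent as a separate search request; the responses here are hand-written stand-ins, not real output.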

Find lowest price each day in list of documents

We store documents in Elasticsearch with the following fields:
a (keyword)
b (keyword)
c (keyword)
date (date-time)
p (long)
How can I find the lowest value of p for each date between 12/1/1920 and 12/31/1920 (Pacific time zone)?
You need to combine two elements: a query and nested aggregations. First you create one bucket per day with a date_histogram (the outer aggregation), and then you take the min of p inside each bucket with a sub-aggregation.
Here's what the query should look like:
{
  "query": {
    "range": {
      "date": {
        "gte": "1920-12-01",
        "lte": "1920-12-31"
      }
    }
  },
  "aggs": {
    "daily": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day"
      },
      "aggs": {
        "min_price": {
          "min": {"field": "p"}
        }
      }
    }
  }
}
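The question asks for Pacific time, which the query above does not address. A minimal sketch of one way to handle it, assuming the time_zone parameter is set on both the range query and the date_histogram: the helper names, the America/Los_Angeles zone ID and the exclusive upper bound are choices made here, not part of the original answer.

```python
def build_daily_min_query(start: str, end_exclusive: str,
                          tz: str = "America/Los_Angeles") -> dict:
    """Daily minimum of p, with day boundaries computed in the given time zone."""
    return {
        "size": 0,
        "query": {
            "range": {
                # An exclusive upper bound avoids relying on how a bare
                # date is rounded at the end of the range.
                "date": {"gte": start, "lt": end_exclusive, "time_zone": tz}
            }
        },
        "aggs": {
            "daily": {
                "date_histogram": {
                    "field": "date",
                    "calendar_interval": "day",
                    "time_zone": tz,
                    "format": "yyyy-MM-dd",
                },
                "aggs": {"min_price": {"min": {"field": "p"}}},
            }
        },
    }


def daily_minimums(response: dict) -> dict:
    """Map each bucket's date string to its minimum price."""
    buckets = response["aggregations"]["daily"]["buckets"]
    return {b["key_as_string"]: b["min_price"]["value"] for b in buckets}


body = build_daily_min_query("1920-12-01", "1921-01-01")
# Hand-written response fragment, for illustration only.
fake_response = {
    "aggregations": {
        "daily": {
            "buckets": [
                {"key_as_string": "1920-12-01", "doc_count": 3, "min_price": {"value": 7.0}},
                {"key_as_string": "1920-12-02", "doc_count": 1, "min_price": {"value": 4.0}},
            ]
        }
    }
}
print(daily_minimums(fake_response))
```

Without time_zone, the buckets are cut at UTC midnight, so a document stamped late in the Pacific evening would land in the following day's bucket.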

How to calculate the number of empty bucket when aggregating by days?

I want to get the number of days that a person stayed in a town in May (Month equal to 5).
This is my query, but it gives me the number of entries in myindex that have PersonID equal to 111 and Month equal to 5. For example, it may output 90, even though a month has at most 31 days.
GET myindex/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "PersonID": "111"
          }
        },
        {
          "match": {
            "Month": "5"
          }
        }
      ]
    }
  },
  "aggs": {
    "stay_days": {
      "terms": {
        "field": "Month"
      }
    }
  }
}
In myindex I have fields like DateTime, holding the date and time when a person was registered by a camera, e.g. "2017-05-01T00:30:08". During a single day the same person may pass the camera several times, but that should be counted as one day.
How can I update my query to calculate the number of days per month instead of the number of camera captures?
Assuming your DateTime field is called datetime, one option is a date_histogram aggregation:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "PersonID": "111"
          }
        },
        {
          "range": {
            "datetime": {
              "gte": "2017-05-01",
              "lt": "2017-06-01"
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "my_day_histogram": {
      "date_histogram": {
        "field": "datetime",
        "interval": "1d",
        "min_doc_count": 1
      }
    }
  }
}
Note that in the must clause I used a range query on the datetime field. This is not required, but it makes the Month field redundant. You may also need to adjust the date format in the range query to match your mapping.
my_day_histogram: splits the data into one bucket per day by setting "interval": "1d".
"min_doc_count": 1 removes buckets that contain zero documents.
Another approach is to remove the range/match for month 5 and extend the histogram to every day in the year.
The days can then be grouped under a month histogram, like so:
"aggregations": {
  "my_month_histogram": {
    "date_histogram": {
      "field": "datetime",
      "interval": "1M",
      "min_doc_count": 1
    },
    "aggregations": {
      "my_day_histogram": {
        "date_histogram": {
          "field": "datetime",
          "interval": "1d",
          "min_doc_count": 1
        }
      }
    }
  }
}
Either way, you'll need to count the number of day buckets, since that count is the number of days.
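Counting the buckets happens on the client side. A minimal sketch in Python, assuming the response has already been parsed into a dict and the aggregation is named my_day_histogram as above; the sample response is hand-written for illustration.

```python
def count_days(response: dict, agg_name: str = "my_day_histogram") -> int:
    """Count the day buckets that contain at least one document."""
    buckets = response["aggregations"][agg_name]["buckets"]
    # The doc_count check keeps the result right even if empty buckets
    # were returned (i.e. min_doc_count was 0).
    return sum(1 for bucket in buckets if bucket["doc_count"] > 0)


# Hypothetical response fragment: three captures on May 1, one on May 3.
sample_response = {
    "aggregations": {
        "my_day_histogram": {
            "buckets": [
                {"key_as_string": "2017-05-01T00:00:00.000Z", "doc_count": 3},
                {"key_as_string": "2017-05-02T00:00:00.000Z", "doc_count": 0},
                {"key_as_string": "2017-05-03T00:00:00.000Z", "doc_count": 1},
            ]
        }
    }
}
print(count_days(sample_response))  # prints 2
```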

Applying filters on results of aggregation in elastic search

I am stuck on a problem where I need to apply some filters to the results of an aggregation in Elasticsearch.
For example, assume that the following are the fields:
event_name, location, time, user_id
My requirement is to get the IDs of users who have performed a specific action (let's say "logged_in") at least 5 times in the last month. I am able to get the users who have logged in during the last month, but how do I filter the results further?
The query I have written is:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "time": {
                  "from": 1412312824,
                  "to": 1422142824
                }
              }
            },
            {
              "term": {
                "action": "logged_in"
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "result": {
      "terms": {
        "field": "user_id"
      }
    }
  }
}
Sample output:
user_id  doc_count
1        10
2        25
3        1
4        2
I need to apply a filter to the above result. How do I do it?
I believe you can just add a min_doc_count key to your terms aggregation, like so:
...
"aggs": {
  "result": {
    "terms": {
      "field": "user_id",
      "min_doc_count": 5
    }
  }
}
...
Source: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-aggregations-bucket-terms-aggregation.html#_minimum_document_count
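To illustrate what min_doc_count does to the sample output from the question, here is the same threshold applied on the client side in Python. The function name is made up for this sketch; the bucket values are the ones listed in the question.

```python
def users_with_min_count(buckets, min_count):
    """Return the bucket keys whose doc_count meets the threshold."""
    return [b["key"] for b in buckets if b["doc_count"] >= min_count]


# The sample output from the question, in terms-bucket form.
sample_buckets = [
    {"key": 1, "doc_count": 10},
    {"key": 2, "doc_count": 25},
    {"key": 3, "doc_count": 1},
    {"key": 4, "doc_count": 2},
]
print(users_with_min_count(sample_buckets, 5))  # prints [1, 2]
```

Doing the filtering in the aggregation itself is usually preferable, because a terms aggregation returns only the top 10 buckets by default; a "size" setting on the terms aggregation raises that limit.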

How to limit a date histogram aggregation of nested documents to a specific date range?

Version
Using Elasticsearch 1.7.2
Objective
I would like to create a graph of the number of predictions made by users per day for the last n days. In this case, 10 days.
Current query
{
  "size": 0,
  "aggs": {
    "predictions": {
      "nested": {
        "path": "user_answers"
      },
      "aggs": {
        "predictions_over_time": {
          "date_histogram": {
            "field": "user_answers.created",
            "interval": "day",
            "format": "yyyy-MM-dd",
            "min_doc_count": 0
          }
        }
      }
    }
  }
}
Issue
This query returns a histogram, but with buckets for all available dates across all documents. It isn't restricted to a specific date range.
What have I tried?
I've tried a number of approaches to solving this, all of which have failed.
* Range filter, then histogram that
* Date range aggregation, then histogram the buckets
* Using extended_bounds with full dates, now-10d, and also timestamps
* Trying a range filter inside the histogram aggregation
Any guidance would be appreciated! Thanks.
A query didn't work for me in that situation; what I used instead was a third level of aggs:
{
  "size": 0,
  "aggs": {
    "user_answers": {
      "nested": { "path": "user_answers" },
      "aggs": {
        "timed_user_answers": {
          "filter": {
            "range": {
              "user_answers.created": {
                "gte": "now-10d",
                "lte": "now"
              }
            }
          },
          "aggs": {
            "predictions_over_time": {
              "date_histogram": {
                "field": "user_answers.created",
                "interval": "day",
                "format": "yyyy-MM-dd",
                "min_doc_count": 0
              }
            }
          }
        }
      }
    }
  }
}
One aggs specifies nested, one specifies the filter, and the last specifies the actual aggregation. I don't know why the syntax works this way, but you don't seem to be able to use two of them on the same aggs.
You need to add a query. It can be any kind of query except a post_filter, which is applied only after the aggregations are computed. It should be nested and contain the date range. One way is to define a constant_score query that wraps a nested filter, which in turn wraps a range filter.
{
  "query": {
    "constant_score": {
      "filter": {
        "nested": {
          "path": "user_answers",
          "filter": {
            "range": {
              "user_answers.created": {
                "gte": "now-10d",
                "lte": "now"
              }
            }
          }
        }
      }
    }
  }
}
Confirm if this works for you.
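For a "last n days" window, the lower bound belongs in gte and the upper bound in lte, written as date math without spaces. A small Python sketch that builds the nested, filter and date_histogram layers for a configurable number of days; the function name is hypothetical, and rounding the lower bound to the start of the day with /d is an assumption about what the chart needs.

```python
def build_recent_predictions_query(days: int = 10) -> dict:
    """Histogram of nested user_answers created within the last `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    date_range = {
        # Lower bound: `days` days ago, rounded down to the start of that day.
        "gte": f"now-{days}d/d",
        # Upper bound: the current moment.
        "lte": "now",
    }
    return {
        "size": 0,
        "aggs": {
            "user_answers": {
                "nested": {"path": "user_answers"},
                "aggs": {
                    "timed_user_answers": {
                        "filter": {"range": {"user_answers.created": date_range}},
                        "aggs": {
                            "predictions_over_time": {
                                "date_histogram": {
                                    "field": "user_answers.created",
                                    "interval": "day",
                                    "format": "yyyy-MM-dd",
                                    "min_doc_count": 0,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


body = build_recent_predictions_query(10)
print(body["aggs"]["user_answers"]["aggs"]["timed_user_answers"]["filter"])
```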
