Find lowest price each day in list of documents - elasticsearch

We store documents in Elasticsearch with the following fields:
a (keyword)
b (keyword)
c (keyword)
date (date-time)
p (long)
How can we find the lowest value of p for each date between 12/1/1920 and 12/31/1920 (Pacific time zone)?

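You need to combine two elements: a query and a sub-aggregation. A sub-aggregation is an aggregation nested inside a bucket aggregation: first you create one bucket per day with a date_histogram, and then you take the min of p within each bucket. (Pipeline aggregations are something different: they operate on the output of other aggregations and are not needed here.)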
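Here's what the query should look like. The time_zone parameter is set on both the range query and the histogram so that the date bounds and the day buckets follow Pacific time rather than UTC:
{
  "query": {
    "range": {
      "date": {
        "gte": "1920-12-01",
        "lte": "1920-12-31",
        "time_zone": "America/Los_Angeles"
      }
    }
  },
  "aggs": {
    "daily": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day",
        "time_zone": "America/Los_Angeles"
      },
      "aggs": {
        "min_price": {
          "min": { "field": "p" }
        }
      }
    }
  }
}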
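Once the search returns, the daily minimums still have to be read out of the nested buckets. The following is a minimal Python sketch of that step, assuming the aggregation names daily and min_price from the query above. The helper name daily_minimums and the sample response are made up for illustration, and the exact key_as_string label depends on your mapping and format settings.

```python
from typing import Any, Dict, Optional


def daily_minimums(response: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Map each day bucket's label to its minimum p (None for days without documents)."""
    buckets = response["aggregations"]["daily"]["buckets"]
    return {b["key_as_string"]: b["min_price"]["value"] for b in buckets}


# Hand-written sample in the shape of a search response, for illustration only.
sample_response = {
    "aggregations": {
        "daily": {
            "buckets": [
                {"key_as_string": "1920-12-01", "doc_count": 3, "min_price": {"value": 120.0}},
                {"key_as_string": "1920-12-02", "doc_count": 0, "min_price": {"value": None}},
            ]
        }
    }
}

for day, lowest in daily_minimums(sample_response).items():
    print(day, lowest)
```

Days without documents come back with a null min, which is why the sketch allows None as a value.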

Related

Elasticsearch aggregate on term multiple times per different time range

I'm trying to aggregate a field by each half of the time-range given in the query. For example, here's the query:
{
"query": {
"simple_query_string": {
"query": "+sitetype:(redacted) +sort_date:[now-2h TO now]"
}
}
}
...and I want to aggregate on term "product1.keyword" from now-2h to now-1h and aggregate on the same term "product1.keyword" from now-1h to now, so like:
"terms": {
"field": "product1",
"size": 10
}
^ aggregate the top 10 results on product1 in now-2h TO now-1h,
and aggregate the top 10 results on product1 in now-1h TO now.
Clarification: product1 is not a date or time-related field. It would be like a type of car, phone, etc.
Date math with now only applies to date fields, so run a date_range aggregation on the date field from your query (sort_date), not on product1, and nest a terms aggregation on product1.keyword inside it to get the top 10 terms of each hour:
GET index1/_search
{
  "size": 0,
  "aggs": {
    "dataAgg": {
      "date_range": {
        "field": "sort_date",
        "ranges": [
          { "from": "now-2h", "to": "now-1h" },
          { "from": "now-1h", "to": "now" }
        ]
      },
      "aggs": {
        "top10": {
          "terms": {
            "field": "product1.keyword",
            "size": 10
          }
        }
      }
    }
  }
}
If the time field cannot be mapped as a date type, you can try a range aggregation on a numeric timestamp instead, but then you must write the bounds explicitly instead of using now.

How to calculate the number of empty bucket when aggregating by days?

I want to get the number of days that a person stayed in a town in May (Month equal to 5).
This is my query, but it gives me the number of entries in myindex that have PersonID equal to 111 and Month equal to 5. For example, it may return 90, even though a month has at most 31 days.
GET myindex/_search
{
"size":0,
"query": {
"bool": {
"must": [
{ "match": {
"PersonID": "111"
}},
{ "match": {
"Month": "5"
}}
]
} },
"aggs": {
"stay_days": {
"terms" : {
"field": "Month"
}
}
}
}
In myindex I have fields like DateTime with the date and time when a person was registered by a camera, e.g. "2017-05-01T00:30:08". During a single day the same person may pass the camera several times, but that day should be counted only once.
How can I update my query to calculate the number of days per month instead of the number of camera captures?
Assuming your DateTime field is called datetime, one option to consider is a date_histogram aggregation:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"PersonID": "111"
}
},
{
"range": {
"datetime": {
"gte": "2017-05-01",
"lt": "2017-06-01"
}
}
}
]
}
},
"aggregations": {
"my_day_histogram": {
"date_histogram": {
"field": "datetime",
"interval": "1d",
"min_doc_count": 1
}
}
}
}
Note that in the must clause I used a range query on the datetime field instead of matching on Month (not required, but it makes the Month field redundant). You may also need to adjust the date format in the range query to match your mapping.
my_day_histogram divides the data into one bucket per day by setting "interval": "1d".
"min_doc_count": 1 removes buckets that contain zero documents.
Another approach is to remove the range/match for month 5 and extend the histogram to every day in the year.
This can also be combined with a month histogram, like so:
"aggregations": {
  "my_month_histogram": {
    "date_histogram": {
      "field": "datetime",
      "interval": "1M",
      "min_doc_count": 1
    },
    "aggregations": {
      "my_day_histogram": {
        "date_histogram": {
          "field": "datetime",
          "interval": "1d",
          "min_doc_count": 1
        }
      }
    }
  }
}
Either way, you'll need to count the number of buckets returned, which gives the number of days.
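Counting the buckets happens on the client side. A minimal Python sketch, assuming the my_day_histogram aggregation name from the first query above; the helper name count_days and the sample response are hypothetical and only illustrate the response shape:

```python
from typing import Any, Dict


def count_days(response: Dict[str, Any], agg_name: str = "my_day_histogram") -> int:
    """Count the day buckets that contain at least one document."""
    buckets = response["aggregations"][agg_name]["buckets"]
    # The doc_count check also covers histograms run without min_doc_count: 1.
    return sum(1 for b in buckets if b["doc_count"] > 0)


# Hand-written sample response: the person was seen on three different days.
sample_response = {
    "aggregations": {
        "my_day_histogram": {
            "buckets": [
                {"key_as_string": "2017-05-01T00:00:00.000Z", "doc_count": 4},
                {"key_as_string": "2017-05-03T00:00:00.000Z", "doc_count": 1},
                {"key_as_string": "2017-05-07T00:00:00.000Z", "doc_count": 2},
            ]
        }
    }
}

print(count_days(sample_response))  # prints 3
```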

Defining a time range for aggregation in elasticsearch

I've got an index in ElasticSearch with documents having info about user connections to my platform. I want to build a query with day aggregation where I can count all users connected every day between two given dates.
I have 3 relevant fields to do so: user_id, connection_time_start, connection_time_end. I was doing the query this way:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"connection_time_start": {
"lte": "2017-08-04T23:59:59"
}
}
},
{
"range": {
"connection_time_end": {
"gte": "2017-08-02T00:00:00"
}
}
}
]
}
},
"aggs": {
"franja_horaria": {
"date_histogram": {
"field": "connection_time_start",
"interval": "day",
"format": "yyyy-MM-dd"
},
"aggs": {
"ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
}
}
This query returns buckets containing the number of users whose connection started on August 2, 3 and 4. The problem is that some users have connections that start on day 2 and end on day 3 or even day 4.
These users should be counted as connected on each of those days, but because the aggregation is on connection_time_start, each one is only counted on the day the connection started.
I've tried adding a range to the aggregation, something like this (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-bucket-daterange-aggregation.html), but haven't got a good result.
Can anybody help me with this? Thanks in advance!
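One possible direction, which is not from the original thread, is to replace the date_histogram with a filters aggregation that has one named filter per day, where each filter matches every connection that overlaps that day. The Python sketch below only builds such a request body. It assumes the field names from the question and a version that supports the bool query's filter clause; the function name overlap_filters_body and the aggregation name per_day are made up for illustration.

```python
from datetime import date, timedelta
from typing import Any, Dict


def overlap_filters_body(first_day: date, last_day: date) -> Dict[str, Any]:
    """Build one named filter per day; each matches connections overlapping that day."""
    filters: Dict[str, Any] = {}
    day = first_day
    while day <= last_day:
        next_day = day + timedelta(days=1)
        filters[day.isoformat()] = {
            "bool": {
                "filter": [
                    # The connection started before this day ended ...
                    {"range": {"connection_time_start": {"lt": next_day.isoformat()}}},
                    # ... and ended on or after the start of this day.
                    {"range": {"connection_time_end": {"gte": day.isoformat()}}},
                ]
            }
        }
        day = next_day
    return {
        "size": 0,
        "aggs": {
            "per_day": {
                "filters": {"filters": filters},
                # Distinct users per day, as in the original query.
                "aggs": {"ids": {"cardinality": {"field": "user_id"}}},
            }
        },
    }


body = overlap_filters_body(date(2017, 8, 2), date(2017, 8, 4))
print(list(body["aggs"]["per_day"]["filters"]["filters"]))
```

A connection that starts on day 2 and ends on day 4 matches all three filters, so the user is counted once in each of the three daily buckets. The request grows by one filter per day, so this suits short date ranges better than long ones.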

Elastic search date_histogram extended_bounds

I want to get a date_histogram for a specific period. How do I restrict the date range? Should I use the extended_bounds parameter? For example, I want the date_histogram between '2016-08-01' and '2016-08-31' with an interval of one day. I query with this expression:
{
"aggs": {
"cf_loan": {
"date_histogram": {
"field": "createDate",
"interval": "day",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-08-01",
"max": "2016-08-31"
}
}
}
}
}
But the date_histogram I get back is not limited to that range.
You're almost there, you need to add a range query in order to only select documents whose createDate field is in the desired range.
{
"query": {
"range": {
"createDate": {
"gte": "2016-08-01T00:00:00.000Z",
"lt": "2016-09-01T00:00:00.000Z"
}
}
},
"aggs": {
"cf_loan": {
"date_histogram": {
"field": "createDate",
"interval": "day",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-08-01",
"max": "2016-08-31"
}
}
}
}
}
The role of the extended_bounds parameter is to make sure you'll get daily buckets from min to max even if there are no documents in them. For instance, say you have 1 document each day between 2016-08-04 and 2016-08-28, then without the extended_bounds parameter, you'd get 25 buckets (2016-08-04, 2016-08-05, 2016-08-06, ..., 2016-08-28).
With the extended_bounds parameter, you'll also get the following buckets but with 0 documents:
2016-08-01
2016-08-02
2016-08-03
2016-08-29
2016-08-30
2016-08-31
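The effect described above can be imitated on the client side to make it concrete. This is only an illustration in Python of what extended_bounds does to the bucket list, not Elasticsearch code; the function name fill_daily_buckets is hypothetical.

```python
from datetime import date, timedelta
from typing import Dict


def fill_daily_buckets(counts: Dict[date, int], bounds_min: date, bounds_max: date) -> Dict[date, int]:
    """Add a zero-count bucket for every missing day between bounds_min and bounds_max."""
    # Existing buckets are kept as they are, even outside the bounds:
    # like extended_bounds, this extends the histogram and never filters it.
    filled = dict(counts)
    day = bounds_min
    while day <= bounds_max:
        filled.setdefault(day, 0)
        day += timedelta(days=1)
    return dict(sorted(filled.items()))


# One document per day from 2016-08-04 to 2016-08-28, as in the example above.
counts = {date(2016, 8, d): 1 for d in range(4, 29)}
filled = fill_daily_buckets(counts, date(2016, 8, 1), date(2016, 8, 31))
empty_days = [d.isoformat() for d, n in filled.items() if n == 0]
print(len(counts), len(filled), empty_days)
```

The sketch goes from 25 buckets to 31, and the six added days are exactly the empty ones listed above.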

How to limit a date histogram aggregation of nested documents to a specific date range?

Version
Using Elasticsearch 1.7.2
Objective
I would like to create a graph of the number of predictions made by users per day for the last n days. In this case, 10 days.
Current query
{
"size": 0,
"aggs": {
"predictions": {
"nested": {
"path": "user_answers"
},
"aggs": {
"predictions_over_time": {
"date_histogram": {
"field": "user_answers.created",
"interval": "day",
"format": "yyyy-MM-dd",
"min_doc_count": 0
}
}
}
}
}
}
Issue
This query will return a histogram but will return buckets for all available dates across all documents. It doesn't restrict to a specific date range.
What have I tried?
I've tried a number of approaches to solving this, all of which have failed.
* Range filter, then histogram that
* Date range aggregation, then histogram the buckets
* Using extended_bounds with full dates, now-10d, and also timestamps
* Trying a range filter inside the histogram aggregation
Any guidance would be appreciated! Thanks.
A query didn't work for me in that situation. What I used instead is a third level of aggs:
{
"size": 0,
"aggs": {
"user_answers": {
"nested": { "path": "user_answers" },
"aggs": {
"timed_user_answers": {
"filter": {
"range": {
"user_answers.created": {
"gte": "now-10d",
"lte": "now"
}
}
},
"aggs": {
"predictions_over_time": {
"date_histogram": {
"field": "user_answers.created",
"interval": "day",
"format": "yyyy-MM-dd",
"min_doc_count": 0
}
}
}
}
}
}
}
}
One aggs specifies nested, one specifies the filter, and the last specifies the actual aggregation. I don't know why this syntax makes sense, but you don't seem to be able to use two of them on the same aggs.
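Since the question asks for the last n days, the three levels can be generated with the window as a parameter. A minimal Python sketch, assuming the field and aggregation names from the query above, with the lower bound written as now-Nd and the upper bound as now; the function name predictions_histogram_body is hypothetical.

```python
from typing import Any, Dict


def predictions_histogram_body(days: int = 10) -> Dict[str, Any]:
    """Build the nested -> filter -> date_histogram request for the last `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    created = "user_answers.created"
    return {
        "size": 0,
        "aggs": {
            # Level 1: step into the nested user_answers documents.
            "user_answers": {
                "nested": {"path": "user_answers"},
                "aggs": {
                    # Level 2: keep only nested documents inside the time window.
                    "timed_user_answers": {
                        "filter": {"range": {created: {"gte": f"now-{days}d", "lte": "now"}}},
                        "aggs": {
                            # Level 3: the histogram itself.
                            "predictions_over_time": {
                                "date_histogram": {
                                    "field": created,
                                    "interval": "day",
                                    "format": "yyyy-MM-dd",
                                    "min_doc_count": 0,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


body = predictions_histogram_body(10)
print(body["aggs"]["user_answers"]["aggs"]["timed_user_answers"]["filter"])
```

In the response, the buckets sit under the same three names: aggregations, then user_answers, then timed_user_answers, then predictions_over_time.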
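You need to add a query. It can be any kind of query except a post_filter, but it should be nested and contain the date range. One way is to define a constant_score query and, inside it, use a nested filter that wraps a range filter. Note that this only selects the parent documents that have at least one answer in the range; a nested aggregation still sees every user_answers entry of those documents.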
{
"query": {
"constant_score": {
"filter": {
"nested": {
"path": "user_answers",
"filter": {
"range": {
"user_answers.created": {
"gte": "now-10d",
"lte": "now"
}
}
}
}
}
}
}
}
Confirm if this works for you.
