Elasticsearch date_histogram does not cover entire filter range

I'm using the ES date histogram aggregation, and some odd behavior started happening that I can't explain.
This is the request I'm sending to Elasticsearch:
{
"from": 0,
"size": 0,
"query": {
"filtered": {
"filter": {
"and": [
{
"bool": {
"must": [
{
"range": {
"publishTime": {
"from": "2010-07-02T12:15:20.000Z",
"to": "2015-07-08T12:43:59.000Z"
}
}
}
]
}
}
]
}
}
},
"aggs": {
"agg|date_histogram|publishTime": {
"date_histogram": {
"field": "publishTime",
"interval": "1d",
"min_doc_count": 0
}
}
}
}
The results I'm getting are buckets, and the first bucket is:
{
"key_as_string": "2010-08-24T00:00:00.000Z",
"key": 1282608000000,
"doc_count": 1
}
So I'm filtering from 2010-07-02 but only getting results from 2010-08-24 onward.
This is just an example; I have also seen this behavior with many more missing buckets (several months' worth).
[edit]
This seems to correlate with the date of the first result: the first document in that time range is from 2010-08-24. But since I included "min_doc_count": 0, I expected to get buckets for the entire range.

min_doc_count only guarantees empty buckets between the first and last documents matched by your filter. To get buckets for the entire range, you also need extended_bounds (here set to the filter's from and to values in epoch milliseconds):
"aggs": {
"agg|date_histogram|publishTime": {
"date_histogram": {
"field": "publishTime",
"interval": "1d",
"min_doc_count": 0,
"extended_bounds": {
"min": 1278072920000,
"max": 1436359439000
}
}
}
}
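The min and max values above are the filter's from and to timestamps expressed as epoch milliseconds. As a minimal sketch, the bounds could be derived from the same strings used in the range filter so the two cannot drift apart. The helper names to_epoch_millis and date_histogram_with_bounds are made up for illustration and are not part of any Elasticsearch client:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_utc: str) -> int:
    """Convert a UTC timestamp like '2010-07-02T12:15:20.000Z' to epoch milliseconds."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%S.%fZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division by a timedelta avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def date_histogram_with_bounds(field: str, interval: str, start: str, end: str) -> dict:
    """Build a date_histogram body whose extended_bounds match the filter range."""
    return {
        "date_histogram": {
            "field": field,
            "interval": interval,
            "min_doc_count": 0,
            "extended_bounds": {
                "min": to_epoch_millis(start),
                "max": to_epoch_millis(end),
            },
        }
    }


# Same from/to values as the range filter in the question.
agg = date_histogram_with_bounds(
    "publishTime", "1d", "2010-07-02T12:15:20.000Z", "2015-07-08T12:43:59.000Z"
)
print(agg["date_histogram"]["extended_bounds"])
```

The printed bounds should match the literal numbers used in the answer.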

Related

Filter an elasticsearch result after an aggregation

I have this Elasticsearch query that gets every x-location for which the number of documents (with a timestamp within the last month) is greater than 5000. I'm also able to get the most recent timestamp for each of these x-locations.
Is it possible to add an additional filter at the end of the query, to ignore all x-locations whose most recent timestamp is older than 2 days?
The query:
GET /mypattern-*/_search
{
"query": {
"bool": {
"must": [
{"match": {"method": "GET"}},
{
"range": {
"timestamp": {
"gte": "now-1M"
}
}
}
]
}
},
"aggs": {
"location_terms": {
"terms": {
"field": "x-location.keyword",
"min_doc_count": 500,
"size": 1000,
"order": {
"recent_timestamp": "desc"
}
},
"aggs": {
"recent_timestamp": {
"max": {
"field": "timestamp"
}
}
}
}
}
}

Elasticsearch: How set 'doc_count' of a FILTER-Aggregation in relation to total 'doc_count'

A seemingly trivial problem prompted me to reread the Elasticsearch documentation carefully today. So far, however, I have not found a solution.
Question:
Is there a simple way to express the doc_count of a filter aggregation relative to the total doc_count?
Here's a snippet from my search request JSON.
In the feature_occurrences aggregation I filter documents.
Now I want to calculate the ratio of filtered docs to all docs in each time bucket.
GET my_index/_search
{
"aggs": {
"time_buckets": {
"date_histogram": {
"field": "date",
"calendar_interval": "1d",
"min_doc_count": 0
},
"aggs": {
"feature_occurrences": {
"filter": {
"term": {
"x": "y"
}
}
},
"feature_occurrences_per_doc" : {
// feature_occurences.doc_count / doc_count
}
Any ideas?
You can use a bucket_script pipeline aggregation to calculate the ratio:
{
"aggs": {
"date": {
"date_histogram": {
"field": "@timestamp",
"interval": "hour"
},
"aggs": {
"feature_occurrences": {
"filter": {
"term": {
"cloud.region": "westeurope"
}
}
},
"ratio": {
"bucket_script": {
"buckets_path": {
"doc_count": "_count",
"features_count": "feature_occurrences._count"
},
"script": "params.features_count / params.doc_count"
}
}
}
}
}
}
Elastic bucket script doc:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html
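For reference, the ratio that the script expresses is the filter sub-aggregation's doc_count divided by the bucket's own doc_count. Below is a minimal client-side sketch of the same arithmetic, assuming the response buckets have the usual date_histogram shape. The function name feature_ratio and the sample data are invented for illustration:

```python
from typing import Optional


def feature_ratio(bucket: dict, filter_name: str = "feature_occurrences") -> Optional[float]:
    """Return filtered doc_count / total doc_count for one date_histogram bucket."""
    total = bucket["doc_count"]
    if total == 0:
        # Empty bucket: the ratio is undefined, so report None instead of dividing by zero.
        return None
    return bucket[filter_name]["doc_count"] / total


# Hypothetical buckets, shaped like the "buckets" array of a date_histogram response.
sample_buckets = [
    {
        "key_as_string": "2021-01-01T00:00:00.000Z",
        "doc_count": 8,
        "feature_occurrences": {"doc_count": 2},
    },
    {
        "key_as_string": "2021-01-01T01:00:00.000Z",
        "doc_count": 0,
        "feature_occurrences": {"doc_count": 0},
    },
]
ratios = [feature_ratio(b) for b in sample_buckets]
print(ratios)
```

The guard for empty buckets matters mainly when min_doc_count is 0, as in the question, because those buckets have a total of zero.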

Elasticsearch aggregation return null values as 0?

I can count the hits per day that match my queried string with this code, but if the span of a whole week has no hits, then the query will return nothing - as opposed to returning 0 for each day. Is there a way I can 'default' to 0?
GET index/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{"match_phrase": {
"message": "Cannot login"
}
},
{"range": {
"@timestamp":{
"gte":"2021-07-01",
"lte":"2021-07-07"
}
}
}
]
}
},
"aggs": {
"hit_count_per_day": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "day"
}
}
}
}
For this purpose, you need to add extended_bounds to your aggregation, as shown below:
GET index/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{"match_phrase": {
"message": "Cannot login"
}
},
{"range": {
"@timestamp":{
"gte":"2021-07-01",
"lte":"2021-07-07"
}
}
}
]
}
},
"aggs": {
"hit_count_per_day": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "day",
"extended_bounds": {
"min": "2021-07-01",
"max": "2021-07-07"
}
}
}
}
}
Please let me know if it did not solve your problem.
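If changing the request is not an option, the same zero-filling can be done on the client. This is a different technique from extended_bounds, sketched here under the assumption that each bucket carries a key_as_string in the default ISO format. The function daily_counts and the sample bucket are hypothetical:

```python
from datetime import date, timedelta


def daily_counts(buckets: list, start: date, end: date) -> dict:
    """Map every day in [start, end] to its doc_count, defaulting to 0."""
    # key_as_string begins with the ISO date, e.g. "2021-07-03T00:00:00.000Z".
    found = {b["key_as_string"][:10]: b["doc_count"] for b in buckets}
    counts = {}
    day = start
    while day <= end:
        counts[day.isoformat()] = found.get(day.isoformat(), 0)
        day += timedelta(days=1)
    return counts


# Hypothetical response containing a single non-empty day.
buckets = [{"key_as_string": "2021-07-03T00:00:00.000Z", "doc_count": 4}]
result = daily_counts(buckets, date(2021, 7, 1), date(2021, 7, 7))
print(result)
```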
When Elasticsearch returns 0 counts in an aggregation response, it means those values exist in the index but were excluded by the filters.
For example, I had a term filter on "region":"mexico" and it returned:
"colombia": 0
"argentina": 0
"mexico": 7
The first two appear because they are in the dataset (index) but were filtered out.
Hope this helps.

Elasticsearch Aggregations: Only return results of one of them?

I'm trying to find a way to only return the results of one aggregation in an Elasticsearch query. I have a max bucket aggregation (the one that I want to see) that is calculated from a sum bucket aggregation based on a date histogram aggregation. Right now, I have to go through 1,440 results to get to the one I want to see. I've already removed the results of the base query with the size: 0 modifier, but is there a way to do something similar with the aggregations as well? I've tried slipping the same thing into a few places with no luck.
Here's the query:
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "2018-11-28",
"lte": "2018-11-28"
}
}
},
"aggs": {
"hits_per_minute": {
"date_histogram": {
"field": "timestamp",
"interval": "minute"
},
"aggs": {
"total_hits": {
"sum": {
"field": "hits_count"
}
}
}
},
"max_transactions_per_minute": {
"max_bucket": {
"buckets_path": "hits_per_minute>total_hits"
}
}
}
}
Fortunately enough, you can do that with bucket_sort aggregation, which was added in Elasticsearch 6.4.
Do it with bucket_sort
POST my_index/doc/_search
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "2018-11-28",
"lte": "2018-11-28"
}
}
},
"aggs": {
"hits_per_minute": {
"date_histogram": {
"field": "timestamp",
"interval": "minute"
},
"aggs": {
"total_hits": {
"sum": {
"field": "hits_count"
}
},
"max_transactions_per_minute": {
"bucket_sort": {
"sort": [
{"total_hits": {"order": "desc"}}
],
"size": 1
}
}
}
}
}
}
This will give you a response like this:
{
...
"aggregations": {
"hits_per_minute": {
"buckets": [
{
"key_as_string": "2018-12-04T21:10:00.000Z",
"key": 1543957800000,
"doc_count": 3,
"total_hits": {
"value": 11
}
}
]
}
}
}
Note that there is no extra aggregation in the output and the output of hits_per_minute is truncated (because we asked to give exactly one, topmost bucket).
Do it with filter_path
There is also a generic way to filter the output of Elasticsearch: Response filtering, as this answer suggests.
In this case it will be enough to just do the following query:
POST my_index/doc/_search?filter_path=aggregations.max_transactions_per_minute
{ ... (original query) ... }
That would give the response:
{
"aggregations": {
"max_transactions_per_minute": {
"value": 11,
"keys": [
"2018-12-04T21:10:00.000Z"
]
}
}
}
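To make the relationship between the two responses concrete: max_bucket reports the largest total_hits value together with the key (or keys) of the bucket that holds it. Here is a rough client-side equivalent, assuming buckets shaped like the hits_per_minute output. The function pick_max_bucket and the sample values are illustrative, not Elasticsearch code:

```python
def pick_max_bucket(buckets: list, metric: str) -> dict:
    """Return the highest metric value and the keys of every bucket that reaches it."""
    values = [(b["key_as_string"], b[metric]["value"]) for b in buckets]
    if not values:
        return {"value": None, "keys": []}
    top = max(value for _, value in values)
    return {"value": top, "keys": [key for key, value in values if value == top]}


# Hypothetical per-minute buckets with a sum sub-aggregation named total_hits.
minute_buckets = [
    {"key_as_string": "2018-12-04T21:09:00.000Z", "total_hits": {"value": 4}},
    {"key_as_string": "2018-12-04T21:10:00.000Z", "total_hits": {"value": 11}},
    {"key_as_string": "2018-12-04T21:11:00.000Z", "total_hits": {"value": 7}},
]
summary = pick_max_bucket(minute_buckets, "total_hits")
print(summary)
```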

How can I count the number of documents where a field is within a certain range?

I am trying to build an elasticsearch query that counts the number of documents where a certain field is within a certain range. This aggregation is also contained inside of a date histogram aggregation, but I don't think that matters for the purpose of this question.
Example Data:
ID: Score
01: 4
02: 5
03: 10
04: 9
I would like to count the number of documents where 'Score' is >= 9. I have tried scripts and filters within this aggregation, but I can't get it to work.
This aggregation counts all documents, not just the ones that match the script.
"aggs": {
"report_days": {
"date_histogram": {
"field": "Date",
"interval": "day"
},
"aggs": {
"value_count": {
"field": "Score",
"script": "_value >=9"
}
}
}
}
The following aggregation gives me a parse failure, saying Parse Failure [Expected [START_OBJECT] under [field], but got a [VALUE_STRING] in [value_count]]:
"aggs": {
"report_days": {
"date_histogram": {
"field": "Date",
"interval": "day"
},
"aggs": {
"value_count": {
"field": "Score",
"filter": {
"range": {
"Score": {
"gte": 9
}
}
}
}
}
}
}
Thanks for any suggestions!
This query will give you the number of docs with Score >= 9:
{
"query": {
"range": {
"Score": {
"gte": 9
}
}
}
}
This agg will do the same:
{
"aggs": {
"my agg": {
"range": {
"field": "Score",
"ranges": [
{
"from": 9
}
]
}
}
}
}
Alternatively, run the query string query ("Score:>=9") and check the hits->total value. See the examples in the doc.
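Since the question asks for the count inside each date_histogram bucket, one option worth trying is a filter sub-aggregation, whose doc_count is the number of matching documents per bucket. The sketch below only builds the request body as a Python dict. The aggregation name high_scores and the helper function are made up, and the field names are taken from the question:

```python
import json


def count_in_range_per_day(date_field: str, score_field: str, minimum: int) -> dict:
    """Build an aggs body that counts, per day, documents with score_field >= minimum."""
    return {
        "aggs": {
            "report_days": {
                "date_histogram": {"field": date_field, "interval": "day"},
                "aggs": {
                    # A filter aggregation reports how many docs in the bucket match.
                    "high_scores": {
                        "filter": {"range": {score_field: {"gte": minimum}}}
                    }
                },
            }
        }
    }


body = count_in_range_per_day("Date", "Score", 9)
print(json.dumps(body, indent=2))
```

Each daily bucket in the response would then carry a high_scores object whose doc_count is the number of documents with Score >= 9 on that day.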
