How can I count the number of documents where a field is within a certain range? - elasticsearch

I am trying to build an elasticsearch query that counts the number of documents where a certain field is within a certain range. This aggregation is also contained inside of a date histogram aggregation, but I don't think that matters for the purpose of this question.
Example Data:
ID: Score
01: 4
02: 5
03: 10
04: 9
I would like to count the number of documents where 'Score' is >= 9. I have tried scripts and filters within this aggregation, but I can't get it to work.
This aggregation counts all documents, not just the ones that match the script.
"aggs": {
"report_days": {
"date_histogram": {
"field": "Date",
"interval": "day"
},
"aggs": {
"value_count": {
"field": "Score",
"script": "_value >=9"
}
}
}
}
This following aggregation gives me a parse failure, saying Parse Failure [Expected [START_OBJECT] under [field], but got a [VALUE_STRING] in [value_count]]:
"aggs": {
"report_days": {
"date_histogram": {
"field": "Date",
"interval": "day"
},
"aggs": {
"value_count": {
"field": "Score",
"filter": {
"range": {
"Score": {
"gte": 9
}
}
}
}
}
}
}
Thanks for any suggestions!

This query will give you the number of docs with score >= 9
{
"query": {
"range": {
"score": {
"gte": 9
}
}
}
}
and this agg will do the same
{
"aggs": {
"my agg": {
"range": {
"field": "score",
"ranges": [
{
"from": 9
}
]
}
}
}
}

Run the query ("score:>9") and check the hits->total value. See the examples in the doc.

Related

Query returns result with small size that is not my intention in elasticsearch

I am using rest api to query the result from ElasticSearch.
Below is the API query string.
GET /..../_search
{
"size":0,
"query": {
"bool": {
"must": [
{ "range": {
"#timestamp": {
"time_zone": "+09:00",
"gte": "2023-01-24T00:00:00.000Z",
"lt": "2023-01-24T03:03:00.000Z" } } },
{
"term" : {
"serviceid.keyword" : {
"value" : "430011397"
}
}
}
]
}
},
"aggs": {
"by_day": {
"auto_date_histogram": {
"field": "#timestamp",
"minimum_interval":"minute"
},
"aggs": {
"agg-type": {
"terms": {
"field": "nxlogtype.keyword",
"size": 100000
},
"aggs": {
"my-sub-agg-name": {
"avg": {
"field": "size"
}
}
}
}
}
}
}
}
As you can see, I specified the time range about three hours in gte and lt field.
However, the result returns only 6 buckets which have 30 minute intervals.
I expected that many buckets will be returned with one minute interval during the timestamp I specified, but the result is always same even though I changed the time range as more extended one.
Since I am quite new to elastic search, I am not familiar with query usage.
How to resolve my issue?

Filter an elasticsearch result after an aggregation

I have this elasticsearch query that get every x-locations for which the number of documents (with timestamp gte 1 month ago) is greater than 5000. I'm also able to get the most recent data timestamp for each of these x-locations.
Is it possible to add an additional filter at the end of the query, in order to ignore all x-locations for which the most recent timestamp is older than 2 days ago?
The query:
GET /mypattern-*/_search
{
"query": {
"bool": {
"must": [
{"match": {"method": "GET"}},
{
"range": {
"timestamp": {
"gte": "now-1M"
}
}
}
]
}
},
"aggs": {
"location_terms": {
"terms": {
"field": "x-location.keyword",
"min_doc_count": 500,
"size": 1000,
"order": {
"recent_timestamp": "desc"
}
},
"aggs": {
"recent_timestamp": {
"max": {
"field": "timestamp"
}
}
}
}
}
}

Elasticsearch: How set 'doc_count' of a FILTER-Aggregation in relation to total 'doc_count'

A seemingly very trivial problem prompted me today to read the Elasticsearch documentation again diligently. So far, however, I have not come across the solution....
Question:
is ther's a simple way to set the doc_count of a filter aggregation in relation to the total doc_count?
Here's a snippet from my search-request-json.
In the feature_occurrences aggregation I filtered documents.
Now I want to calculate the ratio filtered/all Docs in each time bucket.
GET my_index/_search
{
"aggs": {
"time_buckets": {
"date_histogram": {
"field": "date",
"calendar_interval": "1d",
"min_doc_count": 0
},
"aggs": {
"feature_occurrences": {
"filter": {
"term": {
"x": "y"
}
}
},
"feature_occurrences_per_doc" : {
// feature_occurences.doc_count / doc_count
}
Any Ideas ?
You can use bucket_script to calc the ratio:
{
"aggs": {
"date": {
"date_histogram": {
"field": "#timestamp",
"interval": "hour"
},
"aggs": {
"feature_occurrences": {
"filter": {
"term": {
"cloud.region": "westeurope"
}
}
},
"ratio": {
"bucket_script": {
"buckets_path": {
"doc_count": "_count",
"features_count": "feature_occurrences._count"
},
"script": "params.features_count / params.doc_count"
}
}
}
}
}
}
Elastic bucket script doc:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html

Elasticsearch Pagination with timestamp range

Elasticsearch official documentation introduce that elasticsearch can realize pagination by composite aggregations.
The composite aggregation will fetch data many times to get all results.
So my question is, Can I use range from now-1h to now when I execute composite aggregation?
If I can. How to composite aggregation query keep source data unchanging when every range query have different now.
If I can't. My query below has no error and the result seems to be right.
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"timestamp": {
"gte": "now-1h"
}
}
}
]
}
},
"aggs": {
"user_device": {
"composite": {
"after": {
"user_name": "alen.lv"
},
"size": 100,
"sources": [
{
"user_name": {
"terms": {
"field": "user_name"
}
}
}
]
},
"aggs": {
"user_mac": {
"terms": {
"field": "user_mac",
"size": 1000
}
}
}
}
}
}

Elasticsearch Aggregations: Only return results of one of them?

I'm trying to find a way to only return the results of one aggregation in an Elasticsearch query. I have a max bucket aggregation (the one that I want to see) that is calculated from a sum bucket aggregation based on a date histogram aggregation. Right now, I have to go through 1,440 results to get to the one I want to see. I've already removed the results of the base query with the size: 0 modifier, but is there a way to do something similar with the aggregations as well? I've tried slipping the same thing into a few places with no luck.
Here's the query:
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "2018-11-28",
"lte": "2018-11-28"
}
}
},
"aggs": {
"hits_per_minute": {
"date_histogram": {
"field": "timestamp",
"interval": "minute"
},
"aggs": {
"total_hits": {
"sum": {
"field": "hits_count"
}
}
}
},
"max_transactions_per_minute": {
"max_bucket": {
"buckets_path": "hits_per_minute>total_hits"
}
}
}
}
Fortunately enough, you can do that with bucket_sort aggregation, which was added in Elasticsearch 6.4.
Do it with bucket_sort
POST my_index/doc/_search
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "2018-11-28",
"lte": "2018-11-28"
}
}
},
"aggs": {
"hits_per_minute": {
"date_histogram": {
"field": "timestamp",
"interval": "minute"
},
"aggs": {
"total_hits": {
"sum": {
"field": "hits_count"
}
},
"max_transactions_per_minute": {
"bucket_sort": {
"sort": [
{"total_hits": {"order": "desc"}}
],
"size": 1
}
}
}
}
}
}
This will give you a response like this:
{
...
"aggregations": {
"hits_per_minute": {
"buckets": [
{
"key_as_string": "2018-11-28T21:10:00.000Z",
"key": 1543957800000,
"doc_count": 3,
"total_hits": {
"value": 11
}
}
]
}
}
}
Note that there is no extra aggregation in the output and the output of hits_per_minute is truncated (because we asked to give exactly one, topmost bucket).
Do it with filter_path
There is also a generic way to filter the output of Elasticsearch: Response filtering, as this answer suggests.
In this case it will be enough to just do the following query:
POST my_index/doc/_search?filter_path=aggregations.max_transactions_per_minute
{ ... (original query) ... }
That would give the response:
{
"aggregations": {
"max_transactions_per_minute": {
"value": 11,
"keys": [
"2018-12-04T21:10:00.000Z"
]
}
}
}

Resources