Elasticsearch: Can I return only the cardinality of a buckets agg, without returning all the buckets?

Take the following query and result:
POST index/_search
{
  "size": 0,
  "aggs": {
    "perDeviceAggregation": {
      "terms": {
        "field": "deviceID"
      },
      "aggs": {
        "score_avg": {
          "avg": {
            "field": "device_score"
          }
        }
      }
    },
    "count": {
      "cardinality": {
        "field": "deviceID"
      }
    }
  }
}
Result:
"aggregations": {
  "perDeviceAggregation": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "aa",
        "doc_count": 3,
        "score_avg": {
          "value": 3.8
        }
      },
      {
        "key": "bb",
        "doc_count": 1,
        "score_avg": {
          "value": 3.8
        }
      }
    ]
  },
  "count": {
    "value": 2
  }
}
That's great, but in my situation I don't really care about the information in each bucket; I only want to know the number of buckets. Something like the following:
"aggregations": {
"aads": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"bucket_count": 2
}
}
Is this possible in Elasticsearch?
Edit:
You might wonder why I calculate an average (which forces me to use terms instead of cardinality) if I don't care about what's in the buckets. I use the average to do a range aggregation. The question above was simplified; my actual problem looks like the following:
POST index/_search
{
  "size": 0,
  "aggs": {
    "mos_over_time": {
      "range": {
        "field": "device_score",
        "ranges": [
          { "from": 0.0, "to": 2.6 },
          { "from": 2.6, "to": 4.0 },
          { "from": 4.0 }
        ]
      },
      "aggs": {
        "perDeviceAggregation": {
          "terms": {
            "field": "deviceID"
          },
          "aggs": {
            "score_avg": {
              "avg": {
                "field": "device_score"
              }
            }
          }
        },
        "count": {
          "cardinality": {
            "field": "deviceID"
          }
        }
      }
    }
  }
}

Related

Elastic aggregation on specific values from within one field

I am migrating my DB from Postgres to Elasticsearch. My Postgres query looks like this:
select site_id, count(*) from r_2332 where site_id in ('1300','1364') and date >= '2021-01-25' and date <= '2021-01-30' group by site_id
The expected result is as follows:
site_id | count
1300    | 1234
1364    | 2345
I am trying to derive the same result with Elasticsearch aggs. I have tried the following:
GET /r_2332/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "site_id": "1300" } },
        { "match": { "site_id": "1364" } }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "footfall": {
      "range": {
        "field": "date",
        "ranges": [
          {
            "from": "2021-01-21",
            "to": "2021-01-30"
          }
        ]
      }
    }
  }
}
This gives me the result as follows:
"aggregations":{"footfall":{"buckets":[{"key":"2021-01-21T00:00:00.000Z-2021-01-30T00:00:00.000Z","from":1.6111872E12,"from_as_string":"2021-01-21T00:00:00.000Z","to":1.6119648E12,"to_as_string":"2021-01-30T00:00:00.000Z","doc_count":2679}]}
and this:
GET /r_2332/_search
{
  "query": {
    "terms": {
      "site_id": [ "1300", "1364" ],
      "boost": 1.0
    }
  },
  "aggs": {
    "footfall": {
      "range": {
        "field": "date",
        "ranges": [
          {
            "from": "2021-01-21",
            "to": "2021-01-30"
          }
        ]
      }
    }
  }
}
This provided the same result:
"aggregations":{"footfall":{"buckets":[{"key":"2021-01-21T00:00:00.000Z-2021-01-30T00:00:00.000Z","from":1.6111872E12,"from_as_string":"2021-01-21T00:00:00.000Z","to":1.6119648E12,"to_as_string":"2021-01-30T00:00:00.000Z","doc_count":2679}]}
How do I get the result separately for each site_id?
You can use a combination of terms and range aggregations to achieve your task.
Adding a working example with index data, search query, and search result.
Index Data:
{
  "site_id": 1365,
  "date": "2021-01-24"
}
{
  "site_id": 1300,
  "date": "2021-01-22"
}
{
  "site_id": 1300,
  "date": "2020-01-22"
}
{
  "site_id": 1364,
  "date": "2021-01-24"
}
Search Query:
{
  "size": 0,
  "aggs": {
    "siteId": {
      "terms": {
        "field": "site_id",
        "include": [
          1300,
          1364
        ]
      },
      "aggs": {
        "footfall": {
          "range": {
            "field": "date",
            "ranges": [
              {
                "from": "2021-01-21",
                "to": "2021-01-30"
              }
            ]
          }
        }
      }
    }
  }
}
Search Result:
"aggregations": {
"siteId": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1300,
"doc_count": 2,
"footfall": {
"buckets": [
{
"key": "2021-01-21T00:00:00.000Z-2021-01-30T00:00:00.000Z",
"from": 1.6111872E12,
"from_as_string": "2021-01-21T00:00:00.000Z",
"to": 1.6119648E12,
"to_as_string": "2021-01-30T00:00:00.000Z",
"doc_count": 1 // note this
}
]
}
},
{
"key": 1364,
"doc_count": 1,
"footfall": {
"buckets": [
{
"key": "2021-01-21T00:00:00.000Z-2021-01-30T00:00:00.000Z",
"from": 1.6111872E12,
"from_as_string": "2021-01-21T00:00:00.000Z",
"to": 1.6119648E12,
"to_as_string": "2021-01-30T00:00:00.000Z",
"doc_count": 1 // note this
}
]
}
}
]
}
}
This might perform better
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "site_id": [
              "1300",
              "1365"
            ]
          }
        },
        {
          "range": {
            "date": {
              "gte": "2021-01-21",
              "lte": "2021-01-24"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by": {
      "terms": {
        "field": "site_id"
      }
    }
  }
}
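Run against the same sample index data as above, the group_by terms buckets then carry the per-site counts directly; each bucket's doc_count plays the role of count(*) in the SQL. With the values as written in that query (site_ids 1300 and 1365, dates 2021-01-21 to 2021-01-24), the response should look roughly like this (a sketch, not output from a real run):
"aggregations": {
  "group_by": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      { "key": 1300, "doc_count": 1 },
      { "key": 1365, "doc_count": 1 }
    ]
  }
}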

Stats Aggregation with Min Mode in ElasticSearch

I have the below mapping in ElasticSearch
{
  "properties": {
    "Costs": {
      "type": "nested",
      "properties": {
        "price": {
          "type": "integer"
        }
      }
    }
  }
}
So every document has an array field Costs, which contains many elements, and each element has a price in it. I want to find the min and max price, with the condition that from each array only the element with the minimum price should be considered. So it is basically the min/max among the minimum values of each array.
Let's say I have 2 documents with the Costs field as
Costs: [
  {
    "price": 100
  },
  {
    "price": 200
  }
]
and
Costs: [
  {
    "price": 300
  },
  {
    "price": 400
  }
]
So I need to find the stats over those per-document minimums.
This is the query I am currently using:
{
  "costs_stats": {
    "nested": {
      "path": "Costs"
    },
    "aggs": {
      "price_stats_new": {
        "stats": {
          "field": "Costs.price"
        }
      }
    }
  }
}
And it gives me this:
"min" : 100,
"max" : 400
But I need the stats computed after taking only the minimum element of each array into consideration.
So this is what I need:
"min" : 100,
"max" : 300
Like we have a "mode" option in sort, is there something similar in stats aggregation also, or any other way of achieving this, maybe using a script or something. Please suggest. I am really stuck here.
Let me know if anything is required
Update 1:
Query for finding the min/max among the minimums:
{
  "_source": false,
  "timeout": "5s",
  "from": 0,
  "size": 0,
  "aggs": {
    "price_1": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "price_2": {
          "nested": {
            "path": "Costs"
          },
          "aggs": {
            "filtered": {
              "aggs": {
                "price_3": {
                  "min": {
                    "field": "Costs.price"
                  }
                }
              },
              "filter": {
                "bool": {
                  "filter": {
                    "range": {
                      "Costs.price": {
                        "gte": 100
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "minValue": {
      "min_bucket": {
        "buckets_path": "price_1>price_2>filtered>price_3"
      }
    }
  }
}
Only a few buckets are coming back, and hence the min/max is computed among those, which is not correct. Is there a size limit?
One way to achieve your use case is to add one more field, id, to each document. With the help of the id field, a terms aggregation can be performed, so buckets will be built dynamically, one per unique value.
Then we can apply a min aggregation, which returns the minimum among the numeric values extracted from the aggregated documents.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
  "mappings": {
    "properties": {
      "Costs": {
        "type": "nested"
      }
    }
  }
}
Index Data:
{
  "id": 1,
  "Costs": [
    {
      "price": 100
    },
    {
      "price": 200
    }
  ]
}
{
  "id": 2,
  "Costs": [
    {
      "price": 300
    },
    {
      "price": 400
    }
  ]
}
Search Query:
{
  "size": 0,
  "aggs": {
    "id_terms": {
      "terms": {
        "field": "id",
        "size": 15 // note this
      },
      "aggs": {
        "nested_entries": {
          "nested": {
            "path": "Costs"
          },
          "aggs": {
            "min_position": {
              "min": {
                "field": "Costs.price"
              }
            }
          }
        }
      }
    }
  }
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
}
This can also be achieved using the stats aggregation (again assuming you add an id field that uniquely identifies each document):
{
  "size": 0,
  "aggs": {
    "id_terms": {
      "terms": {
        "field": "id",
        "size": 15 // note this
      },
      "aggs": {
        "costs_stats": {
          "nested": {
            "path": "Costs"
          },
          "aggs": {
            "price_stats_new": {
              "stats": {
                "field": "Costs.price"
              }
            }
          }
        }
      }
    }
  }
}
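With the two sample documents above (prices 100/200 and 300/400), each id bucket would then carry its own stats, roughly along these lines (an abridged sketch, not output from a real run):
"buckets": [
  {
    "key": 1,
    "costs_stats": {
      "doc_count": 2,
      "price_stats_new": { "count": 2, "min": 100.0, "max": 200.0, "avg": 150.0, "sum": 300.0 }
    }
  },
  {
    "key": 2,
    "costs_stats": {
      "doc_count": 2,
      "price_stats_new": { "count": 2, "min": 300.0, "max": 400.0, "avg": 350.0, "sum": 700.0 }
    }
  }
]
The per-bucket min values (100 and 300) are the numbers the overall min/max should then be taken over.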
Update 1:
To find the maximum value among those minimums (as computed by the query above), you can use a max_bucket pipeline aggregation:
{
  "size": 0,
  "aggs": {
    "id_terms": {
      "terms": {
        "field": "id",
        "size": 15 // note this
      },
      "aggs": {
        "nested_entries": {
          "nested": {
            "path": "Costs"
          },
          "aggs": {
            "min_position": {
              "min": {
                "field": "Costs.price"
              }
            }
          }
        }
      }
    },
    "maxValue": {
      "max_bucket": {
        "buckets_path": "id_terms>nested_entries>min_position"
      }
    }
  }
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": 2,
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 300.0
}
}
}
]
},
"maxValue": {
"value": 300.0,
"keys": [
"2"
]
}
}
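As for the follow-up about buckets going missing: the terms aggregation only returns its top 10 buckets by default, which is why the explicit "size": 15 above is called out; raise it so that every distinct id gets a bucket, otherwise the pipeline min/max is computed over an incomplete set. The overall minimum across buckets can be obtained the same way with a min_bucket pipeline, something like:
"minValue": {
  "min_bucket": {
    "buckets_path": "id_terms>nested_entries>min_position"
  }
}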

How to get hours between Min and Max date in Elasticsearch Aggregation?

How can I calculate the hours between the max and min dates (at the same tree level as max and min) in Elasticsearch?
My query:
{
  "size": 0,
  "query": {
    "bool": {
      "must": []
    }
  },
  "aggs": {
    "group_by_areaId": {
      "terms": {
        "size": 100000,
        "field": "areaId.keyword"
      },
      "aggs": {
        "4m": {
          "date_histogram": {
            "field": "timestamp",
            "format": "yyyy-MM-dd'T'HH:mm:ssZZ",
            "interval": "4m",
            "order": {
              "_key": "asc"
            }
          },
          "aggs": {
            "maxDate": {
              "max": {
                "field": "timestamp"
              }
            },
            "minDate": {
              "min": {
                "field": "timestamp"
              }
            }
          }
        }
      }
    }
  }
}
And the (shortened) response is:
"aggregations": {
"group_by_areaId": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "key1",
"doc_count": 15,
"4m": {
"buckets": [
{
"key_as_string": "2020-02-12T06:08:00+0000",
"key": 1581487680000,
"doc_count": 3,
"minDate": {
"value": 1.581487847E12,
"value_as_string": "2020-02-12T06:10:47Z"
},
"maxDate": {
"value": 1.58148791E12,
"value_as_string": "2020-02-12T06:11:50Z"
},
*// Need hours between maxDate and minDate here
//{
// "hours" : "0.0175" (maxDate-minDate)
//}*
}
]
}
}
]
}
}
Can anyone please help me find a solution?
Thanks in advance.
You can leverage the bucket_script pipeline aggregation in order to compute the difference between min and max for each bucket.
Simply add the following at the same level as minDate and maxDate:
"hours": {
"bucket_script": {
"buckets_path": {
"min": "minDate",
"max": "maxDate"
},
"script": "(params.max - params.min) / 3600000"
}
}
For your sample data above, the result in this case would be 0.0175 hours (i.e. roughly 1 minute).
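To make the placement concrete, here is roughly how the snippet slots into the 4m sub-aggregations of the original query (everything else unchanged); this is just a sketch combining the query above with the snippet:
"aggs": {
  "maxDate": {
    "max": { "field": "timestamp" }
  },
  "minDate": {
    "min": { "field": "timestamp" }
  },
  "hours": {
    "bucket_script": {
      "buckets_path": { "min": "minDate", "max": "maxDate" },
      "script": "(params.max - params.min) / 3600000"
    }
  }
}
Sanity check against the sample bucket: (1581487910000 - 1581487847000) / 3600000 = 63000 / 3600000 = 0.0175 hours, i.e. 63 seconds.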

Return just buckets size of aggregation query - Elasticsearch

I'm using an aggregation query on Elasticsearch 2.1; here is my query:
"aggs": {
"atendimentos": {
"terms": {
"field": "_parent",
"size" : 0
}
}
}
The return looks like this:
"aggregations": {
"atendimentos": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1a92d5c0-d542-4f69-aeb0-42a467f6a703",
"doc_count": 12
},
{
"key": "4e30bf6d-730d-4217-a6ef-e7b2450a012f",
"doc_count": 12
}.......
It returns 40000 buckets, so I have a lot of buckets in this aggregation. I just want the number of buckets returned, something like this:
buckets_size: 40000
How do I return just the bucket count?
Well, thank you all.
Try this query:
POST index/_search
{
  "size": 0,
  "aggs": {
    "atendimentos": {
      "terms": {
        "field": "_parent"
      }
    },
    "count": {
      "cardinality": {
        "field": "_parent"
      }
    }
  }
}
It may return something like this:
"aggregations": {
  "atendimentos": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "aa",
        "doc_count": 1
      },
      {
        "key": "bb",
        "doc_count": 1
      }
    ]
  },
  "count": {
    "value": 2
  }
}
EDIT: More info here - https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-aggregations-metrics-cardinality-aggregation.html
{
  "aggs": {
    "type_count": {
      "cardinality": {
        "field": "type"
      }
    }
  }
}
Read more about Cardinality Aggregation
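Note that cardinality is an approximate count. If the approximation matters at the scale mentioned in the question (around 40000 distinct values), you can set a precision_threshold on the cardinality aggregation (values up to 40000 are accepted), for example:
{
  "size": 0,
  "aggs": {
    "count": {
      "cardinality": {
        "field": "_parent",
        "precision_threshold": 40000
      }
    }
  }
}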

How to hide buckets in ElasticSearch result?

In my query, I aggregate the buckets into one scalar. Since I'm not interested in each individual bucket (and there are tens of millions of them in my case), I'd like to remove them from the returned result; i.e. I want to do something like "size": 0 to hide all the hits. Is it possible?
E.g.:
{
  "size": 0,
  "aggs": {
    "pop": {
      "terms": {
        "field": "account_number",
        "size": 0
      },
      "aggs": {
        "average": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "sum_of_avg": {
      "sum_bucket": {
        "buckets_path": "pop>average"
      }
    }
  }
}
Result:
[...]
"aggregations": {
"pop": {
"doc_count_error_upper_bound": 40851,
"sum_other_doc_count": 93441329,
"buckets": [...] <== i don't want this
},
"sum_of_avg": {
"value": 128.0768325884469
}
I just posted an answer in the related question.
In this case the request should look like this:
curl -XPOST 'http://localhost:9200/<index>/_search?filter_path=aggregations.sum_of_avg' -d '
{
  "size": 0,
  "aggs": {
    "pop": {
      "terms": {
        "field": "account_number",
        "size": 0
      },
      "aggs": {
        "average": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "sum_of_avg": {
      "sum_bucket": {
        "buckets_path": "pop>average"
      }
    }
  }
}'
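With that filter_path, the response should be trimmed down to just the pipeline result, roughly like this (value taken from the result shown above):
{
  "aggregations": {
    "sum_of_avg": {
      "value": 128.0768325884469
    }
  }
}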
PS: if you found another solution, please share it here. Thanks!
I think what you want is the cardinality aggregation. That will return the number of distinct values.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
Example:
GET devdev/alert/_search
{
  "size": 0,
  "aggs": {
    "agg1": {
      "cardinality": {
        "field": "price"
      }
    }
  }
}
