Return just the number of buckets of an aggregation query - Elasticsearch

I'm using an aggregation query on Elasticsearch 2.1. Here is my query:
"aggs": {
"atendimentos": {
"terms": {
"field": "_parent",
"size" : 0
}
}
}
The response looks like this:
"aggregations": {
"atendimentos": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1a92d5c0-d542-4f69-aeb0-42a467f6a703",
"doc_count": 12
},
{
"key": "4e30bf6d-730d-4217-a6ef-e7b2450a012f",
"doc_count": 12
}.......
It returns 40,000 buckets. I don't need the buckets themselves; I just want the number of buckets, something like this:
buckets_size: 40000
How can I return just the number of buckets?
Thank you.

Try this query, which adds a cardinality aggregation on the same field next to the terms aggregation:
POST index/_search
{
"size": 0,
"aggs": {
"atendimentos": {
"terms": {
"field": "_parent"
}
},
"count":{
"cardinality": {
"field": "_parent"
}
}
}
}
It returns something like this:
"aggregations": {
"atendimentos": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "aa",
"doc_count": 1
},
{
"key": "bb",
"doc_count": 1
}
]
},
"count": {
"value": 2
}
}
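EDIT: More info here: https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-aggregations-metrics-cardinality-aggregation.html
If you only need the count, you can drop the terms aggregation entirely; its buckets are what make the response large. Keep in mind that cardinality is approximate: counts below the precision_threshold setting (40000 at most) are expected to be close to exact, so with around 40,000 distinct values you may want to set it explicitly.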
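For comparison, here is a minimal Python sketch of reading both numbers from a parsed response shaped like the example above. The helper names bucket_count and cardinality_value are hypothetical, and the response dict is trimmed illustration data, not real output.

```python
def bucket_count(response: dict, agg_name: str) -> int:
    """Number of buckets a terms aggregation returned in this response."""
    return len(response["aggregations"][agg_name]["buckets"])


def cardinality_value(response: dict, agg_name: str) -> int:
    """Value reported by a cardinality aggregation (approximate for large counts)."""
    return response["aggregations"][agg_name]["value"]


# Hypothetical parsed response, trimmed to two buckets like the example above.
example = {
    "aggregations": {
        "atendimentos": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "aa", "doc_count": 1},
                {"key": "bb", "doc_count": 1},
            ],
        },
        "count": {"value": 2},
    }
}

print(bucket_count(example, "atendimentos"))  # 2
print(cardinality_value(example, "count"))  # 2
```

Counting buckets client-side only gives the true total when every bucket is returned ("size": 0 in 2.x), which means transferring the 40,000-bucket payload the question wants to avoid; the cardinality value avoids sending the buckets at all.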

{
"aggs" : {
"type_count" : {
"cardinality" : {
"field" : "type"
}
}
}
}
Read more about Cardinality Aggregation

Related

Join Query in Kibana KQL

I have three logs in ES like these:
{"#timestamp":"2022-07-19T11:24:16.274073+05:30","log":{"level":200,"logger":"production","message":"BUY_ITEM1","context":{"user_id":31312},"datetime":"2022-07-19T11:24:16.274073+05:30","extra":{"ip":"127.0.0.1"}}}
{"#timestamp":"2022-07-19T11:24:16.274073+05:30","log":{"level":200,"logger":"production","message":"BUY_ITEM2","context":{"user_id":31312},"datetime":"2022-07-19T11:24:16.274073+05:30","extra":{"ip":"127.0.0.1"}}}
{"#timestamp":"2022-07-19T11:24:16.274073+05:30","log":{"level":200,"logger":"production","message":"CLICK_ITEM3","context":{"user_id":31312},"datetime":"2022-07-19T11:24:16.274073+05:30","extra":{"ip":"127.0.0.1"}}}
I can get the users who bought Item1 by querying log.message: "BUY_ITEM1" in KQL in Kibana.
How can I get the user_ids that have both BUY_ITEM1 and BUY_ITEM2?
TL;DR
Join queries as they exist in SQL are not really possible in Elasticsearch; its join support is very limited (https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html).
You will need to work around this.
Workaround
You could aggregate on user_id and, under each user, on the products they bought:
GET /73031860/_search
{
"query": {
"terms": {
"log.message.keyword": [
"BUY_ITEM1",
"BUY_ITEM2"
]
}
},
"size": 0,
"aggs": {
"users": {
"terms": {
"field": "log.context.user_id",
"size": 10
},
"aggs": {
"products": {
"terms": {
"field": "log.message.keyword",
"size": 10
}
}
}
}
}
}
This gives the following result (abridged):
{
...
"aggregations": {
"users": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 31312,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BUY_ITEM1",
"doc_count": 1
},
{
"key": "BUY_ITEM2",
"doc_count": 1
}
]
}
}
]
}
}
}
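The aggregation still leaves you to pick the users whose product buckets contain both items. A minimal client-side sketch in Python, assuming the response is parsed into a dict shaped like the result above; the function name users_with_all and the second user (42) are hypothetical illustration.

```python
def users_with_all(response: dict, required: set) -> list:
    """Return the user_id keys whose "products" sub-buckets include every required item."""
    matches = []
    for user in response["aggregations"]["users"]["buckets"]:
        bought = {product["key"] for product in user["products"]["buckets"]}
        if required <= bought:  # subset test: the user bought every required item
            matches.append(user["key"])
    return matches


# Trimmed response from the query above, plus a hypothetical user 42
# who only bought BUY_ITEM1.
example = {
    "aggregations": {
        "users": {
            "buckets": [
                {
                    "key": 31312,
                    "doc_count": 2,
                    "products": {
                        "buckets": [
                            {"key": "BUY_ITEM1", "doc_count": 1},
                            {"key": "BUY_ITEM2", "doc_count": 1},
                        ]
                    },
                },
                {
                    "key": 42,
                    "doc_count": 1,
                    "products": {"buckets": [{"key": "BUY_ITEM1", "doc_count": 1}]},
                },
            ]
        }
    }
}

print(users_with_all(example, {"BUY_ITEM1", "BUY_ITEM2"}))  # [31312]
```

The users terms size (10 in the query above) caps how many users are checked, so raise it or page through users if you expect more.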

Filtering aggregation results

This question is a follow-up to another question of mine; I'm posting it separately to get more attention.
Sample Docs:
{
"id":1,
"product":"p1",
"cat_ids":[1,2,3]
}
{
"id":2,
"product":"p2",
"cat_ids":[3,4,5]
}
{
"id":3,
"product":"p3",
"cat_ids":[4,5,6]
}
Goal: get the products belonging to a particular category, e.g. cat_id = 3.
Query:
GET product/_search
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cat_ids",
"size": 10
},
"aggs": {
"products": {
"terms": {
"field": "product.keyword",
"size": 10
}
}
}
}
}
}
Question:
How can I filter the aggregated result to cat_id = 3 here? I tried bucket_selector as well, but it is not working.
Note: because cat_ids is multi-valued, filtering with a query and then aggregating doesn't work: the matching documents' other category IDs still produce buckets.
You can filter the values for which buckets are created, using the include parameter of the terms aggregation. From the terms aggregation documentation:
It is possible to filter the values for which buckets will be created.
This can be done using the include and exclude parameters which are
based on regular expression strings or arrays of exact values.
Additionally, include clauses can filter using partition expressions.
Here is a working example with index data, search query, and search result:
Index Data:
{
"id":1,
"product":"p1",
"cat_ids":[1,2,3]
}
{
"id":2,
"product":"p2",
"cat_ids":[3,4,5]
}
{
"id":3,
"product":"p3",
"cat_ids":[4,5,6]
}
Search Query (note the include parameter):
{
"size": 0,
"aggs": {
"cats": {
"terms": {
"field": "cat_ids",
"include": [3]
},
"aggs": {
"products": {
"terms": {
"field": "product.keyword",
"size": 10
}
}
}
}
}
}
Search Result:
"aggregations": {
"cats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "p1",
"doc_count": 1
},
{
"key": "p2",
"doc_count": 1
}
]
}
}
]
}
}
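To see why the query filter alone fails and what include changes, here is a rough Python model of the counting semantics of a terms aggregation on a multi-valued field, using the sample docs above. It is an illustration of the bucket logic only, not how Elasticsearch implements it; the function name terms_by_category is hypothetical.

```python
from collections import Counter, defaultdict

# Sample docs from the question.
docs = [
    {"id": 1, "product": "p1", "cat_ids": [1, 2, 3]},
    {"id": 2, "product": "p2", "cat_ids": [3, 4, 5]},
    {"id": 3, "product": "p3", "cat_ids": [4, 5, 6]},
]


def terms_by_category(documents, include=None):
    """Model a terms agg on cat_ids with a products sub-agg.

    A document lands in one bucket per value of its multi-valued field; `include`
    drops values before buckets are built, like the terms `include` option.
    """
    buckets = defaultdict(Counter)
    for doc in documents:
        for cat in doc["cat_ids"]:
            if include is None or cat in include:
                buckets[cat][doc["product"]] += 1
    return {cat: dict(products) for cat, products in sorted(buckets.items())}


# Filter with the query first: docs 1 and 2 match, but all their categories become buckets.
matching = [doc for doc in docs if 3 in doc["cat_ids"]]
print(sorted(terms_by_category(matching)))  # [1, 2, 3, 4, 5]

# Use include instead: only the value 3 becomes a bucket.
print(terms_by_category(docs, include={3}))  # {3: {'p1': 1, 'p2': 1}}
```

The second call reproduces the shape of the search result above: a single bucket for 3 containing p1 and p2.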

Elasticsearch return document ids while doing aggregate query

Is it possible to get an array of Elasticsearch document IDs for each bucket when grouping with an aggregation? For example:
Current output
"aggregations": {
"types": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Text Document",
"doc_count": 3310
},
{
"key": "Unknown",
"doc_count": 15
},
{
"key": "Document",
"doc_count": 13
}
]
}
}
Desired output
"aggregations": {
"types": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Text Document",
"doc_count": 3310,
"ids":["doc1","doc2", "doc3"....]
},
{
"key": "Unknown",
"doc_count": 15,
"ids":["doc11","doc12", "doc13"....]
},
{
"key": "Document",
"doc_count": 13,
"ids":["doc21","doc22", "doc23"....]
}
]
}
}
I'm not sure whether this is possible in Elasticsearch. Below is my aggregation query:
{
"size": 0,
"aggs": {
"types": {
"terms": {
"field": "docType",
"size": 10
}
}
}
}
Elasticsearch version: 6.3.2
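You can use a top_hits sub-aggregation, which returns the top matching documents in each bucket (up to its size; by default the top three). Every hit carries its _id, and source filtering lets you choose which _source fields come back with it, or turn _source off if the IDs are all you need. top_hits is meant for a handful of documents per bucket, not for listing all 3310 documents of a bucket.
Query:
{
"size": 0,
"aggs": {
"types": {
"terms": {
"field": "docType",
"size": 10
},
"aggs": {
"docs": {
"top_hits": {
"size": 10,
"_source": false
}
}
}
}
}
}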
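Each bucket then holds its hits under the top_hits sub-aggregation. A minimal Python sketch that turns that into the desired key-to-IDs mapping, assuming the terms aggregation is named types and the top_hits sub-aggregation docs; the helper name and the placeholder IDs are hypothetical.

```python
def ids_per_bucket(response: dict, agg_name: str, hits_name: str = "docs") -> dict:
    """Map each terms bucket key to the _id values from its top_hits sub-aggregation."""
    result = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        result[bucket["key"]] = [hit["_id"] for hit in hits]
    return result


# Hypothetical, trimmed response: IDs are placeholders.
example = {
    "aggregations": {
        "types": {
            "buckets": [
                {
                    "key": "Text Document",
                    "doc_count": 3310,
                    "docs": {"hits": {"total": 3310, "hits": [{"_id": "doc1"}, {"_id": "doc2"}]}},
                },
                {
                    "key": "Unknown",
                    "doc_count": 15,
                    "docs": {"hits": {"total": 15, "hits": [{"_id": "doc11"}]}},
                },
            ]
        }
    }
}

print(ids_per_bucket(example, "types"))
# {'Text Document': ['doc1', 'doc2'], 'Unknown': ['doc11']}
```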
For anyone interested, another solution is to build a custom bucket key with a script that concatenates delimited values from the document, including its ID. It isn't pretty, but you can parse the key apart later, and if you only need something minimal like the document ID, it may be worth it. Because every document then gets its own bucket, the terms size caps how many documents come back.
{
"size": 0,
"aggs": {
"types": {
"terms": {
"script": "doc['docType'].value+'::'+doc['_id'].value",
"size": 10
}
}
}
}
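Parsing the keys back apart might look like the Python sketch below. It assumes the "::" separator used in the script above and that the separator never appears inside an _id; the function name and sample keys are hypothetical.

```python
from collections import defaultdict


def split_script_keys(buckets: list, sep: str = "::") -> dict:
    """Group the _id half of "docType::_id" bucket keys by docType.

    Assumes the separator never occurs inside an _id; rpartition splits on the
    last occurrence, so a docType containing the separator still works.
    """
    ids = defaultdict(list)
    for bucket in buckets:
        doc_type, _, doc_id = bucket["key"].rpartition(sep)
        ids[doc_type].append(doc_id)
    return dict(ids)


# Hypothetical buckets as returned by the script-keyed terms aggregation.
buckets = [
    {"key": "Text Document::doc1", "doc_count": 1},
    {"key": "Text Document::doc2", "doc_count": 1},
    {"key": "Unknown::doc11", "doc_count": 1},
]

print(split_script_keys(buckets))
# {'Text Document': ['doc1', 'doc2'], 'Unknown': ['doc11']}
```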

How can I know if two different aggregations aggregated the same docs?

Suppose I have two aggs:
GET .../_search
{
"size": 0,
"aggs": {
"foo": {
"terms": {
"field": "foo"
}
},
"bar": {
"terms": {
"field": "bar"
}
}
}
}
Which returns the following:
...
"aggregations": {
"foo": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Africa",
"doc_count": 23
}
]
},
"bar": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Oil",
"doc_count": 23
}
]
}
}
My question is, how can I know if both "foo" and "bar" aggs are aggregating the same 23 docs?
I tried adding a sub agg to both "foo" and "bar" aggs to sum an arbitrary numeric field, but that's not remotely foolproof.
You can add a sub-aggregation that aggregates the identity field of the documents. With a terms sub-aggregation you need to provide a size large enough to hold every ID in the bucket (a composite aggregation is an alternative, but it has to be a top-level aggregation). See this example, replacing your_id_here with your ID field:
GET .../_search
{
"size": 0,
"aggs": {
"foo": {
"terms": {
"field": "foo"
},
"aggs": {
"ids": {
"terms": {
"field": "your_id_here",
"size": 100
}
}
}
},
"bar": {
"terms": {
"field": "bar"
},
"aggs": {
"ids": {
"terms": {
"field": "your_id_here",
"size": 100
}
}
}
}
}
}
Then compare the ID buckets of the two nested aggregations.
Another approach would be to just filter out the desired documents using the search query.
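The comparison step can be done client-side. A minimal Python sketch, assuming the ID sub-aggregation is named ids as in the query above; id_set and fake_ids_agg are hypothetical helpers, and the IDs d1/d2 are illustration data. It refuses to compare when sum_other_doc_count shows that IDs were cut off by the sub-aggregation size.

```python
def id_set(response: dict, agg_name: str, key: str, sub_name: str = "ids") -> set:
    """Collect the ID keys under one bucket of a terms agg that has an ID sub-agg."""
    for bucket in response["aggregations"][agg_name]["buckets"]:
        if bucket["key"] != key:
            continue
        id_sub = bucket[sub_name]
        # IDs cut off by the sub-agg size would make the comparison meaningless.
        if id_sub["sum_other_doc_count"] > 0:
            raise ValueError(f"{agg_name}/{key}: increase the size of '{sub_name}'")
        return {id_bucket["key"] for id_bucket in id_sub["buckets"]}
    raise KeyError(f"no bucket {key!r} in aggregation {agg_name!r}")


def fake_ids_agg(*ids):
    """Build a hypothetical ID sub-aggregation result for the example below."""
    return {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [{"key": i, "doc_count": 1} for i in ids],
    }


example = {
    "aggregations": {
        "foo": {"buckets": [{"key": "Africa", "doc_count": 2, "ids": fake_ids_agg("d1", "d2")}]},
        "bar": {"buckets": [{"key": "Oil", "doc_count": 2, "ids": fake_ids_agg("d2", "d1")}]},
    }
}

print(id_set(example, "foo", "Africa") == id_set(example, "bar", "Oil"))  # True
```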

Elasticsearch: Can I return only the cardinality of a buckets agg, without returning all the buckets?

Take the following query and result:
POST index/_search
{
"size": 0,
"aggs": {
"perDeviceAggregation": {
"terms": {
"field": "deviceID"
},
"aggs": {
"score_avg": {
"avg": {
"field": "device_score"
}
}
}
},
"count":{
"cardinality": {
"field": "deviceID"
}
}
}
}
Result:
"aggregations": {
"perDeviceAggregation": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "aa",
"doc_count": 3,
"score_avg": {
"value": 3.8
}
},
{
"key": "bb",
"doc_count": 1,
"score_avg": {
"value": 3.8
}
}
]
},
"count": {
"value": 2
}
}
That works, but in my situation I don't need the information in each bucket. I only want the number of buckets, something like the following:
"aggregations": {
"perDeviceAggregation": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"bucket_count": 2
}
}
Is this possible in Elasticsearch?
Edit:
You might wonder why I calculate an average (which forces me to use terms instead of cardinality) if I don't care about what's in the buckets. I do use the average, for a range aggregation. The question above was simplified; my actual query is the following:
POST index/_search
{
"size": 0,
"aggs" : {
"mos_over_time" : {
"range" : {
"field" : "device_score",
"ranges" : [
{ "from" : 0.0, "to" : 2.6 },
{ "from" : 2.6, "to" : 4.0 },
{ "from" : 4.0 }
]
},
"aggs": {
"perDeviceAggregation": {
"terms": {
"field": "deviceID"
},
"aggs": {
"score_avg": {
"avg": {
"field": "device_score"
}
}
}
},
"count":{
"cardinality": {
"field": "deviceID"
}
}
}
}
}
}
