In Elasticsearch, how do I perform nested sub-aggregations?

In Kibana, I create my index as follows:
PUT cars
{
"mappings":{
"_doc":{
"properties":{
"metadata":{
"type":"nested",
"properties":{
"str_value":{
"type":"keyword"
}
}
}
}
}
}
}
I then insert three records:
POST /cars/_doc/1
{
"metadata": [
{
"key": "model",
"str_value": "Ford"
},
{
"key": "price",
"int_value": 1000
}
]
}
PUT /cars/_doc/2
{
"metadata": [
{
"key": "model",
"str_value": "Ford"
},
{
"key": "price",
"int_value": 2000
}
]
}
PUT /cars/_doc/3
{
"metadata": [
{
"key": "model",
"str_value": "Holden"
},
{
"key": "price",
"int_value": 2500
}
]
}
The schema is a bit unconventional, but I've designed the index this way to avoid mapping explosion:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
What I'd like to do is get all my car models and the sum of prices for each model, i.e. Ford $3000 and Holden $2500. So far I have:
GET /cars/_search
{
"aggs":{
"metadata":{
"nested":{
"path":"metadata"
},
"aggs":{
"model_filter":{
"filter":{
"term":{
"metadata.key":"model"
}
},
"aggs":{
"model_counter":{
"terms":{
"field":"metadata.str_value",
"size":1000
}
}
}
}
}
}
}
}
This gets me part of the way there, because it returns car models and document counts:
"aggregations": {
"metadata": {
"doc_count": 6,
"model_filter": {
"doc_count": 3,
"model_counter": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Ford",
"doc_count": 2
},
{
"key": "Holden",
"doc_count": 1
}
]
}
}
}
}
How can I modify my query to add a sub-aggregation that shows the sum of prices, i.e. 3000 for Ford (sum of two documents) and 2500 for Holden (sum of one document)?

The query below should give you what you are looking for.
I've simply extended your query. Inside each terms bucket, a Reverse Nested aggregation steps back up to the parent car documents, a Nested aggregation steps back down into their metadata objects, and a Sum aggregation adds up the prices.
So the query hierarchy is as follows:
Nested Aggregation
- Filter Aggregation on metadata.key = model
- Terms Aggregation on the model names
- Reverse Nested Aggregation to go back to the parent document
- Nested Aggregation to enter the nested price objects
- Sum Aggregation to add up all the prices
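To make the data flow of that hierarchy concrete, here is a minimal plain-Python model of it, run on the three sample cars. It only mimics the aggregation logic on in-memory dicts (it does not talk to Elasticsearch), and the function name sum_price_by_model is a hypothetical illustration. Each result tuple mirrors the reverseNestedAgg doc_count, the inner metadata doc_count and the sum that Elasticsearch should report.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the three sample documents from the question.
cars = [
    {"metadata": [{"key": "model", "str_value": "Ford"},
                  {"key": "price", "int_value": 1000}]},
    {"metadata": [{"key": "model", "str_value": "Ford"},
                  {"key": "price", "int_value": 2000}]},
    {"metadata": [{"key": "model", "str_value": "Holden"},
                  {"key": "price", "int_value": 2500}]},
]


def sum_price_by_model(docs):
    """Model of nested > filter > terms > reverse_nested > nested > sum.

    Returns {model: (parent_doc_count, nested_doc_count, price_sum)}.
    """
    # nested + filter(key == "model") + terms(str_value):
    # bucket the parent ids by the model value found in their nested objects
    parents_by_model = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for obj in doc["metadata"]:
            if obj.get("key") == "model":
                parents_by_model[obj["str_value"]].add(doc_id)

    result = {}
    for model, parent_ids in parents_by_model.items():
        # reverse_nested: step back up to the parent documents of this bucket,
        # then nested: step down into *all* of their metadata objects again
        nested_objs = [obj for pid in sorted(parent_ids)
                       for obj in docs[pid]["metadata"]]
        # sum: objects without int_value contribute nothing
        total = sum(obj.get("int_value", 0) for obj in nested_objs)
        result[model] = (len(parent_ids), len(nested_objs), total)
    return result


print(sum_price_by_model(cars))
# {'Ford': (2, 4, 3000), 'Holden': (1, 2, 2500)}
```

Elasticsearch reports the sums as doubles (3000.0, 2500.0); the model simply uses ints.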
Aggregation Query:
POST <your_index_name>/_search
{
"size":0,
"aggs":{
"metadata":{
"nested":{
"path":"metadata"
},
"aggs":{
"model_filter":{
"filter":{
"term":{
"metadata.key":"model"
}
},
"aggs":{
"model_counter":{
"terms":{
"field":"metadata.str_value",
"size":1000
},
"aggs":{
"reverseNestedAgg":{
"reverse_nested":{},
"aggs":{
"metadata":{
"nested":{
"path":"metadata"
},
"aggs":{
"sum":{
"sum":{
"field":"metadata.int_value"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Note that I've added "size": 0 so that only the aggregation results are returned (no search hits). You can change it according to your requirements.
Aggregation Response:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"metadata" : {
"doc_count" : 6,
"model_filter" : {
"doc_count" : 3,
"model_counter" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Ford",
"doc_count" : 2,
"reverseNestedAgg" : {
"doc_count" : 2,
"metadata" : {
"doc_count" : 4,
"sum" : {
"value" : 3000.0
}
}
}
},
{
"key" : "Holden",
"doc_count" : 1,
"reverseNestedAgg" : {
"doc_count" : 1,
"metadata" : {
"doc_count" : 2,
"sum" : {
"value" : 2500.0
}
}
}
}
]
}
}
}
}
}
Note that I've tested the above query in ES version 7.
Important Note:
If your documents end up in the format below, the above query won't give correct sums.
POST /cars/_doc/1
{
"metadata": [
{
"key": "model",
"str_value": "Ford"
},
{
"key": "price",
"int_value": 1000
},
{
"key": "something else",
"int_value": 1000
}
]
}
Here the document has three nested objects, two of which have an int_value field, so the sum for this car would include both values (1000 + 1000) rather than just the price.
I see you designed the schema this way to avoid a mapping explosion. However, if the above scenario can occur, you may want to step back and redesign your model, or handle this aggregation in your service layer.
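The pitfall can be reproduced in a small plain-Python model (not Elasticsearch itself; the helper price_sums and its only_key parameter are hypothetical). With the extra "something else" object, the inner sum picks up both int_value entries of the first Ford. Restricting the sum to objects whose key is price fixes the model; in Elasticsearch this would presumably correspond to a filter aggregation on metadata.key inside the inner nested aggregation, which is untested here.

```python
def price_sums(docs, only_key=None):
    """Sum int_value per model, optionally only over objects whose key == only_key.

    Assumes exactly one {"key": "model"} object per document.
    """
    sums = {}
    for doc in docs:
        model = next(o["str_value"] for o in doc["metadata"] if o.get("key") == "model")
        total = sum(o.get("int_value", 0) for o in doc["metadata"]
                    if only_key is None or o.get("key") == only_key)
        sums[model] = sums.get(model, 0) + total
    return sums


docs = [
    {"metadata": [{"key": "model", "str_value": "Ford"},
                  {"key": "price", "int_value": 1000},
                  {"key": "something else", "int_value": 1000}]},
    {"metadata": [{"key": "model", "str_value": "Ford"},
                  {"key": "price", "int_value": 2000}]},
    {"metadata": [{"key": "model", "str_value": "Holden"},
                  {"key": "price", "int_value": 2500}]},
]

print(price_sums(docs))                    # {'Ford': 4000, 'Holden': 2500}
print(price_sums(docs, only_key="price"))  # {'Ford': 3000, 'Holden': 2500}
```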
Hope this helps!

Related

bucket aggregation/bucket_script computation

How do I apply a computation to bucket fields via bucket_script? More specifically, I would like to understand how to aggregate on distinct results.
For example, below is a sample query, and the response.
What I am looking for is to aggregate the following into two fields:
the sum of dist.value over all buckets, e.g. from the response below (1+2=3)
the sum of (dist.value x key) over all buckets, e.g. (1x10)+(2x20)=50
Query
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"field": "value"
}
}
]
}
},
"aggs":{
"sales_summary":{
"terms":{
"field":"qty",
"size":"100"
},
"aggs":{
"dist":{
"cardinality":{
"field":"somekey.keyword"
}
}
}
}
}
}
Query Result:
{
"aggregations": {
"sales_summary": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 10,
"doc_count": 100,
"dist": {
"value": 1
}
},
{
"key": 20,
"doc_count": 200,
"dist": {
"value": 2
}
}
]
}
}
}
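You need a sum_bucket aggregation. It is a sibling pipeline aggregation that adds up the value of a metric (here, the cardinality aggregation) across all the buckets of a multi-bucket aggregation.
Search query for the sum of dist.value over all buckets (1+2=3 in your example). It runs against a test index, idxtest1, where the cardinality field is pageview; substitute your own index and somekey.keyword: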
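As a rough mental model (plain Python, not Elasticsearch), sum_bucket reads the metric named in buckets_path from every bucket and adds the values up. The bucket data below is copied from the example response in the question; the sum_bucket function is a hypothetical illustration.

```python
# Buckets copied from the question's example response.
buckets = [
    {"key": 10, "doc_count": 100, "dist": {"value": 1}},
    {"key": 20, "doc_count": 200, "dist": {"value": 2}},
]


def sum_bucket(buckets, metric):
    """Model of sum_bucket with buckets_path "sales_summary>" + metric."""
    # Elasticsearch returns the sum as a double, hence float()
    return float(sum(b[metric]["value"] for b in buckets))


print(sum_bucket(buckets, "dist"))  # 3.0
```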
POST idxtest1/_search
{
"size": 0,
"aggs": {
"sales_summary": {
"terms": {
"field": "qty",
"size": "100"
},
"aggs": {
"dist": {
"cardinality": {
"field": "pageview"
}
}
}
},
"sum_buckets": {
"sum_bucket": {
"buckets_path": "sales_summary>dist"
}
}
}
}
Search Response:
"aggregations" : {
"sales_summary" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 10,
"doc_count" : 3,
"dist" : {
"value" : 2
}
},
{
"key" : 20,
"doc_count" : 3,
"dist" : {
"value" : 3
}
}
]
},
"sum_buckets" : {
"value" : 5.0
}
}
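For the second requirement, first compute a new per-bucket value with a bucket_script aggregation, and then add those values up with a sum_bucket aggregation. Caveat: the script below multiplies dist by the constant 10 rather than by the bucket key, so it only matches dist.value x key for the bucket whose key is 10 (in the response below, the second bucket gives 30 instead of 3 x 20 = 60, and the total of 50 is a coincidence). To multiply by the key itself, one option (untested) is to add a max sub-aggregation on qty next to dist, e.g. "qty_value": {"max": {"field": "qty"}}, which works because every document in a terms bucket has the same qty, and then use "buckets_path": {"newValue": "dist", "qty": "qty_value"} with "script": "params.newValue * params.qty".
Search query for the sum of (dist.value x key) over all buckets ((1x10)+(2x20)=50 in your example):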
POST idxtest1/_search
{
"size": 0,
"aggs": {
"sales_summary": {
"terms": {
"field": "qty",
"size": "100"
},
"aggs": {
"dist": {
"cardinality": {
"field": "pageview"
}
},
"format-value-agg": {
"bucket_script": {
"buckets_path": {
"newValue": "dist"
},
"script": "params.newValue * 10"
}
}
}
},
"sum_buckets": {
"sum_bucket": {
"buckets_path": "sales_summary>format-value-agg"
}
}
}
}
Search Response:
"aggregations" : {
"sales_summary" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 10,
"doc_count" : 3,
"dist" : {
"value" : 2
},
"format-value-agg" : {
"value" : 20.0
}
},
{
"key" : 20,
"doc_count" : 3,
"dist" : {
"value" : 3
},
"format-value-agg" : {
"value" : 30.0
}
}
]
},
"sum_buckets" : {
"value" : 50.0
}
}
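To make the arithmetic explicit, here is a plain-Python model of bucket_script followed by sum_bucket (hypothetical helper names, not an Elasticsearch API). It evaluates both the script as written in this answer (newValue * 10) and the requested dist.value x key, on the question's example buckets and on the buckets from the response above.

```python
def script_then_sum(buckets, script):
    """Model of bucket_script (per-bucket value) followed by sum_bucket (total)."""
    per_bucket = [float(script(b)) for b in buckets]
    return per_bucket, sum(per_bucket)


def times_ten(bucket):
    # the script as written: params.newValue * 10
    return bucket["dist"]["value"] * 10


def times_key(bucket):
    # the requested computation: dist.value x key
    return bucket["dist"]["value"] * bucket["key"]


question = [{"key": 10, "dist": {"value": 1}}, {"key": 20, "dist": {"value": 2}}]
answer = [{"key": 10, "dist": {"value": 2}}, {"key": 20, "dist": {"value": 3}}]

print(script_then_sum(question, times_key))  # ([10.0, 40.0], 50.0)
print(script_then_sum(question, times_ten))  # ([10.0, 20.0], 30.0)
print(script_then_sum(answer, times_ten))    # ([20.0, 30.0], 50.0)
print(script_then_sum(answer, times_key))    # ([20.0, 60.0], 80.0)
```

The two scripts only agree when every bucket key happens to be 10, which is why the per-bucket key has to reach the script somehow.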

Count number of inner elements of array property (Including repeated values)

Given the following records:
[
{
"profile": "123",
"inner": [
{
"name": "John"
}
]
},
{
"profile": "456",
"inner": [
{
"name": "John"
},
{
"name": "John"
},
{
"name": "James"
}
]
}
]
I want to get something like:
"aggregations": {
"name": {
"buckets": [
{
"key": "John",
"doc_count": 3
},
{
"key": "James",
"doc_count": 1
}
]
}
}
I'm a beginner using Elasticsearch, and this seems to be a pretty simple operation to do, but I can't find how to achieve this.
If I try a simple terms aggregation, it returns 2 for John instead of 3.
Example request I'm trying:
{
"size": 0,
"aggs": {
"name": {
"terms": {
"field": "inner.name"
}
}
}
}
How can I possibly achieve this?
Additional info: this will be used in Kibana later.
I can change the mapping to whatever I want, but AFAIK Kibana doesn't like the "Nested" type. :(
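A value_count sub-aggregation gets you part of the way: by default terms only reports a doc_count (the number of documents in each bucket), whereas value_count counts the values of a given field in those documents. Be aware of two caveats, though. value_count counts every inner.name value in the bucket's documents, not just occurrences of the bucket's own key, and keyword doc values store each distinct value only once per document. That is why, in the response below, James gets a total of 2, and John's 3 is correct only by coincidence (John from profile 123 plus John and James from profile 456). For exact per-name occurrence counts you would generally need to map inner as nested and use a nested aggregation.
So, for your purposes, the query is: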
{
"size": 0,
"aggs": {
"name": {
"terms": {
"field": "inner.name"
},
"aggs": {
"total": {
"value_count": {
"field": "inner.name"
}
}
}
}
}
}
Which returns:
"aggregations" : {
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "John",
"doc_count" : 2,
"total" : {
"value" : 3
}
},
{
"key" : "James",
"doc_count" : 1,
"total" : {
"value" : 2
}
}
]
}
}
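A small plain-Python model helps explain the numbers above. It is an illustration of the counting, not Elasticsearch code, the function names are hypothetical, and it assumes (as the response suggests) that duplicate keyword values within one document are counted once. With an object mapping each document contributes its distinct inner.name values; with a nested mapping every inner object is its own hidden document.

```python
from collections import Counter

profiles = [
    {"profile": "123", "inner": [{"name": "John"}]},
    {"profile": "456", "inner": [{"name": "John"}, {"name": "John"}, {"name": "James"}]},
]


def object_mapping_counts(docs):
    """terms doc_count and value_count on inner.name with an object mapping."""
    doc_count, value_count = Counter(), Counter()
    for doc in docs:
        # object arrays are flattened; keep the distinct values of this document
        names = {obj["name"] for obj in doc["inner"]}
        for name in names:
            doc_count[name] += 1             # one per document containing the name
            value_count[name] += len(names)  # all inner.name values of that document
    return dict(doc_count), dict(value_count)


def nested_mapping_counts(docs):
    """terms doc_count inside a nested aggregation: one hidden doc per inner object."""
    return dict(Counter(obj["name"] for doc in docs for obj in doc["inner"]))


print(object_mapping_counts(profiles))  # ({'John': 2, 'James': 1}, {'John': 3, 'James': 2})
print(nested_mapping_counts(profiles))  # {'John': 3, 'James': 1}
```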

Counting non-unique items in an Elasticsearch aggregation?

I'm trying to use an Elasticsearch aggregation to return all non-unique counts for each term within a bucket.
Given a mapping:
{
"properties": {
"addresses": {
"properties": {
"meta": {
"properties": {
"types": {
"properties": {
"type": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
And a document:
{
"id": 3,
"first_name": "James",
"last_name": "Smith",
"addresses": [
{
"meta": {
"types": [
{
"type": "Home"
},
{
"type": "Home"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Fax"
}
]
}
}
]
}
The following terms aggregation:
GET /test/_search
{
"size": 0,
"query": {
"match": {
"id": 3
}
},
"aggs": {
"types": {
"terms": {
"field": "addresses.meta.types.type"
}
}
}
}
gives this result:
"aggregations" : {
"types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Business",
"doc_count" : 1
},
{
"key" : "Fax",
"doc_count" : 1
},
{
"key" : "Home",
"doc_count" : 1
}
]
}
}
As you can see, each term is counted only once, whereas I'm really after the total count of each, e.g. Home: 2, Business: 3 and Fax: 1.
Is this possible?
I had a look at value_count, but as it's not a bucket aggregation it seems a little less convenient to use. Alternatively, a script might do it, but I'm not too sure of the syntax.
Thanks!
I doubt that is possible with the object type in Elasticsearch. The reason is that bucket aggregations such as terms count the number of documents in which a term occurs, not the number of times the term occurs within a document.
You may have to change the type of your types field to nested, so that Elasticsearch indexes each entry inside types as a separate (hidden) document.
I've provided a sample mapping, the document (no change in representation), the aggregation query and the response below.
Sample Mapping (note "type": "nested" on addresses.meta.types):
PUT nested_test
{
"mappings":{
"properties":{
"id":{
"type":"integer"
},
"first_name":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword"
}
}
},
"last_name":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword"
}
}
},
"addresses":{
"properties":{
"meta":{
"properties":{
"types":{
"type":"nested",
"properties":{
"type":{
"type":"keyword"
}
}
}
}
}
}
}
}
}
}
Sample Document (unchanged):
POST nested_test/_doc/1
{
"id": 3,
"first_name": "James",
"last_name": "Smith",
"addresses": [
{
"meta": {
"types": [
{
"type": "Home"
},
{
"type": "Home"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Fax"
}
]
}
}
]
}
Note that each entry in types above is now indexed as a separate hidden document linked to the main document.
Aggregation Query:
All that's required is a Nested Aggregation wrapping a Terms Aggregation:
POST nested_test/_search
{
"size": 0,
"aggs": {
"myterms": {
"nested": {
"path": "addresses.meta.types"
},
"aggs": {
"myterms": {
"terms": {
"field": "addresses.meta.types.type",
"size": 10,
"min_doc_count": 2
}
}
}
}
}
}
Note that in the above query I've set min_doc_count to 2 so that only non-unique values (those occurring at least twice) are returned. Remove it (the default is 1) if you want every value, including Fax: 1.
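The difference between the two mappings, and the effect of min_doc_count, can be sketched in plain Python. This is a model of the counting rather than Elasticsearch code, and the helper names are hypothetical. With the object mapping the single document contributes each distinct type once; with the nested mapping each entry of types is counted as its own document.

```python
from collections import Counter

# The types array of the single sample document.
types = ["Home", "Home", "Business", "Business", "Business", "Fax"]


def object_terms(per_doc_types):
    """terms on an object field: each distinct value counts once per document."""
    counts = Counter()
    for doc_types in per_doc_types:
        counts.update(set(doc_types))
    return dict(counts)


def nested_terms(per_doc_types, min_doc_count=1):
    """terms inside a nested aggregation: every array entry is a hidden document."""
    counts = Counter(t for doc_types in per_doc_types for t in doc_types)
    # min_doc_count drops buckets with fewer matching (nested) documents
    return {k: v for k, v in counts.items() if v >= min_doc_count}


print(object_terms([types]))                   # every type counted once
print(nested_terms([types]))                   # {'Home': 2, 'Business': 3, 'Fax': 1}
print(nested_terms([types], min_doc_count=2))  # {'Home': 2, 'Business': 3}
```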
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"myterms" : {
"doc_count" : 6,
"myterms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Business",
"doc_count" : 3
},
{
"key" : "Home",
"doc_count" : 2
}
]
}
}
}
}
Hope that helps!

Elasticsearch Terms Aggregation - for dynamic keys of an object

Documents Structure
Doc_1 {
"title":"hello",
"myObject":{
"key1":"value1",
"key2":"value2"
}
}
Doc_2 {
"title":"hello world",
"myObject":{
"key2":"value4",
"key3":"value3"
}
}
Doc_3 {
"title":"hello world2",
"myObject":{
"key1":"value1",
"key3":"value3"
}
}
Information: myObject contains dynamic key-value pairs.
Objective: My objective is to write an aggregation query that returns the count of every unique dynamic key-value pair.
Attempt and explanation: I can easily get results for known keys in this way.
{
"size":0,
"query":{
"match":{"title":"hello"}
},
"aggs":{
"key1Agg":{
"terms":{"field":"myObject.key1.keyword"}
},
"key2Agg":{
"terms":{"field":"myObject.key2.keyword"}
},
"key3Agg":{
"terms":{"field":"myObject.key3.keyword"}
}
}
}
This is the typical result of the above hardcoded nested keys aggregation.
{
...
"aggregations": {
"key1Agg": {
...
"buckets": [
{
"key": "value1",
"doc_count": 2
}
]
},
"key2Agg": {
...
"buckets": [
{
"key": "value2",
"doc_count": 1
},
{
"key": "value4",
"doc_count": 1
}
]
},
"key3Agg": {
...
"buckets": [
{
"key": "value3",
"doc_count": 2
}
]
}
}
}
Now all I want is to return the count of every dynamic key-value pair, i.e. without hardcoding any key names in the aggregation query.
I am using ES 6.3. Thanks in advance!
From the information you have provided, it appears that myObject is of the object datatype, not the nested datatype.
There is no easy way to do this without modifying your data. Probably the simplest solution is to add an extra field of type keyword, say myObject_list, that stores each key-value pair as a single string, so the documents look like this:
Sample Documents:
POST test_index/_doc/1
{
"title":"hello",
"myObject":{
"key1":"value1",
"key2":"value2"
},
"myObject_list": ["key1_value1", "key2_value2"]
}
POST test_index/_doc/2
{
"title":"hello world",
"myObject":{
"key2":"value4",
"key3":"value3"
},
"myObject_list": ["key2_value4", "key3_value3"]
}
POST test_index/_doc/3
{
"title":"hello world2",
"myObject":{
"key1":"value1",
"key3":"value3"
},
"myObject_list": ["key1_value1", "key3_value3"]
}
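The myObject_list values have to be produced by whatever indexes the documents. A minimal sketch of that step in plain Python, assuming keys and values are strings and that "_" is an acceptable separator (it becomes ambiguous if keys or values themselves contain underscores). to_pair_list is a hypothetical helper, and the Counter only mimics what a terms aggregation on the field should return.

```python
from collections import Counter


def to_pair_list(obj, sep="_"):
    """Flatten {"key1": "value1", ...} into ["key1_value1", ...] for a keyword field."""
    return [f"{key}{sep}{value}" for key, value in sorted(obj.items())]


my_objects = [
    {"key1": "value1", "key2": "value2"},
    {"key2": "value4", "key3": "value3"},
    {"key1": "value1", "key3": "value3"},
]

docs = [{"myObject": obj, "myObject_list": to_pair_list(obj)} for obj in my_objects]
print(docs[0]["myObject_list"])  # ['key1_value1', 'key2_value2']

# Each document contributes each pair once, mirroring the terms doc_count per bucket.
pair_counts = Counter(pair for doc in docs for pair in set(doc["myObject_list"]))
print(dict(pair_counts))
```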
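The query can then be as simple as this:
Request Query:
POST test_index/_search
{
"size": 0,
"aggs": {
"key_value_aggregation": {
"terms": {
"field": "myObject_list",
"size": 10
}
}
}
}
Note that I've used a Terms Aggregation here, and that myObject_list must be mapped as keyword. If the field was created by dynamic mapping, it will be a text field with a keyword sub-field, in which case aggregate on myObject_list.keyword instead.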
Response:
{
"took" : 406,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"key_value_aggregation" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "key1_value1",
"doc_count" : 2
},
{
"key" : "key3_value3",
"doc_count" : 2
},
{
"key" : "key2_value2",
"doc_count" : 1
},
{
"key" : "key2_value4",
"doc_count" : 1
}
]
}
}
}
Hope this helps!

How can I filter doc_count value which is a result of a nested aggregation

How can I filter the doc_count value which is a result of a nested aggregation?
Here is my query:
"aggs": {
"CDIDs": {
"terms": {
"field": "CDID.keyword",
"size": 1000
},
"aggs": {
"my_filter": {
"filter": {
"range": {
"transactionDate": {
"gte": "now-1M/M"
}
}
}
},
"in_active": {
"bucket_selector": {
"buckets_path": {
"doc_count": "_count"
},
"script": "params.doc_count > 4"
}
}
}
}
}
The result of the query looks like:
{
"aggregations" : {
"CDIDs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 2386,
"buckets" : [
{
"key" : "1234567",
"doc_count" : 5,
"my_filter" : {
"doc_count" : 4
}
},
{
"key" : "12345",
"doc_count" : 5,
"my_filter" : {
"doc_count" : 5
}
}
]
}
}
}
I'm trying to filter on the second doc_count value here (the one under my_filter). Let's say I want only buckets where it is > 4: the result should then contain a single bucket, the one whose my_filter doc_count is 5. Can anyone help me with this filter? Please let me know if any additional information is required.
Take a close look at the buckets_path of your bucket_selector aggregation. You simply need to prefix _count with the name of the filter aggregation, i.e. "doc_count": "my_filter>_count".
The buckets_path of a pipeline aggregation has its own syntax, in which > acts as the separator between aggregation names. Refer to the buckets_path syntax section of the Elasticsearch pipeline aggregations documentation for more information.
Aggregation Query
POST <your_index_name>/_search
{
"size":0,
"aggs":{
"CDIDs":{
"terms":{
"field":"CDID.keyword",
"size":1000
},
"aggs":{
"my_filter":{
"filter":{
"range":{
"transactionDate":{
"gte":"now-1M/M"
}
}
}
},
"in_active":{
"bucket_selector":{
"buckets_path":{
"doc_count":"my_filter>_count"
},
"script":"params.doc_count > 4"
}
}
}
}
}
}
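To see why the path matters, here is a simplified plain-Python model of bucket_selector resolving buckets_path against the buckets from your response. The helpers are hypothetical and only _count and single-value metrics are modelled. "_count" alone refers to the terms bucket's own doc_count, which is 5 in both buckets, whereas "my_filter>_count" walks into the filter sub-aggregation first.

```python
buckets = [
    {"key": "1234567", "doc_count": 5, "my_filter": {"doc_count": 4}},
    {"key": "12345", "doc_count": 5, "my_filter": {"doc_count": 5}},
]


def resolve(bucket, path):
    """Resolve a simplified buckets_path such as "my_filter>_count" in one bucket."""
    *agg_names, last = path.split(">")
    node = bucket
    for name in agg_names:  # ">" steps into a named sub-aggregation
        node = node[name]
    # "_count" is the document count; anything else is a single-value metric
    return node["doc_count"] if last == "_count" else node[last]["value"]


def bucket_selector(buckets, path, predicate):
    """Keep only the buckets for which the script (predicate) returns True."""
    return [b for b in buckets if predicate(resolve(b, path))]


def more_than_four(n):
    return n > 4


print([b["key"] for b in bucket_selector(buckets, "_count", more_than_four)])
# ['1234567', '12345']
print([b["key"] for b in bucket_selector(buckets, "my_filter>_count", more_than_four)])
# ['12345']
```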
Hope it helps!
