Sum and count aggregations over Elasticsearch fields

I am new to Elasticsearch and want to run some aggregations over fields in an Elasticsearch 5.x index. The index contains documents with the fields langs (which has a nested structure) and docLang. Both are dynamically mapped. The following are example documents:
DOC 1:
{
"_index":"A",
"_type":"document",
"_id":"1",
"_source":{
"text":"This is a test sentence.",
"langs":{
"X":{
"en":1,
"es":2,
"zh":3
},
"Y":{
"en":4,
"es":5,
"zh":6
}
},
"docLang": "en"
}
}
DOC 2:
{
"_index":"A",
"_type":"document",
"_id":"2",
"_source":{
"text":"This is a test sentence.",
"langs":{
"X":{
"en":1,
"es":2
},
"Y":{
"en":3,
"es":4
}
},
"docLang": "es"
}
}
DOC 3:
{
"_index":"A",
"_type":"document",
"_id":"3",
"_source":{
"text":"This is a test sentence.",
"langs":{
"X":{
"en":1
},
"Y":{
"en":2
}
},
"docLang": "en"
}
}
I want to run a sum aggregation over the langs field so that, for each key (X/Y) and each language, I get the sum across all documents in the index. I also want the count of documents for each language in the docLang field.
For example, for the 3 documents above, the sum aggregation over the langs field would look like this:
"langs":{
"X":{
"en":3,
"es":4,
"zh":3
},
"Y":{
"en":9,
"es":9,
"zh":6
}
}
And the docLang count would look like below:
"docLang":{
"en" : 2,
"es" : 1
}
Also, because of some production environment restrictions, I cannot use scripts in Elasticsearch. Is it possible to do this with plain field-based aggregations on the fields above?

{
"size": 0,
"aggs": {
"X": {
"nested": {
"path": "langs.X"
},
"aggs": {
"X_sum_en": {
"sum": {
"field": "langs.X.en"
}
},
"X_sum_es": {
"sum": {
"field": "langs.X.es"
}
},
"X_sum_zh": {
"sum": {
"field": "langs.X.zh"
}
}
}
},
"Y": {
"nested": {
"path": "langs.Y"
},
"aggs": {
"Y_sum_en": {
"sum": {
"field": "langs.Y.en"
}
},
"Y_sum_es": {
"sum": {
"field": "langs.Y.es"
}
},
"Y_sum_zh": {
"sum": {
"field": "langs.Y.zh"
}
}
}
},
"sum_docLang": {
"terms": {
"field": "docLang.keyword",
"size": 10
}
}
}
}
You didn't mention it, but I think it's important: I mapped X and Y as nested fields:
"langs": {
"properties": {
"X": {
"type": "nested",
"properties": {
"en": {
"type": "long"
},
"es": {
"type": "long"
},
"zh": {
"type": "long"
}
}
},
"Y": {
"type": "nested",
"properties": {
"en": {
"type": "long"
},
"es": {
"type": "long"
},
"zh": {
"type": "long"
}
}
}
}
}
But if your fields are not nested at all (and here I mean the actual nested field type in Elasticsearch), a simple aggregation like this one should be enough:
{
"size": 0,
"aggs": {
"X_sum_en": {
"sum": {
"field": "langs.X.en"
}
},
"X_sum_es": {
"sum": {
"field": "langs.X.es"
}
},
"X_sum_zh": {
"sum": {
"field": "langs.X.zh"
}
},
"Y_sum_en": {
"sum": {
"field": "langs.Y.en"
}
},
"Y_sum_es": {
"sum": {
"field": "langs.Y.es"
}
},
"Y_sum_zh": {
"sum": {
"field": "langs.Y.zh"
}
},
"sum_docLang": {
"terms": {
"field": "docLang.keyword",
"size": 10
}
}
}
}
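Because the request body repeats the same sum block for every key/language pair, it can be generated instead of written by hand. The following is a minimal Python sketch for the non-nested variant, assuming the keys and languages are known up front; the function names build_sum_query and reshape are made up for illustration, and no Elasticsearch client is involved. The second helper folds the flat aggregation results back into the layout asked for in the question.

```python
from typing import Dict, Iterable


def build_sum_query(keys: Iterable[str], langs: Iterable[str]) -> dict:
    """Build the flat (non-nested) request body: one sum per key/language pair."""
    lang_list = list(langs)
    aggs = {
        f"{key}_sum_{lang}": {"sum": {"field": f"langs.{key}.{lang}"}}
        for key in keys
        for lang in lang_list
    }
    # Document counts per docLang value, same as in the query above.
    aggs["sum_docLang"] = {"terms": {"field": "docLang.keyword", "size": 10}}
    return {"size": 0, "aggs": aggs}


def reshape(aggregations: dict) -> dict:
    """Fold the 'aggregations' part of the response into {"langs": ..., "docLang": ...}."""
    langs: Dict[str, Dict[str, float]] = {}
    for name, result in aggregations.items():
        if "_sum_" not in name:
            continue  # skips sum_docLang, which is handled below
        key, lang = name.split("_sum_")
        langs.setdefault(key, {})[lang] = result["value"]
    doc_lang = {
        bucket["key"]: bucket["doc_count"]
        for bucket in aggregations["sum_docLang"]["buckets"]
    }
    return {"langs": langs, "docLang": doc_lang}
```

Calling build_sum_query(["X", "Y"], ["en", "es", "zh"]) produces the same seven aggregations as the request above.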

Related

Aggregate, sort and paginate on nested documents

I'm managing a product index, with product sales and other KPIs under a nested field.
I'm trying to sort based on a nested aggregation and paginate, with no success.
Below is a simplified version of my mapping, for the sake of the example -
{
"product_type":
{
"type": "keyword"
},
"family":
{
"type": "keyword"
},
"rootdomain":
{
"type": "keyword"
},
"kpis":
{
"type": "nested",
"properties":
{
"sales_1d":
{
"type": "float"
},
"timestamp":
{
"type": "date",
"format": "strict_date_optional_time_nanos"
},
"views_1d":
{
"type": "float"
}
}
}
}
My aggregation is similar to the one below-
{
"aggs": {
"group_by_family": {
"aggs": {
"nested_aggregation": {
"aggs": {
"range_filtered": {
"aggs": {
"sales_1d": {
"sum": {
"field": "kpis.sales_1d"
}
},
"views_1d": {
"sum": {
"field": "kpis.views_1d"
}
},
"reverse_nesting": {
"aggs": {
"docs": {
"top_hits": {
"size": 1,
"sort": [
{
"_id": {
"order": "asc"
}
}
],
"_source": {
"includes": [
"_id",
"family",
"rootdomain",
"product_type"
]
}
}
}
},
"reverse_nested": {}
}
},
"filter": {
"range": {
"kpis.timestamp": {
"format": "basic_date_time_no_millis",
"gte": "20220721T000000Z",
"lte": "20220918T235959Z"
}
}
}
}
},
"nested": {
"path": "kpis"
}
}
},
"terms": {
"field": "family",
"size": 10
}
}
},
"query": {
//some query to filter by product-type and rootdomain
},
"size": 0
}
I'm aware that I can add an order clause to the terms aggregation to order the aggregated results.
My goal, though, is to paginate the aggregated results: I want to retrieve the 1-10 best-selling products in order,
later retrieve the 11-20 best-selling products, and so on.
I've tried using bucket_sort under range_filtered, but I'm getting an error:
class org.elasticsearch.search.aggregations.bucket.filter.InternalFilter cannot be cast to class org.elasticsearch.search.aggregations.InternalMultiBucketAggregation
I'm not sure how to proceed from here. Is this possible? If not, is there any workaround?
Thanks.
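One direction that may be worth trying, offered as a sketch and not as a confirmed fix: the error message suggests bucket_sort needs a multi-bucket parent, so it could be placed directly under the group_by_family terms aggregation, as a sibling of nested_aggregation, and pointed at the nested sum through a buckets path. The Python helper below only builds the request fragment; the function name, the "page" aggregation name, and the max_families parameter are made up for illustration, and the terms size has to be large enough to cover every bucket that could land on any page.

```python
def paginated_family_aggs(page: int, page_size: int = 10, max_families: int = 1000) -> dict:
    """Build an 'aggs' fragment that sorts family buckets by nested sales and slices one page.

    `page` is 1-based: page=1 returns ranks 1-10, page=2 returns ranks 11-20, and so on.
    """
    return {
        "group_by_family": {
            # bucket_sort can only reorder buckets the terms aggregation returned,
            # so this size is an upper bound on the number of families (assumption).
            "terms": {"field": "family", "size": max_families},
            "aggs": {
                "nested_aggregation": {
                    "nested": {"path": "kpis"},
                    "aggs": {
                        "range_filtered": {
                            "filter": {
                                "range": {
                                    "kpis.timestamp": {
                                        "format": "basic_date_time_no_millis",
                                        "gte": "20220721T000000Z",
                                        "lte": "20220918T235959Z",
                                    }
                                }
                            },
                            "aggs": {"sales_1d": {"sum": {"field": "kpis.sales_1d"}}},
                        }
                    },
                },
                # Sibling of nested_aggregation, so its parent is the terms aggregation.
                "page": {
                    "bucket_sort": {
                        "sort": [
                            {"nested_aggregation>range_filtered>sales_1d": {"order": "desc"}}
                        ],
                        "from": (page - 1) * page_size,
                        "size": page_size,
                    }
                },
            },
        }
    }
```

With this layout, paginated_family_aggs(page=2) asks for ranks 11-20; the top_hits and reverse_nested parts of the original request are left out here only to keep the sketch short.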

Compose nested aggregations

I'm sorry for any English mistakes.
I hope that someone can help me.
Suppose that I have the following mapping for my index:
PUT test-index
{
"mappings": {
"properties": {
"nestedOBJField": {
"type": "nested",
"index": true
},
"keywordField": {
"type": "keyword",
"index": true
}
}
}
}
Is it possible to use the composite aggregation with nested fields?
It would be very handy if I could do something like this:
GET /test-index/_search
{
"size": 0,
"aggs": {
"TestAgg": {
"composite": {
"size": 10000,
"sources": [
{
"keyWordFieldAgg": {
"terms": {
"field": "keyWordField"
}
}
},
{
"nestedFieldAgg": {
"terms": {
"field": "nestedOBJField.attribute"
}
}
}
]
}
}
}
}
But this approach returns several errors.
I would appreciate it a lot if someone could help.
The property nestedOBJField is of data type "nested", while keyWordField is of type keyword and sits at the same level as nestedOBJField.
To use nested fields in an aggregation, you need the nested aggregation, but then all sources in the composite aggregation must be nested as well. This open issue can tell more about it.
You can use the following workarounds.
Move keyWordField inside the nested object in your documents.
{
"mappings": {
"properties": {
"nestedOBJField": {
"type": "nested",
"properties":{
"keywordField": {
"type": "keyword"
}
}
}
}
}
}
Sample Document
{
"nestedOBJField":[
{
"attribute":"1",
"age":1,
"keywordField":"xyz"
},
{
"attribute":"2",
"age":2,
"keywordField":"xyz"
}
]
}
Query
"aggs": {
"TestAgg": {
"nested": {
"path": "nestedOBJField"
},
"aggs": {
"name": {
"composite": {
"size": 10000,
"sources": [
{
"nestedFieldAgg": {
"terms": {
"field": "nestedOBJField.attribute.keyword"
}
}
},
{
"a":{
"terms": {
"field": "nestedOBJField.keywordField"
}
}
}
]
}
}
}
}
}
Moving the field inside the nested property means data duplication, and any update to it has to be applied to all nested documents.
Alternatively, use a terms aggregation; pagination will be an issue in this case:
{
"size": 0,
"aggs": {
"TestAgg": {
"nested": {
"path": "nestedOBJField"
},
"aggs": {
"name": {
"terms": {
"field": "nestedOBJField.attribute.keyword",
"size": 10
},
"aggs": {
"back_to_parent": {
"reverse_nested": {},
"aggs": {
"keywords": {
"terms": {
"field": "keywordField",
"size": 10
}
}
}
}
}
}
}
}
}
}
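For the composite workaround, paging is normally done by feeding the after_key of one response into the "after" parameter of the next request. Below is a small Python sketch of that step, assuming the request uses the aggregation names from the composite query above (TestAgg and name); the helper name next_composite_request is made up, and the response is treated as a plain dict, so no particular client library is implied.

```python
import copy
from typing import Optional


def next_composite_request(request: dict, response: dict) -> Optional[dict]:
    """Return the request for the next composite page, or None when paging is done."""
    result = response["aggregations"]["TestAgg"]["name"]
    after_key = result.get("after_key")
    # An empty page, or a page without after_key, means there is nothing left to fetch.
    if not result["buckets"] or after_key is None:
        return None
    # Copy so the caller's original request body is left untouched.
    following = copy.deepcopy(request)
    following["aggs"]["TestAgg"]["aggs"]["name"]["composite"]["after"] = after_key
    return following
```

A caller would loop: send the request, process the buckets, then replace the request with the helper's return value until it is None.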

Perform multi-field / multi-dimensional aggregations with nested fields in Elasticsearch

I am tracking the attendance of a few students and storing their details in the index as shown below.
Each doc in "entries" has a few other fields. The following data shows that a student attended 6 classes on "Monday".
"entries" is of type "nested".
{
"reg_id": 1111,
"entries": [
{
"id": "123",
"day": "Monday"
},
{
"id": "1234",
"attendance": true
},
{
"id": "12345",
"classes_attended": 6
}
]
}
I want the count of students for each classes_attended value, for each day.
For example: "72 entries of students found for "Monday" who attended 6 classes".
Sample desired output (this is just a sample; I am completely fine if the output schema changes):
[
{
"day": "monday",
"classes_attended": 6,
"count": 4
},
{
"day": "monday",
"classes_attended": 1,
"count": 5
},
{
"day": "tuesday",
"classes_attended": 5,
"count": 2
},
{
"day": "tuesday",
"classes_attended": 6,
"count": 1
}
]
I'm not sure how to start with the aggregation query.
I tried the following query, but I know it's not the correct solution:
"aggs": {
"attendance_aggs": {
"nested": {
"path": "entries"
},
"aggs": {
"days": {
"terms": {
"field": "entries.day"
},
"aggs": {
"attended": {
"reverse_nested": {},
"aggs":{
"class_attended_day": {
"terms": {
"field": "entries.classes_attended"
},
"aggs": {
"class_attended_days_count": {
"reverse_nested": {},
"aggs": {
"classes_attended_final": {
"cardinality": {
"field": "entries.class_attended"
}
}
}
}
}
}
}
}
}
}
}
}
}
It's unclear what the top-level object is, so I'm going to assume it's a "student attendance entry per day". I'm also unsure what the entries.id values represent, but I'll assume you'll need them at some later point, so I'll keep them untouched.
Now, since all that your entries objects have in common is the id, they can be decoupled. You should use nested if and only if the objects share attributes whose connections to one another need to be preserved. Since I don't see entries.id anywhere in your aggs, I'd recommend the following adjustments to your mapping:
PUT students
{
"mappings": {
"properties": {
"day": { ------------
"type": "keyword" |
}, |
"attendance": { |
"type": "boolean" | <--
}, |
"classes_attended": { |
"type": "integer" |
}, ------------
"entries": {
"type": "nested",
"properties": {
"day": {
"type": "keyword",
"copy_to": "day" <--
},
"attendance": {
"type": "boolean",
"copy_to": "attendance" <--
},
"classes_attended": {
"type": "integer",
"copy_to": "classes_attended" <--
}
}
}
}
}
}
and here's your query:
GET students/_search
{
"size": 0,
"aggs": {
"days": {
"terms": {
"field": "day"
},
"aggs": {
"classes_attended": {
"terms": {
"field": "classes_attended"
},
"aggs": {
"student_count": {
"cardinality": {
"field": "_id"
}
}
}
}
}
}
}
}
The response can then be post-processed into whatever you prefer.
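As one way to do that post-processing, here is a short Python sketch that flattens the response of the query above into the row layout from the question. It assumes the response is available as a plain dict and uses the aggregation names from the query (days, classes_attended, student_count); the function name flatten_attendance is made up for illustration.

```python
from typing import List


def flatten_attendance(response: dict) -> List[dict]:
    """Turn the two-level terms response into one row per (day, classes_attended)."""
    rows = []
    for day_bucket in response["aggregations"]["days"]["buckets"]:
        for attended_bucket in day_bucket["classes_attended"]["buckets"]:
            rows.append(
                {
                    "day": day_bucket["key"],
                    "classes_attended": attended_bucket["key"],
                    # cardinality returns an approximate distinct count
                    "count": attended_bucket["student_count"]["value"],
                }
            )
    return rows
```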
EDIT
You could hijack reverse_nested, but you'll need to step back into the nested context afterwards, since you're referencing other nested entries:
GET students/_search
{
"size": 0,
"aggs": {
"attendance_aggs": {
"nested": {
"path": "entries"
},
"aggs": {
"days": {
"terms": {
"field": "entries.day"
},
"aggs": {
"attended": {
"reverse_nested": {},
"aggs": {
"class_attended_day": {
"nested": {
"path": "entries"
},
"aggs": {
"class_attended_day": {
"terms": {
"field": "entries.classes_attended"
},
"aggs": {
"classes_attended_final": {
"cardinality": {
"field": "entries.classes_attended"
}
}
}
}
}
}
}
}
}
}
}
}
}
}

Unable to create nested date aggregation query

I am trying to create an Elasticsearch aggregation query which can generate the sum or average of a value across all my ingested documents.
The documents are of the format -
{
"weather":"cold",
"date_1":"2017/07/05",
"feedback":[
{
"date_2":"2017/08/07",
"value":28,
"comment":"not cold"
},{
"date_2":"2017/08/09",
"value":48,
"comment":"a bit chilly"
},{
"date_2":"2017/09/07",
"value":18,
"comment":"very cold"
}, ...
]
}
I am able to create a sum aggregation of all "feedback.value" using "date_1" by using the following request -
GET _search
{
"query": {
"query_string": {
"query": "cold"
}
},
"size": 0,
"aggs": {
"temperature": {
"date_histogram":{
"field" : "date_1",
"interval" : "month"
},
"aggs":{
"temperature_agg":{
"terms": {
"field": "feedback.value"
}
}
}
}
}
}
However, I need to generate the same query across all documents, aggregated based on "feedback.date_2". I am not sure if Elasticsearch can resolve such an aggregation or how to approach it. Any guidance would be helpful.
[EDIT]
Mapping file (I only define the nested items; ES identifies the other fields on its own):
{
"mappings": {
"catalog_item": {
"properties": {
"feedback":{
"type":"nested",
"properties":{
"date_2":{
"type": "date",
"format":"YYYY-MM-DD"
},
"value": {
"type": "float"
},
"comment": {
"type": "text"
}
}
}
}
}
}
}
You would need to make use of nested documents and sum aggregation.
Here's a working example:
Sample Mapping:
PUT test
{
"mappings": {
"doc": {
"properties": {
"feedback": {
"type": "nested"
}
}
}
}
}
Add Sample document:
PUT test/doc/1
{
"date_1": "2017/08/07",
"feedback": [
{
"date_2": "2017/08/07",
"value": 28,
"comment": "not cold"
},
{
"date_2": "2017/08/09",
"value": 48,
"comment": "a bit chilly"
},
{
"date_2": "2017/09/07",
"value": 18,
"comment": "very cold"
}
]
}
Calculate both the sum and average based on date_2.
GET test/_search
{
"size": 0,
"aggs": {
"temperature_aggregation": {
"nested": {
"path": "feedback"
},
"aggs": {
"temperature": {
"date_histogram": {
"field": "feedback.date_2",
"interval": "month"
},
"aggs": {
"sum": {
"sum": {
"field": "feedback.value"
}
},
"avg": {
"avg": {
"field": "feedback.value"
}
}
}
}
}
}
}
}
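To sanity-check what the monthly buckets should contain, the same grouping can be reproduced locally on the sample document. The Python sketch below is only an emulation of the date_histogram plus sum/avg logic, not a call to Elasticsearch; the function name and the "YYYY-MM" month labels are choices made for this illustration.

```python
from collections import defaultdict
from datetime import datetime

# The feedback entries of the sample document indexed above.
feedback = [
    {"date_2": "2017/08/07", "value": 28, "comment": "not cold"},
    {"date_2": "2017/08/09", "value": 48, "comment": "a bit chilly"},
    {"date_2": "2017/09/07", "value": 18, "comment": "very cold"},
]


def monthly_sum_and_avg(entries):
    """Group entries by the month of date_2 and compute sum and average of value."""
    values_by_month = defaultdict(list)
    for entry in entries:
        month = datetime.strptime(entry["date_2"], "%Y/%m/%d").strftime("%Y-%m")
        values_by_month[month].append(entry["value"])
    return {
        month: {"sum": sum(values), "avg": sum(values) / len(values)}
        for month, values in sorted(values_by_month.items())
    }


print(monthly_sum_and_avg(feedback))
# {'2017-08': {'sum': 76, 'avg': 38.0}, '2017-09': {'sum': 18, 'avg': 18.0}}
```

For a single document, the nested aggregation should report the same two buckets: August with a sum of 76 and an average of 38, and September with 18 for both.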

Right way to access parent field in Elasticsearch nested aggs script

Elasticsearch Version: 5.6.3
I have a mapping like this:
PUT /my_stock
{
"mappings": {
"stock": {
"properties": {
"industry": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"rate": {
"type": "double"
}
}
},
"changeRatio": {
"type": "double"
}
}
}
}
}
Data:
POST /_bulk
{"index":{"_index":"my_stock","_type":"stock","_id":null}}
{"industry":[{"name":"Technology","rate":0.6},{"name":"Health", "rate":0.2}],"changeRatio":0.1}
{"index":{"_index":"my_stock","_type":"stock","_id":null}}
{"industry":[{"name":"Health", "rate":0.3},{"name":"Education", "rate":0.2}],"changeRatio":0.2}
{"index":{"_index":"my_stock","_type":"stock","_id":null}}
{"industry":[{"name":"Health","rate":0.5},{"name":"Education","rate":0.2}],"changeRatio":-0.3}
{"index":{"_index":"my_stock","_type":"stock","_id":null}}
{"industry":[{"name":"Technology","rate":0.3},{"name":"Education","rate":0.3}],"changeRatio":0.4}
{"index":{"_index":"my_stock","_type":"stock","_id":null}}
{"industry":[{"name":"Education","rate":0.3},{"name":"Technology","rate":0.1}],"changeRatio":-0.5}
I then want to build an aggs query like this:
GET my_stock/stock/_search
{
"size": 0,
"aggs": {
"industry": {
"nested": {
"path": "industry"
},
"aggs": {
"groups": {
"terms": {
"field": "industry.name",
"order": {
"rate": "desc"
}
},
"aggs": {
"rate": {
"sum": {
"script": {
"source": "doc['changeRatio'].value * doc['industry.rate'].value"
}
}
}
}
}
}
}
}
}
but "doc['changeRatio'].value" doesn't return the right value; it always returns 0.
Another query, like this:
GET my_stock/stock/_search
{
"size": 0,
"aggs": {
"industry": {
"nested": {
"path": "industry"
},
"aggs": {
"groups": {
"terms": {
"field": "industry.name",
"order":{
"reverse>rate":"desc"
}
},
"aggs": {
"reverse": {
"reverse_nested": {},
"aggs": {
"rate": {
"sum": {
"script": {
"source": "doc['changeRatio'].value * doc['industry.rate'].value"
}
}
}
}
}
}
}
}
}
}
}
Here "doc['changeRatio'].value" is right, but "doc['industry.rate'].value" returns 0.
I referred to this question: Elasticsearch 5.4: Use normal and nested fields in same Painless script query?
1. { params['_source']['changeRatio'] } or { params['_source']['industry.rate'] } does not work in this version
2. "copy_to" is stored as a multi-value field, so it does not work either
How can I write a correct script that computes "changeRatio * industry.rate"?
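For reference, the numbers a correct script should produce can be computed outside Elasticsearch from the five bulk documents above. The Python sketch below is only such a reference calculation (sum of changeRatio * industry.rate per industry name); it does not solve the parent-field access problem inside a nested context, and the function name is made up for illustration.

```python
from collections import defaultdict

# The five documents from the bulk request above.
stocks = [
    {"industry": [{"name": "Technology", "rate": 0.6}, {"name": "Health", "rate": 0.2}], "changeRatio": 0.1},
    {"industry": [{"name": "Health", "rate": 0.3}, {"name": "Education", "rate": 0.2}], "changeRatio": 0.2},
    {"industry": [{"name": "Health", "rate": 0.5}, {"name": "Education", "rate": 0.2}], "changeRatio": -0.3},
    {"industry": [{"name": "Technology", "rate": 0.3}, {"name": "Education", "rate": 0.3}], "changeRatio": 0.4},
    {"industry": [{"name": "Education", "rate": 0.3}, {"name": "Technology", "rate": 0.1}], "changeRatio": -0.5},
]


def weighted_rate_by_industry(docs):
    """Sum changeRatio * rate over every nested industry entry, grouped by industry name."""
    totals = defaultdict(float)
    for doc in docs:
        for industry in doc["industry"]:
            totals[industry["name"]] += doc["changeRatio"] * industry["rate"]
    return dict(totals)


for name, total in sorted(weighted_rate_by_industry(stocks).items(), key=lambda kv: -kv[1]):
    print(name, round(total, 4))
```

Up to floating-point rounding, this gives Technology 0.13, Education -0.05 and Health -0.07, which is the descending order the terms aggregation is meant to return.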

Resources