Elasticsearch aggregate on nested JSON data - elasticsearch

I have to do some aggregation on json data. I saw multiple answers here on stackoverflow but not nothing worked for me.
I have multiple row and in timeCountry column i have an array which stores JSON objects. with keys count, country_name, s_name.
I have to find the sum of all the rows according to s_name,
Example - if in 1st row timeCountry holds array like below
[ {
"count": 12,
"country_name": "america",
"s_name": "us"
},
{
"count": 10,
"country_name": "new zealand",
"s_name": "nz"
},
{
"count": 20,
"country_name": "India",
"s_name": "Ind"
}]
Row 2 data is like below
[{
"count": 12,
"country_name": "america",
"s_name": "us"
},
{
"count": 10,
"country_name": "South Africa",
"s_name": "sa"
},
{
"count": 20,
"country_name": "india",
"s_name": "ind"
}]
like so on.
I need result like below
[{
"count": 24,
"country_name": "america",
"s_name": "us"
}, {
"count": 10,
"country_name": "new zealand",
"s_name": "nz"
},
{
"count": 40,
"country_name": "India",
"s_name": "Ind"
}, {
"count": 10,
"country_name": "South Africa",
"s_name": "sa"
}
]
the above data is for only one row i have multiple rows timeCountry is column
What I tried writing for aggregation
{
"query": {
"match_all": {}
},
"aggregations":{
"records" :{
"nested":{
"path":"timeCountry"
},
"aggregations":{
"ids":{
"terms":{
"field": "timeCountry.country_name"
}
}
}
}
}
}
But its not working Please help

I tried this on my local elastic cluster and I was able to get aggregated data on the nested documents. Depending on your mapping of index the answer may vary from mine. Following is the DSL that I tried with for aggregation :
{
"aggs" : {
"records" : {
"nested" : {
"path" : "timeCountry"
},
"aggs" : {
"ids" : { "terms" : {
"field" : "timeCountry.country_name.keyword"
},
"aggs": {"sum_name": { "sum" : { "field" : "timeCountry.count" } } }
}
}
}
}
}
Following is the mapping of my index:
{
"settings" : {
"number_of_shards" : 1
},
"mappings": {
"agg_data" : {
"properties" : {
"timeCountry" : {
"type" : "nested"
}
}
}
}
}

Related

ElasticSearch Max Agg on lowest value inside a list property of the document

I'm looking to do a Max aggregation on a value of the property under my document, the property is a list of complex object (key and value). Here's my data:
[{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
},
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}]
When I do the Nested Max Aggregation on "listItems.value", I'm expecting the max value returned to be 200 (and not 5000), reason being I want the logic to first figure the MIN value under listItems for each document, then doing the Max Aggregation on that. Is it possible to do something like this?
Thanks.
The search query performs the following aggregation :
Terms aggregation on the id field
Min aggregation on listItems.value
Max bucket aggregation that is a sibling pipeline aggregation which identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of the bucket(s).
Please refer to nested aggregation, to get a detailed explanation on it.
Adding a working example with index data, index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"listItems": {
"type": "nested"
},
"id":{
"type":"text",
"fielddata":"true"
}
}
}
}
Index Data:
{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
}
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}
Search Query:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id"
},
"aggs": {
"nested_entries": {
"nested": {
"path": "listItems"
},
"aggs": {
"min_position": {
"min": {
"field": "listItems.value"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": "2",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 200.0
}
}
}
]
},
"maxValue": {
"value": 200.0,
"keys": [
"2"
]
}
}
Initial post was mentioning nested aggregation, thus i was sure question is about nested documents. Since i've come to solution before seeing another answer, i'm keeping the whole thing for history, but actually it differs only in adding nested aggregation.
The whole process can be explained like that:
Bucket each document into single bucket.
Use nested aggregation to be able to aggregate on nested documents.
Use min aggregation to find minimum value within all document nested documents, and by that, for document itself.
Finally, use another aggregation to calculate maximum value among results of previous aggregation.
Given this setup:
// PUT /index
{
"mappings": {
"properties": {
"children": {
"type": "nested",
"properties": {
"value": {
"type": "integer"
}
}
}
}
}
}
// POST /index/_doc
{
"children": [
{ "value": 12 },
{ "value": 45 }
]
}
// POST /index/_doc
{
"children": [
{ "value": 7 },
{ "value": 35 }
]
}
I can use those aggregations in request to get required value:
{
"size": 0,
"aggs": {
"document": {
"terms": {"field": "_id"},
"aggs": {
"children": {
"nested": {
"path": "children"
},
"aggs": {
"minimum": {
"min": {
"field": "children.value"
}
}
}
}
}
},
"result": {
"max_bucket": {
"buckets_path": "document>children>minimum"
}
}
}
}
{
"aggregations": {
"document": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O4QxyHQBK5VO9CW5xJGl",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 7.0
}
}
},
{
"key": "OoQxyHQBK5VO9CW5kpEc",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 12.0
}
}
}
]
},
"result": {
"value": 12.0,
"keys": [
"OoQxyHQBK5VO9CW5kpEc"
]
}
}
}
There also should be a workaround using script for calculating max - all that you will need to do is just find and return smallest value in document in such script.

How to get multiple fields returned in elasticsearch query?

How to get multiple fields returned that are unique using elasticsearch query?
All of my documents have duplicate name and job fields. I would like to use an es query to get all the unique values which include the name and job in the same response, so they are tied together.
[
{
"name": "albert",
"job": "teacher",
"dob": "11/22/91"
},
{
"name": "albert",
"job": "teacher",
"dob": "11/22/91"
},
{
"name": "albert",
"job": "teacher",
"dob": "11/22/91"
},
{
"name": "justin",
"job": "engineer",
"dob": "1/2/93"
},
{
"name": "justin",
"job": "engineer",
"dob": "1/2/93"
},
{
"name": "luffy",
"job": "rubber man",
"dob": "1/2/99"
}
]
Expected result in any format -> I was trying to use aggs but I only get one field
[
{
"name": "albert",
"job": "teacher"
},
{
"name": "justin",
"job": "engineer"
},
{
"name": "luffy",
"job": "rubber man"
},
]
This is what I tried so far
GET name.test.index/_search
{
"size": 0,
"aggs" : {
"name" : {
"terms" : { "field" : "name.keyword" }
}
}
}
using the above query gets me this which is good that its unique
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 95,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Justin",
"doc_count" : 56
},
{
"key" : "Luffy",
"doc_count" : 31
},
{
"key" : "Albert",
"doc_count" : 8
}
]
}
}
}
I tried doing nested aggregation but that did not work. Is there an alternative solution for getting multiple unique values or am I missing something?
That's a good start! There are a few ways to achieve what you want, each provides a different response format, so you can decide which one you prefer.
The first option is to leverage the top_hits sub-aggregation and return the two fields for each name bucket:
GET name.test.index/_search
{
"size": 0,
"aggs": {
"name": {
"terms": {
"field": "name.keyword"
},
"aggs": {
"top": {
"top_hits": {
"_source": [
"name",
"job"
],
"size": 1
}
}
}
}
}
}
The second option is to use a script in your terms aggregation instead of a field to return a compound value:
GET name.test.index/_search
{
"size": 0,
"aggs": {
"name": {
"terms": {
"script": "doc['name'].value + ' - ' + doc['job'].value"
}
}
}
}
The third option is to use two levels of field collapsing:
GET name.test.index/_search
{
"collapse": {
"field": "name",
"inner_hits": {
"name": "by_job",
"collapse": {
"field": "job"
},
"size": 1
}
}
}

Counting non-unique items in an Elasticsearch aggregation?

I'm trying to use an Elasticsearch aggregation to return all non-unique counts for each term within a bucket.
Given a mapping:-
{
"properties": {
"addresses": {
"properties": {
"meta": {
"properties": {
"types": {
"properties": {
"type": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
And a document:-
{
"id": 3,
"first_name": "James",
"last_name": "Smith",
"addresses": [
{
"meta": {
"types": [
{
"type": "Home"
},
{
"type": "Home"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Fax"
}
]
}
}
]
}
The following terms aggregation:-
GET /test/_search
{
"size": 0,
"query": {
"match": {
"id": 3
}
},
"aggs": {
"types": {
"terms": {
"field": "addresses.meta.types.type"
}
}
}
}
Gives this result:-
"aggregations" : {
"types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Business",
"doc_count" : 1
},
{
"key" : "Fax",
"doc_count" : 1
},
{
"key" : "Home",
"doc_count" : 1
}
]
}
}
As you can see the terms are unique and I'm really after a total count of each e.g. Home: 2, Business: 3 and Fax: 1.
Is this possible?
I had a look at value_count but as it's not a bucket aggregation it seems a little less convenient to use. Alternatively possible a script might do it but I'm not too sure on the syntax.
Thanks!
I doubt if that is possible using object type in Elasticsearch. The reason is that most of the metrics aggregations is w.r.t the count of documents for particular occurrence of word and not counts of occurrence of words in documents.
You may have to change the type of your field type to nested so that ES would end up saving each type inside types as separate document.
I've provided sample mapping, document(no change in representation), aggregation query and response below.
Sample Mapping:
PUT nested_test
{
"mappings":{
"properties":{
"id":{
"type":"integer"
},
"first_name":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword"
}
}
},
"second_name":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword"
}
}
},
"addresses":{
"properties":{
"meta":{
"properties":{
"types":{
"type":"nested", <----- Note this
"properties":{
"type":{
"type":"keyword"
}
}
}
}
}
}
}
}
}
}
Sample Document (No change)
POST nested_test/_doc/1
{
"id": 3,
"first_name": "James",
"last_name": "Smith",
"addresses": [
{
"meta": {
"types": [
{
"type": "Home"
},
{
"type": "Home"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Fax"
}
]
}
}
]
}
Note that every type above is now considered as a separate document linked to the main document.
Aggregation Query:
All that would be required is to make use of Nested Aggregation + Terms Aggregation
POST nested_test/_search
{
"size": 0,
"aggs": {
"myterms": {
"nested": {
"path": "addresses.meta.types"
},
"aggs": {
"myterms": {
"terms": {
"field": "addresses.meta.types.type",
"size": 10,
"min_doc_count": 2 <----- Note this to filter only values with non unique counts
}
}
}
}
}
}
Note that in the above query I've made use of min_doc_count in order to restrict the results as per what you are looking for.
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"myterms" : {
"doc_count" : 6,
"myterms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Business",
"doc_count" : 3
},
{
"key" : "Home",
"doc_count" : 2
}
]
}
}
}
}
Hope that helps!

Multiple key aggregation in ElasticSearch

I am new to Elastic Search and was exploring aggregation query. The documents I have are in the format -
{"name":"A",
"class":"10th",
"subjects":{
"S1":92,
"S2":92,
"S3":92,
}
}
We have about 40k such documents in our ES with the Subjects varying from student to student. The query to the system can be to aggregate all subject-wise scores for a given class. We tried to create a bucket aggregation query as explained in this guide here, however, this generates a single bucket per document and in our understanding requires an explicit mention of every subject.
We want to system to generate subject wise aggregate for the data by executing a single aggregation query, the problem I face is that in our data the subjects could vary from student to student and we don't have a global list of subject keys.
We wrote the following script but this only works if we know all possible subjects.
GET student_data_v1_1/_search
{ "query" :
{"match" :
{ "class" : "' + query + '" }},
"aggs" : { "my_buckets" : { "terms" :
{ "field" : "subjects", "size":10000 },
"aggregations": {"the_avg":
{"avg": { "field": "subjects.value" }}} }},
"size" : 0 }'
but this query only works for the document structure, but does not work multiple subjects are defined where we may not know the key-pair -
{"name":"A",
"class":"10th",
"subjects":{
"value":93
}
}
An alternate form the document is present is that the subject is a list of dictionaries -
{"name":"A",
"class":"10th",
"subjects":[
{"S1":92},
{"S2":92},
{"S3":92},
]
}
Having an aggregation query to solve either of the 2 document formats would be helpful.
======EDITS======
After updating the document to hold weights for each subject -
{
class": "10th",
"subject": [
{
"name": "s1",
"marks": 90,
"weight":30
},
{
"name": "s2",
"marks": 80,
"weight":70
}
]}
I have updated the query to be -
{
"query": {
"match": {
"class": "10th"
}
},
"aggs": {
"subjects": {
"nested": {
"path": "scores"
},
"aggs": {
"subjects": {
"terms": {
"field": "subject.name"
},
"aggs" : { "weighted_grade": { "weighted_avg": { "value": { "field": "subjects.score" }, "weight": { "field": "subjects.weight" } } } }
}
}
}
}
},
"size": 0
}
but it throws the error-
{u'error': {u'col': 312,
u'line': 1,
u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
u'root_cause': [{u'col': 312,
u'line': 1,
u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
u'type': u'unknown_named_object_exception'}],
u'type': u'unknown_named_object_exception'},
u'status': 400}
To achieve the required result I would suggest you to keep your index mapping as follows:
{
"properties": {
"class": {
"type": "keyword"
},
"subject": {
"type": "nested",
"properties": {
"marks": {
"type": "integer"
},
"name": {
"type": "keyword"
}
}
}
}
}
In the mapping above I have created subject as nested type with two properties, name to hold subject name and marks to hold marks in the subject.
Sample doc:
{
"class": "10th",
"subject": [
{
"name": "s1",
"marks": 90
},
{
"name": "s2",
"marks": 80
}
]
}
Now you can use nested aggregation and multilevel aggregation (i.e. aggregation inside aggregation). I used nested aggregation with terms aggregation for subject.name to get bucket containing all the available subjects. Then to get avg for each subject we add a child aggregation of avg to the subjects aggregation as below:
{
"query": {
"match": {
"class": "10th"
}
},
"aggs": {
"subjects": {
"nested": {
"path": "subject"
},
"aggs": {
"subjects": {
"terms": {
"field": "subject.name"
},
"aggs": {
"avg_score": {
"avg": {
"field": "subject.marks"
}
}
}
}
}
}
},
"size": 0
}
NOTE: I have added "size" : 0 so that elastic doesn't return matching docs in the result. To include or exclude it depends totally on your use case.
Sample result:
{
"took": 25,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": [
]
},
"aggregations": {
"subjects": {
"doc_count": 6,
"subjects": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "s1",
"doc_count": 3,
"avg_score": {
"value": 80
}
},
{
"key": "s2",
"doc_count": 2,
"avg_score": {
"value": 75
}
},
{
"key": "s3",
"doc_count": 1,
"avg_score": {
"value": 80
}
}
]
}
}
}
}
As you can see the result contains buckets with key as subject name and avg_score.value as the avg of marks.
UPDATE to include weighted_avg:
{
"query": {
"match": {
"class": "10th"
}
},
"aggs": {
"subjects": {
"nested": {
"path": "subject"
},
"aggs": {
"subjects": {
"terms": {
"field": "subject.name"
},
"aggs": {
"avg_score": {
"avg": {
"field": "subject.marks"
}
},
"weighted_grade": {
"weighted_avg": {
"value": {
"field": "subject.marks"
},
"weight": {
"field": "subject.weight"
}
}
}
}
}
}
}
},
"size": 0
}

elastic search query for aggregating on sub-fields of array

My elastic search index contain data in such format, the data is an array of object which contain date , order, visit for that date on that term :-
{
"term": "ふるさと納税",
"data": [
{
"date": "2018-01-25",
"visits": 17670,
"ranking": 1,
"orders": 154
},
{
"date": "2018-02-14",
"visits": 13758,
"ranking": 1,
"orders": 116
},
{
"date": "2017-12-24",
"visits": 142578,
"ranking": 1,
"orders": 2565
},
{
"date": "2018-03-08",
"visits": 21799,
"ranking": 1,
"orders": 312
}
]
},{
"term": "帯 中古 振袖",
"data": [
{
"date": "2018-01-30",
"ranking": 2966,
"orders": 0,
"visits": 345
}
]
}
i would like to sum all the visits and orders for the term within a defined date range
I have created this query :-
{
"_source": [],
"query": {
"bool": {
"filter": [
{"range": {"data.date": {"gte" : "2018-03-21"}}},
{"range": {"data.date": {"lte" : "2018-03-21"}}}
]
}
},
"aggs" : {
"by_term": {
"terms": {
"field": "term",
"order":{"sum_ranking":"desc"},
"size":100
},"aggs": {
"sum_ranking": {
"sum": {
"field" : "data.visits"
}
}
}
}
},
"from" : 0,
"size" : 0
}
it seems the filter is not working .
can any one help.
The mapping is :-
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"keyword" : {
"properties" : {
"term" : { "type" : "keyword" }
}
}
}
}

Resources