Aggregation of fields inside nested type field - elasticsearch

I want to aggregate a keyword-type field that lies inside a nested-type field. The mapping for the nested field is as follows:
"Nested_field" : {
  "type" : "nested",
  "properties" : {
    "Keyword_field" : {
      "type" : "keyword"
    }
  }
}
And the part of the query that I am using to aggregate is as follows:
"aggregations": {
  "Nested_field": {
    "aggregations": {
      "Keyword_field": {
        "terms": {
          "field": "Nested_field.Keyword_field"
        }
      }
    },
    "filter": {
      "bool": {}
    }
  }
}
But this does not return the correct aggregation. Even though there are documents with Keyword_field values, the query returns 0 buckets, so something must be wrong in my aggregation query. Can anyone help me find what's wrong?

I think you need to provide a nested path in there. This worked in ES 5, but it looks like you're using 6 based on the "aggregations" vs "aggs", so let me know if it doesn't work and I'll scrap this answer. Give this a try:
{
  "aggregations": {
    "nested_level": {
      "nested": {
        "path": "Nested_field"
      },
      "aggregations": {
        "keyword_field": {
          "terms": {
            "field": "Nested_field.Keyword_field"
          }
        }
      }
    }
  }
}
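Why the question's version returned 0 buckets: nested objects are indexed as separate hidden documents, so a terms aggregation running at the root level finds no values for Nested_field.Keyword_field (unless include_in_parent or include_in_root is set in the mapping); wrapping it in a nested aggregation switches it to those hidden documents. The Python sketch below mimics that counting on in-memory data. The sample documents and the helper name nested_terms are hypothetical, and this is only a local simulation, not how Elasticsearch is implemented or called.

```python
from collections import Counter

# The answer's request body as a Python dict: terms wrapped in a nested agg.
AGG_BODY = {
    "aggregations": {
        "nested_level": {
            "nested": {"path": "Nested_field"},
            "aggregations": {
                "keyword_field": {"terms": {"field": "Nested_field.Keyword_field"}}
            },
        }
    }
}


def nested_terms(docs, path, field):
    """Local stand-in for terms-inside-nested: each object under `path`
    is counted as its own (hidden) document."""
    counts = Counter()
    for doc in docs:
        for obj in doc.get(path, []):
            if field in obj:
                counts[obj[field]] += 1
    # doc_count descending; ties broken by key here just to be deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered]


# Hypothetical sample documents.
docs = [
    {"Nested_field": [{"Keyword_field": "a"}, {"Keyword_field": "b"}]},
    {"Nested_field": [{"Keyword_field": "a"}]},
    {"title": "no nested objects"},
]

print(nested_terms(docs, "Nested_field", "Keyword_field"))
# [{'key': 'a', 'doc_count': 2}, {'key': 'b', 'doc_count': 1}]
```

Note that the buckets count nested objects, not root documents: a root document with two matching nested objects contributes 2 to a bucket's doc_count.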

Related

Add condition to filter aggregation in elastic search

I want the count of each value of a field, based on a filter applied in Elasticsearch. For example, I want all the age groups, but only for students from California.
The age_group field is a text field and contains an array like this:
"age_group": ["5-6-years", "6-7-years"]
I want a query like this, but it isn't working. It throws an error saying:
unable to parse BaseAggregationBuilder with name [count]: parser not found
"student_aggregation": {
  "nested": {
    "path": "students"
  },
  "aggs": {
    "available": {
      "filter": {
        "term": { "students.place_of_birth": "California" }
      },
      "aggs": {
        "age_group": { "count": { "field": "students.age_group" } }
      }
    }
  }
}
Requesting help from you troops.
That's because there is no metric aggregation called count; the one you want is called value_count:
"student_aggregation": {
  "nested": {
    "path": "students"
  },
  "aggs": {
    "available": {
      "filter": {
        "term": { "students.place_of_birth": "California" }
      },
      "aggs": {
        "age_group": { "value_count": { "field": "students.age_group" } }
                        ^^^^^^^^^^^
      }
    }
  }
}
UPDATE:
After discussion, the terms aggregation turned out to be more appropriate than value_count. After fixing the mapping (the field was text instead of keyword), the query worked correctly.
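The difference behind the update, sketched locally in Python: value_count returns a single number (how many values the field holds across the filtered nested objects), while terms returns one bucket per distinct value with a doc_count, which is what "the count of each value" asks for. The sample data and helper names are hypothetical; the BODY dict is the request from the answer with value_count replaced by terms.

```python
from collections import Counter

# The answer's request with value_count swapped for terms, as the update suggests.
# Assumes students.age_group is mapped as keyword (per the update, text did not work).
BODY = {
    "aggs": {
        "student_aggregation": {
            "nested": {"path": "students"},
            "aggs": {
                "available": {
                    "filter": {"term": {"students.place_of_birth": "California"}},
                    "aggs": {
                        "age_group": {"terms": {"field": "students.age_group"}}
                    },
                }
            },
        }
    }
}


def filtered_students(docs, place):
    """Nested student objects that pass the term filter."""
    return [
        s
        for doc in docs
        for s in doc.get("students", [])
        if s.get("place_of_birth") == place
    ]


def value_count_like(students):
    """One number: how many age_group values the filtered objects hold."""
    return sum(len(s.get("age_group", [])) for s in students)


def terms_like(students):
    """One bucket per distinct age_group value, counting matching objects."""
    counts = Counter()
    for s in students:
        for value in set(s.get("age_group", [])):
            counts[value] += 1
    return dict(counts)


# Hypothetical sample data.
docs = [
    {"students": [
        {"place_of_birth": "California", "age_group": ["5-6-years", "6-7-years"]},
        {"place_of_birth": "Texas", "age_group": ["5-6-years"]},
    ]},
    {"students": [{"place_of_birth": "California", "age_group": ["6-7-years"]}]},
]

ca = filtered_students(docs, "California")
print(value_count_like(ca))            # 3
print(sorted(terms_like(ca).items()))  # [('5-6-years', 1), ('6-7-years', 2)]
```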

How to count number of objects in a nested field in elastic search?

Sample mapping:
"base_keywords": {
  "type": "nested",
  "properties": {
    "base_key": {
      "type": "text"
    },
    "category": {
      "type": "text"
    },
    "created_at": {
      "type": "date"
    },
    "date": {
      "type": "date"
    },
    "rank": {
      "type": "integer"
    }
  }
}
I would like to count the number of objects in the nested field 'base_keywords'.
You would need to do this with an inline script. This is what worked for me (using ES 6.x):
GET your-indices/_search
{
  "aggs": {
    "whatever": {
      "sum": {
        "script": {
          "inline": "params._source.base_keywords.size()"
        }
      }
    }
  }
}
Aggs are normally good for counting and grouping; for nested documents you can use nested aggs:
"aggs": {
  "MyAggregation1": {
    "terms": {
      "field": "FieldA",
      "size": 0
    },
    "aggs": {
      "BaseKeyWords": {
        "nested": { "path": "base_keywords" },
        "aggs": {
          "BaseKeys": {
            "terms": {
              "field": "base_keywords.base_key.keyword",
              "size": 0
            }
          }
        }
      }
    }
  }
}
You don't specify what you want to count, but aggs are quite flexible for grouping and counting data.
The "doc_count" and "key" behave similarly to an SQL GROUP BY + COUNT().
Updated (this assumes you have a .keyword sub-field holding the "keys" values, since a property of type "text" can't be aggregated or counted by default):
{
  "aggs": {
    "MyKeywords1Agg": {
      "nested": { "path": "keywords1" },
      "aggs": {
        "NestedKeywords": {
          "terms": {
            "field": "keywords1.keys.keyword",
            "size": 0
          }
        }
      }
    }
  }
}
To simply count the number of nested objects, you could do this:
{
  "aggs": {
    "MyKeywords1Agg": {
      "nested": { "path": "keywords1" }
    }
  }
}
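Both ways of counting in this thread should give the same total: a bare nested aggregation reports as doc_count the number of nested objects under the path across all matching documents, which is what the inline script adds up one document at a time with params._source.base_keywords.size(). A small Python sketch of that equivalence on made-up documents (a local simulation only; the function names are hypothetical):

```python
def nested_agg(docs, path):
    """Local stand-in for a bare nested aggregation: doc_count is the number
    of objects under `path` across all matching root documents."""
    return {"doc_count": sum(len(doc.get(path, [])) for doc in docs)}


def script_sum_agg(docs, path):
    """Local stand-in for a sum aggregation whose script returns the array
    size of `path` per root document (missing arrays treated as 0 here)."""
    per_doc = [len(doc.get(path, [])) for doc in docs]
    return {"value": float(sum(per_doc))}


# Hypothetical documents with 2, 1 and 0 base_keywords objects.
docs = [
    {"base_keywords": [{"base_key": "x", "rank": 1}, {"base_key": "y", "rank": 2}]},
    {"base_keywords": [{"base_key": "z", "rank": 1}]},
    {"base_keywords": []},
]

print(nested_agg(docs, "base_keywords"))      # {'doc_count': 3}
print(script_sum_agg(docs, "base_keywords"))  # {'value': 3.0}
```

The nested aggregation avoids scripting and reading _source, while the script version keeps the per-document sizes available if you want something other than a plain total.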
If you want to group on the field values of the "main" document or of the nested documents, you will have to extend your mapping/data model to include aggregatable fields, which covers most data types in Elasticsearch except "text", e.g. dates, numbers, geolocations and keywords.
Edit:
Example aggregating on a unique identifier for each top-level document, assuming it has a property called "WordMappingId" of type integer:
{
  "aggs": {
    "word_maping_agg": {
      "terms": {
        "field": "WordMappingId",
        "size": 0,
        "missing": -1
      },
      "aggs": {
        "Keywords1Agg": {
          "nested": { "path": "keywords1" }
        }
      }
    }
  }
}
If you don't add any properties to the top level of the "word_maping" document, there is no way to aggregate per unique document. The built-in _id field is not aggregatable by default, so I suggest including a unique identifier from the source data at the top level to aggregate on.
Note: the "missing" parameter puts all documents that don't have the WordMappingId property set into a bucket with the supplied value; this makes sure you're not missing any documents in the aggregation results.
Aggs can support a behaviour similar to a group by in SQL, but you need something to actually group it by, and according to the mapping you supplied there are no such fields currently in your index.
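To make the per-document grouping in the edit concrete, here is a Python sketch of what its buckets would contain: one bucket per WordMappingId (the hypothetical integer property the answer assumes), a -1 bucket for documents without it, and inside each bucket the nested doc_count of keywords1 objects. The sample data and the helper name are made up; this mimics the bucket arithmetic locally and does not call Elasticsearch.

```python
def group_by_id(docs, id_field, path, missing=-1):
    """Local stand-in for terms(id_field, missing=...) with a nested(path)
    sub-aggregation, returned as {bucket_key: bucket} for brevity."""
    buckets = {}
    for doc in docs:
        key = doc.get(id_field, missing)  # docs without the field go to `missing`
        bucket = buckets.setdefault(
            key, {"doc_count": 0, "Keywords1Agg": {"doc_count": 0}}
        )
        bucket["doc_count"] += 1
        bucket["Keywords1Agg"]["doc_count"] += len(doc.get(path, []))
    return buckets


# Hypothetical documents; the last one has no WordMappingId.
docs = [
    {"WordMappingId": 1, "keywords1": [{"keys": "a"}, {"keys": "b"}]},
    {"WordMappingId": 2, "keywords1": [{"keys": "c"}]},
    {"keywords1": [{"keys": "d"}, {"keys": "e"}, {"keys": "f"}]},
]

result = group_by_id(docs, "WordMappingId", "keywords1")
print(result[-1])  # {'doc_count': 1, 'Keywords1Agg': {'doc_count': 3}}
```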
I was trying to do something similar to understand the distribution of our production data.
The following query helped me find the top 5 documents by number of base_keywords objects:
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "n_base_keywords": {
      "nested": { "path": "base_keywords" },
      "aggs": {
        "top_count": { "terms": { "field": "_id", "size" : 5 } }
      }
    }
  }
}
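This query appears to rely on the hidden nested documents carrying their root document's _id, so grouping them by _id counts nested objects per root document, and "size": 5 keeps the five largest. A Python sketch of that counting on made-up data (the helper name is hypothetical; it simulates the buckets locally rather than querying Elasticsearch):

```python
from collections import Counter


def top_by_nested_count(docs, path, size=5):
    """Local stand-in for terms(_id, size) inside nested(path): every nested
    object counts once toward its root document's _id."""
    counts = Counter()
    for doc in docs:
        n = len(doc.get(path, []))
        if n:  # root documents without nested objects produce no bucket
            counts[doc["_id"]] += n
    return counts.most_common(size)


# Hypothetical documents d1..d7 with 3, 1, 4, 0, 5, 9 and 2 nested objects.
docs = [
    {"_id": f"d{i}", "base_keywords": [{}] * n}
    for i, n in enumerate([3, 1, 4, 0, 5, 9, 2], start=1)
]

print(top_by_nested_count(docs, "base_keywords"))
# [('d6', 9), ('d5', 5), ('d3', 4), ('d1', 3), ('d7', 2)]
```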

How to add properties from a root object in a nested object for sorting?

A simplified example of the kind of document in our index:
{
  "organisation" : {
    "code" : "01310"
  },
  "publications" : [
    {
      "dateEnd" : 1393801200000,
      "dateStart" : 1391986800000,
      "code" : "PUB.02"
    },
    {
      "dateEnd" : 1401055200000,
      "dateStart" : 1397512800000,
      "code" : "PUB.06"
    }
  ]
}
Note that publications are mapped as nested objects because we need to filter based on a combination of the dateEnd, dateStart and publicationStatus properties.
The PUB.02 status code is special. It states: 'this publication period is valid if the current user is a member of the organisation'.
I have a problem when I want to sort on 'most recent':
{
  "sort": {
    "publications.dateStart" : {
      "mode" : "min",
      "order" : "desc",
      "nested_filter" : {
        "or" : [
          {
            "and" : [
              { "term" : { "organisation.code" : "01310" } },
              { "term" : { "publications.code" : "PUB.02" } }
            ]
          },
          { "term" : { "publications.code" : "PUB.06" } }
        ]
      }
    }
  }
}
No error is given, but the PUB.02 entry is ignored. I tried to use copy_to in my mapping to copy the value of organisation.code to the nested object, but that did not help.
Is there a way to reach for the parent document inside a nested sort?
Alternatively, is there a way to copy data from parent to the nested document?
I am currently using version 1.7 of Elasticsearch without the ability to use scripts. Upgrading to a newer version could be done if that would help the situation.
This gist shows that the sort is performed on the PUB.06 publications: https://gist.github.com/EECOLOR/2db9a1ec9d6d5c791ea6
Although the documentation does not explicitly mention it, it does look like we cannot access the parent field in a nested filter context.
Also, I wasn't able to use copy_to to add data from a root/parent field to the nested document. I would suggest asking on the Elasticsearch discuss forum; you would have more luck finding out the reasons for this.
Before some trigger-happy bloke downvotes this answer, I would like to add that the results the OP wanted from the sort could be achieved with a function_score workaround.
One implementation to achieve this is as follows:
1) Start off with a should query.
2) In the first should clause:
a) use a filtered query to keep only documents with `organisation.code : 01310`
b) then score these documents by the max value of the reciprocal of the nested document **dateStart**, for terms **PUB.02 PUB.06**
3) In the second should clause:
a) use a filtered query to keep only documents whose `organisation.code` is not equal to 01310
b) as before, score these documents by the max value of the reciprocal of the nested document **dateStart**, for term **PUB.06** only
Example Query:
POST /testindex/testtype/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "filtered": {
            "filter": {
              "term": {
                "organisation.code": "01310"
              }
            },
            "query": {
              "nested": {
                "path": "publications",
                "query": {
                  "filtered": {
                    "query": {
                      "function_score": {
                        "functions": [
                          {
                            "field_value_factor": {
                              "field": "publications.dateStart",
                              "modifier": "reciprocal"
                            }
                          }
                        ],
                        "boost_mode": "replace",
                        "score_mode": "max"
                      }
                    },
                    "filter": {
                      "terms": {
                        "publications.code": [
                          "PUB.02",
                          "PUB.06"
                        ]
                      }
                    }
                  }
                },
                "score_mode": "max"
              }
            }
          }
        },
        {
          "filtered": {
            "filter": {
              "not": {
                "term": {
                  "organisation.code": "01310"
                }
              }
            },
            "query": {
              "nested": {
                "path": "publications",
                "query": {
                  "filtered": {
                    "query": {
                      "function_score": {
                        "functions": [
                          {
                            "field_value_factor": {
                              "field": "publications.dateStart",
                              "modifier": "reciprocal"
                            }
                          }
                        ],
                        "boost_mode": "replace",
                        "score_mode": "max"
                      }
                    },
                    "filter": {
                      "terms": {
                        "publications.code": [
                          "PUB.06"
                        ]
                      }
                    }
                  }
                },
                "score_mode": "max"
              }
            }
          }
        }
      ]
    }
  }
}
I'm the first to admit it is not the most readable, and if there were a way to copy_to a nested object it would be much more ideal.
If not, simulating copy_to by having the client inject the data into the source before indexing would be simpler and more flexible.
But the above is an example of how it could be done using function scores.
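The client-side alternative mentioned above amounts to denormalizing before indexing: copy organisation.code into every publication so the nested filter can test it directly. A Python sketch under that assumption; the field name organisationCode and the helper names are hypothetical, and sort_value only mimics locally the min-dateStart value the OP's sort would use for a member of organisation 01310.

```python
import copy

# The OP's nested_filter, rewritten against the hypothetical injected field.
NESTED_FILTER = {
    "or": [
        {"and": [
            {"term": {"publications.organisationCode": "01310"}},
            {"term": {"publications.code": "PUB.02"}},
        ]},
        {"term": {"publications.code": "PUB.06"}},
    ]
}


def inject_org_code(doc):
    """Return a copy of doc with organisation.code copied into each publication.

    'organisationCode' is a hypothetical field name; it would also have to be
    added to the nested mapping of publications.
    """
    out = copy.deepcopy(doc)
    code = out.get("organisation", {}).get("code")
    for pub in out.get("publications", []):
        pub["organisationCode"] = code
    return out


def sort_value(doc, user_org):
    """Local stand-in for the intended nested sort (mode=min): the earliest
    dateStart among publications passing NESTED_FILTER's logic, with
    user_org in place of 01310."""
    dates = [
        p["dateStart"]
        for p in doc.get("publications", [])
        if p["code"] == "PUB.06"
        or (p["code"] == "PUB.02" and p.get("organisationCode") == user_org)
    ]
    return min(dates) if dates else None


doc = {
    "organisation": {"code": "01310"},
    "publications": [
        {"dateEnd": 1393801200000, "dateStart": 1391986800000, "code": "PUB.02"},
        {"dateEnd": 1401055200000, "dateStart": 1397512800000, "code": "PUB.06"},
    ],
}

print(sort_value(doc, "01310"))                   # 1397512800000 (PUB.02 ignored, as in the question)
print(sort_value(inject_org_code(doc), "01310"))  # 1391986800000 (PUB.02 now counts)
```

Copying the document rather than mutating it keeps the original source untouched, so the same helper can be used in a bulk-indexing loop.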

Elasticsearch: how to do filtered search and aggregation at the same time

Conceptually, I need to do a filtered search plus an aggregation in the following way:
{
  "filtered" : {
    "query": {
      "match_all" : {
      }
    },
    "aggregations": {
      "facets": {
        "terms": {
          "field": "subject"
        }
      }
    },
    "filter" : {
      ...
    }
  }
}
The above query does not work; I get the following error message:
[filtered] query does not support [aggregations]]
While trying to solve this problem, I found the Filter aggregation and the Filters aggregation, but they do not seem to address my need.
Could someone show me the structure of the correct query that can achieve my goal?
Thanks and regards.
The scope of an aggregation is the query and all the filters in it. This means that if you put the aggregation alongside the query in the normal way, it should work:
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {}
    }
  },
  "aggregations": {
    "facets": {
      "terms": {
        "field": "subject"
      }
    }
  }
}
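The scoping rule can be sketched locally: the aggregation only sees the documents that survive the query and its filter. In this Python sketch, the request body keeps the answer's shape but fills the empty filter with a hypothetical term filter on a made-up status field, and run_locally (also hypothetical) mimics the terms aggregation over the filtered hits.

```python
from collections import Counter

# Same shape as the answer's request; the term filter on "status" is hypothetical.
BODY = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"status": "published"}},
        }
    },
    "aggregations": {"facets": {"terms": {"field": "subject"}}},
}


def run_locally(docs, body):
    """Apply the term filter, then count `subject` over the surviving hits only."""
    term = body["query"]["filtered"]["filter"]["term"]
    field, value = next(iter(term.items()))
    agg_field = body["aggregations"]["facets"]["terms"]["field"]
    hits = [d for d in docs if d.get(field) == value]
    facets = Counter(d[agg_field] for d in hits if agg_field in d)
    return hits, dict(facets)


docs = [
    {"status": "published", "subject": "math"},
    {"status": "published", "subject": "physics"},
    {"status": "draft", "subject": "math"},
]

hits, facets = run_locally(docs, BODY)
print(len(hits), facets)  # 2 {'math': 1, 'physics': 1}
```

The draft document is excluded from the hits, so its "math" value never reaches the facets counts.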

Filtering nested aggregation result on number of buckets

I have this query that does a nested (sub-)aggregation giving me the unique machineids per unique key. What I want Elasticsearch to return is only those keys with two or more unique machineids. I can of course solve this problem application-side, but is there a way to solve it directly in the query? Or maybe I am going about this the wrong way?
My query:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must_not": {
            "term" : { "key" : "" }
          }
        }
      }
    }
  },
  "aggs": {
    "keys": {
      "terms": {
        "field": "key",
        "size" : 0
      },
      "aggs": {
        "machines": {
          "terms": {
            "field": "machineid",
            "size" : 0
          }
        }
      }
    }
  }
}
Example document:
{
  "timestamp":"2014-05-23T08:21:51+00:00",
  "machineid":"1444056739053156926",
  "hash":"77f595dee5ffacea72b135b1fce1312e",
  "key":"XXXXXX-XXXXXX-XXXXXX-XXXXXX"
}
I have been looking at the scripted metric aggregation, but it doesn't seem to be what I'm looking for.
Issue #4404 and issue #8110 on the Elasticsearch GitHub seem to describe my problem, but they are both closed.
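For the application-side route the OP mentions, a Python sketch that post-processes a response of the shape this query returns (aggregations.keys.buckets, each with a machines sub-aggregation) and keeps only keys that have at least two machineid buckets. The sample response and the function name are made up.

```python
def keys_with_multiple_machines(response, minimum=2):
    """Keep `keys` buckets whose `machines` sub-aggregation has at least
    `minimum` buckets, i.e. at least that many distinct machineids. This only
    holds if the inner terms returned every machineid, as "size": 0 does in
    the query above."""
    buckets = response["aggregations"]["keys"]["buckets"]
    return [b["key"] for b in buckets if len(b["machines"]["buckets"]) >= minimum]


# Hypothetical response in the shape of the query above.
response = {
    "aggregations": {
        "keys": {
            "buckets": [
                {"key": "AAA", "doc_count": 3, "machines": {"buckets": [
                    {"key": "1", "doc_count": 2}, {"key": "2", "doc_count": 1}]}},
                {"key": "BBB", "doc_count": 5, "machines": {"buckets": [
                    {"key": "1", "doc_count": 5}]}},
                {"key": "CCC", "doc_count": 3, "machines": {"buckets": [
                    {"key": "1", "doc_count": 1}, {"key": "2", "doc_count": 1},
                    {"key": "3", "doc_count": 1}]}},
            ]
        }
    }
}

print(keys_with_multiple_machines(response))  # ['AAA', 'CCC']
```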
