How to get an Elasticsearch aggregation with multiple fields - elasticsearch

I'm attempting to find related tags to the one currently being viewed. Every document in our index is tagged. Each tag is formed of two parts - an ID and text name:
{
...
meta: {
...
tags: [
{
id: 123,
name: 'Biscuits'
},
{
id: 456,
name: 'Cakes'
},
{
id: 789,
name: 'Breads'
}
]
}
}
To fetch the related tags I am simply querying the documents and getting an aggregate of their tags:
{
"query": {
"bool": {
"must": [
{
"match": {
"item.meta.tags.id": "123"
}
},
{
...
}
]
}
},
"aggs": {
"baked_goods": {
"terms": {
"field": "item.meta.tags.id",
"min_doc_count": 2
}
}
}
}
This works perfectly and returns the results I want. However, I need both the tag ID and the name to do anything useful. I have explored how to accomplish this, and the options seem to be:
Combine the fields when indexing
A script to munge together the fields
A nested aggregation
Options one and two are not available to me, so I have been going with option three, but it does not respond as expected. Given the following query (still searching for documents also tagged with 'Biscuits'):
{
...
"aggs": {
"baked_goods": {
"terms": {
"field": "item.meta.tags.id",
"min_doc_count": 2
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.name"
}
}
}
}
}
}
I will get this result:
{
...
"aggregations": {
"baked_goods": {
"buckets": [
{
"key": "456",
"doc_count": 11,
"name": {
"buckets": [
{
"key": "Biscuits",
"doc_count": 11
},
{
"key": "Cakes",
"doc_count": 11
}
]
}
}
]
}
}
}
The nested aggregation includes both the search term and the tag I'm after (returned in alphabetical order).
I have tried to mitigate this by adding an exclude to the nested aggregation but this slowed the query down far too much (around 100 times for 500000 docs). So far the fastest solution is to de-dupe the result manually.
What is the best way to get an aggregation of tags with both the tag ID and tag name in the response?
Thanks for making it this far!

By the looks of it, your tags field is not mapped as nested.
For this aggregation to work, you need it nested so that there is an association between an id and a name. Without nested the list of ids is just an array and the list of names is another array:
"item": {
"properties": {
"meta": {
"properties": {
"tags": {
"type": "nested", <-- nested field
"include_in_parent": true, <-- to, also, keep the flat array-like structure
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
}
}
}
}
}
}
}
Note that I've also added "include_in_parent": true to the mapping, which means that your nested tags will also behave like a "flat" array-like structure.
So the queries you have so far will keep working without any changes.
But, for this particular query of yours, the aggregation needs to change to something like this:
{
"aggs": {
"baked_goods": {
"nested": {
"path": "item.meta.tags"
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.id"
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.name"
}
}
}
}
}
}
}
}
And the result is like this:
"aggregations": {
"baked_goods": {
"doc_count": 9,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 123,
"doc_count": 3,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "biscuits",
"doc_count": 3
}
]
}
},
{
"key": 456,
"doc_count": 2,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cakes",
"doc_count": 2
}
]
}
},
.....
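Once the response has this shape, pairing each id with its name on the client side is straightforward. Below is a minimal Python sketch, assuming the aggregation names used above ("baked_goods" for the nested aggregation and "name" at both inner levels). The helper name tag_pairs and the sample dict are made up for illustration, with the sample copied from the truncated response above.

```python
from typing import Any, Dict, List, Optional, Tuple


def tag_pairs(aggregations: Dict[str, Any]) -> List[Tuple[int, Optional[str], int]]:
    """Return (tag_id, tag_name, doc_count) for every id bucket."""
    pairs = []
    # "baked_goods" is the nested aggregation; the first "name" below it
    # is the terms aggregation on the tag id.
    for id_bucket in aggregations["baked_goods"]["name"]["buckets"]:
        name_buckets = id_bucket["name"]["buckets"]
        # With a nested mapping each id should carry exactly one name;
        # fall back to None if the sub-aggregation came back empty.
        tag_name = name_buckets[0]["key"] if name_buckets else None
        pairs.append((id_bucket["key"], tag_name, id_bucket["doc_count"]))
    return pairs


# Sample data shaped like the response above (illustrative only).
sample = {
    "baked_goods": {
        "doc_count": 9,
        "name": {
            "buckets": [
                {"key": 123, "doc_count": 3,
                 "name": {"buckets": [{"key": "biscuits", "doc_count": 3}]}},
                {"key": 456, "doc_count": 2,
                 "name": {"buckets": [{"key": "cakes", "doc_count": 2}]}},
            ]
        },
    }
}

print(tag_pairs(sample))
```

The names come back lowercased in the sample, most likely because the name field is an analyzed string. If the original casing matters, the name would have to be indexed un-analyzed.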

Related

Elastic always returns 0 buckets on simple aggregations and 0 on nested

I'm making my move from SOLR to Elasticsearch and am struggling to get aggregations to work properly.
In my index there is a single document that resembles the following json structure:
{
"id": 1,
"title": "some title",
"profile": {
"colour": "GREY",
"brand": "SOME_BRAND",
},
"user_id": 1,
"created_at": "2017-09-09T13:54:30.304Z",
"updated_at": "2017-09-09T13:54:50.282Z",
"email": "john@example.com",
}
I can query my document using:
GET /index/_search
{
"query": {
"match_all": {}
}
}
I want (for some reason) to aggregate on e-mail alone. So I use:
GET /index/_search
{
"query": {
"match_all": {}
},
"aggs": {
"emails": {
"terms": {
"field": "email"
}
}
}
}
If I did this with SOLR, I would receive facets back showing a single document with the email address john@example.com.
Elastic however returns:
{
**SNIP**
"aggregations": {
"emails": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
}
I would also like to retrieve aggregations on the hash e.g. profile['colour']
I tried:
GET /index/_search
{
"query": {
"match_all": {}
},
"aggs": {
"profile_colour": {
"terms": {
"field": "profile",
"scripts": {
"inline": "doc.profile.colour"
}
}
}
}
}
But again, zero results. It seems like everything I try results in no aggregations. I must be missing something very simple here.
Your JSON document is malformed; please remove the trailing comma here
"brand": "SOME_BRAND",
and here
"email": "john@example.com",
And everything will work (at least here, I'm on ES 1.7.3). Note that in the following example I didn't create a specific mapping for those fields:
"aggregations": {
"emails": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "example.com",
"doc_count": 1
},
{
"key": "john",
"doc_count": 1
}
]
}
}
I guess this is not what you want, though, as the whole email address should be treated as a single keyword.
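To keep the address as one term, the email field has to be mapped without analysis before the data is indexed. Below is a rough Python sketch of the mapping fragment. The helper name email_mapping is made up, and the version switch rests on an assumption about which release you run: ES 1.x/2.x use a not_analyzed string, while ES 5.x and later have a keyword type. An existing field cannot be switched in place, so this implies recreating the index and reindexing.

```python
import json


def email_mapping(es_major: int) -> dict:
    """Mapping fragment that keeps the email as a single, un-analyzed term."""
    if es_major >= 5:
        # ES 5.x and later: dedicated keyword type.
        return {"properties": {"email": {"type": "keyword"}}}
    # ES 1.x / 2.x: string field with analysis switched off.
    return {"properties": {"email": {"type": "string", "index": "not_analyzed"}}}


# Print the fragment for an ES 1.x cluster such as the 1.7.3 used above.
print(json.dumps(email_mapping(1), indent=2))
```

With such a mapping in place, the terms aggregation on email should return one bucket per full address instead of one per token.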

Getting description when aggregating with Elasticsearch

When we use the aggregation feature in Elasticsearch, we get back the value of the field we are aggregating on, but we also want the description that goes with it. We have to aggregate on sectors.id because other parts of our API use it later on.
For example, our data looks like this:
[{
"id":"123",
"sectors":[{
"id":"sector-1",
"name":"Automotive"
}]
},
{
"id":"123",
"sectors":[{
"id":"sector-2",
"name":"Biology"
}]
}]
When we aggregate over sectors.id our response looks like:
"aggregations": {
"sector": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "sector-2",
"doc_count": 19672
},
{
"key": "sector-1",
"doc_count": 11699
}]
}
}
Is there any way to get sectors.name as well as the key in the results?
It seems that sectors should be a nested field. Assuming that the sector name is unique per sector id,
you can use sub-aggregations to find the name that belongs to each key:
GET _search
{
"size": 0,
"aggs": {
"sectors": {
"nested": {
"path": "sectors"
},
"aggs": {
"sector_id": {
"terms": {
"field": "sectors.id"
},
"aggs": {
"sector_name": {
"terms": {
"field": "sectors.name"
}
}
}
}
}
}
}
}
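The approach above relies on each sector id having exactly one name. Below is a small Python sketch for checking that assumption against the response, using the aggregation names from the query ("sectors", "sector_id", "sector_name"). The helper name ambiguous_sector_ids and the sample data, including the "Biotech" name, are invented for illustration.

```python
from typing import Any, Dict, List


def ambiguous_sector_ids(aggregations: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each sector id that does not have exactly one name to the names found."""
    problems = {}
    for id_bucket in aggregations["sectors"]["sector_id"]["buckets"]:
        names = [b["key"] for b in id_bucket["sector_name"]["buckets"]]
        if len(names) != 1:
            problems[id_bucket["key"]] = names
    return problems


# Made-up response: sector-2 carries two different names.
sample_aggs = {
    "sectors": {
        "doc_count": 4,
        "sector_id": {
            "buckets": [
                {"key": "sector-1", "doc_count": 2,
                 "sector_name": {"buckets": [{"key": "Automotive", "doc_count": 2}]}},
                {"key": "sector-2", "doc_count": 2,
                 "sector_name": {"buckets": [{"key": "Biology", "doc_count": 1},
                                             {"key": "Biotech", "doc_count": 1}]}},
            ]
        },
    }
}

print(ambiguous_sector_ids(sample_aggs))
```

An empty result means every id maps to a single name, so taking the first name bucket per id is safe.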

Aggregating with multiple fields returned in ElasticSearch

Suppose I have a relatively simple index with the following fields...
"testdata": {
"properties": {
"code": {
"type": "integer"
},
"name": {
"type": "string"
},
"year": {
"type": "integer"
},
"value": {
"type": "integer"
}
}
}
I can write a query to get the total sum of the values aggregated by the code like so:
{
"from":0,
"size":0,
"aggs": {
"by_code": {
"terms": {
"field": "code"
},
"aggs": {
"total_value": {
"sum": {
"field": "value"
}
}
}
}
}
}
And this returns the following (abridged) results:
"aggregations": {
"by_code": {
"doc_count_error_upper_bound": 478,
"sum_other_doc_count": 328116,
"buckets": [
{
"key": 236948,
"doc_count": 739,
"total_value": {
"value": 12537
}
},
However, this data is being fed to a web front-end, which needs to display both the code and the name. So the question is: can the query be amended to return the name field as well as the code field in the results?
So, for example, the results can look a bit like this:
"aggregations": {
"by_code": {
"doc_count_error_upper_bound": 478,
"sum_other_doc_count": 328116,
"buckets": [
{
"key": 236948,
"code": 236948,
"name": "Test Name",
"doc_count": 739,
"total_value": {
"value": 12537
}
},
I've read up on sub-aggregations, but in this case there is a one-to-one relationship between code and name (so, you wouldn't have different names for the same key). Also, in my real case, there are 5 other fields, like description, that I would like to return, so I am wondering if there was another way to do it.
In SQL (where this data originally lived before it was moved to ElasticSearch) I would write the following query:
SELECT Code, Name, SUM(Value) AS Total_Value
FROM [TestData]
GROUP BY Code, Name
You can achieve this using scripting, i.e. instead of specifying a field, you specify a combination of fields:
{
"from":0,
"size":0,
"aggs": {
"by_code": {
"terms": {
"script": "[doc.code.value, doc.name.value].join('-')"
},
"aggs": {
"total_value": {
"sum": {
"field": "value"
}
}
}
}
}
}
Note: you need to make sure dynamic scripting is enabled for this to work.
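With the script above, each bucket key comes back as a single "code-name" string, so the front-end has to split it again. Below is a minimal Python sketch, assuming the code is a non-negative integer and the name was indexed as a single term. The helper name split_bucket and the sample bucket are illustrative, with the sample built from the values in the question.

```python
from typing import Any, Dict


def split_bucket(bucket: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a bucket keyed by 'code-name' back into separate fields."""
    # Split on the first hyphen only, so names that contain '-' survive.
    # Assumes the code is non-negative (a leading '-' would break this).
    code, name = bucket["key"].split("-", 1)
    return {
        "code": int(code),
        "name": name,
        "doc_count": bucket["doc_count"],
        "total_value": bucket["total_value"]["value"],
    }


# Illustrative bucket built from the values shown in the question.
sample_bucket = {
    "key": "236948-Test Name",
    "doc_count": 739,
    "total_value": {"value": 12537},
}

print(split_bucket(sample_bucket))
```

If the names can contain the separator and the codes can be negative, a separator that cannot occur in either field would be a safer choice in the script.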

ElasticSearch 2.1.0 - Deep 'children' aggregation with 'sum' metric returning empty results

I have a hierarchy of document types two levels deep. The documents are related by parent-child relationships as follows: category > sub_category > item i.e. each sub_category has a _parent field referring to a category id, and each item has a _parent field referring to a sub_category id.
Each item has a price field. Given a query for categories, which includes conditions for sub-categories and items, I want to calculate a total price for each sub_category.
My query looks something like this:
{
"query": {
"has_child": {
"child_type": "sub_category",
"query": {
"has_child": {
"child_type": "item",
"query": {
"range": {
"price": {
"gte": 100,
"lte": 150
}
}
}
}
}
}
}
}
My aggregation to calculate the price for each sub-category looks like this:
{
"aggs": {
"categories": {
"terms": {
"field": "id"
},
"aggs": {
"sub_categories": {
"children": {
"type": "sub_category"
},
"aggs": {
"sub_category_ids": {
"terms": {
"field": "id"
},
"aggs": {
"items": {
"children": {
"type": "item"
},
"aggs": {
"price": {
"sum": {
"field": "price"
}
}
}
}
}
}
}
}
}
}
}
}
Despite the query response listing matching results, the aggregation response doesn't match any items:
{
"aggregations": {
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category1",
"doc_count": 1,
"sub_categories": {
"doc_count": 3,
"sub_category_ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "subcat1",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
},
{
"key": "subcat2",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
},
{
"key": "subcat3",
"doc_count": 1,
"items": {
"doc_count": 0,
"price": {
"value": 0
}
}
}
]
}
}
}]
}
}
}
However, omitting the sub_category_ids aggregation does cause the items to appear and for prices to be summed at the level of the categories aggregation. I would expect including the sub_category_ids aggregation to simply change the level at which the prices are summed.
Am I misunderstanding how the aggregation is evaluated, and if so how could I modify it to display the summed prices for each sub-category?
I opened issue #15413 regarding the children aggregation, as I and other folks were facing similar issues in ES 2.0.
Apparently the problem, according to ES developer @martijnvg, was that:
The children agg makes an assumption (that all segments are being seen by children agg) that was true in 1.x but not in 2.x
PR #15457 fixed this issue, again from @martijnvg:
Before we only evaluated segments that yielded matches in parent aggs, which caused us to miss to evaluate child docs in segments we didn't have parent matches for.
The fix for this is stop remember in what segments we have matches for
and simply evaluate all segments. This makes the code simpler and we
can still quickly see if a segment doesn't hold child docs like we did
before
This pull request has been merged and has also been backported to the 2.x, 2.1 and 2.0 branches.

ElasticSearch aggregation function

Is it possible to define a custom aggregation function in Elasticsearch?
E.g. for data:
author weekday status
me monday ok
me tuesday ok
me monday bad
I want to get an aggregation based on author and weekday, and as a value I want to get concatenation of status field:
agg1 agg2 value
me monday ok,bad
me tuesday ok
I know you can do a count, but is it possible to define another function to use for the aggregation?
EDIT/ANSWER: Looks like there is no multi-row aggregation support in ES, so we had to use sub-aggregations on the last field (see Akshay's example). If you need a more complex aggregation function, then aggregate by id (note that you won't be able to use _id, so you'll have to duplicate it in another field). That way you'll be able to do advanced aggregation on the individual items in each bucket.
You can get roughly what you want by using the sub-aggregations available in 1.0. Assuming the documents are structured as author, weekday and status, you could use the aggregation below:
{
"size": 0,
"aggs": {
"author": {
"terms": {
"field": "author"
},
"aggs": {
"days": {
"terms": {
"field": "weekday"
},
"aggs": {
"status": {
"terms": {
"field": "status"
}
}
}
}
}
}
}
}
Which gives you the following result:
{
...
"aggregations": {
"author": {
"buckets": [
{
"key": "me",
"doc_count": 3,
"days": {
"buckets": [
{
"key": "monday",
"doc_count": 2,
"status": {
"buckets": [
{
"key": "bad",
"doc_count": 1
},
{
"key": "ok",
"doc_count": 1
}
]
}
},
{
"key": "tuesday",
"doc_count": 1,
"status": {
"buckets": [
{
"key": "ok",
"doc_count": 1
}
]
}
}
]
}
}
]
}
}
}
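The concatenation itself then has to happen on the client. Below is a short Python sketch that walks the response above and joins the status keys per author and weekday. The helper name status_rows is made up, and the sample dict simply mirrors the response shown.

```python
from typing import Any, Dict, List, Tuple


def status_rows(aggregations: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (author, weekday, comma-joined statuses) rows."""
    rows = []
    for author in aggregations["author"]["buckets"]:
        for day in author["days"]["buckets"]:
            # Statuses are joined in bucket order, not in document order.
            statuses = ",".join(b["key"] for b in day["status"]["buckets"])
            rows.append((author["key"], day["key"], statuses))
    return rows


# Same structure as the response above.
sample_response = {
    "author": {"buckets": [{
        "key": "me", "doc_count": 3,
        "days": {"buckets": [
            {"key": "monday", "doc_count": 2,
             "status": {"buckets": [{"key": "bad", "doc_count": 1},
                                    {"key": "ok", "doc_count": 1}]}},
            {"key": "tuesday", "doc_count": 1,
             "status": {"buckets": [{"key": "ok", "doc_count": 1}]}},
        ]},
    }]}
}

for row in status_rows(sample_response):
    print(row)
```

Note that monday comes out as "bad,ok" rather than "ok,bad", because the order follows the bucket order in the response.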

Resources