How to retrieve more data from aggregation when array of items is aggregated? - elasticsearch

This question is related to "How to retrieve more data in aggregation?". The difference is that I am trying to aggregate on the field "languages", which is an array containing many entries like this: { id: 1, name: "English" }.
Here is the mapping:
languages:
  properties:
    name:
      type: "string"
      index: "not_analyzed"
    id:
      type: "long"
I try to aggregate like this:
aggs:
  languages:
    terms:
      field: "languages.id"
    aggs:
      languageName:
        terms:
          field: "languages.name"
The result is:
{
"key": 175,
"doc_count": 1,
"languageName": {
"buckets": [
{
"key": "Latin",
"doc_count": 1
},
{
"key": "Lingala",
"doc_count": 1
},
{
"key": "Quechua",
"doc_count": 1
},
{
"key": "Tamil",
"doc_count": 1
},
{
"key": "Walloon",
"doc_count": 1
}
]
}
}
For some reason the nested aggregation returns all language names for a single language id. How can I retrieve the correct language name?

You have to use a nested aggregation, like this:
POST _search
{
"size": 0,
"aggs": {
"nestedlang": {
"nested": {
"path": "languages"
},
"aggs": {
"languages": {
"terms": {
"field": "languages.id"
},
"aggs": {
"languageName": {
"terms": {
"field": "languages.name"
}
}
}
}
}
}
}
}
Make sure "languages" is mapped with type nested, e.g.:
{
...
"languages" : {
"type" : "nested",
"properties" : {
"name" : { "type" : "string" },
"id" : { "type" : "long" }
}
}
}

It is easier if you provide the mapping, documents and query in JSON format, so that they can be copied, pasted and reproduced. Are you really using nested objects in the mapping? If that is the case, you also need the nested aggregation:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
If you are not using nested objects, look up the documentation for inner objects. In the end you could end up with a document that is actually indexed as the following:
languages.id=[1,2,3,4]
languages.name=["Latin","Lingala","Dutch","English"]
Hope that helps
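To make the flattening concrete, here is a small sketch in plain Python (it does not talk to Elasticsearch; the helper names flatten_object_array and names_seen_with_id and the sample ids are made up for illustration). It mimics how a non-nested object array ends up as one multi-valued list per sub-field, which is why a sub-aggregation on languages.name reports every name in the document for each languages.id:

```python
def flatten_object_array(doc, field):
    """Mimic non-nested indexing: one multi-valued list per sub-field."""
    flat = {}
    for item in doc[field]:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def names_seen_with_id(flat, lang_id):
    """With flattened fields, every name in the document co-occurs with every id in it."""
    if lang_id in flat["languages.id"]:
        return sorted(flat["languages.name"])
    return []


# Hypothetical document; only the shape matters.
doc = {
    "languages": [
        {"id": 175, "name": "Latin"},
        {"id": 176, "name": "Lingala"},
        {"id": 177, "name": "Tamil"},
    ]
}

flat = flatten_object_array(doc, "languages")
print(flat)
print(names_seen_with_id(flat, 175))
```

The link between id 175 and "Latin" is gone after flattening, so only a nested mapping (which keeps each object as its own hidden document) can give one name per id.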

Related

ElasticSearch Max Agg on lowest value inside a list property of the document

I'm looking to do a Max aggregation on a value of the property under my document, the property is a list of complex object (key and value). Here's my data:
[{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
},
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}]
When I do the nested max aggregation on "listItems.value", I expect the max value returned to be 200 (and not 5000), because I want the logic to first find the MIN value under listItems for each document and then do the max aggregation on those values. Is it possible to do something like this?
Thanks.
The search query below performs the following aggregations:
Terms aggregation on the id field
Min aggregation on listItems.value (inside a nested aggregation on listItems)
Max bucket aggregation, a sibling pipeline aggregation that identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of the bucket(s)
Refer to the nested aggregation documentation for a detailed explanation of it.
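The three steps above can be sketched in plain Python to check the expected numbers before running anything against a cluster. This is only a simulation of the logic on the data from the question; the function name max_of_per_document_min is invented for the sketch and it assumes every id is unique:

```python
docs = [
    {"id": "1", "listItems": [{"key": "li1", "value": 100}, {"key": "li2", "value": 5000}]},
    {"id": "2", "listItems": [{"key": "li3", "value": 200}, {"key": "li2", "value": 2000}]},
]


def max_of_per_document_min(docs):
    # Terms + min: one bucket per id, holding the smallest listItems.value.
    # Assumes ids are unique, so each bucket contains exactly one document.
    min_per_id = {d["id"]: min(item["value"] for item in d["listItems"]) for d in docs}
    # Max bucket: the largest bucket value plus the key(s) that hold it.
    best = max(min_per_id.values())
    keys = [key for key, value in min_per_id.items() if value == best]
    return best, keys


value, keys = max_of_per_document_min(docs)
print(value, keys)
```

For the sample data this gives 200 for key "2", which is the value the question asks for.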
Adding a working example with index data, index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"listItems": {
"type": "nested"
},
"id":{
"type":"text",
"fielddata":"true"
}
}
}
}
Index Data:
{
"id" : "1",
"listItems" :
[
{
"key" : "li1",
"value" : 100
},
{
"key" : "li2",
"value" : 5000
}
]
}
{
"id" : "2",
"listItems" :
[
{
"key" : "li3",
"value" : 200
},
{
"key" : "li2",
"value" : 2000
}
]
}
Search Query:
{
"size": 0,
"aggs": {
"id_terms": {
"terms": {
"field": "id"
},
"aggs": {
"nested_entries": {
"nested": {
"path": "listItems"
},
"aggs": {
"min_position": {
"min": {
"field": "listItems.value"
}
}
}
}
}
},
"maxValue": {
"max_bucket": {
"buckets_path": "id_terms>nested_entries>min_position"
}
}
}
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": "2",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 200.0
}
}
}
]
},
"maxValue": {
"value": 200.0,
"keys": [
"2"
]
}
}
The initial post mentioned nested aggregation, so I was sure the question was about nested documents. Since I arrived at this solution before seeing the other answer, I'm keeping the whole thing for history, but it actually differs only in adding the nested aggregation.
The whole process can be explained like this:
Bucket each document into its own bucket.
Use a nested aggregation to be able to aggregate on nested documents.
Use a min aggregation to find the minimum value among all of a document's nested documents, and thereby for the document itself.
Finally, use another aggregation to calculate the maximum value among the results of the previous aggregation.
Given this setup:
// PUT /index
{
"mappings": {
"properties": {
"children": {
"type": "nested",
"properties": {
"value": {
"type": "integer"
}
}
}
}
}
}
// POST /index/_doc
{
"children": [
{ "value": 12 },
{ "value": 45 }
]
}
// POST /index/_doc
{
"children": [
{ "value": 7 },
{ "value": 35 }
]
}
I can use these aggregations in the request to get the required value:
{
"size": 0,
"aggs": {
"document": {
"terms": {"field": "_id"},
"aggs": {
"children": {
"nested": {
"path": "children"
},
"aggs": {
"minimum": {
"min": {
"field": "children.value"
}
}
}
}
}
},
"result": {
"max_bucket": {
"buckets_path": "document>children>minimum"
}
}
}
}
{
"aggregations": {
"document": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O4QxyHQBK5VO9CW5xJGl",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 7.0
}
}
},
{
"key": "OoQxyHQBK5VO9CW5kpEc",
"doc_count": 1,
"children": {
"doc_count": 2,
"minimum": {
"value": 12.0
}
}
}
]
},
"result": {
"value": 12.0,
"keys": [
"OoQxyHQBK5VO9CW5kpEc"
]
}
}
}
There should also be a workaround that uses a script to calculate the max: all the script needs to do is find and return the smallest value in each document.

Elasticsearch: Retrieving filtered and unfiltered count in one request

I am using the following mapping in one of my ElasticSearch indices:
"mappings": {
"my-mapping": {
"properties": {
"id": {
"type": "keyword"
},
"groupId": {
"type" : "keyword"
},
"title": {
"type": "text"
}
}
}
}
I now want to count the elements matching a search string that may be present inside "title", grouped by my groupId. I can achieve that using aggregations and buckets:
/indexname/_search
{
"query" : {
"term" : {
"title" : "sky"
}
},
"aggs": {
"filtered_buckets": {
"terms": {
"field": "groupId"
}
}
}
}
Additionally, I want to know the count of all elements not respecting the filter. I could simply achieve that using a non-queried search:
/indexname/_search
{
"aggs": {
"filtered_buckets": {
"terms": {
"field": "groupId"
}
}
}
}
The current problem is: is there a way to generate aggregation data containing both the filtered count and the unfiltered count, for only those groups that had a filtered hit, in one request?
For example:
"buckets": [
{
"key": "257786",
"doc_count": 3024,
"filtered_doc_count" : 202
},
{
"key": "254640",
"doc_count": 3010,
"filtered_doc_count" : 1
},
{
"key": "252256",
"doc_count": 2367,
"filtered_doc_count" : 5
},
...
]
One way I see is to split the request in two: first request all filtered buckets (their IDs), then request the counts of these specific buckets using "terms" : { "id" : ["4", "65", "404"] }. This is not very nice and I don't want to send two requests (_msearch does not help here).
A second bad solution would be to persist the total counts somewhere in all of my entities.
Is there any way to achieve what I described in a single request?
PS: Please correct me, if the question is unclear.
Based on these:
How to filter terms aggregation
http://nocf-www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
I made this:
PUT test
{
"mappings": {
"my-mapping": {
"properties": {
"id": {
"type": "keyword"
},
"groupId": {
"type" : "keyword"
},
"title": {
"type": "text"
}
}
}
}
}
PUT test/type1/1
{
"id":1,
"groupId": 1,
"title": "asd"
}
PUT test/type1/2
{
"id":2,
"groupId": 1,
"title": "sky"
}
PUT test/type1/3
{
"id":3,
"groupId": 2,
"title": "sky"
}
PUT test/type1/4
{
"id":4,
"groupId": 2,
"title": "sky"
}
PUT test/type1/5
{
"id":5,
"groupId": 2,
"title": "sky"
}
POST test/type1/_search
{
"aggs": {
"categories-filtered": {
"filter": {"term": {"title": "sky"}},
"aggs": {
"names": {
"terms": {"field": "groupId"}
}
}
},
"categories": {
"terms": {"field": "groupId"}
}
}
}
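To get from the two aggregations in this request to the per-group shape asked for in the question, the buckets can be merged on the client. The sketch below is plain Python; the function name merge_counts is made up, and the aggregations dictionary is hand-written to match the five sample documents above rather than captured from a cluster:

```python
def merge_counts(aggregations):
    """Attach the unfiltered doc_count to every group that has a filtered hit."""
    total = {b["key"]: b["doc_count"] for b in aggregations["categories"]["buckets"]}
    merged = []
    for bucket in aggregations["categories-filtered"]["names"]["buckets"]:
        merged.append({
            "key": bucket["key"],
            # None if the group is missing from the unfiltered buckets.
            "doc_count": total.get(bucket["key"]),
            "filtered_doc_count": bucket["doc_count"],
        })
    return merged


# Hand-written response for the sample documents (assumed, not real output).
aggregations = {
    "categories-filtered": {
        "doc_count": 4,
        "names": {"buckets": [{"key": "2", "doc_count": 3}, {"key": "1", "doc_count": 1}]},
    },
    "categories": {"buckets": [{"key": "2", "doc_count": 3}, {"key": "1", "doc_count": 2}]},
}

result = merge_counts(aggregations)
print(result)
```

One caveat: a terms aggregation only returns its top buckets (10 by default), so with many groups the unfiltered list may not contain every filtered group; in that case the sketch reports None and the size of the "categories" aggregation would need to be raised.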

Elasticsearch filter the maximum value document

I am trying to get the document with the maximum value among records that share the same name. For example, I have 3 users; 2 of them have the same name but different follower counts. I want to return only 1 of the 2 documents with the same name, based on the maximum follower_count.
{ id: 1, name: "John Greenwood", follower_count: 100 }
{ id: 2, name: "John Greenwood", follower_count: 200 }
{ id: 3, name: "John Underwood", follower_count: 300 }
So the result would be,
{ id: 2, name: "John Greenwood", follower_count: 200 }
{ id: 3, name: "John Underwood", follower_count: 300 }
Of the 2 documents with the same name, the one with the most followers wins, and the other, unique one is returned as well.
I have mapping as follow,
"users-development" : {
"mappings" : {
"user" : {
"dynamic" : "false",
"properties" : {
"follower_count" : {
"type" : "integer"
},
"name" : {
"type" : "string",
"fields" : {
"exact" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}
}
This is where I have been stuck for a long time:
{
query: {
filtered: {
filter: {
bool: {
must: [
{ terms: { "name.exact": [ "John Greenwood", "John Underwood" ] } },
]
}
}
}
},
aggs: {
max_follower_count: { max: { field: 'follower_count' } }
},
size: 1000,
}
Any suggestions, please?
Elasticsearch has a tool built for exactly this kind of question: aggregations. See the examples below.
First of all, in your case you need to aggregate by the full name, including spaces, so your name field needs to be not_analyzed, like this:
PUT /index
{
"mappings": {
"users" : {
"properties" : {
"name" : {
"type" : "string",
"index": "not_analyzed"
}
}
}
}
}
Now your query will be like this one:
POST /index/users/_search
{
"aggs": {
"users": {
"terms": {
"field": "name"
},
"aggs": {
"followers": {
"max": {
"field": "follower_count"
}
}
}
}
}
}
I just aggregated by name and used a max metric to get the highest follower count.
The response will be like this:
"aggregations": {
"users": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John Greenwood",
"doc_count": 2,
"followers": {
"value": 200
}
},
{
"key": "John Underwood",
"doc_count": 1,
"followers": {
"value": 300
}
}
]
}
}
Hope that works for you.
Use aggregations whenever you need to group data and compute metrics, such as sums, over the values.
OK, I think you are looking for something along these lines, using the terms aggregation:
{
"query": {
"terms": { "name.exact": [ "John Greenwood", "John Underwood" ] }
},
"aggs": {
"max_follower_count": {
"terms": {
"field":"name.exact"
},
"aggs":{
"max_follow" : { "max" : { "field" : "follower_count" } }
}
}
},
"size": 1000
}
The terms aggregation will make a bucket for each unique value of name.exact, which will only be the values specified in your terms query. So we now have a bucket for each John, and we can use the max aggregation to find the highest follower count in each. The max aggregation operates on each bucket of its parent aggregation.
Each of these unique terms will then have its max value of follower_count computed and displayed in the bucket. The results look as follows:
... //query results of just the terms query up here
"aggregations": {
"max_follower_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John Greenwood",
"doc_count": 2,
"max_follow": {
"value": 200
}
},
{
"key": "John Underwood",
"doc_count": 1,
"max_follow": {
"value": 300
}
}
]
}
}
The terms aggregation comes with a few caveats with how it does the counting, and the documentation linked should be pretty clear on that.
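The bucketing described in this answer can be mimicked in plain Python on the three documents from the question. This is a simulation only, with an invented function name, to show what terms plus max computes:

```python
users = [
    {"id": 1, "name": "John Greenwood", "follower_count": 100},
    {"id": 2, "name": "John Greenwood", "follower_count": 200},
    {"id": 3, "name": "John Underwood", "follower_count": 300},
]


def max_followers_by_name(users):
    """One bucket per exact name, each holding the highest follower_count seen."""
    buckets = {}
    for user in users:
        name = user["name"]
        count = user["follower_count"]
        buckets[name] = max(buckets.get(name, count), count)
    return buckets


buckets = max_followers_by_name(users)
print(buckets)
```

Like the aggregation itself, this yields the maximum value per name, not the whole winning document.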

Elasticsearch metric aggregation: number of elements in array

I want to do a quite involved query/aggregation. I can't see how because I've just started working with ES. The documents I have look something like this:
{
"keyword": "some keyword",
"items": [
{
"name":"my first item",
"item_property_1":"A",
( other properties here )
},
{
"name":"my second item",
"item_property_1":"B",
( other properties here )
},
{
"name":"my third item",
"item_property_1":"A",
( other properties here )
}
]
( other properties... )
},
{
"keyword": "different keyword",
"items": [
{
"name":"cool item",
"item_property_1":"A",
( other properties here )
},
{
"name":"awesome item",
"item_property_1":"C",
( other properties here )
},
]
( other properties... )
},
( other documents... )
Now, what I would like to do is, for each keyword, count how many items there are for each of the several possible values that item_property_1 can have. That is, I want a bucket aggregation that would have the following response:
{
"keyword": "some keyword",
"item_property_1_aggretation": [
{
"key":"A",
"count": 2,
},
{
"key":"B",
"count": 1,
}
]
},
{
"keyword": "different keyword",
"item_property_1_aggretation": [
{
"key":"A",
"count": 1,
},
{
"key":"C",
"count": 1,
}
]
},
( other keywords... )
If mappings are necessary, could you also specify which? I don't have any non-default mappings; I just dumped everything in there.
EDIT:
Saving you the trouble by posting here the bulk PUT for the previous example
PUT /test/test/_bulk
{ "index": {}}
{ "keyword": "some keyword", "items": [ { "name":"my first item", "item_property_1":"A" }, { "name":"my second item", "item_property_1":"B" }, { "name":"my third item", "item_property_1":"A" } ]}
{ "index": {}}
{ "keyword": "different keyword", "items": [ { "name":"cool item", "item_property_1":"A" }, { "name":"awesome item", "item_property_1":"C" } ]}
EDIT2:
I just tried this:
POST /test/test/_search
{
"size":2,
"aggregations": {
"property_1_count": {
"terms":{
"field":"item_property_1"
}
}
}
}
and got this:
"aggregations": {
"property_1_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 2
},
{
"key": "b",
"doc_count": 1
},
{
"key": "c",
"doc_count": 1
}
]
}
}
Close, but no cigar. You can see what's happening: it's bucketing over each item_property_1 irrespective of the keyword it belongs to. I'm sure the solution involves setting up the mapping correctly, but I can't put my finger on it. Suggestions?
EDIT3:
Based on this:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-nested-type.html
I want to try adding a nested type to property items. To do that, I tried:
PUT /test/_mapping/test
{
"test":{
"properties": {
"items": {
"type": "nested",
"properties": {
"item_property_1":{"type":"string"}
}
}
}
}
}
However, this returns an error:
{
"error": "MergeMappingException[Merge failed with failures {[object mapping [items] can't be changed from non-nested to nested]}]",
"status": 400
}
This might have to do with the warning on that url: "changing an object type to nested type requires reindexing."
So, how do I do that?
Nice tries, you were almost there! Here is what I came up with. Based on your mapping proposal, the mapping I'm using is the following:
curl -XPUT localhost:9200/test/_mapping/test -d '{
"test": {
"properties": {
"keyword": {
"type": "string",
"index": "not_analyzed"
},
"items": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"item_property_1": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}'
Note: you need to wipe and reindex your data, since you cannot change a field type from being not nested to nested.
Then I created some data with the bulk query you shared:
curl -XPOST localhost:9200/test/test/_bulk -d '
{ "index": {}}
{ "keyword": "some keyword", "items": [ { "name":"my first item", "item_property_1":"A" }, { "name":"my second item", "item_property_1":"B" }, { "name":"my third item", "item_property_1":"A" } ]}
{ "index": {}}
{ "keyword": "different keyword", "items": [ { "name":"cool item", "item_property_1":"A" }, { "name":"awesome item", "item_property_1":"C" } ]}
'
Finally, here is the aggregation query you can use to get the results you expect. We first bucket by keyword using a terms aggregation and then for each keyword, we bucket by the nested item_property_1 field. Since items is now a nested type, the key is to use a nested aggregation for items and then a terms sub-aggregation for the item_property_1 field.
{
"size": 0,
"aggregations": {
"by_keyword": {
"terms": {
"field": "keyword"
},
"aggs": {
"prop_1_count": {
"nested": {
"path": "items"
},
"aggs": {
"prop_1": {
"terms": {
"field": "items.item_property_1"
}
}
}
}
}
}
}
}
Running that query on your data set will yield this:
{
...
"aggregations" : {
"by_keyword" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "different keyword", <---- keyword 1
"doc_count" : 1,
"prop_1_count" : {
"doc_count" : 2,
"prop_1" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ { <---- buckets for item_property_1
"key" : "A",
"doc_count" : 1
}, {
"key" : "C",
"doc_count" : 1
} ]
}
}
}, {
"key" : "some keyword", <---- keyword 2
"doc_count" : 1,
"prop_1_count" : {
"doc_count" : 3,
"prop_1" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ { <---- buckets for item_property_1
"key" : "A",
"doc_count" : 2
}, {
"key" : "B",
"doc_count" : 1
} ]
}
}
} ]
}
}
}
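If the flat per-keyword shape from the question is needed, the nested buckets in this response can be reshaped on the client. A minimal sketch in plain Python follows; the function name reshape and the output key item_property_1_aggregation are illustrative choices, and the aggregations dictionary is a trimmed copy of the response shown above:

```python
def reshape(aggregations):
    """Turn by_keyword > prop_1_count > prop_1 buckets into one row per keyword."""
    rows = []
    for keyword_bucket in aggregations["by_keyword"]["buckets"]:
        counts = [
            {"key": b["key"], "count": b["doc_count"]}
            for b in keyword_bucket["prop_1_count"]["prop_1"]["buckets"]
        ]
        rows.append({
            "keyword": keyword_bucket["key"],
            "item_property_1_aggregation": counts,
        })
    return rows


# Trimmed copy of the response above (error bounds and other counts removed).
aggregations = {
    "by_keyword": {
        "buckets": [
            {"key": "different keyword", "doc_count": 1, "prop_1_count": {
                "doc_count": 2,
                "prop_1": {"buckets": [{"key": "A", "doc_count": 1},
                                       {"key": "C", "doc_count": 1}]}}},
            {"key": "some keyword", "doc_count": 1, "prop_1_count": {
                "doc_count": 3,
                "prop_1": {"buckets": [{"key": "A", "doc_count": 2},
                                       {"key": "B", "doc_count": 1}]}}},
        ]
    }
}

rows = reshape(aggregations)
print(rows)
```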

How to get an Elasticsearch aggregation with multiple fields

I'm attempting to find related tags to the one currently being viewed. Every document in our index is tagged. Each tag is formed of two parts - an ID and text name:
{
...
meta: {
...
tags: [
{
id: 123,
name: 'Biscuits'
},
{
id: 456,
name: 'Cakes'
},
{
id: 789,
name: 'Breads'
}
]
}
}
To fetch the related tags I am simply querying the documents and getting an aggregate of their tags:
{
"query": {
"bool": {
"must": [
{
"match": {
"item.meta.tags.id": "123"
}
},
{
...
}
]
}
},
"aggs": {
"baked_goods": {
"terms": {
"field": "item.meta.tags.id",
"min_doc_count": 2
}
}
}
}
This works perfectly, I am getting the results I want. However, I require both the tag ID and name to do anything useful. I have explored how to accomplish this, the solutions seem to be:
Combine the fields when indexing
A script to munge together the fields
A nested aggregation
Options one and two are not available to me, so I have been going with option three, but it's not responding in the expected manner. Given the following query (still searching for documents also tagged with 'Biscuits'):
{
...
"aggs": {
"baked_goods": {
"terms": {
"field": "item.meta.tags.id",
"min_doc_count": 2
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.name"
}
}
}
}
}
}
I will get this result:
{
...
"aggregations": {
"baked_goods": {
"buckets": [
{
"key": "456",
"doc_count": 11,
"name": {
"buckets": [
{
"key": "Biscuits",
"doc_count": 11
},
{
"key": "Cakes",
"doc_count": 11
}
]
}
}
]
}
}
}
The nested aggregation includes both the search term and the tag I'm after (returned in alphabetical order).
I have tried to mitigate this by adding an exclude to the nested aggregation but this slowed the query down far too much (around 100 times for 500000 docs). So far the fastest solution is to de-dupe the result manually.
What is the best way to get an aggregation of tags with both the tag ID and tag name in the response?
Thanks for making it this far!
By the looks of it, your tags field is not nested.
For this aggregation to work, you need it to be nested so that there is an association between an id and a name. Without nested, the list of ids is just one array and the list of names is another array:
"item": {
"properties": {
"meta": {
"properties": {
"tags": {
"type": "nested", <-- nested field
"include_in_parent": true, <-- to, also, keep the flat array-like structure
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
}
}
}
}
}
}
}
Also, note that I've added to the mapping this line "include_in_parent": true which means that your nested tags will, also, behave like a "flat" array-like structure.
So, everything you had so far in your queries will still work without any changes to the queries.
But, for this particular query of yours, the aggregation needs to change to something like this:
{
"aggs": {
"baked_goods": {
"nested": {
"path": "item.meta.tags"
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.id"
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.name"
}
}
}
}
}
}
}
}
And the result is like this:
"aggregations": {
"baked_goods": {
"doc_count": 9,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 123,
"doc_count": 3,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "biscuits",
"doc_count": 3
}
]
}
},
{
"key": 456,
"doc_count": 2,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cakes",
"doc_count": 2
}
]
}
},
.....
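With the nested aggregation each id bucket contains exactly one name bucket, so the id and name pairs can be read straight out of the response. The sketch below is plain Python with an invented function name, and the aggregations dictionary is a trimmed copy of the two buckets shown above:

```python
def tags_from_response(aggregations):
    """Collect one {id, name, doc_count} entry per tag id bucket."""
    tags = []
    for id_bucket in aggregations["baked_goods"]["name"]["buckets"]:
        name_buckets = id_bucket["name"]["buckets"]
        # A nested tag has a single name; fall back to None if it is missing.
        name = name_buckets[0]["key"] if name_buckets else None
        tags.append({
            "id": id_bucket["key"],
            "name": name,
            "doc_count": id_bucket["doc_count"],
        })
    return tags


# Trimmed copy of the first two buckets from the result above.
aggregations = {
    "baked_goods": {
        "doc_count": 9,
        "name": {
            "buckets": [
                {"key": 123, "doc_count": 3,
                 "name": {"buckets": [{"key": "biscuits", "doc_count": 3}]}},
                {"key": 456, "doc_count": 2,
                 "name": {"buckets": [{"key": "cakes", "doc_count": 2}]}},
            ]
        },
    }
}

tags = tags_from_response(aggregations)
print(tags)
```

The names come back lowercased here because the name field in this mapping is an analyzed string; mapping it as not_analyzed, as in the other threads above, should preserve the original casing.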
