Merge the results of two aggregations - elasticsearch

I have an Elasticsearch index with documents that have the following fields:
author
contributor
Each of these fields may contain multiple user IDs.
I want to perform an aggregation that counts the total number of documents related to each user (either as author or contributor).
I can query each aggregation separately, but how do I combine them? Here's my query:
GET documents/_search
{
"aggs": {
"contributor": {
"terms": {
"field": "contributor"
}
},
"author": {
"terms": {
"field": "author"
}
}
}
}
Right now, I'm getting this result:
"aggregations": {
"author": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": 2,
"doc_count": 10
},
{
"key": 1,
"doc_count": 7
},
{
"key": 5,
"doc_count": 3
}
]
},
"contributor": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": 5,
"doc_count": 1
}]
}
}
But I'd like to have a single aggregation that returns the count of 4 documents for user 5.

Well, if you can update your mappings and add a field, this should work. Note that I've made the combined field a keyword: aggregations on analyzed string fields need fielddata, which is slow and should not be overused. Also note that if author = contributor in the same document, the aggregation won't count 2 occurrences (good news).
PUT documents
{
"mappings": {
"test": {
"properties": {
"contributor": {
"type": "keyword",
"copy_to": "author_and_contributor"
},
"author": {
"type": "keyword",
"copy_to": "author_and_contributor"
},
"author_and_contributor": {
"type": "string",
"fielddata": true
}
}
}
}
}
GET documents/_search
{
"size": 0,
"aggs": {
"author_contrib_agg": {
"terms": {
"field": "author_and_contributor"
}
}
}
}
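With the sample counts from the question (user 5 is an author in 3 documents and a contributor in 1 more), this should return something along these lines (a sketch rather than real output; the keys come back as strings because the combined field is a keyword):
"aggregations": {
"author_contrib_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "2",
"doc_count": 10
},
{
"key": "1",
"doc_count": 7
},
{
"key": "5",
"doc_count": 4
}
]
}
}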

What is the elastic search query for nested aggregation to return buckets of count values?

I have data for individual customers in Elasticsearch, whose likings of food items are stored as shown below. A customer likes many "Food_Items", so it's a list. I have many customers as well.
I have data in the following format:
{
"id": 1,
"customerName":"John",
"likings":[
{
"Food_Item": "Pizza",
"OnAScaleOfTen": 9
},
{
"Food_Item": "Chinese",
"OnAScaleOfTen": 10
}
]
},
{
"id": 2,
"customerName":"Mary",
"likings":[
{
"Food_Item": "Burger",
"OnAScaleOfTen": 10
},
{
"Food_Item": "Chinese",
"OnAScaleOfTen": 6
}
]
}
Now I want to bucket the unique "Food_Items" with their corresponding counts, something like this in the aggregation result:
"Liking_Status": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Chinese",
"Liking Count": {
"value": 2
}
},
{
"key": "Pizza",
"Liking Count": {
"value": 1
}
},
{
"key": "Burger",
"Liking Count": {
"value": 1
}
}]}
My mapping for the index is:
{
"mappings": {
"doc": {
"properties": {
"customerName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "long"
},
"likings": {
"type":"nested",
"properties": {
"Food_Item": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"OnAScaleOfTen": {
"type": "long"
}
}
}
}
}
}
}
Can anyone help me with the Elasticsearch query? Thank you.
What you need is a nested aggregation.
{
"size": 0,
"aggs": {
"buckets": { //aggregating on nested field
"nested": {
"path": "likings"
},
"aggs": {
"liking_count": {//term aggregation on the obj
"terms": {
"field": "likings.Food_Item.keyword"
}
}
}
}
}
}
Mapping:
For the mapping, I only declared likings as nested; everything else is left at the defaults. In this case, Food_Item is a text field, and terms aggregations work on keywords, so the query uses its keyword sub-field from the index.
Output:
"aggregations": {
"buckets": {
"doc_count": 4,
"liking_count": { //You can name what you want here
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Chinese",
"doc_count": 2
},
{
"key": "Burger",
"doc_count": 1
},
{
"key": "Pizza",
"doc_count": 1
}
]
}
}
}
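If you specifically want the "Liking Count" value shape from the question, you could add a value_count sub-aggregation under the terms aggregation. A sketch (untested; I used Liking_Count as the aggregation name, since characters like spaces in aggregation names can be problematic on newer versions):
{
"size": 0,
"aggs": {
"buckets": {
"nested": {
"path": "likings"
},
"aggs": {
"Liking_Status": {
"terms": {
"field": "likings.Food_Item.keyword"
},
"aggs": {
"Liking_Count": {
"value_count": {
"field": "likings.Food_Item.keyword"
}
}
}
}
}
}
}
}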

Disable "lowercase_normalizer" normalizer while applying aggregation in Elasticsearch

We have applied the "lowercase_normalizer" normalizer to fields to achieve case-insensitive search. However, we need to perform aggregation on certain fields without any text transformation. Is there any way of disabling the normalizer while aggregating over the records?
The data in the field has already been normalized before indexing, so it is stored in the index in lowercase. I suggest you add a separate field without the lowercase_normalizer and use it for aggregations.
You can try my way.
For example, say your initial mapping is as follows:
"test_nor": {
"type": "keyword",
"normalizer": "lowerasciinormalizer"
}
Data
"test_nor": "Lê văn Lươn"
The aggregation query looks like this:
{
"size": 0,
"aggs": {
"by_name": {
"terms": {
"field": "test_nor",
"size": 100
},
"aggs": {
"by_email": {
"terms": {
"field": "email",
"size": 100
}
}
}
}
}
}
Result aggregations
"aggregations": {
"by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "le van luon",
"doc_count": 1,
"by_email": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "email1#gmail.com",
"doc_count": 1
},
{
"key": "email2#gmail.com",
"doc_count": 1
}
]
}
}
]
}
}
You want the result aggregations to return "key": "Lê văn Lươn", right? Here is how:
1: Update mapping
PUT /my-index-000001/_mapping
{
"properties": {
"test_nor": {
"type": "keyword",
"normalizer": "lowerasciinormalizer",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
2: Update or reindex the documents so the data picks up the new mapping (see the sketch below)
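One way to do step 2, assuming the same index as above, is an update-by-query, which rewrites each document in place so the new test_nor.keyword sub-field gets indexed (a sketch):
POST /my-index-000001/_update_by_query?conflicts=proceed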
3: Update query aggregations
{
"size": 0,
"aggs": {
"by_name": {
"terms": {
"field": "test_nor.keyword",
"size": 100
},
"aggs": {
"by_email": {
"terms": {
"field": "email",
"size": 100
}
}
}
}
}
}
4: Now you get what you want
"aggregations": {
"by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Lê văn Lươn",
"doc_count": 1,
"by_email": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "email1#gmail.com",
"doc_count": 1
},
{
"key": "email2#gmail.com",
"doc_count": 1
}
]
}
}
]
}
}
Hope it helps you!

ES - Sub buckets based on values in bucket property rather than document values

I am new to ElasticSearch and I am trying to bucket objects coming from a search by hierarchical categories.
I apologize in advance for the length of the question but I wanted to give ample samples and information to make the need as clear as possible.
What I am Trying to Achieve
The problem is that categories form a hierarchy but are represented as a flat array of objects, each with a depth. I would like to generate an aggregation that would bucket by category and category depth.
Here is a simplified mapping for the document that contains only the minimum data:
{
"mappings": {
"_doc": {
"properties": {
"categoriesList": {
"properties": {
"depth": {
"type": "long"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Here is a simplified sample document:
{
"_index": "x",
"_type": "_doc",
"_id": "wY0w5GYBOIOl7fi31c_b",
"_score": 22.72073,
"_source": {
"categoriesList": [
{
"title": "category_lvl_2_2",
"depth": 2
},
{
"title": "category_lvl_2",
"depth": 2,
},
{
"title": "category_lvl_1",
"depth": 1
}
]
}
}
Now, what I am trying to achieve is to get hierarchical buckets of categories based on their depth, i.e. I want to have a bucket that contains all titles of categories of depth 1 across all hits, then another bucket (or sub-bucket) with the titles of just the categories of depth 2 across all hits, and so on.
Something like:
"aggregations": {
"depth": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47,
"depth_1": {
"doc_count": 47
}
}
]
}
},
{
"key": 2,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_2_1",
"doc_count": 47
},
{
"key": "category_lvl_2_2",
"doc_count": 33
}
]
}
}
]
}
}
What I have tried
At first I tried to simply create nested aggregations as follows:
"aggs": {
"depth": {
"terms": {
"field": "categoriesList.depth"
},
"aggs": {
"name": {
"terms": {
"field": "categoriesList.title.keyword"
}
}
}
}
}
This, of course, did not give what I wanted. It basically gave me buckets whose keys were by depth but that contained all titles of all categories no matter what their depth was; the contents were the same. Something like the following:
"aggregations": {
"depth": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47
},
{
"key": "category_lvl_2_1",
"doc_count": 33
},
{
"key": "category_lvl_2_2",
"doc_count": 15
}
]
}
},
{
"key": 2,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47
},
{
"key": "category_lvl_2_1",
"doc_count": 33
},
{
"key": "category_lvl_2_2",
"doc_count": 15
}
]
}
}
]
}
}
Then I tried to see if a filtered aggregation would work by trying to filter one sub-bucket by value of depth 1:
"aggs": {
"depth": {
"terms": {
"field": "categoriesList.depth"
},
"aggs": {
"name": {
"terms": {
"field": "categoriesList.title.keyword"
},
"aggs": {
"depth_1": {
"filter": {
"term": {
"categoriesList.depth": 1
}
}
}
}
}
}
}
}
This gave the same results as the simple aggregation query above but with an extra nesting level that served no purpose.
The question
With my current understanding of ES, what I am seeing makes sense: it goes over each document from the search and then creates buckets based on category depth but since each document has at least one category with each depth, the entire categories list is added to the bucket.
Is what I am trying to do possible with ES? I get the feeling that this will not work because I am basically trying to bucket and filter the properties used by the initial bucketing query rather than working on the document properties.
I could also bucket myself directly in code since we are getting the categories results but I wanted to know if it was possible to get this done on ES' side which would save me from modifying quite a bit of existing code we have.
Thanks!
Based on sramalingam24's comment I did the following to get it working:
Create an index with a mapping specifying nested types
I changed the mapping to tell ES that the categoriesList property was a nested object. To do so I created a new index with the following mapping:
{
"mappings": {
"_doc": {
"properties": {
"categoriesList": {
"type": "nested",
"properties": {
"depth": {
"type": "long"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Reindex into the new index
Then I reindexed from the old index to the new one:
POST _reindex
{
"source": {
"index": "old_index"
},
"dest": {
"index": "index_with_nested_mapping"
}
}
Use a nested aggregation
Then I used a nested aggregation similar to this:
{
"aggs": {
"categories": {
"nested": {
"path": "categoriesList"
},
"aggs": {
"depth": {
"terms": {
"field": "categoriesList.depth"
},
"aggs": {
"sub-categories": {
"terms": {
"field": "categoriesList.title.keyword"
}
}
}
}
}
}
}
}
Which gave me the results I desired:
{
"aggregations": {
"categories": {
"doc_count": 96,
"depth": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2,
"doc_count": 49,
"sub-categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_2_1",
"doc_count": 33
},
{
"key": "category_lvl_2_2",
"doc_count": 15
}
]
}
},
{
"key": 1,
"doc_count": 47,
"sub-categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47
}
]
}
}
]
}
}
}
}

ElasticSearch aggregations - to lowercase or not to lowercase

Please observe this scenario:
Define mappings
PUT /my_index
{
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Add data
PUT /my_index/my_type/1
{
"city": "New York"
}
PUT /my_index/my_type/2
{
"city": "York"
}
PUT /my_index/my_type/3
{
"city": "york"
}
Query for facets
GET /my_index/_search
{
"size": 0,
"aggs": {
"Cities": {
"terms": {
"field": "city.raw"
}
}
}
}
Result
{
...
"aggregations": {
"Cities": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 1
},
{
"key": "York",
"doc_count": 1
},
{
"key": "york",
"doc_count": 1
}
]
}
}
}
Dilemma
I would like two things:
"York" and "york" should be combined, so instead of 3 buckets with 1 hit each I would have 2 buckets, one for "New York" (1) and one for "York" (2)
The casing of the city must be preserved - I don't want facet values to be all lowercased
Dream result
{
...
"aggregations": {
"Cities": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 1
},
{
"key": "York",
"doc_count": 2
}
]
}
}
}
It's going to make your client-side code slightly more complicated, but you could always do something like this.
Set up the index with an additional sub-field that is only lower-cased (not split on white space):
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "string",
"fields": {
"lowercase": {
"type": "string",
"analyzer": "lowercase_analyzer"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
PUT /my_index/my_type/_bulk
{"index":{"_id":1}}
{"city":"New York"}
{"index":{"_id":2}}
{"city":"York"}
{"index":{"_id":3}}
{"city":"york"}
Then use a two-level aggregation like this, where the second level orders alphabetically ascending (so that the upper-case term will come first) and only returns the top raw term for each lower-case term:
GET /my_index/_search
{
"size": 0,
"aggs": {
"city_lowercase": {
"terms": {
"field": "city.lowercase"
},
"aggs": {
"city_terms": {
"terms": {
"field": "city.raw",
"order" : { "_term" : "asc" },
"size": 1
}
}
}
}
}
}
which returns:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"city_lowercase": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "york",
"doc_count": 2,
"city_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 1,
"buckets": [
{
"key": "York",
"doc_count": 1
}
]
}
},
{
"key": "new york",
"doc_count": 1,
"city_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 1
}
]
}
}
]
}
}
}
Here's the code I used (with a few more doc examples):
http://sense.qbox.io/gist/f3781d58fbaadcc1585c30ebb087108d2752dfff

elasticsearch sort aggregation on categorical value

In elasticsearch, I can aggregate and sort the aggregation on a second aggregation's numeric field.
e.g.
GET myindex/_search
{
"size":0,
"aggs": {
"a1": {
"terms": {
"field": "FIELD1",
"size":0,
"order": {"a2": "desc"}
},
"aggs":{
"a2":{
"sum":{
"field":"FIELD2"
}
}
}
}
}
}
However, I want to sort the aggregation on a categorical field value, i.e. let's say the value of FIELD2 was one of ("a", "b", "c") -- I want to sort a1 first by all documents with FIELD2: "a", then FIELD2: "b", then FIELD2: "c".
In my case, every FIELD1 has a unique FIELD2. So I really just want a way to sort the a1 results by FIELD2.
I am not sure exactly what you want, but I tried the following.
I created index with mapping
PUT your_index
{
"mappings": {
"your_type": {
"properties": {
"name": {
"type": "string"
},
"fruit" : {"type" : "string", "index": "not_analyzed"}
}
}
}
}
Then I indexed a few documents like this:
PUT your_index/your_type/1
{
"name" : "federer",
"fruit" : "orange"
}
Then I sorted all players with their fruits using the following aggregation:
{
"size": 0,
"aggs": {
"a1": {
"terms": {
"field": "name",
"order": {
"_term": "asc"
}
},
"aggs": {
"a2": {
"terms": {
"field": "fruit",
"order": {
"_term": "asc"
}
}
}
}
}
}
}
The result I got is
"aggregations": {
"a1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "federer",
"doc_count": 3,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "green apple",
"doc_count": 1
},
{
"key": "orange",
"doc_count": 2
}
]
}
},
{
"key": "messi",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "apple",
"doc_count": 1
},
{
"key": "banana",
"doc_count": 1
}
]
}
},
{
"key": "nadal",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "blueberry",
"doc_count": 1
},
{
"key": "watermelon",
"doc_count": 1
}
]
}
},
{
"key": "ronaldo",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "banana",
"doc_count": 1
},
{
"key": "watermelon",
"doc_count": 1
}
]
}
}
]
}
}
Make sure your FIELD2 is not_analyzed or you will get unexpected results.
Does this help?
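Side note: on Elasticsearch 5.x and later, not_analyzed string fields are expressed with the keyword type instead. A rough equivalent of the mapping above, written in the typeless style of 7.x+, would be:
PUT your_index
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"fruit": {
"type": "keyword"
}
}
}
}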
I found a way that works. You must first aggregate on FIELD2, then on FIELD1.
{
"size": 0,
"aggs": {
"a2": {
"terms": {
"size": 0,
"field": "FIELD2",
"order": {
"_term": "asc"
}
},
"aggs": {
"a1": {
"terms": {
"size": 0,
"field": "FIELD1",
"order": {
"_term": "asc"
}
}
}
}
}
}
}
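Since every FIELD1 has a unique FIELD2, each a2 bucket contains exactly the FIELD1 values belonging to that category, so reading the outer buckets in order gives you FIELD1 sorted by FIELD2. The response shape (with hypothetical values) looks like:
"aggregations": {
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a",
"doc_count": 2,
"a1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "some_field1_value",
"doc_count": 2
}
]
}
},
{
"key": "b",
"doc_count": 1,
"a1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "another_field1_value",
"doc_count": 1
}
]
}
}
]
}
}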
