We have applied the "lowercase_normalizer" normalizer to some fields to achieve case-insensitive search. However, we need to perform aggregations on certain fields without any text transformation. Is there any way of disabling the normalizer while aggregating over the records?
The data in the field has already been normalized at index time, so it is stored in the index in lowercase. I suggest you add a separate field without the lowercase_normalizer and use that field for aggregations.
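A minimal sketch of that suggestion (hypothetical index and field names, and an assumed definition of lowercase_normalizer, since the question doesn't show it): keep the normalized field for case-insensitive search and add an unnormalized raw sub-field for aggregations.
PUT /my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Aggregations on my_field.raw then return buckets with the original casing, while searches on my_field remain case-insensitive.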
You can try my approach.
For example, say your initial mapping is as follows:
"test_nor": {
"type": "keyword",
"normalizer": "lowerasciinormalizer"
}
And the data:
"test_nor": "Lê văn Lươn"
The aggregation query looks like this:
{
"size": 0,
"aggs": {
"by_name": {
"terms": {
"field": "test_nor",
"size": 100
},
"aggs": {
"by_email": {
"terms": {
"field": "email",
"size": 100
}
}
}
}
}
}
The resulting aggregations:
"aggregations": {
"by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "le van luon",
"doc_count": 1,
"by_email": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "email1#gmail.com",
"doc_count": 1
},
{
"key": "email2#gmail.com",
"doc_count": 1
}
]
}
}
]
}
}
You want the result aggregations to show "key": "Lê văn Lươn", right?
1: Update the mapping:
PUT /my-index-000001/_mapping
{
"properties": {
"test_nor": {
"type": "keyword",
"normalizer": "lowerasciinormalizer",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
2: Update the existing documents so they pick up the new mapping (the new sub-field is only populated when a document is (re)indexed); one way is sketched below.
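For step 2, one option (a sketch, assuming the index name from step 1) is an empty _update_by_query, which reindexes every document in place so the new test_nor.keyword sub-field gets populated:
POST /my-index-000001/_update_by_query?conflicts=proceed
{
  "query": {
    "match_all": {}
  }
}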
3: Update the aggregation query:
{
"size": 0,
"aggs": {
"by_name": {
"terms": {
"field": "test_nor.keyword",
"size": 100
},
"aggs": {
"by_email": {
"terms": {
"field": "email",
"size": 100
}
}
}
}
}
}
4: Now you get the result you want:
"aggregations": {
"by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Lê văn Lươn",
"doc_count": 1,
"by_email": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "email1#gmail.com",
"doc_count": 1
},
{
"key": "email2#gmail.com",
"doc_count": 1
}
]
}
}
]
}
}
Hope it helps!
Related
I have an Elasticsearch index with documents that have the following fields:
author
contributor
Each of these fields may contain multiple user IDs.
I want to perform an aggregation that counts the total number of documents related to each user (either as author or contributor).
I can query each aggregation separately, but how do I combine them? Here's my query:
GET documents/_search
{
"aggs": {
"contributor": {
"terms": {
"field": "contributor"
}
},
"author": {
"terms": {
"field": "author"
}
}
}
}
Right now, I'm getting this result:
"aggregations": {
"author": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": 2,
"doc_count": 10
},
{
"key": 1,
"doc_count": 7
},
{
"key": 5,
"doc_count": 3
}
]
},
"contributor": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": 5,
"doc_count": 1
}]
}
}
But I'd like to have a single aggregation that returns the count of 4 documents for user 5.
Well, if you can update your mappings and add a field, this should work. Please note it could be really slow if the combined field is an analyzed string with fielddata (fielddata aggregations are slow and should not be overused); a keyword field, as used below, avoids that. Also note that if author = contributor in the same doc, the aggregation won't count two occurrences (good news).
{
"mappings": {
"test": {
"properties": {
"contributor": {
"type": "keyword",
"copy_to": "author_and_contributor"
},
"author": {
"type": "keyword",
"copy_to": "author_and_contributor"
},
"author_and_contributor": {
"type": "string",
"fielddata": true
}
}
}
}
}
{
"size": 0,
"aggs": {
"author_contrib_agg": {
"terms": {
"field": "author_and_contributor"
}
}
}
}
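For illustration, indexing a sample document (hypothetical ID and user IDs, using the test mapping type from the answer's mapping) populates author_and_contributor automatically through copy_to; the terms aggregation then counts user 5 once for this document even though it appears in both fields:
PUT documents/test/1
{
  "author": ["2", "5"],
  "contributor": ["5"]
}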
I have an ES query which returns me data in the following format:
"by_group": {
"doc_count_error_upper_bound": 55,
"sum_other_doc_count": 1094497,
"buckets": [{
"key": "a838c7df-1ea9-48f1-aa71-69936b54f47d",
"doc_count": 69,
"by_subGroup": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "k1",
"doc_count": 45
},
{
"key": "k2",
"doc_count": 7
},
{
"key": "k3",
"doc_count": 6
},
{
"key": "k6",
"doc_count": 6
}
]
}
}]
}
I would like to filter my result (by_group) using the keys of my subgroup.
For example, I only want the by_group buckets which have the keys k1 and k2 but not k3.
Is it possible to filter in this way?
My current query looks like:
{
"size": 0,
"query": {
},
"aggs": {
"by_group": {
"terms": {
"field": "field1",
"size": 10
},
"aggs": {
"by_subGroup": {
"terms": {
"field": "field2",
"size": 1000
}
}
}
}
}
}
Use a filter aggregation:
{
"size": 0,
"query": {
},
"aggs": {
"by_group": {
"terms": {
"field": "field1",
"size": 10
},
"aggs": {
"by_subGroup": {
"filter": {
"terms": {
"field2": ["k1", "k2"]
}
},
"aggs": {
"filtered_keys": {
"terms": {
"field": "field2",
"size": 1000
}
}
}
}
}
}
}
}
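Note that the filter above only restricts which sub-group buckets are returned inside each by_group bucket; it does not drop whole by_group buckets whose sub-groups contain k3. If you need that, one option is a bucket_selector pipeline aggregation driven by filter counts (a sketch, assuming the same field names, not tested against your data):
{
  "size": 0,
  "aggs": {
    "by_group": {
      "terms": {
        "field": "field1",
        "size": 10
      },
      "aggs": {
        "wanted_keys": {
          "filter": {
            "terms": { "field2": ["k1", "k2"] }
          }
        },
        "unwanted_keys": {
          "filter": {
            "term": { "field2": "k3" }
          }
        },
        "keep_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "wanted": "wanted_keys>_count",
              "unwanted": "unwanted_keys>_count"
            },
            "script": "params.wanted > 0 && params.unwanted == 0"
          }
        }
      }
    }
  }
}
The bucket_selector runs after the parent's buckets are built, so it only filters among the top "size" buckets returned by the terms aggregation.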
I am new to Elasticsearch and I am trying to bucket objects coming from a search by hierarchical categories.
I apologize in advance for the length of the question but I wanted to give ample samples and information to make the need as clear as possible.
What I am Trying to Achieve
The problem is that categories form a hierarchy but are represented as a flat array of objects, each with a depth. I would like to generate an aggregation that would bucket by category and category depth.
Here is a simplified mapping for the document that contains only the minimum data:
{
"mappings": {
"_doc": {
"properties": {
"categoriesList": {
"properties": {
"depth": {
"type": "long"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Here is a simplified sample document:
{
"_index": "x",
"_type": "_doc",
"_id": "wY0w5GYBOIOl7fi31c_b",
"_score": 22.72073,
"_source": {
"categoriesList": [
{
"title": "category_lvl_2_2",
"depth": 2
},
{
"title": "category_lvl_2",
"depth": 2,
},
{
"title": "category_lvl_1",
"depth": 1
}
]
}
}
Now, what I am trying to achieve is to get hierarchical buckets of categories based on their depth, i.e. I want a bucket that contains all titles of categories of depth 1 across all hits, then another bucket (or sub-bucket) with the titles of just the categories of depth 2 across all hits, and so on.
Something like:
"aggregations": {
"depth": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47,
"depth_1": {
"doc_count": 47
}
}
]
}
},
{
"key": 2,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_2_1",
"doc_count": 47
},
{
"key": "category_lvl_2_2",
"doc_count": 33
}
]
}
}
]
}
}
What I have tried
At first I tried to simply create nested aggregations as follows:
"aggs": {
"depth": {
"terms": {
"field": "categoriesList.depth"
},
"aggs": {
"name": {
"terms": {
"field": "categoriesList.title.keyword"
}
}
}
}
}
This, of course, did not give me what I wanted. It gave me buckets keyed by depth, but each bucket contained the titles of all categories regardless of their depth; the contents were all the same. Something like the following:
"aggregations": {
"depth": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47
},
{
"key": "category_lvl_2_1",
"doc_count": 33
},
{
"key": "category_lvl_2_2",
"doc_count": 15
}
]
}
},
{
"key": 2,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47
},
{
"key": "category_lvl_2_1",
"doc_count": 33
},
{
"key": "category_lvl_2_2",
"doc_count": 15
}
]
}
}
]
}
}
Then I tried to see if a filter aggregation would work by trying to restrict one sub-bucket to depth 1:
"aggs": {
"depth": {
"terms": {
"field": "categoriesList.depth"
},
"aggs": {
"name": {
"terms": {
"field": "categoriesList.title.keyword"
},
"aggs": {
"depth_1": {
"filter": {
"term": {
"categoriesList.depth": 1
}
}
}
}
}
}
}
}
This gave the same results as the simple aggregation query above but with an extra nesting level that served no purpose.
The question
With my current understanding of ES, what I am seeing makes sense: it goes over each document from the search and creates buckets based on category depth, but since each document has at least one category at each depth, the entire categories list is added to each bucket.
Is what I am trying to do possible with ES? I get the feeling that this will not work because I am basically trying to bucket and filter on the properties used by the initial bucketing query rather than working on the document properties.
I could also do the bucketing myself in code, since we are already getting the categories in the results, but I wanted to know if it was possible to get this done on ES's side, which would save me from modifying quite a bit of existing code we have.
Thanks!
Based on sramalingam24's comment I did the following to get it working:
Create an index with a mapping specifying nested types
I changed the mapping to tell ES that the categoriesList property was a nested object. To do so I created a new index with the following mapping:
{
"mappings": {
"_doc": {
"properties": {
"categoriesList": {
"type": "nested",
"properties": {
"depth": {
"type": "long"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Reindex into the new index
Then I reindexed from the old index to the new one using the _reindex API:
POST _reindex
{
"source": {
"index": "old_index"
},
"dest": {
"index": "index_with_nested_mapping"
}
}
Use a nested aggregation
Then I used a nested aggregation similar to this:
{
"aggs": {
"categories": {
"nested": {
"path": "categoriesList"
},
"aggs": {
"depth": {
"terms": {
"field": "categoriesList.depth"
},
"aggs": {
"sub-categories": {
"terms": {
"field": "categoriesList.title.keyword"
}
}
}
}
}
}
}
}
Which gave me the results I desired:
{
"aggregations": {
"categories": {
"doc_count": 96,
"depth": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2,
"doc_count": 49,
"sub-categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_2_1",
"doc_count": 33
},
{
"key": "category_lvl_2_2",
"doc_count": 15
}
]
}
},
{
"key": 1,
"doc_count": 47,
"sub-categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47
}
]
}
}
]
}
}
}
}
How do I sort Elasticsearch aggregation buckets on keys? I have nested aggregations and want to sort on my 2nd aggregation's bucket results.
Like I have:
"result": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 20309,
"doc_count": 752,
"Events": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 30
},
{
"key": "page_view",
"doc_count": 10
},
...
]
}
},
{
"key": 20771,
"doc_count": 46,
"Events": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 32
},
{
"key": "page_view",
"doc_count": 9
},
...
]
}
},
I want my Events aggregation buckets to be sorted descending/ascending on the key impression or page_view.
How do I achieve such a result set?
Here is my query:
GET someindex/useractivity/_search?search_type=count
{
"size": 1000000,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"created_on": {
"from": "2015-01-12",
"to": "2016-05-12"
}
}
},
{
"term": {
"group_id": 1
}
}
]
}
}
}
},
"aggs": {
"result": {
"terms": {
"field": "entity_id",
"size": 1000000
},
"aggs": {
"Events": {
"terms": {
"field": "event_type",
"min_doc_count": 0,
"size": 10
}
}
}
}
}
}
I have tried using _key, but it sorts within each bucket. I want to sort by looking across all buckets. Say I have a key impression: I want the outer buckets to be sorted by this key's count, not the entries within each bucket.
For example, if I sort on impression in descending order, then my result should be:
"buckets": [
{
"key": 20771,
"doc_count": 46,
"Events": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 32
},
{
"key": "page_view",
"doc_count": 9
},
...
]
}
},
{
"key": 20309,
"doc_count": 752,
"Events": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 30
},
{
"key": "page_view",
"doc_count": 10
},
...
]
}
},
i.e. the bucket with the maximum impression count should be on top (order buckets by impression count in descending order).
Try this aggregation. The impression_Events filter sub-aggregation counts the impression documents inside each entity_id bucket, and the order clause sorts the outer buckets on that sub-aggregation's doc count:
{
"size": 0,
"aggs": {
"result": {
"terms": {
"field": "entity_id",
"size": 10,
"order": {
"impression_Events": "desc"
}
},
"aggs": {
"Events": {
"terms": {
"field": "event_type",
"min_doc_count": 0,
"size": 10
}
},
"impression_Events": {
"filter": {
"term": {
"event_type": "impression"
}
}
}
}
}
}
}
In Elasticsearch, I can aggregate and sort the aggregation on a second aggregation's numeric field, e.g.:
GET myindex/_search
{
"size":0,
"aggs": {
"a1": {
"terms": {
"field": "FIELD1",
"size":0,
"order": {"a2": "desc"}
},
"aggs":{
"a2":{
"sum":{
"field":"FIELD2"
}
}
}
}
}
}
However, I want to sort the aggregation on a categorical field value, i.e. let's say the value of FIELD2 is one of ("a", "b", "c"): I want to sort a1 first by all documents with FIELD2: "a", then FIELD2: "b", then FIELD2: "c".
In my case, every FIELD1 has a unique FIELD2. So I really just want a way to sort the a1 results by FIELD2.
I am not sure exactly what you want, but I tried the following.
I created an index with this mapping:
PUT your_index
{
"mappings": {
"your_type": {
"properties": {
"name": {
"type": "string"
},
"fruit" : {"type" : "string", "index": "not_analyzed"}
}
}
}
}
Then I indexed a few documents like this:
PUT your_index/your_type/1
{
"name" : "federer",
"fruit" : "orange"
}
Then I sorted all players and their fruits with the following aggregation:
{
"size": 0,
"aggs": {
"a1": {
"terms": {
"field": "name",
"order": {
"_term": "asc"
}
},
"aggs": {
"a2": {
"terms": {
"field": "fruit",
"order": {
"_term": "asc"
}
}
}
}
}
}
}
The result I got is:
"aggregations": {
"a1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "federer",
"doc_count": 3,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "green apple",
"doc_count": 1
},
{
"key": "orange",
"doc_count": 2
}
]
}
},
{
"key": "messi",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "apple",
"doc_count": 1
},
{
"key": "banana",
"doc_count": 1
}
]
}
},
{
"key": "nadal",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "blueberry",
"doc_count": 1
},
{
"key": "watermelon",
"doc_count": 1
}
]
}
},
{
"key": "ronaldo",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "banana",
"doc_count": 1
},
{
"key": "watermelon",
"doc_count": 1
}
]
}
}
]
}
}
Make sure your FIELD2 is not_analyzed (on modern Elasticsearch, a keyword field) or you will get unexpected results.
Does this help?
I found a way that works. You must first aggregate on FIELD2, then on FIELD1.
{
"size": 0,
"aggs": {
"a2": {
"terms": {
"size": 0,
"field": "FIELD2",
"order": {
"_term": "asc"
}
},
"aggs": {
"a1": {
"terms": {
"size": 0,
"field": "FIELD1",
"order": {
"_term": "asc"
}
}
}
}
}
}
}
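A note for current Elasticsearch versions: the queries above use pre-5.x syntax. A rough modern equivalent of this approach would look like the following (a sketch, assuming keyword fields; _key replaces the removed _term order, and "size": 0 inside terms is no longer allowed, so an explicit size is given):
{
  "size": 0,
  "aggs": {
    "a2": {
      "terms": {
        "field": "FIELD2",
        "size": 1000,
        "order": {
          "_key": "asc"
        }
      },
      "aggs": {
        "a1": {
          "terms": {
            "field": "FIELD1",
            "size": 1000,
            "order": {
              "_key": "asc"
            }
          }
        }
      }
    }
  }
}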