elasticsearch sort aggregation on categorical value - sorting

In Elasticsearch, I can aggregate and order the buckets by a sub-aggregation that computes a numeric value from a second field.
For example:
GET myindex/_search
{
  "size": 0,
  "aggs": {
    "a1": {
      "terms": {
        "field": "FIELD1",
        "size": 0,
        "order": {"a2": "desc"}
      },
      "aggs": {
        "a2": {
          "sum": {
            "field": "FIELD2"
          }
        }
      }
    }
  }
}
However, I want to sort the aggregation on the value of a categorical field. For example, say FIELD2 is one of ("a", "b", "c"): I want a1 to list first the buckets whose documents have FIELD2: "a", then FIELD2: "b", then FIELD2: "c".
In my case, every FIELD1 value has exactly one FIELD2 value, so I really just want a way to sort the a1 buckets by FIELD2.

I am not sure exactly what you want, but I tried the following.
I created an index with this mapping:
PUT your_index
{
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "string"
        },
        "fruit": {"type": "string", "index": "not_analyzed"}
      }
    }
  }
}
Then I indexed a few documents like this:
PUT your_index/your_type/1
{
  "name": "federer",
  "fruit": "orange"
}
Then I sorted all players and their fruits with the following aggregation:
{
  "size": 0,
  "aggs": {
    "a1": {
      "terms": {
        "field": "name",
        "order": {
          "_term": "asc"
        }
      },
      "aggs": {
        "a2": {
          "terms": {
            "field": "fruit",
            "order": {
              "_term": "asc"
            }
          }
        }
      }
    }
  }
}
The result I got is:
"aggregations": {
"a1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "federer",
"doc_count": 3,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "green apple",
"doc_count": 1
},
{
"key": "orange",
"doc_count": 2
}
]
}
},
{
"key": "messi",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "apple",
"doc_count": 1
},
{
"key": "banana",
"doc_count": 1
}
]
}
},
{
"key": "nadal",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "blueberry",
"doc_count": 1
},
{
"key": "watermelon",
"doc_count": 1
}
]
}
},
{
"key": "ronaldo",
"doc_count": 2,
"a2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "banana",
"doc_count": 1
},
{
"key": "watermelon",
"doc_count": 1
}
]
}
}
]
}
}
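Make sure FIELD2 is not_analyzed; otherwise the terms aggregation builds buckets from the individual analyzed tokens (for example "green" and "apple" instead of "green apple") and you will get unexpected results.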
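A rough Python sketch of why this matters: it compares an approximated analyzed field (lowercased and split on non-word characters, a simplification of Elasticsearch's real analyzers) with a not_analyzed field that indexes the whole value as one term. The helper names are hypothetical.

```python
import re
from collections import Counter


def analyzed_terms(value):
    # Rough stand-in for an analyzed string field: lowercase, split on non-word chars
    return [t for t in re.split(r"\W+", value.lower()) if t]


def not_analyzed_terms(value):
    # A not_analyzed field indexes the whole value as a single term
    return [value]


def terms_agg(values, to_terms):
    # Count documents per indexed term, like a terms aggregation's doc_count
    counts = Counter()
    for v in values:
        counts.update(set(to_terms(v)))
    return dict(sorted(counts.items()))


fruits = ["green apple", "orange", "orange"]
print(terms_agg(fruits, analyzed_terms))      # {'apple': 1, 'green': 1, 'orange': 2}
print(terms_agg(fruits, not_analyzed_terms))  # {'green apple': 1, 'orange': 2}
```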
Does this help?

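I found a way that works: aggregate on FIELD2 first, then nest the FIELD1 aggregation inside it. Because each FIELD1 value has exactly one FIELD2 value, the outer buckets ordered by FIELD2 list the FIELD1 values in FIELD2 order.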
{
  "size": 0,
  "aggs": {
    "a2": {
      "terms": {
        "size": 0,
        "field": "FIELD2",
        "order": {
          "_term": "asc"
        }
      },
      "aggs": {
        "a1": {
          "terms": {
            "size": 0,
            "field": "FIELD1",
            "order": {
              "_term": "asc"
            }
          }
        }
      }
    }
  }
}
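Why this works can be sketched in plain Python. The sample documents and helper names below are hypothetical; the sketch assumes, as in the question, that each FIELD1 value has exactly one FIELD2 value, so reading the inner FIELD1 keys in the order of the outer FIELD2 buckets yields FIELD1 sorted by FIELD2.

```python
from collections import defaultdict

# Hypothetical sample docs: every FIELD1 value has exactly one FIELD2 value
docs = [
    {"FIELD1": "x", "FIELD2": "c"},
    {"FIELD1": "y", "FIELD2": "a"},
    {"FIELD1": "z", "FIELD2": "b"},
    {"FIELD1": "y", "FIELD2": "a"},
]


def nested_terms(docs, outer, inner):
    """Group docs by `outer`, then by `inner`, both ordered by key ascending."""
    tree = defaultdict(lambda: defaultdict(int))
    for d in docs:
        tree[d[outer]][d[inner]] += 1
    return [(k, sorted(tree[k].items())) for k in sorted(tree)]


def flatten_inner(result):
    # Read the inner keys in bucket order: FIELD1 values sorted by their FIELD2
    return [inner_key for _, inner in result for inner_key, _ in inner]


result = nested_terms(docs, "FIELD2", "FIELD1")
print(result)                 # [('a', [('y', 2)]), ('b', [('z', 1)]), ('c', [('x', 1)])]
print(flatten_inner(result))  # ['y', 'z', 'x']
```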

Related

Disable "lowercase_normalizer" normalizer while applying aggregation in Elasticsearch

We have applied "lowercase_normalizer" normalizer to fields to achieve case insensitive search. However, we need to perform aggregation on certain fields without any text transformation. Is there any way of disabling the normalizer while aggregating through the records?
The normalizer is applied at index time, so the values stored in the index, which the terms aggregation reads, are already lowercase. I suggest you add a separate field that doesn't use the lowercase_normalizer and aggregate on that field instead.
You can try this approach.
For example, suppose your initial mapping is:
"test_nor": {
  "type": "keyword",
  "normalizer": "lowerasciinormalizer"
}
and the data is:
"test_nor": "Lê văn Lươn"
The aggregation looks like this:
{
  "size": 0,
  "aggs": {
    "by_name": {
      "terms": {
        "field": "test_nor",
        "size": 100
      },
      "aggs": {
        "by_email": {
          "terms": {
            "field": "email",
            "size": 100
          }
        }
      }
    }
  }
}
Resulting aggregations:
"aggregations": {
"by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "le van luon",
"doc_count": 1,
"by_email": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "email1#gmail.com",
"doc_count": 1
},
{
"key": "email2#gmail.com",
"doc_count": 1
}
]
}
}
]
}
}
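The definition of lowerasciinormalizer is not shown; assuming it combines the lowercase and asciifolding filters, this Python sketch approximates what it does to the stored value. Unicode NFKD decomposition plus dropping combining marks is only an approximation of Elasticsearch's ASCII folding (it does not fold letters such as "đ", for example), and the function name is hypothetical.

```python
import unicodedata


def lower_ascii_normalize(value: str) -> str:
    # Approximation of an assumed lowercase + asciifolding normalizer:
    # decompose accented letters, drop the combining marks, then lowercase
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


print(lower_ascii_normalize("Lê văn Lươn"))  # le van luon
```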
You want the aggregation to return "key": "Lê văn Lươn", right?
1: Update the mapping to add a keyword sub-field without the normalizer
PUT /my-index-000001/_mapping
{
  "properties": {
    "test_nor": {
      "type": "keyword",
      "normalizer": "lowerasciinormalizer",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}
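2: Reindex the existing documents (for example with _update_by_query) so they pick up the new mapping; the new sub-field is only populated for documents indexed after the mapping change
3: Point the aggregation at the new sub-field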
{
  "size": 0,
  "aggs": {
    "by_name": {
      "terms": {
        "field": "test_nor.keyword",
        "size": 100
      },
      "aggs": {
        "by_email": {
          "terms": {
            "field": "email",
            "size": 100
          }
        }
      }
    }
  }
}
4: Now you get the original, un-normalized value:
"aggregations": {
"by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Lê văn Lươn",
"doc_count": 1,
"by_email": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "email1#gmail.com",
"doc_count": 1
},
{
"key": "email2#gmail.com",
"doc_count": 1
}
]
}
}
]
}
}
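As a sketch of why the sub-field helps, here is a hypothetical Python model of the multi-field mapping: one source value is indexed twice, normalized into test_nor and unchanged into test_nor.keyword, and a terms aggregation builds its keys from whichever indexed field it targets. The normalizer is again assumed to be lowercase plus ASCII folding, approximated with NFKD; all function names are illustrative.

```python
import unicodedata
from collections import Counter


def lower_ascii(value):
    # Assumed behaviour of "lowerasciinormalizer": strip accents, lowercase
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def index_doc(source):
    # One source value feeds two indexed fields, like a multi-field mapping
    raw = source["test_nor"]
    return {"test_nor": lower_ascii(raw), "test_nor.keyword": raw}


def terms_agg(indexed_docs, field):
    # Bucket keys come from the indexed values, not from _source
    return Counter(d[field] for d in indexed_docs)


indexed = [index_doc({"test_nor": "Lê văn Lươn"})]
print(terms_agg(indexed, "test_nor"))          # Counter({'le van luon': 1})
print(terms_agg(indexed, "test_nor.keyword"))  # Counter({'Lê văn Lươn': 1})
```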
Hope it helps you !

Get all documents having a unique value with Elasticsearch

For example, I have many documents like this:
email       status
1#123.com   open
1#123.com   click
2#123.com   open
3#123.com   open
I want to query every email whose only status value is "open". Because "1#123.com" also has a "click" status, it should not be returned.
I tried the following, but it does not give what I expect:
{
  "aggs": {
    "hard_bounce_count": {
      "filter": {
        "term": {
          "actionStatus": "open"
        }
      },
      "aggs": {
        "email_count": {
          "value_count": {
            "field": "email"
          }
        }
      }
    }
  }
}
The response I expect:
2#123.com open
3#123.com open
How can I do this? Thanks.
Here, the outer terms aggregation (EMAIL_LIST) returns one bucket per email. Within each email bucket, the filter aggregation OPEN counts the documents whose status is "open", and a second filter aggregation, OTHER_THAN_OPEN, counts the documents with any other status:
{
  "size": 0,
  "aggs": {
    "EMAIL_LIST": {
      "terms": {
        "field": "email.keyword"
      },
      "aggs": {
        "OPEN": {
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "status": "open"
                  }
                }
              ]
            }
          }
        },
        "OTHER_THAN_OPEN": {
          "filter": {
            "bool": {
              "must_not": [
                {
                  "term": {
                    "status": "open"
                  }
                }
              ]
            }
          }
        },
        "SELECTION_SCRIPT": {
          "bucket_selector": {
            "buckets_path": {
              "open_count": "OPEN._count",
              "other_than_open_count": "OTHER_THAN_OPEN._count"
            },
            "script": "params.other_than_open_count==0 && params.open_count>0"
          }
        }
      }
    }
  }
}
The bucket_selector pipeline aggregation above keeps only the email buckets that contain at least one "open" document and no documents with any other status:
"aggregations": {
"EMAIL_LIST": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "2#123.com",
"doc_count": 1,
"OTHER_THAN_OPEN": {
"doc_count": 0
},
"OPEN": {
"doc_count": 1
}
},
{
"key": "3#123.com",
"doc_count": 1,
"OTHER_THAN_OPEN": {
"doc_count": 0
},
"OPEN": {
"doc_count": 1
}
}
]
}
}
So the final answer is the emails "2#123.com" and "3#123.com".
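The bucket_selector logic can be checked with a plain-Python sketch over the sample documents from the question. It mimics the two filter counts and the selection script; it is an illustration, not how Elasticsearch evaluates pipeline aggregations, and the function name is hypothetical.

```python
from collections import defaultdict

docs = [
    {"email": "1#123.com", "status": "open"},
    {"email": "1#123.com", "status": "click"},
    {"email": "2#123.com", "status": "open"},
    {"email": "3#123.com", "status": "open"},
]


def only_open_emails(docs):
    # Per email bucket: count "open" docs (OPEN) and all other docs (OTHER_THAN_OPEN)
    counts = defaultdict(lambda: {"open": 0, "other": 0})
    for d in docs:
        key = "open" if d["status"] == "open" else "other"
        counts[d["email"]][key] += 1
    # Same condition as the bucket_selector script:
    # other_than_open_count == 0 && open_count > 0
    return sorted(e for e, c in counts.items() if c["other"] == 0 and c["open"] > 0)


print(only_open_emails(docs))  # ['2#123.com', '3#123.com']
```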
I can also run this query:
{
  "aggs": {
    "email": {
      "terms": {
        "field": "email"
      },
      "aggs": {
        "status_group": {
          "terms": {
            "field": "status"
          }
        }
      }
    }
  }
}
Response:
"aggregations": {
"email": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "1#123.com",
"doc_count": 2,
"status_group": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "click",
"doc_count": 1
}, {
"key": "open",
"doc_count": 1
}
]
}
}, {
"key": "2#123.com",
"doc_count": 1,
"status_group": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "open",
"doc_count": 1
}
]
}
}, {
"key": "3#123.com",
"doc_count": 1,
"status_group": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "open",
"doc_count": 1
}
]
}
}
]
}
}
But how can I exclude "1#123.com" from the resulting buckets? I eventually need statistics over all eligible documents.

Elasticsearch - sort aggregation keys as numbers

I wrote a query that aggregates some data, and the aggregation key is a number. I tried to sort the aggregation result by key, but Elasticsearch treated the key as a string.
Since the number of result buckets is quite large, re-sorting them on the client side is not an option. Any ideas?
Here is my query.
"aggregations": {
  "startcount": {
    "terms": {
      "script": "round(doc['startat'].value/1000)",
      "size": 1000,
      "order": { "_term": "asc" }
    }
  }
}
And the current result buckets:
"buckets": [
{
"key": "0",
"doc_count": 68
},
{
"key": "1",
"doc_count": 21
},
{
"key": "10",
"doc_count": 6
},
{
"key": "11",
"doc_count": 16
},
This is the result I expect:
"buckets": [
{
"key": "0",
"doc_count": 68
},
{
"key": "1",
"doc_count": 21
},
{
"key": "2", // not '10'
"doc_count": 6
},
{
"key": "3", // not '11'
"doc_count": 16
},
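Using a value script should fix the alphabetical sort issue. Instead of computing the key from doc['startat'] in a standalone script, point the terms aggregation at the numeric startat field and use a script that transforms each _value; the buckets then presumably keep the field's numeric type, so "_term" ordering compares them as numbers rather than as strings.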
Example:
{
  "size": 0,
  "aggregations": {
    "startcount": {
      "terms": {
        "field": "startat",
        "script": "round(_value/1000)",
        "size": 1000,
        "order": {
          "_term": "asc"
        }
      }
    }
  }
}
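The difference between the two orderings can be seen in plain Python, using the keys from the question. This only illustrates string versus numeric comparison; it is not Elasticsearch code, and order_keys is a hypothetical helper.

```python
def order_keys(keys, numeric):
    # String keys compare character by character, so "10" sorts before "2";
    # converting to int first gives the numeric order the question expects
    return sorted(keys, key=int) if numeric else sorted(keys)


keys = ["0", "1", "10", "11", "2", "3"]
print(order_keys(keys, numeric=False))  # ['0', '1', '10', '11', '2', '3']
print(order_keys(keys, numeric=True))   # ['0', '1', '2', '3', '10', '11']
```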
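This is a multiple group-by scenario where the buckets are sorted by key in descending order. Note that the field is a .keyword (string) sub-field, so the keys are compared as strings: that is why "99900" comes before "998000" in the output below. Ordering on the numeric field itself would compare the keys as numbers.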
{
  "size": 0,
  "aggs": {
    "categories": {
      "filter": {
        "exists": {
          "field": "organization_industries"
        }
      },
      "aggs": {
        "names": {
          "terms": {
            "field": "organization_revenue_in_thousands_int.keyword",
            "size": 200,
            "order": {
              "_key": "desc"
            }
          },
          "aggs": {
            "industry_stats": {
              "terms": {
                "field": "organization_industries.keyword"
              }
            }
          }
        }
      }
    }
  }
}
Output
"aggregations": {
"categories": {
"doc_count": 195161605,
"names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 19226983,
"buckets": [
{
"key": "99900",
"doc_count": 1742,
"industry_stats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "internet",
"doc_count": 1605
},
{
"key": "investment management",
"doc_count": 81
},
{
"key": "biotechnology",
"doc_count": 54
},
{
"key": "computer & network security",
"doc_count": 2
}
]
}
},
{
"key": "998000",
"doc_count": 71,
"industry_stats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "finance",
"doc_count": 48
},
{
"key": "information technology & services",
"doc_count": 23
}
]
}
}
]
}
}
}
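The order of the two buckets in this output follows from comparing the keys as strings. A quick Python check, using the two keys from the output plus a hypothetical "1000" key:

```python
keys = ["1000", "998000", "99900"]

# Descending string order: "99900" > "998000" because '9' > '8' at the third character
print(sorted(keys, reverse=True))           # ['99900', '998000', '1000']
# Descending numeric order puts the larger revenue first
print(sorted(keys, key=int, reverse=True))  # ['998000', '99900', '1000']
```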

elasticsearch aggregations sort on buckets keys

How do I sort Elasticsearch aggregation buckets on keys? I have nested aggregations and want to sort on the results of my second-level aggregation buckets.
For example, I have:
"result": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 20309,
"doc_count": 752,
"Events": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 30
},
{
"key": "page_view",
"doc_count": 10
},
...
]
}
},
{
"key": 20771,
"doc_count": 46,
"Events": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 32
},
{
"key": "page_view",
"doc_count": 9
},
...
]
}
},
I want my Events aggregate buckets to sort by desc/asc on key impression or on page_view.
How do I achieve such results set?
Here is my query:
GET someindex/useractivity/_search?search_type=count
{
  "size": 1000000,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "created_on": {
                  "from": "2015-01-12",
                  "to": "2016-05-12"
                }
              }
            },
            {
              "term": {
                "group_id": 1
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "result": {
      "terms": {
        "field": "entity_id",
        "size": 1000000
      },
      "aggs": {
        "Events": {
          "terms": {
            "field": "event_type",
            "min_doc_count": 0,
            "size": 10
          }
        }
      }
    }
  }
}
I have tried using _key, but it sorts within each bucket. I want to sort across all the buckets: for a key such as impression, I want the outer buckets sorted by that key's count, not the sub-buckets within each bucket.
For example, if I sort on impression in descending order, my result should be:
"buckets": [
{
"key": 20771,
"doc_count": 46,
"Events": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 32
},
{
"key": "page_view",
"doc_count": 9
},
...
]
}
},
{
"key": 20309,
"doc_count": 752,
"Events": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 30
},
{
"key": "page_view",
"doc_count": 10
},
...
]
}
},
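That is, the bucket with the most impressions should be on top (buckets ordered by impression count, descending).
Try this aggregation. It adds a filter sub-aggregation, impression_Events, that counts the impression documents in each entity_id bucket, and orders the outer terms buckets by that filter's document count: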
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "entity_id",
        "size": 10,
        "order": {
          "impression_Events": "desc"
        }
      },
      "aggs": {
        "Events": {
          "terms": {
            "field": "event_type",
            "min_doc_count": 0,
            "size": 10
          }
        },
        "impression_Events": {
          "filter": {
            "term": {
              "event_type": "impression"
            }
          }
        }
      }
    }
  }
}
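A plain-Python sketch of what this ordering does, using hypothetical documents built from the impression and page_view counts shown in the question (the total doc_counts there include other event types that are elided). A missing impression count is treated as 0, similar to a filter aggregation that matches no documents in a bucket. The function name and data are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical (entity_id, event_type) pairs, one per document
events = (
    [(20309, "impression")] * 30 + [(20309, "page_view")] * 10
    + [(20771, "impression")] * 32 + [(20771, "page_view")] * 9
)


def order_by_impressions(events, size=10):
    per_entity = defaultdict(Counter)
    for entity_id, event_type in events:
        per_entity[entity_id][event_type] += 1
    # Like "order": {"impression_Events": "desc"}: sort the outer buckets by the
    # per-bucket count of "impression" documents, highest first
    ordered = sorted(per_entity.items(), key=lambda kv: kv[1]["impression"], reverse=True)
    return [(entity_id, dict(counts)) for entity_id, counts in ordered[:size]]


print(order_by_impressions(events))
# [(20771, {'impression': 32, 'page_view': 9}), (20309, {'impression': 30, 'page_view': 10})]
```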

How to use elasticsearch facet query to groupby the result

I have JSON data in the following format:
{
"ID": { "Color": "Black", "Product": "Car" },
"ID": { "Color": "Black", "Product": "Car" },
"ID": { "Color": "Black", "Product": "Van" },
"ID": { "Color": "Black", "Product": "Van" },
"ID": { "Color": "Ash", "Product": "Bike" }
}
I want to calculate the count of each product and its corresponding color. I am using an Elasticsearch facet to do this.
My query:
$http.post('http://localhost:9200/product/productinfoinfo/_search?size=5', { "aggregations": { "ProductInfo": { "terms": { "field": "product" } } }, "facets": { "ProductColor": { "terms": { "field": "Color", "size": 10 } } } })
I am getting output like this:
"facets": { "ProductColor": { "_type": "terms", "missing": 0, "total": 7115, "other": 1448, "terms": [ { "term": "Black", "count": 4 }, { "term": "Ash", "count": 1 } ] } },
"aggregations": { "ProductInfo": { "doc_count_error_upper_bound": 94, "sum_other_doc_count": 11414, "buckets": [ { "key": "Car", "doc_count": 2 }, { "key": "Van", "doc_count": 2 }, { "key": "Bike", "doc_count": 1 } ] } } }
What I actually want is:
[ { "key": "Car", "doc_count": 2, "Color":"Black", "count":2 }, { "key": "Van", "doc_count": 2,"Color":"Black", "count":2 }, { "key": "Bike", "doc_count": 1,"Color":"Ash", "count":1 } ]
I would like to group the results by product. Is it possible to do this in an Elasticsearch query?
Thanks in advance
This is because you're using both aggregations and facets, which, although similar, are not meant to be used together.
Facets are deprecated and will soon be removed from Elasticsearch.
Aggregations are the way to go for "group by"-like queries.
You just have to nest another terms aggregation inside the first one, like this:
{
  "aggs": {
    "By_type": {
      "terms": {
        "field": "Product"
      },
      "aggs": {
        "By_color": {
          "terms": {
            "field": "Color"
          }
        }
      }
    }
  }
}
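The nested aggregation behaves like a two-level group-by. A minimal Python sketch over the sample rows from the question (rewritten as a list, since the original JSON repeats the "ID" key); the helper name is hypothetical, and keys are sorted alphabetically here for readability, whereas a terms aggregation orders buckets by doc_count by default.

```python
from collections import Counter, defaultdict

rows = [
    {"Color": "Black", "Product": "Car"},
    {"Color": "Black", "Product": "Car"},
    {"Color": "Black", "Product": "Van"},
    {"Color": "Black", "Product": "Van"},
    {"Color": "Ash", "Product": "Bike"},
]


def group_by(rows, outer, inner):
    # Outer bucket per `outer` value, inner count per `inner` value
    groups = defaultdict(Counter)
    for r in rows:
        groups[r[outer]][r[inner]] += 1
    return {k: dict(v) for k, v in sorted(groups.items())}


print(group_by(rows, "Product", "Color"))
# {'Bike': {'Ash': 1}, 'Car': {'Black': 2}, 'Van': {'Black': 2}}
```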
And the result will be close to what you want:
"aggregations": {
"By_type": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "bike",
"doc_count": 2,
"By_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "ash",
"doc_count": 1
},
{
"key": "black",
"doc_count": 1
}
]
}
},
{
"key": "car",
"doc_count": 2,
"By_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "black",
"doc_count": 2
}
]
}
},
{
"key": "van",
"doc_count": 1,
"By_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "black",
"doc_count": 1
}
]
}
}
]
}
}
