Incorrect unique values from field in Elasticsearch

I am trying to get unique values from a field in Elasticsearch. To do that, I first ran:
PUT tv-programs/_mapping/text?update_all_types
{
  "properties": {
    "channelName": {
      "type": "text",
      "fielddata": true
    }
  }
}
After that I executed this:
GET _search
{
  "size": 0,
  "aggs": {
    "channels": {
      "terms": {
        "field": "channelName",
        "size": 1000
      }
    }
  }
}
And saw this response:
...
"buckets": [
  {
    "key": "tv",
    "doc_count": 4582
  },
  {
    "key": "baby",
    "doc_count": 2424
  },
  {
    "key": "24",
    "doc_count": 1547
  },
  {
    "key": "channel",
    "doc_count": 1192
  },
...
The problem is that the original entries do not actually contain four different values; the terms have been split up. The correct output should be:
"buckets": [
{
"key": "baby tv",
"doc_count": 4582
}
{
"key": "channel 24",
"doc_count": 1547
},..
Why is that happening? How can I get the correct output?

I've found the solution. I just added .keyword after the field name, so the aggregation runs on the unanalyzed keyword sub-field instead of the analyzed text tokens:
GET _search
{
  "size": 0,
  "aggs": {
    "channels": {
      "terms": {
        "field": "channelName.keyword",
        "size": 1000
      }
    }
  }
}
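For context (this is not in the original answer): a text field is analyzed, so the terms aggregation counts the individual tokens it produces, such as baby and tv, while a keyword sub-field stores the whole untokenized value. This only works if the mapping actually defines that sub-field; dynamic mapping creates one by default for string fields, and an explicit version would look roughly like this sketch, reusing the names from the question:

// Sketch: multi-field mapping with an unanalyzed keyword sub-field
PUT tv-programs/_mapping/text?update_all_types
{
  "properties": {
    "channelName": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}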

Related

How to merge aggregation bucket in Elasticsearch?

Query
GET /_search
{
  "size": 0,
  "query": {
    "ids": {
      "types": [],
      "values": [ "someId1", "someId2", "someId3" ... ]
    }
  },
  "aggregations": {
    "how_to_merge": {
      "terms": {
        "field": "country",
        "size": 50
      }
    }
  }
}
Result
{
  ...
  "aggregations": {
    "how_to_merge": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "KR",
          "doc_count": 90
        },
        {
          "key": "JP",
          "doc_count": 83
        },
        {
          "key": "US",
          "doc_count": 50
        },
        {
          "key": "BE",
          "doc_count": 9
        }
      ]
    }
  }
}
I want to merge "KR", "JP", and "US" and change the key name to "NEW_RESULT", so the result must look like this:
{
  ...
  "aggregations": {
    "how_to_merge": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "NEW_RESULT",
          "doc_count": 223
        },
        {
          "key": "BE",
          "doc_count": 9
        }
      ]
    }
  }
}
Is this possible in an Elasticsearch query?
I cannot use a client-side solution, since there are too many entities and retrieving all of them and merging would probably be too slow for my application.
Thanks for your help and comments!
You can try writing a script for that, though I would recommend benchmarking this approach against client-side processing, since it might be quite slow.
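A minimal sketch of that script approach, assuming country is a keyword field with doc values (the script runs once per document, which is why benchmarking matters):

// Sketch: map KR/JP/US onto one synthetic term inside the terms aggregation
GET /_search
{
  "size": 0,
  "aggs": {
    "how_to_merge": {
      "terms": {
        "script": {
          "lang": "painless",
          "source": "String c = doc['country'].value; return (c == 'KR' || c == 'JP' || c == 'US') ? 'NEW_RESULT' : c;"
        },
        "size": 50
      }
    }
  }
}

The NEW_RESULT bucket would then carry the summed doc_count of the three merged countries (223 in the example above), while other countries such as BE keep their own buckets.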

ElasticSearch Max Agg on lowest value inside a list property of the document

I'm looking to do a Max aggregation on a value of a property under my document; the property is a list of complex objects (key and value). Here's my data:
[
  {
    "id": "1",
    "listItems": [
      {
        "key": "li1",
        "value": 100
      },
      {
        "key": "li2",
        "value": 5000
      }
    ]
  },
  {
    "id": "2",
    "listItems": [
      {
        "key": "li3",
        "value": 200
      },
      {
        "key": "li2",
        "value": 2000
      }
    ]
  }
]
When I do the nested Max aggregation on "listItems.value", I expect the max value returned to be 200 (and not 5000), because I want the logic to first find the MIN value under listItems for each document, then do the Max aggregation on that. Is it possible to do something like this?
Thanks.
The search query performs the following aggregations:
Terms aggregation on the id field
Min aggregation on listItems.value
Max bucket aggregation, a sibling pipeline aggregation that identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of those bucket(s)
Please refer to the nested aggregation documentation for a detailed explanation.
Adding a working example with index data, index mapping, search query, and search result.
Index Mapping:
{
  "mappings": {
    "properties": {
      "listItems": {
        "type": "nested"
      },
      "id": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
Index Data:
{
  "id": "1",
  "listItems": [
    {
      "key": "li1",
      "value": 100
    },
    {
      "key": "li2",
      "value": 5000
    }
  ]
}
{
  "id": "2",
  "listItems": [
    {
      "key": "li3",
      "value": 200
    },
    {
      "key": "li2",
      "value": 2000
    }
  ]
}
Search Query:
{
  "size": 0,
  "aggs": {
    "id_terms": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "nested_entries": {
          "nested": {
            "path": "listItems"
          },
          "aggs": {
            "min_position": {
              "min": {
                "field": "listItems.value"
              }
            }
          }
        }
      }
    },
    "maxValue": {
      "max_bucket": {
        "buckets_path": "id_terms>nested_entries>min_position"
      }
    }
  }
}
Search Result:
"aggregations": {
"id_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 100.0
}
}
},
{
"key": "2",
"doc_count": 1,
"nested_entries": {
"doc_count": 2,
"min_position": {
"value": 200.0
}
}
}
]
},
"maxValue": {
"value": 200.0,
"keys": [
"2"
]
}
}
The initial post mentioned nested aggregation, so I was sure the question was about nested documents. Since I came up with this solution before seeing the other answer, I'm keeping the whole thing for history, but it actually differs only in the added nested aggregation.
The whole process can be explained like this:
Bucket each document into its own bucket.
Use a nested aggregation to be able to aggregate on nested documents.
Use a min aggregation to find the minimum value among each document's nested documents, and thereby for the document itself.
Finally, use another aggregation to calculate the maximum value among the results of the previous aggregation.
Given this setup:
// PUT /index
{
  "mappings": {
    "properties": {
      "children": {
        "type": "nested",
        "properties": {
          "value": {
            "type": "integer"
          }
        }
      }
    }
  }
}

// POST /index/_doc
{
  "children": [
    { "value": 12 },
    { "value": 45 }
  ]
}

// POST /index/_doc
{
  "children": [
    { "value": 7 },
    { "value": 35 }
  ]
}
I can use these aggregations in the request to get the required value:
{
  "size": 0,
  "aggs": {
    "document": {
      "terms": { "field": "_id" },
      "aggs": {
        "children": {
          "nested": {
            "path": "children"
          },
          "aggs": {
            "minimum": {
              "min": {
                "field": "children.value"
              }
            }
          }
        }
      }
    },
    "result": {
      "max_bucket": {
        "buckets_path": "document>children>minimum"
      }
    }
  }
}
{
  "aggregations": {
    "document": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "O4QxyHQBK5VO9CW5xJGl",
          "doc_count": 1,
          "children": {
            "doc_count": 2,
            "minimum": {
              "value": 7.0
            }
          }
        },
        {
          "key": "OoQxyHQBK5VO9CW5kpEc",
          "doc_count": 1,
          "children": {
            "doc_count": 2,
            "minimum": {
              "value": 12.0
            }
          }
        }
      ]
    },
    "result": {
      "value": 12.0,
      "keys": [
        "OoQxyHQBK5VO9CW5kpEc"
      ]
    }
  }
}
There should also be a workaround using a script to calculate the max: all such a script would need to do is find and return the smallest value in each document.

How to make Elasticsearch return aggregations including the name in the response

I'm performing some aggregations (by_shop and by_category) over a data set using Elasticsearch. The problem is that the response doesn't specify the name of each agg, which makes it difficult to parse.
Query
"aggregations" : {
"byShop" : {
"terms" : {
"field" : "shopName",
"size" : 0
}
},
"byCategory" : {
"terms" : {
"field" : "category",
"size" : 0
}
}
}
Response
"aggs": [
[
{
"name": "bucket",
"count": 5075,
"key": "shop1"
},
{
"name": "bucket",
"count": 1,
"key": "shop2"
}
],
[
{
"name": "bucket",
"count": 11,
"key": "Jewelry & Watches"
},
{
"name": "bucket",
"count": 1,
"key": "Home & Garden/Home Décor"
}
]
Ideally, I would like to see the following:
"aggregations": {
"byShop": {
"buckets": [
{
"count": 5075,
"key": "shop1"
},
{
"count": 1,
"key": "shop2"
}
]
},
"byCategory": {
"buckets": [
{
"count": 11,
"key": "Jewelry & Watches"
},
{
"count": 11,
"key": "Home & Garden/Home Décor"
}
]
}
}
EDIT
productResponse.getAggs().add(searchResult.getAggregations().getTermsAggregation("ByCategory").getBuckets());
productResponse.getAggs().add(searchResult.getAggregations().getTermsAggregation("ByShopname").getBuckets());
where searchResult holds the response from Elasticsearch. It seems that getBuckets() trims the names of the aggs, right?
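One way to keep the names with the Jest-style calls shown above, sketched under the assumption that getTermsAggregation(...).getBuckets() returns a List of TermsAggregation.Entry, is to key the bucket lists by the aggregation name instead of appending them to a flat list:

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import io.searchbox.core.search.aggregation.TermsAggregation;

// Sketch (not from the original post): preserve aggregation names by using
// them as map keys, so the response can be parsed per aggregation.
Map<String, List<TermsAggregation.Entry>> aggsByName = new LinkedHashMap<>();
aggsByName.put("byCategory",
        searchResult.getAggregations().getTermsAggregation("byCategory").getBuckets());
aggsByName.put("byShop",
        searchResult.getAggregations().getTermsAggregation("byShop").getBuckets());

Note also that aggregation names are case-sensitive and must match the query exactly: the query above declares byShop and byCategory, while the EDIT code asks for ByShopname and ByCategory.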

List of all users who have more than 40 documents each in ElasticSearch

I want to query the list of all users who have more than 40 documents each. I've created this aggregation:
*"aggs" : {
"user-ids" : {
"terms" : {
"field" : "user-id",
"size": 0
}
}
}*
which returns all my users in the response:
{
  "key": 683696,
  "doc_count": 4086
},
{
  "key": 678776,
  "doc_count": 3625
},
{
  "key": 683191,
  "doc_count": 3304
},
{
  "key": 684065,
  "doc_count": 3287
},
...
I want to keep only buckets with a "doc_count" greater than 40. Is that possible?
Yes, you can achieve this with the min_doc_count setting. Try this:
{
  "aggs": {
    "user-ids": {
      "terms": {
        "field": "user-id",
        "min_doc_count": 40   <--- use this setting
      }
    }
  }
}
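Note that min_doc_count keeps buckets whose doc_count is greater than or equal to the threshold, so for strictly more than 40 documents per user you would set it to 41.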

Elasticsearch sub-aggregation excluding key from parent

I am currently doing an aggregation to get the top 20 terms in a given field and the top 5 co-occurring terms.
{
  "aggs": {
    "descTerms": {
      "terms": {
        "field": "Desc as Marketed",
        "exclude": "[a-z]{1}|and|the|with",
        "size": 20
      },
      "aggs": {
        "innerTerms": {
          "terms": {
            "field": "Desc as Marketed",
            "size": 5
          }
        }
      }
    }
  }
}
Which results in something like this:
"key": "bluetooth",
"doc_count": 11172,
"innerTerms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33700,
"buckets": [
{
"key": "bluetooth",
"doc_count": 11172
},
{
"key": "with",
"doc_count": 3827
}
I would like to exclude the parent key in the sub-aggregation, as it (obviously) always comes back as the top result; I just can't figure out how to do so. In other words, I want the previous result to look like this:
"key": "bluetooth",
"doc_count": 11172,
"innerTerms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33700,
"buckets": [
{
"key": "with",
"doc_count": 3827
}
