List of all users who have more than 40 documents each in ElasticSearch

I want to query the list of all users who have more than 40 documents each.
I've created this aggregation:
"aggs" : {
"user-ids" : {
"terms" : {
"field" : "user-id",
"size": 0
}
}
}
which returns all my users in the response:
{
"key": 683696,
"doc_count": 4086
},
{
"key": 678776,
"doc_count": 3625
},
{
"key": 683191,
"doc_count": 3304
},
{
"key": 684065,
"doc_count": 3287
},
...
I want to keep only the buckets with a "doc_count" greater than 40. Is that possible?

Yes, you can achieve this with the min_doc_count setting. It is an inclusive lower bound, so use 41 to keep only the buckets with more than 40 documents:
{
"aggs" : {
"user-ids" : {
"terms" : {
"field" : "user-id",
"min_doc_count": 41
}
}
}
}
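Since min_doc_count keeps buckets whose doc_count is greater than or equal to the given value, a "more than N" requirement translates to N + 1. A minimal sketch of building the request body in Python follows; the helper name users_with_more_than is hypothetical, and the body still has to be sent with whatever client you use.

```python
def users_with_more_than(threshold, field="user-id"):
    """Build a terms aggregation keeping buckets with doc_count > threshold.

    Hypothetical helper for illustration; it only builds the request body.
    """
    return {
        "size": 0,  # no search hits needed, only the aggregation
        "aggs": {
            "user-ids": {
                "terms": {
                    "field": field,
                    # min_doc_count is inclusive, so "> threshold"
                    # becomes "threshold + 1"
                    "min_doc_count": threshold + 1,
                }
            }
        },
    }


body = users_with_more_than(40)
```

Note that the terms aggregation still returns only its default number of buckets unless a "size" is set on it as well.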

Related

elasticsearch filters aggregation does not return array format

The filters aggregation returns its buckets as an object:
"buckets": {
"errors": {
"doc_count": 1
},
"warnings": {
"doc_count": 2
}
}
But I would like it to return a buckets array, like the terms aggregation does:
"buckets": [
{
"key": "errors",
"doc_count": 1
},
{
"key": "warnings",
"doc_count": 2
}
]
Is this possible, or can some sort of data transformation be done in the query to make it so?
You can do it by providing the filters as an array instead of an object, but in this case your buckets will be anonymous:
GET logs/_search
{
"size": 0,
"aggs" : {
"messages" : {
"filters" : {
"filters" : [
{ "match" : { "body" : "error" }},
{ "match" : { "body" : "warning" }}
]
}
}
}
}
The response will then contain an array of buckets, in the same order as the filters:
"buckets": [
{
"doc_count": 1
},
{
"doc_count": 2
}
]
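If the terms-style shape (an array whose entries carry a "key") is what you need, it can also be rebuilt on the client from either response form. The sketch below is only an illustration; the function names buckets_as_list and label_anonymous are hypothetical.

```python
def buckets_as_list(keyed_buckets):
    """Turn {"errors": {...}, ...} into [{"key": "errors", ...}, ...]."""
    return [{"key": name, **bucket} for name, bucket in keyed_buckets.items()]


def label_anonymous(names, anonymous_buckets):
    """Attach names to anonymous buckets, relying on them being returned
    in the same order as the filters were given in the request."""
    return [{"key": name, **bucket} for name, bucket in zip(names, anonymous_buckets)]


keyed = {"errors": {"doc_count": 1}, "warnings": {"doc_count": 2}}
from_keyed = buckets_as_list(keyed)

anonymous = [{"doc_count": 1}, {"doc_count": 2}]
from_anonymous = label_anonymous(["errors", "warnings"], anonymous)
```

Both calls produce the array form asked for in the question.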

How to merge aggregation bucket in Elasticsearch?

Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"terms" : {
"field" : "country",
"size" : 50
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "KR",
"doc_count": 90
},
{
"key": "JP",
"doc_count": 83
},
{
"key": "US",
"doc_count": 50
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
I want to merge "KR", "JP" and "US" into a single bucket
and change its key name to "NEW_RESULT".
So the result should look like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NEW_RESULT",
"doc_count": 223
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
Is this possible in an Elasticsearch query?
I cannot use a client-side solution since there are too many entities and retrieving all of them and merging would be probably too slow for my application.
Thanks for your help and comments!
You can try writing a script for that, though I would recommend benchmarking this approach against client-side processing, since it might be quite slow.
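For the client-side side of that benchmark, note that only the aggregation buckets need to be merged, not the underlying documents, so the work is proportional to the number of distinct countries. A minimal sketch, assuming the buckets have already been parsed from the response (merge_buckets is a hypothetical helper name):

```python
def merge_buckets(buckets, keys_to_merge, new_key):
    """Collapse the buckets whose key is in keys_to_merge into one bucket."""
    merged_count = 0
    kept = []
    for bucket in buckets:
        if bucket["key"] in keys_to_merge:
            merged_count += bucket["doc_count"]
        else:
            kept.append(bucket)
    merged = [{"key": new_key, "doc_count": merged_count}] if merged_count else []
    # Keep the usual terms ordering: largest doc_count first.
    return sorted(merged + kept, key=lambda b: b["doc_count"], reverse=True)


buckets = [
    {"key": "KR", "doc_count": 90},
    {"key": "JP", "doc_count": 83},
    {"key": "US", "doc_count": 50},
    {"key": "BE", "doc_count": 9},
]
result = merge_buckets(buckets, {"KR", "JP", "US"}, "NEW_RESULT")
```

With the sample response from the question this yields a "NEW_RESULT" bucket with doc_count 223, followed by "BE" with 9.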

Incorrect unique values from field in elasticsearch

I am trying to get the unique values of a field in Elasticsearch. To do that, I first updated the mapping:
PUT tv-programs/_mapping/text?update_all_types
{
"properties": {
"channelName": {
"type": "text",
"fielddata": true
}
}
}
After that I executed this:
GET _search
{
"size": 0,
"aggs" : {
"channels" : {
"terms" : { "field" : "channelName" ,
"size": 1000
}
}
}}
And got the following response:
...
"buckets": [
{
"key": "tv",
"doc_count": 4582
},
{
"key": "baby",
"doc_count": 2424
},
{
"key": "24",
"doc_count": 1547
},
{
"key": "channel",
"doc_count": 1192
},..
The problem is that the original entries do not contain 4 different values. The correct output should be:
"buckets": [
{
"key": "baby tv",
"doc_count": 4582
},
{
"key": "channel 24",
"doc_count": 1547
},..
Why is that happening? How can I get the correct output?
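I've found the solution.
A text field is analyzed into separate terms, so the aggregation was counting individual words. I just added .keyword after the field name, which aggregates on the whole, unanalyzed value (this relies on the mapping having a channelName.keyword sub-field):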
GET _search
{
"size": 0,
"aggs" : {
"channels" : {
"terms" : { "field" : "channelName.keyword" ,
"size": 1000
}
}
}}
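To see why the two fields give different buckets, here is a rough simulation in plain Python. It is not the Elasticsearch analyzer: lowercasing and splitting on whitespace is only a stand-in for it, and the sample values are made up.

```python
from collections import Counter

# Made-up sample values for the channelName field.
channel_names = ["Baby TV", "Baby TV", "Channel 24"]

# Analyzed text field: each document contributes one entry per distinct word,
# so the buckets are words rather than channel names.
token_counts = Counter(
    token
    for name in channel_names
    for token in set(name.lower().split())
)

# keyword sub-field: the whole value is a single term.
keyword_counts = Counter(channel_names)
```

Here token_counts has the keys "baby", "tv", "channel" and "24", while keyword_counts has one key per channel name.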

Elasticsearch sub-aggregation excluding key from parent

I am currently doing an aggregation to get the top 20 terms in a given field and the top 5 co-occurring terms.
{
"aggs": {
"descTerms" : {
"terms" : {
"field" : "Desc as Marketed",
"exclude": "[a-z]{1}|and|the|with",
"size" : 20
},
"aggs" : {
"innerTerms" : {
"terms" : {
"field" : "Desc as Marketed",
"size" : 5
}
}
}
}
}
}
Which results in something like this:
"key": "bluetooth",
"doc_count": 11172,
"innerTerms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33700,
"buckets": [
{
"key": "bluetooth",
"doc_count": 11172
},
{
"key": "with",
"doc_count": 3827
}
I would like to exclude the parent key from the sub-aggregation, since it (obviously) always comes back as the top result, but I can't figure out how to do so.
In other words, I want the previous result to look like this:
"key": "bluetooth",
"doc_count": 11172,
"innerTerms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 33700,
"buckets": [
{
"key": "with",
"doc_count": 3827
}
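No answer is recorded for this question here. One possible workaround, offered only as a sketch, is to drop the parent key on the client after the response arrives. This assumes the inner "size" is raised by one (6 instead of 5) so that five co-occurring terms remain afterwards; drop_parent_key is a hypothetical helper name.

```python
def drop_parent_key(outer_buckets, inner_name="innerTerms", keep=5):
    """Remove from each inner bucket list the entry that repeats the parent key."""
    cleaned = []
    for outer in outer_buckets:
        inner = [
            bucket
            for bucket in outer[inner_name]["buckets"]
            if bucket["key"] != outer["key"]
        ]
        # Copy the outer bucket, replacing only the inner bucket list.
        cleaned.append(
            {**outer, inner_name: {**outer[inner_name], "buckets": inner[:keep]}}
        )
    return cleaned


outer_buckets = [
    {
        "key": "bluetooth",
        "doc_count": 11172,
        "innerTerms": {
            "buckets": [
                {"key": "bluetooth", "doc_count": 11172},
                {"key": "with", "doc_count": 3827},
            ]
        },
    }
]
cleaned = drop_parent_key(outer_buckets)
```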

ElasticSearch: min_doc_count on lower/lowest level nested aggregation

I have this query with some nested aggregations:
{
"aggs": {
"by_date": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
},
"aggs": {
"new_users": {
"filter": {
"query": {
"match": {
"action": "USER_ADD"
}
}
},
"aggs": {
"unique_users": {
"cardinality": {
"field": "user"
}
}
}
}
}
}
},
"size": 0
}
It yields results that look like this:
"aggregations": {
"by_date": {
"buckets": [
{
"key_as_string": "1970-01-07T00:00:00.000Z",
"key": 518400000,
"doc_count": 210,
"new_users": {
"doc_count": 0,
"unique_users": {
"value": 0
}
}
},
{
"key_as_string": "1970-01-09T00:00:00.000Z",
"key": 691200000,
"doc_count": 6,
"new_users": {
"doc_count": 0,
"unique_users": {
"value": 0
}
}
},
......
What I want is to apply min_doc_count to the innermost sub-aggregation, so that zero values for "unique_users" (in this case) are not returned.
The issue is that min_doc_count can't be applied anywhere in my query other than the date_histogram at the top level.
Does the ES query language support something like this? Are there any known workarounds?
Thanks,
George
As per the Elasticsearch documentation, min_doc_count is not specific to date_histogram; other bucket aggregations such as terms and histogram accept it as well.
For example:
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tag"
}
}
}
}
The query above is a terms aggregation, not a date_histogram, yet min_doc_count can still be applied to it:
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tag",
"min_doc_count" : 1
}
}
}
}
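Keep in mind, though, that min_doc_count filters on a bucket's doc_count and is not accepted by the filter or cardinality aggregations, so it cannot hide the zero "unique_users" values directly.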
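Two ways to hide the days with zero unique users are sketched below, both as assumptions rather than something taken from the answer above. The first is a bucket_selector pipeline aggregation placed next to "new_users" inside "by_date"; the name "non_zero_users" is made up, and the script is written in the Painless form, which older versions spell differently. The second simply prunes the buckets on the client (prune_zero_buckets is a hypothetical helper).

```python
# Hypothetical fragment to add under "by_date" -> "aggs", next to "new_users".
# It keeps only the date buckets whose nested cardinality is above zero.
drop_empty_days = {
    "non_zero_users": {
        "bucket_selector": {
            "buckets_path": {"users": "new_users>unique_users"},
            "script": "params.users > 0",
        }
    }
}


def prune_zero_buckets(buckets, path=("new_users", "unique_users", "value")):
    """Client-side alternative: keep buckets whose nested value is above zero."""
    kept = []
    for bucket in buckets:
        value = bucket
        for key in path:
            value = value[key]  # walk down to the nested metric value
        if value > 0:
            kept.append(bucket)
    return kept


sample = [
    {"key": 518400000, "doc_count": 210,
     "new_users": {"doc_count": 0, "unique_users": {"value": 0}}},
    {"key": 691200000, "doc_count": 6,
     "new_users": {"doc_count": 4, "unique_users": {"value": 3}}},
]
pruned = prune_zero_buckets(sample)
```

The second bucket in the sample uses invented non-zero numbers purely to show one bucket surviving the pruning.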
