Count of all the results from terms aggregation bucket - elasticsearch

GET /test*/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"test": {
"terms": {
"field": "name.keyword",
"min_doc_count": 10
}
}
}
}
the above query will return all the unique names which occurs more than 10 times.
I want the query to return the count of those unique names with occurrance more than 10 times.
I could not find a way to do find out the count. Can anybody help with it.
Thanks.

This will do the trick.
"count" will give you the count of unique names with occurrences more than 10 times.
GET /test*/_search?filter_path=aggregations.count
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"test": {
"terms": {
"field": "name.keyword",
"min_doc_count": 10
},
"aggs": {
"test2": {
"value_count": {
"field": "name.keyword"
}
},
"test3":{
"bucket_script": {
"buckets_path": "test2",
"script": "return 1"
}
}
}
},
"count":{
"sum_bucket": {
"buckets_path": "test>test3"
}
}
}
}
Let me know if this solves your problem.

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I want to make a suggestion to use bucket_path aggregation. As I know this aggregation needs to be run in sub-aggs of a histogram aggregation. As you have such field in your mapping so I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it didn't work well for you.

Elasticsearch : How get result buckets size

Here is my query result
GET _search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"serviceName.keyword": "directory-view-service"
}
},
{
"match": {
"path": "thewall"
}
},
{
"range": {
"#timestamp": {
"from": "now-31d",
"to": "now"
}
}
}
]
}
},
"aggs": {
"by_day": {
"date_histogram": {
"field": "date",
"interval": "7d"
},
"aggs": {
"byUserUid": {
"terms": {
"field": "token_userId.keyword",
"size": 150000
},
"aggs": {
"filterByCallNumber": {
"bucket_selector": {
"buckets_path": {
"doc_count": "_count"
},
"script": {
"inline": "params.doc_count <= 1"
}
}
}
}
}
}
}
}
}
I want my query return all user call my endpoint min. once time by 1 month range by 7 days interval, until then everything is good.
But my result is a buckets with 370 elements and I just need to know the array size...
Are there any keyword or how can I handle it ?
Thanks

How to display only the key from the bucket

I have an index with millions of documents. Suppose each of my documents has some code, and I need to find the list of codes matching some criteria. The only way I found doing that, is using whole lot of aggregations, so I created an ugly query which does exactly what I want:
POST my-index/_search
{
"query": {
"range": {
"timestamp": {
"gte": "2017-08-01T00:00:00.000",
"lt": "2017-08-08T00:00:00.000"
}
}
},
"size": 0,
"aggs": {
"codes": {
"terms": {
"field": "code",
"size": 10000
},
"aggs": {
"days": {
"date_histogram": {
"field": "timestamp",
"interval": "day",
"format": "dd"
},
"aggs": {
"hours": {
"date_histogram": {
"field": "timestamp",
"interval": "hour",
"format": "yyyy-MM-dd:HH"
},
"aggs": {
"hour_income": {
"sum": {
"field": "price"
}
}
}
},
"max_income": {
"max_bucket": {
"buckets_path": "hours>hour_income"
}
},
"day_income": {
"sum_bucket": {
"buckets_path": "hours.hour_income"
}
},
"more_than_sixty_percent": {
"bucket_script": {
"buckets_path": {
"dayIncome": "day_income",
"maxIncome": "max_income"
},
"script": "params.maxIncome - params.dayIncome * 60 / 100 > 0 ? 1 : 0"
}
}
}
},
"amount_of_days": {
"sum_bucket": {
"buckets_path": "days.more_than_sixty_percent"
}
},
"bucket_filter": {
"bucket_selector": {
"buckets_path": {
"amountOfDays": "amount_of_days"
},
"script": "params.amountOfDays >= 3"
}
}
}
}
}
}
The response I get is a few millions lines of JSON, consisting of buckets. Each bucket has more than 700 lines (and buckets of its own), but all I need is its key, so that I have my list of codes. I guess it's not good having a response a few thousand times larger than neccessary, and there might be problems with parsing. So I wanted to ask, is there any way to hide the other info in the bucket and get only the keys?
Thanks.

ElasticSearch Query to find intersection of two queries

Records exist in this format: {user_id, state}.
I need to write an elasticsearch query to find all user_id's that have both states present in the records list.
For example, if sample records stored are:
{1,a}
{1,b}
{2,a}
{2,b}
{1,a}
{3,b}
{3,b}
The output from running the query for this example would be
{"1", "2"}
I've tried this so far:
{
"size": 0,
"query": {
"bool": {
"filter": {
"terms": {
"state": [
"a",
"b"
]
}
}
}
},
"aggs": {
"user_id_intersection": {
"terms": {
"field": "user_id",
"min_doc_count": 2,
"size": 100
}
}
}
}
but this will return
{"1", "2", "3"}
Assuming you know the cardinality of the states set, here 2, you can use the
Bucket Selector Aggregation
GET test/_search
{
"size": 0,
"aggs": {
"user_ids": {
"terms": {
"field": "user_id"
},
"aggs": {
"states_card": {
"cardinality": {
"field": "state"
}
},
"state_filter": {
"bucket_selector": {
"buckets_path": {
"states_card": "states_card"
},
"script": "params.states_card == 2"
}
}
}
}
}
}

For each country/colour/brand combination , find sum of number of items in elasticsearch

This is a portion of the data I have indexed in elasticsearch:
{
"country" : "India",
"colour" : "white",
"brand" : "sony"
"numberOfItems" : 3
}
I want to get the total sum of numberOfItems on a per country basis, per colour basis and per brand basis. Is there any way to do this in elasticsearch?
The following should land you straight to the answer.
Make sure you enable scripting before using it.
{
"aggs": {
"keys": {
"terms": {
"script": "doc['country'].value + doc['color'].value + doc['brand'].value"
},
"aggs": {
"keySum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
To get a single result you may use sum aggregation applied to a filtered query with term (terms) filter, e.g.:
{
"query": {
"filtered": {
"filter": {
"term": {
"country": "India"
}
}
}
},
"aggs": {
"total_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
To get statistics for all countries/colours/brands in a single pass over the data you may use the following query with 3 multi-bucket aggregations, each of them containing a single-bucket sum sub-aggregation:
{
"query": {
"match_all": {}
},
"aggs": {
"countries": {
"terms": {
"field": "country"
},
"aggs": {
"country_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"colours": {
"terms": {
"field": "colour"
},
"aggs": {
"colour_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"brands": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}

Resources