Elasticsearch - get terms aggregation for specified fields

I am using a terms aggregation to get the number of users in each city:
{
"aggs" : {
"cities" : {
"terms" : { "field" : "city.name" }
}
}
}
This returns results, but I always want certain specific cities to appear in the aggregation results, regardless of whether they are in the top 10. Do I need a separate filter aggregation for each city to get its count?

You have three solutions:
A. You can specify a filter in the query:
{
"query": {
"terms": {
"city.name": [ "city1", "city2", "city3" ]
}
},
"aggs": {
"cities": {
"terms": {
"field": "city.name"
}
}
}
}
B. You can specify a filter in the aggregations:
{
"aggs": {
"city_filter": {
"filter": {
"terms": {
"city.name": [
"city1",
"city2",
"city3"
]
}
},
"aggs": {
"cities": {
"terms": {
"field": "city.name"
}
}
}
}
}
}
C. You can filter the values inside the terms aggregation itself. include and exclude take a regular expression (or an array of exact values):
{
"aggs": {
"cities": {
"terms": {
"field": "city.name",
"include": "city1.*",
"exclude": "city2.*"
}
}
}
}
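If the goal is a fixed set of exact city names rather than a pattern, option C can also be written with an array of exact values in include. The following is a minimal Python sketch that only builds the request body; the helper name build_city_terms_agg is hypothetical, and the city names are the placeholders used above.

```python
import json


def build_city_terms_agg(cities, field="city.name"):
    """Build a search body whose terms aggregation only returns the given cities."""
    cities = list(cities)
    return {
        "size": 0,  # skip the hits; only the aggregation is needed
        "aggs": {
            "cities": {
                "terms": {
                    "field": field,
                    "include": cities,     # exact values, not a regex
                    "size": len(cities),   # one bucket per requested city
                }
            }
        },
    }


body = build_city_terms_agg(["city1", "city2", "city3"])
print(json.dumps(body, indent=2))
```

Setting size to the number of requested cities means none of them can be cut off by the default top-10 limit.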

Related

Why does pipeline aggs query fail if it includes filter aggs?

I am using Elasticsearch as a database.
I am going to use aggregation.
POST new_logs/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"base.logClass.keyword": "Access"
}
}
]
}
},
"size": 0,
"aggs": {
"Rule1": {
"terms": { "field": "source.srcIp" },
"aggs": {
"MinTime": {
"min": { "field": "base.receiveTime" }
},
"MaxTime": {
"max": { "field": "base.receiveTime" }
}
}
},
"Rule2": {
"filter": { "range": { "base.receiveTime": { "gte": "2022-06-22 11:27:00", "lte": "2022-06-22 11:29:00" } }
},
"aggs": {
"SubFilter": {
"filter": { "term": { "base.subLogClass.keyword": "Login" }
},
"aggs": {
"SourceIP": {
"terms": { "field": "source.srcIp" },
"aggs": {
"DestinationIP": { "terms": { "field": "destination.dstIp" }
}
}
},
"MinTime": {
"min": { "field": "base.receiveTime" }
},
"MaxTime": {
"max": { "field": "base.receiveTime" }
}
}
}
}
},
"Logic1": {
"max_bucket": {
"buckets_path": "Rule1>MinTime"
}
},
"Logic2": {
"min_bucket": {
"buckets_path": "Rule2>SubFilter>MinTime"
}
}
}
}
As you can see in the query, there are two aggregations, Rule1 and Rule2.
Rule2 uses a filter aggregation and Rule1 does not.
When I add pipeline aggregations, Logic1 works but Logic2 fails.
This is the error message.
{
"error" : {
"root_cause" : [
{
"type" : "action_request_validation_exception",
"reason" : "Validation Failed: 1: The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [Logic2] found :org.elasticsearch.search.aggregations.bucket.filter.FilterAggregationBuilder for buckets path: Rule2>SubFilter>MinTime;"
}
],
"type" : "action_request_validation_exception",
"reason" : "Validation Failed: 1: The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [Logic2] found :org.elasticsearch.search.aggregations.bucket.filter.FilterAggregationBuilder for buckets path: Rule2>SubFilter>MinTime;"
},
"status" : 400
}
I'm not sure what went wrong.
Is it not possible to use pipeline aggregations together with a filter aggregation?
I would appreciate help from people who have a lot of experience with Elasticsearch.
Thank you for your help.
The filter aggregation is a single-bucket aggregation.
min_bucket complains because it needs a multi-bucket aggregation as the first element of its buckets_path.
You might be able to use the filters aggregation, which is a multi-bucket filter, or nest the filter aggregations under Rule1, since you are already running that terms aggregation and could filter a subset of it.
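One way the first suggestion could look, sketched in Python as a request body: Rule2 becomes a filters aggregation with a single named filter, so the buckets_path of Logic2 starts at a multi-bucket aggregation. The bucket key in_window is a made-up name, the query part and Rule1 are omitted for brevity, and this sketch has not been run against a cluster.

```python
import json

# Rule2 rewritten as a multi-bucket "filters" aggregation.
# "in_window" is a hypothetical bucket key chosen for this sketch.
rule2 = {
    "filters": {
        "filters": {
            "in_window": {
                "range": {
                    "base.receiveTime": {
                        "gte": "2022-06-22 11:27:00",
                        "lte": "2022-06-22 11:29:00",
                    }
                }
            }
        }
    },
    "aggs": {
        "SubFilter": {
            "filter": {"term": {"base.subLogClass.keyword": "Login"}},
            "aggs": {"MinTime": {"min": {"field": "base.receiveTime"}}},
        }
    },
}

body = {
    "size": 0,
    "aggs": {
        "Rule2": rule2,
        # The path now starts at a multi-bucket aggregation (Rule2).
        "Logic2": {"min_bucket": {"buckets_path": "Rule2>SubFilter>MinTime"}},
    },
}

print(json.dumps(body, indent=2))
```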

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I suggest using the bucket_script pipeline aggregation. As far as I know, it has to run as a sub-aggregation of a multi-bucket aggregation such as a date histogram. Since you have such a date field in your mapping, I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it didn't work well for you.
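The arithmetic performed by the two bucket_script scripts can be checked outside Elasticsearch. This is a small Python sketch of the same formula; the function name role_ratios and the counts passed to it are made up for illustration.

```python
def role_ratios(manager_count, lead_count):
    """Return (manager %, lead %) using the same formula as the scripts above."""
    total = manager_count + lead_count
    if total == 0:
        return None  # no documents in the bucket, so no ratio can be computed
    manager_ratio = manager_count / total * 100
    lead_ratio = lead_count / total * 100
    return manager_ratio, lead_ratio


# Hypothetical counts for one histogram bucket: 3 managers and 1 lead.
print(role_ratios(3, 1))  # (75.0, 25.0)
```

Note that this reports the ratio found in each bucket; it does not force the search hits themselves into an 80/20 split.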

Elasticsearch: Aggregation on filtered nested objects to find unique values

I have an array of objects (tags) in each document in Elasticsearch 5:
{
"tags": [
{ "key": "tag1", "value": "val1" },
{ "key": "tag2", "value": "val2" },
...
]
}
Now I want to find the unique tag values for a certain tag key, something similar to this SQL query:
SELECT DISTINCT(tags.value) FROM tags WHERE tags.key='some-key'
I have come up with this DSL so far:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"filter" : { "terms": { "tags.key": "tag1" } },
"aggs": {
"my_tags_values": {
"terms" : {
"field" : "tags.value",
"size": 9999
}
}
}
}
}
}
}
But it shows me this error:
[terms] unknown field [tags.key], parser not found.
Is this the right approach to solve the problem? Thanks for your help.
Note: I have declared the tags field as a nested field in my mapping.
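You mixed things up there. You probably wanted to add a filter aggregation, but you didn't give it a name, so filter was parsed as the aggregation name and terms as its type. The terms query also expects an array of values: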
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"terms": {
"tags.key": [
"tag1"
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 9999
}
}
}
}
}
}
}
}
Alternatively, use a bool query inside a named filter aggregation:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"bool": {
"must": [
{
"term": {
"tags.key": "tag1"
}
}
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 9999
}
}
}
}
}
}
}
}
BTW: on older versions you could write 0 instead of 9999 as the aggregation size to retrieve all buckets, but that shortcut was removed in Elasticsearch 5.0, so on version 5 set an explicit size that is large enough.
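To make the intended result concrete, here is a plain-Python sketch of what the nested + filter + terms combination computes: the distinct values of tags whose key matches, with key and value taken from the same tag object. The function name distinct_tag_values and the sample documents are made up for illustration.

```python
def distinct_tag_values(docs, key):
    """Collect the unique tag values for one tag key, like SELECT DISTINCT."""
    values = set()
    for doc in docs:
        for tag in doc.get("tags", []):
            # Key and value come from the same nested object,
            # which is what the nested mapping guarantees.
            if tag["key"] == key:
                values.add(tag["value"])
    return sorted(values)


# Hypothetical documents shaped like the one in the question.
docs = [
    {"tags": [{"key": "tag1", "value": "val1"}, {"key": "tag2", "value": "val2"}]},
    {"tags": [{"key": "tag1", "value": "val3"}, {"key": "tag2", "value": "val1"}]},
    {"tags": [{"key": "tag1", "value": "val1"}]},
]

print(distinct_tag_values(docs, "tag1"))  # ['val1', 'val3']
```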

How to aggregate data by a field in Elasticsearch?

I'm using Kibana 4.4.1, and in Elasticsearch I store the status of each PC only when that status changes (open, closed, warnings, etc.).
My data in Elasticsearch looks like:
{ "status_id":1 , "pc":"lpt001" , "date":"2016-10-25T17:49:00Z" }
{ "status_id":3 , "pc":"lpt001" , "date":"2016-10-25T15:48:00Z" }
{ "status_id":4 , "pc":"lpt002" , "date":"2016-10-25T15:46:00Z" }
{ "status_id":1 , "pc":"lpt002" , "date":"2016-10-25T12:48:00Z" }
I want to get the newest record for each PC, so that at any time I know how many PCs are open, closed, or have issues.
My query is:
GET cb-2016.10.26/_search
{
"query": {
"match_all": { }
},
"sort": [
{
"date": {
"order": "desc"
}
}
],
"aggs": {
"max_date":{
"max": {
"field": "date"
}
}
}
}
And the result is:
"aggregations": {
"max_date": {
"value": 1477417680000,
"value_as_string": "2016-10-25T17:48:00.000Z"
}
}
But what I want is that max_date for each pc (lpt001, lpt002).
Is there any way to split max_date by the pc field? I read something about bucket aggregations but could not get the result.
Thank you.
Yes. Use a terms aggregation on the pc field and move max_date into a sub-aggregation of it:
POST cb-2016.10.26/_search
{
"query": {
"match_all": { }
},
"sort": [
{
"date": {
"order": "desc"
}
}
],
"aggs": {
"pcs": {
"terms": {
"field": "pc"
},
"aggs": {
"max_date":{
"max": {
"field": "date"
}
}
}
}
}
}
The final query, which uses a top_hits sub-aggregation to return the newest document for each PC, looks like:
{
"query": {
"match_all": { }
},
"aggs" : {
"pcstatus" : {
"terms" : {
"field" : "pc"
},
"aggs": {
"top_date_hit": {
"top_hits": {
"sort": [
{
"date": {
"order": "desc"
}
}
],
"size" : 1
}
}
}
}
}
}
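The effect of terms on pc plus top_hits sorted by date (size 1) can be illustrated in plain Python with the four sample records from the question. This is only a sketch of the logic, not of how Elasticsearch executes it; the function name latest_per_pc is made up.

```python
from collections import Counter

# The four sample records from the question.
records = [
    {"status_id": 1, "pc": "lpt001", "date": "2016-10-25T17:49:00Z"},
    {"status_id": 3, "pc": "lpt001", "date": "2016-10-25T15:48:00Z"},
    {"status_id": 4, "pc": "lpt002", "date": "2016-10-25T15:46:00Z"},
    {"status_id": 1, "pc": "lpt002", "date": "2016-10-25T12:48:00Z"},
]


def latest_per_pc(records):
    """Keep only the newest record for each pc (one 'top hit' per bucket)."""
    latest = {}
    for rec in records:
        current = latest.get(rec["pc"])
        # ISO 8601 timestamps in the same format and zone sort as plain strings.
        if current is None or rec["date"] > current["date"]:
            latest[rec["pc"]] = rec
    return latest


latest = latest_per_pc(records)
# How many PCs are currently in each status.
status_counts = Counter(rec["status_id"] for rec in latest.values())

print(latest["lpt001"]["status_id"], latest["lpt002"]["status_id"])  # 1 4
print(dict(status_counts))  # {1: 1, 4: 1}
```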

For each country/colour/brand combination , find sum of number of items in elasticsearch

This is a portion of the data I have indexed in elasticsearch:
{
"country" : "India",
"colour" : "white",
"brand" : "sony",
"numberOfItems" : 3
}
I want to get the total sum of numberOfItems on a per country basis, per colour basis and per brand basis. Is there any way to do this in elasticsearch?
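The following groups by the combination of the three fields and sums numberOfItems for each combination. The script joins the values with a separator so that different combinations cannot collapse into the same key.
Make sure scripting is enabled before using it.
{
"aggs": {
"keys": {
"terms": {
"script": "doc['country'].value + '/' + doc['colour'].value + '/' + doc['brand'].value"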
},
"aggs": {
"keySum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
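To get a single result, you can apply a sum aggregation to a query filtered with a term (or terms) filter. The filtered query shown here was removed in Elasticsearch 5.0; on newer versions use a bool query with a filter clause instead. For example: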
{
"query": {
"filtered": {
"filter": {
"term": {
"country": "India"
}
}
}
},
"aggs": {
"total_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
To get statistics for all countries, colours and brands in a single pass over the data, you can use the following query with three multi-bucket terms aggregations, each containing a sum metric sub-aggregation:
{
"query": {
"match_all": {}
},
"aggs": {
"countries": {
"terms": {
"field": "country"
},
"aggs": {
"country_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"colours": {
"terms": {
"field": "colour"
},
"aggs": {
"colour_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
},
"brands": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_sum": {
"sum": {
"field": "numberOfItems"
}
}
}
}
}
}
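What the three terms + sum aggregations return can be reproduced in plain Python. In this sketch only the first item comes from the question; the other two items and the function name sum_by are made up for illustration.

```python
from collections import defaultdict


def sum_by(items, field):
    """Sum numberOfItems per distinct value of the given field."""
    totals = defaultdict(int)
    for item in items:
        totals[item[field]] += item["numberOfItems"]
    return dict(totals)


items = [
    {"country": "India", "colour": "white", "brand": "sony", "numberOfItems": 3},
    # The next two documents are hypothetical.
    {"country": "India", "colour": "black", "brand": "lg", "numberOfItems": 2},
    {"country": "Japan", "colour": "white", "brand": "sony", "numberOfItems": 5},
]

print(sum_by(items, "country"))  # {'India': 5, 'Japan': 5}
print(sum_by(items, "colour"))   # {'white': 8, 'black': 2}
print(sum_by(items, "brand"))    # {'sony': 8, 'lg': 2}
```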

Resources