Filtering nested aggregation result on number of buckets - elasticsearch

I have a query that runs a nested terms aggregation, giving me the unique machineid values for each unique key. What I want Elasticsearch to return is only those keys that have two or more unique machineid values. I can of course solve this on the application side, but is there a way to do it directly in the query? Or am I going about this the wrong way?
My query:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must_not": {
            "term": { "key": "" }
          }
        }
      }
    }
  },
  "aggs": {
    "keys": {
      "terms": {
        "field": "key",
        "size": 0
      },
      "aggs": {
        "machines": {
          "terms": {
            "field": "machineid",
            "size": 0
          }
        }
      }
    }
  }
}
Example document:
{
  "timestamp": "2014-05-23T08:21:51+00:00",
  "machineid": "1444056739053156926",
  "hash": "77f595dee5ffacea72b135b1fce1312e",
  "key": "XXXXXX-XXXXXX-XXXXXX-XXXXXX"
}
I have been looking at scripted metric aggregation but it doesn't seem to be what I'm looking for.
Issue #4404 and issue #8110 on Elasticsearch GitHub seem to describe my problem but they are both closed.
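One possible approach, sketched here under assumptions rather than taken from the question: on Elasticsearch versions that support pipeline aggregations (2.0 and later), a cardinality sub-aggregation can count the distinct machineid values per key, and a bucket_selector can drop the keys where that count is below two. The aggregation names machine_count and at_least_two_machines, the terms size of 1000, and the Painless script syntax (5.0 and later) are illustrative choices. Note that cardinality is an approximate count, although it is usually exact for small values.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": { "term": { "key": "" } }
    }
  },
  "aggs": {
    "keys": {
      "terms": { "field": "key", "size": 1000 },
      "aggs": {
        "machine_count": {
          "cardinality": { "field": "machineid" }
        },
        "at_least_two_machines": {
          "bucket_selector": {
            "buckets_path": { "machineCount": "machine_count" },
            "script": "params.machineCount >= 2"
          }
        }
      }
    }
  }
}
```

The bucket_selector runs on the already reduced terms result, so only keys that made it into the top size buckets are considered. With many distinct keys, the size would need to be raised or the keys paged through.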

Related

Why does pipeline aggs query fail if it includes filter aggs?

I am using Elasticsearch as a database, and I want to run aggregations on it.
POST new_logs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "base.logClass.keyword": "Access"
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "Rule1": {
      "terms": { "field": "source.srcIp" },
      "aggs": {
        "MinTime": {
          "min": { "field": "base.receiveTime" }
        },
        "MaxTime": {
          "max": { "field": "base.receiveTime" }
        }
      }
    },
    "Rule2": {
      "filter": {
        "range": { "base.receiveTime": { "gte": "2022-06-22 11:27:00", "lte": "2022-06-22 11:29:00" } }
      },
      "aggs": {
        "SubFilter": {
          "filter": {
            "term": { "base.subLogClass.keyword": "Login" }
          },
          "aggs": {
            "SourceIP": {
              "terms": { "field": "source.srcIp" },
              "aggs": {
                "DestinationIP": {
                  "terms": { "field": "destination.dstIp" }
                }
              }
            },
            "MinTime": {
              "min": { "field": "base.receiveTime" }
            },
            "MaxTime": {
              "max": { "field": "base.receiveTime" }
            }
          }
        }
      }
    },
    "Logic1": {
      "max_bucket": {
        "buckets_path": "Rule1>MinTime"
      }
    },
    "Logic2": {
      "min_bucket": {
        "buckets_path": "Rule2>SubFilter>MinTime"
      }
    }
  }
}
As you can see in the query, there are two aggregations, Rule1 and Rule2.
Rule2 uses a filter aggregation; Rule1 does not.
When I add pipeline aggregations, Logic1 works but Logic2 fails.
This is the error message:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "action_request_validation_exception",
        "reason" : "Validation Failed: 1: The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [Logic2] found :org.elasticsearch.search.aggregations.bucket.filter.FilterAggregationBuilder for buckets path: Rule2>SubFilter>MinTime;"
      }
    ],
    "type" : "action_request_validation_exception",
    "reason" : "Validation Failed: 1: The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [Logic2] found :org.elasticsearch.search.aggregations.bucket.filter.FilterAggregationBuilder for buckets path: Rule2>SubFilter>MinTime;"
  },
  "status" : 400
}
I'm not sure what went wrong.
Is it not possible to use a pipeline aggregation on top of a filter aggregation?
I would appreciate help from anyone with Elasticsearch experience.
Thank you for your help.
The filter aggregation is a single-bucket aggregation.
min_bucket complains because it needs a multi-bucket aggregation as the first element of its buckets_path.
You might be able to use the filters aggregation, which is a multi-bucket filter. Alternatively, nest the filter aggregations under Rule1: you are already running that terms aggregation, so you could filter a subset of each of its buckets.
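As a rough, untested sketch of the first suggestion, Rule2 could be rewritten as a filters aggregation with a single named filter that combines the time range and the Login term. The filter name login_in_window is made up for illustration; the field names and values are copied from the question.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "term": { "base.logClass.keyword": "Access" } }
      ]
    }
  },
  "aggs": {
    "Rule2": {
      "filters": {
        "filters": {
          "login_in_window": {
            "bool": {
              "filter": [
                { "range": { "base.receiveTime": { "gte": "2022-06-22 11:27:00", "lte": "2022-06-22 11:29:00" } } },
                { "term": { "base.subLogClass.keyword": "Login" } }
              ]
            }
          }
        }
      },
      "aggs": {
        "MinTime": { "min": { "field": "base.receiveTime" } }
      }
    },
    "Logic2": {
      "min_bucket": { "buckets_path": "Rule2>MinTime" }
    }
  }
}
```

Because filters is a multi-bucket aggregation, the buckets_path validation should pass. With only one named filter there is only one bucket to compare, so min_bucket simply returns that bucket's MinTime; the approach becomes more useful when several filters are listed.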

Deduplicate and perform composite aggregation on deduced result

I have an index in Elasticsearch that contains daily transaction data. Each document has these main fields:
TxnId, Status, TxnType, userId
Two documents can have the same TxnId.
I'm looking for a query that aggregates over Status and TxnType for unique TxnIds. Basically something like: select unique txnIds from user_table group by status, txnType.
I have one ES query that deduplicates on TxnId, and another that performs a composite aggregation on status and txnType. I want to do both in a single query.
I tried the collapse feature, and I also tried cardinality and dedup features, but the query below does not give the correct output:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "streamSource": 3
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "txnId"
  },
  "aggs": {
    "buckets": {
      "composite": {
        "size": 30,
        "sources": [
          {
            "status": {
              "terms": {
                "field": "status"
              }
            }
          },
          {
            "txnType": {
              "terms": {
                "field": "txnType"
              }
            }
          }
        ]
      }
    }
  }
}
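One likely reason the query above does not deduplicate is that collapse is applied to the returned hits only and does not affect aggregations. A possible alternative, offered as a sketch rather than a confirmed solution, is to drop collapse and count distinct txnId values inside each composite bucket with a cardinality sub-aggregation. The name unique_txn_count is made up for illustration, and cardinality returns an approximate count.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "streamSource": 3 } }
      ]
    }
  },
  "aggs": {
    "buckets": {
      "composite": {
        "size": 30,
        "sources": [
          { "status": { "terms": { "field": "status" } } },
          { "txnType": { "terms": { "field": "txnType" } } }
        ]
      },
      "aggs": {
        "unique_txn_count": {
          "cardinality": { "field": "txnId" }
        }
      }
    }
  }
}
```

If the same txnId appears with different status or txnType values, it is counted once in each bucket it falls into, which may or may not match the intended deduplication.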

Aggregation of fields inside nested type field

I want to aggregate on a keyword-type field that lies inside a nested-type field. The mapping for the nested field is as below:
"Nested_field" : {
  "type" : "nested",
  "properties" : {
    "Keyword_field" : {
      "type" : "keyword"
    }
  }
}
And the part of the query that I am using to aggregate is as below:
"aggregations": {
  "Nested_field": {
    "aggregations": {
      "Keyword_field": {
        "terms": {
          "field": "Nested_field.Keyword_field"
        }
      }
    },
    "filter": {
      "bool": {}
    }
  }
}
But this does not return the correct aggregation. Even though there are documents with Keyword_field values, the query returns 0 buckets. So there is something wrong in my aggregation query. Can anyone help me find what's wrong?
I think you need to provide a nested path in there. This worked in ES 5, but it looks like you're using 6 based on the "aggregations" vs "aggs", so let me know if it doesn't work and I'll scrap this answer. Give this a try:
{
  "aggregations": {
    "nested_level": {
      "nested": {
        "path": "Nested_field"
      },
      "aggregations": {
        "keyword_field": {
          "terms": {
            "field": "Nested_field.Keyword_field"
          }
        }
      }
    }
  }
}
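If the filter from the question is still needed, it can probably sit inside the nested aggregation so that it is evaluated against the nested documents. The following is only a sketch: the name filtered_level is made up, and the empty bool is the asker's own placeholder, which matches everything until real clauses are added.

```json
{
  "size": 0,
  "aggregations": {
    "nested_level": {
      "nested": { "path": "Nested_field" },
      "aggregations": {
        "filtered_level": {
          "filter": { "bool": {} },
          "aggregations": {
            "keyword_field": {
              "terms": { "field": "Nested_field.Keyword_field" }
            }
          }
        }
      }
    }
  }
}
```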

Elasticsearch - Aggregations on part of bool query

Say I have this bool query:
"bool" : {
  "should" : [
    { "term" : { "FirstName" : "Sandra" } },
    { "term" : { "LastName" : "Jones" } }
  ],
  "minimum_should_match" : 1
}
meaning I want to match all the people with first name Sandra OR last name Jones.
Now, is there any way that I can perform an aggregation on only the documents that matched the first term?
For example, I want to get all of the unique values of "Prizes" that anybody named Sandra has. Normally I'd just do:
"query": {
  "match": {
    "FirstName": "Sandra"
  }
},
"aggs": {
  "Prizes": {
    "terms": {
      "field": "Prizes"
    }
  }
}
Is there any way to combine the two so I only have to perform a single query which returns all of the people with first name Sandra or last name Jones, AND an aggregation only on the people with first name Sandra?
Thanks a lot!
Use post_filter.
Please refer to the following query. post_filter makes sure that your bool should clause doesn't affect your aggregation scope.
Aggregations are scoped by the main query, but they are unaffected by post_filter.
{
  "from": 0,
  "size": 20,
  "aggs": {
    "filtered_lastname": {
      "filter": {
        "query": {
          "match": {
            "FirstName": "sandra"
          }
        }
      },
      "aggs": {
        "prizes": {
          "terms": {
            "field": "Prizes",
            "size": 10
          }
        }
      }
    }
  },
  "post_filter": {
    "bool": {
      "should": [
        {
          "term": {
            "FirstName": "Sandra"
          }
        },
        {
          "term": {
            "LastName": "Jones"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
Running a filter inside the aggs before aggregating on prizes restricts the aggregation to the Sandra documents, which is the use case you describe.
Thanks
Hope this helps
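The query wrapper inside the filter aggregation belongs to older Elasticsearch syntax. As far as I know, later versions (2.0 onward, and required from 5.0) take the query directly inside filter. A sketch of the same request in that form follows; the aggregation name sandra_only is made up for illustration.

```json
{
  "from": 0,
  "size": 20,
  "aggs": {
    "sandra_only": {
      "filter": {
        "match": { "FirstName": "Sandra" }
      },
      "aggs": {
        "prizes": {
          "terms": { "field": "Prizes", "size": 10 }
        }
      }
    }
  },
  "post_filter": {
    "bool": {
      "should": [
        { "term": { "FirstName": "Sandra" } },
        { "term": { "LastName": "Jones" } }
      ],
      "minimum_should_match": 1
    }
  }
}
```

Since every Sandra document also matches the bool clause, placing the bool in query instead of post_filter should give the same aggregation result; post_filter matters mainly when the aggregation needs to see documents that the hits exclude.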

Elasticsearch: how to do filtered search and aggregation at the same time

I need to do a filtered search plus aggregation the following way, conceptually.
{
  "filtered" : {
    "query": {
      "match_all" : {}
    },
    "aggregations": {
      "facets": {
        "terms": {
          "field": "subject"
        }
      }
    },
    "filter" : {
      ...
    }
  }
}
The above query is not working because I got the following error message:
[filtered] query does not support [aggregations]]
I was trying to solve this problem. I found Filter Aggregation or Filters Aggregation online, but they do not seem to address my need.
Could someone show me the structure of the correct query that can achieve my goal?
Thanks and regards.
The scope of an aggregation is the query and all the filters in it. This means that if you place the aggregations at the top level of the request, next to the query rather than inside it, it should work.
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {}
    }
  },
  "aggregations": {
    "facets": {
      "terms": {
        "field": "subject"
      }
    }
  }
}
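The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, so on newer clusters the same structure would be written with a bool query and its filter clause. The sketch below assumes that setup; the term filter on hypothetical_field is a placeholder standing in for the filter that the question leaves out.

```json
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "term": { "hypothetical_field": "hypothetical_value" }
      }
    }
  },
  "aggregations": {
    "facets": {
      "terms": { "field": "subject" }
    }
  }
}
```

As in the answer above, the aggregation sits beside query at the top level, so it is computed over the documents that pass both the query and the filter.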
