Why does pipeline aggs query fail if it includes filter aggs? - elasticsearch

I am using Elasticsearch as a database and want to run the following aggregation query:
POST new_logs/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"base.logClass.keyword": "Access"
}
}
]
}
},
"size": 0,
"aggs": {
"Rule1": {
"terms": { "field": "source.srcIp" },
"aggs": {
"MinTime": {
"min": { "field": "base.receiveTime" }
},
"MaxTime": {
"max": { "field": "base.receiveTime" }
}
}
},
"Rule2": {
"filter": { "range": { "base.receiveTime": { "gte": "2022-06-22 11:27:00", "lte": "2022-06-22 11:29:00" } }
},
"aggs": {
"SubFilter": {
"filter": { "term": { "base.subLogClass.keyword": "Login" }
},
"aggs": {
"SourceIP": {
"terms": { "field": "source.srcIp" },
"aggs": {
"DestinationIP": { "terms": { "field": "destination.dstIp" }
}
}
},
"MinTime": {
"min": { "field": "base.receiveTime" }
},
"MaxTime": {
"max": { "field": "base.receiveTime" }
}
}
}
}
},
"Logic1": {
"max_bucket": {
"buckets_path": "Rule1>MinTime"
}
},
"Logic2": {
"min_bucket": {
"buckets_path": "Rule2>SubFilter>MinTime"
}
}
}
}
As you can see in the query, there are two aggregations, Rule1 and Rule2.
Rule2 uses a filter aggregation and Rule1 does not.
When I add the pipeline aggregations, Logic1 works but Logic2 fails.
This is the error message:
{
"error" : {
"root_cause" : [
{
"type" : "action_request_validation_exception",
"reason" : "Validation Failed: 1: The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [Logic2] found :org.elasticsearch.search.aggregations.bucket.filter.FilterAggregationBuilder for buckets path: Rule2>SubFilter>MinTime;"
}
],
"type" : "action_request_validation_exception",
"reason" : "Validation Failed: 1: The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [Logic2] found :org.elasticsearch.search.aggregations.bucket.filter.FilterAggregationBuilder for buckets path: Rule2>SubFilter>MinTime;"
},
"status" : 400
}
I'm not sure what went wrong.
Is it not possible to use a pipeline aggregation together with a filter aggregation?
I am asking for help from people who have a lot of experience with Elasticsearch.
Thank you for help.

The filter aggregation is a single-bucket aggregation.
min_bucket is a sibling pipeline aggregation, and it complains because the first aggregation in its buckets_path must be a multi-bucket aggregation. Logic1 works because Rule1 is a terms aggregation, which produces one bucket per term.
You might be able to use the filters aggregation, which is a multi-bucket version of filter, or nest the filter aggregations under Rule1, since you are already aggregating there and could filter a subset of Rule1.
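A minimal sketch of the first suggestion, written as a Python dict so the shape can be checked locally and then dumped to JSON for the request body. The bucket name `window`, the set `MULTI_BUCKET_TYPES` and the helper `first_agg_type` are made up for this illustration (the helper only imitates the validation locally, it is not Elasticsearch code); the field names come from the question. I have not run this against a cluster, so treat it as a starting point.

```python
import json

# Aggregation types treated as multi-bucket for this local check only
# (an assumption of the sketch, not an exhaustive list).
MULTI_BUCKET_TYPES = {"terms", "filters", "date_histogram", "histogram", "range"}


def build_aggs():
    """Rule2 rewritten with `filters` (multi-bucket) instead of `filter`."""
    return {
        "Rule2": {
            "filters": {
                "filters": {
                    # "window" is a hypothetical bucket name
                    "window": {
                        "range": {
                            "base.receiveTime": {
                                "gte": "2022-06-22 11:27:00",
                                "lte": "2022-06-22 11:29:00",
                            }
                        }
                    }
                }
            },
            "aggs": {
                "SubFilter": {
                    "filter": {"term": {"base.subLogClass.keyword": "Login"}},
                    "aggs": {"MinTime": {"min": {"field": "base.receiveTime"}}},
                }
            },
        },
        "Logic2": {"min_bucket": {"buckets_path": "Rule2>SubFilter>MinTime"}},
    }


def first_agg_type(aggs, buckets_path):
    """Return the type of the first aggregation named in buckets_path."""
    first_name = buckets_path.split(">")[0]
    definition = aggs[first_name]
    # The aggregation type is the key that is not a sub-aggregation or meta key.
    return next(k for k in definition if k not in ("aggs", "aggregations", "meta"))


aggs = build_aggs()
path = aggs["Logic2"]["min_bucket"]["buckets_path"]
print(first_agg_type(aggs, path) in MULTI_BUCKET_TYPES)
print(json.dumps({"size": 0, "aggs": aggs}, indent=2))
```

With a single filter there is only one bucket, so min_bucket simply returns that bucket's MinTime. The pipeline becomes more useful once several named time windows are listed under `filters`.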

Related

Bucket sort on dynamic aggregation name

I would like to sort my aggregations by their quantity value.
My problem is that each aggregation has a name that cannot be known in advance.
Given this query:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"sorting": {
"bucket_sort": {
"sort": [
{
"year>quantity": {
"order": "desc"
}
}
]
}
},
"UNKNOWN_1": {
"aggs": {
"year": {
"filter": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
"UNKNOWN_2": {
"aggs": {
"year": {
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
....
}
}
My bucket_sort aggregation is missing one level in its path to reach that quantity value.
Here is one Elasticsearch record:
{
datetime: '2021-12-01',
item.quantity: 5
}
Note that I have removed most of the request for readability, such as the filter aggregations, etc.
I tried something with a wildcard:
"sorting": {
"bucket_sort": {
"sort": [
{
"*>year>quantity": {
"order": "desc"
}
}
]
}
},
But got the same error....
Is it possible to achieve this behaviour ?
I think you misunderstood the bucket_sort aggregation: it does not sort your aggregations, it sorts the buckets produced by one multi-bucket aggregation. The bucket_sort aggregation also has to be a sub-aggregation of that multi-bucket aggregation.
From the docs:
[The bucket sort aggregation is] "a parent pipeline aggregation which sorts the buckets of its parent multi-bucket aggregation"
If I understand correctly, you are trying to create "buckets" with specific filter aggregations, and you cannot know in advance how many of those filter aggregations you will create.
For that you can use the filters aggregation, where you can specify as many filters as you want and each of them creates a bucket.
Under that filters aggregation you can create one single sum aggregation on item.quantity.
Also under the filters aggregation you then add your bucket_sort aggregation, where you only have to name the sibling sum aggregation.
All in all, it might look like this:
{
"aggs": {
"your_filters": {
"filters": {
"filters": {
"unknown_1": {
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
},
"unknown_2": {
/** more filters here... **/
}
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
},
"sorting": {
"bucket_sort": {
"sort": [
{ "quantity": { "order": "desc" } }
]
}
}
}
}
}
}
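Since the bucket names are only known at run time, the request body can be generated from whatever name-to-filter mapping the application holds. The following is a rough sketch of that idea; the function name `build_sorted_filters` and the second date range are invented for the example, and the aggregation names mirror the ones used above.

```python
import json


def build_sorted_filters(named_filters, field="item.quantity"):
    """Build a `filters` aggregation with one bucket per entry in named_filters,
    a `sum` sub-aggregation, and a `bucket_sort` on that sum."""
    return {
        "your_filters": {
            "filters": {"filters": dict(named_filters)},
            "aggs": {
                "quantity": {"sum": {"field": field}},
                "sorting": {
                    "bucket_sort": {"sort": [{"quantity": {"order": "desc"}}]}
                },
            },
        }
    }


# Hypothetical filters; in practice these come from the application.
named_filters = {
    "unknown_1": {"range": {"datetime": {"gte": "2021-01-01", "lte": "2021-12-09"}}},
    "unknown_2": {"range": {"datetime": {"gte": "2020-01-01", "lte": "2020-12-31"}}},
}

body = {"size": 0, "aggs": build_sorted_filters(named_filters)}
print(json.dumps(body, indent=2))
```

The sort path stays the plain `quantity`, whatever the buckets are called, because bucket_sort addresses its sibling metric inside each bucket and not the bucket names.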

bucket script not working - elasticsearch 2.4.2

I tried to subtract one aggregation from another:
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"total_query_id": {
"sum": {
"field": "query_id"
}
},
"total_num_results": {
"sum": {
"field": "num_results"
}
},
"minus_value": {
"bucket_script": {
"buckets_path": {
"qid": "total_query_id",
"nrs": "total_num_results"
},
"script": "qid - nrs"
}
}
}
}
It throws the error below:
"reason": "Invalid pipeline aggregation named [minus_value] of type [bucket_script]. Only sibling pipeline aggregations are allowed at the top level"
I have moved the minus_value node in and out of the aggs node, but that does not solve my problem.
Can anyone help me with this?
The idea is that pipeline aggregations must work on a parent bucket aggregation. bucket_script is a parent pipeline aggregation, so it cannot sit at the top level; it has to be nested inside a bucket aggregation.
That is not the case in your example, so you need to add a parent aggregation. Since you have a match_all query, you could try using a global bucket aggregation and embed your 3 aggregations inside it, like this:
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"all": {
"global": {},
"aggs": {
"total_query_id": {
"sum": {
"field": "query_id"
}
},
"total_num_results": {
"sum": {
"field": "num_results"
}
},
"minus_value": {
"bucket_script": {
"buckets_path": {
"qid": "total_query_id",
"nrs": "total_num_results"
},
"script": "qid - nrs"
}
}
}
}
}
}
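The reference documentation describes bucket_script as operating on the buckets of a parent multi-bucket aggregation, and global is a single-bucket aggregation. If a given version rejects the global wrapper (I have not checked which versions do), one possible alternative is a filters aggregation with a single match_all filter. The sketch below builds such a body; the names `all` and `everything` and the function `build_difference_body` are hypothetical. The default script uses the Painless form `params.qid - params.nrs` found in later versions; on 2.x the `qid - nrs` form from the question would be passed instead.

```python
import json


def build_difference_body(script="params.qid - params.nrs"):
    """Wrap the two sums and the bucket_script in a one-bucket `filters` aggregation."""
    return {
        "size": 0,
        "aggs": {
            "all": {
                # One named bucket that matches every document.
                "filters": {"filters": {"everything": {"match_all": {}}}},
                "aggs": {
                    "total_query_id": {"sum": {"field": "query_id"}},
                    "total_num_results": {"sum": {"field": "num_results"}},
                    "minus_value": {
                        "bucket_script": {
                            "buckets_path": {
                                "qid": "total_query_id",
                                "nrs": "total_num_results",
                            },
                            "script": script,
                        }
                    },
                },
            }
        },
    }


body = build_difference_body()
print(json.dumps(body, indent=2))
```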

Applying a filter to exclude a specific numerical value on a nested object's field with elastic search

I am trying to calculate the average value of a field in my database with an Elasticsearch aggregation.
I have no problem calculating the average value without any filtering:
{
"query": {
"match_all":{}
},
"size": 0,
"aggs": {
"avg_quantity": {
"avg": {
"field": "license_offer.unit_price"
}
}
}
}
However, I need to exclude from the aggregation the docs that have a license_offer.unit_price of 0 (license_offer is a nested object within license).
I tried different things; this is my latest attempt:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"license_offer.unit_price": {
"gte": 0
}
}
}
},
"aggs": {
"quantity_stats": {
"stats": {
"field": "license_offer.unit_price"
}
}
}
}
but I am getting an error :
"type": "parsing_exception",
"reason": "no [query] registered for [license_offer.unit_price]",
How do you apply a filter to exclude a specific numerical value on a nested object's field with elastic search ?
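Your query is not valid: you are simply missing the range keyword. Note that "gte": 0 still matches documents whose price is exactly 0; to exclude them, use "gt": 0 as in the second example further down.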
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"range": { <--- add this
"license_offer.unit_price": {
"gte": 0
}
}
}
}
},
"aggs": {
"quantity_stats": {
"stats": {
"field": "license_offer.unit_price"
}
}
}
}
You can also move the filter inside the aggregation part:
{
"size": 0,
"aggs": {
"only_positive": {
"filter": {
"range": {
"license_offer.unit_price": {
"gt": 0
}
}
},
"aggs": {
"quantity_stats": {
"stats": {
"field": "license_offer.unit_price"
}
}
}
}
}
}
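The question calls license_offer a nested object but does not show the mapping. If the field really is mapped with type nested, the filter and stats would probably need to sit inside a nested aggregation. A sketch under that assumption follows; the aggregation name `offers` and the function `build_nested_stats` are hypothetical, and the path `license_offer` is inferred from the field name.

```python
import json


def build_nested_stats(path="license_offer", field="license_offer.unit_price"):
    """Stats over nested documents whose price is strictly greater than 0."""
    return {
        "size": 0,
        "aggs": {
            "offers": {  # hypothetical name
                # Step into the nested documents first (assumes a nested mapping).
                "nested": {"path": path},
                "aggs": {
                    "only_positive": {
                        "filter": {"range": {field: {"gt": 0}}},
                        "aggs": {"quantity_stats": {"stats": {"field": field}}},
                    }
                },
            }
        },
    }


body = build_nested_stats()
print(json.dumps(body, indent=2))
```

If license_offer is a plain object field, the nested wrapper is not needed and the filter aggregation shown above is enough.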

Elasticsearch - get terms aggregation for specified fields

I am using a terms aggregation to get the number of users in each city:
{
"aggs" : {
"cities" : {
"terms" : { "field" : "city.name" }
}
}
}
This is giving results. But I always want to get some specific cities in results of aggregation irrespective of whether they are in top 10 or not. Do I need to use filter aggregation for each of the city separately to get its result?
You have three solutions:
A. You can specify a filter in the query:
{
"query": {
"terms": {
"city.name": [ "city1", "city2", "city3" ]
}
},
"aggs": {
"cities": {
"terms": {
"field": "city.name"
}
}
}
}
B. You can specify a filter in the aggregations:
{
"aggs": {
"city_filter": {
"filter": {
"terms": {
"city.name": [
"city1",
"city2",
"city3"
]
}
},
"aggs": {
"cities": {
"terms": {
"field": "city.name"
}
}
}
}
}
}
C. You can filter values in the terms aggregation:
{
"aggs": {
"cities": {
"terms": {
"field": "city.name",
"include": "city1*",
"exclude": "city2*"
}
}
}
}
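For option C, note that string values of include and exclude are regular expressions, so a prefix match is normally written like `city1.*`. The include parameter also accepts an array of exact terms, which fits the "always return these specific cities" case. A small sketch that builds such a body; the function name `build_city_terms` and the city names are placeholders.

```python
import json


def build_city_terms(cities, field="city.name"):
    """Terms aggregation restricted to an exact list of city names."""
    cities = list(cities)
    return {
        "aggs": {
            "cities": {
                "terms": {
                    "field": field,
                    "include": cities,
                    # Ask for at least as many buckets as cities listed.
                    "size": max(len(cities), 1),
                }
            }
        }
    }


body = build_city_terms(["city1", "city2", "city3"])
print(json.dumps(body, indent=2))
```

Unlike option A, this leaves the query itself untouched, so other aggregations in the same request still see all documents.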

Multiple filters and an aggregate in elasticsearch

How can I use a filter together with an aggregation in Elasticsearch?
The official documentation gives only trivial examples for filters and for aggregations and no formal description of the query DSL; compare it, for example, with the Postgres documentation.
By trial and error I found the following query, which Elasticsearch accepts (no parsing errors) but which ignores the given filters:
{
"filter": {
"and": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398169707,
"to": 1400761707
}
}
}
]
},
"size": 0,
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
Some people suggest using query instead of filter, but the official documentation generally recommends the opposite for filtering on exact values. Another issue with query: while filters offer an and, query does not.
Can somebody point me to documentation, a blog or a book that describes writing non-trivial queries: at least an aggregation plus multiple filters?
I ended up using a filter aggregation, not a filtered query. So now I have 3 nested aggs elements.
I also use a bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
My final implementation:
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398176502000,
"to": 1400768502000
}
}
}
]
}
},
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
},
"size": 0
}
Put your filter in a filtered query.
The top-level filter only filters search hits, not facets/aggregations. It was renamed to post_filter in 1.0 because of this quite common confusion.
Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
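On versions where the filtered query no longer exists (it was removed in 5.0), the same idea is usually written as a bool query with a filter clause, which restricts both the hits and the aggregations. The following is a sketch under that assumption, reusing the values from the question. The function name `build_filtered_search` is made up, the timestamp field is assumed to be the Logstash-style `@timestamp`, and the `_type` term is left out because mapping types were dropped in later versions.

```python
import json


def build_filtered_search(timestamp_field="@timestamp"):
    """Filters go into query.bool.filter so the aggregations only see matching docs."""
    filters = [
        {"term": {"dc": "eu-west-12"}},
        {"term": {"status": "204"}},
        {"range": {timestamp_field: {"gte": 1398176502000, "lte": 1400768502000}}},
    ]
    return {
        "size": 0,
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "time_histo": {
                # "interval" is kept from the original post; newer versions
                # use "fixed_interval" or "calendar_interval" instead.
                "date_histogram": {"field": timestamp_field, "interval": "1h"},
                "aggs": {
                    "name": {
                        "percentiles": {
                            "field": "upstream_response_time",
                            "percents": [98.0],
                        }
                    }
                },
            }
        },
    }


body = build_filtered_search()
print(json.dumps(body, indent=2))
```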
More on @geekQ's answer: to filter on strings that contain spaces when matching multiple terms, use the following:
{ "aggs": {
"aggresults": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"term_1": "some text with space 1"
}
},
{
"match_phrase": {
"term_2": "some text with also space 2"
}
}
]
}
},
"aggs" : {
"all_term_3s" : {
"terms" : {
"field":"term_3.keyword",
"size" : 10000,
"order" : {
"_term" : "asc"
}
}
}
}
} }, "size": 0 }
Just for reference: on version 7.2, I used something like the following to apply multiple filters to an aggregation. It uses a filter aggregation to restrict the documents being aggregated, and a bool query inside it to combine the conditions:
POST movies/_search?size=0
{
"size": 0,
"aggs": {
"test": {
"filter": {
"bool": {
"must": {
"term": {
"genre": "action"
}
},
"filter": {
"range": {
"year": {
"gte": 1800,
"lte": 3000
}
}
}
}
},
"aggs": {
"year_hist": {
"histogram": {
"field": "year",
"interval": 50
}
}
}
}
}
}

Resources