Applying a filter to exclude a specific numerical value on a nested object's field with Elasticsearch

I am trying to calculate the average value of a field in my database with an Elasticsearch aggregation.
I have no problem calculating the average without any filtering:
{
"query": {
"match_all":{}
},
"size": 0,
"aggs": {
"avg_quantity": {
"avg": {
"field": "license_offer.unit_price"
}
}
}
}
However, I need to exclude from the aggregation the documents that have a license_offer.unit_price of 0 (license_offer is a nested object within license).
I tried several things; this is my latest attempt:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"license_offer.unit_price": {
"gte": 0
}
}
}
},
"aggs": {
"quantity_stats": {
"stats": {
"field": "license_offer.unit_price"
}
}
}
}
but I am getting an error:
"type": "parsing_exception",
"reason": "no [query] registered for [license_offer.unit_price]",
How do you apply a filter to exclude a specific numerical value on a nested object's field with Elasticsearch?

Your query is not valid: the field condition has to be wrapped in a range query. Also note that "gte": 0 still matches a price of 0, so use "gt" to exclude it:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"range": {
"license_offer.unit_price": {
"gt": 0
}
}
}
}
},
"aggs": {
"quantity_stats": {
"stats": {
"field": "license_offer.unit_price"
}
}
}
}
You can also move the filter inside the aggregation part:
{
"size": 0,
"aggs": {
"only_positive": {
"filter": {
"range": {
"license_offer.unit_price": {
"gt": 0
}
}
},
"aggs": {
"quantity_stats": {
"stats": {
"field": "license_offer.unit_price"
}
}
}
}
}
}
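The question mentions that license_offer is a nested object. If the field is really mapped with type nested (an assumption, since the mapping is not shown), the filter and the metric would both need to run inside a nested aggregation. Below is a minimal Python sketch that only builds the request body; the aggregation names offers, non_zero and avg_price are made up for illustration, and nothing here has been run against a cluster.

```python
import json


def nested_avg_excluding_zero(path: str, field: str) -> dict:
    """Build a search body that averages `field` over nested objects with a value > 0.

    Assumes `path` is mapped as type `nested`; the names below are illustrative.
    """
    return {
        "size": 0,
        "aggs": {
            "offers": {
                # Step into the nested documents first.
                "nested": {"path": path},
                "aggs": {
                    "non_zero": {
                        # Keep only nested objects with a strictly positive value.
                        "filter": {"range": {field: {"gt": 0}}},
                        "aggs": {"avg_price": {"avg": {"field": field}}},
                    }
                },
            }
        },
    }


body = nested_avg_excluding_zero("license_offer", "license_offer.unit_price")
print(json.dumps(body, indent=2))
```

If license_offer is a plain object field rather than a nested one, the two queries above should be enough.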

Related

Why does pipeline aggs query fail if it includes filter aggs?

I am using Elasticsearch as a database and want to run the following aggregations.
POST new_logs/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"base.logClass.keyword": "Access"
}
}
]
}
},
"size": 0,
"aggs": {
"Rule1": {
"terms": { "field": "source.srcIp" },
"aggs": {
"MinTime": {
"min": { "field": "base.receiveTime" }
},
"MaxTime": {
"max": { "field": "base.receiveTime" }
}
}
},
"Rule2": {
"filter": { "range": { "base.receiveTime": { "gte": "2022-06-22 11:27:00", "lte": "2022-06-22 11:29:00" } }
},
"aggs": {
"SubFilter": {
"filter": { "term": { "base.subLogClass.keyword": "Login" }
},
"aggs": {
"SourceIP": {
"terms": { "field": "source.srcIp" },
"aggs": {
"DestinationIP": { "terms": { "field": "destination.dstIp" }
}
}
},
"MinTime": {
"min": { "field": "base.receiveTime" }
},
"MaxTime": {
"max": { "field": "base.receiveTime" }
}
}
}
}
},
"Logic1": {
"max_bucket": {
"buckets_path": "Rule1>MinTime"
}
},
"Logic2": {
"min_bucket": {
"buckets_path": "Rule2>SubFilter>MinTime"
}
}
}
}
As you can see in the query, there are two aggregations, Rule1 and Rule2.
Rule2 uses a filter aggregation and Rule1 does not.
When I add pipeline aggregations, Logic1 works but Logic2 fails.
This is the error message:
{
"error" : {
"root_cause" : [
{
"type" : "action_request_validation_exception",
"reason" : "Validation Failed: 1: The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [Logic2] found :org.elasticsearch.search.aggregations.bucket.filter.FilterAggregationBuilder for buckets path: Rule2>SubFilter>MinTime;"
}
],
"type" : "action_request_validation_exception",
"reason" : "Validation Failed: 1: The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [Logic2] found :org.elasticsearch.search.aggregations.bucket.filter.FilterAggregationBuilder for buckets path: Rule2>SubFilter>MinTime;"
},
"status" : 400
}
I'm not sure what went wrong.
Is it not possible to use pipeline aggregations together with a filter aggregation?
Any help from people experienced with Elasticsearch would be appreciated.
Thank you for your help.
The filter aggregation is a single-bucket aggregation.
min_bucket complains because the first aggregation in its buckets_path must be a multi-bucket aggregation. That is also why Logic1 works: Rule1 is a terms aggregation, which is multi-bucket.
You might be able to use the filters aggregation instead, which is a multi-bucket filter, or nest the filter aggregations under Rule1: you are already computing those aggregations there, and you could filter a subset of each Rule1 bucket.
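As a rough illustration of the first suggestion, the Python sketch below builds a request body in which Rule2 is a filters aggregation with one named filter, so that the buckets_path of Logic2 starts at a multi-bucket aggregation. The function name and the filter name login_in_window are hypothetical, the two conditions of the original Rule2 and SubFilter are merged into one bool filter, and the body has not been tested against a cluster.

```python
import json


def login_window_body(start: str, end: str) -> dict:
    """Build a body where Rule2 is a multi-bucket `filters` aggregation."""
    # One named filter combining the time window and the Login condition.
    login_in_window = {
        "bool": {
            "filter": [
                {"range": {"base.receiveTime": {"gte": start, "lte": end}}},
                {"term": {"base.subLogClass.keyword": "Login"}},
            ]
        }
    }
    return {
        "size": 0,
        "aggs": {
            "Rule2": {
                # `filters` (plural) yields one bucket per named filter.
                "filters": {"filters": {"login_in_window": login_in_window}},
                "aggs": {"MinTime": {"min": {"field": "base.receiveTime"}}},
            },
            # Sibling pipeline aggregation; its path now starts at a multi-bucket agg.
            "Logic2": {"min_bucket": {"buckets_path": "Rule2>MinTime"}},
        },
    }


body = login_window_body("2022-06-22 11:27:00", "2022-06-22 11:29:00")
print(json.dumps(body, indent=2))
```

With a single named filter, min_bucket has only one bucket to choose from, so this mainly shows the shape of the request; more named filters can be added to the same filters map.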

Bucket sort on dynamic aggregation name

I would like to sort my aggregations by their quantity value.
My problem is that each aggregation has a name that cannot be known in advance.
Given this query:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"sorting": {
"bucket_sort": {
"sort": [
{
"year>quantity": {
"order": "desc"
}
}
]
}
},
"UNKNOWN_1": {
"aggs": {
"year": {
"filter": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
"UNKNOWN_2": {
"aggs": {
"year": {
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
....
}
}
The path in my bucket_sort aggregation is one level short of reaching that quantity value.
Here is a sample Elasticsearch document:
{
datetime: '2021-12-01',
item.quantity: 5
}
Note that I have removed most of the request for readability, such as the filter aggregations, etc.
I tried something with a wildcard:
"sorting": {
"bucket_sort": {
"sort": [
{
"*>year>quantity": {
"order": "desc"
}
}
]
}
},
But I got the same error.
Is it possible to achieve this behaviour?
I think you misunderstood the bucket_sort aggregation: it does not sort your aggregations; it sorts the buckets produced by one multi-bucket aggregation. The bucket_sort aggregation also has to be nested under that multi-bucket aggregation.
From the docs:
[The bucket sort aggregation is] "a parent pipeline aggregation which sorts the buckets of its parent multi-bucket aggregation"
If I understand correctly, you are trying to create "buckets" with separate filter aggregations, and you cannot know in advance how many of those filter aggregations you will create.
For that you can use the filters aggregation (note the plural), where you can specify as many filters as you want and each of them creates a bucket.
Under that filters aggregation you can create one single sum aggregation on item.quantity.
Also under the filters aggregation, you then add your bucket_sort aggregation, where you only have to name the sibling sum aggregation.
All in all it might look like this:
{
"aggs": {
"your_filters": {
"filters": {
"filters": {
"unknown_1": {
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
},
"unknown_2": {
/** more filters here... **/
}
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
},
"sorting": {
"bucket_sort": {
"sort": [
{ "quantity": { "order": "desc" } }
]
}
}
}
}
}
}
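Since the filter names are only known at runtime, the body above could be assembled programmatically. The following Python sketch assumes the caller already has a mapping from bucket name to filter clause; the function name and its parameters are hypothetical, and it only reproduces the structure of the request shown above.

```python
import json


def sorted_filters_body(named_filters: dict, sum_field: str) -> dict:
    """Build a `filters` aggregation with a sum and a bucket_sort on that sum."""
    return {
        "size": 0,
        "aggs": {
            "your_filters": {
                # One bucket per entry; names can be anything known at runtime.
                "filters": {"filters": dict(named_filters)},
                "aggs": {
                    "quantity": {"sum": {"field": sum_field}},
                    # Parent pipeline aggregation: sorts the buckets of `your_filters`.
                    "sorting": {
                        "bucket_sort": {"sort": [{"quantity": {"order": "desc"}}]}
                    },
                },
            }
        },
    }


filters_by_name = {
    "unknown_1": {"range": {"datetime": {"gte": "2021-01-01", "lte": "2021-12-09"}}},
    "unknown_2": {"range": {"datetime": {"gte": "2020-01-01", "lte": "2020-12-31"}}},
}
print(json.dumps(sorted_filters_body(filters_by_name, "item.quantity"), indent=2))
```

The second date range is invented purely to have two buckets in the example.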

ElasticSearch aggregations using filter and without it

I'm building a product list page with filters. There are a lot of filters, and the data for them is computed in ES with aggregations.
The simplest example is min/max price:
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"shop_id": 44
}
},
{
"term": {
"CategoryId": 36898
}
},
{
"term": {
"products_status": 1
}
},
{
"term": {
"availability": 3
}
}
]
}
}
}
},
"aggs": {
"min_price": {
"min": {
"field": "products_price"
}
},
"max_price": {
"max": {
"field": "products_price"
}
}
}
}
This request returns the minimum and maximum price according to the rules set in the filter (CategoryId 36898, shop_id 44, etc.).
It works perfectly.
The question is: is it possible to change this request so that it also returns the aggregations without the filters? Or is it possible to return aggregation data for another filter in the same request?
So I want:
min_price and max_price for filtered data (query 1)
and min_price and max_price for unfiltered data (or for data filtered with query 2).
You can use the global aggregation, which ignores the filters given in the query block.
For example, for your query use the following JSON input:
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"shop_id": 44
}
},
{
"term": {
"CategoryId": 36898
}
},
{
"term": {
"products_status": 1
}
},
{
"term": {
"availability": 3
}
}
]
}
}
}
},
"aggs": {
"min_price": {
"min": {
"field": "products_price"
}
},
"max_price": {
"max": {
"field": "products_price"
}
},
"without_filter_min": {
"global": {},
"aggs": {
"price_value": {
"min": {
"field": "products_price"
}
}
}
},
"without_filter_max": {
"global": {},
"aggs": {
"price_value": {
"max": {
"field": "products_price"
}
}
}
}
}
}
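The filtered query used in this thread was deprecated in Elasticsearch 2.0 and removed in 5.0; on newer versions the same conditions would normally go into the filter clause of a bool query. The Python sketch below builds such a body and, as a small variation, puts both unfiltered metrics under a single global aggregation. The function names and the aggregation name all_products are hypothetical.

```python
import json


def price_metrics(field: str) -> dict:
    """Min and max metric aggregations on one numeric field."""
    return {
        "min_price": {"min": {"field": field}},
        "max_price": {"max": {"field": field}},
    }


def price_bounds_body(term_filters: dict, price_field: str) -> dict:
    """Filtered min/max at the top level, unfiltered min/max under `global`."""
    return {
        "size": 0,
        # bool + filter replaces the removed `filtered` query.
        "query": {
            "bool": {
                "filter": [{"term": {name: value}} for name, value in term_filters.items()]
            }
        },
        "aggs": {
            **price_metrics(price_field),
            # `global` ignores the query above, so these cover all documents.
            "all_products": {"global": {}, "aggs": price_metrics(price_field)},
        },
    }


body = price_bounds_body(
    {"shop_id": 44, "CategoryId": 36898, "products_status": 1, "availability": 3},
    "products_price",
)
print(json.dumps(body, indent=2))
```

To compare against a second filter rather than no filter at all, a filter aggregation could be nested inside the global one.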

Query elasticsearch with multiple numeric ranges

{
"query": {
"filtered": {
"query": {
"match": {
"log_path": "message_notification.log"
}
},
"filter": {
"numeric_range": {
"time_taken": {
"gte": 10
}
}
}
}
},
"aggs": {
"distinct_user_ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
I have to run this query 20 times because I want to know notification times above each of the following thresholds: [10,30,60,120,240,300,600,1200..]. Right now, I am running a loop and making 20 queries to fetch this.
Is there a saner way to query Elasticsearch once and get the counts that fall into each of these thresholds?
What you probably want is a range aggregation.
Here is a possible query; you can add more ranges or alter them:
{
"size": 0,
"query": {
"match": {
"log_path": "message_notification.log"
}
},
"aggs": {
"intervals": {
"range": {
"field": "time_taken",
"ranges": [
{
"to": 50
},
{
"from": 50,
"to": 100
},
{
"from": 100
}
]
},
"aggs": {
"distinct_user_ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
}
}
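The query above uses non-overlapping buckets, whereas the question asks for counts above each threshold. Ranges in a range aggregation may overlap, and from is inclusive, so one open-ended range per threshold should give the same numbers as the 20 separate gte queries. A Python sketch that generates these ranges follows; the function name, the aggregation name above_threshold and the key format are hypothetical.

```python
import json


def threshold_ranges_body(thresholds: list, log_path: str) -> dict:
    """Build one request with an open-ended range bucket per threshold."""
    # Each range has only `from`, so it matches every value >= the threshold.
    ranges = [{"key": f">={t}", "from": t} for t in thresholds]
    return {
        "size": 0,
        "query": {"match": {"log_path": log_path}},
        "aggs": {
            "above_threshold": {
                "range": {"field": "time_taken", "ranges": ranges},
                # Distinct users per threshold bucket, as in the original query.
                "aggs": {"distinct_user_ids": {"cardinality": {"field": "user_id"}}},
            }
        },
    }


body = threshold_ranges_body([10, 30, 60, 120, 240, 300, 600, 1200], "message_notification.log")
print(json.dumps(body, indent=2))
```

Keep in mind that cardinality is an approximate count, which applies to the original per-threshold queries as well.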

Multiple filters and an aggregate in elasticsearch

How can I use a filter in connection with an aggregate in elasticsearch?
The official documentation gives only trivial examples for filter and for aggregations and no formal description of the query dsl - compare it e.g. with postgres documentation.
By trial and error I found the following query, which is accepted by Elasticsearch (no parsing errors) but ignores the given filters:
{
"filter": {
"and": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398169707,
"to": 1400761707
}
}
}
]
},
"size": 0,
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
Some people suggest using query instead of filter. But the official documentation generally recommends the opposite for filtering on exact values. Another issue with query: while filters offer an and, query does not.
Can somebody point me to documentation, a blog or a book that describes writing non-trivial queries: at least an aggregate plus multiple filters?
I ended up using a filter aggregation rather than a filtered query. So now I have 3 nested aggs elements.
I also use the bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
My final implementation:
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398176502000,
"to": 1400768502000
}
}
}
]
}
},
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
},
"size": 0
}
Put your filter in a filtered-query.
The top-level filter is for filtering search hits only, and not facets/aggregations. It was renamed to post_filter in 1.0 due to this quite common confusion.
Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
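To make the distinction concrete, the Python sketch below builds two request bodies from the same filter clauses: one where they sit in the query, so the aggregations only see matching documents, and one where they sit in post_filter, which narrows the returned hits after the aggregations have been computed. It uses a bool query with a filter clause, which takes the place of the filtered query on Elasticsearch 5.0 and later; the function names are hypothetical.

```python
import json


def aggs_over_filtered_docs(filters: list, aggs: dict) -> dict:
    """Filters in the query: aggregations only see matching documents."""
    return {"size": 0, "query": {"bool": {"filter": filters}}, "aggs": aggs}


def aggs_over_all_docs(filters: list, aggs: dict) -> dict:
    """Filters in post_filter: hits are narrowed, aggregations are not."""
    return {
        "query": {"match_all": {}},
        "post_filter": {"bool": {"filter": filters}},
        "aggs": aggs,
    }


filters = [{"term": {"dc": "eu-west-12"}}, {"term": {"status": "204"}}]
aggs = {"name": {"percentiles": {"field": "upstream_response_time", "percents": [98.0]}}}

print(json.dumps(aggs_over_filtered_docs(filters, aggs), indent=2))
print(json.dumps(aggs_over_all_docs(filters, aggs), indent=2))
```

The first form is what the question needs; the second reproduces the behaviour of the top-level filter that was ignored by the aggregation.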
More on @geekQ's answer: to support filter strings that contain spaces when matching on multiple fields, use the following:
{
"aggs": {
"aggresults": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"term_1": "some text with space 1"
}
},
{
"match_phrase": {
"term_2": "some text with also space 2"
}
}
]
}
},
"aggs" : {
"all_term_3s" : {
"terms" : {
"field":"term_3.keyword",
"size" : 10000,
"order" : {
"_term" : "asc"
}
}
}
}
}
},
"size": 0
}
Just for reference, as of version 7.2, I used the following to apply multiple filters to an aggregation: a filter aggregation restricts the documents being aggregated, and a bool query inside it combines the conditions.
POST movies/_search?size=0
{
"size": 0,
"aggs": {
"test": {
"filter": {
"bool": {
"must": {
"term": {
"genre": "action"
}
},
"filter": {
"range": {
"year": {
"gte": 1800,
"lte": 3000
}
}
}
}
},
"aggs": {
"year_hist": {
"histogram": {
"field": "year",
"interval": 50
}
}
}
}
}
}
