Filtered aggregation query output needed in non-nested format - elasticsearch

I have the following query, which gives the desired output, but in a nested format.
{
"size": 0,
"aggs": {
"Pre_Post": {
"filters": {
"filters": {
"PRE": {
"range": {
"mydate": {
"gte": "2017-12-31||-6M",
"lte": "2017-12-31"
}
}
},
"POST": {
"range": {
"mydate": {
"gte": "2018-08-01",
"lte": "2018-08-07"
}
}
}
}
},
"aggs": {
"dimension1": {
"terms": {
"field": "myType.keyword"
},
"aggs": {
"sales": {
"sum": {
"field": "sales"
}
}
}
}
}
}
}
}
The output of the above is roughly in this format:
"PRE_POST": {
"PRE": {
"buckets": {
"dimension1": {
"key": "field1",
"buckets": {
"sales": 50
}
}
}
}
}
Is there any way to get this in a less nested format, something like the one returned by a composite query?
The desired output would look something like this:
"PRE_POST": {
"Key1": "PRE",
"dimension1": "field1",
"buckets": {
"sales": 50
}
}
I have tried composite, but composite does not allow filters.
I have tried composite with PRE_POST as a script field, but that is very slow.
I have also tried an adjacency matrix, where two filters are for PRE and POST and the others are for each dimension1 value. But this returns too much unnecessary data.
Is there any approach I am missing that would give the output in a less nested format?
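One option that avoids changing the query at all is to flatten the response on the client. The sketch below is only an illustration, not a server-side answer: it assumes the usual response shape for a named filters aggregation (buckets as an object keyed by filter name) wrapping a terms aggregation (buckets as a list) with a sum sub-aggregation. The function name flatten_pre_post and the sample values other than "field1" and 50 are made up for the example.

```python
from typing import Any, Dict, List


def flatten_pre_post(aggregations: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat row per (filter name, term) pair."""
    rows: List[Dict[str, Any]] = []
    # Named filters: "buckets" is an object keyed by the filter name (PRE, POST).
    for period, period_bucket in aggregations["Pre_Post"]["buckets"].items():
        # Terms: "buckets" is a list of {"key", "doc_count", <sub-aggs>}.
        for term_bucket in period_bucket["dimension1"]["buckets"]:
            rows.append({
                "Key1": period,
                "dimension1": term_bucket["key"],
                "sales": term_bucket["sales"]["value"],
            })
    return rows


# Hypothetical "aggregations" section of a search response.
sample = {
    "Pre_Post": {
        "buckets": {
            "PRE": {
                "doc_count": 3,
                "dimension1": {"buckets": [
                    {"key": "field1", "doc_count": 2, "sales": {"value": 50}},
                    {"key": "field2", "doc_count": 1, "sales": {"value": 7}},
                ]},
            },
            "POST": {
                "doc_count": 1,
                "dimension1": {"buckets": [
                    {"key": "field1", "doc_count": 1, "sales": {"value": 12}},
                ]},
            },
        }
    }
}

for row in flatten_pre_post(sample):
    print(row)
```

This keeps the fast filters query as it is and moves the reshaping to a few lines of client code.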

Related

Bucket sort on dynamic aggregation name

I would like to sort my aggregations by their quantity value.
My problem is that each aggregation has a name that can't be known in advance.
Given this query:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"sorting": {
"bucket_sort": {
"sort": [
{
"year>quantity": {
"order": "desc"
}
}
]
}
},
"UNKNOWN_1": {
"aggs": {
"year": {
"filter": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
}
]
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
"UNKNOWN_2": {
"aggs": {
"year": {
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
}
}
},
....
}
}
The path in my bucket_sort aggregation is missing one level to reach that quantity value.
Here is one Elasticsearch record:
{
datetime: '2021-12-01',
item.quantity: 5
}
Note that I have removed most of the request for readability, such as the filter aggregations, etc.
I tried something with a wildcard:
"sorting": {
"bucket_sort": {
"sort": [
{
"*>year>quantity": {
"order": "desc"
}
}
]
}
},
But I got the same error.
Is it possible to achieve this behaviour?
I think you misunderstood the "bucket_sort" aggregation: it won't sort your aggregations but it sorts the buckets coming from one multi-bucket aggregation. Also the bucket_sort aggregation has to be subordinate to that multi-bucket aggregation.
From the docs:
[The bucket sort aggregation is] "a parent pipeline aggregation which sorts the buckets of its parent multi-bucket aggregation"
If I understand correctly, you are trying to create "buckets" with specific filter aggregations, and you can't know in advance how many of those filter aggregations you will create.
For that you can use the "filters" aggregation (the multi-filter variant), where you can specify as many filters as you want and each of them creates a bucket.
Subordinate to that filters aggregation, you can create one single sum aggregation on item.quantity.
Also subordinate to the filters aggregation, you then add your bucket_sort aggregation, where you just have to name the sibling "sum" aggregation.
All in all, it might look like this:
{
"aggs": {
"your_filters": {
"filters": {
"filters": {
"unknown_1": {
"range": {
"datetime": {
"gte": "2021-01-01",
"lte": "2021-12-09"
}
}
},
"unknown_2": {
/** more filters here... **/
}
}
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
},
"sorting": {
"bucket_sort": {
"sort": [
{ "quantity": { "order": "desc" } }
]
}
}
}
}
}
}
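Since the filter names are only known at runtime, the request body can be assembled in code. This is a minimal sketch of that idea: build_sorted_filters_agg is a hypothetical helper, and the range used for "unknown_2" is a placeholder, not something taken from the question.

```python
import json
from typing import Any, Dict


def build_sorted_filters_agg(
    named_filters: Dict[str, Dict[str, Any]],
    field: str = "item.quantity",
) -> Dict[str, Any]:
    """Build a filters aggregation whose buckets are sorted by a sum."""
    return {
        "size": 0,
        "aggs": {
            "your_filters": {
                # One bucket per entry, keyed by the (dynamic) filter name.
                "filters": {"filters": named_filters},
                "aggs": {
                    "quantity": {"sum": {"field": field}},
                    # bucket_sort refers to its sibling "quantity" by name.
                    "sorting": {
                        "bucket_sort": {
                            "sort": [{"quantity": {"order": "desc"}}]
                        }
                    },
                },
            }
        },
    }


# Filter names discovered at runtime; the second range is a placeholder.
dynamic_filters = {
    "unknown_1": {"range": {"datetime": {"gte": "2021-01-01", "lte": "2021-12-09"}}},
    "unknown_2": {"range": {"datetime": {"gte": "2020-01-01", "lte": "2020-12-31"}}},
}

print(json.dumps(build_sorted_filters_agg(dynamic_filters), indent=2))
```

The sort path stays fixed at "quantity" no matter how many filters are passed in, which is what removes the need for a wildcard.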

Elasticsearch Pagination with timestamp range

The official Elasticsearch documentation says that pagination can be implemented with composite aggregations.
A composite aggregation has to be requested several times to fetch all of the results.
So my question is: can I use a range from now-1h to now when I execute a composite aggregation?
If I can, how does the composite aggregation keep the source data unchanged when every range query resolves now to a different time?
If I can't: my query below produces no error, and the result seems to be right.
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"timestamp": {
"gte": "now-1h"
}
}
}
]
}
},
"aggs": {
"user_device": {
"composite": {
"after": {
"user_name": "alen.lv"
},
"size": 100,
"sources": [
{
"user_name": {
"terms": {
"field": "user_name"
}
}
}
]
},
"aggs": {
"user_mac": {
"terms": {
"field": "user_mac",
"size": 1000
}
}
}
}
}
}
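Because now is resolved again on every request, each page of a now-1h query looks at a slightly different window. One way to keep the window stable, sketched here under assumptions, is to compute absolute bounds once on the client and reuse them for every page, following the after_key returned by the composite aggregation. The search argument is a hypothetical callable that sends a request body and returns the parsed response; the sketch also assumes the timestamp field accepts ISO 8601 strings. Documents indexed or deleted between pages can still change the results.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, Iterator, Optional


def build_page_request(
    gte: str, lte: str, after_key: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Build one page of the composite query with fixed time bounds."""
    composite: Dict[str, Any] = {
        "size": 100,
        "sources": [{"user_name": {"terms": {"field": "user_name"}}}],
    }
    if after_key is not None:
        composite["after"] = after_key
    return {
        "size": 0,
        "query": {"bool": {"filter": [
            {"range": {"timestamp": {"gte": gte, "lte": lte}}}
        ]}},
        "aggs": {"user_device": {
            "composite": composite,
            "aggs": {"user_mac": {"terms": {"field": "user_mac", "size": 1000}}},
        }},
    }


def iter_buckets(
    search: Callable[[Dict[str, Any]], Dict[str, Any]], now: datetime
) -> Iterator[Dict[str, Any]]:
    """Yield every composite bucket, using the same window for all pages."""
    lte = now.isoformat()
    gte = (now - timedelta(hours=1)).isoformat()
    after_key: Optional[Dict[str, Any]] = None
    while True:
        response = search(build_page_request(gte, lte, after_key))
        agg = response["aggregations"]["user_device"]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        # No after_key (or no buckets) means the last page was reached.
        if after_key is None or not agg["buckets"]:
            break


# Usage (search is whatever function sends the body to the cluster):
# for bucket in iter_buckets(search, datetime.now(timezone.utc)):
#     ...
```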

Elasticsearch Aggregations: Only return results of one of them?

I'm trying to find a way to only return the results of one aggregation in an Elasticsearch query. I have a max bucket aggregation (the one that I want to see) that is calculated from a sum bucket aggregation based on a date histogram aggregation. Right now, I have to go through 1,440 results to get to the one I want to see. I've already removed the results of the base query with the size: 0 modifier, but is there a way to do something similar with the aggregations as well? I've tried slipping the same thing into a few places with no luck.
Here's the query:
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "2018-11-28",
"lte": "2018-11-28"
}
}
},
"aggs": {
"hits_per_minute": {
"date_histogram": {
"field": "timestamp",
"interval": "minute"
},
"aggs": {
"total_hits": {
"sum": {
"field": "hits_count"
}
}
}
},
"max_transactions_per_minute": {
"max_bucket": {
"buckets_path": "hits_per_minute>total_hits"
}
}
}
}
Fortunately enough, you can do that with bucket_sort aggregation, which was added in Elasticsearch 6.4.
Do it with bucket_sort
POST my_index/doc/_search
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "2018-11-28",
"lte": "2018-11-28"
}
}
},
"aggs": {
"hits_per_minute": {
"date_histogram": {
"field": "timestamp",
"interval": "minute"
},
"aggs": {
"total_hits": {
"sum": {
"field": "hits_count"
}
},
"max_transactions_per_minute": {
"bucket_sort": {
"sort": [
{"total_hits": {"order": "desc"}}
],
"size": 1
}
}
}
}
}
}
This will give you a response like this:
{
...
"aggregations": {
"hits_per_minute": {
"buckets": [
{
"key_as_string": "2018-11-28T21:10:00.000Z",
"key": 1543957800000,
"doc_count": 3,
"total_hits": {
"value": 11
}
}
]
}
}
}
Note that there is no extra aggregation in the output and the output of hits_per_minute is truncated (because we asked to give exactly one, topmost bucket).
Do it with filter_path
There is also a generic way to filter the output of Elasticsearch: Response filtering, as this answer suggests.
In this case it will be enough to just do the following query:
POST my_index/doc/_search?filter_path=aggregations.max_transactions_per_minute
{ ... (original query) ... }
That would give the response:
{
"aggregations": {
"max_transactions_per_minute": {
"value": 11,
"keys": [
"2018-12-04T21:10:00.000Z"
]
}
}
}

Elasticsearch aggregate field between dates

I want to compare two buckets against each other and find new occurrences that appear in the second bucket. The query below returns all entries in the "query.keyword" field between the two UNIX timestamps provided, but I want the UNIX timestamps to be a part of the aggregation section itself.
GET _search
{
"size": 0,
"query": {
"range" :{
"ts": {
"gte":1535155200,
"lte":1535414399
}
}
},
"aggs": {
"domains": {
"terms": {
"field":"query.keyword"
}
}
}
}
I've also tried this but received the error:
"Found two aggregation type definitions in [domains_prev]: [range] and [terms]",
GET _search
{
"size": 0,
"aggs": {
"domains_prev": {
"range" :{
"field":"ts",
"ranges": [
{"to" : 1535414399},
{"from" : 1535155200}
]
},
"terms": {
"field":"query.keyword"
}
}
}
}
The goal is to have something similar to this:
Agg1
"domains_prev"
"field":"query.keyword"
date:gte:timestamp, lte:timestamp
Agg2
"domains_today"
"field":"query.keyword"
date:today
show all "query.keyword" in agg2 that does not appear in agg1.
This is the SQL query that I use to achieve the intended result:
select domains FROM table WHERE date >= 20171123 and domains NOT IN (SELECT domains FROM table WHERE date < 20171123 group by domains)
You'll want to do a nested bucket aggregation starting with date range:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
From their page, start with an aggregation like this at the top level:
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyy",
"ranges": [
{ "to": "now-10M/M" },
{ "from": "now-10M/M" }
]
}
}
}
}
Then nest your existing terms aggregation using query.keyword under that.
The end result should be something like:
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyy",
"ranges": [
{ "to": "now-10M/M" },
{ "from": "now-10M/M" }
]
},
"aggs": {
"domains": {
"terms": {
"field":"query.keyword"
}
}
}
}
}
}
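The nested aggregation returns the terms for each range, but the "in the second bucket and not in the first" step still has to be done somewhere. A rough client-side sketch follows, assuming the usual date_range response shape in which the open-ended earlier bucket has no "from" and the later one has no "to". The function name new_terms and the sample domains are hypothetical. Note that a terms aggregation only returns the top terms (10 by default), so the difference is only as complete as the buckets returned.

```python
from typing import Any, Dict, Set


def new_terms(aggregations: Dict[str, Any]) -> Set[str]:
    """Return terms present in the later range but absent from the earlier one."""
    buckets = aggregations["range"]["buckets"]
    # {"to": X} yields a bucket without "from"; {"from": X} one without "to".
    earlier = next(b for b in buckets if "from" not in b)
    later = next(b for b in buckets if "to" not in b)
    seen_before = {b["key"] for b in earlier["domains"]["buckets"]}
    seen_after = {b["key"] for b in later["domains"]["buckets"]}
    return seen_after - seen_before


# Hypothetical "aggregations" section of a search response.
sample = {
    "range": {"buckets": [
        {"key": "*-2018-08-25", "to": 1535155200000.0, "doc_count": 3,
         "domains": {"buckets": [
             {"key": "a.example", "doc_count": 2},
             {"key": "b.example", "doc_count": 1},
         ]}},
        {"key": "2018-08-25-*", "from": 1535155200000.0, "doc_count": 2,
         "domains": {"buckets": [
             {"key": "b.example", "doc_count": 1},
             {"key": "c.example", "doc_count": 1},
         ]}},
    ]}
}

print(new_terms(sample))
```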

Multiple filters and an aggregate in elasticsearch

How can I use a filter in connection with an aggregate in elasticsearch?
The official documentation gives only trivial examples for filter and for aggregations and no formal description of the query dsl - compare it e.g. with postgres documentation.
Through trial and error I found the following query, which is accepted by elasticsearch (no parsing errors) but ignores the given filters:
{
"filter": {
"and": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"#timestamp": {
"from": 1398169707,
"to": 1400761707
}
}
}
]
},
"size": 0,
"aggs": {
"time_histo": {
"date_histogram": {
"field": "#timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
Some people suggest using query instead of filter. But the official documentation generally recommends the opposite for filtering on exact values. Another issue with query: while filters offer an and, query does not.
Can somebody point me to documentation, a blog, or a book that describes writing non-trivial queries: at least an aggregate plus multiple filters?
I ended up using a filter aggregation - not filtered query. So now I have 3 nested aggs elements.
I also use a bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
My final implementation:
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"#timestamp": {
"from": 1398176502000,
"to": 1400768502000
}
}
}
]
}
},
"aggs": {
"time_histo": {
"date_histogram": {
"field": "#timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
},
"size": 0
}
Put your filter in a filtered-query.
The top-level filter is for filtering search hits only, and not facets/aggregations. It was renamed to post_filter in 1.0 due to this quite common confusion.
Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
More on @geekQ's answer: to support filter strings containing spaces in a multiple-term search, use the following:
{ "aggs": {
"aggresults": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"term_1": "some text with space 1"
}
},
{
"match_phrase": {
"term_2": "some text with also space 2"
}
}
]
}
},
"aggs" : {
"all_term_3s" : {
"terms" : {
"field":"term_3.keyword",
"size" : 10000,
"order" : {
"_term" : "asc"
}
}
}
}
} }, "size": 0 }
Just for reference, as of version 7.2, I used something like the following to apply multiple filters to an aggregation: a filter aggregation to restrict the documents being aggregated, and a bool query to combine the conditions.
POST movies/_search?size=0
{
"size": 0,
"aggs": {
"test": {
"filter": {
"bool": {
"must": {
"term": {
"genre": "action"
}
},
"filter": {
"range": {
"year": {
"gte": 1800,
"lte": 3000
}
}
}
}
},
"aggs": {
"year_hist": {
"histogram": {
"field": "year",
"interval": 50
}
}
}
}
}
}
