Multiple aggregations in Elasticsearch

I want to run a terms aggregation on two fields. I don't want a sub-aggregation; I want the results in two separate bucket groups, as if I had run two separate queries, one per field. Is it possible to combine these two queries into one?
First query:
{
"size" : 0,
"aggs" : {
"brands" : {
"terms" : {
"field" : "my_field1",
"size" : 15
},
"aggs" : {
"my_field_top_hits1" : {
"top_hits" : {
"size" : 1
}
}
}
}
}
}
Second query:
{
"size" : 0,
"aggs" : {
"brands" : {
"terms" : {
"field" : "my_field2",
"size" : 15
},
"aggs" : {
"my_field_top_hits2" : {
"top_hits" : {
"size" : 1
}
}
}
}
}
}

Unless I'm missing something obvious, you just need to do:
{
"size": 0,
"aggs": {
"brands_field1": {
"terms": {
"field": "my_field1",
"size": 15
},
"aggs": {
"my_field_top_hits1": {
"top_hits": {
"size": 1
}
}
}
},
"brands_field2": {
"terms": {
"field": "my_field2",
"size": 15
},
"aggs": {
"my_field_top_hits2": {
"top_hits": {
"size": 1
}
}
}
}
}
}
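For readers who assemble such a request programmatically, here is a minimal Python sketch of the same idea: each field gets its own named entry under one top-level "aggs" object, so the response carries two independent bucket groups. The helper names terms_with_top_hit and combined_body, and the brands_<field> naming scheme, are assumptions made for illustration. No client library is involved; the body is built as a plain dict.

```python
import json


def terms_with_top_hit(field, size=15):
    """Build one terms aggregation with a single top hit per bucket."""
    return {
        "terms": {"field": field, "size": size},
        "aggs": {"top_hit": {"top_hits": {"size": 1}}},
    }


def combined_body(fields):
    """Put one sibling aggregation per field under a single "aggs" key."""
    return {
        "size": 0,
        "aggs": {f"brands_{field}": terms_with_top_hit(field) for field in fields},
    }


body = combined_body(["my_field1", "my_field2"])
print(json.dumps(body, indent=2))
```

Because the two aggregations are siblings rather than nested, neither one restricts the documents seen by the other.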

Related

How to filter by sub-aggregated results in Elasticsearch

I've got the following Elasticsearch query to get the number of product sales per hour, grouped by product id and hour of sale.
POST /my_sales/_search?size=0
{
"aggs": {
"sales_per_hour": {
"date_histogram": {
"field": "event_time",
"fixed_interval": "1h",
"format": "yyyy-MM-dd:HH:mm"
},
"aggs": {
"sales_per_hour_per_product": {
"terms": {
"field": "name.keyword"
}
}
}
}
}
}
An example document:
{
"#timestamp" : "2020-10-29T18:09:56.921Z",
"name" : "my-beautifull_product",
"event_time" : "2020-10-17T08:01:33.397Z"
}
This query returns several buckets (one per hour and per product), but I would like to retrieve only those with a doc_count higher than 10, for example. Is that possible?
For those results I would like to know the id of the product and the event_time bucket.
Thanks for your help.
The bucket_selector pipeline aggregation may help with filtering the results.
Try the search query below:
{
"aggs": {
"sales_per_hour": {
"date_histogram": {
"field": "event_time",
"fixed_interval": "1h",
"format": "yyyy-MM-dd:HH:mm"
},
"aggs": {
"sales_per_hour_per_product": {
"terms": {
"field": "name.keyword"
},
"aggs": {
"the_filter": {
"bucket_selector": {
"buckets_path": {
"the_doc_count": "_count"
},
"script": "params.the_doc_count > 10"
}
}
}
}
}
}
}
}
It keeps only the buckets whose document count is greater than 10, based on the script "params.the_doc_count > 10", and drops all the others.
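To make the selector's behavior concrete, the Python sketch below mimics it on in-memory data. The function name bucket_selector and the sample buckets are hypothetical; this only illustrates the keep-or-drop rule and does not call Elasticsearch.

```python
def bucket_selector(buckets, threshold=10):
    """Keep only buckets whose doc_count is strictly greater than threshold."""
    return [b for b in buckets if b["doc_count"] > threshold]


# Hypothetical product buckets inside a single one-hour bucket.
hour_buckets = [
    {"key": "product_a", "doc_count": 2},
    {"key": "product_b", "doc_count": 10},
    {"key": "product_c", "doc_count": 12},
]

kept = bucket_selector(hour_buckets)
print(kept)
```

Note that a bucket with exactly 10 documents is dropped, since the comparison is strict.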
Thank you for your help. This is close to what I would like, but not exactly. With the bucket selector I get something like this:
"aggregations" : {
"sales_per_hour" : {
"buckets" : [
{
"key_as_string" : "2020-08-31:23:00",
"key" : 1598914800000,
"doc_count" : 16,
"sales_per_hour_per_product" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "my_product_1",
"doc_count" : 2
},
{
"key" : "my_product_2",
"doc_count" : 2
},
{
"key" : "myproduct_3",
"doc_count" : 12
}
]
}
}
]
}
Sometimes none of the buckets is greater than 10. Is it possible to get the same result, but with the filter on _count applied to the second-level aggregation (sales_per_hour_per_product) rather than the first level (sales_per_hour)?

DSL Query: Sale Amount Per Second in a Five Minute Window

I am trying to divide the total sale amount by 300 seconds to get the sale amount per second in a five-minute window.
So far I have only been able to construct the query below. There seems to be no way to do a division on “total_value_five_mins”.
My Elasticsearch version is 2.3.
I have read through the Elasticsearch docs but could not work out how to do this.
{
"size": 0,
"aggs" : {
"five_minute_data" : {
"date_histogram" : {
"field" : "timestamp",
"interval" : "5m"
},
"aggs": {
"total_value_five_mins": {
"sum": {
"field": "sales"
}
}
}
}
}
}
You can use scripting in your sum aggregation like this:
{
"size": 0,
"aggs" : {
"five_minute_data" : {
"date_histogram" : {
"field" : "timestamp",
"interval" : "5m"
},
"aggs": {
"total_value_five_mins": {
"sum": {
"script": {
"inline": "doc.sales.value / 300"
}
}
}
}
}
}
}
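A quick arithmetic check shows why dividing inside the script gives the requested number: summing each value divided by 300 equals dividing the bucket total by 300. The sales figures below are hypothetical, and the snippet is plain Python, not the script language used inside Elasticsearch.

```python
import math

WINDOW_SECONDS = 300  # five minutes

# Hypothetical sales amounts that fall into one five-minute bucket.
sales = [120.0, 45.5, 300.0, 14.5]

# What the scripted sum computes: each value is divided first, then summed.
scripted_sum = sum(value / WINDOW_SECONDS for value in sales)

# The quantity asked for: the bucket total divided by the window length.
per_second = sum(sales) / WINDOW_SECONDS

print(scripted_sum, per_second)
```

The two results agree up to floating-point rounding, so the scripted sum can stand in for a division applied after the aggregation.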

Query multiple fields by date and IP in Elasticsearch

My Elasticsearch data is loaded from JSON documents like the one below.
I want to get the max value of cpu0 and in_eth1 for every IP, sorted by date. Can someone help me with the following query?
{
"ip":"10.235.13.172",
"date":"2015-11-09",
"time":"18:30:00",
"cpu0":7,
"cpu13":2,
"cpu14":1,
"diskio(%)":0,
"memuse(MB)":824,
"in_eth1(Mbps)":34
}
"aggs": {
"events_by_date": {
"date_histogram": {
"field": "date",
"interval": "day"
},
"aggs" : {
"genders" : {
"terms" : {
"field" : "ip",
"size": 100000,
"order" : { "_count" : "asc" }
},
"aggs" : {
"maxcpu" : { "max" : { "field" : "cpu(%)" } },
"maxin" : { "max" : { "field" : "in_eth1(Mbps)" } }
}
}
}
}
}

Why can't Elasticsearch support min_doc_count together with order by _count asc?

Requirements:
group by hldId having count(*) = 2
Elasticsearch query:
"aggs": {
"groupByHldId": {
"terms": {
"field": "hldId",
"min_doc_count": 2,
"order" : { "_count" : "asc" }
}
}
}
But no records are returned:
"aggregations" : {
"groupByHldId" : {
"doc_count_error_upper_bound" : -1,
"sum_other_doc_count" : 2660,
"buckets" : [ ]
}
}
If the order is changed to desc, it does return buckets:
"buckets" : [
{
"key" : 200035075,
"doc_count" : 355
},
Without min_doc_count, it also returns buckets:
"buckets" : [
{
"key" : 200000061,
"doc_count" : 1
},
So why does it return an empty result when both min_doc_count and the asc direction are used?
You can try a bucket selector with a custom script, like this:
{
"aggs": {
"countfield": {
"terms": {
"field": "hldId",
"size": 100,
"order": {
"_count": "desc"
}
},
"aggs": {
"criticals": {
"bucket_selector": {
"buckets_path": {
"doc_count": "_count"
},
"script": "params.doc_count==2"
}
}
}
}
}
}
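As for why the original combination comes back empty, one likely explanation is that each shard first returns only its first shard_size terms in the requested order, and min_doc_count is applied afterwards, when the shard results are merged. With ascending order the candidates are the rarest terms, so none of them may reach the minimum. The Python sketch below is a deliberately simplified model of that two-step process, not the actual implementation; the function names, the sample counts, and the shard_size of 25 are all assumptions.

```python
from collections import Counter


def shard_terms(counts, shard_size, ascending):
    """Return the shard's first shard_size terms in the requested count order."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=not ascending)
    return ordered[:shard_size]


def reduce_terms(shard_results, size, min_doc_count, ascending):
    """Merge per-shard candidates, then apply min_doc_count and size."""
    merged = Counter()
    for result in shard_results:
        for key, count in result:
            merged[key] += count
    kept = [(k, c) for k, c in merged.items() if c >= min_doc_count]
    kept.sort(key=lambda kv: kv[1], reverse=not ascending)
    return kept[:size]


# Hypothetical single shard: 100 ids seen once, 3 ids seen twice, 1 id seen 355 times.
shard = {f"once_{i}": 1 for i in range(100)}
shard.update({"twice_a": 2, "twice_b": 2, "twice_c": 2, "big": 355})

asc = reduce_terms([shard_terms(shard, 25, True)], 10, 2, True)
desc = reduce_terms([shard_terms(shard, 25, False)], 10, 2, False)
print(asc)   # every ascending candidate has count 1, so none passes min_doc_count
print(desc)  # the high-count ids are among the candidates and pass
```

In this model the ids that occur twice never become candidates under ascending order, which is why the bucket selector above starts from a descending terms aggregation with a larger size and filters afterwards.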

Post filter on subaggregation in elasticsearch

I am trying to run a post filter on the aggregated data, but it is not working as I expected. Can someone review my query and tell me whether I am doing anything wrong?
"query" : {
"bool" : {
"must" : {
"range" : {
"versionDate" : {
"from" : null,
"to" : "2016-04-22T23:13:50.000Z",
"include_lower" : false,
"include_upper" : true
}
}
}
}
},
"aggregations" : {
"associations" : {
"terms" : {
"field" : "association.id",
"size" : 0,
"order" : {
"_term" : "asc"
}
},
"aggregations" : {
"top" : {
"top_hits" : {
"from" : 0,
"size" : 1,
"_source" : {
"includes" : [ ],
"excludes" : [ ]
},
"sort" : [ {
"versionDate" : {
"order" : "desc"
}
} ]
}
},
"disabledDate" : {
"filter" : {
"missing" : {
"field" : "disabledDate"
}
}
}
}
}
}
}
Steps in the query:
1. Filter by indexDate less than or equal to a given date.
2. Aggregate based on formId, forming one bucket per formId.
3. Sort in descending order and return the top hit per bucket.
4. Run a sub-aggregation filter after the sort sub-aggregation and remove all the documents from buckets where the disabled date is not null (this is the part that is not working).
The whole purpose of post_filter is to run after aggregations have been computed. As such, post_filter has no effect whatsoever on aggregation results.
What you can do in your case is to apply a top-level filter aggregation so that documents with no disabledDate are not taken into account in aggregations, i.e. consider only documents with disabledDate.
{
"query": {
"bool": {
"must": {
"range": {
"versionDate": {
"from": null,
"to": "2016-04-22T23:13:50.000Z",
"include_lower": true,
"include_upper": true
}
}
}
}
},
"aggregations": {
"with_disabled": {
"filter": {
"exists": {
"field": "disabledDate"
}
},
"aggs": {
"form.id": {
"terms": {
"field": "form.id",
"size": 0
},
"aggregations": {
"top": {
"top_hits": {
"size": 1,
"_source": {
"includes": [],
"excludes": []
},
"sort": [
{
"versionDate": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
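The order of operations in the query above can be sketched in plain Python: the filter narrows the documents first, the terms step groups what remains, and the top hit is picked per group. The sample documents are hypothetical, the query's range clause is skipped for brevity, and nothing here calls Elasticsearch.

```python
from collections import defaultdict

# Hypothetical documents; field names mirror the query above.
docs = [
    {"form.id": "f1", "versionDate": "2016-04-01T00:00:00Z", "disabledDate": "2016-04-10T00:00:00Z"},
    {"form.id": "f1", "versionDate": "2016-04-15T00:00:00Z", "disabledDate": "2016-04-20T00:00:00Z"},
    {"form.id": "f1", "versionDate": "2016-04-20T00:00:00Z"},  # no disabledDate
    {"form.id": "f2", "versionDate": "2016-03-01T00:00:00Z"},  # no disabledDate
]

# Step 1: the filter aggregation ("exists") narrows the document set first.
with_disabled = [d for d in docs if d.get("disabledDate") is not None]

# Step 2: the terms aggregation groups the remaining documents by form.id.
buckets = defaultdict(list)
for d in with_disabled:
    buckets[d["form.id"]].append(d)

# Step 3: top_hits with size 1, sorted by versionDate descending.
# ISO-8601 strings in the same format and zone sort chronologically.
top = {key: max(group, key=lambda d: d["versionDate"]) for key, group in buckets.items()}
print(top)
```

Because the filter runs before the grouping, the newest f1 document (which has no disabledDate) is never considered, and f2 produces no bucket at all.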

Resources