I want to do a terms aggregation on two fields. I don't want a sub-aggregation; I want the results in two separate bucket groups, as if I had run two separate queries, one for each field. Is it possible to combine these two queries into one?
First query:
{
  "size" : 0,
  "aggs" : {
    "brands" : {
      "terms" : {
        "field" : "my_field1",
        "size" : 15
      },
      "aggs" : {
        "my_field_top_hits1" : {
          "top_hits" : {
            "size" : 1
          }
        }
      }
    }
  }
}
Second query:
{
  "size" : 0,
  "aggs" : {
    "brands" : {
      "terms" : {
        "field" : "my_field2",
        "size" : 15
      },
      "aggs" : {
        "my_field_top_hits2" : {
          "top_hits" : {
            "size" : 1
          }
        }
      }
    }
  }
}
Unless I'm missing something obvious, you just need to do:
{
  "size": 0,
  "aggs": {
    "brands_field1": {
      "terms": {
        "field": "my_field1",
        "size": 15
      },
      "aggs": {
        "my_field_top_hits1": {
          "top_hits": {
            "size": 1
          }
        }
      }
    },
    "brands_field2": {
      "terms": {
        "field": "my_field2",
        "size": 15
      },
      "aggs": {
        "my_field_top_hits2": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
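Each top-level aggregation is computed independently in the same request, so the response contains two sibling bucket groups, exactly as if the two original queries had been run. If you really do want them as two literally separate searches in a single round trip, the _msearch endpoint is another option; a minimal sketch, assuming an index named my_index (replace with your own):

GET /_msearch
{"index": "my_index"}
{"size": 0, "aggs": {"brands_field1": {"terms": {"field": "my_field1", "size": 15}, "aggs": {"my_field_top_hits1": {"top_hits": {"size": 1}}}}}}
{"index": "my_index"}
{"size": 0, "aggs": {"brands_field2": {"terms": {"field": "my_field2", "size": 15}, "aggs": {"my_field_top_hits2": {"top_hits": {"size": 1}}}}}}

The _msearch body is newline-delimited: a header line naming the index to target, followed by a one-line search body, repeated once per search.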
I've got the following Elasticsearch query to get the number of product sales per hour, grouped by product id and hour of sale.
POST /my_sales/_search?size=0
{
  "aggs": {
    "sales_per_hour": {
      "date_histogram": {
        "field": "event_time",
        "fixed_interval": "1h",
        "format": "yyyy-MM-dd:HH:mm"
      },
      "aggs": {
        "sales_per_hour_per_product": {
          "terms": {
            "field": "name.keyword"
          }
        }
      }
    }
  }
}
One example of the data:
{
  "@timestamp" : "2020-10-29T18:09:56.921Z",
  "name" : "my-beautifull_product",
  "event_time" : "2020-10-17T08:01:33.397Z"
}
This query returns several buckets (one per hour and per product), but I would like to retrieve only those with a doc_count higher than 10, for example. Is that possible?
For those results I would like to know the id of the product and the event_time bucket.
Thanks for your help.
Perhaps using the Bucket Selector aggregation will help filter the results.
Try the search query below:
{
  "aggs": {
    "sales_per_hour": {
      "date_histogram": {
        "field": "event_time",
        "fixed_interval": "1h",
        "format": "yyyy-MM-dd:HH:mm"
      },
      "aggs": {
        "sales_per_hour_per_product": {
          "terms": {
            "field": "name.keyword"
          },
          "aggs": {
            "the_filter": {
              "bucket_selector": {
                "buckets_path": {
                  "the_doc_count": "_count"
                },
                "script": "params.the_doc_count > 10"
              }
            }
          }
        }
      }
    }
  }
}
It keeps only those buckets whose document count is greater than 10, based on the script "params.the_doc_count > 10".
Thank you for your help. This is close to what I want, but not exactly it; with the bucket selector I get something like this:
"aggregations" : {
"sales_per_hour" : {
"buckets" : [
{
"key_as_string" : "2020-08-31:23:00",
"key" : 1598914800000,
"doc_count" : 16,
"sales_per_hour_per_product" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "my_product_1",
"doc_count" : 2
},
{
"key" : "my_product_2",
"doc_count" : 2
},
{
"key" : "myproduct_3",
"doc_count" : 12
}
]
}
}
]
}
And sometimes none of the buckets has a count greater than 10. Is it possible to have the same thing, but with the filter on _count applied to the second-level aggregation (sales_per_hour_per_product) rather than to the first level (sales_per_hour)?
I am trying to get the total amount of sales divided by 300 seconds, to get the sales amount per second in a five-minute window.
So far I have only been able to construct the query up to this point. There seems to be no way to do a division on "total_value_five_mins".
My Elasticsearch version is 2.3.
I have tried the Elasticsearch docs but can't make sense of them.
{
  "size": 0,
  "aggs": {
    "five_minute_data": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "5m"
      },
      "aggs": {
        "total_value_five_mins": {
          "sum": {
            "field": "sales"
          }
        }
      }
    }
  }
}
You can use scripting in your sum aggregation like this:
{
  "size": 0,
  "aggs": {
    "five_minute_data": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "5m"
      },
      "aggs": {
        "total_value_five_mins": {
          "sum": {
            "script": {
              "inline": "doc['sales'].value / 300"
            }
          }
        }
      }
    }
  }
}
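Because sum is linear, dividing each document's value by 300 before summing is equivalent to dividing the final sum by 300. If you would rather keep the raw five-minute total and derive the per-second rate from it, Elasticsearch 2.x also has the bucket_script pipeline aggregation; a minimal sketch, reusing the field and aggregation names from the question (the sales_per_second name is made up here):

{
  "size": 0,
  "aggs": {
    "five_minute_data": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "5m"
      },
      "aggs": {
        "total_value_five_mins": {
          "sum": {
            "field": "sales"
          }
        },
        "sales_per_second": {
          "bucket_script": {
            "buckets_path": {
              "total": "total_value_five_mins"
            },
            "script": "total / 300"
          }
        }
      }
    }
  }
}

Each date_histogram bucket then carries both the raw sum and the derived per-second value. (On Elasticsearch 5.x and later the script would be written as params.total / 300.)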
In Elasticsearch the data is loaded from the following JSON.
I want to get the max value of cpu0 and in_eth1 for every ip, sorted by date. Can someone help me with the following query?
{
  "ip": "10.235.13.172",
  "date": "2015-11-09",
  "time": "18:30:00",
  "cpu0": 7,
  "cpu13": 2,
  "cpu14": 1,
  "diskio(%)": 0,
  "memuse(MB)": 824,
  "in_eth1(Mbps)": 34
}
"aggs": {
"events_by_date": {
"date_histogram": {
"field": "date",
"interval": "day"
},
"aggs" : {
"genders" : {
"terms" : {
"field" : "ip",
"size": 100000,
"order" : { "_count" : "asc" }
},
"aggs" : {
"maxcpu" : { "max" : { "field" : "cpu(%)" } },
"maxin" : { "max" : { "field" : "in_eth1(Mbps)" } },
}
}
}
}
}
Requirements:
group by hldId having count(*) = 2
Elasticsearch query:
"aggs": {
"groupByHldId": {
"terms": {
"field": "hldId",
"min_doc_count": 2,
"order" : { "_count" : "asc" }
}
}
}
but no records are returned:
"aggregations" : {
"groupByHldId" : {
"doc_count_error_upper_bound" : -1,
"sum_other_doc_count" : 2660,
"buckets" : [ ]
}
}
but if I change the order to desc, it returns results:
"buckets" : [
{
"key" : 200035075,
"doc_count" : 355
},
and if I leave out min_doc_count, it also returns results:
"buckets" : [
{
"key" : 200000061,
"doc_count" : 1
},
So why does it return empty buckets when min_doc_count is combined with the asc direction?
You can try it like this: a bucket selector with a custom script.
{
  "aggs": {
    "countfield": {
      "terms": {
        "field": "hldId",
        "size": 100,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "criticals": {
          "bucket_selector": {
            "buckets_path": {
              "doc_count": "_count"
            },
            "script": "params.doc_count==2"
          }
        }
      }
    }
  }
}
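Note that the bucket selector only filters among the buckets the terms aggregation actually returns, so size has to be large enough to cover the terms you care about. If you also want to see the documents inside each bucket that survives the selector, a top_hits sub-aggregation can sit next to it; a minimal sketch (the docs name and the size values are arbitrary):

{
  "aggs": {
    "countfield": {
      "terms": {
        "field": "hldId",
        "size": 100
      },
      "aggs": {
        "criticals": {
          "bucket_selector": {
            "buckets_path": {
              "doc_count": "_count"
            },
            "script": "params.doc_count == 2"
          }
        },
        "docs": {
          "top_hits": {
            "size": 2
          }
        }
      }
    }
  }
}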
I am trying to run a post filter on the aggregated data, but it is not working as I expected. Can someone review my query and tell me whether I am doing anything wrong here?
"query" : {
"bool" : {
"must" : {
"range" : {
"versionDate" : {
"from" : null,
"to" : "2016-04-22T23:13:50.000Z",
"include_lower" : false,
"include_upper" : true
}
}
}
}
},
"aggregations" : {
"associations" : {
"terms" : {
"field" : "association.id",
"size" : 0,
"order" : {
"_term" : "asc"
}
},
"aggregations" : {
"top" : {
"top_hits" : {
"from" : 0,
"size" : 1,
"_source" : {
"includes" : [ ],
"excludes" : [ ]
},
"sort" : [ {
"versionDate" : {
"order" : "desc"
}
} ]
}
},
"disabledDate" : {
"filter" : {
"missing" : {
"field" : "disabledDate"
}
}
}
}
}
}
}
STEPS in the query:
Filter by indexDate less than or equal to a given date.
Aggregate based on formId, forming one bucket per formId.
Sort in descending order and return the top hit per bucket.
Run a sub-aggregation filter after the sort sub-aggregation and remove all the documents from buckets where disabledDate is not null. (This is the part that is not working.)
The whole purpose of post_filter is to run after aggregations have been computed. As such, post_filter has no effect whatsoever on aggregation results.
What you can do in your case is to apply a top-level filter aggregation so that documents with no disabledDate are not taken into account in aggregations, i.e. consider only documents with disabledDate.
{
  "query": {
    "bool": {
      "must": {
        "range": {
          "versionDate": {
            "from": null,
            "to": "2016-04-22T23:13:50.000Z",
            "include_lower": true,
            "include_upper": true
          }
        }
      }
    }
  },
  "aggregations": {
    "with_disabled": {
      "filter": {
        "exists": {
          "field": "disabledDate"
        }
      },
      "aggs": {
        "form.id": {
          "terms": {
            "field": "form.id",
            "size": 0
          },
          "aggregations": {
            "top": {
              "top_hits": {
                "size": 1,
                "_source": {
                  "includes": [],
                  "excludes": []
                },
                "sort": [
                  {
                    "versionDate": {
                      "order": "desc"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
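Conversely, if the goal is the opposite of the above, keeping only documents that have no disabledDate (which is what the missing sub-aggregation in the original query hints at), the same pattern works with the filter inverted to a bool/must_not around exists. A sketch of just the filter part, with the without_disabled name made up here; the terms and top_hits sub-aggregations stay unchanged underneath:

"aggregations": {
  "without_disabled": {
    "filter": {
      "bool": {
        "must_not": {
          "exists": {
            "field": "disabledDate"
          }
        }
      }
    },
    "aggs": {
      "form.id": {
        "terms": {
          "field": "form.id",
          "size": 0
        }
      }
    }
  }
}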