How to do sum + cardinality together in an Elasticsearch aggregation, like sum(distinct target) with distinct(sum(amount))

Data table:
target  amount
10      10000
10      10000
15      12000
15      12000
Expected output is:
target  amount
1       10000
1       12000
I want to achieve the above result with an aggregation in Elasticsearch.
Something like the query below, but cardinality and sum can't be used together in this way...
"accessorials": {
"date_histogram": {
"field": "invoicedt",
"interval": "week",
"format": "Y-MM-dd:w",
"keyed": true
},
"aggs": {
"net_amount": {
"sum": {
"field": "netamt"
}
},
"distinct_trknumber": {
"cardinality": {
"field": "target"
},
"sum": {
"field": "amount"
}
}
}
}

A terms aggregation can be used to get the distinct target and amount values.
Query:
{
"size": 0,
"aggs": {
"targets": {
"terms": {
"field": "target",
"size": 10
},
"aggs": {
"amount": {
"terms": {
"field": "amount",
"size": 10
},
"aggs": {
"count": {
"cardinality": {
"field": "amount"
}
}
}
}
}
}
}
}
Result:
"aggregations" : {
"targets" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 10,
"doc_count" : 2,
"amount" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 10000,
"doc_count" : 2,
"count" : {
"value" : 1
}
}
]
}
},
{
"key" : 15,
"doc_count" : 2,
"amount" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 12000,
"doc_count" : 2,
"count" : {
"value" : 1
}
}
]
}
}
]
}
}
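If you also need the sum of the distinct amounts per target (the distinct(sum(amount)) part of the title), one option is a sum_bucket pipeline over the inner terms aggregation. A rough sketch, assuming the same target and amount fields; the distinct_amount metric simply echoes each bucket's single amount value so the pipeline has a metric to sum:
{
  "size": 0,
  "aggs": {
    "targets": {
      "terms": { "field": "target", "size": 10 },
      "aggs": {
        "amounts": {
          "terms": { "field": "amount", "size": 10 },
          "aggs": {
            "distinct_amount": { "min": { "field": "amount" } }
          }
        },
        "sum_of_distinct_amounts": {
          "sum_bucket": { "buckets_path": "amounts>distinct_amount" }
        }
      }
    }
  }
}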

Related

Query filter for searching rollup index works with epoch time fails with date math

How do we query (filter) a rollup index?
For example, based on the query here
Request:
{
"size": 0,
"aggregations": {
"timeline": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "7d"
},
"aggs": {
"nodes": {
"terms": {
"field": "node"
},
"aggs": {
"max_temperature": {
"max": {
"field": "temperature"
}
},
"avg_voltage": {
"avg": {
"field": "voltage"
}
}
}
}
}
}
}
}
Response:
{
"took" : 93,
"timed_out" : false,
"terminated_early" : false,
"_shards" : ... ,
"hits" : {
"total" : {
"value": 0,
"relation": "eq"
},
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"timeline" : {
"buckets" : [
{
"key_as_string" : "2018-01-18T00:00:00.000Z",
"key" : 1516233600000,
"doc_count" : 6,
"nodes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "a",
"doc_count" : 2,
"max_temperature" : {
"value" : 202.0
},
"avg_voltage" : {
"value" : 5.1499998569488525
}
},
{
"key" : "b",
"doc_count" : 2,
"max_temperature" : {
"value" : 201.0
},
"avg_voltage" : {
"value" : 5.700000047683716
}
},
{
"key" : "c",
"doc_count" : 2,
"max_temperature" : {
"value" : 202.0
},
"avg_voltage" : {
"value" : 4.099999904632568
}
}
]
}
}
]
}
}
}
How can I filter to, say, the last 3 days? Is that possible?
For a test case I used a fixed_interval of 1m (one minute, and also 60 minutes) and tried the following, but the error was that all query shards failed. Is it possible to apply a query filter to rollup aggregations?
Test Query for searching rollup index
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "now-3d/d",
"lt": "now/d"
}
}
},
"aggregations": {
"timeline": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "7d"
},
"aggs": {
"nodes": {
"terms": {
"field": "node"
},
"aggs": {
"max_temperature": {
"max": {
"field": "temperature"
}
},
"avg_voltage": {
"avg": {
"field": "voltage"
}
}
}
}
}
}
}
}
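For comparison, the variant with explicit epoch timestamps (rather than date math) does work for me, going through the _rollup_search endpoint. A rough sketch of that request; the index name sensor_rollup and the epoch values below are placeholders, not my actual setup:
GET sensor_rollup/_rollup_search
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": 1516060800000,
        "lt": 1516233600000,
        "format": "epoch_millis"
      }
    }
  },
  "aggregations": {
    "timeline": {
      "date_histogram": { "field": "timestamp", "fixed_interval": "7d" },
      "aggs": {
        "max_temperature": { "max": { "field": "temperature" } },
        "avg_voltage": { "avg": { "field": "voltage" } }
      }
    }
  }
}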

Nested Aggregation for AND Query Not Working

Can someone please help with the question below?
https://discuss.elastic.co/t/nested-aggregation-with-and-always-return-0-match/315722?u=chattes
I have used following aggregations
1. Terms aggregation
2. Bucket selector
3. Nested aggregation
First I group by user id using a terms aggregation, then further group by skill id. Using a bucket selector I filter for users that have documents under both skills.
Query
GET index5/_search
{
"size": 0,
"aggs": {
"users": {
"terms": {
"field": "id",
"size": 10
},
"aggs": {
"skills": {
"nested": {
"path": "skills"
},
"aggs": {
"filter_skill": {
"terms": {
"field": "skills.id",
"size": 10,
"include": [
553,
426
]
}
}
}
},
"bucket_count": {
"bucket_selector": {
"buckets_path": {
"skill_count": "skills>filter_skill._bucket_count"
},
"script": "params.skill_count ==2"
}
}
}
}
}
}
Results
"aggregations" : {
"users" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1,
"skills" : {
"doc_count" : 3,
"filter_skill" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "426",
"doc_count" : 1
},
{
"key" : "553",
"doc_count" : 1
}
]
}
}
},
{
"key" : 2,
"doc_count" : 1,
"skills" : {
"doc_count" : 2,
"filter_skill" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "426",
"doc_count" : 1
},
{
"key" : "553",
"doc_count" : 1
}
]
}
}
}
]
}
}

How to exclude the buckets having doc count equal to 0

I want to exclude those buckets from the date histogram aggregation response whose doc count is equal to 0, and then get the count of the remaining buckets.
The query is :
GET metricbeat-*/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"host.cpu.usage": {
"gte": 0.8
}
}
},
{
"range": {
"#timestamp": {
"gte": "2022-09-22T10:16:00.000Z",
"lte": "2022-09-22T10:18:00.000Z"
}
}
}
]
}
},
"aggs": {
"hostName": {
"terms": {
"field": "host.name"
},
"aggs": {
"docsOverTimeFrame": {
"date_histogram": {
"field": "#timestamp",
"fixed_interval": "10s"
}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"count": "docsOverTimeFrame._bucket_count"
},
"script": {
"source": "params.count == 12"
}
}
}
}
}
}
}
The response that I get right now is :
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 38,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"hostName" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "datahot01",
"doc_count" : 3,
"docsOverTimeFrame" : {
"buckets" : [
{
"key_as_string" : "2022-09-22T10:16:00.000Z",
"key" : 1663841760000,
"doc_count" : 1
},
{
"key_as_string" : "2022-09-22T10:16:10.000Z",
"key" : 1663841770000,
"doc_count" : 1
},
{
"key_as_string" : "2022-09-22T10:16:20.000Z",
"key" : 1663841780000,
"doc_count" : 0
},
{
"key_as_string" : "2022-09-22T10:16:30.000Z",
"key" : 1663841790000,
"doc_count" : 0
},
{
"key_as_string" : "2022-09-22T10:16:40.000Z",
"key" : 1663841800000,
"doc_count" : 0
},
{
"key_as_string" : "2022-09-22T10:16:50.000Z",
"key" : 1663841810000,
"doc_count" : 0
},
{
"key_as_string" : "2022-09-22T10:17:00.000Z",
"key" : 1663841820000,
"doc_count" : 0
},
{
"key_as_string" : "2022-09-22T10:17:10.000Z",
"key" : 1663841830000,
"doc_count" : 0
},
{
"key_as_string" : "2022-09-22T10:17:20.000Z",
"key" : 1663841840000,
"doc_count" : 0
},
{
"key_as_string" : "2022-09-22T10:17:30.000Z",
"key" : 1663841850000,
"doc_count" : 0
},
{
"key_as_string" : "2022-09-22T10:17:40.000Z",
"key" : 1663841860000,
"doc_count" : 0
},
{
"key_as_string" : "2022-09-22T10:17:50.000Z",
"key" : 1663841870000,
"doc_count" : 0
}
]
}
}
]
}
}
}
So, if I can exclude the buckets that have doc count = 0, then, based on the number of buckets formed (the bucket count), I want to check whether it is equal to 12 or not (which I am doing with the bucket_selector aggregation).
Is there some way to exclude the buckets having doc count = 0, and get a bucket count of 2 instead of 12?
I was able to solve the above use case by using a pipeline aggregation (a bucket_selector aggregation) inside the date histogram aggregation.
The modified query is :
{
"query": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "2022-09-22T10:16:00.000Z",
"lte": "2022-09-22T10:22:00.000Z"
}
}
},
{
"range": {
"system.cpu.total.norm.pct": {
"gte": 0.8
}
}
}
]
}
},
"aggs": {
"hostName": {
"terms": {
"field": "host.name"
},
"aggs": {
"docsOverTimeFrame": {
"date_histogram": {
"field": "#timestamp",
"fixed_interval": "10s"
},
"aggs": {
"histogram_doc_count": {
"bucket_selector": {
"buckets_path": {
"the_doc_count": "_count"
},
"script": "params.the_doc_count > 0"
}
}
}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"count": "docsOverTimeFrame._bucket_count"
},
"script": {
"source": "params.count == 12"
}
}
}
}
}
}
}
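A simpler alternative that I did not use above, but which should also drop the empty buckets, is the date_histogram's own min_doc_count parameter (a sketch of just the modified sub-aggregation, no inner bucket_selector needed):
"docsOverTimeFrame": {
  "date_histogram": {
    "field": "@timestamp",
    "fixed_interval": "10s",
    "min_doc_count": 1
  }
}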

Getting avg sub aggregation

I'd like to get the average of a sub-aggregation. For example, I have the daily profit of each branch. I want to sum them to get the total daily profit, and then get the monthly or weekly average of that daily profit. So far I have done this:
{
"size" : 0,
"aggs" : {
"group_by_month": {
"date_histogram": {
"field": "Profit_Day",
"interval": "month",
"format" : "MM-yyyy"
},
"aggs": {
"avgProf": {
"avg": {
"field": "ProfitValue"
}
},
"group_by_day": {
"date_histogram": {
"field": "Profit_Day",
"interval": "day",
"format" : "yyyy-MM-dd"
},
"aggs": {
"prof": {
"sum": {
"field": "ProfitValue"
}
}
}
}
}
}
}
}
The issue is that I am getting the daily sum, which is correct, but instead of the monthly average of the daily sum, I am getting the monthly average of profit from each branch.
You need to use the avg_bucket (average bucket) pipeline aggregation.
Query:
GET sales1/_search
{
"size": 0,
"aggs": {
"group_by_month": {
"date_histogram": {
"field": "proffit_day",
"interval": "month",
"format": "MM-yyyy"
},
"aggs": {
"group_by_day": {
"date_histogram": {
"field": "proffit_day",
"interval": "day",
"format": "yyyy-MM-dd"
},
"aggs": {
"prof": {
"sum": {
"field": "proffit_value"
}
}
}
},
"avg_monthly_sales": {
"avg_bucket": {
"buckets_path": "group_by_day>prof"
}
}
}
}
}
}
Response:
{
"aggregations" : {
"group_by_month" : {
"buckets" : [
{
"key_as_string" : "09-2019",
"key" : 1567296000000,
"doc_count" : 2,
"group_by_day" : {
"buckets" : [
{
"key_as_string" : "2019-09-25",
"key" : 1569369600000,
"doc_count" : 2,
"prof" : {
"value" : 15.0
}
}
]
},
"avg_monthly_sales" : {
"value" : 15.0
}
},
{
"key_as_string" : "10-2019",
"key" : 1569888000000,
"doc_count" : 2,
"group_by_day" : {
"buckets" : [
{
"key_as_string" : "2019-10-01",
"key" : 1569888000000,
"doc_count" : 1,
"prof" : {
"value" : 10.0
}
},
{
"key_as_string" : "2019-10-02",
"key" : 1569974400000,
"doc_count" : 0,
"prof" : {
"value" : 0.0
}
},
{
"key_as_string" : "2019-10-03",
"key" : 1570060800000,
"doc_count" : 1,
"prof" : {
"value" : 15.0
}
}
]
},
"avg_monthly_sales" : {
"value" : 12.5
}
}
]
}
}
}
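A small caveat, not part of the answer above: in Elasticsearch 7.x and later the plain interval parameter is deprecated, so the same date histograms would use calendar_interval (or fixed_interval) instead, for example:
"date_histogram": {
  "field": "proffit_day",
  "calendar_interval": "month",
  "format": "MM-yyyy"
}
and likewise "calendar_interval": "day" for the inner group_by_day histogram.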

Elasticsearch aggregations: how to get bucket with 'other' results of terms aggregation?

I use an aggregation to collect data from a nested field and I'm a little stuck.
Example of document:
{
...
rectangle: {
attributes: [
{_id: 'some_id', ...}
]
}
}
ES lets me group data by rectangle.attributes._id, but is there any way to get some 'other' bucket that collects the documents that were not added to any of the groups? Or maybe there is a way to build a query that creates a bucket for documents matching {"rectangle.attributes._id": {$ne: "{currentDoc}.rectangle.attributes._id"}}.
I think a bucket would be perfect because I need to do further aggregations on the 'other' docs.
Or maybe there's some cool workaround.
I use a query like this for the aggregation:
"aggs": {
"attributes": {
"nested": {
"path": "rectangle.attributes"
},
"aggs": {
"attributesCount": {
"cardinality": {
"field": "rectangle.attributes._id.keyword"
}
},
"entries": {
"terms": {
"field": "rectangle.attributes._id.keyword"
}
}
}
}
}
And get this result
"buckets" : [
{
"key" : "some_parent_id",
"doc_count" : 27616,
"attributes" : {
"doc_count" : 45,
"entries" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "some_id",
"doc_count" : 45,
"attributeOptionsCount" : {
"value" : 2
}
}
]
}
}
}
]
A result like this would be perfect:
"buckets" : [
{
"key" : "some_parent_id",
"doc_count" : 1000,
"attributes" : {
"doc_count" : 145,
"entries" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "some_id",
"doc_count" : 45
},
{
"key" : "other",
"doc_count" : 100
}
]
}
}
}
]
You can make use of the missing value parameter. Update the aggregation as below:
"aggs": {
"attributes": {
"nested": {
"path": "rectangle.attributes"
},
"aggs": {
"attributesCount": {
"cardinality": {
"field": "rectangle.attributes._id.keyword"
}
},
"entries": {
"terms": {
"field": "rectangle.attributes._id.keyword",
"missing": "other"
}
}
}
}
}
