In Elasticsearch I have the following index, whose documents contain 'allocated_bytes', 'total_bytes' and other fields:
{
"_index" : "metrics-blockstore_capacity-2017_06",
"_type" : "datapoint",
"_id" : "AVzHwgsi9KuwEU6jCXy5",
"_score" : 1.0,
"_source" : {
"timestamp" : 1498000001000,
"resource_guid" : "2185d15c-5298-44ac-8646-37575490125d",
"allocated_bytes" : 1.159196672E9,
"resource_type" : "machine",
"total_bytes" : 1.460811776E11,
"machine" : "2185d15c-5298-44ac-8646-37575490125d"
}
}
I have the following query, which:
1) gets one point per 30-minute interval using a date_histogram,
2) groups by the resource_guid field, and
3) uses a max aggregation to find the max value per group.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": 1497992400000,
"lte": 1497996000000
}
}
}
]
}
},
"aggregations": {
"groupByTime": {
"date_histogram": {
"field": "timestamp",
"interval": "30m",
"order": {
"_key": "desc"
}
},
"aggregations": {
"groupByField": {
"terms": {
"size": 1000,
"field": "resource_guid"
},
"aggregations": {
"maxValue": {
"max": {
"field": "allocated_bytes"
}
}
}
},
"sumUnique": {
"sum_bucket": {
"buckets_path": "groupByField>maxValue"
}
}
}
}
}
}
With this query I only get allocated_bytes, but I need both allocated_bytes and total_bytes at each result point.
The following is the result of the above query:
{
"key_as_string" : "2017-06-20T21:00:00.000Z",
"key" : 1497992400000,
"doc_count" : 9,
"groupByField" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "2185d15c-5298-44ac-8646-37575490125d",
"doc_count" : 3,
"maxValue" : {
"value" : 1.156182016E9
}
}, {
"key" : "c3513cdd-58bb-4f8e-9b4c-467230b4f6e2",
"doc_count" : 3,
"maxValue" : {
"value" : 1.156165632E9
}
}, {
"key" : "eff13403-9737-4d08-9dca-fb6c12c3a6fa",
"doc_count" : 3,
"maxValue" : {
"value" : 1.156182016E9
}
} ]
},
"sumUnique" : {
"value" : 3.468529664E9
}
}
I need both allocated_bytes and total_bytes. How do I get multiple fields (allocated_bytes, total_bytes) for each point?
For example:
"sumUnique" : {
"Allocatedvalue" : 3.468529664E9,
"TotalValue" : 9.468529664E9
}
or like this:
"allocatedBytessumUnique" : {
"value" : 3.468529664E9
},
"totalBytessumUnique" : {
"value" : 9.468529664E9
}
You can just add another aggregation:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": 1497992400000,
"lte": 1497996000000
}
}
}
]
}
},
"aggregations": {
"groupByTime": {
"date_histogram": {
"field": "timestamp",
"interval": "30m",
"order": {
"_key": "desc"
}
},
"aggregations": {
"groupByField": {
"terms": {
"size": 1000,
"field": "resource_guid"
},
"aggregations": {
"maxValueAllocated": {
"max": {
"field": "allocated_bytes"
}
},
"maxValueTotal": {
"max": {
"field": "total_bytes"
}
}
}
},
"sumUniqueAllocatedBytes": {
"sum_bucket": {
"buckets_path": "groupByField>maxValueAllocated"
}
},
"sumUniqueTotalBytes": {
"sum_bucket": {
"buckets_path": "groupByField>maxValueTotal"
}
}
}
}
}
}
Note that sum_bucket is a sibling pipeline aggregation: it sums one metric across the buckets of a sibling aggregation. In this case it gives the sum of the per-resource max values, not the sum of total_bytes over all documents. If you want the sum of total_bytes, use a sum aggregation instead.
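To make the difference concrete, here is a minimal Python sketch that mimics the two calculations outside Elasticsearch. The sample documents and the helper names (`sum_of_group_max`, `plain_sum`) are made up for illustration and are not part of any Elasticsearch client.

```python
from collections import defaultdict

# Hypothetical documents falling into one 30-minute bucket (values are made up).
docs = [
    {"resource_guid": "a", "total_bytes": 100.0},
    {"resource_guid": "a", "total_bytes": 120.0},
    {"resource_guid": "b", "total_bytes": 200.0},
    {"resource_guid": "b", "total_bytes": 180.0},
]


def sum_of_group_max(docs, group_field, value_field):
    """Mimic terms > max followed by sum_bucket: one max per group, then summed."""
    max_per_group = defaultdict(lambda: float("-inf"))
    for doc in docs:
        key = doc[group_field]
        max_per_group[key] = max(max_per_group[key], doc[value_field])
    return sum(max_per_group.values())


def plain_sum(docs, value_field):
    """Mimic a plain sum aggregation: every document contributes."""
    return sum(doc[value_field] for doc in docs)


sum_of_max = sum_of_group_max(docs, "resource_guid", "total_bytes")  # 120 + 200
total = plain_sum(docs, "total_bytes")  # 100 + 120 + 200 + 180
print(sum_of_max, total)
```

For a capacity metric reported repeatedly by each machine, the sum of per-machine max values is usually the figure you want, because a plain sum counts every sample of the same machine again.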
How do we query (filter) a rollup index?
For example, based on the query here
Request:
{
"size": 0,
"aggregations": {
"timeline": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "7d"
},
"aggs": {
"nodes": {
"terms": {
"field": "node"
},
"aggs": {
"max_temperature": {
"max": {
"field": "temperature"
}
},
"avg_voltage": {
"avg": {
"field": "voltage"
}
}
}
}
}
}
}
}
Response:
{
"took" : 93,
"timed_out" : false,
"terminated_early" : false,
"_shards" : ... ,
"hits" : {
"total" : {
"value": 0,
"relation": "eq"
},
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"timeline" : {
"buckets" : [
{
"key_as_string" : "2018-01-18T00:00:00.000Z",
"key" : 1516233600000,
"doc_count" : 6,
"nodes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "a",
"doc_count" : 2,
"max_temperature" : {
"value" : 202.0
},
"avg_voltage" : {
"value" : 5.1499998569488525
}
},
{
"key" : "b",
"doc_count" : 2,
"max_temperature" : {
"value" : 201.0
},
"avg_voltage" : {
"value" : 5.700000047683716
}
},
{
"key" : "c",
"doc_count" : 2,
"max_temperature" : {
"value" : 202.0
},
"avg_voltage" : {
"value" : 4.099999904632568
}
}
]
}
}
]
}
}
}
How can I filter to, say, the last 3 days? Is that possible?
For a test case, I used a fixed_interval of 1m (one minute, and also 60 minutes) and tried the following, but it failed with an error saying all query shards failed. Is it possible to filter rollup aggregations with a query?
Test query for searching the rollup index:
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "now-3d/d",
"lt": "now/d"
}
}
},
"aggregations": {
"timeline": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "7d"
},
"aggs": {
"nodes": {
"terms": {
"field": "node"
},
"aggs": {
"max_temperature": {
"max": {
"field": "temperature"
}
},
"avg_voltage": {
"avg": {
"field": "voltage"
}
}
}
}
}
}
}
}
With the following query, I use moving_fn to get the minimum value over each 15-minute window. Now I need the maximum of those values for each 1-hour chunk. As I understand it, another aggregation cannot be applied on top of the moving_fn output. How can this be done?
This is my query:
GET logstash-2021.12.2*/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"@timestamp": {
"gte": "now-24h"
}
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"company": "BLAH-BLAH"
}
}
]
}
}
]
}
},
"size": 0,
"aggs": {
"myDatehistogram": {
"date_histogram": {
"field": "@timestamp",
"interval": "1m",
"offset": "+30s"
}, "aggs": {
"the_count": {
"moving_fn": {
"buckets_path": "_count",
"window": 15,
"script": "MovingFunctions.min(values)"
}
}
}
}
}
}
My response:
"aggregations" : {
"myDatehistogram" : {
"buckets" : [
{
"key_as_string" : "2021-12-25T05:58:30.000Z",
"key" : 1640411910000,
"doc_count" : 1196,
"the_count" : {
"value" : null
}
},
{
"key_as_string" : "2021-12-25T05:59:30.000Z",
"key" : 1640411970000,
"doc_count" : 1942,
"the_count" : {
"value" : 1196.0
}
},
{
"key_as_string" : "2021-12-25T06:00:30.000Z",
"key" : 1640412030000,
"doc_count" : 1802,
"the_count" : {
"value" : 1196.0
}
},
{
"key_as_string" : "2021-12-25T06:01:30.000Z",
"key" : 1640412090000,
"doc_count" : 1735,
"the_count" : {
"value" : 1196.0
}
},
{
"key_as_string" : "2021-12-25T06:02:30.000Z",
"key" : 1640412150000,
"doc_count" : 1699,
"the_count" : {
"value" : 1196.0
}
},
{
"key_as_string" : "2021-12-25T06:03:30.000Z",
"key" : 1640412210000,
"doc_count" : 1506,
"the_count" : {
"value" : 1196.0
}
}
From this response, I need to get the maximum value for each hour. Thank you in advance
Just add a second moving_fn aggregation next to the first one:
"myDatehistogram": {
"date_histogram": {
"field": "@timestamp",
"interval": "1m",
"offset": "+30s"
}, "aggs": {
"min_15": {
"moving_fn": {
"buckets_path": "_count",
"window": 15,
"script": "MovingFunctions.min(values)"
}
},
"max_60": {
"moving_fn": {
"buckets_path": "_count",
"window": 60,
"script": "MovingFunctions.max(values)"
}
}
}
}
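If what you need is the hourly maximum of the 15-minute minimums themselves (rather than a moving max over the raw counts), one option is to post-process the response on the client. The following is only a sketch under that assumption: `hourly_max` is a hypothetical helper, and it expects the bucket shape shown in the response above, where `key` is epoch milliseconds and the metric may be null.

```python
HOUR_MS = 60 * 60 * 1000


def hourly_max(buckets, metric="the_count"):
    """Group date_histogram buckets by UTC hour and keep the largest non-null metric value."""
    result = {}
    for bucket in buckets:
        value = bucket[metric]["value"]
        if value is None:
            # moving_fn has no previous values for the first bucket, so it returns null.
            continue
        hour_start = bucket["key"] - bucket["key"] % HOUR_MS
        if hour_start not in result or value > result[hour_start]:
            result[hour_start] = value
    return result


# First three buckets copied from the response above.
buckets = [
    {"key": 1640411910000, "doc_count": 1196, "the_count": {"value": None}},
    {"key": 1640411970000, "doc_count": 1942, "the_count": {"value": 1196.0}},
    {"key": 1640412030000, "doc_count": 1802, "the_count": {"value": 1196.0}},
]
per_hour = hourly_max(buckets)
print(per_hour)
```

The keys of the returned dict are the epoch-millisecond starts of each hour. Buckets are assigned by their own start time, so the 30-second offset does not move a bucket into a different hour unless it starts in that hour.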
I created the following mapping. I would like to count the number of products in the nested "products" field for each document separately. I would also like to run a histogram aggregation over those counts, so that I know how many documents fall into each bucket size.
PUT /receipts
{
"mappings": {
"properties": {
"id" : {
"type": "integer"
},
"user_id" : {
"type": "integer"
},
"date" : {
"type": "date"
},
"sum" : {
"type": "double"
},
"products" : {
"type": "nested",
"properties": {
"name" : {
"type" : "text"
},
"number" : {
"type" : "double"
},
"price_single" : {
"type" : "double"
},
"price_total" : {
"type" : "double"
}
}
}
}
}
}
I've tried this query, but I get the total number of products across all documents instead of the number of products for each document separately.
GET /receipts/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"terms": {
"nested": {
"path": "products"
},
"aggs": {
"bucket_size": {
"value_count": {
"field": "products"
}
}
}
}
}
}
Result of the query:
"aggregations" : {
"terms" : {
"doc_count" : 6552,
"bucket_size" : {
"value" : 0
}
}
}
UPDATE
Now I have this code where I make separate buckets for each id and count the number of products inside them.
GET /receipts/_search
{
"query": {
"match_all": {}
},
"size" : 0,
"aggs": {
"terms":{
"terms":{
"field": "_id"
},
"aggs": {
"nested": {
"nested": {
"path": "products"
},
"aggs": {
"bucket_size": {
"value_count": {
"field": "products.number"
}
}
}
}
}
}
}
}
Result of the query:
"aggregations" : {
"terms" : {
"doc_count_error_upper_bound" : 5,
"sum_other_doc_count" : 490,
"buckets" : [
{
"key" : "1",
"doc_count" : 1,
"nested" : {
"doc_count" : 21,
"bucket_size" : {
"value" : 21
}
}
},
{
"key" : "10",
"doc_count" : 1,
"nested" : {
"doc_count" : 5,
"bucket_size" : {
"value" : 5
}
}
},
{
"key" : "100",
"doc_count" : 1,
"nested" : {
"doc_count" : 12,
"bucket_size" : {
"value" : 12
}
}
},
...
Is it possible to group these values (21, 5, 12, ...) into buckets to make a histogram of them?
products is only the path to the array of individual products, not an aggregatable field. So you'll need to run value_count on one of the product's fields, such as number:
GET receipts/_search
{
"size": 0,
"aggs": {
"terms": {
"nested": {
"path": "products"
},
"aggs": {
"bucket_size": {
"value_count": {
"field": "products.number"
}
}
}
}
}
}
Note that if a product has no number, it will not contribute to the total count. It's therefore best practice to always include an ID in each product and then aggregate on that field.
Alternatively, you could use a script to account for missing values. Luckily, value_count does not deduplicate, meaning that if two products are alike and/or have empty values, they'll still be counted as two:
GET receipts/_search
{
"size": 0,
"aggs": {
"terms": {
"nested": {
"path": "products"
},
"aggs": {
"bucket_size": {
"value_count": {
"script": {
"source": "doc['products.number'].toString()"
}
}
}
}
}
}
}
UPDATE
You could also use a composite aggregation inside the nested aggregation, which will give you the histogrammed product count together with the corresponding receipt id:
GET /receipts/_search
{
"size": 0,
"aggs": {
"my_aggs": {
"nested": {
"path": "products"
},
"aggs": {
"composite_parent": {
"composite": {
"sources": [
{
"receipt_id": {
"terms": {
"field": "_id"
}
}
},
{
"product_number": {
"histogram": {
"field": "products.number",
"interval": 1
}
}
}
]
}
}
}
}
}
}
The interval is modifiable.
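If the goal is a histogram of the per-receipt counts (21, 5, 12, ...), another option is to bin the terms buckets from the question's second query on the client. This is a hedged sketch rather than an Elasticsearch feature: `histogram_of_counts` is a hypothetical helper, and the interval of 5 is an arbitrary choice. Keep in mind that a terms aggregation only returns its top buckets (the sample output reports sum_other_doc_count of 490), so you would need to page through all receipts for a complete picture.

```python
from collections import Counter


def histogram_of_counts(terms_buckets, interval=5):
    """Bin each receipt's product count into fixed-width buckets keyed by lower bound."""
    histogram = Counter()
    for bucket in terms_buckets:
        count = bucket["nested"]["bucket_size"]["value"]
        lower_bound = (count // interval) * interval
        histogram[lower_bound] += 1
    return dict(sorted(histogram.items()))


# Buckets copied from the question's result (one bucket per receipt id).
terms_buckets = [
    {"key": "1", "doc_count": 1, "nested": {"doc_count": 21, "bucket_size": {"value": 21}}},
    {"key": "10", "doc_count": 1, "nested": {"doc_count": 5, "bucket_size": {"value": 5}}},
    {"key": "100", "doc_count": 1, "nested": {"doc_count": 12, "bucket_size": {"value": 12}}},
]
sizes = histogram_of_counts(terms_buckets, interval=5)
print(sizes)
```

Each key in the result is the lower bound of a bin and each value is the number of receipts whose product count falls in that bin.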
I am running the following aggregation query with a nested filter:
GET <indexname>/_search
{
"aggs": {
"NAME": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"agg_filter": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "crm",
"query": {
"terms": {
"crm.City.keyword": [
"Rewa"
]
}
}
}
},
{
"nested": {
"path": "crm",
"query": {
"terms": {
"crm.LeadID": [
27961
]
}
}
}
}
]
}
},
"aggs": {
"agg_terms":{
"terms": {
"field": "crm.LeadStatusHistory.StatusID",
"size": 1000
}
}
}
}
}
}
}
}
I have the following document:
{
"_index" : "crm",
"_type" : "_doc",
"_id" : "4478",
"_score" : 1.0,
"_source" : {
"crm" : [
{
"LeadStatusHistory" : [
{
"StatusID" : 3
},
{
"StatusID" : 2
},
{
"StatusID" : 1
}
],
"LeadID" : 27961,
"City" : "Rewa"
},
{
"LeadStatusHistory" : [
{
"StatusID" : 1
},
{
"StatusID" : 3
},
{
"StatusID" : 2
}
],
"LeadID" : 27959,
"City" : "Rewa"
}
]
}
}
However, in the response I am getting the following result:
"aggregations" : {
"NAME" : {
"doc_count" : 4332,
"agg_filter" : {
"doc_count" : 1,
"agg_terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1
}
]
}
}
}
}
Question: according to the source document, I have 3 nested 'crm.LeadStatusHistory' documents for crm.LeadID = 27961. However, the result shows a doc_count of 1 for agg_filter instead of 3. Can you please explain why?
Your agg_filter runs in the context of crm.LeadStatusHistory, so it will target only 1 doc (LeadStatusHistory is one doc which, in your case, contains links to the other docs).
I built a query that shows this, and I think it will answer your problem. You will see a different doc_count for each aggregation.
{
"size": 0,
"aggs": {
"NAME": {
"nested": {
"path": "crm"
},
"aggs": {
"agg_LeadID": {
"terms": {
"field": "crm.LeadID"
},
"aggs": {
"agg_LeadStatusHistory": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"home_type_name": {
"terms": {
"field": "crm.LeadStatusHistory.StatusID"
}
}
}
}
}
}
}
}
}
}
With this one you can count them using a script (and add a filter if needed):
{
"size": 0,
"aggs": {
"NAME": {
"nested": {
"path": "crm"
},
"aggs": {
"agg_LeadID": {
"terms": {
"field": "crm.LeadID"
},
"aggs": {
"agg_LeadStatusHistory": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"agg_LeadStatusHistory_sum": {
"sum": {
"script": "doc['crm.LeadStatusHistory.StatusID'].values.length"
}
}
}
}
}
}
}
}
}
}
Note: if you want to get the number of nested documents, take a look at inner_hits:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-inner-hits
I disagree with the answer's claim that 'crm.LeadStatusHistory' is one doc. I ran an aggregation query on crm.LeadStatusHistory without filters:
GET crm/_search
{
"_source": ["crm.LeadID","crm.LeadStatusHistory.StatusID","crm.City"],
"size": 10000,
"query": {
"nested": {
"path": "crm",
"query": {
"match": {
"crm.LeadID": "27961"
}
}
}
},
"aggs": {
"agg_statuscount": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"agg_terms":{
"terms": {
"field": "crm.LeadStatusHistory.StatusID",
"size": 1000
}
}
}
}
}
}
I get the following response from the above query, which shows a doc_count of 6 for 'agg_statuscount' when no filter is applied:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "crm",
"_type" : "_doc",
"_id" : "4478",
"_score" : 1.0,
"_source" : {
"crm" : [
{
"LeadStatusHistory" : [
{
"StatusID" : 3
},
{
"StatusID" : 2
},
{
"StatusID" : 1
}
],
"LeadID" : 27961,
"City" : "Rewa"
},
{
"LeadStatusHistory" : [
{
"StatusID" : 1
},
{
"StatusID" : 3
},
{
"StatusID" : 2
}
],
"LeadID" : 27959,
"City" : "Rewa"
}
]
}
}
]
},
"aggregations" : {
"agg_statuscount" : {
"doc_count" : 6,
"agg_terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 2
},
{
"key" : 2,
"doc_count" : 2
},
{
"key" : 3,
"doc_count" : 2
}
]
}
}
}
}
Hence, with crm.LeadID = 27961 in the aggregation filter, I expected 3 'crm.LeadStatusHistory' docs. Currently the response is 1, as in my original question.
I have this query
GET /my_index3/_search
{
"size": 0,
"aggs": {
"num1": {
"terms": {
"field": "num1.keyword",
"order" : { "_count" : "desc" }
},
"aggs": {
"count_of_distinct_suffix": {
"cardinality" :{
"field" : "suffix.keyword"
},
"aggs": {
"filter_count": {
"bucket_selector": {
"buckets_path": {
"the_doc_count": "_count"
},
"script": "params.doc_count == 2"
}
}
}
}
}
}
}
}
Output:
"key" : "1563866656878888",
"doc_count" : 42,
"count_of_distinct_suffix" : {
"value" : 2
}
},
{
"key" : "1563866656871111",
"doc_count" : 40,
"count_of_distinct_suffix" : {
"value" : 2
}
},
{
"key" : "1563867854325555",
"doc_count" : 36,
"count_of_distinct_suffix" : {
"value" : 1
}
},
{
"key" : "1563867854323333",
"doc_count" : 12,
"count_of_distinct_suffix" : {
"value" : 1
}
},
I want to see only the results which have "count_of_distinct_suffix" : { "value" : 2 }
I'm thinking about bucket selector aggregation but it's impossible to add it into the cardinality aggs...
"aggs": {
"my_filter": {
"bucket_selector": {
"buckets_path": {
"the_doc_count": "_count"
},
"script": "params.doc_count == 2"
}
}
}
It gives me the following error: Aggregator [count_of_distinct_suffix] of type [cardinality] cannot accept sub-aggregations
Do you guys have any idea to solve it?
Thank you very much for any help in advance !!
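You can't add the bucket_selector as a sub-aggregation of the cardinality aggregation, because metric aggregations do not accept sub-aggregations. Instead, add it as a sibling of the cardinality aggregation inside the terms aggregation, and point buckets_path at count_of_distinct_suffix so the script can read it as params.the_doc_count: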
{
"size": 0,
"aggs": {
"num1": {
"terms": {
"field": "num1.keyword",
"order": {
"_count": "desc"
}
},
"aggs": {
"count_of_distinct_suffix": {
"cardinality": {
"field": "suffix.keyword"
}
},
"my_filter": {
"bucket_selector": {
"buckets_path": {
"the_doc_count": "count_of_distinct_suffix"
},
"script": "params.the_doc_count == 2"
}
}
}
}
}
}