How to use aggregation by time range and terms - Elasticsearch

Idea: search for the top events within a specific date range and order them by start_time, like this:
{
"from": 0,
"size": 7,
"query": {
"filtered": {
"query": { "match_all": {} },
"filter": {
"and": [
{ "bool": { "must_not": { "term": { "status": "OK" } } } },
{ "bool": { "must": { "term": { "is_blocked": false } } } }, {
"range": {
"start_time": {
"gte": "2016-01-01",
"lte": "2016-03-01"
}
}
}, {
"bool": {
"must": {
"geo_distance": {
"distance": "150km",
"coordinates": "xx.xxx, zz.zz "
}
}
}
}
]
}
}
},
"sort": [{ "start_time": "asc" },
{ "attending": "desc" }
]
}
I am quite new to the concept of aggregations, so I still have trouble with the basics.
I want 7 results: the top events for the next 2 months. So there are two attributes to consider. "Top" is defined by the highest number of people attending (the attending field), but I also want to order the results by time (start_time: asc).
Here is what I started to write, but it is wrong:
{
"aggs": {
"aggs": {
"event_interval": {
"date_histogram": {
"field": "start_time",
"interval": "2M",
"format": "dateOptionalTime"
}
},
"max_attending": { "max": { "field": "attending" } },
"_source": {
"include": [
"name"
]
}
}
}
}

I'm not sure you need an aggregation to get what you are looking for. I think a simple query can yield the results you want. Try this:
{
"size": 7,
"sort": {
"attending": {
"order": "desc"
}
},
"query": {
"bool": {
"filter": [
{
"range": {
"start_time": {
"gte": "now",
"lte": "now+2M"
}
}
}
]
}
}
}
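The query above returns the seven most-attended events but leaves them ordered by attendance. If they should also be displayed chronologically, as the question asks, one option is to re-sort those hits on the client. A minimal Python sketch, assuming the response has the usual hits.hits[]._source shape; the function name and the sample response are hypothetical:

```python
from typing import Any


def order_top_hits_by_time(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Re-sort the already-selected top hits by start_time, earliest first."""
    sources = [hit["_source"] for hit in response["hits"]["hits"]]
    # ISO-8601 date strings in one format sort chronologically as plain strings.
    return sorted(sources, key=lambda doc: doc["start_time"])


# Hypothetical response, trimmed to the fields used here.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"name": "A", "attending": 900, "start_time": "2016-02-20"}},
            {"_source": {"name": "B", "attending": 750, "start_time": "2016-01-05"}},
            {"_source": {"name": "C", "attending": 600, "start_time": "2016-02-01"}},
        ]
    }
}

for event in order_top_hits_by_time(sample_response):
    print(event["start_time"], event["name"], event["attending"])
```

Selecting by attendance on the server and ordering by time on the client keeps the query simple, at the cost of a small extra step after the search.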

Is it possible to search by daily time range between dates?

I can use an aggregation to compute some stats between two timestamps, as follows:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"status": "ok"
}
},
{
"term": {
"deviceId": "123456789"
}
},
{
"range": {
"time": {
"gte": 1669852800,
"lt": 1671062400
}
}
}
]
}
},
"aggs": {
"results": {
"date_histogram": {
"field": "time",
"fixed_interval": "60"
}
}
}
}
Is it possible to restrict the results to a specific time range within each day? For example, 7am - 9am daily between Dec. 1 and Dec. 15. How can I achieve this?
I found a solution that works on Elasticsearch v7.15.2:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"status": "ok"
}
},
{
"term": {
"deviceId": "123456789"
}
},
{
"range": {
"time": {
"gte": 1669852800,
"lt": 1671062400
}
}
},
{
"script": {
"script": {
"source": "doc.time.value.getHourOfDay() >= params.min && doc.time.value.getHourOfDay() < params.max",
"params": {
"min": 8,
"max": 10
}
}
}
}
]
}
},
"aggs": {
"results": {
"date_histogram": {
"field": "time",
"fixed_interval": "60"
}
}
}
}
The syntax is slightly different from the comment above, but it works.
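For reference, the epoch-second bounds and the hour-of-day filter in that query can be generated rather than hard-coded. A minimal Python sketch, assuming the time field stores epoch seconds and that hours are compared in UTC; the helper names are hypothetical and the Painless source is copied from the query above:

```python
from datetime import datetime, timezone


def epoch_seconds(year: int, month: int, day: int) -> int:
    """Midnight UTC of the given date, as epoch seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def daily_window_filters(start: int, end: int, min_hour: int, max_hour: int) -> list:
    """Range filter plus a script filter keeping min_hour <= hour < max_hour."""
    return [
        {"range": {"time": {"gte": start, "lt": end}}},
        {
            "script": {
                "script": {
                    # Same Painless expression as in the query above.
                    "source": "doc.time.value.getHourOfDay() >= params.min"
                              " && doc.time.value.getHourOfDay() < params.max",
                    "params": {"min": min_hour, "max": max_hour},
                }
            }
        },
    ]


filters = daily_window_filters(
    epoch_seconds(2022, 12, 1), epoch_seconds(2022, 12, 15), 8, 10
)
print(filters[0])
```

These two filters would take the place of the range and script entries in the bool filter list.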

Elasticsearch Composite aggregation with pagination

I am using this query to fetch aggregated results, but because the number of documents matching the query criteria is very large, the number of buckets exceeds 10000.
How should I write or modify this query so that I can paginate the results?
I have read that bucket aggregations don't support pagination, but that the query can be converted into a composite aggregation, which does.
Any alternative way to paginate the response would also be helpful.
{
"query": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": "2022-04-26T00:00:00.000Z",
"lte": "2022-04-26T23:59:59.999Z"
}
}
},
{
"terms": {
"job.keyword": [
"JOB_1",
"JOB_2",
"JOB_3"
]
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "tags",
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"tags.name.keyword": "jobType"
}
},
{
"term": {
"tags.value.keyword": "discrete"
}
}
]
}
}
]
}
}
}
}
]
}
}
]
}
},
"size": 0,
"aggs": {
"job": {
"terms": {
"field": "job.keyword"
},
"aggs": {
"accountId": {
"terms": {
"field": "accountId.keyword",
"size": 10000
},
"aggs": {
"accountUsageStats": {
"stats": {
"field": "count"
}
},
"tags": {
"top_hits": {
"size": 1,
"_source": {
"include": [
"tags"
]
}
}
}
}
}
}
}
}
}
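One way to approach this is a composite aggregation, which returns buckets in pages and reports an after_key to pass back as after on the next request. The sketch below is in Python and rests on assumptions: search is a hypothetical callable that sends a request body to Elasticsearch and returns the parsed response, the aggregation name job_account and the page size are arbitrary, and the original bool query would be passed in as query.

```python
from typing import Any, Callable, Iterator, Optional


def composite_body(query: dict, after: Optional[dict] = None, page_size: int = 1000) -> dict:
    """Request body that groups by job and accountId, one page at a time."""
    composite: dict[str, Any] = {
        "size": page_size,
        "sources": [
            {"job": {"terms": {"field": "job.keyword"}}},
            {"accountId": {"terms": {"field": "accountId.keyword"}}},
        ],
    }
    if after is not None:
        composite["after"] = after  # resume after the last bucket of the previous page
    return {
        "size": 0,
        "query": query,
        "aggs": {
            "job_account": {
                "composite": composite,
                "aggs": {
                    "accountUsageStats": {"stats": {"field": "count"}},
                    "tags": {"top_hits": {"size": 1, "_source": {"includes": ["tags"]}}},
                },
            }
        },
    }


def iter_buckets(search: Callable[[dict], dict], query: dict) -> Iterator[dict]:
    """Yield every bucket, following after_key until no further page is reported."""
    after = None
    while True:
        agg = search(composite_body(query, after))["aggregations"]["job_account"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None or not agg["buckets"]:
            break
```

Each yielded bucket carries a key with the job and accountId values, alongside the accountUsageStats and tags results for that pair.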

Elasticsearch - Calculate sub range aggregation

I have the following ES query to calculate a host's average CPU usage over the last 30 days.
es_query = {
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{"range": {"#timestamp": {"gte": "now-30d",}}},
{"query_string": {"query": 'hostname: myhost',"analyze_wildcard": True}}
],
"should": [
{"match": {"metricset.name": "cpu"}}
]
}
}
}
},
"aggs": {
"group_by_time_interval": {
"date_histogram": {
"field": "#timestamp",
"interval": "1h",
"time_zone": "PST8PDT",
"min_doc_count": 1
},
"aggs": {
"cpu_used_avg_pct": {"avg": {"field": "system.cpu.total.pct"}}
}
},
"avg_monthly_cpu_pct": {
"avg_bucket": {
"buckets_path": "group_by_time_interval>cpu_used_avg_pct"
}
}
}
}
After executing it, it returns the average CPU of the last 30 days, as expected.
The question is: how can I also compute the average CPU of the last 7 days by just extending the above query?
Currently, my naive solution is to copy it into another query, replace "gte: now-30d" with "gte: now-7d", and run it again, which is very time consuming.
Thank you.
Alex
The easiest thing you can do is simply to add another aggregation that is filtered on the last 7 days:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "now-30d"
}
}
},
{
"query_string": {
"query": "hostname: myhost",
"analyze_wildcard": true
}
}
],
"should": [
{
"match": {
"metricset.name": "cpu"
}
}
]
}
}
}
},
"aggs": {
"group_by_time_interval": {
"date_histogram": {
"field": "#timestamp",
"interval": "1h",
"time_zone": "PST8PDT",
"min_doc_count": 1
},
"aggs": {
"cpu_used_avg_pct": {
"avg": {
"field": "system.cpu.total.pct"
}
}
}
},
"avg_monthly_cpu_pct": {
"avg_bucket": {
"buckets_path": "group_by_time_interval>cpu_used_avg_pct"
}
},
"last_7_days": {
"filter": {
"range": {
"#timestamp": {
"gte": "now-7d"
}
}
},
"aggs": {
"last_7_days_interval": {
"date_histogram": {
"field": "#timestamp",
"interval": "1h",
"time_zone": "PST8PDT",
"min_doc_count": 1
},
"aggs": {
"cpu_used_avg_pct": {
"avg": {
"field": "system.cpu.total.pct"
}
}
}
},
"avg_monthly_cpu_pct": {
"avg_bucket": {
"buckets_path": "last_7_days_interval>cpu_used_avg_pct"
}
}
}
}
}
}
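The same pattern extends to any number of windows. A minimal Python sketch that generates one filtered block per window; the function name and the generated aggregation names are hypothetical, and the field names and the interval setting are copied from the query above:

```python
def windowed_avg_aggs(days_list: list) -> dict:
    """One filter aggregation per window, each with an hourly histogram
    and an avg_bucket pipeline over it."""
    aggs = {}
    for days in days_list:
        histogram = f"last_{days}_days_interval"
        aggs[f"last_{days}_days"] = {
            "filter": {"range": {"#timestamp": {"gte": f"now-{days}d"}}},
            "aggs": {
                histogram: {
                    "date_histogram": {
                        "field": "#timestamp",
                        "interval": "1h",
                        "time_zone": "PST8PDT",
                        "min_doc_count": 1,
                    },
                    "aggs": {
                        "cpu_used_avg_pct": {"avg": {"field": "system.cpu.total.pct"}}
                    },
                },
                # Averages the hourly averages of this window only.
                "avg_cpu_pct": {
                    "avg_bucket": {"buckets_path": f"{histogram}>cpu_used_avg_pct"}
                },
            },
        }
    return aggs


aggs = windowed_avg_aggs([7, 30])
print(sorted(aggs))
```

The range in the main query still has to cover the widest window (now-30d here), since the filter aggregations only narrow what the query already matched.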

How to nest bool queries?

I am building a search query which dynamically adds a set of constraints (bool) to the query. The general expected structure is as follows:
OR (
AND (
condition
condition
...
)
AND (
condition
condition
...
)
)
In other words I have a set (one or more) of conditions which must all be met (AND above). There may be several of such sets, any of them should be enough for the final match (the OR above).
An example of such structure, as generated by my code (this is the full API query, the generated part is "bool"):
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.118"
}
}
]
}
}
]
},
"range": {
"eventtime": {
"gte": "2018-03-05T12:47:22.397+01:00"
}
}
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
My understanding of this query was:
if "attack_ip === 10.89.7.117" and "sentinel_port === 17"
or
if "attack_ip === 10.89.7.118"
the entry will match
Unfortunately, when I call Elasticsearch I get the following error:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
"line": 1,
"col": 177
}
],
"type": "parsing_exception",
"reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
"line": 1,
"col": 177
},
"status": 400
}
What does this error mean?
EDIT
Following Piotr's answer, I tried to move the range constraint into the boolean part. I get the same error, though.
My query is reproduced below:
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.118"
}
}
]
}
}
]
}
},
{
"range": {
"eventtime": {
"gte": "2018-03-05T13:55:27.927+01:00"
}
}
}
]
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
}
I think the problem is with the range part. Try moving it inside the bool query:
{
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"term": {
"attack_ip": "10.89.7.118"
}
}
],
"must": {
"range": {
"eventtime": {
"gte": "2018-03-05T12:47:22.397+01:00"
}
}
}
}
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
Or move it to the filter section:
{
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"term": {
"attack_ip": "10.89.7.118"
}
}
],
"filter": {
"bool": {
"must": [{
"range": {
"eventtime": {
"gte": "2018-03-05T12:47:22.397+01:00"
}
}
}]
}
}
}
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
I hope I formatted this correctly. Please let me know if you have any issues.
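Finally, you will probably need to set the minimum_should_match parameter on the bool query to get correct results: when a bool query contains a must or filter clause, its should clauses are optional by default, so documents that match only the range would also be returned.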
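The OR-of-ANDs structure from the question can be generated in a few lines. A minimal Python sketch, assuming each condition set is a dict of field to value and that the range belongs in filter; the function name is hypothetical, and minimum_should_match is set to 1 so that at least one set has to match:

```python
def any_of_all(condition_sets: list, range_filter: dict) -> dict:
    """Build a bool query: (set1 AND ...) OR (set2 AND ...), restricted by range_filter."""
    should = [
        {"bool": {"must": [{"term": {field: value}} for field, value in conditions.items()]}}
        for conditions in condition_sets
    ]
    return {
        "bool": {
            "should": should,
            # Without this, should clauses are optional once filter is present.
            "minimum_should_match": 1,
            "filter": [range_filter],
        }
    }


query = any_of_all(
    [
        {"attack_ip": "10.89.7.117", "sentinel_port": "17"},
        {"attack_ip": "10.89.7.118"},
    ],
    {"range": {"eventtime": {"gte": "2018-03-05T12:47:22.397+01:00"}}},
)
body = {"query": query, "size": 0}
print(body)
```

Note that size and aggs belong at the top level of the request body, next to query, not inside it.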

How to properly do sorting under aggregation?

I am still new to Elasticsearch and have a question. I get an error when I try to sort within an aggregation. Please advise. Thank you.
{
"size": 20,
"query": {
"bool": {
"filter": [
{
"range": {
"ts": {
"gt": "2016-08-22T00:00:00.000Z",
"lt": "2016-08-23T13:41:09.000Z"
}
}
}
]
}
},
"aggs": {
"group_by_ip": {
"terms": {
"field": "id_orig_h"
},
"aggs": {
"sum_volume": {
"sum": {
"field": "resp_bytes",
"sort": [
{
"resp_bytes": {
"order": "asc"
}
}
]
}
}
}
}
}
}
You can do it with the order setting in your terms aggregation referencing the sum_volume sub-aggregation:
{
"size": 20,
"query": {
"bool": {
"filter": [
{
"range": {
"ts": {
"gt": "2016-08-22T00:00:00.000Z",
"lt": "2016-08-23T13:41:09.000Z"
}
}
}
]
}
},
"aggs": {
"group_by_ip": {
"terms": {
"field": "id_orig_h",
"order": {
"sum_volume": "asc"
}
},
"aggs": {
"sum_volume": {
"sum": {
"field": "resp_bytes"
}
}
}
}
}
}
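The order setting can be parameterised when the request is built in code. A minimal Python sketch; the function name is hypothetical, and the field and aggregation names come from the query above:

```python
def ip_volume_agg(direction: str = "asc") -> dict:
    """Terms aggregation on id_orig_h, ordered by the summed resp_bytes."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return {
        "group_by_ip": {
            "terms": {
                "field": "id_orig_h",
                # The key must name a metric sub-aggregation defined below.
                "order": {"sum_volume": direction},
            },
            "aggs": {"sum_volume": {"sum": {"field": "resp_bytes"}}},
        }
    }


print(ip_volume_agg("desc")["group_by_ip"]["terms"]["order"])
```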

Resources