How to perform a complex query on aggregated fields in Elasticsearch

I am trying to figure out how to perform a complex query in Elasticsearch. Let's say I have a table of data, which I got from the following query:
{
"aggs": {
"3": {
"terms": {
"field": "ColumnA",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"4": {
"terms": {
"field": "ColumnB",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"5": {
"terms": {
"field": "ColumnC",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"sum_of_views": {
"sum": {
"field": "views"
}
},
"sum_of_costs": {
"sum": {
"field": "cost"
}
},
"sum_of_clicks": {
"sum": {
"field": "clicks"
}
},
"sum_of_earned": {
"sum": {
"field": "earned"
}
},
"sum_of_adv_earned": {
"sum": {
"field": "adv_earned"
}
}
}
}
}
}
}
}
},
"size": 0,
"_source": {
"excludes": []
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
{
"field": "hour",
"format": "date_time"
}
],
"query": {
"bool": {
"must": [],
"filter": [
{
"match_all": {}
},
{
"range": {
"hour": {
"format": "strict_date_optional_time",
"gte": "2019-08-08T06:29:34.723Z",
"lte": "2020-08-08T06:29:34.724Z"
}
}
}
],
"should": [],
"must_not": []
}
}
}
Now, for example, if I want to get the records that satisfy the following condition:
(sum_of_clicks / sum_of_views) * (sum_of_earned2 / sum_of_earned1) < 0.5
what should I query?

I think the query below should help. My understanding is that you want to first group by ColumnA, ColumnB and ColumnC, calculate the sums of the clicks, views, earned1 and earned2 fields, and then apply your custom condition to each group.
I came up with the following query, which uses the Bucket Selector aggregation.
POST <your_index_name>/_search
{
"size": 0,
"aggs": {
"3": {
"terms": {
"field": "ColumnA",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"4": {
"terms": {
"field": "ColumnB",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"5": {
"terms": {
"field": "ColumnC",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"sum_views": {
"sum": {
"field": "views"
}
},
"sum_clicks": {
"sum": {
"field": "clicks"
}
},
"sum_earned1": {
"sum": {
"field": "earned1"
}
},
"sum_earned2": {
"sum": {
"field": "earned2"
}
},
"custom_sum_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"sum_of_views": "sum_views",
"sum_of_clicks": "sum_clicks",
"sum_of_earned1": "sum_earned1",
"sum_of_earned2": "sum_earned2"
},
"script": "(params.sum_of_clicks / params.sum_of_views) * (params.sum_of_earned2 / params.sum_of_earned1) < 0.5"
}
}
}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"valid_docs_count": "5._bucket_count"
},
"script": {
"source": "params.valid_docs_count >= 1"
}
}
}
}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"valid_docs_count": "4._bucket_count"
},
"script": {
"source": "params.valid_docs_count >= 1"
}
}
}
}
}
}
}
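A bucket_selector only removes buckets; it adds nothing to the response, so the groups that survive have to be read out of the nested "3" → "4" → "5" buckets. A minimal Python sketch of that, assuming the parsed JSON response is a plain dict with the usual layout (aggregations → name → buckets, each bucket carrying its sub-aggregations by name). The helper surviving_groups and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Names of the sum sub-aggregations in the query above.
SUM_AGGS = ("sum_views", "sum_clicks", "sum_earned1", "sum_earned2")


def surviving_groups(response: Dict[str, Any]) -> List[Tuple[Any, Any, Any, Dict[str, float]]]:
    """Flatten the nested "3" -> "4" -> "5" terms buckets of a parsed search response.

    Hypothetical helper: returns one (ColumnA, ColumnB, ColumnC, sums) tuple per
    ColumnC bucket that the bucket selectors kept.
    """
    rows = []
    for a in response["aggregations"]["3"]["buckets"]:
        for b in a["4"]["buckets"]:
            for c in b["5"]["buckets"]:
                sums = {name: c[name]["value"] for name in SUM_AGGS}
                rows.append((a["key"], b["key"], c["key"], sums))
    return rows


# Made-up response containing a single surviving group.
sample = {
    "aggregations": {
        "3": {"buckets": [{
            "key": "a1", "doc_count": 3,
            "4": {"buckets": [{
                "key": "b1", "doc_count": 3,
                "5": {"buckets": [{
                    "key": "c1", "doc_count": 3,
                    "sum_views": {"value": 100.0},
                    "sum_clicks": {"value": 10.0},
                    "sum_earned1": {"value": 50.0},
                    "sum_earned2": {"value": 100.0},
                }]},
            }]},
        }]}
    }
}

for row in surviving_groups(sample):
    print(row)
```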
Note that to get exactly the result you are looking for, I've had to add bucket-count filters on aggregations 4 and 5.
The aggregations I've used are:
A Bucket Selector (custom_sum_bucket_filter) to evaluate the condition you mentioned on each ColumnC bucket
A second Bucket Selector (min_bucket_selector inside aggregation 4) so that ColumnB buckets left with no ColumnC buckets are not displayed
A third Bucket Selector (min_bucket_selector inside aggregation 3) so that ColumnA buckets left with no ColumnB buckets are not displayed
To see why I've added the additional empty-bucket filters, just remove them and compare the results.
Note that for the sake of simplicity I have ignored the query part as well as the cost field. Please feel free to add them and test it.
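To make the pruning order concrete, here is a client-side Python emulation of what the three bucket selectors do, run on a plain nested dict instead of a real response. It is an illustration only: the tree layout, the helper names (passes, prune, make_leaf) and the sums are hypothetical, and treating a zero denominator as "not matching" is an assumption of this sketch, not something the answer specifies.

```python
from typing import Dict, List

Leaf = Dict[str, float]


def passes(leaf: Leaf, threshold: float = 0.5) -> bool:
    """(clicks / views) * (earned2 / earned1) < threshold, as in the question.

    Assumption: a zero denominator counts as "not matching" instead of raising.
    """
    if leaf["sum_views"] == 0 or leaf["sum_earned1"] == 0:
        return False
    ratio = (leaf["sum_clicks"] / leaf["sum_views"]) * (leaf["sum_earned2"] / leaf["sum_earned1"])
    return ratio < threshold


def prune(tree: Dict[str, Dict[str, List[Leaf]]]) -> Dict[str, Dict[str, List[Leaf]]]:
    """Mimic the three bucket selectors on ColumnA -> ColumnB -> [ColumnC leaves]."""
    result: Dict[str, Dict[str, List[Leaf]]] = {}
    for a, b_groups in tree.items():
        kept_b: Dict[str, List[Leaf]] = {}
        for b, leaves in b_groups.items():
            kept_leaves = [leaf for leaf in leaves if passes(leaf)]  # custom_sum_bucket_filter
            if kept_leaves:  # min_bucket_selector inside "4": 5._bucket_count >= 1
                kept_b[b] = kept_leaves
        if kept_b:  # min_bucket_selector inside "3": 4._bucket_count >= 1
            result[a] = kept_b
    return result


def make_leaf(views: float, clicks: float, earned1: float, earned2: float) -> Leaf:
    """Build one hypothetical ColumnC bucket holding the four sums."""
    return {"sum_views": views, "sum_clicks": clicks,
            "sum_earned1": earned1, "sum_earned2": earned2}


# Made-up sums: only a1/b1 satisfies the condition (0.1 * 2.0 = 0.2 < 0.5).
tree = {
    "a1": {"b1": [make_leaf(100, 10, 50, 100)], "b2": [make_leaf(100, 50, 10, 20)]},
    "a2": {"b3": [make_leaf(0, 0, 5, 5)]},
}
print(prune(tree))
```

Without the two emptiness checks, b2 and a2 would still appear in the output, just with no children, which is what the removed-filters experiment above shows.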

Related

How to create OpenSearch query to calculate error rate and trigger condition

I tried creating an OpenSearch query to calculate the error rate for setting up alerts.
What I'm trying to do is calculate (count of #message: "Error" / count of #timestamp) * 100,
but it is not working because the #timestamp count is treated as an Object rather than a number.
Here is my code:
GET Data/_search
{
"size": 0,
"query": {
"bool": {
"must": [{
"match": {
"#message": "Error"
}
}],
"filter": [],
"should": [],
"must_not": []
}
},
"aggs": {
"month": {
"date_histogram": {
"field": "#timestamp",
"interval": "10m"
},
"aggs": {
"#timestamp": {
"terms": {
"field": "#timestamp",
"order": {
"_count": "desc"
}
}
},
"#message": {
"terms": {
"field": "#message.keyword",
"order": {
"_count": "desc"
}
}
},
"rate": {
"bucket_script": {
"buckets_path": {
"timestamp_count": "#timestamp._count",
"error_count": "#message._count"
},
"script": "params.error_count / params.timestamp_count * 100"
}
}
}
}
}
}
Any help would be appreciated.
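One likely reason for the "Object" complaint is that the buckets_path "#timestamp._count" points at a terms aggregation, which is multi-bucket, while bucket_script needs a single number per bucket. Also, the query matches only "Error" messages, so any total would equal the error count anyway. Below is a Python sketch of a common alternative, written as the request body dict: drop the top-level match, use the date_histogram bucket's own _count as the total, and count errors with a single-bucket filter sub-aggregation. The aggregation names (per_interval, errors, error_rate) and both helpers are hypothetical; the field names and the 10m interval come from the question. This has not been verified against a live cluster.

```python
from typing import Any, Dict, List, Optional, Tuple


def error_rate_body(interval: str = "10m") -> Dict[str, Any]:
    """Request body: per-interval error percentage (hypothetical agg names)."""
    return {
        "size": 0,
        # No top-level match on "Error": each histogram bucket's _count must cover all messages.
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": "#timestamp",
                    "interval": interval,  # same parameter the question uses
                    "min_doc_count": 1,    # skip empty buckets (avoids dividing by zero)
                },
                "aggs": {
                    # Single-bucket aggregation, so "errors>_count" resolves to one number.
                    "errors": {"filter": {"match": {"#message": "Error"}}},
                    "error_rate": {
                        "bucket_script": {
                            "buckets_path": {"total": "_count", "errors": "errors>_count"},
                            "script": "params.errors / params.total * 100",
                        }
                    },
                },
            }
        },
    }


def rates_from_response(response: Dict[str, Any]) -> List[Tuple[Any, Optional[float]]]:
    """Client-side fallback: recompute the same percentage from the raw doc counts."""
    rows = []
    for bucket in response["aggregations"]["per_interval"]["buckets"]:
        total = bucket["doc_count"]
        errors = bucket["errors"]["doc_count"]
        rows.append((bucket["key"], 100.0 * errors / total if total else None))
    return rows


# Made-up response with one 10-minute bucket: 1 error out of 4 messages.
sample = {"aggregations": {"per_interval": {"buckets": [
    {"key": 1600000000000, "doc_count": 4, "errors": {"doc_count": 1}},
]}}}
print(rates_from_response(sample))
```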

Elastic search query Pagination help on Aggregations

I have tried the query below for pagination on aggregations, but it is not working properly.
I am getting the error "reason": "[40:7] [terms] unknown field [from], parser not found"
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"answer.keyword": "UNHANDLED"
}
},
{
"term": {
"source.keyword": "QUAL2"
}
}
]
}
},
"aggs": {
"MyBuckets": {
"terms": {
"field": "question.keyword",
"order": {
"_count": "asc"
},
"size": "10"
},
"aggs": {
"MyBuckets": {
"terms": {
"field": "timestamp",
"order": {
"_count": "asc"
},
"size": "3",
"from": 8
}
}
}
}
}
}
Only size is supported; you have to remove the from parameter from the terms aggregation.
You can try using partitions in the aggregation instead.
Try the query below:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"answer.keyword": "UNHANDLED"
}
},
{
"term": {
"source.keyword": "QUAL2"
}
}
]
}
},
"aggs": {
"MyBuckets": {
"terms": {
"field": "question.keyword",
"order": {
"_count": "asc"
},
"size": "10"
},
"aggs": {
"MyBuckets": {
"terms": {
"field": "timestamp",
"order": {
"_count": "asc"
},
"size": "3",
"include": {
"partition": 1,
"num_partitions": 10
}
}
}
}
}
}
}
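Partitions are numbered from 0 to num_partitions - 1, so paging through all of them means one request per partition value. Terms are assigned to partitions by hashing, so the _count ordering and size apply within each partition rather than across the whole result. A Python sketch that produces one request body per partition from the query above (with the sizes written as integers); the helper name is hypothetical, and sending each body to <your_index_name>/_search is left to your client.

```python
import copy
from typing import Any, Dict, Iterator

# The query from the answer above, minus the "include" clause.
BASE_QUERY: Dict[str, Any] = {
    "size": 0,
    "query": {"bool": {"must": [
        {"term": {"answer.keyword": "UNHANDLED"}},
        {"term": {"source.keyword": "QUAL2"}},
    ]}},
    "aggs": {"MyBuckets": {
        "terms": {"field": "question.keyword", "order": {"_count": "asc"}, "size": 10},
        "aggs": {"MyBuckets": {
            "terms": {"field": "timestamp", "order": {"_count": "asc"}, "size": 3},
        }},
    }},
}


def partitioned_bodies(base: Dict[str, Any], num_partitions: int) -> Iterator[Dict[str, Any]]:
    """Yield one copy of `base` per partition (0 .. num_partitions - 1)."""
    for partition in range(num_partitions):
        body = copy.deepcopy(base)  # leave the template untouched
        inner_terms = body["aggs"]["MyBuckets"]["aggs"]["MyBuckets"]["terms"]
        inner_terms["include"] = {"partition": partition, "num_partitions": num_partitions}
        yield body


for body in partitioned_bodies(BASE_QUERY, 10):
    print(body["aggs"]["MyBuckets"]["aggs"]["MyBuckets"]["terms"]["include"])
```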

Get maximum and minimum value using group by channel id

I want to get the maximum and minimum values grouped by channel id, and I also want to get the maximum and minimum video id.
{
"query": {
"term": {
"channel_id.keyword": {
"value": "UCQOd1f6pYldvhgvdQ_ktpGA"
}
}
},
"aggs": {
"views_max": {
"max": {
"field": "views",
"missing": 0
},
"_source":["video_id","views"]
},
"views_min": {
"min": {
"field": "views",
"missing": 0
},
"_source":["video_id","views"]
}
}
}
{
"aggs": {
"2": {
"terms": {
"field": "channel_id.keyword",
"order": {
"1": "desc"
},
"size": 10
},
"aggs": {
"1": {
"max": {
"field": "video_id"
}
},
"3": {
"min": {
"field": "video_id"
}
}
}
}
},
"size": 0,
"_source": {
"excludes": []
},
"query": {
"bool": {
"must": [],
"filter": [
{
"bool": {
"should": [
{
"match": {
"channel_id.keyword": "UCQOd1f6pYldvhgvdQ_ktpGA"
}
}
],
"minimum_should_match": 1
}
}
]
}
}
}
The above query will give the maximum and minimum of video_id for a particular channel_id.
{
"aggs": {
"2": {
"terms": {
"field": "channel_id.keyword",
"order": {
"1": "desc"
},
"size": 10
},
"aggs": {
"1": {
"max": {
"field": "video_id"
}
},
"3": {
"min": {
"field": "video_id"
}
}
}
}
},
"size": 0,
"_source": {
"excludes": []
}
}
With the above query, you can fetch the maximum and minimum video_id for every distinct channel_id (up to the terms size of 10).
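The max and min aggregations work on numeric values, so both queries above assume video_id is mapped as a number. Under that assumption, a Python sketch that reads the per-channel results out of the response: "2", "1" and "3" are the aggregation names from the query, while the function name and the sample values are made up.

```python
from typing import Any, Dict, Optional, Tuple


def video_id_range_per_channel(
    response: Dict[str, Any],
) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """Map channel_id -> (max video_id, min video_id) from the terms aggregation "2"."""
    return {
        bucket["key"]: (bucket["1"]["value"], bucket["3"]["value"])
        for bucket in response["aggregations"]["2"]["buckets"]
    }


# Made-up response for the channel from the question.
sample = {"aggregations": {"2": {"buckets": [
    {"key": "UCQOd1f6pYldvhgvdQ_ktpGA", "doc_count": 3,
     "1": {"value": 42.0}, "3": {"value": 7.0}},
]}}}
print(video_id_range_per_channel(sample))
```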

How to order serial_diff aggregation result in Elasticsearch?

I have built a query based on the serial_diff aggregation. I am trying to sort the results by the value of the serial_diff aggregation, but I am struggling to get them in order. My query is below:
GET db/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"terms": {
"Name": [
"q"
]
}
}
],
"filter": [
{
"range": {
"ts": {
"gte": "2020-03-09T09:00:00.000Z",
"lte": "2020-03-09T12:40:00.000Z",
"format": "date_optional_time"
}
}
}
]
}
},
"aggs": {
"sourceNameCount": {
"cardinality": {
"field": "sourceName"
}
},
"sourceName": {
"terms": {
"size": 100,
"field": "sourceName"
},
"aggs": {
"timeseries": {
"date_histogram": {
"field": "ts",
"min_doc_count": 1,
"interval": "15m",
"order": {
"_key": "asc"
}
},
"aggs": {
"the_sum":{
"avg":{
"field": "libVal"
}
},
"ts_diff":{
"serial_diff": {
"buckets_path": "the_sum",
"lag": 1
}
}
}
}
}
}
}
}
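One workaround that does not depend on server-side ordering is to sort the date_histogram buckets in the client by their ts_diff value. With lag 1, the first bucket of each series has no predecessor, so it may come back without a ts_diff value; the sketch below skips such buckets. The aggregation names come from the query above; the function name and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def buckets_by_diff(response: Dict[str, Any], descending: bool = True) -> List[Tuple[Any, str, float]]:
    """Return (sourceName, bucket time, ts_diff) rows sorted by ts_diff."""
    rows = []
    for source in response["aggregations"]["sourceName"]["buckets"]:
        for bucket in source["timeseries"]["buckets"]:
            diff = bucket.get("ts_diff", {}).get("value")
            if diff is None:  # e.g. the first bucket, which has no lag-1 predecessor
                continue
            rows.append((source["key"], bucket["key_as_string"], diff))
    rows.sort(key=lambda row: row[2], reverse=descending)
    return rows


# Made-up response: two sources, 15-minute buckets.
sample = {"aggregations": {"sourceName": {"buckets": [
    {"key": "s1", "timeseries": {"buckets": [
        {"key_as_string": "2020-03-09T09:00:00.000Z", "the_sum": {"value": 5.0}},
        {"key_as_string": "2020-03-09T09:15:00.000Z", "the_sum": {"value": 8.0}, "ts_diff": {"value": 3.0}},
        {"key_as_string": "2020-03-09T09:30:00.000Z", "the_sum": {"value": 1.0}, "ts_diff": {"value": -7.0}},
    ]}},
    {"key": "s2", "timeseries": {"buckets": [
        {"key_as_string": "2020-03-09T09:00:00.000Z", "the_sum": {"value": 2.0}},
        {"key_as_string": "2020-03-09T09:15:00.000Z", "the_sum": {"value": 12.0}, "ts_diff": {"value": 10.0}},
    ]}},
]}}}

for row in buckets_by_diff(sample):
    print(row)
```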

Combine two elastic queries into 1. How?

I have two queries that return results when I run them as GET requests.
The 1st query is -
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true
}
},
{
"range": {
"database-status.meta.current-time": {
"lte": "now-91d/d"
}
}
}
],
"must_not": []
}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"2": {
"date_histogram": {
"field": "database-status.meta.current-time",
"interval": "1h",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"3": {
"terms": {
"field": "database-status.name.keyword",
"size": 500,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"sum": {
"field": "database-status.status-properties.rate-properties.cache-properties.compressed-tree-cache-hit-rate.value",
"script": "_value/60"
}
}
}
}
}
}
}
}
and the 2nd query is -
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true
}
},
{
"range": {
"database-status.meta.current-time": {
"lte": "now-91d/d"
}
}
}
],
"must_not": []
}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"2": {
"date_histogram": {
"field": "database-status.meta.current-time",
"interval": "1h",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"3": {
"terms": {
"field": "database-status.name.keyword",
"size": 500,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"sum": {
"field": "database-status.status-properties.rate-properties.cache-properties.compressed-tree-cache-miss-rate.value",
"script": "_value/60"
}
}
}
}
}
}
}
}
How do I combine the two queries into one query and get both results in the same result set? Based on this, I'll try to replicate the method with other queries and even combine 3 or more queries into one.
There are two options to do that:
Use multi search (msearch), which lets you send one request to ES containing both queries. The msearch response contains each query's response separately, and you can then choose how to combine the answers.
Combine the queries in a single bool query.
So let's say you have:
Q1->bool->must->inner-q-1
and Q2->bool->must->inner-q-2
Then you can combine them with should:
Q3->bool->should->[inner-q-1, inner-q-2], with minimum_should_match equal to 1 (very important!)
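Both options sketched in Python, assuming the query bodies are plain dicts. The _msearch body is newline-delimited JSON: for each search, a header line (optionally naming the index) followed by the search body on the next line, and the whole body must end with a newline. msearch_body and combine_with_should are hypothetical helper names, and the inner queries are taken from the question.

```python
import json
from typing import Any, Dict, List, Optional


def msearch_body(searches: List[Dict[str, Any]], index: Optional[str] = None) -> str:
    """Build an NDJSON body for POST _msearch: header line + body line per search."""
    lines = []
    for search in searches:
        header = {"index": index} if index else {}
        lines.append(json.dumps(header))
        lines.append(json.dumps(search))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def combine_with_should(inner_queries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Q3 -> bool -> should -> [inner-q-1, inner-q-2], at least one must match."""
    return {"query": {"bool": {"should": inner_queries, "minimum_should_match": 1}}}


inner_q1 = {"range": {"database-status.meta.current-time": {"lte": "now-91d/d"}}}
inner_q2 = {"query_string": {"query": "*", "analyze_wildcard": True}}

print(msearch_body([{"size": 0, "query": inner_q1}, {"size": 0, "query": inner_q2}]), end="")
print(json.dumps(combine_with_should([inner_q1, inner_q2])))
```

The msearch string is sent with the Content-Type application/x-ndjson, and the "responses" array in the reply follows the order of the searches in the body.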
I made use of nested sub-aggregations, putting both sums under the same buckets.
Here is the combined code:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true
}
},
{
"range": {
"server-status.meta.current-time": {
"lte": "now-91d/d"
}
}
}
],
"must_not": []
}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"time-interval": {
"date_histogram": {
"field": "server-status.meta.current-time",
"interval": "1h",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"http-server": {
"terms": {
"field": "server-status.type.keyword",
"include": "http-server",
"size": 500,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"sum": {
"field": "server-status.status-properties.expanded-tree-cache-hit-rate.value",
"script": "_value/60"
}
},
"2": {
"sum": {
"field": "server-status.status-properties.expanded-tree-cache-miss-rate.value",
"script": "_value/60"
}
},
"3": {
"terms": {
"field": "server-status.name.keyword",
"size": 500,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"sum": {
"field": "server-status.status-properties.expanded-tree-cache-hit-rate.value",
"script": "_value/60"
}
},
"2": {
"sum": {
"field": "server-status.status-properties.expanded-tree-cache-miss-rate.value",
"script": "_value/60"
}
}
}
}
}
}
}
}
}
}
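The combined response nests both sums under each bucket, so reading them back means walking time-interval → http-server → 3. A Python sketch, assuming the aggregation names from the query above; the function name and the sample values are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def hit_miss_rows(response: Dict[str, Any]) -> List[Tuple[str, Any, Any, float, float]]:
    """One (time, server type, server name, hit sum, miss sum) row per name bucket."""
    rows = []
    for time_bucket in response["aggregations"]["time-interval"]["buckets"]:
        for type_bucket in time_bucket["http-server"]["buckets"]:
            for name_bucket in type_bucket["3"]["buckets"]:
                rows.append((
                    time_bucket["key_as_string"],
                    type_bucket["key"],
                    name_bucket["key"],
                    name_bucket["1"]["value"],  # sum of hit-rate values, each divided by 60
                    name_bucket["2"]["value"],  # sum of miss-rate values, each divided by 60
                ))
    return rows


# Made-up response: one hour, one server type, one server name.
sample = {"aggregations": {"time-interval": {"buckets": [
    {"key_as_string": "2020-01-01T00:00:00.000-06:00", "http-server": {"buckets": [
        {"key": "http-server", "1": {"value": 3.0}, "2": {"value": 1.0}, "3": {"buckets": [
            {"key": "server-a", "1": {"value": 3.0}, "2": {"value": 1.0}},
        ]}},
    ]}},
]}}}

for row in hit_miss_rows(sample):
    print(row)
```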
