Elastic - Filter after selecting top 5 hits - elasticsearch

I'm using the alerting feature in Kibana and I want to check whether the last 5 consecutive values of a field exceed a threshold x. However, if I use a filter in my Elasticsearch query, it gets applied before the top N aggregation.
Is there a way to apply the filter afterwards, or to check whether the last consecutive values exceed a threshold using some other selector or method? I don't want to check this in the trigger condition in Painless, because that returns all the documents in the ctx and not just the ones that exceeded the threshold, which are the ones I want to display in my alert message.
I've been stuck on this for a while, and I have only seen blog posts saying that a sub-aggregation is not possible on top N, so any help or workaround would be much appreciated.
This is my query:
{
"size": 500,
"query": {
"bool": {
"filter": [
{
"match_all": {
"boost": 1
}
},
{
"match_phrase": {
"client.id": {
"query": "42",
"slop": 0,
"zero_terms_query": "NONE",
"boost": 1
}
}
},
{
"range": {
"#timestamp": {
"from": "{{period_end}}||-10m",
"to": "{{period_end}}",
"include_lower": true,
"include_upper": true,
"format": "epoch_millis",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"aggs": {
"2": {
"terms": {
"field": "component.name",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"3": {
"terms": {
"field": "client.name.keyword",
"order": {
"_key": "desc"
},
"size": 5
},
"aggs": {
"1": {
"top_hits": {
"docvalue_fields": [
{
"field": "gc.oldgen.used",
"format": "use_field_mapping"
}
],
"_source": "gc.oldgen.used",
"size": 5,
"sort": [
{
"#timestamp": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}

Did you try using a filter sub-aggregation?
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
Alternatively, you can use a pipeline aggregation to manipulate your aggregation results:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html
By the way, a term query on the client id looks more appropriate.
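As a sketch of the term query suggestion, the query part of the request could be reduced to something like the following. It assumes that client.id is mapped as a keyword or numeric field (the mapping is not shown in the question), and it writes the original from/to bounds with include_lower/include_upper as the equivalent gte/lte. The aggs section would stay as it is.

```json
{
  "size": 500,
  "query": {
    "bool": {
      "filter": [
        { "term": { "client.id": "42" } },
        {
          "range": {
            "#timestamp": {
              "gte": "{{period_end}}||-10m",
              "lte": "{{period_end}}",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}
```

Because everything sits in filter context, the match_all clause and the boost values from the original query have no effect on the result and can be dropped.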

Related

Aggregating against two fields returns nulls for one of them

I've got an index with a lot of records with many fields, including "cacheName" & "cache_ip". Each unique value of "cacheName" has 1 or more records with 1 or more values of corresponding "cache_ip". Each record has a unique 'ts' (timestamp) field as well. For example:
{
"cacheName": "c001.abc001.xyz",
"cache_ip": "1.1.1.0",
},
{
"cacheName": "c001.abc001.xyz",
"cache_ip": "1.1.2.0",
},
{
"cacheName": "c002.efg001.mno",
"cache_ip": "1.1.9.1",
},
{
"cacheName": "c002.efg001.mno",
"cache_ip": "1.1.9.1",
},
I'm trying to craft a search that will return, at most, each unique 'cacheName' & 'cache_ip' record. For the above example, I would get back a total of 3 hits ("cacheName"="c002.efg001.mno" would only be returned once, since it only has one unique permutation).
This is the closest that I've come, but it always returns a Null value for "cache_ip" instead of the actual value (there are no null values in the actual data):
{
"size": 0, 'sort': [{'ts': {'order': 'desc'}}],
"query": {
"bool": {
"must": [
{"match_all": {}},
{"range": {'ts': {'gte': '20200818T010100Z', 'format': 'basic_date_time_no_millis'}}},
]
}
},
"aggs": {
"cacheName": {
"terms": {
"field": "cacheName",
"size": 10000, "order": {"_key": "desc"},
},
"aggs": {
"cache_ip": {"terms": {"field": "cache_ip"}},
},
},
},
}
I'd appreciate any insight, as I'm pulling my hair out trying to make this work.
thanks!
One way to achieve what you want is to use scripting to build each cacheName/cache_ip combination as a single key, so that you don't need the second terms sub-aggregation:
{
"size": 0,
"sort": [
{
"ts": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"range": {
"ts": {
"gte": "20200818T010100Z",
"format": "basic_date_time_no_millis"
}
}
}
]
}
},
"aggs": {
"cacheName": {
"terms": {
"script": {
"source": "[doc.cacheName.value ?: 'no.name', doc.cache_ip.value ?: 'no.ip'].join('-')"
},
"size": 10000,
"order": {
"_key": "desc"
}
}
}
}
}

How to convert ElasticSearch query to ES7

We are having a tremendous amount of trouble converting an old ElasticSearch query to a newer version of ElasticSearch. The original query for ES 1.8 is:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"and": [
{
"terms": {
"organization_id": [
"fred"
]
}
}
]
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 0,
"field": "status"
}
},
"tags": {
"terms": {
"size": 0,
"field": "tags"
}
}
}
}
and we are trying to convert it to ES version 7. Does anyone know how to do that?
The Elasticsearch docs for the Filtered query in 6.8 (the latest version of the docs I can find that still has the page) state that you should move the query and filter to the must and filter parameters of the bool query.
Also, the terms aggregation no longer supports setting size to 0 to get Integer.MAX_VALUE. If you really want all the terms, you need to set it to the max value (2147483647) explicitly. However, the documentation for Size recommends using the composite aggregation and paginating instead.
Below is the closest query I could make to the original that will work with Elasticsearch 7.
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"terms": {
"organization_id": [
"fred"
]
}
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 2147483647,
"field": "status"
}
},
"tags": {
"terms": {
"size": 2147483647,
"field": "tags"
}
}
}
}
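For the composite alternative mentioned above, a minimal sketch might look like the following. The aggregation name status_pages and the page size of 1000 are illustrative choices, not taken from the original query; the tags field would need its own composite aggregation or a second source.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "organization_id": [
            "fred"
          ]
        }
      }
    }
  },
  "aggs": {
    "status_pages": {
      "composite": {
        "size": 1000,
        "sources": [
          { "status": { "terms": { "field": "status" } } }
        ]
      }
    }
  }
}
```

Each response carries an after_key; passing that value back as "after" inside the composite object fetches the next page, and you stop once a page comes back with no buckets.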

ES query ignoring time range filter

I have mimicked how Kibana runs a search and have come up with the query below. Basically, I'm looking for the last 6 days of data (including the days where there is no data, since I need to feed it to a graph). But the returned buckets cover more than just those days. I would like to understand where I'm going wrong with this.
{
"version": true,
"size": 0,
"sort": [
{
"#timestamp": {
"order": "desc",
"unmapped_type": "boolean"
}
}
],
"_source": {
"excludes": []
},
"aggs": {
"target_traffic": {
"date_histogram": {
"field": "#timestamp",
"interval": "1d",
"time_zone": "Asia/Kolkata",
"min_doc_count": 0,
"extended_bounds": {
"min": "now-6d/d",
"max": "now"
}
},
"aggs": {
"days_filter": {
"filter": {
"range": {
"#timestamp": {
"gt": "now-6d",
"lte": "now"
}
}
},
"aggs": {
"in_bytes": {
"sum": {
"field": "netflow.in_bytes"
}
},
"out_bytes": {
"sum": {
"field": "netflow.out_bytes"
}
}
}
}
}
}
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
"#timestamp",
"netflow.first_switched",
"netflow.last_switched"
],
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "( flow.src_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) OR flow.dst_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) ) AND flow.traffic_locality: \"private\"",
"analyze_wildcard": true,
"default_field": "*"
}
}
]
}
}
}
If you put the range filter inside the aggregation section without any date range in your query, the aggregations run on all your data, and the metrics are bucketed by day over all of it.
The range query on #timestamp should be moved into the query section so that aggregations are computed only on the data you want, i.e. the last 6 days.
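A trimmed sketch of that change, reusing the field names and bounds from the question: the range clause moves into the bool query as a filter, and the days_filter wrapper inside the histogram is no longer needed. The other top-level options (version, sort, stored_fields, docvalue_fields) are left out here only for brevity.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "( flow.src_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) OR flow.dst_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) ) AND flow.traffic_locality: \"private\"",
            "analyze_wildcard": true,
            "default_field": "*"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "#timestamp": {
              "gt": "now-6d",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "target_traffic": {
      "date_histogram": {
        "field": "#timestamp",
        "interval": "1d",
        "time_zone": "Asia/Kolkata",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "now-6d/d",
          "max": "now"
        }
      },
      "aggs": {
        "in_bytes": { "sum": { "field": "netflow.in_bytes" } },
        "out_bytes": { "sum": { "field": "netflow.out_bytes" } }
      }
    }
  }
}
```

With min_doc_count set to 0 and extended_bounds kept, days without any matching documents should still come back as empty buckets for the graph.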

Elasticsearch - adding a separate query for aggregation

Below is the Elasticsearch query I am using to get the results, along with the filter options for those results from the aggregation. The problem is that whenever someone applies a filter, the overall result set changes, and hence the filter options change too. I do not want the filter options to change unless the query parameters change. For now I am making two calls:
Get all results without aggregation
Get all filters by using aggregation and setting the size parameter to 0
This approach uses two API requests and hence doubles the time. Can this be done in a single request?
First call: All results without aggregation
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "cooking",
"boost": 2,
"slop": 10
}
}
},
{
"match": {
"title": {
"query": "cooking",
"boost": 1
}
}
}
],
"minimum_should_match": 1,
"filter": [
{
"match": {
"is_paid": false
}
}
]
}
},
"sort": [],
"from": 0,
"size": 15
}
Second call: getting filters
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "cooking",
"boost": 2,
"slop": 10
}
}
},
{
"match": {
"title": {
"query": "cooking",
"boost": 1
}
}
}
],
"minimum_should_match": 1
}
},
"size": 0,
"aggs": {
"courseCount": {
"terms": {
"field": "provider",
"size": 100
}
},
"paidCount": {
"terms": {
"field": "is_paid",
"size": 3
}
},
"subjectCount": {
"terms": {
"field": "subject",
"size": 30
}
},
"levelCount": {
"terms": {
"field": "level",
"size": 4
}
},
"pacingCount": {
"terms": {
"field": "pacing_type",
"size": 4
}
}
}
}
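One single-request option is post_filter, which Elasticsearch applies to the hits after the aggregations have been computed, so the aggregation counts reflect only the main query. The sketch below simply merges the two calls above on that basis: the user-selected is_paid filter moves from the bool filter into post_filter, and nothing else is changed.

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "title": { "query": "cooking", "boost": 2, "slop": 10 }
          }
        },
        {
          "match": {
            "title": { "query": "cooking", "boost": 1 }
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "post_filter": {
    "match": { "is_paid": false }
  },
  "aggs": {
    "courseCount": { "terms": { "field": "provider", "size": 100 } },
    "paidCount": { "terms": { "field": "is_paid", "size": 3 } },
    "subjectCount": { "terms": { "field": "subject", "size": 30 } },
    "levelCount": { "terms": { "field": "level", "size": 4 } },
    "pacingCount": { "terms": { "field": "pacing_type", "size": 4 } }
  },
  "sort": [],
  "from": 0,
  "size": 15
}
```

The hits are narrowed by the is_paid filter, while the five terms aggregations are still computed over everything that matches "cooking". If some facets should react to other facets' selections, each aggregation can additionally be wrapped in its own filter aggregation.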

elasticsearch facets OR filter

I have a problem with my Elasticsearch DSL: when using facet navigation and applying a facet filter, the next set of results doesn't include any facets, even though I've asked for them.
When I do the initial search, I get the results I want back:
{
"sort": {
"_score": {},
"salesQuantity": {
"order": "asc"
}
},
"query": {
"filtered": {
"query": {
"match": {
"categoryTree": "D01"
}
},
"filter": {
"term": {
"publicwebEnabled": true,
"parentID": 0
}
}
}
},
"facets": {
"delivery_locations": {
"terms": {
"field": "delivery_locations",
"all_terms": true
}
},
"categories": {
"terms": {
"field": "categoryTree",
"all_terms": true
}
},
"collectable": {
"terms": {
"field": "collectable",
"all_terms": true
}
}
},
"from": 0,
"size": 12}
When I then apply a filter like so, the results I get back do not include the facets:
{
"sort": {
"_score": {},
"salesQuantity": {
"order": "asc"
}
},
"query": {
"filtered": {
"query": {
"match": {
"categoryTree": "D01"
}
},
"filter": {
"term": {
"publicwebEnabled": true,
"parentID": 0
},
"or": [
{
"range": {
"Retail_Price": {
"to": "49.99",
"from": "0"
}
}
}
]
}
}
},
"facets": {
"delivery_locations": {
"terms": {
"field": "delivery_locations",
"all_terms": true
}
},
"categories": {
"terms": {
"field": "categoryTree",
"all_terms": true
}
},
"collectable": {
"terms": {
"field": "collectable",
"all_terms": true
}
}
},
"from": 0,
"size": 12}
NOTE, I'm adding the OR filter above - because users may choose multiple price ranges to filter on.
Am I doing something wrong?
I want the new facets returned as altering the prices would obviously alter the facet counts of the other facets...
Add the original term filter inside the or filter, or add a bool filter that wraps your whole filter in a single boolean expression. I don't think you can combine the two filters just by comma-separating them like that.
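A sketch of the second option, in the same pre-2.0 filtered/facets syntax the question uses. Splitting the term filter into one clause per field is an assumption on my part (a term filter normally takes a single field); the sort, facets, from and size sections stay as they are.

```json
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "categoryTree": "D01"
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "publicwebEnabled": true } },
            { "term": { "parentID": 0 } },
            {
              "or": [
                {
                  "range": {
                    "Retail_Price": { "from": "0", "to": "49.99" }
                  }
                }
              ]
            }
          ]
        }
      }
    }
  }
}
```

Additional price ranges chosen by the user would go in as further range entries inside the or array, so a product matches if it falls in any of the selected ranges while still satisfying both term clauses.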
