How to use multiple Composite Aggregations in ElasticSearch? - elasticsearch

I am trying to obtain two composite aggregations in ElasticSearch but the second one is always giving me an empty bucket.
GET /resolutions/_search
{
"query": {
"query_string": {
"query": "*"
}
},
"aggs": {
"total": {
"composite": {
"sources": [
{"doi": {"terms": {"field": "doi"}}},
{"access_method": {"terms": {"field": "access_method"}}}
],
"size": 10000
}
},
"unique": {
"composite": {
"sources": [
{"doi": {"terms": {"field": "doi"}}},
{"access_method": {"terms": {"field": "access_method"}}},
{"session": {"terms": {"field": "session"}}}
],
"size": 10000
}
}
},
"size": 0,
"track_total_hits": false
}
In the response, the first aggregation (total) returns thousands of buckets, but the second aggregation (unique) is always empty. I have tried swapping the order of the aggregations, and it is always the second one in the request that comes back empty.
[Screenshot: response with the second bucket empty]
The index mappings are at: https://github.com/datacite/shiba-inu/blob/2d632d341a22a8dca2afec3b01c3b34030144c9c/templates/aggregating_es.json
Why is it returning an empty bucket?

The "after_key" in the response indicates that more results may remain: the search returned only the first page. To fetch the next page, repeat the same request with "after" set to the value of "after_key". Keep doing this with each new after_key until the response no longer contains one.
Example from the Elasticsearch documentation:
GET /_search
{
"size": 0,
"aggs": {
"my_buckets": {
"composite": {
"size": 2,
"sources": [
{ "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } },
{ "product": { "terms": { "field": "product", "order": "asc" } } }
],
"after": { "date": 1494288000000, "product": "mad max" }
}
}
}
}
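The pagination loop described above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function name iter_composite_buckets and the search callable are hypothetical, and search is assumed to be any function that sends the request body to the index's _search endpoint and returns the parsed JSON response.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def iter_composite_buckets(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    agg_name: str,
    sources: List[Dict[str, Any]],
    page_size: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield every bucket of one composite aggregation, page by page.

    `search` is an assumed helper: it takes a request body and returns
    the parsed search response as a dict.
    """
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {"size": page_size, "sources": sources}
        if after is not None:
            # Resume right after the last bucket of the previous page.
            composite["after"] = after
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        result = search(body)["aggregations"][agg_name]
        yield from result["buckets"]
        # A missing after_key means there are no further pages.
        after = result.get("after_key")
        if after is None:
            break
```

Each composite aggregation from the question ("total" and "unique") would be paged through separately this way, since each one has its own after_key.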

Related

Elastic - Filter after selecting top 5 hits

I'm using the alerting feature in Kibana and I want to check if the last 5 consecutive values of a field exceed a threshold x but if I use a filter in my elastic query, it gets applied before the top N aggregation.
Is there a way to apply the filter afterwards, or to check whether the last consecutive values exceed a threshold using some other selector or method? I don't want to check this in the trigger condition in Painless, because that returns all the documents in the ctx and not just the ones that exceeded the threshold, which are the ones I want to display in my alert message.
I've been stuck on this for a while and have only seen blog posts saying that sub-aggregation is not possible on top N, so any help or workaround would be much appreciated.
This is my query :
{
"size": 500,
"query": {
"bool": {
"filter": [
{
"match_all": {
"boost": 1
}
},
{
"match_phrase": {
"client.id": {
"query": "42",
"slop": 0,
"zero_terms_query": "NONE",
"boost": 1
}
}
},
{
"range": {
"@timestamp": {
"from": "{{period_end}}||-10m",
"to": "{{period_end}}",
"include_lower": true,
"include_upper": true,
"format": "epoch_millis",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"aggs": {
"2": {
"terms": {
"field": "component.name",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"3": {
"terms": {
"field": "client.name.keyword",
"order": {
"_key": "desc"
},
"size": 5
},
"aggs": {
"1": {
"top_hits": {
"docvalue_fields": [
{
"field": "gc.oldgen.used",
"format": "use_field_mapping"
}
],
"_source": "gc.oldgen.used",
"size": 5,
"sort": [
{
"@timestamp": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}
Have you tried a filter sub-aggregation?
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
Alternatively, you can use a pipeline aggregation to manipulate the results of your aggregations:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html
By the way, a term query on the client id looks more appropriate than match_phrase.

Aggregating against two fields returns nulls for one of them

I've got an index with a lot of records with many fields, including "cacheName" & "cache_ip". Each unique value of "cacheName" has 1 or more records with 1 or more values of corresponding "cache_ip". Each record has a unique 'ts' (timestamp) field as well. For example:
{
"cacheName": "c001.abc001.xyz",
"cache_ip": "1.1.1.0",
},
{
"cacheName": "c001.abc001.xyz",
"cache_ip": "1.1.2.0",
},
{
"cacheName": "c002.efg001.mno",
"cache_ip": "1.1.9.1",
},
{
"cacheName": "c002.efg001.mno",
"cache_ip": "1.1.9.1",
},
I'm trying to craft a search that will return, at most, each unique 'cacheName' & 'cache_ip' record. For the above example, I would get back a total of 3 hits ("cacheName"="c002.efg001.mno" would only be returned once, since it only has one unique permutation).
This is the closest that I've come, but it always returns a Null value for "cache_ip" instead of the actual value (there are no null values in the actual data):
{
"size": 0, 'sort': [{'ts': {'order': 'desc'}}],
"query": {
"bool": {
"must": [
{"match_all": {}},
{"range": {'ts': {'gte': '20200818T010100Z', 'format': 'basic_date_time_no_millis'}}},
]
}
},
"aggs": {
"cacheName": {
"terms": {
"field": "cacheName",
"size": 10000, "order": {"_key": "desc"},
},
"aggs": {
"cache_ip": {"terms": {"field": "cache_ip"}},
},
},
},
}
I'd appreciate any insight, as I'm pulling my hair out trying to make this work.
thanks!
One way to achieve what you want is to use scripting to build every cacheName/cache_ip combination as a single key, so that you no longer need the second terms sub-aggregation:
{
"size": 0,
"sort": [
{
"ts": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"range": {
"ts": {
"gte": "20200818T010100Z",
"format": "basic_date_time_no_millis"
}
}
}
]
}
},
"aggs": {
"cacheName": {
"terms": {
"script": {
"source": "[doc.cacheName.value ?: 'no.name', doc.cache_ip.value ?: 'no.ip'].join('-')"
},
"size": 10000,
"order": {
"_key": "desc"
}
}
}
}
}
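Since the script above joins the two values with '-', each bucket key has to be split again on the client side. A minimal Python sketch follows; the helper name split_pair_buckets is hypothetical, and it assumes the IP part never contains the separator (the cache name may), so it splits on the last '-' only.

```python
from typing import Any, Dict, List, Tuple


def split_pair_buckets(
    response: Dict[str, Any], agg_name: str = "cacheName", sep: str = "-"
) -> List[Tuple[str, str, int]]:
    """Turn 'name-ip' bucket keys back into (name, ip, doc_count) tuples."""
    pairs = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # Split on the LAST separator: assumes the IP contains no '-'.
        name, ip = bucket["key"].rsplit(sep, 1)
        pairs.append((name, ip, bucket["doc_count"]))
    return pairs
```

For the sample records in the question this would give three tuples, one per unique cacheName/cache_ip combination.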

How to convert ElasticSearch query to ES7

We are having a tremendous amount of trouble converting an old ElasticSearch query to a newer version of ElasticSearch. The original query for ES 1.8 is:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"and": [
{
"terms": {
"organization_id": [
"fred"
]
}
}
]
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 0,
"field": "status"
}
},
"tags": {
"terms": {
"size": 0,
"field": "tags"
}
}
}
}
and we are trying to convert it to ES version 7. Does anyone know how to do that?
The Elasticsearch docs for the Filtered query in 6.8 (the latest version of the docs I can find that still has the page) state that you should move the query and filter to the must and filter parameters of a bool query.
Also, the terms aggregation no longer supports setting size to 0 to get Integer.MAX_VALUE. If you really want all the terms, you need to set it to the max value (2147483647) explicitly. However, the documentation for size recommends using the composite aggregation and paginating instead.
Below is the closest query to the original that I could make work with Elasticsearch 7.
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"terms": {
"organization_id": [
"fred"
]
}
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 2147483647,
"field": "status"
}
},
"tags": {
"terms": {
"size": 2147483647,
"field": "tags"
}
}
}
}
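The query part of that rewrite can also be done mechanically. The following Python sketch is illustrative only: filtered_to_bool is a hypothetical helper, and it handles just the shape used in the question (a filtered query whose filter is either a single clause or an "and" list of clauses).

```python
from typing import Any, Dict


def filtered_to_bool(query: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite {"filtered": {...}} as the equivalent {"bool": {...}}."""
    filtered = query["filtered"]
    bool_query: Dict[str, Any] = {}
    if "query" in filtered:
        # The scoring part of the old query becomes bool.must.
        bool_query["must"] = filtered["query"]
    if "filter" in filtered:
        flt = filtered["filter"]
        # An old "and" filter given as a list maps onto bool.filter,
        # where all clauses must match as well.
        if isinstance(flt, dict) and len(flt) == 1 and isinstance(flt.get("and"), list):
            clauses = flt["and"]
            flt = clauses[0] if len(clauses) == 1 else list(clauses)
        bool_query["filter"] = flt
    return {"bool": bool_query}
```

Applied to the "query" section of the original request, this produces the bool query shown above; the aggregation sizes still have to be changed by hand.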

How to mention from and size for the first level of elastic search aggregation in nested aggregation?

I have written a query that groups documents into buckets by id and sorts the hits within each bucket. This works fine. But how can I make it return only the buckets from position 100 to 200 of the aggregation_by_id aggregation?
{
"query": {
"match_all": {}
},
"size": 0,
"aggregations": {
"aggregation_by_id": {
"terms": {
"field": "id.keyword",
"size" : 200
},
"aggs": {
"sort_timestamp": {
"top_hits": {
"sort": [{
"timestamp": {
"order": "desc",
"unmapped_type": "long"
}
}],
"size": 1
}
}
}
}
}
}

Elasticsearch single request to do Union query Top N

I'm not sure how to do a SQL-like union in Elasticsearch. I tried a bool query, but it doesn't meet my requirement yet. For example, the document structure is:
{
"id": "123",
"authorId": 28,
"title": "Five Ways to Tap into...",
"byLine": "ashd jsabbdjs international",
"category": "Cat1"
}
I need to find the top 5 matching "title" values in each "category" when the user types something. This can be done with multiple queries to Elasticsearch, but I was wondering whether it can be done in one request.
Use an aggregation with top_hits sub-aggregation:
{
"size": 0,
"query": {"match_all": {}},
"aggs": {
"categories": {
"terms": {
"field": "category",
"size": 10
},
"aggs": {
"top_5": {
"top_hits": {
"size": 5
}
}
}
}
}
}
Here is a query that returns multiple buckets based on "category":
{
"size": 0,
"query": {
"bool": {
"must": [
{
"terms": {
"authorId": [
1,
28
]
}
}
],
"should": [
{
"query_string": {
"query": "*int*",
"fields": [
"title^2",
"byLine^1"
]
}
}
]
}
},
"aggs": {
"categories": {
"terms": {
"field": "category",
"size": 10
},
"aggs": {
"top_5": {
"top_hits": {
"size": 5
}
}
}
}
}
}
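Reading the result of such a request could look like the following Python sketch. The helper name titles_by_category is hypothetical; it assumes the aggregation names "categories" and "top_5" from the queries above and that each hit's _source contains a "title" field.

```python
from typing import Any, Dict, List


def titles_by_category(
    response: Dict[str, Any],
    agg_name: str = "categories",
    hits_name: str = "top_5",
) -> Dict[str, List[str]]:
    """Map each category bucket key to the titles of its top hits."""
    result: Dict[str, List[str]] = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # top_hits nests its documents under <name>.hits.hits
        hits = bucket[hits_name]["hits"]["hits"]
        result[bucket["key"]] = [hit["_source"]["title"] for hit in hits]
    return result
```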
