Elasticsearch bucket_selector to include parent aggregation size

I found this query in the Elasticsearch docs:
GET /seats/_search
{
  "size": 0,
  "aggs": {
    "theatres": {
      "terms": {
        "field": "theatre",
        "size": 10
      },
      "aggs": {
        "max_cost": {
          "max": {
            "field": "cost"
          }
        },
        "filtering_agg": {
          "bucket_selector": {
            "buckets_path": {
              "max": "max_cost"
            },
            "script": {
              "params": {
                "base_cost": 5
              },
              "source": "params.max + params.base_cost > 10"
            }
          }
        }
      }
    }
  }
}
Here size is 10 in the parent theatres terms aggregation, and the child aggregation applies a condition to each bucket. If the parent aggregation returns 10 buckets and the bucket_selector removes 4 of them, the final result has only 6 buckets. I want the final result to contain 10 buckets, as specified by the parent aggregation's size field. Is there any way to achieve this? Can we apply a size after the bucket_selector and remove the size from the parent aggregation?
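One possible direction, sketched in Python and not verified against any particular cluster: ask the terms aggregation for more buckets than needed, let bucket_selector filter them, then cut the survivors down to the wanted count. The bucket_sort pipeline aggregation accepts a size without a sort, but whether it runs after the bucket_selector is an assumption to check on your version, so a client-side trim is included as a fallback. The names build_query, trim_buckets, overfetch and limit are made up for this example.

```python
def build_query(wanted: int = 10, overfetch: int = 5) -> dict:
    """Build a /seats/_search body that requests more theatre buckets than needed."""
    return {
        "size": 0,
        "aggs": {
            "theatres": {
                # Over-fetch so that enough buckets can survive the filter.
                "terms": {"field": "theatre", "size": wanted * overfetch},
                "aggs": {
                    "max_cost": {"max": {"field": "cost"}},
                    "filtering_agg": {
                        "bucket_selector": {
                            "buckets_path": {"max": "max_cost"},
                            "script": {
                                "params": {"base_cost": 5},
                                "source": "params.max + params.base_cost > 10",
                            },
                        }
                    },
                    # Assumption: this truncation is applied after the selector.
                    "limit": {"bucket_sort": {"size": wanted}},
                },
            }
        },
    }


def trim_buckets(response: dict, wanted: int = 10) -> list:
    """Client-side fallback: keep at most `wanted` of the surviving buckets."""
    return response["aggregations"]["theatres"]["buckets"][:wanted]
```

Even with over-fetching, fewer than 10 buckets come back if fewer than 10 of the fetched buckets pass the condition, so the factor has to be chosen with the data in mind.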

Related

How to do proportions in an Elasticsearch query

I have a field in my data that has four unique values across all records. I need to aggregate the records by each unique value and find the proportion of each value in the data, that is, (number of records with each unique value) / (total number of records). Is there a way to do this with Elasticsearch dashboards? I used a terms aggregation to group by the field and applied a value_count metric aggregation to get the doc_count value, but I am not able to use bucket_script to do the division. I get the error "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [StringTerms] at aggregation [latest_version]".
Below is my code:
{
  "size": 0,
  "aggs": {
    "BAR": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "latest_version": {
          "filter": {
            "match_phrase": {
              "log": "main_filter"
            }
          },
          "aggs": {
            "latest_version_count": {
              "terms": {
                "field": "field_name"
              },
              "aggs": {
                "version_count": {
                  "value_count": {
                    "field": "field_name"
                  }
                }
              }
            },
            "sum_buckets": {
              "sum_bucket": {
                "buckets_path": "latest_version_count>_count"
              }
            }
          }
        },
        "BAR-percentage": {
          "bucket_script": {
            "buckets_path": {
              "eachVersionCount": "latest_version>latest_version_count",
              "totalVersionCount": "latest_version>sum_buckets"
            },
            "script": "params.eachVersionCount/params.totalVersionCount"
          }
        }
      }
    }
  }
}
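The error message indicates that the path latest_version>latest_version_count resolves to the terms aggregation itself (many buckets) rather than to a single number. If doing the division on the client is acceptable, a minimal sketch looks like this. It assumes the usual response shape for the query above, and proportions_per_day is a hypothetical helper name, not part of any client library.

```python
def proportions_per_day(response: dict) -> dict:
    """Map each day to {term value: share of that day's term-bucket documents}."""
    result = {}
    for day in response["aggregations"]["BAR"]["buckets"]:
        term_buckets = day["latest_version"]["latest_version_count"]["buckets"]
        # Same quantity the sum_bucket aggregation computes from _count.
        total = sum(bucket["doc_count"] for bucket in term_buckets)
        label = day.get("key_as_string", day["key"])
        result[label] = {
            bucket["key"]: (bucket["doc_count"] / total if total else 0.0)
            for bucket in term_buckets
        }
    return result
```

With only four unique values, the default terms size of 10 returns every value, so the per-day shares add up to 1.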

Is it possible to run item aggregations only on the items that are returned in the current query?

I'm attempting to enrich my documents with some aggregations about them:
"item": {
  "terms": {
    "field": "_id",
    "size": 50
  },
  "aggs": {
    "avgSize": {
      "avg": {
        "field": "contracts.dollarsObligated"
      }
    },
    "awardingAgency": {
      "terms": {
        "field": "contracts.awardingAgency.keyword"
      }
    }
  }
}
I'm going to return 50 results per page. Is there any way to ensure that those aggregations are run only on the 50 results returned by this query, or should I do a second query for that?
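Aggregations are computed over every document that matches the query, not only over the page of hits that is returned. A second request restricted to the IDs of the current page is one way to scope them. Here is a rough sketch of building that request body; page_aggs_query is a hypothetical helper, and the search response is assumed to have the standard hits.hits shape.

```python
def page_aggs_query(search_response: dict, aggs: dict) -> dict:
    """Build a follow-up request that aggregates only over the hits of one page."""
    page_ids = [hit["_id"] for hit in search_response["hits"]["hits"]]
    return {
        "size": 0,  # only the aggregations are needed from this request
        "query": {"ids": {"values": page_ids}},
        "aggs": aggs,
    }
```

The aggs argument would be the same "item" aggregation shown in the question, wrapped in a dict keyed by its name.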

Elasticsearch top_hits aggregation

I have to get top N documents from multiple indices, then group the resulting set by index. I've tried the following:
{
  "size": 0,
  "query": {
    "multi_match": {
      "query": "some term"
    }
  },
  "aggs": {
    "by_index": {
      "terms": {
        "field": "_index"
      },
      "aggs": {
        "top_results": {
          "top_hits": {
            "size": 20
          }
        }
      }
    }
  }
}
It aggregates results by _index and then limits each group to N (20) documents. But I need to receive no more than 20 documents in total.
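One way to cap the total at 20, sketched here as an assumption and not as the only option, is to drop the aggregation, request the top 20 hits directly, and group them by _index on the client. The helper names top_n_query and group_hits_by_index are made up for this example.

```python
from collections import defaultdict


def top_n_query(term: str, n: int = 20) -> dict:
    """Plain search body that returns at most n hits across all target indices."""
    return {"size": n, "query": {"multi_match": {"query": term}}}


def group_hits_by_index(search_response: dict) -> dict:
    """Group the returned hits by index, keeping their original order per group."""
    groups = defaultdict(list)
    for hit in search_response["hits"]["hits"]:
        groups[hit["_index"]].append(hit)
    return dict(groups)
```

Because the hits arrive already sorted, each group keeps the relative ranking of its documents.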

Search documents from an index whose _ids do not exist in another index

Is there any way to search for documents in one index whose _ids do not exist in another index? Something like NOT EXISTS in MySQL.
There is no convenient way to do this within Elasticsearch, other than using aggregations as a workaround:
GET index-a,index-b/_search
{
  "size": 0,
  "aggs": {
    "group_by_id": {
      "terms": {
        "field": "_id",
        "size": 1000
      },
      "aggs": {
        "contained_in_indices_count": {
          "cardinality": {
            "field": "_index"
          }
        },
        "filter_only_differences": {
          "bucket_selector": {
            "buckets_path": {
              "count": "contained_in_indices_count"
            },
            "script": "params.count < 2"
          }
        }
      }
    }
  }
}
Then you'll need to iterate over all buckets in the group_by_id aggregation. Consider using a larger size, as it's 10 by default and 1000 in my example. If there are more differences between your indices than fit in a single response, you need to use bucket partitioning as described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions
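The partitioning step could be sketched like this in Python. It only builds one request body per partition and reads the bucket keys back; sending the requests is left to whichever client you use. partition_queries and ids_in_one_index are hypothetical names, and the number of partitions is something you would have to choose for your data.

```python
def partition_queries(num_partitions: int, size: int = 1000):
    """Yield one search body per partition of the _id terms aggregation."""
    for partition in range(num_partitions):
        yield {
            "size": 0,
            "aggs": {
                "group_by_id": {
                    "terms": {
                        "field": "_id",
                        # Each request only sees the IDs hashed into this partition.
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        "size": size,
                    },
                    "aggs": {
                        "contained_in_indices_count": {
                            "cardinality": {"field": "_index"}
                        },
                        "filter_only_differences": {
                            "bucket_selector": {
                                "buckets_path": {
                                    "count": "contained_in_indices_count"
                                },
                                "script": "params.count < 2",
                            }
                        },
                    },
                }
            },
        }


def ids_in_one_index(response: dict) -> list:
    """Collect the IDs that survived the bucket_selector in one response."""
    return [b["key"] for b in response["aggregations"]["group_by_id"]["buckets"]]
```

Note that a count below 2 only says the ID is present in one of the two indices, so the result contains IDs missing from either side.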

multiple metric sub aggregations situation with ElasticSearch

I am aware that Elasticsearch supports sub-aggregations under bucket aggregations (a bucket aggregation can have bucket or metric sub-aggregations), and that sub-aggregations aren't possible under metric aggregations. Maybe that makes sense, but here is my use case.
I have a terms aggregation as the parent, with another terms aggregation as its child. The child terms aggregation has a child aggregation of type top_hits. top_hits is a metric aggregation, so it can't take any child aggregation. Now I need to add an avg aggregation to the mix. Since top_hits is the last aggregation in the tree and is a metric aggregation, it can't have avg as a child.
The following are the desired aggregation levels (of course this is invalid, given that top_hits is a metric aggregation, and the same is true for the avg aggregation):
{
  "aggregations": {
    "top_makes": {
      "terms": {
        "field": "make"
      },
      "aggregations": {
        "top_models": {
          "terms": {
            "field": "model"
          },
          "aggregations": {
            "top_res": {
              "top_hits": {
                "_source": {
                  "include": [
                    "model",
                    "color"
                  ]
                },
                "size": 10
              }
            }
          }
        }
      },
      "aggregations": {
        "avg_length": {
          "avg": {
            "field": "vlength"
          }
        }
      }
    }
  }
}
What's the workaround or best way to address this?
I think this will work, but please verify:
{
  "aggregations": {
    "top_makes": {
      "terms": {
        "field": "make"
      },
      "aggregations": {
        "top_models": {
          "terms": {
            "field": "model"
          },
          "aggregations": {
            "top_res": {
              "top_hits": {
                "_source": {
                  "include": [
                    "model",
                    "color"
                  ]
                },
                "size": 10
              }
            },
            "avg_length": {
              "avg": {
                "field": "vlength"
              }
            }
          }
        }
      }
    }
  }
}
The point is that a parent aggregation can have one or more sibling sub-aggregations.
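To show what the sibling layout gives you back, here is a small sketch that walks a response. It assumes avg_length sits next to top_res inside each top_models bucket; if you put the average one level up, it would be read from the make bucket instead. model_summaries is a made-up helper name.

```python
def model_summaries(response: dict) -> list:
    """Return (make, model, average length, number of top hits) per model bucket."""
    rows = []
    for make in response["aggregations"]["top_makes"]["buckets"]:
        for model in make["top_models"]["buckets"]:
            rows.append(
                (
                    make["key"],
                    model["key"],
                    model["avg_length"]["value"],  # metric sibling of top_res
                    len(model["top_res"]["hits"]["hits"]),
                )
            )
    return rows
```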
