Multiple metric sub-aggregations with Elasticsearch

I am aware that Elasticsearch supports sub-aggregations under bucket aggregations (a bucket aggregation can have bucket or metric sub-aggregations), and that metric aggregations cannot have sub-aggregations. Maybe that makes sense, but here is my use case.
I have a terms aggregation as the parent, with another terms aggregation as its child. The child terms aggregation has a child of type top_hits. top_hits is a metric aggregation, so it can't take any child aggregation. I now need to add an avg aggregation to the mix. Since top_hits is the last aggregation in the tree and is a metric aggregation, avg can't be its child.
The following is the desired aggregation structure. (Of course it's invalid, since top_hits is a metric aggregation, and the same is true for avg.)
{
"aggregations": {
"top_makes": {
"terms": {
"field": "make"
},
"aggregations": {
"top_models": {
"terms": {
"field": "model"
},
"aggregations": {
"top_res": {
"top_hits": {
"_source": {
"include": [
"model",
"color"
]
},
"size": 10
}
}
}
}
},
"aggregations": {
"avg_length": {
"avg": {
"field": "vlength"
}
}
}
}
}
}
What's the workaround or best way to address this?

I think this will work; please verify:
{
"aggregations": {
"top_makes": {
"terms": {
"field": "make"
},
"aggregations": {
"top_models": {
"terms": {
"field": "model"
},
"aggregations": {
"top_res": {
"top_hits": {
"_source": {
"include": [
"model",
"color"
]
},
"size": 10
}
}
}
},
"avg_length": {
"avg": {
"field": "vlength"
}
}
}
}
}
}
The point is that a parent aggregation can have one or more sibling sub-aggregations.
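The sibling rule can be checked mechanically. The sketch below is in Python purely for illustration; the helper name `find_metric_parents` and the short list of metric types are assumptions, not part of Elasticsearch. It walks an aggregation body and reports any metric aggregation that has been given sub-aggregations.

```python
import json

# Assumed, non-exhaustive list of metric aggregation types for this sketch.
METRIC_TYPES = {"avg", "sum", "min", "max", "stats", "value_count", "cardinality", "top_hits"}
SUB_KEYS = ("aggregations", "aggs")


def find_metric_parents(aggs, path=""):
    """Return paths of metric aggregations that declare sub-aggregations."""
    problems = []
    for name, body in aggs.items():
        here = f"{path}>{name}" if path else name
        children = {}
        for key in SUB_KEYS:
            children.update(body.get(key, {}))
        if children and body.keys() & METRIC_TYPES:
            problems.append(here)
        problems.extend(find_metric_parents(children, here))
    return problems


# avg_length sits next to top_models, so no metric aggregation has children.
valid = {
    "top_makes": {
        "terms": {"field": "make"},
        "aggregations": {
            "top_models": {
                "terms": {"field": "model"},
                "aggregations": {
                    "top_res": {"top_hits": {"_source": {"include": ["model", "color"]}, "size": 10}}
                },
            },
            "avg_length": {"avg": {"field": "vlength"}},
        },
    }
}

# avg_length nested under top_hits, which Elasticsearch rejects.
invalid = {
    "top_res": {
        "top_hits": {"size": 10},
        "aggregations": {"avg_length": {"avg": {"field": "vlength"}}},
    }
}

print(find_metric_parents(valid))
print(find_metric_parents(invalid))
print(json.dumps({"aggregations": valid}, indent=2))
```

The first call reports no offending paths, while the second flags `top_res`, the top_hits aggregation that was given a child.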

Related

How to do proportions in Elastic search query

I have a field in my data that has four unique values across all records. I need to aggregate the records by each unique value and find the proportion of each value in the data, that is, (number of records with each unique value) / (total number of records). Is there a way to do this with Elasticsearch dashboards? I used a terms aggregation to group by the field and a value_count metric aggregation to get the doc_count value, but I am not able to use bucket_script to do the division. I am getting the error "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [StringTerms] at aggregation [latest_version]".
Below is my code:
{
"size": 0,
"aggs": {
"BAR": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
},
"aggs": {
"latest_version": {
"filter": {
"match_phrase": {
"log": "main_filter"
}
},
"aggs": {
"latest_version_count": {
"terms": {
"field": "field_name"
},
"aggs": {
"version_count": {
"value_count": {
"field": "field_name"
}
}
}
},
"sum_buckets": {
"sum_bucket": {
"buckets_path": "latest_version_count>_count"
}
}
}
},
"BAR-percentage": {
"bucket_script": {
"buckets_path": {
"eachVersionCount": "latest_version>latest_version_count",
"totalVersionCount": "latest_version>sum_buckets"
},
"script": "params.eachVersionCount/params.totalVersionCount"
}
}
}
}
}
}
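The error arises because `latest_version>latest_version_count` resolves to a terms aggregation, which holds many buckets rather than a single number. One hedged workaround, sketched in Python, is to keep the terms aggregation and compute the proportions client-side from each bucket's doc_count. The function name `bucket_proportions` and the sample response are hypothetical.

```python
def bucket_proportions(terms_agg):
    """Map each bucket key to doc_count / total doc_count over the returned buckets."""
    buckets = terms_agg["buckets"]
    total = sum(b["doc_count"] for b in buckets)
    if total == 0:
        return {}
    return {b["key"]: b["doc_count"] / total for b in buckets}


# Hypothetical fragment of a response: one day's "latest_version_count" result.
sample = {
    "buckets": [
        {"key": "v1", "doc_count": 50},
        {"key": "v2", "doc_count": 30},
        {"key": "v3", "doc_count": 15},
        {"key": "v4", "doc_count": 5},
    ]
}

for key, share in bucket_proportions(sample).items():
    print(f"{key}: {share:.2f}")
```

The total here covers only the buckets that were returned, which is enough when the field has four values and the terms aggregation returns all of them.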

Deduplicate and perform composite aggregation on deduced result

I have an index in Elasticsearch that contains data for daily transactions. Each document has mainly the following fields:
TxnId, Status, TxnType, userId
Two documents can have the same TxnId.
I'm looking for a query that aggregates over Status and TxnType for unique TxnIds. Basically, something like: select unique txnIds from user_table group by status, txnType.
I have an ES query that dedups on TxnId, and another ES query that performs a composite aggregation on status and txnType. I want to do both in a single query.
I tried the collapse feature, and I also tried cardinality and dedup features, but the query is not giving the correct output:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"streamSource": 3
}
}
]
}
},
"collapse": {
"field": "txnId"
},
"aggs": {
"buckets": {
"composite": {
"size": 30,
"sources": [
{
"status": {
"terms": {
"field": "status"
}
}
},
{
"txnType": {
"terms": {
"field": "txnType"
}
}
}
]
}
}
}
}
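`collapse` only affects the returned hits; aggregations still run over every matching document, which would explain the unexpected counts. A possible single-query approach, sketched here in Python as a request body, is to add a `cardinality` sub-aggregation on `txnId` to each composite bucket. The name `unique_txns` and the function name are assumptions, and `cardinality` returns an approximate count.

```python
import json


def unique_txn_counts_body(stream_source=3, page_size=30):
    """Build a search body counting distinct txnId values per (status, txnType)."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {"streamSource": stream_source}}]}},
        "aggs": {
            "buckets": {
                "composite": {
                    "size": page_size,
                    "sources": [
                        {"status": {"terms": {"field": "status"}}},
                        {"txnType": {"terms": {"field": "txnType"}}},
                    ],
                },
                "aggs": {
                    # Approximate number of distinct txnId values in this bucket.
                    "unique_txns": {"cardinality": {"field": "txnId"}}
                },
            }
        },
    }


print(json.dumps(unique_txn_counts_body(), indent=2))
```

If two documents share a txnId but differ in status or txnType, that txnId is counted once in each bucket it falls into.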

Elasticsearch bucket_selector to include parent aggregation size

I found this query in the Elasticsearch docs:
GET /seats/_search
{
"size": 0,
"aggs": {
"theatres": {
"terms": {
"field": "theatre",
"size": 10
},
"aggs": {
"max_cost": {
"max": {
"field": "cost"
}
},
"filtering_agg": {
"bucket_selector": {
"buckets_path": {
"max": "max_cost"
},
"script": {
"params": {
"base_cost": 5
},
"source": "params.max + params.base_cost > 10"
}
}
}
}
}
}
}
Here size is 10 in the theatres parent aggregation, and the child aggregation applies a condition to each bucket. So if the parent aggregation returns 10 buckets and the bucket_selector filter removes 4 of them, the final result has 6 buckets. I want the final result to have 10 buckets, as specified by the parent aggregation's size field. Is there any way to achieve this? Can we apply a size after bucket_selector and remove size from the parent aggregation?
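`bucket_selector` runs after the terms aggregation has already kept its top `size` buckets, so it can only shrink the result. One hedged workaround is to request more buckets than needed and trim to 10 afterwards. The Python sketch below does the trim client-side; `OVERFETCH` and the helper names are assumptions. Elasticsearch also has a `bucket_sort` pipeline aggregation with a `size` option that may do the trim server-side, though its ordering relative to `bucket_selector` should be verified on your version.

```python
import json

WANTED = 10      # number of buckets the caller finally wants
OVERFETCH = 50   # assumed headroom; tune to how many buckets the filter drops


def theatres_body(fetch_size=OVERFETCH):
    """Same query as above, but asking terms for more buckets than WANTED."""
    return {
        "size": 0,
        "aggs": {
            "theatres": {
                "terms": {"field": "theatre", "size": fetch_size},
                "aggs": {
                    "max_cost": {"max": {"field": "cost"}},
                    "filtering_agg": {
                        "bucket_selector": {
                            "buckets_path": {"max": "max_cost"},
                            "script": {
                                "params": {"base_cost": 5},
                                "source": "params.max + params.base_cost > 10",
                            },
                        }
                    },
                },
            }
        },
    }


def first_buckets(response, wanted=WANTED):
    """Keep at most `wanted` of the buckets that survived the selector."""
    return response["aggregations"]["theatres"]["buckets"][:wanted]


print(json.dumps(theatres_body(), indent=2))
```

If fewer than 10 buckets survive even with the larger fetch size, the result is still short, so the headroom has to be chosen with the data in mind.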

Excluding inner hits from top hits aggregation with source filter

In my query, I am using inner_hits to return the list of nested objects that match my query.
I then add an aggregation on my document's categoryId, and then a top_hits aggregation to get the display name for that category.
"aggs": {
"category": {
"terms": {
"field": "categoryId",
"size": 100
},
"aggs": {
"category_value": {
"top_hits": {
"size": 1,
"_source": {
"includes": "categoryName"
}
}
}
}
}
}
Now, when I look at the aggregation buckets, I do get a _source document with only the categoryName property, but I also get the entire inner_hits collection:
{
...
"_source": {
"categoryName": "Armchairs"
},
"inner_hits": {
"my_inner_hits": {
"hits": {
"total": 260,
"max_score": null,
"hits": [{
...
"_source": {
//nested document here
}
}
]
}
}
}
}
Is there a way to not include the inner_hits data in a top_hits aggregation?
Since you only need a single field, I suggest getting rid of the top_hits aggregation and using another terms aggregation for the name:
{
...
"aggs": {
"category": {
"terms": {
"field": "categoryId",
"size": 100
},
"aggs": {
"category_value": {
"terms": {
"field": "categoryName",
"size": 1
}
}
}
}
}
}
That will also be a little more efficient.
UPDATE:
Another way to keep using terms/top_hits is to leverage response filtering and return only what you need. For instance, appending this to your URL will make sure that you won't find any inner hits inside your aggregation:
?filter_path=hits.hits,aggregations.**.key,aggregations.**.doc_count,aggregations.**.hits.hits.hits._source
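If neither option fits, the unwanted data can also be dropped after the response arrives. This Python sketch is a client-side alternative, not an Elasticsearch feature; `strip_key` is a hypothetical helper that removes every `inner_hits` entry from a parsed response fragment.

```python
def strip_key(node, key="inner_hits"):
    """Return a copy of a parsed JSON value with every `key` entry removed."""
    if isinstance(node, dict):
        return {k: strip_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_key(item, key) for item in node]
    return node


# Hypothetical, trimmed-down top_hits result for one category bucket.
bucket = {
    "key": 42,
    "doc_count": 260,
    "category_value": {
        "hits": {
            "hits": [
                {
                    "_source": {"categoryName": "Armchairs"},
                    "inner_hits": {"my_inner_hits": {"hits": {"total": 260, "hits": []}}},
                }
            ]
        }
    },
}

print(strip_key(bucket))
```

Unlike `filter_path`, this does not reduce what is sent over the wire; it only tidies the structure the application works with.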

ElasticSearch - Ordering aggregation by nested aggregation on nested field

{
"query": {
"match_all": {}
},
"from": 0,
"size": 0,
"aggs": {
"itineraryId": {
"terms": {
"field": "iid",
"size": 2147483647,
"order": [
{
"price>price>price.max": "desc"
}
]
},
"aggs": {
"duration": {
"stats": {
"field": "drn"
}
},
"price": {
"nested": {
"path": "prl"
},
"aggs": {
"price": {
"filter": {
"terms": {
"prl.cc.keyword": [
"USD"
]
}
},
"aggs": {
"price": {
"stats": {
"field": "prl.spl.vl"
}
}
}
}
}
}
}
}
}
}
Here, I am getting the error:
"Invalid terms aggregation order path [price>price>price.max]. Terms
buckets can only be sorted on a sub-aggregator path that is built out
of zero or more single-bucket aggregations within the path and a final
single-bucket or a metrics aggregation at the path end. Sub-path
[price] points to non single-bucket aggregation"
The query works fine if I order by the duration aggregation, like
"order": [
{
"duration.max": "desc"
}
]
So is there any way to order the aggregation by a nested aggregation on a nested field, i.e. something like below?
"order": [
{
"price>price>price.max": "desc"
}
]
As Val pointed out in the comments, ES does not support this yet.
Until then, you can run the nested aggregation first and then use a reverse_nested aggregation to aggregate the duration, which lives at the root of the document.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html
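Another stopgap, separate from the reverse_nested suggestion, is to drop the `order` clause and sort the returned buckets in the client. The Python sketch below assumes the response keeps the `price>price>price` nesting from the query; `sort_by_nested_max` is a hypothetical helper, and buckets with no USD price (where `max` is null) are placed last.

```python
def sort_by_nested_max(buckets):
    """Sort itineraryId buckets by the nested USD price max, highest first."""
    def nested_max(bucket):
        value = bucket["price"]["price"]["price"]["max"]
        # Buckets without a USD price have max == None; sort them to the end.
        return (value is not None, value if value is not None else 0.0)

    return sorted(buckets, key=nested_max, reverse=True)


# Hypothetical buckets shaped like the "itineraryId" aggregation response.
buckets = [
    {"key": "a", "price": {"price": {"price": {"max": 120.0}}}},
    {"key": "b", "price": {"price": {"price": {"max": None}}}},
    {"key": "c", "price": {"price": {"price": {"max": 310.5}}}},
]

print([b["key"] for b in sort_by_nested_max(buckets)])
```

This only gives a complete ordering when every bucket is returned, as it is with the very large `size` in the query above.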
