Is it possible to do item aggregations only on the aggregations that are returned in the current query? - elasticsearch

I'm attempting to enrich my documents with some aggregations about them.
"item": {
"terms": {
"field": "_id",
"size": 50
},
"aggs": {
"avgSize": {
"avg": {
"field": "contracts.dollarsObligated"
}
},
"awardingAgency": {
"terms": {
"field": "contracts.awardingAgency.keyword"
}
}
}
}
I'm going to return 50 results per page. Is there any way to ensure that those aggregations are run only on the 50 results that are returned in this query? Or should I do a second query for that?
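Aggregations are computed over every document that matches the query, not only over the page of hits that is returned, so one option is the second query mentioned above, restricted to the IDs of the current page. The following is a minimal Python sketch of that idea, not a definitive implementation. It assumes the first response has the standard `hits.hits[]._id` shape; the helper name `build_page_aggs_query` and the sample IDs are hypothetical, and the aggregation body is copied from the question.

```python
def build_page_aggs_query(first_response, aggs):
    """Build a follow-up search body whose aggregations only see the current page."""
    # Collect the document IDs of the hits returned by the first (paged) query.
    page_ids = [hit["_id"] for hit in first_response["hits"]["hits"]]
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "query": {"ids": {"values": page_ids}},  # restrict to the page's documents
        "aggs": aggs,
    }


# Hypothetical first-page response, trimmed to the fields used here.
first_page = {"hits": {"hits": [{"_id": "a1"}, {"_id": "b2"}]}}

# Aggregations copied from the question.
item_aggs = {
    "item": {
        "terms": {"field": "_id", "size": 50},
        "aggs": {
            "avgSize": {"avg": {"field": "contracts.dollarsObligated"}},
            "awardingAgency": {"terms": {"field": "contracts.awardingAgency.keyword"}},
        },
    }
}

second_body = build_page_aggs_query(first_page, item_aggs)
```

The resulting `second_body` can be sent as the body of a second `_search` request; the first request keeps its normal `size` of 50 and carries no aggregations.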

Related

How to use pipeline aggs on elasticsearch top aggs

I'd like to filter the Elasticsearch aggs result on the first aggs. At first I thought a bucket_selector in the sub-aggs would filter the inner aggs, but I found that it actually works on the first aggs, so then I wondered how I could filter on the inner aggs. I tried putting the bucket_selector as a sibling to the first aggs, which didn't work. I also put it as a sibling to the inner aggs and adjusted the buckets_path, but couldn't find a way.
My working aggs, which filters the first aggs result:
GET network/_search
{
"size": 0,
"aggs": {
"asns": {
"terms": {
"field": "asn",
"size": 100000
},
"aggs": {
"users": {
"terms": {
"field": "user.keyword",
"size": 1000
}
},
"asns_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"num": "_count"
},
"script": "params.num <= 200"
}
}
}
}
}
}
I read https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html#buckets-path-syntax and tried the following, among other variations:
GET network/_search
{
"size": 0,
"aggs": {
"asns": {
"terms": {
"field": "asn",
"size": 100000
},
"aggs": {
"users": {
"terms": {
"field": "user.keyword",
"size": 1000
}
},
"asns_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"num": "asns>users._count"
},
"script": "params.num <= 200"
}
}
}
}
}
}

How to do proportions in Elastic search query

I have a field in my data that has four unique values across all the records. I have to aggregate the records by each unique value and find the proportion of each value in the data, that is, (number of records with each unique value) / (total number of records). Is there a way to do this with Elasticsearch dashboards? I have used a terms aggregation to group by the field and applied a value_count metric aggregation to get the doc_count value, but I am not able to use a bucket script to do the division. I am getting the error "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [StringTerms] at aggregation [latest_version]".
Below is my code:
{
"size": 0,
"aggs": {
"BAR": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
},
"aggs": {
"latest_version": {
"filter": {
"match_phrase": {
"log": "main_filter"
}
},
"aggs": {
"latest_version_count": {
"terms": {
"field": "field_name"
},
"aggs": {
"version_count": {
"value_count": {
"field": "field_name"
}
}
}
},
"sum_buckets": {
"sum_bucket": {
"buckets_path": "latest_version_count>_count"
}
}
}
},
"BAR-percentage": {
"bucket_script": {
"buckets_path": {
"eachVersionCount": "latest_version>latest_version_count",
"totalVersionCount": "latest_version>sum_buckets"
},
"script": "params.eachVersionCount/params.totalVersionCount"
}
}
}
}
}
}
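As the error message says, `buckets_path` has to resolve to a single numeric value, while `latest_version>latest_version_count` points at a multi-bucket terms aggregation. One possible workaround, sketched here in Python under the assumption that computing the ratio client-side is acceptable, is to divide each bucket's `doc_count` by the total of the returned buckets. The helper name `bucket_proportions` and the sample bucket values are hypothetical.

```python
def bucket_proportions(terms_buckets):
    """Map each bucket key to doc_count / total over the returned buckets."""
    # The total only covers the buckets that were returned; documents counted
    # in sum_other_doc_count are not included in this sketch.
    total = sum(bucket["doc_count"] for bucket in terms_buckets)
    if total == 0:
        return {}
    return {bucket["key"]: bucket["doc_count"] / total for bucket in terms_buckets}


# Hypothetical buckets of the "latest_version_count" terms aggregation
# for a single date_histogram bucket.
sample_buckets = [
    {"key": "v1", "doc_count": 4},
    {"key": "v2", "doc_count": 2},
    {"key": "v3", "doc_count": 1},
    {"key": "v4", "doc_count": 1},
]

proportions = bucket_proportions(sample_buckets)
```

With four unique values, as in the question, the default terms size is enough to return every bucket, so the total matches the number of documents that passed the filter.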

Elasticsearch query shows more data than it has

In my contains field I have the value "xr" and, separately, the values "xra", "xrb", and "xrc". When I query for the count of "xr", Elasticsearch returns 4 instead of 1. How can I fix this?
This is my query:
"aggs": {
"Group1": {
"terms": {
"field": "method.keyword",
"include": ".*POST.*",
},
"aggs": {
"Group3": {
"terms": {
"field": "contains.keyword",
"size": 11593,
}
}
},
}

How to get specific _source fields in aggregation

I am exploring Elasticsearch for use in an application that will handle large volumes of data and generate some statistical results over them. My requirement is to retrieve certain statistics for a particular field. For example, for a given field, I would like to retrieve its unique values and the document frequency of each value, along with the length of the value. The value lengths are indexed along with each document.
So far, I have experimented with Terms Aggregation, with the following query:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100
}
}
}
}
The query returns all the values in the field val with the number of documents in which each value occurs. I would like the field val_len to be returned as well. Is it possible to achieve this using Elasticsearch? In other words, is it possible to include specific _source fields in buckets? I have looked through the documentation available online, but I haven't found a solution yet.
Hoping somebody could point me in the right direction. Thanks in advance!
I tried to include _source in the following ways:
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100
},
"_source":["val_len"]
}
}
and
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100,
"_source":["val_len"]
}
}
}
But I guess this isn't the right way, because both gave me parsing errors.
You need to use another sub-aggregation called top_hits, like this:
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100
},
"aggs": {
"hits": {
"top_hits": {
"_source":["val_len"],
"size": 1
}
}
}
}
}
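To show where the requested field ends up, here is a small Python sketch that reads val_len out of the response to the top_hits query above. It assumes the usual response layout, in which each terms bucket carries the sub-aggregation under its name ("hits") and the documents sit under `hits.hits[]._source`. The helper name `value_lengths` and the sample response are hypothetical.

```python
def value_lengths(response, agg_name="type_count", hits_name="hits"):
    """Return (value, doc_count, val_len) for each terms bucket."""
    rows = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # top_hits result: {"hits": {"hits": [ {"_source": {...}}, ... ]}}
        top = bucket[hits_name]["hits"]["hits"]
        val_len = top[0]["_source"]["val_len"] if top else None
        rows.append((bucket["key"], bucket["doc_count"], val_len))
    return rows


# Hypothetical response, trimmed to the fields used here.
sample_response = {
    "aggregations": {
        "type_count": {
            "buckets": [
                {"key": "foo", "doc_count": 3,
                 "hits": {"hits": {"hits": [{"_source": {"val_len": 3}}]}}},
                {"key": "quux", "doc_count": 1,
                 "hits": {"hits": {"hits": [{"_source": {"val_len": 4}}]}}},
            ]
        }
    }
}

rows = value_lengths(sample_response)
```

Because every document in a bucket shares the same val, a top_hits size of 1 is enough to recover its length.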
Another way of doing it is to use an avg sub-aggregation, which also lets you sort on it:
"aggs": {
"type_count": {
"terms": {
"field": "val.keyword",
"size": 100,
"order": {
"length": "desc"
}
},
"aggs": {
"length": {
"avg": {
"field": "val_len"
}
}
}
}
}
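With the avg variant, the length is available as a single numeric value per bucket. A brief Python sketch of reading it follows, assuming the usual metric response shape `{"value": ...}` under the sub-aggregation name "length"; the helper name `average_lengths` and the sample buckets are hypothetical.

```python
def average_lengths(response, agg_name="type_count", metric_name="length"):
    """Map each bucket key to the value of its avg sub-aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket[metric_name]["value"] for bucket in buckets}


# Hypothetical response, already ordered by "length" descending.
sample_avg_response = {
    "aggregations": {
        "type_count": {
            "buckets": [
                {"key": "quux", "doc_count": 1, "length": {"value": 4.0}},
                {"key": "foo", "doc_count": 3, "length": {"value": 3.0}},
            ]
        }
    }
}

lengths = average_lengths(sample_avg_response)
```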

multiple metric sub aggregations situation with ElasticSearch

I am aware that Elasticsearch supports sub-aggregations with bucketing (where a bucketing aggregation can have bucketing or metric sub-aggregations). Sub-aggregations aren't possible with metric aggregations. Maybe that makes sense, but here is the use case.
I have a terms aggregation as a parent, with another terms aggregation as its child. The child terms aggregation has a child aggregation of type top_hits. top_hits is a metric aggregation, so it can't take any child aggregation. Now I need to include an avg aggregation in the mix. Since top_hits is the last aggregation in the aggregation tree and is a metric aggregation, it can't have avg as a child.
The following shows the desired aggregation levels (of course it's invalid, given that top_hits is a metric aggregation, and the same is true for the avg aggregation):
{
"aggregations": {
"top_makes": {
"terms": {
"field": "make"
},
"aggregations": {
"top_models": {
"terms": {
"field": "model"
},
"aggregations": {
"top_res": {
"top_hits": {
"_source": {
"include": [
"model",
"color"
]
},
"size": 10
}
}
}
}
},
"aggregations": {
"avg_length": {
"avg": {
"field": "vlength"
}
}
}
}
}
}
What's the workaround or best way to address this?
I think this will work; please verify:
{
"aggregations": {
"top_makes": {
"terms": {
"field": "make"
},
"aggregations": {
"top_models": {
"terms": {
"field": "model"
},
"aggregations": {
"top_res": {
"top_hits": {
"_source": {
"include": [
"model",
"color"
]
},
"size": 10
}
}
}
},
"avg_length": {
"avg": {
"field": "vlength"
}
}
}
}
}
}
The point is that a parent aggregation can have one or more sibling sub-aggregations.
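The sibling idea can be sketched by assembling the request body in Python. This is an illustration under assumptions, not the answerer's exact layout: it places avg_length next to top_models under top_makes, which matches the level shown in the question. The helper name `terms_agg` is hypothetical; field names and the top_hits options are copied from the question.

```python
def terms_agg(field, sub_aggs=None):
    """Build a terms aggregation, optionally with named sub-aggregations."""
    agg = {"terms": {"field": field}}
    if sub_aggs:
        agg["aggregations"] = sub_aggs
    return agg


# top_hits definition as written in the question.
top_res = {
    "top_hits": {
        "_source": {"include": ["model", "color"]},
        "size": 10,
    }
}

body = {
    "aggregations": {
        "top_makes": terms_agg("make", {
            # Two siblings under the same parent bucket aggregation:
            "top_models": terms_agg("model", {"top_res": top_res}),
            "avg_length": {"avg": {"field": "vlength"}},
        })
    }
}
```

Moving the avg_length entry into the sub-aggregations of top_models instead would give one average per model rather than one per make.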
