Sort aggregation buckets by shared field values - elasticsearch

I would like to group documents based on a group field G. I use the "field aggregation" strategy described in the Elastic documentation to sort the buckets by the maximal score of the contained documents (called the "field collapse example" in the Elastic docs), like this:
{
  "query": {
    "match": {
      "body": "elections"
    }
  },
  "aggs": {
    "top_sites": {
      "terms": {
        "field": "domain",
        "order": {
          "top_hit": "desc"
        }
      },
      "aggs": {
        "top_tags_hits": {
          "top_hits": {}
        },
        "top_hit": {
          "max": {
            "script": {
              "source": "_score"
            }
          }
        }
      }
    }
  }
}
This query also includes the top hits in each bucket.
If the maximal score is not unique across buckets, I would like to specify a secondary sort criterion. From the application context I know that inside a bucket all documents share the same value for a field F. Therefore, this field should serve as the secondary sort criterion.
How can I realize this in Elastic? Is there a way to make a field from the top hits subaggregation useable in the enclosing aggregation?
Any ideas? Many thanks!

It seems you can. That page lists all the sorting strategies for the terms aggregation, and there is an example of multi-criteria bucket sorting:
Multiple criteria can be used to order the buckets by providing an
array of order criteria such as the following:
GET /_search
{
  "aggs": {
    "countries": {
      "terms": {
        "field": "artist.country",
        "order": [
          { "rock>playback_stats.avg": "desc" },
          { "_count": "desc" }
        ]
      },
      "aggs": {
        "rock": {
          "filter": { "term": { "genre": "rock" } },
          "aggs": {
            "playback_stats": { "stats": { "field": "play_count" } }
          }
        }
      }
    }
  }
}
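Applying the multi-criteria order to the original query could look like the sketch below. The name shared_field_value is a placeholder for a max sub-aggregation on the field F; this assumes F is numeric, and since all documents in a bucket share the same value of F, max simply yields that shared value:

```json
{
  "query": {
    "match": { "body": "elections" }
  },
  "aggs": {
    "top_sites": {
      "terms": {
        "field": "domain",
        "order": [
          { "top_hit": "desc" },
          { "shared_field_value": "desc" }
        ]
      },
      "aggs": {
        "top_tags_hits": {
          "top_hits": {}
        },
        "top_hit": {
          "max": { "script": { "source": "_score" } }
        },
        "shared_field_value": {
          "max": { "field": "F" }
        }
      }
    }
  }
}
```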

Related

How to get maximum value and id using Max aggregation by country in Elasticsearch

I'm getting the maximum value by country, but I also want the id belonging to that maximum value. I tried many ways but don't know how to fetch it.
{
  "aggs": {
    "country_groups": {
      "terms": {
        "field": "country.keyword",
        "size": 30000
      },
      "aggs": {
        "max_price": {
          "max": { "field": "video_count" }
        }
      }
    }
  }
}
You can leverage the top_hits aggregation: sort it on video_count in descending order and keep only one hit, and its _source will contain both the maximum value and the corresponding id:
{
  "aggs": {
    "country_groups": {
      "terms": {
        "field": "country.keyword",
        "size": 30000
      },
      "aggs": {
        "max_price_and_id": {
          "top_hits": {
            "size": 1,
            "sort": {
              "video_count": "desc"
            },
            "_source": ["channel_id", "video_count"]
          }
        }
      }
    }
  }
}

ElasticSearch: Sort Aggregations by Filtered Average

I have an ElasticSearch index with documents structured like this:
{
  "created": "2019-07-31T22:44:41.437Z",  // some valid datetime string
  "id": "2956",
  "rating": 1
}
If I wish to create an aggregation of the id fields which is sorted on the average of the rating, that could be handled by:
{
  "aggs": {
    "sorted": {
      "terms": {
        "field": "id",
        "order": { "sort": "asc" }
      },
      "aggs": {
        "sort": {
          "avg": {
            "field": "rating"
          }
        }
      }
    }
  }
}
However, I'm looking to only factor in documents which have a created value that was within the last week (and then take the average of those rating fields).
My naive thoughts on this would be to apply a filter or range within the sort aggregation, but an aggregation cannot have multiple types, and looking through the avg documentation, I don't see a means to put it in the avg. Optimistically attempting to put range fields in the avg regardless of what the documentation says yielded no results (as expected).
How would I go about achieving this?
Try adding a bool query with a range clause to the request body, so the aggregation only sees documents created within the last week (one_week_ago stands for a concrete date):
{
  "query": {
    "bool": {
      "must": {
        "range": {
          "created": {
            "gte": one_week_ago
          }
        }
      }
    }
  },
  "aggs": {
    "sorted": {
      "terms": {
        "field": "id",
        "order": { "sort": "asc" }
      },
      "aggs": {
        "sort": {
          "avg": {
            "field": "rating"
          }
        }
      }
    }
  }
}
You can also query for dynamic dates, as Tom referred to, using the date math expression "now-7d/d":
{
  "query": {
    "bool": {
      "must": {
        "range": {
          "created": {
            "gte": "now-7d/d"
          }
        }
      }
    }
  }
}
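If the terms buckets themselves should still be built from all documents, an alternative sketch (the names recent and recent_avg are placeholders) keeps the range inside a filter sub-aggregation and orders the buckets through the aggregation path recent>recent_avg.avg:

```json
{
  "aggs": {
    "sorted": {
      "terms": {
        "field": "id",
        "order": { "recent>recent_avg.avg": "asc" }
      },
      "aggs": {
        "recent": {
          "filter": {
            "range": { "created": { "gte": "now-7d/d" } }
          },
          "aggs": {
            "recent_avg": {
              "avg": { "field": "rating" }
            }
          }
        }
      }
    }
  }
}
```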

Get topmost aggregation in elasticsearch

I am trying to find the counts of the different path parameters using an Elasticsearch query:
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": {
        "field": "path.keyword"
      }
    }
  }
}
However, it is not returning the paths with the highest counts; it's returning some random 10 paths with counts. To get the paths with the topmost frequencies, I modified it to
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": {
        "field": "path.keyword"
      }
    },
    "aggs": {
      "top_hits": {
        "size": 11
      }
    }
  }
}
But it doesn't change previous response instead adds some new documents in response. I can't find a way to get topmost frequencies. Please suggest some way.
The order of the buckets can be customized by setting the order parameter. By default, the buckets are ordered by their doc_count descending. It is possible to change this behaviour, as documented below:
GET _search
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": {
        "field": "path.keyword",
        "size": 100,
        "order": { "_count": "asc" }
      }
    }
  }
}
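Since buckets are already ordered by doc_count descending by default, a sketch that returns the eleven most frequent paths only needs a size parameter inside the terms aggregation itself:

```json
{
  "size": 0,
  "aggs": {
    "genres": {
      "terms": {
        "field": "path.keyword",
        "size": 11
      }
    }
  }
}
```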

Timeseries histogram of data with Elasticsearch

I have a list of documents organized as follows:
{
  "date": "2010-12-12",          // some valid datetime string
  "category": "some_category"    // this can be any string
}
I need to create a frequency distribution for the data within buckets of time. I have looked at the date_histogram API but that only gets me halfway there.
{
  "size": 0,
  "aggs": {
    "my_search": {
      "date_histogram": {
        "field": "date",
        "interval": "1s"
      }
    }
  }
}
Which returns me the count of my data that falls into all 1 second buckets. Within those 1 second buckets, I also need to aggregate all of the data into type category buckets, such that I'm left with buckets of time with counts of category within each bucket. Is there a built in method to do this?
You're on the right path, you simply need to add another terms sub-aggregation for the category field:
{
  "size": 0,
  "aggs": {
    "my_search": {
      "date_histogram": {
        "field": "date",
        "interval": "1s"
      },
      "aggs": {
        "categories": {
          "terms": {
            "field": "category"
          }
        }
      }
    }
  }
}

filter by child frequency in ElasticSearch

I currently have parents (documents) indexed in Elasticsearch and children (comments) related to these documents.
My first objective was to search for a document with more than N comments, based on a child query. Here is how I did it:
documents/document/_search
{
  "min_score": 0,
  "query": {
    "has_child": {
      "type": "comment",
      "score_type": "sum",
      "boost": 1,
      "query": {
        "range": {
          "date": {
            "lte": 20130204,
            "gte": 20130201,
            "boost": 1
          }
        }
      }
    }
  }
}
I used score to calculate the amount of comments a document has and then I filtered the documents by this amount, using "min_score".
Now, my objective is to search not just comments but several other child document types related to the document, always based on frequency. Something like the query below:
documents/document/_search
{
  "query": {
    "match_all": {}
  },
  "filter": {
    "and": [
      {
        "query": {
          "has_child": {
            "type": "comment",
            "query": {
              "range": {
                "date": {
                  "lte": 20130204,
                  "gte": 20130201
                }
              }
            }
          }
        }
      },
      {
        "or": [
          {
            "query": {
              "has_child": {
                "type": "comment",
                "query": {
                  "match": {
                    "text": "Finally"
                  }
                }
              }
            }
          },
          {
            "query": {
              "has_child": {
                "type": "comment",
                "query": {
                  "match": {
                    "text": "several"
                  }
                }
              }
            }
          }
        ]
      }
    ]
  }
}
The query above works fine, but it doesn't filter based on frequency as the first one does. As filters are computed before scores are calculated, I cannot use min_score to filter each child query.
Any solutions to this problem?
There is no score at all associated with filters. I'd suggest moving the whole logic to the query part and using a bool query to combine the different queries together.
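A sketch of that suggestion, assuming the same mapping as above: the scored range clause goes into must (keeping score_type: sum so min_score can still filter on comment count), and the two text clauses become should clauses of the same bool query:

```json
{
  "min_score": 0,
  "query": {
    "bool": {
      "must": {
        "has_child": {
          "type": "comment",
          "score_type": "sum",
          "query": {
            "range": {
              "date": { "gte": 20130201, "lte": 20130204 }
            }
          }
        }
      },
      "should": [
        {
          "has_child": {
            "type": "comment",
            "query": { "match": { "text": "Finally" } }
          }
        },
        {
          "has_child": {
            "type": "comment",
            "query": { "match": { "text": "several" } }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
```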