ElasticSearch: Sort Aggregations by Filtered Average - elasticsearch

I have an ElasticSearch index with documents structured like this:
"created": "2019-07-31T22:44:41.437Z",
"id": "2956",
"rating": 1
If I wish to create an aggregation of the id fields which is sorted on the average of the rating, that could be handled by:
{
"aggs" : {
"sorted" : {
"terms" : {
"field" : "id",
"order" : { "sort" : "asc" }
},
"aggs" : {
"sort" : {
"avg" : {
"field" : "rating"
}
}
}
}
}
}
However, I'm looking to only factor in documents which have a created value that was within the last week (and then take the average of those rating fields).
My naive thoughts on this would be to apply a filter or range within the sort aggregation, but an aggregation cannot have multiple types, and looking through the avg documentation, I don't see a means to put it in the avg. Optimistically attempting to put range fields in the avg regardless of what the documentation says yielded no results (as expected).
How would I go about achieving this?

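Try adding a bool query with a range clause to the request body, next to the aggregation, so that only documents from the last week are aggregated (one_week_ago stands for a date value computed by your application):
{
"query" : {
"bool" : {
"must" : {
"range" : {
"created" : {
"gte" : one_week_ago
}
}
}
}
},
"aggs" : {
"sorted" : {
"terms" : {
"field" : "id",
"order" : { "sort" : "asc" }
},
"aggs" : {
"sort" : {
"avg" : {
"field" : "rating"
}
}
}
}
}
}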

You can also use date math for a dynamic date: take the query Tom suggested, but use "now-7d/d" as the lower bound:
{
"query" : {
"bool" : {
"must" : {
"range" : {
"created" : {
"gte" : "now-7d/d"
}
}
}
}
}
}
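An alternative that leaves the query itself unfiltered is to wrap the average in a filter sub-aggregation and order the terms buckets by a path into it. The sketch below only builds the request body as a Python dict; the function name and the aggregation names `recent` and `avg_rating` are illustrative and not from the original post, and the field names are taken from the question.

```python
import json

def sorted_by_recent_avg(id_field="id", rating_field="rating",
                         date_field="created", since="now-7d/d"):
    """Build a search body whose terms buckets are ordered by a filtered average."""
    return {
        "size": 0,  # only the aggregation results are needed
        "aggs": {
            "sorted": {
                "terms": {
                    "field": id_field,
                    # path: single-bucket filter agg "recent" > metric "avg_rating"
                    "order": {"recent>avg_rating": "asc"},
                },
                "aggs": {
                    "recent": {
                        # restrict the average to documents created since `since`
                        "filter": {"range": {date_field: {"gte": since}}},
                        "aggs": {"avg_rating": {"avg": {"field": rating_field}}},
                    }
                },
            }
        },
    }

body = sorted_by_recent_avg()
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. How buckets without any recent documents are ordered is not covered by this sketch and should be checked against your Elasticsearch version.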

Related

Sort aggregation buckets by shared field values

I would like to group documents based on a group field G. I use the "field aggregation" strategy described in the Elastic documentation (called the "field collapse example" there) to sort the buckets by the maximal score of the contained documents, like this:
{
"query": {
"match": {
"body": "elections"
}
},
"aggs": {
"top_sites": {
"terms": {
"field": "domain",
"order": {
"top_hit": "desc"
}
},
"aggs": {
"top_tags_hits": {
"top_hits": {}
},
"top_hit" : {
"max": {
"script": {
"source": "_score"
}
}
}
}
}
}
}
This query also includes the top hits in each bucket.
If the maximal score is not unique for the buckets, I would like to specify a second order column. From the application context I know that inside a bucket all documents share the same value for a field F. Therefore, this field should be employed as the second order column.
How can I realize this in Elastic? Is there a way to make a field from the top hits subaggregation useable in the enclosing aggregation?
Any ideas? Many thanks!
It seems you can. The terms aggregation documentation lists all the sorting strategies for terms aggregations.
It also includes an example of sorting buckets by multiple criteria:
Multiple criteria can be used to order the buckets by providing an
array of order criteria such as the following:
GET /_search
{
"aggs" : {
"countries" : {
"terms" : {
"field" : "artist.country",
"order" : [ { "rock>playback_stats.avg" : "desc" }, { "_count" : "desc" } ]
},
"aggs" : {
"rock" : {
"filter" : { "term" : { "genre" : "rock" }},
"aggs" : {
"playback_stats" : { "stats" : { "field" : "play_count" }}
}
}
}
}
}
}
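Applied to the question, the same idea might look like the sketch below. It assumes the shared field F is numeric, so that a `max` sub-aggregation can expose its value for ordering (all documents in a bucket share the value, so `max` simply returns it). The field name `f` and the aggregation name `f_value` are placeholders, not names from the original post.

```python
import json

def top_sites_body(group_field="domain", tie_break_field="f"):
    """Order buckets by best score first, then by the shared field F (assumed numeric)."""
    return {
        "query": {"match": {"body": "elections"}},
        "aggs": {
            "top_sites": {
                "terms": {
                    "field": group_field,
                    # array of criteria: the second one only matters on ties
                    "order": [{"top_hit": "desc"}, {"f_value": "asc"}],
                },
                "aggs": {
                    "top_tags_hits": {"top_hits": {}},
                    "top_hit": {"max": {"script": {"source": "_score"}}},
                    # hypothetical metric exposing F for ordering
                    "f_value": {"max": {"field": tie_break_field}},
                },
            }
        },
    }

body = top_sites_body()
print(json.dumps(body, indent=2))
```

The ordering refers to the metric sub-aggregations `top_hit` and `f_value` rather than to the `top_hits` output, since the documented ordering options work on counts, keys, and metric sub-aggregations.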

How to group documents by hours in elastic search aggregation?

I tried to group my documents by hour for one day through an aggregation, but I always get the exception "expected field name but got [START_OBJECT]". What's the problem?
{
"query" : {
"bool" : {
"must" : {
"range" : {
"timestamp" : {
"from" : "2017-08-14 00:00:00",
"to" : "2017-08-15 00:00:00",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"aggs": {
"result_by_hours": {
"histogram": {
"script": "doc.timestamp.date.getHourOfDay()",
"interval": 1
}
}
}
}
What I expect is the number of documents for each hour of yesterday. How can I use a dynamic, relative time range instead of the fixed "2017-08-14 - 2017-08-15"?
Thanks in advance:)
Depending on your ES version, you can use a range filter/query relative to "now"; for example, now-1d/d goes one day back in time and rounds down to the start of that day.
See examples at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
As for the aggs, you can group by an interval of, for instance, one hour using date_histogram with the interval parameter.
In ES 5.5:
Query:
"range" : {
"timestamp" : {
"gte" : "now-1d/d",
"lt" : "now/d"
}
}
Aggs:
{
"aggs" : {
"values_over_time" : {
"date_histogram" : {
"field" : "timestamp",
"interval" : "1h"
}
}
}
}
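Put together, the two fragments could form a single request body. The sketch below builds it as a Python dict; the function name is illustrative, and the upper bound uses `lt` with `now/d` so that documents from today are excluded. The `interval` parameter follows the ES 5.5 syntax of this answer; newer Elasticsearch versions use `fixed_interval` or `calendar_interval` instead.

```python
import json

def hourly_counts_for_yesterday(field="timestamp"):
    """Count documents per hour for the previous calendar day."""
    return {
        "size": 0,  # bucket counts only, no hits
        "query": {
            "range": {
                # from the start of yesterday up to (not including) the start of today
                field: {"gte": "now-1d/d", "lt": "now/d"}
            }
        },
        "aggs": {
            "values_over_time": {
                "date_histogram": {"field": field, "interval": "1h"}
            }
        },
    }

body = hourly_counts_for_yesterday()
print(json.dumps(body, indent=2))
```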

Elasticsearch - Remove double results in search

I don't know how to remove duplicate results that have the same value in one field.
My search query:
{
"query" : {
"range" : {
"endtime" : {
"lt" : "2017-02-09T20:00:00",
"gt" : "2017-02-09T01:00:00"
}
}
}
}
In my results there is a field called "link" which often has the same value (e.g. https://www.facebook.com).
I would prefer a solution that works within my query.
Thanks.
Greetings!
You can use a terms aggregation on the link field; each bucket then represents one distinct value (substitute your own index and type in the path):
GET /cars/transactions/_search
{
"size": 0,
"query": {
"range" : {
"endtime" : {
"gt" : "2017-02-09T01:00:00",
"lt" : "2017-02-09T20:00:00"
}
}
},
"aggs": {
"distinct_links": {
"terms": {
"field": "link",
"size": 100
}
}
}
}
Something like this.
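On the client side, the distinct values can then be read from the aggregation buckets. A minimal sketch in Python, assuming the response has already been parsed into a dict; the sample response is made up for illustration.

```python
def distinct_links(response, agg_name="distinct_links"):
    """Return (link, doc_count) pairs from a terms aggregation response."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [(b["key"], b["doc_count"]) for b in buckets]

# Hypothetical response fragment, shaped like a terms aggregation result.
sample_response = {
    "aggregations": {
        "distinct_links": {
            "buckets": [
                {"key": "https://www.facebook.com", "doc_count": 3},
                {"key": "https://example.org", "doc_count": 1},
            ]
        }
    }
}

links = distinct_links(sample_response)
print(links)
```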

How to fetch records with aggregation in elasticsearch?

I am using the aggregation below in Elasticsearch, and I want the aggregated records returned along with the doc count. Can this be achieved?
Below is the query:
{
"aggs" : {
"Age" : {
"filter" : { "range" : { "AGE" : { "gt" : 33 } } }
}
}
}
and here is the output:
{
"aggregations" : {
"Age" : {
"doc_count" : 2
}
}
}
Is there any way to fetch the records as well?
Thanks.
Yes, you can use the top hits aggregation as a sub-aggregation.
The documents in that bucket are then returned with it.
The code below should work:
{
"aggs": {
"Age": {
"filter": {
"range": {
"AGE": {
"gt": 33
}
}
},
"aggs": {
"results": {
"top_hits": {}
}
}
}
}
}
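The matching documents come back nested under the sub-aggregation name. A small Python sketch for pulling them out of a parsed response; the helper name and the sample response are illustrative. Note that `top_hits` returns only a limited number of documents per bucket unless its `size` parameter is raised.

```python
def docs_in_filter_bucket(response, agg_name="Age", hits_name="results"):
    """Return the bucket's doc_count and the _source of each returned top hit."""
    bucket = response["aggregations"][agg_name]
    hits = bucket[hits_name]["hits"]["hits"]
    return bucket["doc_count"], [hit["_source"] for hit in hits]

# Hypothetical response fragment, shaped like a filter + top_hits result.
sample_response = {
    "aggregations": {
        "Age": {
            "doc_count": 2,
            "results": {
                "hits": {
                    "total": 2,
                    "hits": [
                        {"_id": "1", "_source": {"AGE": 40}},
                        {"_id": "2", "_source": {"AGE": 35}},
                    ],
                }
            },
        }
    }
}

count, docs = docs_in_filter_bucket(sample_response)
print(count, docs)
```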

filter by child frequency in ElasticSearch

I currently have parents indexed in elastic search (documents) and child (comments) related to these documents.
My first objective was to search for a document with more than N comments, based on a child query. Here is how I did it:
documents/document/_search
{
"min_score": 0,
"query": {
"has_child" : {
"type" : "comment",
"score_type" : "sum",
"boost": 1,
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201,
"boost": 1
}
}
}
}
}
}
I used score to calculate the amount of comments a document has and then I filtered the documents by this amount, using "min_score".
Now, my objective is to search not just comments, but several other child documents related to the document, always based on frequency. Something like the query below:
documents/document/_search
{
"query": {
"match_all": {
}
},
"filter" : {
"and" : [{
"query": {
"has_child" : {
"type" : "comment",
"query" : {
"range": {
"date": {
"lte": 20130204,
"gte": 20130201
}
}
}
}
}
},
{
"or" : [
{"query": {
"has_child" : {
"type" : "comment",
"query" : {
"match": {
"text": "Finally"
}
}
}
}
},
{ "query": {
"has_child" : {
"type" : "comment",
"query" : {
"match": {
"text": "several"
}
}
}
}
}
]
}
]
}
}
The query above works fine, but it doesn't filter based on frequency as the first one does. As filters are computed before scores are calculated, I cannot use min_score to filter each child query.
Any solutions to this problem?
Filters have no score associated with them at all. I'd suggest moving the whole logic to the query part and using a bool query to combine the different queries.
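One way to read this suggestion is sketched below as a Python dict builder. It reuses the `has_child` syntax and `score_type` from the question, which belongs to an older Elasticsearch version; the helper names and the `min_score` default are illustrative. Note the limitation: `min_score` then applies to the combined score of the bool query, not to each child query separately.

```python
import json

def has_child(child_type, query, score_type="sum"):
    """Wrap a child query so that matching children contribute to the parent score."""
    return {"has_child": {"type": child_type, "score_type": score_type, "query": query}}

def combined_child_query(min_score=0):
    date_range = {"range": {"date": {"lte": 20130204, "gte": 20130201}}}
    return {
        "min_score": min_score,  # threshold on the combined score
        "query": {
            "bool": {
                # replaces the outer "and" filter
                "must": [has_child("comment", date_range)],
                # replaces the inner "or" filter
                "should": [
                    has_child("comment", {"match": {"text": "Finally"}}),
                    has_child("comment", {"match": {"text": "several"}}),
                ],
                "minimum_should_match": 1,
            }
        },
    }

body = combined_child_query()
print(json.dumps(body, indent=2))
```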
