Paging the top_hits aggregation in Elasticsearch

Right now I'm doing a top_hits aggregation in Elasticsearch that groups my data by a field, sorts the hits in each group by a date, and picks the top 1.
I need to page the results of this aggregation by passing in a pageSize and a pageNumber, but I don't know how.
In addition, I need the total number of results of this aggregation so we can show it in a table in our web interface.
The aggregation looks like this:
POST my_index/_search
{
"size": 0,
"aggs": {
"top_artifacts": {
"terms": {
"field": "artifactId.keyword"
},
"aggs": {
"top_artifacts_hits": {
"top_hits": {
"size": 1,
"sort": [{
"date": {
"order": "desc"
}
}]
}
}
}
}
}
}

If I understand what you want, you should be able to paginate with a composite aggregation. You can still pass a size parameter for the page size, but instead of from you pass after, which is the key of the last bucket from the previous page.
POST my_index/_search
{
"size": 0,
"aggs": {
"top_artifacts": {
"composite": {
"sources": [
{
"artifact": {
"terms": {
"field": "artifactId.keyword"
}
}
}
],
"size": 1, // optional: page size (how many buckets to return)
"after": {
"artifact": "FOO_BAZ" // Buckets after this bucket key
}
},
"aggs": {
"hits": {
"top_hits": {
"size": 1,
"sort": [
{
"date": {
"order": "desc"
}
}
]
}
}
}
}
}
}
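The composite aggregation has no notion of a page number, so a caller has to walk forward page by page using the after_key returned with each response. Below is a minimal Python sketch of that loop. It assumes a hypothetical search(body) callable that sends the body to my_index/_search and returns the parsed JSON response; build_page_request and fetch_page are illustrative names, not part of any Elasticsearch client. The total_artifacts cardinality aggregation is an assumption added here as one possible way to get an approximate total bucket count for the table; it does not come from the answer above. The sort uses the question's date field.

```python
def build_page_request(page_size, after_key=None):
    """Build the composite-aggregation body for one page of buckets."""
    composite = {
        "sources": [
            {"artifact": {"terms": {"field": "artifactId.keyword"}}}
        ],
        "size": page_size,
    }
    if after_key is not None:
        # Only sent from the second request onwards.
        composite["after"] = after_key
    return {
        "size": 0,
        "aggs": {
            "top_artifacts": {
                "composite": composite,
                "aggs": {
                    "hits": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"date": {"order": "desc"}}],
                        }
                    }
                },
            },
            # Assumption: an approximate distinct count serves as the total.
            "total_artifacts": {
                "cardinality": {"field": "artifactId.keyword"}
            },
        },
    }


def fetch_page(search, page_size, page_number):
    """Walk after_key forward until the requested 1-based page is reached."""
    after_key = None
    buckets, total = [], 0
    for current in range(1, page_number + 1):
        response = search(build_page_request(page_size, after_key))
        aggs = response["aggregations"]
        total = aggs["total_artifacts"]["value"]
        buckets = aggs["top_artifacts"]["buckets"]
        after_key = aggs["top_artifacts"].get("after_key")
        if after_key is None and current < page_number:
            # Ran out of buckets before reaching the requested page.
            return [], total
    return buckets, total
```

Reaching page N this way costs N requests, so for a UI it may be cheaper to hand the after_key back to the client and offer "next page" navigation rather than arbitrary page jumps.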

Related

SQL to ES: get limit, page, and order results on an aggregation

SELECT
max( timestamp ) AS first_time,
min( timestamp ) AS last_time,
src_ip,
threat_target ,
count(*) as count
FROM
traffic
GROUP BY
src_ip,
threat_target
ORDER BY
first_time desc
LIMIT 0 ,10
I want to get this result, but I don't know how to apply the limit (page size) or where to put the sort.
{
"size": 0,
"aggregations": {
"src_ip": {
"aggregations": {
"threat_target": {
"aggregations": {
"last_time": {
"max": {
"field": "`timestamp`"
}
},
"first_time": {
"min": {
"field": "`timestamp`"
}
}
},
"terms": {
"field": "threat_target.keyword"
}
}
},
"terms": {
"field": "src_ip.keyword"
}
}
}
}
Pagination of aggregations is generally not supported in Elasticsearch; however, the composite aggregation provides a way to page through your buckets.
Unlike the other multi-bucket aggregations, the composite aggregation can be used to paginate all buckets from a multi-level aggregation efficiently.
(Excerpt from the composite aggregation page of the Elasticsearch docs.)
Apart from "ORDER BY first_time desc", the query below should work for you. I don't think ordering on any field other than the grouping fields (src_ip, threat_target) is possible.
GET traffic/_search
{
"size": 0,
"aggs": {
"my_bucket": {
"composite": {
"size": 2, // <== page size
/* "after": { // <== include from the second request onwards, passing the after_key of the previous response to get the next page
"src_ip": "1.2.3.5",
"threat_target": "T3"
}, */
"sources": [
{
"src_ip": {
"terms": {
"field": "src_ip.keyword",
"order": "desc"
}
}
},
{
"threat_target": {
"terms": {
"field": "threat_target.keyword"
}
}
}
]
},
"aggs": {
"first_time": {
"max": {
"field": "timestamp"
}
}
}
}
}
}
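Because the composite aggregation only orders by its source fields, the remaining "ORDER BY first_time desc LIMIT 0, 10" step could be done on the client once all pages have been collected. Here is a minimal Python sketch, assuming the buckets from every page have already been gathered into one list and that each bucket carries a first_time sub-aggregation as in the query above. The function name and the sample values are made up for illustration.

```python
def order_and_limit(buckets, offset=0, limit=10):
    """Sort composite buckets by first_time (descending) and slice one page."""
    ordered = sorted(
        buckets,
        key=lambda bucket: bucket["first_time"]["value"],
        reverse=True,
    )
    return ordered[offset:offset + limit]


# Hypothetical buckets shaped like a composite aggregation response.
sample = [
    {"key": {"src_ip": "1.2.3.4", "threat_target": "T1"}, "doc_count": 3,
     "first_time": {"value": 1000.0}},
    {"key": {"src_ip": "1.2.3.5", "threat_target": "T3"}, "doc_count": 7,
     "first_time": {"value": 3000.0}},
    {"key": {"src_ip": "1.2.3.6", "threat_target": "T2"}, "doc_count": 1,
     "first_time": {"value": 2000.0}},
]

top_two = order_and_limit(sample, offset=0, limit=2)
print([b["key"]["src_ip"] for b in top_two])  # ['1.2.3.5', '1.2.3.6']
```

This needs every bucket in memory before sorting, so it is only practical when the number of (src_ip, threat_target) combinations is moderate.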

Ordering Aggregation Buckets by Score

Is it possible to order the aggregation bucket by score?
"aggs": {
"UnitAggregationBucket": {
"terms": {
"field": "unitId",
"size": 10,
/* "order": order by max score documents per bucket */
}
}
}
I have seen this document, which explains that the default order is by doc_count, but I cannot find out whether it is possible to order the buckets by score and, if so, how.
Yes, it is possible to do that like this:
{
"size": 0,
"query": {
...
},
"aggs": {
"UnitAggregationBucket": {
"terms": {
"field": "unitId",
"size": 10,
"order": {
"score": "desc"
}
},
"aggs": {
"score": {
"max": {
"script": "_score"
}
}
}
}
}
}

How to specify from and size for the first level of an Elasticsearch aggregation in a nested aggregation?

I have written a query that gets the buckets by id and then sorts them. This works fine. But how can I make it return the buckets from position 100 to 200 for the aggregation_by_id aggregation?
{
"query": {
"match_all": {}
},
"size": 0,
"aggregations": {
"aggregation_by_id": {
"terms": {
"field": "id.keyword",
"size" : 200
},
"aggs": {
"sort_timestamp": {
"top_hits": {
"sort": [{
"timestamp": {
"order": "desc",
"unmapped_type": "long"
}
}],
"size": 1
}
}
}
}
}
}
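No answer is recorded for this question. One possible approach, offered here as an assumption rather than something taken from the thread, is the bucket_sort pipeline aggregation, which accepts from and size and truncates the buckets of its parent aggregation. The Python sketch below only builds the request body; build_bucket_window_request and the bucket_window aggregation name are hypothetical. Note that the terms size has to cover from + size, otherwise there are not enough buckets to slice.

```python
def build_bucket_window_request(offset, count):
    """Build a terms aggregation whose buckets are cut to [offset, offset + count)."""
    return {
        "query": {"match_all": {}},
        "size": 0,
        "aggregations": {
            "aggregation_by_id": {
                "terms": {
                    "field": "id.keyword",
                    # The terms aggregation must produce at least offset + count buckets.
                    "size": offset + count,
                },
                "aggs": {
                    "sort_timestamp": {
                        "top_hits": {
                            "sort": [{
                                "timestamp": {
                                    "order": "desc",
                                    "unmapped_type": "long",
                                }
                            }],
                            "size": 1,
                        }
                    },
                    # Pipeline aggregation that keeps only the requested window.
                    "bucket_window": {
                        "bucket_sort": {"from": offset, "size": count}
                    },
                },
            }
        },
    }


# Buckets from position 100 up to 200, as asked in the question.
body = build_bucket_window_request(offset=100, count=100)
```

Every shard still computes the first offset + count buckets, so deep windows get progressively more expensive; for walking through all buckets, the composite aggregation is usually the better fit.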

Aggregated results show fewer items than doc_count?

I have an Elasticsearch query which aggregates the results on a certain field, called _aggregate. Now I have this strange situation with the following query:
{
"size": 100,
"aggregations": {
"results": {
"terms": {
"field": "_aggregate",
"size": 1000,
"order": {
"_count": "desc"
}
},
"aggregations": {
"bundled": {
"top_hits": {
"sort": [
{
"_weight": "asc"
}
]
}
}
}
}
},
"query": {
"bool": {
"must": [
{
"term": {
"_aggregate": "5713618784853"
}
}
]
}
}
}
When I run this search, it returns 8 hits (as expected). However, when I look at the aggregated results, I see a doc_count of 8 (so far so good), but the bucket only contains 3 hits.
Increasing the size of the terms aggregation on the _aggregate field does not have any effect.
Does anyone know how this is possible, or what can possibly cause this?
This is because the top_hits metric aggregation returns 3 hits by default. You can override this by setting its size:
"aggregations": {
"bundled": {
"top_hits": {
"size": 10, <--- add this
"sort": [
{
"_weight": "asc"
}
]
}
}
}

Elasticsearch: ordering terms aggregation buckets by a field in a top_hits sub-aggregation

I would like to order the buckets from a terms aggregation based on a property possessed by the first element in a top hits aggregation.
My best effort query looks like this (with syntax errors):
{
"aggregations": {
"toBeOrdered": {
"terms": {
"field": "parent_uuid",
"size": 1000000,
"order": {
"topAnswer._source.id": "asc"
}
},
"aggregations": {
"topAnswer": {
"top_hits": {
"size": 1
}
}
}
}
}
}
Does anyone know how to accomplish this?
Example:
{
"a":1,
"b":2,
"id":4
}
{
"a":1,
"b":3,
"id":1
}
{
"a":2,
"b":4,
"id":3
}
Grouping by "a" and ordering the buckets by "id" (desc) and sorting the top hits on "b" (desc) would give:
{2:{
"a":2,
"b":4,
"id":3
},1:{
"a":1,
"b":3,
"id":1
}}
You can do it with the following query. The idea is to show, for each parent_uuid bucket, the top hit sorted by b (descending), and to order the parent_uuid buckets by their largest id value (descending) using a max sub-aggregation.
{
"aggregations": {
"toBeOrdered": {
"terms": {
"field": "parent_uuid",
"size": 1000000,
"order": {
"topSort": "desc"
}
},
"aggregations": {
"topAnswer": {
"top_hits": {
"size": 1,
"sort": {
"b": "desc"
}
}
},
"topSort": {
"max": {
"field": "id"
}
}
}
}
}
}
Try it out and report if this works out for you.
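To see what this ordering does on the question's three sample documents, the same steps can be emulated in plain Python: group by a, keep the hit with the largest b per group, and compute the maximum id per group. This is only an in-memory illustration of what the terms, top_hits, and max aggregations compute, not Elasticsearch code, and build_buckets is a name made up for the sketch.

```python
from collections import defaultdict

docs = [
    {"a": 1, "b": 2, "id": 4},
    {"a": 1, "b": 3, "id": 1},
    {"a": 2, "b": 4, "id": 3},
]


def build_buckets(documents):
    """Group by "a" and compute the top hit (by b) and max id per group."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["a"]].append(doc)
    return [
        {
            "key": key,
            "topAnswer": max(members, key=lambda d: d["b"]),  # top_hits, sort b desc
            "topSort": max(d["id"] for d in members),         # max sub-aggregation on id
        }
        for key, members in groups.items()
    ]


buckets = build_buckets(docs)

# Order used by the query: the bucket's largest id, descending.
by_max_id = [b["key"] for b in sorted(buckets, key=lambda b: b["topSort"], reverse=True)]

# Order asked for in the question: the id of the top hit, descending.
by_top_hit_id = [
    b["key"] for b in sorted(buckets, key=lambda b: b["topAnswer"]["id"], reverse=True)
]

print(by_max_id)      # [1, 2]
print(by_top_hit_id)  # [2, 1]
```

On this sample the two orderings differ, because bucket 1 contains a document with id 4 that is not its top hit. The max sub-aggregation therefore matches the requested ordering only when the top hit also holds the bucket's largest id.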
