Elasticsearch: ordering terms aggregation buckets by a field in a top_hits sub-aggregation

I would like to order the buckets from a terms aggregation based on a property possessed by the first element in a top hits aggregation.
My best effort query looks like this (with syntax errors):
{
"aggregations": {
"toBeOrdered": {
"terms": {
"field": "parent_uuid",
"size": 1000000,
"order": {
"topAnswer._source.id": "asc"
}
},
"aggregations": {
"topAnswer": {
"top_hits": {
"size": 1
}
}
}
}
}
}
Does anyone know how to accomplish this?
Example:
{
"a":1,
"b":2,
"id":4
}
{
"a":1,
"b":3,
"id":1
}
{
"a":2,
"b":4,
"id":3
}
Grouping by "a" and ordering the buckets by "id" (desc) and sorting the top hits on "b" (desc) would give:
{2:{
"a":2,
"b":4,
"id":3
},1:{
"a":1,
"b":3,
"id":1
}}

You can do it with the following query. The idea is to return, for each parent_uuid bucket, the top hit when sorting on b (descending), and to order the parent_uuid buckets by their largest id value (descending) using a max sub-aggregation.
{
"aggregations": {
"toBeOrdered": {
"terms": {
"field": "parent_uuid",
"size": 1000000,
"order": {
"topSort": "desc"
}
},
"aggregations": {
"topAnswer": {
"top_hits": {
"size": 1,
"sort": {
"b": "desc"
}
}
},
"topSort": {
"max": {
"field": "id"
}
}
}
}
}
}
Try it out and report if this works out for you.
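One thing to keep in mind: the max sub-aggregation is computed over every document in a bucket, not only over the top hit. In the example data the bucket a=1 contains the ids 4 and 1, so its max id is 4 even though its top hit (sorted on b) has id 1. If the buckets really have to follow the id of the top hit itself, one option is to re-sort them on the client. A minimal Python sketch, assuming the response has the usual shape for the query above (aggregation names toBeOrdered and topAnswer); the function name sort_buckets_by_top_hit and the sample response are made up for illustration:

```python
def sort_buckets_by_top_hit(response, field="id", descending=True):
    """Return the terms buckets ordered by a field of each bucket's first top hit."""
    buckets = response["aggregations"]["toBeOrdered"]["buckets"]

    def top_hit_value(bucket):
        # top_hits results live under <agg name>.hits.hits, one entry per hit
        return bucket["topAnswer"]["hits"]["hits"][0]["_source"][field]

    return sorted(buckets, key=top_hit_value, reverse=descending)


# Hand-written response fragment mirroring the three example documents
sample = {
    "aggregations": {"toBeOrdered": {"buckets": [
        {"key": 1, "doc_count": 2, "topAnswer": {"hits": {"hits": [
            {"_source": {"a": 1, "b": 3, "id": 1}}]}}},
        {"key": 2, "doc_count": 1, "topAnswer": {"hits": {"hits": [
            {"_source": {"a": 2, "b": 4, "id": 3}}]}}},
    ]}}
}
ordered_keys = [b["key"] for b in sort_buckets_by_top_hit(sample)]
print(ordered_keys)  # [2, 1]
```

This only works when all buckets fit in one response, which the large terms size in the query is presumably meant to ensure.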

Related

SQL to ES: limit, page and order results on an aggregation

SELECT
max( timestamp ) AS first_time,
min( timestamp ) AS last_time,
src_ip,
threat_target ,
count(*) as count
FROM
traffic
GROUP BY
src_ip,
threat_target
ORDER BY
first_time desc
LIMIT 0 ,10
I want to get this result in Elasticsearch, but I don't know how to set the limit (page size) or where to apply the sort:
{
"size": 0,
"aggregations": {
"src_ip": {
"aggregations": {
"threat_target": {
"aggregations": {
"last_time": {
"max": {
"field": "`timestamp`"
}
},
"first_time": {
"min": {
"field": "`timestamp`"
}
}
},
"terms": {
"field": "threat_target.keyword"
}
}
},
"terms": {
"field": "src_ip.keyword"
}
}
}
}
Pagination of aggregations is generally not supported in Elasticsearch; however, the composite aggregation provides a way to paginate your buckets.
Unlike the other multi-bucket aggregations, the composite aggregation can be used to paginate all buckets from a multi-level aggregation efficiently.
(Excerpt from the composite aggregation page of the Elasticsearch docs.)
Except for "ORDER BY first_time desc", the query below should run fine for you. I don't think ordering on any fields other than the grouping fields (src_ip, threat_target) is possible.
GET traffic/_search
{
"size": 0,
"aggs": {
"my_bucket": {
"composite": {
"size": 2, //<=========== PAGE SIZE
/*"after":{ // <========== INCLUDE THIS FROM Second request onwards, passing after_key of the last output here for next page
"src_ip" : "1.2.3.5",
"threat_target" : "T3"
},*/
"sources": [
{
"src_ip": {
"terms": {
"field": "src_ip.keyword",
"order": "desc"
}
}
},
{
"threat_target": {
"terms": {
"field": "threat_target.keyword"
}
}
}
]
},
"aggs": {
"first_time": {
"max": {
"field": "timestamp"
}
},
"last_time": {
"min": {
"field": "timestamp"
}
}
}
}
}
}
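To walk through all pages, the after_key of each response is fed back as after in the next request until no buckets come back. A rough Python sketch of that loop; the helper name iter_composite_buckets is made up, and search stands for any callable that takes a request body (dict) and returns the response as a dict, for example a thin wrapper around whatever Elasticsearch client you use:

```python
import copy


def iter_composite_buckets(search, body, agg_name):
    """Yield every bucket of a composite aggregation, page by page."""
    body = copy.deepcopy(body)  # leave the caller's request untouched
    while True:
        agg = search(body)["aggregations"][agg_name]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        # The last page has no buckets and no after_key
        if not agg["buckets"] or after_key is None:
            break
        # Ask for the buckets that follow the last one we have seen
        body["aggs"][agg_name]["composite"]["after"] = after_key
```

If the number of (src_ip, threat_target) groups is manageable, the missing ORDER BY first_time desc LIMIT 0,10 could then be applied on the client, e.g. sorted(buckets, key=lambda b: b["first_time"]["value"], reverse=True)[:10]. That means fetching every bucket first, so it is a workaround rather than real server-side ordering.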

Paging the top_hits aggregation in ElasticSearch

Right now I'm doing a top_hits aggregation in Elastic Search that groups my data by a field, sorts the groups by a date, and chooses the top 1.
I need to somehow page this aggregation results in a way that I can pass through the pageSize and the pageNumber, but I don't know how.
In addition to this, I also need the total results of this aggregation so we can show it in a table in our web interface.
The aggregation looks like this:
POST my_index/_search
{
"size": 0,
"aggs": {
"top_artifacts": {
"terms": {
"field": "artifactId.keyword"
},
"aggs": {
"top_artifacts_hits": {
"top_hits": {
"size": 1,
"sort": [{
"date": {
"order": "desc"
}
}]
}
}
}
}
}
}
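If I understand what you want, you should be able to paginate with a composite aggregation. You can still pass a size parameter for the page size, but instead of a from offset you pass the key of the last bucket of the previous page in after, so pages have to be fetched in sequence rather than by page number.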
POST my_index/_search
{
"size": 0,
"aggs": {
"top_artifacts": {
"composite": {
"sources": [
{
"artifact": {
"terms": {
"field": "artifactId.keyword"
}
}
}
],
"size": 1, // OPTIONAL SIZE (How many buckets)
"after": {
"artifact": "FOO_BAZ" // Buckets after this bucket key
}
},
"aggs": {
"hits": {
"top_hits": {
"size": 1,
"sort": [
{
"date": {
"order": "desc"
}
}
]
}
}
}
}
}
}
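The query above does not return the total number of groups that the question also asks for. One possible way to get it is a sibling cardinality aggregation on the same field; note that cardinality is an approximate count. A small Python sketch that builds the request body for a given page; the function name build_page_request and the aggregation name total_artifacts are made up, and the sort field date is taken from the question:

```python
def build_page_request(page_size, after_key=None):
    """Build the search body for one page of artifacts plus an approximate total."""
    composite = {
        "size": page_size,
        "sources": [{"artifact": {"terms": {"field": "artifactId.keyword"}}}],
    }
    if after_key is not None:
        # after_key comes from the previous page's response
        composite["after"] = after_key
    return {
        "size": 0,
        "aggs": {
            "top_artifacts": {
                "composite": composite,
                "aggs": {
                    "hits": {
                        "top_hits": {"size": 1, "sort": [{"date": {"order": "desc"}}]}
                    }
                },
            },
            # Approximate number of distinct artifactId values (total rows for the table)
            "total_artifacts": {"cardinality": {"field": "artifactId.keyword"}},
        },
    }
```

The first page would be build_page_request(10); each following page passes the after_key returned by the previous response.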

Aggregated results show fewer items than doc_count?

I have an ElasticSearch query which aggregates the result on a certain field, called _aggregate. Now I have this strange situation given this query:
{
"size": 100,
"aggregations": {
"results": {
"terms": {
"field": "_aggregate",
"size": 1000,
"order": {
"_count": "desc"
}
},
"aggregations": {
"bundled": {
"top_hits": {
"sort": [
{
"_weight": "asc"
}
]
}
}
}
}
},
"query": {
"bool": {
"must": [
{
"term": {
"_aggregate": "5713618784853"
}
}
]
}
}
}
When I run this search, it returns 8 hits (as expected). However, when I look at the aggregated results, I see a doc_count of 8 (so far so good), but the bucket only contains 3 hits.
Increasing the size of the terms aggregation on _aggregate does not have any effect.
Does anyone know how this is possible, or what could cause this?
This is because the top_hits metric aggregation returns 3 hits per bucket by default. You can override this with its size parameter:
"aggregations": {
"bundled": {
"top_hits": {
"size": 10, // <--- add this
"sort": [
{
"_weight": "asc"
}
]
}
}
}

Aggregations for categories, sorted by category sequence

I have an elastic index, in which each document contains the following:
category {
"id": 4,
"name": "Green",
"seq": 2
}
I can use aggregations to get me the doc count for each of the categories:
{
"size": 0,
"aggs": {
"category": {
"terms": {
"field": "category.name"
}
}
}
}
This is fine, but the aggs are sorted by the doc count. What I'd like is to have the buckets sorted by the seq value, something that's easy in SQL.
Any suggestions?
Thanks!
Take a look at ordering terms aggregations.
Something like this could work, but only if "name" and "seq" have the right relationship (one-to-one, or it works out in some other way):
POST /test_index/_search
{
"size": 0,
"aggs": {
"category": {
"terms": {
"field": "category.name",
"order" : { "seq_num" : "asc" }
},
"aggs": {
"seq_num": {
"max": {
"field": "category.seq"
}
}
}
}
}
}
Here is some code I used for testing:
http://sense.qbox.io/gist/4e551b2faec81eb0343e0e6d0cc9b10f20d7d4c1
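Whether the name/seq relationship really is one-to-one can be checked on a sample of the category objects before relying on the ordering. A small Python sketch, assuming the categories are available as plain dicts shaped like the one in the question; the function name names_with_multiple_seqs is made up:

```python
from collections import defaultdict


def names_with_multiple_seqs(categories):
    """Return {name: [seq, ...]} for every category name that maps to more than one seq."""
    seqs_by_name = defaultdict(set)
    for cat in categories:
        seqs_by_name[cat["name"]].add(cat["seq"])
    return {name: sorted(seqs) for name, seqs in seqs_by_name.items() if len(seqs) > 1}
```

An empty result means every name has exactly one seq, so the max sub-aggregation simply returns that value. For any name listed in the result, the bucket would be ordered by the largest of its seq values.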

Union of sorted sized queries in Elasticsearch

I have docs in Elasticsearch like:
{
"key1":1,
"key2":2,
"key3":3
}
I would like to write a query that returns 30 docs, namely the union of:
the 10 docs with the highest values in key1 +
the 10 docs with the highest values in key2 +
the 10 docs with the highest values in key3
I have 2 ideas:
Using DisMaxQuery, but I couldn't get sorting to work (I probably missed something).
Using MultiSearch, but I would like to get a single result object.
Any suggestions would be helpful!
Another idea would be to add three terms aggregations on key1, key2 and key3, each sorted by a max sub-aggregation (in order to get the highest value for each key), and to give each of them a top_hits sub-aggregation. You might get fewer than 10 docs per key; if that's a problem, you can increase the size of the terms aggregations to 2 or 3 and then filter out the unneeded top hits on the client side.
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"topkey1": {
"terms": {
"field": "key1",
"size": 1,
"order": {
"max_key1": "desc"
}
},
"aggs": {
"max_key1": {
"max": {
"field": "key1"
}
},
"key1_tophits": {
"top_hits": {
"size": 10
}
}
}
},
"topkey2": {
"terms": {
"field": "key2",
"size": 1,
"order": {
"max_key2": "desc"
}
},
"aggs": {
"max_key2": {
"max": {
"field": "key2"
}
},
"key2_tophits": {
"top_hits": {
"size": 10
}
}
}
},
"topkey3": {
"terms": {
"field": "key3",
"size": 1,
"order": {
"max_key3": "desc"
}
},
"aggs": {
"max_key3": {
"max": {
"field": "key3"
}
},
"key3_tophits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
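The response still contains three separate lists of top hits, so the union has to be assembled on the client. A possible Python sketch, assuming the standard response layout for the query above; the function name union_top_hits is made up, and a document that ranks high for several keys is kept only once:

```python
def union_top_hits(response):
    """Merge the top hits of all buckets of all aggregations, dropping duplicate docs."""
    seen = set()
    merged = []
    for agg in response["aggregations"].values():
        for bucket in agg["buckets"]:
            for sub in bucket.values():
                # Only top_hits sub-aggregations carry a nested hits.hits list;
                # key, doc_count and the max sub-aggregation are skipped here
                if isinstance(sub, dict) and "hits" in sub:
                    for hit in sub["hits"]["hits"]:
                        doc_key = (hit["_index"], hit["_id"])
                        if doc_key not in seen:
                            seen.add(doc_key)
                            merged.append(hit)
    return merged
```

Because of the de-duplication the result can contain fewer than 30 docs.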
