Union of sorted sized queries in Elasticsearch - elasticsearch

I have docs in Elasticsearch like:
{
"key1":1,
"key2":2,
"key3":3
}
I would like to run a query that returns 30 docs, the union of:
the 10 docs with the highest values in key1 +
the 10 docs with the highest values in key2 +
the 10 docs with the highest values in key3
I have two ideas:
Using DisMaxQuery, but I couldn't get sorting to work with it. I probably missed something.
Using MultiSearch, but I would like to get a single result object.
Any suggestions would be helpful!

Another idea would be to add three terms aggregations on key1, key2 and key3, each ordered by a max sub-aggregation (in order to get the highest value for each key), and to give each of them a top_hits sub-aggregation. You might get fewer than 10 docs per key. If that's a problem, you can increase the size of the terms aggregations to 2 or 3 and then filter out the unneeded top hits on the client side.
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"topkey1": {
"terms": {
"field": "key1",
"size": 1,
"order": {
"max_key1": "desc"
}
},
"aggs": {
"max_key1": {
"max": {
"field": "key1"
}
},
"key1_tophits": {
"top_hits": {
"size": 10
}
}
}
},
"topkey2": {
"terms": {
"field": "key2",
"size": 1,
"order": {
"max_key2": "desc"
}
},
"aggs": {
"max_key2": {
"max": {
"field": "key2"
}
},
"key2_tophits": {
"top_hits": {
"size": 10
}
}
}
},
"topkey3": {
"terms": {
"field": "key3",
"size": 1,
"order": {
"max_key3": "desc"
}
},
"aggs": {
"max_key3": {
"max": {
"field": "key3"
}
},
"key3_tophits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
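The client-side filtering could look roughly like this in Python. This is only a sketch: it assumes the response has the usual shape for a terms aggregation with a top_hits sub-aggregation (aggregations, then buckets, then hits.hits), and the function name collect_top_docs and the trick of finding the top_hits sub-aggregation by its "_tophits" suffix are my own choices for illustration, not part of any Elasticsearch client.

```python
def collect_top_docs(response, agg_names=("topkey1", "topkey2", "topkey3"), per_key=10):
    """Flatten the top hits of each terms aggregation, keeping per_key docs per key.

    A document that ranks highly for several keys is returned only once,
    so the union may hold fewer than len(agg_names) * per_key docs.
    """
    seen_ids = set()
    union = []
    for agg_name in agg_names:
        kept = 0
        # Buckets arrive ordered by the max sub-aggregation, highest value first.
        for bucket in response["aggregations"][agg_name]["buckets"]:
            if kept == per_key:
                break
            # Assumption: the top_hits sub-aggregation is the one named "*_tophits".
            hits_key = next(name for name in bucket if name.endswith("_tophits"))
            for hit in bucket[hits_key]["hits"]["hits"]:
                if kept == per_key:
                    break
                kept += 1
                if hit["_id"] not in seen_ids:
                    seen_ids.add(hit["_id"])
                    union.append(hit)
    return union
```

Because duplicates are dropped, the result is a true union: a doc that is in the top 10 for both key1 and key2 shows up once.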

Related

Ordering Aggregation Buckets by Score

Is it possible to order the aggregation bucket by score?
"aggs": {
"UnitAggregationBucket": {
"terms": {
"field": "unitId",
"size": 10,
/* "order": order by max score documents per bucket */
}
}
}
I have seen the documentation, which explains that the default order is by doc_count, but I cannot find out whether it is possible to order the buckets by score, and if so how.
Yes, it is possible to do that like this:
{
"size": 0,
"query": {
...
},
"aggs": {
"UnitAggregationBucket": {
"terms": {
"field": "unitId",
"size": 10,
"order": {
"score": "desc"
}
},
"aggs": {
"score": {
"max": {
"script": "_score"
}
}
}
}
}
}

Paging the top_hits aggregation in ElasticSearch

Right now I'm doing a top_hits aggregation in Elasticsearch that groups my data by a field, sorts each group by a date, and picks the top 1.
I need to page the results of this aggregation so that I can pass in a pageSize and a pageNumber, but I don't know how.
In addition, I need the total number of results of this aggregation so we can show it in a table in our web interface.
The aggregation looks like this:
POST my_index/_search
{
"size": 0,
"aggs": {
"top_artifacts": {
"terms": {
"field": "artifactId.keyword"
},
"aggs": {
"top_artifacts_hits": {
"top_hits": {
"size": 1,
"sort": [{
"date": {
"order": "desc"
}
}]
}
}
}
}
}
}
If I understand what you want, you should be able to paginate with a Composite Aggregation. You can still pass a size (the number of buckets per page), but instead of a from offset you pass the key of the last bucket of the previous page in after.
POST my_index/_search
{
"size": 0,
"aggs": {
"top_artifacts": {
"composite": {
"sources": [
{
"artifact": {
"terms": {
"field": "artifactId.keyword"
}
}
}
],
"size": 1, // OPTIONAL SIZE (How many buckets)
"after": {
"artifact": "FOO_BAZ" // Buckets after this bucket key
}
},
"aggs": {
"hits": {
"top_hits": {
"size": 1,
"sort": [
{
"date": {
"order": "desc"
}
}
]
}
}
}
}
}
}
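To walk through all the pages, each request reuses the after_key that the composite aggregation returns with the previous page. A rough Python sketch of that loop follows. The search argument is a stand-in I made up for whatever client call you use (any callable that takes the request body as a dict and returns the parsed response as a dict), and iter_composite_buckets is just an illustrative name.

```python
import copy


def iter_composite_buckets(search, body, agg_name="top_artifacts"):
    """Yield every bucket of a composite aggregation, one page at a time.

    `search` is any callable taking a request body (dict) and returning the
    parsed JSON response (dict); it stands in for a real client call.
    """
    body = copy.deepcopy(body)  # do not mutate the caller's request
    composite = body["aggs"][agg_name]["composite"]
    while True:
        result = search(body)["aggregations"][agg_name]
        buckets = result["buckets"]
        if not buckets:
            return  # an empty page means everything has been read
        yield from buckets
        after_key = result.get("after_key")
        if after_key is None:
            return
        composite["after"] = after_key  # resume after the last bucket seen
```

Note that this gives you "next page" navigation only. There is no page number, so jumping to page N means walking the pages before it or remembering the after key of each page you have already seen.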

How to sort the records by another field for the top_hits in ES

I have data such as:
Id name startTime(timestamp)
1 c 1510000000000
2 c 1500000000000
3 a 1510000000000
4 a 1500000000000
5 b 1500662700000
I want to get the record with the max startTime for each name, and then sort the results by name.
The result should be:
Id name startTime(timestamp)
3 a 1510000000000
5 b 1500662700000
1 c 1510000000000
Currently I can get the max startTime grouped by name, but I don't know how to sort the results by name.
Here is my query:
GET index/default/_search
{
"aggs": {
"group": {
"terms": {
"field": "name"
},
"aggs": {
"tops": {
"top_hits": {
"sort": [
{
"startTime": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
},
"size": 0
}
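As I understand it, in addition to the top_hits sort, you want the name buckets to be sorted by name.
Have a look at the Terms Aggregation order option. All you have to do is add an order by key to the terms aggregation. The key is called _term in older versions, as used here; it was deprecated in favor of _key in Elasticsearch 6.0.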
Here is my suggestion:
{
"aggs": {
"group": {
"terms": {
"field": "name",
"order": { // this will do the trick
"_term": "asc"
}
},
"aggs": {
"tops": {
"top_hits": {
"sort": [
{
"startTime": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
},
"size": 0
}
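As a sanity check on what the query should return, here is the same logic in plain Python over the sample rows from the question (no Elasticsearch involved; latest_per_name is just an illustrative name). The dict plays the role of the terms buckets, picking the highest startTime plays the role of top_hits with size 1, and the final sort by name corresponds to ordering the buckets by key.

```python
rows = [
    {"id": 1, "name": "c", "startTime": 1510000000000},
    {"id": 2, "name": "c", "startTime": 1500000000000},
    {"id": 3, "name": "a", "startTime": 1510000000000},
    {"id": 4, "name": "a", "startTime": 1500000000000},
    {"id": 5, "name": "b", "startTime": 1500662700000},
]


def latest_per_name(rows):
    """Return the row with the highest startTime for each name, ordered by name."""
    latest = {}
    for row in rows:
        current = latest.get(row["name"])
        if current is None or row["startTime"] > current["startTime"]:
            latest[row["name"]] = row
    return [latest[name] for name in sorted(latest)]


result = latest_per_name(rows)
print([(row["id"], row["name"]) for row in result])  # [(3, 'a'), (5, 'b'), (1, 'c')]
```

The ids come out as 3, 5, 1 because they are taken from the sample rows as listed.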

How to detect the number of days that a person passed in a city?

I have the following mapping in Elasticsearch:
PUT /traffic-data
{
"mappings": {
"traffic-entry": {
"_all": {
"enabled": false
},
"properties": {
"CameraId": {
"type":"keyword"
},
"VehiclePlateNumber": {
"type":"keyword"
},
"DateTime": {
"type":"date"
}
}
}
}
}
I want to calculate how many days per month each vehicle stayed. A unique vehicle is identified by VehiclePlateNumber.
So, I want to get a result something like this:
VehiclePlateNumber Month StayDays
111 1 5
222 1 1
...
How can I do this with an Elasticsearch query?
This is what I tried:
GET traffic-data/_search
{
"size": 0,
"aggs":{
"by_district":{
"terms": {
"field": "VehiclePlateNumber",
"size": 100000
},
"aggs": {
"by_month": {
"terms": {
"field": "DateTime",
"size": 12
}
}
}
}
}
}
You can do a terms aggregation on the vehicle plate number, then a terms sub-aggregation on the month, then a sum sub-aggregation on days.
Something like:
GET traffic-data/_search
{
"size": 0,
"aggs":{
"by_district":{
"terms": {
"field": "VehiclePlateNumber",
"size": 100000
},
"aggs": {
"by_month": {
"terms": {
"field": "DateTime",
"size": 12
},
"aggs": {
"days": {
"sum": {
"field": "days"
}
}
}
}
}
}
}
}
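The month should come from a scripted field, but it would be better to compute it at index time. Note that the sum also relies on a numeric days field, which the mapping above does not define, so it would have to be indexed as well.
With that in place, this should work.
Alternatively, you can use an entity-centric design and regularly index the computed value. See https://www.elastic.co/elasticon/2015/sf/building-entity-centric-indexes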
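If the value is computed outside the query (at index time or by a periodic job), the computation itself is small. Here is a rough Python sketch, assuming that "stayed" means the plate was seen by any camera at least once on that calendar day; that reading, the function name stay_days and the (plate, ISO datetime) input shape are my assumptions, not something given in the question.

```python
from collections import defaultdict
from datetime import datetime


def stay_days(entries):
    """Count the distinct calendar days on which each plate was seen, per month.

    `entries` is an iterable of (plate_number, iso_datetime) pairs.
    Returns {(plate_number, year, month): number_of_distinct_days}.
    """
    days_seen = defaultdict(set)
    for plate, iso_datetime in entries:
        seen_at = datetime.fromisoformat(iso_datetime)
        # Several sightings on the same day collapse into one set entry.
        days_seen[(plate, seen_at.year, seen_at.month)].add(seen_at.day)
    return {key: len(days) for key, days in days_seen.items()}
```

Each resulting (plate, year, month) count could then be indexed as its own document, which is the kind of precomputed value an entity-centric index would hold.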

Elasticsearch Ordering terms aggregation buckets after field in top hits sub aggregation

I would like to order the buckets from a terms aggregation based on a property possessed by the first element in a top hits aggregation.
My best effort query looks like this (with syntax errors):
{
"aggregations": {
"toBeOrdered": {
"terms": {
"field": "parent_uuid",
"size": 1000000,
"order": {
"topAnswer._source.id": "asc"
}
},
"aggregations": {
"topAnswer": {
"top_hits": {
"size": 1
}
}
}
}
}
}
Does anyone know how to accomplish this?
Example:
{
"a":1,
"b":2,
"id":4
}
{
"a":1,
"b":3,
"id":1
}
{
"a":2,
"b":4,
"id":3
}
Grouping by "a" and ordering the buckets by "id" (desc) and sorting the top hits on "b" (desc) would give:
{2:{
"a":2,
"b":4,
"id":3
},1:{
"a":1,
"b":3,
"id":1
}}
You can do it with the following query. The idea is to show, for each parent_uuid bucket, the top hit with the highest b value, and to sort the parent_uuid buckets by their largest id value (descending) using a max sub-aggregation. Note that this orders the buckets by the largest id found in each bucket, which is not necessarily the id of the top hit.
{
"aggregations": {
"toBeOrdered": {
"terms": {
"field": "parent_uuid",
"size": 1000000,
"order": {
"topSort": "desc"
}
},
"aggregations": {
"topAnswer": {
"top_hits": {
"size": 1,
"sort": {
"b": "desc"
}
}
},
"topSort": {
"max": {
"field": "id"
}
}
}
}
}
}
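One thing worth checking before relying on this: ordering buckets by a max over id is not always the same as ordering them by the id of the top hit. The plain Python sketch below replays both orderings on the three sample documents from the question, grouping by a in place of parent_uuid. It only imitates the aggregation logic and does not talk to Elasticsearch; the helper names are made up for the example.

```python
from collections import defaultdict

docs = [
    {"a": 1, "b": 2, "id": 4},
    {"a": 1, "b": 3, "id": 1},
    {"a": 2, "b": 4, "id": 3},
]


def group_by(docs, field):
    """Imitate terms buckets: map each field value to its documents."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    return groups


def top_hit(group):
    """Imitate top_hits with size 1 sorted by b, descending."""
    return max(group, key=lambda doc: doc["b"])


groups = group_by(docs, "a")

# Order used by the query: max(id) over the whole bucket, descending.
by_max_id = sorted(groups, key=lambda k: max(d["id"] for d in groups[k]), reverse=True)

# Order described in the question: id of the bucket's top hit, descending.
by_top_hit_id = sorted(groups, key=lambda k: top_hit(groups[k])["id"], reverse=True)

print(by_max_id)      # [1, 2]
print(by_top_hit_id)  # [2, 1]
```

On this sample the two orders differ, because bucket 1 contains id 4 on a document that is not its top hit. If every bucket's top hit also carries the bucket's largest id, both orders agree.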
Try it out and report if this works out for you.
