Elasticsearch Aggregation Pagination - elasticsearch

Is there any way to do pagination using Elastic search with aggregation?
the elasticsearch version is 2.3.
This is the query:
{
"query": {
"match": {
"clientMac": "88:"
}
},
"aggs": {
"top_tags": {
"terms": {
"field": "clientMac.rawData"
},
"aggs": {
"top_client_hits": {
"top_hits": {
"sort": [
{
"event_timestamp": {
"order": "desc"
}
}
],
"_source": {
"includes": [
"event_timestamp"
]
},
"size": 1
}
}
}
}
}
}

From elastic 5 you do have the ability by partitioning the buckets of terms aggregation. You can read about it here:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I want to make a suggestion to use bucket_path aggregation. As I know this aggregation needs to be run in sub-aggs of a histogram aggregation. As you have such field in your mapping so I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it didn't work well for you.

Elasticsearch Pagination with timestamp range

Elasticsearch official documentation introduce that elasticsearch can realize pagination by composite aggregations.
The composite aggregation will fetch data many times to get all results.
So my question is, Can I use range from now-1h to now when I execute composite aggregation?
If I can. How to composite aggregation query keep source data unchanging when every range query have different now.
If I can't. My query below has no error and the result seems to be right.
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"timestamp": {
"gte": "now-1h"
}
}
}
]
}
},
"aggs": {
"user_device": {
"composite": {
"after": {
"user_name": "alen.lv"
},
"size": 100,
"sources": [
{
"user_name": {
"terms": {
"field": "user_name"
}
}
}
]
},
"aggs": {
"user_mac": {
"terms": {
"field": "user_mac",
"size": 1000
}
}
}
}
}
}

How to diversify the result of top-hits aggregation?

Let's start with a concrete example. I have a document with these fields:
{
"template": {
"mappings": {
"template": {
"properties": {
"tid": {
"type": "long"
},
"folder_id": {
"type": "long"
},
"status": {
"type": "integer"
},
"major_num": {
"type": "integer"
}
}
}
}
}
}
I want to aggregate the query result by field folder_id, and for each group divided by folder_id, retrieve the top-N documents' _source detail. So i write query DSL like:
GET /template/template/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"status": 1
}
}
]
}
},
"aggs": {
"folder": {
"terms": {
"field": "folder_id",
"size": 10
},
"aggs": {
"top_hit":{
"top_hits": {
"size": 5,
"_source": ["major_num"]
}
}
}
}
}
}
However, now comes a requirement that the top hits documents for each folder_id must be diversified on the field major_num. For each folder_id, the top hits documents retrieve by the sub top_hits aggregation under the terms aggregation, must be unique on field major_num, and for each major_num value, return at most 1 document in the sub top hits aggregation result.
top_hits aggregation cannot accept sub-aggregations, so how should i solve the question?
Why not simply adding another terms aggregation on the major_num field ?
GET /template/template/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"status": 1
}
}
]
}
},
"aggs": {
"folder": {
"terms": {
"field": "folder_id",
"size": 10
},
"aggs": {
"majornum": {
"terms": {
"field": "major_num",
"size": 10
},
"aggs": {
"top_hit": {
"top_hits": {
"size": 1
}
}
}
}
}
}
}
}

How do I aggregate over top_hits results in elasticsearch

Here are example documents:
{
"player": "Jim",
"score" : 5
"timestamp": 1459492890000
}
{
"player": "Jim",
"score" : 7
"timestamp": 1459492895000
}
{
"player": "Dave",
"score" : 9
"timestamp": 1459492894000
}
{
"player": "Dave",
"score" : 4
"timestamp": 1459492898000
}
I want to get the latest score for each player and then get the average of all those scores. So the answer would be 5.5. Jim's latest score is 7 and Dave's latest score is 4. The average between those two is 5.5
The only way I found to get the "latest" document of a player was to use the top_hits aggregation. However, it does not seem that I am able to do another aggregation after I get the latest document.
This is the best I came up with:
{
"aggs": {
"last_score": {
"terms": { "field": "player" },
"aggs": {
"last_score_hits": {
"top_hits": {
"sort": [ { "timestamp": { "order": "desc" } } ],
"size": 1
},
"aggs": {
"avg_score": {
"avg": { "field": "score" }
}
}
}
}
}
}
}
However, this gives me this error:
Aggregator [last_score_hits] of type [top_hits] cannot accept
sub-aggregations
If there is another way to accomplish this search without using top_hits as well, then I would be all for it.
You're trying to put avg_score as a sub-aggregation of last_score_hits.
To get success you have to put avg_score as a sub-aggregation of last_score. See an example bellow:
{
"aggs": {
"last_score": {
"terms": {
"field": "player"
},
"aggs": {
"last_score_hits": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size": 1
}
},
"avg_score": {
"avg": {
"field": "score"
}
}
}
}
}
}
You can have other aggregation on a parallel level of top_hit but you cannot have any sub_aggregation below top_hit. It is not supported by ElasticSearch. here is the link to Github issue
You can have a parallel level aggregation like:
"aggs": {
"top_hits_agg": {
"top_hits": {
"size": 10,
"_source": {
"includes": ["score"]
}
}
},
"avg_agg": {
"avg": {
"field": "score"
}
}
}

sorting elasticsearch top hits results

I am trying to execute a query in elasticsearch to get reuslt of specific users from certain date range. the results should be grouped by userId and sorted on trackTime field, I am able to use group by using aggregation but i am not able to sort aggregation buckets on tracktime, i write down the following query
GET _search
{
"size": 0,
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"range": {
"trackTime": {
"from": "2016-02-08T05:51:02.000Z"
}
}
}
]
}
},
"filter": {
"terms": {
"userId": [
9,
10,
3
]
}
}
}
},
"aggs": {
"by_district": {
"terms": {
"field": "userId"
},
"aggs": {
"tops": {
"top_hits": {
"size": 2
}
}
}
}
}
}
what more should i have to use to sort the top hits result? Thanks in advance...
You can use sort like .
"aggs": {
"by_district": {
"terms": {
"field": "userId"
},
"aggs": {
"tops": {
"top_hits": {
"sort": [
{
"fieldName": {
"order": "desc"
}
}
],
"size": 2
}
}
}
}
}
Hope it helps

Resources