Elastic Search - Pagination on Aggregations - elasticsearch

I have an index and I run an aggregation query against it. Instead of getting the whole aggregation result back at once, I want it returned in small chunks. Is that possible in Elasticsearch?

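Try the bucket_sort pipeline aggregation. Its from and size parameters page through the buckets of the parent aggregation: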
POST /sales/_search
{
"size": 0,
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"total_sales": {
"sum": {
"field": "price"
}
},
"sales_bucket_sort": {
"bucket_sort": {
"sort": [
{"total_sales": {"order": "desc"}}
],
"size": 3,
"from": 10
}
}
}
}
}
}
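To page through the buckets, only from and size inside bucket_sort need to change between requests. Below is a minimal sketch in Python that builds the request body for a given page. The helper name bucket_page_body and the zero-based page numbering are assumptions for illustration; the field and aggregation names come from the query above. Keep in mind that bucket_sort is a pipeline aggregation, so the parent buckets are still computed on every request and only the returned slice changes.

```python
import json


def bucket_page_body(page, page_size):
    """Build the search body for one zero-based page of monthly sales buckets."""
    return {
        "size": 0,  # return no hits, only aggregations
        "aggs": {
            "sales_per_month": {
                "date_histogram": {"field": "date", "interval": "month"},
                "aggs": {
                    "total_sales": {"sum": {"field": "price"}},
                    "sales_bucket_sort": {
                        "bucket_sort": {
                            "sort": [{"total_sales": {"order": "desc"}}],
                            "from": page * page_size,  # buckets to skip
                            "size": page_size,  # buckets to return
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Body to POST to /sales/_search for the third page of 5 buckets.
    print(json.dumps(bucket_page_body(page=2, page_size=5), indent=2))
```

The printed JSON can be sent as the body of POST /sales/_search with any HTTP client.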

Related

Elasticsearch : How to do 'group by' with painless in scripted fields?

I would like to do something like the following using painless:
select day,sum(price)/sum(quantity) as ratio
from data
group by day
Is it possible?
I want to do this in order to visualize the ratio field in Kibana, since Kibana itself cannot divide aggregated values, but I would gladly hear about alternative solutions beyond scripted fields.
Yes, it's possible. You can achieve this with the bucket_script pipeline aggregation:
{
"aggs": {
"days": {
"date_histogram": {
"field": "dateField",
"interval": "day"
},
"aggs": {
"price": {
"sum": {
"field": "price"
}
},
"quantity": {
"sum": {
"field": "quantity"
}
},
"ratio": {
"bucket_script": {
"buckets_path": {
"sumPrice": "price",
"sumQuantity": "quantity"
},
"script": "params.sumPrice / params.sumQuantity"
}
}
}
}
}
}
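For reference, the per-bucket computation can be sketched in plain Python. The sample documents and the function name ratio_per_day are made up for illustration. The point is that the ratio is the quotient of the two sums within each day, not an average of per-document ratios. Skipping days with a zero total quantity is a choice made in this sketch, not a statement about how Elasticsearch handles that case.

```python
from collections import defaultdict


def ratio_per_day(docs):
    """Return {day: sum(price) / sum(quantity)} for a list of documents."""
    sums = defaultdict(lambda: [0.0, 0.0])  # day -> [sum_price, sum_quantity]
    for doc in docs:
        sums[doc["day"]][0] += doc["price"]
        sums[doc["day"]][1] += doc["quantity"]
    # Skip days whose total quantity is zero to avoid dividing by zero.
    return {day: p / q for day, (p, q) in sums.items() if q != 0}


if __name__ == "__main__":
    docs = [  # hypothetical sample data
        {"day": "2020-02-01", "price": 400.0, "quantity": 4},
        {"day": "2020-02-01", "price": 600.0, "quantity": 8},
        {"day": "2020-02-02", "price": 90.0, "quantity": 3},
    ]
    print(ratio_per_day(docs))
```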
UPDATE:
You can run the above query through the Transform API, which creates an aggregated index out of the source index.
For instance, I've indexed a few documents in a test index, and we can dry-run the above aggregation query to see what the target aggregated index would look like:
POST _transform/_preview
{
"source": {
"index": "test2",
"query": {
"match_all": {}
}
},
"dest": {
"index": "transtest"
},
"pivot": {
"group_by": {
"days": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "day"
}
}
},
"aggregations": {
"price": {
"sum": {
"field": "price"
}
},
"quantity": {
"sum": {
"field": "quantity"
}
},
"ratio": {
"bucket_script": {
"buckets_path": {
"sumPrice": "price",
"sumQuantity": "quantity"
},
"script": "params.sumPrice / params.sumQuantity"
}
}
}
}
}
The response looks like this:
{
"preview" : [
{
"quantity" : 12.0,
"price" : 1000.0,
"days" : 1580515200000,
"ratio" : 83.33333333333333
}
],
"mappings" : {
"properties" : {
"quantity" : {
"type" : "double"
},
"price" : {
"type" : "double"
},
"days" : {
"type" : "date"
}
}
}
}
What you see in the preview array are the documents that will be indexed into the transtest target index, which you can then visualize in Kibana like any other index.
So a transform runs the aggregation query shown above and stores each bucket as a document in another index.
I found a way to get the ratio of sums with a TSVB visualization in Kibana.
You can see an example in the image here.
First, create two sum aggregations, one that sums price and another that sums quantity. Then choose the 'Bucket Script' aggregation to divide the two sums using a Painless script.
The only drawback I found is that you cannot aggregate on multiple columns.

Aggregation using elastic search

I use the search query below to fetch the latest 5000 documents from my Elasticsearch database:
{
"size": 5000,
"from": 0,
"query": {
"range" : {
"hostTimestamp" : {
"gte" : 1499674634382,
"lte" : 1499680034000
}
}
},
"sort": [
{
"hostTimestamp": {
"order": "desc"
}
}
]
}
Now, among the documents fetched by this query, I want to count the number of documents with eventSeverity equal to Alert or Critical. How can this be achieved?
You can achieve that by adding a terms aggregation on the eventSeverity field (the aggs section at the end of the query):
{
"size": 5000,
"from": 0,
"query": {
"range" : {
"hostTimestamp" : {
"gte" : 1499674634382,
"lte" : 1499680034000
}
}
},
"sort": [
{
"hostTimestamp": {
"order": "desc"
}
}
],
"aggs": {
"severities": {
"terms": {
"field": "eventSeverity"
}
}
}
}
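Reading the two counts out of the response could look like the following sketch in Python. It assumes the standard terms aggregation response layout (aggregations.severities.buckets, each bucket with key and doc_count); the function name and the sample response are made up for illustration. Note that aggregations are computed over every document matching the query, not only over the 5000 hits returned by size.

```python
def severity_counts(response, wanted=("Alert", "Critical")):
    """Return {severity: doc_count} for the wanted severities (0 if absent)."""
    buckets = response["aggregations"]["severities"]["buckets"]
    by_key = {b["key"]: b["doc_count"] for b in buckets}
    return {name: by_key.get(name, 0) for name in wanted}


if __name__ == "__main__":
    sample = {  # hypothetical response fragment
        "aggregations": {
            "severities": {
                "buckets": [
                    {"key": "Info", "doc_count": 4200},
                    {"key": "Alert", "doc_count": 37},
                ]
            }
        }
    }
    print(severity_counts(sample))
```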

ElasticSearch multiple terms aggregation order

I have a document structure which describes a container, some of its fields are:
containerId -> Unique Id,String
containerManufacturer -> String
containerValue -> Double
estContainerWeight ->Double
actualContainerWeight -> Double
I want to run a search aggregation which has two levels of terms aggregations on the two weight fields, but in descending order of the weight fields, like below:
{
"size": 0,
"aggs": {
"by_manufacturer": {
"terms": {
"field": "containerManufacturer",
"size": 10,
"order": {"estContainerWeight": "desc"} //Cannot do this
},
"aggs": {
"by_est_weight": {
"terms": {
"field": "estContainerWeight",
"size": 10,
"order": { "actualContainerWeight": "desc"} //Cannot do this
},
"aggs": {
"by_actual_weight": {
"terms": {
"field": "actualContainerWeight",
"size": 10
},
"aggs" : {
"container_value_sum" : {"sum" : {"field" : "containerValue"}}
}
}
}
}
}
}
}
}
Sample documents:
{"containerId":1,"containerManufacturer":"A","containerValue":12,"estContainerWeight":5.0,"actualContainerWeight":5.1}
{"containerId":2,"containerManufacturer":"A","containerValue":24,"estContainerWeight":5.0,"actualContainerWeight":5.2}
{"containerId":3,"containerManufacturer":"A","containerValue":23,"estContainerWeight":5.0,"actualContainerWeight":5.2}
{"containerId":4,"containerManufacturer":"A","containerValue":32,"estContainerWeight":6.0,"actualContainerWeight":6.2}
{"containerId":5,"containerManufacturer":"A","containerValue":26,"estContainerWeight":6.0,"actualContainerWeight":6.3}
{"containerId":6,"containerManufacturer":"A","containerValue":23,"estContainerWeight":6.0,"actualContainerWeight":6.2}
Expected output (not complete):
{
"by_manufacturer": {
"buckets": [
{
"key": "A",
"by_est_weight": {
"buckets": [
{
"key" : 5.0,
"by_actual_weight" : {
"buckets" : [
{
"key" : 5.2,
"container_value_sum" : {
"value" : 1234 //Not actual sum
}
},
{
"key" : 5.1,
"container_value_sum" : {
"value" : 1234 //Not actual sum
}
}
]
}
},
{
"key" : 6.0,
"by_actual_weight" : {
"buckets" : [
{
"key" : 6.2,
"container_value_sum" : {
"value" : 1234 //Not actual sum
}
},
{
"key" : 6.3,
"container_value_sum" : {
"value" : 1234 //Not actual sum
}
}
]
}
}
]
}
}
]
}
}
However, I cannot order by the nested aggregations. (Error: Terms buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation...)
For example, for the above sample output, I have no control over which buckets are generated if I set a size on the terms aggregations (which I will have to do if my data is large), so I would like to get only the top N weights for each terms aggregation.
Is there a way to do this?
If I understand your problem correctly, you would like to sort the manufacturer terms in decreasing order of the estimated weights of their containers, and then each "estimated weight" bucket in decreasing order of actual weight. Since each weight is the key of its own terms aggregation, order the two weight aggregations by _term in descending order (on Elasticsearch 6.0 and later, use _key, which replaces the deprecated _term):
{
  "size": 0,
  "aggs": {
    "by_manufacturer": {
      "terms": {
        "field": "containerManufacturer",
        "size": 10
      },
      "aggs": {
        "by_est_weight": {
          "terms": {
            "field": "estContainerWeight",
            "size": 10,
            "order": {"_term": "desc"}
          },
          "aggs": {
            "by_actual_weight": {
              "terms": {
                "field": "actualContainerWeight",
                "size": 10,
                "order": {"_term": "desc"}
              },
              "aggs": {
                "container_value_sum": {
                  "sum": {"field": "containerValue"}
                }
              }
            }
          }
        }
      }
    }
  }
}
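Since the same pattern repeats at every level, the nested request can also be generated. The sketch below is in Python; nested_terms is a hypothetical helper, not part of any client library, and order_key is a parameter so that _key can be passed on versions where _term is deprecated. Each level wraps the levels below it in its own "aggs" section.

```python
import json


def nested_terms(levels, leaf_aggs, order_key="_term", size=10):
    """Nest terms aggregations, outermost first.

    levels: list of (aggregation name, field, sort_by_key_desc) tuples.
    leaf_aggs: aggregations placed inside the innermost level.
    """
    aggs = leaf_aggs
    for name, field, sort_desc in reversed(levels):
        terms = {"field": field, "size": size}
        if sort_desc:
            terms["order"] = {order_key: "desc"}
        # Wrap everything built so far as sub-aggregations of this level.
        aggs = {name: {"terms": terms, "aggs": aggs}}
    return {"size": 0, "aggs": aggs}


if __name__ == "__main__":
    body = nested_terms(
        [
            ("by_manufacturer", "containerManufacturer", False),
            ("by_est_weight", "estContainerWeight", True),
            ("by_actual_weight", "actualContainerWeight", True),
        ],
        {"container_value_sum": {"sum": {"field": "containerValue"}}},
    )
    print(json.dumps(body, indent=2))
```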

get buckets count in elasticsearch aggregations

I am using elasticsearch to search a database with a lot of duplicates.
I am using field collapsing and it works; however, it returns the number of hits (including duplicates) and not the number of buckets.
"aggs": {
"uniques": {
"terms": {
"field": "guid"
},
"aggs": {
"jobs": { "top_hits": { "_source": "title", "size": 1 }}
}
}
}
I can count the buckets by making another request using cardinality (but it only returns count, not the documents):
{
"aggs" : {
"uniques" : {
"cardinality" : {
"field" : "guid"
}
}
}
}
Is there a way to return both requests (buckets + total bucket count) in one search?
Thanks
You can combine both of these aggregations into one request:
{
"aggs" : {
"uniques" : {
"cardinality" : {
"field" : "guid"
}
},
"uniquesTerms": {
"terms": {
"field": "guid"
},
"aggs": {
"jobs": { "top_hits": { "_source": "title", "size": 1 }}
}
}
}
}
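Both results then arrive in the same response under their aggregation names. Here is a sketch in Python of reading them, assuming the usual response layout (value for cardinality, buckets for terms, hits.hits inside top_hits); the function name and the sample data are illustrative only. Note that cardinality returns an approximate count, and that terms returns only the top 10 buckets unless size is set.

```python
def unique_jobs(response):
    """Return (approximate unique count, one title per returned guid bucket)."""
    aggs = response["aggregations"]
    total = aggs["uniques"]["value"]
    titles = [
        bucket["jobs"]["hits"]["hits"][0]["_source"]["title"]
        for bucket in aggs["uniquesTerms"]["buckets"]
    ]
    return total, titles


if __name__ == "__main__":
    sample = {  # hypothetical response fragment
        "aggregations": {
            "uniques": {"value": 2},
            "uniquesTerms": {
                "buckets": [
                    {"key": "g1", "doc_count": 3,
                     "jobs": {"hits": {"hits": [{"_source": {"title": "Baker"}}]}}},
                    {"key": "g2", "doc_count": 1,
                     "jobs": {"hits": {"hits": [{"_source": {"title": "Welder"}}]}}},
                ]
            },
        }
    }
    print(unique_jobs(sample))
```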

sub field aggregation group by order by in elasticsearch

I am unable to find the correct syntax to get an aggregation of a sub object ordered by a count field.
A good example of this is a twitter document:
{
"properties" : {
"id" : {
"type" : "long"
},
"message" : {
"type" : "string"
},
"user" : {
"type" : "object",
"properties" : {
"id" : {
"type" : "long"
},
"screenName" : {
"type" : "string"
},
"followers" : {
"type" : "long"
}
}
}
}
}
How would I go about getting the Top Influencers for a given set of tweets? This would be a unique list of the top 10 "user" objects ordered by the "user.followers" field.
I have tried using top_hits but get an exception:
org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA]
Data too large, data for [user.id]
"aggs": {
"top-influencers": {
"terms": {
"field": "user.id",
"order": {
"top_hit": "desc"
}
},
"aggs": {
"top_tags_hits": {
"top_hits": {}
},
"top_hit": {
"max": {
"field": "user.followers"
}
}
}
}
}
I can get almost what I want using the "sort" field on the query (no aggregation), however if a user has multiple tweets then they will appear twice in the result. I need to be able to group by the sub object "user" and only return each user once.
---UPDATE---
I have managed to get a list of the top users returned in very good time. Unfortunately, it still isn't unique. Also, the docs say top_hits is designed to be a sub-aggregation, whereas I am using it as a top-level aggregation:
"aggs": {
"top_influencers": {
"top_hits": {
"sort": [
{
"user.followers": {
"order": "desc"
}
}
],
"_source": {
"include": [
"user.id",
"user.screenName",
"user.followers"
]
},
"size": 10
}
}
}
Try this:
{
"aggs": {
"GroupByType": {
"terms": {
"field": "user.id",
"size": 10000
},
"aggs": {
"Group": {
"top_hits":{
"size":1,
"_source": {
"includes": ["user.id", "user.screenName", "user.followers"]
},
"sort":[{
"user.followers": {
"order": "desc"
}
}]
}
}
}
}
}
}
You can then sort the returned buckets by user.followers on the client and keep the top 10. Note that a regular Elasticsearch search only goes up to 10,000 records by default (the index.max_result_window setting).
