Pagination with aggregation in Elasticsearch

Elasticsearch version 8.5
I have a cron job inside a Java class that transfers data from one Elasticsearch index to another. To fetch data from the first index I use an aggregation query. Over time I expect a single request to return a large amount of data. Can I use some kind of pagination together with aggregations so that my backend can handle this amount of data? The first index can be updated at any time, so options like search_after are not suitable because of consistency.
Example request to get the number of employees in each department:
{
  "size": 0,
  "aggs": {
    "group_by_company_id": {
      "terms": { "field": "company_id" },
      "aggs": {
        "group_by_department_id": {
          "terms": { "field": "department_id" },
          "aggs": {
            "group_by_department_name": {
              "terms": { "field": "department_name" }
            }
          }
        }
      }
    }
  }
}
I tried to find information in the official documentation but didn't find anything on how to combine aggregations and pagination.
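One option discussed in the related answers below is a composite aggregation, which pages through buckets using an after key. The minimal Java sketch below assumes the three nested terms aggregations can be flattened into one composite aggregation with one terms source per field. Under that assumption, each bucket's doc_count is the employee count for one (company_id, department_id, department_name) combination.

A few things in the sketch are placeholders or simplifications:
- The aggregation name by_department and the page size are hypothetical.
- The JSON is built by hand only to keep the example dependency-free.
- Each page is a separate search request, so documents updated between requests can still shift the results.

```java
import java.util.List;

// Sketch: builds a composite aggregation request body that pages over
// (company_id, department_id, department_name). "by_department" is a hypothetical name.
class CompositeDepartmentRequest {

    static final List<String> FIELDS = List.of("company_id", "department_id", "department_name");

    // afterKeyJson: the raw "after_key" object from the previous response, or null for page one.
    static String build(int pageSize, String afterKeyJson) {
        StringBuilder sb = new StringBuilder();
        // "size": 0 because only the aggregation buckets are needed, not hits.
        sb.append("{\"size\":0,\"aggs\":{\"by_department\":{\"composite\":{");
        sb.append("\"size\":").append(pageSize).append(",\"sources\":[");
        for (int i = 0; i < FIELDS.size(); i++) {
            if (i > 0) sb.append(',');
            String f = FIELDS.get(i);
            // One terms value source per field, named after the field itself.
            sb.append("{\"").append(f).append("\":{\"terms\":{\"field\":\"").append(f).append("\"}}}");
        }
        sb.append(']');
        if (afterKeyJson != null) {
            // The previous response's after_key is passed back verbatim as "after".
            sb.append(",\"after\":").append(afterKeyJson);
        }
        sb.append("}}}}"); // close composite, by_department, aggs, root
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(1000, null));
        // Hypothetical after_key copied from a previous response:
        System.out.println(build(1000, "{\"company_id\":1,\"department_id\":7,\"department_name\":\"Sales\"}"));
    }
}
```

The cron job would repeat the request, each time sending the after_key it just received, until a page comes back with no buckets.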

Related

Paginate an aggregation sorted by hits on Elastic index

I have an Elastic index (say, file) where I append a document every time a file is downloaded by a client. Each document is quite basic: it contains a filename field and a date field, when, that records the time of the download.
What I want to achieve is to get, for each file, the number of times it has been downloaded in the last 3 months. Thanks to another question, I have a query that returns all the results:
{
"query": {
"range": {
"when": {
"gte": "now-3M"
}
}
},
"aggs": {
"downloads": {
"terms": {
"field": "filename.keyword",
"size": 1000
}
}
},
"size": 0
}
Now I want paginated results. The terms aggregation cannot be paginated, so I use a composite aggregation. Of course, if there is a better aggregation, it can be used here...
So for the moment, I have something like this:
{
"query": {
"range": {
"when": {
"gte": "now-3M"
}
}
},
"aggs": {
"downloads_agg": {
"composite": {
"size": 100,
"sources": [
{
"downloads": {
"terms": {
"field": "filename.keyword"
}
}
}
]
}
}
},
"size": 0
}
This aggregation allows me to paginate (thanks to the after_key value in the response), but it is not sorted by the number of downloads; it is sorted by filename.
How can I sort that composite aggregation by the number of documents for each filename in my index?
Thanks.
Composite aggregations don't allow sorting by doc count or by a sub-aggregation value; buckets are always ordered by the values of their sources.
Excerpt from a discussion on the Elastic forum:
it's designed as a memory-friendly way to paginate over aggregations.
Part of the tradeoff is that you lose things like ordering by doc
count, since that isn't known until after all the docs have been
collected.
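To see why the pages come back in filename order, here is a small in-memory model of how a single-source composite aggregation pages. It is plain Java, not Elasticsearch code. Buckets are ordered by key, and the after value resumes strictly after the last key of the previous page. The filenames and counts are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simulation only: models how a single-source composite aggregation pages.
class CompositePagingModel {

    // Returns up to pageSize (key, doc_count) buckets whose key sorts strictly after afterKey.
    static List<Map.Entry<String, Long>> page(NavigableMap<String, Long> counts, String afterKey, int pageSize) {
        NavigableMap<String, Long> rest = afterKey == null ? counts : counts.tailMap(afterKey, false);
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : rest.entrySet()) {
            if (out.size() == pageSize) break;
            out.add(Map.entry(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up download counts per filename.
        NavigableMap<String, Long> counts = new TreeMap<>(Map.of(
                "report.pdf", 3L, "a.zip", 42L, "notes.txt", 7L, "b.iso", 1L));
        List<Map.Entry<String, Long>> first = page(counts, null, 2);
        System.out.println(first); // ordered by filename, not by count
        String afterKey = first.get(first.size() - 1).getKey(); // plays the role of after_key
        System.out.println(page(counts, afterKey, 2));
    }
}
```

Because a page can be cut as soon as enough keys are seen, the model never needs the counts of later keys. This matches the tradeoff described in the excerpt.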
I have no experience with Transforms (an X-Pack feature), but you could try them out. Apart from that, I don't see a way to get the expected output.
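If the number of distinct filenames is small enough to hold in memory, one client-side workaround is to collect every composite page first, then sort and page the buckets yourself. This is not an Elasticsearch feature. The sketch below assumes the buckets have already been collected into a list; the Bucket record and the topPage method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Client-side workaround: sort already-collected composite buckets by doc_count.
class SortByDocCount {

    // Hypothetical holder for one composite bucket (filename key and its doc_count).
    record Bucket(String key, long docCount) {}

    // Sorts by doc_count descending (ties broken by key ascending) and returns one page.
    static List<Bucket> topPage(List<Bucket> all, int from, int size) {
        List<Bucket> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingLong(Bucket::docCount).reversed()
                .thenComparing(Bucket::key));
        int end = Math.min(sorted.size(), from + size);
        return from >= end ? List.of() : List.copyOf(sorted.subList(from, end));
    }

    public static void main(String[] args) {
        List<Bucket> all = List.of(new Bucket("a.zip", 42), new Bucket("b.iso", 1),
                new Bucket("notes.txt", 7), new Bucket("report.pdf", 3));
        System.out.println(topPage(all, 0, 2)); // the two most downloaded files
        System.out.println(topPage(all, 2, 2));
    }
}
```

The cost of this workaround is that every bucket must be fetched before the first sorted page can be shown.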

How to get newest data from Elasticsearch based on a date field

I implemented a scheduled script that injects data into Elasticsearch. The script doesn't check whether the data already exists, so it inserts duplicates. What I want is to get all events that have the latest value of the timestamp field (the insertion date/time).
Note: I don't have an id or any unique field that I could group by (with size set to 1) to get the latest document.
So what other options are there?
You could run a terms aggregation on the timestamp field, ordered by key descending with size 1 so that only the latest timestamp is kept. Then fetch its (potentially duplicate) documents with a top_hits sub-aggregation:
GET index/_search
{
"size": 0,
"aggs": {
"latest": {
"terms": {
"field": "timestamp",
"order": {
"_key": "desc"
},
"size": 1
},
"aggs": {
"latest_docs": {
"top_hits": {
"size": 100
}
}
}
}
}
}
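The following plain-Java sketch only models what that request returns, assuming all events are available in memory. The single terms bucket is the largest timestamp, and top_hits returns up to 100 documents carrying exactly that timestamp. The Event record is hypothetical, and the real top_hits may return the documents in a different order.

```java
import java.util.List;
import java.util.stream.Collectors;

// Models the result of: terms on "timestamp" (order _key desc, size 1) + top_hits (size 100).
class LatestTimestampDocs {

    // Hypothetical event: epoch-millis timestamp plus some payload.
    record Event(long timestamp, String payload) {}

    static List<Event> latest(List<Event> events, int maxHits) {
        // The single bucket kept by "order": {"_key": "desc"}, "size": 1.
        long latestTs = events.stream().mapToLong(Event::timestamp).max().orElse(Long.MIN_VALUE);
        // top_hits: up to maxHits documents inside that bucket, duplicates included.
        return events.stream()
                .filter(e -> e.timestamp() == latestTs)
                .limit(maxHits)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(1_700_000_000_000L, "run 1"),
                new Event(1_700_000_360_000L, "run 2"),
                new Event(1_700_000_360_000L, "run 2"));
        System.out.println(latest(events, 100)); // both copies from the latest run
    }
}
```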

How to filter response in multi search in elasticsearch?

I am using the Python client for Elasticsearch 6.5 to run a multi search, since I have to fetch data from multiple indexes with different queries and aggregations.
GET _msearch/
{
"index": QUESTION_INDEX
}
{
"aggs": {
"order_info":{
"terms": {
"field": "order_ids",
"size": 9999
},
"aggs": {
"total_value": {
"sum": {
"field": "selling_price"
}
}
}
},
"median_price": {
"percentiles_bucket": {
"buckets_path": "order_info>total_value",
"percents": [50]
}
}
}
}
My response now contains the order_info buckets, but I only need the percentile value. Is there any way to filter these buckets out of the Elasticsearch response?
Edit 1: I want to reduce the size of the response sent back from Elasticsearch over the network.
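One way to shrink the response is Elasticsearch's filter_path query parameter, which trims the returned JSON. Something like filter_path=responses.aggregations.median_price should keep only the pipeline aggregation. The order_info buckets are still computed on the server; they are just not sent back. Adding "size": 0 to the search body also stops the default 10 hits from being returned.

The sketch below builds the newline-delimited msearch body and the request path in Java. It uses the {"field": ...} form of the sum aggregation. The index name questions is a hypothetical stand-in for QUESTION_INDEX.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds an _msearch request whose response is trimmed with filter_path.
class MsearchWithFilterPath {

    // msearch bodies are newline-delimited JSON (header line, body line, ...),
    // each on a single line and each terminated by '\n', including the last one.
    static String ndjson(String... lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // Request path with a URL-encoded filter_path value.
    static String path(String filterPath) {
        return "/_msearch?filter_path=" + URLEncoder.encode(filterPath, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String header = "{\"index\":\"questions\"}"; // hypothetical stand-in for QUESTION_INDEX
        String body = "{\"size\":0,\"aggs\":{\"order_info\":{\"terms\":{\"field\":\"order_ids\",\"size\":9999},"
                + "\"aggs\":{\"total_value\":{\"sum\":{\"field\":\"selling_price\"}}}},"
                + "\"median_price\":{\"percentiles_bucket\":{\"buckets_path\":\"order_info>total_value\",\"percents\":[50]}}}}";
        System.out.println("GET " + path("responses.aggregations.median_price"));
        System.out.print(ndjson(header, body));
    }
}
```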

Pagination with specific search type on ElasticSearch

We are currently using Elasticsearch 6.7 and have a huge amount of data, which makes some requests take too much time.
To avoid this problem, we want to add pagination to our Elasticsearch queries. The problem is that I can't apply any of the pagination methods Elasticsearch offers to the different requests that already exist.
For example, this request contains different aggregations and a query:
https://github.com/trackit/trackit/blob/master/usageReports/lambda/es_request_constructor.go#L61-L75
In addition, the results are sorted after the information is collected.
I tried to set up the Search After method as well as a form of pagination using from & size.
Scroll doesn't work with aggregations, and composite aggregation doesn't accept a query.
So, is there a good way to paginate in Elasticsearch combined with other request types, and how would I do it with the example above?
composite aggregation doesn't accept query
It does accept a query. In the example below, the results are filtered on play_name. The aggregation is applied only to the documents matching the query, and it can be paginated using the after option.
{
"query": {
"term": {
"play_name": "A Winters Tale"
}
},
"size": 0,
"aggs": {
"speaker": {
"composite": {
"after": {
"product": "FLORIZEL"
},
"sources": [
{
"product": {
"terms": {
"field": "speaker"
}
}
}
]
},
"aggs": {
"speech_number": {
"terms": {
"field": "speech_number"
},
"aggs": {
"line_id": {
"terms": {
"field": "line_id"
}
}
}
}
}
}
}
}
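The after value in that request is normally copied from the after_key of the previous response. Below is a minimal Java sketch of that loop. It assumes a hypothetical fetchPage function that sends the query plus the composite aggregation and returns one page's bucket keys and after_key. The loop stops when a page comes back empty. For simplicity, the after key is reduced to a single string, whereas the real after_key is an object such as {"product": "FLORIZEL"}. The speaker list is made-up sample data served by a fake fetcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Drives composite-aggregation paging through a caller-supplied page fetcher.
class CompositePager {

    // One page of a composite response: its bucket keys and its after_key (null if absent).
    record Page(List<String> bucketKeys, String afterKey) {}

    // Feeds each after_key into the next request until a page has no buckets.
    static List<String> fetchAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String after = null; // the first request carries no "after"
        while (true) {
            Page page = fetchPage.apply(after);
            if (page.bucketKeys().isEmpty()) break;
            all.addAll(page.bucketKeys());
            if (page.afterKey() == null) break;
            after = page.afterKey();
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake server: speakers already filtered by the query, served two per page in key order.
        List<String> speakers = List.of("AUTOLYCUS", "CAMILLO", "FLORIZEL", "PERDITA", "POLIXENES");
        Function<String, Page> fake = after -> {
            int start = after == null ? 0 : speakers.indexOf(after) + 1;
            List<String> slice = speakers.subList(Math.min(start, speakers.size()),
                    Math.min(start + 2, speakers.size()));
            return new Page(slice, slice.isEmpty() ? null : slice.get(slice.size() - 1));
        };
        System.out.println(fetchAll(fake));
    }
}
```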

How to use Scroll on Elasticsearch aggregation?

I am using Elasticsearch 5.3. I am aggregating some data, but there are far too many results to return in a single query. I tried setting size = Integer.MAX_VALUE, but even that proved insufficient. The search API has a way to scroll through search results. Is there a similar feature for the org.elasticsearch.search.aggregations.AggregationBuilders.terms aggregator, and how do I use it? Can the search scroll API be used with aggregations?
In ES 5.3, you can partition the terms buckets and retrieve one partition per request.
For instance, the query below splits your buckets into 10 partitions and returns only the first one. Each request returns roughly a tenth of the data you would get by retrieving all buckets at once.
{
"size": 0,
"aggs": {
"my_terms": {
"terms": {
"field": "my_field",
"include": {
"partition": 0,
"num_partitions": 10
},
"size": 10000
}
}
}
}
You can then make the second request by setting partition to 1, and so on:
{
"size": 0,
"aggs": {
"my_terms": {
"terms": {
"field": "my_field",
"include": {
"partition": 1, <--- increase this up until partition 9
"num_partitions": 10
},
"size": 10000
}
}
}
}
In Java, you can build the same aggregation like this:
TermsAggregationBuilder agg = AggregationBuilders.terms("my_terms").field("my_field").size(10000);
agg.includeExclude(new IncludeExclude(0, 10)); // partition 0 of 10; use 1..9 for the following requests
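Looping over partition 0 through num_partitions - 1 covers every term exactly once. The in-memory Java sketch below illustrates why. It assumes each term is assigned to a partition by hashing it modulo num_partitions. Elasticsearch uses its own hash, so String.hashCode() here is only a stand-in, and the term list is invented. In practice, each request's size should be large enough to hold all the terms in one partition, or some buckets may be missing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration of hash partitioning; not Elasticsearch's actual hash function.
class TermPartitions {

    // Stand-in partition assignment: hash of the term modulo numPartitions (never negative).
    static int partitionOf(String term, int numPartitions) {
        return Math.floorMod(term.hashCode(), numPartitions);
    }

    // The terms that a request with "partition": p would consider.
    static List<String> termsInPartition(List<String> terms, int p, int numPartitions) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (partitionOf(t, numPartitions) == p) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("alpha", "beta", "gamma", "delta", "epsilon", "zeta");
        int numPartitions = 3;
        Set<String> seen = new HashSet<>();
        // One request per partition, 0 .. numPartitions - 1.
        for (int p = 0; p < numPartitions; p++) {
            List<String> part = termsInPartition(terms, p, numPartitions);
            seen.addAll(part);
            System.out.println("partition " + p + ": " + part);
        }
        System.out.println("all terms covered: " + seen.containsAll(terms));
    }
}
```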
