How to page aggregation results in Elasticsearch? - elasticsearch

When I execute the query below, how can I page through the aggs results?
And is there a way to put the aggs results into the hits part of the JSON response?
POST http://myElastic.com/test/e1,e2,e3/_search
{
"aggs":{
"dedup" : {
"terms":{
"field": "id"
},
"aggs":{
"dedup_docs":{
"top_hits":{
"size":1
}
}
}
}
}
}

It took me a while to find this, and I came across several differing answers during my research, so I am posting a new answer for people who make the same journey as me.
We can partition the results as below:
{
"aggs":{
"group" : {
"terms":{
"field": "id",
"size":5000,
"include": {
"partition": 1,
"num_partitions": 1000
}
},
"aggs":{
"dedup_docs":{
"top_hits":{
"size":1
}
}
}
}
}
}
// num_partitions:1000 : split the distinct id values into 1,000 partitions ("pages")
// partition:1 : return only the terms that fall into partition 1 (partitions are numbered from 0)
// size:5000 : return at most 5,000 terms from the requested partition, so choose it larger than the expected number of terms per partition
// To page through everything, repeat the request with partition:0, 1, 2, ... up to num_partitions - 1
// Terms are assigned to partitions by hash, so partition N is not the ordered range N*size to (N+1)*size - 1
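Paging through the whole aggregation means sending the same request once per partition. The following Python sketch only builds those request bodies; it does not send them. The helper names `build_partition_query` and `iter_partition_queries` are hypothetical, and the top-level `"size": 0` is an assumption added to skip the regular hits.

```python
import json


def build_partition_query(partition, num_partitions=1000, size=5000):
    """Build the request body for one partition (hypothetical helper)."""
    if not 0 <= partition < num_partitions:
        raise ValueError("partition must be in [0, num_partitions)")
    return {
        "size": 0,  # assumption: only the aggregation output is needed
        "aggs": {
            "group": {
                "terms": {
                    "field": "id",
                    "size": size,  # upper bound on terms returned per partition
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                },
                "aggs": {"dedup_docs": {"top_hits": {"size": 1}}},
            }
        },
    }


def iter_partition_queries(num_partitions=1000, size=5000):
    """Yield one request body per partition, starting at partition 0."""
    for partition in range(num_partitions):
        yield build_partition_query(partition, num_partitions, size)


if __name__ == "__main__":
    # Print the body for partition 1; POST it to _search with any client.
    print(json.dumps(build_partition_query(1), indent=2))
```

Each body yielded by `iter_partition_queries` would be sent to the same `_search` endpoint as in the question, and the buckets from all responses together make up the full result.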

Based on the issue below on the Elasticsearch GitHub site, I don't think what you are asking for is possible:
https://github.com/elastic/elasticsearch/issues/4915
It seems like a common request, however. Add your own feedback and they may get around to adding it.

Related

Looking to get average speed for each ID in terms list and within a date range in elastic

I'm trying to get, in Kibana, the average speed for each of four road IDs within a specific date range. I have this aggregation code that doesn't raise any errors, but it returns a Gateway Time-out every time I try it. The two fields exist, but I can't seem to build the aggregation well enough to get the averages I want. Can someone see what I'm doing incorrectly? Here is my query:
GET herev322_*/_search
{
"aggs": {
"ns2ids":{
"filter": {
"bool":{
"must":[
{
"terms":{
"nS2ID": [
"102-4893",
"102-4894",
"102+10103",
"102+10104"
]
}
},
{
"range":{
"pBT":{
"from": "2021-01-01T07:00:00",
"to": "2021-02-01T19:00:00"
}
}
}
]
}
},
"aggs":{
"avg_speed":{
"avg":{
"field": "speed"
}
}
}
}
}
}
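One possible restructuring, offered as a sketch and not as a verified fix: move the ID and date conditions into the `query`, then add a `terms` aggregation on `nS2ID` so that `avg` is computed once per ID. The field names come from the question. The helper name, the aggregation name `by_id`, the top-level `"size": 0`, and the `gte`/`lte` bounds are assumptions, as is the premise that `nS2ID` is a keyword-type field. Whether this avoids the gateway timeout depends on the cluster and the size of the indices.

```python
import json


def build_avg_speed_query(road_ids, start, end):
    """Build a per-ID average-speed request body (hypothetical helper)."""
    ids = list(road_ids)
    return {
        "size": 0,  # assumption: only the aggregation output is needed
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"nS2ID": ids}},
                    {"range": {"pBT": {"gte": start, "lte": end}}},
                ]
            }
        },
        "aggs": {
            "by_id": {
                # One bucket per road ID that matched the query.
                "terms": {"field": "nS2ID", "size": len(ids)},
                "aggs": {"avg_speed": {"avg": {"field": "speed"}}},
            }
        },
    }


if __name__ == "__main__":
    body = build_avg_speed_query(
        ["102-4893", "102-4894", "102+10103", "102+10104"],
        "2021-01-01T07:00:00",
        "2021-02-01T19:00:00",
    )
    print(json.dumps(body, indent=2))
```

In the response, each entry of `aggregations.by_id.buckets` would carry the road ID in `key` and the average in `avg_speed.value`.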

How to also display the values within the bucket that were considered during aggregation?

I need to aggregate records based on the created date, so for each created date there is a group of records. Could someone tell me how to display the created date along with each set of results?
"aggs": {
"by_created_date": {
"terms": {
"field": "createddate"
},
_source["createddate"] //Something like this. so that i can see what date it has used.
"aggs": {
....
}, //Also may need to use some aggregation on this level.
},
}
"aggs":{
"by_created_date":{
"terms":{
"field":"createddate.keyword",
"size":1000
},
"aggs":{
"bucket" : {
"terms" : {
"field" : "field_name",
"size": 10
}
}
}
}
}
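The terms aggregation groups documents by the value of a field, and each bucket it returns carries that value as its key.
So, for nested grouping, you have to nest a second aggregation inside the first, as in the code above.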
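To show where the created date appears in the result, here is a minimal Python sketch that walks a response to the nested query above. It assumes the usual terms-aggregation response layout (`buckets`, each with `key` and `doc_count`). The function name `summarize_buckets` is hypothetical, and the aggregation names `by_created_date` and `bucket` are taken from the query above.

```python
def summarize_buckets(response):
    """Return (created_date, doc_count, inner_groups) per outer bucket."""
    rows = []
    for outer in response["aggregations"]["by_created_date"]["buckets"]:
        # The grouped value is in "key"; date fields may also provide a
        # formatted "key_as_string", which is preferred when present.
        created = outer.get("key_as_string", outer["key"])
        inner = [(b["key"], b["doc_count"]) for b in outer["bucket"]["buckets"]]
        rows.append((created, outer["doc_count"], inner))
    return rows
```

Each returned tuple starts with the created date that the bucket was built from, followed by the number of documents in it and the nested groups.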

Irregularities in Elasticsearch Aggregations

I am using elasticsearch for creating some aggregation reports. Here is my aggregation query
{
"size":10,
"_source":["country_iso", "username"],
"aggs":{
"Granulated Reports":{
"date_histogram" :{
"field":"aggr_time",
"interval" : "month"
},
"aggs":{
"calls":{
"sum":{"field":"bill_duration"}
}
}
}
}
}
I get a doc_count of 27000, but if I remove the calls aggregation and change the query to
{
"size":10,
"_source":["country_iso", "username"],
"aggs":{
"Granulated Reports":{
"date_histogram" :{
"field":"aggr_time",
"interval" : "month"
},
"aggs":{
}
}
}
}
I get a doc_count of 44000. My understanding is that the doc_count should differ only if I change a query or a filter; adding or removing aggregations should not affect the number of documents being scanned. Yet the doc_count goes down when I add another aggregation. I cannot understand this behavior, as it gives different answers depending on the number of aggregations.

ElasticSearch - Get extra field in aggregation

I am trying to get an extra field along with an aggregation. Below is the query:
GET /iacmpi/_search?_source=false
{
"query": {
"match": {
"Document_Type": "INVOICEDoc"
}
},
"aggs": {
"GroupByCDMInvoiceID": {
"terms":{ "field" : "INVOICE_ID" },
"aggs":{
"LatestVersion":{
"max":{
"field":"DocVersion"
}
}
}
}
}
}
So at the level of the INVOICE_ID aggregation, I need to fetch one more field, 'NAME'. I don't want it in the query part, as that would return all hits and I would have to traverse them to find a match.
Is it possible?
Thanks,
Sameer
I think top hits is what you're looking for.
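As a sketch of what that could look like, the Python snippet below builds the question's query with a `top_hits` sub-aggregation that returns only `NAME` from the document with the highest `DocVersion` in each bucket. The names `LatestDoc`, `build_latest_invoice_query`, and `names_by_invoice` are hypothetical, and the top-level `"size": 0` is an assumption to suppress the regular hits.

```python
def build_latest_invoice_query():
    """Build the request body with a top_hits sub-aggregation (sketch)."""
    return {
        "size": 0,  # assumption: no top-level hits are wanted
        "query": {"match": {"Document_Type": "INVOICEDoc"}},
        "aggs": {
            "GroupByCDMInvoiceID": {
                "terms": {"field": "INVOICE_ID"},
                "aggs": {
                    "LatestVersion": {"max": {"field": "DocVersion"}},
                    # Hypothetical name: one document per invoice bucket,
                    # the one with the highest DocVersion, NAME field only.
                    "LatestDoc": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"DocVersion": {"order": "desc"}}],
                            "_source": {"includes": ["NAME"]},
                        }
                    },
                },
            }
        },
    }


def names_by_invoice(response):
    """Map each INVOICE_ID bucket key to the NAME of its latest document."""
    result = {}
    for bucket in response["aggregations"]["GroupByCDMInvoiceID"]["buckets"]:
        hits = bucket["LatestDoc"]["hits"]["hits"]
        if hits:
            result[bucket["key"]] = hits[0]["_source"].get("NAME")
    return result
```

This keeps the extra field inside each aggregation bucket, so there is no need to traverse the top-level hits.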

Bounce Rate Query Elasticsearch

I am planning to implement an Elasticsearch query for calculating the bounce rate.
Does anybody know how to use the output of aggregation results as input to a script?
{
"aggs":{
"monthly":{
"date_histogram":{
"field":"timestamp",
"interval":"month",
"script":""
},
"aggs":{
"visits_greater_than_one":{
"terms":{
"field":"sessionId",
"min_doc_count":2
}
}
},
"aggs":{
"visitor_count":{
"cardinality":{
"field":"sessionId"
}
}
}
}
}
}
Thanks,
Ankireddy Polu
I have found a small workaround for this problem:
{
"aggs":{
"monthly":{
"date_histogram":{
"field":"timestamp",
"interval":"month"
},
"aggs":{
"visits_greater_than_one":{
"terms":{
"field":"sessionId",
"min_doc_count":2
}
},
"visitor_count":{
"cardinality":{
"field":"sessionId"
}
}
}
}
}
}
The drawback of this approach is that the calculation has to be performed separately, wherever the result is parsed. Each monthly bucket holds two results: visits_greater_than_one, with one bucket per session that has more than one entry, and visitor_count, the total number of sessions during that interval. Taking visits_greater_than_one as the number of such sessions, (visitor_count - visits_greater_than_one)/visitor_count is the bounce rate.
(visitor_count - visits_greater_than_one) gives the number of sessions in which the user visited only a single page.
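The client-side calculation could look like the following Python sketch. The function name `bounce_rates` is hypothetical, and the response layout is assumed to be the standard one for date_histogram, terms, and cardinality aggregations. Two caveats apply: a terms aggregation returns only its top `size` buckets (10 by default), so the query would need a `size` large enough to cover every multi-visit session in a month, and cardinality is an approximate count.

```python
def bounce_rates(response):
    """Compute the bounce rate per monthly bucket from the workaround query."""
    rates = {}
    for month in response["aggregations"]["monthly"]["buckets"]:
        total_sessions = month["visitor_count"]["value"]
        # One terms bucket per session with at least two entries.
        multi_page_sessions = len(month["visits_greater_than_one"]["buckets"])
        if total_sessions == 0:
            continue  # no sessions in this month, rate is undefined
        label = month.get("key_as_string", month["key"])
        rates[label] = (total_sessions - multi_page_sessions) / total_sessions
    return rates
```

The result maps each month to the fraction of sessions that had only a single entry.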
