Irregularities in Elasticsearch Aggregations - elasticsearch

I am using Elasticsearch to create some aggregation reports. Here is my aggregation query:
{
"size":10,
"_source":["country_iso", "username"],
"aggs":{
"Granulated Reports":{
"date_histogram" :{
"field":"aggr_time",
"interval" : "month"
},
"aggs":{
"calls":{
"sum":{"field":"bill_duration"}
}
}
}
}
}
I get a doc_count of 27000, but if I remove the calls sub-aggregation and change the query to
{
"size":10,
"_source":["country_iso", "username"],
"aggs":{
"Granulated Reports":{
"date_histogram" :{
"field":"aggr_time",
"interval" : "month"
},
"aggs":{
}
}
}
}
I get a doc_count of 44000. My understanding is that doc_count should differ only if I change the query or a filter. Adding or removing aggregations should not affect the number of documents being scanned. The doc_count goes down if I add another aggregation. I do not understand this behavior, since it gives different answers depending on the number of aggregations.

Related

How to also display the values within the bucket that considered during aggregation?

I need to aggregate records based on created_date, so each created date has its own group of records. Could someone tell me how to display the created date along with each set of results?
"aggs": {
"by_created_date": {
"terms": {
"field": "createddate"
},
_source["createddate"] //Something like this. so that i can see what date it has used.
"aggs": {
....
}, //Also may need to use some aggregation on this level.
},
}
"aggs":{
"by_created_date":{
"terms":{
"field":"createddate.keyword",
"size":1000
},
"aggs":{
"bucket" : {
"terms" : {
"field" : "field_name",
"size": 10
}
}
}
}
}
The terms aggregation groups documents by a field, and each bucket's key in the response is the createddate value that bucket was built from.
For nested grouping, write a nested aggregation as in the code above.
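To make the point about bucket keys concrete, here is a minimal Python sketch that reads the created date out of each bucket of a terms aggregation response. The helper name created_dates and the sample response are hypothetical and only illustrate the shape of the "aggregations" section; real values will differ.

```python
def created_dates(response, agg_name="by_created_date"):
    """Return (created date, doc_count) pairs from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    # Each bucket carries the grouped value in "key" and the group size in "doc_count".
    return [(b["key"], b["doc_count"]) for b in buckets]


# Hypothetical response fragment, trimmed to the relevant part (made-up values).
sample = {
    "aggregations": {
        "by_created_date": {
            "buckets": [
                {"key": "2018-01-01", "doc_count": 3},
                {"key": "2018-01-02", "doc_count": 1},
            ]
        }
    }
}

print(created_dates(sample))
```

Results of a sub-aggregation sit inside each bucket under the sub-aggregation's own name, next to key and doc_count.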

Incorrect aggregation when using sorting

I use this query to get search hits and the count of hits across multiple indices:
/index1,index2/_search
{
"query":{
"query_string":{
"query":"*"
}
},
"aggs":{
"group_by_index":{
"terms":{
"field":"_index",
"min_doc_count":0
}
}
},
"post_filter":{
"terms":{
"_index":["index1"]
}
},
"sort":{
"my_field":"asc"
}
}
The problem is that if I sort on a field (my_field) that only exists in index1, the aggregation only gives me the hit count for index1, not for index2.
I thought the aggregation would work regardless of the sorting I specify?
Using Elasticsearch 6.4
Solved it by using unmapped_type in the sort clause.
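For anyone landing here with the same problem, this is a rough sketch of what that fix could look like, with the request body written as a Python dict. The value "long" for unmapped_type is an assumption, because the question does not say what type my_field has, and build_search_body is a hypothetical helper name.

```python
import json


def build_search_body(sort_field, unmapped_type="long"):
    """Build the search body from the question with unmapped_type on the sort.

    unmapped_type tells Elasticsearch how to treat the sort field in indices
    where it has no mapping, so documents there are handled as having no value
    instead of the sort failing on those shards.
    "long" is an assumed type; use the real type of the field.
    """
    return {
        "query": {"query_string": {"query": "*"}},
        "aggs": {
            "group_by_index": {
                "terms": {"field": "_index", "min_doc_count": 0}
            }
        },
        "post_filter": {"terms": {"_index": ["index1"]}},
        "sort": [
            {sort_field: {"order": "asc", "unmapped_type": unmapped_type}}
        ],
    }


body = build_search_body("my_field")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the /index1,index2/_search request.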

Elastic Search max on string fields

In SQL, it is possible to use MAX() on string fields to get a distinct value (assuming the GROUP BY is correct).
This is not possible in Elasticsearch, since the max aggregation only works on numeric fields. I still want to retrieve the values of some string fields after my aggregations so that I can display them.
For example, assuming a generic books structure:
{
"aggs" : {
"group_by_author" : { "terms" : { "field" : "author"},
"aggs" : {
"books_published" : { "sum" : { "field" : "name"}},
"distinct_title" : { "max" : {"field" : "some_relevant_field_name"}}
}
}
}
}
Here I cannot perform max on some_relevant_field_name since it is a string. Is there an alternative way to do this, apart from adding more aggregations?
If you want to find the distinct book titles for each author, you could try using a terms aggregation for "distinct_title":
{
"aggs":{
"group_by_author":{
"terms":{
"field":"author"
},
"aggs":{
"books_published":{
"sum":{
"field":"name"
}
},
"distinct_title":{
"terms":{
"field":"some_relevant_field_name"
}
}
}
}
}
}
It should create buckets of book titles for each author as described in the documentation.

ElasticSearch - Get extra field in aggregation

I am trying to get an extra field along with an aggregation. Below is the query:
GET /iacmpi/_search?_source=false
{
"query": {
"match": {
"Document_Type": "INVOICEDoc"
}
},
"aggs": {
"GroupByCDMInvoiceID": {
"terms":{ "field" : "INVOICE_ID" },
"aggs":{
"LatestVersion":{
"max":{
"field":"DocVersion"
}
}
}
}
}
}
At the level of the INVOICE_ID aggregation, I need to fetch one more field, 'NAME'. I don't want it in the query part, because that would return all hits and I would have to traverse them to find a match.
Is this possible?
Thanks,
Sameer
I think the top_hits aggregation is what you're looking for.
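A minimal sketch of how top_hits could be added under the INVOICE_ID buckets, with the request body written as a Python dict. The aggregation name LatestDoc, the helper name, the top-level "size": 0, and the choice to sort by DocVersion descending (so NAME comes from the latest version) are assumptions, not something the question states.

```python
import json


def invoice_query_with_extra_field(extra_field="NAME"):
    """Return the question's query plus a top_hits sub-aggregation.

    For each INVOICE_ID bucket, top_hits returns one document (size 1),
    picked by highest DocVersion, and only the requested source field.
    """
    return {
        "size": 0,  # assumed: suppress top-level hits, only aggregations are needed
        "query": {"match": {"Document_Type": "INVOICEDoc"}},
        "aggs": {
            "GroupByCDMInvoiceID": {
                "terms": {"field": "INVOICE_ID"},
                "aggs": {
                    "LatestVersion": {"max": {"field": "DocVersion"}},
                    "LatestDoc": {  # hypothetical aggregation name
                        "top_hits": {
                            "size": 1,
                            "sort": [{"DocVersion": {"order": "desc"}}],
                            "_source": {"includes": [extra_field]},
                        }
                    },
                },
            }
        },
    }


query = invoice_query_with_extra_field()
print(json.dumps(query, indent=2))
```

In the response, each bucket would then carry the field under LatestDoc.hits.hits[0]._source, so there is no need to walk the top-level hits.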

How to page aggregation results in Elasticsearch?

When I execute the query below, how can I page through the aggs results?
And is there a way to put the aggs results into the hits part of the JSON result?
POST http://myElastic.com/test/e1,e2,e3/_search
{
"aggs":{
"dedup" : {
"terms":{
"field": "id"
},
"aggs":{
"dedup_docs":{
"top_hits":{
"size":1
}
}
}
}
}
}
I searched for a while and came across several different suggestions before finding this, so I am posting a new answer for people who go down the same path.
We can partition the results as below:
{
"aggs":{
"group" : {
"terms":{
"field": "id",
"size":5000,
"include": {
"partition": 1,
"num_partitions": 1000
}
},
"aggs":{
"dedup_docs":{
"top_hits":{
"size":1
}
}
}
}
}
}
// size:5000 : return up to 5,000 buckets per request
// num_partitions:1000 : split the terms into 1,000 partitions (each term is assigned to a partition by hash)
// partition:1 : return the second partition (partition indices start at 0)
// To retrieve all buckets, send one request per partition, from partition 0 to num_partitions - 1.
// Partitions are not ordered pages: each holds a roughly equal, arbitrary share of the terms,
// so size must be at least the number of terms expected in a single partition.
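Since each partition holds a separate slice of the terms, collecting every bucket means sending one request per partition. Here is a minimal Python sketch that generates those request bodies; partition_requests is a hypothetical helper, the top-level "size": 0 is an assumption to skip regular hits, and actually sending the bodies to Elasticsearch is left out.

```python
import json


def partition_requests(field, num_partitions, size):
    """Yield one request body per partition, from 0 to num_partitions - 1."""
    for partition in range(num_partitions):
        yield {
            "size": 0,  # assumed: only the aggregation result is needed
            "aggs": {
                "group": {
                    "terms": {
                        "field": field,
                        "size": size,
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                    },
                    "aggs": {"dedup_docs": {"top_hits": {"size": 1}}},
                }
            },
        }


# Small num_partitions here just to keep the example short.
bodies = list(partition_requests("id", num_partitions=4, size=5000))
print(json.dumps(bodies[1]["aggs"]["group"]["terms"]["include"]))
```

Choosing num_partitions so that each partition is expected to hold fewer terms than size keeps any single request from silently dropping buckets.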
Based on the issue below on the Elasticsearch GitHub site, I don't think what you are asking for is possible:
https://github.com/elastic/elasticsearch/issues/4915
It seems to be a common request, however. Add your own feedback and they may get around to adding it.
