Wrong sum with aggregations

Wrong sum with aggregations - elasticsearch

I'm trying to run this query:
GET my_index/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"query_string": {
"query": "_exists_:products_count",
"default_operator": "AND"
}
}
]
}
},
"aggs": {
"pid": {
"terms": {
"field": "pid",
"size": 15,
"order": {
"products_counter": "ASC"
}
},
"aggs": {
"products_counter": {
"sum": {
"field": "products_count"
}
}
}
}
}
}
The results I get are:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 12,
"successful": 12,
"failed": 0
},
"hits": {
"total": 489681,
"max_score": 0,
"hits": []
},
"aggregations": {
"pid": {
"doc_count_error_upper_bound": -1,
"sum_other_doc_count": 488443,
"buckets": [
{
"key": 3229479298,
"doc_count": 14,
"products_counter": {
"value": 26
}
},
{...
However, the actual sum of products_count for the returned pid is 188, not 26.
If I raise the size of the aggregation from 15 to, say, 100000, I do get the right number.
Can anyone help me understand and fix this problem?
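The symptom fits how a terms aggregation is computed on a sharded index (the response shows 12 shards). Each shard picks its own top candidate terms, here the pids with the lowest local sum because of "order": {"products_counter": "ASC"}, and the coordinating node only adds up what the shards send back. How many candidates each shard sends is governed by the terms aggregation's shard_size parameter, whose default grows with size, which would explain why size 100000 gives the right number. The doc_count_error_upper_bound of -1 is Elasticsearch signalling that it cannot bound the error when ordering by a sub-aggregation. The Python below is a toy simulation of that merge, not Elasticsearch code; the shard data is hypothetical and chosen to mirror the 26 vs 188 from the question.

```python
from collections import defaultdict

# Hypothetical shard contents: (pid, products_count) pairs.
# Pid "a" has a small local sum on shard 0 and a large one on shard 1.
SHARDS = [
    [("a", 26), ("b", 50), ("c", 60)],
    [("a", 162), ("b", 10), ("c", 20)],
]


def shard_candidates(docs, shard_size):
    """Sum products_count per pid on one shard, then keep only the
    shard_size pids with the lowest local sum (order ASC)."""
    local = defaultdict(int)
    for pid, count in docs:
        local[pid] += count
    ranked = sorted(local.items(), key=lambda kv: (kv[1], kv[0]))
    return dict(ranked[:shard_size])


def merge(shards, shard_size):
    """Coordinating node: add up whatever partial sums each shard reported."""
    totals = defaultdict(int)
    for docs in shards:
        for pid, partial in shard_candidates(docs, shard_size).items():
            totals[pid] += partial
    return dict(totals)


def true_sums(shards):
    """Exact per-pid sums over every document on every shard."""
    totals = defaultdict(int)
    for docs in shards:
        for pid, count in docs:
            totals[pid] += count
    return dict(totals)


print(merge(SHARDS, shard_size=2))  # {'a': 26, 'b': 60, 'c': 20}
print(merge(SHARDS, shard_size=3))  # {'a': 188, 'b': 60, 'c': 80}
print(true_sums(SHARDS))            # {'a': 188, 'b': 60, 'c': 80}
```

With shard_size=2, shard 1 never reports pid "a" because its local sum (162) is not among that shard's two lowest, so only shard 0's partial sum of 26 survives. Raising size or setting shard_size explicitly trades memory and latency for accuracy.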

Related

Elasticsearch - aggregate over filtered data

I have a query that returns a set of 100 documents. I want to run an aggregation over only these, because they are the most relevant. When I add the aggregation, however, it is computed over all matching results, not just the first 100.
Query:
{
"size": 100,
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"from": 0,
"query": {
.......
},
"aggregations": {
"category.category_id": {
"nested": {
"path": "category"
},
"aggregations": {
"category.category_id": {
"terms": {
"field": "category.category_id",
"size": 2,
"order": {
"_count": "desc"
}
}
}
}
}
}
Result:
{
"took": 33,
"timed_out": false,
"_shards": {
"total": 4,
"successful": 4,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1042,
"max_score": 60,
"hits": [...100 hits...]
},
"aggregations": {
"category.category_id": {
"doc_count": 5186,
"category.category_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 196,
"buckets": [
{
"key": 2,
"doc_count": 1042
},
{
"key": 2764,
"doc_count": 272
}
....
]
}
}
}
Expected:
{
"took": 33,
"timed_out": false,
"_shards": {
"total": 4,
"successful": 4,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1042,
"max_score": 60,
"hits": [...100 hits...]
},
"aggregations": {
"category.category_id": {
"doc_count": 100,
"category.category_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": x,
"buckets": [
{
"key": 2,
"doc_count": x (x< 100) (eg 37)
},
{
"key": 2764,
"doc_count": y (y <= 100 -x) (eg 10)
}
....
]
}
}
}
Is it possible to aggregate over filtered data? Or how can I aggregate over only the most relevant data?
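In Elasticsearch, size only controls how many hits are returned in the hits array; aggregations are computed over every document that matches the query. The toy Python below (hypothetical scores and category ids, not Elasticsearch code) contrasts counting categories over all matches with counting over only the top size hits.

```python
from collections import Counter

# Hypothetical matching documents: (score, category_id) pairs.
MATCHES = [(9.5, 2), (8.1, 2764), (7.0, 2), (3.2, 2764), (1.4, 2764), (0.9, 5)]


def category_counts(docs):
    """Count documents per category, like a terms aggregation on category_id."""
    return Counter(category for _, category in docs)


def top_hits(docs, size):
    """The `size` highest-scoring documents, like the hits array."""
    return sorted(docs, key=lambda doc: doc[0], reverse=True)[:size]


all_counts = category_counts(MATCHES)              # what the aggregation reports
top_counts = category_counts(top_hits(MATCHES, 3))  # what the question expects
print(dict(all_counts))
print(dict(top_counts))
```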

You can use a filter aggregation, as described in the Elasticsearch documentation:
{
"aggs": {
"agg_name": {
"filter": {
// Add your query here
},
"aggs": {
"category": {
"nested": {
"path": "category"
},
"aggs": {
"category.category_id": {
"terms": {
"field": "category.category_id",
"size": 2,
"order": {
"_count": "desc"
}
}
}
}
}
}
}
}
}
If you want, you can add further sub-aggregations inside the inner "aggs" object.
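A filter aggregation narrows the documents its sub-aggregations see to those matching the filter, but it is evaluated against every document matching the main query, not just the top-scored hits; it therefore helps only if "most relevant" can be expressed as a filter condition. A toy Python sketch of those semantics, with hypothetical documents and a hypothetical "red" filter (not Elasticsearch code):

```python
from collections import Counter

# Hypothetical documents matching the main query: (title, category_id).
DOCS = [("red shoe", 2), ("blue shoe", 2764), ("red hat", 2), ("red scarf", 5)]


def filter_agg(docs, predicate, sub_agg):
    """Mimic a filter aggregation: keep matching docs, then run the sub-aggregation."""
    bucket = [doc for doc in docs if predicate(doc)]
    return {"doc_count": len(bucket), "sub": sub_agg(bucket)}


def terms_by_category(docs):
    """Stand-in for the terms sub-aggregation on category_id."""
    return Counter(category for _, category in docs)


result = filter_agg(DOCS, lambda doc: "red" in doc[0], terms_by_category)
print(result["doc_count"], dict(result["sub"]))
```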

Appending further aggregations within Terms Aggregation

Sorry if this has been asked already, but I've been lurking around SO and couldn't find anything that suits my needs.
Basically, what I'm trying to achieve in my first attempts with ES is to add further counters within a terms aggregation.
As a first try, I'm sending the following request to ES:
POST http://localhost:9200/people/_search
{
"size": 0,
"aggs": {
"agg_by_name": {
"terms": { "field": "name"}
}
}
}
What I get back is just what the sample in the docs shows:
{
"took": 89,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"agg_by_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 9837,
"buckets": [
{
"key": "James",
"doc_count": 437
},
{
"key": "Eduard",
"doc_count": 367
},
{
"key": "Leonardo",
"doc_count": 235
},
{
"key": "George",
"doc_count": 209
},
{
"key": "Harrison",
"doc_count": 180
}, ...
However, I can't figure out how to include further inner aggregations in each bucket, so that the result looks something like this:
{
"key": "Harrison",
"doc_count": 180,
"lives_in_NY": 40,
"lives_in_CA": 140,
"distinct_surnames": [ ... ]
}
How should I structure my aggregation so that those are included bucket-wise?

You could try something like this:
{
"size": 0,
"aggs": {
"getAllTheNames": {
"terms": {
"field": "name",
"size": 100
},
"aggs": {
"getAllTheSurnames": {
"terms": {
"field": "surname",
"size": 100
}
}
}
}
}
}
For the city of residence, it could be something like:
{
"size": 0,
"aggs": {
"getAllTheNames": {
"terms": {
"field": "name",
"size": 100
},
"aggs": {
"getAllTheCities": {
"terms": {
"field": "city",
"size": 100
}
}
}
}
}
}
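Elasticsearch returns sub-aggregation results nested inside each parent bucket rather than as flat fields, so the exact document shape from the question is something you would build client-side. The Python sketch below assumes both sub-aggregations (getAllTheCities and getAllTheSurnames) are placed side by side in the same inner "aggs" object, which Elasticsearch allows; the response values, the surnames, and the city keys such as "NY" are hypothetical.

```python
# Hypothetical response fragment for a terms agg on "name" with two sibling
# sub-aggregations, "getAllTheCities" and "getAllTheSurnames".
RESPONSE = {
    "aggregations": {
        "getAllTheNames": {
            "buckets": [
                {
                    "key": "Harrison",
                    "doc_count": 180,
                    "getAllTheCities": {"buckets": [
                        {"key": "CA", "doc_count": 140},
                        {"key": "NY", "doc_count": 40},
                    ]},
                    "getAllTheSurnames": {"buckets": [
                        {"key": "Ford", "doc_count": 100},
                        {"key": "Smith", "doc_count": 80},
                    ]},
                }
            ]
        }
    }
}


def flatten(response):
    """Turn each name bucket into one flat row with per-city counts and surnames."""
    rows = []
    for bucket in response["aggregations"]["getAllTheNames"]["buckets"]:
        row = {"key": bucket["key"], "doc_count": bucket["doc_count"]}
        for city in bucket["getAllTheCities"]["buckets"]:
            row["lives_in_" + city["key"]] = city["doc_count"]
        row["distinct_surnames"] = [
            surname["key"] for surname in bucket["getAllTheSurnames"]["buckets"]
        ]
        rows.append(row)
    return rows


print(flatten(RESPONSE))
```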

Elasticsearch nested aggregations returns duplicate results [duplicate]

This question already has an answer here:
how to return the count of unique documents by using elasticsearch aggregation
(1 answer)
With this mapping:
PUT pizzas
{
"mappings": {
"pizza": {
"properties": {
"name": {
"type": "keyword"
},
"types": {
"type": "nested",
"properties": {
"topping": {
"type": "keyword"
},
"base": {
"type": "keyword"
}
}
}
}
}
}
}
And this data:
PUT pizzas/pizza/1
{
"name": "meat",
"types": [
{
"topping": "bacon",
"base": "normal"
},
{
"topping": "pepperoni",
"base": "normal"
}
]
}
PUT pizzas/pizza/2
{
"name": "veg",
"types": [
{
"topping": "broccoli",
"base": "normal"
}
]
}
If I run this nested aggregation query:
GET pizzas/_search
{
"size": 0,
"aggs": {
"types_agg": {
"nested": {
"path": "types"
},
"aggs": {
"base_agg": {
"terms": {
"field": "types.base"
}
}
}
}
}
}
I get this result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"types_agg": {
"doc_count": 3,
"base_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "normal",
"doc_count": 3
}
]
}
}
}
}
I expected my aggregation to return a doc_count of 2, because only two documents match my query. However, it returns 3, because each nested "types" object is indexed as a separate hidden document and the aggregation counts those rather than the two top-level pizzas.
Is there any way to get it to return unique document counts?
(tested in Elasticsearch 5.4.3)

I discovered the answer shortly after asking the question.
Changing the aggregation query to:
GET pizzas/_search
{
"size": 0,
"aggs": {
"types_agg": {
"nested": {
"path": "types"
},
"aggs": {
"base_agg": {
"terms": {
"field": "types.base"
},
"aggs": {
"top_reverse_nested": {
"reverse_nested": {}
}
}
}
}
}
}
}
Yields the result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"types_agg": {
"doc_count": 3,
"base_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "normal",
"doc_count": 3,
"top_reverse_nested": {
"doc_count": 2
}
}
]
}
}
}
}
The important addition to the query was:
"aggs": {
"top_reverse_nested": {
"reverse_nested": {}
}
}
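The reverse_nested aggregation joins back from the nested documents to the root documents, so its doc_count counts each top-level document only once.
You can read about reverse_nested in the Elasticsearch reference documentation.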
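A toy Python model of the two counts (not Elasticsearch internals): each object in a nested array is stored as its own hidden document, so the terms bucket counts nested objects, while reverse_nested steps back to the root documents and counts each one once. The data is the two pizzas from the question.

```python
from collections import defaultdict

# The two root documents from the question; each "types" entry would be
# indexed by Elasticsearch as its own hidden nested document.
PIZZAS = {
    1: [{"topping": "bacon", "base": "normal"},
        {"topping": "pepperoni", "base": "normal"}],
    2: [{"topping": "broccoli", "base": "normal"}],
}


def base_buckets(pizzas):
    """Bucket nested entries by base, tracking nested and root-level counts."""
    nested_counts = defaultdict(int)  # like base_agg's doc_count
    parents = defaultdict(set)        # like top_reverse_nested's doc_count
    for root_id, types in pizzas.items():
        for entry in types:
            nested_counts[entry["base"]] += 1
            parents[entry["base"]].add(root_id)
    return {
        key: {"doc_count": nested_counts[key], "top_reverse_nested": len(parents[key])}
        for key in nested_counts
    }


print(base_buckets(PIZZAS))
```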

Get count of particular field in a document using Elasticsearch

Requirement:
I want to find the count of aId values for a particular categoryId
(i.e., for categoryId 2532 I want the count to be 2, meaning it is assigned to two aIds).
I tried aggregations, but with them I can only get the doc count rather than the field count.
Mappings
"List": {
"properties": {
"aId": {
"type": "long"
},
"CategoryList": {
"properties": {
"categoryId": {
"type": "long"
},
"categoryName": {
"type": "string"
}
}
}
}
}
Sample Document:
"List": [
{
"aId": 33074,
"CategoryList": [
{
"categoryId": 2532,
"categoryName": "VODAFONE"
}
]
},
{
"aId": 12074,
"CategoryList": [
{
"categoryId": 2532,
"categoryName": "VODAFONE"
}
]
},
{
"aId": 120755,
"CategoryList": [
{
"categoryId": 1234,
"categoryName": "SMPLKE"
}
]
}
]

Using a cardinality aggregation will not give you the desired result. The cardinality aggregation returns the number of distinct values of a field, whereas you want the number of times a particular value appears.
Instead, you can use the following query: first filter the documents on CategoryList.categoryId, then run a simple terms aggregation on the same field.
POST index_name1111/_search
{
"size": 0,
"query": {
"bool": {
"must": [{
"term": {
"CategoryList.categoryId": {
"value": 2532
}
}
}]
}
},
"aggs": {
"count_is": {
"terms": {
"field": "CategoryList.categoryId",
"size": 10
}
}
}
}
Response of the above query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"count_is": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2532,
"doc_count": 2
}
]
}
}
}
Alternatively, you can drop the filter: running only the aggregation returns every categoryId together with its number of appearances.
POST index_name1111/_search
{
"size": 0,
"aggs": {
"count_is": {
"terms": {
"field": "CategoryList.categoryId",
"size": 10
}
}
}
}
Response of the above query:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"count_is": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2532,
"doc_count": 2
},
{
"key": 1234,
"doc_count": 1
}
]
}
}
}
For comparison, here is the same filter combined with a cardinality aggregation:
POST index_name1111/_search
{
"size": 0,
"query": {
"bool": {
"must": [{
"term": {
"CategoryList.categoryId": {
"value": 2532
}
}
}]
}
},
"aggs": {
"id_count": {
"cardinality": {
"field": "CategoryList.categoryId"
}
}
}
}
Response of the above query. It doesn't give you the desired result: both matching documents have categoryId 2532, so the distinct count is 1.
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"id_count": {
"value": 1
}
}
}
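The difference between the two aggregations can be modelled in a few lines of Python, treating the three sample entries from the question as separate documents. This is a toy model, not Elasticsearch code; the real cardinality aggregation is approximate in general, whereas the set-based count below is exact.

```python
from collections import Counter

# The three sample entries from the question, treated as separate documents.
DOCS = [
    {"aId": 33074, "categoryIds": [2532]},
    {"aId": 12074, "categoryIds": [2532]},
    {"aId": 120755, "categoryIds": [1234]},
]


def terms_counts(docs):
    """Like a terms agg: number of documents containing each categoryId."""
    return Counter(cid for doc in docs for cid in set(doc["categoryIds"]))


def cardinality(docs):
    """Like a cardinality agg: number of distinct categoryId values."""
    return len({cid for doc in docs for cid in doc["categoryIds"]})


matching = [doc for doc in DOCS if 2532 in doc["categoryIds"]]
print(dict(terms_counts(matching)))  # {2532: 2}
print(cardinality(matching))         # 1
print(dict(terms_counts(DOCS)))      # {2532: 2, 1234: 1}
```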
Hope this helps
Thanks

Terms Facet in Elasticsearch, how to filter out results with equal min/max values?

I am posting the terms facet query below to my Elasticsearch cluster. It returns thousands of term stats with unequal min/max values, and even more with equal min/max values (e.g. min=0, max=0), which are of no interest. How can I adapt the query to return only the interesting results, where a term's min and max values differ?
{
"facets": {
"terms": {
"terms_stats": {
"value_field": "amountReviews",
"key_field": "id",
"size": 2000000,
"order": "count"
},
"facet_filter": {
"fquery": {
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "*"
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
}
}
}
}
}
}
},
"size": 0
}
Response
{
"took": 1799,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 856000,
"max_score": 0,
"hits": []
},
"facets": {
"terms": {
"_type": "terms_stats",
"missing": 0,
"terms": [
{
"term": "c71e8c7aceed5aaa8697bafd8a413ed0",
"count": 7,
"total_count": 7,
"min": 1,
"max": 10,
"total": 31,
"mean": 4.428571428571429
},
{
"term": "b6e4db022e46a1ae98926c3ed24c8785",
"count": 7,
"total_count": 7,
"min": 7,
"max": 17,
"total": 72,
"mean": 10.285714285714286
},
{
"term": "859d0f2c4c7486ed668dcc713c059a99",
"count": 7,
"total_count": 7,
"min": 4,
"max": 8,
"total": 35,
"mean": 5
},
{
"term": "826aa5acf107b1e6c602f4450abeca7b",
"count": 7,
"total_count": 7,
"min": 0,
"max": 0,
"total": 0,
"mean": 0
} ,
...,
...
]
}
}
}
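The condition itself is simple to express. As a client-side sketch in Python (not an Elasticsearch feature), the term stats from the truncated response can be filtered as below; only the four entries shown in the question are used. Facets were deprecated and later removed from Elasticsearch in favour of aggregations, where a pipeline aggregation such as bucket_selector can apply a comparable condition server-side; check the documentation for your version before relying on it.

```python
# Term stats copied from the (truncated) facet response in the question.
TERM_STATS = [
    {"term": "c71e8c7aceed5aaa8697bafd8a413ed0", "count": 7, "min": 1, "max": 10},
    {"term": "b6e4db022e46a1ae98926c3ed24c8785", "count": 7, "min": 7, "max": 17},
    {"term": "859d0f2c4c7486ed668dcc713c059a99", "count": 7, "min": 4, "max": 8},
    {"term": "826aa5acf107b1e6c602f4450abeca7b", "count": 7, "min": 0, "max": 0},
]


def interesting(stats):
    """Keep only terms whose min and max values differ."""
    return [entry for entry in stats if entry["min"] != entry["max"]]


print([entry["term"] for entry in interesting(TERM_STATS)])
```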
