Query on nested type with aggregation on nested types returns unexpected results - elasticsearch

We are using Elasticsearch 5.6.4. According to the ES documentation, aggregations operate in the scope of the query, so any filter applied to the query also applies to the aggregation.
Here is what I have.
An index with this mapping:
{
"properties":{
"asset":{
"properties":{
"customerId":{
"type":"long"
}
}
},
"software":{
"type": "nested",
"properties":{
"id":{
"type":"long"
},
"name":{
"type":"text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
I created several documents to run various tests. Documents are indexed by customerId. There are 10 documents in total, each with 2 or more software entries. To test aggregations on software, I reused the same software id across multiple documents. For example, software with id 12 appears in the documents with customerId 1, 2 and 3, and the document with customerId 2 contains two software entries with id 12.
So there are 4 software entries with id 12 across documents 1, 2 and 3.
When the following query is run, the aggregation result includes only the document with customerId 1, and not 2 and 3:
{
"query" : {
"term":{
"asset.customerId":1
}
},
"aggregations" : {
"aggs" : {
"nested" : {
"path" : "software"
},
"aggregations" : {
"software.id.agg" : {
"terms" : {
"field" : "software.id",
"size" : 10,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
}
}
}
}
}
But when the query filters on a nested field (software.id), the aggregation result covers all of the matching documents (1, 2 and 3), so buckets that the query should have filtered out are present as well:
{
"query" : {
"nested" : {
"query" : {
"match_phrase_prefix" : {
"software.id" : {
"query" : 12,
"slop" : 100,
"max_expansions" : 50,
"boost" : 1.0
}
}
},
"path" : "software",
"ignore_unmapped" : false,
"score_mode" : "none",
"boost" : 1.0
}
},
"aggregations" : {
"aggs" : {
"nested" : {
"path" : "software"
},
"aggregations" : {
"software.id.agg" : {
"terms" : {
"field" : "software.id",
"size" : 10,
"min_doc_count" : 1,
"shard_min_doc_count" : 0,
"show_term_doc_count_error" : false,
"order" : [
{
"_count" : "desc"
},
{
"_term" : "asc"
}
]
}
}
}
}
}
}
What's the correct way to apply the query filter on a nested type so that it also applies to the aggregation?
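One approach that is often suggested for this situation, sketched here and not verified against 5.6.4: the nested query only decides which parent documents match, and the nested aggregation then sees every software entry of those parents. To restrict the buckets, the same condition can be repeated inside the nested aggregation with a filter aggregation. The Python simulation below illustrates the difference; the sample documents, the function names and the only_matching aggregation name are all hypothetical.

```python
from collections import Counter

# Hypothetical sample data: three parent documents with nested software entries.
DOCS = [
    {"customerId": 1, "software": [{"id": 12}, {"id": 7}]},
    {"customerId": 2, "software": [{"id": 12}, {"id": 12}, {"id": 9}]},
    {"customerId": 3, "software": [{"id": 12}, {"id": 5}]},
]


def nested_query(docs, software_id):
    """Mimic a nested query: keep parents with at least one matching entry."""
    return [d for d in docs if any(s["id"] == software_id for s in d["software"])]


def nested_terms(docs, only_id=None):
    """Mimic nested + terms; only_id plays the role of an inner filter aggregation."""
    counts = Counter()
    for d in docs:
        for s in d["software"]:
            if only_id is None or s["id"] == only_id:
                counts[s["id"]] += 1
    return dict(counts)


matched = nested_query(DOCS, 12)
unfiltered = nested_terms(matched)             # every entry of the matched parents
filtered = nested_terms(matched, only_id=12)   # only entries matching the condition

# Aggregation part of a request that repeats the condition inside the nested scope.
# "only_matching" is an illustrative name, not something from the original query.
AGGS = {
    "aggs": {
        "nested": {"path": "software"},
        "aggregations": {
            "only_matching": {
                "filter": {"term": {"software.id": 12}},
                "aggregations": {
                    "software.id.agg": {
                        "terms": {"field": "software.id", "size": 10}
                    }
                },
            }
        },
    }
}
```

In the simulation the unfiltered aggregation still reports ids 7, 9 and 5, while the filtered one only reports id 12 with a count of 4.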

Related

Can Elastic Search do aggregations for within a document?

I have a mapping like this:
"mappings": {
"seller": {
"properties": {
"overallRating": {"type": "byte"},
"items": {
"properties": {
"itemName": {"type": "string"},
"itemRating": {"type": "byte"}
}
}
}
}
}
Each item will only have one itemRating. Each seller will only have one overall rating. There can be many items, and at most I'm expecting maybe 50 items with itemRatings. Not all items have to have an itemRating.
I'm trying to get an average rating for each seller that combines all itemRatings and the overallRating. I have looked into aggregations, but all the examples I have seen aggregate across all documents. The aggregation I want is within a single document, and I am not sure whether that is possible. Any tips would be appreciated.
Yes, this is very much possible with Elasticsearch. To produce a combined rating, you simply need to sub-aggregate by the document id, so that the only thing in each bucket is the individual document. That is what you want.
Here is an example:
Create the index:
PUT /ratings
{
"mappings": {
"properties": {
"overallRating": {"type" : "float"},
"items": {
"type" : "nested",
"properties": {
"itemName" : {"type" : "keyword"},
"itemRating" : {"type" : "float"},
"overallRating": {"type" : "float"}
}
}
}
}
}
Add some data:
POST ratings/_doc/
{
"overallRating" : 1,
"items" : [
{
"itemName" : "labrador",
"itemRating" : 10,
"overallRating" : 1
},
{
"itemName" : "saint bernard",
"itemRating" : 20,
"overallRating" : 1
}
]
}
POST ratings/_doc/
{
"overallRating" : 1,
"items" : [
{
"itemName" : "cat",
"itemRating" : 5,
"overallRating" : 1
},
{
"itemName" : "rat",
"itemRating" : 10,
"overallRating" : 1
}
]
}
Query the index for a combined rating and sort by the rating:
GET ratings/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"average_rating": {
"composite": {
"sources": [
{
"ids": {
"terms": {
"field": "_id"
}
}
}
]
},
"aggs": {
"average_rating": {
"nested": {
"path": "items"
},
"aggs": {
"avg": {
"avg": {
"field": "items.compound"
}
}
}
}
}
}
},
"runtime_mappings": {
"items.compound": {
"type": "double",
"script": {
"source": "emit(doc['items.overallRating'].value + doc['items.itemRating'].value)"
}
}
}
}
The result (please note that I changed the rating values between writing the answer and running it in the console, so the averages are a bit different):
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"average_rating" : {
"after_key" : {
"ids" : "3vUp44EBbR3hrRYkA8pj"
},
"buckets" : [
{
"key" : {
"ids" : "3_Up44EBbR3hrRYkLsrC"
},
"doc_count" : 1,
"average_rating" : {
"doc_count" : 2,
"avg" : {
"value" : 151.0
}
}
},
{
"key" : {
"ids" : "3vUp44EBbR3hrRYkA8pj"
},
"doc_count" : 1,
"average_rating" : {
"doc_count" : 2,
"avg" : {
"value" : 8.5
}
}
}
]
}
}
}
One change for convenience:
I edited your mapping to add overallRating to each item entry. This simplifies the calculations that follow, because you only look inside the nested scope and never have to step out of it.
I also had to use a runtime mapping to combine the overallRating and itemRating of each item, to produce a better average. Basically, I summed each itemRating with the overallRating and averaged those sums across all items.
I had to use a top-level composite aggregation on _id so that we only get results per document (which is what you want).
There is some pretty heavy lifting happening here, but it is very possible, and easy to edit as you require.
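To make the arithmetic concrete, here is a small Python sketch of what the aggregation computes for each document, using the two sample documents from the "Add some data" step. The function names are made up for illustration; only the formula (overallRating + itemRating, averaged over the nested items) comes from the runtime mapping above.

```python
# The two sample documents from the "Add some data" step.
DOCS = [
    {"overallRating": 1, "items": [
        {"itemName": "labrador", "itemRating": 10, "overallRating": 1},
        {"itemName": "saint bernard", "itemRating": 20, "overallRating": 1},
    ]},
    {"overallRating": 1, "items": [
        {"itemName": "cat", "itemRating": 5, "overallRating": 1},
        {"itemName": "rat", "itemRating": 10, "overallRating": 1},
    ]},
]


def compound(item):
    """Same arithmetic as the items.compound runtime field."""
    return item["overallRating"] + item["itemRating"]


def combined_rating(doc):
    """Average of the compound value over the nested items of one document."""
    values = [compound(item) for item in doc["items"]]
    return sum(values) / len(values)


ratings = [combined_rating(doc) for doc in DOCS]
```

With these values the second document averages (6 + 11) / 2 = 8.5, which agrees with one of the buckets in the response shown earlier. The other bucket differs because the ratings were changed before the query was run.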
HTH.

How can I aggregate the whole field value in Elasticsearch

I am using Elasticsearch 7.15 and need to aggregate on a field and sort the resulting buckets by document count.
The documents saved in Elasticsearch look like:
{
"logGroup" : "/aws/lambda/myLambda1",
...
},
{
"logGroup" : "/aws/lambda/myLambda2",
...
}
I need to find out which logGroup has the most documents. To do that, I tried a terms aggregation in Elasticsearch:
GET /my-index/_search?size=0
{
"aggs": {
"types_count": {
"terms": {
"field": "logGroup",
"size": 10000
}
}
}
}
The output of this query looks like:
"aggregations" : {
"types_count" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "aws",
"doc_count" : 26303620
},
{
"key" : "lambda",
"doc_count" : 25554470
},
{
"key" : "myLambda1",
"doc_count" : 25279201
}
...
}
As the output above shows, it splits the logGroup value into terms and aggregates on each term rather than on the whole string. Is there a way to aggregate on the whole string?
I expect the output to look like:
"buckets" : [
{
"key" : "/aws/lambda/myLambda1",
"doc_count" : 26303620
},
{
"key" : "/aws/lambda/myLambda2",
"doc_count" : 25554470
},
The logGroup field in the index mapping is:
"logGroup" : {
"type" : "text",
"fielddata" : true
},
Can I achieve it without updating the index?
In order to get what you expect you need to change your mapping to this:
"logGroup" : {
"type" : "keyword"
},
Failing to do that, your log groups will get analyzed by the standard analyzer which splits the whole string and you'll not be able to aggregate by full log groups.
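The effect of analysis on the buckets can be illustrated with a short Python sketch. It is only a rough stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase), and the sample values and function names are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample values for the logGroup field.
LOG_GROUPS = [
    "/aws/lambda/myLambda1",
    "/aws/lambda/myLambda1",
    "/aws/lambda/myLambda2",
]


def approx_standard_tokens(value):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]


def terms_on_text(values):
    """Each document counts once per distinct token, like terms on an analyzed text field."""
    counts = Counter()
    for value in values:
        counts.update(set(approx_standard_tokens(value)))
    return dict(counts)


def terms_on_keyword(values):
    """A keyword field keeps the whole string as a single term."""
    return dict(Counter(values))


text_buckets = terms_on_text(LOG_GROUPS)
keyword_buckets = terms_on_keyword(LOG_GROUPS)
```

The text version produces buckets such as aws and lambda that every document falls into, while the keyword version produces one bucket per full log group name.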
If you don't want or can't change the mapping and reindex everything, what you can do is the following:
First, add a keyword sub-field to your mapping, like this:
PUT /my-index/_mapping
{
"properties": {
"logGroup" : {
"type" : "text",
"fields": {
"keyword": {
"type" : "keyword"
}
}
}
}
}
And then run the following so that all existing documents pick up this new field:
POST my-index/_update_by_query?wait_for_completion=false
Finally, you'll be able to achieve what you want with the following query:
GET /my-index/_search
{
"size": 0,
"aggs": {
"types_count": {
"terms": {
"field": "logGroup.keyword",
"size": 10000
}
}
}
}

Include parent _source fields in nested top hits aggregation

I am trying to aggregate on a field and get the top records using top_hits, but I want to include other fields in the response that are not part of the nested property mapping. Currently, if I specify _source:{"include":[]}, I only get the fields that are in the current nested property.
Here is my mapping
{
"my_cart":{
"mappings":{
"properties":{
"store":{
"properties":{
"name":{
"type":"keyword"
}
}
},
"sales":{
"type":"nested",
"properties":{
"Price":{
"type":"float"
},
"Time":{
"type":"date"
},
"product":{
"properties":{
"name":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword"
}
}
}
}
}
}
}
}
}
}
}
UPDATE
Joe's answer solved my above issue.
My current issue is with the response: although I get the product name as "key" along with other details, the hits also contain the other product names that were part of the same transaction in the billing receipt. I want to aggregate on the product name and find the last sold date of each product, along with other details such as price, quantity, etc.
Current Response
"aggregations" : {
"aggregate_by_most_sold_product" : {
"doc_count" : 2878592,
"all_products" : {
"buckets" : [
{
"key" : "shampoo",
"doc_count" : 1,
"lastSold" : {
"value" : 1.602569793E12,
"value_as_string" : "2018-10-13T06:16:33.000Z"
},
"using_reverse_nested" : {
"doc_count" : 1,
"latest product" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "my_cart",
"_type" : "_doc",
"_id" : "36303258-9r7w-4b3e-ba3d-fhds7cfec7aa",
"_source" : {
"cashier" : {
"firstname" : "romeo",
"uuid" : "2828dhd-0911-7229-a4f8-8ab80dde86a6"
},
"product_price": {
"price":20,
"discount_offered":10
},
"sales" : [
{
"product" : {
"name" : "shampoo",
"time":"2018-10-13T04:44:26+00:00"
},
"product" : {
"name" : "noodles",
"time":"2018-10-13T04:42:26+00:00"
},
"product" : {
"name" : "biscuits",
"time":"2018-10-13T04:41:26+00:00"
}
}
]
}
}
]
}
}
]
Expected Response
The current response gives me all the product names in that transaction, which increases the bucket size. I only want a single product name per bucket, with its last sold date and other details.
My aggregation is the same as the one in Joe's answer.
I would also like to know whether I can add scripts to operate on the fields returned in _source.
Ex: price - discount_offered = final amount.
The nested context does not have access to the parent unless you use reverse_nested. In that case, however, you've lost the ability to only retrieve the applicable nested subdocument. But there is luckily a way to sort a terms aggregation by the result of a different, numeric one:
GET my_cart/_search
{
"size": 0,
"aggs": {
"aggregate": {
"nested": {
"path": "sales"
},
"aggs": {
"all_products": {
"terms": {
"field": "sales.product.name.keyword",
"size": 6500,
"order": { <--
"lowest_date": "asc"
}
},
"aggs": {
"lowest_date": { <--
"min": {
"field": "sales.Time"
}
},
"using_reverse_nested": {
"reverse_nested": {}, <--
"aggs": {
"latest product": {
"top_hits": {
"_source": {
"includes": [
"store.name"
]
},
"size": 1
}
}
}
}
}
}
}
}
}
}
The caveat is that you won't be getting the store.name inside of the top_hits -- though I suspect you're probably already doing some post-processing on the client side where you could combine those entries:
"aggregate" : {
...
"all_products" : {
...
"buckets" : [
{
"key" : "myproduct", <--
...
"using_reverse_nested" : {
...
"latest product" : {
"hits" : {
...
"hits" : [
{
...
"_source" : {
"store" : {
"name" : "mystore" <--
}
}
}
]
}
}
},
"lowest_date" : {
"value" : 1.4200704E12,
"value_as_string" : "2015/01/01" <--
}
}
]
}
}
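As a plain illustration of what ordering a terms aggregation by a numeric sub-aggregation does, the Python sketch below groups some made-up sales entries by product name, takes the lowest date per product, and orders the buckets by that value ascending. The data and the function name are hypothetical, and the timestamps are compared as strings, which only works because they share one fixed format.

```python
# Hypothetical nested sales entries, flattened across parent documents.
SALES = [
    {"product": "shampoo", "time": "2018-10-13T04:44:26"},
    {"product": "noodles", "time": "2018-10-13T04:42:26"},
    {"product": "shampoo", "time": "2018-10-12T09:00:00"},
    {"product": "biscuits", "time": "2018-10-13T04:41:26"},
]


def buckets_ordered_by_lowest_date(sales):
    """One bucket per product, ordered by the minimum time seen for that product."""
    lowest = {}
    for sale in sales:
        name, time = sale["product"], sale["time"]
        if name not in lowest or time < lowest[name]:
            lowest[name] = time
    # Ascending order on the metric, like "order": {"lowest_date": "asc"}.
    return sorted(lowest.items(), key=lambda bucket: bucket[1])


buckets = buckets_ordered_by_lowest_date(SALES)
```

If the goal is the last sold date and not the first, the same pattern would presumably use a max aggregation with a descending order; that variation is an assumption and is not part of the answer above.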

aggregation_execution_exception : Invalid aggregation order path,Sub-path points to non single-bucket aggregation

When I run this ES aggregation:
"aggregations": {
"author": {
"terms": {
"field": "author",
"size": 100,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": {
"interactions-c>interactions-sum": "desc"
}
},
"aggregations": {
"interactions-c": {
"children": {
"type": "interactions"
},
"aggregations": {
"interactions-sum": {
"sum": {
"field": "interactions.likes"
}
}
}
}
}
}
}
I get this exception:
{
"error" : {
"root_cause" : [
{
"type" : "aggregation_execution_exception",
"reason" : "Invalid aggregation order path [interactions-c>interactions-sum]. Buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end. Sub-path [interactions-c] points to non single-bucket aggregation"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "article_20200910",
"node" : "fLYvCQjfTEKG0QIivtn3Hg",
"reason" : {
"type" : "aggregation_execution_exception",
"reason" : "Invalid aggregation order path [interactions-c>interactions-sum]. Buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end. Sub-path [interactions-c] points to non single-bucket aggregation"
}
}
]
},
"status" : 500
}
This is my index mapping:
{
"article" : {
"aliases" : { },
"mappings" : {
"properties" : {
"author" : {
"type" : "keyword"
},
"interactions" : {
"properties" : {
"comments" : {
"type" : "long"
},
"dislikes" : {
"type" : "long"
},
"forwards" : {
"type" : "long"
},
"likes" : {
"type" : "long"
},
"views" : {
"type" : "long"
}
}
},
"joinField" : {
"type" : "join",
"eager_global_ordinals" : false,
"relations" : {
"article" : [
"interactions"
]
}
}
}
}
}
}
I created an index that uses a join field (parent: article; child: interactions).
What I want:
aggregate the interaction counts by author (author is a field on the parent, interactions is a field on the child),
then order by the interactions sum, descending.
But ES says the children aggregation is not a single-bucket aggregation. Is there any way to do this?
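One possible fallback, offered only as a sketch: drop the "order" clause from the terms aggregation and sort the returned buckets on the client. This is exact only when the terms aggregation returns every author (the request above asks for up to 100). The response below is made up for illustration; its shape follows the aggregation names used in the request.

```python
# Hypothetical response for the request above with its "order" clause removed.
RESPONSE = {
    "aggregations": {
        "author": {
            "buckets": [
                {"key": "alice", "doc_count": 3,
                 "interactions-c": {"doc_count": 5,
                                    "interactions-sum": {"value": 12.0}}},
                {"key": "bob", "doc_count": 7,
                 "interactions-c": {"doc_count": 9,
                                    "interactions-sum": {"value": 40.0}}},
                {"key": "carol", "doc_count": 1,
                 "interactions-c": {"doc_count": 2,
                                    "interactions-sum": {"value": 25.0}}},
            ]
        }
    }
}


def sort_authors_by_interactions(response):
    """Order author buckets by the children sum, highest first."""
    buckets = response["aggregations"]["author"]["buckets"]
    return sorted(
        buckets,
        key=lambda bucket: bucket["interactions-c"]["interactions-sum"]["value"],
        reverse=True,
    )


ordered_authors = [b["key"] for b in sort_authors_by_interactions(RESPONSE)]
```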

Error:Class cast exception in elastic search while sorting buckets in aggregation

Error:
ClassCastException[org.elasticsearch.search.aggregations.support.ValuesSource$Bytes$WithOrdinals$FieldData cannot be cast to org.elasticsearch.search.aggregations.support.ValuesSource$Numeric]}{[vTHdFzpuTEGMGR8MES_b9g]
My Query:
GET _search
{
"size" : 0,
"query" : {
"filtered" : {
"query" : {
"dis_max" : {
"tie_breaker" : 0.7,
"queries" : [ {
"bool" : {
"should" : [ {
"match" : {
"post.body" : {
"query" : "check",
"type" : "boolean"
}
}
}, {
"match" : {
"post.parentBody" : {
"query" : "check",
"type" : "boolean",
"boost" : 2.0
}
}
} ]
}
} ]
}
}
}
},
"aggregations" : {
"by_parent_id" : {
"terms" : {
"field" : "post.parentId",
"order" : {
"max_score" : "desc"
}
},
"aggregations" : {
"max_score" : {
"max" : {}
},
"top_post" : {
"top_hits" : {
"size" : 1
}
}
}
}
}
}
I want to sort buckets by max_score rather than by doc_count, which is the default behaviour of Elasticsearch.
I am trying to aggregate posts (which contain body and parentBody) by parentId, sort the buckets by max_score, and get the top_hits in each bucket. But I get the above error when I sort the buckets by the max_score aggregation. Everything else works if I remove the max_score aggregation. Every post object has parentId, body and parentBody. I have used the following references for coding this:
Elasticsearch Aggregation: How to Sort Bucket Order
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_field_collapse_example
What am I doing wrong? I have shared the query above.
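A likely cause, stated as an assumption and not a confirmed diagnosis: the max aggregation in the query is empty ("max" : {}), so it has no numeric source of its own and seems to fall back to the string values of post.parentId, which would explain the cast to ValuesSource$Numeric failing. The field collapse example in the linked top hits documentation sorts buckets by a max aggregation that reads the score through a script. The Python sketch below builds the aggregation part of the request in that style. The build_aggregations helper is hypothetical, and the script syntax differs between Elasticsearch versions (newer releases expect {"script": {"source": "_score"}}), so check it against the version in use.

```python
def build_aggregations(score_script="_score"):
    """Terms buckets ordered by the best hit score in each bucket."""
    return {
        "by_parent_id": {
            "terms": {
                "field": "post.parentId",
                "order": {"max_score": "desc"},
            },
            "aggregations": {
                # The max aggregation reads the score through a script
                # instead of being left empty.
                "max_score": {"max": {"script": score_script}},
                "top_post": {"top_hits": {"size": 1}},
            },
        }
    }


AGGS = build_aggregations()
```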
