ElasticSearch - Bucket average with Nested fields aggregation - elasticsearch

I am trying to execute the following query in the elasticsearch. The scenario is I have one field in the document which has 3 subfields: time1, time2, and id, the field is an array of objects having the above fields.
I want to calculate the average of difference b/w time2 and time1 for all the items.
Query being executed is :
`{
"query":{"match_all":{}},
"aggs":{
"total_time_diff":{
"nested":{"path":"diff_list"},
"aggs":{
"diff_r":{
"sum":"doc['time2'].date.getMills()-doc['time1'].date.getMills()"
}
}
},
// Here I need average of the sum which is calculated in total_time_diff "sum" aggregation
"avg_diff":{
"avg_bucket":{"buckets_path":"total_time_diff"}
}
}
}`
I am gettting following error:
{
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "",
"phase": "fetch",
"grouped": true,
"failed_shards": [],
"caused_by": {
"type": "class_cast_exception",
"reason": "org.elasticsearch.search.aggregations.bucket.nested.InternalNested cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation"
}
},
"status": 503
}
Index Mapping
{
"my_index": {
"mappings": {
"response_index": {
"date_detection": false,
"diff_list": {
"type": "nested",
"properties": {
"age": {
"type": "long"
},
"time2": {
"type": "date"
},
"time1": {
"type": "date"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
Thank you in advance.

"aggs":{
"diff_r":{
"sum":"doc['time2'].date.getMills()-doc['time1'].date.getMills()"
}
}
isnot a bucket selector and so the total_time_diff wont work inside the last aggregation (avg_diff).
use a script instead (like)
"script": "doc['time2'].date.getMills()-doc['time1'].date.getMills()"
Let us know it it word.

I have found different solution for your problem here. Instead of doing the sum in the script and then looking for bucket script aggregation to work on it. i used average script aggregation using script.
Avg bucket aggregation will not work here for this sibling aggregation as the aggregation doing sum is not multi bucket aggregation.
i have made some changes to the script to compute the difference between two date fields. Following query should work for you.
{
"size": 0,
"aggs": {
"total_time_diff": {
"nested": {
"path": "diff_list"
},
"aggs": {
"diff_r": {
"avg": {
"script": {
"source": "doc['diff_list.time2'].value.millis - doc['diff_list.time1'].value.millis"
}
}
}
}
}
}
}
Hope this works for you.

Related

How can I get parent with all children in one query

I have following mapping:
PUT /test_products
{
"mappings": {
"_doc": {
"properties": {
"type": {
"type": "keyword"
},
"name": {
"type": "text"
},
"entity_id": {
"type": "integer"
},
"weighted": {
"type": "integer"
}
"product_relation": {
"type": "join",
"relations": {
"window": "simple"
}
}
}
}
}
}
I want to get "window" products with all "simple"s but only where one or more "simple"s have property "weighted" = 1
I wrote following query:
GET test_products/_search
{
"query": {
"has_child": {
"type": "simple",
"query": {
"term": {
"weighted": 1
}
},
"inner_hits": {}
}
}
}
But I've got "window"s with "simple"s which are match to the term. In other words I want to filter "window"s list by "simple"'s option and get all matched "window"s with all their "simple"s. Is it possible without "nested" in one query? Or I have to do some queries?
OK. Luckily, I need to get only one "window" product with all it's children by it's ID, so I found parent_id query which can helps me with this task.
Now I have following query:
GET test_products/_search
{
"query": {
"parent_id": {
"type": "simple",
"id": "window-1"
}
}
}
Unfortunately, I have to execute 2 queries (has_child and then parent_id) instead of one but it's OK for me.

Unwind in ElasticSearch

I am currently having the below index in ElasticSearch
PUT my_index
{
"mappings": {
"doc": {
"properties": {
"type" : {
"type": "text",
"fielddata": true
},
"id" : {
"type": "text",
"fielddata": true
},
"nestedTypes": {
"type": "nested",
"properties": {
"nestedTypeId":{
"type": "integer"
},
"nestedType":{
"type": "text",
"fielddata": true
},
"isLead":{
"type": "boolean"
},
"share":{
"type": "float"
},
"amount":{
"type": "float"
}
}
}
}
}
}
}
I need the nested types to be displayed in a HTML table along with the id and type fields in each row.
I am trying to achieve something similar to unwind in MongoDB.
I have tried the reverse nested aggregation as below
GET my_index/_search
{
"size": 0,
"aggs": {
"NestedTypes": {
"nested": {
"path": "nestedTypes"
},
"aggs": {
"NestedType": {
"terms": {
"field": "nestedTypes.nestedType",
"order": {
"_key": "desc"
}
},
"aggs": {
"Details": {
"reverse_nested": {},
"aggs": {
"type": {
"terms": {
"field": "type"
}
},
"id": {
"terms": {
"field": "id"
}
}
}
}
}
}
}
}
}
}
But the above returns only one field from the nestedTypes, but I need all of them.
Also, I need sorting and pagination for this table. Could you please let me know how this can be achieved in ElasticSearch.
ElasticSearch does not support this operation out of the box. When a request was raised to implement the same in git, the below response was given:
We discussed it in Fixit Friday and agreed that we won't try to
implement it due to the fact that we can't think of a way to support
such operations efficiently.
The only ideas that we thought were reasonable boiled down to having
another index that stores the same data but flattened. Depending on
your use-case, you might be able to maintain those two views in
parallel or would only maintain the one you have today, then
materialize a flattened view of the data when you need it and throw it
away after you are done querying. In both cases, this requires
client-side logic.
The link to the request is here

Sort parent type based on one field within an array of nested Object in elasticsearch

I have below mapping in my index:
{
"testIndex": {
"mappings": {
"type1": {
"properties": {
"text": {
"type": "string"
},
"time_views": {
"properties": {
"timestamp": {
"type": "long"
},
"views": {
"type": "integer"
}
}
}
}
}
}
}
}
"time_views" actually is an array, but inner attributes not array.
I want to sort my type1 records based on maximum value of "views" attribute of each type1 record. I read elasticsearch sort documentation, it's have solution for use cases that sorting is based on field (single or array) of single nested object. but what I want is different. I want pick maximum value of "views" for each document and sort the documents based on these values
I made this json query
{
"size": 10,
"query": {
"range": {
"timeStamp": {
"gte": 1468852617347,
"lte": 1468939017347
}
}
},
"from": 0,
"sort": [
{
"time_views.views": {
"mode": "max",
"nested_path": "time_views",
"order": "desc"
}
}
]
}
but I got this error
{
"error": {
"phase": "query",
"failed_shards": [
{
"node": "n4rxRCOuSBaGT5xZoa0bHQ",
"reason": {
"reason": "[nested] nested object under path [time_views] is not of nested type",
"col": 136,
"line": 1,
"index": "data",
"type": "query_parsing_exception"
},
"index": "data",
"shard": 0
}
],
"reason": "all shards failed",
"grouped": true,
"type": "search_phase_execution_exception",
"root_cause": [
{
"reason": "[nested] nested object under path [time_views] is not of nested type",
"col": 136,
"line": 1,
"index": "data",
"type": "query_parsing_exception"
}
]
},
"status": 400
}
as I mentioned above time_views is an array and I guess this error is because of that.
even I can't use sorting based on array field feature, because "time_views" is not a primitive type.
I think my last chance is write a custom sorting by scripting, but I don't know how.
please tell me my mistake if it's possible to achieve to what I'm want, otherwise give me a simple script sample.
tnx :)
The error message does a lot to explain what is wrong with the query. Actually, the problem is with the mapping. And I think you intended on using nested fields, since you are using nested queries.
You just need to make your time_views field as nested:
"mappings": {
"type1": {
"properties": {
"text": {
"type": "string"
},
"time_views": {
"type": "nested",
"properties": {
"timestamp": {
"type": "long"
},
"views": {
"type": "integer"
}
}
}
}
}
}

Return Documents with Null Field in Multi Level Aggregation

We are using Multi Level Aggregation. We have Buckets of City and each Bucket has Buckets of Class.
For Few documents Class is Null and in such cases an empty bucket is returned for the City. Please refer to below response:
Sample Output:
"aggregations":
{
"CITY":{
"buckets":[
{
"key":"CITY 1",
"doc_count":2
"CLASS":{
"buckets":[
{
"key":"CLASS A",
"top_tag_hits":{
}
}
]
}
},
{
"key":"CITY 2",
"doc_count":2
"CLASS":{
"buckets":[
]
}
},
]
}
}
Here the key CITY 2 has an empty bucket of CLASS as all documents under key CITY 2 has the field CITY as null. But we are having a doc count.
How can we return documents under the bucket when terms field is null
Update:
Field Mapping for CLASS:
"CLASS":
{
"type": "string",
"index_analyzer": "text_with_autocomplete_analyzer",
"search_analyzer": "text_standard_analyzer",
"fields": {
"raw": {
"type": "string",
"null_value" : "na",
"index": "not_analyzed"
},
"partial_matching": {
"type": "string",
"index_analyzer": "text_with_partial_matching_analyzer",
"search_analyzer": "text_standard_analyzer"
}
}
}
Please refer to mapping to solve the issue.
You can use the missing setting or the terms aggregation in order to handle buckets with missing values. So in your case, you'd do it like this:
{
"aggs": {
"CITY": {
"terms": {
"field": "city_field"
},
"aggs": {
"CLASS": {
"terms": {
"field": "class_field",
"missing": "NO_CLASS"
}
}
}
}
}
}
With this setup, all documents than don't have a class_field field (or a null value) will land in the NO_CLASS bucket.
PS: Note that this only works since ES 2.0 and not in prior releases.

Elasticsearch getting the last nested or most recent nested element

We have this mapping:
{
"product_achievement": {
"type": "nested",
"properties": {
"id": {
"type": "long"
},
"last_purchase": {
"type": "long"
},
"products": {
"type": "long"
}
}
}
}
As you see this is nested, and the last_purchase field is a unixtimestamp value. We would like to query from all nested elements the most recent entry defined by the last_purchase field AND see if in the last entry there is some product id is in products.
You can achieve this using a nested query with inner_hits. In the query part, you can specify the product id you want to match and then using inner_hits you can sort by decreasing last_purchase timestamp and only take the first one using size: 1
{
"query": {
"nested": {
"path": "product_achievement",
"query": {
"term": {
"product_achievement.products": 1
}
},
"inner_hits": {
"size": 1,
"sort": {
"product_achievement.last_purchase": "desc"
}
}
}
}
}

Resources