Sort parent type based on one field within an array of nested Object in elasticsearch

I have the following mapping in my index:
{
"testIndex": {
"mappings": {
"type1": {
"properties": {
"text": {
"type": "string"
},
"time_views": {
"properties": {
"timestamp": {
"type": "long"
},
"views": {
"type": "integer"
}
}
}
}
}
}
}
}
"time_views" is actually an array of objects, but its inner attributes are not arrays.
I want to sort my type1 records by the maximum value of the "views" attribute in each record. I read the Elasticsearch sort documentation; it covers sorting on a field (single-valued or array) of a single nested object, but what I want is different: I want to pick the maximum value of "views" for each document and sort the documents by those values.
I wrote this JSON query:
{
"size": 10,
"query": {
"range": {
"timeStamp": {
"gte": 1468852617347,
"lte": 1468939017347
}
}
},
"from": 0,
"sort": [
{
"time_views.views": {
"mode": "max",
"nested_path": "time_views",
"order": "desc"
}
}
]
}
but I got this error
{
"error": {
"phase": "query",
"failed_shards": [
{
"node": "n4rxRCOuSBaGT5xZoa0bHQ",
"reason": {
"reason": "[nested] nested object under path [time_views] is not of nested type",
"col": 136,
"line": 1,
"index": "data",
"type": "query_parsing_exception"
},
"index": "data",
"shard": 0
}
],
"reason": "all shards failed",
"grouped": true,
"type": "search_phase_execution_exception",
"root_cause": [
{
"reason": "[nested] nested object under path [time_views] is not of nested type",
"col": 136,
"line": 1,
"index": "data",
"type": "query_parsing_exception"
}
]
},
"status": 400
}
As I mentioned above, time_views is an array, and I guess the error is because of that.
I can't even use the sort-on-array-field feature, because "time_views" is not a primitive type.
I think my last chance is to write a custom sort with scripting, but I don't know how.
Please tell me my mistake if it's possible to achieve what I want; otherwise, give me a simple script sample.
Thanks :)

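The error message goes a long way toward explaining what is wrong with the query. The actual problem is in the mapping: the sort uses nested_path, which only works on fields mapped as nested, while time_views is mapped as a plain object. I think you intended to use nested fields, since you are using the nested sort options.
You just need to map your time_views field as nested: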
"mappings": {
"type1": {
"properties": {
"text": {
"type": "string"
},
"time_views": {
"type": "nested",
"properties": {
"timestamp": {
"type": "long"
},
"views": {
"type": "integer"
}
}
}
}
}
}
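To make concrete what a sort with "mode": "max" and "order": "desc" is meant to compute, here is a minimal Python sketch that emulates the ordering on in-memory documents. The sample documents and the helper name max_views are made up for illustration; this is not Elasticsearch code, only the same ranking logic.

```python
# Hypothetical sample documents shaped like the type1 mapping above.
docs = [
    {"text": "a", "time_views": [{"timestamp": 1, "views": 5}, {"timestamp": 2, "views": 40}]},
    {"text": "b", "time_views": [{"timestamp": 1, "views": 90}]},
    {"text": "c", "time_views": [{"timestamp": 1, "views": 7}, {"timestamp": 2, "views": 3}]},
]


def max_views(doc):
    """Largest "views" value among a document's time_views entries."""
    return max(entry["views"] for entry in doc["time_views"])


# Emulates "mode": "max" combined with "order": "desc".
ranked = sorted(docs, key=max_views, reverse=True)
ranked_texts = [d["text"] for d in ranked]
```

A side note, offered as something to try rather than a guarantee: with the original (non-nested) mapping, Elasticsearch flattens the objects in time_views into a multi-valued time_views.views field, so the same sort clause without the nested_path line may already produce this ordering. Also keep in mind that an existing object field generally cannot be switched to nested in place; that usually means creating a new index with the nested mapping and reindexing.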

Related

ElasticSearch painless filter script on text fields not working

I want to use an equality filter (exact match) using a painless script in ElasticSearch. I cannot use directly a term query because the check I want to do is on a text field (and not keyword), so I tried with a match_phrase. This is my mapping: I can't change it.
{
"my_index": {
"aliases": {},
"mappings": {
"properties": {
"my_field": {
"type": "text"
}
}
},
"settings": {
"index": {
"max_ngram_diff": "60",
"number_of_shards": "8",
"blocks": {
"read_only_allow_delete": "false",
"write": "false"
},
"analysis": {...}
}
}
}
}
I tried this query, following this guide:
{
"size": 10,
"index": "my_index",
"body": {
"query": {
"bool": {
"should": [{
"match_phrase": {
"my_field": {
"query": "MY_VALUE",
"boost": 1.5,
"slop": 0
}
}
}],
"must": [],
"filter": [{
"script": {
"script": {
"lang": "painless",
"source": "doc['my_field'] == 'MY_VALUE'"
}
}
}],
"minimum_should_match": 1
}
}
}
}
Anyway, I got this error:
body:
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:101)",
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:53)",
"doc['my_field'] === 'MY_VALUE'",
" ^---- HERE"
],
"script": "doc['my_field'] === 'MY_VALUE'",
"lang": "painless",
"position": {
"offset": 4,
"start": 0,
"end": 30
}
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "my_index",
"node": "R99vOHeORlKsk9dnCzcMeA",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:101)",
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:53)",
"doc['my_field'] === 'MY_VALUE'",
" ^---- HERE"
],
"script": "doc['my_field'] === 'MY_VALUE'",
"lang": "painless",
"position": {
"offset": 4,
"start": 0,
"end": 30
},
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field found for [my_field] in mapping with types []"
}
}
}
]
},
"status": 400
}
It seems that doc doesn't contain text fields (I tried with other non-text fields and it works!)
Here they say that:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
And here they say that:
text fields are searchable by default, but by default are not
available for aggregations, sorting, or scripting. Set fielddata=true
on your_field_name in order to load fielddata in memory by uninverting
the inverted index.
But I can't change the mapping.
How can I access text fields in a painless filter script?
(This is similar to ElasticSearch exact match on text field with script but more specific on the filtering script)
ScriptQuery only supports doc_values.
Doc values are the on-disk data structure, built at document index time, which makes this data access pattern possible. They store the same values as the _source but in a column-oriented fashion that is way more efficient for sorting and aggregations. Doc values are supported on almost all field types, with the notable exception of text and annotated_text fields.
As per discussion here
https://github.com/elastic/elasticsearch/issues/30984
Accessing the _source field is slow and something that we don't want to expose in the ScriptQuery because it would be need to be accessed on every document making the search very inefficient.
So you will either need to add a keyword sub-field to the mapping and reindex the data, or enable fielddata on the text field, which will consume a large amount of memory.
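As a rough sketch of the first option, the request bodies could be built like this in Python. The sub-field name keyword and both helper names are assumptions chosen for illustration, and the mapping body assumes the text field has no custom analyzer settings that would have to be repeated.

```python
def keyword_subfield_mapping(field, subfield="keyword"):
    """Body for a put-mapping request that adds a keyword sub-field to a text field."""
    return {
        "properties": {
            field: {
                "type": "text",
                "fields": {subfield: {"type": "keyword"}},
            }
        }
    }


def exact_match_query(field, value, subfield="keyword"):
    """Bool query whose filter is an exact term match on the keyword sub-field."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {f"{field}.{subfield}": value}}]
            }
        }
    }


mapping_body = keyword_subfield_mapping("my_field")
query_body = exact_match_query("my_field", "MY_VALUE")
```

With a keyword sub-field in place and the existing documents reindexed, a plain term filter does the exact match, so no script is needed at all.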

Elastic search array of objects nested range aggregation

I'm trying to make range aggregation on the following data set:
{
"ProductType": 1,
"ProductDefinition": "fc588f8e-14f2-4871-891f-c73a4e3d17ca",
"ParentProduct": null,
"Sku": "074617",
"VariantSku": null,
"Name": "Paraboot Avoriaz/Jannu Marron Brut Marron Brown Hiking Boot Shoes",
"AllowOrdering": true,
"Rating": null,
"ThumbnailImageUrl": "/media/1106/074617.jpg",
"PrimaryImageUrl": "/media/1106/074617.jpg",
"Categories": [
"399d7b20-18cc-46c0-b63e-79eadb9390c7"
],
"RelatedProducts": [],
"Variants": [
"84a7ff9f-edf0-4aab-87f9-ba4efd44db74",
"e2eb2c50-6abc-4fbe-8fc8-89e6644b23ef",
"a7e16ccc-c14f-42f5-afb2-9b7d9aefbc5c"
],
"PriceGroups": [
"86182755-519f-4e05-96ef-5f93a59bbaec"
],
"DisplayName": "Paraboot Avoriaz/Jannu Marron Brut Marron Brown Hiking Boot Shoes",
"ShortDescription": "",
"LongDescription": "<ul><li>Paraboot Avoriaz Mountaineering Boots</li><li>Marron Brut Marron (Brown)</li><li>Full leather inners and uppers</li><li>Norwegien Welted Commando Sole</li><li>Hand made in France</li><li>Style number : 074617</li></ul><p>As featured on Pritchards.co.uk</p>",
"UnitPrices": {
"EUR 15 pct": 343.85
},
"Taxes": {
"EUR 15 pct": 51.5775
},
"PricesInclTax": {
"EUR 15 pct": 395.4275
},
"Slug": "paraboot-avoriazjannu-marron-brut-marron-brown-hiking-boot-shoes",
"VariantsProperties": [
{
"Key": "ShoeSize",
"Value": "8"
},
{
"Key": "ShoeSize",
"Value": "10"
},
{
"Key": "ShoeSize",
"Value": "6"
}
],
"Guid": "0d4f6899-c66a-4416-8f5d-26822c3b57ae",
"Id": 178,
"ShowOnHomepage": true
}
I'm aggregating on VariantsProperties which have the following mapping
"VariantsProperties": {
"type": "nested",
"properties": {
"Key": {
"type": "keyword"
},
"Value": {
"type": "keyword"
}
}
}
Terms aggregations are working fine with following code:
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"terms": {
"field": "VariantsProperties.Key"
},
"aggs": {
"values": {
"terms": {
"field": "VariantsProperties.Value"
}
}
}
}
}
}
}
}
However when I try to do a range aggregation to get shoes in size between 8 - 12 such as:
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"range": {
"field": "VariantsProperties.Value",
"ranges": [ { "from": 8, "to": 12 }]
}
}
}
}
}
}
I get the following error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "product-avenueproductindexdefinition-24476f82-en-us",
"node": "ejgN4XecT1SUfgrhzP8uZg",
"reason": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
}
},
"status": 400
}
Is there a way to "transform" the terms aggregation into a range aggregation, without the need of changing the schema? I know I could build the ranges myself by extracting the data from the terms aggregation and building the ranges out of it, however, I would prefer a solution within the elastic itself.
There are two ways to solve this:
Option A: Use a script instead of a field. This option will work without having to reindex your data, but depending on your volume of data, the performance might suffer.
POST test/_search
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"range": {
"script": "Integer.parseInt(doc['VariantsProperties.Value'].value)",
"ranges": [
{
"from": 8,
"to": 12
}
]
}
}
}
}
}
}
Option B: Add an integer sub-field in your mapping.
PUT my-index/_mapping
{
"properties": {
"VariantsProperties": {
"type": "nested",
"properties": {
"Key": {
"type": "keyword"
},
"Value": {
"type": "keyword",
"fields": {
"numeric": {
"type": "integer",
"ignore_malformed": true
}
}
}
}
}
}
}
Once your mapping is modified, you can run _update_by_query on your index in order to reindex the VariantsProperties.Value data:
POST my-index/_update_by_query
Finally, when this last command is done, you can run the range aggregation on the VariantsProperties.Value.numeric field.
Also note that this second option will be more performant in the long term.
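For comparison, the client-side fallback mentioned in the question (building the ranges from the terms aggregation output) could look roughly like the sketch below. The function name range_counts and the sample buckets are hypothetical; the bucket shape follows the usual key/doc_count pairs of a terms aggregation, and "from" is treated as inclusive and "to" as exclusive, as in the range aggregation.

```python
def range_counts(term_buckets, ranges):
    """Count documents per [from, to) range from terms-aggregation buckets.

    term_buckets: list of {"key": str, "doc_count": int}.
    ranges: list of {"from": number, "to": number}.
    """
    results = []
    for r in ranges:
        total = 0
        for bucket in term_buckets:
            try:
                value = int(bucket["key"])
            except ValueError:
                continue  # skip non-numeric keys, similar in spirit to ignore_malformed
            if r["from"] <= value < r["to"]:
                total += bucket["doc_count"]
        results.append({"from": r["from"], "to": r["to"], "doc_count": total})
    return results


# Hypothetical buckets for the shoe sizes of the sample document.
buckets = [
    {"key": "8", "doc_count": 1},
    {"key": "10", "doc_count": 1},
    {"key": "6", "doc_count": 1},
]
counts = range_counts(buckets, [{"from": 8, "to": 12}])
```

This only sees the buckets the terms aggregation actually returned (it returns a limited number by default), which is one more reason to prefer Option A or Option B when the set of values is large.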

Using Elasticsearch, how do I apply function scores to documents which conditionally have a property

I have a handful of indexes, some of which have a particular date property indicating when it was published (date_publish), and others do not. I am trying to apply a gauss function to decay the score of documents which were published a long time ago. The relevant indexes are correctly configured to recognise the date_publish property as a date.
I have set up my query as follows, specifically filtering documents which do not have the property:
{
"index": "index_contains_prop,index_does_not_contains_prop",
"body": {
"query": {
"function_score": {
"score_mode": "avg",
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"source": "0"
}
}
},
{
"filter": {
"exists": {
"field": "date_publish"
}
},
"gauss": {
"date_publish": {
"origin": "now",
"scale": "728d",
"offset": "7d",
"decay": 0.5
}
}
}
]
}
},
"from": 0,
"size": 1000
}
}
However, the query errors with the following:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "unknown field [date_publish]",
"line": 1,
"col": 0
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "index_does_not_contains_prop",
"node": "1hfXZK4TT3-K288nIr0UWA",
"reason": {
"type": "parsing_exception",
"reason": "unknown field [date_publish]",
"line": 1,
"col": 0
}
}
]
},
"status": 400
}
I have RTFM'd many times, and I can't see any discrepancy. I have also tried wrapping the exists condition in a bool:must object, to no avail.
Have I misunderstood the purpose of the filter argument?
The filter only controls which documents the function is applied to; the gauss function itself is still parsed on every index the search targets, and it needs date_publish to be defined in that index's mapping. An exists query on an unmapped field simply matches nothing, so the filter is not the part that fails. This is why you're getting the error: index_does_not_contains_prop does not have date_publish mapped. You can use the put mapping API to add this field to the indexes that don't have it (it won't change any document), and then your query should work.
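A small Python sketch of that fix, under stated assumptions: it takes a mapping response shaped like the typeless GET _mapping output (older versions nest the properties under a type name), finds the indexes where the field is missing, and builds the put-mapping body. Both helper names and the sample response are made up for illustration.

```python
def indexes_missing_field(mappings, field):
    """Names of indexes whose mapping has no top-level property called `field`."""
    missing = []
    for index_name, body in mappings.items():
        properties = body.get("mappings", {}).get("properties", {})
        if field not in properties:
            missing.append(index_name)
    return sorted(missing)


def date_field_mapping(field):
    """Put-mapping body that declares `field` as a date."""
    return {"properties": {field: {"type": "date"}}}


# Hypothetical mapping response for the two indexes in the question.
mappings = {
    "index_contains_prop": {"mappings": {"properties": {"date_publish": {"type": "date"}}}},
    "index_does_not_contains_prop": {"mappings": {"properties": {"title": {"type": "text"}}}},
}
to_fix = indexes_missing_field(mappings, "date_publish")
mapping_body = date_field_mapping("date_publish")
```

Each index in to_fix would then receive mapping_body through the put mapping API; documents without the field are still excluded from the decay function by the exists filter.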

ElasticSearch - Bucket average with Nested fields aggregation

I am trying to execute the following query in Elasticsearch. The scenario: one field in the document is an array of objects, each with three subfields: time1, time2, and id.
I want to calculate the average of the difference between time2 and time1 across all the items.
Query being executed is :
{
"query":{"match_all":{}},
"aggs":{
"total_time_diff":{
"nested":{"path":"diff_list"},
"aggs":{
"diff_r":{
"sum":"doc['time2'].date.getMills()-doc['time1'].date.getMills()"
}
}
},
// Here I need average of the sum which is calculated in total_time_diff "sum" aggregation
"avg_diff":{
"avg_bucket":{"buckets_path":"total_time_diff"}
}
}
}
I am getting the following error:
{
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "",
"phase": "fetch",
"grouped": true,
"failed_shards": [],
"caused_by": {
"type": "class_cast_exception",
"reason": "org.elasticsearch.search.aggregations.bucket.nested.InternalNested cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation"
}
},
"status": 503
}
Index Mapping
{
"my_index": {
"mappings": {
"response_index": {
"date_detection": false,
"diff_list": {
"type": "nested",
"properties": {
"age": {
"type": "long"
},
"time2": {
"type": "date"
},
"time1": {
"type": "date"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
Thank you in advance.
"aggs":{
"diff_r":{
"sum":"doc['time2'].date.getMills()-doc['time1'].date.getMills()"
}
}
is not a multi-bucket aggregation, so total_time_diff won't work as the buckets_path of the last aggregation (avg_diff).
Use a script instead, like:
"script": "doc['time2'].date.getMillis()-doc['time1'].date.getMillis()"
Let us know if it works.
I found a different solution to your problem. Instead of computing the sum in a script and then running a bucket aggregation over it, I used an avg aggregation with a script.
The avg_bucket aggregation will not work here as a sibling aggregation, because the aggregation doing the sum is not a multi-bucket aggregation.
I changed the script to compute the difference between the two date fields. The following query should work for you.
{
"size": 0,
"aggs": {
"total_time_diff": {
"nested": {
"path": "diff_list"
},
"aggs": {
"diff_r": {
"avg": {
"script": {
"source": "doc['diff_list.time2'].value.millis - doc['diff_list.time1'].value.millis"
}
}
}
}
}
}
}
Hope this works for you.
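To sanity-check what the nested avg aggregation should return, the same computation can be emulated in plain Python. This is a sketch with made-up sample documents and helper names; timestamps are assumed to be ISO-8601 strings in UTC. Every nested diff_list item contributes one value to the average, regardless of which parent document it belongs to.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis(iso_timestamp):
    """Epoch milliseconds for an ISO-8601 timestamp (assumed to be UTC)."""
    dt = datetime.fromisoformat(iso_timestamp).replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)


def average_diff(docs):
    """Average of time2 - time1 (in ms) over every nested diff_list item."""
    diffs = [
        millis(item["time2"]) - millis(item["time1"])
        for doc in docs
        for item in doc["diff_list"]
    ]
    return sum(diffs) / len(diffs)


# Hypothetical documents: differences of 10 s, 20 s and 60 s.
docs = [
    {"diff_list": [
        {"id": "a", "time1": "2018-01-01T00:00:00", "time2": "2018-01-01T00:00:10"},
        {"id": "b", "time1": "2018-01-01T00:00:00", "time2": "2018-01-01T00:00:20"},
    ]},
    {"diff_list": [
        {"id": "c", "time1": "2018-01-01T00:00:00", "time2": "2018-01-01T00:01:00"},
    ]},
]
avg_ms = average_diff(docs)
```

Comparing avg_ms for a handful of known documents against the diff_r value in the response is a quick way to confirm the script is reading the intended fields.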

Nested type in Elasticsearch: "object mapping can't be changed from nested to non-nested" when indexing a document

I try to index some nested documents into an Elasticsearch (v2.3.1) mapping which looks as follows (based on this example from the documentation):
PUT /my_index
{
"mappings": {
"blogpost": {
"properties": {
"title": { "type": "string" },
"comments": {
"type": "nested",
"properties": {
"name": { "type": "string" },
"comment": { "type": "string" }
}
}
}
}
}
}
However, I do not understand what my JSON documents have to look like in order to fit into that mapping. I tried with
PUT /my_index/some_type/1
{
"title": "some_title",
"comments": {
"name": "some_name",
"comment": "some_comment"
}
}
as well as with
PUT /my_index/some_type/1
{
"title": "some_title",
"comments": [
{
"name": "some_name",
"comment": "some_comment"
}
]
}
which both result in
{
"error":
{
"root_cause":
[
{
"type": "remote_transport_exception",
"reason": "[Caiman][172.18.0.4:9300][indices:data/write/index[p]]"
}
],
"type": "illegal_argument_exception",
"reason": "object mapping [comments] can't be changed from nested to non-nested"
},
"status": 400
}
Which is the correct format to index nested documents? Any working examples are much appreciated, most examples here at SO or on other pages concentrate on nested queries rather than how the documents have been indexed before.
It seems you're actually creating a document of type some_type, where comments defaults to a normal object (i.e. not nested). That is not allowed, since a nested object called comments already exists in the blogpost mapping type of the same index.
Try this instead and it should work:
PUT /my_index/blogpost/1
{
"title": "some_title",
"comments": {
"name": "some_name",
"comment": "some_comment"
}
}
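A hedged Python sketch of building such a request: it keeps the type in the path aligned with the mapping type and normalizes the nested field to a list, which is purely a stylistic choice here since the answer above indexes a single object. The helper names and the nested_fields parameter are hypothetical.

```python
def as_nested_list(value):
    """Wrap a single object in a list; leave lists untouched."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def index_request(index, doc_type, doc_id, document, nested_fields=("comments",)):
    """Return (path, body) for a PUT that indexes `document` under `doc_type`."""
    body = dict(document)
    for field in nested_fields:
        if field in body:
            body[field] = as_nested_list(body[field])
    return f"/{index}/{doc_type}/{doc_id}", body


path, body = index_request(
    "my_index",
    "blogpost",  # must be the type that declares comments as nested
    1,
    {"title": "some_title", "comments": {"name": "some_name", "comment": "some_comment"}},
)
```

The important part is the second path segment: using blogpost rather than some_type is what avoids the conflicting non-nested mapping for comments.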
