How to use a nested field in an Elasticsearch filter script

I have the following mappings:
"properties": {
"created": {
"type": "date"
},
"id": {
"type": "keyword"
},
"identifier": {
"type": "keyword"
},
"values": {
"properties": {
"description_created-date": {
"properties": {
"<all_channels>": {
"properties": {
"<all_locales>": {
"type": "date"
}
}
}
}
},
"footwear_size-option": {
"properties": {
"<all_channels>": {
"properties": {
"<all_locales>": {
"type": "keyword"
}
}
}
}
}
}
}
}
Now I would like to create a query based on the description_created-date field and use its value in a Painless script, comparing it to some date.
GET index/pim_catalog_product/_search
{
"query": {
"constant_score": {
"filter": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "doc['values']['description_created-date']['<all_channels>']['<all_locales>'].value == '2019-12-19'",
"lang": "painless"
}
}
}
]
}
}
}
}
}
But I get the following error:
{
"shard": 0,
"index": "index",
"node": "cmh1RMS1SHO92SA3jPAkJA",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:39)",
"doc['values']['description_created-date']['<all_channels>']['<all_locales>'].value == '2019-12-19'",
" ^---- HERE"
],
"script": "doc['values']['description_created-date']['<all_channels>']['<all_locales>'].value == '2019-12-19'",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field found for [values] in mapping with types [pim_catalog_product]"
}
}
}
(I know I can't compare dates like this, but this is another problem).
Searching by the values.description_created-date field works:
GET index/pim_catalog_product/_search
{
"query": {
"match": {
"values.description_created-date.<all_channels>.<all_locales>": "2019-12-19"
}
}
}
And when I get a specific document, the value of this field looks like this:
"values": {
"description_created-date": {
"<all_channels>": {
"<all_locales>": "2019-12-19"
}
}
}
How can I use this field in a script filter? I need it to do something like this:
(pseudocode)
"source": "doc['values']['stocks_created-date'].value > doc['created'].value + 2 days"
I'm using Elasticsearch v6.5.0. Here is a docker-compose file with Elasticsearch and Kibana:
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.0
    environment:
      - discovery.type=single-node
    ports:
      - 9200:9200
  kibana:
    image: docker.elastic.co/kibana/kibana:6.5.0
    ports:
      - 5601:5601
and a gist with the full mappings and sample data here.
Thanks.

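Thanks for the expanded mappings! Inside a script, doc is keyed by the full dotted path of each leaf field, so when you need a field inside an object field such as values, refer to the inner field using dot notation instead of chained brackets. Example:
"source": "doc['values.description_created-date.<all_channels>.<all_locales>'].value.getYear() == 2019"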
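To see why doc['values'] fails while the dotted path works, here is a simplified model in Python (not Elasticsearch code, and leaf_paths is a hypothetical helper): it flattens the sample _source from the question into one key per leaf field, which is roughly how doc-values lookups are keyed. There is no entry for the intermediate object values itself.

```python
# Simplified model (not Elasticsearch code): doc values are keyed by the full
# dotted path of each leaf field, so there is no entry for the object "values".

def leaf_paths(source, prefix=""):
    """Flatten a nested _source dict into {dotted.leaf.path: value}."""
    flat = {}
    for key, value in source.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Object field: descend, it has no value of its own
            flat.update(leaf_paths(value, path))
        else:
            flat[path] = value
    return flat

sample = {
    "values": {
        "description_created-date": {
            "<all_channels>": {"<all_locales>": "2019-12-19"}
        }
    }
}

keys = leaf_paths(sample)
print(keys)
# {'values.description_created-date.<all_channels>.<all_locales>': '2019-12-19'}
```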
Also, you don't need the inner bool query: the script filter can go directly inside the constant_score query. Example:
GET index/_search
{
"query": {
"constant_score": {
"filter": {
"script": {
"script": {
"source": "doc['values.description_created-date.<all_channels>.<all_locales>'].value.getYear() == 2019"
}
}
},
"boost": 1
}
}
}
NOTE: The "boost" value is optional; if you don't provide it, it defaults to 1.
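The pseudocode in the question (a stocks date later than created + 2 days) could be expressed along the following lines. This Python sketch only builds the request body. It assumes Elasticsearch 6.x, where date doc values are Joda DateTime objects exposing getMillis(), and it assumes values.stocks_created-date uses the same <all_channels>/<all_locales> layout as description_created-date; date_gap_filter is a hypothetical helper name. The size() checks skip documents where either date is missing.

```python
import json

# Assumption: two days expressed in epoch milliseconds, the unit date doc values use.
TWO_DAYS_MS = 2 * 24 * 60 * 60 * 1000

def date_gap_filter(later_field, earlier_field, min_gap_ms):
    """Hypothetical helper: constant_score + script filter comparing two dates.

    Assumes Elasticsearch 6.x, where doc[...].value for a date field exposes
    getMillis(). Field names are passed as params so the source stays fixed.
    """
    source = (
        "doc[params.later].size() != 0 && doc[params.earlier].size() != 0 && "
        "doc[params.later].value.getMillis() > "
        "doc[params.earlier].value.getMillis() + params.gap"
    )
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "script": {
                        "script": {
                            "lang": "painless",
                            "source": source,
                            "params": {
                                "later": later_field,
                                "earlier": earlier_field,
                                "gap": min_gap_ms,
                            },
                        }
                    }
                }
            }
        }
    }

# Assumed path, following the layout of description_created-date
body = date_gap_filter(
    "values.stocks_created-date.<all_channels>.<all_locales>", "created", TWO_DAYS_MS
)
print(json.dumps(body, indent=2))
```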

Related

How to filter on nested document length by script in Elasticsearch

I am trying to filter documents that have at least a given number of items in a nested field, but I keep getting the following exception:
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "No field found for [items] in mapping"
}
Here's example code to reproduce the problem:
PUT store
{
"mappings": {
"properties": {
"subject": {
"type": "keyword"
},
"items": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"count": {
"type": "integer"
}
}
}
}
}
}
POST store/_bulk?refresh=true
{"create":{"_index":"store","_id":"1"}}
{"type":"appliance","items":[{"name":"Color TV"}]}
{"create":{"_index":"store","_id":"2"}}
{"type":"vehicle","items":[{"name":"Car"},{"name":"Bicycle"}]}
{"create":{"_index":"store","_id":"3"}}
{"type":"instrument","items":[{"name":"Guitar"},{"name":"Piano"},{"name":"Drums"}]}
GET store/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "doc['items'].size() > 1"
}
}
}
]
}
}
}
Please note that this is only a simplified version of the filter script I really need, and if I can get past this, I will probably be able to solve my task as well.
Any help would be appreciated.
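I ended up solving it with a custom score approach: the script scores documents with more than one item as 1 and the rest as 0, and min_score then drops the zero-scored documents: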
GET store/_search
{
"min_score": 0.1,
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"source": "params['_source']['items'].length > 1 ? 1 : 0"
}
}
}
]
}
}
}
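The filtering effect of that function_score query can be checked with a plain-Python model over the three sample documents (no Elasticsearch client involved; search and script_score are hypothetical names). It relies on match_all scoring 1.0 and on function_score multiplying by default, so each final score is simply the script's 1 or 0.

```python
# Simplified model of the function_score + min_score query above,
# evaluated over the three sample documents in plain Python.

docs = {
    "1": {"type": "appliance", "items": [{"name": "Color TV"}]},
    "2": {"type": "vehicle", "items": [{"name": "Car"}, {"name": "Bicycle"}]},
    "3": {"type": "instrument",
          "items": [{"name": "Guitar"}, {"name": "Piano"}, {"name": "Drums"}]},
}

MIN_SCORE = 0.1

def script_score(source):
    # Mirrors: params['_source']['items'].length > 1 ? 1 : 0
    return 1 if len(source["items"]) > 1 else 0

def search(documents, min_score=MIN_SCORE):
    hits = []
    for doc_id, source in documents.items():
        # match_all scores 1.0; function_score multiplies by the script score
        score = 1.0 * script_score(source)
        if score >= min_score:
            hits.append(doc_id)
    return hits

print(search(docs))  # ['2', '3']
```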

Is there any way to iterate over an Elasticsearch array field in a script, like in other programming languages?

Mapping
{
"supply": {
"properties": {
"rotation_list": {
"type": "nested",
"properties": {
"project_end_date": {
"type": "nested",
"properties": {
"end_date": {
"type": "date",
"format": "yyyy-MM-ddTHH:mm:ss"
}
}
},
"total_days": {
"type": "integer"
}
}
}
}
}}
Data
{"rotation_list": [
{
"project_end_date": [
{
"end_date": "2020-08-07"
},
{
"end_date": "2020-06-07"
}
],
"total_days": 23
},
{
"project_end_date": [
{
"end_date": "2020-08-07"
}
],
"total_days": 26
}]}
query
{"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "rotation_list.project_end_date",
"query": {
"script": {
"script": {
"lang": "groovy",
"inline": "import org.elasticsearch.common.logging.*;logger=ESLoggerFactory.getLogger('myscript');def ratable =false;logger.info(doc['rotation_list.project_end_date.end_date'].values)"
}
}
}
}
}
]
}
}
}}}
Log result
[INFO ][myscript] [1596758400000] [INFO ][myscript] [1591488000000] [INFO ][myscript] [1596758400000]
I am not sure why this is happening. Is there any way to iterate over [1596758400000, 1591488000000] and [1596758400000] as groups?
The data is stored like this as well, and I declared the nested type in the mapping, so I'm not sure why it comes back like this. Is there any way to iterate over the values the way they appear in the original document I indexed?
It's impossible to access a nested doc's sibling nested docs in a script query because of how nested works: each nested (sub)document is indexed as a separate hidden document, whether it sits at the top level or within an array of objects like your rotation_list.project_end_date. That is why your script runs once per project_end_date object and logs one value at a time.
The only place where you have access to the whole context of a nested field is script_fields (through params['_source']), but unfortunately you cannot query by them; you can only compute them on the fly and retrieve them:
Using your mapping from above
GET supply_nested/_search
{
"script_fields": {
"combined_end_dates": {
"script": {
"lang": "painless",
"source": "params['_source']['rotation_list'][0]['project_end_date']"
}
}
}
}
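The example above only reads rotation_list[0]. A minimal sketch of a script field that walks every rotation_list entry follows; it assumes, as the answer does, that params['_source'] is available in script_fields, and all_end_dates is a hypothetical field name. The Python function mirrors the Painless loop so the expected output for the sample document can be checked locally.

```python
import json

# Painless source for a script field that collects end_date from every
# rotation_list entry (assumes params['_source'] is available in script_fields).
ALL_END_DATES = """
List dates = new ArrayList();
for (def rotation : params['_source']['rotation_list']) {
  for (def p : rotation['project_end_date']) {
    dates.add(p['end_date']);
  }
}
return dates;
"""

# "all_end_dates" is a hypothetical script field name
body = {
    "script_fields": {
        "all_end_dates": {
            "script": {"lang": "painless", "source": ALL_END_DATES}
        }
    }
}

def all_end_dates(source):
    """Plain-Python mirror of the Painless loop, for checking expected output."""
    dates = []
    for rotation in source["rotation_list"]:
        for p in rotation["project_end_date"]:
            dates.append(p["end_date"])
    return dates

sample = {
    "rotation_list": [
        {"project_end_date": [{"end_date": "2020-08-07"}, {"end_date": "2020-06-07"}],
         "total_days": 23},
        {"project_end_date": [{"end_date": "2020-08-07"}], "total_days": 26},
    ]
}

print(json.dumps(body, indent=2))
print(all_end_dates(sample))  # ['2020-08-07', '2020-06-07', '2020-08-07']
```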
Iterating within a script query would be possible only if rotation_list alone were nested and project_end_date were a plain object. Using 7.x here:
PUT supply_non_nested
{
"mappings": {
"properties": {
"rotation_list": {
"type": "nested",
"properties": {
"project_end_date": {
"type": "object",
"properties": {
"end_date": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
},
"total_days": {
"type": "integer"
}
}
}
}
}
}
Index a doc:
POST supply_non_nested/_doc
{
"rotation_list": [
{
"project_end_date": [
{
"end_date": "2020-08-07"
},
{
"end_date": "2020-06-07"
}
],
"total_days": 23
},
{
"project_end_date": [
{
"end_date": "2020-08-07"
}
],
"total_days": 26
}
]
}
Query using Painless instead of Groovy (Groovy scripting was removed in Elasticsearch 6.0), which is also more secure and less verbose in this case:
GET supply_non_nested/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "rotation_list",
"query": {
"script": {
"script": {
"lang": "painless",
"source": "Debug.explain(doc['rotation_list.project_end_date.end_date'])"
}
}
}
}
}
]
}
}
}
}
}
yielding
...
"reason": {
...
"to_string": "[2020-06-07T00:00:00.000Z, 2020-08-07T00:00:00.000Z]",
"java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Dates",
}
...
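The difference between the two mappings can be summarized with a simplified Python model (not Elasticsearch internals; values_seen_by_script and epoch_millis are hypothetical helpers). Date doc values are stored as epoch milliseconds. When project_end_date is nested, each of its objects is its own hidden document, so the script sees one value at a time, as in the question's log. When it is a plain object under the nested rotation_list, its values are flattened into the rotation_list document and come back sorted, as in the Debug.explain output.

```python
from datetime import datetime, timezone

source = {
    "rotation_list": [
        {"project_end_date": [{"end_date": "2020-08-07"}, {"end_date": "2020-06-07"}],
         "total_days": 23},
        {"project_end_date": [{"end_date": "2020-08-07"}], "total_days": 26},
    ]
}

def epoch_millis(day):
    """Convert yyyy-MM-dd (UTC) to epoch milliseconds, the unit of date doc values."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

def values_seen_by_script(src, project_end_date_nested):
    """Group end_date values the way a script query sees them per (sub)document."""
    groups = []
    for rotation in src["rotation_list"]:
        millis = [epoch_millis(p["end_date"]) for p in rotation["project_end_date"]]
        if project_end_date_nested:
            # Each project_end_date object is its own hidden document
            groups.extend([m] for m in millis)
        else:
            # Plain object: values are flattened into the rotation_list
            # document, and numeric doc values come back sorted
            groups.append(sorted(millis))
    return groups

print(values_seen_by_script(source, True))
# [[1596758400000], [1591488000000], [1596758400000]]
print(values_seen_by_script(source, False))
# [[1591488000000, 1596758400000], [1596758400000]]
```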
It's not exactly clear from your snippet what you were trying to achieve in the query. Can you elaborate?

Elasticsearch search length of array

Here is my profile mapping:
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text"
},
"posts": {
"properties": {
"id": {
"type": "text"
},
"create_date": {
"type": "long"
}
}
}
}
}
}
}
I want to search for all profiles that:
1. have the name "bob"
2. have more than 5 posts
Here is an example I found, but it does not work:
{
"query": {
"bool": {
"must": [
{
"term": {
"name": "bob"
}
}
],
"filter": [
{
"script": {
"script": "doc['posts'].values.size() > 5"
}
}
]
}
}
}
I get the error "reason":"Variable [posts] is not defined."
Update: I changed posts.id to have a keyword subfield:
{"id": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}}
I get the same error:
"caused_by":{"type":"script_exception","reason":"compile error","script_stack":["doc[posts.id].values.size() > ..."," ^---- HERE"],"script":"doc[posts.id].values.size() > 5","lang":"painless","caused_by":{"type":"illegal_argument_exception","reason":"Variable [posts] is not defined."}}}}]},"status":400
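Look at the script in the compile error: it reads doc[posts.id] with no quotes around the field name, so Painless parses posts as a variable, hence "Variable [posts] is not defined." The single quotes were most likely lost on the way to Elasticsearch (for example, swallowed by shell quoting), so make sure the script arrives as doc['posts.id.keyword']. Note that doc['posts'] will not work either, because posts is an object and only its leaf fields have doc values.
Once the script compiles, documents that are missing the field can still cause problems, as described in https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-fields.html. You could add a filter "if field posts exists", or use the condition "if doc.containsKey('posts.id.keyword')...." in the script.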
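One way to avoid hand-quoting problems is to build the body programmatically and serialize it with json.dumps, which keeps the single quotes inside the Painless source intact. The sketch below assumes the posts.id.keyword subfield from the update; bob_with_many_posts is a hypothetical helper. Keyword doc values are de-duplicated per document, so the script counts distinct post ids, and a profile without posts simply has size() == 0.

```python
import json

def bob_with_many_posts(min_posts=5):
    """Hypothetical helper: name is "bob" AND more than min_posts post ids."""
    return {
        "query": {
            "bool": {
                "must": [{"term": {"name": "bob"}}],
                "filter": [
                    {
                        "script": {
                            "script": {
                                "lang": "painless",
                                # Counts distinct keyword values of posts.id;
                                # profiles without posts have size() == 0
                                "source": "doc['posts.id.keyword'].size() > params.min_posts",
                                "params": {"min_posts": min_posts},
                            }
                        }
                    }
                ],
            }
        }
    }

payload = json.dumps(bob_with_many_posts())
print(payload)
```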

Is it possible to update a nested field by query?

I am using the update-by-query plugin (https://github.com/yakaz/elasticsearch-action-updatebyquery/) to update documents by query.
In my case, the document has a nested field; the mapping is something like this:
"mappings": {
"mytype": {
"properties": {
"Myfield1": {
"type": "nested",
"properties": {
"field1": {
"type": "string"
},
"field2": {
"type": "long"
}
}
},
"Title": {
"type": "string"
}
}
}
}
Then I want to update the nested field Myfield1 by query with the following request:
{
"query": {
"match": {
"Title": "elasticsearch"
}
},
"script": "ctx._source.Myfield1 = [{'nestfield1':'foo blabla...','nestfield2':100},{'nestfield1':'abc...','nestfield2':200}]"
}
But unfortunately, it does not work. Does update by query support nested objects?
BTW, are there any other ways to update documents by query? Is the update-by-query plugin the only choice?
This example uses the built-in _update_by_query API:
POST indexname/type/_update_by_query
{
"query": {
"match": {
"Title": "elasticsearch"
}
},
"script": {
"source": "ctx._source.Myfield1= params.mifieldAsParam",
"params": {
"mifieldAsParam": [
{
"nestfield1": "foo blabla...",
"nestfield2": 100
},
{
"nestfield1": "abc...",
"nestfield2": 200
}
]
},
"lang": "painless"
}
}
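The same request can be generated in code. This sketch (replace_myfield1 is a hypothetical helper) keeps the script source constant and passes the new array through params, so Elasticsearch can reuse the compiled script across calls. It uses field1/field2 from the question's mapping rather than nestfield1/nestfield2, assuming the intent is to write the fields the mapping declares.

```python
import json

def replace_myfield1(title, new_items):
    """Hypothetical helper: _update_by_query body that overwrites Myfield1."""
    return {
        "query": {"match": {"Title": title}},
        "script": {
            "lang": "painless",
            # The source string never changes; only params do, so the
            # compiled script can be reused instead of recompiled.
            "source": "ctx._source.Myfield1 = params.items",
            "params": {"items": new_items},
        },
    }

body = replace_myfield1(
    "elasticsearch",
    [{"field1": "foo blabla...", "field2": 100}, {"field1": "abc...", "field2": 200}],
)
print(json.dumps(body, indent=2))
```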
To update values inside every nested element, iterate over the elements in the Painless script:
POST /index/_update_by_query
{
"script": {
"source": "if (ctx._source.Myfield1 != null) { for (def item : ctx._source.Myfield1) { item.field1 = 'foo blabla...'; item.field2 = 100; } }",
"lang": "painless"
},
"query": {
"match": {
"Title": "elasticsearch"
}
}
}
If you know the index of each nested element, you can update it directly:
POST /index/_update_by_query
{
"script": {
"source": "ctx._source.Myfield1[0].field1='foo blabla...';ctx._source.Myfield1[0].field2=100;ctx._source.Myfield1[1].field1='abc...';ctx._source.Myfield1[1].field2=200;",
"lang": "painless"
},
"query": {
"match": {
"Title": "elasticsearch"
}
}
}
You can try with params, something like this:
"query" : {
"match_all" : {}
},
"script" : "ctx._source.Myfield1 = Myfield1;",
"params": {
"Myfield1": {
"nestfield1": "foo blabla..."
}
}
In my case I'm moving data from non-nested fields into nested fields, so I need to add placeholder data to initialize the nested field. It looks like this:
"query" : {
"match_all" : {}
},
"script" : "ctx._source.Myfield1 = Myfield1; ctx._source.Myfield1.nestfield1 = ctx._source.Myfield1Nestfield1; ctx._source.Myfield1.nestfield2 = ctx._source.Myfield1Nestfield2;",
"params": {
"Myfield1": {
"nestfield1": "init_data"
}
}

Add an extra flag in ES query to check if a field exists

I am writing a query to get some records like this:
curl -X GET 'http://localhost:9200/posts/post/_search?from=0&size=30&pretty' -d '{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "content:(aid OR hiv)"
}
}
}
},
"fields": [
"content",
"entity_avatar_link",
"author_link",
"name"
],
size: 30,
from: 0
}'
This much is working fine and I am getting the results.
I am trying to add a script field that acts as a flag, returned with every hit, indicating whether a certain field exists in the doc. (I cannot return the field itself because in most cases it is very large: it is an embedded field.) So, I added this to the query:
"script_fields": {
"is_arranged_flag": {
"script": "!_source.arranged_retweets.empty"
}
}
So the whole query looks like this:
curl -X GET 'http://localhost:9200/posts/post/_search?from=0&size=30&pretty' -d '{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "content:(aid OR hiv)"
}
}
}
},
"fields": [
"content",
"entity_avatar_link",
"author_link",
"name"
],
"script_fields": {
"is_arranged_flag": {
"script": "!_source.arranged_retweets.empty"
}
},
size: 30,
from: 0
}'
But after adding the script_fields section, no results come back (the results are empty [] for the same search query).
I have also tried:
"script_fields": {
"is_arranged_flag": {
"script": "!doc['arranged_retweets'].empty"
}
}
What am I doing wrong?
Here is the mapping (from http://localhost:9200/posts/post/_mapping):
{
"post": {
"properties": {
"arranged_retweets": {
"properties": {
"author_gender": {
"type": "string"
},
"author_link": {
"type": "string"
}
}
},
"content": {
"type": "string",
"analyzer": "tweet_analyzer"
},
"name": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs"
},
"author_link": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs"
},
"entity_avatar_link": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs"
}
}
}
}
I think this is the valid script_fields section:
"script_fields": {
"is_arranged_flag": {
"script": "!doc['arranged_retweets'].empty"
}
}
Reference: scripting (Read the section on document fields)
I figured it out with the help of the discussion here (https://groups.google.com/forum/#!topic/elasticsearch/BJZdlFSJSRg). The field arranged_retweets is an object, and an object field has no doc values of its own, so we need to go down to an inner field such as arranged_retweets.author_gender and check whether it is empty, like this:
"script_fields": {
"is_arranged_flag": {
"script": "!doc['arranged_retweets.author_gender'].empty"
}
}
