How to filter on nested document length by script in Elasticsearch - elasticsearch

I am trying to filter documents that have at least a given amount of items in a nested field, but I keep getting the following exception:
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "No field found for [items] in mapping"
}
Here's an example code to reproduce:
PUT store
{
"mappings": {
"properties": {
"subject": {
"type": "keyword"
},
"items": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"count": {
"type": "integer"
}
}
}
}
}
}
POST store/_bulk?refresh=true
{"create":{"_index":"store","_id":"1"}}
{"type":"appliance","items":[{"name":"Color TV"}]}
{"create":{"_index":"store","_id":"2"}}
{"type":"vehicle","items":[{"name":"Car"},{"name":"Bicycle"}]}
{"create":{"_index":"store","_id":"3"}}
{"type":"instrument","items":[{"name":"Guitar"},{"name":"Piano"},{"name":"Drums"}]}
GET store/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "doc['items'].size() > 1"
}
}
}
]
}
}
}
Please note that this is only a simplified filter script of what I really wanted to do, and if I can get over this, I will probable be able to solve my task as well.
Any help would be appreciated.

I ended up solving it with a custom score approach:
GET store/_search
{
"min_score": 0.1,
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"source": "params['_source']['items'].length > 1 ? 1 : 0"
}
}
}
]
}
}
}

Related

Is there is any way to iterate elastic array document like other programming language with script

Mapping
{
"supply": {
"properties": {
"rotation_list": {
"type": "nested",
"properties": {
"project_end_date": {
"type": "nested",
"properties": {
"end_date": {
"type": "date",
"format": "yyyy-MM-ddTHH:mm:ss"
}
}
},
"total_days": {
"type": "integer"
}
}
}
}
}}
Data
{"rotation_list": [
{
"project_end_date": [
{
"end_date": "2020-08-07"
},
{
"end_date": "2020-06-07"
}
],
"total_days": 23
},
{
"project_end_date": [
{
"end_date": "2020-08-07"
}
],
"total_days": 26
}]}
query
{"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "rotation_list.project_end_date",
"query": {
"script": {
"script": {
"lang": "groovy",
"inline": "import org.elasticsearch.common.logging.*;logger=ESLoggerFactory.getLogger('myscript');def ratable =false;logger.info(doc['rotation_list.project_end_date.end_date'].values)"
}
}
}
}
}
]
}
}
}}}
Log result
[INFO ][myscript] [1596758400000] [INFO ][myscript] [1591488000000] [INFO ][myscript] [1596758400000]
I am not sure why this is happning. Is there is any way to iterate like [1596758400000, 1591488000000] and [1596758400000].
Data is saved like this as well. I have mentioned in the mapping as well nested type. Not sure why this is returning like this. Is there is any way to iterate like original document i have indexed.
It's impossible to access a nested doc's nested neighbor in a script query due to the nature of nested whereby each (sub)document is treated as a separate document -- be it on the top level or within an array of objects like your rotation_list.project_end_date.
The only permissible situation of having access to the whole context of a nested field is within script_fields -- but you unfortunately cannot query by them -- only construct them on the fly & retrieve them:
Using your mapping from above
GET supply_nested/_search
{
"script_fields": {
"combined_end_dates": {
"script": {
"lang": "painless",
"source": "params['_source']['rotation_list'][0]['project_end_date']"
}
}
}
}
Iterating within a script query be possible only if rotation_list alone were nested but not project_end_date. Using 7.x here:
PUT supply_non_nested
{
"mappings": {
"properties": {
"rotation_list": {
"type": "nested",
"properties": {
"project_end_date": {
"type": "object",
"properties": {
"end_date": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
},
"total_days": {
"type": "integer"
}
}
}
}
}
}
Sync a doc:
POST supply_non_nested/_doc
{
"rotation_list": [
{
"project_end_date": [
{
"end_date": "2020-08-07"
},
{
"end_date": "2020-06-07"
}
],
"total_days": 23
},
{
"project_end_date": [
{
"end_date": "2020-08-07"
}
],
"total_days": 26
}
]
}
Query using painless instead of groovy because it's more secure & less verbose in this case:
GET supply_non_nested/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "rotation_list",
"query": {
"script": {
"script": {
"lang": "painless",
"inline": "Debug.explain(doc['rotation_list.project_end_date.end_date'])"
}
}
}
}
}
]
}
}
}
}
}
yielding
...
"reason": {
...
"to_string": "[2020-06-07T00:00:00.000Z, 2020-08-07T00:00:00.000Z]",
"java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Dates",
}
...
It's not exactly clear from your snippet what you were trying to achieve in the query. Can you elaborate?

How do I get the size of a 'nested' type array through a Painless script in Elasticsearch version 6.7?

I am using Elasticsearch version 6.7. I have the following mapping:
{
"customers": {
"mappings": {
"customer": {
"properties": {
"name": {
"type": "keyword"
},
"permissions": {
"type": "nested",
"properties": {
"entityId": {
"type": "keyword"
},
"entityType": {
"type": "keyword"
},
"permission": {
"type": "keyword"
},
"permissionLevel": {
"type": "keyword"
},
"userId": {
"type": "keyword"
}
}
}
}
}
}
}
}
I want to run a query to that shows all customers who have > 0 permissions. I have tried the following:
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"lang": "painless",
"source": "params._source != null && params._source.permissions != null && params._source.permissions.size() > 0"
}
}
}
}
}
}
But this returns no hits because params._source is null as Painless does not have access to the _source document according to this Stackoverflow post. How can I write a Painless script that gives me all customers who have > 0 permissions?
Solution 1: Using Script with must query
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"lang": "painless",
"inline": """
ArrayList st = params._source.permissions;
if(st!=null && st.size()>0)
return true;
"""
}
}
}
]
}
}
}
Solution 2: Using Exists Query on nested fields
You could simply make use of Exists query something like the below to get customers who have > 0 permissions.
Query:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "permissions",
"query": {
"bool": {
"should": [
{
"exists":{
"field": "permissions.permission"
}
},
{
"exists":{
"field": "permissions.entityId"
}
},
{
"exists":{
"field": "permissions.entityType"
}
},
{
"exists":{
"field": "permissions.permissionLevel"
}
}
]
}
}
}
}]
}
}
}
Solution 3: Create definitive structure but add empty values to the fields
Another alternative would be to ensure all documents would have the fields.
Basically,
Ensure that all the documents would have the permissions nested document
However for those who would not have the permissions, just set the field permissions.permission to 0
Construct a query that could help you get such documents accordingly
Below would be a sample document for a user who doesn't have permissions:
POST mycustomers/customer/1
{
"name": "john doe",
"permissions": [
{
"entityId" : "null",
"entityType": "null",
"permissionLevel": 0,
"permission": 0
}
]
}
The query in that case would be as simple as this:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "permissions",
"query": {
"range": {
"permissions.permission": {
"gte": 1
}
}
}
}
}
]
}
}
}
Hope this helps!

Elasticsearch search length of array

Here is my object profile:
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text"
},
"posts": {
"properties": {
"id": {
"type": "text"
},
"create_date": {
"type": "long"
}
}
}
}
}
}
}
I want to make a search: return all profiles which
1. have name "bob"
2. and have more than 5 posts
Here is an example that I found, but it does not work
{
"query": {
"bool": {
"must": [
{
"term": {
"name": "bob"
}
}
],
"filter": [
{
"script": {
"script": "doc['posts'].values.size() > 5"
}
}
]
}
}
}
I get error "reason":"Variable [posts] is not defined."
update posts.id to keyword
{"id": {"type":"text"},"fields":{{"keyword":{"type":"keyword","ignore_above":256}}}}
Same error
"caused_by":{"type":"script_exception","reason":"compile error","script_stack":["doc[posts.id].values.size() > ..."," ^---- HERE"],"script":"doc[posts.id].values.size() > 5","lang":"painless","caused_by":{"type":"illegal_argument_exception","reason":"Variable [posts] is not defined."}}}}]},"status":400
According to https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-fields.html this is because one document is missing the field posts (I suppose).
You could add a filter: "if field posts exists" or use in the script the condition "if doc.containsKey('posts')...."

ElasticSearch query on tags

I am trying to crack the elasticsearch query language, and so far I'm not doing very good.
I've got the following mapping for my documents.
{
"mappings": {
"jsondoc": {
"properties": {
"header" : {
"type" : "nested",
"properties" : {
"plainText" : { "type" : "string" },
"title" : { "type" : "string" },
"year" : { "type" : "string" },
"pages" : { "type" : "string" }
}
},
"sentences": {
"type": "nested",
"properties": {
"id": { "type": "integer" },
"text": { "type": "string" },
"tokens": { "type": "nested" },
"rhetoricalClass": { "type": "string" },
"babelSynsetsOcc": {
"type": "nested",
"properties" : {
"id" : { "type" : "integer" },
"text" : { "type" : "string" },
"synsetID" : { "type" : "string" }
}
}
}
}
}
}
}
}
It mainly resembles a JSON file referring to a pdf document.
I have been trying to make queries with aggregations and so far is going great. I've gotten to the point of grouping by (aggregating) rhetoricalClass, get the total number of repetitions of babelSynsetsOcc.synsetID. Heck, even the same query even by grouping the whole result by header.year
But, right now, I am struggling with filtering the documents that contain a term and doing the same query.
So, how could I make a query such that grouping by rhetoricalClass and only taking into account those documents whose field header.plainText contains either ["Computational", "Compositional", "Semantics"]. I mean contain instead of equal!.
If I were to make a rough translation to SQL it would be something similar to
SELECT count(sentences.babelSynsetsOcc.synsetID)
FROM jsondoc
WHERE header.plainText like '%Computational%' OR header.plainText like '%Compositional%' OR header.plainText like '%Sematics%'
GROUP BY sentences.rhetoricalClass
WHERE clauses are just standard structured queries, so they translate to queries in Elasticsearch.
GROUP BY and HAVING loosely translate to aggregations in Elasticsearch's DSL. Functions like count, min max, and sum are a function of GROUP BY and it's therefore also an aggregation.
The fact that you're using nested objects may be necessary, but it adds an extra layer to each part that touches them. If those nested objects are not arrays, then do not use nested; use object in that case.
I would probably look at translating your query to:
{
"query": {
"nested": {
"path": "header",
"query": {
"bool": {
"should": [
{
"match": {
"header.plainText" : "Computational"
}
},
{
"match": {
"header.plainText" : "Compositional"
}
},
{
"match": {
"header.plainText" : "Semantics"
}
}
]
}
}
}
}
}
Alternatively, it could be rewritten as this, which is a little less obvious of its intent:
{
"query": {
"nested": {
"path": "header",
"query": {
"match": {
"header.plainText": "Computational Compositional Semantics"
}
}
}
}
}
The aggregation would then be:
{
"aggs": {
"nested_sentences": {
"nested": {
"path": "sentences"
},
"group_by_rhetorical_class": {
"terms": {
"field": "sentences.rhetoricalClass",
"size": 10
},
"aggs": {
"nested_babel": {
"path": "sentences.babelSynsetsOcc"
},
"aggs": {
"count_synset_id": {
"count": {
"field": "sentences.babelSynsetsOcc.synsetID"
}
}
}
}
}
}
}
}
Now, if you combine them and throw away hits (since you're just looking for the aggregated result), then it looks like this:
{
"size": 0,
"query": {
"nested": {
"path": "header",
"query": {
"match": {
"header.plainText": "Computational Compositional Semantics"
}
}
}
},
"aggs": {
"nested_sentences": {
"nested": {
"path": "sentences"
},
"group_by_rhetorical_class": {
"terms": {
"field": "sentences.rhetoricalClass",
"size": 10
},
"aggs": {
"nested_babel": {
"path": "sentences.babelSynsetsOcc"
},
"aggs": {
"count_synset_id": {
"count": {
"field": "sentences.babelSynsetsOcc.synsetID"
}
}
}
}
}
}
}
}

Is it possible to update nested field by query?

I am using update by query plugin (https://github.com/yakaz/elasticsearch-action-updatebyquery/) to update documents by query.
In my case, there is nested field in document, the mapping is something like this:
"mappings": {
"mytype": {
"properties": {
"Myfield1": {
"type": "nested",
"properties": {
"field1": {
"type": "string"
},
"field2": {
"type": "long"
}
}
},
"Title": {
"type": "string"
}
}
}
}
Then I want to update the nested field Myfield1 by query with following request:
But unfortunately, it does not work.
{
"query": {
"match": {
"Title": "elasticsearch"
}
},
"script": "ctx._source.Myfield1 = [{'nestfield1':'foo blabla...','nestfield2':100},{'nestfield1':'abc...','nestfield2':200}]"
}
Does update by query support nested object?
BTW: any other ways to update document by query?
Is the update by query plugin the only choice?
This example uses _update_by_query
POST indexname/type/_update_by_query
{
"query": {
"match": {
"Title": "elasticsearch"
}
},
"script": {
"source": "ctx._source.Myfield1= params.mifieldAsParam",
"params": {
"mifieldAsParam": [
{
"nestfield1": "foo blabla...",
"nestfield2": 100
},
{
"nestfield1": "abc...",
"nestfield2": 200
}
]
},
"lang": "painless"
}
}
Nested elements need to be iterated in painless script to update values
POST /index/_update_by_query
{
"script": {
"source": "for(int i=0;i<=ctx._source['Myfield1'].size()-1;i++){ctx._source.Myfield1[i].field1='foo blabla...';ctx._source.Myfield1[i].field2=100}",
"lang": "painless"
},
"query": {
"match": {
"Title": "elasticsearch"
}
}
}
Nested elements value update if index is known
POST /index/_update_by_query
{
"script": {
"source": "ctx._source.Myfield1[0].field1='foo blabla...';ctx._source.Myfield1[0].field2=100;ctx._source.Myfield1[1].field1='abc...';ctx._source.Myfield1[1].field2=200;",
"lang": "painless"
},
"query": {
"match": {
"Title": "elasticsearch"
}
}
}
You can try with params, something like this:
"query" : {
"match_all" : {}
},
"script" : "ctx._source.Myfield1 = Myfield1;",
"params": {
"Myfield1": {
"nestfield1": "foo blabla..."
}
}
In my case I'm moving the data from not nested fields in nested fields. I need to add fake information to initialize the nested field. It looks like that:
"query" : {
"match_all" : {}
},
"script" : "ctx._source.Myfield1 = Myfield1; ctx._source.Myfield1.nestfield1 = ctx._source.Myfield1Nestfield1; ctx._source.Myfield1.nestfield2 = ctx._source.Myfield1Nestfield2;",
"params": {
"Myfield1": {
"nestfield1": "init_data"
}
}

Resources