ElasticSearch Accessing Nested Documents in Script - Null Pointer Exception

Gist: I'm trying to write a custom filter on nested documents using Painless, and I want to add checks for the case where there are no nested documents, to avoid a null_pointer_exception.
I have a mapping like this (simplified and obfuscated):
{
"video_entry" : {
"aliases" : { },
"mappings" : {
"properties" : {
"captions_added" : {
"type" : "boolean"
},
"category" : {
"type" : "keyword"
},
"is_votable" : {
"type" : "boolean"
},
"members" : {
"type" : "nested",
"properties" : {
"country" : {
"type" : "keyword"
},
"date_of_birth" : {
"type" : "date"
}
}
}
}
}
}
}
Each video_entry document can have 0 or more members nested documents.
Sample Document
{
"captions_added": true,
"category" : "Mental Health",
"is_votable" : true,
"members": [
{"country": "Denmark", "date_of_birth": "1998-04-04T00:00:00"},
{"country": "Denmark", "date_of_birth": "1999-05-05T00:00:00"}
]
}
If one or more nested documents exist, we want to write Painless scripts that check certain fields across all the nested documents. My script works on indices with a few documents, but when I try it on a larger set of documents I get null pointer exceptions despite having every null check I could think of. I've tried various access patterns and error-checking mechanisms, but I still get exceptions.
POST /video_entry/_search
{
"query": {
"script": {
"script": {
"source": """
// various NULL checks that I already tried
// also tried short circuiting on finding null values
if (!params['_source'].empty && params['_source'].containsKey('members')) {
def total = 0;
for (item in params._source.members) {
// custom logic here
// if above logic holds true
// total += 1;
}
return total > 3;
}
return true;
""",
"lang": "painless"
}
}
}
}
Other Statements That I've Tried
if (params._source == null) {
return true;
}
if (params._source.members == null) {
return true;
}
if (!ctx._source.contains('members')) {
return true;
}
if (!params['_source'].empty && params['_source'].containsKey('members') &&
params['_source'].members.value != null) {
// logic here
}
if (doc.containsKey('members')) {
for (mem in params._source.members) {
}
}
Error Message
&& params._source.members",
^---- HERE"
"caused_by" : {
"type" : "null_pointer_exception",
"reason" : null
}
I've looked into changing the structure (flattening the document) and the usage of must_not as indicated in this answer. They don't suit our use case as we need to incorporate some more custom logic.
Different tutorials use ctx, some use doc, and some use params. To add to the confusion, Debug.explain(doc.members) and Debug.explain(params._source.members) return empty responses, and I'm having a hard time figuring out the types.
Any help is appreciated.

TL;DR
Elasticsearch flattens arrays of objects (unless the field is mapped as nested), so that a document like this:
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
turns into:
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
To access the inner values of members, reference them as doc['members.<field>'], because members does not exist as a field on its own.
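To make the flattening concrete, here is a small Python sketch. Python is used only as a neutral modeling language, and flatten_object_arrays is a hypothetical helper, not an Elasticsearch API. It assumes keyword-style fields whose values are exposed sorted and de-duplicated, handles a single level of objects, and ignores analysis such as lowercasing.

```python
from collections import defaultdict


def flatten_object_arrays(source):
    """Model how a non-nested array of objects becomes dotted multi-valued fields.

    Hypothetical helper for illustration only; it handles one level of objects.
    """
    flat = defaultdict(set)
    for field, value in source.items():
        objects = value if isinstance(value, list) else [value]
        for obj in objects:
            if isinstance(obj, dict):
                for sub_field, sub_value in obj.items():
                    flat[f"{field}.{sub_field}"].add(sub_value)
            else:
                flat[field].add(obj)
    # Assumption: values come back sorted and de-duplicated, like keyword doc values.
    return {name: sorted(values) for name, values in flat.items()}


doc = {
    "category": "Mental Health",
    "members": [
        {"country": "Denmark", "date_of_birth": "1998-04-04T00:00:00"},
        {"country": "Denmark", "date_of_birth": "1999-05-05T00:00:00"},
    ],
}
flat = flatten_object_arrays(doc)
print(flat)
print("members" in flat)          # False: the parent object is not a field of its own
print("members.country" in flat)  # True: only the dotted leaf fields exist
```

In this model a document with "members": [] leaves no members.* entries at all, which is why the script should check the size of the dotted field rather than test members for null.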
Details
As you may know, Elasticsearch handles inner documents in its own way. [doc]
So you need to reference them accordingly.
Here is what I did to make it work. Note that members is mapped as a plain object here, not as nested.
By the way, I ran everything in the Kibana Dev Tools console.
PUT /so_test/
PUT /so_test/_mapping
{
"properties" : {
"captions_added" : {
"type" : "boolean"
},
"category" : {
"type" : "keyword"
},
"is_votable" : {
"type" : "boolean"
},
"members" : {
"properties" : {
"country" : {
"type" : "keyword"
},
"date_of_birth" : {
"type" : "date"
}
}
}
}
}
POST /so_test/_doc/
{
"captions_added": true,
"category" : "Mental Health",
"is_votable" : true,
"members": [
{"country": "Denmark", "date_of_birth": "1998-04-04T00:00:00"},
{"country": "Denmark", "date_of_birth": "1999-05-05T00:00:00"}
]
}
POST /so_test/_doc/
{
"captions_added": true,
"category" : "Mental breakdown",
"is_votable" : true,
"members": []
}
POST /so_test/_doc/
{
"captions_added": true,
"category" : "Mental success",
"is_votable" : true,
"members": [
{"country": "France", "date_of_birth": "1998-04-04T00:00:00"},
{"country": "Japan", "date_of_birth": "1999-05-05T00:00:00"}
]
}
Then I ran this query (it is only a bool filter, but adapting it to your own use case should not prove too difficult):
GET /so_test/_search
{
"query":{
"bool": {
"filter": {
"script": {
"script": {
"lang": "painless",
"source": """
def flag = false;
// /!\ notice how the field is referenced /!\
if(doc['members.country'].size() != 0)
{
for (item in doc['members.country']) {
if (item == params.country){
flag = true;
}
}
}
return flag;
""",
"params": {
"country": "Japan"
}
}
}
}
}
}
}
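The logic of the filter above can be modeled in a few lines of Python. This is only a sketch: the indexed dictionary is a hand-written stand-in for the doc values of the three sample documents, and matches_country is a hypothetical name, not part of any Elasticsearch client.

```python
def matches_country(doc_values, country):
    """Mirror the Painless filter: true if any members.country value equals the parameter."""
    values = doc_values.get("members.country", [])  # an empty list models size() == 0
    return any(value == country for value in values)


# Flattened doc values for the three documents indexed above (modeled by hand).
indexed = {
    "Mental Health": {"members.country": ["Denmark"]},
    "Mental breakdown": {},  # "members": [] leaves no values behind
    "Mental success": {"members.country": ["France", "Japan"]},
}

hits = [category for category, values in indexed.items() if matches_country(values, "Japan")]
print(hits)  # ['Mental success']
```

The document with an empty members array simply yields no values, so the loop never runs and no null check is involved.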
By the way, you said you were a bit confused about the contexts for Painless. You can find some details about them in the documentation.
[doc]
In this case the filter context is the one we want to look at.

Related

Elasticsearch Sort By Length Of Text

I'm using Elasticsearch 7.13 and running my queries in Kibana.
This is my mapping
{
"full_text" : {
"properties" : {
"title" : {
"type" : "text",
"fielddata" : true
}
}
}
}
This is my data
"full_text" : [
{
"title" : "Pkd chuyên cho thuê kingdom 101 1pn đến 3pn giá rẻ nhất thị trường chỉ 11 triệu/căn. lh 0919504***"
}
]
This is my code to sort by length of full_text.title
"sort": {
"_script": {
"type": "number",
"order": "desc",
"script": {
"lang": "painless",
"source": "doc['full_text.title'].value.length()"
}
}
}
So why does the sort return only 7?
"_source" : {
"full_text" : [
{
"title" : "Pkd chuyên cho thuê kingdom 101 1pn đến 3pn giá rẻ nhất thị trường chỉ 11 triệu/căn. lh 0919504***"
}
]
},
"sort": [
7.0
]
Because title is an analyzed text field, doc['full_text.title'] holds its individual tokens as an array, and .value returns only one of them. You need to join that array into a string.
Try this:
"source": "int length = String.join(' ',doc['full_text.title']).length(); return length;"

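A rough Python model shows where the 7 in the question comes from. It rests on two assumptions that are labeled in the code: a regular expression stands in for the standard analyzer, and the terms are exposed sorted and de-duplicated. Neither is the real Elasticsearch implementation.

```python
import re

title = (
    "Pkd chuyên cho thuê kingdom 101 1pn đến 3pn giá rẻ nhất thị trường "
    "chỉ 11 triệu/căn. lh 0919504***"
)

# Assumption: a rough stand-in for the standard analyzer
# (lowercase, then split on non-word characters).
tokens = re.findall(r"\w+", title.lower())

# Assumption: fielddata exposes the terms sorted and de-duplicated.
doc_values = sorted(set(tokens))

first_value = doc_values[0]    # what .value sees in this model
joined = " ".join(doc_values)  # what String.join(' ', ...) builds

print(first_value, len(first_value))  # 0919504 7
print(len(joined), len(title))
```

Under these assumptions the joined string is built from sorted, unique tokens with punctuation dropped, so its length approximates the length of the original text rather than matching it exactly.
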
How do I query a null date inside an array in elasticsearch?

In an elasticsearch query I am trying to search Document objects that have an array of approval notifications. The notifications are considered complete when dateCompleted is populated with a date, and considered pending when either dateCompleted doesn't exist or exists with null. If the document does not contain an array of approval notifications then it is out of the scope of the search.
I am aware of putting null_value for field dateCompleted and setting it to some arbitrary old date but that seems hackish to me.
I've tried to use Bool queries with must exist doc.approvalNotifications and must not exist doc.approvalNotifications.dateCompleted but that does not work if a document contains a mix of complete and pending approvalNotifications. e.g. it only returns document with ID 2 below. I am expecting documents with IDs 1 and 2 to be found.
How can I find pending approval notifications using elasticsearch?
PUT my_index/_mapping/Document
"properties" : {
"doc" : {
"properties" : {
"approvalNotifications" : {
"properties" : {
"approvalBatchId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"approvalTransitionState" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"approvedByUser" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"dateCompleted" : {
"type" : "date"
}
}
}
}
}
}
Documents:
{
"id": 1,
"status": "Pending Notifications",
"approvalNotifications": [
{
"approvalBatchId": "e6c39194-5475-4168-9729-8ddcf46cf9ab",
"dateCompleted": "2018-11-15T16:09:15.346+0000"
},
{
"approvalBatchId": "05eaeb5d-d802-4a28-b699-5e593a59d445"
}
]
}
{
"id": 2,
"status": "Pending Notifications",
"approvalNotifications": [
{
"approvalBatchId": "e6c39194-5475-4168-9729-8ddcf46cf9ab"
}
]
}
{
"id": 3,
"status": "Complete",
"approvalNotifications": [
{
"approvalBatchId": "e6c39194-5475-4168-9729-8ddcf46cf9ab",
"dateCompleted": "2018-11-15T16:09:15.346+0000"
},
{
"approvalBatchId": "05eaeb5d-d802-4a28-b699-5e593a59d445",
"dateCompleted": "2018-11-16T16:09:15.346+0000"
}
]
}
{
"id": 4,
"status": "No Notifications"
}
You are almost there. You can achieve the desired behavior by using the nested datatype for the "approvalNotifications" field.
What happens is that Elasticsearch flattens your approvalNotifications objects, treating their subfields as subfields of the original document. A nested field instead tells ES to index each inner object as an implicit separate document, though still related to the original one.
To query nested objects, use the nested query.
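The difference can be modeled in plain Python. This is a sketch of the matching semantics only, not of Elasticsearch itself: the function names are hypothetical and the batch IDs are shortened for readability.

```python
documents = [
    {"id": 1, "approvalNotifications": [
        {"approvalBatchId": "e6c39194", "dateCompleted": "2018-11-15T16:09:15.346+0000"},
        {"approvalBatchId": "05eaeb5d"},
    ]},
    {"id": 2, "approvalNotifications": [{"approvalBatchId": "e6c39194"}]},
    {"id": 3, "approvalNotifications": [
        {"approvalBatchId": "e6c39194", "dateCompleted": "2018-11-15T16:09:15.346+0000"},
        {"approvalBatchId": "05eaeb5d", "dateCompleted": "2018-11-16T16:09:15.346+0000"},
    ]},
    {"id": 4},
]


def pending_when_flattened(doc):
    """Object mapping: fields of all notifications are pooled on the parent document."""
    notifications = doc.get("approvalNotifications", [])
    has_notifications = len(notifications) > 0
    any_date = any(n.get("dateCompleted") is not None for n in notifications)
    return has_notifications and not any_date  # must exist + must_not exist


def pending_when_nested(doc):
    """Nested mapping: each notification is checked on its own."""
    notifications = doc.get("approvalNotifications", [])
    return any(n.get("dateCompleted") is None for n in notifications)


flattened_hits = [d["id"] for d in documents if pending_when_flattened(d)]
nested_hits = [d["id"] for d in documents if pending_when_nested(d)]
print(flattened_hits)  # [2]
print(nested_hits)     # [1, 2]
```

Document 1 is lost in the flattened model because the completed notification contributes a dateCompleted value to the parent, which is the behavior described in the question.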
Hope that helps!

Painless scripting Elastic Search : variable is not defined error when trying to access values from doc

I am trying to learn painless scripting in Elastic Search by following the official documentation. ( https://www.elastic.co/guide/en/elasticsearch/painless/6.0/painless-examples.html )
A sample of the document I am working with :
{
"uid" : "CT6716617",
"old_username" : "xyz",
"new_username" : "abc"
}
The following script_fields query, which uses params._source to access document values, works:
{
"script_fields": {
"sales_price": {
"script": {
"lang": "painless",
"source": "(params._source.old_username != params._source.new_username) ? \"change\" : \"nochange\"",
"params": {
"change": "change"
}
}
}
}
}
The same query but using the doc map to access values fails :
{
"script_fields": {
"sales_price": {
"script": {
"lang": "painless",
"source": "(doc['old_username'] != doc['new_username']) ? \"change\" : \"nochange\"",
"params": {
"change": "change"
}
}
}
}
}
The error message I get is :
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Variable [old_username] is not defined."
}
Based on the documentation, both approaches should work, especially the second one. I am not sure what I am missing here.

Update multi level nested document in elasticsearch

Using Elasticsearch 1.7.1, I have the following document structure
"_source" : {
"questions" : {
"defaultQuestion" : {
"tag" : 0,
"gid" : 0,
"rid" : 0,
"caption" : "SRID",
},
"tableQuestion" : {
"rows" : [{
"ids" : {
"answerList" : ["3547", "3548"],
"tag" : "0",
"caption" : "Accounts",
},
"name" : {
"answerList" : ["Some Name"],
"tag" : "0",
"caption" : "Name",
}
}
],
"caption" : "BPI 1500541753537",
"id" : 644251570,
"tag" : ""
}
},
"id" : "447722821"
}
I want to add a new object to questions.tableQuestion.rows. My current script replaces the existing object with the new one. Kindly suggest how to append it instead. The following is my update script.
{ "update": {"_id": "935663867", "_retry_on_conflict" : 3} }
{ "script" : "ctx._source.questions += param1", "params" : {"param1" : {"tableQuestion": {"rows" : [ NEWROWOBJECT ]} } }}
You can spell out the full path through the nested fields, right down to the rows property, and then use the += operator. It's also a good idea to check whether the rows array is null and to initialize it in that case.
Checked with ES 2.4, but should be similar for earlier versions:
POST http://127.0.0.1:9200/sample/demo/{document_id}/_update
{
"script": {
"inline": "if (ctx._source.questions.tableQuestion.rows == null) ctx._source.questions.tableQuestion.rows = new ArrayList(); ctx._source.questions.tableQuestion.rows += param1;",
"params" : {
"param1" : {
"ids": {
"answerList": [
"478",
"255"
],
"tag": "2",
"caption": "My Test"
},
"name": {
"answerList": [
"My Name"
],
"tag": "1",
"caption": "My Demo"
}
}
}
}
}
For ES 5.x and the Painless language, the script is a bit different:
POST http://127.0.0.1:9200/sample/demo/{document_id}/_update
{
"script": {
"inline": "if (ctx._source.questions.tableQuestion.rows == null) { ctx._source.questions.tableQuestion.rows = new ArrayList();} ctx._source.questions.tableQuestion.rows.add(params.param1);",
"params" : {
"param1" : {
...
}
}
}
}
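The difference between the script in the question and the two scripts above can be modeled in Python. This is a sketch under the assumption that += on a map behaves like a shallow merge in which entries from the right-hand side win; the trimmed-down source and new_row values are illustrative, not the real document.

```python
import copy

source = {
    "questions": {
        "tableQuestion": {
            "rows": [{"name": {"answerList": ["Some Name"]}}],
            "caption": "BPI 1500541753537",
        }
    }
}
new_row = {"name": {"answerList": ["My Name"]}}

# Model of the original script: merging a map that carries its own
# "tableQuestion" key overwrites the existing entry as a whole.
replaced = copy.deepcopy(source)
replaced["questions"].update({"tableQuestion": {"rows": [new_row]}})

# Model of the suggested scripts: walk down to rows, create it if missing, append.
appended = copy.deepcopy(source)
table = appended["questions"]["tableQuestion"]
if table.get("rows") is None:
    table["rows"] = []
table["rows"].append(new_row)

print(len(replaced["questions"]["tableQuestion"]["rows"]))  # 1
print("caption" in replaced["questions"]["tableQuestion"])  # False
print(len(appended["questions"]["tableQuestion"]["rows"]))  # 2
```

In the model, the merge throws away the existing row together with the sibling keys of tableQuestion, while the append keeps both.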
Update to the additional comment
If some part of the path is dynamic, you can also use parameters to build the path with the get(param_name) method. Try this syntax (I removed the null check for simplicity):
{
"script": {
"inline": "ctx._source.questions.get(param2).rows += param1;",
"params" : {
"param2" : "6105243",
"param1" : {
"ids": {
"answerList": [
"478",
"255"
],
"tag": "2",
"caption": "My Test"
},
"name": {
"answerList": [
"My Name"
],
"tag": "1",
"caption": "My Demo"
}
}
}
}
}

ElasticSearch - Copy one field value to other field for all documents

We have a field "name" in the index. We recently added a new field "alias".
I want to copy the value of the name field to the new alias field for all documents.
Is there an update query that will do this?
If that is not possible, please help me achieve this some other way.
Thanks in advance.
I am trying this query:
http://URL/index/profile/_update_by_query
{
"query": {
"constant_score" : {
"filter" : {
"exists" : { "field" : "name" }
}
}
},
"script" : "ctx._source.alias = name;"
}
In the script, I am not sure how to reference the name field.
I am getting this error:
{
"error": {
"root_cause": [
{
"type": "class_cast_exception",
"reason": "java.lang.String cannot be cast to java.util.Map"
}
],
"type": "class_cast_exception",
"reason": "java.lang.String cannot be cast to java.util.Map"
},
"status": 500
}
Indeed, the syntax has changed a tiny little bit since. You need to modify your query to this:
POST index/_update_by_query
{
"query": {
"constant_score" : {
"filter" : {
"exists" : { "field" : "name" }
}
}
},
"script" : {
"inline": "ctx._source.alias = ctx._source.name;"
}
}
UPDATE for ES 6
Use source instead of inline as the key in the script object.

Resources