Update multi level nested document in elasticsearch - elasticsearch

Using Elasticsearch 1.7.1, I have the following document structure
"_source" : {
"questions" : {
"defaultQuestion" : {
"tag" : 0,
"gid" : 0,
"rid" : 0,
"caption" : "SRID",
},
"tableQuestion" : {
"rows" : [{
"ids" : {
"answerList" : ["3547", "3548"],
"tag" : "0",
"caption" : "Accounts",
},
"name" : {
"answerList" : ["Some Name"],
"tag" : "0",
"caption" : "Name",
}
}
],
"caption" : "BPI 1500541753537",
"id" : 644251570,
"tag" : ""
}
},
"id" : "447722821"
}
I want to add a new object in in questions.tableQuestion.rows. My current script is replacing the existing object with the new one. Kindly suggest how to append it instead. Following is my update script.
{ "update": {"_id": "935663867", "_retry_on_conflict" : 3} }
{ "script" : "ctx._source.questions += param1", "params" : {"param1" : {"tableQuestion": {"rows" : [ NEWROWOBJECT ]} } }}

You can build the path with next nested fields, right to the rows property and then use += operator. It's also good to have a check if rows array is null and initialize it in this case.
Checked with ES 2.4, but should be similar for earlier versions:
POST http://127.0.0.1:9200/sample/demo/{document_id}/_update
{
"script": {
"inline": "if (ctx._source.questions.tableQuestion.rows == null) ctx._source.questions.tableQuestion.rows = new ArrayList(); ctx._source.questions.tableQuestion.rows += param1;",
"params" : {
"param1" : {
"ids": {
"answerList": [
"478",
"255"
],
"tag": "2",
"caption": "My Test"
},
"name": {
"answerList": [
"My Name"
],
"tag": "1",
"caption": "My Demo"
}
}
}
}
}
For ES 5.x and Painless language the script is a bit different:
POST http://127.0.0.1:9200/sample/demo/{document_id}/_update
{
"script": {
"inline": "if (ctx._source.questions.tableQuestion.rows == null) { ctx._source.questions.tableQuestion.rows = new ArrayList();} ctx._source.questions.tableQuestion.rows.add(params.param1);",
"params" : {
"param1" : {
...
}
}
}
}
Update to the additional comment
If some part of the path is dynamic, you can also use parameters to build the path - with get(param_name) method - try this syntax (I removed the null check for simplicity):
{
"script": {
"inline": "ctx._source.questions.get(param2).rows += param1;",
"params" : {
"param2" : "6105243",
"param1" : {
"ids": {
"answerList": [
"478",
"255"
],
"tag": "2",
"caption": "My Test"
},
"name": {
"answerList": [
"My Name"
],
"tag": "1",
"caption": "My Demo"
}
}
}
}
}

Related

How to get the local day of week from timestamp in elasticsearch

I'm using the ingest pipeline script processors to extract the day of the week from the local time for each document.
I'm using the client_ip to extract the timezone, use that along with the timestamp to extract the local time, and then extract day of week (and other features) from that local time.
This is my ingest pipeline:
{
"processors" : [
{
"set" : {
"field" : "#timestamp",
"override" : false,
"value" : "{{_ingest.timestamp}}"
}
},
{
"date" : {
"field" : "#timestamp",
"formats" : [
"EEE MMM dd HH:mm:ss 'UTC' yyyy"
],
"ignore_failure" : true,
"target_field" : "#timestamp"
}
},
{
"convert" : {
"field" : "client_ip",
"type" : "ip",
"ignore_failure" : true,
"ignore_missing" : true
}
},
{
"geoip" : {
"field" : "client_ip",
"target_field" : "client_geo",
"properties" : [
"continent_name",
"country_name",
"country_iso_code",
"region_iso_code",
"region_name",
"city_name",
"location",
"timezone"
],
"ignore_failure" : true,
"ignore_missing" : true
}
},
{
"script" : {
"description" : "Extract details of Dates",
"lang" : "painless",
"ignore_failure" : true,
"source" : """
LocalDateTime local_time LocalDateTime.ofInstant( Instant.ofEpochMilli(ctx['#timestamp']), ZoneId.of(ctx['client_geo.timezone']));
int day_of_week = local_time.getDayOfWeek().getValue();
int hour_of_day = local_time.getHour();
int office_hours = 0;
if (day_of_week<6 && day_of_week>0) { if (hour_of_day >= 7 && hour_of_day <= 19 ) {office_hours =1;} else {office_hours = -1;}} else {office_hours = -1;}
ctx['day_of_week'] = day_of_week;
ctx['hour_of_day'] = hour_of_day;
ctx['office_hours'] = office_hours;
"""
}
}
]
}
The first two processors were added before for other purposes. I've added the last 3.
An example document could be the following:
"docs": [
{
"_source": {
"#timestamp": 43109942361111,
"client_ip": "89.160.20.128"
}
}
]
I'm getting the GeoIP fields in the data now, but none of the fields created by the script processor. What am I doing wrong?
EDIT
A few notes about the index that is affected by these changes:
The Dynamic mapping is off.
I have manually added the client_geo.timezone field to the mapping of the index as a keyword.
When I run the following scripted search on the index
GET index_name/_search
{
"script_fields": {
"day_of_week": {
"script": "doc['#timestamp'].value.withZoneSameInstant(ZoneId.of(doc['client_geo']['timezone'])).getDayOfWeek().getValue()"
}
}
}
I get the following runtime error in script execution:
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "No field found for [client_geo] in mapping"
}
Thanks for a well formed question + example.
I was able to replicate your problem and figured it out.
ctx is "The document source as it is". Consequently, ingest does not automatically dig-up dot-delimited fields.
Your client data is added as such:
"client_geo" : {
"continent_name" : "Europe"
//<snip>..</snip>
}
So, you have to access it directly as a nested hash map.
Meaning ctx['client_geo.timezone'] should actually be ctx['client_geo']['timezone']
Here is the full pipeline that worked for me:
"processors": [
{
"set": {
"field": "#timestamp",
"override": false,
"value": "{{_ingest.timestamp}}"
}
},
{
"date": {
"field": "#timestamp",
"formats": [
"EEE MMM dd HH:mm:ss 'UTC' yyyy"
],
"ignore_failure": true,
"target_field": "#timestamp"
}
},
{
"convert": {
"field": "client_ip",
"type": "ip",
"ignore_failure": true,
"ignore_missing": true
}
},
{
"geoip": {
"field": "client_ip",
"target_field": "client_geo",
"properties": [
"continent_name",
"country_name",
"country_iso_code",
"region_iso_code",
"region_name",
"city_name",
"location",
"timezone"
],
"ignore_failure": true,
"ignore_missing": true
}
},
{
"script": {
"description": "Extract details of Dates",
"lang": "painless",
"ignore_failure": true,
"source": """
LocalDateTime local_time = LocalDateTime.ofInstant(Instant.ofEpochMilli(ctx['#timestamp']), ZoneId.of(ctx['client_geo']['timezone']));
int day_of_week = local_time.getDayOfWeek().getValue();
int hour_of_day = local_time.getHour();
int office_hours = 0;
if (day_of_week<6 && day_of_week>0) { if (hour_of_day >= 7 && hour_of_day <= 19 ) {office_hours =1;} else {office_hours = -1;}} else {office_hours = -1;}
ctx['day_of_week'] = day_of_week;
ctx['hour_of_day'] = hour_of_day;
ctx['office_hours'] = office_hours;
"""
}
}
]

ElasticSearch Accessing Nested Documents in Script - Null Pointer Exception

Gist: Trying to write a custom filter on nested documents using painless. Want to write error checks when there are no nested documents to surpass null_pointer_exception
I have a mapping as such (simplified and obfuscated)
{
"video_entry" : {
"aliases" : { },
"mappings" : {
"properties" : {
"captions_added" : {
"type" : "boolean"
},
"category" : {
"type" : "keyword"
},
"is_votable" : {
"type" : "boolean"
},
"members" : {
"type" : "nested",
"properties" : {
"country" : {
"type" : "keyword",
},
"date_of_birth" : {
"type" : "date",
}
}
}
}
Each video_entry document can have 0 or more members nested documents.
Sample Document
{
"captions_added": true,
"category" : "Mental Health",
"is_votable: : true,
"members": [
{"country": "Denmark", "date_of_birth": "1998-04-04T00:00:00"},
{"country": "Denmark", "date_of_birth": "1999-05-05T00:00:00"}
]
}
If one or more nested document exist, we want to write some painless scripts that'd check certain fields across all the nested documents. My script works on mappings with a few documents but when I try it on larger set of documents I get null pointer exceptions despite having every null check possible. I've tried various access patterns, error checking mechanisms but I get exceptions.
POST /video_entry/_search
{
"query": {
"script": {
"script": {
"source": """
// various NULL checks that I already tried
// also tried short circuiting on finding null values
if (!params['_source'].empty && params['_source'].containsKey('members')) {
def total = 0;
for (item in params._source.members) {
// custom logic here
// if above logic holds true
// total += 1;
}
return total > 3;
}
return true;
""",
"lang": "painless"
}
}
}
}
Other Statements That I've Tried
if (params._source == null) {
return true;
}
if (params._source.members == null) {
return true;
}
if (!ctx._source.contains('members')) {
return true;
}
if (!params['_source'].empty && params['_source'].containsKey('members') &&
params['_source'].members.value != null) {
// logic here
}
if (doc.containsKey('members')) {
for (mem in params._source.members) {
}
}
Error Message
&& params._source.members",
^---- HERE"
"caused_by" : {
"type" : "null_pointer_exception",
"reason" : null
}
I've looked into changing the structure (flattening the document) and the usage of must_not as indicated in this answer. They don't suit our use case as we need to incorporate some more custom logic.
Different tutorials use ctx, doc and some use params. To add to the confusion Debug.explain(doc.members), Debug.explain(params._source.members) return empty responses and I'm having a hard time figuring out the types.
Gist: Trying to write a custom filter on nested documents using painless. Want to write error checks when there are no nested documents to surpass null_pointer_exception
Any help is appreciated.
TLDr;
Elastic flatten objects. Such that
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
Turn into:
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
To access members inner value you need to reference it using doc['members.<field>'] as members will not exist on its own.
Details
As you may know, Elastic handles inner documents in its own way. [doc]
So you will need to reference them accordingly.
Here is what I did to make it work.
Btw, I have been using the Dev tools of kibana
PUT /so_test/
PUT /so_test/_mapping
{
"properties" : {
"captions_added" : {
"type" : "boolean"
},
"category" : {
"type" : "keyword"
},
"is_votable" : {
"type" : "boolean"
},
"members" : {
"properties" : {
"country" : {
"type" : "keyword"
},
"date_of_birth" : {
"type" : "date"
}
}
}
}
}
POST /so_test/_doc/
{
"captions_added": true,
"category" : "Mental Health",
"is_votable" : true,
"members": [
{"country": "Denmark", "date_of_birth": "1998-04-04T00:00:00"},
{"country": "Denmark", "date_of_birth": "1999-05-05T00:00:00"}
]
}
PUT /so_test/_doc/
{
"captions_added": true,
"category" : "Mental breakdown",
"is_votable" : true,
"members": []
}
POST /so_test/_doc/
{
"captions_added": true,
"category" : "Mental success",
"is_votable" : true,
"members": [
{"country": "France", "date_of_birth": "1998-04-04T00:00:00"},
{"country": "Japan", "date_of_birth": "1999-05-05T00:00:00"}
]
}
And then I did this query (it is only a bool filter, but I guess making it work for your own use case should not prove too difficult)
GET /so_test/_search
{
"query":{
"bool": {
"filter": {
"script": {
"script": {
"lang": "painless",
"source": """
def flag = false;
// /!\ notice how the field is referenced /!\
if(doc['members.country'].size() != 0)
{
for (item in doc['members.country']) {
if (item == params.country){
flag = true
}
}
}
return flag;
""",
"params": {
"country": "Japan"
}
}
}
}
}
}
}
BTW you were saying you were a bit confused about the context for painless. you can find in the documentation so details about it.
[doc]
In this case the filter context is the one we want to look at.

Elasticsearch Sort By Length Of Text

I'm using elasticsearch 7.13 and code on kibana
This is my mapping
{
"full_text" : {
"properties" : {
"title" : {
"type" : "text",
"fielddata" : true
},
}
}
}
This is my data
"full_text" : [
{
"title" : "Pkd chuyên cho thuê kingdom 101 1pn đến 3pn giá rẻ nhất thị trường chỉ 11 triệu/căn. lh 0919504***"
}
]
This is my code to sort by length of full_text.title
"sort": {
"_script": {
"type": "number",
"order": "desc",
"script": {
"lang": "painless",
"source": "doc['full_text.title'].value.length()"
}
}
}
So why sort result return only 7?
"_source" : {
"full_text" : [
{
"title" : "Pkd chuyên cho thuê kingdom 101 1pn đến 3pn giá rẻ nhất thị trường chỉ 11 triệu/căn. lh 0919504***"
}
]
},
"sort": [
7.0
]
Because doc['full_text.title'] will split "title" into array, you need to join that array to string.
Try this:
"source": "int length = String.join(' ',doc['full_text.title']).length(); return length;"

Get document by min size of array in Mongodb

I have mongo collection:
{
"_id" : 123,
"index" : "111",
"students" : [
{
"firstname" : "Mark",
"lastname" : "Smith"),
}
],
}
{
"_id" : 456,
"index" : "222",
"students" : [
{
"firstname" : "Mark",
"lastname" : "Smith"),
}
],
}
{
"_id" : 789,
"index" : "333",
"students" : [
{
"firstname" : "Neil",
"lastname" : "Smith"),
},
{
"firstname" : "Sofia",
"lastname" : "Smith"),
}
],
}
I want to get document that has index that is in the set of the given indexes, for example givenSet = ["111","333"] and has min length of students array.
Result should be the first document with _id:123, because its index is in the givenSet and studentsArrayLength = 1, which is smaller than third.
I need to write custom JSON #Query for Spring Mongo Repository. I am new to Mongo and am stuck a bit with this problem.
I wrote something like this:
#Query("{'index':{$in : ?0}, length:{$size:$students}, $sort:{length:-1}, $limit:1}")
Department getByMinStudentsSize(Set<String> indexes);
And got error: error message '$size needs a number'
Should I just use .count() or something like that?
you should use the aggregation framework for this type of query.
filter the result based on your condition.
add a new field and assign the array size to it.
sort based on the new field.
limit the result.
the solution should look something like this:
db.collection.aggregate([
{
"$match": {
index: {
"$in": [
"111",
"333"
]
}
}
},
{
"$addFields": {
"students_size": {
"$size": "$students"
}
}
},
{
"$sort": {
students_size: 1
}
},
{
"$limit": 1
}
])
working example: https://mongoplayground.net/p/ih4KqGg25i6
You are getting the issue because the second param should be enclosed in curly braces. And second param is projection
#Query("{{'index':{$in : ?0}}, {length:{$size:'$students'}}, $sort:{length:1}, $limit:1}")
Department getByMinStudentsSize(Set<String> indexes);
Below is the mongodb query :
db.collection.aggregate(
[
{
"$match" : {
"index" : {
"$in" : [
"111",
"333"
]
}
}
},
{
"$project" : {
"studentsSize" : {
"$size" : "$students"
},
"students" : 1.0
}
},
{
"$sort" : {
"studentsSize" : 1.0
}
},
{
"$limit" : 1.0
}
],
{
"allowDiskUse" : false
}
);

Partial update overwriting whole structure

I'm indexing a new document with the following content
{
"lastUpdate" : "20180114144020452",
"name" : "My Process",
"startDate" : "20180114162356585",
"endData" : "",
"tasks" : [
{
"1" : {
"lastUpdate" : "20180114144020452",
"taskId" : "123",
"subject" : "Terceira Atividade",
"status" : "Active",
"type" : "userTask",
"assign" : [
{
"date" : "20180114144020452",
"type" : "role",
"name" : "Time 3",
"id" : "Team3_345"
}
],
"receivedDate" : "",
"readDate" : "",
"finishDate" : ""
}
}
]
}
And then I'm trying to change task.1.status value with the following update content
{
"doc" : {
"tasks" : [
{
"1" : {
"status" : "Closed"
}
}
]
}
}
But it's overwriting the whole task.1 structure, deleting other values and letting only status value to closed instead of keep other values and change only status value.
How can I solve this? Thanks
You need to do it via a scripted partial updated like this
POST updates/update/1/_update
{
"script": {
"source": "ctx._source.tasks[0].1.status = 'Closed'"
}
}

Resources