Painless scripting: initialize a new array

I'm trying to add or update a nested object in Elasticsearch using a script. The script below works fine if integrations is already an array, but when it is null it throws a null pointer exception. How do I initialize ctx._source.integrations to an empty array when its value is null? (Something like the equivalent of JavaScript's myObject.integrations = myObject.integrations ?? [].)
POST /products/_update/VFfrnQrKlC5bwdfdeaQ7
{
  "script": {
    "source": """
      ctx._source.integrations.removeIf(i -> i.id == params.integration.id);
      ctx._source.integrations.add(params.integration);
      ctx._source.integrationCount = ctx._source.integrations.length;
    """,
    "params": {
      "integration": {
        "id": "dVTV8GjHj8pXFnlYUUlI",
        "from": true,
        "to": false,
        "vendor": "sfeZWDpZXlF5Qa8mUsiF",
        "targetProduct": {
          "id": "nyILhphvCrGYm53cfaOx",
          "name": "Test Product",
          "categoryIds": []
        }
      }
    }
  }
}

OK, I think this does the trick:
if (ctx._source.integrations == null) {
  ctx._source.integrations = new ArrayList();
}
Is there a shorthand for this, like in the JS example?
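For what it's worth, Painless does support the Elvis operator (?:), so the null check collapses to a one-liner much like the JS version. A minimal sketch of the full script body (note size() on the last line: Painless lists expose size(), while the length field is for arrays):
ctx._source.integrations = ctx._source.integrations ?: [];
ctx._source.integrations.removeIf(i -> i.id == params.integration.id);
ctx._source.integrations.add(params.integration);
ctx._source.integrationCount = ctx._source.integrations.size();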

Related

Elasticsearch search templates - How to construct the search terms in NEST

Currently I have a search template to which I am trying to pass a couple of parameters.
How can I construct my search terms using NEST to get the following result?
Template
PUT _scripts/company-index-template
{
  "script": {
    "lang": "mustache",
    "source": "{\"query\": {\"bool\": {\"filter\":{{#toJson}}clauses{{/toJson}},\"must\": [{\"query_string\": {\"fields\": [\"companyTradingName^2\",\"companyName\",\"companyContactPerson\"],\"query\": \"{{query}}\"}}]}}}",
    "params": {
      "query": "",
      "clauses": []
    }
  }
}
The DSL query looks as follows:
GET company-index/_search/template
{
  "id": "company-index-template",
  "params": {
    "query": "sky*",
    "clauses": [
      {
        "terms": {
          "companyGroupId": [1595]
        }
      },
      {
        "terms": {
          "companyId": [158, 836, 1525, 2298, 2367, 3176, 3280]
        }
      }
    ]
  }
}
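For reference, with those params the mustache template should render to roughly the following query (my manual expansion of the template, not captured output):
{
  "query": {
    "bool": {
      "filter": [
        { "terms": { "companyGroupId": [1595] } },
        { "terms": { "companyId": [158, 836, 1525, 2298, 2367, 3176, 3280] } }
      ],
      "must": [
        {
          "query_string": {
            "fields": ["companyTradingName^2", "companyName", "companyContactPerson"],
            "query": "sky*"
          }
        }
      ]
    }
  }
}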
I would like to construct the above query in NEST but can't seem to find a good way to generate the clauses value.
This is what I have so far...
var responses = this.client.SearchTemplate<Company>(descriptor =>
    descriptor
        .Index(SearchConstants.CompanyIndex)
        .Id("company-index-template")
        .Params(objects => objects
            .Add("query", queryBuilder.Query)
            .Add("clauses", "*How do I construct this JSON*")));
UPDATE:
This is how I ended up doing it. I just created a dictionary with all my terms in it.
I do think there might be a better way of doing it, but I can't find it.
new List<Dictionary<string, object>>
{
    new() {{"terms", new Dictionary<string, object> {{"companyGroupId", companyGroupId}}}},
    new() {{"terms", new Dictionary<string, object> {{"companyId", availableCompanies}}}}
}
And then I had to serialize it when passing it to the Params method.
var response = this.client.SearchTemplate<Company>(descriptor =>
    descriptor.Index(SearchConstants.CompanyIndex)
        .Id("company-index-template")
        .Params(objects => objects
            .Add("query", "*" + query + "*")
            .Add("clauses", JsonConvert.SerializeObject(filterClauses))));

Elasticsearch merging documents in response

I have data in three indices and want to generate an invoice report using information from all of them. For example, the following are sample documents from each index.
Users index
{
  "_id": "userId1",
  "name": "John"
}
Invoice index
{
  "_id": "invoiceId1",
  "userId": "userId1",
  "cost": "10000",
  "startdate": "",
  "enddate": ""
}
Orders index
{
  "_id": "orderId1",
  "userId": "userId1",
  "productName": "Mobile"
}
I want to generate an invoice report by combining information from these three indices, as follows:
{
  "_id": "invoiceId1",
  "userName": "John",
  "productName": "Mobile",
  "cost": "10000",
  "startdate": "",
  "enddate": ""
}
How do I write an Elasticsearch query that returns a response combining information from documents in other indices?
You cannot do query-time joins in Elasticsearch and will need to denormalize your data in order to efficiently retrieve and group it.
Having said that, you could:
leverage the multi-target syntax and query multiple indices at once
use an OR query on the id and userId -- since either of those is referenced at least once in any of your docs
and then trivially join your data through a map/reduce tool called scripted metric aggregations
Quick side note: you won't be able to use the _id keyword inside your docs because it's reserved.
Assuming your docs and indices are structured as follows:
POST users_index/_doc
{"id":"userId1","name":"John"}
POST invoices_index/_doc
{"id":"invoiceId1","userId":"userId1","cost":"10000","startdate":"","enddate":""}
POST orders_index/_doc
{"id":"orderId1","userId":"userId1","productName":"Mobile"}
Here's what the scripted metric aggregation could look like:
POST users_index,invoices_index,orders_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "id.keyword": {
              "value": "userId1"
            }
          }
        },
        {
          "term": {
            "userId.keyword": {
              "value": "userId1"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_invoiceId": {
      "scripted_metric": {
        "init_script": "state.users = []; state.invoices = []; state.orders = []",
        "map_script": """
          def source = params._source;
          if (source.containsKey("name")) {
            // we're dealing with the users index
            state.users.add(source);
          } else if (source.containsKey("cost")) {
            // we're dealing with the invoices index
            state.invoices.add(source);
          } else if (source.containsKey("productName")) {
            // we're dealing with the orders index
            state.orders.add(source);
          }
        """,
        "combine_script": """
          def non_empty_state = [:];
          for (entry in state.entrySet()) {
            // keep only the lists this shard actually filled
            if (entry != null && entry.getValue().size() > 0) {
              non_empty_state[entry.getKey()] = entry.getValue();
            }
          }
          return non_empty_state;
        """,
        "reduce_script": """
          def final_invoices = [];
          def all_users = [];
          def all_invoices = [];
          def all_orders = [];
          // flatten all resources
          for (state in states) {
            for (kind_entry in state.entrySet()) {
              def map_kind = kind_entry.getKey();
              if (map_kind == "users") {
                all_users.addAll(kind_entry.getValue());
              } else if (map_kind == "invoices") {
                all_invoices.addAll(kind_entry.getValue());
              } else if (map_kind == "orders") {
                all_orders.addAll(kind_entry.getValue());
              }
            }
          }
          // iterate the invoices and enrich them
          for (invoice_entry in all_invoices) {
            def invoiceId = invoice_entry.id;
            def userId = invoice_entry.userId;
            def userName = all_users.stream().filter(u -> u.id == userId).findFirst().get().name;
            def productName = all_orders.stream().filter(o -> o.userId == userId).findFirst().get().productName;
            def cost = invoice_entry.cost;
            def startdate = invoice_entry.startdate;
            def enddate = invoice_entry.enddate;
            final_invoices.add([
              'id': invoiceId,
              'userName': userName,
              'productName': productName,
              'cost': cost,
              'startdate': startdate,
              'enddate': enddate
            ]);
          }
          return final_invoices;
        """
      }
    }
  }
}
which would return:
{
  ...
  "aggregations" : {
    "group_by_invoiceId" : {
      "value" : [
        {
          "cost" : "10000",
          "enddate" : "",
          "id" : "invoiceId1",
          "userName" : "John",
          "startdate" : "",
          "productName" : "Mobile"
        }
      ]
    }
  }
}
Summing up, there are workarounds to achieve query-time joins, but scripts like this shouldn't be used in production because they can be prohibitively slow.
Instead, this aggregation should be emulated outside of Elasticsearch, after the query resolves and returns the index-specific hits.
BTW, I set size: 0 to return just the aggregation results, so increase this parameter if you want to get some actual hits.
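If you do move the join to the application side, the per-index hits can still be fetched in a single round trip with the multi-search API. A minimal sketch reusing the index names and sample ids from above:
GET _msearch
{"index": "invoices_index"}
{"query": {"term": {"userId.keyword": "userId1"}}}
{"index": "users_index"}
{"query": {"term": {"id.keyword": "userId1"}}}
{"index": "orders_index"}
{"query": {"term": {"userId.keyword": "userId1"}}}
The responses come back in request order, so the caller can stitch users, invoices, and orders together by userId in memory.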

Elasticsearch pre-processing to remove null fields as part of ingest

I have a use case where an API I'm calling to retrieve data to put into Elasticsearch is returning nulls.
I need to write an ingest pipeline that uses processors to remove all null fields before the document is written into Elasticsearch. The processors may or may not use Painless scripting.
Here is a sample payload that I currently get from the API:
{
  "master_desc": "TESTING PART",
  "date_added": "2019-10-24T09:30:03",
  "master_no": {
    "master_no": 18460110,
    "barcode": "NLSKYTEST1-1",
    "external_key": null,
    "umid": null
  }
}
The pipeline should ideally insert the document as:
{
  "master_desc": "TESTING PART",
  "date_added": "2019-10-24T09:30:03",
  "master_no": {
    "master_no": 18460110,
    "barcode": "NLSKYTEST1-1"
  }
}
Note: the fields are dynamic, so I can't write a processor that checks for nulls against a defined set of fields.
Thanks!
Null fields are neither indexed nor searchable. I have written the pipeline below to remove such fields; please test it against all of your scenarios before using it. After posting documents through this pipeline, you won't be able to find the removed null fields with an "exists" query.
Pipeline:
PUT _ingest/pipeline/remove_null_fields
{
  "description": "Remove any null field",
  "processors": [
    {
      "script": {
        "source": """
          // return a list of field paths with null values
          def loopAllFields(def x) {
            def ret = [];
            if (x instanceof Map) {
              for (entry in x.entrySet()) {
                // skip fields whose names start with an underscore (metadata)
                if (entry.getKey().indexOf("_") == 0) {
                  continue;
                }
                def val = entry.getValue();
                if (val instanceof HashMap ||
                    val instanceof Map ||
                    val instanceof ArrayList) {
                  def list = [];
                  if (val instanceof ArrayList) {
                    def index = 0;
                    // recurse into each object of the array
                    for (v in val) {
                      list = loopAllFields(v);
                      for (item in list) {
                        ret.add(entry.getKey() + "[" + index + "]." + item);
                      }
                      index++;
                    }
                  } else {
                    list = loopAllFields(val);
                  }
                  if (list.size() == val.size()) {
                    ret.add(entry.getKey());
                  } else {
                    for (item in list) {
                      ret.add(entry.getKey() + "." + item);
                    }
                  }
                }
                if (val == null) {
                  ret.add(entry.getKey());
                }
              }
            }
            return ret;
          }

          /* remove a field from the source; recursively walks into nested fields */
          def removeField(def ctx, def fieldname) {
            def pos = fieldname.indexOf(".");
            if (pos > 0) {
              def str = fieldname.substring(0, pos);
              if (str.indexOf('[') > 0 && str.indexOf(']') > 0) {
                def s = str.substring(0, str.indexOf('['));
                def i = str.substring(str.indexOf('[') + 1, str.length() - 1);
                removeField(ctx[s][Integer.parseInt(i)], fieldname.substring(pos + 1, fieldname.length()));
              } else {
                if (ctx[str] instanceof Map) {
                  removeField(ctx[str], fieldname.substring(pos + 1, fieldname.length()));
                }
              }
            } else {
              ctx.remove(fieldname);
            }
            return ctx;
          }

          def list = [];
          list = loopAllFields(ctx);
          for (item in list) {
            removeField(ctx, item);
          }
        """
      }
    }
  ]
}
Post Document:
POST index8/_doc?pipeline=remove_null_fields
{
  "master_desc": "TESTING PART",
  "ddd": null,
  "date_added": "2019-10-24T09:30:03",
  "master_no": {
    "master_no": 18460110,
    "barcode": "NLSKYTEST1-1",
    "external_key": null,
    "umid": null
  }
}
Result:
"hits" : [
{
"_index" : "index8",
"_type" : "_doc",
"_id" : "06XAyXEBAWHHnYGOSa_M",
"_score" : 1.0,
"_source" : {
"date_added" : "2019-10-24T09:30:03",
"master_no" : {
"master_no" : 18460110,
"barcode" : "NLSKYTEST1-1"
},
"master_desc" : "TESTING PART"
}
}
]
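Side note: the script above is only needed because the fields are dynamic. If the null-prone fields were known up front, plain remove processors with an if condition would be enough. A minimal sketch using the field names from the sample payload:
PUT _ingest/pipeline/remove_known_null_fields
{
  "description": "Remove specific fields when they are null",
  "processors": [
    {
      "remove": {
        "field": "master_no.external_key",
        "if": "ctx.master_no?.external_key == null",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "master_no.umid",
        "if": "ctx.master_no?.umid == null",
        "ignore_missing": true
      }
    }
  ]
}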
@Jaspreet, the script almost worked. However, it didn't eliminate empty objects, empty arrays, or empty values. Here is a doc I tried to index:
{
  "master_desc": "TESTING PART",
  "date_added": "2019-10-24T09:30:03",
  "master_no": {
    "master_no": 18460110,
    "barcode": "NLSKYTEST1-1",
    "external_key": null,
    "umid": null
  },
  "remote_sync_state": "",
  "lib_title_footage": [],
  "prj_no": {
    "prj_no": null,
    "prj_desc": null
  }
}
The above returned:
{
  "master_desc": "TESTING PART",
  "date_added": "2019-10-24T09:30:03",
  "master_no": {
    "master_no": 18460110,
    "barcode": "NLSKYTEST1-1"
  },
  "remote_sync_state": "",
  "lib_title_footage": [],
  "prj_no": {}
}
I tried updating the script so the condition also checks for these patterns, but unfortunately I got a compile error.
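In case it helps, here is one way the final null test inside loopAllFields could be broadened to also flag empty strings, empty lists, and empty maps. This is an untested sketch; it only widens that one condition, so its interaction with the earlier Map/ArrayList branch still needs verification:
// replaces the `if (val == null)` block in loopAllFields
if (val == null ||
    (val instanceof String && val.isEmpty()) ||
    (val instanceof List && val.isEmpty()) ||
    (val instanceof Map && val.isEmpty())) {
  ret.add(entry.getKey());
}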

loopback REST API filter by nested data

I would like to filter by nested data through the REST API. For example, take these objects:
[
  {
    "name": "Handmade Soft Fish",
    "tags": "Rubber, Rubber, Salad",
    "categories": [
      {
        "name": "women",
        "id": 2,
        "parent_id": 0,
        "permalink": "/women"
      },
      {
        "name": "kids",
        "id": 3,
        "parent_id": 0,
        "permalink": "/kids"
      }
    ]
  },
  {
    "name": "Tasty Rubber Soap",
    "tags": "Granite, Granite, Chair",
    "categories": [
      {
        "name": "kids",
        "id": 3,
        "parent_id": 0,
        "permalink": "/kids"
      }
    ]
  }
]
This comes back from GET /api/products?filter[include]=categories,
and I would like to get only the products that have the category name "women". How do I do this?
LoopBack does not support filters based on related models.
This is a limitation that we have never had bandwidth to solve, unfortunately :(
For more details, see the discussion and linked issues here:
Filter on level 2 properties: https://github.com/strongloop/loopback/issues/517
Filter by properties of related models (use SQL JOIN in queries): https://github.com/strongloop/loopback/issues/683
Maybe you want to get this data via the Category REST API. For example:
GET /api/categories?filter[include]=products&filter[where][name]=women
The result will be a category object with all related products. For this, it will be necessary to declare the relation on the models.
Try it like this. It has worked for me:
const filter = {
  where: {
    'categories.name': {
      inq: ['women']
    }
  }
};
Pass this filter to the request as a query parameter, and the request would look like the one below:
GET /api/categories?filter=%7B%22where%22%3A%7B%22categories.name%22%3A%7B%22inq%22%3A%5B%22women%22%5D%7D%7D%7D
Can you share what it looks like without filter[include]=categories, please?
[edit]
After a few questions in the comments, I'd build a remote method in common/models/myModel.js (inside the function):
function getItems(filter, categorieIds = []) {
  return new Promise((resolve, reject) => {
    let newInclude;
    if (filter.hasOwnProperty("include")) {
      if (Array.isArray(filter.include)) {
        newInclude = [].concat(filter.include, "categories");
      } else {
        if (filter.include.length > 0) {
          newInclude = [].concat(filter.include, "categories");
        } else {
          newInclude = "categories";
        }
      }
    } else {
      newInclude = "categories";
    }
    myModel.find(Object.assign({}, filter, {include: newInclude}))
      .then(data => {
        if (data.length <= 0) return resolve(data);
        if (categorieIds.length <= 0) return resolve(data);
        // there goes your specific filter on categories
        const tmp = data.filter(
          item => item.categories.findIndex(
            categorie => categorieIds.indexOf(categorie.id) > -1
          ) > -1
        );
        return resolve(tmp);
      })
      .catch(reject);
  });
}
myModel.remoteMethod('getItems', {
  accepts: [{
    arg: "filter",
    type: "object",
    required: true
  }, {
    arg: "categorieIds",
    type: "array",
    required: true
  }],
  returns: {arg: 'getItems', type: 'array'}
});
I hope it answers your question...

Elasticsearch: use script to update nested field?

I want to add an object to the nested field on every update.
For example, I have this doc:
{
  "test": [{"remark": "remark1"}]
}
Next time, I want to add another remark object to the test field while keeping the old remark objects. The result should be:
{
  "test": [{"remark": "remark1"}, {"remark": "remark2"}]
}
How can I achieve this?
Edit
I used this script:
{
  "script": "ctx._source.test= ((ctx._source.test?: []) += remarkItem)",
  "params": {
    "remarkItem": {
      "remark": "addd"
    }
  }
}
But I get this exception:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[es77][10.14.84.77:9300][indices:data/write/update[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to execute script",
    "caused_by": {
      "type": "script_exception",
      "reason": "Failed to compile inline script [ctx._source.test= ((ctx._source.test?: []) += remarkItem)] using lang [groovy]",
      "caused_by": {
        "type": "script_exception",
        "reason": "failed to compile groovy script",
        "caused_by": {
          "type": "multiple_compilation_errors_exception",
          "reason": "startup failed:\na8220b2cf14b8b7ebeead7f068416882d04fa25d: 1: \nclass org.codehaus.groovy.ast.expr.ElvisOperatorExpression, with its value '(ctx._source.test) ? ctx._source.test: []', is a bad expression as the left hand side of an assignment operator at line: 1 column: 82. File: a8220b2cf14b8b7ebeead7f068416882d04fa25d # line 1, column 82.\n CILastCallResultRemark ?: []) += remarkI\n ^\n\n1 error\n"
        }
      }
    }
  },
  "status": 400
}
Edit
Now I want to add an id field so I can decide whether to update or insert the object.
For example:
{
  "test": [{"remark": "remark1", "id": "1"}]
}
When I update the field: if the id already exists, I want to update the matching object; otherwise, I want to insert a new one.
I suggest trying a script like this, which takes the new object (with its id and remark) as a parameter. It will check whether any of the nested objects already contains the given id:
if yes, it will update that object's remark
if not, it will insert a new nested object into the test array
The script goes like this:
def updated = false
ctx._source.test?.each { obj ->
  if (obj.id == item.id) {
    obj.remark = item.remark
    updated = true
  }
}
if (!updated) {
  ctx._source.test = ((ctx._source.test ?: []) + item)
}
After being inlined and with proper semicolons, the script looks like this:
{
  "script": "def updated = false; ctx._source.test?.each { obj -> if (obj.id == item.id) { obj.remark = item.remark; updated = true } }; if (!updated) { ctx._source.test = ((ctx._source.test ?: []) + item)}",
  "params": {
    "item": {
      "remark": "addd",
      "id": "1"
    }
  }
}
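Note that this script targets the legacy Groovy scripting language, which has been removed from recent Elasticsearch versions. On a modern cluster the same update-or-insert logic could be written in Painless; a minimal sketch, assuming the same doc structure and a hypothetical index and document id:
POST test_index/_update/1
{
  "script": {
    "lang": "painless",
    "source": """
      if (ctx._source.test == null) {
        ctx._source.test = [];
      }
      boolean updated = false;
      for (obj in ctx._source.test) {
        if (obj.id == params.item.id) {
          // id found: update the existing nested object
          obj.remark = params.item.remark;
          updated = true;
        }
      }
      if (!updated) {
        // id not found: append a new nested object
        ctx._source.test.add(params.item);
      }
    """,
    "params": {
      "item": {
        "remark": "addd",
        "id": "1"
      }
    }
  }
}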
