Elasticsearch - how to check for key exist script query parameters? - elasticsearch

Is there any way to return null if there is no corresponding value?
example) Since params do not have a field4 value, I want the null value to be returned to the syn variable.
"aggs": {
"countries": {
"terms": {
"script": {
"params": {
"field1": "country",
"field2": "test",
"field3": "test"
},
"inline": "def syn = field4; if (syn == null) doc[country].value;"
}
}
}
}
Currently, errors always occur if there is no corresponding value.
"caused_by": {
"type": "missing_property_exception",
"reason": "No such property: field1 for class: e5ce2464b456f9c0fa360269abc927e65998ecf7"
}
I am using groovy and elasticsearch version 2.2
I can't use Python or JavaScript, which requires additional plug-ins to be installed.
How can I get a null value without causing an error if there is no value?
Thanks

You have a boolean value called empty that indicates you whether the document has such a field or not.
So you should do it this way
"inline": "def syn = field4 ?: 'dummy'; return (doc[syn].empty) ? null : doc[syn].value;"
UPDATE: To detect a missing parameter variable in Groovy is trivial if we know the script class name. But since the script class is created dynamically (e.g. e5ce2464b456f9c0fa360269abc927e65998ecf7), it makes the process not trivial at all. One way to circumvent this is to add a try/catch block around your code, so that the code can fail but at least we can catch it, basically like this:
"inline": "try { def syn = field4; return (doc[syn].empty) ? null : doc[syn].value; } catch (e) { return null } "
However, ES introduced some security hardening and class whitelisting for scripting in 2.2. One way to achieve this is to whitelist a few Exception classes in your .java.policy file, like this:
grant {
permission org.elasticsearch.script.ClassPermission "java.lang.Throwable";
permission org.elasticsearch.script.ClassPermission "java.lang.Exception";
permission org.elasticsearch.script.ClassPermission "groovy.lang.GroovyException";
};

Related

How to prevent "Too many dynamic script compilations within" error with search templates?

I use a search template with "mustache" language to build dynamic queries according to different parameters.
When I often modify the values ​​of the parameters of this request, I get this error message :
[script] Too many dynamic script compilations within, max: [150/5m];
I think that each time the values ​​of the parameters change, the script is recompiled but if the values ​​are identical then elasticsearch uses a cache so as not to recompile the script.
In our case, the cache cannot be used because at each request the values ​​are always different (local timestamp, variable distance, random seed generated by a client...)
To prevent this error, I change the cluster settings to increase the max_compilations_rate value at the cost of higher server load.
Is there a way to limit recompilation ?
My "big" script computes score according to many parameters and uses Elasticsearch 8.2.
The structure of the script is as follows :
{
"script": {
"lang": "mustache",
"source": "...",
"params": { ... }
}
}
The source code looks like this :
{
"runtime_mappings": {
"is_opened": {
"type": "long",
"script": {
"source": " ... "
}
}
{{#user_location}}
,"distance": {
"type": "long",
"script": {
"source": " ... "
}
}
{{/user_location}}
},
"query": {
"script_score": {
"query": { ... }
},
"script": {
"source": " ... "
}
}
},
"fields": [
"is_opened"
{{#user_location}},"distance"{{/user_location}}
],
...
}
I use mustache variables (with double brackets) everywhere in the script :
in the computed fields ("is_opened", "distance")
in query and filters
in script score
Is there a way to "optimize" internal scripts (computed fields and score script) so as not to restart compilation each time the values for the parameters change ?
To avoid compilations, I need to use "params" inside the embedded runtime fields scripts and inside the query score script.
I had indeed used the parameters for the main script written in "mustache" but I had not done so for the embedded scripts written in "painless".
Thanks #Val for giving me a hint.

ElasticSearch scripting set an object value

I am trying to use a script to set several values of my Elastic document.
POST myindex/_update_by_query
{
"script" : {
"source": """
ctx._source.categories='categories';
ctx._source.myObject={};
""",
"lang": "painless"
},
"query": {
"term" : {
"name": "Tony"
}
}
}
But I can't set an object value with this painless language. However I write it, I get an error.
Is there a way to do this, maybe with a different script language ?
Thanks!
In order to create an object (i.e. a hash map), you should do it this way:
ctx._source.myObject = [:];

How to use carriage return in a script template with a runtime mapping field?

Here is an example that illustrates the problem we are having with "mustache" and the carriage return.
In our script template, we need :
a runtime mapping field : to compute a result (with a big script in our real case)
conditional template : to build search criteria according to params existence (many criteria in our real case)
We use Elasticsearch 7.16 and kibana debug console to make our tests.
We create this script template with this request :
POST _scripts/test
{
"script": {
"lang": "mustache",
"source": """{
"runtime_mappings": {
"result": {
"type": "long",
"script": {
"source": "emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})"
}
}
}
{{#foo}}
,"fields": [
"result"
]
{{/foo}}
}"""
}
}
Here are 2 examples of requests that show how this script works:
Request 1 : Search request with param
Return the computed field "result" with the "foo" parameter value (12345)
GET _search/template
{
"id": "test",
"params": {
"foo": 12345
}
}
Request 2 : Search request without param
Don't return computed field "result".
GET _search/template
{
"id": "test"
}
Like i said before, in our real case we have a very big "painless" script in the computed field.
For more readability, we therefore wrote this script on several lines and that's when a problem appears.
An error happened when we declare:
"source": "
emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})
"
instead of:
"source": "emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})"
Due to the JSON specifications, we cannot use carriage returns otherwise we get the following error:
Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
We also cannot use the notation with """ because it will conflict with the one used to declare the source of the script template.
Is there a trick to set the computed field script to multiple lines in Kibana debug console ?

How to rename a nested field containing dots with elasticsearch rename processor and ingest pipeline

I have a field in elasticsearch (5.5.1) which I need to rename because the name contains a '.' and it is causing various problems. The field I want to rename is nested inside another field.
I am trying to use a Rename Processor in an Ingest Pipeline to do a Reindex as described here: https://stackoverflow.com/a/43142634/5114
Here is my pipeline simulation request (you can copy this verbatim into the Dev Tools utility in Kibana to test it):
POST _ingest/pipeline/_simulate
{
"pipeline" : {
"description": "rename nested fields to remove dot",
"processors": [
{
"rename" : {
"field" : "message.message.group1",
"target_field" : "message_group1"
}
},
{
"rename" : {
"field" : "message.message.group2",
"target_field" : "message.message_group2"
}
}
]
},
"docs":[
{
"_type": "status",
"_id": "1509533940000-m1-bfd7183bf036bd346a0bcf2540c05a70fbc4d69e",
"_version": 5,
"_score": null,
"_source": {
"message": {
"_job-id": "AV8wHJEaa4J0sFOfcZI5",
"message.group1": 0,
"message.group2": "foo"
},
"timestamp": 1509533940000
}
}
]
}
The problem is that I get an error when trying to use my pipeline:
{
"docs": [
{
"error": {
"root_cause": [
{
"type": "exception",
"reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
"header": {
"processor_type": "rename"
}
}
],
"type": "exception",
"reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "java.lang.IllegalArgumentException: field [message.message.group1] doesn't exist",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "field [message.message.group1] doesn't exist"
}
},
"header": {
"processor_type": "rename"
}
}
}
]
}
I think the problem is caused by the field "message.group1" being inside another field ("message"). I'm not sure how to refer to the field I want in the context of the processor. It seems that there could be ambiguity between cases of nested fields, fields containing dots and nested fields containing dots.
I'm looking for the correct way to reference these fields, or if Elasticsearch can not do what I want, confirmation that this is not possible. If Elasticsearch can do this, then it will probably go very fast, else I have to write an external script to pull the documents, transform them, and re-save them to the new index.
Ok, investigating in the Elasticsearch code, I think I know why this won't work.
First we look at the Elasticsearch Rename Processor:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RenameProcessor.java#L76-L84
Object value = document.getFieldValue(field, Object.class);
document.removeField(field);
try {
document.setFieldValue(targetField, value);
} catch (Exception e) {
// setting the value back to the original field shouldn't as we just fetched it from that field:
document.setFieldValue(field, value);
throw e;
}
What this is doing is looking for the field to rename, getting its value, then removing the field and adding a new field with the same value but with the new name.
Now we look at what happens in document.getFieldValue:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/core/src/main/java/org/elasticsearch/ingest/IngestDocument.java#L101-L108
public <T> T getFieldValue(String path, Class<T> clazz) {
FieldPath fieldPath = new FieldPath(path);
Object context = fieldPath.initialContext;
for (String pathElement : fieldPath.pathElements) {
context = resolve(pathElement, path, context);
}
return cast(path, context, clazz);
}
Notice it uses a FieldPath object to represent the path to the field in the document.
Now look at how the FieldPath represents the path:
https://github.com/elastic/elasticsearch/blob/9eff18374d68355f6acb58940a796268c9b6f2de/core/src/main/java/org/elasticsearch/ingest/IngestDocument.java#L688
this.pathElements = newPath.split("\\.");
This is splitting the path on any "." character, because that is the delimiter between path elements in field names.
The problem is that the source document has a field named "message.group1", so we need to be able to reference that. Just splitting the path on "." does not account for field names containing a "." in the name. We would need a syntax more like javascript for that, where we could use brackets and quotes to make the dot mean something different.
If the source documents were all transformed so that a "." in the field name would turn that field into an object before saving, then this path scheme would work. But with source documents having field names containing "." we can not reference them in certain contexts.
To solve my problem and reindex my index, I wrote a python script which pulled a batch of documents, transformed them and bulk inserted them in a new index. This is basically what the Elasticsearch reindex api does, but I did it in python instead.
More than two year later, I come across the same issue. You can manage to have your dotted-properties expanded to real nested objects with the the dot_expander processor.
Expands a field with dots into an object field. This processor allows fields with dots in the name to be accessible by other processors in the pipeline. Otherwise these fields can’t be accessed by any processor
Issue 37507 on Elasticsearch's Github pointed me in the right direction.

ElasticSearch: Using an existing field in script param

I am trying to create a nested object and set the field value to be a document fields value. I can create a non nested field with my logic value and I can create a nested field with a hard coded value. But I cannot get the two of these things to work together.
Here is what I have so far.
Create a nested field:
{
"script": "ctx._source.displayFields = displayField",
"params": {
"displayField": {
"displayField": 11
}
}
}
Or I can use a script to fetch the value and sent a field like this:
{
"script" : "if (ctx._source['fielda'] == 'term1') {
ctx._source['displayField'] = ctx._source['field2']; }
else if (ctx._source['fielda'] == 'term2') {
ctx._source['displayFields.displayPrice'] = ctx._source['fieldb'];
}
But if I try and put a script in the param field like either of the below I always get an error. Any advice would be greatly appreciated.
Things I have tried and not worked:
{
"script": "ctx._source.displayFields = displayField",
"params": {
"displayField": {
"displayField": "tag"
},
"tag" : {
"script": "ctx._source['numberField']"
}
}
}
As well as trying to assign a script as its subfield or putting it as the value.

Resources