How to use a carriage return in a script template with a runtime mapping field?

Here is an example that illustrates the problem we are having with "mustache" and carriage returns.
In our script template, we need:
a runtime mapping field: to compute a result (with a big script in our real case)
a conditional template: to build search criteria according to which params are present (many criteria in our real case)
We use Elasticsearch 7.16 and the Kibana Dev Tools console for our tests.
We create this script template with this request:
POST _scripts/test
{
  "script": {
    "lang": "mustache",
    "source": """{
      "runtime_mappings": {
        "result": {
          "type": "long",
          "script": {
            "source": "emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})"
          }
        }
      }
      {{#foo}}
      ,"fields": [
        "result"
      ]
      {{/foo}}
    }"""
  }
}
Here are 2 examples of requests that show how this script works.
Request 1: search request with the param
Returns the computed field "result" with the "foo" parameter value (12345):
GET _search/template
{
  "id": "test",
  "params": {
    "foo": 12345
  }
}
Request 2: search request without the param
Does not return the computed field "result":
GET _search/template
{
  "id": "test"
}
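As a side note (not part of the original question), the render search template API is handy for checking what the conditional template expands to, without actually running the search:
POST _render/template
{
  "id": "test",
  "params": {
    "foo": 12345
  }
}
The response contains the rendered search body, so you can check whether the conditional "fields" section was included.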
Like I said before, in our real case we have a very big "painless" script in the computed field.
For readability, we therefore wrote this script on several lines, and that's when a problem appears.
An error occurs when we declare:
"source": "
emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})
"
instead of:
"source": "emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})"
Due to the JSON specification, we cannot use raw carriage returns inside a string value, otherwise we get the following error:
Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
We also cannot use the """ notation, because it would conflict with the one already used to declare the source of the script template.
Is there a trick to write the computed field's script on multiple lines in the Kibana Dev Tools console?
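One possible workaround (a sketch, not an answer from the original thread): since the inner "source" is itself a string inside the rendered JSON search body, line breaks can be written as literal \n escape sequences. The outer triple quotes pass them through verbatim, and the JSON parser of the rendered body turns them back into real newlines for the Painless compiler:
// sketch: same template as above, with the inner script split using \n escapes
POST _scripts/test
{
  "script": {
    "lang": "mustache",
    "source": """{
      "runtime_mappings": {
        "result": {
          "type": "long",
          "script": {
            "source": "long v = {{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}};\nemit(v)"
          }
        }
      }
      {{#foo}}
      ,"fields": [ "result" ]
      {{/foo}}
    }"""
  }
}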

Related

How to prevent "Too many dynamic script compilations within" error with search templates?

I use a search template with the "mustache" language to build dynamic queries according to different parameters.
When I frequently modify the parameter values of this request, I get this error message:
[script] Too many dynamic script compilations within, max: [150/5m];
I think that each time the parameter values change, the script is recompiled, but if the values are identical then Elasticsearch uses a cache so as not to recompile the script.
In our case, the cache cannot be used, because the values are different on every request (local timestamp, variable distance, random seed generated by a client...).
To prevent this error, I change the cluster settings to increase the max_compilations_rate value at the cost of higher server load.
Is there a way to limit recompilation?
My "big" script computes score according to many parameters and uses Elasticsearch 8.2.
The structure of the script is as follows :
{
  "script": {
    "lang": "mustache",
    "source": "...",
    "params": { ... }
  }
}
The source code looks like this:
{
  "runtime_mappings": {
    "is_opened": {
      "type": "long",
      "script": {
        "source": " ... "
      }
    }
    {{#user_location}}
    ,"distance": {
      "type": "long",
      "script": {
        "source": " ... "
      }
    }
    {{/user_location}}
  },
  "query": {
    "script_score": {
      "query": { ... },
      "script": {
        "source": " ... "
      }
    }
  },
  "fields": [
    "is_opened"
    {{#user_location}},"distance"{{/user_location}}
  ],
  ...
}
I use mustache variables (with double curly brackets) everywhere in the script:
in the computed fields ("is_opened", "distance")
in the query and filters
in the script score
Is there a way to "optimize" the internal scripts (computed fields and score script) so that they are not recompiled each time the parameter values change?
To avoid recompilations, I need to use "params" inside the embedded runtime field scripts and inside the score script.
I had used parameters for the main script written in "mustache", but I had not done so for the embedded scripts written in "painless".
Thanks @Val for giving me a hint.
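For illustration (a minimal sketch, not the poster's real script; the closes_at field, assumed to hold epoch milliseconds, and the now parameter are made up), pass the mustache values through the Painless params block instead of splicing them into the Painless source:
// sketch: "closes_at" and "now" are illustrative names
POST _scripts/my_search
{
  "script": {
    "lang": "mustache",
    "source": """{
      "runtime_mappings": {
        "is_opened": {
          "type": "long",
          "script": {
            "source": "emit(doc['closes_at'].value > params.now ? 1 : 0)",
            "params": {
              "now": {{now}}
            }
          }
        }
      }
    }"""
  }
}
Since the Painless source text no longer changes between requests, it is compiled once and served from the script cache afterwards; only the rendered params differ.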

Elasticsearch - how to check whether a key exists in script query parameters?

Is there any way to return null if there is no corresponding value?
Example: since params does not have a field4 value, I want null to be assigned to the syn variable.
"aggs": {
"countries": {
"terms": {
"script": {
"params": {
"field1": "country",
"field2": "test",
"field3": "test"
},
"inline": "def syn = field4; if (syn == null) doc[country].value;"
}
}
}
}
Currently, errors always occur if there is no corresponding value.
"caused_by": {
"type": "missing_property_exception",
"reason": "No such property: field1 for class: e5ce2464b456f9c0fa360269abc927e65998ecf7"
}
I am using Groovy and Elasticsearch version 2.2.
I can't use Python or JavaScript, which require additional plugins to be installed.
How can I get a null value without causing an error if there is no value?
Thanks
You have a boolean value called empty that indicates whether the document has such a field or not.
So you should do it this way:
"inline": "def syn = field4 ?: 'dummy'; return (doc[syn].empty) ? null : doc[syn].value;"
UPDATE: Detecting a missing parameter variable in Groovy would be trivial if we knew the script class name. But since the script class is created dynamically (e.g. e5ce2464b456f9c0fa360269abc927e65998ecf7), the process is not trivial at all. One way to circumvent this is to add a try/catch block around your code, so that the code can fail but at least we can catch the failure, basically like this:
"inline": "try { def syn = field4; return (doc[syn].empty) ? null : doc[syn].value; } catch (e) { return null } "
However, ES introduced security hardening and class whitelisting for scripting in 2.2, so the catch block above is not allowed by default. One way to make it work is to whitelist a few Exception classes in your .java.policy file, like this:
grant {
  permission org.elasticsearch.script.ClassPermission "java.lang.Throwable";
  permission org.elasticsearch.script.ClassPermission "java.lang.Exception";
  permission org.elasticsearch.script.ClassPermission "groovy.lang.GroovyException";
};

How to change the field type in an ElasticSearch Index?

I have index_A, which includes a number field "foo".
I copy the mapping for index_A, and make a dev tools call PUT /index_B with the field foo changed to text, so the mapping portion of that is:
"foo": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
I then reindex index_A to index_B with:
POST _reindex
{
  "source": {
    "index": "index_A"
  },
  "dest": {
    "index": "index_B"
  }
}
When I go to view any document for index_B, the entry for the "foo" field is still a number. (I was expecting for example: "foo": 30 to become "foo": "30" in the new document's source).
As much as I've read on Mappings and reindexing, I'm still at a loss on how to accomplish this. What specifically do I need to run in order to get this new index with "foo" as a text field, and all number entries for foo in the original index changed to text entries in the new index?
There's a distinction between how a field is stored vs indexed in ES. What you see inside of _source is stored and it's the "original" document that you've ingested. But there's no explicit casting based on the mapping type -- ES stores what it receives but then proceeds to index it as defined in the mapping.
In order to verify how a field was indexed, you can inspect the script_stack returned in the error response of:
GET index_b/_search
{
  "script_fields": {
    "debugging_foo": {
      "script": {
        "source": "Debug.explain(doc['foo'])"
      }
    }
  }
}
as opposed to how a field was stored:
GET index_b/_search
{
  "script_fields": {
    "debugging_foo": {
      "script": {
        "source": "Debug.explain(params._source['foo'])"
      }
    }
  }
}
So in other words, rest assured that foo was indeed indexed as text + keyword.
If you'd like to explicitly cast a field value into a different data type in the _source, you can apply a script along the lines of:
POST _reindex
{
  "source": {
    "index": "index_a"
  },
  "dest": {
    "index": "index_b"
  },
  "script": {
    "source": "ctx._source.foo = '' + ctx._source.foo"
  }
}
I'm not overly familiar with Java, but I think ... = ctx._source.foo.toString() would work too.
FYI there's a coerce mapping parameter which sounds like it could be of use here but it only works the other way around -- casting/parsing from strings to numerical types etc.
FYI #2: there's an ingest pipeline processor called convert that does exactly what I did in the above script, and more. (An ingest pipeline is a pre-processor that runs before the fields are indexed in ES.) The good thing about pipelines is that they can be run as part of the _reindex process too.
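For instance (a sketch; the pipeline name foo_to_string is made up), you could define a convert pipeline and reference it from the reindex destination:
// sketch: pipeline name is illustrative
PUT _ingest/pipeline/foo_to_string
{
  "processors": [
    {
      "convert": {
        "field": "foo",
        "type": "string"
      }
    }
  ]
}
POST _reindex
{
  "source": {
    "index": "index_a"
  },
  "dest": {
    "index": "index_b",
    "pipeline": "foo_to_string"
  }
}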

How to write multiline Elasticsearch scripts with Postman

I'm trying to do an Elasticsearch GET query with a very simple script using Postman. When the script is all on one line it works, but if I try to split it over multiple lines I get an error.
I'm sending the data as JSON with Content-Type: application/json in the header.
Example - Works:
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "my_custom_field": {
      "script": {
        "lang": "painless",
        "source": "int count = 1; return count;"
      }
    }
  }
}
Example - Produces Error:
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "my_custom_field": {
      "script": {
        "lang": "painless",
        "source": """
        int count = 1;
        return count;
        """
      }
    }
  }
}
The error:
"Unexpected character ('\"' (code 34)): was expecting comma to separate Object entries\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#20493763; line: 9, column: 18]"
I think Postman may be adding line breaks behind the scenes.
Triple-quotes in JSON are technically not valid -- see this thread for more info.
You've essentially got 3 options:
Write a script which takes in multiline "JSON" text and produces single-line, valid JSON (what I often did before multiline backtick strings were a thing in JavaScript, and still do in PHP):
function compactifyMultilineString( $input_string )
{
    return str_replace( array( "\r", "\n", "\t" ), " ", $input_string );
}
Use Postman's own pre-request scripts
Or, probably the most reasonable option, set up Kibana right next to your Elasticsearch server. Kibana is great for testing out queries, and its console also supports a Postman-ready copy feature (Copy as cURL).
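Another small workaround (a sketch based on the failing example above): keep plain JSON and write the line breaks yourself as \n escape sequences inside the string. This is valid JSON, so Postman sends it unchanged and Painless still sees a multiline script:
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "my_custom_field": {
      "script": {
        "lang": "painless",
        "source": "int count = 1;\nreturn count;"
      }
    }
  }
}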

Match query return records only if query contains all words of object's field

I read about match and multiword queries but it seems that I need to do something a bit different.
Let's say I have the following query: "this is a test", and I want to search for it in one field called "text". I want to get objects which match any part of that query (it doesn't matter how many words), but only those objects whose "text" field contains no word that is missing from the query.
For the query "this is a test", I want to get these objects:
obj1: { "text": "this is a test" }
obj2: { "text": "this is a" }
obj3: { "text": "is a" }
obj4: { "text": "test" }
But if an object has something more in its "text" field, it should not be returned, for example:
obj5: { "text": "this is a test and something more" }
Is it possible to achieve this using Elasticsearch?
It's kind of a hack, but I was able to get it to work with a script filter:
POST /test_index/_search
{
  "query": {
    "match": {
      "text": "this is a test"
    }
  },
  "filter": {
    "script": {
      "script": "for(val in doc[\"text\"].values){ if(!(val in terms)){ return false; }}; return true;",
      "params": {
        "terms": ["this", "is", "a", "test"]
      }
    }
  }
}
I thought there would be a better way to do this, but wasn't immediately able to come up with one. Using scripting can be problematic in production, unless your ES cluster is behind an auth wall of some kind.
Anyway, here's the code I used to test it:
http://sense.qbox.io/gist/3929abc89d71ebf724e6121b1b5ba6da54501088
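As an aside (a sketch of a swapped-in technique, not part of the original answer): on more recent Elasticsearch versions, the terms_set query can do this without scripting, assuming you index an extra field, here called required_matches, holding the token count of "text". A document matches only if at least required_matches of the supplied terms occur in its "text" field, and setting required_matches to the field's token count means every token must be covered by the query:
// sketch: assumes each document stores its token count in "required_matches"
GET /test_index/_search
{
  "query": {
    "terms_set": {
      "text": {
        "terms": ["this", "is", "a", "test"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}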
