How to prevent "Too many dynamic script compilations within" error with search templates? - elasticsearch

I use a search template with the "mustache" language to build dynamic queries from different parameters.
When I modify the parameter values of this request frequently, I get this error message:
[script] Too many dynamic script compilations within, max: [150/5m];
My understanding is that the script is recompiled each time the parameter values change, but if the values are identical then Elasticsearch uses a cache and does not recompile the script.
In our case, the cache cannot be used because the values are different on every request (local timestamp, variable distance, random seed generated by a client...).
To work around this error, I changed the cluster settings to increase the max_compilations_rate value, at the cost of higher server load.
Is there a way to limit recompilation?
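For reference, the workaround mentioned above is a cluster settings update along these lines (the rate value here is only an example, not a recommendation):
PUT _cluster/settings
{
  "persistent": {
    "script.max_compilations_rate": "300/5m"
  }
}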
My "big" script computes score according to many parameters and uses Elasticsearch 8.2.
The structure of the script is as follows :
{
  "script": {
    "lang": "mustache",
    "source": "...",
    "params": { ... }
  }
}
The source code looks like this:
{
  "runtime_mappings": {
    "is_opened": {
      "type": "long",
      "script": {
        "source": " ... "
      }
    }
    {{#user_location}}
    ,"distance": {
      "type": "long",
      "script": {
        "source": " ... "
      }
    }
    {{/user_location}}
  },
  "query": {
    "script_score": {
      "query": { ... },
      "script": {
        "source": " ... "
      }
    }
  },
  "fields": [
    "is_opened"
    {{#user_location}},"distance"{{/user_location}}
  ],
  ...
}
I use mustache variables (with double curly braces) everywhere in the script:
in the computed fields ("is_opened", "distance")
in the query and filters
in the script score
Is there a way to "optimize" the internal scripts (the computed fields and the score script) so that they are not recompiled each time the parameter values change?

To avoid the compilations, I need to use "params" inside the embedded runtime field scripts and inside the query score script.
I had indeed used parameters for the main script written in "mustache", but I had not done so for the embedded scripts written in "painless".
Thanks @Val for giving me the hint.
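To illustrate (a minimal sketch, not the actual production script; the field names location, user_lat and user_lon are made up): when mustache substitutes the value directly into the painless source, every new value produces a new painless script to compile.
"distance": {
  "type": "long",
  "script": {
    "source": "emit((long) doc['location'].arcDistance({{user_lat}}, {{user_lon}}))"
  }
}
If the value is passed through the embedded script's own "params" instead, the painless source stays identical from one request to the next and can be served from the compilation cache:
"distance": {
  "type": "long",
  "script": {
    "source": "emit((long) doc['location'].arcDistance(params.lat, params.lon))",
    "params": {
      "lat": {{user_lat}},
      "lon": {{user_lon}}
    }
  }
}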

Related

ElasticSearch scripting set an object value

I am trying to use a script to set several values of my Elastic document.
POST myindex/_update_by_query
{
  "script": {
    "source": """
      ctx._source.categories='categories';
      ctx._source.myObject={};
    """,
    "lang": "painless"
  },
  "query": {
    "term": {
      "name": "Tony"
    }
  }
}
But I can't set an object value with the painless language. No matter how I write it, I get an error.
Is there a way to do this, maybe with a different script language?
Thanks!
In order to create an object (i.e. a hash map), you should do it this way:
ctx._source.myObject = [:];
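Put back into the original request, the corrected script might look like this (a sketch; "someField" and "someValue" are only placeholders to show that keys can then be written into the new object):
POST myindex/_update_by_query
{
  "script": {
    "source": """
      ctx._source.categories = 'categories';
      ctx._source.myObject = [:];
      // hypothetical key, only to illustrate writing into the new object
      ctx._source.myObject.someField = 'someValue';
    """,
    "lang": "painless"
  },
  "query": {
    "term": {
      "name": "Tony"
    }
  }
}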

How to use carriage return in a script template with a runtime mapping field?

Here is an example that illustrates the problem we are having with "mustache" and the carriage return.
In our script template, we need:
a runtime mapping field: to compute a result (with a big script in our real case)
a conditional template: to build search criteria depending on whether params exist (many criteria in our real case)
We use Elasticsearch 7.16 and the Kibana Dev Tools console for our tests.
We create the script template with this request:
POST _scripts/test
{
  "script": {
    "lang": "mustache",
    "source": """{
      "runtime_mappings": {
        "result": {
          "type": "long",
          "script": {
            "source": "emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})"
          }
        }
      }
      {{#foo}}
      ,"fields": [
        "result"
      ]
      {{/foo}}
    }"""
  }
}
Here are 2 examples of requests that show how this template works:
Request 1: search request with the param
Returns the computed field "result" with the "foo" parameter value (12345).
GET _search/template
{
  "id": "test",
  "params": {
    "foo": 12345
  }
}
Request 2: search request without the param
Does not return the computed field "result".
GET _search/template
{
  "id": "test"
}
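As a side note (not part of the original question), the render search template API is convenient for checking what the conditional sections expand to for a given set of params:
POST _render/template
{
  "id": "test",
  "params": {
    "foo": 12345
  }
}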
Like I said before, in our real case we have a very big "painless" script in the computed field.
For readability, we therefore wrote this script over several lines, and that's when the problem appears.
An error happens when we declare:
"source": "
emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})
"
instead of:
"source": "emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})"
Due to the JSON specification, we cannot use literal carriage returns, otherwise we get the following error:
Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
We also cannot use the """ notation, because it would conflict with the one already used to declare the source of the script template.
Is there a trick to write the computed field script over multiple lines in the Kibana Dev Tools console?

How to change the field type in an ElasticSearch Index?

I have index_A, which includes a number field "foo".
I copy the mapping for index_A, and make a dev tools call PUT /index_B with the field foo changed to text, so the mapping portion of that is:
"foo": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
I then reindex index_A to index_B with:
POST _reindex
{
  "source": {
    "index": "index_A"
  },
  "dest": {
    "index": "index_B"
  }
}
When I go to view any document for index_B, the entry for the "foo" field is still a number. (I was expecting, for example, "foo": 30 to become "foo": "30" in the new document's source.)
As much as I've read on mappings and reindexing, I'm still at a loss as to how to accomplish this. What specifically do I need to run in order to get this new index with "foo" as a text field, and all number entries for foo in the original index changed to text entries in the new index?
There's a distinction between how a field is stored vs indexed in ES. What you see inside of _source is stored and it's the "original" document that you've ingested. But there's no explicit casting based on the mapping type -- ES stores what it receives but then proceeds to index it as defined in the mapping.
In order to verify how a field was indexed, you can inspect the script stack returned in:
GET index_b/_search
{
  "script_fields": {
    "debugging_foo": {
      "script": {
        "source": "Debug.explain(doc['foo'])"
      }
    }
  }
}
as opposed to how a field was stored:
GET index_b/_search
{
  "script_fields": {
    "debugging_foo": {
      "script": {
        "source": "Debug.explain(params._source['foo'])"
      }
    }
  }
}
So in other words, rest assured that foo was indeed indexed as text + keyword.
If you'd like to explicitly cast a field value into a different data type in the _source, you can apply a script along the lines of:
POST _reindex
{
  "source": {
    "index": "index_a"
  },
  "dest": {
    "index": "index_b"
  },
  "script": {
    "source": "ctx._source.foo = '' + ctx._source.foo"
  }
}
I'm not overly familiar with Java, but I think ... = ctx._source.foo.toString() would work too.
FYI there's a coerce mapping parameter which sounds like it could be of use here but it only works the other way around -- casting/parsing from strings to numerical types etc.
FYI#2 There's a pipeline processor called convert that does exactly what I did in the above script, and more. (A pipeline is a pre-processor that runs before the fields are indexed in ES.) The good thing about pipelines is that they can be run as part of the _reindex process too.
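For completeness (my own sketch, not from the answer above), a convert processor used during the reindex could look like this; the pipeline name is arbitrary:
PUT _ingest/pipeline/foo-to-string
{
  "processors": [
    {
      "convert": {
        "field": "foo",
        "type": "string"
      }
    }
  ]
}

POST _reindex
{
  "source": {
    "index": "index_a"
  },
  "dest": {
    "index": "index_b",
    "pipeline": "foo-to-string"
  }
}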

Is there a way to update a document with a Painless script without changing the order of unaffected fields?

I'm using Elasticsearch's Update by Query API to update some documents with a Painless script like this (the actual query is more complicated):
POST ts-scenarios/_update_by_query?routing=test
{
  "query": {
    "term": { "routing": { "value": "test" } }
  },
  "script": {
    "source": """ctx._source.tagIDs = ["5T8QLHIBB_kDC9Ugho68"]"""
  }
}
This works, except that upon reindexing, other fields get reordered, including some classes which are automatically (de)serialized using JSON.NET's type handling. That means a document with the following source before the update:
{
  "routing" : "testsuite",
  "activities" : [
    {
      "$type" : "Test.Models.SomeActivity, Test"
    },
    {
      "$type" : "Test.Models.AnotherActivity, Test",
      "CustomParameter" : 1,
      "CustomSetting" : false
    }
  ]
}
ends up as
{
  "routing" : "testsuite",
  "activities" : [
    {
      "$type" : "Test.Models.SomeActivity, Test"
    },
    {
      "CustomParameter" : 1,
      "CustomSetting" : false,
      "$type" : "Test.Models.AnotherActivity, Test"
    }
  ],
  "tagIDs" : [
    "5T8QLHIBB_kDC9Ugho68"
  ]
}
which JSON.NET can't deserialize. Is there a way I can tell the script (or the Update by Query API) not to change the order of those other fields?
In case it matters, I'm using Elasticsearch OSS version 7.6.1 on macOS. I haven't checked whether an Ingest pipeline would work here, as I'm not familiar with them.
(It turns out I can make the deserialization more flexible by setting the MetadataPropertyHandling property to ReadAhead, as mentioned here. That works, but as mentioned it may hurt performance and there might be other situations where field order matters. Technically, it shouldn't; JSON isn't XML, but there are always edge cases where it does matter.)

Elasticsearch. Painless script to search based on the last result

Let's see if someone can shed some light on this one, which seems to be a little hard.
We need to correlate data from multiple indices and various fields. We are trying a painless script.
Example:
We run a search on an index to gather the queueid of mails sent by someone#domain.
Once we have the queueids, we need to store them in an array and iterate over it to run new searches that gather data like email receivers, spam checks, postfix results and so on.
Problem: how can we store the data from one search and use it later in the second search?
We are testing something like:
GET here_an_index/_search
{
  "query": {
    "bool" : {
      "must": [
        {
          "range": {
            "#timestamp": {
              "gte": "now-15m",
              "lte": "now"
            }
          }
        }
      ],
      "filter" : {
        "script" : {
          "script" : {
            "source" : "doc['postfix_from'].value == params.from; qu = doc['postfix_queueid'].value; return qu",
            "params" : {
              "from" : "someona#mdomain"
            }
          }
        }
      }
    }
  }
}
And, of course, it throws an error.
"doc['postfix_from'].value ...",
"^---- HERE"
So, in a nutshell: is there any way to execute a search looking for some field value based on a filter (like from:someone#dfomain) and use those values in later searches?
We have evaluated using script fields or nested fields, but due to architecture reasons and what those changes would entail, they cannot be used right now.
Thank you very much!
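A side note (not an answer from the thread): the script inside a script query is expected to evaluate to a boolean, so a version that only filters on the sender would look like the sketch below (assuming postfix_from is a keyword field, as in the question); it cannot also return the queue id for use in a later search.
GET here_an_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "#timestamp": {
              "gte": "now-15m",
              "lte": "now"
            }
          }
        }
      ],
      "filter": {
        "script": {
          "script": {
            "source": "doc['postfix_from'].value == params.from",
            "params": {
              "from": "someona#mdomain"
            }
          }
        }
      }
    }
  }
}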
