How to write multiline Elasticsearch scripts with Postman

I'm trying to do an Elasticsearch GET query with a very simple script using Postman. When the script is all on one line it works, but if I spread it over multiple lines I get an error.
I'm sending the data as JSON with Content-Type: application/json in the header.
Example - Works:
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "my_custom_field": {
      "script": {
        "lang": "painless",
        "source": "int count = 1; return count;"
      }
    }
  }
}
Example - Produces Error:
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "my_custom_field": {
      "script": {
        "lang": "painless",
        "source": """
          int count = 1;
          return count;
        """
      }
    }
  }
}
The error:
"Unexpected character ('\"' (code 34)): was expecting comma to separate Object entries\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@20493763; line: 9, column: 18]"
I think Postman may be adding line breaks behind the scenes.

Triple quotes are not valid JSON (see this thread for more info). They are a convenience of the Kibana Dev Tools console, which converts them into an ordinary escaped JSON string before sending the request.
You have essentially three options:
Write a script that takes the multiline "JSON" text and collapses its line breaks so that the result is valid JSON (what I often did before multiline backtick strings existed in JavaScript, and still do in PHP):
// Replaces line breaks and tabs with spaces so the text fits in one JSON string.
function compactifyMultilineString( $input_string )
{
    return str_replace( array( "\r", "\n", "\t" ), " ", $input_string );
}
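A rough Python take on the first option, assuming you generate the request body with a small script before pasting it into Postman. Instead of collapsing the line breaks, `json.dumps` escapes them as `\n`, which is legal inside a JSON string and keeps the Painless source readable. The helper name `build_script_fields_body` is hypothetical.

```python
import json

# Multiline Painless source, written the way you would like to read it.
PAINLESS_SOURCE = """
int count = 1;
return count;
"""

def build_script_fields_body(source: str) -> str:
    """Return a JSON request body with the script embedded as a valid string.

    json.dumps escapes the line breaks as \\n, so no raw newline ends up
    inside the "source" string (which is what JSON forbids).
    """
    body = {
        "query": {"match_all": {}},
        "script_fields": {
            "my_custom_field": {
                "script": {"lang": "painless", "source": source.strip()}
            }
        },
    }
    return json.dumps(body, indent=2)

if __name__ == "__main__":
    # Paste the printed body into Postman's raw JSON body editor.
    print(build_script_fields_body(PAINLESS_SOURCE))
```

Painless also understands the escaped form directly, so you could equally type `"source": "int count = 1;\nreturn count;"` by hand.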
Use Postman's own pre-request scripts.
Or, probably the most reasonable option: set up Kibana right next to your Elasticsearch server. Kibana is great for testing queries, and it also supports a Postman-ready copy feature.

Related

How to prevent "Too many dynamic script compilations within" error with search templates?

I use a search template written in Mustache to build dynamic queries based on different parameters.
When I frequently change the parameter values of this request, I get this error message:
[script] Too many dynamic script compilations within, max: [150/5m];
I think that each time the parameter values change, the script is recompiled, but if the values are identical, Elasticsearch uses a cache so it does not recompile the script.
In our case, the cache cannot be used because the values are different on every request (local timestamp, variable distance, random seed generated by a client...).
To prevent this error, I changed the cluster settings to increase the max_compilations_rate value, at the cost of higher server load.
Is there a way to limit recompilation?
My "big" script computes a score from many parameters and runs on Elasticsearch 8.2.
The structure of the script is as follows:
{
  "script": {
    "lang": "mustache",
    "source": "...",
    "params": { ... }
  }
}
The source code looks like this:
{
  "runtime_mappings": {
    "is_opened": {
      "type": "long",
      "script": {
        "source": " ... "
      }
    }
    {{#user_location}}
    ,"distance": {
      "type": "long",
      "script": {
        "source": " ... "
      }
    }
    {{/user_location}}
  },
  "query": {
    "script_score": {
      "query": { ... },
      "script": {
        "source": " ... "
      }
    }
  },
  "fields": [
    "is_opened"
    {{#user_location}},"distance"{{/user_location}}
  ],
  ...
}
I use Mustache variables (double curly braces) everywhere in the script:
in the computed fields ("is_opened", "distance")
in the query and filters
in the script score
Is there a way to "optimize" the internal scripts (computed fields and score script) so that they are not recompiled each time the parameter values change?
To avoid recompilations, I need to use "params" inside the embedded runtime field scripts and inside the score script of the query.
I had used parameters for the main script written in Mustache, but not for the embedded scripts written in Painless.
Thanks @Val for the hint.
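To show why moving the values into "params" helps: Elasticsearch caches compiled scripts keyed on their language, source and options. An embedded Painless source that has a Mustache value rendered into it is a new script for every distinct value. A source that reads `params.*` stays byte-for-byte identical. The Python sketch below only models that cache key and does not contact Elasticsearch. The parameter name `user_lat`, the Painless expression and the helper names are all hypothetical.

```python
def inline_script(lat: float) -> dict:
    # The value is baked into the Painless source, which is what a Mustache
    # {{variable}} inside the embedded script effectively produces.
    return {"source": f"emit((long) ({lat} * 1000));"}

def parameterized_script(lat: float) -> dict:
    # The source never changes; only params differ between requests.
    return {"source": "emit((long) (params.user_lat * 1000));",
            "params": {"user_lat": lat}}

def distinct_compilations(scripts: list) -> int:
    # Simplified model of the compiled-script cache: one compilation per
    # distinct source string.
    return len({s["source"] for s in scripts})

if __name__ == "__main__":
    values = [48.85, 48.86, 48.87, 48.88]
    print(distinct_compilations([inline_script(v) for v in values]))        # 4
    print(distinct_compilations([parameterized_script(v) for v in values])) # 1
```

In the search template, the Mustache variable then belongs in the `params` object of the embedded script rather than inside its `source`.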

How to use carriage return in a script template with a runtime mapping field?

Here is an example that illustrates the problem we are having with Mustache and carriage returns.
In our script template, we need:
a runtime mapping field: to compute a result (with a big script in our real case)
a conditional template: to build search criteria depending on which params are present (many criteria in our real case)
We use Elasticsearch 7.16 and the Kibana Dev Tools console for our tests.
We create this script template with this request:
POST _scripts/test
{
  "script": {
    "lang": "mustache",
    "source": """{
      "runtime_mappings": {
        "result": {
          "type": "long",
          "script": {
            "source": "emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})"
          }
        }
      }
      {{#foo}}
      ,"fields": [
        "result"
      ]
      {{/foo}}
    }"""
  }
}
Here are two example requests that show how this script works:
Request 1: search request with the param
Returns the computed field "result" with the "foo" parameter value (12345).
GET _search/template
{
  "id": "test",
  "params": {
    "foo": 12345
  }
}
Request 2: search request without the param
Does not return the computed field "result".
GET _search/template
{
  "id": "test"
}
As I said before, in our real case we have a very big Painless script in the computed field.
For readability, we wrote this script on several lines, and that's when the problem appears.
An error occurs when we declare:
"source": "
emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})
"
instead of:
"source": "emit({{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}})"
Because of the JSON specification, a string cannot contain raw line breaks; otherwise we get the following error:
Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
We also cannot use the """ notation, because it would conflict with the one used to declare the source of the script template.
Is there a trick for writing the computed field script on multiple lines in the Kibana Dev Tools console?
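One possible workaround, offered as an assumption rather than a confirmed answer from this thread: generate the template body outside the console. The multiline Painless script is then embedded through a JSON serializer that escapes its line breaks as `\n`, so the rendered search body stays valid JSON. The sketch builds the body for `POST _scripts/test` in Python. The Painless statements and the helper names are illustrative, and the Mustache tags are left untouched for Elasticsearch to render.

```python
import json

# The runtime field script, kept on several lines for readability.
# Mustache tags stay inside it; Elasticsearch renders them at search time.
PAINLESS = """
long value = {{#foo}}{{foo}}{{/foo}}{{^foo}}0{{/foo}};
emit(value);
"""

def build_template_source(painless: str) -> str:
    """Mustache template source with the Painless script embedded safely.

    json.dumps turns the script into a JSON string literal whose line
    breaks are escaped as \\n, so the rendered search body is valid JSON.
    """
    inner = json.dumps(painless.strip())
    return (
        '{"runtime_mappings": {"result": {"type": "long", '
        '"script": {"source": ' + inner + '}}}'
        '{{#foo}}, "fields": ["result"]{{/foo}}}'
    )

def build_stored_script_body(painless: str) -> dict:
    """Request body for POST _scripts/test."""
    return {"script": {"lang": "mustache",
                       "source": build_template_source(painless)}}

if __name__ == "__main__":
    # Send this with any HTTP client, or paste the printed JSON after
    # "POST _scripts/test" in the console.
    print(json.dumps(build_stored_script_body(PAINLESS), indent=2))
```

The Painless source stays readable in the generator, while everything sent to Elasticsearch is ordinary escaped JSON.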

Split a string in Painless/ELK

I have a string field "myfield.keyword", where entries have the following format:
AAA_BBBB_CC
DDD_EEE_F
I am trying to create a scripted field that outputs the substring before the first _, a scripted field that outputs the substring between the first and second _ and a scripted field that outputs the substring after the second _.
I was trying to use .split('_') to do this, but found that this method is not available in Painless:
def newfield = "";
def path = doc['myfield.keyword'].value;
if (...) {
  newfield = path.split('_')[1];
} else {
  newfield = "null";
}
return newfield;
I then tried the workaround suggested here, but found that I must enable regexes in Elasticsearch (which would not be possible in my case):
def newfield = "";
def path = doc['myfield.keyword'].value;
if (...) {
  newfield = /_/.split(path)[1];
} else {
  newfield = "null";
}
return newfield;
Is there a way to do this that does not presuppose enabling regexes?
EDIT:
Thank you for such an elegant solution, Val. This answers the question I asked. My question, however, was not well formed. In particular, the string that needs to be split has four occurrences of '_'. Something like:
AAA_BB_CCC_DD_E
FFF_GGG_HH_JJJJ_KK
So, if I understand correctly, indexOf() and lastIndexOf() cannot give me BB, CCC or DD. I thought that I could adapt your solution and find the index of the second and third occurrences of _ by using string.indexOf("_", 1) and string.indexOf("_", 2). However, I always get the same result as string.indexOf("_") without the extra parameter (i.e. the index of the first occurrence of _).
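The second argument of `indexOf(str, fromIndex)` is a start position, not an occurrence number. For these values, both `indexOf("_", 1)` and `indexOf("_", 2)` start scanning before the first underscore and return the same index. To reach the next occurrence, you pass the previous index plus one. Python's `str.find(sub, start)` has the same semantics, so the sketch uses it to illustrate the loop. The helper name `underscore_positions` is hypothetical.

```python
def underscore_positions(value: str, sep: str = "_") -> list:
    """Indices of every occurrence of sep, found by restarting the search
    just after the previous hit (the same pattern works with indexOf)."""
    positions = []
    idx = value.find(sep)               # like value.indexOf("_")
    while idx != -1:
        positions.append(idx)
        idx = value.find(sep, idx + 1)  # like value.indexOf("_", idx + 1)
    return positions

if __name__ == "__main__":
    s = "AAA_BB_CCC_DD_E"
    print(s.find("_", 1), s.find("_", 2))  # 3 3: fromIndex is a position
    print(underscore_positions(s))         # [3, 6, 10, 13]
```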
Enabling regular expressions is not terribly complicated, but it requires restarting your cluster and that might not be easy for you depending on the environment.
Another way to achieve this is to do it the "old way". First, you create a reusable script for the script fields. The script finds the first, second, third and last occurrences of the _ symbol and returns the requested part. It takes as input the name of the field to split and the index of the substring to return:
POST _scripts/my-split
{
  "script": {
    "lang": "painless",
    "source": """
      def str = doc[params.field].value;
      def first = str.indexOf("_");
      def second = first + 1 + str.substring(first + 1).indexOf("_");
      def third = second + 1 + str.substring(second + 1).indexOf("_");
      def last = str.lastIndexOf("_");
      def parts = [
        str.substring(0, first),
        str.substring(first + 1, second),
        str.substring(second + 1, third),
        str.substring(third + 1, last),
        str.substring(last + 1)
      ];
      return parts[params.index];
    """
  }
}
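To sanity-check the substring arithmetic before storing the script, here is a line-by-line Python mirror of it. The slice `s[a:b]` corresponds to `substring(a, b)`, and `rfind` corresponds to `lastIndexOf`. It is only a local test harness and assumes each value contains exactly four underscores. The name `split_parts` is hypothetical.

```python
def split_parts(s: str) -> list:
    """Mirror of the my-split Painless script: five parts around four '_'."""
    first = s.find("_")
    second = first + 1 + s[first + 1:].find("_")
    third = second + 1 + s[second + 1:].find("_")
    last = s.rfind("_")  # lastIndexOf("_")
    return [
        s[0:first],
        s[first + 1:second],
        s[second + 1:third],
        s[third + 1:last],
        s[last + 1:],
    ]

if __name__ == "__main__":
    for value in ["AAA_BB_CCC_DD_E", "FFF_GGG_HH_JJJJ_KK"]:
        print(split_parts(value))
```

One difference to keep in mind: Java's `substring` throws when the start index is past the end index, while Python slicing just returns an empty string. This harness therefore will not reproduce the errors the Painless script raises on values with fewer underscores.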
Then you can simply define one script field for each of the parts like this:
POST test/_search
{
  "script_fields": {
    "first": {
      "script": {
        "id": "my-split",
        "params": {
          "field": "myfield.keyword",
          "index": 0
        }
      }
    },
    "second": {
      "script": {
        "id": "my-split",
        "params": {
          "field": "myfield.keyword",
          "index": 1
        }
      }
    },
    "third": {
      "script": {
        "id": "my-split",
        "params": {
          "field": "myfield.keyword",
          "index": 2
        }
      }
    }
  }
}
The response you get will look like this:
{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "ykS-l3UBeO1HTBdDvTZd",
  "_score" : 1.0,
  "fields" : {
    "first" : [
      "AAA"
    ],
    "second" : [
      "BBBB"
    ],
    "third" : [
      "CC"
    ]
  }
}
You could use str.splitOnToken("_"), which returns the parts as an array, and then index into or loop over that array as needed.
You can even split on multi-character tokens, such as:
def message = "[LOG] Something something WARNING: Your warning";
def reason = message.splitOnToken("WARNING: ")[1];
So reason holds the remaining string: "Your warning".
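For the five-part values in the edit, the splitOnToken approach comes down to indexing the resulting array. Python's `str.split` with a string separator also splits on a literal token, with no regex involved, so it can illustrate the expected parts for these inputs. This is an analogy run locally, not Painless itself, and `middle_parts` is a hypothetical name.

```python
def middle_parts(value: str) -> list:
    # Same idea as value.splitOnToken("_") followed by taking
    # elements 1 to 3 of the array.
    return value.split("_")[1:4]

if __name__ == "__main__":
    print(middle_parts("AAA_BB_CCC_DD_E"))  # ['BB', 'CCC', 'DD']
    message = "[LOG] Something something WARNING: Your warning"
    print(message.split("WARNING: ")[1])    # Your warning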

Elasticsearch runs Groovy scripts twice: is it a bug?

I found some unexpected behaviour with a script query: the script is executed twice in a simple query.
My configuration: Elasticsearch version 2.4.6 (the issue remains in 5.6).
My elasticsearch.yml:
script.indexed: true
The steps to reproduce the issue:
1) I have one simple document, doc1.json:
{
  "id": "1",
  "tags": "t1"
}
2) Insert doc1 in Elastic:
http PUT localhost:9200/default/type1/1 @doc1.json
3) I have one simple Groovy script, script1.json (it just prints the score and returns it):
{
  "script": "println('Score is ' + _score * 1.0 + ' for document ' + doc['id'] + ' at ' + DateTime.now().getMillis()); return _score;"
}
4) Register script1:
http POST 'localhost:9200/_scripts/groovy/script1' @script1.json
5) Execute this query_with_script.json:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "match": {
              "tags": {
                "query": "t1",
                "type": "boolean"
              }
            }
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "id": "script1",
              "lang": "groovy"
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  },
  "explain": true
}
http GET 'localhost:9200/default/type1/_search' @query_with_script.json
6) Why do the Elasticsearch logs show the script being executed twice? Is it a bug?
Score is 0.19178301095962524 for document [1] at 1516586818596
Score is 0.19178301095962524 for document [1] at 1516586818606
Thanks a lot!
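You should probably remove the explain flag, as it might be the reason the script gets executed twice: building the explanation for a hit computes its score again, which would run the script a second time.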

Match query: return records only if the query contains all words of the object's field

I read about match and multi-word queries, but it seems that I need to do something a bit different.
Let's say I have the following query: "this is a test", and I want to search for it in one field called "text". I want to get objects that match some of the query (no matter how many words), but only those objects for which the query contains every word of the text field.
Example for the query "this is a test". I want to get these objects:
obj1: {"text": "this is a test"}
obj2: {"text": "this is a"}
obj3: {"text": "is a"}
obj4: {"text": "test"}
But if an object has anything more in its text field, it should not be returned, for example:
obj5: {"text": "this is a test and something more"}
Is it possible to achieve this using Elasticsearch?
It's kind of a hack, but I was able to get it to work with a script filter:
POST /test_index/_search
{
  "query": {
    "match": {
      "text": "this is a test"
    }
  },
  "filter": {
    "script": {
      "script": "for(val in doc[\"text\"].values){ if(!(val in terms)){ return false; }}; return true;",
      "params": {
        "terms": ["this", "is", "a", "test"]
      }
    }
  }
}
I thought there would be a better way to do this, but wasn't immediately able to come up with one. Using scripting can be problematic in production, unless your ES cluster is behind an auth wall of some kind.
Anyway, here's the code I used to test it:
http://sense.qbox.io/gist/3929abc89d71ebf724e6121b1b5ba6da54501088
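The script filter above enforces one condition: every term indexed for the document's text field must appear among the query terms. The Python sketch below checks that predicate against the example objects from the question. A simple lowercase, whitespace-split tokenizer stands in for the field's real analyzer, which is an assumption. The function name is hypothetical.

```python
def all_field_terms_in_query(text: str, query: str) -> bool:
    """True when every token of the field value also occurs in the query.

    Lowercasing plus whitespace splitting approximates the analyzer;
    the real field analyzer may tokenize differently.
    """
    field_terms = set(text.lower().split())
    query_terms = set(query.lower().split())
    # A non-empty subset implies at least one shared term, which the
    # match query also requires.
    return bool(field_terms) and field_terms <= query_terms

if __name__ == "__main__":
    query = "this is a test"
    docs = {
        "obj1": "this is a test",
        "obj2": "this is a",
        "obj3": "is a",
        "obj4": "test",
        "obj5": "this is a test and something more",
    }
    for name, text in docs.items():
        print(name, all_field_terms_in_query(text, query))
```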
