Elasticsearch: Use loop in Painless script

I have an old version of Elasticsearch (5.6.16) in a production environment that I can't upgrade.
I'm trying to use a loop in a Painless script_score script, but I always get a runtime error.
Each of my documents can have one or several "badges". Here is the mapping:
"myDocument":{
"properties":{
"badges":{
"type":"nested",
"properties":{
"name":{
"type":"keyword"
}
}
}
}
},
My goal is to write a custom script that gives a higher score to documents that have a specific badge.
So I wrote this script:
for (item in doc['badges']) {
if (item['name'] == "myCustomBadge") {
return _score * 10000;
}
}
return _score;
Unfortunately, I get an error when I run it with this query:
{
"query":{
"function_score":{
"query":{
"match_all": {}
},
"functions":[
{
"script_score":{
"script":{
"inline":"for (item in doc['badges']) { if (item['name'] == \"myCustomBadge\") { return _score * 10000; }}return _score;",
"lang":"painless"
}
}
}
]
}
}
}
This is the error I get:
"error":{
"root_cause":[
{
"type":"script_exception",
"reason":"runtime error",
"script_stack":[
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:77)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:36)",
"for (item in doc['badges']) { ",
" ^---- HERE"
],
"script":"for (item in doc['badges']) { if (item['name'] == \"myCustomBadge\") { return _score * 10000; }}return _score;",
"lang":"painless"
}
],
"type":"search_phase_execution_exception",
"reason":"all shards failed"
}
I tried replacing the for loop with another variant, but I get the same error:
for(int i = 0; i < doc['badges']; i++) {
if (doc['badges'][i]['name'] == "uaWorker") {
return _score * 10000;
}
}
return _score;
Could you help me find what I did wrong?
Thank you all.

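The problem is not the loop. badges is a nested field, and you're trying to read it from doc values. Doc values exist only for leaf fields, and badges is a container of nested objects rather than a leaf field, so the doc['badges'] lookup itself fails (these are the LeafDocLookup.get frames in the stack trace). Instead, read the array of badges directly from the _source document, like this:
for (item in params._source['badges'])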
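A minimal sketch of the full request with the loop reading from _source, written as a Python dict so the body can be inspected or serialized before sending. The helper boosted_score is a hypothetical pure-Python mirror of the Painless logic for checking it offline, not part of any Elasticsearch API. It also tolerates a missing badges field, which the one-line Painless loop would likely not.

```python
import json

# Painless source with the loop reading the nested objects from _source.
PAINLESS_SOURCE = (
    "for (item in params._source['badges']) {"
    " if (item['name'] == 'myCustomBadge') { return _score * 10000; }"
    " }"
    " return _score;"
)

# Same request shape as in the question; only the script text changed.
body = {
    "query": {
        "function_score": {
            "query": {"match_all": {}},
            "functions": [
                {
                    "script_score": {
                        "script": {"inline": PAINLESS_SOURCE, "lang": "painless"}
                    }
                }
            ],
        }
    }
}


def boosted_score(source, score, badge="myCustomBadge", factor=10000):
    """Hypothetical mirror of the script: boost when the badge is present."""
    for item in source.get("badges") or []:
        if item.get("name") == badge:
            return score * factor
    return score


if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

If some documents have no badges at all, a null check before the loop in the Painless script is worth considering.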

Related

Why do I get an 'Unknown key for this start object' error with runtime_mappings on Elasticsearch?

I'm trying to run an Elasticsearch search using the elasticsearch package (v7.15) in Python.
Below is the dict sent to the search function:
{
"runtime_mappings": {
"tag_dynamic": {
"type": "keyword",
"script": {
"source": """
String nowString = params['now'];
ZonedDateTime nowZdt = ZonedDateTime.parse(nowString);
long now = nowZdt.toInstant().toEpochMilli();
ZonedDateTime mtimeZdt = ZonedDateTime.parse(doc['m_time']);
long millisDateTime = mtimeZdt.toInstant().toEpochMilli();
long Mtime_elapsedTime = now - millisDateTime;
ZonedDateTime zdtMinus = nowZdt.minusDays(30);
long millisMinusTime = zdtMinus.toInstant().toEpochMilli();
long thirtyDays_elapsedTime = now - millisMinusTime;
if (Mtime_elapsedTime > thirtyDays_elapsedTime) {
dyntag = 'TOARCHIVE';}
emit(dyntag);
""",
"params": {
"now": "<generated string datetime in ISO-8601>"
}
}
}
},
"query": {
"query_string": {
"query": "m_time:*"
}
}
}
And I get this error: Unknown key for a START_OBJECT in [runtime_mappings]. Yet the search syntax is nearly identical to the one in the Elasticsearch docs.
Can anyone tell me why I keep getting this error?
I tested many variations, including removing the "query" part, adding "body":{ at the beginning, adding "query":{ at the beginning, etc. I always get the same error.
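The post does not include a resolution. One thing worth ruling out, offered as an assumption rather than a confirmed diagnosis: runtime_mappings in a search request is only understood by Elasticsearch servers from 7.11 onward, and the version of the Python client package says nothing about the server version. The helper name supports_runtime_mappings below is hypothetical.

```python
def supports_runtime_mappings(server_version):
    """Return True if a server version string such as '7.10.2' is 7.11 or later.

    Assumption: runtime fields in search requests arrived in Elasticsearch 7.11.
    """
    # Compare only major and minor; the patch part may carry suffixes.
    major, minor = (int(part) for part in server_version.split(".")[:2])
    return (major, minor) >= (7, 11)


if __name__ == "__main__":
    for version in ("7.10.2", "7.11.0", "7.15.1"):
        print(version, supports_runtime_mappings(version))
```

With the Python client, the server version string can be read from es.info()["version"]["number"].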

Painless script to increase the count if the full path exists or else add the full path and add the count

I am creating a script that increments a field's count if the field's full path exists, and otherwise adds the full path dynamically. In the example below:
If the record already has inner->board1->count, I should increment it by the value of count.
If inner, board1, or count is missing, I should add it and set the value of count. Please also note that the names inner, board1, and count are not fixed.
If the value is not an object, I can check it with ctx._source.myCounts == null, but I am not sure how to check for object fields, subfields, and sub-subfields.
Code
POST test/_update/3
{
"script": {
"source": "ctx._source.board_counts = params.myCounts",
"lang": "painless",
"params": {
"myCounts": {
"inner":{
"board1":{"count":5},
"board2":{"count":4},
"board3":{"temp":1,"temp2":3}
},
"outer":{
"board1":{"count":5},
"board10":{"temp":1,"temp2":3}
}
}
}
}
}
I was able to come up with this, and it works fine:
POST test/_update/3
{
"script": {
"source": "if (ctx._source['myCounts'] == null) {ctx._source['myCounts'] = [:];} for (mainItem in params.myCounts) { for (accessItemKey in mainItem.keySet()) { if (ctx._source.myCounts[accessItemKey] == null) { ctx._source.myCounts[accessItemKey] = [:];}for (boardItemKey in mainItem[accessItemKey].keySet()) {if (ctx._source.myCounts[accessItemKey][boardItemKey] == null) {ctx._source.myCounts[accessItemKey][boardItemKey] = [:];} for (countItemKey in mainItem[accessItemKey][boardItemKey].keySet()) { if (ctx._source.myCounts[accessItemKey][boardItemKey][countItemKey] == null) { ctx._source.myCounts[accessItemKey][boardItemKey][countItemKey] =mainItem[accessItemKey][boardItemKey][countItemKey]; }else {ctx._source.myCounts[accessItemKey][boardItemKey][countItemKey] += mainItem[accessItemKey][boardItemKey][countItemKey];}}}}}",
"lang": "painless",
"params": {
"myCounts": {
"inner":{
"board1":{"count":5},
"board2":{"count":4},
"board3":{"temp":1,"temp2":3}
},
"outer":{
"board1":{"count":5},
"board10":{"temp":1,"temp2":3}
}
}
}
}
}
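The "increment if the path exists, otherwise create it" rule can be sketched in Python as a recursive merge that works for any depth and any key names. This is an illustration of the logic only, not Painless; merge_counts and the starting document are hypothetical, and the sketch assumes a key never holds an object in one place and a number in the other.

```python
def merge_counts(target, incoming):
    """Add the numeric leaves of `incoming` into `target`, creating missing paths."""
    for key, value in incoming.items():
        if isinstance(value, dict):
            # Create the intermediate object when the path does not exist yet.
            child = target.setdefault(key, {})
            merge_counts(child, value)
        else:
            # A missing leaf starts at 0, so the first write equals the incoming value.
            target[key] = target.get(key, 0) + value
    return target


# Hypothetical stored document: only inner->board1->count exists so far.
doc_source = {"myCounts": {"inner": {"board1": {"count": 2}}}}

params = {
    "myCounts": {
        "inner": {
            "board1": {"count": 5},
            "board2": {"count": 4},
            "board3": {"temp": 1, "temp2": 3},
        },
        "outer": {
            "board1": {"count": 5},
            "board10": {"temp": 1, "temp2": 3},
        },
    }
}

merge_counts(doc_source.setdefault("myCounts", {}), params["myCounts"])
print(doc_source["myCounts"]["inner"]["board1"])  # {'count': 7}
```

Recursion removes the need for one hand-written loop per nesting level, which is what makes the one-line script above hard to read.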

Elasticsearch: Multiply each nested element plus aggregation

Let's imagine an index composed of 2 documents like this one:
doc1 = {
"x":1,
"y":[{brand:b1, value:1},
{brand:b2, value:2}]
},
doc2 = {
"x":2,
"y":[{brand:b1, value:0},
{brand:b2, value:3}]
}
Is it possible to multiply each value of y by x for each document and then do a sum aggregation on the brand term to get this result:
b1: 1
b2: 8
If not, could it be done with any other mapping types?
This is a highly custom use case, so I don't think there's a pre-optimized mapping for it.
What I would suggest is the following:
Set up an index with y being nested:
PUT xy/
{"mappings":{"properties":{"y":{"type":"nested"}}}}
Ingest the docs from your example:
POST xy/_doc
{"x":1,"y":[{"brand":"b1","value":1},{"brand":"b2","value":2}]}
POST xy/_doc
{"x":2,"y":[{"brand":"b1","value":0},{"brand":"b2","value":3}]}
Use a scripted_metric aggregation to compute the products and add them up in a shared HashMap:
GET xy/_search
{
"size": 0,
"aggs": {
"multiply_and_add": {
"scripted_metric": {
"init_script": "state.by_brands = [:]",
"map_script": """
def x = params._source['x'];
for (def brand_pair : params._source['y']) {
def brand = brand_pair['brand'];
def product = x * brand_pair['value'];
if (state.by_brands.containsKey(brand)) {
state.by_brands[brand] += product;
} else {
state.by_brands[brand] = product;
}
}
""",
"combine_script": "return state",
"reduce_script": "return states"
}
}
}
}
which would yield something along the lines of
{
...
"aggregations":{
"multiply_and_add":{
"value":[
{
"by_brands":{ <----
"b2":8,
"b1":1
}
}
]
}
}
}
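UPDATE
The per-shard maps can then be merged in the reduce_script (the states variable is only available there), which could look like this:
def combined_states = [:];
for (def state : states) {
// a shard that collected no documents returns null
if (state == null) { continue; }
for (def brand_pair : state['by_brands'].entrySet()) {
def key = brand_pair.getKey();
def value = brand_pair.getValue();
if (combined_states.containsKey(key)) {
combined_states[key] += (float)value;
} else {
combined_states[key] = (float)value;
}
}
}
return combined_states;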
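To check the arithmetic offline, here is a small Python simulation of the map and reduce phases over the two example documents. It is a sketch of the logic only: the function names are hypothetical, and it assumes each document lands on its own shard so that the reduce step has two states to merge.

```python
docs = [
    {"x": 1, "y": [{"brand": "b1", "value": 1}, {"brand": "b2", "value": 2}]},
    {"x": 2, "y": [{"brand": "b1", "value": 0}, {"brand": "b2", "value": 3}]},
]


def map_shard(shard_docs):
    """Mirror of init_script + map_script: sum x * value per brand on one shard."""
    by_brands = {}
    for doc in shard_docs:
        for pair in doc["y"]:
            product = doc["x"] * pair["value"]
            by_brands[pair["brand"]] = by_brands.get(pair["brand"], 0) + product
    return {"by_brands": by_brands}


def reduce_states(states):
    """Mirror of the reduce step: merge the per-shard maps into one map."""
    combined = {}
    for state in states:
        if state is None:  # an empty shard contributes nothing
            continue
        for brand, value in state["by_brands"].items():
            combined[brand] = combined.get(brand, 0.0) + float(value)
    return combined


# Assumption: one document per shard.
states = [map_shard([docs[0]]), map_shard([docs[1]])]
totals = reduce_states(states)
print(totals)  # {'b1': 1.0, 'b2': 8.0}
```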

Elasticsearch scripted_metric null_pointer_exception

I'm trying to use the scripted_metric aggregation of Elasticsearch, and normally it works perfectly fine with my other scripts.
However, with the script below I get a "null_pointer_exception" error, even though it is copy-pasted from scripts that already work for 6 other modules.
$max = 10;
{
"query": {
"match_all": {}
//omitted some queries here, so I just turned it into match_all
}
},
"aggs": {
"ARTICLE_CNT_PDAY": {
"histogram": {
"field": "pub_date",
"interval": "86400"
},
"aggs": {
"LATEST": {
"nested": {
"path": "latest"
},
"aggs": {
"SUM_SVALUE": {
"scripted_metric": {
"init_script": "
state.te = [];
state.g = 0;
state.d = 0;
state.a = 0;
",
"map_script": "
if(state.d != doc['_id'].value){
state.d = doc['_id'].value;
state.te.add(state.a);
state.g = 0;
state.a = 0;
}
state.a = doc['latest.soc_mm_score'].value;
",
"combine_script": "
state.te.add(state.a);
double count = 0;
for (t in state.te) {
count += ((t*10)/$max)
}
return count;
",
"reduce_script": "
double count = 0;
for (a in states) {
count += a;
}
return count;
"
}
}
}
}
}
}
}
}
I tried running this script in Kibana and got the null_pointer_exception error.
As far as I can tell, something is wrong with the reduce_script portion. I tried changing this part:
FROM
for (a in states) {
count += a;
}
TO
for (a in states) {
count += 1;
}
That worked perfectly fine, so I suspect that the a variable doesn't hold what it's supposed to hold.
Any ideas? I would appreciate your help, thank you very much!
The reason is explained in the Elasticsearch documentation for the scripted metric aggregation:
If a parent bucket of the scripted metric aggregation does not collect any documents an empty aggregation response will be returned from the shard with a null value. In this case the reduce_script's states variable will contain null as a response from that shard. reduce_script's should therefore expect and deal with null responses from shards.
So one of your buckets is evidently empty, and you need to handle that null like this:
"reduce_script": "
double count = 0;
for (a in states) {
count += (a ?: 0);
}
return count;
"

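The same null handling, sketched in Python for illustration. reduce_counts is a hypothetical name, and None plays the role of the null that a shard returns for an empty bucket.

```python
def reduce_counts(states):
    """Sum per-shard results, treating a missing (None) result as 0."""
    total = 0.0
    for a in states:
        # Equivalent of the Painless expression (a ?: 0)
        total += a if a is not None else 0
    return total


# One shard returned nothing for this bucket.
print(reduce_counts([1.5, None, 2.5]))  # 4.0
```

Without the guard, adding None to a number raises a TypeError in Python, which is the counterpart of the null_pointer_exception in the original reduce_script.
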
Script to return array for scripted metric aggregation from combine

For the scripted metric aggregation, in the example shown in the documentation, the combine script returns a single number.
Can I return an array or hash instead?
I tried doing it; it did not return any error, but I am not able to access those values from the reduce script.
In the reduce script, per shard I get an instance that, when converted to a string, reads 'Script2$_run_closure1#52ef3bd9'.
Kindly let me know if this can be accomplished in any way.
At least for Elasticsearch version 1.5.1, you can do so.
For example, we can modify the Elasticsearch example (scripted metric aggregation) to compute an average profit (profit divided by the number of transactions):
{
"query": {
"match_all": {}
},
"aggs": {
"avg_profit": {
"scripted_metric": {
"init_script": "_agg['transactions'] = []",
"map_script": "if (doc['type'].value == \"sale\") { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }",
"combine_script": "profit = 0; num_of_transactions = 0; for (t in _agg.transactions) { profit += t; num_of_transactions += 1 }; return [profit, num_of_transactions]",
"reduce_script": "profit = 0; num_of_transactions = 0; for (a in _aggs) { profit += a[0] as int; num_of_transactions += a[1] as int }; return profit / num_of_transactions as float"
}
}
}
}
NOTE: this is just a demo of returning an array from the combine script; you can calculate the average easily without using any arrays.
The response will look like:
"aggregations" : {
"avg_profit" : {
"value" : 42.5
}
}
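The flow of a two-element array from the combine script into the reduce script can be simulated in Python. This is a sketch with made-up sample data, chosen here so that the average comes out to 42.5; the shard layout, document values, and function names are all assumptions, not taken from the original index.

```python
# Hypothetical documents per shard: (type, amount)
shards = [
    [("sale", 100), ("cost", 20)],
    [("sale", 50), ("sale", 40)],
]


def map_shard(docs):
    """Mirror of map_script: sales count positive, everything else negative."""
    return [amount if kind == "sale" else -amount for kind, amount in docs]


def combine(transactions):
    """Mirror of combine_script: return an array [profit, num_of_transactions]."""
    return [sum(transactions), len(transactions)]


def reduce_results(results):
    """Mirror of reduce_script: read both array slots from every shard."""
    profit = sum(r[0] for r in results)
    num_of_transactions = sum(r[1] for r in results)
    return profit / num_of_transactions


per_shard = [combine(map_shard(docs)) for docs in shards]
print(per_shard)                  # [[80, 2], [90, 2]]
print(reduce_results(per_shard))  # 42.5
```

The sketch does not guard against zero transactions; a real reduce script would need to decide what to return in that case.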
