Create a parametric scripted_metric in Elasticsearch

I want to use scripted_metric within an aggregation. My script contains some parametric values that I want to set per query. Is it possible to create such a query at all?
Below is an example of what I'm looking for:
"aggs": {
"testAgg": {
"scripted_metric": {
"init_script": "_agg['maximum'] = []",
"map_script": "max = 0; for(tv in _source.tvs){ if(tv.att1>= param1 && tv.attr2 <= param2 && tv.att3 > max){max = tv.att3; }}; _agg.maximum.add(max);",
"combine_script": "sum = 0; for (m in _agg.maximum) { sum += m }; return sum;",
"reduce_script": "sum = 0; for (a in _aggs) { sum += a }; return sum;"
}
}
}
param1 and param2 are my parametric values. How should I change this aggregation for my purpose?
Thanks :)

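You can do it by specifying a global params map. Since the default params map only contains _agg, declare _agg in it as well when you supply your own: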
"aggs": {
"testAgg": {
"scripted_metric": {
"params": {
"_agg": {},
"param1": 10,
"param2": 20
},
"init_script": "_agg['maximum'] = []",
"map_script": "max = 0; for(tv in _source.tvs){ if(tv.att1>= param1 && tv.attr2 <= param2 && tv.att3 > max){max = tv.att3; }}; _agg.maximum.add(max);",
"combine_script": "sum = 0; for (m in _agg.maximum) { sum += m }; return sum;",
"reduce_script": "sum = 0; for (a in _aggs) { sum += a }; return sum;"
}
}
}
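
If the query is assembled by a client program, the per-query values can simply be function arguments. The following is a minimal Python sketch that only builds the request body; the helper name build_test_agg is hypothetical, and the scripts are copied unchanged from the aggregation above.

```python
import json

# Scripts copied from the aggregation above.
INIT_SCRIPT = "_agg['maximum'] = []"
MAP_SCRIPT = (
    "max = 0; for(tv in _source.tvs){ if(tv.att1>= param1 && tv.attr2 <= param2 "
    "&& tv.att3 > max){max = tv.att3; }}; _agg.maximum.add(max);"
)
COMBINE_SCRIPT = "sum = 0; for (m in _agg.maximum) { sum += m }; return sum;"
REDUCE_SCRIPT = "sum = 0; for (a in _aggs) { sum += a }; return sum;"


def build_test_agg(param1, param2):
    """Return the "aggs" part of a search body (hypothetical helper)."""
    return {
        "testAgg": {
            "scripted_metric": {
                # "_agg" stays in params next to the per-query values.
                "params": {"_agg": {}, "param1": param1, "param2": param2},
                "init_script": INIT_SCRIPT,
                "map_script": MAP_SCRIPT,
                "combine_script": COMBINE_SCRIPT,
                "reduce_script": REDUCE_SCRIPT,
            }
        }
    }


# Each query gets its own thresholds without touching the scripts.
body = {"aggs": build_test_agg(10, 20)}
print(json.dumps(body, indent=2))
```

Only the params map changes between queries, so the script strings stay identical from one request to the next.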

Related

Elasticsearch painless sorting null_pointer_error

I'm trying to create a sorting script with painless, filtering nested documents.
The reason I'm doing this with a script, is because I need to emulate a COALESCE statement.
My documents have titles stored like:
{
title: [
{
type: MainTitle,
value: [
{
language: eng,
label: The title
},
{
language: ger,
label: Das title
}
]
},
{
type: AvailabilityTitle,
value: [
{
language: eng,
label: New thing!
}
]
}
]
}
title and title.value are nested documents.
I want to sort documents primarily by their English MainTitle, and by their German MainTitle only if no English MainTitle exists, even if the German title gave a higher score.
To try it out, I'm starting by simply sorting by the English MainTitle. This is the script:
def source = params._source;
def titles = source.title;
if (titles != null && titles.length > 0) {
for(int i=0; i < titles.length; i++) {
def t = titles[i];
if (t.type == 'MainTitle') {
def values = t.value;
if (values != null && values.length > 0) {
for (int j = 0; j < values.length; j++) {
def v = values[j];
if (v.language == 'eng') {
return v.label;
}
}
}
}
}
}
return "";
For some reason I'm getting a null_pointer_exception:
"script_stack": [
"if (values != null && values.length > 0) { ",
" ^---- HERE"
],
I don't get how values can be null at that point, since I'm specifically checking for null just before it.
The null_pointer_exception is thrown not because values is null, but because values has no length property. For some reason values is an ArrayList, even though titles earlier is an array. Both apparently have a size() method, so I can just use that instead.
This works:
def source = params._source;
def titles = source.title;
if (titles != null && titles.size() > 0) {
for(int i=0; i < titles.size(); i++) {
def t = titles[i];
if (t.type == 'MainTitle') {
def values = t.value;
if (values != null && values.size() > 0) {
for (int j = 0; j < values.size(); j++) {
def v = values[j];
if (v != null && v.language == 'eng') {
return v.label;
}
}
}
}
}
}
return '';
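
The script above only handles a single language. As a sketch of the intended COALESCE behaviour (first the English MainTitle, then the German one), here is the same lookup written in Python rather than Painless. The function name and the German-only second document are made up for illustration.

```python
def main_title_sort_key(source, languages=("eng", "ger")):
    """Return the MainTitle label for the first language that has one."""
    labels = {}
    for title in source.get("title") or []:
        if title.get("type") != "MainTitle":
            continue
        for value in title.get("value") or []:
            if value is not None and value.get("label") is not None:
                # Keep the first label seen for each language.
                labels.setdefault(value.get("language"), value["label"])
    # COALESCE: walk the languages in priority order.
    for language in languages:
        if language in labels:
            return labels[language]
    return ""


doc_with_both = {
    "title": [
        {"type": "MainTitle", "value": [
            {"language": "eng", "label": "The title"},
            {"language": "ger", "label": "Das title"},
        ]},
        {"type": "AvailabilityTitle", "value": [
            {"language": "eng", "label": "New thing!"},
        ]},
    ]
}
# Hypothetical document that has no English MainTitle.
doc_german_only = {
    "title": [
        {"type": "MainTitle", "value": [
            {"language": "ger", "label": "Das title"},
        ]},
    ]
}

print(main_title_sort_key(doc_with_both))
print(main_title_sort_key(doc_german_only))
print(repr(main_title_sort_key({})))
```

Collecting the labels per language first and picking one afterwards keeps the priority order independent of the order in which the values are stored.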

Elasticsearch: Multiply each nested element plus aggregation

Let's imagine an index composed of 2 documents like these:
doc1 = {
"x":1,
"y":[{brand:b1, value:1},
{brand:b2, value:2}]
},
doc2 = {
"x":2,
"y":[{brand:b1, value:0},
{brand:b2, value:3}]
}
Is it possible to multiply each value of y by x for each document and then do a sum aggregation per brand to get this result:
b1: 1
b2: 8
If not, could it be done with any other mapping types?
This is a highly custom use case, so I don't think there's a pre-optimized mapping for it.
What I would suggest is the following:
Set up an index with y being nested:
PUT xy/
{"mappings":{"properties":{"y":{"type":"nested"}}}}
Ingest the docs from your example:
POST xy/_doc
{"x":1,"y":[{"brand":"b1","value":1},{"brand":"b2","value":2}]}
POST xy/_doc
{"x":2,"y":[{"brand":"b1","value":0},{"brand":"b2","value":3}]}
Use a scripted_metric aggregation to compute the products and add them up in a shared HashMap:
GET xy/_search
{
"size": 0,
"aggs": {
"multiply_and_add": {
"scripted_metric": {
"init_script": "state.by_brands = [:]",
"map_script": """
def x = params._source['x'];
for (def brand_pair : params._source['y']) {
def brand = brand_pair['brand'];
def product = x * brand_pair['value'];
if (state.by_brands.containsKey(brand)) {
state.by_brands[brand] += product;
} else {
state.by_brands[brand] = product;
}
}
""",
"combine_script": "return state",
"reduce_script": "return states"
}
}
}
}
which would yield something along the lines of
{
...
"aggregations":{
"multiply_and_add":{
"value":[
{
"by_brands":{ <----
"b2":8,
"b1":1
}
}
]
}
}
}
UPDATE
To merge the per-shard maps into one, the reduce_script could look like this:
def combined_states = [:];
for (def state : states) {
for (def brand_pair : state['by_brands'].entrySet()) {
def key = brand_pair.getKey();
def value = brand_pair.getValue();
if (combined_states.containsKey(key)) {
combined_states[key] += (float)value;
} else {
combined_states[key] = (float)value;
}
}
}
return combined_states;
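
To check the expected totals outside Elasticsearch, the phases of the scripted_metric aggregation can be modelled in plain Python. This is only a sketch: the function names are hypothetical, and putting one document on each shard is an assumption made for illustration.

```python
def init_state():
    # Mirrors init_script: one empty map per shard.
    return {"by_brands": {}}


def map_doc(state, source):
    # Mirrors map_script: multiply each value of y by x and add it per brand.
    x = source["x"]
    for pair in source["y"]:
        brand = pair["brand"]
        state["by_brands"][brand] = state["by_brands"].get(brand, 0) + x * pair["value"]


def combine(state):
    # Mirrors combine_script: hand the shard state over unchanged.
    return state


def reduce_states(states):
    # Merges the per-shard maps into a single map of totals.
    combined = {}
    for state in states:
        if state is None:  # a shard may report nothing
            continue
        for brand, value in state["by_brands"].items():
            combined[brand] = combined.get(brand, 0) + value
    return combined


doc1 = {"x": 1, "y": [{"brand": "b1", "value": 1}, {"brand": "b2", "value": 2}]}
doc2 = {"x": 2, "y": [{"brand": "b1", "value": 0}, {"brand": "b2", "value": 3}]}

# Assumption: each document lives on its own shard.
states = []
for shard_docs in ([doc1], [doc2]):
    state = init_state()
    for source in shard_docs:
        map_doc(state, source)
    states.append(combine(state))

totals = reduce_states(states)
print(totals)  # {'b1': 1, 'b2': 8}
```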

Elasticsearch scripted_metric null_pointer_exception

I'm trying to use the scripted_metric aggregation of Elasticsearch, and normally it works perfectly fine with my other scripts.
However, with the script below I'm getting a "null_pointer_exception", even though it is just copy-pasted from scripts that already work for 6 other modules:
$max = 10;
{
"query": {
"match_all": {}
//omitted some queries here, so I just turned it into match_all
}
},
"aggs": {
"ARTICLE_CNT_PDAY": {
"histogram": {
"field": "pub_date",
"interval": "86400"
},
"aggs": {
"LATEST": {
"nested": {
"path": "latest"
},
"aggs": {
"SUM_SVALUE": {
"scripted_metric": {
"init_script": "
state.te = [];
state.g = 0;
state.d = 0;
state.a = 0;
",
"map_script": "
if(state.d != doc['_id'].value){
state.d = doc['_id'].value;
state.te.add(state.a);
state.g = 0;
state.a = 0;
}
state.a = doc['latest.soc_mm_score'].value;
",
"combine_script": "
state.te.add(state.a);
double count = 0;
for (t in state.te) {
count += ((t*10)/$max)
}
return count;
",
"reduce_script": "
double count = 0;
for (a in states) {
count += a;
}
return count;
"
}
}
}
}
}
}
}
}
I tried running this script in Kibana and got the null_pointer_exception mentioned above.
As far as I can tell, something is wrong with the reduce_script portion. I tried changing this part:
FROM
for (a in states) {
count += a;
}
TO
for (a in states) {
count += 1;
}
That worked perfectly fine, so I suspect the variable a doesn't hold what it's supposed to hold.
Any ideas? I'd appreciate your help, thank you very much!
The reason is explained in the scripted metric aggregation documentation:
If a parent bucket of the scripted metric aggregation does not collect any documents an empty aggregation response will be returned from the shard with a null value. In this case the reduce_script's states variable will contain null as a response from that shard. reduce_script's should therefore expect and deal with null responses from shards.
So one of your buckets is evidently empty, and you need to deal with that null like this:
"reduce_script": "
double count = 0;
for (a in states) {
count += (a ?: 0);
}
return count;
"

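The effect of the null check can be illustrated outside Elasticsearch. The following Python sketch mimics the reduce step, with None standing in for the null that an empty bucket produces; the function name and the sample shard values are made up.

```python
def reduce_counts(states):
    """Sum the per-shard results, treating a missing result as 0."""
    count = 0.0
    for a in states:
        # Same idea as `a ?: 0` in the script: fall back to 0 for null.
        count += a if a is not None else 0
    return count


# Second shard collected no documents, so it reports None.
print(reduce_counts([12.5, None, 3.0]))  # 15.5
```

Without the fallback, adding None to a number raises a TypeError in Python, which plays the same role as the null_pointer_exception in the script.
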
ElasticSearch/Painless: How do I skip an item when iterating?

I have a for loop that iterates a list. If the list contains a certain value, say "5", I want the loop to skip that value. But Painless seems determined to not permit that by not letting me have an empty if block or use a continue statement. How can I accomplish this?
"script_fields": {
"HResultCount": {
"script": {
"lang": "painless",
"inline": "int instance = 0; for (int i = 0; i < doc['numbers'].length; ++i) { if (doc['numbers'] == '5') { /* bail out */ } else { return 1.0; } }"
}
}
Since a script has to return a value in all cases, you can remove the value 5 from the list before iterating, as you suggested.
You can achieve this by calling removeIf on a copy of your list with a Java 8 lambda. Note that the copy is an ArrayList, so its element count comes from size():
"script_fields": {
"HResultCount": {
"script": {
"lang": "painless",
"inline": "int instance = 0; List copy = new ArrayList(doc['numbers']); copy.removeIf(i -> i == 5); for (int i = 0; i < copy.size(); ++i) { instance += copy[i]; } return instance;"
}
}
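
The filter-then-iterate idea is independent of Painless. As a rough Python sketch (the function name and sample numbers are hypothetical), the unwanted value is dropped from a copy first, and the loop then runs over what is left:

```python
def sum_skipping(numbers, skipped=5):
    """Add up all numbers except the skipped value."""
    # Build a filtered copy so the original list is left untouched.
    kept = [n for n in numbers if n != skipped]
    total = 0
    for n in kept:
        total += n
    return total


values = [1, 5, 2, 5, 4]
print(sum_skipping(values))  # 7
print(values)                # the original list still contains both 5s
```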

Script to return array for scripted metric aggregation from combine

For the scripted metric aggregation, the combine script in the documentation example returns a single number.
Can I return an array or a hash instead?
I tried doing it. It did not return any error, but I am not able to access those values from the reduce script.
In the reduce script I get, per shard, an instance that reads 'Script2$_run_closure1#52ef3bd9' when converted to a string.
Kindly let me know if this can be accomplished in any way.
You can do so, at least in Elasticsearch version 1.5.1.
For example, we can modify the Elasticsearch example (scripted metric aggregation) to compute an average profit (profit divided by the number of transactions):
{
"query": {
"match_all": {}
},
"aggs": {
"avg_profit": {
"scripted_metric": {
"init_script": "_agg['transactions'] = []",
"map_script": "if (doc['type'].value == \"sale\") { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }",
"combine_script": "profit = 0; num_of_transactions = 0; for (t in _agg.transactions) { profit += t; num_of_transactions += 1 }; return [profit, num_of_transactions]",
"reduce_script": "profit = 0; num_of_transactions = 0; for (a in _aggs) { profit += a[0] as int; num_of_transactions += a[1] as int }; return profit / num_of_transactions as float"
}
}
}
}
NOTE: this is just a demo of returning an array from the combine script; you can easily calculate the average without using any arrays.
The response will look like:
"aggregations" : {
"avg_profit" : {
"value" : 42.5
}
}
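
The way the pairs travel from the combine step to the reduce step can be sketched in plain Python. The function names are hypothetical, and the transaction amounts and their split across two shards are assumed sample data, chosen so that the average comes out as above.

```python
def combine(transactions):
    # Per shard: return a pair instead of a single number.
    return [sum(transactions), len(transactions)]


def reduce_pairs(shard_results):
    # Across shards: add up both elements of every pair, then divide.
    profit = 0
    num_of_transactions = 0
    for pair in shard_results:
        if pair is None:  # a shard without documents
            continue
        profit += pair[0]
        num_of_transactions += pair[1]
    if num_of_transactions == 0:
        return 0.0
    return profit / num_of_transactions


# Assumed data: sales are positive, costs are negative.
shard_a = [80, -30]
shard_b = [130, -10]

pairs = [combine(shard_a), combine(shard_b)]
print(pairs)                # [[50, 2], [120, 2]]
print(reduce_pairs(pairs))  # 42.5
```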
