Why do I get an 'Unknown key for a START_OBJECT' error with runtime_mappings on Elasticsearch? - elasticsearch

I'm trying to run an Elasticsearch search using the elasticsearch package (v7.15) in Python.
Below is the dict sent to the search function:
{
  "runtime_mappings": {
    "tag_dynamic": {
      "type": "keyword",
      "script": {
        "source": """
          String nowString = params['now'];
          ZonedDateTime nowZdt = ZonedDateTime.parse(nowString);
          long now = nowZdt.toInstant().toEpochMilli();
          ZonedDateTime mtimeZdt = ZonedDateTime.parse(doc['m_time'].value);
          long millisDateTime = mtimeZdt.toInstant().toEpochMilli();
          long Mtime_elapsedTime = now - millisDateTime;
          ZonedDateTime zdtMinus = nowZdt.minusDays(30);
          long millisMinusTime = zdtMinus.toInstant().toEpochMilli();
          long d30_elapsedTime = now - millisMinusTime;
          String dyntag = '';
          if (Mtime_elapsedTime > d30_elapsedTime) {
            dyntag = 'TOARCHIVE';
          }
          emit(dyntag);
        """,
        "params": {
          "now": "<generated string datetime in ISO-8601>"
        }
      }
    }
  },
  "query": {
    "query_string": {
      "query": "m_time:*"
    }
  }
}
And I get this error: Unknown key for a START_OBJECT in [runtime_mappings]. Yet the search syntax is nearly identical to the one in the Elasticsearch docs.
Can anyone tell me why I keep getting this error?
I tested many variations, including removing the "query" part, adding "body": { at the beginning, adding "query": { at the beginning, etc. I always get the same error.
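For comparison, a minimal runtime_mappings search that a 7.11+ server accepts has the same top-level structure (a sketch against a hypothetical my-index, adapted from the Elasticsearch runtime fields docs):
GET my-index/_search
{
  "runtime_mappings": {
    "tag_dynamic": {
      "type": "keyword",
      "script": {
        "source": "emit('TOARCHIVE')"
      }
    }
  },
  "query": {
    "match_all": {}
  }
}
One thing worth checking: runtime_mappings in search requests only exists on servers from 7.11 onward, so a cluster older than that rejects the key with exactly this kind of 'Unknown key for a START_OBJECT' parsing error, even when the client library is 7.15.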

Related

Elasticsearch: Use loop in Painless script

I have an old version of Elasticsearch (5.6.16) in a production environment that I can't upgrade.
I'm trying to use a loop in a Painless script_score script, but I always hit a runtime error.
All my documents can have one or several "badges"; here is the mapping:
"myDocument":{
"properties":{
"badges":{
"type":"nested",
"properties":{
"name":{
"type":"keyword"
}
}
},
}
},
My goal is to write a custom script that boosts the score of documents carrying a specific type of badge.
So I made this script:
for (item in doc['badges']) {
  if (item['name'] == "myCustomBadge") {
    return _score * 10000;
  }
}
return _score;
But unfortunately, I'm getting errors when I try to use it:
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "inline": "for (item in doc['badges']) { if (item['name'] == \"myCustomBadge\") { return _score * 10000; }}return _score;",
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}
"error":{
"root_cause":[
{
"type":"script_exception",
"reason":"runtime error",
"script_stack":[
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:77)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:36)",
"for (item in doc['badges']) { ",
" ^---- HERE"
],
"script":"for (item in doc['badges']) { if (item['name'] == \"myCustomBadge\") { return _score * 10000; }}return _score;",
"lang":"painless"
}
],
"type":"search_phase_execution_exception",
"reason":"all shards failed"
}
I tried another variant of the for loop, but got the same error.
for (int i = 0; i < doc['badges'].size(); i++) {
  if (doc['badges'][i]['name'] == "uaWorker") {
    return _score * 10000;
  }
}
return _score;
Could you help me find what I did wrong?
Thank you all.
The problem is not the loop but the fact that badges is a nested field and you're trying to access it through doc values. In this case, you need to access the array of badges from the _source document directly, like this:
for (item in params._source['badges'])
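Applied to the original query, the fix would look like this (a sketch; note that reading params._source is slower than doc values, which is the price for reaching into nested objects):
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "inline": "for (item in params._source['badges']) { if (item['name'] == \"myCustomBadge\") { return _score * 10000; } } return _score;",
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}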

Run Elasticsearch processor on all the fields of a document

I am trying to trim and lowercase all the values of a document as it gets indexed into Elasticsearch.
The available processors have a mandatory field key, which means each processor can only operate on a single field.
Is there a way to run a processor on all the fields of a document?
There sure is. Use a script processor, but beware of reserved keys like _type, _id, etc.:
PUT _ingest/pipeline/my_string_trimmer
{
  "description": "Trims and lowercases all string values",
  "processors": [
    {
      "script": {
        "source": """
          def forbidden_keys = [
            '_type',
            '_id',
            '_version_type',
            '_index',
            '_version'
          ];
          def corrected_source = [:];
          for (pair in ctx.entrySet()) {
            def key = pair.getKey();
            if (forbidden_keys.contains(key)) {
              continue;
            }
            def value = pair.getValue();
            if (value instanceof String) {
              corrected_source[key] = value.trim().toLowerCase();
            } else {
              corrected_source[key] = value;
            }
          }
          // overwrite the original
          ctx.putAll(corrected_source);
        """
      }
    }
  ]
}
Test with a sample doc:
POST my-index/_doc?pipeline=my_string_trimmer
{
  "abc": " DEF ",
  "def": 123,
  "xyz": false
}
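If the pipeline behaves as described, the stored document should come back with the string value trimmed and lowercased and the non-string values untouched, roughly:
{
  "abc": "def",
  "def": 123,
  "xyz": false
}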

Count of unique aggregation doc_count in Elasticsearch

Using Elasticsearch 7.0, I can get how many logs I have for each user with an aggregation:
"aggs": {
"by_user": {
"terms": {
"field": "user_id",
}
}
}
This returns me something like:
user32: 25
user52: 20
user10: 20
...
What I would like to know is how many users have 25 logs, how many users have 20 logs, etc. The ideal result would be something like:
25: 1
20: 2
19: 4
12: 54
Because 54 users have 12 log lines.
How can I make an aggregation that returns this result?
It sounds like a Bucket Script Aggregation could simplify your query, but the problem is that there is still an open PR on this topic.
So for now, I think the simplest approach is to use a Painless script with a Scripted Metric Aggregation. I recommend you carefully read about the stages of its execution.
In terms of code, I know it's not the best algorithm for your problem, but quick and dirty, your query could look something like this:
GET my_index/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "profit": {
      "scripted_metric": {
        "init_script": "state.transactions = [:];",
        "map_script": """
          def key = doc['firstName.keyword'];
          if (key.size() > 0) {
            def value = state.transactions[key.value];
            if (value == null) value = 0;
            state.transactions[key.value] = value + 1;
          }
        """,
        "combine_script": "return state.transactions",
        "reduce_script": """
          def result = [:];
          for (state in states) {
            for (item in state.entrySet()) {
              def key = item.getValue().toString();
              def value = result[key];
              if (value == null) value = 0;
              result[key] = value + 1;
            }
          }
          return result;
        """
      }
    }
  }
}
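Since the reduce script returns a plain map keyed by the stringified log count, the result shows up under the aggregation's value rather than as regular buckets. With the numbers from the question, the response fragment would look roughly like this (a sketch):
"aggregations": {
  "profit": {
    "value": {
      "25": 1,
      "20": 2,
      "19": 4,
      "12": 54
    }
  }
}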

Elasticsearch complex multi bucket time aggregation - Session data to User count

Having a dataset with user session data like this:
{'username': 'TestUser',
 'sessionStartTime': '2019-02-14 09:00:00',
 'sessionEndTime': '2019-02-14 10:20:00'},
{'username': 'User2',
 'sessionStartTime': '2019-02-14 02:00:00',
 'sessionEndTime': '2019-02-14 12:00:00'}
Is there an easy way to query Elasticsearch for a multi-bucket aggregated sum of sessions in a time range?
So basically I want to query for the time range 09:00:00 to 11:00:00 and get an aggregated hourly result like this:
{'bucketStart' : '2019-02-14 09:00:00',
'bucketEnd' : '2019-02-14 10:00:00',
'sessioncount' : 2},
{'bucketStart' : '2019-02-14 10:00:00',
'bucketEnd' : '2019-02-14 11:00:00',
'sessioncount' : 1}
The goal of this is to use the resulting data to draw a line graph of "online" user session counts, having only the session data in the database.
OK, I made this on my dates (day by day), so adjust the 3600000 * 24 (the number of ms in the date_histogram interval, a day in my case).
The second thing you may have to do is round your dates to the hour (I mean 14:03 => 14:00, 12:37 => 12:00, etc., rounding up for end times and down for start times).
I am not a pro in Painless, so I store the agg result in a predefined array (size 99); maybe it can be done with some kind of dynamic list. Anyway, if your sessions could be longer than 99 hours, adjust it.
The script creates the agg array by splitting each session's date range step by step (day by day here; hour by hour in your case).
{
  "query": {
    // your filter query
  },
  "aggs": {
    "active_alerts": {
      "date_histogram": {
        "interval": "day",
        "script": {
          "inline": "def currentDate = doc['sessionStartTime'].value; def endDate = doc['sessionEndTime'].value; def combined = new def[99]; def counter = 0; while ((currentDate < endDate) && (counter < 99)) { combined[counter] = currentDate; currentDate += 3600000 * 24; counter++ } return combined",
          "lang": "painless"
        }
      }
    }
  }
}
Hope it helps, let me know ;)
Full solutions for reference:
Additionally, this allows for an open-ended range where the "to" date can be absent.
In Kibana, add the following script to the X series Date Histogram agg:
{"script": {
"lang": "painless",
"source": "
def currentDate=(doc['from'].value);
def endDate=(doc['to']);
def endDateValue = endDate.size() == 0 ? ZonedDateTime.ofInstant(Calendar.getInstance().toInstant(), ZoneOffset.UTC): endDate.value;
def combined = new ArrayList();
while ((currentDate.isBefore(endDateValue))) { combined.add(currentDate); currentDate = currentDate.plusDays(1) } return combined"
},
"field": null,
"calendar_interval": "1d"
}
For an ES API aggregation:
GET /<index>/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "fromto_range": {
      "date_histogram": {
        "script": {
          "lang": "painless",
          "source": "def currentDate = doc['from'].value; def endDate = doc['to']; def endDateValue = endDate.size() == 0 ? ZonedDateTime.ofInstant(Calendar.getInstance().toInstant(), ZoneOffset.UTC) : endDate.value; def combined = new ArrayList(); while (currentDate.isBefore(endDateValue)) { combined.add(currentDate); currentDate = currentDate.plusDays(1) } return combined"
        },
        "calendar_interval": "1d"
      }
    }
  }
}

elasticsearch-painless - Manipulate date

I am trying to manipulate a date in Elasticsearch's scripting language, Painless.
Specifically, I am trying to add 4 hours, which is 14,400 seconds.
{
  "script_fields": {
    "new_date_field": {
      "script": {
        "inline": "doc['date_field'] + 14400"
      }
    }
  }
}
This throws Cannot apply [+] operation to types [org.elasticsearch.index.fielddata.ScriptDocValues.Longs] and [java.lang.Integer].
Thanks
The solution was to use .value. (Note that the doc value of a date field is in epoch milliseconds, so to really add 4 hours you need 14400000, not 14400.)
{
  "script_fields": {
    "new_date_field": {
      "script": {
        "inline": "doc['date_field'].value + 14400000"
      }
    }
  }
}
However, I actually wanted to use it for reindexing, where the format is a bit different.
Here is my version for manipulating time in the _reindex API:
POST _reindex
{
  "source": {
    "index": "some_index_v1"
  },
  "dest": {
    "index": "some_index_v2"
  },
  "script": {
    "inline": "def sf = new SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss\"); def dt = sf.parse(ctx._source.date_field); def calendar = sf.getCalendar(); calendar.setTime(dt); def instant = calendar.toInstant(); def localDateTime = LocalDateTime.ofInstant(instant, ZoneOffset.UTC); ctx._source.date_field = localDateTime.plusHours(4);"
  }
}
Here is the inline script in a readable version:
def sf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
def dt = sf.parse(ctx._source.date_field);
def calendar = sf.getCalendar();
calendar.setTime(dt);
def instant = calendar.toInstant();
def localDateTime = LocalDateTime.ofInstant(instant, ZoneOffset.UTC);
ctx._source.date_field = localDateTime.plusHours(4);
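Since Painless whitelists java.time, the same reindex can be written without SimpleDateFormat, assuming the stored string really is strict yyyy-MM-dd'T'HH:mm:ss with no zone suffix (a sketch, not tested against the indices above):
POST _reindex
{
  "source": {
    "index": "some_index_v1"
  },
  "dest": {
    "index": "some_index_v2"
  },
  "script": {
    "inline": "ctx._source.date_field = LocalDateTime.parse(ctx._source.date_field).plusHours(4).toString();"
  }
}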
Here is the list of functions supported by Painless; it was painful.
An addition: converting the date to a string, your first part I believe, can be done with:
def dt = String.valueOf(ctx._source.date_field);
Just spent a couple of hours playing with this... so I can concatenate a date field (in UTC format with 00:00:00 added) to a string with the time, to get a valid datetime to add to ES. Don't ask why it was split... it's an old Oracle system.
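For what it's worth, that kind of date-plus-time concatenation can be done in a reindex or ingest script along these lines (the field names date_part and time_part are hypothetical; this assumes the date field holds something like 2019-02-14T00:00:00Z):
"script": {
  "inline": "ctx._source.date_time = ctx._source.date_part.substring(0, 10) + 'T' + ctx._source.time_part + 'Z';",
  "lang": "painless"
}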
