I am trying to manipulate dates in Elasticsearch's scripting language, Painless.
Specifically, I am trying to add 4 hours. Date fields are exposed to scripts as epoch milliseconds, so that is 14,400,000 ms.
{
"script_fields": {
"new_date_field": {
"script": {
"inline": "doc['date_field'] + 14400"
}
}
}
}
This throws: Cannot apply [+] operation to types [org.elasticsearch.index.fielddata.ScriptDocValues.Longs] and [java.lang.Integer].
Thanks
The solution was to use .value, which unwraps the ScriptDocValues container to the underlying epoch-millis long:
{
"script_fields": {
"new_date_field": {
"script": {
"inline": "doc['date_field'].value + 14400"
}
}
}
}
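Note that on newer Elasticsearch versions (roughly 7.0 onward), doc['date_field'].value returns a ZonedDateTime rather than an epoch-millis long, so numeric addition no longer works there either. A minimal sketch of the equivalent on a recent version (using the newer source key in place of the deprecated inline):
{
  "script_fields": {
    "new_date_field": {
      "script": {
        "source": "doc['date_field'].value.plusHours(4)"
      }
    }
  }
}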
However, I actually wanted to use it for reindexing, where the format is a bit different.
Here is my version for manipulating time in the _reindex API:
POST _reindex
{
"source": {
"index": "some_index_v1"
},
"dest": {
"index": "some_index_v2"
},
"script": {
"inline": "def sf = new SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss\"); def dt = sf.parse(ctx._source.date_field); def calendar = sf.getCalendar(); calendar.setTime(dt); def instant = calendar.toInstant(); def localDateTime = LocalDateTime.ofInstant(instant, ZoneOffset.UTC); ctx._source.date_field = localDateTime.plusHours(4);"
}
}
Here is the inline script in a readable version
def sf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
def dt = sf.parse(ctx._source.date_field);
def calendar = sf.getCalendar();
calendar.setTime(dt);
def instant = calendar.toInstant();
def localDateTime = LocalDateTime.ofInstant(instant, ZoneOffset.UTC);
ctx._source.date_field = localDateTime.plusHours(4);
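On more recent versions (6.x and later, where the java.time API is whitelisted in Painless), the same transformation can be written more compactly. A minimal sketch, assuming the same yyyy-MM-dd'T'HH:mm:ss source format:
def dt = LocalDateTime.parse(ctx._source.date_field);
// format back to a string so the stored value keeps the original pattern
ctx._source.date_field = dt.plusHours(4).format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);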
The list of functions supported by Painless is in the Painless API reference in the Elasticsearch docs; finding the right ones was, fittingly, painful.
An addition: converting a date to a string (your first part, I believe) can be done with:
def dt = String.valueOf(ctx._source.date_field);
I just spent a couple of hours playing with this, so that I can concatenate a date field (in UTC format with 00:00:00 added) to a string with the time, to get a valid datetime to add to ES. Don't ask why it was split; it's an old Oracle system.
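A minimal sketch of that concatenation, with hypothetical field names (date_field holding the UTC date with 00:00:00 appended, time_field holding the time-of-day string):
// keep only the yyyy-MM-dd part of the date, then append the separate time field
def datePart = String.valueOf(ctx._source.date_field).substring(0, 10);
ctx._source.merged_datetime = datePart + 'T' + ctx._source.time_field;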
Related
I'm trying to run an Elasticsearch search using the elasticsearch package (v7.15) in Python.
Below is the dict sent to the search function:
{
"runtime_mappings": {
"tag_dynamic": {
"type": "keyword",
"script": {
"source": """
String nowString = params['now'];
ZonedDateTime nowZdt = ZonedDateTime.parse(nowString);
long now = nowZdt.toInstant().toEpochMilli();
ZonedDateTime mtimeZdt = doc['m_time'].value;
long millisDateTime = mtimeZdt.toInstant().toEpochMilli();
long Mtime_elapsedTime = now - millisDateTime;
ZonedDateTime zdtMinus = nowZdt.minusDays(30);
long millisMinusTime = zdtMinus.toInstant().toEpochMilli();
long d30_elapsedTime = now - millisMinusTime;
String dyntag = '';
if (Mtime_elapsedTime > d30_elapsedTime) {
dyntag = 'TOARCHIVE';}
emit(dyntag);
""",
"params": {
"now": "<generated string datetime in ISO-8601>"
}
}
}
},
"query": {
"query_string": {
"query": "m_time:*"
}
}
}
And I get this error: Unknown key for a START_OBJECT in [runtime_mappings]. Yet the search syntax is nearly identical to the one in the Elasticsearch docs.
Can anyone tell me why I keep getting this error?
I tested many variations, including removing the "query" part, adding "body": { at the beginning, adding "query": { at the beginning, etc., and I always get the same error.
I am creating a script to increase the count value of a field if the field's full path exists, or else add the full path dynamically. For example, in the example below:
If the record already has inner->board1->count, I should increment its value by the value of the count.
If I don't have inner, board1, or count, I should add them and set the value of the count. Please also note that the names inner, board1, and count are not fixed.
If the value is not an object, I can check using ctx._source.myCounts == null, but I am not sure how to do that check for object fields, subfields, and sub-subfields.
Code
POST test/_update/3
{
"script": {
"source": "ctx._source.board_counts = params.myCounts",
"lang": "painless",
"params": {
"myCounts": {
"inner":{
"board1":{"count":5},
"board2":{"count":4},
"board3":{"temp":1,"temp2":3}
},
"outer":{
"board1":{"count":5},
"board10":{"temp":1,"temp2":3}
}
}
}
}
}
I was able to come up with the following, and it is working fine.
POST test/_update/3
{
"script": {
"source": "{"source": "if (ctx._source['myCounts'] == null) {ctx._source['myCounts'] = [:];} for (mainItem in params.myCounts) { for (accessItemKey in mainItem.keySet()) { if (ctx._source.myCounts[accessItemKey] == null) { ctx._source.myCounts[accessItemKey] = [:];}for (boardItemKey in mainItem[accessItemKey].keySet()) {if (ctx._source.myCounts[accessItemKey][boardItemKey] == null) {ctx._source.myCounts[accessItemKey][boardItemKey] = [:];} for (countItemKey in mainItem[accessItemKey][boardItemKey].keySet()) { if (ctx._source.myCounts[accessItemKey][boardItemKey][countItemKey] == null) { ctx._source.myCounts[accessItemKey][boardItemKey][countItemKey] =mainItem[accessItemKey][boardItemKey][countItemKey]; }else {ctx._source.myCounts[accessItemKey][boardItemKey][countItemKey] += mainItem[accessItemKey][boardItemKey][countItemKey];}}}}}",
"lang": "painless",
"params": {
"myCounts": {
"inner":{
"board1":{"count":5},
"board2":{"count":4},
"board3":{"temp":1,"temp2":3}
},
"outer":{
"board1":{"count":5},
"board10":{"temp":1,"temp2":3}
}
}
}
}
}
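Here is the inline script in a readable version:
if (ctx._source['myCounts'] == null) {
  ctx._source['myCounts'] = [:];
}
for (mainItem in params.myCounts) {
  for (accessItemKey in mainItem.keySet()) {
    // create the first level (e.g. inner/outer) if missing
    if (ctx._source.myCounts[accessItemKey] == null) {
      ctx._source.myCounts[accessItemKey] = [:];
    }
    for (boardItemKey in mainItem[accessItemKey].keySet()) {
      // create the second level (e.g. board1) if missing
      if (ctx._source.myCounts[accessItemKey][boardItemKey] == null) {
        ctx._source.myCounts[accessItemKey][boardItemKey] = [:];
      }
      for (countItemKey in mainItem[accessItemKey][boardItemKey].keySet()) {
        // set the leaf value if missing, otherwise increment it
        if (ctx._source.myCounts[accessItemKey][boardItemKey][countItemKey] == null) {
          ctx._source.myCounts[accessItemKey][boardItemKey][countItemKey] = mainItem[accessItemKey][boardItemKey][countItemKey];
        } else {
          ctx._source.myCounts[accessItemKey][boardItemKey][countItemKey] += mainItem[accessItemKey][boardItemKey][countItemKey];
        }
      }
    }
  }
}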
I am trying to trim and lowercase all the values of each document that gets indexed into Elasticsearch.
The available processors (such as trim and lowercase) have a mandatory field key, which means each processor instance can target only one field.
Is there a way to run a processor on all the fields of a document?
There sure is. Use a script processor but beware of reserved keys like _type, _id etc:
PUT _ingest/pipeline/my_string_trimmer
{
"description": "Trims and lowercases all string values",
"processors": [
{
"script": {
"source": """
def forbidden_keys = [
'_type',
'_id',
'_version_type',
'_index',
'_version'
];
def corrected_source = [:];
for (pair in ctx.entrySet()) {
def key = pair.getKey();
if (forbidden_keys.contains(key)) {
continue;
}
def value = pair.getValue();
if (value instanceof String) {
corrected_source[key] = value.trim().toLowerCase();
} else {
corrected_source[key] = value;
}
}
// overwrite the original
ctx.putAll(corrected_source);
"""
}
}
]
}
Test with a sample doc:
POST my-index/_doc?pipeline=my_string_trimmer
{
"abc": " DEF ",
"def": 123,
"xyz": false
}
Having a dataset with user session data like this:
{'username': 'TestUser',
'sessionStartTime': '2019-02-14 09:00:00',
'sessionEndTime': '2019-02-14 10:20:00'},
{'username': 'User2',
'sessionStartTime': '2019-02-14 02:00:00',
'sessionEndTime': '2019-02-14 12:00:00'}
Is there an easy way to query Elasticsearch for a multi-bucket aggregated sum of sessions in a time range?
So basically I want to query for the time range 09:00:00 to 11:00:00 and get an aggregated hourly result like this:
{'bucketStart' : '2019-02-14 09:00:00',
'bucketEnd' : '2019-02-14 10:00:00',
'sessioncount' : 2},
{'bucketStart' : '2019-02-14 10:00:00',
'bucketEnd' : '2019-02-14 11:00:00',
'sessioncount' : 1}
The goal is to use the resulting data to draw a line graph of online user session counts, having only the session data in the database.
OK, I built this on my data (day by day), so adjust the 3600000 * 24 (the number of ms in your date_histogram interval; a day for me).
The second thing you may have to do is round your dates to the hour (I mean 14:03 => 14:00, 12:37 => 12:00, etc., rounding up for the end time and down for the start time).
I am not a pro in Painless, so I store the bucket dates in a predefined array (size 99); maybe it can be done with a dynamic list instead. Anyway, if your sessions could span more than 99 intervals, adjust it.
For each document, the script builds the array of bucket dates between the session start and end, stepping one interval at a time.
{
"query": {
// your filter query
},
"aggs": {
"active_alerts": {
"date_histogram": {
"interval": "day",
"script": {
"inline": "def currentDate=(doc['sessionStartTime'].value); def endDate=(doc['sessionEndTime'].value); def combined=[99]; def counter = 0; while ((currentDate < endDate) && (counter < 99)) { combined[counter] = currentDate; currentDate += 3600000 * 24 } return combined",
"lang":"painless"
}
}
}
}
}
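Since the question asks for hourly buckets, here is a sketch of the same idea with an hourly interval, stepping by 3600000 ms instead of a full day (still assuming epoch-millis .value semantics, i.e. ES 6.x or earlier):
"date_histogram": {
  "interval": "hour",
  "script": {
    "inline": "def currentDate = doc['sessionStartTime'].value; def endDate = doc['sessionEndTime'].value; def combined = new def[99]; def counter = 0; while ((currentDate < endDate) && (counter < 99)) { combined[counter] = currentDate; currentDate += 3600000; counter++ } return combined",
    "lang": "painless"
  }
}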
Hope it helps, let me know ;)
Full solution for reference:
Additionally, this allows for an open-ended range where the "to" field can be absent.
In Kibana, add the following script to the X series Date Histogram agg:
{"script": {
"lang": "painless",
"source": "
def currentDate=(doc['from'].value);
def endDate=(doc['to']);
def endDateValue = endDate.size() == 0 ? ZonedDateTime.ofInstant(Calendar.getInstance().toInstant(), ZoneOffset.UTC): endDate.value;
def combined = new ArrayList();
while ((currentDate.isBefore(endDateValue))) { combined.add(currentDate); currentDate = currentDate.plusDays(1) } return combined"
},
"field": null,
"calendar_interval": "1d"
}
For an ES API agg:
GET /<index>/_search
{
"query": {
"match_all": {}
},
"aggs": {
"fromto_range":{
"date_histogram" : {
"script": {
"lang": "painless",
"source": "def currentDate=(doc['from'].value); def endDate=(doc['to']); def endDateValue = endDate.size() == 0 ? ZonedDateTime.ofInstant(Calendar.getInstance().toInstant(), ZoneOffset.UTC): endDate.value; def combined = new ArrayList(); while ((currentDate.isBefore(endDateValue))) { combined.add(currentDate); currentDate = currentDate.plusDays(1) } return combined"
},
"calendar_interval":"1d"
}
}
}
}
We are able to pass integer values as inline params, but not dates.
We are trying it like this:
"script": {
"inline": "if ((doc['enddate'].date >= param1) && (doc['enddate'].date <= param2)) { return param2 }",
"params": {
"param1": new DateTime(),
"param2": new DateTime(doc['enddate'].date).plusDays(+1)
}
}
You cannot reference document fields in inline parameters and in your case you don't really need any parameters. I suggest doing it the following way:
"script": {
"inline": "def now = new DateTime(); def tomorrow = now.plusDays(1); if ((doc['enddate'].date >= now) && (doc['enddate'].date <= tomorrow)) { return tomorrow }"
}
Note that you still need to return something in case the condition is not satisfied.
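For example, a sketch of the same script with an explicit fallback branch (assuming that returning the unchanged end date is acceptable when the condition fails):
"script": {
  "inline": "def now = new DateTime(); def tomorrow = now.plusDays(1); if ((doc['enddate'].date >= now) && (doc['enddate'].date <= tomorrow)) { return tomorrow } else { return doc['enddate'].date }"
}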