Elasticsearch/Kibana: update a document field whose name contains a space, using update_by_query

I have a field to update whose name contains a space.
POST /index/type/_update_by_query
{
  "query": {
    "match_phrase": {
      "field": "value"
    }
  },
  "script": {
    "lang": "painless",
    "inline": "ctx._source.Existing Field = New_Value"
  }
}
But I get this error.
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "compile error",
        "script_stack": [
          "ctx._source.Existing Field = New_Value",
          " ^---- HERE"
        ],
        "script": "ctx._source.Existing Field = New_Value",
        "lang": "painless"
      }
    ],
    "type": "script_exception",
    "reason": "compile error",
    "script_stack": [
      "ctx._source.Existing Field = New_Value",
      " ^---- HERE"
    ],
    "script": "ctx._source.Existing Field = New_Value",
    "lang": "painless",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "unexpected token ['Field'] was expecting one of [{<EOF>, ';'}]."
    }
  },
  "status": 500
}
When I execute this query on a field whose name doesn't have a space, it works fine.
How do I handle cases where there is a space in the field name?
ELK version = 5.4.3
I have read in the documentation that using spaces in field names is not advised, but these fields are created dynamically by an upstream server, and roughly 1M entries arrive every day. Hence I want to run an update_by_query on all the matching entries.

Try this one:
POST index/type/_update_by_query
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source['Existing Field'] = 'New Value'"
  }
}
This works because ctx._source is an instance of a Painless Map (essentially a Java HashMap), so bracket notation lets you access fields with unusual characters, and you can also add and remove fields in update scripts.
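For example, here is a minimal sketch (reusing the question's placeholder index; the key 'New Field' is hypothetical) that renames a space-containing field through the Map API. Map.remove returns the removed value, so the assignment moves it under the new key:
POST /index/type/_update_by_query
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source['New Field'] = ctx._source.remove('Existing Field')"
  }
}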
Hope that helps!

Related

How to calculate the lag between the time a log message was generated at the application end and the time it was ingested into Elasticsearch?

Elasticsearch experts, I need your help to achieve the goal mentioned below.
Goal:
I am trying to find a way to calculate the lag between the time a log message was generated at the application end (the #timestamp field) and the time it was ingested into Elasticsearch (the ingest_time field).
Current Setup:
I am using FluentD to capture the logs and send them to Kafka. Then I use Kafka Connect (the Elasticsearch connector) to send the logs on to Elasticsearch. Since I have a layer of Kafka between FluentD and Elasticsearch, I want to calculate the lag between the log message generation time and the ingestion time.
The log message generation time is stored in the #timestamp field and is set when the application generates the log. Below is how a log message looks at the Kafka topic end.
{
  "message": "ServiceResponse - Throwing non 2xx response",
  "log_level": "ERROR",
  "thread_id": "http-nio-9033-exec-21",
  "trace_id": "86d39fbc237ef7f8",
  "user_id": "85355139",
  "tag": "feedaggregator-secondary",
  "#timestamp": "2022-06-18T23:30:06+0530"
}
I have created an ingest pipeline to add the ingest_time field to every doc inserted to the Elasticsearch index.
PUT _ingest/pipeline/ingest_time
{
  "description": "Add an ingest timestamp",
  "processors": [
    {
      "set": {
        "field": "_source.ingest_time",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
Once a document gets inserted into the index from Kafka via Kafka Connect (the ES sink connector), this is how my message looks in Kibana in JSON format.
{
  "_index": "feedaggregator-secondary-2022-06-18",
  "_type": "_doc",
  "_id": "feedaggregator-secondary-2022-06-18+2+7521337",
  "_version": 1,
  "_score": null,
  "_source": {
    "thread_id": "http-nio-9033-exec-21",
    "trace_id": "86d39fbc237ef7f8",
    "#timestamp": "2022-06-18T23:30:06+0530",
    "ingest_time": "2022-06-18T18:00:09.038032Z",
    "user_id": "85355139",
    "log_level": "ERROR",
    "tag": "feedaggregator-secondary",
    "message": "ServiceResponse - Throwing non 2xx response"
  },
  "fields": {
    "#timestamp": [
      "2022-06-18T18:00:06.000Z"
    ]
  },
  "sort": [
    1655574126000
  ]
}
Now I want to calculate the difference between the #timestamp and ingest_time fields. For this I added a script to the ingest pipeline, which adds a field lag_seconds and sets its value to the difference between the ingest_time and #timestamp fields.
PUT _ingest/pipeline/calculate_lag
{
  "description": "Add an ingest timestamp and calculate ingest lag",
  "processors": [
    {
      "set": {
        "field": "_source.ingest_time",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": """
          if (ctx.containsKey("ingest_time") && ctx.containsKey("#timestamp")) {
            ctx['lag_in_seconds'] = ChronoUnit.MILLIS.between(ZonedDateTime.parse(ctx['#timestamp']), ZonedDateTime.parse(ctx['ingest_time']))/1000;
          }
        """
      }
    }
  ]
}
Error:
But since my ingest_time and #timestamp fields are in different formats, it throws a DateTimeParseException.
{
  "error": {
    "root_cause": [
      {
        "type": "exception",
        "reason": "java.lang.IllegalArgumentException: ScriptException[runtime error]; nested: DateTimeParseException[Text '2022-06-18T23:30:06+0530' could not be parsed, unparsed text found at index 22];",
        "header": {
          "processor_type": "script"
        }
      }
    ],
    "type": "exception",
    "reason": "java.lang.IllegalArgumentException: ScriptException[runtime error]; nested: DateTimeParseException[Text '2022-06-18T23:30:06+0530' could not be parsed, unparsed text found at index 22];",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "ScriptException[runtime error]; nested: DateTimeParseException[Text '2022-06-18T23:30:06+0530' could not be parsed, unparsed text found at index 22];",
      "caused_by": {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2049)",
          "java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1948)",
          "java.base/java.time.ZonedDateTime.parse(ZonedDateTime.java:598)",
          "java.base/java.time.ZonedDateTime.parse(ZonedDateTime.java:583)",
          "ctx['lag_in_seconds'] = ChronoUnit.MILLIS.between(ZonedDateTime.parse(ctx['#timestamp']), ZonedDateTime.parse(ctx['ingest_time']))/1000;\n }",
          " ^---- HERE"
        ],
        "script": " if(ctx.containsKey(\"ingest_time\") && ctx.containsKey(\"#timestamp\")) {\n ctx['lag_in_seconds'] = ChronoUnit.MILLIS.between(ZonedDateTime.parse(ctx['#timestamp']), ZonedDateTime.parse(ctx['ingest_time']))/1000;\n }",
        "lang": "painless",
        "caused_by": {
          "type": "date_time_parse_exception",
          "reason": "Text '2022-06-18T23:30:06+0530' could not be parsed, unparsed text found at index 22"
        }
      }
    },
    "header": {
      "processor_type": "script"
    }
  },
  "status": 500
}
So I need your help to compute lag_seconds between the #timestamp and ingest_time fields.
I am using AWS managed Elasticsearch (OpenSearch), Elasticsearch version 7.1.
I can see a Java date-parsing problem with the #timestamp field. ctx['#timestamp'] returns the value "2022-06-18T23:30:06+0530", whose offset is written as +0530 rather than the ISO form +05:30. The default ISO parsers (ZonedDateTime.parse, and OffsetDateTime.parse alike) expect the colon form, which is exactly why parsing stops at index 22. You would need to parse it with an explicit DateTimeFormatter pattern that accepts a +HHmm offset. Alternatively, you could access #timestamp from the fields block, where it has already been normalized to UTC. You can read up on date parsing in Java at https://howtodoinjava.com/java/date-time/zoneddatetime-parse/.
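Under that assumption, here is a minimal sketch of the corrected pipeline (the pattern letter Z accepts a +0530 style offset; ingest_time keeps the default ISO instant format, so ZonedDateTime.parse still handles it; the lag field name follows the original script):
PUT _ingest/pipeline/calculate_lag
{
  "description": "Add an ingest timestamp and calculate ingest lag",
  "processors": [
    {
      "set": {
        "field": "_source.ingest_time",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": """
          if (ctx.containsKey("ingest_time") && ctx.containsKey("#timestamp")) {
            // '#timestamp' carries a +HHmm offset, so use an explicit pattern
            DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssZ");
            Instant generated = OffsetDateTime.parse(ctx['#timestamp'], f).toInstant();
            Instant ingested = ZonedDateTime.parse(ctx['ingest_time']).toInstant();
            ctx['lag_in_seconds'] = ChronoUnit.MILLIS.between(generated, ingested) / 1000;
          }
        """
      }
    }
  ]
}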

ElasticSearch "Illegal list shortcut value [id]" in update_by_query

I get "Illegal list shortcut value [id]" when trying to update this document with this query. What am I missing? The relevant mapping (events and location are nested object types):
"events": { "type": "nested" },
"location": { "type": "nested" },
"id": { "type": "text" }
POST event_lists/_update_by_query?conflicts=proceed
{
  "script": {
    "lang": "painless",
    "source": """
      for (int i = 0; i < ctx._source.events.length; i++) {
        if (params.event_ids.contains(ctx._source.events[i].id)) {
          ctx._source.events[i].location = params.location;
          break;
        }
      }
    """,
    "params": {
      "event_ids": ["12345"],
      "location": location_object
    }
  }
}
When trying to use Kibana to debug
Debug.explain(ctx._source.events[i].id);
I get
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"painless_class": "java.lang.String",
"to_string": "ETo3zoABiBlDN0geqAGN",
"java_class": "java.lang.String",
"script_stack": [
"Debug.explain(ctx._source.events[i].id); \n ",
" ^---- HERE"
]
I ended up checking whether each entry is a list vs. a single object, and it seems to be working:
for (event in ctx._source.events) {
  if (event instanceof List && event.size() > 0) {
    if (params.event_ids.contains(event[0].id)) {
      event[0].location = params.location;
      break;
    }
  } else {
    if (params.event_ids.contains(event.id)) {
      event.location = params.location;
      break;
    }
  }
}
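For reference, a sketch of the full request with that list-aware script; the location param here is a hypothetical placeholder object, since its real shape depends on the nested mapping:
POST event_lists/_update_by_query?conflicts=proceed
{
  "script": {
    "lang": "painless",
    "source": """
      // entries can be single objects or single-element lists, so check both
      for (event in ctx._source.events) {
        if (event instanceof List && event.size() > 0) {
          if (params.event_ids.contains(event[0].id)) {
            event[0].location = params.location;
            break;
          }
        } else if (params.event_ids.contains(event.id)) {
          event.location = params.location;
          break;
        }
      }
    """,
    "params": {
      "event_ids": ["12345"],
      "location": { "name": "example-location" }
    }
  }
}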

Elasticsearch removing an array list when reindexing all records

So I am trying to reindex one of my indices to a temporary one and remove an array list: platform.platforms.*
This is what my Kibana query looks like:
POST /_reindex
{
  "source": {
    "index": "devops-ucd-000001"
  },
  "dest": {
    "index": "temp-ucd"
  },
  "conflicts": "proceed",
  "script": {
    "lang": "painless",
    "inline": "ctx._source.platform.platforms.removeAll(Collections.singleton('1'))"
  }
}
However what I get is a null pointer exception:
"script_stack": [
"ctx._source.platform.platforms.removeAll(Collections.singleton('1'))",
" ^---- HERE"
],
"script": "ctx._source.platform.platforms.removeAll(Collections.singleton('1'))",
"lang": "painless",
"caused_by": {
"type": "null_pointer_exception",
"reason": null
}
I tried following this question: how to remove arraylist value in elastic search using curl? to no avail.
Any help would be appreciated here.
It is probably because some documents do not have the platform field. You need to add additional checks in your script to skip such documents:
"script": {
"lang": "painless",
"inline": """
if(ctx._source.platform!=null && ctx._source.platform.platforms!=null && ctx._source.platform.platforms instanceof List)
{
ctx._source.platform.platforms.removeAll(Collections.singleton('1'))
}
"""
}
The above null-checks platform and platform.platforms, and also verifies that platforms is actually a list.
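Putting it together, the full reindex request would look like this (a sketch reusing the index names and script from above):
POST /_reindex
{
  "source": {
    "index": "devops-ucd-000001"
  },
  "dest": {
    "index": "temp-ucd"
  },
  "conflicts": "proceed",
  "script": {
    "lang": "painless",
    "inline": """
      // skip documents where platform or platform.platforms is missing
      if (ctx._source.platform != null && ctx._source.platform.platforms != null && ctx._source.platform.platforms instanceof List) {
        ctx._source.platform.platforms.removeAll(Collections.singleton('1'))
      }
    """
  }
}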

Accessing Text Keyword Field through a Script

I am trying to do some scripting in Elasticsearch. Here is an example of the JSON segment in the request.
{
  "script_score": {
    "script": {
      "source": "doc.containsKey('var')?params.adder[doc['var'].keyword]:0 ",
      "params": {
        "adder": {
          "type1": 1,
          "type2": 1000
        }
      }
    }
  },
  "weight": 100000
}
This is the error that is thrown
{
  "shard": 0,
  "index": "",
  "node": "4eX6EgO2QAuBdc5zkUiDBg",
  "reason": {
    "type": "script_exception",
    "reason": "runtime error",
    "script_stack": [
      "org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:759)",
      "org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:116)",
      "org.elasticsearch.index.query.QueryShardContext.lambda$lookup$0(QueryShardContext.java:290)",
      "org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:101)",
      "org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:98)",
      "java.base/java.security.AccessController.doPrivileged(AccessController.java:312)",
      "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:98)",
      "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:41)",
      "doc.containsKey('var')?params.adder[doc['var'].keyword]:0 ",
      " ^---- HERE"
    ],
    "script": "doc.containsKey('var')?params.adder[doc['var'].keyword]:0 ",
    "lang": "painless",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [var] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
    }
  }
}
It's surprising to me that I can't access the keyword field just because it's a sub-field. Do I need to make another field that is a keyword field?
Thank you
To access the keyword sub-field, try doc['var.keyword'] or doc['var.keyword'].value.
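A sketch of the corrected clause under that assumption: doc.containsKey('var.keyword') checks the mapping, the size() check guards against documents with no value, and the elvis operator (?:) falls back to 0 for values missing from the adder map:
{
  "script_score": {
    "script": {
      "source": "doc.containsKey('var.keyword') && doc['var.keyword'].size() > 0 ? (params.adder[doc['var.keyword'].value] ?: 0) : 0",
      "params": {
        "adder": {
          "type1": 1,
          "type2": 1000
        }
      }
    }
  },
  "weight": 100000
}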

How does elasticsearch handle returns inside a scripted update query?

I can't find the relevant documentation describing the return keyword. Where is this documented?
I am running the following query
POST /myindex/mytype/FwOaGmQBdhLB1nuQhK1Q/_update
{
  "script": {
    "source": """
      if (ctx._source.owner._id.equals(params.signedInUserId)) {
        for (int i = 0; i < ctx._source.managers.length; i++) {
          if (ctx._source.managers[i].email.equals(params.managerEmail)) {
            ctx._source.managers.remove(i);
            return;
          }
        }
      }
      ctx.op = 'noop';
    """,
    "lang": "painless",
    "params": {
      "signedInUserId": "auth0|5a78c1ccebf64a46ecdd0d9c",
      "managerEmail": "d#d.com"
    }
  },
  "_source": true
}
but I'm getting the error
"type": "illegal_argument_exception",
"reason": "failed to execute script",
"caused_by": {
"type": "script_exception",
"reason": "compile error",
"script_stack": [
"... ve(i);\n return;\n }\n }\n ...",
" ^---- HERE"
],
"script": <the script here>,
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "invalid sequence of tokens near [';'].",
"caused_by": {
"type": "no_viable_alt_exception",
"reason": null
}
}
If I remove the return keyword, the script runs but I get the wrong behavior, as expected. I can correct the behavior by using a boolean to keep track of the email removal, but why can't I return early?
It's hard to say. You could avoid null/void returns altogether by passing a lambda predicate to either retainAll or removeIf:
ctx._source.managers.removeIf(m -> m.email.equals(params.managerEmail))
Lambda expressions and method references work the same as Java's.
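As a sketch, the whole update could then drop the loop and the early return (reusing the question's request; removeIf returns whether anything was removed, which makes the noop decision explicit):
POST /myindex/mytype/FwOaGmQBdhLB1nuQhK1Q/_update
{
  "script": {
    "source": """
      if (ctx._source.owner._id.equals(params.signedInUserId)) {
        // removeIf returns true if at least one manager matched
        boolean removed = ctx._source.managers.removeIf(m -> m.email.equals(params.managerEmail));
        if (!removed) {
          ctx.op = 'noop';
        }
      } else {
        ctx.op = 'noop';
      }
    """,
    "lang": "painless",
    "params": {
      "signedInUserId": "auth0|5a78c1ccebf64a46ecdd0d9c",
      "managerEmail": "d#d.com"
    }
  },
  "_source": true
}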
