ElasticSearch painless script for reindexing - elasticsearch

We are trying to use the following Painless script to reindex our data in Elasticsearch.
POST _reindex
{
"source": {
"index": "metricbeat-*"
},
"dest": {
"index": "metricbeat"
},
"script": {
"lang": "painless",
"inline": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
}
}
Taken from the following URL:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#_reindex_daily_indices
This script works perfectly and creates a copy of each of our indices.
For example, if the original index is
metricbeat-2016.05.30
then after running this script there is also metricbeat-2016.05.30-1, which is an exact copy of the original index (metricbeat-2016.05.30).
Now I want to do the following two things:
1] Delete the original index, i.e. metricbeat-2016.05.30
2] Rename the reindexed copy (metricbeat-2016.05.30-1) back to the original name, metricbeat-2016.05.30.
How can we do this?
Can we modify the Painless script above?
Thanks in advance!

The way I did it was to reindex as in the example from the Elasticsearch reference, but instead of appending "-1" I prepended "temp-" to the index name:
POST _reindex
{
"source": {
"index": "metricbeat-*"
},
"dest": {
"index": "metricbeat"
},
"script": {
"lang": "painless",
"source": "ctx._index = 'temp-' + ctx._index"
}
}
This makes it easier to delete the original indices with the pattern "metricbeat-*":
DELETE metricbeat-*
I then reindexed again to get the original name:
POST _reindex
{
"source": {
"index": "temp-metricbeat-*"
},
"dest": {
"index": "metricbeat"
},
"script": {
"lang": "painless",
"source": "ctx._index = ctx._index.substring(5)"
}
}
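One precaution for the sequence above: before running the DELETE, it is probably worth checking that every temp- copy holds as many documents as its original. A minimal sketch using the cat indices API, assuming the index patterns from this answer:

```
# List the original and temporary indices side by side with their document counts
GET _cat/indices/metricbeat-*,temp-metricbeat-*?v&h=index,docs.count
```

If the counts for metricbeat-2016.05.30 and temp-metricbeat-2016.05.30 differ, the first reindex likely did not finish cleanly, and deleting the originals at that point would lose data.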
As a side note, the example in the Elasticsearch reference is unnecessarily complex:
ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'
You get the same result with the code:
ctx._index = ctx._index + '-1'

You cannot rename an index. You could use an alias, however, once you have deleted the original index.
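A sketch of what that could look like for the index names in the question. An alias cannot share its name with an existing index, which is why the original has to be deleted first:

```
# Only after metricbeat-2016.05.30-1 has been verified as a complete copy
DELETE metricbeat-2016.05.30

# Make the copy reachable under the original name
POST _aliases
{
  "actions": [
    { "add": { "index": "metricbeat-2016.05.30-1", "alias": "metricbeat-2016.05.30" } }
  ]
}
```

Searches and writes against metricbeat-2016.05.30 then go to metricbeat-2016.05.30-1. This would have to be repeated for each daily index, since the alias name differs per index.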

Related

Elasticsearch removing an array list when reindexing all records

So I am trying to reindex one of my indices to a temporary one and remove an array list: platform.platforms.*
This is what my Kibana query looks like:
POST /_reindex
{
"source": {
"index": "devops-ucd-000001"
},
"dest": {
"index": "temp-ucd"
},
"conflicts": "proceed",
"script": {
"lang": "painless",
"inline": "ctx._source.platform.platforms.removeAll(Collections.singleton('1'))"
}
}
However what I get is a null pointer exception:
"script_stack": [
"ctx._source.platform.platforms.removeAll(Collections.singleton('1'))",
" ^---- HERE"
],
"script": "ctx._source.platform.platforms.removeAll(Collections.singleton('1'))",
"lang": "painless",
"caused_by": {
"type": "null_pointer_exception",
"reason": null
}
I tried following this question: how to remove arraylist value in elastic search using curl? to no avail.
Any help would be appreciated here.
It is probably due to some documents not having the platform field. You need to add checks to your script so that such documents are skipped:
"script": {
"lang": "painless",
"inline": """
if(ctx._source.platform!=null && ctx._source.platform.platforms!=null && ctx._source.platform.platforms instanceof List)
{
ctx._source.platform.platforms.removeAll(Collections.singleton('1'))
}
"""
}
The script above null-checks both platform and platform.platforms, and also checks that platforms is actually a List.

Why does a script work in the reindex API but not in an ingest pipeline

I create indices based on projectId like so:
// Calling the reindex API directly works fine
POST _reindex?wait_for_completion=false
{
"conflicts": "proceed",
"source": {
"index": "xxxxx-rlk-test1-2021-07-22"
},
"dest": {
"index": "xxxxxx",
"op_type": "create"
},
"script": {
"lang": "painless",
"source": """
if (ctx._source.kubernetes != null){
if (ctx._source.kubernetes.namespace_labels['field_cattle_io/projectId'] != null){
ctx._index = 'xxxxxx-rlk-'+ (ctx._source.kubernetes.namespace_labels['field_cattle_io/projectId']) + '' + (ctx._index.substring('xxxxxx-rlk-test-'.length(), ctx._index.length()))
}else {
ctx._index = 'xxxxxx-rlk-'+ (ctx._source.kubernetes.namespace_labels['field_cattle_io/projectId']) +'-noproject'
}
}
"""
}
}
But when I try to use reindex with a pipeline like so:
PUT _ingest/pipeline/group-by-projectid-pipeline
{
"description": "this pipeline split indices by pipeline",
"processors": [
{
"script": {
"lang": "painless",
"source": """
if (ctx.kubernetes != null){
if (ctx.kubernetes.namespace_labels['field_cattle_io/projectId'] != null){
ctx._index = 'xxxxxx-rlk-'+ (ctx.kubernetes.namespace_labels['field_cattle_io/projectId']) +'' + (ctx._index.substring('xxxxxx-rlk-test-'.length(), ctx._index.length()))
}else {
ctx._index = 'xxxxxx-rlk-'+ (ctx.kubernetes.namespace_labels['field_cattle_io/projectId']) +'-noproject'
}
}
"""
}
}
]
}
and:
POST _reindex
{
"conflicts": "proceed",
"source": {
"index": "xxxxxx-rlk-test1-2021-07-22"
},
"dest": {
"index": "xxxxxx",
"pipeline": "group-by-projectid-pipeline",
"op_type": "create"
}
}
then Elasticsearch reports the following error (for the expression ctx._index.substring('xxxxxx-rlk-test-'.length(), ctx._index.length())):
"type" : "string_index_out_of_bounds_exception",
"reason" : "begin 16, end 6, length 6"
Thank you in advance for your help!
This is because the script does not execute at the same point in the two situations.
During a reindex call without a pipeline, the script executes before the document lands in the destination index, hence ctx._index is the name of the source index, i.e. xxxxxx-rlk-test1-2021-07-22, so your substring call works.
During a reindex call with a pipeline, the script processor runs when the document is about to land in the destination index, hence ctx._index is the name of the destination index, i.e. xxxxxx.
This is why the call fails: it becomes 'xxxxxx'.substring(16, 6), and a begin index of 16 is past the end of a 6-character string. So you should proceed differently in the second case.
The easy way out of this (if you want to keep the same logic) is to use a dummy destination index name of the same length as the source one, since the pipeline is going to overwrite it anyway:
POST _reindex
{
"conflicts": "proceed",
"source": {
"index": "xxxxxx-rlk-test1-2021-07-22"
},
"dest": {
"index": "xxxxxx-rlk-xxxxx-2021-07-22", <--- change this
"pipeline": "group-by-projectid-pipeline",
"op_type": "create"
}
}
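To check what ctx._index contains when the pipeline runs, the simulate API can feed the pipeline a couple of test documents. This is only a sketch: the projectId value p-123 and the _id values are made up for illustration.

```
# First doc uses the short destination name, second the same-length dummy name
POST _ingest/pipeline/group-by-projectid-pipeline/_simulate
{
  "docs": [
    {
      "_index": "xxxxxx",
      "_id": "1",
      "_source": {
        "kubernetes": { "namespace_labels": { "field_cattle_io/projectId": "p-123" } }
      }
    },
    {
      "_index": "xxxxxx-rlk-xxxxx-2021-07-22",
      "_id": "2",
      "_source": {
        "kubernetes": { "namespace_labels": { "field_cattle_io/projectId": "p-123" } }
      }
    }
  ]
}
```

The first document should reproduce the string_index_out_of_bounds_exception, while the second should come back with a rewritten _index, which makes it easy to confirm the fix before running the real reindex.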

Reindexing elastic-search documents into another index by changing the routing key to a combination of two field values

I have an existing Elasticsearch index with the following document structure, without a routing_key:
{
"_id",
"field1",
"field2"
}
I need to migrate the data into a new index. The structure of the index remains the same with an added routing_key. The routing key needs to be updated to "field1_field2". Is there a simple Kibana script to migrate the data to the new index?
A simple Painless script combined with the Elasticsearch reindex API can be used to achieve this:
POST _reindex
{
"source": {
"index": "{old_index_name}",
"size": {batch_size}
},
"dest": {
"index": "{new_index_name}"
},
"script": {
"lang": "painless",
"inline": "ctx._routing = ctx._source.field1 + '_' + ctx._source.field2"
}
}

Re-Index Elasticsearch, ignore fields not in mapping

Trying to test out re-index API in elasticsearch and running into issues where existing data contains fields not present in the new index's strict mapping. Is there a way to tell elasticsearch to simply ignore those fields and carry on?
Edit: To clarify, by ignore I meant not to include those fields during the re-index process.
If you have access to the index settings before running reindex you can just do:
PUT test/_mapping
{
"dynamic": "false"
}
then change it back to strict once reindexing is done.
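For completeness, switching back afterwards would look roughly like this, assuming the same index name test (older Elasticsearch versions also expect the mapping type in the URL):

```
# Restore strict mapping once the reindex has finished
PUT test/_mapping
{
  "dynamic": "strict"
}
```

Note that with "dynamic": "false" the unmapped fields are not indexed or searchable, but they are still stored in _source of the reindexed documents.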
UPDATE based on your comment
POST _reindex
{
"source": {
"index": "src"
},
"dest": {
"index": "dst"
},
"script": {
"lang": "painless",
"source": """
ctx['_source'].remove('email');
ctx['_source'].remove('username');
ctx['_source'].remove('name');
// removing from nested:
for(item in ctx['_source'].Groups){
item.remove('GroupName');
item.remove('IsActive');
}
"""
}
}
While reindexing you can include or exclude source fields according to your destination index mapping.
To exclude some specific fields while reindexing:
POST _reindex
{
"source": {
"index": "source-index",
"_source": {
"excludes": ["exclude_a", "exclude_b"]
}
},
"dest": {
"index": "dest-index"
}
}
To include any specific field while reindexing:
POST _reindex
{
"source": {
"index": "source-index",
"_source": ["include_a", "include_b"]
},
"dest": {
"index": "dest-index"
}
}

How do I convert to uppercase and delete a particular field while using reindex?

I am trying to migrate from ES 1.4 to ES 5.5. In one of the indices, I need to rename a field and also convert its value to uppercase. I am able to reindex with the field renamed and the unwanted field removed, but I need help converting the value to uppercase.
This is what I tried
POST _reindex?wait_for_completion=false
{
"source": {
"remote": {
"host": "http://source_ip:17002"
},
"index": "log_event_2017-08-11",
"size": 1000,
"query": {
"match_all": {}
}
},
"dest": {
"index": "logs-ics-2017-08-11"
},
"script": {
"inline": "ctx._source.product = ctx._source.remove(\"product_name\")",
"lang": "painless"
}
}
The above POST request removes "product_name" and creates "product" with its value. To uppercase the value of "product" I tried the inline script below, but it gives a null_pointer_exception.
I am new to Elasticsearch scripting. Please help.
"ctx._source.product = ctx._source.remove(\"product_name\");ctx._source.product = doc[\"product\"].toUpperCase()"
You can create an ingest pipeline before you trigger the _reindex API. There are processors to rename a field and to convert a field to uppercase. You can then reference the pipeline in your reindex call:
POST _reindex
{
"source": {
"index": "source"
},
"dest": {
"index": "dest",
"pipeline": "<id_of_your_pipeline>"
}
}
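A minimal sketch of such a pipeline, using the rename and uppercase processors with the field names from the question. The pipeline id rename-and-uppercase-product is only an illustrative name and would replace the placeholder id in the reindex call:

```
# Hypothetical pipeline id; field names taken from the question
PUT _ingest/pipeline/rename-and-uppercase-product
{
  "description": "Rename product_name to product and uppercase its value",
  "processors": [
    { "rename": { "field": "product_name", "target_field": "product" } },
    { "uppercase": { "field": "product" } }
  ]
}
```

The processors run in order, so the uppercase step sees the already renamed field. Documents that lack product_name would fail in the rename processor as written, so this assumes the field is present in every document.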
