How do I convert to uppercase and delete a particular field while using reindex? - elasticsearch

I am trying to migrate from ES 1.4 to ES 5.5. In one of the indices, I need to rename a field and also convert its value to uppercase. I can reindex with the field renamed and the unwanted field removed, but I need help converting the value to uppercase.
This is what I tried:
POST _reindex?wait_for_completion=false
{
"source": {
"remote": {
"host": "http://source_ip:17002"
},
"index": "log_event_2017-08-11",
"size": 1000,
"query": {
"match_all": {}
}
},
"dest": {
"index": "logs-ics-2017-08-11"
},
"script": {
"inline": "ctx._source.product = ctx._source.remove(\"product_name\")",
"lang": "painless"
}
}
The above POST request removes "product_name" and creates "product" with its value. To uppercase the value of "product", I tried the inline script below, but it fails with a null_pointer_exception.
I am new to Elasticsearch scripting. Please help.
"ctx._source.product = ctx._source.remove(\"product_name\");ctx._source.product = doc[\"product\"].toUpperCase()"

You can create an ingest pipeline before you trigger the _reindex API. There are processors to rename a field and to convert a field to uppercase. You can then reference the pipeline in your reindex call:
POST _reindex
{
"source": {
"index": "source"
},
"dest": {
"index": "dest",
"pipeline": "<id_of_your_pipeline>"
}
}
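For reference, such a pipeline could look roughly like the sketch below. The pipeline id rename_and_uppercase is a made-up name; the field names come from the question. As far as I know, the rename processor fails on documents that have no product_name field, so this assumes every source document carries it.

```json
# Hypothetical pipeline id; renames product_name to product, then uppercases it
PUT _ingest/pipeline/rename_and_uppercase
{
  "description": "rename product_name to product and uppercase its value",
  "processors": [
    { "rename": { "field": "product_name", "target_field": "product" } },
    { "uppercase": { "field": "product" } }
  ]
}
```

If you would rather stay with a script, read the value from ctx._source instead of doc, since a reindex script works on the document source. A null-safe variant of the question's script, not tested against 5.5, might be:

```json
"script": {
  "inline": "def v = ctx._source.remove('product_name'); if (v != null) { ctx._source.product = v.toUpperCase(); }",
  "lang": "painless"
}
```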

Related

Re-Index Elasticsearch, ignore fields not in mapping

Trying to test out re-index API in elasticsearch and running into issues where existing data contains fields not present in the new index's strict mapping. Is there a way to tell elasticsearch to simply ignore those fields and carry on?
Edit: To clarify, by ignore I meant not to include those fields during the re-index process.
If you have access to the index settings before running reindex you can just do:
PUT test/_mapping
{
"dynamic": "false"
}
Then change it back to strict once reindexing is done.
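Switching back afterwards would be the same call with the original value, assuming the mapping was strict before:

```json
# Restore the strict mapping once the reindex has finished
PUT test/_mapping
{
  "dynamic": "strict"
}
```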
UPDATE based on your comment
POST _reindex
{
"source": {
"index": "src"
},
"dest": {
"index": "dst"
},
"script": {
"lang": "painless",
"source": """
  ctx['_source'].remove('email');
  ctx['_source'].remove('username');
  ctx['_source'].remove('name');
  // removing from nested objects (skipped when Groups is absent):
  if (ctx['_source'].Groups != null) {
    for (item in ctx['_source'].Groups) {
      item.remove('GroupName');
      item.remove('IsActive');
    }
  }
"""
}
}
While reindexing you can include or exclude source fields according to your destination index mapping.
To exclude some specific fields while reindexing:
POST _reindex
{
"source": {
"index": "source-index",
"_source": {
"excludes": ["exclude_a", "exclude_b"]
}
},
"dest": {
"index": "dest-index"
}
}
To include only specific fields while reindexing:
POST _reindex
{
"source": {
"index": "source-index",
"_source": ["include_a", "include_b"]
},
"dest": {
"index": "dest-index"
}
}

How to re-index multiple Elastic Search types into a new index with a single type?

I am upgrading from Elasticsearch 5.6 to 6.0 and I have standard logstash-* indexes. In those indexes I have multiple (doc) types, "attachmentsDbStats" and "attachmentsFileStats", which have the same schema. The only difference is the value of _type and type. I have created a new index attachments-* where the type is "attachement" and I want to reindex documents of both types into the new index. Because of the new single-type restriction in 6.0, both need to have the same type. I have updated all of the documents in my old index so that the "type" field has the value "attachment". When I run reindex, the documents are rejected because of the single-type restriction. I have attempted to update the _type field in the old indexes, but it is immutable. Any ideas on how to reindex and convert the type along the way?
Something like this should get you started:
POST _reindex
{
"source": {
"index": "logstash-*"
},
"dest": {
"index": "attachments-*"
},
"script": {
"source": """
ctx._id = ctx._type + "-" + ctx._id;
ctx._source.type = ctx._type;
ctx._type = "attachement";
"""
}
}
This combines _type and _id into the new _id so that it is definitely unique, copies the old _type into a custom type field, and sets _type to "attachement" (note that the convention used by Elastic is "doc" as the default type, but you can pick whatever you want as long as there is a single type).
Thank you for the tip. After fixing some of the spelling errors I had above, this did the trick:
POST _reindex
{
"source": {
"index": "logstash-*",
"query": {
"bool": {
"must": {
"term": {
"type": "attachments"
}
}
}
}
},
"dest": {
"index": "attachments.old"
},
"script": {
"source": "ctx._id = ctx._type + '-' + ctx._id; ctx._source.type = ctx._type; ctx._type = 'attachments';"
}
}

ElasticSearch painless script for reindexing

We are trying to use the following Painless script to reindex our data in Elasticsearch.
POST _reindex
{
"source": {
"index": "metricbeat-*"
},
"dest": {
"index": "metricbeat"
},
"script": {
"lang": "painless",
"inline": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
}
}
Taken from the following URL:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#_reindex_daily_indices
This script works perfectly and creates another copy of all our indices.
For example, if the original index is
metricbeat-2016.05.30
then after running this script there is also metricbeat-2016.05.30-1, which is an exact copy of the original index (metricbeat-2016.05.30).
Now I want to do the following two things:
1] Delete the original index, i.e. metricbeat-2016.05.30
2] Rename the reindexed copy (metricbeat-2016.05.30-1) back to the original name, metricbeat-2016.05.30.
How can we do this?
Can we modify the Painless script above?
Thanks in advance!
The way I did it was to reindex as in the example from the Elasticsearch reference, but instead of appending "-1" I prefixed the index name with "temp-":
POST _reindex
{
"source": {
"index": "metricbeat-*"
},
"dest": {
"index": "metricbeat"
},
"script": {
"lang": "painless",
"source": "ctx._index = 'temp-' + ctx._index"
}
}
This makes it easier to delete the original indices with the pattern "metricbeat-*":
DELETE metricbeat-*
I then reindexed again to get the original name:
POST _reindex
{
"source": {
"index": "temp-metricbeat-*"
},
"dest": {
"index": "metricbeat"
},
"script": {
"lang": "painless",
"source": "ctx._index = ctx._index.substring(5)"
}
}
As a side note, the example in the Elasticsearch reference is unnecessarily complex:
ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'
You get the same result with the code:
ctx._index = ctx._index + '-1'
You cannot rename an index. You could, however, use aliases after you have deleted the original index.
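A minimal sketch of the alias approach, using the index names from the question and assuming the original metricbeat-2016.05.30 index has already been deleted (an alias cannot share its name with an existing index):

```json
# Make the copy reachable under the original name
POST _aliases
{
  "actions": [
    { "add": { "index": "metricbeat-2016.05.30-1", "alias": "metricbeat-2016.05.30" } }
  ]
}
```

Requests sent to metricbeat-2016.05.30 are then served by the copy.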

ElasticSearch: reindex and aliases (keep routing policy)

I'm using alias per user. For each alias I'm linking a routing and a filter:
PUT /<index>/_alias/u1#u1.com
{
"routing": "u1#u1.com",
"filter": {
"term": {
"user": "u1#u1.com"
}
}
}
This way, both indexing and searching use the routing information.
I want to reindex all documents into another index using the _reindex API. After creating the new index, I created all the aliases. So I figure that the documents have to be reindexed through the aliases in order to keep the routing policy.
Is there any way to set this up with _reindex?
Example:
POST /_reindex
{
"source": {
"index": "old"
},
"dest": {
"index": "new"
}
}
Any ideas?
Yes, you can:
POST _reindex
{
"source": {
"index": "old",
"query": {
"term": {
"user": "u1#u1.com"
}
}
},
"dest": {
"index": "new",
"routing": "=u1#u1.com"
}
}
According to this documentation:
By default if _reindex sees a document with routing then the routing is preserved unless it’s changed by the script.
So, as far as I have been able to figure out, routing is preserved by default for every document that already has routing information.
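If some documents in the old index were indexed without routing and you want to handle all users in one request, a script can set the routing per document. This is only a sketch: it assumes every document has a user field holding the routing value, and it relies on ctx._routing, which the reindex documentation of the 5.x/6.x era lists among the metadata a script may change (before 5.6 the script key is "inline" rather than "source").

```json
# Assumes each document's user field is the desired routing value
POST _reindex
{
  "source": { "index": "old" },
  "dest": { "index": "new" },
  "script": {
    "lang": "painless",
    "source": "ctx._routing = ctx._source.user"
  }
}
```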

Elasticsearch Mapping - Rename existing field

Is there any way I can rename an element in an existing Elasticsearch mapping without having to add a new element?
If so, what's the best way to do it without breaking the existing mapping?
For example, from fieldCamelcase to fieldCamelCase:
{
"myType": {
"properties": {
"timestamp": {
"type": "date",
"format": "date_optional_time"
},
"fieldCamelcase": {
"type": "string",
"index": "not_analyzed"
},
"field_test": {
"type": "double"
}
}
}
}
You could do this by creating an ingest pipeline that contains a rename processor and using it in combination with the Reindex API.
PUT _ingest/pipeline/my_rename_pipeline
{
"description" : "describe pipeline",
"processors" : [
{
"rename": {
"field": "fieldCamelcase",
"target_field": "fieldCamelCase"
}
}
]
}
POST _reindex
{
"source": {
"index": "source"
},
"dest": {
"index": "dest",
"pipeline": "my_rename_pipeline"
}
}
Note that you need to be running Elasticsearch 5.x in order to use ingest. If you're running < 5.x then you'll have to go with what @Val mentioned in his comment :)
Updating a field name in ES (version > 5, where the missing query has been removed) using the _update_by_query API:
Example:
POST http://localhost:9200/INDEX_NAME/_update_by_query
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "NEW_FIELD_NAME"
}
}
}
},
"script" : {
"inline": "ctx._source.NEW_FIELD_NAME = ctx._source.OLD_FIELD_NAME; ctx._source.remove(\"OLD_FIELD_NAME\");"
}
}
First of all, you must understand how Elasticsearch and Lucene store data: in immutable segments (this is easy to read up on online).
So any solution will either delete and re-create documents and change the mapping, or create a new index and therefore a new mapping as well.
The easiest way is to use the update by query API: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-update-by-query.html
POST /XXXX/_update_by_query
{
"query": {
"missing": {
"field": "fieldCamelCase"
}
},
"script" : {
"inline": "ctx._source.fieldCamelCase = ctx._source.fieldCamelcase; ctx._source.remove(\"fieldCamelcase\");"
}
}
Starting with ES 6.4 you can use "Field Aliases", which allow the functionality you're looking for with close to 0 work or resources.
Do note that aliases can only be used for searching - not for indexing new documents.
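A field alias is declared in the mapping. A hedged sketch for the names in this question, assuming an index called my_index (a placeholder) and the typeless mapping endpoint of 7.x; on 6.4 the type name, here myType, would be appended to the URL:

```json
# my_index is a placeholder; fieldCamelCase becomes a search-time alias for fieldCamelcase
PUT my_index/_mapping
{
  "properties": {
    "fieldCamelCase": {
      "type": "alias",
      "path": "fieldCamelcase"
    }
  }
}
```

Queries can then use fieldCamelCase, while documents are still indexed with fieldCamelcase.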
