Re-Index Elasticsearch, ignore fields not in mapping

I'm testing the reindex API in Elasticsearch and running into issues where existing data contains fields that are not present in the new index's strict mapping. Is there a way to tell Elasticsearch to simply ignore those fields and carry on?
Edit: To clarify, by "ignore" I mean that those fields should not be included during the reindex process.

If you can change the index mapping before running the reindex, you can just do:
PUT test/_mapping
{
"dynamic": "false"
}
then change it back to strict once reindexing is done.
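As a rough sketch, the sequence above (relax the mapping, reindex, restore strict) can be written out as three request descriptions. The helper name and the "src"/"dst" index names are placeholders for illustration, and nothing here talks to a cluster; it only builds the request bodies. Keep in mind that with "dynamic": "false" the unmapped fields are not indexed but still remain in _source.

```python
import json


def relaxed_reindex_steps(src, dst):
    """Return the three requests, in order, as (method, path, body) tuples."""
    reindex_body = {"source": {"index": src}, "dest": {"index": dst}}
    return [
        # 1. Stop rejecting unmapped fields on the destination index.
        ("PUT", f"/{dst}/_mapping", {"dynamic": "false"}),
        # 2. Copy the documents.
        ("POST", "/_reindex", reindex_body),
        # 3. Restore the strict behaviour once reindexing is done.
        ("PUT", f"/{dst}/_mapping", {"dynamic": "strict"}),
    ]


# "src" and "dst" are placeholder index names.
for method, path, body in relaxed_reindex_steps("src", "dst"):
    print(method, path, json.dumps(body))
```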
UPDATE based on your comment: to leave those fields out of the reindexed documents entirely, remove them in a script:
POST _reindex
{
"source": {
"index": "src"
},
"dest": {
"index": "dst"
},
"script": {
"lang": "painless",
"source": """
ctx['_source'].remove('email');
ctx['_source'].remove('username');
ctx['_source'].remove('name');
// removing from nested:
for(item in ctx['_source'].Groups){
item.remove('GroupName');
item.remove('IsActive');
}
"""
}
}

While reindexing you can include or exclude source fields according to your destination index mapping.
To exclude some specific fields while reindexing:
POST _reindex
{
"source": {
"index": "source-index",
"_source": {
"excludes": ["exclude_a", "exclude_b"]
}
},
"dest": {
"index": "dest-index"
}
}
To include only specific fields while reindexing:
POST _reindex
{
"source": {
"index": "source-index",
"_source": ["include_a", "include_b"]
},
"dest": {
"index": "dest-index"
}
}

Related

How do I convert to uppercase and delete a particular field while using reindex?

I am trying to migrate from ES 1.4 to ES 5.5. In one of the indices, I need to change the name of a field and also convert its value to uppercase. I am able to reindex with the field renamed and the unwanted field removed, but I need help converting the value to uppercase.
This is what I tried:
POST _reindex?wait_for_completion=false
{
"source": {
"remote": {
"host": "http://source_ip:17002"
},
"index": "log_event_2017-08-11",
"size": 1000,
"query": {
"match_all": {}
}
},
"dest": {
"index": "logs-ics-2017-08-11"
},
"script": {
"inline": "ctx._source.product = ctx._source.remove(\"product_name\")",
"lang": "painless"
}
}
The above POST request is able to remove "product_name" and create "product" with its value. To uppercase the value of "product", I tried the inline script below, but it gives a null_pointer_exception.
I am new to Elasticsearch scripting. Please help.
"ctx._source.product = ctx._source.remove(\"product_name\");ctx._source.product = doc[\"product\"].toUpperCase()"
You can create an ingest pipeline before you trigger the _reindex API. There are processors to rename a field and to convert a field to uppercase. You can then reference the pipeline in your reindex call:
POST _reindex
{
"source": {
"index": "source"
},
"dest": {
"index": "dest",
"pipeline": "<id_of_your_pipeline>"
}
}
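The answer does not show the pipeline itself, so here is a minimal sketch of what it might look like, built as Python dictionaries. The pipeline id "rename_and_uppercase" and the helper names are made up for illustration; the field names come from the question. The rename and uppercase processors are standard ingest processors, but treat the exact bodies as a starting point to check against your version.

```python
import json


def rename_and_uppercase_pipeline(old_field, new_field):
    """Body for PUT _ingest/pipeline/<id>: rename a field, then uppercase it."""
    return {
        "description": "Rename a field, then uppercase its value",
        "processors": [
            {"rename": {"field": old_field, "target_field": new_field}},
            {"uppercase": {"field": new_field}},
        ],
    }


def reindex_with_pipeline(src, dst, pipeline_id):
    """Body for POST _reindex that routes documents through the pipeline."""
    return {
        "source": {"index": src},
        "dest": {"index": dst, "pipeline": pipeline_id},
    }


pipeline_id = "rename_and_uppercase"  # hypothetical id
print(f"PUT _ingest/pipeline/{pipeline_id}")
print(json.dumps(rename_and_uppercase_pipeline("product_name", "product"), indent=2))
print("POST _reindex")
print(json.dumps(reindex_with_pipeline("source", "dest", pipeline_id), indent=2))
```

Processors run in the order listed, so the uppercase step refers to the new field name. When reindexing from a remote cluster, the "remote" block from the question would still go inside "source".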

ElasticSearch painless script for reindexing

We are trying to use the following Painless script to reindex our data in Elasticsearch.
POST _reindex
{
"source": {
"index": "metricbeat-*"
},
"dest": {
"index": "metricbeat"
},
"script": {
"lang": "painless",
"inline": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
}
}
Taken from the following URL:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#_reindex_daily_indices
This script works perfectly and creates a copy of each of our indices.
For example, if the original index is
metricbeat-2016.05.30
then after running this script there is a metricbeat-2016.05.30-1, which is an exact copy of the original index (metricbeat-2016.05.30).
Now I want to do the following two things:
1] Delete the original index, i.e. metricbeat-2016.05.30
2] Rename the reindexed copy (metricbeat-2016.05.30-1) back to the original name, metricbeat-2016.05.30.
How can we do this?
Can we modify the Painless script above?
Thanks in advance!
The way I did it was to reindex like in the example from Elasticsearch reference, but instead of appending a "-1" I prepended the index with "temp-":
POST _reindex
{
"source": {
"index": "metricbeat-*"
},
"dest": {
"index": "metricbeat"
},
"script": {
"lang": "painless",
"source": "ctx._index = 'temp-' + ctx._index"
}
}
This makes it easier to delete the original indices with the pattern "metricbeat-*":
DELETE metricbeat-*
I then reindexed again to get the original name:
POST _reindex
{
"source": {
"index": "temp-metricbeat-*"
},
"dest": {
"index": "metricbeat"
},
"script": {
"lang": "painless",
"source": "ctx._index = ctx._index.substring(5)"
}
}
As a side note, the example in the Elasticsearch reference is unnecessarily complex:
ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'
You get the same result with the code:
ctx._index = ctx._index + '-1'
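To sanity-check the two claims above, the string manipulation can be mirrored in Python. This only imitates what the Painless expressions do to the index name; the function names are invented for the illustration, and the equivalence relies on every index name starting with "metricbeat-", which the source pattern metricbeat-* guarantees.

```python
PREFIX = "metricbeat-"


def reference_rename(index):
    # Mirrors: 'metricbeat-' + ctx._index.substring('metricbeat-'.length(),
    #          ctx._index.length()) + '-1'
    return PREFIX + index[len(PREFIX):len(index)] + "-1"


def simple_rename(index):
    # Mirrors: ctx._index + '-1'
    return index + "-1"


def temp_round_trip(index):
    # First reindex prepends 'temp-'; the second strips it with substring(5).
    temp_name = "temp-" + index
    return temp_name[5:]


for name in ["metricbeat-2016.05.30", "metricbeat-2016.05.31"]:
    assert reference_rename(name) == simple_rename(name)
    assert temp_round_trip(name) == name
```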
You cannot rename an index. You could, however, use aliases after you have deleted the original index.
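A minimal sketch of the alias approach, assuming the copy is named as in the question: once the original index has been deleted, its name is free to be used as an alias that points at the copy. The helper name is hypothetical; the body is intended for POST _aliases.

```python
import json


def alias_actions(index, alias):
    """Body for POST _aliases that makes `alias` point at `index`."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


# The alias name must not clash with an existing index, so the original
# index (metricbeat-2016.05.30) has to be deleted before this is sent.
body = alias_actions("metricbeat-2016.05.30-1", "metricbeat-2016.05.30")
print("POST _aliases")
print(json.dumps(body, indent=2))
```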

ElasticSearch: reindex and aliases (keep routing policy)

I'm using one alias per user. Each alias has a routing value and a filter:
PUT /<index>/_alias/u1#u1.com
{
"routing": "u1#u1.com",
"filter": {
"term": {
"user": "u1#u1.com"
}
}
}
So both indexing and searching use the routing information.
I want to reindex all documents into another index using the _reindex API. After creating the new index, I created all the aliases. As I understand it, documents have to be reindexed through the aliases in order to keep the routing policy.
Is there any way to set this up with _reindex?
Example:
POST _reindex
{
"source": {
"index": "old"
},
"dest": {
"index": "new"
}
}
Any ideas?
Yes, you can:
POST _reindex
{
"source": {
"index": "old",
"query": {
"term": {
"user": "u1#u1.com"
}
}
},
"dest": {
"index": "new",
"routing": "=u1#u1.com"
}
}
According to this documentation:
By default if _reindex sees a document with routing then the routing is preserved unless it’s changed by the script.
So, as far as I've been able to figure out, routing information is preserved by default for each document that already has it.

Elasticsearch reindex api deleting document after copy

I've gone through the _reindex api documentation a few times, and can't figure out if it's possible or not. Once the document is copied from the source index to the destination index, is it possible to also remove the source document?
Here is the current _reindex api call body that I'm invoking:
{
"source": {
"index": "srcindex",
"type": "type",
"query": {
"range": {
"date": {
"from": <timestamp>
}
}
}
},
"dest": {
"index": "dstindex",
"type": "type"
}
}
Currently this is not supported, i.e. copying a document and then immediately deleting it (effectively moving it).
There is a good discussion of this topic here.
In the end, you need to run _reindex and then _delete_by_query to achieve your goal.
Hope this helps!
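As a sketch of that two-step "move", the same query can be reused for both calls so that the delete targets the documents that were copied. The helper name and the example date are assumptions for illustration, and the code only builds request descriptions without sending anything.

```python
import json


def move_requests(src, dst, query):
    """Return (reindex, delete) requests as (method, path, body) tuples."""
    reindex = (
        "POST",
        "/_reindex",
        {"source": {"index": src, "query": query}, "dest": {"index": dst}},
    )
    # Run this only after the reindex has finished without failures.
    delete = ("POST", f"/{src}/_delete_by_query", {"query": query})
    return reindex, delete


# Hypothetical cut-off date standing in for the <timestamp> in the question.
query = {"range": {"date": {"gte": "2017-01-01"}}}
for method, path, body in move_requests("srcindex", "dstindex", query):
    print(method, path, json.dumps(body))
```

One caveat with this design: documents that match the query and are indexed into the source between the two calls would be deleted without having been copied, so it is safest to run it on data that is no longer being written.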

Elasticsearch Mapping - Rename existing field

Is there any way I can rename an element in an existing Elasticsearch mapping without having to add a new element?
If so, what's the best way to do it in order to avoid breaking the existing mapping?
E.g. from fieldCamelcase to fieldCamelCase:
{
"myType": {
"properties": {
"timestamp": {
"type": "date",
"format": "date_optional_time"
},
"fieldCamelcase": {
"type": "string",
"index": "not_analyzed"
},
"field_test": {
"type": "double"
}
}
}
}
You could do this by creating an ingest pipeline that contains a rename processor and using it in combination with the Reindex API.
PUT _ingest/pipeline/my_rename_pipeline
{
"description" : "describe pipeline",
"processors" : [
{
"rename": {
"field": "fieldCamelcase",
"target_field": "fieldCamelCase"
}
}
]
}
POST _reindex
{
"source": {
"index": "source"
},
"dest": {
"index": "dest",
"pipeline": "my_rename_pipeline"
}
}
Note that you need to be running Elasticsearch 5.x or later in order to use ingest. If you're running < 5.x then you'll have to go with what @Val mentioned in his comment :)
To rename a field in ES 5 and later (where the missing query has been removed), use the _update_by_query API.
Example:
POST http://localhost:9200/INDEX_NAME/_update_by_query
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "NEW_FIELD_NAME"
}
}
}
},
"script" : {
"inline": "ctx._source.NEW_FIELD_NAME = ctx._source.OLD_FIELD_NAME; ctx._source.remove(\"OLD_FIELD_NAME\");"
}
}
First of all, you need to understand how Elasticsearch and Lucene store data: in immutable segments (this is easy to read up on online).
So any solution will delete and re-create documents, and will either change the mapping or create a new index with a new mapping.
The easiest way is to use the update by query API: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-update-by-query.html
POST /XXXX/_update_by_query
{
"query": {
"missing": {
"field": "fieldCamelCase"
}
},
"script" : {
"inline": "ctx._source.fieldCamelCase = ctx._source.fieldCamelcase; ctx._source.remove(\"fieldCamelcase\");"
}
}
Starting with ES 6.4 you can use "Field Aliases", which allow the functionality you're looking for with close to 0 work or resources.
Do note that aliases can only be used for searching - not for indexing new documents.
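For illustration, a field alias is declared in the mapping as a field of type "alias" whose "path" points at the existing field. The sketch below only builds the mapping body; the helper name is made up, and the endpoint it is sent to depends on your version (in 6.x the mapping path also includes the type name).

```python
import json


def field_alias_mapping(alias_name, target_field):
    """Mapping body that exposes `target_field` under the name `alias_name`."""
    return {
        "properties": {
            alias_name: {"type": "alias", "path": target_field},
        }
    }


# Searches on fieldCamelCase are resolved against the existing fieldCamelcase.
print(json.dumps(field_alias_mapping("fieldCamelCase", "fieldCamelcase"), indent=2))
```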
