Elasticsearch: reindex and aliases (keep routing policy)

I'm using one alias per user. For each alias I set a routing value and a filter:
PUT /<index>/_alias/u1#u1.com
{
"routing": "u1#u1.com",
"filter": {
"term": {
"user": "u1#u1.com"
}
}
}
This way, both indexing and searching through the alias use the routing value.
I want to reindex all documents into another index using the _reindex API. After creating the new index, I created all the aliases on it. As far as I can tell, the documents have to be reindexed through the aliases in order to keep the routing policy.
Is there any way to set this up with _reindex?
Example:
POST /_reindex
{
"source": {
"index": "old"
},
"dest": {
"index": "new"
}
}
Any ideas?

Yes, you can:
POST _reindex
{
"source": {
"index": "old",
"query": {
"term": {
"user": "u1#u1.com"
}
}
},
"dest": {
"index": "new",
"routing": "=u1#u1.com"
}
}

According to this documentation:
By default if _reindex sees a document with routing then the routing is preserved unless it’s changed by the script.
So, as far as I've been able to figure out, any document that already has routing information keeps it by default.
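If the source documents were indexed without routing, there is nothing for _reindex to preserve, and the explicit per-user form from the first answer is needed. A minimal sketch in Python of generating one such body per user, assuming the index names old and new from the question; the helper name build_reindex_body and the user list are hypothetical, and the code only builds the request bodies without sending them:

```python
import json


def build_reindex_body(user, source_index="old", dest_index="new"):
    """Build a _reindex body that copies one user's documents and sets
    their routing to the user's value (the "=<value>" form of dest.routing)."""
    return {
        "source": {
            "index": source_index,
            # Same term filter as the one attached to the user's alias.
            "query": {"term": {"user": user}},
        },
        "dest": {
            "index": dest_index,
            "routing": "=" + user,
        },
    }


if __name__ == "__main__":
    # Hypothetical user list; in practice it would come from the existing aliases.
    for user in ["u1#u1.com", "u2#u2.com"]:
        print(json.dumps(build_reindex_body(user), indent=2))
```

Each printed body can be sent as its own POST _reindex request, one per user.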

Related

Re-Index Elasticsearch, ignore fields not in mapping

Trying to test out re-index API in elasticsearch and running into issues where existing data contains fields not present in the new index's strict mapping. Is there a way to tell elasticsearch to simply ignore those fields and carry on?
Edit: To clarify, by ignore I meant not to include those fields during the re-index process.

If you have access to the index settings before running the reindex, you can just do:
PUT test/_mapping
{
"dynamic": "false"
}
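Then change it back to strict once reindexing is done. Note that with "dynamic": "false" the unmapped fields are not indexed, but they are still kept in _source.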
UPDATE based on your comment
POST _reindex
{
"source": {
"index": "src"
},
"dest": {
"index": "dst"
},
"script": {
"lang": "painless",
"source": """
ctx['_source'].remove('email');
ctx['_source'].remove('username');
ctx['_source'].remove('name');
// removing from nested:
for(item in ctx['_source'].Groups){
item.remove('GroupName');
item.remove('IsActive');
}
"""
}
}

While reindexing, you can include or exclude source fields according to your destination index mapping.
To exclude some specific fields while reindexing:
POST _reindex
{
"source": {
"index": "source-index",
"_source": {
"excludes": ["exclude_a", "exclude_b"]
}
},
"dest": {
"index": "dest-index"
}
}
To include any specific field while reindexing:
POST _reindex
{
"source": {
"index": "source-index",
"_source": ["include_a", "include_b"]
},
"dest": {
"index": "dest-index"
}
}

How do I convert to uppercase and delete a particular field while using reindex?

I am trying to migrate from ES 1.4 to ES 5.5. In one of the indices, I need to rename a field and also convert its value to uppercase. I am able to reindex with the field renamed and the unwanted field removed, but I need help converting the value to uppercase.
This is what I tried
POST _reindex?wait_for_completion=false
{
"source": {
"remote": {
"host": "http://source_ip:17002"
},
"index": "log_event_2017-08-11",
"size": 1000,
"query": {
"match_all": {}
}
},
"dest": {
"index": "logs-ics-2017-08-11"
},
"script": {
"inline": "ctx._source.product = ctx._source.remove(\"product_name\")",
"lang": "painless"
}
}
The above POST request removes "product_name" and creates "product" with its value. To uppercase the value of "product" I tried the inline script below, but it gives a null_pointer_exception.
I am new to Elasticsearch scripting. Please help.
"ctx._source.product = ctx._source.remove(\"product_name\");ctx._source.product = doc[\"product\"].toUpperCase()"

You can create an ingest pipeline before you trigger the _reindex API. There are processors to rename a field and to convert a field to uppercase. You can then reference the pipeline in your reindex call:
POST _reindex
{
"source": {
"index": "source"
},
"dest": {
"index": "dest",
"pipeline": "<id_of_your_pipeline>"
}
}
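The pipeline itself is not shown in the answer. A minimal sketch of what its definition might look like, built as a Python dict, assuming the field names product_name and product from the question; the helper name build_pipeline is hypothetical, while rename and uppercase are the standard ingest processors the answer refers to:

```python
import json


def build_pipeline(old_field="product_name", new_field="product"):
    """Body for PUT _ingest/pipeline/<id_of_your_pipeline>.

    Processors run in order: the field is renamed first, then the
    renamed field's value is converted to uppercase.
    """
    return {
        "description": "rename a field and uppercase its value",
        "processors": [
            {"rename": {"field": old_field, "target_field": new_field}},
            {"uppercase": {"field": new_field}},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
```

The order matters: putting uppercase before rename would target a field that does not exist yet.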

Elasticsearch reindex api deleting document after copy

I've gone through the _reindex api documentation a few times, and can't figure out if it's possible or not. Once the document is copied from the source index to the destination index, is it possible to also remove the source document?
Here is the current _reindex api call body that I'm invoking:
{
"source": {
"index": "srcindex",
"type": "type",
"query": {
"range": {
"date": {
"from": <timestamp>
}
}
}
},
"dest": {
"index": "dstindex",
"type": "type"
}
}

Currently, this is not supported, i.e. copying a document and then deleting it immediately (effectively moving it).
There is a good discussion of this topic here.
In the end, you need to run _reindex and then _delete_by_query to achieve your goal.
Hope this helps!
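The two-step approach can be sketched in Python as a pair of requests that share one query, so the delete targets the same documents the reindex copied. This is only an illustration: the helper name build_move_requests and the timestamp value are hypothetical, and nothing is sent to a cluster.

```python
import json


def build_move_requests(source_index, dest_index, query):
    """Return (method, path, body) tuples: first _reindex, then _delete_by_query.

    Both requests reuse the same query. Documents that start matching the
    query between the two calls would be deleted without having been copied,
    so this assumes the source index is not written to in the meantime.
    """
    reindex = (
        "POST",
        "/_reindex",
        {
            "source": {"index": source_index, "query": query},
            "dest": {"index": dest_index},
        },
    )
    delete = ("POST", "/" + source_index + "/_delete_by_query", {"query": query})
    return [reindex, delete]


if __name__ == "__main__":
    # Hypothetical cutoff timestamp standing in for <timestamp> in the question.
    query = {"range": {"date": {"gte": 1500000000000}}}
    for method, path, body in build_move_requests("srcindex", "dstindex", query):
        print(method, path)
        print(json.dumps(body, indent=2))
```

It is probably safest to run the delete only after checking the _reindex response for failures.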

Elasticsearch Mapping - Rename existing field

Is there any way I can rename an element in an existing Elasticsearch mapping without having to add a new element?
If so, what's the best way to do it without breaking the existing mapping?
e.g. from fieldCamelcase to fieldCamelCase
{
"myType": {
"properties": {
"timestamp": {
"type": "date",
"format": "date_optional_time"
},
"fieldCamelcase": {
"type": "string",
"index": "not_analyzed"
},
"field_test": {
"type": "double"
}
}
}
}

You could do this by creating an ingest pipeline that contains a Rename Processor and using it in combination with the Reindex API.
PUT _ingest/pipeline/my_rename_pipeline
{
"description" : "describe pipeline",
"processors" : [
{
"rename": {
"field": "fieldCamelcase",
"target_field": "fieldCamelCase"
}
}
]
}
POST _reindex
{
"source": {
"index": "source"
},
"dest": {
"index": "dest",
"pipeline": "my_rename_pipeline"
}
}
Note that you need to be running Elasticsearch 5.x or later in order to use ingest. If you're running < 5.x then you'll have to go with what @Val mentioned in his comment :)

To rename a field in ES 5.x and later (where the missing query has been removed), use the _update_by_query API with a must_not/exists query:
Example:
POST http://localhost:9200/INDEX_NAME/_update_by_query
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "NEW_FIELD_NAME"
}
}
}
},
"script" : {
"inline": "ctx._source.NEW_FIELD_NAME = ctx._source.OLD_FIELD_NAME; ctx._source.remove(\"OLD_FIELD_NAME\");"
}
}

First of all, you need to understand that Elasticsearch and Lucene store data in immutable segments (this is easy to read up on online).
So any solution will delete and re-create documents, and will either change the existing mapping or create a new index with a new mapping.
The easiest way is to use the update by query API: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-update-by-query.html
POST /XXXX/_update_by_query
{
"query": {
"missing": {
"field": "fieldCamelCase"
}
},
"script" : {
"inline": "ctx._source.fieldCamelCase = ctx._source.fieldCamelcase; ctx._source.remove(\"fieldCamelcase\");"
}
}
Starting with ES 6.4 you can use "Field Aliases", which allow the functionality you're looking for with close to 0 work or resources.
Do note that aliases can only be used for searching - not for indexing new documents.
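A minimal sketch of what such a field alias mapping might look like, built as a Python dict and using the field names from the question. The helper name build_alias_mapping is hypothetical, and the exact PUT _mapping path is an assumption that depends on the version (6.x mappings still include a type name in the URL):

```python
import json


def build_alias_mapping(alias_name, target_field):
    """Mapping body that adds alias_name as a field alias pointing at target_field."""
    return {
        "properties": {
            alias_name: {
                "type": "alias",
                "path": target_field,
            }
        }
    }


if __name__ == "__main__":
    # Searches on fieldCamelCase are resolved against the existing fieldCamelcase field.
    print(json.dumps(build_alias_mapping("fieldCamelCase", "fieldCamelcase"), indent=2))
```

The stored documents are untouched: _source still contains fieldCamelcase, and only queries see the new name.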

Alias on "_all" index not updated when new indices created

I have a filtered alias in elasticsearch that I've created using "_all" as the index it is bound to. Like so:
curl -XPOST "localhost:9200/_aliases" -d'
{
"actions": [
{
"add": {
"index": "_all",
"alias": "logs",
"filter": { "type": { "value": "log" } }
}
}
]
}'
I created this alias because the logs are being placed in different indices (by month actually), and I need to see the aggregate. The problem I'm having is that whenever a new index is created, this alias is not updated. The alias seems to only reference the indices that existed when the alias was created.
Is there a way to have the alias update when new indices are added? Or is there a better approach altogether to achieve what I'm trying to do here?

What you actually need is an index template; more about it here.
And here's an example, for your specific case:
PUT /_template/logs_template
{
"template": "*",
"aliases": {
"logs": {
"filter": {
"type": {
"value": "log"
}
}
}
}
}
The above basically says that for each new index, whatever its name ("*"), associate the "logs" alias with it.
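On Elasticsearch 6.0 and later, the same template API takes "index_patterns" (a list) in place of the "template" key used above. A minimal sketch of the equivalent body as a Python dict; the helper name build_logs_template is hypothetical, and the term filter on a field called kind is only a placeholder for whatever filter fits your documents:

```python
import json


def build_logs_template(index_pattern="*", alias="logs", alias_filter=None):
    """Body for PUT /_template/logs_template using index_patterns (6.0+)."""
    alias_body = {}
    if alias_filter is not None:
        # Optional filter, making this a filtered alias as in the question.
        alias_body["filter"] = alias_filter
    return {
        "index_patterns": [index_pattern],
        "aliases": {alias: alias_body},
    }


if __name__ == "__main__":
    # "kind" is a hypothetical field name used only for illustration.
    body = build_logs_template(alias_filter={"term": {"kind": "log"}})
    print(json.dumps(body, indent=2))
```

As with the original template, this only affects indices created after the template is in place; existing indices still need the alias added explicitly.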

Resources