ElasticSearch - Reindexing your data with zero downtime

ElasticSearch - Reindexing your data with zero downtime - elasticsearch

https://www.elastic.co/blog/changing-mapping-with-zero-downtime/
I try to create a new index and reindexing my data with zero downtime with this guide.
Now I have an index called "photoshooter" and I follow the steps
1) Create new index "photoshooter_v1" with the new mapping... (Done)
2) Create alias...
curl -XPOST localhost:9200/_aliases -d '
{
"actions": [
{ "add": {
"alias": "photoshooter",
"index": "photoshooter_v1"
}}
]
}
and I get this error...
{
"error": "InvalidAliasNameException[[photoshooter_v1] Invalid alias name [photoshooter], an index exists with the same name as the alias]",
"status": 400
}
I think I lose something with the logic..

Lets say your current index is named as "photoshooter " if i am guessing it right ok.
Now Create a Alias for this index first - OK
{
"actions": [
{ "add": {
"alias": "photoshooter_docs",
"index": "photoshooter"
}}
]
}
test it - curl -XGET 'localhost:9200/photoshooter_docs/_search'
Note - now you will use 'photoshooter_docs' as index name to interact with your index which is actually 'photoshooter' Ok.
Now we create a new index with your new mapping let's say we name it 'photoshooter_v2' now copy your 'photoshooter' index data to new index(photoshooter_v2)
Once you have copied all your data now simply
Remove the alias from previous index to new index -
curl -XPOST localhost:9200/_aliases -d '
{
"actions": [
{ "remove": {
"alias": "photoshooter_docs",
"index": "photoshooter"
}},
{ "add": {
"alias": "photoshooter_docs",
"index": "photoshooter_v2"
}}
]
}
test it again -> curl -XGET 'localhost:9200/photoshooter_docs/_search'
Congrats you have changed your mapping without zero downtime .
And to copy data you can use tools like this
https://github.com/mallocator/Elasticsearch-Exporter
Note - this tools also copies the mapping from old index to new index which you might don't want to do. So that you have read in its documentation or edit it according to your use .
Thanks
Hope this helps

It's it very simple, you cannot create an alias with a name of an index that already exists.
You'll need to consider a new name for the new index, re-index the data in the new one and then remove the old one to be able to give it the same name.
If you want to do that on daily basis, you might consider adding per say the date to your index's name and switch upon it every day.

Related

elasticsearch reindex doesnt use index name specified in script

I tried reindexing daily indices from remote cluster and following reindex-daily-indices example
POST _reindex
{
"source": {
"remote": {
"host": "http://remote_es:9200"
},
"index": "telemetry-*"
},
"dest": {
"index": "dummy"
},
"script": {
"lang": "painless",
"source": """
ctx._index = 'telemetry-' + (ctx._index.substring('telemetry-'.length(), ctx._index.length()));
"""
}
}
It looks like if the new ctx._index is exactly the same as the original ctx._index, it will use the dest.index instead. It reindex all the records into "dummy" index
Is this a bug or intended behaviour? I could not find any explanation to this behaviour.
Is there a way to reindex (multiple indices) from remote and still preserve the original name?

It's because according to your logic, the destination index name is the same as the source index name. In the documentation you linked at, they are appending '-1' at the end of the index name.
In your case, the following logic just sets the same destination index name as the source index name, and reindex doesn't allow that, so it's using the destination index name specified in dest.index
ctx._index = 'telemetry-' + (ctx._index.substring('telemetry-'.length(), ctx._index.length()));
Also worth noting that this case has been reported here and here.

How to create a duplicate index in ElasticSearch from existing index?

I have an existing index with mappings and data in ElasticSearch which I need to duplicate for testing new development. Is there anyway to create a temporary/duplicate index from the already existing one?
Coming from an SQL background, I am looking at something equivalent to
SELECT *
INTO TestIndex
FROM OriginalIndex
WHERE 1 = 0
I have tried the Clone API but can't get it to work.
I'm trying to clone using:
POST /originalindex/_clone/testindex
{
}
But this results in the following exception:
{
"error": {
"root_cause": [
{
"type": "invalid_type_name_exception",
"reason": "Document mapping type name can't start with '_', found: [_clone]"
}
],
"type": "invalid_type_name_exception",
"reason": "Document mapping type name can't start with '_', found: [_clone]"
},
"status": 400
}
I know someone would guide me quickly. Thanks in advance all you wonderful folks.

First you have to set the source index to be read-only
PUT /originalindex/_settings
{
"settings": {
"index.blocks.write": true
}
}
Then you can clone
POST /originalindex/_clone/testindex
If you need to copy documents to a new index, you can use the reindex api
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type:
application/json' -d'
{
"source": {
"index": "someindex"
},
"dest": {
"index": "someindex_copy"
}
}
'
(See: https://wrossmann.medium.com/clone-an-elasticsearch-index-b3e9b295d3e9)

Shortly after posting the question, I figured out a way.
First, get the properties of original index:
GET originalindex
Copy the properties and put to a new index:
PUT /testindex
{
"aliases": {...from the above GET request},
"mappings": {...from the above GET request},
"settings": {...from the above GET request}
}
Now I have a new index for testing.

Update configuration for actively used index without data loss

Sometimes, I need to update mappings, settings, or bind default pipelines to the actively used index.
For the time being, I am using a method with data loss as follows:
update the index template with proper mapping (or binding the default pipeline by index.default_pipeline);
create a_new_index (matching the template index_patterns);
reindex the index_to_fix to a_new_index to migrate the data already indexed;
use alias to redirect the coming indexing request to a_new_index (the alias will have the same name as index_to_fix to ensure the indexing is undisturbed) and delete the index_to_fix;
But between step 3 and step 4, there is a time gap, during which the newly indexed data are lost in the original index_to_fix.
Is there a way, to update configurations for actively used index without any data loss?

Thanks for the help of #LeBigCat, after some discussions. I think this problem could be solved in three steps.
Use Alias for CRUD
First thing first, try not to use index directly, use alias if possible; since you can't use an alias with the same name as the existed indices, directly you can't replace the index even if it's broken (badly designed). The easiest way is to use a template and include the index name directly in the alias.
PUT _template/test
{
...
"aliases" : {
"{index}-alias" : {}
}
}
Redirect the Indexing
Since the index_to_fix is being actively used, after updating the template and create a new index a_new_fix, we can use alias to redirect the indexing to a_new_fix.
POST /_aliases
{
"actions" : [
{ "add": { "index": "a_new_index", "alias": "index_to_fix-alias" } },
{ "remove": { "index": "index_to_fix", "alias": "index_to_fix-alias" } }
]
}
Migrating the Data
Simply use _reindex to migrate all the data from index_to_fix to a_new_index.
POST _reindex
{
"source": {
"index": "index_to_fix"
},
"dest": {
"index": "index_to_fix-alias"
}
}

Creating a mapping for an existing index with a new type in elasticsearch

I found an article on elasticsearch's site describing how to 'reindex without downtime', but that's not really acceptable every time a new element is introduced that needs to have a custom mapping (http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/)
Does anyone know why I can't create a mapping for an existing index but a new type in elasticsearch? The type doesn't exist yet, so why not? Maybe I'm missing something and it IS possible? If so, how can that be achieved?
Thanks,
Vladimir

Here is a simple example to create two type mapping in a index, (one after another)
I've used i1 as index and t1 and t2 as types,
Create index
curl -XPUT "http://localhost:9200/i1"
Create type 1
curl -XPUT "http://localhost:9200/i1/t1/_mapping" -d
{
"t1": {
"properties": {
"field1": {
"type": "string"
},
"field2": {
"type": "string"
}
}
}
}'
Create type 2
curl -XPUT "localhost:9200/i1/t2/_mapping" -d'
{
"t2": {
"properties": {
"field3": {
"type": "string"
},
"field4": {
"type": "string"
}
}
}
}'
Now Looking at mapping( curl -XGET "http://localhost:9200/i1/_mapping" ), It seems like it is working.
Hope this helps!! Thanks

If you're using Elasticsearch 6.0 or above, an index can have only one type.
So you have to create an index for your second type or create a custom type that would contain the two types.
For more details : Removal of multiple types in index

How to copy some ElasticSearch data to a new index

Let's say I have movie data in my ElasticSearch and I created them like this:
curl -XPUT "http://192.168.0.2:9200/movies/movie/1" -d'
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972
}'
And I have a bunch of movies from different years. I want to copy all the movies from a particular year (so, 1972) and copy them to a new index of "70sMovies", but I couldn't see how to do that.

Since ElasticSearch 2.3 you can now use the built in _reindex API
for example:
POST /_reindex
{
"source": {
"index": "twitter"
},
"dest": {
"index": "new_twitter"
}
}
Or only a specific part by adding a filter/query
POST /_reindex
{
"source": {
"index": "twitter",
"query": {
"term": {
"user": "kimchy"
}
}
},
"dest": {
"index": "new_twitter"
}
}
Read more: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html

The best approach would be to use elasticsearch-dump tool https://github.com/taskrabbit/elasticsearch-dump.
The real world example I used :
elasticdump \
--input=http://localhost:9700/.kibana \
--output=http://localhost:9700/.kibana_read_only \
--type=mapping
elasticdump \
--input=http://localhost:9700/.kibana \
--output=http://localhost:9700/.kibana_read_only \
--type=data

Check out knapsack:
https://github.com/jprante/elasticsearch-knapsack
Once you have the plugin installed and working, you could export part of your index via query. For example:
curl -XPOST 'localhost:9200/test/test/_export' -d '{
"query" : {
"match" : {
"myfield" : "myvalue"
}
},
"fields" : [ "_parent", "_source" ]
}'
This will create a tarball with only your query results, which you can then import into another index.

To reindex specific type from source index to destination index type syntax is
POST _reindex/
{
"source": {
"index": "source_index",
"type": "source_type",
"query": {
// add filter criteria
}
},
"dest": {
"index": "dest_index",
"type": "dest_type"
}
}

If the intent were to copy some portion of the data or the entire data to an index with the same settings/mappings as that of the original index one could use the clone api to achieve the same. Something like below:
POST /<index>/_clone/<target-index>
OR
PUT /<index>/_clone/<target-index>
However if the intent is to copy the data to a new index with the different settings/mappings than the original index one could use the reindex api to achieve the same. Something like below:
POST _reindex/
{
"source": {
"index": "source_index",
"type": "source_type",
"query": {
// add filter criteria
}
},
"dest": {
"index": "dest_index",
"type": "dest_type"
}
}
*Note: In case of reindex api the target index has to be created prior to actual api call.
For further reading on difference between clone and reindex refer What's the difference between cloning and reindexing an index in Elasticsearch?

You can do it easily with elasticsearch-dump (https://github.com/taskrabbit/elasticsearch-dump) in three steps. In the following example I copy the index "thor" to "thor2"
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=analyzer
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=mapping
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=data

Well the straightforward way to do this is to write code, with the API of your choice, querying for "year": 1972 and then indexing that data into a new index. You would use the Search api or the Scan and Scroll API to get all the documents and then either index them one by one or use the Bulk Api:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
Assuming you don't want to do this via code but are looking for a direct way of doing this, I suggest the Elasticsearch Snapshot and Restore. Basically you would take a snapshot of your existing index, restore it into a new index and then use the Delete command to delete all documents with a year other than 1972.
Snapshot And Restore
The snapshot and restore module allows to create snapshots of
individual indices or an entire cluster into a remote repository. At
the time of the initial release only shared file system repository was
supported, but now a range of backends are available via officially
supported repository plugins.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html
Delete By Query API
The delete by query API allows to delete documents from one or more
indices and one or more types based on a query. The query can either
be provided using a simple query string as a parameter, or using the
Query DSL defined within the request body.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

Since v7.4 the _clone api was introduced and can easily satisfy your need: (read for the relevant prerequisites and monitoring involved)
POST /<index>/_clone/<target-index>
Or:
PUT /<index>/_clone/<target-index>

You can use elasticdump --searchBody:
# Copy documents from movies to 70sMovies (filtering using query)
elasticdump \
--input=http://localhost:9200/movies \
--output=http://localhost:9200/70sMovies \
--type=data \
--searchBody="{\"query\":{\"term\":{\"username\": \"admin\"}}}" # <--- Your query here
more on elasticdump options here.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

ElasticSearch - Reindexing your data with zero downtime - elasticsearch

Related

elasticsearch reindex doesnt use index name specified in script

How to create a duplicate index in ElasticSearch from existing index?

Update configuration for actively used index without data loss

Creating a mapping for an existing index with a new type in elasticsearch

How to copy some ElasticSearch data to a new index

Categories

Resources