Can I use the snapshot and restore module of Elasticsearch to export one index, or all of them together, to a new Elasticsearch cluster?
For example:
Can I export all the indices from a development cluster to a QA cluster?
Check out https://github.com/taskrabbit/elasticsearch-dump; we made it for exactly this purpose.
Yes, you can use the snapshot & restore feature to back up indices from one cluster and move them to another cluster. Once you have created the snapshot on your development cluster, you can copy it over to your QA cluster and restore it.
The advantage over just performing a direct copy of the index folder is that you do not need to shut down the clusters. The snapshot is a point-in-time capture of the indices' state and can be taken without bringing the cluster offline.
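A minimal sketch of that flow, assuming both clusters are reachable locally on ports 9200 (dev) and 9201 (QA), and that a shared filesystem path is listed under `path.repo` on every node; the repository and snapshot names here are illustrative:

```shell
# 1. Register a filesystem snapshot repository on the development cluster.
curl -X PUT "localhost:9200/_snapshot/dev_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": { "location": "/mnt/es-backups" }
}'

# 2. Snapshot all indices (or name specific ones in the request body).
curl -X PUT "localhost:9200/_snapshot/dev_backup/snapshot_1?wait_for_completion=true"

# 3. Copy /mnt/es-backups to storage the QA cluster can reach, register the
#    same repository there, then restore the snapshot.
curl -X PUT "localhost:9201/_snapshot/dev_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": { "location": "/mnt/es-backups" }
}'
curl -X POST "localhost:9201/_snapshot/dev_backup/snapshot_1/_restore"
```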
Related
Sorry for what may be an obvious question. But I have a 3 node ElasticSearch cluster, and I want it to take a nightly snapshot that is sent to S3 for recovery. I have done this for my test cluster which is a single node. And I was starting to do it for my 3 node production cluster when I was left wondering if I have to configure the repository and snapshot on each node separately or can I just do it on one node via Kibana and then it will replicate that across the cluster? I have looked through the documentation but didn't see anything about this.
Thank you!
Yes, you need to configure it in every node.
First you need to install the repository-s3 plugin on every node; this is explained in the documentation.
After that, you also need to add the access and secret keys in the elasticsearch-keystore of every node. (documentation).
The rest of the configuration (creating the repository and scheduling the snapshots) is done just once, through Kibana.
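The per-node and once-per-cluster steps above can be sketched as follows; the keystore setting names are the standard `s3.client.default.*` keys, while the repository and bucket names are assumptions:

```shell
# On EVERY node: install the S3 repository plugin.
sudo bin/elasticsearch-plugin install repository-s3

# On EVERY node: add the S3 credentials to the Elasticsearch keystore.
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key

# Reload secure settings (or restart the nodes) so the keys take effect.
curl -X POST "localhost:9200/_nodes/reload_secure_settings"

# ONCE for the whole cluster (e.g. from Kibana Dev Tools): register the repository.
curl -X PUT "localhost:9200/_snapshot/my_s3_repo" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": { "bucket": "my-es-snapshots" }
}'
```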
As a PoC, we are looking to define a method of backing up and restoring Elasticsearch clusters that run on AWS EC2 instances. Each cluster has more than one node running on different EC2 instances.
Being new to Elasticsearch, the main method that appears is the Elasticsearch snapshot API; however, are there any issues with using AWS Backup as a service to take snapshots of the EC2 instances themselves?
The restoration process would then be to create a new EC2 instance from a specified AMI that is created by the AWS Backup snapshot of the original EC2 instance running elasticsearch.
You can do that, but it has some drawbacks and it is not recommended.
First, to take a consistent snapshot of any instance, you would need to stop your entire Elasticsearch cluster. If, for example, your cluster has 3 nodes, you need to stop all of them and snapshot them at the same moment, every time; you cannot snapshot just one node.
Second, since you are snapshotting the entire instance rather than just the Elasticsearch data, you lose the flexibility of restoring the data somewhere else or restoring only part of it; you have to restore everything. Also, if you take snapshots every day at 23:00 and for some reason need to restore at 17:00 the next day, everything stored after the last snapshot will be lost.
Third, even with those precautions, there is no guarantee that you will not end up with problems or corrupted data.
As per the documentation:
"The only reliable way to back up a cluster is by using the snapshot and restore functionality."
Since you are on AWS, the best approach would be to use an S3 repository for your snapshots and automate your backups using snapshot lifecycle management (SLM) in Kibana.
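As a sketch of what that automation looks like, here is an SLM policy that takes a nightly snapshot of all indices into an S3-backed repository; the policy, repository, and retention values are illustrative assumptions, and the repository must already be registered:

```shell
# Nightly snapshot at 23:00 cluster time, kept for 30 days (hypothetical names).
curl -X PUT "localhost:9200/_slm/policy/nightly-snapshots" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 23 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_s3_repo",
  "config": { "indices": ["*"] },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}'
```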
I have an S3 bucket of elasticsearch snapshots created by a curator job. I want to be able to restore these indexes to a fresh cluster using the S3 bucket. The target elasticsearch cluster does not have access to the source elasticsearch cluster by design.
I've installed cloud-aws plugin on the es client for the target cluster and I set permissions to the S3 bucket using environment variables. I have the config and action file in place for curator. I've verified the AWS permissions to the S3 bucket, but I'm not sure how to verify the permissions from the elasticsearch cluster's perspective. When I try running the curator job I get the following:
get_repository:662 Repository my-elk-snapshots not found.
I know that if I were to use elasticsearch directly I would need to create a reference to the S3 bucket so that the cluster knows about it. Is this the case for a fresh restore? I think that curator uses the elasticsearch cluster under the hood, but I'm confused about this scenario since the cluster is essentially blank.
How did you add the repository to the original (source) cluster? You need to use the exact same steps to add the repository to the new (target) cluster. Only then will the repository be readable by the new cluster. That's why you're getting the "repository not found" message. It has to be added to the new cluster so that snapshots are visible, and therefore able to be restored.
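A sketch of registering the same bucket on the target cluster; the repository and bucket names here are illustrative, and setting `readonly` is a safe choice when a second cluster only reads snapshots written by another cluster:

```shell
# Register the existing S3 bucket as a read-only repository on the target cluster.
curl -X PUT "localhost:9200/_snapshot/my-elk-snapshots" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "my-elk-snapshot-bucket",
    "readonly": true
  }
}'

# Verify the cluster can now see the repository and its snapshots.
curl -X GET "localhost:9200/_snapshot/my-elk-snapshots/_all"
```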
How do I move Elasticsearch data from one server to another?
I have server A running Elasticsearch 1.4.2 on one local node with multiple indices. I would like to copy that data to server B running the same Elasticsearch version. The lucene_version is also the same on both servers. But when I copy all the files to server B, the data is not migrated; it only shows the mappings of all the nodes. I tried the same procedure on my local computer and it worked perfectly. Am I missing something on the server end?
This can be achieved in multiple ways. The easiest and safest way is to create a replica on the new node. A replica can be created by starting a new node on the new server with the same cluster name (if you have changed other network configurations, you might need to change those as well). If you initially created your index with no replicas, you can change the replica count online using the update settings API.
Your cluster will be in yellow state until your data is in sync; normal operations won't be affected.
Once your cluster state is green, you can shut down the server you no longer wish to keep. At this stage your cluster state will go back to yellow. You can use the update settings API to change the replica count to 0, or add other nodes, to bring the cluster back to green.
This approach is recommended only if both servers are on the same network; otherwise data syncing will take a long time.
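The replica approach above can be sketched with the update settings API; the index name is an assumption:

```shell
# Once the new node has joined the cluster, raise the replica count so a
# copy of each shard is allocated to it.
curl -X PUT "localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d'
{ "index": { "number_of_replicas": 1 } }'

# Watch cluster health; "green" means the replicas are fully in sync.
curl -X GET "localhost:9200/_cluster/health?pretty"

# After shutting down the old node, drop back to 0 replicas if desired.
curl -X PUT "localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d'
{ "index": { "number_of_replicas": 0 } }'
```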
Another way is to use snapshots. Create a snapshot repository on your old server, take a snapshot, and copy the snapshot files from the old server to the same location on the new server. On the new server, register the same repository at that location; you will find the snapshot you copied and can restore it from there. Doing this from the command line can be a bit cumbersome; a plugin like kopf makes taking a snapshot and restoring it as easy as a button click.
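For the command-line route, a sketch of the snapshot migration between the two servers; hostnames and the repository path are illustrative, and the path must be allowed via `path.repo` in elasticsearch.yml on both servers:

```shell
# On server A: register a filesystem repository and take a snapshot.
curl -X PUT "serverA:9200/_snapshot/migration" -H 'Content-Type: application/json' -d'
{ "type": "fs", "settings": { "location": "/backups/es" } }'
curl -X PUT "serverA:9200/_snapshot/migration/snap_1?wait_for_completion=true"

# Copy the snapshot files to the same path on server B, e.g.:
# rsync -a /backups/es/ serverB:/backups/es/

# On server B: register the same repository and restore the snapshot.
curl -X PUT "serverB:9200/_snapshot/migration" -H 'Content-Type: application/json' -d'
{ "type": "fs", "settings": { "location": "/backups/es" } }'
curl -X POST "serverB:9200/_snapshot/migration/snap_1/_restore"
```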
Given an Elasticsearch cluster with several machines, I would like to have a single machine (a special node) located in a different geographical region that can effectively sync with the cluster for read-only purposes (i.e., no writes on the special node, and it should be able to handle all queries on its own). Is this possible, and how can it be done?
With Elasticsearch 1.0 (currently available as RC1) you can use the snapshot & restore API; have a look at this blog too to learn more.
You can basically make a snapshot of your indices, then copy the snapshot over to the secondary location and restore it into a different cluster. The nice part is that snapshots are incremental, which means that only the files that have changed since the last snapshot are actually backed up. You can then create snapshots at regular intervals, and import them into the secondary cluster.
If you are not on 1.0 yet, I would suggest having a look at it; snapshot & restore is a great addition. You can still make backups manually and restore them with 0.90, but there is no nice API for it and you have to do pretty much everything by hand.
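The "snapshot at regular intervals, import into the secondary cluster" loop can be sketched as a pair of scheduled commands (e.g. from cron); the repository name `geo_sync` and the date-stamped snapshot names are assumptions:

```shell
# On the primary cluster: take an incremental, date-stamped snapshot.
curl -X PUT "localhost:9200/_snapshot/geo_sync/snap-$(date +%Y%m%d)?wait_for_completion=true"

# On the secondary cluster, after the repository files have been copied over:
# restore that snapshot (indices being restored must not already be open).
curl -X POST "localhost:9200/_snapshot/geo_sync/snap-$(date +%Y%m%d)/_restore" -H 'Content-Type: application/json' -d'
{ "indices": "*", "include_global_state": false }'
```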