Elasticsearch backup and restore

As a PoC, we are looking to define a method of backing up and restoring Elasticsearch clusters that run on AWS EC2 instances. Each cluster has more than one node, with the nodes running on different EC2 instances.
Being new to Elasticsearch, the main method that appears is the Elasticsearch snapshot API. However, are there any issues with using AWS Backup as a service to take snapshots of the EC2 instances themselves?
The restoration process would then be to create a new EC2 instance from an AMI created by the AWS Backup snapshot of the original EC2 instance running Elasticsearch.

You can do that, but it has some drawbacks and it is not recommended.
First, to take a consistent snapshot of any instance you would need to stop your entire Elasticsearch cluster. If, for example, your cluster has 3 nodes, you would need to stop all of them and snapshot them together; you can't snapshot only one node, you always need to snapshot the entire cluster at the same moment.
Second, since you are taking snapshots of the entire instance, not only the Elasticsearch data, you lose the flexibility of restoring the data somewhere else or restoring only part of it; you need to restore everything. Also, if you take snapshots every day at 23:00 and for some reason need to restore at 17:00 the next day, everything stored after the last snapshot will be lost.
Third, even if you take those precautions, there is no guarantee that you will not run into problems or corrupted data.
As per the documentation:
The only reliable way to back up a cluster is by using the snapshot and restore functionality
Since you are using AWS, the best approach would be to use an S3 repository for your snapshots and automate your backups using snapshot lifecycle management (SLM) in Kibana.
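As a rough illustration, here is a minimal sketch of both steps against the REST API, assuming a cluster on localhost:9200, the repository-s3 plugin installed with AWS credentials already configured, and hypothetical bucket and policy names:

```python
# Sketch: register an S3 snapshot repository, then an SLM policy that
# snapshots all indices daily. Bucket/policy names are placeholders.
import requests

ES = "http://localhost:9200"

# 1. Register the S3 repository (requires the repository-s3 plugin).
requests.put(
    f"{ES}/_snapshot/s3_repo",
    json={
        "type": "s3",
        "settings": {"bucket": "my-es-snapshots", "base_path": "cluster-backups"},
    },
).raise_for_status()

# 2. Create an SLM policy: snapshot every index daily at 01:30 and
#    keep snapshots for 30 days (at least 5, at most 50).
requests.put(
    f"{ES}/_slm/policy/daily-snapshots",
    json={
        "schedule": "0 30 1 * * ?",       # Elasticsearch cron expression
        "name": "<daily-snap-{now/d}>",   # date-math snapshot name
        "repository": "s3_repo",
        "config": {"indices": ["*"]},
        "retention": {"expire_after": "30d", "min_count": 5, "max_count": 50},
    },
).raise_for_status()
```

The same two calls can be made from the Kibana Dev Tools console or its Snapshot and Restore UI; note that SLM requires Elasticsearch 7.4 or later.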

Related

Is there any way to restore Elasticsearch snapshots apart from using the Elasticsearch restore API?

My company wants to use an existing Elasticsearch snapshot repository (several hundred gigabytes) to obtain the original documents and store them elsewhere. I should state that the snapshots were taken using the Elasticsearch snapshot API.
My company is somewhat reluctant to use Elasticsearch to restore the snapshots, as they fear that would involve creating a new Elasticsearch cluster that would consume considerable resources. So far, I have not seen any way to restore the snapshots other than using Elasticsearch, but, given my company's insistence, I ask here: is there any other tool that I could use to restore these snapshots? Thank you in advance for any help resolving this issue.
What I would do in your shoes is spin up a local cluster and restore the existing snapshot into it (here is the relevant Elastic documentation: Restoring to a different cluster). Then, from there, I would either export the data using the Kibana Reporting plugin (https://www.elastic.co/what-is/kibana-reporting), or write a Logstash pipeline to export the data from the local cluster to, say, a CSV file.
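If a full Logstash pipeline feels heavy, a small script can handle the export step. Here is a minimal sketch using the scroll API, assuming the snapshot has already been restored into a local cluster, and using a hypothetical index name and flat documents:

```python
# Sketch: dump every document of a restored index to a CSV file,
# paging through results with the scroll API. "restored-index" is a
# placeholder; nested fields would need flattening first.
import csv
import requests

ES = "http://localhost:9200"
INDEX = "restored-index"

resp = requests.post(
    f"{ES}/{INDEX}/_search?scroll=2m",
    json={"size": 1000, "query": {"match_all": {}}},
).json()

with open("export.csv", "w", newline="") as f:
    writer = None
    while resp["hits"]["hits"]:
        for hit in resp["hits"]["hits"]:
            doc = hit["_source"]
            if writer is None:
                # Build the CSV header from the first document's fields.
                writer = csv.DictWriter(f, fieldnames=sorted(doc), extrasaction="ignore")
                writer.writeheader()
            writer.writerow(doc)
        # Fetch the next page using the scroll id from the previous response.
        resp = requests.post(
            f"{ES}/_search/scroll",
            json={"scroll": "2m", "scroll_id": resp["_scroll_id"]},
        ).json()
```

The throwaway cluster can be deleted once the export is done, so the resource cost is temporary.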

Does creating a snapshot make the cluster slow?

I use Elasticsearch 5.6.
While a snapshot is running, I request
http://localhost:9200/_cluster/health
but do not get a response for more than 10 seconds.
I can also see that while the snapshot runs, the machines incur heavy disk and network I/O.
No such delay happens when no snapshot is running.
I check _cluster/health with a timeout to verify that creating a snapshot does not slow down queries.
Is this the correct way to check?
In practice, will creating snapshots slow queries down?
Yes, there is increased disk activity as the indices are read; however, an excerpt from the Elastic documentation states:
The index snapshot process is incremental. In the process of making the index snapshot Elasticsearch analyses the list of the index files that are already stored in the repository and copies only files that were created or changed since the last snapshot. That allows multiple snapshots to be preserved in the repository in a compact form. Snapshotting process is executed in non-blocking fashion. All indexing and searching operation can continue to be executed against the index that is being snapshotted.
Apart from the _cluster/health check taking more than 10 seconds, do you see any impact on data indexing/searching?
How frequently are you running the snapshots? Is it a full cluster snapshot? Where is the snapshot repository: filesystem, S3, Azure, or Google Cloud?
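If you want a more systematic check than a single request, a small probe can sample latency while the snapshot runs. This is a rough sketch, assuming a node on localhost:9200; note that timing a representative search query would measure query slowdown more directly than _cluster/health does:

```python
# Sketch: sample _cluster/health latency once a second for a minute
# to see whether the node becomes unresponsive during a snapshot.
import time
import requests

ES = "http://localhost:9200"

for _ in range(60):
    start = time.monotonic()
    try:
        r = requests.get(f"{ES}/_cluster/health", timeout=10)
        elapsed = time.monotonic() - start
        print(f"{r.json()['status']:>8} {elapsed:.2f}s")
    except requests.exceptions.Timeout:
        print("timed out after 10s")  # the symptom described above
    time.sleep(1)
```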

How to transfer Elasticsearch data from one server to another

How do I move Elasticsearch data from one server to another?
I have server A running Elasticsearch 1.4.2 on one local node with multiple indices. I would like to copy that data to server B running Elasticsearch of the same version. The lucene_version is also the same on both servers. But when I copy all the files to server B, the data is not migrated; it only shows the mappings of all the nodes. I tried the same procedure on my local computer and it worked perfectly. Am I missing something on the server end?
This can be achieved in multiple ways. The easiest and safest way is to create a replica on the new node. A replica can be created by starting a new node on the new server with the same cluster name (if you have changed other network configurations, you might need to change those as well). If you initialized your index with no replicas, you can change the number of replicas online using the update settings API, as in the sketch below.
Your cluster will be in a yellow state until your data are in sync. Normal operations won't be affected.
Once your cluster is green, you can shut down the server you no longer need. At this stage the cluster will go yellow again. You can use the update settings API to change the replica count to 0, or add other nodes, to bring the cluster back to green.
This approach is recommended only if both servers are on the same network; otherwise the data sync will take a long time.
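As a loose illustration of the replica step, here is a minimal sketch, assuming the cluster answers on localhost:9200:

```python
# Sketch: once the new node has joined the cluster, ask for one replica
# per primary shard on all indices so the data is copied to it.
import requests

ES = "http://localhost:9200"

requests.put(
    f"{ES}/_settings",
    json={"index": {"number_of_replicas": 1}},
).raise_for_status()

# Watch recovery: the cluster turns green once all replicas are in sync.
print(requests.get(f"{ES}/_cluster/health").json()["status"])
```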
Another way is to use a snapshot. Create a snapshot on your old server, copy the snapshot files from the old server to the same location on the new server, and register the same snapshot repository at that location on the new server. You will then find the snapshot you copied and can restore it. Doing this from the command line can be a bit cumbersome; a plugin like kopf makes taking and restoring a snapshot as easy as a button click.

Elasticsearch - exporting an index from a cluster to a different cluster

Can I use the snapshot and restore module of Elasticsearch to export one index, or all of them together, to a new Elasticsearch cluster?
For example, can I export all the indices from a development cluster to a QA cluster?
Check out https://github.com/taskrabbit/elasticsearch-dump; we made it for exactly this purpose.
Yes, you can use the snapshot & restore feature to back up indices from one cluster and move them to another. Once you have created the snapshot on your development cluster, you can copy it over to your QA cluster and restore it there.
The advantage over performing a direct copy of the index folder is that you do not need to shut down the clusters. A snapshot is a point-in-time capture of the indices' state and can be taken without taking the cluster offline.
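Here is a rough sketch of that flow, assuming both clusters have registered the same repository (for example a shared filesystem or S3 bucket) under the hypothetical name shared_repo, and using placeholder host and index names:

```python
# Sketch: snapshot an index on the development cluster, then restore
# it on the QA cluster from the shared repository.
import requests

DEV = "http://dev-es:9200"
QA = "http://qa-es:9200"

# 1. Snapshot the index on dev and block until it completes.
requests.put(
    f"{DEV}/_snapshot/shared_repo/export-1?wait_for_completion=true",
    json={"indices": "products"},
).raise_for_status()

# 2. Restore the snapshot on QA (its copy of the repo can be read-only).
requests.post(
    f"{QA}/_snapshot/shared_repo/export-1/_restore",
    json={"indices": "products"},
).raise_for_status()
```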

Setting up a single backup node for an Elasticsearch cluster?

Given an Elasticsearch cluster with several machines, I want a single machine (a special node) located in a different geographical region that can effectively sync with the cluster for read-only purposes (i.e., no writes to the special node, and that node should be able to handle all queries on its own). Is this possible, and how can it be done?
With Elasticsearch 1.0 (currently available as RC1) you can use the snapshot & restore API; have a look at this blog post too to learn more.
You can basically take a snapshot of your indices, then copy the snapshot over to the secondary location and restore it into a different cluster. The nice part is that snapshots are incremental, which means that only the files that have changed since the last snapshot are actually backed up. You can then create snapshots at regular intervals and import them into the secondary cluster.
If you are not on 1.0 yet, I would suggest having a look at it; snapshot & restore is a great addition. You can still make backups manually and restore them with 0.90, but you don't have a nice API for it and need to do pretty much everything by hand.
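For the regular-interval part, a scheduler around two REST calls is enough. Here is a loose sketch, assuming both clusters see the same repository under the hypothetical name geo_repo and that the secondary's indices may be closed and overwritten on each sync (closing _all may be disallowed by default on recent Elasticsearch versions):

```python
# Sketch: periodically snapshot the primary cluster and restore the
# snapshot on the read-only secondary. Host names are placeholders.
import time
import requests

PRIMARY = "http://primary-es:9200"
SECONDARY = "http://secondary-es:9200"

while True:
    snap = f"sync-{int(time.time())}"
    # Incremental: only files changed since the last snapshot are copied.
    requests.put(
        f"{PRIMARY}/_snapshot/geo_repo/{snap}?wait_for_completion=true"
    ).raise_for_status()
    # Close the secondary's indices so the restore can overwrite them...
    requests.post(f"{SECONDARY}/_all/_close").raise_for_status()
    # ...then restore the fresh snapshot over them.
    requests.post(
        f"{SECONDARY}/_snapshot/geo_repo/{snap}/_restore?wait_for_completion=true"
    ).raise_for_status()
    time.sleep(3600)  # sync once an hour
```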
