I am running my cluster on Found (so I have no option to add plugins), and I am trying to configure an S3 bucket so that I can create snapshots of my own.
I managed to configure the bucket, but now I am trying to run the actual backup.
Is there a way to configure scheduled backups in Elasticsearch, or do I need to trigger a backup manually by sending a request to Elasticsearch?
Thanks!
How to use a custom repository is documented here: https://www.elastic.co/guide/en/found/current/custom-repository.html
Note that we do snapshots every 30 minutes already: https://www.elastic.co/guide/en/found/current/restoring-snapshots.html
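If you do want to trigger an extra snapshot of your own once the custom repository is registered, here is a minimal sketch using the snapshot API (the endpoint, credentials, and the names my_s3_repository and snap-1 are placeholders, not values from your cluster):

```
# Trigger a one-off snapshot into an already-registered repository.
# Found clusters are reached over HTTPS with authentication; adjust to yours.
curl -u user:password -X PUT \
  "https://your-cluster-endpoint:9243/_snapshot/my_s3_repository/snap-1?wait_for_completion=true"
```

To schedule it, call the same request from cron at whatever interval you need.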
A more appropriate place for Found-related questions is this forum: https://discuss.elastic.co/c/found
Related
As a PoC, we are looking to define a method for backing up and restoring Elasticsearch clusters running on AWS EC2 instances. Each cluster has more than one node, running on different EC2 instances.
Being new to Elasticsearch, the main method that comes up is the Elasticsearch snapshot API. However, are there any issues with using AWS Backup as a service to take snapshots of the EC2 instances themselves?
The restoration process would then be to create a new EC2 instance from an AMI produced by the AWS Backup snapshot of the original EC2 instance running Elasticsearch.
You can do that, but it has some drawbacks and it is not recommended.
First, to take a consistent snapshot of any instance, you need to stop your entire Elasticsearch cluster. If, for example, your cluster has 3 nodes, you need to stop all of them before taking the snapshots; you can't snapshot just one node. You always have to snapshot the entire cluster at the same moment.
Second, since you are taking snapshots of the entire instance, not just the Elasticsearch data, you lose the flexibility of restoring the data somewhere else or restoring only part of it; you have to restore everything. Also, if you take snapshots every day at 23:00 and for some reason need to restore at 17:00 the next day, everything stored after the last snapshot will be lost.
And third, even if you take those precautions, there is no guarantee that you won't run into problems or corrupted data.
As per the documentation:
The only reliable way to back up a cluster is by using the snapshot and restore functionality
Since you are using AWS, the best approach would be to use an S3 repository for your snapshots and automate your backups using snapshot lifecycle management (SLM) in Kibana.
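For illustration, here is a sketch of that setup using the REST API (SLM is also configurable through the Kibana UI); the repository name, bucket, schedule, and policy name below are all placeholders:

```
# Register an S3 snapshot repository (bucket name is a placeholder).
curl -X PUT "localhost:9200/_snapshot/my_s3_repository" \
  -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": { "bucket": "my-snapshot-bucket" }
}'

# Create an SLM policy that snapshots all indices daily at 01:30
# and keeps each snapshot for 30 days.
curl -X PUT "localhost:9200/_slm/policy/nightly-snapshots" \
  -H 'Content-Type: application/json' -d'
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_s3_repository",
  "config": { "indices": ["*"] },
  "retention": { "expire_after": "30d" }
}'
```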
I have an Elastic Beanstalk environment set up with CodePipeline connected to my repository, which is a Laravel site.
When I push a new change to the master branch, it gets deployed to the EC2 instance and deletes all the data in storage.
I can't find a similar question online, so any ideas on how I can fix this issue?
Elastic Beanstalk runs on instances that, by default, don't persist local data between redeployments. If you really need stable storage, you will have to redesign the app to use one of the available options.
Take a look at the Persistent storage section in the doc below.
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/concepts.concepts.design.html
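As one concrete direction (a sketch, assuming a recent Laravel where the default disk is chosen via the FILESYSTEM_DISK environment variable; bucket and region are placeholders), you could point Laravel's storage at S3 so uploads survive redeployments:

```
# Set the environment variables on the Beanstalk environment with the EB CLI.
# FILESYSTEM_DISK applies to Laravel 9+; older versions use FILESYSTEM_DRIVER.
eb setenv FILESYSTEM_DISK=s3 \
  AWS_BUCKET=my-app-uploads \
  AWS_DEFAULT_REGION=us-east-1
```

With an instance profile granting S3 access, Laravel's built-in s3 disk can pick up credentials without hard-coding keys.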
My company wants to use an existing Elasticsearch snapshot repository (several hundred gigabytes) to obtain the original documents and store them elsewhere. I should state that the snapshots were created using the Elasticsearch snapshot API.
My company is somewhat reluctant to use Elasticsearch to restore the snapshots, as they fear it would involve creating a new Elasticsearch cluster that would consume considerable resources. So far, I have not seen any way to restore the snapshots other than with Elasticsearch, but given my company's insistence, I ask here: is there any other tool I could use to restore these snapshots? Thank you in advance for any help.
What I would do in your shoes is spin up a local cluster and restore the existing snapshot into it (the relevant Elastic documentation is "Restoring to a different cluster"). From there, I would either export the data using the Kibana Reporting plugin (https://www.elastic.co/what-is/kibana-reporting) or write a Logstash pipeline to export the data from the local cluster to, say, a CSV file.
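To make the first half concrete, here is a sketch of attaching the existing repository to the throwaway local cluster (repository, bucket, and snapshot names are placeholders; registering it read-only avoids any risk of writing to the shared repository):

```
# Register the existing S3 repository on the local cluster, read-only.
curl -X PUT "localhost:9200/_snapshot/legacy-snapshots" \
  -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": { "bucket": "my-snapshot-bucket", "readonly": true }
}'

# List the snapshots it contains, then restore one of them.
curl "localhost:9200/_snapshot/legacy-snapshots/_all"
curl -X POST "localhost:9200/_snapshot/legacy-snapshots/snapshot_1/_restore" \
  -H 'Content-Type: application/json' -d'
{ "indices": "*" }'
```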
I have an Elasticsearch 5.6.2 cluster with one master and two data nodes, and I am using Kibana for visualization. I want to enable automatic snapshots of the cluster to Amazon S3 every 30 minutes. How can I accomplish this? I haven't found clear documentation. I have also read the Curator docs, and I have a question: do I need to configure Curator on each node?
Please help, guys.
Curator is an external process.
You only need to install it on a single machine. It can be a cluster node or any other machine.
It will send REST requests to Elasticsearch when needed.
Put it in your crontab and it will be fine.
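For example, a crontab entry on the one machine running Curator might look like this (paths are placeholders; the action file would contain your snapshot action):

```
# Run Curator's snapshot action every 30 minutes.
*/30 * * * * /usr/local/bin/curator --config /etc/curator/config.yml /etc/curator/snapshot-action.yml
```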
You can also call the _snapshot endpoint manually from a shell script every 30 minutes and not use Curator at all.
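A sketch of that Curator-free variant (the repository name s3_backup is a placeholder and must already be registered; note that % has to be escaped in crontab):

```
# Create a uniquely named snapshot every 30 minutes via the snapshot API.
*/30 * * * * curl -s -X PUT "http://localhost:9200/_snapshot/s3_backup/snap-$(date +\%Y\%m\%d\%H\%M)"
```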
Elastic Cloud does a backup every 30 minutes (in case you don't want to manage the cluster yourself and want advanced features like rolling upgrades, Kibana, security, and so on).
I have an S3 bucket of Elasticsearch snapshots created by a Curator job. I want to be able to restore these indices to a fresh cluster using the S3 bucket. The target Elasticsearch cluster does not have access to the source Elasticsearch cluster, by design.
I've installed the cloud-aws plugin on the ES client for the target cluster, and I set permissions on the S3 bucket using environment variables. I have the Curator config and action files in place. I've verified the AWS permissions on the S3 bucket, but I'm not sure how to verify the permissions from the Elasticsearch cluster's perspective. When I try running the Curator job I get the following:
get_repository:662 Repository my-elk-snapshots not found.
I know that if I were using Elasticsearch directly, I would need to create a reference to the S3 bucket so that the cluster knows about it. Is this the case for a fresh restore? I think Curator uses the Elasticsearch cluster under the hood, but I'm confused about this scenario since the cluster is essentially blank.
How did you add the repository to the original (source) cluster? You need to use the exact same steps to add the repository to the new (target) cluster. Only then will the repository be readable by the new cluster. That's why you're getting the "repository not found" message. It has to be added to the new cluster so that snapshots are visible, and therefore able to be restored.
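For instance, if the source cluster's repository was registered through the snapshot API, a sketch of the same registration on the target cluster would look like this (the bucket name is a placeholder; the repository name matches the one in your error message, and the settings must match the source cluster's registration):

```
# Register the same S3 repository on the target cluster so its snapshots
# become visible there.
curl -X PUT "localhost:9200/_snapshot/my-elk-snapshots" \
  -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": { "bucket": "my-elk-snapshot-bucket" }
}'

# Verify the repository is visible and list its snapshots.
curl "localhost:9200/_snapshot/my-elk-snapshots/_all"
```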