Backup an Elasticsearch index

Is it possible to get a backup of an index from Elasticsearch via the HTTP REST interface?
Can I just send an HTTP request and get a snapshot without creating a snapshot repository?

Want to store it as a file that Elasticsearch can restore?
You can store snapshots of individual indices or an entire cluster in a remote repository like a shared file system, S3, or HDFS.
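There is no single request that returns a restorable dump without a repository: you first register a snapshot repository and then create snapshots in it, both over the REST API. A minimal sketch (repository name, path, and index name are placeholders; the location must be listed under path.repo in elasticsearch.yml):

# register a shared file system repository
curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{ "type": "fs", "settings": { "location": "/mnt/es_backups" } }'

# snapshot a single index into it
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{ "indices": "my-index", "include_global_state": false }'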
Want to store it as JSON so you can use the data outside Elasticsearch?
elasticdump works by sending an input to an output. Both can be either an Elasticsearch URL or a file.
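For example, to copy one index out of a cluster into local JSON files (host, index name, and paths are placeholders):

# dump the mapping, then the documents, of one index to local files
elasticdump --input=http://localhost:9200/my-index --output=/tmp/my-index-mapping.json --type=mapping
elasticdump --input=http://localhost:9200/my-index --output=/tmp/my-index-data.json --type=data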
CSV?
https://github.com/taraslayshchuk/es2csv
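Something along these lines (flags as documented in the es2csv README; host, index, and output file are placeholders):

# export all documents of one index to a CSV file
es2csv -u http://localhost:9200 -i my-index -q '*' -o my-index.csv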

Related

How to take snapshots of specific indices in Elastic Cloud Enterprise?

In the Elastic Cloud UI, you can take snapshots/backups of your entire on-disk data and store them in a shared file system or an object store such as S3.
How do I back up only certain indices instead of all of them using just the Elastic Cloud UI? Is there a way?
If not, then and only then, I want to go with the APIs.
If you follow the Elasticsearch Service docs for Snapshot and Restore, you will see that we also link to the Elasticsearch Snapshot and Restore docs, where you will find instructions for backing up specific indices. You can use the API console to do this more easily through the Elastic Cloud UI.
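For example, you can snapshot just the indices you need via the snapshot API; the managed repository on Elastic Cloud is typically named found-snapshots, and the endpoint, snapshot, and index names below are placeholders (in the API console you would paste just the method, path, and body without the curl wrapper):

# snapshot only selected indices into the managed repository
curl -X PUT "https://<your-cluster-endpoint>/_snapshot/found-snapshots/my-partial-snapshot" -H 'Content-Type: application/json' -d'
{ "indices": "logs-*,my-index", "include_global_state": false }'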

Loading data from S3 to Elasticsearch using AWS Glue

I have multiple folders in an S3 bucket, and each folder contains one JSON Lines file.
I want to do two things with this data:
1. Apply some transformations to get tabular data and save it to some database.
2. Save these JSON objects as they are to an Elasticsearch cluster for full-text search.
I am using AWS Glue for this task and I know how to do 1, but I can't find any resources that talk about getting data from S3 and storing it in Elasticsearch using AWS Glue.
Is there a way to do this?
If anyone is looking for an answer to this: I ended up using Logstash to load the files into Elasticsearch.
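In case it helps, a minimal pipeline along those lines might look like this (bucket, region, prefix, and index names are placeholders; the s3 input reads the files and the json_lines codec turns each line into one event):

input {
  s3 {
    bucket => "my-bucket"
    region => "us-east-1"
    prefix => "my-folder/"
    codec => "json_lines"   # one JSON document per line
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "my-index"
  }
}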

Is there any way to restore Elasticsearch snapshots apart from using the Elasticsearch restore API?

My company wants to use an existing Elasticsearch snapshot repository (several hundred gigabytes) to obtain the original documents and store them elsewhere. I should state that the snapshots were taken with the Elasticsearch snapshot API.
My company is somewhat reluctant to use Elasticsearch to restore the snapshots, as they fear it would involve creating a new Elasticsearch cluster that would consume considerable resources. So far I have not seen any way to restore the snapshots other than using Elasticsearch, but given my company's insistence I ask here: is there any other tool I could use to restore these snapshots? Thank you in advance for any help resolving this issue.
What I would do in your shoes is spin up a local cluster and restore the existing snapshot into it (here is the relevant Elastic documentation: Restoring to a different cluster). Then, from there, I would either export the data using the Kibana Reporting plugin (https://www.elastic.co/what-is/kibana-reporting) or write a Logstash pipeline to export the data from the local cluster to, say, a CSV file.
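Concretely, you can register the existing repository on the throwaway local cluster as read-only and restore from it; a sketch assuming the snapshots live in S3 (repository, bucket, and snapshot names are placeholders; adjust the type and settings if the repository is a shared file system):

# register the existing repository read-only so the local cluster cannot modify it
curl -X PUT "localhost:9200/_snapshot/legacy_snapshots" -H 'Content-Type: application/json' -d'
{ "type": "s3", "settings": { "bucket": "my-snapshot-bucket", "readonly": true } }'

# list the snapshots it contains, then restore one into the local cluster
curl "localhost:9200/_snapshot/legacy_snapshots/_all"
curl -X POST "localhost:9200/_snapshot/legacy_snapshots/snapshot_1/_restore?wait_for_completion=true"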

How to restore Elasticsearch indices from S3 to a blank cluster using Curator?

I have an S3 bucket of Elasticsearch snapshots created by a Curator job. I want to be able to restore these indices to a fresh cluster using the S3 bucket. The target Elasticsearch cluster does not have access to the source Elasticsearch cluster by design.
I've installed the cloud-aws plugin on the Elasticsearch client for the target cluster and set permissions to the S3 bucket using environment variables. I have the config and action files in place for Curator. I've verified the AWS permissions to the S3 bucket, but I'm not sure how to verify the permissions from the Elasticsearch cluster's perspective. When I try running the Curator job I get the following:
get_repository:662 Repository my-elk-snapshots not found.
I know that if I were using Elasticsearch directly I would need to create a reference to the S3 bucket so that the cluster knows about it. Is this the case for a fresh restore? I think Curator uses the Elasticsearch cluster under the hood, but I'm confused about this scenario since the cluster is essentially blank.
How did you add the repository to the original (source) cluster? You need to use the exact same steps to add the repository to the new (target) cluster. Only then will the repository be readable by the new cluster. That's why you're getting the "repository not found" message. It has to be added to the new cluster so that snapshots are visible, and therefore able to be restored.
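For example, with S3 repository support installed, registering the repository on the target cluster looks like this (the repository name is taken from your error message; bucket and region are placeholders):

# make the existing S3 bucket known to the new cluster as a snapshot repository
curl -X PUT "localhost:9200/_snapshot/my-elk-snapshots" -H 'Content-Type: application/json' -d'
{ "type": "s3", "settings": { "bucket": "my-snapshot-bucket", "region": "us-east-1" } }'

After that, Curator (and the snapshot API) can see the snapshots in the bucket and restore them.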

Elasticsearch snapshot vs elasticdump

I have a very slow internet connection and a server running Elasticsearch. I am looking at having a local, read-only copy of the Elasticsearch indices with a local Kibana instance, as I don't need the data to be live. I know there are three ways of doing this: making my local machine a node in the ES cluster, taking a snapshot and transferring it, or using elasticdump and transferring the file. I understand the issues with adding my local machine as a node, but I don't understand the difference between a snapshot and elasticdump.
What is the difference between a snapshot and elasticdump? What are the advantages and disadvantages of each?
elasticdump simply scans one index in your remote ES cluster and dumps the JSON data into a file, which it can later replay to rebuild the index in the same or another ES instance (remote or local).
elasticdump can also pump the data from your remote ES directly into your local instance, instead of storing it in a file.
Snapshot/restore is the official way of backing up your index data. There are various targets (file system, S3, etc.), but the main idea is that you take a first snapshot and all subsequent snapshots are incremental, i.e. the snapshot process only stores what has changed since the last run.
In your case you can go either way, but using elasticdump is straightforward if all you want is a local copy of your production data.
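For the direct remote-to-local copy mentioned above, a sketch (hosts and index name are placeholders):

# copy the mapping, then the documents, straight from the remote cluster into the local one
elasticdump --input=http://remote-server:9200/my-index --output=http://localhost:9200/my-index --type=mapping
elasticdump --input=http://remote-server:9200/my-index --output=http://localhost:9200/my-index --type=data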
Another option we sometimes use successfully is autossh, to maintain the connection and keep an SSH tunnel open to the remote Elasticsearch node:
autossh -M 30010 -f user@remote.example.com -L 9200:localhost:9200 -N
Depending on your security policies and environment, this works really well for accessing live data remotely even with poor connectivity.
