how to transfer elastic data from one server to another - elasticsearch

How do I move Elasticsearch data from one server to another?
I have server A running Elasticsearch 1.4.2 on a single local node with multiple indices. I would like to copy that data to server B running the same Elasticsearch version; the lucene_version is also the same on both servers. But when I copy all the files to server B, the data is not migrated: it only shows the mappings of all the indices. I tried the same procedure on my local computer and it worked perfectly. Am I missing something on the server end?

This can be achieved in multiple ways. The easiest and safest way is to create a replica on the new node. A replica can be created by starting a new node on the new server with the same cluster name (if you have changed other network configurations, you may need to adjust those as well). If you initialized your index with no replicas, you can change the replica count online using the update settings API.
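For example, assuming an index named my_index and a node reachable on localhost:9200 (both hypothetical), bumping the replica count might look like this:

```shell
# Ask Elasticsearch to keep one replica of each shard; once the new node
# joins the cluster, the replica shards will be allocated to it.
curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
  "index": {
    "number_of_replicas": 1
  }
}'
```

The same call with "number_of_replicas": 0 drops the extra copies again after you retire the old node.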
Your cluster will be in yellow state until your data is in sync. Normal operations won't be affected.
Once your cluster state is green, you can shut down the server you no longer want. At this stage your cluster state will go back to yellow. You can use the update settings API to change the replica count back to 0, or add other nodes, to bring the cluster back to green.
This approach is recommended only if both your servers are on the same network; otherwise data syncing will take a long time.
Another way is to use snapshots. You can create a snapshot on your old server, copy the snapshot files from the old server to the new server, and register the same snapshot repository at the same location on the new server. You will then see the snapshot you copied and can restore from it. Doing this from the command line can be a bit cumbersome, so you can use a plugin like kopf, which makes taking and restoring snapshots as easy as a button click.
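A rough sketch of the command-line flow, assuming a shared-filesystem repository path /mnt/backups and a snapshot named snap_1 (both hypothetical):

```shell
# On the old server: register a filesystem repository and take a snapshot
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups" }
}'
curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snap_1?wait_for_completion=true'

# Copy /mnt/backups over to the new server (e.g. with rsync), then on the
# new server register a repository at the same path and restore:
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups" }
}'
curl -XPOST 'http://localhost:9200/_snapshot/my_backup/snap_1/_restore'
```

Note that the repository path must be listed in path.repo in elasticsearch.yml on recent versions before the register call will succeed.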


Cannot find datadog agent connected to elasticsearch

I have an issue where I have multiple host dashboards for the same Elasticsearch server. Each dashboard has its own name and way of collecting data. One is connected to the installed datadog-agent and the other is somehow connected to the Elasticsearch service directly.
The weird thing is that I cannot seem to find a way to turn off the agent connected directly to the ES service, other than turning off the Elasticsearch service completely.
I have tried deleting the datadog-agent completely. This stops the dashboard connected to it from receiving data (of course), but the other dashboard keeps receiving data somehow. I cannot find what is sending this data and therefore am not able to stop it. We have multiple master and data nodes and this is an issue for all of them. ES version is 7.17.
Another of our clusters is running ES 6.8. We have not made the final monitoring configuration for that cluster, but for now it does not have this issue.
Just as extra information: the dashboard connected to the agent has the same name as the host server, while the other only has the internal IP as its host name.
Does anyone have any idea what is running and how to stop it? I have tried almost everything I could think of.
I finally found the reason: the datadog-agents on all master and data nodes were configured not to use the node name as the host name, and cluster stats was turned on in the Elasticsearch plugin for Datadog. As a result, whenever even one datadog-agent in the cluster was running, data kept coming in to the dashboard that was not named correctly. Leaving the answer here in case anyone hits the same situation in the future.

ElasticSearch backup and restore

As a PoC we are looking to define a method of backing up and restoring Elasticsearch clusters that run on AWS EC2 instances. Each cluster has more than one node, running on different EC2 instances.
Being new to Elasticsearch, the main method that appears is the Elasticsearch snapshot API. However, are there any issues with using AWS Backup as a service to take snapshots of the EC2 instances themselves?
The restoration process would then be to create a new EC2 instance from a specified AMI that is created by the AWS Backup snapshot of the original EC2 instance running elasticsearch.
You can do that, but it has some drawbacks and it is not recommended.
First, to take a consistent snapshot of any instance, you will need to stop your entire Elasticsearch cluster. If, for example, your cluster has 3 nodes, you will need to stop all of them and snapshot them together; you can't snapshot only one node. You always need to snapshot the entire cluster at the same moment.
Second, since you are snapshotting the entire instance rather than just the Elasticsearch data, you lose the flexibility of restoring the data somewhere else or restoring only part of it; you have to restore everything. Also, if you take snapshots every day at 23:00 and for some reason need to restore at 17:00 the next day, everything stored after your last snapshot will be lost.
And third, even with those precautions, there is no guarantee that you will not have problems or corrupted data.
As per the documentation:
The only reliable way to back up a cluster is by using the snapshot and restore functionality.
Since you are using AWS, the best approach would be to use an S3 repository for your snapshots and automate your backups using snapshot lifecycle management (SLM) in Kibana.
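A sketch of what that setup might look like from the command line, assuming the repository-s3 plugin is installed, an S3 bucket named my-es-backups exists, and the cluster is reachable on localhost:9200 (all hypothetical):

```shell
# Register an S3 snapshot repository
curl -XPUT 'http://localhost:9200/_snapshot/s3_repo' \
  -H 'Content-Type: application/json' -d '{
  "type": "s3",
  "settings": { "bucket": "my-es-backups" }
}'

# Create an SLM policy that snapshots all indices daily at 01:30 UTC
# and keeps each snapshot for 30 days
curl -XPUT 'http://localhost:9200/_slm/policy/nightly' \
  -H 'Content-Type: application/json' -d '{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-{now/d}>",
  "repository": "s3_repo",
  "config": { "indices": ["*"] },
  "retention": { "expire_after": "30d" }
}'
```

The same policy can be created and monitored from the Snapshot and Restore UI in Kibana instead of curl.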

elasticsearch snapshot vs elasticdump

I have a very slow internet connection and a server that is running Elasticsearch. I am looking at having a local, read-only copy of the Elasticsearch indices with a local Kibana instance, as I don't need the data to be live. I know there are 3 ways of doing this: making my local machine a node in the ES cluster, taking a snapshot and transferring it, or using elasticdump and transferring the file. I understand the issues with adding my local machine as a node, but I don't understand the difference between a snapshot and elasticdump.
What is the difference between a snapshot and elasticdump? What are the advantages and disadvantages of each?
elasticdump will simply scan one index in your remote ES cluster and dump the JSON data into a file, which it can later replay to rebuild the index in the same or some other ES instance (remote or local).
elasticdump can also store the data it pumps from your remote ES directly into your local instance (instead of storing the data into a file).
Snapshot/restore is the official way of backing up your index data. There are various targets (filesystem, S3, etc.), but the main idea is that you take a first snapshot and then all subsequent snapshots are incremental, i.e. the snapshot process only stores what has changed since the last run.
In your case, either way works, but using elasticdump is straightforward if all you want is a local copy of your production data.
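As an illustration, assuming an index named my_index on a remote host remote-es (hypothetical names), elasticdump can dump to files and replay them locally, or pump directly between clusters:

```shell
# Dump the mapping and the data from the remote index to local files
elasticdump --input=http://remote-es:9200/my_index --output=my_index_mapping.json --type=mapping
elasticdump --input=http://remote-es:9200/my_index --output=my_index_data.json --type=data

# Replay the files into a local Elasticsearch instance
elasticdump --input=my_index_mapping.json --output=http://localhost:9200/my_index --type=mapping
elasticdump --input=my_index_data.json --output=http://localhost:9200/my_index --type=data

# Or skip the files and copy directly from the remote cluster to the local one
elasticdump --input=http://remote-es:9200/my_index --output=http://localhost:9200/my_index --type=data
```

For a slow link, dumping to a file first lets you compress it before transferring.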
Another option we are sometimes using successfully is using autossh for maintaining connection and opening SSH tunnel between remote Elasticsearch nodes.
autossh -M 30010 -f user@remote.example.com -L 9200:localhost:9200 -N
Depending on your security policies and environment, this works really well for accessing live data remotely even with poor connectivity.

Creating rethinkdb cluster

I'm writing an automation script that is supposed to create 4 instances in AWS and deploy a RethinkDB cluster on them without any human interaction. According to the documentation, I need to either use the --join parameter on the command line or put join statements in the configuration file. However, what I don't understand is whether I need to specify join only once in order to create the cluster, or every time I restart any of the cluster nodes.
My current understanding is that I only need to issue it once, the cluster configuration is somehow stored in metadata and next time I can just start rethinkdb without --join parameter and it will reconnect to the rest of the cluster on its own. But when would I need the join option in the configuration file then?
If this is true then do I need to start rethinkdb with --join option in my script then shut it down and then start again without --join? Is this the right way to do it or there are better alternatives?
You're right that on subsequent restarts you don't need to specify --join on the command line; the node will discover the cluster and attempt to reconnect. Part of the cluster state is stored in the system table server_config.
Even if you wipe out the data directory on this node, it may still be able to rejoin the cluster, because other nodes may have information about it and will attempt to connect to it. But if no other node stores information about this particular server, or if the node is restarted with a new IP address for some reason and its data directory is wiped as well, then the cluster won't know about it (under the new IP address).
So I'd always specify --join. It doesn't hurt, and in the worst case it helps the node join the cluster again.
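For reference, joining a node to an existing cluster (assuming a seed node at 10.0.0.1, hypothetical) looks like this, either on the command line or in the configuration file:

```shell
# Command line: start a node and point it at any existing cluster member
# (29015 is RethinkDB's default intracluster port)
rethinkdb --bind all --join 10.0.0.1:29015

# Equivalent configuration file entries (e.g. /etc/rethinkdb/instances.d/default.conf):
#   bind=all
#   join=10.0.0.1:29015
```

Listing every node under join= in the config file is harmless and means any surviving node can act as the seed.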

Setting up a single backup node for an elasticsearch cluster?

Given an Elasticsearch cluster with several machines, I would like a single machine (a special node) located in a different geographical region that can effectively sync with the cluster for read-only purposes (i.e. no writes on the special node, and that node should be able to handle all queries on its own). Is this possible, and how can it be done?
With elasticsearch 1.0 (currently available as RC1) you can use the snapshot & restore API; have a look at this blog post too to learn more.
You can basically make a snapshot of your indices, then copy the snapshot over to the secondary location and restore it into a different cluster. The nice part is that snapshots are incremental, which means that only the files that have changed since the last snapshot are actually backed up. You can then create snapshots at regular intervals, and import them into the secondary cluster.
If you are not using 1.0 yet, I would suggest having a look at it; snapshot & restore is a great addition. You can still make backups manually and restore them with 0.90, but you don't have a nice API for it and need to do pretty much everything manually.
