Please provide a recommended configuration to migrate data from a single data center Cassandra cluster to a multiple data center Cassandra cluster. Currently I have a single data center cluster with the following configuration:
i) No of nodes: 3
ii) Replication Factor : 2
iii) Strategy: SimpleStrategy
iv) endpoint_snitch: SimpleSnitch
Now I am planning to add 2 more nodes in a different location, so I am thinking of moving to a multiple data center cluster with the following configuration:
i) No of nodes: 5
ii) RF: dc1=2, dc2=2
iii) Strategy: NetworkTopologyStrategy
iv) endpoint_snitch: PropertyFileSnitch (I have the cassandra-topology.properties file)
What is the procedure to migrate without losing any data?
Please let me know the recommended steps to follow, or any guide I can refer to. Let me know if further info is required.
Complete repairs on all nodes.
Take snapshot on all nodes to have a fall back point.
Decommission each node that is not running a pure Cassandra workload. Repair the ring each time you decommission a node.
Update the keyspaces to NetworkTopologyStrategy, with the replication factor for the existing data center matching the original RF:
ALTER KEYSPACE keyspace_name
WITH REPLICATION =
{ 'class' : 'NetworkTopologyStrategy', 'datacenter_name' : 2 };
(The data center name must match the name your snitch reports, e.g. dc1.)
Change the snitch to PropertyFileSnitch on each node, with a rolling restart.
Add the nodes in the new data center. Make sure that when you add them you have auto_bootstrap: false in cassandra.yaml.
Alter the keyspaces again to add the new data center (e.g. 'dc2' : 2), then run nodetool rebuild original_dc_name on each new node.
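For the PropertyFileSnitch step, a minimal cassandra-topology.properties might look like the following sketch (the IP addresses and the dc/rack names are placeholders; every node in the cluster should carry an identical copy of this file):

```properties
# Existing data center
192.168.1.10=dc1:rack1
192.168.1.11=dc1:rack1
192.168.1.12=dc1:rack1
# New data center
10.20.1.10=dc2:rack1
10.20.1.11=dc2:rack1
# Fallback for nodes not listed above
default=dc1:rack1
```

The data center names used here are the same names you must reference in the NetworkTopologyStrategy replication options.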
I just found this excellent tutorial on migrating Cassandra:
Cassandra Migration To EC2 by highscalability.com
Although the details are in the original article, an outline of the main steps is:
1. Cassandra Multi DC Support
Configure the PropertyFileSnitch
Update the replication strategy
Update the client connection setting
2. Setup Cassandra On EC2
Start the nodes
Stop the EC2 nodes and cleanup
Start the nodes
Place data replicas in the cluster on EC2
3. Decommission The Old DC And Cleanup
Decommission the seed node(s)
Update your client settings
Decommission the old data center
I have started using Apache Ignite for my current project. I have set up an Ignite cluster with 3 server nodes and a backup count of 1. An Ignite client node is able to create a primary cache as well as backup copies in the cluster. But I want to know, for a particular cache, which node is the primary and on which node the backup is stored. Is there any tool or Visor command to do so, along with finding the size of each cache?
Thank you.
Visor CLI shows how many primary and backup partitions each node holds.
By default, a cache is split into 1024 partitions. You can change that by configuring affinity function.
You can also take a look at control.sh to inspect the distribution of specific partitions.
--cache distribution nodeId|null [cacheName1,...,cacheNameN] [--user-attributes attrName1,...,attrNameN]
This command prints the partition distribution across nodes.
Sample:
./control.sh --cache distribution null myCache
[groupId,partition,nodeId,primary,state,updateCounter,partitionSize,nodeAddresses]
[next group: id=1482644790, name=myCache]
1482644790,0,e27ad549,P,OWNING,0,0,[0:0:0:0:0:0:0:1, 10.0.75.1, 127.0.0.1, 172.23.45.97, 172.25.4.211]
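As background for the output above: the mapping of partitions to primary and backup nodes is computed by the cache's affinity function. The idea can be illustrated with a toy Python sketch using rendezvous (highest-random-weight) hashing; this is only an illustration of the concept, not Ignite's actual RendezvousAffinityFunction:

```python
import hashlib

NODES = ["A", "B", "C"]
PARTITIONS = 1024  # Ignite's default partition count
BACKUPS = 1

def assignment(part):
    """Rank nodes for a partition by rendezvous hashing; the first
    node is the primary, the next BACKUPS nodes hold the backups."""
    ranked = sorted(
        NODES,
        key=lambda n: hashlib.md5(f"{part}:{n}".encode()).hexdigest(),
        reverse=True,
    )
    return ranked[0], ranked[1:1 + BACKUPS]

# Count how many partitions each node is primary for.
primaries = {n: 0 for n in NODES}
for p in range(PARTITIONS):
    primary, backups = assignment(p)
    primaries[primary] += 1

print(primaries)  # each node ends up with roughly a third of the partitions
```

Because every node ranks the nodes the same way, any node can compute where a partition lives without asking a coordinator, which is why Visor and control.sh can report the distribution directly.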
I have a cluster with 3 nodes, say cluster1, on AWS EC2 instances. The cluster is up and running, and I took a snapshot of the keyspace's volume.
Now I want to restore a few tables/keyspaces from the snapshot volumes, so I created another cluster, say cluster2, and attached the snapshot volumes to the new cluster's EC2 nodes (same number of nodes). Cluster2 is not starting because the system keyspace in the snapshot has the cluster name cluster1, while the cluster it is being restored on is cluster2. How do I do a restore in this case? I do not want to make any modifications to the existing cluster.
Also when I do restore do I need to think about the token ranges of the old and new cluster's mapping?
Before starting cluster2, it's important to ensure that none of the IP addresses of cluster1 are included in the seed list of cluster2, so that the two clusters are kept unaware of each other. Also, remove the following directories from the data_file_directories path (as defined in cassandra.yaml):
system
system_auth
system_distributed
system_traces
system_schema should not be touched, as it contains the schema definition of the keyspaces and tables.
Start the cluster one node at a time; the first node should include its own IP address at the beginning of the seed list. This is a one-time change, and it should be reverted once the cluster is up and running.
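As a sketch, the relevant cassandra.yaml fragment on that first cluster2 node might look like this (the IP addresses are hypothetical):

```yaml
# First cluster2 node only, and only until the cluster is up:
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # this node's own IP first; no cluster1 IPs anywhere in the list
      - seeds: "10.0.2.11,10.0.2.12"
```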
At this moment you should have a separate cluster, with the information and structure of the original cluster at the time the snapshot was taken. To test this, execute nodetool gossipinfo and only the nodes of cluster2 should be listed; log into cqlsh, where DESCRIBE KEYSPACES should list all your keyspaces, and executing your application's queries should retrieve your data. You will note that Cassandra has already regenerated the system* keyspaces, as well as dealt with the token distribution.
The next step is to update the name of the restored cluster, in each one of the nodes:
Log into cqlsh
Execute UPDATE system.local SET cluster_name = 'cluster2' where key='local';
exit cqlsh
run nodetool flush
run nodetool drain
edit the cassandra.yaml file, update cluster_name with the name 'cluster2'
restart the cassandra service
wait until the node is reported as NORMAL with nodetool status or nodetool netstats
repeat with a different node
At this point you will have 2 independent clusters, with different name.
I have 3 different pool of clients in 3 different geographical locations.
I need to configure RethinkDB with 3 different clusters and replicate data between them (inserts, updates and deletes). I do not want to use sharding, only replication.
I couldn't find in the documentation whether this is possible, or how to configure multi-cluster replication.
Any help is appreciated.
I think a multi-cluster setup is really just a single cluster with nodes in different data centers.
First, you need to setup a cluster, follow this document: http://www.rethinkdb.com/docs/start-a-server/#a-rethinkdb-cluster-using-multiple-machines
Basically, use the command below to join a node to the cluster:
rethinkdb --join IP_OF_FIRST_MACHINE:29015 --bind all
Once you have your cluster set up, the rest is easy. Go to the admin UI, select the table, and under "Sharding and replication" click Reconfigure and enter how many replicas you want; just keep shards at 1.
You can also read more about Sharding and Replication at http://rethinkdb.com/docs/sharding-and-replication/#sharding-and-replication-via-the-web-console
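If you prefer scripting this over the web UI, the same reconfiguration can be expressed in ReQL. A sketch using the Python driver, assuming each location's servers were started with a distinguishing --server-tag (the table name, tag names, and host are made up for illustration, and a live cluster is required):

```python
from rethinkdb import RethinkDB

r = RethinkDB()
conn = r.connect("localhost", 28015)

# One shard, one replica per location; the primary replica (which
# accepts writes) is pinned to the "dc1" tag here.
r.table("mytable").reconfigure(
    shards=1,
    replicas={"dc1": 1, "dc2": 1, "dc3": 1},
    primary_replica_tag="dc1",
).run(conn)
```

Note that replicas in the non-primary locations serve as failover copies; all writes still go through the single primary replica.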
I have a question. I have Couchbase installed in this situation:
2 cluster with:
cluster 1:
192.168.1.91
192.168.1.92
192.168.1.93
and cluster 2:
192.168.1.94
192.168.1.95
192.168.1.96
I want to set up replication, so I have created a bucket (test) with 2 replicas.
I think that means data is replicated within cluster 1 and within cluster 2.
I have set up 2 XDCR replications: one from cluster 1 to cluster 2, and another from cluster 2 to cluster 1.
It seems to be working, but there are some things I don't understand:
1) Data is replicated from cluster 1 to cluster 2, but is there a way to replicate the views as well?
2) I have noticed another thing: in bucket test I have, for example, 1000 records, so roughly 300 per node. If a node goes down, I thought I would still see all 1000 records (this is the reason I need replication and set 2 replicas on the bucket), but instead I see only about 600 records of my bucket test. Why is this?
Thanks a lot to anyone.
1) Views aren't replicated by XDCR. What you should do is create the same views on both clusters; they will then be updated as data is replicated between your clusters.
2) My guess is that when your node crashes you are not actually failing it over. A failover needs to happen in order to activate the replicas on the other nodes.
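The effect in 2) can be illustrated with a small simulation: documents hash to vBuckets, each vBucket has one active node plus replica copies, and replicas only serve requests after a failover promotes them. The node names, vBucket count, and round-robin map below are deliberate simplifications, not Couchbase's real 1024-vBucket map:

```python
import hashlib

NODES = ["node1", "node2", "node3"]
NUM_VBUCKETS = 64   # Couchbase actually uses 1024; smaller for illustration
NUM_REPLICAS = 2

# Round-robin vBucket map: one active node plus replica nodes per vBucket.
VBUCKET_MAP = {
    vb: {
        "active": NODES[vb % len(NODES)],
        "replicas": [n for n in NODES if n != NODES[vb % len(NODES)]][:NUM_REPLICAS],
    }
    for vb in range(NUM_VBUCKETS)
}

def visible_docs(keys, down=frozenset(), failed_over=False):
    """Count documents reachable through an active vBucket copy."""
    seen = 0
    for key in keys:
        vb = int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_VBUCKETS
        entry = VBUCKET_MAP[vb]
        if entry["active"] not in down:
            seen += 1
        elif failed_over and any(r not in down for r in entry["replicas"]):
            # Failover promotes a replica copy to active.
            seen += 1
    return seen

keys = [f"doc{i}" for i in range(1000)]
print(visible_docs(keys))                                    # 1000
print(visible_docs(keys, down={"node1"}))                    # fewer than 1000: replicas exist but are not active
print(visible_docs(keys, down={"node1"}, failed_over=True))  # 1000 again
```

This matches what you observed: the replica copies are on disk, but until you fail the node over they are not active, so the documents whose active vBuckets lived on the dead node appear missing.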
I am new to Apache Hadoop. I have an Apache Hadoop cluster of 3 nodes. I am trying to load a file with 4.5 billion records, but it is not getting distributed to all nodes; the behavior looks like region hotspotting.
I have removed "hbase.hregion.max.filesize" parameter from hbase-site.xml config file.
I observed that with a 4-node cluster it distributes data to 3 nodes, and with a 3-node cluster it distributes to 2 nodes.
I think, I am missing some configuration.
Generally with HBase the main issue is to design rowkeys that are not monotonically increasing.
If they are, only one region server is used at a time:
http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
This is HBase Reference Guide about RowKey Design:
http://hbase.apache.org/book.html#rowkey.design
And one more really good article:
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
In our case, pre-splitting the table into regions also improved the loading time:
create 'Some_table', { NAME => 'fam'}, {SPLITS=> ['a','d','f','j','m','o','r','t','z']}
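If your rowkeys are naturally monotonic (timestamps, sequence numbers), a common complement to pre-splitting is to salt them with a stable hash prefix so that consecutive writes land in different regions. A small sketch of the idea (the bucket count and key format here are made up; in practice you would pre-split the table on the salt prefixes):

```python
import hashlib
from collections import Counter

BUCKETS = 8  # would correspond to pre-split regions on prefixes 0..7

def salted(rowkey: str) -> str:
    """Prefix a monotonic rowkey with a stable hash bucket so that
    sequential keys spread across regions instead of hitting one."""
    bucket = int(hashlib.md5(rowkey.encode()).hexdigest(), 16) % BUCKETS
    return f"{bucket}-{rowkey}"

# Monotonically increasing keys, as produced by e.g. a counter.
keys = [f"{i:010d}" for i in range(100)]
spread = Counter(k.split("-")[0] for k in map(salted, keys))
print(dict(spread))  # writes now spread over multiple buckets
```

The trade-off is that point reads must reapply salted() to find the key, and a range scan over the original key order turns into BUCKETS parallel scans.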
Regards
Pawel