RethinkDB Cross-Cluster Replication

I have 3 different pools of clients in 3 different geographical locations.
I need to configure RethinkDB with 3 different clusters and replicate data between them (inserts, updates, and deletes). I do not want to use sharding, only replication.
I couldn't find in the documentation whether this is possible, or how to configure multi-cluster replication.
Any help is appreciated.

I think a multi-datacenter setup is just the same as a single cluster with nodes in different data centers.
First, you need to set up a cluster; follow this document: http://www.rethinkdb.com/docs/start-a-server/#a-rethinkdb-cluster-using-multiple-machines
Basically, use the command below to join a node into the cluster:
rethinkdb --join IP_OF_FIRST_MACHINE:29015 --bind all
Once you have your cluster set up, the rest is easy. Go to the admin UI, select the table, and under "Sharding and replication" click Reconfigure and enter how many replicas you want; just keep shards at 1.
You can also read more about sharding and replication at http://rethinkdb.com/docs/sharding-and-replication/#sharding-and-replication-via-the-web-console
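If you prefer a client over the web UI, the same reconfiguration can be done in ReQL. A minimal sketch in JavaScript, assuming an open connection conn and an existing table named dados (both names are illustrative):

// Reconfigure an existing table to 1 shard and 3 replicas.
r.table('dados').reconfigure({shards: 1, replicas: 3}).run(conn, function(err, result) {
    if (err) throw err;
    console.log(result.config_changes); // old vs. new table configuration
});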

Related

Multi data center for ClickHouse

Does ClickHouse support multi-master or multi-data-center setups?
Are there any other solutions for multi-data-center replication with ClickHouse?
CH is multi-master only.
CH is multi-/geo-DC out of the box. There are many users with cross-ocean DCs.
The only requirement is proper latency for the Replicated* engines.
All ZK nodes should be in the same DC, or in DCs with latency < 50ms. CH loading nodes (the ones that ingest data) should be as close as possible to ZK (ideally < 100ms). Non-loading replicas can be far away: 150-250ms.
A cross-ocean setup needs proper load-balancing configuration, so queries run on local-DC replicas, and tuning of some parameters (connect_timeout_with_failover_ms is 50ms by default).
Yes, ClickHouse can be set up as multi-DC.
Please read about the Distributed engine:
https://clickhouse.yandex/docs/en/table_engines/distributed/
Also look at the load_balancing settings:
https://clickhouse.yandex/docs/en/operations/settings/settings/#settings-load_balancing
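To make that concrete, here is a minimal sketch of the usual pattern: a Replicated* table per node plus a Distributed table on top, queried with a locality-aware load_balancing setting. The cluster name my_geo_cluster (defined in remote_servers in the server config) and all table/column names are assumptions for illustration:

-- Local replicated storage on each node; {shard} and {replica} are
-- macros defined per server. Table and column names are illustrative.
CREATE TABLE default.events_local
(
    event_date Date,
    event_id UInt64,
    payload String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
ORDER BY (event_date, event_id);

-- Cluster-wide entry point that fans reads/writes out to the local tables.
CREATE TABLE default.events AS default.events_local
ENGINE = Distributed(my_geo_cluster, default, events_local, rand());

-- Prefer the replica "closest" to this server by hostname, i.e. the local DC.
SET load_balancing = 'nearest_hostname';

-- For cross-ocean latencies the failover connect timeout (50ms by default)
-- usually needs raising; 300ms here is an assumed value.
SET connect_timeout_with_failover_ms = 300;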

Clustered NiFi: only one node is working

I'm using NiFi in clustered mode with two nodes, and I have noticed that only one node does all the work.
Any idea why that is? And how can I make nifi2 do some of the processing of the dataflow?
It depends on how data is coming into your cluster. It is up to you as the dataflow designer to create an approach that allows the data to be partitioned across your cluster for processing.
See this post for an overview of strategies to do this:
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html

Multi-datacenter Replication with RethinkDB

I have two servers in two different geographic locations (alfa1 and alfa2).
r.tableCreate('dados', {shards:1, replicas:{alfa1:1, alfa2:1}, primaryReplicaTag:'alfa1'})
I need to be able to write to both servers, but when I shut down alfa1 and write to alfa2, RethinkDB only allows reads: Table test.dados is available for outdated reads, but not up-to-date reads or writes.
I need a way to write to all replicas, not only to the primary.
Is this possible? Does RethinkDB allow multi-datacenter replication?
I think multi-datacenter replication needs to permit writes in both datacenters.
I tried to remove "primaryReplicaTag", but the system doesn't accept that!
Any help is welcome!
RethinkDB does support multi-datacenter replication/sharding.
I think the problem here is that you've set up a cluster of two, which means that when one server fails, only 50% of the replicas remain, short of the majority needed for automatic failover.
From the failover docs - https://rethinkdb.com/docs/failover/
To perform automatic failover for a table, the following requirements must be met:
The cluster must have three or more servers
The table must be configured to have three or more replicas
A majority (greater than half) of replicas for the table must be available
Try adding just one additional server and your problem should be resolved.
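For example, with a third server tagged alfa3, the table can be created with three replicas, so a majority survives the loss of any single server. A minimal sketch; the alfa3 tag is an assumption mirroring your naming:

r.tableCreate('dados', {
    shards: 1,
    replicas: {alfa1: 1, alfa2: 1, alfa3: 1},
    primaryReplicaTag: 'alfa1'
})

If alfa1 then goes down, alfa2 and alfa3 still form a majority, so RethinkDB can elect a new primary automatically and writes keep working.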

How to migrate a single-datacenter cluster to a multi-datacenter cluster in Cassandra?

Please provide a recommended configuration to migrate data from a single-datacenter Cassandra cluster to a multi-datacenter Cassandra cluster. Currently I have a single-datacenter cluster environment with the following configuration:
i) Number of nodes: 3
ii) Replication factor: 2
iii) Strategy: SimpleStrategy
iv) endpoint_snitch: SimpleSnitch
Now I am planning to add 2 more nodes in a different location, so I thought of moving to a multi-datacenter cluster with the following configuration:
i) Number of nodes: 5
ii) RF: dc1=2, dc2=2
iii) Strategy: NetworkTopologyStrategy
iv) endpoint_snitch: PropertyFileSnitch (I have the cassandra-topology.properties file)
What is the procedure to migrate the data without losing any of it?
Please let me know the recommended steps to follow or any guide I can refer to. Let me know if further info is required.
1. Complete repairs on all nodes.
2. Take a snapshot on all nodes to have a fallback point.
3. Decommission each node that is not a pure Cassandra workload. Repair the ring each time you decommission a node.
4. Update keyspaces with NetworkTopologyStrategy and a replication factor matching the original RF:
ALTER KEYSPACE keyspace_name
WITH REPLICATION =
{ 'class' : 'NetworkTopologyStrategy', 'datacenter_name' : 2 };
5. Change the snitch on each node, with a restart.
6. Add the nodes in the different datacenter. Make sure that when you add them you have auto_bootstrap: false in cassandra.yaml.
7. Run nodetool rebuild original_dc_name on each new node.
A command-level sketch of this sequence is below.
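At the command level, the sequence looks roughly like this. A sketch only; my_keyspace, dc1, and dc2 are placeholder names:

# On every node, before the migration (repair, then snapshot as a fallback):
nodetool repair -pr
nodetool snapshot

# Once, in cqlsh, after the snitch change: replicate to both datacenters.
ALTER KEYSPACE my_keyspace
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2 };

# On each new node in dc2 (joined with auto_bootstrap: false in cassandra.yaml),
# stream the existing data over from the original datacenter:
nodetool rebuild dc1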
I just found this excellent tutorial on migrating Cassandra:
Cassandra Migration To EC2 by highscalability.com
Although the details can be found in the original article, an outline of the main steps is:
1. Cassandra Multi DC Support
Configure the PropertyFileSnitch
Update the replication strategy
Update the client connection setting
2. Setup Cassandra On EC2
Start the nodes
Stop the EC2 nodes and cleanup
Start the nodes
Place data replicas in the cluster on EC2
3. Decommission The Old DC And Cleanup
Decommission the seed node(s)
Update your client settings
Decommission the old data center

Prevent unknown nodes from joining the cluster

I am new to Cassandra. When I tried to set up a Cassandra cluster, I noticed that any node can join the cluster if it has the IP address of the seed nodes, and then your data can be seen by anyone. How can I prevent this? (I thought about requiring a password before joining, but I don't know how to do that.)
Use Kerberos! Check the DataStax documentation: http://www.datastax.com/docs/datastax_enterprise3.0/security/security_setup_kerberos
