Couchbase XDCR replication with views

I have a question.
I have Couchbase installed as two clusters.
Cluster 1:
192.168.1.91
192.168.1.92
192.168.1.93
Cluster 2:
192.168.1.94
192.168.1.95
192.168.1.96
I want to set up replication, so I created a bucket (test) with 2 replicas. As I understand it, the data is therefore replicated within cluster 1 and within cluster 2.
I have also set up two XDCR replications: one from cluster 1 to cluster 2 and another from cluster 2 to cluster 1.
This seems to be working, but there are some things I don't understand.
1) Data is replicated from cluster 1 to cluster 2, but is there a way to replicate the views as well?
2) I have noticed another thing. The test bucket holds, for example, 1000 records, so roughly 300 per node.
If a node goes down, I expected to still see all 1000 records (that is why I need replication and set 2 replicas for the bucket), but instead I see only 600 records in the test bucket. Why is that?
Thanks a lot to anyone who can help.

1) Views aren't replicated. What you should do is create the same views on both clusters; they will be updated as data is replicated between your clusters.
2) My guess is that when your node crashes you are not actually failing it over. A failover is needed in order to activate the replicas on the other nodes.
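To illustrate point 2, here is a minimal Python sketch of a toy model; it is not Couchbase code. It assumes a simplified partition map (round-robin placement, CRC32 modulo 1024), and the names build_map and visible_keys are hypothetical. The real cluster manager places partitions differently, so the exact counts will not match a live cluster. The sketch only shows why a share of the records becomes unreadable while a node is down but not failed over, and why everything is readable again once replicas are promoted.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this toy model


def build_map(nodes, replicas):
    """Toy partition map: each partition gets one active node plus `replicas` replica nodes."""
    n = len(nodes)
    return {
        vb: [nodes[(vb + i) % n] for i in range(replicas + 1)]
        for vb in range(NUM_VBUCKETS)
    }


def visible_keys(keys, vmap, down, failed_over):
    """Count the keys a normal read can reach while the nodes in `down` are unavailable."""
    count = 0
    for key in keys:
        chain = vmap[zlib.crc32(key.encode()) % NUM_VBUCKETS]
        if failed_over:
            # After failover, a replica on a surviving node is promoted to active.
            readable = any(node not in down for node in chain)
        else:
            # Without failover, only the original active copy serves reads.
            readable = chain[0] not in down
        if readable:
            count += 1
    return count


keys = [f"record-{i}" for i in range(1000)]
vmap = build_map(["192.168.1.91", "192.168.1.92", "192.168.1.93"], replicas=2)
down = {"192.168.1.93"}
print("node down, no failover:", visible_keys(keys, vmap, down, failed_over=False))
print("node down, failed over:", visible_keys(keys, vmap, down, failed_over=True))
```

In this model the replica copies exist the whole time; they simply do not serve reads until the failed node is failed over.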

Related

What is the behavior of Redis slave when using it as a cache?

I am quite new to Redis, and I am trying to figure out the behavior of Redis slaves in caching. Two of my Redis slaves have a 0% hit rate: one of them has 100+ keyspace_misses while the other has 900+ keyspace_misses. I have the masters and slaves configured like this:
Master  Slave
1       5
2       6
3       7
4       8
Of the remaining slaves, one has 0 keyspace_misses and the last has 0 keyspace_misses and 2 keyspace_hits. Is it normal for Redis slaves to do lookups? Or is this caused by a problem in the master? Are there logs that would show this problem?
Here is how this works:
A SET command is executed on the master.
The data is sent to the slave for replication.
When a GET request arrives, it lands on whichever node the client sends it to (master or slave), where the key is looked up and the value is returned if found.
Regarding what you describe:
"Two of my Redis slaves has a 0% hit rate":
you might be missing slaveof ip_to_contact_master port_to_contact_master in your redis.conf file.
"one of them has 100+ keyspace_misses while the other has 900+ keyspace_misses": keyspace misses are normal, as the requested key may not exist in Redis, may have expired, or may not have been replicated yet.
You can read about scaling reads in redis here
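As a rough illustration of how the hit rate follows from those counters, here is a small Python sketch. It assumes you have the text output of the INFO command for each node; the helper names parse_info and hit_rate are made up for this example, and the sample text is invented, not taken from the question.

```python
def parse_info(text):
    """Parse the 'key:value' lines of Redis INFO output into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        # Section headers start with '#'; everything else is 'key:value'.
        if line and not line.startswith("#") and ":" in line:
            key, value = line.split(":", 1)
            stats[key] = value
    return stats


def hit_rate(stats):
    """Return hits / (hits + misses), or None if the node has served no lookups."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return None if total == 0 else hits / total


# Invented sample resembling one of the slaves described in the question.
sample = "# Stats\r\nkeyspace_hits:0\r\nkeyspace_misses:100\r\n"
print(hit_rate(parse_info(sample)))
```

A node with misses but no hits is receiving read requests for keys it does not hold, so a 0% hit rate by itself only says that lookups reached that slave.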

RethinkDB cross-cluster replication

I have 3 different pools of clients in 3 different geographical locations.
I need to configure RethinkDB with 3 different clusters and replicate data between them (inserts, updates and deletes). I do not want to use sharding, only replication.
I couldn't find in the documentation whether this is possible, or how to configure multi-cluster replication.
Any help is appreciated.
I think that a multi-cluster setup is just the same as a single cluster with nodes in different data centers.
First, you need to set up a cluster; follow this document: http://www.rethinkdb.com/docs/start-a-server/#a-rethinkdb-cluster-using-multiple-machines
Basically, use the command below to join a node to the cluster:
rethinkdb --join IP_OF_FIRST_MACHINE:29015 --bind all
Once you have your cluster set up, the rest is easy. Go to the admin UI, select the table, and under "Sharding and replication" click Reconfigure and enter how many replicas you want; just keep the number of shards at 1.
You can also read more about sharding and replication at http://rethinkdb.com/docs/sharding-and-replication/#sharding-and-replication-via-the-web-console

How to migrate a single-datacenter cluster to a multiple-datacenter cluster in Cassandra?

Please recommend a configuration for migrating data from a single-datacenter Cassandra cluster to a multiple-datacenter Cassandra cluster. Currently I have a single-datacenter cluster with the following configuration:
i) No. of nodes: 3
ii) Replication factor: 2
iii) Strategy: SimpleStrategy
iv) endpoint_snitch: SimpleSnitch
Now I am planning to add 2 more nodes, which are in a different location, so I thought of moving to a multiple-datacenter cluster with the following configuration:
i) No. of nodes: 5
ii) RF: dc1=2, dc2=2
iii) Strategy: NetworkTopologyStrategy
iv) endpoint_snitch: PropertyFileSnitch (I have the cassandra-topology.properties file)
What is the procedure to migrate without losing any data?
Please let me know the recommended steps to follow or any guide I can refer to. Please let me know if further information is required.
1) Complete repairs on all nodes.
2) Take a snapshot on all nodes to have a fallback point.
3) Decommission each node that is not a pure Cassandra workload. Repair the ring each time you decommission a node.
4) Update the keyspaces to use NetworkTopologyStrategy, with a replication factor that matches the original RF:
ALTER KEYSPACE keyspace_name
WITH REPLICATION =
{ 'class' : 'NetworkTopologyStrategy', 'datacenter_name' : 2 };
5) Change the snitch on each node and restart it.
6) Add the nodes in the new datacenter. Make sure they have auto_bootstrap: false in cassandra.yaml when you add them.
7) Run nodetool rebuild original_dc_name on each new node.
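The keyspace change above can be sketched as a small Python helper that only builds the CQL text; it does not talk to a cluster. The function name alter_keyspace_cql and the keyspace name my_keyspace are hypothetical, and dc1/dc2 are the datacenter names from the question. The second statement rests on an assumption that the steps above do not spell out: the new datacenter presumably has to appear in the keyspace's replication map before nodetool rebuild has anything to stream to it, so check the documentation for your version.

```python
def alter_keyspace_cql(keyspace, dc_replication):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    dc_replication maps datacenter name -> replication factor.
    """
    options = ", ".join(
        f"'{dc}' : {rf}" for dc, rf in sorted(dc_replication.items())
    )
    return (
        f"ALTER KEYSPACE {keyspace} WITH REPLICATION = "
        f"{{ 'class' : 'NetworkTopologyStrategy', {options} }};"
    )


# Switch strategy first, keeping the original RF in the existing datacenter.
print(alter_keyspace_cql("my_keyspace", {"dc1": 2}))
# Assumed follow-up once the dc2 nodes have joined, before running rebuild.
print(alter_keyspace_cql("my_keyspace", {"dc1": 2, "dc2": 2}))
```

The datacenter names in the statement have to match the names the snitch reports, otherwise replicas are assigned to a datacenter that has no nodes.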
I just found this excellent tutorial on migrating Cassandra:
Cassandra Migration To EC2 by highscalability.com
The details can be found in the original article, but an outline of the main steps is:
1. Cassandra Multi DC Support
   Configure the PropertyFileSnitch
   Update the replication strategy
   Update the client connection setting
2. Setup Cassandra On EC2
   Start the nodes
   Stop the EC2 nodes and cleanup
   Start the nodes
   Place data replicas in the cluster on EC2
3. Decommission The Old DC And Cleanup
   Decommission the seed node(s)
   Update your client settings
   Decommission the old data center

Restoring a Cassandra cluster from a snapshot failed

Hope someone can help. We are having issues restoring all nodes of a Cassandra 2.0 cluster from a snapshot. I have reviewed the "Restoring from a snapshot" instructions.
The specific steps we performed were:
All data was flushed from the memtables.
All nodes were compacted down to 1 SSTable.
Snapshots were taken on all nodes and saved off elsewhere.
A new cluster was stood up, installed from scratch as an identical cluster (minus the data).
The keyspace and column families were created.
All nodes were stopped.
Commit logs were cleared on all nodes, and we verified that no SSTable files existed.
The snapshot SSTables were copied to each corresponding node under the base table folder.
All nodes were restarted.
nodetool repair was run on all nodes.
The result of these steps, which appear to match the documentation, is:
For a 2-node cluster, nodetool cfstats on each node seems to report approximately the number of keys each node should have, and nodetool status shows the correct division of data by host.
Logging into cqlsh and running a select count(*) on one of the column families, with a limit high enough to return all rows, does not report the correct (original) number of rows. It appears to report only the results from one node.
Is there a step missing from the documentation? Why doesn't select count(*) show all the rows?
Thanks,
dfgriffith

How to balance load of HBase while loading file?

I am new to Apache Hadoop. I have an Apache Hadoop cluster of 3 nodes. I am trying to load a file containing 4.5 billion records, but it is not being distributed to all nodes. The behavior looks like region hotspotting.
I have removed the "hbase.hregion.max.filesize" parameter from the hbase-site.xml config file.
I observed that with a 4-node cluster the data is distributed to 3 nodes, and with a 3-node cluster it is distributed to 2 nodes.
I think I am missing some configuration.
Generally, the main issue with HBase is designing row keys that are not monotonically increasing.
If they are, only one region server is used at a time:
http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
This is HBase Reference Guide about RowKey Design:
http://hbase.apache.org/book.html#rowkey.design
And one more really good article:
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
In our case, predefining the region splits also improved the loading time:
create 'Some_table', { NAME => 'fam'}, {SPLITS=> ['a','d','f','j','m','o','r','t','z']}
Regards
Pawel
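To make the hotspotting effect concrete, here is a minimal Python sketch that maps row keys onto the regions defined by the split points in the create statement above. It is a model, not HBase code: region_for and salted are hypothetical helpers, plain string comparison stands in for HBase's byte-wise ordering, and the salting scheme (a letter prefix derived from CRC32) is one illustrative way to break up monotonic keys, not something the answer prescribes.

```python
import bisect
import zlib
from collections import Counter

# Split points copied from the create statement above.
SPLITS = ['a', 'd', 'f', 'j', 'm', 'o', 'r', 't', 'z']


def region_for(row_key, splits=SPLITS):
    """Index of the region holding row_key; region i covers [splits[i-1], splits[i])."""
    return bisect.bisect_right(splits, row_key)


def salted(row_key, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Prefix the key with a letter derived from its hash so writes spread out."""
    prefix = alphabet[zlib.crc32(row_key.encode()) % len(alphabet)]
    return prefix + "_" + row_key


# Monotonically increasing keys, e.g. zero-padded sequence numbers.
sequential = [f"{i:010d}" for i in range(10000)]

print("sequential keys:", Counter(region_for(k) for k in sequential))
print("salted keys:    ", Counter(region_for(salted(k)) for k in sequential))
```

All sequential keys sort before 'a' and therefore land in the first region, while the salted keys are spread over several regions. The trade-off is that salted keys no longer support a single ordered scan over the original key range.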

Resources