I am new to Couchbase and looking into replication. One thing I am trying to figure out is how Couchbase handles replication conflicts between two caches. That means:
There are two Couchbase servers, S1 and S2, replicated together and located in different geographical locations.
There are also two clients (C1 and C2). At the same time, C1 writes to S1 and C2 writes to S2 using the same key but different objects (C1 caches an object called Obj1, C2 caches Obj2).
My question is: what is the final value for that key in the cluster? (That is, what ends up in S1 and S2 for that key?)
Writes in Couchbase
Ignoring replication for a moment, here is how writes work in a single Couchbase cluster.
A key in Couchbase is hashed to a vBucket (shard). That vBucket only ever lives on one node in the cluster, so there is only ever one writable copy of the data. When two clients write to the same key, the client that wrote last will "win". The Couchbase SDKs do expose a number of operations to help with this, such as "add()" and "cas()".
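For illustration, here is a minimal sketch using the classic 1.x-style Couchbase Java client (CouchbaseClient); exact class and method names differ in newer SDK generations, and the cluster address, bucket, key and values below are just example assumptions.

```java
import com.couchbase.client.CouchbaseClient;
import net.spy.memcached.CASResponse;
import net.spy.memcached.CASValue;

import java.net.URI;
import java.util.Arrays;

public class ConcurrentWriteSketch {
    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")),
                "default", "");

        // add() only succeeds if the key does not exist yet, so two clients
        // racing to create the same key cannot both win.
        boolean created = client.add("user::42", 0, "Obj1").get();

        // cas() only succeeds if the value has not changed since we read it,
        // so a concurrent update by another client makes it fail instead of
        // being silently overwritten.
        CASValue<Object> current = client.gets("user::42");
        CASResponse outcome = client.cas("user::42", current.getCas(), "Obj1-v2");

        System.out.println("created=" + created + ", cas outcome=" + outcome);
        client.shutdown();
    }
}
```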
Internal replication
Couchbase does have replica copies of the data. These copies are not writable by the end user and only become active when a node fails. The replication used is a one-way sync from the active vBucket to the replica vBucket; it is done memory to memory and is extremely fast. As a result, for intra-cluster replication you do not have to worry about conflict resolution. Do understand that if there is a failover before data has been replicated, that data is lost. Again, the SDKs expose a number of operations to ensure a write has been replicated to N nodes; see the observe commands.
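As a sketch (again in the older 1.x-style Java client, so treat the exact signature as indicative rather than authoritative), a write that must survive a failover can ask to be persisted and replicated before it is considered successful:

```java
import com.couchbase.client.CouchbaseClient;
import net.spy.memcached.PersistTo;
import net.spy.memcached.ReplicateTo;

import java.net.URI;
import java.util.Arrays;

public class DurableWriteSketch {
    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")),
                "default", "");

        // Block until the write is persisted on the active node and has been
        // replicated to at least one replica vBucket. If that cannot be
        // confirmed, the result is false and the application can retry.
        boolean durable = client.set("user::42", 0, "Obj1",
                PersistTo.MASTER, ReplicateTo.ONE).get();

        System.out.println("write observed as durable: " + durable);
        client.shutdown();
    }
}
```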
External replication
External replication in Couchbase is called XDCR (cross datacenter replication), where data is synced between two different clusters. It is best practice not to write the same key in both clusters at the same time. Instead, have a key space per cluster and use XDCR for disaster recovery. The Couchbase manual explains the conflict resolution very well, but basically the key in the cluster that has been updated the most times will win.
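One simple way to follow that advice is to bake the owning cluster into the key, so each site only ever writes its own key space and treats the XDCR copies from the other site as read-only. This is only an illustrative convention, not a Couchbase API, and the names here (CLUSTER_ID, ownKey) are hypothetical:

```java
// Hypothetical naming convention: each cluster writes only keys carrying its
// own prefix, so the same key is never written on both sides of an XDCR link
// at the same time.
public final class KeySpaces {
    // e.g. "dc-east" on one cluster and "dc-west" on the other
    private static final String CLUSTER_ID = System.getProperty("cluster.id", "dc-east");

    public static String ownKey(String type, String id) {
        return CLUSTER_ID + "::" + type + "::" + id;   // e.g. "dc-east::user::42"
    }

    public static boolean isOwnedLocally(String key) {
        return key.startsWith(CLUSTER_ID + "::");
    }
}
```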
If you would like to read more about clustered systems, the CAP theorem would be the place to start. Couchbase is a CP system.
Related
How is data consistency handled in a distributed cache using Oracle Coherence, where each cluster node is responsible only for a piece of the data?
I am also confused about the following:
Are the cluster nodes on different servers, each with its own local cache?
For instance, say I have node A with cache "a" and node B with cache "b"; is the database on a separate server D?
When there is an update, is the update first made on D and then written back to caches a and b, or how does data consistency work?
An explanation in layman's terms would be helpful, as I am new to Oracle Coherence.
Thank you!
Coherence uses two different distribution mechanisms: full replication and data partitioning; each distributed cache is configured to use one of these. Most caches in most large systems use the partitioned model because it scales very well, adding storage with each server and maintaining very high performance even up to hundreds of servers.
The Coherence software architecture is service based; when Coherence starts, it first creates a local service for managing clustering, and that service communicates over the network to locate and then join (or create, if it is the first server running) the cluster.
If you have any partitioned caches, then those are managed by partitioned cache service(s). A partitioned cache service coordinates across the cluster to manage the entirety of the partitioned cache. It does this dynamically, starting by dividing the responsibilities of data management evenly across all of the storage-enabled nodes. The data in the cache(s) is partitioned, which means "sliced up", so that some values will go to server 1, some values to server 2, etc. The data ownership model prevents any confusion about who owns what, so even if a message gets delayed on the network and ends up at the wrong server, no damage is done, and the system self-corrects. If a server dies, whatever data (slices) it was managing is backed up by one or more other servers, and the servers work together to ensure that new backups are made for any data that does not have the desired number of backups. It is a dynamic system.
There are several different APIs provided to an application, starting with an API as simple as using a hash map (in fact it is the Java Map API).
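A minimal sketch of that Map-style API is below; the cache name "example-cache" is an arbitrary assumption and must correspond to a scheme in your cache configuration.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CoherenceMapSketch {
    public static void main(String[] args) {
        // Joins (or starts) the cluster and obtains a cache that is partitioned
        // or replicated depending on how "example-cache" is configured.
        NamedCache cache = CacheFactory.getCache("example-cache");

        // NamedCache extends java.util.Map, so basic usage looks like a map;
        // entries are transparently stored on whichever cluster members own
        // the corresponding partitions.
        cache.put("key-1", "value-1");
        Object value = cache.get("key-1");
        System.out.println(value);

        CacheFactory.shutdown();
    }
}
```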
What are the guidelines for creating an Oracle NoSQL Database Storage Node (SN)? Can we create multiple Storage Nodes on the same machine? If so, what are the trade-offs? I looked at the product documentation, but it's not clear.
Digging deeper, here's what I found:
It is recommended that Storage Nodes (SNs) be allocated one per node in the cluster for availability and performance reasons. If you believe that a given node has the I/O and CPU resources to host multiple Replication Nodes, the Storage Node's capacity parameter can be set to a value greater than one, and the system will know that multiple RNs may be hosted at this SN. This way, the system can:
ensure that each Replication Node in a shard is hosted on a different Storage Node, reducing a shard's vulnerability to failure
dynamically divide up memory and other hardware resources among the Replication Nodes
ensure that the master Replication Nodes, which are the ones which service write operations in a store, are distributed evenly among the Storage Nodes, both at startup and after any failovers.
If more than one SN is hosted on the same node, multiple SNs are lost if that node fails, and data may become inaccessible.
You can set the capacity parameter for a Storage Node in several ways:
With the makebootconfig command
With the change-policy command
With the plan change-params command
Also, in very limited situations, such as for early prototyping and experimentation, it can be useful to create multiple SNs on the same node.
On a single machine, a Storage Node is uniquely identified by its root directory (KVROOT) plus a configuration file name, which defaults to "config.xml". This means you can create multiple SNs by creating a unique KVROOT directory for each SN. Usually these would be on different nodes, but it's also possible to have them on a single node.
Partition Tolerance - The system continues to operate as a whole even if individual servers fail or can't be reached.
A better definition, from this link:
Even if the connections between nodes are down, the other two (A & C) promises are kept.
Now consider that we have a master-slave model in both an RDBMS (Oracle) and MongoDB. I am not able to understand why an RDBMS is said to not be partition tolerant while MongoDB is.
Consider that I have 1 master and 2 slaves. If the master goes down in MongoDB, a re-election is held to promote one of the slaves to master so that the system continues to operate.
Doesn't the same happen in an RDBMS like Oracle/MySQL?
See this article about CAP theorem and MySQL.
Replication in MySQL Cluster is synchronous, meaning a transaction is not committed before replication happens. In this case your data should be consistent; however, the cluster may not be available for some clients in some cases after a partition occurs. It depends on the number of nodes and the arbitration process. So MySQL Cluster can be made partition tolerant.
Partition handling in one cluster:
If there are not enough live nodes to serve all of the data stored - shutdown
Serving a subset of user data (and risking data consistency) is not an option
If there are not enough failed or unreachable nodes to serve all of the data stored - continue and provide service
No other subset of nodes can be isolated from us and serving clients
If there are enough failed or unreachable nodes to serve all of the data stored - arbitrate.
There could be another subset of nodes regrouped into a viable cluster out there.
Replication between 2 clusters is asynchronous.
Edit: MySQL can also be configured as a cluster; in that case it is CP. Otherwise it is CA, and partition tolerance can be broken by having 2 masters.
I have a hazelcast cluster that populates a distributed IMap with data from a separate, remote (REST) service. I want to keep a local copy of the IMap data for HA/DR purposes so I implemented a file based MapStore.
It didn't work out like I expected. I noticed that each node stores what is probably only the items in the partition of that node, which isn't necessarily a problem, but I also noticed that after a restart of all the nodes, the IMap only contains the items from the disk of the first node that starts up.
I couldn't find a good explanation in the docs about how the MapStore is used throughout the lifecycle of the nodes in a cluster. Can someone explain?
The MapStore, as you already figured out, is called on the nodes where the partition resides. Since the partition table is randomized on startup, there is a very low chance of ending up with the same partition distribution as before the restart.
One way to work around this and still use your implementation is to introduce a distributed filesystem like Ceph or similar to store the data files on. That way, no matter how the partition table looks after the restart, every node can read its partitions.
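For illustration, here is a minimal file-per-key MapStore sketch. It only behaves as intended if the base directory points at storage every member can see (for example a Ceph or NFS mount); the path and class names are assumptions, not anything Hazelcast prescribes, and the MapStore interface lives in com.hazelcast.core in older versions and com.hazelcast.map in newer ones.

```java
import com.hazelcast.core.MapStore;   // com.hazelcast.map.MapStore in recent versions

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// File-per-key MapStore: the directory should live on shared storage so that
// whichever member owns a partition after a restart can still load every key
// it is responsible for.
public class SharedDirMapStore implements MapStore<String, String> {

    // Assumption for this sketch: all members mount the same path.
    private final Path baseDir = Paths.get("/mnt/shared/imap-backup");

    public SharedDirMapStore() {
        try {
            Files.createDirectories(baseDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void store(String key, String value) {
        try {
            Files.write(baseDir.resolve(key), value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void storeAll(Map<String, String> map) {
        map.forEach(this::store);
    }

    @Override
    public void delete(String key) {
        try {
            Files.deleteIfExists(baseDir.resolve(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void deleteAll(Collection<String> keys) {
        keys.forEach(this::delete);
    }

    @Override
    public String load(String key) {
        Path file = baseDir.resolve(key);
        try {
            return Files.exists(file)
                    ? new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
                    : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Map<String, String> loadAll(Collection<String> keys) {
        Map<String, String> result = new HashMap<>();
        for (String key : keys) {
            String value = load(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }

    @Override
    public Iterable<String> loadAllKeys() {
        // Hazelcast uses this to discover which keys exist; each key is then
        // loaded on the member that owns its partition.
        try (Stream<Path> files = Files.list(baseDir)) {
            return files.map(p -> p.getFileName().toString()).collect(Collectors.toSet());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```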
I'm currently evaluating HBase as a datastore, but one question was left unanswered: HBase stores many copies of the same object on many nodes (a.k.a. replication). Since HBase features so-called strong consistency (in contrast to eventual consistency), it guarantees that every replica returns the same value when read.
As I understand the HBase concept, when reading values, the HBase master is first queried for a RegionServer (there must be more than one) providing the data. Then I can issue read and write requests without involvement of the master. How can replication work then?
How does HBase provide consistency?
How do write operations internally work?
Do write operations block until all replicas are written (i.e. synchronous replication)? If yes, who manages this transfer?
How does HDFS come into the game?
I have already read the BigTable paper and searched the docs, but I found no further information on the architecture of HBase.
Thanks!
HBase does not do any replication in the way that you are thinking. It is built on top of HDFS, which provides replication for the data blocks that make up the HBase tables. However, only one RegionServer ever serves or writes data for any given row.
Usually RegionServers are colocated with DataNodes. All data writes in HDFS go to the local node first, if possible, then to another node on the same rack, and then to another node on a different rack (given a replication factor of 3 in HDFS). So a RegionServer will eventually end up with all of its data served from the local server.
As for blocking: the only block is until the WAL (write-ahead log) is flushed to disk. This guarantees that no data is lost, as the log can always be replayed. Note that older versions of HBase did not have this worked out because HDFS did not support a durable append operation until recently. We are in a strange state for the moment, as there is no official Apache release of Hadoop that supports both append and HBase. In the meantime, you can either apply the append patch yourself or use the Cloudera distribution (recommended).
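As a rough illustration using the more modern HBase client API (newer than the Hadoop-append era this answer describes, so treat it as indicative, and the table and column names are just examples), the per-mutation durability setting controls how the write waits on the WAL:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WalDurabilitySketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {

            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("alice"));

            // SYNC_WAL: the put does not complete until the WAL edit has been
            // synced, so it can be replayed if the RegionServer dies.
            // (SKIP_WAL trades that guarantee away for speed.)
            put.setDurability(Durability.SYNC_WAL);

            table.put(put);
        }
    }
}
```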
HBase does have a related replication feature that will allow you to replicate data from one cluster to another.