How does CockroachDB sync/replicate data between nodes?

Because it’s a distributed database, how does CockroachDB ensure that each node has access to the same data as all of the others?

CockroachDB uses Raft (a consensus algorithm) to synchronize data across all nodes that host a range of data. For more detail, Cockroach Labs has a blog post about how they've made that work: https://www.cockroachlabs.com/blog/consensus-made-thrive/
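To make the consensus idea concrete, here is a toy sketch of the majority-quorum rule at the heart of Raft. This is purely illustrative: the class and method names are made up, and CockroachDB's real implementation is far more involved. The point it shows is that a write to a range only commits once a majority of the replicas hosting that range have accepted it.

    // Toy sketch of majority-quorum acknowledgement (illustrative only).
    import java.util.List;

    class RangeReplicaGroup {
        interface Replica {
            boolean appendToLog(byte[] key, byte[] value); // true = acknowledged
        }

        private final List<Replica> replicas; // the nodes hosting this range

        RangeReplicaGroup(List<Replica> replicas) {
            this.replicas = replicas;
        }

        // A write commits only once a majority of replicas accept it, so any
        // two majorities overlap and committed data is never lost by a
        // minority failure.
        boolean write(byte[] key, byte[] value) {
            int acks = 0;
            for (Replica r : replicas) {
                if (r.appendToLog(key, value)) {
                    acks++;
                }
            }
            return acks > replicas.size() / 2;
        }
    }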

Related

Writing Data In Neo4J Causal Clustering Implementation

In a Neo4j causal cluster implemented using 3 AWS EC2 instances, should data be written only to the Leader node, or can it be written to any of the Follower nodes?
Data should always be written to a Core Server, and these writes must be acknowledged by a majority of Core Servers.
Reads can be served by any of the Read Replicas.
Using bookmarks, you are able to "read your own writes", which makes sure you are shown consistent data.
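As a concrete sketch with the Neo4j Java driver (assuming driver 4.x, a routing neo4j:// URI, and made-up host and credentials): the bookmark returned by a write session can seed a later read session, so a Read Replica will not answer the read until it has caught up to that write.

    import org.neo4j.driver.*;

    public class ReadYourOwnWrites {
        public static void main(String[] args) {
            // neo4j:// routes writes to the Leader and reads to replicas
            Driver driver = GraphDatabase.driver("neo4j://cluster-host:7687",
                    AuthTokens.basic("neo4j", "password"));

            Bookmark bookmark;
            try (Session session = driver.session()) {
                session.writeTransaction(tx ->
                        tx.run("CREATE (:Person {name: $name})",
                               Values.parameters("name", "Alice")));
                bookmark = session.lastBookmark(); // marks the write just made
            }

            // A session seeded with the bookmark delays the read until the
            // replica has applied at least that write.
            try (Session session = driver.session(
                    SessionConfig.builder().withBookmarks(bookmark).build())) {
                session.readTransaction(tx ->
                        tx.run("MATCH (p:Person {name: 'Alice'}) RETURN p").list());
            }

            driver.close();
        }
    }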

Doubts on RDD Spark

I want to understand the following things about Spark's RDD concept.
1. Is an RDD just a concept of copying the required data into some node's RAM from HDFS storage, to speed up execution?
2. If a file is split across the cluster, does the RDD for that single file bring all of the required data over from the other nodes?
3. If the second point is correct, how does it decide which node's JVM to execute on? How does data locality work here?
The RDD is at the core of Apache Spark; it is a data abstraction for a distributed collection of objects. RDDs are immutable, distributed collections of elements of your data that can be stored in memory or on disk across a cluster of machines. The data is partitioned across the machines in your cluster and can be operated on in parallel with a low-level API that offers transformations and actions. RDDs are fault tolerant because they track data lineage information, which lets them rebuild lost data automatically on failure. Ref: https://databricks.com/blog/2016/06/22/apache-spark-key-terms-explained.html
If a file is split across the cluster upon loading, the calculations are done on the nodes where the RDD partitions reside. That is, the compute is performed where the data resides (as far as possible) to minimize the need for shuffles. For more information concerning Spark and data locality, please refer to https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/performance_optimization/data_locality.html.
Note: for more information about Spark research, please refer to http://spark.apache.org/research.html; more specifically, please refer to Zaharia et al.'s paper: Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf).
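As a small illustration of points 2 and 3 (a sketch using Spark's Java API; the HDFS path is made up): loading a file from HDFS yields roughly one partition per block, and Spark schedules the tasks on the nodes holding those blocks rather than pulling all the data to one node.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class RddLocalityExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("rdd-locality");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Each HDFS block of the file becomes (roughly) one RDD partition;
            // tasks are scheduled on the nodes that hold those blocks.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/input.txt");

            long matches = lines
                    .filter(line -> line.contains("ERROR")) // runs where the data lives
                    .count();                               // action triggers execution

            System.out.println("matching lines: " + matches);
            sc.close();
        }
    }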

Elasticsearch architecture

Is there a way to sync multiple ES clusters with each other? The ES docs discourage having a cluster span multiple data centers. To avoid that, I'd have a distinct ES cluster in each datacenter. I also need the same data indexed in each cluster.
One way to achieve that would be to send each document to each cluster. But issuing 'n' write requests seems unnecessary. Additionally, if some write requests fail, the clusters could potentially go out of sync.
Is there a way for a cluster to "subscribe" to changes in another cluster? Or send the writes to a master cluster (whichever one is the closest to the data source) and let it eventually replicate to the other ones?
Edit: I've read about tribe nodes. The docs say they work only for reads and have some limitations. Is that something that would let me do this?
You can set up a custom shard allocation awareness strategy keyed on a datacenter id [1]. This will ensure that one replica of each shard goes into each data center. Example:
cluster.routing.allocation.awareness.force.dc.values: dc1,dc2
cluster.routing.allocation.awareness.attributes: dc
[1] https://www.elastic.co/guide/en/elasticsearch/reference/1.6/modules-cluster.html
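For the awareness attribute to mean anything, each node must also be tagged with its datacenter in its own elasticsearch.yml (a sketch using the dc1/dc2 values above; note that in 1.x the attribute is set as node.dc, whereas Elasticsearch 5.x and later use node.attr.dc):
node.dc: dc1   # on every node in datacenter 1
node.dc: dc2   # on every node in datacenter 2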

Google Cloud Bigtable Durability/Availability Guarantees

I would like someone from Google to provide some guidelines on the durability and availability guarantees provided by the Cloud Bigtable service.
Here is my understanding so far:
The fact that a cluster requires a minimum of 3 nodes suggests that, at least within a zone, the data is highly durable and replicated across 3 nodes.
However, this answer by a Googler states that "Cloud Bigtable doesn’t replicate data" — directly contradicting the quote on the Cloud Bigtable homepage which claims it "is built with a replicated storage strategy". So which is it? Is it replicated or not? And if so, how many copies are kept?
The fact that clusters can only be set up within a particular zone suggests that the availability of a cluster is tied directly to the availability of that zone. So if I want to have a highly available Bigtable-based data storage, would it be best practice to set up independent clusters across multiple zones and handle the synchronisation of writes across the clusters myself?
There is no information on whether Bigtable clusters across zones are independent or not. If I were to set up clusters across multiple zones, and one zone goes down, could we expect the clusters in other zones to carry on working? Or is there some underlying single point of failure which could impact clusters even across zones?
Compared to the App Engine datastore which is very specific about these details, the Cloud Bigtable documentation is rather lacking — or, at least, I've not managed to find a page which goes into detail on these aspects.
The Cloud Bigtable docs are similarly vague on other aspects, e.g. on the matter of size limits for values, the documentation states that individual values should stay below "~10 MB per cell". What on earth does "~10 MB" mean?! Can I hardcode a limit of exactly 10 MB and expect it to always work, or will that change from day to day depending on unknown factors?
Anyway, apologies if I sound agitated. I genuinely would like to use the Bigtable service. But I, like presumably many others, need to understand the durability/availability aspects of it before being able to invest in it. Thank you.
On replication:
The answer you referenced is referring to replication of data across Bigtable Clusters, which is not supported at this time. (For example, a Bigtable Cluster in the United States replicating its writes to a second Cluster in Europe)
This concept is separate from replication of data within a Bigtable cluster, which is analogous to replication in HDFS and is something the product absolutely does today.
On availability:
Yes, the availability of a Bigtable Cluster is tied to the availability of a Google Cloud Zone.
On Independence:
Yes, Cloud Bigtable clusters are independent across zones. An outage in one zone should not impact the availability of other zones.
On data per cell:
We do not reject writes >10 MB per cell; we set this as a guideline for getting optimal performance.

HBase: How does replication work?

I'm currently evaluating HBase as a datastore, but one question was left unanswered: HBase stores many copies of the same object on many nodes (aka replication). As HBase features so-called strong consistency (in contrast to eventual consistency), it guarantees that every replica returns the same value when read.
As I understood the HBase concept, when reading values, the HBase master is first queried for a RegionServer (there must be more than one) providing the data. Then I can issue read and write requests without intervention of the master. How can replication work then?
How does HBase provide consistency?
How do write operations internally work?
Do write operations block until all replicas are written (i.e., synchronous replication)? If yes, who manages this transfer?
How does HDFS come into the game?
I have already read the BigTable paper and searched the docs, but I found no further information on the architecture of HBase.
Thanks!
HBase does not do any replication in the way that you are thinking. It is built on top of HDFS, which provides replication for the data blocks that make up the HBase tables. However, only one RegionServer ever serves or writes data for any given row.
Usually RegionServers are colocated with data nodes. All data writes in HDFS go to the local node first if possible, then to another node on the same rack, and then to a node on a different rack (given a replication factor of 3 in HDFS). So a RegionServer will eventually end up with all of its data served from the local server.
As for blocking: the only block is until the WAL (write-ahead log) is flushed to disk. This guarantees that no data is lost, as the log can always be replayed. Note that older versions of HBase did not have this worked out, because HDFS did not support a durable append operation until recently. We are in a strange state for the moment, as there is no official Apache release of Hadoop that supports both append and HBase. In the meantime, you can either apply the append patch yourself or use the Cloudera distribution (recommended).
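To make the WAL behaviour concrete, here is a minimal sketch with the HBase Java client (assuming a current client API; the table, family, and qualifier names are made up). The durability of a Put can be tuned per operation, and SYNC_WAL means the call does not return until the log entry has been flushed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalDurabilityExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("mytable"))) {

                Put put = new Put(Bytes.toBytes("row-1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                              Bytes.toBytes("value"));

                // Block until the WAL entry is synced before returning, so the
                // write can be replayed if the RegionServer dies.
                put.setDurability(Durability.SYNC_WAL);

                table.put(put);
            }
        }
    }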
HBase does have a related replication feature that will allow you to replicate data from one cluster to another.
