I read the following in the CockroachDB docs:
"we can optimize query performance by making the same node both Raft leader and the Leaseholder"
But how do you set a node to function as both Raft leader and leaseholder (what commands)? Did I miss it somewhere in the manual?
Edit / extra background info:
I have a couple of nodes in one datacenter (low latency). But I would like to start a node in a different datacenter (for safety). I don't want that node to function as a leader...
CockroachDB automatically ensures that the raft leader and leaseholder are colocated. There isn't anything manual to be done.
I am implementing a Raft service using Hashicorp Raft library for distributed consensus.
https://github.com/hashicorp/raft
I have a simple layout with Raft: 2 followers and a leader.
I bootstrap the cluster with the leader and add the 2 follower Raft nodes to the Raft cluster, and things look fine. When I knock one of the followers offline, the leader gets:
failed to contact quorum of nodes, stepping down. The problem is that there are now 0 nodes in the leader state, and no one can be promoted to leader, because a majority of votes from the quorum of nodes is required. And because the previous leader is now a follower, even my service discovery tools can't remove the offline node's old IP address, because doing so requires leader powers.
My cluster enters an infinite loop (deadlock) of forever trying to contact a node that is offline, and no one can get promoted to leader. Any ideas?
Edit: Having thought about it, I guess I need a setup with an odd number of nodes so quorum can still be reached (i.e. with 3 nodes, if 1 gets knocked offline I can still tell the remaining leader to remove the old IP address).
I don't have experience with that library, but:
"The problem with this is now there are 0 nodes in leader state and no one can promote to leader because majority of votes required from quorum of nodes".
With one of three nodes out, you still have a quorum/majority, and Raft would promote one of the followers to leader.
The fact that your system stalled after you removed one of the two followers tells me that one of the followers was not added correctly in the first place.
You had one leader and one follower initially. Since that was working, the follower and the leader can communicate; all good there.
You then added a second follower; how do you know that it was added correctly? Can you try it again, and this time knock out the second follower? The system should keep working, as the leader and the first follower are OK.
So I conclude that the second follower never actually joined the cluster, and when you knocked out the first follower the system stopped, as it should: a majority of correctly configured nodes was no longer available.
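To make the "did the second follower actually join?" check concrete, here is a minimal sketch against hashicorp/raft. The helper name, package name, and timeouts are mine; AddVoter, GetConfiguration, and RemoveServer are methods on *raft.Raft. The idea is to add the voter on the leader and then read back the committed configuration to confirm the server is listed before relying on it for quorum.

```go
// Package raftutil is a hypothetical helper package; only the raft calls
// themselves come from github.com/hashicorp/raft.
package raftutil

import (
	"fmt"
	"time"

	"github.com/hashicorp/raft"
)

// addAndVerifyVoter must be called with the *raft.Raft handle of the current leader.
func addAndVerifyVoter(r *raft.Raft, id raft.ServerID, addr raft.ServerAddress) error {
	// AddVoter returns a future; Error() blocks until the membership
	// change has been committed (or has failed).
	if err := r.AddVoter(id, addr, 0, 10*time.Second).Error(); err != nil {
		return fmt.Errorf("AddVoter failed: %w", err)
	}

	// Read back the committed cluster configuration and check that the
	// new server is actually listed before counting on it for quorum.
	future := r.GetConfiguration()
	if err := future.Error(); err != nil {
		return err
	}
	for _, srv := range future.Configuration().Servers {
		if srv.ID == id {
			fmt.Printf("server %s joined with suffrage %v\n", id, srv.Suffrage)
			return nil
		}
	}
	return fmt.Errorf("server %s is not present in the configuration", id)
}

// Later, to drop a dead member (the "remove old IP address" step), the
// current leader can call:
//   r.RemoveServer(id, 0, 10*time.Second).Error()
```

If the second follower never shows up in the configuration, it never counted toward quorum, which would explain the leader stepping down as soon as the first follower went offline.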
The distributed key-value store etcd uses the Raft algorithm. The docs link to animations explaining how the replica nodes vote to make one node the leader (the recipient of external write requests), and how the leader thereafter broadcasts all instructions to all nodes (attaching those instructions to a heartbeat signal that is sent to the other nodes in a star topology, with a write confirmed once a majority acknowledge).
The replication obviously provides resilience (against failures of individual nodes), and presumably the read performance scales up with replica count.
Is it correct to understand that write performance is constant, and does not scale with replica count?
That is correct: a write requires a majority of nodes to acknowledge the new entry before it can be committed. Writes may even get slower as the number of replicas increases (a write is only as fast as the slowest node needed to form the quorum). Regarding reads, you might find the etcd docs on linearizability interesting. TL;DR: default reads also need quorum.
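If it helps to see the read-side distinction from the client, here is a minimal sketch using etcd's Go client (go.etcd.io/etcd/client/v3); the endpoint and key are placeholders. The default Get is linearizable and goes through the quorum path, while WithSerializable() is answered from the local member without a quorum round trip and may return stale data.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Placeholder endpoint; point this at your cluster.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Default read: linearizable, requires a quorum to answer.
	if _, err := cli.Get(ctx, "foo"); err != nil {
		fmt.Println("linearizable read failed:", err)
	}

	// Serializable read: served from the local member, no quorum round
	// trip, but the value may be stale.
	if _, err := cli.Get(ctx, "foo", clientv3.WithSerializable()); err != nil {
		fmt.Println("serializable read failed:", err)
	}
}
```

So reads can be made cheaper by opting into serializable (possibly stale) reads, but writes always pay the quorum cost.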
I am trying to set up a cluster according to this manual:
https://docs.opendaylight.org/en/stable-magnesium/getting-started-guide/clustering.html
But I would also like to know how it works.
The manual says that I choose which node/controller is the leader and which will take over after the leader goes down, using the member roles (member-1, member-2) in akka.conf.
But in some articles I have read that it uses the RAFT algorithm to choose/elect the leader. Am I mixing something up?
Can someone explain it to me please?
The nodes you specify when setting up an OpenDaylight cluster are all equal, there is no pre-selection of a leader. When the cluster starts up, the controller nodes will participate in a RAFT election to choose the leader.
I am learning some basic concepts of cluster computing and I have a few questions.
According to this article:
If a cluster splits into two (or more) groups of nodes that can no longer communicate with each other (aka partitions), quorum is used to prevent resources from starting on more nodes than desired, which would risk data corruption.
A cluster has quorum when more than half of all known nodes are online in the same partition, or for the mathematically inclined, whenever the following equation is true:
total_nodes < 2 * active_nodes
For example, if a 5-node cluster split into 3- and 2-node partitions, the 3-node partition would have quorum and could continue serving resources. If a 6-node cluster split into two 3-node partitions, neither partition would have quorum; Pacemaker's default behavior in such cases is to stop all resources, in order to prevent data corruption.
Two-node clusters are a special case.
By the above definition, a two-node cluster would only have quorum when both nodes are running. This would make the creation of a two-node cluster pointless.
Questions:
From the above I am a bit confused: why can't we just stop all cluster resources, as in the 6-node cluster case? What is special about a two-node cluster?
You are correct that a two-node cluster can only have quorum when both nodes are in communication. Thus, if the cluster were to split, the resources would stop under the default behavior.
The solution is to not use the default behavior. Simply set Pacemaker to no-quorum-policy=ignore. This will instruct Pacemaker to continue to run resources even when quorum is lost.
...But wait, what happens now if the cluster communication is broken but both nodes are still operational? Will they not each consider their peer dead and both become active nodes? Then I have two primaries, and potentially diverging data, or conflicts on my network, right? This issue is addressed via STONITH. Properly configured STONITH will ensure that only one node is ever active at a given time, essentially preventing split-brain from ever occurring.
An excellent article further explaining STONITH and its importance was written by LMB back in 2010: http://advogato.org/person/lmb/diary/105.html
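For what it's worth, the quorum rule quoted above (total_nodes < 2 * active_nodes) is simple enough to write out as a tiny check; here is a minimal sketch in Go covering the 5-node, 6-node, and two-node cases discussed above:

```go
package main

import "fmt"

// hasQuorum implements the rule from the quoted article: a partition has
// quorum when total_nodes < 2 * active_nodes.
func hasQuorum(totalNodes, activeNodes int) bool {
	return totalNodes < 2*activeNodes
}

func main() {
	fmt.Println(hasQuorum(5, 3)) // true:  the 3-node side of a 5-node split keeps quorum
	fmt.Println(hasQuorum(6, 3)) // false: a 3/3 split of a 6-node cluster, neither side has quorum
	fmt.Println(hasQuorum(2, 1)) // false: a 2-node cluster loses quorum as soon as the nodes are separated
}
```

The two-node case is exactly why no-quorum-policy=ignore plus STONITH is needed: as soon as one node is alone, the inequality can never hold.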
Trying to launch a cluster of nodes one at a time, and I'm a bit confused about the bootstrap-expect value.
The way it is set up is that Consul is launched with bootstrap-expect, and then after it starts, consul join is run.
Currently, the deployment sets bootstrap-expect to the number of nodes in the cluster, and a leader is elected once that many nodes have joined.
However, when bootstrap-expect is set to 1 (the thinking being that we can have a cluster without waiting for all the nodes), something strange happens.
So first, each node thinks it is the leader, which is expected since bootstrap-expect is set to 1. But after doing consul join to each other, a new cluster leader isn't elected; what happens is strange: each node in the cluster still considers itself the cluster leader.
Why don't the nodes, when joining a cluster, elect a new leader? Or at least respect the preexisting leader?
This is a condition called split brain that you've "intentionally" created. Each server thinks it is the leader and has its own version of the log, and these versions are not reconcilable with each other. Split brain is famously hard to recover from. Since the servers cannot agree on what the cluster state should be, they can't decide who the new leader should be, and they continue without a successful election. You can read up on Raft to learn more about why.
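To observe this split brain directly, you can ask each server who it believes the leader is. Here is a minimal sketch using Consul's Go API client (github.com/hashicorp/consul/api); the server addresses are placeholders. In the situation described above, the servers will disagree with each other about the leader.

```go
package main

import (
	"fmt"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Placeholder HTTP API addresses, one per Consul server.
	addrs := []string{"10.0.0.1:8500", "10.0.0.2:8500", "10.0.0.3:8500"}

	for _, addr := range addrs {
		cfg := api.DefaultConfig()
		cfg.Address = addr
		client, err := api.NewClient(cfg)
		if err != nil {
			fmt.Println(addr, "client error:", err)
			continue
		}
		// Status().Leader() reports the Raft leader as seen by this server.
		// With a split brain, different servers report different leaders.
		leader, err := client.Status().Leader()
		fmt.Printf("%s thinks the leader is %q (err: %v)\n", addr, leader, err)
	}
}
```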