etcd - is the leader node automatically and dynamically elected? newbie question

I have been actively looking into etcd for several hours now, and I have a newbie question:
Is the leader node elected dynamically and automatically, or must the "human" operator of the cluster take action?
If it is elected automatically, what conditions must be met for the election of a new leader node to begin?
If not, what should the operator do?

Related

Hashicorp Raft consensus deadlock state

I am implementing a Raft service using Hashicorp Raft library for distributed consensus.
https://github.com/hashicorp/raft
I have a simple layout with Raft: 2 followers and a leader.
I bootstrap the cluster with the leader and add 2 follower Raft nodes to the Raft cluster, things look fine. When I knock one of the followers offline, the leader gets:
failed to contact quorum of nodes, stepping down. The problem with this is that there are now 0 nodes in the leader state, and no one can be promoted to leader because a majority of votes from a quorum of nodes is required. Because the previous leader is now a follower, even my service discovery tools can't remove the old IP address, since doing so requires leader power.
My cluster enters an infinite loop (deadlock) of trying to connect to a node that is offline forever, and no one can get promoted to leader. Any ideas?
Edit: I now realize that I probably need an odd number of nodes to reach quorum (i.e. with 3 nodes, if 1 gets knocked offline, I can tell the new leader to remove the old IP address).
I don't have experience with that library, but:
"The problem with this is now there are 0 nodes in leader state and no one can promote to leader because majority of votes required from quorum of nodes".
With one of three nodes out, you still have quorum/majority. Raft would promote one of the followers to leader if the leader were the node that went down.
The fact that your system stalled after you removed one of the two followers tells me that one of the followers was not added correctly in the first place.
You had one leader and one follower initially. Since that was working, it means that the follower and the leader can communicate; all good here.
Then you added the second follower; how do you know that it was done correctly? Can you try to do it again and knock out this second follower? The system should keep working, as the leader and the first follower are OK.
So I can conclude that the second follower did not join the cluster, and when you knocked out the first follower, the system stopped, as it should: a majority of correctly configured nodes was no longer available.
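The majority arithmetic behind this answer can be sketched in a few lines. This is only an illustration of the strict-majority rule that Raft uses, not code from the Hashicorp library; the function names `quorum_size` and `has_quorum` are made up for the example.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of nodes that forms a strict majority of `voters`."""
    return voters // 2 + 1


def has_quorum(voters: int, reachable: int) -> bool:
    """True if `reachable` voters (counting the node itself) form a majority."""
    return reachable >= quorum_size(voters)


# Three voters, one follower offline: 2 of 3 is still a majority.
print(has_quorum(voters=3, reachable=2))  # True

# Only two voters actually in the cluster (the second follower never
# joined), one of them offline: 1 of 2 is not a majority.
print(has_quorum(voters=2, reachable=1))  # False
```

Note that a two-node cluster needs both nodes to make progress, which is why it tolerates no failures while a three-node cluster tolerates one.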

Cluster in opendaylight

I am trying to set up a cluster according to this manual:
https://docs.opendaylight.org/en/stable-magnesium/getting-started-guide/clustering.html
But I would also like to know how it works.
It is written there that I choose which node/controller is the leader and which will take over after the leader is down, using member role - 1 - 2 in akka.config.
But elsewhere I have read that it uses the RAFT algorithm to choose/elect the leader. Am I mixing something up?
Can someone explain it to me, please?
The nodes you specify when setting up an OpenDaylight cluster are all equal; there is no pre-selection of a leader. When the cluster starts up, the controller nodes participate in a RAFT election to choose the leader.

set node as raft leader and leaseholder

I read on cockroachdb docs the following:
"we can optimize query performance by making the same node both Raft leader and the Leaseholder"
But how can you set a node to function both as raft leader and leaseholder (what commands)? Did I miss it in some manual?
Edit / extra background info:
I have a couple of nodes in one datacenter (low latency). But I would like to start a node in a different datacenter (for safety). I don't want that node to function as a leader...
CockroachDB automatically ensures that the raft leader and leaseholder are colocated. There isn't anything manual to be done.

Bootstrap expect=1 in consul results in weird behavior in cluster

Trying to launch a cluster of nodes one at a time, and I'm a bit confused about the bootstrap-expect value.
The way it is set up is that consul is launched with bootstrap-expect, and then, after it starts, consul join is run.
Currently, the deployment sets bootstrap-expect to the number of nodes in the cluster, and a leader is elected once that number of nodes is reached.
However, when bootstrap-expect is set to 1 (the thought process being that we can have a cluster without waiting for all the nodes), something strange happens.
First, each node thinks it is the leader, which is expected since bootstrap-expect is set to 1. But after the nodes run consul join against each other, a new cluster leader isn't elected. What happens is strange: each node in the cluster still thinks of itself as the cluster leader.
Why don't the nodes, when joining a cluster, elect a new leader? Or at least respect the preexisting leader?
This is a condition called Split Brain that you've "intentionally" created. Each server thinks it's the leader and has its own version of the log, and these versions are not reconcilable with each other. Split Brain is famously hard to recover from. Since the servers cannot agree on what the cluster state should be, they can't decide who the new leader should be, and they continue without a successful election. You can read up on Raft to learn more about why.
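A toy model may help show why bootstrap-expect=1 leads to every server electing itself. It is a deliberate simplification and not Consul's actual code: it assumes a server bootstraps as soon as it knows of at least bootstrap-expect servers, and that before consul join each server knows only about itself. The function names and node names are hypothetical.

```python
def can_bootstrap(bootstrap_expect: int, known_servers: int) -> bool:
    """Simplified rule: bootstrap once enough servers are known."""
    return known_servers >= bootstrap_expect


def self_elected(servers, bootstrap_expect):
    """Servers that bootstrap (and elect themselves) before any join.

    Assumption: before `consul join`, each server knows only itself,
    so known_servers is 1 for every server.
    """
    return [name for name in servers
            if can_bootstrap(bootstrap_expect, known_servers=1)]


servers = ["node-a", "node-b", "node-c"]  # hypothetical node names

# bootstrap-expect=1: every server forms its own one-node cluster.
print(self_elected(servers, bootstrap_expect=1))

# bootstrap-expect=3: nobody bootstraps alone; they wait for the join.
print(self_elected(servers, bootstrap_expect=3))
```

In this model the first call returns all three servers and the second returns an empty list, which matches the behavior described in the question: with the value set to the cluster size, no leader exists until the servers have found each other.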

What cluster node should be active?

There is a cluster and a unix network daemon. The daemon is started on each cluster node, but only one instance can be active.
When the active daemon breaks (whether the program breaks or the node breaks), another node should become active.
I could think of a few possible algorithms, but I think there is already research on this and some ready-to-go algorithms? Am I right? Can you point me to the answer?
Thanks.
JGroups is a Java network stack which includes DistributedLockManager-style support and cluster voting capabilities. These allow any number of unix daemons to agree on who should be active. All of the nodes could be trying to obtain a lock (for example), and only one will succeed until the application or the node fails.
JGroups also has the concept of the coordinator of a specific communication channel. Only one node can be coordinator at a time, and when a node fails, another node becomes coordinator. It is simple to test whether you are the coordinator, in which case you would be active.
See: http://www.jgroups.org/javadoc/org/jgroups/blocks/DistributedLockManager.html
If you are going to implement this yourself there is a bunch of stuff to keep in mind:
Each node needs to have a consistent view of the cluster.
All nodes will need to inform all of the rest of the nodes that they are online -- maybe with multicast.
Nodes that go offline (because of app or node failure) will need to be removed from all other nodes' "view".
You can then have the node with the lowest IP or something be the active node.
If this isn't appropriate then you will need to have some sort of voting exchange so the nodes can agree who is active. Something like: http://en.wikipedia.org/wiki/Two-phase_commit_protocol
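The "lowest IP" rule above can be sketched as follows, assuming every node already holds the same view of the cluster (the hard part, which this sketch does not solve). The function name `pick_active` and the addresses are made up for the example.

```python
import ipaddress


def pick_active(view):
    """Return the member of `view` with the numerically lowest IP address.

    Addresses are parsed with ipaddress.ip_address so that "10.0.0.3"
    sorts before "10.0.0.12"; a plain string comparison would not.
    """
    return min(view, key=ipaddress.ip_address)


# Hypothetical cluster view shared by all nodes.
view = {"10.0.0.12", "10.0.0.3", "10.0.0.7"}
print(pick_active(view))  # 10.0.0.3

# The active node fails and is removed from everyone's view.
view.discard("10.0.0.3")
print(pick_active(view))  # 10.0.0.7
```

Because the rule is deterministic, nodes that share the same view reach the same answer without exchanging votes; if views can diverge, two nodes may both consider themselves active, which is where a voting exchange becomes necessary.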
