DRBD - automatic recover after disconnect - cluster-computing

I have a high-availability cluster configured with a DRBD resource:
Master/Slave Set: RVClone01 [RV_data01]
Masters: [ rvpcmk01-cr ]
Slaves: [ rvpcmk02-cr ]
I ran a test that disconnects one of the network adapters linking the DRBD network interfaces (for example, by shutting the adapter down).
The cluster still reports that everything is OK, but running "drbd-overview" on the primary server shows:
[root#rvpcmk01 ~]# drbd-overview
0:drbd0/0 WFConnection Primary/Unknown UpToDate/DUnknown /opt ext4 30G 13G 16G 45%
and on the secondary server:
[root#rvpcmk02 ~]# drbd-overview
0:drbd0/0 StandAlone Secondary/Unknown UpToDate/DUnknown
Now I have a few questions:
1. Why doesn't the cluster know about the problem with DRBD?
2. When I brought the network adapter back up and restored the link between the DRBD nodes, why didn't DRBD recover from the failure and resync on its own?
3. I found an article, "Solve a DRBD split-brain" - https://www.hastexo.com/resources/hints-and-kinks/solve-drbd-split-brain-4-steps/
It explains how to recover from a disconnection and resync DRBD.
But how am I supposed to know that this kind of problem exists in the first place?
I hope I have explained my case clearly and provided enough information about what I have and what I need.

1) You aren't using fencing/STONITH devices in Pacemaker or DRBD, which is why nothing happens when you unplug your network interface that DRBD is using. This isn't a scenario that Pacemaker will react to without defining fencing policies within DRBD, and STONITH devices within Pacemaker.
2) You are likely using only one ring for the Corosync communications (the same link as the DRBD device), which will cause the Secondary to promote to Primary (introducing a split-brain in DRBD) until cluster communications are restored, at which point the nodes realize there are two masters and one is demoted to Secondary. Again, fencing/STONITH would prevent/handle this.
3) You can set up the split-brain notification handler in your DRBD configuration.
Once you have STONITH/fencing devices set up in Pacemaker, you would add the following definitions to your DRBD configuration to "fix" all the issues you mentioned in your question:
resource <resource> {
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    ...
  }
  disk {
    fencing resource-and-stonith;
    ...
  }
  ...
}
Setting up fencing/STONITH in Pacemaker is a little too dependent on your hardware/software for me to give you pointers on setting that up for your cluster. This should get you pointed in the right direction:
http://clusterlabs.org/doc/crm_fencing.html
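On the "how should I know" part of the question: besides the split-brain handler, a periodic check could parse the output of drbd-overview and flag any resource whose connection state is not healthy. The following is only a minimal sketch, assuming the column layout shown in the question (resource, connection state, roles, disk states). The function names and the "healthy" sample line are hypothetical, and the set of disconnected states is taken from the DRBD 8.x connection-state names.

```python
# Connection states in which the two peers are not replicating
# (state names as used by DRBD 8.x).
DISCONNECTED_STATES = {
    "StandAlone", "Disconnecting", "Unconnected", "Timeout",
    "BrokenPipe", "NetworkFailure", "ProtocolError", "TearDown",
    "WFConnection",
}


def find_problems(overview_text):
    """Return one warning string per resource that is not connected."""
    problems = []
    for line in overview_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unexpected lines
        resource, connection = fields[0], fields[1]
        if connection in DISCONNECTED_STATES:
            problems.append(f"{resource}: connection state {connection}")
    return problems


# The first two samples are copied from the question; the third is a
# hypothetical healthy line used for comparison.
primary = "0:drbd0/0 WFConnection Primary/Unknown UpToDate/DUnknown /opt ext4 30G 13G 16G 45%"
secondary = "0:drbd0/0 StandAlone Secondary/Unknown UpToDate/DUnknown"
healthy = "0:drbd0/0 Connected Primary/Secondary UpToDate/UpToDate"

for sample in (primary, secondary, healthy):
    print(find_problems(sample))
```

In practice the text would come from running drbd-overview on each node (for example from a cron job), and a non-empty result would trigger a mail or a monitoring alert.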
Hope that helps!

Related

SLURM controller not being able to connect to workers and state is set as UNKNOWN

I am trying to setup a small cluster, managed with SLURM. The controller is also a compute node. The config in /etc/slurm/slurm.conf is:
NodeName=controller,node[01-02] RealMemory=250000 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
PartitionName=compute Nodes=ALL Default=YES MaxTime=INFINITE State=UP
When running sinfo I get:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
compute* up infinite 2 unk* node[01-02]
compute* up infinite 1 idle controller
However, when running slurmd -C on each node I get:
NodeName=node01 CPUs=64 Boards=1 SocketsPerBoard=1 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=257655
UpTime=0-00:30:44
The same on the other node. I have allowed the ports 6817 and 6818 (the default slurm ports) on all machines (for TCP - which I assume is the protocol). I have also checked that the /etc/slurm/slurm.conf and /etc/slurm/slurmdbd.conf are the same, along with the munge keys (this works).
Is there any way to debug the connection to a given machine?
Thanks in advance for any help.
I was able to go through the log files and found out the connections were being blocked. The cluster is using Fedora and so I added each machine to the firewall trusted list using this link - whitelist source ip addresses in centos 7
These updated firewall settings did not seem to be applied straight away so I had to restart all machines and now SLURM is functioning correctly.
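A quick way to spot this kind of firewall block before digging through logs is to test whether the SLURM ports accept TCP connections from the other machines. This is a rough sketch, assuming the default ports mentioned in the question (6817 for slurmctld, 6818 for slurmd) and the host names from the slurm.conf above; the function names are hypothetical.

```python
import socket

SLURMCTLD_PORT = 6817  # default port of the controller daemon
SLURMD_PORT = 6818     # default port of the compute-node daemon


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


def check_workers(workers, port=SLURMD_PORT):
    """Map each worker host name to whether its slurmd port is reachable."""
    return {host: port_is_reachable(host, port) for host in workers}


# Example use from the controller (host names from the question):
#   print(check_workers(["node01", "node02"]))
# and from a worker, to test the controller daemon:
#   print(port_is_reachable("controller", SLURMCTLD_PORT))
```

A False result for a node whose daemon is running points at the network path or firewall rather than at the SLURM configuration.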

Defining servers from HA cluster to Consul

I have a high-availability (HA) cluster of two database servers. My application connects to one of them (the active one) at a time. The other one remains passive and is always ready to take over connections when the active one fails.
It's a typical Windows cluster mechanism. My challenge is handling these two servers: since both (active and passive) need to be registered in Consul, how can I let my app know which one to connect to?

Service discovery cache update in the case of node failure

I am trying to adopt a service discovery mechanism for my system. I have a bunch of nodes that communicate with each other via gRPC. In some frameworks, such as Mesos, a new node brought up after a failure may have a different IP address and a different port, so I am thinking of using service discovery so that each node can have a cluster config that is agnostic to node failure.
My current options are DNS or a strongly consistent key-value store like etcd or ZooKeeper. My problem is understanding how the cache of name mappings in healthy nodes gets invalidated and updated when a node goes down and comes back up.
The possible ways I can think of are:
1. When healthy nodes detect a connection problem, they invalidate their cache entry immediately and keep polling the DNS registry until the node is reachable again.
2. When a node goes down and comes back up, the DNS registry broadcasts the events to all healthy nodes. It seems this may require heartbeats from the DNS registry.
3. The cache in each node has a TTL field, and within a TTL interval each node has to live with the node failure until the cache entry expires and is fetched from the DNS registry again.
My question is: which option (you can name more) is used in practice, and why is it better than the alternatives?
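To make the options concrete, here is a small sketch that combines the first and third ones: entries expire after a TTL, and a caller that hits a connection error can invalidate an entry early. It is an illustration only, not how any particular DNS client, etcd, or ZooKeeper library behaves; the class name, the resolver callback, and the addresses are all hypothetical.

```python
import time


class TTLCache:
    """Cache of name -> address mappings with TTL expiry and manual invalidation."""

    def __init__(self, resolve, ttl_seconds, clock=time.monotonic):
        self._resolve = resolve    # function asking the registry for an address
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the behaviour can be tested
        self._entries = {}         # name -> (address, expiry time)

    def lookup(self, name):
        """Return the cached address, refreshing it if missing or expired."""
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]
        address = self._resolve(name)
        self._entries[name] = (address, now + self._ttl)
        return address

    def invalidate(self, name):
        """Drop an entry, e.g. after a connection to its address failed."""
        self._entries.pop(name, None)


# Demo with a dict standing in for the registry and a fake clock.
registry = {"node-a": "10.0.0.1:5000"}
fake_now = [0.0]
cache = TTLCache(registry.__getitem__, ttl_seconds=30, clock=lambda: fake_now[0])

first = cache.lookup("node-a")          # fetched from the registry
registry["node-a"] = "10.0.0.9:6000"    # node restarted elsewhere
stale = cache.lookup("node-a")          # still the old address (option 3)
cache.invalidate("node-a")              # connection error seen (option 1)
fresh = cache.lookup("node-a")          # new address from the registry
print(first, stale, fresh)
```

The TTL bounds how long a node can hold a stale address without any failure signal, while the explicit invalidation shortens that window whenever a failure is actually observed.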

ElasticSearch Multicast not working in Linode

I have 2 fresh Ubuntu Linodes in the same data centre with the same ES config except different node names. The cluster name is the same. They can each curl to each other's ElasticSearch server and there's no firewall yet in place, but multicast isn't working and I can't figure out why. They both elect themselves as master and nothing is logged about the other node or the cluster.
Is there any reason why multicast wouldn't work in an environment like this?
As Konstantin says in the comments, multicast is typically not supported in a multitenant environment, which makes sense, but still could have been useful for testing. Some more info here: http://blog.killtheradio.net/how-tos/keepalived-haproxy-and-failover-on-the-cloud-or-any-vps-without-multicast/
"The problem with multicast in reality is that most “cloud” (VPS) providers (AWS, Linode, Slicehost, Rackspace, etc) don’t support it on their networks. You can send a multicast message to a group, but your other machines listening on that group won’t hear it."
While there are workarounds, the simplest thing in this case is to switch to unicast.

Full Clustering in Apache Traffic Server

I followed the steps in the official documentation for full clustering of multiple ATS instances. I installed two instances of ATS on two different Ubuntu machines (with the same specs, OS versions and hardware), and both act as a reverse proxy for a web service hosted on a Tomcat server on a different machine. I wasn't able to set up the cluster. Here are the questions I have.
They are on the same switch or same VLAN: The two Ubuntu machines on which I installed ATS are connected to the same switch. They have the same interface configured in /etc/network/interfaces. Is this enough, or does something else have to be done to get clustering working?
Running the command traffic_line -r proxy.process.cluster.nodes: This returned 1 after I ran the traffic_line -x and traffic_line -L commands. But there are no additions or changes in the cluster.config file.
Moreover, when I send a request to one of these ATS instances (I have mapped the URLs in the remap.config file), each of them caches the responses locally and the cache is not shared between them.
From this information, can anyone tell me if I am doing something wrong? Let me know if any more info is required.
Are these on virtual machines? I almost wasted two days trying to figure out what was wrong when I initially set it up on OpenVZ containers. On a wild guess, I decided to migrate to two physical nodes, and it went well. See Apache Traffic Server Clustering not working
proxy.process.cluster.nodes returning 1 means that the node is running standalone and the second node of the cluster has not been discovered.
Try a tcpdump for multicast and broadcast messages. If the other server's IP does not show up in the discovery packets, the problem is at the network level, where the netops might have disabled multicast packet forwarding across switches.
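Independently of ATS, a small probe can show whether multicast traffic passes between the two machines at all. The sketch below uses only the standard socket module; the group address and port are arbitrary test values chosen for illustration, not the ones ATS uses, and the function names are hypothetical. Run listen_once() on one machine and send_probe() on the other.

```python
import socket
import struct

GROUP = "239.255.0.1"  # arbitrary administratively scoped test group (assumption)
PORT = 50000           # arbitrary test port (assumption)


def membership_request(group, interface_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq value: group address + local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def listen_once(group=GROUP, port=PORT, timeout=10.0):
    """Join the group and wait for one datagram; return (data, sender) or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        return sock.recvfrom(1024)
    except socket.timeout:
        return None  # nothing arrived: multicast is probably being dropped
    finally:
        sock.close()


def send_probe(message=b"multicast-probe", group=GROUP, port=PORT, ttl=1):
    """Send one datagram to the group; ttl=1 keeps it on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(message, (group, port))
    finally:
        sock.close()
```

If the listener times out while ordinary unicast traffic between the hosts works, that supports the suspicion that multicast is being filtered somewhere on the network path or by the virtualization layer.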
