Consul bootstrap-expect value

I have a Consul cluster which normally should have 5 servers and a bunch of clients. Our script to start the servers was originally configured like this:
consul agent -server -bootstrap-expect 5 -join <ips of all 5 servers>
However, we had to re-OS all the servers and bootstrap again; one of our servers was down with hardware issues, and the bootstrap no longer works.
My question is: in a situation where there are 5 servers, but 3 are sufficient for quorum, should -bootstrap-expect be set to 3?
The documentation at https://www.consul.io/docs/agent/options.html#_bootstrap_expect seems to imply that -bootstrap-expect should be set to the total number of servers, which means that even a single machine being down will prevent the cluster from bootstrapping.
To be clear, our startup scripts are static files, so when I say there are 5 servers, it means that up to 5 could be started with the server tag.

In your case, if you don't explicitly need all 5 servers to be online during initial cluster setup, you should set -bootstrap-expect to 3. This will avoid situations like the one you ran into, where you have 5 servers and you tell them that all 5 must be online before the initial cluster setup can proceed. As the documentation says:
When provided, Consul waits until the specified number of servers are available and then bootstraps the cluster. This allows an initial leader to be elected automatically.
With -bootstrap-expect=3, as soon as 3 of your 5 Consul servers have joined the cluster, leader election starts; even if the last 2 join much later, the cluster will function. For that matter, you can have any number of servers join at a later time.
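As an illustration, keeping the elided IP list from the question as a placeholder, the only change to the static startup line would be the expected count; switching -join to -retry-join, which keeps retrying unreachable peers instead of exiting, may also help when one box is down:
consul agent -server -bootstrap-expect 3 -retry-join <ips of all 5 servers>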

Related

Wildfly 11 - High Availability - Single deploy on slave

I have two servers in HA mode. I'd like to know whether it is possible to deploy an application on the slave server only. If yes, how do I configure it in JGroups? I need to run a specific program that accesses the master database, but I would not like to run it on the master server, to avoid overhead there.
JGroups itself does not know much about WildFly and the deployments; it only creates a communication channel between nodes. I don't know where you got the notion of master/slave, but JGroups always has a single* node marked as coordinator. You can check the membership through Channel.getView().
However, you still need to deploy the app on both nodes and just make it inactive if this is not its target node.
*) If there's no split-brain partition, or similar rare/temporary issues
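As a sketch of the "deploy on both nodes" approach in a managed domain (the controller host, port, and myapp.war path are hypothetical), the WildFly CLI can push one deployment to every server group:
./jboss-cli.sh --connect --controller=master-host:9990 --command="deploy /path/to/myapp.war --all-server-groups"
The application itself can then inspect Channel.getView() at runtime and stay passive on the node where it should not do the work.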

Rethinkdb failover cluster with just 2 nodes

I would like to set up an IoT reporting database with failover. The idea is to have a cluster with 2 nodes, one in a datacenter and one at home.
If "home" loses its internet connection, it continues to operate, and upon regaining online status, "datacenter" would sync up to the offline changes.
Now, I read in the RethinkDB docs that you need at least 3 nodes for failover to function.
So the question is: is my scenario doable with just 2 nodes, and if yes, how?
According to the documentation at https://www.rethinkdb.com/docs/start-a-server/:
First, start RethinkDB on the first machine:
$ rethinkdb --bind all
Then start RethinkDB on the second machine:
$ rethinkdb --join IP_OF_FIRST_MACHINE:29015 --bind all
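If the 3-node requirement for automatic failover turns out to be unavoidable, note that the same join pattern extends to a third machine (a sketch; the third box could be a small instance anywhere):
$ rethinkdb --join IP_OF_FIRST_MACHINE:29015 --bind all
run on a third machine as well, so that a majority of voting replicas survives the loss of any single node.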

Can we run two 'slave' nodes on the same machines?

We are running a 3-node Mesos cluster with a Mesos master on each node. Also, 2 slaves are running on each node. Is this good practice? Won't 2 slaves on each node send too many offers and end up overloaded? What is the recommended config for a 3-node cluster?
Thread from the Mesos User Mailing List:
It depends on your isolation setting (mainly cgroup, or any node-level resources). In general, we don't recommend folks use multiple agents on a node.
It's possible to make it work by setting cgroup_root separately for the MesosContainerizer. For the DockerContainerizer, currently, we hard-code DOCKER_NAME_PREFIX, making it not possible to use two agents on a node properly.
Running Docker containers won't work properly, because restarting one agent will cause Docker containers managed by the other agent to be deleted.
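For illustration only, a hedged sketch of what running two agents with separate cgroup roots could look like (the ZooKeeper address, ports, and directories are placeholders; each agent needs its own port, work_dir, and cgroups root):
mesos-agent --master=zk://zk1:2181/mesos --port=5051 --work_dir=/var/lib/mesos/agent1 --cgroups_root=mesos_agent1
mesos-agent --master=zk://zk1:2181/mesos --port=5052 --work_dir=/var/lib/mesos/agent2 --cgroups_root=mesos_agent2
Per the quote above, this only helps with the MesosContainerizer; Docker-based tasks will still collide.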

ElasticSearch replication

Can somebody provide some instructions on how to configure Elasticsearch for replication? I am running ES on Windows and understand that if I run the bat file multiple times on the same server, separate instances of ES are started and they all connect to each other.
I will be moving to a production environment soon and will have a three-node setup, each node on a different server. Can someone point me at some documentation which gives me a bit more control over the replication setup?
Have a look at the discovery documentation. It works out of the box with multicast discovery, even though you could have problems with firewalls etc., but I would recommend against it in production. I would rather use unicast and configure the host names of the nodes belonging to the cluster in elasticsearch.yml. That way you make sure nobody is going to join the production cluster from his own machine.
One other thing I would do is configure a proper cluster name, specific to every environment.
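A minimal sketch of that unicast setup, assuming a pre-5.x Elasticsearch (where multicast is still the default) and placeholder host and cluster names, appended to each node's config:
cat >> config/elasticsearch.yml <<'EOF'
cluster.name: my-production-cluster
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1.example.com", "node2.example.com", "node3.example.com"]
EOF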
Replication is set per index in Elasticsearch, not per server or node. That is, each index can have a different replication setting. The number of replicas is 1 by default.
The number of replicas is not related or restricted to the number of nodes. If the number of replicas is greater than the number of data nodes, the index health merely becomes yellow, since some replicas cannot be allocated; everything still works fine.
You can refer to the documentation for further information: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html
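For example, following the linked update-settings API, the replica count of an existing index can be changed at any time (the index name is a placeholder):
curl -XPUT 'http://localhost:9200/my_index/_settings' -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 2}}'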

tomcat 6 - Cluster / BackupManager

I have a question regarding clustering (session replication/failover) in Tomcat 6 using BackupManager. The reason I chose BackupManager is that it replicates the session to only one other server.
I am going to run through the example below to try and explain my question.
I have 6 nodes set up in a Tomcat 6 cluster with BackupManager. The front end is one Apache server using mod_jk with sticky sessions enabled.
Each node has 1 session:
node1 has a session from client1
node2 has a session from client2
..
..
Now let's say node1 goes down; assuming node2 is the backup, node2 now has two sessions (for client2 and client1).
The next time client1 makes a request, what exactly happens?
Does Apache "know" that node1 is down and send the request directly to node2?
=OR=
does it try each of the 6 instances and find out the hard way which one is the backup?
I'm not too sure about the inner workings of BackupManager, but my reading of this good URL suggests the replication is intelligent enough to identify the backup.
In-memory session replication is session data replicated across all Tomcat instances within the cluster. Tomcat offers two solutions: replication across all instances within the cluster, or replication to only its backup server. This solution offers guaranteed session data replication ...
SimpleTcpCluster uses Apache Tribes to maintain communication with the communications group. Group membership is established and maintained by Apache Tribes; it handles server crashes and recovery. Apache Tribes also offers several levels of guaranteed message delivery between group members. This is achieved by updating the in-memory session to reflect any session data changes; the replication is done immediately between members ...
You can reduce the amount of data by using the BackupManager (send only to one node, the backup node).
You'll be able to see this from the logs if notifyListenersOnReplication="true" is set.
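For reference, the manager is selected inside the <Cluster> element of conf/server.xml; a minimal, illustrative line would be <Manager className="org.apache.catalina.ha.session.BackupManager" notifyListenersOnReplication="true"/>.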
On the other hand, you could still use DeltaManager and split your cluster into 3 domains of 2 servers each.
Say these will be node1 <-> node2, node3 <-> node4, and node5 <-> node6.
In such a case, configuring the domain worker attribute will ensure that session replication only happens within the domain.
And mod_jk then definitely knows which server to look at when node1 fails.
http://tomcat.apache.org/tomcat-6.0-doc/cluster-howto.html states
Currently you can use the domain worker attribute (mod_jk > 1.2.8) to build cluster partitions with the potential of having a more scalable cluster solution with the DeltaManager (you'll need to configure the domain interceptor for this).
And there is a better example at this link:
http://people.apache.org/~mturk/docs/article/ftwai.html
See the "Domain Clustering model" section.
