I have a service called "workload". I need to run this service on 3 nodes; my primary node is node1, and node2 & node3 are secondaries.
The service should run only on node1; if anything goes wrong there, it should fail over to one of the secondary nodes so that there is no interruption to our service.
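One way to get this kind of active/passive behavior, sketched here under the assumption that Consul is available (as in the questions below) and that the service can be started by a wrapper command, is Consul's built-in lock: run the same consul lock invocation on all three nodes, and only the node currently holding the lock runs the service. The KV prefix and script path are hypothetical names for this example.

# Run this on node1, node2 and node3; only the current lock holder
# executes the child command, and another node takes over if it dies.
consul lock service/workload/lock /usr/local/bin/run-workload.sh

Note this does not by itself prefer node1 over the others; it only guarantees that exactly one node runs the service at a time.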
I have a cluster of Consul servers in two datacenters; each datacenter consists of 3 servers. When I execute the consul members -wan command, I can see all 6 servers.
I want to separate these into two individual clusters with no connection between them.
I tried to use the force-leave and leave commands as described in the Consul documentation:
https://www.consul.io/commands/force-leave: When I used this command,
the result was a 500 - no node is found. I tried using the node name as server.datacenter, the full FQDN of the server, and the IP of the server; none of them worked for me.
https://www.consul.io/commands/leave: When I used this command from
the node which I want to remove from the cluster, the response was
success, but when I execute consul members -wan I can still see this
node.
I tried another approach: I stopped Consul on the node I want to remove from the cluster, then executed consul force-leave node-name. After that, consul members -wan showed this node as left. But when I started Consul on this node again, it rejoined the cluster.
What steps am I missing here?
I think I solved the problem I had. I followed the instructions here:
https://support.hashicorp.com/hc/en-us/articles/1500011413401-Remove-WAN-federation-between-Consul-clusters
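For reference, a rough sketch of what worked for me, based on that article (the config keys are from Consul's standard server configuration; treat the exact sequence as an assumption and follow the article for the authoritative steps):

# 1. On every server in both datacenters, remove any WAN join settings
#    (retry_join_wan / start_join_wan) from the server config, e.g. in
#    /etc/consul.d/server.hcl, then restart the agent so it stops
#    re-joining the other datacenter:
sudo systemctl restart consul

# 2. Force the remote servers out of the WAN member list; on recent
#    Consul versions force-leave takes -wan (act on the WAN pool) and
#    -prune (drop the node from the list entirely) -- check
#    consul force-leave -h for your version:
consul force-leave -wan -prune server2.dc2

Without the config change first, a restarted node simply rejoins the WAN pool, which is the behavior described above.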
I'm trying to set up a two-node system on separate Linux servers. Based on the "start multiple instances" documentation, they start rethinkdb with the --bind all switch on the primary server; then on the secondary they use the --join switch to point at the primary IP/port.
I would like to use the config file.
On my node1 (primary) I have the following:
bind 192.168.1.177
canonical-address 192.168.1.177
On node2 (secondary) I have the following:
bind 192.168.1.178
canonical-address 192.168.1.178
join 192.168.1.177:29015
On startup, node2 doesn't connect to node1. The only way I can get it to work is to add the join setting on node1 as well and point it at the node2 IP/port. Is that correct? An example of the two configs would be appreciated.
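For what it's worth, a minimal sketch of the two config files, assuming the stock RethinkDB key=value .conf format (e.g. /etc/rethinkdb/instances.d/default.conf; note the = separator, which the snippets above are missing) and the default cluster port 29015:

# node1 (primary) -- no join line needed on the first node
bind=all
canonical-address=192.168.1.177

# node2 (secondary) -- joins the cluster via node1
bind=all
canonical-address=192.168.1.178
join=192.168.1.177:29015

bind=all is the simplest choice while debugging; binding to a single address also works as long as it is the cluster-facing one. With this layout only node2 needs a join entry; a join on node1 pointing back at node2 should not be required.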
I am very new to Consul and have been reading about Consul clustering recently. My understanding is that for each node (equivalent to a physical machine or VM), we run a local Consul agent (in client mode), and any microservices running on that node register themselves through this agent. But what happens if this one and only agent is down? Won't the microservices on that node be unable to register anymore? Or should we expect to run more than one Consul agent (in client mode) per node to handle such a situation?
You are correct. If the Consul agent is down, the services on that host will not be able to register with the agent, and Consul will consider all services which were previously registered against the agent to be unavailable.
A very simple solution is to run Consul under a process manager like systemd, and configure systemd to restart the agent if the process unexpectedly fails. You can find an example systemd unit for this at https://learn.hashicorp.com/tutorials/consul/deployment-guide#configure-systemd. If Consul is installed from the HashiCorp Linux package repo (https://learn.hashicorp.com/tutorials/consul/get-started-install), this systemd unit will be included as part of the installation package.
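For illustration, a minimal sketch of such a unit (the paths and unit name are assumptions; the deployment guide linked above has the full recommended version):

# /etc/systemd/system/consul.service
[Unit]
Description=Consul agent
Requires=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/bin/consul agent -config-dir=/etc/consul.d/
ExecReload=/bin/kill -HUP $MAINPID
# Restart the agent automatically if the process fails unexpectedly
Restart=on-failure
RestartSec=2

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable --now consul so the agent also comes back after a reboot.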
After restarting the 3 masters in my DC/OS cluster, the DC/OS dashboard is showing 0 connected nodes. However, from the DC/OS CLI I can see all 6 of my agent nodes:
$ dcos node
HOSTNAME IP ID
172.16.1.20 172.16.1.20 a7af5134-baa2-45f3-892e-5e578cc00b4d-S7
172.16.1.21 172.16.1.21 a7af5134-baa2-45f3-892e-5e578cc00b4d-S12
172.16.1.22 172.16.1.22 a7af5134-baa2-45f3-892e-5e578cc00b4d-S8
172.16.1.23 172.16.1.23 a7af5134-baa2-45f3-892e-5e578cc00b4d-S6
172.16.1.24 172.16.1.24 a7af5134-baa2-45f3-892e-5e578cc00b4d-S11
172.16.1.25 172.16.1.25 a7af5134-baa2-45f3-892e-5e578cc00b4d-S10
I am still able to schedule tasks in Marathon, both from the dcos CLI and from the Marathon GUI; they are properly scheduled and executed on the agents. Also, from the Mesos interface on :5050 I can see all of the agents on the Slaves page.
I have restarted the agent nodes and the master nodes. I have also rerun the DC/OS GUI installer and run the preflight check, which of course fails with an "already installed" error.
Is there a way to re-register the node with DC/OS GUI short of uninstalling/reinstalling a node?
For anyone who is running into this: my problem was related to our corporate proxy. In order to get the Universe working in my cluster, I had to add proxy settings to /opt/mesosphere/environment. I then restarted dcos-cosmos.service and life was good. However, upon server restart, dcos-history-service.service was now running with the new environment and was unable to resolve my local names through our proxy server. To solve it, I added a NO_PROXY entry to /opt/mesosphere/environment and the DC/OS dashboard is happy again.
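Concretely, the entries looked roughly like this (the proxy host and domain below are placeholders for our corporate values):

# /opt/mesosphere/environment
HTTP_PROXY=http://proxy.corp.example:3128
HTTPS_PROXY=http://proxy.corp.example:3128
# Keep local and cluster-internal names away from the proxy
NO_PROXY=127.0.0.1,localhost,.corp.example,172.16.1.20,172.16.1.21,172.16.1.22,172.16.1.23,172.16.1.24,172.16.1.25

followed by restarting the affected units:

sudo systemctl restart dcos-cosmos.service
sudo systemctl restart dcos-history-service.service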
My JBoss cluster can't fail over. The cluster, which has two nodes, is managed in domain mode, and httpd 2.2 handles the load balancing.
Steps to reproduce:
1. Start up the cluster, which includes two nodes named node1 and node2.
2. Log in; the request is forwarded to node1, meaning node1 executes the login and holds the session.
3. Shut down node1.
4. Continue to execute web operations.
5. The web application then reports that the session has expired.
The log on node2 has an exception like this:
[org.jboss.as.clustering.web.infinispan] JBAS010322: Failed to load session; java.lang.RuntimeException: Failed to load session attributes for session
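One thing worth checking, offered only as a guess since it is a common cause of exactly this symptom: HTTP session replication in JBoss AS 7 requires the web application to be marked distributable. A minimal web.xml sketch:

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
    <!-- Tells the container to replicate HTTP sessions across the cluster -->
    <distributable/>
</web-app>

If the application is not distributable, the session lives only on node1 and is lost when node1 shuts down.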