Defining servers from an HA cluster to Consul

I have a cluster of two database servers (HA/High Availability). My application connects to one of them (the active one) at a time; the other remains passive, always ready to take over when the active one fails.
It's a typical Windows cluster mechanism. Now I have the challenge of handling these two servers: how can I let my app know which one to connect to, since both (active and passive) need to be registered in Consul?
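A common pattern here (a sketch, not a definitive answer; the service name, addresses, and port below are hypothetical) is to register both nodes under the same service name, each with a TCP health check that only passes on the node actually accepting connections. Consul's DNS interface and health-aware API then return only the passing, i.e. active, instance. In Python against Consul's HTTP API:

import requests

# Hypothetical Consul agent address; in practice each node usually
# registers with its own local agent.
CONSUL = "http://127.0.0.1:8500"

# Register both HA nodes under one logical service name ("mssql" is
# an assumption, as are the addresses and port 1433).
for node_id, address in [("db-node1", "192.0.2.10"), ("db-node2", "192.0.2.11")]:
    requests.put(f"{CONSUL}/v1/agent/service/register", json={
        "Name": "mssql",
        "ID": node_id,
        "Address": address,
        "Port": 1433,
        # TCP check: passes only where the port is open, i.e. on the
        # active node of the Windows cluster.
        "Check": {"TCP": f"{address}:1433", "Interval": "10s"},
    })

The application then resolves mssql.service.consul (or queries /v1/health/service/mssql?passing) and only ever sees the currently active node.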

Related

Connecting one server from single machine multiple times vs connecting one server from multiple machines

I am setting up a load test for a SaaS platform.
I want to benchmark 20 clients connecting to the SaaS platform and pushing some data.
Each client can send a maximum of 2 MB and an average of 200 bytes of data to the SaaS endpoint.
Which setup is better for testing: 20 clients on a single machine, or 20 clients spread across 5 different machines?
I want to know from a TCP-stack point of view.
When we run 20 clients on a single machine, they will create connections to the same destination address and destination port, but from 20 different source ports.
However, will the stack use the same TCP connection in the background to push the data of all 20 clients?
From "TCP stack" point of view one "client" == one "connection". If the server doesn't have any background logic to check source IP address in order to restrict requests rate - you can go for a single machine.
See Connection establishment for more information.
In general you need to mimic real life usage of the SaaS platform by end users (or upstream/downstream systems) as close as possible so carefully choose a load testing tool which can produce the same network footprint in terms of creating connection, re-using it and keeping it alive
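To illustrate the source-port point (a minimal sketch; the host and port are placeholders), each client socket on one machine is assigned its own ephemeral source port, so 20 clients are 20 independent TCP connections, not one shared connection:

import socket

# Placeholder endpoint; substitute your SaaS host and port.
HOST, PORT = "example.com", 443

connections = []
for i in range(20):
    sock = socket.create_connection((HOST, PORT))
    connections.append(sock)
    # getsockname() shows the local (source) address; every socket
    # is assigned a different ephemeral source port.
    print(i, sock.getsockname())

for sock in connections:
    sock.close()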

NiFi cluster configuration

I am currently running one cluster with two nodes on one VM; the nodes are listening on different ports, one on port 80 and one on port 81. My firewall is configured to allow port 80 traffic through. With that being said, if I disconnect the node on port 80, the UI sends me this message: "This node is currently not connected to the cluster. Any modifications to the data flow made here will not replicate across the cluster." The process in the background connects to the other node and keeps running normally, but the canvas (UI) bugs out and I get a "disconnected" message in the top left of the screen, where it would usually show how many nodes are running. If I disconnect the node on port 81, however, everything runs smoothly. I'm not sure whether both nodes need to be on the same port, or whether they need to be on the same port but on different VMs. Can anyone help?
Apache NiFi 1.x clustering follows a zero-master design. Each cluster node runs an active NiFi process, and each runs the web and API server on its own port (80 and 81 here). Because you are running the two processes on the same physical machine, they require different ports.
As you communicate with the NiFi process on port 80 - changing the flow, starting/stopping processors, etc - it will coordinate these changes with the NiFi process on port 81. If you connected to the UI on port 81, you would see your changes reflected, and you would also be able to make updates that are coordinated across the cluster.
If you remove a node from the cluster, this coordination no longer involves that node.
Typically, you would expose the web UI/API port of each of the cluster nodes, so that if one node fails or is disconnected, you can continue to administer the cluster through any other active, healthy node.
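For illustration, the port split from the question would appear in each node's nifi.properties roughly like this (a sketch: the cluster protocol port values are assumptions, and the other clustering properties each node needs are omitted):

# Node 1 (conf/nifi.properties)
nifi.cluster.is.node=true
nifi.web.http.port=80
nifi.cluster.node.protocol.port=11443

# Node 2 (conf/nifi.properties)
nifi.cluster.is.node=true
nifi.web.http.port=81
nifi.cluster.node.protocol.port=11444

The two web ports only need to differ because both processes share one machine; on separate VMs each node could use the same port, which also makes firewalling simpler.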

Service discovery cache update in the case of node failure

I am trying to adopt a service discovery mechanism for my system. I have a bunch of nodes that communicate with each other via gRPC. Because in some frameworks, like Mesos, a node that is brought back up after a failure may have a different IP address and a different port, I am thinking of using service discovery so that each node can have a cluster config that is agnostic to node failure.
My current options are to use DNS or a strongly consistent key-value store like etcd or ZooKeeper. My problem is understanding how the cache of name mappings in healthy nodes gets invalidated and updated when a node goes down and comes back up.
The possible ways I can think of are:
When healthy nodes detect a connection problem, they invalidate their cache entry immediately and keep polling the DNS registry until the node is reachable again.
When a node goes down and comes back up, the DNS registry broadcasts the events to all healthy nodes. This seems to require heartbeats from the DNS registry.
The cache in each node has a TTL field, and within a TTL interval each node has to live with the node failure until the cache entry expires and is pulled from the DNS registry again.
My question is: which option (you can name more) is used in reality, and why is it better than the other alternatives?
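For concreteness, here is a minimal sketch (Python) of how options 1 and 3 are often combined in a client-side cache; the resolve callable stands in for whatever registry lookup is used (DNS, etcd, or ZooKeeper):

import time

class TTLCache:
    """Name -> address cache with TTL expiry (option 3) and
    explicit invalidation on connection errors (option 1)."""

    def __init__(self, resolve, ttl_seconds=30.0):
        self.resolve = resolve      # callable: name -> address (registry lookup)
        self.ttl = ttl_seconds
        self.entries = {}           # name -> (address, expires_at)

    def lookup(self, name):
        entry = self.entries.get(name)
        if entry is not None and entry[1] > time.monotonic():
            # Fresh entry: within the TTL we live with possible staleness.
            return entry[0]
        # Expired or missing: pull from the registry again.
        address = self.resolve(name)
        self.entries[name] = (address, time.monotonic() + self.ttl)
        return address

    def invalidate(self, name):
        # Called when a connection attempt fails, so the next lookup
        # goes straight back to the registry.
        self.entries.pop(name, None)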

Can Marathon assign the same randomly selected host_port across instances?

For my containerized application, I want Marathon to allocate the same host_port for the container's bridge network endpoint across all instances of that application. Specifying the host port explicitly runs the risk of resource exhaustion; not specifying it causes a random port to be picked for each instance.
I don't mind a randomly picked port as long as it is identical across all instances of my application. Is there a way to ask Marathon to pick such a host port for my container endpoint?
I think what you are really after is service discovery / load balancing. Have a look at the Marathon docs at
https://mesosphere.github.io/marathon/docs/service-discovery-load-balancing
to get an overview.
Also, see the Docker networking docs at
https://mesosphere.github.io/marathon/docs/native-docker.html
You can probably make use of either the hostPort or the more general ports property.
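For illustration, an app definition might look roughly like this (a sketch with a hypothetical id and image): hostPort 0 lets Mesos pick a random host port per instance, while servicePort is a single stable, cluster-wide port that a load balancer such as marathon-lb exposes in front of all instances, which gives the "same port everywhere" behavior being asked for:

{
  "id": "/my-app",
  "instances": 3,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "my-image:latest",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8080, "hostPort": 0, "servicePort": 10000, "protocol": "tcp" }
      ]
    }
  }
}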

Openfire Cluster Hazelcast Plugin Issues

Windows Server 2003R2/2008R2/2012, Openfire 3.8.1, Hazelcast 1.0.4, MySQL 5.5.30-ndb-7.2.12-cluster-gpl-log
We've set up 5 servers in an Openfire cluster, each in a different subnet; the subnets are located in different cities and are interconnected through VPN routers (2-8 Mbps):
192.168.0.1 - node0
192.168.1.1 - node1
192.168.2.1 - node2
192.168.3.1 - node3
192.168.4.1 - node4
Openfire is configured to use a MySQL database, which is successfully replicated from the master node0 to all slave nodes (each node uses its own local database server, functioning as a slave).
In Openfire Web Admin > Server Manager > Clustering we are able to see all cluster nodes.
Openfire custom settings for Hazelcast:
hazelcast.max.execution.seconds - 30
hazelcast.startup.delay.seconds - 3
hazelcast.startup.retry.count - 3
hazelcast.startup.retry.seconds - 10
Hazelcast config for node0 (similar on the other nodes except for the interfaces section) (%PROGRAMFILES%\Openfire\plugins\hazelcast\classes\hazelcast-cache-config.xml):
<join>
<multicast enabled="false" />
<tcp-ip enabled="true">
<hostname>192.168.0.1:5701</hostname>
<hostname>192.168.1.1:5701</hostname>
<hostname>192.168.2.1:5701</hostname>
<hostname>192.168.3.1:5701</hostname>
<hostname>192.168.4.1:5701</hostname>
</tcp-ip>
<aws enabled="false" />
</join>
<interfaces enabled="true">
<interface>192.168.0.1</interface>
</interfaces>
These are the only settings changed from default ones.
The problem is that XMPP clients take too long to authorize, about 3-4 minutes. After authorization, other users in the roster appear inactive for 5-7 minutes, and during this time the logged-in user is marked as Offline in Openfire Web Admin > Sessions. Even once the user can see other logged-in users as active, messages are not delivered, or are delivered after 5-10 minutes or after a few Openfire restarts...
We appreciate any help. We have spent about 5 days trying to set up this monster and are out of ideas... :(
Thanks a lot in advance!
UPD 1: Installed Openfire 3.8.2 alpha with Hazelcast 2.5.1 Build 20130427; same problem.
UPD 2: Tried starting the cluster on two servers that are in the same city, separated by probably 1-2 hops at 1-5 ms ping. Everything works perfectly! Then we stopped one of those servers and started one in another city (3-4 hops at 80-100 ms ping), and the problem occurred again... Slow authorizations, logged-off users in rosters, messages not delivered on time, etc.
UPD 3: Installed Openfire 3.8.2 without the bundled JRE, and Java SDK 1.7.0_25.
JMX screenshots were captured for node 0 and node 1 (images not included here).
The red line marks the first client connection (after an Openfire restart). Tested with two users; same thing... The first user (node0) connected instantly, while the second user (node1) spent 5 seconds connecting.
Rosters showed offline users on both sides for 20-30 seconds, then online users started appearing in them.
The first user sends a message to the second user. The second user waits for 20 seconds, then receives the first message. The reply and all subsequent messages are transferred instantly.
UPD 4:
While digging through the JConsole "Threads" tab, we discovered threads in various states:
For example hz.openfire.cached.thread-3:
WAITING on java.util.concurrent.SynchronousQueue$TransferStack@8a5325
Total blocked: 0 Total waited: 449
Maybe this could help... We don't actually know where else to look.
Thanks!
[UPDATE] Note per the Hazelcast documentation: WAN replication is supported in their enterprise version only, not in the community version that ships with Openfire. You must obtain an enterprise license key from Hazelcast if you would like to use this feature.
You may opt to set up multiple LAN-based Openfire clusters and then federate them using S2S integration across separate XMPP domains. This is the preferred approach for scaling up Openfire for a very large user base.
[Original post follows]
My guess is that the longer network latency in your remote cluster configuration might be tying up the Hazelcast executor threads (for queries and events). Some of these events and queries are invoked synchronously within an Openfire cluster. Try tuning the following properties:
hazelcast.executor.query.thread.count (default: 8)
hazelcast.executor.event.thread.count (default: 16)
I would start by setting these values to 40/80 respectively (5x the defaults) to see if there is any improvement in overall application responsiveness, and potentially go even higher based on your expected load. Additional Hazelcast settings (including other thread pools), plus instructions for adding these properties into the configuration XML, can be found here:
Hazelcast configuration properties
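For illustration, the two settings above would sit in the properties section of hazelcast-cache-config.xml roughly like this (a sketch; consult the documentation linked above for the exact placement in your plugin version):

<hazelcast>
  ...
  <properties>
    <property name="hazelcast.executor.query.thread.count">40</property>
    <property name="hazelcast.executor.event.thread.count">80</property>
  </properties>
</hazelcast>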
Hope that helps ... and good luck!
