Which port is used for gossip between the members of an Akka actor cluster?

I am trying to debug an issue where cluster islands are forming.
I want to check whether a network issue is the cause.
Which port do the members of an Akka actor cluster use for gossip?

It depends on your configuration: it's the TCP port that you set up Akka to listen on. For more details, refer to the Cluster configuration docs (see the port and seed-nodes values).
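To rule out a basic network problem, one option is to check from each machine whether the configured port on every other member accepts TCP connections. The following is a minimal sketch, not an Akka tool: the function names are made up for illustration, and you need to supply the host/port pairs from your own configuration (default configs commonly use 2552 for classic remoting and 25520 for Artery, but verify yours).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_members(members, timeout: float = 2.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(host, port): can_connect(host, port, timeout) for host, port in members}
```

Calling check_members([("10.0.0.11", 2552), ("10.0.0.12", 2552)]) with your real member addresses (the ones shown are placeholders) returns a dict of reachability results. A successful connection only shows that the port is open from that machine; it does not prove that gossip itself is healthy.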
Broader hints on cluster partitions:
If you ended up with a cluster partition in an Akka cluster, it most likely means you're using the auto-downing feature. It's not recommended for production use, as it's rather flaky: it relies on a simple timeout-based mechanism. For more advanced downing mechanisms you can look into the Split Brain Resolver commercial tooling, or build a downing mechanism yourself that hooks into external monitoring infrastructure (we've seen a number of teams do this).
An interesting thought to keep in mind is that perhaps you do not need auto-downing at all: when a node leaves the cluster, it can do so cleanly by issuing Cluster.leave(address) in the code.

Related

Vertx clustering alternative

Does anyone with real-world experience of Vertx cluster managers other than Hazelcast have advice on our requirement below?
For our (real-time sensor data) system we have hundreds of verticles in multiple JVMs, but we do not need, or want, the eventbus to span multiple physical servers.
We're running Vertx on multiple servers but our platform is less complex if we don't pool a single eventbus between all of them (we prefer to be explicit about passing messages between servers).
Hazelcast is the wrong cluster manager for us. We don't need its peer discovery between servers, but crucially, any release change of Hazelcast means that new clients cannot join a cluster whose existing clients are running the previous version. Bringing up one new verticle compiled with vertx 3.6.3 into an existing cluster is therefore not possible unless we stop the entire cluster and restart it with all the verticles recompiled to 3.6.3. This seriously impacts our development. It's helpful for the verticles to be more plug-and-play; vertx can do that, but Hazelcast can't (due to constant version incompatibilities).
Can anyone recommend a vertx cluster manager that fits our use case?
I've now had time to review each of the alternatives Vertx directly supports as a 'cluster manager' (Hazelcast, Zookeeper, Ignite, Infinispan), and we're proceeding with a Zookeeper architecture for our system, replacing Hazelcast.
Here's the background to our decision:
We started as a fairly typical (if there is such a thing) Vertx development: multiple verticles in a JVM respond to external events (urban sensor data entering our java/vertx feed handlers) published on the eventbus, and the data is processed asynchronously in many other vertx verticles, which often publish new derived data as new asynchronous messages.
Quite quickly we wanted to use multiple JVMs, mainly to isolate the feedhandlers from the rest of the code so that if things broke the feedhandlers would keep running (as a failsafe they persist the data as well as publishing it). So we added (easily) Vertx clustering so the JVMs on the same machine could communicate and all verticles could publish/subscribe messages in the same system. We used the default cluster manager, Hazelcast, and modified the config so the vertx clustering is limited to the single server (we run multiple versions of the entire platform on different servers and don't want them confusing each other). We have hundreds of verticles in half a dozen JVMs.
Our environment (search SmartCambridge vertx) is fairly dynamic with rapid development cycles (e.g. to create a new feedhandler and have it publishing its data on the eventbus), and that means we commonly wish to start up a JVM containing these new verticles and have it join an existing vertx cluster, maybe permanently, maybe just for a while. Vertx/Hazelcast treats joining a (vertx) cluster as a fairly serious operation: Hazelcast has (I believe) a concept of Hazelcast cluster members and Hazelcast clients, where clients can come and go easily, but joining a Hazelcast cluster as a member requires considerable code compatibility between the existing cluster and the new member. Each time we upgraded our Vertx library, the Hazelcast library version would change, and this made it impossible for a newly compiled vertx verticle to join an existing vertx cluster.
Note we have experimented with having the Vertx eventbus flow between multiple servers, and also extend the eventbus into the browser/javascript, but in both cases have found it simpler/more robust to be explicit about routing messages from server to server and have written verticles specifically for that purpose.
So the new plan (after several years of Vertx development), given our environment of 5 production/development servers but with the vertx eventbus always limited to single servers, is to implement a single Zookeeper cluster across all 5 servers so we get the Zookeeper native resilience goodness, and configure each production server to use a different znode root (the default is 'io.vertx' but this is a simple config option).
This design has an attractively simple minimum build on a single server (i.e. Zookeeper + Vertx), so ad-hoc development on a random machine (e.g. a laptop) is still possible, but we can trivially extend our platform to have multiple servers in a single vertx cluster by setting a common znode root.
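As a rough illustration of the per-server znode root idea, here is a small sketch that generates one cluster manager config per server. It is not our actual tooling: the server and Zookeeper host names are hypothetical, and the key names "zookeeperHosts" and "rootPath" are assumed from the zookeeper.json used by the Vertx Zookeeper cluster manager, so check them against the version you run.

```python
import json

DEFAULT_ROOT = "io.vertx"  # the default znode root


def cluster_configs(servers, zk_hosts, shared_root=None):
    """Return {server: config dict} for the given servers.

    With shared_root=None each server gets its own znode root, so its
    eventbus stays isolated from the other servers. Passing shared_root
    puts every server under the same root, i.e. into one vertx cluster.
    """
    configs = {}
    for server in servers:
        root = shared_root if shared_root else f"{DEFAULT_ROOT}.{server}"
        configs[server] = {
            # Key names are an assumption (see the note above the code).
            "zookeeperHosts": ",".join(zk_hosts),
            "rootPath": root,
        }
    return configs


# Hypothetical host names, for illustration only.
servers = ["server1", "server2"]
zk_hosts = ["zk1:2181", "zk2:2181", "zk3:2181"]
isolated = cluster_configs(servers, zk_hosts)
print(json.dumps(isolated, indent=2))
```

All servers point at the same Zookeeper ensemble; only the root path decides which JVMs see each other.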

What exactly is a 'node' in Redis

I'm reading around Redis at the moment and trying to get a good understanding of what a 'node' is in terms of how Redis works. Am I right to think of it in the same way as an endpoint?
In Redis' context, a node is a server running one or more redis-server processes.
Endpoint is a network address through which you can access one or more such processes, depending on how Redis is clustered.
When using the open source Redis cluster, an endpoint is any of the processes, meaning a node's address and the port that the process listens on. Redis client libraries use the protocol to interrogate the clustered redis-server process about other members of the cluster (again, processes listening on ports on nodes), so they can establish connections to other endpoints accordingly.
Disclaimer: it appears that you're asking about AWS ElastiCache, which may or may not be using the OSS implementation in whole or partially. I do not claim to have any knowledge on that subject.
A node is a unit of temporary memory (RAM) with a network attachment. It is the smallest unit in which frequently accessed data is stored, following a lazy-loading or write-through strategy. A collection of such nodes, with a Redis process running on each node, is called a cluster.
More on nodes:
https://redis.io/commands/cluster-nodes/
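To make the node/endpoint distinction concrete, here is a minimal sketch that parses the text returned by CLUSTER NODES into one record per process. The sample text is illustrative only (made-up IDs and local addresses in the documented line format), and the function name is hypothetical; a real client library does this for you.

```python
# Illustrative sample in the CLUSTER NODES line format:
# <id> <ip:port@cport> <flags> <master> <ping-sent> <pong-recv>
# <config-epoch> <link-state> <slot> ...
SAMPLE = """\
07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004@31004 slave e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 0 1426238317239 4 connected
e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 127.0.0.1:30001@31001 myself,master - 0 0 1 connected 0-5460
"""


def parse_cluster_nodes(text):
    """Turn CLUSTER NODES output into a list of dicts, one per process."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip lines that do not match the expected format
        node_id, addr, flags, master_id = fields[:4]
        # Drop the cluster bus part ("@cport" and anything after it).
        endpoint = addr.split("@")[0]
        host, _, port = endpoint.rpartition(":")
        nodes.append({
            "id": node_id,
            "host": host,
            "port": int(port),
            "flags": flags.split(","),
            "master_id": None if master_id == "-" else master_id,
            "slots": fields[8:],
        })
    return nodes


for node in parse_cluster_nodes(SAMPLE):
    print(node["host"], node["port"], node["flags"], node["slots"])
```

Each host/port pair in the result is an endpoint a client can connect to; several of them may live on the same machine.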

What's the difference between using the ActiveMQ MasterSlave Discovery and Shared Config?

On the ActiveMQ MasterSlave page, they introduce a few ways of setting that up, using either JDBC, Shared File, or LevelDB Store.
However, on the Network of Brokers page, they talk about MasterSlave Discovery without needing to set up one of the shared configurations (JDBC, File, or LevelDB Store).
<networkConnectors>
<networkConnector uri="masterslave:(tcp://host1:61616,tcp://host2:61616,tcp://..)"/>
</networkConnectors>
What are the differences between using MasterSlave Discovery and Shared Configuration? When should I use one or the other?
JDBC, Shared File and Replicated LevelDB are all options for creating a highly available persistence store that can be accessed by a master and its slave(s). Note that the LevelDB store is not shared, but replicated.
If you want to connect a broker via a network connection (network of brokers) to another logical broker that consists of a master and a slave, the masterslave: URI prefix is a shorthand for the failover prefix with less typing.
So, MasterSlave Discovery and Shared Configuration are totally different things.
What you should compare instead is a shared persistence store (JDBC, Shared File) vs a replicated LevelDB store (share nothing). The latter allows you to set up totally independent brokers that act as a failover cluster, without the need to share a disk or database.
One issue if you are using the masterslave discovery URI
is that CPU usage is high (>90%).
the workaround way
There is an interesting discussion on the ActiveMQ user forum about the same topic: http://activemq.2283324.n4.nabble.com/Avoiding-shared-state-between-master-and-slave-brokers-td4686401.html
I am also confused about this:
Is there any way to achieve shared nothing fully replicated configuration in a network of brokers wherein there is only one master at a time and all clients are connected to this one instance (with support for reelection of new master when current master goes away)?

Should Apache Kafka and Apache Hadoop share the same ZooKeeper instance?

Is it possible to use the same ZooKeeper instance for coordinating Apache Kafka and Apache Hadoop clusters? If yes, what would be the appropriate configuration of ZooKeeper?
Thanks!
Yes. As far as I understand, ideally there should be a single ZooKeeper cluster on dedicated machines managing the coordination between the different applications in a distributed system. I'll try to share a few points here.
A ZooKeeper cluster consisting of several servers is typically called an ensemble, and it basically tracks and shares the state of your applications. For example, Kafka uses it to commit offset changes, so that after a failure it can identify where to start again. From the doc page:
Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts (an ensemble). Whenever a change is made, it is not considered successful until it has been written to a quorum (a majority) of the servers in the ensemble.
Now imagine Kafka and Hadoop each have a dedicated cluster of 3 ZooKeeper servers. If a couple of nodes go down in either of the two clusters, that service fails (ZooKeeper works by simple majority voting, so a 3-server ensemble tolerates 1 node failure while keeping the service alive, but not 2). If instead there is one single cluster of 5 ZooKeeper servers managing both applications, the service is still available when two of the nodes are down. This not only offers better reliability, it also reduces hardware expenses: instead of managing 6 servers you only have to take care of 5.
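The arithmetic behind the 3-server versus 5-server comparison can be sketched in a few lines, assuming plain majority voting as described above (the function names are just for illustration):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 6):
    print(f"{n} servers: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Even ensemble sizes add no extra tolerance (4 servers tolerate 1 failure, the same as 3), which is why odd sizes such as 3 or 5 are the usual choice.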

Discovery service and Leader election algorithm

I've been doing some research into enhancing the in-house Discovery Service on my project. We have a number of nodes in a cluster responsible for the discovery service, which is highly available. To get access to some service, each client app sends a multicast message to all these nodes in the cluster. All nodes respond to the client, and the very first response determines which node is used for further work. This is an overhead, and I'm thinking of using some kind of leader election algorithm where only a single leader responds to clients. Is it reasonable to use such an algorithm for this task?
I think what you are trying to do is load balance across multiple machines, where any machine can handle the requests. Leader election seems like overhead here. A load balancer can probably solve the issue.
