Does anyone with real-world experience of Vertx cluster managers other than Hazelcast have advice on our requirement below?
For our (real-time sensor data) system we have hundreds of verticles in multiple JVMs, but we do not need, or want, the eventbus to span multiple physical servers.
We're running Vertx on multiple servers, but our platform is less complex if we don't share a single eventbus between all of them (we prefer to be explicit about passing messages between servers).
Hazelcast is the wrong cluster manager for us. We don't need its peer discovery between servers, and, crucially, any Hazelcast release change means that new nodes cannot join a cluster whose existing nodes are running the previous version. So bringing up one new verticle compiled with Vertx 3.6.3 into an existing cluster is not possible unless we stop the entire cluster and restart it with all the verticles recompiled to 3.6.3. This seriously impacts our development. It helps for verticles to be more plug-and-play, and Vertx can do that, but Hazelcast can't (due to constant version incompatibilities).
Can anyone recommend a vertx cluster manager that fits our use case?
I've now had time to review each of the alternatives Vertx directly supports as a 'cluster manager' (Hazelcast, Zookeeper, Ignite, Infinispan) and we're proceeding with a Zookeeper architecture for our system, replacing Hazelcast:
Here's the background to our decision:
We started as a fairly typical (if there is such a thing) Vertx development: multiple verticles in a JVM responding to external events (urban sensor data entering our Java/Vertx feed handlers), publishing the data on the eventbus, and processing it asynchronously in many other Vertx verticles, which often publish new derived data as further asynchronous messages.
Quite quickly we wanted to use multiple JVMs, mainly to isolate the feed handlers from the rest of the code so that if things broke the feed handlers would keep running (as a failsafe they persist the data as well as publishing it). So we added (easily) Vertx clustering so the JVMs on the same machine could communicate and all verticles could publish/subscribe messages in the same system. We used the default cluster manager, Hazelcast, and modified the config so the Vertx clustering is limited to the single server (we run multiple versions of the entire platform on different servers and don't want them confusing each other). We have hundreds of verticles in half a dozen JVMs.
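For reference, limiting the Hazelcast-backed cluster to a single machine can be done by editing cluster.xml or programmatically; here is a hedged sketch of the programmatic form (Vertx 3.x with vertx-hazelcast), not our exact config, with illustrative host values:

```java
import com.hazelcast.config.Config;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager;

public class LocalOnlyCluster {
    public static void main(String[] args) {
        Config hazelcastConfig = new Config();
        // Disable multicast discovery so peers on other servers never join this cluster...
        hazelcastConfig.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
        // ...and only look for cluster members on this machine via TCP-IP join.
        hazelcastConfig.getNetworkConfig().getJoin().getTcpIpConfig()
                .setEnabled(true)
                .addMember("127.0.0.1");

        VertxOptions options = new VertxOptions()
                .setClusterManager(new HazelcastClusterManager(hazelcastConfig));
        Vertx.clusteredVertx(options, res -> {
            // deploy this JVM's verticles here once clustering is up
        });
    }
}
```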
Our environment (search SmartCambridge vertx) is fairly dynamic, with rapid development cycles (e.g. creating a new feed handler and having it publish its data on the eventbus), which means we commonly want to start up a JVM containing these new verticles and have it join an existing Vertx cluster, maybe permanently, maybe just for a while. Vertx/Hazelcast treats joining a (Vertx) cluster as a fairly serious operation: Hazelcast has (I believe) a concept of Hazelcast cluster members and Hazelcast clients, where clients can come and go easily but joining a Hazelcast cluster as a member requires considerable code compatibility between the existing cluster and the new member. Each time we upgraded our Vertx library the Hazelcast library version would change, and this made it impossible for a newly compiled Vertx verticle to join an existing Vertx cluster.
Note that we have experimented with having the Vertx eventbus flow between multiple servers, and also with extending the eventbus into the browser/JavaScript, but in both cases we have found it simpler and more robust to be explicit about routing messages from server to server, and we have written verticles specifically for that purpose.
So the new plan (after several years of Vertx development), given our environment of 5 production/development servers but with the Vertx eventbus always limited to a single server, is to implement a single Zookeeper cluster across all 5 servers, so we get Zookeeper's native resilience goodness, and configure each production server to use a different znode root (the default is 'io.vertx' but this is a simple config option).
This design has an attractively simple minimum build on a single server (i.e. Zookeeper + Vertx), so ad-hoc development on a random machine (e.g. a laptop) is still possible, yet we can trivially extend our platform to have multiple servers in a single Vertx cluster by setting a common znode root.
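A minimal sketch of what that looks like with vertx-zookeeper, assuming the standard config keys ("zookeeperHosts", "rootPath", etc.); the host names and the znode root below are illustrative only:

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.json.JsonObject;
import io.vertx.spi.cluster.zookeeper.ZookeeperClusterManager;

public class ZkClusterStarter {
    public static void main(String[] args) {
        // One Zookeeper ensemble shared by all servers; a distinct rootPath per
        // production server keeps each server's eventbus in its own namespace.
        JsonObject zkConfig = new JsonObject()
                .put("zookeeperHosts", "server1:2181,server2:2181,server3:2181")
                .put("rootPath", "io.vertx.server1")  // default is "io.vertx"
                .put("sessionTimeout", 20000)
                .put("connectTimeout", 3000)
                .put("retry", new JsonObject()
                        .put("initialSleepTime", 100)
                        .put("maxTimes", 5));

        VertxOptions options = new VertxOptions()
                .setClusterManager(new ZookeeperClusterManager(zkConfig));
        Vertx.clusteredVertx(options, res -> {
            // deploy verticles; only JVMs using the same rootPath share an eventbus
        });
    }
}
```

Setting a common rootPath on two servers would then be all it takes to merge their eventbuses into one Vertx cluster.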
Related
We have two queue servers, both attached to the application. So far Server 1 receives all the queued jobs and processes them. I would like to set up a cluster so that the load is spread across the two servers. Can anyone suggest how to set up such a cluster?
Thanks.
Beanstalkd doesn't offer this feature.
Alternatives are:
you set up soft sharding to route requests to queue A or B (see the sketch after this list)
you can use alternatives like Redis Queue or Cloud Pub/Sub from Google Cloud Platform
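Here is a hedged sketch of the soft-sharding option, kept client-agnostic: the server addresses and the enqueue call are placeholders for whatever beanstalkd client you already use.

```java
import java.util.List;

public class BeanstalkShardRouter {

    // Placeholder addresses for the two beanstalkd instances.
    private final List<String> servers = List.of(
            "queue-1.internal:11300",
            "queue-2.internal:11300");

    /** Pick a server for this job; the same key always maps to the same server. */
    public String serverFor(String jobKey) {
        int shard = Math.floorMod(jobKey.hashCode(), servers.size());
        return servers.get(shard);
    }

    public void enqueue(String jobKey, byte[] payload) {
        String server = serverFor(jobKey);
        // pushJob(server, payload);  // placeholder for your client's put/enqueue call
    }
}
```

Because the routing is deterministic, related jobs stay on the same instance, and you can consume each queue with its own set of workers.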
Beanstalkd in a single-instance setup can support multiple TCP connections and generally outperforms Redis. Below are a few benchmarks, although benchmarks are always somewhat subjective.
Benchmark references:
https://ph4r05.deadcode.me/blog/2017/12/16/laravel-queueing-benchmark.html
https://adam.herokuapp.com/past/2010/4/24/beanstalk_a_simple_and_fast_queueing_backend/
So vertical scaling is usually sufficient.
The problem, however, is availability when the single beanstalkd instance goes away.
You can check out coolbeans (https://github.com/1xyz/coolbeans), a project in alpha that provides a replicated beanstalkd.
I am developing a series of microservices using Spring Boot and plan to deploy them on Kubernetes.
Some of the microservices are composed of an API, which writes messages to a Kafka queue, and a listener, which listens to the queue and performs the relevant actions (e.g. writing to the DB, constructing messages for onward processing).
These services work fine locally but I am planning to run multiple instances of the microservice on Kubernetes. I'm thinking of the following options:
Run multiple instances as is (i.e. each instance serves as both an API and a listener).
Introduce FRONTEND and BACKEND environment variables. If the FRONTEND variable is true, do not configure the listener process; if the BACKEND variable is true, configure the listener process.
This way I can scale how many frontend/backend services I need and also have the benefit of shutting down the backend services without losing requests.
Any pointers, best practice or any other options would be much appreciated.
You can do as you describe, with environment variables, or you may also be interested in building your app with different profiles/bean configurations and making two different images (see the sketch below).
In both cases, you should use two different Kubernetes Deployments so you can scale and configure them independently.
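For example, with the profile approach the listener side might look like this hedged sketch (assuming Spring for Apache Kafka; the class, topic, and group names are illustrative, not taken from your code):

```java
import org.springframework.context.annotation.Profile;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
@Profile("backend")  // this bean only exists when the "backend" profile is active
public class OrderEventsListener {

    // Illustrative topic and group names.
    @KafkaListener(topics = "orders", groupId = "order-processor")
    public void onMessage(String payload) {
        // write to the DB, build onward messages, etc.
    }
}
```

The backend Deployment would then set SPRING_PROFILES_ACTIVE=backend, while the frontend Deployment leaves that profile off so the listener container is never created in the API-only pods.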
You may also be interested in a Leader Election pattern, where you want only one active replica, if it only makes sense for a single replica to process the events from the queue. Depending on your availability requirements, this can also be solved by simply running a single replica.
Our Axon-backed service runs on several nodes. Our event processors are tracking processors (1 segment, thus active on one node). If I subscribe to a query on node A and the event that should trigger the update is handled on node B, node A will miss the update.
Is this by design or should this work and am I misconfiguring the application?
In case of the former, what could we do to implement similar functionality in the most Axon-idiomatic manner?
(currently we poll the data source / projection directly for x seconds)
The QueryBus you are using is a SimpleQueryBus which stays within a single JVM, always.
If you need a distributed version of the QueryBus, you should turn towards using Axon Server as the centralized means to route queries between your nodes.
Note that although you could create this yourself, people have tried to do so (as shown in this Pull Request on the framework) and decided against it in favor of the optimizations made in Axon Server.
So, in short, I am assuming you are currently excluding the Axon Server connector.
Thus the framework gives you the SimpleQueryBus, which is indeed designed to not span several nodes.
And lastly, the quickest way to achieve distributed routing of queries is to use Axon Server.
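To make the node-A/node-B behaviour concrete, here is a hedged sketch of the subscription-query flow (Axon 4 API); FindCardSummariesQuery, CardSummary and CardIssuedEvent are illustrative placeholder types, not taken from your code:

```java
import java.util.List;

import org.axonframework.eventhandling.EventHandler;
import org.axonframework.messaging.responsetypes.ResponseTypes;
import org.axonframework.queryhandling.QueryGateway;
import org.axonframework.queryhandling.QueryUpdateEmitter;
import org.axonframework.queryhandling.SubscriptionQueryResult;

// CardSummary, FindCardSummariesQuery and CardIssuedEvent are placeholder types.
public class SubscriptionQueryExample {

    // Node A: subscribe to the query and its updates.
    public void subscribeOnNodeA(QueryGateway queryGateway) {
        SubscriptionQueryResult<List<CardSummary>, CardSummary> result =
                queryGateway.subscriptionQuery(
                        new FindCardSummariesQuery(),
                        ResponseTypes.multipleInstancesOf(CardSummary.class),
                        ResponseTypes.instanceOf(CardSummary.class));

        result.updates().subscribe(update -> {
            // With a SimpleQueryBus this only fires for emits made in THIS JVM.
        });
    }

    // Node B: the tracking processor updates the projection and emits the update.
    public static class CardSummaryProjection {

        private final QueryUpdateEmitter updateEmitter;

        public CardSummaryProjection(QueryUpdateEmitter updateEmitter) {
            this.updateEmitter = updateEmitter;
        }

        @EventHandler
        public void on(CardIssuedEvent event) {
            CardSummary updated = new CardSummary(/* ... */);
            // This emit only reaches subscriptions registered on node B's local
            // SimpleQueryBus, so the subscriber on node A never sees it.
            updateEmitter.emit(FindCardSummariesQuery.class, query -> true, updated);
        }
    }
}
```

With Axon Server (or another distributed QueryBus) in place, the emit on node B is routed back to the subscription held on node A.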
Recently I started learning Redis and have been able to do everything, from a learning perspective, on 32-bit Windows. I am a .NET developer and have made caching available with Redis using the ServiceStack client in a Web API setup. I have been able to successfully run a Redis cluster of 4 masters and 4 slaves, and I was wondering how I can make that work in conjunction with the ServiceStack client.
My main concern is that if the master that I connect my client to goes down, how can the client automatically connect to another available slave that takes over, given that the port of that slave is going to be different? So failover is working at the Redis level, but how does the client handle it?
I recreated the mentioned scenario using the Redis command line interface, but when I took the master down, the interface just stopped responding, as if everything was going into a black hole. So, in my experience, the CLI does not automatically handle failover as a client.
I have started studying the StackExchange.Redis client, but I still have the same question.
I am using the Redis distribution provided by Microsoft for learning purposes, available on GitHub. (Sorry, I cannot provide a link as I am new here and do not have sufficient reputation points.)
Redis Sentinels are additional Redis processes which monitor the health of your Redis master/slaves and take care of performing automatic failover when they detect that your master instance is down. The Redis Config project provides a quick way to set up a popular Redis Sentinel configuration.
The ServiceStack.Redis client supports Redis Sentinel and implements the recommended client strategy, which is what enables it to automatically recover after a failover: it asks one of the Sentinels for the next available address to connect to and resumes operations with one of the available instances.
You can learn more about Redis Sentinel in the official Documentation.
Does anybody have some info, links, or pointers on how cross-process EventBus communication occurs? From the documentation I conclude that multiple Vert.x instances (and thus separate JVM processes) can be clustered and communicate via the EventBus. However, there is little to no documentation on how to achieve it.
Looking into the docs, I can see that the publish/registerHandler methods take an address as a String, which works within a process, but I cannot wrap my head around how it works across processes and how to register and publish to an address. Does it work over HTTP or TCP? From an API perspective, do I need to pass a port and a process signature?
Cross-process communication happens via the EventBus. Multiple Vert.x instances can be started up and clustered to allow separate instances on the same or other machines to communicate. The low-level clustering is handled by Hazelcast. The configuration is handled by the cluster.xml file in the conf folder of your Vert.x install. You can learn more about the format of the file by looking at the Hazelcast docs. It is transparent to your handlers and works over TCP.
You can test it by running two or more instances on your local machine once they are started with the -cluster flag. Look at the example being run, and the config changes required, in "How to use eventbus messaging in vertx?".
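As a concrete illustration, here is a hedged sketch using the Vert.x 3.x API (with Vert.x 2 the registration call is eventBus.registerHandler); the address name is illustrative. Run two copies of this in separate JVMs, with the cluster manager (e.g. vertx-hazelcast) on the classpath, and both will receive the published message:

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class ClusteredEventBusDemo {
    public static void main(String[] args) {
        // Starting a clustered Vertx is the programmatic equivalent of the -cluster flag.
        Vertx.clusteredVertx(new VertxOptions(), res -> {
            if (res.failed()) {
                res.cause().printStackTrace();
                return;
            }
            Vertx vertx = res.result();

            // The address is just a String; the cluster manager routes messages
            // to handlers registered in other JVMs over TCP.
            vertx.eventBus().consumer("news.feed", msg ->
                    System.out.println("Received: " + msg.body()));

            // Publish reaches every registered handler in the cluster,
            // whichever process it lives in.
            vertx.eventBus().publish("news.feed", "hello from another JVM");
        });
    }
}
```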