Vert.x cluster Eventbus cross processes - cluster-computing

Does any body have some info, links, pointer on how is cross process Eventbus communication is occurring. Per documentation I am concluding that multiple Vert.x (thus separate JVM processes) could be clustered on and communicate via Eventbus. However, there are little to none documentation on how to achieve it.
Looking into DOCs, I can see that publish/registerHandler methods take address as a String what works within a process, but I can not wrap my head around on how it works cross processes and how to register and publish to address, does it work over HTTP , TCP ? From API perspective do I need to pass port and process signature ?

Cross process communication happens via the EventBus. Multiple vertx instances can be started up and clustered to allow separate instances on the same or other machines to communicate. The low level clustering is handled by Hazelcast.The configuration is handled by the cluster.xml file in the conf folder of your vertx install. You can learn more about the format of the file by looking at the Hazelcast Docs. It is transparent to your handers and works over TCP.
You can test it by running two or more instances on your local machine once they are started with the -cluster flag. Look at the example being run, and the config changes required in How to use eventbus messaging in vertx?

Related

Scaling a microservice with frontend and backend instances

I am developing a series of microservices using Spring Boot and plan to deploy them on Kubernetes.
Some of the microservices are composed of an API which writes messages to a kafka queue and a listener which listens to the queue and performs the relevant actions (e.g. write to DB etc, construct messsages for onward processing).
These services work fine locally but I am planning to run multiple instances of the microservice on Kubernetes. I'm thinking of the following options:
Run multiple instances as is (i.e. each microservice serves as an API and a listener).
Introduce a FRONTEND, BACKEND environment variable. If the FRONTEND variable is true, do not configure the listener process. If the BACKEND variable is true, configure the listener process.
This way I can start scale how may frontend / backend services I need and also have the benefit of shutting down the backend services without losing requests.
Any pointers, best practice or any other options would be much appreciated.
You can do as you describe, with environment variables, or you may also be interested in building your app with different profiles/bean configuration and make two different images.
In both cases, you should use two different Kubernetes Deployments so you can scale and configure them independently.
You may also be interested in a Leader Election pattern where you want only one active replica if it only make sense if one single replica processes the events from a queue. This can also be solved by only using a single replica depending on your availability requirements.

Understanding the MajorDomo Pattern from NetMQ ZeroMQ

I am trying to understand how to best implement the MDP example in c# to be used in a windows service in a multiple client - single server environment.
I have read the docs but I am still unclear on the following:
Should all Worker instances be created on startup and left to run?
Should the Workers all be different types of services or just different instances of the same service?
Can I have one windows service when contains the Broker and Workers or is it best to split them out into their own services?
The example code I am using is the MajorDomo Pattern taken from here https://github.com/NetMQ/Samples
Yes, all workers in a MDP environment should be created independently of the requests, since the broker should not know how to create them
Each worker handles a given "service" (contract). Obviously each contract should have at least one worker.
If you need parallelized handling of requests, and a given worker can only do one at a time, having extra workers for that service could make sense. Generally you would do this if multiple machines were involved however (horizontal scaling)
You can have the broker and workers in the same process. HOWEVER, if you want to update only a worker, taking down the broker at the same time can be annoying for the clients. I would recommend letting the broker be its own process, with the workers in one or more other processes.

Vertx clustering alternative

Anyone with real-world experience of Vertx cluster managers other than Hazelcast have advice on our requirement below?
For our (real time sensor data) system we have hundreds of verticles in multiple JVM's, but we do not need, or want, the eventbus to span multiple physical servers.
We're running Vertx on multiple servers but our platform is less complex if we don't pool a single eventbus between all of them (we prefer to be explicit about passing messages between servers).
Hazelcast is the wrong cluster manager for us. We don't need its peer discovery between servers, but crucially any release change of Hazelcast means that new clients cannot join a cluster with existing running clients running the previous version so bringing up one new verticle compiled with vertx 3.6.3 into an existing cluster is not possible unless we stop the entire cluster and restart it with all the verticles recompiled to 3.6.3. This seriously impacts our development. It's helpful for the verticles to be more plug-and-play and vertx can do that but Hazelcast can't (due to constant version incompatibilities).
Can anyone recommend a vertx cluster manager that fits our use case?
I've now had time to review each of the alternatives Vertx directly supports as a 'cluster manager' (Hazelcast, Zookeeper, Ignite, Infinispan) and we're proceeding with a Zookeeper architecture for our system, replacing Hazelcast:
Here's the background to our decision:
We started as a fairly typical (if there is such a thing) Vertx development with multiple verticles in a JVM responding to external events (urban sensor data entering our java/vertx feed handlers) published on the eventbus and the data being processed asynchronously in many other vertx verticles, often involving them publishing new derived data as new asynchronous messages.
Quite quickly we wanted to use multiple JVM's, mainly to isolate the feedhandlers from the rest of the code so if things broke the feedhandlers would keep running (as a failsafe they're persisting the data as well as publishing it). So we added (easily) Vertx clustering so the JVM's on the same machine could communicate and all verticles could publish/subscribe messages in the same system. We used the default cluster manager, Hazelcast, and modified the config so the vertx clustering is limited to the single server (we run multiple versions of the entire platform on different servers and don't want them confusing each other). We have hundreds of verticles in half-a-dozen JVM's.
Our environment (search SmartCambridge vertx) is fairly dynamic with rapid development cycles (e.g. to create a new feedhandler and have it publishing its data on the eventbus) and that means we commonly wish to start up a JVM containing these new verticles and have it join an existing vertx cluster, maybe permanently, maybe just for a while. Vertx/Hazelcast has joining a (vertx) cluster as a fairly serious operation, i.e. Hazelcast has (I believe) a concept of Hazelcast cluster members and Hazelcast clients, where clients can come and go easily but joining a Hazelcast cluster as a member requires considerable code compatibility between the existing cluster and the new member. Each time we upgraded our Vertx library the Hazelcast library version would change and this made it impossible for a newly compiled vertx verticle to join an existing vertx cluster.
Note we have experimented with having the Vertx eventbus flow between multiple servers, and also extend the eventbus into the browser/javascript, but in both cases have found it simpler/more robust to be explicit about routing messages from server to server and have written verticles specifically for that purpose.
So the new plan (after several years of Vertx development), given our environment of 5 production/development servers but with the vertx eventbus always limited to single servers, is to implement a single Zookeeper cluster across all 5 servers so we get the Zookeeper native resilience goodness, and configure each production server to use a different znode root (the default is 'io.vertx' but this is a simple config option).
This design has an attractive simple minimum build on a single server (i.e Zookeeper + Vertx) so ad-hoc development on a random machine (e.g. laptop) is still possible but we can extend our platform to have multiple servers in a single vertx cluster trivially by setting a common znode root.

Use Hazelcast Executor Service to be executed on clients

I all the documentation and all the "Google search results" I saw, the hazelcast executor service can be used to be executed on "Members".
I wonder if it is possible to also have things being executed on hazelcast clients?
The distributed executor service is intended to run processing where the data is hosted, on the servers. This is a similar idea to a stored procedure, run the processing where the data lives, save data transfer.
In general, you can't run a Java Runnable or Callable on the clients as the clients may not be Java.
Also, the clients don't host any data, so they'd have to fetch what data they need from the servers potentially.
If you want something to run on all or some connected clients, you could implement this yourself using the publish/subscribe mechanism. A payload could be sent to an ITopic with the necessary execution parameters, and clients listening can act on the message.
You can also create a Near Cache on client side and use JDK’s ExecutorService that runs in your local jvm app.

How to handle global resources in Spring State Machine?

I am thinking of using Spring State Machine for a TCP client. The protocol itself is given and based on proprietary TCP messages with message id and length field. The client sets up a TCP connection to the server, sends a message and always waits for the response before sending the next message. In each state, only certain responses are allowed. Multiple clients must run in parallel.
Now I have the following questions related to Spring State machine.
1) During the initial transition from disconnected to connected the client sets up a connection via java.net.Socket. How can I make this socket (or the DataOutputStream and BufferedReader objects got from the socket) available to the actions of the other transitions?
In this sense, the socket would be some kind of global resource of the state machine. The only way I have seen so far would be to put it in the message headers. But this does not look very natural.
2) Which runtime environment do I need for Spring State Machine?
Is a JVM enough or do I need Tomcat?
Is it thread-safe?
Thanks, Wolfgang
There's nothing wrong using event headers but those are not really global resources as header exists only for duration of a event processing. I'd try to add needed objects into an machine's extended state which is then available for all actions.
You need just JVM. On default machine execution is synchronous so there should not be any threading issues. Docs have notes if you want to replace underlying executor asynchronous(this is usually done if multiple concurrent regions are used).

Resources