How to set up clustering of queue servers for beanstalkd? - laravel

We have two queue servers, both attached to the application. So far Server 1 receives all the queued jobs and processes them. I would like to set up a cluster so that the load is spread across the two servers. Can anyone suggest how to set up such a cluster?
Thanks.

Beanstalkd doesn't offer this feature.
Alternatives are:
you set up soft sharding in your application to route jobs to queue server A or B (see the sketch below)
you switch to an alternative backend such as Redis Queue, or Cloud Pub/Sub from Google Cloud Platform
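For the soft-sharding option, the routing logic can be as small as hashing a stable job attribute and picking one of the two beanstalkd servers. Here is a minimal sketch of that decision (shown in Java for concreteness; the host names are placeholders, and in a Laravel app the same idea maps to choosing between two beanstalkd connections defined in config/queue.php):

```java
import java.util.List;

public class QueueShardRouter {

    // Placeholder beanstalkd servers; replace with your two queue hosts.
    private static final List<String> HOSTS =
            List.of("queue-a.internal:11300", "queue-b.internal:11300");

    /** Pick a queue server deterministically from a stable job key. */
    static String shardFor(String jobKey) {
        int index = Math.floorMod(jobKey.hashCode(), HOSTS.size());
        return HOSTS.get(index);
    }

    public static void main(String[] args) {
        // Jobs with the same key always land on the same server;
        // different keys spread roughly evenly across the two servers.
        System.out.println(shardFor("user-42"));
        System.out.println(shardFor("user-1337"));
    }
}
```

The trade-off is that each server still holds an un-replicated subset of the jobs, so sharding spreads the load but does not add redundancy.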

A single-instance beanstalkd setup can support many concurrent TCP connections and generally outperforms Redis. Below are a few benchmarks, though benchmarks are always somewhat subjective.
Benchmark references:
https://ph4r05.deadcode.me/blog/2017/12/16/laravel-queueing-benchmark.html
https://adam.herokuapp.com/past/2010/4/24/beanstalk_a_simple_and_fast_queueing_backend/
So vertical scaling is usually sufficient.
The problem, however, is availability when the single beanstalkd instance goes down.
You can check out coolbeans (currently in alpha), which provides a replicated beanstalkd: https://github.com/1xyz/coolbeans

Related

Scaling a microservice with frontend and backend instances

I am developing a series of microservices using Spring Boot and plan to deploy them on Kubernetes.
Some of the microservices are composed of an API which writes messages to a Kafka queue and a listener which listens to the queue and performs the relevant actions (e.g. write to the DB, construct messages for onward processing).
These services work fine locally but I am planning to run multiple instances of the microservice on Kubernetes. I'm thinking of the following options:
Run multiple instances as is (i.e. each microservice serves as an API and a listener).
Introduce a FRONTEND, BACKEND environment variable. If the FRONTEND variable is true, do not configure the listener process. If the BACKEND variable is true, configure the listener process.
This way I can scale how many frontend/backend services I need and also have the benefit of shutting down the backend services without losing requests.
Any pointers, best practice or any other options would be much appreciated.
You can do as you describe, with environment variables, or you may also be interested in building your app with different profiles/bean configurations and making two different images.
In both cases, you should use two different Kubernetes Deployments so you can scale and configure them independently.
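For the environment-variable route, Spring Boot can switch the listener on and off with a property-based condition, so the frontend and backend Deployments run the same image but only the backend registers the Kafka consumer. A minimal sketch, assuming spring-kafka is on the classpath; the property name, topic and group id here are made up for illustration:

```java
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// Only instantiated when backend.listener.enabled=true, e.g. injected as the
// environment variable BACKEND_LISTENER_ENABLED=true in the backend Deployment.
@Component
@ConditionalOnProperty(name = "backend.listener.enabled", havingValue = "true")
public class OrderListener {

    @KafkaListener(topics = "orders", groupId = "order-workers")
    public void onOrder(String message) {
        // write to the DB, construct messages for onward processing, etc.
    }
}
```

The profile variant is the same idea with @Profile("backend") instead of the property condition; either way the two Deployments differ only in the environment (or profile) they inject.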
You may also be interested in a leader-election pattern, where you want only one active replica, if it only makes sense for a single replica to process the events from the queue. Depending on your availability requirements, this can also be solved by simply running a single replica.

Understanding the MajorDomo Pattern from NetMQ ZeroMQ

I am trying to understand how to best implement the MDP example in C# to be used in a Windows service in a multiple-client, single-server environment.
I have read the docs but I am still unclear on the following:
Should all Worker instances be created on startup and left to run?
Should the Workers all be different types of services or just different instances of the same service?
Can I have one Windows service which contains the Broker and Workers, or is it best to split them out into their own services?
The example code I am using is the MajorDomo Pattern taken from here https://github.com/NetMQ/Samples
Yes, all workers in an MDP environment should be created independently of the requests, since the broker should not know how to create them.
Each worker handles a given "service" (contract). Obviously each contract should have at least one worker.
If you need parallelized handling of requests and a given worker can only do one at a time, having extra workers for that service could make sense. Generally, though, you would do this when multiple machines are involved (horizontal scaling).
You can have the broker and workers in the same process. HOWEVER, if you want to update only a worker, taking down the broker at the same time can be annoying for the clients. I would recommend letting the broker be its own process, with the workers in one or more other processes.

Vertx clustering alternative

Does anyone with real-world experience of Vertx cluster managers other than Hazelcast have advice on our requirement below?
For our (real-time sensor data) system we have hundreds of verticles in multiple JVMs, but we do not need, or want, the eventbus to span multiple physical servers.
We're running Vertx on multiple servers, but our platform is less complex if we don't pool a single eventbus between all of them (we prefer to be explicit about passing messages between servers).
Hazelcast is the wrong cluster manager for us. We don't need its peer discovery between servers, and crucially, any release change of Hazelcast means that new clients cannot join a cluster whose existing clients are running the previous version. So bringing one new verticle compiled with Vertx 3.6.3 into an existing cluster is not possible unless we stop the entire cluster and restart it with all the verticles recompiled to 3.6.3. This seriously impacts our development. It's helpful for the verticles to be more plug-and-play, and Vertx can do that, but Hazelcast can't (due to constant version incompatibilities).
Can anyone recommend a vertx cluster manager that fits our use case?
I've now had time to review each of the alternatives Vertx directly supports as a 'cluster manager' (Hazelcast, Zookeeper, Ignite, Infinispan) and we're proceeding with a Zookeeper architecture for our system, replacing Hazelcast:
Here's the background to our decision:
We started as a fairly typical (if there is such a thing) Vertx development with multiple verticles in a JVM responding to external events (urban sensor data entering our java/vertx feed handlers) published on the eventbus and the data being processed asynchronously in many other vertx verticles, often involving them publishing new derived data as new asynchronous messages.
Quite quickly we wanted to use multiple JVMs, mainly to isolate the feedhandlers from the rest of the code so that if things broke the feedhandlers would keep running (as a failsafe they persist the data as well as publishing it). So we added (easily) Vertx clustering so the JVMs on the same machine could communicate and all verticles could publish/subscribe messages in the same system. We used the default cluster manager, Hazelcast, and modified the config so the vertx clustering is limited to the single server (we run multiple versions of the entire platform on different servers and don't want them confusing each other). We have hundreds of verticles in half-a-dozen JVMs.
Our environment (search SmartCambridge vertx) is fairly dynamic with rapid development cycles (e.g. creating a new feedhandler and having it publish its data on the eventbus), which means we commonly want to start up a JVM containing these new verticles and have it join an existing vertx cluster, maybe permanently, maybe just for a while. Vertx/Hazelcast treats joining a (vertx) cluster as a fairly serious operation: Hazelcast has (I believe) a concept of Hazelcast cluster members and Hazelcast clients, where clients can come and go easily, but joining a Hazelcast cluster as a member requires considerable code compatibility between the existing cluster and the new member. Each time we upgraded our Vertx library the Hazelcast library version would change, and this made it impossible for a newly compiled vertx verticle to join an existing vertx cluster.
Note we have experimented with having the Vertx eventbus flow between multiple servers, and also extend the eventbus into the browser/javascript, but in both cases have found it simpler/more robust to be explicit about routing messages from server to server and have written verticles specifically for that purpose.
So the new plan (after several years of Vertx development), given our environment of 5 production/development servers but with the vertx eventbus always limited to single servers, is to implement a single Zookeeper cluster across all 5 servers so we get the Zookeeper native resilience goodness, and configure each production server to use a different znode root (the default is 'io.vertx' but this is a simple config option).
This design has an attractively simple minimum build on a single server (i.e. Zookeeper + Vertx), so ad-hoc development on a random machine (e.g. a laptop) is still possible, but we can extend our platform to have multiple servers in a single vertx cluster trivially by setting a common znode root.
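For reference, the znode-root isolation described above comes down to a small cluster-manager config. A minimal sketch, assuming Vert.x 3.x with the vertx-zookeeper module on the classpath; the host names and the rootPath value are placeholders (the config keys follow vertx-zookeeper's default zookeeper.json):

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.json.JsonObject;
import io.vertx.spi.cluster.zookeeper.ZookeeperClusterManager;

public class ClusteredMain {
    public static void main(String[] args) {
        // Same 5-node ensemble on every server, but a per-server rootPath
        // keeps each production server's eventbus cluster separate.
        JsonObject zkConfig = new JsonObject()
                .put("zookeeperHosts", "zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181")
                .put("rootPath", "io.vertx.server-03") // default is "io.vertx"
                .put("retry", new JsonObject()
                        .put("initialSleepTime", 3000)
                        .put("maxTimes", 3));

        ZookeeperClusterManager mgr = new ZookeeperClusterManager(zkConfig);

        Vertx.clusteredVertx(new VertxOptions().setClusterManager(mgr), res -> {
            if (res.succeeded()) {
                Vertx vertx = res.result();
                // deploy feed-handler and processing verticles here
                System.out.println("Joined the vertx cluster for this server");
            } else {
                res.cause().printStackTrace();
            }
        });
    }
}
```

Verticles that should share an eventbus simply use the same rootPath; giving each server its own rootPath keeps its cluster isolated even though all servers talk to the one Zookeeper ensemble.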

Should Apache Kafka and Apache Hadoop share the same ZooKeeper instance?

Is it possible to use the same ZooKeeper instance for coordinating Apache Kafka and Apache Hadoop clusters? If yes, what would be the appropriate configuration of ZooKeeper?
Thanks!
Yes. As far as my understanding goes, ideally there should be a single ZooKeeper cluster, with dedicated machines, managing the coordination between the different applications in a distributed system. I will try to share a few points here.
A ZooKeeper cluster consisting of several servers is typically called an ensemble; it basically tracks and shares the state of your applications. For example, Kafka uses it to commit offset changes so that in case of failure it can identify where to start again. From the doc page:
Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts (an ensemble). Whenever a change is made, it is not considered successful until it has been written to a quorum (at least half) of the servers in the ensemble.
Now imagine Kafka and Hadoop each have a dedicated cluster of 3 ZooKeeper servers. If a couple of nodes go down in either of the two clusters, it results in a service failure (ZK works on simple majority voting, so a 3-node ensemble tolerates 1 node failure and keeps the service alive, but not 2). If instead there is one single cluster of 5 ZK servers managing both applications and two of its nodes go down, the service is still available. Not only does this offer better reliability, it also reduces hardware expenses, as instead of managing 6 servers you only have to take care of 5.
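If you do share one ensemble, the usual way to keep the two applications apart is to give each its own path inside ZooKeeper rather than the root. A sketch with hypothetical host names; Kafka supports a chroot suffix on zookeeper.connect, and Hadoop's HA components have their own parent-znode setting:

```
# Kafka broker (server.properties): shared 5-node ensemble, isolated under /kafka
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181/kafka

# Hadoop NameNode HA (core-site.xml, shown as key = value for brevity):
# ha.zookeeper.quorum = zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181
# ha.zookeeper.parent-znode = /hadoop-ha   (this is already the default)
```

ZooKeeper itself needs no application-specific configuration beyond sizing the ensemble (e.g. the 5 servers discussed above) and the usual zoo.cfg settings (dataDir, clientPort, server.N entries).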

Solution/Architecture: queues or something else?

I have multiple frontends to my service written in Node.js and workers written in Ruby. Now the question is how to make those communicate? I need to maintain a dynamic pool of workers to handle load (spawn more workers when load rises), and messages are quite big, ~2-3 MB, because I'm sending images uploaded by users through the Node.js frontends to the workers. Because I want nice scaling I thought about some queuing solution, but I didn't find any existing solutions (or I misunderstood the guides) that provide:
Fallback mechanisms. Solutions I've found so far have a single point of failure (the message broker) and no way to provide fallbacks.
Serialization, so that when the broker fails, tasks are not lost.
Ability to pass big messages.
Easy API for Ruby and Node.js
Some API to track queue size so I could rearrange workers pool.
Preferably lightweight.
Maybe my approach is wrong? Maybe I shouldn't use queues but some other way? Or there's some queueing solution that fits requirements above?
No doubt you require a queue to scale, and you can monitor this queue to spawn workers.
Apache ActiveMQ is very robust and supports a REST protocol. A Ruby client is also available to access the queue.
There is an interesting article on building a RESTful queue using Apache ActiveMQ.
At the end of the day I went with a ZeroMQ queue solution. Very fast, robust and lightweight. I had to write my own broker, but that's the only con of this solution.
Redis publish/subscribe should do the trick:
http://redis.io/topics/pubsub
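To make the pub/sub suggestion concrete, here is a minimal sketch using the Jedis client (Java is used here only for illustration; the Node.js and Ruby Redis clients expose the same PUBLISH/SUBSCRIBE commands). The channel name and payload are made up, and note that plain pub/sub is fire-and-forget, so on its own it does not cover the "tasks are not lost" requirement:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class PubSubSketch {
    public static void main(String[] args) throws InterruptedException {
        // Worker side: blocks and handles each message as it is published,
        // then keeps running and waiting for more jobs.
        new Thread(() -> {
            try (Jedis subscriber = new Jedis("localhost", 6379)) {
                subscriber.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        System.out.println("processing job: " + message);
                    }
                }, "image-jobs"); // hypothetical channel name
            }
        }).start();

        Thread.sleep(500); // pub/sub only delivers to subscribers connected at publish time

        // Frontend side: publish a job reference (for 2-3 MB images, pass a
        // pointer to the stored file rather than the raw bytes).
        try (Jedis publisher = new Jedis("localhost", 6379)) {
            publisher.publish("image-jobs", "uploads/image-123.jpg");
        }
    }
}
```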
