Solution/Architecture: queues or something else? - ruby

I have multiple frontends to my service written in Node.js and workers written in Ruby. Now the question is how to make them communicate. I need to maintain a dynamic pool of workers to handle load (spawning more workers when load rises), and the messages are quite big, ~2-3 MB, because I'm sending images uploaded by users through the Node.js frontends to the workers. Because I want nice scaling I thought about a queuing solution, but I didn't find any existing solution (or I misunderstood the guides) that provides:
Fallback mechanisms. The solutions I've found so far have a single point of failure, the message broker, and no way to provide fallbacks.
Serialization, so that tasks are not lost when the broker fails.
Ability to pass big messages.
Easy API for Ruby and Node.js.
Some API to track queue size so I can resize the worker pool.
Preferably lightweight.
Maybe my approach is wrong? Maybe I shouldn't use queues but some other way? Or there's some queueing solution that fits requirements above?

No doubt you need a queue to scale, and you can monitor this queue to spawn "workers".
Apache ActiveMQ is very robust and supports a REST protocol. A Ruby client is also available to access the queue.
Interesting article on RESTful queue using Apache ActiveMQ
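The "monitor the queue to spawn workers" idea can be sketched as a small sizing function. This is only an illustration in Go and is not part of ActiveMQ or any client library: desiredWorkers, perWorker, minWorkers and maxWorkers are hypothetical names, and the queue depth is assumed to come from whatever monitoring API your broker exposes.

```go
package main

import "fmt"

// desiredWorkers returns how many workers to run for a given queue depth.
// perWorker is the backlog one worker is expected to absorb (an assumed
// tuning knob); the result is clamped to [minWorkers, maxWorkers].
func desiredWorkers(queueDepth, perWorker, minWorkers, maxWorkers int) int {
	if perWorker <= 0 {
		perWorker = 1
	}
	// Ceiling division: 11 queued jobs at 10 per worker needs 2 workers.
	n := (queueDepth + perWorker - 1) / perWorker
	if n < minWorkers {
		n = minWorkers
	}
	if n > maxWorkers {
		n = maxWorkers
	}
	return n
}

func main() {
	// Hypothetical queue depths, as if polled from the broker.
	for _, depth := range []int{0, 11, 500} {
		fmt.Printf("depth=%d -> workers=%d\n", depth, desiredWorkers(depth, 10, 1, 20))
	}
}
```

A supervisor process could call this periodically and start or stop workers until the running count matches the result.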

At the end of the day I went with a ZeroMQ queue solution. It is a very fast, robust and lightweight implementation. I had to write my own broker, but that's the only downside of this solution.

redis publish/subscribe should do the trick
http://redis.io/topics/pubsub

Related

Notifying golongpoll.SubscriptionManager of an event from kafka-go

I was writing a POC on long-polling using Go.
I see the general package to use is https://github.com/jcuga/golongpoll .
But suppose I want to publish an event to the golongpoll.SubscriptionManager from a general context, for example when the long-poll API request is being served by one machine while the Kafka event for that particular consumer group is consumed by another instance in the cluster. How should that be done?
The examples given in the documentation do not cover such a scenario at all, even though it seems like a common one. One way I can think of is to put a distributed cache like Redis in between and have all the services poll it for changes. But that sounds a bit dumb to me.

MassTransit Multiple Consumers

I have an environment where I have only one app server. I have some messages that take a while to service (like 10 seconds or so) and I'd like to increase throughput by configuring multiple instances of my consumer application running code to process these messages. I've read about the "competing consumer" pattern and gather that this should be avoided when using MassTransit. According to the MassTransit docs here, each receive endpoint should have a unique queue name. I'm struggling to understand how to map this recommendation to my environment. Is it possible to have N instances of consumers running that each receive the same message, but only one of the instances will actually act on it? In other words, can we implement the "competing consumer" pattern but across multiple queues instead of one?
Or am I looking at this wrong? Do I really need to look into the "Send" method as opposed to "Publish"? The downside with "Send" is that it requires the sender to have direct knowledge of the existence of an endpoint, and I want to be dynamic with the number of consumers/endpoints I have. Is there anything built in to MassTransit that could help with the keeping track of how many consumer instances/queues/endpoints there are that can service a particular message type?
Thanks,
Andy
The "avoid competing consumers" guidance dates from when MSMQ was the primary transport. MSMQ would fall over if multiple threads were reading from the queue.
If you are using RabbitMQ, then competing consumers work brilliantly. Competing consumers is the right answer. Each competing consumer will use the same "receive from" endpoint.
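To make the competing consumer pattern concrete, here is a minimal in-process sketch in Go. It uses a channel as a stand-in for the shared queue and is not MassTransit or RabbitMQ code; the consume function and its message names are made up for illustration. The point it shows is that several consumers read from one queue and each message is handled by exactly one of them.

```go
package main

import (
	"fmt"
	"sync"
)

// consume starts n consumers that all read from one shared queue and
// returns how many times each message was delivered.
func consume(messages []string, n int) map[string]int {
	queue := make(chan string)
	delivered := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Receiving from the shared channel hands each message
			// to exactly one of the competing consumers.
			for msg := range queue {
				mu.Lock()
				delivered[msg]++
				mu.Unlock()
			}
		}()
	}
	for _, m := range messages {
		queue <- m
	}
	close(queue) // no more messages; consumers exit their loops
	wg.Wait()
	return delivered
}

func main() {
	counts := consume([]string{"msg-1", "msg-2", "msg-3", "msg-4"}, 3)
	fmt.Println(counts)
}
```

With a real broker the instances would be separate processes pointed at the same queue name, but the delivery semantics are the same idea.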

Golang background processing

How can one do background processing/queueing in Go?
For instance, a user signs up, and you send them a confirmation email - you want to send the confirmation email in the background as it may be slow, and the mail server may be down etc etc.
In Ruby a very nice solution is DelayedJob, which queues your job to a relational database (i.e. simple and reliable), and then uses background workers to run the tasks, and retries if the job fails.
I am looking for a simple and reliable solution, not something low level if possible.
While you could just open a goroutine and do every async task you want, this is not a great solution if you want reliability, i.e. the promise that if you trigger a task it will get done.
If you really need this to be production grade, opt for a distributed work queue. I don't know of any such queues that are specific to golang, but you can work with rabbitmq, beanstalk, redis or similar queuing engines to offload such tasks from your process and add fault tolerance and queue persistence.
A simple goroutine can do the job:
http://golang.org/doc/effective_go.html#goroutines
Open a goroutine for the email delivery and then respond to the HTTP request or whatever.
If you wish to use a work queue, you can use a RabbitMQ or Beanstalk client like:
https://github.com/streadway/amqp
https://github.com/kr/beanstalk
Or maybe you can create a queue in your process, with a FIFO queue running in a goroutine:
https://github.com/iNamik/go_container
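As a rough sketch of the in-process option, a buffered channel drained by one goroutine already gives FIFO behaviour, and a bounded retry loop covers the "mail server may be down" case. This uses only the standard library, not go_container; Job, Queue, NewQueue and ErrMailServerDown are hypothetical names. Note that jobs live in memory only, so they are lost if the process dies, which is exactly the reliability gap a persistent queue closes.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMailServerDown is a placeholder error used by the example jobs.
var ErrMailServerDown = errors.New("mail server down")

// Job is a unit of background work, e.g. sending a confirmation email.
type Job struct {
	Name string
	Run  func() error
}

// Queue runs jobs one at a time, in FIFO order, on a single goroutine.
type Queue struct {
	jobs   chan Job
	done   chan struct{}
	Failed []string // names of jobs that exhausted their attempts; read after Close
}

// NewQueue starts the worker goroutine. size is the channel buffer and
// maxAttempts is how many times each job is tried before giving up.
func NewQueue(size, maxAttempts int) *Queue {
	q := &Queue{jobs: make(chan Job, size), done: make(chan struct{})}
	go func() {
		defer close(q.done)
		for job := range q.jobs {
			var err error
			for attempt := 1; attempt <= maxAttempts; attempt++ {
				if err = job.Run(); err == nil {
					break
				}
			}
			if err != nil {
				q.Failed = append(q.Failed, job.Name)
			}
		}
	}()
	return q
}

// Push enqueues a job; it blocks if the buffer is full.
func (q *Queue) Push(j Job) { q.jobs <- j }

// Close stops accepting jobs and waits for the queued ones to finish.
func (q *Queue) Close() {
	close(q.jobs)
	<-q.done
}

func main() {
	attempts := 0
	q := NewQueue(8, 3)
	q.Push(Job{Name: "welcome-email", Run: func() error {
		attempts++
		if attempts < 3 {
			return ErrMailServerDown // fail the first two attempts
		}
		return nil
	}})
	q.Push(Job{Name: "always-failing", Run: func() error { return ErrMailServerDown }})
	q.Close()
	fmt.Println("attempts:", attempts, "failed:", q.Failed)
}
```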
But maybe the best solution is this job queue library, which lets you set the concurrency limit, etc.:
https://github.com/otium/queue
    import "github.com/otium/queue"

    // The second argument (20) is the concurrency limit.
    q := queue.NewQueue(func(email string) {
        // Your mail delivery code
    }, 20)
    q.Push("foo@bar.com")
I have created a library for running asynchronous tasks using a message queue (currently RabbitMQ and Memcache are supported brokers but other brokers like Redis or Cassandra could easily be added).
You can take a look. It might be good enough for your use case (and it also supports chaining and workflows).
https://github.com/RichardKnop/machinery
It is an early stage project though.
You can also use goworker library to schedule jobs.
http://www.goworker.org/
If you are coming from Ruby background and looking for something like Sidekiq, Resque, or DelayedJob, please check out the library asynq.
Queue semantics are very similar to sidekiq.
https://github.com/hibiken/asynq
If you want a robust library with a very simple, Go-like interface that uses Redis as its backend and RabbitMQ as its message broker, you can try
https://github.com/Joker666/cogman

Web server and ZeroMQ patterns

I am running an Apache server that receives HTTP requests and connects to a daemon script over ZeroMQ. The script implements the Multithreaded Server pattern (http://zguide.zeromq.org/page:all#header-73), it successfully receives the request and dispatches it to one of its worker threads, performs the action, responds back to the server, and the server responds back to the client. Everything is done synchronously as the client needs to receive a success or failure response to its request.
As the number of users is growing into a few thousand, I am looking into potentially improving this. The first thing I looked at is the different patterns of ZeroMQ, and whether what I am using is optimal for my scenario. I've read the guide but I find it challenging to understand all the details and differences across patterns. I was looking for example at the Load Balancing Message Broker pattern (http://zguide.zeromq.org/page:all#header-73). It seems quite a bit more complicated to implement than what I am currently using, and if I understand things correctly, its advantages are:
Actual load balancing vs the round-robin task distribution that I currently have
Asynchronous requests/replies
Is that everything? Am I missing something? Given the description of my problem, and the synchronous requirement of it, what would you say is the best pattern to use? Lastly, how would the answer change, if I want to make my setup distributed (i.e. having the Apache server load balance the requests across different machines). I was thinking of doing that by simply creating yet another layer, based on the Multithreaded Server pattern, and have that layer bridge the communication between the web server and my workers.
Some thoughts about the subject...
Keep it simple
I would try to keep things simple and stick with "plain" ZeroMQ as long as possible. To increase performance, I would simply change your backend script to send requests out from a DEALER socket and move the request handling code to its own program. Then you could just run multiple worker servers on different machines to get more requests handled.
I assume this was the approach you took:
I was thinking of doing that by simply creating yet another layer, based on the Multithreaded Server pattern, and have that layer bridge the communication between the web server and my workers.
The only problem here is that there is no request retry in the backend. If a worker fails to handle a given task, it is lost forever. However, one could write the worker servers so that they handle all the requests they have received before shutting down. With this kind of setup it is possible to update backend workers without clients noticing any outage. This will not save requests that get lost if the server crashes.
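The "handle everything already received before shutting down" behaviour could look roughly like the following. It is a plain Go sketch with no ZeroMQ sockets involved; Worker, NewWorker, Accept and Shutdown are hypothetical names, and in a real worker Accept would be driven by messages arriving on the socket.

```go
package main

import (
	"fmt"
	"sync"
)

// Worker handles requests concurrently and can be drained before exit.
type Worker struct {
	mu       sync.Mutex
	stopping bool
	inFlight sync.WaitGroup
	handle   func(req string)
}

// NewWorker returns a worker that calls handle for each accepted request.
func NewWorker(handle func(req string)) *Worker {
	return &Worker{handle: handle}
}

// Accept starts handling req unless shutdown has begun. It reports
// whether the request was accepted.
func (w *Worker) Accept(req string) bool {
	w.mu.Lock()
	if w.stopping {
		w.mu.Unlock()
		return false
	}
	// Register the request while holding the lock so that Shutdown
	// cannot miss it.
	w.inFlight.Add(1)
	w.mu.Unlock()

	go func() {
		defer w.inFlight.Done()
		w.handle(req)
	}()
	return true
}

// Shutdown refuses new requests, then blocks until every accepted
// request has been handled.
func (w *Worker) Shutdown() {
	w.mu.Lock()
	w.stopping = true
	w.mu.Unlock()
	w.inFlight.Wait()
}

func main() {
	results := make(chan string, 3)
	w := NewWorker(func(req string) { results <- "handled " + req })
	for _, r := range []string{"r1", "r2", "r3"} {
		w.Accept(r)
	}
	w.Shutdown() // returns only after r1..r3 have been handled
	fmt.Println(len(results), "handled; late request accepted:", w.Accept("r4"))
}
```

As the answer says, this only covers orderly restarts. A crash still loses whatever was in flight.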
I have the feeling that in common scenarios this kind of approach would be more than enough.
Mongrel2
Mongrel2 seems to handle quite a few of the things you have already implemented. It might be worthwhile to check it out. It probably does not completely solve your problems, but it provides tested infrastructure for distributing the workload. This could be used to deliver the requests to multithreaded servers running on different machines.
Broker
One solution to increase the robustness of the setup is a broker. In this scenario the broker's main role would be to provide robustness by implementing a queue for the requests. I understood that all the requests the workers handle are basically of the same type. If requests had different types, the broker could also do lookups to find the correct server for each request.
Using the queue provides a way to ensure that every request is handled by some worker even if worker servers have crashed. This does not come without a price. The broker is itself a single point of failure. If it crashes or is restarted, all messages could be lost.
These problems can be avoided, but that requires quite a lot of work: the requests could be persisted to disk, and servers could be clustered. The need has to be weighed against the payoff. Does one want to spend time writing a message broker or the actual system?
If a message broker seems like a good idea, the time required to implement one can be reduced by using an existing product (like RabbitMQ). The downside is that there could be a lot of unwanted features, and adding new things is not as straightforward as with a self-made broker.
Writing your own broker could amount to reinventing the wheel. Many brokers provide similar things: security, logging, a management interface and so on. It seems likely that these will eventually be needed in a home-made solution too. But if not, then a single home-made broker which does one thing and does it well can be a good choice.
Even if a broker product is chosen, I think it is a good idea to hide the broker behind a ZeroMQ proxy, dedicated code that sends messages to and receives messages from the broker. Then no other part of the system has to know anything about the broker, and it can easily be replaced.
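The "nothing else knows about the broker" idea can be illustrated with a small abstraction layer. Note the simplification: this sketch uses an in-process Go interface rather than a separate ZeroMQ proxy process, and Broker, memoryBroker and handleNext are hypothetical names. A RabbitMQ-backed or ZeroMQ-backed type satisfying the same interface could be swapped in without touching the calling code.

```go
package main

import "fmt"

// Broker is the only queue API the rest of the system sees.
type Broker interface {
	Publish(queue string, body []byte) error
	// Receive returns the next message, or ok=false if the queue is empty.
	Receive(queue string) (body []byte, ok bool, err error)
}

// memoryBroker is a stand-in implementation for the example.
// It is not safe for concurrent use.
type memoryBroker struct {
	queues map[string][][]byte
}

func newMemoryBroker() *memoryBroker {
	return &memoryBroker{queues: make(map[string][][]byte)}
}

func (b *memoryBroker) Publish(queue string, body []byte) error {
	b.queues[queue] = append(b.queues[queue], body)
	return nil
}

func (b *memoryBroker) Receive(queue string) ([]byte, bool, error) {
	msgs := b.queues[queue]
	if len(msgs) == 0 {
		return nil, false, nil
	}
	b.queues[queue] = msgs[1:]
	return msgs[0], true, nil
}

// handleNext is application code: it depends only on the Broker
// interface, never on a concrete broker product.
func handleNext(b Broker) (string, error) {
	body, ok, err := b.Receive("requests")
	if err != nil || !ok {
		return "", err
	}
	return "processed " + string(body), nil
}

func main() {
	var b Broker = newMemoryBroker()
	if err := b.Publish("requests", []byte("resize image 42")); err != nil {
		fmt.Println("publish failed:", err)
		return
	}
	out, _ := handleNext(b)
	fmt.Println(out)
}
```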
Using a broker is somewhat heavy on developer time. You either need time to implement the broker or time to get used to some product. I would avoid this route until it is clearly needed.
Some links
Comparison between broker and brokerless
RabbitMQ
Mongrel2

what is a good work queue for cross platform usage?

Scenario:
In a web application some parts are implemented in PHP and others in node.js. Communication between PHP and node.js should happen via an asynchronous queue/worker system.
In the PHP part of the application, API requests should be queued. In the node.js part, queued API requests should be processed (worker). Results should be saved back to the queue. Later the results should be retrieved using PHP. The queue should support retry strategies and notification (to the client) on completed requests.
Question:
I do not want to implement the queue on my own. The work queue itself should not run in PHP, because I do not want long-running PHP processes.
I found the work queues
beanstalkd
resque
celery
rabbitmq
Are they suitable for this scenario? Resque looks great. However, can a PHP client work together with a Ruby queue? Does anybody have experience with something similar? Can workers write results back to the work queue? Can clients be notified of results?
After doing a lot of research I am using RabbitMQ.
There are "official" client libraries for multiple platforms out there, so subsystems running on different platforms can work together quite simply.
There are PHP forks of Resque out there, but I do like the RabbitMQ way: one message broker, good documentation, "official" client libraries.
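On the "retry strategies" requirement from the question: whichever broker is used, the worker side usually needs some rule for how long to wait before re-queuing a failed request. One common choice is capped exponential backoff, sketched below in Go. This is an assumption about what a reasonable strategy might look like, not something RabbitMQ provides under this name; retryDelayMs and its parameters are hypothetical.

```go
package main

import "fmt"

// retryDelayMs returns the delay in milliseconds before the given retry
// attempt (1-based). The delay doubles with each attempt, starting at
// baseMs, and never exceeds maxMs.
func retryDelayMs(attempt, baseMs, maxMs int) int {
	delay := baseMs
	for i := 1; i < attempt; i++ {
		delay *= 2
		if delay >= maxMs {
			return maxMs // stop doubling once the cap is reached
		}
	}
	if delay > maxMs {
		return maxMs
	}
	return delay
}

func main() {
	for attempt := 1; attempt <= 6; attempt++ {
		fmt.Printf("attempt %d: wait %d ms\n", attempt, retryDelayMs(attempt, 500, 8000))
	}
}
```

A worker would re-publish the failed request after this delay and give up (or move it to a separate failure queue) after some maximum number of attempts.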

Resources