How does Celery work? - multiprocessing

I have recently started working on distributed computing for increasing the computation speed. I opted for Celery. However, I am not very familiar with some terms. So, I have several related questions.
From the Celery docs:
What's a Task Queue?
...
Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task the client adds a message to the queue, the broker then delivers that message to a worker.
What are clients (here)? What is a broker? Why are messages delivered through a broker? Why would Celery use a backend and queues for interprocess communication?
When I execute the Celery console by issuing the command
celery worker -A tasks --loglevel=info --concurrency 5
Does this mean that the Celery console is a worker process which is in charge of 5 different processes and keeps track of the task queue? When a new task is pushed into the task queue, does this worker assign the task/job to any of the 5 processes?

Last question first:
celery worker -A tasks --loglevel=info --concurrency 5
You are correct - the worker controls 5 processes. The worker distributes tasks among the 5 processes.
A "client" is any code that runs celery tasks asynchronously.
There are two kinds of communication. When you call apply_async, you send a task request to a broker (most commonly RabbitMQ), which is basically a set of message queues.
When the workers finish, they put their results into the result backend.
The broker and the result backend are quite separate and need different kinds of software to work optimally.
You can use RabbitMQ for both, but beyond a certain message rate it stops working well. The most common combination is RabbitMQ as the broker and Redis as the result backend.
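To make the flow concrete, here is a minimal sketch using only the Python standard library. It is not how Celery is implemented: a queue.Queue stands in for the broker, a dict stands in for the result backend, and threads stand in for the worker's pool processes (Celery's default pool uses processes). The names run_demo and square are hypothetical.

```python
import queue
import threading


def run_demo(concurrency=5, numbers=range(10)):
    broker = queue.Queue()   # stand-in for the broker's message queue
    result_backend = {}      # stand-in for the result backend
    lock = threading.Lock()

    def square(x):           # hypothetical task body
        return x * x

    def pool_process():
        # Each pool member pulls messages until it sees the shutdown sentinel.
        while True:
            message = broker.get()
            if message is None:
                break
            task_id, arg = message
            value = square(arg)
            with lock:
                result_backend[task_id] = value

    pool = [threading.Thread(target=pool_process) for _ in range(concurrency)]
    for member in pool:
        member.start()

    # The "client": any code that enqueues task messages.
    task_ids = []
    for n in numbers:
        task_id = f"task-{n}"
        broker.put((task_id, n))
        task_ids.append(task_id)

    # One sentinel per pool member, queued after all the real tasks.
    for _ in pool:
        broker.put(None)
    for member in pool:
        member.join()

    # The client reads results from the backend, not from the broker.
    return [result_backend[task_id] for task_id in task_ids]


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is the separation: task requests travel through one channel (the queue) and results are stored in another (the dict), which mirrors the broker/result backend split described above.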

We can use the analogy of an assembly line in a packaging factory to understand how Celery works.
Each product is placed on a conveyor belt.
The products are processed by machines.
At the end, all the processed products are stored in one place, one by one.
How Celery works:
Note: Instead of each product being picked up for processing as soon as it is placed on the conveyor belt, Celery maintains a queue whose output is fed to workers for execution, one task at a time (sometimes more than one queue is maintained).
Each request (which is a task) is sent to a queue (Redis/RabbitMQ) and an acknowledgment is sent back.
Each task is assigned to a specific worker, which executes it.
Once the worker has finished the task, its output is stored in the result backend (Redis).

Related

persist celery events while receiver is down

What happens to celery's events when my receiver is down?
According to the documentation (https://docs.celeryproject.org/en/latest/userguide/monitoring.html#real-time-processing), I need to run a separate process that listens for Celery events and processes them.
But if I have to shut down the receiver process for maintenance or some other reason, are all events lost forever?
Can I persist these events?
Short answer: it depends on your broker choice.
Long answer: the three most popular brokers with Celery are RabbitMQ, Redis, and SQS. Each one offers some degree of persistence. RabbitMQ and SQS are message queueing services that offer delivery guarantees (in practice usually at-least-once rather than strictly "once and only once"). The default Redis configuration keeps messages in RAM and saves them to disk after a maximum of fifteen minutes, so if Redis shuts down within that fifteen-minute window, you will lose messages/tasks.
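The snapshot window can be illustrated with a toy model. This is only a sketch of the idea "in memory now, on disk at the next snapshot"; it is not Redis's actual persistence logic, and the class SnapshottingQueue and its methods are hypothetical names.

```python
class SnapshottingQueue:
    """Append-only in-memory queue that is copied to 'disk' only at snapshot time."""

    def __init__(self, snapshot_interval):
        self.snapshot_interval = snapshot_interval  # seconds between snapshots
        self.memory = []        # what the running server holds
        self.disk = []          # what survives a crash
        self.last_snapshot = 0  # time of the last snapshot, in seconds

    def put(self, now, message):
        self.memory.append(message)
        # Snapshot only once the interval has elapsed since the last one.
        if now - self.last_snapshot >= self.snapshot_interval:
            self.disk = list(self.memory)
            self.last_snapshot = now

    def crash_and_restart(self):
        # Everything appended after the last snapshot is gone.
        lost = self.memory[len(self.disk):]
        self.memory = list(self.disk)
        return lost


if __name__ == "__main__":
    q = SnapshottingQueue(snapshot_interval=900)  # fifteen minutes
    q.put(100, "event-a")
    q.put(900, "event-b")   # interval elapsed: snapshot taken here
    q.put(1000, "event-c")  # arrives after the snapshot
    print(q.crash_and_restart())
```

Anything that arrives between the last snapshot and the crash exists only in memory, which is the exposure described above.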

Understanding the MajorDomo Pattern from NetMQ ZeroMQ

I am trying to understand how best to implement the MDP example in C# for use in a Windows service in a multiple-client, single-server environment.
I have read the docs but I am still unclear on the following:
Should all Worker instances be created on startup and left to run?
Should the Workers all be different types of services or just different instances of the same service?
Can I have one Windows service which contains the Broker and Workers, or is it best to split them out into their own services?
The example code I am using is the MajorDomo Pattern taken from here https://github.com/NetMQ/Samples
Yes, all workers in an MDP environment should be created independently of the requests, since the broker should not know how to create them.
Each worker handles a given "service" (contract). Obviously, each contract should have at least one worker.
If you need requests handled in parallel and a given worker can only handle one at a time, having extra workers for that service could make sense. Generally, however, you would do this when multiple machines are involved (horizontal scaling).
You can have the broker and workers in the same process. However, if you want to update only a worker, taking down the broker at the same time can be annoying for the clients. I would recommend letting the broker be its own process, with the workers in one or more other processes.
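The routing idea can be sketched in a few lines of Python. This is a conceptual model only, not the NetMQ Majordomo API: the Broker class, its register and request methods, and the two worker functions are hypothetical. Workers register themselves under a service name, and the broker only routes requests; it never creates workers.

```python
from collections import defaultdict, deque


class Broker:
    """Routes each request to an idle worker registered for the named service."""

    def __init__(self):
        # service name -> queue of idle workers for that service
        self.idle_workers = defaultdict(deque)

    def register(self, service, worker):
        # Workers announce themselves; the broker does not know how to build them.
        self.idle_workers[service].append(worker)

    def request(self, service, payload):
        waiting = self.idle_workers.get(service)
        if not waiting:
            raise LookupError(f"no worker available for service {service!r}")
        worker = waiting.popleft()
        try:
            return worker(payload)
        finally:
            waiting.append(worker)  # the worker becomes idle again


def echo_worker(payload):
    return payload


def upper_worker(payload):
    return payload.upper()


if __name__ == "__main__":
    broker = Broker()
    broker.register("echo", echo_worker)
    broker.register("upper", upper_worker)
    broker.register("upper", upper_worker)  # a second instance of the same service
    print(broker.request("upper", "hello"))
```

Registering the same service twice corresponds to running extra workers for one contract; in a real deployment those would typically live in separate processes or on separate machines.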

Can I use --without-heartbeat and --beat on a celery worker together?

I want to run a Celery worker and Celery beat in the same process (dyno), but I also want to limit the number of messages sent to the RabbitMQ broker. Before implementing beat I had been using --without-heartbeat and it was fine, but I wonder whether this will still work when I add --beat to my Procfile.
WORKER: celery worker --app=cworker.celery --autoscale=6,2 --beat --loglevel=info --without-mingle --without-gossip --without-heartbeat
Am I completely misunderstanding two separate elements?

Resque and a multi-server architecture

I haven't yet actually used Resque. I have the following questions and assumptions that I'd like verified:
1) I understand that you can have a multiserver architecture by configuring each of your resque instances to point to a central redis server. Correct?
2) Does this mean that any resque instance can add items to a queue and any workers can work on any of those queues?
3) Can multiple workers respond to the same item in a queue? I.e. one server puts "item 2 has been updated" in a queue, can workers 1, 2, and 3, on different servers, all act on that? Or would I need to create separate queues? I kind of want a pub/sub type of task.
4) Does the Sinatra monitor app live on each instance of Resque? Or is there just one app that knows about all the queues and workers?
5) Does Resque know when a task is completed? I.e. does the monitor app show that a task is in process? Or just that a worker took it?
6) Does the monitor app show completed tasks? I.e. if a task completes quickly, will I be able to see that the task was completed at some point in the recent past?
7) Can I programmatically query whether a task has been started, is in progress, or is completed?
As I am using Resque extensively in our project, here are a few answers to your questions:
I understand that you can have a multi-server architecture by configuring each of your resque instances to point to a central redis server. Correct?
Ans: Yes, you can have multiple servers pointing to a single Redis server. I am running a similar architecture.
Does this mean that any resque instance can add items to a queue and any workers can work on any of those queues?
Ans: This depends on how you configure your servers. You have to create queues and then assign workers to them.
You can have multiple queues and each queue can have multiple workers working on them.
Can multiple workers respond to the same item in a queue? I.e. one server puts "item 2 has been updated" in a queue, can workers 1, 2, and 3, on different servers, all act on that? Or would I need to create separate queues? I kind of want a pub/sub type of task.
Ans: This again depends on your requirements. If you want a single queue with all workers working on it, that is valid,
or if you want a separate queue for each server, you can do that too.
Any server can put jobs in any queue, but only the workers assigned to that queue will pick up and work on those jobs.
Does the Sinatra monitor app live on each instance of Resque? Or is there just one app that knows about all the queues and workers?
Ans: The Sinatra monitor app gives you an interface where you can see all worker- and queue-related information, such as running jobs, waiting jobs, queues, and failed jobs.
Does Resque know when a task is completed? I.e. does the monitor app show that a task is in process? Or just that a worker took it?
Ans: It does. Resque also maintains internal queues to manage all the jobs.
Ans: Yes, it shows all the stats about the job.
Does the monitor app show completed tasks? I.e. if a task completes quickly, will I be able to see that the task was completed at some point in the recent past?
Ans: Yes, you can.
For example, to find out which workers are currently working, use Resque.working. Similarly, you can look through the code base and make use of whatever you need.
Resque is a very powerful library; we have been using it for more than a year now.
Cheers, happy coding!!

Broadcasting a message to multiple workers spring rabbitmq

I am using RabbitMQ with Spring. I have multiple workers running on separate VMs that pick up messages in a round-robin fashion. All is good.
Now, I would like to declare one queue "command" where ALL the workers process messages sent to that queue. So I want this command to be run on ALL the worker/listeners.
Is it possible to set this up using RabbitMQ/Spring?
I saw one solution where each worker sets up its own queue for processing, but that is not ideal for me.
So, I would essentially like to broadcast a message to a single queue and have all the workers process the message.
Thanks for any help.
Dave
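I would essentially like to broadcast a message to a single queue and have all the workers process the message.
Create a fanout exchange. This gives you publish/subscribe behaviour: the exchange routes a copy of every message to each queue bound to it, so every subscribed worker receives the message. Note that in RabbitMQ this means each worker binds its own queue to the exchange; consumers that share a single queue still receive its messages round robin, one consumer per message.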
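The difference between consumers sharing one queue and a fanout exchange can be sketched with a toy model. This is plain Python, not the Spring AMQP or RabbitMQ client API; WorkQueue, FanoutExchange, and demo are hypothetical names, and plain lists stand in for each worker's inbox.

```python
from itertools import cycle


class WorkQueue:
    """One shared queue: each message goes to exactly one consumer, round robin."""

    def __init__(self, consumer_inboxes):
        self._next_inbox = cycle(consumer_inboxes)

    def publish(self, message):
        next(self._next_inbox).append(message)


class FanoutExchange:
    """Copies every published message to every bound queue."""

    def __init__(self):
        self.bound_queues = []

    def bind(self, inbox):
        self.bound_queues.append(inbox)

    def publish(self, message):
        for inbox in self.bound_queues:
            inbox.append(message)


def demo():
    # Round robin: three jobs are split across three workers.
    shared = {"w1": [], "w2": [], "w3": []}
    work_queue = WorkQueue(list(shared.values()))
    for job in ["job-1", "job-2", "job-3"]:
        work_queue.publish(job)

    # Fanout: one command reaches every worker through its own bound queue.
    fanned = {"w1": [], "w2": [], "w3": []}
    exchange = FanoutExchange()
    for inbox in fanned.values():
        exchange.bind(inbox)
    exchange.publish("reload-config")
    return shared, fanned


if __name__ == "__main__":
    shared, fanned = demo()
    print(shared)
    print(fanned)
```

In the round-robin case each worker sees one job; in the fanout case every worker sees the same command, which is the broadcast behaviour asked for.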
