What is the best way to route tasks in machinery (Go)?

I'm trying to use machinery as a distributed task queue and would like to deploy separate workers for different groups of tasks. E.g. have a worker next to the database server running database-related tasks and a number of workers on different servers running cpu/memory intensive tasks. However, the documentation isn't really clear on how one would do this.
I initially tried running the workers without registering unwanted tasks onto them, but this resulted in the worker repeatedly consuming the unregistered task and requeuing it with the following message:
INFO: 2022/01/27 08:33:13 redis.go:342 Task not registered with this worker. Requeuing message: {"UUID":"task_7026263a-d085-4492-8fa8-e4b83b2c8d59","Name":"add","RoutingKey":"","ETA":null,"GroupUUID":"","GroupTaskCount":0,"Args":[{"Name":"","Type":"int32","Value":2},{"Name":"","Type":"int32","Value":4}],"Headers":{},"Priority":0,"Immutable":false,"RetryCount":0,"RetryTimeout":0,"OnSuccess":null,"OnError":null,"ChordCallback":null,"BrokerMessageGroupId":"","SQSReceiptHandle":"","StopTaskDeletionOnError":false,"IgnoreWhenTaskNotRegistered":false}
I suspect this can be fixed by setting IgnoreWhenTaskNotRegistered to true, however this doesn't seem like a very elegant solution.
Task signatures also have a RoutingKey field but there was no info in the docs on how to configure a worker to only consume tasks from a specific routing key.
One other solution would be to have separate machinery task servers, but this would take away the ability to use workflows and orchestrate tasks between workers.

Found the solution through some trial and error.
Setting IgnoreWhenTaskNotRegistered to true isn't a correct solution since, unlike what I initially thought, the worker still consumes the unregistered task and then discards it instead of requeuing it.
The correct way to route tasks is to set RoutingKey in the task's signature to the desired queue's name and use taskserver.NewCustomQueueWorker to get a queue-specific worker object instead of taskserver.NewWorker.
Sending a task to a specific queue:
task := tasks.Signature{
    Name:       "<TASKNAME>",
    RoutingKey: "<QUEUE>",
    Args: []tasks.Arg{
        // args...
    },
}
res, err := taskserver.SendTask(&task)
if err != nil {
    // handle error
}
And starting a worker to consume from a specific queue:
worker := taskserver.NewCustomQueueWorker("<WORKERNAME>", concurrency, "<QUEUE>")
if err := worker.Launch(); err != nil {
    // handle error
}
I'm still not quite sure how to tell a worker to consume from a set of queues, as `NewCustomQueueWorker` only accepts a single string as its queue name, but that's a relatively minor detail.
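One workaround that seems plausible (a sketch, not something from the docs): start one NewCustomQueueWorker per queue inside the same process. This assumes Worker.LaunchAsync is available in your machinery version, so verify that before relying on it.
// Hypothetical sketch: one worker per queue in a single process.
// Assumes taskserver is an initialized *machinery.Server and that
// Worker.LaunchAsync exists in the machinery version you use.
errChan := make(chan error, 1)
for _, queue := range []string{"db_tasks", "cpu_tasks"} { // example queue names
    w := taskserver.NewCustomQueueWorker("worker_"+queue, concurrency, queue)
    w.LaunchAsync(errChan) // non-blocking; errors arrive on errChan
}
log.Fatal(<-errChan) // exit on the first worker error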

Related

Scheduler-worker cluster without port forwarding

Hello Stackoverflow!
TL;DR: I would like to recreate https://github.com/KorayGocmen/scheduler-worker-grpc without port forwarding on the worker.
I am trying to build a competitive programming judge server for evaluation of submissions as a project for my school where I teach programming to kids.
Because the evaluation is computationally heavy I would like to have multiple worker nodes.
The scheduler would receive submissions and hand them out to the worker nodes. For ease of worker deployment (as it will be changing often) I would like the worker to be able to subscribe to the scheduler and thus become a worker and receive jobs.
The workers may not be on the same network as the scheduler, plus the worker resides in a VM (it may later be ported to Docker, but currently there are issues with that).
The scheduler should be able to know resource usage of the worker, send different types of jobs to the worker and receive a stream of results.
I am currently thinking of using gRPC to address my requirements for communication between the workers and the scheduler.
I could create multiple scheduler service methods like:
register worker, receive a stream of jobs
stream job results, receive nothing
stream worker state periodically, receive nothing
However, I would prefer the following, but I don't know whether it is possible:
The scheduler gRPC API:
register a worker (making the worker gRPC API available to the scheduler)
The worker gRPC API:
start a job (returns a stream of job status)
cancel a job ???
get resource usage
The worker should unregister automatically if the connection is lost.
So my question is... is it possible to create a gRPC worker API that can be registered with the scheduler for later use if the worker is behind NAT, without port forwarding?
Additional possibly unnecessary information:
Making matters worse, I have multiple radically different types of jobs (streaming an interactive console, executing code against prepared testcases). I may just create different workers for different jobs.
Sometimes the jobs involve having large files on the local filesystem (up to 500 MB) that are usually kept near the scheduler, so I would like to send such a job to a worker that already has the specific files downloaded from the scheduler, and otherwise download the large files to one of the workers. Having all the files on a worker at the same time would take more than 20 GB, so I would like to avoid that.
A worker can run multiple jobs (up to 16) at the same time.
I am writing the system in go.
As long as only the workers initiate the connections you don't have to worry about NAT. gRPC supports streaming in either direction (or both). This means that all of your requirements can be implemented using just one server on the scheduler; there is no need for the scheduler to connect back to the workers.
Given your description your service could look something like this:
syntax = "proto3";

import "google/protobuf/empty.proto";

service Scheduler {
    rpc GetJobs(GetJobsRequest) returns (stream GetJobsResponse) {}
    rpc ReportWorkerStatus(stream ReportWorkerStatusRequest) returns (google.protobuf.Empty) {}
    rpc ReportJobStatus(stream JobStatus) returns (stream JobAction) {}
}

enum JobType {
    JOB_TYPE_UNSPECIFIED = 0;
    JOB_TYPE_CONSOLE = 1;
    JOB_TYPE_EXEC = 2;
}

message GetJobsRequest {
    // List of job types this worker is willing to accept.
    repeated JobType types = 1;
}

message GetJobsResponse {
    // Note: proto3 field numbers start at 1, not 0.
    string jobId = 1;
    JobType type = 2;
    string fileName = 3;
    bytes fileContent = 4;
    // etc.
}

message ReportWorkerStatusRequest {
    float cpuLoad = 1;
    uint64 availableDiskSpace = 2;
    uint64 availableMemory = 3;
    // etc.
    // List of filenames or file hashes, or whatever else you need to precisely
    // report the presence of files.
    repeated string haveFiles = 4;
}
Much of this is a matter of preference (you can use oneof instead of enums, for instance), but hopefully it's clear that a single connection from client to server is sufficient for your requirements.
Maintaining the set of available workers is quite simple:
func (s *Server) GetJobs(req *pb.GetJobsRequest, stream pb.Scheduler_GetJobsServer) error {
    ctx := stream.Context()
    s.scheduler.AddWorker(req)
    defer s.scheduler.RemoveWorker(req)

    for {
        job, err := s.scheduler.GetJob(ctx, req)
        switch {
        case ctx.Err() != nil: // client disconnected
            return nil
        case err != nil:
            return err
        }
        if err := stream.Send(job); err != nil {
            return err
        }
    }
}
The Basics tutorial includes examples for all types of streaming, including server and client implementations in Go.
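For a concrete starting point, here is a minimal worker-side sketch. The pb import path, the generated identifiers (NewSchedulerClient, GetJobsRequest, and so on), and the runJob helper are assumptions derived from the proto above, so adjust them to your actual codegen output:
package main

import (
    "context"
    "io"
    "log"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"

    pb "example.com/scheduler/pb" // hypothetical generated package
)

// runJob is a hypothetical job runner; it would execute the job and report
// progress via the ReportJobStatus stream.
func runJob(job *pb.GetJobsResponse) {}

func main() {
    // The worker dials out, so NAT on the worker side is not a problem.
    conn, err := grpc.Dial("scheduler.example.com:50051",
        grpc.WithTransportCredentials(insecure.NewCredentials())) // use TLS credentials in production
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    client := pb.NewSchedulerClient(conn)
    stream, err := client.GetJobs(context.Background(), &pb.GetJobsRequest{
        Types: []pb.JobType{pb.JobType_JOB_TYPE_EXEC},
    })
    if err != nil {
        log.Fatal(err)
    }
    for {
        job, err := stream.Recv()
        if err == io.EOF {
            return // scheduler closed the stream
        }
        if err != nil {
            log.Fatal(err)
        }
        go runJob(job)
    }
}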
As for registration, that usually just means creating some sort of credential that a worker will use when communicating with the server. This might be a randomly generated token (which the server can use to load associated metadata), or a username/password combination, or a TLS client certificate, or similar. Details will depend on your infrastructure and desired workflow when setting up workers.
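As one illustration (a sketch with hypothetical names, not the only option): the worker can attach a pre-shared token to every RPC by implementing the credentials.PerRPCCredentials interface from google.golang.org/grpc/credentials.
// workerToken attaches a pre-shared token to every outgoing RPC.
// Assumes context, os, and google.golang.org/grpc are imported.
type workerToken struct{ token string }

func (t workerToken) GetRequestMetadata(ctx context.Context, uri ...string) (map[string]string, error) {
    return map[string]string{"authorization": "Bearer " + t.token}, nil
}

func (t workerToken) RequireTransportSecurity() bool { return true } // only send the token over TLS

// When dialing (creds is your TLS transport credential):
conn, err := grpc.Dial("scheduler.example.com:50051",
    grpc.WithTransportCredentials(creds),
    grpc.WithPerRPCCredentials(workerToken{token: os.Getenv("WORKER_TOKEN")}),
)
The scheduler can then validate the token (e.g. in an interceptor) and use it to look up the worker's metadata.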

How to deal with back pressure in GO GRPC?

I have a scenario where clients can connect to a server via gRPC, and I would like to implement backpressure on it, meaning that I would like to accept many simultaneous requests (say 10,000) but have only 50 simultaneous threads executing the requests (this is inspired by the Apache Tomcat NIO interface behaviour). I would also like the communication to be asynchronous, in a reactive manner: the client sends the request but does not wait on it; the server sends the response back later and the client then executes some function registered to be executed.
How can I do that in Go gRPC? Should I use streams? Is there any example?
The Go API is a synchronous API; this is how Go usually works. You block in a while-true loop until an event happens, and then you proceed to handle that event. With respect to having more simultaneous threads executing requests, we don't control that on the client side. On the client side, at the application layer above gRPC, you can fork more goroutines, each executing requests. The server side already forks a goroutine for each accepted connection, and even for each stream on the connection, so there is already inherent multithreading on the server side.
Note that there are no threads in Go; Go uses goroutines.
The behavior described is already built in to the gRPC server. For example, see this option.
// NumStreamWorkers returns a ServerOption that sets the number of worker
// goroutines that should be used to process incoming streams. Setting this to
// zero (default) will disable workers and spawn a new goroutine for each
// stream.
//
// # Experimental
//
// Notice: This API is EXPERIMENTAL and may be changed or removed in a
// later release.
func NumStreamWorkers(numServerWorkers uint32) ServerOption {
    // TODO: If/when this API gets stabilized (i.e. stream workers become the
    // only way streams are processed), change the behavior of the zero value to
    // a sane default. Preliminary experiments suggest that a value equal to the
    // number of CPUs available is most performant; requires thorough testing.
    return newFuncServerOption(func(o *serverOptions) {
        o.numServerWorkers = numServerWorkers
    })
}
The workers are at some point initialized.
// initServerWorkers creates worker goroutines and channels to process incoming
// connections to reduce the time spent overall on runtime.morestack.
func (s *Server) initServerWorkers() {
    s.serverWorkerChannels = make([]chan *serverWorkerData, s.opts.numServerWorkers)
    for i := uint32(0); i < s.opts.numServerWorkers; i++ {
        s.serverWorkerChannels[i] = make(chan *serverWorkerData)
        go s.serverWorker(s.serverWorkerChannels[i])
    }
}
I suggest you read the server code yourself, to learn more.
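If you also want an explicit accept-many, execute-few cap at the application level (independent of the experimental option above), the usual Go pattern is a buffered channel used as a semaphore. A minimal sketch, assuming a hypothetical server type and pb messages:
// sem caps concurrent execution at 50; gRPC/HTTP2 will happily keep
// thousands of streams open while requests wait for a slot.
var sem = make(chan struct{}, 50)

func (s *server) DoWork(ctx context.Context, req *pb.Request) (*pb.Response, error) {
    select {
    case sem <- struct{}{}: // acquire an execution slot
        defer func() { <-sem }()
    case <-ctx.Done(): // the client gave up while waiting
        return nil, ctx.Err()
    }
    return s.process(ctx, req) // the actual heavy work
}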

Invoke kubernetes operator reconcile loop on external resources changes

I'm working on developing a k8s custom resource that, as part of its business logic, needs to reconcile its state when an external Job in the cluster has changed its state.
Those Jobs aren't created by the custom resource itself but are created externally by a third-party service; however, I need to reconcile the state of the CR, for example, when any of those external Jobs have finished.
After reading a bunch of documentation, I came up with setting a watcher on the controller to watch Jobs, like in the following example:
func (r *DatasetReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&datasetv1beta1.Dataset{}).
        Watches(&source.Kind{Type: &batchv1.Job{}}, &handler.EnqueueRequestForObject{} /* filter by predicates, see https://pkg.go.dev/sigs.k8s.io/controller-runtime#v0.9.6/pkg/controller#Controller */).
        Complete(r)
}
Now I'm having my reconcile loop triggered for both Jobs and my CRs with the corresponding name and namespace, but I don't know anything about the object kind.
func (r *DatasetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    l := log.FromContext(ctx)
    l.Info("Enter Reconcile loop")
    l.Info("Request", "Req", req)

    // if this is triggered by my CR
    dataset := &datasetv1beta1.Dataset{}
    r.Get(ctx, types.NamespacedName{Name: req.Name, Namespace: req.Namespace}, dataset)

    // whereas when triggered by a Job
    job := &batchv1.Job{}
    r.Get(ctx, types.NamespacedName{Name: req.Name, Namespace: req.Namespace}, job)

    return ctrl.Result{}, nil
}
How can I check the object kind within Reconcile, so I can retrieve the full object data by calling r.Get?
By design, the event that triggered reconciliation is not passed to the reconciler so that you are forced to define and act on a state instead. This approach is referred to as level-based, as opposed to edge-based.
In your example you have two resources you are trying to keep track of. I would suggest either:
Using ownerReferences or labels if these resources are related. That way you can get all related Datasets for a given Job (or vice versa) and reconcile things that way.
If the two resources are not related, create a separate controller for each resource.
If you want to prevent reconciliation on certain events you can make use of predicates. From the event in the predicate function you can get the object type with a type assertion, e.g. e.Object.(*core.Pod).
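For example, a predicate that only lets through Job updates where the Job has finished might look roughly like this (a sketch against controller-runtime v0.9-era APIs, so double-check it against the version you use):
package controllers

import (
    batchv1 "k8s.io/api/batch/v1"
    "sigs.k8s.io/controller-runtime/pkg/event"
    "sigs.k8s.io/controller-runtime/pkg/predicate"
)

// jobFinished passes only updates where the Job has succeeded or failed.
// Unset funcs in predicate.Funcs default to allowing the event through.
var jobFinished = predicate.Funcs{
    UpdateFunc: func(e event.UpdateEvent) bool {
        job, ok := e.ObjectNew.(*batchv1.Job)
        if !ok {
            return false // not a Job
        }
        return job.Status.Succeeded > 0 || job.Status.Failed > 0
    },
}
It can then be wired into the watch by passing builder.WithPredicates(jobFinished) as an extra argument to Watches.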

How to start multiple Machinery workers?

I am new to Golang Machinery; the following is the code from the docs to start machinery workers:
worker := server.NewWorker("worker_name", 10)
err := worker.Launch()
if err != nil {
    // do something with the error
}
My first question is: does server.NewWorker("worker_name", 10) start 10 workers, or does it mean something else? If not, how do I start 10 workers if needed? Do I run go run example/machinery.go worker 10 times?
My second question is about the first parameter, consumerTag: where can I find the place these tags are used?
Thanks
No, this line:
worker := server.NewWorker("worker_name", 10)
starts a single worker. You need to run multiple instances to start more workers. The 10 is the number of concurrent goroutines that this specific worker will run: if you have 10 tasks in the queue, they can run concurrently.
For the tag, you need to check the specific implementation for each broker in the codebase.
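If you would rather start several workers inside one process than run the binary ten times, a loop like the following sketch should work; it assumes Worker.LaunchAsync exists in the machinery version you use, so verify that first:
// Hypothetical sketch: ten workers in one process, each with its own tag.
// Assumes fmt and log are imported and server is an initialized *machinery.Server.
errChan := make(chan error, 1)
for i := 0; i < 10; i++ {
    w := server.NewWorker(fmt.Sprintf("worker_%d", i), 1)
    w.LaunchAsync(errChan) // non-blocking; errors arrive on errChan
}
log.Fatal(<-errChan) // exit on the first worker error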

Redis Pub/Sub Ack/Nack

Is there a concept of acknowledgements in Redis Pub/Sub?
For example, when using RabbitMQ, I can have two workers running on separate machines and when I publish a message to the queue, only one of the workers will ack/nack it and process the message.
However, I have discovered that with Redis Pub/Sub, both workers will process the message.
Consider this simple example: I have this goroutine running on two different machines/clients:
go func() {
    for {
        switch n := pubSubClient.Receive().(type) {
        case redis.Message:
            process(n.Data)
        case redis.Subscription:
            if n.Count == 0 {
                return
            }
        case error:
            log.Print(n)
        }
    }
}()
When I publish a message:
conn.Do("PUBLISH", "tasks", "task A")
Both goroutines will receive it and run the process function.
Is there a way of achieving similar behaviour to RabbitMQ? E.g. first worker to ack the message will be the only one to receive it and process it.
Redis Pub/Sub is more of a broadcast mechanism.
If you want queues, you can use BLPOP along with RPUSH to get the same interaction. Keep in mind that RabbitMQ does all sorts of other stuff that isn't really there in Redis, but if you're looking for simple job scheduling / request handling, this will work just fine.
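A sketch of that pattern with redigo (which the code in the question appears to use); connection setup omitted:
// Producer: push a task onto the list.
conn.Do("RPUSH", "tasks", "task A")

// Worker (run on each machine): block until a task arrives. The pop is
// atomic, so exactly one worker receives each task.
go func() {
    for {
        reply, err := redis.Strings(conn.Do("BLPOP", "tasks", 0)) // 0 = block forever
        if err != nil {
            log.Print(err)
            continue
        }
        process([]byte(reply[1])) // reply[0] is the list name, reply[1] the value
    }
}()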
No, Redis' PubSub does not guarantee delivery nor does it limit the number of possible subscribers who'll get the message.
Redis Streams (introduced in Redis 5.0) support acknowledgment of tasks as they are completed by a consumer group.
https://redis.io/topics/streams-intro
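A rough redigo sketch of the Streams flow (the stream, group, and consumer names are made up; reply parsing is trimmed to the happy path):
// One-time setup: create the stream and a consumer group on it.
conn.Do("XGROUP", "CREATE", "tasks", "workers", "$", "MKSTREAM")

// Producer: append an entry to the stream.
conn.Do("XADD", "tasks", "*", "body", "task A")

// Worker: read one entry for this consumer, process it, then acknowledge.
// An entry is delivered to exactly one consumer in the group until XACKed.
reply, err := redis.Values(conn.Do("XREADGROUP",
    "GROUP", "workers", "worker-1",
    "COUNT", 1, "BLOCK", 0,
    "STREAMS", "tasks", ">"))
if err != nil {
    log.Fatal(err)
}
stream, _ := redis.Values(reply[0], nil)   // [stream name, entries]
entries, _ := redis.Values(stream[1], nil) // list of [id, field-value list]
entry, _ := redis.Values(entries[0], nil)
id, _ := redis.String(entry[0], nil)
fields, _ := redis.Strings(entry[1], nil) // ["body", "task A"]
process([]byte(fields[1]))
conn.Do("XACK", "tasks", "workers", id)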
