I am running a job with Resque workers (5 workers). When this job is completed, I want to trigger another worker that processes the data the previous worker stored in the database. What would be the most appropriate way to do this?
I'm not sure if this is of any help, but have you had a look at the resque-status gem?
That way you can track a given job's status to see when it is completed. But I'm afraid there is no auto-trigger functionality to start new workers.
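If a simple hand-off is enough, one workaround is to have the first job enqueue the follow-up job itself as the last step of its perform method. A rough sketch using Resque's plain class-based job API; the class, queue, and argument names here are hypothetical:

require "resque"

class ProcessDataJob
  @queue = :processing

  def self.perform(record_id)
    # ... do the main work and store the results in the database ...

    # Hand off to the next stage once this job's own work is done.
    Resque.enqueue(AggregateResultsJob, record_id)
  end
end

class AggregateResultsJob
  @queue = :aggregation

  def self.perform(record_id)
    # ... process the data the previous job stored ...
  end
end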
In my Laravel 5.1 project I want to start my second job when the first one has finished.
Here is my logic:
\Queue::push(new MyJob())
and when this job finishes, I want to start this one:
\Queue::push(new ClearJob())
How can I achieve this?
If you want this, you should just define one queue.
A queue is just a list/line of things waiting to be handled in order,
starting from the beginning. When I say things, I mean jobs. - https://toniperic.com/2015/12/01/laravel-queues-demystified
To get the opposite of what you want (asynchronously executed jobs), you would define a new queue for every job.
Multiple Queues and Workers
You can have different queues/lists for
storing the jobs. You can name them however you want, such as “images”
for pushing image processing tasks, or “emails” for a queue that holds
jobs specific to sending emails. You can also have multiple workers,
each working on a different queue if you want. You can even have
multiple workers per queue, thus having more than one job being worked
on simultaneously. Bear in mind having multiple workers comes with a
CPU and memory cost. Look it up in the official docs, it’s pretty
straightforward.
Nothing seems to prevent a perform method from using the Sidekiq API, and it should be safe in read-only mode.
But what if it calls a "write" method, especially when that method acts on the current job itself?
We would like to reschedule a job without creating a new job because we need to track the job completion with the sidekiq-status gem from another worker.
Using MyWorker.perform_in or MyWorker.perform_at to reschedule the job from inside the worker creates a new job, making it difficult to track the total completion. We're thinking of using Sidekiq::ScheduledSet.new.find and the reschedule method but it seems awkward and potentially dangerous to reschedule a job that is about to complete.
Do Sidekiq and its API support this use case?
You might be able to hack something together but it'll be really slow if you try to modify the Sets and Lists in Redis directly. They aren't designed to be used that way.
The official Sidekiq solution to this problem is a Batch.
https://github.com/mperham/sidekiq/wiki/Batches#status
You create a one-job batch. If the job needs to be rescheduled, it adds a new job to the Batch to be executed later. Your other worker just checks the status of the overall Batch and whether it is 100% complete.
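A rough sketch of that flow, assuming Sidekiq Pro's Batch API as described in the wiki; the worker name and the do_one_chunk helper are hypothetical:

require "sidekiq"
# Batches ship with Sidekiq Pro, so the sidekiq-pro gem must be loaded as well.

class ResumableWorker
  include Sidekiq::Worker

  def perform(record_id)
    finished = do_one_chunk(record_id)
    return if finished

    if batch   # `batch` is available inside jobs that run as part of a batch
      # Put the retry inside the same batch, so the batch only reports
      # complete once the work is really done.
      batch.jobs do
        ResumableWorker.perform_in(300, record_id)   # try again in 5 minutes
      end
    end
  end

  private

  # Hypothetical helper: do part of the work, return true when everything is done.
  def do_one_chunk(record_id)
    # ...
    true
  end
end

# Kick off a one-job batch and keep the bid around for tracking.
batch = Sidekiq::Batch.new
batch.description = "Resumable work for record 123"
batch.jobs do
  ResumableWorker.perform_async(123)
end
bid = batch.bid

# The monitoring worker only needs the bid:
status = Sidekiq::Batch::Status.new(bid)
status.complete?   # => true once every job added to the batch has run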
I want to send out only one email at a time, so I have a script that starts every minute. Is it possible to send out only one email and then stop Delayed Job from sending the next queued jobs?
Before you place the next job on the queue, you could look at the last job that's already on the queue and check its run_at time. Then set the run_at time for your job to be one minute later. If there are no jobs on the queue, set it to now, or to one minute from now, depending on how strict you need to be about having one minute between emails.
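A sketch of that approach, assuming a Rails app using the delayed_job_active_record backend; EmailJob and recipient_id are hypothetical:

# Look at the last job already on the queue and space this one a minute after it.
last_job = Delayed::Job.order(:run_at).last
run_at   = if last_job && last_job.run_at > Time.now
             last_job.run_at + 60   # one minute after the previous job
           else
             Time.now               # queue is empty, run right away
           end

Delayed::Job.enqueue(EmailJob.new(recipient_id), run_at: run_at)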
You could fetch one specific job from Delayed Job's table itself, invoke it, and then destroy it, something like:
job = Delayed::Job.last
job.invoke_job # This will NOT destroy the job
job.destroy
Found it here: https://groups.google.com/forum/#!topic/delayed_job/5j5BmAlXN3g
We want to use resque to queue a bunch of jobs, and process them by workers. While the jobs are waiting to be processed, we want to know what is their position in the queue (as an indicator of how long they would have to wait). How do we find the position of a job in a queue?
Thanks in advance.
Assuming your problem is with the Resque queue system (you have not mentioned the technology stack you are using):
You can use resque-status, an extension to the Resque queue system that provides simple trackable jobs.
resque-status provides a set of simple classes that extend Resque's default functionality (with 0% monkey patching) to give apps a way to track specific job instances and their status. It achieves this by giving job instances UUIDs and allowing the job instances to report their status from within their iterations.
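A short sketch of how resque-status is typically wired up (the ImportJob class and its options are hypothetical, and the gem must be loaded): each job instance gets a UUID, reports its progress with at, and can be polled by UUID from anywhere else.

class ImportJob
  include Resque::Plugins::Status

  def perform
    total = options['total'].to_i
    total.times do |i|
      # ... process one record ...
      at(i + 1, total, "processed #{i + 1} of #{total}")   # report progress
    end
  end
end

job_id = ImportJob.create('total' => 100)              # enqueue; returns a UUID
status = Resque::Plugins::Status::Hash.get(job_id)
status.status        # => "queued", "working", "completed", "failed", or "killed"
status.pct_complete  # => rough percentage once the job starts reporting

This tells you whether a job is still queued or already running; the raw position within the queue is not exposed directly, though you could approximate it by inspecting the pending entries with Resque.peek.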
Does anyone know if org.apache.hadoop.mapreduce.Job is thread-safe? In my application I create a thread for each job and then call waitForCompletion. I also have another monitor thread that checks every job's state with isComplete.
Is that safe? Are jobs thread-safe? The documentation doesn't seem to mention anything about it...
Thanks
Udi
Unlike the others, I also use threads to submit jobs in parallel and wait for their completion. You just have to use one Job class instance per thread. If you share the same Job instance across multiple threads, you have to take care of the synchronization yourself.
Why would you want to write a separate thread for each job? What exactly is your use case?
You can run multiple jobs in your Hadoop cluster. Do you have dependencies between the multiple jobs?
Suppose you have 10 jobs running and 1 job fails: would you then need to re-run the 9 successful ones?
Finally, the JobTracker will take care of scheduling multiple jobs on the Hadoop cluster. If you do not have dependencies, then you should not be worried about thread safety. If you have dependencies, then you may need to rethink your design.
Yes, they are. Actually, the file is split into blocks and each block is executed on a separate node. All the map tasks run in parallel, and their output is then fed to the reducer after they are done. There is no question of synchronization as you would think about it in a multi-threaded program. In a multi-threaded program, all the threads run on the same box, and since they share some of the data, you have to synchronize them.
Just in case you need another kind of parallelism at the map-task level, you should override the run() method in your mapper and work with multiple threads there. The default implementation calls setup(), then calls map() once for each record to process, and finally calls the cleanup() method once.
Hope this helps someone!
If you are checking whether the jobs have finished, I think you are a bit confused about how MapReduce works. You ought to be letting Hadoop do that for itself.