Resque: find position of a job in a queue

We want to use Resque to queue a bunch of jobs and process them with workers. While the jobs are waiting to be processed, we want to know their position in the queue (as an indicator of how long they will have to wait). How do we find the position of a job in a queue?
Thanks in advance.

Assuming your problem is with the Resque queue system (you have not mentioned the technology stack you are using):
You can use resque-status, an extension to the Resque queue system that provides simple trackable jobs.
resque-status provides a set of simple classes that extend Resque's default functionality (with 0% monkey patching) to give apps a way to track specific job instances and their status. It achieves this by giving job instances UUIDs and allowing the job instances to report their status from within their iterations.
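For the literal "position in the queue" question, note that Resque keeps each queue as a Redis list, so you can also scan it with the plain Resque API. A minimal sketch, assuming Ruby and the resque gem; the ReportJob class, queue name, and job_id argument are illustrative, not from the original post:

require 'resque'

class ReportJob
  @queue = :reports

  def self.perform(job_id)
    # ... do the work for job_id ...
  end
end

# Enqueue a job, tagging it with an id we can search for later.
Resque.enqueue(ReportJob, 42)

# Returns the 0-based position of the first waiting job whose args match,
# or nil if the job is no longer in the queue. This is a linear scan
# (one Redis call per slot), which is fine for moderately sized queues.
def queue_position(queue, job_id)
  Resque.size(queue).times do |i|
    job = Resque.peek(queue, i, 1)
    return i if job && job['args'] == [job_id]
  end
  nil
end

queue_position(:reports, 42)  # => 0 when the job is next in line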

Related

Spring Scheduling Quartz and thousands of jobs

According to the business logic of my Spring Boot application with Quartz Scheduling and MongoDB as the job persistence store, every user of the system can create a postponed job that must be executed at some point in time. The user chooses the time when it must be executed.
Right now I'm thinking about the approach where every user will create a dedicated JobDetail for every postponed job, something like this:
schedulerFactoryBean.getScheduler().addJob(jobDetail(), true, true);
The issue I can potentially see here is that with this approach I can quickly create thousands of jobs in the Quartz scheduler. I have never scheduled that many jobs in Spring Scheduling with Quartz before and don't know how the system will handle it. Is it a good idea to implement the system this way, and will Spring Scheduling with Quartz handle that many jobs without problems?
Yes, Quartz itself can handle thousands of jobs and triggers without any issues.
If you are going to have many jobs executing concurrently, just make sure that you configure Quartz with a sufficient number of worker threads. The number of worker threads should typically equal the maximum number of jobs that can be running concurrently, plus a small buffer (10% or so) just in case.
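For reference, the worker-thread count is set through Quartz's configuration. A minimal sketch of the relevant quartz.properties entries; the value 55 is illustrative (e.g. 50 concurrent jobs plus a ~10% buffer):

# Size the Quartz worker pool to the expected number of concurrently
# running jobs plus a small buffer.
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 55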
From what you write I assume that your jobs will be one-off jobs, i.e. each job will be executed only once. If that is the case, Quartz can discard your jobs as soon as they finish executing: Quartz automatically removes a job once it is no longer scheduled to run in the future, unless the job is marked as durable. This feature may help you reduce the total number of registered jobs.
I hope this helps. If not, please ask.

Laravel Queue start a second job after first job

In my Laravel 5.1 project I want to start my second job when the first one has finished.
Here is my logic.
\Queue::push(new MyJob())
and when this job finish I want to start this job
\Queue::push(new ClearJob())
How can I achieve this?
If you want this, you should just define one queue.
A queue is just a list/line of things waiting to be handled in order,
starting from the beginning. When I say things, I mean jobs. - https://toniperic.com/2015/12/01/laravel-queues-demystified
To get the opposite of what you want (asynchronously executed jobs), you would define a new queue for every job.
Multiple Queues and Workers
You can have different queues/lists for storing the jobs. You can name them however you want, such as “images” for pushing image-processing tasks, or “emails” for a queue that holds jobs specific to sending emails. You can also have multiple workers, each working on a different queue if you want. You can even have multiple workers per queue, thus having more than one job being worked on simultaneously. Bear in mind that having multiple workers comes with a CPU and memory cost. Look it up in the official docs; it’s pretty straightforward.

Ruby on Rails, Resque

I have a resque job class that is responsible for producing a report on user activity. The class queries the database and then performs numerous calculations/data parsing to send out an email to certain people. My question is: should Resque jobs like this, which have numerous methods (200 lines or so of code), be written entirely as class methods responding to the single ResqueClass.perform method? Or should I instantiate a new instance of this Resque class to represent the single report being produced? If both approaches properly calculate the data and email it, is there a convention or best practice for how this should be handled for background jobs?
Thank You
Both strategies are valid. I generally approach this from the perspective of concurrency. While your job is running, the resque worker servicing your job is busy, so if you have N workers and N of these jobs running, you're going to have to wait until one is done before anything else in the queue gets processed.
Maybe that's ok - if you just have one report at a time then you will in effect dedicate one worker to running the report, and your others can do other things. But if you have a pile of these and they take a while, you might impact other jobs in your queue.
The downside is that if your report dies, you may need logic to pick up where it left off. If you instantiate the report once per user, you'd simply need to retry the failed jobs - no "where was I" logic is required.
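To make the per-user variant concrete, here is a minimal sketch assuming plain Resque in a Rails app; the class names, queue name, and helper calls (ReportBuilder, ReportMailer) are hypothetical:

class UserReportJob
  @queue = :reports

  def self.perform(user_id)
    user = User.find(user_id)               # assumes an ActiveRecord User model
    report = ReportBuilder.build(user)      # hypothetical calculation/parsing step
    ReportMailer.send_report(user, report)  # hypothetical mailer
  end
end

# Fan out one small job per user; a failed job can be retried on its own,
# without redoing the users that already succeeded.
User.where(active: true).pluck(:id).each do |id|
  Resque.enqueue(UserReportJob, id)
end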

Knowing when a Resque worker has completed a job

I am performing some jobs with Resque workers (5 workers). Now, when a job is completed/done, I want to trigger another worker that processes the data the previous worker stored in the db. What would be the most appropriate way of doing this?
I'm not sure if this is of any help, but have you had a look at the resque-status gem?
That way you can track a given job's status, to see when it is completed. But I'm afraid there is no auto-trigger functionality to start new workers.
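Since Resque itself has no completion hook, one common workaround is to enqueue the follow-up job as the last step of the first job's perform, so it only fires if the work succeeded. A minimal sketch, assuming plain Resque; the class names and batch_id argument are illustrative:

class ImportJob
  @queue = :imports

  def self.perform(batch_id)
    # ... do the work and store the results in the db ...
    # Only reached if nothing above raised, i.e. the job succeeded.
    Resque.enqueue(ProcessJob, batch_id)
  end
end

class ProcessJob
  @queue = :processing

  def self.perform(batch_id)
    # ... process the data the first job stored ...
  end
end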

Is Hadoop's Job thread-safe?

Does anyone know if org.apache.hadoop.mapreduce.Job is thread-safe? In my application I create a thread for each job and then call waitForCompletion. And I have another monitor thread that checks every job's state with isComplete.
Is that safe? Are jobs thread-safe? The documentation doesn't seem to mention anything about it...
Thanks
Udi
Unlike the others, I also use threads to submit jobs in parallel and wait for their completion. You just have to use one Job instance per thread. If you share the same Job instance across multiple threads, you have to take care of the synchronization yourself.
Why would you want to write a separate thread for each job? What exactly is your use case?
You can run multiple jobs in your Hadoop cluster. Do you have dependencies between the multiple jobs?
Suppose you have 10 jobs running and 1 job fails; would you then need to re-run the 9 successful ones?
Finally, the JobTracker will take care of scheduling multiple jobs on the Hadoop cluster. If you do not have dependencies, then you should not be worried about thread safety. If you have dependencies, then you may need to rethink your design.
Yes, they are. Actually, the input files are split into blocks and each block is processed on a separate node. All the map tasks run in parallel, and their output is fed to the reducer after they are done. There is no question of synchronization as you would think of it in a multithreaded program. In a multithreaded program all the threads run on the same box, and since they share some of the data you have to synchronize them.
Just in case you need another kind of parallelism at the map-task level, you can override the run() method in your mapper and work with multiple threads there. The default implementation calls setup(), then calls map() once for each record to process, and finally calls cleanup() once.
Hope this helps someone!
If you are checking whether the jobs have finished, I think you are a bit confused about how MapReduce works. You ought to let Hadoop do that for itself.
