Ruby on Rails, Resque - ruby

I have a Resque job class that is responsible for producing a report on user activity. The class queries the database and then performs numerous calculations and data parsing to send out an email to certain people. My question is: should Resque jobs like this, which have numerous methods (200 lines or so of code), consist entirely of class methods called from the single ResqueClass.perform method? Or should I instantiate a new instance of this Resque class to represent the single report being produced? If both approaches correctly calculate the data and email it, is there a convention or best practice for how background jobs should be structured?
Thank You

Both strategies are valid. I generally approach this from the perspective of concurrency. While your job is running, the resque worker servicing your job is busy, so if you have N workers and N of these jobs running, you're going to have to wait until one is done before anything else in the queue gets processed.
Maybe that's OK: if you only run one report at a time, you in effect dedicate one worker to the report and the others can do other things. But if you have a pile of these reports and each takes a while, you might delay other jobs in your queue.
The downside of doing everything in one long-running job is that if the report dies partway through, you may need logic to pick up where you left off. If you instead instantiate the report once per user, you simply need to retry the failed jobs; no "where was I" logic is required.
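One common way to combine the two strategies is to keep the class-level perform as a thin entry point that builds an instance and delegates to it. The following is a minimal sketch under assumptions: the class name, queue name, email address, and the stubbed query and mailer are all hypothetical, and only the @queue variable and self.perform hook are what Resque itself looks for.

```ruby
# Hypothetical job class. Resque only needs @queue and self.perform;
# everything else is ordinary instance-level Ruby.
class UserActivityReport
  @queue = :reports # queue name is an assumption

  # Resque calls this class method with the arguments passed to Resque.enqueue.
  def self.perform(user_id)
    new(user_id).run
  end

  attr_reader :user_id

  def initialize(user_id)
    @user_id = user_id
  end

  # One instance represents one report: query, calculate, deliver.
  def run
    deliver(summarize(fetch_activity))
  end

  private

  # Stand-in for the real database query.
  def fetch_activity
    [{ action: "login" }, { action: "login" }, { action: "purchase" }]
  end

  # Counts how often each action occurred.
  def summarize(rows)
    rows.group_by { |row| row[:action] }.transform_values(&:count)
  end

  # Stand-in for the real mailer; returns the message instead of sending it.
  def deliver(summary)
    { to: "reports@example.com", user_id: user_id, body: summary }
  end
end
```

Because each job covers a single user, a failed job can simply be retried with the same user_id, which matches the "no where-was-I logic" point above.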

Related

Cron vs queued task

My application has an Order model with an execution_datetime attribute. I'd like to send some distinct notifications. For example
execution_datetime minus 12 hours: email to carrier
execution_datetime minus 3 hours: sms to customer
execution_datetime plus 1 hour: email to customer
The above timings are not strict and can be approximated; slight deviations are acceptable. Also, the execution_datetime can change in the meantime...
I'm unsure whether to use cron or queued tasks for this. Some thoughts of my own:
Cron:
Business logic will need to be written to fetch applicable orders and execute accordingly
Is execution guaranteed? Should some sort of database flag be implemented to indicate a notification has been sent, and then perhaps fetch all due orders that are unflagged as some sort of failsafe?
Queued tasks:
Task is scheduled on creation of the order? If so, suppose the execution time is changed. How to modify the scheduled task? You'd need to somewhere keep track of the task ID?
Or perhaps a cron job that mass schedules applicable tasks every day?
I look forward to your suggestions.
Great question! I am interested in this discussion. Let me chip in with a scenario from my personal experience.
In my application, I have a Listing model with a promotion_ends_at column. The listing's promotion ends at some point in the future.
So, as you mentioned, there are two ways to do this.
1. When the listing is created, I could queue a job that will end the promotion on the listing in the future. The delay of that job would be the time until the promotion has to end (and that could be months away).
2. I could also have a cron job that runs regularly and handles listings whose promotions should end on a specific date.
We were using SQS as our queue service and since the maximum delay on SQS is 15 mins, option 1 was not feasible. We, then, moved to Redis where we could queue delayed jobs with a long delay easily.
However, as you also said, the promotion_ends_at column could be updated in the meantime. So either you have to keep track of the job in order to de-queue it, or you re-check whether the job should still run when it is about to execute.
For example, you could fresh() the model and check whether your condition is still valid. In my case, I would refresh my Listing and check whether promotion_ends_at is in the past. However, this means that we would have a lot of stale jobs that would probably be discarded anyway.
We finally went with a simple cron job that mass-schedules the jobs on the day they need to run. I also think that running delayed jobs is business logic, and maybe the queue shouldn't be held responsible for running jobs delayed far into the future.
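The cron approach with a "sent" flag, as raised in the question, could look roughly like the sketch below. It is written in plain Ruby with an in-memory Order struct standing in for the real model; the struct, the notification names, and the sweep function are all assumptions made for illustration. Because the due time is recomputed from execution_datetime on every run, a changed execution time is picked up automatically.

```ruby
# Hypothetical in-memory stand-in for the Order model.
# `sent` holds the names of notifications already delivered (the failsafe flag).
Order = Struct.new(:id, :execution_datetime, :sent)

# Offsets in seconds relative to execution_datetime, taken from the question.
NOTIFICATIONS = {
  carrier_email: -12 * 3600,
  customer_sms: -3 * 3600,
  customer_email: 1 * 3600
}.freeze

# Intended to be called by cron every few minutes.
# Returns [order_id, notification] pairs that became due and flags them as sent.
def sweep(orders, now)
  due = []
  orders.each do |order|
    NOTIFICATIONS.each do |name, offset|
      next if order.sent.include?(name)                   # already handled
      next if now < order.execution_datetime + offset     # not due yet
      order.sent << name
      due << [order.id, name]
    end
  end
  due
end
```

Running the sweep twice with the same clock value returns nothing the second time, so a missed or repeated cron run does not produce duplicate notifications.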

Laravel Queue start a second job after first job

In my Laravel 5.1 project I want to start a second job once the first one has finished.
Here is my logic.
\Queue::push(new MyJob());
and when this job finishes I want to start this job
\Queue::push(new ClearJob());
How can I achieve this?
If you want this, you should define just one queue.
A queue is just a list/line of things waiting to be handled in order,
starting from the beginning. When I say things, I mean jobs. - https://toniperic.com/2015/12/01/laravel-queues-demystified
To get the opposite of what you want (jobs executed asynchronously), you would define a new queue for every job.
Multiple Queues and Workers
You can have different queues/lists for
storing the jobs. You can name them however you want, such as “images”
for pushing image processing tasks, or “emails” for queue that holds
jobs specific to sending emails. You can also have multiple workers,
each working on a different queue if you want. You can even have
multiple workers per queue, thus having more than one job being worked
on simultaneously. Bear in mind having multiple workers comes with a
CPU and memory cost. Look it up in the official docs, it’s pretty
straightforward.
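The ordering guarantee described above can be illustrated without any framework. The sketch below is plain Ruby, not Laravel code: it uses the standard library's Queue and a single worker thread, and the job names are only labels borrowed from the question. Note that the guarantee relies on there being exactly one worker on the queue; with several workers on the same queue, a later job can start before an earlier one has finished.

```ruby
# Runs the given jobs through one FIFO queue with a single worker
# and returns the order in which they completed.
def run_in_order(jobs)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # pop returns nil once the closed queue is empty

  completed = []
  worker = Thread.new do
    while (job = queue.pop)
      completed << job.call
    end
  end
  worker.join
  completed
end

# Hypothetical stand-ins for MyJob and ClearJob.
my_job = -> { "MyJob done" }
clear_job = -> { "ClearJob done" }
order = run_in_order([my_job, clear_job])
```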

Laravel Queues for multi user environment

I am using Laravel 5.1, and I have a task that takes around 2 minutes to process, and this task particularly is generating a report...
Now, it is obvious that I can't make the user wait for 2 minutes on the same page where I took user's input, instead I should process this task in the background and notify the user later about task completion...
To achieve this, Laravel provides queues that run tasks in the background (if I have understood correctly). Now, in a multi-user environment, suppose more than one user requests report generation (say there are 4 users). Since the feature is named queues, does that mean the tasks will be performed one after the other (i.e. when 4 users request reports one after another, the 4th user's report will only be generated once the 3rd user's report is done)?
If queues complete their tasks one after the other, is there any way for tasks to be processed in the background immediately on the user's request, so that the user can be notified later when the task is completed?
A queue-based architecture is a little more complicated than that. The Queue provides you with an interface to different messaging implementations such as RabbitMQ and beanstalkd.
At any point in your code you can send a message to the queue, which in this context is called a job. Your queue will then hold multiple jobs, which are ready to be taken off in FIFO order.
As for your question, there are workers that listen to the queue; they fetch a job and execute it. It's up to you how many workers you run. If you have one worker, your tasks will be executed one after another; the more workers you have, the more jobs are processed in parallel.
Worker processes are started with Laravel's command-line interface, Artisan. Each process is one worker. You can start multiple workers with Supervisor.
Since you know that you are going to notify the user after around 2 minutes, I suggest using a cron job that checks every 2 minutes whether there are any reports to generate and, if there are, sends the notification to the user. That check is a single simple query, so you don't need to worry much about performance.
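The relationship between worker count and parallelism can be sketched in plain Ruby. This is an illustration only, not Laravel or Artisan code: threads stand in for worker processes, the function name is hypothetical, and the short sleep stands in for the two-minute report.

```ruby
# Processes one "report" per user id using the given number of workers.
# Returns [user_id, worker_index] pairs in completion order.
def process_reports(user_ids, worker_count)
  queue = Queue.new
  user_ids.each { |id| queue << id }
  queue.close # workers stop when the queue is drained

  results = Queue.new
  workers = Array.new(worker_count) do |index|
    Thread.new do
      while (id = queue.pop)
        sleep 0.05 # stand-in for the slow report generation
        results << [id, index]
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end
```

With one worker the four reports are handled strictly one after another; with four workers each request can be picked up as soon as it is queued, at the cost of running four processes.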

Why does resque use child processes for processing each job in a queue?

We have been using Resque in most of our projects, and we have been happy with it.
In a recent project, we were connecting to a live streaming API from Twitter. Since we have to maintain the connection, we dumped each line from the streaming API into a Resque queue so that the connection would not be lost, and processed the queue afterwards.
The insertion rate into the queue was on the order of 30-40/second, while the rate at which the queue was popped was only 3-5/second, so the queue kept growing. When we looked into the reasons, we found that Resque has a parent process and that, for each job in the queue, it forks a child process that processes the job. Our Rails environment was quite heavy, and forking the child process was taking time.
So, we implemented another rake task of this sort, for the time being:
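task :process_queue => :environment do
  # Pop and process jobs in this process, without forking a child per job.
  loop do
    begin
      interaction = Resque.pop("process_twitter_resque")
      if interaction
        ProcessTwitterResque.perform(interaction)
      else
        sleep 1 # queue is empty; avoid polling Redis in a tight loop
      end
    rescue => e
      puts e.message
      puts e.backtrace.join("\n")
    end
  end
end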
and started the task like this:
nohup bundle exec rake process_queue --trace >> log/workers/process_queue/worker.log 2>&1 &
This does not handle failed jobs and the like.
But my question is: why does Resque fork a child process to process the jobs from the queue? The jobs definitely do not need to be processed in parallel (since it is a queue, we expect it to process jobs one after the other, sequentially, and I believe Resque also forks only one child process at a time).
I am sure Resque has done it with some purpose in mind. What is the exact purpose behind this parent/child process architecture?
The Ruby process that sits and listens for jobs in Redis is not the process that ultimately runs the job code written in the perform method. It is the “master” process, and its only responsibility is to listen for jobs. When it receives a job, it forks yet another process to run the code. This other “child” process is managed entirely by its master. The user is not responsible for starting or interacting with it using rake tasks. When the child process finishes running the job code, it exits and returns control to its master. The master now continues listening to Redis for its next job.
The advantage of this master-child process organization – and the advantage of Resque processes over threads – is the isolation of job code. Resque assumes that your code is flawed, and that it contains memory leaks or other errors that will cause abnormal behavior. Any memory claimed by the child process will be released when it exits. This eliminates the possibility of unmanaged memory growth over time. It also provides the master process with the ability to recover from any error in the child, no matter how severe. For example, if the child process needs to be terminated using kill -9, it will not affect the master’s ability to continue processing jobs from the Redis queue.
In earlier versions of Ruby, Resque’s main criticism was its potential to consume a lot of memory. Creating new processes means creating a separate memory space for each one. Some of this overhead was mitigated with the release of Ruby 2.0 thanks to copy-on-write. However, Resque will always require more memory than a solution that uses threads because the master process is not forked. It’s created manually using a rake task, and therefore must load whatever it needs into memory from the start. Of course, manually managing each worker process in a production application with a potentially large number of jobs quickly becomes untenable. Thankfully, we have pool managers for that.
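The isolation argument can be demonstrated with a few lines of plain Ruby. This is a simplified sketch of the fork-per-job idea, not Resque's actual implementation; the function name is hypothetical, and it assumes a platform where fork is available (for example Linux or macOS).

```ruby
# Runs the given block in a forked child process and waits for it.
# Returns the child's Process::Status; the parent is unaffected by
# whatever happens inside the child.
def run_job_in_child
  pid = fork do
    begin
      yield
      exit!(0) # leave immediately, skipping the parent's at_exit handlers
    rescue StandardError
      exit!(1) # job raised: report failure through the exit status
    end
  end
  _, status = Process.wait2(pid)
  status
end

ok = run_job_in_child { Array.new(100_000) { |i| i.to_s } } # memory is freed with the child
failed = run_job_in_child { raise "boom" }
killed = run_job_in_child do
  Process.kill("KILL", Process.pid) # simulate kill -9 on the child
  sleep 1
end
```

In all three cases the parent keeps running and can go on to the next job, which is the recovery property the answer describes.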
Resque uses #fork for 2 reasons (among others): ability to prevent zombie workers (just kill them) and ability to use multiple cores (since it's another process).
Maybe this will help you with your fast-executing jobs: http://thewebfellas.com/blog/2012/12/28/resque-worker-performance

how to implement custom cloud worker

I am designing a cloud app and need a worker process which scours my database looking for work, and then performs it.
Most of the info I seem to find on the subject of background tasks in the cloud involves some kind of scheduler and/or queuing system.
What I have doesn't quite fit into the "run this task every 5 minutes" or "add this to the queue to be executed later" models. I think the main difference to my problem is that the workers themselves find work to do, rather than being assigned it by a periodic scheduler or an external process that generates work.
What I have is basically a giant table where each entry has three fields:
job: a small task to be performed, lets say it gets the last message from a twitter account and stores it in the database
the interval at which to perform that job: say every 5 minutes, N.B. the interval is arbitrary and different for each entry in the table
the last date when the job was performed
The way I would implement this is to have a worker with an infinite loop. Inside the loop, it scours the database, a) looking for items whose date + interval < currentTime; b) when it finds one, it sets date = currentTime; and c) it then executes the job. If there is no work at the moment, it sleeps for a few seconds, then tries again.
I will have many parallel workers scouring the database simultaneously, which is why I do b) first and then c) in the paragraph above. Since there are parallel workers, actions a) and b) together form an atomic operation on the database to prevent work being duplicated. If a worker crashes after a) and b) but before it manages to finish the work, it's no big deal: the workers can just do it at the next interval. The reason is that the work is not performed in a time-invariant system, so a backlog of failed jobs has no benefit; the tasks have to be performed at their exact intervals, so it's better to skip one interval than to have uneven intervals between executions.
My question is whether that is a reasonable implementation strategy? If so, how do I bring this process to life on the cloud (I am using Heroku, but may switch to EC2 in the future)? I still haven't written any code so I would welcome other suggestions (maybe I misunderstood the use cases/applications for queue systems).
This sounds so close to using something like a scheduled job that you might as well tread the well beaten path and do it the more conventional way. There's no reason why you can't schedule a job to run once every few seconds.
However, this idea of looking for work sounds dodgy. What happens if two workers find the same task to run at the same time for instance? Also, are there not triggers in the application which can indicate that work needs doing? It seems strange that you have code 'looking for work'.
You can go a very long way with simple periodic background tasks, so I would exhaust all possibilities in that area before rolling your own.
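The concern about two workers finding the same task is what the atomic "find and stamp" step in the question is meant to address. Below is a minimal in-memory sketch of that claim-then-execute loop, assuming a Mutex as a stand-in for whatever atomic operation the real database provides; the struct, class, and function names are all hypothetical.

```ruby
# Hypothetical row of the jobs table: interval and last_run are in seconds.
Job = Struct.new(:id, :interval, :last_run)

class JobTable
  def initialize(jobs)
    @jobs = jobs
    @lock = Mutex.new # stands in for the database's atomic find-and-update
  end

  # Steps (a) and (b) as one atomic operation: find a due job and stamp it.
  # Returns the claimed job, or nil when nothing is due.
  def claim_due(now)
    @lock.synchronize do
      job = @jobs.find { |candidate| candidate.last_run + candidate.interval < now }
      job.last_run = now if job
      job
    end
  end
end

# Starts several workers that keep claiming jobs until none are due.
# Returns the ids of the jobs that were executed.
def run_workers(table, now, worker_count)
  executed = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      while (job = table.claim_due(now))
        executed << job.id # step (c): the actual work would happen here
      end
    end
  end
  workers.each(&:join)
  Array.new(executed.size) { executed.pop }
end
```

Because a job is stamped before it is executed, a worker that crashes mid-job simply leaves that job to be picked up at its next interval, as the question intends.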
