Using timeout in a sidekiq or activejob perform method - ruby

I'm planning to move from Heroku Scheduler to a custom clock process using the clockwork gem. Heroku Scheduler kills a task if it hasn't completed before the next scheduled run of the same type.
How do I achieve this in Sidekiq?
Given that Timeout is not thread-safe, is it a bad idea to do this in a Sidekiq worker?
class RunsTooLongWorker
  include Sidekiq::Worker
  sidekiq_options retry: false

  def perform(*args)
    Timeout::timeout(2.hours) do
      # do possibly long-running task
    end
  end
end
If it is, what's the alternative? Say I want to run a job every 10 minutes, but I don't want two instances of the same job running at the same time. How should I deal with that?

To answer this part of your question:
Given that Timeout is not thread safe. Is it a bad idea to do this in a Sidekiq worker?
Mike Perham wrote a great blog post about why you shouldn't use Ruby's timeout module in Sidekiq jobs.
As for an alternative: it sounds like what you need most is to ensure that your jobs don't trip over each other. For this, I can think of two approaches:
Poor man's approach: add an 'enqueued' attribute to the model of the object you're working on. When you begin processing, mark the item as enqueued; when you finish, mark it as not enqueued. Scope your every-ten-minute job to items that are not enqueued. Alternatively, create a new table with your job's name and a status field, and query it for availability before re-executing your processing. A sketch of the first variant follows.
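Here is a minimal sketch of that flag approach, assuming a hypothetical Item model with a boolean enqueued column. Note the check-and-set is not atomic, so for strict guarantees you'd want a database-level lock or the gem mentioned below:

class TenMinuteSweepWorker
  include Sidekiq::Worker
  sidekiq_options retry: false

  def perform
    # Scope to records nobody else is working on
    Item.where(enqueued: false).find_each do |item|
      item.update!(enqueued: true)
      begin
        process(item) # your possibly long-running work (hypothetical helper)
      ensure
        item.update!(enqueued: false) # release the flag even on failure
      end
    end
  end
end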
If you have something more complex going on, this sounds like a great case for the sidekiq-unique-jobs gem. I think the 'while executing' approach is the one you want; see the sketch below.
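With that gem, the lock is declared via sidekiq_options. The exact option name differs between gem versions (older releases used unique:, newer ones use lock:), so treat this as a sketch and check the README for your version:

class EveryTenMinutesWorker
  include Sidekiq::Worker
  # Only one instance of this worker's job runs at a time;
  # a duplicate enqueued in the meantime waits for the lock.
  sidekiq_options lock: :while_executing

  def perform
    # do the work
  end
end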

I guess the solution for your task is to use the whenever gem, as suggested on the Sidekiq wiki (and here is the wiki for whenever). After installation it creates a config/schedule.rb file where you can define the schedule of your jobs, for example:
every 3.hours do
  runner "MyModel.some_process"
  rake "my:rake:task"
  command "/usr/bin/my_great_command"
end
It has three built-in job types, as you can see from the snippet: runner, rake and command. You can also define your own, which is explained in the wiki as well; see the example below.
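For instance, the README shows declaring a custom job type with job_type and then using it like the built-ins (the awesome command below is the README's own illustration; :task and the other symbols are replaced by the arguments you pass):

# config/schedule.rb
job_type :awesome, '/usr/local/bin/awesome :task :fun_level'

every 2.hours do
  awesome "party", fun_level: "extreme"
end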

Related

Ruby threading/forking with API (Sinatra)

I am using the Sinatra gem for my API. What I want to do is: when a request is received, process it, return the response, and then start a new long-running task.
I am a newbie to Ruby; I have read about threading but am not sure what the best way to accomplish this is.
Here is my Sinatra endpoint:
post '/items' do
  # Processing data
  # Return response (body ...)
  # Start long-running task
end
I would be grateful for any advice or example.
I believe the better way to do it is to use background jobs. While your web worker executes a long-running task, it is unavailable for new requests; with background jobs, a separate process does the work while your web worker stays free to serve new requests.
You can have a look at the most popular background job gems for Ruby as a starting point: Resque, delayed_job, Sidekiq.
UPD: The implementation depends on the chosen gem, but the general scheme will be like this:
# Controller
post '/items' do
  # Processing data
  MyAwesomeJob.enqueue # here you put your job into the queue
  head :ok # or whatever
end
In MyAwesomeJob you implement your long-running task.
Next, about Mongoid and background jobs: you should never use complex objects as job arguments. I don't know what kind of task you are implementing, but there is a general answer: use simple objects.
For example, instead of passing your User as an argument, pass user_id and find the record inside your job. If you do it like that, you can use any DB without problems.
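A minimal sketch of that pattern with Sidekiq (the User model here is just for illustration):

class MyAwesomeJob
  include Sidekiq::Worker

  # Take the id, not the object: ids serialize cleanly to JSON,
  # and the job sees fresh data when it actually runs.
  def perform(user_id)
    user = User.find(user_id)
    # ... long-running work with user ...
  end
end

# In the controller:
MyAwesomeJob.perform_async(user.id)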
Agree with unkmas.
There are two ways to do this.
Threads or a background job gem like sidekiq.
Threads are perfectly fine if the processing times aren't that high and you don't want to write code for a worker. But there is a strong possibility that you'll spin up too many threads if you don't use a thread pool or if you're expecting bursty HTTP traffic; a sketch of the thread approach follows.
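If you do go the thread route, a minimal fire-and-forget sketch looks like this (process_items is a hypothetical helper; note the work dies with the server process, and errors must be handled inside the thread):

require 'sinatra'

post '/items' do
  data = request.body.read

  # Spawn the long-running work so the response returns immediately
  Thread.new do
    begin
      process_items(data)
    rescue => e
      warn "background task failed: #{e.message}"
    end
  end

  status 202
  body 'accepted'
end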
The best way to do it is by using Sidekiq or something similar. You could even have a job queue like beanstalkd in between, enqueue the job to it, and return the response; a worker then reads from the queue and processes the job later on.

Is it safe to call the Sidekiq API from inside perform?

Nothing seems to prevent a perform method from using the Sidekiq API, and it should be safe in read-only mode.
But what if it calls a "write" method, especially one that acts on the current job itself?
We would like to reschedule a job without creating a new job because we need to track the job completion with the sidekiq-status gem from another worker.
Using MyWorker.perform_in or MyWorker.perform_at to reschedule the job from inside the worker creates a new job, making it difficult to track the total completion. We're thinking of using Sidekiq::ScheduledSet.new.find and the reschedule method but it seems awkward and potentially dangerous to reschedule a job that is about to complete.
Do Sidekiq and its API support this use case?
You might be able to hack something together but it'll be really slow if you try to modify the Sets and Lists in Redis directly. They aren't designed to be used that way.
The official Sidekiq solution to this problem is a Batch.
https://github.com/mperham/sidekiq/wiki/Batches#status
You create a one-job batch. If the job needs to be rescheduled, it adds a new job to the batch to be executed later. Your other worker just checks whether the overall batch is 100% complete.
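Batches are a Sidekiq Pro feature. A rough sketch under that assumption (needs_reschedule? and do_work are hypothetical helpers): inside a batched job, the batch method returns the current batch, and adding a job to it keeps the batch open until that job finishes too.

# Create the one-job batch
batch = Sidekiq::Batch.new
batch.jobs do
  MyWorker.perform_async(args)
end
bid = batch.bid # keep this around to check status later

class MyWorker
  include Sidekiq::Worker

  def perform(args)
    do_work(args)
    if needs_reschedule?
      # Append the follow-up job to this job's own batch
      batch.jobs do
        MyWorker.perform_in(10 * 60, args) # seconds
      end
    end
  end
end

# From the other worker, check overall completion:
status = Sidekiq::Batch::Status.new(bid)
status.complete?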

Rufus Scheduler: specs for schedules?

I have just added the rufus scheduler gem to my application and ran it for a few minutes in development mode to find that it works.
But of course I'd like to write a spec that ensures the schedules are set up correctly. For example, typos could slip into the interval strings, or some other gremlin might prevent a job from being scheduled as intended.
My initial idea was to look at Scheduler#jobs but that can become tricky quite quickly: if there are, for example, two jobs with the same interval, I cannot see a straightforward way to identify the one to test.
Apart from that, it should be possible to set up some expectations, run the block and check whether the expected methods were called.
Do you have recommendations on how to test for correctness of job schedules at a given point in the application lifecycle?
You can place tags on jobs:
https://github.com/jmettraux/rufus-scheduler#tags
Tags help identify jobs, and they're also useful for looking jobs up:
https://github.com/jmettraux/rufus-scheduler#schedulerjobstag--tags--x
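For example, a rough spec sketch using tags (the 'cleanup' job and its '10m' interval are placeholders):

require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

scheduler.every '10m', tag: 'cleanup' do
  # ... the actual work ...
end

# In the spec, look the job up by its tag and assert on its schedule:
job = scheduler.jobs(tag: 'cleanup').first
expect(job).not_to be_nil
expect(job.original).to eq('10m') # the schedule string as it was written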

sidekiq multiple threads how-to

I have a task that will take a long time, so I split it into 3 parts and want to launch three threads that will work on it concurrently (don't worry, I made sure they don't access any shared variables; each strictly handles its own dataset).
As far as I can tell, Sidekiq launches a new thread for each worker, so I made three workers, Importer, Importer2 and Importer3, all in app/workers. In one of my controllers I have this code:
Importer.perform_async(arrays[0], date)
Importer2.perform_async(arrays[1], date)
Importer3.perform_async(arrays[2], date)
render json: 1
My question is: Is that the best way to handle this?
It seems odd that (a) the request to the controller takes so long before rendering the 1, and (b) in the Sidekiq log I can see Importer JID-639e67d2aa20cce885690dc7 INFO: start, and the same for Importer2 but not Importer3, and then Sidekiq just exits with killed.
When I relaunch Sidekiq, I get the Importer3 ... start line, and it is then the only one working (it updates a DB value, and it is the only one changing it).
Any ideas why?
Are you sure you have enough memory? Maybe this can be helpful: Debugging Mystery Sidekiq Shutdowns

Ruby on Rails, Resque

I have a Resque job class that is responsible for producing a report on user activity. The class queries the database and then performs numerous calculations and data-parsing steps to send out an email to certain people. My question is: should Resque jobs like this, with numerous methods (200 lines or so of code), consist entirely of class methods hanging off the single ResqueClass.perform entry point? Or should I instantiate a new instance of the Resque class to represent the single report being produced? If both approaches properly calculate the data and email it, is there a convention or best practice for how this should be handled in background jobs?
Thank You
Both strategies are valid. I generally approach this from the perspective of concurrency. While your job is running, the resque worker servicing your job is busy, so if you have N workers and N of these jobs running, you're going to have to wait until one is done before anything else in the queue gets processed.
Maybe that's ok - if you just have one report at a time then you in effect will dedicate one worker to running the report, your others can do other things. But if you have a pile of these and it takes a while, you might impact other jobs in your queue.
The downside of the single-job approach is that if your report dies, you may need logic to pick up where you left off. If you instantiate the report once per user, you'd simply retry the failed jobs; no "where was I" logic is required. A sketch of the per-user variant follows.
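A rough sketch of the per-user variant (User, ReportMailer and the activity query are placeholders for illustration):

# One small job per user instead of one monolithic report job
class UserActivityReportJob
  @queue = :reports

  def self.perform(user_id)
    user = User.find(user_id)
    data = user.activity_for_last_week # hypothetical calculation
    ReportMailer.activity_report(user, data).deliver
  end
end

# Enqueue one job per user; a failed job can simply be retried.
User.find_each do |user|
  Resque.enqueue(UserActivityReportJob, user.id)
end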
