How to retry a Sidekiq job without raising an exception? (Ruby)

Simple question: I need a way to retry a job WITHOUT RAISING AN EXCEPTION.
I know I can use something like:
def perform
  if stuff_happening
    self.class.perform_in(2.minutes)
    return
  end
end
and it's fine, but there is one problem: the retry count. Potentially, with stuff_happening every time, this job will keep rescheduling itself indefinitely. Is there a way to ensure it will only be rescheduled a fixed number of times and then stop?

Sidenote: we use 2 spaces for indentation in Ruby.
You need to pass the state (the number of retries that have already happened) through:
def perform(attempts_left = 10)
  if stuff_happening && attempts_left > 0
    self.class.perform_in(2.minutes, attempts_left - 1)
    return
  end
end
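A quick usage note, assuming the worker class is called MyJob (a hypothetical name): the first enqueue relies on the default argument, and each reschedule passes the shrinking budget along.
  MyJob.perform_async     # starts with the default budget of 10 attempts
  MyJob.perform_async(3)  # or start with a custom budget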

Related

How to push a Sidekiq job back without a retry in middleware?

Is there a way to push a job back to the queue from Sidekiq server middleware? Or simply retry it without counting it as a retry?
UPDATE: Some background: I need to track the status of jobs in Elasticsearch (one job follows after another), but if Elastic is not accessible and I reschedule the same worker again, I lose the chain (the jid changes).
The easiest way would be for the job to re-schedule itself, then exit. For example:
class MyJob < ApplicationJob
  queue_as :default

  def perform(*args)
    if ready_to_perform?
      # Do stuff!
    else
      MyJob.perform_later(*args)
    end
  end
end
Use with caution. You probably don't want a job to be stuck re-scheduling itself forever!
This isn't quite the same as "retrying without incrementing the retry counter" (which is a little more complicated to implement), but is sufficient for most use cases like this.
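Regarding the update about losing the chain when the jid changes: Sidekiq offers nothing built in for this, but one workaround is to thread the original jid through as an ordinary argument, so each re-scheduled copy still knows the id the chain was indexed under. This is just a sketch: ready_to_perform? is the placeholder from the answer above, and provider_job_id is ActiveJob's accessor for the underlying Sidekiq jid.
  class MyJob < ApplicationJob
    queue_as :default

    def perform(original_jid = nil, *args)
      # Track status in Elasticsearch under the first jid this chain ever had,
      # so the chain survives re-scheduling
      chain_id = original_jid || provider_job_id

      if ready_to_perform?
        # Do stuff, indexing status under chain_id!
      else
        MyJob.perform_later(chain_id, *args)
      end
    end
  end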
This is not working code, but something like this could help you achieve what you want.
You could define a middleware and add it to Sidekiq as below:
Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add Sidekiq::RetryMonitoringMiddleware
  end
end
Now you can define the middleware itself, as shown below:
class Sidekiq::RetryMonitoringMiddleware
  def call(worker, job_params, _queue)
    # Re-invoke the worker's perform for this job if it is flagged as a failure
    worker.perform(job_params['jid'], *job_params['args']) if should_retry?(job_params)
  rescue StandardError => e
    Rails.logger.error e
  ensure
    yield
  end

  private

  def should_retry?(job)
    # The job hash should carry a failure flag; check which key
    # actually holds the failure message in your setup
    ['1', 'true', 1, true].include?(job['failure'])
  end
end
Hope it helps!!

Any way to snipe or terminate specific Sidekiq workers?

Is it possible to snipe or cancel specific Sidekiq workers/running jobs - effectively injecting an exception or something into the worker thread to terminate it?
I have some fairly simple background ruby (MRI 1.9.3) jobs under Sidekiq (latest) that run fine and are dependent on external systems. The external systems can take varying amounts of time during which the worker must remain available.
I think I can use Sidekiq's API to get to the appropriate worker - but I don't see any 'terminate/cancel/quit/exit' methods in the docs - is this possible? Is this something other people have done?
P.S. I know I could use an async loop within the worker's job to trap relevant signals and shut itself down, but that would complicate things a bit due to the nature of the external systems.
An async loop is the best way to do it, as Sidekiq has no way to terminate a running job.
def perform
  main_thread = Thread.new do
    ActiveRecord::Base.connection_pool.with_connection do
      begin
        # ... the actual work ...
      ensure
        # Signal the watcher that the work is finished
        $redis.set(some_thread_key, 1)
      end
    end
  end

  watcher_thread = Thread.new do
    ActiveRecord::Base.connection_pool.with_connection do
      # DEL returns 1 once the key exists (set by the main thread on
      # completion, or externally to request cancellation)
      until $redis.del(some_thread_key) == 1
        sleep 1
      end
      main_thread.kill
      # Wait until the thread is actually dead
      sleep 0.1 while main_thread.alive?
    end
  end

  [main_thread, watcher_thread].each(&:join)
end
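With this in place, cancelling the job from outside (say, a Rails console or another process) is just a matter of setting the key the watcher polls for. some_thread_key is assumed to be a per-job key both sides can derive, e.g. from the jid.
  # hypothetical cancellation trigger, run anywhere with the same Redis connection
  $redis.set(some_thread_key, 1)  # the watcher's DEL returns 1 and kills the job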

Ruby, can you call a method from inside the same method?

I'm trying to compare two files. If only one of the files exists, it should create a copy.
Is it then possible to re-call the method, when using begin..rescue..end?
def differ
  begin
    file_today = read_file("/etc/hosts.deny")
    file_yesterday = read_file("/etc/hosts.deny_old")
    content = Diffy::Diff.new(file_yesterday, file_today)
  rescue
    copy_log
    differ # call itself?! O_o Well, after the copy has been created!
  end
  return content
end
It is "differ #call itself?!O_o Well, after the copy has been created!" that I cant get to work.
You can use the retry keyword in your rescue clause to restart it.
Edit: Here's some more information from the free edition of Programming Ruby:
"The redo statement causes a loop to repeat the current iteration. Sometimes, though, you need to wind the loop right back to the very beginning. The retry statement is just the ticket. retry restarts any kind of iterator loop. Retry will reevaluate any arguments to the iterator before restarting it."
Edit: I realized that this iterator-restarting behavior of retry was removed in Ruby 1.9. Just know that retry is normally used to re-execute a begin block that raised an exception. Make sure you've fixed whatever caused the exception before you retry - otherwise you end up in an infinite loop!
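Applied to the differ method from the question, a minimal sketch with a capped retry might look like this (retry re-runs the body from the top, and the local attempts counter survives the restart):
  def differ
    attempts ||= 0
    file_today = read_file("/etc/hosts.deny")
    file_yesterday = read_file("/etc/hosts.deny_old")
    Diffy::Diff.new(file_yesterday, file_today)
  rescue
    attempts += 1
    if attempts <= 1
      copy_log  # create the missing copy, then re-run the method body
      retry
    end
    raise  # still failing after the copy: re-raise instead of looping forever
  end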

Programmatic access to the Resque failed-job queue

How can I write code to go through the Resque failure queue and selectively delete jobs? Right now I've got a handful of important failures there, interspersed between thousands of failures from a runaway job that ran repeatedly. I want to delete the ones generated by the runaway job. The only API I'm familiar with is for enqueuing jobs. (I'll continue RTFMing, but I'm in a bit of a hurry.)
I ended up doing it like this:
# Loop over all failure indices from the end, instantiating as needed.
# Iterating downward means removing an entry doesn't shift the indices
# we have yet to visit.
(Resque::Failure.count - 1).downto(0).each do |error_index_number|
  failure = Resque::Failure.all(error_index_number)
  # `failure` is the hash with all the data about the failed job;
  # perform any check you need here
  if failure["error"][/regex_identifying_runaway_job/].present?
    Resque::Failure.remove(error_index_number)
    # or
    # Resque::Failure.requeue(error_index_number)
  end
end
As @Winfield mentioned, having a look at Resque's failure backend is useful.
You can manually modify the failure queue the way you're asking, but it might be better to write a custom Failure handler that deletes or re-enqueues jobs as they fail.
You can find the base failure backend here and an implementation that logs failed jobs to the Hoptoad exception tracking service here.
For example:
module Resque
  module Failure
    class RemoveRunaways < Base
      def save
        i = 0
        while job = Resque::Failure.all(i)
          # Selectively remove all MyRunawayJobs from the failure queue whenever they fail
          if job.fetch('payload').fetch('class') == 'MyRunawayJob'
            remove(i)
          else
            i += 1
          end
        end
      end
    end
  end
end
EDIT: I forgot to mention how to register this backend to handle failures.
In your Resque initializer (e.g. config/initializers/resque.rb):
# Use Resque Multi failure handler: standard handler and your custom handler
Resque::Failure::Multiple.classes = [Resque::Failure::Redis, Resque::Failure::RemoveRunaways]
Resque::Failure.backend = Resque::Failure::Multiple
Example: removing failures with a boolean function
I used a higher-order function approach that evaluates whether each failure should be removed:
def remove_failures(should_remove_failure_func)
  (Resque::Failure.count - 1).downto(0).each do |i|
    failure = Resque::Failure.all(i)
    Resque::Failure.remove(i) if should_remove_failure_func.call(failure)
  end
end
def remove_failed_validation_jobs
  has_failed_for_validation_reason = ->(failure) do
    failure["error"] == "Validation failed: Example has already been taken"
  end

  remove_failures(has_failed_for_validation_reason)
end

Delayed Job creating Airbrakes every time it raises an error

def perform
  refund_log = {
    success: refund_retry.success?,
    amount: refund_amount,
    action: "refund"
  }

  if refund_retry.success?
    refund_log[:reference] = refund_retry.transaction.id
    refund_log[:message] = refund_retry.transaction.status
  else
    refund_log[:message] = refund_retry.message
    refund_log[:params] = {}
    refund_retry.errors.each do |error|
      refund_log[:params][error.code] = error.message
    end
    order_transaction.message = refund_log[:params].values.join('|')
    raise "delayed RefundJob has failed"
  end
end
When I raise "delayed RefundJob has failed" in the else branch, it creates an Airbrake error. I want to run the job again if it ends up in the else branch.
Is there any way to re-queue the job without raising an exception, and to prevent the Airbrake from being created?
I am using delayed_job version 1.
The cleanest way would be to re-queue, i.e. create a new job and enqueue it, and then exit the method normally.
To elaborate on @Roman's response, you can create a new job with a retry parameter in it and enqueue it.
If you maintain the retry parameter (decrementing or incrementing it each time you re-enqueue the job), you can track how many retries you have made and thus avoid an endless retry loop, as in the sketch below.
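A minimal sketch of that idea, assuming delayed_job's classic struct-style jobs (RefundJob, do_refund, and the counter are all placeholder names):
  class RefundJob < Struct.new(:order_id, :retries_left)
    def perform
      return if do_refund(order_id)  # success, nothing more to do

      if retries_left.to_i > 0
        # Re-enqueue a fresh copy with a smaller budget and exit normally:
        # no exception is raised, so no Airbrake is created
        Delayed::Job.enqueue(RefundJob.new(order_id, retries_left.to_i - 1))
      else
        raise "RefundJob failed after all retries"  # now an Airbrake is warranted
      end
    end
  end

  # initial enqueue with a budget of 3 retries
  Delayed::Job.enqueue(RefundJob.new(order.id, 3))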
DelayedJob expects a job to raise an error in order to be requeued, by definition.
From there you can either:
Ignore your exception on the Airbrake side (see https://github.com/airbrake/airbrake#filtering), so the job still gets queued again without filling your logs.
Dive into the DelayedJob code, where you can see at https://github.com/tobi/delayed_job/blob/master/lib/delayed/job.rb#L65 that a method named reschedule is available and is used by run_with_lock (https://github.com/tobi/delayed_job/blob/master/lib/delayed/job.rb#L99). From there you can call reschedule manually instead of raising your exception.
About the latter solution, I advise adding some mechanism that still files an Airbrake report on the third or later try; that way you can still detect that something is wrong without the hassle of having your logs filled by the attempts.
