Delayed Job creating Airbrakes every time it raises an error - ruby

def perform
  refund_log = {
    success: refund_retry.success?,
    amount: refund_amount,
    action: "refund"
  }
  if refund_retry.success?
    refund_log[:reference] = refund_retry.transaction.id
    refund_log[:message] = refund_retry.transaction.status
  else
    refund_log[:message] = refund_retry.message
    refund_log[:params] = {}
    refund_retry.errors.each do |error|
      refund_log[:params][error.code] = error.message
    end
    order_transaction.message = refund_log[:params].values.join('|')
    raise "delayed RefundJob has failed"
  end
end
When I raise "delayed RefundJob has failed" in the else branch, it creates an Airbrake error. I want to run the job again if it ends up in the else section.
Is there any way to re-queue the job without raising an exception, and so prevent creating an Airbrake error?
I am using delayed_job version 1.

The cleanest way would be to re-queue, i.e. create a new job and enqueue it, and then exit the method normally.

To elaborate on @Roman's response, you can create a new job with a retry parameter in it and enqueue it.
If you maintain the retry parameter (incrementing it each time you re-enqueue the job), you can track how many retries you have made and thus avoid an endless retry loop, as in the sketch below.
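For illustration, here's a minimal sketch of that pattern against the old delayed_job API; the RefundJob struct, the MAX_RETRIES cap, and the process_refund helper are hypothetical names, not from the question:
class RefundJob < Struct.new(:order_id, :retry_count)
  MAX_RETRIES = 3 # hypothetical cap on re-enqueues

  def perform
    return if process_refund(order_id) # hypothetical helper, returns true on success
    attempts = (retry_count || 0) + 1
    if attempts <= MAX_RETRIES
      # Enqueue a fresh copy with an incremented counter and return normally,
      # so Delayed Job counts this run as a success and no Airbrake is created.
      Delayed::Job.enqueue(RefundJob.new(order_id, attempts))
    else
      # Only now is something genuinely wrong; let Airbrake hear about it.
      raise "RefundJob failed after #{MAX_RETRIES} retries"
    end
  end
end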

DelayedJob expects a job to raise an error in order to be requeued; that is how it works by definition.
From there you can either:
Ignore your exception on the Airbrake side (see https://github.com/airbrake/airbrake#filtering) so the job still gets queued again without filling your logs; a sketch follows below.
Dive into the DelayedJob code: at https://github.com/tobi/delayed_job/blob/master/lib/delayed/job.rb#L65 you can see a method named reschedule, which is used by run_with_lock (https://github.com/tobi/delayed_job/blob/master/lib/delayed/job.rb#L99). From there you can call reschedule manually instead of raising your exception.
Regarding the latter solution, I advise adding a mechanism that still files an Airbrake report on the third or later attempt, so you can still detect that something is wrong without the hassle of having your logs filled by every retry.
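As a sketch of the filtering option, assuming an older airbrake gem whose configure block exposes an ignore list (check the filtering section linked above for the exact API in your gem version), you could raise a dedicated exception class and ignore it; RefundRetryError is a hypothetical name:
# config/initializers/airbrake.rb
class RefundRetryError < StandardError; end # raise this from the job instead of a bare string

Airbrake.configure do |config|
  config.api_key = 'your-api-key'
  # Exceptions listed here are not reported to Airbrake, but they still
  # propagate, so Delayed Job will still requeue the job.
  config.ignore << 'RefundRetryError'
end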

Related

How to handle SIGTERM with resque-status in complex jobs

I've been using resque on Heroku, which will from time to time interrupt your jobs with a SIGTERM.
Thus far I've handled this with a simple:
def process(options)
  do_the_job
rescue Resque::TermException
  self.defer options
end
We've started using resque-status so that we can keep track of jobs, but the method above obviously breaks that, as the job will show as completed when it has actually been deferred to another job.
My current thinking is that instead of deferring the current job in resque, there needs to be another job that re-queues jobs that have failed due to SIGTERM.
The tricky part is that some jobs are more complicated:
def process(options)
  do_part1 unless options['part1_finished']
  options['part1_finished'] = true # mark part 1 done so a retry skips it
  do_part2
rescue Resque::TermException
  self.defer options
end
Simply removing the rescue and retrying those jobs would cause an exception when do_part1 gets repeated.
Looking more deeply into how resque-status works, a possible workaround is to go straight to resque for the re-queue, using the same parameters that resque-status would use.
def process
  do_part1 unless options['part1_finished']
  options['part1_finished'] = true # mark part 1 done so a retry skips it
  do_part2
rescue Resque::TermException
  Resque.enqueue self.class, uuid, options
  raise DeferredToNewJob
end
Of course, this is undocumented so may be incompatible with future releases of resque-status.
There is a drawback: between the first job failing and the new job picking up the work, resque-status will keep reporting the status of the first job.
This is why I raise a new exception - otherwise the job status would show completed until the new worker picks up the old job, which may confuse processes that are watching and waiting for the job to finish.
By raising a new exception, DeferredToNewJob, the job status will temporarily show failure, which is easier to work around at the front end, and the specific exception can be automatically cleared from the resque failure queue.
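For reference, DeferredToNewJob is just a custom marker exception; its definition isn't shown above, but something like this one-liner is all that's needed:
# A marker meaning "not a real failure - the work was handed off to a new job".
class DeferredToNewJob < StandardError; end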
UPDATE
resque-status provides support for an on_failure handler. If a method with this name is defined as an instance method on the job class, we can make this even simpler.
Here's my on_failure:
def on_failure(e)
  if e.is_a? DeferredToNewJob
    tick('Waiting for new job')
  else
    raise e
  end
end
With this in place, the job spends essentially no time in the failed state for processes watching its status.
In addition, if resque-status finds this handler, it won't raise the exception up to resque, so the job won't get added to the failed queue.

Basic Sidekiq Questions about Idempotency and functions

I'm using Sidekiq to perform some heavy processing in the background. I looked online but couldn't find the answers to the following questions. I am using:
Class.delay.use_method(listing_id)
And then, inside the class, I have:
def self.use_method(listing_id)
  listing = Listing.find_by_id listing_id
  UserMailer.send_mail(listing)
  Class.call_example_function()
end
Two questions:
How do I make this method idempotent with respect to UserMailer.send_mail? In other words, if the delayed method runs twice, how do I make sure that it only sends the mail once? Would wrapping it in something like this work?
mail_sent = false
if !mail_sent
  UserMailer.send_mail(listing)
  mail_sent = true
end
I'm guessing not, since on a retry the method runs again from the top and mail_sent is reset to false. So how do I make it so that UserMailer is only run once?
Are functions called within the delayed async method also asynchronous? In other words, is Class.call_example_function() executed asynchronously (i.e. not part of the request/response cycle)? If not, should I use Class.delay.call_example_function()?
Overall, just getting familiar with Sidekiq so any thoughts would be appreciated.
Thanks
I'm coming into this late, but having been around the loop and seen this StackOverflow entry appear prominently via Google, I think it needs clarification.
The issue of idempotency and the issue of unique jobs are not the same thing. The 'unique' gems look at the parameters of a job at the point it is about to be processed. If they find that another job with the same parameters was submitted within some expiry time window, the job is not actually processed.
The gems are literally what they say they are; they consider whether an enqueued job is unique or not within a certain time window. They do not interfere with the retry mechanism. In the case of the O.P.'s question, the e-mail would still get sent twice if Class.call_example_function() threw an error thus causing a job retry, but the previous line of code had successfully sent the e-mail.
Aside: The sidekiq-unique-jobs gem mentioned in another answer has not been updated for Sidekiq 3 at the time of writing. An alternative is sidekiq-middleware which does much the same thing, but has been updated.
https://github.com/krasnoukhov/sidekiq-middleware
https://github.com/mhenrixon/sidekiq-unique-jobs (as previously mentioned)
There are numerous possible solutions to the O.P.'s email problem, and the right one is something only the O.P. can assess in the context of their application and execution environment. One option: if the e-mail is only ever going to be sent once ("Congratulations, you've signed up!"), then a simple flag on the User model, wrapped in a transaction, should do the trick. Assuming a User class accessible as an association through the Listing via listing.user, and a boolean flag mail_sent added to the User model (with a migration), then:
listing = Listing.find_by_id(listing_id)
unless listing.user.mail_sent?
  User.transaction do
    listing.user.mail_sent = true
    listing.user.save!
    UserMailer.send_mail(listing)
  end
end
Class.call_example_function()
...so that if the user mailer throws an exception, the transaction is rolled back and the change to the user's flag setting is undone. If the "call_example_function" code throws an exception, then the job fails and will be retried later, but the user's "e-mail sent" flag was successfully saved on the first try so the e-mail won't be resent.
Regarding idempotency, you can use the https://github.com/mhenrixon/sidekiq-unique-jobs gem:
All that is required is that you specifically set the sidekiq option for unique to true, like below:
sidekiq_options unique: true
For jobs scheduled in the future it is possible to set for how long the job should be unique. The job will be unique for the number of seconds configured or until the job has been completed.
If you want the unique job to stick around even after it has been successfully processed, then just set the unique_unlock_order to anything except :before_yield or :after_yield (unique_unlock_order = :never).
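Wired into the O.P.'s example, the option goes on a worker class, roughly like this (UseMethodWorker is a hypothetical name, and the exact option syntax varies between sidekiq-unique-jobs versions, so check the README for yours):
class UseMethodWorker
  include Sidekiq::Worker
  # Duplicate enqueues with the same arguments are dropped
  # while an identical job is already queued.
  sidekiq_options unique: true

  def perform(listing_id)
    listing = Listing.find_by_id(listing_id)
    UserMailer.send_mail(listing)
  end
end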
I'm not sure I understand the second part of the question. When you delay a method call, the whole method call is deferred to the sidekiq process. If by 'response / request cycle' you mean that you are running a web server and you call delay from there, then all the calls within use_method are made from the sidekiq process, and hence outside of that cycle. They are called synchronously relative to each other, though.
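To make the boundary concrete, a quick sketch; the comments mark which process runs each line:
Class.delay.use_method(listing_id) # web process: enqueues a job, returns immediately

def self.use_method(listing_id)
  listing = Listing.find_by_id listing_id
  UserMailer.send_mail(listing) # sidekiq process, runs first
  Class.call_example_function() # sidekiq process, runs after the mailer returns
  # Use Class.delay.call_example_function() here only if you want this step
  # enqueued as a separate, independent job.
end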

Ruby, can you call a method from inside the same method?

I'm trying to compare two files. If only one of them exists, a copy should be created first.
Is it then possible to re-call the method when using begin..rescue..end?
def differ
  begin
    file_today = read_file("/etc/hosts.deny")
    file_yesterday = read_file("/etc/hosts.deny_old")
    content = Diffy::Diff.new(file_yesterday, file_today)
  rescue
    copy_log
    differ # call itself?! O_o Well, after the copy has been created!
  end
  return content
end
It is "differ #call itself?!O_o Well, after the copy has been created!" that I cant get to work.
You can use the retry keyword in your rescue clause to restart it.
Edit: Here's some more information from the free edition of Programming Ruby:
"The redo statement causes a loop to repeat the current iteration. Sometimes, though, you need to wind the loop right back to the very beginning. The retry statement is just the ticket. retry restarts any kind of iterator loop. Retry will reevaluate any arguments to the iterator before restarting it."
Edit: I realized that this behavior for retry has been deprecated in 1.9. Just know that retry is usually used to re-execute a code block that raised an exception. Make sure you've fixed whatever caused the exception before you retry - otherwise you end up in an infinite loop!
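Applied to the question's method, a minimal sketch; the retries counter is an addition to guard against the infinite loop just mentioned:
def differ
  retries = 0
  begin
    file_today = read_file("/etc/hosts.deny")
    file_yesterday = read_file("/etc/hosts.deny_old")
    Diffy::Diff.new(file_yesterday, file_today)
  rescue
    copy_log # fix the cause first: create the missing copy
    retries += 1
    retry if retries < 2 # re-run the begin block once, then give up
    raise
  end
end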

Programmatic access to the Resque failed-job queue

How can I write code to go through the Resque failure queue and selectively delete jobs? Right now I've got a handful of important failures there, interspersed between thousands of failures from a runaway job that ran repeatedly. I want to delete the ones generated by the runaway job. The only API I'm familiar with is for enqueuing jobs. (I'll continue RTFMing, but I'm in a bit of a hurry.)
I ended up doing it like this:
# Loop over all failure indices, instantiating as needed
(Resque::Failure.count - 1).downto(0).each do |error_index_number|
  failure = Resque::Failure.all(error_index_number)
  # failure is the hash with all the data about the failed job;
  # perform any check you need here
  if failure["error"][/regex_identifying_runaway_job/].present?
    Resque::Failure.remove(error_index_number)
    # or
    # Resque::Failure.requeue(error_index_number)
  end
end
As @Winfield mentioned, having a look at Resque's failure backend is useful.
You can manually modify the Failure queue the way you're asking, but it might be better to write a custom Failure handler that delete/re-enqueues jobs as they fail.
You can find the base failure backend here and an implementation that logs failed jobs to the Hoptoad exception tracking service here.
For example:
module Resque
  module Failure
    class RemoveRunaways < Base
      def save
        i = 0
        while job = Resque::Failure.all(i)
          # Selectively remove all MyRunawayJobs from the failure queue whenever they fail
          if job.fetch('payload').fetch('class') == 'MyRunawayJob'
            remove(i)
          else
            i = i + 1
          end
        end
      end
    end
  end
end
EDIT: I forgot to mention how to register this backend to handle failures.
In your Resque initializer (e.g. config/initializers/resque.rb):
# Use Resque Multi failure handler: standard handler and your custom handler
Resque::Failure::Multiple.classes = [Resque::Failure::Redis, Resque::Failure::RemoveRunaways]
Resque::Failure.backend = Resque::Failure::Multiple
Example: removing with a boolean function
I used a higher-order function approach that evaluates whether each failure should be removed:
def remove_failures(should_remove_failure_func)
  (Resque::Failure.count - 1).downto(0).each do |i|
    failure = Resque::Failure.all(i)
    Resque::Failure.remove(i) if should_remove_failure_func.call(failure)
  end
end
def remove_failed_validation_jobs
  has_failed_for_validation_reason = ->(failure) do
    failure["error"] == "Validation failed: Example has already been taken"
  end
  remove_failures(has_failed_for_validation_reason)
end

Ruby - Error Handling - Good Practices

This is more of an opinion-oriented question about handling exceptions in nested code.
Assume you have a class that initializes another class to run a job. The job returns a value, which is then processed by the class that initially called it.
Where would you put the exception handling and error logging? Would you define it around the initialization of the job class in the calling class, which would then handle the exception from the job's execution, or on both levels?
If the job handles its own exceptions, then you don't need to wrap the call to the job in a begin/rescue.
But the class that initializes and runs the job could itself throw exceptions, so you should handle exceptions at that level as well.
Here is an example:
def some_job
  begin
    # a bunch of logic
  rescue
    # handle exception
    # log it
  end
end
It wouldn't make sense, then, to do this:
def some_manager
  begin
    some_job
  rescue
    # log
  end
end
But something like this makes more sense:
def some_manager
  begin
    # a bunch of logic
    some_job
    # some more logic
  rescue
    # handle exception
    # log
  end
end
And of course you would want to catch specific exceptions, as sketched below.
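For instance, a sketch of rescuing specific classes rather than everything; ActiveRecord::RecordNotFound is just an illustrative choice, and logger assumes a Rails-style logger is available:
def some_manager
  # a bunch of logic
  some_job
  # some more logic
rescue ActiveRecord::RecordNotFound => e
  # the one failure mode we expect and can recover from
  logger.warn("record missing: #{e.message}")
rescue StandardError => e
  # log anything unexpected, then let it propagate
  logger.error(e.message)
  raise
end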
Probably the best answer, in general, for handling Exceptions in Ruby is reading Exceptional Ruby. It may change your perspective on error handling.
Having said that, to your specific case: when I hear "job" I think "background process", so I'll base my answer on that.
Your job will want to report status while it's doing its thing. This could be states like "in queue", "running", "finished", but it could also be more informative (user-facing) information: "processing first 100 out of 1000 records".
So, if an error happens in your background process, my suggestion is two-fold:
Make sure you catch exceptions before you exit the job. Your background job processor might not like a random exception coming from your code. I personally like the idea of catching the exception and saving it to the database for easy retrieval later (see the sketch after this list). Then again, depending on your background job processor, maybe it handles error reporting for you. (I think Resque does, for example.)
On the front end, use AJAX (or something similar) to check in occasionally on how the job is doing, say every 10 seconds. In addition to getting the status of the job, make sure you return this extra information to the user (if appropriate).
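A sketch of the first point, catching the exception and saving it to the database; JobStatus is a hypothetical ActiveRecord model with status and error_message columns:
def perform(job_status_id)
  status = JobStatus.find(job_status_id)
  begin
    status.update!(status: 'running')
    # ... the actual work ...
    status.update!(status: 'finished')
  rescue StandardError => e
    # Persist the failure so the front end's polling can surface it,
    # instead of letting a raw exception escape to the job processor.
    status.update!(status: 'failed', error_message: e.message)
  end
end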
