I am using a RubyGem (DeathByCaptcha) that makes HTTP calls to deathbycaptcha.com. Every so often the HTTP request times out or fails for some other unknown reason, and my Ruby script exits with an exception. I am trying to automate repeated calls to this method ("decode"), and I want to know whether there is a way to prevent an error in this method from exiting the whole script.
EDIT: Since I'm bound to get flamed on here, I will mention upfront that the purpose of this is to determine the effectiveness of different captcha options on my website's registration page with common captcha-breakers, because I have had problems with spam signups.
Here is how to prevent the exception from exiting the script.
tries = 0
begin
  # risky failing code
rescue
  sleep(1) # sleep n seconds
  tries += 1
  retry if tries <= 3 # retry the risky code again
end
You would need to catch the exception that is raised and somehow handle it.
You are looking for something like
begin
  # Send HTTP request
rescue WhateverExceptionClassYouGet => error
  # Do something with the error
end
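The two answers above can be combined: rescue only the exception class you expect, and cap the number of retries. The following is a minimal sketch, not the gem's own API. FlakyError is a hypothetical stand-in for whatever class the gem actually raises, and with_retries is a helper name made up for illustration.

```ruby
# Hypothetical stand-in for the exception class the gem actually raises.
class FlakyError < StandardError; end

# Runs the block, retrying when an exception of class `on` is raised.
# The block is attempted at most max_tries times; the last error is
# re-raised once the attempts are used up.
def with_retries(max_tries: 3, wait: 1, on: StandardError)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue on
    raise if tries >= max_tries
    sleep(wait)
    retry
  end
end

# Example: the block fails twice, then succeeds on the third attempt.
result = with_retries(max_tries: 3, wait: 0, on: FlakyError) do |attempt|
  raise FlakyError, "timed out" if attempt < 3
  "decoded on attempt #{attempt}"
end
puts result
```

Re-raising after the last attempt keeps a persistent failure visible to the caller, who can then decide whether to skip that item or stop.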
I'm using rspotify to gather a list of data from album names. Along the way I've hit Spotify's API rate limit, and I would now like to add a few fallbacks that wait until I can search again and then retry the search, so that I don't lose the (x) amount of data I've already retrieved.
The gem uses RestClient, but unfortunately when I reach the rate limit I don't get the amount of time I need to wait before I can make another call:
.rvm/gems/ruby-2.5.1/gems/rest-client 2.0.2/lib/restclient/abstract_response.rb:223:in 'exception_with_response': 429 Too Many Requests (RestClient::TooManyRequests)
The above is all I'm given. The begin/rescue statement below doesn't work as when the code fails, it fails entirely without retrying. What am I doing wrong here?
begin
  search = RSpotify::Album.search(album[:title])
rescue RestClient::ExceptionWithResponse, RestClient::TooManyRequests, Exception => e
  puts e
  retry
rescue e
  puts e
  retry
end
Here is how they suggest error handling:
https://github.com/rest-client/rest-client#response-callbacks-error-handling
I was thinking of working around the limit: inside the rescue block, call RSpotify::authenticate("id", "token") with one of the other Spotify accounts that I have, and then retry. Put together, something like this:
begin
  album = RSpotify::Album.find(track.first.album.id)
rescue RestClient::ExceptionWithResponse, RestClient::TooManyRequests, Exception => e
  RSpotify::authenticate("id2", "token2")
  retry
end
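An unconditional retry can loop forever while the limit is still in force, so one alternative is to wait a little longer after each failure and give up after a fixed number of attempts. The following is a rough sketch of capped exponential backoff. It is not rspotify's or RestClient's API: RateLimited is a hypothetical stand-in for RestClient::TooManyRequests so the code runs without the gem, and backoff_delay, with_backoff, and the sleeper parameter are names made up for illustration.

```ruby
# Hypothetical stand-in for RestClient::TooManyRequests.
class RateLimited < StandardError; end

# Delay before retrying after the given attempt (1-based):
# base, 2 * base, 4 * base, ... capped at max_delay.
def backoff_delay(attempt, base: 1.0, max_delay: 60.0)
  [base * (2**(attempt - 1)), max_delay].min
end

# Runs the block, waiting with exponential backoff between attempts.
# `sleeper` is injectable so the waiting can be observed or skipped.
def with_backoff(max_attempts: 5, base: 1.0, max_delay: 60.0, sleeper: method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield
  rescue RateLimited
    raise if attempt >= max_attempts
    sleeper.call(backoff_delay(attempt, base: base, max_delay: max_delay))
    retry
  end
end

# Example: the call is rate limited three times before succeeding.
# The delays are recorded instead of actually sleeping.
delays = []
calls = 0
result = with_backoff(sleeper: ->(seconds) { delays << seconds }) do
  calls += 1
  raise RateLimited, "429 Too Many Requests" if calls < 4
  "ok"
end
```

In real use the rescue clause would name the gem's exception class, and the default sleeper would actually pause the script between attempts.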
Bugsnag reports that from time to time an IO::EAGAINWaitReadable exception is raised in production.
IO::EAGAINWaitReadable: Resource temporarily unavailable - read would block
The exception is raised on HTTP request via HTTParty, ultimately leading to net/protocol.rb:153:in read_nonblock in Ruby 2.1.3.
Why do I get IO::EAGAINWaitReadable? Why are HTTP requests sometimes blocking? And why not just let them block? What's the deal?
The most general way to handle IO::EAGAINWaitReadable is:
begin
  result = io.read_nonblock(maxlen)
rescue IO::EAGAINWaitReadable
  IO.select([io])
  retry
end
You can retry without calling IO.select first, but it is better to wait with select, as shown in the example, so that the loop does not spin while no data is available. You can also look at the SO answer on how to rescue IO::WaitReadable in addition to the exception named above.
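To see the pattern run end to end, here is a small self-contained sketch that uses a pipe instead of an HTTP socket. It rescues IO::WaitReadable, the module that IO::EAGAINWaitReadable includes, and passes a timeout to IO.select so the wait cannot hang forever. The names read_when_ready and ReadTimeout are made up for illustration.

```ruby
# Hypothetical error raised when no data arrives within the timeout.
class ReadTimeout < StandardError; end

# Reads up to maxlen bytes without blocking. If nothing is available yet,
# waits with IO.select until the IO is readable (or the timeout expires)
# and then tries again.
def read_when_ready(io, maxlen, timeout: 5)
  io.read_nonblock(maxlen)
rescue IO::WaitReadable
  ready = IO.select([io], nil, nil, timeout)
  raise ReadTimeout, "no data after #{timeout}s" if ready.nil?
  retry
end

# Example: the pipe is empty at first; data shows up a moment later.
reader, writer = IO.pipe
Thread.new do
  sleep 0.1
  writer.write("hello")
  writer.close
end
data = read_when_ready(reader, 1024)
puts data
```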
I am trying to make calls to an external API. I handle four or more exceptions for the call.
If I make multiple calls, the code grows very quickly. Should I be writing a wrapper for each such call which handles the exceptions and returns data?
Here is an example of such code (this is not mine). The call to user_search is followed by the exception handling.
Note: I am not using Rails
begin
  #twitter = Twitter.user_search(name)
rescue Twitter::Unauthorized
  puts "Not authorized. Please check the Twitter credentials at the top of the script."
  break
rescue Twitter::BadRequest => e
  puts "Hit rate limit. Continuing scraping at #{e.ratelimit_reset}"
  sleep e.retry_after
  retry
rescue Exception => e
  puts "Something else went wrong:"
  puts e.message
end
I've changed the title of the question. I think the issue is more how to handle long exception handling code. In the example code suppose I have multiple calls to the twitter API followed by exception handling, it seems like the exception handling code disrupts reading the code which does the actual work.
Write your exception handler around a chunk of code so that, if the first line in the block fails, you're happy skipping all the code up until the last line in the block.
If an exception invalidates the whole rest of the method that the handler appears in, consider letting the exception bubble up to the next layer. Not everything necessarily has to be caught by your method.
Often I find myself writing exception handlers around single lines of code (with suitable recovery code) but it's not a rule.
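If the same rescue clauses would otherwise be repeated around every API call, one option is a single wrapper method that takes a block, so each call site stays on one line. This is only a sketch under assumptions: ApiUnauthorized and ApiRateLimited are hypothetical classes standing in for the client library's own exceptions, and api_call is a name made up for illustration.

```ruby
# Hypothetical error classes standing in for the API client's own.
class ApiUnauthorized < StandardError; end

class ApiRateLimited < StandardError
  attr_reader :retry_after

  def initialize(msg = "rate limited", retry_after: 0)
    super(msg)
    @retry_after = retry_after
  end
end

# Runs the block with shared error handling. Returns the block's value,
# or nil if the call could not be completed.
def api_call(max_retries: 3)
  retries = 0
  begin
    yield
  rescue ApiUnauthorized
    warn "Not authorized. Please check the credentials."
    nil
  rescue ApiRateLimited => e
    retries += 1
    if retries <= max_retries
      sleep e.retry_after
      retry
    end
    warn "Still rate limited after #{max_retries} retries."
    nil
  rescue StandardError => e
    warn "Something else went wrong: #{e.message}"
    nil
  end
end

# Example: the first attempt is rate limited, the second succeeds.
attempts = 0
users = api_call do
  attempts += 1
  raise ApiRateLimited.new(retry_after: 0) if attempts == 1
  ["alice", "bob"]
end
```

Returning nil on failure is a design choice for the sketch; letting some errors bubble up, as suggested above, may suit the calling code better.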
This is more of an opinion-oriented question about handling exceptions in nested code.
Assume you have a class that initializes another class to run a job. The job returns a value, which is then processed by the class that initially called it.
Where would you put the exception handling and error logging? Would you put it around the initialization of the job class in the calling class, which would then handle exceptions raised during job execution, or at both levels?
If the job handles its own exceptions, then you don't need to wrap the call to the job in a begin/rescue block.
But the class that initializes and runs the job could raise exceptions itself, so you should handle exceptions at that level as well.
Here is an example:
def some_job
  begin
    # a bunch of logic
  rescue
    # handle exception
    # log it
  end
end
It wouldn't make sense then to do this:
def some_manager
  begin
    some_job
  rescue
    # log
  end
end
But something like this makes more sense:
def some_manager
  begin
    # a bunch of logic
    some_job
    # some more logic
  rescue
    # handle exception
    # log
  end
end
And of course you would want to catch specific exceptions.
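As a variation on the examples above, and one way to catch specific exceptions, the job can translate low-level errors into a single domain error, and the manager can rescue only that class and log it once. This is a sketch, not a recommendation for every design: JobError and the doubling "job" are made up for illustration, and the logger writes to a StringIO so the output can be inspected.

```ruby
require "logger"
require "stringio"

# Hypothetical domain error raised by the job.
class JobError < StandardError; end

LOG_OUTPUT = StringIO.new
LOGGER = Logger.new(LOG_OUTPUT)

# The job rescues only the errors it knows how to describe and
# re-raises them as JobError; anything else propagates untouched.
def some_job(input)
  Integer(input) * 2
rescue ArgumentError, TypeError => e
  raise JobError, "bad input #{input.inspect}: #{e.message}"
end

# The manager rescues only JobError, logs it once, and returns nil.
def some_manager(inputs)
  inputs.map { |input| some_job(input) }.sum
rescue JobError => e
  LOGGER.error(e.message)
  nil
end

total = some_manager(["1", "2"])   # both inputs parse, so this is 2 + 4
failed = some_manager(["1", "x"])  # "x" cannot be parsed, so this is nil
```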
Probably the best answer, in general, for handling Exceptions in Ruby is reading Exceptional Ruby. It may change your perspective on error handling.
Having said that, on to your specific case. When I hear "job" I hear "background process", so I'll base my answer on that.
Your job will want to report status while it's doing its thing. This could be states like "in queue", "running", "finished", but it could also be more informative (user-facing) information: "processing first 100 out of 1000 records".
So, if an error happens in your background process, my suggestion is two-fold:
Make sure you catch exceptions before you exit the job. Your background job processor might not like a random exception coming from your code. I, personally, like the idea of catching the exception and saving it to the database, for easy retrieval later. Then again, depending on your background job processor, maybe it handles error reporting for you. (I think Resque does, for example.)
On the front end, use AJAX (or something) to occasionally check in on how the job is doing, say every 10 seconds or so. In addition to getting the status of the job, also make sure you return this additional information to the user (if appropriate).
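The two suggestions above can be sketched in a few lines. This is an illustration under assumptions, not a real job processor: JOB_STORE is an in-memory hash standing in for a database table, and run_job is a made-up name. The job records its progress as it goes, and if anything goes wrong it saves the error instead of letting the exception escape.

```ruby
# In-memory hash standing in for a database table of job records.
JOB_STORE = {}

# Processes each record with the given block, updating the job's
# status and progress. Errors are saved to the store, not re-raised.
def run_job(id, records)
  JOB_STORE[id] = { status: "running", progress: "0 of #{records.size}", error: nil }
  records.each_with_index do |record, index|
    yield record
    JOB_STORE[id][:progress] = "#{index + 1} of #{records.size}"
  end
  JOB_STORE[id][:status] = "finished"
rescue StandardError => e
  JOB_STORE[id][:status] = "failed"
  JOB_STORE[id][:error] = "#{e.class}: #{e.message}"
end

run_job(1, [1, 2, 3]) { |n| n * 2 }
run_job(2, [1, 0]) { |n| 10 / n } # the second record divides by zero

# A status endpoint polled by the front end would read these records.
puts JOB_STORE[1]
puts JOB_STORE[2]
```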
I have an HTTP client written in Ruby that can make synchronous requests to URLs. However, to quickly execute multiple requests I decided to use EventMachine. The idea is to queue all the requests and execute them using EventMachine.
class EventMachineBackend
  ...
  ...
  def execute(request)
    $q ||= EM::Queue.new
    $q.push(request)
    $q.pop { |request| request.invoke }
    EM.run { EM.next_tick { EM.stop } }
  end
  ...
end
Forgive my use of a global queue variable. I will refactor it later. Is what I am doing in EventMachineBackend#execute the right way of using Eventmachine queues?
One problem I see in my implementation is it is essentially synchronous. I push a request, pop and execute the request and wait for it to complete.
Could anyone suggest a better implementation?
Your request logic has to be asynchronous for it to work with EventMachine, so I suggest that you use em-http-request. You can find an example of how to use it here; it shows how to run the requests in parallel. An even better interface for running multiple connections in parallel is the MultiRequest class from the same gem.
If you want to queue requests and only run a fixed number of them in parallel you can do something like this:
EM.run do
  urls = [...] # regular array with URLs
  active_requests = 0
  # declare launch_next up front so the when_done block below sees it
  # as a local variable; it is assigned its real value further down
  launch_next = nil
  # this routine will be used as callback and will
  # be run when each request finishes
  when_done = proc do
    active_requests -= 1
    if urls.empty? && active_requests == 0
      # if there are no more urls, and there are no active
      # requests it means we're done, so shut down the reactor
      EM.stop
    elsif !urls.empty?
      # if there are more urls launch a new request
      launch_next.call
    end
  end
  # this routine launches a request
  launch_next = proc do
    # get the next url to fetch
    url = urls.pop
    # launch the request, and register the callback
    request = EM::HttpRequest.new(url).get
    request.callback(&when_done)
    request.errback(&when_done)
    # increment the number of active requests, this
    # is important since it will tell us when all requests
    # are done
    active_requests += 1
  end
  # launch three requests in parallel, each will launch
  # a new request when done, so there will always be
  # three requests active at any one time, unless there
  # are no more urls to fetch
  3.times do
    launch_next.call
  end
end
Caveat emptor, there may very well be some detail I've missed in the code above.
If you think it's hard to follow the logic in my example, welcome to the world of evented programming. It's really tricky to write readable evented code. It all goes backwards. Sometimes it helps to start reading from the end.
I've assumed that you don't want to add more requests after you've started downloading, since it doesn't look like it from the code in your question. Should you want to, you can rewrite my code to use an EM::Queue instead of a regular array and remove the part that calls EM.stop, since you will not be stopping. You can probably remove the code that keeps track of the number of active requests too, since it's no longer relevant. The important part would look something like this:
launch_next = proc do
  urls.pop do |url|
    request = EM::HttpRequest.new(url).get
    request.callback(&launch_next)
    request.errback(&launch_next)
  end
end
Also, bear in mind that my code doesn't actually do anything with the response. The response will be passed as an argument to the when_done routine (in the first example). I also do the same thing for success and error, which you may not want to do in a real application.