Skip if it doesn't respond in a certain amount of time - Ruby

I've created a program that pulls websites off of Google and then strips them down to their base URL, e.g. http://google.com/search/owie/weikw => http://google.com. It then saves these to a file.
After that, it runs .each_line on the file and runs a whois command for each line. If the command doesn't respond within a certain amount of time, I want to skip that line of the file and go to the next one. Is there a way I can do this?

Use the Timeout Module
If your scraper or whois library doesn't support timeouts natively, you can use Timeout::timeout to set an upper bound in seconds. For example:
require 'timeout'

MAX_SECONDS = 10

begin
  Timeout::timeout(MAX_SECONDS) do
    # run your whois
  end
rescue Timeout::Error
  # handle the exception
end
By default, this will raise a Timeout::Error exception if the block exceeds the time limit, but you can pass a different exception class as the second argument if you prefer. How you handle the exception is then up to you.
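Applied to the question's whois loop, a minimal sketch (assuming the lookup shells out to the system whois command and the URLs live in a file named urls.txt; both are placeholders):

require 'timeout'

MAX_SECONDS = 10

File.foreach('urls.txt') do |line|
  url = line.strip
  begin
    Timeout::timeout(MAX_SECONDS) do
      puts `whois #{url}` # shell out to the system whois; swap in your own lookup
    end
  rescue Timeout::Error
    next # took too long: skip this line and move on to the next one
  end
end

Note that Timeout raises in the calling thread but will not necessarily kill a child process spawned by the backticks, so a lingering whois process may need separate cleanup.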

Related

Error handling API calls via RestClient

I'm using rspotify to gather album data from a list of album names. Along the way I've hit Spotify's API rate limit, and I would now like to add a few fallbacks that wait until I can search again and then retry, so I don't lose the data I've already retrieved.
The gem uses RestClient, but unfortunately when I reach the rate limit I'm not told how long I need to wait before I can make another call:
.rvm/gems/ruby-2.5.1/gems/rest-client-2.0.2/lib/restclient/abstract_response.rb:223:in `exception_with_response': 429 Too Many Requests (RestClient::TooManyRequests)
The above is all I'm given. The begin/rescue statement below doesn't work: when the code fails, it fails entirely, without retrying. What am I doing wrong here?
begin
  search = RSpotify::Album.search(album[:title])
rescue RestClient::ExceptionWithResponse, RestClient::TooManyRequests, Exception => e
  puts e
  retry
rescue e
  puts e
  retry
end
Here is how they suggest handling errors:
https://github.com/rest-client/rest-client#response-callbacks-error-handling
I was thinking of throttling: inside the rescue, call RSpotify::authenticate("id", "token") with one of the several Spotify accounts that I have, and then retry. So, all put together, something like this:
begin
  album = RSpotify::Album.find(track.first.album.id)
rescue RestClient::ExceptionWithResponse, RestClient::TooManyRequests, Exception => e
  RSpotify::authenticate("id2", "token2")
  retry
end
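As an aside, rescuing Exception also swallows things like interrupts, and the second rescue e clause matches exceptions of class e rather than binding a variable, so it never behaves as intended. A sketch of the re-authenticate-and-retry idea with a retry cap and a wait derived from the 429 response (assuming Spotify sends a Retry-After header; the credentials, cap, and 30-second fallback are placeholders):

tries = 0
begin
  album = RSpotify::Album.find(track.first.album.id)
rescue RestClient::TooManyRequests => e
  tries += 1
  raise if tries > 3 # give up after a few attempts instead of looping forever
  wait = (e.response.headers[:retry_after] || 30).to_i # fall back to 30s if the header is absent
  sleep(wait)
  RSpotify::authenticate("id2", "token2") # placeholder credentials for a second account
  retry
end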

Adding Retry mechanism to Watir in case of Timeout

I have a series of scripts that I have developed using Ruby and the Watir gem. They are wrapped by Spinach, but that is beside the point of what I am about to ask.
The intent of those scripts is to do some functional spot check or simply alleviate some very repetitive tasks.
They have been running well for a while, but lately I've started to see a lot of failures due to timeouts between Chromedriver/Geckodriver (I tried both browsers) and the scripts. Of course, I could simply restart the script, but when the success rate drops below 70% it really starts to be aggravating.
What I ended up doing is wrapping all of my Watir calls in a proc with a begin/rescue that retries in case of a timeout.
This is ugly and violates so many rules that I am almost ashamed to have resorted to it, but at least my scripts now run to completion.
Here is how I worked around the issue:
# takes a block and wraps it in a series of rescues
def execute_block_and_retry_if_needed
  yield
rescue Net::ReadTimeout
  puts 'Read Timeout detected, retrying operation'
  retry
rescue Net::HTTPRequestTimeOut
  puts 'HTTP Request Timeout detected, retrying operation'
  retry
rescue Errno::ETIMEDOUT
  puts 'Errno::ETIMEDOUT detected, retrying operation'
  retry
end
A sample use would look like this:
execute_block_and_retry_if_needed { @browser.link(name: 'OK').wait_until_present.click } # click the 'OK' button
As you can see, this clearly violates the DRY principle, as I need to call this method every single time.
My question is: how can I make this a module/feature of Watir so that it is picked up automatically? (Ideally I would add a maximum number of retries to prevent an infinite loop.)
Version information:
- Chromedriver => 2.29.461585
- GeckoDriver => 0.16.1
- Firefox => ESR 52
- Chrome => 58
- Watir => 6.2.1
Regarding the DRY comment: I meant that I had to wrap ALL of my Watir calls in the proc, sorry if that wasn't clear.
execute_block_and_retry_if_needed { @browser.link(name: 'User').wait_until_present.click } # click the 'Edit' button
execute_block_and_retry_if_needed { @browser.link(name: 'Cancel').wait_until_present.click } # click the 'Cancel' button
execute_block_and_retry_if_needed { @browser.link(name: 'OK').wait_until_present.click } # click the 'OK' button
The above is just an example that has to happen if I want to use the retry mechanism.
Given that you want to retry every command sent to the browser, you might want to consider addressing the issue in the underlying Selenium-WebDriver rather than in Watir. Watir commands are sent to Selenium-WebDriver, which in turn sends them to the browser/driver.
Each command (or at least most) currently goes through Selenium::WebDriver::Remote::Http::Default#request. You could patch that method to wrap it in a retry. Not only would your clicks retry on timeouts, but so would every other command - e.g. navigation, setting fields, getting values, etc.
# Patch to retry timeouts during requests
require 'watir'

module Selenium
  module WebDriver
    module Remote
      module Http
        module DefaultExt
          def request(*args)
            tries ||= 3
            super
          rescue Net::ReadTimeout, Net::HTTPRequestTimeOut, Errno::ETIMEDOUT => ex
            puts "#{ex.class} detected, retrying operation"
            (tries -= 1).zero? ? raise : retry
          end
        end
      end
    end
  end
end

Selenium::WebDriver::Remote::Http::Default.prepend(Selenium::WebDriver::Remote::Http::DefaultExt)
# Then you can use Watir as usual
browser = Watir::Browser.new :chrome # this will retry timeouts
browser.goto('http://www.example.com') # this will also retry timeouts
browser.link.click # this will also retry timeouts
You shouldn't need to use a block for this. You can implement a method that does something like:
def ensure_click(element, retries = 3)
  @retries ||= retries
  element.click
rescue Net::ReadTimeout, Net::HTTPRequestTimeOut, Errno::ETIMEDOUT => ex
  raise unless @retries > 0
  @retries -= 1
  puts "#{ex.class} detected, retrying"
  retry
end
...
ensure_click(@browser.link(name: 'User'))
...
That being said, those exceptions are not typically driver errors, but network issues of some sort. They are not normal.

Prevent exceptions in Rubygem that cannot communicate with its associated web service?

I am using a RubyGem (DeathByCaptcha) that makes HTTP calls to deathbycaptcha.com. Every so often the HTTP request times out or fails for some other unknown reason, and my Ruby script exits with an exception. I am automating repeated calls to one of its methods ("decode"), and I am trying to determine whether there is a way to prevent an error in this method from exiting the whole script.
EDIT: Since I'm bound to get flamed on here, I will mention upfront that the purpose of this is to determine the effectiveness of different captcha options on my website's registration page with common captcha-breakers, because I have had problems with spam signups.
Here is how to prevent the exception from exiting the script.
tries = 0
begin
  # risky failing code
rescue
  sleep(1) # sleep n seconds
  tries += 1
  retry if tries <= 3 # retry the risky code again
end
You would need to catch the exception that is raised and somehow handle it.
You are looking for something like
begin
  # Send HTTP request
rescue WhateverExceptionClassYouGet => error
  # Do something with the error
end
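Putting both answers together for the decode case, a hedged sketch (the client object, its decode method, and the 30-second cap are assumptions for illustration, not the gem's documented API):

require 'timeout'

def decode_with_retries(client, captcha, max_tries = 3)
  tries = 0
  begin
    # cap each attempt in case the HTTP call hangs indefinitely
    Timeout::timeout(30) { client.decode(captcha) }
  rescue StandardError => e
    tries += 1
    return nil if tries >= max_tries # give up; the caller can skip this captcha
    puts "attempt #{tries} failed: #{e.class}"
    sleep(1)
    retry
  end
end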

Ruby EventMachine queueing problem

I have an HTTP client written in Ruby that can make synchronous requests to URLs. However, to execute multiple requests quickly, I decided to use EventMachine. The idea is to queue all the requests and execute them using EventMachine.
class EventMachineBackend
  ...
  ...
  def execute(request)
    $q ||= EM::Queue.new
    $q.push(request)
    $q.pop { |request| request.invoke }
    EM.run { EM.next_tick { EM.stop } }
  end
  ...
end
Forgive my use of a global queue variable; I will refactor it later. Is what I am doing in EventMachineBackend#execute the right way of using EventMachine queues?
One problem I see in my implementation is that it is essentially synchronous: I push a request, pop and execute it, and wait for it to complete.
Could anyone suggest a better implementation?
Your request logic has to be asynchronous for it to work with EventMachine; I suggest that you use em-http-request. You can find an example of how to use it here; it shows how to run requests in parallel. An even better interface for running multiple connections in parallel is the MultiRequest class from the same gem.
If you want to queue requests and only run a fixed number of them in parallel you can do something like this:
EM.run do
  urls = [...] # regular array with URLs
  active_requests = 0
  launch_next = nil # declared up front so the when_done proc can see it

  # this routine will be used as the callback and will
  # be run when each request finishes
  when_done = proc do
    active_requests -= 1
    if urls.empty? && active_requests == 0
      # if there are no more URLs and no active requests,
      # it means we're done, so shut down the reactor
      EM.stop
    elsif !urls.empty?
      # if there are more URLs, launch a new request
      launch_next.call
    end
  end

  # this routine launches a request
  launch_next = proc do
    # get the next URL to fetch
    url = urls.pop
    # launch the request, and register the callbacks
    request = EM::HttpRequest.new(url).get
    request.callback(&when_done)
    request.errback(&when_done)
    # increment the number of active requests; this
    # is important since it will tell us when all
    # requests are done
    active_requests += 1
  end

  # launch three requests in parallel; each will launch
  # a new request when done, so there will always be
  # three requests active at any one time, unless there
  # are no more URLs to fetch
  3.times do
    launch_next.call
  end
end
Caveat emptor, there may very well be some detail I've missed in the code above.
If you think it's hard to follow the logic in my example, welcome to the world of evented programming. It's really tricky to write readable evented code. It all goes backwards. Sometimes it helps to start reading from the end.
I've assumed that you don't want to add more requests after you've started downloading; it doesn't look like it from the code in your question. Should you want to, though, you can rewrite my code to use an EM::Queue instead of a regular array and remove the part that does EM.stop, since you will not be stopping. You can probably also remove the code that keeps track of the number of active requests, since that's no longer relevant. The important part would look something like this:
launch_next = proc do
  urls.pop do |url|
    request = EM::HttpRequest.new(url).get
    request.callback(&launch_next)
    request.errback(&launch_next)
  end
end
Also, bear in mind that my code doesn't actually do anything with the response. The response will be passed as an argument to the when_done routine (in the first example). I also do the same thing for success and error, which you may not want to do in a real application.
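For reference, a minimal sketch of reading the response inside the callback (em-http-request yields the client object, whose response_header and response accessors hold the status and body):

when_done = proc do |http|
  puts http.response_header.status # HTTP status code of the finished request
  puts http.response               # response body
  # ...then the same bookkeeping as in the first example
end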

'Who's online?' Ruby Network Program

I have several embedded Linux systems, and I want to write a 'Who's Online?' network service for them in Ruby. Below is the relevant part of my code:
mySocket = UDPSocket.new
mySocket.bind("<broadcast>", 50050)

loop do
  begin
    text, sender = mySocket.recvfrom(1024)
    puts text
    if text =~ /KNOCK KNOCK/ then
      begin
        sock = UDPSocket.open
        sock.send(r.ipaddress, 0, sender[3], 50051)
        sock.close
      rescue
        retry
      end
    end
  rescue Exception => inLoopEx
    puts inLoopEx.message
    puts inLoopEx.backtrace.inspect
    retry
  end
end
I send the 'KNOCK KNOCK' command from a PC. The problem is that since all the systems receive the message at the same time, they all try to respond at the same time too, which causes a Broken Pipe exception (hence the 'rescue retry' code). This works OK sometimes, but other times the rescue/retry (triggered by the Broken Pipe exception from sock.send) causes one or more systems to respond only after 5 seconds or so.
Is there a better way of doing this, since I assume I can't escape the Broken Pipe exception?
I have found that the exception was caused by the 'r.ipaddress' part of the send command, which is related to my embedded system's internals.
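If simultaneous replies to a broadcast ever do become a problem, one common mitigation is to stagger the responses with a short random delay before sending (a sketch; the 0-2 second jitter range is arbitrary and my_address is a placeholder for this host's IP string):

if text =~ /KNOCK KNOCK/
  sleep(rand * 2.0) # random jitter so every node doesn't reply at the same instant
  sock = UDPSocket.open
  sock.send(my_address, 0, sender[3], 50051) # my_address: placeholder for this host's IP
  sock.close
end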
