Catching timeout errors with Ruby Mechanize

I have a Mechanize function to log me out of a site, but on very rare occasions it times out. The function goes to a specific page and then clicks a logout button. On the rare occasions when Mechanize times out, either while loading the logout page or while clicking the logout button, the code crashes. So I put in a small rescue, and it seems to be working, as shown in the second piece of code below.
def logmeout(agent)
page = agent.get('http://www.example.com/')
agent.click(page.link_with(:text => /Log Out/i))
end
Logmeout with rescue:
def logmeout(agent)
begin
page = agent.get('http://www.example.com/')
agent.click(page.link_with(:text => /Log Out/i))
rescue Timeout::Error
puts "Timeout!"
retry
end
end
Assuming I understand rescue correctly, retry will do both actions over even if just the click timed out, so, in an effort to be efficient, I was wondering if I could use a proc in this situation and pass it a code block. Would something like this work:
def trythreetimes
tries = 0
begin
yield
rescue
tries += 1
puts "Trying again!"
retry if tries <= 3
end
end
def logmeout(agent)
trythreetimes {page = agent.get('http://www.example.com/')}
trythreetimes {agent.click(page.link_with(:text => /Log Out/i))}
end
Note in my trythreetimes function I left it as generic rescue so the function would be more re-usable.
Thanks so much for any help anyone can provide, I realize there are a couple different questions in here but they are all things I am trying to learn!

Instead of retrying timeouts on some Mechanize requests, I think you'd be better off setting the Mechanize::HTTP::Agent read_timeout attribute to a reasonable number of seconds, such as 2 or 5, or whatever value prevents timeout errors for this request.
Also, it seems that your logout procedure only requires a simple HTTP GET request. I mean there is no form to fill in, so no HTTP POST request.
So if I were you, I would inspect the page source (Ctrl+U in Firefox or Chrome) to identify the URL that your agent.click(page.link_with(:text => /Log Out/i)) call reaches, and request that URL directly.
It should be faster because these types of pages are usually blank, so Mechanize will not have to load a full HTML page into memory.
Here is the code I would prefer to use:
def logmeout(agent)
begin
agent.read_timeout=2 #set the agent time out
page = agent.get('http://www.example.com/logout_url.php')
agent.history.pop() #delete this request in the history
rescue Timeout::Error
puts "Timeout!"
puts "read_timeout attribute is set to #{agent.read_timeout}s" if !agent.read_timeout.nil?
# retry # retry is no longer needed
end
end
but you can use your retry function too :
def trythreetimes
tries = 0
begin
yield
rescue StandardError => e # StandardError, so Interrupt and SystemExit still propagate
tries += 1
puts "Error: #{e.message}"
puts "Trying again!" if tries <= 3
retry if tries <= 3
puts "No more attempts!"
end
end
def logmeout(agent)
trythreetimes do
agent.read_timeout=2 #set the agent time out
page = agent.get('http://www.example.com/logout_url.php')
agent.history.pop() #delete this request in the history
end
end
hope it helps ! ;-)
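One detail about the two-block version in the question: a local variable first assigned inside a block is not visible outside that block, so `page` set in the first `trythreetimes` call would not exist in the second. Below is a minimal sketch of one way around that, assuming a hypothetical helper named `with_retries` (not part of Mechanize) that returns the block's value and retries only the exception classes it is given.

```ruby
require 'timeout'

# Hypothetical helper: runs the block up to max_tries times, retrying only the
# listed exception classes, and returns the block's value on success.
def with_retries(max_tries: 3, on: [Timeout::Error])
  tries = 0
  begin
    yield
  rescue *on => e
    tries += 1
    raise if tries >= max_tries # out of attempts: re-raise the last error
    puts "Attempt #{tries} failed (#{e.class}), trying again"
    retry
  end
end

# Each step is retried on its own; the page is handed on via the return value.
def logmeout(agent)
  page = with_retries { agent.get('http://www.example.com/') }
  with_retries { agent.click(page.link_with(:text => /Log Out/i)) }
end
```

Because the helper re-raises once the attempts are used up, the caller still finds out when the logout never succeeded, rather than the failure being swallowed.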

Using mechanize 1.0.0 I got this problem from a different source of error.
In my case I was blocked by proxy and then SSL. This worked for me:
ag = Mechanize.new
ag.set_proxy('yourproxy', yourport)
ag.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
ag.get( url )

Related

Browsermob Proxy + Watir not capturing traffic continuously

I have the BrowserMob Proxy set up correctly with Watir, and it is capturing traffic and saving the HAR file; however, it is not capturing the traffic continuously. Here is what I'm trying to achieve:
1. Go to the homepage.
2. Click on a link to go to another page where I need to wait for some events to happen.
3. Once on the second page, start capturing traffic after the event happens, wait for a specific call to occur, and capture its contents.
What I'm noticing, however, is that it follows all of the above steps, but on step 3 the proxy stops capturing traffic before that call is even made on that page. The HAR that is returned doesn't have that call in it, so the test fails before it even does its job. The code looks like this:
class BMP
attr_accessor :server, :proxy, :net_har, :sel_proxy
def initialize
bm_path = File.path(Support::Paths.cucumber_root + "/browsermob-proxy-2.1.4/bin/browsermob-proxy")
@server = BrowserMob::Proxy::Server.new(bm_path, {:port => 9999, :log => false, :use_little_proxy => true, :timeout => 100})
@server.start
@proxy = @server.create_proxy
@sel_proxy = @proxy.selenium_proxy
@proxy.timeouts(:read => 50000, :request => 50000, :dns_cache => 50000)
@net_har = @proxy.new_har("new_har", :capture_binary_content => true, :capture_headers => true, :capture_content => true)
end
def fetch_har_entries(target_url)
har_logs = File.join(Support::Paths.har_logs, "har_file_#{Time.now.strftime("%m%d%y_%H%M%S")}.har")
@net_har.save_to har_logs
index = 0
while (@net_har.entries.count > index) do
entry = @net_har.entries[index] # the HAR entry currently being inspected
if entry.request.url.include?(target_url) && entry.request.method.eql?("GET")
logs = JSON.parse(entry.response.content.text) if not entry.response.content.text.nil?
har_logs = File.join(Support::Paths.har_logs, "json_file_#{Time.now.strftime("%m%d%y_%H%M%S")}.json")
File.open(har_logs, "w") do |json|
json.write(logs)
end
break
end
index += 1
end
end
end
In my test file I have the following:
Then("I navigate to the homepage") do
visit(HomePage) do |page|
page.element.click
end
end
And("I should wait for event to capture traffic") do
visit(SecondPage) do |page|
page.wait_until { page.element2.present? }
BMP.fetch_har_entries("target/url")
end
end
What am I missing that is causing the proxy to not capture traffic in its entirety?
In case anyone gets here from a Google search, I figured out how to resolve this on my own (thanks Stack Overflow community for nothing, lol). To resolve the issue, I used a custom retriable loop, a method called eventually.
logs = nil
eventually(timeout: 110, interval: 1) do
@net_har = @proxy.new_har("har", capture_binary_content: true, capture_headers: true, capture_content: true)
@net_har.entries.each do |entry|
begin
break if @net_har.entries.index(entry) == @net_har.entries.count
next unless entry.request.url.include?(target_url) && entry.request.post_data.text.include?(target_body_text)
logs = entry.request.post_data.text
break
rescue TypeError
fail("Response body for the network call came back empty")
end
end
raise EOFError if logs.nil? # tells eventually to run the block again
end
logs
end
Basically, I'm assuming that BMP would only cache or capture 30 seconds' worth of HAR logs, and if my network event didn't occur during those 30 seconds, I was SOL. The code above checks whether the logs variable is still nil; if it is, it raises an EOFError, goes back to the top of the loop, initializes the HAR again, and looks for the network call again. It keeps doing that until it finds the call or 110 seconds are up. Here is the eventually method I'm using:
def eventually(options = {})
timeout = options[:timeout] || 30
interval = options[:interval] || 0.1
time_limit = Time.now + timeout
loop do
begin
yield
rescue EOFError => error
end
return if error.nil?
raise error if Time.now >= time_limit
sleep interval
end
end
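As a variation on the same polling idea, here is a small self-contained sketch that returns the found value directly instead of signalling through EOFError. The name `poll_until` and its keyword arguments are assumptions made for illustration; the deadline uses the monotonic clock so that system clock changes do not affect it.

```ruby
require 'timeout'

# Hypothetical variant of `eventually`: polls until the block returns a
# non-nil value, then returns that value.
def poll_until(timeout: 30, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result unless result.nil?
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise Timeout::Error, "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Usage with a stand-in data source: the "call" shows up on the third poll.
polls = 0
found = poll_until(timeout: 2, interval: 0.01) do
  polls += 1
  polls >= 3 ? "target/url" : nil
end
puts found
```

In the HAR case, the block would search the current entries and return the matching body, or nil when nothing has been captured yet.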

In RoR, how do I catch an exception if I get no response from a server?

I’m using Rails 4.2.3 and Nokogiri to get data from a web site. I want to perform an action when I don’t get any response from the server, so I have:
begin
content = open(url).read
if content.lstrip[0] == '<'
doc = Nokogiri::HTML(content)
else
begin
json = JSON.parse(content)
rescue JSON::ParserError => e
content
end
end
rescue Net::OpenTimeout => e
attempts = attempts + 1
if attempts <= max_attempts
sleep(3)
retry
end
end
Note that this is different than getting a 500 from the server. I only want to retry when I get no response at all, either because I get no TCP connection or because the server fails to respond (or some other reason that causes me not to get any response). Is there a more generic way to take account of this situation other than how I have it? I feel like there are a lot of other exception types I’m not thinking of.
This is a generic sample of how you can define timeout durations for an HTTP connection and perform several retries if an error occurs while fetching content:
require 'open-uri'
require 'nokogiri'
url = "http://localhost:3000/r503"
openuri_params = {
# set timeout durations for HTTP connection
# the default value for both open_timeout and read_timeout is 60 seconds
:open_timeout => 1,
:read_timeout => 1,
}
attempt_count = 0
max_attempts = 3
begin
attempt_count += 1
puts "attempt ##{attempt_count}"
content = open(url, openuri_params).read
rescue OpenURI::HTTPError => e
# it's 404, etc. (do nothing)
rescue SocketError, Net::OpenTimeout, Net::ReadTimeout => e
# server can't be reached or doesn't send any response
puts "error: #{e}"
sleep 3
retry if attempt_count < max_attempts
else
# connection was successful,
# content is fetched,
# so here we can parse content with Nokogiri,
# or call a helper method, etc.
doc = Nokogiri::HTML(content)
p doc
end
When it comes to rescuing exceptions, you should aim to have a clear understanding of:
Which lines in your system can raise exceptions
What is going on under the hood when those lines of code run
What specific exceptions could be raised by the underlying code
In your code, the line that's fetching the content is also the one that could see network errors:
content = open(url).read
If you go to the documentation for the OpenURI module you'll see that it uses Net::HTTP & friends to get the content of arbitrary URIs.
Figuring out what Net::HTTP can raise is actually very complicated but, thankfully, others have already done this work for you. Thoughtbot's suspenders project has lists of common network errors that you can use. Notice that some of those errors have to do with different network conditions than what you had in mind, like the connection being reset. I think it's worth rescuing those as well, but feel free to trim the list down to your specific needs.
So here's what your code should look like (skipping the Nokogiri and JSON parts to simplify things a bit):
require 'net/http'
require 'open-uri'
HTTP_ERRORS = [
EOFError,
Errno::ECONNRESET,
Errno::EINVAL,
Net::HTTPBadResponse,
Net::HTTPHeaderSyntaxError,
Net::ProtocolError,
Timeout::Error,
]
MAX_RETRIES = 3
attempts = 0
begin
content = open(url).read
rescue *HTTP_ERRORS => e
if attempts < MAX_RETRIES
attempts += 1
sleep(2)
retry
else
raise e
end
end
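On the worry about missing exception types: Net::OpenTimeout and Net::ReadTimeout both inherit from Timeout::Error, so rescuing Timeout::Error covers them. A failed DNS lookup (SocketError) or a refused connection (Errno::ECONNREFUSED) is not in the list above, though. The sketch below checks the inheritance and shows a hypothetical predicate, `no_response_error?`, built on an assumed list that you would adjust to your own needs.

```ruby
require 'net/http'
require 'socket'
require 'timeout'

# Assumed list for illustration: errors that mean "no response arrived at all".
NO_RESPONSE_ERRORS = [
  Timeout::Error,        # parent of Net::OpenTimeout and Net::ReadTimeout
  SocketError,           # e.g. the host name could not be resolved
  Errno::ECONNREFUSED,   # nothing is listening on the port
  Errno::ECONNRESET,     # connection dropped by the peer
  Errno::EHOSTUNREACH,   # no route to the host
  EOFError               # connection closed before a response was read
].freeze

# Hypothetical predicate: true when the error belongs to one of the classes above.
def no_response_error?(error)
  NO_RESPONSE_ERRORS.any? { |klass| error.is_a?(klass) }
end

puts Net::OpenTimeout.ancestors.include?(Timeout::Error) # true
puts Net::ReadTimeout.ancestors.include?(Timeout::Error) # true
```

An OpenURI::HTTPError (a 404 or 500, for example) is deliberately absent from the list, since in that case the server did respond.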
I would think about using a Timeout that raises an exception after a short period:
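require 'timeout'
require 'open-uri'
MAX_RESPONSE_TIME = 2 # seconds
max_attempts = 3 # assumed limit, matching the other examples
attempts = 0 # must be initialized outside begin so retry does not reset it
begin
content = nil # needs to be defined before the following block
Timeout.timeout(MAX_RESPONSE_TIME) do
content = open(url).read
end
# parsing `content`
rescue Timeout::Error => e
attempts += 1
if attempts <= max_attempts
sleep(3)
retry
end
end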

Skip an HTTP request if the response is taking too long with Ruby

I have an array of urls. I'm going through each one, sending a get request and printing the response code. Here is part of the code:
arr.each do |url|
res = Faraday.get(link.href)
p res.status
end
However, sometimes when I get to a URL, the request times out and the script crashes. Is there a way to tell Ruby "if I don't get a response in a certain amount of time, then skip to the next URL"?
You could add a timeout like this:
require 'timeout'
arr.each do |url|
begin
Timeout.timeout(5) do # a timeout of five seconds
res = Faraday.get(link.href)
p res.status
end
rescue Timeout::Error
# handle error: show user a message?
end
end
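The same idea can be wrapped in a small helper that records what happened for each URL, so a slow one is marked and skipped rather than aborting the loop. This is only a sketch: `statuses_with_timeout` is a hypothetical name, and the block stands in for whatever request you make.

```ruby
require 'timeout'

# Hypothetical helper: runs the block for each URL and stores either the
# block's result or :timed_out, so one slow URL does not stop the loop.
def statuses_with_timeout(urls, seconds)
  urls.each_with_object({}) do |url, results|
    results[url] = begin
      Timeout.timeout(seconds) { yield url }
    rescue Timeout::Error
      :timed_out
    end
  end
end
```

With Faraday it might be called as `statuses_with_timeout(arr, 5) { |url| Faraday.get(url).status }`, assuming arr holds plain URL strings.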

How can I get my script to loop?

I have it set up so that my script signs in and goes to a URL in the browser, yet when it signs out of the current web page it just sits there and won't restart the loop. How can I get the loop to realize it's done and restart?
x = 0
while x <= 5
File.open("yahoo_accounts.txt") do |email|
email.each do |item|
email, password = item.chomp.split(',')
emails << email
passwords << password
emails.zip(passwords) { |name, pass|
browser = Watir::Browser.new :ff
browser.goto "url"
# logs in and does what it's supposed to do with the name and pass
}
end
x += 1
next
end
end
When the script is done it just sits at the webpage...I'm trying to get it to go to the beginning again...
You would think it would take each name,pass and go back to the beginning url.
Thanks for your help.
It looks like you may not be calling browser.close appropriately. In my quick mock-up testing, I definitely get weird behaviour if I don't do that. You're also using non-idiomatic Ruby looping. Try this:
5.times do
File.open("yahoo_accounts.txt") do |email|
email.each do |item|
email, password = item.chomp.split(',')
emails << email
passwords << password
emails.zip(passwords) do |name, pass|
browser = Watir::Browser.new :ff
browser.goto "url"
# logs in and does what it's supposed to do with the name and pass
browser.close
end
end
end
end
EDIT:
Alternatively, if you want the same exact Watir::Browser instance to be doing all the work, initialize and close outside of your main loop. Right now, you're spawning a new Browser instance with every iteration of emails.zip, times every iteration of email.each, times the 5 iterations of your while loop. This is just ungainly, and may be screwing up your expected results. So just doing:
browser = Watir::Browser.new :ff
5.times do
... loop code ...
end
browser.close
Will at least make whatever's happening under the hood clearer.
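Related to the repeated work described above: because emails.zip(passwords) runs inside email.each, every newly read line causes all previously read accounts to be processed again. A minimal sketch of reading the pairs once up front is shown below; `parse_accounts` is a hypothetical helper, and the file layout (one "email,password" per line) is taken from the question.

```ruby
# Hypothetical helper: turns "email,password" lines into [email, password] pairs,
# skipping blank or malformed lines.
def parse_accounts(lines)
  lines.map { |line| line.chomp.split(',', 2) }
       .select { |pair| pair.length == 2 }
end

# Each pair is then visited exactly once per pass, for example:
#   accounts = parse_accounts(File.readlines("yahoo_accounts.txt"))
#   5.times do
#     accounts.each do |email, password|
#       # log in with email and password here
#     end
#   end
```

Splitting with a limit of 2 keeps any commas inside the password intact.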

Element not found in the cache - perhaps the page has changed since it was looked up in Selenium Ruby web driver?

I am trying to write a crawler that crawls all links from a loaded page and logs all request and response headers, along with the response body, to a file (say XML or txt). I am opening all links from the first loaded page in new browser windows so that I won't get this error:
Element not found in the cache - perhaps the page has changed since it was looked up
I want to know what an alternative way would be to make requests to, and receive responses from, all the links, and then locate input elements and submit buttons from all opened windows.
I am able to do the above to some extent, except when an opened window has a common site search box, like the one in the upper right corner of http://www.testfire.net.
What I want to do is omit such common boxes, so that I can fill the other inputs with values using WebDriver's i.send_keys "value" method and not get this error:
ERROR: Element not found in the cache - perhaps the page has changed since it was looked up.
How can I detect and distinguish the input tags in each opened window, so that a value does not get filled in repeatedly in common input tags that appear on most pages of the website?
My code is as follows:
require 'rubygems'
require 'selenium-webdriver'
require 'timeout'
class Clicker
def open_new_window(url)
@driver = Selenium::WebDriver.for :firefox
@url = @driver.get "http://test.acunetix.com"
@link = Array.new(@driver.find_elements(:tag_name, "a"))
@windows = Array.new(@driver.window_handles())
@link.each do |a|
a = @driver.execute_script("var d=document,a=d.createElement('a');a.target='_blank';a.href=arguments[0];a.innerHTML='.';d.body.appendChild(a);return a", a)
a.click
end
i = @driver.window_handles
i[0..i.length].each do |handle|
@driver.switch_to().window(handle)
puts @driver.current_url()
inputs = Array.new(@driver.find_elements(:tag_name, 'input'))
forms = Array.new(@driver.find_elements(:tag_name, 'form'))
inputs.each do |i|
begin
i.send_keys "value"
puts i.class
i.submit
rescue Timeout::Error => exc
puts "ERROR: #{exc.message}"
rescue Errno::ETIMEDOUT => exc
puts "ERROR: #{exc.message}"
rescue Exception => exc
puts "ERROR: #{exc.message}"
end
end
forms.each do |j|
begin
j.send_keys "value"
j.submit
rescue Timeout::Error => exc
puts "ERROR: #{exc.message}"
rescue Errno::ETIMEDOUT => exc
puts "ERROR: #{exc.message}"
rescue Exception => exc
puts "ERROR: #{exc.message}"
end
end
end
#Switch back to the original window
@driver.switch_to().window(i[0])
end
end
ol = Clicker.new
url = ""
ol.open_new_window(url)
Please guide me on how I can get all request and response headers, along with the response body, using Selenium WebDriver or the http.set_debug_output method of Ruby's net/http.
Selenium is not one of the best options for building a "web crawler". It can be flaky at times, especially when it comes across unexpected scenarios. Selenium WebDriver is a great tool for automating and testing expectations and user interactions.
Instead, good old-fashioned curl would probably be a better option for web crawling. Also, I am pretty sure there are some Ruby gems that might help you crawl the web; just Google it!
But to answer the actual question, if you were to use Selenium WebDriver:
I'd work out a filtering algorithm where you add the HTML of each element you interact with to an array. Then, when you go on to the next window/tab/link, you check against that array and skip the element if a matching HTML value is found.
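A rough sketch of that filter, using a Set rather than an array so lookups stay cheap. The class name `SeenElements` and its method are hypothetical; the HTML string is assumed to come from something like element.attribute('outerHTML') in the Selenium Ruby bindings.

```ruby
require 'set'

# Hypothetical filter: remembers the HTML of every element already handled.
class SeenElements
  def initialize
    @seen = Set.new
  end

  # Returns true the first time a given HTML string is offered, false afterwards.
  # Set#add? returns nil when the value is already present.
  def first_time?(html)
    !@seen.add?(html).nil?
  end
end

# Intended use inside the input loop (not run here, needs a live browser):
#   seen = SeenElements.new
#   inputs.each do |input|
#     next unless seen.first_time?(input.attribute('outerHTML'))
#     input.send_keys "value"
#   end
```

A search box that appears with identical markup on every page would be filled in once and skipped in every later window.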
Unfortunately, SWD does not support getting request headers and responses with its API. The common work-around is to use a third party proxy to intercept the requests.
============
Now I'd like to address a few issues with your code.
I'd suggest that, before iterating over the links, you add @default_current_window = @driver.window_handle. This will allow you to always return to the correct window at the end of your script by calling @driver.switch_to.window(@default_current_window).
In your @link iterator, instead of iterating over all the possible windows that could be displayed, use @driver.switch_to.window(@driver.window_handles.last). This will switch to the most recently displayed new window (and it only needs to happen once per link click!).
You can DRY up your inputs and form code by doing something like this:
inputs = []
inputs << @driver.find_elements(:tag_name => "input")
inputs << @driver.find_elements(:tag_name => "form")
inputs.flatten! # flatten in place; plain flatten would discard its result
inputs.each do |i|
begin
i.send_keys "value"
i.submit
rescue StandardError => e
puts "ERROR: #{e.message}"
end
end
Please note how I just added all of the elements you wanted SWD to find into a single array variable that you iterate over. Then, when something bad happens, a single rescue is needed (I assume you don't want to automatically quit from there, which is why you just want to print the message to the screen).
Learning to DRY up your code and use external gems will help you achieve a lot of what you are trying to do, and at a faster pace.
