Preventing timeout when connecting to a URL - ruby

I want to measure the time taken to access a URL using Benchmark, as in the code below. I also tried to do the same thing without Benchmark, that is, getting the time at the start and end of the test and subtracting the two. Both methods end in the same timeout error.
require 'open-uri'
require 'benchmark'

response = nil
puts "opening website with benchmark..."
puts Benchmark.measure {
  response = open('http://mywebsite.com')
}
puts "Done !"

status = response.status
puts status
Error:
opening website with benchmark...
C:/ruby/lib/ruby/1.8/timeout.rb:64:in `rbuf_fill': execution expired (Timeout::Error)
from C:/ruby/lib/ruby/1.8/net/protocol.rb:134:in `rbuf_fill'
from C:/ruby/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
from C:/ruby/lib/ruby/1.8/net/protocol.rb:126:in `readline'
from C:/ruby/lib/ruby/1.8/net/http.rb:2028:in `read_status_line'
from C:/ruby/lib/ruby/1.8/net/http.rb:2017:in `read_new'
from C:/ruby/lib/ruby/1.8/net/http.rb:1051:in `request'
from C:/ruby/lib/ruby/1.8/open-uri.rb:248:in `open_http'
from C:/ruby/lib/ruby/1.8/net/http.rb:543:in `start'
from C:/ruby/lib/ruby/1.8/open-uri.rb:242:in `open_http'
from C:/ruby/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
from C:/ruby/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
from C:/ruby/lib/ruby/1.8/open-uri.rb:162:in `catch'
from C:/ruby/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
from C:/ruby/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
from C:/ruby/lib/ruby/1.8/open-uri.rb:518:in `open'
from C:/ruby/lib/ruby/1.8/open-uri.rb:30:in `open'
from C:/code/test.rb:7
from C:/ruby/lib/ruby/1.8/benchmark.rb:293:in `measure'
from C:/code/test.rb:6
When I open this URL in my browser, it consistently takes about 2-3 minutes to load.
I searched Google but found no useful answers to my problem. I know I have to change a timeout setting somewhere, but I can't figure out which one. Can someone please help?

Use the :read_timeout option, specified in seconds, e.g.,
open('http://foo.com', :read_timeout => 10)
http://ruby-doc.org/stdlib-1.8.7/libdoc/open-uri/rdoc/OpenURI/OpenRead.html
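
Applied to the code in the question, that might look like the sketch below (assuming a Ruby whose open-uri accepts :read_timeout, i.e. 1.9+ as noted in the edit to the next answer; the 240-second value is only an illustration that comfortably exceeds the 2-3 minute load time):

require 'open-uri'
require 'benchmark'

response = nil
puts Benchmark.measure {
  # :read_timeout is given in seconds
  response = open('http://mywebsite.com', :read_timeout => 240)
}
puts response.status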

The BufferedIO class that Net::HTTP uses for the connection (and that open-uri then uses for the request) has a read_timeout attribute that defaults to 60 seconds.
Net::HTTP provides a read_timeout= setter for it.
Unfortunately, the way open-uri sets up that request gives you no way to get at that setting before the request is made, or to easily override the default of 60.
You will probably need to use Net::HTTP directly, for example:
link = URI.parse(url)
request = Net::HTTP::Get.new(link.path)
response = Net::HTTP.start(link.host, link.port) { |http|
  http.read_timeout = 100 # default is 60 seconds
  http.request(request)
}
Code stolen from this answer
Edit: or upgrade to 1.9, which supports the :read_timeout => x option that Dave Newton noted.

Related

400 Bad Request for Ruby RSS gem

I can't seem to get this RSS feed to work properly. I've tried Nokogiri and now RSS::Parser, and neither works:
require 'open-uri'
require 'rss'

a = 'https://phys.org/rss-feed/biology-news/biology-other/'

URI.open(a) do |rss|
  feed = RSS::Parser.parse(rss)
  puts "Title: #{feed.channel.title}"
  feed.items.each do |item|
    puts "Item: #{item.title}"
  end
end
The code is taken directly out of the docs: https://github.com/ruby/rss
The feed is valid, so I'm confused as to why there's a 400 error code.
What am I doing wrong? Anybody have insight as to how to get this RSS parsed?
Here is the error:
/Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:364:in `open_http': 400 Bad request (OpenURI::HTTPError)
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:741:in `buffer_open'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:212:in `block in open_loop'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:210:in `catch'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:210:in `open_loop'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:151:in `open_uri'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/open_uri_redirections-0.2.1/lib/open-uri/redirections_patch.rb:55:in `open_uri'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:721:in `open'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:29:in `open'
from /users/user3/app.rb:1856:in `<main>'
The web server requires the request to have a User-Agent header set. Without such a header, it returns the 400 error.
require 'uri'
require 'open-uri'
require 'rss'

uri = URI.parse("https://phys.org/rss-feed/biology-news/biology-other/")

uri.open("User-Agent" => "Ruby/#{RUBY_VERSION}") do |rss|
  feed = RSS::Parser.parse(rss)
  puts "Title: #{feed.channel.title}"
  feed.items.each do |item|
    puts "Item: #{item.title}"
  end
end
This code works for me.
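
If you want to see that the User-Agent header really is what makes the difference, a small check along these lines (a sketch, not part of the original answer) hits the same URL with and without the extra header:

require 'open-uri'

url = "https://phys.org/rss-feed/biology-news/biology-other/"

begin
  URI.open(url).read
  puts "plain request succeeded"
rescue OpenURI::HTTPError => e
  puts "plain request failed: #{e.message}"  # "400 Bad request" per the question
end

body = URI.open(url, "User-Agent" => "Ruby/#{RUBY_VERSION}").read
puts "request with User-Agent returned #{body.bytesize} bytes"

If the second request fails as well, then something other than the missing header is at play.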

How to configure read_timeout and open_timeout for Selenium in Ruby?

I'm getting this exception when calling goto() via Watir:
Net::ReadTimeout with #<TCPSocket:(closed)>
/usr/lib/ruby/2.6.0/net/protocol.rb:217:in `rbuf_fill'
/usr/lib/ruby/2.6.0/net/protocol.rb:191:in `readuntil'
/usr/lib/ruby/2.6.0/net/protocol.rb:201:in `readline'
/usr/lib/ruby/2.6.0/net/http/response.rb:40:in `read_status_line'
/usr/lib/ruby/2.6.0/net/http/response.rb:29:in `read_new'
/usr/lib/ruby/2.6.0/net/http.rb:1509:in `block in transport_request'
/usr/lib/ruby/2.6.0/net/http.rb:1506:in `catch'
/usr/lib/ruby/2.6.0/net/http.rb:1506:in `transport_request'
/usr/lib/ruby/2.6.0/net/http.rb:1479:in `request'
/var/lib/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/default.rb:129:in `response_for'
/var/lib/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/default.rb:82:in `request'
/var/lib/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/http/common.rb:64:in `call'
/var/lib/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/bridge.rb:167:in `execute'
/var/lib/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/w3c/bridge.rb:567:in `execute'
/var/lib/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/remote/w3c/bridge.rb:59:in `get'
/var/lib/gems/2.6.0/gems/selenium-webdriver-3.142.3/lib/selenium/webdriver/common/navigation.rb:32:in `to'
/var/lib/gems/2.6.0/gems/watir-6.16.5/lib/watir/navigation.rb:16:in `goto'
It seems that I can modify the timeouts, but I don't understand how.
They are used here, but how do I configure them?
You can refer to the Selenium Ruby bindings documentation, under "Internal timeouts".
From the documentation (Internal timeouts): Internally, WebDriver uses
HTTP to communicate with a lot of the drivers (the JsonWireProtocol).
By default, Net::HTTP from Ruby's standard library is used, which has
a default timeout of 60 seconds. If you call e.g. Driver#get,
Driver#click on a page that takes more than 60 seconds to load, you'll
see a Timeout::Error raised from Net::HTTP. You can configure this
timeout (before launching a browser) by doing:
client = Selenium::WebDriver::Remote::Http::Default.new
client.read_timeout = 120 # seconds
driver = Selenium::WebDriver.for :remote, http_client: client
In Watir it may be something like
caps = Selenium::WebDriver::Remote::Capabilities.chrome

client = Selenium::WebDriver::Remote::Http::Default.new
client.read_timeout = 600 # seconds
client.open_timeout = 600 # seconds

driver = Watir::Browser.new :chrome, :desired_capabilities => caps,
                            :http_client => client

Net::HTTP and Nokogiri - undefined method `body' for nil:NilClass (NoMethodError)

Thanks for your time. I'm somewhat new to OOP and Ruby, and after synthesizing solutions from a few different Stack Overflow answers I've got myself turned around.
My goal is to write a script that parses a CSV of URLs using the Nokogiri library. After trying and failing to use open-uri and the open-uri-redirections plugin to follow redirects, I settled on Net::HTTP, and that got me moving... until I ran into URLs that return a 302 redirect specifically.
Here's the method I'm using to fetch the URL:
require 'nokogiri'
require 'net/http'
require 'csv'

def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  url = URI.parse(uri_str)
  # puts "The value of uri_str is: #{uri_str}"
  # puts "The value of URI.parse(uri_str) is #{url}"
  req = Net::HTTP::Get.new(url.path, { 'User-Agent' => 'Mozilla/5.0 (etc...)' })
  # puts "THE URL IS #{url.scheme + ":" + url.host + url.path}" # just a reporter so I can see if it's mangled
  response = Net::HTTP.start(url.host, url.port, :use_ssl => url.scheme == 'https') { |http| http.request(req) }
  case response
  when Net::HTTPSuccess     then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    # puts "Problem clause!"
    response.error!
  end
end
Further down in my script I take the URL CSV filename from ARGV, read it with CSV.read, encode each URL to a string, and then use Nokogiri::HTML.parse to turn the response into something I can examine with XPath selectors and write to an output CSV.
It works beautifully... so long as I get a 200 response, which unfortunately is not the case for every website. When I run into a 302, I get this:
C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1570:in `addr_port': undefined method `+' for nil:NilClass (NoMethodError)
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1503:in `begin_transport'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1442:in `transport_request'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1416:in `request'
from httpcsv.rb:14:in `block in fetch'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:877:in `start'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:608:in `start'
from httpcsv.rb:14:in `fetch'
from httpcsv.rb:17:in `fetch'
from httpcsv.rb:42:in `block in <main>'
from C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:866:in `each'
from C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:866:in `each'
from httpcsv.rb:38:in `<main>'
I know I'm missing something right in front of me, but I can't tell what I should puts to see what is nil. Any help is appreciated; thanks in advance.
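
(A standalone sketch of something one could puts here, using made-up example values: when a 302's Location header is a relative path, URI.parse returns a URI whose host and port are nil, and those nils would end up inside Net::HTTP.start and addr_port much like the trace above.)

require 'uri'

# Hypothetical Location values, purely for illustration.
['http://example.com/new-page', '/relative/path'].each do |location|
  url = URI.parse(location)
  puts "location: #{location.inspect}"
  puts "  host: #{url.host.inspect}, port: #{url.port.inspect}, path: #{url.path.inspect}"
end
# The relative path comes back with a nil host and port, which is what a
# puts of URI.parse(response['location']) inside fetch would reveal.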

Profiling Sinatra with rack-perftools_profiler (using net/http)

One of my Sinatra actions performs another request using net/http and computes something based on the response:
api_response = Net::HTTP.get_response(uri)
It works like a charm, but when I try to profile this:
configure do
  use ::Rack::PerftoolsProfiler, default_printer: 'pdf', frequency: 4000, mode: :walltime
end
(and add ?profile=true to the request's params), I get this:
Errno::EADDRINUSE - Address already in use - connect(2):
/Users/tomek/.rvm/rubies/ruby-1.9.3-p286/lib/ruby/1.9.1/net/http.rb:762:in `initialize'
/Users/tomek/.rvm/rubies/ruby-1.9.3-p286/lib/ruby/1.9.1/net/http.rb:762:in `open'
/Users/tomek/.rvm/rubies/ruby-1.9.3-p286/lib/ruby/1.9.1/net/http.rb:762:in `block in connect'
/Users/tomek/.rvm/rubies/ruby-1.9.3-p286/lib/ruby/1.9.1/timeout.rb:54:in `timeout'
/Users/tomek/.rvm/rubies/ruby-1.9.3-p286/lib/ruby/1.9.1/timeout.rb:99:in `timeout'
/Users/tomek/.rvm/rubies/ruby-1.9.3-p286/lib/ruby/1.9.1/net/http.rb:762:in `connect'
/Users/tomek/.rvm/rubies/ruby-1.9.3-p286/lib/ruby/1.9.1/net/http.rb:755:in `do_start'
/Users/tomek/.rvm/rubies/ruby-1.9.3-p286/lib/ruby/1.9.1/net/http.rb:744:in `start'
/Users/tomek/.rvm/rubies/ruby-1.9.3-p286/lib/ruby/1.9.1/net/http.rb:454:in `get_response'
Any idea what's happening?
I tried to set this up the same way as you:
require 'sinatra'
require 'rack/perftools_profiler'
require 'net/http'

configure do
  use ::Rack::PerftoolsProfiler, default_printer: 'pdf', frequency: 4000, mode: :walltime
end

get "/" do
  uri = URI('http://google.com')
  api_response = Net::HTTP.get_response(uri)
end
And it ran without a problem on Ruby 1.9.3-p286 with rack-perftools_profiler (0.6.1).
Two recommendations:
update your ruby version
update your gems
If these don't help, check which program has bound your port with sudo netstat -nlp.
If none of that helps, please tell us more about your setup.

Skipping slow websites when looping through an array of URLs using watir-webdriver

I'm trying to loop through an array of websites in Chrome using watir-webdriver, but I always encounter an error on certain websites. Recently, I have had this problem with http://adage.com. The loop will execute perfectly until it reaches http://adage.com and then it will hang until the following error is displayed:
/Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/protocol.rb:146:in `rescue in rbuf_fill': Timeout::Error (Timeout::Error)
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/protocol.rb:140:in `rbuf_fill'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:2562:in `read_status_line'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:2551:in `read_new'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1319:in `block in transport_request'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1293:in `request'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1286:in `block in request'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:745:in `start'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1284:in `request'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/http/default.rb:82:in `response_for'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/http/default.rb:38:in `request'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/http/common.rb:40:in `call'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/bridge.rb:598:in `raw_execute'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/bridge.rb:576:in `execute'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/bridge.rb:536:in `getActiveElement'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/common/target_locator.rb:60:in `active_element'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/watir-webdriver-0.6.1/lib/watir-webdriver/browser.rb:136:in `send_keys'
from /Users/default/Dropbox/beta_scripts/loop_test.rb:16:in `rescue in <main>'
from /Users/default/Dropbox/beta_scripts/loop_test.rb:11:in `<main>'
I have no idea how to avoid this. I have tried setting timeouts and even sending the ESC key during rescue to stop Chrome from loading the page, but have not had any success. Ultimately, I want to be able to reliably load an array of 500+ websites in succession, but this seems impossible given the likelihood that one of the websites will hang. Is there any way to stop a slow page from loading and move on to the next element in the array?
Below is a shortened version of my code that isolates the problem:
#!/usr/bin/env ruby
require 'watir-webdriver'
require 'timeout'

b = Watir::Browser.new :chrome

sites = ["twitter.com", "cars.com", "autotrader.com", "rolex.com",
         "newyorker.com", "adage.com", "theatlantic.com", "pcmag.com"]

sites.each do |uri|
  begin
    Timeout::timeout(10) do
      b.goto uri
    end
  rescue Timeout::Error => e_time
    sleep 5
    b.send_keys :escape
    p "#{uri} is taking forever to load (#{e_time})"
  rescue Exception => e_exception
    p e_exception
  end
end

b.close
Well, I can understand your frustration, mate, because I have encountered the same thing when dealing with Selenium WebDriver. Here is what you can do so that your script runs robustly to the end for your 500+ websites.
sites.each do |uri|
  # try each site up to 30 times, pausing one second after each failed goto
  30.times { break if (b.goto(uri) rescue false); sleep 1 }
end
The code above retries each website for up to 30 attempts, with a one-second pause after each failure, and then moves on to the next website.
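
Not part of this answer, but since a blocked goto only raises once the HTTP client underneath Selenium gives up (60 seconds by default, as discussed in the Selenium/Watir question above), a possible complement is to shorten that client timeout when the browser is created, so the rescue/retry fires sooner. A rough sketch for watir-webdriver with Chrome; the 15-second value is illustrative, and the setter name depends on the selenium-webdriver version:

require 'watir-webdriver'

# Shorten the timeout on the HTTP client Selenium uses to talk to the driver,
# so a hanging page load raises sooner instead of blocking for 60 seconds.
client = Selenium::WebDriver::Remote::Http::Default.new
client.timeout = 15        # older selenium-webdriver releases (e.g. 2.x)
# client.read_timeout = 15 # newer releases, as shown in the answer further up

b = Watir::Browser.new :chrome, :http_client => client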
