How to scrape data from list of URLs and save data to CSV with nokogiri - ruby

I have a file called bontyurls.csv that looks like this:
http://bontrager.com/model/11383
http://bontrager.com/model/01740
http://bontrager.com/model/09595
I want my script to read that file and then spit out a file like this: bonty_test_urls_results.csv
url,model_names
http://bontrager.com/model/11383,"Road TLR Conversion Kit"
http://bontrager.com/model/01740,"404 File Not Found"
http://bontrager.com/model/09595,"RXL Road"
Here's what I've got so far:
# based on code from here: http://www.andrewsturges.com/2011/09/how-to-harvest-web-data-using-ruby-and.html
require 'nokogiri'
require 'open-uri'
require 'csv'
@urls = Array.new
@model_names = Array.new
urls = CSV.read("bontyurls.csv")
(0..urls.length - 1).each do |index|
  puts urls[index][0]
  doc = Nokogiri::HTML(open(urls[index][0]))
  doc.xpath('//h1').each do |model_name|
    @model_name << model_name.content
  end
end
# write results to file
CSV.open("bonty_test_urls_results.csv", "wb") do |row|
row << ["url", "model_names"]
(0..#urls.length - 1).each do |index|
row << [
#urls[index],
#model_names[index]]
end
end
That code isn't working. I'm getting this error:
$ ruby bonty_test_urls.rb
http://bontrager.com/model/00310
bonty_test_urls.rb:15:in `block (2 levels) in <main>': undefined method `<<' for nil:NilClass (NoMethodError)
from /home/simon/.rvm/gems/ruby-1.9.3-p194/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:239:in `block in each'
from /home/simon/.rvm/gems/ruby-1.9.3-p194/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `upto'
from /home/simon/.rvm/gems/ruby-1.9.3-p194/gems/nokogiri-1.5.5/lib/nokogiri/xml/node_set.rb:238:in `each'
from bonty_test_urls.rb:14:in `block in <main>'
from bonty_test_urls.rb:11:in `each'
from bonty_test_urls.rb:11:in `<main>'
Here is some code that returns the model_name at least. I'm just having trouble getting it to work in the larger script:
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(open("http://bontrager.com/model/09124"))
doc.xpath('//h1').each do |node|
  puts node.text
end
Also, I haven't figured out how to handle the URLs that return a 404.

This is how I'd do it:
require 'csv'
require 'nokogiri'
require 'open-uri'
CSV_OPTIONS = {
  :write_headers => true,
  :headers       => %w[url model_names]
}

CSV.open('bonty_test_urls_results.csv', 'wb', CSV_OPTIONS) do |csv|
  File.foreach('bontyurls.csv') do |url|
    url.chomp!
    begin
      doc = Nokogiri.HTML(open(url))
      h1 = doc.at('h1').text.strip
      h1 = doc.at('title').text.strip.sub(/^Bontrager: /i, '') if h1.empty?
      csv << [url, h1]
    rescue OpenURI::HTTPError => e
      csv << [url, e.message]
    end
  end
end
Which generates a CSV file like:
url,model_names
http://bontrager.com/model/11383,Road TLR Conversion Kit (Model #11383)
http://bontrager.com/model/01740,404 File Not Found
http://bontrager.com/model/09595,RXL Road (Model #09595)

You declare @model_names, but try to push into @model_name, which is why it's nil.
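For reference, a minimal sketch (mine, not from the answer above) that keeps the shape of the original script but initializes one array, appends to it, and rescues OpenURI::HTTPError so 404 pages still get a row:

require 'nokogiri'
require 'open-uri'
require 'csv'

urls        = CSV.read("bontyurls.csv").map { |row| row[0] }
model_names = []

urls.each do |url|
  begin
    doc = Nokogiri::HTML(open(url))
    h1  = doc.at('h1')
    model_names << (h1 ? h1.text.strip : "")
  rescue OpenURI::HTTPError => e
    model_names << e.message # e.g. "404 File Not Found"
  end
end

CSV.open("bonty_test_urls_results.csv", "wb") do |csv|
  csv << ["url", "model_names"]
  urls.each_with_index { |url, i| csv << [url, model_names[i]] }
end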

Related

Ruby Zip: Cannot open entry for reading while its open for writing

I'm trying to write some mail merge code where I open a docx file (as a zip), replace tags with data, and then create a new docx file (as a zip), iterating over the old zip and either adding my replaced data or copying the existing entry from the old docx.
The problem I'm getting is anytime I try to access the out.get_output_stream method, I'm getting the following error:
cannot open entry for reading while its open for writing - [Content_Types].xml (StandardError)
[Content_Types].xml happens to be the first file in the docx, which is why it's bombing on that particular file. What am I doing wrong?
require 'rubygems'
require 'zip' # rubyzip gem
class WordMailMerge
  def self.open(path, &block)
    self.new(path, &block)
  end

  def initialize(path, &block)
    @replace = {}
    if block_given?
      @zip = Zip::File.open(path)
      yield(self)
      @zip.close
    else
      @zip = Zip::File.open(path)
    end
  end

  def force_settings
    @replace["word/settings.xml"] = %{<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:settings xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:sl="http://schemas.openxmlformats.org/schemaLibrary/2006/main"><w:zoom w:percent="100"/></w:settings>}
  end

  def merge(rec)
    xml = @zip.read("word/document.xml")
    # replace tags with correct content
    @replace["word/document.xml"] = xml
  end

  def save(path)
    Zip::File.open(path, Zip::File::CREATE) do |out|
      @zip.each do |entry|
        if @replace[entry.name]
          # this line creates the error
          out.get_output_stream(entry.name).write(@replace[entry.name])
        else
          # this line also will do it.
          out.get_output_stream(entry.name).write(@zip.read(entry.name))
        end
      end
    end
  end

  def close
    @zip.close
  end
end
w = WordMailMerge.open("Option_2.docx")
w.force_settings
w.merge({})
w.save("Option_2_new.docx")
The following is the stack trace:
/home/aaron/.rvm/rubies/ruby-2.4.1/lib/ruby/2.4.0/delegate.rb:85:in `call': cannot open entry for reading while its open for writing - [Content_Types].xml (StandardError)
from /home/aaron/.rvm/rubies/ruby-2.4.1/lib/ruby/2.4.0/delegate.rb:85:in `method_missing'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/streamable_stream.rb:28:in `get_input_stream'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/streamable_stream.rb:45:in `write_to_zip_output_stream'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/file.rb:313:in `block (3 levels) in commit'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/entry_set.rb:38:in `block in each'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/entry_set.rb:37:in `each'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/entry_set.rb:37:in `each'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/file.rb:312:in `block (2 levels) in commit'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/output_stream.rb:53:in `open'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/file.rb:311:in `block in commit'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/file.rb:409:in `block in on_success_replace'
from /home/aaron/.rvm/rubies/ruby-2.4.1/lib/ruby/2.4.0/tmpdir.rb:130:in `create'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/file.rb:407:in `on_success_replace'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/file.rb:310:in `commit'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/file.rb:334:in `close'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/file.rb:103:in `ensure in open'
from /home/aaron/.rvm/gems/ruby-2.4.1@appt/gems/rubyzip-1.2.1/lib/zip/file.rb:103:in `open'
from zip.rb:34:in `save'
from zip.rb:56:in `<main>'
You need to change your save method to the one below:
def save(path)
  Zip::File.open(path, Zip::File::CREATE) do |out|
    @zip.each do |entry|
      if @replace[entry.name]
        # this line used to raise the error
        out.get_output_stream(entry.name) { |f| f.puts @replace[entry.name] }
      else
        # and so did this one:
        # out.get_output_stream(entry.name).write(@zip.read(entry.name))
        out.get_output_stream(entry.name) { |f| f.puts @zip.read(entry.name) }
      end
    end
  end
end
Then the file gets created correctly.
Edit-1
Below is the final code I used for testing:
require 'rubygems'
require 'zip' # rubyzip gem
class WordMailMerge
  def self.open(path, &block)
    self.new(path, &block)
  end

  def initialize(path, &block)
    @replace = {}
    if block_given?
      @zip = Zip::File.open(path)
      yield(self)
      @zip.close
    else
      @zip = Zip::File.open(path)
    end
  end

  def force_settings
    @replace["word/settings.xml"] = %{<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:settings xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:sl="http://schemas.openxmlformats.org/schemaLibrary/2006/main"><w:zoom w:percent="100"/></w:settings>}
  end

  def merge(rec)
    xml = @zip.read("word/document.xml")
    # replace tags with correct content
    @replace["word/document.xml"] = xml.gsub("{name}", "Tarun lalwani")
  end

  def save(path)
    Zip::File.open(path, Zip::File::CREATE) do |out|
      @zip.each do |entry|
        if @replace[entry.name]
          # this line used to raise the error
          out.get_output_stream(entry.name) { |f| f.puts @replace[entry.name] }
        else
          # and so did this one:
          # out.get_output_stream(entry.name).write(@zip.read(entry.name))
          out.get_output_stream(entry.name) { |f| f.puts @zip.read(entry.name) }
        end
      end
    end
  end

  def close
    @zip.close
  end
end
w = WordMailMerge.open("Option_2.docx")
w.force_settings
w.merge({})
w.save("Option_2_new.docx")

Ruby not breaking out of while loop

I'm trying to get this to loop while a specific element exists on the page. The code runs and grabs the URLs I want; however, when the next button is no longer on the page it won't break out of the loop and throws the following error.
/Users/someone/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/rspec-expectations-3.2.0/lib/rspec/matchers.rb:926:in `method_missing': undefined method `each' for nil:NilClass (NoMethodError)
from /something/something/something.rb:30:in `block in <top (required)>'
from /something/something/something.rb:28:in `open'
from /something/something/something.rb:29:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
Brand new to Ruby, so please be gentle ;)
require 'capybara/poltergeist'
require 'capybara/dsl'
require 'csv'
require 'rspec'
include RSpec::Matchers
include Capybara::DSL
Capybara.register_driver :poltergeist do |app|
  Capybara::Poltergeist::Driver.new(app, timeout: 60, :phantomjs_options => ['--debug=no', '--load-images=yes', '--ignore-ssl-errors=yes', '--ssl-protocol=TLSv1'], :debug => false)
end
Capybara.default_driver = :poltergeist
Capybara.javascript_driver = :poltergeist
Capybara.default_wait_time = 20
Capybara.ignore_hidden_elements = true
Capybara.current_session.driver.resize(1200, 1000)
visit('site.com')
while page.find(:xpath, 'html/body/div[4]/div[6]/div[1]/div[2]/div[2]/div[1]/div[2]/button[1]') do
  page.find(:xpath, 'html/body/div[4]/div[6]/div[1]/div[2]/div[2]/div[1]/div[2]/button[1]').click
  urls = page.all('.author-name>a').map { |a| a['href'] }.uniq
  puts urls
end

puts urls

f = File.open("profiles.txt", "a") do |f|
  urls.each { |element| f.puts(element) }
end
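One likely culprit (a guess, not confirmed here): page.find raises Capybara::ElementNotFound once the next button is gone rather than returning a falsy value, and urls is only ever assigned inside the loop, so it can still be nil afterwards. A hedged sketch using has_xpath?, which does return a boolean, with the same Capybara setup as above:

next_button = 'html/body/div[4]/div[6]/div[1]/div[2]/div[2]/div[1]/div[2]/button[1]'
urls = []

# has_xpath? returns true/false instead of raising, so the loop can end cleanly
while page.has_xpath?(next_button)
  page.find(:xpath, next_button).click
  urls.concat(page.all('.author-name>a').map { |a| a['href'] })
end

urls.uniq!
puts urls

File.open("profiles.txt", "a") do |f|
  urls.each { |element| f.puts(element) }
end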

Zlib::BufError when using progressbar/ruby-progressbar gem

I use the following Ruby snippet to download an 8.9 MB file.
require 'open-uri'
require 'net/http'
require 'uri'
def http_download_no_progress_bar(uri, filename)
  uri.open(read_timeout: 500) do |file|
    open filename, 'w' do |io|
      file.each_line do |line|
        io.write line
      end
    end
  end
end
I want to add the progressbar gem to visualize the download process:
require 'open-uri'
require 'progressbar'
require 'net/http'
require 'uri'
def http_download_with_progressbar(uri, filename)
  progressbar = nil
  uri.open(
    read_timeout: 500,
    content_length_proc: lambda { |total|
      if total && 0 < total.to_i
        progressbar = ProgressBar.new("...", total)
        progressbar.file_transfer_mode
      end
    },
    progress_proc: lambda { |step|
      progressbar.set step if progressbar
    }
  ) do |file|
    open filename, 'w' do |io|
      file.each_line do |line|
        io.write line
      end
    end
  end
end
However, it now fails with the following error:
/home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:357:in `finish':
buffer error (Zlib::BufError)oooooo | 8.0MB 8.6MB/s ETA: 0:00:00
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:357:in `finish'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:262:in `ensure in inflater'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:262:in `inflater'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:274:in `read_body_0'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:201:in `read_body'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:328:in `block (2 levels) in open_http'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:1415:in `block (2 levels) in transport_request'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http/response.rb:162:in `reading_body'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:1414:in `block in transport_request'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:1405:in `catch'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:1405:in `transport_request'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:1378:in `request'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:319:in `block in open_http'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:853:in `start'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:313:in `open_http'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:724:in `buffer_open'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:210:in `block in open_loop'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:208:in `catch'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:208:in `open_loop'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:149:in `open_uri'
from /home/user/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/open-uri.rb:704:in `open'
Meanwhile I also tried the ruby-progressbar gem:
require 'open-uri'
require 'ruby-progressbar'
require 'net/http'
require 'uri'
def http_download_with_ruby_progressbar(uri, filename)
  progressbar = nil
  uri.open(
    read_timeout: 500,
    content_length_proc: lambda { |total|
      if total && 0 < total.to_i
        progressbar = ProgressBar.create(title: filename, total: total)
      end
    },
    progress_proc: lambda { |step|
      progressbar.progress = step if progressbar
    }
  ) do |file|
    open filename, 'w' do |io|
      file.each_line do |line|
        io.write line
      end
    end
  end
end
It fails with the same error. Here is the associated issue for the problem.
The problem is the file you are trying to download, since every method works with this file: https://androidnetworktester.googlecode.com/files/1mb.txt.
The problem is that your file is larger than it says it is: content_length_proc reports 8549968 bytes (8.15 MB), whereas the file is actually 101187668 bytes (96.5 MB) (check with ls after downloading it). Here is an alternative that does not crash and gives you progress output:
def http_download_with_words(uri, filename)
  bytes_total = nil
  uri.open(
    read_timeout: 500,
    :content_length_proc => lambda { |content_length|
      bytes_total = content_length
    },
    :progress_proc => lambda { |bytes_transferred|
      if bytes_total
        # Print progress
        print("\r#{bytes_transferred}/#{bytes_total}")
      else
        # We don’t know how much we get, so just print number
        # of transferred bytes
        print("\r#{bytes_transferred} (total size unknown)")
      end
    }
  ) do |file|
    open filename, 'w' do |io|
      file.each_line do |line|
        io.write line
      end
    end
  end
end

http_download_with_words(URI('http://data.wien.gv.at/daten/geo?service=WFS&request=GetFeature&version=1.1.0&typeName=ogdwien%3aBAUMOGD&srsName=EPSG:4326'), 'temp.txt')
which is pretty self-explanatory (seen here).
The part I haven't been able to figure out is how exactly the progressbar gem interferes with Zlib. Most things seem to work fine inside the procs (e.g. having them print random output), so I assume both of these progressbars do something odd on completion that somehow messes with the transfer. I'd be very interested if anyone can figure out why that is.
In my testing, when this occurred it was due to the raise in ProgressBar#set. As for why it results in an error in Zlib, that's not clear; perhaps some strange exception handling in there. In my case I did "progbar.set(count) rescue nil" to get rid of the issue.
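Building on that workaround, a hedged sketch of the earlier progressbar version with the rescue folded into progress_proc (assuming the classic progressbar gem, as in the question):

require 'open-uri'
require 'progressbar'

def http_download_with_rescued_progressbar(uri, filename)
  progressbar = nil
  uri.open(
    read_timeout: 500,
    content_length_proc: lambda { |total|
      if total && 0 < total.to_i
        progressbar = ProgressBar.new(filename, total)
        progressbar.file_transfer_mode
      end
    },
    progress_proc: lambda { |step|
      # set can raise once the transferred bytes pass the under-reported total
      begin
        progressbar.set(step) if progressbar
      rescue StandardError
        nil
      end
    }
  ) do |file|
    open filename, 'w' do |io|
      file.each_line { |line| io.write line }
    end
  end
end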

Creating a file for each URL of a site using Ruby

I have the following code, which creates a file with the content from a crawled site:
require 'rubygems'
require 'anemone'
require 'nokogiri'
require 'open-uri'
Anemone.crawl("http://www.findbrowsenodes.com/", :delay => 3) do |anemone|
anemone.on_pages_like(/http:\/\/www.findbrowsenodes.com\/us\/.+\/[\d]*/) do | page |
doc = Nokogiri::HTML(open(page.url))
node_id = doc.at_css("#n_info #clipnode").text unless doc.at_css("#n_info #clipnode").nil?
node_name = doc.at_css("#n_info .node_name").text unless doc.at_css("#n_info .node_name").nil?
node_url = page.url
open("filename.txt", "a") do |f|
f.puts "#{node_id}\t#{node_name}\t#{node_url}"
end
end
end
Now I want to create not one file but several, each named after its node_id. I tried this:
page.each do |p|
  p.open("#{node_id}.txt", "a") do |f|
    f.puts "#{node_id}\t#{node_name}\t#{node_url}"
  end
end
but got this:
undefined method `value' for #<Nokogiri::XML::DTD:0x51c089a name="html"> (NoMethodError)
then tried this:
page.open("#{node_id}.txt", "a") do |f|
f.puts "#{node_id}\t#{node_name}\t#{node_url}"
end
but got this:
private method `open' called for #<Anemone::Page:0x91472e8> (NoMethodError)
What's the right way of doing this?
File.open("#{node_id}.txt", "w") do |f|
f.puts "stuff"
end
How you make the assignment to node_id is up to you.
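Put together with the question's crawl block, that might look something like this sketch (mine; same selectors as the question, and "a" mode so repeated node_ids append rather than overwrite):

require 'anemone'
require 'nokogiri'
require 'open-uri'

Anemone.crawl("http://www.findbrowsenodes.com/", :delay => 3) do |anemone|
  anemone.on_pages_like(/http:\/\/www.findbrowsenodes.com\/us\/.+\/[\d]*/) do |page|
    doc = Nokogiri::HTML(open(page.url))

    node_id   = doc.at_css("#n_info #clipnode").text unless doc.at_css("#n_info #clipnode").nil?
    node_name = doc.at_css("#n_info .node_name").text unless doc.at_css("#n_info .node_name").nil?

    next if node_id.nil? # nothing sensible to name the file after

    File.open("#{node_id}.txt", "a") do |f|
      f.puts "#{node_id}\t#{node_name}\t#{page.url}"
    end
  end
end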

Ruby Net::HTTP time out

I'm trying to write my first Ruby program, but have a problem. The code has to download 32 MP3 files over HTTP. It actually downloads a few, then times out.
I tried setting a timeout period, but it makes no difference. Running the code under Windows, Cygwin and Mac OS X gives the same result.
This is the code:
require 'rubygems'
require 'open-uri'
require 'nokogiri'
require 'set'
require 'net/http'
require 'uri'
puts "\n Up and running!\n\n"
links_set = {}
pages = ['http://www.vimeo.com/siai/videos/sort:oldest',
         'http://www.vimeo.com/siai/videos/page:2/sort:oldest',
         'http://www.vimeo.com/siai/videos/page:3/sort:oldest']
pages.each do |page|
  doc = Nokogiri::HTML(open(page))
  doc.search('//*[@href]').each do |m|
    video_id = m[:href]
    if video_id.match(/^\/(\d+)$/i)
      links_set[video_id[/\d+/]] = m.children[0].to_s.split(" at ")[0].split(" -- ")[0]
    end
  end
end
links = links_set.to_a
p links
cookie = ''
file_name = ''
open("http://www.tubeminator.com") {|f|
cookie = f.meta['set-cookie'].split(';')[0]
}
links.each do |link|
open("http://www.tubeminator.com/ajax.php?function=downloadvideo&url=http%3A%2F%2Fwww.vimeo.com%2F" + link[0],
"Cookie" => cookie) {|f|
puts f.read
}
open("http://www.tubeminator.com/ajax.php?function=convertvideo&start=0&duration=1120&size=0&format=mp3&vq=high&aq=high",
"Cookie" => cookie) {|f|
file_name = f.read
}
puts file_name
Net::HTTP.start("www.tubeminator.com") { |http|
#http.read_timeout = 3600 # 1 hour
resp = http.get("/download-video-" + file_name)
open(link[1] + ".mp3", "wb") { |file|
file.write(resp.body)
}
}
end
puts "\n Yay!!"
And this is the exception:
/Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/protocol.rb:140:in `rescue in rbuf_fill': Timeout::Error (Timeout::Error)
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/protocol.rb:134:in `rbuf_fill'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/protocol.rb:116:in `readuntil'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/protocol.rb:126:in `readline'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/http.rb:2138:in `read_status_line'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/http.rb:2127:in `read_new'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/http.rb:1120:in `transport_request'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/http.rb:1106:in `request'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:312:in `block in open_http'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/net/http.rb:564:in `start'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:306:in `open_http'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:767:in `buffer_open'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:203:in `block in open_loop'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:201:in `catch'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:201:in `open_loop'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:146:in `open_uri'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:669:in `open'
from /Users/test/.rvm/rubies/ruby-1.9.2-preview1/lib/ruby/1.9.1/open-uri.rb:33:in `open'
from test.rb:38:in `block in <main>'
from test.rb:37:in `each'
from test.rb:37:in `<main>'
I'd also appreciate your comments on the rest of the code.
For Ruby 1.8 I used this to solve my time-out issues: reopen the Net::HTTP class, call the original initialize with the default parameters, and then set my own read_timeout. That should keep things sane, I think.
require 'net/http'
# Lengthen timeout in Net::HTTP
module Net
  class HTTP
    alias old_initialize initialize
    def initialize(*args)
      old_initialize(*args)
      @read_timeout = 5 * 60 # 5 minutes
    end
  end
end
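With that patch loaded before any requests are made, every new connection picks up the longer timeout; a tiny hedged check (my own illustration, assuming the patch above sits earlier in the same file):

require 'net/http'

http = Net::HTTP.new("www.tubeminator.com") # no connection is opened yet
puts http.read_timeout # => 300 from the patched initialize, instead of the stock 60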
Your timeout isn't in the code you set the timeout for. It's here, where you use open-uri:
open("http://www.tubeminator.com/ajax.php?function=downloadvideo&url=http%3A%2F%2Fwww.vimeo.com%2F" + link[0],
You can set a read timeout for open-uri like so:
#!/usr/bin/ruby1.9
require 'open-uri'
open('http://stackoverflow.com', 'r', :read_timeout => 0.01) do |http|
  http.read
end
# => /usr/lib/ruby/1.9.0/net/protocol.rb:135:in `sysread': \
# => execution expired (Timeout::Error)
# => ...
# => from /tmp/foo.rb:5:in `<main>'
:read_timeout is new for Ruby 1.9 (it's not in Ruby 1.8). 0 or nil means "no timeout."
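Applied to the download loop from the question, that might look like this sketch (mine, untested; links and cookie come from the question's earlier code, and open-uri lets you mix the Cookie header with the :read_timeout option in the same hash):

require 'open-uri'

links.each do |link|
  open("http://www.tubeminator.com/ajax.php?function=downloadvideo&url=http%3A%2F%2Fwww.vimeo.com%2F" + link[0],
       "Cookie" => cookie,
       :read_timeout => 3600) { |f|
    puts f.read
  }
end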
