How to test that an open-uri URL exists before processing any data - Ruby

I'm trying to process content from a list of links using "open-uri" in Ruby (1.8.6), but I get an error whenever a link is broken or requires authentication:
open-uri.rb:277:in `open_http': 404 Not Found (OpenURI::HTTPError)
from C:/tools/Ruby/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
from C:/tools/Ruby/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
from C:/tools/Ruby/lib/ruby/1.8/open-uri.rb:162:in `catch'
or
C:/tools/Ruby/lib/ruby/1.8/net/http.rb:560:in `initialize': getaddrinfo: no address associated with hostname. (SocketError)
from C:/tools/Ruby/lib/ruby/1.8/net/http.rb:560:in `open'
from C:/tools/Ruby/lib/ruby/1.8/net/http.rb:560:in `connect'
from C:/tools/Ruby/lib/ruby/1.8/timeout.rb:53:in `timeout'
or
C:/tools/Ruby/lib/ruby/1.8/net/protocol.rb:133:in `sysread': An existing connection was forcibly closed by the remote host. (Errno::ECONNRESET)
from C:/tools/Ruby/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
from C:/tools/Ruby/lib/ruby/1.8/timeout.rb:62:in `timeout'
from C:/tools/Ruby/lib/ruby/1.8/timeout.rb:93:in `timeout'
Is there a way to test the response (URL) before processing any data?
The code is:
require 'open-uri'
smth.css.each do |item|
  open(item[:name], 'wb') do |file|
    file << open(item[:href]).read
  end
end
Many thanks

You could try something along the lines of:
require 'open-uri'
smth.css.each do |item|
  begin
    # Fetch first, so no empty file is left behind when the request fails.
    data = open(item[:href]).read
    open(item[:name], 'wb') do |file|
      file << data
    end
  rescue => e
    case e
    when OpenURI::HTTPError
      # do something (404 Not Found, 401 Unauthorized, ...)
    when SocketError
      # do something else (host name did not resolve)
    when Errno::ECONNRESET
      # do something else (connection reset by the remote host)
    else
      raise
    end
  end
end
I don't know of any way to test the connection without opening it and trying, so rescuing these errors is the only approach I can think of. One thing to be aware of: OpenURI::HTTPError and SocketError are subclasses of StandardError, and so is Errno::ECONNRESET (via SystemCallError). A bare rescue => e therefore catches all three, so they have to be told apart inside the same case statement; an exception re-raised from one rescue clause is not caught by a later rescue clause of the same begin block.
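A minimal sketch of the same rescue-only approach on a modern Ruby (2.5 or later, where open-uri provides `URI.open`; since Ruby 3.0 plain `open` no longer fetches URLs). The helper name `fetch_or_nil` and the exact list of rescued classes are assumptions based on the traces in the question; adjust them to what your links actually raise.

```ruby
require 'open-uri'
require 'socket'
require 'timeout'

# Exception classes seen in the traces above (assumed list).
# SystemCallError covers the Errno::* family (ECONNRESET, ECONNREFUSED, ETIMEDOUT, ...);
# Timeout::Error also covers Net::OpenTimeout and Net::ReadTimeout.
FETCH_ERRORS = [OpenURI::HTTPError, SocketError, SystemCallError, Timeout::Error].freeze

# Hypothetical helper: returns the body of +url+, or nil when the link is
# broken (404, 401, ...), the host does not resolve, or the connection fails.
def fetch_or_nil(url)
  URI.open(url, &:read)
rescue *FETCH_ERRORS => e
  warn "skipping #{url}: #{e.class}: #{e.message}"
  nil
end

# Usage with the question's loop (smth and item come from the question):
#   smth.css.each do |item|
#     data = fetch_or_nil(item[:href]) or next
#     File.binwrite(item[:name], data)
#   end

# Errno::ECONNRESET is a StandardError as well, so a bare `rescue => e` catches it.
p Errno::ECONNRESET.ancestors.include?(StandardError) # => true
```

Fetching before opening the output file means a failed link never leaves an empty file behind.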

I was able to solve this problem by using an if/else conditional to check whether the API response reports "failure":
require 'open-uri'
require 'json'
def controller_action
  url = "some_API"
  response = open(url).read
  parsed = JSON.parse(response)
  data = parsed["data"]
  # Check the parsed Hash, not the raw String (response["status"] is a substring lookup).
  if parsed["status"] == "failure"
    redirect_to :action => "home"
  else
    do_something_else
  end
end
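Why the status has to be read from the parsed JSON: on a String, `[]` with a String argument is a substring lookup, so it returns `"status"` (or `nil`) and never `"failure"`. The sketch below assumes, as the answer does, a top-level JSON object with a `"status"` key; `api_failure?` is a hypothetical helper name. Note that this check only runs once the HTTP request itself has succeeded: a 404 or a DNS failure still raises from `open` before any parsing happens.

```ruby
require 'json'

# Hypothetical helper: true when an API body reports failure.
# Assumes a top-level JSON object with a "status" key.
def api_failure?(body)
  parsed = JSON.parse(body)
  !parsed.is_a?(Hash) || parsed["status"] == "failure"
rescue JSON::ParserError
  true # an unparseable body is treated as a failure too (assumption)
end

body = '{"status":"failure","data":null}'
p body["status"]                              # => "status" (substring match, not a key lookup)
p JSON.parse(body)["status"]                  # => "failure"
p api_failure?(body)                          # => true
p api_failure?('{"status":"ok","data":[1]}')  # => false
```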

Related

Crawling a list of URLs and bypassing those with no DNS

I am crawling a large list of URLs with Ruby, but not all of the URLs I have are active or associated with a DNS entry. When my crawler hits one of those URLs, it errors out.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'net/http'
require 'colorize'
URL_LIST = [
  'http://website.com',
  'http://website.net'
]
URL_LIST.each do |url|
  item = "#{url}"
  resp = Net::HTTP.get_response(URI.parse(item))
  case resp.code.to_i
  when 200
    puts "Success: #{url}".green
  when 301..303
    new_url = resp['location']
    puts "Redirect #{url} => #{new_url}".yellow
  else
    resp.code
  end
end
When I run this script and it hits a bad URL, I receive an error like this:
/Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `initialize': getaddrinfo: nodename nor servname provided, or not known (SocketError)
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `open'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `block in connect'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:878:in `connect'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:852:in `start'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:583:in `start'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:478:in `get_response'
from spider.rb:808:in `block in <main>'
from spider.rb:806:in `each'
from spider.rb:806:in `<main>'
Use a begin/rescue block to rescue the error and output error info in red:
require 'net/http'
require 'uri'
require 'colorize'
URL_LIST = [
  'http://website.com',
  'http://sdfasdfwqeasdfasdfr.com',
  'http://website.net'
]
URL_LIST.each do |url|
  begin
    resp = Net::HTTP.get_response(URI.parse(url))
    case resp.code.to_i
    when 200
      puts "Success: #{url}".green
    when 301..303
      new_url = resp['location']
      puts "Redirect #{url} => #{new_url}".yellow
    else
      puts "Status #{resp.code}: #{url}"
    end
  rescue SocketError => e
    puts "Error: #{url} - #{e}".red
  end
end
The output will look like:
Redirect http://website.com => http://www.website.com/
Error: http://sdfasdfwqeasdfasdfr.com - getaddrinfo: nodename nor servname provided, or not known
Success: http://website.net
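Rescuing only SocketError covers hosts that do not resolve, but a host that resolves and then refuses or drops the connection raises an Errno::* exception (a SystemCallError) instead, and a silent host raises a timeout. A minimal sketch of a per-URL check that returns a result instead of raising; `check_url`, its return shape, and the 5-second default are hypothetical choices for illustration, and the 307/308 redirect codes are an addition to the answer's 301..303.

```ruby
require 'net/http'
require 'uri'
require 'socket'
require 'timeout'

# Hypothetical helper: classify one URL without letting network errors escape.
# Returns [:ok, code], [:redirect, location], [:other, code] or [:error, "Class: message"].
def check_url(url, seconds: 5)
  uri = URI.parse(url)
  resp = Net::HTTP.start(uri.host, uri.port,
                         use_ssl: uri.scheme == 'https',
                         open_timeout: seconds, read_timeout: seconds) do |http|
    http.get(uri.request_uri)
  end
  case resp.code.to_i
  when 200 then [:ok, resp.code]
  when 301..303, 307, 308 then [:redirect, resp['location']]
  else [:other, resp.code]
  end
rescue SocketError, SystemCallError, Timeout::Error => e
  # SocketError: DNS failure; SystemCallError: ECONNREFUSED, ECONNRESET, ...;
  # Timeout::Error: Net::OpenTimeout / Net::ReadTimeout.
  [:error, "#{e.class}: #{e.message}"]
end

# Usage (URLs from the question):
#   %w[http://website.com http://website.net].each { |u| p [u, check_url(u)] }
```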

"rescue Exception" not rescuing Timeout::Error in net_http

We appear to have a situation where rescue Exception is not catching a particular exception.
I'm trying to send an email alert about any exception that occurs, and then continue processing. We've put in the requisite handling of intentional exits. For anything else, we want the loop to keep going after alerting us.
The exception that is not being caught is ostensibly Timeout::Error, according to the stack trace.
Here is the stack trace, with references to my intermediate code removed (the last line of my code is request.rb:93):
/opt/ruby-enterprise/lib/ruby/1.8/timeout.rb:64:in `rbuf_fill': execution expired (Timeout::Error)
from /opt/ruby-enterprise/lib/ruby/1.8/net/protocol.rb:134:in `rbuf_fill'
from /opt/ruby-enterprise/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
from /opt/ruby-enterprise/lib/ruby/1.8/net/protocol.rb:126:in `readline'
from /opt/ruby-enterprise/lib/ruby/1.8/net/http.rb:2028:in `read_status_line'
from /opt/ruby-enterprise/lib/ruby/1.8/net/http.rb:2017:in `read_new'
from /opt/ruby-enterprise/lib/ruby/1.8/net/http.rb:1051:in `__request__'
from /mnt/data/blueleaf/releases/20150211222522/vendor/bundle/ruby/1.8/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
from /opt/ruby-enterprise/lib/ruby/1.8/net/http.rb:1037:in `__request__'
from /opt/ruby-enterprise/lib/ruby/1.8/net/http.rb:543:in `start'
from /opt/ruby-enterprise/lib/ruby/1.8/net/http.rb:1035:in `__request__'
from /mnt/data/blueleaf/releases/20150211222522/vendor/bundle/ruby/1.8/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
from /mnt/data/blueleaf/releases/20150211222522/app/models/dst/request.rb:93:in `send'
[intermediate code removed]
from script/dst_daemon.rb:49
from script/dst_daemon.rb:46:in `each'
from script/dst_daemon.rb:46
from /opt/ruby-enterprise/lib/ruby/1.8/benchmark.rb:293:in `measure'
from script/dst_daemon.rb:45
from script/dst_daemon.rb:24:in `loop'
from script/dst_daemon.rb:24
from script/runner:3:in `eval'
from /mnt/data/blueleaf/releases/20150211222522/vendor/bundle/ruby/1.8/gems/rails-2.3.14/lib/commands/runner.rb:46
from script/runner:3:in `require'
Here is request.rb#send, with line 93 indicated with a comment:
def send
  build
  uri = URI.parse([DST::Request.configuration[:prefix], @path].join('/'))
  https = Net::HTTP.new(uri.host, uri.port)
  https.use_ssl = true
  https.verify_mode = OpenSSL::SSL::VERIFY_NONE
  https_request = Net::HTTP::Post.new(uri.request_uri.tap{|e| debug_puts "\nURL: #{e}, host:#{uri.host}"})
  # line 93:
  https_request.body = request
  response = https.request(https_request)
  # the rest should be irrelevant
Here is dst_daemon.rb; line 49 is indicated with a comment, and the rescue Exception that should catch anything other than deliberate interrupts is near the end:
DST::Request.environment = :production
class DST::Request::RequestFailed < Exception; end
Thread.abort_on_exception = true
SEMAPHORE = 'import/dst/start.txt' unless defined?(SEMAPHORE)
DEBUG_DST = 'import/dst/debug.txt' unless defined?(DEBUG_DST)
DEBUG_LOG = 'import/dst/debug.log' unless defined?(DEBUG_LOG)
def debug_dst(*args)
  File.open(DEBUG_LOG, 'a') do |f|
    f.print "#{Time.now.localtime}: "
    f.puts(*args)
  end if debug_dst?
end
def debug_dst?
  File.exist?(DEBUG_DST)
end
dst_ids = [Institution::BAA_DST_WS_CLIENT_ID, Institution::BAA_DST_WS_DEALER_ID]
institutions = Institution.find_all_by_baa_api_financial_institution_id(dst_ids)
DST::Collector.prime_key!
loop do
  begin
    if File.exist?(SEMAPHORE)
      debug_dst 'waking up...'
      custodians = InstitutionAccount.acts_as_baa_custodian.
        find_all_by_institution_id(institutions).select(&:direct?)
      good,bad = custodians.partition do |c|
        c.custodian_users.map{|e2|e2.custodian_passwords.count(:conditions => ['expired is not true']) == 1}.all?
      end
      if bad.present?
        msg = " skipping: \n"
        bad.each do |c|
          msg += " #{c.user.full_name_or_email}, custodian id #{c.id}: "
          c.custodian_users.each{|cu| msg += "#{cu.username}:#{cu.custodian_passwords.count(:conditions => ['expired is not true'])}; "}
          msg += "\n"
        end
        AdminSimpleMailer.deliver_generic_mail("DST Daemon skipping #{bad.size} connections", msg)
        debug_dst msg
      end
      Benchmark.measure do
        good.each do |custodian|
          begin
            debug_dst " collecting for: #{custodian.name}, #{custodian.subtitle}, (#{custodian.id.inspect})"
            # line 49:
            DST::Collector.new(custodian, 0).collect!
          rescue DST::Request::PasswordFailed, DST::Request::RequestFailed => e
            message = e.message + "\n\n" + e.backtrace.join("\n")
            AdminSimpleMailer.deliver_generic_mail("DST Daemon Connection Failed #{e.class.name}", message)
            debug_dst " skipping, #{e.class}"
          end
        end
      end.tap{|duration| debug_dst "collection done, duration #{duration.real.to_f/60} minutes. importing" }
      DST::Strategy.new(Date.yesterday, :recompute => true).import!
      debug_dst 'import done.'
      rm SEMAPHORE, :verbose => debug_dst?
    else
      debug_dst 'sleeping.' if Time.now.strftime("%M").to_i % 5 == 0
    end
  rescue SystemExit, Interrupt
    raise
  rescue Exception => e
    message = e.message + "\n\n" + e.backtrace.join("\n")
    AdminSimpleMailer.deliver_generic_mail("DST Daemon Exception #{e.class.name}", message)
  ensure
    sleep 60
  end
end
Shouldn't it be impossible for this loop to exit with a stack trace other than from SystemExit or Interrupt?
As you probably already know, a bare raise inside a rescue block re-raises the current exception to the caller.
Since Timeout::Error is a subclass of Interrupt in Ruby 1.8*, the timeout exception raised by Net::HTTP is handled by the rescue SystemExit, Interrupt clause, which re-raises it, instead of by the rescue Exception => e clause that follows.
To verify that Timeout::Error is an Interrupt, evaluate Timeout::Error.ancestors; it returns the hierarchy of classes (and modules) that Timeout::Error inherits from.
*This is no longer the case in Ruby 1.9, where Timeout::Error inherits from RuntimeError.
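One way to keep the daemon's "re-raise intentional exits" intent on every Ruby version is to rescue Timeout::Error explicitly *before* the Interrupt clause, since Ruby tries rescue clauses top to bottom and uses the first match. A minimal sketch of that reordered chain; `run_once` and its string results are hypothetical stand-ins for the loop body and the mailer call.

```ruby
require 'timeout'

# Sketch of the daemon's rescue chain, reordered so a timeout is reported
# on every Ruby version: Timeout::Error is listed before Interrupt,
# because on Ruby 1.8 Timeout::Error is a subclass of Interrupt.
def run_once
  yield
  :ok
rescue SystemExit
  raise
rescue Timeout::Error => e
  "timeout: #{e.message}"   # stand-in for the alert email
rescue Interrupt
  raise                     # Ctrl-C still stops the daemon
rescue Exception => e
  "other: #{e.class}"       # stand-in for the alert email
end

p Timeout::Error.ancestors.include?(Interrupt)    # false on 1.9+, true on 1.8
p run_once { Timeout.timeout(0.05) { sleep 1 } }  # => "timeout: execution expired"
p run_once { raise ArgumentError, "bad" }         # => "other: ArgumentError"
```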

Ruby rest-client exception handling

I'd like to make some HTTP REST requests in Ruby using the rest-client gem.
Following the README.md at https://github.com/rest-client/rest-client,
I wrote this simple command-line script, trying to catch exceptions for response codes other than 2xx:
RestClient.get('http://thisurldoesnotexist/resource') { |response, request, result, &block|
  case response.code
  when 200
    p "It worked !"
    response
  else
    response.return!(request, result, &block)
  end
}
I got this output:
/home/*****/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:878:in `initialize': getaddrinfo: Name or service not known (SocketError)
from /home/solyaris/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:878:in `open'
from /home/solyaris/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:878:in `block in connect'
from /home/solyaris/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/timeout.rb:52:in `timeout'
from /home/solyaris/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:877:in `connect'
from /home/solyaris/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:862:in `do_start'
from /home/solyaris/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/net/http.rb:851:in `start'
from /home/solyaris/.rvm/gems/ruby-2.0.0-p247/gems/rest-client-1.6.7/lib/restclient/request.rb:172:in `transmit'
from /home/solyaris/.rvm/gems/ruby-2.0.0-p247/gems/rest-client-1.6.7/lib/restclient/request.rb:64:in `execute'
from /home/solyaris/.rvm/gems/ruby-2.0.0-p247/gems/rest-client-1.6.7/lib/restclient/request.rb:33:in `execute'
from /home/solyaris/.rvm/gems/ruby-2.0.0-p247/gems/rest-client-1.6.7/lib/restclient.rb:68:in `get'
from prova_rest.rb:3:in `<main>'
How can I catch SocketError?
Where am I going wrong?
Thanks,
giorgio
The callback block is executed only when a response is received from the server. In this case name resolution fails, so RestClient.get raises the exception without ever entering the block. Just wrap your code in a begin...rescue...end construct:
begin
  RestClient.get('http://thisurldoesnotexist/resource') { |response, request, result, &block|
    case response.code
    when 200
      p "It worked !"
      response
    else
      response.return!(request, result, &block)
    end
  }
rescue SocketError => e
  # Handle your error here
end
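The same principle can be seen without the gem. The sketch below swaps in the standard library's Net::HTTP (an illustration, not rest-client's internals), whose `request` method also accepts a block that only runs once a response has arrived. `request_with_callback` is a hypothetical helper, and the port is deliberately one with nothing listening.

```ruby
require 'net/http'
require 'socket'

# Hypothetical demo: returns [block_called?, outcome].
def request_with_callback(host, port)
  called = false
  Net::HTTP.start(host, port, open_timeout: 2) do |http|
    http.request(Net::HTTP::Get.new('/')) { |_response| called = true }
  end
  [called, :response]
rescue SocketError, SystemCallError, Net::OpenTimeout => e
  # The connection was never established, so the callback never ran.
  [called, e.class]
end

# Find a local port with nothing listening on it.
probe = TCPServer.new('127.0.0.1', 0)
closed_port = probe.addr[1]
probe.close
p request_with_callback('127.0.0.1', closed_port) # typically [false, Errno::ECONNREFUSED]
```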

How to handle timeout with net/http in ruby while sending requests to IPs from given IP range and skip IPs with timeout and move to next ones?

I want to handle timeouts for an IP range read from the console: I make requests to each IP in the range, and some of them time out.
I want to make requests to all the IPs and get responses from those that answer.
When an IP times out, I want to skip it and move on to the next one. How can I handle this so that the loop doesn't die on the exception, and the script sends a request to every IP while skipping the ones that time out?
Here is the code:
require 'net/http'
require 'uri'
require 'ipaddr'
puts "Origin IP:"
originip = gets()
(IPAddr.new("209.85.175.121")..IPAddr.new("209.85.175.150")).each do |address|
  req = Net::HTTP.get(URI.parse("http://#{address.to_s}"))
  puts req
end
Error:
C:/Ruby187/lib/ruby/1.8/net/http.rb:560:in `initialize': A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. - connect(2) (Errno::ETIMEDOUT)
from C:/Ruby187/lib/ruby/1.8/net/http.rb:560:in `open'
from C:/Ruby187/lib/ruby/1.8/net/http.rb:560:in `connect'
from C:/Ruby187/lib/ruby/1.8/timeout.rb:53:in `timeout'
from C:/Ruby187/lib/ruby/1.8/timeout.rb:101:in `timeout'
from C:/Ruby187/lib/ruby/1.8/net/http.rb:560:in `connect'
from C:/Ruby187/lib/ruby/1.8/net/http.rb:553:in `do_start'
from C:/Ruby187/lib/ruby/1.8/net/http.rb:542:in `start'
from C:/Ruby187/lib/ruby/1.8/net/http.rb:379:in `get_response'
from C:/Ruby187/lib/ruby/1.8/net/http.rb:356:in `get'
from IP Range 2.rb:9
from IP Range 2.rb:8:in `each'
Just like Marc says, you should rescue the exception, like so:
begin
  response = Net::HTTP.get(...)
rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT => e
  # Do what you think needs to be done
end
Also, what you get back from the call to get() is the response body (a String), not a request.
Catch the exceptions explicitly, including the timeout library's Timeout::Error:
require 'timeout'
require 'net/http'
require 'uri'
require 'ipaddr'
(IPAddr.new("209.85.175.121")..IPAddr.new("209.85.175.150")).each do |address|
  begin
    req = Net::HTTP.get(URI.parse("http://#{address.to_s}"))
    puts req
  rescue Timeout::Error => exc
    puts "ERROR: #{exc.message}"
  rescue Errno::ETIMEDOUT => exc
    puts "ERROR: #{exc.message}"
  # uncomment the following two lines if you are not able to track down the exception type.
  #rescue Exception => exc
  #  puts "ERROR: #{exc.message}"
  end
end
Edit: rescue Timeout::Error only catches exceptions of class Timeout::Error and its subclasses. The error in the question is Errno::ETIMEDOUT, which is a SystemCallError, so it has to be rescued by its own class; I updated the code accordingly.
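A further option is to give each connection an explicit `open_timeout`, so a silent host fails quickly instead of waiting for the operating system's default connect timeout. A minimal sketch, assuming port 80 and a 2-second limit (both arbitrary); `scan` is a hypothetical helper that records an outcome per address instead of printing.

```ruby
require 'net/http'
require 'ipaddr'
require 'socket'
require 'timeout'

# Hypothetical helper: request every address in +range+ and record what
# happened, never letting one slow or dead host stop the loop.
def scan(range, port: 80, timeout: 2)
  results = {}
  range.each do |address|
    ip = address.to_s
    begin
      res = Net::HTTP.start(ip, port, open_timeout: timeout, read_timeout: timeout) do |http|
        http.get('/')
      end
      results[ip] = res.code                 # e.g. "200"
    rescue Timeout::Error, Errno::ETIMEDOUT  # includes Net::OpenTimeout / Net::ReadTimeout
      results[ip] = :timeout
    rescue SystemCallError, SocketError => e
      results[ip] = e.class                  # e.g. Errno::ECONNREFUSED
    end
  end
  results
end

# Usage with the question's range:
#   p scan(IPAddr.new("209.85.175.121")..IPAddr.new("209.85.175.150"))
```

IPAddr defines `succ` and `<=>`, which is what lets a Range of IPAddr objects be iterated as in the question.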

Limit to how many errors can be rescued?

I have a program that I'm using as a pentesting tool. I'm in the process of discovering whether websites are vulnerable to SQL injection, and I came across a Timeout::Error. I have tried to rescue the error, but there are a few other errors that need to be rescued as well. So my question is: is there a limit to how many errors can be rescued within a rescue block? And if not, why is this Timeout not getting rescued?
Source:
def get_urls
  info("Searching for possible SQL vulnerable sites.")
  @agent = Mechanize.new
  page = @agent.get('http://www.google.com/')
  google_form = page.form('f')
  google_form.q = "#{SEARCH}"
  url = @agent.submit(google_form, google_form.buttons.first)
  url.links.each do |link|
    if link.href.to_s =~ /url.q/
      str = link.href.to_s
      str_list = str.split(%r{=|&})
      urls = str_list[1]
      next if str_list[1].split('/')[2] == "webcache.googleusercontent.com"
      urls_to_log = urls.gsub("%3F", '?').gsub("%3D", '=')
      success("Site found: #{urls_to_log}")
      File.open("#{PATH}/temp/SQL_sites_to_check.txt", "a+") {|s| s.puts("#{urls_to_log}'")}
    end
  end
  info("Possible vulnerable sites dumped into #{PATH}/temp/SQL_sites.txt")
end
def check_if_vulnerable
  info("Checking if sites are vulnerable.")
  IO.read("#{PATH}/temp/SQL_sites_to_check.txt").each_line do |parse|
    Timeout::timeout(5) do
      begin
        @parsing = Nokogiri::HTML(RestClient.get("#{parse.chomp}"))
      rescue Timeout::Error, RestClient::ResourceNotFound, RestClient::SSLCertificateNotVerified
        if RestClient::ResourceNotFound
          warn("URL: #{parse.chomp} returned 404 error, URL dumped into 404 bin")
          File.open("#{PATH}/lib/404_bin.txt", "a+"){|s| s.puts(parse)}
        elsif RestClient::SSLCertificateNotVerified
          err("URL: #{parse.chomp} requires SSL cert, url dumped into SSL bin")
          File.open("#{PATH}/lib/SSL_bin.txt", "a+"){|s| s.puts(parse)}
        elsif Timeout::Error
          warn("URL: #{parse.chomp} failed to load resulting in time out after 10 seconds. URL dumped into TIMEOUT bin")
          File.open("#{PATH}/lib/TIMEOUT_bin.txt", "a+"){|s| s.puts(parse)}
        end
      end
    end
  end
end
Error:
C:/Ruby22/lib/ruby/2.2.0/net/http.rb:892:in `new': execution expired (Timeout::Error)
from C:/Ruby22/lib/ruby/2.2.0/net/http.rb:892:in `connect'
from C:/Ruby22/lib/ruby/2.2.0/net/http.rb:863:in `do_start'
from C:/Ruby22/lib/ruby/2.2.0/net/http.rb:852:in `start'
from C:/Ruby22/lib/ruby/gems/2.2.0/gems/rest-client-1.8.0-x86-mingw32/lib/restclient/request.rb:413:in `transmit'
from C:/Ruby22/lib/ruby/gems/2.2.0/gems/rest-client-1.8.0-x86-mingw32/lib/restclient/request.rb:176:in `execute'
from C:/Ruby22/lib/ruby/gems/2.2.0/gems/rest-client-1.8.0-x86-mingw32/lib/restclient/request.rb:41:in `execute'
from C:/Ruby22/lib/ruby/gems/2.2.0/gems/rest-client-1.8.0-x86-mingw32/lib/restclient.rb:65:in `get'
from whitewidow.rb:94:in `block (2 levels) in check_if_vulnerable'
from C:/Ruby22/lib/ruby/2.2.0/timeout.rb:88:in `block in timeout'
from C:/Ruby22/lib/ruby/2.2.0/timeout.rb:32:in `block in catch'
from C:/Ruby22/lib/ruby/2.2.0/timeout.rb:32:in `catch'
from C:/Ruby22/lib/ruby/2.2.0/timeout.rb:32:in `catch'
from C:/Ruby22/lib/ruby/2.2.0/timeout.rb:103:in `timeout'
from whitewidow.rb:92:in `block in check_if_vulnerable'
from whitewidow.rb:91:in `each_line'
from whitewidow.rb:91:in `check_if_vulnerable'
from whitewidow.rb:113:in `<main>'
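As you can see in the check_if_vulnerable method, I have rescued Timeout::Error. So what is causing this to time out without moving on to the next URL? I've tried adding a next to the rescue, but it still doesn't work. Help, please?
Simply moving the Timeout::timeout call inside the begin block lets me rescue the error. In Ruby 1.9 and later (including the 2.2 used here), the exception Timeout uses to abort the block cannot be rescued inside that block unless an exception class is passed as the second argument; it only surfaces as Timeout::Error after Timeout::timeout returns, so the rescue has to wrap the timeout call.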
def check_if_vulnerable
  info("Checking if sites are vulnerable.")
  IO.read("#{PATH}/temp/SQL_sites_to_check.txt").each_line do |parse|
    begin
      Timeout::timeout(5) do
        @parsing = Nokogiri::HTML(RestClient.get("#{parse.chomp}"))
      end
    rescue Timeout::Error, RestClient::ResourceNotFound, RestClient::SSLCertificateNotVerified => e
      # Branch on the rescued exception's class; `if RestClient::ResourceNotFound`
      # tests the class constant itself, which is always truthy.
      case e
      when RestClient::ResourceNotFound
        warn("URL: #{parse.chomp} returned 404 error, URL dumped into 404 bin")
        File.open("#{PATH}/lib/404_bin.txt", "a+"){|s| s.puts(parse)}
      when RestClient::SSLCertificateNotVerified
        err("URL: #{parse.chomp} requires SSL cert, url dumped into SSL bin")
        File.open("#{PATH}/lib/SSL_bin.txt", "a+"){|s| s.puts(parse)}
      when Timeout::Error
        warn("URL: #{parse.chomp} failed to load resulting in time out after 5 seconds. URL dumped into TIMEOUT bin")
        File.open("#{PATH}/lib/TIMEOUT_bin.txt", "a+"){|s| s.puts(parse)}
      end
    end
  end
end
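A minimal, network-free sketch of why the placement matters on modern Ruby: a rescue outside `Timeout.timeout` sees the Timeout::Error, while a rescue inside the timed block does not. `with_deadline` and the symbols it returns are hypothetical names used only for illustration.

```ruby
require 'timeout'

# Hypothetical helper: run the block under a deadline and report a timeout
# instead of raising. The rescue sits outside Timeout.timeout, which is the
# placement the answer above switched to.
def with_deadline(seconds)
  [:ok, Timeout.timeout(seconds) { yield }]
rescue Timeout::Error => e
  [:timeout, e.message]
end

p with_deadline(1) { 40 + 2 }     # => [:ok, 42]
p with_deadline(0.05) { sleep 1 } # => [:timeout, "execution expired"]

# By contrast, a rescue placed inside the timed block does not see the
# timeout: the exception escapes and only an outer rescue catches it.
outcome = begin
  Timeout.timeout(0.05) do
    begin
      sleep 1
    rescue Timeout::Error
      :rescued_inside
    end
  end
rescue Timeout::Error
  :rescued_outside
end
p outcome # => :rescued_outside
```

There is no practical limit on how many classes a single rescue clause lists; the problem in the question was only where the rescue sat relative to the timeout.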
