Ruby URI.parse error

Here is my Ruby program:
require 'net/http'
require 'uri'
begin
  url = URI.parse("http://google.com")
rescue Exception => err
  p err
  exit
end
http = Net::HTTP.new(url.host, url.port)
res = http.head("/")
p res.code
It works fine. However, if I remove http:// from the argument to URI.parse(), it gives me this error:
/usr/lib/ruby/1.9.1/net/http.rb:1196:in `addr_port': undefined method `+' for nil:NilClass (NoMethodError) ...
from /usr/lib/ruby/1.9.1/net/http.rb:1094:in `request'
from /usr/lib/ruby/1.9.1/net/http.rb:860:in `head'
Is this the correct way to handle the exception?
I know the URL may not be correct, but shouldn't it raise URI::InvalidURIError instead of accepting the input and continuing the program?

If you say u = URI.parse('http://google.com'), you'll get a URI::HTTP back and u.port will default to 80. If you say u = URI.parse('google.com'), you'll get a URI::Generic back, and both u.port and u.host will be nil.
So, when you do this:
url = URI.parse('google.com')
http = Net::HTTP.new(url.host, url.port)
You're really doing this:
http = Net::HTTP.new(nil, nil)
and Net::HTTP doesn't like that very much at all. You could try something like this instead:
if str.to_s.empty?
  # complain loudly about a missing str
end
begin
  url = URI.parse(str)
  url = URI.parse('http://' + str) if !url.scheme
  if url.scheme != 'http' && url.scheme != 'https'
    # more complaining about bad input
  end
  http = Net::HTTP.new(url.host, url.port)
  #...
rescue URI::Error => e
  # even yet more complaining
end
That sort of thing should bypass the exception completely and cover a few other things that you might be interested in.
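The same checks can be wrapped in one helper so the rest of the program only ever sees a usable HTTP(S) URI. This is a minimal sketch, not a definitive implementation; normalize_http_uri is a hypothetical name, and defaulting a missing scheme to http:// is an assumption carried over from the snippet above.

```ruby
require 'uri'

# Hypothetical helper: returns a URI::HTTP or URI::HTTPS, or raises ArgumentError.
def normalize_http_uri(str)
  raise ArgumentError, 'URL is empty' if str.to_s.strip.empty?

  uri = URI.parse(str)
  # Assumption: input without a scheme is meant to be plain http.
  uri = URI.parse("http://#{str}") if uri.scheme.nil?

  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  unless uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
    raise ArgumentError, "not an HTTP(S) URL: #{str.inspect}"
  end
  uri
rescue URI::InvalidURIError => e
  raise ArgumentError, "cannot parse #{str.inspect}: #{e.message}"
end

uri = normalize_http_uri('google.com')
puts uri       # http://google.com
puts uri.port  # 80
```

Raising a single error class keeps the caller's rescue clause short; whether ArgumentError is the right class depends on your program.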

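URI::InvalidURIError is a descendant of Exception (through URI::Error and StandardError), so rescue Exception does catch it. The check below looks like it says otherwise, but is_a? asks whether the class object itself is an instance of Exception; use < to compare classes:
irb(main):002:0> URI::InvalidURIError.is_a?(Exception)
=> false
irb(main):003:0> URI::InvalidURIError < Exception
=> true
It is still better to rescue the specific error than to rescue Exception, so the fix for your code would be: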
begin
  url = URI.parse("http://google.com")
rescue URI::InvalidURIError => err
  p err
  exit
end

The correct way is not to let any exception happen at all, but to check your conditions beforehand. Like this:
require 'net/http'
require 'uri'
begin
  url = URI.parse("http://google.com")
rescue URI::InvalidURIError => err
  p err
  exit
end
if url.host && url.port
  http = Net::HTTP.new(url.host, url.port)
  res = http.head("/")
  p res.code
else
  p 'Error parsing url'
end

Related

Fastest way to check if a url exists

I am currently writing a program that needs to check a large number of candidate URLs to find any that actually exist. By "exist" I mean that you can visit the URL and get actual content of some sort, not merely that the string is in URL format.
The program generates a list of possible variants for a filename and then checks each one until it gets a url that actually exists, so most of the url remains the same. Examples would be,
https://www.test.com/folder1/FILE.png
https://www.test.com/folder1/File.png
https://www.test.com/folder1/file.png
https://www.test.com/folder1/file1.png
That said, my code currently works fine. However, it takes about 2-4 seconds per URL check and I don't know of a way to speed it up. Is there a faster or better way to validate URLs, or am I just out of luck?
This is my function to validate urls:
require "net/http"
def url_exist?(url_path)
  url = URI.parse(url_path)
  req = Net::HTTP.new(url.host, url.port)
  req.use_ssl = true
  res = req.request_head(url.path)
  if res.code == "200" || res.code == "403"
    return true
  end
end
Thank you for taking the time to read this and any help will be much appreciated.
Your code creates a new connection for each URL. It should be faster to send multiple requests over the same connection via HTTP keep-alive.
In Ruby, you can open such a connection via Net::HTTP.start, e.g.:
require 'net/http'
class URLChecker
  def initialize(base_url)
    uri = URI(base_url)
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.is_a?(URI::HTTPS)) do |http|
      @http = http
      yield self
    end
  end
  def exist?(path)
    res = @http.head(path)
    res.code == '200' || res.code == '403'
  end
end
URLChecker.new('https://stackoverflow.com') do |uc|
  p uc.exist?('/questions/tagged/ruby') #=> true
  p uc.exist?('/questions/tagged/python') #=> true
  p uc.exist?('/questions/tagged/foobar') #=> false
end
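Since the question is really about finding the first filename variant that exists, the same keep-alive idea can be sketched as a search over candidate paths. The helper names first_existing_path and first_existing_url are hypothetical, and treating 200 and 403 as "exists" is taken from the question rather than a general rule.

```ruby
require 'net/http'
require 'uri'

# Returns the first path whose HEAD response code is in ok_codes, or nil.
# `http` only needs to respond to head(path), which makes this easy to test.
def first_existing_path(http, paths, ok_codes: %w[200 403])
  paths.find { |path| ok_codes.include?(http.head(path).code) }
end

# Opens one connection to base_url and reuses it for every candidate path.
def first_existing_url(base_url, paths)
  uri = URI(base_url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    path = first_existing_path(http, paths)
    path && uri.merge(path).to_s
  end
end

# Usage (performs real network requests):
# first_existing_url('https://www.test.com',
#                    %w[/folder1/FILE.png /folder1/File.png /folder1/file.png])
```

Because find stops at the first hit, no requests are sent for the remaining variants.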

Undefined method 'host' in rspec

I have the following methods in a Ruby script:
def parse_endpoint(endpoint)
  return URI.parse(endpoint)
end
def verify_url(endpoint, fname)
  url = "#{endpoint}#{fname}"
  req = Net::HTTP.new(url.host, url.port)
  res = req.request_head(url.path)
  if res.code == "200"
    true
  else
    puts "#{fname} is an invalid file"
    false
  end
end
Testing the url manually like so works fine (returns true since the url is indeed valid):
endpoint = parse_endpoint('http://mywebsite.com/mySubdirectory/')
verify_url(endpoint, "myFile.json")
However, when I try to do the following in RSpec:
describe 'my functionality' do
  let(:endpoint) { parse_endpoint("http://mywebsite.com/mySubdirectory/") }
  it 'should verify valid url' do
    expect(verify_url(endpoint, "myFile.json")).to eq(true)
  end
end
it gives me this error:
NoMethodError:
  undefined method `host' for "http://mywebsite.com/mySubdirectory/myFile.json":String
What am I doing wrong?
url is a String object, and you are trying to call a method named host, which String does not define:
url = "#{endpoint}#{fname}"
req = Net::HTTP.new(url.host, url.port)
EDIT: You probably need a URI object. I think this is what you want:
2.2.1 :004 > require 'uri'
=> true
2.2.1 :001 > url = 'http://mywebsite.com/mySubdirectory/'
=> "http://mywebsite.com/mySubdirectory/"
2.2.1 :005 > parsed_url = URI.parse url
=> #<URI::HTTP http://mywebsite.com/mySubdirectory/>
2.2.1 :006 > parsed_url.host
=> "mywebsite.com"
So just add url = URI.parse(url) before using url.host.
You wrote: "Testing the url manually like so works fine (returns true since the url is indeed valid)":
endpoint = parse_endpoint('http://mywebsite.com/mySubdirectory/')
verify_url(endpoint, "myFile.json")
It seems you missed something when you tested the code above (maybe you tested an old version), because it can't work as it is now.
Look at these lines of code:
url = "#{endpoint}#{fname}"
req = Net::HTTP.new(url.host, url.port)
You're creating a string variable url from two other variables, endpoint and fname. So far, so good.
But then you call the host method on url, where it doesn't exist (it does exist on endpoint); that's why you get this error.
You may want to use this code instead:
def verify_url(endpoint, fname)
  url = endpoint.merge(fname)
  res = Net::HTTP.start(url.host, url.port) do |http|
    http.head(url.path)
  end
  # it's actually a bad idea to puts some text in a query method
  # let's just return value instead
  res.code == "200"
end
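One thing to watch with merge: it resolves fname the way a browser resolves a relative link, so the trailing slash on the endpoint matters. A small sketch to illustrate; file_uri is a hypothetical helper name, and the URLs are the placeholders from the question.

```ruby
require 'uri'

# Hypothetical helper: accepts a String or a URI and resolves fname against it.
def file_uri(endpoint, fname)
  base = endpoint.is_a?(URI::Generic) ? endpoint : URI.parse(endpoint)
  base.merge(fname)
end

with_slash    = file_uri('http://mywebsite.com/mySubdirectory/', 'myFile.json')
without_slash = file_uri('http://mywebsite.com/mySubdirectory', 'myFile.json')

puts with_slash     # http://mywebsite.com/mySubdirectory/myFile.json
puts without_slash  # http://mywebsite.com/myFile.json
puts with_slash.host  # mywebsite.com
```

Without the trailing slash the last path segment is replaced rather than extended, so the file would be looked up in the wrong directory.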

Ruby HTTP POST - Errors

Can someone explain to me why I am getting this error when doing this POST? I pulled the snippet from the Ruby-docs page.
undefined method `hostname' for #<URI::HTTP:0x10bd441d8 URL:http://ws.mywebservice.com/ws.php> (NoMethodError)
Perhaps I am missing a require or something?
require 'net/http'
uri = URI('http://ws.mywebservice.com/ws.php')
req = Net::HTTP::Post.new(uri.path)
req.set_form_data('xmlPayload' => '<TestRequest><Message>Hi Test</Message></TestRequest>')
res = Net::HTTP.start(uri.hostname, uri.port) do |http|
  http.request(req)
end
case res
when Net::HTTPSuccess, Net::HTTPRedirection
  # OK
else
  res.value
end
If you're using a version of Ruby prior to 1.9.3, you should use uri.host.
URI#hostname was added in Ruby 1.9.3. It differs from URI#host in that it removes the brackets from IPv6 hostnames. For non-IPv6 hostnames it should behave identically.
The implementation (from APIdock):
def hostname
  v = self.host
  /\A\[(.*)\]\z/ =~ v ? $1 : v
end
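To see the difference on a Ruby that has URI#hostname, a short check like the following can help. The addresses are made-up examples.

```ruby
require 'uri'

ipv6 = URI('http://[::1]:8080/status')
puts ipv6.host      # [::1]
puts ipv6.hostname  # ::1

named = URI('http://example.com/status')
puts named.host == named.hostname  # true
```

For ordinary host names the two methods return the same string, which is why falling back to uri.host on older Rubies is a reasonable workaround.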

Parameters for a Ruby HTTP Put call

I'm having trouble getting parameters passed in an HTTP Put call, using ruby. Take a look at the "put_data" variable.
When I leave it as a hash, ruby says:
undefined method `bytesize' for #<Hash:0x007fbf41a109e8>
If I convert it to a string, I get:
can't convert Net::HTTPUnauthorized into String
I've also tried doing just - '?token=wBsB16NSrfVDpZPoEpM'
def process_activation
  uri = URI("http://localhost:3000/api/v1/activation/" + self.member_card_num)
  Net::HTTP.start(uri.host, uri.port) do |http|
    headers = {'Content-Type' => 'text/plain; charset=utf-8'}
    put_data = {:token => "wBsB16NSrfVDpZPoEpM"}
    response = http.send_request('PUT', uri.request_uri, put_data, headers)
    result = JSON.parse(response)
  end
  if result['card']['state']['state'] == "active"
    return true
  else
    return false
  end
end
I've searched all around, including rubydocs, but can't find an example of how to encode parameters. Any help would be appreciated.
Don't waste your time with Net::HTTP. I used the 'rest-client' gem and had this thing done in minutes:
require 'rest-client'
require 'json'
def process_activation
  response = RestClient.put 'http://localhost:3000/api/v1/card_activation/' + self.member_card_num, :token => "wBsB1pjJNNfiK6NSrfVDpZPoEpM"
  result = JSON.parse(response.body)
  return result['card']['state']['state'] == "active"
end
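If you would rather stay with Net::HTTP, both errors in the question appear to have plain causes: the request body has to be a String (a Hash has no bytesize), and JSON.parse needs response.body rather than the response object itself. Here is a minimal sketch, assuming the API accepts form-encoded parameters; build_put_request and put_form are hypothetical helper names.

```ruby
require 'net/http'
require 'uri'

# Builds a PUT request whose body is the form-encoded params.
# set_form_data also sets Content-Type to application/x-www-form-urlencoded.
def build_put_request(uri, params)
  req = Net::HTTP::Put.new(uri.request_uri)
  req.set_form_data(params)
  req
end

# Sends the request and returns the Net::HTTPResponse.
def put_form(url, params)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_put_request(uri, params))
  end
end

# Usage (performs a real request; check the status before parsing the body):
# response = put_form('http://localhost:3000/api/v1/activation/123', 'token' => '...')
# result = JSON.parse(response.body) if response.is_a?(Net::HTTPSuccess)
```

If the server expects JSON instead, the body would be a JSON string and the Content-Type header would change accordingly.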

Ruby - net/http - following redirects

I've got a URL and I'm using HTTP GET to pass a query along to a page. With the most recent version (using net/http), the script doesn't go beyond the 302 response. I've tried several different solutions: HTTPClient, net/http, Rest-Client, Patron...
I need a way to continue to the final page in order to validate an attribute tag in that page's HTML. The redirection happens because a mobile user agent hits a page that redirects to a mobile view, hence the mobile user agent in the header. Here is my code as it is today:
require 'uri'
require 'net/http'
class Check_Get_Page
  def more_http
    url = URI.parse('my_url')
    req, data = Net::HTTP::Get.new(url.path, {
      'User-Agent' => 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_2 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5'
    })
    res = Net::HTTP.start(url.host, url.port) {|http|
      http.request(req)
    }
    cookie = res.response['set-cookie']
    puts 'Body = ' + res.body
    puts 'Message = ' + res.message
    puts 'Code = ' + res.code
    puts "Cookie \n" + cookie
  end
end
m = Check_Get_Page.new
m.more_http
Any suggestions would be greatly appreciated!
To follow redirects, you can do something like this (taken from the "Following Redirection" section of ruby-doc):
require 'net/http'
require 'uri'
def fetch(uri_str, limit = 10)
  # You should choose better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0
  url = URI.parse(uri_str)
  # request_uri keeps the query string and is "/" when the path is empty
  req = Net::HTTP::Get.new(url.request_uri, { 'User-Agent' => 'Mozilla/5.0 (etc...)' })
  # enable TLS only for https URLs
  response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') { |http| http.request(req) }
  case response
  when Net::HTTPSuccess then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end
print fetch('http://www.ruby-lang.org/')
Given a URL that redirects
url = 'http://httpbin.org/redirect-to?url=http%3A%2F%2Fhttpbin.org%2Fredirect-to%3Furl%3Dhttp%3A%2F%2Fexample.org'
A. Net::HTTP
begin
  response = Net::HTTP.get_response(URI.parse(url))
  url = response['location']
end while response.is_a?(Net::HTTPRedirection)
Make sure that you handle the case when there are too many redirects.
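One way to cap the number of redirects is sketched below. follow_redirects and its limit and fetcher parameters are hypothetical names; the fetcher argument exists only so the loop can be exercised without a network, and by default it calls Net::HTTP.get_response. Relative Location headers are resolved against the current URL with URI.join.

```ruby
require 'net/http'
require 'uri'

# Follows redirects up to `limit` requests and returns the first
# non-redirect response. Raises RuntimeError when the limit is exhausted.
def follow_redirects(url, limit: 5, fetcher: ->(uri) { Net::HTTP.get_response(uri) })
  limit.times do
    response = fetcher.call(URI.parse(url))
    return response unless response.is_a?(Net::HTTPRedirection)

    # Location may be relative, so resolve it against the URL just requested.
    url = URI.join(url, response['location']).to_s
  end
  raise "too many redirects (limit #{limit})"
end

# Usage (performs real network requests):
# response = follow_redirects('http://httpbin.org/redirect-to?url=http%3A%2F%2Fexample.org')
# puts response.code
```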
B. OpenURI
require 'open-uri'
URI.open(url).read # plain open(url) stopped handling URLs in Ruby 3.0
OpenURI::OpenRead#open follows redirects by default, but it doesn't limit the number of redirects.
I wrote another class for this based on examples given here, thank you very much everybody. I added cookies, parameters and exceptions and finally got what I need: https://gist.github.com/sekrett/7dd4177d6c87cf8265cd
require 'uri'
require 'net/http'
require 'openssl'
class UrlResolver
  def self.resolve(uri_str, agent = 'curl/7.43.0', max_attempts = 10, timeout = 10)
    attempts = 0
    cookie = nil
    until attempts >= max_attempts
      attempts += 1
      url = URI.parse(uri_str)
      http = Net::HTTP.new(url.host, url.port)
      http.open_timeout = timeout
      http.read_timeout = timeout
      path = url.path
      path = '/' if path == ''
      path += '?' + url.query unless url.query.nil?
      params = { 'User-Agent' => agent, 'Accept' => '*/*' }
      params['Cookie'] = cookie unless cookie.nil?
      request = Net::HTTP::Get.new(path, params)
      if url.instance_of?(URI::HTTPS)
        http.use_ssl = true
        # NOTE: VERIFY_NONE disables certificate verification
        http.verify_mode = OpenSSL::SSL::VERIFY_NONE
      end
      response = http.request(request)
      case response
      when Net::HTTPSuccess then
        # return here so that a success on the last attempt is not reported as an error
        return uri_str
      when Net::HTTPRedirection then
        location = response['Location']
        cookie = response['Set-Cookie']
        new_uri = URI.parse(location)
        # url + location returns a URI, so convert it back to a String for the next URI.parse
        uri_str = if new_uri.relative?
                    (url + location).to_s
                  else
                    new_uri.to_s
                  end
      else
        raise 'Unexpected response: ' + response.inspect
      end
    end
    raise 'Too many http redirects'
  end
end
puts UrlResolver.resolve('http://www.ruby-lang.org')
The reference that worked for me is here: http://shadow-file.blogspot.co.uk/2009/03/handling-http-redirection-in-ruby.html
Compared to most examples (including the accepted answer here), it's more robust as it handles URLs which are just a domain (http://example.com - needs to add a /), handles SSL specifically, and also relative URLs.
Of course you would be better off using a library like RESTClient in most cases, but sometimes the low-level detail is necessary.
Maybe you can use the curb-fu gem (https://github.com/gdi/curb-fu). The only catch is that it needs some extra code to make it follow redirects. I've used the following before. Hope it helps.
require 'rubygems'
require 'curb-fu'
module CurbFu
  class Request
    module Base
      def new_meth(url_params, query_params = {})
        curb = old_meth url_params, query_params
        curb.follow_location = true
        curb
      end
      alias :old_meth :build
      alias :build :new_meth
    end
  end
end
#this should follow the redirect because we instruct
#Curb.follow_location = true
print CurbFu.get('http://<your path>/').body
If you do not need to care about the details of each redirection, you can use the Mechanize library:
require 'mechanize'
agent = Mechanize.new
begin
  response = agent.get(url)
rescue Mechanize::ResponseCodeError
  # response codes other than 200, 301, or 302
rescue Timeout::Error
rescue Mechanize::RedirectLimitReachedError
rescue StandardError
end
It will return the destination page.
Or you can turn off redirection like this:
agent.redirect_ok = false
Or you can optionally change some settings for the request:
agent.user_agent = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Mobile Safari/537.36"

Resources