How to fix bad URI is not URI [duplicate] - ruby

I'm using Ruby 1.9.3 and would like to get the host name from the video URL below.
I tried this code:
require 'uri'
url = "https://ferrari-view.4me.it/view-share/playerp/?plContext=http://ferrari-%201363948628-stream.4mecloud.it/live/ferrari/ngrp:livegenita/manifest.f4m&cartellaConfig=http://ferrari-4me.weebo.it/static/player/config/&cartellaLingua=http://ferrari-4me.weebo.it/static/player/config/&poster=http://pusher.newvision.it:8080/resources/img1.jpg&urlSkin=http://ferrari-4me.weebo.it/static/player/swf/skin.swf?a=1363014732171&method=GET&target_url=http://ferrari-4me.weebo.it/static/player/swf/player.swf&userLanguage=IT&styleTextColor=#000000&autoPlay=true&bufferTime=2&isLive=true&highlightColor=#eb2323&gaTrackerList=UA-23603234-4"
puts URI.parse(url).host
It throws an exception: URI::InvalidURIError: bad URI(is not URI?)
I also tried encoding the URL and then parsing it, like below:
puts URI.parse(URI.parse(url)).host
It throws the same exception: URI::InvalidURIError: bad URI(is not URI?)
But the code above works for the URL below:
url = "http://www.youtube.com/v/GpQDa3PUAbU?version=3&autohide=1&autoplay=1"
How can I fix this? Any suggestions, please?
Thanks

This URL is not valid, but it works in a browser because browsers are less strict about special characters like :, /, etc.
You should encode your URI first:
encoded_url = URI.encode(url)
And then parse it:
URI.parse(encoded_url)

Addressable::URI is a better, more rfc-compliant replacement for URI:
require "addressable/uri"
Addressable::URI.parse(url).host
#=> "ferrari-view.4me.it"
gem install addressable first.

try this:
safeurl = URI.encode(url.strip)
response = RestClient.get(safeurl)

Your URI query is not valid. There are several characters that you should encode with URI::encode(). For instance, # and & must be encoded when they appear inside a parameter value.
Below is a working version of your code:
require 'uri'
plContext = URI::encode("http://ferrari-%201363948628-stream.4mecloud.it/live/ferrari/ngrp:livegenita/manifest.f4m")
cartellaConfig = URI::encode("http://ferrari-4me.weebo.it/static/player/config/")
cartellaLingua = URI::encode("http://ferrari-4me.weebo.it/static/player/config/")
poster = URI::encode("http://pusher.newvision.it:8080/resources/img1.jpg")
urlSkin = URI::encode("http://ferrari-4me.weebo.it/static/player/swf/skin.swf?a=1363014732171")
target_url = URI::encode("http://ferrari-4me.weebo.it/static/player/swf/player.swf")
url = "https://ferrari-view.4me.it/view-share/playerp/?"
url << "plContext=#{plContext}"
url << "&cartellaConfig=#{cartellaConfig}"
url << "&cartellaLingua=#{cartellaLingua}"
url << "&poster=#{poster}"
url << "&urlSkin=#{urlSkin}"
url << "&method=GET"
url << "&target_url=#{target_url}"
url << "&userLanguage=IT"
url << "&styleTextColor=#{URI::encode("#000000")}"
url << "&autoPlay=true&bufferTime=2&isLive=true&gaTrackerList=UA-23603234-4"
url << "&highlightColor=#{URI::encode("#eb2323")}"
puts url
puts URI.parse(url).host
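As an alternative to encoding each value by hand, the query string can be assembled with URI.encode_www_form from the standard library, which percent-encodes every key and value. A minimal sketch, assuming only a subset of the original parameters is needed for illustration (the params hash and its layout are mine, not part of the answer above):

```ruby
require 'uri'

# Illustrative subset of the original query parameters.
params = {
  'plContext'      => 'http://ferrari-%201363948628-stream.4mecloud.it/live/ferrari/ngrp:livegenita/manifest.f4m',
  'poster'         => 'http://pusher.newvision.it:8080/resources/img1.jpg',
  'styleTextColor' => '#000000',
  'autoPlay'       => 'true'
}

uri = URI.parse('https://ferrari-view.4me.it/view-share/playerp/')
# encode_www_form escapes characters such as '#', '%', ':' and '/'
# inside each value, so the result is a well-formed query string.
uri.query = URI.encode_www_form(params)

puts uri.host
puts uri.query
```

Because every value is escaped before it is joined, a nested URL or a colour code like #000000 can no longer be mistaken for part of the outer URI.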

URI.parse is right: that URI is illegal. Just because it accidentally happens to work in your browser doesn't make it legal. You cannot parse that URI, because it isn't a URI.

uri = URI.parse(URI.encode(url.strip))

Related

InvalidURIError making request to Facebook Graph API with Ruby

I'm simply trying to get a response from the API that includes certain fields that I'm specifying in my uri string but I keep receiving an InvalidURIError. I've come here as a last resort, having spent hours trying to debug this.
I've already tried using the URI.encode() method on it as well, but only get the same error.
Here's my code:
url = params[:url]
uri = URI('https://graph.facebook.com/v2.3/?id=' + url + '&fields=share,og_object{id,url,engagement}&access_token=' + CONFIG['fb_access_token'])
req = Net::HTTP::Post.new(uri.path)
req.set_form_data('fields' => 'og_object[engagement]','access_token' => CONFIG['fb_access_token'])
res = Net::HTTP.new(uri.host, uri.port)
res.verify_mode = OpenSSL::SSL::VERIFY_NONE
res.use_ssl = true
response = nil
res.start do |http|
  response = http.request(req)
end
output = ""
output << "#{response.body} <br />"
return output
And the error I'm receiving:
URI::InvalidURIError - bad URI(is not URI?): https://graph.facebook.com/v2.3/?id=http://www.wikipedia.org&fields=share,og_object{id,url,engagement}&access_token=960606020650536|eJC0PoCARFaqKZWZHdwN5ogkhfs
I'm just exhausted at this point so if I left out any important information just let me know and I'll respond with it as soon as I can. Thank you!
The problem is you're just dumping strings into your URI without escaping them first.
Since you're using Sinatra you can use Rack::Utils.build_query to construct your URI's query component with the values correctly escaped:
uri = URI('https://graph.facebook.com/v2.3/')
uri.query = Rack::Utils.build_query(
id: url,
fields: 'share,og_object{id,url,engagement}',
access_token: CONFIG['fb_access_token']
)
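If Rack is not available, the standard library's URI.encode_www_form does a similar job. A minimal sketch, assuming a hypothetical helper name graph_uri and a placeholder access token; neither comes from the original code:

```ruby
require 'uri'

# Hypothetical helper: builds the Graph API URI with every
# query value percent-encoded.
def graph_uri(target_url, access_token)
  uri = URI('https://graph.facebook.com/v2.3/')
  uri.query = URI.encode_www_form(
    id: target_url,
    fields: 'share,og_object{id,url,engagement}',
    access_token: access_token
  )
  uri
end

# 'APP_ID|APP_SECRET' is a placeholder, not a real token.
uri = graph_uri('http://www.wikipedia.org', 'APP_ID|APP_SECRET')
puts uri
```

The braces, commas and the pipe character that triggered the InvalidURIError are escaped here, so the resulting string can be parsed again without raising.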

RestClient bug on response redirect encoding

When trying to get this page:
resp = RestClient.get("http://www.radios.com.br/aovivo/XXXX/24924")
I get this error:
URI::InvalidURIError: bad URI(is not URI?): http://www.radios.com.br/aovivo/Radio-Gospel-Ajduk?s/24924
from /Users/danicuki/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/uri/common.rb:176:in `split'
from /Users/danicuki/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/uri/common.rb:211:in `parse'
I think this is happening because the redirect URL in the response has an encoding problem. How can I fix it?
Non-ASCII characters in URIs must be URL-encoded:
url = "http://www.radios.com.br/aovivo/XXXX/24924"
resp = RestClient.get(URI::encode(url))
You need to patch RestClient (this is not yet fixed in version 2.1.0):
RestClient::AbstractResponse.module_eval do
  alias _origin_follow_redirection _follow_redirection
  def _follow_redirection(new_args, &block)
    # cannot follow redirection if there is no location header
    raise exception_with_response unless headers[:location]
    # Fix URI::InvalidURIError: URI must be ascii only
    headers[:location] = URI::encode headers[:location]
    _origin_follow_redirection new_args, &block
  end
end

Converting to valid urls which can be opened by open-uri

I need to open some webpages using open-uri in Ruby and then parse the content of those pages using Nokogiri.
I just did:
require 'open-uri'
content_file = open(user_input_url)
This worked for http://www.google.co.in and http://google.co.in, but fails when the user gives inputs like www.google.co.in or google.co.in.
One thing I can do for such inputs is prepend http:// or https:// and return the content of the page that opens. But this seems like a big hack to me.
Is there a better way to achieve this in Ruby (i.e. converting these user inputs to URLs that open-uri accepts)?
uri = URI("www.google.com")
if uri.instance_of?(URI::Generic)
uri = URI::HTTP.build({:host => uri.to_s})
end
content_file = open(uri)
There are other ways as well; see http://www.ruby-doc.org/stdlib-2.0.0/libdoc/uri/rdoc/URI/HTTP.html
Prepend the scheme if not present and then use URI which will check the URL validity:
require 'uri'
url = 'www.google.com/a/b?c=d#e'
url.prepend "http://" unless url.start_with?('http://', 'https://')
url = URI(url) # it will raise error if the url is not valid
open url
Unfortunately, an "object oriented" version of what you need is more verbose and even more hackish:
require 'uri'
case url = URI.parse('www.google.com/a/b?c=d#e')
when URI::HTTP, URI::HTTPS
  # no-op
when URI::Generic
  # We need to split url.path at the first '/', since URI::Generic interprets
  # 'www.google.com/a/b' as a single path
  host, path = url.path.split '/', 2
  url = URI::HTTP.build host: host,
                        path: "/#{path}",
                        query: url.query,
                        fragment: url.fragment
else
  raise "unsupported url class (#{url.class}) for #{url}"
end
open url
If you accept suggestions, don't break your head too much on this: I faced this matter often and I'm quite sure there aren't "polished" ways to do it
You need to prepend http to the urls, without an explicit scheme the uri could be anything, e.g. a local file. A uri is not necessarily an http url.
You can check either by using the URI class or by using a regex:
# Using the URI class:
user_input_url = URI.parse(user_input_url).scheme ?
  user_input_url :
  "http://#{user_input_url}"
# Or using a regex anchored at the start of the string:
user_input_url = user_input_url =~ /\Ahttps?:\/\// ?
  user_input_url :
  "http://#{user_input_url}"
def instance_to_hash(instance)
  hash = {}
  instance.instance_variables.each {|var| hash[var[1..-1].to_sym] = instance.instance_variable_get(var) }
  hash
end
def url_compile(url)
  # if url without 'http://', 'https://', '//' at start of string
  # then prepend '//'
  url.prepend '//' unless url.start_with?('http://', 'https://', '//')
  uri = URI(url)
  if uri.instance_of?(URI::Generic) # if scheme nil then assume it HTTPS
    uri = URI::HTTPS.build(instance_to_hash(uri))
  end
  uri
end
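The scheme-prepending idea from the answers above can be wrapped in a small helper that does not mutate the caller's string. A minimal sketch, assuming a hypothetical method name with_default_scheme and http as the fallback scheme:

```ruby
require 'uri'

# Hypothetical helper: returns a parsed URI, adding a default scheme
# when the input does not already start with "<scheme>://".
def with_default_scheme(input, scheme = 'http')
  candidate = input.strip
  unless candidate =~ /\A[a-z][a-z0-9+.\-]*:\/\//i
    candidate = "#{scheme}://#{candidate}"
  end
  URI.parse(candidate) # raises URI::InvalidURIError for malformed input
end

puts with_default_scheme('www.google.co.in')
puts with_default_scheme('https://google.co.in')
```

The result can then be handed to open-uri. Whether http or https is the better default is a judgment call that depends on the sites being fetched.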

URI::InvalidURIError for a URI that's not in my power to encode

I'm using httparty to unshorten short URIs and I happened upon:
HTTParty.get('http://bit.ly/19NoFfn', limit: 50 )
which when expanded yields:
https://sublime.wbond.net/packages/PhpSpec Snippets
which obviously throws a: URI::InvalidURIError.
Would it be possible to pass some parameter to httparty so that it would automatically try to encode URIs before trying to follow them?
I sort of solved my issue:
def unshorten(uri)
  begin
    response = HTTParty.get(uri, limit: 50)
  rescue URI::InvalidURIError => error
    bad_uri = error.message.match(/^bad\sURI\(is\snot\sURI\?\)\:\s(.*)$/)[1]
    good_uri = URI.encode bad_uri
    response = self.unshorten good_uri
  end
  response
end
I don't feel particularly comfortable fetching the URI from the error message string but it seems there's no other way. Or is there? :)
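Note that URI.encode (also known as URI.escape) was removed in Ruby 3.0, so the snippet above only runs on older versions. A rough sketch of a replacement, assuming a hypothetical helper escape_unsafe that percent-encodes only characters that cannot appear literally in a URI and deliberately leaves % and # alone so existing escapes and fragments survive:

```ruby
require 'uri'

# Everything outside this set is percent-encoded. Keeping '%' and '#'
# untouched is an assumption of this sketch, not URI.encode's behaviour.
UNSAFE_URI_CHARS = /[^\-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]%#]/

# Hypothetical helper: escapes each unsafe character byte by byte.
def escape_unsafe(str)
  str.gsub(UNSAFE_URI_CHARS) do |char|
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

expanded = 'https://sublime.wbond.net/packages/PhpSpec Snippets'
puts URI.parse(escape_unsafe(expanded)).path
```

This only helps once the offending string is in hand; it does not remove the need to recover that string from the redirect in the first place.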

How to check if a URL is valid

How can I check if a string is a valid URL?
For example:
http://hello.it => yes
http:||bra.ziz, => no
If this is a valid URL how can I check if this is relative to a image file?
Notice:
As pointed out by @CGuess, there's a bug with this issue, and it has been documented for over 9 years now that validation is not the purpose of this regular expression (see https://bugs.ruby-lang.org/issues/6520).
Use the URI module distributed with Ruby:
require 'uri'
if url =~ URI::regexp
# Correct URL
end
Like Alexander Günther said in the comments, it checks if a string contains a URL.
To check if the string is a URL, use:
url =~ /\A#{URI::regexp}\z/
If you only want to check for web URLs (http or https), use this:
url =~ /\A#{URI::regexp(['http', 'https'])}\z/
Similar to the answers above, I find using this regex to be slightly more accurate:
URI::DEFAULT_PARSER.regexp[:ABS_URI]
That will invalidate URLs with spaces, as opposed to URI.regexp which allows spaces for some reason.
I have recently found a shortcut that is provided for the different URI regexps. You can access any of URI::DEFAULT_PARSER.regexp.keys directly from URI::#{key}.
For example, the :ABS_URI regexp can be accessed from URI::ABS_URI.
The problem with the current answers is that a URI is not a URL.
A URI can be further classified as a locator, a name, or both. The
term "Uniform Resource Locator" (URL) refers to the subset of URIs
that, in addition to identifying a resource, provide a means of
locating the resource by describing its primary access mechanism
(e.g., its network "location").
Since URLs are a subset of URIs, it is clear that matching specifically for URIs will successfully match undesired values. For example, URNs:
"urn:isbn:0451450523" =~ URI::regexp
=> 0
That being said, as far as I know, Ruby doesn't have a default way to parse URLs, so you'll most likely need a gem to do so. If you need to match URLs specifically in HTTP or HTTPS format, you could do something like this:
uri = URI.parse(my_possible_url)
if uri.kind_of?(URI::HTTP) or uri.kind_of?(URI::HTTPS)
# do your stuff
end
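URI.parse raises on malformed input, and URI::HTTPS is a subclass of URI::HTTP, so the check above can be folded into one predicate. A minimal sketch, assuming a hypothetical method name http_url? and that a missing host should also count as invalid:

```ruby
require 'uri'

# Hypothetical predicate: true only for parseable http/https URLs
# that carry a non-empty host.
def http_url?(string)
  uri = URI.parse(string)
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  false
end

puts http_url?('http://hello.it')        # true
puts http_url?('http:||bra.ziz,')        # false
puts http_url?('urn:isbn:0451450523')    # false
```

A URN parses successfully but comes back as URI::Generic, which is why the class check matters in addition to the rescue.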
I prefer the Addressable gem. I have found that it handles URLs more intelligently.
require 'addressable/uri'
SCHEMES = %w(http https)
def valid_url?(url)
  parsed = Addressable::URI.parse(url) or return false
  SCHEMES.include?(parsed.scheme)
rescue Addressable::URI::InvalidURIError
  false
end
This is a fairly old entry, but I thought I'd go ahead and contribute:
String.class_eval do
  def is_valid_url?
    uri = URI.parse self
    uri.kind_of? URI::HTTP
  rescue URI::InvalidURIError
    false
  end
end
Now you can do something like:
if "http://www.omg.wtf".is_valid_url?
p "huzzah!"
end
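The original question also asks whether a valid URL points to an image file, which the answers here do not cover. A rough sketch, assuming the file extension of the path is a good enough signal; the method name image_url? and the extension list are illustrative, and checking the Content-Type of the actual response would be more reliable:

```ruby
require 'uri'

# Illustrative list; extend it as needed.
IMAGE_EXTENSIONS = %w[.jpg .jpeg .png .gif .webp .svg].freeze

# Hypothetical predicate: true for http/https URLs whose path ends
# in a known image extension. Query strings are ignored.
def image_url?(string)
  uri = URI.parse(string)
  return false unless uri.is_a?(URI::HTTP)
  IMAGE_EXTENSIONS.include?(File.extname(uri.path.to_s).downcase)
rescue URI::InvalidURIError
  false
end

puts image_url?('http://pusher.newvision.it:8080/resources/img1.jpg')
puts image_url?('http://hello.it')
```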
For me, I use this regular expression:
/\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix
Option:
i - case insensitive
x - ignore whitespace in regex
You can wrap it in a method to validate URLs:
def valid_url?(url)
  return false if url.include?("<script")
  url_regexp = /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix
  url =~ url_regexp ? true : false
end
To use it:
valid_url?("http://stackoverflow.com/questions/1805761/check-if-url-is-valid-ruby")
Testing with wrong URLs:
http://ruby3arabi - result is invalid
http://http://ruby3arabi.com - result is invalid
http:// - result is invalid
http://test.com\n<script src=\"nasty.js\"> - result is invalid (caught by the simple "<script" check)
127.0.0.1 - result is invalid (IP addresses are not supported)
Test with correct URLs:
http://ruby3arabi.com - result is valid
http://www.ruby3arabi.com - result is valid
https://www.ruby3arabi.com - result is valid
https://www.ruby3arabi.com/article/1 - result is valid
https://www.ruby3arabi.com/websites/58e212ff6d275e4bf9000000?locale=en - result is valid
In general,
/\A#{URI::regexp}\z/
will work well, but if you only want to match http or https, you can pass those in as options to the method:
/\A#{URI::regexp(%w(http https))}\z/
That tends to work a little better if you want to reject protocols like ftp://. Use \A and \z rather than ^ and $, which in Ruby match at line boundaries and would let multi-line input through.
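Ruby's ^ and $ anchor to line boundaries rather than to the start and end of the string, which matters when validating untrusted input. A small sketch of the difference, using simplified stand-in patterns (not URI::regexp itself):

```ruby
# Simplified stand-in patterns, for illustration only.
LINE_ANCHORED   = /^https?:\/\/\S+$/
STRING_ANCHORED = /\Ahttps?:\/\/\S+\z/

# A URL on the first line followed by unwanted content on the second.
input = "http://hello.it\n<script>alert(1)</script>"

puts(input =~ LINE_ANCHORED ? 'accepted' : 'rejected')   # accepted
puts(input =~ STRING_ANCHORED ? 'accepted' : 'rejected') # rejected
```

The line-anchored pattern is satisfied by the first line alone, so the trailing markup slips through; the string-anchored pattern has to match the whole input.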
This is a little bit old but here is how I do it. Use Ruby's URI module to parse the URL. If it can be parsed then it's a valid URL. (But that doesn't mean accessible.)
URI supports many schemes, plus you can add custom schemes yourself:
irb> uri = URI.parse "http://hello.it" rescue nil
=> #<URI::HTTP:0x10755c50 URL:http://hello.it>
irb> uri.instance_values
=> {"fragment"=>nil,
"registry"=>nil,
"scheme"=>"http",
"query"=>nil,
"port"=>80,
"path"=>"",
"host"=>"hello.it",
"password"=>nil,
"user"=>nil,
"opaque"=>nil}
irb> uri = URI.parse "http:||bra.ziz" rescue nil
=> nil
irb> uri = URI.parse "ssh://hello.it:5888" rescue nil
=> #<URI::Generic:0x105fe938 URL:ssh://hello.it:5888>
irb> uri.instance_values
=> {"fragment"=>nil,
"registry"=>nil,
"scheme"=>"ssh",
"query"=>nil,
"port"=>5888,
"path"=>"",
"host"=>"hello.it",
"password"=>nil,
"user"=>nil,
"opaque"=>nil}
See the documentation for more information about the URI module.
You could also use a regex, maybe something like the one at http://www.geekzilla.co.uk/View2D3B0109-C1B2-4B4E-BFFD-E8088CBC85FD.htm. Assuming this regex is correct (I haven't fully checked it), the following will show the validity of the URL:
url_regex = %r{((https?|ftp|file):((//)|(\\))+[\w\d:\#%/;$()~_?+\-=\\.&]*)}
urls = [
"http://hello.it",
"http:||bra.ziz"
]
urls.each { |url|
if url =~ url_regex then
puts "%s is valid" % url
else
puts "%s not valid" % url
end
}
The above example outputs:
http://hello.it is valid
http:||bra.ziz not valid
