URI Response Code - ruby

I would like to use Ruby's OpenURI to check whether the URL can be properly accessed. So I would like to check its response code (4xx or 5xx means error, etc.) Is it possible to find that?

You can use the status method to return an array that contains the status code and message.
require "open-uri"
# Use URI.open on Ruby 2.5+; Kernel#open stopped accepting URLs in Ruby 3.0.
URI.open("http://www.example.org") do |f|
  puts f.base_uri #=> http://www.example.org
  p f.status      #=> ["200", "OK"]
end
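One caveat, as far as I can tell: open-uri raises OpenURI::HTTPError for 4xx and 5xx responses instead of returning them, so in that case the status has to be read from the exception's io. A minimal sketch of that idea follows; response_status and error_status? are hypothetical helper names, and the URL in the usage comment is only a placeholder.

```ruby
require "open-uri"

# Hypothetical helper: returns the status pair, e.g. ["200", "OK"] or
# ["404", "Not Found"], whether or not the request succeeded.
def response_status(url)
  URI.open(url) { |f| f.status }
rescue OpenURI::HTTPError => e
  # For 4xx/5xx responses open-uri raises; e.io carries the response metadata.
  e.io.status
end

# Hypothetical helper: true for 4xx and 5xx status codes.
def error_status?(code)
  code.to_s.start_with?("4", "5")
end

# Usage (needs network access):
#   code, message = response_status("http://www.example.org")
#   puts "#{code} #{message}"
#   puts "error" if error_status?(code)
```

Connection-level failures (DNS errors, timeouts) raise other exceptions and are deliberately not rescued here.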

Related

ruby sinatra how to redirect with regex

I am trying to move stuff at root to /en/ directory to make my little service multi-lingual.
So, I want to redirect this url
mysite.com/?year=2018
to
mysite.com/en/?year=2018
My code is like
get %r{^/(\?year\=\d{4})$} do |c|
redirect "/en/#{c}"
end
but it seems like I never get #{c} part from the url.
Why is that? or are there just better ways to do this?
Thanks!
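Sinatra matches routes against the path only, and the query string is not part of the path, so the regex above never matches. You can use request.path to get the path you're looking for.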
For example,
get "/something" do
puts request.path # => "/something"
redirect "/en#{request.path}"
end
However, if you are using query parameters (e.g. ?year=2000), you'll have to pass them along to the redirect target yourself.
Somewhat non-intuitively, there is a helper for this in ActiveSupport (the library ActiveRecord is built on): Hash#to_param.
require 'active_support/core_ext/object/to_query'
get "/something" do
  puts params.to_param
  # if params[:year] is 2000, you'll get "year=2000"
  redirect "/en#{request.path}?#{params.to_param}"
end
You could alternatively write your own helper method pretty easily (note that this version does not URL-escape keys or values):
def hash_to_param_string(hash)
  # Join "key=value" pairs with "&" so there is no trailing separator.
  hash.map { |key, val| "#{key}=#{val}" }.join("&")
end
puts hash_to_param_string({key1: "val1", key2: "val2"})
# => "key1=val1&key2=val2"
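If pulling in a Rails library for this feels heavy, Ruby's standard library can build the query string too. Here is a minimal sketch using URI.encode_www_form, which also URL-escapes keys and values. localized_path is a hypothetical helper name, and the /en prefix is taken from the question.

```ruby
require "uri"

# Hypothetical helper: prefixes a path with /en and re-attaches the
# query parameters, escaping them along the way.
def localized_path(path, params)
  query = URI.encode_www_form(params)
  query.empty? ? "/en#{path}" : "/en#{path}?#{query}"
end

puts localized_path("/", { "year" => "2018" })
puts localized_path("/archive", {})
```

Inside a Sinatra route this would be called as redirect localized_path(request.path, params). The raw string is also available as request.query_string, which avoids rebuilding it at all.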

Parse html GET via open() with nokogiri - redirect exception

I'm trying to learn Ruby, so I'm following an exercise from Google's developer courses. I'm trying to parse some links. When a link redirects (I know it can only redirect once), I get a "redirection forbidden" error. I noticed that the redirect goes from an http link to an https link. Any concrete idea how I could implement this in Ruby, given that Google's exercise is written for Python?
error:
ruby fix.rb
redirection forbidden: http://code.google.com/edu/languages/google-python-class/images/puzzle/p-bija-baei.jpg -> https://developers.google.com/edu/python/images/puzzle/p-bija-baei.jpg?csw=1
code that should achieve what I'm looking for:
def acquireData(urls, imgs) #List item urls list of valid urls !checked, imgs list of the imgs I'll download afterwards.
begin
urls.each do |url|
page = Nokogiri::HTML(open(url))
puts page.body
end
rescue Exception => e
puts e
end
end
Ruby's OpenURI will automatically handle redirects for you, as long as they're not "meta-refresh" that occur inside the HTML itself.
For instance, this follows a redirect automatically:
irb(main):008:0> page = open('http://www.example.org')
#<StringIO:0x00000002ae2de0>
irb(main):009:0> page.base_uri.to_s
"http://www.iana.org/domains/example"
In other words, the request to "www.example.org" got redirected to "www.iana.org" and OpenURI tracked it correctly.
If you are trying to learn HOW to handle redirects, read the Net::HTTP documentation. Here is the example from that documentation:
Following Redirection
Each Net::HTTPResponse object belongs to a class for its response code.
For example, all 2XX responses are instances of a Net::HTTPSuccess subclass, a 3XX response is an instance of a Net::HTTPRedirection subclass and a 200 response is an instance of the Net::HTTPOK class. For details of response classes, see the section “HTTP Response Classes” below.
Using a case statement you can handle various types of responses properly:
require 'net/http'
def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'too many HTTP redirects' if limit == 0
  response = Net::HTTP.get_response(URI(uri_str))
  case response
  when Net::HTTPSuccess then
    response
  when Net::HTTPRedirection then
    location = response['location']
    warn "redirected to #{location}"
    fetch(location, limit - 1)
  else
    # raises an exception for any non-2xx response
    response.value
  end
end
print fetch('http://www.ruby-lang.org')
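One thing the documentation example glosses over is that the Location header may be a relative reference, which URI() alone cannot turn into a fetchable URL. Here is a hedged sketch of resolving it against the current URL with URI.join. next_uri is a hypothetical helper name, and the example.com URLs are placeholders.

```ruby
require "uri"

# Hypothetical helper: resolves a Location header value against the URL
# that produced it, so relative redirects work as well as absolute ones.
def next_uri(current_url, location)
  URI.join(current_url, location)
end

# Absolute Location (an http -> https redirect):
puts next_uri("http://example.com/a", "https://example.org/b?x=1")
# Relative Location:
puts next_uri("http://example.com/old/page", "/new/page")
```

In the fetch method above, the corresponding change would be fetch(next_uri(uri_str, location).to_s, limit - 1). Because Net::HTTP.get_response picks TLS from the URI scheme, that manual loop follows http to https redirects as well.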
If you want to handle meta-refresh statements, consider this:
require 'nokogiri'
doc = Nokogiri::HTML(%[<meta http-equiv="refresh" content="5;URL='http://example.com/'">])
meta_refresh = doc.at('meta[http-equiv="refresh"]')
if meta_refresh
puts meta_refresh['content'][/URL=(.+)/, 1].gsub(/['"]/, '')
end
Which outputs:
http://example.com/
Basically, the code.google.com URL that you're trying to open redirects to an https URL. You can see that for yourself if you paste http://code.google.com/edu/languages/google-python-class/images/puzzle/p-bija-baei.jpg into your browser.
There is a bug report that explains why open-uri refuses to redirect from http to https.
So the simplest solution to your problem is to use a different set of URLs (ones that don't redirect to https).

Getting the contents of a 404 error page response ruby

I know some languages have a library that allows you to get the HTTP content for a 404 or 500 message.
Is there a library that allows that for Ruby?
I've tried open-uri, but it simply raises an HTTPError exception without giving me the HTML content of the 404 response.
This doesn't seem to be stated clearly enough in the docs, but OpenURI::HTTPError has an io attribute, which, as far as I know, you can treat as a read-only file.
require 'open-uri'
begin
  # Use URI.open on Ruby 2.5+; Kernel#open stopped accepting URLs in Ruby 3.0.
  response = URI.open('http://google.com/blahblah')
rescue OpenURI::HTTPError => e
  puts e                 # Error message, e.g. "404 Not Found"
  puts e.io.status.first # HTTP status code, e.g. "404"
  puts e.io.read         # HTTP response body
end
Net::HTTP supports what you need.
You can use the request_get method and it will return a response regardless of the status code.
From script/console:
> http = Net::HTTP.new('localhost', 3000)
=> #<Net::HTTP localhost:3000 open=false>
> resp = http.request_get('/foo') # a page that doesn't exist
=> #<Net::HTTPNotFound 404 Not Found readbody=true>
> resp.code
=> "404"
> resp.body
=> "<html>...</html>"
(If the library is not loaded by default, add require 'net/http'.)
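The same idea outside the console, as a minimal sketch: Net::HTTP.get_response returns the response object for any status, and its class tells you which family the status belongs to. response_category and fetch_with_body are hypothetical helper names, and the URL in the usage comment is the one from the question.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: maps any Net::HTTPResponse to a coarse category
# by checking which response class it belongs to.
def response_category(response)
  case response
  when Net::HTTPSuccess     then :success
  when Net::HTTPRedirection then :redirect
  when Net::HTTPClientError then :client_error
  when Net::HTTPServerError then :server_error
  else :other
  end
end

# Hypothetical helper: fetches a URL and returns [code, category, body].
# Net::HTTP does not raise on 4xx/5xx, so the error page body is available.
def fetch_with_body(url)
  response = Net::HTTP.get_response(URI(url))
  [response.code, response_category(response), response.body]
end

# Usage (needs network access):
#   code, category, body = fetch_with_body("http://google.com/blahblah")
```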
Works with HTTParty as well https://github.com/jnunemaker/httparty
require 'rubygems'
require 'httparty'
HTTParty.get("http://google.com/blahblah").parsed_response
There are a number of HTTP clients available; choose one you like from https://www.ruby-toolbox.com/categories/http_clients

Ruby parsing HTTPresponse with Nokogiri

Hi, I am having trouble parsing HTTPresponse objects with Nokogiri.
I use this function to fetch a website here:
# fetch a link
def fetch(uri_str, limit = 10)
# You should choose better exception.
raise ArgumentError, 'HTTP redirect too deep' if limit == 0
url = URI.parse(URI.encode(uri_str.strip))
puts url
#get path
req = Net::HTTP::Get.new(url.path,headers)
#start TCP/IP
response = Net::HTTP.start(url.host,url.port) { |http|
http.request(req)
}
case response
when Net::HTTPSuccess
then #print final redirect to a file
puts "this is location" + uri_str
puts "this is the host #{url.host}"
puts "this is the path #{url.path}"
return response
# if you get a 302 response
when Net::HTTPRedirection
then
puts "this is redirect" + response['location']
return fetch(response['location'],aFile, limit - 1)
else
response.error!
end
end
html = fetch("http://www.somewebsite.com/hahaha/")
puts html
noko = Nokogiri::HTML(html)
When I do this, html prints a whole bunch of gibberish and
Nokogiri complains that "node_set must be a Nokogiri::XML::NodeSet".
If anyone could offer help it would be quite appreciated
First thing. Your fetch method returns a Net::HTTPResponse object and not just the body. You should provide the body to Nokogiri.
response = fetch("http://www.somewebsite.com/hahaha/")
puts response.body
noko = Nokogiri::HTML(response.body)
I've updated your script so it's runnable (below). A couple of things were undefined: headers, and the aFile argument in the recursive call.
require 'nokogiri'
require 'net/http'
def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0
  # URI.encode was removed in Ruby 3.0, so the string is parsed directly.
  url = URI.parse(uri_str.strip)
  puts url
  headers = {}
  # request_uri keeps the query string and is never empty
  req = Net::HTTP::Get.new(url.request_uri, headers)
  # open the connection, using TLS for https URLs
  response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') { |http|
    http.request(req)
  }
  case response
  when Net::HTTPSuccess
    puts "this is location " + uri_str
    puts "this is the host #{url.host}"
    puts "this is the path #{url.path}"
    return response
  when Net::HTTPRedirection
    # 3xx response: follow the Location header
    puts "this is redirect " + response['location']
    return fetch(response['location'], limit - 1)
  else
    response.error!
  end
end
response = fetch("http://www.google.com/")
puts response
noko = Nokogiri::HTML(response.body)
puts noko
The script gives no error and prints the content. You may be getting the Nokogiri error because of the content you're receiving. One common problem I've encountered with Nokogiri is character encoding. Without the exact error it's impossible to tell what's going on.
I'd recommend looking at the following Stack Overflow questions:
ruby 1.9: invalid byte sequence in UTF-8 (specifically this answer)
How to convert a Net::HTTP response to a certain encoding in Ruby 1.9.1?
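On the encoding point: Net::HTTP typically hands back the body as a binary string, so it can help to label it with the charset declared in the Content-Type header and transcode it to UTF-8 before parsing. Here is a minimal sketch under that assumption. to_utf8 is a hypothetical helper name, and the ISO-8859-1 fallback is an arbitrary choice for illustration.

```ruby
# Hypothetical helper: labels a raw body with its declared charset,
# transcodes it to UTF-8, and scrubs any bytes that are still invalid.
def to_utf8(body, charset = nil)
  source = charset || "ISO-8859-1" # assumed fallback, adjust as needed
  body.dup.force_encoding(source)
      .encode("UTF-8", invalid: :replace, undef: :replace)
      .scrub
end

raw = "caf\xE9".b                # "café" as ISO-8859-1 bytes
puts to_utf8(raw, "ISO-8859-1")  # prints "café"
```

With a Net::HTTPResponse, the declared charset (if any) is available as response.type_params['charset'], so the call would look like to_utf8(response.body, response.type_params['charset']).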

Ruby Net::HTTP - following 301 redirects

My users submit urls (to mixes on mixcloud.com) and my app uses them to perform web requests.
A good url returns a 200 status code:
uri = URI.parse("http://www.mixcloud.com/ErolAlkan/hard-summer-mix/")
request = Net::HTTP.get_response(uri)
#<Net::HTTPOK 200 OK readbody=true>
But if you forget the trailing slash then our otherwise good url returns a 301:
uri = "http://www.mixcloud.com/ErolAlkan/hard-summer-mix"
#<Net::HTTPMovedPermanently 301 MOVED PERMANENTLY readbody=true>
The same thing happens with 404's:
# bad path returns a 404
"http://www.mixcloud.com/bad/path/"
# bad path minus trailing slash returns a 301
"http://www.mixcloud.com/bad/path"
How can I 'drill down' into the 301 to see if it takes us on to a valid resource or an error page?
Is there a tool that provides a comprehensive overview of the rules that a particular domain might apply to their urls?
301 redirects are fairly common if you do not type the URL exactly as the web server expects it. They happen much more frequently than you'd think; you just don't normally notice them while browsing because the browser follows them automatically for you.
Two alternatives come to mind:
1: Use open-uri
open-uri handles redirects automatically. So all you'd need to do is:
require 'open-uri'
...
response = URI.open('http://xyz...').read # URI.open on Ruby 2.5+; plain open no longer accepts URLs on Ruby 3.0+
If you have trouble redirecting between HTTP and HTTPS, then have a look here for a solution:
Ruby open-uri redirect forbidden
2: Handle redirects with Net::HTTP
def get_response_with_redirect(uri)
r = Net::HTTP.get_response(uri)
if r.code == "301"
r = Net::HTTP.get_response(URI.parse(r['location']))
end
r
end
If you want to be even smarter, you could try adding or removing the trailing slash on the URL when you get a 404 response. You could do that by creating a method like get_response_smart which handles this URL fiddling in addition to the redirects.
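The URL fiddling mentioned above could start from something like this. It is only a sketch: toggle_trailing_slash is a hypothetical helper, and the get_response_smart method itself is left out because it would need network access.

```ruby
require "uri"

# Hypothetical helper: returns the same URL with the trailing slash on its
# path added if missing, or removed if present. Query strings are preserved.
def toggle_trailing_slash(url)
  uri = URI.parse(url)
  path = uri.path.empty? ? "/" : uri.path
  uri.path = if path == "/"
               "/"                  # the root path has nothing to toggle
             elsif path.end_with?("/")
               path.chomp("/")
             else
               "#{path}/"
             end
  uri.to_s
end

puts toggle_trailing_slash("http://www.mixcloud.com/ErolAlkan/hard-summer-mix")
puts toggle_trailing_slash("http://www.mixcloud.com/bad/path/")
```

A get_response_smart method could then retry once with the toggled URL when the first attempt comes back as a 404.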
I can't figure out how to comment on the accepted answer (this question might be closed), but I should note that r.header is now obsolete, so r.header['location'] should be replaced by r['location'] (per https://stackoverflow.com/a/6934503/1084675 )
rest-client follows redirections for GET and HEAD requests without any additional configuration. It works very nicely.
for result codes between 200 and 207, a RestClient::Response will be returned
for result codes 301, 302 or 307, the redirection will be followed if the request is a GET or a HEAD
for result code 303, the redirection will be followed and the request transformed into a GET
example of usage:
require 'rest-client'
RestClient.get 'http://example.com/resource'
The rest-client README also gives an example of following redirects with POST requests:
begin
RestClient.post('http://example.com/redirect', 'body')
rescue RestClient::MovedPermanently,
RestClient::Found,
RestClient::TemporaryRedirect => err
err.response.follow_redirection
end
Here is the code I came up with (derived from different examples) which will bail out if there are too many redirects (note that ensure_success is optional):
require "net/http"
require "uri"
class Net::HTTPResponse
  # Print the status and headers, then abort, unless the response is a 2xx.
  def ensure_success
    unless kind_of? Net::HTTPSuccess
      warn "Request failed with HTTP #{code}"
      each_header do |h, v|
        warn "#{h} => #{v}"
      end
      abort
    end
  end
end
def do_request(uri_string)
  response = nil
  tries = 0
  loop do
    uri = URI.parse(uri_string)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = (uri.scheme == "https") # needed to follow https redirects
    request = Net::HTTP::Get.new(uri.request_uri)
    response = http.request(request)
    uri_string = response['location'] if response['location']
    unless response.kind_of? Net::HTTPRedirection
      response.ensure_success
      break
    end
    if tries == 10
      puts "Giving up after 10 redirects"
      break
    end
    tries += 1
  end
  response
end
Not sure if anyone is looking for this exact solution, but if you are trying to download an image over http/https and store it in a variable, this works (it relies on the open_uri_redirections gem):
require 'open_uri_redirections'
require 'net/https'
web_contents = open('file_url_goes_here', :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE, :allow_redirections => :all) {|f| f.read }
puts web_contents
