Net::HTTP check actual page - Ruby

Ok, so I got some help yesterday checking an actual host to see if it's available, and then wrote this.
I pass it my server www.myhost.com and port 81, and it works perfectly. But what if I want to check an actual page, such as www.myhost.com/anypage.php? I'm not sure, but I think the problem lies with the alternate port.
def server_up(server, port)
  http = Net::HTTP.start(server, port, {open_timeout: 5, read_timeout: 5})
  response = http.head("/")
  response.code == "200"
rescue Timeout::Error, SocketError
  false
end

As tadman mentioned in the comments, you could modify your method to accept an optional path argument (below). You may want to rename the method, though, since it will no longer simply check if the server is up, but rather, also if the page exists.
def server_up(server, port, path="")
  http = Net::HTTP.start(server, port, {open_timeout: 5, read_timeout: 5})
  response = http.head("/#{path}")
  response.code == "200"
rescue Timeout::Error, SocketError
  false
end
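For what it's worth, here is a minimal sketch of the same idea, not the original poster's code. The name page_ok? is made up for illustration, and rescuing SystemCallError is an assumption on my part: on a non-standard port, a closed port typically raises Errno::ECONNREFUSED, which is neither a Timeout::Error nor a SocketError. The block form of Net::HTTP.start also closes the connection when the block returns.

```ruby
require "net/http"

# Hypothetical helper (name not from the original post): true only when a
# HEAD request for the given path answers with status 200.
def page_ok?(server, port, path = "/")
  # Accept both "anypage.php" and "/anypage.php".
  path = "/#{path}" unless path.start_with?("/")
  Net::HTTP.start(server, port, open_timeout: 5, read_timeout: 5) do |http|
    http.head(path).code == "200"
  end
rescue Timeout::Error, SocketError, SystemCallError
  # Timeouts, unknown hosts, and refused/unreachable connections all count as "not ok".
  false
end
```

Usage would look like page_ok?("www.myhost.com", 81, "anypage.php").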

Related

Fastest way to check if a url exists

Currently I am writing a program that needs to check tons of possible urls, searching for any that actually exist. To be precise, I mean exist as in you can visit the url and there's actual content of some sort, not string parsing to see if it's in url format.
The program generates a list of possible variants for a filename and then checks each one until it gets a url that actually exists, so most of the url remains the same. Examples would be:
https://www.test.com/folder1/FILE.png
https://www.test.com/folder1/File.png
https://www.test.com/folder1/file.png
https://www.test.com/folder1/file1.png
That said, my code currently works fine. However, it ends up taking about 2-4 seconds per url check and I don't know of a way to speed it up. Is there any faster or better way to validate urls, or am I just out of luck?
This is my function to validate urls:
require "net/http"
def url_exist? url_path
  url = URI.parse(url_path)
  req = Net::HTTP.new(url.host, url.port)
  req.use_ssl = true
  res = req.request_head(url.path)
  if res.code == "200" || res.code == "403"
    return true
  end
end
Thank you for taking the time to read this and any help will be much appreciated.
Your code creates a new connection for each URL. It should be faster to send multiple requests over the same connection via HTTP keep-alive.
In Ruby, you can open such a connection via Net::HTTP.start, e.g.:
require 'net/http'
class URLChecker
  def initialize(base_url)
    uri = URI(base_url)
    # One connection is opened here and reused for every request made inside the block.
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.is_a?(URI::HTTPS)) do |http|
      @http = http
      yield self
    end
  end
  def exist?(path)
    res = @http.head(path)
    res.code == '200' || res.code == '403'
  end
end
URLChecker.new('https://stackoverflow.com') do |uc|
  p uc.exist?('/questions/tagged/ruby') #=> true
  p uc.exist?('/questions/tagged/python') #=> true
  p uc.exist?('/questions/tagged/foobar') #=> false
end
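For the use case in the question (try filename variants until one exists), a rough sketch could look like the following. The helper name first_existing is hypothetical, and treating 200 and 403 as "exists" simply mirrors the code in the question. It stops at the first hit, so the remaining variants are never requested.

```ruby
require "net/http"

# Hypothetical helper: returns the first path that answers 200 or 403,
# or nil if none does. All requests share one keep-alive connection.
def first_existing(base_url, paths)
  uri = URI(base_url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    paths.find { |path| %w[200 403].include?(http.head(path).code) }
  end
end
```

A call might be first_existing("https://www.test.com", ["/folder1/FILE.png", "/folder1/File.png", "/folder1/file.png"]).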

How do I get a redirected status code using Net::HTTP?

Similar to "getting the status code of a HTTP redirected page", but with Net::HTTP instead of curb. I am making a GET request to a page that will redirect:
response = Net::HTTP.get_response(URI.parse("http://www.wikipedia.org/wiki/URL_redirection"))
puts response.code
puts response['location']
=> 301
en.wikipedia.org/wiki/URL_redirection
The problem is that I want to know the status code of the redirected page. In this case it is 200, but in my app I want to check if it is 200 or something else.
The solution I've seen is to just call get_response(response['location']), but that won't work in my application because the way the redirect is designed makes it so that the redirect can only be followed once. Since the first GET consumes that one redirect, I can't then follow it again.
Is there some way to get the last status code that is a result of a GET?
EDIT: Further clarification of the situation:
The application that I'm sending GET to has a single sign-on authentication mechanism where, if I want to access 'myapp/mypage', I have to first send a post:
postResponse = Net::HTTP.post_form(URI.parse("http://myapp.com/trusted"), {"username" => @username})
Then make the GET request to:
"http://myapp.com/trusted/#{postResponse.body}/mypage"
*The postResponse.body is a 'ticket' which can be redeemed once.
That GET verifies that the ticket is valid and then redirects to:
myapp.com/mypage
So whether that ticket is valid or not, I get a 301.
I want to check the status code of the final get to myapp.com/mypage.
If I manually try to follow the redirect, whether it's a HEAD request or a GET, the original redirect will have already consumed the ticket, so I will get an error that the ticket is expired even if the original redirect was a 200.
The Net::HTTP documentation has example code showing how to deal with redirects. Have you tried it? It should make it easy to get inside the redirect mechanism and grab statuses for later.
Here's their example:
Following Redirection
Each Net::HTTPResponse object belongs to a class for its response code.
For example, all 2XX responses are instances of a Net::HTTPSuccess subclass, a 3XX response is an instance of a Net::HTTPRedirection subclass and a 200 response is an instance of the Net::HTTPOK class. For details of response classes, see the section “HTTP Response Classes” below.
Using a case statement you can handle various types of responses properly:
def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'too many HTTP redirects' if limit == 0
  response = Net::HTTP.get_response(URI(uri_str))
  case response
  when Net::HTTPSuccess then
    response
  when Net::HTTPRedirection then
    location = response['location']
    warn "redirected to #{location}"
    fetch(location, limit - 1)
  else
    response.value
  end
end
print fetch('http://www.ruby-lang.org')
A minor change like this should help:
require 'net/http'
RESPONSES = []
def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'too many HTTP redirects' if limit == 0
  response = Net::HTTP.get_response(URI(uri_str))
  RESPONSES << response
  case response
  when Net::HTTPSuccess then
    response
  when Net::HTTPRedirection then
    location = response['location']
    warn "redirected to #{location}"
    fetch(location, limit - 1)
  else
    response.value
  end
end
print fetch('http://jigsaw.w3.org/HTTP/300/302.html')
puts RESPONSES.join("\n") # =>
I see this when I run it:
redirected to http://jigsaw.w3.org/HTTP/300/Overview.html
#<Net::HTTPOK:0x007f9e82a1e050>#<Net::HTTPFound:0x007f9e82a2daa0>
#<Net::HTTPOK:0x007f9e82a1e050>
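If you'd rather not keep a global array around, one possible variation is to thread the list of status codes through the recursion and return it with the final response. This is only a sketch: fetch_with_history is a made-up name, and resolving the Location header with URI.join is my own addition so that relative redirects don't break. Each redirect is followed exactly once, so a one-time ticket should not be redeemed twice.

```ruby
require "net/http"

# Hypothetical helper: follows redirects and returns [final_response, codes],
# where codes lists every status seen along the way, in order.
def fetch_with_history(uri_str, limit = 10, codes = [])
  raise ArgumentError, "too many HTTP redirects" if limit == 0

  response = Net::HTTP.get_response(URI(uri_str))
  codes << response.code

  if response.is_a?(Net::HTTPRedirection)
    # Location may be relative, so resolve it against the current URL.
    location = URI.join(uri_str, response["location"]).to_s
    fetch_with_history(location, limit - 1, codes)
  else
    [response, codes]
  end
end
```

The last element of codes is then the status of the page you finally landed on.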
If it's enough just to make an HTTP HEAD request without 'consuming' your URL (this would be the usual expectation for a HEAD request), you can do it like this:
2.0.0-p195 :143 > result = Net::HTTP.start('www.google.com') { |http| http.head '/' }
=> #<Net::HTTPFound 302 Found readbody=true>
So in your example you'd do this:
...
result = Net::HTTP.start(response.uri.host) { |http| http.head response.uri.path }
If you want to preserve a history of response codes, you could try this. This retains the last 5 response codes from calls to get_response and exposes them through a Net::HTTP.history method.
module Net
  class << HTTP
    alias_method :_get_response, :get_response
    def get_response(*args, &block)
      resp = _get_response(*args, &block)
      # Keep only the five most recent status codes.
      @history = (@history || []).push(resp.code).last(5)
      resp
    end
    def history
      @history || []
    end
  end
end
(I don't entirely get the usage scenario, so adapt to your needs)

Ruby Checking to see if website exists

Ok, I am checking to see if my server is running. It works as long as the port is correct. But if I change the port to one I know is not accepted, it completely skips my if routine. The example below works fine, but change the port number to, say, 99 and it completely skips the if. I would think it should fall into the else section.
url = URI.parse("http://www.google.com/")
url.port = 80
req = Net::HTTP.new(url.host, url.port)
res = req.request_head(url.path)
if res.code == "200"
  #do something
else
  #do something else
end
You should provide a timeout and rescue SocketError and Timeout::Error:
require "net/http"
def check_server(server, port)
  begin
    http = Net::HTTP.start(server, port, {open_timeout: 5, read_timeout: 5})
    begin
      response = http.head("/")
      if response.code == "200"
        # everything fine
      else
        # unexpected status code
      end
    rescue Timeout::Error
      # timeout reading from server
    end
  rescue Timeout::Error
    # timeout connecting to server
  rescue SocketError
    # unknown server
  end
end
If you just want to check if your server is up, this can be simplified:
require "net/http"
def up?(server, port)
  http = Net::HTTP.start(server, port, {open_timeout: 5, read_timeout: 5})
  response = http.head("/")
  response.code == "200"
rescue Timeout::Error, SocketError
  false
end
It returns true if / returns a 200 status code and false otherwise, i.e. for other status codes, timeouts and typical error conditions.
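As for why the if/else in the question appears to be skipped: when the connection fails, the request raises an exception before the if is ever reached, so neither branch runs. One case the rescue list above does not name is a refused connection, which raises Errno::ECONNREFUSED (a SystemCallError). Below is a hedged sketch that tells the cases apart. The method name probe and the returned symbols are invented for illustration.

```ruby
require "net/http"

# Hypothetical helper: reports why a check succeeded or failed.
def probe(server, port, path = "/")
  Net::HTTP.start(server, port, open_timeout: 5, read_timeout: 5) do |http|
    http.head(path).code == "200" ? :ok : :unexpected_status
  end
rescue Timeout::Error
  :timeout            # connect or read took longer than 5 seconds
rescue SocketError
  :unknown_host       # host name could not be resolved
rescue SystemCallError
  :connection_failed  # e.g. Errno::ECONNREFUSED on a closed port
end
```

With this, the "else" behaviour from the question can be handled with a case on the returned symbol.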

Ruby Net::HTTP - following 301 redirects

My users submit urls (to mixes on mixcloud.com) and my app uses them to perform web requests.
A good url returns a 200 status code:
uri = URI.parse("http://www.mixcloud.com/ErolAlkan/hard-summer-mix/")
request = Net::HTTP.get_response(uri)
#<Net::HTTPOK 200 OK readbody=true>
But if you forget the trailing slash then our otherwise good url returns a 301:
uri = "http://www.mixcloud.com/ErolAlkan/hard-summer-mix"
#<Net::HTTPMovedPermanently 301 MOVED PERMANENTLY readbody=true>
The same thing happens with 404's:
# bad path returns a 404
"http://www.mixcloud.com/bad/path/"
# bad path minus trailing slash returns a 301
"http://www.mixcloud.com/bad/path"
How can I 'drill down' into the 301 to see if it takes us on to a valid resource or an error page?
Is there a tool that provides a comprehensive overview of the rules that a particular domain might apply to their urls?
301 redirects are fairly common if you do not type the URL exactly as the web server expects it. They happen much more frequently than you'd think; you just don't normally notice them while browsing because the browser follows them automatically for you.
Two alternatives come to mind:
1: Use open-uri
open-uri handles redirects automatically. So all you'd need to do is:
require 'open-uri'
...
response = URI.open('http://xyz...').read
If you have trouble redirecting between HTTP and HTTPS, then have a look here for a solution:
Ruby open-uri redirect forbidden
2: Handle redirects with Net::HTTP
def get_response_with_redirect(uri)
  r = Net::HTTP.get_response(uri)
  if r.code == "301"
    r = Net::HTTP.get_response(URI.parse(r['location']))
  end
  r
end
If you want to be even smarter, you could try adding or removing a trailing slash on the URL when you get a 404 response. You could do that by creating a method like get_response_smart which handles this URL fiddling in addition to the redirects.
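The "URL fiddling" part could be sketched roughly as below. The helper name toggle_trailing_slash is made up, and it only covers the slash handling, not the retry logic of a full get_response_smart.

```ruby
require "uri"

# Hypothetical helper: adds a trailing slash to the path if it is missing,
# or removes it if present. The root path is left unchanged.
def toggle_trailing_slash(url)
  uri = URI.parse(url)
  return url if uri.path.empty? || uri.path == "/"

  uri.path = uri.path.end_with?("/") ? uri.path.chomp("/") : "#{uri.path}/"
  uri.to_s
end
```

On a 404 you could then retry once with toggle_trailing_slash(url) before giving up.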
I can't figure out how to comment on the accepted answer (this question might be closed), but I should note that r.header is now obsolete, so r.header['location'] should be replaced by r['location'] (per https://stackoverflow.com/a/6934503/1084675 )
rest-client follows redirects for GET and HEAD requests without any additional configuration. It works very nicely.
For result codes between 200 and 207, a RestClient::Response will be returned.
For result codes 301, 302 or 307, the redirection will be followed if the request is a GET or a HEAD.
For result code 303, the redirection will be followed and the request transformed into a GET.
Example of usage:
require 'rest-client'
RestClient.get 'http://example.com/resource'
The rest-client README also gives an example of following redirects with POST requests:
begin
  RestClient.post('http://example.com/redirect', 'body')
rescue RestClient::MovedPermanently,
       RestClient::Found,
       RestClient::TemporaryRedirect => err
  err.response.follow_redirection
end
Here is the code I came up with (derived from different examples) which will bail out if there are too many redirects (note that ensure_success is optional):
require "net/http"
require "uri"
class Net::HTTPResponse
  def ensure_success
    unless kind_of? Net::HTTPSuccess
      warn "Request failed with HTTP #{@code}"
      each_header do |h,v|
        warn "#{h} => #{v}"
      end
      abort
    end
  end
end
def do_request(uri_string)
  response = nil
  tries = 0
  loop do
    uri = URI.parse(uri_string)
    http = Net::HTTP.new(uri.host, uri.port)
    request = Net::HTTP::Get.new(uri.request_uri)
    response = http.request(request)
    # Follow the Location header on the next pass, if there is one.
    uri_string = response['location'] if response['location']
    unless response.kind_of? Net::HTTPRedirection
      response.ensure_success
      break
    end
    if tries == 10
      puts "Timing out after 10 tries"
      break
    end
    tries += 1
  end
  response
end
Not sure if anyone is looking for this exact solution, but if you are trying to download an image over http/https and store it in a variable, you can try:
require 'open_uri_redirections'
require 'net/https'
web_contents = open('file_url_goes_here', :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE, :allow_redirections => :all) {|f| f.read }
puts web_contents

Implementing Re-connect Strategy using Ruby Net

I'm developing a small application which posts XML to some webservice.
This is done using Net::HTTP::Post. However, the service provider recommends using a re-connect strategy.
Something like:
1st request fails -> try again after 2 seconds
2nd request fails -> try again after 5 seconds
3rd request fails -> try again after 10 seconds
...
What would be a good approach to do that? Simply running the following piece of code in a loop, catching the exception and run it again after an amount of time? Or is there any other clever way to do that? Maybe the Net package even has some built in functionality that I'm not aware of?
url = URI.parse("http://some.host")
request = Net::HTTP::Post.new(url.path)
request.body = xml
request.content_type = "text/xml"
#run this line in a loop??
response = Net::HTTP.start(url.host, url.port) {|http| http.request(request)}
Thanks very much, always appreciate your support.
Matt
This is one of the rare occasions when Ruby's retry comes in handy. Something along these lines:
retries = [3, 5, 10]
begin
  response = Net::HTTP.start(url.host, url.port) {|http| http.request(request)}
rescue SomeException # I'm too lazy to look it up
  if delay = retries.shift # will be nil if the list is empty
    sleep delay
    retry # backs up to just after the "begin"
  else
    raise # with no args re-raises original error
  end
end
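To make the placeholder exception concrete, here is one possible sketch. The list of "transient" errors is an assumption about what you would want to retry on, not something Net::HTTP prescribes, and with_retries and its sleeper parameter are invented names. The sleeper is injectable only so the delays can be checked without actually waiting.

```ruby
require "net/http"

# Assumed set of errors worth retrying; adjust to your service.
TRANSIENT_ERRORS = [Timeout::Error, SocketError, SystemCallError, EOFError].freeze

# Hypothetical helper: runs the block, and on a transient error waits for the
# next delay in the list and tries again. Re-raises once the list is used up.
def with_retries(delays, sleeper: ->(seconds) { sleep seconds })
  remaining = delays.dup
  begin
    yield
  rescue *TRANSIENT_ERRORS
    delay = remaining.shift
    raise if delay.nil? # no delays left: re-raise the original error
    sleeper.call(delay)
    retry
  end
end
```

With the delays from the question, usage would be along the lines of response = with_retries([2, 5, 10]) { Net::HTTP.start(url.host, url.port) {|http| http.request(request)} }.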
I use the retryable gem for retries.
With it, the code goes from:
retries = [3, 5, 10]
begin
  response = Net::HTTP.start(url.host, url.port) {|http| http.request(request)}
rescue SomeException # I'm too lazy to look it up
  if delay = retries.shift # will be nil if the list is empty
    sleep delay
    retry # backs up to just after the "begin"
  else
    raise # with no args re-raises original error
  end
end
To:
retryable( :tries => 10, :on => [SomeException] ) do
  response = Net::HTTP.start(url.host, url.port) {|http| http.request(request)}
end
