Checking to see if a website exists - ruby

OK, I am checking to see if my server is running. It works as long as the port is correct, but if I change the port to one I know is not accepted, it completely skips my if routine. The example below works fine, but change the port number to, say, 99 and it completely skips the if. I would think it should fall into the else section.
url = URI.parse("http://www.google.com/")
url.port = 80
req = Net::HTTP.new(url.host, url.port)
res = req.request_head(url.path)

if res.code == "200"
  # do something
else
  # do something else
end

You should provide a timeout and rescue SocketError and Timeout::Error:
require "net/http"
def check_server(server, port)
begin
http = Net::HTTP.start(server, port, {open_timeout: 5, read_timeout: 5})
begin
response = http.head("/")
if response.code == "200"
# everything fine
else
# unexpected status code
end
rescue Timeout::Error
# timeout reading from server
end
rescue Timeout::Error
# timeout connecting to server
rescue SocketError
# unknown server
end
end
If you just want to check if your server is up, this can be simplified:
require "net/http"
def up?(server, port)
http = Net::HTTP.start(server, port, {open_timeout: 5, read_timeout: 5})
response = http.head("/")
response.code == "200"
rescue Timeout::Error, SocketError
false
end
It returns true if / returns a 200 status code and false otherwise, i.e. for other status codes, timeouts and typical error conditions.
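For example, a minimal usage sketch (the host below is only illustrative; also note that connecting to a port that is closed rather than filtered raises Errno::ECONNREFUSED, which the method above does not rescue):

require "net/http"

# Illustrative host/port; substitute your own server.
puts up?("www.google.com", 80) ? "server is up" : "server is down or unreachable"

# A closed port typically raises Errno::ECONNREFUSED rather than
# Timeout::Error or SocketError, so to treat that case as "down" as well:
# rescue Timeout::Error, SocketError, Errno::ECONNREFUSED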

Related

How to bypass network errors while using Ruby Mechanize web crawling

I am using the Ruby Mechanize web crawler to pull data from popular real estate websites. I'm using the home address as a keyword to scrape the public data on Zillow, Redfin, etc.
I'm basically trying to bypass any HTTP and network errors. The following rescue logic doesn't seem to do the job.
def scrape_single(key_word)
  # setup agent
  agent = Mechanize.new { |agent|
    agent.user_agent_alias = 'Mac Safari'
  }
  agent.ignore_bad_chunking = true
  agent.verify_mode = OpenSSL::SSL::VERIFY_NONE
  agent.request_headers = { "Accept-Encoding" => "" }
  agent.follow_meta_refresh = true
  agent.keep_alive = false

  # page setup
  begin
    agent.get(@@search_engine) do |page|
      @@search_result = page.form('f') do |search|
        search.q = key_word
      end.submit
    end
  rescue Timeout::Error
    puts "Timeout"
    retry
  rescue Net::HTTPGatewayTimeOut => e
    if e.response_code == '504' || '502'
      e.skip
      sleep 5
    end
  rescue Net::HTTPBadGateway => e
    if e.response_code == '504' || '502'
      e.skip
      sleep 5
    end
  rescue Net::HTTPNotFound => e
    if e.response_code == '404'
      e.skip
      sleep 5
    end
  rescue Net::HTTPFatalError => e
    if e.response_code == '503'
      e.skip
    end
  rescue Mechanize::ResponseCodeError => e
    if e.response_code == '404'
      e.skip
      sleep 5
    elsif e.response_code == '502'
      e.skip
      sleep 5
    else
      retry
    end
  rescue Errno::ETIMEDOUT
    retry
  end
  return @@search_result # returns Mechanize::Page
end
The following is an example of the error message I get for a keyword with an address in MA.
/home/ec2-user/.gem/ruby/2.1/gems/mechanize-2.7.5/lib/mechanize/http/agent.rb:323:in `fetch': 404 => Net::HTTPNotFound for https://www.redfin.com/MA/WASHINGTON/306-WERDEN-RD-Unknown/home/134059623 -- unhandled response (Mechanize::ResponseCodeError)
The actual message you see when you input the above URL is:
Cannot GET /MA/WASHINGTON/306-WERDEN-RD-Unknown/home/134059623
My goal is to simply ignore and skip sporadic errors and move onto next keyword. I couldn't really find a working solution online and any feedback would be greatly appreciated.
If I understand correctly, the error raised is Mechanize::ResponseCodeError, and it clearly carries a 404 response_code. But in your script you don't handle the 404 response_code from Mechanize::ResponseCodeError:
all_response_code = ['403', '404', '502']

rescue Mechanize::ResponseCodeError => e
  if all_response_code.include? e.response_code
    e.skip
    sleep 5
  else
    retry
  end
Maybe if you add a condition for the 404 response_code, it will do the trick.
EDIT
I changed the code a little bit in order to have fewer lines.
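For illustration, a sketch of how the whole fetch could be wrapped so that sporadic errors simply skip the keyword. Here fetch_result_page is a hypothetical stand-in for the agent.get / form-submit block in the question, and the rescued classes are assumptions to adjust for whatever your crawl actually raises:

# Sketch only: skip the keyword on "hard" HTTP errors, retry a few times on
# transient network errors, and return nil so the caller can move on.
def scrape_single_safely(key_word, max_attempts = 3)
  attempts = 0
  begin
    attempts += 1
    fetch_result_page(key_word)   # hypothetical helper wrapping agent.get + form submit
  rescue Mechanize::ResponseCodeError => e
    return nil if ['403', '404', '502', '503', '504'].include?(e.response_code)
    retry if attempts < max_attempts
    nil
  rescue Timeout::Error, Errno::ETIMEDOUT, Errno::ECONNRESET, SocketError
    sleep 5
    retry if attempts < max_attempts
    nil                           # give up and move on to the next keyword
  end
end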

Net::HTTP check actual page

OK, so I got some help yesterday checking an actual host to see if it's available, and I then wrote this.
I pass it my server www.myhost.com and port 81, and it works perfectly. But what if I want to actually check a page, e.g. www.myhost.com/anypage.php? I'm not sure, but I think the problem lies with the alternate port.
def server_up(server, port)
  http = Net::HTTP.start(server, port, { open_timeout: 5, read_timeout: 5 })
  response = http.head("/")
  response.code == "200"
rescue Timeout::Error, SocketError
  false
end
As tadman mentioned in the comments, you could modify your method to accept an optional path argument (below). You may want to rename the method, though, since it will no longer simply check if the server is up, but rather, also if the page exists.
def server_up(server, port, path="")
  http = Net::HTTP.start(server, port, { open_timeout: 5, read_timeout: 5 })
  response = http.head("/#{path}")
  response.code == "200"
rescue Timeout::Error, SocketError
  false
end
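For example (hypothetical host, port and page, purely to illustrate the call):

server_up("www.myhost.com", 81)                 # true if HEAD / returns 200
server_up("www.myhost.com", 81, "anypage.php")  # true only if /anypage.php returns 200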

Implementing Re-connect Strategy using Ruby Net

I'm developing a small application which posts XML to a web service.
This is done using Net::HTTP::Post. However, the service provider recommends implementing a re-connect strategy.
Something like:
1st request fails -> try again after 2 seconds
2nd request fails -> try again after 5 seconds
3rd request fails -> try again after 10 seconds
...
What would be a good approach to do that? Simply running the following piece of code in a loop, catching the exception and running it again after some amount of time? Or is there some other, cleverer way to do it? Maybe the Net package even has some built-in functionality that I'm not aware of?
url = URI.parse("http://some.host")
request = Net::HTTP::Post.new(url.path)
request.body = xml
request.content_type = "text/xml"

# run this line in a loop??
response = Net::HTTP.start(url.host, url.port) { |http| http.request(request) }
Thanks very much, always appreciate your support.
Matt
This is one of the rare occasions when Ruby's retry comes in handy. Something along these lines:
retries = [3, 5, 10]
begin
  response = Net::HTTP.start(url.host, url.port) { |http| http.request(request) }
rescue SomeException # I'm too lazy to look it up
  if delay = retries.shift # will be nil if the list is empty
    sleep delay
    retry # backs up to just after the "begin"
  else
    raise # with no args re-raises original error
  end
end
I use the retryable gem for retries.
With it, the code is transformed from:
retries = [3, 5, 10]
begin
  response = Net::HTTP.start(url.host, url.port) { |http| http.request(request) }
rescue SomeException # I'm too lazy to look it up
  if delay = retries.shift # will be nil if the list is empty
    sleep delay
    retry # backs up to just after the "begin"
  else
    raise # with no args re-raises original error
  end
end
To:
retryable(:tries => 10, :on => [SomeException]) do
  response = Net::HTTP.start(url.host, url.port) { |http| http.request(request) }
end
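For completeness, a sketch combining the question's 2/5/10-second schedule with the retry pattern above. The specific exception classes rescued here are an assumption standing in for the answer's SomeException placeholder, so adjust them to whatever your connection actually raises:

require "net/http"
require "uri"

# Sketch: retry the POST after 2, then 5, then 10 seconds, then give up.
# The rescued classes are an assumption; adjust them to the failures you actually see.
def post_with_reconnect(url, xml)
  delays = [2, 5, 10]
  begin
    request = Net::HTTP::Post.new(url.path)
    request.body = xml
    request.content_type = "text/xml"
    Net::HTTP.start(url.host, url.port) { |http| http.request(request) }
  rescue Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET, SocketError, EOFError
    if delay = delays.shift   # nil once all three delays are used up
      sleep delay
      retry
    else
      raise                   # re-raise the original error after the last attempt
    end
  end
end

# usage, with url and xml as in the question:
#   response = post_with_reconnect(url, xml)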

How do I set the socket timeout in Ruby?

How do you set the timeout for blocking operations on a Ruby socket?
The solution I found which appears to work is to use Timeout::timeout:
require 'timeout'
...
begin
  Timeout.timeout(5) do
    message, client_address = some_socket.recvfrom(1024)
  end
rescue Timeout::Error
  puts "Timed out!"
end
The timeout object is a good solution, but IO.select can be used to get the same effect. It is an example of asynchronous I/O (non-blocking in nature, occurring asynchronously to the flow of the application):
IO.select(read_array [, write_array [, error_array [, timeout]]]) => array or nil
require 'socket'

strmSock1 = TCPSocket.new("www.dn.se", 80)
strmSock2 = TCPSocket.new("www.svd.se", 80)

# Block until one or more events are received
# result = select([strmSock1, strmSock2, STDIN], nil, nil)
timeout = 5
result = select([strmSock1, strmSock2], nil, nil, timeout)
puts result.inspect
if result
  for inp in result[0]
    if inp == strmSock1
      # data avail on strmSock1
      puts "data avail on strmSock1"
    elsif inp == strmSock2
      # data avail on strmSock2
      puts "data avail on strmSock2"
    elsif inp == STDIN
      # data avail on STDIN
      puts "data avail on STDIN"
    end
  end
end
I think the non-blocking approach is the way to go.
I tried the approach mentioned above and could still get it to hang.
This article on non-blocking networking and jonke's approach above got me on the right path. My server was blocking on the initial connect, so I needed to go a little lower level.
The socket rdoc gives more details on connect_nonblock.
def self.open(host, port, timeout=10)
  # Resolve the host and create a plain Socket so we can use connect_nonblock
  addr = Socket.getaddrinfo(host, nil)
  sock = Socket.new(Socket.const_get(addr[0][0]), Socket::SOCK_STREAM, 0)

  begin
    sock.connect_nonblock(Socket.pack_sockaddr_in(port, addr[0][3]))
  rescue Errno::EINPROGRESS
    # Connection is in progress; wait up to `timeout` seconds on the socket
    resp = IO.select([sock], nil, nil, timeout.to_i)
    if resp.nil?
      raise Errno::ECONNREFUSED
    end

    begin
      sock.connect_nonblock(Socket.pack_sockaddr_in(port, addr[0][3]))
    rescue Errno::EISCONN
      # already connected; nothing more to do
    end
  end
  sock
end
To get a good test, start up a simple socket server and then Ctrl-Z it into the background.
Note that the IO.select is expecting data to come in on the input stream within 10 seconds, so this may not work if that is not the case.
It should be a good replacement for TCPSocket's open method.
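A quick usage sketch, assuming the open method above has been placed on a wrapper class (called TimeoutSocket here purely for illustration):

require 'socket'

# Hypothetical usage: gives up after ~5 seconds instead of hanging indefinitely
# when the remote host never completes the TCP handshake.
sock = TimeoutSocket.open("www.example.com", 80, 5)
sock.write("HEAD / HTTP/1.0\r\nHost: www.example.com\r\n\r\n")
puts sock.read
sock.close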

Recovering from a broken TCP socket in Ruby when in gets()

I'm reading lines of input on a TCP socket, similar to this:
class Bla
  def getcmd
    @sock.gets unless @sock.closed?
  end

  def start
    srv = TCPServer.new(5000)
    @sock = srv.accept
    while !@sock.closed?
      ans = getcmd
    end
  end
end
If the endpoint terminates the connection while gets() is running, then gets() hangs.
How can I work around this? Is it necessary to do non-blocking or timed I/O?
You can use select to see whether you can safely call gets on the socket; see the following implementation of a TCPServer using this technique.
require 'socket'

host, port = 'localhost', 7000

TCPServer.open(host, port) do |server|
  while client = server.accept
    readfds = true
    got = nil
    begin
      readfds, writefds, exceptfds = select([client], nil, nil, 0.1)
      p :r => readfds, :w => writefds, :e => exceptfds
      if readfds
        got = client.gets
        p got
      end
    end while got
  end
end
And here is a client that tries to break the server:
require 'socket'

host, port = 'localhost', 7000

TCPSocket.open(host, port) do |socket|
  socket.puts "Hey there"
  socket.write 'he'
  socket.flush
  socket.close
end
IO#closed? returns true when both the reader and the writer are closed.
In your case, @sock.gets returns nil, and then you call getcmd again, so this runs in a never-ending loop. You can either use select, or close the socket when gets returns nil.
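A minimal sketch of the second option, adapted from the class in the question: close the socket as soon as gets returns nil so the loop terminates.

require 'socket'

class Bla
  def getcmd
    @sock.gets unless @sock.closed?
  end

  def start
    srv = TCPServer.new(5000)
    @sock = srv.accept
    while !@sock.closed?
      ans = getcmd
      @sock.close if ans.nil?   # peer hung up: gets returned nil, so stop looping
    end
  end
end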
I recommend using readpartial to read from your socket and also catching peer resets:
while true
  sockets_ready = select(@sockets, nil, nil, nil)
  if sockets_ready != nil
    sockets_ready[0].each do |socket|
      begin
        if socket == @server_socket
          # puts "Connection accepted!"
          @sockets << @server_socket.accept
        else
          # Received something on a client socket
          if socket.eof?
            # puts "Disconnect!"
            socket.close
            @sockets.delete(socket)
          else
            data = ""
            recv_length = 256
            # socket.ready? comes from the io/wait standard library
            while (tmp = socket.readpartial(recv_length))
              data += tmp
              break if !socket.ready?
            end
            listen socket, data
          end
        end
      rescue Exception => exception
        case exception
        when Errno::ECONNRESET, Errno::ECONNABORTED, Errno::ETIMEDOUT
          # puts "Socket: #{exception.class}"
          @sockets.delete(socket)
        else
          raise exception
        end
      end
    end
  end
end
This code borrows heavily from some nice IBM code by M. Tim Jones. Note that @server_socket is initialized by:
@server_socket = TCPServer.open(port)
@sockets is just an array of sockets.
I simply pgrep "ruby" to find the pid, and kill -9 the pid and restart.
If you believe the rdoc for Ruby sockets, they don't implement gets. This leads me to believe gets is being provided by a higher level of abstraction (maybe the IO libraries?) and probably isn't aware of socket-specific things like 'connection closed'.
Try using recvfrom instead of gets.
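As a sketch of that idea (an assumption-level rewrite of the question's getcmd, not the original poster's code): recvfrom blocks much like gets, but on a TCP socket it returns an empty string once the peer has closed the connection, which the loop can treat as a disconnect.

def getcmd
  # recvfrom returns [data, sender_info]; for a TCP socket, data == "" at EOF.
  data, _sender = @sock.recvfrom(1024)
  data.empty? ? nil : data
end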
