What's the correct way to check if a host is alive and handle timeouts efficiently? - ruby

I'm trying to check if a given host is up, running, and listening to a specific port, and to handle any errors correctly.
I found a number of references on Ruby socket programming, but none of them seems to be able to handle socket timeouts efficiently. I tried IO.select, which takes four parameters, the last of which is the timeout value:
IO.select([TCPSocket.new('example.com', 22)], [nil], [nil], 4)
The problem is that it gets stuck, especially if the port number is wrong or the server is not listening on it. So I finally ended up with this, which I don't like much but which does the job:
require 'socket'
require 'timeout'
dns = "example.com"
begin
  Timeout::timeout(3) { TCPSocket.new(dns, 22) }
  puts "Responded!!"
  # do some stuff here...
rescue SocketError
  puts "No connection!!"
  # do some more stuff here...
rescue Timeout::Error
  puts "No connection, timed out!!"
  # do some other stuff here...
end
Is there a better way of doing this?

The best test for availability of any resource is to try to use it. Adding extra code to try to predict ahead of time whether the use will work is bound to fail:
You test the wrong thing and get a different answer.
You test the right thing but at the wrong time, and the answer changes between the test and the use, and your application performs double the work for nothing, and you write redundant code.
The code you have to write to handle the test failure is identical to the code you should write to handle the use-failure. Why write that twice?

We make extensive use of Net::SSH in one of our systems, and ran into timeout issues.
Probably the biggest fix was to implement use of the select method, to set a low-level timeout, and not try to use the Timeout class, which is thread based.
"How do I set the socket timeout in Ruby?" and "Set socket timeout in Ruby via SO_RCVTIMEO socket option" have code to investigate for that. Also, one of those links to "Socket Timeouts in Ruby" which has useful code, however be aware that it was written for Ruby 1.8.6.
The version of Ruby can make a difference too. Pre-1.9 the threading wasn't capable of stopping a blocking IP session so the code would hang until the socket timed out, then the Timeout would fire. Both the above questions go over that.
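As a rough illustration of a connect-level timeout that does not rely on the Timeout class, here is a minimal sketch. It assumes a Ruby version where Socket.tcp accepts the connect_timeout option (2.0 or later, as far as I know). The helper name port_open? and its default of 3 seconds are hypothetical, and the exact exception raised on timeout may differ between Ruby versions, which is why the rescue is fairly broad.

```ruby
require 'socket'

# Hypothetical helper: returns true if a TCP connection to host:port can be
# opened within `timeout` seconds, false otherwise.
def port_open?(host, port, timeout: 3)
  # With a block, Socket.tcp closes the socket afterwards and returns the
  # block's value.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SocketError, SystemCallError, IOError
  # SocketError:     name resolution failed
  # SystemCallError: Errno::ECONNREFUSED, Errno::ETIMEDOUT, Errno::EHOSTUNREACH, ...
  # IOError:         covers timeout errors reported as IO::TimeoutError (assumption
  #                  about newer Rubies)
  false
end
```

Usage would look like `puts "Responded!!" if port_open?("example.com", 22)`. In line with the first answer, the result is only a snapshot: the real connection attempt that follows still needs its own error handling.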

Related

Raise exception when TCP connection broken

I'm building a server which accepts connections through TCP (using TCPServer). I mostly just read data (socket.gets.chomp) and write data (socket.print).
socket.gets will return nil if the connection has been closed by the client in the meantime, so .chomp will raise NoMethodError. This is hard to handle specifically since it's such an unspecific exception - I want to distinguish exceptions caused by the connection loss from other causes of NoMethodError, such as me typoing a method.
Ideally, I would receive something more specific such as SocketError whenever trying to interact with a closed socket, rather than just getting back nil. How could I accomplish that?
I have already considered these options:
Write a wrapper for TCPSocket or IO which checks on socket availability before every call (a lot of work to do cleanly considering how many methods there are in IO)
Check each return value for nil (even more effort and code redundancy as my application grows, also I would still .print to the socket when it's already closed)
Monkey patching NilClass for chomp (again only handles this specific use case, and monkey patching should be avoided for clean code)
Being at end of file is not intrinsically an error, nor is it normally understood to mean a "broken" connection like your title says.
For example, HTTP allows multiple requests to be sent over a single connection. After completely reading a request you can read again, and if the connection is closed you'd get nil, which tells you there are no more requests coming. This situation isn't considered an error condition by most/all HTTP software.
Most Ruby software handles nil return from read as an indication that the network conversation is over (successfully). I suggest you do something like that.
If you wish to consider EOF an error, you could create a wrapper class for IO that would "upgrade" nil return from read into an exception of some kind, but I would suggest rethinking whether this is really what you need.
See also https://ruby-doc.org/core-3.0.0/IO.html#method-i-read.
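If you do decide to treat EOF as an error, one option that avoids a full wrapper class is IO#readline, which raises EOFError at end of file where IO#gets returns nil. A minimal sketch follows; the helper names read_request and serve are hypothetical, and serve assumes `socket` came from TCPServer#accept.

```ruby
# Hypothetical helper: reads one request line. IO#readline raises EOFError
# when the peer has closed the connection, instead of returning nil.
def read_request(io)
  io.readline.chomp
end

# Hypothetical session loop: EOFError is now the single, specific signal
# that the conversation is over.
def serve(socket)
  loop do
    request = read_request(socket)
    socket.print("got: #{request}\n")
  end
rescue EOFError
  # The client closed the connection; clean up and finish normally.
  socket.close
end
```

This keeps a typo such as a misspelled method name surfacing as NoMethodError, while connection loss shows up as EOFError.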

SSL_write: bad write retry exception when reading e-mails with IMAP IDLE in Ruby

I would like to get unseen mails "as soon as possible", using a Ruby (2.1) script that implements the IMAP IDLE ("push notify") feature.
With the help of some people (see also: Support for IMAP IDLE in ruby), I wrote the script here:
https://gist.github.com/solyaris/b993283667f15effa579
def idle_loop(imap, search_condition, folder)
  # https://stackoverflow.com/questions/4611716/how-imap-idle-works
  loop do
    begin
      imap.select folder
      imap.idle do |resp|
        #trap_shutdown
        # You'll get all the things from the server.
        # For new emails you're only interested in EXISTS ones
        if resp.kind_of?(Net::IMAP::UntaggedResponse) and resp.name == "EXISTS"
          # Got something. Send DONE. This breaks you out of the blocking call
          imap.idle_done
        end
      end
      # We're out, which means there are some emails ready for us.
      # Go do a search for UNSEEN and fetch them.
      retrieve_emails(imap, search_condition, folder) { |mail| process_email mail}
    #rescue Net::IMAP::Error => imap_err
      # Socket probably timed out
      # puts "IMAP IDLE socket probably timed out.".red
    rescue SignalException => e
      # https://stackoverflow.com/questions/2089421/capturing-ctrl-c-in-ruby
      puts "Signal received at #{time_now}: #{e.class} #{e.message}".red
      shutdown imap
    rescue Exception => e
      puts "Something went wrong at #{time_now}: #{e.class} #{e.message}".red
      imap.noop
    end
  end
end
Now, everything runs smoothly at first glance, BUT I get the exception
Something went wrong: SSL_write: bad write retry
at this line in code:
https://gist.github.com/solyaris/b993283667f15effa579#file-idle-rb-L189
The error happens when I leave the script running for more than, say, 30 minutes.
BTW, the server is imap.gmail.com (arghh...), and I presume it is something related to the IMAP IDLE socket reconnection (I haven't read the Ruby IMAP library code yet), but I do not understand the reason for the exception.
Any idea about the reason for the exception? Should I just trap the exception to fix the issue?
thanks
giorgio
UPDATE
I modified the exception handling a bit (see gist code: https://gist.github.com/solyaris/b993283667f15effa579)
Now I get a Net::IMAP::Error "connection closed"; I just restart the IMAP connection and it seems to be working...
Sorry for the confusion. In general, any comments on the code I wrote, or on correct management of the IDLE protocol, are welcome.
The IMAP IDLE RFC says to stop IDLE after at most 29 minutes and reissue a new IDLE command. IMAP servers are permitted to assume that the client is dead and has gone away after 31 minutes of inactivity.
You may also find that some NAT middleboxes silently sabotage your connection long before the half-hour is up; I've seen timeouts as short as about two minutes. (Every time I see something like that I scream "vivat ipv6!") I don't think there's any good solution for those middleboxes, except maybe to infect them with a vile trojan, but the bad solutions include shortening your idle timeout if you get the SSL exception before a half-hour is up.
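The 29-minute rule can be sketched roughly as below. This assumes a Net::IMAP version whose idle method accepts a timeout argument and sends DONE automatically when it expires; older versions may not, in which case a timer thread calling idle_done is the usual substitute. The helper name idle_once and the constant are hypothetical, and the response check is duck-typed (instead of testing for Net::IMAP::UntaggedResponse as in the question) so that the snippet stands alone.

```ruby
# Re-issue IDLE before the 30-minute mark. Lower this value if a middlebox
# drops the connection earlier.
IDLE_REFRESH_SECONDS = 29 * 60

# Hypothetical helper: runs one IDLE cycle on `imap` (expected to be a
# Net::IMAP connection with a mailbox already selected).
# Returns true if the server announced new mail, false if the cycle simply
# ended because the refresh interval expired.
def idle_once(imap, refresh_after: IDLE_REFRESH_SECONDS)
  new_mail = false
  imap.idle(refresh_after) do |resp|
    if resp.respond_to?(:name) && resp.name == "EXISTS"
      new_mail = true
      imap.idle_done # leave IDLE early so the caller can fetch the mail
    end
  end
  new_mail
end
```

A caller would loop over idle_once, fetching unseen mail whenever it returns true and going straight back into IDLE when it returns false.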

Elegant way to stop socket read operation from outside

I implemented a small client server application in Ruby and I have the following problem: The server starts a new client session in a new thread for each connecting client, but it should be possible to shutdown the server and stop all the client sessions in a 'polite' way from outside without just killing the thread while I don't know which state it is in.
So I decided that the client session object gets a `stop' flag which can be set from outside and is checked before each action. The problem is that it should not wait for the client, if it is just waiting for a request. I have the following temporary solution:
def read_client
  loop do
    begin
      Timeout.timeout(1) { return @client.gets }
    rescue Timeout::Error
      if @stop
        stop # Notifies the client and closes the connection
        return nil
      end
    end
  end
end
But that sucks, looks terrible and intuitively, this should be such a normal thing that there has to be a `normal' solution to it. I don't even know if it is safe or if it could happen that the gets operation reads part of the client request, but not all of it.
A side question is whether setting/getting a boolean flag is an atomic operation in Ruby (or whether I need an additional Mutex for the flag).
Thread-per-client approach is usually a disaster for server design. Also blocking I/O is difficult to interrupt without OS-specific tricks. Check out non-blocking sockets, see for example, answers to this question.
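One way to wait for input without Timeout is to poll with IO.select, which takes its own timeout. The following is only a sketch: the helper name read_line_or_stop, the callable stop_check and the poll interval are all hypothetical. Note that gets can still block if the client sends part of a line and then stalls, so a fully non-blocking design would need read_nonblock and its own line buffering.

```ruby
# Hypothetical helper: waits for a line from `io`, checking the stop
# condition roughly every `poll` seconds.
# Returns the line (or nil at EOF), or nil once stop_check returns true.
def read_line_or_stop(io, stop_check, poll: 1)
  until stop_check.call
    # IO.select returns nil on timeout, or arrays of ready IO objects.
    ready, = IO.select([io], nil, nil, poll)
    return io.gets if ready
  end
  nil
end
```

In the session object from the question this might be called as `read_line_or_stop(@client, -> { @stop })`.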

What is the best way to simulate no Internet connection within a Cucumber test?

Part of my command-line Ruby program involves checking if there is an internet connection before any commands are processed. The actual check in the program is trivial (using Socket::TCPSocket), but I'm trying to test this behaviour in Cucumber for an integration test.
The code:
def self.has_internet?(force = nil)
  return force unless force.nil?
  begin
    TCPSocket.new('www.yelp.co.uk', 80)
    return true
  rescue SocketError
    return false
  end
end
if has_internet? == false
  puts("Could not connect to the Internet!")
  exit 2
end
The feature:
Scenario: Failing to log in due to no Internet connection
  Given the Internet is down
  When I run `login <email_address> <password>`
  Then the exit status should be 2
  And the output should contain "Could not connect to the Internet!"
I obviously don't want to change the implementation to fit the test, and I require all my scenarios to pass. Clearly if there is actually no connection, the test passes as it is, but my other tests fail as they require a connection.
My question: How can I test for this in a valid way and have all my tests pass?
You can stub your has_internet? method and return false in the implementation of the Given the Internet is down step.
YourClass.stub!(:has_internet?).and_return(false)
There are three alternative solutions I can think of:
have the test temporarily monkeypatch TCPSocket.initialize (or maybe Socket#connect, if that's where it ends up) to pretend the internet is down.
write (suid) a script that adds/removes an iptables firewall rule to disable the internet, and have your test call the script
use LD_PRELOAD on a specially written .so shared library that overrides the connect C call. This is harder.
Myself, I would probably try option 1, give up after about 5 minutes and go with option 2.
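For what it's worth, option 1 can be sketched in plain Ruby as below. The helper name with_internet_down is hypothetical, and the sketch assumes TCPSocket does not already define its own singleton `new`. It only helps when the code under test runs in the same Ruby process as the step definition; a command launched as a separate process will not see the patch.

```ruby
require 'socket'

# Hypothetical test helper: while the block runs, TCPSocket.new raises
# SocketError, which is what has_internet? in the question rescues.
# Subclasses such as TCPServer are affected too, since they inherit `new`.
def with_internet_down
  TCPSocket.define_singleton_method(:new) do |*_args|
    raise SocketError, 'simulated: no Internet connection'
  end
  yield
ensure
  # Remove the override so the inherited Class#new is used again.
  TCPSocket.singleton_class.send(:remove_method, :new)
end
```

The "Given the Internet is down" step could then wrap the in-process call, for example `with_internet_down { YourClass.has_internet? }`.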
maybe a bit late for you :), but have a look at
https://github.com/mmolhoek/vcr-uri-catcher
I made this to test network failures, so this should do the trick for you.

In ruby, how do I attempt a block of code but move on after n seconds?

I have a library method that occasionally hangs on a network connection, and there's no timeout mechanism.
What's the easiest way to add my own? Basically, I'm trying to keep my code from getting indefinitely stuck.
timeout.rb does not always interrupt the block reliably, and I wouldn't recommend using it. Check System Timer or Terminator instead.
The System Timer page in particular describes why timeout.rb can fail, complete with pretty pictures and everything. Bottom line is:
For timeout.rb to work, a freshly created “homicidal” Ruby thread has to be scheduled by the Ruby interpreter.
M.R.I. 1.8, the interpreter used by most Ruby applications in production, implements Ruby threads as green threads.
It is a well-known limitations of the green threads (running on top of a single native thread) that when a green thread performs a blocking system call to the underlying operating systems, none of green threads in the virtual machine will run until the system call returns.
Answered my own question:
http://www.ruby-doc.org/stdlib/libdoc/timeout/rdoc/index.html
require 'timeout'
status = Timeout::timeout(5) {
  # Something that should be interrupted if it takes too much time...
}
To prevent an ugly error on timeout I suggest enclosing it and using a rescue like this:
begin
  status = Timeout::timeout(5) do
    # Some stuff that should be interrupted if it takes too long
  end
rescue Timeout::Error
  puts "Error message: It took too long!\n"
end

Resources