SSL_write: bad write retry. Exception. Readin e-mails with IMAP IDLE. In Ruby - ruby

I woul like to get unseen mails "as soon as possible", using a Ruby (2.1) script to implement IMAP IDLE ("push notify") feature.
With the help of some guys (see also: Support for IMAP IDLE in ruby), I wrote the script here:
https://gist.github.com/solyaris/b993283667f15effa579
def idle_loop(imap, search_condition, folder)
# https://stackoverflow.com/questions/4611716/how-imap-idle-works
loop do
begin
imap.select folder
imap.idle do |resp|
#trap_shutdown
# You'll get all the things from the server.
#For new emails you're only interested in EXISTS ones
if resp.kind_of?(Net::IMAP::UntaggedResponse) and resp.name == "EXISTS"
# Got something. Send DONE. This breaks you out of the blocking call
imap.idle_done
end
end
# We're out, which means there are some emails ready for us.
# Go do a search for UNSEEN and fetch them.
retrieve_emails(imap, search_condition, folder) { |mail| process_email mail}
#rescue Net::IMAP::Error => imap_err
# Socket probably timed out
# puts "IMAP IDLE socket probably timed out.".red
rescue SignalException => e
# https://stackoverflow.com/questions/2089421/capturing-ctrl-c-in-ruby
puts "Signal received at #{time_now}: #{e.class} #{e.message}".red
shutdown imap
rescue Exception => e
puts "Something went wrong at #{time_now}: #{e.class} #{e.message}".red
imap.noop
end
end
end
Now, all run smootly at first glance, BUT I have the exception
Something went wrong: SSL_write: bad write retry
at this line in code:
https://gist.github.com/solyaris/b993283667f15effa579#file-idle-rb-L189
The error happen when I leave the script running for more than... say more than 30 minutes.
BTW, the server is imap.gmail.com (arghh...), and I presume is something related to IMAP IDLE reconnection socket (I din't read yet the ruby UMAP library code) but I do not understand the reason of the exception;
Any idea for the reason if the exception ? Just trap the exception to fix the issue ?
thanks
giorgio
UPDATE
I modified a bit the exception handling (see gist code: https://gist.github.com/solyaris/b993283667f15effa579)
Now I got a Net::IMAP::Error connection closed I just restart the IMAP connection and it seems working...
Sorry for confusing, anyway in general any comments on code I wrote, IDLE protocol correct management, are welcome.

The IMAP IDLE RFC says to stop IDLE after at most 29 minutes and reissue a new IDLE command. IMAP servers are permitted to assume that the client is dead and has gone away after 31 minutes of inactivity.
You may also find that some NAT middleboxes silently sabotage your connection long before the half-hour is up, I've seen timeouts as short as about two minutes. (Every time I see something like that I scream "vivat ipv6!") I don't think there's any good solution for those middleboxes, except maybe to infect them with a vile trojan, but the bad solutions include adjusting your idle timeout if you get the SSL exception before a half-hour is up.

Related

What's the correct way to check if a host-alive and handle timeouts efficiently?

I'm trying to check if a given host is up, running, and listening to a specific port, and to handle any errors correctly.
I found a a number of references of Ruby socket programming but none of them seems to able to handle "socket time-out" efficiently. I tried IO.select, which takes four parameters, of which, the last one is the timeout value:
IO.select([TCPSocket.new('example.com', 22)], [nil], [nil], 4)
The problem is, it gets stuck, especially if the port number is wrong or the server is not listening on to it. So, finally I ended up with this, which I didn't like that much but doing the job:
require 'socket'
require 'timeout'
dns = "example.com"
begin
Timeout::timeout(3) { TCPSocket.new(dns, 22) }
puts "Responded!!"
# do some stuff here...
rescue SocketError
puts "No connection!!"
# do some more stuff here...
rescue Timeout::Error
puts "No connection, timed out!!"
# do some other stuff here...
end
Is there a better way doing this?
The best test for availability of any resource is to try to use it. Adding extra code to try to predict ahead of time whether the use will work is bound to fail:
You test the wrong thing and get a different answer.
You test the right thing but at the wrong time, and the answer changes between the test and the use, and your application performs double the work for nothing, and you write redundant code.
The code you have to write to handle the test failure is identical to the code you should write to handle the use-failure. Why write that twice?
We make extensive use of Net::SSH in one of our systems, and ran into timeout issues.
Probably the biggest fix was to implement use of the select method, to set a low-level timeout, and not try to use the Timeout class, which is thread based.
"How do I set the socket timeout in Ruby?" and "Set socket timeout in Ruby via SO_RCVTIMEO socket option" have code to investigate for that. Also, one of those links to "Socket Timeouts in Ruby" which has useful code, however be aware that it was written for Ruby 1.8.6.
The version of Ruby can make a difference too. Pre-1.9 the threading wasn't capable of stopping a blocking IP session so the code would hang until the socket timed out, then the Timeout would fire. Both the above questions go over that.

APNs error handling in ruby

I want to send notifications to apple devices in batches (1.000 device tokens in batch for example). Ant it seems that I can't know for sure that message was delivered to APNs.
Here is the code sample:
ssl_connection(bundle_id) do |ssl, socket|
device_tokens.each do |device_token|
ssl.write(apn_message_for device_token)
# I can check if there is an error response from APNs
response_has_an_error = IO.select([socket],nil,nil,0) != nil
# ...
end
end
The main problem is if network is down after the ssl_connection is established
ssl.write(...)
will never raise an error. Is there any way to ckeck that connection still works?
The second problem is in delay between ssl.write and ready error answer from APNs. I can pass timeout parameter to IO.select after last messege was sent. Maybe It's OK to wait for a few seconds for 1.000 batch, but wat if I have to send 1.000 messages for differend bundle_ids?
At https://zeropush.com, we use a gem named grocer to handle our communication with Apple and we had a similar problem. The solution we found was to use the socket's read_non_block method before each write to check for incoming data on the socket which would indicate an error.
It makes the logic a bit funny because read_non_block throws IO::WaitReadable if there is no data to read. So we call read_non_block and catch IO::WaitReadable before continuing as normal. In our case, catching the exception is the happy case. You may be able to use a similar approach rather than using IO.select(...).
One issue to be aware of is that Apple may not respond immediately and any notifications sent between a failing notification and reading from the socket will be lost.
You can see the code we are using in production at https://github.com/SymmetricInfinity/grocer/blob/master/lib/grocer/connection.rb#L30.

reconnect tcpsocket (or how to detect closed socket)

I have a ruby tcpsocket client that is connected to a server.
How can I check to see if the socket is connected before I send the data ?
Do I try to "rescue" a disconnected tcpsocket, reconnect and then resend ? if so, does anyone have a simple code sample as I don't know where to begin :(
I was quite proud that I managed to get a persistent connected client tcpsocket in rails. Then the server decided to kill the client and it all fell apart ;)
edit
I've used this code to get round some of the problems - it will try to reconnect if not connected, but won't handle the case if the server is down (it will keep retrying). Is this the start of the right approach ? Thanks
def self.write(data)
begin
##my_connection.write(data)
rescue Exception => e
##my_connection = TCPSocket.new 'localhost', 8192
retry
end
end
What I usually do in these types of scenarios is keep track of consecutive retries in a variable and have some other variable that sets the retry roof. Once we hit the roof, throw some type of exception that indicates there is a network or server problem. You'll want to reset the retry count variable on success of course.

Ruby TCPSocket doesn't notice it when server is killed

I've this ruby code that connects to a TCP server (namely, netcat). It loops 20 times, and sends "ABCD ". If I kill netcat, it takes TWO iterations of the loop for an exception to be triggered. On the first loop after netcat is killed, no exception is triggered, and "send" reports that 5 bytes have been correctly written... Which in the end is not true, since of course the server never received them.
Is there a way to work around this issue ? Right now I'm losing data : since I think it's been correctly transfered, I'm not replaying it.
#!/usr/bin/env ruby
require 'rubygems'
require 'socket'
sock = TCPSocket.new('192.168.0.10', 5443)
sock.sync = true
20.times do
sleep 2
begin
count = sock.write("ABCD ")
puts "Wrote #{count} bytes"
rescue Exception => myException
puts "Exception rescued : #{myException}"
end
end
When you're sending data your blocking call will return when the data is written to the TCP output buffer. It would only block if the buffer was full, waiting for the server to acknowledge receipt of previous data that was sent.
Once this data is in the buffer, the network drivers try to send the data. If the connection is lost, on the second attempt to write, your application discovers the broken state of the connection.
Also, how does the connection close? Is the server actively closing the connection? In which case client socket would be notified at its next socket call. Or has it crashed? Or perhaps there's a network fault which means you can no longer communicate.
Discovering a broken connection only occurs when you try to send or receive data over the socket. This is different from having the connection actively closed. You simply can't determine if the connection is still alive without doing something with it.
So try doing sock.recv(0) after the write - if the socket has failed this would raise "Errno::ECONNRESET: Connection reset by peer - recvfrom(2)". You could also try sock.sendmsg "", 0 (not sock.write, or sock.send), and this would report a "Errno::EPIPE: Broken pipe - sendmsg(2)".
Even if you got your hands on the TCP packets and get acknowledgement that the data had been received at the other end, there's still no guarantee that the server will have processed this data - it might in its input buffer but not yet processed.
All of this might help identify a broken connection earlier, but it still won't guarantee that the data was received and processed by the server. The only sure way to know that the application has processed your message is with an application level response.
I tried without the sleep function (just to make sure it wasn't putting on hold anything) and still no luck:
#!/usr/bin/env ruby
require 'rubygems'
require 'socket'
require 'activesupport' # Fixnum.seconds
sock = TCPSocket.new('127.0.0.1', 5443)
sock.sync = true
will_restart_at = Time.now + 2.seconds
should_continue = true
while should_continue
if will_restart_at <= Time.now
will_restart_at = Time.now + 2.seconds
begin
count = sock.write("ABCD ")
puts "Wrote #{count} bytes"
rescue Exception => myException
puts "Exception rescued : #{myException}"
should_continue = false
end
end
end
I analyzed with Wireshark and the two solutions are exactly behaving identically.
I think (and can't be sure) that until you actually call your_socket.write (which will not fail as the socket is still opened because you weren't probing for its possible destruction), the socket won't raise any error.
I tried to simulate this with nginx and manual TCP sockets. And look at that:
irb> sock = TCPSocket.new('127.0.0.1', 80)
=> #<TCPSocket:0xb743b824>
irb> sock.write("salut")
=> 5
irb> sock.read
=> "<html>\r\n<head><title>400 Bad Request</title></head>\r\n<body>\r\n</body>\r\n</html>\r\n"
# Here, I kill nginx
irb> sock.write("salut")
=> 5
irb> sock.read
=> ""
irb> sock.write("salut")
Errno::EPIPE: Broken pipe
So what's the conclusion from here? Unless you're actually expecting some data from the server, you're screwed to detect that you've lost the connection :)
To detect a gracefully close, you'll have to read from the socket - read returning 0 indicates the socket has closed.
If you do need know if data got sent successfully though, there's no way other than implementing ACKs of the data at the application level.

'Who's online?' Ruby Network Program

I have several embedded linux systems that I want to write a 'Who's Online?' network service in Ruby. Below is related part of my code:
mySocket = UDPSocket.new
mySocket.bind("<broadcast>", 50050)
loop do
begin
text, sender = mySocket.recvfrom(1024)
puts text
if text =~ /KNOCK KNOCK/ then
begin
sock = UDPSocket.open
sock.send(r.ipaddress, 0, sender[3], 50051)
sock.close
rescue
retry
end
end
rescue Exception => inLoopEx
puts inLoopEx.message
puts inLoopEx.backtrace.inspect
retry
end
end
I send the 'KNOCK KNOCK' command from a PC. Now, the problem is since they all receive the message at the same time, they try to respond at the same time too, which causes a Broken Pipe exception (which is the reason of my 'rescue retry' code). This code works OK sometimes but; other times the rescue retry part of the code (which is waked by Broken Pipe exception from sock.send) causes one or more systems to respond after 5 seconds or so.
Is there a better way of doing this since I assume I cant escape the Broken Pipe exception?
I have found that exception was caused by the 'r.ipaddress' part in the send command, which is related to my embedded system's internals...

Resources