Ruby TCPSocket: Find out how much data is available

Is there a way to find out how many bytes of data are available on a TCPSocket in Ruby? I.e., how many bytes can be read without blocking?

The standard library io/wait might be useful here. Requiring it gives stream-based I/O (sockets and pipes) some new methods, among which is ready?. According to the documentation, ready? returns non-nil if there are bytes available without blocking. It just so happens that, in older versions of MRI, the non-nil value it returns is the number of bytes that are available. More recent Rubies only report readiness from ready?; io/wait's IO#nread reports the byte count.
Here's an example which creates a dumb little socket server, and then connects to it with a client. The server just sends "foo" and then closes the connection. The client waits a little bit to give the server time to send, and then prints how many bytes are available for reading. The interesting stuff for you is in the client:
require 'socket'
require 'io/wait'
# Server
server_socket = TCPServer.new('localhost', 0)
port = server_socket.addr[1]
Thread.new do
  session = server_socket.accept
  sleep 0.5
  session.puts "foo"
  session.close
end
# Client
client_socket = TCPSocket.new('localhost', port)
puts client_socket.ready? # => nil (older MRI)
sleep 1
puts client_socket.ready? # => 4 (older MRI; newer versions just return a truthy value)
Don't use that server code in anything real. It's deliberately short in order to keep the example simple.
Note: According to the Pickaxe book, io/wait is only available if the platform supports the FIONREAD feature in ioctl(2), which Linux does. I don't know about Windows and others.

Related

Read entire message from a TCPSocket without hanging

I'm putting together a TCPServer in Ruby 3.0.2 and I'm finding that I can't seem to read the entire packet without blocking (until the socket is closed).
Edit: There was some confusion on what I was trying to do - my bad - so just to help clarify: I wanted to read everything that had been sent over the TCP connection so far. (end edit)
My first try was:
#!/snap/bin/ruby
require 'socket'
server = TCPServer.new('localhost', 4200)
loop {
  Thread.start(server.accept) do |connection|
    puts connection.gets # The important line
  end
}
But that hangs until the client closes the connection. Okay, so I take a look at connection.methods and the Ruby docs, and try a bunch of options that seem promising. Basically, there are two types of read methods: blocking and non-blocking.
The blocking methods that I tried are .read, .gets, .readlines, .readline, .recv, and .recvmsg. Now .read, .readlines, and .gets all hang (until the socket is closed), so that's not helpful. The other ones (e.g. .readline, the recv methods) don't read the entire message. Now, I could read each line until I see an empty line and parse the HTTP header from there. But there's got to be a better way; I don't want to have to worry about getting a corrupted message and hanging because I didn't read an empty line at the end of the header.
So I went looking at the non-blocking options. Specifically .recv_nonblock and .recvmsg_nonblock. Both of these throw errors (Resource temporarily unavailable - recvfrom(2) would block and Resource temporarily unavailable - recvmsg(2) respectively).
Any ideas on what could be going on? I think it has something to do with me using Ruby 3, because when I try the code on Ruby 2.5, client.gets returns a line (doesn't hang), although .readlines does hang, so I'm not sure what's going on.
Ideally, I could just call something along the lines of client.get_message and I would get the entire message that has been sent, but I'd also be okay with working at the TCP level and getting the packet size, reading that size, and reconstructing the message from there.
TCP just transmits the bytes that you write to the socket, and guarantees that they are received in the order they were sent. If you have the concept of a 'message', then you'll need to add that to your server and client.
.gets specifically will block until it reads a new 'line', or whatever you define as the separator for the string (see the docs for IO#gets). This means that until your server receives that separator byte from the client, it will block.
In your client, have a look at how you're writing your data. If you're using Ruby, then puts would work, as it terminates the string with a newline. If you're using write, then it will only write the string, without a newline.
For example:
# client.rb
require 'socket'
c = TCPSocket.new 'localhost', 5000
c.puts "foo"
c.write "bar"
c.write "baz\n"
# server.rb
require 'socket'
s = TCPServer.new 5000
loop do
  client = s.accept
  puts client.gets
  puts client.gets
end
will output
foo
barbaz
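Since TCP has no notion of a message, one common way to add one is to prefix each message with its length. The following is a minimal sketch of that technique, not part of the answer above; the 4-byte big-endian header and the helper names send_message and read_message are assumptions of this example. It works with any stream IO: a TCPSocket as well as the UNIXSocket pair used here to exercise it.

```ruby
require 'socket'

# Write one message: a 4-byte big-endian length header followed by the bytes.
def send_message(io, payload)
  data = payload.b
  io.write([data.bytesize].pack('N'), data)
end

# Read one whole message, or return nil if the peer closed cleanly between messages.
def read_message(io)
  header = io.read(4)                  # blocks until 4 bytes (or EOF) arrive
  return nil if header.nil?
  raise EOFError, 'truncated header' if header.bytesize < 4
  length = header.unpack1('N')
  body = io.read(length)               # blocks until the whole body has arrived
  raise EOFError, 'truncated message' if body.nil? || body.bytesize < length
  body
end

a, b = UNIXSocket.pair
send_message(a, 'foo')
send_message(a, "bar\nbaz")
p read_message(b)
p read_message(b)
```

The receiver never has to guess where a message ends: a newline inside "bar\nbaz" is just payload, and read_message only returns once the announced number of bytes has arrived.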
Thanks to everyone who commented/answered, but I found the solution that I think was intended by the creators of the Socket class!
The recv_nonblock method takes some optional arguments, one of which is a buffer into which the Socket stores what it has read. So a call like client.recv_nonblock(1000, 0, buffer) stores up to 1000 bytes from the Socket into buffer and returns immediately; if nothing is available, it raises IO::EAGAINWaitReadable instead of blocking.
Just to make life easy, I put together a monkey patch to the TCPSocket class:
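class TCPSocket
  def eat_buffer
    contents = String.new
    buffer = String.new
    begin
      loop {
        # at end of stream recv_nonblock returns "" or nil, depending on the Ruby version
        chunk = recv_nonblock(256, 0, buffer)
        break if chunk.nil? || chunk.empty?
        contents << buffer
      }
    rescue IO::EAGAINWaitReadable
      # nothing more is waiting right now
    end
    contents
  end
end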
The point that Steffen makes in the comments is well taken - TCP isn't designed to be used this way. This is a hacky (in the bad sense) method, and should be avoided.
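A sketch of the same "grab whatever has arrived so far" idea without rescuing exceptions, assuming Ruby 2.3+ where IO#read_nonblock accepts exception: false (it then returns :wait_readable when nothing is waiting and nil at end of file). drain_available is a hypothetical helper name, and Steffen's caveat still applies: the result is whatever bytes happen to be buffered, not a complete message.

```ruby
require 'socket'

# Return every byte that can be read from +io+ right now, without blocking.
def drain_available(io, chunk_size = 4096)
  contents = String.new
  loop do
    chunk = io.read_nonblock(chunk_size, exception: false)
    break if chunk == :wait_readable  # nothing more is waiting right now
    break if chunk.nil?               # the peer closed the connection
    contents << chunk
  end
  contents
end

a, b = UNIXSocket.pair
a.write("GET / HTTP/1.0\r\n")
p drain_available(b)
p drain_available(b)
```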

2 sockets interoperation produce "socket operation on non-socket - ENOTSOCK" error

In a Ruby script I'm having a problem with socket connections.
What I am doing is the following:
I have two threads and each one creates a connection to a different web server
Any time thread 1 receives data from server 1, I want thread 1 to post this data to server 2
Any time thread 2 receives data from server 2, I want thread 2 to post this data to server 1
Basically I am kind of acting as a bridge between the 2 servers.
Code looks like this:
require 'uri'
require 'net/http'
require 'json'
@connection1 = Net::HTTP.start 'server1.com'
@connection2 = Net::HTTP.start 'server2.com'
# reads data from server 1 as it comes and sends it to server 2
Thread.new{
  while JSON.parse(@connection1.post('/receive', '').body) != nil
    @connection2.post '/send', JSON.parse(@connection1.post('/receive', '').body).to_json
  end
}
# reads data from server 2 as it comes and sends it to server 1
while JSON.parse(@connection2.post('/receive', '').body) != nil
  @connection1.post '/send', JSON.parse(@connection2.post('/receive', '').body).to_json
end
# Thread.join
# not actually needed because the two connections are supposed to continuously stream data
However as soon as one of the two connections receives data and tries sending it to the other connection I'm receiving the following error:
Socket operation on non-socket - Errno::ENOTSOCK
A more in-depth stack trace:
C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:176:in 'wait_readable': socket operation on non-socket. (Errno::ENOTSOCK)
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:176:in 'rbuf_fill'
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:154:in 'readuntil'
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/protocol.rb:164:in 'readline'
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http/response.rb:40:in 'read_status_line'
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http/response.rb:29:in 'read_new'
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1446:in 'block in transport_request'
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1443:in 'catch'
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1443:in 'transport_request'
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1416:in 'request'
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1430:in 'send_entity'
	from C:/Dev/Ruby24-x64/lib/ruby/2.4.0/net/http.rb:1218:in 'post'
So what do you think I am doing wrong?
I should add that for reasons beyond my control the two remote servers are configured to serve data when contacted with a POST rather than with a GET.
Core problem
You lack any sort of synchronization between both threads and Net::HTTP is not thread-safe.
What's possibly happening here is that you call @connection1.post('/receive') in one thread, that thread gets paused, and the second thread then tries to use @connection1.post('/send') while connection1 is still in use.
Another problem is that your code is inefficient: you issue two /receive requests per loop iteration to get one piece of information.
while JSON.parse(@connection1.post('/receive', '').body) != nil
  @connection2.post '/send', JSON.parse(@connection1.post('/receive', '').body).to_json
end
This makes three requests per iteration.
It could be written as:
loop do
  result = JSON.parse(@connection1.post('/receive', '').body)
  break if result.nil?
  @connection2.post '/send', result.to_json
end
This makes two requests per iteration.
Suggested Solution
Use a Mutex to make sure that while connection1 is sending/receiving a request, no other thread touches it.
require 'uri'
require 'net/http'
require 'json'
@connection1 = Net::HTTP.start 'server1.com'
@connection2 = Net::HTTP.start 'server2.com'
connection_1_lock = Mutex.new
connection_2_lock = Mutex.new
# reads data from server 1 as it comes and sends it to server 2
thread_1 = Thread.new do
  while true
    receive_result = nil
    connection_1_lock.synchronize do
      receive_result = JSON.parse(@connection1.post('/receive', '').body)
    end
    break if receive_result.nil?
    connection_2_lock.synchronize do
      @connection2.post '/send', receive_result.to_json
    end
  end
end
# reads data from server 2 as it comes and sends it to server 1
thread_2 = Thread.new do
  while true
    receive_result = nil
    connection_2_lock.synchronize do
      receive_result = JSON.parse(@connection2.post('/receive', '').body)
    end
    break if receive_result.nil?
    connection_1_lock.synchronize do
      @connection1.post '/send', receive_result.to_json
    end
  end
end
# keep the main thread alive until both bridges stop
[thread_1, thread_2].each(&:join)
I believe the code above should fix your problem, although I cannot guarantee it. Concurrent programming is hard.
Further reading:
I suggest you read up on concurrent/multithreaded programming and its pitfalls. There are numerous Ruby resources online.
Since Ruby's documentation on Mutex is notoriously bad, I'll shamelessly plug my own article here and suggest you read it:
https://dev.to/enether/working-with-multithreaded-ruby-part-i-cj3 (The 'How To Protect Yourself' paragraph introduces mutexes)
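The mutex pattern above can also be packaged so that no caller can forget to take the lock. This is a minimal sketch, not from the answer: the class name Serialized and its call method are hypothetical. It wraps any object (such as a Net::HTTP connection) so that only one thread at a time can invoke its methods.

```ruby
# Wrap a non-thread-safe object so every method call goes through one Mutex.
class Serialized
  def initialize(target)
    @target = target
    @lock = Mutex.new
  end

  # Forward a call to the wrapped object while holding the lock,
  # e.g. connection1.call(:post, '/receive', '')
  def call(method_name, *args, &block)
    @lock.synchronize { @target.public_send(method_name, *args, &block) }
  end
end
```

With connection1 = Serialized.new(Net::HTTP.start('server1.com')), each thread would call connection1.call(:post, '/receive', '') instead of locking by hand. Each wrapper owns exactly one lock and never calls into another wrapper while holding it, so two bridging threads cannot deadlock on each other.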

Set socket timeout in Ruby via SO_RCVTIMEO socket option

I'm trying to make sockets time out in Ruby via the SO_RCVTIMEO socket option; however, it seems to have no effect on any recent *nix operating system.
Using Ruby's Timeout module is not an option as it requires spawning and joining threads for each timeout which can become expensive. In applications that require low socket timeouts and which have a high number of threads it essentially kills performance. This has been noted in many places including Stack Overflow.
I've read Mike Perham's excellent post on the subject here and, in an effort to reduce the problem to one file of runnable code, created a simple example of a TCP server that will receive a request, wait the amount of time sent in the request, and then close the connection.
The client creates a socket, sets the receive timeout to be 1 second, and then connects to the server. The client tells the server to close the session after 5 seconds then waits for data.
The client should time out after one second, but instead it successfully closes the connection after 5.
#!/usr/bin/env ruby
require 'socket'
def timeout
  sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
  # Timeout set to 1 second
  timeval = [1, 0].pack("l_2")
  sock.setsockopt Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, timeval
  # Connect and tell the server to wait 5 seconds
  sock.connect(Socket.pack_sockaddr_in(1234, '127.0.0.1'))
  sock.write("5\n")
  # Wait for data to be sent back
  begin
    result = sock.recvfrom(1024)
    puts "session closed"
  rescue Errno::EAGAIN
    puts "timed out!"
  end
end
Thread.new do
  server = TCPServer.new(nil, 1234)
  while (session = server.accept)
    request = session.gets
    sleep request.to_i
    session.close
  end
end
timeout
I've tried doing the same thing with a TCPSocket as well (which connects automatically) and have seen similar code in redis and other projects.
Additionally, I can verify that the option has been set by calling getsockopt like this:
sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_RCVTIMEO).inspect
Does setting this socket option actually work for anyone?
You can do this efficiently using select from Ruby's IO class.
IO.select takes up to four parameters. The first three are arrays of IO objects (here, sockets) to monitor and the last one is a timeout (specified in seconds).
select blocks until at least one of the monitored IO objects is ready to be read from, ready to be written to, or has a pending exception, and then returns the ones that are ready.
The first three arguments therefore, correspond to the different types of states to monitor.
Ready for reading
Ready for writing
Has pending exception
The fourth is the timeout you want to set (if any). We are going to take advantage of this parameter.
select returns an array containing three arrays of IO objects (sockets in this case) that the operating system deems ready for the particular action being monitored.
So the return value of select will look like this:
[
[sockets ready for reading],
[sockets ready for writing],
[sockets raising errors]
]
However, select returns nil if the optional timeout value is given and no IO object is ready within timeout seconds.
Therefore, if you want performant IO timeouts in Ruby without the Timeout module, you can do the following.
Let's build an example where we wait timeout seconds for a read on socket:
ready = IO.select([socket], nil, nil, timeout)
if ready
  # do the read
else
  # raise something that indicates a timeout
end
This has the benefit of not spinning up a new thread for each timeout (as in the Timeout module) and will make multi-threaded applications with many timeouts much faster in Ruby.
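Filling in the two comments of the snippet above gives something like the following sketch. The error class ReadTimeout and the method name read_with_timeout are hypothetical. readpartial is used for the read because, once select reports the socket readable, it returns whatever is available (at most maxlen bytes) instead of waiting for more.

```ruby
require 'socket'

# Raised when no data arrives in time (a hypothetical error class for this sketch).
class ReadTimeout < StandardError; end

# Read up to +maxlen+ bytes from +sock+, waiting at most +timeout+ seconds.
def read_with_timeout(sock, maxlen, timeout)
  ready = IO.select([sock], nil, nil, timeout)
  raise ReadTimeout, "no data within #{timeout}s" if ready.nil?
  sock.readpartial(maxlen)  # raises EOFError if the peer has closed the socket
end

a, b = UNIXSocket.pair
begin
  read_with_timeout(b, 1024, 0.5)
rescue ReadTimeout => e
  puts e.message
end
a.write('hello')
puts read_with_timeout(b, 1024, 0.5)
```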
I think you're basically out of luck. When I run your example with strace (only using an external server to keep the output clean), it's easy to check that setsockopt is indeed getting called:
$ strace -f ruby foo.rb 2>&1 | grep setsockopt
[pid 5833] setsockopt(5, SOL_SOCKET, SO_RCVTIMEO, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
strace also shows what's blocking the program. This is the line I see on the screen before the server times out:
[pid 5958] ppoll([{fd=5, events=POLLIN}], 1, NULL, NULL, 8
That means that the program is blocking on this call to ppoll, not on a call to recvfrom. The man page that lists socket options (socket(7)) states that:
Timeouts have no effect for select(2), poll(2), epoll_wait(2), etc.
So the timeout is being set but has no effect. I hope I'm wrong here, but it seems there's no way to change this behavior in Ruby. I took a quick look at the implementation and didn't find an obvious way out. Again, I hope I'm wrong -- this seems to be something basic, how come it's not there?
One (very ugly) workaround is to use dl to call read or recvfrom directly. Those calls are affected by the timeout you set. For example:
require 'socket'
require 'dl'
require 'dl/import'
module LibC
  extend DL::Importer
  dlload 'libc.so.6'
  extern 'long read(int, void *, long)'
end
sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
timeval = [3, 0].pack("l_l_")
sock.setsockopt Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, timeval
sock.connect( Socket.pack_sockaddr_in(1234, '127.0.0.1'))
buf = "\0" * 1024
count = LibC.read(sock.fileno, buf, 1024)
if count == -1
  puts 'Timeout'
end
This code works here. Of course, it's an ugly solution, which won't work on many platforms, etc. It may be a way out, though. (Note that dl was removed from the standard library in Ruby 2.2; Fiddle is its replacement.)
Also please note that this is the first time I've done something like this in Ruby, so I'm not aware of all the pitfalls I may be overlooking; in particular, I'm suspicious of the types I specified in 'long read(int, void *, long)' and of the way I'm passing a buffer to read.
Based on my testing, and Jesse Storimer's excellent ebook "Working with TCP Sockets" (in Ruby), the timeout socket options do not work in Ruby 1.9 (and, I presume, 2.0 and 2.1). Jesse says:
"Your operating system also offers native socket timeouts that can be set via the SNDTIMEO and RCVTIMEO socket options. But, as of Ruby 1.9, this feature is no longer functional."
Wow. I think the moral of the story is to forget about these options and use IO.select or Tony Arcieri's NIO library.
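On Ruby 3.2 and later there is also a built-in alternative to SO_RCVTIMEO: IO#timeout=, after which blocking operations on that IO raise IO::TimeoutError once the timeout elapses. A minimal sketch, assuming Ruby 3.2+ (older versions raise NoMethodError on timeout=); gets_with_timeout is a hypothetical helper name.

```ruby
require 'socket'

# Read one line from +sock+, or return nil if none arrives within +seconds+.
# Assumes Ruby 3.2+, where IO#timeout= applies to subsequent blocking reads and writes.
def gets_with_timeout(sock, seconds)
  sock.timeout = seconds
  sock.gets
rescue IO::TimeoutError
  nil
end
```

For example, with a, b = UNIXSocket.pair, gets_with_timeout(b, 0.5) returns nil after half a second; once the other end has run a.puts "hello", the same call returns "hello\n". Unlike the Timeout module, no extra thread is involved: the wait happens inside the read itself.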

Understanding IO.select when reading socket in Ruby

I have some code that I'm using to get data from a network socket. It works fine, but I flailed my way into it through trial and error. I humbly admit that I don't fully understand how it works, but I would really like to. (This was cargo-culted from working code I found.)
The part I don't understand starts with "ready = IO.select ..." I'm unclear on:
What IO.select is doing (I tried looking it up but got even more confused with Kernel and what-not)
what the array argument to IO.select is for
what ready[0] is doing
the general idea of reading 1024 bytes at a time
Here's the code:
require 'socket'
@mysocket = TCPSocket.new('192.168.1.1', 9761)
th = Thread.new do
  while true
    ready = IO.select([@mysocket])
    readable = ready[0]
    readable.each do |socket|
      if socket == @mysocket
        buf = @mysocket.recv_nonblock(1024)
        # at end of stream recv_nonblock returns "" or nil, depending on the Ruby version
        if buf.nil? || buf.length == 0
          puts "The server connection is dead. Exiting."
          exit
        else
          puts "Received a message"
        end
      end
    end
  end
end
Thanks in advance for helping me "learn to fish". I hate having bits of my code that I don't fully understand - it's just working by coincidence.
1) IO.select takes a set of sockets and waits until it's possible to read from or write to them (or until an error happens). It returns the sockets on which an event happened.
2) The array contains the sockets that are checked for events. In your case you specify only sockets for reading.
3) IO.select returns an array of arrays of sockets. Element 0 contains the sockets you can read from, element 1 the sockets you can write to, and element 2 the sockets with errors.
After getting the list of sockets, you can read the data.
4) Yes, the recv_nonblock argument is the maximum size in bytes. Note that the amount of data actually read may be less than 1024; in that case you may need to repeat the select (if the complete data matters to you).
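Points 1 to 4 can be seen together in the following sketch (not the asker's code): it waits on several sockets at once, destructures the array IO.select returns, and reads up to 1024 bytes from each readable socket. The helper name read_ready is hypothetical. read_nonblock with exception: false (Ruby 2.3+) returns nil when the peer has closed the socket, and :wait_readable in the rare case that select woke up without data.

```ruby
require 'socket'

# Wait up to +timeout+ seconds for any of +sockets+ to become readable, then
# return a hash of socket => data read (nil data means the peer closed it).
def read_ready(sockets, timeout)
  ready = IO.select(sockets, nil, nil, timeout)
  return {} if ready.nil?                 # nothing became readable in time
  readable, _writable, _errored = ready   # element 0: sockets you can read from
  readable.each_with_object({}) do |sock, out|
    data = sock.read_nonblock(1024, exception: false)  # at most 1024 bytes
    next if data == :wait_readable        # spurious wakeup: nothing to read after all
    out[sock] = data
  end
end

a1, b1 = UNIXSocket.pair
a2, b2 = UNIXSocket.pair
a2.write('ping')
p read_ready([b1, b2], 1).values
```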

Ruby TCPSocket doesn't notice it when server is killed

I have this Ruby code that connects to a TCP server (namely, netcat). It loops 20 times and sends "ABCD ". If I kill netcat, it takes TWO iterations of the loop for an exception to be triggered. On the first iteration after netcat is killed, no exception is triggered, and "send" reports that 5 bytes have been correctly written... which in the end is not true, since of course the server never received them.
Is there a way to work around this issue? Right now I'm losing data: since I think it's been correctly transferred, I'm not replaying it.
#!/usr/bin/env ruby
require 'rubygems'
require 'socket'
sock = TCPSocket.new('192.168.0.10', 5443)
sock.sync = true
20.times do
  sleep 2
  begin
    count = sock.write("ABCD ")
    puts "Wrote #{count} bytes"
  rescue Exception => myException
    puts "Exception rescued : #{myException}"
  end
end
When you're sending data, your blocking call will return when the data is written to the TCP output buffer. It would only block if the buffer was full, waiting for the server to acknowledge receipt of previously sent data.
Once this data is in the buffer, the network drivers try to send the data. If the connection is lost, on the second attempt to write, your application discovers the broken state of the connection.
Also, how does the connection close? Is the server actively closing the connection? In that case the client socket would be notified at its next socket call. Or has it crashed? Or perhaps there's a network fault which means you can no longer communicate.
Discovering a broken connection only occurs when you try to send or receive data over the socket. This is different from having the connection actively closed. You simply can't determine if the connection is still alive without doing something with it.
So try doing sock.recv(0) after the write - if the socket has failed this would raise "Errno::ECONNRESET: Connection reset by peer - recvfrom(2)". You could also try sock.sendmsg "", 0 (not sock.write, or sock.send), and this would report a "Errno::EPIPE: Broken pipe - sendmsg(2)".
Even if you got your hands on the TCP packets and got acknowledgement that the data had been received at the other end, there's still no guarantee that the server will have processed this data: it might be in its input buffer but not yet processed.
All of this might help identify a broken connection earlier, but it still won't guarantee that the data was received and processed by the server. The only sure way to know that the application has processed your message is with an application level response.
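The application-level response suggested above could look like the following minimal sketch. The protocol (the client sends one line per record and the server answers each with a line reading ACK) and the helper name send_with_ack are assumptions; the server has to implement its side. A write that fails outright still raises (for example Errno::EPIPE), while a missing or late acknowledgement is reported as an IOError so the caller knows to resend.

```ruby
require 'socket'

# Send one newline-terminated record and wait up to +timeout+ seconds for "ACK".
# Returns true only if the server acknowledged it.
def send_with_ack(sock, record, timeout)
  sock.write(record + "\n")
  raise IOError, 'no ACK before timeout' unless IO.select([sock], nil, nil, timeout)
  reply = sock.gets            # assumes the ACK line arrives in one piece
  raise IOError, 'connection closed before ACK' if reply.nil?
  reply.chomp == 'ACK'
end
```

A record is only considered delivered when send_with_ack returns true; on any exception the caller keeps it and replays it over a fresh connection.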
I tried without the sleep call (just to make sure it wasn't putting anything on hold) and still no luck:
#!/usr/bin/env ruby
require 'rubygems'
require 'socket'
sock = TCPSocket.new('127.0.0.1', 5443)
sock.sync = true
will_restart_at = Time.now + 2 # two seconds from now
should_continue = true
while should_continue
  if will_restart_at <= Time.now
    will_restart_at = Time.now + 2
    begin
      count = sock.write("ABCD ")
      puts "Wrote #{count} bytes"
    rescue Exception => myException
      puts "Exception rescued : #{myException}"
      should_continue = false
    end
  end
end
I analyzed it with Wireshark and the two versions behave identically.
I think (but can't be sure) that until you actually call your_socket.write (which will not fail, as the socket is still open because you weren't probing for its possible destruction), the socket won't raise any error.
I tried to simulate this with nginx and manual TCP sockets. And look at that:
irb> sock = TCPSocket.new('127.0.0.1', 80)
=> #<TCPSocket:0xb743b824>
irb> sock.write("salut")
=> 5
irb> sock.read
=> "<html>\r\n<head><title>400 Bad Request</title></head>\r\n<body>\r\n</body>\r\n</html>\r\n"
# Here, I kill nginx
irb> sock.write("salut")
=> 5
irb> sock.read
=> ""
irb> sock.write("salut")
Errno::EPIPE: Broken pipe
So what's the conclusion here? Unless you're actually expecting some data from the server, you're out of luck detecting that you've lost the connection :)
To detect a graceful close, you'll have to read from the socket: a read that returns no data (0 bytes) indicates the socket has closed.
If you do need know if data got sent successfully though, there's no way other than implementing ACKs of the data at the application level.
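One way to do that read without consuming data the application still needs is to peek. A minimal sketch, assuming Ruby 2.3+ for exception: false; peer_closed? is a hypothetical helper. It only detects a graceful close (or a reset that has already arrived); as the answers above explain, it says nothing about whether earlier data was processed.

```ruby
require 'socket'

# True if the peer has closed its end (FIN received), without consuming any data.
# EOF shows up as "" or nil, depending on the Ruby version.
def peer_closed?(sock)
  peeked = sock.recv_nonblock(1, Socket::MSG_PEEK, exception: false)
  return false if peeked == :wait_readable  # still open, nothing waiting
  peeked.nil? || peeked.empty?              # data waiting means still open
rescue Errno::ECONNRESET
  true                                      # the peer reset the connection
end
```

Because MSG_PEEK leaves the bytes in the kernel buffer, a later read still sees them.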

Resources