Understanding IO.select when reading a socket in Ruby

I have some code that I'm using to get data from a network socket. It works fine, but I flailed my way into it through trial and error. I humbly admit that I don't fully understand how it works, but I would really like to. (This was cargo culted from working code I found.)
The part I don't understand starts with "ready = IO.select ..." I'm unclear on:
What IO.select is doing (I tried looking it up but got even more confused with Kernel and what-not)
what the array argument to IO.select is for
what ready[0] is doing
the general idea of reading 1024 bytes? at a time
Here's the code:
@mysocket = TCPSocket.new('192.168.1.1', 9761)
th = Thread.new do
  while true
    ready = IO.select([@mysocket])
    readable = ready[0]
    readable.each do |socket|
      if socket == @mysocket
        buf = @mysocket.recv_nonblock(1024)
        if buf.length == 0
          puts "The server connection is dead. Exiting."
          exit
        else
          puts "Received a message"
        end
      end
    end
  end
end
Thanks in advance for helping me "learn to fish". I hate having bits of my code that I don't fully understand - it's just working by coincidence.

1) IO.select takes sets of IO objects (sockets here) and blocks until at least one of them is ready for reading or writing, or has an error pending. It returns the sockets on which one of those events occurred.
2) The array lists the sockets to check for events. IO.select accepts up to three arrays (read, write, error); in your case you pass only the sockets to watch for reading.
3) IO.select returns an array of arrays of sockets. Element 0 contains the sockets you can read from, element 1 the sockets you can write to, and element 2 the sockets with errors. So ready[0] is the list of sockets that are ready to be read.
After getting the list of sockets you can read the data.
4) Yes, the recv_nonblock argument is a size in bytes: the maximum to read in one call. Note that the data actually read may be shorter than 1024 bytes, in which case you may need to repeat the select and read again (if you need the complete data).
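To make points 1) to 4) concrete, here is a minimal sketch, not the asker's code. It assumes UNIXSocket.pair as an in-process stand-in for the TCP connection (so it runs without a server), and read_when_ready is a hypothetical helper name.

```ruby
require 'socket'

# Hypothetical helper: wait until `sock` is readable, then return the bytes
# that are available (at most `max`), or nil once the peer has closed.
def read_when_ready(sock, max = 1024)
  # IO.select returns [readable, writable, errored]; we only watch for reads.
  readable, _writable, _errored = IO.select([sock])
  return nil unless readable.include?(sock)

  chunk = sock.recv_nonblock(max, exception: false)
  return '' if chunk == :wait_readable        # spurious wakeup, nothing to read
  return nil if chunk.nil? || chunk.empty?    # EOF: the peer closed its end
  chunk
end

a, b = UNIXSocket.pair   # assumption: stands in for a TCPSocket and its server
b.write('hello')
p read_when_ready(a)     # prints "hello"
b.close
p read_when_ready(a)     # prints nil
```

The EOF check covers both an empty string and nil, since the value returned at end of stream differs between Ruby versions.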

Related

Read entire message from a TCPSocket without hanging

I'm putting together a TCPServer in Ruby 3.0.2 and I'm finding that I can't seem to read the entire packet without blocking (until the socket is closed).
Edit: There was some confusion on what I was trying to do - my bad - so just to help clarify: I wanted to read everything that had been sent over the TCP connection so far. (end edit)
My first try was:
#!/snap/bin/ruby
require 'socket'
server = TCPServer.new('localhost', 4200)
loop {
  Thread.start(server.accept) do |connection|
    puts connection.gets # The important line
  end
}
But that hangs until the client closes the connection. Okay, so I take a look at connection.methods and the Ruby docs, and try a bunch of options that seem promising. Basically, there are two types of read methods: blocking and nonblocking.
The blocking methods that I tried are .read, .gets, .readlines, .readline, .recv, and .recvmsg. Now .read, .readlines, and .gets all hang (until the socket is closed) - so that's not helpful. The other ones (eg. .readline, the recv methods) don't read the entire message. Now, I could read each line until I see an empty line and parse the HTTP header from there. But there's got to be a better way; I don't want to have to worry about getting a corrupted message and hanging because I didn't read an empty line at the end of the header.
So I went looking at the non-blocking options. Specifically .recv_nonblock and .recvmsg_nonblock. Both of these throw errors (Resource temporarily unavailable - recvfrom(2) would block and Resource temporarily unavailable - recvmsg(2) respectively).
Any ideas on what could be going on? I think it has something to do with me using Ruby 3, because trying out the code on Ruby 2.5, client.gets returns a line (doesn't hang), although .readlines does hang - so not sure what's going on.
Ideally, I could just call something along the lines of client.get_message and I would get the entire message that has been sent, but I'd also be okay with working at the TCP level and getting the packet size, reading that size, and reconstructing the message from there.
TCP just transmits the bytes that you write to the socket, and guarantees that they are received in the order they were sent. If you have the concept of a 'message' then you'll need to add that into your server and client.
.gets specifically will block until it reads a new 'line', or whatever you define as the separator for the string - see the docs for IO#gets. This means that until your server receives that separator byte from the client, it will block.
In your client, have a look at how you're writing your data - if you're using Ruby then puts would work, as it terminates the string with a newline. If you're using write then it will only write the string, without a newline.
I.e.
# client.rb
require 'socket'
c = TCPSocket.new 'localhost', 5000
c.puts "foo"
c.write "bar"
c.write "baz\n"
# server.rb
require 'socket'
s = TCPServer.new 5000
loop do
  client = s.accept
  puts client.gets
  puts client.gets
end
will output
foo
barbaz
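If newline-terminated lines don't fit your data, another common way to add the 'message' concept is a length prefix. This is a minimal sketch under assumptions of my own: the 4-byte big-endian header, the helper names send_message and read_message, and UNIXSocket.pair standing in for the client/server connection are all hypothetical, not part of any Ruby API for messages.

```ruby
require 'socket'

# Hypothetical framing: a 4-byte big-endian length, then that many bytes.
def send_message(io, payload)
  io.write([payload.bytesize].pack('N') + payload.b)
end

# Returns the next complete message, or nil if the peer closed the connection.
def read_message(io)
  header = io.read(4)                  # blocks until 4 bytes or EOF
  return nil if header.nil? || header.bytesize < 4
  io.read(header.unpack1('N'))         # blocks until the whole body arrived
end

client, server = UNIXSocket.pair       # assumption: stand-in for TCP sockets
send_message(client, 'foo')
send_message(client, "bar\nbaz")       # newlines inside a message are fine
p read_message(server)                 # prints "foo"
p read_message(server)                 # prints "bar\nbaz"
```

Because the reader knows exactly how many bytes to expect, it never has to guess where a message ends or wait for the connection to close.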
Thanks to everyone who commented/answered, but I found the solution that I think was intended by the creators of the Socket class!
The recv_nonblock method takes some optional arguments - one of which is a buffer that the socket stores what it has read into. So a call like client.recv_nonblock(1000, 0, buffer) stores up to 1000 bytes from the socket into buffer and returns immediately; when nothing is available yet it raises IO::WaitReadable instead of blocking.
Just to make life easy, I put together a monkey patch to the TCPSocket class:
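class TCPSocket
  # Drain whatever is currently buffered on the socket without blocking.
  def eat_buffer
    contents = +''
    buffer = +''
    begin
      loop {
        chunk = recv_nonblock(256, 0, buffer)
        # nil or "" means the peer closed; stop instead of looping forever
        break if chunk.nil? || chunk.empty?
        contents << chunk
      }
      contents
    rescue IO::EAGAINWaitReadable
      # Nothing more to read right now: return what we have so far
      contents
    end
  end
end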
The point that Steffen makes in the comments is well taken - TCP isn't designed to be used this way. This is a hacky (in the bad sense) method, and should be avoided.

Ruby socket not returning data on second read

I have an ASIC computer in my house that I don't really have control over, but I can talk to its API over TCP (CGminer OS). I'm trying to record data from it:
socket = TCPSocket.open(address, port)
loop do
  sleep 1
  socket.write(command)
  response = socket.read
end
The first iteration of this loop returns the data as expected, the second is an empty string. I'm pretty clueless about sockets and not sure what I need to do. I know I can reopen the socket each iteration if I have to, I'm just hoping I don't need to.
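The solution is to just reopen the socket on each iteration. socket.read with no length reads until EOF, so the first call only returns once the server has closed the connection; after that there is nothing left to read on that socket, which is why the second read returns an empty string.
loop do
  socket = TCPSocket.open(address, port)
  socket.write(command)
  response = socket.read
  socket.close
end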
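The reopen-per-request pattern can be wrapped in a small helper. The following is only a sketch: start_one_shot_server is a hypothetical stand-in for the device (it assumes the device answers one request per connection and then closes, which is what the symptoms suggest), and query is a hypothetical helper name.

```ruby
require 'socket'

# Hypothetical stand-in for the device: answers one request per connection
# and then closes its end. Returns the port it is listening on.
def start_one_shot_server
  server = TCPServer.new('127.0.0.1', 0)   # port 0: let the OS pick a free port
  Thread.new do
    loop do
      conn = server.accept
      request = conn.readpartial(1024)
      conn.write("reply to #{request}")
      conn.close
    end
  end
  server.addr[1]
end

# Open, send, read to EOF, close. The block form closes the socket for us,
# even if an exception is raised in between.
def query(host, port, command)
  TCPSocket.open(host, port) do |socket|
    socket.write(command)
    socket.read   # returns once the server closes the connection
  end
end

port = start_one_shot_server
2.times { puts query('127.0.0.1', port, 'summary') }
```

Each call gets a fresh connection, so the second request returns data just like the first.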

How to wait until TCP socket response is ready

I'm connecting to a TCP server using Ruby's TCPSocket class.
I send some data about an address and I must wait for the server to do some processing to give me the geocoding of said address. Since the process in the server takes some time, I cannot read the response immediately.
When I used socket.readpartial() I got a response of two white spaces.
I temporarily solved this using sleep(5) but I don't like this at all, because it is hackish and clumsy, and I risk that even after 5 seconds the response is not ready and I still get an empty response.
I know that the responses will always be 285 characters long.
Is there a more correct and elegant way of having my TCP socket wait for the full response?
Here's my code:
def matchgeocode(rua, nro, cidade, uf)
  count = 0
  begin
    socket = TCPSocket.new(GEOCODER_URL, GEOCODER_PORT)
    # Needed for authentication
    socket.write("TICKET #{GEOCODER_TICKET}")
    socket.read(2)
    # Here's the message I send to the server
    socket.write("MATCHGEOCODE -Rua:\"#{rua}\" -Nro:#{nro} -Cidade:\"#{cidade}\" -Uf:\"#{uf}\"")
    # My hackish sleep
    sleep(5)
    # Reading the fixed size response
    response = socket.readpartial(285)
    socket.write('QUIT')
    socket.close
  rescue Exception => e
    count += 1
    puts e.message
    if count <= 5 && response.eql?('')
      retry
    end
  end
  response
end
Since you know the length of the response you should use read, not readpartial.
readpartial returns immediately if ANY data is available, even one byte is enough. That's why you need the sleep call so that the response has time to return to you before readpartial tries to peek at what data is present.
read, on the other hand, blocks until ALL of the requested data is available (or the connection reaches EOF). Since you know the length of the result, read is the natural solution here.
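The difference is easy to see with a reply that arrives in two pieces. This is a sketch only: slow_reply is a hypothetical stand-in for the slow geocoder, and UNIXSocket.pair is assumed in place of the real TCP connection so that the example runs on its own.

```ruby
require 'socket'

# Hypothetical slow server: writes `total` bytes in two halves, pausing
# `delay` seconds in between, then closes its end.
def slow_reply(io, total, delay)
  Thread.new do
    io.write('x' * (total / 2))
    sleep delay
    io.write('y' * (total - total / 2))
    io.close
  end
end

# readpartial returns whatever has arrived so far (at least one byte),
# so it will typically come back short here.
a, b = UNIXSocket.pair
writer = slow_reply(b, 285, 0.2)
partial = a.readpartial(285)
puts "readpartial got #{partial.bytesize} bytes"
writer.join

# read(285) keeps blocking until all 285 bytes are in (or EOF is reached).
c, d = UNIXSocket.pair
writer = slow_reply(d, 285, 0.2)
full = c.read(285)
puts "read got #{full.bytesize} bytes"   # prints "read got 285 bytes"
writer.join
```

No sleep is needed on the reading side: read waits exactly as long as the data takes to arrive.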

Set socket timeout in Ruby via SO_RCVTIMEO socket option

I'm trying to make sockets timeout in Ruby via the SO_RCVTIMEO socket option however it seems to have no effect on any recent *nix operating system.
Using Ruby's Timeout module is not an option as it requires spawning and joining threads for each timeout which can become expensive. In applications that require low socket timeouts and which have a high number of threads it essentially kills performance. This has been noted in many places including Stack Overflow.
I've read Mike Perham's excellent post on the subject here and in an effort to reduce the problem to one file of runnable code created a simple example of a TCP server that will receive a request, wait the amount of time sent in the request and then close the connection.
The client creates a socket, sets the receive timeout to be 1 second, and then connects to the server. The client tells the server to close the session after 5 seconds then waits for data.
The client should timeout after one second but instead successfully closes the connection after 5.
#!/usr/bin/env ruby
require 'socket'
def timeout
  sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
  # Timeout set to 1 second
  timeval = [1, 0].pack("l_2")
  sock.setsockopt Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, timeval
  # Connect and tell the server to wait 5 seconds
  sock.connect(Socket.pack_sockaddr_in(1234, '127.0.0.1'))
  sock.write("5\n")
  # Wait for data to be sent back
  begin
    result = sock.recvfrom(1024)
    puts "session closed"
  rescue Errno::EAGAIN
    puts "timed out!"
  end
end
Thread.new do
  server = TCPServer.new(nil, 1234)
  while (session = server.accept)
    request = session.gets
    sleep request.to_i
    session.close
  end
end
timeout
I've tried doing the same thing with a TCPSocket as well (which connects automatically) and have seen similar code in redis and other projects.
Additionally, I can verify that the option has been set by calling getsockopt like this:
sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_RCVTIMEO).inspect
Does setting this socket option actually work for anyone?
You can do this efficiently using select from Ruby's IO class.
IO::select takes 4 parameters. The first three are arrays of IO objects (sockets here) to monitor and the last one is a timeout (specified in seconds).
select blocks until at least one of those objects is ready to be read from, ready to be written to, or has a pending exception, and then returns the ones that are ready.
The first three arguments, therefore, correspond to the different types of states to monitor:
Ready for reading
Ready for writing
Has pending exception
The fourth is the timeout you want to set (if any). We are going to take advantage of this parameter.
Select returns an array that contains arrays of IO objects (sockets in this case) which are deemed ready by the operating system for the particular action being monitored.
So the return value of select will look like this:
[
[sockets ready for reading],
[sockets ready for writing],
[sockets raising errors]
]
However, select returns nil if the optional timeout value is given and no IO object is ready within timeout seconds.
Therefore, if you want performant IO timeouts in Ruby without the Timeout module, you can use that timeout parameter.
Here is an example where we wait timeout seconds for a read on socket:
ready = IO.select([socket], nil, nil, timeout)
if ready
  # do the read
else
  # raise something that indicates a timeout
end
This has the benefit of not spinning up a new thread for each timeout (as in the Timeout module) and will make multi-threaded applications with many timeouts much faster in Ruby.
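A runnable version of the same idea, as a sketch: ReadTimeout and read_with_timeout are hypothetical names of my own, and UNIXSocket.pair is assumed in place of a real TCP connection so that no server is needed.

```ruby
require 'socket'

class ReadTimeout < StandardError; end   # hypothetical error class

# Wait up to `timeout` seconds for `sock` to become readable, then read
# at most `max` bytes. Raises ReadTimeout if nothing arrives in time.
def read_with_timeout(sock, max, timeout)
  ready = IO.select([sock], nil, nil, timeout)   # nil means the wait timed out
  raise ReadTimeout, "no data within #{timeout}s" unless ready
  sock.readpartial(max)   # raises EOFError if the peer has closed
end

a, b = UNIXSocket.pair   # assumption: stand-in for the client/server sockets
begin
  read_with_timeout(a, 1024, 0.2)
rescue ReadTimeout => e
  puts "timed out: #{e.message}"   # prints "timed out: no data within 0.2s"
end

b.write('late reply')
puts read_with_timeout(a, 1024, 0.2)   # prints "late reply"
```

The socket stays usable after a timeout, so the caller can decide whether to retry, give up, or close the connection.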
I think you're basically out of luck. When I run your example with strace (only using an external server to keep the output clean), it's easy to check that setsockopt is indeed getting called:
$ strace -f ruby foo.rb 2>&1 | grep setsockopt
[pid 5833] setsockopt(5, SOL_SOCKET, SO_RCVTIMEO, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
strace also shows what's blocking the program. This is the line I see on the screen before the server times out:
[pid 5958] ppoll([{fd=5, events=POLLIN}], 1, NULL, NULL, 8
That means that the program is blocking on this call to ppoll, not on a call to recvfrom. The man page that lists socket options (socket(7)) states that:
Timeouts have no effect for select(2), poll(2), epoll_wait(2), etc.
So the timeout is being set but has no effect. I hope I'm wrong here, but it seems there's no way to change this behavior in Ruby. I took a quick look at the implementation and didn't find an obvious way out. Again, I hope I'm wrong -- this seems to be something basic, how come it's not there?
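One (very ugly) workaround is to use dl to call read or recvfrom directly. Those calls are affected by the timeout you set. (dl was removed from the standard library in Ruby 2.2 in favor of Fiddle, so this runs as written only on older Rubies.) For example: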
require 'socket'
require 'dl'
require 'dl/import'
module LibC
  extend DL::Importer
  dlload 'libc.so.6'
  extern 'long read(int, void *, long)'
end
sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
timeval = [3, 0].pack("l_l_")
sock.setsockopt Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, timeval
sock.connect( Socket.pack_sockaddr_in(1234, '127.0.0.1'))
buf = "\0" * 1024
count = LibC.read(sock.fileno, buf, 1024)
if count == -1
  puts 'Timeout'
end
This code works here. Of course: it's an ugly solution, which won't work on many platforms, etc. It may be a way out though.
Also please note that this is the first time I've done something like this in Ruby, so I'm not aware of all the pitfalls I may be overlooking -- in particular, I'm suspicious of the types I specified in 'long read(int, void *, long)' and of the way I'm passing a buffer to read.
Based on my testing, and Jesse Storimer's excellent ebook on "Working with TCP Sockets" (in Ruby), the timeout socket options do not work in Ruby 1.9 (and, I presume 2.0 and 2.1). Jesse says:
"Your operating system also offers native socket timeouts that can be set via the SNDTIMEO and RCVTIMEO socket options. But, as of Ruby 1.9, this feature is no longer functional."
Wow. I think the moral of the story is to forget about these options and use IO.select or Tony Arcieri's NIO library.

Ruby TCPSocket: Find out how much data is available

Is there a way to find out how many bytes of data are available on a TCPSocket in Ruby? I.e. how many bytes can be read without blocking?
The standard library io/wait might be useful here. Requiring it gives stream-based I/O (sockets and pipes) some new methods, among which is ready?. According to the documentation, ready? returns non-nil if there are bytes available without blocking. It just so happens that the non-nil value it returns is the number of bytes that are available in MRI.
Here's an example which creates a dumb little socket server, and then connects to it with a client. The server just sends "foo" and then closes the connection. The client waits a little bit to give the server time to send, and then prints how many bytes are available for reading. The interesting stuff for you is in the client:
require 'socket'
require 'io/wait'
# Server
server_socket = TCPServer.new('localhost', 0)
port = server_socket.addr[1]
Thread.new do
  session = server_socket.accept
  sleep 0.5
  session.puts "foo"
  session.close
end
# Client
client_socket = TCPSocket.new('localhost', port)
puts client_socket.ready? # => nil
sleep 1
puts client_socket.ready? # => 4
Don't use that server code in anything real. It's deliberately short in order to keep the example simple.
Note: According to the Pickaxe book, io/wait is only available if the platform has the "FIONREAD feature in ioctl(2)", which Linux does. I don't know about Windows and others.
