Behaviour of recv on a non-blocking TCP socket before connection completion - nonblocking

Friends,
I have a non-blocking TCP socket (on AIX). When I attempted connect() I got EINPROGRESS. My question is if I call recv() before connection completion, what would be the (most apprpriate) error code?
I saw that, in case connection fails, and I call recv(), I got ECONNREFUSED; means I got an error corresponding to my earlier connect() attempt. Taking same logic I should get EINPROGRESS for recv(). Am I right in my approach ?
If yes, this raises another question - Why such error codes are not included amongst error codes of recv()?

I have only seen EAGAIN returned in this case, just as you would see in the case where there's no data to read. For writing to a non-connected socket, you typically get ENOTCONN, though I believe some platforms might give you EAGAIN.
Here's a trivial Python script to demonstrate:
import socket
# Any address that does not succeed or fail right away will do
ADDR = "192.168.100.100"
PORT = 23
s = socket.socket()
s.setblocking(False)
try:
s.connect((ADDR, PORT))
except socket.error, e:
print "Connect gave us",e
try:
s.recv(1)
except socket.error, e:
print "Read gave us",e
try:
s.send("x")
except socket.error, e:
print "Write gave us",e
For me, it gives:
Connect gave us (36, 'Operation now in progress')
Read gave us (35, 'Resource temporarily unavailable')
Write gave us (57, 'Socket is not connected')
These are EINPROGRESS, EAGAIN, and ENOTCONN respectively.

You are operating on a non-blocking socket, which is perfect fine to return EINPROGRESS, which indict that the connection establishment has not being finished yet, this is documented in connect's page:
EINPROGRESS
The socket is nonblocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by
selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET
to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed
here, explaining the reason for the failure).
So you will need select/pool to make sure the socket is writable, and get error from SO_ERROR.

Related

Trying to send a FIX api message to ctrader server using Ruby but receiving no response

Trying to see if I can get a response from ctrader server.
Getting no response and seems to hang at "s.recv(1024)". So not sure what could be going wrong here. I have limited experience with sockets and network coding.
I have checked my login credentials and all seems ok.
Note: I am aware of many FIX engines that are available for this purpose but wanted to
try this on my own.
ctrader FIX guides
require 'socket'
hostname = "h51.p.ctrader.com"
port = 5201
#constructing a fix message to see what ctrader server returns
#8=FIX.4.4|9=123|35=A|49=demo.ctrader.*******|56=cServer|57=QUOTE|50=QUOTE|34=1|52=20220127-16:49:31|98=0|108=30|553=********|554=*******|10=155|
fix_message = "8=FIX.4.4|9=#{bodylengthsum}|" + bodylength + "10=#{checksumcalc}|"
s = TCPSocket.new(hostname, port)
s.send(fix_message.force_encoding("ASCII"),0)
print fix_message
puts s.recv(1024)
s.close
Sockets are by default blocking on read. When you call recv that call will block if no data is available.
The fact that your recv call is not returning anything, would be an indication that the server did not send you any reply at all; the call is blocking waiting for incoming data.
If you would use read instead, then the call will block until all the requested data has been received.
So calling recv(1024) will block until 1 or more bytes are available.
Calling read(1024) will block until all 1024 bytes have been received.
Note that you cannot rely on a single recv call to return a full message, even if the sender sent you everything you need. Multiple recv calls may be required to construct the full message.
Also note that the FIX protocol gives the msg length at the start of each message. So after you get enough data to see the msg length, you could call read to ensure you get the rest.
If you do not want your recv or read calls to block when no data (or incomplete data) is available, then you need to use non-blocking IO instead for your reads. This is complex topic, which you need to research, but often used when you don't want to block and need to read arbitary length messages. You can look here for some tips.
Another option would be to use something like EventMachine instead, which makes it easier to deal with sockets in situations like this, without having to worry about blocking in your code.

Raise exception when TCP connection broken

I'm building a server which accepts connections through TCP (using TCPServer). I mostly just read data (socket.gets.chomp) and write data (socket.print).
socket.gets will return nil if the connection has been closed by the client in the meantime, so .chomp will raise NoMethodError. This is hard to handle specifically since it's such an unspecific exception - I want to distinguish exceptions caused by the connection loss from other causes of NoMethodError, such as me typoing a method.
Ideally, I would receive something more specific such as SocketError whenever trying to interact with a closed socket, rather than just getting back nil. How could I accomplish that?
I have already considered these options:
Write a wrapper for TCPSocket or IO which checks on socket availability before every call (a lot of work to do cleanly considering how many methods there are in IO)
Check each return value for nil (even more effort and code redundancy as my application grows, also I would still .print to the socket when it's already closed)
Monkey patching NilClass for chomp (again only handles this specific use case, and monkey patching should be avoided for clean code)
Being at end of file is not intrinsically an error, nor is it normally understood to mean a "broken" connection like your title says.
For example, HTTP allows multiple requests to be sent over a single connection. After completely reading a request you can read again, and if the connection is closed you'd get nil, which tells you there are no more requests coming. This situation isn't considered an error condition by most/all HTTP software.
Most Ruby software handles nil return from read as an indication that the network conversation is over (successfully). I suggest you do something like that.
If you wish to consider EOF an error, you could create a wrapper class for IO that would "upgrade" nil return from read into an exception of some kind, but I would suggest rethinking whether this is really what you need.
See also https://ruby-doc.org/core-3.0.0/IO.html#method-i-read.

How does ZeroMQ connect and bind work internally

I am experimenting with ZeroMQ. And I found it really interesting that in ZeroMQ, it does not matter whether either connect or bind happens first. I tried looking into the source code of ZeroMQ but it was too big to find anything.
The code is as follows.
# client side
import zmq
ctx = zmq.Context()
socket = ctx.socket(zmq.PAIR)
socket.connect('tcp://*:2345') # line [1]
# make it wait here
# server side
import zmq
ctx = zmq.Context()
socket = ctx.socket(zmq.PAIR)
socket.bind('tcp://localhost:2345')
# make it wait here
If I start client side first, the server has not been started yet, but magically the code is not blocked at line [1]. At this point, I checked with ss and made sure that the client is not listening on any port. Nor does it have any open connection. Then I start the server. Now the server is listening on port 2345, and magically the client is connected to it. My question is how does the client know the server is now online?
The best place to ask your question is the ZMQ mailing list, as many of the developers (and founders!) of the library are active there and can answer your question directly, but I'll give it a try. I'll admit that I'm not a C developer so my understanding of the source is limited, but here's what I gather, mostly from src/tcp_connector.cpp (other transports are covered in their respective files and may behave differently).
Line 214 starts the open() method, and here looks to be the meat of what's going on.
To answer your question about why the code is not blocked at Line [1], see line 258. It's specifically calling a method to make the socket behave asynchronously (for specifics on how unblock_socket() works you'll have to talk to someone more versed in C, it's defined here).
On line 278, it attempts to make the connection to the remote peer. If it's successful immediately, you're good, the bound socket was there and we've connected. If it wasn't, on line 294 it sets the error code to EINPROGRESS and fails.
To see what happens then, we go back to the start_connecting() method on line 161. This is where the open() method is called from, and where the EINPROGRESS error is used. My best understanding of what's happening here is that if at first it does not succeed, it tries again, asynchronously, until it finds its peer.
I think the best answer is in zeromq wiki
When should I use bind and when connect?
As a very general advice: use bind on the most stable points in your architecture and connect from the more volatile endpoints. For request/reply the service provider might be point where you bind and the client uses connect. Like plain old TCP.
If you can't figure out which parts are more stable (i.e. peer-to-peer) think about a stable device in the middle, where boths sides can connect to.
The question of bind or connect is often overemphasized. It's really just a matter of what the endpoints do and if they live long — or not. And this depends on your architecture. So build your architecture to fit your problem, not to fit the tool.
And
Why do I see different behavior when I bind a socket versus connect a socket?
ZeroMQ creates queues per underlying connection, e.g. if your socket is connected to 3 peer sockets there are 3 messages queues.
With bind, you allow peers to connect to you, thus you don't know how many peers there will be in the future and you cannot create the queues in advance. Instead, queues are created as individual peers connect to the bound socket.
With connect, ZeroMQ knows that there's going to be at least a single peer and thus it can create a single queue immediately. This applies to all socket types except ROUTER, where queues are only created after the peer we connect to has acknowledge our connection.
Consequently, when sending a message to bound socket with no peers, or a ROUTER with no live connections, there's no queue to store the message to.
When you call socket.connect('tcp://*:2345') or socket.bind('tcp://localhost:2345') you are not calling these methods directly on an underlying TCP socket. All of ZMQ's IO - including connecting/binding underlying TCP sockets - happens in threads that are abstracted away from the user.
When these methods are called on a ZMQ socket it essentially queues these events within the IO threads. Once the IO threads begin to process them they will not return an error unless the event is truly impossible, otherwise they will continually attempt to connect/reconnect.
This means that a ZMQ socket may return without an error even if socket.connect is not successful. In your example it would likely fail without error but then quickly reattempt and succeeded if you were to run the server side of script.
It may also allow you to send messages while in this state (depending on the state of the queue in this situation, rather than the state of the network) and will then attempt to transmit queued messages once the IO threads are able to successfully connect. This also includes if a working TCP connection is later lost. The queues may continue to accept messages for the unconnected socket while IO attempts to automatically resolve the lost connection in the background. If the endpoint takes a while to come back online it should still receive it's messages.
To better explain here's another example
<?php
$pid = pcntl_fork();
if($pid)
{
$context = new ZMQContext();
$client = new ZMQSocket($context, ZMQ::SOCKET_REQ);
try
{
$client->connect("tcp://0.0.0.0:9000");
}catch (ZMQSocketException $e)
{
var_dump($e);
}
$client->send("request");
$msg = $client->recv();
var_dump($msg);
}else
{
// in spawned process
echo "waiting 2 seconds\n";
sleep(2);
$context = new ZMQContext();
$server = new ZMQSocket($context, ZMQ::SOCKET_REP);
try
{
$server->bind("tcp://0.0.0.0:9000");
}catch (ZMQSocketException $e)
{
var_dump($e);
}
$msg = $server->recv();
$server->send("response");
var_dump($msg);
}
The binding process will not begin until 2 seconds later than the connecting process. But once the child process wakes and successfully binds the req/rep transaction will successfully take place without error.
jason#jason-VirtualBox:~/php-dev$ php play.php
waiting 2 seconds
string(7) "request"
string(8) "response"
If I was to replace tcp://0.0.0.0:9000 on the binding socket with tcp://0.0.0.0:2345 it will hang because the client is trying to connect to tcp://0.0.0.0:9000, yet still without error.
But if I replace both with tcp://localhost:2345 I get an error on my system because it can't bind on localhost making the call truly impossible.
object(ZMQSocketException)#3 (7) {
["message":protected]=>
string(38) "Failed to bind the ZMQ: No such device"
["string":"Exception":private]=>
string(0) ""
["code":protected]=>
int(19)
["file":protected]=>
string(28) "/home/jason/php-dev/play.php"
["line":protected]=>
int(40)
["trace":"Exception":private]=>
array(1) {
[0]=>
array(6) {
["file"]=>
string(28) "/home/jason/php-dev/play.php"
["line"]=>
int(40)
["function"]=>
string(4) "bind"
["class"]=>
string(9) "ZMQSocket"
["type"]=>
string(2) "->"
["args"]=>
array(1) {
[0]=>
string(20) "tcp://localhost:2345"
}
}
}
["previous":"Exception":private]=>
NULL
}
If your needing real-time information for the state of underlying sockets you should look into socket monitors. Using socket monitors along with the ZMQ poll allows you to poll for both socket events and queue events.
Keep in mind that polling a monitor socket using ZMQ poll is not similar to polling a ZMQ_FD resource via select, epoll, etc. The ZMQ_FD is edge triggered and therefor doesn't behave the way you would expect when polling network resources, where a monitor socket within ZMQ poll is level triggered. Also, monitor sockets are very light weight and latency between the system event and the resulting monitor event is typically sub microsecond.

APNs error handling in ruby

I want to send notifications to apple devices in batches (1.000 device tokens in batch for example). Ant it seems that I can't know for sure that message was delivered to APNs.
Here is the code sample:
ssl_connection(bundle_id) do |ssl, socket|
device_tokens.each do |device_token|
ssl.write(apn_message_for device_token)
# I can check if there is an error response from APNs
response_has_an_error = IO.select([socket],nil,nil,0) != nil
# ...
end
end
The main problem is if network is down after the ssl_connection is established
ssl.write(...)
will never raise an error. Is there any way to ckeck that connection still works?
The second problem is in delay between ssl.write and ready error answer from APNs. I can pass timeout parameter to IO.select after last messege was sent. Maybe It's OK to wait for a few seconds for 1.000 batch, but wat if I have to send 1.000 messages for differend bundle_ids?
At https://zeropush.com, we use a gem named grocer to handle our communication with Apple and we had a similar problem. The solution we found was to use the socket's read_non_block method before each write to check for incoming data on the socket which would indicate an error.
It makes the logic a bit funny because read_non_block throws IO::WaitReadable if there is no data to read. So we call read_non_block and catch IO::WaitReadable before continuing as normal. In our case, catching the exception is the happy case. You may be able to use a similar approach rather than using IO.select(...).
One issue to be aware of is that Apple may not respond immediately and any notifications sent between a failing notification and reading from the socket will be lost.
You can see the code we are using in production at https://github.com/SymmetricInfinity/grocer/blob/master/lib/grocer/connection.rb#L30.

Why is gevent.sleep(0.1) necessary in this example to prevent the app from blocking?

I'm pulling my hair out over this one. I'm trying to get the simplest of examples working with zeromq and gevent. I changed this script to use PUB/SUB sockets and when I run it the 'server' socket loops forever. If I uncomment the gevent.sleep(0.1) line then it works as expected and yields to the other green thread, which in this case is the client.
The problem is, why should I have to manually add a sleep call? I thought when I import the zmq.green version of zmq that the send and receive calls are non blocking and underneath do the task switching.
In other words, why should I have to add the gevent.sleep() call to get this example working? In Jeff Lindsey's original example, he's doing REQ/REP sockets and he doesn't need to add sleep calls...but when I changed this to PUB/SUB I need it there for this to yield to the client for processing.
#Notes: Code taken from slide: http://www.google.com/url?sa=t&rct=j&q=zeromq%20gevent&source=web&cd=27&ved=0CFsQFjAGOBQ&url=https%3A%2F%2Fraw.github.com%2Fstrangeloop%2F2011-slides%2Fmaster%2FLindsay-DistributedGeventZmq.pdf&ei=JoDNUO6OIePQiwK8noHQBg&usg=AFQjCNFa5g9ZliRVoN_yVH7aizU_fDMtfw&bvm=bv.1355325884,d.cGE
#Jeff Lindsey talk on gevent and zeromq
import gevent
from gevent import spawn
import zmq.green as zmq
context = zmq.Context()
def serve():
print 'server online'
socket = context.socket(zmq.PUB)
socket.bind("ipc:///tmp/jeff")
while True:
print 'send'
socket.send("World")
#gevent.sleep(0.1)
def client():
print 'client online'
socket = context.socket(zmq.SUB)
socket.connect("ipc:///tmp/jeff")
socket.setsockopt(zmq.SUBSCRIBE, '')
while True:
print 'recv'
message = socket.recv()
cl = spawn(client)
server = spawn(serve)
print 'joinall'
gevent.joinall([cl, server])
print 'end'
I thought when I import the zmq.green version of zmq that the send and receive calls are non blocking and underneath do the task switching.
zmq.green will only yield if these calls would block, it does not yield if they are ready (there's nothing to wait for). In your case the sender is always ready, so it never has a reason to yield.
Some pointers:
a minimal explicit yield is gevent.sleep(0), it doesn't need to be finite.
zmq.green only yields on blocking calls. That is, if a socket is always ready to send/recv when you ask it to, it will never yield.
socket.send only blocks when the socket is not ready to send (not (socket.events & zmq.POLLOUT)),
which can never actually be true of a PUB socket (you will see it at HWM for PUSH, DEALER, etc.).
in general, don't trust send to yield, because of the way zeromq works this will rarely be the case unless
you are exceeding the capacity of your configuration.
unlike send, recv regularly blocks in normal usage, so it yields on most calls. But if a peer is flooding your incoming buffer, repeated recv calls will not yield until there is nothing ready to receive, so you may again need to explicitly yield every so often to prevent starvation.
What zmq.green amounts to is turning send/recv into:
try:
socket.send(msg, zmq.NOBLOCK) # or recv
except zmq.ZMQError as e:
if e.errno == zmq.EAGAIN:
yield # and wait for socket to be ready, then try again
so if send/recv with NOBLOCK are always succeeding, the socket never yields.
To put it another way: If a socket has nothing to wait for, it won't wait.

Resources