Is it guaranteed that the FD_CLOSE event is only posted when there is no data buffered in the socket? (Windows)

We are using WSAEventSelect to bind a socket to an event. From MSDN:
The FD_CLOSE network event is recorded when a close indication is
received for the virtual circuit corresponding to the socket. In TCP
terms, this means that the FD_CLOSE is recorded when the connection
goes into the TIME WAIT or CLOSE WAIT states. This results from the
remote end performing a shutdown on the send side or a closesocket.
FD_CLOSE being posted after all data is read from a socket. An
application should check for remaining data upon receipt of FD_CLOSE
to avoid any possibility of losing data. For more information, see the
section on Graceful Shutdown, Linger Options, and Socket Closure and
the shutdown function.
It seems the first highlighted sentence means that FD_CLOSE will only be posted after all data has been read from the socket. But the second sentence requires an application to check whether there is still data in the socket when it receives FD_CLOSE.
Isn't that a contradiction? How should I understand it?

Unfortunately there is a lot of speculation and very little official word. My understanding is the following:
FD_CLOSE being posted after all data is read from a socket.
Edit: My original response here appears to be false. I believe this statement to be referring to a specific type of socket closure, but there doesn't seem to be agreement on exactly what. It is expected that this should be true, but experience shows that it will not always be.
An application should check for remaining data upon receipt of FD_CLOSE to avoid any possibility of losing data.
There may be data still available at the point your application code receives the FD_CLOSE event. In fact, reading around indicates that new data may become available at the socket after you have received the FD_CLOSE. You should check for this data in order to avoid losing it. I've seen some people implement recv loops until the recv call fails (indicating the socket is actually closed) or even restart the event loop waiting for more FD_READs. I think in the general case you can simply attempt a recv with a sufficiently large buffer and assume nothing more will arrive.
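For illustration, here is a minimal sketch of that drain-on-close approach (Winsock, C++; the function name, buffer size, and error handling are mine, and the socket is assumed to already be non-blocking because of WSAEventSelect):

```cpp
#include <winsock2.h>
#include <vector>

// Drain whatever is still queued on the socket after FD_CLOSE, then close it.
void DrainOnClose(SOCKET s, std::vector<char>& out)
{
    char buf[4096];
    for (;;)
    {
        int n = recv(s, buf, sizeof(buf), 0);
        if (n > 0)
        {
            out.insert(out.end(), buf, buf + n);   // keep the data that was still buffered
        }
        else if (n == 0)
        {
            break;                                 // orderly shutdown, nothing left to read
        }
        else if (WSAGetLastError() == WSAEWOULDBLOCK)
        {
            break;                                 // nothing available right now; some code
                                                   // keeps waiting for further FD_READs instead
        }
        else
        {
            break;                                 // real error; report it as appropriate
        }
    }
    closesocket(s);
}
```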

Related

Purpose of zeromq send high watermark

The first time I skimmed the zeromq docs, I assumed that the sender high watermark was there to ensure that the sender did not get too far ahead of the receiver. Now that I'm looking at it more carefully, it seems that this can't possibly be true, since the wire protocol doesn't have any concept of ACKs so the sender can't know whether the receiver is keeping up or is way behind. After staring at jeromq code in the debugger for way too long, it seems that the watermark is actually a purely "within-same-process" mechanism to ensure that the application thread that's writing to the ZMQ socket does not get too far ahead of the background thread that's responsible for taking messages off the ZMQ socket and writing bytes into the OS's TCP socket.
It seems like a rather fringe thing to worry about, relative to how much attention it's given in the docs. It doesn't even seem like a great way to control memory usage, because if you have a high water mark of 10, then 15 messages of 2 kB each are not allowed, but 5 messages of 100 MB each are allowed, so things are still pretty unpredictable.
Am I understanding all this correctly, or am I hopelessly confused?
I think another sign that it's not there to prevent a sender getting too far ahead of the receiver is that if one sets the HWM to 0, that's taken as infinity, not actually zero. For 0 to mean zero, there would have to be some toing and froing with the receiver to know whether the socket was actually empty throughout the whole connection.
I wish that 0 did mean zero, because then ZeroMQ could implement both Actor Model and Communicating Sequential Processes architectures. But it doesn't, so it can't.
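For what it's worth, a small sketch with the libzmq C API illustrates the point that the send HWM is a bound on the local pipe between the application and ZeroMQ's I/O thread (the endpoint, HWM value, and message count below are purely illustrative):

```cpp
#include <zmq.h>
#include <cerrno>
#include <cstdio>

int main()
{
    void* ctx  = zmq_ctx_new();
    void* push = zmq_socket(ctx, ZMQ_PUSH);

    int hwm = 10;                                       // the HWM counts messages, not bytes
    zmq_setsockopt(push, ZMQ_SNDHWM, &hwm, sizeof hwm);
    zmq_connect(push, "tcp://127.0.0.1:5555");          // nothing needs to be listening here

    for (int i = 0; i < 100; ++i)
    {
        // With no peer consuming, sends succeed until the local pipe is full, then a
        // non-blocking send reports EAGAIN -- no ACKs from the receiver are involved.
        if (zmq_send(push, "msg", 3, ZMQ_DONTWAIT) == -1 && zmq_errno() == EAGAIN)
        {
            std::printf("local queue full after roughly %d messages\n", i);
            break;
        }
    }

    zmq_close(push);
    zmq_ctx_term(ctx);
    return 0;
}
```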
Possible Uses
Nonetheless, a potentially useful aspect is related to the fact that ZeroMQ is Actor Model. Suppose one were sending messages, and it mattered whether or not those messages got through. In the situation where the link has collapsed (something that ZeroMQ's heartbeat can tell you pretty quickly), messages already sent are potentially lost forever. However, if the HWM is being used to throttle the rate at which the application sends messages, then the number of messages lost when the link breaks is minimised.
Obviously with CSP - the perfect architecture so far as I'm concerned! - you lose no messages (because the acts of sending and receiving are an execution rendezvous; the send won't complete until the receive has also completed).
What I have done in the past is to queue up messages for transmission in the sending application, sending them as and when the socket / connection can ingest them. Having the outbound message queue in the sending application's control (instead of in ZeroMQ's control) means that sender state can potentially get ahead of the transfer of messages, but still recover easily from a network connection fault.
I have written systems where a sender has a choice of two pathways to send messages through, prime and spare, and if the link to prime has collapsed the sender continues to send to spare instead. Having queued the messages inside the application and not in the socket allows the sender's state to get ahead of the actual transfer of messages, knowing that if a link goes down it still has all the unsent outbound messages that have been generated in the meantime. These can then be directed at spare instead, without having to rewind the sender's internal state (which could be really tricky) to the last known successful transfer.
Something like that, anyway.
"Why not send to both prime and spare anyway?" is a valid question. Well, sometimes things can be complicated...

C++ IRC Client design

I'm attempting to write an RFC 2812 compliant C++ IRC library.
I am having some trouble with the design of the client itself.
From what I have read IRC communication tends to be asynchronous.
I am using boost::asio::async_read and boost::asio::async_write.
From reading the documentation I have gathered that you cannot perform multiple async_write requests before one is completed. You therefore end up with rather nested callbacks. Doesn't this defeat the purpose of doing async calls? Wouldn't it just be better to use synchronous calls to prevent the nesting? If not, why?
Secondly, if I am not mistaken, each boost::asio::async_write should be followed up by a boost::asio::async_read to receive the server's response to the commands sent. My client's functions, therefore, would need to take a callback parameter so a user of the class may do something after the client receives a response (ex. send another message...).
If I were to continue implementing this with async, should I keep a std::deque<std::tuple<message, callback>> and, each time a boost::asio::async_write finishes and there is a tuple in the queue, dequeue the next message, send it, and then invoke its callback? Would this be the optimal way to implement this system?
I'm thinking since messages are sent all the time I'm going to have to implement some kind of listener loop that queues up responses, but how would you associate these responses with the specific command that triggered them? Or in the case that the response is just a message to the channel from another user?
The IRC protocol is a full-duplex protocol. As such, one should always be listening to the server connection, expecting commands to process. It could be argued that one should primarily use the messages received from the server to update state, rather than correlating requests and responses, as the server may not respond to a command or may respond much later than expected. For example, one may issue a WHOIS command but receive multiple PRIVMSG commands before receiving a response to the WHOIS. For a chat client, a user would likely expect to be able to receive chat messages while waiting for a response to WHOIS. Hence, an async_write()-to-async_read() call chain may not be ideal for handling the protocol.
For a given socket, the Asio documentation does recommend not initiating additional read operations while there is an outstanding composed read operation, and not initiating additional write operations while there is an outstanding composed write operation. Queuing up messages and having an asynchronous call chain process the queue is a great way to fulfill this recommendation. Consider reading this answer for a nice solution using a queue and an asynchronous call chain.
Also, be aware that the server may send a PING command even on an active connection. When the client is responding with a PONG command, it may be necessary to insert the PONG command near the front of the outbound queue so that it gets sent out as soon as possible.
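As a rough illustration of that queue-plus-call-chain idea (the Connection class and send() function below are mine, not from any particular library; a single-threaded io_context or an implicit strand is assumed), urgent messages such as PONG can simply be pushed to the front of the pending queue:

```cpp
#include <boost/asio.hpp>
#include <deque>
#include <string>

class Connection
{
public:
    explicit Connection(boost::asio::io_context& io) : socket_(io) {}

    boost::asio::ip::tcp::socket& socket() { return socket_; }

    // Queue a command; start the write chain if it is currently idle.
    void send(std::string line, bool urgent = false)
    {
        line += "\r\n";                               // IRC lines are CRLF-terminated
        if (urgent)
            pending_.push_front(std::move(line));     // e.g. PONG jumps the queue
        else
            pending_.push_back(std::move(line));
        if (!writing_)
            do_write();
    }

private:
    void do_write()
    {
        writing_ = true;
        current_ = std::move(pending_.front());       // keep the in-flight buffer alive
        pending_.pop_front();
        boost::asio::async_write(
            socket_, boost::asio::buffer(current_),
            [this](boost::system::error_code ec, std::size_t /*bytes*/)
            {
                if (ec) { writing_ = false; return; } // real code: report and close
                if (!pending_.empty())
                    do_write();                       // keep the chain going
                else
                    writing_ = false;
            });
    }

    boost::asio::ip::tcp::socket socket_;
    std::deque<std::string> pending_;
    std::string current_;
    bool writing_ = false;
};
```

Only one async_write is ever outstanding, which satisfies the recommendation above while still letting callers queue messages at any time.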
Doesn't this defeat the purpose of doing async calls?
The usual solution is to use strands:
Why do I need strand per connection when using boost::asio?
You are free to queue multiple asynchronous operations on the same io objects using an (implicit) strand¹.
Using a strand ensures that the completion handlers are invoked on that same logical thread.
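A tiny sketch of an explicit strand (assuming a Boost recent enough to have boost::asio::make_strand); handlers posted through it never run concurrently, even when several threads call io_context::run():

```cpp
#include <boost/asio.hpp>
#include <iostream>
#include <thread>

int main()
{
    boost::asio::io_context io;
    auto strand = boost::asio::make_strand(io);   // serialises the handlers posted through it

    for (int i = 0; i < 4; ++i)
        boost::asio::post(strand, [i] { std::cout << "handler " << i << '\n'; });

    std::thread t([&io] { io.run(); });           // two threads run the io_context,
    io.run();                                     // yet the strand handlers never overlap
    t.join();
    return 0;
}
```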
On the Protocol
You could indeed keep a queue of commands and await responses for each command before sending the next.
You might be a little smarter about this if you can spot the correlation due to the different types of replies, but then you'd need to keep queues per type of command. I'd consider that premature optimization.

boost::asio sending data faster than receiving over TCP. Or how to disable buffering

I have created a client/server program: the client starts an instance of a Writer class and the server starts an instance of a Reader class. The Writer then writes DATA_SIZE bytes of data asynchronously to the Reader every USLEEP milliseconds. Every successive async_write request by the Writer is made only after the "on write" handler from the previous request has been called.
The problem is, if the Writer (client) is writing more data into the socket than the Reader (server) is capable of receiving, this seems to be the behaviour:
The Writer starts writing into (I think) the system buffer, and even though the data has not yet been received by the Reader, it keeps calling the "on write" handler without an error.
When the buffer is full, boost::asio won't fire the "on write" handler anymore, until the buffer gets smaller.
In the meanwhile, the Reader is still receiving small chunks of data.
The fact that the Reader keeps receiving bytes after I close the Writer program seems to confirm this theory.
What I need to achieve is to prevent this buffering, because the data needs to be "real time" (as much as possible).
I'm guessing I need to use some combination of the socket options that asio offers, like no_delay or send_buffer_size, but I'm just guessing here, as I haven't had any success experimenting with these.
I think the first solution one might think of is to use UDP instead of TCP. This will indeed be the case, as I'll need to switch to UDP for other reasons in the near future anyway, but I would first like to find out how to do it with TCP, just for the sake of having it straight in my head in case I run into a similar problem some other day.
NOTE1: Before I started experimenting with asynchronous operations in asio library I had implemented this same scenario using threads, locks and asio::sockets and did not experience such buffering at that time. I had to switch to the asynchronous API because asio does not seem to allow timed interruptions of synchronous calls.
NOTE2: Here is a working example that demonstrates the problem: http://pastie.org/3122025
EDIT: I've done one more test. In my NOTE1 I mentioned that when I was using asio::iosockets I did not experience this buffering. So I wanted to be sure and created this test: http://pastie.org/3125452 It turns out that the buffering is there even with asio::iosockets, so there must have been something else that caused it to go smoothly, possibly a lower FPS.
TCP/IP is definitely geared towards maximizing throughput, as the intention of most network applications is to transfer data between hosts. In such scenarios it is expected that a transfer of N bytes will take T seconds, and clearly it doesn't matter if the receiver is a little slow to process the data. In fact, as you noticed, the TCP protocol implements a sliding window which allows the sender to buffer some data so that it is always ready to be sent, but leaves the ultimate throttling control up to the receiver. The receiver can go full speed, pace itself, or even pause the transmission.
If you don't need throughput and instead want to guarantee that the data your sender is transmitting is as close to real time as possible, then you need to make sure the sender doesn't write the next packet until it receives an acknowledgement from the receiver that the previous data packet has been processed. So instead of blindly sending packet after packet until you are blocked, define a message structure for control messages to be sent from the receiver back to the sender.
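As a sketch of that idea (synchronous calls are used here for brevity, but the same pacing works with the asynchronous API; the fixed-size packets and one-byte acknowledgement are assumptions, not part of the question's code):

```cpp
#include <boost/asio.hpp>
#include <array>
#include <vector>

using boost::asio::ip::tcp;

// Sender: write one packet, then wait for the receiver's ACK before sending the next,
// so the sender can never run more than one packet ahead of the receiver.
void send_paced(tcp::socket& sock, const std::vector<std::vector<char>>& packets)
{
    std::array<char, 1> ack{};
    for (const auto& pkt : packets)
    {
        boost::asio::write(sock, boost::asio::buffer(pkt));
        boost::asio::read(sock, boost::asio::buffer(ack));   // blocks until the ACK arrives
    }
}

// Receiver: process each packet, then acknowledge it.
void receive_paced(tcp::socket& sock, std::size_t packet_size)
{
    std::vector<char> pkt(packet_size);
    for (;;)
    {
        boost::system::error_code ec;
        boost::asio::read(sock, boost::asio::buffer(pkt), ec);
        if (ec) break;                                       // peer closed or error
        // ... process the packet (the potentially slow part) ...
        const char ack = 1;
        boost::asio::write(sock, boost::asio::buffer(&ack, 1));
    }
}
```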
Obviously with this approach your trade-off is that each sent packet is closer to the sender's real time, but you are limiting how much data you can transfer while slightly increasing the total bandwidth used by your protocol (i.e. the additional control messages). Also keep in mind that "close to real time" is relative, because you will still face delays in the network as well as limits on the receiver's ability to process data. So you might also take a look at the design constraints of your specific application to determine how "close" you really need to be.
If you need to be very close, but at the same time you don't care if packets are lost because old packet data is superseded by new data, then UDP/IP might be a better alternative. However: a) if you have reliable delivery requirements, you might end up reinventing a portion of TCP/IP's wheel; b) keep in mind that certain networks (corporate firewalls) tend to block UDP/IP while allowing TCP/IP traffic; and c) even UDP/IP won't be exactly real time.

ZeroMQ XREP -- Endpoint disappearing

I am using a standard LRU queue as defined by the ZeroMQ guide figure 41, and I am wondering how to add in protection so that I don't send messages to end points that have disappeared (server crash, OOM killer, anything along those lines).
From the documentation I read that XREP will just drop the message if it is going to a non-existent endpoint, and there is no way I get notified about that. Is there a way to get such a notification? Should I just send out a "ping" first, and if I don't get a response then that "worker" is dead meat to me? How will I know that the message I get back comes from the same client I just sent the ping to?
Or is my use case not a good one for ZeroMQ? I just want to make sure that a message has been received, I don't want it being dropped on the floor without my knowledge...
Pinging a worker to know if it is alive will cause a race condition: the worker might well answer the ping just before it dies.
However, if you assume that a worker will not die during a request processing (you can do little in this case), you can reverse the flow of communication between the workers and the central queue. Let the worker fetch a request from the queue (using a REQ/REP connection) and have it send the answer along with the original envelope when the processing is done (using the same socket as above, or even better through a separate PUSH/PULL connection).
With this scenario, you know that a dead worker will not be sent requests, as it will be unable to fetch them (being dead…). Moreover, your central queue can even ensure that it receives an answer to every request in a given time. If it does not, it can put the request back in the queue so that a new worker will fetch it shortly after. This way, even if a worker dies while processing a request, the request will eventually be served.
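A minimal sketch of that reversed flow, using the libzmq C API (the endpoint, the "READY"/"DONE" framing, and the buffer size are illustrative only and not the author's code):

```cpp
#include <zmq.h>
#include <cstring>

int main()
{
    void* ctx = zmq_ctx_new();
    void* req = zmq_socket(ctx, ZMQ_REQ);
    zmq_connect(req, "tcp://queue-host:5671");        // hypothetical central queue endpoint

    // The first request announces readiness; each later request carries the
    // previous job's result and implicitly asks for the next job.
    zmq_send(req, "READY", 5, 0);
    for (;;)
    {
        char job[256];
        int n = zmq_recv(req, job, sizeof(job), 0);   // a dead worker never fetches a job,
        if (n == -1)                                  // so the queue can reassign it
            break;

        // ... process the job ...
        const char* result = "DONE";                  // placeholder result

        zmq_send(req, result, std::strlen(result), 0);
    }

    zmq_close(req);
    zmq_ctx_term(ctx);
    return 0;
}
```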
(as a side note: be careful if the worker crashes because of a particular request - you do not want to kill your workers one by one, and might want to put a maximum number of tries for a request)
Edit: I wrote some code implementing the other direction to explain what I mean.

Optimally reading data from an Asynchronous Socket

I have a problem with a socket library that uses WSAAsyncSelect to put the socket into asynchronous mode. In asynchronous mode the socket is placed into non-blocking mode (WSAEWOULDBLOCK is returned on any operation that would block) and Windows messages are posted to a notification window to inform the application when the socket is ready to be read from, written to, etc.
My problem is this: when receiving an FD_READ event I don't know how many bytes to try to recv. If I pass a buffer that's too small, then Winsock will automatically post another FD_READ event telling me there's more data to read. If data is arriving very fast, this can saturate the message queue with FD_READ messages, and as WM_TIMER and WM_PAINT messages are only posted when the message queue is empty, this means that an application could stop painting if it's receiving a lot of data and using asynchronous sockets with too small a buffer.
How large to make the buffer, then? I tried using ioctlsocket(FIONREAD) to get the number of bytes to read and make a buffer exactly that large, but KB192599 explicitly warns that that approach is fraught with inefficiency.
How do I pick a buffer size that's big enough, but not crazy big?
As far as I could ever work out, the value set using setsockopt with the SO_RCVBUF option is an upper bound on the FIONREAD value. So rather than call ioctlsocket, it should be OK to call getsockopt to find out the SO_RCVBUF setting and use that as the (attempted) value for each recv.
Based on your comment to Aviad P.'s answer, it sounds like this would solve your problem.
(Disclaimer: I have always used FIONREAD myself. But after reading the linked-to KB article I will probably be changing...)
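A minimal sketch of that suggestion (Winsock, C++; the function names and the fallback size are mine):

```cpp
#include <winsock2.h>
#include <vector>

// Size the application's read buffer from the socket's SO_RCVBUF setting.
std::vector<char> MakeReadBuffer(SOCKET s)
{
    int rcvbuf = 0;
    int len = sizeof(rcvbuf);
    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF,
                   reinterpret_cast<char*>(&rcvbuf), &len) != 0 || rcvbuf <= 0)
    {
        rcvbuf = 64 * 1024;                 // arbitrary fallback if the query fails
    }
    return std::vector<char>(static_cast<std::size_t>(rcvbuf));
}

// Called on each FD_READ notification; the socket is non-blocking.
void OnFdRead(SOCKET s, std::vector<char>& buf)
{
    int n = recv(s, buf.data(), static_cast<int>(buf.size()), 0);
    if (n > 0)
    {
        // ... hand buf[0..n) to the application ...
    }
    // n == SOCKET_ERROR with WSAEWOULDBLOCK just means nothing is available right
    // now; Winsock will post another FD_READ when more data arrives.
}
```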
You can set your buffer to be as big as you can without impacting performance, relying on the TCP PUSH flag to make your reads return before filling the buffer if the sender sent a smaller message.
The TCP PUSH flag is set at a logical message boundary (normally after a send operation, unless explicitly set to false). When the receiving end sees the PUSH flag on a TCP packet, it returns any blocking reads (or asynchronous reads, doesn't matter) with whatever's accumulated in the receive buffer up to the PUSH point.
So if your sender is sending reasonably sized messages, you're fine; if it's not, then you limit your buffer size such that even if you fill it completely, you don't negatively impact performance (subjective).
