When does an asynchronous routine change to a synchronous one? - parallel-processing

From what I know, an asynchronous send (MPI_Isend) changes to a synchronous one (MPI_Send) when the buffer is full, so it must wait until appropriate space is available. I wanted to know whether an asynchronous receive (MPI_Irecv) likewise changes to a synchronous one (MPI_Recv) when the buffer is empty?

As far as I know, MPI_Isend will not turn into a blocking routine because of buffers filling up. The reason is that MPI_Isend uses the user's send buffer directly, so there is no internal buffering to run out of.
Maybe you should clarify what you mean by "the buffer": the user's send buffer, or something internal? The user buffer is never "full" or "empty"; the only thing you can say is that before MPI_Wait the buffer must not be reused, and after the wait it safely can be.
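To make that rule concrete, here is a minimal sketch (assuming exactly two ranks; the tag and payload are arbitrary) of the only guarantee that exists around the user buffer:

#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int payload = 42;
    if (rank == 0)
    {
        MPI_Request req;
        // Returns immediately; the library reads payload in place,
        // so payload must not be modified yet.
        MPI_Isend(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        payload = 0;   // safe only after the wait has completed
    }
    else if (rank == 1)
    {
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}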

Related

MPI, If using non-blocking i_send or i_recv, does it matter which one goes first if there is a wait at the end?

If I call MPI_Isend then MPI_Irecv, or do the Irecv first and then the Isend, with a wait at the end, does it matter which order?
MPI_Irecv and MPI_Isend are nonblocking communication routines, so one needs to use MPI_Wait (or MPI_Test, to test for completion of the request) to ensure that the message has completed and that the data in the send/receive buffer can again be safely manipulated. Because neither call blocks, it does not matter which one is posted first, as long as both requests are completed by the wait at the end.
Nonblocking here means that one does not wait for the data to be read and sent; rather, the data is immediately made available to be read and sent. That does not imply, however, that the data is sent immediately. If it were, there would be no need to call MPI_Wait.
Quoting Hristo Iliev (source):
You must always wait on or test nonblocking operations if you'd like
your programs to be standard compliant and hence portable. The
standard allows the implementation to postpone the actual data
transmission until the wait/test call. Some MPI operations (other than
wait/test) progress non-blocking operations but one should not rely on
this behaviour.
and
MPI_Isend does not necessarily progress in the background but only
when the MPI implementation is given the chance to progress it.
MPI_Wait progresses the operation and guarantees its completion. Some
MPI implementations can progress operations in the background using
progression threads. Some cannot. It is implementation-dependent and
one should never rely on one or another specific behaviour.
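To illustrate, a minimal sketch of a two-rank exchange (the buffer names and tag are my own; assume exactly two ranks): swapping the MPI_Isend and MPI_Irecv lines changes nothing, because each call only starts its operation, and MPI_Waitall completes both.

#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int peer = 1 - rank;          // the other of the two ranks
    int sendbuf = rank, recvbuf = -1;
    MPI_Request reqs[2];

    // The order of the next two lines could be swapped with no effect:
    // both calls only *start* the operations.
    MPI_Isend(&sendbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&recvbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);   // completes both
    MPI_Finalize();
    return 0;
}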

MPI_SSEND: How does it guarantee reuse of the sender buffer?

I understand that MPI_Bsend will save the sender's buffer in a local buffer managed by the MPI library, hence it is safe to reuse the sender's buffer for other purposes.
What I do not understand is how MPI_Ssend guarantees this.
Send in synchronous mode.
A send that uses the synchronous mode can be started whether or not a matching receive was posted. However, the send will complete successfully only if a matching receive is posted, and the receive operation has started to receive the message sent by the synchronous send. Thus, the completion of a synchronous send not only indicates that the send buffer can be reused, but also indicates that the receiver has reached a certain point in its execution, namely that it has started executing the matching receive.
As per the above, MPI_Ssend will return (i.e. allow further program execution) once a matching receive has been posted and it has started to receive the message sent by the synchronous send. Consider the following case:
I send a huge array of int, say data[1 million], via MPI_Ssend. Another process starts receiving it (but might not have done so completely), which allows MPI_Ssend to return and execute the next program statement. The next statement changes the buffer at the very end: data[1 million] = /* new value */. Then MPI_Ssend finally reaches the end of the buffer and sends this new value, which is not what I wanted.
What am I missing in this picture?
TIA
MPI_Ssend() is both blocking and synchronous.
From the MPI 3.1 standard (chapter 3.4, page 37):
Thus, the completion of a synchronous send not only indicates that the send buffer can be reused, but it also indicates that the receiver has reached a certain point in its execution, namely that it has started executing the matching receive.
The part you are missing is the "blocking" half: MPI_Ssend() does not return until the send has completed locally, and completion by definition means the library no longer needs the send buffer. MPI_Ssend() cannot return while it still has to read from data, so by the time your next statement runs, the new value can no longer leak into the message.
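To make the timing explicit, here is a minimal sketch (the array becomes a vector of one million ints; two ranks assumed): the assignment after MPI_Ssend cannot race with the transfer, because the call does not return until the buffer is free.

#include <mpi.h>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<int> data(1000000, 7);
    if (rank == 0)
    {
        // Blocks until the library no longer needs data[].
        MPI_Ssend(data.data(), (int)data.size(), MPI_INT, 1, 0,
                  MPI_COMM_WORLD);
        data.back() = 99;   // cannot affect the message already handed off
    }
    else if (rank == 1)
    {
        MPI_Recv(data.data(), (int)data.size(), MPI_INT, 0, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}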

boost::asio sending data faster than receiving over TCP. Or how to disable buffering

I have created a client/server program; the client starts an instance of the Writer class and the server starts an instance of the Reader class. The Writer then writes DATA_SIZE bytes of data asynchronously to the Reader every USLEEP milliseconds.
Every successive async_write request by the Writer is done only if the "on write" handler from the previous request has been called.
The problem is, if the Writer (client) is writing more data into the socket than the Reader (server) is capable of receiving, this seems to be the behaviour:
The Writer will start writing into (I think) the system buffer, and even though the data has not yet been received by the Reader, it will keep calling the "on write" handler without an error.
When the buffer is full, boost::asio won't fire the "on write" handler anymore, until the buffer gets smaller.
In the meantime, the Reader is still receiving small chunks of data.
The fact that the Reader keeps receiving bytes after I close the Writer program seems to prove this theory correct.
What I need to achieve is to prevent this buffering, because the data needs to be "real time" (as much as possible).
I'm guessing I need to use some combination of the socket options that asio offers, like no_delay or send_buffer_size, but I'm just guessing here, as I haven't had success experimenting with these.
I think the first solution one can think of is to use UDP instead of TCP. This will be the case anyway, as I'll need to switch to UDP for other reasons in the near future, but I would first like to find out how to do it with TCP, just for the sake of having it straight in my head in case I have a similar problem some other day.
NOTE1: Before I started experimenting with asynchronous operations in the asio library, I had implemented this same scenario using threads, locks, and asio::sockets, and did not experience such buffering at that time. I had to switch to the asynchronous API because asio does not seem to allow timed interruptions of synchronous calls.
NOTE2: Here is a working example that demonstrates the problem: http://pastie.org/3122025
EDIT: I've done one more test. In NOTE1 I mentioned that when I was using asio::iosockets I did not experience this buffering, so I wanted to be sure and created this test: http://pastie.org/3125452 It turns out that the buffering is there even with asio::iosockets, so there must have been something else that made it go smoothly before, possibly a lower FPS.
TCP/IP is definitely geared toward maximizing throughput, as the intention of most network applications is to transfer data between hosts. In such scenarios, it is expected that a transfer of N bytes will take T seconds, and clearly it doesn't matter if the receiver is a little slow to process the data. In fact, as you noticed, the TCP/IP protocol implements a sliding window, which allows the sender to buffer some data so that it is always ready to be sent, but leaves the ultimate throttling control up to the receiver. The receiver can go full speed, pace itself, or even pause the transmission.
If you don't need throughput and instead want to guarantee that the data your sender is transmitting is as close to real time as possible, then what you need is to make sure the sender doesn't write the next packet until it receives an acknowledgement from the receiver that it has processed the previous data packet. So instead of blindly sending packet after packet until you are blocked, define a message structure for control messages to be sent from the receiver back to the sender.
Obviously with this approach your trade-off is that each sent packet is closer to the sender's real time, but you are limiting how much data you can transfer, while slightly increasing the total bandwidth used by your protocol (i.e. the additional control messages). Also keep in mind that "close to real time" is relative, because you will still face delays in the network as well as in the receiver's ability to process data. So you might also take a look at the design constraints of your specific application to determine how "close" you really need to be.
If you need to be very close, but at the same time you don't care if packets are lost because old packet data is superseded by new data, then UDP/IP might be a better alternative. However, (a) if you have reliable delivery requirements, you might end up reinventing a portion of TCP/IP's wheel, (b) keep in mind that certain networks (corporate firewalls) tend to block UDP/IP while allowing TCP/IP traffic, and (c) even UDP/IP won't be exactly real time.
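Going back to the acknowledgement scheme above, here is a minimal sketch using synchronous boost::asio calls for brevity; the one-byte ACK protocol and the function name are my own invention, not part of asio:

// Sketch: application-level flow control over TCP with boost::asio.
// The sender transmits one packet, then blocks until the receiver
// sends back a one-byte ACK meaning "processed, send the next one".
#include <boost/asio.hpp>
#include <vector>

using boost::asio::ip::tcp;

void send_packet_and_wait_ack(tcp::socket& sock,
                              const std::vector<char>& payload)
{
    boost::asio::write(sock, boost::asio::buffer(payload));
    char ack = 0;
    boost::asio::read(sock, boost::asio::buffer(&ack, 1)); // block for ACK
    // Only now is the next packet produced, so at most one packet
    // is ever sitting in the socket/system buffers.
}

// Receiver side, after fully processing each packet:
//   char ack = 1;
//   boost::asio::write(sock, boost::asio::buffer(&ack, 1));

Disabling Nagle's algorithm on both sockets (sock.set_option(tcp::no_delay(true))) is also worth doing with a scheme like this, so the small ACK packets are not delayed by the stack.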

How to find out when CancelIo() is done?

CancelIo() is supposed to cancel all pending I/O operations associated with the calling thread. In my experience, CancelIo() sometimes cancels future I/O operations as well. Given:
ReadFile(port, buffer, length, &bytesTransferred, overlapped);
If I invoke CancelIo(port) immediately before the read, GetQueuedCompletionStatus() will block forever, never receiving the read operation.
If I invoke CancelIo(port) immediately after the read, GetQueuedCompletionStatus() will return 0 with GetLastError() == ERROR_OPERATION_ABORTED.
If I invoke CancelIo(port) and there are no pending or subsequent reads, GetQueuedCompletionStatus() will block forever.
The key point here is that there is no way to detect when CancelIo() has finished executing. How can I ensure that CancelIo() is done executing and it is safe to issue further read requests?
PS: Looking at http://osdir.com/ml/lib.boost.asio.user/2008-02/msg00074.html and http://www.boost.org/doc/libs/1_44_0/doc/html/boost_asio/using.html it sounds like CancelIo() is not really usable. My customer requires Windows XP support. What are my options?
NOTE: I am reading from a serial port.
CancelIo() works fine. I misunderstood my code.
Upon further investigation, it turns out that the code was invoking CancelIo() followed by ReadFile() with an INFINITE timeout. The completion port was never getting notified of the read because the remote end was never sending anything. In other words, CancelIo() did not cancel subsequent operations.
I found some eye-opening documentation here:
Be careful when coding for asynchronous I/O because the system reserves the right to make an operation synchronous if it needs to. Therefore, it is best if you write the program to correctly handle an I/O operation that may be completed either synchronously or asynchronously. The sample code demonstrates this consideration.
It turns out that device drivers may choose to treat an asynchronous operation in a synchronous manner if the data being read is already cached by the device driver. Upon further investigation, I discovered that when CancelIo() was invoked before ReadFile(), it would sometimes cause the latter to return synchronously. I have no idea why the completion port was never getting notified of ReadFile() after a CancelIo(), but I can no longer reproduce this problem.
The completion port is signaled regardless of whether ReadFile() is synchronous or asynchronous.
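For reference, here is a sketch of issuing an overlapped read that tolerates both outcomes (port, buffer and length as in the question; the OVERLAPPED handling is the standard pattern, not code from the original post):

// Sketch: an overlapped ReadFile() may complete synchronously or
// asynchronously; handle both paths.
OVERLAPPED ov = {};   // zero-initialized; offsets unused for a serial port
if (!ReadFile(port, buffer, length, NULL, &ov))
{
    if (GetLastError() != ERROR_IO_PENDING)
    {
        // real failure: handle the error here
    }
    // ERROR_IO_PENDING: the read will complete later via the port.
}
// Even if ReadFile() returned TRUE (synchronous completion), a
// completion packet is still queued, so GetQueuedCompletionStatus()
// sees the operation either way.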
Wait on overlapped.hEvent (possibly with a zero timeout). It will be signaled whether the operation completed or was cancelled.
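A sketch of that approach (assuming the OVERLAPPED structure was given an event via CreateEvent() when the read was issued):

// Sketch: use the overlapped event to learn when the request has
// actually left the driver, whether it completed or was cancelled.
CancelIo(port);
WaitForSingleObject(ov.hEvent, INFINITE);   // or 0 to poll
DWORD bytes = 0;
if (!GetOverlappedResult(port, &ov, &bytes, FALSE) &&
    GetLastError() == ERROR_OPERATION_ABORTED)
{
    // the read was cancelled; it is now safe to issue a new one
}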
If you're already using overlapped operations, why do you need to cancel I/O at all? The entire concept of 'cancelling' an in-flight I/O operation is really race-prone, and totally subject to the underlying device stack you're writing to; really, the only time you'd want to do this is to unblock another thread that is waiting on the completion of that I/O.
It is possible to write asynchronous I/O code without the CancelIo function; it depends on the scenario in which you are using CancelIo. Let's say you need to implement a file-reading thread. Thread pseudo-code:
for (;;)
{
    // overlapped.hEvent was created earlier with CreateEvent()
    ReadFile(port, buffer, length, &bytesTransferred, &overlapped);

    HANDLE handles[2] = { overlapped.hEvent, stopEvent };
    DWORD r = WaitForMultipleObjects(2, handles, FALSE, INFINITE);

    if (r == WAIT_OBJECT_0 + 1)        // stop event is signaled
        break;
    if (r == WAIT_OBJECT_0)            // overlapped event is signaled
    {
        // handle ReadFile results in buffer
    }
}
Such a thread reads a file (socket, port, etc.) using overlapped I/O. Most of the time it waits on the WaitForMultipleObjects line. It wakes up when new data is available or when the stop event is signaled. To stop the thread, set the stop event from another thread. CancelIo() is not used.

Optimally reading data from an Asynchronous Socket

I have a problem with a socket library that uses WSAAsyncSelect to put the socket into asynchronous mode. In asynchronous mode the socket is placed into a non-blocking mode (WSAEWOULDBLOCK is returned on any operations that would block) and Windows messages are posted to a notification window to inform the application when the socket is ready to be read from, written to, etc.
My problem is this: when receiving an FD_READ event I don't know how many bytes to try to recv. If I pass a buffer that's too small, then Winsock will automatically post another FD_READ event telling me there's more data to read. If data is arriving very fast, this can saturate the message queue with FD_READ messages, and as WM_TIMER and WM_PAINT messages are only posted when the message queue is empty, this means that an application could stop painting if it's receiving a lot of data and using asynchronous sockets with a too-small buffer.
How large to make the buffer, then? I tried using ioctlsocket(FIONREAD) to get the number of bytes to read and make a buffer exactly that large, BUT KB192599 explicitly warns that that approach is fraught with inefficiency.
How do I pick a buffer size that's big enough, but not crazy big?
As far as I could ever work out, the value set using setsockopt with the SO_RCVBUF option is an upper bound on the FIONREAD value. So rather than call ioctlsocket, it should be OK to call getsockopt to find out the SO_RCVBUF setting and use that as the (attempted) buffer size for each recv.
Based on your comment to Aviad P.'s answer, it sounds like this would solve your problem.
(Disclaimer: I have always used FIONREAD myself. But after reading the linked-to KB article I will probably be changing...)
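For example, a sketch of that approach inside an FD_READ handler (Winsock; sock is the hypothetical connected socket):

// Sketch: size the read buffer from the socket's SO_RCVBUF value,
// which bounds how much FIONREAD could ever report.
#include <winsock2.h>
#include <vector>

int read_with_rcvbuf_sized_buffer(SOCKET sock)
{
    int rcvbuf = 0;
    int optlen = sizeof(rcvbuf);
    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                   reinterpret_cast<char*>(&rcvbuf), &optlen) != 0)
        return SOCKET_ERROR;

    // Fall back to an arbitrary size if the option reports nothing useful.
    std::vector<char> buffer(rcvbuf > 0 ? rcvbuf : 65536);
    return recv(sock, buffer.data(), static_cast<int>(buffer.size()), 0);
}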
You can set your buffer to be as big as you can without impacting performance, relying on the TCP PUSH flag to make your reads return before filling the buffer if the sender sent a smaller message.
The TCP PUSH flag is set at a logical message boundary (normally after a send operation, unless explicitly set to false). When the receiving end sees the PUSH flag on a TCP packet, it returns any blocking reads (or asynchronous reads, doesn't matter) with whatever's accumulated in the receive buffer up to the PUSH point.
So if your sender is sending reasonably sized messages, you're OK; if it's not, then you limit your buffer size such that even if you fill it completely, you don't negatively impact performance (subjective).
