MPI posting meaning - parallel-processing

I have Googled the meaning of "posting" but haven't found a concrete answer. For instance, in the phrase:
once the matching receive is posted
does "posted" here mean that the receive operation has been started or that has been successfully completed?
Any comment is welcome, thanks.

To post a receive operation means to call the corresponding MPI function that initiates the operation. This could be the blocking receive MPI_Recv or the initiation of a non-blocking receive via MPI_Irecv. The same applies to send operations. Completion comes after posting an operation and is a separate thing, although blocking MPI operations combine both.
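
For illustration, a minimal sketch (mine, not from the answer) that separates the two steps: MPI_Irecv posts the receive, MPI_Wait completes it; the blocking MPI_Recv would combine both in a single call.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int value = 0;
            MPI_Request req;

            /* Post the receive: the operation is initiated but not yet complete. */
            MPI_Irecv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);

            /* ... useful work can overlap with the pending receive here ... */

            /* Completion is a separate step. */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            printf("received %d\n", value);
        } else if (rank == 1) {
            int value = 42;
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }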

Related

Microservice Architecture: Can you eliminate the synchronous calls between services completely in a system?

Everywhere you read about microservices, it says microservices should communicate asynchronously. It is understandable why asynchronous communication is preferred: it removes dependencies and provides loose coupling, availability, and so on.
Suppose there is a common authorization service that is invoked every time a user calls an API. In this scenario you cannot move further until you have the response from the authorization service. You can call the authorization service asynchronously using async I/O, but it is still a request/reply pattern.
Questions I have:
Is it possible to get rid of synchronous communication, or more precisely the request/reply pattern, in a microservices-based system design?
It is possible to implement a request/reply pattern asynchronously through messaging and callbacks, which adds significant overhead and latency, but is it worth converting every request/reply to an asynchronous one?
If synchronous calls cannot be eliminated completely, in which scenarios is it OK to have synchronous calls among microservices?
I think the short answer to your question is: the request-reply pattern doesn't mean synchronous. It can also be asynchronous, which you already mentioned.
Long answer:
Request-reply is just a principle. For example, you send an email to a friend. The message contains data relevant to you and you are expecting a response, but you didn't say so explicitly. Your friend will see the email when he gets back from work, and then he may or may not reply to you. Only you know that you need an answer from him.
Now there are a few options while waiting for your response: either block your entire life until your friend responds (which would be synchronous communication), or do something else until the response arrives in your inbox (which is asynchronous).
Now, to the point:
Is it possible to get rid of synchronous communication, or more precisely the request/reply pattern, in a microservices-based system design?
Yes, and you have already answered that in your second point. Even though it is possible, I think it should only be done where it is required.
It is possible to implement a request/reply pattern asynchronously through messaging and callbacks, which adds significant overhead and latency, but is it worth converting every request/reply to an asynchronous one?
For the right scenario, yes. Messaging systems have very good performance, so latency should not be an issue. When a latency problem does occur in a messaging system, there are other options to improve it.
If synchronous calls cannot be eliminated completely, in which scenarios is it OK to have synchronous calls among microservices?
Yes.
There is one more thing that needs to be added: synchronous doesn't always mean blocking. In a reactive world, if you make an HTTP call to another service, the caller sends the request and then awaits the response in a non-blocking manner. When the response arrives, the caller is notified that the response has arrived and the process continues. While "awaiting", the CPU can do other work.
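As a rough illustration of that last point (this is my own sketch, not anything from the question, and it fakes the remote call with a worker thread rather than real non-blocking I/O): the request to a hypothetical authorization check is issued, the caller keeps doing other work, and the reply is collected only when it is actually needed.

    #include <chrono>
    #include <future>
    #include <iostream>
    #include <string>
    #include <thread>

    // Hypothetical stand-in for a request/reply call to a remote authorization service.
    bool authorize(const std::string& user)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(100)); // simulated network latency
        return user == "alice";
    }

    int main()
    {
        // Issue the request; the reply will be produced asynchronously.
        std::future<bool> reply = std::async(std::launch::async, authorize, "alice");

        // The caller is not blocked while the reply is in flight.
        std::cout << "doing other work...\n";

        // Collect the reply only when it is actually needed.
        std::cout << "authorized: " << std::boolalpha << reply.get() << '\n';
        return 0;
    }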

Latest Windows threadpool API usage for I/O

I don't understand part of the latest Windows threadpool API. I need help with that.
From the documentation, the recipe to use it for I/O (in my case, for SOCKET) can be summarized as follows:
1. Call CreateThreadpoolIo.
2. Call StartThreadpoolIo. You can find this warning there:
   "You must call this function before initiating each asynchronous I/O operation on the file handle bound to the I/O completion object. Failure to do so will cause the thread pool to ignore an I/O operation when it completes and will cause memory corruption."
3. Call the operation on the file handle (e.g., WSARecvFrom). If it fails, call CancelThreadpoolIo. Otherwise, process the result when it is available. WSARecvFrom, when used asynchronously, asks for a WSAOVERLAPPED (that you have to create beforehand) but not for any information that links it to the previous call to StartThreadpoolIo. CancelThreadpoolIo only asks for the PTP_IO, not for any additional information that would identify a specific asynchronous operation.
4. Repeat steps 2 and 3.
5. Call CloseThreadpoolIo to finish. You can find this warning there:
   "It may be necessary to cancel threadpool I/O notifications to prevent memory leaks. For more information, see CancelThreadpoolIo."
I usually need it for UDP, so I strive to have several reception operations queued (asynchronous WSARecvFrom operations started) at any given time. That way I don't have to rush to start another reception operation at the beginning of the callback function, nor synchronize access to the reception buffers (I can have a pool of them, each one able to contain a datagram, and reissue the reception operation when I finish processing each message; in the interim, the other queued operations keep the receiver busy). Datagrams are independent and self-contained. I'm aware that this approach may not be valid for TCP.
StartThreadpoolIo/CancelThreadpoolIo seem to me to be the source of the problem: StartThreadpoolIo and WSARecvFrom are not directly bound (they don't share any arguments). So:
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed and not any of the pending ones?
You can say, "don't call StartThreadpoolIo concurrently". I can live without several concurrent WSARecvFrom's, but I can't live without concurrent WSARecvFrom and WSASendTo. So I think being unable to have several asynchronous operations at the same time can't be the way the API was designed.
You can say, "call StartThreadpoolIo only once, that will suffice to register the callback; it is an on/off process". But the documentation says:
You must call this function before initiating each asynchronous I/O operation on the file handle...
You can say, "it cancels the operation started by the same thread that just called StartThreadpoolIo". But then the advice of calling CancelThreadpoolIo in the context of calling CloseThreadpoolIo doesn't make sense (I will call CloseThreadpoolIo from the thread that triggers stopping, which will be completely independent from the threads issuing the asynchronous operations; and a single call to CancelThreadpoolIo may not be enough to cancel several operations). Being unable to trigger cancellation from a different thread is a serious limitation, anyway. I'm aware of the existence of CreateThreadpoolCleanupGroup, but my question is more fundamental. I want to understand how this API can be fundamentally right and useful.
You can say "call CreateThreadpoolIo several times, so that you have independent PTP_IO's to work with". It doesn't work. When I call CreateThreadpoolIo a second time, nullptr is returned.
Am I wrong, or is this API awkward? Normally, other asynchronous APIs work with one of these patterns:
Create an operation and receive a handle => call methods passing the handle.
Create a reusable handle => call methods (including starting operations) passing the handle.
The latest Windows threadpool API uses neither of them: the handle seems to be implicit, or there are several handles for the same operation (TP_IO, WSAOVERLAPPED, StartThreadpoolIo) that aren't all explicitly linked together.
Thank you very much for your help.
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed and not any of the pending ones?
CancelThreadpoolIo() doesn't cancel I/O. It is the reciprocal of StartThreadpoolIo(). StartThreadpoolIo() prepares the thread pool to accept a completion. If the thread pool doesn't expect a completion, it won't wait for it, so you may miss it. If the thread pool expects a completion that never happens, it may waste resources.
CancelThreadpoolIo() undoes whatever StartThreadpoolIo() did.
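
To make the pairing concrete, here is a rough UDP sketch along the lines of the question (the structure, names and simplified shutdown are mine; error handling is abbreviated). Each receive is preceded by its own StartThreadpoolIo, CancelThreadpoolIo is called only when WSARecvFrom fails immediately (so no completion will ever arrive), and per-operation state travels in a struct that embeds the WSAOVERLAPPED, which comes back as the Overlapped argument of the callback. Several operations can be outstanding at once because each carries its own WSAOVERLAPPED; the PTP_IO is only told, via Start/Cancel, how many completions to expect.

    // Link with ws2_32.lib.
    #include <winsock2.h>
    #include <ws2tcpip.h>
    #include <windows.h>
    #include <stdio.h>

    struct RecvOp {
        WSAOVERLAPPED ov;       // first member, so the callback's Overlapped pointer == &RecvOp
        char          buf[1500];
        WSABUF        wsabuf;
        sockaddr_in   from;
        int           fromLen;
    };

    static SOCKET g_sock;
    static PTP_IO g_io;

    static void PostRecv(RecvOp* op);   // forward declaration

    static VOID CALLBACK IoDone(PTP_CALLBACK_INSTANCE, PVOID /*context*/, PVOID overlapped,
                                ULONG ioResult, ULONG_PTR bytes, PTP_IO)
    {
        RecvOp* op = reinterpret_cast<RecvOp*>(overlapped);
        if (ioResult != NO_ERROR)       // e.g. aborted because the socket was closed
            return;                     // do not re-post
        printf("datagram of %llu bytes\n", (unsigned long long)bytes);
        PostRecv(op);                   // reuse this slot for the next receive
    }

    static void PostRecv(RecvOp* op)
    {
        ZeroMemory(&op->ov, sizeof(op->ov));
        op->wsabuf.buf = op->buf;
        op->wsabuf.len = sizeof(op->buf);
        op->fromLen    = sizeof(op->from);
        DWORD flags    = 0;

        StartThreadpoolIo(g_io);        // before *each* asynchronous operation
        int rc = WSARecvFrom(g_sock, &op->wsabuf, 1, nullptr, &flags,
                             (sockaddr*)&op->from, &op->fromLen, &op->ov, nullptr);
        if (rc == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING)
            CancelThreadpoolIo(g_io);   // undo the Start: this operation will never complete
    }

    int main()
    {
        WSADATA wsa;
        WSAStartup(MAKEWORD(2, 2), &wsa);

        g_sock = WSASocketW(AF_INET, SOCK_DGRAM, IPPROTO_UDP, nullptr, 0, WSA_FLAG_OVERLAPPED);
        sockaddr_in addr = {};
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(9999);
        bind(g_sock, (sockaddr*)&addr, sizeof(addr));

        g_io = CreateThreadpoolIo((HANDLE)g_sock, IoDone, nullptr, nullptr);

        static RecvOp ops[4];           // several receives queued at the same time
        for (RecvOp& op : ops)
            PostRecv(&op);

        Sleep(60 * 1000);               // stand-in for the application's lifetime

        // Simplified shutdown: closing the socket aborts the pending receives,
        // their callbacks see an error and stop re-posting.
        closesocket(g_sock);
        WaitForThreadpoolIoCallbacks(g_io, FALSE);
        CloseThreadpoolIo(g_io);
        WSACleanup();
        return 0;
    }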

C++ IRC Client design

I'm attempting to write an RFC 2812 compliant C++ IRC library.
I am having some trouble with the design of the client itself.
From what I have read IRC communication tends to be asynchronous.
I am using boost::asio::async_read and boost::asio::async_write.
From reading the documentation I have gathered that you cannot perform multiple async_write requests before one is completed. You therefore end up with rather nested callbacks. Doesn't this defeat the purpose of doing async calls? Wouldn't it just be better to use synchronous calls to prevent the nesting? If not, why?
Secondly, if I am not mistaken, each boost::asio::async_write should be followed by a boost::asio::async_read to receive the server's response to the commands sent. My client's functions, therefore, would need to take a callback parameter so a user of the class may do something after the client receives a response (e.g., send another message).
If I were to continue implementing this with async, should I keep a std::deque<std::tuple<message, callback>> and, each time a boost::asio::async_write finishes and there is a tuple in the queue, dequeue and send the message, then raise the callback? Would this be the optimal way to implement this system?
I'm thinking since messages are sent all the time I'm going to have to implement some kind of listener loop that queues up responses, but how would you associate these responses with the specific command that triggered them? Or in the case that the response is just a message to the channel from another user?
The IRC protocol is a full-duplex protocol. As such, one should always be listening to the server connection, expecting commands to process. It could be argued that one should primarily use the messages received from the server to update state, rather than correlating requests and responses, as the server may not respond to a command or may respond much later than expected. For example, one may issue a WHOIS command, but receive multiple PRIVMSG commands before receiving a response to WHOIS. For a chat client, a user would likely expect to be able to receive chat messages while waiting for the response to WHOIS. Hence, an async_write()-to-async_read() call chain may not be ideal for handling the protocol.
For a given socket, the Asio documentation does recommend not initiating additional read operations if there is an outstanding composed read operation, and not initiating additional write operations if there is an outstanding composed write operation. Queuing up messages and having an asynchronous call chain process the queue is a great way to fulfill this recommendation. Consider reading this answer for a nice solution using a queue and an asynchronous call chain.
Also, be aware that the server may send a PING command even on an active connection. When the client is responding with a PONG command, it may be necessary to insert the PONG command near the front of the outbound queue so that it gets sent out as soon as possible.
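A minimal sketch of that queue-plus-chain approach (the class and member names are mine; it assumes a single-threaded io_context or a strand, and it omits connecting, reading and object lifetime management):

    #include <boost/asio.hpp>
    #include <deque>
    #include <string>
    #include <utility>

    using boost::asio::ip::tcp;

    class irc_connection {
    public:
        explicit irc_connection(boost::asio::io_context& io) : socket_(io) {}

        tcp::socket& socket() { return socket_; }   // connect/resolve elsewhere

        // Normal commands go to the back of the queue.
        void send(std::string line) { enqueue(std::move(line), /*urgent=*/false); }

        // PONG replies go near the front so they are sent as soon as possible.
        void send_urgent(std::string line) { enqueue(std::move(line), /*urgent=*/true); }

    private:
        void enqueue(std::string line, bool urgent)
        {
            line += "\r\n";
            const bool write_in_progress = !outbox_.empty();
            if (urgent && write_in_progress)
                outbox_.insert(outbox_.begin() + 1, std::move(line)); // slot 0 is being written
            else if (urgent)
                outbox_.push_front(std::move(line));
            else
                outbox_.push_back(std::move(line));
            if (!write_in_progress)
                do_write();               // start the chain only when it is idle
        }

        void do_write()
        {
            boost::asio::async_write(
                socket_, boost::asio::buffer(outbox_.front()),
                [this](boost::system::error_code ec, std::size_t /*bytes*/) {
                    if (ec) return;       // real code: report/close
                    outbox_.pop_front();
                    if (!outbox_.empty())
                        do_write();       // keep the chain going
                });
        }

        tcp::socket socket_;
        std::deque<std::string> outbox_;  // at most one async_write outstanding at a time
    };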
Doesn't this defeat the purpose of doing async calls?
The usual solution is to use strands:
Why do I need strand per connection when using boost::asio?
You are free to queue multiple asynchronous operations on the same I/O objects using an (implicit) strand.
Using a strand ensures that the completion handlers are invoked on that same logical thread.
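A tiny illustration of that (assuming a reasonably recent Boost, where boost::asio::make_strand exists): handlers posted through the same strand never run concurrently, even though several threads are running the io_context, so the counter below needs no mutex.

    #include <boost/asio.hpp>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main()
    {
        boost::asio::io_context io;
        auto strand = boost::asio::make_strand(io);

        int counter = 0;                  // protected only by the strand's serialisation
        for (int i = 0; i < 1000; ++i)
            boost::asio::post(strand, [&counter] { ++counter; });

        std::vector<std::thread> pool;
        for (int i = 0; i < 4; ++i)
            pool.emplace_back([&io] { io.run(); });
        for (auto& t : pool)
            t.join();

        std::cout << counter << '\n';     // always 1000
        return 0;
    }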
On the Protocol
You could indeed keep a queue of commands and await responses for each command before sending the next.
You might be a little bit smarter about this if you can spot the correlation from the different types of replies, but then you'd need to keep queues per type of command. I'd consider that premature optimization.

Can I use MPI_Probe to probe messages sent by a collective operation?

In my code I have a server process repeatedly probing for incoming messages, which come in two types.
One of the two types will be sent once by each process to give the server process a hint about its termination.
I was wondering if it is valid to use MPI_Bcast to broadcast these termination messages and MPI_Probe to probe for their arrival.
I tried this combination, but it failed. The failure might have been caused by something else, so I would like anyone who knows about this to confirm.
No, you can only use MPI_Probe to test for point-to-point communications. For collective communications, the only way to participate at all is to actively make the collective call. From the definition of MPI_Probe in the standard, "The call matches the same message that would have been received by a call to MPI_RECV(..., source, tag, comm, status) executed at the same point in the program" -- i.e., it only matches point-to-point messages, as Recv would.
With the new non-blocking collectives coming in MPI-3, you would, however, be able to use MPI_Test (or MPI_Wait) to check the status of the non-blocking request, just as you would with a non-blocking send/recv, although I haven't been following that working group's work too closely, so I don't know the details.
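For completeness, here is a sketch of the point-to-point alternative that MPI_Probe can match (the tags, the empty "done" message, and the loop structure are my own illustration, not something prescribed by the standard):

    #include <mpi.h>
    #include <stdio.h>

    #define TAG_WORK 1
    #define TAG_DONE 2

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                           /* server */
            int finished = 0;
            while (finished < size - 1) {
                MPI_Status st;
                MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_DONE) {      /* zero-length termination hint */
                    MPI_Recv(NULL, 0, MPI_INT, st.MPI_SOURCE, TAG_DONE,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    ++finished;
                } else {                           /* ordinary work message */
                    int payload;
                    MPI_Recv(&payload, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    printf("work item %d from rank %d\n", payload, st.MPI_SOURCE);
                }
            }
        } else {                                   /* workers */
            int payload = rank * 10;
            MPI_Send(&payload, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            MPI_Send(NULL, 0, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }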
I'm not sure that the MPI standard excludes this, but I don't see how it would be useful even if it were possible. On the (rare) occasions when I've used MPI_Probe, I've used it to find out the size of an incoming message; it can, of course, get other information about messages 'in flight' too. But MPI_Bcast is a collective operation, so all the processes in a communicator already know everything about the message that you could use MPI_Probe to find out. At least, I think so.

Usage of IcmpSendEcho2 with an asynchronous callback

I've been reading the MSDN documentation for IcmpSendEcho2 and it raises more questions than it answers.
I'm familiar with asynchronous callbacks from other Win32 APIs such as ReadFileEx... I provide a buffer which I guarantee will be reserved for the driver's use until the operation completes with any result other than IO_PENDING, and I get my callback in case of either success or failure (and check the completion status to find out which). Timeouts are my responsibility and I can call CancelIo to abort processing, but the buffer is still reserved until the driver cancels the operation and calls my completion routine with a status of CANCELLED. And there's an OVERLAPPED structure which uniquely identifies the request through all of this.
IcmpSendEcho2 doesn't use an OVERLAPPED context structure for asynchronous requests. And the documentation is excessively minimalist about what happens if the ping times out or fails (failure would be lack of a network connection, a missing ARP entry for local peers, an ICMP destination-unreachable response from an intervening router for remote peers, etc.).
Does anyone know whether the callback occurs on timeout and/or failure? And especially, if no response comes, can I reuse the buffer for another call to IcmpSendEcho2 or is it forever reserved in case a reply comes in late?
I want to use this function from a Win32 service, which means I have to get the error-handling cases right and I can't just leak buffers (or, if the API does leak buffers, I have to use a helper process so I have a way to abandon requests).
There's also an ugly incompatibility in the way the callback is made. It looks like the first parameter is consistent between the two signatures, so I should be able to use the newer PIO_APC_ROUTINE as long as I only use the second parameter if an OS version check returns Vista or newer? Although MSDN says "don't do a Windows version check", it seems like I need to, because the set of versions with the new argument isn't the same as the set of versions where the function exists in iphlpapi.dll.
Pointers to additional documentation or working code which uses this function and an APC would be much appreciated.
Please also let me know if this is completely the wrong approach -- i.e. if either using raw sockets or some combination of IcmpCreateFile+WriteFileEx+ReadFileEx would be more robust.
I use IcmpSendEcho2 with an event, not a callback, but I think the flow is the same in both cases. IcmpSendEcho2 uses NtDeviceIoControlFile internally. It detects some ICMP-related errors early on and returns them as error codes in the 12xx range. If (and only if) IcmpSendEcho2 returns ERROR_IO_PENDING, it will eventually call the callback and/or set the event, regardless of whether the ping succeeds, fails or times out. Any buffers you pass in must be preserved until then, but can be reused afterwards.
As for the version check, you can avoid it at a slight cost by using an event with RegisterWaitForSingleObject instead of an APC callback.
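
For what it's worth, here is a rough sketch of that event-plus-RegisterWaitForSingleObject arrangement (my own code, with error handling and the service plumbing abbreviated; the reply buffer and the event must stay alive until the wait callback has run):

    // Link with iphlpapi.lib.
    #include <winsock2.h>
    #include <windows.h>
    #include <iphlpapi.h>
    #include <icmpapi.h>
    #include <stdio.h>

    struct PingContext {
        HANDLE        event;
        unsigned char reply[sizeof(ICMP_ECHO_REPLY) + 32 + 8]; // reply + payload + ICMP error
    };

    static VOID CALLBACK OnPingDone(PVOID param, BOOLEAN /*timedOut*/)
    {
        PingContext* ctx = static_cast<PingContext*>(param);
        DWORD n = IcmpParseReplies(ctx->reply, sizeof(ctx->reply));
        if (n > 0) {
            ICMP_ECHO_REPLY* r = reinterpret_cast<ICMP_ECHO_REPLY*>(ctx->reply);
            printf("status=%lu rtt=%lums\n", r->Status, r->RoundTripTime);
        } else {
            printf("ping failed or timed out, error %lu\n", GetLastError());
        }
        // From here on the buffer may be reused for another IcmpSendEcho2 call.
    }

    int main()
    {
        HANDLE icmp = IcmpCreateFile();
        static PingContext ctx;
        ctx.event = CreateEventW(nullptr, FALSE, FALSE, nullptr);

        char   payload[32] = "ping payload";
        IPAddr dest        = 0x08080808;            // 8.8.8.8 (same bytes in either order)

        DWORD rc = IcmpSendEcho2(icmp, ctx.event, nullptr, nullptr, dest,
                                 payload, (WORD)sizeof(payload), nullptr,
                                 ctx.reply, sizeof(ctx.reply), 1000 /* ms */);
        if (rc == 0 && GetLastError() == ERROR_IO_PENDING) {
            HANDLE wait;
            RegisterWaitForSingleObject(&wait, ctx.event, OnPingDone, &ctx,
                                        INFINITE, WT_EXECUTEONLYONCE);
            Sleep(5000);                              // stand-in for the service's real work
            UnregisterWaitEx(wait, INVALID_HANDLE_VALUE); // waits for the callback to finish
        } else {
            // Synchronous completion or an immediate error in the 12xx range.
            printf("IcmpSendEcho2 returned %lu, error %lu\n", rc, GetLastError());
        }

        IcmpCloseHandle(icmp);
        CloseHandle(ctx.event);
        return 0;
    }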
