Is MPI_Bsend_init/MPI_Start best asynchronous buffered communication. - parallel-processing

Is MPI_Bsend_init/MPI_Start best asynchronous buffered communication. Can you guys think of better way to communicate data between processors. Pseudo-Code for N Processing nodes
MPI_Recv(request[i]) -- Recv data
for(i=0;i<N;i++) MPI_Bsend_init(request[i]) -- Setup request
MPI_Start(request[i]) -- Send data

Bsend is the wrong function here from a performance standpoint. There is little to no advantage of Bsend, as the eager protocol used by essentially all implementations today, is buffered on the receiver side automatically for small messages, which is where Bsend would be viable.
In any case, persistent send - as you are using - is already nonblocking, so there is no such thing as Isent_init. See e.g. http://www.mpi-forum.org/docs/mpi-1.1/mpi-11-html/node51.html:
The call is local, with similar semantics to the nonblocking
communication operations described in section Nonblocking
communication . That is, a call to MPI_START with a request created by
MPI_SEND_INIT starts a communication in the same manner as a call to
MPI_ISEND...
And my colleagues have surprised me with confirmation that persistent Send-Recv does provide efficiency gains on modern InfiniBand clusters. I can only assume this is because IB page registration is done up-front.

Related

Cancel last sent message ZeroMQ (python) (dealer/router and pushh/pull)

How would one cancel the last sent message ?
I have this set up
The idea is that the client can ask for different types of large data.
The server reads the request from the client and answers an acknowledgement.
Once its data is ready, it pushes it through the other socket.
This enables queueing task on the server side when multiple clients are connected.
However, if the client decides that it does not need the data anymore, it can send a cancel message to the server.
I'm using asyncio.Queue for queueing messages, so I can easily empty the queue, however, I don't know how to drop a message that is in the push/pull pipe to free up the channel?
The kill switch example (Figure 19 - Parallel Pipeline with Kill Signaling) in https://zguide.zeromq.org/docs/chapter2/ is used to end the process. I just want to cancel it.
My idea was to close the socket on the server side and reopen it, but even with linger set to 0, the messages are not dropped.
EDIT: The messages are indeed dropped, but I feel the solution is wrong.
It doesn't really make any sense for ZeroMQ itself to have such a feature.
Suppose that it did have a cancel message feature. For it to operate as expected, you would be critically dependent on the speed of the network. You might develop on a slow network and their you have the time available to decide to cancel, submit the request and for that to take effect before anything has moved anywhere. But on a fast network you won't.
ZeroMQ is a bit like the post office. Once you have posted a letter, they are going to deliver it.
Other issues for a library developer would include how messages are identified, who can cancel a message, etc? It would get very complex for the library to do it and cater for all possible use cases, so it's not unreasonable that they've left such things as an exercise for the application developers.
Chop the Responses Up
You could divide the responses up into smaller messages, send them at some likely rate (proportionate to the network throughput) and check to see if a cancellation has been received before sending each chunk.
It's a bit fiddly, you'd need to know what kind of rate to send the smaller messages so that you don't starve the network, but don't over do it either.
Or, Convert to CSP
The problem lies in ZeroMQ implementing Actor Model, where the transport buffers messages. What you need is Communicating Sequential Processes, which does not buffer messages. You can implement this quite easily on top of ZeroMQ, basically all you need to do is have a two way message exchange going on basically like:
Peer1->Peer2: I'd like to send you a message
time passes
Peer2->Peer1: Okay send a message
Peer1->Peer2: Here is the message
time passes
Peer2->Peer1: I have received the message
end
And in doing this the peers would block, ie peer 1 does nothing else until it gets peer 2's final response.
This feels clunky, but it's what you have to do to reign in an Actor Model system and control where your messages are at any point in time. It's slower because there's more too-ing and fro-ing going on between the peers (in systems like Transputers, this was all done down at the electronic level, so it wasn't an encumberance on software).
The blocking can be a blessing, if throughput matters. Basically, if you find the sender is being blocked too much, that just means you haven't got enough receivers for the tasks they're performing. Actor Model can deceive, because buffering in the network / actor model implementation can temporarily soak up an excess of messages, adding a bit of latency that goes unnoticed.
Anyway, this way you can have a mechanism whereby the flow of messages is fully managed within the application, and not within the ZeroMQ library. If a client does send a "cancel my last request" message (using the above mechanism to send it), that either arrives before the reponse has started to be sent, or after the response has already been delivered to the client (using the mechanism above to send it). There is no intermediate state where a response is already on the way, but out of control of the applications.
CSP is a mode that I'd dearly like ZeroMQ to implement natively. It nearly does, in that you can control the socket high water marks. Unfortunately, a high water mark of 0 means "inifinite", not zero.
CSP itself is a 1970s idea, that saw some popularity and indeed silicon in the 1980s, early 1990s (Inmos, Transputers, Occam, etc) but has recently made something of a comeback in languages like Rust, Go, Erlang. There's even a MS-supplied library for .NET that does it too (not that they call it CSP).
The really big benefit of CSP is that it is algebraically analysable - a design can be analysed and proven to be free of deadlock, without having to do any testing. However, with Actor model systems you cannot do that, and testing will not confirm a lack of problems either. Complex, circular message flows in Actor model can easily lead to deadlock, but that might not occur until the network between computers becomes just a tiny bit busier. Deadlock can happen in CSP too, but it's basically guaranteed to happen every time, if the system has accidentally been architected to deadlock. This shows up in testing quite readily (so at least you know early on!).
As I alluded to early, CSP also doesn't deceive you into thinking there is enough compute resources in a system. If a sender has a strict schedule to keep, and the recipient(s) aren't keeping up, the sender ends up being blocked trying to send instead of waiting for fresh input. It's easy to detect that the real time requirement has not been met. Whereas with Actor model, the send launches messages off into some buffer, and so long as the receiver(s) on average keeps up, all appears to be OK. However, you have no visibility of whether messages are building up inside the (in this case) ZeroMQ's own buffers, so there is little notice of a trending problem in the overall system.

Microservice Architecture: Can you eliminate the synchronous calls between services completely in a system?

Anywhere you read about Microservices, it says microservice should communicate asynchronously. It is understandable why asynchronous communication is preferred as it removes dependencies and provides low-coupling, and availability, etc.
Suppose, there is a common authorization service that is invoked every time a user calls an API. In this scenario you cannot move further util you have the response from the authorization service. Although you can call the authorization service asynchronously using Async IO, however, it is still a request/reply pattern.
Questions I have
Is possible to get rid of synchronous communication or more appropriately request/reply pattern in microservices-based system design?
Although it is possible to implement a reply/response pattern asynchronously through messaging and callbacks, which add significant overhead and latency but is it worth converting every request/reply to asynchronously?
If synchronous calls cannot be eliminated completely, then which scenarios it is ok to have synchronous calls among microservices?
I think the short answer for your question is: request-reply pattern doesn't mean synchronous. It can also be asynchronous. Which you already mentioned.
Long answer:
Request-Reply is just a principle. For example you send an email to a friend. The message contains data relevant to you and you are expecting a response but didn't say that explicitly. Your friend will see the email when he will get back from work and then he may or may not reply to you. Only you know that you need an answer from him.
Now there are a few options while waiting for your response. Either block your entire life until your friend responds (which will mean synchronous communication) either do something else until the response arrives in your inbox (which is asynchronous).
Now, to the point:
Is possible to get rid of synchronous communication or more appropriately request/reply pattern in microservices-based system design?
Yes, you already have answered that at the second point. Even though it is possible I think it should be used where it is required.
Although it is possible to implement a reply/response pattern asynchronously through messaging and callbacks, which add significant overhead and latency but is it worth converting every request/reply to asynchronously?
For the right scenario, yes. The messaging system have very good performances so the latency should not be an issue. When a latency problem occurs in a messaging system there are other options to improve it.
If synchronous calls cannot be eliminated completely, then which scenarios it is ok to have synchronous calls among microservices?
Yes.
There is one more thing that needs to be added. Synchronous doesn't always mean blocking. In a reactive world, if you make an HTTP call to another service the caller sends the request and then awaits for the response in a non-blocking manner. When the responses arrives, the caller is notified the the response has arrived and so the process continues. While "awaiting" the CPU can do other stuff.

Synchronous vs Asynchronous socket reads

Most example apps I come across for receiving data are using async calls. For instance, c++ examples use boost asio services to bind message handlers to callbacks. But what about an app that only needs to listen to data from a single socket and process the messages in order? Would it be faster to have a loop that polls/recv's from the socket and calls the handler without using a callback (assume main and logging threads are separate)? Or is there no performance difference (assume messages are coming in as fast as the network card and kernel can handle them)?
There are many intricacies I don't know such as the impact of callbacks to performance due to things like branch prediction. Or if there will be a performance penalty of the callbacks call a different thread to do the processing. Curious to hear some thoughts, experiences, dialog on this subject to save myself from attempting both implementations to discover the answer.

Broadcast Server

I am writing a TCP Server that accepts connections from multiple clients, this server gathers data from the system that it's running on and transmits it to every connected client.
What design patterns would be best for this situation?
Example
Put all connections in an array, then loop through the array and send the data to each client one by one. Advantage: very easy to implement. Disadvantage: not very efficient when handling large amounts of data.
An easier way is to use some existing software to do this ... For example use https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=d5bedadd-e46f-4c97-af89-22d65ffee070 .
In case you want to write on your own you will need a list(linked list) to manage the connections.
Here is an example of a server http://examples.oreilly.com/jenut/Server.java
If you want to handle large amounts of data, one of the techniques is to have a queue associated with each of the subscribers at the server end. A multi-threaded program can send the data to the clients from those queues.
A number of patterns have been developed for distributed processing and servers, for instance in the ACE project: http://www.cs.wustl.edu/~schmidt/patterns-ace.html. The design might be focused around the events which announce either that data has been received and may be read, or that buffers have been emptied and more data may now be written. At least in the days when a 32-bit address space was the rule, you could have many more open connections than you had threads, so you would typically have a small number of threads waiting for events which announced that they could safely read or write without stalling until the other side co-operated. This may come from events, or from calls such as select() or poll() terminating. Patterns are also associated with http://zguide.zeromq.org/page:all#-MQ-in-a-Hundred-Words.

boost::asio sending data faster than receiving over TCP. Or how to disable buffering

I have created a client/server program, the client starts
an instance of Writer class and the server starts an instance of
Reader class. Writer will then write a DATA_SIZE bytes of data
asynchronously to the Reader every USLEEP mili seconds.
Every successive async_write request by the Writer is done
only if the "on write" handler from the previous request had
been called.
The problem is, If the Writer (client) is writing more data into the
socket than the Reader (server) is capable of receiving this seems
to be the behaviour:
Writer will start writing into (I think) system buffer and even
though the data had not yet been received by the Reader it will be
calling the "on write" handler without an error.
When the buffer is full, boost::asio won't fire the "on write"
handler anymore, untill the buffer gets smaller.
In the meanwhile, the Reader is still receiving small chunks
of data.
The fact that the Reader keeps receiving bytes after I close
the Writer program seems to prove this theory correct.
What I need to achieve is to prevent this buffering because the
data need to be "real time" (as much as possible).
I'm guessing I need to use some combination of the socket options that
asio offers, like the no_delay or send_buffer_size, but I'm just guessing
here as I haven't had success experimenting with these.
I think that the first solution that one can think of is to use
UDP instead of TCP. This will be the case as I'll need to switch to
UDP for other reasons as well in the near future, but I would
first like to find out how to do it with TCP just for the sake
of having it straight in my head in case I'll have a similar
problem some other day in the future.
NOTE1: Before I started experimenting with asynchronous operations in asio library I had implemented this same scenario using threads, locks and asio::sockets and did not experience such buffering at that time. I had to switch to the asynchronous API because asio does not seem to allow timed interruptions of synchronous calls.
NOTE2: Here is a working example that demonstrates the problem: http://pastie.org/3122025
EDIT: I've done one more test, in my NOTE1 I mentioned that when I was using asio::iosockets I did not experience this buffering. So I wanted to be sure and created this test: http://pastie.org/3125452 It turns out that the buffering is there event with asio::iosockets, so there must have been something else that caused it to go smoothly, possibly lower FPS.
TCP/IP is definitely geared for maximizing throughput as intention of most network applications is to transfer data between hosts. In such scenarios it is expected that a transfer of N bytes will take T seconds and clearly it doesn't matter if receiver is a little slow to process data. In fact, as you noticed TCP/IP protocol implements the sliding window which allows the sender to buffer some data so that it is always ready to be sent but leaves the ultimate throttling control up to the receiver. Receiver can go full speed, pace itself or even pause transmission.
If you don't need throughput and instead want to guarantee that the data your sender is transmitting is as close to real time as possible, then what you need is to make sure the sender doesn't write the next packet until he receives an acknowledgement from the receiver that it has processed the previous data packet. So instead of blindly sending packet after packet until you are blocked, define a message structure for control messages to be sent back from the receiver back to the sender.
Obviously with this approach, your trade off is that each sent packet is closer to real-time of the sender but you are limiting how much data you can transfer while slightly increasing total bandwidth used by your protocol (i.e. additional control messages). Also keep in mind that "close to real-time" is relative because you will still face delays in the network as well as ability of the receiver to process data. So you might also take a look at the design constraints of your specific application to determine how "close" do you really need to be.
If you need to be very close, but at the same time you don't care if packets are lost because old packet data is superseded by new data, then UDP/IP might be a better alternative. However, a) if you have reliable deliver requirements, you might ends up reinventing a portion of tcp/ip's wheel and b) keep in mind that certain networks (corporate firewalls) tend to block UDP/IP while allowing TCP/IP traffic and c) even UDP/IP won't be exact real-time.

Resources