How to send a message without a specific destination in MPI? - parallel-processing

I want to send a message to one of the ranks that is receiving messages with a specific tag; whichever rank receives the message consumes it. With MPI_Recv() we can receive a message using MPI_ANY_SOURCE/MPI_ANY_TAG, but MPI_Send() has no equivalent. How can I send a message without knowing the destination in advance?
MPI_Bcast() won't work either, because after receiving the message I have to reply to the source process.
Thanks.

What I would do is have the worker processes signal to the master that they're ready to receive. The master would keep track of which ranks are ready, pick one (lowest rank first, random, round robin, however you like), send to it, and clear its "ready" flag.
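A minimal sketch of that idea, assuming hypothetical READY_TAG/WORK_TAG values and a single double as the work item; workers announce readiness and the master replies to whichever rank asked:

#include <mpi.h>

#define READY_TAG 1
#define WORK_TAG  2

/* Master: hand out work to whichever worker says it is ready. */
void master(const double *jobs, int njobs)
{
    MPI_Status status;
    for (int i = 0; i < njobs; ++i) {
        int ready;
        /* Wait for any worker to announce readiness. */
        MPI_Recv(&ready, 1, MPI_INT, MPI_ANY_SOURCE, READY_TAG,
                 MPI_COMM_WORLD, &status);
        /* Reply to exactly that rank with the next work item;
           its "ready" flag is cleared simply by consuming the request. */
        MPI_Send(&jobs[i], 1, MPI_DOUBLE, status.MPI_SOURCE, WORK_TAG,
                 MPI_COMM_WORLD);
    }
}

/* Worker: signal readiness, then block for the next work item. */
void worker(void)
{
    int ready = 1;
    double job;
    MPI_Send(&ready, 1, MPI_INT, 0, READY_TAG, MPI_COMM_WORLD);
    MPI_Recv(&job, 1, MPI_DOUBLE, 0, WORK_TAG, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    /* ... process job ... */
}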

Do you just want to send the message to a random rank?
MPI_Comm_size(MPI_COMM_WORLD, &size);
sendto = rand() % size;
MPI_Send(buffer, count, MPI_CHAR, sendto, 0, MPI_COMM_WORLD);

The short answer is: You cannot do this in MPI.
The slightly longer answer is: You probably do not want to do this. I am guessing that you are trying to set up some sort of work-stealing. As suszterpatt suggested, you could use one-sided communication to 'grab' the work from the sending process, but you will need to use locks, and this will not scale well to many processes unless there is some idea of a local process group (i.e., you cannot have 1,000 processes all work-stealing from one process, you will need to decompose things).
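For illustration only, a rough sketch of that one-sided idea using MPI-3 RMA: each rank exposes a counter of remaining local work items in a window, and a stealer atomically decrements it with MPI_Fetch_and_op under an exclusive lock. The names remaining, expose_work and steal_one are invented for the example:

#include <mpi.h>

static int remaining;   /* hypothetical per-rank count of unfinished work items */
static MPI_Win win;

void expose_work(int initial_items)
{
    remaining = initial_items;
    MPI_Win_create(&remaining, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
}

/* Try to steal one item from 'victim'; returns 1 on success, 0 if it was empty. */
int steal_one(int victim)
{
    const int minus_one = -1;
    int old_value;

    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, victim, 0, win);
    /* Atomically fetch the victim's counter and add -1 to it. */
    MPI_Fetch_and_op(&minus_one, &old_value, MPI_INT, victim, 0, MPI_SUM, win);
    MPI_Win_unlock(victim, win);

    if (old_value <= 0)
        return 0;   /* nothing to steal; a real version would undo the decrement */

    /* ... fetch the actual work item, e.g. with MPI_Get or a follow-up message ... */
    return 1;
}

As the answer says, funnelling many stealers onto one victim will serialize on that lock, so in practice victims would be chosen within a small local group.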

Related

ZeroMQ PULL get all messages in queue

I have a large (~100) number of PUSH "clients" that periodically (~1/s) send a message to a PULL "server". The server processes the messages in batches at a rate of ~2 batches per second. The batch size is not fixed - ideally I would like to process all the messages in the queue at each iteration. My code for getting all the messages in the queue is as follows:
buffer = []
while True:
    try:
        buffer.append(socket.recv_pyobj(zmq.NOBLOCK))
    except zmq.ZMQError:
        break
print(len(buffer))
This seems to work, but the buffer lengths vary wildly in between iterations. Normally, I would expect to see ~50 messages read each time. However, I often see many iterations with <10 messages, and then one with a very large number (~1000). I suspect I'm doing something wrong regarding buffering or something like that as this should not be happening.
Highly dynamic changes on receiving side?
My suspicion is the PULL-side fair-queueing policy, hard-wired inside the Context() low-level data pump, which follows this mandatory round-robin duty cycle across all the .connect()-ed peers.
Unless you can somehow equalise the latencies with which the feeding PUSH-ers send their respective messages, the fair-queued receiving cycle will, exactly as you have reported, reflect the peers that did not PUSH their next message "in time".

Use Winapi Sendmessage to retrieve a value from the message's recipient

Given two applications, A and B, B needs to get value from A in a synchronous way.
In other words, taking the object model for inspiration, application B wants to do some kind of myVar = A.getValue() and continue its execution using myVar in the rest of the code.
I have the handle of application A's main form and I know how to send a message to it with SendMessage(). This function waits for the message to be processed and then returns an integer that is the result of the execution. But I don't know how to use this mechanism so that B gets back a complex data structure (a string or a record).
It seems to me that using the SendMessage return value is not a good idea for many reasons, so is there a way to do that?
It has to be done through Windows messages (I already know how to do it through pipes and sockets).
Thank you!
PS: I work with Delphi but that has no importance here, unless you are able to give examples in Delphi to illustrate your answer :)
The return value of a message is an LRESULT, which is a pointer-sized value. If what you wish to return fits in such a value, that would be the clean way to proceed. Otherwise you need something else.
You say that you must use Windows messages, and that you are sending these messages between different processes. Given those constraints, there is exactly one solution, namely WM_COPYDATA. That is the only Windows message which can marshal custom data across a process boundary.
So the procedure is as follows:
1. Send a message from process B to process A. Include in that message a window handle from process B.
2. When process A receives the message, it must send a WM_COPYDATA to the window handle that was sent in stage 1.
3. Process B can then receive and process the message sent in stage 2.
Note that the WM_COPYDATA message is sent from process A to process B whilst process A is handling the original message. This means that the WM_COPYDATA is received and processed by process B, before the original message from B to A returns. This can be somewhat confusing, but you did state that you wanted to do this entirely with Windows messages.
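For illustration, a minimal Win32 sketch in C of stage 2 (the question is about Delphi, so treat this only as the shape of the calls; the PAYLOAD struct and the dwData value are invented):

#include <windows.h>

/* Hypothetical payload to marshal across the process boundary. */
typedef struct {
    int  id;
    char text[256];
} PAYLOAD;

/* Process A: reply to the requesting window with WM_COPYDATA. */
void reply_with_data(HWND hwndRequester, HWND hwndSelf)
{
    PAYLOAD p = { 42, "hello from A" };

    COPYDATASTRUCT cds;
    cds.dwData = 1;          /* user-defined identifier for this kind of reply */
    cds.cbData = sizeof(p);  /* size of the block Windows will copy            */
    cds.lpData = &p;         /* pointer to the data in A's address space       */

    SendMessage(hwndRequester, WM_COPYDATA, (WPARAM)hwndSelf, (LPARAM)&cds);
}

/* Process B: handle WM_COPYDATA in its window procedure. */
LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    if (msg == WM_COPYDATA) {
        const COPYDATASTRUCT *cds = (const COPYDATASTRUCT *)lParam;
        if (cds->dwData == 1 && cds->cbData == sizeof(PAYLOAD)) {
            const PAYLOAD *p = (const PAYLOAD *)cds->lpData;
            /* The data is only valid while handling this message,
               so copy p->id and p->text somewhere if needed later. */
            (void)p;
        }
        return TRUE;
    }
    return DefWindowProc(hwnd, msg, wParam, lParam);
}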
This can also be done by one of your processes calling VirtualAllocEx() to allocate memory in the other process and then using Read/WriteProcessMemory to copy data there.

ZeroMQ: I want Publish–Subscribe to drop older messages in favor of newer ones

I'm using ZeroMQ publish–subscribe sockets to connect two processes. The publishing process is a sensor, and has a much faster refresh rate than the subscription process. I want the subscription process to only use the most recent message in the queue — and ignore older messages altogether.
I've tried setting a highwater mark on the subscriber, but that seems to drop newer messages rather than older.
Is there a publish–subscribe pattern someone can direct me toward for this purpose?
Read about the ZMQ_CONFLATE socket option in the ZeroMQ documentation (it is relatively new); I think it is exactly what you want.
From the documentation:
ZMQ_CONFLATE: Keep only last message
If set, a socket shall keep only one message in its inbound/outbound queue, this message being the last message received/the last message to be sent. Ignores ZMQ_RCVHWM and ZMQ_SNDHWM options. Does not support multi-part messages; in particular, only one part of it is kept in the socket internal queue.
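A minimal sketch in C of setting that option on a subscriber, assuming a publisher at tcp://localhost:5556; note the option must be set before connecting:

#include <zmq.h>
#include <stdio.h>

int main(void)
{
    void *ctx = zmq_ctx_new();
    void *sub = zmq_socket(ctx, ZMQ_SUB);

    int conflate = 1;
    /* Keep only the most recent message; must be set before connect(). */
    zmq_setsockopt(sub, ZMQ_CONFLATE, &conflate, sizeof(conflate));
    zmq_setsockopt(sub, ZMQ_SUBSCRIBE, "", 0);
    zmq_connect(sub, "tcp://localhost:5556");

    char buf[256];
    /* Blocks until a message arrives; anything older has been dropped. */
    int n = zmq_recv(sub, buf, sizeof(buf), 0);
    if (n >= 0)
        printf("got %d bytes (latest sample)\n", n);

    zmq_close(sub);
    zmq_ctx_destroy(ctx);
    return 0;
}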
Okay, I found a solution, but I don't know if it's the best one — so I won't mark it as correct just yet.
zmq::message_t message;
int events = 0;
size_t events_size = sizeof(int);

// Priming read
subscriber.recv(&message);

// See if there are more messages to read
subscriber.getsockopt(ZMQ_EVENTS, static_cast<void*>(&events), &events_size);
while (events & ZMQ_POLLIN) {
    // Receive the new (and perhaps more recent) message
    subscriber.recv(&message);
    // Poll again for additional messages
    subscriber.getsockopt(ZMQ_EVENTS, static_cast<void*>(&events), &events_size);
}
// Now, message holds the most recently received data.
This strategy has the added advantage that the queue shouldn't fill up. The disadvantage is that my publisher could conceivably send faster than this loop in the subscriber could be run, and then it'll loop indefinitely.
That seems unlikely, but I'd like to make it impossible. I'm not quite sure how to accomplish this goal yet.
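One way to make it impossible is to bound the number of drain iterations so the loop always terminates. A sketch in the plain C API (the post above uses the C++ wrapper), with an invented MAX_DRAIN cap:

#include <zmq.h>
#include <errno.h>

#define MAX_DRAIN 100   /* hypothetical upper bound on messages drained per cycle */

/* Drain the queue keeping only the newest message; returns its size or -1 on error. */
int recv_latest(void *subscriber, char *buf, size_t buflen)
{
    /* Priming read: block until at least one message is available. */
    int n = zmq_recv(subscriber, buf, buflen, 0);
    if (n < 0)
        return -1;

    /* Keep reading without blocking, but never more than MAX_DRAIN times. */
    for (int i = 0; i < MAX_DRAIN; ++i) {
        int m = zmq_recv(subscriber, buf, buflen, ZMQ_DONTWAIT);
        if (m < 0) {
            if (errno == EAGAIN)
                break;      /* queue is empty: buf holds the newest message */
            return -1;      /* some other error */
        }
        n = m;              /* a newer message has overwritten the previous one */
    }
    return n;
}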

MPI: blocking vs non-blocking

I am having trouble understanding the concept of blocking communication and non-blocking communication in MPI. What are the differences between the two? What are the advantages and disadvantages?
Blocking communication is done using MPI_Send() and MPI_Recv(). These functions do not return (i.e., they block) until the communication is finished. Simplifying somewhat, this means that the buffer passed to MPI_Send() can be reused, either because MPI saved it somewhere, or because it has been received by the destination. Similarly, MPI_Recv() returns when the receive buffer has been filled with valid data.
In contrast, non-blocking communication is done using MPI_Isend() and MPI_Irecv(). These functions return immediately (i.e., they do not block) even if the communication is not finished yet. You must call MPI_Wait() or MPI_Test() to see whether the communication has finished.
Blocking communication is used when it is sufficient, since it is somewhat easier to use. Non-blocking communication is used when necessary, for example, you may call MPI_Isend(), do some computations, then do MPI_Wait(). This allows computations and communication to overlap, which generally leads to improved performance.
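A minimal sketch of that overlap pattern, assuming two ranks and a placeholder computation:

#include <mpi.h>

void overlap_example(int rank, double *send_buf, double *recv_buf, int count)
{
    MPI_Request reqs[2];
    int peer = (rank == 0) ? 1 : 0;   /* two-rank example */

    /* Start the communication but do not wait for it. */
    MPI_Isend(send_buf, count, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recv_buf, count, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Do work that touches neither send_buf nor recv_buf. */
    /* compute_something_independent(); */

    /* Only after the wait is recv_buf valid and send_buf safe to reuse. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}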
Note that collective communication (e.g., all-reduce) is only available in its blocking version up to MPIv2. IIRC, MPIv3 introduces non-blocking collective communication.
A quick overview of MPI's send modes can be seen here.
This post is a bit old, but I take issue with the accepted answer. The statement "these functions don't return until the communication is finished" is a little misleading, because blocking communication does not guarantee any handshake between the send and the receive operations.
First, one needs to know that send has four modes of communication: standard, buffered, synchronous and ready, and each of these can be blocking or non-blocking.
Unlike send, receive has only one mode, and it can be blocking or non-blocking.
Before proceeding further, one must also be clear about the distinction between the MPI_Send/Recv buffer (the application buffer passed to the call) and the system buffer (a local buffer in each process, owned by the MPI library, used to move data around among the ranks of a communicator).
BLOCKING COMMUNICATION:
Blocking doesn't mean that the message was delivered to the receiver/destination. It simply means that the (send or receive) buffer is available for reuse. To make the buffer reusable, it is sufficient to copy the information to another memory area, i.e., the library can copy the buffer data to a memory location of its own, and then, for example, MPI_Send can return.
The MPI standard makes it very clear that message buffering is decoupled from the send and receive operations. A blocking send can complete as soon as the message has been buffered, even though no matching receive has been posted. But in some cases message buffering can be expensive, and copying directly from the send buffer to the receive buffer can be more efficient. Hence the MPI standard provides four different send modes to give the user some freedom in selecting the appropriate send mode for her application. Let's take a look at what happens in each mode of communication:
1. Standard Mode
In the standard mode, it is up to the MPI library whether or not to buffer the outgoing message. If the library decides to buffer the outgoing message, the send can complete even before the matching receive has been invoked. If the library decides not to buffer (for performance reasons, or because of unavailability of buffer space), the send will not return until a matching receive has been posted and the data in the send buffer has been moved to the receive buffer.
Thus MPI_Send in standard mode is non-local, in the sense that a send in standard mode can be started whether or not a matching receive has been posted, but its successful completion may depend on the occurrence of a matching receive (because it is implementation dependent whether the message will be buffered or not).
The syntax for standard send is below :
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
int dest, int tag, MPI_Comm comm)
2. Buffered Mode
Like in the standard mode, the send in buffered mode can be started irrespective of whether a matching receive has been posted, and the send may complete before a matching receive has been posted. However, the main difference is that if the send is started and no matching receive has been posted, the outgoing message must be buffered. Note that if the matching receive has been posted, the buffered send can happily rendezvous with the process that started the receive, but if there is no receive, the send in buffered mode has to buffer the outgoing message to allow the send to complete. A buffered send is therefore local. Buffer allocation in this case is user defined, and in the event of insufficient buffer space an error occurs.
Syntax for buffered send:
int MPI_Bsend(const void *buf, int count, MPI_Datatype datatype,
int dest, int tag, MPI_Comm comm)
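A minimal sketch of supplying that user-defined buffer with MPI_Buffer_attach (the size calculation is only illustrative):

#include <mpi.h>
#include <stdlib.h>

void bsend_example(int dest, const double *data, int count)
{
    /* Reserve room for one buffered message plus MPI's bookkeeping overhead. */
    int size = count * (int)sizeof(double) + MPI_BSEND_OVERHEAD;
    char *buffer = malloc(size);
    MPI_Buffer_attach(buffer, size);

    /* Completes locally: the message is copied into the attached buffer
       if no matching receive has been posted yet. */
    MPI_Bsend(data, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);

    /* Detach blocks until all buffered messages have been delivered. */
    MPI_Buffer_detach(&buffer, &size);
    free(buffer);
}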
3. Synchronous Mode
In synchronous send mode, the send can be started whether or not a matching receive has been posted. However, the send completes successfully only once a matching receive has been posted and the receiver has started to receive the message. The completion of a synchronous send therefore not only indicates that the send buffer can be reused, but also that the receiving process has started to receive the data. If both the send and the receive are blocking, the communication does not complete at either end before the two processes rendezvous.
Syntax for synchronous send :
int MPI_Ssend(const void *buf, int count, MPI_Datatype datatype, int dest,
int tag, MPI_Comm comm)
4. Ready Mode
Unlike the previous three modes, a send in ready mode can be started only if the matching receive has already been posted. Completion of the send doesn't indicate anything about the matching receive; it merely tells us that the send buffer can be reused. A send that uses ready mode has the same semantics as a standard or synchronous send, with the additional information that a matching receive has already been posted. A correct program using ready-mode communication can have its ready sends replaced by synchronous or standard sends with no effect on the outcome other than a performance difference.
Syntax for ready send :
int MPI_Rsend(const void *buf, int count, MPI_Datatype datatype, int dest,
int tag, MPI_Comm comm)
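A sketch of how that precondition can be guaranteed in practice: the receiver posts a non-blocking receive first, both ranks synchronise, and only then does the sender issue MPI_Rsend (the tag and buffer names are illustrative):

#include <mpi.h>

void ready_mode_example(int rank, double *buf, int count)
{
    MPI_Request req;

    if (rank == 1) {
        /* Post the receive before the sender is allowed to start. */
        MPI_Irecv(buf, count, MPI_DOUBLE, 0, 7, MPI_COMM_WORLD, &req);
    }

    /* Passing the barrier guarantees the receive has already been posted. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) {
        /* Safe: the matching receive is known to exist. */
        MPI_Rsend(buf, count, MPI_DOUBLE, 1, 7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }
}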
Having gone through all four blocking sends, they might seem different in principle, but depending on the implementation the semantics of one mode may be similar to another.
For example, MPI_Send is in general a blocking call, but depending on the implementation, if the message size is not too big, MPI_Send will copy the outgoing message from the send buffer to a system buffer (which is mostly the case on modern systems) and return immediately. Let's look at an example:
//assume there are 4 processes numbered from 0 to 3
if (rank == 0) {
    tag = 2;
    MPI_Send(&send_buff1, 1, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
    MPI_Send(&send_buff2, 1, MPI_DOUBLE, 2, tag, MPI_COMM_WORLD);
    MPI_Recv(&recv_buff1, 1, MPI_FLOAT, 3, 5, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Recv(&recv_buff2, 1, MPI_INT, 1, 10, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
else if (rank == 1) {
    tag = 10;
    //receive statement missing, nothing received from proc 0
    MPI_Send(&send_buff3, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
    MPI_Send(&send_buff3, 1, MPI_INT, 3, tag, MPI_COMM_WORLD);
}
else if (rank == 2) {
    MPI_Recv(&recv_buff, 1, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    //do something with receive buffer
}
else { //if rank == 3
    MPI_Send(&send_buff, 1, MPI_FLOAT, 0, 5, MPI_COMM_WORLD);
    MPI_Recv(&recv_buff, 1, MPI_INT, 1, 10, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
Let's look at what is happening at each rank in the above example.
Rank 0 is trying to send to rank 1 and rank 2, and to receive from ranks 1 and 3.
Rank 1 is trying to send to rank 0 and rank 3, and does not receive anything from any other rank.
Rank 2 is trying to receive from rank 0 and later does some operation with the data received in recv_buff.
Rank 3 is trying to send to rank 0 and to receive from rank 1.
Where beginners get confused is that rank 0 is sending to rank 1, but rank 1 hasn't started any receive operation, so the communication should block or stall and the second send statement in rank 0 should not be executed at all (and this is what the MPI documentation stresses: it is implementation defined whether or not the outgoing message will be buffered). On most modern systems, messages of such small size (here the count is 1) will easily be buffered, so MPI_Send will return and execute its next MPI_Send statement. Hence in the above example, even though the receive in rank 1 is never started, the first MPI_Send in rank 0 will return and rank 0 will execute its next statement.
In a hypothetical situation where rank 3 starts execution before rank 0, it will copy the outgoing message in its first send statement from the send buffer to a system buffer (on a modern system) and then start executing its receive statement. As soon as rank 0 finishes its two send statements and begins executing its receive statement, the data buffered by rank 3 in the system buffer is copied into the receive buffer in rank 0.
If a receive operation has been started in a process and no matching send has been posted, the process will block until the receive buffer is filled with the data it is expecting. In this situation any computation or other MPI communication is blocked/halted until MPI_Recv has returned.
Having understood the buffering phenomenon, one should come back and think more about MPI_Ssend, which has the true semantics of a blocking communication. Even if MPI_Ssend copies the outgoing message from the send buffer to a system buffer (which again is implementation defined), one must note that MPI_Ssend will not return until some (low-level) acknowledgement from the receiving process has been received by the sending process.
Fortunately, MPI decided to keep things easier for users on the receive side: there is only one receive in blocking communication, MPI_Recv, and it can be used with any of the four send modes described above. For MPI_Recv, blocking means that the receive returns only after its buffer contains the data. This implies that the receive can complete only after a matching send has started, but it says nothing about whether it completes before or after the matching send completes.
What happens during such blocking calls is that computation is halted until the blocked buffer is freed. This usually wastes computational resources, since Send/Recv is usually just copying data from one memory location to another while the CPU sits idle.
NON-BLOCKING COMMUNICATION:
For non-blocking communication, the application creates a request for the send and/or receive operation, gets back a handle, and then continues. That is all that is needed to guarantee that the operation will be carried out: the MPI library has been notified that it has to be executed.
For the sender side, this allows overlapping computation with communication.
For the receiver side, this allows overlapping part of the communication overhead, i.e., copying the message directly into the address space of the receiving application.
When using blocking communication you must be careful about the ordering of send and receive calls. For example, look at this code:
if (rank == 0)
{
    MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);                     /* send x to process 1 */
    MPI_Recv(&y, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);  /* receive y from process 1 */
}
if (rank == 1)
{
    MPI_Send(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);                     /* send y to process 0 */
    MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);  /* receive x from process 0 */
}
What happens in this case (assuming the messages are too large to be buffered)?
Process 0 sends x to process 1 and blocks until process 1 receives x.
Process 1 sends y to process 0 and blocks until process 0 receives y, but process 0 is itself blocked, so both processes block forever until they are killed.
It is easy.
Non-blocking means that computation and data transfer can happen at the same time within a single process.
Blocking means: hey buddy, you have to make sure you have finished transferring the data before getting back to the next command; in other words, if a transfer is followed by a computation, the computation must wait until the transfer has succeeded.
Both the accepted answer and the other very long one mention overlap of computation and communication as an advantage. That is 1. not the main motivation, and 2. very hard to attain. The main advantage (and the original motivation) of non-blocking communication is that you can express complicated communication patterns without getting deadlock and without processes serializing themselves unnecessarily.
Examples: Deadlock: everyone does a receive, then everyone does a send, for instance along a ring. This will hang.
Serialization: along a linear ordering, everyone except the last does a send to the right, then everyone except the first does a receive from the left. This will have all processes executing sequentially rather than in parallel.
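For illustration, a sketch of the ring example done safely with non-blocking calls: every rank posts its receive and its send up front and only then waits, so neither the deadlock nor the serialization above can occur:

#include <mpi.h>

void ring_shift(double *send_buf, double *recv_buf, int count)
{
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* neighbour we send to      */
    int left  = (rank - 1 + size) % size;   /* neighbour we receive from */

    MPI_Request reqs[2];
    /* Both operations are merely started here; nothing blocks yet. */
    MPI_Irecv(recv_buf, count, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(send_buf, count, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Every rank reaches the wait with its receive already posted,
       so all the sends can make progress concurrently. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}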

Usage of non-blocking send and blocking receive in MPI?

I am trying to implement a master-worker program.
My master has jobs that the workers are going to do. Every time a worker completes a job, he asks for a new job from the master, and the master sends it to him. The workers are calculating minimal paths. When a worker finds a minimum that is better than the global minimum he got, he sends it to everyone including the master.
I plan for the workers and the master to send data using MPI_Isend. Also, I think that the receive should be blocking. The master has nothing to do when no one has asked for work or has updated the best result, so he should block waiting for a receive. Also, each worker should, after he has done his work, wait on a receive to get a new one.
Nevertheless, I'm not sure of the impact of using non-blocking asynchronous send, and blocking synchronous receive.
An alternative, I think, is to use MPI_Iprobe, but I'm not sure that this will give me any improvement.
Please help me understand whether what I'm doing is right. Is this the right solution?
You can match blocking sends with nonblocking receives and vice versa, that won't cause any problems. However, if the master really has nothing to do while the workers work, and the workers should block after completing their work unit, then there's no reason for non-blocking communication on that front. The master can post a blocking receive with MPI_ANY_SOURCE, and the workers can just use a blocking send to post back their results, since the matching receive at the master will already have been posted.
So, I'd have Send-Recv for exchanging work units between master and worker, and Isend-Irecv for broadcasting the new global minima.
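A rough sketch of the broadcast half of that suggestion, with an invented MIN_TAG and a single double as the minimum; the work-unit exchange itself can stay a plain Send/Recv pair as described above:

#include <mpi.h>

#define MIN_TAG 3   /* hypothetical tag reserved for "new global minimum" */

/* Worker: announce a better minimum to every other rank without blocking.
   The caller must MPI_Waitall() on reqs before modifying 'value' again. */
void broadcast_new_minimum(double *value, int rank, int nranks, MPI_Request *reqs)
{
    int n = 0;
    for (int r = 0; r < nranks; ++r) {
        if (r == rank)
            continue;
        MPI_Isend(value, 1, MPI_DOUBLE, r, MIN_TAG, MPI_COMM_WORLD, &reqs[n++]);
    }
}

/* Any rank: check without blocking whether someone announced a better minimum. */
int poll_for_minimum(double *new_min)
{
    int flag;
    MPI_Status status;
    MPI_Iprobe(MPI_ANY_SOURCE, MIN_TAG, MPI_COMM_WORLD, &flag, &status);
    if (flag) {
        MPI_Recv(new_min, 1, MPI_DOUBLE, status.MPI_SOURCE, MIN_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    return flag;
}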
