How to fix this MPI program - debugging

This program demonstrates an unsafe pattern: sometimes it executes fine, and other times it fails or hangs. The program fails or hangs because of buffer exhaustion on the receiving task's side, a consequence of how the MPI library implements the eager protocol for messages up to a certain size. One possible solution is to include an MPI_Barrier call in both the send and receive loops.
How can this program be made correct?
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MSGSIZE 2000
int main (int argc, char *argv[])
{
int numtasks, rank, i, tag=111, dest=1, source=0, count=0;
char data[MSGSIZE];
double start, end, result;
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) {
printf ("mpi_bug5 has started...\n");
if (numtasks > 2)
printf("INFO: Number of tasks= %d. Only using 2 tasks.\n", numtasks);
}
/******************************* Send task **********************************/
if (rank == 0) {
/* Initialize send data */
for(i=0; i<MSGSIZE; i++)
data[i] = 'x';
start = MPI_Wtime();
while (1) {
MPI_Send(data, MSGSIZE, MPI_BYTE, dest, tag, MPI_COMM_WORLD);
count++;
if (count % 10 == 0) {
end = MPI_Wtime();
printf("Count= %d Time= %f sec.\n", count, end-start);
start = MPI_Wtime();
}
}
}
/****************************** Receive task ********************************/
if (rank == 1) {
while (1) {
MPI_Recv(data, MSGSIZE, MPI_BYTE, source, tag, MPI_COMM_WORLD, &status);
/* Do some work - at least more than the send task */
result = 0.0;
for (i=0; i < 1000000; i++)
result = result + (double)random();
}
}
MPI_Finalize();
}

Ways to improve this code so that the receiver doesn't end up with an unlimited number of unexpected messages include:
Synchronization - you mentioned MPI_Barrier, but even using MPI_Ssend instead of MPI_Send would work.
Explicit buffering - using MPI_Bsend (with a user buffer attached via MPI_Buffer_attach) to ensure adequate buffering exists.
Posted receives - the receiving process posts MPI_Irecv calls before starting work, so that messages land in the buffers meant to hold the data rather than in system buffers.
In this pedagogical case, since the number of messages is unlimited, only the first option (synchronization) would reliably work; a minimal sketch of that fix follows.
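For illustration, here is what the synchronization fix could look like for the sender loop above. This is only a sketch that swaps MPI_Send for MPI_Ssend and otherwise reuses the program's own variables (data, MSGSIZE, dest, tag, count, start, end):

    /* MPI_Ssend does not complete until the matching receive has started,
       so rank 0 can never run far ahead of rank 1 and exhaust buffers. */
    while (1) {
        MPI_Ssend(data, MSGSIZE, MPI_BYTE, dest, tag, MPI_COMM_WORLD);
        count++;
        if (count % 10 == 0) {
            end = MPI_Wtime();
            printf("Count= %d Time= %f sec.\n", count, end - start);
            start = MPI_Wtime();
        }
    }

The trade-off is that every message now pays a synchronization round trip, which throttles the sender to the receiver's pace; in this program that is exactly the behaviour we want.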

Related

Load balancing MPI multithreading for variable-complexity tasks or variable-speed nodes?

I've written an MPI code that currently multithreads by sending equal numbers of elements from each array to a different process to do work (thus, for 6 workers, the array is broken into 6 equal parts). What I would like to do is send small chunks only if a worker is ready to receive, and receive completed chunks without blocking future sends; this way if one chunk takes 10 seconds but the other chunks take 1 second, other data can be processed while waiting for the long chunk to complete.
Here's some skeleton code I've put together:
#include <mpi.h>
#include <iostream>
#include <cstdio>
#include <vector>
#include <cmath>

struct crazytaxi
{
    double a = 10.0;
    double b = 25.2;
    double c = 222.222;
};

int main(int argc, char** argv)
{
    //Initial and temp kanno vectors
    std::vector<crazytaxi> kanno;
    std::vector<crazytaxi> kanno_tmp;
    //init MPI
    MPI_Init(NULL,NULL);
    //allocate vector
    int SZ = 4200;
    kanno.resize(SZ);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD,&world_size);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
    if (world_rank == 0)
    {
        for (int i = 0; i < SZ; i++) {
            kanno[i].a = 1.0*i;
            kanno[i].b = 10.0/(i+1);
        }
    }
    for (int j = 0; j < 10; j++) {
        //Make sure all processes have same kanno vector;
        if (world_rank == 0) {
            for (int i = 1; i < world_size; i++)
                MPI_Send(&kanno[0],sizeof(crazytaxi)*kanno.size(),MPI_BYTE,i,3,MPI_COMM_WORLD);
        } else {
            MPI_Recv(&kanno[0],sizeof(crazytaxi)*kanno.size(),MPI_BYTE,0,3,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
        }
        //copy to tmp vector
        kanno_tmp = kanno;
        MPI_Barrier(MPI_COMM_WORLD);
        //chunk bounds, declared here so both the sender and the workers can see them
        unsigned p1 = 0;
        unsigned segment = 10;
        unsigned p2 = segment;
        //the sender
        if (world_rank == 0) {
            while (p1 < SZ) {
                for (int i = 0; i < world_size; i++) {
                    //if (process #i is ready to receive)
                    //    Send data in chunks of 10 to i
                    //else
                    //    continue
                }
            }
        }
        if (world_rank != 0) {
            //Receive data to be processed
            //do some math
            for (unsigned i = p1; i < p2; i++)
                kanno_tmp[i].a = std::sqrt(kanno[i].a)/((double)i+1.0);
            //Send processed data to 0 and wait to receive new data.
        }
        //copy temp vector to kanno
        kanno = kanno_tmp;
    }
    //print some of the results;
    if (world_rank == 0)
    {
        for (int i = 0; i < SZ; i += 40)
            printf("Line %d: %lg,%lg\n",i,kanno[i].a,kanno[i].b);
    }
    MPI_Finalize();
}
I can get this about 90% of the way to what I want, except that my MPI_Send and MPI_Recv calls block, or the 'master' process won't know that the 'slave' processes are ready to receive data.
Is there a way in MPI to do something like
unsigned Datapointer = [some_array_index];
while (Datapointer < array_size) {
    if (world_rank == 0) {
        for (int i = 1; i < world_size; i++)
        {
            if (<process i is ready to receive>) {
                MPI_Send([...]);
                Datapointer += 10;
            }
            if (<process i has sent data>)
                MPI_Recv([...]);
            if (Datapointer > array_size) {
                MPI_Bcast([killswitch]);
                break;
            }
        }
    }
}
MPI_Barrier();
or is there a more efficient way to structure this for variable-complexity chunks or variable-speed nodes?
As @Gilles Gouaillardet pointed out, the key ingredient in such a scenario is MPI_ANY_SOURCE. With it, a process can receive a message from any source. To know which process sent the message, check status.MPI_SOURCE on the status returned by the receive call.
MPI_Status status;
if (rank == 0) {
    // send initial work to all processes
    while (true) {
        MPI_Recv(buf, 32, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        // do the distribution logic
        MPI_Send(buf, 32, MPI_INT, status.MPI_SOURCE, tag, MPI_COMM_WORLD);
        // break out of the loop once the work is over, and send all the
        // processes a message to stop waiting for work
    }
} else {
    while (true) {
        // receive work from rank 0
        MPI_Recv(buf, 32, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        // perform the computation and send back the result
        MPI_Send(buf, 32, MPI_INT, 0, tag, MPI_COMM_WORLD);
        // break out of this loop when rank 0 asks, using some kind of special message
    }
}
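One way to fill in the "stop waiting for work" part is to reserve a tag for termination and have each worker check status.MPI_TAG after every receive. This is a hedged sketch that reuses the names from the snippet above (buf, world_size, status, tag); STOP_TAG is a made-up constant, not something from the original post:

    #define STOP_TAG 999   /* hypothetical tag reserved for "no more work" */

    /* master side, once all work items have been handed out and collected: */
    for (int r = 1; r < world_size; r++)
        MPI_Send(buf, 0, MPI_INT, r, STOP_TAG, MPI_COMM_WORLD);

    /* worker side: */
    while (1) {
        MPI_Recv(buf, 32, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == STOP_TAG)
            break;                      /* master says we are done */
        /* ... do the computation on buf ... */
        MPI_Send(buf, 32, MPI_INT, 0, tag, MPI_COMM_WORLD);
    }

A zero-count send is legal and is matched by the worker's 32-element receive, since the receive count is only an upper bound.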

How to work with NETLINK_KOBJECT_UEVENT protocol in user space?

Let's consider this example code:
#include <linux/netlink.h>
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>

#define BUF_SIZE 4096

int main() {
    int fd, res;
    unsigned int i, len;
    char buf[BUF_SIZE];
    struct sockaddr_nl nls;

    fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_KOBJECT_UEVENT);
    if (fd == -1) {
        return 1;
    }

    memset(&nls, 0, sizeof(nls));
    nls.nl_family = AF_NETLINK;
    nls.nl_pid = getpid();
    nls.nl_groups = 1;

    res = bind(fd, (struct sockaddr *)&nls, sizeof(nls));
    if (res == -1) {
        return 2;
    }

    while (1) {
        len = recv(fd, buf, sizeof(buf), 0);
        printf("============== Received %d bytes\n", len);
        for (i = 0; i < len; ++i) {
            if (buf[i] == 0) {
                printf("[0x00]\n");
            } else if (buf[i] < 33 || buf[i] > 126) {
                printf("[0x%02hhx]", buf[i]);
            } else {
                printf("%c", buf[i]);
            }
        }
        printf("<END>\n");
    }
    close(fd);
    return 0;
}
It listens on a netlink socket for events related to hotplug. Basically, it works. However, some parts are still unclear to me even after spending a whole evening googling, reading various pieces of documentation and manuals, and working through examples.
Basically, I have two questions.
What do the different values of sockaddr_nl.nl_groups mean, at least for the NETLINK_KOBJECT_UEVENT protocol?
If the buffer allocated for the message is too small, the message is simply truncated (you can play with BUF_SIZE to see that). What should this buffer size be so that no data is lost? Is it possible to know, in user space, the length of the incoming message so that enough space can be allocated?
I would appreciate either direct answers or references to kernel code.
The values are a bit mask of multicast groups. A netlink socket can join multiple multicast groups through this mask (0 means no multicast, i.e. unicast only), and multicast messages are sent to one of those groups. For NETLINK_KOBJECT_UEVENT the kernel appears to broadcast uevents on group 1; see, for example, here.
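As a small illustration (my reading of the bind interface, so treat it as an assumption rather than gospel), bit N of nl_groups subscribes the socket to multicast group N+1. Reusing the nls from the question's code:

    /* nl_groups is a bit mask of group memberships:
       bit 0 -> group 1, bit 1 -> group 2, and so on. */
    nls.nl_groups = 1;                      /* join group 1: kernel uevents */
    /* nls.nl_groups = (1 << 0) | (1 << 2);    would join groups 1 and 3    */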
You should be able to use getsockopt with level set to SOL_SOCKET and optname set to SO_RCVBUF.
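Roughly like this; a sketch of that suggestion, noting that SO_RCVBUF reports the socket's kernel receive-buffer size, not the length of an individual uevent message:

    /* Query how much data the kernel will queue on this socket. */
    int rcvbuf = 0;
    socklen_t optlen = sizeof(rcvbuf);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &optlen) == 0)
        printf("kernel receive buffer: %d bytes\n", rcvbuf);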

Error when using MPI_SEND and MPI_RECV on Windows

I am following some source code from my documents, but I encounter an error when I try to use MPI_Send() and MPI_Recv() from the Open MPI library.
I have googled and read some threads on this site, but I cannot find a solution that resolves my error.
This is my error:
mca_oob_tcp_msg_recv: readv faled : Unknown error (108)
And this is the code that I'm following:
#include <stdio.h>
#include <string.h>
#include <conio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, mesg, tag = 123;
    MPI_Status status;
    MPI_Init(&argv, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (size < 2) {
        printf("Need at least 2 processes!\n");
    } else if (rank == 0) {
        mesg = 11;
        MPI_Send(&mesg,1,MPI_INT,1,tag,MPI_COMM_WORLD);
        MPI_Recv(&mesg,1,MPI_INT,1,tag,MPI_COMM_WORLD,&status);
        printf("Rank 0 received %d from rank 1\n",mesg);
    } else if (rank == 1) {
        MPI_Recv(&mesg,1,MPI_INT,0,tag,MPI_COMM_WORLD,&status);
        printf("Rank 1 received %d from rank 0/n",mesg);
        mesg = 42;
        MPI_Send(&mesg,1,MPI_INT,0,tag,MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
I commented out all of the MPI_Send() and MPI_Recv() calls, and my program worked. On the other hand, when I commented out only MPI_Send() or only MPI_Recv(), I still got that error. So I think the problem is with the MPI_Send() and MPI_Recv() calls.
P.S.: I'm using Open MPI v1.6 on Windows 8.1 OS.
You pass in the wrong arguments to MPI_Init (two times argv, instead of argc and argv once each).
The sends and receives actually look fine, I think. But there is also one typo in one of your prints with a /n instead of \n.
Here is what works for me (on MacOSX, though):
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, mesg, tag = 123;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (size < 2) {
        printf("Need at least 2 processes!\n");
    } else if (rank == 0) {
        mesg = 11;
        MPI_Send(&mesg, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        MPI_Recv(&mesg, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
        printf("Rank 0 received %d from rank 1\n", mesg);
    } else if (rank == 1) {
        MPI_Recv(&mesg, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d from rank 0\n", mesg);
        mesg = 42;
        MPI_Send(&mesg, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
If this does not work, I'd guess your OS does not let the processes communicate with each other via the method chosen by OpenMPI.
Alternatively, pass MPI_STATUS_IGNORE instead of &status to MPI_Recv in both places, since the status is never used.
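For reference, that call would look like this for the rank 0 receive (and analogously for rank 1); it is a tidiness change rather than the bug fix:

    /* MPI_STATUS_IGNORE can be passed when the status is never inspected. */
    MPI_Recv(&mesg, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);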

Inverting an image using MPI

I am trying to invert a PGM image using MPI. The grayscale (PGM) image should be loaded on the root processor and then sent to each of the s^2 processors. Each processor inverts a block of the image, and the inverted blocks are gathered back on the root processor, which assembles them into the final image and writes it out as a PGM file. When I ran the following code, I did not get any output: the image was read, but there was no indication that the resulting image was ever written. Could you please let me know what could be wrong with it?
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <time.h>
#include <string.h>
#include <math.h>
#include <memory.h>

#define max(x, y) ((x>y) ? (x):(y))
#define min(x, y) ((x<y) ? (x):(y))

int xdim;
int ydim;
int maxraw;
unsigned char *image;

void ReadPGM(FILE*);
void WritePGM(FILE*);

#define s 2

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int p, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int NPROWS=s;   /* number of rows in _decomposition_ */
    const int NPCOLS=s;   /* number of cols in _decomposition_ */
    const int BLOCKROWS = xdim/NPROWS;   /* number of rows in _block_ */
    const int BLOCKCOLS = ydim/NPCOLS;   /* number of cols in _block_ */
    int i, j;
    FILE *fp;
    float BLimage[BLOCKROWS*BLOCKCOLS];
    for (int ii=0; ii<BLOCKROWS*BLOCKCOLS; ii++)
        BLimage[ii] = 0;
    float BLfilteredMat[BLOCKROWS*BLOCKCOLS];
    for (int ii=0; ii<BLOCKROWS*BLOCKCOLS; ii++)
        BLfilteredMat[ii] = 0;

    if (rank == 0) {
        /* begin reading PGM.... */
        ReadPGM(fp);
    }

    MPI_Datatype blocktype;
    MPI_Datatype blocktype2;
    MPI_Type_vector(BLOCKROWS, BLOCKCOLS, ydim, MPI_FLOAT, &blocktype2);
    MPI_Type_create_resized( blocktype2, 0, sizeof(float), &blocktype);
    MPI_Type_commit(&blocktype);

    int disps[NPROWS*NPCOLS];
    int counts[NPROWS*NPCOLS];
    for (int ii=0; ii<NPROWS; ii++) {
        for (int jj=0; jj<NPCOLS; jj++) {
            disps[ii*NPCOLS+jj] = ii*ydim*BLOCKROWS+jj*BLOCKCOLS;
            counts[ii*NPCOLS+jj] = 1;
        }
    }

    MPI_Scatterv(image, counts, disps, blocktype, BLimage, BLOCKROWS*BLOCKCOLS, MPI_FLOAT, 0, MPI_COMM_WORLD);

    //************** Invert the block **************//
    for (int proc=0; proc<p; proc++) {
        if (proc == rank) {
            for (int j = 0; j < BLOCKCOLS; j++) {
                for (int i = 0; i < BLOCKROWS; i++) {
                    BLfilteredMat[j*BLOCKROWS+i] = 255 - image[j*BLOCKROWS+i];
                }
            }
        } // close if (proc == rank)
        MPI_Barrier(MPI_COMM_WORLD);
    } // close for (int proc=0; proc<p; proc++)

    MPI_Gatherv(BLfilteredMat, BLOCKROWS*BLOCKCOLS, MPI_FLOAT, image, counts, disps, blocktype, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* Begin writing PGM.... */
        WritePGM(fp);
        free(image);
    }
    MPI_Finalize();
    return (1);
}
It is very likely that MPI is not the right tool for this job. The reason is that your job is inherently bandwidth-limited.
Think of it this way: You have a coloring book with images which you all want to color in.
Method 1: you take your time and color them in one by one.
Method 2: you copy each page to a new sheet of paper and mail it to a friend who then colors it in for you. He mails it back to you and in the end you glue all the pages you received from all of your friends together to make one colored-in book.
Note that method two involves copying the whole book, which is arguably the same amount of work needed to color in the whole book. So method two is less time-efficient without even considering the overhead of shoving the pages into an envelope, licking the stamp, going to the post office and waiting for the letter to be delivered.
If you look at your code, every transmitted byte is only touched once throughout the whole program in this line:
BLfilteredMat[j*BLOCKROWS+i] = 255 - image[j*BLOCKROWS+i];
A single processor is much faster at subtracting two integers than it is at sending an integer over the wire, so I have to advise against using MPI for this particular problem.
My suggestion for solving your problem: try to avoid unnecessary communication whenever possible. Do all processes have access to the file system on which the files are located? If so, you could try reading them directly from the filesystem; the sketch below shows how little computation there is to distribute in the first place.
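For scale, the entire computation here is a single pass over the pixels, something one process can do on its own without any MPI traffic. A sketch reusing the question's globals, assuming ReadPGM has already filled image, xdim and ydim:

    /* One linear pass: each byte is read, inverted and written exactly once.
       Shipping that byte over MPI costs far more than the subtraction itself. */
    for (int k = 0; k < xdim * ydim; k++)
        image[k] = (unsigned char)(255 - image[k]);

If the time spent in that loop is negligible next to reading and writing the file, distributing it over MPI cannot speed the program up.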

MPI hangs during execution

I'm trying to write a simple program with MPI that finds all numbers less than 514 that are equal to a power of the sum of their digits (for example, 512 = (5+1+2)^3). The problem I have is with the main loop: it works just fine for a few iterations (c=10), but when I try to increase the number of iterations (c=x), mpiexec.exe just hangs, seemingly in the middle of a printf call.
I'm pretty sure that deadlocks are to blame, but I couldn't find any.
The source code:
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include "mpi.h"

int main(int argc, char* argv[])
{
    //our number
    int x=514;
    //amount of iterations
    int c = 10;
    //tags for message identification
    int tag = 42;
    int tagnumber = 43;
    int np, me, y1, y2;
    MPI_Status status;

    /* Initialize MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    /* Check that we run on more than two processors */
    if (np < 2)
    {
        printf("You have to use at least 2 processes to run this program\n");
        MPI_Finalize();
        exit(0);
    }

    //begin iterations
    while(c>0)
    {
        //if main thread, then send messages to all created threads
        if (me == 0)
        {
            printf("Amount of threads: %d\n", np);
            int b = 1;
            while(b<np)
            {
                int q = x-b;
                //sends a number to a secondary thread
                MPI_Send(&q, 1, MPI_INT, b, tagnumber, MPI_COMM_WORLD);
                printf("Process %d sending to process %d, value: %d\n", me, b, q);
                //get a number from secondary thread
                MPI_Recv(&y2, 1, MPI_INT, b, tag, MPI_COMM_WORLD, &status);
                printf ("Process %d received value %d\n", me, y2);
                //compare it with the sent one
                if (q==y2)
                {
                    //if they're equal, then print the result
                    printf("\nValue found: %d\n", q);
                }
                b++;
            }
            x = x-b+1;
            b = 1;
        }
        else
        {
            //if not a main thread, then process the message sent and send the result back.
            MPI_Recv (&y1, 1, MPI_INT, 0, tagnumber, MPI_COMM_WORLD, &status);
            int sum = 0;
            int y2 = y1;
            while (y1!=0)
            {
                //find the number's sum of digits
                sum += y1%10;
                y1 /= 10;
            }
            int sum2 = sum;
            while(sum2<y2)
            {
                //calculate the exponentiation
                sum2 = sum2*sum;
            }
            MPI_Send (&sum2, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        }
        c--;
    }
    MPI_Finalize();
    exit(0);
}
And I run the compiled exe-file as "mpiexec.exe -n 4 lab2.exe". I use HPC Pack 2008 SDK, if that's of any use to you guys.
Is there any way to fix it? Or maybe some way to debug that situation properly?
Thanks a lot in advance!
Not sure if you have already found where the problem is, but your infinite run happens in this loop:
while(sum2 < y2)
{
    //calculate the exponentiation
    sum2 = sum2*sum;
}
You can confirm this by setting c to about 300 or above and adding a printf call inside this while loop. I haven't completely pinpointed your error of logic, but I marked three comments below at the places in your code that strike me as strange:
while(c>0)
{
    if (me == 0)
    {
        ...
        while(b<np)
        {
            int q = x-b;   //<-- you subtract b from x here
            ...
            b++;
        }
        x = x-b+1;   //<-- you subtract b again. sure this is what you want?
        b = 1;       //<-- this is useless
    }
}
Hope this helps.
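Building on that diagnosis, here is one hedged way to make the worker loop terminate; variable names follow the question, and whether this matches the intended math is for the poster to decide:

    int sum2 = sum;
    if (sum > 1) {
        /* sum2 grows strictly each iteration, so this loop terminates */
        while (sum2 < y2)
            sum2 = sum2 * sum;
    }
    /* when sum is 0 or 1, sum2 can never grow, so skip the loop; the
       later q == y2 comparison then simply fails for those numbers */

The original loop hangs exactly when the digit sum is 0 or 1 (for example, q = 10 gives sum = 1), because sum2 never changes and never reaches y2.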

Resources