How to debug MPI functions (MPICH) in external processes?

I am replacing the POSIX socket I/O used by the MPI library with other means, but I am having trouble debugging the external process.
I am changing the following functions from utils/sock/sock.c in the MPI library's process launcher (hydra):

HYD_status HYDU_sock_read(int fd, void *buf, int maxlen, int *recvd, int *closed,
                          enum HYDU_sock_comm_flag flag);  /* changing the source code */
HYD_status HYDU_sock_write(int fd, const void *buf, int maxlen, int *sent, int *closed,
                           enum HYDU_sock_comm_flag flag); /* changing the source code */

For example, when I run the test program below on node1 and node2, I can see that two processes are started: one runs on node1 and the other on node2.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);
    // Find out rank, size
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    // We are assuming at least 2 processes for this task
    if (world_size < 2) {
        fprintf(stderr, "World size must be greater than 1 for %s\n", argv[0]);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    int number;
    if (world_rank == 0) {
        // If we are rank 0, set the number to -1 and send it to process 1
        number = -1;
        MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (world_rank == 1) {
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received number %d from process 0\n", number);
    }
    MPI_Finalize();
}
The problem is that I can debug the first process on node1, where I successfully changed the source code of the MPI functions. However, I have no clue how to debug the second process, which runs on node2 (the external host).
My question is:
Is there a good way to debug the second process running on the external host? I have tried printf, but that output also only appears for the first process on node1; nothing shows up from node2.
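For what it's worth, one common technique here is per-process file logging: below is a minimal sketch (the helper name and log path are my own choices, not part of the original post) that could be dropped into sock.c so that every hydra process on every node appends to its own file:

#include <stdio.h>
#include <stdarg.h>
#include <unistd.h>

/* Append a printf-style message to a per-process log file.
   Each file is keyed by PID, so output produced on node2 ends up
   in node2's /tmp directory instead of being lost. */
static void debug_log(const char *fmt, ...)
{
    char path[64];
    FILE *f;
    va_list ap;
    snprintf(path, sizeof(path), "/tmp/hyd_debug.%d.log", (int)getpid());
    f = fopen(path, "a");
    if (f == NULL)
        return;
    va_start(ap, fmt);
    vfprintf(f, fmt, ap);
    va_end(ap);
    fclose(f);
}

Calling debug_log("HYDU_sock_write fd=%d len=%d\n", fd, maxlen) inside the modified functions then leaves a trace on each node; alternatively, log the PID once at startup and attach a debugger on node2 (e.g. gdb -p <pid>, or gdbserver for remote debugging) to step through the external process.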

Related

Error when using MPI_SEND and MPI_RECV on Windows

I am following some source code in my documents, but I encounter an error when I try to use MPI_Send() and MPI_Recv() from the Open MPI library.
I have googled and read some threads on this site, but I cannot find a solution to this error.
This is the error:
mca_oob_tcp_msg_recv: readv failed: Unknown error (108)
And this is the code that I'm following:
#include <stdio.h>
#include <string.h>
#include <conio.h>
#include <mpi.h>
int main(int argc, char **argv) {
    int rank, size, mesg, tag = 123;
    MPI_Status status;
    MPI_Init(&argv, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (size < 2) {
        printf("Need at least 2 processes!\n");
    } else if (rank == 0) {
        mesg = 11;
        MPI_Send(&mesg, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        MPI_Recv(&mesg, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
        printf("Rank 0 received %d from rank 1\n", mesg);
    } else if (rank == 1) {
        MPI_Recv(&mesg, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d from rank 0/n", mesg);
        mesg = 42;
        MPI_Send(&mesg, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
I commented out all of the MPI_Send() and MPI_Recv() calls, and my program worked. On the other hand, when I commented out either MPI_Send() or MPI_Recv() alone, I still got that error. So I think the problem is with the MPI_Send() and MPI_Recv() calls.
P.S.: I'm using Open MPI v1.6 on Windows 8.1 OS.
You pass the wrong arguments to MPI_Init: &argv twice, instead of &argc and &argv once each.
The sends and receives themselves look fine, I think. But there is also a typo in one of your prints: /n instead of \n.
Here is what works for me (on MacOSX, though):
int main(int argc, char **argv) {
    int rank, size, mesg, tag = 123;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (size < 2) {
        printf("Need at least 2 processes!\n");
    } else if (rank == 0) {
        mesg = 11;
        MPI_Send(&mesg, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        MPI_Recv(&mesg, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
        printf("Rank 0 received %d from rank 1\n", mesg);
    } else if (rank == 1) {
        MPI_Recv(&mesg, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d from rank 0\n", mesg);
        mesg = 42;
        MPI_Send(&mesg, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
If this does not work, I'd guess your OS does not let the processes communicate with each other via the method chosen by OpenMPI.
Pass MPI_STATUS_IGNORE instead of &status to MPI_Recv in both places, since the status is never examined.
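For example, rank 0's receive would then read:

MPI_Recv(&mesg, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);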

MPI hangs during execution

I'm trying to write a simple MPI program that finds all numbers less than 514 that are equal to a power of the sum of their digits (for example, 512 = (5+1+2)^3). The problem I have is with the main loop: it works just fine for a few iterations (c=10), but when I increase the iteration count (c=x), mpiexec.exe just hangs, seemingly in the middle of a printf call.
I'm pretty sure a deadlock is to blame, but I couldn't find one.
The source code:
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include "mpi.h"

int main(int argc, char* argv[])
{
    //our number
    int x = 514;
    //amount of iterations
    int c = 10;
    //tags for message identification
    int tag = 42;
    int tagnumber = 43;
    int np, me, y1, y2;
    MPI_Status status;

    /* Initialize MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    /* Check that we run on at least two processes */
    if (np < 2)
    {
        printf("You have to use at least 2 processes to run this program\n");
        MPI_Finalize();
        exit(0);
    }

    //begin iterations
    while (c > 0)
    {
        //if main thread, then send messages to all created threads
        if (me == 0)
        {
            printf("Amount of threads: %d\n", np);
            int b = 1;
            while (b < np)
            {
                int q = x - b;
                //sends a number to a secondary thread
                MPI_Send(&q, 1, MPI_INT, b, tagnumber, MPI_COMM_WORLD);
                printf("Process %d sending to process %d, value: %d\n", me, b, q);
                //get a number from secondary thread
                MPI_Recv(&y2, 1, MPI_INT, b, tag, MPI_COMM_WORLD, &status);
                printf("Process %d received value %d\n", me, y2);
                //compare it with the sent one
                if (q == y2)
                {
                    //if they're equal, then print the result
                    printf("\nValue found: %d\n", q);
                }
                b++;
            }
            x = x - b + 1;
            b = 1;
        }
        else
        {
            //if not a main thread, then process the message sent and send the result back
            MPI_Recv(&y1, 1, MPI_INT, 0, tagnumber, MPI_COMM_WORLD, &status);
            int sum = 0;
            int y2 = y1;
            while (y1 != 0)
            {
                //find the number's sum of digits
                sum += y1 % 10;
                y1 /= 10;
            }
            int sum2 = sum;
            while (sum2 < y2)
            {
                //calculate the exponentiation
                sum2 = sum2 * sum;
            }
            MPI_Send(&sum2, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        }
        c--;
    }
    MPI_Finalize();
    exit(0);
}
And I run the compiled exe-file as "mpiexec.exe -n 4 lab2.exe". I use HPC Pack 2008 SDK, if that's of any use to you guys.
Is there any way to fix it? Or maybe some way to debug that situation properly?
Thanks a lot in advance!
Not sure if you have already found the problem, but your infinite run happens in this loop:
while (sum2 < y2)
{
    //calculate the exponentiation
    sum2 = sum2 * sum;
}
You can confirm this by setting c to about 300 or above and adding a printf call inside this while loop. I haven't completely pinpointed the logic error, but I have marked three comments below at the code locations that look strange to me:
while (c > 0)
{
    if (me == 0)
    {
        ...
        while (b < np)
        {
            int q = x - b; //<-- you subtract b from x here
            ...
            b++;
        }
        x = x - b + 1; //<-- you subtract b again. sure this is what you want?
        b = 1;         //<-- this is useless
    }
Hope this helps.
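For illustration, here is a minimal guard (my own sketch, not from the original answer). When the digit sum is 0 or 1, sum2 = sum2*sum never grows, so the loop can never reach y2 and spins forever (e.g. q = 100 has digit sum 1); skipping those cases breaks the hang:

int sum2 = sum;
/* A product of sum factors can only grow if sum > 1; digit sums of
   0 or 1 would make the loop below spin forever. */
if (sum > 1)
{
    while (sum2 < y2)
        sum2 = sum2 * sum;
}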

Why does my program hang when I use MPI_Send and MPI_Recv?

There is a simple communication program that I use with MPICH2. When I execute the program with
mpiexec.exe -hosts 2 o00 o01 -noprompt mesajlasma.exe
the program starts but does not end. I can see that it is still running on the host computer "o01" with the resource monitor. When I press CTRL+C, it ends, and I can then see that my program ran properly.
Why doesn't my program end? Where does it get stuck? Why does it hang when I use MPI_Send and MPI_Recv?
Thanks in advance
// mesajlasma.cpp
#include "stdafx.h"
#include "string.h"
#include "mpi.h"
int main(int argc, char* argv[])
{
    int nTasks, rank;
    char mesaj[20];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nTasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    //printf ("\nNumber of threads = %d, My rank = %d\n", nTasks, rank);
    if (rank == 1)
    {
        strcpy_s(mesaj, "Hello World");
        if (MPI_SUCCESS == MPI_Send(mesaj, strlen(mesaj) + 1, MPI_CHAR, 0, 99, MPI_COMM_WORLD))
            printf("_OK!_\n");
    }
    if (rank == 0)
    {
        MPI_Recv(mesaj, 20, MPI_CHAR, 1, 99, MPI_COMM_WORLD, &status);
        printf("Received Message:%s\n", mesaj);
    }
    MPI_Finalize();
    return 0;
}
You probably also need to pass an argument like -n 2 to your mpiexec.exe command in order to instruct it to launch 2 processes. I believe that the -hosts argument is just an alternative way to specify the hosts on which your program can run, not how many processes will be created.
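Following that suggestion, the invocation would become something like the line below (assuming this mpiexec accepts -n together with -hosts; the exact syntax may vary between MPICH2 versions):

mpiexec.exe -n 2 -hosts 2 o00 o01 -noprompt mesajlasma.exe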

How to fix this MPI program

This program demonstrates unsafe MPI usage: sometimes it executes fine, and other times it fails. The reason it fails or hangs is buffer exhaustion on the receiving task's side, a consequence of the way the MPI library implements an eager protocol for messages below a certain size. One possible solution is to include an MPI_Barrier call in both the send and receive loops.
How can this program be corrected?
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MSGSIZE 2000
int main (int argc, char *argv[])
{
    int numtasks, rank, i, tag = 111, dest = 1, source = 0, count = 0;
    char data[MSGSIZE];
    double start, end, result;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        printf("mpi_bug5 has started...\n");
        if (numtasks > 2)
            printf("INFO: Number of tasks= %d. Only using 2 tasks.\n", numtasks);
    }

    /******************************* Send task **********************************/
    if (rank == 0) {
        /* Initialize send data */
        for (i = 0; i < MSGSIZE; i++)
            data[i] = 'x';
        start = MPI_Wtime();
        while (1) {
            MPI_Send(data, MSGSIZE, MPI_BYTE, dest, tag, MPI_COMM_WORLD);
            count++;
            if (count % 10 == 0) {
                end = MPI_Wtime();
                printf("Count= %d Time= %f sec.\n", count, end - start);
                start = MPI_Wtime();
            }
        }
    }

    /****************************** Receive task ********************************/
    if (rank == 1) {
        while (1) {
            MPI_Recv(data, MSGSIZE, MPI_BYTE, source, tag, MPI_COMM_WORLD, &status);
            /* Do some work - at least more than the send task */
            result = 0.0;
            for (i = 0; i < 1000000; i++)
                result = result + (double)random();
        }
    }
    MPI_Finalize();
}
Ways to improve this code so that the receiver doesn't end up with an unlimited number of unexpected messages include:
Synchronization - you mentioned MPI_Barrier, but even using MPI_Ssend instead of MPI_Send would work (see the sketch below).
Explicit buffering - using MPI_Buffer_attach together with MPI_Bsend to ensure adequate buffering exists.
Posted receives - the receiving process posts MPI_Irecv calls before starting work, so messages land in the buffers meant to hold the data rather than in system buffers.
In this pedagogical case, since the number of messages is unlimited, only the first option (synchronization) would reliably work.
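For instance, the synchronous-send variant is a one-line change in the send loop of the code above:

while (1) {
    /* MPI_Ssend does not complete until the matching receive has
       started, so the sender can never run arbitrarily far ahead
       of the receiver and exhaust its buffering. */
    MPI_Ssend(data, MSGSIZE, MPI_BYTE, dest, tag, MPI_COMM_WORLD);
    count++;
}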

Single-Sided communications with MPI-2

Consider the following fragment of OpenMP code which transfers private data between two threads using an intermediate shared variable
#pragma omp parallel shared(x) private(a,b)
{
    ...
    a = somefunction(b);
    if (omp_get_thread_num() == 0) {
        x = a;
    }
}

#pragma omp parallel shared(x) private(a,b)
{
    if (omp_get_thread_num() == 1) {
        a = x;
    }
    b = anotherfunction(a);
    ...
}
How would I (in pseudocode) do the same transfer of private data from one process to another using a single-sided message-passing library?
Any ideas?
This is possible, but there's a lot more "scaffolding" involved -- after all, you are communicating data between potentially completely different computers.
The coordination for this sort of thing is done between windows of data which are accessible from other processors, and with lock/unlock operations which coordinate the access of this data. The locks aren't really locks in the sense of being mutexes, but they are more like synchronization points coordinating data access to the window.
I don't have time right now to explain this in the detail I'd like, but below is an example that uses MPI-2 to do something like shared-memory flagging on a system that doesn't have shared memory:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "mpi.h"
int main(int argc, char** argv)
{
    int rank, size, *a, geta;
    int x;
    int ierr;
    MPI_Win win;
    const int RCVR = 0;
    const int SENDER = 1;

    ierr = MPI_Init(&argc, &argv);
    ierr |= MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    ierr |= MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (ierr) {
        fprintf(stderr, "Error initializing MPI library; failing.\n");
        exit(-1);
    }

    if (rank == RCVR) {
        MPI_Alloc_mem(sizeof(int), MPI_INFO_NULL, &a);
        *a = 0;
    } else {
        a = NULL;
    }

    MPI_Win_create(a, 1, sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == SENDER) {
        /* Lock receiver's window */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, RCVR, 0, win);
        x = 5;
        /* put 1 int (from &x) to 1 int at rank RCVR, at displacement 0 in window "win" */
        MPI_Put(&x, 1, MPI_INT, RCVR, 0, 1, MPI_INT, win);
        /* Unlock */
        MPI_Win_unlock(RCVR, win);
        printf("%d: My job here is done.\n", rank);
    }

    if (rank == RCVR) {
        for (;;) {
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, RCVR, 0, win);
            MPI_Get(&geta, 1, MPI_INT, RCVR, 0, 1, MPI_INT, win);
            MPI_Win_unlock(RCVR, win);
            if (geta == 0) {
                printf("%d: a still zero; sleeping.\n", rank);
                sleep(2);
            } else
                break;
        }
        printf("%d: a now %d!\n", rank, geta);
        printf("a = %d\n", *a);
    }

    MPI_Win_free(&win);
    if (rank == RCVR) MPI_Free_mem(a);
    MPI_Finalize();
    return 0;
}
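Run it with two ranks, e.g. mpiexec -n 2 ./a.out (binary name assumed). Rank 0 then polls the window, printing "a still zero; sleeping." until rank 1's MPI_Put lands, and finally prints the received value.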
