mpi parallel program to find prime numbers. Please help me dubug - parallel-processing

I wrote the following program to find prime number with the #defined value. It is parallel program using mpi. Can anyone help me find a error in it. It compile well but crashes while executing.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define N 65
int rank, size;
double start_time;
double end_time;
int y, x, i, port1, port2, port3;
int check =0; // prime number checker, if a number is prime it always remains 0 through out calculation. for a number which is not prime it is turns to value 1 at some point
int signal =0; // has no important use. just to check if slave process work is done.
MPI_Status status;
MPI_Request request;
int main(int argc, char *argv[]){
MPI_Init(&argc, &argv); //initialize MPI operations
MPI_Comm_rank(MPI_COMM_WORLD, &rank); //get the rank
MPI_Comm_size(MPI_COMM_WORLD, &size); //get number of processes
if(rank == 0){ // master process divides work and also does initial work itself
start_time = MPI_Wtime();
printf("2\n"); //print prime number 2 first because the algorithm for finding the prime number in this program is just for odd number
port1 = (N/(size-1)); // calculating the suitable amount of work per process
for(i=1;i<size-1;i++){ // master sending the portion of work to each slave
port2 = port1 * i; // lower bound of work for i th process
port3 = ((i+1)*port1)-1; // upper bound of work for i th process
MPI_Isend(&port2, 1, MPI_INT, i, 100, MPI_COMM_WORLD, &request);
MPI_Isend(&port3, 1, MPI_INT, i, 101, MPI_COMM_WORLD, &request);
}
port2 = (size-1)*port1; port3= N; // the last process takes the remaining work
MPI_Isend(&port2, 1, MPI_INT, (size-1), 100, MPI_COMM_WORLD, &request);
MPI_Isend(&port3, 1, MPI_INT, (size-1), 101, MPI_COMM_WORLD, &request);
for(x = 3; x < port1; x=x+2){ // master doing initial work by itself
check = 0;
for(y = 3; y <= x/2; y=y+2){
if(x%y == 0) {check =1; break;}
}
if(check==0) printf("%d\n", x);
}
}
if (rank > 0){ // slave working part
MPI_Recv(&port2,1,MPI_INT, 0, 100, MPI_COMM_WORLD, &status);
MPI_Recv(&port3,1,MPI_INT, 0, 101, MPI_COMM_WORLD, &status);
if (port2%2 == 0) port2++; // changing the even argument to odd to make the calculation fast because even number is never a prime except 2.
for(x=port2; x<=port3; x=x+2){
check = 0;
for(y = 3; y <= x/2; y=y+2){
if(x%y == 0) {check =1; break;}
}
if (check==0) printf("%d\n",x);
}
signal= rank;
MPI_Isend(&signal, 1, MPI_INT, 0, 103, MPI_COMM_WORLD, &request); // just informing master that the work is finished
}
if (rank == 0){ // master concluding the work and printing the time taken to do the work
for(i== 1; i < size; i++){
MPI_Recv(&signal,1,MPI_INT, i, 103, MPI_COMM_WORLD, &status); // master confirming that all slaves finished their work
}
end_time = MPI_Wtime();
printf("\nRunning Time = %f \n\n", end_time - start_time);
}
MPI_Finalize();
return 0;
}
I got following error
mpirun -np 2 ./a.exe
Exception: STATUS_ACCESS_VIOLATION at eip=0051401C
End of stack trace

I found what was wrong with my program.
It was the use of the restricted variable signal. change the name of that variable (in all places it is used) to any other viable name and it works.

Related

MPI Program using MPI_Scatter and MPI_Reduce

Write an MPI program that efficiently compute the sum of array elements.
Program 1: Tasks communicate with MPI_Scatter and MPI_Reduce.
The programs can assume that the number of processes is a power of two.
The programs should add 2^15 = 65536 random doubles in the range 0 to 100.
Task 0 must generate the numbers, store them in array and distribute them to the tasks.
Each task does a serial sum of the numbers it is assigned. The local sums are then
added together using a tree structured parallel sum.
After the parallel sum is complete, task 0 should compute a serial sum of the
same numbers (to verify the result).
Task 0 must print the parallel sum, the serial sum and the time required for the
parallel sum (including data distribution).
#include <stdio.h>
#include <mpi.h>
int main(int argc,char *argv[]){
MPI_Init(NULL,NULL); // Initialize the MPI environment
int rank;
int comm_size;
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&comm_size);
int number1[2];
int number[4];
if(rank == 0){
number[0]=1;
number[1]=2;
number[2]=3;
number[3]=4;
//number[4]=5;
}
double local_start, local_finish, local_elapsed, elapsed;
MPI_Barrier(MPI_COMM_WORLD);
local_start = MPI_Wtime();
//All processes
MPI_Scatter(number, 2, MPI_INT, &number1, 2, MPI_INT, 0, MPI_COMM_WORLD);
//printf("I'm process %d , I received the array : ",rank);
int sub_sum = 0;
for(int i=0 ; i<2 ; i++){
// printf("%d ",number1[i]);
sub_sum = sub_sum + number1[i];
}
printf("\n");
int sum = 0;
MPI_Reduce(&sub_sum, &sum, 1, MPI_INT, MPI_SUM,0,MPI_COMM_WORLD);
local_finish = MPI_Wtime();
local_elapsed = local_finish -local_start;
MPI_Reduce(&local_elapsed,&elapsed,1,MPI_DOUBLE,MPI_MAX,0,MPI_COMM_WORLD);
if(rank == 0)
{
printf("\nthe sum of array is: %d\n",sum);
printf("Elapsed time = %e seconds\n",elapsed);
}
MPI_Finalize();
return 0;
}

MPI Reduce in 2 different groups

I am trying to do the following with the help of MPI_reduce and MPI_groups. I have created 2 different groups, one having the odd rank processes and another having the even rank processes, but in the new groups they are all numbered from 0 to N. What I am doing is that I am reducing all the values in group 1 and 2 using MPI_SUM. Say group 1 has the following values like-> 1,2,3 and group 2 has the following values say--> 4,5,6 then after reducing in group 1 I am getting 6 and in group2 I am getting 15, I am storing them respectively in recv_val1 and recv_val2. Now what I want to do is that I want to add them together, but whenever I do so, I get some garbage values. Why is this happening?
#include <stdio.h>
#include "mpi.h"
int main( int argc, char *argv[])
{
int myrank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
MPI_Comm_size( MPI_COMM_WORLD, &size );
int color = myrank%2;
int newrank, newsize;
int sendval, maxval,recv_val1,recv_val2;
MPI_Status status;
MPI_Comm newcomm;
MPI_Comm_split (MPI_COMM_WORLD, color, myrank, &newcomm);
MPI_Comm_rank( newcomm, &newrank );
MPI_Comm_size( newcomm, &newsize );
//printf("color and newrank are %d and %d \n", color, newrank);
sendval = newrank;
if(color==0)
MPI_Reduce(&sendval, &recv_val1, 1, MPI_INT, MPI_SUM, 0, newcomm); // find sum
else
MPI_Reduce(&sendval, &recv_val2, 1, MPI_INT, MPI_SUM, 0, newcomm); // find sum
if(newrank == 0 && color == 0)
printf("%d\n", recv_val1);
if(newrank == 0 && color == 1)
printf("%d\n", recv_val2);
if(myrank==0)
printf(" sum is %d\n", recv_val1+recv_val2);
MPI_Comm_free(&newcomm);
MPI_Finalize();
return 0;
}

MPI help on how to parallelize my code

I am very much a newbie in this subject and need help on how to parallelize my code.
I have a large 1D array that in reality describes a 3D volume: 21x21x21 single precision values.
I have 3 computers that I want to engage in the computation. The operation that is performed on each cell in the grid(volume) is identical for all cells. The program takes in some data and perform some simple arithmetics on them and the return value is assigned to the grid cell.
My non-parallized code is:
float zg, yg, xg;
stack_result = new float[Nz*Ny*Nx];
// StrMtrx[8] is the vertical step size, StrMtrx[6] is the vertical starting point
for (int iz=0; iz<Nz; iz++) {
zg = iz*StRMtrx[8]+StRMtrx[6]; // find the vertical position in meters
// StrMtrx[5] is the crossline step size, StrMtrx[3] is the crossline starting point
for (int iy=0; iy<Ny; iy++) {
yg = iy*StRMtrx[5]+StRMtrx[3]; // find the crossline position
// StrMtrx[2] is the inline step size, StrMtrx[0] is the inline starting point
for (int ix=0; ix < nx; ix++) {
xg = ix*StRMtrx[2]+StRMtrx[0]; // find the inline position
// do stacking on each grid cell
// "Geoph" is the geophone ids, "Ngeo" is the number of geophones involved,
// "pahse_use" is the wave type, "EnvMtrx" is the input data common to all
// cells, "Mdata" is the length of input data
stack_result[ix+Nx*iy+Nx*Ny*iz] =
stack_for_qds(Geoph, Ngeo, phase_use, xg, yg, zg, EnvMtrx, Mdata);
}
}
}
Now I take in 3 computers and divide the volume in 3 vertical segments, so I would then have 3 sub-volumes each 21x21x7 cells. (note the parsing of the volume is in z,y,x).
The variable "stack_result" is the complete volume.
My parallellized version (which utterly fails, I only get one of the sub-volumes back) is:
MPI_Status status;
int rank, numProcs, rootProcess;
ierr = MPI_Init(&argc, &argv);
ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
ierr = MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
int rowsInZ = Nz/numProcs; // 7 cells in Z (vertical)
int chunkSize = Nx*Ny*rowsInZ;
float *stack_result = new float[Nz*Ny*Nx];
float zg, yg, xg;
rootProcess = 0;
if(rank == rootProcess) {
offset = 0;
for (int n = 1; n < numProcs; n++) {
// send rank
MPI_Send(&n, 1, MPI_INT, n, 2, MPI_COMM_WORLD);
// send the offset in array
MPI_Send(&offset, 1, MPI_INT, n, 2, MPI_COMM_WORLD);
// send volume, now only filled with zeros,
MPI_Send(&stack_result[offset], chunkSize, MPI_FLOAT, n, 1, MPI_COMM_WORLD);
offset = offset+chunkSize;
}
// receive results
for (int n = 1; n < numProcs; n++) {
int source = n;
MPI_Recv(&offset, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
MPI_Recv(&stack_result[offset], chunkSize, MPI_FLOAT, source, 1, MPI_COMM_WORLD, &status);
}
} else {
int rank;
int source = 0;
int ierr = MPI_Recv(&rank, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
ierr = MPI_Recv(&offset, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
ierr = MPI_Recv(&stack_result[offset], chunkSize, MPI_FLOAT, source, 1, MPI_COMM_WORLD, &status);
int nz = rowsInZ; // sub-volume vertical length
int startZ = (rank-1)*rowsInZ;
for (int iz = startZ; iz < startZ+nz; iz++) {
zg = iz*StRMtrx[8]+StRMtrx[6];
for (int iy = 0; iy < Ny; iy++) {
yg = iy*StRMtrx[5]+StRMtrx[3];
for (int ix = 0; ix < Nx; ix++) {
xg = ix*StRMtrx[2]+StRMtrx[0];
stack_result[offset+ix+Nx*iy+Nx*Ny*iz]=
stack_for_qds(Geoph, Ngeo, phase_use, xg, yg, zg, EnvMtrx, Mdata);
} // x-loop
} // y-loop
} // z-loop
MPI_Send(&offset, 1, MPI_INT, source, 2, MPI_COMM_WORLD);
MPI_Send(&stack_result[offset], chunkSize, MPI_FLOAT, source, 1, MPI_COMM_WORLD);
} // else
write("stackresult.dat", stack_result);
delete [] stack_result;
MPI_Finalize();
Thanks in advance for your patience.
You are calling write("stackresult.dat", stack_result); in all MPI ranks. As a result, they all write into and thus overwrite the same file and what you see is the content written by the last MPI process to execute that code statement. You should move the writing into the body of the if (rank == rootProcess) conditional so that only the root process will write.
As a side note, sending the value of the rank is redundant - MPI already assigns each process a rank that ranges from 0 to #processes - 1. That also makes sending of the offset redundant since each MPI process could easily compute the offset on its own based on its rank.

MPI hangs during execution

I'm trying to write a simple program with MPI that finds all numbers less than 514, that are equal to the exponent of the sum of their digits(for example, 512 = (5+1+2)^3. The problem I have is with the main loop - it works just fine on a few iterations(c=10), but when I try to increase the number of iterations(c=x), mpiexec.exe just hangs - seemingly in the middle of printf routine.
I'm pretty sure that deadlocks are to blame, but I couldn't find any.
The source code:
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include "mpi.h"
int main(int argc, char* argv[])
{
//our number
int x=514;
//amount of iterations
int c = 10;
//tags for message identification
int tag = 42;
int tagnumber = 43;
int np, me, y1, y2;
MPI_Status status;
/* Initialize MPI */
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &me);
/* Check that we run on more than two processors */
if (np < 2)
{
printf("You have to use at least 2 processes to run this program\n");
MPI_Finalize();
exit(0);
}
//begin iterations
while(c>0)
{
//if main thread, then send messages to all created threads
if (me == 0)
{
printf("Amount of threads: %d\n", np);
int b = 1;
while(b<np)
{
int q = x-b;
//sends a number to a secondary thread
MPI_Send(&q, 1, MPI_INT, b, tagnumber, MPI_COMM_WORLD);
printf("Process %d sending to process %d, value: %d\n", me, b, q);
//get a number from secondary thread
MPI_Recv(&y2, 1, MPI_INT, b, tag, MPI_COMM_WORLD, &status);
printf ("Process %d received value %d\n", me, y2);
//compare it with the sent one
if (q==y2)
{
//if they're equal, then print the result
printf("\nValue found: %d\n", q);
}
b++;
}
x = x-b+1;
b = 1;
}
else
{
//if not a main thread, then process the message sent and send the result back.
MPI_Recv (&y1, 1, MPI_INT, 0, tagnumber, MPI_COMM_WORLD, &status);
int sum = 0;
int y2 = y1;
while (y1!=0)
{
//find the number's sum of digits
sum += y1%10;
y1 /= 10;
}
int sum2 = sum;
while(sum2<y2)
{
//calculate the exponentiation
sum2 = sum2*sum;
}
MPI_Send (&sum2, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
}
c--;
}
MPI_Finalize();
exit(0);
}
And I run the compiled exe-file as "mpiexec.exe -n 4 lab2.exe". I use HPC Pack 2008 SDK, if that's of any use to you guys.
Is there any way to fix it? Or maybe some way to debug that situation properly?
Thanks a lot in advance!
Not sure if you already found where's the problem, but your infinite run happens in this loop:
while(sum2<y2)
{
//calculate the exponentiation
sum2 = sum2*sum;
}
You can confirm this by setting c to about 300 or above then make a printf call in this while loop. I haven't completely pinpoint your error of logic, but I marked three comments below at your code location where I feel is strange:
while(c>0)
{
if (me == 0)
{
...
while(b<np)
{
int q = x-b; //<-- you subtract b from x here
...
b++;
}
x = x-b+1; //<-- you subtract b again. sure this is what you want?
b = 1; //<-- this is useless
}
Hope this helps.

how to fix this MPI code program

This program demonstrates an unsafe program, because sometimes it will execute fine, and other times it will fail. The reason why the program fails or hangs is due to buffer exhaustion on the receiving task side, as a consequence of the way an MPI library has implemented an eager protocol for messages of a certain size. One possible solution is to include an MPI_Barrier call in the both the send and receive loops.
how its program code is correct???
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MSGSIZE 2000
int main (int argc, char *argv[])
{
int numtasks, rank, i, tag=111, dest=1, source=0, count=0;
char data[MSGSIZE];
double start, end, result;
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) {
printf ("mpi_bug5 has started...\n");
if (numtasks > 2)
printf("INFO: Number of tasks= %d. Only using 2 tasks.\n", numtasks);
}
/******************************* Send task **********************************/
if (rank == 0) {
/* Initialize send data */
for(i=0; i<MSGSIZE; i++)
data[i] = 'x';
start = MPI_Wtime();
while (1) {
MPI_Send(data, MSGSIZE, MPI_BYTE, dest, tag, MPI_COMM_WORLD);
count++;
if (count % 10 == 0) {
end = MPI_Wtime();
printf("Count= %d Time= %f sec.\n", count, end-start);
start = MPI_Wtime();
}
}
}
/****************************** Receive task ********************************/
if (rank == 1) {
while (1) {
MPI_Recv(data, MSGSIZE, MPI_BYTE, source, tag, MPI_COMM_WORLD, &status);
/* Do some work - at least more than the send task */
result = 0.0;
for (i=0; i < 1000000; i++)
result = result + (double)random();
}
}
MPI_Finalize();
}
Ways to improve this code so that the receiver doesn't end up with an unlimited number of unexpected messages include:
Synchronization - you mentioned MPI_Barrier, but even using MPI_Ssend instead of MPI_Send would work.
Explicit buffering - the use of MPI_Bsend or Brecv to ensure adequate buffering exists.
Posted receives - the receiving process posts IRecvs before starting work to ensure that the messages are received into the buffers meant to hold the data, rather than system buffers.
In this pedagogical case, since the number of messages is unlimited, only the first (synchronization) would reliably work.

Resources