MPI Scatter of a dynamically allocated 2D array (PGM image file)

I have implemented an MPI scatter of a 2D array, and it works well: the master process can scatter 2D parts of the initial big array. The problem is that when the input is a dynamically allocated 2D image file, it doesn't work. I suppose there must be something wrong with the memory. Is there any way of obtaining 2D parts of a big, dynamically allocated 2D array?

I had a similar problem, but with a dynamically allocated one-dimensional vector.
I solved it as follows:
#include <stdio.h>
#include <stdlib.h> /* malloc, free, exit */
#include "mpi.h"
int main(int argc, char** argv) {
/* .......Variables Initialisation ......*/
int Numprocs, MyRank, Root = 0;
int index;
int *InputBuffer = NULL, *RecvBuffer; /* InputBuffer is only allocated on Root */
int Scatter_DataSize;
int DataSize;
MPI_Status status;
/* ........MPI Initialisation .......*/
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);
MPI_Comm_size(MPI_COMM_WORLD, &Numprocs);
if (MyRank == Root) {
DataSize = 80000;
/* ...Allocate memory.....*/
InputBuffer = (int*) malloc(DataSize * sizeof(int));
for (index = 0; index < DataSize; index++)
InputBuffer[index] = index;
}
MPI_Bcast(&DataSize, 1, MPI_INT, Root, MPI_COMM_WORLD);
if (DataSize % Numprocs != 0) {
if (MyRank == Root)
printf("Input is not evenly divisible by Number of Processes\n");
MPI_Finalize();
exit(-1);
}
Scatter_DataSize = DataSize / Numprocs;
RecvBuffer = (int *) malloc(Scatter_DataSize * sizeof(int));
MPI_Scatter(InputBuffer, Scatter_DataSize, MPI_INT, RecvBuffer,
Scatter_DataSize, MPI_INT, Root, MPI_COMM_WORLD);
for (index = 0; index < Scatter_DataSize; ++index)
printf("MyRank = %d, RecvBuffer[%d] = %d \n", MyRank, index,
RecvBuffer[index]);
free(RecvBuffer);
if (MyRank == Root)
free(InputBuffer);
MPI_Finalize();
return 0;
}
This link has examples that have helped me:
http://www.cse.iitd.ernet.in/~dheerajb/MPI/Document/hos_cont.html
Hope this helps.
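For the 2D image in the original question, a likely cause is that each row was allocated with its own malloc, so the rows are not adjacent in memory and MPI_Scatter cannot treat the image as one buffer. Below is a minimal sketch of one workaround, assuming a row-wise split: allocate a single contiguous block and build row pointers into it. The names alloc_2d_contiguous and free_2d_contiguous are hypothetical helpers, not taken from the posts above.

```c
#include <stdlib.h>

/* Allocate rows x cols bytes as ONE contiguous block, plus an array of
 * row pointers into it, so image[i][j] still works.
 * Returns NULL on failure. (Hypothetical helper for illustration.) */
unsigned char **alloc_2d_contiguous(int rows, int cols)
{
    if (rows <= 0 || cols <= 0)
        return NULL;
    unsigned char *data = malloc((size_t)rows * cols);
    unsigned char **image = malloc((size_t)rows * sizeof *image);
    if (data == NULL || image == NULL) {
        free(data);
        free(image);
        return NULL;
    }
    /* Row i starts i * cols bytes into the single block. */
    for (int i = 0; i < rows; i++)
        image[i] = data + (size_t)i * cols;
    return image;
}

/* Release both the pixel block and the row-pointer array. */
void free_2d_contiguous(unsigned char **image)
{
    if (image != NULL) {
        free(image[0]); /* the contiguous pixel block */
        free(image);    /* the row-pointer array */
    }
}
```

With this layout the send buffer is image[0] (equivalently &image[0][0]), not image itself. If the number of rows is divisible by Numprocs, each rank would receive (rows / Numprocs) * cols elements of type MPI_UNSIGNED_CHAR.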

Related

MPI - scattering filepaths to processes

I have 4 filepaths in global_filetable and I am trying to scatter 2 filepaths to each process.
Process 0 has the proper 2 paths, but process 1 gets something strange (null)...
EDIT:
Here's the full code:
#include <stdio.h>
#include <stdlib.h> // malloc
#include <string.h> // strncpy
#include <limits.h> // PATH_MAX
#include <mpi.h>
int main(int argc, char *argv[])
{
char** global_filetable = (char**)malloc(4 * PATH_MAX * sizeof(char));
for(int i = 0; i < 4; ++i) {
global_filetable[i] = (char*)malloc(PATH_MAX *sizeof(char));
strncpy (global_filetable[i], "/path/", PATH_MAX);
}
/*for(int i = 0; i < 4; ++i) {
printf("%s\n", global_filetable[i]);
}*/
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
char** local_filetable = (char**)malloc(2 * PATH_MAX * sizeof(char));
MPI_Scatter(global_filetable, 2*PATH_MAX, MPI_CHAR, local_filetable, 2*PATH_MAX , MPI_CHAR, 0, MPI_COMM_WORLD);
{
/* now all processors print their local data: */
for (int p = 0; p < size; ++p) {
if (rank == p) {
printf("Local process on rank %d is:\n", rank);
for (int i = 0; i < 2; i++) {
printf("path: %s\n", local_filetable[i]);
}
}
MPI_Barrier(MPI_COMM_WORLD);
}
}
MPI_Finalize();
return 0;
}
Output:
Local process on rank 0 is:
path: /path/
path: /path/
Local process on rank 1 is:
path: (null)
path: (null)
Do you have any idea why I am having those nulls?
First, your allocation is inconsistent:
char** local_filetable = (char**)malloc(2 * PATH_MAX * sizeof(char));
The type char** indicates an array of char*, but you allocate a contiguous memory block, which would indicate a char*.
The easiest way would be to use the contiguous memory as char* for both global and local filetables. Depending on what get_filetable() actually does, you may have to convert. You can then index it like this:
char* entry = &filetable[i * PATH_MAX];
You can then simply scatter like this:
MPI_Scatter(global_filetable, 2 * PATH_MAX, MPI_CHAR,
local_filetable, 2 * PATH_MAX, MPI_CHAR, 0, MPI_COMM_WORLD);
Note that there is no more displacement, every rank just gets an equal sized chunk of the contiguous memory.
The next step would be to define a C and MPI struct encapsulating PATH_MAX characters so you can get rid of the constant usage of PATH_MAX and crude indexing.
I think this is much nicer (less complex, less memory management) than using actual char**. You would only need that if memory waste or redundant data transfer becomes an issue.
P.S. Make sure never to put more than PATH_MAX - 1 characters in a filetable entry, to keep space for the trailing \0.
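The flat layout described in this answer can be sketched as follows. This is only a minimal sketch, assuming fixed-size slots: path_table_get and path_table_set are hypothetical helper names, and slot_len stands in for PATH_MAX.

```c
#include <string.h>

/* Return a pointer to slot i of a flat table whose slots are slot_len
 * bytes each (slot_len would be PATH_MAX in the code above). */
char *path_table_get(char *table, size_t slot_len, size_t i)
{
    return &table[i * slot_len];
}

/* Copy path into slot i. Returns 0 on success, or -1 if the path does
 * not fit in slot_len - 1 characters (one byte is kept for the '\0'). */
int path_table_set(char *table, size_t slot_len, size_t i, const char *path)
{
    if (strlen(path) >= slot_len)
        return -1;
    /* Zero the whole slot so every byte that gets scattered is defined. */
    memset(path_table_get(table, slot_len, i), 0, slot_len);
    strcpy(path_table_get(table, slot_len, i), path);
    return 0;
}
```

A table of 4 entries would then be allocated as one block of 4 * PATH_MAX bytes and passed directly as the send buffer of the MPI_Scatter call shown above.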
Okay, I'm stupid.
char global_filetable[NUMBER_OF_STRINGS][PATH_MAX];
for(int i = 0; i < 4; ++i) {
strcpy (global_filetable[i], "/path/");
}
char local_filetable[2][PATH_MAX];
Now it works!

Inverting an image using MPI

I am trying to invert a PGM image using MPI. The grayscale (PGM) image should be loaded on the root processor and then be sent to each of the s^2 processors. Each processor will invert a block of the given image, and the inverted blocks will be gathered back on the root processor, which will assemble the blocks into the final image and write it to a PGM image. I ran the following code, but did not get any output. The image was read after running the code, but there was no indication of writing the resultant image. Could you please let me know what could be wrong with it?
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <time.h>
#include <string.h>
#include <math.h>
#include <memory.h>
#define max(x, y) ((x>y) ? (x):(y))
#define min(x, y) ((x<y) ? (x):(y))
int xdim;
int ydim;
int maxraw;
unsigned char *image;
void ReadPGM(FILE*);
void WritePGM(FILE*);
#define s 2
int main(int argc, char **argv) {
MPI_Init(&argc, &argv);
int p, rank;
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
const int NPROWS=s; /* number of rows in _decomposition_ */
const int NPCOLS=s; /* number of cols in _decomposition_ */
const int BLOCKROWS = xdim/NPROWS; /* number of rows in _block_ */
const int BLOCKCOLS = ydim/NPCOLS; /* number of cols in _block_ */
int i, j;
FILE *fp;
float BLimage[BLOCKROWS*BLOCKCOLS];
for (int ii=0; ii<BLOCKROWS*BLOCKCOLS; ii++)
BLimage[ii] = 0;
float BLfilteredMat[BLOCKROWS*BLOCKCOLS];
for (int ii=0; ii<BLOCKROWS*BLOCKCOLS; ii++)
BLfilteredMat[ii] = 0;
if (rank == 0) {
/* begin reading PGM.... */
ReadPGM(fp);
}
MPI_Datatype blocktype;
MPI_Datatype blocktype2;
MPI_Type_vector(BLOCKROWS, BLOCKCOLS, ydim, MPI_FLOAT, &blocktype2);
MPI_Type_create_resized( blocktype2, 0, sizeof(float), &blocktype);
MPI_Type_commit(&blocktype);
int disps[NPROWS*NPCOLS];
int counts[NPROWS*NPCOLS];
for (int ii=0; ii<NPROWS; ii++) {
for (int jj=0; jj<NPCOLS; jj++) {
disps[ii*NPCOLS+jj] = ii*ydim*BLOCKROWS+jj*BLOCKCOLS;
counts [ii*NPCOLS+jj] = 1;
}
}
MPI_Scatterv(image, counts, disps, blocktype, BLimage, BLOCKROWS*BLOCKCOLS, MPI_FLOAT, 0, MPI_COMM_WORLD);
//************** Invert the block **************//
for (int proc=0; proc<p; proc++) {
if (proc == rank) {
for (int j = 0; j < BLOCKCOLS; j++) {
for (int i = 0; i < BLOCKROWS; i++) {
BLfilteredMat[j*BLOCKROWS+i] = 255 - image[j*BLOCKROWS+i];
}
}
} // close if (proc == rank) {
MPI_Barrier(MPI_COMM_WORLD);
} // close for (int proc=0; proc<p; proc++) {
MPI_Gatherv(BLfilteredMat, BLOCKROWS*BLOCKCOLS,MPI_FLOAT, image, counts, disps,blocktype, 0, MPI_COMM_WORLD);
if (rank == 0) {
/* Begin writing PGM.... */
WritePGM(fp);
free(image);
}
MPI_Finalize();
return (1);
}
It is very likely MPI is not the right tool for the job. The reason for this is that your job is inherently bandwidth limited.
Think of it this way: You have a coloring book with images which you all want to color in.
Method 1: you take your time and color them in one by one.
Method 2: you copy each page to a new sheet of paper and mail it to a friend who then colors it in for you. He mails it back to you and in the end you glue all the pages you received from all of your friends together to make one colored-in book.
Note that method two involves copying the whole book, which is arguably the same amount of work needed to color in the whole book. So method two is less time-efficient without even considering the overhead of shoving the pages into an envelope, licking the stamp, going to the post office and waiting for the letter to be delivered.
If you look at your code, every transmitted byte is only touched once throughout the whole program in this line:
BLfilteredMat[j*BLOCKROWS+i] = 255 - image[j*BLOCKROWS+i];
A single processor is much faster at subtracting two integers than it is at sending an integer over the wire, so I would advise against using MPI for your particular problem.
My suggestion to solve your problem: try to avoid unnecessary communication whenever possible. Do all processes have access to the file system on which the files are located? If so, you could try reading them directly from the filesystem.
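If every process can read the file itself, the only thing left to coordinate is which rows each rank handles. Here is a minimal sketch of that split and of the inversion itself, assuming a row-wise decomposition. The names block_range and invert_pixels are hypothetical, and maxval corresponds to the maximum gray value from the PGM header (maxraw in the question).

```c
#include <stddef.h>

/* Split n rows over nprocs ranks as evenly as possible. The first
 * (n % nprocs) ranks get one extra row. Writes the index of the first
 * row and the number of rows owned by `rank`. */
void block_range(int n, int nprocs, int rank, int *first, int *count)
{
    int base = n / nprocs;
    int extra = n % nprocs;
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}

/* Invert npixels gray values in place: v -> maxval - v.
 * Assumes every pixel value is <= maxval. */
void invert_pixels(unsigned char *pixels, size_t npixels, unsigned char maxval)
{
    for (size_t i = 0; i < npixels; i++)
        pixels[i] = (unsigned char)(maxval - pixels[i]);
}
```

Each rank would call block_range with the image height, the number of processes and its own rank, read only its rows from the file, and invert them locally. Only the final assembly of the output would still need communication or a shared file.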

_CrtIsValidHeapPointer(pUserData), Debug Assertion Failed Visual C++ (MPI)

I am using the MPI_Reduce and MPI_Scatter functions to scatter an array of integers across "N" processors and print the partial and accumulated sums of the array. I am using Microsoft MPI (MS-MPI) with Visual Studio 2010, but each time I run the program it raises the exception "_CrtIsValidHeapPointer(pUserData)" under the title "Debug Assertion Failed". The code is as follows:
#include <mpi.h>
#include<iostream>
using namespace std;
int main(int argc, char *argv[]) {
int size;
int rank;
int partialsum=0;
int root =0;
int accum=0;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int *globaldata = NULL;
int *localdata = new int(4);
if (rank == root) {
globaldata = new int(size*4);
for (int i=0; i<(size*4); i++)
globaldata[i] = 2*i+1;
cout<<"Processor"<<rank<<" has global data: ";
for (int i=0; i<(size*4); i++)
cout<<globaldata[i]<<" ";
cout<<"\n";
}
MPI_Scatter(globaldata, 4, MPI_INT, localdata, 4, MPI_INT, root, MPI_COMM_WORLD);
cout<<"Processor "<<rank<<"has local data";
for(int i=0; i<4;i++)
cout<<" "<<localdata[i];
cout<<endl;
for(int k=0;k<4;k++)
partialsum += localdata[k];
cout<<"Processor "<<rank<<" Partial Sum = "<<partialsum<<"\n";
MPI_Reduce(&partialsum,&accum,1,MPI_INT,MPI_SUM, root,MPI_COMM_WORLD);
if (rank == 0) {
cout<<"Processor "<<rank<<" Accumulated Sum = "<<accum;
}
MPI_Finalize();
return 0;
}
The error is very simple and lies here:
globaldata = new int(size*4);
The syntax to allocate dynamic arrays with the new operator is new type[size]:
globaldata = new int[size*4];
In your case, space for a single int is allocated and initialised to size*4. The initialisation loop that immediately follows the allocation at the root then writes past the end of the allocated memory, destroying the heap structure. The same applies to localdata = new int(4), which should be new int[4].

How to fix Invalid arguments during creation of MPI derived Datatypes

I have a structure xyz as given below:
struct xyz { char a; int32_t b; char c[50]; uint32_t d; uchar e[10];}
I need to broadcast it, so I used MPI_Bcast(), which requires an MPI datatype corresponding to struct xyz. To create it I used the MPI_Type_create_struct() function, with the declaration
MPI_Datatype MPI_my_new_datatype, oldtypes[4];
where oldtypes holds the MPI datatypes corresponding to the structure members:
oldtypes[4] = {MPI_CHAR, MPI_INT, MPI_UNSIGNED, MPI_UNSIGNED_CHAR};
To create the new datatype I passed the following arguments to the function:
MPI_Type_create_struct(4,blockcounts, offsets, oldtypes, &MPI_my_new_datatype); MPI_Type_commit(&MPI_my_new_datatype);
It compiles, but gives the run-time error below:
* An error occurred in MPI_Type_create_struct on communicator MPI_COMM_WORLD MPI_ERR_ARG: invalid argument of some other kind MPI_ERRORS_ARE_FATAL (goodbye).
Can anyone find out where the problem is?
You can't "bundle up" the similar types like that. Each field needs to be addressed separately, and there are 5 of them, not 4.
Also note that, in general, it's a good idea to actually "measure" the offsets rather than infer them.
The following works:
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <stdint.h>
struct xyz_t {
char a; int32_t b; char c[50]; uint32_t d; unsigned char e[10];
};
int main(int argc, char **argv) {
int rank, size, ierr;
MPI_Datatype oldtypes[5] = {MPI_CHAR, MPI_INT, MPI_CHAR, MPI_UNSIGNED, MPI_UNSIGNED_CHAR};
int blockcounts[5] = {1, 1, 50, 1, 10};
MPI_Datatype my_mpi_struct;
MPI_Aint offsets[5];
struct xyz_t old, new;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
/* find offsets */
offsets[0] = (char*)&(old.a) - (char*)&old;
offsets[1] = (char*)&(old.b) - (char*)&old;
offsets[2] = (char*)&(old.c) - (char*)&old;
offsets[3] = (char*)&(old.d) - (char*)&old;
offsets[4] = (char*)&(old.e) - (char*)&old;
MPI_Type_create_struct(5, blockcounts, offsets, oldtypes, &my_mpi_struct);
MPI_Type_commit(&my_mpi_struct);
if (rank == 0) {
old.a = 'a';
old.b = (int)'b';
strcpy(old.c,"This is field c");
old.d = (unsigned int)'d';
strcpy((char*)old.e,"Field e");
MPI_Send(&old, 1, my_mpi_struct, 1, 1, MPI_COMM_WORLD);
} else if (rank == 1) {
MPI_Status status;
MPI_Recv(&new, 1, my_mpi_struct, 0, 1, MPI_COMM_WORLD, &status);
printf("new.a = %c\n", new.a);
printf("new.b = %d\n", new.b);
printf("new.e = %s\n", new.e);
}
MPI_Type_free(&my_mpi_struct);
MPI_Finalize();
return 0;
}
Running:
$ mpirun -np 2 ./struct
new.a = a
new.b = 98
new.e = Field e
Updated: As Dave Goodell below points out, the offset calculations would be better done as
#include <stddef.h>
/* ... */
offsets[0] = offsetof(struct xyz_t,a);
offsets[1] = offsetof(struct xyz_t,b);
offsets[2] = offsetof(struct xyz_t,c);
offsets[3] = offsetof(struct xyz_t,d);
offsets[4] = offsetof(struct xyz_t,e);
and if your MPI supports it (most should, though OpenMPI was slow with some of the MPI-2.2 types), MPI_UNSIGNED should be replaced with MPI_UINT32_T, and likewise MPI_INT with MPI_INT32_T for the int32_t field.
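To see why measuring the offsets matters, compare the sum of the field sizes with sizeof(struct xyz_t): the compiler is free to insert padding between fields, so hand-computed offsets can be wrong. Below is a small sketch using only standard C. The function names xyz_offsets and xyz_padding are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct xyz_t {
    char a; int32_t b; char c[50]; uint32_t d; unsigned char e[10];
};

/* Fill offsets[0..4] with the byte offset of each field, as measured
 * by the compiler rather than inferred by hand. */
void xyz_offsets(size_t offsets[5])
{
    offsets[0] = offsetof(struct xyz_t, a);
    offsets[1] = offsetof(struct xyz_t, b);
    offsets[2] = offsetof(struct xyz_t, c);
    offsets[3] = offsetof(struct xyz_t, d);
    offsets[4] = offsetof(struct xyz_t, e);
}

/* Bytes of padding the compiler added: struct size minus the sum of
 * the field sizes (1 + 4 + 50 + 4 + 10 = 69). */
size_t xyz_padding(void)
{
    size_t fields = sizeof(char) + sizeof(int32_t) + 50 * sizeof(char)
                  + sizeof(uint32_t) + 10 * sizeof(unsigned char);
    return sizeof(struct xyz_t) - fields;
}
```

On platforms where int32_t is 4-byte aligned, offsets[1] is typically 4 rather than 1, which is exactly the kind of gap that a hand-written offset table tends to miss.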

Single-Sided communications with MPI-2

Consider the following fragment of OpenMP code which transfers private data between two threads using an intermediate shared variable
#pragma omp parallel shared(x) private(a,b)
{
...
a = somefunction(b);
if (omp_get_thread_num() == 0) {
x = a;
}
}
#pragma omp parallel shared(x) private(a,b)
{
if (omp_get_thread_num() == 1) {
a = x;
}
b = anotherfunction(a);
...
}
I need to do the same thing (in pseudocode): transfer private data from one process to another using a single-sided message-passing library.
Any ideas?
This is possible, but there's a lot more "scaffolding" involved -- after all, you are communicating data between potentially completely different computers.
The coordination for this sort of thing is done between windows of data which are accessible from other processors, and with lock/unlock operations which coordinate the access of this data. The locks aren't really locks in the sense of being mutexes, but they are more like synchronization points coordinating data access to the window.
I don't have time right now to explain this in the detail I'd like, but below is an example of using MPI2 to do something like shared memory flagging in a system that doesn't have shared memory:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "mpi.h"
int main(int argc, char** argv)
{
int rank, size, *a, geta;
int x;
int ierr;
MPI_Win win;
const int RCVR=0;
const int SENDER=1;
ierr = MPI_Init(&argc, &argv);
ierr |= MPI_Comm_rank(MPI_COMM_WORLD, &rank);
ierr |= MPI_Comm_size(MPI_COMM_WORLD, &size);
if (ierr) {
fprintf(stderr,"Error initializing MPI library; failing.\n");
exit(-1);
}
if (rank == RCVR) {
MPI_Alloc_mem(sizeof(int), MPI_INFO_NULL, &a);
*a = 0;
} else {
a = NULL;
}
/* window size is given in bytes; only RCVR exposes memory */
MPI_Win_create(a, (rank == RCVR) ? sizeof(int) : 0, sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
if (rank == SENDER) {
/* Lock receiver's window */
MPI_Win_lock(MPI_LOCK_EXCLUSIVE, RCVR, 0, win);
x = 5;
/* put 1 int (from &x) to 1 int on rank RCVR, at displacement 0 in window "win" */
MPI_Put(&x, 1, MPI_INT, RCVR, 0, 1, MPI_INT, win);
/* Unlock */
MPI_Win_unlock(RCVR, win);
printf("%d: My job here is done.\n", rank);
}
if (rank == RCVR) {
for (;;) {
MPI_Win_lock(MPI_LOCK_EXCLUSIVE, RCVR, 0, win);
MPI_Get(&geta, 1, MPI_INT, RCVR, 0, 1, MPI_INT, win);
MPI_Win_unlock(RCVR, win);
if (geta == 0) {
printf("%d: a still zero; sleeping.\n",rank);
sleep(2);
} else
break;
}
printf("%d: a now %d!\n",rank,geta);
printf("a = %d\n", *a);
} /* close if (rank == RCVR) */
MPI_Win_free(&win);
if (rank == RCVR) MPI_Free_mem(a);
MPI_Finalize();
return 0;
}

Resources