I wrote an MPI matrix multiplication program that uses scanf("%d", &size) to read the matrix size and then defines int matrix[size*size], but when I compiled it, the compiler reported that matrix is undeclared. Please tell me why, or what my problem is!
Following Ed's suggestion, I moved the matrix definitions into the if(myid == 0) block, but I got the same error! I'm posting my code now; please help me find where I went wrong. Thank you!
#include "mpi.h"
#include <stdio.h>
#include <math.h>
#include <time.h>
int size;
int main(int argc, char* argv[])
{
int myid, numprocs;
int *p;
MPI_Status status;
int i,j,k;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
if(myid == 0)
{
scanf("%d", &size);
int matrix1[size*size];
int matrix2[size*size];
int matrix3[size*size];
int section = size/numprocs;
int tail = size % numprocs;
srand((unsigned)time(NULL));
for( i=0; i<size; i++)
for( j=0; j<size; j++)
{
matrix1[i*size+j]=rand()%9;
matrix3[i*size+j]= 0;
matrix2[i*size+j]=rand()%9;
}
printf("Matrix1 is: \n");
for( i=0; i<size; i++)
{
for( j=0; j<size; j++)
{
printf("%3d", matrix1[i*size+j]);
}
printf("\n");
}
printf("\n");
printf("Matrix2 is: \n");
for( i=0; i<size; i++)
{
for( j=0; j<size; j++)
{
printf("%3d", matrix2[i*size+j]);
}
printf("\n");
}
//MPI_BCAST(matrix1, size*size, MPI_INT, 0, MPI_COMM_WORLD, );
for( i=1; i<numprocs; i++)
{
MPI_Send(&size, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
MPI_Send(&section, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
MPI_Send(&tail, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
MPI_Send(maxtrix2, size*size, MPI_INT, i, 0, MPI_COMM_WORLD);
}
j = 0;
for( i=1; i<numprocs-1; i++)
{
p = &matrix1[size*section*j++];
MPI_Send(p, size*section, MPI_INT, i, 1, MPI_COMM_WORLD);
}
p = &matrix1[size*section*j];
MPI_Send(p, size*section+size*tail, MPI_INT, numprocs-1, 1, MPI_COMM_WORLD);
p = matrix3;
for( i=1; i<numprocs-1; i++)
{
MPI_Recv(p, size*section, MPI_INT, i, 1, MPI_COMM_WORLD, &status);
p = &matrix3[size*section*i];
}
MPI_Recv(p, size*section+size*tail, MPI_INT, numprocs-1, 1, MPI_COMM_WORLD, &status);
printf("\n");
printf("Matrix3 is: \n");
for( i=0; i<size; i++)
{
for( j=0; j<size; j++)
{
printf("%2d ", matrix3[i*size+j]);
}
printf("\n");
}
}
else if (myid > 0 && myid<numprocs-1 )
{
MPI_Recv(&size, 1, MPI_INT, 0, 0,MPI_COMM_WORLD, &status);
MPI_Recv(&section, 1, MPI_INT, 0, 0,MPI_COMM_WORLD, &status);
MPI_Recv(&tail, 1, MPI_INT, 0, 0,MPI_COMM_WORLD, &status);
int matrix1[size*size];
int matrix2[size*size];
int matrix3[size*size];
MPI_Recv(matrix2, size*size, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
MPI_Recv(matrix1, size*section, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
for( i=0; i<section; i++)
for( j=0; j<size; j++)
for( k=0; k<size; k++)
{
matrix1[i*size+j] = matrix1[i*size+k]*matrix2[k*size+j];
}
MPI_Send(matrix1, size*section, MPI_INT, 0, 1, MPI_COMM_WORLD);
}
else if (myid > 0 && myid == numprocs-1)
{
MPI_Recv(&size, 1, MPI_INT, 0, 0,MPI_COMM_WORLD, &status);
MPI_Recv(&section, 1, MPI_INT, 0, 0,MPI_COMM_WORLD, &status);
MPI_Recv(&tail, 1, MPI_INT, 0, 0,MPI_COMM_WORLD, &status);
int matrix1[size*size];
int matrix2[size*size];
int matrix3[size*size];
MPI_Recv(matrix2, size*size, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
MPI_Recv(matrix1, size*section+size*tail, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
for( i=0; i<section+tail; i++)
for( j=0; j<size; j++)
for( k=0; k<size; k++)
{
matrix1[i*size+j] = matrix1[i*size+k]*matrix2[k*size+j];
}
MPI_Send(matrix1, size*section+size*tail, MPI_INT, 0, 1, MPI_COMM_WORLD);
}
return 0;
MPI_Finalize();
}
It may be that you are calling scanf() on one machine to set the size of the matrix, but since the matrix is declared on all the machines, the scanf() will not have been run on all of them.
If that is the case, you will have to scanf() the size of the matrix on the main process before you begin with the MPI functionality, and then send the size of the matrix (via COMM_WORLD.Bcast() or some other method) to each process in order for the matrix to be defined correctly.
Of course, this is just a guess because you've provided far too little information to make an informed answer, so I'm going for the most likely explanation.
EDIT
OK, here are some changes that will make it compile (some of them may already be in place, since your code came out a bit garbled when you pasted it in, and there may be others I've missed).
MPI_Send(maxtrix2, size*size, MPI_INT, i, 0, MPI_COMM_WORLD);
should be
MPI_Send(&matrix2, size*size, MPI_INT, i, 0, MPI_COMM_WORLD);
int section = size/numprocs;
int tail = size % numprocs;
These need to be declared before the first if statement in order to be visible further down, so just declare them right after the start of main without assigning them. (Otherwise they don't exist when your other processes try to use them.)
Sorry but I don't have time to figure out the code and actually get it to do what you want, but that should at least get you runnable code you can debug.
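For illustration, here is a minimal sketch of that approach (not a drop-in fix for your program; the variable names just follow your code): rank 0 reads the size, every rank learns it via MPI_Bcast, and only then does each rank allocate its buffers and compute section and tail.

    /* Minimal sketch: read the size on rank 0, broadcast it, then allocate. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid, numprocs, size = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

        if (myid == 0)
            scanf("%d", &size);                 /* only rank 0 reads the size */

        MPI_Bcast(&size, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* everyone gets it */

        int *matrix1 = malloc((size_t)size * size * sizeof *matrix1);
        int section  = size / numprocs;         /* now valid on every rank    */
        int tail     = size % numprocs;

        printf("rank %d: size=%d section=%d tail=%d\n", myid, size, section, tail);

        /* ... distribute rows, compute the local products, gather results ... */

        free(matrix1);
        MPI_Finalize();
        return 0;
    }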
The value of "size" is not known at compile time. Hence the error.
If you are new to coding, it may seem logical that you can read the value of size and then use it to allocate the matrix. That does in fact work in interpreted languages like Python, but your code is in C, and C programs need to be compiled to work. When the compiler looks at your code, it doesn't know the value of the variable "size", and yet the very next statement uses "size". You are attempting to use a variable whose value is not yet known, and that is what the compiler is complaining about.
Two ways to solve this:
1) Declare a sufficiently large matrix, say 1000 x 1000, and decide at run time how much of it to use, never asking for more than what you hard-coded in the source (here, 1000 x 1000). You are telling the compiler to reserve memory for 1000 x 1000 items whether or not you use all of it, so this wastes memory and is not an efficient way to do it.
2) Use dynamic allocation. However, given the nature of this question, this may be too advanced for you at the moment.
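For what it's worth, here is a tiny illustration of option 2, independent of MPI (the names are only for illustration): the size is read at run time and the buffer is requested from the heap with malloc, so the compiler never needs to know the size in advance.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int size;
        if (scanf("%d", &size) != 1 || size <= 0)
            return 1;

        int *matrix = malloc((size_t)size * size * sizeof *matrix);
        if (matrix == NULL)
            return 1;                       /* allocation can fail */

        matrix[0 * size + 0] = 42;          /* index as matrix[i*size + j] */
        printf("allocated %d x %d ints\n", size, size);

        free(matrix);
        return 0;
    }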
Related
I have the following implementation:
int main(int argc, char **argv)
{
int n_runs = 100; // Number of runs
int seed = 1;
int arraySize = 400;
/////////////////////////////////////////////////////////////////////
// initialise the random number generator using a fixed seed for reproducibility
srand(seed);
MPI_Init(nullptr, nullptr);
int rank, n_procs;
MPI_Comm_size(MPI_COMM_WORLD, &n_procs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// Initialise the probability step and results vectors.
// We have 21 probabilities between 0 and 1 (inclusive).
double prob_step = 0.05;
std::vector<double> avg_steps_over_p(21,0);
std::vector<double> trans_avg_steps_over_p(21,0);
std::vector<int> min_steps_over_p(21,0);
std::vector<int> trans_min_steps_over_p(21,0);
std::vector<int> max_steps_over_p(21,0);
std::vector<int> trans_max_steps_over_p(21,0);
std::vector<double> prob_reached_end(21,0);
std::vector<double> trans_prob_reached_end(21,0);
// Loop over probabilities and compute the number of steps before the model burns out,
// averaged over n_runs.
for (int i = rank; i < 21; i+=n_procs)
{
double prob = i*prob_step;
int min_steps = std::numeric_limits<int>::max();
int max_steps = 0;
for (int i_run = 0; i_run < n_runs; ++i_run)
{
Results result = forest_fire(arraySize, prob);
avg_steps_over_p[i] += result.stepCount;
if (result.fireReachedEnd) ++prob_reached_end[i];
if (result.stepCount < min_steps) min_steps = result.stepCount;
if (result.stepCount > max_steps) max_steps = result.stepCount;
}
avg_steps_over_p[i] /= n_runs;
min_steps_over_p[i] = min_steps;
max_steps_over_p[i] = max_steps;
prob_reached_end[i] = 1.0*prob_reached_end[i] / n_runs;
}
// Worker processes communicate their results to the master process.
if (rank > 0)
{
MPI_Send(&avg_steps_over_p[0], 21, MPI_DOUBLE, 0, rank, MPI_COMM_WORLD);
MPI_Send(&min_steps_over_p[0], 21, MPI_INT, 0, rank, MPI_COMM_WORLD);
MPI_Send(&max_steps_over_p[0], 21, MPI_INT, 0, rank, MPI_COMM_WORLD);
MPI_Send(&prob_reached_end[0], 21, MPI_DOUBLE, 0, rank, MPI_COMM_WORLD);
} else
{
for (int i = 1; i < n_procs; ++i)
{
MPI_Status status;
MPI_Recv(&trans_avg_steps_over_p[0], 21, MPI_DOUBLE, i, i, MPI_COMM_WORLD, &status);
for (int j = i; j < 21; j += n_procs) {
avg_steps_over_p[j] = trans_avg_steps_over_p[j];
}
MPI_Recv(&trans_min_steps_over_p[0], 21, MPI_INT, i, i, MPI_COMM_WORLD, &status);
for (int j = i; j < 21; j += n_procs) {
min_steps_over_p[j] = trans_min_steps_over_p[j];
}
MPI_Recv(&trans_max_steps_over_p[0], 21, MPI_INT, i, i, MPI_COMM_WORLD, &status);
for (int j = i; j < 21; j += n_procs) {
max_steps_over_p[j] = trans_max_steps_over_p[j];
}
MPI_Recv(&trans_prob_reached_end[0], 21, MPI_DOUBLE, i, i, MPI_COMM_WORLD, &status);
for (int j = i; j < 21; j += n_procs) {
prob_reached_end[j] = trans_prob_reached_end[j];
}
}
// Master process outputs the final result.
std::cout << "Probability, Avg. Steps, Min. Steps, Max Steps" << std::endl;
for (int i = 0; i < 21; ++i)
{
double prob = i * prob_step;
std::cout << prob << "," << avg_steps_over_p[i]
<< "," << min_steps_over_p[i] << ","
<< max_steps_over_p[i] << ","
<< prob_reached_end[i] << std::endl;
}
}
MPI_Finalize();
return 0;
}
I have tried the following parameters: scaling analysis
I'm new to parallelisation and HPC, so forgive me if I'm wrong, but I was expecting a speed-up ratio of greater than 3 when increasing the tasks per node and CPUs per task. I haven't yet tried all the possibilities, but I believe the behaviour here is odd, especially when keeping CPUs per task at 1 and increasing tasks per node from 2->3->4. I know it's not as simple as greater core usage = greater speed-up, but from what I've gathered these changes should give a speed-up.
Is there a possible inefficiency in my code that is leading to this, or is this expected behaviour? My full code is here, which includes the openMP parallelisation: https://www.codedump.xyz/cpp/Y5Rr68L8Mncmx1Sd.
Many thanks.
I don't know how many operations are in the forest_fire routine, but it had better be a couple of tens of thousands, otherwise you don't have enough work to overcome the parallelization overhead.
Rank 0 handles all the other processes sequentially. You should use MPI_Irecv, and I wonder whether a collective operation would not be preferable.
You are indexing with [i], which is a strided operation. That wastes space, as I pointed out in another question you posted. Every process should only allocate as much space as it needs.
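To illustrate the collective alternative, here is a sketch in plain C (illustrative names, not the poster's exact C++): in the posted code each of the 21 slots is filled by exactly one rank and left at zero everywhere else, so a single SUM reduction onto rank 0 reassembles the whole array instead of a loop of sends and receives.

    #include <string.h>
    #include <mpi.h>

    #define NPROB 21

    void collect_on_root(double avg_steps_over_p[NPROB], int rank)
    {
        double reduced[NPROB];

        /* Every rank contributes its partial array; rank 0 receives the sum. */
        MPI_Reduce(avg_steps_over_p, reduced, NPROB, MPI_DOUBLE,
                   MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)                      /* only the root holds the result */
            memcpy(avg_steps_over_p, reduced, sizeof reduced);
    }

The same call with MPI_INT covers the min/max step arrays; in the posted C++ code the send buffer would be avg_steps_over_p.data().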
I have a parallel code, but I don't understand whether it works correctly in parallel.
I have two vectors A and B whose elements are matrices defined with a proper class.
Since the matrices in the vectors are not of a primitive type, I can't send these vectors to the other ranks through MPI_Scatter, so I have to use MPI_Send and MPI_Recv. Also, rank 0 has only a coordination role: it sends the other ranks the blocks they should work with and collects the results at the end, but it does not participate in the computation.
The solution of the exercise is the following:
// rank 0 sends the blocks to the other ranks, which compute the local
// block products, then receive the partial results and prints the global
// vector
if (rank == 0)
{
// send data
for (unsigned j = 0; j < N_blocks; ++j)
{
int dest = j / local_N_blocks + 1;
// send number of rows
unsigned n = A[j].rows();
MPI_Send(&n, 1, MPI_UNSIGNED, dest, 1, MPI_COMM_WORLD);
// send blocks
MPI_Send(A[j].data(), n*n, MPI_DOUBLE, dest, 2, MPI_COMM_WORLD);
MPI_Send(B[j].data(), n*n, MPI_DOUBLE, dest, 3, MPI_COMM_WORLD);
}
// global vector
std::vector<dense_matrix> C(N_blocks);
for (unsigned j = 0; j < N_blocks; ++j)
{
int root = j / local_N_blocks + 1;
// receive number of rows
unsigned n;
MPI_Recv(&n, 1, MPI_UNSIGNED, root, 4, MPI_COMM_WORLD,
MPI_STATUS_IGNORE);
// initialize blocks
dense_matrix received(n,n);
// receive blocks
MPI_Recv(received.data(), n*n, MPI_DOUBLE, root, 5,
MPI_COMM_WORLD, MPI_STATUS_IGNORE);
// store block in the vector
C[j] = received;
}
// print result
print_matrix(C);
}
// all the other ranks receive the blocks and compute the local block
// products, then send the results to rank 0
}
else
{
// local vector
std::vector<dense_matrix> local_C(local_N_blocks);
// receive data and compute products
for (unsigned j = 0; j < local_N_blocks; ++j)
{
// receive number of rows
unsigned n;
MPI_Recv(&n, 1, MPI_UNSIGNED, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
// initialize blocks
dense_matrix local_A(n,n); dense_matrix local_B(n,n);
// receive blocks
MPI_Recv(local_A.data(), n*n, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Recv(local_B.data(), n*n, MPI_DOUBLE, 0, 3, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
// compute product
local_C[j] = local_A * local_B;
}
// send local results
for (unsigned j = 0; j < local_N_blocks; ++j)
{
// send number of rows
unsigned n = local_C[j].rows();
MPI_Send(&n, 1, MPI_UNSIGNED, 0, 4, MPI_COMM_WORLD);
// send block
MPI_Send(local_C[j].data(), n*n, MPI_DOUBLE, 0, 5, MPI_COMM_WORLD);
}
}
In my opinion, if local_N_blocks = N_blocks / (size - 1) is different from 1, the variable dest doesn't change value at every loop iteration. So, after the first iteration of the "sending loop", the second time that rank 0 reaches
MPI_Send(A[j].data(), n*n, MPI_DOUBLE, dest, 2, MPI_COMM_WORLD);
MPI_Send(B[j].data(), n*n, MPI_DOUBLE, dest, 3, MPI_COMM_WORLD);
it has to wait until the operation local_C[j] = local_A * local_B for the previous j has completed, so the code doesn't seem well parallelized to me.
What do you think?
I have a problem with three MPI_Bcast calls and one MPI_Scatter: my program doesn't work well, the MPI_Scatter doesn't work, and globalparcsr is not scattered between the nodes. When I delete the second and third MPI_Bcast, the MPI_Scatter works well. I want to broadcast a, globalindividual, and globalfitness, and then scatter globalparcsr. Part of my code is below:
int malloc2dint(int ***array, int n, int m) {
/* allocate the n*m contiguous items */
int *p = (int *)malloc(n*m * sizeof(int));
if (!p) return -1;
/* allocate the row pointers into the memory */
(*array) = (int **)malloc(n * sizeof(int*));
if (!(*array)) {
free(p);
return -1;
}
/* set up the pointers into the contiguous memory */
for (int i = 0; i<n; i++)
(*array)[i] = &(p[i*m]);
return 0;
}
int main(int argc, char *argv[]) {
int size, rank, divided_pop_size, sum = 0, root = 0, procgridsize, sum2 = 0,generation=0;
int **globalindividual, **localindividual;
int *globalfitness, *localfitness;
int *globalparcsr, *localparcsr;
int **recbuf;
int *sendcounts, *parsendcount; //specifying the number of elements to send to each processor
int *displs, *pardispls; //Entry i specifies the displacement
MPI_Status status;
int offset, rows;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
divided_pop_size = n_initial_pop / size;
if (rank == root)
{
malloc2dint(&globalindividual, n_initial_pop, num_vertices);
read_graph();
globalfitness = (int*)malloc(n_initial_pop * sizeof(int));
globalparcsr = (int*)malloc(n_initial_pop * sizeof(int));
globalindividual = initial_population(globalindividual, n_initial_pop);
for (int i = 0; i < n_initial_pop; i++) {
printf("\n");
for (int j = 0; j < num_vertices; j++)
printf("%d", globalindividual[i][j]);
}
}
for (int p = 0; p < size; p++) {
if (rank == p) {
malloc2dint(&localindividual,n_initial_pop + 2, num_vertices);
localindividual = initial_population(localindividual, divided_pop_size + 2);
}
}
MPI_Bcast(&a[0][0], 5000 * 5000, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&globalindividual[0][0], n_initial_pop*num_vertices, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&globalfitness, n_initial_pop, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Finalize();
return 0;
}
a is a 2D array, globalindividual is a 2D array with 12 rows and 8 columns, and globalfitness is a 1D array of size 12.
Please help me.
I'm having a problem with my code for matrix multiplication using both MPI and OpenMP. The code compiles correctly, but it gives me the wrong result: the values in matrix c (in the matmul function) are too big, and matrix C (in main) doesn't even get the results from matmul. If anyone knows how to fix it, please help.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>
#include <omp.h>
#include <mpi.h>
int offset,rows,br_elemenata,cvor_id,cvor,ukupno;
MPI_Status status;
double gettime(void) {
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec + 1e-6 * tv.tv_usec;
}
void matfill(long N, double *mat, double val) {
long i, j;
for(i = 0; i < N; i ++)
for(j = 0; j < N; j ++)
mat[i * N + j] = val;
}
void matmul(long N, double *a, double *b, double *c) {
long i, j, k;
br_elemenata = N / ukupno; //odredjujemo broj elemenata po cvoru
if (N % ukupno != 0) br_elemenata++; //inkrementujemo broj elemenata po cvoru kako ne bismo neki izostavili
if (cvor == 0){
for (cvor_id=1;cvor_id<ukupno;cvor_id++){
offset = cvor_id * br_elemenata;
rows = N - offset;
if (rows > br_elemenata)
rows = br_elemenata;
// slanje podataka sa cvora 0 na ostale cvorove
MPI_Send(&offset, 1, MPI_INT, cvor_id, 0, MPI_COMM_WORLD);
MPI_Send(&rows, 1, MPI_INT, cvor_id, 0, MPI_COMM_WORLD);
MPI_Send(a+offset, rows*N, MPI_DOUBLE, cvor_id, 0, MPI_COMM_WORLD);
MPI_Send(b, N*N, MPI_DOUBLE, cvor_id, 0, MPI_COMM_WORLD);
}
offset = 0;
rows = br_elemenata;
} else {
// Primanje podataka sa cvora 0
MPI_Recv(&offset, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
MPI_Recv(&rows, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
MPI_Recv(a+offset, rows*N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
MPI_Recv(b, N*N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
}
MPI_Barrier(MPI_COMM_WORLD);
#pragma omp parallel for shared(a,b,c) private(i,j,k)
for (i = offset; i < offset + rows; i ++)
for (j = 0; j < N; j ++)
for (k = 0; k < N; k ++)
c[i + j] += a[i + k] * b[k * N + j];
printf("Clan: %5.2f\n",c[i]);
if (cvor == 0) {
for (cvor_id = 1; cvor_id < ukupno; cvor_id++) {
MPI_Recv(&offset, 1, MPI_INT, cvor_id, 1, MPI_COMM_WORLD, &status);
MPI_Recv(&rows, 1, MPI_INT, cvor_id, 1, MPI_COMM_WORLD, &status);
MPI_Recv(c+offset, rows*N, MPI_DOUBLE, cvor_id, 1, MPI_COMM_WORLD, &status);
}
} else {
MPI_Send(&offset, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
MPI_Send(&rows, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
MPI_Send(c+offset, rows*N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
}
}
int main(int argc, char **argv) {
long N;
double *A, *B, *C, t;
MPI_Init(&argc,&argv); //Inicijalizacija MPI
MPI_Comm_size(MPI_COMM_WORLD,&ukupno); //odredjujemo ukupan broj cvorova
MPI_Comm_rank(MPI_COMM_WORLD,&cvor); //odredjuje redni broj cvora, nacin da se svaki cvor identifikuje u komunikaciji
if (argc!=2) {
if (cvor==0) printf("Morate unijeti dimenziju matrice!");
MPI_Finalize(); // ako ne postoji argument pri pozivu funkcije, zavrsiti program
return 1;
}
N = atoi(argv[1]);
A = (double *) malloc(N * N * sizeof(double));
B = (double *) malloc(N * N * sizeof(double));
C = (double *) malloc(N * N * sizeof(double));
matfill(N, A, 1.0);
matfill(N, B, 2.0);
matfill(N, C, 0.0);
t = gettime();
matmul(N, A, B, C);
t = gettime() - t;
// if (cvor == 0){
fprintf(stdout, "%ld\t%le\t%le\n", N, t, (2 * N - 1) * N * N / t);
fflush(stdout);
printf("Clan: %f\n",C[6]);
// }
free(A);
free(B);
free(C);
return EXIT_SUCCESS;
}
The main issue is the offset used in the communication operations: it should be offset*N.
Corrected code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>
#include <omp.h>
#include <mpi.h>
int offset,rows,br_elemenata,cvor_id,cvor,ukupno;
MPI_Status status;
double gettime(void) {
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec + 1e-6 * tv.tv_usec;
}
void matfill(long N, double *mat, double val) {
long i, j;
for(i = 0; i < N; i ++)
for(j = 0; j < N; j ++)
mat[i * N + j] = val;
}
void matprint(long N, double *mat) {
long i, j;
for(i = 0; i < N; i ++){
for(j = 0; j < N; j ++){
printf("%g ",mat[i*N+j]);
}
printf("\n");
}
}
void matdiag(long N, double *mat, double val) {
long i, j;
for(i = 0; i < N; i ++)
for(j = 0; j < N; j ++)
if(i==j){
mat[i * N + j] = (double)i;
}else{
mat[i * N + j] =0;
}
}
void matmul(long N, double *a, double *b, double *c) {
long i, j, k;
br_elemenata = N / ukupno; //odredjujemo broj elemenata po cvoru
if (N % ukupno != 0) br_elemenata++; //inkrementujemo broj elemenata po cvoru kako ne bismo neki izostavili
if (cvor == 0){
for (cvor_id=1;cvor_id<ukupno;cvor_id++){
offset = cvor_id * br_elemenata;
rows = N - offset;
if (rows > br_elemenata)
rows = br_elemenata;
// slanje podataka sa cvora 0 na ostale cvorove
MPI_Send(&offset, 1, MPI_INT, cvor_id, 0, MPI_COMM_WORLD);
MPI_Send(&rows, 1, MPI_INT, cvor_id, 1, MPI_COMM_WORLD);
MPI_Send(a+(offset*N), rows*N, MPI_DOUBLE, cvor_id, 2, MPI_COMM_WORLD);
MPI_Send(b, N*N, MPI_DOUBLE, cvor_id, 3, MPI_COMM_WORLD);
}
offset = 0;
rows = br_elemenata;
} else {
// Primanje podataka sa cvora 0
MPI_Recv(&offset, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
MPI_Recv(&rows, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
MPI_Recv(a+(offset*N), rows*N, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD, &status);
MPI_Recv(b, N*N, MPI_DOUBLE, 0, 3, MPI_COMM_WORLD, &status);
}
MPI_Barrier(MPI_COMM_WORLD);
#pragma omp parallel for shared(a,b,c) private(i,j,k)
for (i = offset; i < offset + rows; i ++)
for (j = 0; j < N; j ++)
for (k = 0; k < N; k ++)
c[i*N + j] += a[i*N + k] * b[k * N + j];
printf("Clan: %5.2f\n",c[i]);
if (cvor == 0) {
for (cvor_id = 1; cvor_id < ukupno; cvor_id++) {
MPI_Recv(&offset, 1, MPI_INT, cvor_id, 4, MPI_COMM_WORLD, &status);
MPI_Recv(&rows, 1, MPI_INT, cvor_id, 5, MPI_COMM_WORLD, &status);
MPI_Recv(c+(N*offset), rows*N, MPI_DOUBLE, cvor_id, 6, MPI_COMM_WORLD, &status);
}
} else {
MPI_Send(&offset, 1, MPI_INT, 0, 4, MPI_COMM_WORLD);
MPI_Send(&rows, 1, MPI_INT, 0, 5, MPI_COMM_WORLD);
MPI_Send(c+(N*offset), rows*N, MPI_DOUBLE, 0, 6, MPI_COMM_WORLD);
}
}
int main(int argc, char **argv) {
long N;
double *A, *B, *C, t;
MPI_Init(&argc,&argv); //Inicijalizacija MPI
MPI_Comm_size(MPI_COMM_WORLD,&ukupno); //odredjujemo ukupan broj cvorova
MPI_Comm_rank(MPI_COMM_WORLD,&cvor); //odredjuje redni broj cvora, nacin da se svaki cvor identifikuje u komunikaciji
if (argc!=2) {
if (cvor==0) printf("Morate unijeti dimenziju matrice!");
MPI_Finalize(); // ako ne postoji argument pri pozivu funkcije, zavrsiti program
return 1;
}
N = atoi(argv[1]);
A = (double *) malloc(N * N * sizeof(double));
B = (double *) malloc(N * N * sizeof(double));
C = (double *) malloc(N * N * sizeof(double));
matfill(N, A, 1.0);
matfill(N, B, 2.0);
matfill(N, C, 0.0);
matdiag(N,A, 1) ;
t = gettime();
matmul(N, A, B, C);
t = gettime() - t;
if (cvor == 0){
fprintf(stdout, "%ld\t%le\t%le\n", N, t, (2 * N - 1) * N * N / t);
fflush(stdout);
printf("Clan: %f\n",C[6]);
printf("A\n");
matprint(N, A) ;
printf("B\n");
matprint(N, B) ;
printf("C\n");
matprint(N, C) ;
}
free(A);
free(B);
free(C);
MPI_Finalize();
return EXIT_SUCCESS;
}
To compile: mpicc main.c -o main
To run: mpirun -np 4 main
If you wish to go further, you will be interested in the MPI_Bcast() function, which sends the same data to everyone. MPI_Scatter() and MPI_Gather() are helpful for distributing matrices or collecting them back on a given process.
Moreover, the dgemm() function of BLAS may be used to speed up the computation on a given process.
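As a hedged sketch of that suggestion, the call below uses the C BLAS interface (cblas_dgemm) with the row-major layout of the arrays above; the names a_loc, b and c_loc are illustrative. It overwrites the local block of rows rows of c with a_loc * b, which is what the triple loop computes when c starts zero-filled. Link with the BLAS of your choice (for example -lcblas or -lopenblas).

    #include <cblas.h>

    void local_block_dgemm(long N, long rows,
                           const double *a_loc, const double *b, double *c_loc)
    {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    (int)rows, (int)N, (int)N,   /* M, N, K       */
                    1.0, a_loc, (int)N,          /* alpha, A, lda */
                         b,     (int)N,          /* B, ldb        */
                    0.0, c_loc, (int)N);         /* beta, C, ldc  */
    }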
To reduce the memory footprint, the allocated sizes of A and C may be decreased to account for br_elemenata (except on process 0)... and the offsets will have to change... again!
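Putting those hints together, here is a hedged sketch of a collective version (illustrative, not a drop-in replacement for matmul()): B is broadcast, the rows of A are scattered so that every rank other than 0 only holds its own slice, and the rows of C are gathered back on rank 0. It assumes B is allocated with N*N doubles on every rank, as in the posted main(); A and C need their full size only on rank 0 and may be NULL elsewhere.

    #include <stdlib.h>
    #include <mpi.h>

    void matmul_scatter(long N, const double *A, double *B, double *C,
                        int cvor, int ukupno)
    {
        int *counts = malloc(ukupno * sizeof *counts);
        int *displs = malloc(ukupno * sizeof *displs);
        long base = N / ukupno, rem = N % ukupno, row0 = 0;

        for (int p = 0; p < ukupno; p++) {            /* rows -> element counts */
            long rows_p = base + (p < rem ? 1 : 0);
            counts[p] = (int)(rows_p * N);
            displs[p] = (int)(row0 * N);
            row0 += rows_p;
        }

        long my_rows = counts[cvor] / N;
        double *a_loc = malloc((size_t)my_rows * N * sizeof *a_loc);
        double *c_loc = calloc((size_t)my_rows * N, sizeof *c_loc);

        MPI_Bcast(B, (int)(N * N), MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Scatterv(A, counts, displs, MPI_DOUBLE,
                     a_loc, counts[cvor], MPI_DOUBLE, 0, MPI_COMM_WORLD);

        for (long i = 0; i < my_rows; i++)            /* local block product */
            for (long j = 0; j < N; j++)
                for (long k = 0; k < N; k++)
                    c_loc[i * N + j] += a_loc[i * N + k] * B[k * N + j];

        MPI_Gatherv(c_loc, counts[cvor], MPI_DOUBLE,
                    C, counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        free(a_loc); free(c_loc); free(counts); free(displs);
    }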
I've come to seek help with my issue.
The code below seems to return proper values for the root process, but incorrect values like -1.#IND00 for all the other processes. The barriers also don't seem to work: before I generate the arrays and broadcast them, some of the processes run straight past them.
The main idea is to compute different parts of the vector on different processes and then glue them back into one variable with MPI_Gather.
I have no idea where I have gone wrong.
I'll be grateful for any help given.
double *xNowe = calloc(n, sizeof(double));
double *vec = calloc(n/size, sizeof(double));
MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(A, n*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(b, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(x0, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
while(delta > granica)
{
ii++;
for(i = mystart; i < myend; i++)
{
vec[i - mystart] = b[i];
for(j = 0; j < n; j++)
{
if(i != j)
{
vec[i - mystart] -= A[i][j] * x0[j];
}
}
vec[i - mystart] = vec[i - mystart] / A[i][i];
if(rank > 0)
printf("\n%f", vec[i - mystart]);
}
printf("1: %d, 10: %d, 50: %d, 110: %d, 200: %d, 300: %d, 400: %d",xNowe[1],xNowe[10],xNowe[110],xNowe[200],xNowe[300],xNowe[400]);
MPI_Allgather(vec, n/size, MPI_DOUBLE, xNowe, n/size, MPI_DOUBLE, MPI_COMM_WORLD);
if(rank == 0)
{
delta = 0;
for(i = 0; i < n; i++)
{
delta = delta + ((xNowe[i] - x0[i] > 0) ? (xNowe[i] - x0[i]) : (-(xNowe[i] - x0[i])));
}
//x0 = xNowe; nie dzialalo
for(i = 0; i < n; i++)
{
x0[i] = xNowe[i];
}
}
MPI_Bcast(&delta, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(x0, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}
}
Update: the loop crashes at the 2nd iteration, with the values calculated at certain indexes of xNowe being:
1: 1204749721, 10: -1085549499, 50: -1034011523, 110: 1063725393, 200: -1769080107, 300: -1083408896, 400: -5847835510
1: 0, 10: -524288, 50: 0, 110: -524288, 200: 0, 300: -524288, 400: 0
MPI_Gather() gathers values on the root process. If you wish to gather values everywhere, you may use MPI_Allgather():
http://www.mcs.anl.gov/research/projects/mpi/www/www3/MPI_Allgather.html
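For concreteness, a small sketch of the contrast (the buffers are illustrative, not the poster's variables; both calls assume n is divisible by size, as the posted code already does):

    #include <mpi.h>

    void assemble(double *piece, double *full, int n, int size, int root)
    {
        int chunk = n / size;

        /* MPI_Gather: only rank `root` ends up with the assembled vector */
        MPI_Gather(piece, chunk, MPI_DOUBLE, full, chunk, MPI_DOUBLE,
                   root, MPI_COMM_WORLD);

        /* MPI_Allgather: every rank ends up with the assembled vector, so no
         * follow-up MPI_Bcast of x0 is needed before the next iteration */
        MPI_Allgather(piece, chunk, MPI_DOUBLE, full, chunk, MPI_DOUBLE,
                      MPI_COMM_WORLD);
    }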
Bye,
Francis