MPI matrix multiplication compile error: matrix undeclared

I wrote an MPI matrix multiplication program that reads the matrix size with scanf("%d", &size) and then defines int matrix[size*size]. When I compile it, the compiler reports that matrix is undeclared. Please tell me why, or what my problem is!
Following Ed's suggestion, I moved the matrix definitions into the if(myid == 0) block, but I get the same error. I have now posted my code; please help me find where I made the mistake. Thank you!
int size;
int main(int argc, char* argv[]) {
int myid, numprocs;
int *p;
MPI_Status status;
int i,j,k;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
if(myid == 0)
{
scanf("%d", &size);
int matrix1[size*size];
int matrix2[size*size];
int matrix3[size*size];
int section = size/numprocs;
int tail = size % numprocs;
srand((unsigned)time(NULL));
for( i=0; i<size; i++)
for( j=0; j<size; j++)
{
matrix1[i*size+j]=rand()%9;
matrix3[i*size+j]= 0;
matrix2[i*size+j]=rand()%9;
}
printf("Matrix1 is: \n");
for( i=0; i<size; i++)
{
for( j=0; j<size; j++)
{
printf("%3d", matrix1[i*size+j]);
}
printf("\n");
}
printf("\n");
printf("Matrix2 is: \n");

Reformatted code would be nice...
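One problem is that you haven't declared the size variable. Another problem is that the [size] notation for declaring arrays is only reliable for sizes that are known at compile time (C99 variable-length arrays accept a run-time size, but they are not available in C89 or standard C++). Also note that arrays declared inside the if(myid == 0) block are visible only within that block. You want to use malloc() instead.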

You don't actually need to define a MAX_SIZE if you use dynamic memory allocation.
#include <stdio.h>
#include <stdlib.h>
...
scanf("%d", &size);
int *matrix1 = (int *) malloc(size*size*sizeof(int));
int *matrix2 = (int *) malloc(size*size*sizeof(int));
int *matrix3 = (int *) malloc(size*size*sizeof(int));
...
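For comparison, here is a minimal sketch of the same allocation written in C++ rather than the C of the question, assuming a std::vector is acceptable. The names Matrices and allocate_matrices are hypothetical and not part of the original program. The vectors are sized at run time and can be declared or returned outside the if(myid == 0) block, so they stay in scope for the code that follows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three size x size matrices, stored row-major.
struct Matrices {
    std::vector<int> m1, m2, m3;
};

// Allocates the matrices once the size is known at run time.
// Every element starts at zero and the memory is released automatically.
Matrices allocate_matrices(int size) {
    assert(size > 0);  // the caller is assumed to have validated the input
    const std::size_t n =
        static_cast<std::size_t>(size) * static_cast<std::size_t>(size);
    Matrices m;
    m.m1.assign(n, 0);
    m.m2.assign(n, 0);
    m.m3.assign(n, 0);
    return m;
}
```

An element is still addressed as m.m1[i*size+j], exactly as in the posted loops.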

Related

how to convert char ** to unique_ptr array?

The old code is as below:
char** wargv = new char*[argc];//memory leak!
for(int k = 0; k < argc; ++k)
{
wargv[k] = new char[strlen(argv[k]) + 1];
strncpy(wargv[k], argv[k], strlen(argv[k]));
wargv[k][strlen(argv[k])] = '\0';
}
Because this may leak memory, I want to convert wargv to a unique_ptr. How can I do that?
I know how to convert char* to unique_ptr; the code below works:
int size_t = 10;
std::unique_ptr<char[]> wargv(new char[size_t]{0});
strncpy(wargv.get(), "abcdef", size_t);
But I don't know how to convert char ** to unique_ptr. I tried vector, but it doesn't work.
As @Some programmer dude commented, std::vector<std::string> is a better choice than std::unique_ptr<> here, because it manages the memory allocation itself.
I wrote a simple example and it works well.
#include <iostream>
#include <string>
#include <vector>
int main(int argc, char** argv) {
std::vector<std::string> collection(argc);
for (auto i = 0; i < argc; i++) {
collection[i] = argv[i];
}
for (const auto& arg : collection) {
std::cout << arg << "\n";
}
}
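If some API really does need a mutable char** rather than strings, one possible sketch is to let unique_ptr<char[]> own each copy and keep a separate vector of raw pointers as the char** view. OwnedArgs and copy_args are hypothetical names introduced here for illustration; this is one way to do it under those assumptions, not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical holder: `storage` owns each copied string,
// `ptrs` exposes the same buffers as a null-terminated char** array.
struct OwnedArgs {
    std::vector<std::unique_ptr<char[]>> storage;
    std::vector<char*> ptrs;
    char** data() { return ptrs.data(); }
};

// Deep-copies argc strings from argv; nothing needs to be deleted by hand.
OwnedArgs copy_args(int argc, const char* const* argv) {
    assert(argc >= 0 && argv != nullptr);
    OwnedArgs out;
    for (int k = 0; k < argc; ++k) {
        const std::size_t len = std::strlen(argv[k]);
        std::unique_ptr<char[]> buf(new char[len + 1]);
        std::memcpy(buf.get(), argv[k], len + 1);  // copies the '\0' as well
        out.ptrs.push_back(buf.get());
        out.storage.push_back(std::move(buf));
    }
    out.ptrs.push_back(nullptr);  // argv-style terminator
    return out;
}
```

The raw pointers in ptrs stay valid for as long as the OwnedArgs object lives, because moving a unique_ptr does not move the buffer it owns.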

Matrix Multiplication OpenMP Counter-Intuitive Results

I am currently porting some code over to OpenMP at my place of work. One of the tasks I am doing is figuring out how to speed up matrix multiplication for one of our applications.
The matrices are stored in row-major format, so A[i*cols +j] gives the A_i_j element of the matrix A.
The code looks like this (uncommenting the pragma parallelises the code):
#include <omp.h>
#include <iostream>
#include <iomanip>
#include <stdio.h>
#define NUM_THREADS 8
#define size 500
#define num_iter 10
int main (int argc, char *argv[])
{
// omp_set_num_threads(NUM_THREADS);
int *A = new int [size*size];
int *B = new int [size*size];
int *C = new int [size*size];
for (int i=0; i<size; i++)
{
for (int j=0; j<size; j++)
{
A[i*size+j] = j*1;
B[i*size+j] = i*j+2;
C[i*size+j] = 0;
}
}
double total_time = 0;
double start = 0;
for (int t=0; t<num_iter; t++)
{
start = omp_get_wtime();
int i, k;
// #pragma omp parallel for num_threads(10) private(i, k) collapse(2) schedule(dynamic)
for (int j=0; j<size; j++)
{
for (i=0; i<size; i++)
{
for (k=0; k<size; k++)
{
C[i*size+j] += A[i*size+k] * B[k*size+j];
}
}
}
total_time += omp_get_wtime() - start;
}
std::cout << std::setprecision(5) << total_time/num_iter << std::endl;
delete[] A;
delete[] B;
delete[] C;
return 0;
}
What is confusing me is the following: why is dynamic scheduling faster than static scheduling for this task? Timing the runs and taking an average shows that static scheduling is slower, which to me is a bit counterintuitive since each thread is doing the same amount of work.
Also, am I correctly speeding up my matrix multiplication code?
Parallel matrix multiplication is non-trivial (have you even considered cache blocking?). Your best bet is probably to use a BLAS library for this rather than writing it yourself. (Remember, "The best code is the code I do not have to write.")
Wikipedia: Basic Linear Algebra Subprograms points to many implementations, a lot of which (including Intel Math Kernel Library) have free licenses.
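To illustrate what cache blocking means, here is a minimal serial sketch, assuming square row-major matrices as in the question. The function name matmul_blocked and the default tile size of 64 are assumptions chosen for illustration, not tuned values, and a BLAS routine would normally still be faster.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C += A * B for n x n row-major matrices, one tile x tile block at a time,
// so that the block of B being reused is more likely to stay in cache.
void matmul_blocked(const std::vector<int>& A, const std::vector<int>& B,
                    std::vector<int>& C, std::size_t n, std::size_t tile = 64) {
    assert(tile > 0);
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        for (std::size_t kk = 0; kk < n; kk += tile) {
            const std::size_t k_end = std::min(kk + tile, n);
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const int a = A[i * n + k];
                        // The innermost loop walks B and C along a row (unit stride).
                        for (std::size_t j = jj; j < j_end; ++j) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Different ii blocks write disjoint rows of C, so the outermost loop is a natural candidate for parallelisation; whether that beats the original loop order is something to measure rather than assume.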

Why does the left shift on an unsigned int happen from the 16th bit?

I am trying to put the values from a vector into an int.
Given the vector: '1','0','1','1','1','0','1','1','1','0','1','1','1','0','1','1'
Expected output (binary representation for the variable out):
00000000000000001011101110111011.
However, I am getting the following output:
10111011101110110000000000000000
Notice: the insertion began at the 16th bit from the right end instead of at the rightmost bit.
#include<vector>
#include<iostream>
int main() {
std::vector<unsigned char> test = {'1','0','1','1','1','0','1','1','1','0','1','1','1','0','1','1'};
std::vector<unsigned int> out(1);
int j = 0;
for (int i =0; i < test.size(); i++) {
out[j] = out[j] << 1;
if (test[i] == '1') {out[j] |=0x1;}
}
j++;
for (int p = 0; p < j; p++) {
for (int k = 0; k<32; k++ ) {
std::cout << !!((out[p]<<k)&0x8000);
}
std::cout << std::endl;
}
std::cout << "Size Of:" << sizeof(int);
return 0;
}
This happens because you are using the wrong constant for the mask: 0x8000 has its 16th bit set, while you probably meant to use 0x80000000, which has the 32nd bit set. To avoid mistakes like that, it's best to construct masks with shifts, for example
(1u << 31)
This expression is evaluated at compile time, so the result is the same as if you computed the constant yourself.
Note that both 0x8000 and 0x80000000 constants are system-dependent. Moreover, 0x80000000 assumes 32-bit int, which is not guaranteed.
A better approach would be shifting the number right instead of left, and masking with 1.
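A minimal sketch of that right-shift approach follows. The helper names from_bits and to_binary are hypothetical; the width is taken from std::numeric_limits instead of being hard-coded to 32, so no size-dependent mask constant is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>

// Number of value bits in unsigned int (32 on common platforms, not guaranteed).
const std::size_t kWidth = std::numeric_limits<unsigned int>::digits;

// Packs a string of '0'/'1' characters into an unsigned int;
// the first character ends up as the most significant of the packed bits.
unsigned int from_bits(const std::string& bits) {
    assert(bits.size() <= kWidth);
    unsigned int value = 0;
    for (char c : bits) {
        assert(c == '0' || c == '1');
        value = (value << 1) | (c == '1' ? 1u : 0u);
    }
    return value;
}

// Renders all bits of value, most significant first,
// by shifting right and masking with 1.
std::string to_binary(unsigned int value) {
    std::string out(kWidth, '0');
    for (std::size_t pos = kWidth; pos-- > 0;) {
        if (value & 1u) out[pos] = '1';
        value >>= 1;
    }
    return out;
}
```

Calling to_binary(from_bits("1011101110111011")) yields the input pattern in the low 16 positions, padded on the left with zeros.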
The block of code creating out[j] works just fine.
Your problem is in the output block, due to the use of 0x8000. Whenever k >= 16, the low 16 bits of the shifted value are zero, so masking with 0x8000 always yields zero.
Your code seems overly complicated to me. Here's my version of a C program that transforms a string of 1s and 0s into an int, and one going from int to string.
#include <stdlib.h>
#include <stdio.h>
int main (int argc, char **argv) {
    char str [] = "1010101010101010";
    int x;
    int out = 0; /* must start at zero before bits are OR-ed in */
    for (x=0;x<16;x++) {
        if (str[x] == '1') {
            out |= (1 << x); /* str[x] sets bit x, so str[0] is the least significant bit */
        }
    }
    printf("%d", out) ;
    return 0;
}
and
#include <stdlib.h>
#include <stdio.h>
int main (int argc, char **argv) {
    int in = 21845;
    int x;
    char out[17] = {0}; /* 16 digits plus the terminating '\0' */
    for (x=0;x<16;x++) {
        if (in & (1<<x)) {
            out[x] = '1'; /* out[0] holds the least significant bit */
        }
        else {
            out[x] = '0';
        }
    }
    printf("%s", out) ;
    return 0;
}

_CrtIsValidHeapPointer(pUserData), Debug Assertion Failed in Visual C++ (MPI)

I am using the MPI_Reduce and MPI_Scatter functions to scatter an array of integers across N processes and to print the partial and accumulated sums of the array. I am using Microsoft MPI (MS-MPI) with Visual Studio 2010, but every run fails with the exception "_CrtIsValidHeapPointer(pUserData)" under the title "Debug Assertion Failed". The code is as follows:
#include <mpi.h>
#include<iostream>
using namespace std;
int main(int argc, char *argv[]) {
int size;
int rank;
int partialsum=0;
int root =0;
int accum=0;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int *globaldata = NULL;
int *localdata = new int(4);
if (rank == root) {
globaldata = new int(size*4);
for (int i=0; i<(size*4); i++)
globaldata[i] = 2*i+1;
cout<<"Processor"<<rank<<" has global data: ";
for (int i=0; i<(size*4); i++)
cout<<globaldata[i]<<" ";
cout<<"\n";
}
MPI_Scatter(globaldata, 4, MPI_INT, localdata, 4, MPI_INT, root, MPI_COMM_WORLD);
cout<<"Processor "<<rank<<"has local data";
for(int i=0; i<4;i++)
cout<<" "<<localdata[i];
cout<<endl;
for(int k=0;k<4;k++)
partialsum += localdata[k];
cout<<"Processor "<<rank<<" Partial Sum = "<<partialsum<<"\n";
MPI_Reduce(&partialsum,&accum,1,MPI_INT,MPI_SUM, root,MPI_COMM_WORLD);
if (rank == 0) {
cout<<"Processor "<<rank<<" Accumulated Sum = "<<accum;
}
MPI_Finalize();
return 0;
}
The error is very simple and lies here:
globaldata = new int(size*4);
The syntax to allocate dynamic arrays with the new operator is new type[size]:
globaldata = new int[size*4];
In your case, space for a single int is allocated and initialised to size*4. The initialisation code that immediately follows the allocation at the root then writes past the end of the allocated memory, destroying the heap structure. The same mistake appears in localdata = new int(4), which should be new int[4].
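One way to make this class of mistake harder to write is to let std::vector own the buffers. The sketch below assumes that is acceptable for the program; make_global_data is a hypothetical helper that reproduces only the root's initialisation loop, without any MPI calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the root's send buffer: num_procs * per_proc odd numbers 1, 3, 5, ...
// (the same values as the globaldata[i] = 2*i+1 loop in the question).
std::vector<int> make_global_data(int num_procs, int per_proc) {
    assert(num_procs > 0 && per_proc > 0);
    const std::size_t n = static_cast<std::size_t>(num_procs) *
                          static_cast<std::size_t>(per_proc);
    std::vector<int> data(n);  // n elements, unlike new int(n), which makes one int
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<int>(2 * i + 1);
    }
    return data;
}
```

The contiguous storage returned by data() on such a vector could then serve as the send buffer for MPI_Scatter, with a second std::vector<int> of 4 elements as the receive buffer.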

how to reduce page faults in this program?

I'm getting more than 1000 page faults in this program.
Can I reduce them to some smaller value, or even to zero?
Are there any other changes that would speed up the execution?
#include <stdio.h>
#include<stdlib.h>
int main(int argc, char* argv[])
{
register unsigned int u, v,i;
register unsigned int arr_size=0;
register unsigned int b_size=0;
register unsigned int c;
register unsigned int *b;
FILE *file;
register unsigned int *arr;
file=fopen(argv[1],"r");
arr=(unsigned int *)malloc(4*10000000);
while(!feof(file)){
++arr_size;
fscanf(file,"%u\n",&arr[arr_size-1]);
}
fclose(file);
b=(unsigned int *)malloc(arr_size*4);
if (arr_size!=0)
{
++b_size;
b[b_size-1]=0;
for (i = 1; i < arr_size; ++i)
{
if (arr[b[b_size-1]] < arr[i])
{
++b_size;
b[b_size-1]=i;
continue;
}
for (u = 0, v = b_size-1; u < v;)
{
c = (u + v) / 2;
if (arr[b[c]] < arr[i]) u=c+1; else v=c;
}
if (arr[i] < arr[b[u]])
{
b[u] = i;
}
if(i>arr_size)break;
}
}
free(arr);
free(b);
printf("%u\n", b_size);
return 0;
}
The line:
arr=(unsigned int *)malloc(4*10000000);
is not good programming style. Are you sure that your file is as big as 40 MB? Try not to allocate all the memory up front in the first lines of your program.
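As a rough sketch of the alternative, written in C++ rather than the C of the question, the values can be read into a buffer that grows only as far as the input requires. The names read_values and read_values_from_text and the max_values cap are assumptions made for illustration; the cap plays the role of the fixed 10000000-element limit in the original malloc call.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated unsigned integers until the input ends,
// a token fails to parse, or max_values elements have been stored.
std::vector<unsigned int> read_values(std::istream& in, std::size_t max_values) {
    assert(max_values > 0);
    std::vector<unsigned int> values;
    unsigned int v;
    // Testing the extraction itself avoids counting one extra element at end of file.
    while (values.size() < max_values && (in >> v)) {
        values.push_back(v);
    }
    return values;
}

// Convenience wrapper for quick experiments without a file.
std::vector<unsigned int> read_values_from_text(const std::string& text,
                                                std::size_t max_values) {
    std::istringstream in(text);
    return read_values(in, max_values);
}
```

With a real file, the same read_values function can be passed a std::ifstream opened on argv[1]. Whether this changes the page-fault count noticeably depends on the input size and the allocator, so it is worth measuring.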

Resources