Incorrect simultaneous operation of OpenMP and MPIR in VS 2015

I'm trying to speed up a loop using OpenMP.
If I parallelize a loop that uses an integer variable, everything works correctly:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    //mpz_class i("0");
    //mpz_class k("1");
    //mpz_class l("1211728594799");
    int k = 9;
    int i = 0;
    int l = 1998899087;
    #pragma omp parallel for
    for (i = k; i <= l; i++) {
        if (i == 1998899085)
            printf("kkk");
    }
    system("pause");
    return 0;
}
If I instead use MPIR variables in the loop, I get errors when building the program in Visual Studio 2015. The error numbers are C3015, C3017, and C3019. Here is the code that causes these errors:
#include <stdio.h>
#include <stdlib.h>
#include <mpirxx.h>  // MPIR C++ interface

int main()
{
    mpz_class i("0");
    mpz_class k("1");
    mpz_class l("1211728594799");
    //int k = 9;
    //int i = 0;
    //int l = 1998899087;
    #pragma omp parallel for
    for (i = k; i <= l; i++) {   // C3015/C3017/C3019 are reported on this loop
        if (i == 1998899085)
            printf("kkk");
    }
    system("pause");
    return 0;
}
MPIR itself works correctly: if I disable #pragma omp parallel for, the code builds fine, but it runs much slower than with an int variable over the same range of numbers.
What should I do to make OpenMP work correctly with MPIR, so that I can speed up my program by running it in parallel?
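
For what it's worth, Visual Studio's OpenMP 2.0 implementation only accepts a signed integer loop counter in a parallel for, which is why the mpz_class loop above triggers C3015, C3017, and C3019. One workaround, sketched below rather than a verified fix, is to let a plain int drive the pragma and do the big-number work inside the loop body; mpirxx.h is assumed to be MPIR's C++ header, and the bound must fit in the counter type (larger ranges would have to be split into integer-sized chunks):

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <mpirxx.h>  // assumption: MPIR's C++ interface header

int main()
{
    const int k = 1;
    const int l = 1998899087;        // fits in a signed int
    #pragma omp parallel for
    for (int i = k; i <= l; i++) {   // plain int satisfies the canonical loop form
        mpz_class z(i);              // hypothetical per-iteration big-integer work
        if (z == 1998899085)
            printf("kkk");
    }
    system("pause");
    return 0;
}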

Related

Difference between mutual exclusion (e.g. atomic) and reduction in OpenMP

I am following Tim Mattson's video lectures on OpenMP, and one exercise was to find the errors in provided code that computes the area of the Mandelbrot set. Here is the solution that was provided:
#include <stdio.h>

#define NPOINTS 1000
#define MAXITER 1000

struct d_complex{
    double r;
    double i;
};

void testpoint(struct d_complex);

struct d_complex c;
int numoutside = 0;

int main(){
    int i, j;
    double area, error, eps = 1.0e-5;
    #pragma omp parallel for default(shared) private(c,j) firstprivate(eps)
    for (i = 0; i < NPOINTS; i++) {
        for (j = 0; j < NPOINTS; j++) {
            c.r = -2.0 + 2.5*(double)(i)/(double)(NPOINTS) + eps;
            c.i = 1.125*(double)(j)/(double)(NPOINTS) + eps;
            testpoint(c);
        }
    }
    area = 2.0*2.5*1.125*(double)(NPOINTS*NPOINTS - numoutside)/(double)(NPOINTS*NPOINTS);
    error = area/(double)NPOINTS;
    printf("Area of Mandelbrot set = %12.8f +/- %12.8f\n", area, error);
    printf("Correct answer should be around 1.510659\n");
}
void testpoint(struct d_complex c){
    // Does the iteration z = z*z + c until |z| > 2, when the point is known to be outside the set.
    // If the loop count reaches MAXITER, the point is considered to be inside the set.
    struct d_complex z;
    int iter;
    double temp;
    z = c;
    for (iter = 0; iter < MAXITER; iter++) {
        temp = (z.r*z.r) - (z.i*z.i) + c.r;
        z.i = z.r*z.i*2 + c.i;
        z.r = temp;
        if ((z.r*z.r + z.i*z.i) > 4.0) {
            #pragma omp atomic
            numoutside++;
            break;
        }
    }
}
The question I have is: could we use a reduction on numoutside in the #pragma omp parallel for instead, like
#pragma omp parallel for default(shared) private(c,j) firstprivate(eps) reduction(+:numoutside)
without the atomic construct in the testpoint function?
I tested the function without the atomic, and the result was different from the one I got originally. Why does that happen? And while I understand the concept of mutual exclusion and its use against race conditions, isn't reduction just another way of solving that problem with private variables?
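For context, here is a self-contained sketch of what the reduction variant would have to look like. Note the assumption baked into it: as far as I understand, a reduction clause only privatizes references inside the loop's lexical extent, so the increment is moved out of testpoint and into the loop body (testpoint is refactored here, hypothetically, into testpoint_outside, returning 1 when the point escapes):

#include <stdio.h>
#define NPOINTS 1000
#define MAXITER 1000

struct d_complex { double r, i; };

// Hypothetical refactoring: returns 1 if the point is outside the set,
// 0 otherwise, so the caller can do the counting itself.
static int testpoint_outside(struct d_complex c)
{
    struct d_complex z = c;
    for (int iter = 0; iter < MAXITER; iter++) {
        double temp = z.r*z.r - z.i*z.i + c.r;
        z.i = 2.0*z.r*z.i + c.i;
        z.r = temp;
        if (z.r*z.r + z.i*z.i > 4.0)
            return 1;
    }
    return 0;
}

int main(void)
{
    int numoutside = 0;
    double eps = 1.0e-5;
    // Each thread accumulates into a private copy of numoutside;
    // the copies are summed once at the end of the loop.
    #pragma omp parallel for reduction(+:numoutside)
    for (int i = 0; i < NPOINTS; i++) {
        for (int j = 0; j < NPOINTS; j++) {
            struct d_complex c;
            c.r = -2.0 + 2.5*(double)i/(double)NPOINTS + eps;
            c.i = 1.125*(double)j/(double)NPOINTS + eps;
            numoutside += testpoint_outside(c);
        }
    }
    double area = 2.0*2.5*1.125*(double)(NPOINTS*NPOINTS - numoutside)
                / (double)(NPOINTS*NPOINTS);
    printf("Area of Mandelbrot set = %12.8f\n", area);
    return 0;
}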
Thank you in advance.

Why does my OpenMP 2.0 critical directive not flush?

I am currently attempting to parallelize a maximum value search using OpenMP 2.0 and Visual Studio 2012. I feel like this problem is so simple, it could be used as a textbook example. However, I run into a race condition I do not understand.
The code passage in question is:
double globalMaxVal = std::numeric_limits<double>::min();
#pragma omp parallel for
for (int i = 0; i < numberOfLoops; i++)
{
    {/* ... */} // In this section I determine maxVal.
    // Besides reading values from two std::vector via the [] operator, I do not access or manipulate any global variables.
    #pragma omp flush(globalMaxVal) // IF I COMMENT OUT THIS LINE I RUN INTO A RACE CONDITION
    #pragma omp critical
    if (maxVal > globalMaxVal)
    {
        globalMaxVal = maxVal;
    }
}
I do not grasp why it is necessary to flush globalMaxVal. The OpenMP 2.0 documentation states: "A flush directive without a variable-list is implied for the following directives: [...] At entry to and exit from critical [...]" Yet I get results that diverge from the non-parallelized implementation if I leave out the flush directive.
I realize that the above code might not be the prettiest or most efficient way to solve my problem, but at the moment I want to understand why I am seeing this race condition.
Any help would be greatly appreciated!
EDIT:
I've now added a minimal, complete and verifiable example below, requiring only OpenMP and the standard library. I've been able to reproduce the problem described above with this code.
For me, some runs yield globalMaxVal != 99 if I omit the flush directive. With the directive, it works just fine.
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <random>
#include <Windows.h>
#include <omp.h>

int main()
{
    // Repeat the parallelized code 20 times
    for (int r = 0; r < 20; r++)
    {
        int globalMaxVal = 0;
        #pragma omp parallel for
        for (int i = 0; i < 100; i++)
        {
            int maxVal = i;
            // Some dummy calculations to use computation time
            std::random_device rd;
            std::mt19937 generator(rd());
            std::uniform_real_distribution<double> particleDistribution(-1.0, 1.0);
            for (int j = 0; j < 1000000; j++)
                particleDistribution(generator);
            // The actual code bit again
            #pragma omp flush(globalMaxVal) // IF I COMMENT OUT THIS LINE I RUN INTO A RACE CONDITION
            #pragma omp critical
            if (maxVal > globalMaxVal)
            {
                globalMaxVal = maxVal;
            }
        }
        // Report the outcome - expected to be 99
        std::cout << "Run: " << r << ", globalMaxVal: " << globalMaxVal << std::endl;
    }
    system("pause");
    return 0;
}
EDIT 2:
After further testing, we've found that compiling the code in Visual Studio without optimization (/Od), or on Linux, gives correct results, whereas the bug surfaces in Visual Studio 2012 (Microsoft C/C++ compiler version 17.00.61030) with optimization enabled (/O2).
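
As an aside, a pattern that sidesteps the per-iteration critical section (and with it, any doubt about flushing and any optimizer quirks around it) is to track a thread-local maximum and merge it into the global one exactly once per thread. A minimal sketch under OpenMP 2.0 semantics, which lacks a max reduction, hence the manual merge:

#include <iostream>

int main()
{
    int globalMaxVal = 0;
    #pragma omp parallel
    {
        int localMaxVal = 0;             // private running maximum per thread
        #pragma omp for nowait
        for (int i = 0; i < 100; i++)
        {
            if (i > localMaxVal)
                localMaxVal = i;         // no shared state touched in the loop
        }
        #pragma omp critical             // one merge per thread, not per iteration
        {
            if (localMaxVal > globalMaxVal)
                globalMaxVal = localMaxVal;
        }
    }
    std::cout << "globalMaxVal: " << globalMaxVal << std::endl;  // expected: 99
    return 0;
}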

GCC v.4 compilation error with #pragma omp task (variables with reference type are not permitted in private/firstprivate clauses)

I am porting a large MPI-based physics code to OpenMP tasking. On a Cray supercomputer the code compiles, links, and runs perfectly (using the cray-mpich library and the Cray compiler). The code was then moved to a Jenkins continuous-integration server (on which I don't have admin rights) that only has the GCC v.4 compiler (the Cray compiler can't be used, as it's not a Cray machine). On that server my code does not compile; there is an error:
... error: ‘pcls’ implicitly determined as ‘firstprivate’ has reference type
#pragma omp task
^
It's spaghetti code, so it's hard to copy-paste the lines that cause this error, but my guess is that it's the problem described here:
http://forum.openmp.org/forum/viewtopic.php?f=5&t=117
Is there any way to solve this issue? It seems this was resolved in GCC v.6, but I'm not sure... Has anyone run into this situation?
UPD:
Here is the skeleton of one function where one such error occurs (sorry for the long listing!):
void EMfields3D::sumMoments_vectorized(const Particles3Dcomm* part)
{
    grid_initialisation(...);
    #pragma omp parallel
    {
        for (int species_idx = 0; species_idx < ns; species_idx++)
        {
            const Particles3Dcomm& pcls = part[species_idx];
            assert_eq(pcls.get_particleType(), ParticleType::SoA);
            const int is = pcls.get_species_num();
            assert_eq(species_idx, is);

            double const*const x = pcls.getXall();
            double const*const y = pcls.getYall();
            double const*const z = pcls.getZall();
            double const*const u = pcls.getUall();
            double const*const v = pcls.getVall();
            double const*const w = pcls.getWall();
            double const*const q = pcls.getQall();
            const int nop = pcls.getNOP();

            #pragma omp master
            {
                start_timing_for_moments_accumulation(...);
            }
            ...
            #pragma omp for // because shared
            for (int i = 0; i < moments1dsize; i++)
                moments1d[i] = 0;

            // prevent threads from writing to the same location
            for (int cxmod2 = 0; cxmod2 < 2; cxmod2++)
            for (int cymod2 = 0; cymod2 < 2; cymod2++)
            // each mesh cell is handled by its own thread
            #pragma omp for collapse(2)
            for (int cx = cxmod2; cx < nxc; cx += 2)
            for (int cy = cymod2; cy < nyc; cy += 2)
            for (int cz = 0; cz < nzc; cz++)
            #pragma omp task
            {
                const int ix = cx + 1;
                const int iy = cy + 1;
                const int iz = cz + 1;
                {
                    // reference the 8 nodes to which we will
                    // write moment data for particles in this mesh cell.
                    //
                    arr1_double_fetch momentsArray[8];
                    arr2_double_fetch moments00 = moments[ix][iy];
                    arr2_double_fetch moments01 = moments[ix][cy];
                    arr2_double_fetch moments10 = moments[cx][iy];
                    arr2_double_fetch moments11 = moments[cx][cy];
                    momentsArray[0] = moments00[iz]; // moments000
                    momentsArray[1] = moments00[cz]; // moments001
                    momentsArray[2] = moments01[iz]; // moments010
                    momentsArray[3] = moments01[cz]; // moments011
                    momentsArray[4] = moments10[iz]; // moments100
                    momentsArray[5] = moments10[cz]; // moments101
                    momentsArray[6] = moments11[iz]; // moments110
                    momentsArray[7] = moments11[cz]; // moments111

                    const int numpcls_in_cell = pcls.get_numpcls_in_bucket(cx, cy, cz);
                    const int bucket_offset = pcls.get_bucket_offset(cx, cy, cz);
                    const int bucket_end = bucket_offset + numpcls_in_cell;
                    some_manipulation_with_moments_accumulation(...);
                }
            }
            #pragma omp master
            {
                end_timing_for_moments_accumulation(...);
            }

            // reduction
            #pragma omp master
            {
                start_timing_for_moments_reduction(...);
            }
            {
                #pragma omp for collapse(2)
                for (int i = 0; i < nxn; i++)
                {
                    for (int j = 0; j < nyn; j++)
                    {
                        for (int k = 0; k < nzn; k++)
                        #pragma omp task
                        {
                            rhons[is][i][j][k] = invVOL*moments[i][j][k][0];
                            Jxs  [is][i][j][k] = invVOL*moments[i][j][k][1];
                            Jys  [is][i][j][k] = invVOL*moments[i][j][k][2];
                            Jzs  [is][i][j][k] = invVOL*moments[i][j][k][3];
                            pXXsn[is][i][j][k] = invVOL*moments[i][j][k][4];
                            pXYsn[is][i][j][k] = invVOL*moments[i][j][k][5];
                            pXZsn[is][i][j][k] = invVOL*moments[i][j][k][6];
                            pYYsn[is][i][j][k] = invVOL*moments[i][j][k][7];
                            pYZsn[is][i][j][k] = invVOL*moments[i][j][k][8];
                            pZZsn[is][i][j][k] = invVOL*moments[i][j][k][9];
                        }
                    }
                }
            }
            #pragma omp master
            {
                end_timing_for_moments_reduction(...);
            }
        }
    }
    for (int i = 0; i < ns; i++)
    {
        communicateGhostP2G(i);
    }
}
Please don't try to find logic here (like why there is #pragma omp parallel and then a for-loop without #pragma omp for, or why there is a task construct inside a for-loop)... I did not write this code, but I have to port it to OpenMP tasking...
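
In case it helps anyone hitting the same error: one workaround (a sketch, not tested against this code base) is to take the address of the reference before the task and capture a plain pointer, since pointers are accepted in firstprivate even where GCC v.4 rejects reference types. The type name Particles below is hypothetical, standing in for Particles3Dcomm:

struct Particles {
    int nop;
    int getNOP() const { return nop; }
};

void process(const Particles* part, int idx, int* out)
{
    // Using the reference pcls directly inside the task would trigger
    // GCC v.4's "implicitly determined as 'firstprivate' has reference
    // type" error, so we capture a plain pointer instead.
    const Particles& pcls = part[idx];
    const Particles* pcls_p = &pcls;      // pointer stand-in for the reference
    #pragma omp task firstprivate(pcls_p)
    {
        *out = pcls_p->getNOP();          // use pcls_p-> wherever pcls. was used
    }
}

int main()
{
    Particles parts[2] = { {10}, {20} };
    int result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        process(parts, 1, &result);
    }   // the implicit barrier here waits for the task to finish
    return result == 20 ? 0 : 1;
}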

OpenMP: How to copy the value of a firstprivate variable back to a global

I am new to OpenMP and I am stuck on a basic operation. Here is a sample code for my question.
#include <omp.h>

int main(void)
{
    int A[16] = {1, 2, 3, 4, 5, ..., 16};
    #pragma omp parallel for firstprivate(A)
    for (int i = 0; i < 4; i++)
    {
        for (int j = 0; j < 4; j++)
        {
            A[i*4 + j] = Process(A[i*4 + j]);
        }
    }
}
As is evident, the value of A is local to each thread. However, at the end I want to write the part of A calculated by each thread back to the corresponding positions in the global variable A. How can this be accomplished?
Simply make A shared. This is fine because all loop iterations operate on separate elements of A. Remember that OpenMP is shared-memory programming.
You can do so explicitly by using shared instead of firstprivate, or simply remove the clause:
int A[16] = {1, 2, 3, 4, 5, ..., 16};
#pragma omp parallel for
for (int i = 0; i < 4; i++)
By default, all variables declared outside of the parallel region are shared. You can find an extended example and description in this answer.
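
A complete version of the shared approach might look like this; Process here is a placeholder for whatever transformation the original code applies:

#include <stdio.h>

static int Process(int x) { return x * 2; }  // placeholder computation

int main(void)
{
    int A[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
    // A is shared by default; every iteration writes a distinct element,
    // so no synchronization is needed and the results land directly in A.
    #pragma omp parallel for
    for (int i = 0; i < 4; i++)
    {
        for (int j = 0; j < 4; j++)
        {
            A[i*4 + j] = Process(A[i*4 + j]);
        }
    }
    for (int i = 0; i < 16; i++)
        printf("%d ", A[i]);
    printf("\n");
    return 0;
}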

Using "unsigned long long" as iteration-range in for-loop using OpenMP

If I do this, it works fine:
#pragma omp parallel for
for (int i = 1; i <= 200; i++) { ... }
This still works fine:
#pragma omp parallel for
for (unsigned long long i = 1; i <= 200; i++) { ... }
but this isn't working:
#pragma omp parallel for
for (unsigned long long i = 1; i <= LLONG_MAX; i++) { ... }
-> compiler error: invalid controlling predicate
LLONG_MAX comes from
#include <limits.h>
g++ --version -> g++ (tdm64-1) 5.1.0
OpenMP 3.0 is said to be able to handle unsigned integer types.
I searched a lot for this issue, without success. All the examples I found use int as the iteration variable.
Does someone know a solution?
I changed the program to:
unsigned long long n = ULLONG_MAX;
#pragma omp parallel for
for (unsigned long long i = 1; i < n; i++) { ... }
It seems to work now. Thank you, Jeff, for the hint.
Before that, I had tried:
for (auto i = 1; i < n; i++) { ... }
-> no error, but the loop didn't produce any output, which is very strange (presumably because auto i = 1 deduces int rather than unsigned long long).
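
For completeness, a minimal self-contained version of the pattern that now works; the point is the strict < against a variable bound, which OpenMP 3.0's support for unsigned loop counters accepts (the bound is kept small here so the example actually finishes):

#include <iostream>

int main()
{
    unsigned long long n = 200;       // small stand-in for ULLONG_MAX
    unsigned long long hits = 0;
    #pragma omp parallel for reduction(+:hits)
    for (unsigned long long i = 1; i < n; i++)  // strict <, variable bound
    {
        hits += i % 2;                // stand-in for real work
    }
    std::cout << "hits: " << hits << std::endl;
    return 0;
}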

Resources