What is the recommended indentation style for OpenMP clauses?

OpenMP pragmas and braces can appear together in code. Is there any coding-style convention for nested OpenMP pragmas?
e.g.:
#pragma omp parallel
for (int i = 0; i < N; i++) {
    code1();
    # pragma omp for // Should this line be indented?
    for (int j = 0; j < M; j++) {
        code2();
        # pragma omp critical {
            code3(); // Should this block and brackets be indented?
        }
    }
    code4();
}

From an OpenMP perspective there's no real guideline about how to indent the code.
The way I write the code would look like this:
#pragma omp parallel
for (int i = 0; i < N; i++) {
    code1();
#pragma omp for // Should this line be indented?
    for (int j = 0; j < M; j++) {
        code2();
#pragma omp critical
        { // this curly brace needs to go on its own line
            code3(); // Should this block and brackets be indented?
        }
    }
    code4();
}
So, the pragmas start in the first column and the base-language code follows whatever style you are using. The rationale is that if you deleted all the OpenMP pragmas you would still get "pretty" base-language code.
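To illustrate that rationale, here is the same snippet with the pragmas deleted; it still reads as normally indented code (the critical section simply becomes a plain scope):
for (int i = 0; i < N; i++) {
    code1();
    for (int j = 0; j < M; j++) {
        code2();
        { // the former critical section, now just a scope
            code3();
        }
    }
    code4();
}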
I also seem to recall that compiler pragmas once had to have their '#' in the first column. I'll leave it to others to correct my memory on this, as I'm not sure whether ISO C/C++ actually requires it. I haven't seen any compiler lately that would enforce it.

Related

while loop getting stuck - Openmp

I was trying to implement some piece of parallel code and tried to synchronize the threads using an array of flags as shown below
// flags array set to zero initially
#pragma omp parallel for num_threads(n_threads) schedule(static, 1)
for (int i = 0; i < n; i++) {
    for (int j = 0; j < i; j++) {
        while (!flag[j]); // spin until y[j] has been computed
        y[i] -= L[i][j] * y[j];
    }
    y[i] /= L[i][i];
    flag[i] = 1;
}
However, the code always gets stuck after a few iterations when I compile it with gcc -O3 -fopenmp <file_name>. I have tried different numbers of threads (2, 4, 8); all of them lead to the loop getting stuck. By putting print statements inside critical sections, I figured out that even though the value of flag[i] gets updated to 1, the while loop is still stuck, or maybe there is some other problem with the code that I am not aware of.
I also figured out that if I do something inside the while block, like printf("Hello\n"), the problem goes away. I think there is some problem with memory consistency across threads, but I do not know how to resolve it. Any help would be appreciated.
Edit: The single-threaded code I am trying to parallelise is
for (int i = 0; i < n; i++) {
    for (int j = 0; j < i; j++) {
        y[i] -= L[i][j] * y[j];
    }
    y[i] /= L[i][i];
}
You have a data race in your code, which is easy to fix, but the bigger problem is that you also have a loop-carried dependency: the result of your code depends on the order of execution. Try reversing the i loop without OpenMP and you will get a different result, so your code cannot be parallelized efficiently.
One possibility is to parallelize the j loop, but the workload inside this loop is very small, so the OpenMP overheads will be significantly bigger than the speed gain from parallelization; a sketch of what that would look like follows.
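For illustration only, a sketch of parallelizing the j loop with an OpenMP reduction. It is correct (each iteration only reads y[j] for j < i, which are already final), but as said above the overheads would very likely dominate for such a small per-iteration workload:
for (int i = 0; i < n; i++) {
    double sum = 0.0;
    // Accumulate the dot product of row i with the already-final y values
    #pragma omp parallel for reduction(+:sum)
    for (int j = 0; j < i; j++) {
        sum += L[i][j] * y[j];
    }
    y[i] = (y[i] - sum) / L[i][i];
}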
EDIT: In the case of your updated code I suggest forgetting about parallelization (because of the loop-carried dependency) and making sure that the inner loop is properly vectorized, so I suggest the following:
for (int i = 0; i < n; i++) {
    double sum_yi = y[i];
    #pragma GCC ivdep
    for (int j = 0; j < i; j++) {
        sum_yi -= L[i][j] * y[j];
    }
    y[i] = sum_yi / L[i][i];
}
#pragma GCC ivdep tells the compiler that there is no loop-carried dependency in the loop, so it can vectorize it safely. Do not forget to inform the compiler about the vectorization capabilities of your processor (e.g. use the -mavx2 flag if your processor is AVX2 capable).
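For example (assuming GCC on an AVX2-capable machine), the compile line might look like:
gcc -O3 -mavx2 file.c
(-fopenmp is only needed if other parts of the program still use OpenMP.)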

Why does my openMP 2.0 critical directive not flush?

I am currently attempting to parallelize a maximum-value search using OpenMP 2.0 and Visual Studio 2012. I feel this problem is so simple it could be used as a textbook example. However, I run into a race condition I do not understand.
The code passage in question is:
double globalMaxVal = std::numeric_limits<double>::min();
#pragma omp parallel for
for (int i = 0; i < numberOfLoops; i++)
{
    {/* ... */} // In this section I determine maxVal.
    // Besides reading values from two std::vector via the [] operator,
    // I do not access or manipulate any global variables.
    #pragma omp flush(globalMaxVal) // IF I COMMENT OUT THIS LINE I RUN INTO A RACE CONDITION
    #pragma omp critical
    if (maxVal > globalMaxVal)
    {
        globalMaxVal = maxVal;
    }
}
I do not grasp why it is necessary to flush globalMaxVal. The OpenMP 2.0 documentation states: "A flush directive without a variable-list is implied for the following directives: [...] At entry to and exit from critical [...]" Yet I get results diverging from the non-parallelized implementation if I leave out the flush directive.
I realize that the above code might not be the prettiest or most efficient way to solve my problem, but at the moment I want to understand why I am seeing this race condition.
Any help would be greatly appreciated!
EDIT:
Below I've added a minimal, complete and verifiable example requiring only OpenMP and the standard library. I've been able to reproduce the problem described above with this code.
For me, some runs yield globalMaxVal != 99 if I omit the flush directive. With the directive, it works just fine.
#include <algorithm>
#include <iostream>
#include <random>
#include <Windows.h>
#include <omp.h>

int main()
{
    // Repeat the parallelized code 20 times
    for (int r = 0; r < 20; r++)
    {
        int globalMaxVal = 0;
        #pragma omp parallel for
        for (int i = 0; i < 100; i++)
        {
            int maxVal = i;
            // Some dummy calculations to use computation time
            std::random_device rd;
            std::mt19937 generator(rd());
            std::uniform_real_distribution<double> particleDistribution(-1.0, 1.0);
            for (int j = 0; j < 1000000; j++)
                particleDistribution(generator);
            // The actual code bit again
            #pragma omp flush(globalMaxVal) // IF I COMMENT OUT THIS LINE I RUN INTO A RACE CONDITION
            #pragma omp critical
            if (maxVal > globalMaxVal)
            {
                globalMaxVal = maxVal;
            }
        }
        // Report the outcome - expected to be 99
        std::cout << "Run: " << r << ", globalMaxVal: " << globalMaxVal << std::endl;
    }
    system("pause");
    return 0;
}
EDIT 2:
After further testing, we've found that compiling the code in Visual Studio without optimization (/Od) or on Linux gives correct results, whereas the bug surfaces in Visual Studio 2012 (Microsoft C/C++ compiler version 17.00.61030) with optimization enabled (/O2).
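A pattern that sidesteps the per-iteration flush/critical traffic altogether (a minimal sketch based on the example above, not the asker's original code) is to track a per-thread maximum and update the shared variable only once per thread:
int globalMaxVal = 0;
#pragma omp parallel
{
    int localMax = 0; // private running maximum, one per thread
    #pragma omp for
    for (int i = 0; i < 100; i++)
    {
        int maxVal = i; // stands in for the real computation
        if (maxVal > localMax)
            localMax = maxVal;
    }
    #pragma omp critical // one shared update per thread
    if (localMax > globalMaxVal)
        globalMaxVal = localMax;
}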

OpenMP double for loop

I'd like to use OpenMP to apply multithreading.
Here is the simple code that I wrote.
vector<Vector3f> a;
int i, j;
for (i = 0; i < 10; i++)
{
    Vector3f b;
    #pragma omp parallel for private(j)
    for (j = 0; j < 3; j++)
    {
        b[j] = j;
    }
    a.push_back(b);
}
for (i = 0; i < 10; i++)
{
    cout << a[i] << endl;
}
I want to change it to work like:
parallel for1
{
    for2
}
or
for1
{
    parallel for2
}
The code works when the #pragma line is deleted, but it does not work when I use it. What's the problem?
///////// Added
Actually, I am using OpenMP in a more complicated, double-for-loop example. Here, too, it works well when I do not apply OpenMP. But when I apply it, the error occurs at the vector push_back line.
vector<Class> B;
for 1
{
    #pragma omp parallel for private(j)
    parallel for j
    {
        Class A;
        B.push_back(A); // error!!!!!!!
    }
}
If I remove the B.push_back(A) line, it works fine even with OpenMP applied.
I could not find the exact error message, but I guess it is some exception related to the vector. The debugger stops at
void _Reallocate(size_type _Count)
{   // move to array of exactly _Count elements
    pointer _Ptr = this->_Getal().allocate(_Count);
    _TRY_BEGIN
    _Umove(this->_Myfirst, this->_Mylast, _Ptr);
std::vector::push_back is not thread-safe; you cannot call it from multiple threads without any protection against race conditions.
Instead, prepare the vector so that its size is already correct and then write the elements via operator[].
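A minimal sketch of that approach (assuming the element count, here n, is known up front; Class and n are placeholders from the question):
vector<Class> B(n); // size fixed before the parallel region
#pragma omp parallel for
for (int j = 0; j < n; j++)
{
    Class A;
    // ... fill A ...
    B[j] = A; // each thread writes a distinct element, so no race
}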
Alternatively you can protect the insertion with a critical region:
#pragma omp critical
B.push_back(A);
This way only one thread at a time will do the insertion, which fixes the error but slows down the code (and note that the order of elements in B becomes nondeterministic).
In general I think you don't approach parallelization the right way, but there is no way to give better advice without a clearer and more representative problem description.

OpenMP: How to copy back value of firstprivate variable back to global

I am new to OpenMP and I am stuck on a basic operation. Here is sample code for my question.
#include <omp.h>

int main(void)
{
    int A[16] = {1,2,3,4,5 ...... 16};
    #pragma omp parallel for firstprivate(A)
    for (int i = 0; i < 4; i++)
    {
        for (int j = 0; j < 4; j++)
        {
            A[i*4+j] = Process(A[i*4+j]);
        }
    }
}
As is evident, the value of A is local to each thread. However, at the end I want to write the part of A calculated by each thread back to the corresponding position in the global variable A. How can this be accomplished?
Simply make A shared. This is fine because all loop iterations operate on separate elements of A. Remember that OpenMP is shared-memory programming.
You can do so explicitly by using shared instead of firstprivate, or simply remove the clause:
int A[16] = {1,2,3,4,5 ...... 16};
#pragma omp parallel for
for (int i = 0; i < 4; i++)
By default, all variables declared outside of the parallel region are shared. You can find an extended exemplary description in this answer.
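For completeness, a runnable version of the corrected example might look like this (Process is a hypothetical stand-in, since the question does not define it):
#include <omp.h>

int Process(int x) { return x * 2; } // hypothetical placeholder

int main(void)
{
    int A[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    // A is shared by default; every iteration writes a distinct
    // element, so no private copies and no write-back are needed.
    #pragma omp parallel for
    for (int i = 0; i < 4; i++)
    {
        for (int j = 0; j < 4; j++)
        {
            A[i*4+j] = Process(A[i*4+j]);
        }
    }
    return 0;
}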

openMP syntax error

I am using OpenMP to parallelize a for loop. However, the build fails with "unrecognized OpenMP #pragma", and the output shows it occurs on the for keyword.
I have already enabled OpenMP support in the language settings (Visual Studio). If I compile with other pragmas, e.g. single or critical, it seems to work fine, but the compiler can't recognize for.
#pragma omp parallel for
{
    for (int i = 0; i < method_cnt; i++)
    {
        //Perform calculation
    }
}
Your syntax is wrong. for and parallel for must be followed by a for loop, not by a block:
#pragma omp parallel for
{ // <-- wrong
    for (int i = 0; i < method_cnt; i++)
    {
        //Perform calculation
    }
} // <-- wrong
The correct syntax is:
#pragma omp parallel for
for (int i = 0; i < method_cnt; i++)
{
    //Perform calculation
}
