How portable is an if clause on a parallel directive?

This is related to How to disable OMP in a translation unit at the source file?. The patch I am working on contains the following code, and benchmarking results suggest we need the ability to turn OMP off per translation unit:
static const bool CRYPTOPP_RW_USE_OMP = true;
...
ModularArithmetic modp(m_p), modq(m_q);

#pragma omp parallel sections if(CRYPTOPP_RW_USE_OMP)
{
    #pragma omp section
    m_pre_2_9p = modp.Exponentiate(2, (9 * m_p - 11)/8);
    #pragma omp section
    m_pre_2_3q = modq.Exponentiate(2, (3 * m_q - 5)/8);
    #pragma omp section
    m_pre_q_p = modp.Exponentiate(m_q, m_p - 2);
}
The patch applies to a cross-platform library (Linux, Unix, Solaris, the BSDs, OS X and Windows) that supports many older compilers, so I need to ensure I don't break a compile.
Question: how portable is the #pragma omp parallel sections if(CRYPTOPP_RW_USE_OMP)? Will using it break compiles that used to work with just #pragma omp parallel sections?
I tried looking at past OpenMP specifications, like 2.0, but I can't see where it's allowed in the grammar (see Appendix C). The closest I could find is the parallel-directive production (line 22), which leads to parallel-clause (line 24) and then unique-parallel-clause.
And looking at documentation for platforms I can't test on, it's not clear to me whether it's available. For example, Microsoft's documentation for Visual Studio 2005 appears to allow it only on a loop.

In the very document you link, page 8, section 2.2 (parallel Construct): if is among the available clauses (the first one listed). It is part of the standard, so it is portable across all conforming compilers.
In your MSDN link:
if applies to the following directives:
parallel
for (OpenMP)
sections (OpenMP)
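For illustration, here is a minimal, self-contained sketch (not from the original thread) of the standard semantics: when the if expression evaluates to false, the parallel region is inactive and runs with a team of exactly one thread.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Inactive region: the if expression is false, so the block
       executes serially with a team of one thread. */
    #pragma omp parallel if(0)
    printf("threads: %d\n", omp_get_num_threads()); /* prints 1 */

    /* Active region: a normal team is created. */
    #pragma omp parallel if(1)
    {
        #pragma omp single
        printf("threads: %d\n", omp_get_num_threads()); /* team size */
    }
    return 0;
}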

Related

OpenMP atomic compare and swap

I have a shared variable s and a private variable p inside a parallel region.
How can I do the following atomically (or at least better than with a critical section):
if ( p > s )
    s = p;
else
    p = s;
I.e., I need to update the global maximum (if the local maximum is better) or read it if it was updated by another thread.
OpenMP 5.1 introduced the compare clause, which allows compare-and-swap (CAS) operations such as
#pragma omp atomic compare
if (s < p) s = p;
In combination with a capture clause, you should be able to achieve what you want:
int s_cap;
// here we capture the shared variable and also update it if p is larger
#pragma omp atomic compare capture
{
    s_cap = s;
    if (s < p) s = p;
}
// update p if the captured shared value is larger
if (s_cap > p) p = s_cap;
The only problem? The 5.1 spec is very new and, as of today (2020-11-27), none of the widespread compilers, i.e., those available on Godbolt, supports OpenMP 5.1. See here for a more or less up-to-date list. Adding compare is still listed as an unclaimed task on Clang's OpenMP page. GCC is still working on full OpenMP 5.0 support and the trunk build on Godbolt doesn't recognise compare. Intel's oneAPI compiler may or may not support it - it's not available on Godbolt and I can't get it to compile OpenMP code.
Your best bet for now is to use atomic capture combined with a compiler-specific CAS atomic, possibly in a loop.
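As a rough illustration of that suggestion, here is a minimal sketch using the GCC/Clang __atomic builtins (a compiler-specific assumption; the function name is hypothetical):

/* Update the shared maximum *s with the private value *p, or read
   the shared maximum back into *p, via a CAS loop.
   __atomic_compare_exchange_n is a GCC/Clang builtin. */
static void max_exchange(int *s, int *p)
{
    int cur = __atomic_load_n(s, __ATOMIC_RELAXED);
    while (cur < *p) {
        /* Try to install the larger private value; on failure, cur
           is refreshed with the value another thread wrote. */
        if (__atomic_compare_exchange_n(s, &cur, *p, 0,
                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED))
            return;  /* shared maximum updated, *p keeps its value */
    }
    *p = cur;        /* shared maximum already larger: read it */
}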

TASKING compiler disable optimisations

How can I disable optimisations with the TASKING compiler? I'm using the Eclipse IDE.
I've read in the documentation that I could use #pragma but didn't understand how.
If you specify a certain optimization, all code in the module is subject to that optimization. Within the C source file you can overrule the C compiler options for optimizations with #pragma optimize flag and #pragma endoptimize. Nesting is allowed:
#pragma optimize e   /* Enable expression simplification */
... C source ...
#pragma endoptimize
It seems the TASKING compiler is compatible with GCC with respect to optimization level flags, per this user guide (which is indeed quite old).
For disabling optimizations altogether, select None (-O0) as optimization level in the C/C++ project settings. Note that -O0 is the default optimization level of the Debug configuration.
[Screenshot: Eclipse Oxygen optimization level setting]
If you wish to disable optimizations for a specific part of your C/C++ code, such as a specific function, then the pragma comes in handy. To do so, place #pragma optimize 0 before the start of the code and #pragma endoptimize after the end of it.
For example:
#pragma optimize 0
void myfunc()
{
// function body
}
#pragma endoptimize

OpenMP to Pthreads IR file

What are the GCC command-line options for seeing the Pthreads calls generated for OpenMP directives? I know about the -fdump options for generating IR dumps in assembly, GIMPLE, RTL, and trees, but I am unable to get any Pthreads dumps for OpenMP directives.
GCC does not directly convert OpenMP pragmas into Pthreads code. Rather, it converts each OpenMP construct into a set of calls to the GNU OpenMP run-time library, libgomp. You can get the intermediate representation by compiling with -fdump-tree-all. Look for a file (or files) with the extension .ompexp.
Example:
#include <stdio.h>

int main() {
    int i;
    #pragma omp parallel for
    for (i = 0; i < 100; i++) {
        printf("asdf\n");
    }
}
The corresponding section of the .ompexp file that implements the parallel region:
<bb 2>:
__builtin_GOMP_parallel_start (main.omp_fn.0, 0B, 0);
main.omp_fn.0 (0B);
__builtin_GOMP_parallel_end ();
GCC implements parallel regions via code outlining; here, main.omp_fn.0 is the function that contains the body of the parallel region. In the function itself (omitted here for brevity), the for worksharing construct is implemented with some simple arithmetic that determines the range of iterations for the corresponding thread.
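For reference, a rough hand-written C approximation of what the outlined function does (this mirrors the logic of the .ompexp dump; it is not the exact GIMPLE):

#include <stdio.h>
#include <omp.h>

/* Approximation of main.omp_fn.0: each thread derives its own
   contiguous chunk of the 100 iterations from its team rank. */
static void main_omp_fn_0(void *data)
{
    (void)data;                                   /* unused shared-data struct */
    int nthreads = omp_get_num_threads();
    int tid = omp_get_thread_num();
    int chunk = (100 + nthreads - 1) / nthreads;  /* ceiling division */
    int start = tid * chunk;
    int end = start + chunk < 100 ? start + chunk : 100;
    for (int i = start; i < end; i++)
        printf("asdf\n");
}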

Parallelise list of independent instructions with OpenMP

I have a (long) list of independent instructions that can be executed in parallel. They are not in a loop; they are simply like this:
istr1;
istr2;
...
istrN;
How can I parallelise them using OpenMP? I know I could manually split them among some Pthreads, but I was wondering whether there's something more straightforward that can automatically adjust the number of threads to the number of CPUs, the way OpenMP does.
That's what OpenMP sections are for.
#pragma omp parallel sections
{
    #pragma omp section
    istr1;
    #pragma omp section
    istr2;
    ...
    #pragma omp section
    istrN;
}
Another option would be to use explicit tasks:
#pragma omp parallel
{
    #pragma omp single
    {
        #pragma omp task
        istr1;
        #pragma omp task
        istr2;
        ...
        #pragma omp task
        istrN;
    }
}
The tasks are created inside a single construct to prevent their creation from happening in all threads (otherwise each task would be created num_threads times). Using explicit tasks might also result in better performance, since most OpenMP runtimes utilise rather stupid logic when scheduling sections.
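For completeness, a compilable sketch of the task variant, with hypothetical work_a/work_b/work_c functions standing in for the independent instructions:

#include <stdio.h>
#include <omp.h>

/* Hypothetical placeholders for istr1 ... istrN. */
static void work_a(void) { printf("A on thread %d\n", omp_get_thread_num()); }
static void work_b(void) { printf("B on thread %d\n", omp_get_thread_num()); }
static void work_c(void) { printf("C on thread %d\n", omp_get_thread_num()); }

int main(void)
{
    #pragma omp parallel
    {
        /* One thread creates the tasks; all threads in the team
           pick them up and execute them. */
        #pragma omp single
        {
            #pragma omp task
            work_a();
            #pragma omp task
            work_b();
            #pragma omp task
            work_c();
        }
    }
    return 0;
}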

In OpenMP, how can I make every single core run a single thread?

I started using OpenMP 3 days ago. I want to know how to use #pragma to make every single core run a single thread. In more detail:
int ncores = omp_get_num_procs();
for (i = 0; i < ncores; i++) {
    ....
}
I want this for loop to be distributed across the cores I have, so which #pragma should I use?
Another thing: what do those #pragmas mean?
#pragma omp parallel
#pragma omp for
#pragma omp parallel for
I got a little confused with those #pragmas.
Thank you a lot :)
Thread Pinning
I want to know how to use #pragma to make every single core run a single thread.
Which OpenMP implementation do you use? The answer depends on that.
Pinning is not defined with pragmas; you will have to use environment variables. When using GCC, one can use an environment variable to pin threads to cores:
GOMP_CPU_AFFINITY="0-3" ./main
binds the first thread to the first core, the second thread to the second, and so on. See the libgomp documentation for more information (section 3, Environment Variables). I forgot how to do the same thing with PGI and other compilers, but you should be able to find the answer for those compilers using a popular search engine.
OpenMP Pragmas
There's no way to avoid reading documentation. See this link to an IBM website for example. I found the tutorial by Blaise Barney quite useful.
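That said, a minimal sketch of the difference between the three pragmas may help: omp parallel creates a team of threads that all execute the following block, omp for divides the iterations of a loop among the existing team, and omp parallel for combines the two.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* parallel: every thread in the team runs the whole block. */
    #pragma omp parallel
    printf("hello from thread %d\n", omp_get_thread_num());

    /* parallel + for: a team is created, then the loop iterations
       are divided among its threads. */
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 8; i++)
            printf("iteration %d on thread %d\n", i, omp_get_thread_num());
    }

    /* parallel for: shorthand for the combination above. */
    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
        printf("iteration %d on thread %d\n", i, omp_get_thread_num());

    return 0;
}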
To add to the previous answer, the equivalent environment variable in the Intel OpenMP library is KMP_AFFINITY. A similar usage to
GOMP_CPU_AFFINITY="0-3"
would be
KMP_AFFINITY="proclist=[0-3]"
Full details for the KMP_AFFINITY syntax and options are here:
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2009/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm
Newer OpenMP implementations (3.1+) make your life much easier. You can simply add the following line to your .bashrc or .bash_profile.
export OMP_PROC_BIND=true
