Reason to use declare target pragma in OpenMP

I wonder what the reason is to use the declare target directive. I can simply use target {, data} map(to/from/tofrom: ...) to specify which variables should be used by the device. As for functions, is it compulsory for a function called from a target region to be declared as target? Suppose I have the following code:
int data[N];
#pragma omp target
{
#pragma omp parallel for
for (int i=0; i<N; i++)
data[i] = my_function(i);
}
Is it required to surround my_function() declaration/definition with declare target?

In your example the data[N] array will be mapped to the device at the beginning of every target region, and unmapped at the end. In programs with multiple target regions it may be useful to map data[N] only once, at startup, using the declare target directive.
As for the functions, the OpenMP 4.0 specification is quite unclear about this. It says only:
The declare target directive specifies that variables, functions (C, C++ and Fortran), and subroutines (Fortran) are mapped to a device.
So it doesn't clearly prohibit calls to non-target functions from target regions or from other target functions.
But I personally think that my_function must be declared as target. Otherwise, why would this pragma have been introduced for functions at all?

Value/Purpose of __attribute((const)) in gcc c++

Is there any value in using __attribute((const)) in gcc for c++ programs when declaring functions or static members that the compiler can see do not access global memory?
For example,
int Add( int x , int y ) __attribute__((const));

int Add( int x , int y )
{
return x+y;
}
The compiler knows that this function is limited in its scope of memory access. Does the attribute add anything? If so, what?
Thanks,
Josh
__attribute__((const)) in GNU C expresses the intent of the author of the function to not depend on any value other than its input arguments.
This allows the compiler to optimize multiple calls with identical arguments to such a function into a single call without having to analyze the function body. This is especially useful if the function's body is in another translation unit.
In the case of int Add( int x , int y ) __attribute__((const)), multiple calls to, say Add(2,3), could be coalesced into a single call and the return value could be cached, without knowing what Add actually does.
It also allows the compiler to verify that the function actually adheres to the declared intent.
Refer to this LWN article for more details and an example.

Pointers and arrays in an OpenMP depend list

I have got something like head.h:
struct mystruct {
double * a;
double * t_a;
};
typedef struct mystruct pm_t;
and my OpenMP task code mycode.c
int foo(pm_t* t_lb){
#pragma omp task default(none) shared(t_lb, BLOCK) private(i) \
firstprivate(baseIndex) depend (in: t_lb->a, t_lb->t_a)
{
...
Compiling with Intel 17 I get:
error: invalid entity for this variable list in omp clause
firstprivate(baseIndex) depend (in: t_lb->a,t_lb->t_a)
^
I know that OpenMP does not deal with pointers in the depend syntax, but I have also tried with
firstprivate(baseIndex) depend (in: t_lb->a[:1], t_lb->t_a)
with no success. Does anybody see something wrong with this?
Apparently, this should be an error according to the OpenMP specifications:
"A variable that is part of another variable (such as an element of a structure) but is not an array element or an array section cannot appear in a depend clause." (Version 4.5, page 171, line 18)
However, this restriction is planned to be dropped in Version 5.0, and the Cray compiler has already dropped it internally. So this will fail with GCC and Intel but will work with the Cray compiler.

How to define a constexpr variable

I want to use a simple compile time constant for example like this:
double foo(double x) { return x + kConstDouble; }
Now I see at least the following ways to define that constant
namespace { static constexpr double kConstDouble = 5.0; }
namespace { constexpr double kConstDouble = 5.0; }
static constexpr double kConstDouble = 5.0;
constexpr double kConstDouble = 5.0;
Which is the right way to go? Is there a difference when kConstDouble is defined in a header vs a source file?
Using static or an anonymous namespace gives the variable internal linkage; it will only be visible within the same translation unit. So if you use one of these within a .cpp file, you won't be able to use the variable anywhere else. That is typically what you want when the constant is an implementation detail of that unit of code. If you want to expose it to other translation units, you'll need to put it in a header file. The typical way to do that would be to declare it static (or put it in an anonymous namespace), since it is a trivial, constant variable. The other approach would be to declare it extern in the header and define it in a .cpp file to get a truly global variable (as opposed to one where every translation unit has its own copy).
As for static versus an anonymous namespace: you don't need both, first of all; for variables they do the same thing. But I think it is more idiomatic these days to use anonymous namespaces in .cpp files, since they can also give functions, classes, etc. internal linkage. In header files, on the other hand, static is more common for this purpose; I never use anonymous namespaces in header files, as I find them misleading there.

__verify_pcpu_ptr function in Linux Kernel - What does it do?

#define __verify_pcpu_ptr(ptr)						\
do {									\
	const void __percpu *__vpp_verify = (typeof((ptr) + 0))NULL;	\
	(void)__vpp_verify;						\
} while (0)

#define VERIFY_PERCPU_PTR(__p)						\
({									\
	__verify_pcpu_ptr(__p);						\
	(typeof(*(__p)) __kernel __force *)(__p);			\
})
What do these two functions do? What are they used for? How do they work?
Thanks.
This is part of the scheme used by per_cpu_ptr to support a pointer that gets a different value for each CPU. There are two motives here:
Ensure that accesses to the per-cpu data structure are only made via the per_cpu_ptr macro.
Ensure that the argument given to the macro is of the correct type.
Restating, this ensures that (a) you don't accidentally access a per-cpu pointer without the macro (which would only reference the first of N members), and (b) that you don't inadvertently use the macro to cast a pointer that is not of the correct declared type to one that is.
By using these macros, you get the support of the compiler in type-checking without any runtime overhead. The compiler is smart enough to eventually recognize that all of these complex machinations result in no observable state change, yet the type-checking will have been performed. So you get the benefit of the type-checking, but no actual executable code will have been emitted by the compiler.

Compile time barriers - compiler code reordering - gcc and pthreads

AFAIK there are pthread functions that act as memory barriers (e.g. here: clarifications-on-full-memory-barriers-involved-by-pthread-mutexes). But what about compile-time barriers, i.e. is the compiler (especially gcc) aware of this?
In other words: is pthread_create(), for example, a reason for gcc not to perform reordering?
For example in code:
a = 1;
pthread_create(...);
Is it certain that reordering will not take place?
What about invocations from different functions:
void fun(void) {
pthread_create(...);
...
}
a = 1;
fun();
Is fun() also a compile-time barrier (assuming pthread_create() is)?
What about functions in different translation units?
Please note that I am interested in the general gcc and pthreads behavior specification, not necessarily x86-specific behavior (various embedded platforms are in focus).
I am also not interested in other compilers/thread libraries behavior.
Because functions such as pthread_create() are external functions, the compiler must ensure that any side effects that could be visible to an external function (such as a write to a global variable) are complete before calling the function. The compiler couldn't reorder the write to a until after the function call in the first case (assuming a was global or otherwise potentially accessible externally).
This is behavior that is necessary for any C compiler, and really has little to do with threads.
However, if the variable a was a local variable, the compiler might be able to defer the write until after the function call (a might not even end up in memory at all, for that matter), unless something like the address of a was taken and made available externally somehow (like passing it as the thread parameter).
For example:
int a;
void foo(void)
{
a = 1;
pthread_create(...); // the compiler can't reorder the write to `a` past
// the call to `pthread_create()`
// ...
}
void bar(void)
{
int b;
b = 1;
pthread_create(...); // `b` can be initialized after calling `pthread_create()`
// `b` might not ever even exist except as a something
// passed on the stack or in a register to `printf()`
printf( "%d\n", b);
}
I'm not sure if there's a document that outlines this in more detail - this is covered largely by C's 'as if' rule. In C99 that's in 5.1.2.3/3 "Program execution". C is specified by an abstract machine with sequence points where side effects must be complete, and programs must follow that abstract machine model except where the compiler can deduce that the side effects aren't needed.
In my foo() example above, the compiler would generally not be able to deduce that setting a = 1; isn't needed by pthread_create(), so the side effect of setting a to the value 1 must be completed before calling pthread_create(). Note that if there are compilers that perform global optimizations that can deduce that a isn't used elsewhere, they could delay or elide the assignment. However, in that case nothing else is using the side effect, so there would be no problem with that.
