Pointers and arrays in an OpenMP depend list - openmp

I have got something like head.h:
struct mystruct {
    double *a;
    double *t_a;
};
typedef struct mystruct pm_t;
and my OpenMP task code in mycode.c:
int foo(pm_t* t_lb){
    #pragma omp task default(none) shared(t_lb, BLOCK) private(i) \
        firstprivate(baseIndex) depend (in: t_lb->a, t_lb->t_a)
    {
        ...
Compiling with Intel 17 I get:
error: invalid entity for this variable list in omp clause
firstprivate(baseIndex) depend (in: t_lb->a,t_lb->t_a)
^
I know that OpenMP does not deal with pointers in the depend syntax, but I have also tried with
firstprivate(baseIndex) depend (in: t_lb->a[:1], t_lb->t_a)
with no success. Does anybody see something wrong with this?

Apparently, this should be an error according to the OpenMP specifications:
"A variable that is part of another variable (such as an element of a structure) but is not an array element or an array section cannot appear in a depend clause." (Version 4.5, page 171, line 18)
However, this restriction is planned to be dropped in Version 5.0, and the Cray compiler has already dropped it internally. So this will fail with GCC and Intel but will work with the Cray compiler.
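Until then, a common workaround under the 4.5 rules is to copy the struct members into plain local pointers and build the dependences from array sections over those. A minimal sketch, assuming each pointer refers to at least one element (dep_a and dep_ta are names introduced here for illustration):

/* dep_a and dep_ta are plain variables, so pointer-based array
 * sections over them are valid depend list items in OpenMP 4.5,
 * unlike the struct members themselves. */
double *dep_a  = t_lb->a;
double *dep_ta = t_lb->t_a;
#pragma omp task default(none) shared(t_lb, BLOCK) private(i) \
    firstprivate(baseIndex, dep_a, dep_ta) \
    depend(in: dep_a[0:1], dep_ta[0:1])
{
    /* ... body unchanged, still reading through t_lb ... */
}

Note that the dependence is then on the pointed-to storage, so every task that must be ordered against this one has to name that same storage in its own depend clause.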

Related

GCC and cast-as-lvalue: what if I do want to cast as lvalue?

To avoid declaring a flood of one-time-use variables, I like to get by with as few variables as possible, recasting them as needed, like so:
int main()
{
    int i1;
#define cChar ((char)i1)
    cChar = 'a';
#undef cChar
}
This gives me the famous "error: lvalue required as left operand of assignment".
After reading a bit about this issue on this forum, I learned that the cast-as-lvalue extension was removed in GCC 4.0 (https://gcc.gnu.org/gcc-4.0/changes.html).
However, I fail to understand the why behind this, and I was wondering if there is an option (apparently not) or even an alternative to GCC that would accept this kind of operation, which has worked for ages on ye olde compilers (obviously not GCC, but the likes of Borland C++).
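One standards-conforming alternative, offered purely as a sketch: take the address and cast the pointer instead of casting the object. Note the semantics differ from the removed extension, since this writes one specific byte (the lowest-addressed one), so the effect on i1's value depends on endianness:

int main(void)
{
    int i1 = 0;
/* Write through a cast pointer instead of casting the object itself:
 * (*(char *)&i1) is an lvalue of type char aliasing the first byte of
 * i1, and accessing any object through a char pointer is well-defined C. */
#define cChar (*(char *)&i1)
    cChar = 'a';
#undef cChar
    return 0;
}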

How to use syscalls correctly in go (different results from Go unsafe.Sizeof vs C sizeof)

Go's unsafe.Sizeof is returning a different result than C's sizeof.
main.go:
package main

import (
    "unsafe"
)

type gpioeventdata struct {
    Timestamp uint64
    ID        uint32
}

func main() {
    eventdata := gpioeventdata{}
    println("Size", unsafe.Sizeof(eventdata))
}
Prints 12 when compiled with env GOOS=linux GOARCH=arm GOARM=6 go build on macOS and run on Raspberry Pi Zero.
gpio.c:
#include <stdio.h>
#include <linux/gpio.h>

int main() {
    printf("sizeof gpioevent_data %zu\n", sizeof(struct gpioevent_data));
}
Prints 16 when compiled and run on Raspberry (with gcc).
struct definition in gpio.h:
struct gpioevent_data {
    __u64 timestamp;
    __u32 id;
};
Edit
I already suspected that this is due to alignment, but a lot of people pass Go structs to syscall.Syscall (e.g. https://github.com/stapelberg/hmgo/blob/master/internal/gpio/reset.go#L49). So is that basically wrong, and should you never do it?
If it is wrong, what would be the correct approach to making syscalls from Go so that they work correctly on different architectures? For example, GPIO ioctl calls:
ret = ioctl(fd, GPIO_GET_LINEEVENT_IOCTL, &req);
...
struct gpioevent_data event;
ret = read(req.fd, &event, sizeof(event));
The Go compiler and the C compiler handle alignment differently.
In C the structure has been padded out to 16 bytes (4 bytes of slack space after id), because __u64 requires 8-byte alignment on this platform and a structure's size must be a multiple of its alignment. The Go compiler instead packed the fields without adding any slack space.
Your mistake is thinking that two "structures" in different languages with different compilers should have the same memory layout.
Note that there is no way to "compute" what the padding in a C or C++ structure will be, because padding is a choice of the implementer. It is entirely possible that two different standard-conforming C compilers for the same architecture will generate different padding (or even the same compiler with different compilation options).
The only way to get the padding correct is to check the specific case, either manually or by writing a script that invokes the compiler with the same compilation options and checks the result (e.g. by printing the result of offsetof for every member), then generates the needed Go source code after parsing that output.
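A minimal sketch of such a probe, assuming it is compiled on the target with the same compiler and options as the real code:

#include <stdio.h>
#include <stddef.h>
#include <linux/gpio.h>

/* Print the layout this C compiler actually chose; a generator script
 * can parse this output and emit a matching Go struct definition. */
int main(void)
{
    printf("sizeof=%zu timestamp@%zu id@%zu\n",
           sizeof(struct gpioevent_data),
           offsetof(struct gpioevent_data, timestamp),
           offsetof(struct gpioevent_data, id));
    return 0;
}

On the Raspberry Pi this should print sizeof=16 timestamp@0 id@8, consistent with the 16 reported in the question, i.e. 4 bytes of trailing padding that the Go side would have to reproduce, for example with an explicit padding field.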
According to https://go101.org/article/memory-layout.html, Go generally follows the C rules for structure padding (see https://stackoverflow.com/a/38144117/851737 for details of the C memory alignment rules and a pseudocode algorithm).
However, there is a known bug: Go doesn't correctly align 64-bit values on 32-bit architectures.

Reason to use declare target pragma in OpenMP

I wonder what the reason is to use the declare target directive. I can simply use target {, data} map(to/from/tofrom: ...) in order to specify which variables should be used by the device. As for functions, is it compulsory for a function called from a target region to be declared as target? Suppose I have the following code:
int data[N];
#pragma omp target
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        data[i] = my_function(i);
}
Is it required to surround my_function() declaration/definition with declare target?
In your example the data[N] array will be mapped to the device at the beginning of every target region and unmapped at the end. In programs with multiple target regions it may be useful to map data[N] only once, at startup, using the declare target directive.
As for the functions, the OpenMP 4.0 specification is quite unclear about this. It says only:
The declare target directive specifies that variables, functions (C, C++ and Fortran), and subroutines (Fortran) are mapped to a device.
So it doesn't clearly prohibit calls to non-target functions from target regions or other target functions.
But I personally think that my_function must be declared as target; otherwise, why would this pragma have been introduced for functions at all?
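A minimal sketch of both uses, with a placeholder body for my_function and an assumed value for N:

#define N 1024

#pragma omp declare target
int data[N];            /* resident on the device for the program's lifetime */
int my_function(int i)  /* compiled for the device as well as the host */
{
    return 2 * i;       /* placeholder body */
}
#pragma omp end declare target

void compute(void)
{
    #pragma omp target  /* no map clause needed for data: already mapped */
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            data[i] = my_function(i);
    }
}

The host and device copies of data are then synchronized explicitly with #pragma omp target update to(...)/from(...) where needed, instead of being re-mapped at every target region.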

How can I get more information from clang substitution-failure errors?

I have the following std::begin wrappers around Eigen3 matrices:
namespace std {
    template<class T, int nd> auto begin(Eigen::Matrix<T,nd,1>& v)
        -> decltype(v.data()) { return v.data(); }
}
Substitution fails, and I get a compiler error (error: no matching function for call to 'begin'). For this overload, clang outputs the following:
.../file:line:char note: candidate template ignored:
substitution failure [with T = double, nd = 4]
template<class T, int nd> auto begin(Eigen::Matrix<T,nd,1>& v)
^
I want this overload to be selected. I expect the types to be double and int, i.e. they are deduced as I want them to be (and hopefully correctly). Looking at the function, I don't see anything that could actually fail.
Every now and then I get similar errors. Here, clang tells me: substitution failure, I'm not putting this function into the overload resolution set. However, this doesn't help me debug at all. Why did substitution fail? What exactly couldn't be substituted, and where? The only thing obvious to me is that the compiler knows, but it is deliberately not telling me :(
Is it possible to force clang to tell me what exactly failed here?
This function is trivial and I'm already having problems; in more complex functions, I guess things can only get worse. How do you go about debugging these kinds of errors?
You can debug substitution failures by doing the substitution yourself into a cut-and-paste copy of the original template and seeing what errors the compiler spews for the fully specialized code. In this case:
namespace std {
    auto begin(Eigen::Matrix<double,4,1>& v)
        -> decltype(v.data()) {
        typedef double T; // Not necessary in this example,
        const int nd = 4; // but define the parameters in general.
        return v.data();
    }
}
Well, this has been reported as a bug in clang. Unfortunately, the clang devs still don't know the best way to fix it. Until then, you can use gcc, which will report the backtrace, or you can apply this patch to clang 3.4. The patch is a quick hack that turns substitution failures into errors.

Compile time barriers - compiler code reordering - gcc and pthreads

AFAIK there are pthread functions that act as memory barriers (e.g. clarifications-on-full-memory-barriers-involved-by-pthread-mutexes). But what about a compile-time barrier, i.e. is the compiler (especially gcc) aware of this?
In other words: is pthread_create(), for example, a reason for gcc not to perform reordering?
For example in code:
a = 1;
pthread_create(...);
Is it certain that reordering will not take place?
What about invocations from different functions:
void fun(void) {
    pthread_create(...);
    ...
}
a = 1;
fun();
Is fun() also a compile-time barrier (assuming pthread_create() is)?
What about functions in different translation units?
Please note that I am interested in the general gcc and pthreads behavior specification, not necessarily x86-specific behavior (various embedded platforms are in focus).
I am also not interested in the behavior of other compilers/thread libraries.
Because functions such as pthread_create() are external functions, the compiler must ensure that any side effects that could be visible to an external function (such as a write to a global variable) are completed before calling the function. The compiler couldn't reorder the write to a until after the function call in the first case, assuming a is global or otherwise potentially accessible externally.
This is behavior that is necessary for any C compiler, and really has little to do with threads.
However, if the variable a is a local variable, the compiler might be able to delay the write until after the function call (a might not even end up in memory at all, for that matter), unless something like the address of a is taken and made available externally somehow (like passing it as the thread parameter).
For example:
int a;

void foo(void)
{
    a = 1;
    pthread_create(...); // the compiler can't reorder the write to `a` past
                         // the call to `pthread_create()`
    // ...
}

void bar(void)
{
    int b;

    b = 1;
    pthread_create(...); // `b` can be initialized after calling `pthread_create()`
                         // `b` might not ever even exist except as something
                         // passed on the stack or in a register to `printf()`

    printf("%d\n", b);
}
I'm not sure if there's a document that outlines this in more detail; it's covered largely by C's "as if" rule. In C99 that's 5.1.2.3/3, "Program execution". C is specified in terms of an abstract machine with sequence points at which side effects must be complete, and programs must follow that abstract machine model except where the compiler can deduce that the side effects aren't needed.
In my foo() example above, the compiler generally cannot deduce that setting a = 1; isn't needed by pthread_create(), so the side effect of setting a to the value 1 must be complete before calling pthread_create(). Note that if a compiler performs global optimizations and can deduce that a isn't used elsewhere, it could delay or elide the assignment; however, in that case nothing else is using the side effect, so there would be no problem with that.
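As an aside, when you need to stop compile-time reordering at a point where no opaque external call happens to sit, gcc has an explicit compiler-only barrier idiom. A minimal sketch (the macro name is mine; this inhibits compiler reordering only and emits no CPU fence instruction):

/* gcc-specific: an empty asm with a "memory" clobber tells the compiler
 * that memory may be read or written here, so it can neither cache
 * values in registers across this point nor move memory accesses past it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int a;

void baz(void)
{
    a = 1;
    COMPILER_BARRIER(); /* the store to `a` cannot be moved below this point */
    /* ... */
}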
