Dependent sections in OpenMP

I'm wondering if there is a way to express a dependency between sections in OpenMP. I know this is possible with tasks, but is there a way using sections? Say I have the following case:
#pragma omp parallel sections
{
#pragma omp single
{
if (....) {
#pragma omp section
{
a = A(); // <- Takes much more time than B and will be parallelized further within A
}
#pragma omp section
{
b = B();
}
}
...
#pragma omp section // (dependent on b?)
for (...) {
c = C(b);
}
}
}
What would be the best way to make sure that the last section is executed after 'b' is available?

Related

Spinlock code using atomic_flag in C++ does not compile on Mac

I tried to write a simple spinlock in C++, but the code does not compile on my Mac even though it compiles with other GCC compilers.
#include <iostream>
#include <thread>
#include <atomic>
using namespace std;
class SpinLock
{
public:
atomic_flag flag;
SpinLock() : flag(ATOMIC_FLAG_INIT) {}
void lock()
{
while(flag.test_and_set());
}
void unlock()
{
flag.clear();
}
};
SpinLock spin;
void critical_section()
{
spin.lock();
for(int i = 0 ; i < 10 ; i++)
{
cout << i << " ";
}
cout << endl;
spin.unlock();
}
int main()
{
thread t1(critical_section);
thread t2(critical_section);
t1.join();
t2.join();
}
I checked the version of clang installed on my machine; it is:
Apple clang version 13.1.6 (clang-1316.0.21.2)
Target: arm64-apple-darwin21.4.0
Thread model: posix
Can anyone help me understand why the code does not compile?
atomic_flag(const atomic_flag &) is a deleted constructor, and the standard library defines only two constructors for atomic_flag (the other one is the default constructor), so flag(ATOMIC_FLAG_INIT) in the member initializer list does not match any usable constructor.
ATOMIC_FLAG_INIT usually doesn't even refer to a value to feed a constructor with; it's just a macro for some kind of default initialization (MSVC defines it as empty braces {}). You may want to initialize your flag like this:
class SpinLock
{
public:
atomic_flag flag = ATOMIC_FLAG_INIT;
SpinLock() {}
...

using C++11 templates to generate multiple versions of an algorithm

Say I'm making a general-purpose collection of some sort, and there are 4-5 points where a user might want to choose implementation A or B. For instance:
homogeneous or heterogeneous
do we maintain a count of the contained objects, which is slower
do we have it be thread-safe or not
I could just make 16 or 32 implementations, with each combination of features, but obviously this won't be easy to write or maintain.
I could pass in boolean flags to the constructor, that the class could check before doing certain operations. However, the compiler doesn't "know" what those arguments were so has to check them every time, and just checking enough boolean flags itself imposes a performance penalty.
So I'm wondering if template arguments can somehow be used so that at compile time the compiler sees if (false) or if (true) and therefore can completely optimize out the condition test, and if false, the conditional code. I've only found examples of templates as types, however, not as compile-time constants.
The main goal would be to utterly eliminate those calls to lock mutexes, increment and decrement counters, and so on; additionally, if there's some way to actually remove the mutex or counters from the object structure as well, that'd be truly optimal.
Conditional computation before C++17 was mostly about template specialization. Either specializing the function itself:
template<class T> void f(T &);  // primary template

template<> void f<int>(int &) {
std::cout << "Locking an int...\n";
std::cout << "Unlocking an int...\n";
}
template<> void f<std::mutex>(std::mutex &m) {
m.lock();
m.unlock();
}
But this actually creates rather branchy code (in your case, I suspect), so a sounder alternative is to extract all the dependent, type-specific parts into a static interface and define a static implementation of it for each particular concrete type:
template<class T> struct lock_traits; // interface
template<> struct lock_traits<int> {
void lock(int &) { std::cout << "Locking an int...\n"; }
void unlock(int &) { std::cout << "Unlocking an int...\n"; }
};
template<> struct lock_traits<std::mutex> {
void lock(std::mutex &m) { m.lock(); }
void unlock(std::mutex &m) { m.unlock(); }
};
template<class T> void f(T &t) {
lock_traits<T>::lock(t);
lock_traits<T>::unlock(t);
}
In C++17, if constexpr was finally introduced; now not all branches have to compile in all circumstances.
template<class T> void f(T &t) {
if constexpr (std::is_same_v<T, std::mutex>) {
t.lock();
}
else if constexpr (std::is_same_v<T, int>) {
std::cout << "Locking an int...\n";
}
if constexpr (std::is_same_v<T, std::mutex>) {
t.unlock();
}
// forgot to unlock an int here :(
}

Implementing a lock-free stack with OpenMP: compare-and-swap

I'm writing a C library in which I want to optionally support concurrency using OpenMP (so that one may compile it serially if the compiler does not support OpenMP). I'd like to use a lock-free stack implementation.
I thought about using C's stdatomic.h for the stack, but it seems that until a few weeks ago GCC couldn't use _Atomic with OpenMP, so this would complicate portability. Clang 3.8 seems to handle atomics with OpenMP correctly, but this still would not be the best choice, since there's no need to keep things atomic when compiling without OpenMP (thus serially).
I seem to need to use a compare-and-exchange operation when popping from the stack, and I couldn't find any information regarding compare-and-exchange on OpenMP. Is there any way to implement a lock-free stack solely with OpenMP?
My code so far (works with clang):
#include <stdatomic.h>
#include <stdlib.h>

struct lfstack_node {
void *value;
struct lfstack_node *next;
};
typedef struct lfstack {
_Atomic(size_t) size;
_Atomic(struct lfstack_node *) head;
_Atomic int aba;
} *lfstack_t;
// ...
void *lfstack_pop(lfstack_t stack) {
if(stack) {
atomic_fetch_add(&stack->aba, 1);
struct lfstack_node *node, *next;
do {
node = atomic_load(&stack->head);
if(!node) {
break;
}
// ABA problem here if not handled correctly
next = node->next;
} while(!atomic_compare_exchange_weak(&stack->head, &node, next));
atomic_fetch_sub(&stack->aba, 1);
if(node) {
int zero = 0;
while(!atomic_compare_exchange_weak(&stack->aba, &zero, zero)) {
continue;
}
void *value = node->value;
free(node);
return value;
}
}
return NULL;
}

remove repetition when switching enum class

When I switch on an enum class I have to restate the enum class in every case. This bugs me, since outside of constexpr constructs it is hard to imagine what else I could mean. Is there a way to inform the compiler that everything inside a block should be resolved to an enum class of my choice if there is a match?
Consider the following example, which contains a compiling snippet and, for comparison, a non-compiling snippet (commented out) that I would like to write.
#include <iostream>
enum class State : std::uint8_t;
void writeline(const char * msg);
void Compiles(State some);
enum class State : std::uint8_t
{
zero = 0,
one = 1
};
int main()
{
Compiles(State::zero);
return 0;
}
void Compiles(State some)
{
switch (some)
{
case State::zero: //State::
writeline("0");
break;
case State::one: //State::
writeline("1");
break;
default:
writeline("d");
break;
}
}
//void WhatIWant(State some)
//{
// using State{ //this makes no sense to the compiler but it expresses what I want to write
// switch (some)
// {
// case zero: //I want the compiler to figure out State::zero
// writeline("0");
// break;
// case one: //I want the compiler to figure out State::one
// writeline("1");
// break;
// default:
// writeline("d");
// break;
// }
// }
//}
void writeline(const char * msg)
{
std::cout << msg << std::endl;
}
Is there a way to use a switch statement and have the compiler figure out the enum class, maybe after giving a hint once?
enum class is specially designed so that you have to apply State:: every time.
If you don't want to use the State:: prefix in every statement, just use an old-style enum from C++98.
NOTE: with C++11 you can still give a regular enum a fixed underlying type, e.g. enum MyEnum : std::uint8_t { ... }.

OpenMP parallel section dependency

I am running into a very odd OpenMP problem that I can't figure out. This is what it looks like.
I have, let's say, four functions:
void function0(Data *data) { /* ..body.. */ }
void function1(Data *data) { /* ..body.. */ }
void function2(Data *data) { /* ..body.. */ }
void function3(Data *data) { /* ..body.. */ }
These functions may or may not modify what data points to.
Here is how they are called sequentially
// Version 1
void test(Data *data)
{
function0(data);
function1(data);
function2(data);
function3(data);
}
I can rearrange the calls in any way I want and it still works perfectly. So I am assuming they are somehow(?) independent.
Now when parallelizing
// Version 2
void test(Data *data)
{
int th_id;
#pragma omp parallel private(th_id) default(shared)
{
th_id = omp_get_thread_num();
if(th_id==0) {
function0(data);
}
if(th_id==1) {
function1(data);
}
if(th_id==2){
function2(data);
}
if(th_id==3){
function3(data);
}
}
}
It DOESN'T work (version 2).
However, if I synchronize the threads after each call, it works:
// Version 3
void test(Data *data)
{
int th_id;
#pragma omp parallel private(th_id) default(shared)
{
th_id = omp_get_thread_num();
if(th_id==0) {
function0(data);
}
#pragma omp barrier
if(th_id==1) {
function1(data);
}
#pragma omp barrier
if(th_id==2){
function2(data);
}
#pragma omp barrier
if(th_id==3){
function3(data);
}
}
}
I am thinking there is some data race on what data points to.
But why would it work (in the sequential version 1) when I rearrange the calls, then?
Suppose you had two functions like this:
void function0(int *data)
{
*data = *data + 1;
}
void function1(int *data)
{
*data = *data + 2;
}
Clearly you can run those two operations in either order sequentially and at the end the value will have been incremented by 3.
However, if you run the two functions in parallel you have a data race, and it's entirely possible that one of the additions will be lost, so you could end up with the initial value incremented by 1, 2, or 3.
Just because the functions appear to be commutative sequentially that doesn't mean that they can safely be run in parallel.
