Critical region for the threads of current team - openmp

I want a piece of code to be critical for the current team of threads and not global critical.
How can i achieve this ??
quick-sort(args)
{
spawn threads #x
{
critical-region
{
// code
}
}
quick-sort(args)
quick-sort(args)
}
Here the open-mp critical-region construct will block all threads before accessing the critical region. But i don't have problem with two threads entering the critical region as long as they are not spawned at the same time. I want a solution for openMP.

You cannot do that with #pragma omp critical, but you may use OpenMP locks:
quick-sort(args)
{
declare an instance of OpenMP lock
omp_init_lock( the lock instance )
spawn threads #x
{
// critical-region
omp_set_lock( the lock instance )
{
// code
}
omp_unset_lock( the lock instance )
}
omp_destroy_lock( the lock instance )
quick-sort(args)
quick-sort(args)
}
Since each invocation of quick-sort will declare its own lock object, it will give you what you want.
However, from your pseudo-code it seems that you will never have two different thread teams running at the same time, unless there are OpenMP parallel regions in other functions. If the only code that has a parallel region ("spawns threads") is in quick-sort, you would need to have a recursive call to that function from inside the parallel region, which you do not.

Related

OpenMP stackvariables

I am currently learning about OpenMP. As default variables declared outside of the parallel region are public, where as variables inside of a parallel region are private. Also stack variables from inside the parallel regions are private.
double A[10];
int index[10];
#pragma omp parallel
{
work(index);
}
printf(%d\n”,index[0]);
But why is "index" on the above example public for each thread? Shouldn't it be private, since its put on the stack, and stack variables are private?
Thanks in advance
The statement
Stack variables in C functions called from parallel regions are
private
is true, but you need to differentiate in your case. First,
int index[10];
#pragma omp parallel
{
// index is a shared variable here
work(index);
}
But when it comes to the function you call, imagine:
void work(int* passed_index)
{
...
}
passed_index - the pointer - is in fact a private variable within work. You can change the pointer, and no other thread will notice.
But the data pointed to by *passed_index is still shared.

Avoiding deadlock in reentrant code C++11

I am working on refactoring some legacy code that suffers from deadlocks. There are two main root causes:
1) the same thread locking the same mutex multiple times, which should not difficult to resolve, and
2) the code occasionally calls into user defined functions which can enter the same code at the top level. I need to lock the mutex before calling user defined functions, but I might end up executing the same code again which will result in a deadlock situation. So, I need some mechanism to tell me that the mutex has already been locked and I should not lock it again. Any suggestions?
Here is a (very) brief summary of what the code does:
class TreeNode {
public:
// Assign a new value to this tree node
void set(const boost::any& value, boost::function<void, const TreeNode&> validator) {
boost::upgrade_lock<boost::shared_mutex> lock(mutexToTree_);
// call validator here
boost::upgrade_to_unique_lock<boost::shared_mutex> ulock(lock);
// set this TreeNode to value
}
// Retrieve the value of this tree node
boost::any get() {
boost::shared_lock<boost::shared_mutex> lock(mutexToTree_);
// get value for this tree node
}
private:
static boost::shared_mutex mutexToRoot_;
};
The problem is that the validator function can call into get(), which locks mutexToRoot_ on the same thread. I could modify mutexToRoot_ to be a recursive mutex but that would prevent other threads from reading the tree during get() operation, which is unwanted behavior.
Since C++11 you can use std::recursive_mutex, which allows the owning thread to call lock or try_lock without blocking/reporting failure, whereas the other threads will block on lock/receive false on try_lock until the owning thread calls unlock as many times as it called lock/try_lock before.

lock-free synchronization, fences and memory order (store operation with acquire semantics)

I am migrating a project that was run on bare-bone to linux, and need to eliminate some {disable,enable}_scheduler calls. :)
So I need a lock-free sync solution in a single writer, multiple readers scenario, where the writer thread cannot be blocked. I came up with the following solution, which does not fit to the usual acquire-release ordering:
class RWSync {
std::atomic<int> version; // incremented after every modification
std::atomic_bool invalid; // true during write
public:
RWSync() : version(0), invalid(0) {}
template<typename F> void sync(F lambda) {
int currentVersion;
do {
do { // wait until the object is valid
currentVersion = version.load(std::memory_order_acquire);
} while (invalid.load(std::memory_order_acquire));
lambda();
std::atomic_thread_fence(std::memory_order_seq_cst);
// check if something changed
} while (version.load(std::memory_order_acquire) != currentVersion
|| invalid.load(std::memory_order_acquire));
}
void beginWrite() {
invalid.store(true, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
}
void endWrite() {
std::atomic_thread_fence(std::memory_order_seq_cst);
version.fetch_add(1, std::memory_order_release);
invalid.store(false, std::memory_order_release);
}
}
I hope the intent is clear: I wrap the modification of a (non-atomic) payload between beginWrite/endWrite, and read the payload only inside the lambda function passed to sync().
As you can see, here I have an atomic store in beginWrite() where no writes after the store operation can be reordered before the store. I did not find suitable examples, and I am not experienced in this field at all, so I'd like some confirmation that it is OK (verification through testing is not easy either).
Is this code race-free and work as I expect?
If I use std::memory_order_seq_cst in every atomic operation, can I omit the fences? (Even if yes, I guess the performance would be worse)
Can I drop the fence in endWrite()?
Can I use memory_order_acq_rel in the fences? I don't really get the difference -- the single total order concept is not clear to me.
Is there any simplification / optimization opportunity?
+1. I happily accept any better idea as the name of this class :)
The code is basically correct.
Instead of having two atomic variables (version and invalid) you may use single version variable with semantic "Odd values are invalid". This is known as "sequential lock" mechanism.
Reducing number of atomic variables simplifies things a lot:
class RWSync {
// Incremented before and after every modification.
// Odd values mean that object in invalid state.
std::atomic<int> version;
public:
RWSync() : version(0) {}
template<typename F> void sync(F lambda) {
int currentVersion;
do {
currentVersion = version.load(std::memory_order_seq_cst);
// This may reduce calls to lambda(), nothing more
if(currentVersion | 1) continue;
lambda();
// Repeat until something changed or object is in an invalid state.
} while ((currentVersion | 1) ||
version.load(std::memory_order_seq_cst) != currentVersion));
}
void beginWrite() {
// Writer may read version with relaxed memory order
currentVersion = version.load(std::memory_order_relaxed);
// Invalidation requires sequential order
version.store(currentVersion + 1, std::memory_order_seq_cst);
}
void endWrite() {
// Writer may read version with relaxed memory order
currentVersion = version.load(std::memory_order_relaxed);
// Release order is sufficient for mark an object as valid
version.store(currentVersion + 1, std::memory_order_release);
}
};
Note the difference in memory orders in beginWrite() and endWrite():
endWrite() makes sure that all previous object's modifications have been completed. It is sufficient to use release memory order for that.
beginWrite() makes sure that reader will detect object being in invalid state before any futher object's modification is started. Such garantee requires seq_cst memory order. Because of that reader uses seq_cst memory order too.
As for fences, it is better to incorporate them into previous/futher atomic operation: compiler knows how to make the result fast.
Explanations of some modifications of original code:
1) Atomic modification like fetch_add() is intended for cases, when concurrent modifications (like another fetch_add()) are possible. For correctness, such modifications use memory locking or other very time-costly architecture-specific things.
Atomic assignment (store()) does not use memory locking, so it is cheaper than fetch_add(). You may use such assignment because concurrent modifications are not possible in your case (reader does not modify version).
2) Unlike to release-acquire semantic, which differentiate load and store operations, sequential consistency (memory_order_seq_cst) is applicable to every atomic access, and provide total order between these accesses.
The accepted answer is not correct. I guess the code should be something like "currentVersion & 1" instead of "currentVersion | 1". And subtler mistake is that, reader thread can go into lambda(), and after that, the write thread could run beginWrite() and write value to non-atomic variable. In this situation, write action in payload and read action in payload haven't happens-before relationship. concurrent access (without happens-before relationship) to non-atomic variable is a data race. Note that, single total order of memory_order_seq_cst does not means the happens-before relationship; they are consistent, but two kind of things.

Is it safe to call self in std::async

I would like to check Mysql connections are alive.
(using Mysql-connector-c++)
so.. I call "check function" every 5 minutes
like this
void checkConnection()
{
std::this_thread::sleep_for(std::chrono::minutes(5));
// -- check connection
std::async(std::launch::async, checkConnection); // Here
}
Is it safe..??
In the case you specified, it is safe, because std::launch::async—when by itself—specifies that you want this to be done on a separate thread. The danger, otherwise, would have been stack overflow due to recursive calls.
However, note that the behavior here would be:
Sleep 5 mins on thread T1
Check connection on T1
Start new thread T2
End thread T1
Sleep 5 mins on thread T2
...
So you'd always be using one thread, except you'll pay for creation and destruction each time. It's better to use just one thread, e.g.
void checkConnection(std::atomic<bool>& stopFlag)
{
using namespace std::chrono; // for 5min shorthand, if C++14
while (!stopFlag) {
std::this_thread::sleep_for(5min);
// -- check connection
}
}
Or better yet, use a thread pool for which you can specify delays. But that's not provided by the standard library as of yet.

Efficient Independent Synchronized Blocks?

I have a scenario where, at certain points in my program, a thread needs to update several shared data structures. Each data structure can be safely updated in parallel with any other data structure, but each data structure can only be updated by one thread at a time. The simple, naive way I've expressed this in my code is:
synchronized updateStructure1();
synchronized updateStructure2();
// ...
This seems inefficient because if multiple threads are trying to update structure 1, but no thread is trying to update structure 2, they'll all block waiting for the lock that protects structure 1, while the lock for structure 2 sits untaken.
Is there a "standard" way of remedying this? In other words, is there a standard threading primitive that tries to update all structures in a round-robin fashion, blocks only if all locks are taken, and returns when all structures are updated?
This is a somewhat language agnostic question, but in case it helps, the language I'm using is D.
If your language supported lightweight threads or Actors, you could always have the updating thread spawn a new a new thread to change each object, where each thread just locks, modifies, and unlocks each object. Then have your updating thread join on all its child threads before returning. This punts the problem to the runtime's schedule, and it's free to schedule those child threads any way it can for best performance.
You could do this in langauges with heavier threads, but the spawn and join might have too much overhead (though thread pooling might mitigate some of this).
I don't know if there's a standard way to do this. However, I would implement this something like the following:
do
{
if (!updatedA && mutexA.tryLock())
{
scope(exit) mutexA.unlock();
updateA();
updatedA = true;
}
if (!updatedB && mutexB.tryLock())
{
scope(exit) mutexB.unlock();
updateB();
updatedB = true;
}
}
while (!(updatedA && updatedB));
Some clever metaprogramming could probably cut down the repetition, but I leave that as an exercise for you.
Sorry if I'm being naive, but do you not just Synchronize on objects to make the concerns independent?
e.g.
public Object lock1 = new Object; // access to resource 1
public Object lock2 = new Object; // access to resource 2
updateStructure1() {
synchronized( lock1 ) {
...
}
}
updateStructure2() {
synchronized( lock2 ) {
...
}
}
To my knowledge, there is not a standard way to accomplish this, and you'll have to get your hands dirty.
To paraphrase your requirements, you have a set of data structures, and you need to do work on them, but not in any particular order. You only want to block waiting on a data structure if all other objects are blocked. Here's the pseudocode I would base my solution on:
work = unshared list of objects that need updating
while work is not empty:
found = false
for each obj in work:
try locking obj
if successful:
remove obj from work
found = true
obj.update()
unlock obj
if !found:
// Everything is locked, so we have to wait
obj = randomly pick an object from work
remove obj from work
lock obj
obj.update()
unlock obj
An updating thread will only block if it finds that all objects it needs to use are locked. Then it must wait on something, so it just picks one and locks it. Ideally, it would pick the object that will be unlocked earliest, but there's no simple way of telling that.
Also, it's conceivable that an object might become free while the updater is in the try loop and so the updater would skip it. But if the amount of work you're doing is large enough, relative to the cost of iterating through that loop, the false conflict should be rare, and it would only matter in cases of extremely high contention.
I don't know any "standard" way of doing this, sorry. So this below is just a ThreadGroup, abstracted by a Swarm-class, that »hacks» at a job list until all are done, round-robin style, and makes sure that as many threads as possible are used. I don't know how to do this without a job list.
Disclaimer: I'm very new to D, and concurrency programming, so the code is rather amateurish. I saw this more as a fun exercise. (I'm too dealing with some concurrency stuff.) I also understand that this isn't quite what you're looking for. If anyone has any pointers I'd love to hear them!
import core.thread,
core.sync.mutex,
std.c.stdio,
std.stdio;
class Swarm{
ThreadGroup group;
Mutex mutex;
auto numThreads = 1;
void delegate ()[int] jobs;
this(void delegate()[int] aJobs, int aNumThreads){
jobs = aJobs;
numThreads = aNumThreads;
group = new ThreadGroup;
mutex = new Mutex();
}
void runBlocking(){
run();
group.joinAll();
}
void run(){
foreach(c;0..numThreads)
group.create( &swarmJobs );
}
void swarmJobs(){
void delegate () myJob;
do{
myJob = null;
synchronized(mutex){
if(jobs.length > 0)
foreach(i,job;jobs){
myJob = job;
jobs.remove(i);
break;
}
}
if(myJob)
myJob();
}while(myJob)
}
}
class Jobs{
void job1(){
foreach(c;0..1000){
foreach(j;0..2_000_000){}
writef("1");
fflush(core.stdc.stdio.stdout);
}
}
void job2(){
foreach(c;0..1000){
foreach(j;0..1_000_000){}
writef("2");
fflush(core.stdc.stdio.stdout);
}
}
}
void main(){
auto jobs = new Jobs();
void delegate ()[int] jobsList =
[1:&jobs.job1,2:&jobs.job2,3:&jobs.job1,4:&jobs.job2];
int numThreads = 2;
auto swarm = new Swarm(jobsList,numThreads);
swarm.runBlocking();
writefln("end");
}
There's no standard solution but rather a class of standard solutions depending on your needs.
http://en.wikipedia.org/wiki/Scheduling_algorithm

Resources