Memory barrier usage with CAS operations - C++11

In the code snippet from cppreference, the memory orderings std::memory_order_release and std::memory_order_relaxed are used for the success and failure cases respectively. When is it OK to use std::memory_order_release for both, or std::memory_order_relaxed for both?
template<class T>
struct node
{
    T data;
    node* next;
    node(const T& data) : data(data), next(nullptr) {}
};

template<class T>
class stack
{
    std::atomic<node<T>*> head;
public:
    void push(const T& data)
    {
        node<T>* new_node = new node<T>(data);
        // put the current value of head into new_node->next
        new_node->next = head.load(std::memory_order_relaxed);
        // now make new_node the new head, but if the head
        // is no longer what's stored in new_node->next
        // (some other thread must have inserted a node just now)
        // then put that new head into new_node->next and try again
        while (!std::atomic_compare_exchange_weak_explicit(
                    &head,
                    &new_node->next,
                    new_node,
                    std::memory_order_release,
                    std::memory_order_relaxed))
            ; // the body of the loop is empty
        // note: the above loop is not thread-safe in at least
        // GCC prior to 4.8.3 (bug 60272), clang prior to 2014-05-05 (bug 18899)
        // MSVC prior to 2014-03-17 (bug 819819). See member function version for workaround
    }
};

Using relaxed for both would not be safe. If the compare_exchange succeeds, then head is updated with the value of new_node, and other threads reading head will get that pointer. However, without release ordering, the value written to new_node->next (now head->next) may not be globally visible yet, so if the other thread tries to read head->next it may see garbage, or misbehave in other ways.
Formally, the write to new_node->next needs to happen before any other thread tries to read it, which can only be ensured by having release ordering on the store that signals other threads that the value is ready. (Likewise, the thread that reads head needs to use acquire ordering.) With relaxed ordering on the success store, the happens-before relationship is not there, so the code has a data race and its behavior is undefined.
Using release for both would not make sense: release ordering only applies to stores, and in the failure case no store is performed. In fact, for this reason, passing std::memory_order_release for the failure ordering is actually illegal; this is stated on the page where you got the sample code from. Using acquire or seq_cst would be safe (stronger ordering is always safe) but unnecessary, and might cause a needless performance hit.
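To see where the matching acquire belongs, here is a minimal sketch of the reader side (my illustration, not part of the cppreference sample; a real pop must also solve the ABA and memory-reclamation problems, which this sketch deliberately ignores):

#include <atomic>

// Hypothetical pop for the stack above. The acquire orderings pair with
// the release CAS in push(): once this thread observes a node published
// by push(), it also sees the node fully constructed, including ->next.
template<class T>
bool try_pop(std::atomic<node<T>*>& head, T& out)
{
    node<T>* old_head = head.load(std::memory_order_acquire);
    while (old_head &&
           !head.compare_exchange_weak(old_head, old_head->next,
                                       std::memory_order_acquire,
                                       std::memory_order_acquire))
        ; // on failure, old_head is refreshed with the current head
    if (!old_head)
        return false;
    out = old_head->data;
    delete old_head; // UNSAFE in general: another thread may still hold
                     // old_head; real code needs hazard pointers or
                     // reference counting, and must handle ABA
    return true;
}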

Related

memory leak delete linked list node

Newbie question. Suppose I have a C++11 linked list implementation with
template <typename X> struct Node {
    X value;
    Node* next;
    Node(X x) {
        this->value = x;
        this->next = nullptr;
    }
};
and later in the code I create a pointer variable
X x = something;
Node<X>* node = new Node<X>(x);
and still later I do
delete node;
Is the x stored within node destructed when this statement is executed?
You may tell me I should use std::list instead of writing my own, but right
now I'm just trying to educate myself on pointers.
Since you did not provide a custom destructor, the compiler generates the default one for you, which calls the destructors of the members.
Now, the answer to your question really depends on what your x is :) If it is an object that has a destructor (like std::string), it will be properly destroyed. But if it is a "naked pointer" (like int*), the pointer itself is destroyed but not the object it points to, which causes a memory leak.
N.B. You create your x on the stack, so I really hope that X provides proper copy semantics; otherwise you may end up with an invalid object stored in your node!
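To make the difference concrete, here is a small sketch (the types are illustrative, not from the question):

#include <string>

struct A { std::string value; }; // member has a destructor
struct B { int* value; };        // member is a naked pointer

int main() {
    A* a = new A{"hello"};
    delete a; // ~A() runs ~std::string(): no leak

    B* b = new B{new int(42)};
    // delete b->value;   // would be needed here to avoid the leak
    delete b; // ~B() destroys the pointer member, not the pointee: leak
}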

Avoiding deadlock in reentrant code C++11

I am working on refactoring some legacy code that suffers from deadlocks. There are two main root causes:
1) the same thread locking the same mutex multiple times, which should not be difficult to resolve, and
2) the code occasionally calls into user-defined functions which can re-enter the same code at the top level. I need to lock the mutex before calling user-defined functions, but I might end up executing the same code again, which will result in a deadlock. So I need some mechanism that tells me the mutex has already been locked and I should not lock it again. Any suggestions?
Here is a (very) brief summary of what the code does:
class TreeNode {
public:
    // Assign a new value to this tree node
    void set(const boost::any& value,
             boost::function<void(const TreeNode&)> validator) {
        boost::upgrade_lock<boost::shared_mutex> lock(mutexToRoot_);
        // call validator here
        boost::upgrade_to_unique_lock<boost::shared_mutex> ulock(lock);
        // set this TreeNode to value
    }
    // Retrieve the value of this tree node
    boost::any get() {
        boost::shared_lock<boost::shared_mutex> lock(mutexToRoot_);
        // get value for this tree node
    }
private:
    static boost::shared_mutex mutexToRoot_;
};
The problem is that the validator function can call into get(), which locks mutexToRoot_ on the same thread. I could modify mutexToRoot_ to be a recursive mutex but that would prevent other threads from reading the tree during get() operation, which is unwanted behavior.
Since C++11 you can use std::recursive_mutex: the owning thread can call lock or try_lock again without blocking or being reported a failure, while all other threads block on lock (or receive false from try_lock) until the owning thread has called unlock as many times as it called lock/try_lock.
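A minimal sketch of that reentrancy behaviour (my illustration; note that std::recursive_mutex is an exclusive lock, so unlike the shared_mutex above it would also serialize concurrent readers):

#include <iostream>
#include <mutex>

std::recursive_mutex m;

void get_value() {
    std::lock_guard<std::recursive_mutex> lock(m); // same thread: OK
    std::cout << "reading\n";
}

void set_value() {
    std::lock_guard<std::recursive_mutex> lock(m);
    get_value(); // re-entering on the owning thread does not deadlock
}

int main() {
    set_value();
}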

lock-free synchronization, fences and memory order (store operation with acquire semantics)

I am migrating a project that ran on bare metal to Linux, and need to eliminate some {disable,enable}_scheduler calls. :)
So I need a lock-free sync solution for a single-writer, multiple-readers scenario, where the writer thread cannot be blocked. I came up with the following solution, which does not fit the usual acquire-release ordering:
class RWSync {
    std::atomic<int> version; // incremented after every modification
    std::atomic_bool invalid; // true during write
public:
    RWSync() : version(0), invalid(false) {}

    template<typename F> void sync(F lambda) {
        int currentVersion;
        do {
            do { // wait until the object is valid
                currentVersion = version.load(std::memory_order_acquire);
            } while (invalid.load(std::memory_order_acquire));
            lambda();
            std::atomic_thread_fence(std::memory_order_seq_cst);
            // check if something changed
        } while (version.load(std::memory_order_acquire) != currentVersion
                 || invalid.load(std::memory_order_acquire));
    }

    void beginWrite() {
        invalid.store(true, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }

    void endWrite() {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        version.fetch_add(1, std::memory_order_release);
        invalid.store(false, std::memory_order_release);
    }
};
I hope the intent is clear: I wrap the modification of a (non-atomic) payload between beginWrite/endWrite, and read the payload only inside the lambda function passed to sync().
As you can see, here I have an atomic store in beginWrite() where no writes after the store operation can be reordered before the store. I did not find suitable examples, and I am not experienced in this field at all, so I'd like some confirmation that it is OK (verification through testing is not easy either).
Is this code race-free and work as I expect?
If I use std::memory_order_seq_cst in every atomic operation, can I omit the fences? (Even if yes, I guess the performance would be worse)
Can I drop the fence in endWrite()?
Can I use memory_order_acq_rel in the fences? I don't really get the difference -- the single total order concept is not clear to me.
Is there any simplification / optimization opportunity?
I happily accept any better idea for the name of this class :)
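For concreteness, a hypothetical usage sketch (the payload type and names are illustrative, not part of the class):

struct Payload { int a; int b; }; // non-atomic data guarded by RWSync

RWSync guard;
Payload payload;

// Writer thread (single writer, never blocks):
void update(int a, int b) {
    guard.beginWrite();
    payload.a = a; // plain, non-atomic writes
    payload.b = b;
    guard.endWrite();
}

// Reader threads: sync() retries the lambda until it has run against
// a consistent snapshot, so the lambda must only read the payload.
void read_snapshot(Payload& out) {
    guard.sync([&] { out = payload; });
}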
The code is basically correct.
Instead of having two atomic variables (version and invalid) you may use a single version variable with the semantics "odd values are invalid". This is known as the "sequential lock" (seqlock) mechanism.
Reducing the number of atomic variables simplifies things a lot:
class RWSync {
    // Incremented before and after every modification.
    // Odd values mean that the object is in an invalid state.
    std::atomic<int> version;
public:
    RWSync() : version(0) {}

    template<typename F> void sync(F lambda) {
        int currentVersion;
        do {
            currentVersion = version.load(std::memory_order_seq_cst);
            // This may reduce calls to lambda(), nothing more
            if (currentVersion | 1) continue;
            lambda();
            // Repeat until something changed or object is in an invalid state.
        } while ((currentVersion | 1) ||
                 version.load(std::memory_order_seq_cst) != currentVersion);
    }

    void beginWrite() {
        // Writer may read version with relaxed memory order
        int currentVersion = version.load(std::memory_order_relaxed);
        // Invalidation requires sequential order
        version.store(currentVersion + 1, std::memory_order_seq_cst);
    }

    void endWrite() {
        // Writer may read version with relaxed memory order
        int currentVersion = version.load(std::memory_order_relaxed);
        // Release order is sufficient to mark the object as valid
        version.store(currentVersion + 1, std::memory_order_release);
    }
};
Note the difference in memory orders in beginWrite() and endWrite():
endWrite() makes sure that all previous modifications of the object have completed. Release memory order is sufficient for that.
beginWrite() makes sure that a reader will detect the object being in an invalid state before any further modification of the object is started. Such a guarantee requires seq_cst memory order. Because of that, the reader uses seq_cst memory order too.
As for fences, it is better to fold them into the adjacent atomic operations: the compiler knows how to make the result fast.
Explanations of some modifications of the original code:
1) An atomic read-modify-write like fetch_add() is intended for cases when concurrent modifications (like another fetch_add()) are possible. For correctness, such modifications use bus locking or other very time-costly architecture-specific mechanisms.
An atomic assignment (store()) does not need that, so it is cheaper than fetch_add(). You may use such an assignment because concurrent modifications are not possible in your case (the reader does not modify version).
2) Unlike release-acquire semantics, which distinguish load and store operations, sequential consistency (memory_order_seq_cst) is applicable to every atomic access, and provides a single total order over these accesses.
The accepted answer is not correct. I guess the code should use "currentVersion & 1" instead of "currentVersion | 1". A subtler mistake is that the reader thread can enter lambda(), and after that the writer thread can run beginWrite() and write to the non-atomic variable. In this situation the write to the payload and the read of the payload have no happens-before relationship; concurrent access to a non-atomic variable without a happens-before relationship is a data race. Note that the single total order of memory_order_seq_cst does not imply a happens-before relationship; the two are consistent with each other, but they are different things.
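A corrected sketch along the lines of this comment, with the "& 1" fix applied (my reconstruction, not the commenter's code; strictly speaking the lambda still performs a racy read of the non-atomic payload while a write may be in progress, which a fully conforming seqlock avoids by reading the payload through relaxed atomics):

#include <atomic>

class RWSync {
    std::atomic<int> version; // odd while a write is in progress
public:
    RWSync() : version(0) {}

    template<typename F> void sync(F lambda) {
        int v1, v2;
        do {
            do { // spin until the version is even (no write in progress)
                v1 = version.load(std::memory_order_acquire);
            } while (v1 & 1);
            lambda(); // read the payload (formally racy, see above)
            // acquire fence keeps the payload reads ordered before the re-check
            std::atomic_thread_fence(std::memory_order_acquire);
            v2 = version.load(std::memory_order_relaxed);
        } while (v1 != v2); // retry if a write happened in between
    }

    void beginWrite() {
        int v = version.load(std::memory_order_relaxed); // single writer
        version.store(v + 1, std::memory_order_relaxed); // now odd
        // release fence keeps the odd store ordered before the payload writes
        std::atomic_thread_fence(std::memory_order_release);
    }

    void endWrite() {
        int v = version.load(std::memory_order_relaxed);
        // release store orders the payload writes before the even store
        version.store(v + 1, std::memory_order_release);
    }
};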

Is there a way to make a moved object "invalid"?

I have some code that moves an object into another object. I won't need the original, moved-from object anymore at the upper level, so I think move is the right choice.
However, thinking about safety, I wonder if there is a way to invalidate the moved-from object and thus prevent undefined behaviour if someone accesses it.
Here is a nice example:
// move example
#include <utility> // std::move
#include <vector>  // std::vector
#include <string>  // std::string

int main() {
    std::string foo = "foo-string";
    std::string bar = "bar-string";
    std::vector<std::string> myvector;

    myvector.push_back(foo);            // copies
    myvector.push_back(std::move(bar)); // moves

    return 0;
}
The description says:
The first call to myvector.push_back copies the value of foo into the
vector (foo keeps the value it had before the call). The second call
moves the value of bar into the vector. This transfers its content
into the vector (while bar loses its value, and now is in a valid but
unspecified state).
Is there a way to invalidate bar, such that access to it will cause a compiler error? Something like:
myvector.push_back (std::move(bar)); // moves
invalidate(bar); //something like bar.end() will then result in a compiler error
Edit: And if there is no such thing, why?
Accessing the moved-from object is not undefined behavior. The moved-from object is still a valid object, and the program may very well want to continue using it. For example,
template<typename T>
void swap_by_move(T& a, T& b)
{
    using std::move;
    T c = move(b);
    b = move(a);
    a = move(c);
}
The bigger picture answer is because moving or not moving is a decision made at runtime, and giving a compile-time error is a decision made at compile time.
foo(bar); // foo might move or not
bar.baz(); // compile time error or not?
It's not going to work. You can approximate it with compile-time analysis, but then developers will either get spurious errors on valid programs, or have to add annoying and fragile annotations to called functions promising not to move the argument.
To put it a different way, you are asking for a compile-time error when you use an integer variable that contains the value 42, or a pointer that contains a null pointer value. You might be successful in implementing an approximate build-time convention checker using the clang static analysis API, working on the CFG built from the C++ AST and erroring out if you cannot prove that std::move has not been called before a given use of a variable.
Move semantics works like that so that after the move you still get an object in some correct state. A correct state means that all fields have valid values and all internal invariants still hold. That was done because after a move you don't usually care about the contents of the moved-from object, but resource management, assignment, and destruction still have to work.
STL classes (and classes with defaulted move constructors/assignments) simply transfer or swap their contents, so both objects are left in correct states; this is easy to implement, fast, and convenient enough.
You can define your own class with an isValid field that is normally true and is set to false on move (i.e. in the move constructor / move assignment). Then your moved-from object will have the well-defined state "I am invalid". Just don't forget to check the flag where needed (destructor, assignment, etc.).
That isValid flag can also simply be a pointer member that becomes null. The point is: you know the object is in a predictable state after the move, not just random bytes in memory.
Edit: example with a String class:

#include <string>
#include <utility>

class String {
public:
    std::string data;
private:
    bool m_isValid;
public:
    String(std::string const& s) : data(s), m_isValid(true) {}

    String(String&& b) : data(std::move(b.data)), m_isValid(true) {
        b.m_isValid = false;
    }

    String& operator=(String&& b) {
        data = std::move(b.data);
        m_isValid = true;
        b.m_isValid = false;
        return *this;
    }

    bool isValid() const {
        return m_isValid;
    }
};

C++11 lock free stack

I'm reading C++ Concurrency in Action by Anthony Williams, and don't understand its push implementation of the lock_free_stack class.
Why on earth is the atomic load not inside the while loop? The reason he gave is:
You therefore don’t have to reload head each time through the loop,
because the compiler does that for you.
But I don't get the picture. Can someone shed some light on this?
template<typename T>
class lock_free_stack
{
private:
    struct node
    {
        T data;
        node* next;
        node(T const& data_) :
            data(data_)
        {}
    };
    std::atomic<node*> head;
public:
    void push(T const& data)
    {
        node* const new_node = new node(data);
        new_node->next = head.load();
        while (!head.compare_exchange_weak(new_node->next, new_node));
    }
};
The key is in the interface of compare_exchange_weak, which in this case takes two arguments. The first is a reference to the expected value, and the second is the desired value. If the current value of the atomic is not equal to the expected one, it returns false and the expected argument is set to the current value.
So in this case, the code first sets new_node->next = head. Then it says: if head is still equal to new_node->next, swap new_node into head. If head is no longer that value, the reference to new_node->next is used to assign it the current value of head. Since every failed iteration of the loop replaces new_node->next with the current value of head, there is no need for a separate reload in the body of the loop.
From the documentation of compare_exchange_weak:
Atomically compares the value stored in *this with the value of
expected, and if those are equal, replaces the former with desired
(performs read-modify-write operation). Otherwise, loads the actual
value stored in *this into expected (performs load operation).
As you see, otherwise the actual value of head is loaded into expected.
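In other words, the one-line loop is equivalent to this more explicit version with the reload written out (an illustrative expansion of the member function above, not code from the book):

void push(T const& data)
{
    node* const new_node = new node(data);
    node* expected = head.load();
    for (;;)
    {
        new_node->next = expected;
        // On failure, compare_exchange_weak writes the current value of
        // head back into expected, which acts as the reload.
        if (head.compare_exchange_weak(expected, new_node))
            break;
    }
}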
