Multiprocessor programming - BadCLHLock

This code is an alternative implementation of CLHLock in which
a thread reuses its own node instead of its predecessor's node. How can
this implementation go wrong?
public class BadCLHLock implements Lock {
    // most recent lock holder
    AtomicReference<Qnode> tail;
    // thread-local variable
    ThreadLocal<Qnode> myNode;

    public void lock() {
        Qnode qnode = myNode.get();
        qnode.locked = true; // I'm not done
        // Make me the new tail, and find my predecessor
        Qnode pred = tail.getAndSet(qnode);
        // spin while predecessor holds lock
        while (pred.locked) {}
    }

    public void unlock() {
        // reuse my node next time
        myNode.get().locked = false;
    }

    static class Qnode { // Queue node inner class
        public boolean locked = false;
    }
}

If thread A runs lock(), unlock() and then lock() again, then when
Qnode pred = tail.getAndSet(qnode);
is executed, qnode and pred will reference the same node, so
pred.locked will always be true.
So, if myNode = A from the start, then after the first lock()
we also have tail = A. When we run lock() again we first set
A.locked = true, then set pred = tail (pred = A), and then wait for A.locked to turn false, which will never happen.
- This is exercise 85 in "The Art of Multiprocessor Programming" by Herlihy & Shavit.
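The single-thread failure described above can be reproduced without hanging the JVM by bounding the spin. This is a minimal sketch, not the book's code. The class name BoundedBadCLHLock, the maxSpins parameter, the unlocked sentinel used as the initial tail, and the ThreadLocal.withInitial initialiser are all assumptions added so that the example runs on its own. The locked field is also declared volatile here.

```java
import java.util.concurrent.atomic.AtomicReference;

class BoundedBadCLHLock {
    static class Qnode {
        volatile boolean locked = false;
    }

    // Assumption: the tail starts at an unlocked sentinel node.
    final AtomicReference<Qnode> tail = new AtomicReference<>(new Qnode());
    // Assumption: each thread lazily gets one node and keeps reusing it.
    final ThreadLocal<Qnode> myNode = ThreadLocal.withInitial(Qnode::new);

    // Same steps as BadCLHLock.lock(), but gives up after maxSpins checks
    // instead of spinning forever. Returns true if the lock was acquired.
    boolean lock(long maxSpins) {
        Qnode qnode = myNode.get();
        qnode.locked = true;
        Qnode pred = tail.getAndSet(qnode);
        for (long i = 0; i < maxSpins; i++) {
            if (!pred.locked) {
                return true;
            }
        }
        return false;
    }

    void unlock() {
        myNode.get().locked = false;
    }

    // Runs lock(), unlock(), lock() from a single thread.
    static boolean[] lockUnlockLock() {
        BoundedBadCLHLock lock = new BoundedBadCLHLock();
        boolean first = lock.lock(1_000);
        lock.unlock();
        boolean second = lock.lock(1_000);
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        boolean[] r = lockUnlockLock();
        System.out.println("first lock acquired:  " + r[0]);
        System.out.println("second lock acquired: " + r[1]);
    }
}
```

The first call succeeds because its predecessor is the unlocked sentinel. The second call gives up because pred and qnode are the same object, whose locked flag the caller has just set to true.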

Let T1 and T2 be two concurrent threads.
Let QN1 and QN2 be the QNodes of T1 and T2 respectively.
Deadlock scenario:
Neither T1 nor T2 has acquired the lock yet:
Tail -> Null
T1 acquires the lock:
Tail -> QN1(true)
T2 tries to acquire the lock, but because T1 holds it, T2 must wait:
Tail -> QN2(true) -> QN1(true)
T1 releases the lock:
Tail -> QN2(true) -> QN1(false)
But before T2 checks that QN1 has released the lock, T1 tries to reacquire it:
Tail -> QN1(true) -> QN2(true) -> QN1(true)
QN2 still points to QN1 as its predecessor, and T1, which is reusing QN1, knows nothing about that.
T2 sees QN1 locked again and thinks that T1 still holds the lock, while T1 now has to wait for QN2 to be released.
This is a deadlock: neither thread will ever be able to acquire the lock.
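The interleaving above can be replayed deterministically on a single thread by performing each thread's steps by hand. This is only an illustrative sketch. The class and method names are made up, and the tail starts at an unlocked sentinel node rather than Null (an assumption, so that the first getAndSet returns a node to inspect).

```java
import java.util.concurrent.atomic.AtomicReference;

class BadCLHInterleaving {
    static class Qnode {
        final String name;
        boolean locked = false;

        Qnode(String name) {
            this.name = name;
        }
    }

    // Replays the schedule and reports which node each thread ends up spinning on.
    static String replay() {
        Qnode qn1 = new Qnode("QN1");
        Qnode qn2 = new Qnode("QN2");
        // Assumption: an unlocked sentinel stands in for the initial "Null" tail.
        AtomicReference<Qnode> tail = new AtomicReference<>(new Qnode("sentinel"));

        // T1.lock(): the predecessor is the sentinel, so T1 enters.
        qn1.locked = true;
        Qnode t1Pred = tail.getAndSet(qn1);

        // T2.lock(): the predecessor is QN1, which is locked, so T2 starts spinning.
        qn2.locked = true;
        Qnode t2Pred = tail.getAndSet(qn2);

        // T1.unlock()
        qn1.locked = false;

        // T1.lock() again, before T2 has re-read QN1.locked.
        qn1.locked = true;
        t1Pred = tail.getAndSet(qn1);

        return "T1 waits on " + t1Pred.name + "(" + t1Pred.locked + "), "
             + "T2 waits on " + t2Pred.name + "(" + t2Pred.locked + ")";
    }

    public static void main(String[] args) {
        // prints "T1 waits on QN2(true), T2 waits on QN1(true)"
        System.out.println(replay());
    }
}
```

Each thread is spinning on a node that only the other thread can unlock, which is the cycle described in the scenario.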


First excerpt of every next roll cycle file is being read as part of previous cycle

This is related to the previous question I have posted. I think that while it is related, it might be different enough to warrant its own question.
The code used is:
public static void main(String[] args){
    ChronicleQueue QUEUE = SingleChronicleQueueBuilder.single("./chronicle/roll").build();
    ExcerptTailer TAILER = QUEUE.createTailer();
    ArrayList<Long> seqNums = new ArrayList<>();
    //this reads all roll cycles starting from first and carries on to next rollcycle.
    //busy spinner that spins non-stop trying to read from queue
    int currentCycle = TAILER.cycle();
    while (true) {
        //if it moves over to new cycle, start over the sequencing (fresh start for next day)
        int cycleCheck = TAILER.cycle();
        long indexCheck = TAILER.index();
        System.out.println("idx: "+indexCheck);
        if (currentCycle != cycleCheck){
            LOGGER.warn("Changing to new roll cycle, from: "+currentCycle+" to: "+cycleCheck+". Clearing list of size "+seqNums.size());
            seqNums.clear(); // this may cause a memory issue see:
            currentCycle = cycleCheck;
            TAILER.moveToCycle(currentCycle);
            cycleCheck = TAILER.cycle();
            indexCheck = TAILER.index();
            System.out.println("cycle: "+cycleCheck);
            System.out.println("idx: "+indexCheck);
        }
        //TODO:2nd option, on starting the chronicle runner, always move to end, and wait for next day's cycle to start
        if (TAILER.readDocument(w -> w.read("packet").marshallable(
                m -> {
                    long seqNum = m.read("seqNum").readLong();
                    int size = seqNums.size();
                    if (size > 0){
                        int idx;
                        if ((idx = seqNums.indexOf(seqNum)) >= 0){
                            LOGGER.warn("Duplicate seqNum: "+seqNum+" at idx: "+idx);
                        }
                        long previous = seqNums.get(size-1);
                        long gap = seqNum - previous;
                        if (Math.abs(gap) > 1L){
                            LOGGER.error("sequence gap at seqNum: "+previous+" and "+seqNum+"! Gap of "+gap);
                        }
                    }
                    seqNums.add(seqNum);
                }
        ))){ ; }else { TAILER.close(); break; }
        //breaks out from spinner if nothing to be read.
        //a named tailer could be used to pick up from where is left off.
    }
}
At this point, I have 2 roll cycle files: one ends with a sequence number of 1001, and the next file starts with a seqNum of 0. The while loop reads both files, and the if statement checks whether the cycle has changed and resets accordingly.
The output is as follows:
The output when .moveToCycle() is commented:
As you can see, the first index of the next file is read as part of the previous file, but when I use TAILER.moveToCycle(currentCycle) it moves to the start of the next file again, and it has a different index this time. If you comment this line of code out, it will not re-read the entry with seqNum of 0.
Alright, I tested the following and it works just fine. It reads the value first (I am assuming that internally the index and cycle only shift after an incoming value is read), then tests for a cycle change. In other words, the test moved from before the read to after the read. This is probably how one should iterate over multiple roll cycle files while keeping track of when the queue rolls over.
Also, note that previously it printed the cycle and index before printing the object, whereas now it prints the object before the cycle and index, so it's likely that you will misread the output and assume it doesn't work if you test the following code.
public static void main(String[] args){
    ChronicleQueue QUEUE = SingleChronicleQueueBuilder.single("./chronicle/roll").build();
    ExcerptTailer TAILER = QUEUE.createTailer();
    ArrayList<Long> seqNums = new ArrayList<>();
    //this reads all roll cycles starting from first and carries on to next roll cycle.
    //busy spinner that spins non-stop trying to read from queue
    int currentCycle = TAILER.cycle();
    AtomicLong seqNum = new AtomicLong();
    while (true) {
        if (TAILER.readDocument(w -> w.read("packet").marshallable(
                m -> {
                    long val = m.read("seqNum").readLong();
                    seqNum.set(val);
                }
        ))) {
            //if it moves over to new cycle, start over the sequencing (fresh start for next day)
            int cycleCheck = TAILER.cycle();
            long indexCheck = TAILER.index();
            System.out.println("cycle: "+cycleCheck);
            System.out.println("idx: "+indexCheck);
            if (currentCycle != cycleCheck){
                LOGGER.warn("Changing to new roll cycle, from: "+currentCycle+" to: "+cycleCheck+". Clearing list of size "+seqNums.size());
                seqNums.clear(); // this may cause a memory issue see:
                currentCycle = cycleCheck;
            }
            int size = seqNums.size();
            long val = seqNum.get();
            if (size > 0){
                int idx;
                // look up the unboxed value, not the AtomicLong holder
                if ((idx = seqNums.indexOf(val)) >= 0){
                    LOGGER.warn("Duplicate seqNum: "+val+" at idx: "+idx);
                }
                long previous = seqNums.get(size-1);
                long gap = val - previous;
                if (Math.abs(gap) > 1L){
                    LOGGER.error("sequence gap at seqNum: "+previous+" and "+val+"! Gap of "+gap);
                }
            }
            seqNums.add(val);
        } else { TAILER.close(); break; }
        //breaks out from spinner if nothing to be read.
        //a named tailer could be used to pick up from where is left off.
    }
}

Why doesn't a producer-consumer queue with a single producer and a single consumer need a mutex?

The code from Wikipedia for a producer-consumer queue with a single producer and a single consumer is:
semaphore fillCount = 0; // items produced
semaphore emptyCount = BUFFER_SIZE; // remaining space
procedure producer()
    while (true)
        item = produceItem();
        down(emptyCount);
        putItemIntoBuffer(item);
        up(fillCount);
procedure consumer()
    while (true)
        down(fillCount);
        item = removeItemFromBuffer();
        up(emptyCount);
        consumeItem(item);
It is stated there that
The solution above works fine when there is only one producer and consumer.
When there are more producers/consumers, the pseudocode is the same, with a mutex guarding the putItemIntoBuffer(item); and removeItemFromBuffer(); sections:
mutex buffer_mutex; // similar to "semaphore buffer_mutex = 1", but different (see notes below)
semaphore fillCount = 0;
semaphore emptyCount = BUFFER_SIZE;
procedure producer()
    while (true)
        item = produceItem();
        down(emptyCount);
        down(buffer_mutex);
        putItemIntoBuffer(item);
        up(buffer_mutex);
        up(fillCount);
procedure consumer()
    while (true)
        down(fillCount);
        down(buffer_mutex);
        item = removeItemFromBuffer();
        up(buffer_mutex);
        up(emptyCount);
        consumeItem(item);
My question is, why isn't the mutex required in the single producer single consumer case?
Consider the following:
1. There are 5 items in a queue that allows 10 items.
2. The producer produces an item, decrements the empty semaphore (and succeeds), then starts putting the item into the buffer, and has not finished.
3. The consumer decrements the fill semaphore, then starts to remove an item from the buffer.
4. Unexpected: an item is being removed from the buffer (3) while an item is being put into the buffer (2).
Why does what I described not happen?
Because such a queue is usually implemented as a circular queue. The producer writes to the tail of the queue, while the consumer reads from the head. They never access the same memory at the same time.
The idea here is that both consumer and producer can track the position of the tail/head independently.
Consider the following pseudo-code:
int producerPtr = 0, consumerPtr = 0;
void putItemIntoBuffer(Item item)
{
    data[producerPtr] = item;
    producerPtr = (producerPtr + 1) % BUFFER_SIZE;
}
Item removeItemFromBuffer(void)
{
    Item item = data[consumerPtr];
    consumerPtr = (consumerPtr + 1) % BUFFER_SIZE;
    return item;
}
consumerPtr and producerPtr can be equal only when the queue is either full or empty. When it is full, the producer stays blocked on emptyCount, and when it is empty, the consumer stays blocked on fillCount, so the two functions never touch the same slot at the same time.
You can say that the semaphores are used as a message-passing mechanism: each side signals the other that it may advance its pointer, and this keeps them synchronized.
Now if you have multiple processes on one side, that side needs to perform the increment and the data copy atomically, therefore a mutex is needed, but only for the side that has multiple processes. For example, for a multiple-producer, multiple-consumer queue you can use 2 separate mutexes to decrease contention.
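A runnable sketch of the single-producer, single-consumer case, with two semaphores and no mutex, might look as follows in Java. The class name SpscRingBuffer, the int payload, and the transfer helper are illustrative assumptions, not part of the Wikipedia pseudocode. Each index is touched by exactly one thread, and java.util.concurrent.Semaphore provides the ordering between a release and the matching acquire.

```java
import java.util.concurrent.Semaphore;

class SpscRingBuffer {
    private final int[] data;
    private final Semaphore fillCount = new Semaphore(0); // items produced
    private final Semaphore emptyCount;                   // remaining space
    private int producerPtr = 0; // touched only by the producer thread
    private int consumerPtr = 0; // touched only by the consumer thread

    SpscRingBuffer(int capacity) {
        data = new int[capacity];
        emptyCount = new Semaphore(capacity);
    }

    void put(int item) throws InterruptedException {
        emptyCount.acquire();               // down(emptyCount)
        data[producerPtr] = item;
        producerPtr = (producerPtr + 1) % data.length;
        fillCount.release();                // up(fillCount)
    }

    int take() throws InterruptedException {
        fillCount.acquire();                // down(fillCount)
        int item = data[consumerPtr];
        consumerPtr = (consumerPtr + 1) % data.length;
        emptyCount.release();               // up(emptyCount)
        return item;
    }

    // Sends 0..n-1 through the buffer with one producer and one consumer.
    // Returns the sum seen by the consumer, or -1 if items arrived out of order.
    static long transfer(int n, int capacity) {
        SpscRingBuffer q = new SpscRingBuffer(capacity);
        long[] sum = { 0 };
        boolean[] inOrder = { true };
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) q.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    int v = q.take();
                    if (v != i) inOrder[0] = false;
                    sum[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return inOrder[0] ? sum[0] : -1;
    }
}
```

With a second producer, two threads could read the same producerPtr and overwrite each other's slot, which is where the mutex becomes necessary.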

Confusion in understanding thread::join() in "The C++ Programming Language"

While reading "The C++ Programming Language, 4th Edition" by Bjarne Stroustrup, I came across section 42.2.4, join(), on the new thread class in the standard library. It has an example that confuses me.
void run(int i, int n) // warning: really poor code
{
    thread t1 {f};
    thread t2;
    vector<Foo> v;
    // ...
    if (i<n) {
        thread t3 {g};
        // ...
        t2 = move(t3); // move t3 to outer scope
    }
    v[i] = Foo{}; // might throw
    // ...
    t1.join();
    t2.join();
}
It has these comments after the code snippet:
Here, I have made several bad mistakes. In particular:
We may never reach the two join()s at the end. In that case, the destructor for t1 will terminate the program.
We may reach the two join()s at the end without the move t2=move(t3) having executed. In that case, t2.join() will terminate the program.
Why might we never reach the two join()s, and why would the destructor for t1 be called before t1.join()?

Hand-over-hand locking with Rust

I'm trying to write an implementation of union-find in Rust. This is famously very simple to implement in languages like C, while still having a complex run time analysis.
I'm having trouble getting Rust's mutex semantics to allow iterative hand-over-hand locking.
Here's how I got where I am now.
First, this is a very simple implementation of part of the structure I want in C:
#include <stdlib.h>

struct node {
    struct node * parent;
};

struct node * create(struct node * parent) {
    struct node * ans = malloc(sizeof(struct node));
    ans->parent = parent;
    return ans;
}

struct node * find_root(struct node * x) {
    while (x->parent) {
        x = x->parent;
    }
    return x;
}

int main() {
    struct node * foo = create(NULL);
    struct node * bar = create(foo);
    struct node * baz = create(bar);
    baz->parent = find_root(bar);
}
Note that the structure of the pointers is that of an inverted tree; multiple pointers may point at a single location, and there are no cycles.
At this point, there is no path compression.
Here is a Rust translation. I chose to use Rust's reference-counted pointer type to support the inverted tree type I referenced above.
Note that this implementation is much more verbose, possibly due to the increased safety that Rust offers, but possibly due to my inexperience with Rust.
use std::rc::Rc;

struct Node {
    parent: Option<Rc<Node>>
}

fn create(parent: Option<Rc<Node>>) -> Node {
    Node {parent: parent.clone()}
}

fn find_root(x: Rc<Node>) -> Rc<Node> {
    let mut ans = x.clone();
    while ans.parent.is_some() {
        ans = ans.parent.clone().unwrap();
    }
    ans
}

fn main() {
    let foo = Rc::new(create(None));
    let bar = Rc::new(create(Some(foo.clone())));
    let mut prebaz = create(Some(bar.clone()));
    prebaz.parent = Some(find_root(bar.clone()));
}
Path compression re-parents each node along a path to the root every time find_root is called. To add this feature to the C code, only two new small functions are needed:
void change_root(struct node * x, struct node * root) {
    while (x) {
        struct node * tmp = x->parent;
        x->parent = root;
        x = tmp;
    }
}

struct node * root(struct node * x) {
    struct node * ans = find_root(x);
    change_root(x, ans);
    return ans;
}
The function change_root does all the re-parenting, while the function root is just a wrapper to use the results of find_root to re-parent the nodes on the path to the root.
In order to do this in Rust, I decided I would have to use a Mutex rather than just a reference counted pointer, since the Rc interface only allows mutable access by copy-on-write when more than one pointer to the item is live. As a result, all of the code would have to change. Before even getting to the path compression part, I got hung up on find_root:
use std::sync::{Mutex,Arc};

struct Node {
    parent: Option<Arc<Mutex<Node>>>
}

fn create(parent: Option<Arc<Mutex<Node>>>) -> Node {
    Node {parent: parent.clone()}
}

fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    let mut ans = x.clone();
    let mut inner = ans.lock();
    while inner.parent.is_some() {
        ans = inner.parent.clone().unwrap();
        inner = ans.lock();
    }
    ans
}
This produces the error (with 0.12.0)
error: cannot assign to `ans` because it is borrowed
ans = inner.parent.clone().unwrap();
note: borrow of `ans` occurs here
let mut inner = ans.lock();
What I think I need here is hand-over-hand locking. For the path A -> B -> C -> ..., I need to lock A, lock B, unlock A, lock C, unlock B, ... Of course, I could keep all of the locks open: lock A, lock B, lock C, ... unlock C, unlock B, unlock A, but this seems inefficient.
However, Mutex does not offer unlock, and uses RAII instead. How can I achieve hand-over-hand locking in Rust without being able to directly call unlock?
EDIT: As the comments noted, I could use Rc<RefCell<Node>> rather than Arc<Mutex<Node>>. Doing so leads to the same compiler error.
For clarity about what I'm trying to avoid by using hand-over-hand locking, here is a RefCell version that compiles but uses space linear in the length of the path.
fn find_root(x: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
    let mut inner : RefMut<Node> = x.borrow_mut();
    if inner.parent.is_some() {
        find_root(inner.parent.clone().unwrap())
    } else {
        x.clone()
    }
}
We can pretty easily do full hand-over-hand locking as we traverse this list using just a bit of unsafe, which is necessary to tell the borrow checker a small bit of insight that we are aware of, but that it can't know.
But first, let's clearly formulate the problem:
We want to traverse a linked list whose nodes are stored as Arc<Mutex<Node>> to get the last node in the list
We need to lock each node in the list as we go along the way such that another concurrent traversal has to follow strictly behind us and cannot muck with our progress.
Before we get into the nitty-gritty details, let's try to write the signature for this function:
fn find_root(node: Arc<Mutex<Node>>) -> Arc<Mutex<Node>>;
Now that we know our goal, we can start to get into the implementation - here's a first attempt:
fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    // We have to separate this from incoming since the lock must
    // be borrowed from incoming, not this local node.
    let mut node = incoming.clone();
    let mut lock = incoming.lock();
    // Could use while let but that leads to borrowing issues.
    while lock.parent.is_some() {
        node = lock.parent.as_ref().unwrap().clone(); // !! uh-oh !!
        lock = node.lock();
    }
    node
}
If we try to compile this, rustc will error on the line marked !! uh-oh !!, telling us that we can't move out of node while lock still exists, since lock is borrowing node. This is not a spurious error! The data in lock might go away as soon as node does - it's only because we know that we can keep the data lock is pointing to valid and in the same memory location even if we move node that we can fix this.
The key insight here is that the lifetime of data contained within an Arc is dynamic, and it is hard for the borrow checker to make the inferences we can about exactly how long data inside an Arc is valid.
This happens every once in a while when writing Rust; you have more knowledge about the lifetime and organization of your data than rustc, and you want to be able to express that knowledge to the compiler, effectively saying "trust me". Enter: unsafe - our way of telling the compiler that we know more than it, and it should allow us to inform it of the guarantees that we know but it doesn't.
In this case, the guarantee is pretty simple - we are going to replace node while lock still exists, but we are going to ensure that the data inside lock continues to be valid even though node goes away. To express this guarantee we can use mem::transmute, a function which allows us to reinterpret the type of any variable, by just using it to change the lifetime of the lock returned by node to be slightly longer than it actually is.
To make sure we keep our promise, we are going to use another handoff variable to hold node while we reassign lock - even though this moves node (changing its address) and the borrow checker will be angry at us, we know it's ok since lock doesn't point at node, it points at data inside of node, whose address (in this case, since it's behind an Arc) will not change.
Before we get to the solution, it's important to note that the trick we are using here is only valid because we are using an Arc. The borrow checker is warning us of a possibly serious error - if the Mutex was held inline and not in an Arc, this error would be a correct prevention of a use-after-free, where the MutexGuard held in lock would attempt to unlock a Mutex which has already been dropped, or at least moved to another memory location.
use std::mem;
use std::sync::{Arc, Mutex};
fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    let mut node = incoming.clone();
    let mut handoff_node;
    let mut lock = incoming.lock().unwrap();
    // Could use while let but that leads to borrowing issues.
    while lock.parent.is_some() {
        // Keep the data in node around by holding on to this `Arc`.
        handoff_node = node;
        node = lock.parent.as_ref().unwrap().clone();
        // We are going to move out of node while this lock is still around,
        // but since we kept the data around it's ok.
        lock = unsafe { mem::transmute(node.lock().unwrap()) };
    }
    node
}
And, just like that, rustc is happy, and we have hand-over-hand locking, since the last lock is released only after we have acquired the new lock!
There is one unanswered question in this implementation which I have not yet received an answer to, which is whether the drop of the old value and the assignment of a new value to a variable are guaranteed to be atomic - if not, there is a race condition where the old lock is released before the new lock is acquired in the assignment of lock. It's pretty trivial to work around this by just having another holdover_lock variable and moving the old lock into it before reassigning, then dropping it after reassigning lock.
Hopefully this fully addresses your question and shows how unsafe can be used to work around "deficiencies" in the borrow checker when you really do know more. I would still like to warn that the cases where you know more than the borrow checker are rare, and transmuting lifetimes is not "usual" behavior.
Using Mutex in this way, as you can see, is pretty complex and you have to deal with many, many, possible sources of a race condition and I may not even have caught all of them! Unless you really need this structure to be accessible from many threads, it would probably be best to just use Rc and RefCell, if you need it, as this makes things much easier.
I believe this to fit the criteria of hand-over-hand locking.
use std::sync::Mutex;

fn main() {
    // Create a set of mutexes to lock hand-over-hand
    let mutexes = Vec::from_fn(4, |_| Mutex::new(false));
    // Lock the first one
    let val_0 = mutexes[0].lock();
    if !*val_0 {
        // Lock the second one
        let mut val_1 = mutexes[1].lock();
        // Unlock the first one
        drop(val_0);
        // Do logic
        *val_1 = true;
    }
    for mutex in mutexes.iter() {
        println!("{}" , *mutex.lock());
    }
}
Edit #1
Does it work when access to lock n+1 is guarded by lock n?
If you mean something that could be shaped like the following, then I think the answer is no.
struct Level {
    data: bool,
    child: Option<Mutex<Box<Level>>>,
}
However, it is sensible that this should not work. When you wrap an object in a mutex, then you are saying "The entire object is safe". You can't say both "the entire pie is safe" and "I'm eating the stuff below the crust" at the same time. Perhaps you jettison the safety by creating a Mutex<()> and lock that?
This is still not the answer to your literal question of how to do hand-over-hand locking, which should only be important in a concurrent setting (or if someone else forced you to use Mutex references to nodes). It is instead how to do this with Rc and RefCell, which you seem to be interested in.
RefCell only allows mutable writes when one mutable reference is held. Importantly, the Rc<RefCell<Node>> objects are not mutable references. The mutable references it is talking about are the results from calling borrow_mut() on the Rc<RefCell<Node>>object, and as long as you do that in a limited scope (e.g. the body of the while loop), you'll be fine.
The important thing happening in path compression is that the next Rc object will keep the rest of the chain alive while you swing the parent pointer for node to point at root. However, it is not a reference in the Rust sense of the word.
struct Node {
    parent: Option<Rc<RefCell<Node>>>
}

fn find_root(mut node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
    while let Some(parent) = node.borrow().parent.clone() {
        node = parent;
    }
    return node;
}

fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>) {
    while node.borrow().parent.is_some() {
        let next = node.borrow().parent.clone().unwrap();
        node.borrow_mut().parent = Some(root.clone());
        node = next;
    }
}
This runs fine for me with the test harness I used, though there may still be bugs. It certainly compiles and runs without a panic! due to trying to borrow_mut() something that is already borrowed. It may actually produce the right answer, that's up to you.
On IRC, Jonathan Reem pointed out that inner is borrowing until the end of its lexical scope, which is too far for what I was asking. Inlining it produces the following, which compiles without error:
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    let mut ans = x.clone();
    while ans.lock().parent.is_some() {
        ans = ans.lock().parent.clone().unwrap();
    }
    ans
}
EDIT: As Francis Gagné points out, this has a race condition, since the lock doesn't extend long enough. Here's a modified version that only has one lock() call; perhaps it is not vulnerable to the same problem.
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
    let mut ans = x.clone();
    loop {
        ans = {
            let tmp = ans.lock();
            match tmp.parent.clone() {
                None => break,
                Some(z) => z
            }
        }
    }
    ans
}
EDIT 2: This only holds one lock at a time, and so is racy. I still don't know how to do hand-over-hand locking.
As pointed out by Frank Sherry and others, you shouldn't use Arc/Mutex when single threaded. But his code was outdated, so here is the new one (for version 1.0.0alpha2).
This does not take linear space either (like the recursive code given in the question).
struct Node {
    parent: Option<Rc<RefCell<Node>>>
}

fn find_root(node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
    let mut ans = node.clone(); // Rc<RefCell<Node>>
    loop {
        ans = {
            let ans_ref = ans.borrow(); // std::cell::Ref<Node>
            match ans_ref.parent.clone() {
                None => break,
                Some(z) => z
            }
        } // ans_ref goes out of scope, and ans becomes mutable
    }
    ans
}

fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>) {
    while node.borrow().parent.is_some() {
        let next = {
            let node_ref = node.borrow();
            node_ref.parent.clone().unwrap()
        };
        node.borrow_mut().parent = Some(root.clone());
        // RefMut<Node> from borrow_mut() is out of scope here...
        node = next; // therefore we can mutate node
    }
}
Note for beginners: Pointers are automatically dereferenced by dot operator. ans.borrow() actually means (*ans).borrow(). I intentionally used different styles for the two functions.
Although not the answer to your literal question (hand-over locking), union-find with weighted-union and path-compression can be very simple in Rust:
fn unionfind<I: Iterator<(uint, uint)>>(mut iterator: I, nodes: uint) -> Vec<uint> {
    let mut root = Vec::from_fn(nodes, |x| x);
    let mut rank = Vec::from_elem(nodes, 0u8);
    for (mut x, mut y) in iterator {
        // find roots for x and y; do path compression on look-ups
        while (x != root[x]) { root[x] = root[root[x]]; x = root[x]; }
        while (y != root[y]) { root[y] = root[root[y]]; y = root[y]; }
        if x != y {
            // weighted union swings roots
            match rank[x].cmp(&rank[y]) {
                Less => root[x] = y,
                Greater => root[y] = x,
                Equal => {
                    root[y] = x;
                    rank[x] += 1
                },
            }
        }
    }
    root
}
Maybe the meta-point is that the union-find algorithm may not be the best place to handle node ownership. Using references to existing memory (in this case, just uint identifiers for the nodes) without affecting the lifecycle of the nodes makes for a much simpler implementation, if you can get away with it of course.
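The index-based idea carries over to any language, since the nodes are just integers and no ownership question arises. As a rough sketch (in Java, purely for illustration; the class name IndexUnionFind and its methods are made up here and not taken from the answer above), the same root and rank arrays look like this:

```java
class IndexUnionFind {
    private final int[] root;  // root[i] is the parent of node i; a root points at itself
    private final byte[] rank; // upper bound on the height of the tree rooted at i

    IndexUnionFind(int nodes) {
        root = new int[nodes];
        rank = new byte[nodes];
        for (int i = 0; i < nodes; i++) {
            root[i] = i;
        }
    }

    // Walks to the root, re-pointing each visited node at its grandparent on the way.
    int find(int x) {
        while (x != root[x]) {
            root[x] = root[root[x]];
            x = root[x];
        }
        return x;
    }

    // Weighted union: the root with the smaller rank is attached under the other.
    void union(int a, int b) {
        int x = find(a);
        int y = find(b);
        if (x == y) {
            return;
        }
        if (rank[x] < rank[y]) {
            root[x] = y;
        } else if (rank[x] > rank[y]) {
            root[y] = x;
        } else {
            root[y] = x;
            rank[x]++;
        }
    }

    boolean connected(int a, int b) {
        return find(a) == find(b);
    }
}
```

A short usage example: after `union(0, 1)` and `union(1, 2)` on a six-node structure, `connected(0, 2)` is true while `connected(2, 3)` is false.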

Dekker's algorithm for 3 processes

As my assignment I have to verify something on Dekker's algorithm, but with 3 processes.
I can only find the original version for 2 processes.
The goal is not the algorithm itself but its implementation and verification in the SMV system.
You should probably ask the course staff, but you can use two Dekker mutexes to achieve three-process mutex. Processes 0 and 1 compete to acquire mutex A; the holder of mutex A and process 2 compete to acquire mutex B, whose holder is allowed to run a critical section.
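The composition described above can be sketched in Java as follows. This is an illustration under assumptions, not an SMV model: the class names, the side numbering, the use of AtomicIntegerArray and a volatile turn field for the shared flags, and the Thread.yield() calls in the wait loops are all choices made here. Processes 0 and 1 take side 0 or 1 of mutex A, and whoever holds A then takes side 0 of mutex B against process 2 on side 1.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class ThreeWayDekker {
    // Classic two-party Dekker lock; side is 0 or 1.
    static class Dekker {
        private final AtomicIntegerArray wants = new AtomicIntegerArray(2);
        private volatile int turn = 0;

        void lock(int side) {
            int other = 1 - side;
            wants.set(side, 1);
            while (wants.get(other) == 1) {
                if (turn != side) {
                    wants.set(side, 0);          // back off while it is the other side's turn
                    while (turn != side) {
                        Thread.yield();
                    }
                    wants.set(side, 1);
                } else {
                    Thread.yield();
                }
            }
        }

        void unlock(int side) {
            turn = 1 - side;
            wants.set(side, 0);
        }
    }

    private final Dekker a = new Dekker(); // processes 0 and 1 compete here
    private final Dekker b = new Dekker(); // holder of A (side 0) vs process 2 (side 1)
    private int counter = 0;               // protected only by the composed lock

    void lock(int id) {
        if (id < 2) {
            a.lock(id);
            b.lock(0);
        } else {
            b.lock(1);
        }
    }

    void unlock(int id) {
        if (id < 2) {
            b.unlock(0);
            a.unlock(id);
        } else {
            b.unlock(1);
        }
    }

    // Three threads each increment the shared counter perThread times.
    static int run(int perThread) {
        ThreeWayDekker m = new ThreeWayDekker();
        Thread[] threads = new Thread[3];
        for (int i = 0; i < 3; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    m.lock(id);
                    m.counter++;
                    m.unlock(id);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return m.counter;
    }
}
```

If the composed lock provides mutual exclusion, `run(n)` returns exactly 3 * n; lost updates would show up as a smaller total.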
// If you want to know how this code is executed using a RAM diagram,
// Dekker's algorithm, version 3
boolean t1WantsToEnter = false;
boolean t2WantsToEnter = false;
startThreads; // initialize and launch all threads

// Thread T1
void main() {
    while (true) {
        t1WantsToEnter = true;
        while (t2WantsToEnter) {
            // wait
        }
        // critical section code
        t1WantsToEnter = false;
        // code outside critical section
    } // end outer while
} // end Thread T1

// Thread T2
void main() {
    while (true) {
        t2WantsToEnter = true;
        while (t1WantsToEnter) {
            // wait
        }
        // critical section code
        t2WantsToEnter = false;
        // code outside critical section
    } // end outer while
} // end Thread T2
