How can you implement a condition variable using semaphores?

A while back I was thinking about how to implement various synchronization primitives in terms of one another. For example, in pthreads you get mutexes and condition variables, and from these can build semaphores.
In the Windows API (or at least, older versions of the Windows API) there are mutexes and semaphores, but no condition variables. I think that it should be possible to build condition variables out of mutexes and semaphores, but for the life of me I just can't think of a way to do so.
Does anyone know of a good construction for doing this?

Here's a paper from Microsoft Research [pdf] which deals with exactly that.

One way of implementing X given semaphores is to add a server process to the system, use semaphores to communicate with it, and have the process do all the hard work of implementing X. As an academic exercise, this might be cheating, but it does get the job done, and it can be more robust to misbehaviour by the client processes, or to their sudden death.
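As a toy illustration of that idea (my sketch, not from the original answer: a server thread stands in for the server process, and a shared counter increment stands in for "X"; two semaphores form the request/reply channel):

#include <cstdio>
#include <pthread.h>
#include <semaphore.h>

static sem_t request, reply;       // both initialised to 0
static long shared_counter = 0;    // state owned by the server

static void *server(void *)
{
    for (;;) {
        sem_wait(&request);        // wait for a client request
        ++shared_counter;          // do the work on the client's behalf
        sem_post(&reply);          // tell the client it is done
    }
    return nullptr;
}

int main()
{
    sem_init(&request, 0, 0);
    sem_init(&reply, 0, 0);
    pthread_t t;
    pthread_create(&t, nullptr, server, nullptr);

    sem_post(&request);            // "please increment the counter"
    sem_wait(&reply);              // wait until the server has done it
    std::printf("%ld\n", shared_counter);
    return 0;
}

Because only the server ever touches the state, the clients need no locking of their own, which is also why this scheme tolerates misbehaving clients better.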

I may be missing something here, but there seems to be a simpler way to implement a condition variable from a semaphore and a lock than the one described in the paper:
#include <pthread.h>
#include <semaphore.h>

class Condition {
    sem_t m_sem;
    int m_waiters;
    int m_signals;
    pthread_mutex_t *m_mutex;
public:
    Condition(pthread_mutex_t *_mutex) {
        sem_init(&this->m_sem, 0, 0);
        this->m_waiters = 0;
        this->m_signals = 0;
        this->m_mutex = _mutex;
    }
    ~Condition() { sem_destroy(&this->m_sem); }
    void wait();
    void signal();
    void broadcast();
};

// Caller must hold m_mutex, as with pthread_cond_wait().
void Condition::wait() {
    this->m_waiters++;
    pthread_mutex_unlock(this->m_mutex);
    sem_wait(&this->m_sem);
    pthread_mutex_lock(this->m_mutex);
    this->m_waiters--;
    this->m_signals--;
}

void Condition::signal() {
    pthread_mutex_lock(this->m_mutex);
    // Only post if there is a waiter that has not already been signalled.
    if (this->m_waiters && (this->m_waiters > this->m_signals)) {
        sem_post(&this->m_sem);
        this->m_signals++;
    }
    pthread_mutex_unlock(this->m_mutex);
}
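The declared but undefined broadcast() could presumably be filled in along the same lines, posting once for every not-yet-signalled waiter (my sketch, following the same accounting as signal() above):

void Condition::broadcast() {
    pthread_mutex_lock(this->m_mutex);
    // Wake every waiter that has not already been signalled.
    while (this->m_waiters > this->m_signals) {
        sem_post(&this->m_sem);
        this->m_signals++;
    }
    pthread_mutex_unlock(this->m_mutex);
}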

Related

Linux Device Drivers 3Ed File IO & How to Influence Scheduling with Explanatory UML Diagrams

I've used UMLet to draw some UML diagrams describing various entity relationships for each of the chapters of Linux Device Drivers 3Ed (LDD3), by Corbet, Rubini, Kroah-Hartman. The latest version of the diagrams can be found here:
Linux Device Drivers 3Ed UML Diagrams
I would like to ask for help understanding a scheduling problem which is supported by documentation in the Non-Blocking File IO Sequence Diagram(s) at the above link, and in LDD3 on P156-158, and in particular this code snippet from scull_getwritespace() (also see P156, but this code has been updated to use a mutex rather than a semaphore):
/* Wait for space for writing; caller must hold the device mutex. On
 * error the mutex will be released before returning. */
static int scull_getwritespace(struct scull_pipe *dev, struct file *filp)
{
    while (spacefree(dev) == 0) { /* full */
        DEFINE_WAIT(wait);

        mutex_unlock(&dev->mutex);
        if (filp->f_flags & O_NONBLOCK)
            return -EAGAIN;
        PDEBUG("\"%s\" writing: going to sleep\n", current->comm);
        prepare_to_wait(&dev->outq, &wait, TASK_INTERRUPTIBLE);
        if (spacefree(dev) == 0)
            schedule();
        finish_wait(&dev->outq, &wait);
        if (signal_pending(current))
            return -ERESTARTSYS; /* signal: tell the fs layer to handle it */
        if (mutex_lock_interruptible(&dev->mutex))
            return -ERESTARTSYS;
    }
    return 0;
}
and in particular:
    if (spacefree(dev) == 0)
        schedule();
The case of interest is this:
1. spacefree(dev) == 0 is true, and the write process is about to call schedule().
2. Before schedule() is called, the read process issues wake_up_interruptible(&dev->outq), having consumed all the buffer data, so the write process should be woken up to produce more data. This sets the write process state back to TASK_RUNNING.
3. The write process calls schedule() and potentially goes to sleep.
The above is done to avoid race conditions.
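For readers more familiar with userspace code, here is a rough analogue of the same race-avoidance discipline (my sketch, not kernel code): the prepare_to_wait / re-check / schedule sequence plays the role that an atomic condition-variable wait plays in userspace, where registering as a waiter, re-checking the condition, and sleeping cannot be separated by a lost wake-up. The names (space_available, etc.) are illustrative only:

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cond;
bool space_available = false; // analogue of spacefree(dev) != 0

void writer_wait_for_space() {
    std::unique_lock<std::mutex> lock(m);
    // Re-checks the condition and sleeps atomically, so a notify that
    // arrives between the check and the sleep cannot be lost.
    cond.wait(lock, [] { return space_available; });
}

void reader_made_space() {
    {
        std::lock_guard<std::mutex> lock(m);
        space_available = true;
    }
    cond.notify_one(); // analogous to wake_up_interruptible(&dev->outq)
}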
Here are my questions:
Is it possible to modify the code/operation of the kernel so that I can guarantee that schedule() doesn't go to sleep but returns immediately from the call? I don't understand schedule() in sufficient detail to answer this question and any help would be appreciated. I think the answer is no because the scheduler gets to choose what happens next and there may be software interrupts, tasklets or signals to process.
Is it possible to modify the code so that I can guarantee the write process runs before the read process is re-entered? Again, I think the answer might be no but perhaps there are some possibilities with thread prioritisation.
I find pictorial representations of the Linux kernel entities very helpful in understanding the patterns in the kernel, which greatly improves my coding productivity, but they are very tedious to generate by hand. In the interests of saving time, has anybody else done something similar with specific reference to LDD3?
Thanks.
Not sure if you have had a chance to look at the kernel map: http://www.makelinux.net/kernel_map/

C++11 guarantee no concurrent calls

Are std::mutex and std::unique_ptr sufficient to guarantee that there will be no concurrent calls to an object? In the following code snippet will Object not have any concurrent calls?
class Example {
public:
    std::mutex Mutex;
    Example() { ... };
    //
private:
    static std::unique_ptr<Object> Mutex;
};
No, you would have to lock and unlock the mutex when you need it. Just the existence of a mutex is no guarantee. Also, a unique_ptr cannot change this!
A mutex example is in the reference: http://en.cppreference.com/w/cpp/thread/mutex
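A minimal sketch of what "locking when you need it" looks like (my example; the counter is just a stand-in for the object's shared state):

#include <mutex>

class Example {
public:
    void use() {
        // Every member function that touches the shared state must take
        // the lock; only then are concurrent calls actually serialised.
        std::lock_guard<std::mutex> guard(mutex_);
        ++counter_; // stand-in for the real work on the object
    }
private:
    std::mutex mutex_;
    long counter_ = 0; // stand-in shared state
};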
Const, however, does now guarantee that nothing can change your object while you are using it through the const interface.
This obviously supposes a well-coded object (like the STL containers) and that no one has tried to work around the compiler checks.
See: http://herbsutter.com/2013/01/01/video-you-dont-know-const-and-mutable/
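A small sketch of that C++11 convention (my example): a const member function may be called concurrently, so any hidden mutation inside it has to be synchronised, which is what mutable is for here:

#include <mutex>
#include <string>

class Widget {
public:
    std::string name() const {
        // Safe under concurrent const calls; mutable lets the mutex be
        // locked inside a const member function.
        std::lock_guard<std::mutex> lock(m_);
        return name_;
    }
private:
    mutable std::mutex m_;
    std::string name_ = "widget";
};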
unique_ptr is mainly related to resource management and the RAII idiom. You don't need unique_ptr to achieve thread safety, even if in some scenarios it may prove useful. However, the example that you provided is a bit unusual: you used the identifier Mutex twice, to declare both a std::mutex and a std::unique_ptr.
I suggest getting some good books on C++ before delving into more complex topics like concurrency and resource management.
This nice article by Diego Dagum explains what is new regarding concurrency in C++11:
http://msdn.microsoft.com/en-us/magazine/hh852594.aspx
And an article about unique_ptr:
http://www.drdobbs.com/cpp/c11-uniqueptr/240002708

How to check which index in a loop is executing without slowing down the process?

What is the best way to check which index is executing in a loop without slowing the process down too much?
For example, I want to find all long fancy numbers and have a loop like

for (long i = 1; i > 0; i++) {
    // block
}
and I want to see which i is executing in real time.
Several ways I know to do this in the block are printing i every time, checking something like if (i % 10000 == 0) and only reporting then, or adding a listener.
Which of these ways is the fastest? Or what do you do in similar cases? Is there any way to access the value of i manually?
Most of my recent experience is with Java, so I'd write something like this:

import java.util.concurrent.atomic.AtomicLong;

public class Example {
    public static void main(String[] args) {
        AtomicLong atomicLong = new AtomicLong(1); // initialize to 1
        LoopMonitor lm = new LoopMonitor(atomicLong);
        Thread t = new Thread(lm);
        t.start(); // start LoopMonitor
        while (atomicLong.get() > 0) {
            long l = atomicLong.getAndIncrement(); // like l = atomicLong++ if atomicLong were a primitive
            // block
        }
    }

    private static class LoopMonitor implements Runnable {
        private final AtomicLong atomicLong;

        public LoopMonitor(AtomicLong atomicLong) {
            this.atomicLong = atomicLong;
        }

        public void run() {
            while (true) {
                try {
                    System.out.println(atomicLong.longValue()); // print current index
                    Thread.sleep(1000); // sleep for one second
                } catch (InterruptedException ex) {}
            }
        }
    }
}
Most AtomicLong implementations can be set in one clock cycle even on 32-bit platforms, which is why I used one here instead of a primitive long (you don't want to inadvertently print a half-set long); look into your compiler/platform details to see if you need something like this, but if you're on a 64-bit platform then you can probably use a primitive long regardless of which language you're using. The modified loop doesn't take much of an efficiency hit: you've replaced a primitive long with a reference to a long, so all you've added is a pointer dereference.
It won't be easy, but probably the only way to probe the value without affecting the process is to access the loop variable in shared memory with another thread. Threading libraries vary from one system to another, so I can't help much there (on Linux I'd probably use pthreads). The "monitor" thread might do something like probe the value once a minute, sleep()ing in between, and so allowing the first thread to run uninterrupted.
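A minimal sketch of that shared-variable scheme (my example, using C++ std::thread and std::atomic rather than raw pthreads; the names are illustrative):

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<long> loop_index{1}; // shared loop index, safe to read from another thread

int main() {
    std::thread monitor([] {
        for (;;) {
            std::printf("i = %ld\n", loop_index.load());
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    });
    monitor.detach(); // reporting only; let it die with the process

    for (long i = 1; i > 0; i++) {
        loop_index.store(i, std::memory_order_relaxed); // cheap publish
        // block
    }
    return 0;
}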
To get reporting at essentially no cost (on multi-CPU machines): make your index a "global" property (class-wide, for instance), and have a separate thread read and report the index value.
The report could be timer-based (5 times per second or so).
Note: you may also need a boolean stating "are we in the loop?".
Volatile and Caches
If you're going to be doing this in, say, C/C++ and use a separate monitor thread as previously suggested, then you'll have to make the global/static loop variable volatile. You don't want the compiler deciding to use a register for the loop variable. Some toolchains make that assumption anyway, but there's no harm in being explicit about it.
And then there's the small issue of caches. A separate monitor thread nowadays will end up on a separate core, and that'll mean that the two separate cache subsystems will have to agree on what the value is. That will unavoidably have a small impact on the runtime of the loop.
Real real-time constraint?
So that begs the question of just how real-time your loop actually is. I doubt that your timing constraint is such that you're depending on it running within a specific number of CPU clock cycles. Two reasons: a) no modern OS will ever come close to guaranteeing that (you'd have to be running on bare metal), and b) most CPUs these days vary their own clock rate behind your back, so you can't count on a specific number of clock cycles corresponding to a specific real-time interval.
Feature rich solution
So assuming that your real-time requirement is not that constrained, you may wish to build a more capable monitor thread. Have a shared structure protected by a semaphore which your loop occasionally updates, and have your monitor thread periodically inspect it and report progress. For best performance the monitor thread would take the semaphore, copy the structure, release the semaphore, and then inspect/print the structure, minimising the time the semaphore is held.
The only advantage of this approach over that suggested in previous answers is that you could report more than just the loop variable's value. There may be more information from your loop block that you'd like to report too.
Mutex semaphores in, say, C on Linux are pretty fast these days. Unless your loop block is very lightweight the runtime overhead of a single mutex is not likely to be significant, especially if you're updating the shared structure every 1000 loop iterations. A decent OS will put your threads on separate cores, but for the sake of good form you'd make the monitor thread's priority higher than the thread running the loop. This would ensure that the monitoring does actually happen if the two threads do end up on the same core.
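A rough C++ sketch of that richer scheme (my example; the Progress fields and the update interval are illustrative):

#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

struct Progress {
    long index = 0;
    long fancy_found = 0; // example of extra per-loop information
};

std::mutex progress_mutex;
Progress progress;

void monitor() {
    for (;;) {
        Progress snapshot;
        {
            // Hold the lock only long enough to copy the structure.
            std::lock_guard<std::mutex> lock(progress_mutex);
            snapshot = progress;
        }
        std::printf("i = %ld, found = %ld\n", snapshot.index, snapshot.fancy_found);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

void loop() {
    for (long i = 1; i > 0; i++) {
        // block
        if (i % 1000 == 0) { // update the shared structure only occasionally
            std::lock_guard<std::mutex> lock(progress_mutex);
            progress.index = i;
        }
    }
}

int main() {
    std::thread t(monitor);
    t.detach();
    loop();
}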

getloadavg() within kernel

Is there an equivalent API to getloadavg() that can be used within the kernel, i.e. from my own driver?
I have a driver that is thrashing and I would like to throttle it, and I am looking for a kernel API to find out about the CPU usage.
Thank you.
You're probably looking for the get_avenrun() function in kernel/sched.c. An example of how to use it is in fs/proc/loadavg.c:
static int loadavg_proc_show(struct seq_file *m, void *v)
{
    unsigned long avnrun[3];

    get_avenrun(avnrun, FIXED_1/200, 0);

    seq_printf(m, "%lu.%02lu %lu.%02lu %lu.%02lu %ld/%d %d\n",
        LOAD_INT(avnrun[0]), LOAD_FRAC(avnrun[0]),
        LOAD_INT(avnrun[1]), LOAD_FRAC(avnrun[1]),
        LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]),
        nr_running(), nr_threads,
        task_active_pid_ns(current)->last_pid);
    return 0;
}
Though I'm a little skeptical of how you can use the load average to tune a driver -- the load average is best treated as a heuristic for system administrators to gauge how their system changes over time, not necessarily how "healthy" it might be at any given moment -- what specifically in the driver is causing trouble? There's probably a better mechanism to make it play nicely with the rest of the system.

C/C++: duplicate main() loop in many threads

I have this "interesting" problem. I have legacy code that looks like:

int main()
{
    while (true) {
        doSomething();
    }
}
I would like to duplicate that doSomething() in many threads, so that main() would now look like:

int main() {
    runManyThreads(threadEntry);
}

void threadEntry() {
    while (true) {
        doSomething();
    }
}
The problem is that doSomething() accesses many global and static variables, and I cannot alter its code. Is there a trick to duplicate those static variables, so each thread has its own set? (Some kind of thread-local storage, but without affecting doSomething()...)
I use VisualC++
To make a long story short: no, at least not (what I'd call) reasonably.
Given that you don't want to change doSomething(), your best bet is probably to run a number of copies of the existing process instead of attempting to use multithreading. If each thread is going to use a separate copy of the global variables anyway, the difference between multithreading and multiple processes will be fairly minor in any case.
Untested, but I think you can do something like:

#define threadlocal __declspec(thread)

and then put threadlocal before all the variables that should be local to the thread. It might not work, though; it's generally not a good idea to just throw functions into threads when they weren't written to be multi-threaded.
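As a sketch of what that might look like (my example; counter and doSomething() stand in for the legacy globals and code, which would need recompiling with the annotation):

#include <cstdio>
#include <thread>

#define threadlocal __declspec(thread) // MSVC-specific thread-local storage

threadlocal long counter = 0; // each thread now gets its own copy

void doSomething() {
    counter++; // touches what used to be a shared global
}

void threadEntry() {
    for (int i = 0; i < 1000; i++)
        doSomething();
    std::printf("this thread's counter: %ld\n", counter);
}

int main() {
    std::thread a(threadEntry), b(threadEntry);
    a.join();
    b.join();
}

Note that on older versions of Windows, __declspec(thread) variables were known not to work in DLLs loaded with LoadLibrary, so this is best suited to variables linked into the executable itself.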
Your best bet is thread local storage.
