inter-process condition variables in Windows

I know that I can use a condition variable to synchronize work between threads, but is there any class like this (a condition variable) to synchronize work between processes? Thanks in advance.

Use a pair of named Semaphore objects, one to signal and one as a lock. Named sync objects on Windows are automatically inter-process, which takes care of that part of the job for you.
A class like this would do the trick.
class InterprocessCondVar {
private:
    HANDLE mSem;   // Used to signal waiters
    HANDLE mLock;  // Semaphore used as inter-process lock
    int mWaiters;  // # current waiters (note: this counter lives in the current process only)
public:
    explicit InterprocessCondVar(const std::string& name)
        : mSem(NULL), mLock(NULL), mWaiters(0)
    {
        // NOTE: You'll need a real "security attributes" pointer
        // for child processes to see the semaphore!
        // "CreateSemaphore" will do nothing but give you the handle if
        // the semaphore already exists.
        // (The parentheses around max keep the windows.h max() macro out of the way.)
        mSem = CreateSemaphoreA(NULL, 0, (std::numeric_limits<LONG>::max)(), name.c_str());
        std::string lockName = name + "_Lock";
        // Initial count 1, so the lock starts out unlocked.
        mLock = CreateSemaphoreA(NULL, 1, 1, lockName.c_str());
        if (!mSem || !mLock) {
            if (mSem) CloseHandle(mSem);
            if (mLock) CloseHandle(mLock);
            throw std::runtime_error("Semaphore create failed");
        }
    }
    virtual ~InterprocessCondVar() {
        CloseHandle(mSem);
        CloseHandle(mLock);
    }
    bool Signal();
    bool Broadcast();
    bool Wait(unsigned int waitTimeMs = INFINITE);
};
A genuine condition variable offers 3 calls:
1) "Signal()": Wake up ONE waiting thread
bool InterprocessCondVar::Signal() {
    WaitForSingleObject(mLock, INFINITE);               // Lock
    bool result = true;
    if (mWaiters > 0) {                                 // Nobody waiting: nothing to signal
        mWaiters--;                                     // Lower wait count
        result = ReleaseSemaphore(mSem, 1, NULL) != 0;  // Signal 1 waiter
    }
    ReleaseSemaphore(mLock, 1, NULL);                   // Unlock
    return result;
}
2) "Broadcast()": Wake up ALL threads
bool InterprocessCondVar::Broadcast() {
    WaitForSingleObject(mLock, INFINITE);                      // Lock
    bool result = true;
    if (mWaiters > 0) {                                        // A release count of 0 is an error
        result = ReleaseSemaphore(mSem, mWaiters, NULL) != 0;  // Signal all
        mWaiters = 0;                                          // All waiters clear
    }
    ReleaseSemaphore(mLock, 1, NULL);                          // Unlock
    return result;
}
3) "Wait()": Wait for the signal
bool InterprocessCondVar::Wait(unsigned int waitTimeMs) {
    WaitForSingleObject(mLock, INFINITE);  // Lock
    mWaiters++;                            // Add to wait count
    ReleaseSemaphore(mLock, 1, NULL);      // Unlock
    // This must be outside the lock
    return (WaitForSingleObject(mSem, waitTimeMs) == WAIT_OBJECT_0);
}
This should ensure that Broadcast() ONLY wakes up threads and processes that are already waiting, not all future ones too. This is also a VERY heavyweight object. For condition variables that don't need to exist across processes, I would create a different class with the same API and use unnamed objects.

You could use a named semaphore or a named mutex. You could also exchange data between processes through shared memory.

For a project I'm working on, I needed a condition variable and mutex implementation that can handle dead processes and won't leave other processes deadlocked in such a case. I implemented the mutex with the native named mutexes provided by the Win32 API because they can indicate whether a dead process owned the lock by returning WAIT_ABANDONED. The next issue was that I also needed a condition variable I could use across processes together with these mutexes. I started off with the suggestion from user3726672 but soon discovered several cases in which the state of the counter variable and the state of the semaphore end up being invalid.
After doing some research, I found a paper by Microsoft Research which explains exactly this scenario: Implementing Condition Variables with Semaphores. It uses a separate semaphore for every single thread to solve the mentioned issues.
My final implementation uses a portion of shared memory in which I store a ring buffer of thread IDs (the IDs of the waiting threads). Each process then creates and caches its own handle for every named semaphore/thread ID it has not encountered yet. The signal/broadcast/wait functions are then quite straightforward and follow the idea of the solution proposed in the paper. Just remember to remove your thread ID from the ring buffer if your wait operation fails or times out.
For the Win32 implementation I recommend reading the following documents:
Semaphore Objects and Using Mutex Objects, as those describe the functions you'll need for the implementation.
Alternatives: boost::interprocess has some robust mutex emulation support, but it is based on spin locks and caused a very high CPU load on our embedded system, which was the final reason why we looked into our own implementation.
@user3726672: Could you update your post to point to this post or to the referenced paper?
Best Regards,
Michael
Update:
I also had a look at an implementation for Linux/POSIX. It turns out pthreads already provides everything you'll need. Just put a pthread_cond_t and a pthread_mutex_t in shared memory so that the other process can reach them, and initialize both with the PTHREAD_PROCESS_SHARED attribute. Also set PTHREAD_MUTEX_ROBUST on the mutex.
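The pthread setup described in the update can be sketched roughly as follows. This is a minimal illustration, not the poster's implementation: the SharedState layout and the helper names are made up for the example, and it uses an anonymous shared mapping plus fork() instead of a named shared-memory object so that it stays self-contained. It assumes Linux with glibc.

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical layout of the shared region (names are illustrative).
struct SharedState {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             ready;
};

// Maps an anonymous shared region and initializes both primitives in it.
// Returns nullptr if the mapping fails.
SharedState* create_shared_state() {
    void* mem = mmap(nullptr, sizeof(SharedState), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    SharedState* s = static_cast<SharedState*>(mem);

    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&ma, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&s->mutex, &ma);
    pthread_mutexattr_destroy(&ma);

    pthread_condattr_t ca;
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&s->cond, &ca);
    pthread_condattr_destroy(&ca);

    s->ready = 0;
    return s;
}

// Locks the mutex; if the previous owner died while holding it,
// marks the mutex consistent so it can keep being used.
int robust_lock(SharedState* s) {
    int rc = pthread_mutex_lock(&s->mutex);
    if (rc == EOWNERDEAD) rc = pthread_mutex_consistent(&s->mutex);
    return rc;
}

// Forks a child that sets `ready` and signals; the parent waits for it.
// Returns the value of `ready` seen by the parent, or -1 on error.
int run_demo() {
    SharedState* s = create_shared_state();
    if (!s) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                          // child: producer
        robust_lock(s);
        s->ready = 1;
        pthread_cond_signal(&s->cond);
        pthread_mutex_unlock(&s->mutex);
        _exit(0);
    }
    robust_lock(s);                          // parent: consumer
    while (!s->ready) {
        int rc = pthread_cond_wait(&s->cond, &s->mutex);
        if (rc == EOWNERDEAD) pthread_mutex_consistent(&s->mutex);
    }
    int seen = s->ready;
    pthread_mutex_unlock(&s->mutex);
    waitpid(pid, nullptr, 0);
    munmap(s, sizeof(SharedState));
    return seen;
}
```

For unrelated processes the region would typically come from a named shared-memory object instead of an inherited mapping, with exactly one process performing the initialization.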

Yes. You can use a (named) Mutex for that. Use CreateMutex to create one. You then wait for it (with functions like WaitForSingleObject), and release it when you're done with ReleaseMutex.

For reference, Boost.Interprocess (documentation for version 1.59) has condition variables and much more. Please note, however, that as of this writing, its documentation says that "Win32 synchronization is too basic".

Related

Why it is mandatory to check the condition in wait_event after prepare_to_wait?

I am trying to understand how wait_event is implemented in linux kernel. There is a code example in ldd3 where the internal implementation is explained using prepare_to_wait (http://www.makelinux.net/ldd3/chp-6-sect-2).
static int scull_getwritespace(struct scull_pipe *dev, struct file *filp)
{
    while (spacefree(dev) == 0) {
        DEFINE_WAIT(wait);
        up(&dev->sem);
        if (filp->f_flags & O_NONBLOCK)
            return -EAGAIN;
        PDEBUG("\"%s\" writing: going to sleep\n",current->comm);
        prepare_to_wait(&dev->outq, &wait, TASK_INTERRUPTIBLE);
        if (spacefree(dev) == 0) // Why is this check necessary ??
            schedule( );
        finish_wait(&dev->outq, &wait);
        if (signal_pending(current))
            return -ERESTARTSYS; /* signal: tell the fs layer to handle it */
        if (down_interruptible(&dev->sem))
            return -ERESTARTSYS;
    }
    return 0;
}
In the book, it is explained as below.
Then comes the obligatory check on the buffer; we must handle the case
in which space becomes available in the buffer after we have entered
the while loop (and dropped the semaphore) but before we put ourselves
onto the wait queue. Without that check, if the reader processes were
able to completely empty the buffer in that time, we could miss the
only wakeup we would ever get and sleep forever. Having satisfied
ourselves that we must sleep, we can call schedule.
I am not able to understand this explanation. How would we end up sleeping indefinitely if the if (spacefree(dev) == 0) check is not done before calling schedule()?
Even if this obligatory check is not present, the wakeup still resets the process state to TASK_RUNNING and schedule() returns, as explained in the next paragraph.
It is worth looking again at this case: what happens if the wakeup
happens between the test in the if statement and the call to schedule?
In that case, all is well. The wakeup resets the process state to
TASK_RUNNING and schedule returns—although not necessarily right away.
As long as the test happens after the process has put itself on the
wait queue and changed its state, things will work.
The important thing is that the (last) check is done after prepare_to_wait() was called.
prepare_to_wait() puts a pointer to the current process into the wait queue. If the wakeup happens before the prepare_to_wait() call, the wakeup would not be able to affect the current process.
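The two interleavings can be walked through with a toy, single-threaded model. This is not kernel code: WaitQueue, Task and writer_sleeps_forever are invented for the illustration, and schedule() is modeled simply as "blocks unless the task state is Running".

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model of a task state and a wait queue (illustrative names only).
enum class State { Running, Sleeping };

struct Task { State state = State::Running; };

struct WaitQueue {
    std::vector<Task*> waiters;
    // Put the task on the queue and mark it as about to sleep.
    void prepare_to_wait(Task& t) { waiters.push_back(&t); t.state = State::Sleeping; }
    // Wake everything that is on the queue right now.
    void wake_up() { for (Task* t : waiters) t->state = State::Running; }
    // Take the task off the queue and mark it runnable again.
    void finish_wait(Task& t) {
        t.state = State::Running;
        waiters.erase(std::remove(waiters.begin(), waiters.end(), &t), waiters.end());
    }
};

// Plays one iteration of the writer loop after it saw a full buffer and
// dropped the semaphore. The reader frees space and issues its single wakeup
// either before the writer is on the queue, or between the re-check and
// schedule(). Returns true if the writer would stay blocked in schedule().
bool writer_sleeps_forever(bool wakeup_before_prepare, bool recheck) {
    WaitQueue q;
    Task writer;
    int space = 0;
    auto reader = [&] { space = 1; q.wake_up(); };

    if (wakeup_before_prepare) reader();   // writer is not queued yet: wakeup is missed
    q.prepare_to_wait(writer);

    bool blocked = false;
    if (!recheck || space == 0) {
        if (!wakeup_before_prepare) reader();           // wakeup after queuing
        blocked = (writer.state == State::Sleeping);    // models schedule()
    }
    q.finish_wait(writer);
    return blocked;
}
```

In this model only the combination "wakeup before prepare_to_wait() and no re-check" leaves the writer blocked; a wakeup that arrives after the task is queued always resets its state, which matches the second quoted paragraph.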

What's the correct method for CoreAudio realtime thread to communicate with UI thread?

I need to pass data between CoreAudio's realtime thread and the UI thread (one way, RT->UI). I know I can't use any Cocoa/Objective C methods like performSelectorOnMainThread or NSNotification and I can't use anything that will allocate memory as this will potentially block the RT thread.
What is the correct method for communicating between threads? Can I use GCD message queues or is there a more basic system to use?
Edit:
Thinking about this a bit more, I suppose I could use a lock free ring buffer, which the RT thread puts a message into, and the UI thread checks for messages to pull out. Is this the best way and if so is there a system already to do this in CoreAudio or available elsewhere or do I need to code it up myself?
It turns out this was a lot simpler than I expected and the solution I came up with was just to use the Portaudio ring buffer. I needed to add pa_ringbuffer.[ch] and pa_memorybarrier.h to my project and then define a MessageData structure to store in the ring buffer.
typedef struct MessageData {
    MessageType type;
    union {
        struct {
            NSUInteger position;
        } position;
    } data;
} MessageData;
Then I allocated some space to store 32 messages and created the ring buffer.
_playbackData->RTToMainBuffer = malloc(sizeof(MessageData) * 32);
PaUtil_InitializeRingBuffer(&_playbackData->RTToMainRB, sizeof(MessageData),
                            32, _playbackData->RTToMainBuffer);
Finally I started an NSTimer for every 20ms to pull data from the ring buffer
while (PaUtil_GetRingBufferReadAvailable(&_playbackData->RTToMainRB)) {
    MessageData *dataPtr1, *dataPtr2;
    ring_buffer_size_t sizePtr1, sizePtr2;
    // Should we read more than one at a time?
    if (PaUtil_GetRingBufferReadRegions(&_playbackData->RTToMainRB, 1,
                                        (void **)&dataPtr1, &sizePtr1,
                                        (void **)&dataPtr2, &sizePtr2) != 1) {
        continue;
    }
    // Parse message
    switch (dataPtr1->type) {
        case MessageTypeEOS:
            break;
        case MessageTypePosition:
            break;
        default:
            break;
    }
    PaUtil_AdvanceRingBufferReadIndex(&_playbackData->RTToMainRB, 1);
}
Then in the realtime thread, pushing a message to the ringbuffer was simply
MessageData *dataPtr1, *dataPtr2;
ring_buffer_size_t sizePtr1, sizePtr2;
if (PaUtil_GetRingBufferWriteRegions(&data->RTToMainRB, 1,
                                     (void **)&dataPtr1, &sizePtr1,
                                     (void **)&dataPtr2, &sizePtr2)) {
    dataPtr1->type = MessageTypePosition;
    dataPtr1->data.position.position = currentPosition;
    PaUtil_AdvanceRingBufferWriteIndex(&data->RTToMainRB, 1);
}
A ring buffer is a good solution. Use two if you need to communicate both ways, i.e. inbox/outbox message passing.
This is a good implementation for iOS/Mac if you don't want to use Portaudio:
https://github.com/michaeltyson/TPCircularBuffer
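For readers who want to see the mechanism such a ring buffer relies on, here is a minimal single-producer/single-consumer sketch using C++ atomics. It is not how Portaudio or TPCircularBuffer are implemented; SpscRing and PositionMessage are hypothetical names for this illustration, and it assumes exactly one writer thread and one reader thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Single-producer/single-consumer ring buffer sketch.
// Indices only ever grow; the slot is selected by masking with Capacity - 1.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer side (e.g. the realtime thread): no locks, no allocation.
    // Returns false when the buffer is full, so the caller can drop the message.
    bool push(const T& item) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        if (w - r == Capacity) return false;
        slots_[w & (Capacity - 1)] = item;
        write_.store(w + 1, std::memory_order_release);  // publish the slot
        return true;
    }
    // Consumer side (e.g. a UI timer). Returns false when the buffer is empty.
    bool pop(T& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        const std::size_t w = write_.load(std::memory_order_acquire);
        if (r == w) return false;
        out = slots_[r & (Capacity - 1)];
        read_.store(r + 1, std::memory_order_release);   // hand the slot back
        return true;
    }
private:
    T slots_[Capacity];
    std::atomic<std::size_t> write_{0};
    std::atomic<std::size_t> read_{0};
};

// Hypothetical message type, loosely modeled on the MessageData struct above.
struct PositionMessage {
    int type;
    unsigned long position;
};
```

The realtime thread would call push() and ignore a false return rather than wait, while the UI side drains the buffer with pop() from its timer.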

Potential kind of asynchronous (overlapped) I/O implementation in Windows

I would like to discuss possible asynchronous (overlapped) I/O implementations in Windows, because there are many ways to implement this.
Overlapped I/O in Windows provides the ability to process data asynchronously, i.e. the operations execute without blocking the caller.
Edit: The purpose of this question is to discuss improvements to my own implementation on the one hand, and alternative implementations on the other. Which asynchronous I/O implementation makes the most sense for heavily parallel I/O, and which makes the most sense in a small, mostly single-threaded application?
I will cite MSDN:
When a function is executed synchronously, it does not return until the operation has been completed. This means that the execution of the calling thread can be blocked for an indefinite period while it waits for a time-consuming operation to finish. Functions called for overlapped operation can return immediately, even though the operation has not been completed. This enables a time-consuming I/O operation to be executed in the background while the calling thread is free to perform other tasks. For example, a single thread can perform simultaneous I/O operations on different handles, or even simultaneous read and write operations on the same handle.
I assume that the reader is familiar with the basic concept of overlapped I/O.
Another solution for asynchronous I/O is completion ports, but they are not the subject of this discussion. More information on other I/O concepts can be found on MSDN under "About File Management > Input and Output (I/O) > I/O Concepts".
I would like to present my (C/C++) implementation here and share it for discussion.
This is my extended OVERLAPPED struct called IoOperation:
struct IoOperation : OVERLAPPED {
    HANDLE Handle;
    unsigned int Operation;
    char* Buffer;
    unsigned int BufferSize;
};
This struct is created each time an asynchronous operation like ReadFile or WriteFile is called. The Handle field shall be initialized with the corresponding device/file handle. Operation is a user-defined field that tells which operation was called. The Buffer field is a pointer to a previously allocated chunk of memory of size BufferSize. Of course, this struct can be expanded at will. It could contain the operation result, the actually transferred size, etc.
The first thing we need is an (auto reset) event handle to be signaled each time an overlapped I/O is completed.
HANDLE hEvent = CreateEvent(0, FALSE, FALSE, 0);
First I decided to use only one event for all asynchronous operations. Then I decided to register this event with a thread pool thread with RegisterWaitForSingleObject.
HANDLE hWait = 0;
....
RegisterWaitForSingleObject(
    &hWait,
    hEvent,
    WaitOrTimerCallback,
    this,
    INFINITE,
    WT_EXECUTEINPERSISTENTTHREAD | WT_EXECUTELONGFUNCTION
);
So each time this event is signaled, my callback WaitOrTimerCallback is called.
An asynchronous operation is initialized like this:
IoOperation* Io = new IoOperation(hFile, hEvent, IoOperation::Write, Data, DataSize);
if (IoQueue->Enqueue(Io)) {
    WriteFile(hFile, Io->Buffer, Io->BufferSize, 0, Io);
}
Each operation is queued and is removed after a successful GetOverlappedResult call in my WaitOrTimerCallback callback. Instead of calling new all the time here, we could use a memory pool to avoid memory fragmentation and to make allocation faster.
VOID CALLBACK WaitOrTimerCallback(PVOID Parameter, BOOLEAN TimerOrWaitFired) {
    list<IoOperation*>::iterator it = IoQueue.begin();
    while (it != IoQueue.end()) {
        bool IsComplete = true;
        DWORD Transfered = 0;
        IoOperation* Io = *it;
        if (GetOverlappedResult(Io->Handle, Io, &Transfered, FALSE)) {
            if (Io->Operation == IoOperation::Read) {
                // Handle Read, virtual OnRead(), SetEvent, etc.
            } else if (Io->Operation == IoOperation::Write) {
                // Handle Write, virtual OnWrite(), SetEvent, etc.
            } else {
                // ...
            }
        } else {
            if (GetLastError() == ERROR_IO_INCOMPLETE) {
                IsComplete = false;
            } else {
                // Handle Error
            }
        }
        if (IsComplete) {
            delete Io;
            it = IoQueue.erase(it);
        } else {
            it++;
        }
    }
}
Of course, to be thread-safe, we need lock protection (e.g. a critical section) when accessing the I/O queue.
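A lock-protected queue along those lines might look like the sketch below. It is only an illustration under assumptions: std::mutex stands in for a Win32 critical section, PendingOp is a simplified stand-in for IoOperation, and its done flag is a hypothetical replacement for the GetOverlappedResult check.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <mutex>

// Simplified stand-in for IoOperation; `done` replaces the
// GetOverlappedResult() check in this sketch.
struct PendingOp {
    int  id;
    bool done;
};

class GuardedIoQueue {
public:
    // Adds an operation to the queue under the lock.
    void enqueue(PendingOp* op) {
        std::lock_guard<std::mutex> guard(lock_);
        ops_.push_back(op);
    }
    // Moves completed operations out of the queue while holding the lock and
    // returns them, so handlers and deletion can run outside the lock.
    std::list<PendingOp*> take_completed() {
        std::list<PendingOp*> finished;
        std::lock_guard<std::mutex> guard(lock_);
        for (auto it = ops_.begin(); it != ops_.end();) {
            auto next = std::next(it);
            if ((*it)->done) finished.splice(finished.end(), ops_, it);
            it = next;
        }
        return finished;
    }
    // Number of operations still queued.
    std::size_t pending() {
        std::lock_guard<std::mutex> guard(lock_);
        return ops_.size();
    }
private:
    std::mutex lock_;
    std::list<PendingOp*> ops_;
};
```

Returning the finished operations instead of handling them inside the loop keeps the time spent under the lock short, which matters when several threads enqueue I/O concurrently.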
There are advantages but also disadvantages to this kind of implementation.
Advantages:
Execution in persistent thread pool thread, no manual thread creation is required
Only one event is required
Each operation is queued in an I/O queue (CancelIoEx can be called later)
Disadvantages:
I/O queue requires extra memory/cpu time
GetOverlappedResult is called for all queued I/Os, even incomplete ones

implementing a scheduler class in Windows

I want to implement a scheduler class, which any object can use to schedule timeouts and cancel them if necessary. When a timeout expires, this information will be sent asynchronously to the timeout setter/owner at that time.
So, for this purpose, I have 2 fundamental classes WindowsTimeout and WindowsScheduler.
class WindowsTimeout
{
    bool mCancelled;
    int mTimerID; // Windows handle to identify the actual timer set.
    ITimeoutReceiver* mSetter;
    int cancel()
    {
        mCancelled = true;
        if ( timeKillEvent(mTimerID) == SUCCESS) // Line under question # 1
        {
            delete this; // Timeout instance is self-destroyed.
            return 0; // ok. OS Timer resource given back.
        }
        return 1; // fail. OS Timer resource not given back.
    }
    WindowsTimeout(ITimeoutReceiver* setter, int timerID)
    {
        mSetter = setter;
        mTimerID = timerID;
    }
};
class WindowsScheduler
{
    static void CALLBACK timerFunction(UINT uID,UINT uMsg,DWORD dwUser,DWORD dw1,DWORD dw2)
    {
        WindowsTimeout* timeout = (WindowsTimeout*) uMsg;
        if (timeout->mCancelled)
            delete timeout;
        else
            timeout->mDestination->GEN(evTimeout(timeout));
    }
    WindowsTimeout* schedule(ITimeoutReceiver* setter, TimeUnit t)
    {
        int timerID = timeSetEvent(...);
        if (timerID == SUCCESS)
        {
            return WindowsTimeout(setter, timerID);
        }
        return 0;
    }
};
My questions are:
Q.1. When WindowsScheduler::timerFunction() is called, in which context does the call run? It is simply a callback function, and I think it is executed in an OS context, right? If so, does this call pre-empt any other tasks that are already running? I mean, do callbacks have a higher priority than any other user task?
Q.2. When a timeout setter wants to cancel its timeout, it calls WindowsTimeout::cancel().
However, there is always a possibility that the OS invokes the static timerFunction callback and pre-empts the cancel operation, for example just after the mCancelled = true statement. In such a case, the timeout instance will be deleted by the callback function.
When the pre-empted cancel() function resumes after the callback function completes, it will try to access an attribute of the deleted instance (mTimerID), as you can see on the line marked "Line under question # 1" in the code.
How can I avoid such a case?
Please note that this question is an improved version of a previous one of mine:
Windows multimedia timer with callback argument
Q1 - I believe it gets called within a thread allocated by the timer API. I'm not sure, but I wouldn't be surprised if the thread ran at a very high priority. (In Windows, that doesn't necessarily mean it will completely preempt other threads, it just means it will get more cycles than other threads).
Q2 - I started to sketch out a solution for this, but then realized it was a bit harder than I thought. Personally, I would maintain a table that maps timerIDs to your WindowsTimeout object instances. The table could be a simple std::map instance that's guarded by a critical section. When the timer callback occurs, it enters the critical section and tries to obtain the WindowsTimeout instance pointer, flags the WindowsTimeout instance as having been executed, exits the critical section, and then actually executes the callback. If the table doesn't contain the WindowsTimeout instance, it means the caller has already removed it. Be very careful here.
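A rough sketch of that idea, under stated assumptions: std::mutex stands in for the critical section, Timeout is a stripped-down stand-in for the timeout class, and TimeoutRegistry, on_timer and cancel are hypothetical names. It also uses a slight variation on the description above: instead of flagging the instance, whichever side gets to the table first removes the entry and thereby owns the object.

```cpp
#include <cassert>
#include <map>
#include <mutex>

// Stripped-down stand-in for the timeout object (illustrative only).
struct Timeout {
    unsigned id;
    bool     fired;
};

class TimeoutRegistry {
public:
    // Registers a timeout under its timer ID.
    void add(unsigned timerId, Timeout* t) {
        std::lock_guard<std::mutex> guard(lock_);
        table_[timerId] = t;
    }
    // Removes and returns the entry, or nullptr if the other side already
    // took it. Only one caller ever gets the pointer for a given ID, and
    // that caller is the one responsible for destroying the object.
    Timeout* claim(unsigned timerId) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = table_.find(timerId);
        if (it == table_.end()) return nullptr;
        Timeout* t = it->second;
        table_.erase(it);
        return t;
    }
private:
    std::mutex lock_;
    std::map<unsigned, Timeout*> table_;
};

// What the timer callback would do: claim first, notify outside the lock.
bool on_timer(TimeoutRegistry& reg, unsigned timerId) {
    Timeout* t = reg.claim(timerId);
    if (!t) return false;   // cancelled first: nothing to touch
    t->fired = true;        // stands in for notifying the setter
    return true;
}

// What cancel() would do: claim the entry so the callback cannot use it.
bool cancel(TimeoutRegistry& reg, unsigned timerId) {
    return reg.claim(timerId) != nullptr;
}
```

Because the callback receives only the timer ID and looks the object up under the lock, it never dereferences a pointer that cancel() may already have freed.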
One subtle bug in your own code above:
WindowsTimeout* schedule(ITimeoutReceiver* setter, TimeUnit t)
{
    int timerID = timeSetEvent(...);
    if (timerID == SUCCESS)
    {
        return WindowsTimeout(setter, timerID);
    }
    return 0;
}
In your schedule method, it's entirely possible that the callback scheduled by timeSetEvent will return BEFORE you can create an instance of WindowsTimeout.

pthread condition variables vs win32 events (linux vs windows-ce)

I am doing a performance evaluation between Windows CE and Linux on an arm imx27 board. The code has already been written for CE and measures the time it takes to do different kernel calls like using OS primitives like mutex and semaphores, opening and closing files and networking.
During my porting of this application to Linux (pthreads) I stumbled upon a problem which I cannot explain. Almost all tests showed a performance increase from 5 to 10 times but not my version of win32 events (SetEvent and WaitForSingleObject), CE actually "won" this test.
To emulate the behaviour I was using pthreads condition variables (I know that my implementation doesn't fully emulate the CE version but it's enough for the evaluation).
The test code uses two threads that "ping-pong" each other using events.
Windows code:
Thread 1: (the thread I measure)
HANDLE hEvt1, hEvt2;
hEvt1 = CreateEvent(NULL, FALSE, FALSE, TEXT("MyLocEvt1"));
hEvt2 = CreateEvent(NULL, FALSE, FALSE, TEXT("MyLocEvt2"));
ResetEvent(hEvt1);
ResetEvent(hEvt2);
for (i = 0; i < 10000; i++)
{
    SetEvent (hEvt1);
    WaitForSingleObject(hEvt2, INFINITE);
}
Thread 2: (just "responding")
while (1)
{
    WaitForSingleObject(hEvt1, INFINITE);
    SetEvent(hEvt2);
}
Linux code:
Thread 1: (the thread I measure)
struct event_flag *event1, *event2;
event1 = eventflag_create();
event2 = eventflag_create();
for (i = 0; i < 10000; i++)
{
    eventflag_set(event1);
    eventflag_wait(event2);
}
Thread 2: (just "responding")
while (1)
{
    eventflag_wait(event1);
    eventflag_set(event2);
}
My implementation of eventflag_*:
struct event_flag* eventflag_create()
{
    struct event_flag* ev;
    ev = (struct event_flag*) malloc(sizeof(struct event_flag));
    pthread_mutex_init(&ev->mutex, NULL);
    pthread_cond_init(&ev->condition, NULL);
    ev->flag = 0;
    return ev;
}
void eventflag_wait(struct event_flag* ev)
{
    pthread_mutex_lock(&ev->mutex);
    while (!ev->flag)
        pthread_cond_wait(&ev->condition, &ev->mutex);
    ev->flag = 0;
    pthread_mutex_unlock(&ev->mutex);
}
void eventflag_set(struct event_flag* ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->flag = 1;
    pthread_cond_signal(&ev->condition);
    pthread_mutex_unlock(&ev->mutex);
}
And the struct:
struct event_flag
{
    pthread_mutex_t mutex;
    pthread_cond_t condition;
    unsigned int flag;
};
Questions:
Why don't I see the performance boost here?
What can be done to improve performance (e.g. are there faster ways to implement CE's behaviour)?
I'm not used to coding pthreads, are there bugs in my implementation maybe resulting in performance loss?
Are there any alternative libraries for this?
Note that you don't need to be holding the mutex when calling pthread_cond_signal(), so you might be able to increase the performance of your condition variable 'event' implementation by releasing the mutex before signaling the condition:
void eventflag_set(struct event_flag* ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->flag = 1;
    pthread_mutex_unlock(&ev->mutex);
    pthread_cond_signal(&ev->condition);
}
This might prevent the awakened thread from immediately blocking on the mutex.
This type of implementation only works if you can afford to miss an event. I just tested it and ran into many deadlocks. The main reason for this is that the condition variables only wake up a thread that is already waiting. Signals issued before are lost.
No counter is associated with a condition that allows a waiting thread to simply continue if the condition has already been signalled. Windows Events support this type of use.
I can think of no better solution than taking a semaphore (the POSIX version is very easy to use) that is initialized to zero, using sem_post() for set() and sem_wait() for wait(). You can surely think of a way to cap the semaphore count at 1 using sem_getvalue().
That said I have no idea whether the POSIX semaphores are just a neat interface to the Linux semaphores or what the performance penalties are.
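One way the semaphore suggestion could look is sketched below. It is an assumption-laden illustration rather than a tested replacement for the benchmark code: SemEvent and the sem_event_* functions are made-up names, the extra mutex is my addition to serialize setters so that the sem_getvalue() check and the sem_post() cannot interleave between two setters, and it targets Linux (unnamed POSIX semaphores are not available on every platform).

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <semaphore.h>

// Auto-reset style event built on a POSIX semaphore capped at 1.
struct SemEvent {
    sem_t           sem;
    pthread_mutex_t set_lock;  // serializes setters only
};

// Initializes the event in the non-signaled state. Returns 0 on success.
int sem_event_init(SemEvent* ev) {
    if (sem_init(&ev->sem, 0, 0) != 0) return -1;
    return pthread_mutex_init(&ev->set_lock, nullptr) == 0 ? 0 : -1;
}

// Signals the event. Setting an already-signaled event is a no-op,
// so the count never exceeds 1.
void sem_event_set(SemEvent* ev) {
    pthread_mutex_lock(&ev->set_lock);
    int value = 0;
    sem_getvalue(&ev->sem, &value);
    if (value < 1) sem_post(&ev->sem);
    pthread_mutex_unlock(&ev->set_lock);
}

// Blocks until the event is signaled, then consumes the signal.
// Retries if the wait is interrupted by a signal handler.
void sem_event_wait(SemEvent* ev) {
    while (sem_wait(&ev->sem) == -1 && errno == EINTR) {
    }
}

// Releases the resources held by the event.
void sem_event_destroy(SemEvent* ev) {
    pthread_mutex_destroy(&ev->set_lock);
    sem_destroy(&ev->sem);
}
```

Unlike the condition-variable version, a set() that happens before the matching wait() is remembered by the semaphore count, so the ping-pong test cannot lose a signal.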
