Synchronization primitive with IOKit - events

I'm looking for a wait/signal synchronization primitive in IOKit that works like this:
Thread1 : wait(myEvent) // Blocking thread1
Thread2 : wait(myEvent) // Blocking thread2
Thread3 : signal(myEvent) // Release one of thread1 or thread2
This can't be done with an IOLock, since the lock/unlock operations would be made from different threads, which is a bad idea according to some documentation I've read.
Thread1, 2, 3 can be user threads or kernel threads.
I'd also like to have an optional time out with the wait operation.
Thanks for your help!

You want the function IOLockSleepDeadline(), declared in <IOKit/IOLocks.h>.
You set up a single IOLock somewhere with IOLockAlloc() before you begin. Threads 1 and 2 then lock the IOLock with IOLockLock() and immediately relinquish it and go to sleep by calling IOLockSleepDeadline(). When thread 3 is ready, it calls IOLockWakeup() (with oneThread = true if you only want to wake a single thread). This causes thread 1 or 2 to wake up and immediately re-acquire the lock, so they then need to unlock it or sleep again.
IOLockSleep() works similarly, but without the timeout.
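To make the pattern concrete, here is a rough sketch in C under the assumptions above; the names my_event_wait()/my_event_signal() and the global lock/token are purely illustrative, not an existing API, and error handling is omitted:

#include <IOKit/IOLocks.h>
#include <kern/clock.h>

static IOLock *gEventLock;   // allocated once with IOLockAlloc(), freed with IOLockFree()
static int     gEventToken;  // any unique address works as the event to sleep on

// Threads 1 and 2: block until signalled, or until timeout_ms elapses.
static int my_event_wait(uint32_t timeout_ms)
{
    uint64_t deadline;
    clock_interval_to_deadline(timeout_ms, kMillisecondScale, &deadline);

    IOLockLock(gEventLock);
    // Atomically drops the lock, sleeps, and re-acquires it on wakeup or timeout.
    int result = IOLockSleepDeadline(gEventLock, deadline, &gEventToken, THREAD_UNINT);
    IOLockUnlock(gEventLock);
    return result;  // THREAD_AWAKENED or THREAD_TIMED_OUT
}

// Thread 3: wake exactly one waiter (pass false as oneThread to wake them all).
static void my_event_signal(void)
{
    IOLockLock(gEventLock);
    IOLockWakeup(gEventLock, &gEventToken, true);
    IOLockUnlock(gEventLock);
}

Taking the lock around IOLockWakeup() isn't strictly required, but it avoids a lost wakeup if the signal can arrive while a waiter is between IOLockLock() and IOLockSleepDeadline().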
You can do something similar using IOCommandGate's commandSleep() method, which may be more appropriate if your driver is already centred around an IOWorkLoop.

The documentation for IOLockLock (in IOLocks.h) states the following:
Lock the mutex. If the lock is held by any thread, block waiting for
its unlock. This function may block and so should not be called from
interrupt level or while a spin lock is held. Locking the mutex
recursively from one thread will result in deadlock.
So it will certainly block the other threads (T1 and T2) until the thread holding the lock releases it (T3). One thing it doesn't seem to support is a timeout.

Caller/Backtrace beyond a thread

As far as I know, it is possible to get only the portion of the caller/backtrace information that is within the current thread; anything prior to that (in the thread that created the current thread) is cut off. The following exemplifies this; the fact that a called b, which called c, which created the thread that called d, is cut off:
def a; b end
def b; c end
def c; Thread.new{d}.join end
def d; e end
def e; puts caller end
a
# => this_file:4:in `d'
# this_file:3:in `block in c'
What is the reason for this feature?
Is there a way to get the caller/backtrace information beyond the current thread?
I think I came up with my answer.
Creating it is not the only thing that can be done to a thread from the outside. Besides creating it, you can wake it up, and so on. So it is not clear which operation should be attributed as part of the caller. For example, suppose there is a thread:
1: t = Thread.new{
2: Thread.stop
3: puts caller
4: }
5: t.wakeup
The thread t is created at line 1, but it puts itself to sleep at line 2 and is then woken up by line 5. So when we stand at the caller call on line 3 and consider the part of the caller outside of the thread, it is not clear whether Thread.new at line 1 should be part of it or t.wakeup at line 5 should be. Therefore, there is no clear notion of callers beyond the current thread.
However, if we define a clear notion, then it is possible for caller beyond a thread to make sense. For example, always adding the callers up to the creation of the thread may make sense. Alternatively, adding the callers leading to the most recent wakeup or creation may make sense. It is up to the definition.
The answer to both your questions is really the same. Consider a slightly more involved main thread. Instead of simply waiting for the spawned thread to end in c, the main thread goes on calling other functions, perhaps even returning from c and going about its business while the spawned thread goes about its own.
This means that the stack in the main thread has changed since the thread starting in d was spawned. In other words, by the time you call puts caller the stack in the main thread is no longer in the state it was when the secondary thread was created. There is no way to safely walk back up the stack beyond this point.
So in short:
The stack of the spawning thread will not remain in the state it was in when the thread was spawned, so walking back beyond the start of a thread's own stack is not safe.
No, since the entire idea behind threads is that they are (pseudo) parallel, their stacks are completely unrelated.
Update:
As suggested in the comments, the stack of the current thread can be copied to the new thread at creation time. This would preserve the information that lead up to the thread being created, but the solution is not without its own set of problems.
Thread creation will be slower. That could be OK if there were anything to gain from it, but in this case, is there?
What would it mean to return from the thread entry function?
It could return to the function that created the thread and keep running as if it was just a function call - only that it now runs in the second thread, not the original one. Would we want that?
There could be some magic that ensures that the thread terminates even if it's not at the top of the call stack. This would make the information in the call stack above the thread entry function incorrect anyways.
On systems with limits on the stack size for each thread, you could run into problems where the thread runs out of stack even if it's not using very much on its own.
There are probably other scenarios and peculiarities that could be thought through as well, but the way threads are created with their own empty stack to start with makes the model both simple and predictable, without leaving any genuinely useful information out of the call stack.

Completion object race condition

What happens if complete_all() is called on a completion object (from task B) before task A gets to do wait_for_completion() on it? Is there some API to find out whether the object is already completed at the time of the wait and return right away? One way could be to use a mutex that is locked before sending the message and unlocked before the wait; that lock would need to be acquired before complete_all() and released after, but I'm wondering if there is a cleaner/better way. Any ideas are welcome.
More context: task A initializes the completion object, sends a request to task B along with the address of the completion object, and then waits for the completion. Task B does some processing when it gets the message and then calls complete_all() on the completion object.
If complete() or complete_all() is called before wait_for_completion() for a particular completion object, then wait_for_completion() will return immediately. A completion object is roughly like a semaphore:
Internally, a completion object has a done counter that is initialized to 0.
wait_for_completion() sleeps until done > 0 (or proceeds immediately if done is already greater than 0), and atomically decrements done before returning.
complete() increments done and wakes up the first process sleeping in wait_for_completion().
complete_all() sets done to UINT_MAX / 2 (effectively infinity) and wakes up everyone sleeping in wait_for_completion().
So if I'm understanding your question correctly, there is no need for additional locking; the completion object's internal wait.lock spinlock already synchronizes the counter access, so the case you're worrying about is handled correctly.
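For illustration, here is a minimal kernel-side sketch of the pattern from the question; struct my_request, send_request_to_b() and the two task_* functions are hypothetical placeholders for however the request is actually passed between A and B:

#include <linux/completion.h>

/* Hypothetical request that carries the completion object, as described above. */
struct my_request {
    struct completion done;
    /* ... other request fields ... */
};

void send_request_to_b(struct my_request *req);  /* hypothetical transport to task B */

/* Task A: initialize, send, then wait. */
static void task_a_send_and_wait(struct my_request *req)
{
    init_completion(&req->done);
    send_request_to_b(req);

    /* Safe even if B already called complete_all() by now:
     * wait_for_completion() just sees done > 0 and returns immediately. */
    wait_for_completion(&req->done);
}

/* Task B: process the message, then release every waiter. */
static void task_b_handle(struct my_request *req)
{
    /* ... do the processing ... */
    complete_all(&req->done);
}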

IOCP loop termination may cause memory leaks? How to close IOCP loop gracefully

I have the classic IOCP worker thread that dequeues pending I/O requests, processes them, and deallocates them, like this:
struct MyIoRequest { OVERLAPPED o; /* ... other params ... */ };
bool is_iocp_active = true;
DWORD WINAPI WorkerProc(LPVOID lpParam)
{
    ULONG_PTR dwKey;
    DWORD dwTrans;
    LPOVERLAPPED io_req;
    while (is_iocp_active)
    {
        GetQueuedCompletionStatus((HANDLE)lpParam, &dwTrans, &dwKey, &io_req, INFINITE);
        // NOTE: I could use GetQueuedCompletionStatusEx() here and make the wait
        // alertable (TRUE), so I can wake up the thread with an APC request from another thread!
        printf("dequeued an i/o request\n");
        // [ process i/o request ]
        ...
        // [ destroy request ]
        destroy_request(io_req);
    }
    // [ clean up some stuff ]
    return 0;
}
Then, in the code I will have somewhere:
MyIoRequest * io_req = allocate_request(...params...);
ReadFile(..., (OVERLAPPED*)io_req);
and this just works perfectly.
Now my question is: what if I want to immediately close the IOCP queue without causing leaks (e.g. the application must exit)?
I mean: if I set is_iocp_active to false, the next I/O request GetQueuedCompletionStatus() dequeues will be the last one: the function returns, the thread exits, and when a thread exits all of its pending I/O requests are simply cancelled by the system, according to MSDN.
But the MyIoRequest structures that I created when calling ReadFile() won't be destroyed at all: the system has cancelled the pending I/O requests, but I still have to destroy those structures manually, or I will leak every pending I/O request when I stop the loop!
So how could I do this? Am I wrong to stop the IOCP loop by just setting that variable to false? Note that the same thing would happen even if I used APC requests to stop an alertable thread.
The solution that comes to mind is to add every MyIoRequest structure to a queue/list and then drain it when GetQueuedCompletionStatusEx() returns, but wouldn't that create a bottleneck, since the enqueue/dequeue of those MyIoRequest structures must be interlocked? Maybe I've misunderstood how to use the IOCP loop. Can someone shed some light on this topic?
The way I normally shut down an IOCP thread is to post my own 'shut down now please' completion. That way you can cleanly process all of the pending completions and then shut the threads down.
The way to do this is to call PostQueuedCompletionStatus() with 0 for the number of bytes, the completion key and pOverlapped. This means the completion key is a unique value (you won't have a valid file or socket with a zero handle/completion key).
Step one is to close the sources of completions, so close or abort your socket connections, close files, etc. Once all of those are closed you can't be generating any more completion packets, so you then post your special '0' completion; post one for each thread you have servicing your IOCP. Once a thread gets a '0' completion key, it exits.
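A rough sketch of that handshake, reusing the shape of the question's code (destroy_request() is from the question; ShutdownIocpWorkers() is just an illustrative name):

#include <windows.h>
#include <stdio.h>

void destroy_request(LPOVERLAPPED io_req);  /* from the question's code */

// Poster side: after closing all sockets/files, post one zero completion per worker thread.
void ShutdownIocpWorkers(HANDLE hIocp, int workerCount)
{
    for (int i = 0; i < workerCount; ++i)
        PostQueuedCompletionStatus(hIocp, 0, 0, NULL);
}

// Worker side: the sentinel replaces the is_iocp_active flag.
DWORD WINAPI WorkerProc(LPVOID lpParam)
{
    HANDLE hIocp = (HANDLE)lpParam;
    for (;;)
    {
        DWORD dwTrans = 0;
        ULONG_PTR dwKey = 0;
        LPOVERLAPPED io_req = NULL;
        BOOL ok = GetQueuedCompletionStatus(hIocp, &dwTrans, &dwKey, &io_req, INFINITE);

        if (dwKey == 0 && io_req == NULL)
            break;                      // our 'shut down now please' completion

        if (!ok && io_req != NULL)
        {
            // A failed or cancelled I/O still delivers its OVERLAPPED, so free it too.
            destroy_request(io_req);
            continue;
        }

        printf("dequeued an i/o request\n");
        // [ process i/o request ]
        destroy_request(io_req);
    }
    return 0;
}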
If you are terminating the app and there's no overriding reason not to do so (e.g. DB connections that must be closed, interprocess shared-memory issues), call ExitProcess(0).
Failing that, call CancelIo() for all socket handles and process all the cancelled completions as they come in.
Try ExitProcess() first!

Make parent thread wait till child thread finishes in VC

According to MSDN:
The WaitForSingleObject function can wait for the following objects:
Change notification
Console input
Event
Memory resource notification
Mutex
Process
Semaphore
Thread
Waitable timer
Then we can use WaitForSingleObject to make the parent-thread wait for child ones.
int main()
{
    HANDLE h_child_thread = CreateThread(0, 0, child, 0, 0, 0); // create a thread in VC
    WaitForSingleObject(h_child_thread, INFINITE); // so the parent thread will wait
    return 0;
}
Question
Is there any other way to make parent-thread wait for child ones in VC or Windows?
I don't quite understand the usage of WaitForSingleObject here: does it mean that the thread's handle will be available when the thread terminates?
You can establish communication between threads in multiple ways, and the terminating thread may somehow signal its waiting thread. It could be as simple as writing a special value to a shared memory location that the waiting thread can check. But this won't guarantee that the terminating thread has actually terminated when the waiting thread sees the special value (ordering/race conditions), or that it terminates shortly after that (it can just hang or block on something), and it won't guarantee that the special value ever gets set before the terminating thread terminates (the thread can crash). WaitForSingleObject (and its companion WaitForMultipleObjects) is a sure way to know of a thread's termination when it occurs. Just use it.
The handle will still be available in the sense that its value won't be gone. But it is practically useless after the thread has terminated, except that you need this handle to get the thread's exit code. And you still need to close the handle in the end, unless you're OK with handle/memory leaks.
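A short sketch tying those points together (ChildProc is just a placeholder thread function for illustration):

#include <windows.h>
#include <stdio.h>

// Placeholder child thread; returns the exit code the parent will read back.
static DWORD WINAPI ChildProc(LPVOID lpParam)
{
    (void)lpParam;
    return 42;
}

int main(void)
{
    HANDLE hChild = CreateThread(NULL, 0, ChildProc, NULL, 0, NULL);
    if (hChild == NULL)
        return 1;

    // Blocks until the thread object becomes signaled, i.e. the thread has terminated.
    WaitForSingleObject(hChild, INFINITE);

    // The handle is still valid after termination; use it to fetch the exit code...
    DWORD exitCode = 0;
    GetExitCodeThread(hChild, &exitCode);
    printf("child thread exited with code %lu\n", (unsigned long)exitCode);

    // ...and then close it, or the handle leaks.
    CloseHandle(hChild);
    return 0;
}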
For the first question: yes. The method commonly used here is "Join"; the usage is language dependent.
In .NET C++ you can use the Thread's Join method. This is from MSDN:
Thread* newThread = new Thread(new ThreadStart(0, Test::Work));
newThread->Start();
if (newThread->Join(waitTime + waitTime))
{
    Console::WriteLine(S"New thread terminated.");
}
else
{
    Console::WriteLine(S"Join timed out.");
}
Secondly, the thread is terminated when you are signalled by WaitForSingleObject, but the handle is still valid (for a terminated thread), so you still need to explicitly close it with CloseHandle.

NSLock - should just block when locking a locked lock?

I have a loop which starts with a
[lock lock];
because in the body of the loop I am creating another thread which needs to finish before the loop runs again. (The other thread will unlock it when finished).
However on the second loop I get the following error:
2011-02-02 07:15:05.032 BLA[21915:a0f] *** -[NSLock lock]: deadlock (<NSLock: 0x100401f30> '(null)')
2011-02-02 07:15:05.032 BLA[21915:a0f] *** Break on _NSLockError() to debug.
The "lock" documentation states the following:
Abstract: Attempts to acquire a lock, blocking a thread’s execution until the lock can be acquired. (required)
which makes me think it would just block until the lock could be acquired?
Sounds like two problems:
Locking a lock on one thread and unlocking on another is not supported – you probably want NSCondition (see the sketch below). Wait on the NSCondition in the parent thread, and signal it in the child thread.
A normal NSLock can't be locked again by the thread that already holds it. That's what NSRecursiveLock is for.
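For what it's worth, NSCondition essentially wraps a POSIX mutex/condition pair, so the wait-in-parent, signal-in-child pattern looks like the following C sketch (the names and the work_done flag are illustrative; with NSCondition the calls map onto lock/wait/signal/unlock):

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool work_done = false;           // predicate guarded by mtx

// Parent (loop) thread: instead of [lock lock], wait for the child to finish.
static void wait_for_child(void)
{
    pthread_mutex_lock(&mtx);
    while (!work_done)                   // guards against spurious wakeups
        pthread_cond_wait(&cond, &mtx);  // atomically releases mtx while sleeping
    work_done = false;                   // reset for the next loop iteration
    pthread_mutex_unlock(&mtx);
}

// Child thread: instead of [lock unlock], signal that the work is finished.
static void signal_work_done(void)
{
    pthread_mutex_lock(&mtx);
    work_done = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
}

Here both the lock and the unlock happen on the thread that took the mutex, which is exactly what the NSLock-across-threads approach was missing.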
Did you remember to send -unlock when you were done? Each call to -lock must be paired with a call to -unlock.
