Suppose I have a synchronization primitive with the following features:
It has a Count property which is initially zero.
It has a Wait method. When invoked on an object with Count zero, it returns immediately.
Otherwise, it blocks and waits for the object to be signalled.
It has a Signal method. When invoked on an object with a positive Count, it decrements it and unblocks the current/next invocation of Wait.
If Signal was called 5 times while Wait waited, not only will the current Wait return, but the next 4 calls to Wait will return immediately.
It has an ExpectSignal (not sure about the name) method. It increments the Count property.
Is there a standard, even if obscure, name for such an entity?
I'm trying to use IOCP via the Windows API functions CreateThreadpoolIo and StartThreadpoolIo, but I found that the thread pool only parallelizes the code that runs after the I/O completes. The async I/O submit operations still execute sequentially on the main thread. So why do we need this? I think making the I/O submit operations parallel could improve throughput even though they are async operations, right?
The other cost is that if we make them parallel, we might need to lock something to guarantee data consistency (thread-safe operation).
It is possible to do IOCP without using CreateThreadpoolIo / StartThreadpoolIo; in that case you have to manage calling GetQueuedCompletionStatus yourself (whether in a self-managed thread pool or otherwise - it is even conceivable that it could be interleaved into the actions of the thread that started the I/O, but in that case why bother with IOCP?). StartThreadpoolIo is needed in order to have a thread waiting on GetQueuedCompletionStatus instead of WaitForMultipleObjects (or one of its variants). CancelThreadpoolIo decrements a counter saying how many IOCP operations are outstanding, and if that counter reaches 0 the thread pool knows it can stop waiting on GetQueuedCompletionStatus.
CreateThreadpoolIo creates a TP_IO object and calls ZwSetInformationFile with FileCompletionInformation and a FILE_COMPLETION_INFORMATION structure to set the CompletionContext in the FILE_OBJECT. As a result, when an I/O operation on the file finishes (provided no synchronous error was returned and a non-zero ApcContext was passed), the system queues a packet to the I/O completion port (the one supplied in FILE_COMPLETION_INFORMATION) with the Key from FILE_COMPLETION_INFORMATION and the ApcContext from the specific I/O call (the Win32 API always passes a pointer to the OVERLAPPED structure here). The user callback address (IoCompletionCallback) is stored inside the TP_IO object.
StartThreadpoolIo increments the reference count on the TP_IO object, and CancelThreadpoolIo (and CloseThreadpoolIo) decrement it. This is needed to manage the lifetime of the TP_IO: before starting any I/O operation, the reference count must be incremented. When the I/O finishes, a packet is queued to the I/O port; one of the pool threads pops this packet, takes the Key (lpCompletionKey), converts it to a pointer to the TP_IO, and calls the user callback IoCompletionCallback. After the callback returns, the system decrements the TP_IO reference count. If the I/O fails synchronously, there will be no packet and no callback, so the reference count must be decremented directly - which is what CancelThreadpoolIo is for.
I'm trying to understand the use of completion in a piece of code.
Basically, one kernel thread creates an automatic variable struct completion which is, I assume, allocated on the thread's stack. Then it pushes a pointer to the completion struct to another thread (via a FIFO) and waits for completion.
struct completion done;
init_completion(&done);
push_to_fifo(&done);
wait_for_completion(&done);
The second thread fetches request from fifo, processes it and completes task.
Will the done variable be accessible from the second thread, which calls complete(&done)?
The first thread is waiting for the second to finish, so the struct completion on its stack will be stable until after wait_for_completion returns.
The stack space where that structure resides is just regular memory, the same as heap-allocated memory. The only difference is that once this function returns, and its caller invokes a different function, the same memory gets re-used for the stack frame / local variables of that next function.
So, if the other thread were to access the structure after that point, that would be a problem, but the point is that it is supposed to be finished by then; once it signals "done", it shouldn't touch that memory again.
I guess it's the thread, say A, on which the timer was created. But I can't figure out how exactly the callback function is called. Assume the timer expires, and then what happens? Does this happen when this thread gets its time slice? And if this is the case, I think the function should be called by the scheduler or what before the context is finally switched to A, then can I say A is the caller?
Thanks.
The timer callback can be called by a pool thread, by a thread that specifically manages timers, or in the context of the creating thread (the creating thread must be designed to accept and process an 'Asynchronous Procedure Call'). The flag parameters in CreateTimerQueueTimer() control the action upon timer expiry.
If the timer event is handled by a pool thread or timer-manager thread, that thread becomes ready upon expiry and, when there is a core available to run it, makes the callback 'immediately' within its own context. The thread that created the timer could, if it wished, wait on a synchro object (event or semaphore) that is signaled by the timer callback (i.e. normal inter-thread comms).
The timer callback can only be executed in the context of the thread that created it if that thread is in a position to execute the callback when it receives some sort of signal. In the case of these timers, an APC is QUEUED to the creating thread and, if that thread is blocked on one of the 'alertable' wait calls, it becomes ready immediately and runs when there is a core available. After the APC has run, the wait call returns. If the wait call is not SleepEx(), it returns WAIT_IO_COMPLETION - a result that is usually ignored. If the thread is not waiting when the APC is queued up, the APC will not be executed until the thread makes its next alertable wait call (obviously - since the thread must be off doing something else:).
'And if this is the case, I think the function should be called by the scheduler or what before the context is finally switched to A, then can I say A is the caller?' NO!
Maybe I misunderstood something, but...
When I call pthread_mutex_lock() and then call pthread_mutex_lock() again from the same thread without calling pthread_mutex_unlock(), the second call of pthread_mutex_lock() will block.
But: when I call EnterCriticalSection() and then call EnterCriticalSection() again from the same thread without calling LeaveCriticalSection(), the second call of EnterCriticalSection() will NOT block, because it is made from the same thread (which is very weird behaviour to me).
So my question is: is there a WinAPI function available that behaves like pthread_mutex_lock() and locks independently of the calling thread?
I'm aware of libpthread for Windows, but I would prefer a WinAPI function here.
You could use a Semaphore with the maximum count set to one.
See Semaphore Objects
When you successfully acquire the semaphore, its count is decremented: going to zero in our case.
No other thread can acquire it, including the current one.
pthread_mutex_lock documentation:
If the mutex type is PTHREAD_MUTEX_RECURSIVE, then the mutex maintains the concept of a lock count. When a thread successfully acquires a mutex for the first time, the lock count is set to one. Every time a thread relocks this mutex, the lock count is incremented by one. Each time the thread unlocks the mutex, the lock count is decremented by one. When the lock count reaches zero, the mutex becomes available for other threads to acquire. If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, an error will be returned.
MSDN ReleaseMutex states:
A thread can specify a mutex that it already owns in a call to one of the wait functions without blocking its execution. This prevents a thread from deadlocking itself while waiting for a mutex that it already owns. However, to release its ownership, the thread must call ReleaseMutex one time for each time that it obtained ownership (either through CreateMutex or a wait function).
The wait functions are the equivalent to pthread_mutex_lock.
See Mutex Objects (Windows) to get more details about this API.
And see this Stack Overflow entry for what the CRITICAL_SECTION object contains: it discloses that the CRITICAL_SECTION object holds, among other fields, a LockCount value that allows recursive use. See the EnterCriticalSection function documentation to learn about this feature.
It is best to describe my question in an example:
We create a Windows Event handle with CreateEvent, with bManualReset set to FALSE.
We create 4 threads. Ensure that they all start running and waiting on the above event by WaitForSingleObject.
In the main thread, in a for loop, we signal this event 4 times via SetEvent, like this:
for (int i = 0; i < 4; ++i) ::SetEvent(event);
My question is: can we say that all 4 threads will certainly be woken up from waiting on this event?
According to my understanding of Windows Event, the answer is YES. Because when the event is set, there is always a thread waiting for it.
However, I read on MSDN that "Setting an event that is already set has no effect", and the waiting threads probably do not get a chance to run while the main thread is setting the event in the loop. Can they still be notified and reset the event to nonsignaled? If the event is not reset, the subsequent SetEvent calls in the loop are obviously useless.
Or does the OS kernel know which thread should be notified when an event is set, and reset the event immediately if there is a waiting thread? In that case the waiting thread would not need to be scheduled in order to reset the event to nonsignaled.
Any clarification or references are welcome. Thanks.
Because when the event is set, there is always a thread waiting for it.
No, you don't know that. A thread may be suspended indefinitely for some reason just before the NtWaitForSingleObject system call.
Since the waiting threads probably do not get a chance to run while main thread setting event in the loop.
If a thread is waiting for an object, it doesn't run at all - that's the whole point of being able to block on a synchronization object.
Can they still be notified and reset the event to nonsignaled? If the event is not reset, the following SetEvent in the loop is obviously useless.
The thread that sets the event is the one that resets the signal state back to 0, not the thread that gets woken up. Of course, if there's no thread waiting the signal state won't be reset.
Or the OS kernel knows which thread should be notified when an event is set, and reset this event immediately if there is a waiting thread.
Yes, the kernel does know. Every dispatcher object has a wait list, and when a thread waits on an object it pushes a wait block onto that list.
In a word? No.
There's no guarantee that each and every call to Set() will signal a waiting thread. MSDN describes this behavior as follows:
There is no guarantee that every call to the Set method will release a thread from an EventWaitHandle whose reset mode is EventResetMode::AutoReset. If two calls are too close together, so that the second call occurs before a thread has been released, only one thread is released. It is as if the second call did not happen. Also, if Set is called when there are no threads waiting and the EventWaitHandle is already signaled, the call has no effect.
(Source)
If you want to ensure that a specific number of threads will be signaled, you should use a more suitable kind of synchronization primitive, such as a Semaphore.
When you call SetEvent(event), since manual reset is FALSE for the event, one of the four threads (Windows doesn't specify any preference) gets past WaitForSingleObject(), and on the subsequent calls the remaining three threads are selected one at a time, since the event auto-resets after releasing each thread.
If the threads loop back and wait again, the thread released each time would again be any one of the four, at the OS's choice.