Creating a thread pool for IOCP - Windows

Regarding IOCP and thread pools: should I use a specific function to create a "thread pool"?
Right now I create my "thread pool" by calling CreateThread as many times as the number of threads I want in the pool.
I then assumed that all the threads created with CreateThread should call GetQueuedCompletionStatus, and that this call is what actually forms the "thread pool" (by associating the threads with a specific port). Is the thread pool simply the set of threads associated with the IOCP?

How to allocate a thread pool while using asyncio ProactorEventLoop

I'm currently using asyncio in Python 3.7 to write a TCP server with the asyncio.start_server() function, following this example: https://docs.python.org/3/library/asyncio-stream.html
I am also trying asyncio.ProactorEventLoop, which uses "I/O Completion Ports" (IOCP).
According to this official Microsoft doc, https://learn.microsoft.com/en-ca/windows/win32/fileio/i-o-completion-ports, I/O completion ports are used with a pre-allocated thread pool, but I cannot find where to set the number of threads.
Where can I set the number of threads in the thread pool?
Can anyone please help me here? Thanks a lot!
First, some general info about I/O completion ports (IOCP) and thread pools. We have two options here:
Create everything yourself:
create the IOCP yourself via CreateIoCompletionPort (or NtCreateIoCompletion);
create the threads yourself; they will call GetQueuedCompletionStatus (or NtRemoveIoCompletion);
bind every file you need to your IOCP, again yourself, via NtSetInformationFile with FileCompletionInformation and a FILE_COMPLETION_INFORMATION, or via CreateIoCompletionPort (this Win32 API combines the functionality of NtCreateIoCompletion and NtSetInformationFile).
Use the system IOCP(s) and thread pool(s).
The system (ntdll.dll) creates a default thread pool (now named TppPoolpGlobalPool) at process startup. You have only weak control over this pool: you cannot get a direct PTP_POOL pointer to it. There is an undocumented TpSetDefaultPoolMaxThreads (to set the maximum number of threads in this pool), but nothing for the minimum.
If you want, you can create additional thread pools via the CreateThreadpool function.
After creating the new thread pool, you can (but need not) call SetThreadpoolThreadMaximum to specify the maximum number of threads that the pool can allocate and SetThreadpoolThreadMinimum to specify the minimum number of threads available in the pool.
The thread pool maintains an I/O completion port. The IOCP is created inside the call to CreateThreadpool; we have no direct access to it.
So initially a process has one global/default thread pool (TppPoolpGlobalPool) and IOCP. (On Windows 10 the parallel loader creates one more thread pool, LdrpThreadPool, but that is of course only for internal use, while DLLs are loading.)
Finally, you bind your files to the IOCP by calling CreateThreadpoolIo.
Note that the MSDN documentation is wrong here:
Creates a new I/O completion object.
In reality, the CreateThreadpoolIo function does not create a new I/O completion object; that is created only inside the call to CreateThreadpool. This API binds a file (not a handle, but the file!) to the I/O completion object associated with a pool. Which pool? Look at the last parameter: an optional pointer to a TP_CALLBACK_ENVIRON.
You can specify a thread pool as follows: allocate a callback environment, call InitializeThreadpoolEnvironment on it, and then call SetThreadpoolCallbackPool.
If you do not specify a thread pool, the global thread pool is used in the call to CreateThreadpoolIo, so the file is bound to the process's default/global IOCP.
In this case you do not need to call GetQueuedCompletionStatus (or NtRemoveIoCompletion) yourself; the system does this for you from the pool and then calls your IoCompletionCallback callback function, the one you pass to the system inside the CreateThreadpoolIo call.
We can also use the system's global thread pool and IOCP via BindIoCompletionCallback (or RtlSetIoCompletionCallback); it associates the I/O completion port owned by the global (TppPoolpGlobalPool) thread pool with the specified file handle. This is an old API and a variant of case 2; here we cannot use a custom pool, only the process-global one.
Now back to the concrete Python code. Which case does it use? Does it create the IOCP and thread pool itself, or does it use the system thread pool? If the system one, does it use the global pool or a custom pool allocated via CreateThreadpool? If you don't know this, nothing can be done here. And even if you do know, either the library exposes a special API/interface for controlling this (in case its own or a custom pool is used), or you can only use it as is. And it is really hard to decide how many threads you actually need in a pool.

I/O Completion Ports: when to increase/decrease the RefCount of the per-socket structure in a multi-threaded design?

I read this question:
I/O Completion Ports *LAST* called callback, or: where it's safe to cleanup things
but I could not get my issue solved; the answer does not fully cover this method.
I have also searched a lot here and on Google but cannot find a solution, so I am opening a question here; I hope it is not a duplicate.
In a multi-threaded I/O Completion Ports design, when should I increase the RefCount of the per-socket structure, i.e. the CompletionKey? Currently I increase it before calling WSARecv, and if the return value of the call is non-zero and the last error is not ERROR_IO_PENDING, I decrease it and call a cleanup function. That function checks whether the RefCount is 0; if it is, it frees the per-socket structure, otherwise it only frees the per-I/O structure (the one containing the OVERLAPPED). I also increase it before issuing any WSASend, in the same way as above. Access to this RefCount is serialized with a CRITICAL_SECTION. Upon returning from GetQueuedCompletionStatus I also decrease the RefCount.
However, I have some questions about this method.
I have a function that sends files from the main thread. The function reads the file and issues a PostQueuedCompletionStatus to do a send using WSASend through the I/O worker threads; the function sends the file in chunks, and when each chunk completes, the I/O worker threads inform the main thread with PostMessage to issue the send of the next chunk.
Now where am I supposed to increase this RefCount? In the main thread, just before issuing the call to PostQueuedCompletionStatus? But what if GetQueuedCompletionStatus returns and frees the per-socket structure while the main thread is still using it (for example, the main thread is executing the send function but has not yet increased the RefCount)? I tried increasing the RefCount in the WSASend call in the I/O worker threads, but it has the same issue.
For instance: what if a thread wakes up from GetQueuedCompletionStatus with a socket closure (caused by the outstanding WSARecv), decrements the RefCount, and it becomes 0, so it frees the per-socket structure, while a WSASend is about to execute in another I/O worker thread that has not yet increased the RefCount? Then the thread that is about to issue the WSASend call will obviously crash with an access violation as soon as it tries to enter the critical section.
Any ideas on how to synchronize access to this structure between the I/O worker threads and the main thread?

Is there a way to create a new thread with space allocated for that thread, but defer the execution, in C++11?

Let's say I want to create a thread; I want the necessary space allocated for the thread, but I'd like to defer launching it.
I'm working on a thread pool, so I'd like to have some threads ready (but not running) before I start the thread pool.
Is there a way to do so in C++11?
You could have all the threads wait on a semaphore as soon as they start up, and then just signal them when it's time for them to actually start running.
This sounds similar to the "Thread Pool / Task" behavior present in a number of languages (and several C++ libraries, such as Boost). A thread pool has one or more threads and can queue tasks. When it has no tasks, a thread pool just waits for input. As implied, it can also queue up tasks while the threads are busy.

Is it possible to share the same message queue between two threads?

What I would like to do is have one thread waiting for messages (WaitMessage) and another processing the logic of the application. The first thread would wake up on every message, somehow signal this event to the other thread, go to sleep again, and so on. Is this possible?
UPDATE
Consider the following situation. We have a GUI thread, and this thread is busy in a long calculation. If there is no other thread, there is no option but to check for new messages from time to time; otherwise, the GUI would become unresponsive during the long calculation. Right now my system uses this "polling" approach (it has a single thread that checks the message queue from time to time). However, I would like to know whether this other solution is possible: have another thread wait on the OS message queue of the GUI thread, so that when a Windows message arrives this thread wakes up and tells the other about the message. Note that I'm not asking how to communicate the news between threads, but whether it is possible for the second thread to wait for OS messages that arrive in the queue of the first thread.
I should also add that I cannot have two different threads, one for the GUI and another for the calculations, because the system I'm working on is a virtual machine on top of which runs a Smalltalk image that is not thread safe. That's why having a thread that only signals new OS messages would be the ideal solution (if possible).
This depends on what the second thread needs to do once the first thread has received a message.
If the second thread simply needs to know the first thread received a message, the first thread could signal an Event object using SetEvent() or PulseEvent(), and the second thread could wait on that event using WaitForSingleObject().
If the second thread needs data from the first thread, it could use an I/O Completion Port. The first thread could wrap the data inside a dynamically allocated struct and post it to the port using PostQueuedCompletionStatus(), and the second thread could wait for the data using GetQueuedCompletionStatus() and then free it when done using it.
Update: based on new information you have provided, it is not possible for one thread to wait on or service another thread's message queue. Only the thread that created and owns the queue can poll messages from its queue. Each thread has its own message queue.
You really need to move your long calculations to a different thread, they don't belong in the GUI thread to begin with. Let the GUI thread manage the GUI and service messages, do any long-running things in another thread.
If you can't do that because your chosen library is not thread safe, then you have 4 options:
find a different library that is thread safe.
have the calculations poll the message queue periodically when running in the GUI thread.
break up the calculations into small chunks that can be triggered by the GUI thread posting messages to itself. Post a message and return to the message loop. When the message is received, do a little bit of work, post the next message, and return to the message loop. Repeat as needed until the work is done. This allows the GUI thread to continue servicing the message queue in between each calculation step.
move the library to a separate process that communicates back with your main app as needed.

MFC CEvent class member function SetEvent , difference with Thread Lock() function?

What is the difference between SetEvent() and a thread Lock() function? Can anyone please help me?
Events are used when you want to start/continue processing once a certain task is completed i.e. you want to wait until that event occurs. Other threads can inform the waiting thread about the completion of this task using SetEvent.
On the other hand, a critical section is used when you want only one thread to execute a block of code at a time, i.e. you want a set of instructions to be executed by one thread without any other thread changing the state in the meantime. For example, inserting an item into a linked list involves multiple steps, and during that time you don't want another thread to come and try to insert another object into the list. So you block the other thread until the first one finishes, using critical sections.
Events can be used for inter-process communication, i.e. synchronising activity among different processes. They are typically used for "signalling" the occurrence of an activity (e.g. a file write has finished). More information on events:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686915%28v=vs.85%29.aspx
Critical sections can only be used within a process for synchronizing threads and use a basic lock/unlock concept. They are typically used to protect a resource from multi-threaded access (e.g. a variable). They are very cheap (in CPU terms) to use. The inter-process variant is called a Mutex in Windows. More info:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682530%28v=vs.85%29.aspx
