I'm currently using asyncio in Python 3.7 and have written a TCP server using the asyncio.start_server() function,
following this example: https://docs.python.org/3/library/asyncio-stream.html
I also tried asyncio.ProactorEventLoop, which uses “I/O Completion Ports” (IOCP).
According to the official Microsoft documentation (https://learn.microsoft.com/en-ca/windows/win32/fileio/i-o-completion-ports), I/O completion ports are used with a pre-allocated thread pool, but I cannot find where to set the number of threads.
Where can I set the number of threads in the thread pool?
Can anyone please help me here? Thanks a lot!
First, some general information about I/O completion ports (IOCP) and thread pools. We have 2 options here:
1. Create everything yourself:
Create the IOCP yourself via CreateIoCompletionPort (or NtCreateIoCompletion).
Create the threads yourself; they will call GetQueuedCompletionStatus (or NtRemoveIoCompletion).
Bind every file you need to your IOCP, again yourself, via NtSetInformationFile with FileCompletionInformation and FILE_COMPLETION_INFORMATION, or via CreateIoCompletionPort (this Win32 API combines the functionality of NtCreateIoCompletion and NtSetInformationFile).
2. Use the system IOCP(s) and thread pool(s).
The system (ntdll.dll) creates a default thread pool (currently named TppPoolpGlobalPool) at process startup. You have only weak control over this pool: you cannot get a direct PTP_POOL pointer to it. There is an undocumented TpSetDefaultPoolMaxThreads (to set the maximum number of threads in this pool), but nothing for the minimum.
If you want, you can create additional thread pools via the CreateThreadpool function.
After creating the new thread pool, you can (but do not have to!) call SetThreadpoolThreadMaximum to specify the maximum number of threads that the pool can allocate and SetThreadpoolThreadMinimum to specify the minimum number of threads available in the pool.
The thread pool maintains an I/O completion port. The IOCP is created inside the CreateThreadpool call; we have no direct access to it.
So initially a process has one global/default thread pool (TppPoolpGlobalPool) and its IOCP (on Windows 10 the parallel loader creates one more thread pool, LdrpThreadPool, but this is of course only for internal use, while DLLs are loading).
Finally, you bind your own files to the IOCP by calling CreateThreadpoolIo.
Note that the MSDN documentation is wrong here:
"Creates a new I/O completion object."
In reality the CreateThreadpoolIo function does not create a new I/O completion object; that object is created only inside the CreateThreadpool call. This API binds a file (not a handle, but the file!) to the I/O completion object associated with the pool. Which pool? Look at the last parameter, an optional pointer to a TP_CALLBACK_ENVIRON.
You can specify a thread pool as follows: allocate a callback environment, call InitializeThreadpoolEnvironment on it, and then call SetThreadpoolCallbackPool.
If you do not specify a thread pool, the global thread pool will be used in the CreateThreadpoolIo call, so the file will be bound to the default/global process IOCP.
In this case you do not need to call GetQueuedCompletionStatus (or NtRemoveIoCompletion) yourself: the system does this for you from the pool and then calls your IoCompletionCallback callback function, which you passed to the system in the CreateThreadpoolIo call.
We can also use the system global thread pool and IOCP via BindIoCompletionCallback (or RtlSetIoCompletionCallback): it associates the I/O completion port owned by the global (TppPoolpGlobalPool) thread pool with the specified file handle. This is an old API and a variant of case 2. Here we cannot use a custom pool, only the process-global one.
Now let's get back to the concrete Python code. Which case does it use? Does it create the IOCP and thread pool itself, or does it use a system thread pool? If it uses a system one, is it the global pool or a custom thread pool allocated by CreateThreadpool? If you don't know this, nothing can be done here. And even if you do know: either this library has a special API/interface (or whatever this is called in Python) for controlling this (in case its own or a custom pool is used), or you can only use it as is. And it is really hard to decide how many threads you actually need in the pool.
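To make the Python side of the question concrete, here is a minimal sketch of an asyncio.start_server() echo server. It does not settle which of the cases above asyncio uses internally. It only shows the one thread count that asyncio itself exposes: the size of the default executor used by run_in_executor(), while the stream callbacks run on the single event-loop thread. The names blocking_work, handle_client and main, the loopback address and port 0 are assumptions made for illustration.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def blocking_work(data: bytes) -> bytes:
    # Stand-in for blocking or CPU-heavy work that should stay off the event loop.
    return data.upper()


async def handle_client(reader, writer):
    data = await reader.readline()
    loop = asyncio.get_running_loop()
    # None selects the loop's default executor, which main() configures.
    result = await loop.run_in_executor(None, blocking_work, data)
    writer.write(result)
    await writer.drain()
    writer.close()


async def main(max_workers: int = 4) -> bytes:
    loop = asyncio.get_running_loop()
    # The only thread count set here: worker threads for run_in_executor().
    loop.set_default_executor(ThreadPoolExecutor(max_workers=max_workers))

    # Port 0 asks the OS for any free port (illustrative setup only).
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # A tiny client so the sketch can be run end to end.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()

    server.close()
    await server.wait_closed()
    return reply
```

Calling asyncio.run(main(max_workers=2)) starts the server, sends one line through it and returns the reply. On Windows, the same code runs unchanged on a ProactorEventLoop.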
Related
I'm trying to use IOCP via the Windows API functions CreateThreadpoolIo and StartThreadpoolIo, but I found that the thread pool only parallelizes the code that runs after the I/O completes. The async I/O submit operations still execute sequentially on the main thread. So why do we need this? I think making the I/O submit operations parallel could improve throughput even though they are async operations, right?
The other cost is that if we make them parallel, we might need to lock something to guarantee data consistency (thread-safe operation).
It is possible to do IOCP without using CreateThreadpool / StartThreadpoolIo; in that case you have to manage calling GetQueuedCompletionStatus yourself (whether in a self-managed thread pool or otherwise; it is even conceivable that it could be interleaved into the actions of the thread that started the I/O, but in that case why bother with IOCP?). StartThreadpoolIo is needed in order to have a thread waiting on GetQueuedCompletionStatus instead of WaitForMultipleObjects (or one of its variants). CancelThreadpoolIo decrements a counter saying how many IOCP operations are outstanding, and if that counter reaches 0 the thread pool knows it can stop waiting on GetQueuedCompletionStatus.
CreateThreadpoolIo creates a TP_IO object and calls ZwSetInformationFile with FileCompletionInformation and FILE_COMPLETION_INFORMATION to set the CompletionContext in the FILE_OBJECT. As a result, when I/O on the file finishes (if no synchronous error was returned and we passed a non-zero ApcContext), the system queues a packet to the I/O port (the one we provided in FILE_COMPLETION_INFORMATION) with the Key (from FILE_COMPLETION_INFORMATION) and the ApcContext (from the concrete I/O call; the Win32 API always passes a pointer to OVERLAPPED here). The address of the user callback (IoCompletionCallback) is stored inside the TP_IO.
StartThreadpoolIo increments the reference count on the TP_IO, and CancelThreadpoolIo (and CloseThreadpoolIo) decrement it. This is needed to manage the lifetime of the TP_IO: before we start any I/O operation, we need to increment the reference count on the TP_IO. When the I/O finishes, a packet is queued to the I/O port. One of the threads from the pool pops this packet, takes the Key (lpCompletionKey), converts it to a pointer to the TP_IO, and calls the user callback IoCompletionCallback. After the callback returns, the system decrements the reference count on the TP_IO. If the I/O fails synchronously, there will be no packet and no callback, so the reference count on the TP_IO must be decremented directly; for this you need to call CancelThreadpoolIo.
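The start/cancel pairing described above can be sketched as a toy model. This is not the Win32 API and not the real TP_IO layout. The class TpIoModel, its methods and the helper issue_io are hypothetical names that only mirror the counting rule: increment before every I/O, decrement after the callback runs, and decrement by hand when the I/O fails synchronously.

```python
import threading


class TpIoModel:
    """Toy model of the count described above (not the real TP_IO)."""

    def __init__(self, callback):
        self._callback = callback
        self._pending = 0
        self._lock = threading.Lock()

    def start(self):
        # Models StartThreadpoolIo: call before issuing each I/O.
        with self._lock:
            self._pending += 1

    def cancel(self):
        # Models CancelThreadpoolIo: the I/O failed synchronously,
        # so no packet will arrive and no callback will run.
        with self._lock:
            self._pending -= 1

    def complete(self, result):
        # Models a pool thread popping the packet and calling the user
        # callback; the count drops only after the callback returns.
        self._callback(result)
        with self._lock:
            self._pending -= 1

    @property
    def pending(self):
        with self._lock:
            return self._pending


def issue_io(io, succeed_async):
    """Hypothetical I/O call: True means a completion packet will be queued."""
    io.start()
    if not succeed_async:
        io.cancel()  # balance the start(), since no callback will follow
        return False
    return True
```

In this model, forgetting cancel() after a synchronous failure leaves pending above zero forever, which is the lifetime problem the paragraph above warns about.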
Windows Context Switching
The scheduler maintains a queue of executable threads for each
priority level. These are known as ready threads. When a processor
becomes available, the system performs a context switch. The steps in
a context switch are:
Save the context of the thread that just finished executing.
Place the thread that just finished executing at the end of the queue for its priority.
Find the highest priority queue that contains ready threads.
Remove the thread at the head of the queue, load its context, and execute it.
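The four steps quoted above can be sketched as a small simulation. This is only a model of the described queueing rule, not how the Windows kernel is implemented. The class ToyScheduler and the dictionary used as a "context" are assumptions made for illustration.

```python
from collections import deque


class ToyScheduler:
    def __init__(self, levels):
        # One FIFO queue of ready threads per priority level;
        # a higher index means a higher priority.
        self.ready = [deque() for _ in range(levels)]
        self.saved_context = {}

    def add(self, name, priority):
        self.ready[priority].append((name, priority))

    def switch(self, current=None, context=None):
        if current is not None:
            name, priority = current
            # Step 1: save the context of the thread that just ran.
            self.saved_context[name] = context
            # Step 2: put it at the end of the queue for its priority.
            self.ready[priority].append(current)
        # Step 3: find the highest-priority queue with ready threads.
        for ready_queue in reversed(self.ready):
            if ready_queue:
                # Step 4: take the head of that queue and load its context.
                thread = ready_queue.popleft()
                return thread, self.saved_context.get(thread[0])
        return None, None
```

With two threads at the same priority, repeated calls to switch() alternate between them and hand each one back the context it was saved with, while a lower-priority thread stays queued.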
I don't know much about the topic yet, so I don't know how to elaborate on my question. Where is a thread's context saved, and can it be accessed (edit: read) programmatically (without modifying the kernel)?
If you have a handle to a thread with the required access rights, you can suspend the thread and then call GetThreadContext. When a thread is running, the values are in the real CPU registers; when it is not running, the context is stored in memory that is not accessible from user mode.
The context stores the values of various CPU registers; it is only useful to debuggers and to advanced features like code injection and error logging.
Regarding IOCP and Threadpools, should I use a specific function to create a "thread pool"?
Right now I create my "thread pool" by calling CreateThread as many times as the number of threads I want in the pool.
I then assumed that all the threads created with CreateThread should call GetQueuedCompletionStatus, and that this is the call that actually creates the "thread pool" (by associating the threads with a specific port). Is the thread pool simply the threads associated with the IOCP?
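The arrangement described in this question can be modeled in a few lines. This is an analogy only, not the Win32 API: a queue.Queue stands in for the completion port, a blocking get() stands in for GetQueuedCompletionStatus, and the None shutdown packet is a hypothetical convention. In the model, the "pool" is simply whichever threads block on the same queue.

```python
import queue
import threading


def run_pool(num_threads, packets):
    port = queue.Queue()  # stands in for the completion port
    handled = []
    handled_lock = threading.Lock()

    def worker():
        while True:
            packet = port.get()  # plays the role of GetQueuedCompletionStatus
            if packet is None:   # hypothetical shutdown packet
                break
            with handled_lock:
                handled.append(packet)

    # Like calling CreateThread once per thread wanted in the pool.
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for packet in packets:
        port.put(packet)  # plays the role of a finished I/O queuing a packet
    for _ in threads:
        port.put(None)    # one shutdown packet per worker
    for t in threads:
        t.join()
    return handled
```

Which worker handles which packet is not deterministic, so callers should not rely on the order of the returned list.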
What is the difference between SetEvent() and the thread Lock() function? Can anyone please help me?
Events are used when you want to start/continue processing once a certain task is completed, i.e. you want to wait until that event occurs. Other threads can inform the waiting thread about the completion of this task using SetEvent.
On the other hand, a critical section is used when you want only one thread to execute a block of code at a time, i.e. you want a set of instructions to be executed by one thread without any other thread changing the state at that time. For example, inserting an item into a linked list involves multiple steps; during that time you don't want another thread to come and try to insert one more object into the list. So you use a critical section to block the other thread until the first one finishes.
Events can be used for inter-process communication, i.e. synchronising activity amongst different processes. They are typically used for 'signalling' the occurrence of an activity (e.g. file write has finished). More information on events:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686915%28v=vs.85%29.aspx
Critical sections can only be used within a process for synchronizing threads and use a basic lock/unlock concept. They are typically used to protect a resource from multi-threaded access (e.g. a variable). They are very cheap (in CPU terms) to use. The inter-process variant is called a Mutex in Windows. More info:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682530%28v=vs.85%29.aspx
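The signalling use of events described above can be sketched with Python's threading.Event, which is used here as a stand-in for a Win32 event inside one process (it does not work across processes). The function names and the list used as a fake file buffer are assumptions made for illustration.

```python
import threading


def write_then_signal(buffer, done):
    buffer.append("file contents")  # stand-in for the actual file write
    done.set()                      # like SetEvent: wake whoever is waiting


def wait_for_write(timeout=5.0):
    buffer = []
    done = threading.Event()  # starts out not signaled
    writer = threading.Thread(target=write_then_signal, args=(buffer, done))
    writer.start()
    # Like WaitForSingleObject on the event: block until it is signaled
    # or the timeout expires. Returns True if the event was signaled.
    signaled = done.wait(timeout)
    writer.join()
    return signaled, buffer
```

No lock is involved here: the event does not protect the buffer, it only tells the waiter that the write has happened.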
Can somebody please explain what the difference is if I do
mutex = createMutex
waitForSingleObject
Release(mutex)
and
event = createEvent
waitForSingleObject
Release(event)
I'm so confused. Can I use both versions for synchronization? Thanks in advance for any help.
You use a mutex to ensure that only one thread of execution can be accessing something. For example, if you want to update a list that can potentially be used by multiple threads, you'd use a mutex:
acquire mutex
update list
release mutex
With a mutex, only one thread at a time can be executing the "update list".
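As a rough sketch of the acquire / update / release pattern, assuming Python's threading.Lock as the mutex: each thread appends the current length of the list, which is a read followed by a write, so the result is only guaranteed to be a gap-free sequence when the two steps happen under the lock. The function name and default sizes are illustrative.

```python
import threading


def append_sequence(num_threads=4, per_thread=500):
    shared = []
    mutex = threading.Lock()

    def worker():
        for _ in range(per_thread):
            with mutex:                    # acquire mutex
                next_value = len(shared)   # update list: read ...
                shared.append(next_value)  # ... then write
            # the mutex is released when the with block exits

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

With the lock in place the returned list is exactly 0, 1, 2, ... in order, no matter how the threads interleave.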
You use a manual reset event if you want multiple threads to wait for something to happen before continuing. For example, you started multiple threads, but they're all paused waiting for some other event before they can continue. Once that event happens, all of the threads can start running.
The main thread would look like this:
create event, initial value false (not signaled)
start threads
do some other initialization
signal event
Each thread's code would be:
do thread initialization
wait for event to be signaled
do thread processing
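The two outlines above can be sketched with threading.Event, which stays set until it is explicitly cleared and so behaves like a manual-reset event for this purpose. The function name, the log list and the tuple labels are assumptions made for illustration.

```python
import threading


def run_with_start_gate(num_threads=3):
    start = threading.Event()  # create event, initially not signaled
    log = []
    log_lock = threading.Lock()

    def worker(n):
        # (thread initialization would go here)
        start.wait()  # wait for the event to be signaled
        with log_lock:
            log.append(("worker", n))  # thread processing

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(num_threads)]
    for t in threads:
        t.start()  # start threads
    with log_lock:
        log.append(("main", "init done"))  # do some other initialization
    start.set()  # signal event: every waiting thread is released
    for t in threads:
        t.join()
    return log
```

Because every worker blocks on the event before logging, the main thread's entry is always first; the order of the worker entries depends on scheduling.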
Yes, both can be used for synchronization but in different ways.
A mutex is a mutual exclusion object and can be acquired by only a single thread at a time. It is used to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code.
An event is an object that can be explicitly set to the signaled state by use of the SetEvent function.