Will GetOverlappedResultEx create a thread to process on, or do I have to create and sync the threads myself?

I'm trying to understand how this works. Do I have to create threads myself to take advantage of GetOverlappedResultEx? And why couldn't I just call GetOverlappedResult on a thread separate from the main thread, so that the blocking I/O wait does not interfere with the main operations?
GetOverlappedResult function
https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-getoverlappedresult
Retrieves the results of an overlapped operation on the specified file, named pipe, or communications device. To specify a timeout interval or wait on an alertable thread, use GetOverlappedResultEx.
https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-getoverlappedresultex
Retrieves the results of an overlapped operation on the specified file, named pipe, or communications device within the specified time-out interval. The calling thread can perform an alertable wait.
https://learn.microsoft.com/en-us/windows/win32/fileio/alertable-i-o

You handle threads, for concurrency, yourself; GetOverlappedResultEx does not create any.
There are basically three ways to do it:
1. Having initiated an overlapped (i.e., asynchronous-completion) I/O operation, you do something else and every once in a while poll to see whether the operation has completed. This is how you can use GetOverlappedResult with bWait set to FALSE: while the operation is still pending it returns FALSE and GetLastError() returns ERROR_IO_INCOMPLETE. (The HasOverlappedIoCompleted macro makes the same check by testing the OVERLAPPED structure for STATUS_PENDING.)
2. You sit around waiting for an overlapped operation to complete. But it's not as bad as that, because you can actually wait for any of a set of overlapped operations to complete. As soon as any one completes you handle it, and then loop around to wait for the rest. Handling it, of course, may fire off another async operation, in which case you add its event handle to the list. This is where you use WaitForSingleObject{Ex} or, better, WaitForMultipleObjects{Ex}.
3. You use I/O completion ports. Here you associate some handles with a kernel object called an I/O completion port, which queues a completion packet for each finished operation and hands the packets to worker threads. Those threads are either your own, blocked in GetQueuedCompletionStatus, or the system thread pool's, which runs your callbacks (CreateThreadpoolIo). It is a very efficient way of dealing with multiple - in fact, very many - async operations in flight simultaneously. While handling a completion you can do whatever you want, including initiating more async operations on handles associated with the same I/O completion port.
There is also a fourth concept: alertable I/O, which runs a completion callback as an APC (asynchronous procedure call) on the thread that initiated the I/O, provided that thread is in an "alertable" state, meaning it is executing one of certain APIs that wait in the kernel (SleepEx, WaitForSingleObjectEx, GetOverlappedResultEx with bAlertable set, and so on). But I've never used it, as it seems to have drawbacks (such as only working on the thread that initiated the I/O, and the environment the callback runs in not being as clear as it could be), and if you're going to go that far, just figure out I/O completion ports and use them.
Options #2 and #3 of course involve concurrent programming - so in both cases you have to make sure your callbacks are thread-safe with respect to your other threads.
There are plenty of examples of all these methods out there on the intertubes.

Related

Latest Windows threadpool API usage for I/O

I don't understand part of the latest Windows threadpool API. I need help with that.
From the documentation, the recipe to use it for I/O (in my case, for SOCKET) can be summarized as follows:
1. Call CreateThreadpoolIo.
2. Call StartThreadpoolIo. You can find this warning there:
   You must call this function before initiating each asynchronous I/O operation on the file handle bound to the I/O completion object. Failure to do so will cause the thread pool to ignore an I/O operation when it completes and will cause memory corruption.
3. Call the operation on the file handle (e.g., WSARecvFrom). If it fails, call CancelThreadpoolIo. Otherwise, process the result when it is available. WSARecvFrom, when used asynchronously, asks for a WSAOVERLAPPED (that you have to create beforehand) but not for any information that links it to the previous call to StartThreadpoolIo. CancelThreadpoolIo only asks for the PTP_IO, but not for any additional information to derive a specific asynchronous operation.
4. Repeat steps 2 and 3.
5. Call CloseThreadpoolIo to finish. You can find this warning there:
   It may be necessary to cancel threadpool I/O notifications to prevent memory leaks. For more information, see CancelThreadpoolIo.
I usually need it for UDP, so I strive to have several reception operations queued (asynchronous WSARecvFrom operations started) at any given time. That way I don't have to rush to start another reception operation at the beginning of the callback function nor synchronize access to the reception buffers (I can have a pool of them, each one able to contain a datagram, and reissue the reception operation when I finish processing each message; in the interim, other queued operations will keep the receiver busy). Datagrams are independent and self contained. I'm aware that this approach may not be valid for TCP.
StartThreadpoolIo/CancelThreadpoolIo seem to me the source of the problem: StartThreadpoolIo and WSARecvFrom are not directly bound (they don't share any arguments). So:
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed and not any of the pending ones?
You can say, "don't call StartThreadpoolIo concurrently". I can live without several concurrent WSARecvFrom's, but I can't live without concurrent WSARecvFrom and WSASendTo. So I think being unable to have several asynchronous operations at the same time can't be the way the API was designed.
You can say, "call StartThreadpoolIo only once, that will suffice to register the callback; it is an on/off process". But the documentation says:
You must call this function before initiating each asynchronous I/O operation on the file handle...
You can say, "it cancels the operation started by the same thread that just called StartThreadpoolIo". But then the advice of calling CancelThreadpoolIo in the context of calling CloseThreadpoolIo doesn't make sense (I will call CloseThreadpoolIo from the thread that triggers stopping, which will be completely independent from the threads issuing the asynchronous operations; and a single call to CancelThreadpoolIo may not be enough to cancel several operations). Being unable to trigger cancellation from a different thread is a serious limitation, anyway. I'm aware of the existence of CreateThreadpoolCleanupGroup, but my question is more fundamental. I want to understand how this API can be fundamentally right and useful.
You can say "call CreateThreadpoolIo several times, so that you have independent PTP_IO's to work with". It doesn't work. When I call CreateThreadpoolIo a second time, nullptr is returned.
Am I wrong, or is this API awkward? Normally, other asynchronous APIs work with one of these patterns:
Create an operation and receive a handle => call methods passing the handle.
Create a reusable handle => call methods (including starting operations) passing the handle.
The latest Windows threadpool API, in which the handle seems to be implicit, or there are several handles for the same operation (TP_IO, WSAOVERLAPPED, StartThreadpoolIo) and they aren't all explicitly linked together, uses neither of them.
Thank you very much for your help.
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed and not any of the pending ones?
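CancelThreadpoolIo() doesn't cancel I/O. It is the counterpart of StartThreadpoolIo(). StartThreadpoolIo() prepares the thread pool to accept a completion. If the thread pool doesn't expect a completion, it won't wait for it, so you may miss it. If the thread pool expects a completion but the completion never happens, the thread pool may waste resources.
CancelThreadpoolIo() undoes whatever StartThreadpoolIo() did. You call it only when no completion will be queued for the operation you just tried to start, for example when the I/O function fails with an error other than ERROR_IO_PENDING (WSA_IO_PENDING for sockets). It has no effect on operations that are already pending.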

WaitForSingleObject() vs RegisterWaitForSingleObject()?

What are the advantages and disadvantages of using RegisterWaitForSingleObject() instead of WaitForSingleObject()?
The differences that I know of:
RegisterWaitForSingleObject() uses the thread pool already available in the OS.
With WaitForSingleObject(), a thread of my own has to wait (or poll) for the event.
Is the only difference polling vs. automatic notification, or is there a considerable performance advantage to either?
It's pretty straightforward: WaitForSingleObject() blocks a thread. That thread is consuming a megabyte of virtual memory (its default stack reservation) and not doing anything useful with it while it is blocked. It won't wake up and resume doing useful stuff until the handle is signaled.
RegisterWaitForSingleObject() does not block a thread. The thread can continue doing useful work. When the handle is signaled, Windows grabs a thread-pool thread to run the code you specified as the callback. The same code you would have programmed after a WFSO call. There is still a thread involved with getting that callback to run, the wait thread, but it can handle many RWFSO requests.
So the big advantage is that your program can use far fewer threads while still handling many service requests. A disadvantage is that it can take a bit longer for the completion code to start running. And it is harder to program correctly since that code runs on another thread. Also note that you don't need RWFSO when you already use overlapped I/O.
They serve two different code models. With RegisterWaitForSingleObject you get an asynchronous notification callback on an arbitrary thread from the thread pool managed by the OS. If you can structure your code like this, it might be more efficient. On the other hand, WaitForSingleObject is a synchronous wait call that blocks (and thus "occupies") the calling thread. In most cases, such code is easier to write and is probably less prone to deadlocks and race conditions.

What happens when an async_write() operation never ends and there is a strand involved?

I know that the next async_write()'s should be performed when the previous one finished (with or without errors, but when it finished).
I would like to know what happens if, while making async_write() calls, one of them takes a long time for some reason or never ends (I assume there are no timeouts here as there are in synchronous operations). When will this operation be considered failed? When the never-ending operation is finally removed by the OS internally?
Or are there timeouts involved, and my assumptions are wrong?
I mean, the write operation is sent to the OS and could possibly block indefinitely?
In that case the handler is never called, and the next async_write() calls are never made.
NOTE: I am assuming that we are calling run() in several threads but the write operations should be sent in order so I am also assuming that the write handlers are wrapped with a strand.
Thank you for your time.
There are no explicit timeouts for asynchronous operations, but they can be cancelled through the I/O object's cancel() member function. These operations will be considered as having failed only when the underlying OS call itself fails in a manner where a retry cannot reasonably occur. For example, if the write fails with:
EINTR, then the write will immediately be reattempted.
EWOULDBLOCK, EAGAIN, or ERROR_RETRY, then Boost.Asio will push the operation back into the job queue. This could occur if the write buffer was full, so pushing the operation back into the queue defers its reattempt, allowing other operations to be attempted.
Other errors will cause the operation to fail.
The system call should not block indefinitely. Boost.Asio sets the underlying I/O objects to non-blocking, and provides synchronous blocking write behavior by waiting on the associated file descriptor if a write fails with EWOULDBLOCK, EAGAIN, or ERROR_RETRY.
A strand is not affected by long term asynchronous operations. Strands are used to provide strict sequential invocation of handlers, not the operations themselves. In the case of composed operations, such as boost::asio::async_write, the intermediate handlers will also be invoked through the same strand as the final handler. Overall, this behavior helps provide thread safety, as:
All async_write_some operations initiated from intermediate handlers are within the strand.
The operation itself is not within the strand. This allows other handlers to run while the actual write is occurring.
The user handler will be invoked within the strand.
This answer may provide some more insight into composed operations and strands.
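The error handling described above can be illustrated with a plain POSIX loop. This is not Boost.Asio's code; it is a sketch of the same policy for the synchronous case (retry on EINTR, wait for writability on EAGAIN/EWOULDBLOCK, fail on anything else), and the names set_nonblocking and write_all_nonblocking are made up for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Puts fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Writes all len bytes to a non-blocking fd.
   Returns 0 on success, -1 on an error that cannot be retried (errno is set). */
int write_all_nonblocking(int fd, const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n >= 0) {
            off += (size_t)n;              /* partial writes just advance */
            continue;
        }
        if (errno == EINTR)
            continue;                      /* interrupted: retry immediately */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Buffer full: sleep until the descriptor is writable again. */
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;                         /* any other error fails the write */
    }
    return 0;
}
```

In the asynchronous case the library does not sleep in poll at that point; as described above, it defers the operation and retries it later, which is why a slow peer delays the completion handler rather than blocking a thread.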

How to find out when CancelIo() is done?

CancelIo() is supposed to cancel all pending I/O operations associated with the calling thread. In my experience, CancelIo() sometimes cancels future I/O operations as well. Given:
ReadFile(port, buffer, length, &bytesTransferred, overlapped);
If I invoke CancelIo(port) immediately before the read, GetQueuedCompletionStatus() will block forever, never receiving the read operation.
If I invoke CancelIo(port) immediately after the read, GetQueuedCompletionStatus() will return 0 with GetLastError()==ERROR_OPERATION_ABORTED
If I invoke CancelIo(port) and there are no pending or subsequent reads, GetQueuedCompletionStatus() will block forever.
The key point here is that there is no way to detect when CancelIo() has finished executing. How can I ensure that CancelIo() is done executing and it is safe to issue further read requests?
PS: Looking at http://osdir.com/ml/lib.boost.asio.user/2008-02/msg00074.html and http://www.boost.org/doc/libs/1_44_0/doc/html/boost_asio/using.html it sounds like CancelIo() is not really usable. My customer requires Windows XP support. What are my options?
NOTE: I am reading from a serial port.
CancelIo() works fine. I misunderstood my code.
Upon further investigation it turns out that the code was invoking CancelIo() followed by ReadFile() with a timeout INFINITE. The completion port was never getting notified of the read because the remote end was never sending anything. In other words, CancelIo() did not cancel subsequent operations.
I found some eye-opening documentation here:
Be careful when coding for asynchronous I/O because the system reserves the right to make an operation synchronous if it needs to. Therefore, it is best if you write the program to correctly handle an I/O operation that may be completed either synchronously or asynchronously. The sample code demonstrates this consideration.
It turns out that device drivers may choose to treat an asynchronous operation in a synchronous manner if the data being read is already cached by the device driver. Upon further investigation, I discovered that when CancelIo() was being invoked before ReadFile() it would sometimes cause the latter to return synchronously. I have no idea why the completion port was never getting notified of ReadFile() after a CancelIo() but I can no longer reproduce this problem.
The completion port is signaled regardless of whether ReadFile() is synchronous or asynchronous.
Wait (possibly with a zero timeout) on the event in the OVERLAPPED structure (overlapped->hEvent). It will be set whether the operation completed or was cancelled.
If you're already using overlapped operations, why do you need to cancel I/O at all? The entire concept of "cancelling" an in-flight I/O operation is really race-prone, and totally subject to the underlying device stack you're trying to write to; really the only time you'd want to do this is to unblock another thread that is waiting on the completion of that I/O.
It is possible to write asynchronous I/O code without the CancelIo() function. Whether you need it depends on the scenario in which you are using CancelIo(). Let's say that you need to implement a file-reading thread. Thread pseudo-code:
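for (;;)
{
    // Assumes overlapped->hEvent is a manual-reset event and stopEvent
    // (a hypothetical name) is set by another thread to end the loop.
    if (!ReadFile(port, buffer, length, NULL, overlapped) &&
        GetLastError() != ERROR_IO_PENDING)
        break;                                  // the read failed outright
    HANDLE handles[2] = { stopEvent, overlapped->hEvent };
    DWORD signaled = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
    if (signaled == WAIT_OBJECT_0)              // stop event is signaled
        break;
    if (signaled != WAIT_OBJECT_0 + 1 ||        // the wait itself failed
        !GetOverlappedResult(port, overlapped, &bytesTransferred, FALSE))
        break;
    // overlapped event is signaled: handle ReadFile results
    // (bytesTransferred bytes are now in buffer)
}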
Such a thread reads a file (socket, port, etc.) using overlapped I/O. Most of the time it waits on the WaitForMultipleObjects line. It wakes up when new data is available or when the stop event is signaled. To stop this thread, set the stop event from another thread. CancelIo() is not used.

WaitForSingleObject on a file handle?

What happens when you call WaitForSingleObject() on a handle you've created with CreateFile() or _get_osfhandle()?
For reasons not worth explaining I would like to use WaitForSingleObject() to wait on a HANDLE that I've created with _get_osfhandle(fd), where fd comes from a regular call to _open(). Is this possible?
I have tried it in practice, and on some machines it works as expected (the HANDLE is always in the signaled state because you can read more data from it), and on some machines WaitForSingleObject() will block indefinitely if you let it.
The MSDN page for WaitForSingleObject() says that the only supported things that it handles are "change notifications, console input, events, memory resource notifications, mutex, processes, semaphores, threads, and waitable timers."
Additionally, would it be different if I used CreateFile() instead of _get_osfhandle() on a CRT file descriptor?
Don't do it. As you can see, it has undefined behavior.
Even when the behavior is defined, it's defined in such a way as to be relatively not useful unless you don't like writing additional code. It is signaled when any asynchronous I/O operation on that handle completes, which does not generalize to tracking which I/O operation finished.
Why are you trying to wait on a file handle? Clearly the intent matters when you are doing something that isn't even supported well enough to not block indefinitely.
I found the following links. The consensus seems to be: don't do it.
Asynch IO explorer
Waiting on a file handle
When an I/O operation is started on an asynchronous handle, the handle goes into a non-signaled state. Therefore, when used in the context of a WaitForSingleObject or WaitForMultipleObjects operation, the file handle will become signaled when the I/O operation completes. However, Microsoft actively discourages this technique; it does not generalize if there exists more than one pending I/O operation; the handle would become signaled if any I/O operation completed. Therefore, although this technique is feasible, it is not considered best practice.
Egghead Cafe:
Use ReadDirectoryChangesW in overlapped mode. WaitForSingleObject can wait on the event in the OVERLAPPED struct.
You can also use the API WaitForSingleObject() to wait on a file change if you use the following change notification function: FindFirstChangeNotification()
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/findfirstchangenotification.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitforsingleobject.asp
An interesting note on "evilness" of ReadDirectoryChangesW:
http://blogs.msdn.com/ericgu/archive/2005/10/07/478396.aspx

Resources