WSASend send order across multiple sockets - windows

If I have one socket, and I do overlapped WSASends (from a single thread), then windows guarantees data will be sent in the order I call WSASend.
But if I have two sockets (connected to the same receiver), and I do overlapped WSASend's on them alternately (again from a single thread), does windows guarantee the order of sends is the order in which I call WSASend? If the answer is no, then would it help if I set the SO_SNDBUF socket option to zero on both sockets (so that calling WSASend would immediately put it on the wire)?
I also have the same question for WSARecv. If from a single thread I do overlapped WSARecv's on multiple sockets, will the completion routines be called in order?

Related

GetOverlappedResultEx will create a thread to process on or do I have to create and sync the threads?

Trying to understand how this works... do I have to create various threads to take advantage of the functionality for GetOverlappedResultEx? However why couldn't I just put GetOverlappedResult in a separate thread from the main thread to handle blocking of the IO and not interfere with main operations?
GetOverlappedResult function
https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-getoverlappedresult
Retrieves the results of an overlapped operation on the specified file, named pipe, or communications device. To specify a timeout interval or wait on an alertable thread, use GetOverlappedResultEx.
https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-getoverlappedresultex
Retrieves the results of an overlapped operation on the specified file, named pipe, or communications device within the specified time-out interval. The calling thread can perform an alertable wait.
https://learn.microsoft.com/en-us/windows/win32/fileio/alertable-i-o
You handle threads, for concurrency, yourself.
There are basically three ways to do it:
1. Having initiated an overlapped (i.e., async completion) I/O operation, you do something else and then every once in a while poll the handle to see if the overlapped operation has completed. This is how you can use GetOverlappedResult with bWait set to FALSE: while the operation isn't done yet it fails and GetLastError returns ERROR_IO_INCOMPLETE (STATUS_PENDING is the value that sits in the OVERLAPPED structure's Internal field, which the HasOverlappedIoCompleted macro checks).
2. You sit around waiting for an overlapped operation to complete. But it's not as bad as that, because you can actually sit around waiting for any of a set of overlapped operations to complete. As soon as any one completes you handle it, and then loop around to wait for the rest. Handling it, of course, may fire off another async operation, in which case you add that handle to the list. This is where you use WaitForSingleObject{Ex} or better WaitForMultipleObjects{Ex}.
3. You use I/O Completion ports. Here you pass some handles to a kernel object called an I/O Completion port - this kernel object cleverly pairs a queue of completion notifications with a pool of threads that service it (you supply the threads, or let the system thread pool do it and invoke your callbacks). It is a very efficient way of dealing with multiple - in fact, very many - async operations in-flight simultaneously. In these callbacks you can do whatever you want, including initiating more async operations and adding them to the same I/O Completion port.
There is also a fourth concept: alertable I/O, which executes a callback on an "APC" on your thread that initiated the I/O, provided your thread is in an "alertable" state - which means it is executing one or another of certain APIs that wait in the kernel. But I've never used it, as it seems to have drawbacks (such as only working on the thread that initiated the I/O, and that the environment the callback runs in isn't as clear as it could be), and if you're going to go that far just figure out I/O Completion ports and use them.
Options #2 and #3 of course involve concurrent programming - so in both cases you have to make sure your callbacks are thread-safe with respect to your other threads.
There are plenty of examples of all these methods out there on the intertubes.

Windows socket completion routine callback after closesocket

While busy working with Windows Sockets in overlapped mode and using Completion routines (so no IOCP) for feedback I found the following curious case:
Open a server socket using listen and AcceptEx.
Connect a client socket on said port using ConnectEx.
We now have (at least) 3 sockets: 1 listening socket, a client connected socket and a server connected socket.
After transferring some data we shut down both the server and client connected sockets with shutdown. After this step both sockets are closed with closesocket.
Currently: just to be sure we have no pending completion routine I issue the following (pseudocode):
while SleepEx( 0, TRUE ) == WAIT_IO_COMPLETION do ;
I thought it would now be safe to free the memory of the OVERLAPPED structures used by WSARecv and WSASend.
After this moment, when the thread enters an alertable state again, another completion routine callback is made for the server connected socket with error 10053, but using the OVERLAPPED structure we just freed. This is a use after free.
Question:
When can you be sure no completion callbacks are issued anymore for a socket using overlapped IO using completion routines?
You need to wait for the I/O completion (closing the socket will cancel outstanding requests and you will get a completion callback).
The OS has ownership of the OVERLAPPED structure and associated buffer until you synchronize on event completion (by waiting for the hEvent or receiving an APC). You cannot do anything with the buffer until you receive this callback, and you definitely must not free it. Wait for the OS to tell you it is no longer needed.
Note that cancellations don't necessarily cause completion immediately, because the driver may be synchronizing with hardware requests and only marks the IRP complete when the hardware state changes. (This would be necessary if DMA is in use but might be done for other operations just for consistency.) So the SleepEx loop you showed is not guaranteed to collect all cancellations.
Keep track for each socket of the pending operations, and use WaitForSingleObjectEx instead of SleepEx, to wait explicitly for each one.
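A minimal sketch of the "keep track of the pending operations" bookkeeping, written as portable C++ so it runs anywhere. PendingOps and replay_close_sequence are hypothetical names for illustration, not Winsock APIs. In real code, issued() would follow each WSARecv/WSASend that returns 0 or fails with WSA_IO_PENDING, and completed() would be called from the completion routine.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-socket bookkeeping; none of these names are Winsock APIs.
class PendingOps {
public:
    // Call right after an overlapped operation has been issued successfully,
    // i.e. whenever a completion routine is now owed for it.
    void issued() { ++outstanding_; }

    // Call from every completion routine, including the ones that report an
    // error (such as 10053) after the socket has been closed.
    void completed() {
        assert(outstanding_ > 0);
        --outstanding_;
    }

    // Call after closesocket().
    void closed() { closed_ = true; }

    // OVERLAPPED structures and buffers may be released only when the socket
    // is closed and no completion is still owed.
    bool safe_to_free() const { return closed_ && outstanding_ == 0; }

    std::size_t outstanding() const { return outstanding_; }

private:
    std::size_t outstanding_ = 0;
    bool closed_ = false;
};

// Replays the sequence from the question: two operations in flight, the
// socket is closed, and the two aborted completions arrive one at a time.
bool replay_close_sequence() {
    PendingOps ops;
    ops.issued();                      // e.g. a WSARecv
    ops.issued();                      // e.g. a WSASend
    ops.closed();
    bool early = ops.safe_to_free();   // false: two completions still owed
    ops.completed();
    bool middle = ops.safe_to_free();  // false: one completion still owed
    ops.completed();
    bool late = ops.safe_to_free();    // true: nothing outstanding any more
    return !early && !middle && late;
}
```

Assuming this bookkeeping, the cleanup loop in the question could become something like `while (!ops.safe_to_free()) SleepEx(INFINITE, TRUE);`, which keeps the thread alertable until every completion routine that is still owed has run.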

Does an IO completion port spawn a new thread before or after the completion port has something to report?

I am a bit confused as to what actually happens when an IO completion port completes.
I presume that the Win API allows access to an IOCP queue that somehow is able to queue (or stack) a callback reference with a specific handle (let's say a socket).
When windows receives an interrupt from the NIC, then it at some point gets to the IOCP queue for the NIC and executes the callbacks on its own (IOCP) thread pool.
My question is, is this thread from the thread pool spawned upon the interrupt being received, or is it in fact spawned when the call to the Win API is made, effectively having the thread in a wait state until it is then woken by the IOCP queue?
EDIT:
I found this: http://msmvps.com/blogs/luisabreu/archive/2009/06/04/multithreading-i-o-and-the-thread-pool.aspx where it states: "Whenever that operation completes, it will queue a packet on that I/O completion port. The port will then proceed and use one of the thread pool’s thread to run the callback you’ve specified."
It's probably easier to think of an I/O completion port simply as a thread safe queue that the operating system places the results of overlapped operations into for you when they have completed.
You create the IOCP, you then create some threads and these threads call a function to remove items from this queue. Generally this is GetQueuedCompletionStatus(). This function essentially blocks your thread until there's something in the IOCP (queue) and then allows your thread to retrieve that something and run.
You associate file handles and sockets with the IOCP and this simply means that once associated their overlapped completions will be placed in the IOCP (queue) for you.
It's more complex than that, but that's the way you should be thinking.
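To make the "thread safe queue" picture concrete, here is a rough portable C++ model, not the Win32 API. CompletionQueueModel, CompletionPacket and run_model are made-up names; post() stands in for the OS queuing a completion, and wait() stands in for GetQueuedCompletionStatus(). Note that the worker threads are created up front and simply block until something arrives, which is the behaviour the question asks about.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a completion packet.
struct CompletionPacket {
    int key;            // identifies the socket/handle (like a completion key)
    std::size_t bytes;  // bytes transferred by the finished operation
    bool quit;          // sentinel telling a worker thread to exit
};

// Portable model of the queue inside a completion port. Not the Win32 API.
class CompletionQueueModel {
public:
    // Plays the role of the OS queuing a packet when an operation completes.
    void post(const CompletionPacket& p) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(p);
        }
        cv_.notify_one();
    }

    // Plays the role of GetQueuedCompletionStatus(): blocks until a packet
    // is available, then hands it to the calling thread.
    CompletionPacket wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        CompletionPacket p = q_.front();
        q_.pop_front();
        return p;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<CompletionPacket> q_;
};

// Starts `workers` threads up front, posts `packets` completions of 10 bytes
// each, and returns the total number of bytes the workers processed.
std::size_t run_model(int workers, int packets) {
    assert(workers > 0);
    CompletionQueueModel port;
    std::mutex total_mutex;
    std::size_t total = 0;

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back([&] {
            for (;;) {
                CompletionPacket p = port.wait();  // thread sleeps here
                if (p.quit) return;
                std::lock_guard<std::mutex> lock(total_mutex);
                total += p.bytes;
            }
        });
    }

    for (int i = 0; i < packets; ++i) port.post({i, 10, false});
    for (int i = 0; i < workers; ++i) port.post({-1, 0, true});  // one sentinel per worker
    for (auto& t : pool) t.join();
    return total;
}
```

Calling run_model(3, 100) should return 1000 under these assumptions: 100 packets of 10 bytes each, shared among three threads that were already waiting before the first packet was posted.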

Listening to multiple sockets: select vs. multi-threading

A server needs to listen to incoming data from several sockets (10-20). After some initializations, those sockets are created and do not change (i.e. no new sockets accepted, and none of them is expected to close during the lifetime of the server).
One option is to select() on all sockets, then deal with incoming data per socket (i.e. route to proper handling function).
Another option is to open one thread per socket and let each thread recv() and handle the input.
(The first option has the benefit of setting a timeout, but this is not an issue in this case, since all the sockets are quite active).
Assuming the following: Windows server, has enough memory such that 20MB (for the 20 threads) is a non-issue, is any of those options expected to be faster than the other?
There's not much in it in your app. Typically, using a thread-per-socket is easier than asynchronous approaches because it's a simpler overall structure and it's easier to maintain state.

Multiple Socket client connecting to a server

I am designing a simulator application where the application launches multiple socket connections (around 1000 connections) to a server. I don't want to launch that many threads to handle those connections, since the system can't handle that many clients. Using select doesn't make sense, since I need to loop through 1000 connections, which may be slow. Please suggest how to handle this scenario.
You want to be using asynchronous I/O with an I/O Completion Port (IOCP).
It's too much to explain briefly, but any Windows application that needs to support a large number of concurrent sockets should be using an IOCP.
An IOCP is essentially a Windows-provided thread safe work queue. You queue a 'completion packet' to an IOCP and then another thread dequeues it and does work with it.
You can also associate many types of handles that support overlapped operations, such as sockets, to an IOCP. When you associate a handle with an IOCP, overlapped operations such as WSARecv will automatically post a completion packet to the associated IOCP.
So, essentially, you could have one thread handling all 1000 connections. Each socket will be created as an overlapped socket and then associated with your IOCP. You can then call WSARecv on all 1000 sockets and wait for a completion packet to become available. When data is received, the operating system will post a completion packet to the associated IOCP. This will contain relevant information, such as how much data was read and the buffer containing the data.
Looping through 1000 handles is still significantly faster than sending 1000 packets, so I wouldn't worry about performance here. select() is still the way to go.

Resources