WSASYSCALLFAILURE with overlapped I/O on Windows XP

I hit a bug in my code, which uses WSARecv and WSAGetOverlappedResult on an overlapped socket. Under heavy load, WSAGetOverlappedResult fails with WSASYSCALLFAILURE ('A system call that should never fail has failed'), and my TCP stream is out of sync afterwards, causing mayhem in the upper levels of my program.
So far I have not been able to isolate it to a given set of hardware or drivers. Has anybody hit this issue as well and found a solution or workaround?

How many connections, how many pending recvs, how many outstanding sends? What does perfmon or Task Manager say about the amount of non-paged pool used? How much memory is in the box? Does the problem go away if you run the program on Vista or above? Do you have any LSPs installed?
You could be exhausting non-paged pool and causing a badly written driver to misbehave when it fails to allocate memory. This issue is less likely to bite on Vista or later, as the amount of non-paged pool available has increased dramatically (see http://www.lenholgate.com/blog/2009/03/excellent-article-on-non-paged-pool.html for details). Alternatively, you might be hitting the "locked pages" limit: the OS can only lock a fixed number of pages in memory, and each pending I/O operation locks one or more pages, depending on buffer size and allocation alignment.
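One common mitigation for the locked-pages limit, not mentioned in the answers here but widely used in overlapped-I/O servers, is the "zero-byte read": a pending WSARecv with a zero-length buffer locks no pages, and when it completes you know data has arrived and can issue the real read. A minimal sketch, assuming s is a connected overlapped socket:

    // Sketch: post a zero-byte overlapped read so no pages are locked while
    // the connection is idle; 's' is assumed to be a connected socket
    // created with WSA_FLAG_OVERLAPPED.
    WSABUF zeroBuf = { 0, NULL };       // zero-length buffer
    DWORD flags = 0;
    WSAOVERLAPPED ov = { 0 };
    ov.hEvent = WSACreateEvent();

    int rc = WSARecv(s, &zeroBuf, 1, NULL, &flags, &ov, NULL);
    if (rc == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING)
        ; // handle a genuine failure here

    // When this read completes, data is waiting in the kernel; issue the
    // real WSARecv with an actual buffer, which locks pages only briefly.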

It seems I have solved this issue by sleeping 1 ms and retrying WSAGetOverlappedResult when it reports WSASYSCALLFAILURE.
I had another issue, related to overlapped events firing even though there is no data, which I also had to solve first. The test has now been running for over an hour, with a few WSASYSCALLFAILURE errors handled correctly. Hopefully the overnight test will succeed as well.
@Len: thanks again for your help.
EDIT: The overnight test was successful. My bug was caused by two interdependent issues:
Issue 1: WaitForMultipleObjects in ConnectionSet::select occasionally signals data on an empty socket, causing SocketConnection::readSync to deadlock.
Fix: Do a non-blocking read on the first byte of each packet; reset the ConnectionSet if the socket was empty.
Issue 2: WSAGetOverlappedResult occasionally returns WSASYSCALLFAILURE, causing the TCP stream to go out of sync.
Fix: Retry WSAGetOverlappedResult after a small sleep period; a sketch follows the revision link below.
http://equalizer.svn.sourceforge.net/viewvc/equalizer?view=revision&revision=4649
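A rough sketch of that retry workaround (not the actual Equalizer code; sock and overlapped are illustrative names assumed to belong to the pending WSARecv):

    // Sketch: retry WSAGetOverlappedResult on WSASYSCALLFAILURE.
    DWORD bytes = 0, flags = 0;
    while (!WSAGetOverlappedResult(sock, &overlapped, &bytes, TRUE, &flags))
    {
        if (WSAGetLastError() != WSASYSCALLFAILURE)
            break;      // a different, genuine error: let the caller handle it
        Sleep(1);       // back off for ~1 ms, then ask again
    }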

Related

Call to ExAllocatePoolWithTag never returns

I am having some issues with my virtual HBA driver on Windows Server 2016. I ran the HLK crashdump support test; it failed 3 times out of 10. In those 3 failing tests, the crash dump hangs at 0% while taking a complete dump, kernel dump, or minidump.
By kernel-debugging my code, I found that the call to ExAllocatePoolWithTag() for buffer allocation never actually returns.
Below is the statement that never returns:
pDeviceExtension->pcmdbuf = (struct mycmdrsp *)ExAllocatePoolWithTag(
    NonPagedPoolCacheAligned, pcmdqSignalSize, (ULONG)'TA1');
I searched the web regarding this; however, all of the pages I found focus on this function returning NULL, whereas in my case it never returns at all.
Any help on how to move forward would be highly appreciated.
Thanks in advance.
You can't allocate memory in crash dump mode. You're running at HIGH_LEVEL with interrupts disabled, so you're calling this API at the wrong IRQL.
The typical solution for a hardware adapter is to set the RequestedDumpBufferSize in the PORT_CONFIGURATION_INFORMATION structure during the normal HwFindAdapter call. Then when you're called again in crash dump mode you use the CrashDumpRegion field to get your dump buffer allocation. You then need to write your own "crash dump mode only" allocator to allocate buffers out of this memory region.
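A rough sketch of that approach, using the structure fields named above; the allocator itself and names such as MY_DUMP_BUFFER_SIZE and g_dumpBase are hypothetical:

    // During the normal-mode HwFindAdapter call: ask the port driver to set
    // aside a dump buffer (MY_DUMP_BUFFER_SIZE is a hypothetical constant).
    ConfigInfo->RequestedDumpBufferSize = MY_DUMP_BUFFER_SIZE;

    // In crash dump mode no pool allocation is possible, so carve buffers
    // out of the region handed back via the CrashDumpRegion field. A
    // trivial bump allocator (dump mode is single-threaded, no locking):
    static PUCHAR g_dumpBase;     // start of the dump region
    static SIZE_T g_dumpSize;     // its total size
    static SIZE_T g_dumpUsed;     // bytes handed out so far

    PVOID DumpModeAlloc(SIZE_T size)
    {
        PVOID p;
        size = (size + 7) & ~(SIZE_T)7;     // keep 8-byte alignment
        if (g_dumpUsed + size > g_dumpSize)
            return NULL;                    // region exhausted
        p = g_dumpBase + g_dumpUsed;
        g_dumpUsed += size;
        return p;
    }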
It's a huge pain, especially given that it's difficult or impossible to know how much memory you're ultimately going to need. I usually calculate some minimal configuration overhead (e.g. 1 channel, 8 I/O requests at a time, etc.) and then add in a registry-configurable slush. The only benefit is that the environment is stripped down, so you don't need to be in your all-singing, all-dancing configuration.

On Windows, WSASend fails with WSAENOBUFS

On Windows XP, when I call WSASend in a loop on a non-blocking socket, it fails with WSAENOBUFS.
I have two cases here:
Case 1:
On a non-blocking socket I am calling WSASend. Here is pseudo-code:
while (1)
{
    result = WSASend(...); // buffer size: 1024 bytes
    if (result == SOCKET_ERROR)
    {
        if (WSAGetLastError() == WSAENOBUFS)
        {
            // Wait for some time before calling WSASend again
            Sleep(1000);
        }
    }
}
In this case WSASend returns successfully around 88,000 times. Then it fails with WSAENOBUFS and never recovers, even when retried after some time, as shown in the code.
Case 2:
In order to solve this problem, I referred to this article and, as suggested there, just before the above code I called setsockopt with SO_SNDBUF and set the buffer size to 0 (zero).
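For reference, that call would look something like this (s being the socket used in the loop above):

    // Per the KB article: a zero-byte send buffer makes Winsock send
    // directly from the application's buffer instead of copying into
    // internal TCP buffers.
    int zero = 0;
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, (const char *)&zero, sizeof(zero));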
In this case, WSASend returns successfully around 2,600 times before failing. But after a wait it succeeds again for around 2,600 times, then fails again.
Now I have these questions about both cases:
Case 1:
What factors decide the number 88,000 here?
If the failure was because the TCP send buffer was full, why didn't it recover after some time?
Case 2:
Again, what factors decide the number 2,600 here?
If, as the Microsoft KB article says, it sends directly from the application buffer instead of from internal TCP buffers, why would it fail with WSAENOBUFS?
EDIT:
In the case of asynchronous sockets (on Windows XP), the behavior is stranger still. If I ignore WSAENOBUFS and continue writing to the socket, I eventually get a WSAECONNRESET disconnection, and I am not sure at the moment why that happens.
The values are undocumented and depend on what's installed on your machine that may sit between your application and the network driver. They're likely linked to the amount of memory in the machine. The limits (most probably non-paged pool memory and the I/O page-lock limit) are likely MUCH higher on Vista and above.
The best way to deal with the problem is to add application-level flow control to your protocol, so that you don't assume you can just send at whatever rate you feel like. See this blog posting for details of how non-blocking and async I/O can cause resource usage to balloon, and how you have no control over it unless you have your own flow control.
In summary, never assume that you can just write data to the wire as fast as you like using non-blocking/async APIs. Remember that, due to how TCP/IP's internal flow control works, you COULD be using an uncontrollable amount of local machine resources, and the client is the only thing that has any control over how fast those resources are released back to the OS on the server machine.
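As a minimal sketch of what such application-level flow control can look like (all names and the 256 KB limit are illustrative, not from any particular library): track how many send bytes are in flight and stop posting new WSASends until completions drain the count.

    // Sketch: a simple application-level send throttle for async sends.
    #define MAX_PENDING_SEND_BYTES (256 * 1024)   // arbitrary, tunable limit

    static volatile LONG g_pendingSendBytes = 0;

    BOOL CanSendNow(void)
    {
        return g_pendingSendBytes < MAX_PENDING_SEND_BYTES;
    }

    void OnSendQueued(DWORD len)        // call when a WSASend is posted
    {
        InterlockedExchangeAdd(&g_pendingSendBytes, (LONG)len);
    }

    void OnSendCompleted(DWORD len)     // call from the completion handler
    {
        InterlockedExchangeAdd(&g_pendingSendBytes, -(LONG)len);
    }

The sender checks CanSendNow() before posting and otherwise queues the data (or pushes back on its producer), so the non-paged pool and locked pages consumed by pending sends stay bounded.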

When does Windows release threads?

Our product consumes a lot of Windows resources, such as socket handles, memory, threads, and so on. Usually there are 700-900 active threads, but in some cases the product can rapidly create new threads, do some work, and close them.
I came across a crash memory dump of our product. With the ~* windbg command I can see 817 active threads, but when I run the !handle command it prints this summary:
Type Count
None 15
Event 2603
Section 13
File 705
Directory 4
Mutant 32
WindowStation 2
Semaphore 789
Key 208
Process 1
Thread 5766
Desktop 1
IoCompletion 308
Timer 276
KeyedEvent 1
TpWorkerFactory 48
So the process actually holds 5766 threads. My question: when does Windows actually free handles for a process? Is some kind of delay or caching possible? Can someone explain this behavior?
I don't think we have handle leaks, but we do have weird behavior in a legacy part of the system that rapidly creates and closes threads for small tasks. I would also like to point out that we are unlikely to run more than 1000 threads simultaneously; I am pretty sure about this.
Thanks.
When you say "the process holds 5766 threads", what you really mean is that the process holds 5766 thread handles.
Even though a thread may no longer be running, whether as the result of a call to ExitThread()/TerminateThread() or of returning from its ThreadProc, any handles to that thread remain valid. This makes it possible to do things like call GetExitCodeThread() on the handle of a thread that has finished its work.
Unfortunately, that means you have to remember to call CloseHandle() instead of just letting the handle leak. The MSDN example on Creating Threads covers this to some extent.
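A minimal sketch of the pattern, with hypothetical names: the thread handle stays valid after the thread exits, so the exit code can still be read, and CloseHandle() then releases the kernel object:

    DWORD WINAPI Worker(LPVOID arg)
    {
        (void)arg;      // hypothetical task body
        return 0;
    }

    void RunOneTask(void)
    {
        DWORD exitCode = 0;
        HANDLE h = CreateThread(NULL, 0, Worker, NULL, 0, NULL);
        if (h == NULL)
            return;
        WaitForSingleObject(h, INFINITE);
        GetExitCodeThread(h, &exitCode);  // legal even after the thread ended
        CloseHandle(h);                   // without this, Thread handles pile up
    }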
Another thing I will note is that somewhere not too far above 1000 running threads, you are likely to exhaust the virtual address space available to a 32-bit process, since each thread by default reserves 1 MB of address space for its stack.

get_user_pages -EFAULT error caused by VM_GROWSDOWN flag not set

I am continuing my work on the FPGA driver.
Now I am adding OpenCL support, so I have the following test. It just issues NUM_OF_EXEC write and read requests of the same buffers and then waits for completion.
Each write/read request is serialized in the driver and executed sequentially as a DMA transaction. The DMA-related code can be viewed here.
So the driver takes a transaction, executes it (rsp_setup_dma and fpga_push_data_to_device), waits for an interrupt from the FPGA (fpga_int_handler), releases resources (fpga_finish_dma_write), and begins a new one. When NUM_OF_EXEC equals 1, everything seems to work, but if I increase it, a problem appears: at some point get_user_pages (in rsp_setup_dma) returns -EFAULT. Debugging the kernel, I found that the allocated vma does not have the VM_GROWSDOWN flag set (in find_extend_vma in mmap.c). But at this point I am stuck, because I am neither sure I understand why this flag is needed, nor do I have an idea why it is not set. Why can get_user_pages fail with the above symptoms? How can I debug this?
On some architectures the stack grows up, and on others the stack grows down. See hppa and hppa64 for the weirdos that created the need for such a flag.
So whenever you have to set up the stack for a kernel thread or process, you have to provide the direction in which the stack grows as well.
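For context, a sketch of the pinning step the question describes, assuming the pre-4.6 get_user_pages() signature and illustrative names; not the actual driver code:

    #include <linux/mm.h>
    #include <linux/sched.h>

    /* Sketch: pin nr_pages of a user buffer before a DMA transaction. */
    static int pin_user_buffer(unsigned long start, int nr_pages,
                               struct page **pages)
    {
        int got;

        down_read(&current->mm->mmap_sem);
        got = get_user_pages(current, current->mm,
                             start & PAGE_MASK, nr_pages,
                             1 /* write */, 0 /* no force */,
                             pages, NULL);
        up_read(&current->mm->mmap_sem);

        if (got < 0)
            return got;            /* e.g. -EFAULT, as in the question */
        if (got < nr_pages) {      /* partial pin: release and fail */
            while (got--)
                put_page(pages[got]);
            return -EFAULT;
        }
        return 0;
    }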

Windows: TCP/IP: force close connection: avoid memleaks in kernel/user-level

A question for Windows network programming experts.
When I use pseudo-code like this:
reconnect:
    s = socket(...);
    // more code...
read_reply:
    recv(...);
    // merge received data
    if (high_level_protocol_error) {
        // whoops, there was a deviation from the protocol, like an overflow
        // need to reset the connection and discard data right now!
        closesocket(s);
        goto reconnect;
    }
Does the kernel un-associate and "physically" free all data received from the NIC (since it must already be there, in kernel memory, waiting for user level to read it with recv()) when I call closesocket()? Logically it should, since the data is no longer associated with any internal object, right?
I ask because I don't really want to waste an unknown amount of time on a clean shutdown like "call recv() until it returns an error". That makes no sense: what if it never returns an error because, say, the server keeps sending data forever and never closes the connection, bad behaviour though that is?
I'm wondering about this because I don't want my application to cause memory leaks anywhere. Is this way of forcibly resetting a connection, which is still expected to send an unknown amount of data, correct?
// optional addition to the question: if this method is considered correct for Windows, can it be considered correct (with closesocket() changed to close()) for UNIX-compliant OSes?
Kernel drivers in Windows (or any OS, really), including tcpip.sys, are supposed to avoid memory leaks in all circumstances, regardless of what you do in user mode. I would expect the developers to have charted the possible states, including error states, to make sure resources aren't leaked. As for user mode, I'm not entirely sure, but I wouldn't think resources are leaked in your process either.
Sockets are just file objects in Windows. When you close the last handle to a file, the I/O manager sends an IRP_MJ_CLEANUP request to the driver that owns the file so it can clean up the resources associated with it. The receive buffers associated with the socket are freed along with the file object.
The closesocket documentation does say that pending operations are canceled but that asynchronous operations may complete after the function returns. It sounds like closing a socket while it is in use is a supported scenario and would not lead to a memory leak.
There will be no leak, and you are under no obligation to read the stream to EOS before closing. If the sender is still sending after you close, it will eventually get a 'connection reset'.
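If you additionally want the close to be an immediate, abortive reset (an RST) rather than a graceful shutdown, one standard technique, not required for leak avoidance per the answers above, is a zero-timeout linger before closesocket(); a sketch:

    // Sketch: force an abortive close (RST) instead of a graceful shutdown.
    struct linger lg;
    lg.l_onoff  = 1;    // enable linger
    lg.l_linger = 0;    // zero timeout: hard close, unsent/unread data discarded
    setsockopt(s, SOL_SOCKET, SO_LINGER, (const char *)&lg, sizeof(lg));
    closesocket(s);

The SO_LINGER trick itself is portable: on a POSIX system, use close() in place of closesocket().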
