On Windows, WSASend fails with WSAENOBUFS - windows

On Windows XP when I am calling WSASend in iterations on non-blocking socket, it fails with WSAENOBUFS.
I've two cases here:
Case 1:
On non-blocking socket I am calling WSASend. Here is pseudo-code:
while(1)
{
result = WSASend(...); // Buffersize 1024 bytes
if (result == -1)
{
if (WSAGetLastError() == WSAENOBUFS)
{
// Wait for some time before calling WSASend again
Sleep(1000);
}
}
}
In this case WSASend returns sucessfully for around 88000 times. Then it fails with WSAENOBUFS and never recovers even when tried after some time as shown in the code.
Case 2:
In order to solve this problem, I referred this and as suggested there,
just before above code, I called setsockopt with SO_SNDBUF and set buffersize 0 (zero)
In this case, WSASend returns sucessfully for around 2600 times. Then it fails. But after waiting it succeeds again for 2600 times then fails.
Now I've these questions in both the cases:
Case 1:
What factors decides this number 88000 here?
If the failure was because of TCP buffer was full, why it didn't recover after some time?
Case 2:
Again, what factors decides the number 2600 here?
As given in Microsoft KB article, if instead of internal TCP buffers it sends from application buffer directly, why would it fail with WSAENOBUFS?
EDIT:
In case of asynchronous sockets (On Windows XP), the behavior is more strange. If I ignore WSAENOBUFS and continued further writing to socket I eventually get disconnection WSAECONNRESET. And not sure at the moment why does that happen?

The values are undocumented and depend on what's installed on your machine that may sit between your application and the network driver. They're likely linked to the amount of memory in the machine. The limits (most probably non-paged pool memory and i/o page lock limit) are likely MUCH higher on Vista and above.
The best way to deal with the problem is add application level flow control to your protocol so that you don't assume that you can just send at whatever rate you feel like. See this blog posting for details of how non-blocking and async I/O can cause resource usage to balloon and how you have no control over it unless you have your own flow control.
In summary, never assume that you can just write data to the wire as fast as you like using non-blocking/async APIs. Remember that due to how TCP/IP's internal flow control works you COULD be using an uncontrollable amount of local machine resources and the client is the only thing that has any control over how fast those resources are released back to the O/S on the server machine.

Related

Dismiss or Handle Data Abort when AXI transaction replies an error

Background
I have an ZynqMP system which has four Cortex-A53 cores (PS) along with FPGA logic (PL). They transfer data via AXI bus.
I've placed some Xilinx AXI Quad SPI in my design. Linux which runs on PS successfully probes them, and starts a daemons which periodically (333 Hz) ask MCUs on SPIs to reply their data chunk (~ up to around 500 bytes, split in every 64 bytes.)
They works nicely for a while (median 50 minutes) but suddenly the readl_relaxed() in SPI driver causes Synchronous External Abort which leads an Kernel Panic. It seems to be an AXI's error reply according to ARM TRM, and might be recoverable because it's "synchronous" which means the registers are not corrupted (in my understanding.)
After some search I found the do_sea() func that handles SEA and also found that there's no chance to recover from it according to the implementation.
I want the AXI error to be handled like: discard the read, return SIGBUS and lead the process to be killed, etc.
Of course I'm debugging the Abort and finding why it occurs but at present I have no clue.
Question
So my questions are:
Why SEAs are not recoverable in Linux arm64 implementation?
If I can "handle" or "ignore" it, how do I modify Linux kernel code (I know it's stupid but I'd like to know if there's a way.)
What can reply error in Quad SPI IP? The readl_relaxed I mentioned above reads Rx data FIFO.
1) I’ve never ventured down this path, but it looks to me like they are recoverable if the inf->fn returns 0; which means that ghes_notify_sea() must return 0; thus one of the SEA error sources successfully reported an error.
2) I think you need a bit more info. I would start by changing
drivers/acpi/apei/ghes.c:732
from:
rc = ghes_read_estatus(ghes, 0);
to:
rc = ghes_read_estatus(ghes, 1);
which should get you a bit more information when the error happens.
Armed with that information, you need to find out if you have a malfunctioning handler, or a missing one. Either way, this is the place to address it.
3) You are dealing with an ACPI implementation. There are 155 kloc in the kernel plus unknown quantity in the firmware and hardware. The kernel code doesn’t appear to handle whichever condition you are running into. First you need to determine which of these suspects is involved and what interactions are failing before you can dig out the root cause.
Happy Digging!

Call to ExAllocatePoolWithTag never returns

I am having some issues with my virtualHBA driver on Windows Server 2016. A ran the HLK crashdump support test. 3 times out of 10 the test passed. In those 3 failing tests, the crashdump hangs at 0% while taking Complete dump, or Kernel dump or minidump.
By kernel debugging my code, I found that the call to ExAllocatePoolWithTag() for buffer allocation never actually returns.
Below is the statement which never returns.
pDeviceExtension->pcmdbuf=(struct mycmdrsp *)ExAllocatePoolWithTag(NonPagedPoolCacheAligned,pcmdqSignalSize,((ULONG)'TA1'));
I searched on the web regarding this. However, all of the found pages are focusing on this function returning NULL which in my case never returns.
Any help on how to move forward would be highly appreciated.
Thanks in advance.
You can't allocate memory in crash dump mode. You're running at HIGH_LEVEL with interrupts disabled and so you're calling this API at the wrong IRQL.
The typical solution for a hardware adapter is to set the RequestedDumpBufferSize in the PORT_CONFIGURATION_INFORMATION structure during the normal HwFindAdapter call. Then when you're called again in crash dump mode you use the CrashDumpRegion field to get your dump buffer allocation. You then need to write your own "crash dump mode only" allocator to allocate buffers out of this memory region.
It's a huge pain, especially given that it's difficult/impossible to know how much memory you're ultimately going to need. I usually calculate some minimal configuration overhead (i.e. 1 channel, 8 I/O requests at a time, etc.) and then add in a registry configurable slush. The only benefit is that the environment is stripped down so you don't need to be in your all singing, all dancing configuration.

Windows: TCP/IP: force close connection: avoid memleaks in kernel/user-level

A question to windows network programming experts.
When I use pseudo-code like this:
reconnect:
s = socket(...);
// more code...
read_reply:
recv(...);
// merge received data
if(high_level_protocol_error) {
// whoops, there was a deviation from protocol, like overflow
// need to reset connection and discard data right now!
closesocket(s);
goto reconnect;
}
Does kernel un-associate and frees all data "physically" received from NIC(since it must really already be there, in kernel memory, waiting for user-level to read it with recv()), when I closesocket()? Well, it logically should since data is not associated with any internal object anymore, right?
Because I don't really want to waste unknown amount of time for clean shutdown like "call recv() until returns error". That does not make sense: what if it will never return error, say, server continues to send data forever and not closes connection, but that is bad behaviour?
I'm wondering about it since I don't want my application to cause memory leaks anywhere. Is this way of forced resetting connection, that still expected to send in unknown amount of data correct?
// optional addition to question: if this method considered correct for windows, can it be considered correct (with change of closesocket() to close() ) for UNIX-compliant OS?
Kernel drivers in Windows (or any OS really), including tcpip.sys, are supposed to avoid memory leaks in all circumstances, regardless of what you do in user mode. I would think that the developers have charted the possible states, including error states, to make sure that resources aren't leaked. As for user mode, I'm not exactly sure but I wouldn't think that resources are leaked in your process either.
Sockets are just file objects in Windows. When you close the last handle to a file, the IO manager sends a IRP_MJ_CLEANUP message to the driver that owns the file to clean up resources associated with it. The receive buffers associated with the socket would be freed along with the file object.
It does say in the closesocket documentation that pending operations are canceled but that async operations may complete after the function returns. It sounds like closing the socket while in use is a supported scenario and wouldn't lead to a memory leak.
There will be no leak and you are under no obligation to read the stream to EOS before closing. If the sender is still sending after you close it will eventually get a 'connection reset'.

Is it necessary to set hEvent on the OVERLAPPED structure when doing I/O completion ports?

I'm using I/O completion ports on Windows for serial port communication (we will potentially have lots and lots of serial port usage). I've done the usual, creating the IOCP, spinning up the I/O threads, and associating my CreateFile() handle with the IOCP (CreateFile() was called with FILE_FLAG_OVERLAPPED). That's all working fine. I've set the COMMTIMEOUTS all to 0 except ReadIntervalTimeout which is set to MAXDWORD in order to be completely async.
In my I/O thread, I've noticed that GetQueuedCompletionStatus() blocks indefinitely. I'm using an INFINITE timeout. So I put a ReadFile() call right after I associate my handle with the IOCP. Now that causes GetQueuedCompletionStatus() to release immediately for some reason with 0 bytes transferred, but there's no errors (it returns true, GetLastError() reports 0). I obviously want it to block if there's nothing for it to do. If I put another ReadFile() after GetQueuedCompletionStatus(), then another thread in the pool will pick it up with 0 bytes transferred and no errors.
In the examples I've seen and followed, I don't see anyone setting the hEvent on the OVERLAPPED structure when using IOCP. Is that necessary? I don't care to ever block IOCP threads -- so I'll never be interested in CreateEvent(...) | 1.
If it's not necessary, what could be causing the problem? GetQueuedCompletionStatus() needs to block until data arrives on the serial port.
Are there any good IOCP serial port examples out there? I haven't found a complete serial port + IOCP example out there. Most of them are for sockets. In theory, it should work for serial ports, files, sockets, etc.
I figured it out -- I wasn't calling SetCommMask() with EV_RXCHAR | EV_TXEMPTY and then WaitCommEvent() with the OVERLAPPED struct. After I did that, my IOCP threads behaved as expected. GetQueuedCompletionStatus() returned when a new character appeared on the port. I could then call ReadFile().
So to answer the original question: "no, you don't need to set hEvent for IOCP with serial ports."

WSASYSCALLFAILURE with overlapped IO on Windows XP

I hit a bug in my code which uses WSARecv and WSAGetOverlapped result on an overlapped socket. Under heavy load, WSAGetOverlapped returns with WSASYSCALLFAILURE ('A system call that should never fail has failed') and my TCP stream is out of sync afterwards, causing mayhem in the upper levels of my program.
So far I have not been able to isolate it to a given set of hardware or drivers. Has somebody hit this issue as well, and found a solution or workaround?
How many connections, how many pending recvs, how many outsanding sends? What does perfmon or task manager say about the amount of non-paged pool used? How much memory in the box? Does it go away if you run the program on Vista or above? Do you have any LSPs installed?
You could be exhausting non-paged pool and causing a badly written driver to misbehave when it fails to allocate memory. This issue is less likely to bite on Vista or later as the amount of non-paged pool available has increased dramatically (see http://www.lenholgate.com/blog/2009/03/excellent-article-on-non-paged-pool.html for details). Alternatively you might be hitting the "locked pages" limit (you can only lock a fixed number of pages in memory on the OS and each pending I/O operation locks one or more pages depending on buffer size and allocation alignment).
It seems I have solved this issue by sleeping 1ms and retrying the WSAGetOverlapped result when it reports a WSASYSCALLFAILURE.
I had another issue related to overlapped events firing, even though there is no data, which I also had to solve first. The test is now running for over an hour, with a few WSASYSCALLFAILURE handled correctly. Hopefully the overnight test will succeed as well.
#Len: thanks again for your help.
EDIT: The overnight test was successful. My bug was caused by two interdependent issues:
Issue 1: WaitForMultipleObjects in ConnectionSet::select occasionally
signals data on an empty socket, causing SocketConnection::readSync to
deadlock.
Fix: Do a non-blocking read on the first byte of each packet. Reset
ConnectionSet if socket was empty
Issue 2: WSAGetOverlappedResult returns occasionally WSASYSCALLFAILURE,
causing out-of-sync on the TCP stream.
Fix: Retry WSAGetOverlappedResult after a small sleep period.
http://equalizer.svn.sourceforge.net/viewvc/equalizer?view=revision&revision=4649

Resources