C++ on Windows: GetLastError code 998 during named pipe communication

I have implemented a named pipe server that communicates with multiple named pipe clients. Generally it works, but in some instances the client does not get a valid result from TransactNamedPipe. The GetLastError code returned is 998 (ERROR_NOACCESS, invalid access to memory location), which is weird, because the handle I used for TransactNamedPipe was valid, obtained from CreateFile.
I have implemented the client to retry when it detects an error (unless the pipe server is not alive). For other error codes (997 ERROR_IO_PENDING, 230 ERROR_BAD_PIPE, 231 ERROR_PIPE_BUSY) it works fine. But when it encounters error code 998, no matter how many times it retries, the named pipe server does not respond; the named pipe server logs just say that the client disconnected, but there was no data exchange.
What could be the reason behind this? Is it because the client requests are coming from multiple threads and the named pipe server cannot cope with the (almost) simultaneous requests? I also implemented "locks" to prevent simultaneous requests from the client to the named pipe server, but the error still occurs.
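For reference, the client-side call looks roughly like this (a minimal sketch with an illustrative pipe name, buffer sizes, and retry policy, not my actual code):

```cpp
// Sketch of the client-side transaction with a simple retry loop.
// 998 = ERROR_NOACCESS, 231 = ERROR_PIPE_BUSY, 997 = ERROR_IO_PENDING.
#include <windows.h>
#include <cstdio>

int main() {
    HANDLE hPipe = CreateFileW(L"\\\\.\\pipe\\MyPipe",
                               GENERIC_READ | GENERIC_WRITE,
                               0, nullptr, OPEN_EXISTING, 0, nullptr);
    if (hPipe == INVALID_HANDLE_VALUE) {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    DWORD mode = PIPE_READMODE_MESSAGE;          // transactions need message mode
    SetNamedPipeHandleState(hPipe, &mode, nullptr, nullptr);

    char request[] = "request";
    char reply[4096];
    DWORD bytesRead = 0;

    for (int attempt = 0; attempt < 5; ++attempt) {
        if (TransactNamedPipe(hPipe, request, sizeof(request),
                              reply, sizeof(reply), &bytesRead, nullptr)) {
            printf("got %lu bytes back\n", bytesRead);
            break;
        }
        // Both buffers above are valid stack memory, yet sometimes this
        // reports 998 and no retry ever succeeds.
        printf("TransactNamedPipe failed: %lu, retrying\n", GetLastError());
        Sleep(100);
    }

    CloseHandle(hPipe);
    return 0;
}
```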
I have searched the web for this problem with named pipe communication, but so far, no results.
Thanks in advance

This is weird, indeed. I updated to the latest Windows SDK, pointed my project to it, and, without any changes to the code, it now works perfectly. It must have been a bug that's already been fixed. I was using the libs that came with VC++ 9.0.

Related

ZeroMQ assertion failed: socket handle no longer valid for some reason

Got a Windows 10 C++ program using ZeroMQ that aborts very often on the same group of computers due to assertion failures.
The assert statement is buried deep into the libzmq code.
On other machines, the same program runs fine without those problems (but in all fairness, that's with different OS build numbers and program configurations).
The assertion failure seems to happen because internal zeromq (socket and/or pipe based) connection(s)/handles get unexpectedly closed.
What could possibly cause something like that?
More information:
The assertion failure seems to have something to do with the channels/mailboxes that ZeroMQ uses for internal signaling. In older versions of the library this works with several loopback TCP sockets while modern versions rely on a solution involving IOCP (I/O completion ports).
Here's a long standing and possibly related issue where the original author himself talked about a similar crash that happened to him:
https://github.com/zeromq/libzmq/issues/1108
Working with the crash dumps of our application, I see that the stack trace leading to the assert statement usually occurs right after an attempt to read from a socket (or socket file descriptor?). The read or receive action fails and then the library panics.
So, suddenly a socket handle no longer seems valid. Examples of errors that I see are "The resource is temporarily unavailable" and things like "Invalid handle/parameter".
Can it be that something or someone is forcefully closing the socket for us?
What could be causing this behavior?
This happens with an old version of ZeroMQ (4.0.10) as well as a modern one (4.3.5). This leads me to believe that the fault lies somewhere else, given that such different implementations fail in roughly the same way.
When trying to reproduce the problem, I can trigger a similar assertion failure for 4.0.x by using TCPView to manually force-close an internal TCP connection that ZeroMQ uses. The resulting assertion failure is instant and the crash dump looks identical to what happens in the wild.
But the modern version doesn't seem to use loopback sockets, so I couldn't close the "private" connections there. Maybe it uses pipes or unix-style sockets instead (which, I have heard, is now possible on Windows 10).
For a moment I considered ephemeral port exhaustion as a reason for all this trouble, but that alone doesn't make sense to me: I wouldn't expect the OS to force-close existing connections; existing connections should keep working. You'd expect only new connections to fail.
As @user253751 suggested, the culprit turned out to be a particular piece of code in the application that closes the same HANDLE twice. A serious bug in our code, not in ZeroMQ!
On Windows, closed handle values can be reused immediately, so anything that is opened right after the first CloseHandle is at risk of being unexpectedly closed when the second CloseHandle strikes, due to the bug.
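To illustrate the failure mode (a contrived sketch, not the actual application code):

```cpp
// Double CloseHandle: the second close can tear down an unrelated handle,
// because Windows may hand the same handle value back out immediately.
#include <windows.h>
#include <cstdio>

int main() {
    HANDLE a = CreateEventW(nullptr, FALSE, FALSE, nullptr);
    CloseHandle(a);                              // first close: fine

    // Something else (here standing in for ZeroMQ's internal signaling)
    // opens a new handle; it may receive the exact value 'a' had.
    HANDLE b = CreateEventW(nullptr, FALSE, FALSE, nullptr);
    printf("a=%p b=%p\n", (void*)a, (void*)b);   // frequently identical

    CloseHandle(a);                              // BUG: actually closes 'b'

    // From now on, any use of 'b' fails with ERROR_INVALID_HANDLE --
    // the same "invalid handle/parameter" errors seen in the crash dumps.
    if (!SetEvent(b))
        printf("SetEvent failed: %lu\n", GetLastError());
    return 0;
}
```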

Race condition with Windows sockets

I have a strange problem which occurs only with some specific Windows versions (unfortunately I do not have such a Windows variant myself, only bug reports from a user who can see this problem).
What I'm doing: I send data via TCP/IP from a client application (on a PC) to an embedded device. This is a continuous stream of data sent from one thread's context; the data is collected into chunks so that the packets are slightly under a multiple of 1460 bytes, to make the TCP packets as efficient as possible.
Now a stop condition can occur which requires an immediate reaction. In this case another TCP packet is sent to the device over the same network connection, but from a different thread. This packet contains a payload of only a few bytes and is sent 3-4 times to ensure the device reacts.
Both threads use the same sending function, which is protected by a mutex so that no concurrent accesses can happen.
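The shared send path looks roughly like this (a simplified sketch with illustrative names, not the actual code):

```cpp
// Both the streaming thread and the stop-packet thread call this wrapper,
// so writes to the shared socket are serialized by the mutex.
#include <winsock2.h>
#include <mutex>

static std::mutex g_sendMutex;   // illustrative name

bool sendAll(SOCKET s, const char* data, int len)
{
    std::lock_guard<std::mutex> lock(g_sendMutex);
    while (len > 0) {
        int sent = send(s, data, len, 0);
        if (sent == SOCKET_ERROR)   // on the affected PC: WSAGetLastError() == 10054
            return false;
        data += sent;
        len  -= sent;
    }
    return true;
}
```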
Now the following problem occurs on this one user's PC: once the stop packets have been sent, all subsequent send attempts on the same socket fail with system error code 10054 (WSAECONNRESET, connection reset by peer). Amazingly, the log files of the embedded device clearly say that it did not close the connection.
Since I can't reproduce this problem, I'm stabbing in the dark here.
The problem appears only on this user's Windows 7; it does not happen with Linux or with my Windows versions (including a Windows 7). Thus any idea/suggestion/comment is welcome: are there any known bugs or conditions under which a Windows TCP/IP connection can fail in this way?
Thanks for every comment and idea!

EnumPrinters() + error RPC_S_SERVER_UNAVAILABLE (1722)

I am working on a sample to get the list of printers connected to the machine. For that I am using the EnumPrinters() API. Randomly it gives the error RPC_S_SERVER_UNAVAILABLE (1722). I tried searching the net, but I could not find a solution.
Please help me to fix this issue.
How are you calling EnumPrinters (hint - post the code)?
For some modes of API invocation, the local system makes RPC calls to the target servers in turn, so you can get RPC errors back. You may be able to get the info you need via a less heavyweight call that passes different parameters to EnumPrinters.
From the docs:
"When EnumPrinters is called with a level 2 (PRINTER_INFO_2) data structure, it performs an OpenPrinter call on each remote connection. If a remote connection is down, or the remote server no longer exists, or the remote printer no longer exists, the function must wait for RPC to time out and consequently fail the OpenPrinter call. This can take a while."
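A lighter-weight call that avoids the per-connection OpenPrinter might look like this (a sketch using level 4, PRINTER_INFO_4, which only retrieves printer names and attributes; adjust the flags to what you actually need):

```cpp
// Enumerate printers with PRINTER_INFO_4, which is documented as a fast,
// mostly local query and therefore less prone to RPC timeouts.
#include <windows.h>
#include <vector>
#include <cstdio>

int main() {
    const DWORD flags = PRINTER_ENUM_LOCAL | PRINTER_ENUM_CONNECTIONS;
    DWORD needed = 0, returned = 0;

    // First call only reports the required buffer size.
    EnumPrintersW(flags, nullptr, 4, nullptr, 0, &needed, &returned);
    if (needed == 0) {
        printf("no printers found (or error %lu)\n", GetLastError());
        return 0;
    }

    std::vector<BYTE> buffer(needed);
    if (!EnumPrintersW(flags, nullptr, 4, buffer.data(), needed, &needed, &returned)) {
        printf("EnumPrinters failed: %lu\n", GetLastError());   // e.g. 1722
        return 1;
    }

    auto* info = reinterpret_cast<PRINTER_INFO_4W*>(buffer.data());
    for (DWORD i = 0; i < returned; ++i)
        wprintf(L"%s\n", info[i].pPrinterName);
    return 0;
}
```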
I had this problem recently with my Windows 10 PC. I spent a lot of time with debugging of EnumPrinters, with all different sorts of levels, but nothing worked and I always got the error RPC_S_SERVER_UNAVAILABLE (1722). It turned out that something has stopped the Spooler service and even after a reboot it was disabled. After enabling the Spooler service, everything worked. You can notice the Spooler service failure by looking at the Win10 printer settings: All printers would show "not connected", even Print to PDF.

NFS Client library

I'm looking for some stand alone library to access NFS shares.
I am not looking for mounting the shares, just browsing and accessing the files for reading.
Preferably something with a simple API similar to the regular POSIX operations opendir, scandir, read, etc.
Thanks in advance!
Here's a link to an NFS client library; it looks promising. To quote:
The NFS client handles only one connection at a time, but no connection takes very long.
Read requests must be for under 8000 bytes. This has to do with packet size. You don't want to know.
Once 256 files are open simultaneously -- by all applications, since the client does not discriminate between requests in any way -- file handles begin to be overwritten. The client prints an error.
If the client has problems opening sockets it quits gracefully, including returning a message over the socket to the application. The exception is if it is given a bad hostname to mount, in which case it just responds with failure rather than quitting.
If the formatting of the code looks messed up, it's because the code was written half on a Mac (tab = 4 spaces).
Here is another link on sourceforge.net that might explain the limitation of 256 simultaneously open files; see B3 of the FAQ there...
Edit: Here's a question that was posted here on Stack Overflow about recursively reading a directory, which could easily be adapted to scandir...
There is now a libnfs library on github: https://github.com/sahlberg/libnfs
I see it has Debian and FreeBSD packages.
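A minimal directory-listing sketch with libnfs's synchronous API looks roughly like this (server name and export path are placeholders; check <nfsc/libnfs.h> for the exact signatures in the version you build against):

```cpp
// Browse an NFS export from user space with libnfs -- no kernel mount needed,
// and the API is close in spirit to opendir/readdir.
#include <nfsc/libnfs.h>
#include <cstdio>

int main() {
    struct nfs_context *nfs = nfs_init_context();
    if (nfs == nullptr)
        return 1;

    if (nfs_mount(nfs, "nfs-server.example.com", "/export") != 0) {
        fprintf(stderr, "mount failed: %s\n", nfs_get_error(nfs));
        nfs_destroy_context(nfs);
        return 1;
    }

    struct nfsdir *dir = nullptr;
    if (nfs_opendir(nfs, "/", &dir) == 0) {
        struct nfsdirent *ent;
        while ((ent = nfs_readdir(nfs, dir)) != nullptr)
            printf("%s\n", ent->name);          // entry name, like readdir()
        nfs_closedir(nfs, dir);
    } else {
        fprintf(stderr, "opendir failed: %s\n", nfs_get_error(nfs));
    }

    nfs_destroy_context(nfs);
    return 0;
}
```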

Meaning/cause of RPC Exception 'No interfaces have been exported.'

We have a fairly standard client/server application built using MS RPC. Both client and server are implemented in C++. The client establishes a session to the server, then makes repeated calls to it over a period of time before finally closing the session.
Periodically, however, especially under heavy load conditions, we are seeing an RPC exception show up with code 1754: RPC_S_NOTHING_TO_EXPORT.
It appears that this happens in the middle of a session. The user is logged on for a while, making successful calls, then one of the calls inexplicably returns this error. As far as we can tell, the server receives no indication that anything went wrong - and it definitely doesn't see the call the client made.
The error also appears to be permanent: having the client retry the connection doesn't help. However, if the user has multiple sessions active simultaneously between the same client and server, the other connections are unaffected.
In essence, I have two questions:
Does anyone know what RPC_S_NOTHING_TO_EXPORT means? The MSDN documentation simply says: "No interfaces have been exported." ... Huh? The session was working fine for numerous instances of the same call up until this point...
Does anyone have any ideas as to how to identify the real problem? Note: Capturing network traffic is something we would rather avoid, if possible, as the problem is sporadic enough that we would likely go through multiple gigabytes of traffic before running into an occurrence.
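For context, catching the exception with the standard RPC exception macros looks roughly like this (a sketch; MyRemoteCall stands in for one of the generated client stubs and is not a real name from our IDL):

```cpp
// Wrap a client stub call and log the RPC exception code (e.g. 1754)
// with a timestamp, so failures can be correlated with server-side logs.
#include <windows.h>
#include <rpc.h>
#include <cstdio>

long MyRemoteCall(handle_t binding, int arg);    // placeholder for a generated stub

long CallWithLogging(handle_t binding, int arg)
{
    long result = 0;
    RpcTryExcept
    {
        result = MyRemoteCall(binding, arg);
    }
    RpcExcept(RpcExceptionFilter(RpcExceptionCode()))
    {
        printf("[%lu] RPC call failed: %lu\n",
               GetTickCount(), RpcExceptionCode());
        result = -1;
    }
    RpcEndExcept
    return result;
}
```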
Capturing network traffic would be one of the best ways to tackle this issue. If you can't do that, could you dump the client process and debug with WinDBG or Visual Studio? Perhaps compare a dump when operating normally versus in the error state?
