Getting specific errors when TCP connections disconnect in Windows - windows

I'm trying to improve the usefulness of the error reporting in a server I am working on. The server uses TCP sockets, and it runs on Windows.
The problem is that when a TCP link drops due to some sort of network failure, the error code that I can get from WSARecv() (or the other Windows socket APIs) is not very descriptive. For most network hiccups, I get either WSAECONNRESET (10054) or WSAETIMEDOUT (10060). But there are about a million things that can cause both of these: the local machine is having a problem, the remote machine or process is having a problem, some intermediate router has a problem, etc. This is a problem because the server operator doesn't have a definitive way to investigate the problem, because they don't necessarily even know where the problem is, or who might be responsible.
At the IP level, it's a different story. If the server operator happens to have a network sniffer attached when something bad happens, it's usually pretty easy to sort of what went wrong. For instance, if an intermediate router sent an ICMP unreachable, the router that sent it will put its IP address in there, and that's usually enough to track it down. Put another way, Windows killed the connection for a reason, probably because it got a specific packet that had a specific problem.
However, a large number of failures are experienced in the field, unexpected. It is not realistic to always have a network sniffer attached to a production server. There needs to be a way to track down problems that happen only rarely, intermittently, or randomly.
How can I solve this problem programmatically?
Is there a way to get Windows to cough up a more specific error message? Is there some easy way to capture and mine recent Windows events (perhaps the one Microsoft Network Monitor uses)? One way I've "solved it" before is to keep dumpcap (from Wireshark) running in ring buffer mode, and force it to stop capturing when a bad event happens, that I can mine later.
I'm also open to the possibility that this is not the right way to solve this problem. For instance, perhaps there is some special Windows mode that can be turned on to cause it to log useful information, that a network administrator could use to track this down after-the-fact.

Related

Simulate slow speed for TCP sockets in Windows

I'm building an application that uses TCP sockets to communicate. I want to test how it behaves under slow-speed conditions.
There are similar question on the site, but as I understand it, they deal with HTTP traffic, or are about Linux. My traffic is not HTTP, just ordinary TCP sockets, and the OS is Windows.
I tried using fiddler's setting for Modem Speed but it didn't work, it seems to work only for HTTP connections.
While it is true that you probably want to invest in an extensive set of unit tests, You can simulate various network conditions using VMWare Workstation:
You will have to install a virtual machine for testing, setup bridged networking (for the vm to access your real network) and upload your code to the vm.
After that you can start changing the settings and see how your application performs.
NetLimiter can also be used, but it has fewer options (in your case, packet loss is very interesting to test and is not available in netlimiter).
There is an excellent utility for Windows that can do throttling and much more:
https://jagt.github.io/clumsy/
I think you're taking the wrong approach here.
You can achieve everything that you need with some well designed unit tests. All of the things that a slow network link causes can be simulated in a unit test environment in controlled conditions.
Things that your code MUST handle to deal with "slow" links are just things that you should be dealing with anyway, including:
The correct handling of fragmented messages. All of your network reading code needs to correctly assume that each read will return between 1 byte and the size of your read buffer. You should never assume that you'll get complete 'messages' as TCP knows nothing of your concept of messages.
TCP flow control causing either your synchronous sends to fail with some form of 'try later' error or your async sends to succeed and potentially use an uncontrolled amount of resources (see here for more details). Note that this can happen even on 'fast' links if you are sending faster than the receiver is consuming.
Timeouts - again this isn't limited to "slow" links. All of your timeout handling code should be robust and tested. You may want to make sure that any read timeout is based on any read completing rather than reading a complete message in x time. You may be getting your data at a slow rate but whilst you're still getting data the link is alive.
Connection failure - again not something specific to "slow" links. You need to know how you deal with connections being reset at any time.
In summary nothing you can achieve by running your client and server on a simulated slow network cannot be achieved with a decent set of unit tests and everything that you would want to test on such a link is something that could affect any of your connections on any speed of link.

Monitoring office internet connection for drop outs in Ruby

I am looking for a simple way to monitor our office internet connection for drop outs. A secondary pipe dream is to also monitor for other 'dodgy' behaviour - packet loss, jitter etc. But the primary goal is to watch for dropped connections. Pinging Google every second is great to keep an eye on latency but we have had a few temporary blips which have caused hell with a few streaming services but have not affected connection latency. The IT department also sometimes decide to block outgoing ICMP traffic which doesn't help with the humble ping tool's efforts.
If this is not something available already via an open source, freeware or commercial tool, ideally I would like to be able to come up with something in Ruby (or, if forced, .NET) which will open a 'long' TCP connection to an arbitrary web server on port 80 (i.e. I don't want to have to write something keeping a socket open on a hosted server) and have the program detect and alert the guys in the office if the connection drops out in a "bad" way. With my attempts using Ruby Socket (http://www.ruby-doc.org/stdlib-1.9.3/libdoc/socket/rdoc/Socket.html) I've had trouble extracting an accurate error code here; ideally I want to isolate actual network connectivity issues from the usual connection timeouts. On a timeout, I'll want to restart the connection silently, but on a real drop out, I'll flash something big and obvious up on screen to alert the guys in the office.
I've spent most of the day googling for examples of this kind of monitoring and trying to hack something together but it seems that it is not a common request. 99% of results are forum posts ending with me being authoritatively informed that speedtest.net will do everything I need. My own attempts have all proven futile - no matter which way I've tried, whenever I seem to be getting somewhere even the most basic drop out test (unplugging the network cable from my laptop!) fails to be detected.
Is this something trivial, and if so could anyone point me in the right direction please? Or am I in for a world of pain? (This has been my general experience whenever I've tried to do anything with network programming in the past...)
Alternatively is there anything pre-written (free, commericial, open source all fine) which will do just this?
Thanks!
Smokeping might do what you want. Nagios might as well.
http://oss.oetiker.ch/smokeping/
http://www.nagios.org/

How do I check the destination that a socket is connected to?

If,for example,The socket in my compiled application is designed to connect to 123.456.789.0.
How do I check if its connected to 123.456.789.0? Is there a way to do this?
The idea is this:I want to prevent other people editing my program and changing the address to,for example, 127.0.0.1 and make it connect through a proxy.
Is there any function/way/trick to check the address after the socket is connected?
Use the getpeername function to retrieve the address of the remote host.
If someone edits your program like you mention, they'll probably alter such a check as well though.
nos's comment about the insecurity of this approach is correct, but incomplete. You wouldn't even need to change the program's code to circumvent your proposed mechanism.
The easiest way around it would be to add an IP alias to one of the machine's network interfaces. Then a program can bind to that interface on the port your program connects to, and the OS's network stack will happily send connections to the attacker's local program, not your remote one.
So, now you say you want to know how to list the computer's interfaces so you can detect this sort of subversion. Your opponent counterattacks, launching your program as a sub-process of theirs after installing a Winsock hook that routes Winsock calls back through the parent process.
We then expect to find you asking how to read the executable code section of a particular DLL loaded into your process space, so you can check that the code is what you expect. Now your opponent drops the Winsock shim, switching to an NDIS layer filter, rewriting packets from your program right before they hit the NIC.
Next we find you looking for someone to tell how to list the drivers installed on a Windows system, so you can check that one of these filters isn't present. Your opponent thinks for about 6 seconds and decides to start screwing with packet routing, selecting one of at least three different attacks I can think of off the top of my head. (No, wait, four.)
I'm not a security expert. Yet, I've spent five minutes on this and already have your security beat seven different ways.
Are you doomed? Maybe, maybe not.
Instead of you coming up with fixes to the risks you can see, better to post a new question saying what it is you're trying to protect, and have the experts comment on risks and possible fixes. (Don't add it here. Your question is already answered, correctly, by nos. This is a different question.)
Security is hard. Expertise counts for far more in that discipline than in most other areas of computer science.

Handling possible errors with network drive file I/O

I'm trying to make file I/O over a network drive (likely over a WAN or VPN) as reliable as possible for a native C++ Windows app...
What are the possible error conditions that I need to be able to handle?
How can I simulate these error conditions in testing?
How do I get detailed information on a particular error? For example, if fopen() fails, does errno tell me everything I need to know, or do I need to get at the GetLastError() value?
How do I reliably distinguish between "network drive access fully functional but the file doesn't exist" and various problems with the network or server?
One particular error condition that I've noticed on my desktop (not specific to the app we're developing) is that sometimes the first attempt to access a file on a network drive will fail, but it presumably causes the drive to be reconnected in the background, because subsequent connections work. I don't know what causes this. This is an example of the kind of error condition that I want to properly handle.
EDIT: This is for a legacy distributed application that uses files on network shares for communication between nodes. Some nodes may be unattended, so passing the error on to the end user may not be an option. The long term goal is to switch to a better protocol, but in the short term I'd like to make the file I/O as reliable as possible.
I believe you're approaching this from a wrong perspective. There's little one can do in the application itself to improve what is essentially a network filesystem driver problem, perhaps except implementing the networked I/O itself. That being said, you should be better off choosing a suitable networked filesystem for your needs. Look at this on Wikipedia.
Generally, your application should behave like the file is locally-stored. Don't try too hard to handle network problems. But if your choice of a network filesystem is good, then these problems can be automatically mitigated.
So I'd say you should settle with checking errno in case of errors. Perhaps fallback on local storage in case writing a remote file fails (if the networked file system doesn't handle this itself).

Meaning/cause of RPC Exception 'No interfaces have been exported.'

We have a fairly standard client/server application built using MS RPC. Both client and server are implemented in C++. The client establishes a session to the server, then makes repeated calls to it over a period of time before finally closing the session.
Periodically, however, especially under heavy load conditions, we are seeing an RPC exception show up with code 1754: RPC_S_NOTHING_TO_EXPORT.
It appears that this happens in the middle of a session. The user is logged on for a while, making successful calls, then one of the calls inexplicably returns this error. As far as we can tell, the server receives no indication that anything went wrong - and it definitely doesn't see the call the client made.
The error code appears to have permanent implications, as well. Having the client retry the connection doesn't work, either. However, if the user has multiple user sessions active simultaneously between the same client and server, the other connections are unaffected.
In essence, I have two questions:
Does anyone know what RPC_S_NOTHING_TO_EXPORT means? The MSDN documentation simply says: "No interfaces have been exported." ... Huh? The session was working fine for numerous instances of the same call up until this point...
Does anyone have any ideas as to how to identify the real problem? Note: Capturing network traffic is something we would rather avoid, if possible, as the problem is sporadic enough that we would likely go through multiple gigabytes of traffic before running into an occurrence.
Capturing network traffic would be one of the best ways to tackle this issue. If you can't do that, could you dump the client process and debug with WinDBG or Visual Studio? Perhaps compare a dump when operating normally versus in the error state?

Resources