Monitoring office internet connection for drop outs in Ruby

I am looking for a simple way to monitor our office internet connection for drop outs. A secondary pipe dream is to also monitor for other 'dodgy' behaviour such as packet loss and jitter, but the primary goal is to watch for dropped connections. Pinging Google every second is great for keeping an eye on latency, but we have had a few temporary blips which caused hell with a few streaming services without affecting connection latency. The IT department also sometimes decides to block outgoing ICMP traffic, which doesn't help the humble ping tool's efforts.
If this is not something available already via an open source, freeware or commercial tool, ideally I would like to be able to come up with something in Ruby (or, if forced, .NET) which will open a 'long' TCP connection to an arbitrary web server on port 80 (i.e. I don't want to have to write something keeping a socket open on a hosted server) and have the program detect and alert the guys in the office if the connection drops out in a "bad" way. With my attempts using Ruby Socket (http://www.ruby-doc.org/stdlib-1.9.3/libdoc/socket/rdoc/Socket.html) I've had trouble extracting an accurate error code here; ideally I want to isolate actual network connectivity issues from the usual connection timeouts. On a timeout, I'll want to restart the connection silently, but on a real drop out, I'll flash something big and obvious up on screen to alert the guys in the office.
I've spent most of the day googling for examples of this kind of monitoring and trying to hack something together but it seems that it is not a common request. 99% of results are forum posts ending with me being authoritatively informed that speedtest.net will do everything I need. My own attempts have all proven futile - no matter which way I've tried, whenever I seem to be getting somewhere even the most basic drop out test (unplugging the network cable from my laptop!) fails to be detected.
Is this something trivial, and if so could anyone point me in the right direction please? Or am I in for a world of pain? (This has been my general experience whenever I've tried to do anything with network programming in the past...)
Alternatively, is there anything pre-written (free, commercial, open source all fine) which will do just this?
Thanks!

Smokeping might do what you want. Nagios might as well.
http://oss.oetiker.ch/smokeping/
http://www.nagios.org/
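If a home-grown check in Ruby is still wanted, one thing to keep in mind is that an idle TCP connection sends no packets, so an unplugged cable is only noticed on the next send or when keepalive probes go unanswered. A minimal sketch that sidesteps this by making a fresh short-lived connection on each check, and that separates timeouts from other failures, might look like the following. The helper names `classify_failure` and `probe` are hypothetical, and the grouping of error classes into "timeout" versus "network down" is an assumption to be tuned for the network in question.

```ruby
require "socket"
require "timeout"

# Errors treated as "try again quietly".
TIMEOUT_ERRORS = [Errno::ETIMEDOUT, Timeout::Error]
# IO::TimeoutError only exists on newer Ruby versions, so add it conditionally.
TIMEOUT_ERRORS << IO::TimeoutError if defined?(IO::TimeoutError)

# Errors treated as a real connectivity problem (assumed grouping).
# SocketError covers name-resolution failures, which usually accompany an outage.
NETWORK_ERRORS = [Errno::ENETUNREACH, Errno::EHOSTUNREACH, Errno::ENETDOWN,
                  Errno::ECONNRESET, Errno::EPIPE, SocketError]

# Map an exception from a connection attempt to a coarse category.
def classify_failure(error)
  case error
  when *TIMEOUT_ERRORS then :timeout
  when *NETWORK_ERRORS then :network_down
  when Errno::ECONNREFUSED then :refused # host answered, so the link is up
  else :unknown
  end
end

# Open and immediately close a TCP connection; return :ok or a failure category.
def probe(host, port, connect_timeout: 3)
  Socket.tcp(host, port, connect_timeout: connect_timeout) { |_sock| :ok }
rescue StandardError => e
  classify_failure(e)
end
```

A monitoring loop could call something like `probe("example.com", 80)` every few seconds, retry silently on `:timeout`, and raise the on-screen alert only on `:network_down`.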

Related

Automatic reconnect in case of network failures

I am testing the .NET version of ZeroMQ to understand how to handle network failures. I put the server (pub socket) on an external machine and am debugging the client (sub socket). If I stop my local Wi-Fi connection for a few seconds, ZeroMQ recovers automatically and I even get the remaining values. However, if I disable Wi-Fi for a longer time, such as a minute, it just gets stuck waiting on a frame. How can I configure the period during which ZeroMQ is still able to recover? And how can I reconnect manually after, say, several minutes? How can I tell that the socket is locked and that I need to kill it and open it again?
Q: "How can I configure this ... ?"
A: Use the .NET equivalents of zmq_setsockopt() to set the family of link-management parameters such as ZMQ_RECONNECT_IVL, ZMQ_RCVTIMEO and the like.
All other questions depend on your code.
If you use the blocking forms of the .recv() methods, you can easily throw yourself into unsalvageable deadlocks. It is best never to block your own code (why would anyone deliberately give up control of their own code?).
If you really need to understand the low-level internal link-management details, do not hesitate to use zmq_socket_monitor() instrumentation (if it is not available in the .NET binding, you can still use another language to see what the monitor instance reports about link state and related events).
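The advice about never blocking is not specific to ZeroMQ. As a rough illustration of the idea with plain Ruby IO (this is not the ZeroMQ API, and `recv_with_timeout` is a hypothetical helper), a receive can wait for a bounded time and hand control back to the caller, which can then decide whether to retry or to close and reopen the connection:

```ruby
# Wait up to timeout_seconds for data instead of blocking indefinitely.
# Returns the received bytes, :timeout if nothing arrived in time,
# or :closed if the other end has closed the stream.
def recv_with_timeout(io, timeout_seconds, max_bytes = 4096)
  # IO.select returns nil when the timeout expires with nothing readable.
  ready, = IO.select([io], nil, nil, timeout_seconds)
  return :timeout if ready.nil?

  data = io.read_nonblock(max_bytes, exception: false)
  case data
  when :wait_readable then :timeout # spurious wake-up, nothing to read yet
  when nil then :closed             # end of stream
  else data
  end
end
```

A caller that sees `:timeout` several times in a row can treat the link as stuck and rebuild the socket, rather than sitting in a receive that never returns.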
I was able to find an answer on their GitHub: https://github.com/zeromq/netmq/issues/845. It seems that the behavior is by design, as I got the same result with the native zmq lib via the .NET binding.

Getting specific errors when TCP connections disconnect in Windows

I'm trying to improve the usefulness of the error reporting in a server I am working on. The server uses TCP sockets, and it runs on Windows.
The problem is that when a TCP link drops due to some sort of network failure, the error code that I can get from WSARecv() (or the other Windows socket APIs) is not very descriptive. For most network hiccups, I get either WSAECONNRESET (10054) or WSAETIMEDOUT (10060). But there are about a million things that can cause both of these: the local machine is having a problem, the remote machine or process is having a problem, some intermediate router has a problem, etc. This is a problem because the server operator doesn't have a definitive way to investigate the problem, because they don't necessarily even know where the problem is, or who might be responsible.
At the IP level, it's a different story. If the server operator happens to have a network sniffer attached when something bad happens, it's usually pretty easy to sort out what went wrong. For instance, if an intermediate router sent an ICMP unreachable, the router that sent it will put its IP address in there, and that's usually enough to track it down. Put another way, Windows killed the connection for a reason, probably because it got a specific packet that had a specific problem.
However, a large number of failures are experienced in the field, unexpected. It is not realistic to always have a network sniffer attached to a production server. There needs to be a way to track down problems that happen only rarely, intermittently, or randomly.
How can I solve this problem programmatically?
Is there a way to get Windows to cough up a more specific error message? Is there some easy way to capture and mine recent Windows events (perhaps the one Microsoft Network Monitor uses)? One way I've "solved it" before is to keep dumpcap (from Wireshark) running in ring buffer mode and force it to stop capturing when a bad event happens, so that I have a capture I can mine later.
I'm also open to the possibility that this is not the right way to solve this problem. For instance, perhaps there is some special Windows mode that can be turned on to cause it to log useful information, that a network administrator could use to track this down after-the-fact.

Simulate slow speed for TCP sockets in Windows

I'm building an application that uses TCP sockets to communicate. I want to test how it behaves under slow-speed conditions.
There are similar questions on the site, but as I understand it, they deal with HTTP traffic or are about Linux. My traffic is not HTTP, just ordinary TCP sockets, and the OS is Windows.
I tried using Fiddler's setting for Modem Speed, but it didn't work; it seems to work only for HTTP connections.
While it is true that you probably want to invest in an extensive set of unit tests, you can simulate various network conditions using VMware Workstation:
You will have to install a virtual machine for testing, setup bridged networking (for the vm to access your real network) and upload your code to the vm.
After that you can start changing the settings and see how your application performs.
NetLimiter can also be used, but it has fewer options (in your case, packet loss is very interesting to test and is not available in NetLimiter).
There is an excellent utility for Windows that can do throttling and much more:
https://jagt.github.io/clumsy/
I think you're taking the wrong approach here.
You can achieve everything that you need with some well designed unit tests. All of the things that a slow network link causes can be simulated in a unit test environment in controlled conditions.
Things that your code MUST handle to deal with "slow" links are just things that you should be dealing with anyway, including:
The correct handling of fragmented messages. All of your network reading code needs to correctly assume that each read will return between 1 byte and the size of your read buffer. You should never assume that you'll get complete 'messages' as TCP knows nothing of your concept of messages.
TCP flow control causing either your synchronous sends to fail with some form of 'try later' error or your async sends to succeed and potentially use an uncontrolled amount of resources (see here for more details). Note that this can happen even on 'fast' links if you are sending faster than the receiver is consuming.
Timeouts - again this isn't limited to "slow" links. All of your timeout handling code should be robust and tested. You may want to make sure that any read timeout is based on any read completing rather than reading a complete message in x time. You may be getting your data at a slow rate but whilst you're still getting data the link is alive.
Connection failure - again not something specific to "slow" links. You need to know how you deal with connections being reset at any time.
In summary, nothing you can achieve by running your client and server on a simulated slow network cannot also be achieved with a decent set of unit tests, and everything that you would want to test on such a link is something that could affect any of your connections on any speed of link.
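The first point, handling fragmented messages, is easy to exercise in a unit test without any network at all. A minimal sketch, assuming a hypothetical framing of a 4-byte big-endian length followed by the payload (the original application's framing is not stated), with `read_exactly` and `read_message` as illustrative helper names:

```ruby
require "stringio"

# Read exactly n bytes, looping because a single read may return
# anything from 1 byte up to the amount requested.
def read_exactly(io, n, chunk_size = 4096)
  buffer = String.new
  while buffer.bytesize < n
    # readpartial raises EOFError if the stream ends before n bytes arrive.
    buffer << io.readpartial([n - buffer.bytesize, chunk_size].min)
  end
  buffer
end

# Assumed framing: 4-byte big-endian length prefix, then the payload.
def read_message(io, chunk_size = 4096)
  length = read_exactly(io, 4, chunk_size).unpack1("N")
  read_exactly(io, length, chunk_size)
end

# Simulate a "slow" link by forcing every read to return a single byte.
stream = StringIO.new([5].pack("N") + "hello" + [2].pack("N") + "ok")
first = read_message(stream, 1)
second = read_message(stream, 1)
```

Passing a `chunk_size` of 1 forces the worst case the answer describes, where each read delivers a single byte, and the same helpers work unchanged on a real socket.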

Bittorrent protocol 'not available'/'end connection' response?

I like being able to use a torrent app to grab the latest TV show so that I can watch it at my leisure. The problem is that the structure of the protocol tends to cause a lot of incoming noise on my connection for some time after I close the client. Since I also like to play online games sometimes, this means that I have to make sure that my torrent client is shut off about an hour (depending on how long the tracker advertises me to the swarm) before I want to play a game. Otherwise I get a horrible connection to the game because of the persistent flood of incoming torrent requests.
I threw together a small Ruby app to watch the incoming requests so I'd know when the UTP traffic let up:
http://pastebin.com/TbP4TQrK
The thought occurred to me, though, that there may be some response that I could send to notify the clients that I'm no longer participating in the swarm and that they should stop sending requests. I glanced over the protocol specifications but I didn't find anything of the sort. Does anyone more familiar with the protocol know if there's such a response?
Thanks in advance for any advice.
If a bunch of peers on the internet have your IP and think that you're in their swarm, they will try to contact you a few times before giving up. There's nothing you can do about that. Telling them to stop one at a time will probably end up using more bandwidth than just ignoring the UDP packets would.
Now, there are a few things you can do to mitigate it though:
Make sure your client sends stopped requests to all its trackers. This is part of the protocol specification and most clients do this. If this is successful, the tracker won't tell anyone about you past that point. But peers remember having seen you, so it doesn't mean nobody will try to connect to you.
Turn off DHT. The DHT acts much like a tracker, except that it doesn't have the stopped message. It will take something like 15-30 minutes for your IP to time out once it's announced to the DHT.
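For reference, the stopped request to an HTTP tracker is an ordinary announce with `event=stopped` in the query string. Most clients send it automatically on shutdown, so the following is only a sketch of what that request looks like. The tracker URL, peer ID and helper names are hypothetical, and a real client takes the announce URL and info hash from the .torrent file.

```ruby
# Bytes that may appear unescaped in a URL query value (RFC 3986 unreserved set).
UNRESERVED_BYTES = (("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a +
                    %w[- . _ ~]).join.bytes.freeze

# Percent-encode a raw byte string such as the 20-byte info hash.
def percent_encode(raw)
  raw.each_byte.map { |b| UNRESERVED_BYTES.include?(b) ? b.chr : format("%%%02X", b) }.join
end

# Build the announce URL that tells an HTTP tracker this peer has stopped.
def stopped_announce_url(announce_url, info_hash:, peer_id:, port:,
                         uploaded: 0, downloaded: 0, left: 0)
  query = {
    "info_hash"  => percent_encode(info_hash),
    "peer_id"    => percent_encode(peer_id),
    "port"       => port,
    "uploaded"   => uploaded,
    "downloaded" => downloaded,
    "left"       => left,
    "event"      => "stopped"
  }.map { |key, value| "#{key}=#{value}" }.join("&")
  separator = announce_url.include?("?") ? "&" : "?"
  "#{announce_url}#{separator}#{query}"
end
```

Fetching the resulting URL (for example with `Net::HTTP.get`) is what removes the peer from the tracker's list. As noted above, it does nothing about peers that already remember the address.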
I think it might also be relevant to ask yourself if these stray incoming 23 byte UDP packets really matter. Presumably you're not flooded by more than a few per second (probably less). Have you made any actual measurements or is it mostly paranoia to wait for them to let up?
I'm assuming you're playing some latency sensitive FPS, in which case the server will most likely blast you with at least 10-50 full MTU packets per second, without any congestion control. I would be surprised if you attract so many bittorrent connection attempts that it would cause any of the game packets to be dropped.

Meaning/cause of RPC Exception 'No interfaces have been exported.'

We have a fairly standard client/server application built using MS RPC. Both client and server are implemented in C++. The client establishes a session to the server, then makes repeated calls to it over a period of time before finally closing the session.
Periodically, however, especially under heavy load conditions, we are seeing an RPC exception show up with code 1754: RPC_S_NOTHING_TO_EXPORT.
It appears that this happens in the middle of a session. The user is logged on for a while, making successful calls, then one of the calls inexplicably returns this error. As far as we can tell, the server receives no indication that anything went wrong - and it definitely doesn't see the call the client made.
The error code appears to have permanent implications, as well. Having the client retry the connection doesn't work, either. However, if the user has multiple user sessions active simultaneously between the same client and server, the other connections are unaffected.
In essence, I have two questions:
Does anyone know what RPC_S_NOTHING_TO_EXPORT means? The MSDN documentation simply says: "No interfaces have been exported." ... Huh? The session was working fine for numerous instances of the same call up until this point...
Does anyone have any ideas as to how to identify the real problem? Note: Capturing network traffic is something we would rather avoid, if possible, as the problem is sporadic enough that we would likely go through multiple gigabytes of traffic before running into an occurrence.
Capturing network traffic would be one of the best ways to tackle this issue. If you can't do that, could you dump the client process and debug with WinDBG or Visual Studio? Perhaps compare a dump when operating normally versus in the error state?
