Automatic reconnect in case of network failures - zeromq

I am testing .NET version of ZeroMQ to understand how to handle network failures. I put the server (pub socket) to one external machine and debugging the client (sub socket). If I stop my local Wi-Fi connection for seconds, then ZeroMQ automatically recovers and I even get remaining values. However, if I disable Wi-Fi for longer time like a minute, then it just gets stuck on a frame waiting. How can I configure this period when ZeroMQ is still able to recover? And how can I reconnect manually after, say, several minutes? How can I understand that the socket is locked and I need to kill/open again?

Q :" How can I configure this ... ?"
A :Use the .NET versions of zmq_setsockopt() detailed parameter settings - family of link-management parameters alike ZMQ_RECONNECT_IVL, ZMQ_RCVTIMEO and the likes.
All other questions depend on your code.
If using blocking-forms of the .recv()-methods, you can easily throw yourself into unsalvageable deadlocks, best never block your own code ( why one would ever deliberately lose one's own code domain-of-control ).
If in a need to indeed understand low-level internal link-management details, do not hesitate to use zmq_socket_monitor() instrumentation ( if not available in .NET binding, still may use another language to see details the monitor-instance reports about link-state and related events ).

I was able to find an answer on their GitHub https://github.com/zeromq/netmq/issues/845. Seems that the behavior is by design as I got the same with native zmq lib via .NET binding.

Related

Simulate slow speed for TCP sockets in Windows

I'm building an application that uses TCP sockets to communicate. I want to test how it behaves under slow-speed conditions.
There are similar question on the site, but as I understand it, they deal with HTTP traffic, or are about Linux. My traffic is not HTTP, just ordinary TCP sockets, and the OS is Windows.
I tried using fiddler's setting for Modem Speed but it didn't work, it seems to work only for HTTP connections.
While it is true that you probably want to invest in an extensive set of unit tests, You can simulate various network conditions using VMWare Workstation:
You will have to install a virtual machine for testing, setup bridged networking (for the vm to access your real network) and upload your code to the vm.
After that you can start changing the settings and see how your application performs.
NetLimiter can also be used, but it has fewer options (in your case, packet loss is very interesting to test and is not available in netlimiter).
There is an excellent utility for Windows that can do throttling and much more:
https://jagt.github.io/clumsy/
I think you're taking the wrong approach here.
You can achieve everything that you need with some well designed unit tests. All of the things that a slow network link causes can be simulated in a unit test environment in controlled conditions.
Things that your code MUST handle to deal with "slow" links are just things that you should be dealing with anyway, including:
The correct handling of fragmented messages. All of your network reading code needs to correctly assume that each read will return between 1 byte and the size of your read buffer. You should never assume that you'll get complete 'messages' as TCP knows nothing of your concept of messages.
TCP flow control causing either your synchronous sends to fail with some form of 'try later' error or your async sends to succeed and potentially use an uncontrolled amount of resources (see here for more details). Note that this can happen even on 'fast' links if you are sending faster than the receiver is consuming.
Timeouts - again this isn't limited to "slow" links. All of your timeout handling code should be robust and tested. You may want to make sure that any read timeout is based on any read completing rather than reading a complete message in x time. You may be getting your data at a slow rate but whilst you're still getting data the link is alive.
Connection failure - again not something specific to "slow" links. You need to know how you deal with connections being reset at any time.
In summary nothing you can achieve by running your client and server on a simulated slow network cannot be achieved with a decent set of unit tests and everything that you would want to test on such a link is something that could affect any of your connections on any speed of link.

Monitoring office internet connection for drop outs in Ruby

I am looking for a simple way to monitor our office internet connection for drop outs. A secondary pipe dream is to also monitor for other 'dodgy' behaviour - packet loss, jitter etc. But the primary goal is to watch for dropped connections. Pinging Google every second is great to keep an eye on latency but we have had a few temporary blips which have caused hell with a few streaming services but have not affected connection latency. The IT department also sometimes decide to block outgoing ICMP traffic which doesn't help with the humble ping tool's efforts.
If this is not something available already via an open source, freeware or commercial tool, ideally I would like to be able to come up with something in Ruby (or, if forced, .NET) which will open a 'long' TCP connection to an arbitrary web server on port 80 (i.e. I don't want to have to write something keeping a socket open on a hosted server) and have the program detect and alert the guys in the office if the connection drops out in a "bad" way. With my attempts using Ruby Socket (http://www.ruby-doc.org/stdlib-1.9.3/libdoc/socket/rdoc/Socket.html) I've had trouble extracting an accurate error code here; ideally I want to isolate actual network connectivity issues from the usual connection timeouts. On a timeout, I'll want to restart the connection silently, but on a real drop out, I'll flash something big and obvious up on screen to alert the guys in the office.
I've spent most of the day googling for examples of this kind of monitoring and trying to hack something together but it seems that it is not a common request. 99% of results are forum posts ending with me being authoritatively informed that speedtest.net will do everything I need. My own attempts have all proven futile - no matter which way I've tried, whenever I seem to be getting somewhere even the most basic drop out test (unplugging the network cable from my laptop!) fails to be detected.
Is this something trivial, and if so could anyone point me in the right direction please? Or am I in for a world of pain? (This has been my general experience whenever I've tried to do anything with network programming in the past...)
Alternatively is there anything pre-written (free, commericial, open source all fine) which will do just this?
Thanks!
Smokeping might do what you want. Nagios might as well.
http://oss.oetiker.ch/smokeping/
http://www.nagios.org/

Many-to-many messaging on local machine without broker

I'm looking for a mechanism to use to create a simple many-to-many messaging system to allow Windows applications to communicate on a single machine but across sessions and desktops.
I have the following hard requirements:
Must work across all Windows sessions on a single machine.
Must work on Windows XP and later.
No global configuration required.
No central coordinator/broker/server.
Must not require elevated privileges from the applications.
I do not require guaranteed delivery of messages.
I have looked at many, many options. This is my last-ditch request for ideas.
The following have been rejected for violating one or more of the above requirements:
ZeroMQ: In order to do many-to-many messaging a central broker is required.
Named pipes: Requires a central server to receive messages and forward them on.
Multicast sockets: Requires a properly configured network card with a valid IP address, i.e. a global configuration.
Shared Memory Queue: To create shared memory in the global namespace requires elevated privileges.
Multicast sockets so nearly works. What else can anyone suggest? I'd consider anything from pre-packaged libraries to bare-metal Windows API functionality.
(Edit 27 September) A bit more context:
By 'central coordinator/broker/server', I mean a separate process that must be running at the time that an application tries to send a message. The problem I see with this is that it is impossible to guarantee that this process really will be running when it is needed. Typically a Windows service would be used, but there is no way to guarantee that a particular service will always be started before any user has logged in, or to guarantee that it has not been stopped for some reason. Run on demand introduces a delay when the first message is sent while the service starts, and raises issues with privileges.
Multicast sockets nearly worked because it manages to avoid completely the need for a central coordinator process and does not require elevated privileges from the applications sending or receiving multicast packets. But you have to have a configured IP address - you can't do multicast on the loopback interface (even though multicast with TTL=0 on a configured NIC behaves as one would expect of loopback multicast) - and that is the deal-breaker.
Maybe I am completely misunderstanding the problem, especially the "no central broker", but have you considered something based on tuple spaces?
--
After the comments exchange, please consider the following as my "definitive" answer, then:
Use a file-based solution, and host the directory tree on a Ramdisk to insure good performance.
I'd also suggest to have a look at the following StackOverflow discussion (even if it's Java based) for possible pointers to how to manage locking and transactions on the filesystem.
This one (.NET based) may be of help, too.
How about UDP broadcasting?
Couldn't you use a localhost socket ?
/Tony
In the end I decided that one of the hard requirements had to go, as the problem could not be solved in any reasonable way as originally stated.
My final solution is a Windows service running a named pipe server. Any application or service can connect to an instance of the pipe and send messages. Any message received by the server is echoed to all pipe instances.
I really liked p.marino's answer, but in the end it looked like a lot of complexity for what is really a very basic piece of functionality.
The other possibility that appealed to me, though again it fell on the complexity hurdle, was to write a kernel driver to manage the multicasting. There would have been several mechanisms possible in this case, but the overhead of writing a bug-free kernel driver was just too high.

WebSphereMQ PCFMessageAgent / PCFAgent - Is it Thread Safe?

I am implementing a monitoring and administrative MQ API using the WebSphereMQ java PCF (Program Control Format) library. What I would like to know is if the PCFAgent and/or the PCFMessageAgent classes are thread safe. The documentation does not make it clear [to me].
If not, then I have 2 choices:
Create a pool of agents
Create (and disconnect) agents on demand.
Any insight into this issue is appreciated.
Cheers.
The important information you seek is probably on this page:
http://publib.boulder.ibm.com/infocenter/wmqv7/v7r0/index.jsp?topic=%2Fcom.ibm.mq.csqzaw.doc%2Fja11160_.htm
The main issue you will see is that the MQQueueManager object (that you either pass in, or is created for you) cannot really do 2 things at once on a single connection.
So if you have one Agent sitting on a get-with-wait waiting for a response to a big query (saying getting full details for thousands of queues) nothing else can be done using that connection until the reply comes back.
Connect/Disconnect are the biggest overhead when talking to MQ, so if you need multiple threaded access I would go with option 1 otherwise you'll pay a big penalty in performance having to wait for connect each time.

Meaning/cause of RPC Exception 'No interfaces have been exported.'

We have a fairly standard client/server application built using MS RPC. Both client and server are implemented in C++. The client establishes a session to the server, then makes repeated calls to it over a period of time before finally closing the session.
Periodically, however, especially under heavy load conditions, we are seeing an RPC exception show up with code 1754: RPC_S_NOTHING_TO_EXPORT.
It appears that this happens in the middle of a session. The user is logged on for a while, making successful calls, then one of the calls inexplicably returns this error. As far as we can tell, the server receives no indication that anything went wrong - and it definitely doesn't see the call the client made.
The error code appears to have permanent implications, as well. Having the client retry the connection doesn't work, either. However, if the user has multiple user sessions active simultaneously between the same client and server, the other connections are unaffected.
In essence, I have two questions:
Does anyone know what RPC_S_NOTHING_TO_EXPORT means? The MSDN documentation simply says: "No interfaces have been exported." ... Huh? The session was working fine for numerous instances of the same call up until this point...
Does anyone have any ideas as to how to identify the real problem? Note: Capturing network traffic is something we would rather avoid, if possible, as the problem is sporadic enough that we would likely go through multiple gigabytes of traffic before running into an occurrence.
Capturing network traffic would be one of the best ways to tackle this issue. If you can't do that, could you dump the client process and debug with WinDBG or Visual Studio? Perhaps compare a dump when operating normally versus in the error state?

Resources