ZeroMQ slow after upgrade

We are currently upgrading ZeroMQ from version 2.2.12 to version 4.0.5 and we are finding that the performance is considerably worse since we upgraded.
We have a fairly simple DEALER/DEALER topology with either 1:1 or 1:many connections. We are running a message pump on either end of the connection using polling.
We are using a number of different bindings (ffi-rzmq, clrzmq, jzmq and zmq_cpp) to allow components written in different languages to communicate. All of our components seem to suffer from the same performance problems.
We are running under Windows 7 using loopback (127.0.0.1) TCP sockets.
Has anyone got any ideas of what could be wrong (or what additional information I would need to provide here)?

It turns out that the performance isn't actually slow; messages between certain components were going missing, causing the system to misbehave.
This was caused by us using unsupported socket pairings (DEALER / PUSH and DEALER / PULL) in a certain part of our system. This worked in ZeroMQ 2 but not 4.
The fix was to replace them with a supported topology (DEALER / DEALER in our case).
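For illustration, here is a minimal pyzmq sketch of a supported DEALER/DEALER pairing driven by a poll-based message pump; the endpoint and message contents are placeholders, not taken from our actual system:

    import zmq

    ctx = zmq.Context()

    # Two DEALER sockets over loopback TCP (placeholder endpoint).
    server = ctx.socket(zmq.DEALER)
    server.bind("tcp://127.0.0.1:5555")

    client = ctx.socket(zmq.DEALER)
    client.connect("tcp://127.0.0.1:5555")

    client.send(b"ping")

    # Poll-based message pump: wait for readability instead of blocking in recv().
    poller = zmq.Poller()
    poller.register(server, zmq.POLLIN)

    events = dict(poller.poll(timeout=1000))    # milliseconds
    if server in events:
        request = server.recv()
        server.send(b"pong: " + request)

    print(client.recv())
    ctx.destroy()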

Related

Automatic reconnect in case of network failures

I am testing the .NET version of ZeroMQ to understand how to handle network failures. I put the server (pub socket) on an external machine and am debugging the client (sub socket). If I drop my local Wi-Fi connection for a few seconds, ZeroMQ automatically recovers and I even get the remaining values. However, if I disable Wi-Fi for a longer time, like a minute, it just gets stuck waiting on a frame. How can I configure the period during which ZeroMQ is still able to recover? And how can I reconnect manually after, say, several minutes? How can I tell that the socket is stuck and I need to close and reopen it?
Q :" How can I configure this ... ?"
A :Use the .NET versions of zmq_setsockopt() detailed parameter settings - family of link-management parameters alike ZMQ_RECONNECT_IVL, ZMQ_RCVTIMEO and the likes.
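As a rough illustration (shown here with pyzmq rather than the .NET binding, and with arbitrary placeholder values and endpoint), setting those link-management options looks like this:

    import zmq

    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")

    # Link-management options; the values here are arbitrary examples.
    sub.setsockopt(zmq.RECONNECT_IVL, 1000)        # first reconnect attempt after 1 s
    sub.setsockopt(zmq.RECONNECT_IVL_MAX, 30000)   # back off up to 30 s between attempts
    sub.setsockopt(zmq.RCVTIMEO, 5000)             # recv() gives up after 5 s

    sub.connect("tcp://some.remote.host:5556")     # placeholder endpoint

    try:
        message = sub.recv()
    except zmq.Again:
        # No message within RCVTIMEO - decide whether to keep waiting,
        # reconnect, or report the link as down.
        pass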
All other questions depend on your code.
If you use the blocking forms of the .recv() methods, you can easily throw yourself into unsalvageable deadlocks; it is best never to block in your own code (why would one ever deliberately give up one's own code's domain of control?).
If you do need to understand the low-level, internal link-management details, do not hesitate to use the zmq_socket_monitor() instrumentation (if it is not available in the .NET binding, you can still use another language to see the details the monitor instance reports about link state and related events).
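A small pyzmq sketch of that monitor instrumentation (the endpoint is a placeholder; other bindings expose the same zmq_socket_monitor() events):

    import zmq
    from zmq.utils.monitor import recv_monitor_message

    ctx = zmq.Context()
    sock = ctx.socket(zmq.SUB)

    # Ask libzmq to report link-state events for this socket on an inproc PAIR.
    monitor = sock.get_monitor_socket()            # all events by default

    sock.connect("tcp://some.remote.host:5556")    # placeholder endpoint

    while monitor.poll(timeout=10000):             # stop after 10 s of silence
        event = recv_monitor_message(monitor)
        print(event["event"], event["endpoint"])
        if event["event"] == zmq.EVENT_MONITOR_STOPPED:
            break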
I was able to find an answer on their GitHub: https://github.com/zeromq/netmq/issues/845. It seems the behavior is by design, as I got the same result with the native zmq library via the .NET binding.

ZeroMQ and actor model

I'm having problems scaling up an application that uses the actor model and zeromq. To put it simply: I'm trying to create thousands of threads that communicate via sockets, similar to Erlang-style message passing. I'm not doing it for multicore/performance reasons, but because framing it this way gives me very clean code.
From a philosophical point of view it sounds as if this is what zmq developers would like to achieve, e.g.
http://zeromq.org/whitepapers:multithreading-magic
However, it seems as if there are some practical limitations. At 1024 inproc sockets I start getting the "ZMQError: Too many open files" error. TCP gives me the typical "Assertion failed: fds.size () <= FD_SETSIZE" crash.
Why do inproc sockets have this limit?
To get it to work I've had to group together items to share a socket. Is there a better way?
Is zmq just the wrong tool for this kind of job, i.e. is it still more of a network library than an actor message-passing library?
ZMQ uses file descriptors as the "resource unit" for inproc connections. There is a limit on file descriptors set by the OS; you should be able to modify that (a quick Google search turns up several potential avenues for Windows), though I don't know what the performance impact might be.
It looks like this is related to the ZMQ library using portable C code for opening new files, rather than native Windows code, which doesn't suffer from this limitation.
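To illustrate the grouping workaround mentioned in the question, here is a rough pyzmq sketch in which many logical actors share a single inproc socket per thread instead of owning one socket each; the endpoint name, group name and counts are made up:

    import threading
    import zmq

    ctx = zmq.Context.instance()

    # Hub: one ROUTER socket fronting all actor-hosting threads.
    hub = ctx.socket(zmq.ROUTER)
    hub.bind("inproc://actors")                    # inproc: bind before any connect

    def actor_host(group, n_actors):
        """One OS thread hosting n_actors logical actors behind ONE socket."""
        sock = ctx.socket(zmq.DEALER)
        sock.connect("inproc://actors")
        # Each logical actor announces itself; the first frame carries its id.
        for i in range(n_actors):
            sock.send_multipart([f"{group}-{i}".encode(), b"hello"])
        # Replies come back tagged with the actor id; dispatch on that frame.
        for _ in range(n_actors):
            actor_id, payload = sock.recv_multipart()
            # ... deliver payload to the in-process mailbox of actor_id ...

    worker = threading.Thread(target=actor_host, args=("grp", 1000))
    worker.start()

    # The hub sees [peer identity, actor id, payload] and can route replies
    # back to the right thread and actor without needing 1000 sockets.
    for _ in range(1000):
        peer, actor_id, payload = hub.recv_multipart()
        hub.send_multipart([peer, actor_id, b"reply for " + actor_id])

    worker.join()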

elasticsearch through NEST: what is the recommended way to connect to a cluster of several hosts

I'm starting to work with NEST.
I've seen in a previous question that I should use TryConnect only once at the beginning of the program and then use Connect.
But that seems a bit too naive for a long running system.
What if I have a cluster of say 3 machines and I want to make sure I can connect to any of the 3 machines?
What should be the recommended way of doing that?
Should I:
- Use TryConnect each time and use a different host + port if it fails (downside - an additional roundtrip each time)?
- Try to work with a client and have some retry mechanism to handle failures due to connectivity issues? Maybe implement a connection pool on top of that?
Any other option?
Any suggestions/recommendations?
Sample code?
Thanks for your help,
Ron
Connection pooling is an often-requested feature, but due to the many heuristics and different approaches involved, NEST does not come with this out of the box. You will have to implement it yourself.
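For what it's worth, the round-robin-with-retry idea is language-agnostic; here is a bare sketch in Python (not NEST/C#) with an entirely hypothetical query(host, body) call standing in for a single request to one node:

    import itertools

    # Hosts of the (hypothetical) three-node cluster.
    HOSTS = ["http://node1:9200", "http://node2:9200", "http://node3:9200"]

    def query(host, body):
        """Placeholder for a single request against one node; raises on failure."""
        raise NotImplementedError

    def robust_query(body, hosts=HOSTS, retries_per_host=1):
        rotation = itertools.cycle(hosts)
        last_error = None
        for _ in range(len(hosts) * retries_per_host):
            host = next(rotation)
            try:
                return query(host, body)           # success: hand back the response
            except Exception as exc:               # connectivity failure: try the next node
                last_error = exc
        raise RuntimeError("all nodes failed") from last_error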
I would not recommend calling TryConnect() before each call, as you would then be doing two calls instead of one.
Each NEST call returns an IResponse whose IsValid property you can check; ConnectionStatus will hold the request and response details.
See also the documentation on handling responses
In 1.0, NEST will start to throw an exception in case of TCP-level errors so that more generic approaches to connection pooling can be implemented, and NEST might come with a separate NuGet package implementing one (if only as a reference). See also this discussion: https://github.com/Mpdreamz/NEST/pull/224#issuecomment-16347889
Hope this helps for now.
UPDATE: this answer is outdated. NEST 1.0 ships with connection pool and cluster failover support out of the box: http://nest.azurewebsites.net/elasticsearch-net/cluster-failover.html

Can epoll/libevent/libev work with UDT?

I'm building a high-concurrency server which needs to handle tens of thousands of active sockets. I initially used epoll to build an event-based server and it worked well at moderate scale (several thousand active sockets). But it seems to become unstable when I have more than 10,000 concurrent sockets. So I'm considering libevent (or libev), since it's a mature project and claims to be able to "handle tens of thousands of active sockets".
I'm also thinking of using UDT because it's a "reliable UDP" and I'm starting to have problems with TCP due to its overhead and memory usage. So a natural thought is to use libevent as my event framework and UDT as the transmission protocol. I know that UDT provides its own set of epoll operations. Does that mean it won't work with regular Linux epoll? If so, it won't work with libevent or libev either, because they are built on top of Linux epoll.
Is there anyone who have worked on both UDT and epoll / libevent / libev? Can UDT work with any of them?
Any help would be appreciated.
Thanks.
UDT exposes an epoll API of its own, which can be used to handle UDT sockets with epoll-style event notification.
See http://udt.sourceforge.net/udt4/doc/epoll.htm for more information.
After some research I figured out that UDT sockets are not file descriptors, so they can't be handled with the regular Linux epoll.

Many-to-many messaging on local machine without broker

I'm looking for a mechanism to use to create a simple many-to-many messaging system to allow Windows applications to communicate on a single machine but across sessions and desktops.
I have the following hard requirements:
Must work across all Windows sessions on a single machine.
Must work on Windows XP and later.
No global configuration required.
No central coordinator/broker/server.
Must not require elevated privileges from the applications.
I do not require guaranteed delivery of messages.
I have looked at many, many options. This is my last-ditch request for ideas.
The following have been rejected for violating one or more of the above requirements:
ZeroMQ: In order to do many-to-many messaging a central broker is required.
Named pipes: Requires a central server to receive messages and forward them on.
Multicast sockets: Requires a properly configured network card with a valid IP address, i.e. a global configuration.
Shared Memory Queue: To create shared memory in the global namespace requires elevated privileges.
Multicast sockets so nearly work. What else can anyone suggest? I'd consider anything from pre-packaged libraries to bare-metal Windows API functionality.
(Edit 27 September) A bit more context:
By 'central coordinator/broker/server', I mean a separate process that must be running at the time that an application tries to send a message. The problem I see with this is that it is impossible to guarantee that this process really will be running when it is needed. Typically a Windows service would be used, but there is no way to guarantee that a particular service will always be started before any user has logged in, or to guarantee that it has not been stopped for some reason. Run on demand introduces a delay when the first message is sent while the service starts, and raises issues with privileges.
Multicast sockets nearly worked because they completely avoid the need for a central coordinator process and do not require elevated privileges from the applications sending or receiving multicast packets. But you have to have a configured IP address - you can't do multicast on the loopback interface (even though multicast with TTL=0 on a configured NIC behaves as one would expect of loopback multicast) - and that is the deal-breaker.
Maybe I am completely misunderstanding the problem, especially the "no central broker", but have you considered something based on tuple spaces?
--
After the comments exchange, please consider the following as my "definitive" answer, then:
Use a file-based solution, and host the directory tree on a RAM disk to ensure good performance.
I'd also suggest having a look at the following StackOverflow discussion (even if it's Java-based) for possible pointers on how to manage locking and transactions on the filesystem.
This one (.NET based) may be of help, too.
How about UDP broadcasting?
Couldn't you use a localhost socket?
/Tony
In the end I decided that one of the hard requirements had to go, as the problem could not be solved in any reasonable way as originally stated.
My final solution is a Windows service running a named pipe server. Any application or service can connect to an instance of the pipe and send messages. Any message received by the server is echoed to all pipe instances.
I really liked p.marino's answer, but in the end it looked like a lot of complexity for what is really a very basic piece of functionality.
The other possibility that appealed to me, though again it fell at the complexity hurdle, was to write a kernel driver to manage the multicasting. There would have been several possible mechanisms in this case, but the overhead of writing a bug-free kernel driver was just too high.
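For completeness, a very rough sketch of that named-pipe echo service, written here in Python with pywin32 (the pipe name is a placeholder; error handling, client removal and the overlapped I/O a real service would need are omitted):

    import threading
    import win32pipe, win32file                    # pywin32

    PIPE_NAME = r"\\.\pipe\message_bus"            # placeholder pipe name
    clients = []                                   # handles of connected instances
    clients_lock = threading.Lock()

    def serve_instance():
        """Create one pipe instance, wait for a client, then echo its messages."""
        handle = win32pipe.CreateNamedPipe(
            PIPE_NAME,
            win32pipe.PIPE_ACCESS_DUPLEX,
            win32pipe.PIPE_TYPE_MESSAGE | win32pipe.PIPE_READMODE_MESSAGE | win32pipe.PIPE_WAIT,
            win32pipe.PIPE_UNLIMITED_INSTANCES,
            65536, 65536, 0, None)
        win32pipe.ConnectNamedPipe(handle, None)   # block until a client connects
        with clients_lock:
            clients.append(handle)
        # Immediately offer a fresh instance for the next client.
        threading.Thread(target=serve_instance, daemon=True).start()
        while True:
            _, data = win32file.ReadFile(handle, 65536)
            with clients_lock:
                for h in clients:                  # fan the message out to every instance
                    win32file.WriteFile(h, data)

    serve_instance()                               # a real service would call this from its service main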
