Is my case apt for using ZeroMQ? - client

I'm trying to implement a communication system among a variety of devices connected through WiFi.
A Desktop ( Mac / Win / Linux ) serves as a server, whereas mobile phones ( Android / iPhone / Blackberry ), say 50 in number, will be clients.
There should be a client-server as well as client-client 2-way communication.
In client-server communication, I need to access a database in the server.
While researching this, I came across ZeroMQ, a high-performance asynchronous messaging library, which seems to be a good solution for a complex distributed communication system.
Note:
Yeah, I am completely new to communication and networking, but I am trying to learn. (I guess that fact is well reflected in the clarity of the question :P)
EDIT:
If ZeroMQ does not seem to be a good option, please suggest some other means of achieving this.

Yes, ZeroMQ is a great and powerful tool
This does not mean it is the best tool to use for any particular project.
Many other factors matter more than the built-in code and service archetypes:
The project's potential for creeping scope, the moving sands in the diversity of target devices, respective O/S versions, patches, EoL-maintenance/unsupported orphans
The project plan vs. the team's already accrued { ZeroMQ and other-tools } craftsmanship
Scaling of the services - from 5 to 50, 500, 5000+
Service robustness / { service & transaction } self-healing strategies
Service risks associated with the absence of any version-{ -control- | -enforcement- } policy in a loosely coupled or even un-controlled domain
Service risks from (non-){ -stable | -available } language bindings or wrapper mediators.
One will always learn a lot once one opens up the ZeroMQ perspective.
There are many points of view that will help one better design even non-distributed services. A Zero-copy design rule, Zero-sharing for performance targets, (almost) Zero-latency, (almost) Zero-overhead for (almost) linear scaling -- these are just a few principles one may benefit from when learning ZeroMQ from its ground-Zero roots.
As the best next step, feel free to read the ZeroMQ posts here for further reading, and do not miss downloading the great must-read book from Pieter HINTJENS: "Code Connected, Volume 1".
After having understood the ZeroMQ views, Nanomsg or any other available tool may give one some additional views (and one will then be mature and ready to also assess the risks / costs to be paid on such grounds).
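To make the client-server half of the question concrete, here is a minimal sketch using the Python binding (pyzmq); the endpoint addresses and the tiny in-memory "database" are placeholders, not a recommendation for any actual schema:

    # Minimal sketch (assumes the pyzmq binding): one desktop "server" answers
    # database-style lookups from many phone "clients" over plain TCP / WiFi.
    # Endpoint addresses and the in-memory "database" are placeholders.
    import zmq

    FAKE_DB = {"sensor42": 19.7}                # stands in for the real database

    def run_server(endpoint="tcp://*:5555"):
        ctx = zmq.Context.instance()
        rep = ctx.socket(zmq.REP)
        rep.bind(endpoint)                      # one server socket, many clients connect
        while True:
            request = rep.recv_json()           # e.g. {"op": "lookup", "key": "sensor42"}
            value = FAKE_DB.get(request.get("key"))
            rep.send_json({"ok": value is not None, "value": value})

    def run_client(endpoint="tcp://192.168.1.10:5555"):
        ctx = zmq.Context.instance()
        req = ctx.socket(zmq.REQ)
        req.connect(endpoint)
        req.send_json({"op": "lookup", "key": "sensor42"})
        return req.recv_json()                  # blocks until the server replies

For the client-to-client part, the usual approach from the Guide is to relay those messages through the server (e.g. a ROUTER/DEALER broker) rather than having the phones connect to each other directly.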

Related

What exactly is Software-Defined Networking (SDN)?

I was poring over the docs for OpenDaylight, and can't seem to wrap my head around what software-defined networking even is. All the media hype, blogs and articles I can find on SDN are riddled with buzzwords that don't mean anything to me as an engineer. So I ask: What (exactly) is SDN? What are some specific use cases/problems it solves? Is it:
Just making proprietary networking hardware serve network APIs, thus allowing programs to configure them (instead of IT guys using a console or web interface)?; or
Implementing (traditionally proprietary) networking hardware as software; or
Writing software that somehow integrates with virtual networking hardware used by virtualization platforms (vLANs, vSwitches, etc.)?; or
Something else completely?!?
BONUS: How does OpenDaylight fit into this equation?
First of all, you are right: there is no official definition from NIST or a similar standardization body, and the fact that its meaning is fuzzy is exploited by marketing people.
The main point of SDN is that it allows network functions to be programmed through APIs.
In the past, networking devices like switches and routers were only configurable through a proprietary interface (be it vendor-specific tools or just the CLI on the device), and there were no APIs that allowed configuring OSI L2 - L3 aspects like VLANs and routes, but also L6 - L7 aspects like load balancing, in a highly dynamic way. By the way, in the case of L6 - L7 functions, the term NFV (Network Functions Virtualization) seems to be established by now.
This is needed especially for multi-tenancy-capable virtualized IaaS systems. You can create new VPCs and arrange them together at will. To really isolate tenants from each other, you need L2 isolation, so the same dynamics that is offered for VPCs is propagated to the networking that interconnects them.
Conclusion: It is about your first bullet, with the extension that the APIs must not necessarily be offered by some hardware appliance; they can also be offered by a pure software implementation.
Regarding OpenDaylight:
It is the OpenStack counterpart for SDN. They also actively push integration with OpenStack. They say they are an "open, reference framework for programmability and control through an open source SDN and NFV solution". This means it provides (as you say) a façade for the manifold aspects of networking.
They have all the big names as members, which probably means they have the power to establish a de-facto standard, like OpenStack did. Members benefit in that they can provide plugins, integrations and adaptations for their products so that they seamlessly integrate with OpenDaylight, and you only need to care about a single standard API.
SDN is programmable networks. Different SDN solutions provide different functions in their APIs towards the app developer.
There is a good overview of SDN for software developers here:
https://github.com/BRCDcomm/BVC/wiki/SDN-applications
The most common elements of SDN solutions are:
North-bound API: A programming interface used by an application/script to monitor, manage and control the network topology and packet flows within the network.
Network elements: Switching or routing network elements that enforce the rules provided by the application via the north-bound API. These elements may be physical (Cisco, Brocade, Tallac, etc.) or virtual (Open vSwitch, Brocade Vyatta vRouter, Cisco 1000, etc.) or a combination.
Controller-based solutions have a clustered architectural element (the 'controller') that provides the north-bound API towards applications and an extensible set of south-bound APIs to which network devices connect. Some controllers available today are OpenDaylight, Open Network Operating System (ONOS), Juniper Open Contrail, Brocade Vyatta Controller (an ODL distribution), HP VAN Controller and more.
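To make "programming the network through a north-bound API" concrete, here is a rough sketch of reading the topology from a controller's REST interface. The port, path and credentials are assumptions modelled on older OpenDaylight releases; check your controller's documentation for the real endpoints.

    # Rough sketch only: querying a controller's north-bound (REST) API to read
    # the network topology it has discovered. URL, port and credentials below
    # are assumptions and will differ per controller and release.
    import requests

    CONTROLLER = "http://controller-host:8181"
    TOPOLOGY_URL = CONTROLLER + "/restconf/operational/network-topology:network-topology"

    resp = requests.get(TOPOLOGY_URL, auth=("admin", "admin"), timeout=5)
    resp.raise_for_status()

    for topo in resp.json().get("network-topology", {}).get("topology", []):
        print("topology:", topo.get("topology-id"))
        for node in topo.get("node", []):
            print("  node:", node.get("node-id"))   # switches known to the controller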
Best rules of thumb to understand an SDN offering:
Read its north-bound API - this tells you what you will be able to monitor, manage and control in your network.
Find out which south-bound APIs it supports - this tells you which switches/routers it might work with.
Some SDN use cases/applications:
DevOps/Admin automation - Applications and scripts that make a network admin or DevOps life easier through automation. OpenStack Neutron is a common example.
Security - HP provides 'Network Protector' that learns the topology of the network and then monitors activity providing alerts and/or remediation of non-compliant behaviors.
Network optimization
Brocade offers 'Traffic Manager' that monitors network utilization and modifies traffic flows in real time to optimize quality based on defined policies.
HP provides 'HP Network Optimizer' that provides an end-to-end voice optimized path for enterprise Microsoft Lync users.
Lyatiss provisions AWS networks in realtime to meet application needs.
Monitoring classroom time-on-task - Elbrys provides an application that provides a teacher with a dashboard to monitor student's time-on-task in real time and cause redirects of individual students to web pages of their choosing. (Disclaimer: I work for Elbrys Networks)
OpenDaylight project proposals page - https://wiki.opendaylight.org/view/Project_Proposals:Main
The concept of SDN is very simple. SDN decouples the control plane (i.e. decision making) from the data plane (the actual forwarding actions) and provides an API between them (e.g. the OpenFlow API).
Image source: https://www.commsbusiness.co.uk/features/software-defined-networking-sdn-explained/
With an SDN architecture, network engineers no longer have to learn proprietary CLI commands for different vendors. They can focus on developing logically centralized control programs that make global network decisions and send them down to the network switches (data plane). Dumb network switches (data plane) receive the controller's rules/decisions and process network packets accordingly; if no matching decision is found, they ask the controller.
For example: In an SDN architecture, the routing algorithm is developed as a program in the controller; it collects all the required metadata (e.g. switches, ports, host connections, links, speed, etc.) from the network and then makes a routing decision for each switch in the network. In a conventional network, the routing algorithm is implemented in a distributed fashion in all switches (i.e. generally each switch has its own intelligence and makes its own routing decisions).
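As a toy illustration of that difference (not any real controller API): the "controller" below holds the whole topology, computes paths centrally, and derives a forwarding rule per switch, instead of every switch running its own distributed routing protocol.

    # Toy illustration only: the controller has a global view of the topology,
    # computes shortest paths centrally (plain BFS), and derives one forwarding
    # rule per switch. In a real SDN these rules would be pushed to the switches
    # over a south-bound API such as OpenFlow.
    from collections import deque

    topology = {                      # adjacency list: switch -> neighbouring switches
        "s1": ["s2", "s3"],
        "s2": ["s1", "s4"],
        "s3": ["s1", "s4"],
        "s4": ["s2", "s3"],
    }

    def shortest_path(src, dst):
        """BFS over the controller's global view of the network."""
        queue, seen = deque([[src]]), {src}
        while queue:
            path = queue.popleft()
            if path[-1] == dst:
                return path
            for nxt in topology[path[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None

    path = shortest_path("s1", "s4")
    for here, next_hop in zip(path, path[1:]):
        print(f"{here}: forward traffic for s4 via {next_hop}")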
SDN explained by Nick Feamster
Here is a good paper that illustrates the road map to SDN

Replace ZeroMQ's select() on windows

It is unbelievable that ZeroMQ uses select() on Windows; I didn't know that until I had completed my code and started performance testing. They should present this information on their web site in a big red font.
Is there any way to replace ZeroMQ's select()?
IOCP follows the proactor model and can't easily be integrated; how about WSAEventSelect? That is also a reactor model and has performance close to poll().
Another choice for me is http://nanomsg.org/, but it is still in alpha.
One of the main objectives of ZeroMQ is to provide a consistent API for communication between threads, processes, nodes, and clusters. Protocol-specific optimization is outside of this scope because of the ways that it can affect other areas of communication. For example, shared memory would be a better form of IPC, but UNIX domain sockets make a consistent API easier. It would also be nice to know when an endpoint disconnects, but how would you implement such behavior between threads?
Their main goal is to allow every pattern to work the same way regardless of topology, protocol, system, or language, to the point that any mixture can be used regardless of how odd it may seem (node.js Websockets communicating with C# brokers passing messages to Ruby and PHP workers which share work with java threads, etc.)
Each of its features would be enhanced greatly if optimised for each specific protocol and system, but that would also make uniform patterns close to impossible.
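A small sketch of that uniformity, assuming the Python binding (pyzmq): the pattern code stays the same whether the two ends are separate machines or two threads in one process; only the endpoint string changes.

    # Sketch of the "same API, any transport" point (assumes pyzmq): the code
    # below does not change, only the endpoint string does.
    import zmq

    def ping(endpoint):
        ctx = zmq.Context.instance()
        pull = ctx.socket(zmq.PULL)
        pull.bind(endpoint)                 # bind first so inproc:// peers can find it
        push = ctx.socket(zmq.PUSH)
        push.connect(endpoint)
        push.send(b"hello")
        print(endpoint, "->", pull.recv())
        push.close(); pull.close()

    ping("tcp://127.0.0.1:5556")            # between processes or machines
    ping("inproc://demo")                   # between threads in one process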
BTW, they might accept a patch if you could find a way to implement IOCP while still maintaining this versatility and neutrality.
PPS: nanomsg is made by one of the main original developers of ZeroMQ. Crossroads.IO is a direct fork of ZeroMQ, also by original ZeroMQ developers and including some developers of nanomsg. If I'm not mistaken, nanomsg will likely become the core of Crossroads when complete.

The Application Split Challenge - fast+easy RPC technology?

The following tries to get an idea of which technologies would be suitable for a specific (as outlined) distributed/RPC problem. If something is not clear, I am very happy to add more details, but please request these in a comment and not in an "answer". Thanks.
First I will describe the current situation, and then follows what we want to achieve and the actual question. Despite this being a rather long post to get some context, the question itself is rather short (see at the end).
The Application Split challenge
Application description:
The app allows the user to configure a number of hardware devices(*) and then communicate with these to control and collect measurement channels of a physical experiment.
(*) Hardware devices include temperature sensors, pressure sensors, motors, ... Communication ranges from serial port communication and TCP/UDP communication to interfacing with the drivers of 3rd party plugin cards.
Control involves sending commands to the various hardware devices to configure them according to the protocols they support.
Measuring involves getting the data from (some of) these devices.
We are hard pressed to keep the whole thing running as customers demand more and more channels at higher sample rates, and we have to keep up with writing the data+timestamps we get from all devices to disk, display a subset of the data and still keep the system responding properly.
Current situation:
[ DisplayAndControl.exe ]
    ||                   /\
    ||  DLL Interface    ||
    ||                   ||  Window Messages (SendMessage, PostMessage)
    \/                   ||
[ ChannelManager.dll ]
ChannelManager.dll (Native C++ DLL on Windows)
Manages n data channels (physical measurement variables)
Each channel holds a shifting arbitrary number of samples with high-precision timestamps
Allows to group channels and write their ongoing updates or historical values ("measurement") to disk
Calculations with channels (arithmetic, integration, mean values, etc.)
Interfaces with (realtime) hardware devices to get the timestamps and values of channels
Gets value+timestamp from hardware and saves them in the internal ring buffer for the channel
DisplayAndControl.exe (Native C++ MFC App on Windows)
Controls the functions of ChannelManager.dll (configure channels and HW devices)
Live display of current values/timestamps/changes of all channels
Graphs values of (groups of) channels in diagrams
Prints diagrams and tables of channel values
Summary of current situation:
The application as it is at the moment is already somewhat modular in that the (main) executable does the display+interaction and the (one of several) DLL does the data management (saving of live data to disk, communication with devices, etc.)
From a performance POV, communication btw. the display module and the data management module is optimally performant at the moment.
New situation:
[ DisplayAndControl.exe ]
    ||                    /\
    ||  ? RPC/Messaging   ||
    ||                    ||  ? RPC/Messaging
    \/                    ||
[ ChannelManager.exe (same PC or another) ]
Summary of the envisioned new situation:
For usability, performance and safety reasons, we wish to split up this Windows app into two separate applications, so that the performance (and safety) sensitive ChannelManager module can run as a separate process, possibly on a separate Windows PC.
Additionally, since we're already going to split this, we will allow for multiple DisplayAndControl.exe apps connected to one single ChannelManager.exe.
One QUESTION now is what technology we should use to facilitate the communication btw. the now two (or, rather, 1 : small_n) applications.
Performance is important, because a lot of data travels btw. the two applications and latency should be kept to a minimum. It "only" needs to work on Windows, but it should be usable from native C++ only, which makes all purely .NET based technologies unattractive. (Note: Porting parts of DisplayAndControl.exe to .NET/WPF is planned, but ChannelManager.exe should stay pure native, as we don't want any .NET stuff running inside this process.)
Regarding latency: It is important that we achieve some level of soft-realtime in the sense that small latency is acceptable, but large and especially varying latency is not acceptable for usability and safety reasons. Therefore any protocol that would help in getting some sort of (soft) realtime behavior would be preferred.
RPC technologies we've looked at:
WCF (or .NET remoting) - Is dotnet only, therefore not attractive. Performance figures are also not very good.
(D)COM - COM is great for Windows RPC communication, but it breaks down once you have to have inter-PC comm, because it is horrible to get the security settings working in a corporate IT network.
CORBA - We have had good experience with CORBA communications in the past. The communication is easy to get working; there's not much infrastructure overhead; it works well from C++; writing a .NET wrapper is pretty trivial. The problem with CORBA is that it's somewhat complicated to use correctly in C++ (people will spend a lot of time chasing memory leaks, esp. inexperienced C++ devs). It also will be a learning curve for every developer and every new developer, as no one expects people to "know" CORBA nowadays. Also, it might not perform as well as we'd like it to, and as far as I know there's no readily available realtime support.
Thrift - Still looks half-baked to use in our scenario.
ICE (from ZeroC) - I would prefer ICE over CORBA anytime; after all, it promises to be a "better CORBA" and I think it does deliver on that. However, their licensing policy is very suboptimal, as they do not sell development licenses but only license per installation. (Well, that's what they told us the last time we asked, end of 2009.) Their licensing policy also suggests that any 3rd party possibly interested in interfacing with our modules would first have to negotiate a license contract with ZeroC too.
Open MPI - The Message Passing Interface seems to be targeted at scenarios with lots of "heavily" distributed clients. Doesn't seem to fit our problem.
Writing our own communication layer using TCP/UDP - Oh my. I'd rather not :-)
Google Protocol Buffers - Is not an RPC technology.
Distributed Shared Memory - Well. This got thrown in by a few devs, and I for one am neither sure whether there's a working implementation nor whether it fits our problem.
So again the QUESTION - what "RPC"-like technology would you prefer in this situation and why?
I can elaborate on Johnny's answer. CORBA provides a robust infrastructure with services that go far beyond simple RPC. As your distributed application grows, you can use CORBA features to manage the mapping between interface and implementation, to provide secure connections, etc. As an RPC, CORBA provides the means for easy synchronous or asynchronous invocations.
The learning curve isn't that steep either. While some of the terms are a little arcane, the concepts such as managed (counted) references should be familiar to today's C++ programmers. And when the C++0x mapping is available, it will be even easier. Training is available to help make this transition even easier.
You mentioned not knowing about realtime support. In fact, CORBA for C++ has rich RT support. There is an RT CORBA specification and several C++ ORBs that implement it. TAO, which is open source and commercially supported, has extensive RT support, including the RT_ORB, the RT_POA, and a TAO-specific RT Event Service. With these tools you are able to designate priority levels for threads in the ORB, and have separate communication channels for different priority levels.
I'd suggest taking a look at Thrift. While it looks half-baked, I believe it's only the documentation that's lacking - the implementation is quite solid.
CORBA should perform well, and there are people with experience. We realize that the IDL-to-C++ mapping is hard to use; there is an RFP from the OMG asking for a new IDL-to-C++0x mapping that should make it much easier to use.

Where would I go to learn to write code that had to be very, very secure but DOES expose external services (running on a standard Windows or Linux OS)

Where would I go to learn to write code that had to be very, very secure and that DOES expose external services (running on a standard Windows or Linux OS)? Knowing what services can and cannot be safely exposed would be part of the issue. Note that I am not looking for a favorite choice between Linux and Windows, as the choice is not likely to be mine to make in any given case. However, the level of security needs to be military grade.
I almost feel embarrassed giving this as a for-instance, but how would I know whether or not I could use, say, WCF, in such a setting?
High security is a difficult concept as it generally involves way more than just the code you wrote.
Basically every layer of the OSI model has to be taken into consideration: things like preventing capture or rerouting of the data stream between the end points (quantum cryptography).
At the higher levels, you have things like:
Physical security of the devices (all endpoints if possible).
Hardening the OS (e.g. closing ports, turning off unused services, using Kerberos, VPN tunnels, and leveraging white lists of machines allowed to connect, etc.);
Encrypting the data at rest (file encryption), in transmission (SSL; see the sketch after this list), and in memory (column/table encryption).
Ensuring and enforcing proper authentication and authorization at every level (in app, in SQL, etc.).
Log EVERYTHING. At a minimum, it should answer "who/what/when/where/how".
Along with the logging, actively monitor it, a.k.a. intrusion detection.
Then we can move on to other attack vectors like SQL injection, XSS, internal / disgruntled employees, etc.
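As a minimal sketch of the "encrypt data in transmission" point above, assuming Python's standard ssl module; the host name, port and request are placeholders:

    # Minimal sketch of "encrypt data in transit": wrap a TCP connection in TLS
    # using Python's standard ssl module. Host name and port are placeholders;
    # certificate verification is left at the (strict) defaults on purpose.
    import socket
    import ssl

    context = ssl.create_default_context()            # verifies server cert + hostname
    with socket.create_connection(("internal-service.example", 8443)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname="internal-service.example") as tls:
            print("negotiated:", tls.version(), tls.cipher())
            tls.sendall(b"GET /health HTTP/1.0\r\nHost: internal-service.example\r\n\r\n")
            print(tls.recv(4096))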
And once you've done all of that be prepared when a hacker gets away with everything they want simply by social engineering.
In short, the best tack to take in order to secure any computer-related application is to listen to the ethos of Fox Mulder, and Trust No One. Another favorite of mine that applies is: It's only paranoia if they aren't after you.
You could use formal methods to (sort of) prove the critical parts of your software. A tool like Frama-C (free, LGPL license, targeting embedded systems) could be relevant (at least if your software is critical, embedded, and written in C).
But "military grade" doesn't mean much. Your client will (and should) define exactly the standards to respect. For instance, critical [civilian] aircraft software needs to follow something like DO-178C (or its predecessor, DO-178B). Different industries have different standards similar to that (both the railway and medical industries have their own standards, which might differ between North America and Europe).
If your system (and client) is less demanding (i.e. no billion dollars or hundreds of lives threatened by bugs), you could consider customizing your compiler or using some other tool. For example, GCC is customizable through plugins or through MELT extensions.
Don't forget that software reliability has a big price (that means a big cost for you, hence for your client).
Well, the question of where can be answered simply: not in school. I suggest creating a learning path for yourself. Pick a technology that you like and learn it inside out. A basic book to get you started should suffice; the rest you learn as you go, or via the documentation of that technology.
For instance, learning under .NET (Microsoft) involves a basic Apress textbook (I suggest Pro C# and the .NET 4.0 Platform). Thereafter, searching through the .NET Framework Reference on MSDN will give you the rest.
If you are looking for WCF reference, I suggest the (MCTS Exam 70-503, Microsoft .NET Framework 3.5 Windows Communication Foundation) and MSDN.
Just keep in mind that not a single technology will achieve what you are looking for. For example: WCF co-mingles with WF (Windows Workflow Foundation), as well as SQL Data Services and Entity Framework. Being exposed to multiple technologies will definitely broaden your vision.
===============================================================================
WCF is a beast in this regard. Here are the advantages over some other means of communication:
Messages (data) passed between end points can be secured via message-level security (encryption). The transport channel chosen can also be secured at protocol level via transport layer security (encryption).
End points themselves can authorize and impersonate clients (client-level security). You can implement end-to-end service tracing, health monitoring & performance counters, message logging, as well as forward and backward compatibility with newer/older clients (via graceful degradation of the message format, provided in WCF). If you choose to do so, you can even implement routing as a fail-safe for your communications channel. WCF also supports transactions (ACID), concurrency, as well as per-instance throttling, giving you the most flexibility in writing secure/robust military-grade code.
In retrospect, the security and flexibility of WCF are astonishing. A similar technology (if not the same) is the WS-Security spec. It is part of the WS-* specifications for web services and deals with XML signature and XML encryption to provide a secure communications channel between two end points.
The disadvantage of WS-*, however, is that it is a one-way means of communication, whereas WCF can facilitate 2-way communication. A client can send a request to a server, but a server can also send requests to the client. WS-* dictates that a client can only send requests to and receive responses from the server, but not vice versa.
I am not a WCF developer, so I thought the highlights might provoke you into doing your own research. "There are hundreds of ways to skin an animal, neither of them is wrong..."

How do CPG of Corosync, ZeroMQ, and Spread compare for messaging?

I'm interested in:
Performance
Latency
Throughput
Resource usage (CPU, memory, ...)
High availability
No single point of failure
Features
Transport options
Routing options
Stability
Community
Active development
Widely used
Helpful mailing list, forum, IRC channel, ...
Ease of integration with my current codebase
Gotchas maybe
Any other thing you think I omitted
I've read about them, but I couldn't find a good comparison. I'm especially interested in performance benchmarks comparing them. (Maybe I should do one on my own! I hope not.)
Well, I haven't used the other two, but I can share my experiences with ZeroMQ. In my opinion, it excels at all of the points you list.
Speed and throughput
It's as fast as TCP and doesn't use much CPU or a lot of memory. It can push A LOT of messages very quickly without breaking a sweat. It will saturate your network channel way before you run out of memory (I doubt you'll ever be able to max out the CPU). There was a comparison to RabbitMQ somewhere, and ZMQ outperforms it by a factor of 2. From things I've read around the web, it's in use in high-speed trading.
RabbitMQ is also a very good tool. Have a look at it - it might be a good fit for what you are looking for.
SPOF
If you design your application properly, then you can have no single point of failure. It's very easy to connect one socket to two others, so if one of them fails, the other is there to handle the work (see the sketch below). There are things like high water marks to help you along the way. Read the ZeroMQ Guide to learn how to design your app without a SPOF.
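A minimal sketch of that idea, assuming pyzmq; the addresses are placeholders, and real fail-over needs the retry patterns from the Guide:

    # Sketch (pyzmq): one REQ socket connected to two interchangeable REP servers.
    # ZeroMQ distributes requests over the connected peers; a dead endpoint is
    # simply retried in the background. For real fail-over (detecting a peer
    # dying mid-request) see the "Lazy Pirate" pattern in the Guide.
    import zmq

    ctx = zmq.Context.instance()
    req = ctx.socket(zmq.REQ)
    req.connect("tcp://server-a.example:5555")   # placeholder addresses
    req.connect("tcp://server-b.example:5555")

    req.send(b"work item 1")
    print(req.recv())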
Transports and routing
Regarding transport options (if I'm understanding this correctly) - it's up to you to define your protocol. ZeroMQ basically promises you that it will deliver this blob of data to the other end. Use JSON, Protocol buffers, Morse code, whatever you like.
There is no built-in routing like there is in AMQP. Again, it is up to you to specify which ZeroMQ socket connects to which, but this is very easy.
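For example (assuming pyzmq), an application-level "protocol" can be as simple as agreeing that every message is a JSON dict:

    # ZeroMQ only moves blobs; the "protocol" is whatever both sides agree on.
    # Here the application-level message is a JSON dict (pyzmq provides the
    # send_json / recv_json convenience helpers around plain send / recv).
    import zmq

    ctx = zmq.Context.instance()

    sender = ctx.socket(zmq.PUSH)
    sender.bind("tcp://127.0.0.1:5560")

    receiver = ctx.socket(zmq.PULL)
    receiver.connect("tcp://127.0.0.1:5560")

    sender.send_json({"channel": "pressure-01", "value": 1013.2, "unit": "hPa"})
    print(receiver.recv_json())   # -> {'channel': 'pressure-01', 'value': 1013.2, 'unit': 'hPa'}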
Stability
I've been developing with it for a few months (using Python) and haven't found a single issue with its stability. Even when I try to use it the wrong way, it just throws a nice error telling me not to do that. Even restarting/killing some of the services and bringing them back up doesn't cause any problems. I'd say it is a very stable piece of software.
As a note: always use the latest version - the 2.1 version is very much stability oriented, so many stability issues are resolved in it.
Community
Bindings for more than 20 languages, active mailing list, very good documentation, frequent releases. Anything else?
Integration
Because it's designed as a library, it's up to you to design your application (unlike the case with a framework), and it pretty much stays out of your way. It feels a bit like a normal TCP socket, but much more powerful and easier to use (it guarantees that a message will be delivered as a whole, not just the first 128 bytes now and the rest later, as is the case with regular sockets).
Gotchas
There are some, but they are all documented in the guide. (For example: you might miss the first few messages from a PUB socket when you connect a SUB socket to it. There is an explanation of this in the guide and a recipe for how to handle it; see the sketch below.)
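A minimal sketch of that particular gotcha, assuming pyzmq; the sleep is the crude workaround, and the Guide's synchronization recipe is the proper one:

    # The "slow joiner" gotcha: a PUB socket drops messages sent before a SUB's
    # subscription has actually reached it. The sleep below is the crude fix;
    # the Guide shows a proper REQ/REP synchronisation recipe for real code.
    import time
    import zmq

    ctx = zmq.Context.instance()

    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://127.0.0.1:5561")

    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://127.0.0.1:5561")
    sub.setsockopt(zmq.SUBSCRIBE, b"")   # subscribe to everything

    time.sleep(0.5)                      # give the subscription time to propagate
    pub.send(b"now this one arrives")
    print(sub.recv())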
Overall
I find this one of the best designed pieces of software - stable, well written, well documented and doesn't stand in my way.
I recommend reading the guide end to end. It's well written, has examples in a lot of languages (including C++), and describes a lot of edge cases and pain points.

Resources