The Application Split Challenge - fast+easy RPC technology? - windows

The following tries to get an idea of which technologies would be suitable for a specific (as outlined) distributed/RPC problem. If something is not clear, I am very happy to add more details, but please request these in a comment and not in an "answer". Thanks.
First I will describe the current situation, and then follows what we want to achieve and the actual question. Despite this being a rather long post to get some context, the question itself is rather short (see at the end).
The Application Split challenge
Application description:
The app allows the user to configure a number of hardware devices(*)
and then communicate with these to control and collect measurement
channels of a physical experiment.
(*) Hardware devices include temperature sensors, pressure sensors,
motors, ... Communication ranges from serial port communication,
TCP/UDP communication to interfacing with the drivers of 3rd party
plugin cards.
Control involves sending commands to the various hardware devices
to configure them according to the protocols they support.
Measuring involves getting the data from (some of) these devices.
We are hard pressed to keep the whole thing running as customers
demand more and more channels at higher sample rates and we have to
keep up with writing the data+timestamps we get from all devices to
disk, display a subset of the data and still keep the system
responding properly.
Current situation:
[ DisplayAndControl.exe ]
   ||               /\
   || DLL Interface ||
   ||               || Window Messages (SendMessage, PostMessage)
   ||               ||
   \/               ||
[ ChannelManager.dll ]
ChannelManager.dll (Native C++ DLL on Windows)
Manages n data channels (physical measurement variables)
Each channel holds a shifting arbitrary number of samples with
high-precision timestamps
Allows to group channels and write their ongoing updates or
historical values ("measurement") to disk
Calculations with channels (arithmetic, integration, mean
values, etc.)
Interfaces with (realtime) hardware devices to get the timestamps
and values of channels
Get value+timestamp from hardware and save in internal
ring buffer for channel
DisplayAndControl.exe (Native C++ MFC App on Windows)
Control the functions of ChannelManager.dll (configure channels
and HW devices)
Live display current values/timestamps/changes of all channels
Graph values of (groups of) channels in diagrams
print diagrams and tables of channel values
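The per-channel ring buffer described for ChannelManager.dll can be sketched as follows. This is only an illustration in Python, not the actual native C++ implementation; the class name `ChannelBuffer`, the nanosecond timestamp unit, and the `since` query are assumptions made for the example.

```python
from collections import deque
from typing import Deque, List, Tuple

Sample = Tuple[int, float]  # (timestamp in nanoseconds, measured value)


class ChannelBuffer:
    """Fixed-capacity ring buffer holding the newest samples of one channel."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) silently drops the oldest item once it is full.
        self._samples: Deque[Sample] = deque(maxlen=capacity)

    def push(self, timestamp_ns: int, value: float) -> None:
        """Store one sample as delivered by a (hypothetical) device driver."""
        self._samples.append((timestamp_ns, value))

    def since(self, timestamp_ns: int) -> List[Sample]:
        """Return all buffered samples at or after timestamp_ns, oldest first."""
        return [s for s in self._samples if s[0] >= timestamp_ns]

    def __len__(self) -> int:
        return len(self._samples)
```

A display client would typically remember the timestamp of the last sample it received and ask only for newer ones, which keeps the traffic between the two modules proportional to the update rate rather than to the buffer size.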
Summary of current situation:
The application as it is at the moment is already somewhat modular
in that the (main) executable does the display+interaction and the
(one of several) DLL does the data management (saving of live data
to disk, communication with devices, etc.)
From a performance POV, communication btw. the display module and
the data management module is optimally performant at the moment.
New situation:
[ DisplayAndControl.exe ]
   ||                 /\
   || ? RPC/Messaging ||
   ||                 || ? RPC/Messaging
   ||                 ||
   \/                 ||
[ ChannelManager.exe (same PC or another) ]
Summary of the envisioned new situation:
For usability, performance and safety reasons, we wish to split up
this Windows app into two separate applications, so that the
performance (and safety) sensitive ChannelManager module can run as
a separate process possibly on a separate Windows PC.
Additionally, since we're already going to split this, we will
allow for multiple DisplayAndControl.exe apps connected to one
single ChannelManager.exe.
One QUESTION now is what technology we should use to facilitate the
communication btw. the now two (or, rather, 1 : small_n) applications.
Performance is important, because a lot of data travels btw. the
two applications and latency should be kept to a minimum. It "only"
needs to work on Windows, but it should be usable from native C++
only which makes all purely .NET based technologies unattractive.
(Note: Porting parts of DisplayAndControl.exe to .NET/WPF is
planned, but ChannelManager.exe should stay pure native, as we
don't want any .NET stuff running inside this process.)
Regarding latency: It is important that we achieve some level of
soft-realtime in the sense that small latency is acceptable, but
large and especially varying latency is not acceptable for usability
and safety reasons. Therefore any protocol that would help in
getting some sort of (soft) realtime behavior would be preferred.
RPC technologies we've looked at:
WCF (or .NET remoting) - Is dotnet only, therefore not
attractive. Performance figures are also not very good.
(D)COM - COM is great for Windows RPC communication, but it
breaks down once you have to have inter-PC comm because it is
horrible to get the security settings working in a corporate IT
network.
CORBA - We have had good experience with CORBA communications in
the past. The communication is easy to get working; there's not
much infrastructure overhead; it works well from C++; writing
a .NET wrapper is pretty trivial. The problem with CORBA is that
it's somewhat complicated to use correctly in C++ (people will use
a lot of time on chasing memory leaks, esp. inexperienced C++
devs). It also will be a learning curve for every developer and
every new developer, as no one expects people to "know" CORBA
nowadays. Also, it might not perform as well as we'd like it to and
as far as I know there's no readily available realtime support.
Thrift - still looks half-baked to use in our scenario.
ICE (from ZeroC) - I would prefer ICE over CORBA anytime, after all
it promises to be a "better CORBA" and I think it does deliver on
that. However, their licensing policy is very suboptimal as they do not sell development licenses but only license per
installation. (Well that's what they told us last time we asked end of 2009.) Their licensing policy also suggests that any 3rd party possibly interested in interfacing with our modules would first have to negotiate a license contract with ZeroC too.
Open MPI - Message Passing interface seems to be targeted at
scenarios with lots of clients "heavily" distributed. Doesn't seem
to fit our problem.
Writing our own communication layer using TCP/UDP - Oh my. I'd
rather not :-)
Google Protocol Buffers - Is not an RPC technology.
Distributed Shared Memory - Well. This got thrown in by a few
devs and I for one am neither sure if there's a working
implementation nor if it fits our problem.
So again the QUESTION - what "RPC"-like technology would you prefer
in this situation and why?
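Whichever technology is chosen, the channel samples have to be turned into bytes at some point. The sketch below shows roughly what that step looks like when done by hand; an IDL compiler (CORBA, ICE, Thrift) or Protocol Buffers would generate the equivalent code. The wire layout, the field widths, and the function names are assumptions chosen for this example only.

```python
import struct
from typing import List, Tuple

# Hypothetical wire layout (little-endian), chosen only for this sketch:
#   uint32 channel_id, uint32 sample_count,
#   then sample_count x (int64 timestamp_ns, float64 value)
_HEADER = struct.Struct("<II")
_SAMPLE = struct.Struct("<qd")


def pack_batch(channel_id: int, samples: List[Tuple[int, float]]) -> bytes:
    """Serialize one batch of (timestamp, value) samples for a channel."""
    parts = [_HEADER.pack(channel_id, len(samples))]
    parts.extend(_SAMPLE.pack(ts, value) for ts, value in samples)
    return b"".join(parts)


def unpack_batch(payload: bytes) -> Tuple[int, List[Tuple[int, float]]]:
    """Inverse of pack_batch; rejects payloads whose size does not match."""
    channel_id, count = _HEADER.unpack_from(payload, 0)
    expected = _HEADER.size + count * _SAMPLE.size
    if len(payload) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(payload)}")
    samples = [
        _SAMPLE.unpack_from(payload, _HEADER.size + i * _SAMPLE.size)
        for i in range(count)
    ]
    return channel_id, samples
```

Sending samples in batches like this, rather than one remote call per sample, is one common way to keep per-message overhead from dominating at high sample rates.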

I can elaborate on Johnny's answer. CORBA provides a robust infrastructure with services that go far beyond simple RPC. As your distributed application grows, you can use CORBA features to manage the mapping between interface and implementation, to provide secure connections, etc. As an RPC, CORBA provides the means for easy synchronous or asynchronous invocations.
The learning curve isn't that steep either. While some of the terms are a little arcane, the concepts such as managed (counted) references should be familiar to today's C++ programmers. And when the C++0x mapping is available, it will be even easier. Training is available to help make this transition even easier.
You mentioned not knowing about realtime support. In fact, CORBA for C++ has rich RT support. There is an RT CORBA specification and several C++ ORBs that implement it. TAO, which is open source and commercially supported, has extensive RT support, including the RT_ORB, RT_POA, and a TAO-specific RT Event Service. With these tools you are able to designate priority levels for threads in the ORB, and have separate communication channels for different priority levels.

I'd suggest taking a look at Thrift. While it looks half-baked, I believe it's only the documentation that's lacking - the implementation is quite solid.

CORBA should perform well, and there are people with experience. We realize that the IDL to C++ mapping is hard to use; there is an RFP from the OMG asking for a new IDL to C++0x mapping, which should make it much easier to use.

Related

Is my case apt for using ZeroMQ?

I'm trying to implement a communication system among a variety of devices connected through WiFi.
A Desktop ( Mac / Win / Linux ) serves as a server, whereas mobile phones ( Android / iPhone / Blackberry ), say 50 in number, will be clients.
There should be a client-server as well as client-client 2-way communication.
In client-server communication, I need to access a database in the server.
While surfing about this, I came across ZeroMQ as a high-performance asynchronous messaging library and a better solution for complex Distributed communication system.
Note:
Yeah, I am completely new to communication and networks, but I am trying to learn. ( Guess that fact is well reflected in the clarity of the question :P )
EDIT:
If ZeroMQ does not seem to be a good option, please suggest some other means of achieving this.
Yes, ZeroMQ is a great and powerful tool
This does not mean it is the best tool to use for any particular project.
Many other factors matter more than the built-in code and service archetypes:
Project's potential for creeping scope, moving sands in diversity of target devices, respective O/S versions, patches, EoL-maintenance/unsupported orphanages
Project plan / vs the Team's already accrued { ZeroMQ and other-tools } craftsmanship
Scaling of the services - from 5, 50, 500, 5000+
Service robustness / { service & transaction }-self-healing strategies
Service risks associated with an absence of any version-{ -control- | -enforcement- }-policy in loosely coupled or even un-controlled domain
Service risks from (non-)-{ -stable | -available } language bindings or wrapper mediators.
One will always learn a lot, once opening the ZeroMQ perspective
There are many points of view, that will help one to better design even non-distributed services. A Zero-copy design rule, a Zero-sharing for performance targets, (almost) Zero-latency, (almost) Zero-overheads for (almost) linear-scaling -- these are just few principles, one may benefit from, if learning ZeroMQ from its ground-Zero-roots.
The best next step I can point to: feel free to read the ZeroMQ posts here for further reading, and do not miss downloading the great must-read book from Pieter HINTJENS: "Code Connected, Volume 1".
After the ZeroMQ views are understood, Nanomsg or any other available tool may give one some additional views ( and one will then be mature and ready to also assess the risks / costs to be paid on such grounds ).

Replace ZeroMQ's select() on windows

It is unbelievable that ZeroMQ uses select() on Windows. I didn't know that until I had completed my code and started performance testing. They should present this information on their web site in a big red font.
Is there any way to replace ZeroMQ's select()?
IOCP is a proactor model and can't be easily integrated into it. How about WSAEventSelect? It is also a reactor model and has performance close to that of poll.
Another choice for me is http://nanomsg.org/, but it is still alpha.
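For readers unfamiliar with the distinction: in a reactor model (select, poll, WSAEventSelect) the OS reports that a socket is ready and the application then performs the read itself, whereas in a proactor model (IOCP) the application starts the read first and is notified when it has completed. A minimal reactor iteration can be sketched with Python's standard `selectors` module. This is not ZeroMQ's internal code, and the helper name `drain_ready` is made up for the example.

```python
import selectors
import socket
from typing import List


def drain_ready(sel: selectors.BaseSelector, timeout: float) -> List[bytes]:
    """One reactor iteration: wait for readiness, then read each ready socket."""
    received = []
    for key, _events in sel.select(timeout):
        # Readiness only means recv() will not block; the read happens here.
        received.append(key.fileobj.recv(4096))
    return received


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    selector = selectors.DefaultSelector()
    selector.register(receiver, selectors.EVENT_READ)
    sender.sendall(b"ping")
    print(drain_ready(selector, timeout=1.0))
    selector.close()
    sender.close()
    receiver.close()
```

On Windows, `selectors.DefaultSelector` itself falls back to select(), which is the same limitation the question is about.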
One of the main objectives in Zeromq is to provide a consistent API for communication between threads, processes, nodes, and clusters. Protocol specific optimization is outside of this scope because of the ways that it can affect other areas of communication. For example, shared memory would be a better form of IPC, but UNIX domain sockets make a consistent API easier. It would also be nice to know when an endpoint disconnects, but how would you implement such behavior between threads?
Their main goal is to allow every pattern to work the same way regardless of topology, protocol, system, or language, to the point that any mixture can be used regardless of how odd it may seem (node.js Websockets communicating with C# brokers passing messages to Ruby and PHP workers which share work with java threads, etc.)
Each of its features would be enhanced greatly if optimised for each specific protocol and system, but that would also make uniform patterns close to impossible.
BTW, they might accept a patch if you could find a way to implement IOCP while still maintaining this versatility and neutrality.
PPS, nanomsg is made by one of the main original developers of Zeromq. Crossroads.IO is a direct fork of Zeromq, by original Zeromq developers as well and including some developers of nanomsg. If I'm not mistaken, Nano will likely become the core of crossroads when complete.

multi-client inter-process communication on Windows, VB6

What is the best way for multiple client programs to
communicate with a single server program, all running
on a single Windows computer? All written in VB6.
I'd appreciate recommendations of how you might solve
this problem.
NOTE: we are working on transition to .NET, but have to
add a capability to the VB6 version before the .NET will
be ready.
The possibilities include TCP connections, named pipes,
shared memory, messages, files, and more.
A client passes the server a string as input, and the server
combines it with data known only to the server, to generate
another string which is returned to the client. Both strings
are only about 100 characters long. The server is contacted
only when a new file needs to be opened, and so it is a very
low volume of communication... probably a flurry of 10 calls
within 15 seconds, followed by an hour of idle time.
But it is possible that two clients would choose about the
same time to request information. Blocking/Locking are certainly
acceptable, as the server will be done with each request in
well under a second, and several seconds of delay is unimportant
to any of the programs.
The server's algorithm is complex, and for several reasons important
to the application should not be replicated in each helper program.
That is the reason for needing a server.
Background:
I am adding capability to a large existing legacy program.
This single program has several other legacy programs which
act as helpers and are run when the user makes certain
choices. These programs are started with a shell command,
and are not just separate threads. For instance, one helper
loads new data from a DVD drive onto the hard drive. Another
helper just displays a chart of the current positions of
the planets.
This is a LARGE commercial legacy program that happens to be
written in VB6. We are working to convert it and all the
helper programs to .NET, but must first release a new version
under vb6 with this added capability. (Please don't tell me
to not use VB6, as we are already moving elsewhere.)
We need a temporary VB6 solution.
VB6 does TCP and UDP extremely well via the standard Winsock Control component included in Pro and Enterprise Editions. A lot of shadetree coders do seem to struggle with it though. This is probably the most obvious route since the only other native IPC in VB6 would be COM/DCOM and DDE, however MSMQ provided excellent support for VB6 as well.
The downside of IP-based protocols is their limited namespace and resulting high probability of collisions (64K port numbers, many set aside for standard applications, ephemeral port ranges, etc.). They're also somewhat "heavyweight" but considering the vast resources of even the oldest PCs still in service and your light traffic requirements you can ignore that in deciding.
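The request/response exchange from the question, and one way around the port-collision concern, can be sketched as below. The sketch is in Python rather than VB6 and is only meant to show the shape of the protocol: one newline-terminated string in, one out, over the loopback interface. Binding to port 0 lets the OS pick a free port; how clients learn that port (a file, a registry key) is left open here. `SERVER_SECRET` and the function names are hypothetical.

```python
import socket
import threading

SERVER_SECRET = "server-only-data"  # hypothetical stand-in for the private data


def serve_one(listener: socket.socket) -> None:
    """Accept a single client, read one line, answer with one line."""
    with listener:
        conn, _addr = listener.accept()
    with conn, conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
        request = stream.readline().rstrip("\n")
        stream.write(f"{request}|{SERVER_SECRET}\n")
        stream.flush()


def ask(port: int, request: str) -> str:
    """Client side: send one request line and return the reply line."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        with conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
            stream.write(request + "\n")
            stream.flush()
            return stream.readline().rstrip("\n")


def start_server() -> int:
    """Start a one-shot server thread and return the port the OS assigned."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    threading.Thread(target=serve_one, args=(listener,), daemon=True).start()
    return listener.getsockname()[1]
```

A real server would loop over `accept()` instead of handling a single client; with the stated load (a flurry of 10 calls, then an hour of idle time) serving clients one after another would likely be sufficient.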
Another option you've considered is Named Pipes.
This offers a number of advantages in your situation. For one thing the namespace is much larger requiring only a unique name, which in the post-Win9x era can be up to 256 characters long making uniqueness fairly easy to achieve. For another, as long as your firewalls permit "File and Print Sharing" you're all set on that front.
Also, for your application you only seem to require an RPC-style mechanism rather than arbitrary bidirectional streams or messages. TransactNamedPipe() calls in your clients might be ideal. Named Pipes work over a LAN, but within one PC they are quite fast and light weight.
While VB6 doesn't come with a Named Pipe component such a thing is fairly easy to create as long as extremely high performance isn't required. You can use Timer-based polling in the server instead of trying to implement overlapped I/O to get asynchronicity. I put one together a couple of years ago and have had good luck with this approach.
I published a fairly stable rendition of this a while back at PipeRPC - RPC Over Named Pipes. There is an older and a somewhat newer version there with examples of use and documentation. As designed, clients make "calls" passing a Byte array of request parameters and receiving back a Byte array of response results. You can also shove Unicode Strings through with no changes, letting the compiler coerce the types.
Just one "drop in" UserControl for both clients and servers.
Looking back at this question:
The server's algorithm is complex, and for several reasons important
to the application should not be replicated in each helper program.
That is the reason for needing a server.
If that's really the concern why not just create a shared DLL that all programs use?
For a one-off upgrade release to an existing VB6 application being moved to a newer platform, I would stress keeping the modification as simple and straightforward as possible. As a result, I wouldn't go down any routes involving shared memory or anything relatively unusual.
A few options, none perfectly simple, but at least some ideas:
Expose a COM object in the server code that performs the translation, and can be consumed by the client apps. The clients instantiate the object from the server as an out-of-process object, and let COM handle all the marshalling, etc.
Does the server have any network awareness? VB6 doesn't do sockets/tcp natively very well, but if you've had a reason to add that in, you might be able to leverage it to perform a socket-based connection and data exchange.
The server and client could each poll a common resource folder for the presence of a specific file that constituted inbound/outbound requests for the translation service you describe. Not very elegant, but it might be the simplest.
Just a few ideas to give you some things to think about. Hope that's helpful in some way. Good luck!

Where would I go to learn to write code that has to be very, very secure but DOES expose external services (running on a standard Windows or Linux OS)

Where would I go to learn to write code that has to be very, very secure and that DOES expose external services (running on a standard Windows or Linux OS)? Knowing what services can and cannot be safely exposed would be part of the issue. Note that I am not looking for a favorite choice between Linux and Windows, as the choice is not likely to be mine to make in any given case. However, the level of security needs to be military grade.
I almost feel embarrassed giving this as a for instance, but how would I know whether or not I could use, say, WCF, in such a setting?
High security is a difficult concept as it generally involves way more than just the code you wrote.
Basically every layer of the OSI model has to be taken into consideration. Things like, preventing capture of the data stream (or it being rerouted) between the end points (quantum cryptography).
At the higher levels, you have things like:
Physical security of the devices (all endpoints if possible).
Hardening the OS (e.g: closing ports, turning off unused services, using kerberos, VPN tunnels, and leveraging white lists of machines allowed to connect, etc);
Encrypting the data at rest (file encryption), in transmission (SSL/TLS), and in the database (column/table encryption).
Ensuring and enforcing proper authentication and authorization at every level (in app, in sql, etc).
Log EVERYTHING. At a minimum it should answer "who/what/when/where/how".
Along with the logging, Actively Monitor it. aka: intrusion detection.
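As an illustration of the "who/what/when/where/how" logging point, an audit entry could be emitted as one JSON object per line so that monitoring tools can parse it. This is a minimal sketch using Python's standard library; the field names mirror the five questions, and the logger name `audit` and the function names are assumptions, not a prescribed format.

```python
import json
import logging
from datetime import datetime, timezone


def audit_record(who: str, what: str, where: str, how: str) -> str:
    """Build one JSON audit line; 'when' is stamped here in UTC."""
    return json.dumps(
        {
            "when": datetime.now(timezone.utc).isoformat(),
            "who": who,
            "what": what,
            "where": where,
            "how": how,
        },
        sort_keys=True,
    )


audit_log = logging.getLogger("audit")  # hypothetical logger name


def log_access(who: str, what: str, where: str, how: str) -> None:
    """Write one audit entry through the standard logging machinery."""
    audit_log.info(audit_record(who, what, where, how))
```

Where the log is shipped, and who is allowed to read or alter it, matters at least as much as the format.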
Then we can move on to other things like looking at other attack vectors like sql injection, xss, internal / disgruntled employees, etc.
And once you've done all of that be prepared when a hacker gets away with everything they want simply by social engineering.
In short, the best tack to take in order to secure any computer related application is to listen to the ethos of Fox Mulder, and Trust No One. Another favorite of mine that applies is: It's only paranoia if they aren't after you.
You could use formal methods to (sort-of) prove the critical parts of your software. A tool like Frama-C (free, LGPL license, targeting embedded systems) could be relevant (at least if your software is critical, embedded, written in C).
But "military grade" doesn't mean much. Your client will (and should) define exactly the standards to respect. For instance, critical [civilian] aircraft software needs to follow something like DO-178C (or its predecessor, DO-178B). Different industries have different standards similar to that. (Both the railway and medical industries have their own standards, which might be different in North America than in Europe.)
If your system (& client) is less demanding (i.e. no billions of dollars or hundreds of lives threatened by bugs) you could consider customizing your compiler or using some other tool. For example, GCC is customizable through plugins or through MELT extensions.
Don't forget that software reliability has a big price (that means a big cost for you, hence for your client).
Well, the question of where can be answered simply. Not in school. I suggest to create a learning path for yourself. Pick a technology that you like and learn it inside out. A basic book to get you started should suffice, however the rest of the stuff you learn as you go, or via the documentation of that technology.
For instance - learning under .NET (Microsoft) involves a basic A-Press text-book (I suggest Pro C# and The .NET 4.0 Platform). Thereafter searching through the .NET Framework Reference on MSDN will give you the rest.
If you are looking for WCF reference, I suggest the (MCTS Exam 70-503, Microsoft .NET Framework 3.5 Windows Communication Foundation) and MSDN.
Just keep in mind that not a single technology will achieve what you are looking for. For example: WCF co-mingles with WF (Windows Workflow Foundation), as well as SQL Data Services and Entity Framework. Being exposed to multiple technologies will definitely broaden your vision.
===============================================================================
WCF is a beast in this regard. Here are the advantages over some other means of communication:
Messages (data) passed between end points can be secured via message-level security (encryption). The transport channel chosen can also be secured at protocol level via transport layer security (encryption).
End points themselves can authorize and impersonate clients (client level security). You can implement end-to-end service tracing, health monitoring & performance counters, message logging, as well as forward and backward compatibility with newer/older clients (via graceful degradation of the message format, provided in WCF). If you choose to do so, you can even implement routing as a fail-safe for your communications channel. WCF also supports transactions (ACID), concurrency, as well as per-instance throttling, giving you the most flexibility in writing secure/robust military grade code.
In retrospect the security and flexibility of WCF are astonishing. A similar technology (if not the same) is the WS-Security spec. It is part of the WS-* specifications for web services and deals with XML signature and XML encryption to provide a secure communications channel between two end points.
The disadvantage of WS-*, however, is that it is a one-way means of communication. WCF can facilitate 2 way communication. A client can send a request to a server, but also a server can send requests to the client. WS-* dictates that a client can only send requests to and receive responses from the server, but not vice versa.
I am not a WCF developer, so I thought the highlights might provoke you into doing your own research. "There are hundreds of ways to skin an animal, none of them is wrong..."

How do CPG of Corosync, ZeroMQ, and Spread compare for messaging?

I'm interested in:
Performance
Latency
Throughput
Resource usage (CPU, memory, ...)
High availability
No single point of failure
Features
Transport options
Routing options
Stability
Community
Active development
Widely used
Helpful mailing list, forum, IRC channel, ...
Ease of integration with my current codebase
Gotchas maybe
Any other thing you think I omitted
I've read about them, but I couldn't find a good comparison. In particular, I'm interested in performance benchmarks comparing them. (Maybe I should do one on my own! I hope not.)
Well, I haven't used the other two, but can share my experiences with ZeroMQ. In my opinion, it excels at all of your criteria.
Speed and throughput
It's as fast as TCP, and uses little CPU and not a lot of memory. It can push A LOT of messages very quickly without breaking a sweat. It will saturate your network channel way before you run out of memory (I doubt you'll ever be able to max-out the CPU). There was a comparison to RabbitMQ somewhere and ZMQ outperforms it by a factor of 2. From things I've read around the web it's in use in high speed trading.
RabbitMQ is also a very good tool. Have a look at it - it might be a good fit for what you are looking for.
SPOF
If you design your application properly, then you can have no single point of failure. It's very easy to connect two sockets to another one. So if one of them fails - the other is there to handle the work. There are things like High water marks to help you along the way. Read the ZeroMQ Guide to learn how to design your app without a SPOF.
Transports and routing
Regarding transport options (if I'm understanding this correctly) - it's up to you to define your protocol. ZeroMQ basically promises you that it will deliver this blob of data to the other end. Use JSON, Protocol buffers, Morse code, whatever you like.
There is no built-in routing like there is in AMQP. Again, it's up to you to specify which ZeroMQ socket connects to which, but this is very easy.
Stability
I've been developing with it for a few months (using Python) and haven't found a single issue with its stability. Even when I try to use it the wrong way it just throws a nice error telling me not to do that. Even restarting/killing some of the services and bringing them back up doesn't cause any problems. I'd say it's a very stable piece of software.
As a note: always use the latest version - the 2.1 version is very much stability oriented, so many stability issues are resolved in it.
Community
Bindings for more than 20 languages, active mailing list, very good documentation, frequent releases. Anything else?
Integration
Because it's designed as a library it's up to you to design your application (unlike the case with a framework) and it pretty much stays out of your way. It feels a bit like a normal TCP socket, but much more powerful and easier to use (it guarantees you that a message will be delivered as a whole, not only the first 128 bytes and the rest later as is the case with regular sockets).
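To make the "delivered as a whole" point concrete, this is roughly the bookkeeping a plain TCP socket requires and that ZeroMQ takes care of for you. The sketch uses only Python's standard library; the 4-byte big-endian length prefix and the function names are assumptions for the example, not ZeroMQ's actual wire format.

```python
import socket
import struct

_LENGTH = struct.Struct("!I")  # 4-byte big-endian length prefix (an assumption)


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one message as <length><payload> so the receiver can reassemble it."""
    sock.sendall(_LENGTH.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Keep calling recv() until exactly `size` bytes have arrived."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)  # may return fewer bytes than requested
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Receive one complete message, however TCP happened to split it."""
    (size,) = _LENGTH.unpack(_recv_exact(sock, _LENGTH.size))
    return _recv_exact(sock, size)
```

With a ZeroMQ socket, a single receive call returns a complete message, so none of this loop has to be written or tested by the application.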
Gotchas
There are some, but they are all documented in the guide. (For example: you might miss the first few messages from a PUB socket when you connect (SUB) to it. There is an explanation of this in the guide and a recipe for how to handle it.)
Overall
I find this one of the best designed pieces of software - stable, well written, well documented and doesn't stand in my way.
I recommend reading the guide end-to-end. It's well written, has examples in a lot of languages (including C++), and describes a lot of edge cases and pain points.

Resources