I'm building a distributed application using ZMQ framework that needs to assure the integrity of the packages exchanged. My question is whether or not do I need to perform integrity checks on the client and server on the application layer.
I have implemented a checksum approach using MD5 hash in both client's and server's side. However, I suspect that this might be redundant since zmq might be already handling integrity checks in the background. I have read ZMQ - The guide and found scarce information on this matter rather than small references that indicate that zmq already does integrity checks:
It delivers whole messages exactly as they were sent, using a simple
framing on the wire. If you write a 10k message, you will receive a
10k message.
I also searched in forums, including SO and couldn't found any solid reference that could confirm the reference. I would appreciate if someone could confirm it and ideally include a useful source.
EDIT
I am looking for answers other than "trust the docs" or "implement checksums" for two reasons:
I think that there need to be clear and easy-to-find references to what seems to be one of the key selling points of ZMQ.
The system under design must be fast, thus not wasting time in redundant ops.
Looking at the documentation i read it as the entire purpose of ZeroMO is reliable transmission.
I'd say that it depends on what you're worried about most.
ZMQ is built on top of tcp and other protocols, and these in turn rely on underlying things like IP, Ethernet. When you start getting down towards these physical layers of a network stack, there's integrity checking built in to provide reliable services to the layers above (include application libraries like ZMQ). So, in ordinary circumstances, one would not need to put in your own integrity checks because that's already taken care of for you. ZMQ does not do anything extra so far as I know - it simply assumes the underlying network stack always delivers bytes properly, intact.
However, such underlying integrity checks do not guarantee to eliminate all bit errors, they just get the bit error rate up to some very good level where most applications don't care (e.g. 1 in 10^12) and probably never, ever experience a problem. Adding in a supplementary checksum is going to push the net b.e.r. to even more very safe levels.
If you're worried about some active attack by a malicious third party against the zmtp protocol itself, then you may wish to introduce your own integrity check, likely cryptographic in nature, with all that entails. This might involve using libsodium along with ZeroMQ. That certainly was a thing, and probably still is unless I'm out of date and have missed a deprecation notice.
Summary
I'd say:
Ordinary app, runtime of days, weeks - nothing extra needed
Very long running app (years) and bit errors are utterly unacceptable (e.g. a safety critical application) - add a checksum
Needs to operate in a hostile environment where protocol attacks may occur - add a strong encryption layer like libsodium.
The ZeroMQ (ZMQ) messaging library does not require integrity checks in the application layer. ZMQ provides a set of low-level communication protocols that enable applications to exchange messages with each other in a fast and efficient manner. ZMQ does not include any built-in mechanisms for integrity checks, such as error correction or checksum calculations, at the application layer.
However, this does not mean that applications built on top of ZMQ do not need integrity checks. Depending on the specific requirements and goals of the application, it may be necessary to implement integrity checks in the application layer to ensure the correctness and reliability of the data being exchanged. For example, an application may want to include checksum calculations or error correction mechanisms in order to detect and correct errors in the transmitted data.
In general, the decision to include integrity checks in an application built on top of ZMQ will depend on the specific requirements and goals of the application, as well as the trade-offs between performance, reliability, and complexity. It is up to the developers of the application to determine whether integrity checks are necessary, and to implement appropriate mechanisms to ensure the integrity of the data being exchanged.
Related
If I want to write a node for a P2P application (like Bitcoin, Bitorrent, etc.) there are a lot of parts that are the same:
I need to bootstrap to the network (discover other peers)
I need to manage a list of peers, and monitor their states
I need to retrieve lists of more peers from my neighbour peers
Etc, etc.
Since I don't want to re-invent the wheel, is their a framework that I could as a sort of base library to build on?
You mention both bitcoin and bittorrent, which are quite different, so I'm assuming you don't want to be bound to any specific protocol or even serialization format.
And yet you mention peer-discovery and stats-management which are high-level concerns, be built on top of some network protocol.
But the protocol dictates how such a mechanism would work.
It sortof sounds like you're asking if there are pre-built roofs that would fit on skyscrapers just as well as on a wood cabin.
So if you actually want to design your own protocol you probably should look more at the foundation first.
which language do you want to use
what IO / event processing libraries are available
what protocol parsers and serializers are available
do you aim for throughput? low memory footprint? low latency? minimal amount of programmer-hours spent?
what kind of security is needed? heavy crypto use at the protocol level will need a trustworthy crypto library (don't roll your own!)
what kind of auxiliary things do you need (where does the data go? filesystem? databases? do you need a UI?)
Alternatively, depending how one interprets your question, if you want to write a client for a specific network then you should simply look for a library implementing the core concepts of that specific network while freeing you up to implement the rest of the application.
In bittorrent's case such an example would be libtorrent
I am building a set of programs that consist of multiple clients and a single server.
The clients are frequently pushing small packets of data to the server, which will validate the information (returning an error if the data is invalid), and process the received information. The information may then incur the firing of events, which clients will be subscribed to, allowing for clients to be instantly (or as close as possible) notified (along with a small amount of data).
I have some ideas about how to do this, but I am trying to avoid creating a protocol of my own, mainly as I'm sure it would take forever and I would probably make a few errors. So I was wondering if there are any existing protocols that I could implement into my system that would provide such functionality.
The number of clients will initially be quite small, but will be growing over time to potentially include 1000's of clients (with their own subscriptions), and several front end servers (each one handling a subset of subscriptions) parsing the information back and forth with back end servers for improved capability.
So, if anyone knows of any existing protocols that implement these requirements and functionality, that would be fantastic.
EDIT
I am currently looking at the XMPP protocol, and the JXTA protocol suite (for reference, and implement with another language). Both seem quite good and provide the necessary connectivity, but I have not had the opportunity to test each of them out in my environment, or if they are even suitable for what I am attempting.
Additionally, some of the network clients will be outside of the local network and operating over WAN. Security is not so much of an issue, but I need to take into account the increase latency of this, and firewall rules (local to the connection that is hosting the application and ISP firewalls) that could be blocking certain ports or transport protocols (I have read some text that said that some ISPs where blocking UDP packets, but not sure of how wide this goes. I can do it at home, the office, mobile, friends houses, etc and have yet to experience it myself).
I'm sorry if the following is not exactly what you're after but I am slightly confused by your use of the word 'protocol'. I understand a protocol to be a 'communication specification' only, where the implementation is left entirely to you. If that is the case I always find the the following graphic usefull, link.
If on the other hand you are looking for a solution which allows you to easily implement the networking side of your application, helping save time, then checkout the following network libraries, which implement their own custom protocol:
NetworkComms.Net
Lidgren
ZeroMQ
Disclaimer: I'm a developer for NetworkComms.Net
Question: I have to come up with unique ID for each networked client, such that:
it (ID) should persist once client software is installed on target computer, and should continue to persist if software is re-installed on same computer and same OS installment,
it should not change if hardware configuration is modified in most ways (except changing the motherboard)
When hard drive with client software installed is cloned to another computer with identical hardware configuration (or, as similar as possible), client software should be aware of that change.
A little bit of explanation and some back-story:
This question is basically age old question that also touches the topic of software copy-protection, as some of the mechanisms used in that area are mentioned here. I should be clear at this point that I'm not looking for a copy-protection scheme. Please, read on. :)
I'm working on a client-server software that is supposed to work in a local network. One of the problems I have to solve is to identify each unique client in the network (not so much of a problem), so that I can apply certain attributes to every specific client, retain and enforce those attributes during the deployment lifetime of a specific client.
While I was looking for a solution, I was aware of the following:
Windows activation system uses some kind of heavy fingerprinting mechanism that is extremely sensitive to hardware modifications,
Disk imaging software copies along all Volume IDs (tied to each partition when formatted), and custom, uniquely generated IDs during installation process, during first run, or in any other way, that is strictly software in its nature, and stored in registry or on hard drive, so it's very easy to confuse two.
The obvious choice for this kind of problem would be to find out BIOS identifiers (not 100% sure if this is unique through identical motherboard models, though), as that's the only thing I can rely on that isn't duplicated, transferred by cloning, and that can't be changed (at least not by using some user-space program). Everything else fails as either being not reliable (MAC cloning, anyone?), or too demanding (in terms that it's too sensitive to configuration changes).
Sub-question that I'd like to ask is, am I doing it correctly, architecture-wise? Perhaps there is a better tool for the task that I have to accomplish...
Another approach I had in mind is something similar to a handshake mechanism, where a server maintains an internal lookup table of connected client IDs (which can be even completely software-based and non-unique at any given moment), and tells the client to come up with a different ID during handshake, if a duplicate ID is provided upon connection. That approach, unfortunately, doesn't play nicely with one of the requirements to tie attributes to specific client during lifetime.
It seems to me that you should construct the unique ID corresponding to your requirements. This ID can be constructed as a hash (like MD5, SHA1 or SHA512) from the information which is important for you (some information about software and hardware component).
You can make your solution more secure if you sign such hash with your private key and your software verify during the starting, that the key (signed hash value) is signed (only public key must be installed together with your software). One can expand such kind of solution with different online services, but corporate clients could find online services not so nice.
What you're looking for is the Windows WMI. You can get the motherboard ID (which is unique across the same type of motherboard) or many many other types of unique identifiers and come up with some clever seeded function to generate a UHID. Whoa did I just make up an acronym?
And if you're looking specifically for getting the Motherboard (BIOS) ID:
WMI class: Win32_BIOS
Namespace: \Root\Cimv2
Documentation: http://msdn.microsoft.com/en-us/library/aa394077(VS.85).aspx
Sample code: http://msdn.microsoft.com/en-us/library/aa390423%28VS.85%29.aspx
Edit: You didn't specify a language (and I assumed C++), but this can be done in Java (with a COM driver), and any .NET language, as well.
Many programs use the hostId in order to build a license code (like those based on FlexLM). Have a look at what Matlab does depending on the operative system:
http://www.mathworks.com/support/solutions/en/data/1-171PI/index.html
Also have a look at this question:
Getting a unique id from a unix-like system
Once I also saw some programs basing their licenses on the serial number of the hard drive, an maybe that is the less likely thing to change. Some would suggest to use the MAC of your ethernet card, but that can be reprogrammed.
MAC
DON'T RELY ON MAC! EVER. It is not permanent. The user can easily change it (under 30 seconds).
Volume ID
DON'T RELY ON Volume ID! EVER. It is not permanent. The user can easily change it. It also changes by simply formatting the drive.
WMI
WMI is a service. Can be easily disabled. Actually, I tried that and I find out that on many computers is disabled or broken (yes, quite often broken).
License server
Connection to a validation server may cause you also lots of troubles because:
* your customers may not always be connected to the Internet.
* your customers may connect with special settings (router/NAT/proxy/gateway) that they need to input into your program in order to let it connect to the validation server.
* they may be behind a firewall that will block all programs except a few (my case). In some cases the firewall may not be under their control (valid for MOST corporate users)!
* it is super easy to redirect your program to a local fake webserver that emulates your licensing server.
Hardware data
If you need strong protection you need to rely on hardware. Something that cannot be edited by the user. Something like CPU ID instruction available in the Intel/AMD CPUs and the serial number written into the drive's IDE interface.
The CPU ID and HDD ID are permanent. They will never change, not even after you format the computer and reinstall Windows.
It is doable. For example this library reads the hardware ID of a computer. There is a compiled demo and also sourcecode/DLL. Disclaimer: the link leads to a commercial product (19€/no royalties).
Update: this question is specifically about protecting (encipher / obfuscate) the content client side vs. doing it before transmission from the server. What are the pros / cons on going in an approach like itune's one - in which the files aren't ciphered / obfuscated before transmission.
As I added in my note in the original question, there are contracts in place that we need to comply to (as its the case for most services that implement drm). We push for drm free, and most content providers deals are on it, but that doesn't free us of obligations already in place.
I recently read some information regarding how itunes / fairplay approaches drm, and didn't expect to see the server actually serves the files without any protection.
The quote in this answer seems to capture the spirit of the issue.
The goal should simply be to "keep
honest people honest". If we go
further than this, only two things
happen:
We fight a battle we cannot win. Those who want to cheat will succeed.
We hurt the honest users of our product by making it more difficult to use.
I don't see any impact on the honest users in here, files would be tied to the user - regardless if this happens client or server side. This does gives another chance to those in 1.
An extra bit of info: client environment is adobe air, multiple content types involved (music, video, flash apps, images).
So, is it reasonable to do like itune's fairplay and protect the media client side.
Note: I think unbreakable DRM is an unsolvable problem and as most looking for an answer to this, the need for it relates to it already being in a contract with content providers ... in the likes of reasonable best effort.
I think you might be missing something here. Users hate, hate, hate, HATE DRM. That's why no media company ever gets any traction when they try to use it.
The kicker here is that the contract says "reasonable best effort", and I haven't the faintest idea of what that will mean in a court of law.
What you want to do is make your client happy with the DRM you put on. I don't know what your client thinks DRM is, can do, costs in resources, or if your client is actually aware that DRM can be really annoying. You would have to answer that. You can try to educate the client, but that could be seen as trying to explain away substandard work.
If the client is not happy, the next fallback position is to get paid without litigation, and for that to happen, the contract has to be reasonably clear. Unfortunately, "reasonable best effort" isn't clear, so you might wind up in court. You may be able to renegotiate parts of the contract in the client's favor, or you may not.
If all else fails, you hope to win the court case.
I am not a lawyer, and this is not legal advice. I do see this as more of a question of expectations and possible legal interpretation than a technical question. I don't think we can help you here. You should consult with a lawyer who specializes in this sort of thing, and I don't even know what speciality to recommend. If you're in the US, call your local Bar Association and ask for a referral.
I don't see any impact on the honest users in here, files would be tied to the user - regardless if this happens client or server side. This does gives another chance to those in 1.
Files being tied to the user requires some method of verifying that there is a user. What happens when your verification server goes down (or is discontinued, as Wal-Mart did)?
There is no level of DRM that doesn't affect at least some "honest users".
Data can be copied
As long as client hardware, standalone, can not distinguish between a "good" and a "bad" copy, you will end up limiting all general copies, and copy mechanisms. Most DRM companies deal with this fact by a telling me how much this technology sets me free. Almost as if people would start to believe when they hear the same thing often enough...
Code can't be protected on the client. Protecting code on the server is a largely solved problem. Protecting code on the client isn't. All current approaches come with stingy restrictions.
Impact works in subtle ways. At the very least, you have the additional cost of implementing client-side-DRM (and all follow-up cost, including the horde of "DMCA"-shouting lawyer gorillas) It is hard to prove that you will offset this cost with the increased revenue.
It's not just about code and crypto. Once you implement client-side DRM, you unleash a chain of events in Marketing, Public Relations and Legal. A long as they don't stop to alienate users, you don't need to bother.
To answer the question "is it reasonable", you have to be clear when you use the word "protect" what you're trying to protect against...
For example, are you trying to:
authorized users from using their downloaded content via your app under certain circumstances (e.g. rental period expiry, copied to a different computer, etc)?
authorized users from using their downloaded content via any app under certain circumstances (e.g. rental period expiry, copied to a different computer, etc)?
unauthorized users from using content received from authorized users via your app?
unauthorized users from using content received from authorized users via any app?
known users from accessing unpurchased/unauthorized content from the media library on your server via your app?
known users from accessing unpurchased/unauthorized content from the media library on your server via any app?
unknown users from accessing the media library on your server via your app?
unknown users from accessing the media library on your server via any app?
etc...
"Any app" in the above can include things like:
other player programs designed to interoperate/cooperate with your site (e.g. for flickr)
programs designed to convert content to other formats, possibly non-DRM formats
hostile programs designed to
From the article you linked, you can start to see some of the possible limitations of applying the DRM client-side...
The third, originally used in PyMusique, a Linux client for the iTunes Store, pretends to be iTunes. It requested songs from Apple's servers and then downloaded the purchased songs without locking them, as iTunes would.
The fourth, used in FairKeys, also pretends to be iTunes; it requests a user's keys from Apple's servers and then uses these keys to unlock existing purchased songs.
Neither of these approaches required breaking the DRM being applied, or even hacking any of the products involved; they could be done simply by passively observing the protocols involved, and then imitating them.
So the question becomes: are you trying to protect against these kinds of attack?
If yes, then client-applied DRM is not reasonable.
If no (for example, you're only concerned about people using your app, like Apple/iTunes does), then it might be.
(repeat this process for every situation you can think of. If the adig nswer is always either "client-applied DRM will protect me" or "I'm not trying to protect against this situation", then using client-applied DRM is resonable.)
Note that for the last four of my examples, while DRM would protect against those situations as a side-effect, it's not the best place to enforce those restrictions. Those kinds of restrictions are best applied on the server in the login/authorization process.
If the server serves the content without protection, it's because the encryption is per-client.
That being said, wireshark will foil your best-laid plans.
Encryption alone is usually just as good as sending a boolean telling you if you're allowed to use the content, since the bypass is usually just changing the input/output to one encryption API call...
You want to use heavy binary obfuscation on the client side if you want the protection to literally hold for more than 5 minutes. Using decryption on the client side, make sure the data cannot be replayed and that the only way to bypass the system is to reverse engineer the entire binary protection scheme. Properly done, this will stop all the kids.
On another note, if this is a product to be run on an operating system, don't use processor specific or operating system specific anomalies such as the Windows PEB/TEB/syscalls and processor bugs, those will only make the program even less portable than DRM already is.
Oh and to answer the question title: No. It's a waste of time and money, and will make your product not work on my hardened Linux system.
One of the features of the actor model in Erlang is transparent distribution. Unless I'm misinterpreting, when you send messages between actors, you theoretically shouldn't assume that they are in the same process space or even co-located on the same physical machine.
I've always been under the impression that distributed, fault tolerant systems require careful application design to solve inherent problems around ordering/causality and consensus (among others).
I'm pretty sure that Erlang doesn't promise to solve these classes of problems transparently, so my question is, how do Erlang developers cope with this? Do you design your application as if all the actors are in the same process space and then only solve distribution problems when it comes time to actually distribute them?
If so, is this transparent distribution feature of Erlang really just concerned with the wire protocol used for remote messaging and not really transparent in the sense that a true distributed application still requires careful design in the application layer?
You are correct that erlang does not inherently solve the problems of Ordering/Causality or Consensus. What erlang does abstract for you is the difference between sending messages to local or remote nodes.
I'm not sure it would really be possible to solve those problems in a language design. That more properly belongs in a framework. The OTP framework does have some tools to help with that. Really though it's somewhat dependent on the specific problem you are solving.
For one example of an Erlang VectorClock implementation look at distributerl
Erlang OTP Supervisors also might provide some of the necessary infrastructure for consensus but there is some thought that Consensus is an impossibility in asynchronous message passing distributed systems. See your referenced wiki page for additional information on that.
Erlang does, in fact, solve these problems transparently. It can do this because it is a functional language with immutable (single-assignment) variables. It uses the Actor model for concurrency, and was specifically designed to allow hot-swapping of code and concurrent programming without the programmer having to worry about synchronization.
The Wikipedia article actually has a pretty good description of this. It is my understanding that Ericsson invented the language as a practical way to program massively parallel phone switches.
Erlang promises those things (http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf section 3.1 (39-40)):
Everything is a process.
Processes are strongly isolated.
Process creation and destruction is a lightweight operation.
Message passing is the only way for processes to interact.
Processes have unique names.
If you know the name of a process you can send it a message.
Processes share no resources.
Error handling is non-local.
Processes do what they are supposed to do or fail.
and rest is up to you. If you want know why see chapter 2. Shortly, you can send message to process if you know its PID even it is on another piece of HW. You can't be sure if message arrive unless you receive response with common secret. You can be sure that you will receive failure message when process failure when you monitor (or link) it. Those are basic elements with which you can build up what ever you want.