I'm using ZeroMQ PUSH/PULL technique.
The PUSH socket blocks when no PULL sockets are available.
What are the different scenarios in which there's packet loss and if possible, how can we tackle them?
Yes, PUSH-side access may block, but it need not do so:
distributed systems' designs ought to anticipate such a state and rather use a non-blocking mode of sending, .send( payload, zmq.NOBLOCK ).
Further means for detailed configuration are available via .setsockopt( ... )
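As an illustration only, here is a minimal pyzmq sketch (the endpoint and the option values are placeholders, not part of the original answer) of a non-blocking PUSH-side send plus a couple of .setsockopt() knobs:

    import zmq

    ctx  = zmq.Context()
    push = ctx.socket(zmq.PUSH)

    # .setsockopt() knobs relevant here: cap the send queue and avoid
    # blocking forever on shutdown if nothing ever connects
    push.setsockopt(zmq.SNDHWM, 1000)     # high-water mark: max queued messages
    push.setsockopt(zmq.LINGER, 0)        # do not block on close with unsent data

    push.connect("tcp://127.0.0.1:5557")  # hypothetical PULL-side endpoint

    payload = b"some work item"
    try:
        push.send(payload, zmq.NOBLOCK)   # raises instead of blocking ...
    except zmq.Again:
        # ... when no PULL peer is ready or the HWM is reached:
        # decide here whether to retry, buffer locally, or drop
        pass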
Zero-compromise:
There is no such thing as packet loss in the ZeroMQ Scalable Formal Communication Pattern design. Either ZeroMQ delivers a complete message-payload, or nothing at all. There is zero alternative to this, so packet loss does not fit the set of ZeroMQ ideas & principles implemented under the hood. The .Context() instance is capable of salvaging most of the low-level transport & signalling situations, re-connecting lost abstract-socket transports and still delivering the data, or delivering nothing - Zero-Compromise on this.
The best next step:
As the messaging landscape is rich in features and new design paradigms, the best next step is to learn more about the technology and about designing such distributed systems. You may enjoy other ZeroMQ-related posts here and the PDF book linked there, with many design issues and approaches to their solutions from Pieter HINTJENS himself. It is worth one's time to read it.
Related
I've read the part of the guide that recommends creating one Context. My previous implementation of my application had multiple contexts that I created ad-hoc to get a subscription running. I've since changed it to using a single context for all subscriptions.
What are the drawbacks of creating multiple contexts, and what use cases are there for doing so? The guide has the following blurb:
Getting the Context Right
ZeroMQ applications always start by creating a context, and then using that for creating sockets. In C, it’s the zmq_ctx_new() call. You should create and use exactly one context in your process. Technically, the context is the container for all sockets in a single process, and acts as the transport for inproc sockets, which are the fastest way to connect threads in one process. If at runtime a process has two contexts, these are like separate ZeroMQ instances. If that’s explicitly what you want, OK, but otherwise remember
Does this mean that it's just not as efficient to use multiple contexts, but it would still work?
Q : "What are the drawbacks of creating multiple contexts ... ?"
Resources consumed. Nothing else. The more Context()-instances one produces, the more memory is allocated & the more overhead-time is spent on doing that.
One-time add-on costs may represent a drawback - some people forget about Amdahl's Law (and forget to account for the setup & termination add-on costs there), where small amounts of "useful" work may become expensive precisely because of these (for some, surprisingly often hidden) add-on costs of attempts to distribute/parallelise some part of the application workload. This need not bother you, if not entering low-latency or ultra-low-latency domains. Run-time add-on overheads (maintaining each of the Context()-instances' internal work - yes, it works in the background, so it consumes some CPU-clocks even when doing nothing) may start to cause trouble once the number of semi-persistent instances grows higher (this also depends on the CPU micro-architecture & O/S & soft-real-time needs, if present).
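As a rough, hedged illustration of the one-time costs only (the figures vary wildly by platform and libzmq build; the snippet is not from the original answer), one may compare creating many Context()-instances against reusing a single one:

    import time
    import zmq

    N = 100

    t0 = time.perf_counter()
    ctxs = [zmq.Context() for _ in range(N)]       # N independent engines, N I/O-thread sets
    for c in ctxs:
        c.term()
    t1 = time.perf_counter()

    ctx = zmq.Context()                            # one shared engine
    t2 = time.perf_counter()
    socks = [ctx.socket(zmq.PUSH) for _ in range(N)]
    for s in socks:
        s.close(linger=0)
    t3 = time.perf_counter()
    ctx.term()

    print(f"{N} separate Context() instances : {t1 - t0:.4f} s")
    print(f"{N} sockets on one Context()     : {t3 - t2:.4f} s")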
Q : "What ... use cases are there for doing so?"
When a good software architect designs the code for ultimate performance and tries to shave off the last few nanoseconds, there we go.
Using well-thought-out & smartly crafted specialised Context()-engines, the resulting ZeroMQ performance may grow to almost the CPU/memory-I/O-bound limits. One may like to read more on relative prioritisation, CPU-core mappings and other high-performance tricks for doing this in my evangelisations of ZeroMQ design principles.
Q : "Does this mean that it's just not as efficient to use multiple contexts, but it would still work?"
The part "it would still work" is easier - it would, if not violating the O/S maximum number of threads permitted and if there is still RAM available to store the actual flow of the messages intended for out-of-platform delivery, which uses additional, O/S-specific, buffers - yes, additional SpaceDOMAIN and TimeDOMAIN add-on latency & latency jitter costs start to appear in doing that.
The Zero-Copy inproc:// TransportClass is capable of actually doing a pure in-memory flag-signalling of memory-mapped Zero-Copy message-data, that never moves. In specific cases, there can be zero-I/O-threads Context()-instances for such inproc://-only low-latency data-"flow" models, as the data is Zero-Copied and never "flow" ;o) ).
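A minimal pyzmq sketch of such an inproc://-only Context() (the endpoint name is made up for the example; libzmq permits io_threads=0 when only the inproc:// transport is used, so no background I/O threads are spun up at all):

    import zmq

    ctx = zmq.Context(io_threads=0)        # legal for inproc://-only use

    pull = ctx.socket(zmq.PULL)
    pull.bind("inproc://fast-lane")        # hypothetical endpoint name

    push = ctx.socket(zmq.PUSH)
    push.connect("inproc://fast-lane")     # bind() first, then connect()

    push.send(b"payload handed over inside process memory")
    print(pull.recv())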
Q : "Why does ZeroMQ recommend..."
Well, this seems to be a part of the initial Pieter HINTJENS' & Martin SUSTRIK's evangelisation of Zero-sharing, Zero-blocking designs. That was an almost devilish anti-pattern for the Herd of Nerds, who lock/unlock "shared" resources and who were suddenly confronted with the straight-opposite ZeroMQ philosophy of designing smart behaviours (without a need to see under the hood).
The art of the Zen-of-Zero - never share, never block, never copy (if not in a need to do so) - was astoundingly astonishing to Nerds, who could not initially realise the advantages thereof (as they had for decades been typing in code that was hard to read, hard to rewrite, hard to debug, precisely because of the heaps of sharing-, locking- and blocking-introduced sections, and we were "proud" to be the Nerds who "can", where not all of our colleagues were able to decode/understand it, let alone improve it).
The "central", globally shareable Context()-instance was a guiding light for those who started to read, learn and use the new paradigm.
After 12+ years this may seem arcane, yet the art of the Zen-of-Zero started with this pain (and with a risk of an industry-wide "cultural", not technical, rejection).
Even today, this stands as a brave step by both Pieter HINTJENS & Martin SUSTRIK.
Ultimate ~Respect!~ for the whole work they undertook together... for our chance to learn their insights & to re-use them in BAU... without an eye-blink.
Great minds.
Currently I am working on migrating TIBCO RV to NATS IO for a project. I am curious to know what internal architecture makes NATS IO superior in terms of performance, as they claim on their website http://nats.io/about/. I couldn't find any resources online explaining the internals of NATS. Could anyone please help me on this?
There's a good overview referenced in the protocol documentation to a presentation given by Derek Collison, the creator of NATS. He covers some of the high-performance areas of NATS, including the zero-allocation byte parser, subject management algorithms, and Golang optimizations.
NATS is open source - implementation details can be found in the gnatsd repository. The protocol parser and the subject handling would be a few areas to look at.
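For a feel of how simple the wire format that parser handles is, here is a hedged, minimal sketch speaking the raw NATS text protocol over a plain TCP socket (it assumes a local gnatsd/nats-server on the default port 4222 and omits PING/PONG and error handling):

    import socket

    conn = socket.create_connection(("localhost", 4222))   # assumed local server
    print(conn.recv(4096).decode())                        # server greeting: INFO {...}

    conn.sendall(b'CONNECT {"verbose":false}\r\n')
    conn.sendall(b"SUB updates 1\r\n")                     # subscribe to subject "updates", sid 1
    conn.sendall(b"PUB updates 5\r\nhello\r\n")            # publish a 5-byte payload

    print(conn.recv(4096).decode())                        # expect: MSG updates 1 5 / hello
    conn.close()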
I was heavily involved in both RV and in NATS of course. I am not claiming that NATS is faster than RV. Although I designed and built both, I have not tested RV in many years for any type of performance. NATS should compare well, is OSS of course and has a simple TEXT based protocol vs a binary protocol for RV. Also NATS is an overlay design using TCP/IP, similar to TIBCO's EMS which I also designed, however RV can use multicast (PGM) or reliable broadcast. So RV will be more efficient at large fanout in most cases.
In general, a messaging system's performance is tied to 3 simple things IMO (a small sketch of points #1 & #3 follows the list):
How many messages can be processed per I/O call, i.e. per jump from user space to kernel space.
How fast you can route messages for distribution, e.g. subject distributors.
How efficient the system is at copying data, e.g. coalescing messages to achieve #1.
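A hedged, library-free sketch of points #1 & #3 - coalescing several length-prefixed messages into a single system call instead of one call per message (the framing scheme is invented for the example):

    import socket

    def send_one_by_one(sock: socket.socket, messages: list) -> None:
        # naive: one user->kernel transition per message
        for m in messages:
            sock.sendall(len(m).to_bytes(4, "big") + m)

    def send_coalesced(sock: socket.socket, messages: list) -> None:
        # coalesced: frame every message with a 4-byte length prefix and
        # flush the whole batch in a single sendall() call
        batch = bytearray()
        for m in messages:
            batch += len(m).to_bytes(4, "big") + m
        sock.sendall(batch)

    a, b = socket.socketpair()
    send_coalesced(a, [b"tick", b"tock", b"trade"])
    print(b.recv(4096))    # the three framed messages arrive as one batch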
I have a server/client application, which uses a REQ/REP formal pattern and I know this is synchronous.
Can I completely replace zmq.REQ / zmq.REP by zmq.ROUTER and zmq.DEALER ?
Or do these have to be used only as intermediate proxies?
ZeroMQ is a box with a few smart and powerful building blocks
However, only the Architect and the Designer decide how well or how poorly these get harnessed in your distributed application's architecture.
So synchronicity or asynchronicity is not an inherent feature of some particular ZeroMQ Scalable Formal Communication Pattern's access-node, but depends on the real deployment, within some larger context of use.
Yes, ROUTER can talk to DEALER, but ...
as one may read in detail in the ZeroMQ API-specification tables, so-called compatible socket archetypes are listed for each named socket type; however, anyone can grasp much stronger powers from ZeroMQ by starting to use the ZeroMQ way of thinking - spending more time on the ZeroMQ concept and its set of Zero-maxims: Zero-copy + (almost) Zero-latency + Zero-warranty + (almost) Zero-scaling-degradation etc.
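As a hedged illustration (the endpoint and identity are invented for the example), a minimal pyzmq sketch of a DEALER talking directly to a ROUTER, free of the REQ/REP lock-step; note that a REQ peer would additionally expect an empty delimiter frame:

    import zmq

    ctx = zmq.Context()

    router = ctx.socket(zmq.ROUTER)
    router.bind("tcp://127.0.0.1:5555")             # hypothetical endpoint

    dealer = ctx.socket(zmq.DEALER)
    dealer.setsockopt(zmq.IDENTITY, b"client-1")
    dealer.connect("tcp://127.0.0.1:5555")

    # DEALER is not locked into the send/recv alternation the way REQ is:
    dealer.send(b"request #1")
    dealer.send(b"request #2")

    # ROUTER sees [sender identity, payload] and may reply in any order
    for _ in range(2):
        ident, payload = router.recv_multipart()
        router.send_multipart([ident, b"reply to " + payload])

    print(dealer.recv())                            # b'reply to request #1'
    print(dealer.recv())                            # b'reply to request #2'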
The best next step:
IMHO, if you are serious about professional messaging, get the great book and source from it both the knowledge of the elementary setups, the somewhat more complex multi-socket messaging-layer designs with soft signaling, and also the further thoughts about the great powers of concurrent, heterogeneous, distributed processing, to advance your learning curve.
Pieter Hintjens' book "Code Connected, Volume 1" ( available in PDF ) is more than a recommended source for your issue.
There you will get grounds for your further use of ZeroMQ.
ZeroMQ is a great tool, not just for the messaging layer itself. It is worth the time and effort.
I'm learning nanomsg and zeromq these days with the Golang bindings. I've tested with Req-Rep, which works, but is it a proper idea to use this mechanism to build a reliable internal server for serving data under highly concurrent requests (though from a limited set of client sources, < 30)?
Some pseudo code may look like:
for {
    data, conn = socketRep.readData()
    go func(data, conn) {
        result = process(data)
        conn.sendReply(result)
        conn.close()
    }(data, conn)
}
How can I achieve a similar communication pattern in nanomsg? Is there any example (C is ok) available?
====UPDATE====
Sorry the question looks too broad. The most important question for me is, "Is there any workable Req/Rep example (C is ok) around?"
The VERY first thing one shall know when deciding on "How to build a Reliable Server ... for serving a high concurrent load"
Learning any new library is thrilling and brings a lot of new insights.
One very important insight is to undertake just right-sized & REASONABLE challenges alongside one's own learning-curve trajectory.
If Pieter Hintjens, co-father of ZeroMQ, wrote [ almost in bold ] the following remarks right upon entering the chapter on designing any RELIABLE SERVICE, he knew pretty well WHY he preceded the forthcoming paragraphs on such designs with such a highlighted warning...
(cit.:) " ... going to get into unpleasantly complex territory here
that has me getting up for another espresso. You should appreciate
that making "reliable" messaging is complex enough that you always
need to ask, "do we actually need this?" before jumping into it.
If you can get away with unreliable, or "good enough" reliability,
you can make a huge win in terms of cost and complexity. Sure,
you may lose some data now and then. It is often a good trade-off.
..."
Nanomsg is without doubt a great & smart library project.
The high-level philosophy brought from Pieter Hintjens' books on advanced designs, which build "beyond the elementary ZeroMQ" Scalable Formal Communication Patterns, remains much the same.
IMHO it is best to spend a few more weeks on the ideas, design paradigms & stories in both of Pieter Hintjens' books before moving into any coding.
Both the 400+ page book The ZeroMQ Guide - for Python Developers, Section II, Advanced ZeroMQ ( namely Chapters 6.2, 6.7, 7.1 and 7.5 ),
and
the 300+ page book "Code Connected, Volume 1" ( namely the process of adding reliability in Chapter 5 - be it for the sake of reliability per se, or for the sake of unlocking the next levels of performance via load-balancing over a pool of resources ),
will help anyone start exploring this great but thrilling field of distributed-system architecture & will help one gain the perspectives needed for designing survivable approaches without re-exploring the many dead-ends that have already been proven to be dead-ends.
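To answer the update directly, below is a minimal, hedged ZeroMQ Req/Rep sketch in Python (the endpoint and message contents are invented for the example). As far as I understand, nanomsg's cooked REQ/REP pair follows the same strict request-reply alternation, so the goroutine-per-request concurrency from the question's pseudo code is not possible with a plain REP socket; that concurrency is exactly where the ROUTER/DEALER-style designs from the books above come in.

    import threading
    import zmq

    ctx = zmq.Context()

    def server() -> None:
        rep = ctx.socket(zmq.REP)
        rep.bind("tcp://127.0.0.1:5560")           # hypothetical endpoint
        for _ in range(3):
            request = rep.recv()
            rep.send(b"processed: " + request)     # REP must reply before the next recv()

    threading.Thread(target=server, daemon=True).start()

    req = ctx.socket(zmq.REQ)
    req.connect("tcp://127.0.0.1:5560")
    for i in range(3):
        req.send(f"job-{i}".encode())
        print(req.recv())                           # strict send -> recv lock-step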
I've read Lamport's paper on Paxos. I've also heard that it isn't used much in practice, for reasons of performance. What algorithms are commonly used for consensus in distributed systems?
Not sure if this is helpful (since it is not based on actual production experience), but in our "distributed systems" course we studied, along with Paxos, the Chandra-Toueg and Mostefaoui-Raynal algorithms (our professor was especially fond of the latter).
Check out the Raft algorithm for a consensus algorithm that is optimized for ease of understanding and clarity of implementation. Oh... it is pretty fast as well.
https://ramcloud.stanford.edu/wiki/display/logcabin/LogCabin
https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
If performance is an issue, consider whether you need all of the strong consistency guarantees Paxos gives you. See e.g. http://queue.acm.org/detail.cfm?id=1466448 and http://incubator.apache.org/cassandra/. Searching on Paxos optimised gets me hits, but I suspect that relaxing some of the requirements will buy you more than tuning the protocol.
The Paxos system I run (which supports really, really big web sites) is halfway in between Basic Paxos and Multi-Paxos. I plan on moving it to a full Multi-Paxos implementation.
Paxos isn't that great as a high-throughput data storage system, but it excels in supporting those systems by providing leader election. For example, say you have a replicated data store where you want a single master for performance reasons. Your data store nodes will use the Paxos system to choose the master.
Like Google Chubby, my system is run as a service and can also store data as a configuration container. (I use "configuration" loosely; I hear Google uses Chubby for DNS.) This data doesn't change as often as user input, so it doesn't need high-throughput write SLAs. Reading, on the other hand, is extremely quick because it is fully replicated and you can read from any node.
Update
Since writing this, I have upgraded my Paxos system. I am now using a chain-consensus protocol as the primary consensus system. The chain system still utilizes Basic-Paxos for re-configuration—including notifying chain nodes when the chain membership changes.
Paxos is optimal in terms of performance of consensus protocols, at least in terms of the number of network delays (which is often the dominating factor). It's clearly not possible to reliably achieve consensus while tolerating up to f failures without a single round-trip communication to at least (f-1) other nodes in between a client request and the corresponding confirmation, and Paxos achieves this lower bound. This gives a hard bound on the latency of each request to a consensus-based protocol regardless of implementation. In particular, Raft, Zab, Viewstamped Replication and all other variants on consensus protocols all have the same performance constraint.
One thing that can be improved from standard Paxos (also Raft, Zab, ...) is that there is a distinguished leader which ends up doing more than its fair share of the work and may therefore end up being a bit of a bottleneck. There is a protocol known as Egalitarian Paxos which spreads the load out across multiple leaders, although it's mindbendingly complicated IMO, is only applicable to certain domains, and still must obey the lower bound on the number of round-trips within each request. See the paper "There Is More Consensus in Egalitarian Parliaments" by Moraru et al for more details.
When you hear that Paxos is rarely used due to its poor performance, it is frequently meant that consensus itself is rarely used due to poor performance, and this is a fair criticism: it is possible to achieve much higher performance if you can avoid the need for consensus-based coordination between nodes as much as possible, because this allows for horizontal scalability.
Snarkily, it's also possible to achieve better performance by claiming to be using a proper consensus protocol but actually doing something that fails in some cases. Aphyr's blog is littered with examples of these failures not being as rare as you might like, where database implementations have either introduced bugs into good consensus-like protocols by way of "optimisation", or else developed custom consensus-like protocols that fail to be fully correct in some subtle fashion. This stuff is hard.
You should check the Apache Zookeeper project. It is used in production by Yahoo! and Facebook among others.
http://hadoop.apache.org/zookeeper/
If you look for academic papers describing it, it is described in a paper at usenix ATC'10. The consensus protocol (a variant of Paxos) is described in a paper at DSN'11.
Google documented how they did fast paxos for their megastore in the following paper: Link.
With Multi-Paxos, when the leader is galloping, it can respond to the client's write once it has heard that a majority of nodes have written the value to disk. This is as good and as efficient as you can get while maintaining the consistency guarantees that Paxos makes.
Typically, though, people use something Paxos-like such as ZooKeeper as an external service (a dedicated cluster) to keep critical information consistent (who has locked what, who is the leader, who is in a cluster, what the configuration of the cluster is), and then run a less strict algorithm with weaker consistency guarantees which relies upon application specifics (e.g. vector clocks and merged siblings). The short ebook Distributed Systems for Fun and Profit gives a good overview of the alternatives.
Note that lots of databases compete on speed by using risky defaults which risk consistency and can lose data under network partitions. The Aphyr blog series on Jepsen shows whether well-known open-source systems lose data. One cannot cheat the CAP theorem; if you configure systems for safety, then they end up doing about the same messaging and the same disk writes as Paxos. So really you cannot say "Paxos is slow"; you have to say "the part of a system which needs consistency under network partitions requires a minimum number of messages and disk flushes per operation, and that is slow".
There are two general blockchain consensus systems:
Those that produce unambiguous 100% finality given a defined set of validators
Those which do not provide 100% finality but instead rely on high probability of finality
The first generation blockchain consensus algorithms (Proof of Work, Proof of Stake, and BitShares’ Delegated Proof of Stake) only offer high probability of finality that grows with time. In theory someone could pay enough money to mine an alternative “longer” Bitcoin blockchain that goes all the way back to genesis.
More recent consensus algorithms, whether HashGraph, Casper, Tendermint, or DPOS BFT all adopt long-established principles of Paxos and related consensus algorithms. Under these models it is possible to reach unambiguous finality under all network conditions so long as more than ⅔ of participants are honest.
Objective and unambiguous 100% finality is a critical property for all blockchains that wish to support inter-blockchain communication. Absent 100% finality, a reversion on one chain could have irreconcilable ripple effects across all interconnected chains.
The abstract protocol for these more recent protocols involves the following steps (a toy sketch of the finality rule follows the list):
Propose a block
All participants acknowledge the block (pre-commitment)
All participants acknowledge when ⅔+ have sent them pre-commitments (commitment)
A block is final once a node has received ⅔+ commitments
Unanimous agreement on finality is guaranteed unless ⅓+ are bad and evidence of the bad behavior is available to all
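A toy, hedged sketch of the ⅔+ finality rule only (the validator names and data structures are invented for the example; this is not any specific protocol's implementation):

    # a block is final once more than two thirds of the known validator
    # set have sent their commitments for it
    validators = {"v1", "v2", "v3", "v4"}
    commitments = set()

    def record_commitment(validator: str) -> None:
        if validator in validators:
            commitments.add(validator)

    def block_is_final() -> bool:
        # strict supermajority: more than 2/3 of the validator set
        return 3 * len(commitments) > 2 * len(validators)

    for v in ("v1", "v2", "v3"):
        record_commitment(v)
    print(block_is_final())    # True: 3 of 4 commitments exceeds 2/3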
It is the technical differences in the protocols that give rise to real-world impact on user experience. This includes things such as latency until finality, degrees of finality, bandwidth, and proof generation / validation overhead.
Look for more details on delegated proof of stake by EOS here.
Raft is a more understandable and faster alternative to Paxos. One of the most popular distributed systems that uses Raft is etcd. etcd is the distributed store used in Kubernetes.
It's equivalent to Paxos in fault-tolerance.