ZeroMQ: are PUB/SUB topic subscriptions cheap?

Problem: I have a number of file uploads coming in via HTTP in parallel ( uploads receiver ). I'm storing them temporarily on a local disk. Another process ( uploads submitter ) gets notified about new uploads and does specific processing ( parsing, extracting metadata, uploading to S3, etc. ). Once processing of an upload is done, I want the uploads receiver to be notified by the submitter, so that it can reply back to the remote uploader with a status ( whether the submission is ok or an error ). Using the ZeroMQ PUB/SUB pattern, which would be better:
1. Subscribe all upload receiver threads to a single topic. Each receiver thread would have to filter messages based on the upload id or something similar to find the notification that belongs to it.
2. Subscribe each receiver thread to a new topic which represents a particular upload. This one seems more reasonable, assuming topics are cheap in ZeroMQ, i.e. not many resources are needed to keep them and they can be auto-expired. I expect new uploads to come in at dozens of files per second, and processing a single upload may take up to several seconds, so theoretically I can have up to a thousand topics active at the same time. Also, I may not always be able to unsubscribe, due to various failure modes.

Initial note: on using different ZeroMQ versions:
While more recent versions ( ZeroMQ 3.x and later ) filter topics on the PUB side for connection-oriented transports such as tcp and ipc, the early ZeroMQ versions ( 2.x ) used a SUB-side approach. That means all the ( network ) message-transport traffic goes to all SUB-s, as an acceptable penalty for distributing the processing workload that would otherwise have to be handled, at the lowest possible latency, on the PUB side.
This matters in open distributed systems, where a homogeneous set of versions across all nodes cannot be enforced.
Since your architecture seems to be co-located on the same <localhost>, the performance impact remains non-distributed ( concentrated on that one host ) and may call for just some limited latency/priority tweaking, if an overall bottleneck appears while scaling this use-case up.
On Scalability Ranges: the limits are still well beyond your use-case:
As Martin Sustrik ( ZeroMQ co-father ) explained in detail, ZeroMQ's subscription matching was designed for expected scales of up to about ten thousand subscriptions:
(cit.:) " Efficient Subscription Matching
In ZeroMQ, simple tries are used to store and match PUB/SUB subscriptions. The subscription mechanism was intended for up to 10,000 subscriptions where simple trie works well. However, there are users who use as much as 150,000,000 subscriptions. In such cases there's a need for a more efficient data structure. "
Further details on design and scaling can be found in Martin's post.
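To make the quoted point concrete: a ZeroMQ SUB subscription matches when it is a byte prefix of the message's first frame, which is why a trie suits the job. The Python sketch below is a hypothetical, minimal trie written for illustration only, not libzmq's implementation. It also shows a pitfall relevant to option 2 in the question: with an assumed topic scheme "upload:<id>", a subscription to upload:1 also matches upload:12, so a terminating delimiter (here "|", also an assumption) is needed.

```python
class SubscriptionTrie:
    """Byte-prefix trie: a message matches if any subscribed prefix is a prefix of it."""

    def __init__(self):
        self.children = {}   # next byte value -> child node
        self.refcount = 0    # how many times this exact prefix is currently subscribed

    def subscribe(self, prefix: bytes) -> None:
        node = self
        for byte in prefix:
            node = node.children.setdefault(byte, SubscriptionTrie())
        node.refcount += 1

    def unsubscribe(self, prefix: bytes) -> None:
        # Empty nodes are not pruned in this sketch; a long-running process would prune them.
        node = self
        for byte in prefix:
            node = node.children.get(byte)
            if node is None:
                return
        if node.refcount > 0:
            node.refcount -= 1

    def matches(self, message: bytes) -> bool:
        node = self
        if node.refcount:            # an empty subscription b"" matches everything
            return True
        for byte in message:
            node = node.children.get(byte)
            if node is None:
                return False
            if node.refcount:
                return True
        return False


naive = SubscriptionTrie()
naive.subscribe(b"upload:1")             # hypothetical per-upload topic, no delimiter
print(naive.matches(b"upload:12|done"))  # True: upload 12 leaks into upload 1's subscriber

safe = SubscriptionTrie()
safe.subscribe(b"upload:1|")             # delimiter-terminated topic
print(safe.matches(b"upload:1|done"), safe.matches(b"upload:12|done"))  # True False
```

The same delimiter rule applies unchanged to real SUB sockets, since ZeroMQ compares raw bytes and knows nothing about ids.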
The Best Next Step?
A fair approach would be to mock up each of the two approaches in question and benchmark them in vitro, scaled to { 1.0x, 1.5x, 2.0x, 5.0x } of the expected static scale, so as to get quantitatively supported data about the real overheads, performance and latencies of each alternative strategy under review.
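Before wiring real sockets, a pure in-process model of the two strategies at the suggested scale factors already shows how the fan-out work grows: quadratically for one shared topic, linearly for per-upload topics. This is a hypothetical sketch that counts deliveries and times pure-Python loops; it does not measure ZeroMQ's actual overheads, and the "<id>|ok" message format is an assumption. A real benchmark would replace the loops with PUB/SUB sockets.

```python
import time


def shared_topic(n: int) -> tuple:
    """Option 1: one topic; every receiver sees every notification and filters it."""
    notifications = [f"{upload_id}|ok".encode() for upload_id in range(n)]
    deliveries = matched = 0
    for receiver_id in range(n):
        wanted = f"{receiver_id}|".encode()
        for msg in notifications:
            deliveries += 1
            matched += msg.startswith(wanted)
    return deliveries, matched


def per_upload_topic(n: int) -> tuple:
    """Option 2: one topic per upload; only the matching receiver gets the message."""
    subscribers = {f"{rid}|".encode(): rid for rid in range(n)}  # topic -> receiver
    deliveries = matched = 0
    for upload_id in range(n):
        msg = f"{upload_id}|ok".encode()
        topic = msg[: msg.index(b"|") + 1]
        if topic in subscribers:
            deliveries += 1
            matched += 1
    return deliveries, matched


def run_mock(base: int, scales=(1.0, 1.5, 2.0, 5.0)) -> dict:
    """Run both strategies at each scale factor of `base` concurrent uploads."""
    results = {}
    for scale in scales:
        n = int(base * scale)
        row = {}
        for name, strategy in (("shared", shared_topic), ("per_upload", per_upload_topic)):
            t0 = time.perf_counter()
            deliveries, matched = strategy(n)
            row[name] = {"n": n, "deliveries": deliveries, "matched": matched,
                         "seconds": time.perf_counter() - t0}
        results[scale] = row
    return results


if __name__ == "__main__":
    # base=200 keeps the pure-Python run short; the question's own scale is ~1000.
    for scale, row in run_mock(base=200).items():
        print(scale, {k: (v["deliveries"], round(v["seconds"], 4)) for k, v in row.items()})
```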
Anyway, Vovan, enjoy the world of smart signalling/messaging in distributed processing.

Related

Performance of Nats Jetstream

I'm trying to understand how NATS JetStream scales and have a couple of questions.
How efficient is subscribing by subject to historic messages? For example, let's say I have a stream foo that consists of 100 million messages with the subject foo.bar, followed by a single message with the subject foo.baz. If I then make a subscription to foo.baz from the start of the stream, will something on the server have to perform a linear scan of all messages in foo, or will it be able to seek immediately to the foo.baz message?
How well does the system scale horizontally? I ask because I'm having issues getting JetStream to scale much above a few thousand messages per second, regardless of how many machines I throw at it. Test parameters are as follows:
Nats Server 2.6.3 running on 4 core 8GB nodes
Single Stream replicated 3 times (disk or in-memory appears to make no difference)
500 byte message payloads
n publishers each publishing 1k messages per second
The bottleneck appears to be on the publishing side as I can retrieve messages at least as fast as I can publish them.
Publishing in NATS JetStream is slightly different than publishing in Core NATS.
Yes, you can publish a Core NATS message to a subject that is recorded by a stream and that message will indeed be captured in the stream, but in the case of the Core NATS publication, the publishing application does not expect an acknowledgement back from the nats-server, while in the case of the JetStream publish call, there is an acknowledgement sent back to the client from the nats-server that indicates that the message was indeed successfully persisted and replicated (or not).
So when you do js.Publish() you are actually making a synchronous, relatively high-latency request-reply (especially if your replication factor is 3 or 5, more so if your stream is persisted to file, and depending on the network latency between the client application and the nats-server), which means that your throughput is going to be limited if you are just making those synchronous publish calls back to back.
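The limit described above follows from Little's law (throughput = requests in flight / latency). A back-of-the-envelope helper, where the 2 ms publish-to-PubAck round trip is an assumed value rather than a measurement:

```python
def publish_ceiling(publishers: int, in_flight_per_publisher: int, ack_rtt_s: float) -> float:
    """Upper bound on acknowledged publishes per second (Little's law: rate = in-flight / latency).

    in_flight_per_publisher == 1 models back-to-back synchronous js.Publish() calls;
    larger values model windowed asynchronous publishing. Server-side limits
    (disk, replication bandwidth, CPU) are ignored, so real numbers will be lower.
    """
    if ack_rtt_s <= 0:
        raise ValueError("ack_rtt_s must be positive")
    return publishers * in_flight_per_publisher / ack_rtt_s


# Assumed 2 ms publish -> PubAck round trip for an R3 stream:
sync_rate = publish_ceiling(publishers=4, in_flight_per_publisher=1, ack_rtt_s=0.002)
async_rate = publish_ceiling(publishers=4, in_flight_per_publisher=100, ack_rtt_s=0.002)
print(f"sync ceiling ~{sync_rate:,.0f} msg/s, windowed async ceiling ~{async_rate:,.0f} msg/s")
```

With those assumptions, four synchronous publishers top out around 2,000 msg/s no matter how many servers are added, which is the same order of magnitude as the "few thousand" reported in the question.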
If you want high publishing throughput to a stream, you should use the asynchronous version of the JetStream publish call instead (i.e. js.PublishAsync(), which returns a PubAckFuture).
However, in that case you must also remember to introduce some amount of flow control, by limiting the number of 'in-flight' asynchronous publish requests you allow at any given time (this is because you can always publish asynchronously much, much faster than the nats-server(s) can replicate and persist messages).
If you were to continuously publish asynchronously as fast as you can (e.g. when publishing the result of some kind of batch process) then you would eventually overwhelm your servers, which is something you really want to avoid.
You have two options to flow-control your JetStream async publications:
Specify a max number of in-flight asynchronous publish requests as an option when obtaining your JetStream context, e.g. js, err := nc.JetStream(nats.PublishAsyncMaxPending(100))
Implement a simple batching mechanism that checks for the publications' PubAcks every so many asynchronous publications, like nats bench does: https://github.com/nats-io/natscli/blob/e6b2b478dbc432a639fbf92c5c89570438c31ee7/cli/bench_command.go#L476
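Both options bound the number of unacknowledged publishes. The idea itself, independent of any NATS client, can be sketched with an asyncio.Semaphore acting as the window. FakeStream is a hypothetical stand-in whose acks arrive after a fixed delay, so this models only the flow control, not nats.go or nats-server behaviour.

```python
import asyncio
import time


class FakeStream:
    """Hypothetical stand-in for a replicated stream: each publish is acked after ack_delay seconds."""

    def __init__(self, ack_delay: float):
        self.ack_delay = ack_delay
        self.in_flight = 0
        self.peak_in_flight = 0
        self.stored = 0

    async def publish(self, payload: bytes) -> None:
        self.in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        await asyncio.sleep(self.ack_delay)   # models persist + replicate + round trip
        self.in_flight -= 1
        self.stored += 1


async def publish_windowed(stream: FakeStream, count: int, max_pending: int) -> float:
    """Publish `count` messages with at most `max_pending` unacknowledged; return msgs/s."""
    window = asyncio.Semaphore(max_pending)

    async def publish_one() -> None:
        try:
            await stream.publish(b"x" * 500)   # 500-byte payload, as in the question
        finally:
            window.release()                  # an ack (or a failure) frees a slot

    start = time.perf_counter()
    tasks = []
    for _ in range(count):
        await window.acquire()                # waits while the window is full
        tasks.append(asyncio.create_task(publish_one()))
    await asyncio.gather(*tasks)              # wait for the outstanding acks
    return count / (time.perf_counter() - start)


async def main() -> None:
    for max_pending in (1, 100):
        stream = FakeStream(ack_delay=0.002)  # assumed 2 ms ack latency
        rate = await publish_windowed(stream, 200, max_pending)
        print(f"max_pending={max_pending}: ~{rate:,.0f} msg/s, peak in flight {stream.peak_in_flight}")


if __name__ == "__main__":
    asyncio.run(main())
```

max_pending=1 behaves like back-to-back synchronous publishing; raising it lets acks overlap while still never exceeding the window.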
About the expected performance: using async publications allows you to really get the throughput that NATS and JetStream are capable of. A simple way to validate or measure performance is to use the nats CLI tool (https://github.com/nats-io/natscli) to run benchmarks.
For example, you can start with a simple test: nats bench foo --js --pub 4 --msgs 1000000 --replicas 3 (an in-memory stream with 3 replicas, and 4 goroutines, each with its own connection, publishing 128-byte messages in batches of 100), and you should get a lot more than a few thousand messages per second.
For more information and examples of how to use the nats bench command you can take a look at this video: https://youtu.be/HwwvFeUHAyo
It would be good to get an opinion on this. I see similar behaviour, and the only way to achieve higher throughput for publishers is to lower replication (from 3 to 1), but that won't be an acceptable solution.
I have tried adding more resources (CPU/RAM), with no increase in the publishing rate.
Scaling horizontally did not make any difference either.
In my case, I am using the bench tool to publish to JetStream.
For an R3 filestore you can expect ~250k small msgs per second. If you use synchronous publishing, throughput will be dominated by the RTT from the application to the system, and from the stream leader to the closest follower. You can use windowed, intelligent async publishing to get better performance.
You can get higher numbers with memory stores, but again they will be dominated by RTT throughout the system.
If you give me a sense of how large your messages are, we can show you some results from nats bench against the demo servers (R1) and NGS (R1 & R3).
For the original question regarding filtered consumers, nats-server 2.8.x and later will not do a linear scan to retrieve foo.baz. We could show an example of this as well, if it would help.
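Why an index avoids the linear scan can be illustrated with a toy append-only stream that keeps, per subject, the ascending list of its sequence numbers. This is a hypothetical illustration only; it does not reproduce nats-server's actual storage structures, and it ignores wildcard subjects.

```python
from bisect import bisect_left
from collections import defaultdict


class IndexedStream:
    """Toy append-only stream with a per-subject index of sequence numbers."""

    def __init__(self):
        self.messages = []                    # position seq - 1 -> (subject, payload)
        self.by_subject = defaultdict(list)   # subject -> ascending [seq, ...]

    def append(self, subject: str, payload: bytes) -> int:
        self.messages.append((subject, payload))
        seq = len(self.messages)              # sequences start at 1
        self.by_subject[subject].append(seq)
        return seq

    def first_seq_linear(self, subject: str, start_seq: int = 1):
        """Naive filtered consumer: look at every message from start_seq onwards."""
        for seq in range(start_seq, len(self.messages) + 1):
            if self.messages[seq - 1][0] == subject:
                return seq
        return None

    def first_seq_indexed(self, subject: str, start_seq: int = 1):
        """Jump to the first matching sequence >= start_seq via binary search."""
        seqs = self.by_subject.get(subject, [])
        i = bisect_left(seqs, start_seq)
        return seqs[i] if i < len(seqs) else None


s = IndexedStream()
for _ in range(100_000):
    s.append("foo.bar", b"...")
s.append("foo.baz", b"...")
print(s.first_seq_indexed("foo.baz"))   # 100001, found without visiting the foo.bar messages
```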
Feel free to join the slack channel (slack.nats.io) which is a pretty active community. Even feel free to DM me directly, happy to help.

What would be the right ZMQ Pattern?

I am trying to build a ZeroMQ pattern where:
1. There can be many clients connecting to a single server endpoint.
2. The server will distribute incoming client tasks to available workers (mapped to the number of cores on the server).
3. These tasks are long-running (hours) and need to perform a lot of local I/O.
4. During each task execution (iteration), data/messages (potentially on the order of [GB]s) will be sent back and forth between the client and the server worker.
5. Client and server workers need to know if there are failures/errors on the peer side, so that they can recover (retry) or shut down gracefully and try later.
Based on the above, I presume that the ROUTER/DEALER pattern would be useful. PUB/SUB is ruled out, as I need to know if the peer fails.
I tried various combinations of the ROUTER/DEALER pattern, but I am unable to ensure that multiple messages from a client reach the same worker within an iteration. I understand that I need to implement a broker/forwarder/device that routes incoming messages to the right recipient/handler/worker, but I am unable to map the frontend and backend sockets in the broker. I am looking at the Majordomo pattern, but I guess there has to be a simpler broker model that could just route messages to the assigned worker (without really getting into services).
I am looking for some examples, if there are any, or any guidance on what I may be missing. I am trying to build this in Golang.
Q : "What would be the right ZMQ Pattern?"
Based on the complex composition of all the requirements posted under items 1 - 5, I dare to say that The Right choice would be NOT to use any single one of the standard, built-in, trivial ZeroMQ Communication Archetype primitives, but rather to create a multi-layered, application-specific composition of a ( M + N + 1 hot-standby, robust-enough?, self-resilient? ) signalling/messaging infrastructure that covers all your current ( and possibly any future ) application-level requirements, as depicted here for a much simpler distributed-computing use-case, where only a trivial remote SigKILL was implemented.
Yes, the best would be to create ( and maintain ) your own formalised signalling that the application level can handle and interact across, like heart-beating for detecting dead worker(s), plus re-instating such failed jobs right when a failure is detected (most probably re-located and/or re-scheduled, with the respective resources not statically pre-mapped but chosen where physically most feasible at the moment of re-instating; even more telemetry signalling will help you decide where to re-instate such failed micro-jobs).
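The heart-beating part reduces to bookkeeping like the following sketch (Python for brevity, although the asker targets Go; the logic ports directly). Liveness and its methods are hypothetical names, time is passed in explicitly so a real loop can feed it time.monotonic(), and the "three missed intervals" default is an assumption. A peer is declared dead after missed_limit silent heartbeat intervals, and the jobs it held are queued for re-instating elsewhere.

```python
class Liveness:
    """Declares a peer dead after `missed_limit` silent heartbeat intervals; re-queues its jobs."""

    def __init__(self, interval_s: float, missed_limit: int = 3):
        self.timeout = interval_s * missed_limit
        self.last_seen = {}    # peer -> time of the last heartbeat or message
        self.jobs = {}         # peer -> set of job ids it currently holds
        self.requeue = []      # jobs to re-instate on another peer

    def heartbeat(self, peer, now: float) -> None:
        self.last_seen[peer] = now

    def assign(self, peer, job_id, now: float) -> None:
        self.jobs.setdefault(peer, set()).add(job_id)
        self.heartbeat(peer, now)          # handing out work also counts as contact

    def reap(self, now: float) -> list:
        """Return peers considered dead at `now`; their jobs move to self.requeue."""
        dead = [p for p, t in self.last_seen.items() if now - t > self.timeout]
        for p in dead:
            del self.last_seen[p]
            self.requeue.extend(sorted(self.jobs.pop(p, ())))
        return dead
```

Both sides can run the same tracker: the broker watches workers, and each client watches the broker, which covers requirement 5 in the question.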
ZeroMQ is a fabulous framework for exactly such complex signalling and messaging hierarchies, so your imagination as System Architect is the only ceiling in this concept.
ZeroMQ will take care of the rest and do all the hard work nicely and easily.
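One building block of such a composition is the piece the asker reports missing: sticky routing in the broker. Below is a minimal sketch of the broker-side table, assuming a frontend ROUTER for clients and a backend ROUTER for workers, so the broker sees both identities. StickyRouter and its methods are hypothetical names, and holding the messages of a client that is waiting for a free worker is left out.

```python
from collections import deque


class StickyRouter:
    """First message of an iteration picks a free worker; later messages stick to it."""

    def __init__(self, workers):
        self.free = deque(workers)   # idle worker identities
        self.assigned = {}           # client identity -> worker identity
        self.waiting = deque()       # clients queued while no worker is free

    def route(self, client_id: bytes):
        """Return the worker for this client's message, or None if it must wait."""
        worker = self.assigned.get(client_id)
        if worker is not None:
            return worker
        if not self.free:
            if client_id not in self.waiting:
                self.waiting.append(client_id)
            return None
        worker = self.free.popleft()
        self.assigned[client_id] = worker
        return worker

    def end_iteration(self, client_id: bytes):
        """Release the client's worker; hand it to the next waiting client, if any."""
        worker = self.assigned.pop(client_id, None)
        if worker is None:
            return None
        if self.waiting:
            next_client = self.waiting.popleft()
            self.assigned[next_client] = worker
            return next_client
        self.free.append(worker)
        return None
```

In the broker loop, a message arriving on the frontend as [client_id, payload] would be forwarded on the backend as [route(client_id), client_id, payload], and worker replies flow back the same way; that frame layout is an assumption of this sketch, not something ZeroMQ prescribes.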

Publishing snapshot data when subscriber connects to publisher in ZeroMQ PUB/SUB model

I have a simple ZeroMQ PUB/SUB architecture for streaming data from the publisher to subscribers. When the subscribers are connected, the publisher starts streaming the data but I want to modify it, so that publisher publishes the most recent snapshot of the data first and after that starts streaming.
How can I achieve this?
Q : How can I achieve this?
( this being: "... streaming the data but I want to modify it, so that publisher publishes the most recent snapshot of the data first and after that starts streaming." )
Solution :
Instantiate a pair of PUB-s, the first called aSnapshotPUBLISHER, the second aStreamingPUBLISHER. Using the XPUB archetype for the former may help to easily integrate some add-on logic for subscriber-base management ( a nice-to-have feature, though somewhat off-topic at the moment ).
Configure the former with aSnapshotPUBLISHER.setsockopt( ZMQ_CONFLATE, 1 ); other settings may focus on reducing latency and on ensuring that all the needed resources are available, both for smooth streaming via aStreamingPUBLISHER and for having the most recent snapshot readily available in aSnapshotPUBLISHER for any newcomer.
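What the conflated, XPUB-based snapshot side does can be modelled without sockets. A deque(maxlen=1) mimics ZMQ_CONFLATE (a newer message replaces an unsent older one), and the XPUB event handling follows the XPUB convention that a subscription arrives as a 0x01 byte plus topic and an unsubscription as 0x00 plus topic. One caveat worth keeping in mind: a PUB socket does not replay anything to a subscriber that connects later, so the snapshot still has to be (re)sent, e.g. when the XPUB reports a new subscription, as this hypothetical SnapshotPublisher does.

```python
from collections import deque


class SnapshotPublisher:
    """Models aSnapshotPUBLISHER: a conflated outbound queue plus XPUB-triggered re-sends."""

    def __init__(self):
        # deque(maxlen=1) mimics ZMQ_CONFLATE: a newer message replaces an unsent older one
        self.outbound = deque(maxlen=1)
        self.latest = None

    def publish_snapshot(self, snapshot: bytes) -> None:
        self.latest = snapshot
        self.outbound.append(snapshot)

    def on_xpub_event(self, event: bytes) -> None:
        # XPUB reports a subscription as 0x01 + topic and an unsubscription as 0x00 + topic
        if event[:1] == b"\x01" and self.latest is not None:
            self.outbound.append(self.latest)   # a newcomer subscribed: offer the current snapshot

    def drain(self) -> list:
        """What the transport would actually send right now."""
        sent = list(self.outbound)
        self.outbound.clear()
        return sent
```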
SUB-side agents simply follow this approach: having set up a pair of working (.bind()/.connect()) links ( one to each of the PUB-s, or to an XPUB+PUB pair ) and having confirmed that the links are up and running smoothly, they stop sourcing snapshots from aSnapshotPUBLISHER and keep consuming only the streaming data from aStreamingPUBLISHER ( now synced using a BaseID / TimeStamp / FrameMarker or a similar alignment key ).
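The SUB-side synchronisation step can be sketched as below, assuming the alignment key is a monotonically increasing sequence number carried by both the snapshot and every streamed update (one possible choice among BaseID / TimeStamp / FrameMarker). The subscriber buffers stream messages while waiting for the snapshot, drops those already contained in it, and treats a gap as a signal to re-sync, since PUB/SUB may drop messages. The function name and tuple layout are hypothetical.

```python
def merge_snapshot_and_stream(snapshot_seq, snapshot_state, stream_msgs):
    """Combine a snapshot with the stream updates buffered while waiting for it.

    snapshot_seq   -- sequence number of the last update already contained in the snapshot
    snapshot_state -- dict key -> value, as of snapshot_seq
    stream_msgs    -- iterable of (seq, key, value) from aStreamingPUBLISHER, in arrival order
    Returns (state, last_applied_seq). Raises ValueError on a gap, meaning: re-sync.
    """
    state = dict(snapshot_state)
    last = snapshot_seq
    for seq, key, value in stream_msgs:
        if seq <= last:
            continue                      # already reflected in the snapshot, or a duplicate
        if seq != last + 1:               # PUB/SUB may drop messages (e.g. at the HWM)
            raise ValueError(f"gap: expected seq {last + 1}, got {seq}")
        state[key] = value
        last = seq
    return state, last
```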
The known limitation of ZMQ_CONFLATE mode, that it does not support multi-frame ( multipart ) message payloads, need not be considered a problem, since a low-latency rule of thumb is anyway to pack/compress any data into right-sized BLOB-s rather than moving any sort of "decorated" but inefficient data-representation format over the wire.
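Packing metadata and payload into one BLOB keeps every snapshot a single frame, which is what ZMQ_CONFLATE needs. A sketch with struct and zlib; the header layout ( uint64 sequence, float64 timestamp, uint8 flags, uint32 body length, network byte order ) is a hypothetical choice, not a ZeroMQ format.

```python
import struct
import zlib

# Hypothetical single-frame layout: seq (uint64), timestamp (float64), flags (uint8), body length (uint32)
HEADER = struct.Struct("!QdBI")
FLAG_ZLIB = 0x01


def pack_snapshot(seq: int, ts: float, payload: bytes, compress: bool = True) -> bytes:
    """Pack metadata + payload into one BLOB, so it fits a single-frame (conflatable) message."""
    body = zlib.compress(payload) if compress else payload
    flags = FLAG_ZLIB if compress else 0
    return HEADER.pack(seq, ts, flags, len(body)) + body


def unpack_snapshot(blob: bytes) -> tuple:
    """Inverse of pack_snapshot; raises ValueError if the body is shorter than announced."""
    seq, ts, flags, length = HEADER.unpack_from(blob)
    body = blob[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated snapshot blob")
    payload = zlib.decompress(body) if flags & FLAG_ZLIB else body
    return seq, ts, payload
```

The resulting bytes object is then sent as one ordinary single-part message.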

Parallel Req/Rep via Pub/Sub

I have multiple servers; at any point, one and only one of them will be the leader, which can respond to a request, and all the others just drop the request. The issue is that the client does not know which server is the leader.
I have tried using a pub socket on the client to send the request out in parallel; however, I can't work out the right semantics for the response, i.e. how to get the server to respond to that specific client.
A hacky solution I have tried is to have a sub socket on the client connected to pub sockets on all the servers, with the leader responding by publishing a message with a filter such that it only goes to that client.
However, I am unable to receive any responses this way: the server believes that it sent the message, and the client believes it subscribed to "", but then doesn't receive anything...
So I am wondering whether there is a more proper way of doing this. I have thought that a dealer/router setup, sending to a specific client, might work, but I am unsure how to do that.
Essentially, I am trying to do a standard Req/Rep, but sending the req in parallel to all the nodes rather than round-robin.
UPDATE: By sending the routing id of the dealer in the pub request, making the remote call idempotent (just returning pre-computed results on repeated attempts), and then sending the result back via a router, with message filtering on the receiving side, it now works.
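The recipe in the UPDATE can be modelled without sockets: every server receives the broadcast request ( the PUB fan-out ), only the leader answers, addressed to the client's routing id, and a repeated request id is answered from a cache instead of being recomputed. Server, broadcast, accept and the ( client_id, request_id ) key are hypothetical names; in the real setup the returned tuple would go out through the leader's ROUTER socket with client_id as the routing frame.

```python
class Server:
    """One replica; only the current leader answers, and answers are idempotent."""

    def __init__(self, name: str, is_leader: bool, compute):
        self.name = name
        self.is_leader = is_leader
        self.compute = compute
        self.results = {}        # (client_id, request_id) -> cached result (unbounded here)
        self.computations = 0

    def on_request(self, client_id: bytes, request_id: int, payload):
        """Return (client_id, request_id, result) to send back via ROUTER, or None to drop."""
        if not self.is_leader:
            return None
        key = (client_id, request_id)
        if key not in self.results:          # compute once; retries get the stored result
            self.results[key] = self.compute(payload)
            self.computations += 1
        return (client_id, request_id, self.results[key])


def broadcast(servers, client_id: bytes, request_id: int, payload):
    """Stand-in for the PUB fan-out: every server sees the request; replies are collected."""
    replies = (s.on_request(client_id, request_id, payload) for s in servers)
    return [r for r in replies if r is not None]


def accept(replies, client_id: bytes, request_id: int):
    """Client-side filtering: keep only the reply to the request it is waiting for."""
    for cid, rid, result in replies:
        if cid == client_id and rid == request_id:
            return result
    return None
```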
Q : " is (there) a more proper way of doing this? "
Yes.
Start applying Maslow's Hammer rule:
“When the only tool you have is a hammer, every problem begins to resemble a nail.”
In other words, do not try to use (one) hammer to solve every problem. The PUB/SUB archetype was designed to serve those-and-only-those multi-party Formal-Communications-Pattern archetypes where many SUB-s .recv() the messages that some PUB-lisher(s) .send()-broadcast, and nothing else.
Similarly, the REQ/REP archetype was defined and implemented so as to serve one-and-only-one multi-party distributed Formal-Communications-Pattern ( and will obviously not meet any use-case that has even a single other, or slightly different, requirement ).
Users often require some special, non-trivial features, that obviously were not a part of the said trivial Formal-Communications-Pattern archetype primitives ( those ready-made blocks, made available in the ZeroMQ toolbox ).
It is the architects' / designers' role to define and analyse any more complex, user-specific distributed-behaviour definition ( a protocol ) and to implement it, most often using a layered combination of the ready-made ZeroMQ primitives.
If in doubt, take a sheet of paper and a pencil, draw a small crowd of kids on a playground and sketch their "shouts", their "listening", their "silence", "waiting" and "doubts", their many or few "replies", their "voting" and "anger" at not being voted for by friends, their fight for a place in the Sun, and their "persistence" in not letting others take their turn, before finally letting 'em sit on the "swing" after releasing their own, so far pleasurable, swinging.
All this is the part of finding the right mix of ( protocol-orchestrated ) levels of control and levels of freedom to act.
There we get the new, distributed-behaviour, tailor-made for your specific use-case.
The probability of finding a ready-made primitive tool that matches and fulfils any user-specific use-case is arbitrarily close to Zero ( sure, unless one's own user-specific requirements match all those of the primitive archetype, but then that is not a user-specific use-case, but a re-use of an already implemented archetype for the very same situation that was foreseen by the ZeroMQ fathers, wasn't it? )
Again, welcome to the art of Zen-of-Zero.
You may like to read this and this and this.

ZeroMQ to send messages between systems

I am very new to the ZeroMQ library.
Hence I wanted to know which pattern ( REQ-REP, PUSH-PULL, PUB-SUB ) will be the best for our application.
The application we are using has two systems: the one the user interacts with, and the scheduler, which executes a job scheduled by the user in the first system.
Now I want to use ZeroMQ to send messages in the following scenarios:
(1) from userSystem to schedulerSystem, that a job with a particular job id is submitted for execution;
(2) from schedulerSystem to userSystem, that the job sent with a particular job id has been executed successfully or that its execution has failed.
Can somebody please help with this, stating the reason for using a particular pattern?
Thanks in advance.
Which is the best Formal Communication Pattern to use? None...
Dear Ann, with all due respect, nobody would seriously try to answer the question of which of all the possible phone numbers is the best for any kind of use.
Why? There is simply no Swiss-Army-Knife for doing just anything.
That is, surprisingly, the good news.
As a system designer, you may create The Right Solution on a green field, using just-enough design strategies that do no more than necessary ( overhead-wise ), and have all the pluses on your design side ( scalability-wise, low-latency-wise, memory-footprint-wise, etc. ).
If no requirements other than (1) and (2) above appear, a light-weight scheme like this may work fine as an MVP "just-enough" design:
If userSystem does not process anything that depends on a schedulerSystem output value, PUSH-PULL might be an option for sending a job, with possible extensions.
For userSystem to receive independent, asynchronously organised state-reporting messages about the respective jobID return code(s), again a PUSH-PULL, poll-ed on the receiver side, might work well.
Why? Otherwise the behaviour-wise natural, unstructured PAIR-PAIR would prevent your processing from growing in scale once performance, architecture or both demand it. PAIR-PAIR does not allow your communication framework to join more entities together, while the other patterns do, so your processing power may go distributed as far as your IP-visibility and end-to-end latency permit.
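The two-channel MVP can be tried out in-process before any sockets are involved. In this sketch two queue.Queue objects stand in for the two PUSH-PULL links ( jobs one way, status reports the other ), and the userSystem side collects reports with a timed get, playing the role of a poll with a timeout. The JSON message format, the job_id/status field names and the function names are hypothetical; with pyzmq the queues would become zmq.PUSH / zmq.PULL sockets and the timed get a zmq.Poller.

```python
import json
import queue
import threading
import time


def scheduler_system(jobs, statuses, run_job, stop):
    """PULL side for jobs, PUSH side for status reports (queues stand in for sockets)."""
    while not stop.is_set():
        try:
            msg = json.loads(jobs.get(timeout=0.1))
        except queue.Empty:
            continue
        try:
            run_job(msg["job_id"])
            status = "ok"
        except Exception as exc:          # report the failure instead of crashing the scheduler
            status = f"failed: {exc}"
        statuses.put(json.dumps({"job_id": msg["job_id"], "status": status}))


def submit_and_collect(job_ids, run_job, timeout_s=5.0):
    """userSystem: push job ids fire-and-forget, then poll for status reports with a timeout."""
    jobs, statuses = queue.Queue(), queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(target=scheduler_system,
                              args=(jobs, statuses, run_job, stop), daemon=True)
    worker.start()
    for job_id in job_ids:
        jobs.put(json.dumps({"job_id": job_id}))
    results = {}
    deadline = time.monotonic() + timeout_s
    while len(results) < len(job_ids) and time.monotonic() < deadline:
        try:                              # plays the role of a poll() with a timeout
            report = json.loads(statuses.get(timeout=0.1))
        except queue.Empty:
            continue
        results[report["job_id"]] = report["status"]
    stop.set()
    worker.join()
    return results


def run_job(job_id):                      # hypothetical job runner: job 2 always fails
    if job_id == 2:
        raise RuntimeError("boom")


print(submit_and_collect([1, 2, 3], run_job))
```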
The real world is typically much more complex
Just one picture, Fig.60 from the below-mentioned book:
The best next step?
To see a bigger picture on this subject >>> with more arguments, a simple signalling-plane picture and a direct link to a must-read book by Pieter HINTJENS.
