Publishing snapshot data when subscriber connects to publisher in ZeroMQ PUB/SUB model - zeromq

I have a simple ZeroMQ PUB/SUB architecture for streaming data from the publisher to subscribers. When the subscribers are connected, publisher starts streaming the data but I want to modify it, so that publisher publishes the most recent snapshot of the data first and after that starts streaming.
How can I achieve this?

Q : How can I achieve this?
( this being: "... streaming the data but I want to modify it, so that publisher publishes the most recent snapshot of the data first and after that starts streaming."
Solution :
Instantiate a pair of PUB-s, the first called aSnapshotPUBLISHER, the second aStreamingPUBLISHER. Using XPUB-archetype for the former may help to easily integrate some add-on logic for subscriber-base management ( a NTH-feature, yet kinda O/T ATM ).
Get configured the former with aSnapshotPUBLISHER.setsockopt( ZMQ_CONFLATE, 1 ), other settings may focus on reducing latency and ensuring all the needed resources are available for both the smooth streaming via aStreamingPUBLISHER while also having the most recent snapshot readily available in aSnapshotPUBLISHER for any newcomer.
SUB-side agents simply follow this approach, having setup a pair of working (.bind()/.connect()) links ( to either of the PUB-s or a pair of XPUB+PUB ) and having got confirmed the links are up and running smooth, stop sourcing the snapshots from aSnapshotPUBLISHER and remain consuming but the (now synced using BaseID / TimeStamp / FrameMarker or similarly aligned) streaming-data from aStreamingPUBLISHER.
The known ZMQ_CONFLATE-mode as-is limitation of not supporting multi-frame message payloads is needless to consider a trouble, once a low-latency rule-of-thumb is to pack/compress any data into right-sized BLOB-s rather than moving any sort of "decorated" but inefficient data-representation formats over the wire.

Related

Performance of Nats Jetstream

I'm trying to understand how Nats Jetstream scales and have a couple of questions.
How efficient is subscribing by subject to historic messages? For example lets say have a stream foo that consists of 100 million messages with a subject of foo.bar and then a single message with a subject foo.baz. If I then make a subscription to foo.baz from the start of the stream will something on the server have to perform a linear scan of all messages in foo or will it be able to immediately seek to the foo.baz message.
How well does the system horizontally scale? I ask because I'm having issues getting Jetstream to scale much above a few thousand messages per second, regardless of how many machines I throw at it. Test parameters are as follows:
Nats Server 2.6.3 running on 4 core 8GB nodes
Single Stream replicated 3 times (disk or in-memory appears to make no difference)
500 byte message payloads
n publishers each publishing 1k messages per second
The bottleneck appears to be on the publishing side as I can retrieve messages at least as fast as I can publish them.
Publishing in NATS JetStream is slightly different than publishing in Core NATS.
Yes, you can publish a Core NATS message to a subject that is recorded by a stream and that message will indeed be captured in the stream, but in the case of the Core NATS publication, the publishing application does not expect an acknowledgement back from the nats-server, while in the case of the JetStream publish call, there is an acknowledgement sent back to the client from the nats-server that indicates that the message was indeed successfully persisted and replicated (or not).
So when you do js.Publish() you are actually making a synchronous relatively high latency request-reply (especially if your replication is 3 or 5, and more so if your stream is persisted to file, and depending on the network latency between the client application and the nats-server), which means that your throughput is going to be limited if you are just doing those synchronous publish calls back to back.
If you want throughput of publishing messages to a stream, you should use the asynchronous version of the JetStream publish call instead (i.e. you should use js.AsyncPublish() that returns a PubAckFuture).
However in that case you must also remember to introduce some amount of flow control by limiting the number of 'in-flight' asynchronous publish applications you want to have at any given time (this is because you can always publish asynchronously much much faster than the nats-server(s) can replicate and persist messages.
If you were to continuously publish asynchronously as fast as you can (e.g. when publishing the result of some kind of batch process) then you would eventually overwhelm your servers, which is something you really want to avoid.
You have two options to flow-control your JetStream async publications:
specify a max number of in-flight asynchronous publication requests as an option when obtaining your JetStream context: i.e. js = nc.JetStream(nats.PublishAsyncMaxPending(100))
Do a simple batch mechanism to check for the publication's PubAcks every so many asynchronous publications, like nats bench does: https://github.com/nats-io/natscli/blob/e6b2b478dbc432a639fbf92c5c89570438c31ee7/cli/bench_command.go#L476
About the expected performance: using async publications allows you to really get the throughput that NATS and JetStream are capable of. A simple way to validate or measure performance is to use the nats CLI tool (https://github.com/nats-io/natscli) to run benchmarks.
For example you can start with a simple test: nats bench foo --js --pub 4 --msgs 1000000 --replicas 3 (in memory stream with 3 replicas 4 go-routines each with it's own connection publishing 128 byte messages in batches of 100) and you should get a lot more than a few thousands messages per second.
For more information and examples of how to use the nats bench command you can take a look at this video: https://youtu.be/HwwvFeUHAyo
Would be good to get an opinion on this. I have a similar behaviour and the only way to achieve higher throughput for publishers is to lower replication (from 3 to 1) but that won't be an acceptable solution.
I have tried adding more resources (cpu/ram) with no success on increasing the publishing rate.
Also, scaling horizontally did not make any difference.
In my situation , i am using Bench tool to publish to js.
For an R3 filestore you can expect ~250k small msgs per second. If you utilize synchronous publish that will be dominated by RTT from the application to the system, and from the stream leader to the closest follower. You can use windowed intelligent async publish to get better performance.
You can get higher numbers with memory stores, but again will be dominated by RTT throughout the system.
If you give me a sense of how large are your messages we can show you some results from nats bench against the demo servers (R1) and NGS (R1 & R3).
For the original question regarding filtered consumers, >= 2.8.x will not do a linear scan to retrieve foo.baz. We could also show an example of this as well if it would help.
Feel free to join the slack channel (slack.nats.io) which is a pretty active community. Even feel free to DM me directly, happy to help.

What would be the right ZMQ Pattern?

I am trying to build a ZeroMQ pattern where,
There can be many clients connecting to a single server endpoint
Server will distribute incoming client tasks to available workers (will be mapped to the number of cores on the server)
These tasks are long running (in hours) and need to perform a lot of local I/O
During each task execution (iteration) there will be data/messages (potentially in order of [GB]s) sent back and forth between the client and the server worker
Client and server workers need to know if there are failures/errors on the peer side, so that they can recover (retry) or shutdown gracefully and try later
Based on the above, I presume that the ROUTER/DEALER pattern would be useful. PUB/SUB is discarded as I need to know if the peer fails.
I tried using various combinations of the ROUTER/DEALER pattern but I am unable to ensure that multiple messages from a client reach the same worker within an iteration. I understand that I need to implement a broker/forwarder/device that routes the incoming messages to the right recipient/handler/worker. But I am unable to map the frontend and backend sockets in the broker. I am looking at MajorDomo pattern, but I guess there has to be a simpler broker model that could just route the messages to the assigned worker. (not really get into services)
I am looking for some examples, if there are any or any guidance on what I may be missing. I am trying to build this in Golang.
Q : "What would be the right ZMQ Pattern?"
Based on the complex composition of all the requirements posted under items 1 - 5, I dare to say, The Right would be NOT to use a single one of the standard, built-in, ZeroMQ trivial primitive Communication Archetype Patterns, but to rather create a multi-layered application-specific composition of a ( M + N + 1 hot-standby robust-enough?) (self-resilient?) Signalling-Messaging infrastructure, that covers all your current ( and possibly extensible for any future one ) application-level requirements, like depicted here for a way simpler distributed-computing use-case, where but a trivial remote-SigKILL was implemented.
Yes, the best would be to create ( and maintain ) your own formalised signalling, that the application level can handle and interact across -- like the heart-beating for detecting dead-worker(s) + permitting to re-instate such failed jobs right on-detected failures (most probably re-located and/or re-scheduled to take place & respective resources not statically pre-mapped, but where physically most feasible at the re-instating moment of time - so even more telemetry signalling will help you decide about the re-instating of the such failed micro-jobs).
ZeroMQ is a fabulous framework right for such complex signalling and messaging hierarchies, so your System Architect's imagination is the only ceiling in this concept.
ZeroMQ will take the rest and do all the hard work nice and easily.

Fault tolerant redundancy

This might result in biased and opinion based answers, if so I'll close the question but...
I have a rather basic requirement of improving our up-time and speed. As part of this I'm looking at the two main competing approaches, traditional pub/sub and akka.net. We don't have any issues currently or expect to have any need for concurrency control.
What we have is several basic workflows which are data analysis, manipulation and persistence of the result:
Step 1) Capture work to be done (IE what objects need to do some work)
Step 2) Execute that work load and produce a result
Step 3) Save result
Using traditional pub/sub This seems rather easy. Have micro services for each step, push a message at the end of each step with the data required (or more to the point data that might be useful) for the next step. Using any off the self message queue/topic/subscription software this provides a nice ability to:
1) geographically spread the loads around the world to where the source data is located
2) increase the number of "workers" that subscribe to increase through put
3) push to something central that can support the idea of connecting "workers" with a minimal learning curve
4) any component (or set of workers for a component) further down the workflow has/have a queue where the messages queue and wait for said component to come back online (even if the whole component disconnects)
5) adding new components that do something new and different, is as easy as registering a new subscription to a topic.
It's all pretty much out of the box easy joy... assuming sensible aggregate and bounded context patterns are adhered to here. I'm not seeking advise of how to write good distributed code, I'm looking for how deploy it, support it, debug rouge/missing/corrupt messages etc. Which is why I want to know what Akka.net offers.
I've seen there's Akka.net clustering . It may or may not be production ready yet, but best I understand what it can/could do for us.
So the main questions I have are:
1) Where are messages stored prior to arriving? So long as a publisher has access to the messaging bus/software endpoint, any such software will store and hold messages waiting for a subscriber to connect and pick up it's messages (obvious assumptions about the subscription having already been registered so the messages queue for it). How does Akka.net cluster handle all of this?
2) What tooling exists for operational support of these queues and mailboxes in Akka.net cluster? What tools give an operator insight into what is in a mailbox received but waiting to be processed and what tools exist for viewing what has been "published" and not yet "received"? Most competing Pub/Sub software has operational tools so I'm looking for some comparison here.
3) How do you debug rouge, missing or corrupt messages. We all know we should trust our software but a bad message can cause a system to spiral out of control, so how would I eject a bad message from the system? How can I modify a message so it's going to behave differently because the business needs something fixed at 3:30 am? How can I answer "where is my message" with "it IS in the system and it IS waiting to be received" or "it has been received and just in the mailbox"?
4) If a component goes down HARD (recycle, hardware failure what ever) what will restore the mailboxes, queues etc? Any message that's actually being processed has an acceptable lost tolerance, but 1000 messages in a mailbox getting lost isn't so tolerable, what persistence and tolerance is there?
5) The light review I've done appears to advocate for a supervisor pattern to be built into your software to marshal messages around (I'm guessing to manage and release concurrency locks?). Given concurrency isn't an issue here, what out of the box pub/sub mechanism do you support that isn't basic message remoting between two (or x internally defined in code) components? Again with subscriptions and topics in most pub/sub software, your first object pushes a message (it's central so it's a potential single point of failure) but that component (and neither doesn't any other code) have to be aware of what will consume that message. It's expansion nirvana compared the old school way where we manually pushed a message from one object to the next (and to the next), rebuilding or recompiling for each new class that same message had to go to. I'm keen to not have to build our own message router.
6) When all instances of a particular component go offline (say step 3 above) what remembers that there's actually something there that needs to queue and remember those messages (say the ones pushed blindly from step 2 above)? In other software, until you delete the subscription the messages keep queuing up based on what ever rules are defined for TTL etc. What is provided for this?

ZeroMQ: are PUB/SUB topic subscriptions cheap?

Problem: I have a number of file uploads coming via HTTP in parallel ( uploads receiver ). I'm storing them temporarily on a local disk. Another process ( uploads submitter ) gets notified about new uploads and does specific processing ( parsing, extracting metadata, uploading to S3 etc ). Once upload processing done I want uploads receiver to be notified by submitter to reply back with status ( whether submission is ok or error ) to the remote uploader. Using ZeroMQ PUB/SUB pattern, what would be better:
subscribe all upload receiver threads to a single topic. Each
receiver thread would have to filter messages based on upload id or
something to find a notification that belongs to it.
subscribe each receiver thread to a new topic which represents
particular upload. This one seems more reasonable assuming topics are
cheap in ZeroMQ, i.e. not much resources is needed to keep them and they
can be auto-expired. I expect new uploads to come at dozens of
files per second, single upload processing may take up to several
seconds so theoretically I can have up to thousand of topics active
at the same moment of time. Also I may not always be able to
unsubscribe due to various failure modes.
Initial notice: On Using Different ZeroMQ Version Numbers:
While more recent versions may use PUB-side topic filtering, the early ZeroMQ versions did use SUB-side approach, which means that all the ( network ) message-transport traffic goes to all SUB-s as an acceptable penalty for distributing the processing-workload, that would otherwise be needed to get handled at lowest possible latency on the PUB-side.
This is important for cases, where in an open distributed system association the homogenity of versions is not enforceable.
Whereas you design architecture seems to be co-located on the same <localhost> the performance impact remains non-distributed ( concentrated ) and may implicate just some limited latency/priority tweaking, if overall bottleneck appears during this Use-Case up-scaling.
On Scaleability Ranges - Limits are still farther than your Use-Case:
As Martin Sustrik ( ZeroMQ co-father ) presented in details, ZeroMQ was designed with expected scales up to some small tens of thousands:
(cit.:) " Efficient Subscription Matching
In ZeroMQ, simple tries are used to store and match PUB/SUB subscriptions. The subscription mechanism was intended for up to 10,000 subscriptions where simple trie works well. However, there are users who use as much as 150,000,000 subscriptions. In such cases there's a need for a more efficient data structure. "
Further details on design & scaling might be found interesting in this Martin's post.
The Best Next Step?
A fair approach would be to mock-up each of the questioned approaches and benchmark them, scaled to { 1.0x , 1.5x, 2.0x, 5.0x } of the expected static scales in-vitro to have quantitatively supported data about real overheads, performance and latencies relevant to the alternative strategies under review.
Anyway, Vovan, enjoy the worlds of smart signalling/messaging in the distributed processing.

An event store could become a single point of failure?

Since a couple of days I've been trying to figure it out how to inform to the rest of the microservices that a new entity was created in a microservice A that store that entity in a MongoDB.
I want to:
Have low coupling between the microservices
Avoid distributed transactions between microservices like Two Phase Commit (2PC)
At first a message broker like RabbitMQ seems to be a good tool for the job but then I see the problem of commit the new document in MongoDB and publish the message in the broker not being atomic.
Why event sourcing? by eventuate.io:
One way of solving this issue implies make the schema of the documents a bit dirtier by adding a mark that says if the document have been published in the broker and having a scheduled background process that search unpublished documents in MongoDB and publishes those to the broker using confirmations, when the confirmation arrives the document will be marked as published (using at-least-once and idempotency semantics). This solutions is proposed in this and this answers.
Reading an Introduction to Microservices by Chris Richardson I ended up in this great presentation of Developing functional domain models with event sourcing where one of the slides asked:
How to atomically update the database and publish events and publish events without 2PC? (dual write problem).
The answer is simple (on the next slide)
Update the database and publish events
This is a different approach to this one that is based on CQRS a la Greg Young.
The domain repository is responsible for publishing the events, this
would normally be inside a single transaction together with storing
the events in the event store.
I think that delegate the responsabilities of storing and publishing the events to the event store is a good thing because avoids the need of 2PC or a background process.
However, in a certain way it's true that:
If you rely on the event store to publish the events you'd have a
tight coupling to the storage mechanism.
But we could say the same if we adopt a message broker for intecommunicate the microservices.
The thing that worries me more is that the Event Store seems to become a Single Point of Failure.
If we look this example from eventuate.io
we can see that if the event store is down, we can't create accounts or money transfers, losing one of the advantages of microservices. (although the system will continue responding querys).
So, it's correct to affirmate that the Event Store as used in the eventuate example is a Single Point of Failure?
What you are facing is an instance of the Two General's Problem. Basically, you want to have two entities on a network agreeing on something but the network is not fail safe. Leslie Lamport proved that this is impossible.
So no matter how much you add new entities to your network, the message queue being one, you will never have 100% certainty that agreement will be reached. In fact, the opposite takes place: the more entities you add to your distributed system, the less you can be certain that an agreement will eventually be reached.
A practical answer to your case is that 2PC is not that bad if you consider adding even more complexity and single points of failures. If you absolutely do not want a single point of failure and wants to assume that the network is reliable (in other words, that the network itself cannot be a single point of failure), you can try a P2P algorithm such as DHT, but for two peers I bet it reduces to simple 2PC.
We handle this with the Outbox approach in NServiceBus:
http://docs.particular.net/nservicebus/outbox/
This approach requires that the initial trigger for the whole operation came in as a message on the queue but works very well.
You could also create a flag for each entry inside of the event store which tells if this event was already published. Another process could poll the event store for those unpublished events and put them into a message queue or topic. The disadvantage of this approach is that consumers of this queue or topic must be designed to de-duplicate incoming messages because this pattern does only guarantee at-least-once delivery. Another disadvantage could be latency because of the polling frequency. But since we have already entered the eventually consistent area here this might not be such a big concern.
How about if we have two event stores, and whenever a Domain Event is created, it is queued onto both of them. And the event handler on the query side, handles events popped from both the event stores.
Ofcourse every event should be idempotent.
But wouldn’t this solve our problem of the event store being a single point of entry?
Not particularly a mongodb solution but have you considered leveraging the Streams feature introduced in Redis 5 to implement a reliable event store. Take a look this intro here
I find that it has rich set of features like message tailing, message acknowledgement as well as the ability to extract unacknowledged messages easily. This surely helps to implement at least once messaging guarantees. It also support load balancing of messages using "consumer group" concept which can help with scaling the processing part.
Regarding your concern about being the single point of failure, as per the documentation, streams and consumer information can be replicated across nodes and persisted to disk (using regular Redis mechanisms I believe). This helps address the single point of failure issue. I'm currently considering using this for one of my microservices projects.

Resources