What is the ZeroMQ PUB/SUB internal behaviour? - zeromq

I'm trying to get my head around the behaviour of zmq with PUB/SUB.
Q1: I can't find a real reason why the PUSH/PULL socket combination lets me build a queue that holds undelivered messages in memory (while the consumer is not available), whereas PUB/SUB does not.
Q2: Is there any technical whitepaper or document that describes in detail the internals of the sockets?
EDIT:
This example of a PUSH/PULL streamer works as expected (the worker can join late or restart and still gets the messages queued in the feeder). The PUB/SUB forwarder does not behave in the same way.

While Q1 is hard to answer fully without a single SLOC to look at ...
there is still a chance that your code ( as yet unpublished, though StackOverflow strongly encourages users to include it in the form of an MCVE, and you may already have felt, or soon might feel, some flames for not doing so ) has simply forgotten to set a subscription topic-filter:
aSubSOCKET.setsockopt( zmq.SUBSCRIBE, b"" )          # -> recv "EVERYTHING" / NO-TOPIC-FILTER
aSubSOCKET.setsockopt( zmq.SUBSCRIBE, b"GOOD-NEWS" ) # -> recv only messages that start with "GOOD-NEWS"
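To see why a missing subscription looks like "lost" messages: a SUB socket delivers a message only if one of its subscribed prefixes matches the leading bytes of that message, and the empty prefix matches everything. The sketch below only mimics that rule in plain Python, without ZeroMQ; the function name `would_deliver` is hypothetical.

```python
def would_deliver(subscriptions, message):
    """Mimic SUB-side filtering: deliver if any subscribed prefix matches."""
    return any(message.startswith(prefix) for prefix in subscriptions)


# No subscription at all: nothing is ever delivered.
print(would_deliver([], b"GOOD-NEWS: all fine"))
# Empty prefix: every message is delivered.
print(would_deliver([b""], b"BAD-NEWS: disk full"))
# Specific prefix: only messages starting with it are delivered.
print(would_deliver([b"GOOD-NEWS"], b"GOOD-NEWS: all fine"))
print(would_deliver([b"GOOD-NEWS"], b"BAD-NEWS: disk full"))
```

With an empty subscription list the first call reports that nothing gets through, which is what a SUB socket without any SUBSCRIBE option set would show.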
A2: yes, there are exhaustive descriptions of all ZeroMQ API calls. Besides the API manpage collection for ØMQ/2.1.1 and other versions, there is a great PDF book published online, "Code Connected, Vol.1", from Pieter HINTJENS himself.
Worth reading. It offers a lot of insight into the general distributed-processing area and the ZeroMQ way.

Related

ZeroMQ PUSH / PULL - how to know which events are pending in SEND BUFFER queue?

We have a service pair using the PUSH/PULL pattern of message communication. As mentioned in the docs, if the PULL service is down or not running, the sender queues events up to the high-water mark, and by default a .send() after that will block.
Now, while an app is in the blocking state, the app could be killed or something else may happen, leading to the loss of the messages in the queue.
I understand PUSH/PULL is not the best method if we want that kind of reliability, and that we should probably use one of the other methods listed at: https://zguide.zeromq.org/docs/chapter4/ but is there a way, with PUSH/PULL, to get a callback for the events still in the queue, say on app exit, on periodic callbacks or on signals?
I also understand that I could use NOBLOCK, ZMQ_IMMEDIATE or ZMQ_SNDTIMEO in such a situation, catch the error and use application-level recovery (similar to the DLQ pattern), but I was looking for something available from the ZeroMQ library itself.
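The application-level recovery mentioned above can be pictured with a plain-Python analogy. This is not ZeroMQ code: `queue.Queue(maxsize=...)` merely stands in for a send buffer with a high-water mark, the `timeout` plays the role of a send timeout, and `send_or_park`, `HWM` and `dead_letters` are hypothetical names.

```python
import queue

HWM = 3                                  # assumed, tiny high-water mark for the demo
send_buffer = queue.Queue(maxsize=HWM)   # stand-in for the socket's send buffer
dead_letters = []                        # application-level "DLQ"


def send_or_park(message, timeout=0.01):
    """Try to enqueue; if the buffer stays full past the timeout, park the message."""
    try:
        send_buffer.put(message, timeout=timeout)
        return True
    except queue.Full:
        dead_letters.append(message)
        return False


# Nobody drains the buffer (the "PULL side is down"), so only HWM messages fit.
for n in range(5):
    send_or_park(n)

print(send_buffer.qsize(), dead_letters)
```

The point of the analogy is that the application, not the library, keeps the parked messages, so they survive whatever happens to the transport buffer.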
Q : "... how to know which events are pending in SEND BUFFER queue ?"
A : Well, having used ZeroMQ since v2.1, through v3.x, up to v4.x in 2022-Q1, I can say there has never been a way for user-level code to interact with ZeroMQ internal queues and/or state(s), as there was no method in the C API to do so.
Q : "... is there a way in PUSH/PULL method to get event call back on the events still on queue on say app exit/periodic callbacks/signals?"
A : Well, let's solve this by using a concurrently operated signalling socket that receives POSACK messages from "live" clients, i.e. those that can and do receive messages, which makes it possible to back-throttle messages for those that did not respond within a reasonable TAT. Using a mix of several properly selected Scalable Formal Communication Pattern archetypes that work in cooperation helps solve this "soft"-signalling control. Without any ambition to solve all the details: a set of one-PUB.bind() / many-SUB.connect() sockets can carry selectively directed payload transport with subscription-based controls, and one-PULL.bind() / many-PUSH.connect() sockets can carry the "soft"-control signalling of still-alive heartbeats, traffic back-throttling and similar services.
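The bookkeeping behind such a signalling channel can be sketched as follows. This is only an illustration under assumptions: the class name `LivenessTracker`, the `max_silence` threshold and the client ids are hypothetical, and in a real setup `mark_alive()` would be called whenever a POSACK or heartbeat arrives on the signalling socket.

```python
import time


class LivenessTracker:
    """Remember when each client last acknowledged, to decide who is 'live'."""

    def __init__(self, max_silence=2.0, clock=time.monotonic):
        self.max_silence = max_silence   # seconds of silence tolerated (assumed value)
        self.clock = clock               # injectable clock, handy for testing
        self.last_seen = {}

    def mark_alive(self, client_id):
        # Call this for every POSACK / heartbeat received from client_id.
        self.last_seen[client_id] = self.clock()

    def is_live(self, client_id):
        seen = self.last_seen.get(client_id)
        return seen is not None and (self.clock() - seen) <= self.max_silence

    def silent_clients(self):
        # Clients that answered before but have been quiet for too long.
        return sorted(c for c in self.last_seen if not self.is_live(c))
```

A sender could consult `is_live()` before directing payloads at a client and hold back (or re-route) traffic for everything listed by `silent_clients()`.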

ZeroMQ - Can we check subscribers before sending a message?

The classic ZeroMQ PUB pattern is something like :
format your complete message
send your message
( managed by ZMQ ) if there is a subscriber to the topic, then send it, else trash it ?
What I've noticed in one of my applications, is that the formatting of some of the messages is very heavy and takes a lot of time. When I don't have a subscriber for the topic, I do all this work for nothing.
I was wondering if there was a way to check whether a topic was subscribed before formatting the rest of the message.
I understand there'd be a TOCTOU problem :
1. check the topic is subscribed ( it's not )
2. ( ZMQ receives a subscription for the topic )
3. data is not sent...
or
1. check the topic is subscribed ( it is )
2. start formatting message
3. ( ZMQ receives an un-subscription for the topic )
4. send to socket, data is not sent ( wasted time )
... and I'm OK with both.
I've tried with multi-part messages ( sending the "header/topic" first, without formatting the rest of the message ) but :
- it doesn't seem to do what I mean here
- my subscribers then also have to handle multi-part messages ( rather than doing a simple zmq_recv() ), which is a bit annoying
Any idea ? I think I see where to patch in xpub.cpp , adding a method that would copy/paste part of xpub::xsend() ( https://github.com/zeromq/libzmq/blob/656205b5f9159677d325cff5e6e26c97f95d8cd7/src/xpub.cpp#L289 ) but I'm not even sure that's something the ZMQ community would be interested in.
In case one has never worked with ZeroMQ, one may enjoy first having a look at "ZeroMQ Principles in less than Five Seconds" before diving into further details.
Q : "Can we check subscribers before sending a message?"
Yes, we can.
If there is indeed such a need, note that the XPUB Archetype receives the incoming subscription-management messages ( if they arrive ), which can be used for something like this.
That does not mean one can rely on this blindly. Unless you operate in a fully restricted environment, where rigid version-control and enforcement policies are in place, there may always be a client that does not use the more recent versions, in which the topic-filtering is performed on the (X)PUB-side. Given that chance, one ought to verify that every SUB-side really delivers all its subscription-management records to the (X)PUB-side, as the newer versions expect, before starting to blindly "believe" in such a test-before-send policy.
Damned version management :o)
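A minimal sketch of such a test-before-send check, under stated assumptions: an XPUB socket hands the publisher each subscription as a message whose first byte is 1 (subscribe) or 0 (unsubscribe), followed by the topic prefix. The class `SubscriptionTable` and its methods are hypothetical helpers, not part of any ZeroMQ binding; the frames are assumed to come from a receive call on the XPUB socket.

```python
from collections import Counter


class SubscriptionTable:
    """Track topic prefixes announced in XPUB subscription frames."""

    def __init__(self):
        self.prefixes = Counter()

    def handle_frame(self, frame):
        # First byte: 1 = subscribe, 0 = unsubscribe; the rest is the prefix.
        if not frame or frame[0] not in (0, 1):
            return  # not a subscription frame; ignore it
        prefix = bytes(frame[1:])
        if frame[0] == 1:
            self.prefixes[prefix] += 1
        elif self.prefixes[prefix] > 0:
            self.prefixes[prefix] -= 1
            if self.prefixes[prefix] == 0:
                del self.prefixes[prefix]

    def has_subscriber(self, topic):
        # A topic is wanted if any known prefix matches its leading bytes.
        return any(topic.startswith(p) for p in self.prefixes)


table = SubscriptionTable()
table.handle_frame(b"\x01weather.")          # a subscriber asked for "weather."
if table.has_subscriber(b"weather.london"):
    print("worth formatting the expensive message")
```

The check is only as good as the subscription frames that actually reach the publisher, so the version caveat above and the TOCTOU windows described in the question still apply.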
You may also know that topic-filtering ( since ever, and hopefully it will remain so ) requires no message formatting, let alone multi-part messaging overhead. It works as a plain prefix match on the leading bytes of the message, the performance of which has been tuned, so who would ever want to waste a single [ns] on add-on overhead costs in this domain?
Welcome to the Art of Zen-of-Zero

How to get data into a ZMQ_PUB service?

Can a publisher service receive data from an external source and send it to the subscribers?
In the wuserver.cpp example, the data is generated within the same script.
Can I write a ZMQ_PUBLISHER entity which receives data from an external data source / application ... ?
Regarding this statement:
There is one more important thing to know about PUB-SUB sockets: you do not know precisely when a subscriber starts to get messages. Even if you start a subscriber, wait a while, and then start the publisher, the subscriber will always miss the first messages that the publisher sends. This is because as the subscriber connects to the publisher (something that takes a small but non-zero time), the publisher may already be sending messages out.
Does this mean that the PUB-SUB ZeroMQ pattern works on a best-effort basis, UDP style?
Q1: Can I write a ZMQ_PUBLISHER entity, which receives data from external data source/application?
A1: Oh sure, this is exactly where ZeroMQ helps us so much in designing smart distributed-systems. Just imagine the PUB-side process also having other { .bind() | .connect() }-calls that establish links to the data-feeder(s), and you are ready to operate the scheme you wished for. In distributed-systems this gives you a new freedom to integrate heterogeneous systems smartly, so that they talk to each other in a very efficient way.
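The relay idea in A1 can be sketched as a loop that reads from one socket and republishes on another. This is a sketch under assumptions: `relay` and `FakeSocket` are hypothetical names, and the fake sockets exist only so the example runs without ZeroMQ installed. With a real binding, `source` could be, for example, a PULL or SUB socket connected to the data feeder and `publisher` a PUB socket, as long as both offer `recv()` and `send()`.

```python
def relay(source, publisher, topic, max_messages):
    """Forward payloads from a feeder socket to subscribers under one topic."""
    forwarded = 0
    while forwarded < max_messages:
        payload = source.recv()                  # wait for the external feeder
        publisher.send(topic + b" " + payload)   # topic prefix lets SUBs filter
        forwarded += 1
    return forwarded


class FakeSocket:
    """Stand-in for a real socket so the sketch runs without ZeroMQ."""

    def __init__(self, incoming=()):
        self.incoming = list(incoming)
        self.sent = []

    def recv(self):
        return self.incoming.pop(0)

    def send(self, data):
        self.sent.append(data)


feeder = FakeSocket([b"21.5C", b"22.0C"])
pub = FakeSocket()
relay(feeder, pub, b"temperature", max_messages=2)
print(pub.sent)
```

The publisher process does not need to generate the data itself; it only needs one more socket facing the feeder.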
Q2: Does this mean, that a PUB-SUB ZeroMQ pattern is performed to a best effort - UDP style?
A2: No, it has another meaning. Newly declared subscriber entities start, at some uncertain moment, to negotiate their respective subscription-topic filtering, and such a ( distributed ) process takes some a-priori unknown time. Until the new / changed topic-filter policy has been established, nothing reaches the SUB-side egress interface to meet a .recv()-call, so no one can indeed tell when that will happen.
On a higher level, there is another well-known ZeroMQ principle, the Zero-Warranty Principle: expect either to get a complete message delivered or none at all, which spares the framework users from having to handle any kind of damaged / inconsistent message-payloads. Either OK, or None. That's a great warranty, all the more so for distributed-systems.

Fault tolerant redundancy

This might result in biased and opinion based answers, if so I'll close the question but...
I have a rather basic requirement of improving our up-time and speed. As part of this I'm looking at the two main competing approaches, traditional pub/sub and akka.net. We don't have any issues currently or expect to have any need for concurrency control.
What we have is several basic workflows which are data analysis, manipulation and persistence of the result:
Step 1) Capture work to be done (i.e. which objects need to do some work)
Step 2) Execute that work load and produce a result
Step 3) Save result
Using traditional pub/sub, this seems rather easy. Have microservices for each step, and push a message at the end of each step with the data required (or, more to the point, data that might be useful) for the next step. Using any off-the-shelf message queue/topic/subscription software, this provides a nice ability to:
1) geographically spread the loads around the world to where the source data is located
2) increase the number of "workers" that subscribe to increase throughput
3) push to something central that can support the idea of connecting "workers" with a minimal learning curve
4) any component (or set of workers for a component) further down the workflow has/have a queue where the messages queue and wait for said component to come back online (even if the whole component disconnects)
5) adding new components that do something new and different, is as easy as registering a new subscription to a topic.
It's all pretty much out-of-the-box easy joy... assuming sensible aggregate and bounded context patterns are adhered to here. I'm not seeking advice on how to write good distributed code; I'm looking for how to deploy it, support it, debug rogue/missing/corrupt messages etc. Which is why I want to know what Akka.net offers.
I've seen there's Akka.net clustering. It may or may not be production ready yet, but I'd best understand what it can/could do for us.
So the main questions I have are:
1) Where are messages stored prior to arriving? So long as a publisher has access to the messaging bus/software endpoint, any such software will store and hold messages waiting for a subscriber to connect and pick up its messages (with the obvious assumption that the subscription has already been registered, so the messages queue for it). How does Akka.net cluster handle all of this?
2) What tooling exists for operational support of these queues and mailboxes in Akka.net cluster? What tools give an operator insight into what is in a mailbox received but waiting to be processed and what tools exist for viewing what has been "published" and not yet "received"? Most competing Pub/Sub software has operational tools so I'm looking for some comparison here.
3) How do you debug rogue, missing or corrupt messages? We all know we should trust our software, but a bad message can cause a system to spiral out of control, so how would I eject a bad message from the system? How can I modify a message so that it behaves differently because the business needs something fixed at 3:30 am? How can I answer "where is my message" with "it IS in the system and it IS waiting to be received" or "it has been received and is just sitting in the mailbox"?
4) If a component goes down HARD (recycle, hardware failure, whatever), what will restore the mailboxes, queues etc.? Any message that's actually being processed has an acceptable loss tolerance, but 1000 messages in a mailbox getting lost isn't so tolerable. What persistence and tolerance is there?
5) The light review I've done appears to advocate building a supervisor pattern into your software to marshal messages around (I'm guessing to manage and release concurrency locks?). Given concurrency isn't an issue here, what out-of-the-box pub/sub mechanism do you support that isn't basic message remoting between two (or x, internally defined in code) components? Again, with subscriptions and topics in most pub/sub software, your first object pushes a message (it's central, so it's a potential single point of failure), but that component (like any other code) doesn't have to be aware of what will consume that message. It's expansion nirvana compared to the old-school way, where we manually pushed a message from one object to the next (and to the next), rebuilding or recompiling for each new class that same message had to go to. I'm keen not to have to build our own message router.
6) When all instances of a particular component go offline (say step 3 above), what remembers that there's actually something there that needs to queue and remember those messages (say the ones pushed blindly from step 2 above)? In other software, until you delete the subscription, the messages keep queuing up based on whatever rules are defined for TTL etc. What is provided for this?

Detect dropped messages in ZeroMQ Queues

Since it does not seem to be possible to query/inspect the underlying ZeroMQ queues/buffers of the sockets to see how much they are utilized, is there some way to detect when a message is dropped due to full buffers in a Publisher socket when it is sent/queued?
For example, if the publisher queue is full, the zmq_send operation will simply drop the message.
Basically, what I want to achieve is a way to detect situations where the queues are getting stressed and/or full to be able to (later on) tune the solution to work better. One alternative way would be to add a sequence number to each message and do a simple calculation in the subscriber but I can never be sure that a message was lost due to full buffers in the publisher.
There is an example for this in the ZeroMQ Guide (which you should read and digest if you want to use 0MQ happily): http://zguide.zeromq.org/page:all#Slow-Subscriber-Detection-Suicidal-Snail-Pattern
The mechanism is as you answered yourself, to add a sequence number in the message, and allow the subscriber to detect gaps and take appropriate action. For most pubsub scenarios you can raise the default HWM, which is 1,000, to something much higher; it depends on your average message size.
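The sequence-number mechanism can be sketched on the subscriber side as below. This is an illustration only: `GapDetector` is a hypothetical helper, and the publisher is assumed to stamp each message with a counter that increases by one.

```python
class GapDetector:
    """Count messages missed between consecutive sequence numbers."""

    def __init__(self):
        self.expected = None   # next sequence number we expect to see
        self.lost = 0          # running total of missed messages

    def observe(self, seq):
        # Returns how many messages were skipped just before this one.
        missed = 0
        if self.expected is not None and seq > self.expected:
            missed = seq - self.expected
            self.lost += missed
        # A lower number than expected (e.g. publisher restart) just resyncs.
        self.expected = seq + 1
        return missed


detector = GapDetector()
for seq in (1, 2, 5, 6, 10):
    detector.observe(seq)
print(detector.lost)
```

As the question notes, a gap shows that messages went missing but not where; it cannot by itself distinguish a full publisher buffer from any other cause.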
I know this is an old post but here is what I did when recently facing the same issue.
I opted to use a DEALER/ROUTER and set the ZMQ_SNDHWM option to 1. I also provided the timeout parameter on each zmq_send(). The timeout could be anything between 10 ms and 3 seconds, depending on what your scenario is ( a local or remote send ).
If the message is not sent within the timeout or the send-buffer is full the zmq_send() will return false. That enabled me to set up a retry queue in front of zmq. I know it's not a perfect solution but for me it worked just fine. What puzzles me though is the meaning of true/false returned by the DEALER-socket zmq_send(). I have not been able to find the answer to that question. Whether it indicates that the message has been buffered or that the message has been delivered to the ROUTER has eluded me. In my case I got the results needed anyway.
Just for the record this was done using netmq but I guess it applies to ZeroMQ as well.
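A retry queue of the kind described above might look like the following sketch. It is not the poster's code: `RetryingSender` is a hypothetical name, and `try_send` is an assumed callable that returns True when the underlying send succeeded within its timeout and False otherwise.

```python
from collections import deque


class RetryingSender:
    """Keep messages that could not be sent and retry them in order later."""

    def __init__(self, try_send):
        self.try_send = try_send      # assumed: callable(message) -> bool
        self.pending = deque()

    def send(self, message):
        # Queue first, then flush, so older messages always go out first.
        self.pending.append(message)
        return self.flush()

    def flush(self):
        # Send from the front; stop at the first failure to preserve order.
        while self.pending:
            if not self.try_send(self.pending[0]):
                return False
            self.pending.popleft()
        return True
```

Calling `flush()` periodically, or on every new `send()`, drains the backlog once the peer accepts messages again; what a successful send actually guarantees remains the open question raised above.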
I do agree with james though. ZeroMQ ( and netmq ) should at least provide a way to inspect the queue ( and get the messages out ) and also a way to tell the various sockets not to drop messages. The best option would be to send messages that were not delivered in a timely fashion, according to the configured options, to some sort of deadletter queue. The deadletter queue could then be handled separately.
