ZeroMQ PUSH / PULL - how to know which events are pending in SEND BUFFER queue?

We have a service pair doing PUSH/PULL pattern of message communication. As mentioned in the docs, if the PULL service is down or not running, then a sender will queue up to high water mark number of events and by default a .send() after that will block.
Now, while an app is in the blocking state, the app could be killed or something else may happen, leading to losing those messages in the queue.
I understand PUSH/PULL is not the best method if we want that kind of reliability, and that we should probably use one of the other methods listed at https://zguide.zeromq.org/docs/chapter4/, but is there a way with PUSH/PULL to get an event callback for the events still in the queue on, say, app exit, periodic callbacks, or signals?
I also understand, that I could use NOBLOCK or ZMQ_IMMEDIATE or ZMQ_SNDTIMEO in such situation and catch the error and use application level recovery (similar to DLQ pattern) but I was looking into things available from the ZeroMQ library.

Q : "... how to know which events are pending in SEND BUFFER queue ?"
A: Having used ZeroMQ from v2.1 through v3.x and up to v4.x in 2022-Q1, I have never seen a way for user-level code to interact with ZeroMQ's internal queues or state; the C API exposes no such method.
Q : "... is there a way in PUSH/PULL method to get event call back on the events still on queue on say app exit/periodic callbacks/signals?"
A: You can solve this with a concurrently operated signalling socket that receives positive-acknowledgement (POSACK) messages from "live" clients, i.e. those that can and do receive messages. That lets you back-throttle messages for clients that did not respond within a reasonable turnaround time (TAT). Using a mix of several properly selected Scalable Formal Communication Pattern archetypes in cooperation helps solve this "soft"-signalling control. Without trying to settle every detail: one option is a set of one-PUB.bind() / many-SUB.connect() sockets for selectively directed payload transport with subscription-based controls, plus one-PULL.bind() / many-PUSH.connect() sockets for "soft"-control signalling such as still-alive heartbeats, traffic back-throttling, and similar services.

Related

How does a microservice return data to the caller when using a message broker? or a message queue?

I am pretty new to microservices, and I am trying to figure out how to set up a microservice architecture in which my publisher, which emits an event, can receive a response with data from the consumer.
From what I have read about message brokers and message queues, it seems like it's one-way communication. The producer emits an event (or rather, sends a message) which is handled by the message broker, and then the consumer consumes that event and performs some action.
This allows for decoupled code, which is part of what I'm looking for, but I don't understand whether the consumer is able to return any data to the caller.
Say, for example, I have a microservice that communicates with an external API to fetch data. I want to be able to send a message or emit an event from my front-facing server, which then calls the service that fetches data, parses the data, and then returns that data back to my server1 (front-facing server).
Is there a way to make message brokers or queues bidirectional? Or are they only usable in one direction? I keep reading that message brokers allow services to communicate with each other, but I only find examples in which data flows one way.
Even reading the RabbitMQ documentation hasn't really made it clear to me how I could do this.
In general, when talking about messaging, it's one-way.
When you send a letter to someone you're not opening up a mind-meld so that they telepathically communicate their response to you.
Instead, you include a return address (or some other means of contacting you).
So to map a request-response interaction when communicating with explicit messaging (e.g. via a message queue), the solution is the same: you include some directions which the recipient can/will interpret as "send a response here". That could, for instance be, "publish a message on this queue with this correlation ID".
Your publisher then, after sending this message, subscribes to the queue it's designated and waits for a message with the expected correlation ID.
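The "return address plus correlation ID" idea can be sketched without any real broker. The following is only an illustration under stated assumptions: two in-process queue.Queue objects stand in for broker queues, and the names QUEUES, consumer and call, as well as the message field names, are hypothetical rather than any library's API.

```python
import queue
import threading
import uuid

# Assumption: in-process queues stand in for queues hosted by a real broker.
requests = queue.Queue()
replies = queue.Queue()
QUEUES = {"requests": requests, "replies": replies}

def consumer():
    """Handle one request and publish the result to the queue named in reply_to."""
    msg = requests.get()
    result = msg["body"].upper()  # placeholder for the real work
    QUEUES[msg["reply_to"]].put(
        {"correlation_id": msg["correlation_id"], "body": result}
    )

def call(body, timeout=2.0):
    """Send a request, then wait for the reply carrying our correlation ID."""
    correlation_id = str(uuid.uuid4())
    requests.put(
        {"correlation_id": correlation_id, "reply_to": "replies", "body": body}
    )
    while True:
        reply = replies.get(timeout=timeout)  # raises queue.Empty on timeout
        if reply["correlation_id"] == correlation_id:
            return reply["body"]
        # A reply to some other request: this sketch simply drops it.

worker = threading.Thread(target=consumer)
worker.start()
answer = call("fetch data")
worker.join()
print(answer)
```

Note that the caller blocks until the matching reply arrives, which is exactly the recoupling of the two parties that the answer goes on to discuss.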
Needless to say, this is fairly elaborate: you are, in some sense, reimplementing a decent portion of a session protocol like TCP on top of a datagram protocol like IP (albeit in this case, we may have some stronger reliability guarantees than we'd get from IP). It's worth noting that this sort of request-response interaction intrinsically couples the two parties (we can't really say "sender and receiver": each is the other's audience), so we're basically putting in some effort to decouple the two sides and then some more effort to recouple them.
With that in mind, if the actual business use case calls for a request-response interaction like this, consider implementing it with an actual request-response protocol (e.g. REST over HTTP or gRPC...) and accept that you have this coupling.
Alternatively, if you really want to pursue loose coupling, go for broke and embrace the asynchronicity at the heart of the universe (maybe that way lies true enlightenment?). Have your publisher return success with that correlation ID as soon as it has sent its message. Meanwhile, have a different service track the state of those correlation IDs and expose a query interface (CQRS, hooray!). Your client can then check at any time whether the thing it wanted succeeded, even if its connection to your publisher gets interrupted.
Queues are the wrong level of abstraction for request-reply. You can build an application out of them, but it would be nontrivial to support and operate.
The solution is to use an orchestration system like temporal.io or AWS Step Functions. These services out of the box provide state management, asynchronous communication, and automatic recovery in case of various types of failures.

In event-driven architecture, is it ok to have all services send their event to a component that forwards it to the proper service?

Let's say I want to set up an event-driven architecture with services A-D where the events propagate as follows
  A
 / \
B   C
   /
  D
In other words,
(1) A publishes an event
(2) Subscribers B and C receive A's event
(3) C publishes an event
(4) Subscriber D receives C's event
One way is to have services B and C directly listen to a queue into which A posts messages. But the issue I see with this is maintenance. Once the system becomes complicated with 1000s of subscriptions, it becomes difficult to have any visibility into how the updates are propagating.
A solution I propose to this problem is to have another service X that knows the tree in the first diagram and is responsible for directing the propagation of events according to the tree. Every service publishes its event to X, and X publishes the event to the listening services. So it's kind of a middleman, like
  A
  |
  X
 / \
B   C
    |
    X
    |
    D
This also makes it easier to track the event propagation.
Are there any downsides to this (other than the extra cost associated with twice as much message transfer)?
You’re thinking of events like they are implemented in a Winforms UI where the publisher sends the event directly to the subscriber. That’s not how events work in an EDA architecture. The word “event” has taken on a whole new meaning.
Before we start, you’re jumbling together the ideas of a message and an event when they really need to be kept separate. A message is a request for some action to happen, while an event is notification that something has already happened. The important distinction for this discussion is that a message publisher assumes 1 or more other processes will receive and process the message. If the message is not processed by something, downstream errors will occur. An event has no such assumption and can go unread without adversely affecting anything. Another difference is that once messages are processed they are typically thrown away, whereas events are kept for an extended period (days, or weeks).
With that in mind, the ‘X’ service you talk about already exists (please don’t build one) and is integral to the process – it’s called the bus. There are 2 types of bus; a message bus (think RabbitMQ, MSMQ, ZeroMQ, etc) or event bus (Kafka, Kinesis, or Azure Event Hub). In either case, a publisher puts a message on to the bus and subscribers get it from the bus. You may implement the bus servers as multiple physical buses, but when imagining it think of them all being the same logical bus.
The key point that’s tripping you up, and it’s a subtle difference, is thinking that the message bus has business logic indicating where messages go. The business logic of who gets what message is determined by the subscribers – the message bus is just a holding place for the messages to wait for pickup.
In your example, A publishes an event to the bus with a message type of “MT1”. B and C both tell the bus that they are interested in events of type “MT1”. When the bus receives the request from B and C to be notified of “MT1” messages, the bus creates a queue for B and a queue for C. When A publishes the message, the bus puts a copy in the “B-MT1” queue and a copy in the “C-MT1” queue. Note that the bus doesn’t know why B and C want to receive those messages, only that they’ve subscribed.
These messages sit there until processed by their respective subscribers (the processes can poll or the bus can push the messages, but the key idea is that the messages are held until processed). Once processed, the messages are thrown away.
For C to communicate with D, D will subscribe to messages of type “MT2” and C will publish them to the bus.
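The subscribe-then-copy behaviour described above can be sketched as a toy in-memory bus. This is a minimal illustration, not how RabbitMQ or Kafka are implemented: the Bus class and its subscribe, publish and poll methods are hypothetical names invented for this example.

```python
from collections import defaultdict, deque

class Bus:
    """Toy in-memory bus: one queue per (message type, subscriber)."""

    def __init__(self):
        self.queues = defaultdict(dict)  # message type -> {subscriber: deque}

    def subscribe(self, subscriber, message_type):
        # The bus only records that someone is interested, not why.
        self.queues[message_type].setdefault(subscriber, deque())

    def publish(self, message_type, payload):
        # Put a copy of the message into every queue subscribed to this type.
        for q in self.queues[message_type].values():
            q.append(payload)

    def poll(self, subscriber, message_type):
        # Hand over the oldest held message (or None); it is gone once taken.
        q = self.queues[message_type].get(subscriber)
        return q.popleft() if q else None

bus = Bus()
bus.subscribe("B", "MT1")
bus.subscribe("C", "MT1")
bus.subscribe("D", "MT2")

bus.publish("MT1", "event from A")       # A publishes
b_got = bus.poll("B", "MT1")             # B and C each get their own copy
c_got = bus.poll("C", "MT1")
bus.publish("MT2", f"C reacted to: {c_got}")  # C publishes in turn
d_got = bus.poll("D", "MT2")
```

The routing "tree" never appears in the bus itself: it emerges from who subscribed to which message type.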
Constantin’s answer above has a point that this is a single point of failure, but it can be managed with standard network architecture like failover servers, local message persistence, message acknowledgements, etc.
One of your concerns is that with 1000’s of subscriptions it becomes difficult to follow the path, and you’re right. This is an inherent downside of EDA and there’s nothing you can do about it. Eventual consistency is also something the business is going to complain about, but it’s part of the beast and is actually a good thing from a technical perspective because it enables more scalability. The biggest problem I’ve found using the term Eventual Consistency is that the business thinks it means hours or days, not seconds.
BTW, this whole discussion assumes the message publishers and subscribers are different apps. All the same ideas can be applied within the same address space, just with a different bus. If you're a .NET shop, look at MediatR. For other tech stacks, there are similar solutions that I'm sure Google knows about.
If your main concern is visibility into the propagation of events (which is a very valid concern for debugging and long-term application maintenance of a distributed system), you can use a correlation identifier to trace the generation of messages from the initial event through the entire chain. You don't need to build another layer of orchestration -- let your messaging platform handle that for you.
Most messaging platforms/libraries have the concept built in: e.g., NServiceBus defines a ConversationId field in the message headers, and AMQP defines a correlation-id field in the basic messaging model.
Your system should have some kind of logging that allows you to audit messages -- the correlation ID will allow you to group all messages that result from a single command/request to make debugging distributed logic much simpler.
If you set a GUID in the client requests, you can even correlate actions in the UI to the backend API, right through all the events recursively generated.
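As a rough sketch of correlation-based tracing, the snippet below stamps every event derived from one client request with the same ID and then filters an audit log by it. The emit helper, the audit_log list and the field names (including caused_by) are assumptions made for illustration, not the header layout of NServiceBus or AMQP.

```python
import uuid

audit_log = []  # assumption: stands in for whatever message audit store you have

def emit(name, correlation_id, caused_by=None):
    """Record an event, carrying the correlation ID of the originating request."""
    event = {
        "id": str(uuid.uuid4()),
        "name": name,
        "correlation_id": correlation_id,
        "caused_by": caused_by,  # ID of the event that triggered this one
    }
    audit_log.append(event)
    return event

# One client request fans out through A -> C -> D.
request_id = str(uuid.uuid4())
a = emit("A.published", request_id)
c = emit("C.published", request_id, caused_by=a["id"])
d = emit("D.handled", request_id, caused_by=c["id"])

# Some unrelated traffic from another request.
emit("unrelated", str(uuid.uuid4()))

# Group everything that resulted from the original request.
trace = [e["name"] for e in audit_log if e["correlation_id"] == request_id]
print(trace)
```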
It is OK, but the microservices shouldn't care how they get the messages in the first place. From their point of view the input messages just arrive. You will then be tempted to design your system to depend on some global order of events, which is hard in a distributed, scalable system. Resist that temptation and design your system to rely only on local ordering of events (i.e. the ordering in an event stream emitted by an Aggregate in Event Sourcing + DDD).
One downside I see is that availability and scalability may be hurt. You will have a single point of failure for the entire system: if it fails, everything fails. And when it needs to be scaled up, you will have problems again, because you will then have a distributed messaging system.

How to get data into a ZMQ_PUB service?

Can a publisher service receive data from an external source and send it to the subscribers?
In the wuserver.cpp example, the data are generated from the same script.
Can I write a ZMQ_PUBLISHER entity, which receives data from external data source / application ... ?
Regarding this statement:
There is one more important thing to know about PUB-SUB sockets: you do not know precisely when a subscriber starts to get messages. Even if you start a subscriber, wait a while, and then start the publisher, the subscriber will always miss the first messages that the publisher sends. This is because as the subscriber connects to the publisher (something that takes a small but non-zero time), the publisher may already be sending messages out.
Does this mean, that a PUB-SUB ZeroMQ pattern is performed to a best effort - UDP style?
Q1: Can I write a ZMQ_PUBLISHER entity, which receives data from external data source/application?
A1: Sure, this is one reason ZeroMQ is so helpful in designing smart distributed systems. Just imagine the PUB-side process also having other { .bind() | .connect() } calls that establish links to the data feeder(s), and you have the scheme you want. In distributed systems this gives you new freedom to integrate heterogeneous systems so they talk to each other very efficiently.
Q2: Does this mean, that a PUB-SUB ZeroMQ pattern is performed to a best effort - UDP style?
A2: No, it has another meaning. Newly declared subscriber entities start, at some uncertain moment, to negotiate their respective subscription-topic filtering, and such a (distributed) process takes some a-priori unknown time. Until the new or changed topic-filter policy has been established, nothing goes into the SUB-side egress interface to meet a .recv() call, so no one can tell when that will happen.
On a higher level, there is another well-known ZeroMQ principle, the Zero-Warranty Principle: expect either a complete message to be delivered or none at all, which saves framework users from having to handle damaged or inconsistent message payloads. Either OK, or None. That is a great warranty, all the more so for distributed systems.

If nobody needs reliable messaging on transport level, how to implement reliable PubSub on business level?

This question is mostly out of curiosity. I read this article about WS-ReliableMessaging by Marc de Graauw some time ago and agreed that reliable messaging should be applied on the business level whenever possible.
Now, the question is, he explains clearly what his approach is in a point-to-point fashion. However, I fail to see how you could implement reliable messaging on the business level in a Publish/Subscribe situation.
I will try to demonstrate the difference by showing commands (point-to-point) vs. events (publish/subscribe). Note that these examples are highly simplified.
Command: Transfer(uniqueId, amount, sourceAccount, recipientAccount)
If the account holder sends this transfer, he could wait for the confirmation MoneyTransferred (assuming this event will contain a reference to the uniqueId in the Transfer command).
If the account holder doesn't receive the MoneyTransferred event within a given timeout period, he could send the same command again (assuming, of course, that the command processor is idempotent).
So I see how reliable messaging could work on business level in a point-to-point fashion.
Now, say the previous command succeeded and produced a MoneyTransferred event. Somewhere in the system we have an event processor (MoneyTransferEmailNotifier) that handles MoneyTransferred events and will send an email notification to the recipient of the transfer.
This MoneyTransferEmailNotifier is subscribed to MoneyTransferred events. But note that the system sending the MoneyTransferred event does not really care who or how many listeners there are to this event. The whole point is the decoupling here. I raise an event and don't care if there are zero or 20 listeners that subscribe to this event.
At this point, if there is no reliable messaging (minimally at-least-once-delivery) provided by the infrastructure, how can we prevent the loss of the MoneyTransferred event? I do want the recipient to get his e-mail notification.
I fail to see how any real 'business-level' solution will resolve this.
(1) One of the solutions I can think of is by explicitly subscribing to events on 'business level' and thereby bypassing any infrastructure component. But aren't we at that moment introducing infrastructure in our business?
(2) The other 'solution' would be by introducing a process manager that does something like this:
PM receives Transfer command
PM forwards Transfer command to the accounts subsystem
If successful, sends command SendEmailNotification(recipient) to the notification subsystem
This does seem to be the solution that DDD prescribes, correct? But doesn't this introduce more coupling?
What do you think?
Edit 2016-04-16
Maybe the root question is a little bit more simplistic: If you do not have an infrastructural component that ensures at-least or exactly-once delivery, how can you ensure (when you're in an at-most-once infrastructure) that your events emitted will be received?
Not all events need to be delivered but there are many that are key (like the example of sending the confirmation email)
This MoneyTransferEmailNotifier is subscribed to MoneyTransferred events. But note that the system sending the MoneyTransferred event does not really care who or how many listeners there are to this event. The whole point is the decoupling here. I raise an event and don't care if there are zero or 20 listeners that subscribe to this event.
Your tangle, I believe, is here - that only the publish subscribe middleware can deliver events to where they need to go.
Greg Young covers this in his talk on polyglot data (slides).
Summarizing: the pub/sub middleware is in the way. A pull based model, where consumers retrieve data from the durable event store gives you a reliable way to retrieve the messages from the store. So you pull the data from the store, and then use the business level data to recognize previous work as before.
For instance, upon retrieving the MoneyTransferred event with its business data, the process manager looks around for an EmailSent event with matching business data. If the second event is found, the process manager knows that at least one copy of the email was successfully delivered, and no more work need be done.
The push based models (pub/sub, UDP multicast) become latency optimizations -- the arrival of the push message tells the subscriber to pull earlier than it normally would.
In the extreme push case, you pack into the pushed message enough information that the subscriber(s) can act upon it immediately, and trust that the idempotent handling of the message will prevent problems when the redundant copy of the message arrives on the slower channel.
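The pull-based approach described above can be sketched as a process manager that scans a durable store and uses business data to recognize work already done. This is a minimal sketch under assumptions: a plain list stands in for the durable event store, and the names event_store, outbox, send_email and run_once are hypothetical.

```python
# Assumption: a list stands in for a durable, append-only event store.
event_store = [
    {"type": "MoneyTransferred", "transfer_id": "t-1", "recipient": "alice@example.com"},
    {"type": "MoneyTransferred", "transfer_id": "t-2", "recipient": "bob@example.com"},
    {"type": "EmailSent", "transfer_id": "t-1"},  # t-1 was already handled
]
outbox = []  # records the emails this sketch "sends"

def send_email(recipient, transfer_id):
    outbox.append((recipient, transfer_id))

def run_once(store):
    """Pull all events and send an email for every transfer that lacks one."""
    already_sent = {e["transfer_id"] for e in store if e["type"] == "EmailSent"}
    for event in list(store):  # iterate over a snapshot; we append below
        if event["type"] != "MoneyTransferred":
            continue
        if event["transfer_id"] in already_sent:
            continue  # matching EmailSent found: no more work needed
        send_email(event["recipient"], event["transfer_id"])
        store.append({"type": "EmailSent", "transfer_id": event["transfer_id"]})
        already_sent.add(event["transfer_id"])

run_once(event_store)
run_once(event_store)  # a repeated poll does no duplicate work
print(outbox)
```

Because every poll re-derives the outstanding work from the store, a lost push notification only delays the email until the next poll instead of losing it.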
If nobody needs reliable messaging on transport level, how to implement reliable PubSub on business level?
The original article does not state that "nobody needs reliable messaging on transport level"; it states that the ordering of messages should be enforced at the business level because, in some cases, this ordering is an important characteristic of the business.
In any case, PubSub is at the infrastructure level, you can't say that you implement PubSub at the business level. It doesn't make sense.
But then how could you ensure only-once delivery at the business level? By using a Saga/Process manager. One of their important responsibilities is exactly that. You can combine that with idempotent Aggregates. Also, you could identify terms from the Ubiquitous Language that emphasize ordering, like transaction phase, and include them in your domain models (for example as properties of the events).
If you do not have an infrastructural component that ensures at-least or exactly-once delivery, how can you ensure (when you're in an at-most-once infrastructure) that your events emitted will be received?
If you do not have at-least-once delivery, then you could use the first event, the one that initiates the whole process. I would use event polling and a Saga that ensures that every important step in the process is reached at the right moment.
In your case, as the sending of the email is an important business aspect, I would include it as a step in the process.

Detect dropped messages in ZeroMQ Queues

Since it does not seem to be possible to query/inspect the underlying ZeroMQ queues/buffers sockets to see how much they are utilized, is there some way to detect when a message is dropped due to full buffers in a Publisher socket when sent/queued?
For example, if the publisher queue is full, the zmq_send operation will simply drop the message.
Basically, what I want to achieve is a way to detect situations where the queues are getting stressed and/or full to be able to (later on) tune the solution to work better. One alternative way would be to add a sequence number to each message and do a simple calculation in the subscriber but I can never be sure that a message was lost due to full buffers in the publisher.
There is an example for this in the ZeroMQ Guide (which you should read and digest if you want to use 0MQ happily): http://zguide.zeromq.org/page:all#Slow-Subscriber-Detection-Suicidal-Snail-Pattern
The mechanism is as you answered yourself, to add a sequence number in the message, and allow the subscriber to detect gaps and take appropriate action. For most pubsub scenarios you can raise the default HWM, which is 1,000, to something much higher; it depends on your average message size.
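The sequence-number check on the subscriber side can be sketched in a few lines. This is an illustration only: find_gaps is a hypothetical helper, and it assumes the publisher stamps each message with a counter that increases by one and that messages from one publisher arrive in order.

```python
def find_gaps(sequence_numbers):
    """Return (first_missing, last_missing) ranges seen by a subscriber."""
    gaps = []
    expected = None  # unknown until the first message arrives
    for seq in sequence_numbers:
        if expected is not None and seq > expected:
            gaps.append((expected, seq - 1))
        expected = seq + 1
    return gaps

# Sequence numbers as they arrived at the subscriber.
received = [1, 2, 3, 7, 8, 10]
gaps = find_gaps(received)
print(gaps)
```

A gap tells the subscriber that messages were lost somewhere, but, as the question notes, not whether they were dropped at the publisher's full buffer or elsewhere along the path.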
I know this is an old post but here is what I did when recently facing the same issue.
I opted to use a DEALER/ROUTER and set the ZMQ_SNDHWM option to 1. I also provided the timeout parameter on each zmq_send(). The timeout could be anything from 10 ms to 3 seconds, depending on your scenario (a local or remote send).
If the message is not sent within the timeout or the send-buffer is full the zmq_send() will return false. That enabled me to set up a retry queue in front of zmq. I know it's not a perfect solution but for me it worked just fine. What puzzles me though is the meaning of true/false returned by the DEALER-socket zmq_send(). I have not been able to find the answer to that question. Whether it indicates that the message has been buffered or that the message has been delivered to the ROUTER has eluded me. In my case I got the results needed anyway.
Just for the record this was done using netmq but I guess it applies to ZeroMQ as well.
I do agree with james though. ZeroMQ (and netmq) should at least provide a way to inspect the queue (and get the messages out) and also a way to tell the various sockets not to drop messages. The best option would be to send messages not delivered in a timely fashion, according to the configured options, to some sort of dead-letter queue. The dead-letter queue could then be handled separately.
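The retry queue in front of the socket can be sketched at application level. To keep the example self-contained, a bounded queue.Queue stands in for a socket with a small send high-water mark, and queue.Full plays the role of a failed non-blocking send (with pyzmq the corresponding signal would be zmq.Again raised by a send with the zmq.NOBLOCK flag). The names send_buffer, dead_letters, try_send and retry_dead_letters are hypothetical.

```python
import queue

# Assumption: a bounded queue stands in for a socket with a small send HWM.
send_buffer = queue.Queue(maxsize=2)
dead_letters = []  # application-level "dead letter queue"

def try_send(message):
    """Non-blocking send; park the message in dead_letters if the buffer is full."""
    try:
        send_buffer.put_nowait(message)
        return True
    except queue.Full:
        dead_letters.append(message)
        return False

def retry_dead_letters():
    """Re-offer parked messages in order; those that still do not fit are parked again."""
    pending = dead_letters[:]
    dead_letters.clear()
    for message in pending:
        try_send(message)

# Four sends against a buffer of two: the last two are parked.
results = [try_send(f"event-{i}") for i in range(4)]

send_buffer.get_nowait()  # the peer drains one message
retry_dead_letters()      # one parked message now fits, one stays parked
print(results, dead_letters)
```

Unlike messages sitting inside the library's internal buffers, everything in dead_letters is visible to the application, so it can be persisted or reported on shutdown.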

Resources