How does a microservice return data to the caller when using a message broker or a message queue?

I am pretty new to microservices, and I am trying to figure out how to set up a microservice architecture in which a publisher that emits an event can receive a response, with data, back from the consumer.
From what I have read about message brokers and message queues, it seems like it's one-way communication. The producer emits an event (or rather, sends a message) which is handled by the message broker, and then the consumer consumes that event and performs some action.
This allows for decoupled code, which is part of what I'm looking for, but I don't understand whether the consumer is able to return any data to the caller.
Say for example I have a microservice that communicates with an external API to fetch data. I want to be able to send a message or emit an event from my front-facing server, which then calls the service that fetches data, parses the data, and then returns that data back to my front-facing server.
Is there a way to make message brokers or queues bidirectional, or are they only usable in one direction? I keep reading that message brokers allow services to communicate with each other, but I only find examples in which the data flows one way.
Even reading the RabbitMQ documentation hasn't made it clear to me how I could do this.

In general, when talking about messaging, it's one-way.
When you send a letter to someone you're not opening up a mind-meld so that they telepathically communicate their response to you.
Instead, you include a return address (or some other means of contacting you).
So to map a request-response interaction onto explicit messaging (e.g. via a message queue), the solution is the same: you include some directions which the recipient can/will interpret as "send a response here". That could be, for instance, "publish a message on this queue with this correlation ID".
Your publisher then, after sending this message, subscribes to the queue it designated and waits for a message with the expected correlation ID.
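For concreteness, here is a minimal sketch of that requesting side using RabbitMQ's Python client (pika); the fetch_data queue name and the request payload are illustrative assumptions, not anything the question specifies.

```python
import uuid
import pika

# Caller side: send a request and block until the matching reply arrives.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Exclusive, auto-named queue that serves as the "return address".
reply_queue = channel.queue_declare(queue="", exclusive=True).method.queue
correlation_id = str(uuid.uuid4())
response = None

def on_reply(ch, method, props, body):
    global response
    if props.correlation_id == correlation_id:  # ignore unrelated replies
        response = body

channel.basic_consume(queue=reply_queue, on_message_callback=on_reply, auto_ack=True)

channel.basic_publish(
    exchange="",
    routing_key="fetch_data",                   # hypothetical worker queue
    properties=pika.BasicProperties(
        reply_to=reply_queue,                   # "send a response here"
        correlation_id=correlation_id,
    ),
    body=b'{"resource": "external-api"}',
)

while response is None:                         # wait for the correlated reply
    connection.process_data_events(time_limit=1)
print(response)
```

The consuming service, for its part, reads reply_to and correlation_id from the incoming message's properties and publishes its result accordingly.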
Needless to say, this is fairly elaborate: you are, in some sense, reimplementing a decent portion of a session protocol like TCP on top of a datagram protocol like IP (albeit in this case, we may have some stronger reliability guarantees than we'd get from IP). It's worth noting that this sort of request-response interaction intrinsically couples the two parties (we can't really say "sender and receiver": each is the other's audience), so we're basically putting in some effort to decouple the two sides and then some more effort to recouple them.
With that in mind, if the actual business use case calls for a request-response interaction like this, consider implementing it with an actual request-response protocol (e.g. REST over HTTP or gRPC...) and accept that you have this coupling.
Alternatively, if you really want to pursue loose coupling, go for broke and embrace the asynchronicity at the heart of the universe (maybe that way lies true enlightenment?). Have your publisher return success with that correlation ID as soon as it has sent its message. Meanwhile, have a different service track the state of those correlation IDs and expose a query interface (CQRS, hooray!). Your client can then check at any time whether the thing it wanted succeeded, even if its connection to your publisher gets interrupted.
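A rough sketch of that shape, again with pika; the queue names are made up, and the in-memory status store stands in for what would be a durable one in a real system.

```python
import json
import uuid
import pika

# Publisher side: fire-and-forget, hand the caller a correlation ID immediately.
def submit_fetch_request(channel, payload):
    correlation_id = str(uuid.uuid4())
    channel.basic_publish(
        exchange="",
        routing_key="fetch_data",                      # hypothetical work queue
        properties=pika.BasicProperties(correlation_id=correlation_id),
        body=json.dumps(payload).encode(),
    )
    return correlation_id                              # the client polls with this later

# Status tracker (a separate service): record completion events, answer queries.
status_by_id = {}

def on_completed(ch, method, props, body):
    status_by_id[props.correlation_id] = {"state": "done", "result": json.loads(body)}

def get_status(correlation_id):
    return status_by_id.get(correlation_id, {"state": "pending"})
```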

Queues are the wrong level of abstraction for request-reply. You can build an application out of them, but it would be nontrivial to support and operate.
The solution is to use an orchestration system like temporal.io or AWS Step Functions. These services provide state management, asynchronous communication, and automatic recovery from various types of failure out of the box.

Related

Message validation for async messaging systems

I'm looking for the best approach to validating a message as it's enqueued in async-messaging-based systems.
Scenario:
Let's say we have two services, A and B, that need to interact with each other asynchronously, and we have a queue between them, let's say SQS, which receives messages from A and is then polled by service B.
Ask:
How can I validate the message (e.g. schema validation) as it's enqueued to SQS, since SQS currently doesn't have any built-in schema validation functionality like we have for JMS?
A couple of options I can think of:
Have a validation layer, maybe a small service, sitting between A and the SQS queue, but I'm not sure how feasible this would be.
Use some sort of MOM like AWS EventBridge between A and the SQS queue, since it can validate schemas and could also act as a central location to store all the schemas.
Have a REST endpoint in B that does the validation, with SQS sitting behind B, but this removes the async communication between A and B.
I would appreciate any input on the above and on how it could be resolved following best practices.
I'd recommend reading about the mediator topology of the event-driven architecture style. From the details that you shared, it sounds to me like putting a "mediator service" (call it M) in between, which gets messages from A, performs the required validation, and then sends the message on to SQS on its way to B, will achieve what you want.
Validation of the message payloads can occur on the "way in" or the "way out" depending on your use case and scaling needs. Most scenarios will aim to prevent invalid data getting too far downstream i.e. you will validate before putting data into SQS.
However, there are reasons you may choose to validate the message payload while reading from the queue. For example, you may have many services adding messages, those messages may have multiple "payload versions" over time, different teams could be building services (frontend and backend) etc. Don't assume everything and everyone is consistent.
Assuming that the payload data in SQS is valid and can be processed by a downstream consumer without checking can cause lots of problems and/or breaking scenarios. Always check your data in these scenarios; in my experience it's either the number one reason, or close to it, why breaking changes occur.
Final point: with event-driven architectures, the design decisions are not just about the processing/compute services but also about the event data payloads themselves, which also have to be designed properly.
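To make the "validate on the way in and again on the way out" idea concrete, here is a minimal sketch using boto3 and the jsonschema package; the queue URL and the schema are placeholders, not anything prescribed above.

```python
import json
import boto3
from jsonschema import validate, ValidationError

# Hypothetical schema shared between service A (producer) and service B (consumer).
ORDER_SCHEMA = {
    "type": "object",
    "properties": {"orderId": {"type": "string"}, "amount": {"type": "number"}},
    "required": ["orderId", "amount"],
}

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/orders"  # placeholder

def send_validated(payload):
    validate(instance=payload, schema=ORDER_SCHEMA)       # validate on the "way in"
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(payload))

def receive_validated():
    response = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
    valid = []
    for message in response.get("Messages", []):
        try:
            body = json.loads(message["Body"])
            validate(instance=body, schema=ORDER_SCHEMA)  # validate on the "way out" too
            valid.append(body)
        except (ValueError, ValidationError):
            pass  # in a real system: log and/or route to a dead-letter queue
    return valid
```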

How to get data into a ZMQ_PUB service?

Can a publisher service receive data from an external source and send it to the subscribers?
In the wuserver.cpp example, the data is generated within the same program.
Can I write a ZMQ_PUBLISHER entity which receives data from an external data source/application...?
Regarding this statement:
There is one more important thing to know about PUB-SUB sockets: you do not know precisely when a subscriber starts to get messages. Even if you start a subscriber, wait a while, and then start the publisher, the subscriber will always miss the first messages that the publisher sends. This is because as the subscriber connects to the publisher (something that takes a small but non-zero time), the publisher may already be sending messages out.
Does this mean that the ZeroMQ PUB-SUB pattern operates on a best-effort basis, UDP style?
Q1: Can I write a ZMQ_PUBLISHER entity which receives data from an external data source/application?
A1: Oh sure, this is why ZeroMQ is so helpful for designing smart distributed systems. Just let the PUB-side process also have other { .bind() | .connect() } calls, so as to establish links to the data feeder(s), and you have the scheme you wished for. In distributed systems this gives you a new freedom to integrate heterogeneous systems so that they talk to each other in a very efficient way.
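A minimal pyzmq sketch of that idea: the same process binds a PULL socket for an external feeder and a PUB socket for its subscribers. The endpoints and the topic prefix are illustrative choices, not anything the wuserver example mandates.

```python
import zmq

# One process acting as both a data sink (fed externally) and a publisher.
context = zmq.Context()

feed = context.socket(zmq.PULL)      # external producers push data here
feed.bind("tcp://*:5557")            # illustrative endpoint

pub = context.socket(zmq.PUB)        # subscribers connect here
pub.bind("tcp://*:5556")             # illustrative endpoint

while True:
    update = feed.recv()             # receive from the external source...
    pub.send(b"weather " + update)   # ...and fan it out, prefixed with a topic
```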
Q2: Does this mean that the ZeroMQ PUB-SUB pattern operates on a best-effort basis, UDP style?
A2: No, it means something else. Newly declared subscriber entities start, at some uncertain moment, to negotiate their respective subscription-topic filtering, and such a (distributed) process takes an a-priori unknown amount of time. Until the new or changed topic-filter policy has been established, nothing arrives at the SUB-side interface to meet a .recv() call, so nobody can tell exactly when that will happen.
On a higher level, there is another well-known ZeroMQ principle -- the zero-warranty principle -- expect to be delivered either a complete message or none at all, which spares framework users from having to handle damaged or inconsistent message payloads. Either OK, or none. That's a great guarantee, all the more so for distributed systems.
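For completeness, the matching SUB side in pyzmq. The slow-joiner effect described in A2 just means that anything published before this subscription has fully propagated is never seen here; each message that does arrive, arrives whole.

```python
import zmq

context = zmq.Context()
sub = context.socket(zmq.SUB)
sub.connect("tcp://localhost:5556")          # matches the publisher endpoint above
sub.setsockopt(zmq.SUBSCRIBE, b"weather")    # topic filter, negotiated asynchronously

# Messages published before the subscription is established are simply missed
# (the "slow joiner" effect); those that arrive are always complete messages.
while True:
    print(sub.recv())
```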

If nobody needs reliable messaging on transport level, how to implement reliable PubSub on business level?

This question is mostly out of curiosity. I read this article about WS-ReliableMessaging by Marc de Graauw some time ago and agreed that reliable messaging should be applied on the business level whenever possible.
Now, the question is, he explains clearly what his approach is in a point-to-point fashion. However, I fail to see how you could implement reliable messaging on the business level in a Publish/Subscribe situation.
I will try to demonstrate the difference by showing commands (point-to-point) vs. events (publish/subscribe). Note that these examples are highly simplified.
Command: Transfer(uniqueId, amount, sourceAccount, recipientAccount)
If the account holder sends this transfer, he could wait for the confirmation MoneyTransferred (assuming this event contains a reference to the uniqueId of the Transfer command).
If the account holder doesn't receive the MoneyTransferred event within a given timeout period, he could send the same command again (assuming, of course, that the command processor is idempotent).
So I see how reliable messaging could work on business level in a point-to-point fashion.
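As an illustration of that point-to-point retry, here is a rough sketch; send_command and wait_for_event are hypothetical stand-ins for whatever messaging client is actually in use.

```python
import uuid

# Re-send the *same* command (same uniqueId) until the matching confirmation
# arrives, relying on the command processor being idempotent.
def transfer_with_retry(send_command, wait_for_event, amount, source, recipient,
                        timeout=5.0, attempts=3):
    unique_id = str(uuid.uuid4())
    for _ in range(attempts):
        send_command({"type": "Transfer", "uniqueId": unique_id,
                      "amount": amount, "source": source, "recipient": recipient})
        confirmation = wait_for_event("MoneyTransferred", unique_id, timeout)
        if confirmation is not None:   # business-level acknowledgement received
            return confirmation
    raise TimeoutError("no MoneyTransferred confirmation received")
```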
Now, say the previous command succeeded and produced a MoneyTransferred event. Somewhere in the system we have an event processor (MoneyTransferEmailNotifier) that handles MoneyTransferred events and will send an email notification to the recipient of the transfer.
This MoneyTransferEmailNotifier is subscribed to MoneyTransferred events. But note that the system sending the MoneyTransferred event does not really care who or how many listeners there are for this event. The whole point is the decoupling here: I raise an event and don't care whether there are zero or 20 listeners subscribed to it.
At this point, if there is no reliable messaging (at minimum, at-least-once delivery) provided by the infrastructure, how can we prevent the loss of the MoneyTransferred event? I do want the recipient to get his e-mail notification.
I fail to see how any real 'business-level' solution will resolve this.
(1) One of the solutions I can think of is by explicitly subscribing to events on 'business level' and thereby bypassing any infrastructure component. But aren't we at that moment introducing infrastructure in our business?
(2) The other 'solution' would be by introducing a process manager that does something like this:
PM receives Transfer command
PM forwards Transfer command to the accounts subsystem
If successful, sends command SendEmailNotification(recipient) to the notification subsystem
This does seem to be the solution that DDD prescribes, correct? But doesn't this introduce more coupling?
What do you think?
Edit 2016-04-16
Maybe the root question is a little simpler: if you do not have an infrastructural component that ensures at-least-once or exactly-once delivery, how can you ensure (when you're on an at-most-once infrastructure) that the events you emit will be received?
Not all events need to be delivered but there are many that are key (like the example of sending the confirmation email)
This MoneyTransferEmailNotifier is subscribed to MoneyTransferred events. But note that the system sending the MoneyTransferred event does not really care who or how many listeners there are for this event. The whole point is the decoupling here: I raise an event and don't care whether there are zero or 20 listeners subscribed to it.
Your tangle, I believe, is here: the assumption that only the publish/subscribe middleware can deliver events to where they need to go.
Greg Young covers this in his talk on polyglot data (slides).
Summarizing: the pub/sub middleware is in the way. A pull-based model, where consumers retrieve data from a durable event store, gives you a reliable way to retrieve the messages. You pull the data from the store and then use the business-level data to recognize work that has already been done.
For instance, upon retrieving the MoneyTransferred event with its business data, the process manager looks around for an EmailSent event with matching business data. If the second event is found, the process manager knows that at least one copy of the email was successfully delivered, and no more work need be done.
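A rough sketch of that check; the event store interface (read_stream/append), the stream naming, and the event shapes are assumptions for illustration, not any particular product's API.

```python
# The process manager's pull-and-check step described above.
def process_money_transferred(event_store, transfer):
    stream = f"transfer-{transfer['transferId']}"

    # Has an email already been recorded for this transfer?
    already_sent = any(
        event["type"] == "EmailSent" for event in event_store.read_stream(stream)
    )
    if already_sent:
        return  # earlier work recognised via business data; nothing left to do

    send_email(transfer["recipient"])  # hypothetical notification call
    event_store.append(stream, {"type": "EmailSent",
                                "transferId": transfer["transferId"]})
```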
The push based models (pub/sub, UDP multicast) become latency optimizations -- the arrival of the push message tells the subscriber to pull earlier than it normally would.
In the extreme push case, you pack into the pushed message enough information that the subscriber(s) can act upon it immediately, and trust that the idempotent handling of the message will prevent problems when the redundant copy of the message arrives on the slower channel.
If nobody needs reliable messaging on transport level, how to implement reliable PubSub on business level?
The original article does not state that "nobody needs reliable messaging on transport level"; it states that the ordering of messages should be enforced at the business level because, in some cases, this ordering is an important characteristic of the business.
In any case, PubSub is at the infrastructure level; you can't say that you implement PubSub at the business level. It doesn't make sense.
But then how could you ensure only-once delivery at the business level? By using a Saga/Process Manager; one of their important responsibilities is exactly that. You can combine that with idempotent Aggregates. Also, you could identify terms from the Ubiquitous Language that emphasize ordering, like transaction phase, and include them in your domain models (for example, as properties of the events).
If you do not have an infrastructural component that ensures at-least-once or exactly-once delivery, how can you ensure (when you're on an at-most-once infrastructure) that the events you emit will be received?
If you do not have at-least-once delivery, then you could rely on the first event, the one that initiates the whole process. I would use event polling and a Saga that ensures that every important step in the process is reached at the right moment.
In your case, as the sending of the email is an important business aspect, I would include it as a step in the process.

Is there an enterprise message queue which can drop duplicate messages (first value stays)?

I am looking for a message queue with these requirements. I couldn't find one; maybe the closest was the rabbitmq-lvc plugin (but I need the first value in the line to stick and stay in front).
Would anyone know of a technology that supports these?
The message queue is FIFO.
If a duplicate message is being enqueued, the message queue itself either rejects or drops it.
For example, producers put these three messages (each with a discriminator value) into the queue in this sequence: M1(discriminator=7654), M2(discriminator=2435), M3(discriminator=7654).
Now I want the message queue to see that M3 has the same discriminator value as M1 and thus drop/reject M3. Consumers receive only: M1, M2.
Thanks
Tom
I don't know the other transports but I know that WebSphere MQ doesn't do this and I believe that the explanation why would apply broadly across the category. I'd be very surprised to find that any messaging transport actually provides this. Here are a few reasons why:
Async messages are supposed to be atomic. Different vendors make their own accommodations for message affinity (a relationship between two or more messages) but as a rule, message affinity is to be avoided. Your use case not only requires the transport to deal with message affinity, but to do so over an indeterminate interval between related messages.
Message payload is a blob. For performance reasons, WMQ doesn't touch message payloads except for things like compression or code page conversion. Anything that requires parsing the message payload is a job for WebSphere Message Broker, DataPower or WebSphere ESB. I would expect any messaging transport which claims to be performant would face similar issues because parsing payloads results in longer code paths and non-linear performance degradation. The exception is message properties but WMQ uses these for selection only and I expect that is generally the case.
Stateless operation. The state of the application may be stored in a persistent message, but the state of the transport layer should not depend on the state of the application across different units of work. Again, an ESB type of product is best suited when you want to delegate management of some of the application state to the messaging layer, especially when such management spans many units of work.
Assured delivery. WMQ was designed to never lose your persistent message. If the app explicitly sets expiry the message might go away because the sender said it was OK to do so. If the message is non-persistent it might go away, but only in an exceptional condition and, again, because the sender said it was OK to do so. The use case you describe might result in a message going away not because the sender said it was OK, or even because the recipient said it was OK but because of an interaction with some unrelated 3rd party who happened to beat you to the queue with a duplicate value. What if that first message has an invalid header or code page problem and gets rolled back? What if I as an attacker spew out garbage messages with all possible 4-digit values for discriminator?
As I said, I don't know the other messaging products, so there may be something out there which meets your requirement; if so, I'll be interested to read about it. However, in the event that nobody replies, this post may shed some light on the reasons why.
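If the transport won't do it, the discriminator check usually ends up in the consuming application instead. A bare-bones sketch of that idea follows; the in-memory set stands in for what would have to be a durable, shared store, and process() is a hypothetical business handler.

```python
# Application-level "first value stays" filtering, since the queue itself won't do it.
seen_discriminators = set()   # in production this must be durable and shared

def handle(message):
    discriminator = message["discriminator"]
    if discriminator in seen_discriminators:
        return                # duplicate (e.g. M3 after M1): drop it
    seen_discriminators.add(discriminator)
    process(message)          # hypothetical business handling
```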

Can an AMQP client be both a publisher and a subscriber?

I'm just starting to research AMQP and I'm wondering if I'd be using it for something it's not designed for. Here's something like what I want to do:
ClientA goes about its business and publishes its state to some exchange (correct me if I use the wrong terms anywhere).
ClientB connects to the same broker and says "What publishers are publishing here? I choose you, ClientA. What is going on?"
ClientA says "My foo is bar and my baz is true."
ClientB says "OK. Set your baz to false."
Edit, for a less abstract example:
ClientA talks/listens to a hardware device, say a video projector. When ClientB comes online, it wants to find any projector clients (like ClientA) that are connected, then to know the status of the projectors (is the lamp on?), and also, if it needs to, to change that status (turn the lamp off). So ClientA keeps some state (lamp is off) and can send it out when requested, and can also respond to commands from the exchange and convert and pass them to the projector (turn lamp on).
I'm finding it hard to follow your example, but it sounds like you want these A and B types to have back-and-forth conversations with each other. Is that correct?
AMQP is better suited to asynchronous message passing, and adding the kind of point-to-point style you're describing requires setting up request and reply queues so that clients can both send and receive messages. It's certainly possible to have clients both publish and consume messages, as the sketch below shows.
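For instance, here is a minimal pika sketch of a single AMQP client (ClientA, say) that consumes commands from one queue and publishes status updates to an exchange. The exchange and queue names, and the apply_to_projector call, are made up for illustration.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Status fans out to whoever cares; commands arrive on a dedicated queue.
channel.exchange_declare(exchange="projector.status", exchange_type="fanout")
channel.queue_declare(queue="projector.commands")

def on_command(ch, method, props, body):
    # e.g. body == b"lamp off": apply it to the device, then publish the new state.
    new_state = apply_to_projector(body)          # hypothetical hardware call
    ch.basic_publish(exchange="projector.status", routing_key="", body=new_state)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="projector.commands", on_message_callback=on_command)
channel.start_consuming()                         # this client now does both roles
```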
This is possible, and it would make sense if the different actors in your example are networked devices, because AMQP would provide a loosely coupled way of messaging.
One thing to watch out for is the last abstract line, where ClientB says "OK, set some attribute". That sounds suspiciously like a scenario where a subroutine call returns some value and then the next step takes place. AMQP can certainly simulate that kind of RPC, but it works better when processes can send a message and don't have to wait for completion.
If most of your messaging doesn't involve waiting for turnaround replies, then AMQP sounds like a fit for what you are doing. But if most of your needs are RPC, then it may not be the best choice.
AMQP really shines when there are future possibilities, for instance in your scenario, if you needed to add a couple thousand projectors, 10,000 client Bs, and several other device types that also need to exchange status. The loose coupling of AMQP makes it easy to add other applications to the broker, just by declaring new exchanges.
