Messaging: what do your messages look like? (microservices)

This question is about message queueing between services in a microservice architecture; there is hardly anything to be found on this topic.
The situation:
Microservice A and microservice B. Microservice A owns the entity "something" and B needs to know about it. I keep it general to avoid discussions about boundaries.
In our case A sends a message which contains event and related entity id like
Event: somethingCreated
SomethingID: 1234
B consumes this message, and if it needs further information, it fetches it from A using the SomethingID.
The second approach would be for the message to contain not only the information above but also metadata, like
Event: somethingCreated
SomethingID: 1234
SomeFieldKey: someFieldValue
...
Lean message:
Pros:
* Less network usage
* Always the same message structure
Cons:
* If information from A is needed on demand, there must be some mechanism to handle failures, e.g. network errors (see the sketch after these lists)
Fat message:
Pros:
* The information is already there
Cons:
* What if the attached information is not enough?
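To illustrate the failure-handling con of the lean message, here is a minimal sketch of the consumer side in B, assuming a hypothetical REST endpoint on A (the URL, retry policy and message shape are illustrative, not part of the question):

```python
import time

import requests

SERVICE_A_URL = "http://service-a/somethings"  # hypothetical endpoint on A


def handle_something_created(message: dict) -> None:
    """Consume a lean message; fetch the full entity from A on demand."""
    something_id = message["SomethingID"]
    for attempt in range(3):  # bounded retry to catch transient network failures
        try:
            response = requests.get(f"{SERVICE_A_URL}/{something_id}", timeout=2)
            response.raise_for_status()
            something = response.json()
            break
        except requests.RequestException:
            if attempt == 2:
                raise  # give up; e.g. dead-letter the message instead of losing it
            time.sleep(2 ** attempt)  # back off before retrying
    # ... process `something` ...
```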
So both approaches have pros and cons, and my intention here is to get an overview of which approach you are using.
Thanks in advance for your answers.

The simple answer is: it depends.
We have services that expose all their data in their events, services that share only the reference ID, and services that sit between those two extremes when it comes to the event payload.
Our view is that the service producing the events is mostly in control of what the content of the payload will be. We review the use case, monitor the usage of the events and the resulting service calls, and make the payload fatter or leaner accordingly.
We do have an upper limit on our message size but other than that no restriction.
Network latency has not been an issue for us when data flows between services (I mean, not because of an increase in payload size).
So you need to allow your individual services to make that call. Each service has an SLA to meet in terms of response time, and when it gets breached, you review, find the bottlenecks and resolve them.

Related

Message validation for async messaging systems

I'm looking for the best approach to validating a message as it's enqueued in async-messaging-based systems.
Scenario:
Let's say we have two services A and B that need to interact with each other asynchronously, with a queue between them, say SQS, which receives messages from A and is then polled by service B.
Ask:
How can I validate the message (e.g. schema validation) as it's enqueued to SQS, given that SQS currently doesn't have any built-in schema validation functionality like we have for JMS?
A couple of options I can think of:
* Have a validation layer, maybe a small service, sitting between A and the SQS queue, but I'm not sure how feasible that would be.
* Use some sort of MOM like AWS EventBridge between A and the SQS queue, since it can validate schemas and could also act as a central location to store all the schemas.
* Have a REST endpoint in B that does the validation, with SQS sitting behind B, but this removes the async communication between A and B.
Would appreciate any input on the above and how it could be resolved via best practices.
I'd recommend reading about the Mediator Topology of the event-driven architecture style. From the details you shared, it sounds like putting a "mediator service", call it M, between A and SQS, which receives messages from A, performs the required validations, and then sends the message on to SQS on its way to B, will achieve what you want.
Validation of the message payloads can occur on the "way in" or the "way out" depending on your use case and scaling needs. Most scenarios will aim to prevent invalid data getting too far downstream i.e. you will validate before putting data into SQS.
However, there are reasons you may choose to validate the message payload while reading from the queue. For example, you may have many services adding messages, those messages may have multiple "payload versions" over time, different teams could be building services (frontend and backend) etc. Don't assume everything and everyone is consistent.
Assuming that the payload data in SQS is validated and can be processed by a downstream consumer without checking can cause lots of problems and/or breaking scenarios. Always check your data in these scenarios. In my experience it's the number one reason, or close to it, why breaking changes occur.
Final point: with event-driven architectures the design decision points are not just about the processing/compute software services but also about the event data payloads themselves which also have to be designed properly.
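As a concrete illustration of validating on the "way in" (and again on the "way out"), here is a minimal sketch using the jsonschema library with boto3; the schema, queue URL and error handling are illustrative assumptions:

```python
import json

import boto3
from jsonschema import ValidationError, validate

# Hypothetical schema; in practice it would live in a central schema registry.
EVENT_SCHEMA = {
    "type": "object",
    "properties": {
        "event": {"type": "string"},
        "entityId": {"type": "string"},
    },
    "required": ["event", "entityId"],
}

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder
sqs = boto3.client("sqs")


def enqueue(message: dict) -> None:
    """Way in: reject invalid payloads before they ever reach SQS."""
    validate(instance=message, schema=EVENT_SCHEMA)  # raises ValidationError
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(message))


def consume(raw_body: str) -> dict:
    """Way out: never assume the queue contents are consistent."""
    message = json.loads(raw_body)
    try:
        validate(instance=message, schema=EVENT_SCHEMA)
    except ValidationError:
        raise  # e.g. route to a dead-letter queue instead of crashing
    return message
```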

How to handle event processing time between services

Let's say we have two services A and B. B has a relation to A so it needs to know about the existing entities of A.
Service A publishes events every time an entity is created or updated. Service B subscribes to the events published by A and therefore knows about the entities existing in service A.
Problem: the client (a UI or another microservice) creates a new entity 'a' and right away creates a new entity 'b' with a reference to 'a'. This happens with very little delay, so what happens if service B has not yet received/handled the event from A before getting the create request for 'b' with its reference to 'a'?
How should this be handled?
Service B must fail, and the client should handle this and possibly retry.
Service B accepts the entity and expects the relation to be fulfilled over time, when the expected event is received (see the sketch below). Service B gives the entity a state that makes clear it cannot be trusted before the relation has been verified.
It is poor design that the client can/has to make these two calls in the same transaction; the design should be different. How?
Other ways?
I know that event platforms like Kafka ensure very fast event transmission, but there will always be some delay, and since this is an asynchronous process there will be a kind of race condition.
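For what it's worth, the second option can be sketched roughly like this (the states, in-memory stores and event shape are assumptions for illustration only):

```python
from dataclasses import dataclass

PENDING = "PENDING_VERIFICATION"  # reference to 'a' not yet confirmed
VERIFIED = "VERIFIED"             # the event for 'a' has arrived


@dataclass
class EntityB:
    id: str
    a_ref: str
    state: str = PENDING


known_a_ids: set[str] = set()      # filled by events from service A
entities: dict[str, EntityB] = {}  # service B's own store


def create_b(b_id: str, a_ref: str) -> EntityB:
    """Accept 'b' immediately; trust in the reference is deferred."""
    state = VERIFIED if a_ref in known_a_ids else PENDING
    entity = EntityB(b_id, a_ref, state)
    entities[b_id] = entity
    return entity


def on_a_created(event: dict) -> None:
    """When A's event finally arrives, verify any entities waiting on it."""
    known_a_ids.add(event["aId"])
    for entity in entities.values():
        if entity.a_ref == event["aId"] and entity.state == PENDING:
            entity.state = VERIFIED
```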
What you're asking about falls under the general category of bridging the gap between Eventual Consistency and good User Experience which is a well-documented challenge with a distributed architecture. You have to choose between availability and consistency; typically you cannot have both.
Your example raises the question as to whether the service boundaries are appropriate. It's a common mistake to define microservice boundaries around entities, but that's an anti-pattern. Microservice boundaries should be consistent with domain boundaries related to the business use case, not with how entities are modeled within those boundaries. Here's a good article that discusses decomposition, but the TL;DR is:
Microservices should be verbs, not nouns.
So, for example, you could have a CreateNewBusinessThing microservice that handles this specific case. But, for now, we'll assume you have good and valid reasons to have the services divided as they are.
The "right" solution in your case depends on the needs of the consuming service/application. If the consumer is an application or User Interface of some sort, responsiveness is required and that becomes your overriding need. If the consumer is another microservice, it may well be that it cares more about getting good "finalized" data rather than being responsive.
In either of those cases, one good option is a facade (aka gateway) service that lives between your client and the highly dependent services. This service can receive and persist the request, then respond however you'd like. It can give the consumer a 200 - OK response with an endpoint to call back to check the status of the request: very responsive. Or it could receive a URL to use as a webhook when the response is completed by both back-end services, so it could notify the client directly. Or it could publish events of its own (it likely should). Essentially, you can tailor the facade service to serve as many consumers as needed, each in the way it wants to talk.
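A minimal sketch of such a facade, assuming FastAPI and an in-memory status store (both are illustrative choices, not part of the answer; a 202 Accepted is used here as the conventional status for "received, still processing"):

```python
import uuid

from fastapi import FastAPI

app = FastAPI()
status_store: dict[str, str] = {}  # in-memory for illustration; use a real store


@app.post("/business-things", status_code=202)
async def create_business_thing(payload: dict) -> dict:
    """Persist the request, kick off the back-end services, respond at once."""
    request_id = str(uuid.uuid4())
    status_store[request_id] = "PENDING"
    # ... publish commands/events to the dependent services here ...
    return {
        "requestId": request_id,
        "statusUrl": f"/business-things/{request_id}/status",
    }


@app.get("/business-things/{request_id}/status")
async def get_status(request_id: str) -> dict:
    """Clients poll here until both back-end services have completed."""
    return {"requestId": request_id, "status": status_store.get(request_id, "UNKNOWN")}
```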
There are other options too. You can look into Task-Based UI, the Saga pattern, or even just Faking It.
I think you would like to leverage the flexibility of a broker and the confirmation of a synchronous call. Both can be achieved with RPC over a broker, as in this tutorial:
https://www.rabbitmq.com/tutorials/tutorial-six-dotnet.html

What are the strategies for payloads in an event-driven architecture?

I'd like to know more about payloads in an event-driven architecture. I've gone through several online resources and didn't find many details. Please help me understand:
Use of the full payload.
Providing metadata and an API link with a token to access the actual payload, rather than sending the full data.
To answer your question (API link rather than full data), let's take a sample:
At Amazon, the Order microservice sends an OrderCancelled event and the Customer service listens to that event.
Now there could be two ways of sending the event data:
Send the complete order data in the event
Pros: Listener services do not need to query the Order service for their functioning.
Cons: Lots of data is passed in the event even though only 10% of it is used. Lots of I/O.
Send only the order ID, cancel reason, customer ID and date in the event
Pros: If the data is chosen carefully, much less data is sent in the event.
Cons: If the data is chosen incorrectly, that means lots of API requests.
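A side-by-side sketch of the two payload shapes; the field names and the API link format are illustrative assumptions:

```python
# Option 1: fat event -- the full order travels with the event.
fat_event = {
    "event": "OrderCancelled",
    "order": {
        "orderId": "ord-1234",
        "customerId": "cust-42",
        "cancelReason": "requested_by_customer",
        "date": "2024-01-15T10:30:00Z",
        "lines": [{"sku": "ABC", "qty": 2, "price": 19.99}],
        # ... potentially dozens of fields most listeners never read ...
    },
}

# Option 2: lean event with carefully chosen fields plus an API link
# (a "claim check") the listener can follow if it needs the rest.
lean_event = {
    "event": "OrderCancelled",
    "orderId": "ord-1234",
    "customerId": "cust-42",
    "cancelReason": "requested_by_customer",
    "date": "2024-01-15T10:30:00Z",
    "dataUrl": "https://orders.internal/api/orders/ord-1234",  # hypothetical
}
```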

In event-driven architecture, is it OK to have all services send their events to a component that forwards them to the proper services?

Let's say I want to set up an event-driven architecture with services A-D, where the events propagate as follows:
A
/ \
B C
/
D
In other words,
(1) A publishes an event
(2) Subscribers B and C receive A's event
(3) C publishes an event
(4) Subscriber D receives C's event
One way is to have services B and C directly listen to a queue into which A posts messages. But the issue I see with this is maintenance: once the system grows to thousands of subscriptions, it becomes difficult to have any visibility into how the updates are propagating.
A solution I propose is to have another service X that knows the tree above and is responsible for directing the propagation of events according to it. Every service publishes its events to X, and X publishes them to the listening services. So it's kind of a middleman, like
A
|
X
/ \
B C
|
X
|
D
This also makes it easier to track the event propagation.
Are there any downsides to this (other than the extra cost associated with twice as much message transferring)?
You’re thinking of events the way they are implemented in a WinForms UI, where the publisher sends the event directly to the subscriber. That’s not how events work in an EDA. The word “event” has taken on a whole new meaning.
Before we start, you’re jumbling together the ideas of a message and an event when they really need to be kept separate. A message is a request for some action to happen, while an event is a notification that something has already happened. The important distinction for this discussion is that a message publisher assumes one or more other processes will receive and process the message; if the message is not processed by something, downstream errors will occur. An event has no such assumption and can go unread without adversely affecting anything. Another difference is that once messages are processed they are typically thrown away, whereas events are kept for an extended period (days or weeks).
With that in mind, the ‘X’ service you talk about already exists (please don’t build one) and is integral to the process: it’s called the bus. There are two types of bus: a message bus (think RabbitMQ, MSMQ, ZeroMQ, etc.) and an event bus (Kafka, Kinesis, or Azure Event Hub). In either case, a publisher puts a message onto the bus and subscribers get it from the bus. You may implement the bus as multiple physical buses, but when imagining it, think of them all as the same logical bus.
The key point that’s tripping you up, and it’s a subtle difference, is thinking that the message bus has business logic indicating where messages go. The business logic of who gets what message is determined by the subscribers – the message bus is just a holding place for the messages to wait for pickup.
In your example, A publishes an event to the bus with a message type of “MT1”. B and C both tell the bus that they are interested in events of type “MT1”. When the bus receives the request from B and C to be notified of “MT1” messages, the bus creates a queue for B and a queue for C. When A publishes the message, the bus puts a copy in the “B-MT1” queue and a copy in the “C-MT1” queue. Note that the bus doesn’t know why B and C want to receive those messages, only that they’ve subscribed.
These messages sit there until processed by their respective subscribers (the processes can poll or the bus can push the messages, but the key idea is that the messages are held until processed). Once processed, the messages are thrown away.
For C to communicate with D, D will subscribe to messages of type “MT2” and C will publish them to the bus.
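The MT1/MT2 flow above maps almost directly onto a broker such as RabbitMQ; here is a minimal sketch with the pika client, where the exchange plays the role of the bus (all names are illustrative):

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# The "bus": a direct exchange that routes on message type.
channel.exchange_declare(exchange="bus", exchange_type="direct")

# B and C each get their own queue bound to type "MT1"; the bus does not
# know (or care) why they want those messages, only that they subscribed.
for subscriber in ("B", "C"):
    queue = f"{subscriber}-MT1"
    channel.queue_declare(queue=queue)
    channel.queue_bind(exchange="bus", queue=queue, routing_key="MT1")

# A publishes once; the exchange puts a copy into each bound queue.
channel.basic_publish(exchange="bus", routing_key="MT1", body=b'{"event": "..."}')

# D would likewise bind a queue for type "MT2", which C publishes.
```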
Constantin’s answer above has a point that this is a single point of failure, but it can be managed with standard network architecture like failover servers, local message persistence, message acknowledgements, etc.
One of your concerns is that with thousands of subscriptions it becomes difficult to follow the path, and you’re right. This is an inherent downside of EDA and there’s nothing you can do about it. Eventual consistency is also something the business is going to complain about, but it’s part of the beast and is actually a good thing from a technical perspective, because it enables more scalability. The biggest problem I’ve found using the term “eventual consistency” is that the business thinks it means hours or days, not seconds.
BTW, this whole discussion assumes the message publishers and subscribers are different apps. All the same ideas can be applied within the same address space, just with a different bus. If you’re a .NET shop, look at MediatR. For other tech stacks, there are similar solutions that I’m sure Google knows about.
If your main concern is visibility into the propagation of events (which is a very valid concern for debugging and long-term application maintenance of a distributed system), you can use a correlation identifier to trace the generation of messages from the initial event through the entire chain. You don't need to build another layer of orchestration -- let your messaging platform handle that for you.
Most messaging platforms/libraries have the concept built in: e.g., NServiceBus defines a ConversationId field in the message headers, and AMQP defines a correlation-id field in the basic messaging model.
Your system should have some kind of logging that allows you to audit messages -- the correlation ID will allow you to group all messages that result from a single command/request to make debugging distributed logic much simpler.
If you set a GUID in the client requests, you can even correlate actions in the UI to the backend API, right through all the events recursively generated.
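A rough sketch of propagating a correlation ID through message headers; the header names and the bus.send call are stand-ins (NServiceBus and AMQP have their own built-in fields, as noted above):

```python
import json
import uuid
from typing import Optional


def publish(bus, message: dict, caused_by: Optional[dict] = None) -> None:
    """Attach a correlation ID: reuse the incoming one, or start a new chain."""
    headers = {
        "message_id": str(uuid.uuid4()),
        "correlation_id": (
            caused_by["headers"]["correlation_id"] if caused_by else str(uuid.uuid4())
        ),
    }
    envelope = {"headers": headers, "body": message}
    # Log the correlation ID so the whole chain can be grouped in an audit later.
    print(json.dumps({"published": True, **headers}))
    bus.send(envelope)  # `bus` is a stand-in for your messaging client
```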
It is OK, but the microservices shouldn't care how they get their messages in the first place; from their point of view, the input messages just arrive. You will be tempted to design your system to depend on some global order of events, which is hard in a distributed, scalable system. Resist that temptation and design your system to rely only on local ordering of events (i.e. the ordering within an event stream emitted by an Aggregate in Event Sourcing + DDD).
One downside that I see is that availability and scalability may be hurt: you will have a single point of failure for the entire system, and if it fails, everything fails. And when it needs to be scaled up you will have problems again, as you will end up with a distributed messaging system anyway.

Could an event store become a single point of failure?

For a couple of days I've been trying to figure out how to inform the rest of the microservices that a new entity was created in microservice A, which stores that entity in MongoDB.
I want to:
* Have low coupling between the microservices
* Avoid distributed transactions between microservices, like two-phase commit (2PC)
At first a message broker like RabbitMQ seems to be a good tool for the job, but then I see the problem that committing the new document in MongoDB and publishing the message to the broker is not atomic.
Why event sourcing? by eventuate.io:
One way of solving this issue implies making the schema of the documents a bit dirtier, by adding a mark that says whether the document has been published to the broker, and having a scheduled background process that searches MongoDB for unpublished documents and publishes them to the broker using confirmations; when the confirmation arrives, the document is marked as published (using at-least-once and idempotency semantics). This solution is proposed in this and this answer.
Reading Introduction to Microservices by Chris Richardson, I ended up at this great presentation, Developing functional domain models with event sourcing, where one of the slides asks:
How to atomically update the database and publish events without 2PC? (the dual-write problem)
The answer is simple (on the next slide):
Update the database and publish events.
This is a different approach from this one, which is based on CQRS à la Greg Young:
The domain repository is responsible for publishing the events; this would normally be inside a single transaction together with storing the events in the event store.
I think that delegating the responsibility of storing and publishing the events to the event store is a good thing, because it avoids the need for 2PC or a background process.
However, in a certain way it's true that:
If you rely on the event store to publish the events you'd have a tight coupling to the storage mechanism.
But we could say the same if we adopted a message broker for intercommunication between the microservices.
The thing that worries me more is that the Event Store seems to become a Single Point of Failure.
If we look at this example from eventuate.io,
we can see that if the event store is down, we can't create accounts or money transfers, losing one of the advantages of microservices (although the system will continue to answer queries).
So, is it correct to say that the event store, as used in the eventuate example, is a single point of failure?
What you are facing is an instance of the Two Generals' Problem: basically, you want two entities on a network to agree on something, but the network is not fail-safe. Leslie Lamport proved that this is impossible.
So no matter how many entities you add to your network (the message queue being one), you will never have 100% certainty that agreement will be reached. In fact, the opposite holds: the more entities you add to your distributed system, the less certain you can be that agreement will eventually be reached.
A practical answer for your case is that 2PC is not that bad when the alternative is adding even more complexity and more single points of failure. If you absolutely do not want a single point of failure and are willing to assume that the network is reliable (in other words, that the network itself cannot be a single point of failure), you can try a P2P algorithm such as a DHT, but for two peers I bet it reduces to simple 2PC.
We handle this with the Outbox approach in NServiceBus:
http://docs.particular.net/nservicebus/outbox/
This approach requires that the initial trigger for the whole operation comes in as a message on the queue, but it works very well.
You could also add a flag to each entry inside the event store that tells whether the event has already been published. Another process could poll the event store for those unpublished events and put them onto a message queue or topic. The disadvantage of this approach is that consumers of the queue or topic must be designed to de-duplicate incoming messages, because this pattern only guarantees at-least-once delivery. Another disadvantage could be latency due to the polling frequency, but since we have already entered eventually-consistent territory here, this might not be such a big concern.
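A minimal sketch of that polling publisher, assuming pymongo and a stand-in publish function (the collection layout and names are illustrative):

```python
import time

from pymongo import MongoClient

events = MongoClient()["app"]["event_store"]


def publish(event: dict) -> None:
    """Stand-in for pushing to a queue/topic with broker confirmation;
    it should raise if the broker does not confirm."""
    ...


def poll_unpublished(interval_seconds: float = 1.0) -> None:
    """At-least-once delivery: the flag is set only after publish() returns.
    A crash between publish() and update_one() re-sends the event on the
    next pass, which is why consumers must de-duplicate."""
    while True:
        for event in events.find({"published": False}).sort("_id", 1):
            publish(event)
            events.update_one({"_id": event["_id"]}, {"$set": {"published": True}})
        time.sleep(interval_seconds)
```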
How about having two event stores, and whenever a domain event is created, queueing it onto both of them? The event handler on the query side would then handle events popped from both event stores.
Of course, every event would have to be idempotent.
But wouldn't this solve our problem of the event store being a single point of failure?
Not a MongoDB-specific solution, but have you considered leveraging the Streams feature introduced in Redis 5 to implement a reliable event store? Take a look at the intro here.
I find that it has a rich set of features, like message tailing, message acknowledgement, and the ability to extract unacknowledged messages easily. This surely helps to implement at-least-once messaging guarantees. It also supports load balancing of messages using the "consumer group" concept, which can help with scaling the processing part.
Regarding your concern about a single point of failure: as per the documentation, streams and consumer information can be replicated across nodes and persisted to disk (using the regular Redis mechanisms, I believe), which helps address the single-point-of-failure issue. I'm currently considering using this for one of my microservices projects.
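A minimal consumer-group sketch with redis-py showing the features mentioned (acknowledgement and the pending list for unacknowledged messages); the stream, group and consumer names are placeholders:

```python
import redis

r = redis.Redis()
STREAM, GROUP, CONSUMER = "domain-events", "projections", "worker-1"

# Create the consumer group once (mkstream creates the stream if missing).
try:
    r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
except redis.ResponseError:
    pass  # group already exists

# Producer side: append an event to the stream.
r.xadd(STREAM, {"event": "somethingCreated", "somethingId": "1234"})

# Consumer side: read new messages for this group, then acknowledge them.
for _stream, messages in r.xreadgroup(GROUP, CONSUMER, {STREAM: ">"}, count=10):
    for message_id, fields in messages:
        # ... process idempotently, then acknowledge ...
        r.xack(STREAM, GROUP, message_id)

# Unacknowledged messages stay in the pending entries list and can be
# inspected (XPENDING) or taken over by another consumer (XCLAIM) after a crash.
```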
