Difficulty Understanding Event Sourcing Microservice Event Receiving/Communication - microservices

I've been aware of event sourcing, CQRS, DDD and micro services for a little while and I'm now at that point where I want to try and start implementing stuff and giving something a go.
I've been looking into the technical side of CQRS and I understand the DDD concepts in there. How both the write side handles commands from the UI and publishes events from it, and how the read side handles events and creates projections on them.
The difficulty I'm having is the communication & a handling events from service-to-service (both from a write to read service and between micro services).
So I want to focus on eventstore (this one: https://eventstore.com/ to be less ambiguous). This is what I want to use as I understand it is a perfect for event sourcing and the simple nature of storing the events means I can use this for a message bus as well.
So my issue falls into two questions:
Between the write and the read, in order for the read side to receive/fetch the events created from the write side, am i right in thinking something like a catch up subscription can be used to subscribe to a stream to receive any events written to it or do i use something like polling to fetch events from a given point?
Between micro services, I am having an even harder time... So when looking at CQRS tutorials/talks etc... they always seem to talk with an example of an isolated service which receives commands from the UI/API. This is fine. I understand the write side will have an API attached to it so the user can interact with it to perform commands. E.g. create a customer. However... say if I have two micro services, e.g. a order micro service and an shipping micro service, how does the shipping micro service get the events published from the order micro service. Specifically, how does those customer events, translate to commands for the shipping service.
So let's take a simple example of: - Command created from the order's API to place an order. - A OrderPlacedEvent is published to the event store. How does the shipping service listen and react to this is it need to then DispatchOrder and create ain turn an OrderDispatchedEvent.
Does the write side of the shipping microservice then need to poll or also have a catch up subscription to the order stream? If so how does an event get translated to an command using DDD approach?

something like a catch up subscription can be used to subscribe to a stream to receive any events written to it
Yes, using catch-up subscriptions is the right way of doing it. You need to keep the stream position of your subscription persisted somewhere as well.
Here you can find some sample code that works. I am not posting the whole snippet since it is too long.
The projection service startup flow is:
Load the checkpoint (first time ever it would be the stream start)
Subscribe to the stream from that checkpoint
The runtime flow will then be:
The subscription will then call the function you provide when it receives an event. There's some plumbing there to do, like if you subscribe to $all, you need to filter out system events (it will be easier in the next version of Event Store)
Project the event
Store the new checkpoint
If you make your projections idempotent, you can store the checkpoint from time to time and save some IO.
how does the shipping micro service get the events published from the order micro service
When you build a brand new system and you have a small team working on all the components, you can make a shortcut and subscribe to domain events from another service, as you'd do with projections. Within the integration context (between the boxes), ordering should not be important so you can use persistent subscriptions so you won't need to think about checkpoints. Event Store will do it for you.
Be aware that it introduces tight coupling on the domain event schema of the originating service. Your contexts will have the Partnership relationship or the downstream service will be a Conformist.
When you move forward with your system, you might decide to decouple those contexts properly. So, you introduce a stable event API for the service that publishes events for others to consume. The same subscription that you used for integration can now instead take care of translating domain (internal) events to integration (external) events. The consuming context would then use the stable API and the domain model of the upstream service will be free in iterating on their domain model, as soon as they keep the conversion up-to-date.
It won't be necessary to use Event Store for the downstream context, they could just as well use a message broker. Integration events usually don't need to be persisted due to their transient nature.
We are running a webinar series about Event Sourcing at Event Store, check our web site to get on-demand access to previous webinars and you might find interesting to join future ones.

The difficulty I'm having is the communication & a handling events from service-to-service (both from a write to read service and between micro services).
The difficulty is not your fault - the DDD literature is really weak when it comes to discussing the plumbing.
Greg Young discusses some of the issues of subscription in the latter part of his Polygot Data talk.
Eventide Project has documentation that does a decent job of explaining the principles behind how the plumbing fits things together.
Between micro services, I am having an even harder time...
The basic idea: your message store is fundamentally a database; when the host of your microservice wakes up, it queries the message store for messages after some checkpoint, and then feeds them to your domain logic (updating its own local copy of the checkpoint as needed).
So the host pulls a document with events in it from the store, and transforms that document into a stream of handle(Event) commands that ultimately get passed to your domain component.
Put another way, you build a host that polls the database for information, parses the response, and then passes the parsed data to the domain model, and writes its own checkpoints.

Related

Choreography Sagas in DDD - Chain of Integration Events?

I'm currently studying Saga pattern. Most examples seem to focus on Orchestration Sagas where we have one central saga execution coordinator service that dispatches and receives messages/events. Unfortunately information on how to implement Choreography Sagas seem to be lacking a bit.
In domain driven design, we have multiple bounded contexts, ideally, where each bounded context is a self contained microservice. If microservice A wants to communicate with another microservice B we use Integration Events. Integration Events are published and subscribed to using some asynchronous communication - RabbitMQ, Azure Service Bus.
Assuming we want to start some Saga, for example, where we have to run transactions on Order Service and Customer Service - how exactly do services communicate with each other? Is it just regular Integration Events or something entirely different?
The way I see it and given picture below (source), Saga would be executed this way:
A new order is created. Status is set to "Pending" and OrderSubmittedDomainEvent domain event is emitted.
Domain event handler receives OrderSubmittedDomainEvent domain event, it then creates and dispatches ReserveCreditIntegrationEvent integration event.
Customer Service receives ReserveCreditIntegrationEvent integration event.
It attempts to reserve customer credit.
If credit is successfully reserved, CustomerCreditReservedDomainEvent domain event is emitted.
Domain event handler received CustomerCreditReservedDomainEvent domain event, it creates and dispatches CreditReservedIntegrationEvent integration event.
Order Service receives CreditReservedIntegrationEvent integration event and sets Order Status to "Confirmed".
saga is completed.
Is this the right approach?
I think using Choreography rather then Orchestration for distributed transactions makes sense if you chose it for the right reasons. For instance, if you need to spare the usually higher effort of implementing a central choreography as you don't need to know what state a transaction is in until it has finished. Or because you know that the order of the transaction workflow is stable and is unlikely to change which would also be on the plus side of choreography. But would be a drawback for Choreography if the order changes frequently because you would need to adapt all microservices in that case...
So you need to know the advantages and drawbacks of the two approaches.
If you chose Choreography for the right reasons I would say that I am missing the compensation logic in your considerations. What if the credit was reserved but then the order fails in the order service? Compensation events need to be considered as well in such cases...
Other than that there is the usual suspects:
such as making sure that each service will reliably send the next event after it processed the received event. For this you could look into the Transactional Outbox pattern.
or making sure that you have deduplication of events implemented in each of the services as for reliable sending of events accross distributed transactions you cannot be a hundred percent sure that an event will only be sent once.
And if you are even interested in an alternative to the Saga pattern you can look into the Routing Slip pattern. It is well suited for distributed transaction workflows that will differ depending on the current use case by avoiding that each service needs to know each route. The sequence of the workflow is attached to the initial message of the transaction and all subsequent messages. Then each service receiving a message with the routing slip performs its tasks and passes the next message including the routing slip to the next station (service) on the list.
Note: I am not sure what exactly you mean by ...IntegrationEvent. I would not differentiate between domain and integration events, all events are relevant from the business perspective in your example otherwise they would not be relevant to other Microservices.

Should we store Events in a database? (Event Driven Design)

We have several services that publishes and subscribes to Domain Events. What we usually do is log events whenever we publish and log events whenever we process events. We basically use this to apply choreography pattern.
We are not doing Event Sourcing in these systems, and there's no programmatic use for them after publishing/processing. That's the main driver we opted not to store these in a durable container, like a database or event store.
Question is, are we missing some fundamental thing by doing this?
Is storing Events a must?
I consider queued messages as system messages, even if they represent some domain event in an event-driven architecture (pub/sub messaging).
There is absolutely no hard-and-fast rule about their storage. If you would like to keep them around you could have your messaging mechanism forward them to some auditing endpoint for storage and then remove them after some time (if necessary).
You are not missing anything fundamental by not storing them.
You're definitely not missing out on anything (but there is a catch) especially if that's not a need by the business. An Event-Sourced System would definitely store all the events generated by the system into a database (or any other event-store)
The main use of an event store is to be able to restore the state of the system to the current state in case of a failure by replaying messages. To make this process of recovery faster we have snapshots.
In your case since these events are just are only relevant until the process is completed, it would not make sense to store them until you have a failure. (this is the catch) especially in a Distributed Transaction case scenario.
What I would suggest?
Don't store the event themselves but log the relevant details about these events and maybe use an ELK stack or Grafana to store these logs.
Use either the Saga Pattern or the Routing Slip pattern in case of a Distributed Transaction and log them as well.
In case a failure occurs while processing an event, put that event into an exception queue and handle it. If it's a part of a distributed transaction make sure either they all have the same TransactionId or they have a CorrelationId so you can lookup for logs and save your system.
For reliably performing your business transactions in a distributed archicture you somehow need to make sure that your events are published at least once.
So a service that publishes events needs to persist such an event within the same transaction that causes it to get created.
Considering you are publishing an event via infrastructure services (e.g. a messaging service) you can not rely on it being available all the time.
Also, your own service instance could go down after persisting your newly created or changed aggregate but before it had the chance to publish the event via, for instance, a messaging service.
Question is, are we missing some fundamental thing by doing this? Is storing Events a must?
It doesn't matter that you are not doing event sourcing. Unless it is okay from the business perspective to sometimes lose an event forever you need to temporarily persist your event with your local transaction until it got published.
You can look into the Transactional Outbox Pattern to achieve reliable event publishing.
Note: Logging/tracking your events somehow for monitoring or later analyzing/reporting purpose is a different thing and has another motivation.

Microservices: how to track fallen down services?

Problem:
Suppose there are two services A and B. Service A makes an API call to service B.
After a while service A falls down or to be lost due to network errors.
How another services will guess that an outbound call from service A is lost / never happen? I need some another concurrent app that will automatically react (run emergency code) if service A outbound CALL is lost.
What are cutting-edge solutions exist?
My thoughts, for example:
service A registers a call event in some middleware (event info, "running" status, timestamp, etc).
If this call is not completed after N seconds, some "call timeout" event in the middleware automatically starts the emergency code.
If the call is completed at the proper time service A marks the call status as "completed" in the same middleware and the emergency code will not be run.
P.S. I'm on Java stack.
Thanks!
I recommend to look into patterns such as Retry, Timeout, Circuit Breaker, Fallback and Healthcheck. Or you can also look into the Bulkhead pattern if concurrent calls and fault isolation are your concern.
There are many resources where these well-known patterns are explained, for instance:
https://www.infoworld.com/article/3310946/how-to-build-resilient-microservices.html
https://blog.codecentric.de/en/2019/06/resilience-design-patterns-retry-fallback-timeout-circuit-breaker/
I don't know which technology stack you are on but usually there is already some functionality for these concerns provided already that you can incorporate into your solution. There are libraries that already take care of this resilience functionality and you can, for instance, set it up so that your custom code is executed when some events such as failed retries, timeouts, activated circuit breakers, etc. occur.
E.g. for the Java stack Hystrix is widely used, for .Net you can look into Polly .Net to make use of retry, timeout, circuit breaker, bulkhead or fallback functionality.
Concerning health checks you can look into Actuator for Java and .Net core already provides a health check middleware that more or less provides that functionality out-of-the box.
But before using any libraries I suggest to first get familiar with the purpose and concepts of the listed patterns to choose and integrate those that best fit your use cases and major concerns.
Update
We have to differentiate between two well-known problems here:
1.) How can service A robustly handle temporary outages of service B (or the network connection between service A and B which comes down to the same problem)?
To address the related problems the above mentioned patterns will help.
2.) How to make sure that the request that should be sent to service B will not get lost if service A itself goes down?
To address this kind of problem there are different options at hand.
2a.) The component that performed the request to service A (which than triggers service B) also applies the resilience patterns mentioned and will retry its request until service A successfully answers that it has performed its tasks (which also includes the successful request to service B).
There can also be several instances of each service and some kind of load balancer in front of these instances which will distribute and direct the requests to an available instance (based on regular performed healthchecks) of the specific service. Or you can use a service registry (see https://microservices.io/patterns/service-registry.html).
You can of course chain several API calls after another but this can lead to cascading failures. So I would rather go with an asynchronous communication approach as described in the next option.
2b.) Let's consider that it is of utmost importance that some instance of service A will reliably perform the request to service B.
You can use message queues in this case as follows:
Let's say you have a queue where jobs to be performed by service A are collected.
Then you have several instances of service A running (see horizontal scaling) where each instance will consume the same queue.
You will use message locking features by the message queue service which makes sure that as soon one instance of service A reads a message from the queue the other instances won't see it. If service A was able to complete it's job (i.e. call service B, save some state in service A's persistence and whatever other tasks you need to be included for a succesfull procesing) it will delete the message from the queue afterwards so no other instance of service A will also process the same message.
If service A goes down during the processing the queue service will automatically unlock the message for you and another instance A (or the same instance after it has restarted) of service A will try to read the message (i.e. the job) from the queue and try to perform all the tasks (call service B, etc.)
You can combine several queues e.g. also to send a message to service B asynchronously instead of directly performing some kind of API call to it.
The catch is, that the queue service is some highly available and redundant service which will already make sure that no message is getting lost once published to a queue.
Of course you also could handle jobs to be performed in your own database of service A but consider that when service A receives a request there is always a chance that it goes down before it can save that status of the job to it's persistent storage for later processing. Queue services already address that problem for you if chosen thoughtfully and used correctly.
For instance, if look into Kafka as messaging service you can look into this stack overflow answer which relates to the problem solution when using this specific technology: https://stackoverflow.com/a/44589842/7730554
There is many way to solve your problem.
I guess you are talk about 2 topics Design Pattern in Microservices and Cicruit Breaker
https://dzone.com/articles/design-patterns-for-microservices
To solve your problem, Normally I put a message queue between services and use Service Discovery to detect which service is live and If your service die or orverload then use Cicruit Breaker methods

need clarification on microservices

I need some clarifications on microservices.
1) As I understand only choreography needs event sourcing and in choreography we use publish/subscribe pattern. Also we use program likes RabbitMQ to ensure communication between publisher and subscribers.
2) Orchestration does not use event sourcing. It uses observer pattern and directly communicate with observers. So it doesn't need bus/message brokers (like RabbitMQ). And to cooridante all process in orchestration we use mediator pattern.
Is that correct?
In microservice orchestration , a centralized approach is followed for execution of the decisions and control with help of orchestrator. The orchestrator has to communicate directly with respective service , wait for response and decide based on the response from and hence it is tightly coupled. It is more of synchronous approach with business logic predominantly in the orchestrator and it takes ownership for sequencing with respect to business logic. The orchestration approach typically follows a request/response type pattern whereby there are point-to-point connection between the services.
In, microservice choreography , a decentralized approach is followed whereby there is more liberty such that every microservice can execute their function independently , they are self-aware and it does not require any instruction from a centralized entity. It is more of asynchronous approach with business logic spread across the microservices, whereby every microservice shall listen to other service events and make it's own decision to perform an action or not. Accordingly, the choreography approach relies on a message broker (publish/subscribe) for communication between the microservices whereby each service shall be observing the events in the system and act on events autonomously.
TLDR: Choreography is the one which doesn't need persistance of the status of the process, orchestration needs to keep the status of the process somewhere.
I think you got this somewhat mixed up with implementation details.
Orchestration is called such, because there is a central process manager (sometimes mentioned as saga, wrongly imho) which directs (read orchestrates) operations across other services. In this pattern, the process manager directs actions to BC's, but needs to keep a state on previous operations in order to undo, roll back, or take any corrective or reporting actions deemed necessary. This status can be held either in an event stream, normal form db, or even implicitly and in memory (as in a method executing requests one by one and undoing the previous ones on an error), if the oubound requests are done through web requests for example. Please note that orchestrators may use synchronous, request-response communication (like making web requests). In that case the orchestrator still keeps a state, it's just that this state is either implicit (order of operations) or in-mem. State still exists though, and if you want to achieve resiliency (to be able to recover from an exception or any catastrophic failure), you would again need to persist that state on-disk so that you could recover.
Choreography is called such because the pieces of business logic doing the operations observe and respond to each other. So for example when a service A does things, it raises an event which is observed by B to do a follow up actions, and so on and so forth, instead of having a process manager ask A, then ask B, etc. Choregraphy may or may not need persistance. This really depends on the corrective actions that the different services need to do.
An example: As a practical example, let's say that on a purchase you want to reserve goods, take payment, then manifest a shipment with a courier service, then send an email to the recipient.
The order of the operations matter in both cases (because you want to be able to take corrective actions if possible), so we decide do the payment after the manifestation with the courier.
With orchestration, we'd have a process manager called PM, and the process would do:
PM is called when the user attempts to make a purchase
Call the Inventory service to reserve goods
Call the Courier integration service to manifest the shipment with a carrier
Call the Payments service to take a payment
Send an email to the user that they're receiving their goods.
If the PM notices an error on 4, they only corrective action is to retry to send the emai, and then report. If there was an error during payment then the PM would directly call Courier integration service to cancel the shipment, then call Inventory to un-reserve the goods.
With choreography, what would happen is:
An OrderMade event is raised and observed by all services that need data
Inventory handles the OrderMade event and raises an OrderReserved
CourierIntegration handles the OrderReserved event and raises ShipmentManifested
Payments service handles the ShipmentManifested and on success raises PaymentMade
The email service handles PaymentMade and sends a notification.
The rollback would be the opposite of the above process. If the Payments service raised an error, Courier Integration would handle it and raise a ShipmentCancelled event, which in turn is handled by Inventory to raise OrderUnreserved, which in turn may be handled by the email service to send a notification.

In DDD, who should be resposible for handling domain events?

Who should be responsible for handling domain events? Application services, domain services or entities itself?
Let's use simple example for this question.
Let's say we work on shop application, and we have an application service dedicated to order operations. In this application Order is an aggregate root and following rules, we can work only with one aggregate within single transaction. After Order is placed, it is persisted in a database. But there is more to be done. First of all, we need to change number of items available in the inventory and secondly notify some other part of a system (probably another bounded context) that shipping procedure for that particular order should be started. Because, as already stated, we can modify only one aggregate within transaction, I think about publishing OrderPlacedEvent that will be handled by some components in the separate transactions.
Question arise: which components should handle this type of event?
I'd like to:
1) Application layer if the event triggers modification of another Aggregate in the same bounded context.
2) Application layer if the event trigger some infrastructure service.
e.g. An email is sent to the customer. So an application service is needed to load order for mail content and mail to and then invoke infrastructure service to send the mail.
3) I prefer a Domain Service personally if the event triggers some operations in another bounded context.
e.g. Shipping or Billing, an infrastructure implementation of the Domain Service is responsible to integrate other bounded context.
4) Infrastructure layer if the event need to be split to multiple consumers. The consumer goes to 1),2) or 3).
For me, the conclusion is Application layer if the event leads to an seperate acceptance test for your bounded context.
By the way, what's your infrastructure to ensure durability of your event? Do you include the event publishing in the transaction?
These kind of handlers belong to application layer. You should probably create a supporting application service's method too. This way you can start separate transaction.
I think the most common and usual place to put the EventHandlers is in the application layer. Doing the analogy with CQRS, EventHandlers are very similar to CommandHandlers and I usually put them both close to each other (in the application layer).
This article from Microsoft also gives some examples putting handlers there. Look a the image bellow, taken from the related article:

Resources