Is it possible to combine Spring Statemachine with the event sourcing pattern?

My idea is to track the state of a domain object with Spring Statemachine, i.e. the state machine defines how the domain object moves from one state to another. When events are persisted to or restored from the event store, the state of the domain object can be (re)generated by sending those events to the state machine.
However, it seems that creating a state machine object is relatively expensive, so creating one whenever a state transition happens on a domain object does not perform well. If I maintain only a single state machine object, I would worry about concurrency problems. One approach is to have a 'statemachine-pool', but that gets messy if I have to create state machines for multiple different domain objects.
So is it a good idea to use Spring Statemachine with the event sourcing pattern?

Provided that all the transitions are based on events I would say that it is a pretty good idea, yes.
The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied for the same lifetime as the application state itself.
The main point about event sourcing is that you store the events leading to a particular state - instead of just storing the current state - so that you can replay them up to a given point of time.
Thus, using event sourcing has no impact on how you create your state machines.
However, it seems that creating a state-machine object is relatively expensive, it's not that performant to create a state-machine object whenever a state transition happened on a domain object.
Creating a state machine every time there is a state transition is not related to event sourcing. Would you do it differently if you were only storing the current state? You'd still need to either create the state machine from the last stored state, or look it up in a cache or a pool, before you could apply the transition.
The only performance hit that comes from using event sourcing is that of replaying the transitions from the beginning in order to reach the current state. If this is costly, you can use snapshots to minimize the number of transitions that must be replayed.
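To make the replay and snapshot idea concrete, here is a minimal sketch in plain Python. It does not use the Spring Statemachine API; the order lifecycle, the `TRANSITIONS` table, and the `Snapshot` type are all hypothetical names chosen for illustration. The transition logic is a pure function over a lookup table, so nothing expensive has to be constructed per domain object and there is no shared mutable machine to worry about.

```python
from dataclasses import dataclass

# Hypothetical order lifecycle; states and event names are illustrative only.
TRANSITIONS = {
    ("NEW", "PAY"): "PAID",
    ("PAID", "SHIP"): "SHIPPED",
    ("SHIPPED", "DELIVER"): "DELIVERED",
}


def apply(state: str, event: str) -> str:
    """Return the next state, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}") from None


@dataclass(frozen=True)
class Snapshot:
    state: str
    version: int  # number of events already folded into `state`


def replay(events, snapshot=Snapshot("NEW", 0)):
    """Rebuild the current state from a snapshot plus the events recorded after it."""
    state = snapshot.state
    for event in events[snapshot.version:]:
        state = apply(state, event)
    return Snapshot(state, len(events))


history = ["PAY", "SHIP", "DELIVER"]
full = replay(history)                                # replays all three events
from_snapshot = replay(history, Snapshot("PAID", 1))  # replays only the last two
```

Both calls end in the same state; the second one simply skips the events already covered by the snapshot.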

Related

DDD dealing with Eventual consistency for multiple aggregates inside a bounded context with NoSQL

I am currently working on a DDD geolocation application that has two separate aggregate roots inside one bounded context. Due to frequent coordinate updates, I am using Redis to persist my data, which doesn't allow rollbacks.
My first aggregate root is a trip object containing the driver (a user), the passengers (a list of users), etc.
My second aggregate root is user position updates.
When a coordinate update is sent I will generate and fire a "UpdateUserPostionEvent". As a side effect I will also generate and fire a "UpdateTripEvent" at a certain point, which will update coordinates of drivers/passengers.
My question is how can I deal with eventual consistency if I am firing my "UpdateLiveTripEvent" asynchronously. My UpdateLiveTripEventHandler has several points of failure and besides logging an error how can I deal with this inconsistency?
I am using a library called MediatR and its INotificationHandler, which as far as I know is "fire and forget".
Edit: I ended up finding this SO post that describes exactly what I need (a saga/process manager), but unfortunately I am unable to find any kind of saga implementation for handling events within the same BC. All the examples I am seeing involve a service bus.
Same or different Bounded Context; with or without Sagas; it does not matter.
Why would event handling fail? Either because of domain rules or because of infrastructure.
Domain rules:
A raised event that is handled by an aggregate (the event handler uses the aggregate to apply the event) should NEVER fail because of domain rules.
If the "target" aggregate has domain rules that reject the event, your aggregate design is wrong. Commands/operations can be rejected by domain rules. Events cannot be rejected (or undone) by domain rules.
An event should be raised only once the "origin" aggregate has checked all the domain rules for the operation. The "target" aggregate applies the event and may raise another event carrying values that it calculated itself. Those calculations are domain rules too, but they are not there to reject the event (events cannot be rejected by domain rules); they "continue" the consistency "chain" with a good segregation of responsibilities. That is why events should be named in the past tense: they have already happened.
Event simulation:
Agg1: Hey buddies! User did this cool thing and everything seems to be OK. --> UserDidThisCoolThingEvent
Agg2: Whoa, that is awesome! I'm gonna add +3 to the User's points. --> UserReceivedSomePointsEvent
Agg3: +3 points for this user? The user just reached 100 points. That is a lot! I'm gonna convert this User into a VIP User. --> UserTurnedIntoVIPEvent
Agg4: A new VIP User? Let's notify the rest of the Users to create some envy ;)
Infrastructure:
Fix it and apply the event. ;) Even "by hand" if needed once your persistence engine, network and/or machine is up again.
Use automatic retries for short-lived failures, and error queues/logs so that you do not lose your events (and can apply them later) during a long outage.
Event sourcing also helps with this, because you can always reapply the persisted events to the "target" aggregate without any extra effort to keep the events somewhere else (e.g. event logs): your domain persistence is also your event store.
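The retry and error queue advice can be sketched roughly as follows. This is plain Python rather than MediatR, and the `dispatch` and `redeliver` helpers, the use of `ConnectionError` as the "transient failure" signal, and the event shape are all assumptions made for illustration. Because an event may be applied more than once, the handler is assumed to be idempotent.

```python
import time


def dispatch(event, handler, error_queue, retries=3, delay=0.0):
    """Try the handler a few times; park the event if it keeps failing."""
    for attempt in range(1, retries + 1):
        try:
            handler(event)
            return True
        except ConnectionError:  # stand-in for any transient infrastructure error
            if attempt < retries:
                time.sleep(delay * attempt)  # simple linear backoff
    error_queue.append(event)  # never drop the event; it can be reapplied later
    return False


def redeliver(error_queue, handler):
    """Reapply parked events once the infrastructure is healthy again."""
    pending, error_queue[:] = list(error_queue), []
    for event in pending:
        dispatch(event, handler, error_queue)


# Demo: a handler that fails twice (short outage) and then succeeds.
calls = {"n": 0}
applied = []


def flaky_handler(event):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("store unavailable")
    applied.append(event)


errors = []
event = {"type": "UserPositionUpdated", "user": 7}
ok = dispatch(event, flaky_handler, errors)
```

In the demo the third attempt succeeds, so nothing lands in the error queue. A handler that keeps failing would leave the event parked in `errors` until `redeliver` is called.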

Saga Choreography implementation problems

I am designing and developing a microservice platform based on the specifications of http://microservices.io/
The entire framework integrates through socket thus removing the overhead of multiple HTTP requests (like most REST APIs).
A service registry host receives the registry of multiple microservice hosts, each microservice is responsible for a domain of the business. Another host we call a router (or API gateway) is responsible for exposing the microservices for consumption by third parties.
We will use the structure of Sagas (in choreography style) to distribute the requisitions, so we have some doubts:
Should a microservice issue the event in any process manager or should it be passed directly to the next microservice responsible for the chain of events? (the same logic applies to rollback)
Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas for example (as a result pagination type)?
I think the main point is that in this router and microservice structure, who is responsible for building the Sagas and propagating their events.
The article Patterns for Microservices — Sync vs. Async does a great job defining many of the terms used here and has animated gifs demonstrating sync vs. async and orchestrated vs. choreographed as well as hybrid setups.
I know the OP answered his own question for his use case, but I want to try to address the questions raised a bit more generally in light of the linked article.
Should a microservice issue the event in any process manager or should it be passed directly to the next microservice responsible for the chain of events?
To use a more general term, a process manager is an orchestrator. A concrete implementation of this may involve a stateful actor that orchestrates a workflow, keeping track of the progress in some way. Since a saga is itself a workflow (composed of both forward and compensating actions), it would be the job of the process manager to keep track of the state of the saga until completion (success or failure). This typically involves the actor sending synchronous calls to services and waiting for some result before going to the next step. Parallel operations can of course be introduced, but the point is that this actor dictates the progression of the saga.
This is fundamentally different from the choreography model. With this model there is no central actor keeping track of the state of a saga, but rather the saga progresses implicitly via the events that each step emits. Arguably, this is a more pure case of an event-driven model since there is no coordination.
That said, the challenge with this model is observing the state at any given point in time. With the orchestration model above, in theory, each actor could be queried for the state of the saga. In the choreographed model we don't have this luxury, so in practice a correlation ID is added to every message belonging to (in this case) a saga. If the messages are queryable in some way (the event bus supports it, or through some other storage means), then the messages belonging to a saga could be queried and the saga state reconstructed (effectively an event-sourced model).
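As a rough illustration of reconstructing saga state from correlated messages, consider the sketch below. The message log, the event type names, and the `TERMINAL` mapping are hypothetical; in a real system the log would come from the event bus or whatever store makes the messages queryable.

```python
from collections import defaultdict

# Hypothetical message log; every message carries the ID of the saga it belongs to.
messages = [
    {"correlation_id": "saga-1", "type": "OrderPlaced"},
    {"correlation_id": "saga-2", "type": "OrderPlaced"},
    {"correlation_id": "saga-1", "type": "PaymentTaken"},
    {"correlation_id": "saga-2", "type": "PaymentFailed"},
    {"correlation_id": "saga-1", "type": "OrderShipped"},
]

# Assumed convention: these event types end a saga, everything else is a step.
TERMINAL = {"OrderShipped": "completed", "PaymentFailed": "failed"}


def saga_states(log):
    """Group messages by correlation ID and derive each saga's status."""
    steps = defaultdict(list)
    for message in log:
        steps[message["correlation_id"]].append(message["type"])
    return {
        saga_id: {
            "steps": types,
            "status": TERMINAL.get(types[-1], "in progress"),
        }
        for saga_id, types in steps.items()
    }


states = saga_states(messages)
```

No participant holds the saga state here; it exists only as a fold over the messages that share a correlation ID.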
Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
This is an interesting question by itself and one that I have been thinking about quite a lot. The easiest and default answer would be to hard-code the saga plans and map them to the incoming message types, e.g. message A triggers plan X, message B triggers plan Y, etc.
However, I have been thinking about what a control plane might look like that manages these plans and provides a mechanism for pushing changes dynamically to message handlers and/or orchestrators. The two specific use cases in mind are changes in authorization policies and adding new steps to a plan at runtime.
If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas for example (as a result pagination type)?
The way I have approached this is to include references to the large data if these are objects such as a file or something. For data that are inherently streams themselves, a parallel channel could be referenced that a consumer could read from once it receives the message. I think the important distinction here is to decouple thinking about the messages driving the workflow from where the data is physically materialized which depends on the data representation.
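One way to picture "include a reference instead of the data" is the sketch below. The in-memory `blob_store` stands in for whatever object storage is actually used, and the `payload_ref` field name and both helper functions are assumptions for illustration.

```python
import uuid

blob_store = {}  # stand-in for object storage (file store, bucket, etc.)


def publish_with_reference(queue, event_type, payload: bytes):
    """Store the large payload out of band and publish only a small pointer to it."""
    key = str(uuid.uuid4())
    blob_store[key] = payload
    queue.append({"type": event_type, "payload_ref": key, "size": len(payload)})


def consume(queue):
    """Read the next message and resolve its reference to the actual data."""
    message = queue.pop(0)
    return message["type"], blob_store[message["payload_ref"]]


queue = []
report = b"x" * 1_000_000  # pretend this is a large export
publish_with_reference(queue, "ReportGenerated", report)
message_on_the_wire = dict(queue[0])  # what actually travels through the saga
event_type, data = consume(queue)
```

The message that drives the workflow stays small regardless of how large the referenced data is.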
For microservices, every microservice should be responsible for its domain business.
Should a microservice issue the event in any process manager or should it be passed directly to the next microservice responsible for the chain of events? (the same logic applies to rollback)
Events are not passed directly to the next microservice. They are published, and every microservice that is interested in them subscribes to them.
If there is rollback, you should consider orchestration.
Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
The microservice that publishes the event will certainly know how to build it. There is no chain of events, because every microservice interested in the event subscribes to it separately.
If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas for example (as a result pagination type)?
Only publish the data that others may be interested in, not all of it. In most cases the data is not large, and a message queue can handle it efficiently.

CQRS + Microservices Handling event rollback

We are using microservices, cqrs, event store using nodejs cqrs-domain, everything works like a charm and the typical flow goes like:
1. REST -> 2. Service -> 3. Command validation -> 4. Command -> 5. Aggregate -> 6. Event -> 7. Event store (transactional data) -> 8. Returns aggregate with aggregate ID -> 9. Store in microservice local DB (essentially the read DB) -> 10. Publish event to the queue
The problem with the flow above is that saving the transactional data (persistence to the event store) and storing the microservice's read data happen in different transaction contexts. If there is a failure at step 9, how should I handle the event that has already been propagated to the event store and the aggregate that has already been updated?
Any suggestions would be highly appreciated.
The problem with the flow above is that saving the transactional data (persistence to the event store) and storing the microservice's read data happen in different transaction contexts. If there is a failure at step 9, how should I handle the event that has already been propagated to the event store and the aggregate that has already been updated?
You retry it later.
The "book of record" is the event store. The downstream views (the "published events", the read models) are derived from the book of record. They are typically behind the book of record in time (eventual consistency) and are not typically synchronized with each other.
So you might have, at some point in time, 105 events written to the book of record, but only 100 published to the queue, and a representation in your service database constructed from only 98.
Updating a view is typically done in one of two ways. You can, of course, start with a brand new representation and replay all of the events into it as part of each update. Alternatively, you track in the metadata of the view how far along in the event history you have already gotten, and use that information to determine where the next read of the event history begins.
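The second approach, tracking in the view how far along the history it has read, might look like the sketch below. The `BalanceView` class and the event shape are hypothetical. In a real database the view data and the `position` marker would have to be updated in the same transaction; that is an assumption this in-memory version glosses over.

```python
class BalanceView:
    """Read model that remembers how much of the event history it has applied."""

    def __init__(self):
        self.balances = {}
        self.position = 0  # number of events already folded into the view

    def catch_up(self, event_store):
        """Apply only the events this view has not seen yet."""
        for event in event_store[self.position:]:
            account = event["account"]
            self.balances[account] = self.balances.get(account, 0) + event["amount"]
            self.position += 1


event_store = [
    {"account": "A", "amount": 100},
    {"account": "A", "amount": -30},
]
view = BalanceView()
view.catch_up(event_store)

event_store.append({"account": "B", "amount": 5})
view.catch_up(event_store)  # applies only the new event
view.catch_up(event_store)  # nothing new, so nothing is applied twice
```

If an update fails partway, the view is simply behind; calling `catch_up` again later resumes from `position`, which is the "retry it later" behaviour described above.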
Inside your event store, you could track whether read-side replication was successful.
As soon as step 9 succeeds, you can flag the event as 'replicated'.
That way, you could introduce a component that watches for unreplicated events and triggers step 9. You could also track whether the replication has failed multiple times.
Updating the read side (step 9) and flagging an event as replicated should happen consistently. You could use a saga pattern here.
I think I now understand it better.
The aggregate would still be created. The answer is that all validations for any type of consistency should happen before my aggregate is constructed; what needs to be handled is a failure beyond the purview of the code, i.e. a failure while updating the read-side DB of the microservice.
So in the ideal case the aggregate is created, but the associated event remains undispatched until all the read dependencies are updated. If they are not, it stays undispatched and can be handled separately.
The event store will still have all the events, and eventual consistency is maintained this way.

Event-sourcing: Dealing with derived data

How does an event-sourcing system deal with derived data? All the examples I've read on event-sourcing demonstrate services reacting to fact events. A popular example seems to be:
Bank Account System
Events
Funds deposited
Funds withdrawn
Services
Balance Service
They then show how the Balance service can, at any point, derive a state (i.e. the balance) from the events. That makes sense; those events are facts. There's no question that they happened - they are external to the system.
However, how do we deal with data calculated BY the system?
E.g.
Overdrawn service:
A service that is responsible for monitoring the balance and performing some action when it goes below zero.
Does the event-sourcing approach dictate how we should use (or not use) derived data, i.e. the balance? Perhaps one of the following?
1) Use: [Funds Withdrawn event] + [Balance service query]
Listen for the "Funds withdrawn" event and then ask the Balance service for the current balance.
2) Use: [Balance changed event]
Get the balance service to throw a "Balance changed" event containing the current balance. Presumably this isn't a "fact" as it's not external to the system, therefore prone to miscalculation.
3) Use: [Funds withdrawn event] + [Funds deposited event]
We could just skip the Balance service and have each service maintain its own balance directly from the facts. ...though that would result in each service having its own (potentially different) version of the balance.
A service that is responsible for monitoring the balance and performing some action when it goes below zero.
Executive summary: the way this is handled in event sourced systems is not actually all that different from the alternatives.
Stepping back a second - the advantage of having a domain model is to ensure that all proposed changes satisfy the business rules. Borrowing from the CQRS language: we send command messages to a command handler. The handler loads the state of the model and tries to apply the command. If the command is allowed, the state of the domain model is updated and saved.
After persisting the state of the model, the command handler can query that state to determine if there are outstanding actions to be performed. Udi Dahan describes this in detail in his talk on Reliable messaging.
So the most straightforward way to describe your service is one that updates the model each time the account balance changes, and sets the "account overdrawn" flag if the balance is negative. After the model is saved, we schedule any actions related to that state.
Part of the justification for event sourcing is that the state of the domain model is derivable from the history. Which is to say, when we are trying to determine if the model allows a command, we load the history, and compute from the history the current state, and then use that state to determine whether the command is permitted.
What this means, in practice, is that we can write an AccountOverdrawn event at the same time that we write the AccountDebited event.
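A minimal sketch of that idea: the command handler derives the balance from the history, and when a debit takes the account below zero it emits both events together. The event shapes and the `balance` and `debit` function names are assumptions for illustration, not part of any particular framework.

```python
def balance(history):
    """Derive the current balance from the account's event history."""
    total = 0
    for event in history:
        if event["type"] == "AccountCredited":
            total += event["amount"]
        elif event["type"] == "AccountDebited":
            total -= event["amount"]
    return total


def debit(history, amount):
    """Handle a debit command and return the new events to append to the history."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    before = balance(history)
    after = before - amount
    events = [{"type": "AccountDebited", "amount": amount}]
    if before >= 0 and after < 0:
        # Derived fact, written in the same step as the debit itself.
        events.append({"type": "AccountOverdrawn", "balance": after})
    return events


history = [{"type": "AccountCredited", "amount": 100}]
small = debit(history, 40)   # stays above zero: one event
large = debit(history, 150)  # crosses zero: two events
```

Because both events come out of the same decision over the same history, the "derived" AccountOverdrawn event is as much a part of the book of record as the debit.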
That AccountDebited event can be subscribed to - Pub/Sub. The typical handling is that the new events get published after they are successfully written to the book of record. An event listener subscribing to the events coming out of the domain model observes the event, and schedules the command to be run.
Digression: typically, we'll want at-least-once execution of these activities. That means keeping track of acknowledgements.
Therefore, the event handler is also a thing with state. It doesn't have any business state in it, and certainly no rules that would allow it to reject events. What it does track is which events it has seen, and which actions need to be scheduled. The rules for loading this event handler (more commonly called a process manager) are just like those of the domain model - load events from the book of record to obtain the current state, then see if the event being handled changes anything.
So it is really subscribing to two events - the AccountDebited event, and whatever event returns from the activity to acknowledge that it has completed.
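A process manager of this kind might be sketched as follows. To keep the example short it reacts to the AccountOverdrawn event rather than to every AccountDebited event, and the acknowledgement event name `OverdraftNoticeSent`, the `account` field, and the class name are all hypothetical.

```python
class OverdraftNotifier:
    """Process manager: no business rules, only 'what still needs doing'."""

    def __init__(self, history=()):
        self.pending = set()  # accounts with an unacknowledged action
        for event in history:  # state is rebuilt from the events already seen
            self.apply(event)

    def apply(self, event):
        if event["type"] == "AccountOverdrawn":
            self.pending.add(event["account"])
        elif event["type"] == "OverdraftNoticeSent":  # acknowledgement from the activity
            self.pending.discard(event["account"])
        # Any other event is ignored; the handler never rejects an event.

    def actions_to_schedule(self):
        """Everything not yet acknowledged gets (re)scheduled: at-least-once."""
        return sorted(self.pending)


seen = [
    {"type": "AccountOverdrawn", "account": "acct-1"},
    {"type": "AccountOverdrawn", "account": "acct-2"},
    {"type": "OverdraftNoticeSent", "account": "acct-1"},
]
manager = OverdraftNotifier(seen)
todo = manager.actions_to_schedule()
```

After loading, only the account whose notice has not been acknowledged is left to schedule.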
This same mechanic can be used to update the domain model in response to events from elsewhere.
Example: suppose we get a FundsWithdrawn event from an ATM, and we need to update the account history to match it. So our event handler gets loaded, updates itself, and schedules a RecordATMWithdrawal command to be run. When the command runs, it loads the account, updates the balances, and writes out the AccountDebited and AccountOverdrawn events as before. The event handler sees these events, loads the correct process state based on the metadata, and updates the state of the process.
In CQRS terms, this is all taking place in the "write models"; these processes are all about updating the book of record.
The balance query itself is easy - we already showed that the balance can be derived from the history of the domain model, and that's just how your balance service is expected to do it.
To sum up; at any given time you can load the history of the domain model, to query its state, and you can load up the history of the event processor, to determine what work has yet to be acknowledged.
Event sourcing is an evolving discipline with a bunch of diverse practices, practitioners and charismatic people. You can't expect them to provide you with one consistent modelling technique for all the scenarios you described. Each of those scenarios has its pros and cons, and you have listed some of them. The right choice may also vary dramatically from one project to another, because the business requirements (the evolutionary pressures of the market) will be different.
If you are working on some mission-critical system and you want to have very consistent balance all the time - it's better to use RDBMS and ACID transactions.
If you need maximum speed, are okay with eventually consistent states, and are not very anxious about the precision of your balances (some events may be missing here and there for a bunch of reasons), then you can derive your projections for balances from events asynchronously.
In both scenarios you can use event sourcing, but you don't necessarily have to generate your projections asynchronously. It's okay to generate a projection in the same transaction scope in which you make changes to your write model, if you really need to do that.
Will it make Greg Young happy? I have no idea, but who cares about such things if your balances may one day go out of sync in a mission-critical system?

How to handle saving on a child context when the object is already deleted in the parent context?

I have core data nested contexts setup. Main queue context for UI and saving to SQLite persistent store. Private queue context for syncing data with the web service.
My problem is that the syncing process can take a long time, and there is a chance that the object being synced is deleted in the main queue context in the meantime. When the private queue context is saved, it crashes with the "Core Data could not fulfill faulted" exception.
Do you have any suggestions on how to check for this issue, or on how to configure the contexts to handle this case?
There is no magic behind nested contexts. They don't solve a lot of problems related to concurrency without additional work. Many people (you seem to be one of those people) expect things to work out of the box which are not supposed to work. Here is a little bit of background information:
If you create a child context using the private queue concurrency type then Core Data will create a queue for this context. To interact with objects registered at this context you have to use either performBlock: or performBlockAndWait:. The most important thing those two methods do is to make sure to invoke the passed block on the queue of the context. Nothing more - nothing less.
Think about this for a moment in the context of a non Core Data based application. If you want to do something in the background you could create a new queue and schedule blocks to do work on that queue in the background. If your job is done you want to communicate the result of the background operations to another layer inside your app logic. What happens when the user deleted the object/data in the meantime which is related to the results from the background operation? Basically the same: A crash.
What you experience is not a Core Data specific problem. It is a problem you have as soon as you introduce concurrency. What you need is to think about a policy or some kind of contract between your child and parent contexts. For example, before you delete the object from the root context you should cancel all of the operations/blocks which are running on other queues and wait for the cancellation to finish before you actually delete the object.
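The "cancel, wait, then delete" contract is not specific to Core Data, so the sketch below shows it with plain Python threads rather than with managed object contexts. The `SyncJob` class, the `delete_object` helper, and the dictionary standing in for the store are all hypothetical; the point is only the ordering of the three steps.

```python
import threading


class SyncJob:
    """Background work on one object that can be cancelled cooperatively."""

    def __init__(self, store, key):
        self.store = store
        self.key = key
        self.error = None
        self._cancelled = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def _run(self):
        try:
            for _ in range(20):  # pretend to do slow work in small steps
                if self._cancelled.wait(timeout=0.01):
                    return       # stop before touching the object again
            self.store[self.key]["status"] = "synced"
        except KeyError as exc:  # the object vanished underneath the job
            self.error = exc

    def cancel_and_wait(self):
        self._cancelled.set()
        self._thread.join()      # block until the job has really stopped


def delete_object(store, key, jobs):
    """Contract: cancel every job using the object, wait, and only then delete."""
    for job in jobs:
        if job.key == key:
            job.cancel_and_wait()
    del store[key]


store = {"trip-1": {"status": "dirty"}}
job = SyncJob(store, "trip-1")
job.start()
delete_object(store, "trip-1", [job])
```

Because the delete happens only after the job has stopped, the job can never observe a half-deleted object, which is the situation that produces the crash described in the question.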
