State machine: validation before initial save in database?

This is a question regarding state machines in general, I don't need help with the actual implementation. Imagine a state machine that formalizes a simple bug report, from inception to its final demise. The bug might transition across states such as "NEW", "CONFIRMED", "RESOLVED", "REOPENED", and "CLOSED". Along with every state transition there is also some accompanying validation code, which could for instance make sure that when moving from NEW to CONFIRMED we have recorded who confirmed it.
My question is related to the initial state, when the bug is just "NEW". It's tempting to say that initial validation is not part of the state machine (e.g. making sure that the bug actually has a description before saving it with state "NEW" in the database). But isn't that also a state transition, from "just created" to "NEW"? Shouldn't that transition be validated like any other transition? Isn't it artificial and sub-optimal to separate the initial validation from all the other validations?
On the other hand, if we do create a "fake" initial state (say, "CREATED"), along with its respective transition ("CREATED" --> "NEW"), then what happens when that transition fails validation? If it passes, all is good: we switch states and save the object with the new state ("NEW") in the database. But if it fails, we obviously don't want to save it in the database, and that breaks the state machine pattern of having one initial state and one final state: we would have an initial state, albeit a fake one ("CREATED"), but two final states ("CLOSED" and "DELETED"). Not only that, but the "DELETED" state would also be fake, in that there will never be any persistent objects with that state (just as there will never be any persistent objects with state "CREATED").
How do you handle this issue?

Ok, after further investigation it looks like the pattern does solve my issue by itself: in some (most?) state machine models, there is in fact an initial transition that ends in the initial state (UML calls its source the initial pseudostate). So there is a "fake" initial state as far as the actual code is concerned, but that state must not be considered a real state of the state machine.
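A minimal Java sketch of that approach, assuming a hypothetical BugReport class and an in-memory BugRepository standing in for the database: the "just created" pseudo-state exists only inside the factory method, so an object whose initial transition fails validation never receives a state and never reaches the repository. No "DELETED" state is needed for the failure path.

```java
import java.util.ArrayList;
import java.util.List;

// States from the question; there is deliberately no CREATED or DELETED.
enum BugState { NEW, CONFIRMED, RESOLVED, REOPENED, CLOSED }

// Hypothetical aggregate for illustration.
class BugReport {
    private final String description;
    private BugState state;
    private String confirmedBy;

    private BugReport(String description) {
        this.description = description;
    }

    // The initial transition (pseudo-state -> NEW), validated like any other transition.
    // If validation fails, no BugReport exists, so there is nothing to persist.
    static BugReport create(String description) {
        if (description == null || description.isBlank()) {
            throw new IllegalArgumentException("a bug needs a description");
        }
        BugReport bug = new BugReport(description);
        bug.state = BugState.NEW;
        return bug;
    }

    // NEW -> CONFIRMED must record who confirmed the bug.
    void confirm(String who) {
        if (state != BugState.NEW) {
            throw new IllegalStateException("cannot confirm from " + state);
        }
        if (who == null || who.isBlank()) {
            throw new IllegalArgumentException("confirmation requires a confirmer");
        }
        confirmedBy = who;
        state = BugState.CONFIRMED;
    }

    BugState state() { return state; }
    String confirmedBy() { return confirmedBy; }
    String description() { return description; }
}

// Hypothetical stand-in for the database: only objects that passed the initial transition reach it.
class BugRepository {
    private final List<BugReport> rows = new ArrayList<>();
    void save(BugReport bug) { rows.add(bug); }
    int count() { return rows.size(); }
}
```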

Related

Spring State Machine - Persist Libraries and Final State - Stops Listening

I was looking at Spring Statemachine (spending a small amount of time evaluating it before being moved onto another project).
I wanted to use Papyrus and UML modeling for an order flow. This worked, and I had a REST interface working. I then expanded to look at the persistence demo and created a number of state machines using a cross-reference id.
I used Thymeleaf to show the various orders and their states, and to send events.
This all seemed to work UNTIL any one of the state machines entered a "final state" (the one that looks like a bullseye). At this point the AbstractPersistStateMachineHandler stopped triggering/listening and onPersist no longer fired.
Is there an issue with using a "final state" and the persistence (https://docs.spring.io/spring-statemachine/docs/3.2.0/reference/#statemachine-recipes-persist) approach?
If I reworked it so that this last state was a "normal" state (but with no exits), it worked fine, but from a modeling perspective that probably doesn't accurately show that we have reached the end of the lifecycle.
A lot of what I did was based on: \spring-statemachine\spring-statemachine-samples\datapersist
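The workaround in the previous paragraph can still express "end of lifecycle" if terminality is derived from the transition table (a state with no outgoing transitions) instead of from a UML final state. A library-independent Java sketch, not using any Spring Statemachine API; the OrderState values and the onPersist callback are hypothetical, and the sketch illustrates the workaround only, not the cause of the listener stopping.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical order states for illustration.
enum OrderState { PLACED, PAID, SHIPPED, DELIVERED }

class OrderMachine {
    // Transition table: a state with no outgoing transitions is treated as terminal.
    private static final Map<OrderState, Set<OrderState>> NEXT = new EnumMap<>(OrderState.class);
    static {
        NEXT.put(OrderState.PLACED, EnumSet.of(OrderState.PAID));
        NEXT.put(OrderState.PAID, EnumSet.of(OrderState.SHIPPED));
        NEXT.put(OrderState.SHIPPED, EnumSet.of(OrderState.DELIVERED));
        NEXT.put(OrderState.DELIVERED, EnumSet.noneOf(OrderState.class));
    }

    private final String orderId;
    private final BiConsumer<String, OrderState> onPersist; // hypothetical persistence hook
    private OrderState state = OrderState.PLACED;

    OrderMachine(String orderId, BiConsumer<String, OrderState> onPersist) {
        this.orderId = orderId;
        this.onPersist = onPersist;
    }

    static boolean isTerminal(OrderState s) { return NEXT.get(s).isEmpty(); }

    void moveTo(OrderState target) {
        if (!NEXT.get(state).contains(target)) {
            throw new IllegalStateException(state + " -> " + target + " is not allowed");
        }
        state = target;
        onPersist.accept(orderId, state); // invoked for the terminal state as well
    }

    OrderState state() { return state; }
}
```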

No state machine in elsa-workflows?

I love the elsa-workflows project, as I was heavily using WWF in the past. However, many of my workflows were state machines. I can't see any in Elsa; are there any plans to support this?

Elsa 2 does not support the state machine model (only the flowchart model), but I am planning on revising the engine for Elsa 3 which would allow any type of model, including state machine and simple sequential flows like we have in Windows WF.
UPDATE
After I answered with the above, I started to think ahead about the state machine architecture for V3, during which I realized we can already implement the state machine model today with V2.
All it would take is a simple new activity called e.g. "State" that has an infinite number of outcomes. This State activity would simply set a workflow variable called e.g. "StateMachineState" or "CurrentState". Each outbound connection would be connected to any trigger responsible for transitioning into the next state. This could be a message from a service bus, a timer, an HTTP request, or anything else that's available with Elsa.
The only real change that would need to be added to make the user experience smooth is the ability to keep adding connections without having to specify them manually from the activity editor. With the current design, we could probably just automatically add an extra outcome to the activity. So initially there would just be e.g. "Transition 1". When that one becomes connected, a "Transition 2" would appear.
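A small sketch of that editor behaviour, assuming a hypothetical StateActivityOutcomes model (this is not Elsa's designer code): there is always exactly one unconnected outcome, and connecting it appends the next "Transition N".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a State activity's outcomes, for illustration only.
class StateActivityOutcomes {
    // Outcome name -> connected trigger (null while unconnected), in insertion order.
    private final Map<String, String> outcomes = new LinkedHashMap<>();

    StateActivityOutcomes() {
        outcomes.put("Transition 1", null);
    }

    void connect(String outcome, String trigger) {
        if (!outcomes.containsKey(outcome)) {
            throw new IllegalArgumentException("unknown outcome " + outcome);
        }
        outcomes.put(outcome, trigger);
        // Once every outcome is connected, offer a fresh one.
        if (!outcomes.containsValue(null)) {
            outcomes.put("Transition " + (outcomes.size() + 1), null);
        }
    }

    List<String> names() { return new ArrayList<>(outcomes.keySet()); }
}
```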
Anyway, I am revising my answer to: it's not here yet, but:
You can implement it yourself today, and
I will add an initial version of the State machine model to either Elsa 2.1 or 2.2, depending on any hidden gotchas I might have failed to see.
UPDATE 2
I just pushed a change that includes a State activity.
With this, you can now easily implement a state machine by adding State activities to your workflow. Here's an example of a traffic light state machine:
This workflow starts automatically after 5 seconds, after which it transitions into the "Green" state. It stays there for 10 seconds before transitioning into the "Yellow" state. After 5 seconds, it transitions into the "Red" state, and finally transitions back to the "Green" state after 5 seconds. Then it repeats.
To use the State activity, you specify two things:
State name.
Allowed transitions (the traffic light example includes only one transition per state, but you can specify more than just one).
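To make the idea concrete outside Elsa, here is a plain Java simulation of the traffic light, assuming hypothetical Transition and TrafficLightWorkflow types and a virtual clock instead of real timers: the current state plays the role of the workflow variable, and each state lists its allowed transitions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical transition: after delaySeconds in the current state, go to target.
record Transition(int delaySeconds, String target) {}

class TrafficLightWorkflow {
    // State name -> allowed transitions (one per state here, but a list allows more).
    static final Map<String, List<Transition>> STATES = Map.of(
        "Green", List.of(new Transition(10, "Yellow")),
        "Yellow", List.of(new Transition(5, "Red")),
        "Red", List.of(new Transition(5, "Green")));

    private String currentState; // plays the role of a "CurrentState" workflow variable
    private int clock;           // virtual seconds elapsed

    // Kick-off: after 5 seconds the workflow enters "Green".
    void start() {
        clock += 5;
        currentState = "Green";
    }

    // With timer triggers only, the transition that fires is the one due first.
    void step() {
        if (currentState == null) throw new IllegalStateException("not started");
        Transition next = STATES.get(currentState).stream()
            .min(Comparator.comparingInt(Transition::delaySeconds))
            .orElseThrow();
        clock += next.delaySeconds();
        currentState = next.target();
    }

    String currentState() { return currentState; }
    int clock() { return clock; }
}
```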

Where to apply business logic in EventSourcing

In event sourcing, I am a bit confused about where exactly business logic should be applied. I have already searched on Google, but all the examples are very basic, i.e. updating the state of an object inside a handler from an event object. In my scenario, I don't understand where exactly the business logic belongs.
For example, let's take a scenario that updates the status of IntervieweeVO, which lives inside the Interview aggregate class as below:
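// AggregateRoot and BaseEvent are placeholders for the framework's base classes.
abstract class AggregateRoot {}
abstract class BaseEvent {}

class Interview extends AggregateRoot {
    private IntervieweeVO interviewee;
}
class IntervieweeVO {
    int performance;
    String status;
}
class IntervieweeSelectedEvent extends BaseEvent {
    private IntervieweeVO interviewee;
}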
I have a business rule: if the interviewee's performance < 3, then status = REJECTED; otherwise status should be SELECTED.
So my question is: where should I keep this business logic? Below are 3 scenarios:
1) Before applying an event: do the business logic, then apply(IntervieweeSelectedEvent), and then eventstore.save(intervieweeSelectedEvent).
2) Inside the event handler: apply the business logic inside an EventHandler class, e.g. in handle(IntervieweeSelectedEvent intervieweeSelectedEvent) check the business logic and then update the object's state in the ReadModel table.
3) Apply business logic in both places, i.e. before applying the event and also while handling the event (combining 1 + 2).
Please clarify.

The main issue with event sourcing is that it is hard to produce a viable example using synthetic scenarios.
But probably I could suggest something a little bit better than Interview. If you compare pre-computer-era event sourced systems, you'll find that an event stream, which is the store of events composing the lifecycle of some entity, is rather a long-lived thing. The events of an entity could span a few days (a list that tracks some document flow), a year (the accounting period of some organisation) or tens of years (the medical records of some person).
A single event stream usually represents a single entity - a legal process, a ledger or a person... Each event is a transactional (as in ACID) change to the state of the entity.
In your case such an entity could be, say, a position, which is opened, announced, interviewee invited, invitation accepted, skills assessed, offer made, offer accepted, position closed. Off the top of my head.
When an event is added to an entity, it means that the entity's state has changed. It is the new truth about the entity. You want to be careful about changing the truth, so that's where business logic happens: you run some business logic to decide whether to change the truth or not. If you decide to update the truth, you save the event. That being said, "Interviewee rejected" is a valid event in this case.
Once an event is persisted, all the saved events of an entity are unconditionally part of the truth about the entity, in their respective order. You then don't decide whether to "accept" or "reject" a persisted event, only how it affects a projection.
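A minimal Java sketch of that split, using the position lifecycle as the entity (the event names are hypothetical, taken loosely from the list above): public methods run the business logic and only then append an event, while replaying the stored stream applies every event unconditionally, in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical events for the "position" entity.
interface PositionEvent {}
record PositionOpened(String title) implements PositionEvent {}
record IntervieweeInvited(String name) implements PositionEvent {}
record IntervieweeRejected(String name) implements PositionEvent {}
record PositionClosed() implements PositionEvent {}

class Position {
    private final List<PositionEvent> history = new ArrayList<>(); // append-only stream
    private final List<String> candidates = new ArrayList<>();
    private boolean open;

    // Business logic: decide whether the truth may change, then record the event.
    void openPosition(String title) {
        if (open) throw new IllegalStateException("already open");
        raise(new PositionOpened(title));
    }

    void invite(String name) {
        if (!open) throw new IllegalStateException("position is not open");
        raise(new IntervieweeInvited(name));
    }

    void reject(String name) {
        if (!candidates.contains(name)) throw new IllegalStateException(name + " is not a candidate");
        raise(new IntervieweeRejected(name));
    }

    void close() {
        if (!open) throw new IllegalStateException("position is not open");
        raise(new PositionClosed());
    }

    private void raise(PositionEvent e) {
        apply(e);
        history.add(e); // a real system would append this to the entity's stream atomically
    }

    // Applying a stored event makes no decisions: it is already part of the truth.
    private void apply(PositionEvent e) {
        if (e instanceof PositionOpened) open = true;
        else if (e instanceof IntervieweeInvited invited) candidates.add(invited.name());
        else if (e instanceof IntervieweeRejected rejected) candidates.remove(rejected.name());
        else if (e instanceof PositionClosed) open = false;
    }

    static Position replay(List<PositionEvent> stream) {
        Position p = new Position();
        for (PositionEvent e : stream) {
            p.apply(e);
            p.history.add(e);
        }
        return p;
    }

    List<PositionEvent> history() { return List.copyOf(history); }
    List<String> candidates() { return List.copyOf(candidates); }
    boolean isOpen() { return open; }
}
```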

You should be able to reconstruct the entity's state as of a specific point in time from the event stream.
This implies that applying events should NOT contain any logic other than state mapping logic. All the state needed to project the AR's (aggregate root's) state from the events must be explicitly defined in those events.
Events are an expressive way to define state changes, not operations/commands. For instance, if IntervieweeRejected means IntervieweeStatusChanged(rejected), then that meaning can't ever change. The IntervieweeRejected event can't ever imply anything other than status = rejected, unless there's some other state captured in the event's data (e.g. a reason).
Obviously, the way the state is represented can always change, but the meaning must not. For example the AR may have started by only projecting the current status and later on projected the entire status history.
apply(IntervieweeRejected) => status = REJECTED //at first
apply(IntervieweeRejected) => statusHistory.add(REJECTED) //later
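The two apply variants above can be sketched in Java as two projections of the same events, assuming hypothetical event records and a Status enum; neither apply method makes a decision, and both agree on the current status.

```java
import java.util.ArrayList;
import java.util.List;

enum Status { INVITED, SELECTED, REJECTED }

// Hypothetical events: each one means exactly one status change, forever.
interface IntervieweeEvent {}
record IntervieweeInvited() implements IntervieweeEvent {}
record IntervieweeSelected() implements IntervieweeEvent {}
record IntervieweeRejected() implements IntervieweeEvent {}

// "At first": the aggregate projects only the current status.
class IntervieweeV1 {
    Status status;

    void apply(IntervieweeEvent e) {
        if (e instanceof IntervieweeInvited) status = Status.INVITED;
        else if (e instanceof IntervieweeSelected) status = Status.SELECTED;
        else if (e instanceof IntervieweeRejected) status = Status.REJECTED;
    }
}

// "Later": same events, same meaning, richer representation (the whole history).
class IntervieweeV2 {
    final List<Status> statusHistory = new ArrayList<>();

    void apply(IntervieweeEvent e) {
        if (e instanceof IntervieweeInvited) statusHistory.add(Status.INVITED);
        else if (e instanceof IntervieweeSelected) statusHistory.add(Status.SELECTED);
        else if (e instanceof IntervieweeRejected) statusHistory.add(Status.REJECTED);
    }

    Status status() { return statusHistory.get(statusHistory.size() - 1); }
}
```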
I have a business rule: if the interviewee's performance < 3, then status = REJECTED; otherwise status should be SELECTED.
Business logic would be placed in standard public AR methods. In this specific case you may expect interviewee.assessPerformance(POOR) to yield IntervieweePerformanceAssessed(POOR) and IntervieweeRejected events. Should you need to reevaluate that smart screening policy at a later time (e.g. if it has changed) then you could implement a reevaluateSmartScreeningPolicy operation.
Also, please note that such logic may not even belong in the Interviewee AR itself. The smart screening policy may be seen as something that happens after, or in response to, the IntervieweePerformanceAssessed event. Furthermore, I can easily see how a smart screening policy could become very complex, even AI-driven, which could justify it living in a dedicated Screening bounded context.
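A sketch of that arrangement, assuming hypothetical Interviewee, ScreeningPolicy and event types, and using the question's rule (performance < 3 means rejected) as the policy: the aggregate records the assessment as a fact, and a separate policy reacts to it by calling a public aggregate method. The policy class is the part that could later move to its own bounded context.

```java
import java.util.ArrayList;
import java.util.List;

enum Decision { PENDING, SELECTED, REJECTED }

// Hypothetical events for illustration.
interface IntervieweeEvent {}
record IntervieweePerformanceAssessed(int score) implements IntervieweeEvent {}
record IntervieweeSelected() implements IntervieweeEvent {}
record IntervieweeRejected() implements IntervieweeEvent {}

class Interviewee {
    final List<IntervieweeEvent> uncommitted = new ArrayList<>();
    Integer score; // null until assessed
    Decision status = Decision.PENDING;

    // Public AR method: records the fact of the assessment only.
    void assessPerformance(int score) {
        if (this.score != null) throw new IllegalStateException("already assessed");
        raise(new IntervieweePerformanceAssessed(score));
    }

    void select() { requireAssessedAndPending(); raise(new IntervieweeSelected()); }
    void reject() { requireAssessedAndPending(); raise(new IntervieweeRejected()); }

    private void requireAssessedAndPending() {
        if (score == null || status != Decision.PENDING) throw new IllegalStateException("cannot decide now");
    }

    private void raise(IntervieweeEvent e) { apply(e); uncommitted.add(e); }

    // State mapping only.
    private void apply(IntervieweeEvent e) {
        if (e instanceof IntervieweePerformanceAssessed a) score = a.score();
        else if (e instanceof IntervieweeSelected) status = Decision.SELECTED;
        else if (e instanceof IntervieweeRejected) status = Decision.REJECTED;
    }
}

// Reacts to the assessment event; it owns the screening rule, not the aggregate.
class ScreeningPolicy {
    static final int PASS_MARK = 3; // the question's rule: performance < 3 => rejected

    void on(IntervieweePerformanceAssessed event, Interviewee interviewee) {
        if (event.score() < PASS_MARK) interviewee.reject();
        else interviewee.select();
    }
}
```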
Your question actually made me think about how to effectively capture the context of events, i.e. why they occurred, and I've asked about that here :)

You tagged your question cqrs, but this is actually the missing part in your example.
Event sourcing is merely a way to look at the current state of an object. You either save that state as it appears now, or you source it from everything that happened (e.g. a bank account's current balance, stored as a value or computed as the sum of all transactions).
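The bank account example can be sketched in Java, assuming a hypothetical Transaction record: one account keeps only the current balance, the other keeps every transaction and derives the balance by folding over them. Both report the same balance; only the sourced one still knows how it got there.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical transaction: positive amounts are deposits, negative ones withdrawals.
record Transaction(BigDecimal amount) {}

// State stored as a value: each change overwrites the previous balance.
class StoredAccount {
    private BigDecimal balance = BigDecimal.ZERO;
    void post(Transaction t) { balance = balance.add(t.amount()); }
    BigDecimal balance() { return balance; }
}

// State sourced from events: every transaction is kept, the balance is derived.
class SourcedAccount {
    private final List<Transaction> history = new ArrayList<>();
    void post(Transaction t) { history.add(t); }

    BigDecimal balance() {
        return history.stream()
            .map(Transaction::amount)
            .reduce(BigDecimal.ZERO, BigDecimal::add);
    }
}
```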
So an event is a "fact" about something that happened. In your case that would be the interview with a certain score. And (depending on your business logic) it COULD also record the resulting status, if the threshold is expected to change over time.
The crucial point is here that you should always adhere to the following chain:
"A command gets validated and if it passes it creates an unchangeable event that is persisted"
This means that in your case I would go for option 1. A SelectIntervieweeCommand should be validated and, if everything is okay, create an IntervieweeSelectedEvent, which is an unchangeable fact. Thus the business logic deciding whether the interviewee passed or not must reside in the command handler function.
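A minimal sketch of option 1 in Java, assuming hypothetical command, event and read-model types, an in-memory list as the event store, and a 0..5 performance scale (the question does not state one): validation and the pass/fail decision happen in the command handler, and the read-side handler only copies the already-decided status.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

record SelectIntervieweeCommand(String interviewId, int performance) {}

enum InterviewStatus { SELECTED, REJECTED }

// Immutable fact; it carries the decided status, as the question's IntervieweeVO does.
record IntervieweeSelectedEvent(String interviewId, int performance, InterviewStatus status) {}

class InterviewCommandHandler {
    static final int PASS_MARK = 3;
    private final List<IntervieweeSelectedEvent> eventStore = new ArrayList<>(); // stand-in store

    IntervieweeSelectedEvent handle(SelectIntervieweeCommand cmd) {
        // 1. Validate the command; nothing is persisted if this fails.
        if (cmd.interviewId() == null || cmd.interviewId().isBlank()) {
            throw new IllegalArgumentException("missing interview id");
        }
        if (cmd.performance() < 0 || cmd.performance() > 5) { // assumed 0..5 scale
            throw new IllegalArgumentException("performance out of range");
        }
        // 2. Business logic: decide the outcome here, not in the event handler.
        InterviewStatus status = cmd.performance() < PASS_MARK
            ? InterviewStatus.REJECTED
            : InterviewStatus.SELECTED;
        // 3. Create the unchangeable fact and persist it.
        IntervieweeSelectedEvent event = new IntervieweeSelectedEvent(cmd.interviewId(), cmd.performance(), status);
        eventStore.add(event);
        return event;
    }

    List<IntervieweeSelectedEvent> events() { return List.copyOf(eventStore); }
}

// Read side: the event handler copies the decided status; it does not re-decide.
class InterviewReadModel {
    final Map<String, InterviewStatus> statusById = new HashMap<>();
    void on(IntervieweeSelectedEvent e) { statusById.put(e.interviewId(), e.status()); }
}
```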

Is it ok to have FAT events with event sourcing?

I have recently been building an application on top of Greg Young's EventStore as my persistence layer, and I have been pondering how big I should allow an event to get.
For example, I have a UK Address aggregate with the following fields:
UK_Address
-BuildingName
-Street
-Locality
-Town
-Postcode
Now I'm building the UI using React/Redux and was wondering: should I create a single FAT addressUpdated event containing all the above fields?
Or should I create an event for each of the different fields (e.g. a buildingNameUpdated event, a streetUpdated event, a localityUpdated event) and batch them within the client until the Save event is fired?
I'm not sure the answer is as black and white as I have asked it; what I would really like to know is what conditions/constraints I could use to make the decision.

should I create an event for each of the different fields?
No. The representations of your events are part of the API -- so you want to use spellings that make sense at the level of the business, not at the level of the implementation.
Now I'm building the UI using React/Redux and was thinking should I create a single FAT updateAddress Event containing all the above fields?
You don't need to constrain the data that you send to your UI to match that which is in the persistence store. The UI is just a cached representation of a read model; there's no reason that representation needs to have the same form as what is in your event store.
Consider the React model itself -- your code makes changes to the "in memory" representation of your data, and then the library computes the new DOM and replaces it, which in turn causes the browser to update its view, which in turn causes the pixels on the screen to change.
So taking a fat event from the store, and breaking it into field level events for the UI is fine. Taking multiple events from the store and aggregating them into a single message for the UI is also fine. Taking events from the event store and transforming them into a spelling that the UI will recognize is also fine.
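A sketch of the first case, splitting one fat event into field-level messages for the UI, assuming hypothetical AddressUpdated and FieldChanged records; it compares the new event with the previous address so that only changed fields are sent, and the stored event itself stays fat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical "fat" event as stored in the event store.
record AddressUpdated(String buildingName, String street, String locality, String town, String postcode) {}

// Hypothetical field-level message shape that the UI might prefer.
record FieldChanged(String field, String value) {}

class AddressUiTranslator {
    // Split one stored event into per-field UI messages, emitting only fields that changed.
    static List<FieldChanged> toUiMessages(AddressUpdated before, AddressUpdated after) {
        List<FieldChanged> out = new ArrayList<>();
        addIfChanged(out, "buildingName", before.buildingName(), after.buildingName());
        addIfChanged(out, "street", before.street(), after.street());
        addIfChanged(out, "locality", before.locality(), after.locality());
        addIfChanged(out, "town", before.town(), after.town());
        addIfChanged(out, "postcode", before.postcode(), after.postcode());
        return out;
    }

    private static void addIfChanged(List<FieldChanged> out, String field, String oldValue, String newValue) {
        if (!Objects.equals(oldValue, newValue)) out.add(new FieldChanged(field, newValue));
    }
}
```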
Do you have any comment regarding Arien's answer about keeping fields that need to be consistent together, so that regardless of when you snapshot the current state of the world it would be in a valid state?
I don't believe that this makes sense, and I'm not sure if it is possible in general.
It doesn't make sense, because "valid state" is a write model concern only; events are things that have happened, and it's too late to vote on whether they are valid or not. For instance, if you deploy a new model, with a new invariant, it still needs to respect the history of what happened before. So you can build a snapshot for that new model, but the snapshot may not be "valid". Too bad.
Given that, I don't think it makes sense to worry over whether each individual event in a commit leaves the snapshot in a valid state.
In particular, if a particular transaction involves multiple entities, it is very likely that the domain language will suggest an event for each entity (we "debit cash" and "credit accounts receivable"). The entities themselves, of course, are capable of changing independently of each other -- it's the aggregate that maintains the balance.

You have to bundle all the information together in one event when the data has to be mutually consistent.
If you update the address one field at a time, a reader may see an address that was never intended.
This happens when the client has not yet processed all the events at a given time, due to eventual consistency.
Example:
Change address (City=1, Street=1, Housenumber=1) to (City=2, Street=2, Housenumber=2)
When you do this with 3 events and you have just processed one at the time of reading you could get the address: (City=2, Street=1, Housenumber=1).
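The scenario above can be reproduced with a small Java simulation, assuming hypothetical event records and a reader that has processed only the first n events: with three fine-grained events the reader can observe (City=2, Street=1, Housenumber=1), while with one fat event it sees either the old or the new address, never a mix.

```java
import java.util.List;

record Address(int city, int street, int houseNumber) {}

// Hypothetical events: three fine-grained ones versus one "fat" one.
interface AddressEvent { Address applyTo(Address a); }

record CityChanged(int city) implements AddressEvent {
    public Address applyTo(Address a) { return new Address(city, a.street(), a.houseNumber()); }
}
record StreetChanged(int street) implements AddressEvent {
    public Address applyTo(Address a) { return new Address(a.city(), street, a.houseNumber()); }
}
record HouseNumberChanged(int houseNumber) implements AddressEvent {
    public Address applyTo(Address a) { return new Address(a.city(), a.street(), houseNumber); }
}
record AddressChanged(Address newAddress) implements AddressEvent {
    public Address applyTo(Address a) { return newAddress; }
}

class ReadSide {
    // Simulates an eventually consistent reader that has processed only the first `processed` events.
    static Address project(Address start, List<AddressEvent> events, int processed) {
        Address current = start;
        for (AddressEvent e : events.subList(0, processed)) {
            current = e.applyTo(current);
        }
        return current;
    }
}
```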

If in doubt, try the solution that is easier to implement. I'd guess the "FAT" event will be easier: you will end up spending less time on implementing, debugging and supporting it.
This is usually referred to as the YAGNI, KISS and Occam's razor principles.

A good rule of thumb, in theory and in my experience, is to have your commands and events reflect the intent of the user, staying true to DDD. You can find a good explanation of the pros and cons of event granularity here: https://medium.com/#hugo.oliveira.rocha/what-they-dont-tell-you-about-event-sourcing-6afc23c69e9a
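As an illustration of intent-revealing events, here is a Java sketch with two hypothetical events that carry the same address data but mean different things to the business (a customer moving versus a typo being corrected); a projection can treat them differently even though the resulting address is the same.

```java
// Hypothetical UK address value, matching the fields in the question.
record UkAddress(String buildingName, String street, String locality, String town, String postcode) {}

// Two events with the same payload shape but different user intent.
interface CustomerEvent {}
record CustomerRelocated(String customerId, UkAddress newAddress) implements CustomerEvent {}
record AddressTypoCorrected(String customerId, UkAddress correctedAddress) implements CustomerEvent {}

class CustomerProjection {
    UkAddress address;
    int relocations;

    void apply(CustomerEvent e) {
        if (e instanceof CustomerRelocated r) {
            address = r.newAddress();
            relocations++; // only a real move counts as a relocation
        } else if (e instanceof AddressTypoCorrected c) {
            address = c.correctedAddress(); // same kind of state change, different meaning
        }
    }
}
```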

Is it acceptable to have an invalid state in eventsourcing after event upgrading and before patching?

Let's say I have a stream of persisted events that build a valid state according to some "schema" I have defined.
I change the schema and the events are upgraded to reflect this.
However, some state could not be made valid just by upgrading events; I also needed to add more events to patch the state to make it fully valid.
Firstly, is this reasoning at all valid in terms of event sourcing?
If so, how do I handle cases where a specific version of a state is no longer valid? Is this acceptable? Should it still be possible to rehydrate a version with an invalid state? If this is a write model and it's not the latest version, I could not modify this state anyway, so maybe it's no big deal?

However, some state could not be made valid just by upgrading events; I also needed to add more events to patch the state to make it fully valid.
"Compensating events" is the usual term; there is a clerical error in the book of record, so we need to add a new event to the history that corrects the mistake.
If so, how do I handle cases where a specific version of a state is no longer valid?
As a rule, you want to be wary, extremely wary, of introducing any automated validation that prevents you from loading an invalid history. Remember, state is just state; the business rules constrain the way the domain is allowed to change. Leaving broken states readable, but broken, is safe.
In particular, if you allow the state to load, it is a straightforward exercise to enumerate your event streams, test the final state of each object, and produce an exception report for any streams that produce an invalid state, escalating them to operators/management for handling, and so on.
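That exercise can be sketched generically in Java, assuming the streams are available as an in-memory map and that you supply hypothetical rebuild and isValid functions: every stream is loaded without validation, and only afterwards is the final state tested and reported.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

class InvalidStateReport {
    // Rebuild every stream without validation, then test each final state and list the failures.
    static <E, S> List<String> invalidStreams(Map<String, List<E>> streams,
                                              Function<List<E>, S> rebuild,
                                              Predicate<S> isValid) {
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, List<E>> entry : streams.entrySet()) {
            S state = rebuild.apply(entry.getValue()); // loading itself never rejects a history
            if (!isValid.test(state)) {
                report.add(entry.getKey());
            }
        }
        report.sort(Comparator.naturalOrder()); // stable order for the people handling the report
        return report;
    }
}
```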
Assuming that you are reasonably careful about input validation, and about checking whether your proposed command is consistent with the latest known state (aggregates enforce business rules, but they don't need to hoard those rules for themselves), you can probably achieve error rates low enough that you don't need aggressive data validation. That's especially true when the errors are easy to detect and cheap to fix.
Failing that, freezing any aggregates while they are in an invalid state is a good way to prevent further damage.
But if you really need the state to stay valid, there's a trick that you can play with compensating events.
Consider: the basic pattern of event sourcing looks something like
History history = repository.getHistoryById(id);
State current = State.SEED;
for (Event e : history) {
    current = current.apply(e);
}
There's actually a hidden concept here, which encapsulates the logic for processing the events prior to passing them to the state. Hidden, because the null case just passes the enumerated events straight through to the target.
History history = repository.getHistoryById(id);
Historian historian = new Historian();
State current = State.SEED;
for (Event e : historian.reviewEvents(history)) {
    current = current.apply(e);
}
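A runnable Java version of that loop, assuming simple hypothetical Event, State and correction types (the answer does not prescribe any): the historian is built from its own short history of corrections and either replaces or redacts events, while the no-argument historian is the null case that passes every event through. This is one possible reading of the idea, not a reference implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical event and state types, just enough to run the replay loop.
record Event(long sequence, int amount) {}

record State(int balance) {
    static final State SEED = new State(0);
    State apply(Event e) { return new State(balance + e.amount()); }
}

// Corrections recorded in the historian's own stream.
interface Correction {}
record Replace(long sequence, Event replacement) implements Correction {}
record Redact(long sequence) implements Correction {}

class Historian {
    private final Map<Long, Event> replacements = new HashMap<>();
    private final Set<Long> redactions = new HashSet<>();

    // The null historian: no corrections, every event passes straight through.
    Historian() {}

    // Historian state comes from the historian's own history of corrections.
    Historian(List<Correction> corrections) {
        for (Correction c : corrections) {
            if (c instanceof Replace r) replacements.put(r.sequence(), r.replacement());
            else if (c instanceof Redact d) redactions.add(d.sequence());
        }
    }

    List<Event> reviewEvents(List<Event> history) {
        List<Event> reviewed = new ArrayList<>();
        for (Event e : history) {
            if (redactions.contains(e.sequence())) continue;          // redacted
            reviewed.add(replacements.getOrDefault(e.sequence(), e)); // fixed or passed through
        }
        return reviewed;
    }
}

class Replay {
    static State load(List<Event> history, Historian historian) {
        State current = State.SEED;
        for (Event e : historian.reviewEvents(history)) {
            current = current.apply(e);
        }
        return current;
    }
}
```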
The historian gives you a place to put your compensating event logic: based on its own state, the historian passes through most events, but fixes the ones it knows need edits/compensation/redactions.
Where does the historian state come from? Why, from the history of the historian, of course. You load the history of the event corrections, which will typically be short, into the historian, and then let the historian clean up the events for the aggregate.
And if you need corrections for the historian? It's turtles all the way down! Each stream has a unique historian; the identifier for the historian's stream is calculated from the stream it filters (name-based UUIDs, for example, would allow you to do this). So for each stream, you check whether a historian stream exists; when you find one that doesn't, you know to stop searching and use the null historian, roll up the changes, process the final sequence of events to regenerate the state of your real object, and off you go.
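The chained lookup can be sketched with java.util.UUID.nameUUIDFromBytes, which produces a name-based (version 3) UUID, so a stream's historian id can always be recomputed from the stream id. This is a toy model under stated assumptions: the store is an in-memory map, the "historian:" prefix is arbitrary, and each historian event simply names an event to redact from the stream it filters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

class HistorianChain {
    // Deterministic historian stream id derived from the filtered stream's id.
    static String historianIdFor(String streamId) {
        byte[] name = ("historian:" + streamId).getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(name).toString();
    }

    // Returns the corrected events of streamId. In this toy model, each historian
    // event names an event to redact from the stream it filters.
    static List<String> reviewedEvents(Map<String, List<String>> store, String streamId) {
        List<String> raw = store.getOrDefault(streamId, List.of());
        String historianId = historianIdFor(streamId);
        if (!store.containsKey(historianId)) {
            return raw; // no historian stream: stop searching, use the null historian
        }
        // The historian's own history may itself be corrected, so recurse up the chain.
        Set<String> redactions = new HashSet<>(reviewedEvents(store, historianId));
        List<String> result = new ArrayList<>();
        for (String e : raw) {
            if (!redactions.contains(e)) result.add(e);
        }
        return result;
    }
}
```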
Mind you, I haven't seen a reference implementation of this idea anywhere; it's whiteboard sound, but the truth is I've been deferring this requirement in my own designs.
