Does an append-only event store result in an append-only codebase? - events

When implementing an application with event sourcing, the persistence engine at work is an event store. That is, an append-only log of events, in past tense, in the order of occurrence. By simply replaying the events through the application, the state at any point in time can be reproduced.
My concern – doesn't this append-only event store inevitably lead to an append-only codebase? How can you maintain a codebase if removing, or even altering, code might leave the application unable to replay the sequence of events? Can the number of source lines of code ever decrease?
What if a business rule has to be modified, or perhaps worse, what if a nasty bug in the early days of the application allowed it to enter a forbidden state? Must the faulty code be kept alive indefinitely? Of course, a lot of these issues can – in theory – be dealt with using event versioning, event schemas, snapshot versioning etc. But hasn't event sourcing become a burden at that point?
Event sourcing is a fairly new technology, at least in production. I suspect that there are few applications that have been running on it for more than a couple of years. What will they look like in 10 years? That's not an unrealistic age for an enterprise application.

My concern – doesn't this append-only event store inevitably lead to an append-only codebase?
No, it implies an append-only schema, which is decoupled from your implementation.
What if a business rule has to be modified, or perhaps worse, what if a nasty bug early in the early days of the application allowed it to enter into a forbidden state? Must the faulty code be kept alive indefinitely?
Not really - the domain is decoupled from the durable representations.
Yes, there are some common scenarios that you need to incorporate into your design; like the idea that you may need to compensate for errors earlier in the event history.
It's not fundamentally different from what you would do if you were only storing current state. If you have a representation of an aggregate in your database that is in the wrong state, you just update it in place, right? You change some of the fields to what they are supposed to be.
The idea is the same in event sourcing; you have an event stream that produces a state that you don't want to be in. You figure out what additional events are necessary to reach the state you should be in, and append them. Tada.
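As a rough sketch of that idea (all of the type names below are invented for illustration, not taken from any framework), the correction is itself just another event appended to the stream, and replay still folds the whole history into the intended current state:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class CompensationSketch {

    // Events are immutable facts in past tense.
    record AccountCredited(String accountId, BigDecimal amount) {}
    record BalanceCorrected(String accountId, BigDecimal adjustment, String reason) {}

    public static void main(String[] args) {
        List<Object> stream = new ArrayList<>();

        // A buggy rule once allowed a duplicate credit: the "forbidden state".
        stream.add(new AccountCredited("acc-1", new BigDecimal("100")));
        stream.add(new AccountCredited("acc-1", new BigDecimal("100")));

        // History is not rewritten; a compensating event is appended instead.
        stream.add(new BalanceCorrected("acc-1", new BigDecimal("-100"),
                "compensate duplicate credit"));

        // Replay still works: fold the full stream into the current balance.
        BigDecimal balance = BigDecimal.ZERO;
        for (Object event : stream) {
            if (event instanceof AccountCredited credited) {
                balance = balance.add(credited.amount());
            } else if (event instanceof BalanceCorrected corrected) {
                balance = balance.add(corrected.adjustment());
            }
        }
        System.out.println(balance); // 100
    }
}
```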
Of course, a lot of these issues can – in theory – be dealt with using event versioning, event schemas, snapshot versioning etc. But hasn't event sourcing become a burden at that point?
Not really? Yes, you need to design flexibility into your schema so that you can evolve your model aggressively, but at its core it's no different from storing current state - you can still migrate if you have to.
But you also have other levers to play with.
It does, perhaps, require more upfront design capital - you have to think about things like schema lifetimes, and the fact that your book of record accumulates data from multiple revisions of your model.
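One common way to keep that manageable, sketched below with invented event names and no particular event-store library, is to upcast old event revisions into the current shape at read time, so the model only ever handles the latest revision:

```java
public class UpcastingSketch {

    // Two revisions of the same event; V2 added an email field.
    record CustomerRegisteredV1(String customerId, String name) {}
    record CustomerRegisteredV2(String customerId, String name, String email) {}

    // Read-time upcast: old events get a sensible default for the new field.
    static CustomerRegisteredV2 upcast(Object stored) {
        if (stored instanceof CustomerRegisteredV1 v1) {
            return new CustomerRegisteredV2(v1.customerId(), v1.name(), "unknown");
        }
        return (CustomerRegisteredV2) stored;
    }

    public static void main(String[] args) {
        Object oldEvent = new CustomerRegisteredV1("c-1", "Ada");
        System.out.println(upcast(oldEvent)); // the model only ever sees the V2 shape
    }
}
```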
That doesn't mean it's a shoe for all feet. Designing good message schema is an investment. If the consumers of that schema (which in this case really means your model, and the subscribers) don't need to evolve independently, then maybe that investment doesn't make sense.

Related

Eventual consistency - Axon conflict resolver

I'm working on a PoC to evaluate the use of Axon framework for the development of a new application.
My concern is about the eventual consistency with the CQRS pattern since consistency is a requirement for us.
There are a lot of articles and threads about this topic, so I apologize if I'm creating a duplicate thread.
Axon offers a conflict resolver, but I'm not sure I understand how it works.
I found an example in an open source project.
This solution stores the version of the aggregate in the event store and the read model. The client then reads the version from the read model.
What if I have different read models, could there be version conflicts?
How does Axon solve the conflicts?
Thanks
Before we dive into how Axon deals with consistency, there are a few things that I'd like to point out in the context of CQRS as a concept.
There is a lot of misconception around consistency in combination with CQRS. The concept of eventual consistency applies between the different models that you have defined within your application. For example, a Command Model may have changed state recently, but the Query Model doesn't reflect that state yet. The Query Model is eventually consistent with the Command Model. However, the information within that Query Model is still consistent in itself.
More importantly, this allows you to make conscious choices around where consistency is important and where it can be relaxed. Typically, Command Models make decisions in which consistency is important. You'd want to make sure each decision is made with the relevant knowledge of recent changes. That's the purpose of the Aggregate. An Aggregate will always make decisions that are consistent with its state.
I recommend reading up on the Reactive Principles document [1], namely Section V [2].
Then Axon. Axon implements the concepts of DDD and CQRS very strictly. Consistency is sacred within an Aggregate. For example, when using Event Sourcing, the events within an Aggregate's stream are guaranteed to have been generated based on a State that included all previous events in that stream. In other words, event number 9 in the stream was created with the knowledge of events number 0 through 8. Guaranteed.
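A minimal sketch of what that looks like in Axon 4-style code (GiftCard and the command/event names are illustrative, not prescribed, and I'm assuming a recent Axon 4.x where these annotations can sit on record components): the command handler only ever decides against state that was rebuilt from every earlier event in the stream.

```java
import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.modelling.command.AggregateLifecycle;
import org.axonframework.modelling.command.TargetAggregateIdentifier;

public class GiftCard {

    // Illustrative command and event types.
    record RedeemCardCommand(@TargetAggregateIdentifier String cardId, int amount) {}
    record CardRedeemedEvent(String cardId, int amount) {}

    @AggregateIdentifier
    private String cardId;
    private int remainingValue;

    protected GiftCard() {
        // no-arg constructor so the aggregate can be rebuilt from its events
    }

    @CommandHandler
    public void handle(RedeemCardCommand command) {
        // This decision is made against state derived from ALL previous events
        // in this aggregate's stream, which is the consistency guarantee described above.
        if (command.amount() > remainingValue) {
            throw new IllegalStateException("insufficient remaining value");
        }
        AggregateLifecycle.apply(new CardRedeemedEvent(cardId, command.amount()));
    }

    @EventSourcingHandler
    public void on(CardRedeemedEvent event) {
        this.remainingValue -= event.amount();
    }
}
```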
When events are published, this doesn't mean any projections are already up to date. This may take a few milliseconds. Relaxing consistency here allows us to scale our system. The only downside is that a user may execute a command, perform a query and not see the results yet. This is actually much more common in systems than you think. There are numerous ways to prevent this from being a problem. Updating user interfaces in real-time is a powerful way of working with this. Then it doesn't matter which user made the change; they see it practically immediately.
The other way round may pose a challenge. A user observes the system state through a Query. This may (and always will, even without CQRS) provide stale data; the data may have been altered while the user is looking at it. The user decides to make a change. However, in parallel, the information has already been changed. This other change may be such that, had the user known about it, they would never have submitted that Command.
In Axon, you can use Conflict Resolvers to detect these "unseen" parallel actions. You can take the "aggregate sequence" from incoming events and store it with your projection. If a user action results in a Command towards that aggregate, pass the Aggregate Sequence as the Expected Aggregate Version. If the actual Aggregate's version doesn't match (because it has been altered in the meantime), you get to decide whether that is problematic. There is a short explanation in the Reference Guide [3].
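A hedged sketch of the command side of that (the command and field names are invented, and the package locations assume Axon Framework 4): the projection stores the aggregate sequence it last applied, the UI sends it back with the command, and @TargetAggregateVersion marks it as the version the user's decision was based on, so Axon can trigger conflict resolution when the aggregate has since moved on. The ConflictResolver API itself is described in the Reference Guide [3].

```java
import org.axonframework.modelling.command.TargetAggregateIdentifier;
import org.axonframework.modelling.command.TargetAggregateVersion;

public class ChangeShippingAddressCommand {

    @TargetAggregateIdentifier
    private final String orderId;

    // The "aggregate sequence" the read model had seen when the user acted.
    @TargetAggregateVersion
    private final Long expectedVersion;

    private final String newAddress;

    public ChangeShippingAddressCommand(String orderId, Long expectedVersion, String newAddress) {
        this.orderId = orderId;
        this.expectedVersion = expectedVersion;
        this.newAddress = newAddress;
    }

    public String getOrderId() {
        return orderId;
    }

    public Long getExpectedVersion() {
        return expectedVersion;
    }

    public String getNewAddress() {
        return newAddress;
    }
}
```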
I hope this sheds some light on consistency in the context of CQRS and Axon.
[1] https://principles.reactive.foundation
[2] https://principles.reactive.foundation/principles/tailor-consistency.html
[3] https://docs.axoniq.io/reference-guide/axon-framework/axon-framework-commands/modeling/conflict-resolution

Amount of properties per command/event in event sourcing

I'm learning CQRS/event sourcing, and recently I listened to a talk where the speaker said that you should pass as few parameters to an event as possible, in other words make events as tiny as possible. The main reasons given were that it's impossible to change events later without breaking the event history, and that small events are easier to design correctly. But what if, for example, the UI needs a form with 10 fields to create a new aggregate, and the same applies when updating the aggregate? What should I do in such a case? And what if the business later decides to change something, but we have a huge event that updates 10 fields?
The decision is always context-specific and each case deserves its own review of using thin events vs fat events.
The motivation for using thin domain events is to include just enough information that is required to ensure the state transition.
As for fat events, your projections might require a piece of entity state so that the projection itself can avoid containing any logic (a good practice).
For integration, you'd prefer emitting fat events because you rarely know in advance who will consume your event. Still, the content of the event should convey the information related to the meaning of the event itself.
References:
Putting your events on a diet
Patterns for Decoupling in Distributed Systems: Fat Event
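As a small illustration of that trade-off (all names invented): the thin domain event carries just what the owning model needs to replay the state transition, while the fat integration event carries enough surrounding state that external consumers don't have to call back for it.

```java
import java.math.BigDecimal;
import java.time.Instant;

public class EventShapes {

    // Thin domain event: just enough to rebuild the aggregate's state.
    record PriceChanged(String productId, BigDecimal newPrice) {}

    // Fat integration event: enough context for consumers that cannot,
    // or should not, query the owning service for the rest.
    record PriceChangedIntegration(String productId,
                                   String productName,
                                   BigDecimal oldPrice,
                                   BigDecimal newPrice,
                                   String currency,
                                   Instant occurredAt) {}
}
```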
recently I listened to a talk where the speaker said that you should pass as few parameters to an event as possible, in other words make events as tiny as possible.
I'm not convinced that holds up. If you are looking for good ideas about designing events, you should review Greg Young's e-book on versioning.
If you are event sourcing, then you are primarily concerned with ensuring that your stream of events allows you to recreate the state of your domain model. The events themselves should be representations of changes that a domain expert will recognize. If you find yourself trying to invent smaller events just to fit some artificial constraint like "no more than three properties per event" then you are going to end up with data that doesn't really match the way your domain experts think -- which is to say, technical debt.

Event sourcing the whole system is bad

I'm learning a proper microservice architecture using CQRS, MassTransit and different types of storage for the read side. One thing which often comes along with CQRS is event sourcing. I do understand it's not mandatory at all. However, I can't think of why using it on the whole system is really an anti-pattern.
Having a store for all events as a single source of truth can help you build / rebuild a read store on the fly whenever you want.
You are not locked in to any vendor (except for the event store)
For me, the question is more like: is it easier not to start with event sourcing (and still have separate data storage per microservice, e.g. Elasticsearch, MongoDB, etc.) and migrate / provision it whenever needed, or, on the other hand, to start with event sourcing everything so that you don't have to deal with migration later on?
I can't think of why using it on the whole system is really an anti pattern.
I agree -- calling it an "anti pattern" is an overstatement.
The way I'd phrase it: using event sourcing on the whole system isn't cost effective today.
It could be tomorrow, as we get more practice with it, the costs of designing these systems go down, and we learn to extract more benefit from them.
In the mean time - how valuable are the temporal queries that you get from event sourcing? In your core domain, where you get competitive advantage, they could be quite valuable. In places where you are just doing bookkeeping of information provided to you by the outside world? Not so much - you may be getting everything you need out of simpler solutions that only keep track of "now".
I recently published a blog post about this issue. It explains why event sourcing is a persistence strategy and shouldn't be applied at the scale of the whole system.
To summarize it: Event Sourcing forces you to emit an event for every piece of changed data. This can result in very fine grained events. If you use Event Sourcing for inter-microservice communication, you expose those events to the outside world.
In the end you expose your persistence layer, comparable to exposing your (relational) database schema in a CRUD-based persistence strategy.

Should I protect my database from invalid data?

I always tend to "protect" my persistence layer from violations via the service layer. However, I am beginning to wonder if it's really necessary. What's the point in taking the time to make my database robust, building relationships & data integrity, when it never actually comes into play?
For example, consider a User table with a unique constraint on the Email field. I would naturally want to write blocker code in my service layer to ensure the email being added isn't already in the database before attempting to add anything. In the past I have never really seen anything wrong with it; however, as I have been exposed to more & more best practices/design principles, I feel that this approach isn't very DRY.
So, is it correct to always ensure data going to the persistence layer is indeed "valid", or is it more natural to let the invalid data get to the database and handle the error?
Please don't do that.
Implementing even "simple" constraints such as keys is decidedly non-trivial in a concurrent environment. For example, it is not enough to query the database in one step and allow the insertion in another only if the first step returned empty result - what if a concurrent transaction inserted the same value you are trying to insert (and committed) in between your steps one and two? You have a race condition that could lead to duplicated data. Probably the simplest solution for this is to have a global lock to serialize transactions, but then scalability goes out of the window...
Similar considerations exist for other combinations of INSERT / UPDATE / DELETE operations on keys, as well as other kinds of constraints such as foreign keys and even CHECKs in some cases.
DBMSes have devised very clever ways over the decades to be both correct and performant in situations like these, yet allow you to easily define constraints in declarative manner, minimizing the chance for mistakes. And all the applications accessing the same database will automatically benefit from these centralized constraints.
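To make the race concrete with the email example from the question, here is a sketch using JDBC against a hypothetical users table with a UNIQUE constraint on email (note that some drivers report the violation as a plain SQLException with an SQLState in the 23xxx range rather than this subclass):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class UserRepository {

    public boolean addUser(Connection connection, String email) throws SQLException {
        // A prior "SELECT ... WHERE email = ?" check can pass and still lose the race
        // to a concurrent transaction; the UNIQUE constraint is what actually protects you.
        try (PreparedStatement insert =
                     connection.prepareStatement("INSERT INTO users (email) VALUES (?)")) {
            insert.setString(1, email);
            insert.executeUpdate();
            return true;
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            // Translate the constraint violation into a domain-level outcome.
            return false;
        }
    }
}
```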
If you absolutely must choose which layer of code shouldn't validate the data, the database should be your last choice.
So, is it correct to always ensure data going to the persistence layer is indeed "valid" (service layer) or is it more natural to let the invalid data get to the database and handle the error?
Never assume correct data and always validate at the database level, as much as you can.
Whether to also validate in upper layers of code depends on a situation, but in the case of key violations, I'd let the database do the heavy lifting.
Even though there isn't a conclusive answer, I think it's a great question.
First, I am a big proponent of including at least basic validation in the database and letting the database do what it is good at. At minimum, this means foreign keys, NOT NULL where appropriate, strongly typed fields wherever possible (e.g. don't put a text field where an integer belongs), unique constraints, etc. Letting the database handle concurrency is also paramount (as Branko Dimitrijevic pointed out) and transaction atomicity should be owned by the database.
If this is moderately redundant, then so be it. Better too much validation than too little.
However, I am of the opinion that the business tier should be aware of the validation it is performing even if the logic lives in the database.
It may be easier to distinguish between exceptions and validation errors. In most languages, a failed data operation will probably manifest as some sort of exception. Most people (me included) are of the opinion that it is bad to use exceptions for regular program flow, and I would argue that email validation failure (for example) is not an "exceptional" case.
Taking it to a more ridiculous level, imagine hitting the database just to determine if a user had filled out all required fields on a form.
In other words, I'd rather call a method IsEmailValid() and receive a boolean than try to have to determine if the database error which was thrown meant that the email was already in use by someone else.
This approach may also perform better, and avoid annoyances like skipped IDs because an INSERT failed (speaking from a SQL Server perspective).
The logic for validating the email might very well live in a reusable stored procedure if it is more complicated than simply a unique constraint.
And ultimately, that simple unique constraint provides final protection in case the business tier makes a mistake.
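A tiny sketch of that division of labour (names invented): the business tier answers the cheap questions with a boolean, without driving normal flow through database exceptions, while the unique constraint stays in place as the final backstop for the race conditions described above.

```java
import java.util.regex.Pattern;

public class EmailValidator {

    // Deliberately simple structural pattern for illustration only.
    private static final Pattern BASIC_EMAIL =
            Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // Cheap structural check: no database round trip, suitable for form feedback.
    // Uniqueness is still ultimately enforced by the database's UNIQUE constraint;
    // this only decides whether it is worth attempting the insert at all.
    public boolean isEmailValid(String email) {
        return email != null && BASIC_EMAIL.matcher(email).matches();
    }
}
```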
Some validation simply doesn't need to make a database call to succeed, even though the database could easily handle it.
Some validation is more complicated than can be expressed using database constructs/functions alone.
Business rules across applications may differ even against the same (completely valid) data.
Some validation is so critical or expensive that it should happen prior to data access.
Some simple constraints like field type/length can be automated (anything run through an ORM probably has some level of automation available).
Two reasons to do it. The DB may be accessed by another application.
You might make a wee error in your code and put bad data in the DB; because your service layer operates on the assumption that this could never happen, it falls over if you are lucky, with silent data corruption being the worst case.
I've always looked at rules in the DB as backstop for that exceptionally rare occasion when I make a mistake in the code. :)
The thing to remember is that, if you need to, you can always relax a constraint; tightening one after your users have spent a lot of effort entering data will be far more problematic.
Be really wary of that word "never"; in IT, it means much sooner than you wished.

Backing a user interface with a state machine

I am developing a web application with a somewhat complex user interface. It seems like it might be a good idea to back the UI with a corresponding state machine, defining the transitions possible between various states and the corresponding behavior.
The perceived benefits are that the code for controlling the behavior is structured consistently, and that the state of the UI can be persisted and resumed easily.
Can anyone who has tried this lend any insights into this approach? Are there any pitfalls I need to be aware of?
Off the top of my head, these are a bit obvious, but still, since nobody has replied yet:
I'd advise persisting the state of the application server side, indexed by a session variable/user id, for security and flexibility reasons;
interfaces are better modelled by an event-based approach IMHO, but this depends a bit on which layer of the UI you're developing, and also on your language of choice for development. You may be able to store some logic on item triggers and on the items themselves.
By an event-based approach, I'm referring to the technique which some more visually oriented environments (Adobe Flex, Oracle Forms and also HTML, in a somewhat limited fashion) use. In a nutshell, you have triggers (item.on_click, label.on_mouse_over, text_field.on_record_update) which you use to drive the states of the interface.
One very common caveat of this kind of approach (distributed control) is endless loops: you have an item that enables another item, which when enabled fires its own triggers and eventually gets the first item to fire that same trigger again. This is quite often not obvious when developing, but very common to detect when testing.
Some languages/environments offer some protection against the more obvious cases, but this is something to be on the lookout for.
This is probably useful for your approach.
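For what it's worth, here is a minimal sketch of the idea from the question (the states and transitions are invented for illustration): an explicit transition table keeps the allowed moves in one place, makes the current state trivial to persist and resume, and makes accidental trigger loops easier to spot because every permitted transition is written down.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class CheckoutUiStateMachine {

    enum State { CART, SHIPPING, PAYMENT, CONFIRMATION }

    // Declarative table of allowed transitions.
    private static final Map<State, Set<State>> TRANSITIONS = new EnumMap<>(Map.of(
            State.CART, EnumSet.of(State.SHIPPING),
            State.SHIPPING, EnumSet.of(State.CART, State.PAYMENT),
            State.PAYMENT, EnumSet.of(State.SHIPPING, State.CONFIRMATION),
            State.CONFIRMATION, EnumSet.noneOf(State.class)
    ));

    private State current = State.CART;

    public State current() {
        // This value is what you would persist server side (e.g. keyed by session) and resume.
        return current;
    }

    public void transitionTo(State next) {
        if (!TRANSITIONS.get(current).contains(next)) {
            throw new IllegalStateException(current + " -> " + next + " is not an allowed transition");
        }
        current = next;
    }
}
```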

Resources