How do event sourcing and CQRS help to achieve decoupled architecture for microservices.
We can have microservices which own their data and others accessing that data via service even by having traditional means of persistence. Isn't?
Event sourcing and CQRS are not intended to be used to decouple services.
The main goal achived by using CQRS is to improve the performance of a service because you can use different persistence type for writes (command) and for reads (query).
Thanks to that, you can use a higly performance write type persistence like event log to store all events that happen in the service and use a relational model for example for reads where you store the information in the way that you need for your queries.
The way to achieve consistency between the two models is normally by using events generated by the command model that are consumed by the query to update the read model. The drawback of this appproach is the eventual consistency, because the update of the read model does not happen inmediately.
Higly related with cqrs is event sourcing that states that all modifications in the model should be stored like events in an event store. This way you have an historic of all the actions made in the application. The advantage of this is that you have an historic of all changes for audit purposes and that writes are extremely fast. The drawback is that if you want to get the current state you have to replay all the events since the begining.
To solve this, you use cqrs to get the actual state to make queries
Related
I am working in a web app which implements backend in event sourcing. Event sourcing has given us great power to go back in time, run projections to get different types of reports. Also, we can potentially build our database from scratch by replaying the projections if we need to.
We have certain modules which do not give much analytical value by implementing event sourcing in it. For example, a questionnaire creation, which is nothing but a simple form CRUD. We have event sourcing in it, but the only advantage we potentially have from that is to rebuild the forms database from the stored domain events. Or to get values like how much time a user took to make the questionnaire etc.
But still those analytics do not give us much info because the state change in a form is not as valuable as other parts of the system. Like e.g. changing state of a bank account through domain events give us much more information as compared to a changing state of a form CRUD.
How do you guys approach such situations and know if a certain part of the app is good for event sourcing or if it is an overkill?
Whether or not it's overkill is a matter of opinion, but CRUD (or really CUD, since a read isn't a meaningful event) events (e.g. WidgetCreated, WidgetUpdated, WidgetDeleted; especially the WidgetUpdated) can be a sign of an anemic domain.
Assuming that each update is atomic in your DB, you can likely get the same results (a stream of events for other components to consume) by using change data capture (e.g. Debezium for many SQL DBs to put a change feed into a Kafka topic, or some DBs like Azure Cosmos offer a native change feed) to capture changes to records in the DB.
Event sourcing now with CRUD events does give you the flexibility to flesh out the domain model later if requirements change. That requires a sense of how likely (and when...) the requirements are to change in a way that makes richer domain events handy vs. how much effort you're expending in event-sourcing now.
I'm new in ES, and only trying to sort everything in my head. I have heard that ES is actually solving the consistency issue between write and read database (with some delay for sure). But I still do not fully understand how?
If command is coming to domain and aggregate root firing event to update event store, same event is sending to update read side?? But what if message lost, we will have outdated read side.
Is projections the only solution??So instead of updating from event, read side walking through event store and reproducing aggregate (from beginning or from some snapshot). But in such case it's probably breaking some rules as read side should be simple and it should not know about domain. And also usually read side is a separate application so she can't know about aggregate.
For sure we also can use rabbitMQ or some other message broker to not lost messages,and actually I think we need. But I also read that to make it consistent "you can use rabbit or ES", but again how ES can make it consistent by own??
Benjamin is completely right about the purpose of Event Sourcing.
My answer aims to add some more details.
First:
Read models and projections aren't suppose to represent the aggregate state.
Projections are the way for event-sourced systems to build the read model for CQRS. CQRS in essence postulates that write and read models usually serve different purposes and therefore it makes perfect sense to use another model for the read side.
Therefore, you often find multiple projections building different, narrowly purposed models, targeting specific needs for queries.
Second:
By "solving consistency issues" you probably mean that in event-sourced systems each state transition is represented as an event (or multiple events). Therefore, writes are always transactional. The database you choose as your event store should support (could using some library or additional tool) real-time subscription that would allow you to receive new events in your projection, in order. For new projections, it will start reading from the start and eventually come real-time. Subscriptions usually need to keep the current processing position in the global stream of events so when the projection restarts, it starts receiving events from the point which is last known to it.
By doing this, you will guarantee that every state transition in the write model will be reflected in the read model. This is probably what you mean in your original question.
Third:
Now, all those things above imply that you cannot use a message bus (only) to deliver events to projections. Brokers give no ordering guarantees and can deliver one message more than once. Also, message brokers don't keep history so you cannot build new projections at will.
However, it doesn't mean that you can't use brokers at all. Some projections don't require ordering and are idempotent. But the feed for events to publish via a broker is the same subscription, so you get guaranteed delivery and can read past events if necessary.
Fourth:
CQRS doesn't imply separate databases. Sometimes, using CQRS just means that you use some persistence layer for your domain objects, so you read and write aggregates. But for queries, you just query at will, whatever you want. A database view is a technical example of CQRS.
Almost there:
Projections need to have little to no logic, it is true. The main point here is to ensure idempotency, if possible, so projections usually should not use operations to calculate new values based on old values and information from events.
But projections will know about your domain. Everything in your system should know about your domain.
And last:
You can definitely use different databases for write and read models without getting to Event Sourcing. You just need to choose a database that supports a change feed. SQL Server, Postgres, CosmosDb and other databases have such functionality.
P.S. I'd suggest spending some time studying those concepts. I can point to the book repository, it has CQRS and Event Sourcing examples: https://github.com/PacktPublishing/Hands-On-Domain-Driven-Design-with-.NET-Core
I have heard that ES is actually solving the consistency issue between
write and read database
To the best of my knowledge, Event sourcing has NOTHING to do with consistency between read/write to your db. Consistency between read/write has actually more to do with the type of db you are using such as relational which are mostly ACID versus the non-relational db which are often eventual consistency.
ES is not meant for that, instead ES : "Capture all changes to an application state as a sequence of events" Martin Fowler.
ES works like time machine, which allows you to change the state of your application to a specific date time in the past.
When we talk about sourcing events, we have a simple dual write architecture where we can write to database and then write the events to a queue like Kafka. Other downstream systems can read those events and act on/use them accordingly.
But the problem occurs when trying to make both DB and Events in sync as the ordering of these events are required to make sense out of it.
To solve this problem people encourage to use database commit logs as a source of events, and there are tools build around it like Airbnb's Spinal Tap, Redhat's Debezium, Oracle's Golden gate, etc... It solves the problem of consistency, ordering guaranty and all these.
But the problem with using the Database commit log as event source is we are tightly coupling with DB schema. DB schema for a micro-service is exposed, and any breaking changes in DB schema like datatype change or column name change can actually break the downstream systems.
So is using the DB CDC as an event source a good idea?
A talk on this problem and using Debezium for event sourcing
Extending Constantin's answer:
TLDR;
Transaction log tailing/mining should be hidden from others.
It is not strictly an event-stream, as you should not access it directly from other services. It is generally used when transitioning a legacy system gradually to a microservices based. The flow could look like this:
Service A commits a transaction to the DB
A framework or service polls the commit log and maps new commits to Kafka as events
Service B is subscribed to a Kafka stream and consumes events from there, not from the DB
Longer story:
Service B doesn't see that your event is originated from the DB nor it accesses the DB directly. The commit data should be projected into an event. If you change the DB, you should only modify your projection rule to map commits in the new schema to the "old" event format, so consumers must not be changed. (I am not familiar with Debezium, or if it can do this projection).
Your events should be idempotent as publishing an event and committing a transaction
atomically is a problem in a distributed scenario, and tools will guarantee at-least-once-delivery with exactly-once-processing semantics at best, and the exactly-once part is rarer. This is due to an event origin (the transaction log) is not the same as the stream that will be accessed by other services, i.e. it is distributed. And this is still the producer part, the same problem exists with Kafka->consumer channel, but for a different reason. Also, Kafka will not behave like an event store, so what you achieved is a message queue.
I recommend using a dedicated event-store instead if possible, like Greg Young's: https://eventstore.org/. This solves the problem by integrating an event-store and message-broker into a single solution. By storing an event (in JSON) to a stream, you also "publish" it, as consumers are subscribed to this stream. If you want to further decouple the services, you can write projections that map events from one stream to another stream. Your event consuming should be idempotent with this too, but you get an event store that is partitioned by aggregates and is pretty fast to read.
If you want to store the data in the SQL DB too, then listen to these events and insert/update the tables based on them, just do not use your SQL DB as your event store cuz it will be hard to implement it right (failure-proof).
For the ordering part: reading events from one stream will be ordered. Projections that aggregates multiple event streams can only guarantee ordering between events originating from the same stream. It is usually more than enough. (btw you could reorder the messages based on some field on the consumer side if necessary.)
If you are using Event sourcing:
Then the coupling should not exist. The Event store is generic, it doesn't care about the internal state of your Aggregates. You are in the worst case coupled with the internal structure of the Event store itself but this is not specific to a particular Microservice.
If you are not using Event sourcing:
In this case there is a coupling between the internal structure of the Aggregates and the CDC component (that captures the data change and publish the event to an Message queue or similar). In order to limit the effects of this coupling to the Microservice itself, the CDC component should be part of it. In this way when the internal structure of the Aggregates in the Microservice changes then the CDC component is also changed and the outside world doesn't notice. Both changes are deployed at the same time.
So is using the DB CDC as an event source a good idea?
"Is it a good idea?" is a question that is going to depend on your context, the costs and benefits of the different trade offs that you need to make.
That said, it's not an idea that is consistent with the heritage of event sourcing as I learned it.
Event sourcing - the idea that our book of record is a ledger of state changes - has been around a long long time. After all, when we talk about "ledger", we are in fact alluding to those documents written centuries ago that kept track of commerce.
But a lot of the discussion of event sourcing in software is heavily influenced by domain driven design; DDD advocates (among other things) aligning your code concepts with the concepts in the domain you are modeling.
So here's the problem: unless you are in some extreme edge case, your database is probably some general purpose application that you are customizing/configuring to meet your needs. Change data capture is going to be limited by the fact that it is implemented using general purpose mechanisms. So the events that are produced are going to look like general purpose patch documents (here's the diff between before and after).
But if we trying to align our events with our domain concepts (ie, what does this change to our persisted state mean), then patch documents are a step in the wrong direction.
For example, our domain might have multiple "events" that make changes to the same, or very similar, sets of fields in our model. Trying to rediscover the motivation for a change by reverse engineering the diff is kind of a dumb problem to have; especially when we have already fought with the same sort of problem learning user interface design.
In some domains, a general purpose change is good enough. In some contexts, a general purpose change is good enough for now. Horses for courses.
But it's not really the sort of implementation that the "event sourcing" community is talking about.
Besides Constantin Galbenu mentioned CDC component side, you can also do it in event storage side like Kafka stream API.
What is Kafka stream API? Input is read from one or more topics in order to generate output to one or more topics, effectively transforming the input streams to output streams.
After transfer detailed data to abstract data, your DB schema is only bind with the transformation now and can release the tightly relation between DB and subscribers.
If your data schema need to change a lot, maybe you should add a new topic for it.
I was asked to do some exploration in event sourcing. my objective is to create a tiny API layer that satisfies all the traditional CRUD operation. I am now using a package called 'sourced' and trying to play around with it (Using Nodejs).
However, I came to realize that the event sourcing is not quite useful when it is used alone. usually, it is coupled with CQRS.
My understanding of the CQRS is, when the UI sends a write command to the server. the app does some validation towards the data. and saves it in the event store(I am using mongoDB), for example: here is what my event store should look like:
{method:"createAccount",name:"user1", account:1}
{method:"deposit",name:"user1",account: 1 , amount:100}
{method:"deposit",name:"user1",account: 1 , amount:100}
{method:"deposit",name:"user1",account: 1 , amount:100}
{method:"withdraw",name:"user1",account1,amount:250}
It contains all the audit information rather than the eventual status.
however, I am confused how can I handle the read operation. what if I want to read the balance of an account. what exactly will happen?
here are my questions:
If we can not query the event store(database) directly for reading operation, then where should we query? should it be a cache in memory?
If we query the memory. is the eventual status already there or I have to do a replay (or left-fold) operation to calculate the result. for example, the balance of the account 1 is 50.
I found some bloggers talked about 'subscribe' or 'broadcast'. what are they and broadcast to who?
I will be really appreciated for any suggestion and please corret me if my understanding is wrong.
Great question Nick. The concept you are missing is 'Projections'. When an event is persisted you then broadcast the event. You projection code listens for specific events and then do things like update and create a 'read model'. The read model is a version of the end state (usually persisted but can be done in memory).
The nice thing is that you can highly optimise these read models for reading. Say goodbye to complicated and inefficient joins etc.
Becuase the read model is not the source of truth and it is designed specifically for reading, it is ok to have data duplication in it. Just make sure you manage it when appropriate events are received.
For more info check out these articles:
Overview of a Typical CQRS and ES Application **
How to Build a Master Details View when using CQRS and Event Sourcing
Handling Concurrency Conflicts in a CQRS and Event Sourced system
Hope you find these useful.
** The diagram refers to denormalisation where it should be talking about projections.
You can query the event store. The actual method of querying is specific to every implementation but in general you can poll for events or subscribe and be notified when a new event is persisted.
The event store is just a persistence for the write side that guaranties a strong consistency for the write operations and an eventual consistency for the read operations. In order to "understand" something from the events you need to project those events to a read-model then query the read-model. For example you can have a read-model that contain the current balance for every account as a MongoDB collection.
I am quite new in context of Micro-service architecture and reading this post : http://microservices.io/patterns/data/event-sourcing.html to get familiar with Event sourcing and data storage in Microservice architecture.
I have read many documents about 3 important aspect of system :
Using event sourcing instead of a simply shared DB and ORM and
row update
Events are JAVA objects.
In case of saving data permanently
, we need to use DB (either relational or noSQL)
Here are my questions :
How database comes along with event sourcing? I have read CQRS
pattern, but I can not understand how CQRS pattern is related to
event store and event objects ?
Can any body provide me a
complete picture and set of operations happens with all players to
gather: CQRS pattern , Event sourcing (including event storage
module) and finally different microservices?
In a system
composed of many microservices, should we have one event storage or
each microservice has its own ? or both possible ?
same
question about CQRS. This pattern is implemented in all
microservices or only in one ?
Finally, in case of using
microservice architecture, it is mandatory to have only one DB or
each Microserivce should have its own ?
As you can see, I have understood all small pieces of game , but I can not relate them together to compose a whole image. Specially relevance between CQRS and event sourcing and storing data in DB.
I read many articles for example :
https://ookami86.github.io/event-sourcing-in-practice/
https://msdn.microsoft.com/en-us/library/jj591577.aspx
But in all of them small players are discussed. Even a hand drawing piece of image will be appreciated.
How database comes along with event sourcing? I have read CQRS pattern, but I can not understand how CQRS pattern is related to event store and event objects ?
"Query" part of CQRS instructs you how to create a projection of events, which is applicable in some "bounded context", where the database could be used as a means to persist that projection. "Command" part allows you to isolate data transformation logic and decouple it from the "query" and "persistence" aspects of your app. To simply put it - you just project your event stream into the database in many ways (projection could be relational as well), depending on the task. In this model "query" and "command" have their own way of projecting and storing events data, optimised for the needs of that specific part of the application. Same data will be stored in events and in projections, this will allow achieving simplicity and loose coupling among subdomains (bounded contexts, microservices).
Can any body provide me a complete picture and set of operations happens with all players to gather: CQRS pattern , Event sourcing (including event storage module) and finally different microservices?
Have you seen Greg Young's attempt to provide simplest possible implementation? If you still confused, consider creating more specific question about his example.
In a system composed of many microservices, should we have one event storage or each microservice has its own ? or both possible ?
It is usually one common event storage, but there definitely could be some exceptions, edge cases where you really will need multiple storages for different microservices here and there. It all depends on the business case. If you not sure - most likely you just need a single storage for now.
same question about CQRS. This pattern is implemented in all microservices or only in one ?
It could be implemented in most performance-demanding microservices. It all depends on how complex your implementation becomes when you are introducing CQRS into it. If it gets simpler - why not implement it everywhere? But if people in your team become more and more confused by the need to perform more explicit synchronisation between commands and queries parts - maybe cqrs is too much for you. It all depends on your team, on your domain ... there is no single simple answer, unfortunately.
Finally, in case of using microservice architecture, it is mandatory to have only one DB or each Microservice should have its own ?
If same microservices sharing same tables - this is usually considered as an antipattern, as it increases coupling, the system becomes more fragile. You can still share the same database, but there should be no shared tables. Also, tables from one microservice better not have FK's to tables in another microservice. Same reason - to reduce coupling.
PS: consider not to ask coarse-grained questions, as it will be harder to get a response from people. Several smaller, more specific questions will have better chance to be answered.