How to insert static data in event sourcing? - microservices

We’re migrating from a monolithic application to microservices, and we use event sourcing and CQRS.
Each service has its own read models. When a service needs to insert data, it fires an event. The services that use that data then update their read models.
The challenge I'm facing now is how to deal with static data. Back when we had a monolithic application, we had created some database scripts to insert these into the database.
Now that every service has its own read model how should we insert this static data?
My database is postgresql.

There are a few possibilities which come to mind.
If the static data is just to enable further normalization in a relational DB, since CQRS/ES tends to denormalize, you may well be able to do without it and just include the denormalized data in your events.
Alternatively, you can provide the static data as its own stream of events (note that this stream will be fairly "dry", with very few events flowing). Read models that need it can consume that stream and store the data in their own DB.
If the data is exceptionally static and all your services are implemented in the same language, it might also make sense to build it into a library.
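As a rough sketch of the second option, a read model could replay a reference-data stream into its own table. The event shape (`CountryDefined`), the `country` table, and the seed events are all hypothetical, and SQLite stands in for PostgreSQL so the example is self-contained (in PostgreSQL the upsert would be written as `INSERT ... ON CONFLICT (code) DO UPDATE`).

```python
import sqlite3

# Hypothetical reference-data stream; in practice these events would come
# from the event store or the message bus, not from a literal list.
SEED_EVENTS = [
    {"type": "CountryDefined", "code": "NL", "name": "Netherlands"},
    {"type": "CountryDefined", "code": "DE", "name": "Germany"},
    {"type": "CountryDefined", "code": "NL", "name": "Kingdom of the Netherlands"},
]


def apply_event(conn, event):
    """Project one reference-data event into the local read model."""
    if event["type"] == "CountryDefined":
        # Upsert so that replaying the stream is idempotent.
        conn.execute(
            "INSERT OR REPLACE INTO country (code, name) VALUES (?, ?)",
            (event["code"], event["name"]),
        )


def build_read_model(events):
    """Create the local table and replay the whole stream into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT NOT NULL)")
    for event in events:
        apply_event(conn, event)
    conn.commit()
    return conn


conn = build_read_model(SEED_EVENTS)
rows = conn.execute("SELECT code, name FROM country ORDER BY code").fetchall()
```

Because the projection is an upsert, a new service can bootstrap its static data simply by replaying the stream from the beginning.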

Related

Migrating an asynchronous business flow to an event-driven system

In an effort to redesign an asynchronous, flow-based functional service into an event-driven one, we have come up with changes to different parts of the system. The service receives various statuses from external services through the API, which does computations and persists the result into the data store. The core logic has now been moved out of the API by introducing a queue (Kafka). Similarly, the query functionality is provided through another interface (API) fronted by a web UI. With this, the command and query sides are separated. See the diagram below.
I have a few questions about the approach:
Is it right to have the query API (read) service & the event-complete-handler (write) operate on the same database with both dependent on the DB schema? Or is it better to have the query-api read from the replica DB?
The core-business-logic, at the end of computation, writes only to database and not to db+Kafka in a single transaction. Persisting to the database is handled by the event-complete-handler. Is this approach better?
Say in the future, if the core-business-logic needs to query the database to do the computation on every event, can it directly read from the database? Again, does it not create DB schema dependency between the services?
Is it right to have the query API (read) service & the event-complete-handler (write) operate on the same database with both dependent on the DB schema? Or is it better to have the query-api read from the replica DB?
"Right" is a loaded term. The idea behind CQRS is that the pattern can allow you to separate commands and queries so that your system can be distributed and scaled out. Typically they would be using different databases in a SOA/Microservice architecture. One service would process the command which produces an event on the service bus. Query handlers would listen to this event to change their data for querying.
For example:
A service which processes the CreateWidgetCommand would produce an event onto the bus with the properties of the command.
Any query services which are interested in widgets for producing their data views would subscribe to this event type.
When the event is produced, the subscribed query handlers will consume the event and update their respective databases.
When the query is invoked, they interrogate their own database.
This means you could, in theory, make the command handler as simple as throwing the event onto the bus.
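The flow above can be sketched with a toy in-memory bus. Everything here is illustrative: `Bus`, `WidgetListView`, and the `WidgetCreated` event name are hypothetical, and a real bus (Kafka, RabbitMQ, and so on) would deliver events asynchronously rather than by direct function call.

```python
from collections import defaultdict


class Bus:
    """Toy synchronous stand-in for a message bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)


class WidgetListView:
    """Query side: owns its own 'database' (a dict here)."""

    def __init__(self):
        self.rows = {}

    def on_widget_created(self, event):
        self.rows[event["widget_id"]] = event["name"]

    def names(self):
        # The query only interrogates the view's own data.
        return sorted(self.rows.values())


def handle_create_widget(bus, command):
    """Command side: as simple as putting the event onto the bus."""
    bus.publish("WidgetCreated", dict(command))


bus = Bus()
view = WidgetListView()
bus.subscribe("WidgetCreated", view.on_widget_created)

handle_create_widget(bus, {"widget_id": 1, "name": "sprocket"})
handle_create_widget(bus, {"widget_id": 2, "name": "gear"})
```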
The core-business-logic, at the end of computation, writes only to database and not to db+Kafka in a single transaction. Persisting to the database is handled by the event-complete-handler. Is this approach better?
No. If your question is about the transactionality of distributed systems, you cannot rely on traditional transactions, since any command may affect any number of distributed data stores. Transactionality in distributed systems is often handled with a compensating transaction, where you code the steps to reverse the mutations made from consuming the bus messages.
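A minimal sketch of the compensating-transaction idea, assuming each step can be paired with an action that undoes it. The `run_with_compensation` helper and the reserve/charge steps are hypothetical; in a real system each step would be a call to a different service and the compensations themselves could fail and need retrying.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, run the compensations of the steps that already
    completed, in reverse order, then re-raise the original error.
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        raise


inventory = {"widget": 5}
ledger = []


def reserve():
    inventory["widget"] -= 1


def release():
    inventory["widget"] += 1


def charge():
    raise RuntimeError("payment declined")


def refund():
    ledger.append("refund")


try:
    run_with_compensation([(reserve, release), (charge, refund)])
    outcome = "committed"
except RuntimeError:
    outcome = "compensated"
```

Here the reservation is released because the charge failed, while `refund` is never called because the charge itself never completed.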
Say in the future, if the core-business-logic needs to query the database to do the computation on every event, can it directly read from the database? Again, does it not create DB schema dependency between the services?
If you follow the advice in the first response, the approach here should be obvious. Each distinct query is served from its own database, which is kept "eventually consistent" by consuming events from the bus.
Typically these architectures have major complexity downsides, especially if you are concerned with consistency and transactionality.
People don't generally implement this type of architecture unless there is a specific need.
You can however design your code around CQRS and DDD so that in the future, transitioning to this type of architecture can be relatively painless.
The topic of DDD is too dense for this answer. I encourage you to do some independent learning.

How to implement Event sourcing and a database in a microservice architecture?

I have been learning lately about microservices architecture and its features.
In this source it appears that event sourcing is replacing a database; however, it is later stated:
The event store is difficult to query since it requires typical queries to reconstruct the state of the business entities. That is likely to be complex and inefficient. As a result, the application must use Command Query Responsibility Segregation (CQRS) to implement queries.
In the CQRS Page the author seems to describe a singular database that listens to all events and reconstructs itself.
My question(s) is:
What is actually needed to implement event sourcing with a queryable database? particularly:
Where is the events database? Where is the queryable database? Do I need to have multiple event stores for every service, or can I store events in a message broker like Kafka? Is the CQRS database actually one "whole" database that collects all the events? And how can all of this scale?
I'm sorry if I'm not clear with my question, I am very confused myself. I guess I'm looking for a full example architecture of how things will look in the grand picture.
Where is the queryable database?
I'm guessing this is the most useful starting point, because it will be most familiar. The queryable database is in the same place that your this-is-the-entire-database was when you weren't doing event sourcing.
That could be a database exclusively to support this microservice, or it could be a database that is shared by several microservices, with some part of the schema where this microservice has exclusive write authority. Another way of thinking about this: the microservices are using different logical databases, which might be physically deployed together.
Where is the events database?
Same general idea - you can have one events database per microservice; or you could have several different microservices sharing the same database. Again, you have partitioning of authority, and the same logical vs physical separation to consider.
What changes with the introduction of events and CQRS is that the query/reporting database no longer stores the authoritative copy of the information that is used by the microservice. The authoritative information lives in the event store, and the query/reporting database acts more like a cache.
Our command handlers will typically load information only from the authoritative store (aka the events); that's the data that we lock if we are processing commands concurrently.
We copy information that is stored in the events into the query/reporting database(s). Depending on our needs, that can be done synchronously by the command handlers, but it is more common to use background batch processing to do that work, meaning that the data in the reporting database will often be a little bit stale.
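The background copy step might look roughly like this: a projector remembers how far into the event log it has read and catches up in batches. The `Projector` class, the event fields (`account`, `amount`), and the list used as an event log are assumptions made purely for illustration.

```python
class Projector:
    """Copies information from the event log into a reporting model."""

    def __init__(self):
        self.checkpoint = 0  # number of log entries already applied
        self.balances = {}   # stand-in for the query/reporting database

    def catch_up(self, log):
        """Apply every event appended since the last run."""
        for event in log[self.checkpoint:]:
            account = event["account"]
            self.balances[account] = self.balances.get(account, 0) + event["amount"]
        self.checkpoint = len(log)


log = [{"account": "a", "amount": 10}]  # the authoritative store
projector = Projector()
projector.catch_up(log)

# A command handler appends a new event; the reporting side has not run yet.
log.append({"account": "a", "amount": -3})
stale = projector.balances["a"]

projector.catch_up(log)
fresh = projector.balances["a"]
```

Between the append and the next `catch_up`, the reporting model still shows the old balance, which is the "little bit stale" window described above.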
can I store events in a message broker like Kafka?
Current consensus is that Kafka cannot reliably be used for event sourcing as understood by the CQRS community.
https://issues.apache.org/jira/browse/KAFKA-2260
https://cwiki.apache.org/confluence/display/KAFKA/KIP-27+-+Conditional+Publish
Roughly, the problem is this: when you have two processes with the authority to write events, how do you ensure that they don't introduce inconsistencies? With event stores we can use locks, or conditional writes (aka compare and swap), to ensure that nobody came along and snuck in a few extra events that might change the events we are writing.
With Kafka, there doesn't seem to be a mechanism that supports prevention, so you need to lean more into apologies, or something.
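To make the conditional-write idea concrete, here is a small in-memory sketch. `InMemoryEventStore` and `ConcurrencyError` are hypothetical names; a real event store would have to make the version check and the append atomic (for example with a unique constraint on stream and version), which this single-threaded toy does not attempt.

```python
class ConcurrencyError(Exception):
    """Raised when the stream changed since the writer last read it."""


class InMemoryEventStore:
    def __init__(self):
        self._streams = {}

    def load(self, stream_id):
        return list(self._streams.get(stream_id, []))

    def append(self, stream_id, events, expected_version):
        """Append only if nobody else has written since expected_version."""
        stream = self._streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}"
            )
        stream.extend(events)
        return len(stream)


store = InMemoryEventStore()

# Two writers both read the (empty) stream, so both believe the version is 0.
version_seen_by_a = len(store.load("order-1"))
version_seen_by_b = len(store.load("order-1"))

store.append("order-1", [{"type": "OrderPlaced"}], version_seen_by_a)

try:
    store.append("order-1", [{"type": "OrderCancelled"}], version_seen_by_b)
    conflict = False
except ConcurrencyError:
    conflict = True  # writer B must reload, re-decide, and retry
```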
the CQRS database actually is one "whole" database that collects all the events?
Logically? No. But you certainly can combine them physically into the same appliance. For example, message-db is "just" a postgres schema with some tables, functions, and so on. You certainly could combine that with the tables you use for queries and reports.
I'm looking for a full example architecture of how things will look in the grand picture.
The materials published by Greg Young in 2010 might be a decent starting point.
Event sourcing does not replace the DB. It has benefits and challenges, so choose it wisely. If you are not comfortable with it, don't choose it. You can implement a microservice style without event sourcing.
Queryable DB - A simple solution is to implement the CQRS pattern and keep your query DB in sync with the event-source DB.
The event DB should live with the owning service; for example, if you are keeping events about orders, it should be in the Order service. (Other services can hold a replica of the same data.)
You may use Kafka as intermediate storage for events, but not as the final one.
CQRS is not about one DB. It is a pattern in which we use two DB models: one for commands and another for queries.
If you understand Java, please refer to the book "Microservices Patterns" by Chris Richardson; if you come from C# or the Microsoft technology stack, you may refer to "https://github.com/dotnet-architecture/eShopOnAzure".

How to sync data between databases (each database for each instance of a service) in Microservices?

If each instance of a service has a separate database in a microservices architecture, how can we keep the data synced? For instance, if instance#1 serves a request and stores data in its database db#1, and another request on instance#2 wants the data that was inserted into db#1 through instance#1, how can the database db#2 of instance#2 get the data from the database db#1 of instance#1? I think z-scaling is the solution here!
The microservice architecture uses a pattern called 'Eventual consistency'. Like you described, newly inserted data won't be directly available in all databases. You can read more about it here
That being said, the CQRS pattern is a popular way to solve the data distribution / eventual consistency problem.
By using a message broker / bus, you can publish so-called 'events' on a queue.
Microservices interested in changes to certain entities can subscribe to those events and save the data in their own database.
This enables loosely coupled microservices, and the data necessary for certain entities is stored in the same database. Data duplication is OK, since we use eventual consistency to make sure that (eventually) everything is in sync across all microservices.
More information about the CQRS pattern using microservices can be found here
Here's a more practical example of something I'm working on right now. The language is in Dutch, but the flow should be self-explanatory:
Hope this helps!
I suggest reading up on the following topics: CQRS, microservices, eventual consistency and message brokers (RabbitMQ, Kafka, etc.)

Microservice cross-db referential integrity

We have a database that manages codes, such as a list of valid currencies, a list of country codes, etc (hereinafter known as CodesDB).
We also have multiple microservices that in a monolithic app + database would have foreign key constraints to rows in tables in the CodesDB.
When a microservice receives a request to modify data, what are my options for ensuring the codes passed in the request are valid?
I am currently leaning towards having the CodesDB microservice post an event onto a service bus announcing when a code is added or modified - and then each other microservice interested in that type of code (country / currency / etc) can then issue an API request to the CodeDB microservice to grab the state it needs and reflect the changes in its own local DB. That way we get referential integrity within each microservice DB.
Is this the correct approach? Are there any other recommended approaches?
Asynchronous event-based notification is a pattern commonly used in the microservices world for ensuring eventual consistency. Depending on how strict your consistency requirements are, you may have to add further checks.
Other possible approaches are:
Read-only data stores using materialized views. This is a form of the CQRS pattern where data from multiple services is stored in denormalized form in a read-only data store. The data is updated asynchronously using the approach mentioned above. Consumers get fast access to the data without having to query multiple services.
Caching - You could also use a distributed or replicated cache, depending on your performance and consistency requirements.
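The notify-then-fetch approach from the question could be sketched as follows. The `CurrencyCodes` class, the `CodeChanged` event, and the `fetch_code` callable (standing in for an API request to the CodesDB microservice) are all hypothetical.

```python
class CurrencyCodes:
    """Local copy of the currency codes that this microservice validates against."""

    def __init__(self, fetch_code):
        # fetch_code is a hypothetical callable standing in for an API
        # request to the CodesDB microservice.
        self._fetch_code = fetch_code
        self._local = {}

    def on_code_changed(self, event):
        """Bus handler: the event only announces a change, so fetch the state."""
        code = event["code"]
        self._local[code] = self._fetch_code(code)

    def is_valid(self, code):
        """Validate a request against local data only, with no remote call."""
        return code in self._local


# Stand-in for the CodesDB service's own data.
codes_db = {"EUR": {"code": "EUR", "name": "Euro"}}

codes = CurrencyCodes(fetch_code=codes_db.__getitem__)
codes.on_code_changed({"type": "CodeChanged", "code": "EUR"})

eur_ok = codes.is_valid("EUR")
xxx_ok = codes.is_valid("XXX")
```

Until the event for a newly added code has been processed, requests using that code would be rejected, which is the eventual-consistency trade-off mentioned above.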

Example microservice app with CQRS and Event Sourcing

I'm planning to create a simple microservice app (set and get appointments) with CQRS and Event Sourcing but I'm not sure if I'm getting everything correctly. Here's the plan:
docker container: public delivery app with REST endpoints for getting and setting appointments. The endpoints for setting data trigger a RabbitMQ event (async); the endpoints for getting data call the command service (sync).
docker container: for the command service with connection to a SQL database for setting (and editing) appointments. It's listening to the RabbitMQ event of the main app. A change doesn't overwrite the data but creates a new entry with a new version. When data has changed it also fires an event to sync the new data to the query service.
docker container: the SQL database for the command service.
docker container: the query service with connection to a MongoDB. It's listening for changes in the command service to update its database. It's possible for the main app to call for data but not with REST but with ??
docker container: an event sourcing service to listen to all commands and storing them in a MongoDB.
docker container: the event MongoDB.
Here are a couple of questions I don't get:
Let's say there is one appointment in the command database and it already got synced to the query service. Now there is a call for changing the title of this appointment. So the command service is not performing an UPDATE but an INSERT with the same id but a new version number. What is it doing afterwards? Reading the new data from the SQL and triggering an event with it? The query service is listening and storing the same data in its MongoDB? Is it overwriting the old data or also creating a new entry with a version? That seems to be quite redundant? Do I in fact really need the SQL database here?
How can the main app call for data from the query service if one doesn't want to use REST?
Because it stores all commands in the event DB (6. docker container) it is possible to restore every state by running all commands again in order. Is that "event sourcing"? Or is it "event sourcing" to not change the data in the SQL but create a new version for each change? I'm confused about what exactly event sourcing is and where to apply it. Do I really need the 5. (and 6.) docker container for event sourcing?
When a client wants to change something but afterwards also show the changed data, the only way I see is to trigger the change and then wait (let's say with polling) for the query service to have that data. What's a good way to achieve that? Maybe checking for the existence of the future version number?
Is this whole structure a reasonable architecture or am I completely missing something?
Sorry, a lot of questions but thanks for any help!
Let’s take this one first.
Is this whole structure a reasonable architecture or am I completely
missing something?
Nice architecture plan! I know it feels like there are a lot of moving pieces, but having lots of small pieces instead of one big one is what makes this my favorite pattern.
What is it doing afterwards? Reading the new data from the SQL and
triggering an event with it? The query service is listening and
storing the same data in its MongoDB? Is it overwriting the old data
or also creating a new entry with a version? That seems to be quite
redundant? Do I in fact really need the SQL database here?
There are 2 logical databases (which can be in the same physical database but for scaling reasons it's best if they are not) in CQRS – the domain model and the read model. These are very different structures. The domain model is stored as in any CRUD app with third normal form, etc. The read model is meant to make data reads blazing fast by custom designing tables that match the data a view needs. There will be a lot of data duplication in these tables. The idea is that it’s more responsive to have a table for each view and update that table when the domain model changes because there’s nobody sitting at a keyboard waiting for the view to render so it’s OK for the view model data generation to take a little longer. This results in some wasted CPU cycles because you could update the view model several times before anyone asked for that view, but that’s OK since we were really using up idle time anyway.
When a command updates an aggregate and persists it to the DB, it generates a message for the view side of CQRS to update the view. There are 2 ways to do this. The first is to send a message saying “aggregate 83483 needs to be updated” and the view model requeries everything it needs from the domain model and updates the view model. The other approach is to send a message saying “aggregate 83483 was updated to have the following values: …” and the read side can update its tables without having to query. The first approach requires fewer message types but more querying, while the second is the opposite. You can mix and match these two approaches in the same system.
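The two message styles can be contrasted in a few lines. The dictionaries standing in for the domain and read databases, the handler names, and the message shapes are assumptions for illustration only.

```python
# Stand-ins for the two logical databases.
domain_db = {83483: {"title": "Dentist", "room": "2B"}}
read_db = {}


def on_aggregate_changed(message):
    """Approach 1: the message only names the aggregate; requery the domain model."""
    aggregate_id = message["aggregate_id"]
    read_db[aggregate_id] = dict(domain_db[aggregate_id])


def on_aggregate_updated(message):
    """Approach 2: the message carries the new values; no query needed."""
    row = read_db.setdefault(message["aggregate_id"], {})
    row.update(message["values"])


on_aggregate_changed({"aggregate_id": 83483})
after_first = dict(read_db[83483])

on_aggregate_updated(
    {"aggregate_id": 83483, "values": {"title": "Dentist (moved)"}}
)
after_second = dict(read_db[83483])
```

The first handler needs read access to the domain model but works for any kind of change; the second is self-sufficient but needs a message type (or payload) for each kind of change.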
Since the read side has very different table structures, you need both databases. On the read side, unless you want the user to be able to see old versions of the appointments, you only have to store the current state of the view so just update existing data. On the command side, keeping historical state using a version number is a good idea, but can make db size grow.
How can the main app call for data from the query service if one doesn't
want to use REST?
How the request gets to the query side is unimportant, so you can use REST, postback, GraphQL or whatever.
Is that "event sourcing"?
Event Sourcing is when you persist all changes made to all entities. If the entities are small enough you can persist all properties, but in general events only have changes. Then to get current state you add up all those changes to see what your entities look like at a certain point in time. It has nothing to do with the read model – that’s CQRS. Note that events are not the request from the user to make a change, that’s a message which then is used to create a command. An event is a record of all fields that changed as a result of the command. That’s an important distinction because you don’t want to re-run all that business logic when rehydrating an entity or aggregate.
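As a small illustration of "adding up" changes, the sketch below folds a list of change events into the state of one appointment. The event shape (`version` plus a `changes` dict) is an assumption; note that rehydration only merges recorded changes and never re-runs business logic.

```python
from functools import reduce

# Each event records only the fields that changed as a result of a command.
events = [
    {"version": 1, "changes": {"title": "Dentist", "start": "09:00"}},
    {"version": 2, "changes": {"title": "Dentist (moved)"}},
    {"version": 3, "changes": {"start": "10:30"}},
]


def apply(state, event):
    """Merge one event's changes into the state; no business rules involved."""
    return {**state, **event["changes"], "version": event["version"]}


def rehydrate(events, up_to_version=None):
    """Rebuild the entity as of a given version (or the latest one)."""
    selected = [
        e for e in events
        if up_to_version is None or e["version"] <= up_to_version
    ]
    return reduce(apply, selected, {})


current = rehydrate(events)
as_of_v2 = rehydrate(events, up_to_version=2)
```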
When a client wants to change something but afterwards also show the
changed data, the only way I see is to trigger the change and then wait
(let's say with polling) for the query service to have that data.
What's a good way to achieve that? Maybe checking for the existence of
the future version number?
Showing historical data is a bit sticky. I would push back on this requirement if you can, but sometimes it’s necessary. If you must do it, take the standard read model approach and save all changes to a view model table. If the circumstances are right you can cheat and read historical data directly from the domain model tables, but that’s breaking a CQRS rule. This is important because one of the advantages of CQRS is its scalability. You can scale the read side as much as you want if each read instance maintains its own read database, but having to read from the domain model will ruin this. This is situation dependent so you’ll have to decide on your own, but the best course of action is to try to get that requirement removed.
In terms of timing, CQRS is all about eventual consistency. The data changes may not show up on the read side for a while (typically fractions of a second but that's enough to cause problems). If you must show new and old data, you can poll and wait for the proper version number to appear, which is ugly. There are other alternatives involving result queues in Rabbit, but they are even uglier.
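The polling option might be sketched like this, assuming the command side returns the new version number to the client. `wait_for_version` and the simulated `read_version` function are hypothetical; in practice `read_version` would be a call to the query service.

```python
import time


def wait_for_version(read_version, expected_version, timeout=2.0, interval=0.05):
    """Poll the read side until it reports at least expected_version.

    Returns True if the version appeared before the timeout, else False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_version() >= expected_version:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated read side that catches up on the third poll.
_versions = iter([1, 1, 2])
_latest = {"version": 0}


def read_version():
    _latest["version"] = next(_versions, _latest["version"])
    return _latest["version"]


caught_up = wait_for_version(read_version, expected_version=2, interval=0.01)
timed_out = not wait_for_version(
    lambda: 1, expected_version=2, timeout=0.05, interval=0.01
)
```

The timeout matters: if the read side is down or lagging badly, the client needs a fallback other than waiting forever.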
