Implementing CQRS / ES the proper way - microservices

Recently I'm looking forward to implement the CQRS / ES pattern with Event sourcing in my microservices.
I've been reading for these patterns, but I have some questions that I couldn't find an answer anywhere:
When doing CQRS / ES, should each microservice have its own local
database anymore (Within microservice)?
I know that there will be an event store for writes, and a read-only projection database and i totally understand their purpose, but do microservices need
their own local database for any reason? (Advantages / disadvantages)
Example: Order microservice could have local orders database, item service an items local database etc...apart from the Event source DB and projections database implemented.
How to validate if some data exists in a microservice before
actually issuing a command?
Let's say i want to make a new order, so i assume first I have to
check if that item is still in stock, then perform the other
operation/s.
However, if i want to check if an item is still in stock, where do i
query that data, will it be the projection (read-only) database, or
a local database that each microservice has?
I've read many articles about CQRS / ES at this point, but most of them just explain the concept rather than actually diving into real-life scenarios / explaining how to implement it. I would appreciate if you had any recommendations.
Much appreciated

In general, when dealing with microservices, it's recommended (regardless of whether or not you're doing CQRS/ES) that no two microservices use the same database, or at the very least that no two microservices be writing to the same database. This allows each microservice to control its schema, which only needs to change if the microservice needs it to. One other advantage of this is that the database becomes entirely encapsulated within the service: it's purely an implementation detail.
It's entirely possible that a microservice implementing a read-model might not have a database: it might be able to keep all state in memory (an example might be a read-model which exposes metrics for your monitoring infrastructure), or it might simply be translating events from the write-model into commands to another service (so all of its state is just its position in the event stream).
if i want to check if an item is still in stock, where do i query that data, will it be the projection (read-only) database, or a local database that each microservice has?
In an event-sourced system, every view that's not the stream of events is a projection. So, depending on your requirements, your service can query another service or maintain its own view based on the events.
Note that at any given instant there may exist an event which has been published to the event stream (i.e. it has indisputably happened) but for which there also exists a projection which has not processed the event: the projections are eventually consistent with the event stream. So any check of whether an item is in stock will only tell you that the item was in stock at some point in the past (never mind, to use Greg Young's example, that no in-stock data can guarantee that nothing's been stolen from the warehouse unless the thieves happened to have the decency to update the count as they walked out with their loot). The nanosecond after your query, it might receive word of an event which makes it out-of-stock before you placed your order.
Accordingly, it may just be worth sending a command and letting it get reject your order if the item is not in stock. The write-side (which is the more strongly consistent part of the system, though it should be remembered that in many cases, one component's events are another component's commands) is under no obligation to accept every command; "command" in this context really means "polite request to publish events to the event stream which are conformant with my desired state of the universe".

Related

Is it possible to replicate tables from multiple databases in Google Cloud?

The company that I work at uses a microservices architecture with the 'database per service' pattern. This pattern makes it harder to query based on data from multiple services, since each service has its own database. Imagine a service for managing your products and one for managing stock. You would have to somehow combine the data from both services to query for products based on stock.
I know that event sourcing and API composition are potential solutions to the problem, but I was wondering if it is possible to continuously replicate specific tables from the product and stock databases based on database transaction logs. Wouldn't this be much simpler than say implementing an event based solution like event sourcing? One service that I am working with contains a lot of domain events, which would make implementing and maintaining event-based solution rather complex.
Another reason for why I am considering to look at the problem from a different angle is that there is a lot of data. In-memory joins with say API composition will most likely be slow.
To sum it all up, I would like to know if it is possible to continuously replicate specific tables from different databases into one database.
The technologies that my company uses are primarily Spring Framework and PostgreSQL.
I would step back and ask why you have microservices (including why you have multiple databases). This is because it's quite easy to make choices that are superficially easy but which achieve that ease by negating the reason you had the microservices to begin with, and in such a situation, it may in fact be easier to just not do microservices.
For example, you might be doing microservices because you want to be able to have the team maintaining your product service be able to make changes without coordinating with the stock service or vice versa. By setting up a direct replication of a table from service A's database into service B's database, you essentially require many changes service A might want to make to that table to be coordinated with service B. It's perhaps less operationally coupled than unifying the services into a monolith, but in terms of developer velocity, you're giving up a fair amount.
Alternatively, if the rationale is to allow one service to be down (failures, maintenance, releases: doesn't matter) without taking the others down, a replication which guarantees strong consistency implies that taking service B's database down prevents service A from updating its database (because if you allowed service A to update its database in that situation, you couldn't have strong consistency).
Rather than direct replication, it might make sense to use change data capture (e.g. with Debezium) to publish a stream of changes from the transaction logs (e.g. to Kafka). The critical difference from logical replication is that the consumer can, for instance, choose to ignore updates to columns it doesn't care about: the stock service might include details like where things are stocked in a warehouse, for instance, which is data you don't need for answering a query like "show me the products in this category which are in stock". This can be a nice middle ground between going full event-sourcing and other approaches.

How to implement Event sourcing and a database in a microservice architecture?

I have been learning lately about microservices architecture and it's features.
in this source it appears that event sourcing is replacing a database, however, it is later stated:
The event store is difficult to query since it requires typical queries to reconstruct the state of the business entities. That is likely to be complex and inefficient. As a result, the application must use Command Query Responsibility Segregation (CQRS) to implement queries.
In the CQRS Page the author seems to describe a singular database that listens to all events and reconstructs itself.
My question(s) is:
What is actually needed to implement event sourcing with a queryable database? particularly:
Where is the events database? Where is the queryable database? Do I need to have multiple event stores for every service or can I store events in a message broker like Kafka? is the CQRS database actually is one "whole" database that collects all the events? And how can all of this scale?
I'm sorry if I'm not clear with my question, I am very confused myself. I guess I'm looking for a full example architecture of how things will look in the grand picture.
Where is the queryable database?
I'm guessing this is the most useful starting point, because it will be most familiar. The queryable database is in the same place that your this-is-the-entire-database was when you weren't doing event sourcing.
That could be a database exclusively to support this microservice, or it could be a database that is shared by several microservices, with some part of the schema where this microservice has exclusive write authority. Another way of thinking about this: the microservices are using different logical databases, which might be physically deployed together.
Where is the events database?
Same general idea - you can have one events database per microservice; or you could have several different microservices sharing the same database. Again, you have partitioning of authority, and the same logical vs physical separation to consider.
What changes with the introduction of events and CQRS is that the query/reporting database no longer stores the authoritative copy of the information that is used by the microservice. The authoritative information lives in the event store, and the query/reporting database acts more like a cache.
Our command handlers will typically load information only from the authoritative store (aka the events); that's the data that we lock if we are processing commands concurrently.
We copy information that is stored in the events into the query/reporting database(s). Depending on our needs, that can be done synchronously by the command handlers, but it is more common to use background batch processing to do that work, meaning that the data in the reporting database will often be a little bit stale.
can I store events in a message broker like Kafka?
Current consensus is that Kafka cannot reliably be used for event sourcing as understood by the CQRS community.
https://issues.apache.org/jira/browse/KAFKA-2260
https://cwiki.apache.org/confluence/display/KAFKA/KIP-27+-+Conditional+Publish
Roughly, the problem is this: when you have two processes with the authority to write events, how do you ensure that they don't introduce inconsistencies? With event stores we can use locks, or conditional writes (aka compare and swap), to ensure that nobody came along and snuck in a few extra events that might change the events we are writing.
With Kafka, there doesn't seem to be a mechanism that supports prevention, so you need to lean more into apologies, or something.
the CQRS database actually is one "whole" database that collects all the events?
Logically? No. But you certain can combine them physically into the same appliance. For example, message-db is "just" a postgres schema with some tables, functions, and so on. You certainly could combine that with the tables you use for queries and reports.
I'm looking for a full example architecture of how things will look in the grand picture.
The materials published by Greg Young in 2010 might be a decent starting point.
Event Source is not replacing the DB. It has some benefits and challenges. So, we should choose it wisely. If you are not comfortable then don't choose it. You can implement Microservice Style without event sourcing.
Query able DB - Simple solution is to implement CQRS pattern and keep your Query DB in sync with Event Source DB.
Event DB should be with owner service like if you are keeping events about Order than it should be in Order service. (Yeah, other service can have replica of the same).
You may use Kafka as intermediate storage for event but not the final one.
CQRS is not about one DB. It an pattern where we use to DB models, one is for Command and Another one is for Query.
If you understand Java then please refer Book "Microservice Patterns - Chris Richardson" and if you are from C# or Microsoft technology stack then you may refer "https://github.com/dotnet-architecture/eShopOnAzure".

CQRS Event-sourcing and own database per microservice

I have some questions above event-sourcing and cqrs in microservices architecture.
I understand that after send command some microservice executes it and emits event. Event-store subcsribes on it and saves inside his database. Also some ReadModel basing on this event generates and saves optimized data inside read database.
My first question is - Can microservice has his own database and store
data inside it too? Or maybe in event-sourcing approach microservices
don't have their own databases and everything is only stored inside
event store?
My second question is - when I execute command in microservice and
need some data for validation purposes do I need call ReadModel or
what? Assuming microservices haven't got their own databases I have no
choice?
Can microservice has his own database and store data inside it too?
Definitely, microservice can have its own database. But let's use terms from ES/CQRS. Database can represent Event Store (append-only log of immutabale events) and Read Model - some database used to answer queries which is populated by proseccing events.
So, microservice can have its own Read model, populated from events from other microservices.
Or microservice can process commands and save events to the shared Event Store.
Or microservice can process commands and save events to its own Event store.
Choice is yours, and it depends on degree of separation you want to achieve among microservices.
I would put all events that usually consumed together into same Event store. Which means I should be able to query for these events and have a single ordered stream as a result.
when I execute command in microservice and need some data for validation purposes do I need call ReadModel or what?
Command is executed by Aggregate, that has its own state. This state is built by processing all events for this aggregate, and this state should be used to validate a command.
You cannot/should not talk to Read Models in the command handler, primarily because those read models are not consistent with aggregate state. Aggregate state is consistent.
You can query Read Model before sending a command (to make sure it can be sent). But in command handler you need to rely on aggregate state only.
There is a famous case of registering user with requirement of a unique name. As a primary validation, in your UI code you can query read model and tell user that entered name is taken. If name is not taken, UI lets user issue a command. I'm assuming your Aggregate root is user.
But when processing this command ({id:123, type:CREATE_USER, name:somename}) you cannot check that "somename" is taken, because aggregate state for user 123 does not contain a list of taken names. You can potentially query some AllUsernames read model, but it can be milliseconds old, and some other user could take this "somename" already. So in this scenario, you will find a duplication during adding names to read model. And at that point you can do some compensation action - usually issue a command to suspend a user with duplicated name and ask him to re-register or change his name somehow.
It may seems strange, but if you have a really distributed system with several replicas of user list, you'll have the same problem, so why not just embrace the fact that data is always not fully consistent, and just deal with it?

Example micoservice app with CQRS and Event Sourcing

I'm planning to create a simple microservice app (set and get appointments) with CQRS and Event Sourcing but I'm not sure if I'm getting everything correctly. Here's the plan:
docker container: public delivery app with REST endpoints for getting and settings appointments. The endpoints for settings data are triggering a RabbitMQ event (async), the endpoint for getting data are calling the command service (sync).
docker container: for the command service with connection to a SQL database for setting (and editing) appointments. It's listening to the RabbidMQ event of the main app. A change doesn't overwrite the data but creates a new entry with a new version. When data has changed it also fires an event to sync the new data to the query service.
docker container: the SQL database for the command service.
docker container: the query service with connection to a MongoDB. It's listening for changes in the command service to update its database. It's possible for the main app to call for data but not with REST but with ??
docker container: an event sourcing service to listen to all commands and storing them in a MongoDB.
docker container: the event MongoDB.
Here are a couple of questions I don't get:
let's say there is one appointment in the command database and it already got synced to the query service. Now there is a call for changing the title of this appointment. So the command service is not performing an UPDATE but an INSERT with the same id but a new version number. What is it doing afterwards? Reading the new data from the SQL and triggering an event with it? The query service is listening and storing the same data in its MongoDB? Is it overwriting the old data or also creating a new entry with a version? That seems to be quite redundant? Do I in fact really need the SQL database here?
how can the main app call for data from the query service if one don't want to uses REST?
Because it stores all commands in the event DB (6. docker container) it is possible to restore every state by running all commands again in order. Is that "event sourcing"? Or is it "event sourcing" to not change the data in the SQL but creating a new version for each change? I'm confused what exactely event sourcing is and where to apply it. Do I really need the 5. (and 6.) docker container for event sourcing?
When a client wants to change something but afterwards also show the changed data the only way I see is to trigger the change and than wait (let's say with polling) for the query service to have that data. What's a good way to achieve that? Maybe checking for the existing of the future version number?
Is this whole structure a reasonable architecture or am I completely missing something?
Sorry, a lot of questions but thanks for any help!
Let’s take this one first.
Is this whole structure a reasonable architecture or am I completely
missing something?
Nice architecture plan! I know it feels like there are a lot of moving pieces, but having lots of small pieces instead of one big one is what makes this my favorite pattern.
What is it doing afterwards? Reading the new data from the SQL and
triggering an event with it? The query service is listening and
storing the same data in its MongoDB? Is it overwriting the old data
or also creating a new entry with a version? That seems to be quite
redundant? Do I in fact really need the SQL database here?
There are 2 logical databases (which can be in the same physical database but for scaling reasons it's best if they are not) in CQRS – the domain model and the read model. These are very different structures. The domain model is stored as in any CRUD app with third normal form, etc. The read model is meant to make data reads blazing fast by custom designing tables that match the data a view needs. There will be a lot of data duplication in these tables. The idea is that it’s more responsive to have a table for each view and update that table in when the domain model changes because there’s nobody sitting at a keyboard waiting for the view to render so it’s OK for the view model data generation to take a little longer. This results in some wasted CPU cycles because you could update the view model several times before anyone asked for that view, but that’s OK since we were really using up idle time anyway.
When a command updates an aggregate and persists it to the DB, it generates a message for the view side of CQRS to update the view. There are 2 ways to do this. The first is to send a message saying “aggregate 83483 needs to be updated” and the view model requeries everything it needs from the domain model and updates the view model. The other approach is to send a message saying “aggregate 83483 was updated to have the following values: …” and the read side can update its tables without having to query. The first approach requires fewer message types but more querying, while the second is the opposite. You can mix and match these two approaches in the same system.
Since the read side has very different table structures, you need both databases. On the read side, unless you want the user to be able to see old versions of the appointments, you only have to store the current state of the view so just update existing data. On the command side, keeping historical state using a version number is a good idea, but can make db size grow.
how can the main app call for data from the query service if one don't
want to uses REST?
How the request gets to the query side is unimportant, so you can use REST, postback, GraphQL or whatever.
Is that "event sourcing"?
Event Sourcing is when you persist all changes made to all entities. If the entities are small enough you can persist all properties, but in general events only have changes. Then to get current state you add up all those changes to see what your entities look like at a certain point in time. It has nothing to do with the read model – that’s CQRS. Note that events are not the request from the user to make a change, that’s a message which then is used to create a command. An event is a record of all fields that changed as a result of the command. That’s an important distinction because you don’t want to re-run all that business logic when rehydrating an entity or aggregate.
When a client wants to change something but afterwards also show the
changed data the only way I see is to trigger the change and than wait
(let's say with polling) for the query service to have that data.
What's a good way to achieve that? Maybe checking for the existing of
the future version number?
Showing historical data is a bit sticky. I would push back on this requirement if you can, but sometimes it’s necessary. If you must do it, take the standard read model approach and save all changes to a view model table. If the circumstances are right you can cheat and read historical data directly from the domain model tables, but that’s breaking a CQRS rule. This is important because one of the advantages of CQRS is its scalability. You can scale the read side as much as you want if each read instance maintains its own read database, but having to read from the domain model will ruin this. This is situation dependent so you’ll have to decide on your own, but the best course of action is to try to get that requirement removed.
In terms of timing, CQRS is all about eventual consistency. The data changes may not show up on the read side for a while (typically fractions of a second but that's enough to cause problems). If you must show new and old data, you can poll and wait for the proper version number to appear, which is ugly. There are other alternatives involving result queues in Rabbit, but they are even uglier.

Text search for microservice architectures

I am investigating into implementing text search on a microservice based system. We will have to search for data that span across more than one microservice.
E.g. say we have two services for managing Organisations and managing Contacts. We should be able to search for organisations by contact details in one search operation.
Our preferred search solution is Elasticsearch. We already have a working solution based on embedded objects (and/or parent-child) where when a parent domain is updated the indexing payload is enriched with the dependent object data, which is held in a cache (we avoid making calls to the service managing child directly for this purpose).
I am wondering if there is a better solution. Is there a microservice pattern applicable to such scenarios?
It's not particularly a microservice pattern I would suggest you, but it fits perfectly into microservices and it's called Event sourcing
Event sourcing describes an architectural pattern in which events are generated by different sources. An event will now trigger 0 or more so called Projections which then use the data contained in the event to aggregate information in the form it is needed.
This is directly applicable to your problem: Whenever the organisation service changes it's internal state (Added / removed / updated an organization) it can fire an event. If an organization is added, it will for example aggregate the contacts to this organization and store this aggregate. The search for it is now trivial: Lookup the organizations id in the aggregated information (this can be indexed) and get back the contacts associated with this organization. Of course the same works if contracts are added to the contract service: It just fires a message with the contract creation information and the corresponding projections now alter different aggregates that can again be indexed and searched quickly.
You can have multiple projections responding to a single event - which enables you to aggregate information in many different forms - exactly the way you'd like to query it later. Don't be afraid of duplicated data: event sourcing takes this trade-off intentionally and since this is not the data your business-services rely on and you do not need to alter it manually - this duplication will not hurt you.
If you store the events in the chronological order they happened (which I seriously advise you to do!) You can 'replay' these events over and over again. This helps for example if a projection was buggy and has to be fixed!
If your're interested I suggest you read up on event sourcing and look for some kind of event store:
event sourcing
event store
We use event sourcing to aggregate an array of different searches in our system and we aggregate millons of records every day into mongodb. All projections have their own collection create their own indexes and until now we never had to resort to different systems / patterns like elastic search or the likes!
Let me know if this helped!
Amendment
use the data contained in the event to aggregate information in the form it is needed
An event should contain all the information necessary to aggregate more information. For example if you have an organization creation event, you need to at least provide some information on what the organizations name is, an ID of some kind, creation date, parent organizations ID etc. As a rule of thumb, we send all the information we gather in the service that gets the request (don't take it directly form the request ;-) check it first, then write it to the event and send it off) because we do not know what we're gonna need in the future. Just stay cautious - payloads should not get too large!
We can now have multiple projections responding to this event: One that adds the organizations to it's parents aggregate (to get an easy lookup for all children of a given organization), one that just adds it to the search set of all organizations and maybe a third that aggregates all the parents of a given child organization so the lookup for the parent organizations is easy and fast.
We have the same service process these events that also process client requests. The motivation behind it is, that the schema of the data that your projections create is tightly coupled to the way it is read by the service that the client interacts with. This does not have to be that way and it could be separated into two services - but you create an almost invisible dependency there and releasing these two services independently becomes even more challenging. But if you do not mind that additional level of complexity - you can separate the two.
We're currently also considering writing a generic service for aggregating information from events for things like searches, where projections could be scripted. That only makes the invisible dependencies problem less conspicuous, it does not solve it.

Resources