How to introduce CQRS to an existing project? - elasticsearch

I want to introduce CQRS using asynchronous events and read models stored in Elasticsearch.
I have two questions:
How can I populate Elasticsearch with data so that it is consistent with the entire system? I have a lot of data in other microservices (user, item, etc.). Once the CQRS feature is deployed, Elasticsearch will listen for different events to track system state.
What are the strategies for making Elasticsearch consistent with the entire system again when something goes wrong (for example, RabbitMQ didn't consume some events)?

Given the scenarios you describe, two things come to mind directly:
Eventual consistency
Saga patterns to take care of reverting data
If your write database is something other than Elasticsearch, then sending events through a message broker is probably the only way to keep the data in Elasticsearch consistent.
If any data has to be reverted, a saga pattern is probably what you should follow.
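One way to make the broker-based approach tolerant of missed or repeated events is to have the read model apply events idempotently and to repair it by replaying history. The sketch below is only an illustration under several assumptions that are not in the question: each event carries the full entity state and a per-entity version number, and the ReadModel class is an in-memory stand-in for an Elasticsearch index, not a real client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    entity_id: str  # e.g. a user or item id (hypothetical field)
    version: int    # per-entity sequence number assigned by the write side
    data: dict      # full state of the entity after the change


class ReadModel:
    """In-memory stand-in for an Elasticsearch index (assumption)."""

    def __init__(self):
        self.docs = {}  # entity_id -> {"version": int, "data": dict}

    def apply(self, event):
        """Apply an event; return False if it is stale or a duplicate."""
        current = self.docs.get(event.entity_id)
        if current is not None and current["version"] >= event.version:
            return False
        self.docs[event.entity_id] = {"version": event.version, "data": event.data}
        return True


def replay(read_model, events):
    """Backfill or repair: push every known event through the same handler.

    Returns the number of events that actually changed the read model.
    """
    return sum(read_model.apply(e) for e in events)
```

Because stale and duplicate events are ignored, the same replay function could serve both for the initial fill and for recovery after lost messages, provided the owning services can re-send or export their history.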

Related

Migrating an asynchronous business flow to an event-driven system

In an effort to redesign a functional service based on an asynchronous flow into an event-driven one, we have come up with changes to different parts of the system. The service receives various statuses from external services through the API, does computations on them, and persists the result into the data store. The core logic has now been moved out of the API by introducing a queue (Kafka). Similarly, the query functionality is provided through another interface (API) fronted by a web UI. With this, the command and query sides are separated. See the diagram below.
I have a few questions about the approach:
Is it right to have the query API (read) service & the event-complete-handler (write) operate on the same database with both dependent on the DB schema? Or is it better to have the query-api read from the replica DB?
The core-business-logic, at the end of computation, writes only to database and not to db+Kafka in a single transaction. Persisting to the database is handled by the event-complete-handler. Is this approach better?
Say in the future, if the core-business-logic needs to query the database to do the computation on every event, can it directly read from the database? Again, does it not create DB schema dependency between the services?
Is it right to have the query API (read) service & the event-complete-handler (write) operate on the same database with both dependent on the DB schema? Or is it better to have the query-api read from the replica DB?
"Right" is a loaded term. The idea behind CQRS is that the pattern can allow you to separate commands and queries so that your system can be distributed and scaled out. Typically they would be using different databases in a SOA/Microservice architecture. One service would process the command which produces an event on the service bus. Query handlers would listen to this event to change their data for querying.
For example:
A service which processes the CreateWidgetCommand would produce an event onto the bus with the properties of the command.
Any query services which are interested in widgets for producing their data views would subscribe to this event type.
When the event is produced, the subscribed query handlers will consume the event and update their respective databases.
When the query is invoked, they interrogate their own database.
This means you could, in theory, make the command handler as simple as throwing the event onto the bus.
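The flow above can be sketched roughly as follows. This is a toy illustration, not a real service bus: Bus is a synchronous in-memory stand-in, and the event name WidgetCreated and the two view classes are hypothetical names chosen for the example.

```python
from collections import defaultdict


class Bus:
    """Tiny synchronous stand-in for a service bus (assumption)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)


class WidgetListView:
    """Query service that keeps widget names in its own store."""

    def __init__(self, bus):
        self.names_by_id = {}
        bus.subscribe("WidgetCreated", self.on_widget_created)

    def on_widget_created(self, event):
        self.names_by_id[event["id"]] = event["name"]

    def get_name(self, widget_id):
        return self.names_by_id.get(widget_id)


class WidgetCountView:
    """A second query service with a different view of the same events."""

    def __init__(self, bus):
        self.count = 0
        bus.subscribe("WidgetCreated", self.on_widget_created)

    def on_widget_created(self, event):
        self.count += 1


def handle_create_widget(bus, command):
    """Command handler: it only throws the event onto the bus."""
    bus.publish("WidgetCreated", {"id": command["id"], "name": command["name"]})
```

Each view owns its data and answers queries from it alone; with a real broker the handlers would run asynchronously, which is where the eventual consistency comes from.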
The core-business-logic, at the end of computation, writes only to database and not to db+Kafka in a single transaction. Persisting to the database is handled by the event-complete-handler. Is this approach better?
No. If your question is about the transactionality of distributed systems, you cannot rely on traditional transactions, since any command may affect any number of distributed data stores. Transactionality in distributed systems is often handled with a compensating transaction, where you code the steps that reverse the mutations made while consuming the bus messages.
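A compensating transaction can be sketched as a list of (action, compensation) pairs: if a step fails, the compensations of the steps that already succeeded are run in reverse order. The function and the stock/payment example below are hypothetical and run in a single process; in a real system each step would be a message to another service.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; undo completed steps on failure."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Reverse the mutations already made, most recent first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


# Example: reserving stock succeeds, charging the card fails.
stock = {"widget": 1}
refunds = []


def reserve_stock():
    stock["widget"] -= 1


def release_stock():
    stock["widget"] += 1


def charge_card():
    raise RuntimeError("card declined")


def refund_card():
    refunds.append("refund")


ok = run_saga([(reserve_stock, release_stock), (charge_card, refund_card)])
```

Only the compensation for the completed step (releasing the stock) runs; the failed charge is never refunded because it never happened.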
Say in the future, if the core-business-logic needs to query the database to do the computation on every event, can it directly read from the database? Again, does it not create DB schema dependency between the services?
If you follow the advice in the first response, the approach here should be obvious. All distinct queries are built from their own database, which are kept "eventually consistent" by consuming events from the bus.
Typically these architectures have major complexity downsides, especially if you are concerned with consistency and transactionality.
People don't generally implement this type of architecture unless there is a specific need.
You can however design your code around CQRS and DDD so that in the future, transitioning to this type of architecture can be relatively painless.
The topic of DDD is too dense for this answer. I encourage you to do some independent learning.

Implementing CQRS / ES the proper way

I'm planning to implement the CQRS / ES pattern (CQRS with event sourcing) in my microservices.
I've been reading about these patterns, but I have some questions that I couldn't find an answer to anywhere:
When doing CQRS / ES, should each microservice still have its own local database (within the microservice)?
I know that there will be an event store for writes and a read-only projection database, and I totally understand their purpose, but do microservices need their own local database for any reason? (Advantages / disadvantages)
Example: the Order microservice could have a local orders database, the Item service a local items database, etc., apart from the event store and the projection database.
How do I validate that some data exists in a microservice before actually issuing a command?
Let's say I want to make a new order, so I assume I first have to check whether that item is still in stock, then perform the other operation(s).
However, if I want to check whether an item is still in stock, where do I query that data: the projection (read-only) database, or a local database that each microservice has?
I've read many articles about CQRS / ES at this point, but most of them just explain the concept rather than actually diving into real-life scenarios / explaining how to implement it. I would appreciate if you had any recommendations.
Much appreciated
In general, when dealing with microservices, it's recommended (regardless of whether or not you're doing CQRS/ES) that no two microservices use the same database, or at the very least that no two microservices be writing to the same database. This allows each microservice to control its schema, which only needs to change if the microservice needs it to. One other advantage of this is that the database becomes entirely encapsulated within the service: it's purely an implementation detail.
It's entirely possible that a microservice implementing a read-model might not have a database: it might be able to keep all state in memory (an example might be a read-model which exposes metrics for your monitoring infrastructure), or it might simply be translating events from the write-model into commands to another service (so all of its state is just its position in the event stream).
if i want to check if an item is still in stock, where do i query that data, will it be the projection (read-only) database, or a local database that each microservice has?
In an event-sourced system, every view that's not the stream of events is a projection. So, depending on your requirements, your service can query another service or maintain its own view based on the events.
Note that at any given instant there may exist an event which has been published to the event stream (i.e. it has indisputably happened) but for which there also exists a projection which has not processed the event: the projections are eventually consistent with the event stream. So any check of whether an item is in stock will only tell you that the item was in stock at some point in the past (never mind, to use Greg Young's example, that no in-stock data can guarantee that nothing's been stolen from the warehouse unless the thieves happened to have the decency to update the count as they walked out with their loot). The nanosecond after your query, it might receive word of an event which makes it out-of-stock before you placed your order.
Accordingly, it may just be worth sending a command and letting the write-side reject your order if the item is not in stock. The write-side (which is the more strongly consistent part of the system, though it should be remembered that in many cases, one component's events are another component's commands) is under no obligation to accept every command; "command" in this context really means "polite request to publish events to the event stream which are conformant with my desired state of the universe".
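A minimal sketch of "send the command and let the write-side reject it", assuming a single in-process Inventory object whose state is derived from its own events. The class, the event names StockAdded and ItemOrdered, and the OutOfStock exception are hypothetical names for this example.

```python
class OutOfStock(Exception):
    """Raised when the write-side refuses a command."""


class Inventory:
    """Write-side for one item; its state is derived from its own events."""

    def __init__(self, events=()):
        self.events = list(events)  # (kind, amount) tuples

    def in_stock(self):
        quantity = 0
        for kind, amount in self.events:
            quantity += amount if kind == "StockAdded" else -amount
        return quantity

    def place_order(self, quantity):
        """Command: accepted only if the authoritative state allows it."""
        available = self.in_stock()
        if quantity > available:
            raise OutOfStock(f"requested {quantity}, only {available} left")
        self.events.append(("ItemOrdered", quantity))
```

The caller does not need to query a projection first: it issues place_order and handles OutOfStock as an ordinary outcome, since the decision is made against the event stream rather than a possibly stale view.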

How to implement Event sourcing and a database in a microservice architecture?

I have been learning lately about microservices architecture and its features.
In this source it appears that event sourcing replaces a database; however, it is later stated:
The event store is difficult to query since it requires typical queries to reconstruct the state of the business entities. That is likely to be complex and inefficient. As a result, the application must use Command Query Responsibility Segregation (CQRS) to implement queries.
In the CQRS Page the author seems to describe a singular database that listens to all events and reconstructs itself.
My question(s) is:
What is actually needed to implement event sourcing with a queryable database? particularly:
Where is the events database? Where is the queryable database? Do I need to have multiple event stores, one for every service, or can I store events in a message broker like Kafka? Is the CQRS database actually one "whole" database that collects all the events? And how can all of this scale?
I'm sorry if I'm not clear with my question, I am very confused myself. I guess I'm looking for a full example architecture of how things will look in the grand picture.
Where is the queryable database?
I'm guessing this is the most useful starting point, because it will be most familiar. The queryable database is in the same place that your this-is-the-entire-database was when you weren't doing event sourcing.
That could be a database exclusively to support this microservice, or it could be a database that is shared by several microservices, with some part of the schema where this microservice has exclusive write authority. Another way of thinking about this: the microservices are using different logical databases, which might be physically deployed together.
Where is the events database?
Same general idea - you can have one events database per microservice; or you could have several different microservices sharing the same database. Again, you have partitioning of authority, and the same logical vs physical separation to consider.
What changes with the introduction of events and CQRS is that the query/reporting database no longer stores the authoritative copy of the information that is used by the microservice. The authoritative information lives in the event store, and the query/reporting database acts more like a cache.
Our command handlers will typically load information only from the authoritative store (aka the events); that's the data that we lock if we are processing commands concurrently.
We copy information that is stored in the events into the query/reporting database(s). Depending on our needs, that can be done synchronously by the command handlers, but it is more common to use background batch processing to do that work, meaning that the data in the reporting database will often be a little bit stale.
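The background copy can be sketched as a projector that remembers its position in the event log and catches up in batches. Everything here is an in-memory assumption: the event log is a plain list of (account, delta) tuples and the reporting "database" is a dict.

```python
class Projector:
    """Copies events into a query/reporting view in background batches."""

    def __init__(self, event_log):
        self.event_log = event_log  # authoritative, append-only list
        self.position = 0           # index of the next event to copy
        self.balances = {}          # stand-in for the reporting database

    def catch_up(self):
        """Process every event appended since the last run."""
        new_events = self.event_log[self.position:]
        for account, delta in new_events:
            self.balances[account] = self.balances.get(account, 0) + delta
        self.position += len(new_events)
        return len(new_events)
```

Between two calls to catch_up the view lags behind the log, which is the "a little bit stale" behaviour described above; because the position is tracked, no event is applied twice.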
can I store events in a message broker like Kafka?
Current consensus is that Kafka cannot reliably be used for event sourcing as understood by the CQRS community.
https://issues.apache.org/jira/browse/KAFKA-2260
https://cwiki.apache.org/confluence/display/KAFKA/KIP-27+-+Conditional+Publish
Roughly, the problem is this: when you have two processes with the authority to write events, how do you ensure that they don't introduce inconsistencies? With event stores we can use locks, or conditional writes (aka compare and swap), to ensure that nobody came along and snuck in a few extra events that might change the events we are writing.
With Kafka, there doesn't seem to be a mechanism that supports prevention, so you need to lean more into apologies, or something.
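The conditional write (compare and swap) mentioned above can be illustrated with a toy in-memory event store that takes an expected version on append. This is a sketch, not any particular product's API; a real store would have to make the check and the append atomic, which this single-threaded example does not attempt.

```python
class ConcurrencyError(Exception):
    """Another writer appended to the stream first."""


class EventStore:
    """Toy event store with optimistic concurrency (hypothetical API)."""

    def __init__(self):
        self._streams = {}

    def read(self, stream_id):
        """Return a copy of the events and the current stream version."""
        events = self._streams.get(stream_id, [])
        return list(events), len(events)

    def append(self, stream_id, new_events, expected_version):
        """Append only if nobody has written since the caller last read."""
        stream = self._streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}"
            )
        stream.extend(new_events)
        return len(stream)
```

Two writers that both read version 0 cannot both succeed: the second append fails and that writer has to reload the stream and decide again.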
the CQRS database actually is one "whole" database that collects all the events?
Logically? No. But you certainly can combine them physically into the same appliance. For example, message-db is "just" a postgres schema with some tables, functions, and so on. You certainly could combine that with the tables you use for queries and reports.
I'm looking for a full example architecture of how things will look in the grand picture.
The materials published by Greg Young in 2010 might be a decent starting point.
Event sourcing does not replace the DB. It has benefits and challenges, so choose it wisely. If you are not comfortable with it, don't choose it. You can implement a microservice architecture without event sourcing.
Queryable DB: a simple solution is to implement the CQRS pattern and keep your query DB in sync with the event store.
The event DB should live with the owning service: if you are keeping events about orders, it should be in the Order service. (Other services can hold a replica of it.)
You may use Kafka as intermediate storage for events, but not as the final store.
CQRS is not about one DB. It is a pattern in which we use two DB models, one for commands and another for queries.
If you know Java, refer to the book "Microservices Patterns" by Chris Richardson; if you work with C# or the Microsoft technology stack, you may refer to "https://github.com/dotnet-architecture/eShopOnAzure".

How to sync data between databases (each database for each instance of a service) in Microservices?

If each instance of a service has a separate database in a microservices architecture, how can we keep the data synced? For instance, if instance#1 serves a request and stores data in its database db#1, and another request on instance#2 wants the data that was inserted into db#1 through instance#1, how can the database db#2 of instance#2 get the data from the database db#1 of instance#1? I think z-scaling is the solution here!
The microservice architecture uses a pattern called 'Eventual consistency'. Like you described, newly inserted data won't be directly available in all databases. You can read more about it here
That being said, the CQRS pattern is a popular way to solve the data distribution / eventual consistency problem.
By using a message broker / bus, you can publish so-called 'events' on a queue.
Microservices interested in changes to certain entities can subscribe to those events and save the entities in their own database.
This enables loosely coupled microservices, since the data a service needs about certain entities is stored in its own database. Data duplication is OK, since we rely on eventual consistency to make sure that (eventually) everything is in sync across all microservices.
More information about the CQRS pattern using microservices can be found here
Here's a more practical example of something I'm working on right now. The language is Dutch, but the flow should be self-explanatory:
Hope this helps!
I suggest reading up on the following topics: CQRS, microservices, eventual consistency and message brokers (RabbitMQ, Kafka, etc.)

Given a data ingestion platform with various use cases, what would a good data store be for user configuration data?

The initial use case for our multi-tenant data ingestion platform was to pull in RSS data, file meta data and SQL query results. For this, ElasticSearch was chosen as the data store and Kafka as the microservices message broker.
Streaming, low-latency and time-series data are a new requirement. For this, ElasticSearch is not a contender; Aerospike or InfluxDB would be favored instead.
The initial plan was to put user account and configuration data into an ElasticSearch index/topic, as I wanted to have everything in ES.
Based on our growing requirements I can see we may have a variety of different database types depending on the use case. Would continuing to store this information in ES still be a good idea?
We are using Kafka as the microservices bus.
Since you are asking in a Kafka tag, I'm assuming that no matter the use-case and its data store, Kafka will definitely be used.
So why not store user configuration in Kafka?
It sounds like a fairly small topic, so you can set the retention to 100 years or something similar. If you expect user configuration to change often, you can make it a compacted topic. Then, when your microservices start, they just need to read this topic and store the configuration in memory. This gives you the flexibility to choose the right data store for your application data without worrying too much about the configuration.
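What a microservice does at startup can be sketched without a Kafka client: replay the topic's records in order and keep the latest value per key. The record layout below ((key, value) tuples, with the user id as key) is an assumption for illustration; a record with a null value is treated as a delete, mirroring the tombstone convention of compacted topics.

```python
def load_config(records):
    """Replay (key, value) records in log order; the last value per key wins."""
    config = {}
    for key, value in records:
        if value is None:
            config.pop(key, None)  # tombstone: the key was deleted
        else:
            config[key] = value
    return config


# Records as they might appear in the topic, oldest first (hypothetical data).
topic = [
    ("user-1", {"feed": "rss", "interval": 60}),
    ("user-2", {"feed": "sql", "interval": 300}),
    ("user-1", {"feed": "rss", "interval": 30}),
    ("user-2", None),
]
config = load_config(topic)
```

The result is the same whether or not compaction has already discarded the older records, which is why the service can treat the topic as its configuration store.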

Resources