Addressing CRUD "tables" in event sourcing - event-sourcing

I'm starting down an ES journey and want to know if traditional support tables should be stored in the event log or should those be handled differently? These tables would typical have a CRUD page. In other words, would it be common to have 2 approaches in the same application, one for support tables and one for transactional data?
A support table would be like "Account" in an accounting application or "Product Type" or the actual "Product" table in an ERP application (I'm not writing an ERP application - that's an example of the type of table I'm talking about).
If we store CRUD-type data in the event log, then we might have events:
ProductCreated
ProductUpdated
ProductDeleted (which would just mark it as deleted)
Then, do we attempt to find out what changed (in ProductUpdated event) and just store the change and replay to get the latest image of the Product?
Mostly, I'm after what approach to use for CRUD tables - traditional or store in the event log? Additional information would be great!

Suppose you start purely with an event log, including for events like ProductCreated, etc., and no other data store. What happens then is that every time your application starts up, it has to replay all the events in the log to build its current state.
Now, suppose you create a traditional SQL table to store the current state of your app (say a products table) and the ID of the last event that was processed to get to that state (say a last_event table). What happens then is every time your app starts up, it has to replay only the events with higher IDs than the stored ID and process those to build its new state.
On the flip side, your app now has to be careful to keep these two states synchronised. If you need to have concurrency, you'll need to be careful to do atomic operations only on your SQL tables--but that should be reasonably easy with transacctions.

Your support tables are just a read-model/projection of the event stream. In general you don't create those support models in case you need them. You create a read-model only if you use it somewhere in the UI.
Anyway, one important benefit behind Event sourcing is that you won't need to use join in your queries. That is, you create a table for each read-model that contains all the data it needs - full denormalisation. You keep that table super-optimised for the query.

Related

Implementing CQRS / ES the proper way

Recently I'm looking forward to implement the CQRS / ES pattern with Event sourcing in my microservices.
I've been reading for these patterns, but I have some questions that I couldn't find an answer anywhere:
When doing CQRS / ES, should each microservice have its own local
database anymore (Within microservice)?
I know that there will be an event store for writes, and a read-only projection database and i totally understand their purpose, but do microservices need
their own local database for any reason? (Advantages / disadvantages)
Example: Order microservice could have local orders database, item service an items local database etc...apart from the Event source DB and projections database implemented.
How to validate if some data exists in a microservice before
actually issuing a command?
Let's say i want to make a new order, so i assume first I have to
check if that item is still in stock, then perform the other
operation/s.
However, if i want to check if an item is still in stock, where do i
query that data, will it be the projection (read-only) database, or
a local database that each microservice has?
I've read many articles about CQRS / ES at this point, but most of them just explain the concept rather than actually diving into real-life scenarios / explaining how to implement it. I would appreciate if you had any recommendations.
Much appreciated
In general, when dealing with microservices, it's recommended (regardless of whether or not you're doing CQRS/ES) that no two microservices use the same database, or at the very least that no two microservices be writing to the same database. This allows each microservice to control its schema, which only needs to change if the microservice needs it to. One other advantage of this is that the database becomes entirely encapsulated within the service: it's purely an implementation detail.
It's entirely possible that a microservice implementing a read-model might not have a database: it might be able to keep all state in memory (an example might be a read-model which exposes metrics for your monitoring infrastructure), or it might simply be translating events from the write-model into commands to another service (so all of its state is just its position in the event stream).
if i want to check if an item is still in stock, where do i query that data, will it be the projection (read-only) database, or a local database that each microservice has?
In an event-sourced system, every view that's not the stream of events is a projection. So, depending on your requirements, your service can query another service or maintain its own view based on the events.
Note that at any given instant there may exist an event which has been published to the event stream (i.e. it has indisputably happened) but for which there also exists a projection which has not processed the event: the projections are eventually consistent with the event stream. So any check of whether an item is in stock will only tell you that the item was in stock at some point in the past (never mind, to use Greg Young's example, that no in-stock data can guarantee that nothing's been stolen from the warehouse unless the thieves happened to have the decency to update the count as they walked out with their loot). The nanosecond after your query, it might receive word of an event which makes it out-of-stock before you placed your order.
Accordingly, it may just be worth sending a command and letting it get reject your order if the item is not in stock. The write-side (which is the more strongly consistent part of the system, though it should be remembered that in many cases, one component's events are another component's commands) is under no obligation to accept every command; "command" in this context really means "polite request to publish events to the event stream which are conformant with my desired state of the universe".

Example micoservice app with CQRS and Event Sourcing

I'm planning to create a simple microservice app (set and get appointments) with CQRS and Event Sourcing but I'm not sure if I'm getting everything correctly. Here's the plan:
docker container: public delivery app with REST endpoints for getting and settings appointments. The endpoints for settings data are triggering a RabbitMQ event (async), the endpoint for getting data are calling the command service (sync).
docker container: for the command service with connection to a SQL database for setting (and editing) appointments. It's listening to the RabbidMQ event of the main app. A change doesn't overwrite the data but creates a new entry with a new version. When data has changed it also fires an event to sync the new data to the query service.
docker container: the SQL database for the command service.
docker container: the query service with connection to a MongoDB. It's listening for changes in the command service to update its database. It's possible for the main app to call for data but not with REST but with ??
docker container: an event sourcing service to listen to all commands and storing them in a MongoDB.
docker container: the event MongoDB.
Here are a couple of questions I don't get:
let's say there is one appointment in the command database and it already got synced to the query service. Now there is a call for changing the title of this appointment. So the command service is not performing an UPDATE but an INSERT with the same id but a new version number. What is it doing afterwards? Reading the new data from the SQL and triggering an event with it? The query service is listening and storing the same data in its MongoDB? Is it overwriting the old data or also creating a new entry with a version? That seems to be quite redundant? Do I in fact really need the SQL database here?
how can the main app call for data from the query service if one don't want to uses REST?
Because it stores all commands in the event DB (6. docker container) it is possible to restore every state by running all commands again in order. Is that "event sourcing"? Or is it "event sourcing" to not change the data in the SQL but creating a new version for each change? I'm confused what exactely event sourcing is and where to apply it. Do I really need the 5. (and 6.) docker container for event sourcing?
When a client wants to change something but afterwards also show the changed data the only way I see is to trigger the change and than wait (let's say with polling) for the query service to have that data. What's a good way to achieve that? Maybe checking for the existing of the future version number?
Is this whole structure a reasonable architecture or am I completely missing something?
Sorry, a lot of questions but thanks for any help!
Let’s take this one first.
Is this whole structure a reasonable architecture or am I completely
missing something?
Nice architecture plan! I know it feels like there are a lot of moving pieces, but having lots of small pieces instead of one big one is what makes this my favorite pattern.
What is it doing afterwards? Reading the new data from the SQL and
triggering an event with it? The query service is listening and
storing the same data in its MongoDB? Is it overwriting the old data
or also creating a new entry with a version? That seems to be quite
redundant? Do I in fact really need the SQL database here?
There are 2 logical databases (which can be in the same physical database but for scaling reasons it's best if they are not) in CQRS – the domain model and the read model. These are very different structures. The domain model is stored as in any CRUD app with third normal form, etc. The read model is meant to make data reads blazing fast by custom designing tables that match the data a view needs. There will be a lot of data duplication in these tables. The idea is that it’s more responsive to have a table for each view and update that table in when the domain model changes because there’s nobody sitting at a keyboard waiting for the view to render so it’s OK for the view model data generation to take a little longer. This results in some wasted CPU cycles because you could update the view model several times before anyone asked for that view, but that’s OK since we were really using up idle time anyway.
When a command updates an aggregate and persists it to the DB, it generates a message for the view side of CQRS to update the view. There are 2 ways to do this. The first is to send a message saying “aggregate 83483 needs to be updated” and the view model requeries everything it needs from the domain model and updates the view model. The other approach is to send a message saying “aggregate 83483 was updated to have the following values: …” and the read side can update its tables without having to query. The first approach requires fewer message types but more querying, while the second is the opposite. You can mix and match these two approaches in the same system.
Since the read side has very different table structures, you need both databases. On the read side, unless you want the user to be able to see old versions of the appointments, you only have to store the current state of the view so just update existing data. On the command side, keeping historical state using a version number is a good idea, but can make db size grow.
how can the main app call for data from the query service if one don't
want to uses REST?
How the request gets to the query side is unimportant, so you can use REST, postback, GraphQL or whatever.
Is that "event sourcing"?
Event Sourcing is when you persist all changes made to all entities. If the entities are small enough you can persist all properties, but in general events only have changes. Then to get current state you add up all those changes to see what your entities look like at a certain point in time. It has nothing to do with the read model – that’s CQRS. Note that events are not the request from the user to make a change, that’s a message which then is used to create a command. An event is a record of all fields that changed as a result of the command. That’s an important distinction because you don’t want to re-run all that business logic when rehydrating an entity or aggregate.
When a client wants to change something but afterwards also show the
changed data the only way I see is to trigger the change and than wait
(let's say with polling) for the query service to have that data.
What's a good way to achieve that? Maybe checking for the existing of
the future version number?
Showing historical data is a bit sticky. I would push back on this requirement if you can, but sometimes it’s necessary. If you must do it, take the standard read model approach and save all changes to a view model table. If the circumstances are right you can cheat and read historical data directly from the domain model tables, but that’s breaking a CQRS rule. This is important because one of the advantages of CQRS is its scalability. You can scale the read side as much as you want if each read instance maintains its own read database, but having to read from the domain model will ruin this. This is situation dependent so you’ll have to decide on your own, but the best course of action is to try to get that requirement removed.
In terms of timing, CQRS is all about eventual consistency. The data changes may not show up on the read side for a while (typically fractions of a second but that's enough to cause problems). If you must show new and old data, you can poll and wait for the proper version number to appear, which is ugly. There are other alternatives involving result queues in Rabbit, but they are even uglier.

Use of Oracle Advanced Queuing to receive changes of database table rows

I am confused about Oracle Advanced Queueing. It looks like it is a way to asynchronously send database notification to application layer.
But looking in some details, there is queue to be setup, alongside a table. and there is explicit calls to publish messages that will afterward be pushed to the application layer.
Does this work automatically with table rows modification ?
I want, if a particular table changes (no matter who/how changed), to receive a notification about it in form of a binary object that represents the row changed.
(Note: I know about Oracle Query change notification, CQN, but I am not satisfied with its performance, my goal is then to see if Oracle Advanced Queue can offer similar goal with better speed).
Thanks in advance.

How to design an event receiver which is capable of showing the recent events

The home page of meetup shows information on the recent meetups on the right hand side of the page. What kind of design patterns / tools (pref java based) would you use to implement such an output.
There's a couple of different approaches, which one you use would depend on several factors including the complexity of the business processes, the degree of flexibility desired and load.
Simple Solution
"RSVP Updates" are written directly to some data source during the "RSVP" process; this process is essentially hard-coded.
Have something that reads out the RSVP directly from the whatever data source / table they live in.
This solution will be fine if the load and data volumes are excessive. The key point is that the RSVP UI widget ends up pulling the data out of the same data source as where the updates are written to.
Performance
A few different options, based on the above as a starting point:
Hold the data twice: once in the "master" (Transactional) table of RSVP data, and once in a table built for servicing the UI (Basically OLTP vs OLAP). The second table would include all the relevant data so that there were no look-ups to other tables, and as it's an independent copy of the data you can manage it differently if you want to (for example: purge out old records so the table size is kept small).
Or, instead of a second table just keep all the data in memory. this would require you to pull the data out of the main transactional table whenever the in memory copy is lost.
Flexibility
Same as the original approach but instead of hard-coding in the step that records the RSVP (into a single data source) use a more loosely-coupled approach so that you can add / change / remove as many event processors as you wish. One will write the RSVP data to the main RSVP data source, while a second will do the same/similar but aggregated ready for the
"Recent RSVPs" UI Widget.
Dependency Injection will give you the flexibility - certainly if you deal with a single implementation of the event handler.
The Publish / Subscribe or Chain of Responsibility patterns might give you the basis of an approach.
Is that the kind of info you were after?

the best way to track data changes in oracle

as the title i am talking about, what's the best way to track data changes in oracle? i just want to know which row being updated/deleted/inserted?
at first i think about the trigger, but i need to write more triggers on each table and then record down the rowid which effected into my change table, it's not good, then i search in Google, learn new concepts about materialized view log and change data capture,
materialized view log is good for me that i can compare it to original table then i can get the different records, even the different of the fields, i think the way is the same with i create/copy new table from original (but i don't know what's different?);
change data capture component is complicate for me :), so i don't want to waste my time to research it.
anybody has the experience the best way to track data changes in oracle?
You'll want to have a look at the AUDIT statement. It gathers all auditing records in the SYS.AUD$ table.
Example:
AUDIT insert, update, delete ON t BY ACCESS
Regards,
Rob.
You might want to take a look at Golden Gate. This makes capturing changes a snap, at a price but with good performance and quick setup.
If performance is no issue, triggers and audit could be a valid solution.
If performance is an issue and Golden Gate is considered too expensive, you could also use Logminer or Change Data Capture. Given this choice, my preference would go for CDC.
As you see, there are quite a few options, near realtime and offline.
Coding a solution by hand also has a price, Golden Gate is worth investigating.
Oracle does this for you via redo logs, it depends on what you're trying to do with this info. I'm assuming your need is replication (track changes on source instance and propagate to 1 or more target instances).
If thats the case, you may consider Oracle streams (other options such as Advanced Replication, but you'll need to consider your needs):
From Oracle:
When you use Streams, replication of a
DML or DDL change typically includes
three steps:
A capture process or an application
creates one or more logical change
records (LCRs) and enqueues them into
a queue. An LCR is a message with a
specific format that describes a
database change. A capture process
reformats changes captured from the
redo log into LCRs, and applications
can construct LCRs. If the change was
a data manipulation language (DML)
operation, then each LCR encapsulates
a row change resulting from the DML
operation to a shared table at the
source database. If the change was a
data definition language (DDL)
operation, then an LCR encapsulates
the DDL change that was made to a
shared database object at a source
database.
A propagation propagates the staged
LCR to another queue, which usually
resides in a database that is separate
from the database where the LCR was
captured. An LCR can be propagated to
a number of queues before it arrives
at a destination database.
At a destination database, an apply
process consumes the change by
applying the LCR to the shared
database object. An apply process can
dequeue the LCR and apply it directly,
or an apply process can dequeue the
LCR and send it to an apply handler.
In a Streams replication environment,
an apply handler performs customized
processing of the LCR and then applies
the LCR to the shared database object.

Resources