Should an Event Sourcing entry contain what should update the view model or the payload of the event? - event-sourcing

I have a situation where data is coming from a third party service. It is being passed through to a function that formats the data and then saves it to a view model in a way that I can visualize for my system.
In an Event driven approach, should I save the payload of the request (as this can easily be replayed) in the Event stream, or the formatted changes it produces to the view model (a more accurate representation of the current state of the data)?
Or something else completely?
Thanks

The incoming data can be viewed as a command expressing the intent to ultimately update some state. In this case the command is from outside our system, but commands can also be internal to our system. Especially for external commands, one critical thing to remember is that a command can be rejected.
In event sourcing, however, events are internal and express that the change has occurred and cannot be denied (at most it can be ignored). Thus it's probably best to store them in the format that is the most convenient for that internal use.
I would characterize the requests as commands and the formatted changes as events. Saving the payload is command sourcing, saving the formatted changes is event sourcing (confusingly, Fowler's earliest descriptions of event sourcing are more like command sourcing) and both are valid approaches. Event sourcing tends to imply a commitment to replay to a similar state while command sourcing leaves open the ability for replay to depend on something in the outside world. I've seen (and developed even) applications which used both techniques (e.g. incoming data is dumped to Kafka, a consumer treats those messages as commands against aggregates whose state is persisted as a stream of events, which gets projected back into Kafka).
If you (in CQRS/ES fashion) consider the read-side of your application to be a separate autonomous component from the write-side, then you reach the interesting conclusion that when the write-side publishes events, from the read-side's perspective it's publishing commands to the read-side. "One component's events are often another component's commands".
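To make the command/event distinction concrete, here is a minimal sketch that keeps both a command log (the raw payloads) and an event log (the formatted changes). The payload fields `sku` and `price`, the `PriceChanged` event, and the in-memory lists are all assumptions made for illustration; they are not taken from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceChanged:
    """Hypothetical internal event: a fact that has already been accepted."""
    product_id: str
    new_price_cents: int


command_log = []  # command sourcing: raw third-party payloads, as received
event_log = []    # event sourcing: accepted changes in our internal format
view = {}         # view model: product_id -> price in cents


def decide(payload: dict, current: dict) -> list:
    """Treat the payload as a command: it may be rejected or produce events."""
    sku, price = payload.get("sku"), payload.get("price")
    if sku is None or price is None or float(price) < 0:
        return []  # rejected command: nothing happened
    cents = round(float(price) * 100)
    if current.get(sku) == cents:
        return []  # accepted, but no state change to record
    return [PriceChanged(sku, cents)]


def receive(payload: dict) -> None:
    command_log.append(payload)
    events = decide(payload, view)
    event_log.extend(events)
    for event in events:
        view[event.product_id] = event.new_price_cents


receive({"sku": "A1", "price": "9.99"})
receive({"sku": "A1", "price": "9.99"})  # duplicate: no new event
receive({"sku": "A1", "price": "-1"})    # invalid: rejected
```

Replaying `event_log` always rebuilds the same view, while replaying `command_log` re-runs `decide`, so its result depends on whatever validation rules exist at replay time.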

Related

CQRS Where to Query for business logic/Internal Processes

I'm currently looking at implementing CQRS driven by events (not yet event sourcing) for a service at work; the reasoning being:
I need aggregate data to support a REST API coming out of this service (which will be used to populate views). However, not all of the aggregated data will be used by the application logic/processing (i.e. the data originating outside this service will not be used; the parts of the aggregate originating within it will be).
I need to stream events to other systems so that they can react to the data (I will produce to a Kafka topic, so the 'read'/'projection' side of this system will consume the same events as the external systems, from these Kafka topics).
I will be consuming events from internal systems to help populate the aggregate for the views in the first point (i.e. it's data from this service and others).
The reason for not going event sourced currently is that a) we're in a bit of a time crunch, and b) due to still learning about it. Having said which, it is something that we are looking to do in the future- though currently, we have a static DB in the 'Command' side of the system, which will just store current state
I'm pretty confident with the concept of using the aggregate data to provide the Rest API; however my confusion is coming from when I want to change a resource from within the system (for example via a cron job triggered 5 times a day) Example:
If I have resource of class x, which (given some data), wants a piece of state changing
I need to select instances of the class x which meet the requirements (from one of the DBs). Think select * from {class x} where last_changed_date > 5 days ago;
Then create a command to change the state of these instances of x (in my case, the static command DB would be updated, as well as an event made to update the read DB)
The middle bullet point is what is confusing me. If I pull the data out of the Read DB, and check some information on it, then decide to change a property; I then have to convert the object from the 'Read Object' to the 'Command Object', so that I can then persist it and create an event? With my current architecture- I could query the command DB no problem, to find all the instances of {class x} that match the criteria, however I don't know if a) this is the right thing to do, and b) how this would work if I was using an event store as a DB? I'd have to query a table with millions of rows to find the most recent bit of state about the objects, to then see if they match?
Lots of what I read online has been very conceptual- so I think when it comes to implementations it maybe seems more difficult than it is? Anyhow, if anyone has any advice it would be hugely appreciated!
TIA :)
CQRS can be interpreted in a "permissive" way: rather than saying "thou shalt not query the command/write side", it says "it's OK to have a query/read side that's separate from the command/write side". Because you have this permission to do such separation, it follows that one can optimize the command/write side for a more write-heavy workload (in practice, there are always some reads in the command/write side: since command validation is typically done against some state, that requires some means of getting the state!). From this, it's extremely likely that there will be some queries which can be performed efficiently against the command/write side and some that can't be (without deoptimizing the command/write side). From this perspective, it's OK to perform the first kind of query against the command/write side: you get the benefit of strong consistency by doing that, though make sure that you're not compromising the command/write side's primary raison d'être of taking writes.
Event sourcing is in many ways the maximally optimized persistence model for a command/write side, especially if you have some means of keeping the absolute latest state cached and ensuring concurrency control. This is because you can then have many times more writes than reads. The tradeoff in event sourcing is that nearly all reads become rather more expensive than in an update-in-place model: it's thus generally the case that CQRS doesn't force event sourcing but event sourcing tends to force CQRS (and in turn, event sourcing can simplify ensuring that a CQRS system is eventually consistent, which can be difficult to ensure with update-in-place).
In an event-sourced system, you would tend to have a read-side which subscribes to the event stream and tracks the mapping of X ID to last updated and which periodically queries and issues commands. Alternatively, you can have a scheduler service that lets you say "issue this command at this time, unless canceled or rescheduled before then" and a read-side which subscribes to updates and schedules a command for the given ID 5 days from now after canceling the command from the previous update.
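The first option (a read-side that tracks last-updated times and periodically issues commands) could look roughly like the sketch below. The class name, the `TouchX` command shape, and the five-day threshold are assumptions for illustration only.

```python
from datetime import datetime, timedelta


class StalenessTracker:
    """Read-side that maps each X ID to the time of its latest event."""

    def __init__(self, max_age: timedelta):
        self.max_age = max_age
        self.last_updated = {}

    def on_event(self, x_id: str, occurred_at: datetime) -> None:
        # Keep only the newest timestamp, so out-of-order delivery is harmless.
        previous = self.last_updated.get(x_id)
        if previous is None or occurred_at > previous:
            self.last_updated[x_id] = occurred_at

    def due_commands(self, now: datetime) -> list:
        # Called by the periodic job; the write side still validates each command.
        return [
            {"type": "TouchX", "x_id": x_id}
            for x_id, updated in sorted(self.last_updated.items())
            if now - updated > self.max_age
        ]


tracker = StalenessTracker(max_age=timedelta(days=5))
tracker.on_event("a", datetime(2024, 1, 1))
tracker.on_event("b", datetime(2024, 1, 8))
stale = tracker.due_commands(now=datetime(2024, 1, 10))
```

Because this view is eventually consistent, the command handler should re-check the condition against its own state before applying the change.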

Is Event sourcing using Database CDC considered good architecture?

When we talk about sourcing events, we have a simple dual write architecture where we can write to database and then write the events to a queue like Kafka. Other downstream systems can read those events and act on/use them accordingly.
But the problem occurs when trying to keep the DB and the events in sync, as the ordering of these events is required to make sense of them.
To solve this problem, people encourage using database commit logs as a source of events, and there are tools built around this like Airbnb's SpinalTap, Red Hat's Debezium, Oracle's GoldenGate, etc. It solves the problems of consistency, ordering guarantees and so on.
But the problem with using the database commit log as an event source is that we become tightly coupled to the DB schema. The DB schema of a microservice is exposed, and any breaking change in it, like a datatype change or a column rename, can actually break the downstream systems.
So is using the DB CDC as an event source a good idea?
A talk on this problem and using Debezium for event sourcing
Extending Constantin's answer:
TL;DR:
Transaction log tailing/mining should be hidden from others.
It is not strictly an event stream, as you should not access it directly from other services. It is generally used when gradually transitioning a legacy system to a microservices-based one. The flow could look like this:
Service A commits a transaction to the DB
A framework or service polls the commit log and maps new commits to Kafka as events
Service B is subscribed to a Kafka stream and consumes events from there, not from the DB
Longer story:
Service B doesn't see that your event originated from the DB, nor does it access the DB directly. The commit data should be projected into an event. If you change the DB, you should only need to modify your projection rule to map commits in the new schema to the "old" event format, so consumers do not have to change. (I am not familiar with Debezium, or whether it can do this projection.)
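As a rough sketch of such a projection rule: the change-record shape below (`table`, `after`) and the column names are invented for illustration and are not the output format of Debezium or any other CDC tool.

```python
from typing import Optional


def project_commit(change: dict) -> Optional[dict]:
    """Map a captured row change to a stable, schema-independent event."""
    if change.get("table") != "customers":
        return None  # not something we publish
    row = change["after"]
    if "full_name" in row:
        # Original schema: a single full_name column.
        full_name = row["full_name"]
    else:
        # Assumed later schema: the column was split in two.
        full_name = f'{row["first_name"]} {row["last_name"]}'
    # The published event format stays the same across both schemas.
    return {"type": "CustomerChanged", "customerId": row["id"], "fullName": full_name}


old_row = {"table": "customers", "after": {"id": 1234, "full_name": "Bob Smith"}}
new_row = {"table": "customers",
           "after": {"id": 1234, "first_name": "Bob", "last_name": "Smith"}}
```

Only `project_commit` changes when the table changes; Service B keeps consuming the same `CustomerChanged` shape.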
Your event handling should be idempotent, as publishing an event and committing a transaction atomically is a problem in a distributed scenario, and tools will guarantee at-least-once delivery with exactly-once-processing semantics at best, and the exactly-once part is rarer. This is because the event origin (the transaction log) is not the same as the stream that will be accessed by other services, i.e. it is distributed. And this is still the producer part; the same problem exists with the Kafka->consumer channel, but for a different reason. Also, Kafka will not behave like an event store, so what you have achieved is a message queue.
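One common way to get idempotent handling under at-least-once delivery is to remember which event IDs have already been applied. This is a minimal in-memory sketch; the `event_id` and `amount` fields are assumed for illustration, and a real consumer would persist the processed IDs together with the state change, in the same transaction.

```python
class IdempotentConsumer:
    """Applies each event at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.processed_ids = set()
        self.balance = 0

    def handle(self, event: dict) -> bool:
        if event["event_id"] in self.processed_ids:
            return False  # duplicate delivery: ignore
        self.balance += event["amount"]
        self.processed_ids.add(event["event_id"])
        return True


consumer = IdempotentConsumer()
deposit = {"event_id": "e-1", "amount": 50}
first = consumer.handle(deposit)
second = consumer.handle(deposit)  # redelivered by the broker
```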
I recommend using a dedicated event-store instead if possible, like Greg Young's: https://eventstore.org/. This solves the problem by integrating an event-store and message-broker into a single solution. By storing an event (in JSON) to a stream, you also "publish" it, as consumers are subscribed to this stream. If you want to further decouple the services, you can write projections that map events from one stream to another stream. Your event consuming should be idempotent with this too, but you get an event store that is partitioned by aggregates and is pretty fast to read.
If you want to store the data in the SQL DB too, then listen to these events and insert/update the tables based on them; just do not use your SQL DB as your event store, because it will be hard to implement it right (failure-proof).
For the ordering part: reading events from one stream will be ordered. Projections that aggregate multiple event streams can only guarantee ordering between events originating from the same stream. It is usually more than enough. (By the way, you could reorder the messages based on some field on the consumer side if necessary.)
If you are using Event sourcing:
Then the coupling should not exist. The Event store is generic, it doesn't care about the internal state of your Aggregates. You are in the worst case coupled with the internal structure of the Event store itself but this is not specific to a particular Microservice.
If you are not using Event sourcing:
In this case there is a coupling between the internal structure of the Aggregates and the CDC component (which captures the data change and publishes the event to a message queue or similar). In order to limit the effects of this coupling to the Microservice itself, the CDC component should be part of it. In this way, when the internal structure of the Aggregates in the Microservice changes, the CDC component is changed too and the outside world doesn't notice. Both changes are deployed at the same time.
So is using the DB CDC as an event source a good idea?
"Is it a good idea?" is a question that is going to depend on your context, the costs and benefits of the different trade offs that you need to make.
That said, it's not an idea that is consistent with the heritage of event sourcing as I learned it.
Event sourcing - the idea that our book of record is a ledger of state changes - has been around a long long time. After all, when we talk about "ledger", we are in fact alluding to those documents written centuries ago that kept track of commerce.
But a lot of the discussion of event sourcing in software is heavily influenced by domain driven design; DDD advocates (among other things) aligning your code concepts with the concepts in the domain you are modeling.
So here's the problem: unless you are in some extreme edge case, your database is probably some general purpose application that you are customizing/configuring to meet your needs. Change data capture is going to be limited by the fact that it is implemented using general purpose mechanisms. So the events that are produced are going to look like general purpose patch documents (here's the diff between before and after).
But if we are trying to align our events with our domain concepts (i.e., what does this change to our persisted state mean), then patch documents are a step in the wrong direction.
For example, our domain might have multiple "events" that make changes to the same, or very similar, sets of fields in our model. Trying to rediscover the motivation for a change by reverse engineering the diff is kind of a dumb problem to have; especially when we have already fought with the same sort of problem learning user interface design.
In some domains, a general purpose change is good enough. In some contexts, a general purpose change is good enough for now. Horses for courses.
But it's not really the sort of implementation that the "event sourcing" community is talking about.
Besides handling this in the CDC component, as Constantin Galbenu mentioned, you can also do it on the event storage side, for example with the Kafka Streams API.
What is the Kafka Streams API? Input is read from one or more topics in order to generate output to one or more topics, effectively transforming the input streams to output streams.
Once the detailed data has been transformed into more abstract data, your DB schema is bound only to the transformation, which removes the tight coupling between the DB and the subscribers.
If your data schema needs to change a lot, maybe you should add a new topic for it.

ES,CQRS messaging flow

I am trying to understand ES+CQRS and the tech stack that can be used.
As per my understanding, the flow should be as below.
UI sends a request to Controller(HTTP Adapter)
Controller calls application service by passing Request Object as parameter.
Application Service creates a Command from the Request Object passed from the controller.
Application Service passes this Command to the Message Consumer.
Message Consumer publishes the Command to the message broker (RabbitMQ).
Two subscribers will be listening for the above command:
a. One subscriber will build the Aggregate from the event store, apply the command to it, and store the generated event in the event store.
b. Another subscriber will be at the VIEW end, and will populate data in the view database/cache.
Kindly suggest my understanding is correct.
Kindly suggest my understanding is correct
I think you've gotten a bit tangled in your middleware.
As a rule, CQRS means that the writes happen to one data model, and reads in another. So the views aren't watching commands, they are watching the book of record.
So in the subscriber that actually processes the command, the command handler will load the current state from the book of record into memory, update the copy in memory according to the domain model, and then replace the state in the book of record with the updated version.
Having updated the book of record, we can now trigger a refresh of the data model that backs the view; no business logic is run here, this is purely a transform of the data from the model we use for writes to the model we use for reads.
When we add event sourcing, this pattern is the same -- the distinction is that the data model we use for writes is a history of events.
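A minimal sketch of that flow, using plain dictionaries as stand-ins for the book of record and the read model (the order structure and the function names are assumptions made for illustration):

```python
# Write model: the book of record, holding current state per order.
book_of_record = {"order-1": {"status": "open", "lines": [{"sku": "A1", "qty": 2}]}}
# Read model: a denormalized shape that is convenient for the view.
read_model = {}


def refresh_view(order_id: str) -> None:
    """Pure data transform from write model to read model; no business rules."""
    state = book_of_record[order_id]
    read_model[order_id] = {
        "status": state["status"],
        "total_items": sum(line["qty"] for line in state["lines"]),
    }


def handle_add_line(order_id: str, sku: str, qty: int) -> None:
    state = book_of_record[order_id]            # 1. load current state
    if state["status"] != "open":               # 2. apply domain rules
        raise ValueError("order is not open")
    updated = {**state, "lines": state["lines"] + [{"sku": sku, "qty": qty}]}
    book_of_record[order_id] = updated          # 3. replace state in the book of record
    refresh_view(order_id)                      # 4. then refresh the view


handle_add_line("order-1", "B2", 3)
```

With event sourcing, step 1 would fold the order's event history into `state` and step 3 would append new events instead of overwriting it.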
How atomicity is achieved in writing data in event store and writing data in VIEW Model?
It's not -- we don't try to make those two actions atomic.
How do we handle the case where an event is stored in the event store but the system crashes before we send the event to the message queue?
The key idea is to realize that we typically build new views by reading events out of the event store; not by reading the events out of the message queue. The events in the queue just tell us that an update is available. In the absence of events appearing in the message queue, we can still poll the event store watching for updates.
Therefore, if the event store is unreachable, you just leave the stale copy of the view in place, and wait for the system to recover.
If the event store is reachable, but the message queue isn't, then you update the view (if necessary) on some predetermined schedule.
This is where the eventual consistency part comes in. Given a successful write into the event store, we are promising that the effects of that write will be visible in a finite amount of time.
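The "read from the event store, use the queue only as a hint" idea can be sketched as a view that remembers its position in the store and catches up from there. The `EventStore` class here is an in-memory stand-in invented for the example, not a real client library.

```python
class EventStore:
    """In-memory stand-in for an append-only event store."""

    def __init__(self):
        self._events = []

    def append(self, event: dict) -> None:
        self._events.append(event)

    def read_from(self, position: int) -> list:
        return self._events[position:]


class EventCountView:
    """View that counts events per type, rebuilt purely from the store."""

    def __init__(self, store: EventStore):
        self.store = store
        self.position = 0  # checkpoint: number of events already applied
        self.counts = {}

    def catch_up(self) -> int:
        # Run on a queue notification *or* on a timer; both paths are identical.
        new_events = self.store.read_from(self.position)
        for event in new_events:
            self.counts[event["type"]] = self.counts.get(event["type"], 0) + 1
        self.position += len(new_events)
        return len(new_events)


store = EventStore()
view = EventCountView(store)
store.append({"type": "OrderPlaced"})
store.append({"type": "OrderPlaced"})
applied_first = view.catch_up()
applied_again = view.catch_up()  # nothing new: a lost notification costs nothing
```

If a crash drops the queue message, the next scheduled `catch_up` still picks the event up, which is why the view is stale for a while but never wrong for good.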

CQRS+ES: Client log as event

I'm developing a small CQRS+ES framework and building applications with it. In my system, I need to log some client actions and use them for analytics, statistics, and maybe in the future for something in the domain. For example, a client (on the web) downloads some resource(s) and I need to save the date, time, type (download, partial, ...), region or country (maybe from the IP), etc. Later, in some view, the client can see a download count or some more complex report. I'm not sure how to implement this feature.
The first solution creates an analytics context and some aggregate: on each client action I send a command like IncreaseDownloadCounter(resourced), then handle the command, raise domain events, and update the view. But in this scenario the download has already occurred before I send the command, so it is not really a command; on the other hand, version conflicts increase.
The second solution is to raise an event from the client side and update the view model based on it. But with this approach my event is not stored in the event store, because it is not raised by a command and never changes any domain context. If I do store it in the event store, there is no aggregate to handle it when it is later fetched for some other use.
The third solution is to raise an event from the client side and store it in another database, perhaps with a dedicated table for each type of event. But this way I have multiple event stores with different schemas, which makes it difficult to recreate view models and to trace events when recreating context state; if in future I add a domain that uses these events, it will be difficult to use them.
What is the best approach and solution for this scenario?
First solution creates analytic context and some aggregate
Unquestionably the wrong answer; the event has already happened, so it is too late for the domain model to complain.
What you have is a stream of events. Putting them in the same event store that you use for your aggregate event streams is fine. Putting them in a separate store is also fine. So you are going to need some other constraint to make a good choice.
Typically, reads vastly outnumber writes, so one concern might be that these events are going to saturate the domain store. That might push you towards storing these events separately from your data model (prior art: we typically keep the business data in our persistent book of record, but the sequence of http requests received by the server is typically written instead to a log...)
If you are supporting an operational view, push on the requirement that the state be recovered after a restart. You might be able to get by with building your view off of an in memory model of the event counts, and use something more practical for the representations of the events.
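As one possible shape for this, the sketch below appends client events to a plain JSON-lines log and derives the download counts in memory. An `io.StringIO` stands in for the log file, and the event fields (`type`, `resource_id`, `country`) are assumptions made for illustration.

```python
import io
import json
from collections import Counter


def record(log, event: dict) -> None:
    """Append one client event to the log as a JSON line."""
    log.write(json.dumps(event) + "\n")


def rebuild_counts(log) -> Counter:
    """Replay the whole log to rebuild the in-memory view, e.g. after a restart."""
    log.seek(0)
    counts = Counter()
    for line in log:
        event = json.loads(line)
        if event["type"] == "download":
            counts[event["resource_id"]] += 1
    return counts


log = io.StringIO()  # stand-in for an append-only file
record(log, {"type": "download", "resource_id": "r1", "country": "DE"})
record(log, {"type": "partial", "resource_id": "r1", "country": "DE"})
record(log, {"type": "download", "resource_id": "r1", "country": "FR"})
record(log, {"type": "download", "resource_id": "r2", "country": "FR"})
download_counts = rebuild_counts(log)
```

If the view does not have to survive a restart, the replay step can be dropped and the counter simply updated as events arrive.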
Thanks for your complete answer. So I should create something like the ES schema without some fields (aggregate name or type, version, etc.) and collect client events in that repository; some offline process then reads them and updates the read model, or creates commands to do something in the domain space.
Something like that, yes. If the view for the client doesn't actually require any validation by your model at all, then building the read model from the externally provided events is fine.
Are you recommending saving some claim or authorization token of the user and sender app for validation in another process?
Maybe, maybe not. The token describes the authority of the event; our own event handler is the authority for the command(s) that is/are derived from the events. It's an interesting question that probably requires more context -- I'd suggest you open a new question on that point.

Design of notification events

I am designing some events that will be raised when actions are performed or data changes in a system. These events will likely be consumed by many different services and will be serialized as XML, although more broadly my question also applies to the design of more modern funky things like Webhooks.
I'm specifically thinking about how to describe changes with an event and am having difficulty choosing between different implementations. Let me illustrate my quandary.
Imagine a customer is created, and a simple event is raised.
<CustomerCreated>
<CustomerId>1234</CustomerId>
<FullName>Bob</FullName>
<AccountLevel>Silver</AccountLevel>
</CustomerCreated>
Now let's say Bob spends lots of money and becomes a gold customer, or indeed any other property changes (e.g.: he now prefers to be known as Robert). I could raise an event like this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<FullName>Bob</FullName>
<AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is nice because the schema of the Created and Modified events are the same and any subscriber receives the complete current state of the entity. However it is difficult for any receiver to determine which properties have changed without tracking state themselves.
I then thought about an event like this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is more compact and only contains the properties that have changed, but comes with the downside that the receiver must apply the changes and reassemble the current state of the entity if they need it. Also, the schemas of the Created and Modified events must be different now; CustomerId is required but all other properties are optional.
Then I came up with this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<Before>
<FullName>Bob</FullName>
<AccountLevel>Silver</AccountLevel>
</Before>
<After>
<FullName>Bob</FullName>
<AccountLevel>Gold</AccountLevel>
</After>
</CustomerModified>
This covers all bases as it contains the full current state, plus a receiver can figure out what has changed. The Before and After elements have the exact same schema type as the Created event. However, it is incredibly verbose.
I've struggled to find any good examples of events; are there any other patterns I should consider?
You tagged the question as "Event Sourcing", but your question seems to be more about Event-Driven SOA.
I agree with @Matt's answer--"CustomerModified" is not granular enough to capture intent if there are multiple business reasons why a Customer would change.
However, I would back up even further and ask you to consider why you are storing Customer information in a local service, when it seems that you (presumably) already have a source of truth for customer. The starting point for consuming Customer information should be getting it from the source when it's needed. Storing a copy of information that can be queried reliably from the source may very well be an unnecessary optimization (and complication).
Even if you do need to store Customer data locally (and there are certainly valid reasons for needing to do so), consider passing only the data necessary to construct a query of the source of truth (the service emitting the event):
<SomeInterestingCustomerStateChange>
<CustomerId>1234</CustomerId>
</SomeInterestingCustomerStateChange>
So these event types can be as granular as necessary, e.g. "CustomerAddressChanged" or simply "CustomerChanged", and it is up to the consumer to query for the information it needs based on the event type.
There is not a "one-size-fits-all" solution--sometimes it does make more sense to pass the relevant data with the event. Again, I agree with @Matt's answer if this is the direction you need to move in.
Edit Based on Comment
I would agree that using an ESB to query is generally not a good idea. Some people use an ESB this way, but IMHO it's a bad practice.
Your original question and your comments to this answer and to Matt's talk about only including fields that have changed. This would definitely be problematic in many languages, where you would have to somehow distinguish between a property being empty/null and a property not being included in the event. If the event is getting serialized/de-serialized from/to a static type, it will be painful (if not impossible) to know the difference between "First Name is being set to NULL" and "First Name is missing because it didn't change".
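In a dynamically typed language, the "missing versus null" problem can be worked around with a sentinel value. This is only a sketch, and the field names are invented for the example; with a statically typed event class the absent field and the null field would typically both deserialize to the same default.

```python
MISSING = object()  # sentinel meaning "field was not included in the event"


def apply_patch(customer: dict, patch: dict) -> dict:
    """Apply a changed-fields-only event to a local copy of the customer."""
    updated = dict(customer)
    for field in ("full_name", "account_level"):
        value = patch.get(field, MISSING)
        if value is MISSING:
            continue  # not in the event: leave the local value alone
        updated[field] = value  # present, possibly None: overwrite
    return updated


bob = {"full_name": "Bob", "account_level": "Silver"}
upgraded = apply_patch(bob, {"account_level": "Gold"})
cleared = apply_patch(bob, {"full_name": None})
```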
Based on your comment that this is about synchronization of systems, my recommendation would be to send the full set of data on each change (assuming signal+query is not an option). That leaves the interpretation of the data up to each consuming system, and limits the responsibility of the publisher to emitting a more generic event, i.e. "Customer 1234 has been modified to X state". This event seems more broadly useful than the other options, and if other systems receive this event, they can interpret it as they see fit. They can dump/rewrite their own data for Customer 1234, or they can compare it to what they have and update only what changed. Sending only what changed seems more specific to a single consumer or a specific type of consumer.
All that said, I don't think any of your proposed solutions are "right" or "wrong". You know best what will work for your unique situation.
Events should be used to describe intent as well as details. For example, you could have a CustomerRegistered event with all the details for the customer that was registered. Then, later in the stream, a CustomerMadeGoldAccount event only really needs to capture the customer Id of the customer whose account was changed to gold.
It's up to the consumers of the events to build up the current state of the system that they are interested in.
This allows only the most pertinent information to be stored in each event. Imagine having hundreds of properties for a customer: if every command that changed a single property had to raise an event with all the properties before and after, this would get unwieldy pretty quickly. It's also difficult to determine why the change occurred if you just publish a generic CustomerModified event, which is often a question that is asked about the current state of an entity.
Only capturing data relevant to the event means that the command that issues the event only needs to have enough data about the entity to validate that the command can be executed; it doesn't even need to read the whole customer entity.
Subscribers of the events also only need to build up a state for things that they are interested in, e.g. perhaps an 'account level' widget is listening to these events, all it needs to keep around is the customer ids and account levels so that it can display what account level the customer is at.
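The 'account level' widget described above might be sketched like this. The event names come from this answer; the dictionary shape of the events and the field names are assumptions for illustration.

```python
class AccountLevelWidget:
    """Subscriber that keeps only customer IDs and their account levels."""

    def __init__(self):
        self.levels = {}

    def on_event(self, event: dict) -> None:
        if event["type"] == "CustomerRegistered":
            # The full-detail event: take only the one field we care about.
            self.levels[event["customerId"]] = event["accountLevel"]
        elif event["type"] == "CustomerMadeGoldAccount":
            # The intent is in the event type; only the ID is needed.
            self.levels[event["customerId"]] = "Gold"
        # Every other event type is irrelevant to this widget and is ignored.


widget = AccountLevelWidget()
stream = [
    {"type": "CustomerRegistered", "customerId": 1234,
     "fullName": "Bob", "accountLevel": "Silver"},
    {"type": "CustomerRenamed", "customerId": 1234, "fullName": "Robert"},
    {"type": "CustomerMadeGoldAccount", "customerId": 1234},
]
for event in stream:
    widget.on_event(event)
```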
Instead of trying to convey everything through the fields of the XML payload, you can distinguish between different operations based on:
1. Different endpoint URLs depending on the operation (this is preferred)
2. An opcode (operation code) element in the XML file which tells which operation is to be used to handle the incoming request (closer to your examples)
There are a few enterprise patterns applicable to your business case - messaging and its variants, and if your system is extensible then Enterprise Service Bus should be used. An ESB allows reliable handling of events and processing.
