DDD and EDA - Singular vs Plural event names with set-oriented operations

Context: the product I'm working on is moving away from a monolith to a modular monolith architecture, and in the process implementing DDD concepts, as well as a more event-driven architecture.
Problem: a lot of operations are set-oriented (i.e. they accept a set of Items instead of a single one). From what I understand, this violates the Aggregate rule of "one Aggregate change per transaction"; however, Vaughn Vernon mentions in IDDD (p. 367/368) that "UI convenience allowing the user to create batch Aggregates" (paraphrased) is one of the accepted reasons to break this rule. There is no mention of what the corresponding events would look like.
Question: Would it be correct, in this particular case, to batch all the ItemCreated events into a single ItemsCreated event (plural vs singular), with all the individual events as payload?
So, if the user creates 10 Items at once, instead of having 10 ItemCreated (singular) events, I would have a single ItemsCreated (plural) event, with the 10 Items referenced.
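In code, the two shapes would look something like this (a minimal sketch; the field names are hypothetical):

// Option A: 10 singular events, one per created Item.
type ItemCreated = {
  type: "ItemCreated";
  itemId: string;
  name: string;
};

// Option B: one plural event referencing all 10 Items.
type ItemsCreated = {
  type: "ItemsCreated";
  items: Array<{ itemId: string; name: string }>;
};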
Other notes: I understand that Domain Events are emitted by Aggregates, and as such there should be a 1:1 match between event-emitting commands and Domain Events. I am not sure if this batching of Events can be accomplished away from the Aggregates.

I understand that Domain Events are emitted by Aggregates, and as such there should be a 1:1 match between event-emitting commands and Domain Events.
There are a number of people who feel quite strongly that one "transaction" should necessarily mean one "event". I've argued with some of them. They aren't particularly convincing; but apparently neither was I.
1:1 is simple - but you have to be careful about the cases where you pay for simple with more complexity somewhere else.
Would it be correct, in this particular case, to batch all the ItemCreated events into a single ItemsCreated event (plural vs singular), with all the individual events as payload?
It could be (but I would guess that it won't be).
What I think you should do is look into the nuance of the situation more carefully - within the business domain that you are modeling, is this really one thing that's going on, or is it 10 different things that just happen to be coincident in time because they were delivered together?
Does everybody who cares about one of the items in this set necessarily care about all of them?
If you were to implement "create 10" as two distinct "create 5"s, would that call for one event? two events? 10 events?
The fact that you are considering these to be 10 different aggregates (as opposed to one aggregate with 10 different entities within it) suggests that we really do have 10 different acts of creation that the business cares about.

Related

Event Sourcing: multiple events vs a single "StatusChanged"

Assuming the common "Order" aggregate, my view of events is that each should be representative of the command that took place, e.g. OrderCreated, OrderPicked, OrderPacked, OrderShipped.
Applying these events in the aggregate changes the status of the order accordingly.
The problem:
I have a projector that lists all orders in the system and their statuses. So it consumes the events, and like with the aggregate "apply" method, it implements the logic that changes the status of the order.
So now the logic exists in two places, which is... not good.
A solution to this is to replace all the above events with a single StatusChanged event that contains a property with the new status.
Pros: both aggregate and projectors just need to handle one event type, and set the status to what's in that event. Zero logic.
Cons: the list of events is now very implicit. Instead of getting a list of WHAT HAPPENED (created, packed, shipped, etc.), we now have a list of status-change events.
How do you prefer to approach this?
Note: this is not the full list of events. Other events contain other properties, so clearly they don't belong to this problem. The problem is with events that don't contain any info and just change the status of an order.
In general it's better to have more finer-grained events, because this preserves context (and means that you don't have to write logic to reconstruct the context in your consumers).
You typically will have at most one projector which is duplicating your aggregate's event handler. If its purpose is actually to duplicate the aggregate's event handler (e.g. update a datastore which facilitates cross-aggregate querying), you may want to look at making that explicit as a means of making the code DRY (e.g. function-as-value, strategy pattern...).
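For instance, a minimal sketch of that "function-as-value" idea, with hypothetical order events: the aggregate and the listing projector share one transition function instead of each implementing the mapping:

type OrderEvent =
  | { type: "OrderCreated"; orderId: string }
  | { type: "OrderPicked"; orderId: string }
  | { type: "OrderPacked"; orderId: string }
  | { type: "OrderShipped"; orderId: string };

type OrderStatus = "Created" | "Picked" | "Packed" | "Shipped";

// The single source of truth for the event -> status mapping.
function statusAfter(event: OrderEvent): OrderStatus {
  switch (event.type) {
    case "OrderCreated": return "Created";
    case "OrderPicked": return "Picked";
    case "OrderPacked": return "Packed";
    case "OrderShipped": return "Shipped";
  }
}

// Both the aggregate's apply() and the order-list projector call
// statusAfter(event) rather than re-implementing the logic.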
For the other projectors you write (and there will be many as you go down the CQRS/ES road), you're going to be ignoring events that aren't interesting to that projection and/or doing radically different things in response to the events you don't ignore. If you go down the road of coarse events (CRUD being about the limit of coarseness: a StatusChanged event is just the "U" in CRUD), you're setting yourself up for either:
duplicating the aggregate's event handling/reconstruction in the projector, or
carrying oldState and newState in the event (viz. just saying StatusChanged { newState } isn't sufficient before you can determine what changed),
and the code for determining whether a change is interesting will probably be duplicated and more complex than the aggregate's event-handling code.
The coarser the events, the greater the likelihood of eventually having more duplication, less understandability, and worse performance (or higher infrastructure spend).
So now the logic exists in two places, which is... not good.
Not necessarily a problem. If the logic is static, then it really doesn't matter very much. If the logic is changing, but you can coordinate the change (ex: both places are part of the same deployment bundle), then it's fine.
Sometimes this means introducing an extra layer of separation between your "projectors" and the consumers - ex: something that is tightly coupled to the aggregate watching the events, and copying status changes to some (logical) cache where other processes can read the information. Thus, you preserve the autonomy of your component without compromising your event stream.
Another possibility to consider is that we're allowed to produce more than one event from a command - so you could have both an OrderPicked event and a StatusChanged event, and then use your favorite filtering method for subscribers only interested in status changes.
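A minimal sketch of that, with hypothetical names - the command handler returns both events, and subscribers interested only in status changes filter for the coarse one:

type PickEvents =
  | { type: "OrderPicked"; orderId: string }
  | { type: "StatusChanged"; orderId: string; newStatus: string };

// One command, two events appended to the stream.
function handlePickOrder(orderId: string): PickEvents[] {
  return [
    { type: "OrderPicked", orderId },
    { type: "StatusChanged", orderId, newStatus: "Picked" },
  ];
}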
In effect, we've got two different sets of information to track to remember later - inputs (information in the command, information copied from local caches), and also things we have calculated from those inputs, prior state, and the business policies that are now in effect.
So it may make sense to separate those expressions of information anyway.
If event sourcing is a good approach for the problems you are solving, then you are probably working on problems that are pretty important to the business, where specialization matters (otherwise, licensing an off-the-shelf product and creating adapters would be more cost-effective). In that case, you should probably expect to invest in thinking deeply about the different trade-offs you need to make, rather than hoping for a one-size-fits-all solution.

CQRS Where to Query for business logic/Internal Processes

I'm currently looking at implementing CQRS driven by events (not yet event sourcing) for a service at work; the reasoning being:
I need aggregate data to support a REST API coming out of this service (which will be used to populate views) - however, the aggregated data will not be used by the application logic/processing (i.e. the data originating outside this service won't be; the bits of the aggregate originating within it will be used)
I need to stream events to other systems so that they can react to the data (I will produce to a Kafka topic, so the 'read'/'projection' side of this system will consume the same events as the external systems, from these Kafka topics)
I will be consuming events from internal systems to help populate the aggregate for the views in the first point (i.e. it's data from this service and others)
The reason for not going event-sourced currently is that a) we're in a bit of a time crunch, and b) we're still learning about it. Having said that, it is something we are looking to do in the future - though currently we have a static DB on the 'Command' side of the system, which will just store current state.
I'm pretty confident with the concept of using the aggregate data to provide the REST API; however, my confusion comes from when I want to change a resource from within the system (for example via a cron job triggered 5 times a day). Example:
If I have a resource of class x which (given some data) needs a piece of state changed
I need to select the instances of class x which meet the requirements (from one of the DBs). Think select * from {class x} where last_changed_date > 5 days ago;
Then create a command to change the state of these instances of x (in my case, the static command DB would be updated, and an event emitted to update the read DB)
The middle bullet point is what is confusing me. If I pull the data out of the read DB and check some information on it, then decide to change a property, I then have to convert the object from the 'Read Object' to the 'Command Object' so that I can persist it and create an event? With my current architecture I could query the command DB no problem to find all the instances of {class x} that match the criteria; however, I don't know a) if this is the right thing to do, and b) how this would work if I were using an event store as a DB. Would I have to query a table with millions of rows to find the most recent bit of state about the objects, to then see if they match?
Lots of what I read online has been very conceptual- so I think when it comes to implementations it maybe seems more difficult than it is? Anyhow, if anyone has any advice it would be hugely appreciated!
TIA :)
CQRS can be interpreted in a "permissive" way: rather than saying "thou shalt not query the command/write side", it says "it's OK to have a query/read side that's separate from the command/write side". Because you have this permission to do such separation, it follows that one can optimize the command/write side for a more write-heavy workload (in practice, there are always some reads in the command/write side: since command validation is typically done against some state, that requires some means of getting the state!). From this, it's extremely likely that there will be some queries which can be performed efficiently against the command/write side and some that can't be (without deoptimizing the command/write side). From this perspective, it's OK to perform the first kind of query against the command/write side: you can get the benefit of strong consistency by doing that, though be sure that you're not affecting the command/write side's primary raison d'être of taking writes.
Event sourcing is in many ways the maximally optimized persistence model for a command/write side, especially if you have some means of keeping the absolute latest state cached and ensuring concurrency control. This is because you can then have many times more writes than reads. The tradeoff in event sourcing is that nearly all reads become rather more expensive than in an update-in-place model: it's thus generally the case that CQRS doesn't force event sourcing but event sourcing tends to force CQRS (and in turn, event sourcing can simplify ensuring that a CQRS system is eventually consistent, which can be difficult to ensure with update-in-place).
In an event-sourced system, you would tend to have a read-side which subscribes to the event stream and tracks the mapping of X ID to last updated and which periodically queries and issues commands. Alternatively, you can have a scheduler service that lets you say "issue this command at this time, unless canceled or rescheduled before then" and a read-side which subscribes to updates and schedules a command for the given ID 5 days from now after canceling the command from the previous update.
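A minimal sketch of the first alternative, with hypothetical names: the read side tracks "last changed" per instance from the event stream, and the periodic job issues commands for the stale ones:

const lastChanged = new Map<string, Date>();

// Fed by the subscription to the event stream:
function onInstanceChanged(event: { id: string; occurredAt: Date }): void {
  lastChanged.set(event.id, event.occurredAt);
}

// Run by the cron job (e.g. 5 times a day):
function issueDueCommands(
  now: Date,
  send: (command: { type: "ChangeState"; id: string }) => void,
): void {
  const fiveDaysMs = 5 * 24 * 60 * 60 * 1000;
  for (const [id, at] of lastChanged) {
    if (now.getTime() - at.getTime() > fiveDaysMs) {
      send({ type: "ChangeState", id }); // hypothetical command
    }
  }
}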

Event sourcing, hold read side consistent

I'm new to ES and just trying to sort everything out in my head. I have heard that ES actually solves the consistency issue between the write and read databases (with some delay, for sure). But I still do not fully understand how.
If a command comes into the domain and the aggregate root fires an event to update the event store, is the same event sent to update the read side? But what if the message is lost? Then we will have an outdated read side.
Are projections the only solution? So instead of updating from the event, the read side walks through the event store and reproduces the aggregate (from the beginning or from some snapshot). But in that case it's probably breaking some rules, as the read side should be simple and should not know about the domain. Also, the read side is usually a separate application, so it can't know about the aggregate.
Of course we can also use RabbitMQ or some other message broker so that we don't lose messages, and actually I think we need to. But I also read that to make it consistent "you can use Rabbit or ES" - but again, how can ES make it consistent on its own?
Benjamin is completely right about the purpose of Event Sourcing.
My answer aims to add some more details.
First:
Read models and projections aren't supposed to represent the aggregate state.
Projections are the way for event-sourced systems to build the read model for CQRS. CQRS in essence postulates that write and read models usually serve different purposes and therefore it makes perfect sense to use another model for the read side.
Therefore, you often find multiple projections building different, narrowly purposed models, targeting specific needs for queries.
Second:
By "solving consistency issues" you probably mean that in event-sourced systems each state transition is represented as an event (or multiple events). Therefore, writes are always transactional. The database you choose as your event store should support (possibly using some library or additional tool) a real-time subscription that allows you to receive new events in your projection, in order. A new projection will start reading from the beginning and eventually catch up to real time. Subscriptions usually need to keep the current processing position in the global stream of events, so when the projection restarts, it resumes receiving events from the last point known to it.
By doing this, you will guarantee that every state transition in the write model will be reflected in the read model. This is probably what you mean in your original question.
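A minimal sketch of such a checkpointed subscription, with the store access injected as plain functions (the names are assumptions, not any particular event store's API):

interface EventEnvelope {
  globalPosition: number;
  type: string;
  data: unknown;
}

async function runProjection(
  readEventsAfter: (position: number) => Promise<EventEnvelope[]>,
  loadCheckpoint: () => Promise<number>,
  saveCheckpoint: (position: number) => Promise<void>,
  project: (event: EventEnvelope) => Promise<void>,
): Promise<void> {
  let position = await loadCheckpoint(); // 0 for a brand-new projection
  while (true) {
    const batch = await readEventsAfter(position);
    if (batch.length === 0) break; // caught up; a live subscription would keep listening
    for (const event of batch) {
      await project(event);           // apply events in order
      position = event.globalPosition;
      await saveCheckpoint(position); // a restart resumes from here
    }
  }
}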
Third:
Now, all those things above imply that you cannot use a message bus (only) to deliver events to projections. Brokers give no ordering guarantees and can deliver one message more than once. Also, message brokers don't keep history, so you cannot build new projections at will.
However, that doesn't mean you can't use brokers at all. Some projections don't require ordering and are idempotent. And the feed for events to publish via a broker is that same subscription, so you get guaranteed delivery and can read past events if necessary.
Fourth:
CQRS doesn't imply separate databases. Sometimes, using CQRS just means that you use some persistence layer for your domain objects, so you read and write aggregates. But for queries, you just query at will, whatever you want. A database view is a technical example of CQRS.
Almost there:
It is true that projections need to have little to no logic. The main point here is to ensure idempotency, if possible, so projections usually should not use operations that calculate new values based on old values plus information from events.
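For example (a hypothetical event and view), the difference between an idempotent projection handler and one that derives new values from old ones:

type AccountLevelChanged = { customerId: string; newLevel: string };

// Idempotent: writes the value carried by the event; replaying the same
// event leaves the view unchanged.
function projectLevel(view: Map<string, string>, e: AccountLevelChanged): void {
  view.set(e.customerId, e.newLevel);
}

// Not idempotent: derives the new value from the old one; replaying the
// same event corrupts the view.
function projectChangeCount(counts: Map<string, number>, e: AccountLevelChanged): void {
  counts.set(e.customerId, (counts.get(e.customerId) ?? 0) + 1);
}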
But projections will know about your domain. Everything in your system should know about your domain.
And last:
You can definitely use different databases for write and read models without getting to Event Sourcing. You just need to choose a database that supports a change feed. SQL Server, Postgres, CosmosDb and other databases have such functionality.
P.S. I'd suggest spending some time studying those concepts. I can point to the book repository, it has CQRS and Event Sourcing examples: https://github.com/PacktPublishing/Hands-On-Domain-Driven-Design-with-.NET-Core
I have heard that ES is actually solving the consistency issue between write and read database
To the best of my knowledge, Event Sourcing has NOTHING to do with consistency between reads and writes to your DB. Read/write consistency has more to do with the type of DB you are using, such as relational databases, which are mostly ACID, versus non-relational databases, which often offer only eventual consistency.
ES is not meant for that. Instead, ES is meant to "capture all changes to an application state as a sequence of events" (Martin Fowler).
ES works like a time machine: it allows you to roll the state of your application back to a specific date and time in the past.
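A minimal sketch of that time-machine idea (the event shapes are hypothetical): replaying only the events recorded up to a point in time reproduces the state as of that moment:

type CustomerState = { fullName?: string; accountLevel?: string };

type CustomerEvent =
  | { type: "CustomerCreated"; recordedAt: string; fullName: string; accountLevel: string }
  | { type: "CustomerUpgraded"; recordedAt: string; accountLevel: string };

function apply(state: CustomerState, event: CustomerEvent): CustomerState {
  switch (event.type) {
    case "CustomerCreated":
      return { fullName: event.fullName, accountLevel: event.accountLevel };
    case "CustomerUpgraded":
      return { ...state, accountLevel: event.accountLevel };
  }
}

// ISO-8601 timestamps compare correctly as strings.
function stateAsOf(events: CustomerEvent[], pointInTime: string): CustomerState {
  return events
    .filter((e) => e.recordedAt <= pointInTime)
    .reduce(apply, {} as CustomerState);
}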

Event-driven architecture and structure of events

I'm new to EDA and I've read a lot about its benefits; I'd probably be interested in applying it during my next project, but I still haven't understood something.
When raising an event, which pattern is the most suited:
Name the event "CustomerUpdate" and include all information (updated or not) about the customer
Name the event "CustomerUpdate" and include only the information that has really been updated
Name the event "CustomerUpdate" and include minimum information (identifier) and/or a URI to let the consumer retrieve information about this Customer.
I ask the question because some of our events could be heavy and frequent.
Thx for your answers and time.
Name the event "CustomerUpdate"
First let's start with your event name. The purpose of an event is to describe something which has already happened. This is different from a command, which is an instruction for something yet to happen.
Your event name "CustomerUpdate" sounds ambiguous in this respect, as it could be describing something in the past or something in the future.
CustomerUpdated would be better, but even then, Updated is another ambiguous term, and is nonspecific in a business context. Why was the customer updated in this instance? Was it because they changed their payment details? Moved home? Were they upgraded from silver to gold status? Events can be made as specific as needed.
This may seem at first to be overthinking, but event naming becomes especially relevant as you remove data and context from the event payload, moving more toward skinny events (the "option 3" from your question, which I discuss below).
That is not to suggest that it is always appropriate to define events at this level of granularity, only that it is an avenue which is open to you early on in the project which may pay dividends later on (or may swamp you with thousands of event types).
Going back to your actual question, let's take each of your options in turn:
Name the event "CustomerUpdate" and include all information (updated or not) about the customer
Let's call this "pattern" the Fat message.
Fat messages (also called snapshots) represent the state of the described entity at a given point in time with all the event context present in the payload. They are interesting because the message itself represents the contract between service and consumer. They can be used for communicating changes of state between business domains, where it may be preferred that all event context be present during message processing by the consumer.
Advantages:
Self-consistent - can be consumed entirely without knowledge of other systems.
Simple to consume (upsert).
Disadvantages:
Brittle - the contract between service and consumer is coupled to the message itself.
Easy to overwrite current data with old data if messages arrive in the wrong order (hint: you can mitigate this by using the event sourcing pattern)
Large.
Name the event "CustomerUpdate" and include only the information that has really been updated
Let's call this pattern the Delta message.
Deltas are similar to fat messages in many ways, though they are generally more complex to generate and consume. A good example here is the JSONPatch standard.
Because they are only a partial description of the event entity, deltas also come with a built-in assumption that the consumer knows something about the event being described. For this reason they may be less suitable for sending outside a business domain, where the event entity may not be well known.
Deltas really shine when synchronising data between systems sharing the same entity model, ideally persisted in non-relational storage (eg, no-sql). In this instance an entity can be retrieved, the delta applied, and then persisted again with minimal effort.
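For example, the silver-to-gold upgrade expressed as a hypothetical JSON Patch (RFC 6902) document - a consumer sharing the entity model retrieves its copy, applies the patch, and persists the result:

[
  { "op": "replace", "path": "/accountLevel", "value": "Gold" }
]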
Advantages:
Smaller than Fat messages
Excels in use cases involving shared entity models
Portable (if based on a standard such as jsonpatch, or to a lesser extent, diffgram)
Disadvantages:
Similar to the Fat message, assumes complete knowledge of the data entity.
Easy to overwrite current data with old data.
Complex to generate and consume (except for specific use cases)
Name the event "CustomerUpdate" and include minimum information (identifier) and/or a URI to let the consumer retrieve information about this Customer.
Let's call this the Skinny message.
Skinny messages are different from the other message patterns you have defined, in that the service/consumer contract is no longer explicit in the message, but implied in that at some later time the consumer will retrieve the event context. This decouples the contract and the message exchange, which is a good thing.
This may or may not lend itself well to cross-business domain communication of events, depending on how your enterprise is set up. Because the event payload is so small (usually an ID with some headers), there is no context other than the name of the event on which the consumer can base processing decisions; therefore it becomes more important to make sure the event is named appropriately, especially if there are multiple ways a consumer could handle a CustomerUpdated message.
Additionally it may not be good practice to include an actual resource address in the event data - because events are things which have already happened, event messages are generally immutable and therefore any information in the event should be true forever in case the events need to be replayed. In this instance a resource address could easily become obsolete and events would not be re-playable.
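Put together, a skinny message might be as small as this (hypothetical fields; the event name itself carries the business meaning):

<CustomerUpgradedToGold>
  <CustomerId>1234</CustomerId>
  <OccurredAt>2016-05-12T09:30:00Z</OccurredAt>
</CustomerUpgradedToGold>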
Advantages:
Decouples service contract from message.
Information about the event contained in the event name.
Naturally idempotent (with time-stamp).
Generally tiny.
Simple to generate and consume.
Disadvantages:
Consumer must make additional call to retrieve event context - requires explicit knowledge of other systems.
Event context may have become obsolete at the point where the consumer retrieves it, making this approach generally unsuitable for some real-time applications.
When raising an event, which pattern is the most suited?
I think the answer to this is: it depends on lots of things, and there is probably no one right answer.
Update from comments: also worth reading, a very old, classic blog post on messaging: https://learn.microsoft.com/en-gb/archive/blogs/nickmalik/killing-the-command-message-should-we-use-events-or-documents (also here: http://vanguardea.com/killing-the-command-message-should-we-use-events-or-documents/)
Martin Fowler gave a great talk about "The Many Meanings of Event-Driven Architecture" (the content is based on this paper) in which he mentioned the Event-Carried State Transfer pattern.
It seems to be close to your second option, the "Delta message", with the difference that it doesn't try to describe an entity, but instead describes a named business fact that happened and carries over all the data necessary to understand that fact.
I don't think it matters how you have modeled your persistence layer when it comes to designing domain events. Likewise, I don't think it matters how your consumer has modeled its own persistence layer when designing domain events.
Thus, I don't think it's wise to put as an advantage the fact that you can apply the event as a patch directly on your data (from a consumer point of view), because it pushes the producer to design their events given the persistence model of a consumer.
In that case, I would tend to think that you're designing persistence patches, instead of domain events.
What do you think?

Design of notification events

I am designing some events that will be raised when actions are performed or data changes in a system. These events will likely be consumed by many different services and will be serialized as XML, although more broadly my question also applies to the design of more modern funky things like Webhooks.
I'm specifically thinking about how to describe changes with an event and am having difficulty choosing between different implementations. Let me illustrate my quandary.
Imagine a customer is created, and a simple event is raised.
<CustomerCreated>
  <CustomerId>1234</CustomerId>
  <FullName>Bob</FullName>
  <AccountLevel>Silver</AccountLevel>
</CustomerCreated>
Now let's say Bob spends lots of money and becomes a gold customer, or indeed any other property changes (e.g.: he now prefers to be known as Robert). I could raise an event like this.
<CustomerModified>
  <CustomerId>1234</CustomerId>
  <FullName>Bob</FullName>
  <AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is nice because the schema of the Created and Modified events are the same and any subscriber receives the complete current state of the entity. However it is difficult for any receiver to determine which properties have changed without tracking state themselves.
I then thought about an event like this.
<CustomerModified>
  <CustomerId>1234</CustomerId>
  <AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is more compact and only contains the properties that have changed, but comes with the downside that the receiver must apply the changes and reassemble the current state of the entity if they need it. Also, the schemas of the Created and Modified events must be different now; CustomerId is required but all other properties are optional.
Then I came up with this.
<CustomerModified>
  <CustomerId>1234</CustomerId>
  <Before>
    <FullName>Bob</FullName>
    <AccountLevel>Silver</AccountLevel>
  </Before>
  <After>
    <FullName>Bob</FullName>
    <AccountLevel>Gold</AccountLevel>
  </After>
</CustomerModified>
This covers all bases as it contains the full current state, plus a receiver can figure out what has changed. The Before and After elements have the exact same schema type as the Created event. However, it is incredibly verbose.
I've struggled to find any good examples of events; are there any other patterns I should consider?
You tagged the question as "Event Sourcing", but your question seems to be more about Event-Driven SOA.
I agree with @Matt's answer - "CustomerModified" is not granular enough to capture intent if there are multiple business reasons why a Customer would change.
However, I would back up even further and ask you to consider why you are storing Customer information in a local service when it seems that you (presumably) already have a source of truth for the Customer. The starting point for consuming Customer information should be getting it from the source when it's needed. Storing a copy of information that can be queried reliably from the source may very well be an unnecessary optimization (and complication).
Even if you do need to store Customer data locally (and there are certainly valid reasons for needing to do so), consider passing only the data necessary to construct a query of the source of truth (the service emitting the event):
<SomeInterestingCustomerStateChange>
  <CustomerId>1234</CustomerId>
</SomeInterestingCustomerStateChange>
So these event types can be as granular as necessary, e.g. "CustomerAddressChanged" or simply "CustomerChanged", and it is up to the consumer to query for the information it needs based on the event type.
There is not a "one-size-fits-all" solution - sometimes it does make more sense to pass the relevant data with the event. Again, I agree with @Matt's answer if this is the direction you need to move in.
Edit Based on Comment
I would agree that using an ESB to query is generally not a good idea. Some people use an ESB this way, but IMHO it's a bad practice.
Your original question and your comments to this answer and to Matt's talk about only including fields that have changed. This would definitely be problematic in many languages, where you would have to somehow distinguish between a property being empty/null and a property not being included in the event. If the event is getting serialized/de-serialized from/to a static type, it will be painful (if not impossible) to know the difference between "First Name is being set to NULL" and "First Name is missing because it didn't change".
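For example (a hypothetical event type, sketched in TypeScript): once deserialized, the "cleared" and "unchanged" payloads are easy to conflate:

interface CustomerModified {
  customerId: string;
  firstName?: string | null;
}

const cleared: CustomerModified = { customerId: "1234", firstName: null }; // "First Name set to NULL"
const unchanged: CustomerModified = { customerId: "1234" };                // "First Name didn't change"

// Distinguishable here only by checking property presence:
const wasIncluded = "firstName" in unchanged; // false
// Many static-typed deserializers collapse both cases to the same default value.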
Based on your comment that this is about synchronization of systems, my recommendation would be to send the full set of data on each change (assuming signal+query is not an option). That leaves the interpretation of the data up to each consuming system, and limits the responsibility of the publisher to emitting a more generic event, i.e. "Customer 1234 has been modified to X state". This event seems more broadly useful than the other options, and if other systems receive this event, they can interpret it as they see fit. They can dump/rewrite their own data for Customer 1234, or they can compare it to what they have and update only what changed. Sending only what changed seems more specific to a single consumer or a specific type of consumer.
All that said, I don't think any of your proposed solutions are "right" or "wrong". You know best what will work for your unique situation.
Events should be used to describe intent as well as details. For example, you could have a CustomerRegistered event with all the details for the customer that was registered, then later in the stream a CustomerMadeGoldAccount event that only really needs to capture the customer Id of the customer whose account was changed to gold.
It's up to the consumers of the events to build up the current state of the system that they are interested in.
This allows only the most pertinent information to be stored in each event. Imagine having hundreds of properties for a customer: if every command that changed a single property had to raise an event with all the properties before and after, this would get unwieldy pretty quickly. It's also difficult to determine why the change occurred if you just publish a generic CustomerModified event, which is often a question that is asked about the current state of an entity.
Only capturing data relevant to the event means that the command that issues the event only needs enough data about the entity to validate that the command can be executed; it doesn't even need to read the whole customer entity.
Subscribers of the events also only need to build up a state for things that they are interested in, e.g. perhaps an 'account level' widget is listening to these events, all it needs to keep around is the customer ids and account levels so that it can display what account level the customer is at.
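A minimal sketch of such a widget's projection (reusing the hypothetical event names above): it keeps only what it needs, a customer-id-to-account-level map:

type WidgetEvent =
  | { type: "CustomerRegistered"; customerId: string; accountLevel: string }
  | { type: "CustomerMadeGoldAccount"; customerId: string };

const accountLevels = new Map<string, string>();

function project(event: WidgetEvent): void {
  switch (event.type) {
    case "CustomerRegistered":
      accountLevels.set(event.customerId, event.accountLevel);
      break;
    case "CustomerMadeGoldAccount":
      accountLevels.set(event.customerId, "Gold");
      break;
  }
}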
Instead of trying to convey everything through the payload XML's fields, you can distinguish between different operations based on:
1. Different endpoint URLs depending on the operation (this is preferred).
2. An opcode (operation code) element in the XML which tells which operation is to be used to handle the incoming request (closer to your examples).
There are a few enterprise patterns applicable to your business case - messaging and its variants - and if your system needs to be extensible, then an Enterprise Service Bus could be used. An ESB allows reliable handling and processing of events.
