Event sourcing: splitting event in more detailed - event-sourcing

While user registration process in my domain several actions occur: user created (with email/password or with linked social network account), user login is done.
I have (see) two options how to register the events:
One UserRegistred event (which contains all the info, password hashes, external social accounts)
Multiple events UserCreated, UserPasswordSet, UserExternalAccountLinked, UserLoggedIn
Events from second option (UserPasswordSet, UserExternalAccountLinked, UserLoggedIn) may appear on their own later while performing corresponded operations.
I understand that question and options may be subjective, but I would like hear opinions of experienced ES/DDD users about the issue.

I don't claim to be experienced, but I think it's simpler output multiple events rather than having a complex simple event.
The pros are:
Simplicity - projections (including the aggregate itself) and other event handlers don't need to understand a complex UserRegistered event as well as the fine grained events
Less churn on the event schemas - e.g. if you change details of your authentication events, fewer event types will need to change (since there's no UserRegistered event to change)
Clarity - the events better capture the sequence of state changes involved in user registration
I can think of a minor con:
Non-atomic registration. It's likely projections could handle a single user registered event and atomically create the read model in a state that the client can immediately query. If you have multiple events, the read model might handle them one by one, meaning the user may be temporarily in a half-registered state, that you might not want to handle in your clients.
This can be avoided by having your read projection consume all available events and make its update in a single transaction, so that the sequence of events causes only a single transaction commit, and hence you never see a half-registered user. This is more efficient in any case, but might not be that simple, depending on your read store.
Alternatively, you can automatically filter out half-registered users when querying the service

Related

Topic granularity in event driven architecture

I was wondering what should be the granularity of the topic names in an event-driven service-oriented architecture.
Let's imagine we have a user management system where users can perform different actions like signing up, signing in, modifying some profile attributes, etc. If we wanted to notify the rest of the services of these changes, I can think of some possibilities for the topic naming:
One topic per each of the classic CRUD operations in each of the models (excluding read since the state of the user does not change). We would have user-created, user-updated, user-deleted. This approach is generic enough, but there would be potentially many services subscribed to user-updated topic and discarding all those events that do not modify a specific field.
One topic per business-relevant change. In addition to user-created and user-deleted, we could have events like user-email-updated, user-signed-in (which otherwise would be fired as a user-updated event where the date of the last sign-in was changed), etc. My feeling is that even though it would be handy for those subscribers only interested in a very specific change, it would be harder for those services that need to sync whatever happens to the user, as they would have to be subscribed to an increasing number of topics to keep track of all the changes in the user model.
A mix between 1 and 3, where both events user-updated and user-email-updated would be sent when the user updates the email, but just user-updated would be sent if the user changes the profile.
The way to go is to implement the 2nd option but implement it with a topic hierarchy to allow subscribers to choose the granularity of their interest (as in subscribing to users.* or *.updated or user.actions.login etc. )
Some technologies (e.g. RabbitMQ) has this capability built-in, for others you can implement a topic registry and provide the infrastructure to manage subscriptions yourself

DDD dealing with Eventual consistency for multiple aggregates inside a bounded context with NoSQL

I am currently working on a DDD Geolocation application that has two separate aggregate roots inside one bounded context. Due to frequent coordinate updates I am using redis to persist my data which doesn't allow rollbacks.
My first aggregate root is a trip object containing driver (users), passengers (list of users), etc.
My second aggregate root is user position updates
When a coordinate update is sent I will generate and fire a "UpdateUserPostionEvent". As a side effect I will also generate and fire a "UpdateTripEvent" at a certain point, which will update coordinates of drivers/passengers.
My question is how can I deal with eventual consistency if I am firing my "UpdateLiveTripEvent" asynchronously. My UpdateLiveTripEventHandler has several points of failure and besides logging an error how can I deal with this inconsistency?
I am using a library called MediatR and the INotificationHandler which is as far as I know is "Fire and Forget"
Edit: Ended up finding this SO post that describes exactly what I need (saga/process manager) but unfortunately I am unable to find any kind of Saga implentation for handling events within the same BC. All examples I am seeing involve a sevice bus.
Same or different Bounded Context; with or without Sagas; it does not matter.
Why a event handling fail? Domain rules or Infrastructure.
Domain rules:
A raised event handled by an aggregate (the event handler use the aggregate to apply the event) should NEVER fail by Domain Rules.
If the "target" aggregate has Domain Rules that reject the event your aggregate design is wrong. Commands/Operations can be rejected by Domain rules. Events can not be rejected (nor Undo) by Domain rules.
A event should be raised when all domain rules to this operation was checked by the "origin" aggregate. The "target" aggregate apply the event and maybe raises another event with some values calculated by the "target" aggregate (domain rules, but not for reject the event; events are unrejectable by domain rules; but to "continue" the consistency "chain" with good responsibility segregation). That is the reason why events should have sentences in past as names; because already happened.
Event simulation:
Agg1: Hey buddies! User did this cool thing and everything seems to be OK. --> UserDidThisCoolThingEvent
Agg2: Woha, that is awesome! I'm gonna put +3 in User points. --> UserReceivedSomePointsEvent
Agg3: +3 points to this user? The user just reach 100 points. That is a lot! I'm gonna to convert this User into VIP User. --> UserTurnedIntoVIPEvent
Agg4: A new VIP User? Let's notify it to the rest of the Users to create some envy ;)
Infrastructure:
Fix it and apply the event. ;) Even "by hand" if needed once your persistence engine, network and/or machine is up again.
Automatic retries for short time fails. ErrorQueues/Logs to not loose your events (and apply it later) in a long time outage.
Event sourcing also helps with this because you can always reapply the persisted events in the "target" aggegate without extra effort to keep events somewhere (i.e. event logs) because your domain persistence is also your event store.

CQRS+ES: Client log as event

I'm developing small CQRS+ES framework and develop applications with it. In my system, I should log some action of the client and use it for analytics, statistics and maybe in the future do something in domain with it. For example, client (on web) download some resource(s) and I need save date, time, type (download, partial,...), from region or country (maybe IP), etc. after that in some view client can see count of download or some complex report. I'm not sure how to implement this feather.
First solution creates analytic context and some aggregate, in each client action send some command like IncreaseDownloadCounter(resourced) them handle the command and raise domain event's and updating view, but in this scenario first download occurred and after that, I send command so this is not really command and on other side version conflict increase.
The second solution is raising event, from client side and update the view model base on it, but in this type of handling my event not store in event store because it's not raise by command and never change any domain context. If is store it in event store, no aggregate to handle it after fetch for some other use.
Third solution is raising event, from client side and I store it on other database may be for each type of event have special table, but in this manner of event handle I have multiple event storage with different schema and difficult on recreating view models and trace events for recreating contexts states so in future if I add some domain for use this type of event's it's difficult to use events.
What is the best approach and solution for this scenario?
First solution creates analytic context and some aggregate
Unquestionably the wrong answer; the event has already happened, so it is too late for the domain model to complain.
What you have is a stream of events. Putting them in the same event store that you use for your aggregate event streams is fine. Putting them in a separate store is also fine. So you are going to need some other constraint to make a good choice.
Typically, reads vastly outnumber writes, so one concern might be that these events are going to saturate the domain store. That might push you towards storing these events separately from your data model (prior art: we typically keep the business data in our persistent book of record, but the sequence of http requests received by the server is typically written instead to a log...)
If you are supporting an operational view, push on the requirement that the state be recovered after a restart. You might be able to get by with building your view off of an in memory model of the event counts, and use something more practical for the representations of the events.
Thanks for your complete answer, so I should create something like the ES schema without some field (aggregate name or type, version, etc.) and collect client event in that repository, some offline process read and update read model or create command to do something on domain space.
Something like that, yes. If the view for the client doesn't actually require any validation by your model at all, then building the read model from the externally provided events is fine.
Are you recommending save some claim or authorization token of the user and sender app for validation in another process?
Maybe, maybe not. The token describes the authority of the event; our own event handler is the authority for the command(s) that is/are derived from the events. It's an interesting question that probably requires more context -- I'd suggest you open a new question on that point.

Event-sourcing: Dealing with derived data

How does an event-sourcing system deal with derived data? All the examples I've read on event-sourcing demonstrate services reacting to fact events. A popular example seems to be:
Bank Account System
Events
Funds deposited
Funds withdrawn
Services
Balance Service
They then show how the Balance service can, at any point, derive a state (I.e. balance) from the events. That makes sense; those events are facts. There's no question that they happened - they are external to the system.
However, how do we deal with data calculated BY the system?
E.g.
Overdrawn service:
A services which is responsible for monitoring the balance and performing some action when it goes below zero.
Does the event-sourcing approach dictate how we should use (or not use) derived data? I.e. The balance. Perhaps one of the following?
1) Use: [Funds Withdrawn event] + [Balance service query]
Listen for the "Funds withdrawn" event and then ask the Balance service for the current balance.
2) Use: [Balance changed event]
Get the balance service to throw a "Balance changed" event containing the current balance. Presumably this isn't a "fact" as it's not external to the system, therefore prone to miscalculation.
3) Use: [Funds withdrawn event] + [Funds deposited event]
We could just skip the Balance service and have each service maintain its own balance directly from the facts. ...though that would result in each service having its own (potentially different) version of the balance.
A services which is responsible for monitoring the balance and performing some action when it goes below zero.
Executive summary: the way this is handled in event sourced systems is not actually all that different from the alternatives.
Stepping back a second - the advantage of having a domain model is to ensure that all proposed changes satisfy the business rules. Borrowing from the CQRS language: we send command messages to a command handler. The handler loads the state of the model, and tries to apply the command. If the command is allowed, the changes to the state of the domain model is updated and saved.
After persisting the state of the model, the command handler can query that state to determine if their are outstanding actions to be performed. Udi Dahan describes this in detail in his talk on Reliable messaging.
So the most straight forward way to describe your service is one that updates the model each time the account balance changes, and sets the "account overdrawn" flag if the balance is negative. After the model is saved, we schedule any actions related to that state.
Part of the justification for event sourcing is that the state of the domain model is derivable from the history. Which is to say, when we are trying to determine if the model allows a command, we load the history, and compute from the history the current state, and then use that state to determine whether the command is permitted.
What this means, in practice, is that we can write an AccountOverdrawn event at the same time that we write the AccountDebited event.
That AccountDebited event can be subscribed to - Pub/Sub. The typical handling is that the new events get published after they are successfully written to the book of record. An event listener subscribing to the events coming out of the domain model observes the event, and schedules the command to be run.
Digression: typically, we'll want at-least-once execution of these activities. That means keeping track of acknowledgements.
Therefore, the event handler is also a thing with state. It doesn't have any business state in it, and certainly no rules that would allow it to reject events. What it does track is which events it has seen, and which actions need to be scheduled. The rules for loading this event handler (more commonly called a process manager) are just like those of the domain model - load events from the book of record to obtain the current state, then see if the event being handled changes anything.
So it is really subscribing to two events - the AccountDebited event, and whatever event returns from the activity to acknowledge that it has completed.
This same mechanic can be used to update the domain model in response to events from elsewhere.
Example: suppose we get a FundsWithdrawn event from an ATM, and we need to update the account history to match it. So our event handler gets loaded, updates itself, and schedules a RecordATMWithdrawal command to be run. When the command loads, it loads the account, updates the balances, and writes out the AccountCredited and AccountOverdrawn events as before. The event handler sees these events, loads the correct state process state based on the meta data, and updates the state of the process.
In CQRS terms, this is all taking place in the "write models"; these processes are all about updating the book of record.
The balance query itself is easy - we already showed that the balance can be derived from the history of the domain model, and that's just how your balance service is expected to do it.
To sum up; at any given time you can load the history of the domain model, to query its state, and you can load up the history of the event processor, to determine what work has yet to be acknowledged.
Event sourcing is an evolving discipline with a bunch of diverse practices, practitioners and charismatic people. You can't expect them to provide you with some very consistent modelling technique for all scenarios like you described. Each one of those scenarios has it's pros and cons and you specified some of them. Also it may vary dramatically from one project to another, because business requirements (evolutionary pressures of the market) will be different.
If you are working on some mission-critical system and you want to have very consistent balance all the time - it's better to use RDBMS and ACID transactions.
If you need maximum speed and you are okay with eventually consistent states and not very anxious about precision of your balances (some events may be missing here and there for bunch of reasons) then you can derive your projections for balances from events asynchronously.
In both scenarios you can use event sourcing, but you don't necessarily have to generate your projections asynchronously. It's okay to generate projection in the same transaction scope as you making changes to your write model if you really need to do that.
Will it make Greg Young happy? I have no idea, but who cares about such things if your balances one day may go out of sync in mission-critical system ...

Design of notification events

I am designing some events that will be raised when actions are performed or data changes in a system. These events will likely be consumed by many different services and will be serialized as XML, although more broadly my question also applies to the design of more modern funky things like Webhooks.
I'm specifically thinking about how to describe changes with an event and am having difficulty choosing between different implementations. Let me illustrate my quandry.
Imagine a customer is created, and a simple event is raised.
<CustomerCreated>
<CustomerId>1234</CustomerId>
<FullName>Bob</FullName>
<AccountLevel>Silver</AccountLevel>
</CustomerCreated>
Now let's say Bob spends lots of money and becomes a gold customer, or indeed any other property changes (e.g.: he now prefers to be known as Robert). I could raise an event like this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<FullName>Bob</FullName>
<AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is nice because the schema of the Created and Modified events are the same and any subscriber receives the complete current state of the entity. However it is difficult for any receiver to determine which properties have changed without tracking state themselves.
I then thought about an event like this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<AccountLevel>Gold</AccountLevel>
</CustomerModified>
This is more compact and only contains the properties that have changed, but comes with the downside that the receiver must apply the changes and reassemble the current state of the entity if they need it. Also, the schemas of the Created and Modified events must be different now; CustomerId is required but all other properties are optional.
Then I came up with this.
<CustomerModified>
<CustomerId>1234</CustomerId>
<Before>
<FullName>Bob</FullName>
<AccountLevel>Silver</AccountLevel>
</Before>
<After>
<FullName>Bob</FullName>
<AccountLevel>Gold</AccountLevel>
</After>
</CustomerModified>
This covers all bases as it contains the full current state, plus a receiver can figure out what has changed. The Before and After elements have the exact same schema type as the Created event. However, it is incredibly verbose.
I've struggled to find any good examples of events; are there any other patterns I should consider?
You tagged the question as "Event Sourcing", but your question seems to be more about Event-Driven SOA.
I agree with #Matt's answer--"CustomerModified" is not granular enough to capture intent if there are multiple business reasons why a Customer would change.
However, I would back up even further and ask you to consider why you are storing Customer information in a local service, when it seems that you (presumably) already have a source of truth for customer. The starting point for consuming Customer information should be getting it from the source when it's needed. Storing a copy of information that can be queried reliably from the source may very well be an unnecessary optimization (and complication).
Even if you do need to store Customer data locally (and there are certainly valid reasons for need to do so), consider passing only the data necessary to construct a query of the source of truth (the service emitting the event):
<SomeInterestingCustomerStateChange>
<CustomerId>1234</CustomerId>
</SomeInterestingCustomerStateChange>
So these event types can be as granular as necessary, e.g. "CustomerAddressChanged" or simply "CustomerChanged", and it is up to the consumer to query for the information it needs based on the event type.
There is not a "one-size-fits-all" solution--sometimes it does make more sense to pass the relevant data with the event. Again, I agree with #Matt's answer if this is the direction you need to move in.
Edit Based on Comment
I would agree that using an ESB to query is generally not a good idea. Some people use an ESB this way, but IMHO it's a bad practice.
Your original question and your comments to this answer and to Matt's talk about only including fields that have changed. This would definitely be problematic in many languages, where you would have to somehow distinguish between a property being empty/null and a property not being included in the event. If the event is getting serialized/de-serialized from/to a static type, it will be painful (if not impossible) to know the difference between "First Name is being set to NULL" and "First Name is missing because it didn't change".
Based on your comment that this is about synchronization of systems, my recommendation would be to send the full set of data on each change (assuming signal+query is not an option). That leaves the interpretation of the data up to each consuming system, and limits the responsibility of the publisher to emitting a more generic event, i.e. "Customer 1234 has been modified to X state". This event seems more broadly useful than the other options, and if other systems receive this event, they can interpret it as they see fit. They can dump/rewrite their own data for Customer 1234, or they can compare it to what they have and update only what changed. Sending only what changed seems more specific to a single consumer or a specific type of consumer.
All that said, I don't think any of your proposed solutions are "right" or "wrong". You know best what will work for your unique situation.
Events should be used to describe intent as well as details, for example, you could have a CustomerRegistered event with all the details for the customer that was registered. Then later in the stream a CustomerMadeGoldAccount event that only really needs to capture the customer Id of the customer who's account was changed to gold.
It's up to the consumers of the events to build up the current state of the system that they are interested in.
This allows only the most pertinent information to be stored in each event, imagine having hundreds of properties for a customer, if every command that changed a single property had to raise an event with all the properties before and after, this gets unwieldy pretty quickly. It's also difficult to determine why the change occurred if you just publish a generic CustomerModified event, which is often a question that is asked about the current state of an entity.
Only capturing data relevant to the event means that the command that issues the event only needs to have enough data about the entity to validate the command can be executed, it doesn't need to even read the whole customer entity.
Subscribers of the events also only need to build up a state for things that they are interested in, e.g. perhaps an 'account level' widget is listening to these events, all it needs to keep around is the customer ids and account levels so that it can display what account level the customer is at.
Instead of trying to convey everything through payload xmls' fields, you can distinguish between different operations based on -
1. Different endpoint URLs depending on the operation(this is preferred)
2. Have an opcode(operation code) as an element in the xml file which tells which operation is to used to handle the incoming request.(more nearer to your examples)
There are a few enterprise patterns applicable to your business case - messaging and its variants, and if your system is extensible then Enterprise Service Bus should be used. An ESB allows reliable handling of events and processing.

Resources