Event Sourcing - Aggregate modeling

Two questions:
1) How to model aggregates and the references between them?
2) How to organise/store events so that they can be retrieved efficiently?
Take this typical use case as an example: we have Order and LineItem (together they form one aggregate, with Order as the aggregate root), and a separate Product aggregate.
Since a LineItem needs to know which Product it refers to, there are two options: 1) LineItem holds a direct reference to the Product aggregate (which does not seem like best practice, as it violates the idea of an aggregate being a consistency boundary, because we could then update the Product aggregate directly from the Order aggregate); 2) LineItem holds only the ProductId.
It looks like the 2nd option is the way to go... What do you think?
However, another problem arises, which is about building an Order read/view model. This Order view model needs to know which Products are in the Order (i.e. ProductId, Type, etc.). The typical use case is reporting, and a CommandHandler can also use this Product object to perform logic such as checking whether there are too many of a particular product, etc. Since those data live in two separate aggregates, this takes more than one database roundtrip. As we are using events to build the model, the pseudo code looks like this:
1) for a given order id (a guid, the order aggregate id), we load all the events for it; -- 1st database access
2) we then build the Order aggregate, so we know which ProductIds are referenced in the Order;
3) for that list of ProductIds, we load all the events for them; -- 2nd database access
If we build a really big graph of objects (a lot of different aggregates), this may end up costing several more database accesses (each of which is slow)... What's your idea here?
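In code, that naive read path looks roughly like this (just a sketch -- the event-store API, the event types, and the replay functions are illustrative, not any particular library):

interface StoredEvent { streamId: string; type: string; data: unknown; }

interface EventStore {
  loadEvents(streamId: string): Promise<StoredEvent[]>; // one round trip per call
}

class Order {
  productIds: string[] = [];
  static replay(events: StoredEvent[]): Order {
    const order = new Order();
    for (const e of events) {
      if (e.type === "LineItemAdded") {
        order.productIds.push((e.data as { productId: string }).productId);
      }
    }
    return order;
  }
}

class Product {
  static replay(_events: StoredEvent[]): Product {
    return new Product(); // fold the product's events into its state here
  }
}

async function loadOrderWithProducts(store: EventStore, orderId: string) {
  const orderEvents = await store.loadEvents(orderId);   // 1st database access
  const order = Order.replay(orderEvents);               // now we know the ProductIds
  const products = await Promise.all(                    // 2nd database access
    order.productIds.map(async (id) => Product.replay(await store.loadEvents(id))));
  return { order, products };
}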
Thanks

Take this typical use case as an example: we have Order and LineItem (together they form one aggregate, with Order as the aggregate root), and a separate Product aggregate.
The Order aggregate makes sense the way you have described it. The "Product aggregate" is more suspicious; do you ask the model whether the product is allowed to change, or do you tell the model that the product has changed?
If Product can change without first consulting the Order, then the LineItem must not include the Product. A reference to the product (i.e. the ProductId) is OK.
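In code, that reference-by-id style might look like the sketch below (names and fields are illustrative):

class LineItem {
  constructor(
    public readonly productId: string, // reference by id only -- no Product object here
    public readonly quantity: number,
    public readonly unitPrice: number, // a copy of the data the Order genuinely owns
  ) {}
}

class Order {
  private readonly lineItems: LineItem[] = [];

  addLineItem(productId: string, quantity: number, unitPrice: number): void {
    // The Order enforces its own invariants without ever touching the Product aggregate.
    if (quantity <= 0) throw new Error("quantity must be positive");
    this.lineItems.push(new LineItem(productId, quantity, unitPrice));
  }
}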
If we build a really big graph of objects (a lot of different aggregates), this may end up costing several more database accesses (each of which is slow)... What's your idea here?
For reads, reports, and the like -- where you aren't going to be adding new events to the history -- one possible answer is to do the slow work in advance. An asynchronous process listens for writes in the event store, and then publishes those events to a bus. Subscribers build new versions of the reports when new events are observed, and cache the results. (search keyword: cqrs)
When a client asks for a report, you give them one out of the cache. All the work is done, so it's very quick.
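A minimal sketch of such a projection (the event shapes and the in-memory cache are assumptions; a real system would consume from a message bus and persist the documents):

interface OrderPlaced { type: "OrderPlaced"; orderId: string; productIds: string[]; }
interface ProductRenamed { type: "ProductRenamed"; productId: string; name: string; }
type DomainEvent = OrderPlaced | ProductRenamed;

// The read-model cache: a denormalized document per order, ready to serve as-is.
const orderReports = new Map<string, { orderId: string; productNames: string[] }>();
const productNames = new Map<string, string>();

function onEvent(event: DomainEvent): void {
  switch (event.type) {
    case "ProductRenamed":
      productNames.set(event.productId, event.name);
      break;
    case "OrderPlaced":
      orderReports.set(event.orderId, {
        orderId: event.orderId,
        productNames: event.productIds.map((id) => productNames.get(id) ?? id),
      });
      break;
  }
}

// Serving a report is now a single lookup -- no event replay, no joins.
function getOrderReport(orderId: string) {
  return orderReports.get(orderId);
}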
For command handlers, the answer is more complicated. Business rules are supposed to be in the domain model, so having the command handler try to validate the command (as opposed to the domain model) is a bit broken.
The command handler can load the products to see what the state might look like, and pass that information to the aggregate along with the command data, but it's not clear that's a good idea -- if the client is going to send a command to be run, and you need to flesh out the Order command with Product data, why not have the client add the product data to the command directly and skip the middle man?
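To make the two variants concrete, here is a sketch (all the types and repository interfaces are assumptions):

interface ProductInfo { id: string; price: number; }
interface ProductCatalog { lookup(productId: string): Promise<ProductInfo>; }
interface OrderAggregate { addLineItem(productId: string, quantity: number, price: number): void; }
interface OrderRepository { load(orderId: string): Promise<OrderAggregate>; }

// Variant 1: the handler fleshes out the command with Product data.
interface AddItemCommand { orderId: string; productId: string; quantity: number; }

async function handleAddItem(cmd: AddItemCommand, products: ProductCatalog, orders: OrderRepository) {
  const product = await products.lookup(cmd.productId); // extra lookup inside the handler
  const order = await orders.load(cmd.orderId);
  order.addLineItem(cmd.productId, cmd.quantity, product.price);
}

// Variant 2: the client already looked the product up (e.g. to render the screen),
// so the command carries the data and the handler skips the lookup entirely.
interface EnrichedAddItemCommand { orderId: string; productId: string; quantity: number; price: number; }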
a CommandHandler can also use this Product object to perform logic such as checking whether there are too many of a particular product, etc.
This example is a bit vague, but taking a guess: you are thinking about cases where you prevent an order from being placed if the available inventory is insufficient to fulfill the order.
For real world inventory - a physical book in a warehouse - that's probably the wrong approach to take. First, the model itself is wrong; if you want to know how much product is in the warehouse, you should be querying the warehouse, not the product. Second, a physical warehouse is not constrained by your model -- calling the addProduct method on a warehouse aggregate doesn't cause the product to magically appear there.
Third, it probably doesn't match very well with what your domain experts want anyway. If the model says that the warehouse doesn't have enough product, do you think the stakeholders want the system to:
1) tell the shopper to buy the product somewhere else, or
2) accept the order, and contact the supplier for a new delivery?
Hint: when in doubt, carefully review how amazon.com does it.

Related

Product price changed while creating order

What is the DDD way of handling the following scenario:
user enters the Order Create screen and starts creating a new Order with OrderItems
user chooses ProductX from products catalog and adds quantity
OrderItem for ProductX is created on the Order and the user goes on to add another product
in the meantime, before Order is saved, admin changes price for ProductX
Assuming Product and Order/OrderItem are separate aggregates, potentially even in separate bounded contexts, how is this handled?
I can think of several options:
optimistic concurrency combined with db transactions, but then if we broaden the question to microservices where each microservice has its own db - what then?
joining everything into one giant AR but that doesn’t seem right.
introduce a business rule that no product prices are updated during point-of-sale working hours, but that is often not possible (e.g. time-triggered discounts)
What is the proper DDD/microservices way of solving this?
What is the proper DDD/microservices way of solving this?
The general answer is that you make time an explicit part of your pricing model. Price changes made to the product catalog have an effective date, which means that you can, by modeling time in the order, have complete agreement on what price the shopper saw at the time of the order.
This might introduce the concept of a QuotedPrice as something separate from the Catalog price, where the quote is a promise to hold a price for some amount of time.
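A sketch of what such a QuotedPrice might look like (the names and fields are illustrative):

// A quote pins the price the shopper saw and promises to honour it until it expires.
interface QuotedPrice {
  productId: string;
  amount: number;
  currency: string;
  quotedAt: Date;
  validUntil: Date; // the promise: this price holds for some amount of time
}

function isQuoteStillValid(quote: QuotedPrice, now: Date = new Date()): boolean {
  return now <= quote.validUntil;
}

// The order records the quote, not a live reference to the catalog, so a later
// catalog price change (with its own effective date) cannot retroactively
// disagree with what the shopper was promised.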
To address this sort of problem in general, here are three important papers to review:
Memories, Guesses, and Apologies -- Pat Helland, 2007
Data on the Outside vs Data on the Inside -- Pat Helland, 2005
Race Conditions Don't Exist -- Udi Dahan, 2010
I think one way to solve this is through events. As you said, Product and Order are at the very least separate aggregates, and I would keep them loosely coupled. Putting them into one single aggregate root would go against the Open/Closed and Single Responsibility Principles.
If a Product changes it can raise a ProductChanged event, and likewise for an Order.
Depending on whether these domain objects are within the same service or in different services, you can create a Domain Event or an Integration Event. Read more about it here.
From the above link:
A domain event is, something that happened in the domain that you want other parts of the same domain (in-process) to be aware of. The notified parts usually react somehow to the events.
I think this fits perfectly to your scenario.
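For illustration, a minimal in-process sketch of that flow (the event shape and handler wiring are assumptions):

interface ProductChanged {
  type: "ProductChanged";
  productId: string;
  newPrice: number;
  occurredAt: Date;
}

type Handler = (event: ProductChanged) => void;
const handlers: Handler[] = [];

function subscribe(handler: Handler): void {
  handlers.push(handler);
}

function publish(event: ProductChanged): void {
  // In-process dispatch: a domain event. Across service boundaries you would
  // put the event on a broker instead, making it an integration event.
  for (const handler of handlers) handler(event);
}

// Another part of the same domain reacts, e.g. re-pricing open orders.
subscribe((e) => console.log(`re-pricing open orders for product ${e.productId}`));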

Elasticsearch read model sync with write model

My application, following the CQRS strategy, separates the read model from the write model. I have a Product and multiple PurchaseOrders related to that Product.
The PurchaseOrder read model lives in Elasticsearch, with the product name attached. Now if I change the product name in the write model, I need to update the productName field of all the PurchaseOrders accordingly in the read model (using Elasticsearch's bulk update API).
My question is: as I have millions of PurchaseOrders, will this productName sync be a performance issue? Or are there any suggestions for modelling such syncing?
Although I do not believe that changing a product name on existing orders is a good idea (the invoice might have been generated and the product name in the order should match the one in the invoice), the question still has merit.
You may want your PurchaseOrder to only keep the ID (and perhaps the version?) of the Product, so that you can avoid such a mass update. This approach, on the other hand, requires a call to the Product aggregate root every time you want to translate the ID of the product into its name. The impact of such a read can obviously be mitigated by using a cache.
I guess it really depends on how often each of those two situations occurs, and I would then optimize for the more frequent one.
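For what it's worth, if you do keep the denormalized productName, the mass update need not be millions of individual requests: Elasticsearch's update-by-query API rewrites every matching document server-side in a single call. A sketch with the official JavaScript client (index and field names are assumptions):

import { Client } from "@elastic/elasticsearch";

const client = new Client({ node: "http://localhost:9200" });

// Rename a product on every purchase order that references it, in one request.
async function renameProductOnOrders(productId: string, newName: string) {
  await client.updateByQuery({
    index: "purchase-orders",
    conflicts: "proceed", // tolerate concurrent writes; re-run for any leftovers
    query: { term: { productId } },
    script: {
      lang: "painless",
      source: "ctx._source.productName = params.newName",
      params: { newName },
    },
  });
}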

How to implement constraints that are external to a microservice?

Suppose we have two microservices, Customers and Orders, with no dependencies between them, i.e. they don't call each other, they don't know each other. Every order, though, has a reference to a customer by means of a customer id. In other words one customer may have zero or more orders, and one order belongs to exactly one customer.
For the sake of the example, it's totally fine to delete a customer unless there are orders belonging to that customer. What is the most appropriate way to implement a constraint on Customers that prevents a customer from being deleted if one or more orders have a reference to that customer? Analogous to referential integrity in a relational database.
These are the approaches I can think of:
Let Customers ask Orders if a given customer has any orders, e.g. via API call.
Let Customers keep track of which orders are assigned to every customer, e.g. by having each customer record maintain a list of order ids.
Merge Customers and Orders into a single microservice and manage it internally.
I'm not sure which approach is the best for the given example in a microservices context. I can see pros and cons in all three approaches but I won't list them here because I'd like to hear other people's thoughts on the problem, including approaches not listed above. Thanks.
Probably the second approach would help if you're going to decouple through events, either by tracking a list of ids or by keeping a counter of how many orders are stored for a given Customer.
The Orders microservice emits an event on every creation/deletion, which is captured by the Customers service (or any other interested microservice), which in turn updates the list of order ids (or increments/decrements the counter).
If a customer's order counter is 0, then you may delete the customer.
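A sketch of that counter-based bookkeeping on the Customers side (the event shapes and in-memory map are illustrative; a real service would persist the counter):

type OrderEvent =
  | { type: "OrderCreated"; customerId: string }
  | { type: "OrderDeleted"; customerId: string };

const openOrderCounts = new Map<string, number>();

// Subscriber in the Customers service, fed by events from the Orders service.
function onOrderEvent(event: OrderEvent): void {
  const current = openOrderCounts.get(event.customerId) ?? 0;
  openOrderCounts.set(
    event.customerId,
    event.type === "OrderCreated" ? current + 1 : Math.max(0, current - 1),
  );
}

function canDeleteCustomer(customerId: string): boolean {
  return (openOrderCounts.get(customerId) ?? 0) === 0;
}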
Let's start with your third approach: This will not work in a Microservice world, because you will always have those constraints between some Services. And if you want to solve all of them this way, you'll end up with a Monolith - and that's the end of your Microservice story.
The first and second approaches both have the same "problem": these are async operations that may return false positive (or false negative) results, because it's possible for a delete-customer request and a create-order (or delete-order) request to arrive at the same time.
So this can happen:
For your first approach: the Customer Service asks the Order Service if there are any Orders for this Customer. The Order Service returns 0. But at the same time, in another thread, the Order Service creates a new Order for that Customer. So you end up with a deleted Customer and a newly created Order.
The same applies to your second approach: the messaging between those services is async, so it's possible that the Customer Service knows of 0 Orders and permits the delete, while at the same time the Order Service creates a new Order for this Customer - and the OrderCreated message hits the Customer Service after the Customer has already been deleted.
If you try to do it the other way around, you'll end up in the same situation: your Order Service could listen to CustomerDeleted messages and then disallow creating new Orders for this Customer. But again: this message can arrive while there are still Orders in the database for this Customer.
Of course this is very unlikely to happen, but it still is possible and you cannot prevent it in an async Microservice world without transactions (which of course you want to avoid).
You should rather ask yourself: how should the system handle Orders whose corresponding Customer has been deleted?
The answer to this question is most likely dependent on your business rules. For example: If the Order Service receives a CustomerDeleted message, it may be okay to simply delete all Orders from this Customer. Or maybe the behavior depends on the Order's state property: It's okay to delete all Orders with state = draft, but every other Order from this Customer should still be processed and shipped as usual.
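For illustration, that state-dependent rule might look like this (a sketch; the Order shape and states are assumptions):

interface StoredOrder { id: string; customerId: string; state: "draft" | "placed" | "shipped"; }

// In the Order Service, reacting to a CustomerDeleted integration event.
function onCustomerDeleted(customerId: string, orders: StoredOrder[]): StoredOrder[] {
  return orders.filter((order) => {
    if (order.customerId !== customerId) return true;
    // Business rule: drop drafts, but keep in-flight orders so they can
    // still be processed and shipped as usual.
    return order.state !== "draft";
  });
}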

DM and hierarchies - dimensions for future use

My very first DM, so be gentle...
I am modeling a hierarchy; Responses are my facts. All the advice I've seen indicates creating a single dimension (say dim_event) and denormalizing event, department and organization into that dimension.
What if I KNOW that there will be future facts/reports that rely on an Organization dimension, or a Department dimension that do not involve this particular fact?
It makes more sense to me (from the OLTP world) to create individual dimensions for the major components and attach them to the fact. That way they could be reused as conformed dimensions.
That way, for any update to dimension attributes there would be one dim table; if everything were denormalized I could end up with the org name in several dimension tables.
--Update--
As requested:
An "event" is an email campaign designed to gather response data from a specific subset of clients. They log in and we ask them a series of questions and score the answers.
The "response" is the set of scores we generate from the event.
So an "event" record may look like this:
name: '2019 test Event'
department: 'finance'
"response" records look something like this:
event: '2019 test Event'
retScore: 2190
balScore: 19.98
If your organization and department are tightly coupled (i.e. department implies organization as well), they should be denormalized and created as a single dimension. If department & organization do not have a hierarchical relationship, they would be separate dimensions.
Your Event would likely be a dim (degenerate) and a fact. The fact would point to the various dimensions that describe the Event and would contain the measures about what happened at the Event (retScore, balScore).
A good way to identify if you're dealing with a dim or a fact is to ask "What do I know before any thing happens?" I expect you'd know which orgs & depts are available. You may even know certain types of recurring events (blood drive, annual fundraiser), which could also be a separate dimension (event type). But you wouldn't have any details about a specific event, HR Fundraiser 2019 (fact), until one is scheduled.
A dimension represents the possibilities, but a fact record indicates something actually happens. My favorite analogy for this is a restaurant menu vs a restaurant order. The items on the menu can be referenced even if they've never been ordered. The menu is the dimension, the order is the fact.
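To make the dim/fact split concrete, here are the row shapes implied above, written as TypeScript types purely for illustration (column names are assumptions; in practice these would be warehouse tables):

// If department implies organization, they collapse into one conformed dimension.
interface DimDepartment { deptKey: number; deptName: string; orgName: string; }

// Recurring kinds of events are also a dimension: the possibilities.
interface DimEventType { eventTypeKey: number; typeName: string; } // e.g. "annual fundraiser"

// The fact: one row per response, i.e. something that actually happened.
interface FactResponse {
  deptKey: number;      // points at DimDepartment
  eventTypeKey: number; // points at DimEventType
  eventName: string;    // degenerate dimension, e.g. "2019 test Event"
  retScore: number;     // measures
  balScore: number;
}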
Hope this helps.

Very slow search of a simple entity relationship

We use CRM 4.0 at our institution and have no plans to upgrade presently, as we've spent the last year and a half customising and extending the CRM to work with our processes.
A tiny part of the model is a simple hierarchy: we have a group of learning rooms, each of which has a one-to-many relationship with another entity that describes the courses available for that room.
Another entity has a list of all potential and enrolled students who have expressed an interest in a given course.
That bit's all straightforward, works pretty well, and is modelled with 3 custom entities.
Now, we've got an Admin application that reads the rooms and then wants to show the courses for each room, but only where there are enrolled students.
In SQL this is simplified to:
SELECT DISTINCT r.CourseName, r.OtherInformation
FROM Rooms r
INNER JOIN Students S
ON S.CourseId = r.CourseId
WHERE r.RoomId = #RoomId
And this indeed is very close to the eventual SQL that CRM generates.
We use a Crm QueryEntity, a Filter and a LinkEntity to represent this same structure.
The problem now is that CRM normalizes a custom entity into a Base table, which holds the standard CRM entity data that all entities share, and an ExtensionBase table, which holds our customisations. To give flattened access to these, it creates a view that merges both tables.
This view is what the generated SQL uses.
Now the base tables have indices but the view doesn't.
The problem we have is that all we want to do is return Courses where the inner join is satisfied; it's enough to prove there are entries, and since CRM makes it SELECT DISTINCT we only get one row back per Room.
At first this worked perfectly well, but now that we have thousands of records, the query takes well over 30 seconds and of course times out in anything but SSMS.
I'm given to believe that we can create and alter indices on tables in CRM and that's not considered to be an unsupported modification; but what about views?
I know that if we alter an entity then its views are recreated, which would of course mean redoing our indices when that happens.
Is there any way to hint to CRM 4.0 that we want a specific index in place?
Another source recommends that where you get problems like this, it's best to bring the data closer together, but this isn't something I'd feel comfortable trying to engineer into our solution.
I had considered putting in a new entity that only has RoomId, CourseId and an enrolment count, but that smacks of being incredibly hacky too; after all, an index would remove the need to duplicate this data and to maintain some kind of trigger that updates it after every student operation.
Lastly, whilst I know we're stuck on CRM 4 at the moment, is this the kind of thing we could expect to see resolved in CRM 2011? It would certainly add more weight to the argument for upgrading this 5-year-old product.
Since views are "dynamic" (conceptually, their contents are generated on-the-fly from the base tables every time they are used), they typically can't be indexed. However, SQL Server does support something called an "indexed view". You need to create a unique clustered index on the view, and the query optimizer should then be able to use it to speed up your join.
Someone asked a similar question here and I see no conclusive answer. The cited concerns from Microsoft are Referential Integrity (a non-issue here) and Upgrade complications. You mention the unsupported option of adding the view and managing it over upgrades and entity changes. That is an option, as unsupported and hackish as it is, it should work.
FetchXml does have aggregation, but the query execution plan still uses the views. Here is the SQL generated from a simple select count from incident:

select
top 5000 COUNT(*) as "rowcount"
, MAX("__AggLimitExceededFlag__") as "__AggregateLimitExceeded__"
from (select top 50001 case when ROW_NUMBER() over(order by (SELECT 1)) > 50000 then 1 else 0 end as "__AggLimitExceededFlag__" from Incident as "incident0" ...
I don't see a supported solution for your problem.
If you are building an external admin app and you are hosting CRM 4 on-premise, you could go directly to the database for your query, bypassing the CRM API. Not supported, but it would allow you to solve the problem.
I'm going to add this as a potential answer, although I don't believe it's a sustainable or indeed valid long-term solution.
After analysing the indexes that CRM had defined automatically, I realised that selecting more columns in my query would be enough to satisfy the column requirements of one of those indexes, and now the query runs in less than a second.
