Fat Domain Models => Inefficient? - performance

Looking at DDD, we abstract the database into the various models which we operate on and look at it as a repository where our models live. Then we add the Data Layers and the Service/Business layers on top of it. My question is, in doing so, are we creating inefficiencies in data transfer by building fat models?
For example, say we have system that displays an invoice for a customer on the screen.
Thinking of it in terms of OOP, we'd probably end up with an object that looks somewhat like this:
class Invoice {
Customer _customer;
OrderItems _orderitems;
ShippingInfo _shippingInfo;
}
class Customer {
string name;
int customerID;
Address customerAddress;
AccountingInfo accountingInfo;
ShoppingHistory customerHistory;
}
(for the sake of the question/argument,
let's say it was determined that the customer class had to
implement AccountingInfo and ShoppingHistory)
If the invoice solely needs to print the customer name, why would we carry all the other baggage with it? Using the repository type of approach seems like we would be building these complex domain objects which require all these resources (CPU, memory, complex query joins, etc) AND then transferring it over the tubes to the client.
Simply adding a customerName property to the invoice class would be breaking away from abstractions and seems like a horrible practice. On the third hand, half filling an object like the Customer seems like a very bad idea as you could end up creating multiple versions of the same object (e.g. one that has a an Address, but no ShoppingHistory, and one that has AccountingInfo but no Address, etc, etc). What am I missing, or not understanding?

As good object relational mappers can lazy load the relationships, you would therefore pull back the customer for your invoice, but ignore their accounting and shopping history. You could roll your own if you're not using an oject-relational mapper.
Often you can't do this within your client as you'll have crossed your trasaction boundary (ended your database trasaction) and so it is up to your service layer to ensure the right data has been loaded.
Testing the right data is available (and not too much of it) is often good to do in unit tests on a service layer.

You say "it was determined that the customer class had to implement AccountingInfo and ShoppingHistory", so clearly displaying an invoice is NOT the only task that the system performs (how else was it "determined" that customers need those other functionalities otherwise?-).
So you need a table of customers anyway (for those other functionalities) -- and of course your invoice printer needs to get customer data (even just the name) from that one table, the same one that's used by other functionality in the system.
So the "overhead" is purely illusory -- it appears to exist when you look at one functionality in isolation, but doesn't exist at all when you look at the whole system as an integrated whole.

Related

Entity/Domain purety dilemma in the clean architecutre/Domain driven design

Im working on a eCommerce system in which I try to implement the clean architecture.
But currently Im stuck a little bit.
So I have a use case called: CreateItemUseCase in which I create a Item (alias product) for the shop.
In this use case I call a method (createItemEntity()) of a Entity called ItemEntity.
This method creates just a data object with data like:
userId
itemTitle
itemDescription
...
So now I need another method in the ItemEntity which validates the userId.
To create a Item the user needs to have a userId so the method in the ItemEntity would be called:
validateUserId()
This method should check if the user has a userId in the database and if not the Item creation would be imposible.
Now my question:
How do I validate the userId?
Should I have the validateUserId() method take a array as a parameter, In which all the User Id´s are saved... something like this:
validateUserId(toBeValidated: Int, allUserIds: Array[Int])
{
// loop through the allUserIds to see if toBeValidated is in there ...
}
Or should I query the data in the method (which Im pretty sure, would violate the dependencie rule) like this:
validateUserId(toBeValidated: Int)
{
// get all user id´s through a query, and check if toBeValidated is in there ...
}
Or should I do it completly different?
In general, entities should only contain logic that is operating on information (data) that is within the entity's scope. Knowing how to query if a user with a certain user id exists or not is not in the scope of the item entity.
I think your motivation to keep all the logic for validation together is reasonable but on the other hand you should not introduce infrastructure dependencies (like talking to the database or user repository) to the entity. Knowing how to query if a user with a certain user id exists or not is not in the scope of the item entity.
Or should I query the data in the method (which Im pretty sure, would violate the dependencie rule) like this
Exactly, that's why it's usually best trying to avoid that to keep entities free from such dependencies. Introducing such dependencies can easily get out of hand and also increase complexity for testing such entities. If you need to do that it should be a very thought decision that justifies that.
Should I have the validateUserId() method take a array as a parameter, In which all the User Id´s are saved... something like this
This is not such a bad idea in general, because you would not make the entity dependent on infrastructure and provide the entity with all the data it needs for decision making. But on the other hand now you can run into another problem: bad performance.
Now you would retrieve all user ids everytime you create an item. If you would do the check for the user's existence somewhere else this can be optimized much better.
I suggest to ask the user repository beforehand if the user exists prior to performance the entity creation including all the other potentially required validations inside item entity that make sense there. The user repository could have a query that optimizes for just checking for the existence of this user by id.
In case these two operations (asking for the user's existence and creating the new item) only happen at one place of the application I'd be pragmatic and perform the user existence check directly in the use case. If this would occur from different places in your application you can extract that logic into a separate (domain) service (e.g. item service) which deals with the repetitive flow operations working with the user repository and item entity.
What you are dealing here with is a trade-off decision between domain model purity, domain model completeness and performance considerations. In this great blog this is named the Domain-Driven Design Trilemma. I suggest going through the reasoning in the article, I'm pretty sure it will help you coming to a final decision.
I think this is one of side case of what we call Business Gerunds
Details: https://www.forbes.com/sites/forbestechcouncil/2022/05/19/10-best-practices-for-event-streaming-success/
If Item has to validate the user, just see what common attributes are there between entities and who is responsible for change of those, and then a segregation can be done in DDD representation, and using a composite via transaltion, outside world entities can exist as the same

Best practice with coding system values

I think this should be an easy one, but haven't found any clear answer, on what would the best practice be.
In an application, we keep current status of an order (open, canceled, shipped, closed ...).
This variables cannot change without code change, but application should meet the following criteria:
status names should be easily displayed in different languages,
application can search via freetext status names (like googling for "open")
status_id should be available to developer via enum
zero headache when adding new statuses
Possible ways we have tackled this so far:
having DB table status with PK(id, language_id) and a separate enum which represents this statuses in an application.
PROS: 1.,2.,3. work out of the box, CONS: 4. needs to run update script on every client installation, SQL selects can become large and cumbersome, when dealing with a lot of code tables
having just enum:
PROS: 3.,4. CONS: 1.,2. is a total nightmare
having enums, which populate database tables on each start of an application:
PROS: 1.,2.,3.,4. work CONS: some overhead on application start, SQL select can become large and cumbersome, when dealing a lot code tables.
What is the most common way of tackling this problem?
Sounds like you summarized it pretty good yourself, and comparing the pros/cons points towards #3. Just one comment when you implement #3 though:
Use a caching mechanism (even a simple HashMap!) plus adding the option to refresh the cache - will ease your work when you'll want to change values (without the need to restart every time!).
I would, and do, use method 3 because it is the best of the lot. You can use resource files to store the translations in and map the enum values to keys in the resource files. Your database can contain the id of the enum for the status.
1.status names should be easily displayed in different languages,
2.application can search via freetext status names (like googling for "open")
These are interfaces layer's concern, you'd better not mix them in you domain model.
I would setup a mapping between status enum and i18n codes. the mapping could be stored in a file (cached in memory) or hardcoded.
for example: if you use dto or view adatper to render your ui.
public class OrderDetailViewAdapter {
private Order order;
public String getStatus() {
return i18nMapper.to(order.getStatus());//use hardcoded switch case or file impl
}
}
Or you could done this before you populating you dtos.
You could use a similar solution for goal2. When user types text, find corresponding enum from mapping and use enum for search.
Anyway, use db tables the less the better.
Personally, I always use dedicated enum class inside domain. Only responsibility of this class is holding status name (OPEN, CANCELED, SHIPPED, ...). Status name is not visible outside codebase. Also, status could be also stored inside database field as string (varchar or similar).
For the purpose of rendering, depending of number of use cases, sometimes I implement formatting inside formatter (e.g. OrderFormatter::formatStatusName(), OrderFormatter::formatAbbreviatedStatusName(), ...). If formatting is needed often I create dedicated class with all formatting styles needed (OrderStatusFormatter::short(), OrderStatusFormatter::abbriviated()...). Of course, internal mapping is needed to map status name to status title, and this is tricky part. But if you want layering you can't avoid mapping.
Translation is not dealt so far. I translate strings inside templates so formatters are clean of that responsibility. To summarize:
enum inside domain model
formatter inside presentation layer
translation inside template
There is no need to create special table for order status translations. Better choice would be to implement generic translation mechanism, seperated from your business code.

Validation on domain entities along with MVP

How do you apply validation in an MVP/domain environment ?
Let me clearify with an example:
Domain entity:
class Customer
{
string Name;
etc.
}
MVP-model
class CustomerModel
{
string Name;
etc.
}
I want to apply validation on my domain entities but the MVP model has it's own model/class
apart from the domain entity, does that mean I have to copy the validation code
to also work on the MVP-model?
One solution I came up with is to drop the MVP-model and use the domain entity as MVP-Model,
but I don't want to set data to the entities that isn't validated yet.
And second problem that rises is that if the entity has notify-events,
other parts of the application will be affected with faulty data.
A third thing with that approach is if the user edits some data and then cancels the edit, how do I revert to the old values ? (The entity might not come from a DB so reloading the entity is't possible in all cases).
Another solution is to make some sort of copy/clone of the entity in question and use the copy as MVP-model, but then it might get troublesome if the entity has a large object graph.
Anyone has some tips about these problems?
Constraining something like the name of a person probably does not rightfully belong in the domain model, unless in the client's company there is actually a rule that they don't do business with customers whose names exceed 96 characters.
String length and the like are not concerns of the domain -- two different applications employing the same model could have different requirements, depending on the UI, persistence constraints, and use cases.
On the one hand, you want to be sure that your model of a person is complete and accurate, but consider the "real world" person you are modeling. There are no rules about length and no logical corollary to "oops, there was a problem trying to give this person a name." A person just has a name, so I'd argue that it is the responsibility of the presenter to validate what the user enters before populating the domain model, because the format of the data is a concern of the application moreso than the domain.
Furthermore, as Udi Dahan explains in his article, Employing the Domain Model Pattern, we use the domain model pattern to encapsulate rules that are subject to change. That a person should not a have a null name is not a requirement that is likely ever to change.
I might consider using Debug.Assert() in the domain entity just for an added layer of protection through integration and/or manual testing, if I was really concerned about a null name sneaking in, but something like length, again, doesn't belong there.
Don't use your domain entities directly -- keep that presentation layer; you're going to need it. You laid out three very real problems with using entities directly (I think Udi Dahan's article touches on this as well).
Your domain model should not acquiesce to the needs of the application, and soon enough your UI is going to need an event or collection filter that you're just going to have to stick into that entity. Let the presentation layer serve as the adapter instead and each layer will be able to maintain its integrity.
Let me be clear that the domain model does not have to be devoid of validation, but the validation that it contains should be domain-specific. For example, when attempting to give someone a pay raise, there may be a requirement that no raise can be awarded within 6 months of the last one so you'd need to validate the effective date of the raise. This is a business rule, is subject to change, and absolutely belongs in the domain model.

What exactly is the model in MVC

I'm slightly confused about what exactly the Model is limited to. I understand that it works with data from a database and such. Can it be used for anything else though? Take for example an authentication system that sends out an activation email to a user when they register. Where would be the most suitable place to put the code for the email? Would a model be appropriate... or is it better put in a view, controller, etc?
Think of it like this. You're designing your application, and you know according to the roadmap that version 1 will have nothing but a text based command line interface. version 2 will have a web based interface, and version 3 will use some kind of gui api, such as the windows api, or cocoa, or some kind of cross platform toolkit. It doesn't matter.
The program will probably have to go across to different platforms too, so they will have different email subsystems they will need to work with.
The model is the portion of the program that does not change across these different versions. It forms the logical core that does the actual work of whatever special thing that the program does.
You can think of the controller as a message translator. it has interfaces on two sides, one faces towards the model, and one faces towards the view. When you make your different versions, the main activity will be rewriting the view, and altering one side of the controller to interface with the view.
You can put other platform/version specific things into the controller as well.
In essense, the job of the controller is to help you decouple the domain logic that's in the model, from whatever platform specific junk you dump into the view, or in other modules.
So to figure out whether something goes in the model or not, ask yourself the question "If I had to rewrite this application to work on platform X, would I have to rewrite that part?" If the answer is yes, keep it out of the model. If the answer is no, it may go into the model, if it's part of the essential logic of the program.
This answer might not be orthodox, but it's the only way I've ever found to think of the MVC paradigm that doesn't make my brain melt out of my ear from the meaningless theoretical mumbo jumbo that discussions about MVC are so full of.
Great question. I've asked this same question many times in my early MVC days. It's a difficult question to answer succintly, but I'll do my best.
The model does generally represent the "data" of your application. This does not limit you to a database however. Your data could be an XML file, a web resource, or many other things. The model is what encapsulates and provides access to this data. In an OOP language, this is typically represented as an object, or a collection of objects.
I'll use the following simple example throughout this answer, I will refer to this type of object as an Entity:
<?php
class Person
{
protected $_id;
protected $_firstName;
protected $_lastName;
protected $_phoneNumber;
}
In the simplest of applications, say a phone book application, this Entity would represent a Person in the phone book. Your View/Controller (VC) code would use this Entity, and collections of these Entities to represent entries in your phone book. You may be wondering, "OK. So, how do I go about creating/populating these Entities?". A common MVC newbie mistake is to simply start writing data access logic directly in their controller methods to create, read, update, and delete (CRUD) these. This is rarely a good idea. The CRUD responsibilities for these Entities should reside in your Model. I must stress though: the Model is not just a representation of your data. All of the CRUD duties are part of your Model layer.
Data Access Logic
Two of the simpler patterns used to handle the CRUD are Table Data Gateway and Row Data Gateway. One common practice, which is generally "not a good idea", is to simply have your Entity objects extend your TDG or RDG directly. In simple cases, this works fine, but it bloats your Entities with unnecessary code that has nothing to do with the business logic of your application.
Another pattern, Active Record, puts all of this data access logic in the Entity by design. This is very convenient, and can help immensely with rapid development. This pattern is used extensively in Ruby on Rails.
My personal pattern of choice, and the most complex, is the Data Mapper. This provides a strict separation of data access logic and Entities. This makes for lean business-logic exclusive Entities. It's common for a Data Mapper implementation to use a TDG,RDG, or even Active Record pattern to provide the data access logic for the mapper object. It's a very good idea to implement an Identity Map to be used by your Data Mapper, to reduce the number of queries you are doing to your storage medium.
Domain Model
The Domain Model is an object model of your domain that incorporates behavior and data. In our simple phone book application this would be a very boring single Person class. We might need to add more objects to our domain though, such as Employer or Address Entities. These would become part of the Domain Model.
The Domain Model is perfect for pairing with the Data Mapper pattern. Your Controllers would simply use the Mapper(s) to CRUD the Entities needed by the View. This keeps your Controllers, Views, and Entities completely agnostic to the storage medium. This also allows for differing Mappers for the same Entity. For example, you could have a Person_Db_Mapper object and a Person_Xml_Mapper object; the Person_Db_Mapper would use your local DB as a data source to build Entities, and Person_Xml_Mapper could use an XML file that someone uploaded, or that you fetched with a remote SOAP/XML-RPC call.
Service Layer
The Service Layer pattern defines an application's boundary with a layer of services that establishes a set of available operations and coordinates the application's response in each operation. I think of it as an API to my Domain Model.
When using the Service Layer pattern, you're encapsulating the data access pattern (Active Record, TDG, RDG, Data Mapper) and the Domain Model into a convenient single access point. This Service Layer is used directly by your Controllers, and if well-implemented provides a convenient place to hook in other API interfaces such as XML-RPC/SOAP.
The Service Layer is also the appropriate place to put application logic. If you're wondering what the difference between application and business logic is, I will explain.
Business logic is your domain logic, the logic and behaviors required by your Domain Model to appropriately represent the domain. Here are some business logic examples:
Every Person must have an Address
No Person can have a phone number longer than 10 digits
When deleting a Person their Address should be deleted
Application logic is the logic that doesn't fit inside your Domain. It's typically things your application requires that don't make sense to put in the business logic. Some examples:
When a Person is deleted email the system administrator
Only show a maximum of 5 Persons per page
It doesn't make sense to add the logic to send email to our Domain Model. We'd end up coupling our Domain Model to whatever mailing class we're using. Nor would we want to limit our Data Mapper to fetch only 5 records at a time. Having this logic in the Service Layer allows our potentially different APIs to have their own logic. e.g. Web may only fetch 5, but XML-RPC may fetch 100.
In closing, a Service ayer is not always needed, and can be overkill for simple cases. Application logic would typically be put directly in your Controller or, less desirably, In your Domain Model (ew).
Resources
Every serious developer should have these books in his library:
Design Patterns: Elements of Reusable Object-Oriented Software
Patterns of Enterprise Application Architecture
Domain-Driven Design: Tackling Complexity in the Heart of Software
The model is how you represent the data of the application. It is the state of the application, the data which would influence the output (edit: visual presentation) of the application, and variables that can be tweaked by the controller.
To answer your question specifically
The content of the email, the person to send the email to are the model.
The code that sends the email (and verify the email/registration in the first place) and determine the content of the email is in the controller. The controller could also generate the content of the email - perhaps you have an email template in the model, and the controller could replace placeholder with the correct values from its processing.
The view is basically "An authentication email has been sent to your account" or "Your email address is not valid". So the controller looks at the model and determine the output for the view.
Think of it like this
The model is the domain-specific representation of the data on which the application operates.
The Controller processes and responds to events (typically user actions) and may invoke changes on the model.
So, I would say you want to put the code for the e-mail in the controller.
MVC is typically meant for UI design. I think, in your case a simple Observer pattern would be ideal. Your model could notify a listener registerd with it that a user has been registered. This listener would then send out the email.
The model is the representation of your data-storage backend. This can be a database, a file-system, webservices, ...
Typically the model performs translation of the relational structures of your database to the object-oriented structure of your application.
In the example above: You would have a controller with a register action. The model holds the information the user enters during the registration process and takes care that the data is correctly saved in the data backend.
The activation email should be send as a result of a successful save operation by the controller.
Pseudo Code:
public class RegisterModel {
private String username;
private String email;
// ...
}
public class RegisterAction extends ApplicationController {
public void register(UserData data) {
// fill the model
RegisterModel model = new RegisterModel();
model.setUsername(data.getUsername());
// fill properties ...
// save the model - a DAO approach would be better
boolean result = model.save();
if(result)
sendActivationEmail(data);
}
}
More info to the MVC concept can be found here:
It should be noted that MVC is not a design pattern that fits well for every kind of application. In your case, sending the email is an operation that simply has no perfect place in the MVC pattern. If you are using a framework that forces you to use MVC, put it into the controller, as other people have said.

Repository Pattern: What is the 'right size'?

I'm building some repositories for an MVC application, and I'm trying to come up with the right way to divide responsibilities between repositories. In most cases, this is obvious. But there is one particular case where I'm not sure what the right answer is.
The users of this application need to track multiple types of time for their employees. For simplicity, let's consider only two. I'll call them "time cards" and "attendance." The exact nature of the difference between these two is not really important, but you should note that the end-users consider them entirely separate data. I think, though, that the reason they consider them entirely separate data is that they have never really had the opportunity to see them together in the past. Both types of records have almost entirely different business rules concerning editing the records, but they are also, generally speaking, both records of where an employee was at a particular time. Both types of time records have a great deal of properties in common, such as a total number of hours, and an employee for whom the time was collected. Both types also have a few properties which are completely unique to the individual type. We're keeping these "extra" properties in an instance of another type. So the general structure looks like this:
class TimeRecord
{
Person Employee { get; set; }
TimeSpan? Hours { get; set; }
}
class TimeCardData
{
TimeRecord Record { get; set; }
TProperty TimeCardProperty { get; set; }
}
class AttendanceData
{
TimeRecord Record { get; set; }
TProperty AttendanceProperty { get; set; }
}
So the question is, How many repositories are required here?
1 Repository
A design with only one repository would expose methods to return "time cards", "attendance" records, or both types in one list. This is fairly convenient for clients of the repository, but, to my mind, has a real danger of becoming a very fat class. I think that a repository for just "time cards" is already going to be one of the largest repositories in the system even without also handling "attendance" simply due to the complex business rules involved.
2 Repositories
Another design would have one repository for "time cards" and another repository for "attendance" records. This has the advantage that the business rules for, e.g., "time cards" are in a place by themselves. But I'd also like to have a way to get a list of all time records, regardless of type. It's not clear which repository to use for this case. Both?
3 Repositories
A design with one repository for "time cards", another repository for "attendance" records, and a third repository to deliver a read-only list of all time records is also a possibility. Like the 2 repository design, this has the advantage that the business rules for, e.g., "time cards" are in a place by themselves. It's now clear where to get the combined list. But I find it a bit weird that I could get the same record from two different repositories.
Hybrid
A hybrid approach would use a single repository, but move any business rules code (including selection of records) into separate types. In this example, a single "time record repository" would aggregate instances of business rule implementation classes for "time card" and "attendance" time. I think this is the approach I'm favoring right now.
Other?
Anything I've missed? Any compelling arguments for one design over the other?
Repositories are, at least not to my knowledge, a place for business rules. They are just a facade meant to mimic a collection; underneath they're basically pure data access (if that's they're job, you may not be persisting anything with a Repository as well). So separate repositories should not be considered for "business rules" reasons.
If your domain objects are really separate objects, then you should have separate repositories. Remember what a repository is: it's a facade. It mimics a collection to your domain. See here for a really good blog post on Repositories: http://devlicio.us/blogs/casey/archive/2009/02/20/ddd-the-repository-pattern.aspx
The repository is a facade; an abstraction.
That said... I do not think you have separate objects. You've got some issues here that have nothing to do with repositories and everything to do with the domain and the design of the domain. Are the two types of "timecards" actually two different things, or are they really the same?
You say, "But I find it a bit weird that I could get the same record from two different repositories."
That tells me that they are actually the same data, expressed in different ways. And there are ways to handle that.
If this is really the case, then what you have here is subclasses of a common base class (something that can be modeled in a DB pretty easy and handled elegantly with NHibernate, for instance).
I'll give you an example of a project I am working on. I have something called a "Broadcast". It's a base class; abstract. Can't be instantiated. I have two specific concrete types of this class: DeviceBroadcast and FileBroadcast. One streams audio/video from a device (like a DirectX capture card) and one streams audio/video from a file source (like an .mp3).
I have one repository that returns a Broadcast object. I can cast it to a FileBroadcast to manipulate specific information about a FileBroadcast, or I can cast to a DeviceBroadcast for the same reason - if it is of that type. A Broadcast cannot be both a FileBroadcast and DeviceBroadcast type. It has to be one or the other.
In the database I store the generic broadcast parameters in a Broadcast table, and then I store the file specific properties in a FileBroadcast table. Same goes for the DeviceBroadcast table; separate. When I query via the Repository, however, I just want Broadcasts. That's my root aggregate object and thus that's my repository.
The Broadcast base class has common methods that both subclasses use (like the GetCommand() method, which returns a specific command-line argument to launch a VLC process). Subclasses have to override and implement that method because it's abstract. In this way, the "business logic" that is unique to a FileBroadcast is contained in the FileBroadcast class. The "business logic" that is unique to a DeviceBroadcast is contained in the DeviceBroadcast class. Any logic that is common to both is contained in the superclass, Broadcast.
You seem to have a similiar situation here and that's why I am sharing my design. I think it might serve you well.
Above all, think about your domain and the data. If you are going to get duplicate data by way of separate repositories, then you need to give more thought to how you're designing the domain. Don't let the users dictate your domain design either. They know the domain from their perspective. All you have to do is be able to present the data to them in a way they understand. That doesn't mean you have to have a bad design; You can have a good design behind the scenes because your code is the thing that has to use the domain.

Resources